Alternative Splicing

ch_no: D6C1

                    Area under Precision-Recall curve
Team          Rhino IPSC                    Human ESC   Total score   Final Ranking
TeamTrinity   0.1898                        0.4071      0.5969        1
orangeballs   0.0488                        0.2449      0.2937        2
Team #89      0.0258                        0.1324      0.1582        3
Team #66      0.0141                        0.1431      0.1572        4
Team #112     0.0254                        0.1249      0.1503        5
Team #98      0 (no prediction submitted)   0.128       0.1386        6

Teams that participated in the challenge can log in to de-anonymize their identity.

Definitions

Gold Standard Generation:

Sub-reads were extracted from raw Pacific Biosciences sequencing data using the SMRT pipeline, and sequences shorter than 100 nt were discarded. The sub-reads were mapped with BLAT to the reference genome (human for Human ESC and horse for Rhino IPSC). Similarly, Illumina reads were mapped with TopHat to the same reference genome.

 

Sub-reads mapped with less than 90% coverage or less than 75% identity were discarded. Of the remaining sub-reads, only best hits with an identity at least 1% higher than that of the second-best hit (if one exists) were used to create Gold Standard transcripts.
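The filtering rule above can be sketched in Python. This is a minimal illustration, not the organizers' pipeline: the `Hit` record and `select_best_hits` function are hypothetical names, values are expressed in percent, "1% higher" is read as one percentage point, and the second-best hit is taken among the hits that already passed the coverage and identity thresholds.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment of a sub-read to the genome (hypothetical record)."""
    read_id: str
    coverage: float  # percent of the sub-read covered by the alignment
    identity: float  # percent identity of the alignment


def select_best_hits(hits, min_cov=90.0, min_ident=75.0, min_margin=1.0):
    """Keep one unambiguous best hit per sub-read.

    Assumption: the margin test compares the best and second-best hits
    among those that pass the coverage and identity thresholds.
    """
    by_read = {}
    for h in hits:
        if h.coverage >= min_cov and h.identity >= min_ident:
            by_read.setdefault(h.read_id, []).append(h)

    kept = {}
    for read_id, read_hits in by_read.items():
        read_hits.sort(key=lambda h: h.identity, reverse=True)
        best = read_hits[0]
        # A single hit is kept as-is; otherwise require a clear margin.
        if len(read_hits) == 1 or best.identity - read_hits[1].identity >= min_margin:
            kept[read_id] = best
    return kept


hits = [
    Hit("r1", 95.0, 98.0), Hit("r1", 95.0, 90.0),  # clear winner
    Hit("r2", 95.0, 98.0), Hit("r2", 96.0, 97.5),  # ambiguous, dropped
    Hit("r3", 80.0, 99.0),                         # coverage too low
    Hit("r4", 92.0, 80.0),                         # single passing hit
]
kept = select_best_hits(hits)
```

In this toy input only `r1` and `r4` survive: `r2` has two hits within one percentage point of each other, and `r3` fails the coverage threshold.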

 

Because of the numerous indels found in the Pacific Biosciences data, each splice site position was corrected to the closest spliced Illumina read boundary within 25 nt. After correction, exons and introns smaller than 25 nt were removed.
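The correction step can be illustrated with a short Python sketch. The function name `correct_splice_site` is hypothetical, and two behaviours are assumptions because the text does not specify them: a position with no Illumina boundary within 25 nt is left unchanged, and an exact distance tie goes to the lower coordinate.

```python
import bisect


def correct_splice_site(pos, illumina_sites, max_dist=25):
    """Snap a splice site to the nearest Illumina-supported boundary.

    illumina_sites must be a sorted list of boundary coordinates.
    Assumption: if no boundary lies within max_dist, pos is returned as-is.
    """
    i = bisect.bisect_left(illumina_sites, pos)
    # Only the neighbours on either side of pos can be the nearest site.
    candidates = illumina_sites[max(i - 1, 0):i + 1]
    if not candidates:
        return pos
    nearest = min(candidates, key=lambda s: abs(s - pos))
    return nearest if abs(nearest - pos) <= max_dist else pos


sites = [100, 200, 300]
corrected = [correct_splice_site(p, sites) for p in (110, 190, 150, 320, 330)]
```

Here 110, 190 and 320 snap to 100, 200 and 300, while 150 and 330 stay where they are because the closest boundary is more than 25 nt away.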

 

The corrected sub-reads with at least one overlapping exon were then clustered into genes. For each gene, transcripts were defined by merging sub-reads with the same splicing pattern. The orientation of each gene was set to the direction that would produce the largest number of canonical splice sites.
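A rough Python sketch of the clustering and merging steps follows. It is an illustration under stated assumptions rather than the actual procedure: clustering is taken to be transitive (two sub-reads end up in the same gene if a chain of overlapping exons connects them), coordinates are treated as inclusive, the splicing pattern is represented by the ordered list of introns, and all function names are hypothetical. Orientation assignment is not covered.

```python
from collections import defaultdict


def cluster_into_genes(reads):
    """Group sub-reads that share at least one overlapping exon.

    reads maps read_id -> (chromosome, [(exon_start, exon_end), ...]).
    Returns a list of sets of read ids, one set per gene.
    """
    parent = {rid: rid for rid in reads}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    exons = sorted(
        (chrom, start, end, rid)
        for rid, (chrom, exon_list) in reads.items()
        for start, end in exon_list
    )
    cur_chrom = cur_end = cur_rid = None
    for chrom, start, end, rid in exons:
        if chrom == cur_chrom and start <= cur_end:
            # Exon overlaps the running block of exons: same gene.
            parent[find(rid)] = find(cur_rid)
            cur_end = max(cur_end, end)
        else:
            cur_chrom, cur_end, cur_rid = chrom, end, rid

    genes = defaultdict(set)
    for rid in reads:
        genes[find(rid)].add(rid)
    return list(genes.values())


def splicing_pattern(exon_list):
    """Represent a splicing pattern as a tuple of (intron_start, intron_end)."""
    exons = sorted(exon_list)
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))


def merge_into_transcripts(gene, reads):
    """Merge the sub-reads of one gene that share a splicing pattern."""
    groups = defaultdict(list)
    for rid in sorted(gene):
        groups[splicing_pattern(reads[rid][1])].append(rid)
    return dict(groups)


reads = {
    "a": ("chr1", [(100, 200), (300, 400)]),
    "b": ("chr1", [(150, 200), (300, 450)]),
    "c": ("chr1", [(380, 500), (600, 700)]),
    "d": ("chr1", [(1000, 1100)]),
    "e": ("chr2", [(100, 200)]),
}
genes = cluster_into_genes(reads)
big_gene = next(g for g in genes if "a" in g)
transcripts = merge_into_transcripts(big_gene, reads)
```

In this toy example sub-reads `a`, `b` and `c` form one gene with two transcripts, because `a` and `b` share the intron (200, 300) while `c` has a different one.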

 

Prediction Evaluation:

The list of predictions was ordered according to the confidence provided by the participants, with ties resolved by the number of reads assigned to each predicted transcript.
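As a small illustration, the ordering can be written as a two-key sort in Python. The direction of both keys is an assumption here (higher confidence first, then more reads first), since the text does not state it, and the dictionary fields are hypothetical.

```python
def order_predictions(predictions):
    """Sort predictions by confidence, breaking ties by read count.

    Assumption: higher confidence ranks first, and among equal
    confidences the transcript with more assigned reads ranks first.
    """
    return sorted(predictions, key=lambda p: (-p["confidence"], -p["reads"]))


predictions = [
    {"id": "t1", "confidence": 0.9, "reads": 10},
    {"id": "t2", "confidence": 0.9, "reads": 25},
    {"id": "t3", "confidence": 0.5, "reads": 99},
]
ordered_ids = [p["id"] for p in order_predictions(predictions)]
```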

Predictions were evaluated by the area under the precision-recall curve, using a global alignment strategy as described below. Precision at depth i in the prediction list was obtained by taking the number of predicted transcripts among the first i predictions to which at least one gold standard transcript could be matched with a coverage and an identity of 95% or more, and dividing it by i. Recall at depth i was calculated by dividing the number of gold standard transcripts that could be matched to the first i predicted transcripts, with a coverage and an identity of 95% or more, by the total number of transcripts in the gold standard.
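The two definitions can be made concrete with a short Python sketch. It assumes the alignment step has already been done and that, for each prediction in ranked order, the set of gold standard transcripts matched at 95% coverage and identity is available. The function name and input format are illustrative.

```python
def precision_recall_at_depths(matches, n_gold):
    """Compute precision and recall at every depth i = 1..len(matches).

    matches[k] is the set of gold standard transcript ids matched by the
    (k+1)-th ranked prediction; an empty set means no match.
    """
    precision, recall = [], []
    matched_predictions = 0
    gold_found = set()
    for i, matched in enumerate(matches, start=1):
        if matched:
            matched_predictions += 1
        gold_found |= matched
        precision.append(matched_predictions / i)
        recall.append(len(gold_found) / n_gold)
    return precision, recall


# Four ranked predictions evaluated against a gold standard of 4 transcripts.
matches = [{"g1"}, set(), {"g1", "g2"}, {"g3"}]
precision, recall = precision_recall_at_depths(matches, n_gold=4)
```

Note that precision counts predictions while recall counts distinct gold standard transcripts, so a second prediction matching `g1` raises precision without adding to recall for `g1`.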

The Area under the Precision-Recall curve was integrated until the maximum recall, yielding a value between 0 and 1 for each team and each type of cell studied. The Precision-Recall curve for each team and cell type can be seen in the following figures: Rhino, Human.
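The text does not say which integration rule was used. As one possible reading, the sketch below accumulates precision times the increase in recall at each depth (a step-wise sum), stopping at the last point of the curve, which is the maximum recall reached. Both the rule and the function name are assumptions.

```python
def area_under_pr(precision, recall):
    """Step-wise area under a precision-recall curve.

    Assumption: each depth contributes precision * (gain in recall),
    starting from recall 0. The result lies between 0 and 1.
    """
    area = 0.0
    previous_recall = 0.0
    for p, r in zip(precision, recall):
        area += p * (r - previous_recall)
        previous_recall = r
    return area


perfect = area_under_pr([1.0, 1.0], [0.5, 1.0])
partial = area_under_pr([1.0, 0.5], [0.5, 0.5])
```

With this rule a team that never reaches full recall is capped below 1, because the area beyond its maximum recall contributes nothing.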

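[Figure: Precision-Recall curves for each team, Rhino IPSC]

[Figure: Precision-Recall curves for each team, Human ESC]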

Total Score

The total score for each team was obtained by adding its Area under the Precision-Recall curve for Human ESC to its Area under the Precision-Recall curve for Rhino IPSC.

 

Final Ranking

The final ranking was obtained by ordering the teams from the largest to the smallest values of their total scores.
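The scoring and ranking rules can be checked against the published per-cell-type values with a few lines of Python. This is only a sketch that recomputes the sums; the variable names are illustrative. For Team #98 the recomputed sum is 0.128, whereas the results table lists 0.1386. The sketch does not attempt to explain that difference, and the ranking is the same either way.

```python
# (Rhino IPSC, Human ESC) areas under the Precision-Recall curve,
# copied from the results table.
aupr = {
    "TeamTrinity": (0.1898, 0.4071),
    "orangeballs": (0.0488, 0.2449),
    "Team #89": (0.0258, 0.1324),
    "Team #66": (0.0141, 0.1431),
    "Team #112": (0.0254, 0.1249),
    "Team #98": (0.0, 0.128),  # no Rhino IPSC prediction submitted
}

# Total score: sum of the two areas, rounded to the table's precision.
totals = {team: round(rhino + human, 4) for team, (rhino, human) in aupr.items()}

# Final ranking: largest total score first.
ranking = sorted(totals, key=totals.get, reverse=True)

for rank, team in enumerate(ranking, start=1):
    print(rank, team, totals[team])
```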