Alternative splicing challenge: Submission of predictions

We have a few questions about the general form our submission should take.

1. There is a 64 MB limit on submissions, which our analysis suggests may be smaller than the total size of the transcriptome.  In light of this, which sorts of transcript predictions should we prioritize?  Should we try to predict many isoforms of a subset of transcripts (e.g., the longest ones) or a few high-confidence isoforms of a broader range of transcripts?

2. Roughly how should we be trading off specificity and sensitivity?  Is it advisable to predict isoforms up to the 64 MB limit (and just indicate the lower-confidence ones as such)?

3. According to the challenge description, submissions will be scored against reads from the Pacific Biosciences SMRT sequencer, which can generate reads between 1 kb and 2 kb long.  Does this mean we should focus on predicting transcripts of length at least 1 kb?

4. The reads in the strand-specific paired-end files *_1.fastq and *_2.fastq are oriented in opposite directions.  Should we make predictions in the orientation of the _1 reads or the _2 reads?  (This is just a technical point, since one is simply the reverse complement of the other.)
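
To make question 4 concrete, here is a minimal Python sketch of what we mean by "reverse complement".  The function name is our own and purely illustrative; it is not part of any challenge tooling, and it assumes plain A/C/G/T/N sequences.

```python
# Hypothetical helper for illustration only; assumes plain A/C/G/T/N sequences.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement each base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # A transcript predicted in one orientation ...
    predicted = "ATGCCN"
    # ... and the same transcript as it would read in the other orientation.
    print(reverse_complement(predicted))
```

Converting a prediction from one orientation to the other is therefore trivial; we only need to know which one the scoring expects.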

We understand that scoring this challenge is going to be complicated, but any general direction on these points would be helpful.  Thanks!

Po-Ru