NCI-DREAM Drug Sensitivity Prediction Challenge

 

 

Publication Plan for the NCI-DREAM Drug Sensitivity Prediction Challenge

1)    Nature Biotechnology has agreed to consider for publication two papers, one describing sub-challenge 1 and one describing sub-challenge 2.

2)    Best performer teams will be co-authors of the Nat Biotech paper reporting on the subchallenge in which their method was the best performer. Other co-authors will be the data providers and the DREAM organizers. All other NCI-DREAM challenge participants will be invited to be collaborators in the paper to which they contributed as part of the "NCI-DREAM consortium". As an example of this type of publication strategy, please see the “Wisdom of Crowds for Robust Gene Network Inference”, Nat Methods, Aug 2012.

3)    We will be asking all members of the NCI-DREAM consortium to provide a high level description of their methods, but the methods will be kept anonymous. (See also the “Wisdom of Crowds” paper.)

4)    We expect to submit the manuscripts to Nature Biotechnology not later than April 2013.

5)    We request that challenge participants refrain from submitting their own manuscripts until the above-mentioned papers are published. We believe this is to everybody’s advantage, as publishing pieces of this work elsewhere will jeopardize publication in Nature Biotechnology.

6)    Participants can use the data and results of the challenge to write grants or give talks, with appropriate credit to Joe Gray’s and Andrea Califano’s labs and the DREAM project.

 

Gold Standards and Scoring Scripts

The gold standards and scoring scripts for Subchallenge 1 can be downloaded from this file.

The gold standards and scoring scripts for Subchallenge 2 can be downloaded from this file.

Note that you must be logged in with your team credentials to access these files.

UPDATES

-9/20/12 - For subchallenge 2, we have added concentration information and titration curves to the FTP site for the 24 and 48 hr time points.

- 9/9/12 - For subchallenge 2, please note the text of the challenge has been updated to reflect clarifications of the challenge, submission format, and scoring.

- 7/5/12 - For subchallenge 1, bam files from the Exome Sequencing runs are available to participants. Since the data is large (over 1Tb), we ask that you contact the DREAM organizers at dream_admin@googlegroups.com with a request for the bam files.

-7/18/12 - For subchallenge 1, raw data for dose response, exon microarrays, and SNP6 microarrays have been made available on the download site.  Follow the same instructions as downloading the challenge data. 

 -7/30/12 - For subchallenge 1, please note that the expected submission file has been modified. We wanted to clarify that the final predictions will be ranking the cell lines in relation to the drugs. Also, we have tasked participants to rank the 18 test cell lines together with the 35 training cell lines to produce a final submission matrix of ranked lists of 53 cell lines by 31 drugs. Please see the challenge description for details. The DREAM7_DrugSensitivity1.zip file has been updated on the ftp site.

 

Important Note

The data for this challenge cannot be used for publication without the explicit permission of the data producers (Joe Gray and Andrea Califano), and the DREAM organization. Feel free to contact us about this. 



Challenge Description

Synopsis

The challenge is to use genomic information to build models capable of ranking the sensitivity of cancer cell lines to a set of small molecule compounds or their combinations.

Introduction

Development of new cancer therapeutics currently requires a long and protracted process of experimentation and testing. Human cancer cell lines represent a good model to help identify associations between molecular subtypes, pathways, and drug response. In recent years there have been several efforts to generate genomic profiles of collections of cell lines and to determine their response to panels of candidate therapeutic compounds. These data provide the basis for the development of in silico models of sensitivity based either on the unperturbed genetic potential of a cancer cell, or by using perturbation data to incorporate knowledge of actual cell response. Making predictions from either of these data profiles will be beneficial in identifying single and combinatorial chemotherapeutic response in patients. To that end, the present challenge seeks computational methods, derived from the molecular profiling of cell lines both in a static state and in response to perturbation of a specific drug, that predict the sensitivity of the same or similar compounds in different cell lines. Methodology invoked in this challenge will be useful not only in therapeutic decision-making but also understanding the basic mechanisms of drug mode of action and drug-drug interaction.

Experimental system

Cancers can have many subtypes and a patient’s response to a therapeutic agent depends heavily on the underlying genetic makeup of the cancer being treated. The experimental approach to study drug sensitivity focuses on characterizing the effects of a panel of anticancer drugs on molecular subtypes of a cancer. A few recent landmark studies have been published using this experimental design (Garnett et al. 2012, Barretina et al. 2012, Heiser et al. 2012).

The 2 sub-challenges presented here are based on new datasets that follow an experimental design similar to those reported in the three papers referenced above. For the first sub-challenge, a total of 53 cell lines (48 breast cancer cell lines and 5 non-malignant breast cell lines) were exposed to 31 therapeutic compounds at a concentration needed to inhibit proliferation by 50% after 72 hours (GI50). The drugs include conventional cytotoxic agents (e.g. taxanes, platinum compounds, anthracyclines) and targeted reagents (e.g. hormonal and kinase inhibitors). For each of the 53 cell lines, multiple types of genome scale data were generated before exposure of the cells to the agents (details of these data are described in the “Data” section). The challenge in this case is to link the drug effects to the underlying genetics of the 53 cell lines. The second sub-challenge will be based on experiments that probed molecular information and sensitivity of a Diffuse Large B-cell Lymphoma (DLBCL) cell line (LY3) after the application of 14 therapeutic agents. This sub-challenge consists of predicting the effects of all pair-wise combinations of 14 agents on the LY3 cells.

The Challenge

Sub-challenge 1: Predict the sensitivity of breast cancer cell lines to previously untested compounds

The challenge is to build a model capable of ranking the sensitivity of 18 breast cancer cell lines to 31 compounds. For the purposes of this challenge the identity of the drugs will be kept anonymous. The cell lines for this challenge partially overlap with those of the Heiser et al. study, but the 31 compounds are unpublished and separate from those published in the Heiser et al. study. In practice, DREAM participants will be supplied with three pieces of data: 1) the genomic characterization of 53 cell lines, 2) the GI50 concentrations for 31 compounds on 35 cell lines, and 3) a list of the 18 cell lines for which the GI50 concentrations are not given (see Figure 1). The first and second pieces of data are to be used as the training data and the third piece is the test data. For each one of the 31 compounds, challenge participants are tasked with predicting the rank order of the 18 cell lines in the test set from the most sensitive to the least sensitive in relation to the 35 cell lines in the training dataset. As illustrated in Figure 1, the order of the 35 cell lines in the training set is given and the task is to place the 18 cell lines in the proper order to produce a final ranked list of 53 cell lines for each of the 31 compounds. Evaluation of this challenge will be based on the accuracy of the rank order of the test set cell lines in comparison to ranks of the actual, measured sensitivities.

 
Figure 1. Details of sub-challenge 1. Highlighted in red, a collection of 53 breast cancer cell lines has been assembled and extensively profiled by standard genetic, epigenetic, and genomic tests. In addition, GI50 concentrations for 31 new small molecule compounds (not previously published in Heiser et al. (2012)) have been established. The collection of breast cancer cells has been divided into two groups. One group of 35 will serve as the training set for this challenge. Participants will receive all profile information of this group along with complete response data for 31 small molecule chemical probes. The second set of 18 breast cancer cell lines will serve as the test set, and GI50 data for these cells will be withheld. Please note that not all the genomics data was collected for all the cell lines due to technical or quality control reasons. The challenge is to predict the ordering of the test set cell lines in relation to the training set cell lines. Highlighted in blue, an example case is illustrated for 1 of the 31 drugs. The training set rank order for drug 1 is given. Participants are tasked with placing the 18 cell lines in the predicted rank order including the 35 training cell lines, thus producing a rank ordered list of 53 cell lines for each individual drug.


Sub-challenge 2: Predicting compound combinations that have a synergistic effect in reducing viability of a DLBCL cell line

In this sub-challenge participants are asked to predict the activity of pairs of compounds in the DLBCL LY3 cell line, from expression profiles acquired after treatment of the cell line with each of 14 individual compounds. Based on the measured IC20 (20% viability reduction) at 24h induced by each compound in isolation, participants are asked to provide a ranked list of all 91 compound pairs (14x13/2) from the most synergistic to the most antagonistic. Participants are also required to provide the position in the list (i.e., the row number) at which the drug combination produces a purely additive effect where each drug is administered at its IC20 concentration (details in the submission section). The IC20 of the compounds and the synergistic activity has been assessed within the same experiment in triplicate to avoid differences in compound potency or cell line physiology in different experiments.

Each compound tested was titrated in OCI-Ly3 in a twenty-point titration curve. Viability was determined using CellTiter-Glo (Promega Corporation), at 72 hours post seeding, 60 hours following compound treatment. Fractional inhibition concentrations were calculated for each compound by interpolation of the titration curve, where the fractional inhibition concentration, ICv, is the concentration needed to inhibit cell growth by v%. A stock solution of each compound at its experimentally determined IC20 concentration (also called GI20) was created and its activity was again validated to achieve viability reduction close to the expected IC20. There were two compounds for which 20% viability reduction could not be attained for any achievable concentration, i.e., for which an IC20 concentration could not be effectively determined; in those cases a default concentration of 100 µM was used. Each compound combination was then tested at the respective IC20 (or 100 µM) concentration of the individual compounds in six replicates (six separate plates run on different days). All compounds and combinations are diluted in 100% DMSO, with 0.4% DMSO in the assay.
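
The interpolation step described above can be sketched as follows. This is only an illustration, not the procedure used by the data producers: it assumes linear interpolation on log10 concentration between the two titration points that bracket the target inhibition. The function name fractional_inhibition_concentration and the toy curve are hypothetical.

```python
import math

def fractional_inhibition_concentration(concs, inhibitions, v):
    """Return the concentration at which inhibition first reaches v percent.

    concs: increasing concentrations (any unit, all > 0).
    inhibitions: percent growth inhibition measured at each concentration.
    Returns None if v percent is never reached within the tested range.
    Assumption: linear interpolation in log10(concentration).
    """
    if inhibitions[0] >= v:
        # Target already reached at the lowest tested concentration.
        return concs[0]
    points = list(zip(concs, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= v <= i_hi and i_hi > i_lo:
            frac = (v - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None

# Toy titration curve (hypothetical values, not challenge data).
concs = [0.01, 0.1, 1.0, 10.0, 100.0]
inhib = [2.0, 10.0, 30.0, 60.0, 85.0]
ic20 = fractional_inhibition_concentration(concs, inhib, 20)
```

A return value of None corresponds to the case in the text where no achievable concentration reaches the target inhibition and a default concentration is used instead.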

 

Details of sub-challenge 2

Figure 2. Schematics of sub-challenge 2. Participants will receive the data containing the gene expression profile (GEP) of LY3 cell lines treated with 14 different drugs, at 2 different concentrations and profiled at 3 different time points after drug administration, along with DMSO and baseline expression in growth media, which were also profiled at the same time points. All GEPs of drug treated cells are provided in triplicate, GEPs of cells profiled in growth media are provided in duplicate and GEPs of cells in growth media and DMSO were repeated 8 times. The challenge is to use the provided gene expression profiles, SNP profiles, and IC20 values of the drug treatments to predict the order of efficacy of all pairs of drug combinations from the most synergistic to the most antagonistic.

 

The Data

Data Specific to Sub-challenge 1

Participants will be provided with the following genomic characterization data on 53 breast cell lines. For all cell lines, these data were collected before drug treatment. Also, note that not all types of data are collected for every cell line, that is, some of the data types are missing for some cell lines due to technical difficulties. For each data type, much more detail is provided in the Readme file accompanying the provided data.

  1. DNA copy number variation: Platform: Affymetrix Genome-Wide Human SNP Array 6. The data file is a tab-delimited text file that contains segmented genome copy number calls for 53 breast cancer cell lines.
  2. Transcript expression values: Platform: Affymetrix GeneChip Human Gene 1.0 ST exon array platform. The tab-delimited text file contains gene-level summaries for 53 breast cancer cell lines. Each column represents data from one cell line, each row represents gene expression (in log2 space) for a single gene. Gene identifiers are HUGO gene symbols.
  3. Whole exome sequencing: Mutation status was obtained from exome-capture sequencing (Agilent Sure Select system). Mutations across all cell lines were filtered according to several criteria available in the associated readme file.
  4. RNA sequencing data: Whole transcriptome shotgun sequencing (RNA-seq) was completed on breast cancer cell lines and expression analysis was performed with the ALEXA-seq software package as previously described (Griffith, et al 2010). There are two tab-delimited text files associated with these data: (a) log2 transformed estimates of gene-level expression, and (b) expression status values indicating whether the genes were detected above background level.
  5. DNA methylation data: The Illumina Infinium Human Methylation27 BeadChip Kit was used for the genome-wide detection of 27,578 CpG loci, spanning 14,495 genes (Fackler, et al 2011). We used the GenomeStudio Methylation Module v1.0 software to express the methylation for each CpG locus as a value between 0 (completely unmethylated) to 1 (completely methylated) which is proportional to the degree of methylation at any particular locus.
  6. RPPA protein quantification: Reverse phase protein array (RPPA) is an antibody-based method to quantitatively measure protein abundance. RPPA data were generated and pre-processed by the Gordon Mills lab at MD Anderson. Cells were exponentially growing prior to harvest and not subject to any particular treatment. There are two header rows in this data file, indicating the protein name (first header row) and the degree of validation of the antibody used in the assay (second header row). Data derived from antibodies that were not fully validated should be interpreted with caution. The most robust dataset would include only those data from fully validated antibodies. The data represent protein abundance in log2 space.
  7. Drug response data: Participants will be provided with a file containing the GI50 concentration for 31 anonymous compounds on 35 of the 53 cell lines. The GI50 is a measure to assess the efficacy of therapeutic compounds in a cell culture. To estimate the GI50 a series of assays were performed as previously described (Kuo, et al 2009). Briefly, cells were treated for 72 hours with a set of 9 doses of each compound in 1:5 serial dilution. Cell viability was determined using the Cell Titer Glo assay. We used nonlinear least squares to fit the data with a Gompertz curve. The fitted curve was transformed into a GI curve using the method described in (http://dtp.nci.nih.gov/branches/btb/ivclsp.html) and previously described (Monks, et al, 1991). We assessed a variety of response measures including the compound concentration required to inhibit growth by 50% (GI50). In cases where the underlying growth data are of high quality, but the GI50 was not reached, the values were set to the highest concentration tested. The drug response data was filtered to meet several quality criteria detailed in the readme file. Approximately 80% of the drug plates pass all filtering requirements.

All the files containing the training data and the cell lines will be contained in the zip file

     DREAM7_DrugSensitivity1.zip

and can be downloaded from the DREAM web site, following the instructions contained in the file at the end of this page (registration is required).

Data Specific to Sub-challenge 2

Participants will be provided with genomic data on the LY3 DLBCL cell line. As opposed to the sub-challenge 1, for which data were collected before drug treatment, in this sub-challenge we provide data collected both before and after treatment with isolated compounds. Also provided are the dose response measurements from which the IC20 of the different compounds were obtained. More detail is provided in the Readme file accompanying the data.

  1. Treated and Untreated Gene Expression Profiles: Participants will be provided a set of 252 gene expression profiles corresponding to a set of 14 perturbations (14 compounds in DMSO), at three time points (6h, 12h, and 24h), at two concentrations (one corresponding to the IC20 of the compound at 24 hours and one corresponding to the IC20 at 48 hrs), in triplicate. Also provided will be the gene expression profiles of cells exposed to DMSO in 8 replicates, and the gene expression profiles of cells in growth media in duplicate at all 3 time points. All gene expression profiles were profiled using Affymetrix U219 96 array plate.
  2. LY3 Mutation Data: Baseline genetic information for the LY3 cell lines was reported in (Green, 2010) and is provided with the data for this challenge. The data consists of SNP profile (raw dataset and segmentation information), profiled on Affymetrix SNP Array 6.0.
  3. Dose Response Curves: Participants will be provided with the 24h drug response data from which the IC20 was obtained. The IC20 of the compounds and the synergistic activity have been assessed within the same experiment in triplicate to avoid differences in compound potency or cell line physiology in different experiments. To determine IC20, cells were plated into a 96 well assay plate 12 hours before compound addition. If the IC20 of a drug was higher than 100uM, i.e., no decrease in viability was seen in the titration curve up to 100uM, then 100uM was set as the treatment concentration, because at higher concentrations drugs might display toxic effects or start to hit off-target.

Expression profiles, mutation data, drug titration curves at 24 hours and the identity of the compounds used for treatment will be provided in the zip file

    DREAM7_DrugSensitivity2.zip

and can be downloaded from the DREAM web site, following the instructions contained in the file at the end of this page (registration is required).

Submission of Predictions

You can submit to Sub-challenge 1, or Sub-challenge 2, or both Sub-challenges.  You are not obligated to submit to both Sub-challenges.  Each subchallenge will be scored separately and the best performer in each of the Sub-challenges will be asked to speak at the DREAM conference.  Please note that the incentives (as discussed below) will be awarded to the team that performs best across both Sub-challenges.

Submission to Sub-challenge 1

The following table is provided as file DREAM7_DrugSensitivity1_Predictions.csv in the file DREAM7_DrugSensitivity1.zip. Please replace the placeholder ranks with the predicted ranks (keeping the header row and the first column of cell line names), save it as a comma-separated-value file and submit it as DREAM7_DrugSensitivity1_Predictions_<TeamName>.csv replacing <TeamName> by the name of the team with which you register to the challenge. The lowest ranks (1,2,3,..) correspond to the cell lines in which the compound inhibits growth most effectively (lower GI50 concentrations). The highest ranks (…,51,52,53) correspond to the cell lines in which the compound is weakest at inhibiting growth. Please note that the GI50 concentrations in the training data are -log10 transformed, thus the lowest ranks (1,2,3...) correspond to the highest values in the training file and the highest ranks (…,51,52,53) correspond to the lowest values.

Sub-challenge 1 | Compound 1 | Compound 2 | ... | Compound 31
Cell Line 1 | rank of cell line 1 on compound 1 | rank of cell line 1 on compound 2 | ... | rank of cell line 1 on compound 31
Cell Line 2 | rank of cell line 2 on compound 1 | rank of cell line 2 on compound 2 | ... | rank of cell line 2 on compound 31
... | ... | ... | ... | ...
Cell Line 53 | rank of cell line 53 on compound 1 | rank of cell line 53 on compound 2 | ... | rank of cell line 53 on compound 31

Within the training data, there are several drugs that show the same GI50 value for more than one cell line. This is because these cell lines never reach 50% growth inhibition within the dosage range tested. Note the dosage ranges were predetermined to span ~ 5 logs for each compound, with the exact concentrations tailored to each compound. Where cell lines show this pattern of "flat" GI50 values, simply rank order the cell lines arbitrarily (e.g. in alphabetical order). As described in the scoring section, participants will not be penalized for different orderings of cell lines with the same GI50 values. We are collecting this data in hopes that we might be able to learn something of biological relevance, so please attempt to rank the drugs to the best of your understanding.

Within the training data, there are several "NA" values. Not all drug/cell line responses were reliably measured due to technical reasons. Place all cell lines with "NA" values at the end of the list and sort them arbitrarily (e.g. in alphabetical order). As with the "flat" profiles, participants will not be penalized for the rank order of these cell lines.

Participants are supplied the GI50 values for the 35 cell lines in the training data, thus supplying the rank ordering of these cell lines.  In addition to rank ordering the 18 test cell lines, you must place these 18 predictions in the proper order with the 35 training cell lines.
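
The ranking rules above can be combined for a single compound as in the sketch below. It is a minimal illustration, not the organizers' code: it assumes the values are the -log10 transformed GI50 concentrations described earlier (so a higher value means a more sensitive cell line and a lower rank), breaks ties alphabetically, and puts "NA" entries last. The function name rank_cell_lines and the cell line names are hypothetical.

```python
def rank_cell_lines(neg_log_gi50):
    """Map cell line -> rank (1 = most sensitive) for one compound.

    neg_log_gi50: dict of cell line name -> -log10(GI50), or None for "NA".
    Higher -log10(GI50) means a lower GI50, hence a more sensitive line.
    Ties are broken alphabetically; "NA" lines go to the end of the list.
    """
    measured = [name for name, v in neg_log_gi50.items() if v is not None]
    missing = [name for name, v in neg_log_gi50.items() if v is None]
    # Sort by descending value, then by name to break ties.
    measured.sort(key=lambda name: (-neg_log_gi50[name], name))
    missing.sort()
    ordered = measured + missing
    return {name: rank for rank, name in enumerate(ordered, start=1)}

# Toy values for one compound (hypothetical, not challenge data).
values = {"LineC": 5.2, "LineA": 6.8, "LineB": 5.2, "LineD": None}
ranks = rank_cell_lines(values)
```

For an actual submission, the dictionary for each compound would hold 53 entries: the 35 training values as given plus the 18 values predicted for the test cell lines.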

Submission to Sub-challenge 2

Sub-challenge 2 requires rank ordering all pairwise combinations of 14 compounds (i.e., [(14 * 13) / 2] = 91 pairs) from the most synergistic pair to the most antagonistic pair in the Ly3 cell line, based on an assumed IC20 concentration at 72h of the individual compounds. Participants have been provided with the gene expression profiles (microarrays) of Ly3 cell lines treated with each of the 14 DMSO diluted compounds at two distinct concentrations and harvested at 3 different time points (6h, 12h, and 24h) following compound administration. Gene expression profiles of Ly3 cells at the same time points (6h, 12h, and 24h) after administration of DMSO vehicle only (8 replicates per time point) and after placement in growth media (2 replicates per time point) were also provided. The baseline genetic information for the Ly3 cell line and individual compound dose response curves were also provided. Any data may be used to solve the challenge, including data outside of that supplied through DREAM, to rank order the compound pairs. The final submission will be a comma separated file (csv) with two columns; the first column should contain the compound pair combination separated by an “&” (e.g. Methotrexate & Blebbistatin) and the second column should contain the rank of that combination, 1 being the most synergistic and 91 being the most antagonistic. The 93rd row (i.e., the last row after the header line and the 91 pairs) should report the first compound pair in the ranked list whose predicted activity is not deemed to have a significant synergistic effect, i.e. excess over Bliss close to 0 (see Scoring Metrics section for the definition of excess over Bliss). This will be used to separate the pairs predicted to be synergistic from those predicted to have additive or antagonistic behavior.

Drug Combination | Rank
Methotrexate & Blebbistatin | N1
Aclacinomycin A & Cycloheximide | N2
Geldanamycin & Rapamycin | N3
... | ...
Vincristine & Methotrexate | N90
Monastrol & Cycloheximide | N91
Compound pair with additive activity (IC36) | X & Y

Please refer to the template file “DREAM7_DrugSensitivity2_Predictions_TeamName.csv” contained in the DREAM7_DrugSensitivity2.zip file. When submitting your predictions, save it as a comma-separated-value file and replace <TeamName> by the name under which your team is registered on the DREAM website. To be scored, submissions must provide the predicted rank for all compound pairs.
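
A file with the layout described above could be generated along the following lines. This is a sketch under stated assumptions, not an official tool: the template file in the zip archive remains the authoritative format, the function name build_submission and the compound names Drug01..Drug14 are hypothetical, and the synergy scores stand in for whatever a participant's own model produces.

```python
import csv
import io
from itertools import combinations

def build_submission(compounds, predicted_synergy, first_additive_pair):
    """Return CSV text: header, one row per ranked pair, then the boundary row.

    predicted_synergy: dict mapping frozenset({a, b}) -> predicted score
    (higher = more synergistic); the scores come from the participant's model.
    first_additive_pair: label of the first pair predicted to be additive.
    """
    pairs = list(combinations(compounds, 2))
    # Rank 1 is the most synergistic pair, so sort by descending score.
    pairs.sort(key=lambda p: -predicted_synergy[frozenset(p)])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Drug Combination", "Rank"])
    for rank, (a, b) in enumerate(pairs, start=1):
        writer.writerow([f"{a} & {b}", rank])
    writer.writerow(["Compound pair with additive activity (IC36)",
                     first_additive_pair])
    return out.getvalue()

# Hypothetical compound names and placeholder scores, for illustration only.
compounds = ["Drug%02d" % i for i in range(1, 15)]
scores = {frozenset(p): -i for i, p in enumerate(combinations(compounds, 2))}
text = build_submission(compounds, scores, "Drug05 & Drug06")
rows = text.splitlines()
```

The resulting text has 93 rows: the header, the 91 ranked pairs, and the final row marking where the predicted additive behavior begins.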

Write-up

Finally, we request that each participating team submit a short write-up (around two to three pages) explaining the methods used to arrive at their predictions for the two subchallenges. This write-up can contain pseudo-code describing the algorithm used, workflows for arriving at the predictions, etc. Submit the write-up as the file DREAM7_DrugSensitivity_Writeup_<TeamName>.ext replacing <TeamName> with the name of your team and the file extension (ext) with your choice of doc or docx. The submission of this write-up is mandatory for participation in this challenge.

Scoring Metrics

Each sub-challenge will be scored independently. There is no obligation to submit solutions to both sub-challenges for the submissions to be scored. However, the incentive prize will be awarded to the best performing team that submitted solutions to the two sub-challenges. Each sub-challenge submission corresponds to either the prediction of the ranks of the sensitivities of cells for each of 31 compounds, or the prediction of the rank of the activity of the combinations of compounds for a cell line. The unit of scoring will therefore be the predicted ranking against the experimentally identified ranking (gold standard). The challenges will be scored using metrics such as the Spearman correlation, concordance index, etc.
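
As an illustration of one of the metrics named above, here is a minimal sketch of a concordance index between a predicted ranking and a gold-standard ranking. It is not the official scoring script (those are distributed with the gold standards). The function name and the conventions for ties (pairs tied in the gold standard are skipped, pairs tied only in the prediction count as half concordant) are assumptions.

```python
from itertools import combinations

def concordance_index(predicted_rank, gold_rank):
    """Fraction of item pairs ordered the same way in both rankings.

    Both arguments map item -> rank. Pairs tied in the gold standard are
    skipped; pairs tied only in the prediction count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    for a, b in combinations(sorted(gold_rank), 2):
        gold_diff = gold_rank[a] - gold_rank[b]
        if gold_diff == 0:
            continue
        comparable += 1
        pred_diff = predicted_rank[a] - predicted_rank[b]
        if pred_diff == 0:
            concordant += 0.5
        elif (pred_diff > 0) == (gold_diff > 0):
            concordant += 1.0
    return concordant / comparable if comparable else float("nan")

# Toy rankings: the prediction swaps items B and C.
gold = {"A": 1, "B": 2, "C": 3, "D": 4}
pred = {"A": 1, "B": 3, "C": 2, "D": 4}
score = concordance_index(pred, gold)
```

In this toy case five of the six pairs are ordered consistently, so the score is 5/6; a perfect prediction scores 1 and a fully reversed one scores 0.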

For Sub-challenge 1, the scoring will be based primarily on the ranking of the 18 test cell lines and, if tied scores need to be broken, we will then consider the ranking of the 18 test cell lines in relation to the 35 training cell lines. From replicate experiments, we have been able to estimate the GI50 variance for a drug. When two cell lines have GI50 values that are within the standard deviation, there is a probability associated with the ranking of one cell line before another. Essentially, there is not one correct and absolute ranking of cell lines, but a distribution of rankings each with an associated probability. Accordingly, predictions will be scored in relation to the distribution of rankings, which will be done for each drug individually. A participant's overall score for Sub-challenge 1 will be a summary statistic across all drugs. In order for us to include a drug score in the overall score, we need to be able to separate a random ranking of cell lines from the experimentally determined ranking with statistical significance. For example, we will not be able to separate the true ranking from random predictions on the "flat" profiles or the cell lines with "NA" values, so these drugs will be excluded from the overall score.
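
To make the idea of a probability attached to a pairwise ordering concrete, the sketch below computes the probability that one cell line is truly more sensitive than another. It rests on an assumption that is not stated in the challenge text: each measured value carries independent Gaussian noise with a common standard deviation sigma. The function name is hypothetical and the actual scoring script may use a different model.

```python
import math

def prob_more_sensitive(value_a, value_b, sigma):
    """P(true value of A > true value of B), values being -log10(GI50).

    Assumes independent Gaussian measurement noise with standard
    deviation sigma on each value, so the difference of the two
    measurements has standard deviation sigma * sqrt(2).
    """
    z = (value_a - value_b) / (sigma * math.sqrt(2))
    # Standard normal cumulative distribution function evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

p_equal = prob_more_sensitive(6.0, 6.0, 0.3)  # identical measurements
p_far = prob_more_sensitive(7.5, 6.0, 0.3)    # well separated measurements
```

Identical measurements give a probability of one half for either ordering, while measurements many standard deviations apart give an ordering that is close to certain.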

For Sub-challenge 2, the scoring will be done according to a measure of synergy, which in this case will be defined as Bliss additivism (or Bliss independence) [12]. Let us assume that the experimentally measured cell growth fractional inhibition induced by the stock solution of compounds X and Y is x and y, respectively, and that the fractional inhibition induced by their combination is z, where x, y, and z are in the range [0, 1], with 0 indicating no inhibition and 1 indicating 100% growth inhibition. Then, the Bliss additivism (or Bliss independence) model predicts that, if Compound X and Compound Y have an additive rather than synergistic effect, the expected fractional inhibition c of the combination X and Y is defined as:

c = 1 - (1 - x) * (1 - y) = x + y - x * y.

For instance, assume that compound X produces an 18% fractional inhibition and compound Y a 24% fractional inhibition, as experimentally measured. Then c = 0.24 + 0.18 – 0.24 * 0.18 ≈ 0.38.

The excess over Bliss is determined by subtracting the fractional inhibition expected in the additive case, c, from its experimentally determined value z:

Δ = z - c.

Thus, from the previous example, if the fractional inhibition for the combination was experimentally determined to be z = 0.54, the excess over Bliss would be Δ = 0.54 – 0.38 = 0.16. Compound pairs for which Δ ≈ 0 have an additive behavior, whereas compound pairs with positive (or negative) Δ values have synergistic (or antagonistic) behavior. Since all experiments are performed in six replicates, the standard deviation σΔ and error on the mean can also be estimated for each compound and compound pair. Compound pairs are then ranked based on the Δ values as described in the following section.
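
The Bliss calculation can be written out as a short sketch. The function names are hypothetical; the arithmetic follows the definitions in this section: the expected inhibition is one minus the product of the surviving fractions, and the excess over Bliss is the observed inhibition of the combination minus that expectation.

```python
def bliss_expected(x, y):
    """Expected fractional inhibition of X + Y under Bliss independence."""
    return 1.0 - (1.0 - x) * (1.0 - y)

def excess_over_bliss(x, y, z):
    """Observed combination inhibition z minus the Bliss expectation."""
    return z - bliss_expected(x, y)

# Worked example from the text: x = 0.18, y = 0.24, z = 0.54.
c = bliss_expected(0.18, 0.24)
delta = excess_over_bliss(0.18, 0.24, 0.54)

# Two compounds each dosed at their IC20, if purely additive.
additive_ic20_pair = bliss_expected(0.20, 0.20)
```

Without intermediate rounding, c is 0.3768 and the excess is 0.1632, which round to the 0.38 and 0.16 quoted above. Two compounds each at their IC20 give an expected inhibition of 0.36, which is why the additive boundary row of the submission is labelled IC36.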

Timeline

Deadline for submission is 5PM EST October 1st, 2012. Results will be announced no later than October 21st.


Incentives

While this challenge is primarily an academic exercise, the NCI intends, in conjunction with the selected participants, to support the subsequent experimental validation and development of the top performing models. Travel expenses of the best performing teams to the DREAM meeting will be provided. The best performing team in this challenge will be offered the opportunity to publish their results in Nature Biotechnology as a "prize" for outstanding performance. Publication is contingent on the outcome of a peer review process, which will take into consideration that the work achieved best performance in a blind challenge.

Credits

The data was kindly provided pre-publication by the labs of Prof. Joe Gray (sub-challenge 1) and Prof. Andrea Califano (sub-challenge 2). The general idea of the challenge was conceived in a workshop co-organized by NCI and DREAM. We acknowledge the contributions of all participants in that Summit. Laura Heiser, Mukesh Bansal, James Costello, Julio Saez-Rodriguez, Michael Menden, Charles Karan, Gustavo Stolovitzky, Joe Gray, Andrea Califano, Dan Gallahan and Dinah Singer curated the final version of the challenge.

References

  1. Heiser, L. M. et al. Subtype and pathway specific responses to anticancer compounds in breast cancer. Proc Natl Acad Sci USA 109, 2724–2729 (2012).
  2. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
  3. Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012).
  4. Lamb, J. The Connectivity Map: a new tool for biomedical research. Nat Rev Cancer 7, 54–60 (2007).
  5. Bengtsson H, Irizarry R, Carvalho B, & Speed TP. Estimation and assessment of raw copy numbers at the single locus level. Bioinformatics (Oxford, England) 24(6):759-767 (2008).
  6. Bengtsson H, Wirapati P, & Speed TP. A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6. Bioinformatics 25(17):2149-2156 (2009).
  7. Venkatraman ES & Olshen AB. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23(6):657-663 (2007).
  8. Griffith, M. et al. Alternative expression analysis by RNA sequencing. Nat Methods 7, 843-847 (2010).
  9. Fackler, M. J. et al. Genome-Wide Methylation Analysis Identifies Genes Specific to Breast Cancer Hormone Receptor Status and Risk of Recurrence. Cancer Research, CAN-11-1630 (2011).
  10. Kuo, W. L. et al. A systems analysis of the chemosensitivity of breast cancer cells to the polyamine analogue PG-11047. BMC Med 7, 77, (2009).
  11. Green, M.R. et al. Integrative analysis reveals selective 9p24.1 amplification, increased PD-1 ligand expression, and further induction via JAK2 in nodular sclerosing Hodgkin lymphoma and primary mediastinal large B-cell lymphoma. Blood 116(17):3268-3277, (2010).
  12. Borisy, A.A., et al., Systematic discovery of multicomponent therapeutics. Proceedings of the National Academy of Sciences 100(13):7977-7982, (2003).