DREAM6 Estimation of Model Parameters Challenge
|
Submission is closed |
Synopsis
Given the complete model structures (including expressions for the kinetic rate laws) for three gene regulatory networks, participants are asked to develop and/or apply optimization methods, including the selection of the most informative experiments, to accurately estimate parameters and predict outcomes of perturbations in Systems Biology models.
Background
Much of the recent focus on reverse engineering of gene networks has been on the identification of causal interaction topologies between genes (1, 2). Indeed that was the focus of several challenges in past editions of DREAM (3, 4). Such interactions are usually inferred from high-dimensional gene expression experiments involving different perturbations to the network of interest. The current challenge aims to move one logical step beyond the task of network topology reconstruction. Once network interaction topologies are characterized with some reasonable level of confidence, how do we characterize the actual kinetics of these networks in a way that accurately reflects the causal relationships implied in the proposed topology?
In addressing this question, two key areas that require attention are the tasks of estimating model parameters from data in the case of a known (assumed) model structure, and designing the most informative experiments for parameter inference. These reverse engineering tasks provides the area of focus for the current challenge, namely the Estimation of Model Parameters Challenge.
The desired outcome for the Estimation of Model Parameters Challenge is the development, improvement and application of optimization methods (for model parameter estimation) and experimental design that are particularly well-suited for models of complex (highly-parameterized) systems and for which data are limited, often quite noisy and experiments need to be designed iterativly with the model building process. Despite the fact that these conditions are commonly encountered in biological systems modeling, the problem of parameter estimation and iterative experimental design remains one of the hardest challenges in systems biology (5-8).
The Challenge
Participating teams are provided the full regulatory interaction network topology for three gene networks with 6, 7 and 9 genes. Since mRNA and protein are explicitly modeled, there are 12, 14, and 18 relevant variables, respectively.
Description of gene network models
The models provided for this challenge assume linear kinetics for mRNA degradation and protein synthesis (translation) and degradation. In most cases, mRNA synthesis (transcription) is modeled as a nonlinear (Hill-type) function of 1 or 2 regulatory inputs (i.e. transcription factors), each of which is either an activating or an inhibitory input. In the case where there is no regulatory input to a gene, a basal (constant) rate of transcription is assumed. The detailed functional forms of these kinetic laws are specified below.
Data Used for Parameter Estimation
In addition to network topologies and mathematical models, a limited amount of time-course data is initially provided (start-up data. These data can be used in the early stages of parameter estimation. However, to reflect conditions that exist in actual scientific practice, teams also have the opportunity to ‘purchase’ additional experimental data sets from a menu of perturbations and experimental modalities of their choice.
The data sets to be used as input for the parameter estimation have been generated (by simulation) in reponse to different network perturbations that include gene deletion, siRNA-mediated knockdown and change of RBS (ribosomal binding site) activity. Refer to the section “Allotted Budget and Price List for Perturbation and Measurement Experiments” for a listing of available experiments and their costs. For each of these types of perturbation experiments, teams may purchase data for either mRNA (for all genes) or protein abundance (for 2 proteins of their choice). Prices vary for each of these, reflecting the relative ease or difficulty of acquiring this type of data in reality.
Parameters to be Estimated
All mRNA degradation rate constants in all three models have been fixed to a value of 1 and therefore will not be estimated by participating groups. The unit of time is normalized with the inverse of the mRNA degradation rate, and therefore time is non-dimensional, and measured in units of the mRNA half-life. Similarly, the values of all protein degradation rate constants in the original models are identical, but their value is unknown and must be estimated from data as part of the challenge. All other model parameters will likewise be estimated from data by the challenge participants. For each transcriptional regulation process (activation and repression), there are two parameters to be estimated: Kd and h (to b discussed below). For each protein production process, there are two parameters that would need to be estimated: the promoter strength and the ribosomal binding site strength. Thetranscription rate for a given gene is assumed to be proportional to promoter strength of the corresponding promoter and the translation rate is assumed to be proportional to the ribosomal binding site strength.
Allotted Budget and Price List for Perturbation and Measurement Experiments
The allotted budget for each team will be 10,000 credits per model. The cost breakdown for available perturbations and experiments is summarized in Table 1, 2 and 3. All time-course data will be provided in concentration units (nM) and time units are dimensionless.
Perturbation Experiments
When requesting data, teams will specify the type of perturbation and the identity of the gene to be targeted. Three types of perturbation experiments are possible as summarized in Table 1. In all cases, only a single gene can be targeted at a time for perturbation. The deletion of a gene results in a complete elimination of both mRNA and protein production for the targeted gene. In the case of siRNA-mediated knockdown, mRNA degradation is increased by 5 fold resulting in a decrease in both mRNA and protein concentrations. Finally, change of RBS activity results in modification of protein expression for the targeted gene without affecting mRNA concentration.
| Perturbation Experiment |
Cost |
| Gene deletion to eliminate both mRNA and protein production for a specified gene |
800 |
| mRNA knockdown using siRNA to achieve a 5-fold increase in mRNA degradation |
350 |
| Increase of RBS activity by 100% (i.e. double) to increase translation rate |
450 |
Computational Experiments
In addition to specifying a perturbation type (including the target gene), teams will be asked to specify what is measured inresponse to the perturbation. Time courses of all mRNA concentrations may be obtained by a microarray experiment at two temporal sampling resolutions. In the low resolution sampling rate, data was recorded every 4x time units. For the higher resolution, the sampling rate was twice as fast (i.e., every 2x time units). The highest-resolution of one point every x time units (the time unit x=0.5), may be obtained for protein concentrations time traces (as would be measured from a fluorescence microscope experiment), although this data type is limited to only 2 proteins at a time (to be specified by the group requesting the data) for a given experiment.
| Measurement Experiment | Output | Cost |
| Microarray | Time-courses of all mRNAs | 500 (low resolution: 1 data point every 4x time units) 1000 (higher resolution: 1 data point every 2x time units) |
|
Fluorescent protein fusion (only 2 proteins available at a time) |
Time-courses of protein abundance for 2 chosen proteins | 400 (highest resolution: 1 data point every x time units) |
Determination of Kdand h
There is also the option to buy estimated values of the dissociation constant (Kd) and Hill coefficient (h) for a specified protein (transcription factor), as would be obtained from a gel-shift experiment. This is more expensive and can only be measured for a single transcription factor per experiment.
|
Experiment |
Output |
Cost |
|
Gel-shift assay |
Binding affinity (Kd) and Hill coefficient (h) of any one transcription factor at a time |
1600 |
Models and Data
Full model topologies are provided in graphical form as shown in the example shown in Fig. 1, which corresponds to one of the three networks used in this challenge.

Figure 1A. One of the models (model 1) whose parameters and response to perturbations are requested as part of this challenge.

Figure 1B. key to the symbols and notations used in Fig. 1A and in the other models of this challenge.
Variables are labeled according to their type. For example, variables in the model representing protein concentrations are labeled as p1, p2, ... p6. The meaning of each symbol is detailed in the key to the symbols (Fig. 1B), which also lists the prefix that is used to name the model variables. Each line connecting a protein coding sequence to a protein represents a protein production process and is labeled with the prefix "pp". Protein production consists of two steps: transcription and translation. For simplicity, these two steps, shown in Fig. 2, are not explicitly shown in the diagram of Fig. 1A. The variable names for the mRNA resulting from the transcription of a coding sequence will contain the "pp" prefix. For example, the mRNA corresponding to the pp3 will be labeled pp3_mrna.

Figure 2. Detailed representation of the protein production process.
The same transcription factor can regulate the transcription of one or more genes (e.g., p4 regulates cod5 and cod6 through repressor sites rs3 and rs5 respectively, see Fig. 1). In these situations, the transcriptional control of a transcription factor on different genes could be different if the corresponding parameters are different (e.d., rs3 has parameters v6 and rs5 has parameters v7, see Fig 1). The translation rate can also be different if the ribosomal binding site strengths differ.
The transcription process is modeled using a single rate equation. The rate equation is expressed as a sum of the transcriptional activity ( as) of all the activators in that promoter region multiplied by the product of the transcriptional activity ( rs ) of all the repressor binding sites in the same region. For example the transcription rate of pp4_mrna in Fig. 3 will be modeled as pro4_strength * as4 * rs2 , where

Figure 3. Example of a case of regulation of the transcription of coding sequence cod4.
Also referring to the example of Figure 3, the rate of production of protein "p4" is given by linear rate equation rbs4_strength * pp4_mrna.If there are two transcription factor binding sites that are activating, then a sum is used rather than a product. For example, in the diagram of Fig. 4, the transcription rate of pp3_mrna is modeled as pro3_strength *(as1 + as3).

Figure 4. An instance in which two activation sites regulate the transcription of a coding sequence.
For the complete set of odes for the model of Figure 1, please follow this link.
Models
In addition to image files for the 3 networks (in ‘.jpg’ format), participants can download different mathematical description for the models in four formats: SBML or Systems Biology Markup Language (extension ‘.xml’), Matlab (extension ‘.m’), TinkerCell (extension ‘.tic’) (9) and Antimony (a Jarnac-like script with extension ‘.txt’) (10). The corresponding files are named as:
- model<i>.<ext>
where <i> = 1, 2 or 3 and <ext> = m, sbml, tic, txt or jpg.
TinkerCell was used to design the networks and to generate the network images. It is available for free download here: http://www.tinkercell.com/downloads-2. Antimony scripts may be opened, edited and converted to SBML using an Antimony editor available for download here: http://antimony.sourceforge.net/main.html). Also available for free is the Systems Biology Workbench (SBW), which may be downloaded from the following location: (http://sourceforge.net/projects/jdesigner/files/Systems%20Biology%20Workbench/2.8.1/). Using a translator tool provided by SBW, SBML files may be converted to other formats (including C/C++, C#, Java and Mathematica). Simply open the SBML file in the translator tool, select the desired output format and save the translated model description. Note that SBML files can be loaded into any compatible tool such as COPASI, CellDesigner, SBML for Matlab, etc. SBML files can also be converted to human readable reports (latex files, pdf, etc.) using SBML2LATEX, a conversion tool available online at http://webservices.cs.uni-tuebingen.de/. The resulting files contain, among other things, the list of parameters, the kinetic rules and the basic ODEs. The resulting model report for model 1 can be accessed through this link
Data
Initially all participants will receive a start-up data with the time course data with the dynamics of all the mRNA species for each model. The corresponding simulations initial conditions were set as follows: For model 1 and 2: at t=0, all the mRNA species are set to 0, and all the protein concentrations are set to 1. For model 3: at t=0, all the mRNA species and all the protein concentrations (except for p1) are set to 0, while p1 is set to 5. The data will be contained in the file:
-
array_mod_i_wildtype.tab
where i refers to the model number and can be 1, 2 or 3.
A web interface is under development that permits easy access to data. This interface will be available in early June 2011. Users need to specify the type of perturbation to be performed in their model, and the type of data they wish to acquire (RNA microarray and/or protein concentration). Credits will automatically be deducted from the participating team’s account according to the cost break-down summarized in the previous section (“Allotted Budget and Price List for Perturbation and Measurement Experiments”). The names of the resulting files will follow the nomenclature of Table 4. For example, if a team requests the perturbation “deletion of gene 3” and the data type “microarray” in high sampling rate, the system will offer the user to download the file “array_del_3_high.tab”. The gel-shift experiments will provide the values for the parameters Kd and h for the indicated activation and repressor sites.
|
Filename extension for all files = .tab |
Deletion of gene g |
Downregulation of gene g |
Overexp of protein p |
Wild Type |
|
Microarray |
array_mod_i _del_g_<res> |
array_mod_i_dwn_g _<res> |
array_mod_i_over_p _<res> |
array_mod_i_wildtype _<res> |
|
Fluorescence of proteins j and k |
pj_pk_mod_i _del_g |
pj_pk_mod_i_dwn_g |
pj_pk__mod_i_over_p |
pj_pk_mod_i_wildtype |
|
Gel-shift assay for activation site as_j orrepressor site rs_j |
N/A |
N/A |
N/A |
model_i_asj or model_i_rsj |
Formats, Initial conditions and noise
The microarray files (named ‘array_*.tab’) are tab delimited files containing the time traces of the mRNA abundances for all the transcripts in a given model, at the times indicated in the first column. The protein-pair files (named ‘pj_pk_*.tab’) are tab delimited files containing the time traces of the proteins pj and pk abundances in a given model, at the times indicated in the first column. In both data modalities, the initial conditions for the time traces are as follows. For model 1 and 2: at t=0, all the mRNA species are set to 0, and all the protein concentrations are set to 1. For model 3: at t=0, all the mRNA species and all the protein concentrations (except for p1) are set to 0, while p1 is set to 5. The initial conditions are the same even in the case of perturbations to the system. For example, if gene 3 is deleted in model 1, its corresponding protein p1 will still have an initial condition of p3=1 at t=0. Also in both data modalities, a noisy measurement is simulated by adding some noise to the deterministic value of each variable. More precisely, if v is the simulated value, we report as measured the value
vnoisy = max[0,v + 0.1*g1 + C*g2*v] ,
where g1 and g2 are Gaussian random variables with standard deviation of 1, and C= 0.2 for both the mRNA measurements and for the protein measurements. (NOTE: in an earlier version of this description, we incorrectly stated that C=0.1 for proteins. However, the scoring function will not be changed from what was stated originally because the original scoring function may have been used in the strategy to purchase data. ) That is, for small v the stdev of the noise is close to 0.1, while for large value of v, the noisiy v has a coefficient of variation C of 20% in both the measurements of proteins and mRNA. Note that the if value after noise addition is smaller than zero, the vlue of vnoisy value is clipped at 0.
Submission of Predictions and Writeup
Participants are required to submit a writeup file indicating the methods used to solve the challenge, and 6 additional files corresponding to 2 files per model: one for the parameters and the other for the predicted time courses.
For each model, please submit your inferred parameters completing the provided template file
- DREAM6_ParEst_Parameters_model_<i>_<TeamName>.txt
where <i> = 1, 2 or 3, and <TeamName> is the name of the team with which you registered for the Challenge. This file is a \tab separated two column file, which lists all the parameters that you have to predict in the first column, and contains the word 'PREDICT' in the second column. At submission, replace 'PREDICT' with your prediction, and save as a \tab separated '.txt' two column file with no header.
For each of the models participating teams should submit their predicitons of the time course of the abundance of three proteins (different for each model) resulting from different perturbations, according to Table 5. For all the models the time course should use the following initial conditions: For model 1 and 2: at t=0, all the mRNA species are set to 0, and all the protein concentrations are set to 1. For model 3: at t=0, all the mRNA species and all the protein concentrations (except for p1) are set to 0, while p1 is set to 5.
|
Model Number |
Description of Simultaneous Perturbations |
Time-courses to predict |
|
1 |
10x increase in v5_Kd (Kd of p6)
2x increase in rbs6_strength
8x increase in rbs4_strength
|
p2, p4 and p6 |
|
2 |
100x decrease in v3_Kd (i.e. divide by 100)
2x decrease in v7_Kd (i.e. divide by 2)
5x decrease in rbs3_strength (i.e. divide by 5)
|
p3, p5 and p7 |
|
3 |
decrease v11_Kd by 10x (i.e. divide by 10)
decrease rbs2 strength by 10x (i.e. divide by 10)
increase pp9_mrna degradation by 2x (i.e. multiply by 2)
|
p4, p8 and p9 |
For each model, submit your time course predictions completing the provided template file
- DREAM6_ParEst_TimeCourse_model_<i>_<TeamName>.txt
where <i> = 1, 2 or 3, and <TeamName> is the name of the team with which you registered for the Challenge. This file is a \tab separated four column file, with a header in the first row indicating the nature of the information of each colum. The first column lists the times at which we are requesting the predictions, while the other 3 columns contain the proteins for which we are requesting the abundance. The template file contains the word 'PREDICT' in all the rows for the second to fourth column. At submission, replace 'PREDICT' with your prediction, and save as a \tab separated '.txt' two column file keeping the header.
Write-up
Finally we request that each participating team submits a short write-up (around two to three pages) explaining the methods used to arrive at their predictions. This write-up can contain pseudo-code describing the algorithm used, workflows for coming to the prediction of the isoforms, etc. Submit the write-up as the file
- DREAM6_ParEst_Writeup_<TeamName>.ext
replacing <TeamName> with the name of your team and the file extension (ext) with your choice of doc or docx. The submission of this writeup is mandatory for participation in this challenge.
Scoring Metrics
The Estimation of Model Parameters Challenge will be judged based on the accuracy of model parameter estimates and the accuracy of model predictions.
As there is no standard or obvious way of scoring in which it is optimal to spend some but not all the credits, scoring in this challenge will not take into account the amount of credits spent. Participants are encouraged to spend the whole budget.
The overall metric for for evaluation of predictions will be done by combining scores computed independently for each model. The scoring metric will be based on the sum of the squares of the log ratios between inferred and known parameters and the sum of the squares of the differences between predicted and simulated protein abundance.
The precise definition of the scoring system for this challenge can be found in this document.
For suggestions about how to score or anything else about this challenge , please contact the organizers directly, or post a comment in the discussion forum.
Credits
The challenge was conceived by Herbert Sauro and Gustavo Stolovitzky, the modelling was done by Deepak Chandran who also curated the challenge along with Ryan Roper, Kyung Hyuk Kim, Julio Saez Rodriguez, Raquel Norel, Pablo Meyer and Gustavo Stolovitzky.
Notes
Note about Protein Degradation Parameters
Protein degradation rates are equal and so they are in practice only 1 parameter for each of the 3 models. We still ask to submit the parameters as indicated in the Submission Template, although this will lead to submitting the same degradation rate parameter for all the proteins of a given model
Cheating?
It is expected that the same group does not register multiple times and purchase data under different Team names. We expect some sort of honor system, that will make people refrain from getting more data than they declare for the same credits. Remember that best performers will have to explain in the conference how they reached their good performance. (However we are going to try to detect if people from the same group are registering more than once under different team names. So don't cheat!)
6/5/11: New data posted
The data for this challenge has changed on 6/5/11. People who downloaded the data before that date were alerted of the change by e-mail, to the e-mail addresses used at registration. The most up-to-date file is DREAM6_ParEst_Data_v2.zip.
6/13/2011: Scoring systems posted
A new document detailing how this challenge will be scored has been added. Download it from this link.
6/16/2011: Data Purchasing
You can now "purchase" data from:http://dream.c2b2.columbia.edu/cgi-bin/buy_data.cgi. Please use the SAME e-mail and team_name that you used to register to the challenge.
BEFORE you can "purchase" data, you need to download the "free" DREAM6_ParEst_Data_v3.zip file (use the link at the bottom of this page)
6/22/11: New data posted
The data for this challenge contained in the file DREAM6_ParEst_Data_v2.zip has changed. People who downloaded this v2 file before 6/22/11 were alerted of the change by e-mail, to the e-mail addresses used at registration. The most up-to-date file is DREAM6_ParEst_Data_v3.zip.
7/19/11: New data posted
References
Download
The link below will allow you to download a zipped file the wild type data for the initial process of parameter optimization, the model files and the submission template files.