The DREAM Phil Bowen ALS Prediction Prize4Life

Challenge is open! Please go to https://www.innocentive.com/ar/challenge/9933047 to register and download data.

Synopsis

The goal of this challenge is to predict the progression of disease in ALS patients based on the patient’s current disease status. The data available to make this prediction include demographics, medical and family history data, functional measures, vital signs, and lab data (blood chemistry/hematology/urinalysis). These data have been obtained from industry, academic, and government-funded clinical trials. The prize award is $50,000.

 

Background

Amyotrophic Lateral Sclerosis (ALS), also known as Lou Gehrig’s disease (in the US) or Motor Neurone disease (outside the US), is a fatal neurological disease that kills the nerve cells in the brain and spinal cord that control voluntary muscle movement. Patients experience a progressive loss of motor function while cognitive functions remain intact. Symptoms usually do not manifest until the age of 50 but can start earlier. At any given time, approximately five out of every 100,000 people worldwide have ALS; the prevalence would be higher if the disease did not progress so rapidly to the death of the patient.

 

There are no known risk factors for developing ALS other than having a family member who has a hereditary form of the disease, which accounts for about 5-10% of ALS patients. There is also no known cure for ALS. The only FDA-approved drug for the disease is riluzole, which has been shown to prolong the life span of someone with ALS by a few months.

 

The average life span of an ALS patient is 2-5 years; however, approximately 10% of patients live significantly longer with a much slower disease progression. An example of the latter is astrophysicist Stephen Hawking, who has lived with ALS for the last 49 years.

 

The DREAM-Phil Bowen ALS Prediction Prize4Life (or “ALS Prediction Prize” for short) is based on the PRO-ACT database, which upon completion will contain clinical data for over 7,500 ALS patients from completed clinical trials. The entire PRO-ACT database will be available for research purposes in December 2012, and the ALS Prediction Prize will use a subset of these data. The challenge is being run by Prize4Life in collaboration with DREAM. Prize4Life is a non-profit organization whose mission is to accelerate the discovery of treatments and a cure for ALS by using powerful incentives to attract new people and drive innovation. Prize4Life is developing the PRO-ACT database in collaboration with the Northeast ALS Consortium (NEALS) with funding from the ALS Therapy Alliance, and with generous data donations from both biopharma companies and academicians. Prize4Life shares DREAM’s vision of scientific advancement through collaboration and open access challenges. Since its founding in 2006, Prize4Life has run several ALS-related challenges with prizes of up to a million dollars. For more information on Prize4Life please visit Prize4Life.org.

 

The Challenge

The ALS Prediction Prize will focus on the difficult but important issue of determining accurate predictive indicators for disease progression. The typical ALS patient will unfortunately experience rapid disease progression, resulting in complete paralysis and death within 2-5 years. However, a subset of patients experiences a much slower disease progression or even, in extremely rare instances, an apparent arrest of disease progression. In the early stages of the disease, it is currently very difficult to determine whether a given patient will experience slow or fast disease progression. This is obviously of great importance for patients and their families. Additionally, the ability to predict disease progression is critical for those interested in planning ALS clinical trials for potential new treatments. Currently, ALS trials must include large numbers of patients to account for the enormous variance in the course of the disease within the ALS patient population, making these trials costly, slow, and more difficult to interpret.

 

Information regarding anticipated disease progression is currently not provided to patients due to a lack of specific and reliable predictors. For clinical trial purposes, the best such predictions we are currently able to make are based on the rate of change in a widely used functional scale known as the ALS Functional Rating Scale (ALSFRS). The ALSFRS is a comprehensive scale encompassing changes in limb movement, the ability to feed oneself, ease of breathing, etc. Each of these functional measures is assigned a score of 0 to 4, and the final score is the sum of the individual functional scores [2, 3]. The ALSFRS follows a roughly linear downward course, and its slope correlates reasonably well with further disease progression. On average, a patient’s ALSFRS score decreases by 0.9 units per month. By comparison, a patient with an average deterioration of 0.2 units per month or less over a year’s time would be considered to be progressing slowly, whereas a patient with an average loss of 2.0 units per month would be considered to be progressing quickly.
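
The scoring arithmetic described above can be sketched in a few lines. This is an illustration only, written in Python rather than the R required for submissions. The function names are hypothetical, and using the quoted figures of 0.2 and 2.0 units per month as hard cut-offs (with everything in between labelled "intermediate") is an assumption, not a clinical definition.

```python
def alsfrs_total(item_scores):
    """Sum the per-item functional scores; each item must lie in the 0-4 range."""
    for score in item_scores:
        if not 0 <= score <= 4:
            raise ValueError(f"item score {score} is outside the 0-4 range")
    return sum(item_scores)


def monthly_loss(total_start, total_end, months):
    """Average number of ALSFRS units lost per month between two assessments."""
    if months <= 0:
        raise ValueError("months must be positive")
    return (total_start - total_end) / months


def describe_rate(loss_per_month):
    """Label a rate using the figures quoted in the text (assumed as cut-offs)."""
    if loss_per_month <= 0.2:
        return "slow"
    if loss_per_month >= 2.0:
        return "fast"
    return "intermediate"


# A patient who goes from 36 to 12 points over 12 months loses 2.0 units per month.
rate = monthly_loss(36, 12, 12)
label = describe_rate(rate)
```

Here `rate` is 2.0 and `label` is "fast"; a patient dropping from 38 to 36 over the same period would be labelled "slow".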

 

While future disease progression can be roughly predicted from the slope of the ALSFRS score (see Ref. [3] and the Supplementary Information), this estimate is so variable over the short term that it is not currently part of the clinical guidance provided to ALS patients. The goal of this challenge is to predict disease progression more accurately, as assessed via prediction of the change in functional score over an entire year, based on 3 months’ worth of data. More specifically, participants in the challenge will need to develop an approach that predicts a given patient’s disease status within a year’s time from 3 months of data. Disease progression will be calculated as the average change in ALSFRS over a year’s time from enrollment in a clinical trial. At the end of the challenge, the submitted prediction (based on 3 months of data) will be compared against the actual ALSFRS slope experienced by the patient over a year (see more below).
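
To make the notion of an ALSFRS slope concrete, the sketch below fits a straight line through a patient's (month, score) pairs by ordinary least squares. The text does not specify how the organizers compute the slope, so the least-squares fit, the function name, and the example visits are all assumptions for illustration. Python is used here only as a sketch; submissions themselves must be in R.

```python
def alsfrs_slope(visits):
    """Least-squares slope, in ALSFRS units per month, through (month, score) pairs."""
    n = len(visits)
    if n < 2:
        raise ValueError("need at least two visits")
    mean_t = sum(t for t, _ in visits) / n
    mean_y = sum(y for _, y in visits) / n
    # Spread of the visit times around their mean.
    sxx = sum((t - mean_t) ** 2 for t, _ in visits)
    if sxx == 0:
        raise ValueError("visits must cover more than one time point")
    # Co-variation of time and score around their means.
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in visits)
    return sxy / sxx


# Hypothetical patient: four visits in the first 3 months, losing one unit per month.
early_visits = [(0, 38), (1, 37), (2, 36), (3, 35)]
early_slope = alsfrs_slope(early_visits)
```

For these made-up visits `early_slope` is -1.0. A negative slope means the score is falling, so a steeper negative value corresponds to faster progression.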

 

Ultimately, it is expected that this challenge will improve disease prediction beyond the current capabilities by 1) developing more accurate (sensitive and specific) methods of predicting progression, and 2) identifying markers (variables) that would enable a determination of expected future disease progression earlier in the course of the disease.

 

The data available for analysis will include symptom onset date, medical and family history data, demographic data, visit dates, and the following data collected at multiple times throughout the course of the study: functional measures (ALSFRS), body weight and vital signs, and lab data (blood chemistry, hematology, and urinalysis). A full data dictionary will be made available. To be eligible to win, a Proposed Solution (defined below) will have to perform better than an established benchmark generated using an “off-the-shelf” machine learning algorithm (see Scoring Metrics below). The algorithm should take as input the covariates corresponding to a single patient and must output the predicted outcome for that patient.
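
The single-patient input and output requirement can be pictured as follows. This is only a sketch in Python (submissions must be in R). The record layout and every field name are hypothetical, since the real layout is defined by the data dictionary. The placeholder predictor simply returns the average monthly change of -0.9 units quoted earlier in this description; it is not the organizers' benchmark.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Average ALSFRS change per month quoted in the text (a loss of 0.9 units).
POPULATION_MEAN_SLOPE = -0.9


@dataclass
class PatientRecord:
    """Hypothetical container for one patient's first 3 months of data."""
    patient_id: str
    age_years: float
    months_since_onset: float
    # (months since enrollment, ALSFRS total) pairs.
    alsfrs_visits: List[Tuple[float, float]] = field(default_factory=list)
    # Lab test name mapped to its most recent value.
    labs: Dict[str, float] = field(default_factory=dict)


def predict_slope(patient: PatientRecord) -> float:
    """Placeholder predictor: ignores the covariates and returns the population mean."""
    return POPULATION_MEAN_SLOPE


def predict_all(patients: List[PatientRecord]) -> Dict[str, float]:
    """Score patients one at a time, mirroring the single-patient requirement."""
    return {p.patient_id: predict_slope(p) for p in patients}


example = PatientRecord("A1", age_years=55.0, months_since_onset=14.0,
                        alsfrs_visits=[(0, 38), (3, 35)])
predictions = predict_all([example])
```

A real model would replace the body of `predict_slope` while keeping the same one-patient-in, one-number-out shape.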

 

Validation

During the challenge, participants will be able to validate their model against a subset of patients that are part of neither the training set nor the final validation (test) set. To do so, participants will submit their actual code written in the R language (“Validation Code”), and InnoCentive will run the code against the interim validation data set. The results of this partial validation will appear on a leaderboard along with the relative rankings of participants. The scoring metric will be RMSD (see more below).

 

Submission for validation (and ranking on the leaderboard) during the challenge is voluntary but encouraged. Evolving code to be validated can be submitted up to 100 times per participating team during the open phase of the challenge.

 

On October 1st, the leaderboard will be closed and the data from the validation set will be released to the participants for further testing and refining of their models.

 

Submission of final predictions and write-up

To be eligible for the prize, challenge participants will have to submit their actual code written in the R language (“Final Code”) and a description (“Final Description”), as described below. The deadline for submission of the Final Code and Final Description (together, the “Proposed Solution”) is 5 p.m. EST October 15th, 2012. InnoCentive will run the Final Code against the final data set. The best performers will be determined based on the performance of their models against the test data set. The Final Code will be used solely for testing algorithm performance (see also the Terms and Conditions, to be found when registering with InnoCentive).

 

By October 15th, each participating team will have to submit:

1) The “Final Code”, which is code written in the R language to be run on the final test set. Note that since the challenge focuses on prediction at the single-patient level, rather than prediction in a population context, we are requesting an R object that reads data corresponding to a single patient at a time and outputs the predicted ALSFRS slope for that patient between the end of the 3-month training period (month 4) and one year after that patient enrolled in their trial (month 12: 9 months after the training period).

 

2) The “Final Description”, which is a write-up with the necessary documentation to run the algorithm, and an explanation of the methods used to arrive at the submitted predictions. This write-up can contain either the actual code or pseudo-code describing the algorithm, as well as workflows for analyzing the data, etc. The write-up needs to be detailed enough to allow someone versed in the field to develop a similar algorithm. Name the write-up as

ALS_Writeup_TeamName.ext

replacing "TeamName" with the name of your team and the file extension (ext) with your choice of doc or docx.

A Proposed Solution must be complete in order to qualify as an Accepted Solution. While the algorithm will not be used for any purpose but assessing the best performance, the detailed write-ups of the best performers will be given to the challenge host for publication purposes (any such write-up will fully attribute the submitting individual/team) as described in the Terms and Conditions.

 

As also described in the Terms and Conditions, winning algorithm descriptions must be written up and submitted for publication within 6 months of the award announcement (with help from the challenge host if desired). While it is not required, publishing the solution in an open source journal is encouraged. In close consultation with the prize winner, the challenge host will also make a detailed description of the top performing algorithm publicly available. Being recognized as the best performer and winning the challenge is contingent upon these requirements.

This year, Open Network Biology (ONB; a new open source journal of the BioMed Central family, http://www.opennetworkbiology.com/) and DREAM will work together in serving the systems biology community. If the winner so chooses, ONB will publish the best performing methods (after challenge-assisted peer review) and will waive the article publication fee (£1,200 per paper) as a prize for best performance in a DREAM challenge. Publishing in ONB is optional, not mandatory, and may be open to more than only the best performing team, at the judges’ discretion.

 

Scoring Metrics

Proposed Solutions will be scored using root-mean-square deviation (RMSD), and to be eligible for the prize award they will have to beat an accuracy threshold generated by a standard machine learning algorithm. If there are multiple highly accurate submissions, other scoring metrics (such as Pearson correlation) will be used in addition to RMSD. Clinical utility will also be considered. In the case of a tie, the prize purse may be divided among the top submissions.
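
For readers unfamiliar with the two metrics named above, the following sketch computes RMSD and the Pearson correlation between predicted and actual slopes using their standard textbook definitions. It is written in Python for illustration (submissions must be in R). The function names and example values are hypothetical, and the organizers' exact scoring code may differ in detail.

```python
import math


def rmsd(predicted, actual):
    """Root-mean-square deviation between predicted and actual values (lower is better)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least two")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up slopes (ALSFRS units per month) for three patients.
predicted_slopes = [-0.9, -0.3, -1.8]
actual_slopes = [-1.1, -0.2, -2.0]
error = rmsd(predicted_slopes, actual_slopes)
correlation = pearson(predicted_slopes, actual_slopes)
```

RMSD penalizes the size of each error in slope units, whereas the Pearson correlation only measures whether predictions rank and scale patients consistently, which is why the two can disagree.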

 

If you have comments/questions please contact us at

prediction AT prize4life DOT org.

 

Timeline

The deadline for final submission is 5 p.m. EST October 15th, 2012. Results will be announced no later than October 31st.

 

Credits

The DREAM-Phil Bowen ALS Prediction Prize4Life was designed by Rebecca Betensky, Robert Kueffner, Melanie Leitner, Raquel Norel, David Schoenfeld, Gustavo Stolovitzky and Neta Zach. Clinical expertise was provided by Nazem Atassi, James Berry, and Merit Cudkowicz. Data management and database design expertise were provided by Igor Katsovskiy, Alex Sherman, Ervin Sinani, and Jason Walker.

 

References

1. ALS CNTF Treatment Study (ACTS) Phase I-II Study Group (1996) The Amyotrophic Lateral Sclerosis Functional Rating Scale. Assessment of activities of daily living in patients with amyotrophic lateral sclerosis. Arch Neurol 53:141-147.

2. Kaufmann P, Levy G, Thompson JL, Delbene ML, Battista V, Gordon PH, Rowland LP, Levin B, Mitsumoto H (2005) The ALSFRS-R predicts survival time in an ALS clinic population. Neurology 64:38-43

3. Turner MR, Bakker M, Sham P, Shaw CE, Leigh PN, Al-Chalabi A (2002) Prognostic modelling of therapeutic interventions in amyotrophic lateral sclerosis. Amyotroph Lateral Scler Other Motor Neuron Disord. 3:15-21.

Supplementary Information

Additional information about possible predictors for ALS and associated literature can be found in the Supplement.