Classical machine learning techniques could help children with autism receive treatment earlier in life.
What’s new: Researchers led by Ishanu Chattopadhyay at the University of Chicago developed a system that classified autism in young children based on data collected during routine checkups.
Key insight: Autistic children have higher rates of certain conditions — such as asthma, gastrointestinal problems, and seizures — than their non-autistic peers. Incidence of these diseases could be a useful diagnostic signal.
How it works: The authors used Markov models — which estimate the likelihood of a sequence of events — to generate features for a gradient boosting machine (an ensemble of decision trees). The dataset comprised weekly medical reports on 30 million children aged 0 to 6 years.
- The authors identified 17 disease categories — respiratory, metabolic, nutritional, and so on — that appeared in the dataset.
- They turned each child’s medical history into a time series, one for each disease category. For instance: week 1, no respiratory disease; week 2, respiratory disease; week 3, an illness in a different category; week 4, no respiratory disease.
- Using the time series, the authors trained 68 Markov models: one for each of the 17 disease categories crossed with each combination of sex (male/female) and diagnosis (autistic/not autistic). Each model learned the likelihood that a child’s diagnoses in a given category would occur in the order they actually did.
- Given the Markov models’ output plus additional information derived from the time series, a gradient boosting machine rendered a classification.
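The pipeline above can be sketched in miniature. This is not the authors’ code: it assumes each child’s history for one disease category is a weekly binary sequence (1 = a diagnosis that week, 0 = none), fits one two-state Markov chain per group (autistic / not autistic), and uses the log-likelihood ratio between the two chains as a feature. In the paper’s pipeline, features like this feed a gradient boosting machine; the toy sequences here are invented for illustration.

```python
# Sketch (not the authors' implementation): per-category Markov-chain features.
from collections import Counter
from math import log

def fit_markov(sequences, smoothing=1.0):
    """Estimate transition probabilities P(next | current) from binary sequences."""
    counts = Counter()
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[(cur, nxt)] += 1
    probs = {}
    for cur in (0, 1):
        total = counts[(cur, 0)] + counts[(cur, 1)] + 2 * smoothing
        for nxt in (0, 1):
            probs[(cur, nxt)] = (counts[(cur, nxt)] + smoothing) / total
    return probs

def log_likelihood(seq, probs):
    """Log-probability of a sequence's transitions under a fitted chain."""
    return sum(log(probs[(cur, nxt)]) for cur, nxt in zip(seq, seq[1:]))

# Toy data: hypothetical weekly respiratory-diagnosis sequences.
autistic_histories = [[0, 1, 1, 0, 1, 1, 0], [1, 1, 0, 1, 1, 1, 0]]
control_histories = [[0, 0, 0, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0]]

chain_aut = fit_markov(autistic_histories)
chain_ctl = fit_markov(control_histories)

# One feature for a new child: which group's chain explains the history better?
new_child = [0, 1, 1, 1, 0, 1, 1]
feature = log_likelihood(new_child, chain_aut) - log_likelihood(new_child, chain_ctl)
print(round(feature, 3))  # positive: the history looks more like the autistic group's
```

In the full system, one such feature per disease category (plus other statistics derived from the time series) would form the input vector for the gradient boosting classifier.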
Results: The system’s precision — the percentage of kids it classified as autistic who actually had the condition — was 33.6 percent at 26 months. Classifying children of the same age, a questionnaire often used to diagnose children between 18 and 24 months of age achieved 14.1 percent precision. The system achieved sensitivity — the percentage of autistic children it correctly identified — as high as 90 percent, with 30 percent fewer false positives than the questionnaire produced at a lower sensitivity.
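To make the two metrics concrete, here is the arithmetic on illustrative confusion-matrix counts (chosen to roughly match the reported 33.6 percent precision and 90 percent sensitivity; the actual counts are not given in the source):

```python
# Hypothetical counts for a screening run, chosen for illustration only.
true_positives = 90    # autistic children the system flagged as autistic
false_positives = 178  # non-autistic children it flagged as autistic
false_negatives = 10   # autistic children it missed

# Precision: of the children flagged, what share actually had the condition?
precision = true_positives / (true_positives + false_positives)

# Sensitivity: of the autistic children, what share did the system flag?
sensitivity = true_positives / (true_positives + false_negatives)

print(f"precision = {precision:.1%}")
print(f"sensitivity = {sensitivity:.1%}")
```

The trade-off the article describes falls out of these definitions: raising sensitivity (flagging more children) tends to add false positives, which drags precision down.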
Why it matters: It may be important to recognize autism early. Although there’s no consensus, some experts believe that early treatment yields the best outcomes. This system appears to bring that goal somewhat closer by cutting the false-positive rate in half compared to the questionnaire. Nonetheless, roughly two-thirds of the children it flagged as autistic were misclassified, and the authors caution that it, too, could lead to over-diagnosis.
We’re thinking: Data drift and concept drift, which cause learning algorithms to generalize poorly to populations beyond those represented in the training data, have stymied many healthcare applications. The authors’ large 30 million-patient dataset makes us optimistic that their approach can generalize in production.