DATA MINING FOR GENETIC CONTRIBUTIONS TO THE ETIOLOGY OF AUTISM SUBGROUPS

Published on March 27, 2019

Autism is a collection of complex neurological disorders characterized by behavioral, social, and cognitive deficits. Previous investigations of its etiology reveal a complex disorder with no simple way to identify the root cause in most affected individuals. The difficulty of determining causal variation leads to the hypothesis that multiple genetic risk factors are necessary in combination to produce the autistic phenotype. Furthermore, the immense phenotypic heterogeneity seen in autism patients leads to a second hypothesis: that there exist multiple subtypes of autism with distinct genetic etiologies. We developed new methods combining strategies from bioinformatics, data science, and statistics to identify multifactorial genetic contributors in specific groups of autism patients with shared characteristics. We discuss how using a data-driven algorithm to examine hundreds of putative autism subgroups for interesting genetic distinctions can advance the study of autism. Finally, we describe an analysis pipeline spanning from the initial analysis of raw genotype and phenotype data to knowledge of the functional pathways that play a major role in the development of a complex disease.