Proof tree contrast mining for automatic hypothesis generation

The probability of developing almost any given disease is affected by multiple risk factors. These risk factors often do not act independently; instead, they interact in specific ways that affect the probability of developing the disease. To better understand the root causes of many diseases, it is necessary to study these interactions, as they may provide clues about the underlying mechanisms responsible for the development of the disease. We have developed a method for studying these interactions by applying contrast mining to extract patterns of nested logical interactions associated with specific medical outcomes. We demonstrate the effectiveness of this method for preliminary hypothesis generation using both synthesized and real-world datasets, including the SPARK autism dataset, which contains genetic information from roughly 10,000 autistic individuals and their unaffected family members. The results of this analysis suggest a novel interaction of ethnic risk factors that increases the risk of developing autism.
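
To make the idea of contrast mining concrete, the following is a minimal sketch and not the proof-tree method described above. It assumes binary risk factors, considers only pairwise AND/OR patterns rather than deeper nested expressions, and keeps a pattern when its support among cases exceeds its support among controls by a chosen margin. The function names, the `min_diff` threshold, and the toy records are all hypothetical and serve only as illustration.

```python
from itertools import combinations


def support(records, predicate):
    """Fraction of records for which the predicate holds."""
    if not records:
        return 0.0
    return sum(1 for r in records if predicate(r)) / len(records)


def contrast_patterns(cases, controls, factors, min_diff=0.3):
    """Find pairwise AND/OR patterns that are more frequent in cases.

    Returns (pattern, case_support, control_support) tuples, sorted by
    the support difference in descending order. Hypothetical helper.
    """
    results = []
    for a, b in combinations(factors, 2):
        candidates = {
            f"({a} AND {b})": lambda r, a=a, b=b: r[a] and r[b],
            f"({a} OR {b})": lambda r, a=a, b=b: r[a] or r[b],
        }
        for name, pred in candidates.items():
            s_case = support(cases, pred)
            s_ctrl = support(controls, pred)
            if s_case - s_ctrl >= min_diff:
                results.append((name, s_case, s_ctrl))
    results.sort(key=lambda t: t[1] - t[2], reverse=True)
    return results


# Toy, hand-made records (not real data): each maps a risk factor to
# whether it is present for one individual.
cases = [
    {"A": True, "B": True, "C": False},
    {"A": True, "B": True, "C": True},
    {"A": True, "B": True, "C": False},
    {"A": False, "B": True, "C": True},
]
controls = [
    {"A": True, "B": False, "C": False},
    {"A": False, "B": True, "C": True},
    {"A": False, "B": False, "C": False},
    {"A": True, "B": False, "C": True},
]

patterns = contrast_patterns(cases, controls, ["A", "B", "C"])
for name, s_case, s_ctrl in patterns:
    print(name, s_case, s_ctrl)
```

In this toy example the conjunction of A and B separates cases from controls far better than either factor's disjunctions do, which is the kind of interaction a contrast-mining approach is meant to surface as a candidate hypothesis.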