Discovering novel risk factors for age-related eye diseases using subgroup data mining

In the United States, Medicare spending on cataract treatment continues to rise as the number of cataract surgeries performed each year increases; more than 3 million cataract surgeries were performed in 2017. A study by the World Health Organization reported that delaying the onset of cataract by 10 years would halve the number of people who need cataract surgery. We therefore set out to investigate potential risk factors for the development of cataract. However, the co-occurrence of multiple risk factors and their cumulative effect on the development of cataract have not been adequately tested because of methodological limitations. In recent years, a vast amount of digital information has been stored in electronic health records (EHRs). We took advantage of these data to define two study groups, cataract versus non-cataract, and to identify highly contrasting patterns between them. A contrast mining algorithm reports patterns and models in which a condition is frequently seen in one group but relatively rare in the other. Although we extracted hundreds of contrast patterns, we selected only the top 20 based on criteria such as support, confidence, and lift. We discuss the possible relationships between these risk factors and their roles in the development of cataract.
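
To make the ranking measures concrete, the following is a minimal Python sketch of how support, confidence, and lift can be computed for one candidate pattern across two groups. It is an illustration under stated assumptions, not the algorithm used in this study: the function names, the `growth_rate` measure (a ratio commonly used in contrast mining), and the toy records with placeholder items such as `condition_A` are all hypothetical and do not represent study data or findings.

```python
from typing import Dict, FrozenSet, List

Record = FrozenSet[str]  # one patient's set of recorded conditions (hypothetical encoding)


def count_matches(pattern: Record, records: List[Record]) -> int:
    """Number of records that contain every item of the pattern."""
    return sum(1 for record in records if pattern <= record)


def contrast_metrics(pattern: Record, target: List[Record], other: List[Record]) -> Dict[str, float]:
    """Support, confidence, lift and growth rate of a pattern for the target group."""
    n_target, n_other = len(target), len(other)
    hits_target = count_matches(pattern, target)
    hits_other = count_matches(pattern, other)
    hits_total = hits_target + hits_other

    # Support: fraction of each group that exhibits the pattern.
    support_target = hits_target / n_target
    support_other = hits_other / n_other
    # Confidence: among records with the pattern, the fraction in the target group.
    confidence = hits_target / hits_total if hits_total else 0.0
    # Lift: confidence relative to the baseline share of the target group.
    baseline = n_target / (n_target + n_other)
    lift = confidence / baseline
    # Growth rate: how much more frequent the pattern is in the target group.
    growth_rate = support_target / support_other if support_other else float("inf")

    return {
        "support_target": support_target,
        "support_other": support_other,
        "confidence": confidence,
        "lift": lift,
        "growth_rate": growth_rate,
    }


# Toy records with placeholder items; NOT real patient data.
cataract_group = [
    frozenset({"condition_A", "condition_B"}),
    frozenset({"condition_A", "condition_B", "condition_C"}),
    frozenset({"condition_A"}),
    frozenset({"condition_A", "condition_B"}),
]
non_cataract_group = [
    frozenset({"condition_B"}),
    frozenset({"condition_A", "condition_B"}),
    frozenset({"condition_C"}),
    frozenset(),
]

metrics = contrast_metrics(
    frozenset({"condition_A", "condition_B"}), cataract_group, non_cataract_group
)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this toy example the pattern appears in three of four records in the first group and one of four in the second, so its lift is above 1 and it would rank as a contrasting pattern. A real analysis would compute these measures for every candidate pattern and then sort and filter them.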