Contrast Data Mining and Pattern Discovery for Glaucoma Risk Assessments

Glaucoma is the second leading cause of irreversible blindness worldwide. About 70 million people have glaucoma, and 4.4 million are blind because of optic nerve damage from undiagnosed glaucoma. Studies show that early prediction is the best way to prevent irreversible blindness. To address this problem, we applied subgroup contrast set mining to glaucoma risk assessment. Contrast mining has been successfully applied in health care data analytics, including recent work from our lab on large volumes of electronic health record (EHR) data. The main goal of the method is to identify patterns within a subset of a dataset that show unusual behavior with respect to particular attributes, that is, patterns associated with one group but not with other groups. We used an extensive EHR database for this study. We used International Classification of Diseases (ICD) diagnostic codes to retrieve glaucoma-related cases from 2001 to 2015, subject to the inclusion and exclusion criteria of this study. The EHR data were sliced at one-year intervals and used as input for contrast mining. An in-depth exploratory mining process identified small, homogeneous subgroups with similar patterns within a large, diverse, and heterogeneous dataset, using floating and path expansion and prioritizing subgroups by J-value. The results show high-contrast patterns for hypertension, alcohol use, and African American race in the glaucoma group compared with the non-glaucoma group (p-value < 0.05). Notably, the risk associated with hypertension combined with alcohol use is three-fold higher among African Americans than among other races. This method can be applied to risk assessment for many other diseases.
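The core idea of contrast mining, finding attribute patterns whose frequency differs significantly between two groups, can be illustrated with a minimal sketch. This is not the pipeline used in the study. It assumes each patient record is a set of attribute labels, enumerates small patterns exhaustively rather than by floating and path expansion, and ranks them by support difference with a Pearson chi-square test in place of the J-value. The function names, thresholds, and toy records are hypothetical and synthetic.

```python
import math
from itertools import combinations


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    for the table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def contrast_patterns(cases, controls, max_size=2, min_diff=0.1, alpha=0.05):
    """Return (pattern, support_difference, p_value) for every pattern of up
    to max_size attributes whose support differs between the two groups."""
    items = sorted(set().union(*cases, *controls))
    results = []
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            a = sum(1 for r in cases if pattern <= r)     # cases with pattern
            c = sum(1 for r in controls if pattern <= r)  # controls with pattern
            b = len(cases) - a
            d = len(controls) - c
            diff = a / len(cases) - c / len(controls)
            _, p_value = chi_square_2x2(a, b, c, d)
            if abs(diff) >= min_diff and p_value < alpha:
                results.append((combo, diff, p_value))
    # Largest absolute support difference first.
    return sorted(results, key=lambda t: -abs(t[1]))


# Synthetic toy records (hypothetical, not taken from the study data).
cases = ([{"hypertension", "alcohol_use"}] * 12
         + [{"hypertension"}] * 4
         + [set()] * 4)
controls = ([{"hypertension", "alcohol_use"}] * 2
            + [{"hypertension"}] * 4
            + [{"alcohol_use"}] * 2
            + [set()] * 12)

for pattern, diff, p_value in contrast_patterns(cases, controls):
    print(pattern, round(diff, 2), round(p_value, 4))
```

Exhaustive enumeration is only practical for a handful of attributes. On real EHR data a search strategy with pruning and a subgroup quality measure, such as those described above, would be needed.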