Predictive Models for Early Detection of Glaucoma using Electronic Health Records

Glaucoma is the second leading cause of irreversible blindness worldwide. Around 70 million people have glaucoma, and 4.4 million are blind from optic nerve damage caused by undiagnosed glaucoma. Studies suggest that early detection of glaucoma is essential for preventing irreversible blindness. Effective use of electronic health records (EHR) can yield data-driven, evidence-based risk factors for glaucoma progression and can support practical machine learning (ML) models that predict glaucoma before the onset of clinical symptoms. Predictive models for the early detection of glaucoma are a critical step toward subsequent actionable, preventive interventions. We retrieved an age-matched, balanced dataset of glaucoma and non-glaucoma cohorts, a total of 33,611 unique patient records from the EHR database with timestamp information from 2001 to 2015. The data were processed and transformed into a data matrix for ML model training. We applied five ML methods (logistic regression, random forest, a multilayer perceptron (MLP) classifier, an XGBoost classifier, and naïve Bayes) and compared their prediction accuracy with that of contrast pattern classification. The results show that hypertension, female gender, and African American race are more strongly associated with glaucoma than with non-glaucoma conditions (p-value < 0.05). The results also suggest that the risk associated with hypertension and alcohol use is three-fold higher among African Americans than among other races. Detailed results will be discussed.
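
As a rough illustration of how an association between a risk factor and glaucoma could be checked at the p-value < 0.05 level, the sketch below computes an odds ratio with a Wald 95% confidence interval and a Pearson chi-square p-value for a 2x2 table. The abstract does not state which statistical test was used, so the choice of test is an assumption, the function names are hypothetical, and the counts are made up for illustration and are not taken from the study.

```python
import math


def odds_ratio_2x2(a, b, c, d):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a: exposed cases,   b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


def chi_square_p_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (NOT from the study): hypertension vs. glaucoma status.
a, b, c, d = 60, 40, 40, 60
or_value, ci_low, ci_high = odds_ratio_2x2(a, b, c, d)
chi2_value, p_value = chi_square_p_2x2(a, b, c, d)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"chi2 = {chi2_value:.2f}, p = {p_value:.4f}")
```

A confidence interval that excludes 1 together with p < 0.05 would indicate an association of the kind described above; a subgroup comparison, such as by race, would repeat the calculation on each stratum's own table.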

Please contact Robert Sanders for Zoom information.