Population health outcomes research based on social determinants of health (SDoH) requires linking electronic health record (EHR) data to social determinants through identifiable information (patients’ addresses). This linkage adds computational load, privacy risk, and storage requirements to every study. A data lake that serves research data can provide a shared framework for SDoH-linked EHR data and cohort phenotyping algorithms. In this study, such a framework was developed by staging the Census Bureau's American Community Survey (ACS), the Area Deprivation Index (ADI), and phenotyping algorithms defined by the Centers for Medicare & Medicaid Services (CMS) for 27 chronic conditions. From the 1,673,145 patients in the University of Missouri’s Cerner Millennium data, 899,231 unique addresses were geocoded to the census block group level using the DeGAUSS geocoder. A de-identified dataset of EHR-linked SDoH data was then generated using the block groups. To demonstrate the feasibility of the framework, univariate logistic regression was used to assess the association between the ADI of a patient's area of residence and the risk of having each of the 27 chronic conditions. Eighteen of the 27 chronic conditions showed a statistically significant association.
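The linkage and analysis steps described above can be illustrated with a minimal sketch. It is not the study's pipeline: the record layout, the field names (`address`, `block_group`, `has_condition`), the block group identifiers, the ADI values, and the outcomes are all made up for illustration, and the geocoding step is assumed to have already produced a block group for each patient. The sketch joins patients to ADI by block group, drops the address, and fits a univariate logistic regression by Newton-Raphson with a Wald test for the slope.

```python
import math

# Hypothetical ADI lookup keyed by 12-digit census block group GEOID (values are made up).
adi_by_block_group = {
    "000000000001": 20, "000000000002": 40,
    "000000000003": 60, "000000000004": 80,
}

# Hypothetical geocoded patients: 5 per block group, with 1, 2, 3, 4 cases respectively.
patients = []
for geoid, cases in zip(sorted(adi_by_block_group), [1, 2, 3, 4]):
    for i in range(5):
        patients.append({"address": "REDACTED", "block_group": geoid,
                         "has_condition": 1 if i < cases else 0})
# One patient whose block group has no ADI value; it is skipped during linkage.
patients.append({"address": "REDACTED", "block_group": "000000000009", "has_condition": 1})


def link_and_deidentify(rows, adi_lookup):
    """Attach ADI by block group, keeping only non-identifying fields."""
    linked_rows = []
    for row in rows:
        adi = adi_lookup.get(row["block_group"])
        if adi is not None:
            linked_rows.append({"adi": adi, "has_condition": row["has_condition"]})
    return linked_rows


def score_and_information(x, y, b0, b1):
    """Gradient of the log-likelihood and the 2x2 information matrix entries."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_univariate_logit(x, y, iterations=25):
    """Fit logit(P(y=1)) = b0 + b1*x; return (b0, b1, se_b1, two-sided Wald p-value)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = score_and_information(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = score_and_information(x, y, b0, b1)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    p_value = math.erfc(abs(b1 / se_b1) / math.sqrt(2.0))
    return b0, b1, se_b1, p_value


linked = link_and_deidentify(patients, adi_by_block_group)
x = [row["adi"] for row in linked]
y = [row["has_condition"] for row in linked]
b0, b1, se_b1, p_value = fit_univariate_logit(x, y)
print(f"odds ratio per ADI unit: {math.exp(b1):.3f}, p = {p_value:.3f}")
```

In a setting like the one described, a fit of this kind would be repeated once per chronic condition, with the slope's p-value indicating whether ADI is associated with that condition. A statistics package would normally replace the hand-written fit; it is spelled out here only to keep the example self-contained.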