Seminar

Identifying Gene-Gene Interactions Protective Against Autism Using Contrast Mining

Many genetic variants have been linked with the development of autism spectrum disorder (ASD). ASD is also known to be more prevalent in males than in females, but the mechanism underlying this difference is unclear. Because the genetic component of ASD is polygenic, studying potential mechanisms is difficult when the significance of each variant is assessed independently, since variant effects may interact. Most research has focused on the pathogenic effects of particular genetic variants. However, genetic variants associated with a reduced risk of developing ASD also exist and may provide new clues to the mechanism behind the sex difference in ASD prevalence. We…
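
The abstract does not spell out its mining procedure. As a hypothetical illustration of contrast mining in general, the sketch below scores pairs of variants by how much more often they co-occur in unaffected individuals than in affected ones (a growth-rate criterion). The function names, the thresholds, and the representation of each person as a set of carried variant IDs are assumptions for illustration, not the authors' method.

```python
from itertools import combinations

def support(itemset, samples):
    """Fraction of samples (sets of carried variant IDs) containing every variant in itemset."""
    if not samples:
        return 0.0
    return sum(itemset <= s for s in samples) / len(samples)

def protective_pairs(unaffected, affected, min_support=0.1, min_ratio=2.0):
    """Hypothetical contrast-mining helper: variant pairs enriched in the unaffected group.

    A pair is reported when its support among unaffected samples is at least
    min_support and its growth rate (support_unaffected / support_affected)
    is at least min_ratio. Results are sorted by growth rate, highest first.
    """
    variants = sorted(set().union(*unaffected, *affected))
    results = []
    for a, b in combinations(variants, 2):
        pair = frozenset((a, b))
        s_u = support(pair, unaffected)
        s_a = support(pair, affected)
        if s_u < min_support:
            continue
        ratio = float("inf") if s_a == 0 else s_u / s_a
        if ratio >= min_ratio:
            results.append((a, b, s_u, s_a, ratio))
    return sorted(results, key=lambda r: r[4], reverse=True)
```

Because the question concerns a sex difference, one plausible use is to run the comparison separately within males and within females and contrast the resulting pair lists; that stratification is likewise an assumption here.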

Early Warning of Health Changes for Older Adults: Implementing a Gaussian Mixture Components Clustering Algorithm to Detect Outliers in Daily Multi-feature Sensor Data Streams

In this case study, we evaluate the implementation of Sequential Possibilistic Gaussian Mixture Models (SPGMM) for accurately modeling changes in feature streams that precede known health events, thereby providing predictive relevance for clinical use, and we identify the preprocessing that streams require before they are input to the algorithm. SPGMM is a change detection algorithm developed for online data stream processing applications, in which feature vectors arrive sequentially as inputs for iterative clustering. SPGMM consists of two components: the Sequential Possibilistic One-Means (SP1M) algorithm, which generates the initial clusters (Gaussian mixtures), combined with the Gaussian Mixture Model (GMM), which defines how…
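
To make the idea of sequential, possibilistic clustering concrete, here is a deliberately simplified sketch. It is not SP1M or SPGMM as published. It assumes diagonal Gaussian clusters updated online with Welford's algorithm, a typicality of exp(-d^2/2) based on the diagonal Mahalanobis distance d, and a fixed threshold below which an incoming daily feature vector starts a new cluster, which is treated as a candidate change. The class and function names, `threshold`, and `init_var` are hypothetical.

```python
import math

class GaussianCluster:
    """Running mean and per-feature variance (Welford's algorithm) for one cluster."""

    def __init__(self, x, init_var):
        self.n = 1
        self.mean = list(x)
        self.m2 = [0.0] * len(x)
        self.init_var = init_var

    def variance(self):
        # Use an assumed initial variance until the cluster has at least 3 members.
        if self.n < 3:
            return [self.init_var] * len(self.mean)
        return [max(m / (self.n - 1), 1e-9) for m in self.m2]

    def typicality(self, x):
        # Possibilistic membership exp(-d^2 / 2), d = diagonal Mahalanobis distance.
        d2 = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, self.mean, self.variance()))
        return math.exp(-d2 / 2.0)

    def update(self, x):
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

def sequential_cluster(stream, threshold=0.05, init_var=1.0):
    """Assign each feature vector to its most typical cluster, or start a new one.

    Returns a list of (cluster_index, is_new_cluster); a new cluster after the
    first one flags a candidate change or outlier in the stream.
    """
    clusters, labels = [], []
    for x in stream:
        scores = [c.typicality(x) for c in clusters]
        if scores and max(scores) >= threshold:
            k = scores.index(max(scores))
            clusters[k].update(x)
            labels.append((k, False))
        else:
            clusters.append(GaussianCluster(x, init_var))
            labels.append((len(clusters) - 1, True))
    return labels
```

Possibilistic typicality is used here instead of probabilistic membership because it does not force an unusual day to belong to some existing cluster; a low typicality everywhere is exactly the signal of interest. Feature scaling matters for the distance, which is one reason stream preprocessing has to be settled before clustering.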

Supporting Population Health Outcomes Studies Using a Framework of Social Determinants Linked EHR Data

Population health outcomes research based on social determinants of health (SDoH) requires linking electronic health record (EHR) data with social determinants using identifiable information (patients’ addresses). Repeating this linkage for each study adds computational load, privacy risk, and storage requirements. A data lake that hosts research data can provide a framework for SDoH-linked EHR data and cohort phenotyping algorithms. For this study, a framework was developed by staging Census Bureau American Community Survey (ACS) data, the Area Deprivation Index (ADI), and Centers for Medicare & Medicaid Services (CMS)-defined phenotyping algorithms for 27 chronic diseases. From 1,673,145 patients of the University of Missouri’s…
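
A minimal sketch of the linkage step, assuming patient addresses have already been geocoded to 12-digit Census block-group GEOIDs (the geographic unit at which the ADI is published) and that ADI national percentiles are available as a lookup table. The function and field names are hypothetical. Carrying only the GEOID and area-level values forward, rather than the address, illustrates how a staged framework could avoid re-handling identifiable data for every study.

```python
def link_sdoh(patients, adi_by_block_group):
    """Join patient records to area-level ADI values by Census block-group GEOID.

    patients: list of dicts with hypothetical keys "patient_id", "address", and
        "block_group_geoid" (output of a prior geocoding step; may be None).
    adi_by_block_group: dict mapping GEOID -> ADI national percentile (1-100).
    Returns (linked, unmatched). Linked records deliberately omit the address.
    """
    linked, unmatched = [], []
    for p in patients:
        geoid = p.get("block_group_geoid")
        if geoid in adi_by_block_group:
            linked.append({
                "patient_id": p["patient_id"],
                "block_group_geoid": geoid,
                "adi_national_pct": adi_by_block_group[geoid],
            })
        else:
            # Failed geocodes or GEOIDs missing from the ADI table.
            unmatched.append(p["patient_id"])
    return linked, unmatched
```

The same join pattern extends to ACS variables keyed by tract or block group; tracking unmatched patients separately makes geocoding losses visible when a cohort is defined.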

Alzheimer’s disease mitigation: AI, neuroimaging and gut-brain axis

Alzheimer’s disease (AD) is the most common form of dementia, and there are currently no effective therapeutics to reverse its course once clinical symptoms have developed. Early identification of AD risk factors, together with effective interventions against them, would be critical to mitigating AD pathological development and preventing the onset of clinical symptoms. In the presentation, I will demonstrate how we used an artificial intelligence approach to identify risk factors from clinical data, and how we determined the effectiveness of pharmacological and nutritional interventions in an animal model carrying human APOE4, the strongest genetic risk factor for AD. Future direction on…

Overhead imagery training data quality control: Methods for deep feature label anomaly detection

Spatial analysis of large remotely sensed imagery (RSI) training datasets for within-class variation and between-class separability is key to uncovering issues of data diversity and potential bias, not only when vetting datasets for use but also during dataset creation. Project managers of complex imagery annotation campaigns have a largely unaddressed need for tools that continuously monitor for labeling anomalies that may stem from human bias or error. This presentation outlines a deep-feature change detection approach that uses Geospatial Fréchet Distance (GFD) to automatically measure significant regional changes in image label appearance (i.e., within-class variance). An experimental setup is designed…
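
The abstract does not define GFD in detail. Assuming it builds on the Fréchet distance between multivariate Gaussians fitted to deep image features (the quantity behind the Fréchet Inception Distance), d^2 = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2(S_a S_b)^(1/2)), the sketch below computes that distance between two batches of feature vectors. Applying it per geographic region, for example comparing a region's earlier labels with newly annotated ones, is a hypothetical usage; the feature extractor and the regional partitioning are left out.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Squared Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    d^2 = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    s_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    s_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Symmetric PSD square root of S_a via eigendecomposition.
    w, v = np.linalg.eigh(s_a)
    sqrt_a = (v * np.sqrt(np.clip(w, 0, None))) @ v.T
    # Tr((S_a S_b)^{1/2}) equals the sum of square roots of the eigenvalues
    # of the symmetric matrix S_a^{1/2} S_b S_a^{1/2}.
    m = sqrt_a @ s_b @ sqrt_a
    tr_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(m), 0, None)).sum()
    d2 = np.sum((mu_a - mu_b) ** 2) + np.trace(s_a) + np.trace(s_b) - 2.0 * tr_sqrt
    return float(max(d2, 0.0))
```

Working through the symmetric product S_a^(1/2) S_b S_a^(1/2) keeps every step in real symmetric arithmetic, avoiding the complex round-off that a general matrix square root of S_a S_b can produce.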

Biological pathways as graphs: comparison of select similarity methods

We extracted biomedical pathways from 47 publications related to non-small cell lung cancer (NSCLC) and merged them into a Neo4j graph database. Using this graph as ground truth for comparison with pathways extracted from other publications, we investigated several methods of calculating graph similarity. Unlike ontologies and engineered datasets, which have uniform representations of data objects, graphs extracted from unstructured text have to be compared first as text-described entities and only then with common graph similarity methods. In this work, we discuss ways of comparing biological graphs composed of text-described entities at both the node level and the graph level. Nodes, their adjacent neighbors, and their relationships that contain nominal properties (features) are converted into relational measures by comparison with their counterparts in another graph, and these measures are then aggregated into a single measure. We also describe a method of searching for similar nodes that can be used to locate potential mislabeled twin…
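
As a toy illustration of comparing text-described entities first and graph structure second, the sketch normalizes node and relationship labels, scores each node by the Jaccard overlap of its (relationship, neighbor) pairs across two graphs, and averages the node scores into one graph-level value. The normalization rule, the dict-of-sets graph representation, and scoring unmatched nodes as 0 are assumptions, not the measures evaluated in the talk; a Neo4j export would first need converting to this form.

```python
def normalize(label):
    """Crude text normalization so that, e.g., 'EGFR-mutation' and 'egfr mutation' match."""
    return " ".join(label.lower().replace("-", " ").split())

def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    return 1.0 if not a and not b else len(a & b) / len(a | b)

def normalize_graph(graph):
    """graph: dict mapping node label -> set of (relationship, neighbor label) pairs."""
    out = {}
    for node, edges in graph.items():
        out.setdefault(normalize(node), set()).update(
            (normalize(rel), normalize(nbr)) for rel, nbr in edges
        )
    return out

def graph_similarity(g1, g2):
    """Return (graph-level score, per-node scores) for two pathway graphs.

    Nodes are matched by normalized label; a node present in only one graph scores 0.
    """
    n1, n2 = normalize_graph(g1), normalize_graph(g2)
    nodes = set(n1) | set(n2)
    if not nodes:
        return 1.0, {}
    per_node = {
        n: jaccard(n1[n], n2[n]) if n in n1 and n in n2 else 0.0
        for n in nodes
    }
    return sum(per_node.values()) / len(nodes), per_node
```

For example, toy graphs {"EGFR": {("activates", "KRAS")}, "KRAS": {("activates", "BRAF")}} and {"egfr": {("Activates", "kras")}, "KRAS": {("activates", "RAF1")}} match fully on the EGFR node and not at all on KRAS, giving a graph-level score of 0.5. Low-scoring nodes whose labels nearly match are natural candidates for the kind of mislabeled-node search the abstract mentions.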

Impact of diabetes status and other factors on risk for thrombotic and thromboembolic events: A multicenter, retrospective analysis using the Cerner Real-World Data™ de-identified COVID-19 cohort

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a proinflammatory condition that can impact the cardiovascular and cerebrovascular systems, thereby increasing risk for thrombotic and thromboembolic events (TTE). However, little is known about the impact of diabetes status on risk for TTEs during SARS-CoV-2 infection. In this US-based, multicenter retrospective cohort study, we analyze the impact of diabetes status (i.e., diabetes present vs. diabetes absent; Type 1 diabetes versus Type 2 diabetes), race and ethnicity, sex, and other factors on risk for TTEs in adults with suspected and confirmed COVID-19 infection. After using multivariate…

Analysis of polygenic selection in purebred and crossbred pig genomes using Generation Proxy Selection Mapping

Background: Artificial selection on quantitative traits using selection indices in commercial livestock breeding populations causes changes in allele frequency over time, termed selection signatures, at causal loci and surrounding genomic regions. Researchers and managers of pig breeding programs are motivated to understand the genetic basis of phenotypic diversity across genetic lines, breeds, and populations using selection signature analyses. Here, we applied Generation Proxy Selection Mapping (GPSM), a genome-wide association analysis of SNP genotypes (38,294 to 46,458 SNPs) on birth date, in four pig populations (15,457, 15,772, 16,595, and 8,447 pigs per population) to identify loci responding to artificial selection over a…
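
GPSM treats birth date as the response in a genome-wide association scan, so SNPs whose allele frequencies shift across generations show an association with birth date. A minimal sketch under strong simplifying assumptions: it fits an ordinary least-squares regression of birth date on allele dosage (0, 1, or 2) for each SNP separately and reports the slope and t-statistic. It omits the correction for relatedness and population structure that a real genome-wide scan on pedigreed livestock would need, and the function name and input layout are hypothetical.

```python
import math

def gpsm_scan(dosages, birth_dates):
    """Per-SNP OLS regression of birth date on allele dosage (no relatedness correction).

    dosages: dict SNP ID -> list of allele counts (0, 1, 2), one per animal.
    birth_dates: list of birth dates as numbers (e.g., days since an arbitrary origin).
    Returns dict SNP ID -> (slope, t_statistic); monomorphic SNPs are skipped.
    A positive slope means later-born animals carry more copies of the counted allele.
    """
    n = len(birth_dates)
    if n < 3:
        raise ValueError("need at least 3 animals")
    y_mean = sum(birth_dates) / n
    results = {}
    for snp, x in dosages.items():
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        if sxx == 0:
            continue  # no genotype variation, nothing to test
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, birth_dates))
        slope = sxy / sxx
        intercept = y_mean - slope * x_mean
        rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, birth_dates))
        se = math.sqrt(rss / (n - 2) / sxx)
        t = slope / se if se > 0 else math.copysign(math.inf, slope)
        results[snp] = (slope, t)
    return results
```

With tens of thousands of SNPs and closely related animals, this naive scan would produce many spurious associations driven by family structure, which is why the relatedness correction it leaves out matters in practice.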

Detecting formation and growth of refugee / displaced person camps in the Ukraine crisis: Assisting first-phase humanitarian response using satellite imagery

Amid the worst population displacement crisis in Europe since World War II, governments and international organizations have struggled with the massive task of tracking and providing aid to Ukrainian refugees and internally displaced persons (IDPs). This presentation reviews requirements and explores solutions for AI-assisted monitoring of the formation of ad hoc refugee encampments and temporary or informal settlements in remotely sensed imagery to support time-critical humanitarian operations. Data and models for binary geospatial prediction of camp locations and for time-series camp expansion will be discussed, as will deep methods for characterizing similarities and differences among detected encampments.