Discovering novel risk factors for age-related eye diseases using subgroup data mining
In the United States, the Medicare cost of cataract treatment is continually increasing because more cataract surgeries are performed each year; for example, more than 3 million cataract surgeries were performed in 2017. A study by the World Health Organization reported that delaying the onset of cataracts by 10 years would halve the number of people who need cataract surgery. We therefore set out to investigate potential risk factors for the development of cataract to tackle this growing problem. On the other hand, comorbidities of multiple risk factors and their cumulative…
eCaregiving 2.0: Feasibility, Data Quality and Cost of Collecting Continuous Self-reported and Passive Data Using a Personal Health Management System
This project aims to demonstrate whether self-reported and passive data can be continuously collected from a cohort of 55 study participants using a Personal Health Management Information System (PHMS). We estimated measures of data quality, feasibility and cost of data collection after asking participants to wear a Fitbit and download the app that manages data transmission for passive (daily) data collection and for active data collection through a monthly survey. In this six-month cohort study, we expected to capture 100% of survey-based self-reported data and 100% of passive data parameters, totaling 9,848 days of measurement. With minimal instruction to participants, we captured 66% of the days of…
Immunogenomic Pathway and Survival Analysis in Colorectal Cancer Patients Based on Tumor Location and Microsatellite Status
Despite advances in available therapies (surgery, chemotherapy, immunotherapy, etc.), colorectal cancer (CRC), the third most common cancer, remains the second leading cause of cancer-related death worldwide. Typically, CRC patients can be categorized as microsatellite stable (MSS, approximately 80-85% of CRC) or microsatellite instability (MSI, approximately 10-15% of CRC) type. An extensive body of literature has shown that CRC patients with MSI status have more T cells in the primary tumor than those with MSS status. This is believed to contribute to the 78% of MSI patients who had progression-free survival after receiving immunotherapy (Pembrolizumab, an anti-PD-1 immunotherapy)…
Income Inequality and Health: Expanding our Understanding of State Level Effects by using a Geospatial Big Data Approach
The income inequality hypothesis proposes that ecological income inequality is harmful to population health, but findings from extant work are inconsistent across health outcomes and levels of geography. We contribute to this debate by applying a big data geospatial approach to create three innovative measures of uniformity in income inequality across space within US states. Controlling for relevant individual and contextual characteristics, we evaluate multilevel models of individuals within states using data from the Behavioral Risk Factor Surveillance System and American Community Survey to examine the ways that income inequality, operationalized as the Gini coefficient, and the three uniformity measures…
PROTEIN TRANSPORT: BIOINFORMATICS METHODS FOR UNDERSTANDING PROTEIN SUBCELLULAR LOCALIZATION
Eukaryotic cells contain diverse subcellular organelles. These organelles form distinct functional cellular compartments where different biological processes and functions are carried out. The accurate translocation of a protein is crucial to establishing and maintaining cellular organization and function. Newly synthesized proteins are transported to different cellular components with the assistance of protein transport machineries and complex targeting signals. Mis-localization of proteins is often associated with metabolic disorders and diseases. Compared with experimental methods, computational prediction of protein localization using different machine learning methods provides an efficient and effective way to study protein subcellular localization at the whole-proteome level. Here,…
Informatics framework for the identification of diagnostic discrepancies and errors
Diagnostic “grey zones” is a term used in pathology, the study of diseases, to describe overlapping morphologic, immunophenotypic and genetic features among various disease subtypes that can lead to diagnostic pitfalls and errors in classifying cancer (e.g., lymphomas). Diagnostic pitfalls are risks that pathologists should be aware of and avoid, and diagnostic errors are failures of medical tests to accurately describe disease progress in an individual patient. Therefore, pathologists have to perform rigorous medical examinations. These examinations can be used to study diagnostic errors. However, the examinations are documented as unstructured free text. From a computational standpoint, it is…
HOMOLOGY SEQUENCE ANALYSIS USING GPU ACCELERATION
A number of problems in bioinformatics, systems biology and computational biology require abstracting physical entities into mathematical and computational models. In such studies, the computational paradigms often involve algorithms that run on the Central Processing Unit (CPU). Historically, those algorithms benefited from advances in the serial processing capabilities of individual CPU cores. However, that growth has slowed in recent years, and scaling out CPUs has been shown to be both cost-prohibitive and insecure. To overcome this problem, parallel computing approaches that employ the Graphics Processing Unit (GPU) have gained attention as complementing or replacing…
Using Deep learning method (CNN) for prediction of ubiquitination protein
Ubiquitination, a post-translational modification, is a crucial biological process involved in cell signaling, death and localization. Identification of ubiquitination proteins is of fundamental importance for understanding molecular mechanisms in biological systems and diseases. Although high-throughput experimental studies using mass spectrometry have identified many ubiquitination proteins and ubiquitination sites, the vast majority of ubiquitination proteins remain undiscovered, even in well-studied model organisms. To reduce experimental costs, computational (in silico) methods have been introduced to predict ubiquitination sites. Predicting whether a query protein can be ubiquitinated is meaningful in itself and helpful for predicting…
Automation of Volumetric Analysis of Adiposity in Canines
Roughly 30-40% of all dogs and cats that are seen by a veterinarian can be classified as obese. Despite this, veterinary practices still use a subjective 5-point or 9-point classification system when classifying patients as obese, which can prove difficult when providing accurate nutritional consults to veterinary clients aiming to decrease their pet’s weight. Further, obesity itself can worsen comorbid conditions. Thus, automation of the process of assessing adiposity through CT scans was attempted, looking specifically at the thoracic region of the animal. First, the issues with the current BCS system were highlighted…