Developing Decision Support Software for CINV Prevention
The US National Center for Health Statistics estimated that more than 19 million adults in the US have been diagnosed with cancer at some point in their lives. Chemotherapy is one of the most important modalities of cancer treatment. Nausea and vomiting are two of the most dreaded and unpleasant side effects of chemotherapy; together they are known as Chemotherapy-Induced Nausea and Vomiting (CINV). CINV substantially degrades patients’ quality of life (through dehydration, nutritional deficits, electrolyte imbalance, etc.) and increases healthcare costs (by requiring further management of CINV, including outpatient visits, drugs, hospitalization, etc.). In addition, cancer patients sometimes discontinue chemotherapy because of intolerable CINV. Thus, it is imperative to identify and treat…
Knowledge Discovery System for Research Hypothesis Generation from Serendipitous Findings
From the discovery of penicillin and x-rays to the development of many of today’s chemotherapy agents, serendipitous findings tangential to the researcher’s intended purpose, the “That’s funny…” moments, have greatly impacted the health and well-being of society. As an information behavior, these unexpected findings are an example of the Opportunistic Discovery of Information (ODI). ODI has been described in many contexts, from information behavior in virtual worlds to the impact of information encountering on health behaviors. Yet, little is known about instances of ODI within the context of scientific research. A major difficulty in the study of ODI is…
Model-, structure-, and sequence-based methods for prediction of protein binding sites
Identification of protein-protein binding sites is important for understanding protein function. Binding site prediction methods that rely on structure are generally more accurate than those that rely on sequence. However, the coverage of structure-based methods is significantly lower than that of sequence-based methods because of the lack of experimental structures. Here, we propose a sequence-based protein binding site prediction approach that retains the benefits of structure-based methods. We use L1-regularized logistic regression to integrate sequence- and structure-based predictions for comparative models. The method relies on a series of features, including evaluation of comparative models, geometric features, solvent accessibility, hydrophobicity, secondary…
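The abstract names L1-regularized logistic regression as the integration step. The sketch below shows that technique on its own, fitted by proximal gradient descent (ISTA) in plain Python, on a hypothetical three-column toy table: a structure-based score, a sequence-based score, and a column that carries no class signal. The feature set, the solver, and the values of `lam`, `lr`, and `epochs` are assumptions for illustration, not the authors' settings.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v: float, t: float) -> float:
    # Proximal operator of t * |v|: shrink v toward zero, clipping at zero.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                  lam: float = 0.01, lr: float = 0.5,
                  epochs: int = 5000) -> Tuple[List[float], float]:
    """Minimize mean log-loss + lam * ||w||_1 by proximal gradient descent
    (ISTA). The bias b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth loss, then the L1 proximal step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical per-residue rows: [structure-based score, sequence-based score, noise].
X = [[1.0, 0.9, 0.5], [0.9, 1.0, -0.5], [0.8, 0.8, 0.0],
     [0.1, 0.2, 0.5], [0.2, 0.1, -0.5], [0.0, 0.0, 0.0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = binding-site residue (toy labels)
w, b = fit_l1_logreg(X, y)
```

On this toy table the L1 penalty keeps the weight of the uninformative third column at exactly zero, which is the property that makes L1 regularization useful when choosing among many candidate features.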
Sequence Identity Study for Operational Taxonomic Unit Classification
In metagenomics studies of microorganisms, the Operational Taxonomic Unit (OTU) is often used in place of the species as the unit of classification. This pseudo-species definition is helpful when scientists want to understand the composition and diversity of cultures in different environments. The traditional numerical taxonomy method typically defines an OTU as a cluster in a graph built from sequence alignments. Under this method, organisms whose 16S rDNA sequences exceed a 97% sequence similarity threshold are connected to form a cluster. In this study, we investigate whether the traditional numerical method results in OTUs that behave as…
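The graph-based definition above can be sketched as single-linkage clustering: sequences are nodes, an edge joins two sequences whose identity exceeds 97%, and each connected component is one OTU. This minimal sketch assumes the 16S rDNA sequences are already aligned to a common length and computes identity as matches over columns that are not gaps in both sequences, which is only one of several identity definitions in use. The sequence names and the `mutate` helper are hypothetical toy inputs.

```python
from itertools import combinations
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.
    Columns that are gaps in both sequences are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def cluster_otus(seqs: Dict[str, str], threshold: float = 97.0) -> List[List[str]]:
    """Connect every pair with identity above `threshold`; each connected
    component of the resulting graph is one OTU (single linkage)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) > threshold:
            parent[find(a)] = find(b)  # merge the two components
    groups: Dict[str, List[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


def mutate(s: str, positions: List[int]) -> str:
    """Toy helper: substitute a different base at each given position."""
    chars = list(s)
    for p in positions:
        chars[p] = "A" if chars[p] != "A" else "C"
    return "".join(chars)


s1 = "ACGT" * 25           # 100 aligned columns
s2 = mutate(s1, [1, 2])    # 98% identical to s1
s3 = mutate(s2, [5, 6])    # 98% identical to s2, 96% identical to s1
s4 = "T" * 100             # 25% identical to s1
otus = cluster_otus({"s1": s1, "s2": s2, "s3": s3, "s4": s4})
```

In this toy run s1 and s3 are only 96% identical, yet they end up in the same OTU because both link to s2. This chaining is a property of single-linkage clustering on the similarity graph.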
Predictive Analytics on Medicare/Medicaid Cost Outcomes
LIGHT2 (Leveraging Information Technology for Hi-Tech and Hi-Touch Care) is a federally funded project using 24 “Nurse Care Managers” to manage the health of 10,000 Medicare and Medicaid patients. Its goal is to reduce exacerbations of chronic diseases, which would improve health outcomes while lowering healthcare costs. Analytics support (“Hi-Tech”) for the Nurse Care Managers (“Hi-Touch”) has been used to classify patients by past utilization and costs, but these are imperfect predictors of future exacerbations and increasing utilization. Mining the extensive health histories of these patients, along with demographic and other data, reveals some expected and some surprising…
A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction
The need for reliable ab initio protein secondary structure prediction is growing along with the demand for accurate tertiary structure prediction. Although recent methods have slightly improved on earlier approaches to secondary structure prediction, they rarely surpass 80% accuracy. Developing new tools and methods to improve secondary structure prediction is essential to improving tertiary structure prediction in proteins. Here we present DNSS, a secondary structure predictor that makes use of the position-specific scoring matrix generated by PSI-BLAST and deep learning network architectures. Graphics processing units and CUDA software are used to optimize the deep network architecture and efficiently…
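PSSM-based predictors commonly turn the per-residue profile into a fixed-length input for each residue by sliding a window along the chain. The abstract does not give DNSS's input encoding, so the window size of 15, the zero padding, and the terminus indicator column below are assumptions borrowed from common practice; the sketch only shows how a PSI-BLAST PSSM (one row per residue, 20 amino-acid columns) can be reshaped into per-residue vectors for a deep network.

```python
from typing import List, Sequence


def window_features(pssm: Sequence[Sequence[float]],
                    window: int = 15) -> List[List[float]]:
    """One feature vector per residue: the PSSM rows inside a window centred
    on that residue, concatenated. Each window position also gets an
    indicator column that is 1.0 when the position falls off a chain end
    (its PSSM part is then zero-padded)."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a centre residue")
    half = window // 2
    n = len(pssm)
    width = len(pssm[0]) if n else 0
    features: List[List[float]] = []
    for i in range(n):
        vec: List[float] = []
        for j in range(i - half, i + half + 1):
            if 0 <= j < n:
                vec.extend(float(v) for v in pssm[j])
                vec.append(0.0)  # position lies inside the chain
            else:
                vec.extend([0.0] * width)
                vec.append(1.0)  # position lies beyond a chain terminus
        features.append(vec)
    return features


# Toy 5-residue PSSM with the usual 20 amino-acid columns; row r is all r.
toy_pssm = [[float(r)] * 20 for r in range(5)]
feats = window_features(toy_pssm, window=3)
```

Each vector has `window * 21` entries, so a window of 15 over a 20-column PSSM gives 315 inputs per residue.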
Automated Large-Scale File Preparation and Docking: Evaluation of ITScore and STScore Using the 2012 Community Structure−Activity Resource Benchmark
We present the first study utilizing the full set of compounds from the recently released 2012 Community Structure−Activity Resource (CSAR) data set. The CSAR data set is a realistic benchmark for protein-ligand docking scoring functions, containing 57 crystal structures and 757 compounds, most with known affinities from pharmaceutical companies. We used the CSAR data set to evaluate two knowledge-based scoring functions, ITScore and STScore, and a simple force-field-based potential, FFPScore. To conduct this large-scale docking evaluation, we scripted our docking software and associated tools for automated preparation, docking, and evaluation, enabling others to reproduce our results. We also developed a…
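For a benchmark in which most compounds carry measured affinities, one common check of a scoring function is how well its scores correlate with those affinities. The sketch below computes Pearson's r and Spearman's rho after dropping compounds that lack a measured affinity. The compound IDs and numbers are invented, the metrics actually reported in the study are not given in this excerpt, and the sign convention is an assumption: if lower scores mean tighter predicted binding while affinities are expressed as pKd (higher means tighter), a good scoring function shows a strong negative correlation.

```python
import math
from typing import Dict, List, Optional, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def evaluate_scoring(scores: Dict[str, float],
                     affinities: Dict[str, Optional[float]]
                     ) -> Tuple[int, float, float]:
    """Return (n, Pearson r, Spearman rho) over compounds that have both a
    docking score and a measured affinity."""
    ids = [c for c in scores if affinities.get(c) is not None]
    s = [scores[c] for c in ids]
    a = [float(affinities[c]) for c in ids]
    # Spearman's rho is Pearson's r computed on the ranks.
    return len(ids), pearson(s, a), pearson(ranks(s), ranks(a))


# Hypothetical docking scores (lower = better) and pKd values (higher = tighter).
scores = {"c1": -9.0, "c2": -7.5, "c3": -6.0, "c4": -8.0}
affinities = {"c1": 8.2, "c2": 6.1, "c3": 5.0, "c4": None}
n, r, rho = evaluate_scoring(scores, affinities)
```

Here compound c4 is excluded for lacking an affinity, and the remaining three are ranked in perfectly opposite order by score and by pKd.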
Simulation-Based Training for Medical Skills: Comparative Effectiveness of Training Methods and Evaluating the Translational Impact
Simulation-based medical education is gaining widespread appeal as a means to increase medical skill training opportunities and enhance patient safety in a changing medical environment. Two factors have accelerated the adoption of patient simulation in health care: 1) the successful use of simulation in other high-risk endeavors such as airline pilot training, and 2) the high face validity of patient simulation. The use of computerized manikins and patient simulation is expected to continue to grow. Much research demonstrates the use and apparent effectiveness of simulation-based training. However, comparative evaluation of simulation-based training methods is…
Large-Scale Pairwise Alignments on GPU Clusters: Exploring the Implementation Space
Several problems in computational biology require all-against-all pairwise comparisons of tens of thousands of individual biological sequences. Each such comparison can be performed with the well-known Needleman-Wunsch alignment algorithm. However, with the rapid growth of biological databases, performing all possible comparisons serially with this algorithm becomes extremely time-consuming. The massive computational power of graphics processing units (GPUs) makes them an appealing choice for accelerating these computations. As such, CPU-GPU clusters can enable all-against-all comparisons on large datasets. In this work, we present four GPU implementations for large-scale pairwise sequence alignment: TiledDScan-mNW, DScan-mNW, RScan-mNW and LazyRScan-mNW. The proposed GPU…
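As a serial reference for the work the GPU implementations parallelize, the sketch below computes the Needleman-Wunsch global alignment score for one pair and then loops over every unordered pair. The scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption for illustration, only the score is computed (no traceback), and none of the tiling or scan strategies named in the abstract are reproduced here.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1,
             gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty,
    keeping only two rows of the dynamic-programming matrix."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned to gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)          # column 0: a aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def all_against_all(seqs: List[str]) -> Dict[Tuple[int, int], int]:
    """Score every unordered pair: n * (n - 1) / 2 independent alignments."""
    return {(i, j): nw_score(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}


pairs = all_against_all(["ACGT", "ACGT", "AGT"])
```

The pair count grows quadratically: 10,000 sequences already require 49,995,000 alignments, and because no pair depends on any other, the pairs can be distributed across GPUs without coordination between them.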