Survival Analysis in Stage II and III Colorectal Cancer Patients Using Novel Exploratory Data Mining

This study combines clinicopathological data and genomic mutation data within a framework that pairs a companion diagnostic template with a novel explainable AI algorithm, with the aim of improving the selection of prospective patients for adjuvant therapy in colorectal cancer (CRC). Integrating these two emerging technologies may offer better ways to assess treatment outcomes through a data-driven, translational approach to patient care. Exploratory data analysis identified a sizable collection of CRC patient subgroups within Stages II and III, using criteria that ensure the gene mutations are significantly prevalent within their respective groups. [1]

CRC patient data were collected from The Cancer Genome Atlas (TCGA-COADREAD) to assess the feasibility of extending the Foundation One Companion Diagnostic template (F1CDx®) to discern treatment benefit specific to Stage II and Stage III CRC. By constraining the clinical variables to those that reflect NCCN treatment guidelines, exploratory mining can identify key genomic factors and reveal non-obvious, stage-specific relationships with disease-free survival (i.e., in Stage II and/or Stage III). Patients who respond strongly to adjuvant chemotherapy following surgical resection provide valuable information that can be leveraged as predictive elements within a computational model. Our exploratory algorithm approaches this by stratifying patients into subgroups for survival-based prediction analysis, from which probabilities of prognostic receptiveness can be inferred.

The top-ranking subgroups were collected and statistically validated using the Benjamini-Hochberg false discovery rate procedure to control for multiple hypothesis testing. Lasso regression modeling was used to cross-check the subgroup ranking derived from the J-index values for the observed outcome groups. Survival analysis was performed using the Kaplan-Meier method to evaluate how both clinical indications and genetic signatures relate to disease-free survival. Distinct genetic signatures were found to significantly affect recurrence probabilities when combined with their corresponding clinical presentations, within their respective subgroups and according to CRC stage.
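
As an illustration of two of the statistical steps named above, the following is a minimal Python sketch of the Benjamini-Hochberg adjustment and the Kaplan-Meier product-limit estimator. It is not the study's implementation: the function names, the p-values, and the follow-up times are hypothetical and chosen only to show the mechanics.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def kaplan_meier(
    durations: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if recurrence was observed at durations[i], 0 if censored.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Observed recurrences exactly at time t.
        failures = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1.0 - failures / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical subgroup p-values (illustrative only, not study results).
subgroup_p = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(subgroup_p)

# Hypothetical disease-free follow-up in months; 1 = recurrence, 0 = censored.
months = [5, 8, 12, 12, 20]
recurred = [1, 0, 1, 1, 0]
dfs_curve = kaplan_meier(months, recurred)
```

A subgroup would be retained at a chosen false discovery rate (for example 0.05) when its adjusted value falls below that threshold. In practice, established implementations such as `multipletests(..., method="fdr_bh")` in statsmodels or `KaplanMeierFitter` in lifelines would likely be preferable to hand-written versions.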