USING BIG DATA TO GENERATE HYPOTHESES ON RISK FACTORS FOR POORLY UNDERSTOOD CANCERS

Cancer is one of the most common and deadly diseases, and its incidence is increasing. Because only 5 to 10% of cancers are due to genetics, most cancers are attributable to external risk factors such as lifestyle habits and environmental exposure. Yet according to the American Institute for Cancer Research (AICR), only 40% of cancer cases are preventable through reducing exposure to known, controllable risk factors. This gap suggests that many cancers with preventable causes still lack prevention recommendations because their risk factors have not been identified. Identifying them requires innovation in the techniques used to find risk factors. We will attempt to generate hypotheses about risk factors for cancers based on clinical features, behavioural risk factors, socio-demographics, and exposure or proximity to environmental hazards. To do so, we will combine data on cancer incidence with data on the possible risk factors. The resulting dataset will be analyzed using statistical analyses, GIS analyses, and exploratory data mining.
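
As a rough illustration of the combining step, the Python sketch below joins incidence figures with candidate risk factors on a shared region key, then ranks the factors by the strength of their linear association with incidence. Everything specific here is an assumption made for the example: the region keys, the factor names (smoking_rate, distance_to_site_km), and all values are invented, and the Pearson ranking merely stands in for the statistical analyses, which the text does not specify.

```python
from math import sqrt

# Invented toy data for illustration only; not taken from any real source.
# Incidence is cases per 100,000 people, keyed by a hypothetical region ID.
incidence = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0}

# Hypothetical candidate risk factors per region.
risk_factors = {
    "A": {"smoking_rate": 0.10, "distance_to_site_km": 40.0},
    "B": {"smoking_rate": 0.20, "distance_to_site_km": 35.0},
    "C": {"smoking_rate": 0.30, "distance_to_site_km": 10.0},
    "E": {"smoking_rate": 0.25, "distance_to_site_km": 5.0},
}


def combine(incidence, risk_factors):
    """Inner-join the two sources on region; unmatched regions are dropped."""
    rows = []
    for region in sorted(incidence.keys() & risk_factors.keys()):
        row = {"region": region, "incidence": incidence[region]}
        row.update(risk_factors[region])
        rows.append(row)
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_candidates(rows, factors):
    """Rank factors by absolute correlation with incidence, strongest first."""
    ys = [row["incidence"] for row in rows]
    scores = {f: pearson([row[f] for row in rows], ys) for f in factors}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    rows = combine(incidence, risk_factors)
    for factor, r in rank_candidates(rows, ["smoking_rate", "distance_to_site_km"]):
        print(f"{factor}: r = {r:+.3f}")
```

A ranking of this kind only flags candidate associations for follow-up. It does not establish causation, and a real analysis would need to account for confounding, population size, and spatial structure.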