Published on Sep. 21, 2020
To understand biological differences within groups of people or animals, we often turn to DNA. A genome-wide association study (GWAS) can assess the genetic contribution to biological differences between individuals. However, the scale of input data continues to expand in three ways: the sequence coverage of genomes, the number of individuals sequenced, and the number of phenotype records per individual. High-throughput workflows are computationally intensive and require laborious interpretation of results. These barriers inhibit the systematic investigation of hypotheses and limit the effective translation of genetics into biomedical and agricultural solutions. The growing volume of data, compounded by the number of available analysis approaches, makes accurate interpretation of GWAS results more difficult, so costly results often never leave the computational lab. Our work yields a methodical evaluation of associations across numerous phenotypes. We extend our flexible, automated, and computationally efficient pipeline to contrast two GWAS approaches, GCTA and FarmCPU. By bootstrapping sample size, we aim to quantify how often markers recur in association with phenotypes. Our method provides insight into complex GWAS results, improves our analytical dexterity across the genome, and shifts our interpretive efforts toward translating the data into meaningful outcomes.
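As a rough illustration of the bootstrapping idea, the Python sketch below resamples individuals, reruns an association scan on each resample, and reports the fraction of resamples in which each marker passes a threshold. It is not the pipeline described here and does not call GCTA or FarmCPU: a simple per-marker correlation test (the statistic k·r², approximately chi-square with one degree of freedom) stands in for the GWAS model. The function name `marker_recurrence`, its parameters, the threshold, and the simulated data are all hypothetical.

```python
import numpy as np


def marker_recurrence(genotypes, phenotype, n_boot=200, sample_frac=0.8,
                      chi2_threshold=20.0, seed=0):
    """Fraction of bootstrap resamples in which each marker is 'significant'.

    genotypes: (n_individuals, n_markers) array of allele counts (0, 1, 2).
    phenotype: (n_individuals,) array of trait values.
    A per-marker correlation test is an ASSUMED stand-in for a real GWAS model.
    """
    rng = np.random.default_rng(seed)
    genotypes = np.asarray(genotypes, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    n, m = genotypes.shape
    k = max(3, int(round(sample_frac * n)))  # individuals per resample
    hits = np.zeros(m)

    for _ in range(n_boot):
        # Draw individuals with replacement (the bootstrap step).
        idx = rng.choice(n, size=k, replace=True)
        g = genotypes[idx]
        y = phenotype[idx]

        # Pearson correlation between each marker and the phenotype.
        g_c = g - g.mean(axis=0)
        y_c = y - y.mean()
        denom = np.sqrt((g_c ** 2).sum(axis=0) * (y_c ** 2).sum())
        r = np.zeros(m)
        ok = denom > 0  # skip markers that are monomorphic in this resample
        r[ok] = (g_c.T @ y_c)[ok] / denom[ok]

        # k * r^2 is approximately chi-square (1 df) under no association.
        hits += (k * r ** 2) > chi2_threshold

    return hits / n_boot


if __name__ == "__main__":
    # Simulated data (hypothetical): 300 individuals, 50 markers,
    # with marker 7 given a true effect on the phenotype.
    rng = np.random.default_rng(42)
    geno = rng.binomial(2, 0.3, size=(300, 50))
    pheno = 1.0 * geno[:, 7] + rng.normal(0.0, 1.0, size=300)

    recurrence = marker_recurrence(geno, pheno)
    top = np.argsort(recurrence)[::-1][:3]
    for marker in top:
        print(f"marker {marker}: recurrence {recurrence[marker]:.2f}")
```

Markers whose recurrence stays high as `sample_frac` shrinks are the ones least dependent on any particular subset of individuals. In a real analysis, the correlation step would be replaced by a call to the chosen GWAS model.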