Application of Deep Learning in Predicting Phenotypes

Genomic selection (GS) uses single-nucleotide polymorphism (SNP) markers to predict breeding values (BV) for improving quantitative traits in breeding populations. GS has been shown to increase breeding efficiency in both plant and animal breeding. However, existing statistical and machine-learning methods require imputation of missing genotype values, which leads to poor generalization and computational inefficiency. Here, we propose a deep-learning model that uses convolutional neural networks (CNN) to predict the Genomic Estimated Breeding Value (GEBV) and to investigate the contributions of genomic SNPs to GEBV using a saliency map approach. Compared with traditional statistical models, including rr-BLUP, Bayesian ridge regression, Bayesian LASSO and Bayes A, on a Glycine max (soybean) Nested Association Mapping (NAM) dataset and a simulated dataset, our model handles missing genotype values better and is more efficient at accurately predicting breeding values. Our model also has great potential for interpreting phenotype-genotype associations across the entire genome. The model is available at https://github.com/kateyliu/DL_gwas.