Development of the Genescape Allele Catalog for Precise Identification of Causative SNPs

Next-generation sequencing (NGS) has become widely used in recent years, and large amounts of NGS resequencing data have been generated and made available online for many organisms, including soybean. However, current genome-wide association study (GWAS) tools typically report only the most significant SNP from a Manhattan plot, and they remain limited in their ability to pinpoint the exact causative SNPs in SNP array or NGS datasets. To address this, we are developing the Genescape catalog, a new bioinformatics approach that integrates all potential alleles for every gene in the soybean genome, using genomic variation and phenotypic information from a large set of cultivated and wild soybeans, including ancestral lines. We have built an automated pipeline that combines open-source tools such as BCFtools, Beagle, and SnpEff with in-house methods to create the catalog. The analysis merges multiple large-scale NGS resequencing datasets and comprises SNP and indel position alignment, imputation, allele frequency estimation, functional effect prediction, and mapping of GWAS signals to genes. This approach lets researchers examine how each gene varies across samples and assess the functional effects of those variants. Most importantly, the soybean Genescape catalog captures genetic variation across ancestral lines, elite lines, landraces, and wild soybean (Glycine soja). We are also implementing a web-based tool to make the Genescape catalogs accessible and shareable through SoyKB, so that this allele information can be widely used by soybean researchers and applied to trait improvement.
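The core idea of a gene-level allele catalog can be illustrated with a small sketch. It is not the Genescape implementation. It groups samples by their combined genotype across the variant positions of one gene, treating each distinct combination as one allele, and then estimates how often each allele occurs within each germplasm category. The functions build_allele_catalog and category_frequencies, and the in-memory input format, are hypothetical. A real pipeline would read imputed, SnpEff-annotated VCF files instead. The sketch also assumes one homozygous call per position, which is a simplification motivated by soybean being a largely self-pollinating, inbred crop.

```python
from collections import Counter, defaultdict

# Hypothetical input format (illustrative only):
#   genotypes:  sample -> {position: allele} for variant positions inside one gene
#   categories: sample -> germplasm category (e.g. "elite", "landrace", "soja")
# Assumes one homozygous call per position.


def build_allele_catalog(genotypes, categories):
    """Group samples into gene-level alleles and count each allele per category."""
    positions = sorted({pos for calls in genotypes.values() for pos in calls})
    catalog = defaultdict(Counter)
    for sample, calls in genotypes.items():
        # "." marks a missing call; upstream imputation (e.g. Beagle) should fill these.
        allele = tuple(calls.get(pos, ".") for pos in positions)
        catalog[allele][categories[sample]] += 1
    return positions, dict(catalog)


def category_frequencies(catalog):
    """Convert per-category counts into within-category allele frequencies."""
    totals = Counter()
    for counts in catalog.values():
        totals.update(counts)  # total number of samples seen in each category
    return {
        allele: {cat: counts[cat] / totals[cat] for cat in totals}
        for allele, counts in catalog.items()
    }
```

For example, if four samples carry A/T, A/T, G/T, and G/C at two positions of a gene, the catalog records three alleles. If the two G-carrying samples are both G. soja, each of those alleles has a frequency of 0.5 within the soja category. Keying alleles on the full tuple of calls, rather than on single SNPs, is what allows an allele to be compared across categories as a unit.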

Please contact Robert Sanders ( for Zoom information.