This work introduces a user-friendly application allowing two categories of users, clinicians and bioinformaticians, to analyse GWAS genotype/phenotype correlations using computationally accelerated epistatic interaction models. The objective is to optimise the application of state-of-the-art methods to examine pairwise epistatic effects on the causes of complex disease using high performance computing, in order to detect biologically relevant pathways and potential genetic biomarkers.
It is widely agreed that complex diseases are typically caused by the joint effects of multiple genetic variations, rather than by a single genetic variation (Anunciação et al., 2013). Multi-SNP interactions, also known as epistatic interactions, have the potential to provide information about the causes of complex diseases, and build on GWAS that look at associations between single SNPs and phenotypes. The SNPs identified can then be mapped to genes for downstream analysis, aiding the identification of functional enrichment for disease using tools such as ClueGO (Bindea et al., 2009) and GOEAST (Zheng et al., 2008).
Due to the large number of interactions that have to be calculated, running these epistatic interaction models on standard hardware is not practical. To illustrate: a relatively small GWAS dataset, with 100,000 SNPs that pass quality control, has approximately 5×10^9 pairwise interactions. Using the FaST-LMM epistatic interaction model (Lippert et al., 2011), it would take approximately two years to calculate these pairwise interactions on a desktop computer. As such, desktop execution is not a viable option for researchers.
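The pair count quoted above follows from the number of unordered pairs among n SNPs, n(n-1)/2. The sketch below reproduces that arithmetic and derives the average cost per pair implied by the two-year estimate. The class and method names are illustrative, and the per-pair figure is a back-of-envelope consequence of the numbers in the text, not a measured benchmark.

```java
final class PairCount {

    /** Number of unordered SNP pairs among n SNPs: n choose 2 = n(n-1)/2. */
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    /** Average seconds per pair implied by a total runtime (assumes uniform cost per pair). */
    static double impliedSecondsPerPair(double totalSeconds, long pairCount) {
        return totalSeconds / pairCount;
    }

    public static void main(String[] args) {
        long p = pairs(100_000L);
        System.out.println("pairs = " + p); // 4999950000, i.e. roughly 5 x 10^9

        // Two years of wall-clock time, taken as 2 x 365 days (an assumption).
        double twoYears = 2.0 * 365 * 86_400;
        System.out.println("implied s/pair = " + impliedSecondsPerPair(twoYears, p));
    }
}
```

Under these assumptions the implied cost is on the order of ten milliseconds per pair, which shows that the total is dominated by the number of pairs rather than by the cost of any single test.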
High performance computing supports deployment of epistatic models across many cores, thereby accelerating them. As the majority of these models are deployed using command line interfaces, this work proposes pipelining the applications using a Java-based GUI to make them more accessible. Java SE 8 provides the fork/join framework, enabling the implementation of parallel computing in applications (Oracle, 2014). This work therefore builds on existing epistatic models by both accelerating them and making them more accessible. Initially, two different types of model are used: the linear regression model BOOST (Wan et al., 2010), and the linear mixed model FaST-LMM (Lippert et al., 2011). Here, we present a scaled-down prototype of our application, available upon request, that can be deployed on a typical desktop computer to demonstrate our approach.
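As a minimal sketch of how the fork/join framework could spread a pairwise scan across cores, the example below recursively splits the range of first-SNP indices and keeps the highest score over all pairs. Everything in it is an assumption made for illustration: the class name PairwiseScan, the 0/1/2 genotype coding, the split threshold, and above all the score function, which is a toy co-occurrence count standing in for a real statistic and is not the BOOST or FaST-LMM test. It is not the application's actual implementation.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class PairwiseScan extends RecursiveTask<Double> {
    // Leaf tasks handle at most this many first-SNP rows sequentially (arbitrary choice).
    private static final int THRESHOLD = 64;

    private final int[][] genotypes; // genotypes[snp][sample], coded 0/1/2 (assumed layout)
    private final int lo;            // first-SNP index range [lo, hi)
    private final int hi;

    PairwiseScan(int[][] genotypes, int lo, int hi) {
        this.genotypes = genotypes;
        this.lo = lo;
        this.hi = hi;
    }

    /** Placeholder statistic: samples carrying a minor allele at both SNPs. Not BOOST or FaST-LMM. */
    static double score(int[] a, int[] b) {
        int both = 0;
        for (int s = 0; s < a.length; s++) {
            if (a[s] > 0 && b[s] > 0) both++;
        }
        return both;
    }

    @Override
    protected Double compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: score every pair (i, j) with i in [lo, hi) and j > i.
            double best = Double.NEGATIVE_INFINITY;
            for (int i = lo; i < hi; i++) {
                for (int j = i + 1; j < genotypes.length; j++) {
                    best = Math.max(best, score(genotypes[i], genotypes[j]));
                }
            }
            return best;
        }
        // Otherwise split the row range in two, run one half asynchronously, and combine.
        int mid = (lo + hi) >>> 1;
        PairwiseScan left = new PairwiseScan(genotypes, lo, mid);
        PairwiseScan right = new PairwiseScan(genotypes, mid, hi);
        left.fork();
        double rightBest = right.compute();
        return Math.max(left.join(), rightBest);
    }

    /** Highest pairwise score over all SNP pairs, computed on the common fork/join pool. */
    static double bestScore(int[][] genotypes) {
        return ForkJoinPool.commonPool().invoke(new PairwiseScan(genotypes, 0, genotypes.length));
    }

    public static void main(String[] args) {
        int[][] g = { {0, 1, 2, 0}, {1, 1, 0, 0}, {0, 2, 2, 1} };
        System.out.println(bestScore(g)); // 2.0 (SNPs 0 and 2 co-occur in two samples)
    }
}
```

Splitting by first-SNP index gives uneven work, because earlier rows pair with more partners than later ones. The framework's work stealing offsets part of that imbalance; a production version would likely split on pair blocks instead.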
Anunciação, Orlando, Susana Vinga, and Arlindo L. Oliveira. Using Information Interaction to Discover Epistatic Effects in Complex Diseases. PLoS ONE 8, no. 10 (2013): e76300.
Bindea, Gabriela, Bernhard Mlecnik, Hubert Hackl, Pornpimol Charoentong, Marie Tosolini, Amos Kirilovsky, Wolf-Herman Fridman, Franck Pagès, Zlatko Trajanoski, and Jérôme Galon. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics 25, no. 8 (2009): 1091-1093.
Zheng, Qi, and Xiu-Jie Wang. GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Research 36, no. suppl 2 (2008): W358-W363.
Lippert, Christoph, Jennifer Listgarten, Ying Liu, Carl M. Kadie, Robert I. Davidson, and David Heckerman. FaST linear mixed models for genome-wide association studies. Nature Methods 8, no. 10 (2011): 833-835.
Oracle (2014). Parallelism. The Java™ Tutorials. Oracle Technology Network.
Wan, Xiang, Can Yang, Qiang Yang, Hong Xue, Xiaodan Fan, Nelson LS Tang, and Weichuan Yu. BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies. The American Journal of Human Genetics 87, no. 3 (2010): 325-340.
Purcell, Shaun, Benjamin Neale, Kathe Todd-Brown, Lori Thomas, Manuel AR Ferreira, David Bender, Julian Maller et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics 81, no. 3 (2007): 559-575.
Xie, Zhihui, Vijayaraj Nagarajan, Daniel E. Sturdevant, Shoko Iwaki, Eunice Chan, Laura Wisch, Michael Young, Celeste M. Nelson, Stephen F. Porcella, and Kirk M. Druey. Genome-wide SNP analysis of the Systemic Capillary Leak Syndrome (Clarkson disease). Rare diseases (Austin, Tex.) 1, no. 1 (2013).