Title:
Gene-gene interaction: the curse of dimensionality.
Authors:
Chattopadhyay A; Institute of Epidemiology and Preventive Medicine, Department of Public Health, National Taiwan University, Taipei., Lu TP; Institute of Epidemiology and Preventive Medicine, Department of Public Health, National Taiwan University, Taipei.
Source:
Annals of translational medicine [Ann Transl Med] 2019 Dec; Vol. 7 (24), pp. 813.
Publication Type:
Journal Article; Review
Language:
English
Journal Info:
Publisher: AME Publishing Company Country of Publication: China NLM ID: 101617978 Publication Model: Print Cited Medium: Print ISSN: 2305-5839 (Print) Linking ISSN: 23055839 NLM ISO Abbreviation: Ann Transl Med Subsets: PubMed not MEDLINE
Imprint Name(s):
Original Publication: [Hong Kong] : AME Publishing Company
Contributed Indexing:
Keywords: Gene-gene interaction; PySpark; deep-learning (DL); machine-learning (ML); multifactor dimensionality reduction (MDR); parallel computing
Entry Date(s):
Date Created: 20200212 Latest Revision: 20200928
Update Code:
20250114
PubMed Central ID:
PMC6989881
DOI:
10.21037/atm.2019.12.87
PMID:
32042829
Database:
MEDLINE

Further Information

Genetic variants identified by genome-wide association studies frequently show only modest effects on disease risk, leading to the "missing heritability" problem. One avenue to account for part of this "missingness" is to evaluate gene-gene interactions (epistasis), thereby elucidating their effect on complex diseases. This can potentially help in identifying gene functions, pathways, and drug targets. However, the exhaustive evaluation of all possible genetic interactions among millions of single nucleotide polymorphisms (SNPs) raises several issues, collectively known as the "curse of dimensionality". The dimensionality involved in the epistatic analysis of such an exponentially growing number of SNPs diminishes the usefulness of traditional parametric statistical methods. The immense popularity of multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that classifies multi-dimensional genotypes into a one-dimensional binary variable, led to the emergence of a fast-growing collection of MDR-based methods. Moreover, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have also been widely applied in recent years to tackle the dimensionality issue associated with whole-genome gene-gene interaction studies. However, exhaustive searching in MDR-based approaches and variable selection in ML methods still pose the risk of missing relevant SNPs. Furthermore, interpretability issues are a major hindrance for DL methods. To minimize this loss of information, Python-based tools such as PySpark can potentially take advantage of distributed computing resources in the cloud to bring back smaller subsets of data for further local analysis. Parallel computing can be a powerful resource in the fight against this "curse". PySpark supports all standard Python libraries and C extensions, making it convenient to write code that delivers dramatic improvements in processing speed for extraordinarily large data sets.
(2019 Annals of Translational Medicine. All rights reserved.)
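
Illustrative note (not part of the original record): the MDR idea mentioned in the abstract, collapsing multi-locus genotype combinations into a single binary high-risk/low-risk variable, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (mdr_high_risk_cells, balanced_accuracy, best_pair), the toy data, and the choice of the overall case:control ratio as the threshold are all hypothetical, and the cross-validation that MDR normally relies on is omitted for brevity. The exhaustive loop over SNP pairs also hints at why the search space grows so quickly with the number of SNPs.

```python
from collections import Counter
from itertools import combinations


def mdr_high_risk_cells(genotypes, labels, snp_idx):
    """Mark each multi-locus genotype cell as high risk (True) or low risk (False).

    genotypes: list of per-sample tuples of genotype codes
    labels:    list of 0 (control) / 1 (case); both classes assumed present
    snp_idx:   tuple of SNP column indices forming the candidate interaction
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in snp_idx)
        (cases if y == 1 else controls)[cell] += 1

    n_cases = sum(labels)
    n_controls = len(labels) - n_cases

    high_risk = {}
    for cell in set(cases) | set(controls):
        # Assumed rule: a cell is high risk when its case:control ratio exceeds
        # the overall ratio; cross-multiplied to avoid dividing by zero.
        high_risk[cell] = cases[cell] * n_controls > controls[cell] * n_cases
    return high_risk


def balanced_accuracy(genotypes, labels, snp_idx, high_risk):
    """Score the one-dimensional high/low-risk variable as a classifier."""
    tp = tn = fp = fn = 0
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in snp_idx)
        pred = 1 if high_risk.get(cell, False) else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def best_pair(genotypes, labels):
    """Exhaustively score every SNP pair (no cross-validation in this sketch)."""
    n_snps = len(genotypes[0])
    scored = []
    for pair in combinations(range(n_snps), 2):
        cells = mdr_high_risk_cells(genotypes, labels, pair)
        scored.append((balanced_accuracy(genotypes, labels, pair, cells), pair))
    return max(scored)


# Toy data (hypothetical): three binary-coded SNPs; a sample is a case only when
# SNP 0 and SNP 1 differ, so neither SNP shows an effect on its own.
genotypes = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [1 if a != b else 0 for a, b, _ in genotypes]

print(best_pair(genotypes, labels))
```

In this toy setting only the pair (SNP 0, SNP 1) separates cases from controls, which is the kind of purely interactive signal that single-SNP tests would miss. With p SNPs the loop visits p(p-1)/2 pairs, and higher-order interactions grow faster still, which is where distributed tools such as PySpark are suggested in the abstract.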

Conflicts of Interest: The authors have no conflicts of interest to declare.