Result: Multi-level learning: improving the prediction of protein, domain and residue interactions by allowing information flow between levels.

Title:
Multi-level learning: improving the prediction of protein, domain and residue interactions by allowing information flow between levels.
Authors:
Yip KY (Department of Computer Science, Yale University, 51 Prospect Street, New Haven, CT 06511, USA; yuklap.yip@yale.edu); Kim PM; McDermott D; Gerstein M
Source:
BMC bioinformatics [BMC Bioinformatics] 2009 Aug 05; Vol. 10, pp. 241. Date of Electronic Publication: 2009 Aug 05.
Publication Type:
Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't
Language:
English
Journal Info:
Publisher: BioMed Central; Country of Publication: England; NLM ID: 100965194; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1471-2105 (Electronic); Linking ISSN: 14712105; NLM ISO Abbreviation: BMC Bioinformatics; Subsets: MEDLINE
Imprint Name(s):
Original Publication: [London] : BioMed Central, 2000-
Substance Nomenclature:
0 (Proteins)
Entry Date(s):
Date Created: 20090807; Date Completed: 20090929; Latest Revision: 20211020
Update Code:
20250114
PubMed Central ID:
PMC2734556
DOI:
10.1186/1471-2105-10-241
PMID:
19656385
Database:
MEDLINE

Further Information

Background: Proteins interact through specific binding interfaces, each made up of many residues located within domains. Protein interactions thus occur on three different levels of a concept hierarchy: whole proteins, domains, and residues. Each level offers a distinct and complementary set of features for computationally predicting interactions, including functional genomic features of whole proteins, evolutionary features of domain families, and physical-chemical features of individual residues. The predictions at each level could benefit from using the features at all three levels. However, combining them is not trivial, because the features are provided at different granularities.
Results: To link the predictions at the three levels, we propose a multi-level machine-learning framework that allows for explicit information flow between the levels. We demonstrate, using representative yeast interaction networks, that our algorithm is able to utilize complementary feature sets to make more accurate predictions at the three levels than when the three problems are approached independently. To facilitate application of our multi-level learning framework, we discuss three key aspects of multi-level learning and the corresponding design choices that we have made in the implementation of a concrete learning algorithm. 1) Architecture of information flow: we show the greater flexibility of bidirectional flow over independent levels and unidirectional flow. 2) Coupling mechanism of the different levels: we show how coupling can be accomplished by augmenting the training sets at each level, and discuss how error propagation between levels can be prevented by means of soft coupling. 3) Sparseness of data: we show that the multi-level framework compounds data sparsity issues, and discuss how this can be dealt with by building local models in information-rich parts of the data. Our proof-of-concept learning algorithm demonstrates the advantage of combining levels, and opens up opportunities for further research.
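Illustration (not part of the original record): the coupling by training-set augmentation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' Java implementation. The function names `propagate_up` and `propagate_down`, the thresholds, and the propagation rules are all hypothetical: here a confident child-level interaction implies that the parent pair interacts, and a confident parent-level non-interaction implies that the child pairs do not interact. Soft coupling is modeled by passing each prediction on as a weighted example rather than as a hard label.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
Example = Tuple[Pair, int, float]  # (pair, label, weight)


def propagate_up(child_scores: Dict[Pair, float],
                 parent_of: Dict[str, str],
                 threshold: float = 0.8) -> List[Example]:
    """Turn confident child-level predictions into positive parent-level examples.

    Assumption: a parent pair interacts if any of its child pairs does.
    The example weight is the strongest child score (soft coupling).
    """
    best: Dict[Pair, float] = {}
    for (a, b), score in child_scores.items():
        parent_pair = tuple(sorted((parent_of[a], parent_of[b])))
        best[parent_pair] = max(best.get(parent_pair, 0.0), score)
    return [(pair, 1, s) for pair, s in sorted(best.items()) if s >= threshold]


def propagate_down(parent_scores: Dict[Pair, float],
                   parent_of: Dict[str, str],
                   threshold: float = 0.2) -> List[Example]:
    """Turn confident parent-level non-interactions into negative child examples.

    Assumption: if two parents do not interact, none of their children do.
    The example weight is one minus the parent score (soft coupling).
    """
    children: Dict[str, List[str]] = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    weights: Dict[Pair, float] = {}
    for (p, q), score in parent_scores.items():
        if score > threshold:
            continue  # not confident enough to pass downward
        for a in children.get(p, []):
            for b in children.get(q, []):
                if a == b:
                    continue
                pair = tuple(sorted((a, b)))
                weights[pair] = max(weights.get(pair, 0.0), 1.0 - score)
    return [(pair, 0, w) for pair, w in sorted(weights.items())]


# Toy data: residues r1..r4 belong to domains D1..D3 (all names hypothetical).
domain_of = {"r1": "D1", "r2": "D1", "r3": "D2", "r4": "D3"}
residue_scores = {("r1", "r3"): 0.9, ("r2", "r3"): 0.4, ("r1", "r4"): 0.1}
domain_scores = {("D1", "D3"): 0.1, ("D1", "D2"): 0.6}

extra_domain_examples = propagate_up(residue_scores, domain_of)
extra_residue_examples = propagate_down(domain_scores, domain_of)
print(extra_domain_examples)
print(extra_residue_examples)
```

In a bidirectional architecture, both functions would be called in alternation, with each level retrained on its original training set plus the weighted examples received from the other levels. Lowering the weights or tightening the thresholds limits how far an error at one level can spread to the others.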
Availability: The software and a readme file can be downloaded at http://networks.gersteinlab.org/mll. The programs are written in Java, and can be run on any platform with Java 1.4 or higher and Apache Ant 1.7.0 or higher installed. The software can be used without a license.