Result: Snowflake Data Warehouse for Large-Scale and Diverse Biological Data Management and Analysis.

Title:
Snowflake Data Warehouse for Large-Scale and Diverse Biological Data Management and Analysis.
Authors:
Koreeda T; CLINIC FOR Group, Nagisa Terrace 4F, 3-1-32 Shibaura, Minato-ku, Tokyo 108-0023, Japan.
Honda H; Kao Corporation, Bunka, Sumida-ku, Tokyo 131-8501, Japan.
Onami JI; RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba 305-0074, Japan.
Source:
Genes [Genes (Basel)] 2024 Dec 28; Vol. 16 (1). Date of Electronic Publication: 2024 Dec 28.
Publication Type:
Journal Article; Review
Language:
English
Journal Info:
Publisher: MDPI; Country of Publication: Switzerland; NLM ID: 101551097; Publication Model: Electronic; Cited Medium: Internet; ISSN: 2073-4425 (Electronic); Linking ISSN: 20734425; NLM ISO Abbreviation: Genes (Basel); Subsets: MEDLINE
Imprint Name(s):
Original Publication: Basel : MDPI
Contributed Indexing:
Keywords: Snowflake; big data; biodata; bioinformatics; data science; data warehouse
Entry Date(s):
Date Created: 20250125 Date Completed: 20250503 Latest Revision: 20250503
Update Code:
20250505
PubMed Central ID:
PMC11765040
DOI:
10.3390/genes16010034
PMID:
39858581
Database:
MEDLINE

Further Information

The advancement and widespread adoption of next-generation sequencing technologies have accelerated the generation of genomic, transcriptomic, and metagenomic data. As a result, managing and analyzing large-scale, diverse data has become a critical challenge in life science and biotechnology. In this paper, we discuss how cloud data warehouses can address this challenge. Specifically, we propose a data management and analysis framework built on Snowflake, a SaaS-based data platform, and demonstrate its convenience and effectiveness through concrete examples such as disease variant analysis and in silico drug discovery. With Snowflake, researchers can efficiently manage and analyze a wide array of biological data and derive new biological insights through integrated analysis. Through these methodologies and application examples, we aim to accelerate research progress in bioinformatics.