Title:
Using ChatGPT as a Tool for Training Nonprogrammers to Generate Genomic Sequence Analysis Code
Language:
English
Source:
Biochemistry and Molecular Biology Education. 2025 53(4):433-444.
Availability:
Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us
Peer Reviewed:
Y
Page Count:
12
Publication Date:
2025
Sponsoring Agency:
National Science Foundation (NSF), Division of Integrative Organismal Systems (IOS)
National Science Foundation (NSF), Division of Molecular and Cellular Biosciences (MCB)
Contract Number:
2243532
2219900
Document Type:
Journal Articles
Reports - Research
Education Level:
Higher Education
Postsecondary Education
DOI:
10.1002/bmb.21899
ISSN:
1470-8175
1539-3429
Entry Date:
2025
Accession Number:
EJ1478271
Database:
ERIC

Abstract:
Today, because of the size of many genomes and the ever-larger size of sequencing files, independently analyzing sequencing data is largely impossible for a biologist with little or no programming expertise. Consequently, biologists typically face a dilemma: either spend significant time and effort learning to program themselves, or find (and rely on) an available computer scientist to analyze large sequence data sets. However, the advent of AI-powered programs such as ChatGPT may offer a way to overcome the disconnect between biologists and the analysis of genomic data that is critically important to their field. The work detailed herein demonstrates how incorporating ChatGPT into an existing Course-based Undergraduate Research Experience curriculum can give biology students with no programming expertise the ability to generate their own programs and to carry out a publishable, comprehensive analysis of real-world Next Generation Sequencing (NGS) datasets. Relying solely on the students' biology background to write the prompts directing ChatGPT to generate Python code, we found that students could readily generate programs able to handle and analyze NGS datasets larger than 10 gigabytes. In summary, we believe that integrating ChatGPT into education can help bridge a critical gap between biology and computer science and may prove similarly beneficial in other disciplines. Additionally, ChatGPT can provide biological researchers with powerful new tools for NGS dataset analysis, helping to accelerate major new advances in the field.

As Provided
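
The abstract does not reproduce the students' programs. As a purely illustrative sketch (not the authors' code or a ChatGPT output from the study), the Python below shows one common way a program can process an NGS file larger than available memory: streaming a FASTQ file one record at a time instead of loading it whole. The function names `open_text`, `iter_fastq`, and `summarize_fastq`, the assumption of the common 4-line FASTQ layout, and the choice of read count, base count, and GC fraction as summary statistics are all hypothetical choices made for illustration.

```python
import gzip
from typing import Iterator, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed (.gz) file in text mode (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) one record at a time.

    Assumption: the common 4-line FASTQ layout (no wrapped sequence lines).
    Only the current record is held in memory, so files far larger than
    RAM can be processed.
    """
    with open_text(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            header = header.rstrip("\r\n")
            if not header.strip():
                continue  # skip stray blank lines, e.g. at the end of the file
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline().rstrip("\r\n")
            qual = handle.readline().rstrip("\r\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"Malformed FASTQ record near {header!r}")
            if len(seq) != len(qual):
                raise ValueError(f"Sequence/quality length mismatch in {header!r}")
            yield header[1:], seq, qual


def summarize_fastq(path: str) -> dict:
    """Return read count, total bases, and GC fraction from one streaming pass."""
    n_reads = 0
    n_bases = 0
    n_gc = 0
    for _read_id, seq, _qual in iter_fastq(path):
        seq = seq.upper()  # count lowercase (soft-masked) bases too
        n_reads += 1
        n_bases += len(seq)
        n_gc += seq.count("G") + seq.count("C")
    return {
        "reads": n_reads,
        "bases": n_bases,
        "gc_fraction": n_gc / n_bases if n_bases else 0.0,
    }
```

For example, `summarize_fastq("sample_R1.fastq.gz")` (a hypothetical file name) would return a dictionary with the read count, total base count, and GC fraction. Because only one record is in memory at a time, memory use stays roughly constant regardless of file size; run time still grows with file size, so a multi-gigabyte file can take a while in pure Python.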