Title:
Large Language Models for Supporting Clear Writing and Detecting Spin in Randomized Controlled Trials in Oncology: Comparative Analysis of GPT Models and Prompts.
Authors:
Koechli C; Department of Radiation Oncology, Kantonsspital Winterthur, Brauerstrasse 15, Winterthur, Switzerland, 41 52 266 26 53; Department of Radiation Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
Dennstädt F; Department of Radiation Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
Schröder C; Department of Radiation Oncology, Kantonsspital Winterthur, Brauerstrasse 15, Winterthur, Switzerland, 41 52 266 26 53; Department of Radiation Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
Aebersold DM; Department of Radiation Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
Förster R; Department of Radiation Oncology, Kantonsspital Winterthur, Brauerstrasse 15, Winterthur, Switzerland, 41 52 266 26 53; Department of Radiation Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
Zwahlen DR; Department of Radiation Oncology, Kantonsspital Winterthur, Brauerstrasse 15, Winterthur, Switzerland, 41 52 266 26 53.
Windisch P; Department of Radiation Oncology, Kantonsspital Winterthur, Brauerstrasse 15, Winterthur, Switzerland, 41 52 266 26 53; Department of Radiation Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
Source:
JMIR cancer [JMIR Cancer] 2026 Jan 21; Vol. 12, pp. e78221. Date of Electronic Publication: 2026 Jan 21.
Publication Type:
Journal Article; Comparative Study
Language:
English
Journal Info:
Publisher: JMIR Publications; Country of Publication: Canada; NLM ID: 101666844; Publication Model: Electronic; Cited Medium: Internet; ISSN: 2369-1999 (Electronic); Linking ISSN: 23691999; NLM ISO Abbreviation: JMIR Cancer; Subsets: MEDLINE
Imprint Name(s):
Original Publication: Toronto, ON : JMIR Publications, [2015]-
Contributed Indexing:
Keywords: data mining; large language models; natural language processing; randomized controlled trials; spin
Entry Date(s):
Date Created: 20260121; Date Completed: 20260121; Latest Revision: 20260125
Update Code:
20260125
PubMed Central ID:
PMC12823016
DOI:
10.2196/78221
PMID:
41564336
Database:
MEDLINE

Further Information

Background: Randomized controlled trials (RCTs) are the gold standard for evaluating interventions in oncology, but their reporting can be subject to "spin," that is, presenting results in ways that mislead readers about true efficacy.
Objective: This study aimed to investigate whether large language models (LLMs) could provide a standardized approach to detect spin, particularly in the conclusions, where it most commonly occurs.
Methods: We randomly sampled 250 two-arm, single-primary end point oncology RCTs from 7 major medical journals published between 2005 and 2023. Two authors independently annotated trials as positive or negative based on whether they met their primary end point. Three commercial LLMs (GPT-3.5 Turbo, GPT-4o, and GPT-o1) were tasked with classifying trials as positive or negative when provided with (1) conclusions only; (2) methods and conclusions; (3) methods, results, and conclusions; or (4) title and full abstract. LLM performance was evaluated against human annotations. Afterward, trials incorrectly classified as positive when the model was provided only with the conclusions but correctly classified as negative when provided with the whole abstract were analyzed for patterns that may indicate the presence of spin. Model performance was assessed using accuracy, precision, recall, and F1-score calculated from confusion matrices.
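Illustration (not part of the published abstract): the Methods state that accuracy, precision, recall, and F1-score were calculated from confusion matrices. The following is a minimal Python sketch of that calculation, assuming "positive" is treated as the positive class. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def confusion_counts(y_true, y_pred, positive="positive"):
    """Count (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics_from_counts(tp, fp, fn, tn):
    """Derive the four metrics; undefined ratios fall back to 0.0."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f1)


# Toy labels for illustration only (hypothetical, not study data).
human = ["positive", "positive", "negative", "negative", "positive", "negative"]
model = ["positive", "positive", "positive", "negative", "negative", "negative"]

counts = confusion_counts(human, model)
result = metrics_from_counts(*counts)
print(counts, result)
```

In this toy example, a negative trial classified as positive counts as a false positive, which is the error type the study later inspects for signs of spin.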
Results: Of the 250 trials, 146 (58.4%) were positive, and 104 (41.6%) were negative. The GPT-o1 model demonstrated the highest performance across all conditions, with F1-scores of 0.932 (conclusions only; 95% CI 0.90-0.96), 0.96 (methods and conclusions; 95% CI 0.93-0.98), 0.98 (methods, results, and conclusions; 95% CI 0.96-0.99), and 0.97 (title and abstract; 95% CI 0.95-0.99). Analysis of trials incorrectly classified as positive when the model was provided only with the conclusions revealed shared patterns, including absence of primary end point results, emphasis on subgroup improvements, or unclear distinction between primary and secondary end points. These patterns were almost never found in trials correctly classified as negative.
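Illustration (not part of the published abstract): the trials examined for spin patterns are those classified as positive from the conclusions alone but correctly classified as negative from the whole abstract. A minimal Python sketch of that selection step follows, assuming each set of labels is stored as a dictionary keyed by trial identifier. The function name, the data layout, and the trial identifiers are hypothetical.

```python
def spin_candidates(human, conclusions_only, full_abstract):
    """Return IDs of trials that humans labeled negative, that the model
    called positive from the conclusions alone, and that the model called
    negative when given the full abstract."""
    return sorted(
        trial_id
        for trial_id, label in human.items()
        if label == "negative"
        and conclusions_only.get(trial_id) == "positive"
        and full_abstract.get(trial_id) == "negative"
    )


# Toy labels for illustration only (hypothetical, not study data).
human_labels = {"T1": "negative", "T2": "negative", "T3": "positive", "T4": "negative"}
pred_conclusions = {"T1": "positive", "T2": "negative", "T3": "positive", "T4": "positive"}
pred_full = {"T1": "negative", "T2": "negative", "T3": "positive", "T4": "positive"}

candidates = spin_candidates(human_labels, pred_conclusions, pred_full)
print(candidates)
```

Here only T1 qualifies: T2 was never misclassified, T3 is a true positive, and T4 stays misclassified even with the full abstract, so it does not show the conclusions-versus-abstract discrepancy the study relies on.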
Conclusions: LLMs can effectively detect potential spin in oncology RCT reporting by identifying discrepancies between how trials are presented in the conclusions vs the full abstracts. This approach could serve as a supplementary tool for improving transparency in scientific reporting, although further development is needed to address more complex trial designs beyond those examined in this feasibility study.
(© Carole Koechli, Fabio Dennstädt, Christina Schröder, Daniel M Aebersold, Robert Förster, Daniel R Zwahlen, Paul Windisch. Originally published in JMIR Cancer (https://cancer.jmir.org).)