
ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns
M. Sallam
Explore the transformative potential of ChatGPT in healthcare through this systematic review by Malik Sallam. Discover its benefits in scientific writing and personalized learning, while also addressing crucial ethical concerns that must be navigated in this evolving landscape.
Introduction
Artificial intelligence (AI) is a multidisciplinary field that aims to create machines capable of tasks requiring human intelligence, including learning, adaptation, reasoning, and understanding complex abstract concepts. As a scientific discipline, AI dates to the Dartmouth Summer Research Project on AI, followed by advances in machine learning algorithms and related techniques. ChatGPT, launched in November 2022, is an AI-based large language model (LLM) developed by OpenAI on the generative pre-trained transformer (GPT) architecture, capable of generating human-like responses in multiple languages. While ChatGPT can assist with conversational and writing tasks and improve efficiency and accuracy, concerns exist around dataset bias, factual inaccuracies and hallucinations, security vulnerabilities, and potential misuse, including the spread of misinformation.
In academia and healthcare, these benefits and risks have sparked debate about ChatGPT's role in education, research, and practice. Prior work highlights potential applications in personalized medicine, drug discovery, analysis of large datasets, and improved diagnosis and clinical decision-making, as well as potential in healthcare education, given the vast and complex body of knowledge students must acquire. However, valid concerns remain, including bias, discrimination, lack of transparency and reliability, cybersecurity risks, and ethical and societal implications. Therefore, the aim of this systematic review was to explore the future perspectives of ChatGPT, as a prime example of LLMs, in healthcare education, academic/scientific writing, healthcare research, and healthcare practice, and to identify potential limitations and concerns associated with its application in these areas.
Literature Review
Methodology
Design: Systematic review following PRISMA guidelines.
Information sources: PubMed/MEDLINE and Google Scholar (via Publish or Perish v8).
Search strategy and dates: The PubMed/MEDLINE search, concluded on 16 February 2023, used (ChatGPT) AND (("2022/11/30" [Date-Publication]: "3000" [Date-Publication])) and yielded 42 records. The Google Scholar search for the term "ChatGPT" (years 2022–2023) yielded 238 records and was also concluded on 16 February 2023.
Eligibility criteria: Any type of published scientific research or preprints (article, review, communication, editorial, opinion, etc.) addressing ChatGPT in (1) health care practice/research; (2) health care education; or (3) academic writing.
Exclusion criteria: Non-English records; records addressing ChatGPT outside the above scopes; records from non-academic sources (e.g., newspapers, websites, magazines).
Screening and selection: 280 records were imported into EndNote v20. Title/abstract screening excluded duplicates (n=40), non-English records (n=32), out-of-scope records (n=80), and records from non-academic sources (n=18). Full-text screening of the remaining 110 records excluded 41 out-of-scope records and 9 subscription-based articles with inaccessible full texts. Final inclusion: 60 records.
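The screening arithmetic above can be double-checked with a short sketch; the counts come directly from the review, and the variable names are only illustrative.

```python
# PRISMA-style screening flow as reported in the review.
identified = 42 + 238          # PubMed/MEDLINE + Google Scholar records

# Title/abstract screening: duplicates (40), non-English (32),
# out-of-scope (80), non-academic sources (18).
after_title_abstract = identified - 40 - 32 - 80 - 18
assert after_title_abstract == 110

# Full-text screening: out-of-scope (41), inaccessible full texts (9).
included = after_title_abstract - 41 - 9
assert included == 60
print(included)  # → 60 records in the final synthesis
```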
Data extraction focus: (1) Type of record (preprint, research, commentary, editorial, etc.); (2) listed benefits/applications of ChatGPT in healthcare education, practice, or scientific research/academic writing; (3) listed risks/concerns; (4) main conclusions and recommendations.
Categorization schemes: Benefits categorized as (1) educational benefits; (2) academic/scientific writing; (3) scientific research; (4) healthcare practice; (5) free availability. Risks categorized as (1) ethical issues (bias, plagiarism); (2) hallucination; (3) transparency (black box); (4) declining need for human expertise; (5) over-detailed/excessive content; (6) data privacy concerns; (7) declining clinical/critical thinking skills; (8) legal issues (copyright, authorship); (9) interpretability; (10) referencing issues; (11) academic fraud; (12) incorrect content; (13) infodemic risk.
Key Findings
- Records: 280 identified; after screening and exclusions, 60 records included.
- Reported benefits: 51/60 records (85.0%) cited benefits, including:
• Academic/scientific writing: efficiency and versatility, high-quality text, improved language/readability/translation, promoting research equity, accelerated literature review (mentioned in 31 records; 51.7%).
• Scientific research: analysis of large datasets (e.g., EHR/genomics), code generation, freeing time for experimental design, drug design/discovery (20 records; 33.3%).
• Healthcare practice: personalized medicine, disease risk/outcome prediction, streamlined workflow, improved diagnostics, documentation, cost saving, improved health literacy (14 records; 23.3%).
• Healthcare education: generation of clinical vignettes, personalized learning with feedback, adjunct in group learning, enhanced communication skills (7 records; 11.7%).
• Free availability (2 records; 3.3%).
- Reported risks/concerns: 58/60 records (96.7%) cited concerns, including:
• Ethical concerns (33 records; 55.0%), notably bias (18; 30.0%) and plagiarism (14; 23.3%), plus data privacy/security.
• Incorrect/inaccurate information (20; 33.3%).
• Referencing inaccuracies/inadequate citations (10; 16.7%).
• Transparency issues/black box (10; 16.7%).
• Legal issues (7; 11.7%).
• Knowledge limited to the period before the 2021 training cutoff (6; 10.0%).
• Misinformation/infodemic risk (5; 8.3%).
• Over-detailed/excessive content (5; 8.3%).
• Copyright issues (4; 6.7%).
• Lack of originality (4; 6.7%).
- Additional findings:
• One-third of the included records were preprints (n=20), with medRxiv (n=6; 30.0%) and SSRN and arXiv (n=4; 20.0% each) the most common sources.
• Authorship: Current ICMJE/COPE guidelines do not support listing ChatGPT as an author; some early instances occurred but major journals disallow it.
• Evidence of hallucination and citation fabrication in case studies and evaluations; necessity of human expert oversight is emphasized.
• Demonstrated applications include drafting discharge summaries and aiding radiologic decision-making with moderate accuracy; educational performance includes passing thresholds on USMLE-related assessments but variable topic performance.
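The reported proportions can be re-derived from the raw counts (n = 60 included records); a brief sketch, in which the category labels are shorthand rather than the review's exact wording:

```python
# Re-derive the reported percentages from the counts listed above.
TOTAL = 60  # records included in the final synthesis
counts = {
    "any benefit": 51, "academic writing": 31, "research": 20,
    "practice": 14, "education": 7, "free availability": 2,
    "any concern": 58, "ethics": 33, "bias": 18, "plagiarism": 14,
}
pct = {k: round(100 * v / TOTAL, 1) for k, v in counts.items()}

# Matches the figures reported in the review.
assert pct["any benefit"] == 85.0 and pct["any concern"] == 96.7
assert pct["academic writing"] == 51.7 and pct["ethics"] == 55.0
print(pct["any benefit"], pct["any concern"])  # → 85.0 96.7
```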
Discussion
The review addressed whether and how ChatGPT can be useful across healthcare education, research, and practice, and delineated its accompanying risks. Findings show widespread recognition of potential benefits: in academic writing and research, ChatGPT can accelerate literature reviews, generate code, improve language quality (especially aiding non-native English speakers), and free researchers to focus on experimental design. In clinical practice, it may streamline workflows (e.g., discharge summaries), support diagnostics and risk prediction, contribute to personalized medicine, and improve public health literacy. In education, it can generate realistic clinical vignettes, offer personalized learning with immediate feedback, and support group learning, with evidence of adequate performance on some professional examinations.
At the same time, its limitations directly affect reliability and safety: risks of superficial, inaccurate, or fabricated content, hallucinations, biased outputs reflecting training data, lack of transparency and interpretability, citation fabrication and referencing inaccuracies, limited up-to-date knowledge, and reproducibility issues. In healthcare settings, these limitations can lead to harmful consequences, complicate accountability and medico-legal responsibility, and challenge data governance and privacy. In education, the potential for plagiarism and academic dishonesty necessitates rethinking assessment methods toward critical and problem-based thinking. The review underscores that listing ChatGPT as a scientific author is inappropriate under current ICMJE/COPE standards; instead, transparent disclosure of LLM use in the methods or acknowledgments is recommended. Overall, the results support a cautious, human-in-the-loop approach: harness the benefits while instituting safeguards, transparency, and ethical guidelines to mitigate risks, ensure quality, and maintain human oversight and expertise.
Conclusion
The widespread adoption of LLMs, including ChatGPT, in healthcare education, research, and practice is likely inevitable. To ensure safe and responsible use, urgent development of appropriate guidelines and regulations—engaging all relevant stakeholders—is required. A proactive, ethical embrace of these technologies, coupled with rigorous oversight, can limit future complications, expedite innovation, and promote equity and diversity in research by overcoming language barriers. Human-in-the-loop paradigms ("ChatGPT in the Loop: Humans in Charge") should guide deployment, recognizing the indispensable role of human knowledge and expertise. Before broad adoption in healthcare, real-world evaluations, ideally risk-based, are needed to assess impact and prevent misuse-driven harms.
Limitations
The review’s findings should be interpreted in light of several limitations: (1) the variable quality of included records, which affects generalizability; (2) the exclusion of non-English records, introducing potential selection bias; (3) the exclusion of some subscription-based records with inaccessible full texts, possibly missing relevant data; (4) the inclusion of preprints that had not been peer reviewed, potentially compromising reliability; (5) a rapidly evolving literature, with the search concluded on 16 February 2023, necessitating future updates; and (6) single-author screening and interpretation, which may introduce subjectivity; future systematic reviews should adopt collaborative approaches to improve quality and credibility.