J Dent Hyg Sci.  2024 Mar;24(1):62-70. 10.17135/jdhs.2024.24.1.62.

Performance of ChatGPT on the Korean National Examination for Dental Hygienists

Affiliations
  • 1Department of Dental Hygiene, College of Dentistry, Gangneung-Wonju National University, Gangneung 25457, Korea
  • 2Research Institute of Dental Hygiene Science, Gangneung-Wonju National University, Gangneung 25457, Korea
  • 3Research Institute of Oral Science, Gangneung-Wonju National University, Gangneung 25457, Korea

Abstract

Background
This study aimed to evaluate ChatGPT’s performance accuracy in responding to questions from the national dental hygienist examination. Moreover, through an analysis of ChatGPT’s incorrect responses, this research intended to pinpoint the predominant types of errors.
Methods
To evaluate ChatGPT-3.5’s performance according to question type, the researchers classified the 200 questions of the 49th Korean National Dental Hygienist Examination into recall, interpretation, and problem-solving types. The questions were strategically modified to counteract potential misunderstandings arising from implied meanings or Korean technical terminology. To assess ChatGPT-3.5’s ability to apply previously acquired knowledge, the questions were first presented in an open-ended (subjective) format; if ChatGPT-3.5 generated an incorrect response, the original multiple-choice format was then provided. All 200 questions were input into ChatGPT-3.5, and the generated responses were analyzed. The researchers then evaluated the accuracy of each response according to question type and categorized the incorrect responses by error type (logical, information, or statistical error). Finally, a response was classified as a hallucination when ChatGPT presented false information as if it were true.
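The grading procedure described above reduces to a simple tally: per-type accuracy plus a count of error categories over the incorrect responses. A minimal sketch of that bookkeeping is shown below; the field names and sample records are illustrative assumptions, not data from the study.

```python
from collections import Counter

def summarize(records):
    """Return per-question-type accuracy (%) and a Counter of error categories.

    Each record is assumed to carry: "type" (recall / interpretation /
    solving), "correct" (bool), and "error" (logical / information /
    statistical, or None when the answer was correct).
    """
    totals, correct, errors = Counter(), Counter(), Counter()
    for r in records:
        totals[r["type"]] += 1
        if r["correct"]:
            correct[r["type"]] += 1
        else:
            errors[r["error"]] += 1
    accuracy = {t: round(100 * correct[t] / totals[t], 1) for t in totals}
    return accuracy, errors

# Illustrative sample, not the study's data.
sample = [
    {"type": "recall",  "correct": True,  "error": None},
    {"type": "recall",  "correct": False, "error": "logical"},
    {"type": "solving", "correct": False, "error": "information"},
    {"type": "solving", "correct": False, "error": "logical"},
]
acc, errs = summarize(sample)
# acc: {"recall": 50.0, "solving": 0.0}; errs: logical=2, information=1
```

The same tally, run over the researchers' 200 graded responses, would yield the per-type accuracy rates and error-type distribution reported in the Results.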
Results
ChatGPT’s responses to the national examination were 45.5% accurate overall. Accuracy by question type was 60.3% for recall questions and 13.0% for problem-solving questions. For problem-solving questions, accuracy was 13.0% in the open-ended (subjective) format but increased to 43.5% in the multiple-choice (objective) format. The most common type of incorrect response was the logical error, accounting for 65.1% of all errors. Of the 102 incorrectly answered questions, 100 were categorized as hallucinations.
Conclusion
ChatGPT-3.5 showed limited ability to provide evidence-based correct responses to the Korean national dental hygienist examination. Therefore, dental hygienists in education or clinical practice should use artificial intelligence-generated materials cautiously and with a critical eye.

Keywords

Artificial intelligence; ChatGPT; Dental hygiene; Large language models; National examination