'Do we really want it just to pass?': Study finds ChatGPT fails gastroenterology training exam

ChatGPT failed to pass the 2021 and 2022 self-assessment tests for the American College of Gastroenterology, a new study found.

Published earlier this week in the American Journal of Gastroenterology, the study found the tool failed both multiple-choice tests, which serve as a barometer for how a test-taker would fare on the American Board of Internal Medicine Gastroenterology board exam.

ChatGPT is a large language model (LLM): a 175-billion-parameter natural language processing model trained to predict word sequences based on context and generate human-like text in response to user prompts.
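
For readers unfamiliar with the mechanism, next-word prediction can be demonstrated in a few lines of code. The sketch below is illustrative only: it uses the small, openly downloadable GPT-2 model via the Hugging Face transformers library as a stand-in, since ChatGPT itself cannot be run locally, and the medical prompt is an invented example.

```python
# Illustrative sketch of next-word prediction, the mechanism described
# above. GPT-2 stands in for ChatGPT, which is not openly downloadable.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The most common cause of peptic ulcer disease is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one score per vocabulary token, per position

# The final position's scores rank candidate next words given the context.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
```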

The GPT-3 and GPT-4 versions of ChatGPT, both with training data extending only through 2021, scored below the 70% required to pass the exam. For ChatGPT to become a reliable and widely accepted education tool, it should consistently provide more than 95% accuracy, the authors wrote.

The tests included hundreds of questions with real-time feedback on the correct answer. Each multiple-choice question was copied and pasted directly into ChatGPT, and a corresponding answer was selected on the online test based on the tool's response. The researchers found no pattern in the types of questions it answered incorrectly.
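
The researchers did this by hand, but the workflow is straightforward to picture in code. The sketch below is a hypothetical reconstruction using OpenAI's public chat API, not the authors' actual method or materials; the sample question, answer key and letter-matching rule are all invented for illustration.

```python
# Hypothetical sketch of the study's grading workflow: feed each
# multiple-choice question to the model, record its pick, and score it
# against the answer key. The question below is invented for illustration.
from openai import OpenAI  # assumes the openai Python package, v1 or later

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    {
        "stem": "Which organism is most closely associated with peptic ulcer disease?",
        "choices": {"A": "H. pylori", "B": "E. coli", "C": "C. difficile", "D": "S. aureus"},
        "answer": "A",
    },
    # ...the remaining exam questions would follow the same shape
]

correct = 0
for q in questions:
    prompt = q["stem"] + "\n" + "\n".join(f"{k}. {v}" for k, v in q["choices"].items())
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer with the single letter of the best choice."},
            {"role": "user", "content": prompt},
        ],
    ).choices[0].message.content.strip()
    if reply and reply[0].upper() == q["answer"]:  # naive letter-matching rule
        correct += 1

print(f"Score: {correct / len(questions):.0%} (70% needed to pass)")
```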

The AI tool made waves in healthcare when a study found it passed the U.S. Medical Licensing Exam. Since then, some advocates have called for multidisciplinary training that incorporates AI to keep up with the changing medical landscape, the study noted.

“There’s been an increased use of ChatGPT in every field, but in medicine, we noticed more and more people using it,” Arvind Trindade, M.D., associate professor at The Feinstein Institutes for Medical Research and senior author of the study, told Fierce Healthcare. Both trainees and patients have been seen using it, he said, so the authors wanted to put it to the test. 

“We actually thought it was going to do pretty well,” Trindade said. “When we looked at the final results we were a bit surprised.” 

ChatGPT was never specifically trained on medical literature, and its training data extends only through 2021, the study explained. Most of that data came from openly available sources; answering certain questions on the gastroenterology exam correctly may have required access to paid journal subscriptions or databases.

"Based on our research, ChatGPT should not be used for medical education in gastroenterology at this time and has a ways to go before it should be implemented into the health care field," Trindade said.

Especially tricky is that ChatGPT answers confidently even when it is wrong, so confidently that its response might be mistaken for truth.

“You don’t want to learn the wrong information,” Trindade said. “This is far from optimized for medical usage.” 

A notable difference between the gastroenterology exam and the medical licensing exam is that the latter has a lower passing threshold. There may also be less publicly available material for answering gastroenterology-specific questions accurately.

Regardless, even on the exams the tool did pass, it squeaked by rather than scoring high. “Do we really want it just to pass?” Trindade said.

Andrew Yacht, M.D., senior vice president of academic affairs and chief academic officer at Northwell Health, said the study is a "reminder that, at least for now, nothing beats hitting time-tested resources like books, journals and traditional studying to pass those all-important medical exams."

While medical schools can’t enforce where students get their medical information, there are recommended sources, Trindade said, including medical guidelines, journals and databases. If it is to be used for medical education, future versions of ChatGPT should be trained on the latest medical guidelines and actively updated.