Chatbots outperformed doctors in answering patient questions on quality and empathy: JAMA study

Virtual patient communication and documentation have been lauded as some of the immediate potential applications of generative AI large language models in healthcare.

A study published in JAMA Internal Medicine late last week revealed that when it comes to more nuanced tasks like answering patient questions, responses generated by AI-based chatbots were typically longer, higher in quality and more empathetic than those from physicians.

Across 195 patient questions and responses, the study found that a team of licensed healthcare professionals preferred chatbot responses to physicians' responses in 78.6% of the 585 evaluations. Evaluators found the ChatGPT responses were often superior to physician responses in both quality and empathy.

The questions were drawn at random from public social media forum posts and were judged by evaluators on “the quality of the information provided” and “the empathy or bedside manner provided.” The study suggested that chatbots could be used as an aid to physicians rather than a replacement when it comes to answering patient questions.

“ChatGPT represents a new generation of AI technologies driven by advances in large language models,” the analysis said. “ChatGPT reached 100 million users within 64 days of its November 30, 2022 release and is widely recognized for its ability to write near-human-quality text on a wide range of topics. The system was not developed to provide health care, and its ability to help address patient questions is unexplored.”

The study opened with the assertion that the COVID-19 pandemic expedited the adoption of virtual healthcare, driving a 1.6-fold increase in electronic messages from patients, each of which adds an average of 2.3 minutes of work to physicians’ already packed schedules. Additional messaging volume, the analysis said, predicts increased burnout for clinicians, with 62% of physicians already reporting at least one symptom of burnout.
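To put those figures in perspective, here is a rough back-of-the-envelope sketch of the added inbox time. The 1.6-fold increase and the 2.3 minutes per message come from the article; the baseline of 100 patient messages per physician per week is a hypothetical assumption used only for illustration.

```python
# Back-of-the-envelope sketch of the extra inbox workload described in the study.
# The 1.6x increase and 2.3 minutes per message come from the article; the
# baseline of 100 messages per physician per week is a hypothetical assumption.

baseline_messages_per_week = 100      # hypothetical assumption, not from the article
growth_factor = 1.6                   # reported increase in patient messages
minutes_per_message = 2.3             # reported added work per message

new_messages_per_week = baseline_messages_per_week * growth_factor
extra_messages = new_messages_per_week - baseline_messages_per_week
extra_hours_per_week = extra_messages * minutes_per_message / 60

print(f"Extra messages per week: {extra_messages:.0f}")
print(f"Extra inbox time per week: {extra_hours_per_week:.1f} hours")
# With these assumptions: 60 extra messages ≈ 2.3 hours of additional work per week.
```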

Physician responses averaged 52 words, while chatbot responses averaged 211. On quality, the chatbot’s responses received “good” or “very good” ratings 3.6 times more often than physicians’ responses did; they were rated “empathetic” or “very empathetic” 9.8 times more often.
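For readers who want to check the arithmetic, the sketch below ties the headline numbers together. The 195 questions, three evaluators per question, the 78.6% preference rate and the 3.6x and 9.8x ratios come from the article; the rating proportions used to show how such ratios arise are hypothetical placeholders, not the study’s published percentages.

```python
# Minimal sketch of the study's headline arithmetic.
# From the article: 195 questions, 3 evaluators each, 78.6% preference for the
# chatbot, and 3.6x / 9.8x rating ratios. The rating shares below are
# illustrative placeholders, NOT the study's published figures.

questions = 195
evaluators_per_question = 3
total_evaluations = questions * evaluators_per_question      # 585 evaluations
chatbot_preferred = round(total_evaluations * 0.786)         # ~460 evaluations

def prevalence_ratio(chatbot_share: float, physician_share: float) -> float:
    """How many times more often chatbot responses earned a given rating."""
    return chatbot_share / physician_share

# Hypothetical shares chosen only to show how 3.6x and 9.8x ratios arise.
quality_ratio = prevalence_ratio(0.72, 0.20)    # -> 3.6
empathy_ratio = prevalence_ratio(0.49, 0.05)    # -> 9.8

print(f"Total evaluations: {total_evaluations}, preferring chatbot: {chatbot_preferred}")
print(f"Quality ratio ≈ {quality_ratio:.1f}, empathy ratio ≈ {empathy_ratio:.1f}")
```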

Posts were drawn from Reddit’s r/AskDocs forum in October 2022. The authors wrote that a cross-sectional study using patient questions from a healthcare system was not feasible because of compliance with the Health Insurance Portability and Accountability Act, but that such an analysis could become possible with de-identified data.

“Media reports suggest that physicians are already integrating chatbots into their practices without evidence,” the analysis said. “For reasons of need, practicality and to empower the development of a rapidly available and sharable database of patient questions, we collected public and patient questions and physician responses posted to an online social media forum, Reddit’s r/AskDocs.”

Reddit’s online forum, r/AskDocs, boasts roughly 474,000 members who can post questions to be answered by verified healthcare professionals. Each question was evaluated by three healthcare professionals working in pediatrics, geriatrics, internal medicine, oncology, infectious disease or preventive medicine.

Positioning its chatbot Med-PaLM 2 as a medically focused competitor to ChatGPT in healthcare, Google announced on April 13 that it would begin testing the model in healthcare settings. The bot showed substantial improvements over its first iteration in its ability to answer questions on the U.S. and Indian medical licensing exams.

According to Google Health AI research lead and physician Alan Karthikesalingam, M.D., Ph.D., Med-PaLM 2 reached 85.4% accuracy on medical exam questions, compared with 67.2% for the older model. On patient questions, Med-PaLM 2 received mixed reviews on the thoroughness of its responses.

“If you can build a system that can answer questions well, that's a good basis upon which to probe what it knows,” Karthikesalingam told Fierce Healthcare. When it came to stepping up the complexity of questions to something like “medical reasoning,” he said Med-PaLM 2 “was still inferior to human physicians on some important tasks. We think that's a very important finding. There's still room to improve.”

Karthikesalingam said that, on occasion, the bot includes extraneous information that is not necessary to answer a patient's question. Med-PaLM 2 was released to a limited group of users to explore and provide feedback on applications such as buttressing patient portal digital assistants.

"It can be quite overwhelming as a doctor just to get the right information and spend time with your patients," Karthikesalingam said. "Technologies like natural language processing offer this great potential of giving people back the gift of time by summarizing, giving people the right content and putting it nicely in the way you want."