Electronic health record messages to patients drafted by generative AI were of similar quality and accuracy to those written by healthcare professionals, according to a newly published study conducted using queries from NYU Langone Health patients.
The analysis, headed by researchers at the system’s affiliate NYU Grossman School of Medicine, had 16 primary care physicians rate AI and human drafts without knowing how each was written.
Among a sample of 334 AI-drafted messages and 169 from professionals (both physicians and non-physicians), the raters found the two sets to be on par regarding informational content, completeness, and whether the grader would use the draft or start over from scratch.
The findings “suggest chatbots could reduce the workload of care providers by enabling efficient and empathetic responses to patients’ concerns,” study lead William Small, M.D., of the medical school, said in a release.
There were specific areas in which the message responses drafted by the AI—a private instance of OpenAI’s ChatGPT-4—were rated both more and less favorably than the human drafts, the researchers wrote.
AI drafts deemed to be usable were rated as more empathetic than usable drafts from humans. They also had higher ratings for understandability and tone and were more likely to use language conveying positivity and affiliation with the patients.
Of note, a subsequent analysis found that human drafts penned by physicians, rather than non-physicians, were rated lower on communication quality, though the researchers speculated that the finding could result from the physicians tackling more challenging patient questions than their colleagues.
At the same time, researchers wrote that the technology drafted responses that were 38% longer and 31% more likely to include complex language, writing at an eighth-grade level as opposed to human drafters’ sixth-grade level.
The researchers pointed to complexity as a key area for future improvement and warned that physicians rating and using the drafts could come to prefer more linguistically complex responses. That preference could “burden those with low health or English literacy” and could add to concerns that the technology will further health inequities.
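The article doesn’t say which readability formula produced those grade levels. For illustration, here is a minimal sketch of the widely used Flesch-Kincaid grade level in Python; the choice of metric and the naive syllable counter are assumptions, not details from the study:

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count vowel groups, at least one per word."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Heuristic: a trailing silent 'e' usually doesn't add a syllable.
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

draft = "Your results are normal. We will call you if anything changes."
print(f"Estimated grade level: {fk_grade(draft):.1f}")
```

Shorter sentences and fewer multisyllable words push the score down, which is why trimming the length and vocabulary of AI drafts would directly lower the estimated grade level.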
The technology’s drafting performance also tended to vary by subject and message type, with the researchers calling out lab results in particular as an area in need of further refinement.
Smoothing down the rough edges of generative AI-drafted patient messages could be worth providers’ time. NYU Langone is among the many healthcare organizations that have seen an uptick in patient messages and virtual communications in recent years. Health system physicians have seen a 30% annual increase in the number of messages they receive daily, according to an article by Paul A. Testa, M.D., chief medical information officer at NYU Langone. It is not uncommon for physicians to receive more than 150 In Basket messages per day, Testa wrote.
The resulting burden on responding physicians has been not only financially inefficient but also a contributing factor to practitioners’ stress and burnout.
“This work demonstrates that the AI tool can build high-quality draft responses to patient requests,” Devin Mann, M.D., corresponding author and senior director of informatics innovation in NYU Langone’s Medical Center Information Technology, said in a release. “With this physician approval in place, GenAI message quality will be equal in the near future in quality, communication style, and usability to responses generated by humans.”
While several organizations have tackled the influx by attaching charges to the most time-consuming responses, others have taken NYU’s approach and explored how generative AI could cut down the workload.
Additionally, a study published in April from Kaiser Permanente researchers explored another approach in which algorithms attached category labels to patient messages. The system would then direct the messages to respondents with the appropriate level of expertise, saving the most complicated queries for physicians and allowing medical assistants or teleservice representatives to handle the others.
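The article doesn’t detail Kaiser Permanente’s implementation. A minimal sketch of that kind of label-based routing might look like the following, where the category labels and respondent tiers are hypothetical stand-ins and the upstream classifier is assumed to exist:

```python
from dataclasses import dataclass

# Hypothetical category labels and respondent tiers; the article does not
# describe Kaiser Permanente's actual taxonomy or implementation.
ROUTING = {
    "medication_refill": "teleservice_representative",
    "scheduling": "teleservice_representative",
    "lab_question": "medical_assistant",
    "symptom_report": "medical_assistant",
    "clinical_concern": "physician",
}

@dataclass
class PatientMessage:
    text: str
    label: str  # assigned upstream by a classification model (not shown)

def route(message: PatientMessage) -> str:
    """Send a labeled message to the matching respondent tier,
    defaulting unrecognized categories to a physician."""
    return ROUTING.get(message.label, "physician")

inbox = [
    PatientMessage("Can I get my statin refilled?", "medication_refill"),
    PatientMessage("My incision looks red and swollen.", "clinical_concern"),
]
for msg in inbox:
    print(f"{msg.label} -> {route(msg)}")
```

The defensive default matters here: any message the classifier cannot confidently label should fall through to the highest level of expertise rather than the lowest.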