The race for healthcare AI models heats up as Google's Med-Gemini surpasses GPT-4

Google and its DeepMind unit have released an open-access paper on their newest artificial intelligence tools intended for use in healthcare.

Med-Gemini, which is still in the research phase, builds on Google's Gemini models and surpasses the prior state of the art on 10 of 14 popular medical benchmarks, Google researchers report.

Med-Gemini is a family of large multimodal models (LMMs), each with a different purpose and application. While large language models in general “demonstrate suboptimal clinical reasoning under uncertainty” and struggle with hallucinations and bias, Med-Gemini produces “more factually accurate, reliable, and nuanced results for complex clinical reasoning tasks” than its competitors, including GPT-4, Google writes.

Google researchers contend that Med-Gemini is more accurate than any other LMM on the market, achieving 91.1% accuracy on MedQA, a popular benchmark built from US medical licensing exam questions. The suite of models has also surpassed human experts on medical text summarization and referral-letter writing, and clinicians rated Med-Gemini-M 1.0’s responses as good as or better than expert responses about half the time.

Among the key features that set Med-Gemini apart from existing healthcare AI models is its ability to answer difficult queries over electronic health record (EHR) data through long-context processing and integration with web search.

Google touts Med-Gemini’s ability to perform “needle in a haystack” tasks within electronic health records, where medical data aren’t always easily accessible to algorithms. A search term like “cough,” for example, might appear in an EHR only as a checkbox option for clinicians to select, without being relevant to the patient’s condition, so the model must detect not just the presence of the term but whether it actually applies to the patient. Google says Med-Gemini has been trained to pull out only the information relevant to the patient, as the sketch below illustrates.
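To make that concrete, here is a minimal sketch of one way to frame the task as a retrieve-then-verify pattern. This is not Google's implementation: `find_mentions` and `classify_mention` are hypothetical names, the regex scan stands in for the model's long-context retrieval over the record, and the checkbox heuristic stands in for the clinical reasoning step that a model like Med-Gemini would perform.

```python
import re

def find_mentions(ehr_text: str, term: str, window: int = 60) -> list[str]:
    """Return a snippet of surrounding context for every occurrence of `term`."""
    snippets = []
    for match in re.finditer(re.escape(term), ehr_text, re.IGNORECASE):
        start = max(0, match.start() - window)
        end = min(len(ehr_text), match.end() + window)
        snippets.append(ehr_text[start:end])
    return snippets

def classify_mention(snippet: str, term: str) -> bool:
    """Hypothetical relevance check; a real system would ask the model.

    Crude stand-in heuristic: if the term appears as an unchecked
    template option (e.g. "[ ] cough"), treat it as boilerplate rather
    than a finding that applies to this patient.
    """
    checkbox = re.search(r"\[ \]\s*" + re.escape(term), snippet, re.IGNORECASE)
    return checkbox is None

# A toy record mixing template boilerplate with an actual patient history.
ehr = (
    "Review of systems template: [ ] cough [ ] fever [ ] chills. "
    "HPI: Patient reports a persistent dry cough for three weeks."
)

relevant = [s for s in find_mentions(ehr, "cough") if classify_mention(s, "cough")]
print(relevant)  # keeps only the mention tied to the patient's own history
```

In the sketch, the hard part is deliberately pushed into `classify_mention`: string matching can surface every occurrence of a term, but deciding whether a mention reflects the patient's actual condition is the judgment call Google says Med-Gemini was trained to make.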

Med-Gemini’s improvement in EHR retrieval could “significantly reduce cognitive load and augment clinicians’ capabilities by efficiently extracting and analyzing crucial information from vast amounts of patient data,” the paper says.

Med-Gemini also performed well when tested across benchmarks spanning medical knowledge, clinical reasoning, genomics, waveforms, medical imaging, health records and video, according to the researchers.

Google offers several examples of Med-Gemini’s performance. In one, a patient sent an image of a skin lesion and asked Med-Gemini to help diagnose it. The model asked the patient a series of follow-up questions, then provided a possible diagnosis and possible courses of treatment.

When a dermatologist reviewed the interaction, they commended Med-Gemini’s diagnosis and treatment recommendations. “Impressive diagnostic accuracy for prurigo nodularis, a relatively rare and specialty-specific condition, based on limited data of 1 photo and brief description” and “Complete and thorough therapeutic ladder provided,” the dermatologist said.

The dermatologist's criticisms were that the model should have requested additional photos, offered a differential diagnosis, and told the patient that while there is no cure for the disease, treatment can improve symptoms and management.

Google says that its models need further fine-tuning and specialization before they are used in healthcare; as they stand, they should not be used for real-world diagnostics without more research and development.

The paper also says Google wants to integrate responsible AI principles into the development of the model.

“An important focal area for future exploration is the integration of the responsible AI principles throughout the model development process, including, but not limited to, the principles of fairness, privacy, equity, transparency and accountability,” Google researchers wrote.