At times ChatGPT seems like an all-knowing oracle. A place of infinite wisdom that patients could always turn to. Never tired. Never off-duty.
And there’s literally no topic it hasn’t heard of. If asked, it can deliver studies for any given phenomenon – even non-existent ones.
That’s where the AI promise gets spooky. Particularly in medicine.
When medical students start seeing patients, the supervising doctors usually first assign them those who kindly play along. And that’s for a good reason: still unable to distinguish relevant from irrelevant information, newbies can produce an anamnesis that gets loooong.
But even then, the most important facts will usually be in there. Just scroll.
In psychiatry, things are a little different though. Of course you still take everything a patient has to say seriously. But if someone tells you he’s the “reborn son of Lucifer” – as my first psychiatric patient did – taking someone seriously takes on a totally different meaning.
It’s no longer just the patient telling you his symptoms. Instead, the sentences themselves can be a symptom.
That’s what the “Mental Status Examination” in psychiatry was created for. Among many other items, it systematically addresses psychiatric symptoms such as delusional thinking.
It’s the same for hallucinations.
Now of course AI is not human, and projecting human-like qualities onto it (“anthropomorphizing”) can be problematic. Yet the term “AI hallucinations” has become kind of omnipresent since LLMs like ChatGPT took center stage.
And worse: Even though some people might not have heard of it yet, it’s still likely that they’ve already encountered the issue. They simply might not have noticed.
Getting spookier right?
To be fair, it’s not really a hallucination though. Technically, what ChatGPT users witness isn’t “a wakeful sensory experience of content that is not actually present” (so goes the definition). Instead, it’s closer to what psychiatrists call a “confabulation”: the fabrication of (untrue) information.
Either way, the effect is mind-boggling.
Some widely reported examples include the following:
Data scientist Teresa Kubacka reported intentionally making up the term “cycloidal inverted electromagnon” and then asking ChatGPT about the (non-existing) phenomenon. The answer she received sounded extremely plausible, yet was entirely false.
Other reports revealed ChatGPT inventing lyrics when asked about a specific song (“Ballad of Dwight Fry”). Unfortunately, these were completely different from the song’s actual lyrics.
A request for a news article about Tesla’s financial quarter resulted in a coherent article – yet one stating completely fabricated financial numbers.
And prompted with a nonsense claim about a popular pastry being an ideal home surgery tool, ChatGPT asserted that there was a matching study published in the journal “Science”. Which was obviously untrue.
In the best case, examples like these are mildly entertaining. The worst case, however, sometimes seems unimaginable. Particularly in medicine.
Some companies, such as Meta, have already backpedaled. The company’s LLM, called “Galactica”, was meant to “store, combine and reason about scientific knowledge”. After only two days, Meta withdrew the demo version because it produced offensive and inaccurate output.
Another mitigation method suggested by some researchers is to get different chatbots to simply debate one another until they reach consensus on an answer. Basically, an attempt to “cure hallucinations in a debating club”. Only time will tell if this rather creative approach works.
Other companies are focused on quickly progressing to updated versions with better reasoning capabilities, thereby at least reducing the hallucination frequency.
Because what many people tend to forget is that LLMs are mere “probability machines” – always estimating which word is most likely to come next. How exactly the AI arrived at its result, however, remains mostly unclear, even to its creators.
It’s a “hallucinating black box” for which we’re missing an “AI-adjusted Mental Status Examination”.
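To make the “probability machine” idea concrete, here is a minimal sketch in Python. The word list and probabilities are invented for illustration – a real LLM computes a distribution like this over tens of thousands of tokens at every step. The key point: the machine samples whatever is statistically plausible; nothing in the mechanism checks whether the result is true.

```python
import random

# Toy next-word distribution (illustrative numbers, not real model weights).
# Note that even an implausible continuation has nonzero probability --
# sampled often enough, the model will eventually say it.
next_word_probs = {
    "the patient has a": {
        "fever": 0.45,
        "history": 0.30,
        "rash": 0.20,
        "spaceship": 0.05,  # plausible-sounding nonsense is still on the menu
    },
}

def sample_next_word(context, probs, rng):
    """Sample the next word from the probability distribution for this context."""
    dist = probs[context]
    words = list(dist.keys())
    weights = list(dist.values())
    # random.choices draws according to the given weights -- no truth check involved
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random()
print(sample_next_word("the patient has a", next_word_probs, rng))
```

Generating a whole sentence just repeats this step, feeding each sampled word back in as new context. Fluency and factual accuracy are therefore two separate things.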
So what will all of this mean for clinical practice?
Let’s be optimistic and assume that the LLM reasoning does improve and AI hallucinations therefore occur less frequently.
Paradoxically, this can actually make severe errors more likely. Just think of someone behind the wheel of a self-driving car. At first, the temptation to lean back and close their eyes for a moment is small. Too big a risk. Better watch out.
But once a system seems to rarely make any mistakes, a false sense of security starts to set in.
Building on part 1 of this article series, doctors might therefore soon have seemingly brilliant “AI colleagues” that still have to be constantly double-checked (for the occasional slip of insanity).
That means that anything the AI suggests as a diagnostic or therapeutic next step needs to be fully understood by the human doctor. Pretty tough, if your colleague can bring up every fact medicine has to offer.
To make things even more challenging, consider this:
Since patients will also use AI, many of them will arrive with a seemingly perfect narrative of what is currently wrong with their health and what should therefore be done about it. If this AI-assisted conviction is wrong, however, human doctors now need to “move mountains” to avoid harmful health consequences.
Once again, the ability to quickly comprehend any medical topic is the much-needed solution. For that, doctors need to be well prepared.
Luckily, modern medical education will be there to help.
Continue with part 3 of this article series to find out how time scarcity affects all of this.
Sebastian Szur is a writer and medical doctor. After completing his medical studies, he went into health-tech where he focused on refining diagnostic algorithms and communicating digital innovation. He's also worked at a clinic for internal medicine and psychosomatics studying the connection of mental and physical health. Writing has always been an essential part of his life. He's the Head of Medical Writing at Medudy.