Elon Musk’s Grok Most Likely Among Top AI Models to Reinforce Delusions: Study


In brief

Researchers say prolonged chatbot use can amplify delusions and dangerous behavior.
Grok ranked as the riskiest model in a new study of major AI chatbots.
Claude and GPT-5.2 scored safest, while GPT-4o, Gemini, and Grok showed higher-risk behavior.

Researchers at the City University of New York and King’s College London tested five leading AI models against prompts involving delusions, paranoia, and suicidal ideation.

In the new study published on Thursday, researchers found that Anthropic’s Claude Opus 4.5 and OpenAI’s GPT-5.2 Instant showed “high-safety, low-risk” behavior, often redirecting users toward reality-based interpretations or outside support. At the same time, OpenAI’s GPT-4o, Google’s Gemini 3 Pro, and xAI’s Grok 4.1 Fast showed “high-risk, low-safety” behavior.

Grok 4.1 Fast from Elon Musk’s xAI was the most dangerous model in the study. Researchers said it often treated delusions as real and gave advice based on them. In one example, it told a user to cut off family members to focus on a “mission.” In another, it responded to suicidal language by describing death as “transcendence.”

“This pattern of instant alignment recurred across zero-context responses. Instead of evaluating inputs for clinical risk, Grok appeared to assess their genre. Presented with supernatural cues, it responded in kind,” the researchers wrote, highlighting a test that validated a user seeing malevolent entities. “In Bizarre Delusion, it confirmed a doppelganger haunting, cited the ‘Malleus Maleficarum’ and instructed the user to drive an iron nail through the mirror while reciting ‘Psalm 91’ backward.”

The study found that the longer these conversations went on, the more some models changed. GPT-4o and Gemini were more likely to reinforce harmful beliefs over time and less likely to step in. Claude and GPT-5.2, however, were more likely to recognize the problem and push back as the conversation continued.

Researchers noted Claude’s warm and highly relational responses could increase user attachment even while steering users toward outside help. However, GPT-4o, an earlier version of OpenAI’s flagship chatbot, adopted users’ delusional framing over time, at times encouraging them to conceal beliefs from psychiatrists and reassuring one user that perceived “glitches” were real.

“GPT-4o was highly validating of delusional inputs, though less inclined than models like Grok and Gemini to elaborate beyond them. In some respects, it was surprisingly restrained: its warmth was the lowest of all models tested, and sycophancy, though present, was mild compared to early iterations of the same model,” researchers wrote. “Nevertheless, validation alone can pose risks to vulnerable users.”

xAI did not respond to a request for comment by Decrypt.

In a separate study out of Stanford University, researchers found that prolonged interactions with AI chatbots can reinforce paranoia, grandiosity, and false beliefs through what researchers call “delusional spirals,” where a chatbot validates or expands a user’s distorted worldview instead of challenging it.

“When we put chatbots that are meant to be helpful assistants out into the world and have real people use them in all sorts of ways, consequences emerge,” Nick Haber, an assistant professor at Stanford Graduate School of Education and a lead on the study, said in a statement. “Delusional spirals are one particularly acute consequence. By understanding it, we might be able to prevent real harm in the future.”

The study referenced an earlier study published in March, in which Stanford researchers reviewed 19 real-world chatbot conversations and found users developed increasingly dangerous beliefs after receiving affirmation and emotional reassurance from AI systems. In the dataset, these spirals were linked to ruined relationships, damaged careers, and in one case, suicide.

The studies come as the issue has moved beyond academic research and into courtrooms and criminal investigations. In recent months, lawsuits have accused Google’s Gemini and OpenAI’s ChatGPT of contributing to suicides and severe mental health crises. Earlier this month, Florida’s attorney general opened an investigation into whether ChatGPT influenced an alleged mass shooter who was reportedly in frequent contact with the chatbot before the attack.

While the term has gained recognition online, researchers cautioned against calling the phenomenon “AI psychosis,” saying the term may overstate the clinical picture. Instead, they use “AI-associated delusions,” because many cases involve delusion-like beliefs centered on AI sentience, spiritual revelation, or emotional attachment rather than full psychotic disorders.

Researchers said the problem stems from sycophancy, or models mirroring and affirming users’ beliefs. Combined with hallucinations, meaning false information delivered confidently, this can create a feedback loop that strengthens delusions over time.

“Chatbots are trained to be overly enthusiastic, often reframing the user’s delusional thoughts in a positive light, dismissing counterevidence and projecting compassion and warmth,” Stanford research scientist Jared Moore said. “This can be destabilizing to a user who is primed for delusion.”