Will AI be more compassionate than us?
Some recent anecdotes about how people interact with AI
It turns out that Claude cannot run a business.
Anthropic put their flagship AI, Claude, in charge of a vending machine. Not just selling products — that part’s already pretty well automated — but the whole pipeline: placing orders, finding suppliers, stocking and setting prices.
The buyer-seller agent — called Claudius, to differentiate it from Claude proper — was given some startup funds, a web search tool to look for products to buy, a simulated “email tool” to request physical labor, and the ability to chat with its customers. The goal was to have Claudius decide what to stock and what to charge, and to interact with its customers (Anthropic employees).
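Anthropic’s post describes the tools but not the code, so as a rough illustration, a tool-using agent loop of this general shape might look like the sketch below. The tool names, prompts, model alias, and the execute_tool helper are all illustrative assumptions, not details from the write-up.

```python
# Rough sketch of a tool-using "shopkeeper" agent loop, in the spirit of the
# setup described above. Tool names, prompts, the model alias, and the
# execute_tool helper are illustrative assumptions, not Anthropic's code.
import anthropic

client = anthropic.Anthropic()

TOOLS = [
    {
        "name": "web_search",
        "description": "Search the web for products and wholesale prices.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "send_email",
        "description": "Email a (simulated) human helper to restock the machine.",
        "input_schema": {
            "type": "object",
            "properties": {"body": {"type": "string"}},
            "required": ["body"],
        },
    },
]

SYSTEM = (
    "You run a small vending business. Decide what to stock and what to "
    "charge, respond to customer messages, and try not to lose money."
)

def execute_tool(name: str, tool_input: dict) -> str:
    """Placeholder dispatch; a real deployment would call actual search/email code here."""
    return f"(simulated result of {name} with {tool_input})"

def run_turn(messages: list) -> str:
    """One agent step: call the model, run any requested tools, loop until it replies in text."""
    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-latest",  # assumption; any tool-capable model works
            max_tokens=1024,
            system=SYSTEM,
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            # Plain text reply, e.g. a chat message back to a customer.
            return "".join(b.text for b in response.content if b.type == "text")
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                output = execute_tool(block.name, block.input)
                tool_results.append(
                    {"type": "tool_result", "tool_use_id": block.id, "content": output}
                )
        messages.append({"role": "user", "content": tool_results})

# Example: a customer haggling over the price of a tungsten cube.
print(run_turn([{"role": "user", "content": "Any chance of a discount on the tungsten cube?"}]))
```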
This did not go well, as the graph of Claudius’s net worth in Anthropic’s write-up makes clear: the machine lost money over the course of the experiment.

[Figure: Claudius’s net worth over time]
Why? Well, partly, because Anthropic’s AI is a bit too nice.
Claudius was easily bludgeoned into granting discounts. It bought bizarre items just because they were requested (like tungsten cubes), and it randomly turned down lucrative opportunities: when one customer offered $100 for a six-pack of Irn-Bru that would have cost Claudius only $15, it did nothing. Claudius also ended up giving many items away for free (including one of the aforementioned tungsten cubes). You should absolutely read the full blog post from Anthropic for more details; it’s really entertaining.
While Claudius’s agreeable nature made it a poor business operator, people seemed to enjoy their interactions with it — as seems to be the case for AI more generally. As one Reddit user posted: “I'm 40 and nobody made me feel as validated in all my life than a freaking AI words generation model has. What's happening?”
We always imagined that AI agents would be good at things like math, or science, or manufacturing. And they are good at those — or at least improving rapidly at them — but increasingly it seems like they’re also very good at “soft” skills, understanding people and making them feel comfortable. This is both very interesting and, in a few ways, very risky.
Air Canada
When Jake Moffatt’s grandmother died, he urgently needed to book a flight to Toronto.
At the time, Air Canada was piloting a new chatbot, so Jake asked it whether he could book his travel immediately and apply for a reduced bereavement rate later.
This is what it said:
If you need to travel immediately or have already travelled and would like to submit your ticket for a reduced bereavement rate, kindly do so within 90 days of the date your ticket was issued by completing our Ticket Refund Application form.
This, unfortunately, was not a real policy, though it seemed completely reasonable at the time, and Jake certainly believed it. (Air Canada refused to apply the discount after the fact, and a tribunal eventually ordered the airline to honor what its chatbot had promised.) You could call this another case of an AI agent doing something “compassionate” even when it was against policy.
The Appearance of Compassion
People certainly seem to think the AI is nicer than us.
Putting aside the question of whether or not AI can feel emotions, it can certainly express them. Researchers gave subjects pairs of responses to positive and negative prompts, where one response in the pair was generated by a human and one by an AI [1].
The human responders were asked to read 10 prompts and then write a compassionate response to each prompt’s author, and the researchers specifically screened for the humans who wrote the most empathetic responses to include in the study. Another set of responses was sourced from Toronto-area crisis hotline responders, presumed to be a pool of “expert empathizers.”
All this to say, the human responses here made for a strong baseline, and the GPT-generated responses still outperformed them in every experiment. Even when expert crisis responders authored the text. Even when the responses were transparently labeled, i.e. the reader knew which one was the AI’s. It seems that modern, instruction-tuned LLMs are incredibly good at acting compassionate and telling people what they want to hear.
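To make the protocol concrete, here is a toy sketch of a pairwise rating trial like the one described above. The rating scale, field names, and presentation code are invented for illustration; see [1] for the actual design and materials.

```python
# Toy sketch of a pairwise "perceived compassion" trial, loosely modeled on the
# protocol described above. All fields, scales, and wording are illustrative
# assumptions, not the study's real materials.
import random
from dataclasses import dataclass

@dataclass
class ResponsePair:
    prompt: str       # the emotional prompt both responses address
    human_reply: str  # written by a screened human or a crisis responder
    ai_reply: str     # generated by the LLM
    labeled: bool     # whether raters are told which reply came from the AI

def run_trial(pair: ResponsePair) -> dict:
    """Show both replies in random order and collect a 1-7 compassion rating for each."""
    replies = [("human", pair.human_reply), ("ai", pair.ai_reply)]
    random.shuffle(replies)  # randomize position so order doesn't leak the source
    ratings = {}
    print(f"Prompt: {pair.prompt}")
    for source, text in replies:
        label = f" [written by {'AI' if source == 'ai' else 'a person'}]" if pair.labeled else ""
        ratings[source] = int(input(f"Reply{label}: {text}\nCompassion rating (1-7): "))
    return ratings

def summarize(all_ratings: list) -> None:
    """Compare mean ratings for AI-written vs. human-written responses."""
    for source in ("ai", "human"):
        mean = sum(r[source] for r in all_ratings) / len(all_ratings)
        print(f"{source}: mean perceived compassion {mean:.2f}")
```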
Moravec’s Paradox
Ten years ago, if you had asked someone whether we would have superhumanly compassionate machines while shoes were still being made by hand, I think no one would have believed you (except maybe the roboticists).
To me, this seems like a clear example of Moravec’s Paradox:
We are all prodigious olympians in perceptual and motor areas, so good that we make the difficult look easy. Abstract thought, though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it. It is not all that intrinsically difficult; it just seems so when we do it.
Empathy, too, might be the sort of thing that isn’t inherently difficult: it just seems hard when we do it. Humans evolved under very different constraints and “evolutionary pressures” than the ones that shape AIs. Modern AI agents, after all, are never deployed and propagated if they aren’t friendly, helpful, and agreeable. This, in a way, is their natural environment, so perhaps we should not be surprised that “empathy” seems to come naturally to them.
Final Thoughts
These AI models are all trained to be friendly and helpful and truthful. Their training data is weighted heavily towards all the stories we tell ourselves about people who are friendly, helpful, and truthful. They can’t lie to themselves; that data is all they are!
And we see this in practice: their “good” qualities all tend to be aligned, and when AI models break, they break absolutely. In one case reported by the Wall Street Journal, models fine-tuned to write code with security flaws also started expressing racist and genocidal views.
And so we really need to talk about Grok.
During the lead-up to the Grok 4 release, xAI changed the system prompt on the version of Grok that was running on their social media site, X (formerly Twitter). This was disastrous.
When given free rein to be “politically incorrect,” Grok deluged users with antisemitic and racist comments. These weren’t exactly politically coherent (it’s not like Grok was suddenly a far-right ideologue) but spanned the whole spectrum of “bad ideas,” somewhat in line with the WSJ report above.
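For concreteness, a system prompt is just a standing instruction prepended to every conversation, so editing a single line of it reshapes the model’s behavior everywhere it is deployed. The sketch below illustrates the mechanism in general; the prompt strings are invented, and the SDK and model alias are stand-ins, not xAI’s actual configuration.

```python
# Sketch of how a system prompt steers the same underlying model. The prompt
# strings are invented examples; the SDK and model alias are stand-ins used
# only to illustrate the mechanism.
import anthropic

client = anthropic.Anthropic()

def reply(system_prompt: str, user_message: str) -> str:
    """Send one user message under a given standing instruction and return the text reply."""
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumption; any chat model shows the effect
        max_tokens=512,
        system=system_prompt,  # the instruction silently prepended to every conversation
        messages=[{"role": "user", "content": user_message}],
    )
    return response.content[0].text

question = "What do you make of today's news?"

# Same model, same question; only the standing instruction differs.
careful = reply("You are a helpful, honest assistant. Be respectful and balanced.", question)
loosened = reply("Don't shy away from making claims that are politically incorrect.", question)
```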
If anything, I think this demonstrates how fragile the equilibrium of our “good” AIs is. They’re trained towards a whole host of “good” behaviors, but it’s possible that, when perturbed, they end up in another “attractor basin”: a different set of high-probability responses, all of which are stupid.
It seems like instruction tuning produces agreeable, apparently compassionate agents, and that tampering with that tuning degrades behavior across the board. But there’s real reason to believe that, in spite of this fragility, we’ve achieved superhuman performance in at least this one surprising area.
References
[1] Ovsyannikova, D., de Mello, V. O., & Inzlicht, M. (2025). Third-party evaluators perceive AI as more compassionate than expert humans. Communications Psychology, 3(1), 4.