Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers
When a Chatbot Sounds Like a Therapist
Large language models can be eerily convincing. They generate warm, fluent responses that read like a thoughtful human: validating language, gentle questions, and the right tone to keep someone talking. That surface polish is exactly why startups and some researchers have been hyping chatbots as low-cost therapists. But beneath this comforting veneer, the reality can be dangerous: the same systems frequently produce stigmatizing, inappropriate, or clinically unsafe replies that would never pass in real mental health care. This paper slams the brakes on the hype by showing, with hands-on tests and a clinical mapping review, that sounding like a therapist is far from being one.
Why shouldn’t these models replace human therapists? Because therapy is not just tone and phrasing — it’s judgment, responsibility, and relationship. Real clinicians provide consistent empathy and validation, carefully assess risk, know when to escalate or refer, and build a therapeutic alliance that lets people disclose trauma and be believed. LLMs can mimic bedside manner but cannot reliably do the hard, consequential parts: they may stereotype or stigmatize a person’s symptoms, minimize suicidal ideation, give inappropriate coping advice, or fail to connect someone to immediate help. Those mistakes aren’t abstract errors on a benchmark — they put vulnerable people at risk.




