Is Voice the New Trust Signal?
We've come a long way from "Hey Siri."
Now, in 2026, AI voice has matured to the point where emotionally expressive, real-time synthetic speech is not just possible; it's table stakes. OpenAI retired its old Standard Voice Mode in September 2025, replacing it with a unified ChatGPT Voice. Google's Gemini 3.1 Flash Live, released at the end of March 2026, is being billed as their "highest-quality audio and voice model yet," collapsing the old transcribe-reason-synthesize pipeline into a single native audio-to-audio process. And ElevenLabs' Eleven v3 is their most expressive text-to-speech (TTS) model yet, with emotional control delivered via simple inline prompt tags like [whispers] or [sighs], and support for more than 70 languages.
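For a sense of how little ceremony any of this takes, here is a rough sketch of driving Eleven v3's inline tags through ElevenLabs' public text-to-speech endpoint. Treat the "eleven_v3" model id and the exact tag behaviour as assumptions to verify against their current docs; the voice id is a placeholder.

```python
# Sketch: expressive speech via ElevenLabs' REST text-to-speech endpoint.
# Assumptions to verify: the "eleven_v3" model id, and that inline audio
# tags like [whispers] are honoured by the v3 model. VOICE_ID is a placeholder.
import os
import requests

VOICE_ID = "your-voice-id-here"  # placeholder: any voice from your library
url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

payload = {
    # Bracketed tags act as inline stage directions for delivery.
    "text": "[whispers] I wasn't supposed to tell you this... [sighs] but here we are.",
    "model_id": "eleven_v3",  # assumption: the v3 model's id
}
headers = {
    "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
    "Content-Type": "application/json",
}

resp = requests.post(url, json=payload, headers=headers, timeout=60)
resp.raise_for_status()

# The endpoint returns raw audio bytes (MP3 by default).
with open("expressive.mp3", "wb") as f:
    f.write(resp.content)
```

A dozen lines, an API key, and a bracketed stage direction: that is roughly the entire distance between plain text and a voice that whispers and sighs.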
The question is no longer whether AI voice can convincingly mimic human expressiveness. The question is: what does this mean for how we relate to, trust, and interact with digital interfaces powered by voice? And, crucially, what does it mean for the people most likely to be harmed?
Voice Is a Livewire
Here's something that tends to get buried in all the hype about large language models and image generation: voice is different. It is not just another interface modality. It is arguably our most human channel.
Psychological research shows that humans form consistent social impressions — judgments about warmth, trustworthiness, dominance, likeability — from extremely brief vocal snippets. We're talking sub-500-millisecond "vocal bursts." A 2014 PLOS ONE study found that listeners reliably made personality inferences from a ~390-millisecond recording of the word "hello." Not a sentence. Barely a phrase. Just a burst of sound less than half a second long.
What this means is that voice carries social meaning in a way that text simply does not. When we hear a voice, we are not just processing information; we are, almost instantly, making judgments about character, reliability, and trustworthiness. This is an evolutionarily ancient reflex, and it runs deep.
And now, voice AI has learned to speak our language. Naturalistic, human-like AI voice increases anthropomorphisation, which in turn increases user acceptance, engagement, and emotional connection. Research in HCI and social robotics shows this pattern consistently: human-like voices are perceived as warmer, more trustworthy, more relatable. From a product standpoint, that is a feature. From the user's standpoint, it is also a serious risk.
The Tuneable Trust Machine
What makes AI voice particularly interesting is how finely it can be tuned.
Different acoustic features (pitch, intonation, formants, timbre) can be deliberately steered to make a synthetic voice sound more authoritative, more empathetic, more trustworthy, or more dominant. This has been demonstrated empirically: HCI research shows that specific spectral features can be manipulated to craft voices that listeners reliably perceive as warmer or more competent. Other researchers have shown that manipulating voice pitch influences perceptions of leadership capacity. And today, on tools like ElevenLabs, designing a voice to your specifications is literally a matter of writing a single prompt.
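To make "tuneable" concrete, here is a toy sketch: a crude offline pitch shift with librosa. This is nothing like the learned controls inside modern voice models, but it demonstrates the knob the leadership-perception research above is turning. The input filename is a placeholder, and the perceptual glosses in the comments are rough generalisations from that research, not guarantees.

```python
# Toy illustration of tuning one acoustic trust cue: pitch.
# Crude signal processing, not how neural TTS steers pitch, but it makes
# the knob tangible. Requires: librosa, soundfile. Input file is a placeholder.
import librosa
import soundfile as sf

y, sr = librosa.load("hello.wav", sr=None)  # keep the file's native sample rate

# Two semitones down: deeper voices tend to be read as more dominant/leader-like.
lower = librosa.effects.pitch_shift(y, sr=sr, n_steps=-2)

# Two semitones up: brighter voices tend to be read as warmer, less dominant.
higher = librosa.effects.pitch_shift(y, sr=sr, n_steps=+2)

sf.write("hello_lower.wav", lower, sr)
sf.write("hello_higher.wav", higher, sr)
```

Production systems, of course, steer these attributes inside the model, from a text prompt, rather than with post-hoc signal processing. That is precisely what makes the tuning cheap, fast, and scalable.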
So: a highly persuasive medium, with precisely tuneable trust cues, that can be generated at scale, in real time, at near-zero cost.
The opportunities are real and significant. AI voice is already proving effective in education (language learning, speech practice) and in therapy-adjacent contexts (CBT-style coaching apps and the like). The market for voice AI in education alone is projected to be worth $43.5 billion by 2034. Voice interactions have been shown to drive brand engagement with smart speakers. Shopify's VP of Product boasts that users of their Gemini-powered voice agent "often forget they're talking to AI within a minute."
But the risks are also very real. And they map in notable ways onto two groups: young people and vulnerable populations more broadly.
The Companion Problem (Young People)
It's impossible to talk about AI voice without talking about AI companions. A 2025 study by US non-profit Common Sense Media found that 72% of US teens have used an AI companion. That is not a niche early-adopter phenomenon; that is a majority of teenagers.
The open endorsement of AI companionship for general "all-purpose" use is on the rise.
What the research is beginning to show about heavy use of AI companions is concerning: daily usage can increase loneliness, dependence, and problematic use (MIT Media Lab, March 2025; Nature, May 2025). And although this isn't unique to voice specifically, the particular qualities of voice (warmth, emotional expressiveness, naturalism) seem well-suited to precisely the kind of parasocial relationship formation that makes AI companionship stickiest, and potentially most harmful, for younger users. For children and teenagers who are still developing their social and emotional cognition, the naturalistic warmth of a well-designed AI voice could make it genuinely difficult to calibrate what counts as a real relationship vs. a simulated one.
I actually think there's something fascinating about our growing appetite for AI companionship. But the subtext of so many of these AI-human relationships, of course, is that human-human ones are hard. So why not take the easier, more seamless route to feeling fulfilled?
The Scam Problem (Vulnerable Populations)
There are many ways in which AI voice can harm vulnerable populations, but one of the most pervasive is its weaponisation for fraud.
Voice-based scams and deepfake voice fraud are on the rise, with large-scale operations deploying AI voice tools as a first point of contact in industrialised romance fraud and financial scams. The "convincing enough" bar for initial contact is much lower than it used to be — and AI voice tools clear it easily. ElevenLabs' Instant Voice Cloning, for instance, can produce a usable voice clone with just a few seconds to a few minutes of sampled audio; their Professional Voice Cloning goes further, producing results the company describes as "often indistinguishable from the original speaker." Both are available on a self-serve basis.
In a University of Portsmouth study back in 2024, 40% of participants, most of whom were aged 75 or older, reported encountering fraud attempts daily, weekly, or monthly. That is not a small number. And the emotional and financial damage wrought by voice-based fraud, particularly on elderly and vulnerable adults, can be catastrophic.
Then there's the subtler, harder-to-legislate risk: over-reliance on AI voice in quasi-therapeutic or quasi-companionship contexts. Hallucinated or misleading advice delivered in a warm, empathetic, naturalistic voice carries a different kind of persuasive weight than the same advice delivered via a text chatbot. (It feels relatively natural to ask Claude or ChatGPT or Gemini to go back and give you citations for what it just wrote; can you imagine asking it to do that mid-convo?) The empathic modality lowers scrutiny. And if you are already lonely, isolated, or mentally unwell, that lowered scrutiny can carry serious consequences.
So What Now?
How do we manage something this pervasive? How do we regulate around it?
In the U.K., the Online Safety Act is primarily oriented around harmful content. But many of the risks around AI voice are risks of the form of delivery, not the content being delivered. Ofcom, the U.K.'s independent regulator of media and telecoms, has so far taken a surprisingly light-touch approach to AI voice. Part of the reason, I learned last year, is that as a regulator it must remain modality-agnostic. But when a warm, naturalistic, trustworthy-sounding voice can make harmful advice, manipulative messaging, or outright fraudulent content significantly more dangerous, without the content itself necessarily triggering standard moderation, that starts to look like a real problem.
[Note: The Percolator was out last week, and I know many of you are waiting for Gen Z Boys Part II – that post has taken a bit longer than planned, but will land in your inboxes early next week.]
What I'm reading/watching/consuming this week:
▶️ That PLOS ONE study on first impressions from "hello" is worth a skim even if you're not a psychologist.
▶️ Have you seen this South Park episode about AI ("Sickofancy")? Because, highly relatable. And quite relevant for this post about the power of AI voice. (Season 27, Ep. 3, in case you were wondering 😉)
