The Bounds of the Cloned Voice
In 2024, journalist Evan Ratliff cloned his own voice, wired it to a chatbot, and let it loose on the phone, and the thing that gave it away was a pause. This deep dive traces where that pause went. By 2026 the technical tells that exposed Ratliff's clone (latency, woodenness, the audible seams) have largely collapsed, which means the real boundary of the cloned voice has moved off the machine and onto us: our psychology, our laws, and the economics of who can afford to fake whom. The defense that still holds isn't a better detector. It's whether you and the person on the other end agreed on a code word in advance.
In this episode
- The 2024 baseline, from the source. Evan Ratliff's Shell Game — the three-part voice-agent stack (ElevenLabs clone + LLM "brain" + phone number), the DIY rig that off-the-shelf platforms quickly obsoleted, and the three honest tells: a robotic pause, a wooden affect, and a "world-class bullshitter" that confabulated zip codes.
- The tells fall. Native speech-to-speech models (ICLR 2026; Hume EVI 3) and sub-300 ms latency engineer out the pause that gave Ratliff's clone away, while the "indistinguishable clone" claim rests on marketing language with no published benchmarks behind it.
- Believability was never in the machine. The psychology research confirms Ratliff's hunch: primed expectations, emotional arousal, and years of bad-VoIP conditioning do the attacker's work. McAfee's research puts the share of people who can't reliably tell a clone from a real voice at 70 percent.
- The economics that make it inevitable. All-in automated vishing at two to six cents a minute; voice generation is now the cheapest part of the fraud stack. The same cost collapse is reshaping voice-actor labor — but vendors themselves concede the performance gap.
- Consent, the dead, and the law racing to catch up. AB 1836, the ELVIS Act, and the NO FAKES Act's June 2026 committee advance — against Berkeley Law's finding that no will can stop cloning if your audio is public, and Cambridge's warning about griefbots "haunting" the bereaved.
- The escalation: when the camera stops being proof. Ratliff turned his video on to prove he was human; FBI advisories now document multimodal "proof of life" fakes in virtual-kidnapping scams — and detection is measurably behind the attack.
- The contrarian beat. Clones still fail, behavioral defenses outperform technical ones, the scariest fraud-economics numbers are vendor estimates, and the headline detection benchmark was self-published by a contestant.
Sources & References
Primary / originating sources
- https://www.shellgame.co/
- https://davidepstein.substack.com/p/attack-of-the-ai-voice-clones
- https://radiolab.org/podcast/shell-game/transcript
Research & critique (peer-reviewed and academic)
- ICLR 2026 — Speech-to-Speech LLM (Zhao et al., OpenReview)
- Berkeley Law — "Vocal Identity Under Siege" (Lee & Sun, Oct 2025, PDF)
- Cambridge / Philosophy and Technology — Safeguards against AI "hauntings" by deadbots
- AI and Ethics — Non-addiction as a griefbot design principle (ScienceDirect, 2026)
- Computers in Human Behavior — Scam-call psychology / ScamGen (2024)
- Florida International University — Model-statement deception detection (2026)
- arXiv — AT-ADD Grand Challenge audio deepfake benchmark (Apr 2026)
- Hugging Face — Human Perception Audio Deepfake 2026 dataset
- ACM Digital Library — Digital resurrection misuse research
Law, regulation & government
- FBI IC3 — Virtual kidnapping / AI voice cloning advisory (PSA251205, Dec 5, 2025)
- FBI IC3 2025 losses — reporting via Malwarebytes (Jun 2026)
- FTC — Preventing Harms from AI-Enabled Voice Cloning (Nov 2023)
- NO FAKES Act clears Senate Judiciary Committee (Deadline, Jun 2026)
- California AB 1836 — full text (LegiScan; operative Jan 1, 2025)
- Senate JEC — Sen. Hassan presses AI voice-cloning companies (Apr 2026)
- UN News — AI-enabled fraud as a cross-border challenge (Mar 2026)
Industry, implementation & economics
- Hume AI — EVI 3 (native speech-to-speech)
- Inworld AI — Best speech-to-speech model comparison (latency)
- Cresta — Engineering for real-time voice agent latency
- Cerebrium — Global-scale voice agent at 500ms latency
- Deepgram — 2026 Voice AI Buyer's Guide (platform pricing)
- Vellum AI — Voice agent platforms guide
- Forasoft — Real-time voice cloning fraud-stack economics
- Resemble AI — Pricing (synthesis cost floor)
- Resemble AI — 2026 eight-system detection benchmark (vendor-self-published; see caveat)
- McAfee — AI voice scam research (3-second clone; 70% can't detect)
- Loeb & Loeb — SAG-AFTRA / Replica Studios AI voice agreement
- Forbes — Virginie Berger on SAG-AFTRA's AI voice gamble
- Respeecher — How character AI voice works (vendor concedes performance gap)
- Murf AI — Will AI replace voice actors (vendor self-assessment)
- ElevenLabs — Professional Voice Cloning documentation
- DeepIDV — Deepfake detection 2026: injection attacks