What is voice cloning?

Voice cloning is AI technology that learns a specific person's voice from a short audio sample and can then generate new speech in that voice. It is used for branded AI voice agents, assistive tech, and media production.

Written By Catherine Weir

Last updated About 2 months ago

Voice cloning is AI technology that learns a specific person's voice from a short audio sample and can then generate new speech in that voice. Given anywhere from 30 seconds to a few minutes of source audio, a modern voice-cloning system can produce new sentences that sound convincingly like the original speaker.

Voice cloning is a subset of generative voice AI — specifically, generative text-to-speech that has been conditioned on a particular voice rather than trained to produce a generic voice.

How voice cloning works

Enrollment — the system is given a short sample of the target speaker's voice, typically 30 seconds to 10 minutes of clean audio
Model conditioning — a generative voice model is conditioned on acoustic features extracted from the sample (pitch, timbre, cadence, vowel shape)
Synthesis — the conditioned model generates new audio from any text input, preserving the original speaker's characteristics

Some cloning systems fine-tune a dedicated model for each voice, which gives higher quality at higher cost. Others use a single large model with "voice embeddings" (compact numeric representations of a speaker) and can switch among enrolled voices on demand, which is faster and more flexible.
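The three stages above, in the single-model design with voice embeddings, can be sketched roughly as follows. This is a toy illustration, not how any particular product works: the names VoiceEmbedding, enroll, and SharedVoiceModel are hypothetical, the "features" are hand-written numbers standing in for what a trained encoder would extract from audio, and synthesize returns a description of the conditioned request rather than real audio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceEmbedding:
    """Fixed-size numeric summary of one speaker (hypothetical type)."""
    speaker_id: str
    vector: tuple


def enroll(speaker_id, frames):
    """Enrollment: average per-frame feature vectors into one embedding.

    `frames` is a stand-in for acoustic features that a real system
    would extract from the speaker's audio sample.
    """
    if not frames:
        raise ValueError("enrollment needs at least one frame of features")
    dims = len(frames[0])
    if any(len(frame) != dims for frame in frames):
        raise ValueError("all frames must have the same number of features")
    mean = tuple(sum(frame[d] for frame in frames) / len(frames) for d in range(dims))
    return VoiceEmbedding(speaker_id, mean)


class SharedVoiceModel:
    """One model, many voices: conditioning is a lookup, not retraining."""

    def __init__(self):
        self._voices = {}

    def register(self, embedding):
        self._voices[embedding.speaker_id] = embedding

    def synthesize(self, text, speaker_id):
        """Placeholder for synthesis: returns the conditioned request."""
        if speaker_id not in self._voices:
            raise KeyError(f"voice {speaker_id!r} is not enrolled")
        return {"text": text, "conditioning": self._voices[speaker_id].vector}


model = SharedVoiceModel()
model.register(enroll("actor_a", [(1.0, 3.0), (3.0, 5.0)]))
model.register(enroll("actor_b", [(9.0, 1.0)]))

# Switching voices is just a different lookup key on the same model.
request = model.synthesize("Thanks for calling.", "actor_a")
print(request["conditioning"])  # (2.0, 4.0)
```

In the per-voice fine-tuning design, register would instead kick off a training job and store a separate set of model weights for each speaker, which is where the extra cost comes from.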

What voice cloning is used for

Branded AI voice agents — a business hires a voice actor, records a sample, and deploys an AI receptionist that speaks in that specific voice as a brand asset
Audiobook and podcast production — letting an author's voice narrate additional content without re-recording
Assistive technology — preserving the voice of someone facing a medical condition that will affect their speech, or giving a personalized voice to someone who has lost theirs
Localization — generating voice performances in additional languages while preserving the original speaker's identity
Gaming and entertainment — generating character dialogue at scale in a consistent voice

The ethics and consent side

Voice cloning is powerful enough that it raises serious consent and misuse concerns. Responsible voice-cloning platforms put these safeguards in place:

Explicit consent from the person whose voice is being cloned
Verification that the enrolling party has the right to use the voice (for commercial cloning)
Watermarking or provenance signals so generated audio can be identified as synthetic
Prohibitions on cloning public figures, politicians, or specific protected individuals without verified consent
Terms of service limiting use to the enrolling party's own business
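As a rough illustration of how a platform might turn the checks above into a gate before enabling a cloned voice, here is a minimal sketch. The CloneRequest fields and the unmet_requirements function are hypothetical names chosen for this example, and the rules are a simplified reading of the list above, not any platform's actual policy.

```python
from dataclasses import dataclass


@dataclass
class CloneRequest:
    """Facts collected at intake (hypothetical fields)."""
    speaker_name: str
    explicit_consent: bool   # the voice subject agreed to be cloned
    consent_verified: bool   # that agreement was independently confirmed
    rights_verified: bool    # the enrolling party may use the voice
    commercial_use: bool
    is_public_figure: bool


def unmet_requirements(req):
    """Return the safeguards this request fails; an empty list means proceed."""
    problems = []
    if not req.explicit_consent:
        problems.append("explicit consent from the voice subject is missing")
    if req.commercial_use and not req.rights_verified:
        problems.append("right to use the voice commercially is unverified")
    if req.is_public_figure and not req.consent_verified:
        problems.append("public figure without verified consent")
    return problems


approved = CloneRequest("Hired voice actor", True, True, True, True, False)
blocked = CloneRequest("Well-known politician", False, False, False, True, True)

print(unmet_requirements(approved))      # []
print(len(unmet_requirements(blocked)))  # 3
```

Returning the full list of failures, rather than stopping at the first one, lets an intake team tell the customer everything that needs fixing in a single pass.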

Before cloning a voice for commercial use, work with a platform that enforces these standards, and confirm that the person whose voice you're cloning has authorized its use.
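The provenance requirement in the list above can be illustrated with a signed record stored alongside each generated clip. This is a simplified sketch under stated assumptions: it is a sidecar record, not an audio watermark (a watermark embeds the signal in the audio itself), and the function names, record fields, and shared signing key are all hypothetical. It uses only Python's standard hashlib, hmac, and json modules.

```python
import hashlib
import hmac
import json


def make_provenance_record(audio_bytes, voice_id, signing_key):
    """Describe a generated clip as synthetic and sign the description."""
    payload = {
        "synthetic": True,
        "voice_id": voice_id,
        "audio_sha256": hashlib.sha256(audio_bytes).hexdigest(),
    }
    # sort_keys gives a stable byte string, so the signature is reproducible.
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    signature = hmac.new(signing_key, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_provenance_record(audio_bytes, record, signing_key):
    """Check that the record is untampered and matches this exact audio."""
    body = json.dumps(record["payload"], sort_keys=True).encode("utf-8")
    expected = hmac.new(signing_key, body, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, record["signature"])
    audio_hash = hashlib.sha256(audio_bytes).hexdigest()
    audio_ok = hmac.compare_digest(audio_hash, record["payload"]["audio_sha256"])
    return signature_ok and audio_ok


key = b"demo-key-not-for-production"
clip = b"\x00\x01\x02 placeholder audio bytes"
record = make_provenance_record(clip, "actor_a", key)

print(verify_provenance_record(clip, record, key))            # True
print(verify_provenance_record(clip + b"edit", record, key))  # False
```

A record like this only helps while it travels with the file, which is why platforms also look at watermarks that survive re-encoding and at shared provenance standards.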

Related concepts

Generative voice AI — the broader category
Text-to-speech (TTS) — what voice cloning extends
Voice AI — the overall space

See it in action

The Receptionist Agent at 365agents supports voice cloning for customers who want to brand their AI with a custom voice. We require written consent from the voice subject and verify ownership before enabling a cloned voice in production. Contact us for the voice-cloning intake process.