TTS

Also known as: speech synthesis, text to speech

TTS is a technology that reads text aloud in a natural-sounding voice. It underpins AI dubbing and machine-narrated audiobooks.

TTS (text-to-speech) is a technology that converts written text into natural-sounding human speech. It is also called speech synthesis. Unlike the stiff, mechanical announcements of the past, recent TTS systems reproduce intonation, emotion, and even breathing sounds, to the point where the output can be difficult to distinguish from a voice actor's reading.

TTS began as a way to access information for people who are visually impaired and in situations where the screen cannot be seen. Its use has since expanded to navigation guidance, audiobooks, video dubbing, and AI call assistants. In particular, voice cloning, which reproduces a specific person's voice from a few seconds of sample audio, is changing how content is produced.

However, as voice cloning has become easier, concerns about voice phishing and deepfake voice abuse have grown, and the rights of voice professionals such as voice actors are emerging as a new issue.

✅ Why it matters

Significantly lowers the cost of producing content such as audiobooks, dubbing, and voice guidance
Improves information accessibility for people who are visually impaired, among others
Makes it easier for content to enter overseas markets through multilingual voice conversion

⚠️ Limits and debates

The risk of voice cloning being abused for voice phishing and deepfakes has increased
The rights and livelihoods of voice professionals, such as voice actors, are at issue
Long sentences and emotional expression can still sound unnatural