STT

Also known as: speech recognition, speech to text

STT is technology that transcribes spoken words into text. It is used to generate meeting minutes and subtitles automatically.

STT (Speech to Text) is a technology that recognizes human speech and converts it into text. It is also called speech recognition or, more loosely, voice recognition. Just as a stenographer transcribes the conversation at a meeting, an AI model listens to speech and writes it down, often in real time.

Speaking is often faster and more convenient than typing on a keyboard, so STT is widely used for smartphone voice input, automatic meeting minutes, video subtitle generation, and analysis of call center recordings. Since the introduction of deep learning, recognition accuracy has improved significantly and is now good enough for everyday services. Combined with LLMs, STT is developing into services that also summarize and organize the transcribed content.
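Recognition accuracy is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the STT output into a reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that calculation. The function name `word_error_rate` is chosen here for illustration, and the simple lowercase-and-split tokenization is an assumption; real evaluations usually normalize punctuation and numbers as well.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumption: whitespace tokenization and lowercasing are enough.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    human = "the meeting starts at nine"
    machine = "the meeting start at nine"
    print(word_error_rate(human, machine))
```

A lower value means fewer errors. Because insertions are counted, the rate can exceed 1.0 when the output contains many extra words.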

However, recognition errors increase in noisy environments, when several people speak at once, and with dialects or technical terms, so it is safer to have important records verified by a human.
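One way to put human verification into practice is to send only the doubtful parts of a transcript to a reviewer. The sketch below assumes the STT system attaches a confidence score between 0.0 and 1.0 to each segment. The `Segment` type, the `needs_review` function, and the 0.85 threshold are hypothetical choices made for illustration, not the interface of any particular STT product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcribed stretch of speech (hypothetical structure)."""
    text: str
    confidence: float  # assumed to range from 0.0 (unsure) to 1.0 (sure)


def needs_review(segments: List[Segment], threshold: float = 0.85) -> List[Segment]:
    """Return the segments whose confidence falls below the threshold."""
    return [s for s in segments if s.confidence < threshold]


if __name__ == "__main__":
    transcript = [
        Segment("Let's begin the quarterly review.", 0.97),
        Segment("The new API gateway latency is", 0.62),
        Segment("Thank you, everyone.", 0.93),
    ]
    for segment in needs_review(transcript):
        print(f"CHECK: {segment.text} ({segment.confidence:.2f})")
```

The threshold is a trade-off: a higher value catches more errors but sends more text to the reviewer. For records where every word matters, a full human read-through remains the safer option.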

✅ Why it matters

⚠️ Limits and debates
