Glossary · Term

STT

Also known as: speech recognition, speech to text

STT is a technology that converts spoken words into written text. It is used to generate meeting minutes and subtitles automatically.

STT (Speech to Text) is a technology that recognizes human speech and converts it into text. It is also called speech recognition. Just as a stenographer transcribes what is said at a meeting, an AI model listens to speech and writes it down in real time.

Speaking is often faster and more convenient than typing, so STT is widely used for smartphone voice input, automatic meeting minutes, video subtitles, and analysis of call center conversations. Since the adoption of deep learning, recognition accuracy has improved enough for everyday services, and combined with LLMs, STT is evolving into services that summarize and organize the transcribed content.

However, misrecognition increases in noisy environments, when several people talk over one another, and when the speech contains dialects or technical terms, so it is safer to have a human verify important records.
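
Misrecognition is commonly quantified as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the STT output into a human-verified transcript, divided by the number of words in that transcript. The following is a minimal sketch in Python, not a reference implementation. The function name `word_error_rate` and the sample sentences are illustrative assumptions, and the lowercasing and whitespace splitting are a simplification of the text normalization that real evaluations use.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: simple lowercasing and whitespace tokenization; punctuation
    and number formatting are not normalized in this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: "friday" is misheard as "fried day",
# which counts as one substitution plus one insertion over 7 reference words.
verified = "please send the quarterly report by friday"
stt_output = "please send the quarterly report by fried day"
print(f"WER: {word_error_rate(verified, stt_output):.2%}")
```

Because insertions count as errors, WER can exceed 100% when the output is much longer than the verified transcript, so it is better read as an error count per reference word than as a percentage of words that were wrong.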

✅ Why it matters

Automates transcription tasks such as meeting minutes, subtitles, and call logs
Improves accessibility and convenience, since voice input is often faster than typing
Combined with LLMs, extends from transcription to summarization and to-do organization

⚠️ Limits and debates

Misrecognition occurs with noise, overlapping speech, dialects, and technical terminology
Imperfect speaker separation (telling who said what) limits the accuracy of meeting minutes
Privacy issues arise when voice data is collected and processed