Attention
Attention is a technique for calculating how closely the words in a sentence relate to one another, and it is the core mechanism of the Transformer architecture.
In practice, attention gives each word a weight for every other word and focuses on the parts that matter most. Much as a reader highlights key words in a long document, when a model reaches the pronoun "it", attention weights the earlier nouns to work out which one "it" refers to.
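The weighting described above can be sketched as scaled dot-product attention, the formula given in the 2017 Transformer paper: softmax(QKᵀ / √d_k) · V. This is a minimal pure-Python sketch under loudly stated assumptions: the three tokens and their 2-dimensional vectors are hypothetical toy values, and one vector serves as query, key, and value, whereas real models learn separate projections for each.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are non-negative and sum to 1."""
    m = max(scores)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, computed row by row."""
    d_k = len(keys[0])
    outputs, weight_rows = [], []
    for q in queries:
        # Compare this query with every key, scaling by sqrt(d_k)
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        weight_rows.append(weights)
    return outputs, weight_rows

# Hypothetical toy embeddings: real models learn these and use separate
# query/key/value projections instead of reusing one vector for all three.
tokens = ["cat", "sat", "it"]
vectors = [[2.0, 0.0], [0.0, 2.0], [1.5, 0.5]]

outputs, weights = attention(vectors, vectors, vectors)
for tok, w in zip(tokens, weights[2]):
    print(f"'it' -> {tok}: {w:.3f}")
```

With these toy vectors, the row for "it" gives "cat" the largest weight because their vectors point in similar directions. In a trained model that alignment would come from learned parameters, not hand-picked numbers.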
Earlier language models, such as recurrent neural networks (RNNs), processed a sentence one word at a time, in order, which made it easy to lose track of relationships between distant words. Attention removed this limitation. The Transformer architecture, published in 2017, showed that language could be processed with attention alone, and that design became the technical foundation of today's LLM boom.
However, the computation attention requires grows quadratically with input length: doubling the number of tokens roughly quadruples the number of word-to-word scores. This is a major reason long documents are expensive to process, and research into more efficient attention is very active.
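A back-of-the-envelope sketch of that quadratic growth: full self-attention scores every token against every token, so one attention head in one layer produces n × n scores for n tokens. The helper name below is hypothetical, and the count ignores vector dimension, number of heads, and number of layers, which multiply the cost further but do not change its n² shape.

```python
def attention_score_count(seq_len: int) -> int:
    """Pairwise scores one full self-attention head computes (hypothetical helper)."""
    # Every token (as a query) is compared with every token (as a key): n * n scores
    return seq_len * seq_len

for n in (1_000, 2_000, 4_000, 8_000):
    # Each doubling of n multiplies the score count by four
    print(f"{n:>5,} tokens -> {attention_score_count(n):>10,} scores")
```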
✅ Why it matters
- It is key to understanding how modern AI systems, including ChatGPT, work
- Core concepts such as the Transformer and LLMs are all built on it
- It helps explain characteristics of AI services, such as the cost of processing long contexts
⚠️ Limits and debates
- As the input grows longer, computation grows quadratically, becoming a bottleneck for cost and speed
- Because it is a mathematical concept, analogies alone cannot convey it accurately
- A common misconception is that attention works like human attention; it is a learned weighting computation, not concentration in the human sense