
Reinforcement learning

Also known as: RL

Reinforcement learning is a training method in which an agent learns by trial and error, receiving rewards for good outcomes and penalties for bad ones. It is the core technique behind AlphaGo and reasoning models.

In reinforcement learning, the learner (usually called the agent) is not given the correct answer directly. Instead, it receives rewards or penalties based on the results of its actions and finds better actions through trial and error. It is the same principle as teaching a dog to sit: rather than explaining in words, you repeat the exercise and give a treat each time the dog gets it right.
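The trial-and-error loop described above can be sketched as a tiny two-action problem. This is a minimal illustration using an epsilon-greedy strategy, not how any particular system is trained; the function name `train_bandit`, the treat probabilities, and the parameter values are all assumptions made up for the example.

```python
import random


def train_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best purely from rewards.

    reward_probs[a] is the (hidden) chance that action a earns a treat.
    The learner never sees these numbers; it only sees the rewards.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions    # how often each action was tried
    values = [0.0] * n_actions  # running average reward per action

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action that has looked best so far.
            action = max(range(n_actions), key=lambda a: values[a])

        # The environment hands out a reward (1.0) or nothing (0.0).
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Update the running average for the chosen action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


if __name__ == "__main__":
    # Hypothetical setup: action 0 = "keep standing", action 1 = "sit".
    values, counts = train_bandit([0.1, 0.9])
    print("estimated value per action:", values)
    print("times each action was tried:", counts)
```

Nothing in the code tells the learner that "sit" is correct. It ends up preferring that action only because it was rewarded more often.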

It was developed for problems where correct-answer data is hard to produce but the quality of a result can still be judged, such as games and robot control. It was the core technique of AlphaGo, which surpassed humans at the game of Go, and it has recently drawn renewed attention as the driving force behind reasoning models, which train LLMs on tasks where the answer can be verified, such as mathematics or coding.

However, if the reward is poorly designed, reward hacking can occur: the AI collects points through unintended tricks instead of doing what was actually wanted. For this reason, deciding what to reward is considered the most difficult part.
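A toy example may make reward hacking more concrete. The sketch below assumes a hypothetical cleaning robot that is rewarded +1 for every pickup, while the designer's real goal is a clean floor. The environment, the policy names, and the numbers are invented purely for illustration.

```python
def run_episode(policy, steps=10):
    """Run one episode and return (proxy_reward, dirt_left_on_floor)."""
    dirt = 3          # pieces of dirt on the floor at the start
    holding = 0       # pieces the robot is currently carrying
    proxy_reward = 0  # what is actually rewarded: +1 per pickup

    for _ in range(steps):
        action = policy(dirt, holding)
        if action == "pick" and dirt > 0:
            dirt -= 1
            holding += 1
            proxy_reward += 1
        elif action == "drop" and holding > 0:
            # Dropping is never penalized, which is the design flaw.
            holding -= 1
            dirt += 1
        # Any other action (e.g. "wait") does nothing.

    return proxy_reward, dirt


def honest_policy(dirt, holding):
    # Does what the designer intended: pick up dirt until none is left.
    return "pick" if dirt > 0 else "wait"


def hacking_policy(dirt, holding):
    # Exploits the reward: drop the dirt again so it can be re-picked.
    return "drop" if holding > 0 else "pick"


if __name__ == "__main__":
    print("honest :", run_episode(honest_policy))   # (3, 0)
    print("hacking:", run_episode(hacking_policy))  # (5, 3)
```

The hacking policy earns more reward (5 versus 3) while leaving the floor exactly as dirty as it started. An agent that maximizes this reward would be pushed toward the trick, so the fix lies in the reward itself, for example rewarding the final state of the floor rather than each pickup.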

✅ Why it matters

⚠️ Limits and debates
