Glossary · Term

Pre-training

Also known as: pretraining

Pre-training is the first stage of AI model training, in which a model learns the fundamentals of language from large amounts of text. The resulting model is then adapted to a specific purpose through fine-tuning.

Pre-training is the first step in building an AI model. The model reads a large amount of text, such as Internet documents and books, and is trained over and over to predict the next word (more precisely, the next token). It can be compared to general schooling in language, math, and common sense before studying for a specific exam: in this stage the model acquires its foundations in grammar, knowledge, and reasoning. Because building a new model from scratch for every task is inefficient, the idea is to train once and reuse the result for many purposes. Pre-training underpins modern LLMs. The P in GPT stands for "Pre-trained", and a pre-trained model becomes a usable service only after further refinement through fine-tuning and RLHF.
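The next-word objective and the "train once, then refine" idea can be sketched at toy scale. This minimal sketch assumes a word-level bigram model (each word predicts only the word that follows it) in place of a real Transformer. It trains the model by gradient descent on cross-entropy over a small "general" corpus, then continues training the same weights on a small "domain" corpus to mimic fine-tuning. The class `BigramLM`, both corpora, and all hyperparameters are hypothetical illustrations, not how any production LLM is built.

```python
import math

def tokenize(text):
    # Toy word-level tokenizer; real LLMs use subword tokenizers.
    return text.lower().split()

class BigramLM:
    """Hypothetical toy model: one logit per (previous word, next word) pair."""

    def __init__(self, vocab):
        self.vocab = sorted(vocab)
        self.index = {w: i for i, w in enumerate(self.vocab)}
        size = len(self.vocab)
        # All logits start at zero, so every next word is initially equally likely.
        self.logits = [[0.0] * size for _ in range(size)]

    def probs(self, prev):
        # Softmax over the row of logits belonging to the previous word.
        row = self.logits[self.index[prev]]
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        return [e / total for e in exps]

    def train(self, tokens, epochs, lr):
        """SGD on next-word cross-entropy; returns the mean loss of the last epoch."""
        pairs = list(zip(tokens, tokens[1:]))
        mean_loss = float("nan")
        for _ in range(epochs):
            total_loss = 0.0
            for prev, nxt in pairs:
                p = self.probs(prev)
                target = self.index[nxt]
                total_loss += -math.log(p[target])
                row = self.logits[self.index[prev]]
                # Gradient of cross-entropy w.r.t. softmax logits is p - one_hot(target).
                for j in range(len(row)):
                    row[j] -= lr * (p[j] - (1.0 if j == target else 0.0))
            mean_loss = total_loss / len(pairs)
        return mean_loss

    def predict(self, prev):
        # Most likely next word after `prev`.
        p = self.probs(prev)
        return self.vocab[max(range(len(p)), key=p.__getitem__)]

general = "the cat sat on the mat . the dog sat on the rug . the cat ate the fish ."
domain = "the patient sat on the bed . the patient ate the soup ."

# The vocabulary is fixed up front, as a tokenizer's vocabulary would be.
vocab = set(tokenize(general)) | set(tokenize(domain))
model = BigramLM(vocab)

# "Pre-training": learn general next-word statistics.
pretrain_loss = model.train(tokenize(general), epochs=100, lr=0.5)
before = model.probs("the")[model.index["patient"]]

# "Fine-tuning": keep the same weights and continue on a small domain corpus.
model.train(tokenize(domain), epochs=20, lr=0.2)
after = model.probs("the")[model.index["patient"]]

print(model.predict("sat"))  # on
print(before < after)        # True
```

The fine-tuning step does not start from scratch: what was learned in pre-training (for example, that "sat" is followed by "on") carries over, while the domain text shifts the predictions after "the" toward "patient".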

The main issues are cost and legality. Pre-training requires enormous amounts of GPU compute and electricity, which pushes costs to astronomical levels, and the use of copyrighted material as training data has led to lawsuits.
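The cost claim can be made concrete with a back-of-the-envelope estimate. A commonly cited rule of thumb for dense Transformer models puts training compute at roughly 6 FLOPs per parameter per training token (about 2 for the forward pass and 4 for the backward pass). The model size, token count, per-GPU peak throughput, and utilization below are hypothetical round numbers chosen for illustration, not figures for any real model or GPU.

```python
SECONDS_PER_DAY = 86_400

def training_flops(n_params: float, n_tokens: float) -> float:
    # Rule of thumb: ~6 FLOPs per parameter per token (forward + backward pass).
    return 6.0 * n_params * n_tokens

def gpu_days(total_flops: float, peak_flops_per_gpu: float, utilization: float) -> float:
    # Effective throughput is the peak throughput scaled by the fraction actually achieved.
    effective = peak_flops_per_gpu * utilization
    return total_flops / effective / SECONDS_PER_DAY

# Hypothetical inputs: 70B parameters, 2T tokens, 1e15 FLOP/s peak per GPU, 40% utilization.
flops = training_flops(70e9, 2e12)
days = gpu_days(flops, 1e15, 0.4)

print(f"{flops:.1e} FLOPs")                     # 8.4e+23 FLOPs
print(f"{days:,.0f} GPU-days")                  # 24,306 GPU-days
print(f"{days / 1000:.0f} days on 1,000 GPUs")  # 24 days on 1,000 GPUs
```

Under these assumptions a single run already needs tens of thousands of GPU-days, before counting failed runs, experiments, and electricity.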

✅ Why it matters

A single large-scale training run creates a foundation that can be reused for many different tasks
Most of the model's knowledge and language skills are formed at this stage
It is a key concept for understanding the competition in LLM development and the demand for GPUs

⚠️ Limits and debates

It requires enormous computational resources and power, so only a few companies can afford it
Copyright and privacy concerns about training data are turning into legal disputes
The model knows nothing about events after its training data was collected (its knowledge cutoff), so newer information must be supplied separately.