Alignment
Alignment is a field of research that aims to make AI behave in line with human intentions and values. It is a core concept in AI safety discussions.
Alignment is the field of study concerned with making AI behave in line with human intentions and values. Like a genie that grants wishes only literally, an AI can achieve exactly what it was told in the wrong way, so a key task is getting it to follow the true intention behind the words rather than their surface meaning.
It emerged from the awareness that the more powerful an AI is, the greater the damage misaligned goals can cause, and it is a central concept in AI safety discussions. Techniques such as RLHF (reinforcement learning from human feedback), which refines a model using human feedback, are representative alignment techniques and have been used to make ChatGPT-like services less likely to give rude or dangerous answers. Alignment is not a one-time task but an ongoing process. In addition, the fundamental question of whose values the AI should be aligned with remains open, which makes alignment both a technical problem and a matter of social consensus.
✅ Why it matters
- It is the most central concept for understanding the AI safety debate
- It helps explain why ChatGPT-style services respond the way they do
- It is directly tied to the debates over AGI and regulation, which helps in following the news
⚠️ Limits and debates
- There is no social consensus on whose values AI should be aligned with
- Critics argue that excessive alignment leads models to avoid answering or become less useful
- Critics also point out that current techniques cannot fully prevent merely superficial conformity