AI Basics · Part 2

How Does ChatGPT Talk? Understanding How LLMs Work

May 11, 2026 · AI Note Lab

The essence of an LLM: predicting the most likely next token

Chat with ChatGPT long enough and there comes a moment when you think, "Wait — is this thing actually thinking?" But peek under the hood and you'll find a surprisingly simple (and all the more astonishing for it) principle at work. Today I'll explain how a large language model (LLM) builds sentences — no math required.

The Core Idea: "Guess the Next Word"

If you had to compress what an LLM does into a single sentence, it would be this.

"Look at the text so far, and keep predicting the most plausible next chunk of text."

Given the text "The capital of France is", the model scores every possible continuation and outputs a highly probable one: "Paris". Then it takes "The capital of France is Paris" as the new input, predicts the next word, and so on, repeating the process hundreds or thousands of times until a full answer emerges. Think of it as your phone keyboard's autocomplete on steroids; as a starting point, that mental model is good enough.
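If you'd like to see that loop as code, here is a minimal sketch. Everything in it is made up for illustration: the probability table, the "<end>" marker, and the function names are assumptions, and this toy only looks at the last word, whereas a real LLM computes its probabilities with a neural network that looks at the whole text so far.

```python
# Toy "model": for each previous word, hypothetical probabilities for the next word.
# The numbers are invented for illustration only.
NEXT_WORD_PROBS = {
    "The": {"capital": 0.6, "city": 0.4},
    "capital": {"of": 0.9, "is": 0.1},
    "of": {"France": 0.7, "Spain": 0.3},
    "France": {"is": 0.8, "has": 0.2},
    "is": {"Paris": 0.9, "big": 0.1},
    "Paris": {"<end>": 1.0},
}


def predict_next(words):
    """Return the most probable next word given the text so far."""
    # Unknown last word: the toy model simply stops.
    candidates = NEXT_WORD_PROBS.get(words[-1], {"<end>": 1.0})
    return max(candidates, key=candidates.get)


def generate(prompt, max_steps=20):
    """Append the predicted word again and again until the end marker appears."""
    words = prompt.split()
    for _ in range(max_steps):
        next_word = predict_next(words)
        if next_word == "<end>":
            break
        words.append(next_word)  # the output becomes part of the next input
    return " ".join(words)


print(generate("The capital of France is"))  # The capital of France is Paris
```

The part worth noticing is the loop in generate: each predicted word is appended to the text, and the longer text is fed back in for the next prediction.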

Tokens: The Units an AI Reads In

An LLM doesn't read text whole — it chops it into pieces called tokens. A token can be a word or just part of one. For example, "unbelievably" might be split into [un][believ][ably], and a Korean sentence like "인공지능은 재미있다" ("AI is fun") might become [인공][지능][은] [재미][있][다].
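To get a feel for how a word ends up in pieces, here is a rough sketch. It assumes a tiny hand-written vocabulary and a simple "grab the longest known piece" rule; both are my own stand-ins, not how ChatGPT's tokenizer actually works (real tokenizers learn tens of thousands of pieces from data).

```python
# Hypothetical vocabulary of known pieces, chosen by hand for this example.
VOCAB = {"un", "believ", "ably", "able"}


def tokenize(text, vocab=VOCAB):
    """Split text into the longest known pieces, scanning left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece starting at position i, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


print(tokenize("unbelievably"))  # ['un', 'believ', 'ably']
```

A word the vocabulary has never seen still gets tokenized; it just falls apart into smaller pieces, down to single characters if necessary.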

Tokens are exactly what the "token limits" in ChatGPT's pricing plans are counting. The amount of text a model can hold in mind and process at once is called its context window: the bigger it is, the longer the documents the model can read and answer about in one go.
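As a small illustration of what a fixed-size window means, the sketch below keeps only the most recent tokens that fit. The function name and the "drop the oldest tokens" policy are assumptions for this example; real services may handle overflow differently, for instance by rejecting the input or summarizing earlier text. Words stand in for tokens to keep it readable.

```python
def fit_to_context(tokens, context_window):
    """Keep only the most recent tokens that fit inside the window."""
    if context_window <= 0:
        return []
    # A negative slice start counts from the end of the list.
    return tokens[-context_window:]


conversation = "a very long conversation that keeps on growing".split()
print(fit_to_context(conversation, 4))  # ['that', 'keeps', 'on', 'growing']
```

Anything that falls outside the window is simply invisible to the model, which is one reason very long chats can seem to "forget" their beginning.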

Training: Practicing Prediction on the Whole Internet

So how did it get so good at guessing the next word? The answer is a staggering amount of practice.

1. Pre-training: The model practices predicting the next token across a vast swath of internet text, trillions of times over. Along the way it statistically absorbs grammar, common knowledge, and even patterns of reasoning.
2. Fine-tuning: Using human-written examples of good questions and answers, it learns the shape of a genuinely helpful response.
3. Reinforcement learning from human feedback (RLHF): Among multiple candidate answers, the ones humans rate higher get reinforced, cutting down on rude or dangerous responses.

Roughly speaking, step 1 builds the "knowledge and feel for language," while steps 2 and 3 build the "conversational manners."
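To make "practicing prediction" a little more concrete, here is a deliberately tiny stand-in for step 1. It just counts which word follows which in a three-sentence corpus. The corpus and function names are invented for this sketch, and counting is an assumption of mine to keep things simple: a real LLM adjusts billions of neural network weights rather than keeping a table of counts, and steps 2 and 3 are not shown at all.

```python
from collections import Counter, defaultdict


def pretrain(corpus):
    """Count, for every word, which words followed it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        # Pair each word with the word that comes right after it.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the word most often seen after `word`, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]


corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = pretrain(corpus)
print(most_likely_next(model, "the"))  # cat
print(most_likely_next(model, "sat"))  # on
```

Nobody told this toy that "cat" tends to follow "the"; it picked that up purely from the statistics of its practice text, which is the same spirit as pre-training at a vastly larger scale.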

The Trait That Follows: Hallucination

Once you understand this principle, the LLM's most famous weakness makes perfect sense too. The model isn't looking up "facts" — it's just producing the "most plausible next words." So from time to time it confidently generates sentences that sound completely convincing but simply aren't true. This is called hallucination.

Non-existent paper titles, wrong dates, made-up court cases stated as fact — they all come from this. That's why newer services are evolving to pair answers with web search and cited sources, but you still need the habit of verifying anything important against the original source.

An LLM's answer is less like "an explanation from a friend who really knows the subject" and more like "an explanation from a friend who's really good at talking." Useful — but the fact-checking is on you.

Today's Takeaways

At its core, an LLM builds sentences by repeatedly predicting the next token.
Tokens are the units an AI reads in, and the amount it can handle at once is its context window.
Massive pre-training plus human feedback is what created today's conversational ability.
Hallucination is baked into the principle, so always verify important facts.

In the next post, we'll go beyond text and map out the full landscape of generative AI — images, video, and audio included.