Watch how a language model generates text one token at a time by sampling from a probability distribution. The model doesn't always pick the most likely token; it samples at random according to the probabilities.
Input tokens (3): "The", "scientist", "discovered"
[Diagram: input tokens → LLM (a neural network with billions of parameters) → probability distribution over the next token]
Next Token Probabilities:

  that        35%
  a           28%
  new         18%
  how         10%
  evidence     6%
  proof        2%
  something    1%
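
To make the sampling step concrete, here is a minimal Python sketch that draws the next token from the distribution above. The tokens and probabilities are copied from this example; `random.choices` is just one standard way to do a weighted random draw.

```python
import random

# Next-token distribution from the example above (values taken
# straight from the demo; a real model derives these from logits).
next_token_probs = {
    "that": 0.35, "a": 0.28, "new": 0.18, "how": 0.10,
    "evidence": 0.06, "proof": 0.02, "something": 0.01,
}

tokens = list(next_token_probs)
weights = list(next_token_probs.values())

# Weighted random draw: "that" is most likely, but any token can win.
next_token = random.choices(tokens, weights=weights, k=1)[0]
print(next_token)  # often "that", sometimes "a", "new", ...
```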
Key Concepts
✓ Autoregressive: the model generates one token at a time, appending each sampled token to the input before predicting the next (see the sketch below)
✓ Probabilistic: the next token is sampled at random, so the top choice isn't always picked
✓ Context matters: the tokens generated so far shape the distribution over the next token
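
A compact sketch ties these three ideas together in one loop. The `fake_model` function below is a hypothetical stand-in for a real LLM: a real model would compute the distribution from the full context, whereas this one returns fixed probabilities purely for illustration.

```python
import random

def fake_model(context):
    """Hypothetical stand-in for an LLM: returns a distribution over
    candidate next tokens. A real model would condition these
    probabilities on `context`; here they are fixed for illustration."""
    vocab = ["that", "a", "new", "how", "evidence", "proof", "<eos>"]
    weights = [0.34, 0.28, 0.18, 0.10, 0.06, 0.02, 0.02]
    return vocab, weights

context = ["The", "scientist", "discovered"]
for _ in range(5):  # generate up to 5 tokens, one per step
    vocab, weights = fake_model(context)
    token = random.choices(vocab, weights=weights, k=1)[0]
    if token == "<eos>":  # a stop token would end generation early
        break
    context.append(token)  # the sampled token becomes part of the context

print(" ".join(context))
```

Because each sampled token is appended to `context` before the next step, different runs produce different continuations, which is exactly the behavior the demo visualizes.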