Go to the Allen AI Playground, pick the smallest available model, and ask it one question: What does cilantro taste like? Then pick the largest model and ask the same question. The difference between those two answers is a 20-minute lesson in how language models actually work.
Have students open playground.allenai.org on their own devices, or run it as a class demo on a projector.
Round 1 — small model: Select an older, lower-parameter model from the dropdown. Submit the prompt. Note what comes back: responses from small models tend to be short, repetitive, or circular — describing cilantro in terms of cilantro, producing thin or generic sentences, sometimes looping.
Round 2 — large model: Switch to the largest available model and submit the same prompt. The difference is usually striking. Larger models typically describe cilantro’s fresh, citrusy, herbal quality — and often mention the well-known genetic variation that makes it taste like soap to roughly 10% of people. That detail is a good marker of genuine knowledge depth.
Round 3 — model settings: With the large model selected, experiment with the available parameters: raise and lower the temperature to watch answers swing between predictable and erratic, cap the maximum output tokens to watch a response cut off mid-sentence, and adjust the sampling setting (often a top-p control) to widen or narrow the pool of words the model can draw from. A sketch of what temperature actually does follows below.
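For instructors who want to show what the temperature slider means underneath, here is a minimal sketch in plain Python. It is not the playground's own code: the vocabulary and scores are invented for illustration. But the mechanics (scale the model's raw scores by temperature, apply softmax, then take a weighted random draw) are how temperature-based sampling works in general.

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Sample one token index from raw model scores (logits).

    Lower temperature sharpens the distribution (more predictable);
    higher temperature flattens it (more varied, riskier).
    """
    # Scale logits by temperature, then apply softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to those probabilities.
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy scores for possible next words after "Cilantro tastes..."
# (invented numbers, not real model output)
vocab = ["fresh", "citrusy", "soapy", "purple"]
logits = [2.0, 1.5, 1.0, -2.0]

for t in (0.2, 1.0, 2.0):
    picks = [vocab[sample_with_temperature(logits, t)] for _ in range(10)]
    print(f"temperature={t}: {picks}")
```

Running it shows low temperature picking "fresh" almost every time, while high temperature occasionally lets an unlikely word like "purple" slip through, mirroring the shift students see in the playground.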
The cilantro question is an ideal demo because it has a concrete, sensory answer students can verify from their own experience — and because the soap-taste detail is a meaningful test of whether a model has encountered real knowledge about the world, or just plausible-sounding filler. The genetic variation angle also opens a brief but productive conversation about what “training data” actually means: the model knows about cilantro-as-soap because enough people wrote about it.
The parameter demo grounds abstract concepts — temperature, tokens, sampling — in something students can observe in real time rather than read about in a textbook. Most students arrive thinking AI is a single, fixed thing. This activity gives them a concrete counter-example in under 30 minutes.
The specific models available on the Allen AI Playground change over time — the contrast between small and large models is the point, not any particular model name. If the playground changes its interface or model lineup, the activity still works; just find the smallest and largest options available.
The activity works best when students generate responses on their own devices so they can see variation across the class — even the same prompt to the same model will produce slightly different output each time, which is itself worth discussing.
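To make that run-to-run variation concrete, here is a toy illustration (invented words and probabilities, not the playground's actual decoding): greedy decoding always returns the top word, so it is identical every run, while sampling, which playgrounds typically use by default, varies each time.

```python
import random

# Toy next-word distribution for the same prompt to the same model.
words = ["fresh", "citrusy", "soapy"]
probs = [0.5, 0.3, 0.2]  # invented probabilities for illustration

# Greedy decoding: always take the top word -> identical output every run.
greedy = words[probs.index(max(probs))]

# Sampling: a weighted random draw -> output varies run to run.
samples = [random.choices(words, weights=probs, k=1)[0] for _ in range(5)]

print("greedy (same every run):", greedy)
print("sampled (this run):", samples)
```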