Ask AI for a reading list on a focused scholarly topic. Then start verifying citations together as a class. Some will be real. Some will be distorted. Some will be entirely fabricated while sounding perfectly plausible.
Choose a topic narrow enough to sound scholarly but broad enough that students will not already know the literature by heart. Ask AI for eight to ten key books and articles. Then have students track each citation across library catalogs, publisher pages, journal databases, and Google Scholar.
The exercise is not meant to turn library instruction into a gotcha moment about AI. Instead, it uses AI’s fluent mistakes to make source evaluation feel concrete and consequential, a recognizable part of expert work. It also opens a conversation about how AI works, why confabulation happens, and what particular errors might reveal about training data.
It works both individually and as a group exercise in which each team verifies two or three citations and reports back. The room usually ends up with a mix of confirmed sources, half-right sources, and fully invented ones.
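What to verify: whether each work exists at all; whether its author, full title, journal or publisher, and year are correct; and whether the one-sentence note accurately describes the source.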
Prompt
Give me a reading list of 8 to 10 important scholarly works on [topic]. Include author, full title, journal or publisher, year, and a one-sentence note about why each source matters.
Unlike more abstract conversations about hallucination, this exercise gives students a task with a clear answer. Either the source exists or it does not. Either the metadata is right or it is not. That clarity makes it a strong early-semester exercise in classes that involve research papers, annotated bibliographies, or historiographic review.
Once students start finding errors, the conversation usually shifts from “AI makes mistakes” to the more useful question: why are we so easily persuaded by the look and tone of correctness?
AI tools are getting better at avoiding fabrications when they are explicitly asked to verify sources. That creates another useful teaching moment: ask the tool to explain precisely where its citations came from.
A follow-up prompt
Verify each citation for accuracy and tell me precisely how you verified or generated these citations.
The responses often show students how a model can stitch together nearby ideas, titles, authors, and publishing patterns into something that feels plausible but has no real source behind it.
Generate a sample bibliography or two before class. AI tools change quickly, and some topics and sources are less error-prone than others. Vaguer prompts tend to produce looser bibliographies.
Frame it carefully: the lesson is not simply “AI is bad because it makes mistakes.” The lesson is to think about the nature of LLMs, about how different prompts yield different levels of precision, and about verification as a crucial scholarly habit that defines expertise.