Medieval Handwriting Recognition Workflow

Teaches
how to design a multi-model agentic pipeline · the difference between usable and citable transcription quality · trade-offs between cost, time, and accuracy at scale
You gain
building an agentic pipeline for bulk document processing · combining multiple LLMs to improve transcription accuracy · enabling full-text search of handwritten archival sources
You'll need
Gemini, Claude, Claude Code
Format
hours to set up; ~12hr/register · researcher

The government of Spain hosts a website called PARES, which contains over a million digitized images of archival documents — including much of the material in the Archive of the Crown of Aragon. By the fourteenth century, the Aragonese royal chancery produced thousands of pages of documentation every year. A single register might run to 300 folios of dense Gothic secretarial script. These documents have been digitized but never transcribed at scale.

The Workflow

An example of a folio from an ACA register, ACA CR R2053 f4r.

In early February 2026, uploading a PARES image to Gemini produced a better transcription than I had gotten from the specialized HTR platform Transkribus, even after training a custom model there on 60 documents of ground-truth transcriptions. In late February, I started combining results from Gemini and Claude to push transcription quality further.

By March, I was using agentic AI — specifically Claude Code — to obtain usable HTR and translations for entire registers. The pipeline runs as follows: Claude Code downloads images from PARES, passes each one to Gemini and Claude for parallel transcription, merges the outputs, and writes the result to a text file. A final pass combines all image-level text files into a single CSV.
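The merge-and-combine steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actual pipeline: the real merge logic inside Claude Code is not described in the text, so the line-level "keep agreements, flag disagreements" rule here is an assumption, as are the function names.

```python
import csv
import io

def merge_transcriptions(gemini_text: str, claude_text: str) -> str:
    """Naive line-level merge (assumption: the real pipeline's merge
    rule is unspecified). Lines both models agree on are kept as-is;
    disagreements keep the first model's reading and get flagged."""
    merged = []
    for g_line, c_line in zip(gemini_text.splitlines(), claude_text.splitlines()):
        if g_line.strip() == c_line.strip():
            merged.append(g_line)
        else:
            merged.append(g_line + "  [?]")  # flag for human review
    return "\n".join(merged)

def folios_to_csv(folios: dict[str, str]) -> str:
    """Final pass: combine per-image text into a single CSV,
    one row per folio, sorted by folio id."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["folio", "transcription"])
    for folio_id, text in sorted(folios.items()):
        writer.writerow([folio_id, text])
    return buf.getvalue()

# Example with stub model outputs:
merged = merge_transcriptions("Nos Petrus\nrex Aragonum", "Nos Petrus\nrex Aragonie")
table = folios_to_csv({"f4r": merged})
```

In practice each transcription call is a separate API request, so the two models can run in parallel per image and the flagged `[?]` lines give a rough map of where the transcriptions diverge.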

Register 1819 was the first complete register I processed. Register 2053 — the third — produced notably higher quality output, suggesting that prompt refinement and model improvements between February and March made a measurable difference.

Why It Matters

What to Watch For

It takes about 12 hours to generate transcriptions from a 300-page register, and API costs run approximately $75 per register. The resulting text enables discovery through full-text search but is not reliable enough for citation-level accuracy — dates in particular remain inconsistent even after pipeline refinements.
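A quick back-of-envelope calculation from the figures above (300 folios, ~12 hours, ~$75 per register) puts the per-folio cost in perspective:

```python
# Per-folio figures for a 300-folio register, using the
# reported totals: ~$75 in API costs and ~12 hours wall time.
folios = 300
cost_usd = 75.0
hours = 12.0

cost_per_folio = cost_usd / folios        # $0.25 per folio
minutes_per_folio = hours * 60 / folios   # 2.4 minutes per folio

print(f"${cost_per_folio:.2f} and {minutes_per_folio:.1f} min per folio")
```

At roughly a quarter and two and a half minutes per folio, the trade-off is clear: cheap enough to make whole registers searchable, but the output still needs human verification before anything in it is cited.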