Restoring and attributing ancient texts using deep neural networks

Y. Assael, T. Sommerschield, et al.

Ithaca is a deep neural network for restoring ancient Greek inscriptions and attributing them to a place and date of origin. Developed by a team including Yannis Assael and Ion Androutsopoulos, it is designed to work alongside historians and improves the accuracy of their restorations.

Introduction
Epigraphy is the study of texts inscribed on durable materials. Such inscriptions survive in large numbers but are often fragmentary, displaced from their original findspots, and difficult to date, because the inorganic nature of their supports rules out radiocarbon dating. To contextualize an inscription, historians must perform three tasks: restore missing text, attribute the inscription to its place of origin, and assign a date of writing. Traditional approaches rely on expert memory, manual searches of digital corpora, stylistic criteria and dialectal features; they can be time-consuming and inconsistent, and they offer little probabilistic support for a conclusion. The authors propose Ithaca, a deep neural network designed to assist historians with textual restoration, geographical attribution, and chronological attribution of ancient Greek inscriptions dating from the seventh century BC to the fifth century AD. Greek epigraphy is highly variable and has rich digitized corpora, which makes it an ideal testbed. Ithaca is intended as a collaborative, interpretable decision-support tool that improves the accuracy, speed and reproducibility of epigraphic analysis.
Literature Review
Prior research on ancient texts has applied traditional machine learning to OCR and visual analysis, writer identification, text analysis, stylometrics and document dating. More recently, deep learning has been used for OCR, text analysis, machine translation of ancient texts, authorship attribution and decipherment. The closest prior work is Pythia (2019), a deep learning model for ancient text restoration. Subsequent work includes blank language models, applications to Babylonian and Korean restoration and translation, Latin BERT for multiple NLP tasks, and classification of cuneiform tablets by period. Ithaca extends this literature by jointly addressing restoration, geographical attribution, and chronological attribution with interpretable outputs. It emphasizes human–AI cooperation and demonstrates that hybrid human–model workflows can surpass both unaided humans and standalone models.
Methodology
Data: The authors constructed I.PHI from the Packard Humanities Institute (PHI) Greek inscriptions corpus (178,551 transcribed inscriptions). Processing steps included rendering the texts machine-actionable, normalizing epigraphic notations, de-duplicating (removing 9,441 duplicates), filtering out inscriptions under 50 characters, retaining editorial supplements (between square brackets), and representing unrestored or missing characters with hyphens ('-'), one per missing character. Each inscription carries a region label (84 ancient regions) and heterogeneous chronological metadata. An extended ruleset normalized dates across languages and formats, yielding well-defined date intervals for approximately 60% of inscriptions; the remaining 40% lacked usable chronological metadata. The resulting I.PHI contains 78,608 inscriptions (about 1.93× Pythia’s dataset).
Splits: Inscriptions with PHI IDs ending in 3 and 4 were held out as the test and validation sets, respectively.
Model architecture: Inputs comprise joint character and word embeddings plus trainable positional embeddings (maximum length 768 characters). The vocabulary includes all words appearing more than 10 times (35,884 words), with under-represented words mapped to an [unk] token. The torso is a stack of eight transformer decoder blocks inspired by BigBird. Each block uses sparse multihead attention with four heads (combining global, local and random attention patterns), together with residual connections and layer normalization, to handle long sequences efficiently.
Task heads: Three two-layer feedforward heads with softmax outputs handle (1) restoration (per-character predictions at missing positions), (2) geographical attribution (classification across 84 regions, using the first output embedding, t=1), and (3) chronological attribution (a categorical distribution over decades between 800 BC and AD 800, that is, 160 ten-year bins).
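Two of the conventions above, the split by final PHI ID digit and the decade binning used by the chronological head, are simple enough to sketch. The following Python is a minimal illustration, not the authors' code: the helper names `split_for` and `decade_bin` are hypothetical, and writing BC years as negative integers with no special treatment of the missing year zero is an assumption.

```python
# Hypothetical helpers for illustration; names are not taken from the Ithaca codebase.

MIN_YEAR = -800   # 800 BC as a negative year (assumption: year zero is not treated specially)
MAX_YEAR = 800    # AD 800
BIN_WIDTH = 10    # ten-year bins
NUM_BINS = (MAX_YEAR - MIN_YEAR) // BIN_WIDTH  # 1600 years / 10 = 160 bins


def split_for(phi_id: int) -> str:
    """Assign an inscription to a split from the last digit of its PHI ID."""
    last_digit = phi_id % 10
    if last_digit == 3:
        return "test"
    if last_digit == 4:
        return "validation"
    return "train"


def decade_bin(year: int) -> int:
    """Map a signed year to a decade-bin index in the range [0, NUM_BINS)."""
    if not MIN_YEAR <= year <= MAX_YEAR:
        raise ValueError(f"year {year} is outside the modelled range")
    # The upper bound (AD 800) is folded into the last bin.
    return min((year - MIN_YEAR) // BIN_WIDTH, NUM_BINS - 1)
```

Under these assumptions, an inscription dated to 421 BC lands in bin `decade_bin(-421)`, the bin covering 430 BC to 421 BC.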
For inscriptions with date ranges, the ground truth is distributed uniformly across the covered decades; precise dates map to a single decade bin.
Interpretability and interface: For restoration, the model outputs its top-20 ranked restoration hypotheses, and saliency maps highlight the input features that most influenced restoration and attribution. Geographical predictions are visualized as maps and bar charts; chronological outputs are presented as distributions over decades.
Evaluation: Four methods were compared: (a) expert historians restoring damaged texts using the training set for parallels, (b) historians aided by Ithaca’s top-20 suggestions, (c) a reimplementation of Pythia as a computational baseline for restoration, and (d) an onomastics baseline for attribution, based on the distribution of personal names. Metrics include character error rate for restoration (measured by masking 1–10 characters in undamaged text) and accuracy for region classification; dating performance is assessed by the deviation of predictions from the ground-truth ranges.
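The dating target and the dating metric described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, BC years are written as negative integers, the point prediction is taken as the mean of the decade distribution using bin midpoints, and the deviation is taken as zero inside the ground-truth interval and otherwise as the distance to the nearer bound.

```python
from typing import List

MIN_YEAR, MAX_YEAR, BIN_WIDTH = -800, 800, 10
NUM_BINS = (MAX_YEAR - MIN_YEAR) // BIN_WIDTH  # 160 decade bins


def _bin(year: int) -> int:
    """Decade-bin index for a signed year (hypothetical helper)."""
    if not MIN_YEAR <= year <= MAX_YEAR:
        raise ValueError(f"year {year} is outside the modelled range")
    return min((year - MIN_YEAR) // BIN_WIDTH, NUM_BINS - 1)


def target_distribution(start: int, end: int) -> List[float]:
    """Spread the ground truth uniformly over every decade touched by [start, end]."""
    if start > end:
        raise ValueError("start must not be later than end")
    lo, hi = _bin(start), _bin(end)
    mass = 1.0 / (hi - lo + 1)
    return [mass if lo <= i <= hi else 0.0 for i in range(NUM_BINS)]


def expected_year(probs: List[float]) -> float:
    """Mean of a decade distribution, using bin midpoints (an assumption)."""
    return sum(p * (MIN_YEAR + i * BIN_WIDTH + BIN_WIDTH / 2)
               for i, p in enumerate(probs))


def years_from_interval(predicted: float, start: int, end: int) -> float:
    """Deviation in years from the ground-truth interval; zero when inside it."""
    if predicted < start:
        return start - predicted
    if predicted > end:
        return predicted - end
    return 0.0
```

A precise date such as 421 BC yields a one-hot target, while a range such as 450 BC to 421 BC spreads the mass equally over three decade bins.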
Key Findings
- Restoration: Ithaca alone achieves 62% accuracy when restoring damaged texts. Unaided historians scored 25%; with Ithaca’s suggestions, their accuracy rose to 72%, demonstrating substantial human–AI synergy.
- Geographical attribution: 71% accuracy across 84 ancient regions.
- Chronological attribution: Predictions, read from the decade-distribution outputs, fall on average within 30 years of the ground-truth ranges.
- Case study on disputed Athenian decrees (three-bar sigma controversy): These texts were excluded from training. Ithaca’s dates independently aligned with the recent redatings rather than with the conventional higher dates. I.PHI labels were on average 27 years off the newer lower datings, whereas Ithaca’s predictions were on average only 5 years off. No prediction exceeded 433 BC; the average predicted date across the decrees was 421 BC.
- Interpretability: The model provides top-20 restorations and saliency maps, which facilitate expert verification and collaborative decision-making.
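The restoration scores above come from text with artificially hidden characters. A minimal sketch of that kind of scoring is given below. It is an illustration only: the function names are hypothetical, the hidden characters are assumed to form one contiguous span, and the character error rate is simplified to a position-by-position comparison, which is valid only because the number of missing characters is known.

```python
import random
from typing import List, Tuple

MISSING = "-"  # placeholder for one unrestored character


def mask_span(text: str, rng: random.Random,
              min_len: int = 1, max_len: int = 10) -> Tuple[str, int, int]:
    """Hide a random contiguous span; returns (masked_text, start, length)."""
    if len(text) < min_len:
        raise ValueError("text is shorter than the minimum span length")
    length = rng.randint(min_len, min(max_len, len(text)))
    start = rng.randint(0, len(text) - length)
    masked = text[:start] + MISSING * length + text[start + length:]
    return masked, start, length


def character_error_rate(predicted: str, truth: str) -> float:
    """Fraction of positions where a same-length prediction differs from the truth."""
    if not truth or len(predicted) != len(truth):
        raise ValueError("prediction and truth must be non-empty and of equal length")
    errors = sum(p != t for p, t in zip(predicted, truth))
    return errors / len(truth)


def top_k_hit(hypotheses: List[str], truth: str, k: int = 20) -> bool:
    """True if the correct restoration appears among the first k ranked hypotheses."""
    return truth in hypotheses[:k]
```

For example, `character_error_rate("δημοι", "δημωι")` gives 0.2, since one of the five characters is wrong, and `top_k_hit` mirrors the way a ranked list of 20 hypotheses can contain the correct restoration even when the first-ranked one is wrong.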
Discussion
The results demonstrate that Ithaca effectively addresses the core epigraphic tasks: restoring missing text, and attributing inscriptions across time and space. Crucially, the collaborative use of Ithaca with historians substantially increases restoration accuracy and can narrow broad or vague chronological intervals, improving historical precision and enabling relative datings of events. The geographically and chronologically attributed outputs, combined with interpretable probability distributions and saliency maps, enhance transparency and support scholarly reasoning. The successful redating of controversial Athenian decrees illustrates Ithaca’s potential to inform and reshape key methodological debates in ancient history, offering data-driven corroboration or challenges to traditional criteria (such as letterform-based dating). Overall, the model augments epigraphic workflows by improving speed, accuracy and reproducibility, and shows the broader value of human–AI cooperation in the humanities.
Conclusion
Ithaca is presented as the first integrated, interpretable deep learning system for epigraphic restoration, geographical attribution and chronological attribution. By improving the accuracy and speed of the epigrapher’s workflow and enabling collaborative human–AI analysis, it enhances the historical value of inscriptions and supports more holistic studies of epigraphic practices across the ancient world. The authors provide an open, publicly available interface (https://ithaca.deepmind.com) for researchers. The approach can be extended to other disciplines dealing with ancient texts (papyrology, numismatics, codicology) and to other languages, and it can incorporate additional metadata (e.g., images, stylometrics). Future directions include interactive, human-in-the-loop training paradigms to strengthen cooperative performance and extend applicability across domains.
Limitations
- Domain and data scope: The model is trained on ancient Greek inscriptions from the PHI corpus; generalization to other languages or corpora requires additional data and adaptation.
- Chronological metadata coverage: Only about 60% of PHI inscriptions yielded standardized date intervals; 40% lacked usable chronological labels, potentially limiting dating supervision and evaluation.
- Reliance on transcriptions: The system operates on interpretive transcriptions, not raw inscription images; orthographic and editorial conventions and normalization choices may influence outcomes. Image-based features (e.g., letterforms) are not integrated in this version.
- Evaluation constraints: Restoration accuracy is assessed via artificially masked characters in undamaged text, which may not fully capture real-world damage patterns.
- Geographic granularity: Attribution is to 84 predefined regions; finer-grained provenancing may require additional metadata and models.