Cleaning and Reconciling Literary Historical Data with AI: Reflections from the STEMMA Project
Digital Humanities Initiative Talk Series
Date: 21 October 2025 (Tuesday)
Time: 4:00 pm (HKT)
Via Zoom
Speaker: Prof. Erin McCarthy, Professor of English Literature and Computational Humanities and the Principal Investigator of the STEMMA Project, University of Galway
Click here to register.
About the talk
The European Research Council-funded project “STEMMA: Systems of Transmitting Early Modern Manuscript Verse, 1475–1700” aims to build the first large-scale computational model of the circulation of English-language poetry. To do so, the STEMMA team has reconciled five of the most comprehensive sources of data about early modern poetic manuscripts. In this talk, Prof. McCarthy will describe the use of computational methods such as locality-sensitive hashing, cosine similarity, and LLM agents to assist with the cleaning and reconciliation of historical data. These methods allow us to strike a balance between working with “dirty” data and retaining evidence of the untidy state in which it was found. However, they still require significant computational effort and literary historical supervision. The talk will therefore reflect on the opportunities and challenges presented by such work and offer ideas about future directions.
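For readers unfamiliar with the methods named above, the following is a minimal sketch of one of them: cosine similarity over character trigram counts, used to match variant spellings of a first line across catalogues. It is not the STEMMA pipeline. The function names, the normalization rule, and the sample first lines are all assumptions chosen for illustration, and a real workflow would likely add locality-sensitive hashing to avoid comparing every pair.

```python
import math
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation and whitespace into single spaces.

    Hypothetical rule for comparison only; the original string is not modified,
    so the "untidy" source form can still be stored alongside the match.
    """
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def trigrams(text: str) -> Counter:
    """Count overlapping character trigrams of the normalized text."""
    s = normalize(text)
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two trigram count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_match(query: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate most similar to the query, with its score."""
    q = trigrams(query)
    return max(((c, cosine(q, trigrams(c))) for c in candidates), key=lambda p: p[1])


# Illustrative first lines only; not drawn from the project's data.
catalogue = [
    "Go and catch a falling star",
    "Come live with me and be my love",
    "Busy old fool, unruly sun",
]
match, score = best_match("Goe and catche a falling starre", catalogue)
print(match, round(score, 2))
```

A score threshold for accepting a match would still need to be set and checked by someone with literary historical expertise, which is the kind of supervision the abstract describes.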
About the speaker
Erin A. McCarthy is Established Professor of English Literature and Computational Humanities and the Principal Investigator of the European Research Council-funded project “STEMMA: Systems of Transmitting Early Modern Manuscript Verse, 1475–1700” at the University of Galway. She is the author of Doubtful Readers: Print, Poetry, and the Reading Public (Oxford University Press, 2020), which was named an Outstanding Academic Title by CHOICE and won the 2020 John Donne Society Award for Distinguished Publication. She is currently completing two monographs: a jointly authored monograph about the findings of the RECIRC project, “The Reception and Circulation of Early Modern Women’s Writing in Manuscript Miscellanies, 1550–1700,” with Marie-Louise Coolahan and Sajed Chowdhury, and a sole-authored monograph called “Interpreting Early Modern Manuscripts: Towards a New Methodology.” Her scholarship has also appeared in the John Donne Journal, SEL: Studies in English Literature 1500–1900, the Review of English Studies, Criticism, and Reformation.