Digital Humanities Initiative Talk Series: Who Did What: The Challenges of Multilingual Text Analysis, Grammar, and Copyright

25 September 2023

Date: 25 September 2023 (Mon)
Time: 12 noon (HK time)
On Zoom
Language: English


Speaker:
Quinn Dombrowski,
Academic Technology Specialist in Literatures, Cultures, and Languages
Center for Interdisciplinary Digital Research
Stanford University

Computational text analysis has long made it possible to ask certain literary questions at large scale. Methods such as topic modeling, stylometry, principal component analysis on word frequencies, and word vectors can be used to find thematic and topical patterns across large corpora, and to answer questions that would be impossible to approach using traditional means. One reason digital humanities has had a relatively limited impact on the field of literary analysis is that those large-scale questions resonate less well with the field more broadly. Within the last five years, advances in natural-language processing (NLP) techniques have made it possible to do meaningful computational work at a scale similar to that used by more traditional scholars: analyzing a text at the level of the sentence and tracking a character’s description or actions in detail. Shifting the scale of reading that computational methods can support, from broad themes across large corpora to individual character actions, has significant potential for bringing digital humanists into closer dialogue with traditional literary scholars. The exact nature of what can be analyzed, however, depends to a great extent on the language of the text and on which markers (e.g. politeness, gender, time) are explicitly encoded in its grammar.

This talk will take as its starting point work on distinctive character verbs done through a Stanford Literary Lab project on Star Wars novels. It will explore how tools like spaCy and David Bamman’s BookNLP allow us to computationally pursue questions with greater crossover interest for traditional literary studies, as well as the affordances of different languages when gathering evidence for a traditional literary studies argument. Finally, it will touch on the evolving landscape of copyright law and fair dealing in Hong Kong and the US as it relates to acquiring texts in the form needed for this kind of analysis.

About the speaker:

Quinn Dombrowski (non-binary, any pronouns are fine) is the Academic Technology Specialist in the Division of Literatures, Cultures, and Languages, and in the Library, at Stanford University. Prior to coming to Stanford in 2018, Quinn’s many DH adventures included supporting the high-performance computing cluster at UC Berkeley, running the DiRT tool directory with support from the Mellon Foundation, writing books on Drupal for Humanists and University of Chicago library graffiti, and working on the program staff of Project Bamboo, a failed digital humanities cyberinfrastructure initiative. Quinn has a BA/MA in Slavic Linguistics from the University of Chicago, and an MLIS from the University of Illinois at Urbana-Champaign. Since coming to Stanford, Quinn has supported numerous non-English DH projects, taught courses on non-English DH, started a Textile Makerspace, developed a tabletop roleplaying game to teach DH project management, explored trends in multilingual Harry Potter fanfic, and started the Data-Sitters Club, a feminist DH pedagogy and research group focused on Ann M. Martin’s ’90s girls’ series “The Baby-Sitters Club”. Quinn is currently co-VP of the Association for Computers and the Humanities, along with Roopika Risam, and advocates for better support for DH in languages other than English.