Words mean different things depending on the time, the context, and the people using them (think of the meaning that 'tweet' has recently acquired on social media), which poses a challenge for any unstructured dataset. Computational research has made great advances in detecting meaning change in language from textual data, but so far it has neither addressed ancient languages nor engaged humanists, who can offer invaluable expertise in designing and validating these systems. This highly interdisciplinary project is the first to focus on an ancient language, and it has taken the first steps towards building computational models of meaning change that engage classicists.
Explaining the science
Bayesian computational models of meaning change (which infer temporal meaning representations as probability distributions over words) have been proposed to find meaning change in texts (Frermann & Lapata 2016). This project incorporates expert-driven knowledge (specifically on the genre of texts) to further improve the models.
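To make the idea of meanings as probability distributions over words concrete, here is a minimal toy sketch. It is not the model of Frermann & Lapata (2016) or the one developed in this project: the senses, the time periods, every number, and the function name `sense_posterior` are invented for illustration. Each sense is a distribution over context words, the prevalence of each sense depends on the time period, and Bayes' rule combines the two. A genre-aware variant could, under the same assumptions, index the prior by genre as well as by period.

```python
# Toy illustration only: all senses, periods, and probabilities are invented.

# Each sense of 'tweet' is a probability distribution over context words.
SENSE_WORD_PROBS = {
    "bird_sound": {"bird": 0.5, "sing": 0.3, "tree": 0.15, "post": 0.03, "online": 0.02},
    "social_media": {"bird": 0.05, "sing": 0.02, "tree": 0.03, "post": 0.5, "online": 0.4},
}

# How prevalent each sense is in a given time period (the temporal part).
SENSE_PRIOR_BY_PERIOD = {
    "1990s": {"bird_sound": 0.99, "social_media": 0.01},
    "2010s": {"bird_sound": 0.2, "social_media": 0.8},
}


def sense_posterior(context_words, period):
    """Return P(sense | context words, period) using Bayes' rule.

    The likelihood treats the context as a bag of independent words.
    Words unseen for a sense get a tiny probability instead of zero.
    """
    unnormalised = {}
    for sense, prior in SENSE_PRIOR_BY_PERIOD[period].items():
        likelihood = 1.0
        for word in context_words:
            likelihood *= SENSE_WORD_PROBS[sense].get(word, 1e-6)
        unnormalised[sense] = prior * likelihood
    total = sum(unnormalised.values())
    return {sense: value / total for sense, value in unnormalised.items()}


if __name__ == "__main__":
    # The same context is interpreted differently in different periods.
    for period in ("1990s", "2010s"):
        print(period, sense_posterior(["bird"], period))
```

In this sketch the same context word shifts towards the newer sense in the later period purely because the prior over senses has changed, which is the kind of temporal signal such models are designed to capture.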
Ancient Greek was chosen for several reasons. Its scholarship provides excellent validation data (we know the outcomes) and external knowledge bases (for example, on the genre of texts). Its words are particularly polysemous. High-quality transcribed texts are available (no need to correct OCR errors), which enables applications to born-digital texts. Greek forms its own branch of the Indo-European family, unlike Latin (whose descendants are the Romance languages) and English (Germanic); confounding factors from closely related languages do not apply to Greek, making this a more controlled environment and an ideal testbed for applications to modern languages.
- Build the first large-scale annotated corpus of Ancient Greek.
- Develop Bayesian learning models of meaning change that use genre information.
- Annotate Ancient Greek texts with semantic information and use them to evaluate the computational models.
- Disseminate the results in natural language processing and digital humanities venues.
This work is highly relevant to humanities scholarship on word and concept change. It can also support historical semantic search over large historical text collections, allowing users to look for words that had different meanings in different historical periods.
- Journal article: McGillivray, B., Hengchen, S., Lähteenoja, V., Palma, M., Vatri, A. (2019). A computational approach to lexical polysemy in Ancient Greek. Digital Scholarship in the Humanities. doi.org/10.1093/llc/fqz036
- Article "GASC: Genre-Aware Semantic Change for Ancient Greek" at the 1st International Workshop on Computational Approaches to Historical Language Change, ACL 2019. Authors: Valerio Perrone, Marco Palma, Simon Hengchen, Alessandro Vatri, Jim Q. Smith, and Barbara McGillivray.
- Poster "GASC: Genre-Aware Semantic Change for Ancient Greek" presented at the 1st International Workshop on Computational Approaches to Historical Language Change, ACL 2019 by Valerio Perrone, Simon Hengchen, and Barbara McGillivray.
- Diorisis Ancient Greek corpus released on Figshare https://doi.org/10.6084/m9.figshare.6187256.v1
- Article "The Diorisis Ancient Greek Corpus" published in Research Data Journal for the Humanities and Social Sciences, authors: Alessandro Vatri and Barbara McGillivray.
- Talk: "A computational approach to semantic change in post-Classical Greek" at the workshop “Beyond Standards: Attic, the Koiné and Atticism”, University of Cambridge (Dr Barbara McGillivray and Dr Alessandro Vatri)
- Talk: "A computational approach to lexical polysemy in Ancient Greek" at the workshop "Computational methods for literary-historical textual scholarship", Leicester, UK (Dr Barbara McGillivray and Dr Alessandro Vatri)
- Project ended
- Project started