In this paper we present a methodology based on distributional semantic models that can be flexibly adapted to the specific challenges posed by historical texts and that allows users to retrieve semantically relevant text without the need to close-read the documents. We focus on a case study concerned with detecting smell-related sentences in historical medical reports. We demonstrate a process for moving from generic domain label input to a more nuanced evaluation of the semantics of smell in a set of sentences extracted from this corpus, and then develop a machine learning technique for compounding scores on a variety of modelling parameters into more effective classifications.

Citation information

McGregor, S. and McGillivray, B. (2018). A Distributional Semantic Methodology for Enhanced Search in Historical Records: A Case Study on Smell. Proceedings of the 14th Conference on Natural Language Processing (KONVENS 2018), Vienna, Austria, September 19-21, 2018. Austrian Academy of Sciences Press.
