Abstract
Word embeddings trained on a 4.2-billion-word corpus of 19th-century British newspapers using Word2Vec with the following parameters:
- sg = True
- min_count = 5
- window = 5
- vector_size = 100
- epochs = 5
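The settings listed above can be passed to a Word2Vec trainer roughly as follows. This is a minimal sketch, not the project's actual training script: it assumes the gensim 4.x implementation of Word2Vec (the parameter names match that API, but the record does not name the library here), and `train_decade_model` and `W2V_PARAMS` are hypothetical names introduced for illustration.

```python
# Parameters as listed in the record. gensim documents sg as an integer flag:
# sg=1 selects skip-gram, which corresponds to "sg = True" above.
W2V_PARAMS = {
    "sg": 1,
    "min_count": 5,      # ignore words occurring fewer than 5 times
    "window": 5,         # context window of 5 tokens on each side
    "vector_size": 100,  # dimensionality of the word vectors
    "epochs": 5,         # passes over the corpus
}


def train_decade_model(sentences, params=W2V_PARAMS):
    """Train one Word2Vec model on an iterable of tokenised sentences.

    `sentences` is assumed to hold the text of a single decade, with each
    sentence given as a list of string tokens.
    """
    # Imported lazily so the parameter dict can be reused without gensim.
    from gensim.models import Word2Vec

    return Word2Vec(sentences=sentences, **params)
```

Under these assumptions, one model would be trained per ten-year slice, for example `model = train_decade_model(sentences_for_decade)`, after which the vectors are available through `model.wv`.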
The embeddings are divided into periods of ten years each. Unlike those in this repository, these were not aligned and OCR errors
See related GitHub repository for the full documentation: https://github.com/Living-with-machines/DiachronicEmb-BigHistData
Project website (Living with Machines): https://livingwithmachines.ac.uk/
Citation information
Nilo Pedrazzini. (2023). Decade-level Word2Vec models from automatically transcribed 19th-century newspapers digitised by the British