Abstract
Diachronic word embeddings trained on a 4.2-billion-word corpus of 19th-century British newspapers, using Word2Vec with the following parameters (a training sketch follows the list):
sg = True
min_count = 1
window = 3
vector_size = 200
epochs = 5
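For illustration, a minimal training sketch (not the exact script used to produce this dataset) using gensim's Word2Vec with the parameters listed above; the sentence iterable and file name are hypothetical:

from gensim.models import Word2Vec

# Hypothetical iterable of tokenised sentences (lists of strings) drawn
# from a single ten-year slice of the newspaper corpus.
sentences_1890s = [
    ["the", "railway", "company", "announced", "new", "fares"],
    ["a", "steam", "engine", "arrived", "at", "the", "station"],
]

model = Word2Vec(
    sentences=sentences_1890s,
    sg=1,              # skip-gram (sg = True above)
    min_count=1,       # keep every word type
    window=3,          # 3-word context window
    vector_size=200,   # 200-dimensional vectors
    epochs=5,          # 5 training passes
)
model.save("w2v_1890s.model")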
The embeddings are divided into ten-year time slices, with the vectors from each decade aligned to those of the most recent decade (1910s) using Orthogonal Procrustes.
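A minimal sketch of this alignment step, assuming each decade's vectors are loaded as gensim KeyedVectors; this is illustrative rather than the repository's exact implementation:

import numpy as np
from scipy.linalg import orthogonal_procrustes

def align_to_reference(source_kv, reference_kv):
    """Rotate source_kv (one decade) so its shared words best match reference_kv (1910s)."""
    shared = [w for w in source_kv.index_to_key if w in reference_kv.key_to_index]
    A = np.vstack([source_kv[w] for w in shared])      # source decade's matrix
    B = np.vstack([reference_kv[w] for w in shared])   # reference (1910s) matrix
    R, _ = orthogonal_procrustes(A, B)                 # orthogonal map minimising ||A R - B||_F
    source_kv.vectors = source_kv.vectors @ R          # rotate all source vectors into the 1910s space
    return source_kv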
The source code used to train the embeddings, along with tools for time-series analysis (e.g. change point detection), is available at: https://github.com/Living-with-machines/DiachronicEmb-BigHistData
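As an illustration of the kind of time-series analysis referred to above, the following sketch applies change point detection to a word's per-decade cosine distances using the ruptures library; it is a generic example, not the repository's own tooling, and the distance values are hypothetical:

import numpy as np
import ruptures as rpt

# Hypothetical per-decade cosine distances for one word, 1800s to 1910s,
# measured between its vectors in consecutive, aligned time slices.
distances = np.array([0.05, 0.06, 0.04, 0.07, 0.21, 0.24,
                      0.22, 0.25, 0.23, 0.26, 0.24, 0.25])

algo = rpt.Pelt(model="rbf").fit(distances)
breakpoints = algo.predict(pen=1)   # indices where the series shifts regime
print(breakpoints)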
Citation information
Pedrazzini, Nilo & Barbara McGillivray. (2022). Diachronic word embeddings from 19th-century British newspapers [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7181682