Early Slavic language models

Abstract

Word embeddings trained on the lemmatised TOROT Treebank, using Word2Vec and the following parameters:

sg = True
min_count = <1,3,5>
window = <3,5>
vector_size = <100,200,300>
epochs = 5

One model was trained for each combination of the parameter values enclosed in angle brackets (< >), giving 3 × 2 × 3 = 18 models in total.
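
The parameter grid above can be enumerated as in the sketch below. It only builds one keyword-argument dictionary per model and does not train anything. The names follow gensim 4.x `Word2Vec` conventions (`sg`, `min_count`, `window`, `vector_size`, `epochs`). That gensim was used, and the helper name `parameter_grid`, are assumptions for illustration; the original training script is not part of this description.

```python
from itertools import product

# Settings shared by every model (gensim 4.x keyword names; assumed).
FIXED = {"sg": 1, "epochs": 5}  # sg=1 selects skip-gram (sg = True above)

# Settings varied across models: the values listed in angle brackets.
GRID = {
    "min_count": [1, 3, 5],
    "window": [3, 5],
    "vector_size": [100, 200, 300],
}


def parameter_grid(fixed, grid):
    """Yield one keyword-argument dict per combination of grid values (hypothetical helper)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield {**fixed, **dict(zip(keys, values))}


configs = list(parameter_grid(FIXED, GRID))
print(len(configs))  # 18
```

Under these assumptions, each dictionary could be passed as `Word2Vec(sentences=..., **config)`, where `sentences` is an iterable of lemma lists drawn from the treebank.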

The release contains both the full models (.model) and the plain vector files (_vectors.txt). The models are named according to the parameters they were trained with.
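
The full `.model` files can presumably be loaded with gensim's `Word2Vec.load(path)`, and the `_vectors.txt` files with `KeyedVectors.load_word2vec_format(path, binary=False)`. The stdlib-only sketch below reads a vector file without gensim. It *assumes* that `_vectors.txt` uses the word2vec text format, meaning a header line `<vocabulary size> <dimensions>` followed by one token per line with its space-separated components. This format is inferred, not documented here, so check the first lines of a file before relying on it. The tokens and values in the usage example are toy data, not taken from the models.

```python
import io
import math


def read_word2vec_text(lines):
    """Parse word2vec text format: header 'N DIM', then 'token v1 ... vDIM' per line."""
    it = iter(lines)
    n_words, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        token, values = parts[0], parts[1:]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {token!r}, got {len(values)}")
        vectors[token] = [float(v) for v in values]
    if len(vectors) != n_words:
        raise ValueError(f"header says {n_words} words, found {len(vectors)}")
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def most_similar(vectors, word, topn=5):
    """Rank all other tokens by cosine similarity to `word`."""
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy file contents standing in for a real _vectors.txt file.
sample = io.StringIO("3 2\nalpha 1.0 0.0\nbeta 0.8 0.6\ngamma 0.0 1.0\n")
vectors = read_word2vec_text(sample)
print(most_similar(vectors, "alpha", topn=2))  # beta (≈ 0.8) ranks above gamma (0.0)
```

For a real file, replace `sample` with `open(path, encoding="utf-8")`.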

Note that these models are the result of very preliminary experiments. No systematic evaluation of their quality has been carried out, so use them with caution.

Citation information

Nilo Pedrazzini. (2023). Early Slavic language models [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8414137
