tBERT – Topic Models and BERT Joining Forces for Semantic Similarity Detection

Abstract

Semantic similarity detection is a fundamental task in natural language understanding. Adding topic information has been useful for previous feature-engineered semantic similarity models as well as neural models for other tasks. There is currently no standard way of combining topics with pretrained contextual representations such as BERT. We propose a novel topic-informed BERT-based architecture for pairwise semantic similarity detection and show that our model improves performance over strong neural baselines across a variety of English language datasets. We find that the addition of topics to BERT helps particularly with resolving domain-specific cases.
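The abstract does not spell out how topics and BERT are combined. The sketch below shows one simple way to do it under stated assumptions, not the paper's exact architecture. It assumes the sentence pair has already been encoded into a single BERT vector (for example the [CLS] output), and that each sentence comes with a topic distribution from a topic model. It concatenates these three vectors and passes them through a hidden layer and a two-class softmax. All numbers and weights are hypothetical placeholders, and the tanh activation is an assumption. Pure Python is used so the example runs without a deep learning framework.

```python
import math
from typing import List, Sequence


def concat_features(cls_vec: Sequence[float],
                    topics_a: Sequence[float],
                    topics_b: Sequence[float]) -> List[float]:
    """Join the pair representation with both sentences' topic distributions."""
    return list(cls_vec) + list(topics_a) + list(topics_b)


def dense(x: Sequence[float],
          weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one output per row of `weights`."""
    for row in weights:
        if len(row) != len(x):
            raise ValueError("weight row length must match input length")
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_similarity(cls_vec, topics_a, topics_b,
                       hidden_w, hidden_b, out_w, out_b) -> List[float]:
    """Hypothetical topic-informed classifier head over a BERT pair vector."""
    features = concat_features(cls_vec, topics_a, topics_b)
    hidden = [math.tanh(h) for h in dense(features, hidden_w, hidden_b)]  # assumed activation
    return softmax(dense(hidden, out_w, out_b))  # two-class probabilities


# Toy inputs: a 4-dim stand-in for a BERT pair vector and 3-topic distributions.
cls_vec = [0.2, -0.1, 0.4, 0.0]
topics_a = [0.7, 0.2, 0.1]
topics_b = [0.6, 0.3, 0.1]
hidden_w = [[0.1] * 10, [-0.1] * 10]  # 2 hidden units over 4 + 3 + 3 features
hidden_b = [0.0, 0.0]
out_w = [[1.0, 0.0], [0.0, 1.0]]
out_b = [0.0, 0.0]

probs = predict_similarity(cls_vec, topics_a, topics_b,
                           hidden_w, hidden_b, out_w, out_b)
# For these toy weights, probs[0] > probs[1] and the two values sum to 1.
```

Concatenation keeps the pretrained pair representation unchanged and lets the classifier weight the topic signal. This is one way the topic information could help with domain-specific vocabulary that the pair vector alone may not resolve.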

Citation information

Nicole Peinelt, Dong Nguyen, Maria Liakata (2020): tBERT – Topic Models and BERT Joining Forces for Semantic Similarity Detection. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pages 7047-7055.
