Transformer Based Model for Political Spanish Text

Date

May 2023

Abstract

Conflict research is a subfield of political science that covers protests, riots, repression, genocide, criminal violence, and related phenomena. Conflict researchers are interested in tracking and analyzing conflict events. Because of the large number of conflicts occurring across the globe, manually tracking and annotating them is laborious, so researchers use language models to automate the process. While domain-specific transformer-based language models have already been trained on English text, to the best of our knowledge no comparable work has been done on Spanish text. Spanish is one of the most widely spoken languages in the world and the medium through which many conflicts in Latin America are expressed, so we hypothesize that a model trained exclusively on Spanish text would outperform models trained on other languages. With this objective in mind, we first mine a domain-specific text corpus from various Spanish websites and then train a BERT-based model from scratch on that corpus. The model is then evaluated on downstream tasks using available datasets to assess its practical application in conflict research. Finally, we compare the performance of our model against several existing versions of BERT.
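The abstract does not state the pretraining objective. As a hedged illustration only, assuming the model follows the standard BERT masked-language-modeling (MLM) recipe, the sketch below shows the token-masking step used when training from scratch: roughly 15% of positions are selected for prediction, and of those, 80% are replaced with `[MASK]`, 10% with a random vocabulary token, and 10% are left unchanged. The function name `mask_tokens`, the whitespace tokenization, and the example sentence are hypothetical; a real pipeline would operate on subword IDs and exclude special tokens from masking.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "[MASK]"  # hypothetical mask symbol for this sketch


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Apply BERT-style MLM masking to a token sequence (sketch).

    Returns (inputs, labels): labels[i] holds the original token at
    positions selected for prediction and None everywhere else.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected; the model is not scored here
        labels[i] = tok  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_TOKEN  # 80% of selected: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    sentence = "la policía reprimió la protesta en la capital".split()
    vocab = sorted(set(sentence))
    inputs, labels = mask_tokens(sentence, vocab, rng=random.Random(0))
    print(inputs)
    print(labels)
```

Keeping some selected tokens unchanged or randomly replaced, rather than always using `[MASK]`, is the standard BERT choice for reducing the mismatch between pretraining inputs and fine-tuning inputs, which never contain `[MASK]`.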

Keywords

Computer Science