Pre-trained Spanish Language Model for Political Conflict and Violence
Abstract
Examining political conflict and violence remains a persistent challenge for the political science and policy communities, because monitoring it requires processing large amounts of text. To advance conflict research in Spanish-speaking societies, we introduce ConfliBERT Spanish, a domain-specific pre-trained language model tailored to the analysis of political conflict and violence in Spanish. Our method begins with the collection of a comprehensive domain-specific corpus from diverse sources, which is then used for language modeling. ConfliBERT Spanish is subsequently developed through continual pre-training. To evaluate its practical performance, we assembled 5 datasets and implemented 3 tasks on them. Through multiple experiments and evaluations on several versions of ConfliBERT Spanish, we show that it outperforms BERT baseline models in analyzing Spanish political conflict and violence.