Pre-trained Spanish Language Model for Political Conflict and Violence

Date

August 2023

Abstract

Examining political conflict and violence remains a persistent challenge for the political science and policy communities, because monitoring them requires processing large volumes of text. To advance conflict research in Spanish-speaking societies, we introduce ConfliBERT Spanish, a domain-specific pre-trained language model tailored to the analysis of political conflict and violence in Spanish. We first collect a comprehensive domain-specific corpus from diverse sources and use it for language modeling. We then develop ConfliBERT Spanish through continual pre-training. To evaluate its practical performance, we assembled 5 datasets and implemented 3 tasks on them. Across multiple experiments and evaluations of several versions of ConfliBERT Spanish, we show that it outperforms BERT baseline models in analyzing Spanish political conflict and violence.
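The abstract does not specify the pre-training objective. As a minimal sketch, assuming continual pre-training uses BERT's standard masked language modeling (MLM) objective, the function below applies BERT's masking rule to one sequence of token ids: roughly 15% of non-special tokens are selected for prediction, and of those, 80% are replaced by the mask token, 10% by a random token, and 10% are left unchanged. `MASK_ID`, the example token ids, and the helper `mask_tokens` are hypothetical illustrations, not values or code from ConfliBERT Spanish.

```python
import random
from typing import List, Set, Tuple

MASK_ID = 4    # hypothetical id of the [MASK] token (assumption, not from the source)
IGNORE = -100  # label value meaning "no loss is computed at this position"


def mask_tokens(token_ids: List[int], vocab_size: int, special_ids: Set[int],
                mask_prob: float = 0.15, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels) for one MLM training example using BERT's 80/10/10 rule."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Special tokens (e.g. [CLS], [SEP]) are never selected for prediction.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID                   # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    # Hypothetical ids: 2 and 3 stand in for [CLS] and [SEP].
    ids = [2, 120, 845, 77, 3001, 56, 3]
    inputs, labels = mask_tokens(ids, vocab_size=31002, special_ids={2, 3}, mask_prob=0.5)
    print(inputs)
    print(labels)
```

The selected positions carry the original token as the label, so the loss is computed only where the model has to reconstruct text, which is how a general model is adapted to a domain corpus under this assumed objective.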

Keywords

Computer Science
