Transformer Based Model for Political Spanish Text

dc.contributor.advisor: Khan, Latifur
dc.contributor.committeeMember: Bastani, Farokh B.
dc.contributor.committeeMember: Wu, Weili
dc.creator: Zawad, Niamat
dc.date.accessioned: 2023-09-19T19:27:57Z
dc.date.available: 2023-09-19T19:27:57Z
dc.date.created: 2023-05
dc.date.issued: May 2023
dc.date.submitted: May 2023
dc.date.updated: 2023-09-19T19:27:58Z
dc.description.abstract: Conflict research is a subfield of political science that covers protests, riots, repression, genocide, criminal violence, and related phenomena. Conflict researchers are interested in tracking and analyzing conflict events. Because so many conflicts occur across the globe, manually tracking and annotating them is laborious, so researchers use language models to automate the process. Transformer-based language models for this purpose have already been trained on English text, but to the best of our knowledge, no such work has been done on Spanish text. Spanish is one of the most widely spoken languages in the world and is the language in which many Latin American conflicts are described, so a model trained exclusively on Spanish text would hypothetically outperform models based on other languages. With this objective in mind, a domain-specific text corpus is first mined from various Spanish websites, and a BERT-based model is then trained from scratch on this corpus. The model is evaluated on downstream tasks using several available datasets to assess its practical application in conflict research. Finally, we evaluate several versions of BERT and compare their performance with that of our model.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10735.1/9868
dc.language.iso: English
dc.subject: Computer Science
dc.title: Transformer Based Model for Political Spanish Text
dc.type: Thesis
dc.type.material: text
thesis.degree.college: School of Engineering and Computer Science
thesis.degree.department: Computer Science
thesis.degree.grantor: The University of Texas at Dallas
thesis.degree.name: MSCS

Files

Original bundle

Name: ZAWAD-PRIMARY-2023.pdf
Size: 3.22 MB
Format: Adobe Portable Document Format

License bundle

Name: proquest_license.txt
Size: 6.37 KB
Format: Plain Text

Name: license.txt
Size: 1.99 KB
Format: Plain Text