Multilingual Extractive Question Answering With Conflibert for Political and Social Science Studies

dc.contributor.advisor: Khan, Latifur
dc.contributor.committeeMember: Gogate, Vibhav
dc.contributor.committeeMember: Mazidi, Karen
dc.creator: Whitehead, Parker Madden 2001-
dc.date.accessioned: 2023-10-26T19:43:44Z
dc.date.available: 2023-10-26T19:43:44Z
dc.date.created: 2023-08
dc.date.issued: August 2023
dc.date.submitted: August 2023
dc.date.updated: 2023-10-26T19:43:45Z
dc.description.abstract: Political conflict and violence have emerged as prominent concerns for political scientists in both academia and policy circles. The overwhelming influx of complex and dense news makes it increasingly challenging to monitor and analyze political events effectively. To address this challenge and contribute to the advancement of conflict research, we introduce ConfliBERT English and ConfliBERT Spanish. These two domain-specific pre-trained language models are designed for the analysis of political conflict and violence, and have been fine-tuned for extractive question answering tasks, which are not susceptible to hallucination. The ConfliBERT models were pre-trained on our comprehensive conflict-specific corpus, drawn from diverse sources. To evaluate the performance of ConfliBERT on extractive question answering, we fine-tuned the models on SQuAD v1.1 and NewsQA, two large question-answering datasets. Additionally, we created ConfliQA English and ConfliQA Spanish, two crowd-sourced evaluation datasets for conflict-domain extractive QA. Through extensive experimentation and evaluation on all versions of ConfliBERT English and Spanish, we show that ConfliBERT English outperforms BERT English baseline models in analyzing political texts, and we provide detailed insight into further developing ConfliBERT for low-resource languages.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10735.1/9984
dc.language.iso: English
dc.subject: Artificial Intelligence
dc.subject: Information Science
dc.subject: Computer Science
dc.subject: Political Science, General
dc.title: Multilingual Extractive Question Answering With Conflibert for Political and Social Science Studies
dc.type: Thesis
dc.type.material: text
thesis.degree.college: School of Engineering and Computer Science
thesis.degree.department: Computer Science
thesis.degree.grantor: The University of Texas at Dallas
thesis.degree.name: MS
Files
Original bundle
Name: WHITEHEAD-PRIMARY-2023.pdf
Size: 680.35 KB
Format: Adobe Portable Document Format
License bundle
Name: proquest_license.txt
Size: 6.38 KB
Format: Plain Text
Name: license.txt
Size: 1.99 KB
Format: Plain Text