Automated extraction of data constraints from software documentation

dc.contributor.advisor: Marcus, Andrian
dc.contributor.committeeMember: Wei, Shiyi
dc.contributor.committeeMember: Chung, Lawrence
dc.creator: Zhou, Ying 1998-
dc.date.accessioned: 2023-10-26T19:34:39Z
dc.date.available: 2023-10-26T19:34:39Z
dc.date.created: 2023-08
dc.date.issued: August 2023
dc.date.submitted: August 2023
dc.date.updated: 2023-10-26T19:34:39Z
dc.description.abstract: Data constraints encompass crucial business rules that specify the values allowed or required for the data used within a software system. These constraints are typically described in textual software artifacts (e.g., requirements and design documents, or user manuals). Previous research on data constraints in software focused on studying their implementation in the code, either to identify inconsistencies or to support their traceability. This thesis contributes to the existing knowledge by studying 548 data constraints described in the documentation of nine systems. We identified and documented 15 linguistic discourse patterns employed by stakeholders to describe data constraints in natural language. In an extensive study, we explore the use of the discourse patterns we discovered, along with linguistic elements and the operands of the data constraints and their types, as features for automatically classifying sentence fragments as data constraint descriptions. The best combination of features and learner achieves 70.87% precision and 59.73% recall (64.76% F1). The discoveries made in this thesis represent a significant advancement in the automated identification and extraction of data constraints from natural language text, which in turn is essential for automating the traceability of these constraints to code and for facilitating the generation of tests associated with them.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10735.1/9983
dc.language.iso: English
dc.subject: Computer Science
dc.title: Automated extraction of data constraints from software documentation
dc.type: Thesis
dc.type.material: text
thesis.degree.college: School of Engineering and Computer Science
thesis.degree.department: Computer Science
thesis.degree.grantor: The University of Texas at Dallas
thesis.degree.name: MSCS

Files

Original bundle

Name: ZHOU-PRIMARY-2023.pdf
Size: 535.78 KB
Format: Adobe Portable Document Format
