Automated Fine-grained Requirements-to-Code Traceability Link Recovery







Requirements traceability is not only necessary to support software maintenance activities such as acceptance testing and bug fixing, but is also mandatory in safety-critical domains. Since accurate or sufficient traces are rarely established during development, they often have to be retrieved after the fact, in a process called traceability link recovery. Traditionally, this has been formulated as a text retrieval problem, with many approaches improving over this basic formulation by leveraging additional sources of information. Unfortunately, after more than two decades of research, the performance and adoption of these approaches remain fairly low, and the lack of consistent granularity in trace formulation has been identified as one of the key problems. Requirements-to-code traceability and retrieval with line-of-code granularity remain two particularly difficult problems in this area. To mitigate these issues, this dissertation proposes recasting requirements-to-code traceability link recovery as a problem that can be addressed with an accurate, heuristic-based approach by focusing on a specific subset of requirements. Namely, we propose exploiting a certain type of business rule (data constraints) to enable more precise traceability link recovery with line-of-code granularity. The central idea enabling this research is the hypothesis that data constraints are not formulated or implemented in arbitrary ways, but rather follow a series of patterns. In this dissertation, we present empirical studies in support of this hypothesis, along with the techniques that it enables. Specifically, we qualitatively study a set of data constraints, discovering patterns in both their textual formulation and their implementation. These patterns are then used to develop an approach that not only allows the creation of traces with line-of-code granularity, but can also improve the performance of state-of-the-art approaches.
We expect that the presented insights and techniques will not only improve the state of the art in requirements-to-code traceability link recovery, but also enable new avenues of research in the field. The work also has applications in areas such as code reviews, automated test generation, and bug localization.
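
As an illustration of the traditional text-retrieval formulation mentioned above, the following minimal sketch ranks code artifacts by TF-IDF cosine similarity to a requirement. It is not the approach proposed in this dissertation. The function names, the example requirement, and the file names are all hypothetical, and the weighting scheme is one common smoothed TF-IDF variant chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens only."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [tok for tok in re.split(r"[^A-Za-z]+", spaced.lower()) if tok]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, using smoothed IDF."""
    counts_per_doc = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for counts in counts_per_doc for term in counts)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in counts.items()} for counts in counts_per_doc]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_artifacts(requirement, artifacts):
    """Rank artifacts (name -> source text) by similarity to the requirement."""
    names = list(artifacts)
    vectors = tfidf_vectors([requirement] + [artifacts[name] for name in names])
    query = vectors[0]
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, vectors[1:])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data constraint and code fragments, for illustration only.
requirement = "The password length must be at least 8 characters."
artifacts = {
    "ReportPrinter.java": "printer.print(report.getTitle());",
    "PasswordValidator.java": (
        "if (password.length() < MIN_PASSWORD_LENGTH) "
        "throw new InvalidPasswordException();"
    ),
}
ranked = rank_artifacts(requirement, artifacts)
```

In this sketch the retrieval unit is a whole code fragment, and the ranking depends entirely on shared vocabulary, which is the kind of limitation that motivates tracing at line-of-code granularity.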



Computer Science