Toward Practical Automatic Program Repair
Abstract
Automatic program repair (APR) is one of the recent advances in automated software engineering, aiming to reduce the burden of debugging by suggesting patches that either directly fix bugs or help programmers during manual debugging. Despite the remarkable progress of APR in the last decade, state-of-the-art techniques suffer from problems in three areas: scalability (handling large systems), applicability (handling different programming languages), and effective patch correctness assessment (combating the weak specification problem of test suites). These problems reduce the practicality of APR. In this dissertation, we take steps toward practical APR by proposing solutions that alleviate each of these problems.

To address scalability and applicability, we introduce and evaluate Java Virtual Machine (JVM) bytecode-level patch generation and validation, which enables (1) on-the-fly patch generation and validation and (2) uniform treatment of programs written in dozens of programming languages. By generating and validating patches on the fly, we bypass many expensive program transformation steps, making our technique more than 10X faster than the state of the art. This speed-up, in turn, allows our technique to explore more of the repair search space and find more genuine fixes than the state of the art, fixing 55 (out of 587) Defects4J bugs. We provide empirical evidence of the applicability of our technique to programming languages other than Java by applying it to 118 Kotlin bugs from the Defexts data set, of which 14 were fixed.

We also introduce and evaluate a technique for correctness assessment of automatically generated patches via both ranking and classification. Our technique is based on the observation that the buggy program is almost correct: fixing a bug typically involves small changes to the code and does not remove the code implementing the program's correct functionality. We therefore measure the impact of patches on both production code (via syntactic and semantic similarity) and test code (via code coverage) to identify patches that yield programs similar to the original and that do not delete desired program elements. We evaluated our technique on 1,290 patches generated by 29 Java-based APR systems for Defects4J programs. The technique outperforms state-of-the-art ranking and classification techniques: in 43% (66%) of the cases, it ranks the correct patch in the top-1 (top-2) position, and in classification mode it achieves an accuracy of 0.855 and an F1-score of 0.846.

We implement all of these ideas in an integrated framework named PRF, which can be used both as an APR tool and as a framework for developing novel research prototypes.
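As an illustration of the patch correctness assessment idea, the Java sketch below ranks candidate patches by how similar the patched program stays to the original buggy program. The similarity fields, the unweighted average that combines them, and the Patch/PatchRanker names are assumptions made for this sketch; they are not taken from the dissertation or from PRF.

import java.util.Comparator;
import java.util.List;

/**
 * Minimal sketch: rank candidate patches by their similarity to the
 * original (buggy) program, reflecting the intuition that correct
 * patches make small changes and do not delete desired functionality.
 * The measures and their unweighted average are illustrative assumptions.
 */
public class PatchRanker {

    /** A candidate patch with per-dimension similarity scores in [0, 1]. */
    record Patch(String id,
                 double syntacticSimilarity,   // similarity of the patched production code text
                 double semanticSimilarity,    // similarity of the patched program's behavior/structure
                 double coverageSimilarity) {  // overlap of test coverage before and after the patch

        /** Combined score (simple average; an assumption for this sketch). */
        double score() {
            return (syntacticSimilarity + semanticSimilarity + coverageSimilarity) / 3.0;
        }
    }

    /** Rank patches so that those closest to the buggy program come first. */
    static List<Patch> rank(List<Patch> candidates) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble(Patch::score).reversed())
                .toList();
    }

    public static void main(String[] args) {
        List<Patch> ranked = rank(List.of(
                new Patch("patch-A", 0.92, 0.88, 0.95),   // small, focused edit
                new Patch("patch-B", 0.40, 0.55, 0.60))); // large edit that removes functionality
        ranked.forEach(p -> System.out.printf("%s -> %.3f%n", p.id(), p.score()));
    }
}

In this toy setup, patch-A (the small, focused edit) ranks above patch-B, matching the abstract's premise that patches which keep the program close to its original form and preserve existing functionality are more likely to be correct.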