An Integrated Approach for Automated Software Debugging via Machine Learning and Big Code Mining







Over the past decades, software systems have been widely adopted in almost every aspect of human life and are making our lives increasingly convenient. However, software systems also inevitably suffer from various faults (a.k.a. bugs), which can cause great loss of property and even of lives. Because of the huge volume of code, manual debugging is often time-consuming and error-prone. This thesis presents a novel integrated approach to automated debugging that helps localize and detect different kinds of software faults. Specifically, fault localization (FL) helps localize the potentially faulty location(s) when some test cases of a program fail, while API-misuse detection helps detect bugs caused by API misuses without executing any test cases. We seek to improve the effectiveness of fault localization and API-misuse detection by applying knowledge from various fields, such as static and dynamic program analysis, machine learning/deep learning techniques, and the mining of big code repositories. In this dissertation, we propose two fault localization techniques and one API-misuse detection technique. The first fault localization technique, TraPT, is a learning-to-rank-based technique that combines transformed impact information extracted from mutation-based fault localization (MBFL) with coverage information extracted from spectrum-based fault localization (SBFL). The second fault localization technique, DeepFL, is the first deep-learning-based fault localization technique, integrating various dynamic and static program features. Both fault localization techniques rely on high-quality test cases to capture the necessary program features, but not all software systems provide such tests, so fault localization techniques are not always applicable.
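To make the two kinds of fault localization evidence concrete, the following Python sketch computes an SBFL score from coverage spectra using the Ochiai formula and merges it with a per-statement MBFL score through a weighted sum. Ochiai is one widely used SBFL formula, chosen here as an assumption; this abstract does not say which formulas TraPT uses. The test names, statement IDs, MBFL values, and hand-picked weights are all hypothetical. TraPT learns its combination via learning-to-rank instead of using fixed weights, and its transformed MBFL features are richer than the single number used here, so this is only a minimal illustration, not the thesis's method.

```python
import math
from typing import Dict, List, Set, Tuple


def ochiai(ef: int, ep: int, total_failed: int) -> float:
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep)).

    ef: number of failing tests that cover the statement
    ep: number of passing tests that cover the statement
    """
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom > 0 else 0.0


def sbfl_scores(coverage: Dict[str, Set[str]], failing: Set[str]) -> Dict[str, float]:
    """Compute an Ochiai score for every statement covered by at least one test."""
    statements = set().union(*coverage.values())
    scores: Dict[str, float] = {}
    for stmt in statements:
        ef = sum(1 for t, cov in coverage.items() if stmt in cov and t in failing)
        ep = sum(1 for t, cov in coverage.items() if stmt in cov and t not in failing)
        scores[stmt] = ochiai(ef, ep, len(failing))
    return scores


def rank(features: Dict[str, List[float]], weights: List[float]) -> List[Tuple[str, float]]:
    """Rank statements by a weighted sum of their features, highest first.

    A learning-to-rank model would learn `weights` from programs with known
    faults; here they are fixed by hand purely for illustration (assumption).
    """
    scored = [(s, sum(w * f for w, f in zip(weights, fs))) for s, fs in features.items()]
    # Sort by descending score; break ties by statement ID for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical toy program: three statements, three tests, one failing test.
coverage = {
    "t1": {"s1", "s2", "s3"},
    "t2": {"s1", "s3"},
    "t3": {"s1", "s2"},
}
failing = {"t1"}

# Hypothetical per-statement MBFL scores (e.g., how strongly mutating the
# statement affects the outcome of failing tests); the values are made up.
mbfl = {"s1": 0.0, "s2": 0.8, "s3": 0.2}

sbfl = sbfl_scores(coverage, failing)
features = {s: [sbfl[s], mbfl[s]] for s in sorted(sbfl)}
ranking = rank(features, weights=[0.5, 0.5])

for stmt, score in ranking:
    print(f"{stmt}: {score:.3f}")  # prints s2: 0.754, s3: 0.454, s1: 0.289
```

With these toy inputs, SBFL alone ties `s2` and `s3` (both score 1/sqrt(2)), and the MBFL feature breaks the tie in favor of `s2`. This illustrates why combining coverage and mutation evidence can sharpen a ranking.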
To address a broader range of debugging problems, we also propose an API-misuse detection technique called BiD3, which is based on the analysis of a large number of historical bug-fixing commits (958,368 commits in total) and does not require executing test cases. Experiments on the three techniques show promising effectiveness. For example, DeepFL localizes 213 of 395 real faults within Top-1, 53 more than the state-of-the-art technique (a 33.1% improvement). BiD3 detects 360 real misuses in the latest Apache projects, 57 of which have been confirmed and fixed by developers.
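The 33.1% figure follows directly from the reported counts: the state-of-the-art technique localizes 213 - 53 = 160 faults within Top-1, so the relative improvement of DeepFL is

```latex
\frac{213 - 160}{160} = \frac{53}{160} \approx 0.331 = 33.1\%
```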



Software engineering, Debugging in computer science, Computer software -- Testing, Computer programs -- Testing, Machine learning


©2020 Xia Li. All rights reserved.