Browsing by Author "Zhang, Lingming"

An Integrated Approach for Automated Software Debugging via Machine Learning and Big Code Mining
(2020-08) Li, Xia; Zhang, Lingming
Over the past decades, software systems have been widely adopted in almost all aspects of human life, and are making our lives more and more convenient. However, software systems also inevitably suffer from different faults (a.k.a. bugs), which can incur great loss of property and even life. Due to the huge code volume, manual debugging is often time-consuming and error-prone. This thesis presents a novel integrated approach for automated debugging that can help localize and detect different software faults. Specifically, fault localization (FL) can help localize the potential faulty location(s) when some test cases fail in a program, while API-misuse detection can help detect API-related bugs caused by API misuses without executing test cases. We seek to improve the effectiveness of fault localization and API-misuse detection by applying knowledge from various fields such as static and dynamic program analysis, machine learning/deep learning techniques, and mining big code repositories. In this dissertation, we propose two fault localization techniques and one API-misuse detection technique. The first fault localization technique, TraPT, is a learning-to-rank-based technique that combines transformed impact information extracted from mutation-based fault localization (MBFL) with coverage information extracted from spectrum-based fault localization (SBFL). The second fault localization technique, DeepFL, is the first deep-learning-based fault localization technique integrating various dynamic and static program features. Both fault localization techniques rely on high-quality test cases to capture the necessary program features, but not all software systems provide such tests, so fault localization techniques are not always applicable.
To address a broader range of debugging problems, we also propose an API-misuse detection technique called BiD3, based on the analysis of a large-scale set of historical bug-fixing commits (958,368 commits in total), which does not require the execution of test cases. Various experiments on the three techniques show promising effectiveness. For example, DeepFL can localize 213 faults within Top-1 out of 395 real faults, 53 more faults than the state-of-the-art technique (33.1% improvement). BiD3 can detect 360 real misuses in the latest Apache projects, and 57 misuses have been confirmed and fixed by developers.
Towards Faster Software Revision Testing
(2021-12-01) Chen, Lingchao; Huynh, Dung; Griffith, Todd; Liu, Cong; Yang, Wei; Zhang, Lingming
Software systems have become increasingly prevalent in all facets of our lives over the last few decades and play a critical role in modern living. They have a significant impact on the quality of our lives and provide tremendous convenience. However, software faults (also known as bugs) are unavoidable throughout the development of software systems; they can have a substantial negative impact on companies and result in significant losses. Numerous researchers have been working on this problem to test software systems during development and fix bugs after the software systems are established. However, due to the complexity of these systems, these approaches can be very time-consuming. For example, mutation testing is an important component of software testing that can be very powerful for evaluating the quality of a test suite, but it can be extremely time-consuming due to the large number of mutant executions. Also, Automated Program Repair (APR) techniques can reduce the human effort of software debugging by suggesting plausible patches for buggy programs. However, APR techniques need to repeatedly execute all the test suites to identify the plausible patches for the bugs being fixed. This process can be extremely costly. Therefore, it is essential to explore approaches that speed up the processes of software testing and debugging. In this dissertation, we aim to speed up software testing and debugging via faster software revision testing. The idea is to decrease the testing time between different revisions. We explored two scenarios in software testing during the evolution of software systems: mutation testing and behavioral backward incompatibilities (BBIs) detection. For the first study, we applied regression test selection (RTS) techniques to speed up mutation testing.
Our study showed that both file-level static and dynamic RTS could achieve efficient and precise mutation testing, providing practical guidelines for developers. For the second study, we developed a BBI detection technique called DeBBI, which can reduce the end-to-end testing time for detecting the first and average unique BBIs by 99.1% and 70.8% for JDK compared to naive cross-project BBI detection. Additionally, we detected 97 BBI bugs, including 19 that have been confirmed as previously unknown bugs. Lastly, we explored applying faster revision testing to patch validation in APR techniques to speed up software debugging. We treated every single patch as a revision to develop a unified on-the-fly patch validation framework, named UniAPR. Our study demonstrated that on-the-fly patch validation could often speed up state-of-the-art source-code-level APR by over an order of magnitude, enabling all existing APR techniques to explore a more extensive search space to fix more bugs in the near future.
Towards Improving Program Repair via Patch Repair Execution Information
(2022-05-01) Benton, Samuel; Zhang, Lingming; Marcus, Andrian; Kim, Jiyoung; Wei, Shiyi; Yang, Wei
Automated program repair is increasingly integral to the debugging process of real-world software systems. Despite decades of research, however, fault localization on real-world systems suffers from heavy imprecision and inaccuracy. Modern program repair is founded on fault localization as an initial step, so this imprecision feeds into the program repair process and ultimately harms program repair speed and effectiveness. Modern debugging techniques, such as automated program repair, often exclusively use fault localization computed at the beginning of software debugging, essentially ignoring the extensive dynamic information generated during the debugging process. As techniques develop and computer systems mature, automated program repair techniques must validate ever more patch candidates. This extended validation stage is so time-consuming that repair techniques must often compromise their repair effectiveness to boost performance. Many solutions have attempted to shorten patch validation, but such solutions are either resource-heavy or applicable only to specific types of debugging techniques. These issues pose challenging obstacles in the program repair process, ultimately impeding bug correction within software systems due to (1) imprecision in fault localization, (2) extremely time-consuming debugging, and (3) reduced repair scope. This dissertation addresses these issues by investigating the use of patch repair execution information inherent within automated program repair tools to automatically improve the program repair process without the need for external data or changes to the foundation of existing program repair tools.
Specifically, this research demonstrates the successful use of patch repair execution information collected from up to 17 recent state-of-the-art Java-based automated program repair tools to (1) improve fault localization accuracy and (2) automatically boost performance for automated program repair tools, with both achievements based on experimental data from the Defects4J 1.0.0 and Defects4J 1.2.0 benchmarks. This research makes numerous discoveries beneficial to software debugging and automated program repair. It introduces 4 fault localization techniques (UniDebug, UniDebug+, UniDebug++, and UniDebug+⋆) utilizing increasing levels of patch repair execution information. These variants improve upon commonly used spectrum-based fault localization by up to 63% at the method-level granularity and by far more (+5,200%) at the statement-level granularity. Additionally, this research introduces one patch validation technique, SeAPR. With minimal patch repair execution information, SeAPR reduces the number of patches validated by up to 80% and the length of patch validation by up to 56%. Moreover, utilizing additional patch repair execution information further boosts patch reductions in all observed tools. Overall, the research shows that (1) patch repair execution information contains information useful for greatly improving upon initial fault localization strategies in nearly every scenario, (2) patch repair execution information boosts learning-based fault localization techniques, (3) patch repair execution information can be used to reduce the number of patch executions needed to reach the first plausible patch, and (4) the potential benefits of patch repair execution information are independent of the effectiveness of the underlying program repair tool.