Software Bugs and Fixes

Detecting Build Conflicts in Software Merge for Java Programs via Static Analysis (ASE22)

In software merge, the edits from different branches can textually overlap (i.e., textual conflicts) or cause build and test errors (i.e., build and test conflicts), jeopardizing programmer productivity and software quality. Existing tools primarily focus on textual conflicts; few tools detect higher-order conflicts (i.e., build and test conflicts). However, existing detectors of build conflicts are limited. Due to their heavy usage of automatic build, current detectors (e.g., Crystal) only report build errors instead of identifying the root causes; de- velopers have to manually locate conflicting edits. These detectors only help when the branches-to-merge have no textual conflict.

We present a new static analysis-based approach Bucond (“build conflict detector”). Given three code versions in a merging scenario: base b, left l, and right r, Bucond models each version as a graph, and compares graphs to extract entity-related edits (e.g., class renaming) in l and r. We believe that build conflicts occur when certain edits are co-applied to related entities between branches. Bucond realizes this insight via pattern matching to identify any cross-branch edit combination that can trigger build conflicts (e.g., one branch adds a reference to field F while the other branch removes F). We systematically explored and devised 57 patterns, covering 97% of the build conflicts in our experiments. Our evaluation shows Bucond to complement build-based detectors, as it (1) detects conflicts with 100% precision and 88%–100% recall, (2) locates conflicting edits, and (3) works well when those detectors do not.

A Characterization Study of Merge Conflicts in Java Projects (TOSEM22)

In collaborative software development, programmers create software branches to add features and ix bugs tentatively, and then merge branches to integrate edits. When edits from diferent branches textually overlap (i.e., textual conlicts) or lead to compilation and runtime errors (i.e., build and test conlicts), it is challenging for developers to remove such conlicts. Prior work proposed tools to detect and solve conlicts. They investigate how conlicts relate to code smells and the software development process. However, many questions are still not fully investigated, such as what types of conlicts exist in real-world applications and how developers or tools handle them. For this paper, we used automated textual merge, compilation, and testing to reveal 3 types of conlicts in 208 open-source repositories: textual conlicts, build conlicts (i.e., conlicts causing build errors), and test conlicts (i.e., conlicts triggering test failures). We manually inspected 538 conlicts and their resolutions to characterize merge conlicts from diferent angles.

Our analysis revealed three interesting phenomena. First, higher-order conlicts (i.e., build and test conlicts) are harder to detect and resolve, while existing tools mainly focus on textual conlicts. Second, developers manually resolved most higher-order conlicts by applying similar edits to multiple program locations; their conlict resolutions share common editing patterns implying great opportunities for future tool design. Third, developers resolved 64% of true textual conlicts by keeping complete edits from either a left or right branch. Unlike prior studies, our research for the irst time thoroughly characterizes three types of conlicts, with a special focus on higher-order conlicts and limitations of existing tool design. Our work will shed light on future research of software merge.

A Lightweight Approach of Human-Like Playtest for Android Apps (SANER22)

A playtest is the process in which testers play video games for software quality assurance. Manual testing is expensive and time-consuming, especially when there are many mobile games to test and every game version requires extensive testing. Current testing frameworks (e.g., Android Monkey) are limited as they adopt no domain knowledge to play games. Learning-based tools (e.g., Wuji) require tremendous manual effort and ML expertise of developers.

This paper presents LIT—a lightweight approach to generalize playtest tactics from manual testing, and to adopt the tactics for automatic testing. LIT has two phases: tactic generalization and tactic concretization. In Phase I, when a human tester plays an Android game G for a while (e.g., eight minutes), LIT records the tester’s inputs and related scenes. Based on the collected data, LIT infers a set of context-aware, abstract playtest tactics that describe under what circumstances, what actions can be taken. In Phase II, LIT tests G based on the generalized tactics. Namely, given a randomly generated game scene, LIT tentatively matches that scene with the abstract context of any inferred tactic; if the match succeeds, LIT customizes the tactic to generate an action for playtest. Our evaluation with nine games shows LIT to outperform two state-of-the-art tools and a reinforcement learning (RL)-based tool, by covering more code and triggering more errors. LIT complements existing tools and helps developers test various casual games (e.g., match3, shooting, and puzzles).

HERO: On the Chaos When PATH Meets Modules (ICSE21) ACM SIGSOFT Distinguished Paper Award

Ever since its first release in 2009, the Go programming language (Golang) has been well received by software communities. A major reason for its success is the powerful support of library-based development, where a Golang project can be conveniently built on top of other projects by referencing them as libraries. As Golang evolves, it recommends the use of a new library-referencing mode to overcome the limitations of the original one. While these two library modes are incompatible, both are supported by the Golang ecosystem. The heterogeneous use of library-referencing modes across Golang projects has caused numerous dependency management (DM) issues, incurring reference inconsistencies and even build failures. Motivated by the problem, we conducted an empirical study to characterize the DM issues, understand their root causes, and examine their fixing solutions. Based on our findings, we developed HERO, an automated technique to detect DM issues and suggest proper fixing solutions. We applied HERO to 19,000 popular Golang projects. The results showed that HERO achieved a high detection rate of 98.5% on a DM issue benchmark and found 2,422 new DM issues in 2,356 popular Golang projects. We reported 280 issues, among which 181 (64.6%) issues have been confirmed, and 160 of them (88.4%) have been fixed or are under fixing. Almost all the fixes have adopted our fixing suggestions.

Automatic Method Change Suggestion to Complement Multi-Entity Edits (JSS19)

When maintaining software, developers sometimes change multiple program entities (i.e., classes, methods, and fields) to fulfill one maintenance task. We call such complex changes multi-entity edits. Consistently and completely applying multi-entity edits can be challenging, because (1) the changes scatter in different entities and (2) the incorrectly edited code may not trigger any compilation or runtime error. This paper introduces CMSuggester, an approach to suggest complementary changes for multi-entity edits. Given a multi-entity edit that (i) adds a new field or method and (ii) modifies one or more methods to access the field or invoke the method, CMSuggester suggests other methods to co-change for the new field access or method invocation. The design of CMSuggester is motivated by our preliminary study, which reveals that co-changed methods usually access existing fields or invoke existing methods in common.

Our evaluation shows that based on common field accesses, CMSuggester recommended method changes in 463 of 685 tasks with 70% suggestion accuracy; based on common method invocations, CMSuggester handled 557 of 692 tasks with 70% accuracy. Compared with prior work ROSE, TARMAQ, and Transitive Association Rules (TAR), CMSuggester recommended more method changes with higher accuracy. Our research can help developers correctly apply multi-entity edits.

An Empirical Study of Multi-Entity Changes in Real Bug Fixes (ICSME18)

Prior studies showed that developers applied repeated bug fixes—similar or identical code changes—to multiple locations. According to the observation, researchers built tools to automatically generate candidate patches from the repeated bug-fixing patterns. However, all such research focuses on the recurring change patterns within single methods. We are curious whether there are also repeated bug fixes that change multiple program entities (e.g., classes, methods, and fields); and if so, how we can leverage such recurring change patterns to further help developers fix bugs. In this paper, we present a comprehensive empirical study on multi-entity bug fixes in terms of their frequency, composition, and semantic meanings. Specifically for each bug fix, we first used our approach InterPart to perform static inter-procedural analysis on partial programs (i.e., the old and new versions of changed Java files), and to extract change dependency graphs (CDGs)—graphs that connect multiple changed entities based on their syntactic dependencies. By extracting common subgraphs from the CDGs of different fixes, we identified the recurring change patterns.

Our study on Aries, Cassandra, Derby, and Mahout shows that (1) 52-58% of bug fixes involved multi-entity changes; (2) 6 recurring change patterns commonly exist in all projects; and (3) 19-210 entity pairs were repetitively co-changed mainly because the pairs invoked the same methods, accessed the same fields, or contained similar content. These results helped us better understand the gap between the fixes generated by existing automatic program repair (APR) approaches and the real fixes. Our observations will shed light on the follow-up research of automatic program comprehension and modification.

Towards Reusing Hints from Past fixes--An Exploratory Study on Thousands of Real Samples (EMSE2017)

With the usage of version control systems, many bug fixes have accumulated over the years. Researchers have proposed various automatic program repair (APR) approaches that reuse past fixes to fix new bugs. However, some fundamental questions, such as how new fixes overlap with old fixes, have not been investigated. Intuitively, the overlap between old and new fixes decides how APR approaches can construct new fixes with old ones. Based on this intuition, we systematically designed six overlap metrics, and performed an empirical study on 5,735 bug fixes to investigate the usefulness of past fixes when composing new fixes. For each bug fix, we created delta graphs (i.e., program dependency graphs for code changes), and identified how bug fixes overlap with each other in terms of the content, code structures, and identifier names of fixes.

Our results show that if an APR approach knows all code name changes and composes new fixes by fully or partially reusing the content of past fixes, only 2.1% and 3.2% of new fixes can be created from single or multiple past fixes in the same project, compared with 0.9% and 1.2% of fixes created from past fixes across projects. However, if an APR approach knows all code name changes and composes new fixes by fully or partially reusing the code structures of past fixes, up to 41.3% and 29.7% of new fixes can be created. By making the above observations and revealing other 10 findings, we investigated the upper bound of reusable past fixes and composable new fixes, exploring the potential of existing and future APR approaches.

A Characterization Study of Repeated Bug Fixes (ICSME17)

Programmers always fix bugs when maintaining software. Previous studies showed that developers apply repeated bug fixes—similar or identical code changes—to multiple locations. Based on the observation, researchers built tools to identify code locations in need of similar changes, or to suggest similar bug fixes to multiple code fragments. However, some fundamental research questions, such as what are the characteristics of repeated bug fixes, are still unexplored. In this paper, we present a comprehensive empirical study with 341,856 bug fixes from 3 open-source projects to investigate repeated fixes in terms of their frequency, edit locations, and semantic meanings. Specifically, we sampled bug reports and retrieved the corresponding fixing patches in version history. Then we chopped patches into smaller fixes (edit fragments). Among all the fixes related to a bug, we identified repeated fixes using clone detection, and put a fix and its repeated ones into one repeated-fix group. With these groups, we characterized the edit locations, and investigated the common bug patterns as well as common fixes.

Our study on Eclipse JDT, Mozilla Firefox, and LibreOffice shows that (1) 15-20% of bugs involved repeated fixes; (2) 73-92% of repeated-fix groups were applied purely to code clones; and (3) 39% of manually examined groups focused on bugs relevant to additions or deletions of whole if-structures. These results deepened our understanding of repeated fixes. They enabled us to assess the effectiveness of existing tools, and will further provide insights for future research directions in automatic software maintenance and program repair.

How Does Execution Information Help with Information-Retrieval Based Bug Localization? (ICPC17)

Bug localization is challenging and time-consuming. Given a bug report, a developer may spend tremendous time in understanding the bug description together with code in order to locate bugs. To facilitate bug report comprehension, information retrieval (IR)-based bug localization techniques have been proposed to automatically search for and rank potential buggy code elements (i.e., classes or methods). However, these techniques do not leverage any dynamic execution information of buggy programs. For this paper, we performed the first systematic study on how dynamic execution information can help with static IR-based bug localization. More specifically, with the fixing patches and bug reports of 157 real bugs, we investigated the impact of various execution information (i.e., coverage, slicing, and spectrum) on three IR-based techniques: the baseline technique, BugLocator, and BLUiR.

Our experiments demonstrate that both the coverage and slicing information of failed tests can effectively reduce the search space and improve IR-based techniques at both class and method levels. Using additional spectrum information can further improve bug localization at the method but not the class level. Some of our investigated ways of augmenting IR-based bug localization with execution information even outperform a state-of-the-art technique, which merges spectrum with an IR-based technique in a complicated way. Different from prior work, by investigating various easy-to-understand ways to combine execution information with IR-based techniques, this study shows for the first time that execution information can generally bring considerable improvements to IR-based bug localization.