Software modifications are often systematic—they consist of similar but maybe different program changes to multiple program locations. Consistently applying such edits can be tedious and error-prone. We developed an approach, Sydit, that infers the systematic edit from one given code change example and applies the edit to user-specified target locations. Sydit creates a general program transformation from one code change example provided by developers, and then applies the transformation to user-selected target locations.
The tool is implemented as an Eclipse plug-in. Its process contains two phases: (1) to generate abstract program transformation from a concrete code change example, and (2) to apply the general transformation to user-selected methods.
When given two versions of a changed program, it leverages Eclipse JDT to build an Abstract Syntax Tree(AST) for each version. It then uses a modified version of ChangeDistiller to infer an AST edit script to represent the code changes. The script may contain four types of edit operations: insertion, deletion, update, and move. Each operation only manipulates a statement. By doing that, we cluster finer-granuarity code changes, such as multiple expression replacements, into statement-level changes and thus reduce the total number of inferred edit operations, simplifying edit script generation and application.
Starting from the edit script, Sydit performs static analysis using a modified version of Crystal. The reason why we used Crystal is that the framework conducts analysis based on AST. When the analysis result refers to code locations, we can easily map those locations to AST nodes, which facilitates us to later manipulate source code. The analysis identifies unchanged AST nodes which are either control or data dependent on by edited AST nodes. By extracting the identified unchanged nodes together with the changed ones, we extract a context relevant to the edit. The reason why we extract context is we want to generalize the edit-relevant context from the original example. Without generalization, we will be only able to apply similar edits to locations with identical program structures. However, with generalization, we tolerate some structural difference between the original location and target locations. We also abstract identifiers used in the exemplar edit so that the generalized transformation is applicable to locations using different identifiers. Finally, we recalculate the position of each operation with respect to the extracted context in order to create an abstract, context-aware edit script.
When developers specify target locations to apply the genralized program transformation, Sydit first establishes context matching between the abstract context and each target location. The context matching checks whether the location contains a subtree where the edit can fit and modify similarly to the original one. If there is not such a match, Sydit gives up applying the transformation to the location because it cannot guarantee that the control and data dependence relations in the target matches the original one and therefore should be changed accordingly. On the other hand, if there is a match, Sydit customizes the abstract transformation with respect to the matched context and creates a concrete applicable edit script. The matching algorithm is implemented in a bottom-up manner. It first finds all candidate nodes matches for each leaf node in the abstract context based on AST node types. Starting with the candidate matches, Sydit identifies a best match for each path between a leaf node and the root node in the abstract context. To find the best match, it checks (1) whether every node on the path has their AST node types matched, (2) whether children lists of corresponding nodes have their sequences matched, (3) whether the identifiers are consistently mapped so that the control and data dependence relations in the original location are contained in a target location as well. Finally, if there are still more than one best match for a given path, Sydit conservatively gives up applying transformation because it cannot break the tie with any other heuristics. The edit script concretization consists of two parts: (1) to replace abstract identifiers with concrete ones, and (2) recalculate the position of each operation with respect to the target location. By applying the edit script to the location, we suggest a modified version of the program for developrs to review.
The tool can be downloaded here. It contains some projects depended on by Sydit. To use the tool, you may also need to configure it to use some local paths of libraries like JDT. This version does not provide UI for developers to specify code change example or target locations. Instead, it reads a configuration file to get the information. We once had a version with proper UI. However, I can't find it now : (
In our evaluation, we use a data set consisting of 56 similarly changed method pairs. 47 of them are extractd from version control histories of five open source projects. The other 9 examples were constructed manually based on other examples in the evaluation set. Therefore, compared with the other examples, these ones are not as interesting. The configuration file I use to specify these examples is here.
For the evaluation, we also create a data set consisting of six similarly changed method groups. All of them are extracted from version control histories. The configuration file I use to specify these examples is here.
Similar to Sydit, Lase also generalizes and applies program transformation based on code change examples. However, it requires developers to provide two or more examples to enable its feature of automatically searching for candidate locations to change. In scenarios when identifying edit locations is more challenging than making the edits, this feature is very important because it can help developers avoid errors of omission. When developers provide multiple code change examples, Lase infers an edit script for each example, extracts the common edit shared between them and regards that as the systematic edit demonstrated by all examples. In this way, it can filter out any edit operation specific to some of the examples. When generalizing a program transformation out of the exemplar edits, it abstracts both identifiers and context when necessary. With identifier abstraction, it only abstracts identifiers which are used differently while keeping the other ones concrete. With context abstraction, it extracts context in each example and then identifies the common context shared between them. The context alignment and commonality extraction is done using a combination of three techniques: clone detection with CCFinder, program dependence analysis, and a subtree isomorphism algorithm. After deriving a program transformation, Lase leverages the transformation to both search for code to change and make changes accordingly.
The tool can be downloaded here. Compared with Sydit, this tool contains an extra project called "changeassistant-multipleexample". Lase depends on Sydit's functionality to do static analysis. However, it has its own entry class "EnhancedChangeAssistantMain.java" to start the application. It contains extra implementation to abstract both context and identifiers.
For the evaluation, in addition to the 56 method-pair examples we mined for Sydit, we also created a dataset consisting of 24 systematic editing tasks which we mined from version control history. Each systematic editing task corresponds to multiple bug fix patches checked in by developers to fix the same bug. The reason why the initial patch was not sufficient is that it only fixed some of the locations containing the bug. If the developers had used our tools earilier when making these changes, they should have fixed all the locations earlier. The dataset of repetitive bug fix patches can be downloaded here.
Rase leverages systematic edits inferred by Lase to scope code regions for refactorign and then automatically applies a series of refactorings to extract common code between systematically edited locations and parameterize any difference. The tool can help developers identify refactoring opportunities on simiarly code snippets which co-evolve so that any future repetitive systematic edits to multiple locations can be condensed to single updates in one location.
The tool can be downloaded here. Compared with Lase, it contains an extra project called "changeassistant-clonereduction". Rase depends on Lase to infer the systematic edit demonstrated by code change examples provided by developers. It then extracts a code region in each location's new version which encloses all edited statements. By identifying the largest common code region shared between locations, we can create a code template with wildcards to represent divergent usage of identifiers and expressions. Based on the template and wildcards, Rase automatically decides refactoring types to handle wildcards and create synactically valid programs.
For the evaluation, in addition to the 56 method-pair examples we mined for Sydit, we also created a data set consisting of 30 systematic editing tasks which we mined from version control histories of jfreechart and elasticsearch. Each systematic editing tasks corresponds to similar edits applied to at least three methods. The dataset can be downloaded here.