Automated Program Transformations

Inferring and Applying Def-Use Like Configuration Couplings in Deployment Descriptors (ASE20)

When building enterprise applications on Java frameworks (e.g., Spring), developers often specify components and configure operations with a special kind of XML files named “deployment descriptors (DD)”. Maintaining such XML files is challenging and time-consuming; because (1) the correct configuration semantics is domain-specific but usually vaguely documented, and (2) existing compilers and program analysis tools rarely examine XML files. To help developers ensure the quality of DD, this paper presents a novel approach--Xeditor--that extracts configuration couplings (i.e., frequently co-occurring configurations) from DD, and adopts the coupling rules to validate new or updated files.

Xeditor has two phases, coupling extraction and bug detection. To identify couplings, Xeditor first mines DD in open-source projects, and extracts XML entity pairs that (i) frequently coexist in the same files and (ii) hold the same data at least once. Xeditor then applies customized association rule mining to the extracted pairs. For bug detection, given a new XML file, Xeditor checks whether the file violates any coupling; if so, Xeditor reports the violation(s). For evaluation, we first created two data sets with the 4,248 DD mined from 1,137 GitHub projects. According to the experiments with these data sets, Xeditor extracted couplings with high precision (73%); it detected bugs with 92% precision, 96% recall, and 94% accuracy. Additionally, we applied Xeditor to the version history of another 478 GitHub projects. Xeditor identified 25 very suspicious XML updates, 15 of which were later fixed by developers.

Meditor: Inference and Application of API Migration Edits (ICPC19)

Developers build programs based on software libraries. When a library evolves, programmers need to migrate their client code from the library's old release(s) to new release(s). Due to the API backward incompatibility issues, such code migration may require developers to replace API usage and apply extra edits (e.g., statement insertions or deletions) to ensure the syntactic or semantic correctness of migrated code. Existing tools extract API replacement rules without handling the additional edits necessary to fulfill a migration task. This paper presents our novel approach, Meditor, which extracts and applies the necessary edits together with API replacement changes.

Meditor has two phases: inference and application of migration edits. For edit inference, Meditor mines open-source repositories for migration-related (MR) commits, and conducts program dependency analysis on changed Java files to locate and cluster MR code changes. For these changes, Meditor further generalizes API migration edits by abstracting away unimportant details (e.g., concrete variable identifiers). For edit application, Meditor matches a given program with inferred edits to decide which edit is applicable, customizes each applicable edit, and produces a migrated version for developers to review.

We applied Meditor to four popular libraries: Lucene, CraftBukkit, Android SDK, and Commons IO. By searching among 602,249 open-source projects on GitHub, Meditor identified 1,368 unique migration edits. Among these edits, 885 edits were extracted from single updated statements, while the other 483 more complex edits were from multiple co-changed statements. We sampled 937 inferred edits for manual inspection and found all of them to be correct. Our evaluation shows that Meditor correctly applied code migrations in 218 out of 225 cases. This research will help developers automatically adapt client code to different library versions.

Automatic Inference of Java-to-Swift Translation Rules for Porting Mobile Applications (MOBILESoft18)

A native cross-platform mobile app has multiple platform-specific implementations. Typically, an app is developed for one platform and then ported to the remaining ones. Translating an app from one language (e.g., Java) to another (e.g., Swift) by hand is tedious and error-prone, while automated translators either require manually defined translation rules or focus on translating APIs. To automate the translation of native cross-platform apps, we present j2sInferer, a novel approach that iteratively infers syntactic translation rules and API mappings from Java to Swift. Given a software corpus in both languages, j2sInferer first identifies the syntactically equivalent code based on braces and string similarity. For each pair of similar code segments, j2sInferer then creates syntax trees of both languages, leverages the minimalist domain knowledge of language correspondence (e.g., operators and markers) to iteratively align syntax tree nodes, and to infer both syntax and API mapping rules. j2sInferer represents inferred rules as string templates stored in a database, to translate code from Java to Swift. We evaluated j2sInferer with four applications, using one part of the data to infer translation rules and the other part to apply the rules. With 76% in-project accuracy and 65% cross-project accuracy, j2sInferer outperforms j2swift, a state-of-the-art Java-to-Swift conversion tool. As native cross-platform mobile apps grow in popularity, j2sInferer can shorten their time to market by automating the tedious and error-prone tasks of source-to-source translation.

Does Automated Refactoring Obviate Systematic Editing? (ICSE15)

Rase leverages systematic edits inferred by Lase to identify code regions for refactoring, and then automatically applies a series of refactorings to extract common code between systematically edited locations and parameterize any difference. The tool can help developers identify refactoring opportunities on simiarly code snippets which co-evolve so that any future repetitive systematic edits to multiple locations can be condensed to single updates in one location.

The tool can be downloaded here. Compared with Lase, it contains an extra project called "changeassistant-clonereduction". Rase depends on Lase to infer the systematic edit demonstrated by code change examples provided by developers. It then extracts a code region in each location's new version which encloses all edited statements. By identifying the largest common code region shared between locations, we can create a code template with wildcards to represent divergent usage of identifiers and expressions. Based on the template and wildcards, Rase automatically decides refactoring types to handle wildcards and create synactically valid programs.

For the evaluation, in addition to the 56 method-pair examples we mined for Sydit, we also created a data set consisting of 30 systematic editing tasks which we mined from version control histories of jfreechart and elasticsearch. Each systematic editing tasks corresponds to similar edits applied to at least three methods. The dataset can be downloaded here.

LASE: Locating and Applying Systematic Edits by Learning from Examples (ICSE13)

Similar to Sydit, Lase also generalizes and applies program transformation based on code change examples. However, it requires developers to provide two or more examples to enable its feature of automatically searching for candidate locations to change. In scenarios when identifying edit locations is more challenging than making the edits, this feature is very important because it can help developers avoid errors of omission. When developers provide multiple code change examples, Lase infers an edit script for each example, extracts the common edit shared between them and regards that as the systematic edit demonstrated by all examples. In this way, Lase can filter out any edit operation specific to some of the examples. When generalizing a program transformation out of the exemplar edits, Lase abstracts both identifiers and context when necessary. With identifier abstraction, Lase only abstracts identifiers which are used differently while keeping the other ones concrete. With context abstraction, Lase extracts context in each example and then identifies the common context shared between them. The context alignment and commonality extraction are done using a combination of three techniques: clone detection with CCFinder, program dependence analysis, and a subtree isomorphism algorithm. After deriving a program transformation, Lase leverages the transformation to both search for code to change and make changes accordingly.

The tool can be downloaded here. Compared with Sydit, this tool contains an extra project called "changeassistant-multipleexample". Lase depends on Sydit's functionality to do static analysis. However, it has its own entry class "EnhancedChangeAssistantMain.java" to start the application. It contains extra implementation to abstract both context and identifiers.

For the evaluation, in addition to the 56 method-pair examples we mined for Sydit, we also created a dataset consisting of 24 systematic editing tasks which we mined from version control history. Each systematic editing task corresponds to multiple bug fix patches checked in by developers to fix the same bug. The reason why the initial patch was not sufficient is that it only fixed some of the locations containing the bug. If the developers had used our tools earilier when making these changes, they should have fixed all the locations earlier. The dataset of repetitive bug fix patches can be downloaded here.

Systematic Editing: Generating Program Transformations from an Example (PLDI11)

Software modifications are often systematic—they consist of similar but maybe different program changes to multiple program locations. Consistently applying such edits can be tedious and error-prone. We developed an approach, Sydit, that infers the systematic edit from one given code change example and applies the edit to user-specified target locations. Sydit creates a general program transformation from one code change example provided by developers, and then applies the transformation to user-selected target locations.

The tool is implemented as an Eclipse plug-in. Its process contains two phases: (1) to generate abstract program transformation from a concrete code change example, and (2) to apply the general transformation to user-selected methods.

Phase I: Generating program transformation from one example

When given two versions of a changed program, it leverages Eclipse JDT to build an Abstract Syntax Tree(AST) for each version. It then uses a modified version of ChangeDistiller to infer an AST edit script to represent the code changes. The script may contain four types of edit operations: insertion, deletion, update, and move. Each operation only manipulates a statement. By doing that, we cluster finer-granuarity code changes, such as multiple expression replacements, into statement-level changes and thus reduce the total number of inferred edit operations, simplifying edit script generation and application.

Starting from the edit script, Sydit performs static analysis using a modified version of Crystal. The reason why we used Crystal is that the framework conducts analysis based on AST. When the analysis result refers to code locations, we can easily map those locations to AST nodes, which facilitates us to later manipulate source code. The analysis identifies unchanged AST nodes which are either control or data dependent on by edited AST nodes. By extracting the identified unchanged nodes together with the changed ones, we extract a context relevant to the edit. The reason why we extract context is we want to generalize the edit-relevant context from the original example. Without generalization, we will be only able to apply similar edits to locations with identical program structures. However, with generalization, we tolerate some structural difference between the original location and target locations. We also abstract identifiers used in the exemplar edit so that the generalized transformation is applicable to locations using different identifiers. Finally, we recalculate the position of each operation with respect to the extracted context in order to create an abstract, context-aware edit script.

Phase II: Applying program transformation to target method(s)

When developers specify target locations to apply the genralized program transformation, Sydit first establishes context matching between the abstract context and each target location. The context matching checks whether the location contains a subtree where the edit can fit and modify similarly to the original one. If there is not such a match, Sydit gives up applying the transformation to the location because it cannot guarantee that the control and data dependence relations in the target matches the original one and therefore should be changed accordingly. On the other hand, if there is a match, Sydit customizes the abstract transformation with respect to the matched context and creates a concrete applicable edit script. The matching algorithm is implemented in a bottom-up manner. It first finds all candidate nodes matches for each leaf node in the abstract context based on AST node types. Starting with the candidate matches, Sydit identifies a best match for each path between a leaf node and the root node in the abstract context. To find the best match, it checks (1) whether every node on the path has their AST node types matched, (2) whether children lists of corresponding nodes have their sequences matched, (3) whether the identifiers are consistently mapped so that the control and data dependence relations in the original location are contained in a target location as well. Finally, if there are still more than one best match for a given path, Sydit conservatively gives up applying transformation because it cannot break the tie with any other heuristics. The edit script concretization consists of two parts: (1) to replace abstract identifiers with concrete ones, and (2) recalculate the position of each operation with respect to the target location. By applying the edit script to the location, we suggest a modified version of the program for developrs to review.

The tool can be downloaded here. It contains some projects depended on by Sydit. To use the tool, you may also need to configure it to use some local paths of libraries like JDT. This version does not provide UI for developers to specify code change example or target locations. Instead, it reads a configuration file to get the information. We once had a version with proper UI. However, I can't find it now : (

In our evaluation, we use a data set consisting of 56 similarly changed method pairs. 47 of them are extractd from version control histories of five open source projects. The other 9 examples were constructed manually based on other examples in the evaluation set. Therefore, compared with the other examples, these ones are not as interesting. The configuration file I use to specify these examples is here.

For the evaluation, we also create a data set consisting of six similarly changed method groups. All of them are extracted from version control histories. The configuration file I use to specify these examples is here.