Artificial Intelligence and Software Engineering

SAIS: Self-Adaptive Identification of Security Bug Reports (TDSC19)

Among various bug reports (BRs), security bug reports (SBRs) are unique because they require immediate concealment and fixes. When SBRs are not identified in time, attackers can exploit the vulnerabilities. Prior work identifies SBRs via text mining: it relies on a predefined keyword list and trains a classifier with known SBRs and non-security bug reports (NSBRs). Such an approach is not reliable because (1) as the context of security vulnerabilities and the terminology of SBRs change over time, the predefined list becomes outdated; and (2) users may have too few SBRs for training. This paper introduces a semi-supervised learning-based approach, SAIS, to adaptively and reliably identify SBRs. Given a project’s BRs containing some labeled SBRs, many more NSBRs, and unlabeled BRs, SAIS iteratively mines keywords, trains a classifier based on the keywords from the labeled data, classifies unlabeled BRs, and augments its training data with the newly labeled BRs. Our evaluation shows that SAIS is useful for identifying SBRs.
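The iterative loop of mining keywords, training, classifying, and augmenting the training data can be illustrated with a small self-training sketch. This is not SAIS itself. Keyword mining here uses a hypothetical Laplace-smoothed log-ratio of document frequencies. The classifier is a simple presence-only log-odds score swapped in for illustration. The helper names (`mine_keywords`, `train`, `self_train`), the `min_abs` threshold, the `batch` size, and the example reports are all assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical sketch of a SAIS-style self-training loop. The keyword score,
# the presence-only log-odds classifier, min_abs, and batch are illustrative
# assumptions, not the classifier or parameters used by SAIS.

def tokenize(text):
    """Lowercase a bug report and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def doc_freq(reports):
    """Count, for each word, how many reports contain it."""
    return Counter(w for r in reports for w in set(tokenize(r)))

def mine_keywords(sbrs, nsbrs, min_abs=0.3):
    """Mine words whose smoothed SBR-vs-NSBR log-ratio is far from zero.

    Positive weights mark security-indicative words; negative weights mark
    words typical of non-security reports.
    """
    sec, non = doc_freq(sbrs), doc_freq(nsbrs)
    weights = {}
    for w in set(sec) | set(non):
        p_sec = (sec[w] + 1) / (len(sbrs) + 2)   # Laplace-smoothed doc frequency
        p_non = (non[w] + 1) / (len(nsbrs) + 2)
        score = math.log(p_sec / p_non)
        if abs(score) >= min_abs:
            weights[w] = score
    return weights

def train(sbrs, nsbrs):
    """Mine keywords from the labeled data and return a margin function.

    margin > 0 means "predict SBR"; its magnitude serves as confidence.
    """
    weights = mine_keywords(sbrs, nsbrs)
    def margin(report):
        return sum(weights.get(w, 0.0) for w in set(tokenize(report)))
    return margin

def self_train(sbrs, nsbrs, unlabeled, batch=2):
    """Label the most confident unlabeled BRs each round, then retrain."""
    sbrs, nsbrs, pool = list(sbrs), list(nsbrs), list(unlabeled)
    while pool:
        margin = train(sbrs, nsbrs)  # keywords are re-mined every round
        pool.sort(key=lambda r: -abs(margin(r)))
        picked, pool = pool[:batch], pool[batch:]
        for r in picked:
            (sbrs if margin(r) > 0 else nsbrs).append(r)
    return sbrs, nsbrs

# Made-up example reports.
labeled_sbrs = [
    "remote attacker can exploit buffer overflow",
    "xss injection lets attacker steal session cookie",
]
labeled_nsbrs = [
    "button label misaligned on settings page",
    "typo in help dialog text",
    "crash when opening an empty file",
]
unlabeled_brs = [
    "sql injection lets attacker read passwords",
    "settings page layout broken on resize",
    "attacker can bypass login with crafted token",
]
final_sbrs, final_nsbrs = self_train(labeled_sbrs, labeled_nsbrs, unlabeled_brs)
```

Adding only the most confident predictions per round and re-mining keywords afterwards is one common way to limit the spread of early mislabeling; whether SAIS uses the same selection rule is not stated in the abstract.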

Automatic Clone Recommendation for Refactoring Based on the Present and the Past (ICSME18)

When many clones are detected in software programs, not all clones are equally important to developers. To help developers refactor code and improve software quality, various tools have been built to recommend clone-removal refactorings based on past and present information, such as the cohesion degree of individual clones or the co-evolution relations of clone peers. The existence of these tools inspired us to build an approach that considers as many factors as possible to recommend clones more accurately. This paper introduces CRec, a learning-based approach that recommends clones by extracting features from the current status and past history of software projects. Given a set of software repositories, CRec first automatically extracts the clone groups historically refactored (R-clones) and those not refactored (NR-clones) to construct the training set. CRec extracts 34 features to characterize the content and evolution behaviors of individual clones, as well as the spatial, syntactical, and co-change relations of clone peers. With these features, CRec trains a classifier that recommends clones for refactoring.
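The training step amounts to supervised classification over clone groups labeled as R-clones or NR-clones. In this minimal sketch, the three features (`peers_in_same_file`, `co_change_ratio`, `token_similarity`) are hypothetical stand-ins for a few of CRec's 34 features. The classifier is plain logistic regression trained by gradient descent, swapped in for illustration and not claimed to be CRec's learner. The example clone groups are made up.

```python
import math
from dataclasses import dataclass

@dataclass
class CloneGroup:
    """Hypothetical clone-group record; fields stand in for CRec's 34 features."""
    peers_in_same_file: bool   # spatial relation of clone peers (assumed)
    co_change_ratio: float     # fraction of commits touching all peers (assumed)
    token_similarity: float    # syntactic similarity of peers, 0..1 (assumed)
    refactored: bool = False   # label: True for R-clones, False for NR-clones

def features(g):
    """Turn a clone group into a numeric feature vector."""
    return [1.0 if g.peers_in_same_file else 0.0, g.co_change_ratio, g.token_similarity]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(groups, epochs=2000, lr=0.5):
    """Fit logistic regression by batch gradient descent (a stand-in classifier)."""
    n = len(features(groups[0]))
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for g in groups:
            x = features(g)
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = pred - (1.0 if g.refactored else 0.0)  # gradient of log loss
            for i in range(n):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * gi / len(groups) for wi, gi in zip(w, gw)]
        b -= lr * gb / len(groups)
    # Return a scorer: estimated probability that a group should be refactored.
    return lambda g: sigmoid(sum(wi * xi for wi, xi in zip(w, features(g))) + b)

training_set = [
    CloneGroup(True, 0.9, 0.95, refactored=True),   # made-up R-clones
    CloneGroup(True, 0.8, 0.90, refactored=True),
    CloneGroup(False, 0.7, 0.85, refactored=True),
    CloneGroup(False, 0.1, 0.40),                   # made-up NR-clones
    CloneGroup(False, 0.2, 0.50),
    CloneGroup(True, 0.0, 0.30),
]
model = train_logistic(training_set)
```

Ranking candidate clone groups by the returned probability yields a simple recommendation list. A realistic feature set would also need feature scaling and a held-out evaluation, which this sketch omits.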

We designed the largest feature set thus far for clone recommendation and evaluated our approach on six large projects. The results show that our approach suggested refactorings with 83% and 76% F-scores in the within-project and cross-project settings, respectively. CRec significantly outperforms a similar state-of-the-art approach on our data set, with the latter achieving 70% and 50% F-scores in the same settings. We also compared the effectiveness of different factors and different learning algorithms.
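The F-scores above are presumably F1 scores, the harmonic mean of precision and recall; this is an assumption, since the abstract does not define the metric. The small sketch below computes F1 from hypothetical true-positive, false-positive, and false-negative counts.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # no correct recommendations at all
    return 2 * precision * recall / (precision + recall)
```

Because F1 is a harmonic mean, a gap between precision and recall pulls it down: `f1_score(6, 4, 0)` has precision 0.6 and recall 1.0, giving 0.75 rather than the arithmetic mean 0.8.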

CCLearner: A Deep Learning-Based Clone Detection Approach (ICSME17)

Programmers produce code clones when developing software. By copying and pasting code with or without modification, developers reuse existing code to improve programming productivity. However, code clones present challenges to software maintenance: they may require consistent application of the same or similar bug fixes or program changes to multiple code locations. To simplify the maintenance process, various tools have been proposed to automatically detect clones. Some tools tokenize source code and then compare the sequence or frequency of tokens to reveal clones. Other tools detect clones using tree-matching algorithms that compare the Abstract Syntax Trees (ASTs) of source code. In this paper, we present CCLearner, the first solely token-based clone detection approach leveraging deep learning. CCLearner extracts tokens from known method-level code clones and non-clones to train a classifier, and then uses the classifier to detect clones in a given codebase.
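The token-based idea can be illustrated by turning a method pair into a per-category similarity vector, the kind of input a classifier can be trained on. This minimal sketch makes several assumptions: it uses four coarse token categories, a regex tokenizer in place of a real lexer, and a frequency-overlap score of 1 minus the summed frequency differences divided by the summed frequencies. These choices are illustrative, and the deep neural network that CCLearner trains on such inputs is omitted.

```python
import re
from collections import Counter

# Hypothetical token categories; a real detector would use a language lexer.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void", "new", "class"}
TOKEN_RE = re.compile(r'[A-Za-z_]\w*|\d+|"[^"]*"|[^\s\w]')

def categorize(code):
    """Split code into tokens and count them per coarse category."""
    buckets = {"keyword": Counter(), "identifier": Counter(),
               "literal": Counter(), "operator": Counter()}
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            buckets["keyword"][tok] += 1
        elif tok[0].isalpha() or tok[0] == "_":
            buckets["identifier"][tok] += 1
        elif tok[0].isdigit() or tok[0] == '"':
            buckets["literal"][tok] += 1
        else:
            buckets["operator"][tok] += 1
    return buckets

def similarity(c1, c2):
    """1 - sum|f1 - f2| / sum(f1 + f2) over the tokens of one category."""
    tokens = set(c1) | set(c2)
    total = sum(c1[t] + c2[t] for t in tokens)
    if total == 0:
        return 1.0  # neither fragment has tokens of this category
    diff = sum(abs(c1[t] - c2[t]) for t in tokens)
    return 1.0 - diff / total

def pair_vector(code_a, code_b):
    """Similarity vector for a method pair, one entry per category."""
    a, b = categorize(code_a), categorize(code_b)
    return [similarity(a[k], b[k]) for k in a]

m1 = "int sum(int a, int b) { return a + b; }"
m2 = "int add(int x, int y) { return x + y; }"
print(pair_vector(m1, m2))  # [1.0, 0.0, 1.0, 1.0]
```

The two example methods differ only in identifier names, so every category except identifiers scores 1.0. A classifier trained on known clones and non-clones would learn how much such identifier differences should matter.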

To evaluate CCLearner, we reused BigCloneBench, an existing large benchmark of real clones. We used part of the benchmark for training and the other part for testing. We observed that CCLearner effectively detected clones. With the same data set, we conducted a systematic comparison experiment between CCLearner and three popular clone detection tools. Compared with the approaches not using deep learning, CCLearner achieved competitive clone detection effectiveness with low time cost.