Program Comprehension

An Empirical Study on API Parameter Rules (ICSE20)

Developers build programs based on software libraries to reduce coding effort. If a program inappropriately sets an API parameter, the program may exhibit unexpected runtime behaviors. To help developers correctly use library APIs, researchers built tools to mine API parameter rules. However, it is still unknown (1) what types of parameter rules there are, and (2) how these rules distribute inside documents and source files. In this paper, we conducted an empirical study to investigate the above-mentioned questions. To analyze as many parameter rules as possible, we took a hybrid approach that combines automatic localization of constrained parameters with manual inspection. Our automatic approach—PaRu—locates parameters that have constraints either documented in Javadoc (i.e., document rules) or implied by source code (i.e., code rules). Our manual inspection (1) identifies and categorizes rules for the located parameters, and (2) establishes mapping between document and code rules.

By applying PaRu to 9 widely used libraries, we located 5,334 parameters with either document or code rules. Interestingly, there are only 187 parameters that have both types of rules, and 79 pairs of these parameter rules are unmatched. Additionally, PaRu extracted 1,688 rule sentences from Javadoc and code. We manually classified these sentences into six categories, two of which are overlooked by prior approaches. We found that 86.2% of parameters have only code rules; 10.3% of parameters have only document rules; and only 3.5% of parameters have both document and code rules. Our research reveals the challenges for automating parameter rule extraction. Based on our findings, we discuss the potentials of prior approaches and present our insights for future tool design.

An Empirical Comparison between Monkey Testing and Human Testing (LCTES19)

Android app testing is challenging and time-consuming because fully testing all feasible execution paths is difficult. Nowadays apps are usually tested in two ways: human testing or automated testing. Prior work compared different automated tools. However, some fundamental questions are still unexplored, including (1) how automated testing behaves differently from human testing, and (2) whether automated testing can fully or partially substitute human testing.

This paper presents our study to explore the open questions. Monkey has been considered one of the best automated testing tools due to its usability, reliability, and competitive coverage metrics, so we applied Monkey to five Android apps and collected their dynamic event traces. Meanwhile, we recruited eight users to manually test the same apps and gathered the traces. By comparing the collected data, we revealed that i.) on average, the two methods generated similar numbers of unique events; ii.) Monkey created more system events while humans created more UI events; iii.) Monkey could mimic human behaviors when apps have UIs fully of clickable widgets to trigger logically independent events; and iv.) Monkey was insufficient to test apps that require information comprehension and problem-solving skills. Our research sheds light on future research that combines human expertise with the agility of Monkey testing.

Lascad: Language-Agnostic Software Categorization and Similar Application Detection (JSS18)

Categorizing software and detecting similar programs are useful for various purposes including expertise sharing, program comprehension, and rapid prototyping. However, existing categorization and similar software detection tools are not sufficient. Some tools only handle applications written in certain languages or belonging to specific domains like Java or Android. Other tools require significant configuration effort due to their sensitivity to parameter settings, and may produce excessively large numbers of categories. In this paper, we present a more usable and reliable approach of Language-Agnostic Software Categorization and similar Application Detection (LASCAD). Our approach applies Latent Dirichlet Allocation (LDA) and hierarchical clustering to programs’ source code in order to reveal which applications implement similar functionalities. LASCAD is easier to use in cases when no domain-specific tool is available or when users want to find similar software across programming languages.

To evaluate LASCAD’s capability of categorizing software, we used three labeled data sets: two sets from prior work and one larger set that we created with 103 applications implemented in 19 different languages. By comparing LASCAD with prior approaches on these data sets, we found LASCAD to be more usable and outperform existing tools. To evaluate LASCAD’s capability of similar application detection, we reused our 103-application data set and a newly created unlabeled data set of 5,220 applications. The relevance scores of the Top-1 retrieved applications within these two data sets were separately 70% and 71%. Overall, LASCAD effectively categorizes and detects similar programs across languages.