NSF: 0811518: Blended Static/Dynamic Analyses for Performance Understanding and Improved Security of Framework-intensive Systems

Web applications are an important software paradigm, widely used by both the commercial and research communities. These applications are built on top of numerous integrated layers of libraries and frameworks. Performance problems in these framework-intensive systems are often difficult to understand, exhibiting characteristics intrinsically different from those of previous systems. For example, a typical performance problem is not a single frequently executed method, but rather involves problematic activity across many methods spanning disparate frameworks (e.g., Apache's Tomcat, Microsoft's .NET CLR platform, or Java EE platforms such as Apache's Geronimo, JBoss, or IBM's WebSphere). To the developer, the application resembles an iceberg: the familiar code is only a small portion of the entire implementation, yet the entire system must be analyzed to understand performance and security problems.

Framework-intensive Web applications are a challenge to existing analysis techniques. Purely static analyses, accomplished through examination of code without execution, suffer from insufficient scalability and/or insufficient precision for answering behavioral questions about these systems. Purely dynamic analyses, accomplished through judiciously placed instrumentation in source code or bytecode, or by probing the JVM runtime system, either introduce too much execution overhead, especially for production systems, or are too limited in the information they gather. Further, existing dynamic performance analyses focus on control flow, but the main purpose of these applications is to manipulate data; understanding object usage is crucial. The main idea in this proposal is to address these weaknesses {\it by blending static and dynamic analyses in new ways} that, in combination, avoid these problems and support tools for framework-intensive applications.

The specific goals of this research proposal are:

---to design and experiment with {\it blended analyses} that are practical and effective in identifying performance problems for framework-based applications, thereby providing targets for inter-framework code specialization of common usage patterns;

---to enable richer characterizations of applications in order to design and validate realistic framework-intensive benchmarks, for example, to define framework API design {\it best practices};

---to investigate additional potential clients of blended analyses that can improve the quality of framework-intensive software systems.

Framework-intensive applications have largely been ignored by software engineering researchers because of their complexity and scale. This has resulted in a gap between the tools and techniques needed to deal with these applications and those being developed by the research community. Designing analyses and developing tools to address performance and security issues for these applications will begin to bridge this gap. The PI has the advantage of her unique depth in program analysis, plus an already established research relationship with IBM researchers. These colleagues can provide access to real-world data for testing these ideas and an appreciation of the difficulties of software development with inadequate tools.

Intellectual Merit. This research offers a unique opportunity to advance the state of the art in program analysis to handle an important, but currently unexplored, complex software paradigm (i.e., framework-based applications) and to strongly influence current software practice. Successful application of blended analysis to framework-intensive systems, such as Web applications, will demonstrate that analysis is {\it scalable to software orders of magnitude larger than is currently possible}. The intellectual challenge is to develop analyses of practical cost and of sufficient precision to scale up to industrial-strength framework-intensive software. Collection of framework-intensive benchmarks for a shared repository, a task best suited to an academic-industrial research collaboration, will encourage other researchers to address problems in Web applications.

Broader Impacts. Blended analyses for performance understanding and security enhancement will benefit Web application developers and the Open Source community through the prototypes built. Making the research infrastructure available to others will lower the barriers to further investigations into framework-intensive applications. The PI co-ordinates an experimental pedagogy program, RESCS (http://rescs.rutgers.edu), aimed at students entering CS from underrepresented groups. Research opportunities will be made available to the best RESCS students.