Welcome

Hi, I’m Jamie. I am a PhD Candidate in CS at Virginia Tech. I am advised by Dr. Dongyoon Lee.

I am on the academic job market for a tenure-track position beginning Fall 2020.

Here’s a generic version of my application package:

My wife Kirsten Davis is also on the market for a tenure-track job in Engineering Education. She is an award-winning researcher with a focus on global engineering.

Research interests

My ambition is to improve software quality by empirically studying software defects, and developing tools and systems that reflect practitioners’ needs. I blend techniques from software engineering, systems, and security in order to understand, measure, and ameliorate the issues that software engineering practitioners face. I publish my work in communities like ESEC/FSE, EuroSys, and USENIX Security, and have been honored with two ACM Distinguished Paper awards.

I value applied research, and endeavor to impact practice with my findings as well as to communicate my work to practitioners.

  • I’ve disclosed hundreds of defects in real-world software.
  • I post practitioner-friendly abridged versions of my research papers on Medium.
  • To facilitate reproducibility and future research, I store artifacts for my research on Zenodo.

Industry experience

Full-time

From 2012 to 2015, plus the summers of 2016 and 2017, I worked as a software engineer at IBM. I was a software tester on IBM’s General Parallel File System (GPFS), now rebranded as IBM Spectrum Scale.

  • I focused on the ways in which a distributed file system can fail, with an emphasis on error injection and data validation.
  • I hold several US patents related to this work, and the product record for most defects filed in a year.

Research internships

| Time | Position | Activities |
| --- | --- | --- |
| Summer 2019 | Intern @Microsoft Research: RiSE group under Patrice Godefroid | Techniques and tools to improve web API security. |
| Summer 2018 | Intern @IBM Research: Storage systems group under Deepavali Bhagwat and Lukas Rupprecht | Provenance system for ML/Analytics (SIGMOD’19 demo paper). |

Research projects

My work to date has considered the defects arising from troublesome tools and emerging paradigms. Some of these defects are due to unfriendly systems and frameworks, others to shallow practitioner expertise.

Troublesome tools

A practical look at regular expressions

Regular expressions (regexes) are a widely used, hard-to-master engineering tool, and they often cause software defects. In my regex investigations, I have measured the difficulties that practitioners experience, and guided programming language designers toward regex engines that reflect the needs of practitioners.

Here are the questions we’ve investigated:

  1. ESEC/FSE’18 How widespread a problem is Regex Denial of Service (ReDoS)?
    • We measured the extent of super-linear regexes in two software ecosystems, npm and PyPI.
    • We found ReDoS vulnerabilities in thousands of projects, including Node.js core, Python core, MongoDB, Django, and Hapi. We also disclosed vulnerabilities to Microsoft, which acknowledged us in July 2018. Many more of our finds are listed in Snyk.io’s vulnerability database, mostly under npm.
  2. ESEC/FSE’19 How portable are regexes?
    • The Internet is full of anecdotes of regex portability problems. We measured them scientifically.
    • We surveyed 150 developers to understand their perspectives about regex re-use and found thousands of regexes re-used from Stack Overflow and RegExLib.
    • We experimentally measured the extent of syntactic, semantic, and performance portability problems when moving regexes across programming languages.
  3. ASE’19 How hard are regexes to work with?
    • We surveyed 279 developers and interviewed 17 developers to learn more about regex practices.
    • They told us that “Regexes Are Hard” in many ways, suggesting many avenues for further research to support them.
  4. ASE’19 How generalizable is regex research?
    • A deep dive on the generalizability of prior empirical regex research.
    • We investigated whether researchers’ regex samples are biased by (1) regex extraction methodology; or (2) programming language.
  5. ESEC/FSE’19 SRC Can we address ReDoS at the regex engine level?
    • Clearly, real software contains ReDoS vulnerabilities. Can we address this issue without major overhauls to the regex engines?
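
For readers unfamiliar with super-linear regexes, here is a minimal sketch of the phenomenon in Python. The pattern is a textbook example, not one drawn from the studies above, and the names `EVIL` and `match_seconds` are hypothetical. It assumes a backtracking regex engine such as the one behind Python's built-in `re` module.

```python
import re
import time

# Textbook example of a super-linear regex: the nested quantifiers let a
# backtracking engine split a run of 'a's in exponentially many ways.
EVIL = re.compile(r"^(a+)+$")


def match_seconds(n: int) -> float:
    """Time one failed match against n 'a's followed by a mismatching '!'."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = EVIL.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing '!' guarantees a mismatch
    return elapsed


if __name__ == "__main__":
    # Each extra 'a' roughly doubles the work on a backtracking engine.
    for n in (10, 14, 18, 20):
        print(f"n={n:2d}  {match_seconds(n):.4f}s")
```

Keep `n` small when experimenting: on a backtracking engine the matching time grows very quickly with input length, which is what makes such a regex a denial-of-service vector when an attacker controls the input.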

Emerging paradigms

Shifting to a new software paradigm is difficult. As developers migrate, they encounter issues caused by misconceptions or flaws in new frameworks. Every new paradigm presents an opportunity for researchers to contribute, with empirical work characterizing best practices or systems work evaluating the frameworks.

Examining server-side event-driven programming

I have investigated the correctness and security risks that resulted from one recent industry shift: adopting the event-driven paradigm on the server side. Thousands of companies have done so as they shift to the Node.js platform, unifying their stack on one programming language. Though this paradigmatic transition has brought business benefits, it has also led to many software defects due to fundamental limitations (non-determinism, security flaws) in the architecture of the Node.js framework.

  1. EuroSys’17 What are the race conditions in Node.js programs?
    • We investigated concurrency errors in Node.js applications.
    • We studied bug reports to demonstrate that these occur in practice, and built a schedule fuzzing tool called Node.fz to make these bugs more likely to manifest.
  2. USENIX Security’18 What are the denial of service attacks against Node.js programs?
  3. What are the performance issues in Node.js programs?
    • I am working on improving Node.js performance down in libuv. See my meta-issue about the state of the libuv threadpool and my pull request enabling a pluggable threadpool.
    • My undergraduate mentee Jonathan Alexander won first place at the VT Undergraduate Research in CS competition for his work on this project. Congratulations, Jonathan!

GraphQL

The web community is considering GraphQL as a means of addressing management issues with traditional REST-style APIs.

  1. ICSOC’19 What do GraphQL schemas look like?
    • We mined GitHub and commercial GraphQL APIs for their schemas.
    • We identified idioms to help new adopters write easy-to-understand schemas, and evaluated the extent of denial of service vulnerabilities in schemas.
  2. How can we defend against GraphQL denial of service attacks?
    • This work is under submission.

Conference publications

Full-length

  1. Davis, Moyer, Kazerouni, and Lee. Testing Regex Generalizability And Its Implications: A Large-Scale Many-Language Measurement Study. Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE’19).
  2. Michael, Donohue, Davis, Lee, and Servant. Regexes are Hard: Decision-making, Difficulties, and Risks in Programming Regular Expressions. Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE’19, ACM Distinguished Paper).
  3. Wittern, Cha, Davis, Baudart, and Mandel. An Empirical Study of GraphQL Schemas. Proceedings of the 17th International Conference on Service-Oriented Computing (ICSOC’19).
  4. Davis, Michael, Coghlan, Servant, and Lee. Why Aren’t Regular Expressions a Lingua Franca? An Empirical Study on the Re-use and Portability of Regular Expressions. Proceedings of the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE’19).
  5. Fu, Ghaffar, Davis, and Lee. EdgeWise: A Better Stream Processing Engine for the Edge. 2019 USENIX Annual Technical Conference (USENIX ATC’19).
  6. Davis, Coghlan, Servant, and Lee. The Impact of Regular Expression Denial of Service (ReDoS) in Practice: an Empirical Study at the Ecosystem Scale. Proceedings of the 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE’18, ACM Distinguished Paper).
  7. Davis, Williamson, and Lee. A Sense of Time for JavaScript and Node.js: First-Class Timeouts as a Cure for Event Handler Poisoning. Proceedings of the 27th USENIX Security Symposium (USENIX Security’18).
  8. Davis, Thekumparampil, and Lee. Node.fz: Fuzzing the server-side event-driven architecture. Proceedings of the Twelfth European Conference on Computer Systems (EuroSys’17).

Short papers

Workshop-style papers offer a good opportunity for feedback from the research community.

  1. Davis. Rethinking Regex Engines to Address ReDoS. FSE’19 Student Research Competition, first place.
  2. Rupprecht, Davis, Arnold, Lubbock, Tyson, and Bhagwat. Ursprung: Provenance for Large-Scale Analytics Environments. SIGMOD’19 Demo Track.
  3. Davis, Kildow, and Lee. The case of the poisoned event handler: Weaknesses in the Node.js event-driven architecture. Proceedings of the 10th European Workshop on Systems Security (EuroSec’17).

Patents

I hold several patents from my time at IBM.

  1. W. Davis, J. Davis. Injection of Simulated Hardware Failure(s) in a File System for Establishing File System Tolerance-to-Storage-Failure(s). IBM, U.S. patent pending.
  2. W. Davis, J. Davis. Verification of the integrity of data files stored in copy-on-write (CoW) based file system snapshots. IBM, U.S. patent pending.
  3. J. Davis, W. Davis. File Metadata Verification in a Distributed File System. IBM, U.S. patent pending.
  4. W. Davis, J. Davis. Testing of Lock Managers in Computing Environments. IBM, U.S. patent 10,061,777 granted Aug. 28, 2018.
  5. J. Davis, W. Davis, F. Knop. Detection of File Corruption in a Distributed File System. IBM, U.S. patent 10,025,788 granted July 17, 2018.

Collaborations

In addition to my on-site collaborations at IBM Research and Microsoft Research during internships, I have been able to collaborate on several other projects.

  1. I have been working with Erik Wittern, Alan Cha, Guillaume Baudart, and others at IBM Research on GraphQL projects. We have published a paper at ICSOC’19, with other work under submission.
  2. I am a Graduate Fellow in the VT Academy for Global Engineering, an institutional effort to capture the research and teaching efforts around global engineering education. I am working on a paper with other academy members on the experiences that leaders (not students) have while on study abroad trips.
  3. I was a de facto external advisor to James Donohue for his Master’s thesis at the University of Bradford. With other VT collaborators, James and I published an ASE’19 paper together.
  4. My labmate Xinwei Fu led a project on optimizing stream processing engines for the edge (EdgeWise), published at USENIX ATC’19.

Teaching experience

I enjoy teaching, and embrace the creativity required of an instructor – course design in the large, and ad-libbing individual lectures and activities in the small. Intellectual empathy seems to be a skill one never masters, but I’m learning.

  1. In Fall 2019 I was an instructor of record for CS 3114: Data Structures and Algorithms, a core course for majors.
  2. In Spring 2019 I was an instructor of record for CS 1064: Introduction to Programming (in Python), an introductory course for non-majors.
  3. In Spring 2018 and Spring 2019 I was a guest teaching assistant for ENGE 1644: Global STEM Practice. I helped prepare students for two-week study abroad trips to Australia (Spring’18) and Spain/Morocco (Spring’19), and then helped facilitate their learning as they traveled.
  4. In Fall 2017 I was a Teaching Assistant for the graduate-level CS 5510: Multiprocessor Programming.
  5. In Fall 2015 I was a Teaching Assistant for CS 3604: Professionalism in Computing. I was a spectacular failure, but certainly learned a lot about teaching and grading.

Other activities

  1. I organize the weekly VT Systems Reading Group (mailing list).
  2. As part of my “Event Handler Poisoning” research, I wrote a guide for nodejs.org: Don’t Block the Event Loop (or the Worker Pool).
  3. I have contributed to a few open-source projects:
    • Node.js: Server-side JavaScript
    • libuv: Cross-platform asynchronous I/O
    • regexp-tree: ASTs for regexes
    • marked: Popular markdown parser and compiler
  4. I served on the EuroSys 2018 Shadow PC. They named me an outstanding reviewer!
  5. I attended Node+JS Interactive 2018 and the subsequent collaborator summit.
  6. I spoke at the VT CS Grad Seminar in Fall 2019. Presentation slides.
  7. My colleague Sazzadur and I were guests on The Secure Developer podcast. We spoke On the dangers of copy-pasting code.
  8. I’ve read through a few papers on Plausible Deniability (PD) and remain unsatisfied. I think the research is interesting, but the threat model the authors use seems, well, implausible. Here is my rejoinder.
  9. In Fall 2016 I took Prof. Steve Harrison’s course in Human-Computer Interaction (HCI). In my course project I suggested that different programming paradigms might be easier for different cultures, based on Nisbett’s The Geography of Thought. Report.
  10. I am Virginia Tech’s reigning singles and doubles racquetball champion.