Hi, I’m Jamie. I am a PhD Candidate in CS at Virginia Tech.
I am on the job market! It is my ambition to obtain a tenure-track assistant professorship beginning in Fall 2020.
I’m a scientific computer scientist. I like to do research that is motivated by practical problems. My work often takes the form of “taking a problem and running with it”, by learning of a problem that practitioners face, articulating it effectively, and studying it deeply.
This is perhaps because I’ve spent enough time in industry to know that there are tons of great practical problems out there. When I worked full-time I often happened across interesting problems, got something working, and moved on, but wished I could return. As a researcher that’s my job!
I value applied research, and endeavor to impact practice with my findings as well as to communicate my work to practitioners.
- Many of my papers include a “bugs” section.
- I post practitioner-friendly abridged versions of my research papers on Medium.
- To facilitate reproducibility and additional research, I store artifacts for my research on Zenodo.
Here is my CV.
In Summer 2016 and Summer 2017 I worked on IBM’s GPFS product as a software tester.
From 2012 to 2015 I was a software tester at IBM. I worked on IBM’s General Parallel File System (GPFS), now rebranded as IBM Spectrum Scale.
- As a tester I focused on the ways in which a parallel file system can fail, with an emphasis on error injection and data validation.
- I hold several US patents related to this work, as well as (at the time that I left) the product record for most defects filed in a year.
- I also enjoyed visiting India (Pune, Bangalore) and China (Beijing) to train colleagues.
Ongoing research projects
- Regular expressions in practice
- Part 1 (ESEC/FSE’18): Regular expression denial of service (ReDoS) in the wild. We found ReDoS vulnerabilities in some interesting places, including Node.js core, Python core, MongoDB, Django, and Hapi. We also disclosed vulnerabilities to Microsoft, which acknowledged us here in July 2018. Many more of our finds are listed in Snyk.io’s vulnerability database, mostly under npm.
- Part 2 (ESEC/FSE’19): Regular expression portability problems. We surveyed developers to understand their perspectives about regex re-use, matched real regexes to Stack Overflow and RegExLib examples, and experimentally measured the extent of syntactic, semantic, and performance portability problems when moving regexes across programming languages.
- Part 3 (ASE’19): A deep dive on regex comparisons across programming languages. We investigated whether (1) statically-defined and dynamically-defined regexes look similar on various metrics, and (2) regexes defined in different programming languages look similar on various metrics. They do – so researchers can collect statically-defined regexes and not worry about dynamically-defined ones, and research covering only one programming language may generalize to others. Hurrah for simpler methods!
- Part 4 (ASE’19): A deep dive on developers’ perspectives on regexes. We surveyed 279 developers and interviewed 17 developers to learn more about regex practices.
- I am working on improving Node.js performance down in libuv. See my meta-issue about the state of the libuv threadpool and my pull request enabling a pluggable threadpool. We’ll see where this goes…
- My undergraduate mentee Jonathan Alexander won first place at the VT Undergraduate Research in CS competition for his work on this project. Congratulations, Jonathan!
Past research projects
- Node.fz. We investigated concurrency errors in Node.js applications. We studied bug reports to demonstrate that these occur in practice, and built a schedule fuzzing tool called Node.fz to make these bugs more likely to manifest. We published this work at EuroSys’17.
- Davis, Moyer, Kazerouni, and Lee. Testing Regex Generalizability And Its Implications: A Large-Scale Many-Language Measurement Study. Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE’19).
- Michael, Donohue, Davis, Lee, and Servant. Regexes are Hard: Decision-making, Difficulties, and Risks in Programming Regular Expressions. Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE’19, ACM Distinguished Paper).
- Wittern, Cha, Davis, Baudart, and Mandel. An Empirical Study of GraphQL Schemas. Proceedings of the 17th International Conference on Service-Oriented Computing (ICSOC’19).
- Davis, Michael, Coghlan, Servant, and Lee. Why Aren’t Regular Expressions a Lingua Franca? An Empirical Study on the Re-use and Portability of Regular Expressions. Proceedings of the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE’19).
- Fu, Ghaffar, Davis, and Lee. EdgeWise: A Better Stream Processing Engine for the Edge. 2019 USENIX Annual Technical Conference (USENIX ATC’19).
- Davis, Coghlan, Servant, and Lee. The Impact of Regular Expression Denial of Service (ReDoS) in Practice: an Empirical Study at the Ecosystem Scale. Proceedings of the 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE’18, ACM Distinguished Paper).
- Davis, Thekumparampil, and Lee. Node.fz: Fuzzing the server-side event-driven architecture. Proceedings of the Twelfth European Conference on Computer Systems (EuroSys’17).
- Davis. Rethinking Regex Engines to Address ReDoS. FSE’19 Student Research Competition. I won first place!
- Rupprecht, Davis, Arnold, Lubbock, Tyson, and Bhagwat. Ursprung: Provenance for Large-Scale Analytics Environments. SIGMOD’19 Demo Track.
- Davis, Kildow, and Lee. The case of the poisoned event handler: Weaknesses in the Node.js event-driven architecture. Proceedings of the 10th European Workshop on Systems Security (EuroSec’17).
- W. Davis, J. Davis. Injection of Simulated Hardware Failure(s) in a File System for Establishing File System Tolerance-to-Storage-Failure(s). IBM, U.S. patent pending.
- W. Davis, J. Davis. Verification of the integrity of data files stored in copy-on-write (CoW) based file system snapshots. IBM, U.S. patent pending.
- J. Davis, W. Davis. File Metadata Verification in a Distributed File System. IBM, U.S. patent pending.
- W. Davis, J. Davis. Testing of Lock Managers in Computing Environments. IBM, U.S. patent 10,061,777 granted Aug. 28, 2018.
- J. Davis, W. Davis, F. Knop. Detection of File Corruption in a Distributed File System. IBM, U.S. patent 10,025,788 granted July 17, 2018.
I enjoy teaching, and embrace the creativity required of an instructor – course design in the large, and ad-libbing individual lectures and activities in the small. Intellectual empathy seems to be a skill one never masters, but I’m learning.
- In Fall 2019 I was an instructor of record for CS 3114: Data Structures and Algorithms, a core course for majors.
- In Spring 2019 I was an instructor of record for CS 1064: Introduction to Programming (in Python), an introductory course for non-majors.
- In Spring 2018 and Spring 2019 I was a guest teaching assistant for ENGE 1644: Global STEM Practice. I helped prepare students for two-week study abroad trips to Australia (Spring’18) and Spain/Morocco (Spring’19), and then helped facilitate their learning as they traveled.
- In Fall 2017 I was a Teaching Assistant for the graduate-level CS 5510: Multiprocessor Programming.
- In Fall 2015 I was a Teaching Assistant for CS 3604: Professionalism in Computing. I was a spectacular failure, but certainly learned a lot about teaching and grading.
In addition to my on-site collaborations at IBM Research and Microsoft Research during internships, I have been able to collaborate on several other projects remotely.
- I have been working with Erik Wittern, Alan Cha, Guillaume Baudart, and others at IBM Research on GraphQL projects. So far we have published a paper at ICSOC’19.
- I am a Graduate Fellow in the VT Academy for Global Engineering, an institutional effort to capture the research and teaching efforts around global engineering education. I am working on a paper with other academy members on the experiences that leaders (not students) have while on study abroad trips.
- I was a de facto external advisor to James Donohue for his Master’s thesis at the University of Bradford. With other VT collaborators, James and I published an ASE’19 paper together. And it won a distinguished paper award!
- I organize the weekly VT Systems Reading Group (mailing list).
- As part of my “Event Handler Poisoning” research, I wrote a guide for nodejs.org: Don’t Block the Event Loop (or the Worker Pool).
- I have contributed to a few open-source projects:
- I served on the EuroSys 2018 Shadow PC. They named me an outstanding reviewer!
- I attended Node+JS Interactive 2018 and the subsequent collaborator summit.
- I spoke at the VT CS Grad Seminar in Fall 2019. Presentation slides.
- I’ve read through a few papers on Plausible Deniability (PD) and remain unsatisfied. I think the research is interesting, but the threat model the authors use seems, well, implausible. Here is my rejoinder.
- In Fall 2017 I took Dr. Sharath Raghvendra’s course in Algorithms. In my course project I did a small literature review of major papers in testing the correctness of concurrent programs. Report.
- In Spring 2017 I took Dr. Pierre Olivier’s course in Linux Kernel Programming. I worked on a project to deploy an alternative scheduler with Xinwei Fu and Jingoo Han. It was a hierarchical multi-level feedback queue, and our feedback mechanism turned out to be so fine-grained that the system wasn’t particularly usable. Report.
- In Fall 2016 I took Prof. Steve Harrison’s course in Human-Computer Interaction (HCI). In my course project I suggested that different programming paradigms might be easier for different cultures, based on Nisbett’s The Geography of Thought. Report.
- In Spring 2016 I took Dr. Ali Butt’s course in Cloud Computing. I worked on a project to compare cloud service providers (AWS, Google, and Azure) with Uday Ananth and Ayaan Kazerouni. We performed a qualitative study of usability, reliability, and customer service, and a quantitative study of node performance. Report.
I’m reasonably conversant with everything other than the front-end. Some technologies I have used:
- Operating systems
- UNIX-like: Linux, AIX.
- I’ve done coursework in the Linux kernel and have spent a lot of time programming against the UNIX syscalls.
- Scripting: Bash, Python, Perl, Node.js
- I am pretty familiar with the implementation of Node.js – Node core, V8, and libuv.
- Other languages: Java, C (C90), C++
- Parallel programming
- I know a bit of SQL, though I admit I prefer grepping giant flat files.
- I’ve used VMs (usually locally, through VirtualBox) for development.
- I set up VMs on AWS, Google Cloud, Azure, and IBM SoftLayer as part of a course project on VM performance variability.
- I try to Dockerize my research artifacts.
- I leveraged the Linux auditd subsystem in the Ursprung prototype.