I'm a scientific computer scientist. I like to solve interesting problems with software. My principal interests lie in program correctness (bugs!) and programmability.
From 2012 to 2015 I was a full-time software tester at IBM. I worked on IBM's General Parallel File System (GPFS), now rebranded as IBM Spectrum Scale. I focused on the ways in which a parallel file system can fail, with an emphasis on data validation. I'm still employed by IBM and return for summer work.
In the fall of 2015 I started working on a PhD at Virginia Tech in Computer Science. My advisor is Dr. Dongyoon Lee and we study a variety of systems and security problems, most recently those in event-driven systems like Node.js applications.
During my years as a software tester it became increasingly clear to me how much developers struggle to write correct code, and how important that struggle is. My research interests revolve around the core issues underlying software development.
In Spring 2017 I took Dr. Pierre Olivier's course in Linux kernel programming. I worked on a project with Xinwei Fu and Jingoo Han to deploy an alternative scheduler. It was a hierarchical multi-level feedback queue, and our feedback mechanism was so fine-grained that the system wasn't particularly usable. You can amuse yourself with our report here.
In Spring 2016 I took Dr. Ali Butt's course in cloud computing. I worked on a project to compare cloud service providers (AWS, Google, and Azure) with Uday Ananth and Ayaan Kazerouni. We performed a qualitative study of usability, reliability, and customer service, and a quantitative study of node performance. You can read our report here.
While at IBM a colleague and I filed a US patent on the detection of file corruption in a distributed file system. As software testers, we were responsible for identifying everything that was wrong with GPFS, across the entire scope of the product. We opened defects against poor command-line interfaces, inadequate error messages, inappropriate syslog verbosity, poor performance, and everything else you can imagine. Our favorite defects, however, related to correctness errors.
When you write data ABC into a file system, you expect to read data ABC back. In a file system as complex as GPFS, however, it's not uncommon early on in the test cycle to get something else instead. You might get AB, in which case the file system has inappropriately truncated your data. You might get ABC followed by extra bytes, in which case the file system probably left "old" data in the file and failed to wipe it properly before giving it to you. You might even get AXC, in which case a write X intended for another file has mysteriously ended up in your file. In all of these cases it's appropriate to open a defect under the (highly visible) category of Silent Data Corruption, and once reported these defects are addressed promptly; IBM is not in the habit of releasing code that corrupts its customers' data.
Typically, test programs rely on a checksum to detect file corruption. A checksum is a hash of a file's contents, usually of a fixed length. You can compute checksums in a few different ways, including with the system utilities sum, cksum, md5sum, sha1sum, and sha512sum. You might notice a theme in the names. Suppose that you have a 10MB file and you record its cksum every time you modify it. When you read it, you re-calculate the cksum and compare it to the recorded cksum. If there's a mismatch, you've detected (a bug in your test program, or) corruption. Unfortunately, to debug a problem like this, developers need a lot more information than just knowing that "the file content isn't right!"
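To make the record-then-verify workflow concrete, here is a minimal sketch in Python. It is not the patented design, just the baseline checksum approach described above; the function names and the choice of SHA-256 are my own for illustration.

```python
import hashlib

def file_checksum(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MB chunks so large files don't exhaust memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, recorded):
    """Re-checksum the file and compare against the recorded value."""
    actual = file_checksum(path)
    if actual != recorded:
        raise RuntimeError(
            "possible corruption in %s: expected %s, got %s"
            % (path, recorded, actual))
```

A test program would call file_checksum after each write, stash the digest somewhere outside the file system under test, and call verify on every read. Note that all this tells you is that the bytes changed, not where or why, which is exactly the limitation mentioned above.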
Once the patent is approved, I'll be happy to fill in the details of our design. Until then, enjoy pondering the problem.
Like all good graduate students, I maintain a blog on matters technical and personal.
You can visit it here.
I'm working on a guide for new students.
I run the CS department's Systems Reading Group. We meet weekly over pastries to discuss papers and practice talks.