Experiences with Overleaf

25 January 2017

I wanted to write briefly about my new favorite LaTeX software.

For those of you who don't know, LaTeX (pronounced "lah-tech", not to be mistaken for "lay-tex") is a way to "typeset" your writing. I believe LaTeX is primarily used to prepare scientific articles for publication.

Most (all?) of us have written a document in Microsoft Word or {Libre,Open}Office. Such editors are known as WYSIWYG: What You See Is What You Get. When you write in Microsoft Word, you don't think about how the reader will see your writing; it's right there in front of you on the screen, and you are seeing exactly the same thing the reader will get.

In stark contrast, LaTeX is written entirely in normal text, and it looks somewhat like a markup language such as HTML or Markdown. The author of a LaTeX document must tell the LaTeX engine exactly what he wants as regards formatting. For example, if he wants bullets, he writes \begin{itemize}, followed by a sequence of \item XYZ commands, then finishes it off with a \end{itemize}. This can get a bit tiresome, since in Microsoft Word, you would just press the "enable bullets" button, and the bullets would appear on the screen.

Why would anyone do this to themselves?

I can think of four good reasons to write documents in LaTeX.

  1. First, have you ever wondered how a WYSIWYG editor makes bullets, figures, etc. happen? Who knows? WYSIWYG documents are notoriously unportable; just try opening a .docx in LibreOffice if you don't believe me. LaTeX documents compile into PDFs, so they are well behaved across operating systems and PDF viewing software. In addition, there are a multitude of LaTeX engines, and they can implement the idea of, say, "bullets" each in their own way. This is very much analogous to how the same software source code can be compiled to run on a variety of architectures, each of which may implement addition or multithreading in a different way.
  2. Second, building off of the portability concept...you might use LaTeX because you are required to do so. LaTeX is the lingua franca of academia, and journals and conferences may require your submissions to be in LaTeX. This is because they (presumably) have their own private ways of turning LaTeX "source code" into pretty journal papers, using their equivalent of CSS sheets to coax the submissions into a uniform and branded format.
  3. Third, if you've ever tried to do something fancy in a WYSIWYG editor (equations, multiple panels in a figure, etc.), you know things can get a bit nasty. The benefit of LaTeX is that you can in principle typeset these fancier things more readily. LaTeX gives an excellent way to describe equations and make them show up prettily. Unfortunately, I'm of the opinion that more advanced typesetting like fancy figures is actually terrible in LaTeX, so I wouldn't really recommend it for that purpose.
  4. Lastly, LaTeX skills are one of the ways groups of nerds vie for technical supremacy. It's considered a badge of honor when a colleague asks you for help typesetting his document.

Cool, how do I do it?

I learned LaTeX "the hard way". I wrote my documents in Vim, compiled them with a Makefile, and viewed them with evince. This made writing documents a bit of a chore, and I didn't really look forward to it. I experimented with GUI LaTeX tools like MiKTeX, but they didn't seem to add a lot to my experience. I went back to the terminal.
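To give a flavor of the terminal workflow, here's a minimal sketch, assuming pdflatex is installed. The helper names (write_example_tex, compile_tex) and the file contents are made up for illustration; this is not the Makefile I actually used.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal LaTeX document containing a bulleted list.
write_example_tex() {
    # $1: path of the .tex file to create
    cat > "$1" <<'EOF'
\documentclass{article}
\begin{document}
Reasons to use LaTeX:
\begin{itemize}
  \item Portable output
  \item Pretty equations
\end{itemize}
\end{document}
EOF
}

# Hypothetical helper: compile a .tex file to a PDF in the same directory.
compile_tex() {
    # $1: path of the .tex file
    if ! command -v pdflatex >/dev/null 2>&1; then
        echo "pdflatex not found; install a TeX distribution first" >&2
        return 1
    fi
    pdflatex -interaction=nonstopmode -halt-on-error \
        -output-directory="$(dirname "$1")" "$1"
}
```

Something like write_example_tex notes.tex followed by compile_tex notes.tex should leave a notes.pdf next to the source, which you can then open in evince.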

In a recent class, however, I learned about the Overleaf software, which offers a LaTeX editor in a web browser. It automatically handles concurrent edits to the same document, like Google Docs does, so instead of using "command-line LaTeX + git" to collaborate on papers, I've begun using Overleaf. You can think of Overleaf as "LaTeX meets Google Docs", and the marriage is an excellent one.

Though Overleaf simplifies the process of preparing and viewing LaTeX documents, it still helps to get to know the LaTeX language. I've done a few tutorials in LaTeX but I can't really recommend one. Maybe if I find a good one I'll write a future post about it.

Now, if only I can convince my advisor to switch to Overleaf!

Paper accepted!

23 Jan 2017

The email

Exciting news, everyone! That paper I wrote about in this blog post was accepted to EuroSys 2017 this morning! While it's not OSDI or SOSP, EuroSys is still a pretty darn good conference, and I'm really excited to have the privilege of presenting there. Though I attended ASPLOS 2016 (tagging along with Tong Zhang who was presenting his TxRace paper), I bet it feels different to attend a conference as a presenter rather than as a tag-along. I count myself extremely fortunate. This was my first paper submission, to a pretty good conference, and it was accepted.

Don't torture graduate students, people!

Confession time: I spent the morning frantically checking my email to see the results from the reviewers. I probably refreshed Gmail dozens of times. When I opened the email I was briefly despairing, since it began with the formula companies and universities use to tell you that you've been rejected:

On behalf of the program committee, thank you for submitting to EuroSys 2017. This year we had 200 submissions covering a wide range of topics and areas. The number and the quality of submissions as well as the breadth of topics has made it quite difficult to select the program.

Usually this kind of introduction is followed by lines informing you that while your submission was excellent, ultimately it did not suffice to win the approval of the powers that be. So I was pretty surprised to read the beginning of the next sentence: We are pleased to inform you that.... What a pleasant surprise!

And the contents of the paper? Dear reader, I'm afraid I can't tell you much about the project until it's officially in publication. I will, however, let slip the area: my work is on event-driven programming, with a working prototype implemented in Node.js.

Well, IBM sent me to China and India. Looks like Virginia Tech will be sending me to Serbia. Gotta say, I'm a big fan of passport stamps funded by a source other than my own bank account!

The aftermath

Once I'd read the email a few times to make sure I hadn't misread it, I called Kirsten and then wandered around the office telling my labmates the good news. Though we didn't collaborate on the project, they participated with moral support and company during long nights at the lab. They were delighted for me, and we played several celebratory games of ping-pong. I didn't play super well because I was jittery from the excitement (and also had given blood this morning), but who cares? Confession part two: I didn't get much done this afternoon, though I did make some headway into Welsh, Culler, and Brewer's SEDA paper.

Reflections

It's interesting to see the difference in perspective I have on the work now that I know it has been reviewed and accepted by my peers. Before today I thought the work was pretty good, but was not particularly optimistic about acceptance. I felt there wasn't enough novelty to warrant acceptance, and was anticipating some cutting evaluations to this effect. Now that it has "passed muster", I'm prouder of what I did - I think of it now as "real research", not just an interesting project.

Having a paper accepted feels like a major milestone in my PhD process. With one paper under my belt, writing another paper feels much more do-able. Having been through a nearly-complete publication cycle (it still needs to be shepherded), I now know what the "research motions" feel like. This must be like what an artist feels the first time he sells a painting. Come to think of it, I should talk to Curt Ramsey to compare feelings on this experience.

This acceptance gives me a lot more empathy for a friend whose first paper submission was rejected. At the time I made consoling noises for a minute and then went back to work, but I realize now that he must have experienced an unpleasant cacophony of failure- and depression-themed feelings. Making a career in research must be all about enough optimism and confidence to keep trying after rejection. I'm certainly not looking forward to having my work rejected, but it will be interesting to evaluate my own feelings after having that experience.

First steps to collecting NPM statistics

12 December 2016

Recently I thought it would be interesting to study the npm ecosystem. The first step to such a study is to gather data: for example, what are the names of the 300,000 packages in npm? I tried running some open-ended searches using the npm command-line tool (e.g. npm search 'a'), but these ran for minutes or hours and sometimes crashed with ENOMEM.

I dropped an email to the npm team a few days ago, and haven't heard back. I recently googled "npm by numbers", though, and found that this kind of study has actually been done before (detail), back in December 2014. But we're two years and 200,000 packages on from there, so it seems like a good time to do it again. Happily, the folks from bocoup had advice about how to clone your own copy of the repository (metadata only: this is referred to as the "skim" rather than the "fat" version, presumably with reference to dairy products). Though I love me some fatty dairy products, I only needed metadata, so skim seemed like the right option. I also got some advice from a chap explaining how to create a full mirror ("fat") of the repository.

Update: I think you should look here for the latest word on a fullfat version of npm.

Here are the steps I followed:

  1. Install couchdb: sudo apt-get install couchdb
  2. Open up /etc/couchdb/local.ini and edit httpd as described here.
  3. Restart couchdb: sudo service couchdb restart
  4. Instruct couch to replicate from npm's skimdb: curl -X POST http://127.0.0.1:5984/_replicate -d '{"source":"https://skimdb.npmjs.com/registry/", "target":"registry", "create_target":true}' -H "Content-Type: application/json"
  5. The skimdb database is now being synced to a local registry database. You can monitor progress by visiting http://localhost:5984/_utils/database.html?registry
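If you'd rather script steps 4 and 5, here is a rough sketch. It assumes CouchDB is listening on the default local port and doesn't ask for admin credentials. The function names are hypothetical, and _active_tasks is the CouchDB endpoint that reports running replications.

```shell
#!/bin/sh
# Assumed defaults, taken from the steps above.
COUCH_URL="${COUCH_URL:-http://127.0.0.1:5984}"
SKIMDB_URL="${SKIMDB_URL:-https://skimdb.npmjs.com/registry/}"

# Print the JSON body for a one-off replication request.
replicate_body() {
    # $1: source database URL, $2: name of the local target database
    printf '{"source":"%s","target":"%s","create_target":true}\n' "$1" "$2"
}

# Ask the local CouchDB to start replicating (requires a running server).
start_replication() {
    curl -s -X POST "$COUCH_URL/_replicate" \
        -H "Content-Type: application/json" \
        -d "$(replicate_body "$SKIMDB_URL" registry)"
}

# Show running tasks, including replication progress.
show_progress() {
    curl -s "$COUCH_URL/_active_tasks"
}
```

The replication request may not return until the copy is finished, so it's probably best to run start_replication in the background (or in another terminal) and poll show_progress from time to time.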

After 30-60 minutes, my desktop had finished the replication. I retrieved descriptions of 364,405 modules, which occupied 2.1 GB in total. What I retrieved was basically (entirely?) the package.json associated with each npm module, which includes information like: package name (always), author name and contact information, module repository, creation time, each version, etc. This information does not include, however, ecosystem metadata, like download rates or a module's dependents (the other modules that depend on it). Download rates remain a mystery, though a module's dependents can be computed by searching for it among every other module's dependencies. This will only tell you which of the modules published to npm depend on a module, though.
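Here's a small sketch of that dependents computation. It assumes you've already boiled the registry JSON down to a tab-separated list of "package, dependency" pairs, one per line; that intermediate format and the function names are my own invention, not something npm or CouchDB hands you.

```shell
#!/bin/sh
# Print every package that lists $1 among its dependencies.
# Input (stdin): one "package<TAB>dependency" pair per line (assumed format).
dependents_of() {
    awk -F '\t' -v target="$1" '$2 == target { print $1 }' | sort -u
}

# Print "count<TAB>dependency" for every dependency, most depended-upon first.
# Duplicate pairs are only counted once.
dependents_count() {
    awk -F '\t' '
        !seen[$0]++ { count[$2]++ }
        END { for (dep in count) print count[dep] "\t" dep }
    ' | sort -k1,1nr -k2,2
}
```

With a file edges.tsv in that format, dependents_of math < edges.tsv lists the modules depending on math, and dependents_count < edges.tsv gives a quick popularity ranking.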

The CouchDB website has some helpful advice for querying a database. The name of our database is 'registry', so here are some examples:

  1. Get DB metadata: curl -X GET http://127.0.0.1:5984/registry
  2. Get info about module 'math': curl -X GET http://127.0.0.1:5984/registry/math

CouchDB returns information in JSON, so it's easy to parse.
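The queries above can be wrapped in a few helper functions. This is only a sketch: the function names are hypothetical, the pretty-printer assumes python3 is installed, and _all_docs is CouchDB's endpoint for listing document IDs (here, module names).

```shell
#!/bin/sh
# Assumed local CouchDB endpoint and database name, as in the examples above.
COUCH_URL="${COUCH_URL:-http://127.0.0.1:5984}"
DB_NAME="${DB_NAME:-registry}"

# Print the URL of a module's document in the local registry database.
module_url() {
    printf '%s/%s/%s\n' "$COUCH_URL" "$DB_NAME" "$1"
}

# Fetch a module's document as raw JSON (requires a running server).
fetch_module() {
    curl -s -X GET "$(module_url "$1")"
}

# Same, but indented for human eyes (assumes python3 is available).
pretty_module() {
    fetch_module "$1" | python3 -m json.tool
}

# List the first N document IDs in the database (default: 10).
list_modules() {
    curl -s -X GET "$COUCH_URL/$DB_NAME/_all_docs?limit=${1:-10}"
}
```

For example, pretty_module math shows the same document as the second curl command above, just easier to read.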

Thoughts on Writing

16 November 2016

This semester I've done more writing than I ever have before, and in particular I've spent far more time revising and editing a piece of writing than previously. When you're submitting to a conference or the NSF, somehow you care a lot more about the result than when you're turning in a class project...

I think this emphasis on careful writing has actually had an impact on the way that I think, speak, read, and write. I've observed an "aftereffect" for several days after completing an intensive writing project.

When you write carefully (unlike, for example, this blog), you must have a deep grasp of your audience. Your goal is to make your audience understand your idea. This is very different from the way you think of the idea in your own mind; writing is not the same as speaking. When you speak, you state the idea as you understand it. The idea is already in your head, and you are letting it out. When you write, on the other hand, the idea is not in your reader's mind, but rather you are trying to put it there. This seems somewhat like the relationship between a motor and a generator: electricity drives the motor to spin, while being spun causes the generator to produce electricity.

I find the effect of intensive writing on myself quite noticeable. I speak more volubly (yes, this is possible) and with a wider vocabulary. After cudgeling my brain into organizing its thoughts for writing, I find the rest of my thinking more organized as well. And when I read, I'm now familiar with the process of writing, so I know the formulae and tricks the writer will use to help me understand his ideas.

All this makes me upset about the way I was taught to read in K-12. At least in my school system in upstate New York, English class consisted of reading works of fiction and writing non-fiction summaries, reports, and analyses about them. I was frustrated to no end about the teachers' emphasis on symbolism, metaphor, foreshadowing, and the other arts of the skilled writer. "What if the author just meant to tell a story?" was my cry, supported by comments from Gary Paulsen (which, unfortunately, I can't locate) about how he was indeed just trying to tell young adult stories. I took a pragmatic approach, making up an interesting perspective and then "proof-texting" to force the meaning into the book.

But all this rigmarole was silly. When I actually thought about how I would go about writing a piece of fiction, I quickly realized that if I had a point to convey I would need to rely on the very tools I'd mocked growing up -- symbolism, metaphor, foreshadowing, and all the rest. I was taught these ideas because this is how authors actually convey the information, but without a real author to talk to and without playing at authorship myself I could never understand this. It turns out that some of these techniques show up in technical writing, too: foreshadowing, for example, is just the way you hint to the reader in your introduction about the "secret sauce" you'll use to solve the problem.

The solution? I think regularly writing in any style, technical, non-fiction, fiction, and perhaps even computer code, would be hugely influential on how thoughtful a person is. As a result of my experiences this semester, I plan to write in this blog more regularly, to do a better job of documenting the software I write, and to produce both daily research journal entries and larger-scale summaries of my progress (perhaps on a bi-weekly basis?).

Grant writing

16 November 2016

In an earlier post I talked about submitting my first paper to an academic conference. After I finished that submission I thought my major writing tasks were over for the semester. I was wrong.

About a week later, my advisor mentioned that he was planning to submit a grant proposal to the upcoming NSF Call for Proposals. Since I have some expertise in the area of the proposal, he wondered if I would be willing to help out. I found the prospect slightly intimidating, but after getting weekly pep talks about the importance of communication this semester from Dr. Eli Tilevich in his Research Methods class, I thought it would be a great opportunity to try it out in the big leagues.

Well, it was certainly an opportunity, all right. I haven't really done any homework in the past two weeks, since I've spent all of my time working on a seemingly endless stream of revisions of the grant proposal. Seeing how significantly the proposal transformed over the weeks was a real eye-opener on the importance of revision.

We spent several revisions in search of the right level for the NSF audience, helped by Eric Berger's insider's perspective. I'm not sure if we found the sweet spot between "too high level to convince anyone that what you're doing is possible" and "too detailed to be readable by anyone not in your area of expertise" but we sure tried.

I benefited a lot in this process from having recently read Joseph Williams's book Style, a masterpiece of writing on writing. I've always thought of myself as a pretty good writer "for an engineer", but let's face it, at Clarkson University where I did my undergraduate work the emphasis was definitely not on learning to write. Williams definitely taught me quite a lot. The most interesting part of the book was his ability to articulate the principles of good writing in a way that I've understood intuitively but never been able to put into words. I usually go with the "I know it when I see it" approach to good writing, but Williams helped me understand whether a particular piece of writing is good and how to improve it.

On a side note, I thought Style was vastly superior to Strunk and White's The Elements of Style. Strunk and White emphasize "nitpicks", but they don't help the reader see the big picture. I think Strunk and White is a great use of 50 pages, and is a good starting point for middle school or high school, but it won't help the reader reach great heights of writing.

Well, my advisor turned in the grant application this morning (I was up at 6AM working on final edits). We'll hear back from the NSF in about 6 months. I'm hopeful that we'll at least get helpful peer feedback, though of course actual grant money would be great!

Contributing to open source

2 November 2016

Until this semester I'd never made a contribution to an open-source project. However, since my research is in bug detection, successful research means finding bugs in real projects. When researchers want to look for bugs they turn to open-source projects. Every software project has bugs, and access to the source code is necessary to prove that a bug is legitimate rather than an issue with the research prototype. Since companies don't typically let you have access to their source code, open-source it is.

Anyway, the upshot of this is that -- yay! -- my research prototype actually did turn up some real bugs in open-source projects. Part of the goal of research is to make the world a better place, and in this context that means submitting a patch for the bugs. I've always found the prospect of committing code to an open source project somewhat intimidating -- aren't the guys in the open source community programming wizards? It turns out that contributing to a project is actually pretty easy. Not nearly as intimidating as I'd thought! I've made five contributions in the last four weeks, and it's been a cool experience. Let me tell you about it.

GitHub

First things first. Many open source projects (including all the ones I've contributed to) live on GitHub. So, to make a contribution, you need to create an account. The good news: there's actually a student package that gives you a nicer account for free. Thanks GitHub! You can also personalize the URL, always a nice touch when trying to make your web presence consistent. Here's my GitHub account page.

Once you've made an account, the next step is using it. You can create your own repositories (public by default, but the student developer pack lets you make (additional?) private repositories) and upload them, and other people Googling the right keywords can find them. Alternatively, you can identify an existing repository with which you want to interact. You might, for example, want to use it in one of your projects, or you might just think it's cool and want to get involved.

Getting involved with an existing project

Alright, let's suppose you've identified a project and want to help out. Any code contributions you make will be in the form of a pull request, so you should read up on those if you don't know what they are.

One route to help out is to wander through the source code and fix things. For example, in this pull request I submitted to kue, I (drumroll please) fixed a typo in an old comment.

Another option is to go to the GitHub page for the project and click on the "Issues" tab. Here people have indicated bugs or feature requests that they'd like fixed, and you could try to tackle one of them. Some project maintainers use helpful tags like "Good first contribution" or "Help wanted" to indicate where a new contributor could get started. I haven't tried this yet.

Two other nice ways to contribute are to improve the project documentation (e.g. this pull request) or to enhance the test suite (like this one or this one). I think the open source community tends to attract more coders than writers or testers, so documentation and test suites are often minimal and could use an extra hand.

Closing thoughts

I must say, it's pretty exciting to get an email in my inbox with a reply to an issue I've opened or with some feedback on a pull request I've submitted. And there's quite the thrill when the maintainer merges a submission into the project for everyone to (hopefully!) benefit.

As a final note, these days your GitHub portfolio can really make a difference when you're interviewing for jobs. It's a way to showcase your technical skills and your ability to contribute to a community, both important aspects of real-world software development. So if you've never contributed to open-source, give it a whirl. You'll learn a lot and potential employers will get a sense of what you can do.

I've been working on a research project since last fall, and my advisor and I recently submitted a paper describing our work to a conference. I thought I would share some reflections about the experience.

First, finishing a project and turning it into a paper is a lot of work! I don't think I've worked that hard before in my entire life. I worked essentially non-stop for the three weeks leading up to the deadline: day, night, and weekend. My trusty French press and a new bag of Starbucks coffee carried me through, especially in the last three days where I got about 10 hours of sleep combined.

What was I doing all that time, exactly? Well, I don't know if this is a typical experience, but I got our experimental setup finished only a few weeks before the deadline. As a result, the majority of those three weeks were spent honing our experimental protocol. I would run an experiment and examine the results. They might look good, or I might decide I needed to tweak the experiment a bit, or run more iterations, or (!) modify the setup to fix issues on our end. Happily I didn't find too many egregious issues along the way. Unfortunately, every time I did find a problem with our setup, I had to re-run the previous experiments so that all of the results could be fairly compared. In retrospect, I should have automated the process of collecting experimental data, which would have made re-running the experiments much less painful.

The rest of the time was spent crafting the paper. My advisor was nice enough to help out with the writing process (yay!), so we divided up the sections and cranked out several drafts. We completed the first draft just over a week before the deadline, and in retrospect I think this was a mistake. Time permitting, next time around I'll aim to have run most of the experiments three weeks before the deadline, thus allowing plenty of time to really get the writing perfect. This extra time would also have allowed me to re-run experiments or perform new experiments as needed. I'm of the opinion that a mediocre idea, written up beautifully, is better for everyone involved than a beautiful idea buried in mediocrity. Clear writing helps the researchers focus their thoughts, helps the reviewers give it a fair assessment, and helps future readers get the best sense of the project. It takes me a long time to read poorly-written papers and the effort never seems quite worth it. If the paper is accepted, I plan to spend about two weeks cleaning up the manuscript before the final deadline.

Overall, I'm really glad we finished the paper in time for the submission deadline. We now have some time to turn our attention to a new project before the reviewers respond, giving us a chance to clear our minds and approach the original project again from a fresh perspective if the work is rejected.

One of the most rewarding aspects of submitting the paper is that it drove home how much work I've put into the project. When you do a little bit each day, it's hard to see how much progress you've made. But when I tried to write it all down and then actually had to discard some of my efforts to fit inside the page limit, I had a huge sense of accomplishment. It remains to be seen if the reviewers think I did anything worthwhile, but I sure do.

This experience has made a nice change from my time at IBM. Our software development cycle typically ran in a 3-6 month period. While I would work on new features in each cycle, there was a lot of repetition in the day-to-day tasks -- maybe 30% new, 70% ho-hum. In contrast, after we got the paper out the door, I've started work on a completely different project. Vive la difference!

In my Research Methods class, Professor Tilevich has been emphasizing the importance of branding and establishing a sense of self. Our assignment this week was to develop a good-looking website, and, if I say so myself, I think I've succeeded. This post is a collection of the notes I made along the way. Perhaps they will be helpful to someone else with a similar task.

Virginia Tech gives each CS student hosting space on people.cs.vt.edu (here's the official policy). The landing page looks like http://people.cs.vt.edu/~YOURPIDHERE/, so my landing page is here.

By default this will yield an HTTP 404 error (page not found). To add content, you can access your home directory (e.g. via YOURPID@rlogin.cs.vt.edu:~/WWW/people.cs.vt.edu, aka ...:/web/people/YOURPID) and add an index.html file.

I installed an apache server on my laptop for local testing:

sudo apt-get install --reinstall apache2
sudo service apache2 start
service apache2 status
 * apache2 is running

The test version of my blog is in ~/Desktop/MY_PID/, so I set up a temporary sshfs mount for easy deployment:

mkdir /tmp/MY_PID
sshfs MY_PID@rlogin.cs.vt.edu:/web/people/MY_PID /tmp/MY_PID

Deployment was now as simple as

# Deploy locally for apache to serve; navigate to 'localhost' in web browser
sudo cp -r ~/Desktop/MY_PID/* /var/www/html/

# Deploy to the department server through the sshfs mount:
# clear the old copy, then upload the latest version from ~/Desktop
rm -rf /tmp/MY_PID/*
cp -r ~/Desktop/MY_PID/* /tmp/MY_PID
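Typing rm -rf by hand always makes me a little nervous, so here is a sketch of the same deployment wrapped in a function with a couple of safety checks. The name deploy_site and its error messages are made up; the rm and cp steps mirror the commands above.

```shell
#!/bin/sh
# Replace the contents of DEST_DIR with a fresh copy of the contents of SRC_DIR.
deploy_site() {
    deploy_src=$1
    deploy_dest=$2
    # Refuse to run with missing arguments: an empty destination would turn
    # the clean-up below into something far more destructive.
    if [ -z "$deploy_src" ] || [ -z "$deploy_dest" ]; then
        echo "usage: deploy_site SRC_DIR DEST_DIR" >&2
        return 2
    fi
    if [ ! -d "$deploy_src" ] || [ ! -d "$deploy_dest" ]; then
        echo "deploy_site: both directories must already exist" >&2
        return 1
    fi
    # Clear the old copy, then copy the new one (including dotfiles).
    rm -rf "${deploy_dest:?}"/*
    cp -r "$deploy_src"/. "$deploy_dest"/
}
```

With the sshfs mount in place, deploy_site ~/Desktop/MY_PID /tmp/MY_PID does the same job as the last two commands above, and it complains instead of deleting anything if the mount point is missing.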

The next question was, what to put on my website? I'd never built a website before, and I wasn't sure where to start. I recall hearing terms like HTML and CSS, so those seemed like good search terms for Google. After a bit I located this set of paid and free website templates. I settled on the Mungo layout as a starting point. Sure enough, after following the deployment steps above I had a website chock full o' lorem ipsum. It seems that HTML holds the website's structure (and typically its content), and CSS (Cascading Style Sheets) defines rules that tell the web browser how that structure should be presented (e.g. color, width, any changes needed for mobile devices, etc.).

I wasn't very happy with the color scheme. It turns out that if you Google for "HTML color scheme" you can find sites like this that will propose color schemes -- colors that look good together. I found some I liked. However, the CSS had the colors hardcoded. I wasn't a fan, in case I changed my mind about the color scheme later, so I looked for a way to have variables in CSS. This didn't seem to be possible in the plain CSS I was working with, but there are pre-processing systems (like the pre-processor that allows macros in C) that let you write somewhat more flexible CSS code. I settled on Sass.

Sass is pretty straightforward to use. The file extension is .scss, and you can either convert .scss files to .css files manually or run sass in "watch" mode to monitor for changes to .scss files and automatically produce new .css files.

# Convert scss to css
sass input.scss output.css

# More convenient -- it will monitor for changes and generate the corresponding .css file for you
sass --watch ~/Desktop/MY_PID/css > /dev/null 2>&1 &
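To show what the variables buy you, here's a small sketch that writes an .scss file with the colors pulled out into two variables and then compiles it. The variable names, colors, and helper functions are made up for illustration; they aren't the ones from my actual stylesheet.

```shell
#!/bin/sh
# Hypothetical helper: write a small .scss file whose colors live in variables.
write_palette_scss() {
    # $1: path of the .scss file to create
    cat > "$1" <<'EOF'
// Change the color scheme by editing these two lines only.
$background: #fdf6e3;
$accent: #268bd2;

body { background-color: $background; }
a    { color: $accent; }
EOF
}

# Compile it with the same sass command as above, if sass is installed.
compile_scss() {
    # $1: input .scss file, $2: output .css file
    if ! command -v sass >/dev/null 2>&1; then
        echo "sass not found" >&2
        return 1
    fi
    sass "$1" "$2"
}
```

After write_palette_scss palette.scss and compile_scss palette.scss palette.css, the generated CSS has the literal colors substituted in, so swapping the whole scheme later means touching only the two variable definitions.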

One issue I ran into during the process was getting images to the appropriate size. I started out with a 2MB picture of myself looking glamorous, but this caused the site to load noticeably more slowly. I found some advice here on using Gimp to tweak images. Here were the steps I followed:

  1. Change pixels per inch (ppi) to 72.
  2. Export as jpg, the preferred format for pictures of humans.
  3. Update the image path in the html and/or CSS. In my case, the image was a background in the CSS file, so I also had to update the image dimensions specified there.

I was able to extend the template supplied by Mungo (thanks, Mungo!) to develop what I think is a pretty good personal website. I also added separate pages for blogs and ported my fledgling personal blog from Weebly to here. Now I have total control over my online presence!