Instructor Gang Wang (gangwang@vt.edu)
Time/Location Tuesday/Thursday 3:30 PM - 4:45 PM in McBryde Hall 226
Office Hour By appointment. My CRC office is in KnowledgeWorks II, room 2223 (Reachable via CRC shuttle)
Textbook We will focus on reading research papers. There is no required textbook.

Announcements

08/15/17: The class is currently full. If you want to join the class, please use the waiting list and attend the first class during week 1.

Class Description

Machine learning has become a mainstream tool that significantly extends the capabilities of data-driven systems in a variety of areas. This class will focus on understanding the recent interplay between machine learning and security. 1) Machine learning is a useful technique for building new solutions to many security problems. 2) At the same time, attackers may also use machine learning to launch more intelligent attacks. 3) Machine learning itself can introduce a whole new class of risks, allowing adversaries to manipulate the learning process and its outcome.

This is not a typical machine learning class: we will not focus on developing new theories or methods in machine learning. Instead, we will study the state of the art in applied machine learning on security-related topics. We will focus on understanding the best, most creative ways to apply existing machine learning tools and techniques, as well as their limitations and potential risks. In this class, we will read a number of technical papers and work on a research project in teams of 2-3 students. The goal of the project is to extend current machine learning techniques to new problems, with the end goal of producing real, publishable results by the end of the semester. In addition, students are expected to gain experience in two valuable skills: quickly reading technical papers (without sacrificing understanding) and giving good public presentations.

Expected Work

Participation: students are required to attend all lectures, read all required papers, and participate in paper discussions both online and in class.

Team Project: teams of 2-3 students will work on a single research project throughout the semester. The project should aim to solve a real problem at the intersection of machine learning and security/privacy. Each team will give a short talk at the midterm and a final presentation at the end of the semester. Each team is also expected to write a final project report.

Paper Presentation: students will present papers in class to lead the discussion. Each student will cover 1-2 papers, depending on the class size.

Class Schedule

Aug 29, 31 ML for Attack
  • Class Intro, Overview of ML, Project Ideas (Slides)
  • "Keyboard emanations revisited," CCS 2005, PDF (Gang Wang: Slides)
Sep 05, 07 ML for Attack
  • "Hidden Voice Commands," USENIX Security 2016, PDF (Stefan Nagy)
  • "Doppelgänger Finder: Taking Stylometry to the Underground," IEEE SP 2014, PDF (Sazzadur Rahaman)
Sep 12, 14 ML for Attack (Proposal due Tue 11:59pm)
  • "Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition," CCS 2016, PDF (Reid Bixler)
  • "SAMPLES: Self Adaptive Mining of Persistent Lexical Snippets for Classifying Mobile Application Traffic," Mobicom 2015, PDF (Long Cheng)
Sep 19, 21 ML for Defense
  • "Mining Anomalies Using Traffic Feature Distributions," SIGCOMM 2005, PDF ()
  • "Behavioral clustering of HTTP-based malware and signature generation using malicious network traces," NSDI 2010, PDF (Zachary Burch)
Sep 26, 28 Adversarial ML
  • "Adversarial Machine Learning", AISec 2011, PDF (Tue: )
  • "Adversarial classification," KDD 2004, PDF (Tue: )
  • "Poisoning Attacks against Support Vector Machines," ICML 2012, PDF (Thr: Shengzhe Xu)
Oct 03, 05 Adversarial ML
  • "Bayesian Watermark Attacks," ICML 2012, PDF (Yang Xiao)
  • "Automatically evading classifiers," NDSS 2016, PDF (Yanshen Yang)
Oct 10, 12 Adversarial ML
  • "Deep neural networks are easily fooled: High confidence predictions for unrecognizable images," CVPR 2015, PDF (Tue: Xiangwen Wang)
  • Midterm (Thr)
Oct 17, 19 ML for Defense
  • "Learning to Identify Regular Expressions that Describe Email Campaigns," ICML 2012, PDF
  • Guest Lecture: "Supporting Image Geolocation with Diagramming and Crowdsourcing," AAAI HCOMP 2017, PDF (Professor Kurt Luther)
Oct 24, 26
  • Working on your project (no class meeting)
  • "Detecting Credential Spearphishing in Enterprise Settings," USENIX Security 2017, PDF (Kate Nguyen)
Oct 31, Nov 02 ML Usability
  • "Why should i trust you?: Explaining the predictions of any classifier," KDD 2016, PDF (Hang Hu)
  • "Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning", PVLDB 2014, PDF (Moeti Masiane)
Nov 07, 09 ML for Defense/Attack
  • "Finding Botnets Using Minimal Graph Clusterings," ICML 2012, PDF ()
  • "Evading Classifiers by Morphing in the Dark," CCS 2017, PDF (Peng Peng)
Nov 14, 16 Adversarial ML
  • "Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks," IEEE SP 2016, PDF (Thomas Lux)
  • "Towards evaluating the robustness of neural networks," IEEE SP 2017, PDF (Colin Shea-Blymyer)
Nov 21, 23 Thanksgiving Holiday
Nov 28, 30 Adversarial ML
  • "Practical black-box attacks against machine learning," ASIA CCS 2017, PDF (Chun Wang)
  • "Approaches to adversarial drift," AISec 2013, PDF (Lindah Kotut)
Dec 5, 7 ML for Defense
  • "MagNet: a Two-Pronged Defense against Adversarial Examples," CCS 2017, PDF ()
  • "Vulnerability disclosure in the age of social media: Exploiting Twitter for predicting real-world exploits," USENIX Security 2015, PDF (Alex Hsu)
Dec 12, 14 No class; working on your project (Dec 12); Reading Day (Dec 14)
Final (1:05-3:05PM, Dec 20, MCB 226)

Grading

Class attendance and participation: 10%
Online paper discussion: 20%
Paper presentation in class: 15%
Project proposal: 10%
Project midterm presentation: 10%
Project final presentation: 20%
Project report: 15%

To calculate final grades, I simply sum the points each student earns (out of 100 total) and use the following scale to determine the letter grade: below 60 F, 60-62 D-, 63-66 D, 67-69 D+, 70-72 C-, 73-76 C, 77-79 C+, 80-82 B-, 83-86 B, 87-89 B+, 90-92 A-, 93-100 A. I do not curve the grades in any way.
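For concreteness, the scale can be written out in a few lines of Python. This is a minimal sketch: the cutoffs come directly from the scale above, while the function name and the example checks are my own illustrative additions.

    # Sketch of the grading scale above. Cutoffs come from the syllabus;
    # the function name and example checks are illustrative assumptions.

    def letter_grade(score: float) -> str:
        """Map a total score (out of 100) to a letter grade."""
        cutoffs = [
            (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
            (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"),
            (60, "D-"),
        ]
        for cutoff, grade in cutoffs:
            if score >= cutoff:
                return grade
        return "F"  # anything below 60

    assert letter_grade(86) == "B"
    assert letter_grade(59.5) == "F"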

Online Paper Discussion

Please use this discussion section in Canvas to post comments about the papers you read. The discussion will count for 20% of your final score.

  • For each paper, the first student can start a discussion thread by summarizing the main idea and key contributions, using the paper title as the thread title. All other students should post their comments in that thread.
  • Your comment should contain sufficient content; aim for roughly 200-500 words (somewhat longer or shorter is acceptable).
  • The comment should read like a mini-review of the paper. You can discuss the strengths and weaknesses of the paper, raise questions about the experimental methodology, discuss practical implications, or suggest new ideas. The comments should be your own thoughts; please don't copy and paste text from the paper.
  • You may not repeat or rephrase points that have already been covered by your classmates, so post your comments early.

Policies

Late Policy: all deadlines are hard deadlines. Late submissions are subject to a score reduction: if you submit within 3 days (72 hours) after the deadline, your score will be 0.5 * (your raw score). If you submit after 3 days, the score will be 0. A minimal sketch of this rule appears below.
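The sketch assumes hours past the deadline as the input unit; the 72-hour window and the 0.5 multiplier come from the policy above, while the function name and example checks are hypothetical.

    # Sketch of the late policy above. The 72-hour window and the 0.5
    # multiplier come from the syllabus; the names are illustrative.

    def adjusted_score(raw_score: float, hours_late: float) -> float:
        """Full credit on time, half credit within 72 hours, zero after."""
        if hours_late <= 0:
            return raw_score          # on-time submission
        if hours_late <= 72:
            return 0.5 * raw_score    # within 3 days of the deadline
        return 0.0                    # more than 3 days late

    assert adjusted_score(90, 0) == 90
    assert adjusted_score(90, 24) == 45.0
    assert adjusted_score(90, 100) == 0.0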

Academic Integrity: the Virginia Tech Honor Code applies to this course and describes the expectations for academic integrity. This course has a zero-tolerance policy regarding plagiarism. You (or your team) should complete all assignments and project tasks on your own. You are encouraged to post your questions to Canvas, and also to answer questions posted by other students. However, you may not give or receive help from others when writing your program code or your answers to any assignment or test. When you use code or tools developed by other people, please acknowledge the source. If an idea or concept used in your project has been proposed by existing work, please cite it properly. All electronic work submitted for this course is archived and subject to automatic plagiarism detection and cheating analysis. Whenever in doubt, please seek help from the instructor. I will not hesitate to report any incidents of academic dishonesty to the graduate school or the honor system. For more information on the Graduate Honor Code, please refer to the GHS Constitution. More information about the Virginia Tech Honor Code can be found here.

When presenting research papers in the class, you may not use the authors' slides directly. Please make your own slides and add your own thoughts.

Special Accommodations: If you need special accommodations because of a disability, please contact the instructor in the first week of classes.