CS 4984 - Data Science & Analytics Capstone
Spring 2019

Logistics

  • Instructor: Tanushree (Tanu) Mitra
  • Instructor Email: tmitra@vt.edu
  • Class hours: Tues, Thurs - 12:30pm to 1:45pm
  • Where: Holden Hall 212
  • Class Schedule
  • Office hours:
    • Tuesdays: 2:30pm - 4:00pm at Torg 3160E (schedule by appointment only)

Course Overview

Researchers across disciplines are excited by the prospect of “data-driven science”. This advanced project-based course is geared towards deriving valuable insights from data. Students are expected to integrate software engineering and data analytics skills acquired in previous courses. Team-based capstone projects will tackle real-world challenges that surface on online social platforms. Hence, the focus will be on data generated on online platforms such as Facebook, Twitter, Reddit, YouTube, and news platforms, and on the corresponding real-world challenges that emerge on these platforms (examples include misinformation, hate speech, and polarization). The first few weeks of this course will comprise multiple readings, in-class discussions, and in-class practicum sessions to introduce you to basic concepts of analyzing data left behind on social media platforms. During this time, you will read technical papers and write reflections in which you do not just summarize the paper but think about what additional questions the paper enables. This is your chance to come up with a cool project idea based on what you just read. I will also provide you with a list of high-level topics and suggestions. You will blog about your ideas, which will ultimately lead to team pitches and your project proposal. We will also have midterm checkpoints for your final projects, along with multiple practicum sessions and milestone checks over the course of the semester.

Learning Objectives

After successful completion of this course, you will be able to:

  • Identify important research questions that can be answered with data gathered from online social platforms.
  • Analyze the data left behind in online platforms.
  • Draw valuable insights from the analyzed data so as to answer questions from a number of practical scenarios and domains, spanning news, misinformation, online speech, etc.
  • Approach problems arising in online platforms data-analytically. That is, think carefully & systematically about whether & how data can improve an existing online phenomenon or offer new insights.

Prerequisites

Students should be prepared to apply what they have learned in other classes and on their own while implementing their capstone projects. They should be open to learning many new skills in the field of data analytics by referring to primary as well as secondary resources. In terms of required skills, students should be comfortable programming in Python. A basic knowledge of statistics and introductory machine learning is a plus but not required. An overview of the concepts and tools needed will be provided in class; however, in-depth coverage of the fundamentals is outside the scope of this course. You are expected to quickly learn many new things, which can vary based on your project. For example, your project may require you to fetch Twitter data using the Twitter API or analyze posts from Reddit using pre-existing libraries (like Python's nltk and sklearn), which should not be too challenging if you already know a high-level language like Python. Please make sure you are comfortable with this.
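As a rough gauge of the expected Python comfort level, here is a minimal sketch of a first-pass text analysis: counting the most frequent words across a handful of posts. The sample posts, the stopword list, and the function name top_words are all made up for illustration. A real project would fetch data through a platform API and would likely rely on libraries such as nltk or sklearn rather than the standard library alone.

```python
import re
from collections import Counter

# Hypothetical sample posts; a real project would pull these from a platform API.
SAMPLE_POSTS = [
    "Breaking news: the story everyone shared was false",
    "Why was the false story shared so widely?",
    "Fact checkers flagged the story within hours",
]

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "was", "so", "why", "within"}


def top_words(posts, n=3):
    """Return the n most common non-stopword tokens across all posts."""
    counts = Counter()
    for post in posts:
        # Lowercase and keep runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", post.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_words(SAMPLE_POSTS))
```

If reading and modifying a snippet like this feels routine, the programming side of the course should be manageable.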

Texts

No textbook is required for the course. Readings will be provided as linked PDFs or as electronic reserves from the Virginia Tech Libraries website; accessing the latter requires either being on campus or connecting to the VT network through a VPN.

Websites

The following sites will be used to support this class:

  • This website, for the syllabus, schedule, and assignment details.
  • Class blog, for reading reflections. Please make sure to sign up for VT WordPress.
  • Canvas, for grades and submitting work.

Grading Criteria

  • Class participation – 10%
  • Reading responses – 15%
  • Term project – 75%
    • Project pitch presentation – 5%
    • Project proposal – 5%
    • Practicum milestone – 10%
    • Midterm project presentation – 5%
    • Midterm report – 10%
    • Final project presentation – 10%
    • Final report – 25%
    • Poster submission to VTURCS Spring Symposium – 5%
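The weights above can be sanity-checked with a short sketch. The weights are copied from the list; how component scores are combined (a plain weighted average on a 0-100 scale, with missing components counted as zero) is an assumption made for illustration, and the names WEIGHTS and course_score are hypothetical.

```python
# Weights copied from the grading criteria (percent of the course grade).
# The eight term-project components together account for 75%.
WEIGHTS = {
    "class_participation": 10,
    "reading_responses": 15,
    "project_pitch": 5,
    "project_proposal": 5,
    "practicum_milestone": 10,
    "midterm_presentation": 5,
    "midterm_report": 10,
    "final_presentation": 10,
    "final_report": 25,
    "vturcs_poster": 5,
}


def course_score(scores):
    """Weighted average of per-component scores, each on a 0-100 scale.

    Assumption: components missing from `scores` count as 0.
    """
    assert sum(WEIGHTS.values()) == 100, "weights must cover the whole grade"
    return sum(w * scores.get(name, 0) for name, w in WEIGHTS.items()) / 100


if __name__ == "__main__":
    # Hypothetical student scoring 90 on every component.
    print(course_score({name: 90 for name in WEIGHTS}))
```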

Exams

The final project serves as the final exam for this course. There is no separate exam.

Assignment Details:

Class participation (10%) - Individual

Attending class and participating in class discussions, in-class exercises, and activities are essential for success in this course. We will also have occasional quizzes spread throughout the semester. The purpose of the quizzes is to ensure in-class participation and, at times, to help me gauge which topics need more time. These quizzes will not be graded for correctness, but for completeness and participation. Note the word completeness: you cannot do a sloppy job in class and expect credit on your quiz.

Reading responses (15%) - Individual

During the first few weeks of the semester, you will be assigned several academic papers. The goal of these readings is to stimulate your critical thinking about designing and implementing a data science project and, eventually, to help you come up with a stellar project proposal. While writing your reflections, you will not just summarize the paper but think about what additional questions the paper enables and how it is relevant to modern digital social environments. Give examples, talk about your own experiences if any, and be creative.

Reading reflections should fit within one page (roughly 600 words if you are using 12pt font) and should be submitted on the Class blog. You won’t be penalized if you write more, but being succinct is another great writing skill which you should aim to cultivate in this course. Note that you do not need to summarize the full paper, but you do need to reflect on what additional questions the work enables. Does this help you think about your next big project? What will that be? What other questions does the paper make you think of? What is the paper not answering, and what about it is concerning or just intriguing?

Again, this is an individual assignment, and the work submitted should be written solely by you. Most importantly, a reader glancing at your reflection should be able to easily spot these questions. So use bold, italics, bullet points, or other means of highlighting them. Here are some great examples of reflections on the paper Antisocial Behavior in Online Discussion Communities, written by students in a prior course offering (example 1, example 2, example 3). I also like the following example reflections on “Partisanship and the search for engaging news” (example 4, example 5). All these reflections pose exciting new questions that could be the start of a cool project. What would yours be, based on what you read in the first few weeks of class?

Term Project (75%) - Group

The goal of the final project is to identify an interesting question or problem on online social media platforms that you can address by analyzing online data. The papers and practicums discussed in class should help you along the way. All project topics must come from one of the following two themes:

  • Theme 1: News and misinformation
  • Theme 2: Hateful and offensive content

Project topics must be approved by the instructor. You need to justify that the key question of your project topic is interesting, relevant to the course, relevant to the theme, and of suitable difficulty. Your project should involve some non-trivial analysis, algorithms, computation, or experimentation (computing basic statistics such as the average or min/max will not be enough). Implementing your term project has multiple graded components, starting with the project proposal and ending with the final presentation and report submission.

A list of high-level suggested topics framed as “research questions” spanning these two themes will be made available. While you are free to implement the project in the programming language of your choice, I highly recommend Python. For statistical analysis, R is another great choice. You will use Jupyter notebooks to present your data analysis and for any in-class project discussions, like the milestone practicums.

Once you have selected a topic, you should do some background reading so that you are capable of describing, in some detail, what you expect to accomplish. For example, if you decide that you want to implement a new approach for detecting hateful content on the social network Gab, you will have to carefully read papers that address this problem, pinpoint their weaknesses or come up with new suggestions based on what you read, and explain how your approach will address these weaknesses or is a good alternative. Once you have read up on your topic, you will be ready to write your proposal.

See the Term Project page for additional details.

Late Policy

  • Reading responses: No late days. All reading responses are due at 9am on the day of class. Responses to readings serve to stir class discussions, so there is no point in submitting them after class.
  • Practicum milestone checkpoints: No late days. All materials due at 9am on the day of milestone checkpoint.
  • Midterm project presentation: No late days.
  • Midterm project report: Late days allowed; however, late assignments will be penalized at a rate of one grade step (e.g., A becomes A-) per day. Submissions more than five days late will not be accepted.
  • Final project presentation: No late days.
  • Final project report: No late days.
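The midterm report penalty can be illustrated with a small sketch. The one-step-per-day rule and the five-day cutoff come from the policy above; the exact grade ladder, the floor at F, and counting lateness in whole days are assumptions made for illustration, and the function name late_midterm_report_grade is hypothetical.

```python
# Assumed letter-grade ladder; the syllabus only gives "A becomes A-" as an example.
GRADE_STEPS = ["A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F"]

# From the late policy: submissions more than five days late are not accepted.
MAX_LATE_DAYS = 5


def late_midterm_report_grade(grade, days_late):
    """Return the penalized grade, or None if the submission is not accepted.

    Assumptions: lateness is counted in whole days and the penalty bottoms
    out at the lowest grade on the ladder.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > MAX_LATE_DAYS:
        return None
    # Drop one grade step per day late.
    index = GRADE_STEPS.index(grade) + days_late
    return GRADE_STEPS[min(index, len(GRADE_STEPS) - 1)]


if __name__ == "__main__":
    print(late_midterm_report_grade("A", 1))
    print(late_midterm_report_grade("B", 6))
```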

Honor code

The Virginia Tech Undergraduate Honor System is in effect for all work, whether performed individually or in teams. Be particularly careful to avoid plagiarism, which essentially means using materials (ideas, code, designs, text, etc.) that you did not create without giving appropriate credit to the creator (using quotation marks, citations, comments in the code, link to URL, etc.). Students are encouraged to consult with one another about project design and evaluation issues, as the sharing of ideas here will lead to better work. The final exam is entirely individual. Any suspected violations of the honor code will be promptly reported to the honor system, as required by university policy.

Special needs

If you are a student with special needs or circumstances, if you have emergency medical information to share with me, or if you need special arrangements in case the building must be evacuated, please let me know privately as soon as possible.

Resources

Several online resources and materials adapted from similar classes