Computer Science PhD Qualifier Exam: Data, Information, Knowledge, and Libraries

Committee

Instructions

Registered Students

Early Withdrawal Policy

Once students have notified the Computer Science Department of their intention to take the Data, Information, Knowledge, and Libraries (DIKL) Ph.D. Qualifier Exam, they may withdraw at any point prior to the public release of the exam questions. Once the questions are released, the exam is considered "in progress" and withdrawal is no longer permitted. Students with questions about this policy should contact the exam chair directly.

Academic Integrity

Discussion among students of the papers identified for the DIKL Qualifier is acceptable up until the date the exam questions are released publicly. Once the questions are released, all such discussion must cease: students are required to answer the qualifier questions entirely on their own. This examination is conducted under the University's Graduate Honor System Code. Students are encouraged to draw on papers beyond those listed for the exam to the extent that doing so strengthens their arguments. However, the answers submitted must represent the sole and complete work of the student submitting them. Material substantially derived from other works, whether published in print or found on the web, must be explicitly and fully cited. Note that your grade will be influenced more strongly by the arguments you make than by the arguments you quote or cite.

Exam Schedule

Exam Questions

The exam questions are available for download here.

Reading List

  1. Moritz Hardt. How Big Data is Unfair. Medium, 2014.
  2. Dino Pedreschi, Salvatore Ruggieri, Franco Turini. Discrimination-Aware Data Mining. Knowledge Discovery and Data Mining, 2008.
  3. Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold. Fairness Through Awareness. Innovations in Theoretical Computer Science Conference, 2012.
  4. Sara Hajian and Josep Domingo-Ferrer. A Methodology for Direct and Indirect Discrimination Prevention in Data Mining. IEEE Transactions on Knowledge and Data Engineering, 2013.
  5. Richard Zemel, Yu Wu, Kevin Swersky, Toniann Pitassi, Cynthia Dwork. Learning Fair Representations. International Conference on Machine Learning, 2013.
  6. Koray Mancuhan, Chris Clifton. Combating Discrimination Using Bayesian Networks. Artificial Intelligence and Law, 2014.
  7. Francesco Bonchi, Sara Hajian, Bud Mishra, Daniele Ramazzotti. Exposing the Probabilistic Causal Structure of Discrimination. arXiv preprint, 2015.
  8. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. “Why Should I Trust You?” Explaining the Predictions of Any Classifier. Knowledge Discovery and Data Mining, 2016.
  9. Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. Neural Information Processing Systems, 2016.
  10. R. Li, S. Wang, K. Chang. Multiple Location Profiling for Users and Relationships from Social Network and Content. Proceedings of the VLDB Endowment, Vol. 5, No. 11, pp. 1603-1614, 2012.
  11. R. Li, S. Wang, H. Deng, R. Wang, K. Chang. Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1023-1031, 2012.
  12. Zhiwei Li, Bin Wang, Mingjing Li, Wei-Ying Ma. A Probabilistic Model for Retrospective News Event Detection. Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 106-113, 2005.
  13. Eytan Bakshy, Jake M. Hofman, Winter A. Mason, Duncan J. Watts. Everyone's an Influencer: Quantifying Influence on Twitter. Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pp. 65-74, 2011.
  14. Shuyang Lin, Fengjiao Wang, Qingbo Hu, Philip S. Yu. Extracting Social Events for Learning Better Information Diffusion Models. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 365-373, 2013.
  15. Shaomei Wu, Jake M. Hofman, Winter A. Mason, Duncan J. Watts. Who Says What to Whom on Twitter. Proceedings of the 20th International Conference on World Wide Web, pp. 705-714, 2011.

Grading Scale

The exam will be graded on the scale detailed in the Ph.D. Student Handbook, replicated here. Scores range from 0 to 3.

  0. Student's performance is such that the committee considers the student unable to do Ph.D.-level work in computer science.
  1. The student adequately understands the content of the work but is deficient in one or more of the factors listed for assessment under a score of 2. A score of 1 is the minimum necessary for an MS-level pass.
  2. Performance appropriate for students preparing to do Ph.D.-level work. Prime factors for assessment include being able to distinguish good work from poor work, and to explain why; being able to synthesize the body of work into an assessment of the state of the art on a problem (as indicated by the collection of papers); and being able to identify open problems and suggest future work.
  3. Excellent performance, beyond that normally expected or required of a Ph.D. student.