Computer Science PhD Qualifier Exam: Data, Information, Knowledge, and Libraries

Committee

Instructions

All students must register via email to the chair (Bert Huang).
At the beginning of the examination period, all students will receive a document that contains two questions.
By the end of the examination period, each student must turn in a written solution to one of those questions. Solutions must be no longer than 8 pages (excluding references), set in 10-point type or larger, using the IEEE two-column format.
Each written solution should take the form of a scientific paper and include at least the following:
- a motivation section making clear the context of the problem/situation;
- a clear statement of the problem in terms of concepts and terminology in the information/data area, that addresses the situation/context;
- a review of related literature, drawn mostly from multiple relevant works in the reading list, but that should include additional references found by the student during a thorough literature search;
- a statement of how the problem can be approached; and
- a description of the approach to solve the problem.
Students will then give an oral presentation detailing their solution. Each presentation must be completed within a 30-minute period, of which 20 minutes are for the presentation and 10 minutes are for answering questions posed by faculty examiners.
Each solution will be graded by at least 2 faculty members. The area committee will then assign each student a combined grade, based on all faculty input, on a scale of 0-3, as called for by GPC policies.

Registered Students

Subhodip Biswas
Zhiqian Chen
Ziqian Song

Early Withdrawal Policy

Once students have notified the Computer Science Department of their intention to take the Data, Information, Knowledge, and Libraries (DIKL) Ph.D. Qualifier Exam, they may withdraw from taking the exam at any point prior to the public release of the exam questions. Once the exam questions are released, the exam is considered "in progress" and withdrawal is prohibited. Students with questions about this policy should contact the exam chair directly.

Academic Integrity

Discussions among students of the papers identified for the DIKL Qualifier are reasonable up until the date the exam is released publicly. Once the exam questions are released, we expect all such discussions to cease, as students are required to work entirely on their own in answering the qualifier questions. This examination is conducted under the University's Graduate Honor System Code. Students are encouraged to draw on papers other than those listed for the exam to the extent that this strengthens their arguments. However, the answers submitted must represent the sole and complete work of the student submitting them. Material substantially derived from other works, whether published in print or found on the web, must be explicitly and fully cited. Note that your grade will be influenced more strongly by the arguments you make than by the arguments you quote or cite.

Exam Schedule

12/20/2016: release of reading list
1/9/2017: release of written exam, last day to register for exam
1/23/2017: student solutions to written exam due
1/30/2017: student presentation slides due
2/1/2017: oral exam

Exam Questions

The exam questions are available for download here.

Reading List

Moritz Hardt. How Big Data is Unfair. Medium, 2014.
Dino Pedreschi, Salvatore Ruggieri, Franco Turini. Discrimination-Aware Data Mining. Knowledge Discovery and Data Mining, 2008.
Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold. Fairness Through Awareness. Innovations in Theoretical Computer Science Conference, 2012.
Sara Hajian and Josep Domingo-Ferrer. A Methodology for Direct and Indirect Discrimination Prevention in Data Mining. IEEE Transactions on Knowledge and Data Engineering, 2013.
Richard Zemel, Yu Wu, Kevin Swersky, Toniann Pitassi, Cynthia Dwork. Learning Fair Representations. International Conference on Machine Learning, 2013.
Koray Mancuhan, Chris Clifton. Combating Discrimination Using Bayesian Networks. Artificial Intelligence and Law, 2014.
Francesco Bonchi, Sara Hajian, Bud Mishra, Daniele Ramazzotti. Exposing the Probabilistic Causal Structure of Discrimination. ArXiv Preprint, 2015.
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. “Why Should I Trust You?” Explaining the Predictions of Any Classifier. Knowledge Discovery and Data Mining, 2016.
Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. Neural Information Processing Systems, 2016.
R. Li, S. Wang, K. Chang, “Multiple Location Profiling for Users and Relationships from Social Network and Content,” Proceedings of the VLDB Endowment, Vol. 5, No. 11, pp. 1603-1614, 2012.
R. Li, S. Wang, H. Deng, R. Wang, and K. Chang, “Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations,” Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1023-1031, 2012.
Zhiwei Li, Bin Wang, Mingjing Li, and Wei-Ying Ma, “A Probabilistic Model for Retrospective News Event Detection,” Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 106–113, 2005.
Eytan Bakshy, Jake M. Hofman, Winter A. Mason, Duncan J. Watts, “Everyone’s an Influencer: Quantifying Influence on Twitter,” Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pp. 65-74, 2011.
Shuyang Lin, Fengjiao Wang, Qingbo Hu, and Philip S. Yu, “Extracting Social Events for Learning Better Information Diffusion Models,” Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 365-373, 2013.
Shaomei Wu, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts, “Who Says What to Whom on Twitter,” Proceedings of the 20th International Conference on World Wide Web, pp. 705-714, 2011.

Grading Scale

The exam will ultimately be graded on a scale as detailed in the Ph.D. Student Handbook, as replicated here.

0: The student's performance is such that the committee considers the student unable to do Ph.D.-level work in computer science.
1: While the student adequately understands the content of the work, the student is deficient in one or more of the factors listed for assessment under a score of 2. A score of 1 is the minimum necessary for an MS-level pass.
2: Performance appropriate for students preparing to do Ph.D.-level work. Prime factors for assessment include being able to distinguish good work from poor work and explain why; being able to synthesize the body of work into an assessment of the state of the art on a problem (as indicated by the collection of papers); and being able to identify open problems and suggest future work.
3: Excellent performance, beyond that normally expected or required for a Ph.D. student.