Data and Information Ph.D. Qualifier Exam

Spring 2022

Examining Faculty

    Dr. Bimal Viswanath (Chair)
    Dr. Daphne Yao
    Dr. Ismini Lourentzou
    Dr. Lifu Huang
    Dr. Anuj Karpatne
    Dr. Peng Gao

Registered Students

    Sijia Wang
    Sifat Muhammad Abdullah
    Connor Weeks
    Xavier Pleimling
    Kenneth Neba
    Amarachi Blessing Mbakwe
    Sikiru Adewale
    Muntasir Wahed
    Sareh Ahmadi
    Blessy Antony
    Tanmoy Sarkar Pias
    Mehmet Oguz Yardimci
    Jostein Barry-Straume
    Brannon King
    Alvi Md Ishmam
    Makanjuola Ogunleye
    Shuaicheng Zhang
    Yanshen Sun

Tentative Instructions

Early Withdrawal Policy

A student registered for the Ph.D. qualifier exam may withdraw at any time before the early withdrawal deadline of January 1, 2022. After this date, withdrawal is not permitted. Students with questions about this policy should contact the exam chair directly.

Academic Integrity

Discussion among students of the papers identified for the exam is permitted up until the date the exam questions are released publicly. Once the questions are released, all such discussion must cease, and each student must complete the exam entirely on their own. This examination is conducted under the University's Graduate Honor System Code. Students are encouraged to draw on papers beyond those listed in the exam where doing so strengthens their arguments. However, the answers submitted must represent the sole and complete work of the student submitting them. Material substantially derived from other works, whether published in print or found on the web, must be explicitly and fully cited. Note that your grade will be influenced more strongly by the arguments you make than by the arguments you quote or cite.

Exam Schedule

    12/01/2021: Release of reading list
    12/06/2021: Deadline for students to commit to exam
    01/01/2022: Last day to withdraw
    01/11/2022: Release of written exam
    01/27/2022: Student solutions to written exam due
    Early February: Oral exams

Reading List

The reading lists below cover the following topics: (1) Data Mining and Information Retrieval, (2) Natural Language Processing, (3) Computer Vision, (4) Reinforcement Learning, (5) Graph Neural Networks, and (6) Machine Learning and Security. You may choose any one of these lists for your exam. You are expected to expand significantly on your selected list while preparing your written solution.

List 1: Data Mining and Information Retrieval
  1. BitFunnel: Revisiting signatures for search, Bob Goodwin, Michael Hopcroft, Dan Luu, Alex Clemmer, Mihaela Curmei, Sameh Elnikety, and Yuxiong He. SIGIR, 2017.
  2. Controlling fairness and bias in dynamic learning-to-rank, Marco Morik, Ashudeep Singh, Jessica Hong, and Thorsten Joachims. SIGIR, 2020.
  3. Neural collaborative filtering vs. matrix factorization revisited, Steffen Rendle, Walid Krichene, Li Zhang, and John Anderson. ACM RecSys, 2020.
  4. A stochastic treatment of learning to rank scoring functions, Sebastian Bruch, Shuguang Han, Michael Bendersky, and Marc Najork. WSDM, 2020.
  5. On sampled metrics for item recommendation, Walid Krichene and Steffen Rendle. KDD, 2020.
List 2: Natural Language Processing
  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. NAACL, 2019.
  2. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners, Timo Schick, Hinrich Schütze. NAACL, 2021.
  3. OneIE: A Joint Neural Model for Information Extraction with Global Features, Ying Lin, Heng Ji, Fei Huang, Lingfei Wu. ACL, 2020.
  4. The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction, Manling Li, Sha Li, Zhenhailong Wang, Lifu Huang, Kyunghyun Cho, Heng Ji, Jiawei Han, Clare Voss. EMNLP, 2021.
  5. AutoPrompt: Eliciting Knowledge from Language Models Using Automatically Generated Prompts, Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, and Sameer Singh. EMNLP, 2020.
  6. Prefix-Tuning: Optimizing Continuous Prompts for Generation, Xiang Lisa Li, Percy Liang. ACL, 2021.
List 3: Computer Vision
  1. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani et al. ICLR, 2021.
  2. Big Self-Supervised Models are Strong Semi-Supervised Learners, Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey E. Hinton. NeurIPS, 2020.
  3. Prototypical Contrastive Learning of Unsupervised Representations, Junnan Li, Pan Zhou, Caiming Xiong, and Steven Hoi. ICLR, 2021.
  4. Zero-shot Natural Language Video Localization, Jinwoo Nam, Daechul Ahn, Dongyeop Kang, Seong Jong Ha, and Jonghyun Choi. ICCV, 2021.
  5. AnaXNet: Anatomy Aware Multi-Label Finding Classification in Chest X-Ray, Nkechinyere N. Agu, Joy T. Wu, Hanqing Chao, Ismini Lourentzou, Arjun Sharma, Mehdi Moradi, Pingkun Yan, and James A. Hendler. MICCAI, 2021.
List 4: Reinforcement Learning
  1. The sensory neuron as a transformer: Permutation-invariant neural networks for reinforcement learning, Yujin Tang and David Ha. NeurIPS, 2021.
  2. Offline meta-reinforcement learning with advantage weighting, Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, and Chelsea Finn. ICML, 2021.
  3. Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation, Aaron Sonabend-W, Junwei Lu, Leo A. Celi, Tianxi Cai, and Peter Szolovits. NeurIPS, 2020.
  4. Interpretation of emergent communication in heterogeneous collaborative embodied agents, Shivansh Patel, Saim Wani, Unnat Jain, Alexander G. Schwing, Svetlana Lazebnik, Manolis Savva, and Angel X. Chang. ICCV, 2021.
  5. Offline Reinforcement Learning as One Big Sequence Modeling Problem, Michael Janner, Qiyang Li, and Sergey Levine. NeurIPS, 2021.
  6. Spatial Intention Maps for Multi-Agent Mobile Manipulation, Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, and Thomas Funkhouser. ICRA, 2021.
List 5: Graph Neural Networks
  1. An Attention-based Graph Neural Network for Heterogeneous Structural Learning, Huiting Hong, Hantao Guo, Yucheng Lin, Xiaoqing Yang, Zang Li, Jieping Ye. AAAI, 2020.
  2. DropEdge: Towards Deep Graph Convolutional Networks on Node Classification, Yu Rong, Wenbing Huang, Tingyang Xu, Junzhou Huang. ICLR, 2020.
  3. Graph Neural Networks: A Review of Methods and Applications, Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Maosong Sun. AI Open, 2020.
  4. Memory-Based Graph Networks, Amir Hosein Khasahmadi, Kaveh Hassani, Parsa Moradi, Leo Lee, Quaid Morris. ICLR, 2020.
List 6: Machine Learning and Security
  1. Hidden Backdoors in Human-Centric Language Models, Shaofeng Li, Hui Liu, Tian Dong, Benjamin Zi Hao Zhao, Minhui Xue, Haojin Zhu, and Jialiang Lu. CCS, 2021.
  2. Adversarial watermarking transformer: Towards tracing text provenance with data hiding, Sahar Abdelnabi and Mario Fritz. IEEE S&P, 2021.
  3. You autocomplete me: Poisoning vulnerabilities in neural code completion, Roei Schuster, Congzheng Song, Eran Tromer, and Vitaly Shmatikov. USENIX Security, 2021.
  4. Concealed Data Poisoning Attacks on NLP Models, Eric Wallace, Tony Z. Zhao, Shi Feng, and Sameer Singh. NAACL, 2021.
  5. Poisoning the Unlabeled Dataset of Semi-Supervised Learning, Nicholas Carlini. USENIX Security, 2021.
  6. Data Poisoning Attacks to Deep Learning Based Recommender Systems, Hai Huang, Jiaming Mu, Neil Zhenqiang Gong, Qi Li, Bin Liu, and Mingwei Xu. NDSS, 2021.

Exam Questions

Exam questions are available here: PDF

Grading Scale

The exam will ultimately be graded on the scale detailed in the Ph.D. Student Handbook, reproduced below.

  0. Student's performance is such that the committee considers the student unable to do Ph.D.-level work in computer science.
  1. While the student adequately understands the content of the work, the student is deficient in one or more of the factors listed for assessment under a score of 2. A score of 1 is the minimum necessary for an MS-level pass.
  2. Performance appropriate for students preparing to do Ph.D.-level work. Prime factors for assessment include being able to distinguish good work from poor work, and explain why; being able to synthesize the body of work into an assessment of the state of the art on a problem (as indicated by the collection of papers); and being able to identify open problems and suggest future work.
  3. Excellent performance, beyond that normally expected or required for a Ph.D. student.