Data and Information Ph.D. Qualifier Exam
Spring 2022
Examining Faculty
Dr. Bimal Viswanath (Chair)
Dr. Daphne Yao
Dr. Ismini Lourentzou
Dr. Lifu Huang
Dr. Anuj Karpatne
Dr. Peng Gao
Registered Students
Sijia Wang
Sifat Muhammad Abdullah
Connor Weeks
Xavier Pleimling
Kenneth Neba
Amarachi Blessing Mbakwe
Sikiru Adewale
Muntasir Wahed
Sareh Ahmadi
Blessy Antony
Tanmoy Sarkar Pias
Mehmet Oguz Yardimci
Jostein Barry-Straume
Brannon King
Alvi Md Ishmam
Makanjuola Ogunleye
Shuaicheng Zhang
Yanshen Sun
Tentative Instructions
- Send an email to vbimal@cs.vt.edu to be registered. Your participation in the survey did not register you; we need to hear from you directly!
- First, a paper reading list will be released. At the beginning of the examination period, all students will receive a document containing the exam questions.
- By the end of the examination period, each student must turn in a written solution to those questions. Solutions must be no longer than 8 pages (excluding references), in 11-point font or larger, using a format TBA.
- Written solutions should take the form of a scientific paper. It should include at least the following:
- a motivation section making clear the context of the problem/situation;
- a clear statement of the problem in terms of concepts and terminology in the information/data area, that addresses the situation/context;
- a review of related literature that draws on multiple relevant works from the reading list and also includes additional references found by the student during a thorough literature search;
- descriptions of approaches to solve the problem; and
- an evaluation plan for how such approaches would be validated.
- Students will then give an oral presentation detailing their solution. Each presentation must be completed within a 15-minute period: 10 minutes for the presentation and 5 minutes for answering questions posed by faculty examiners.
- Each solution will be graded by at least 2 faculty members. Based on all faculty input, the area committee will then assign each student a combined grade on a scale of 0-3, as called for by GPC policies.
Early Withdrawal Policy
A student registered for the Ph.D. qualifier exam may withdraw at any time before the early withdrawal deadline of Jan 1, 2022. After this date, withdrawal is prohibited. Students with questions about this policy should contact the exam chair directly.
Academic Integrity
Discussions among students of the papers identified for the exam are reasonable up until the date the exam questions are publicly released. Once the questions are released, we expect all such discussions to cease, as students are required to answer the qualifier questions entirely through their own work. This examination is conducted under the University's Graduate Honor System Code. Students are encouraged to draw on papers beyond those listed in the exam to the extent that doing so strengthens their arguments. However, the answers submitted must represent the sole and complete work of the student submitting them. Material substantially derived from other works, whether published in print or found on the web, must be explicitly and fully cited. Note that your grade will be influenced more strongly by the arguments you make than by the arguments you quote or cite.
Exam Schedule
12/01/2021: Release of reading list
12/06/2021: Deadline for students to commit to the exam
01/01/2022: Last day to withdraw
01/11/2022: Release of written exam
01/27/2022: Student solutions to written exam due
Early February 2022: Oral exams
Reading List
The reading lists below cover the following topics: (1) Data Mining and Information Retrieval, (2) Natural Language Processing, (3) Computer Vision, (4) Reinforcement Learning, (5) Graph Neural Networks, (6) Machine Learning and Security. You may choose any one of these lists for your exam. You are expected to significantly expand on your selected list while preparing your written solution.
List 1: Data Mining and Information Retrieval
-
Bitfunnel: Revisiting signatures for search,
Goodwin, Bob, Michael Hopcroft, Dan Luu, Alex Clemmer, Mihaela Curmei, Sameh Elnikety, and Yuxiong He.
SIGIR, 2017.
-
Controlling fairness and bias in dynamic learning-to-rank,
Morik, Marco, Ashudeep Singh, Jessica Hong, and Thorsten Joachims.
SIGIR, 2020.
-
Neural collaborative filtering vs. matrix factorization revisited,
Rendle, Steffen, Walid Krichene, Li Zhang, and John Anderson.
ACM RecSys, 2020.
-
A stochastic treatment of learning to rank scoring functions,
Bruch, Sebastian, Shuguang Han, Michael Bendersky, and Marc Najork.
WSDM, 2020.
-
On sampled metrics for item recommendation,
Krichene, Walid, and Steffen Rendle.
KDD, 2020.
List 2: Natural Language Processing
-
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova.
NAACL, 2019.
-
It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners,
Timo Schick, Hinrich Schütze.
NAACL, 2021.
-
OneIE: A Joint Neural Model for Information Extraction with Global Features,
Ying Lin, Heng Ji, Fei Huang, Lingfei Wu.
ACL, 2020.
-
Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction,
Manling Li, Sha Li, Zhenhailong Wang, Lifu Huang, Kyunghyun Cho, Heng Ji, Jiawei Han, Clare Voss.
EMNLP, 2021.
-
Autoprompt: Eliciting Knowledge from Language Models Using Automatically Generated Prompts,
Taylor Shin, Yasaman Razeghi, Robert L Logan IV, Eric Wallace, and Sameer Singh.
EMNLP, 2020.
-
Prefix-Tuning: Optimizing Continuous Prompts for Generation,
Xiang Lisa Li, Percy Liang.
ACL, 2021.
List 3: Computer Vision
-
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,
Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani et al.
ICLR, 2021.
-
Big Self-Supervised Models are Strong Semi-Supervised Learners,
Chen, Ting, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey E. Hinton.
NeurIPS, 2020.
-
Prototypical Contrastive Learning of Unsupervised Representations,
Li, Junnan, Pan Zhou, Caiming Xiong, and Steven Hoi.
ICLR, 2021.
-
Zero-shot Natural Language Video Localization,
Nam, Jinwoo, Daechul Ahn, Dongyeop Kang, Seong Jong Ha, and Jonghyun Choi.
ICCV, 2021.
-
AnaXNet: Anatomy Aware Multi-Label Finding Classification in Chest X-Ray,
Agu, Nkechinyere N., Joy T. Wu, Hanqing Chao, Ismini Lourentzou, Arjun Sharma, Mehdi Moradi, Pingkun Yan, and James A. Hendler.
MICCAI, 2021.
List 4: Reinforcement Learning
-
The sensory neuron as a transformer: Permutation-invariant neural networks for reinforcement learning,
Tang, Yujin, and David Ha.
NeurIPS, 2021.
-
Offline meta-reinforcement learning with advantage weighting,
Mitchell, Eric, Rafael Rafailov, Xue Bin Peng, Sergey Levine, and Chelsea Finn.
ICML, 2021.
-
Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation,
Sonabend-W, Aaron, Junwei Lu, Leo A. Celi, Tianxi Cai, and Peter Szolovits.
NeurIPS, 2020.
-
Interpretation of emergent communication in heterogeneous collaborative embodied agents,
Patel, Shivansh, Saim Wani, Unnat Jain, Alexander G. Schwing, Svetlana Lazebnik, Manolis Savva, and Angel X. Chang.
ICCV, 2021.
-
Offline Reinforcement Learning as One Big Sequence Modeling Problem,
Janner, Michael, Qiyang Li, and Sergey Levine.
NeurIPS, 2021.
-
Spatial Intention Maps for Multi-Agent Mobile Manipulation,
Wu, Jimmy, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, and Thomas Funkhouser.
ICRA, 2021.
List 5: Graph Neural Networks
-
An Attention-based Graph Neural Network for Heterogeneous Structural Learning,
Huiting Hong, Hantao Guo, Yucheng Lin, Xiaoqing Yang, Zang Li, Jieping Ye.
AAAI, 2020.
-
DropEdge: Towards Deep Graph Convolutional Networks on Node Classification,
Yu Rong, Wenbing Huang, Tingyang Xu, Junzhou Huang.
ICLR, 2020.
-
Graph Neural Networks: A Review of Methods and Applications,
Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Maosong Sun.
AI Open, 2020.
-
Memory-Based Graph Networks,
Amir Hosein Khasahmadi, Kaveh Hassani, Parsa Moradi, Leo Lee, Quaid Morris.
ICLR, 2020.
List 6: Machine Learning and Security
-
Hidden Backdoors in Human-Centric Language Models,
Shaofeng Li, Hui Liu, Tian Dong, Benjamin Zi Hao Zhao, Minhui Xue, Haojin Zhu, and Jialiang Lu.
CCS, 2021.
-
Adversarial watermarking transformer: Towards tracing text provenance with data hiding,
Abdelnabi, Sahar, and Mario Fritz.
IEEE S&P, 2021.
-
You autocomplete me: Poisoning vulnerabilities in neural code completion,
Schuster, Roei, Congzheng Song, Eran Tromer, and Vitaly Shmatikov.
USENIX Security, 2021.
-
Concealed Data Poisoning Attacks on NLP Models,
Wallace, Eric, Tony Z. Zhao, Shi Feng, and Sameer Singh.
NAACL, 2021.
-
Poisoning the Unlabeled Dataset of Semi-Supervised Learning,
Carlini, Nicholas.
USENIX Security, 2021.
-
Data Poisoning Attacks to Deep Learning Based Recommender Systems,
Hai Huang, Jiaming Mu, Neil Zhenqiang Gong, Qi Li, Bin Liu, and Mingwei Xu.
NDSS, 2021.
Exam Questions
Exam questions are available here: PDF
Grading Scale
The exam will be graded on the scale detailed in the Ph.D. Student Handbook, replicated here.
- 0: The student's performance is such that the committee considers the student unable to do Ph.D.-level work in computer science.
- 1: While the student adequately understands the content of the work, the student is deficient in one or more of the factors listed for assessment under a score value of 2. A score of 1 is the minimum necessary for an MS-level pass.
- 2: Performance appropriate for students preparing to do Ph.D.-level work. Prime factors for assessment include being able to distinguish good work from poor work, and explain why; being able to synthesize the body of work into an assessment of the state of the art on a problem (as indicated by the collection of papers); and being able to identify open problems and suggest future work.
- 3: Excellent performance, beyond that normally expected or required of a Ph.D. student.