Spring 2021 Data and Information (AI, Data Mining, Digital Libraries, Information Retrieval, ML) Ph.D.
Qualifying Examination
Exam Available January 6 (Wednesday), 2021
Examining Faculty
- Chang-Tien Lu (Chair, Primary Contact)
- Hoda Eldardiry
- Edward Fox
- Lenwood Heath
- Ismini Lourentzou
- Naren Ramakrishnan
- Chandan Reddy
- Liqing Zhang
Philosophy of Examination
- Since students vary in their abilities regarding written and oral communication,
and since doctoral students are expected to have some skill with each modality, students will explain their solutions both in writing and orally. Solutions
will be graded based on their clarity as a result of the union of these modes
of expression.
- Students are expected to have studied all works in the reading list. Students
are also expected to acquire any prerequisite or background knowledge required
to understand the works in the reading list.
- Students are expected to understand those works at the level of a doctoral
student who has taken the equivalent of courses such as CS5525 Data Analytics, CS5604 Information Storage and Retrieval, CS5614 Database Management Systems, and CS5984 Introduction to Data Mining.
- Students are expected to be able to understand a real situation/context/problem
in the information/data area, to be able to synthesize/apply the findings
of multiple papers from the reading list to such problems, and to be able
to formulate an answer outlining how they would approach and solve that problem.
- Once students have notified the Computer Science Department of their intention to take the Data and Information Ph.D. Qualifier Exam, they may withdraw from taking the exam at any point prior to the public release of the exam questions. Once the exam questions are released, the exam is considered "in progress" and withdrawal is prohibited. Students with questions about this policy should contact the exam chair directly.
Process and Format
- The examination includes a take-home examination that is expected to be administered
at the beginning of 2021.
- At the beginning of the examination period, all students will receive a
document that contains two questions.
- By the end of the examination period, each student must turn in a written
solution to one of those questions, i.e., the student must choose one out
of two. Solutions are expected to be no longer than 8
pages (excluding references) in 10-point font, using the IEEE two-column format (Word doc, Overleaf LaTeX).
- Also at this time, each student must turn in a PowerPoint presentation or
equivalent that will be used for an oral explanation of the written solution.
- Written solutions might be expected to have the following approximate format
(although detailed guidelines will be provided during the exam):
  - a motivation section making clear the context of the problem/situation
  - a clear statement of the problem in terms of concepts and terminology
    in the information/data area, that addresses the situation/context
  - a review of related literature, drawn mostly from multiple relevant
    works in the reading list
  - a statement of how the problem can be approached
  - a description of the approach to solve the problem
It is important that any assumptions made be clearly stated in the written
solution.
- Finally, oral presentation recordings must follow the previously turned-in
PowerPoint file or equivalent. Each recording must be completed within a 15-minute period.
A tutorial on how to record a talk with Zoom can be found at: Zoom video recording tutorial.
Students can use other professional screen recording software (e.g., Camtasia, VidGrid) if they have access. Please make sure to include a picture-in-picture window so that the presenter is always visible in the video.
- Each solution will be graded by at least two faculty members. The area
committee will then assign each student a combined grade, based on faculty
input, on a scale of 0-3, as called for by GPC policies.
- The Honor Code is in effect. All work should be individual.
Schedule
- 12/1 (Tuesday), 2020: Complete Reading List Available.
- 12/7 (Monday), 2020: Student Registration.
- 1/6 (Wednesday), 2021: Written Examination Available.
- 1/19 (Tuesday) 10PM, 2021: Written Examination Due.
- 1/22 (Friday) 10PM, 2021: PowerPoint Presentation File Due.
- 1/26 (Tuesday) 10PM, 2021: Oral Presentation Video Due.
- 1/27-2/9, 2021: Committee Evaluation.
- 2/15 (Monday), 2021: Exam Results due to GPC.
Reading List
- Carl Lagoze, Dean Krafft, Tim Cornwell, Naomi Dushay, Dean Eckstrom, and John Saylor. 2006. Metadata aggregation and "automated digital libraries": a retrospective on the NSDL experience. In Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries (JCDL '06). ACM, New York, NY, USA, 230-239. DOI: https://doi-org.ezproxy.lib.vt.edu/10.1145/1141753.1141804
- Debasis Ganguly, Dwaipayan Roy, Mandar Mitra, and Gareth J.F. Jones. 2015. Word Embedding based Generalized Language Model for Information Retrieval. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '15). ACM, New York, NY, USA, 795-798. DOI: https://doi-org.ezproxy.lib.vt.edu/10.1145/2766462.2767780
- Qingyao Ai, Liu Yang, Jiafeng Guo, and W. Bruce Croft. 2016. Improving Language Estimation with the Paragraph Vector Model for Ad-hoc Retrieval. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR '16). ACM, New York, NY, USA, 869-872. DOI: https://doi-org.ezproxy.lib.vt.edu/10.1145/2911451.2914688
- Sujatha Das Gollapalli, Prasenjit Mitra, and C. Lee Giles. 2011. Ranking authors in digital libraries. In Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries (JCDL '11). ACM, New York, NY, USA, 251-254. DOI: https://dl-acm-org.ezproxy.lib.vt.edu/doi/10.1145/1998076.1998123
- Susan Dumais, Edward Cutrell, J. J. Cadiz, Gavin Jancke, Raman Sarin, and Daniel C. Robbins. 2016. Stuff I've Seen: A System for Personal Information Retrieval and Re-Use. SIGIR Forum 49, 2 (January 2016), 28-35. DOI: http://dx.doi.org.ezproxy.lib.vt.edu/10.1145/2888422.2888425
- Alan F. Smeaton and Jamie Callan. 2005. Personalisation and recommender systems in digital libraries. Int. J. Digit. Libr. 5, 4 (August 2005), 299-308. DOI: https://doi-org.ezproxy.lib.vt.edu/10.1007/s00799-004-0100-1
- Ying Liu, Kun Bai, Prasenjit Mitra, and C. Lee Giles. 2007. TableSeer: automatic table metadata extraction and searching in digital libraries. In Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries (JCDL '07). ACM, New York, NY, USA, 91-100. DOI: http://dx.doi.org.ezproxy.lib.vt.edu/10.1145/1255175.1255193
- Lijing Wang, Jiangzhuo Chen, and Madhav Marathe. Defsi: Deep learning based epidemic forecasting with synthetic information. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 9607–9612, 2019.
- Bijaya Adhikari, Xinfeng Xu, Naren Ramakrishnan, and B Aditya Prakash. Epideep: Exploiting embeddings for epidemic forecasting. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 577–586, 2019.
- Minseok Kim, Junhyeok Kang, Doyoung Kim, Hwanjun Song, Hyangsuk Min, Youngeun Nam, Dongmin Park, and Jae-Gil Lee. Hi-covidnet: Deep learning approach to predict inbound covid-19 patients and case study in south korea. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3466–3473, 2020.
https://dl-acm-org.ezproxy.lib.vt.edu/doi/pdf/10.1145/3394486.3412864
- Qianyue Hao, Lin Chen, Fengli Xu, and Yong Li. Understanding the urban pandemic spreading of covid-19 with real world mobility data. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3485–3492, 2020.
https://dl-acm-org.ezproxy.lib.vt.edu/doi/10.1145/3394486.3412860
- Liang Zhao, Jiangzhuo Chen, Feng Chen, Fang Jin, Wei Wang, Chang-Tien Lu, and Naren Ramakrishnan. Online flu epidemiological deep modeling on disease contact network. GeoInformatica, Vol. 24, pp. 443–475, 2020. SpringerLink
- Diya Li, Lifu Huang, Heng Ji, and Jiawei Han. Biomedical Event Extraction based on Knowledge-driven Tree-LSTM. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 1421–1430, 2019.