CS5604 Information Storage and Retrieval

Spring 2024
Instructor: Chang-Tien Lu
Meeting Time: Tuesday 5:00-7:45 PM
Classroom: NVC 114
Office: NVC 312
Office Hours: M 11AM-noon, W 4-5PM, or by appointment
TA Office Hours: NVC R317, M 10AM-11AM, W 3-4PM, or by appointment

Course Description:

This course treats a specific topic of current research interest in the area of information storage and retrieval. The main objective of this class is to study research methods and literature in information storage and retrieval systems, including analyzing, indexing, representing, storing, searching, retrieving, processing, and presenting information and documents using fully automatic systems. The information may be in the form of text, hypertext, multimedia, or hypermedia. The systems are based on various models, e.g., Boolean logic, fuzzy logic, probability theory, etc., and they are implemented using inverted files, relational thesauri, special hardware, and other approaches. Core research skills of literature analysis, innovation, evaluation of new ideas, and communication are emphasized via homework assignments and projects. Most students will want to gain a broad overview of the research topics, methodologies, major results, open problems, and potential future research directions. Specifically, this course will help students:

Topics

Textbook

Modern Information Retrieval:
The Concepts and Technology behind Search (2nd Edition)
Ricardo Baeza-Yates and Berthier Ribeiro-Neto
ACM Press Books, 2011.
ISBN-10: 0321416910
ISBN-13: 978-0321416919

Recommended reading

Introduction to Information Retrieval
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
Cambridge University Press, 2008.
ISBN-10: 0521865719
ISBN-13: 978-0521865715

Supplementary Material

A collection of papers.

Tentative Schedule:

The schedule indicates the concepts and material to be covered in each week under the column labeled "Topics".

Week Date Lecture Topics Read Due
1 1/16 Introduction, User Interfaces for Search Chap 1, 2  
2 1/23 Modeling Chap 3  
3 1/30 Modeling, Retrieval Evaluation Chap 3, 4 HW (Paper Critique)
4 2/6 Relevance Feedback and Query Expansion Chap 5  
5 2/13 Documents: Languages & Properties Chap 6, 7  
6 2/20 Queries: Languages & Properties Chap 6, 7 Project Proposal
7 2/27 Indexing and Searching + (Midterm Exam I) Chap 9  
8 3/5 (Spring Break)
9 3/12 Indexing and Searching, Web Retrieval Chap 9, 11  
10 3/19 Web Retrieval, Multimedia Information Retrieval Chap 11, 14 Project Checkpoint I
11 3/26 Multimedia Information Retrieval Chap 14  
12 4/2 Midterm Exam II    
13 4/9 Web Crawling Chap 12 Project Checkpoint II, 12-min Checkpoint Presentation
14 4/16 Parallel and Distributed IR, Guest Lecture Chap 10  
15 4/23 Final Project Presentation I   SPOT Survey
16 4/30 Final Project Presentation II   Project Report (Due 5/7, 8PM)


Examinations and Assignments:

There are three homework assignments. Homework assignments are due at the start of class. If you have an excused absence from a class, turn in the homework assignment prior to the class session. All assignments must include your name, student ID, and course name/number.

The weighting scheme used for grading is: HW Assignment: 3%, Midterm I: 25%, Midterm II: 35%, Final Project: 37% (Final Presentation: 12%, Final Report: 25%), Class Discussion and Participation: 5%. Students are responsible for all material covered in lectures. Examinations will heavily emphasize conceptual understanding of the material.

Late Submission Policy: 

Assignments must be handed in at the beginning of class on the specified due date (Tuesday of the designated week). A penalty of 30% will be deducted from your score if your assignment is submitted within the first 24 hours after the deadline. A penalty of 70% will be deducted if it is submitted 24 hours or more after the deadline. Weekend days count toward the late period. You are encouraged to type your answers.

Honor System: 

All work is to be done under the provisions of the Virginia Tech Honor System. Students may discuss the interpretation of an assignment; however, the actual solution to each problem must be one's own. The tenets of the Virginia Tech Graduate Honor Code will be strictly enforced in this course, and all assignments shall be subject to the stipulations of the Graduate Honor Code. Whenever I learn that a student has violated the honor code, I am obligated to report the violation. For more information on the Graduate Honor Code, please refer to the GHS Constitution, located online at http://graduateschool.vt.edu/academics/expectations/graduate-honor-system/ghs-constitution.html.

Disabilities:

Any student who needs special accommodations due to a disability, as recognized by the Americans with Disabilities Act, should contact the Services for Students with Disabilities (SSD) in the Dean of Students Office. Students with disabilities are responsible for self-identification. To be eligible for services, documentation of the disability from a qualified professional must be presented to SSD upon request. Academic adjustments may include, but are not limited to: priority registration, auxiliary aids, program and course adjustment, exam modifications, oral or sign language interpreters, cassette taping of text/materials, note takers/readers, or assistive technology.

If you need adaptation or accommodations because of a disability (learning disability, attention deficit disorder, psychological, physical, etc.), if you have emergency medical information to share with me, or if you need special arrangements in case the building must be evacuated, please make an appointment with me as soon as possible. If you need captioning for videos, please let me know at least two weeks before the date listed on the syllabus for reviewing them.

Helpful Comments: 

This class is very interesting and useful for anyone interested in information retrieval systems research, as well as for those working on Master's or Doctoral projects. We will explore a number of current research areas that are very important yet still fairly open. Storage and retrieval issues continue to be at the heart of information management in areas ranging from business to scientific domains.

To get the full benefit out of the class, you have to work independently and regularly. Read the textbook and papers before each meeting and bring comments for discussion. Plan to spend at least 9 hours a week on this class doing projects or reading.

Good Luck, and Welcome to CS 5604!
Chang-Tien Lu