CS 5864: Learning-based Computer Vision
Fall 2023
Department of Computer Science, Virginia Tech
Location: 211 Randolph Hall, 460 Old Turner St. Blacksburg, VA.
Meeting time: Monday and Wednesday, 2:30 PM - 3:45 PM
Instructor: Chris Thomas
E-mail: The primary communication points for course-related matters should be Canvas discussion boards and the course TA. If you need to reach out to me, please message me through the messaging system on Canvas rather than sending e-mail.
Office: 378 Data and Decision Sciences Building
Office hours: 4:00 PM - 5:00 PM Wednesdays. My Zoom is linked here.
TA: Alvi Ishmam
TA’s office hours: 5:00 PM - 6:00 PM Mondays. Zoom is linked here.
Exam section: 14M. December 9, 2023, 10:05 AM - 12:05 PM.
Course overview
Catalog description: Comprehensive introduction to modern computer vision. Fundamental concepts in computer vision and pattern recognition such as filtering, alignment, and matching. Survey of computer vision tasks, models, and learning techniques related to vision architectures, visual recognition methods, multimodal and generative models, and select advanced topics.
Prerequisites: CS 5805 (Machine Learning) is the official prerequisite for this course. If you are enrolling without this prerequisite, you must have a background in machine learning and be familiar with tasks such as preprocessing, classification, and clustering, in addition to fundamental concepts in probability theory, linear algebra, calculus, and neural networks. Some prior Python programming experience is also highly desirable because all homework will be done in Python. Concerned students should speak to the instructor as soon as possible to ensure they have sufficient background to successfully complete this course.
Format: This graduate-level course provides a comprehensive introduction to modern computer vision. The first part of the course introduces fundamental concepts in computer vision and pattern recognition, such as filtering, alignment, and matching. The remainder of the course provides an in-depth survey of computer vision tasks, models, and learning techniques. Topics include vision architectures, visual recognition methods, multimodal and generative models, and select advanced topics as determined by the instructor. The course format will include lectures, exams, homework assignments, and a group project. By the end of the course, students are expected to be able to understand, apply, and extend state-of-the-art computer vision methods.
Textbooks: No textbook is required, as no single text covers all of the necessary information for the course. Recommended texts include:
- Computer Vision: Algorithms and Applications (2nd ed.) by Richard Szeliski (available for free online)
- Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (available for free online)
- Computer Vision: A Modern Approach (2nd ed.) by David Forsyth and Jean Ponce
Canvas: All slides, course schedule, grades, and other materials for this course will be released and updated on Canvas. In general, before reaching out to the TA or instructor you should post a discussion on Canvas. This way all students in the class can benefit from the discussion. Students are strongly encouraged to answer questions from others and engage on Canvas. However, you must not reveal key components or steps of solutions or otherwise make assignments trivial for others in the course. You may not share solution code or parameter values with others, even if you are seeking debugging assistance, as this may demonstrate a potential solution to other students. Helping others with general issues unrelated to assignment specifics (e.g. how matrix operations work in PyTorch), to understand how to use the server, or to understand an error they are receiving is permissible as long as this does not reveal core parts of the solution. In general, when you need help with specific assignment-related code and cannot ask without revealing key aspects of your solution, you should contact the TA or utilize the TA’s office hours. You are strongly encouraged to sign up for Canvas e-mail and/or push notifications so that you are notified of new discussions and course updates posted to Canvas.
Course learning objectives
- Analyze fundamental concepts in low-level image processing and how these concepts manifest in learning-based computer vision approaches
- Apply approaches for addressing classic vision problems such as object detection
- Apply machine learning techniques in the context of computer vision
- Understand the internal components of state-of-the-art computer vision architectures and critique trade-offs and design choices
- Utilize techniques for learning powerful vision representations in the absence of strong supervision
- Adapt learning-based computer vision methods to reason jointly on visual data and non-visual modalities
- Generate visual data using state-of-the-art generative architectures
- Critically evaluate methods’ strengths and weaknesses and propose new solutions to address them
- Synthesize current research trends and emerging topics in the field in order to discover research opportunities of interest
Topical outline
- Classical Topics and Foundations
  - Field overview, growth, applications, historical development, common vision tasks, and current active research areas
  - Features and filters
  - Grouping and matching
  - Image formation and geometric vision
- Computer Vision Today
  - Basic concept review
  - Vision architectures
  - Visual representation learning
  - Object detection and semantic segmentation
  - Multimodal vision
  - Generative vision
- Advanced Topics (depending on time, interest, and instructor choice)
Requirements
Your final grade in the course will be based on your homework assignments, final project, and exams, which will be weighted as follows:
Component | Weight |
---|---|
Homework assignments | 30% |
Final project | 25% |
Midterm exam | 20% |
Final exam | 25% |
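As an illustration of how the weights above combine, here is a minimal sketch, not the official grading procedure. It assumes each component has already been reduced to a single percentage between 0 and 100 (how individual homework scores are averaged is not specified here) and it ignores any curve. The names `WEIGHTS` and `weighted_course_percentage` are hypothetical.

```python
# Component weights taken from the table above.
WEIGHTS = {
    "homework": 0.30,
    "project": 0.25,
    "midterm": 0.20,
    "final_exam": 0.25,
}


def weighted_course_percentage(scores):
    """Combine per-component percentages (0-100) into one course percentage.

    `scores` must contain exactly the keys of WEIGHTS. Any curve is ignored
    (assumption: curves are applied separately, at instructor discretion).
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must have exactly the keys: %s" % sorted(WEIGHTS))
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError("%s score must be between 0 and 100" % name)
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    example = {"homework": 92, "project": 88, "midterm": 79, "final_exam": 85}
    print(f"{weighted_course_percentage(example):.2f}")  # prints 86.65
```

With these made-up scores the contributions are 27.6 + 22.0 + 15.8 + 21.25 = 86.65.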
The grade scale for the term will be:
Percentage | 100 | 90 | 89 | 85 | 80 | 79 | 75 | 70 | 69 | 65 | 60 | <60 |
Letter | A | A- | B+ | B | B- | C+ | C | C- | D+ | D | D- | F |
Virginia Tech does not award A+ grades. Any component of the course may be curved at instructor discretion. No grades will be lowered as a result of a curve.
Participation: You are encouraged to participate regularly and meaningfully in this course. Participation could include responding to other class members' discussions on Canvas or participating in class. While you will not explicitly receive a participation grade in this course, students with a borderline final grade who engaged throughout the semester are more likely to receive a boost than students who did not.
Class policies
Exams: There will be a midterm and a final exam. The exams will be closed book and closed notes. The final exam will be given during the exam section listed above. You may not wear a smart watch or consult any other materials during the exam.
Regrades: If you have a question about a grade you received or wish to request a regrade of an assignment, you must submit a request to the TA within a week of receiving your score. The TA will not consider regrade requests after a week. In general, the TA has discretion to assign grades and partial credit which fairly reflect your performance on the assignment. Therefore, regrade requests should be reserved for exceptional cases where you believe the grade is in error or the TA may have misunderstood or overlooked aspects of your solution. Please do not file regrade requests for small judgment call decisions. The time for regrades may be shortened at the end of the semester. Regrade requests for your midterm must be made to the instructor within one week of receiving your grade and again should be reserved for exceptional cases or mistakes. If you do not submit a regrade request within the time allotted, your grade is final. The final exam grades will be posted to the web as soon as they are graded. In general, you must contact the instructor immediately if you believe there is an issue with your final exam grade so it can be looked at before final grades are uploaded.
Formatting: All reports with your homework should be submitted as typed and legible PDF files. Handwritten work will not be accepted.
Submission: All submissions in this class will be through Canvas. It is your responsibility to make sure all submissions in this class are complete. Once you submit, please download the file again and verify it opens successfully. Corrupted files will receive no credit. In the event of an outage on Canvas that affects submission, you may e-mail the TA your files as a fallback (before the deadline).
GPU Access: Your final project and some homeworks will likely require use of a GPU. Some GPU resources you can utilize include Google Colab, the Advanced Research Computing center at Virginia Tech, and the GLogin cluster run by the department. Please take note of GPU limitations when designing your final project.
Late policy
Free late days for homework assignments: Life and conference deadlines happen. To ensure you can do your best work on each assignment, each student will receive a total of five free late days that apply to all homework assignments throughout the semester. The late days are measured in days, not hours or minutes, i.e. a submission that is one minute late is still considered one day late and will use one of your late days. The five late days are to be used across all assignments, not per assignment. You do not need to request an extension and you will not receive a late penalty as long as you stay within your five-day budget across the semester.
Free late days for class project: Each group will get two late days that apply to the project deliverables. These apply only to the class project and will be subtracted from each group.
Late penalty: Once you have used up your late days, your score for an assignment or project deliverable will be multiplied by 0.5 for every day that it is late. The entire assignment must be submitted all at once, e.g. you may not submit parts of an assignment at different times to only receive a late penalty on part of the assignment.
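To make the late-day arithmetic concrete, here is a minimal sketch under stated assumptions. It is not the official grading procedure. It assumes that any started day counts as a full late day (as stated above), that remaining free late days are applied automatically before any penalty, and that the 0.5 multiplier compounds for each late day not covered by a free day. The function names `days_late` and `apply_late_policy` are hypothetical.

```python
import math
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60


def days_late(deadline, submitted):
    """Whole late days, rounding any partial day up (one minute late = one day)."""
    seconds = (submitted - deadline).total_seconds()
    return max(0, math.ceil(seconds / SECONDS_PER_DAY))


def apply_late_policy(raw_score, late_days, free_days_left):
    """Return (penalized score, free late days remaining).

    Assumptions: free late days are consumed first, and the score is
    multiplied by 0.5 once for each late day beyond the free-day budget.
    """
    if late_days < 0 or free_days_left < 0:
        raise ValueError("late_days and free_days_left must be non-negative")
    free_used = min(late_days, free_days_left)
    penalized_days = late_days - free_used
    return raw_score * 0.5 ** penalized_days, free_days_left - free_used


if __name__ == "__main__":
    deadline = datetime(2023, 10, 13, 23, 59)
    submitted = datetime(2023, 10, 14, 0, 0)  # one minute past the deadline
    print(days_late(deadline, submitted))  # prints 1
    # Two days late with one free day left: one penalized day.
    print(apply_late_policy(90.0, 2, 1))  # prints (45.0, 0)
```

Under these assumptions, a submission three days late with no free days remaining would keep 0.5 × 0.5 × 0.5 = 12.5% of its score. If the penalty is instead meant to apply differently, only `apply_late_policy` would need to change.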
Extension requests: To ensure fairness, extensions will not be considered absent documented extraordinary circumstances. You should also inform the instructor before the deadline of the assignment or exam. You will also likely be required to submit documentation to the Dean of Students Office for verification. Please note that the late days are being provided to help you deal with routine illnesses, injuries, paper deadlines, interviews, etc. These types of issues do not qualify for an additional extension.
Incomplete requests: Should you be unable to complete the requirements of this course during the semester because of extraordinary circumstances, you may request an incomplete through the last day of class. You will also likely be required to submit documentation to the Dean of Students Office for verification.
Academic accommodations
Virginia Tech welcomes students with disabilities into the University’s educational programs. The University promotes efforts to provide equal access and a culture of inclusion without altering the essential elements of coursework. If you anticipate or experience academic barriers that may be due to disability, including but not limited to ADHD, chronic or temporary medical conditions, deaf or hard of hearing, learning disability, mental health, or vision impairment, please contact the Services for Students with Disabilities (SSD) office (540-231-3788, ssd@vt.edu, or visit https://ssd.vt.edu). If you have an SSD accommodation letter, please e-mail me as early in the semester as possible to deliver your letter and discuss your accommodations. You must give me reasonable notice to implement your accommodations, which can be logistically difficult later in the semester (e.g. scheduling special exam proctors or rooms), thus please notify me as soon as possible if you have an accommodation letter.
Academic integrity
The tenets of the Virginia Tech Graduate Honor Code will be strictly enforced in this course, and all assignments shall be subject to the stipulations of the Graduate Honor Code. For more information on the Graduate Honor Code, please refer to the GHS Constitution. You are highly encouraged to discuss homework assignments on Canvas with one another on the discussion forums. However, your coding and writing of your reports are meant to be your own work and the product of your own thought processes. Key aspects of the solution must be your own. You are free to leverage external resources, e.g. ChatGPT or the web, as long as what you are using doesn’t shape or provide key aspects of your solution. For example, if you were asked to write code to convolve a filter with an image without using a library, it would be inappropriate to ask ChatGPT for the solution or to look at how forums or reference libraries do it. However, using resources to understand convolution (e.g. animations) or an error code with PyTorch or NumPy is acceptable as these do not take away from you ultimately implementing the solution on your own. In general, if something is solving major parts of a problem for you or revealing steps which then shape key parts of your solution, this is not permissible. In all cases, any external sources used must be explicitly acknowledged in the report. For ChatGPT or similar models, this means submitting the entire chat or interaction with your report.
For exams, it goes without saying that students are not permitted to collaborate in any way during exams. Students believed to have committed an academic integrity violation such as copying or turning in work that is not entirely their own will be reported to the University and may receive a disciplinary penalty, which will likely include automatically failing the entire course. Because this course will be a recurring course in the CS department, we ask that you do not post your solution code for homeworks online or distribute it to others. If you have any questions as to whether something runs afoul of this policy, please contact the TA before using the resource or submitting the assignment.
Emergencies and medical conditions
If you have an emergency or medical condition, you must inform the instructor before the deadline of the assignment or exam. You may be required to submit documentation of the emergency or condition to the Dean of Students Office.
Final project
This course will conclude with a student-driven group project, with a report due at the end of the course. In order to make meaningful progress on a project, groups must consist of 3-4 students. Note that larger groups will have higher expectations. Since a substantial portion of your final grade depends on the project, each student is expected to make a significant contribution. All projects must involve implementation of a computer vision algorithm along with a thorough evaluation. The goal is for your final project report to resemble a short conference paper. The topic of your final project is open-ended and groups are free to choose their own topic. However, final projects should fall into at least one of the following broad categories:
- Extend one of the methods we covered in class in a novel way, complete with a thorough experimental evaluation
- Propose a novel method or approach for solving a vision problem we discussed in class or that is already known in the literature and thoroughly evaluate it
- Propose a completely new vision problem and explain why it is significant and needs solving, implement an approach to solve the problem, and evaluate the approach
In summary, your final project can address any computer vision problem, either existing or new, as long as you propose a new method or significant extension or modification of existing methods. Applications of existing methods or techniques to new datasets or problems without at least some technical novelty are not sufficient for the final project. All projects must be thoroughly experimentally evaluated. This may involve benchmarking existing relevant work in the case of a new problem or applying your method on standard benchmarks and computing standard metrics. Projects that overlap in some way with your existing research are OK, but design, conception, implementation, evaluation, and delivery of the project should be the work of the students, not other faculty members, and should be specific to this course. However, your project can build upon or extend your prior research efforts without an issue.
Each student in the group should document everything they contributed to the project and how work was divided among group members. Students will be asked to provide a review of the contributions of each other student in their group as a form of peer review at the end of the course. Please take this seriously and recognize that free-riders will be significantly penalized.
Project proposal
The project proposal should be two pages long (excluding references) and must use the CVPR LaTeX template. Your project proposal should include the following:
- Project title
- Group members
- Group logistics (how will you communicate, how will the group regularly meet to discuss the project, etc.)
- A clear problem statement which describes the goal of the project.
- A thorough literature review. Make sure you thoroughly search the literature before you start writing. You might find that your idea has already been taken. The literature review should resemble that in a CVPR conference paper and should cite existing work. It should clearly show how the proposed project does something the prior work you cite does not.
- A detailed description of the proposed approach. The authors should describe which datasets they plan to use, new loss functions, changes to existing models, etc. Collecting your own dataset for this project is not recommended. Try to use existing datasets.
- Identify the computational resources that you plan to use. Do not propose a project that you are not confident you will have sufficient resources to execute.
- The proposed experimental evaluation protocol and expected results. You should describe what experiments you plan to run and how you will run them. You should describe which datasets you plan to evaluate on, any existing code bases you will use, and what needs to be implemented by the group. You should also describe what your group is aiming for with the project (i.e. what do you consider a success). You should explain what you hope each experiment will show and discuss any uncertainty you have about the project.
Project presentation video
Each group will make a five-minute video presenting their final project. For an example, please see this video of a 10-minute research project presentation. During the final class session(s), we will play the videos for each group in class and the group will then answer questions from the class. Your video, at a minimum, should:
- Clearly define what problem your project is addressing.
- Provide motivation for why the problem is important, interesting, and/or challenging.
- Address prior related work that has attempted to address this problem (or a related problem). Obviously there is not much time, so a brief summary of a few key pieces of related work is sufficient so we can differentiate your work from these.
- Describe, in detail, your proposed approach for the problem. For example, this may involve describing details of the model design and key loss functions used to train it. You should understand all equations that you present in your video in case there are questions.
- Explain how you evaluated your method. You should fully describe the experimental set-up and present any quantitative and qualitative results. If there are any unusual metrics that students may not know, you should explain what those are and how they are computed.
- Briefly discuss key strengths and weaknesses of your project.
- Briefly identify any ideas for future work and any open research questions.
Try to make use of illustrations and animations in your video (e.g. calling out specific results), as opposed to text-heavy slides. Most of the information content should come from the voice-over, rather than reading the slides. You are highly encouraged to search online for relevant materials which you may use in your presentation. However, make sure to clearly cite all your sources.
Grading policy for presentation video: Your grade for your video will be based on: 1) clarity and presentation quality; 2) correctness of your statements made during the presentation; 3) whether you addressed all the guidelines above; 4) the thought process and effort put into your project as demonstrated in your video; and 5) how well you answered questions in class about your project.
Final report
The project final report should resemble a mini CVPR conference paper and should be four pages (excluding references). This means having a method figure, table(s) with results, some qualitative/quantitative results, etc. Your final report must include Introduction, Related work, Approach, and Results sections. The related work section can be significantly abbreviated given it appeared in your proposal. The final report should be self-contained and described in sufficient detail that someone else in this class could implement your approach given the chance. Please recognize that given the page limitations you will need to carefully manage what stays in and what is cut. In general, try to state what you did clearly and concisely. If you implemented a new model or architecture, include a figure and describe it. If you used or created a new loss, include it and describe the variables in it sufficiently. For figures, you may need to design them so they can be shrunk to use less space. Please also describe precisely what you did that is new in your approach (you can sometimes handle this in the related work section as well).
Final project grading criteria
Your final project grade will primarily be based on the thought process and effort put into your project as demonstrated through your video presentation and final report. While you should work to get the best results possible, it is understood that given the limited time available your method might not outperform other state-of-the-art approaches. The best projects are those that have new, clever ideas, not necessarily those that perform best on a given benchmark. Your project report (and presentation) will be evaluated on the following factors: 1) how well you related it to prior research; 2) the clarity and format of the presentation and report; and 3) completeness. For completeness, you will be evaluated on whether you complied with all the requirements of the project (e.g. does your report have all the required sections) and the degree to which you put in the thought and effort required to deliver an interesting and compelling class project. The paper will also be evaluated on the degree to which all experimental evaluations necessary for evaluating it have been performed (e.g. main results, ablations, qualitative results).
Project deliverables
All project deliverables will be uploaded to Canvas.
- Project proposal (10% of final project grade) - due October 13, 11:59 PM
- Project presentation video (40% of final project grade) - due November 30, 11:59 PM
- Project final report (50% of final project grade) - due December 6, 11:59 PM
Acknowledgements
This course was inspired by and/or uses resources from the following courses:
- Computer Vision by David Fouhey, University of Michigan, Winter 2023
- Computer Vision by Svetlana Lazebnik, University of Illinois at Urbana-Champaign, Fall 2022
- Computer Vision by Andrew Owens, University of Michigan, Fall 2022
- Advanced Topics in Computer Vision by Andrew Owens, University of Michigan, Winter 2022
- Computer Vision by Adriana Kovashka, University of Pittsburgh, Spring 2021
- Advanced Computer Vision by Carl Vondrick, Columbia University, Spring 2019
- Computer Vision by Jia-Bin Huang, Virginia Tech, Fall 2018
- Advanced Computer Vision by Jia-Bin Huang, Virginia Tech, Spring 2017
- Visual Recognition by Adriana Kovashka, University of Pittsburgh, Spring 2015
- Advanced Topics in Computer Vision by Devi Parikh, Virginia Tech, Spring 2014