Course number: CS 6604
Instructor: I. Lourentzou
Title: Data Challenges in Machine Learning

Machine learning (ML) has revolutionized a wide variety of domains, from computer vision and natural language processing to robotics, speech recognition, and beyond. In general, ML models excel at recognizing subtle patterns when trained on large volumes of high-quality data. This reliance on well-defined, accessible, and consumption-ready data poses broad challenges to the successful application of machine learning. In this course we will explore recent advances that address such data challenges, including data annotation and low-resource scenarios, outliers, class imbalance, missing attributes or values, and robustness and generalization.

The course will follow a seminar format, with readings selected from recent literature published at conferences such as ICML, NeurIPS, AAAI, IJCAI, and CVPR. Students will also work on a group project over the course of the semester.

Prerequisites: Students should have experience with machine learning and data analytics, and preferably with deep learning. Familiarity with linear algebra, statistics, and probability is necessary, as is experience designing and implementing machine learning models (ideally with a framework well suited to rapid ML prototyping, e.g., PyTorch, TensorFlow, or Keras). Most importantly, students are expected to be able to extract key concepts and ideas from ML conference papers. Additionally, to enable collaboration, students will be assigned different "roles" each week.

Readings will cover a range of topics, including, but not limited to: