While the impact of machine learning (ML) in commercial disciplines involving vision, speech, and text-related problems is well understood, the promise of ML has yet to be fully realized for accelerating discoveries in scientific and engineering disciplines. This is because mainstream "black-box" ML models, which rely only on data, are susceptible to learning spurious relationships that do not generalize well outside the data they are trained on. Moreover, black-box ML models do not provide any mechanistic insights into the scientific processes being studied, making them unfit for use as building blocks in scientific discovery. What black-box ML fundamentally lacks is the ability to ingest the rich background of scientific knowledge driving real-world phenomena along with the information contained in data. To address this, there is a growing research trend to deeply integrate scientific knowledge into the ML process, referred to as the paradigm of Science-guided ML (SGML). This course will introduce the foundations of SGML and provide a coherent perspective on its research themes. These themes will be illustrated using recent examples of cutting-edge research from diverse scientific disciplines. The course will also impart hands-on experience in conducting SGML research through a semester-long project. All course activities will be conducted online.
By the end of the course, students will:
This course will tentatively cover the following list of research themes in SGML:
This is a project-based course that will use a mix of learning activities. Introductory lectures will cover the basics of ML, the foundations of SGML, and research themes in SGML. The course will also include occasional guest lectures by leading researchers in SGML covering special topics of interest. The lectures will be interspersed with student-led paper discussions drawn from a reading list of relevant SGML literature. After every discussion session, students will also submit a review of the paper they read. A major component of the course will be a semester-long project in which students work, from start to finish, on an SGML research problem of their choice at the intersection of ML and science. Students will work in groups to identify and formulate a research problem; apply, explore, and design SGML algorithms to solve it; and demonstrate the real-world effectiveness of SGML procedures using rigorous evaluation setups, potentially leading to publications. All project activities, from idea generation to report preparation, will be facilitated through online peer discussions coordinated by the instructor.
This advanced topics course does not require any formal prerequisite courses and is broadly open to students with the interest and ability to learn topics in SGML. Specifically, this course is meant for two categories of graduate students: (a) students familiar with ML who are eager and willing to learn about scientific problems and pursue SGML research, and (b) students from scientific disciplines with little familiarity with ML who are eager to learn and apply SGML in an area they are familiar with. Students can assess their preparedness for the course by discussing it with the instructor and attending the first class.