While the impact of machine learning (ML) in commercial disciplines involving vision, speech, and text-related problems is well understood, the promise of ML is yet to be fully realized for accelerating discoveries in scientific and engineering disciplines. This is because mainstream “black-box” ML models, which rely only on data, are susceptible to learning spurious relationships that do not generalize well outside the data they are trained on. Moreover, black-box ML models do not provide any mechanistic insights into the scientific processes being studied, making them unfit to be used as building blocks in scientific discovery. What black-box ML fundamentally lacks is the ability to ingest the rich background of scientific knowledge driving real-world phenomena along with the information contained in data. To address this, there is a growing research trend to deeply integrate scientific knowledge in the ML process, referred to as the paradigm of scientific Knowledge-guided ML (KGML). This course will introduce the foundations of KGML and provide a coherent perspective on its research themes. These themes will be illustrated using recent examples of cutting-edge research from diverse scientific disciplines. The course will also impart hands-on experience in conducting KGML research through a semester-long project.
By the end of the course, students will:
This course will tentatively cover the following list of research themes in KGML:
This is a project-based course that will use a mix of learning activities. Introductory lectures will cover the basics of ML, foundations of KGML, and research themes in KGML. The course will also include occasional guest lectures by leading researchers in KGML covering special topics of interest. The lectures will be interspersed with student-led discussions of papers drawn from a reading list of relevant KGML literature. After every discussion session, students will also submit a review of the paper they read. A major component of the course will be a semester-long project in which students work, from start to finish, on a KGML research problem of their interest at the intersection of ML and science. Students will work in groups to identify and formulate a research problem; apply, explore, and design KGML algorithms to solve it; and demonstrate the real-world effectiveness of KGML procedures using rigorous evaluation setups, potentially leading to publications. All project activities, from idea generation to report preparation, will be facilitated through peer discussions coordinated by the instructor.
This advanced topics course does not require any formal prerequisite courses and is broadly open to students with the interest and ability to learn topics in KGML. Specifically, this course is meant for two categories of graduate students: (a) students familiar with ML who are eager and willing to learn about scientific problems and pursue KGML research, and (b) students from scientific disciplines with little familiarity with ML who are eager to learn and apply KGML in an area they are familiar with. Students can assess their preparedness for the course by discussing it with the instructor and attending the first class.