Course Title: CS 6804: Science-guided Machine Learning (3 credits, CRN: 83213)
Instructor: Anuj Karpatne
Class Type: Online Course
Class Timings: Mon, Wed: 4:00 pm - 5:15 pm Eastern; Zoom URL: https://virginiatech.zoom.us/j/99539879818
Instructor Office Hours: Mon, Wed: 5:30 pm - 6:30 pm Eastern; Zoom URL: https://virginiatech.zoom.us/j/95075800579
Additional Links used in First Class (08/24):
Additional Links used in Second Class (08/26):

Course Overview

While the impact of machine learning (ML) in commercial disciplines involving vision, speech, and text related problems is well understood, the promise of ML is yet to be fully realized for accelerating discoveries in scientific and engineering disciplines. This is because mainstream "black-box" ML models, which rely only on data, are susceptible to learning spurious relationships that do not generalize well outside the data they are trained upon. Moreover, black-box ML models do not provide any mechanistic insights about the scientific processes being studied, thus making them unfit to be used as building blocks in scientific discovery. What black-box ML fundamentally lacks is the ability to ingest the rich background of scientific knowledge driving real-world phenomena along with the information contained in data. To address this, there is a growing research trend to deeply integrate scientific knowledge in the ML process, referred to as the paradigm of Science-guided ML (SGML). This course will introduce the foundations of SGML and provide a coherent perspective of research themes in SGML. These research themes will be illustrated using recent examples of cutting-edge research from diverse scientific disciplines. The course will also impart hands-on experience in conducting SGML research through a semester-long project. All course activities will be conducted online.

Learning Aims

By the end of the course, students will:

  • Be well versed in the foundations and theme areas of SGML, as well as recent developments in every theme area
  • Be able to compare and contrast different SGML research themes and identify their strengths, limitations, and opportunities for future research
  • Be equipped to cross-pollinate SGML ideas from one application domain to another
  • Develop essential research skills including reading, discussing, and critiquing research papers, identifying research gaps and brainstorming solutions, and communicating research ideas through technical writing and oral presentations
  • Gain practical experience in pursuing SGML research through a course project

Course Topics

This course will tentatively cover the following list of research themes in SGML:

  1. Science-guided Learning: Techniques for modifying the learning algorithms of ML, e.g., with the help of priors, constraints, and loss functions to ensure that the learned ML solutions are scientifically consistent.
  2. Science-guided Design: Techniques for hard-coding (or "baking in") scientific knowledge in the design of ML models, e.g., using neural network architectures that capture the physics of the scientific process.
  3. Science-guided Refinement: Techniques for refining the outputs of ML models using scientific knowledge, e.g., by pruning or post-processing.
  4. Discovery of Scientific Laws from Data: Techniques for automatically discovering the governing equations of a scientific problem from simulated or real-world data using ML.
  5. Inferring parameters in science-based models: Techniques for inferring parameters or state-variables in science-based forward models using ML-based inversion techniques.
  6. Hybrid-science-ML Modeling: Techniques for integrating ML models with science-based models to correct systematic biases in science-based models or to replace sub-components that are currently lacking.
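
As a rough illustration of the first theme (Science-guided Learning), the sketch below adds a penalty term to an ordinary mean-squared-error loss so that scientifically inconsistent predictions cost more. It is not taken from the course materials: the function name science_guided_loss, the weight lam, and the example constraint (predictions ordered along some physical axis must be non-decreasing) are all assumptions chosen for illustration.

```python
import numpy as np


def science_guided_loss(y_pred, y_true, lam=1.0):
    """Data-fit loss plus a penalty for violating a known constraint.

    Hypothetical constraint (an assumption for this sketch): predictions,
    ordered along some physical axis, must be non-decreasing.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)

    # Standard supervised term: mean squared error against observations.
    data_loss = np.mean((y_pred - y_true) ** 2)

    # Science term: size of each drop between consecutive predictions,
    # zero wherever the ordering constraint is satisfied.
    violations = np.maximum(0.0, y_pred[:-1] - y_pred[1:])
    science_loss = np.mean(violations ** 2)

    # lam trades off fit to data against scientific consistency.
    return data_loss + lam * science_loss


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]
    print(science_guided_loss([1.0, 2.0, 3.0], observed))  # consistent predictions
    print(science_guided_loss([1.0, 3.0, 2.0], observed))  # ordering violated
```

Setting lam to zero recovers the purely data-driven loss, so the same function can be used to compare a black-box baseline against its science-guided counterpart.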

Learning Activities

This is a project-based course that will use a mix of learning activities. Introductory lectures will cover the basics of ML, foundations of SGML, and research themes in SGML. The course will also include occasional guest lectures by leading researchers in SGML covering special topics of interest. The lectures will be interspersed with paper discussions led by students from a reading list of relevant literature in SGML. Students will also submit reviews of the papers they read after every discussion session. A major component of the course will be a semester-long project in which students will work, from start to finish, on an SGML research problem of their interest at the intersection of ML and science. Students will work in groups to identify and formulate a research problem; apply, explore, and design SGML algorithms to solve the problem; and demonstrate the real-world effectiveness of SGML procedures using rigorous evaluation setups, potentially leading to publications. All project activities, from idea generation to report preparation, will be facilitated through online peer discussions coordinated by the instructor.

Background Required

This advanced topics course does not require any formal prerequisite courses and is broadly open to students with the interest and ability to learn topics in SGML. Specifically, this course is meant for two categories of graduate students: (a) students familiar with ML who are eager to learn about scientific problems and pursue SGML research, and (b) students from scientific disciplines with little familiarity with ML who are eager to learn and apply SGML in an area they are familiar with. Students can assess their preparedness for the course by discussing it with the instructor and attending the first class.

Detailed Course Syllabus:

Course Lecture Slides and Videos:

  1. Class on 08/24/2020
  2. Class on 08/26/2020