Introduction to Scientific Machine Learning
ME53900
Credit Hours: 3
Learning Objectives:
- Represent uncertainty in parameters in engineering or scientific models using probability theory
- Propagate uncertainty through physical models to quantify the induced uncertainty in quantities of interest
- Solve basic supervised learning tasks, such as: regression, classification, and filtering
- Solve basic unsupervised learning tasks, such as: clustering, dimensionality reduction, and density estimation
- Create new models that encode physical information and other causal assumptions
- Calibrate arbitrary models using data
- Apply various Python coding skills
- Load and visualize data sets in Jupyter notebooks
- Visualize uncertainty in Jupyter notebooks
- Recognize basic Python software (e.g., pandas, NumPy, SciPy, scikit-learn) and advanced Python software (e.g., PyMC3, PyTorch, Pyro, TensorFlow) commonly used in data analytics
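As a small taste of the uncertainty-propagation objective above, here is a minimal Monte Carlo sketch in NumPy. The model and its parameter distribution are hypothetical placeholders, not course material:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical physical model: a quantity of interest that depends on
# an uncertain parameter x.
def quantity_of_interest(x):
    return x ** 2

# Represent the parameter uncertainty with a probability distribution.
x_samples = rng.normal(loc=1.0, scale=0.1, size=10_000)

# Propagate the uncertainty through the model by Monte Carlo sampling.
y_samples = quantity_of_interest(x_samples)

# Summarize the induced uncertainty in the quantity of interest.
print(f"mean = {y_samples.mean():.3f}, std = {y_samples.std():.3f}")
```

The same pattern (sample the inputs, push each sample through the model, summarize the outputs) applies to any forward model, which is why the course treats Monte Carlo sampling early on.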
Description:
This course provides an introduction to data science for individuals with no prior knowledge of data science or machine learning. The course starts with an extensive review of probability theory as the language of uncertainty, discusses Monte Carlo sampling for uncertainty propagation, and covers the basics of supervised learning (Bayesian generalized linear regression, logistic regression, Gaussian processes, deep neural networks, convolutional neural networks), unsupervised learning (k-means clustering, principal component analysis, Gaussian mixtures), and state space models (Kalman filters). The course also reviews the state of the art in physics-informed deep learning and ends with a discussion of automated Bayesian inference using probabilistic programming (Markov chain Monte Carlo, sequential Monte Carlo, and variational inference). Throughout the course, the instructor follows a probabilistic perspective that highlights the first principles behind the presented methods, with the ultimate goal of teaching students how to create and fit their own models.
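To illustrate the probabilistic perspective described above, here is a minimal sketch of Bayesian linear regression with a conjugate Gaussian prior, one of the first supervised-learning models the course covers. The data, prior precision, and noise precision below are illustrative choices, not values from the course:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from a hypothetical linear model y = 1 + 2x + noise.
x = rng.uniform(-1, 1, 50)
X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 50)

alpha = 1.0     # prior precision on the weights
beta = 100.0    # noise precision (1 / noise variance)

# Standard conjugate Gaussian posterior over the weights:
#   S_N = (alpha * I + beta * X^T X)^{-1},   m_N = beta * S_N X^T y
S_N = np.linalg.inv(alpha * np.eye(2) + beta * X.T @ X)
m_N = beta * S_N @ X.T @ y

print("posterior mean weights:", m_N)   # should land near [1, 2]
```

The point of the probabilistic treatment is that `S_N` quantifies how uncertain the fitted weights remain, rather than reporting only a point estimate.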
Topics Covered:
- Probability theory as the language of uncertainty
- Monte Carlo sampling for uncertainty propagation
- Supervised learning: Bayesian generalized linear regression, logistic regression, Gaussian processes, deep neural networks, convolutional neural networks
- Unsupervised learning: k-means clustering, principal component analysis, Gaussian mixtures
- State space models: Kalman filters
- Physics-informed deep learning
- Automated Bayesian inference using probabilistic programming: Markov chain Monte Carlo, sequential Monte Carlo, and variational inference
Prerequisites:
- Working knowledge of multivariate calculus and basic linear algebra
- Basic Python knowledge
- Knowledge of probability and numerical methods would be helpful, but not required
Applied / Theory:
50 / 50
Web Address:
https://purdue.brightspace.com
Homework:
There will be seven (7) homework assignments. The assignments will be both theoretical (e.g., prove this, derive that) and computational (e.g., use this data to fit that model, create and fit a model for this situation). Each assignment will be in the form of a Jupyter notebook with empty space reserved for your writing or coding. If you wish, you can do the writing by hand (instead of the LaTeX required by Jupyter notebooks), scan it, and submit a single PDF. Submissions should be made through Gradescope.
Exams:
One midterm exam.
Computer Requirements:
Jupyter Notebook
Access to the Jupyter Notebooks Repository
As stated earlier, the recommended method for using the Jupyter notebooks of this class is Google Colab. The links to all the activities will take you directly to a Google Colab copy of the Jupyter notebook. If you want to use any alternative method (e.g., your personal computer, Purdue's Jupyter Hub, or anything else), you will need access to the Jupyter notebook repository for the class, which is hosted on GitHub.