Data Analytics for Scientists and Engineers


Credit Hours:


Learning Objective:

This course provides an introduction to data analytics for individuals with no prior knowledge of data science or machine learning.


The course starts with an extensive review of probability theory as the language of uncertainty and discusses Monte Carlo sampling for uncertainty propagation. It then covers the basics of supervised learning (Bayesian generalized linear regression, logistic regression, Gaussian processes, deep neural networks, convolutional neural networks), unsupervised learning (k-means clustering, principal component analysis, Gaussian mixtures), and state-space models (Kalman filters). The course also reviews the state of the art in physics-informed deep learning and ends with a discussion of automated Bayesian inference using probabilistic programming (Markov chain Monte Carlo, sequential Monte Carlo, and variational inference). Throughout the course, the instructor follows a probabilistic perspective that highlights the first principles behind the presented methods, with the ultimate goal of teaching students how to create and fit their own models.
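
As a rough illustration of what "Monte Carlo sampling for uncertainty propagation" means, the sketch below draws samples of uncertain model inputs, pushes each sample through a model, and summarizes the spread of the outputs. It uses only the Python standard library. The spring-mass model `natural_frequency`, the helper `propagate_uncertainty`, and the input distributions are hypothetical choices made for this example; they are not taken from the course materials.

```python
import math
import random
import statistics


def natural_frequency(k: float, m: float) -> float:
    """Hypothetical model: angular natural frequency sqrt(k/m) of a spring-mass system."""
    return math.sqrt(k / m)


def propagate_uncertainty(n_samples: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo propagation: sample the uncertain inputs, evaluate the model
    on every sample, and return the sample mean and standard deviation of the output."""
    rng = random.Random(seed)  # seeded generator so results are reproducible
    outputs = []
    for _ in range(n_samples):
        # Assumed (illustrative) input uncertainties, not values from the course.
        k = rng.gauss(100.0, 5.0)  # stiffness [N/m]
        m = rng.gauss(1.0, 0.05)   # mass [kg]
        outputs.append(natural_frequency(k, m))
    return statistics.mean(outputs), statistics.stdev(outputs)


if __name__ == "__main__":
    mean, std = propagate_uncertainty()
    print(f"omega_n: mean = {mean:.3f} rad/s, std = {std:.3f} rad/s")
```

Increasing `n_samples` shrinks the sampling error of these estimates roughly as one over the square root of the number of samples, which is the basic trade-off behind the method.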

Topics Covered:

Introduction to predictive modeling. Review of probability theory. Uncertainty propagation using Monte Carlo. Principles of Bayesian inference. Supervised learning: linear and logistic regression. Unsupervised learning: clustering, density estimation, and dimensionality reduction. State-space models: Kalman filters. Gaussian process regression. Neural networks: regression, classification, physics-informed machine learning. Advanced methods for characterizing posteriors: Markov chain Monte Carlo, variational inference.

Prerequisites:

  • Working knowledge of multivariate calculus and basic linear algebra
  • Basic Python knowledge
  • Knowledge of probability and numerical methods for engineering is helpful but not required

Applied / Theory:

50 / 50

Homework:

There will be ten (10) homework assignments. The assignments will be both theoretical (e.g., prove this, derive that) and computational (e.g., use this data to fit that model, or create and fit a model for this situation). Each assignment is a Jupyter notebook with empty space reserved for your writing or coding. If you prefer, you can do the written part by hand instead of typesetting it in LaTeX in the notebook's Markdown cells, scan it, and submit everything as a single PDF. Submissions are made through Gradescope.

Exams:

There are no exams.


Jupyter Notebook (details in syllabus)

Computer Requirements:

ProEd Minimum Requirements:


Tuition & Fees: