Introduction to Probability for Data Science

An undergraduate to graduate textbook on probability in the context of modern data science.
Stanley H. Chan, 2021


I am fortunate to have had the opportunity to witness and contribute to teaching several of the most important data science courses in Purdue ECE and the College of Engineering. This book is a collection of materials that I find fundamental, interesting, and practical. It is written based on three courses I have taught or created:

  • ECE 20875 (was ECE 295) Introduction to Data Science with Python (Sophomore)

  • ECE 302 Probabilistic Methods for Electrical and Computer Engineering (Junior-Senior)

  • ECE 595ML Machine Learning (Graduate)

While writing the book, I did a fairly exhaustive search of the available textbooks on this subject. It was quite surprising to see that while there is a tsunami of data science books on the internet, many of them are written for programmers. I am not overlooking the importance of these books, but in my opinion college students need more solid mathematical training so that they can pursue more advanced careers. At the other end of the spectrum, classical probability textbooks are everywhere. While these books offer great detail, many of them do not have a soul. Why should we learn probability? How can flipping a coin be useful in modern data science? Can we help undergraduate students appreciate measure theory? Why does the Gaussian have a bell shape? Where does the Poisson distribution come from? How do we fit data with a line? How do we tell whether a change is statistically significant?

I hope that the book will become a valuable asset to our community. The book is not yet finished, and I am actively revising it. If you have any suggestions, I would appreciate it if you sent me an email to let me know.

Stanley Chan, Jan 2021.



  • Chapter 6 Sample Statistics (Update: 01-18-2021)

    • Lecture 6.1 Moment generating functions (Video) (Slide)

    • Lecture 6.2 Characteristic functions (Video) (Slide)

    • Lecture 6.3 Union bound, Cauchy-Schwarz inequality, Jensen's inequality

    • Lecture 6.4 Markov inequality, Chebyshev inequality

    • Lecture 6.5 Chernoff bound, Hoeffding inequality

    • Lecture 6.6 Weak law of large numbers and convergence in probability

    • Lecture 6.7 Strong law of large numbers and almost sure convergence

    • Lecture 6.8 Central limit theorem and convergence in distribution
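
    The concentration inequalities and limit theorems in this chapter are easy to preview with a quick simulation. As an illustrative sketch (not taken from the book, with made-up parameters), the code below compares the empirical deviation probability of a sample mean against the Chebyshev bound P(|X̄ − μ| ≥ ε) ≤ σ²/(nε²):

    ```python
    import random

    random.seed(0)

    n = 500          # samples per mean
    trials = 2000    # number of sample means simulated
    eps = 0.05       # deviation threshold

    # Uniform(0,1): mean 1/2, variance 1/12
    mu, var = 0.5, 1.0 / 12

    deviations = 0
    for _ in range(trials):
        xbar = sum(random.random() for _ in range(n)) / n
        if abs(xbar - mu) >= eps:
            deviations += 1

    empirical = deviations / trials
    chebyshev = var / (n * eps ** 2)   # Chebyshev upper bound on the probability

    print(empirical, "<=", chebyshev)
    ```

    The empirical frequency comes out far below the Chebyshev bound, which is the point of Lectures 6.4–6.5: Chebyshev is loose, and Chernoff/Hoeffding tighten it exponentially.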

  • Chapter 7 Regression

    • Lecture 7.1 Principles of regression

    • Lecture 7.2 Robust regression

    • Lecture 7.3 Overfitting

    • Lecture 7.4 Training and testing error

    • Lecture 7.5 Bias and variance

    • Lecture 7.6 Regression: Ridge

    • Lecture 7.7 Regression: LASSO
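
    The regression chapter revolves around the least-squares solution and its regularized variants. As a minimal sketch (synthetic data and a made-up penalty value, not from the book), the code below fits a line by ordinary least squares, w = (XᵀX)⁻¹Xᵀy, and by ridge regression, w = (XᵀX + λI)⁻¹Xᵀy:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data: y = 2x + 1 + noise (illustrative example)
    x = np.linspace(0, 1, 50)
    y = 2 * x + 1 + 0.1 * rng.standard_normal(50)

    X = np.column_stack([x, np.ones_like(x)])  # design matrix [x, 1]
    lam = 0.1                                  # ridge penalty (assumed value)

    # Ordinary least squares: solve (X^T X) w = X^T y
    w_ols = np.linalg.solve(X.T @ X, X.T @ y)

    # Ridge regression: solve (X^T X + lam I) w = X^T y
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

    print("OLS:  ", w_ols)    # close to [2, 1]
    print("Ridge:", w_ridge)  # shrunk toward zero relative to OLS
    ```

    The ridge solution always has a norm no larger than the OLS solution, which is the shrinkage behavior discussed in Lecture 7.6; LASSO (Lecture 7.7) replaces the squared-norm penalty with an ℓ₁ penalty and has no such closed form.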

  • Chapter 8 Estimation (Update: 01-07-2021)

    • Lecture 8.1 Parameter estimation settings

    • Lecture 8.2 Maximum-likelihood estimation

    • Lecture 8.3 ML estimation vs linear regression

    • Lecture 8.4 Unbiased estimators

    • Lecture 8.5 Consistent estimators

    • Lecture 8.6 Maximum-a-posteriori estimation

    • Lecture 8.7 MAP vs ML

    • Lecture 8.8 Conjugate priors

    • Lecture 8.9 Mean square error (MSE)

    • Lecture 8.10 Minimum mean square error (MMSE) estimation

    • Lecture 8.11 MMSE vs MAP vs ML
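
    A concrete instance of Lecture 8.2: for i.i.d. Gaussian data, the maximum-likelihood estimates are the sample mean and the (biased, 1/n-normalized) sample variance. The sketch below, with illustrative parameters not taken from the book, checks both on simulated data:

    ```python
    import math
    import random

    random.seed(1)

    # Draw n samples from N(mu, sigma^2) -- illustrative parameters
    mu, sigma, n = 3.0, 2.0, 10_000
    data = [random.gauss(mu, sigma) for _ in range(n)]

    # Gaussian ML estimates:
    #   mu_hat     = (1/n) * sum x_i
    #   sigma2_hat = (1/n) * sum (x_i - mu_hat)^2   (biased; divide by n, not n-1)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n

    print(round(mu_hat, 2), round(math.sqrt(sigma2_hat), 2))
    ```

    The 1/n normalization makes the variance estimator biased, which connects directly to Lecture 8.4 on unbiased estimators (the familiar 1/(n−1) correction) and to Lecture 8.5, since both versions are consistent as n grows.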

  • Chapter 9 Confidence and Hypothesis (Update: 12-30-2020)

    • Lecture 9.1 What is a confidence interval, and what is not?

    • Lecture 9.2 Constructing confidence intervals

    • Lecture 9.3 Gaussian Z distribution and Student's T distribution

    • Lecture 9.4 Bootstrap: Motivation

    • Lecture 9.5 Bootstrapping variances

    • Lecture 9.6 Understanding hypothesis

    • Lecture 9.7 Critical-value and p-value

    • Lecture 9.8 Z-test and T-test

    • Lecture 9.9 Type 1 and Type 2 errors

    • Lecture 9.10 Neyman-Pearson decision

    • Lecture 9.11 Receiver Operating Characteristic (ROC)

    • Lecture 9.12 Precision-Recall (PR)
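
    The bootstrap lectures (9.4–9.5) can be previewed in a few lines: resample the observed data with replacement, recompute the statistic each time, and read a confidence interval off the percentiles. The sketch below uses made-up exponential data, not an example from the book:

    ```python
    import random

    random.seed(0)

    # Observed sample (illustrative data)
    data = [random.expovariate(1.0) for _ in range(200)]

    # Bootstrap: resample with replacement, recompute the statistic each time
    B = 2000
    boot_means = []
    for _ in range(B):
        resample = [random.choice(data) for _ in range(len(data))]
        boot_means.append(sum(resample) / len(resample))

    # 95% percentile interval: take the 2.5% and 97.5% quantiles
    boot_means.sort()
    lo, hi = boot_means[int(0.025 * B)], boot_means[int(0.975 * B)]
    print(f"95% bootstrap CI for the mean: ({lo:.3f}, {hi:.3f})")
    ```

    This percentile interval is the simplest bootstrap construction; Lecture 9.1's warning applies here too, since the interval is a statement about the procedure, not about the parameter being random.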