# Introduction to Probability for Data Science

An undergraduate to graduate textbook on probability in the context of modern data science.
Stanley H. Chan, 2021

## Overview

I am fortunate to have had the opportunity to witness and contribute to teaching several of the most important data science courses in Purdue ECE and the College of Engineering. This book is a collection of materials that I find fundamental, interesting, and practical. It is based on three courses I have taught or created:

• ECE 20875 (was ECE 295) Introduction to Data Science with Python (Sophomore)

• ECE 302 Probabilistic Methods for Electrical and Computer Engineering (Junior-Senior)

• ECE 595ML Machine Learning (Graduate)

While writing the book, I did a fairly exhaustive search of the available textbooks on this subject. It was quite surprising to see that while there is a tsunami of data science books on the internet, many of them are written for programmers. I am not overlooking the importance of these books, but in my opinion college students need more solid mathematical training so that they can pursue more advanced careers. At the other end of the spectrum, classical probability textbooks are everywhere. While these books offer great detail, many of them do not have a soul. Why should we learn probability? How can flipping a coin be useful in modern data science? Can we help undergraduate students appreciate measure theory? Why does the Gaussian have a bell shape? Where does the Poisson distribution come from? How do we fit data with a line? How can we tell whether a change is statistically significant?

I hope that the book will become a valuable asset to our community. The book is not yet finished, and I am actively revising it. If you have any suggestions, I would appreciate it if you sent me an email to let me know.

Stanley Chan, Jan 2021.


## Chapters

• Chapter 6 Sample Statistics (Update: 01-18-2021)

• Lecture 6.1 Moment generating functions (Video) (Slide)

• Lecture 6.2 Characteristic functions (Video) (Slide)

• Lecture 6.3 Union bound, Cauchy-Schwarz inequality, Jensen's inequality

• Lecture 6.4 Markov inequality, Chebyshev inequality

• Lecture 6.5 Chernoff bound, Hoeffding inequality

• Lecture 6.6 Weak law of large numbers and convergence in probability

• Lecture 6.7 Strong law of large numbers and almost sure convergence

• Lecture 6.8 Central limit theorem and convergence in distribution
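The law-of-large-numbers lectures above can be previewed with a few lines of simulation. This is a minimal sketch (not from the book): the sample mean of i.i.d. fair-coin flips concentrates around p = 0.5 as the sample size n grows, illustrating convergence in probability. The seed and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
p = 0.5                         # probability of heads for a fair coin

for n in [10, 1_000, 100_000]:
    flips = rng.random(n) < p        # n Bernoulli(p) trials as booleans
    sample_mean = flips.mean()       # fraction of heads
    # The deviation |sample_mean - p| tends to shrink as n grows
    print(n, abs(sample_mean - p))
```

The same experiment, with the sample mean rescaled by sqrt(n), is a natural way to visualize the central limit theorem of Lecture 6.8.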

• Chapter 7 Regression

• Lecture 7.1 Principles of regression

• Lecture 7.2 Robust regression

• Lecture 7.3 Overfitting

• Lecture 7.4 Training and testing error

• Lecture 7.5 Bias and variance

• Lecture 7.6 Regression: Ridge

• Lecture 7.7 Regression: LASSO
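As a taste of the regression chapter, here is a minimal least-squares line fit in the spirit of Lecture 7.1: solve for a slope and intercept from noisy data via a design matrix. The data are synthetic and the true line y = 2x + 1 is an assumption of this sketch, not an example from the book.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
# Synthetic observations: true line y = 2x + 1 plus small Gaussian noise
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(50)

# Design matrix [x, 1]; least squares solves min ||A w - y||^2 over w
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(slope, intercept)  # estimates should be close to 2 and 1
```

Ridge and LASSO (Lectures 7.6 and 7.7) modify this same objective by adding a squared or absolute penalty on the coefficients.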

• Chapter 8 Estimation (Update: 01-07-2021)

• Lecture 8.1 Parameter estimation settings

• Lecture 8.2 Maximum-likelihood estimation

• Lecture 8.3 ML estimation vs linear regression

• Lecture 8.4 Unbiased estimators

• Lecture 8.5 Consistent estimators

• Lecture 8.6 Maximum-a-posteriori estimation

• Lecture 8.7 MAP vs ML

• Lecture 8.8 Conjugate priors

• Lecture 8.9 Mean square error (MSE)

• Lecture 8.10 Minimum mean square error (MMSE) estimation

• Lecture 8.11 MMSE vs MAP vs ML
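A one-line preview of maximum-likelihood estimation (Lecture 8.2): for i.i.d. Gaussian data with known variance, the ML estimate of the mean has a closed form, namely the sample average. The true parameters below are illustrative assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
true_mu, sigma, n = 3.0, 1.0, 10_000

# Draw n i.i.d. samples from N(true_mu, sigma^2)
data = true_mu + sigma * rng.standard_normal(n)

# For Gaussian data with known sigma, maximizing the likelihood over mu
# reduces to minimizing sum (x_i - mu)^2, whose solution is the sample mean
mu_ml = data.mean()
print(mu_ml)  # should be close to true_mu = 3.0
```

Adding a prior on the mean and maximizing the posterior instead gives the MAP estimate of Lecture 8.6.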

• Chapter 9 Confidence and Hypothesis (Update: 12-30-2020)

• Lecture 9.1 What is a confidence interval, and what is not?

• Lecture 9.2 Constructing confidence intervals

• Lecture 9.3 Gaussian Z distribution and Student's T distribution

• Lecture 9.4 Bootstrap: Motivation

• Lecture 9.5 Bootstrapping variances

• Lecture 9.6 Understanding hypothesis

• Lecture 9.7 Critical-value and p-value

• Lecture 9.8 Z-test and T-test

• Lecture 9.9 Type I and Type II errors

• Lecture 9.10 Neyman-Pearson decision

• Lecture 9.11 Receiver Operating Characteristic (ROC)

• Lecture 9.12 Precision-Recall (PR)
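To preview the confidence-interval construction of Lecture 9.2, here is a sketch of the standard 95% interval for a Gaussian mean with known sigma: sample mean plus or minus 1.96 sigma / sqrt(n). The data, true mean, and seed are synthetic assumptions, not material from the book.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n = 2.0, 400

# Synthetic samples from N(5, sigma^2); 5.0 plays the role of the unknown mean
data = 5.0 + sigma * rng.standard_normal(n)

xbar = data.mean()
half_width = 1.96 * sigma / np.sqrt(n)   # 1.96 is the 97.5% Gaussian quantile
lo, hi = xbar - half_width, xbar + half_width
print(lo, hi)  # in repeated experiments, about 95% of such intervals cover the true mean
```

When sigma is unknown and estimated from the data, the Gaussian quantile is replaced by a Student's t quantile, which is the subject of Lecture 9.3.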