Introduction to Probability for Data Science

An undergraduate to graduate textbook on probability in the context of modern data science.
Stanley H. Chan, 2021


I am fortunate to have had the opportunity to witness and contribute to the teaching of several of the most important data science courses in Purdue ECE and the College of Engineering. This book is a collection of materials that I find fundamental, interesting, and practical. It is based on four courses I have taught or created:

  • ECE 20875 (was ECE 295) Introduction to Data Science with Python (Sophomore)

  • ECE 302 Probabilistic Methods for Electrical and Computer Engineering (Junior-Senior)

  • ECE 595ML Machine Learning (Graduate)

  • ECE 645 Estimation Theory (Graduate)

While writing the book, I did a fairly exhaustive search of the available textbooks on this subject. It was quite surprising to see that while there is a tsunami of data science books on the internet, many of them are written for programmers. I am not overlooking the importance of these books, but in my opinion college students need more solid mathematical training so that they can pursue more advanced careers. On the other end of the spectrum, classical probability textbooks are everywhere. While these books offer great detail, many of them do not have a soul. Why should we learn probability? How can flipping a coin be useful in modern data science? Can we help undergraduate students appreciate measure theory? Why does the Gaussian have a bell shape? Where does the Poisson distribution come from? How do we fit data with a line? How do we tell whether a change is statistically significant?

I hope that the book will become a valuable asset to our community. The book is not yet finished, and I am actively revising it. If you have any suggestions, I would appreciate it if you sent me an email to let me know.

Stanley Chan, Feb 2021.

Preview (Update: 2-27-2021)

Please fill out the evaluation if you have a few minutes.



(Will expand)

Videos and Slides