Introduction to Probability for Data Science
An undergraduate to graduate textbook on probability in the context of modern data science.
Stanley H. Chan, 2021
Overview
I am fortunate to have had the opportunity to witness and contribute to the teaching of several of the most important data science courses in Purdue ECE and the College of Engineering. This book is a collection of materials that I find fundamental, interesting, and practical. It is based on three courses I have taught or created:
While writing the book, I did a fairly exhaustive search of the available textbooks on this subject. It was quite surprising to see that while there is a tsunami of data science books on the internet, many of them are written for programmers. I am not overlooking the importance of these books, but in my opinion college students need more solid mathematical training so that they can pursue more advanced careers. On the other end of the spectrum, classical probability textbooks are everywhere. While these books offer great detail, many of them do not have a soul. Why should we learn probability? How can flipping a coin be useful in modern data science? Can we help undergraduate students appreciate measure theory? Why does the Gaussian have a bell shape? Where does the Poisson distribution come from? How do we fit data with a line? How can we tell whether a change is statistically significant?
I hope that the book will become a valuable asset to our community. The book is not yet finished, and I am actively revising it. If you have any suggestions, I would appreciate it if you sent me an email to let me know.
Stanley Chan, Jan 2021.
Preview Chapters