# Announcements

11/3/19 The solutions for midterm 1 are available here.

8/8/19 Website is live!

# Course Description

This introductory programming course teaches Python and also introduces topics in data science. Topics covered include:

- Basics of git
- Regular expressions and text processing
- Python basics
- Python data structures and libraries
- Basic object-oriented programming
- Basic data visualization
- Sampling, estimation, hypothesis testing
- Regression analyses
- Classification and clustering
- Basic neural networks

We will have Python programming assignments roughly every week (10 in all), plus a mini-project at the end of the semester.

**Prerequisites**: Undergraduate-level CS 15900 with a minimum grade of C-

# Course Details

The syllabus explains the logistical details of the course. The course also uses a Piazza discussion board for questions.

# Lecture Notes

- Week 1:
- 8/20 Intro. Please also see the notes on git and GitHub.
- 8/22 Bash. You can find the files that we used for the examples in class here.

- Week 2:
- 8/27 Python basics. There are slides and code. You can also download the associated Jupyter Notebook if you want to experiment with the code.
- 8/30 Data structures. Code and notebook. Also see the slides from 8/27.

- Week 3:
- 9/3 Histograms.
- 9/5 Probability and Distribution.

- Week 4:
- 9/10 Probability and Distribution (continued). Higher Order Functions.
- 9/12 Higher Order Functions continued. See the code and notebook associated with this material.

- Week 5:
- Week 6:
- 9/24 More Hypothesis Testing (note that slides have been updated to include material on one-sided tests)
- 9/26 Midterm Review

- Week 7:
- 10/1 Regular Expressions.
- 10/3 Regular Expressions (continued)

- Week 8:
- 10/8 Fall Break. No class.
- 10/10 Regression

- Week 9:
- 10/15 Regression (continued). We also did a brief overview of linear algebra, and discussed NumPy (Associated notebook)
- 10/17 Regression (continued). Note that the regression notes are now updated with all of the material covered across the three regression lectures. You may also find this notebook walking through a regression computation useful.

- Week 10:
- 10/22 Basic Natural Language Processing. You may also find this notebook walking through building a document-word matrix helpful.
- 10/24 No class (Midterm 1 makeup)

- Week 11:
- 10/29 Classes and Objects and Clustering
- 10/31 Classes and Objects and Clustering (continued).

- Week 12:
- 11/5 Midterm 2 review
- 11/7 Clustering continued, and Inheritance

- Week 13:
- 11/12 Classification: Naive Bayes and k-Nearest Neighbor.
- 11/14 Iterators and Generators, with associated notebook.

- Week 14:
- 11/19 Classification: Logistic Regression.
- 11/21 Perceptrons, plus some supplemental notes discussing convergence.

- Week 15:
- 11/26 No class (Midterm 2 makeup)
- 11/28 No class (Thanksgiving)

- Week 16:
- 12/3 Neural nets and backpropagation. Note that these notes include updates to the notes from 11/21.

# Assignments

- Homework 1: Bash scripting. (Solutions)
- Homework 2: Basic Data Structures. (Solutions)
- Homework 3: Histograms and Distributions. (Solutions)
- Homework 4: Higher Order Functions. (Solutions)
- Homework 5: Hypothesis Testing and Confidence Intervals. (Solutions)
- Homework 6: Regular Expressions. (Solutions)
- Homework 7: Regression. (Solutions)
- Homework 8: Basic NLP.
- Homework 9: Objects and K-Means. (Solutions)
- Homework 10: Gaussian Mixture Models. (Solutions)