# Announcements

1/23/19 Lab hours for Thursday (1/24) will be 3:30–4:30. Lab hours on Friday and in later weeks are unchanged.

1/3/19 Webpage is live!

# Course Description

This course provides a broad introduction to data analysis and modeling. It addresses questions such as:

- How do I tell whether data follows a pattern?
- How can I visualize my data?
- How can I tell whether patterns in my data are real?
- How can I use data to draw conclusions?

We take a problem-focused approach to the material, looking at how to use data analysis and modeling techniques, such as clustering, regression, and hypothesis testing, to solve interesting engineering problems. The course teaches how to write these analyses in Python. During the first part of the course, there will be a Python programming assignment every two weeks to explore concepts, followed by an end-of-semester mini-project in which students tackle a larger analysis and modeling problem.

# Course Details

The syllabus explains the logistical details of the course. Course questions are handled on a Piazza discussion board.

This course is offered as part of a three-course "set," along with ILS 295: Introduction to Data Management and PHIL 293: Ethics for Data Science. Together, these courses provide a broad introduction to data science, and they interlock with each other. We put together a flyer that illustrates how these courses cover the ecosystem of data science. I encourage you to sign up for (or audit, or sit in on) the other two courses.

# Lecture Notes

- Lecture 1: Introduction
- Lecture 2: Python for C Programmers. You may also find it helpful to play around with the corresponding notebook.
- Lecture 3: Histograms
- Lecture 4: Higher-order functions. You may also find it helpful to play around with the corresponding notebook.
- Lecture 5: Higher-order functions (cont.) and Data Structures. You may also find it helpful to play around with the corresponding notebook.
- Lecture 6: Probability and Distribution. Some material courtesy Professor Stanley Chan.
- Lecture 7: Sampling and Estimation. Slides courtesy Professor Stanley Chan.
- Lecture 8: Regression. Slides courtesy Professor Stanley Chan.

# Assignments

- HW 0: Environment setup. Due 1/18
- HW 1: Histograms. Due 2/1
- HW 2: N-grams. Due 2/15
- HW 3: Sampling and Confidence Intervals. Due 3/3
- HW 4: Regression. Due 3/22