Due: March 2rd
Homework 4 : Regression
This homework looks at using various techniques for making determinations of statistical significance. It is due March 23rd.
Goals
In this assignment you will:
- Compute a linear regression to model the contribution of various factors to prostate cancer markers
- Use autoregression to predict stock prices
Background
In this assignment, we will use several different data sets. For part 1 of the
assignment, we will use hw04.csv
(which is the same prostate data set as
hw03.csv
from the previous assignment -- you can use similar routines as
last time to read it in and process it). For part 2 of the assignment, we will
use google_stock_train.txt
and nvs_stock_train.txt
This assignment will require performing a lot of matrix computations to compute your regressions. You will likely find NumPy's matrix library and linear algebra library useful.
A few tips and tricks for using NumPy matrices and completing this assignment:
-
If you create a NumPy matrix, the attributes
.T
and.I
will give you the transposed and inverse matrices, respectively. Som.T
will give you the transposed version of matrixm
. -
Keep track of the dimensionality of your matrices, to make sure that you are computing things correctly.
.shape
will tell you the dimensions of your matrix. So ifm.shape
returns(67, 8)
, that means thatm
is a 67x8 matrix. -
.mean(0)
will compute the column means of a matrix, and.sum(0)
will compute the column sums of a matrix. So if you have a 67x8 matrixm
and you computem.mean(0)
, you will get a new 1x8 matrix containing the means of each column. -
Elementwise computation in NumPy is subtle, because it tries to do the "right" thing in ways you may not expect. If you want to subtract off a single value
x
from each element of a matrixm
,m - x
will do that. Ifx
is instead a 1xC matrix, thenm - x
will subtract the "row"x
from each row ofm
. Hence,m - m.mean(0)
will subtract from each element inm
the mean of the column that element is in. - If you are using numpy matrices, you cannot use
*
and**
to do elementwise multiplication or exponentiation (those will instead do matrix multiplication and exponentiation). You will have to usenumpy.multiply
andnumpy.power
to do that.
Instructions
0) Set up your repository for this homework.
Click the link on Blackboard to set up a repository for homework 4, then clone it, as you did for homework 0.
The repository should contain 6 files:
- This README
- Input data files called
hw04.csv
,google_stock_train.txt
, andnvs_stock_train.txt
- A PDF with the homework writeup called
hw04.pdf
- A helper file called
csv_reader.py
(this is the same as the file from HW 3).
1) Homework Problem 1: Regression
For problem 1, put your code in a file called hw4_1.py
and your writeup in a file called hw4_1.doc
or hw4_1.pdf
. If you are using Jupyter Notebook, put your code and writeup in hw4_1.ipynb
Do exercise 1 from hw04.pdf
2) Homework Problem 2: Bootstrap
For problem 2, put your code in a file called hw4_2.py
and your writeup in a file called hw4_2.doc
or hw4_2.pdf
. If you are using Jupyter Notebook, put your code and writeup in hw4_2.ipynb
Do exercise 2 from hw04.pdf
What you need to submit
Each of the homework problems specify what file(s) to generate and submit for
that problem. Remember that if you are writing code in a .py
file, you must
include your writeup in an accompanying .doc
or .pdf
file. If you are
writing code in a .ipynb
file, your writeup should be included inline.
Submitting your code
Please tag the version of the code that you want to submit with submission
, as you did in HW0.
Don't forget to commit the code that you want to submit before tagging your submission. You have to do this in two steps.