Homework 2 - Ridge, Lasso, and Splines
Due Friday, Mar 1 11:59 PM
Please submit your work as a single PDF file to D2L > Assessments > Dropbox. Multiple files, or a file that is not in PDF format, will not be accepted.
Any relevant code should be attached.
Read ISL Sections 5.1 and 6.2, and Chapter 7.
Homework Questions
ISL Sec. 6.6: 4
ISL Sec. 6.6: 6
ISL Sec. 6.6: 9 (a)-(d)
ISL Sec. 7.9: 9
[Special Case for Ridge and Lasso] Consider a linear regression problem with \(n = p\), \(\mathbf{X} = \mathbf{I}\), and no intercept. Show the following:
(a) The least squares problem simplifies to finding \(\beta_1, \dots, \beta_p\) that minimize \(\sum_{j=1}^p\left(y_j - \beta_j\right)^2\). What is the least squares estimator \(b_j\)?
(b) The ridge estimator, i.e., the minimizer of \(\sum_{j=1}^p\left(y_j - \beta_j\right)^2 + \lambda \sum_{j=1}^p \beta_j^2\), is \(\hat{\beta}_j^r = \frac{y_j}{1+\lambda}\).
(c) The lasso solution, which minimizes \(\sum_{j=1}^p\left(y_j - \beta_j\right)^2 + \lambda \sum_{j=1}^p | \beta_j |\), is \[\hat{\beta}_j^l = \begin{cases} y_j - \lambda/2 & \quad \text{if } y_j > \lambda/2\\ y_j + \lambda/2 & \quad \text{if } y_j < -\lambda/2 \\ 0 & \quad \text{if } |y_j| \le \lambda/2 \end{cases}\]
(d) Describe the ridge and lasso shrinkage behavior. (A numerical check of the closed forms in (b) and (c) is sketched below.)
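The closed forms in (b) and (c) can be sanity-checked numerically. Below is a minimal sketch, assuming Python with NumPy and SciPy (neither is required for the written derivation): because \(\mathbf{X} = \mathbf{I}\) makes both penalized objectives separable, each coordinate can be minimized one-dimensionally and compared with the claimed formulas.

```python
# A small numerical sanity check, not a derivation: with X = I the penalized
# objectives are separable, so each coordinate is minimized on its own and
# compared against the claimed closed forms.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
lam = 1.5
for y_j in rng.normal(scale=2.0, size=5):
    # Ridge coordinate: minimize (y_j - b)^2 + lam * b^2; claim: y_j / (1 + lam).
    b_ridge = minimize_scalar(lambda b: (y_j - b) ** 2 + lam * b ** 2).x
    # Lasso coordinate: minimize (y_j - b)^2 + lam * |b|; claim: soft-thresholding.
    b_lasso = minimize_scalar(lambda b: (y_j - b) ** 2 + lam * abs(b)).x
    soft = np.sign(y_j) * max(abs(y_j) - lam / 2, 0.0)
    print(f"y_j={y_j:+.3f}  ridge: {b_ridge:+.3f} vs {y_j / (1 + lam):+.3f}  "
          f"lasso: {b_lasso:+.3f} vs {soft:+.3f}")
```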
[Lasso with Correlated Variables]
Consider the linear regression
\[ \mathbf{y}= \mathbf{X}\boldsymbol \beta+ \boldsymbol \epsilon\]
where \(\boldsymbol \beta= (\beta_1, \beta_2, \ldots, \beta_{20})'\) with \(\beta_1 = \beta_{2} = \beta_{3} = 0.5\) and \(\beta_j = 0\) for \(j = 4, \dots, 20\). There is no intercept \(\beta_0\). The input vector \(\mathbf{x}= (x_1, x_2, \dots, x_{20})'\) follows a multivariate Gaussian distribution
\[\mathbf{x}\sim N\left(\mathbf{0}, \Sigma_{20 \times 20}\right)\]
In \(\Sigma\), all diagonal elements are 1 and all off-diagonal elements are \(\rho\), which measures the correlation between any two predictors.
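As a point of reference for the parts that follow, here is one way the data-generating step could be coded. It is a sketch assuming Python with NumPy; the names `n_train`, `n_test`, and `draw` are illustrative, not prescribed by the problem.

```python
# A sketch of the data-generating step, assuming Python with NumPy.
import numpy as np

rng = np.random.default_rng(1)
p, n_train, n_test, rho = 20, 400, 100, 0.1

beta = np.zeros(p)
beta[:3] = 0.5  # beta_1 = beta_2 = beta_3 = 0.5, all others 0

# Equicorrelation covariance: 1 on the diagonal, rho everywhere else.
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))

def draw(n):
    """Draw n observations from y = X beta + eps with no intercept."""
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    y = X @ beta + rng.normal(size=n)  # eps_i iid N(0, 1)
    return X, y

X_train, y_train = draw(n_train)
X_test, y_test = draw(n_test)
```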
(a) Generate training data of size 400 and test data of size 100 independently from the above model with \(\rho = 0.1\) and \(\epsilon_i \stackrel{iid}{\sim} N(0, 1)\).
(b) Fit a Lasso model on the training data with 10-fold cross-validation (CV); see the simulation sketch after this list.
(c) Compute the test MSE with the optimal \(\lambda\) selected by CV in (b). Does the Lasso select the correct variables?
(d) Repeat (a)-(c) 100 times; that is, generate 100 different training and test data sets. For each run, record the test MSE and whether the true model is correctly selected. Then compute the average test MSE and the proportion of runs in which the correct model was selected.
(e) Redo (a)-(d) with \(\rho = 0.6\). Compare the two average test MSEs and the two proportions, and comment on the results.
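Below is a minimal simulation sketch for parts (a)-(e), assuming scikit-learn's `LassoCV` handles the 10-fold CV fit; the helper `one_run` is illustrative, and "selecting the correct model" is interpreted here as exact recovery of the nonzero support \(\{1, 2, 3\}\).

```python
# A minimal sketch of the full simulation, assuming scikit-learn's LassoCV
# for the 10-fold CV fit. Each run draws fresh training and test sets.
import numpy as np
from sklearn.linear_model import LassoCV

p = 20
beta = np.zeros(p)
beta[:3] = 0.5

def one_run(rho, rng):
    Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
    X_tr = rng.multivariate_normal(np.zeros(p), Sigma, size=400)
    y_tr = X_tr @ beta + rng.normal(size=400)
    X_te = rng.multivariate_normal(np.zeros(p), Sigma, size=100)
    y_te = X_te @ beta + rng.normal(size=100)

    fit = LassoCV(cv=10, fit_intercept=False).fit(X_tr, y_tr)  # part (b)
    test_mse = np.mean((y_te - fit.predict(X_te)) ** 2)        # part (c)
    correct = np.array_equal(fit.coef_ != 0, beta != 0)        # support recovery
    return test_mse, correct

rng = np.random.default_rng(2)
for rho in (0.1, 0.6):                                          # parts (a)/(e)
    mses, hits = zip(*(one_run(rho, rng) for _ in range(100)))  # part (d)
    print(f"rho={rho}: avg test MSE = {np.mean(mses):.3f}, "
          f"correct-selection proportion = {np.mean(hits):.2f}")
```

Each call to `one_run` regenerates both the training and the test set, matching the requirement in part (d).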