Homework 2 - Ridge, Lasso, and Splines

Due Friday, Mar 1, 11:59 PM

Homework Questions

  1. ISL Sec. 6.6: 4

  2. ISL Sec. 6.6: 6

  3. ISL Sec. 6.6: 9 (a)-(d)

  4. ISL Sec. 7.9: 9

  5. [Special Case for Ridge and Lasso] Consider a linear regression problem with \(n = p\), \({\bf X} = {\bf I}\), and no intercept. Show the following:

    1. The least squares problem can be simplified to finding \(\beta_1, \dots, \beta_p\) that minimize \(\sum_{j=1}^p\left(y_j - \beta_j\right)^2\). What is the least squares estimator \(b_j\)?

    2. The ridge estimator \(\hat{\boldsymbol\beta}^r = \underset{\boldsymbol\beta}{\arg\min} \sum_{j=1}^p\left(y_j - \beta_j\right)^2 + \lambda \sum_{j=1}^p \beta_j^2\) has components \(\hat{\beta}_j^r = \frac{y_j}{1+\lambda}\).

    3. The lasso solution of \(\sum_{j=1}^p\left(y_j - \beta_j\right)^2 + \lambda \sum_{j=1}^p | \beta_j |\) is \[\hat{\beta}_j^l = \begin{cases} y_j - \lambda/2 & \quad \text{if } y_j > \lambda/2\\ y_j + \lambda/2 & \quad \text{if } y_j < -\lambda/2 \\ 0 & \quad \text{if } |y_j| \le \lambda/2 \end{cases}\]

    4. Describe the shrinkage behavior of the ridge and lasso estimators.
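The closed forms in parts 2 and 3 can be sanity-checked numerically. The sketch below uses scikit-learn's `Ridge` and `Lasso` (an assumption; any penalized solver would do) and converts the problem's \(\lambda\) into each estimator's penalty parameter, since scikit-learn scales its objectives differently from the formulas above:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
p = 10                  # with n = p and X = I, the problem separates by coordinate
y = rng.normal(size=p)
X = np.eye(p)
lam = 1.5               # the lambda of the penalized objectives above

# Ridge: sklearn minimizes ||y - Xb||^2 + alpha * ||b||^2, so alpha = lambda
# matches the objective in part 2.
ridge = Ridge(alpha=lam, fit_intercept=False).fit(X, y)
print(np.allclose(ridge.coef_, y / (1 + lam)))            # y_j / (1 + lambda)

# Lasso: sklearn minimizes ||y - Xb||^2 / (2n) + alpha * ||b||_1, so
# alpha = lambda / (2n) matches the objective in part 3.
lasso = Lasso(alpha=lam / (2 * p), fit_intercept=False).fit(X, y)
soft = np.sign(y) * np.maximum(np.abs(y) - lam / 2, 0.0)  # soft-thresholding
print(np.allclose(lasso.coef_, soft, atol=1e-5))
```

Both checks should agree with the stated solutions; note the lasso's soft-thresholding sets small coordinates exactly to zero, while the ridge shrinks all coordinates proportionally.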

  6. [Lasso with Correlated Variables]

    Consider the linear regression

    \[ \mathbf{y}= \mathbf{X}\boldsymbol \beta+ \boldsymbol \epsilon\]

    where \(\boldsymbol \beta= (\beta_1, \beta_2, \ldots, \beta_{20})'\) with \(\beta_1 = \beta_2 = \beta_3 = 0.5\) and \(\beta_j = 0\) for \(j = 4, \dots, 20\); there is no intercept \(\beta_0\). The input vector \(\mathbf{x}= (x_1, x_2, \dots, x_{20})'\) follows a multivariate Gaussian distribution

    \[\mathbf{x}\sim N\left(\mathbf{0}, \Sigma_{20 \times 20}\right)\]

    In \(\Sigma\), all diagonal elements are 1 and all off-diagonal elements equal \(\rho\), the common correlation between any two predictors.

    1. Generate training data of size 400 and test data of size 100 independently from the model above with \(\rho = 0.1\) and \(\epsilon_i \stackrel{iid}{\sim} N(0, 1)\).

    2. Fit a Lasso model on the training data, selecting \(\lambda\) by 10-fold cross-validation (CV).

    3. Compute the test MSE at the optimal \(\lambda\) selected by CV in step 2. Does the Lasso select the correct variables?

    4. Repeat steps 1-3 100 times; that is, generate 100 independent training and test data sets. For each run, record the test MSE and whether the true model is correctly selected. Then compute the average test MSE and the proportion of runs in which the correct model was selected.

    5. Redo steps 1-4 with \(\rho = 0.6\). Compare the two average test MSEs and the two proportions of correct selection, and comment on the results.
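The simulation in steps 1-5 can be sketched as follows, assuming scikit-learn's `LassoCV` for the 10-fold-CV fit; the helper names `one_run` and `simulate`, the seed choices, and the support-recovery check are scaffolding, not part of the assignment:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def one_run(rho, rng, n_train=400, n_test=100, p=20):
    """Steps 1-3: simulate one data set, fit a 10-fold-CV Lasso, and return
    the test MSE and whether the selected support is exactly {x1, x2, x3}."""
    beta = np.zeros(p)
    beta[:3] = 0.5                        # beta_1 = beta_2 = beta_3 = 0.5
    Sigma = np.full((p, p), rho)          # equicorrelation covariance
    np.fill_diagonal(Sigma, 1.0)
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n_train + n_test)
    y = X @ beta + rng.normal(size=n_train + n_test)
    fit = LassoCV(cv=10, fit_intercept=False).fit(X[:n_train], y[:n_train])
    mse = np.mean((y[n_train:] - fit.predict(X[n_train:])) ** 2)
    correct = np.array_equal(np.flatnonzero(fit.coef_), [0, 1, 2])
    return mse, correct

def simulate(rho, n_reps=100, seed=0):
    """Step 4: average test MSE and correct-selection rate over n_reps runs."""
    rng = np.random.default_rng(seed)
    mses, corrects = zip(*(one_run(rho, rng) for _ in range(n_reps)))
    return np.mean(mses), np.mean(corrects)

# Step 5: compare the two correlation levels.
for rho in (0.1, 0.6):
    avg_mse, prop = simulate(rho)
    print(f"rho={rho}: average test MSE {avg_mse:.3f}, "
          f"correct-model proportion {prop:.2f}")
```

An equivalent R workflow would use `cv.glmnet` from the `glmnet` package; the structure of the loop is the same.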