Homework 3 - Bayesian Statistics, Logistic Regression, Generative Models, and K-Nearest Neighbors

Due Friday, April 12 11:59 PM

Homework Questions

Required

  1. ISL Sec. 4.8: 2

  2. ISL Sec. 4.8: 3

  3. ISL Sec. 4.8: 5

  4. ISL Sec. 4.8: 13

  5. [KNN Curse of Dimensionality]

    1. Generate the covariates \(x_1, x_2, \dots, x_5\) for \(n = 1000\) observations from independent standard normal distributions. Then generate \(Y\) from \[Y = X_1 + 0.5 X_2 - X_3 + \epsilon,\] where \(\epsilon \sim N(0, 1).\)

    2. Use the first 500 observations as the training data and the rest as the test data. Fit a KNN regression and report the test MSE for \(y\) at the optimal \(K\).

    3. Add 95 additional noisy predictors as follows.

      • Case 1: \(x_6, x_7, \dots, x_{100} \overset{\mathrm{iid}}{\sim} N(0, 1)\)

      • Case 2: the 95 columns of \(XA\), where \(X_{1000 \times 5} = [x_1 \cdots x_5]\) and \(A_{5 \times 95}\) has iid uniform(0, 1) entries.

    4. Fit KNN regression in both cases (with a total of 100 covariates) and select the best \(K\) value.

    5. For both cases, what are the best \(K\) and the best mean squared error for prediction? Discuss the effect of adding 95 (unnecessary) covariates.
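The steps of the KNN problem can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a reference solution: the seed, the grid of K from 1 to 50, and the helper names `knn_predict` and `best_k_and_mse` are all made up here, and K is picked by minimizing the test MSE over the grid only to keep the sketch short (cross-validation on the training half would be a more defensible choice).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, an assumption of this sketch

# Step 1: five standard normal covariates and the response
n, p = 1000, 5
X = rng.standard_normal((n, p))
eps = rng.standard_normal(n)
y = X[:, 0] + 0.5 * X[:, 1] - X[:, 2] + eps


def knn_predict(X_train, y_train, X_test, k):
    """Average the responses of the k nearest training points (Euclidean)."""
    # Squared distances between every test row and every training row
    d2 = ((X_test ** 2).sum(axis=1)[:, None]
          + (X_train ** 2).sum(axis=1)[None, :]
          - 2.0 * X_test @ X_train.T)
    # Indices of the k smallest distances in each row (order within k is irrelevant)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def best_k_and_mse(X_all, y_all, k_grid, n_train=500):
    """Split first n_train rows / rest, then return (best K, its test MSE)."""
    X_tr, X_te = X_all[:n_train], X_all[n_train:]
    y_tr, y_te = y_all[:n_train], y_all[n_train:]
    mses = {k: float(np.mean((y_te - knn_predict(X_tr, y_tr, X_te, k)) ** 2))
            for k in k_grid}
    best = min(mses, key=mses.get)
    return best, mses[best]


# Step 3: the two ways of adding 95 extra predictors
X_case1 = np.hstack([X, rng.standard_normal((n, 95))])  # iid N(0, 1) noise
A = rng.uniform(0.0, 1.0, size=(5, 95))
X_case2 = np.hstack([X, X @ A])                         # columns of XA

k_grid = range(1, 51)  # assumed search grid
results = {
    "5 covariates": best_k_and_mse(X, y, k_grid),
    "case 1": best_k_and_mse(X_case1, y, k_grid),
    "case 2": best_k_and_mse(X_case2, y, k_grid),
}
for name, (k, mse) in results.items():
    print(f"{name}: best K = {k}, test MSE = {mse:.3f}")
```

The brute-force distance matrix is only 500 by 500 here, so no tree-based neighbor search is needed. Comparing the three printed rows is the starting point for the discussion the last step asks for.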

  6. [MNIST Handwritten Digits Image]

    1. Load the prepared MNIST data mnist.csv. Print some images.

    2. Use the first 1000 observations as the training data and the remaining observations (the second half) as the test data.

    3. Train KNN and predict on the test data, using the best \(K\) selected on the training data.

      • Calculate the test error rate.
      • Generate the confusion matrix.
    4. Train multinomial logistic regression and predict on the test data.

      • Calculate the test error rate.
      • Generate the confusion matrix.
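Both classifiers in the MNIST problem are evaluated the same way, so the two summaries can be computed by one pair of helpers. The following is a small sketch with NumPy; the function names `confusion_matrix` and `error_rate` and the toy label vectors are illustrative assumptions, not MNIST results, and the layout of `mnist.csv` is not assumed at all.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=10):
    """Count matrix: rows index the true digit, columns the predicted digit."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when (true, pred) pairs repeat
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def error_rate(y_true, y_pred):
    """Fraction of test observations whose predicted label is wrong."""
    return float(np.mean(y_true != y_pred))


# Toy labels, made up for illustration only
y_true = np.array([0, 1, 2, 2, 9, 9, 9, 4])
y_pred = np.array([0, 1, 2, 3, 9, 4, 9, 4])

cm = confusion_matrix(y_true, y_pred)
err = error_rate(y_true, y_pred)
print(cm)
print(f"test error rate = {err:.3f}")
```

The off-diagonal entries show which digits are confused with which, and the error rate equals one minus the trace of the matrix divided by its total count.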

Do one of the following

You are encouraged to do both, but to complete the homework, you only need to do one of the two assignments.

  1. Watch the talk "All About that Bayes: Probability, Statistics, and the Quest to Quantify Uncertainty" by Dr. Kristin Lennox. In 250 words, summarize your thoughts and what you learned from the talk.
  2. In 250 words, summarize your thoughts and what you learned from the deep learning workshop.