MSSC 6250 Statistical Machine Learning
Supervised Learning: features and a response
Unsupervised Learning: only features, no response
If English and Math scores are highly correlated, they roughly measure one thing: overall academic performance.
If they are not, English and Math measure different abilities.
One variable represents one dimension.
With many variables in the data, we live in a high-dimensional world.
GOAL:
Find a low-dimensional (usually 2D) representation of the data that captures as much of the information all of those variables provide as possible.
Use two newly created variables (the principal components) to represent all of the original variables.
Why and when can we omit dimensions?
PCA is a dimension reduction tool that finds a low-dimensional representation of a data set that contains as much of the variation as possible.
Each observation lives in a high-dimensional space (lots of variables), but not all of these dimensions (variables) are equally interesting/important.
The concept of interesting/important is measured by the amount that the observations vary along each dimension.
Principal Component 1 (PC1): maximizes the variance of the projected points.
PC1 is the line in the Eng-Math space that is closest to the observations.
PC1 is the best 1D representation of the 2D data
(Figure: the 1D representation along PC1 vs. the original 2D representation of the data)
If the variance along PC1 is $v_1$ and the variance along PC2 is $v_2$, then PC1 accounts for $v_1 / (v_1 + v_2)$ of the total variation.
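As a quick illustration, here is a minimal sketch; the English/Math scores below are simulated, not from the course data.
## simulate two correlated exam scores (hypothetical data)
set.seed(6250)
math <- rnorm(200, mean = 70, sd = 10)
eng  <- 0.8 * math + rnorm(200, mean = 15, sd = 5)
scores <- cbind(eng, math)
## PCA on the two standardized variables
pr <- prcomp(scores, scale = TRUE)
## proportion of total variation accounted for by PC1 and PC2
pr$sdev^2 / sum(pr$sdev^2)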
Murder Assault UrbanPop Rape
Alabama 13.2 236 58 21
Alaska 10.0 263 48 44
Arizona 8.1 294 80 31
Arkansas 8.8 190 50 20
California 9.0 276 91 41
Colorado 7.9 204 78 39
Connecticut 3.3 110 77 11
Delaware 5.9 238 72 16
Florida 15.4 335 80 32
Georgia 17.4 211 60 26
Hawaii 5.3 46 83 20
Idaho 2.6 120 54 14
Illinois 10.4 249 83 24
Indiana 7.2 113 65 21
Iowa 2.2 56 57 11
Kansas 6.0 115 66 18
USArrests
pca_output <- prcomp(USArrests, scale = TRUE)
## rotation matrix provides PC loadings
(pca_output$rotation <- -pca_output$rotation)  ## flip the sign of the loadings; wrapping in () prints them
PC1 PC2 PC3 PC4
Murder 0.54 0.42 -0.34 -0.649
Assault 0.58 0.19 -0.27 0.743
UrbanPop 0.28 -0.87 -0.38 -0.134
Rape 0.54 -0.17 0.82 -0.089
-pca_output$rotation gives us the same PCs as pca_output$rotation does. The sign just changes the direction, not the angle.
pca_output$x  ## PC scores
PC1 PC2 PC3 PC4
Alabama 0.98 1.12 -0.44 -0.15
Alaska 1.93 1.06 2.02 0.43
Arizona 1.75 -0.74 0.05 0.83
Arkansas -0.14 1.11 0.11 0.18
California 2.50 -1.53 0.59 0.34
Colorado 1.50 -0.98 1.08 0.00
Connecticut -1.34 -1.08 -0.64 0.12
Delaware 0.05 -0.32 -0.71 0.87
Florida 2.98 0.04 -0.57 0.10
Georgia 1.62 1.27 -0.34 -1.07
Hawaii -0.90 -1.55 0.05 -0.89
Idaho -1.62 0.21 0.26 0.49
Illinois 1.37 -0.67 -0.67 0.12
Indiana -0.50 -0.15 0.23 -0.42
Iowa -2.23 -0.10 0.16 -0.02
Kansas -0.79 -0.27 0.03 -0.20
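As a sanity check (a minimal sketch using a fresh, unflipped prcomp fit), the scores are just the standardized data multiplied by the loading matrix:
pr <- prcomp(USArrests, scale = TRUE)
Z <- scale(USArrests) %*% pr$rotation   ## project the standardized data onto the loadings
all.equal(Z, pr$x)                      ## TRUE (signs differ from the flipped output shown above)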
PC1 PC2 PC3 PC4
Murder 0.54 0.42 -0.34 -0.649
Assault 0.58 0.19 -0.27 0.743
UrbanPop 0.28 -0.87 -0.38 -0.134
Rape 0.54 -0.17 0.82 -0.089
PC1 places roughly equal weight on Assault, Murder and Rape, with much less weight on UrbanPop.
PC2 places most of its weight on UrbanPop, and much less weight on the other 3 features.
PC1 PC2 PC3 PC4
Wisconsin -2.06 -0.61 -0.14 -0.18
Wyoming -0.62 0.32 -0.24 0.16
Higher value of PC1 means higher crime rates (roughly).
Higher value of PC2 means higher level of urbanization (roughly).
The loading vectors of the three crime-related variables (Assault, Murder and Rape) are located close to each other, while UrbanPop is far from the other three.
Assault, Murder and Rape are more correlated with each other, and UrbanPop is less correlated with the other three.
[1] 2.48 0.99 0.36 0.17
[1] 0.620 0.247 0.089 0.043
Look for a point at which the proportion of variance explained by each subsequent PC drops off.
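The two output vectors above come directly from the fitted object; a minimal sketch:
pca_output$sdev^2                                  ## variance of each PC: 2.48 0.99 0.36 0.17
pve <- pca_output$sdev^2 / sum(pca_output$sdev^2)
pve                                                ## proportion of variance explained: 0.620 0.247 0.089 0.043
## scree plots: look for the point where the PVE drops off
plot(pve, type = "b", xlab = "Principal Component", ylab = "PVE")
plot(cumsum(pve), type = "b", xlab = "Principal Component", ylab = "Cumulative PVE")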
The PC1 loading vector $\phi_1 = (\phi_{11}, \dots, \phi_{p1})'$ solves
$$\max_{\phi_{11}, \dots, \phi_{p1}} \; \frac{1}{n} \sum_{i=1}^n \Big( \sum_{j=1}^p \phi_{j1} x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^p \phi_{j1}^2 = 1,$$
that is, maximize the sample variance of the projected points, the scores $z_{i1} = \sum_{j=1}^p \phi_{j1} x_{ij}$.
The PC loading vector defines a direction in feature space along which the data vary the most.
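A small numerical check of this claim (a minimal sketch, using random directions for comparison):
pr <- prcomp(USArrests, scale = TRUE)
X  <- scale(USArrests)
## sample variance of the scores along PC1
var(drop(X %*% pr$rotation[, 1]))
## variance along random unit-length directions is never larger
set.seed(1)
for (i in 1:3) {
  v <- rnorm(4); v <- v / sqrt(sum(v^2))   ## random unit vector in R^4
  print(var(drop(X %*% v)))
}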
For any $M < p$, the first $M$ PC loading vectors span the $M$-dimensional plane that is closest to the observations.
If we perform PCA on the unscaled variables, the PC1 loading vector will have a very large loading for Assault, simply because Assault has by far the largest variance.
When all the variables are measured in the same units, there is no need to scale the variables.
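A quick comparison showing why scaling matters for USArrests (a minimal sketch):
apply(USArrests, 2, var)                         ## Assault has by far the largest variance
prcomp(USArrests, scale = FALSE)$rotation[, 1]   ## unscaled: PC1 puts almost all weight on Assault
prcomp(USArrests, scale = TRUE)$rotation[, 1]    ## scaled: the weights are much more balanced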
PCA is equivalent to the singular value decomposition (SVD) of the column-centered (and scaled) data matrix: $X = UDV'$.
The columns of $V$ are the PC loading vectors.
Projecting $X$ onto $V$ gives the scores: $Z = XV = UD$.
PCA is equivalent to the eigendecomposition of the sample covariance matrix $S = \frac{1}{n-1}X'X = V \Lambda V'$, with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_p$.
Total variation: $\operatorname{tr}(S) = \sum_{j=1}^p \lambda_j$.
Variation of the $k$th PC: $\operatorname{Var}(z_k) = \lambda_k$, so PC$_k$ explains $\lambda_k / \sum_j \lambda_j$ of the total variation.
Transforming $X$ into $Z = XV$ yields new variables that are uncorrelated and ordered by decreasing variance.
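These equivalences can be verified numerically (a minimal sketch; individual columns may differ by a sign flip):
X  <- scale(USArrests)                 ## centered and scaled data matrix
pr <- prcomp(USArrests, scale = TRUE)
## SVD: X = U D V'
sv <- svd(X)
sv$v                                   ## matches pr$rotation up to sign
head(sv$u %*% diag(sv$d))              ## matches head(pr$x) up to sign
sv$d^2 / (nrow(X) - 1)                 ## matches pr$sdev^2
## eigendecomposition of the sample covariance (here, correlation) matrix
ev <- eigen(cov(X))
ev$values                              ## matches pr$sdev^2
ev$vectors                             ## matches pr$rotation up to sign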
The least-squares estimator $\hat{\beta} = (X'X)^{-1}X'y$ has variance $\operatorname{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$.
A small eigenvalue of $X'X$ (near multicollinearity) makes this variance very large.
PC regression combats multicollinearity by using fewer PCs: regress $y$ on the first $M < p$ score vectors, dropping the low-variance directions.
Principal component regression can be fitted with pls::pcr().
[1] 142812
Data: X dimension: 263 19
Y dimension: 263 1
Fit method: svdpc
Number of components considered: 5
TRAINING: % variance explained
1 comps 2 comps 3 comps 4 comps 5 comps
X 38.31 60.16 70.84 79.03 84.29
y 40.63 41.58 42.17 43.22 44.90
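A hedged sketch of the kind of call that produces a summary like the one above. The dimensions (263 observations, 19 predictors) are consistent with the Hitters data from the ISLR2 package after removing missing salaries; treating that as the data here is an assumption, and the exact arguments used in the slides are not shown.
library(pls)
library(ISLR2)
Hitters <- na.omit(Hitters)            ## 263 complete observations: Salary plus 19 predictors
set.seed(6250)
pcr_fit <- pcr(Salary ~ ., data = Hitters, scale = TRUE,
               ncomp = 5, validation = "CV")
summary(pcr_fit)                        ## reports % variance explained in X and in y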
Kernel Principal Component Analysis https://ml-explained.com/blog/kernel-pca-explained
Probabilistic PCA
Factor Analysis
Autoencoders
t-SNE