Principal Components Analysis - How to Get the Contribution (%) of Each Parameter to a Prin.Comp

Principal Components Analysis - how to get the contribution (%) of each parameter to a Prin.Comp.?

You want the $loadings component of the returned object:

R> class(pca$loadings)
[1] "loadings"
R> pca$loadings

Loadings:
  Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
a -0.198  0.713        -0.671
b  0.600         0.334 -0.170  0.707
c -0.600        -0.334  0.170  0.707
d  0.439        -0.880 -0.180
e  0.221  0.701         0.678

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
SS loadings       1.0    1.0    1.0    1.0    1.0
Proportion Var    0.2    0.2    0.2    0.2    0.2
Cumulative Var    0.2    0.4    0.6    0.8    1.0

Note that this has a special print() method which suppresses printing of small loadings.
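
To see every loading, including the ones the print method hides, you can lower the cutoff. This is a minimal sketch, assuming pca was created with princomp() (the call is not shown above) and using the built-in USArrests data as a stand-in for the original table:

```r
# Assumption: the PCA object comes from princomp(); USArrests stands in for the original data
pca <- princomp(USArrests, cor = TRUE)

# The default print method blanks out loadings whose absolute value is below 0.1
print(pca$loadings)

# cutoff = 0 prints every loading
print(pca$loadings, cutoff = 0)

# unclass() drops the "loadings" class and returns the plain numeric matrix
full <- unclass(pca$loadings)
full
```

The unclass() route is the one used in the next step, because arithmetic is easier on a plain matrix.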

If you want this as a relative contribution, take the absolute values of the loadings (to account for negative loadings), sum them per column, and express each loading as a proportion of its column sum.

R> load <- with(pca, unclass(loadings))
R> load
      Comp.1       Comp.2      Comp.3     Comp.4        Comp.5
a -0.1980087  0.712680378  0.04606100 -0.6713848  0.000000e+00
b  0.5997346 -0.014945831  0.33353047 -0.1698602  7.071068e-01
c -0.5997346  0.014945831 -0.33353047  0.1698602  7.071068e-01
d  0.4389388  0.009625746 -0.88032515 -0.1796321  5.273559e-16
e  0.2208215  0.701104321 -0.02051507  0.6776944 -1.110223e-16

This final step then yields the proportional contribution of each variable to each principal component:

R> aload <- abs(load) ## save absolute values
R> sweep(aload, 2, colSums(aload), "/")
      Comp.1      Comp.2     Comp.3     Comp.4       Comp.5
a 0.09624979 0.490386943 0.02853908 0.35933068 0.000000e+00
b 0.29152414 0.010284050 0.20665322 0.09091055 5.000000e-01
c 0.29152414 0.010284050 0.20665322 0.09091055 5.000000e-01
d 0.21336314 0.006623362 0.54544349 0.09614059 3.728970e-16
e 0.10733880 0.482421595 0.01271100 0.36270762 7.850462e-17

R> colSums(sweep(aload, 2, colSums(aload), "/"))
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
     1      1      1      1      1
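
If you need this more than once, the two steps can be wrapped in a small helper. The function name relative_contribution is made up for this sketch, and USArrests is used only so the example runs on its own:

```r
# Hypothetical helper: column-wise share of each absolute loading
relative_contribution <- function(loadings) {
  aload <- abs(unclass(loadings))          # plain matrix of absolute loadings
  sweep(aload, 2, colSums(aload), "/")     # divide each column by its sum
}

# Assumption: princomp() on a built-in data set, standing in for the original pca object
pca <- princomp(USArrests, cor = TRUE)
contrib <- relative_contribution(pca$loadings)

# Express as percentages; every column adds up to 100
round(100 * contrib, 1)
colSums(contrib)
```

The same helper accepts the $rotation matrix of a prcomp() fit, since unclass() leaves an ordinary matrix unchanged.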

If using the preferred prcomp() then the relevant loadings are in the $rotation component:

R> pca2 <- prcomp(my_table, scale = TRUE)
R> pca2$rotation
         PC1          PC2         PC3        PC4           PC5
a -0.1980087  0.712680378 -0.04606100 -0.6713848  0.000000e+00
b  0.5997346 -0.014945831 -0.33353047 -0.1698602 -7.071068e-01
c -0.5997346  0.014945831  0.33353047  0.1698602 -7.071068e-01
d  0.4389388  0.009625746  0.88032515 -0.1796321 -3.386180e-15
e  0.2208215  0.701104321  0.02051507  0.6776944  5.551115e-17

And the relevant incantation is now:

R> aload <- abs(pca2$rotation)
R> sweep(aload, 2, colSums(aload), "/")
         PC1         PC2        PC3        PC4          PC5
a 0.09624979 0.490386943 0.02853908 0.35933068 0.000000e+00
b 0.29152414 0.010284050 0.20665322 0.09091055 5.000000e-01
c 0.29152414 0.010284050 0.20665322 0.09091055 5.000000e-01
d 0.21336314 0.006623362 0.54544349 0.09614059 2.394391e-15
e 0.10733880 0.482421595 0.01271100 0.36270762 3.925231e-17
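
The sign flips between the princomp() and prcomp() loadings shown above (third and fifth components) are expected: the sign of a principal component is arbitrary, which is why the relative contributions come out the same. A quick way to check this, sketched on the built-in USArrests data rather than the original table:

```r
# Correlation-based PCA fitted with both functions
p1 <- princomp(USArrests, cor = TRUE)
p2 <- prcomp(USArrests, scale. = TRUE)

# Compare absolute loadings; attributes (class, column names) are ignored
l1 <- abs(unclass(p1$loadings))
l2 <- abs(p2$rotation)
same_up_to_sign <- isTRUE(all.equal(l1, l2, check.attributes = FALSE))
same_up_to_sign
```

Because only absolute values enter the contribution calculation, either function can be used for it.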

How do I find the link between principal components and raw data's variables?

As @Axeman mentioned, you can look at the rotation matrix.

If you have this PCA:

pcaRes <- prcomp(df, scale. = TRUE)

Then look at the rotation:

loadings <- pcaRes$rotation

This shows how the variables contribute to the PCA axes. For example,
a negative value means the variable is negatively related to that axis.
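
To make the link to the raw data explicit: the PC scores are the centered and scaled variables multiplied by the rotation matrix. A small sketch of that check, using USArrests in place of df (an assumption, since df is not shown):

```r
# Assumption: USArrests stands in for the original df
pcaRes <- prcomp(USArrests, scale. = TRUE)

# Center and scale the raw variables the same way prcomp() did
scaled <- scale(USArrests)

# Each PC score is a weighted sum of the scaled variables, with the loadings as weights
scores <- scaled %*% pcaRes$rotation

# The hand-computed scores should match the scores stored in the fit
isTRUE(all.equal(scores, pcaRes$x, check.attributes = FALSE))
```

So a large absolute loading means that variable weighs heavily in the score for that axis.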

If you want the relative contribution of each variable, sum the absolute
values of the loadings for each PC axis, then divide each absolute loading
by its column sum

# You can do this the quick and dirty way
t(t(abs(loadings)) / rowSums(t(abs(loadings)))) * 100

# or this sweet function
sweep(x = abs(loadings), MARGIN = 2,
      STATS = colSums(abs(loadings)), FUN = "/") * 100
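
A third option that gives the same numbers is base R's prop.table() with margin = 2. This is only a sketch, with USArrests standing in for df and the object named loadings as in the answer:

```r
# Assumption: USArrests stands in for the original df
pcaRes <- prcomp(USArrests, scale. = TRUE)
loadings <- pcaRes$rotation

# sweep() version from the answer
via_sweep <- sweep(abs(loadings), 2, colSums(abs(loadings)), "/") * 100

# prop.table() divides every entry by its column total when margin = 2
via_prop <- prop.table(abs(loadings), margin = 2) * 100

# Both give the same percentages, and each column adds up to 100
isTRUE(all.equal(via_sweep, via_prop))
colSums(via_prop)
```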

PCA - how are the principal components mapped?

It's an old question... but maybe someone needs it in the future

library(stats)
data(USArrests)
PCA.USA <- prcomp(USArrests[,c(1,2,4)], scale=TRUE)
proporcionDeInfluencia <- abs(PCA.USA$rotation)
sweep(proporcionDeInfluencia, 2, colSums(proporcionDeInfluencia), "/")

More info in Principal Components Analysis - how to get the contribution (%) of each parameter to a Prin.Comp.?
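
Once you have the proportions, you can pull out the variable that dominates each component. A short sketch on the same USArrests columns (the names contrib and top_var are just for illustration):

```r
PCA.USA <- prcomp(USArrests[, c(1, 2, 4)], scale. = TRUE)

# Share of each variable in each component, based on absolute loadings
aload <- abs(PCA.USA$rotation)
contrib <- sweep(aload, 2, colSums(aload), "/")

# Name of the variable with the largest share in each component
top_var <- rownames(contrib)[apply(contrib, 2, which.max)]
names(top_var) <- colnames(contrib)
top_var

# Variables ranked by their share in the first component
sort(contrib[, "PC1"], decreasing = TRUE)
```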

How to programmatically determine the column indices of principal components using FactoMineR package?

Not sure if my interpretation of your question is correct, apologies if not. From what I gather you are using PCA as an initial tool to show you what variables are the most important in explaining the dataset. You then want to go back to your original data, select these variables quickly without manual coding each time, and use them for some other analysis.

If this is correct, the approach below saves the data behind the contribution plot, keeps the variables with the greatest contribution, and uses that result to create a new data frame containing only those variables.

# set seed for reproducibility
set.seed(17)
# function to create random strings
createRandString <- function(n = 5000) {
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}

# stringsAsFactors = TRUE makes name and studLoc factors,
# as they were under the default before R 4.0.0
df <- data.frame(ID = 1:10, name = sample(letters[1:10]),
                 studLoc = sample(createRandString(10)),
                 finalmark = sample(0:100, 10),
                 subj1mark = sample(0:100, 10), subj2mark = sample(0:100, 10),
                 stringsAsFactors = TRUE)

df.princomp <- FactoMineR::FAMD(df, graph = FALSE)

factoextra::fviz_screeplot(df.princomp, addlabels = TRUE,
                           barfill = "gray", barcolor = "black",
                           ylim = c(0, 50), xlab = "Principal Component",
                           ylab = "Percentage of explained variance",
                           main = "Principal Component (PC) for mixed variables")

# find the top contributing variables to the overall variation in the dataset
# here I am choosing the top 10 variables (although we only have 6 in our df).
# note you can specify which axes you want to look at with axes=, you can even do axes=c(1,2)

f <- factoextra::fviz_contrib(df.princomp, choice = "var",
                              axes = c(1), top = 10, sort.val = c("desc"))

# save data from contribution plot
dat <- f$data

# keep the names of the variables whose contribution is higher than, say, 20

r <- rownames(dat[dat$contrib > 20, ])

# extract these from your original data frame into a new data frame for further analysis

new <- df[r]

new

#   finalmark name    studLoc
#1         53    b POTYQ0002N
#2         73    i LWMTW1195I
#3         95    d VTUGO1685F
#4         39    f YCGGS5755N
#5         97    c GOSWE3283C
#6         58    g APBQD6181U
#7         67    a VUJOG1460V
#8         64    h YXOGP1897F
#9         15    j NFUOB6042V
#10        81    e QYTHG0783G

Based on your comment, where you said you wanted to 'Find variables with value greater than 5 in Dim.1 AND Dim.2 and save these variables to a new data frame', I would do this:

# top contributors to both Dim 1 and 2

f <- factoextra::fviz_contrib(df.princomp, choice = "var",
                              axes = c(1, 2), top = 10, sort.val = c("desc"))

# save data from contribution plot
dat <- f$data

# keep the names of the variables whose contribution is higher than 5

r <- rownames(dat[dat$contrib > 5, ])

# extract these from your original data frame into a new data frame for further analysis

new <- df[r]

new

(This keeps all the original variables in our new data frame, since each of them contributes more than 5% to the first two dimensions combined)
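
As far as I know, fviz_contrib() with two axes reports one combined contribution per variable, so if you need a strict "greater than 5 in Dim.1 AND Dim.2" you would filter on the per-dimension values instead. The sketch below shows only the filtering logic on a made-up contribution matrix. The numbers are hypothetical, not FAMD output. With FactoMineR I would expect the real matrix to be in df.princomp$var$contrib, but treat that as an assumption and check with str(df.princomp).

```r
# Hypothetical per-dimension contributions (made-up numbers, one row per variable)
contrib <- matrix(
  c(30, 25, 20, 15,  6,  4,    # Dim.1
     2, 10, 30, 28, 22,  8),   # Dim.2
  ncol = 2,
  dimnames = list(
    c("ID", "name", "studLoc", "finalmark", "subj1mark", "subj2mark"),
    c("Dim.1", "Dim.2")
  )
)

threshold <- 5

# Keep variables that exceed the threshold on both dimensions
above_both <- contrib[, "Dim.1"] > threshold & contrib[, "Dim.2"] > threshold
keep <- rownames(contrib)[above_both]
keep
```

The resulting character vector can then be used exactly like r above, that is df[keep].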

How do I extract summary of PCA as a dataframe in R using Prcomp?

What you are looking for is the importance element of summary(res.pca):

Example taken from Principal Components Analysis - how to get the contribution (%) of each parameter to a Prin.Comp.?:

a <- rnorm(10, 50, 20)
b <- seq(10, 100, 10)
c <- seq(88, 10, -8)
d <- rep(seq(3, 16, 3), 2)
e <- rnorm(10, 61, 27)

my_table <- data.frame(a, b, c, d, e)
res.pca <- prcomp(my_table, scale = TRUE)

summary(res.pca)$importance
#                          PC1    PC2    PC3     PC4       PC5
#Standard deviation     1.7882 0.9038 0.8417 0.52622 9.037e-17
#Proportion of Variance 0.6395 0.1634 0.1417 0.05538 0.000e+00
#Cumulative Proportion  0.6395 0.8029 0.9446 1.00000 1.000e+00

class(summary(res.pca)$importance)
#[1] "matrix" "array"   (R >= 4.0.0; older versions print just "matrix")
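
Since the importance element is a matrix, as.data.frame() turns it into the data frame asked for. A minimal sketch, using USArrests instead of my_table so that it is reproducible (the names imp_df, imp_long and prop_var are only illustrative):

```r
# Assumption: USArrests stands in for my_table
res.pca <- prcomp(USArrests, scale. = TRUE)
imp <- summary(res.pca)$importance

# One column per component, one row per statistic
imp_df <- as.data.frame(imp)

# Transposed: one row per component, often handier for plotting or joining
imp_long <- as.data.frame(t(imp))
imp_long

# The proportions can also be computed directly from the standard deviations
prop_var <- res.pca$sdev^2 / sum(res.pca$sdev^2)
prop_var
```

The values in summary() are rounded, so prop_var agrees with the "Proportion of Variance" row only up to that rounding.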

N.B.:
When you want to "study" an object, it can be convenient to use str on it. Here, you can run str(summary(res.pca)) to see where the information is stored and hence how to get what you want:

str(summary(res.pca))

List of 6
 $ sdev      : num [1:5] 1.79 9.04e-01 8.42e-01 5.26e-01 9.04e-17
 $ rotation  : num [1:5, 1:5] 0.278 0.512 -0.512 0.414 -0.476 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:5] "a" "b" "c" "d" ...
  .. ..$ : chr [1:5] "PC1" "PC2" "PC3" "PC4" ...
 $ center    : Named num [1:5] 34.9 55 52 9 77.8
  ..- attr(*, "names")= chr [1:5] "a" "b" "c" "d" ...
 $ scale     : Named num [1:5] 22.4 30.28 24.22 4.47 26.11
  ..- attr(*, "names")= chr [1:5] "a" "b" "c" "d" ...
 $ x         : num [1:10, 1:5] -2.962 -1.403 -1.653 -0.537 1.186 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:5] "PC1" "PC2" "PC3" "PC4" ...
 $ importance: num [1:3, 1:5] 1.788 0.64 0.64 0.904 0.163 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "Standard deviation" "Proportion of Variance" "Cumulative Proportion"
  .. ..$ : chr [1:5] "PC1" "PC2" "PC3" "PC4" ...
 - attr(*, "class")= chr "summary.prcomp"

