Dividing Each Cell in a Data Set by the Column Sum in R

Given this:

> d = data.frame(sample=c("a2","a3"),a=c(1,5),b=c(4,5),c=c(6,4))
> d
  sample a b c
1     a2 1 4 6
2     a3 5 5 4

You can replace every column other than the first by applying over the rest:

> d[,-1] = apply(d[,-1],2,function(x){x/sum(x)})

> d
  sample         a         b   c
1     a2 0.1666667 0.4444444 0.6
2     a3 0.8333333 0.5555556 0.4

If you don't want d to be overwritten, make a copy beforehand.
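A minimal sketch of the copy-first variant, assuming the same d as above (the name d_prop is just illustrative). Assigning d to a new name is enough, because R copies on modify. lapply() also works column by column on the data frame itself instead of converting it to a matrix first, as apply() does.

```r
d <- data.frame(sample = c("a2", "a3"), a = c(1, 5), b = c(4, 5), c = c(6, 4))

# Work on a copy so the original d is left untouched
d_prop <- d
# Divide every column except the first by its own sum
d_prop[-1] <- lapply(d_prop[-1], function(x) x / sum(x))

d_prop
colSums(d_prop[-1])  # each column should now sum to 1
```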

Summing rows of specific columns then dividing by the sum

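Select the columns by subsetting Df, then apply over rows with MARGIN = 1. Because apply() returns each row's result as a column, transpose the result back with t(). (The \(x) shorthand for function(x) needs R 4.1.0 or later.)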

Df_new = t(apply(Df[,c(1:3)], 1, \(x) x/sum(x)))

       lose      draw       win
[1,] 0.5000 0.1428571 0.3571429
[2,] 0.0625 0.1250000 0.8125000
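For a plain row-wise division there is a vectorized alternative that avoids apply() and the transpose: dividing by rowSums() works because R recycles a length-nrow vector down each column, so every element in row i is divided by the i-th row sum. The Df below is a small hypothetical stand-in with lose, draw, and win columns.

```r
# Hypothetical stand-in for Df: one row per team, counts of results
Df <- data.frame(lose = c(7, 1), draw = c(2, 2), win = c(5, 13))

# apply() over rows returns the rows as columns, hence the t()
via_apply <- t(apply(Df[, 1:3], 1, function(x) x / sum(x)))

# Vectorized alternative: rowSums() has length nrow(Df) and is recycled
# down each column, so row i is divided by the i-th row sum
via_rowsums <- as.matrix(Df[, 1:3] / rowSums(Df[, 1:3]))

all.equal(via_apply, via_rowsums, check.attributes = FALSE)
```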

Dividing columns by colSums in R

See ?sweep, e.g.:

> sweep(m,2,colSums(m),`/`)
           [,1]      [,2]      [,3]
[1,] 0.08333333 0.1333333 0.1666667
[2,] 0.33333333 0.3333333 0.3333333
[3,] 0.58333333 0.5333333 0.5000000

Alternatively, transpose the matrix so that colSums(m) is recycled along the right dimension, then transpose the result back:

> t(t(m)/colSums(m))
           [,1]      [,2]      [,3]
[1,] 0.08333333 0.1333333 0.1666667
[2,] 0.33333333 0.3333333 0.3333333
[3,] 0.58333333 0.5333333 0.5000000
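To see why the transposes are needed: R recycles a vector down the columns of a matrix, so m / colSums(m) divides row i by the i-th column sum, which is not what is wanted. Transposing turns each column into a row, and the recycling then lines up. A minimal check, assuming m is matrix(1:9, nrow = 3, byrow = TRUE), which reproduces the outputs shown in this answer:

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# Wrong: colSums(m) is recycled down each column, so row i of m
# is divided by the i-th *column* sum
wrong <- m / colSums(m)

# Right: in t(m) each original column is a row, so recycling divides
# original column j by colSums(m)[j]; transpose back afterwards
right <- t(t(m) / colSums(m))

colSums(wrong)  # not all 1
colSums(right)  # all 1
```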

Or use prop.table() with margin = 2, which does the same thing:

> prop.table(m,2)
           [,1]      [,2]      [,3]
[1,] 0.08333333 0.1333333 0.1666667
[2,] 0.33333333 0.3333333 0.3333333
[3,] 0.58333333 0.5333333 0.5000000

The timing differences are small. sweep() and the t() trick are the more flexible solutions; prop.table() only covers this particular case of dividing by margin sums.
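As an illustration of that flexibility (a sketch, not a benchmark): sweep() accepts any summary statistic and any binary function, so the same call pattern also handles, for example, column centering. prop.table() (also available as proportions() since R 4.0.0) only divides by margin sums, although it can do so for rows with margin = 1. The matrix m is assumed to be the same 3x3 example as above.

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# Column proportions, as above
sweep(m, 2, colSums(m), `/`)

# Same pattern, different statistic and operator: center each column
centered <- sweep(m, 2, colMeans(m), `-`)

# Row proportions: MARGIN = 1 in sweep(), margin = 1 in prop.table()
row_sweep <- sweep(m, 1, rowSums(m), `/`)
row_prop  <- prop.table(m, 1)

colMeans(centered)  # all 0
all.equal(row_sweep, row_prop)
```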

Dividing cell with sum of every nth cell in same column in R

You can get your "dream data frame" by grouping on Country and dividing each count column by its within-group sum with prop.table:

library(dplyr)

df %>%
group_by(Country) %>%
mutate(across(LT5F:Y9t14T, prop.table)) %>%
ungroup

#    Country LT5F  LT5M  LT5T  Y9t14F Y9t14M Y9t14T
#    <chr>   <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
#  1 AL      0.4   0.357 0.375 0.333  0.0909 0.2
#  2 AL      0.2   0.214 0.208 0.222  0.455  0.35
#  3 AL      0.1   0.286 0.208 0.111  0.273  0.2
#  4 AL      0.3   0.143 0.208 0.333  0.182  0.25
#  5 FR      0.25  0.2   0.222 0.263  0.25   0.257
#  6 FR      0.125 0.1   0.111 0.158  0.375  0.257
#  7 FR      0.5   0     0.222 0.368  0.0625 0.229
#  8 FR      0.125 0.7   0.444 0.211  0.312  0.257
#  9 UK      0.286 0.5   0.385 0.231  0.214  0.222
# 10 UK      0.143 0.333 0.231 0.231  0.286  0.259
# 11 UK      0.286 0.167 0.231 0.154  0.286  0.222
# 12 UK      0.286 0     0.154 0.385  0.214  0.296
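If you would rather not depend on dplyr, the same grouped division can be sketched in base R with ave(), which applies a function within each group and returns a vector in the original row order. The small df below is a hypothetical stand-in with only two of the count columns; substitute your own column names.

```r
# Hypothetical stand-in for the real data: two countries, two count columns
df <- data.frame(
  Country = c("AL", "AL", "FR", "FR", "FR"),
  LT5F    = c(2, 3, 1, 1, 2),
  Y9t14F  = c(4, 4, 3, 0, 1)
)

cols <- c("LT5F", "Y9t14F")

# For each column, divide every value by the sum of that column
# within its Country group; ave() keeps the original row order
df[cols] <- lapply(df[cols], function(x) ave(x, df$Country, FUN = prop.table))

df
```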

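If your data contain NAs, use an explicit sum(., na.rm = TRUE) instead, because prop.table() returns NA for the whole group as soon as one value is missing: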

library(dplyr)

df %>%
group_by(Country) %>%
mutate(across(LT5F:Y9t14T, ~./sum(., na.rm = TRUE))) %>%
ungroup
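The reason for the explicit na.rm = TRUE: prop.table(x) on a plain vector is x / sum(x), and a single NA makes sum(x) NA. A small base R check on a hypothetical vector:

```r
x <- c(2, NA, 6)

prop.table(x)              # every element is NA
x / sum(x, na.rm = TRUE)   # NA stays NA, the rest are divided by 8
```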

Divide each cell of a large matrix by the sum of its row

You could do this with apply(), but scale() makes it even simpler here. Assuming you want to divide each column by its sum:

set.seed(0)
relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)

freqs <- scale(relative_abundance, center = FALSE,
scale = colSums(relative_abundance))

The matrix is too big to print in full, but the first rows of the first five columns should look like this:

> head(freqs[, 1:5])
            [,1]         [,2]        [,3]        [,4]         [,5]
[1,] 0.004409603 0.0014231499 0.003439803 0.004052685 0.0024026910
[2,] 0.001469868 0.0023719165 0.002457002 0.005065856 0.0004805382
[3,] 0.001959824 0.0018975332 0.004914005 0.001519757 0.0043248438
[4,] 0.002939735 0.0042694497 0.002948403 0.002532928 0.0009610764
[5,] 0.004899559 0.0009487666 0.000982801 0.001519757 0.0028832292
[6,] 0.001469868 0.0023719165 0.002457002 0.002026342 0.0009610764

And a sanity check:

> head(colSums(freqs))
[1] 1 1 1 1 1 1

Using apply:

freqs2 <- apply(relative_abundance, 2, function(i) i/sum(i))

This has the advantage of being easily changed to run over rows (MARGIN = 1), but apply() always returns the per-row results as columns, so you would have to transpose the result.
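Since the question title asks about row sums, here is a sketch of the row-wise version on the same simulated matrix. It shows the apply() route with the transpose and the shorter division by rowSums(), which works because a length-nrow vector is recycled down each column. It also compares the two column-wise results; scale() attaches a "scaled:scale" attribute, so that comparison ignores attributes.

```r
set.seed(0)
relative_abundance <- matrix(sample(1:10, 360 * 375, TRUE), nrow = 375)

# Row-wise: apply() returns one column per row, so transpose back
rows_apply <- t(apply(relative_abundance, 1, function(i) i / sum(i)))

# Row-wise, vectorized: rowSums() is recycled down each column
rows_vec <- relative_abundance / rowSums(relative_abundance)

# Column-wise results from scale() and apply(), as in the answer
freqs  <- scale(relative_abundance, center = FALSE,
                scale = colSums(relative_abundance))
freqs2 <- apply(relative_abundance, 2, function(i) i / sum(i))

all.equal(rows_apply, rows_vec)
range(rowSums(rows_vec))  # both ends should be (numerically) 1
all.equal(freqs, freqs2, check.attributes = FALSE)
```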

Column sum in mutate function in R

You are very close.
Is this what you want?

library(dplyr)
head(iris[1:4]) %>% mutate(across(1:4, function(x) x/sum(x)))

Output:

  Sepal.Length Sepal.Width Petal.Length Petal.Width
1    0.1717172   0.1724138    0.1609195   0.1428571
2    0.1649832   0.1477833    0.1609195   0.1428571
3    0.1582492   0.1576355    0.1494253   0.1428571
4    0.1548822   0.1527094    0.1724138   0.1428571
5    0.1683502   0.1773399    0.1609195   0.1428571
6    0.1818182   0.1921182    0.1954023   0.2857143

How do I Divide Values in a Column by the Value in the Last Cell?

You can try

df$CFn <- with(df,CumFreq/sum(Freq))

or

df$CFn <- with(df,CumFreq/tail(CumFreq,1))
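The two lines give the same result whenever CumFreq is the running total of Freq, because the last element of a cumulative sum is the grand total. A quick check with a hypothetical df:

```r
# Hypothetical frequency table
df <- data.frame(Freq = c(3, 5, 2))
df$CumFreq <- cumsum(df$Freq)

a <- with(df, CumFreq / sum(Freq))
b <- with(df, CumFreq / tail(CumFreq, 1))

all.equal(a, b)  # TRUE
a                # 0.3 0.8 1.0
```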

Divide row value by aggregated sum in R data.frame

There are various ways to solve this; here's one using base R's ave():

with(dat, ave(y, x, FUN = function(v) v/sum(v)))
## [1] 0.3750000 0.6666667 0.4444444 0.5555556 0.3333333 0.6250000
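The answer doesn't define dat, but it can be read off the printed data.table result below. A sketch that recreates it and cross-checks ave() against an equivalent form, dividing y by the group totals that ave(y, x, FUN = sum) spreads back onto every row:

```r
# dat as printed in the data.table output below
dat <- data.frame(x = c(1, 2, 3, 3, 2, 1), y = c(3, 4, 4, 5, 2, 5))

# Divide y by its sum within each level of x
z1 <- with(dat, ave(y, x, FUN = function(v) v / sum(v)))

# Same thing: ave() with sum returns each row's group total
z2 <- with(dat, y / ave(y, x, FUN = sum))

all.equal(z1, z2)
```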

Here's another possibility, using data.table:

library(data.table)
setDT(dat)[, z := y/sum(y), by = x]
dat
#    x y         z
# 1: 1 3 0.3750000
# 2: 2 4 0.6666667
# 3: 3 4 0.4444444
# 4: 3 5 0.5555556
# 5: 2 2 0.3333333
# 6: 1 5 0.6250000

Here's a third one, using dplyr:

library(dplyr)
dat %>%
group_by(x) %>%
mutate(z = y/sum(y))

# Source: local data frame [6 x 3]
# Groups: x
#
#   x y         z
# 1 1 3 0.3750000
# 2 2 4 0.6666667
# 3 3 4 0.4444444
# 4 3 5 0.5555556
# 5 2 2 0.3333333
# 6 1 5 0.6250000

