Divide Row Value by Aggregated Sum in R Data.Frame


There are various ways of solving this. Here's one, using base R's ave():

# ave() applies FUN to y within each level of x, keeping the original row order
with(dat, ave(y, x, FUN = function(v) v/sum(v)))
## [1] 0.3750000 0.6666667 0.4444444 0.5555556 0.3333333 0.6250000
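The question's `dat` isn't shown. The sketch below rebuilds it from the printed data.table output (an assumption about the original data) so the `ave()` call runs end to end. It also shows an equivalent form that divides `y` by its group total, computed with `ave(..., FUN = sum)`.

```r
# Reconstructed from the printed output (assumed, not the original data)
dat <- data.frame(x = c(1, 2, 3, 3, 2, 1),
                  y = c(3, 4, 4, 5, 2, 5))

# ave() applies FUN within each level of x and returns a vector
# aligned with the original rows
z1 <- with(dat, ave(y, x, FUN = function(v) v / sum(v)))

# Equivalent: divide y by its group total, repeated on every row
z2 <- with(dat, y / ave(y, x, FUN = sum))

print(z1)
# Each group's shares should add up to 1
print(tapply(z1, dat$x, sum))
```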

Here's another possibility, using data.table:

library(data.table)
setDT(dat)[, z := y/sum(y), by = x]
dat
# x y z
# 1: 1 3 0.3750000
# 2: 2 4 0.6666667
# 3: 3 4 0.4444444
# 4: 3 5 0.5555556
# 5: 2 2 0.3333333
# 6: 1 5 0.6250000

Here's a third one, using dplyr:

library(dplyr)
dat %>%
  group_by(x) %>%
  mutate(z = y/sum(y))

# Source: local data frame [6 x 3]
# Groups: x
#
# x y z
# 1 1 3 0.3750000
# 2 2 4 0.6666667
# 3 3 4 0.4444444
# 4 3 5 0.5555556
# 5 2 2 0.3333333
# 6 1 5 0.6250000

Summing rows of specific columns, then dividing by the sum

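Select the columns by subsetting Df, then call apply() with MARGIN = 1 so the function runs on each row. apply() returns one column per input row, so t() transposes the result back to the original orientation. (The \(x) lambda shorthand needs R 4.1.0 or later.)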

Df_new = t(apply(Df[, 1:3], 1, \(x) x/sum(x)))

lose draw win
[1,] 0.5000 0.1428571 0.3571429
[2,] 0.0625 0.1250000 0.8125000
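A vectorized alternative skips apply() and the transpose: dividing the selected columns by `rowSums()` divides every row by its own total and keeps the data frame, and `prop.table(..., margin = 1)` does the same for a matrix. This is a sketch with hypothetical counts (the question's `Df` is not shown), chosen so the proportions match the output above.

```r
# Hypothetical counts; the question's Df is not shown
Df <- data.frame(lose = c(7, 1), draw = c(2, 2), win = c(5, 13))

# apply() over rows (MARGIN = 1) returns one column per row, hence t()
via_apply <- t(apply(Df[, 1:3], 1, function(x) x / sum(x)))

# Vectorized: a data frame divided by a vector of length nrow(Df)
# recycles down each column, so every row is divided by its own sum
via_rowsums <- Df[, 1:3] / rowSums(Df[, 1:3])

# Same result on a matrix
via_prop <- prop.table(as.matrix(Df[, 1:3]), margin = 1)

print(via_rowsums)
```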

How to divide each element in a row by corresponding row value?

Here is one option with the tidyverse. We divide every column except 'Ac' by 'Ac', then use summarise_all to return each column's sum if it has any non-NA element, or NA otherwise:

library(tidyverse)
df %>%
  transmute_at(-1, list(~ ./Ac)) %>%
  summarise_all(list(~ if (all(is.na(.))) NA else sum(., na.rm = TRUE)))
# V1 V2 V3 V4 V5 V6 V7
#1 NA 0 9.821429 3.690476 0 0.8484848 0.9188312

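It can also be done in a single step. Because Ac is excluded by -1, it is not summarised and can still be referenced inside the function: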

df %>%
  summarise_at(-1, list(~ if (all(is.na(.))) NA else sum(./Ac, na.rm = TRUE)))
# V1 V2 V3 V4 V5 V6 V7
#1 NA 0 9.821429 3.690476 0 0.8484848 0.9188312
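For comparison, the same column-wise rule can be sketched in base R with `sapply()`, using the values from the question's data section below.

```r
# Data from the question's data section
df <- data.frame(Ac = c(6.6, 8.4),
                 V1 = c(NA_real_, NA_real_), V2 = c(NA, 0),
                 V3 = c(NA, 82.5), V4 = c(NA, 31), V5 = c(0, 0),
                 V6 = c(5.6, 0), V7 = c(5.2, 1.1))

# For every column except Ac: NA if the column is all NA,
# otherwise the sum of x / Ac with missing values dropped
res <- sapply(df[-1], function(x) {
  if (all(is.na(x))) NA_real_ else sum(x / df$Ac, na.rm = TRUE)
})

print(res)
```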

Update

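Based on the comments, the rule was refined: a column that is entirely NA still returns NA; a column with exactly one NA returns the sum of x/Ac as before; otherwise the result is the Ac-weighted mean, sum(Ac * x)/sum(Ac):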

df %>%
  summarise_at(-1, list(~ if (all(is.na(.))) NA
                          else if (sum(is.na(.)) == 1) sum(./Ac, na.rm = TRUE)
                          else sum(Ac * ., na.rm = TRUE)/sum(Ac, na.rm = TRUE)))
# V1 V2 V3 V4 V5 V6 V7
#1 NA 0 9.821429 3.690476 0 2.464 2.904
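The updated rule reads more easily as a small named function. `combine_col` below is a hypothetical helper, not part of any package; it is applied to each non-Ac column with `sapply()`, using the question's data. As in the answer, the weighted-mean branch divides by the sum of all Ac values.

```r
df <- data.frame(Ac = c(6.6, 8.4),
                 V1 = c(NA_real_, NA_real_), V2 = c(NA, 0),
                 V3 = c(NA, 82.5), V4 = c(NA, 31), V5 = c(0, 0),
                 V6 = c(5.6, 0), V7 = c(5.2, 1.1))

# Hypothetical helper encoding the updated rule:
#   all NA       -> NA
#   exactly 1 NA -> sum(x / w), NAs dropped
#   otherwise    -> w-weighted mean, sum(w * x) / sum(w)
combine_col <- function(x, w) {
  if (all(is.na(x))) {
    NA_real_
  } else if (sum(is.na(x)) == 1) {
    sum(x / w, na.rm = TRUE)
  } else {
    sum(w * x, na.rm = TRUE) / sum(w, na.rm = TRUE)
  }
}

res2 <- sapply(df[-1], combine_col, w = df$Ac)
print(res2)
```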

The same method can be translated to data.table as well:

library(data.table)
setDT(df)[, lapply(.SD, function(x) if (all(is.na(x))) NA
                   else sum(x/Ac, na.rm = TRUE)), .SDcols = 2:ncol(df)]
# V1 V2 V3 V4 V5 V6 V7
#1: NA 0 9.821429 3.690476 0 0.8484848 0.9188312

Updated data.table solution

setDT(df)[, lapply(.SD, function(x) if (all(is.na(x))) NA
                   else if (sum(is.na(x)) == 1) sum(x/Ac, na.rm = TRUE)
                   else sum(Ac * x, na.rm = TRUE)/sum(Ac, na.rm = TRUE)), .SDcols = 2:ncol(df)]
# V1 V2 V3 V4 V5 V6 V7
#1: NA 0 9.821429 3.690476 0 2.464 2.904

data

df <- structure(list(Ac = c(6.6, 8.4), V1 = c(NA_real_, NA_real_),
    V2 = c(NA, 0), V3 = c(NA, 82.5), V4 = c(NA, 31), V5 = c(0, 0),
    V6 = c(5.6, 0), V7 = c(5.2, 1.1)), class = "data.frame",
    row.names = c(NA, -2L))

Adding row values then dividing between data frames

For this, we can also use base R. Take the colSums of the same subset of rows (all rows except the 2nd) in both datasets and divide them, then rbind the result with the ratio of the 2nd rows of each dataset:

cbind(Type = c('A', 'B'),
      rbind.data.frame(colSums(df1[-2, -1]) / colSums(df2[-2, -1]),
                       df1[2, -1] / df2[2, -1]))
# Type 2016 2017
#1 A 0.5 0.3555556
#2 B 0.3 0.5000000

Here, the subsetting uses row and column indices:

df2[-2, -1] 

means we drop the 2nd row and the 1st column. Indexing is [row, column]: positive indices keep the listed rows/columns, while negative indices, as here, remove them.
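A quick way to see what negative indexing keeps is to print the subset. Using `df2` from the data below, `df2[-2, -1]` selects the same block as the positive indices `df2[c(1, 3), 2:3]`:

```r
df2 <- data.frame(ID = 1:3, X2016 = c(20L, 50L, 10L), X2017 = c(30L, 40L, 15L))

kept <- df2[-2, -1]          # drop row 2 and column 1 (ID)
same <- df2[c(1, 3), 2:3]    # keep rows 1 and 3, columns 2 and 3

print(kept)
print(colSums(kept))         # column totals of rows 1 and 3
```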

data

df1 <- structure(list(ID = 1:3, `2016` = c(5L, 15L, 10L), `2017` = c(6L, 
20L, 10L)), class = "data.frame", row.names = c(NA, -3L))

df2 <- structure(list(ID = 1:3, X2016 = c(20L, 50L, 10L), X2017 = c(30L,
40L, 15L)), class = "data.frame", row.names = c(NA, -3L))

Divide the data frame by the sum of each row

Here's a simple way with a for loop. I'll assume you have a list of column indices for each group:

groups = list(c(1, 2), c(3, 4))

result = dd
for (g in groups) {
  result[g] = dd[g] / rowSums(dd[g])
}

result
# a b c d
# 1 0.3333333 0.6666667 0.4285714 0.5714286
# 2 0.2500000 0.7500000 0.5555556 0.4444444

You could also use lapply like this:

result2 = do.call(cbind, lapply(groups, function(g) dd[g] / rowSums(dd[g])))

Using this input data:

dd = read.table(text = "a   b   c   d
1 2 3 4
1 3 5 4", header = TRUE)
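Groups can also be given as column names rather than positions; the loop itself is unchanged. A quick sanity check is that, within each group, every row of the result sums to 1. This sketch reuses the input data above.

```r
dd <- read.table(text = "a b c d
1 2 3 4
1 3 5 4", header = TRUE)

# Column groups by name (equivalent to list(c(1, 2), c(3, 4)))
groups <- list(c("a", "b"), c("c", "d"))

result <- dd
for (g in groups) {
  # the vector of row sums is recycled down each column in the group
  result[g] <- dd[g] / rowSums(dd[g])
}

# Within each group, every row should now sum to 1
print(sapply(groups, function(g) rowSums(result[g])))
```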

Group and divide multiple values

You shouldn't group by `area(km^2)`; group by the other columns only, so that each area is divided by its group's total:

library(dplyr)
df %>%
  group_by(year, country, disastertype) %>%
  mutate(proportion = `area(km^2)` / sum(`area(km^2)`)) %>%
  ungroup()
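If you prefer base R, `ave()` computes the same group totals without changing the row order. The data below is hypothetical (the question's `df` is not shown); `check.names = FALSE` keeps the `area(km^2)` column name intact.

```r
# Hypothetical data using the column names from the question
df <- data.frame(
  year = c(2000, 2000, 2000, 2001),
  country = c("A", "A", "A", "A"),
  disastertype = c("flood", "flood", "storm", "flood"),
  `area(km^2)` = c(10, 30, 5, 8),
  check.names = FALSE
)

area <- df[["area(km^2)"]]
# Total area for each row's (year, country, disastertype) combination
group_total <- ave(area, df$year, df$country, df$disastertype, FUN = sum)
df$proportion <- area / group_total

print(df)
```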

Divide one value in a data.frame by another in an alternate data.frame based on row and column metadata

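We could use an update join here: each row of df1 is matched to the row of df2 whose Category equals its Gene, and every V column in df1 is divided by the corresponding column of that row. The V columns are first converted to numeric so they can hold the fractional results.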

library(data.table)
nm1 <- paste0("V", 1:4)
setDT(df1)[, (nm1) := lapply(.SD, as.numeric), .SDcols = nm1]
df1[df2, (nm1) := Map(`/`, mget(nm1), mget(paste0("i.", nm1))),
    on = .(Gene = Category)]

-output

 df1
Gene Transcript_ID V1 V2 V3 V4
1: ENSG00000000003.14 ENST00000612152.4 0 0.6666667 0 1
2: ENSG00000000003.14 ENST00000373020.8 1 0.0000000 1 0
3: ENSG00000000003.14 ENST00000614008.4 0 0.0000000 0 0
4: ENSG00000000003.14 ENST00000496771.5 0 0.3333333 0 0

data

df1 <- structure(list(Gene = c("ENSG00000000003.14", "ENSG00000000003.14", 
"ENSG00000000003.14", "ENSG00000000003.14"), Transcript_ID = c("ENST00000612152.4",
"ENST00000373020.8", "ENST00000614008.4", "ENST00000496771.5"
), V1 = c(0L, 4L, 0L, 0L), V2 = c(6L, 0L, 0L, 3L), V3 = c(0L,
5L, 0L, 0L), V4 = c(3L, 0L, 0L, 0L)), class = "data.frame", row.names = c("1",
"2", "3", "4"))

df2 <- structure(list(Category = c("ENSG00000000003.14", "ENSG00000000005.6",
"ENSG00000000419.12", "ENSG00000000457.14"), V1 = c(4, 0, 61,
577.01), V2 = c(9, 0, 94, 698.2), V3 = c(5, 0, 103, 815.49),
V4 = c(3, 0, 71, 697.72)), class = "data.frame", row.names = c("1",
"2", "3", "4"))

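A base-R version of the same lookup can be sketched with `match()`: for each row of df1 it finds the df2 row whose Category equals Gene, then divides the V columns element-wise. One difference from the data.table update join: rows of df1 with no matching Category become NA here, whereas the join leaves them unchanged.

```r
df1 <- data.frame(
  Gene = rep("ENSG00000000003.14", 4),
  Transcript_ID = c("ENST00000612152.4", "ENST00000373020.8",
                    "ENST00000614008.4", "ENST00000496771.5"),
  V1 = c(0L, 4L, 0L, 0L), V2 = c(6L, 0L, 0L, 3L),
  V3 = c(0L, 5L, 0L, 0L), V4 = c(3L, 0L, 0L, 0L)
)
df2 <- data.frame(
  Category = c("ENSG00000000003.14", "ENSG00000000005.6",
               "ENSG00000000419.12", "ENSG00000000457.14"),
  V1 = c(4, 0, 61, 577.01), V2 = c(9, 0, 94, 698.2),
  V3 = c(5, 0, 103, 815.49), V4 = c(3, 0, 71, 697.72)
)

nm1 <- paste0("V", 1:4)
# Row of df2 whose Category matches each df1 Gene (NA if none)
idx <- match(df1$Gene, df2$Category)
# Element-wise division of the matched blocks; the V columns become double
df1[nm1] <- as.matrix(df1[nm1]) / as.matrix(df2[idx, nm1])

print(df1)
```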
