Divide row value by aggregated sum in R data.frame
There are several ways to solve this. Here's one in base R, using ave:
with(dat, ave(y, x, FUN = function(v) v/sum(v)))
## [1] 0.3750000 0.6666667 0.4444444 0.5555556 0.3333333 0.6250000
Here's another possibility, using data.table:
library(data.table)
setDT(dat)[, z := y/sum(y), by = x]
dat
# x y z
# 1: 1 3 0.3750000
# 2: 2 4 0.6666667
# 3: 3 4 0.4444444
# 4: 3 5 0.5555556
# 5: 2 2 0.3333333
# 6: 1 5 0.6250000
Here's a third one, using dplyr:
library(dplyr)
dat %>%
  group_by(x) %>%
  mutate(z = y/sum(y))
# Source: local data frame [6 x 3]
# Groups: x
#
# x y z
# 1 1 3 0.3750000
# 2 2 4 0.6666667
# 3 3 4 0.4444444
# 4 3 5 0.5555556
# 5 2 2 0.3333333
# 6 1 5 0.6250000
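For a self-contained check of the approaches above, here is a minimal base R sketch. The original dat was not shown, so the values below are inferred from the printed output and should be treated as an assumption. prop.table(v) on a plain vector is equivalent to v / sum(v).

```r
# dat is reconstructed from the printed output above (an assumption)
dat <- data.frame(x = c(1, 2, 3, 3, 2, 1),
                  y = c(3, 4, 4, 5, 2, 5))

# Within each group of x, divide y by the group total
dat$z <- ave(dat$y, dat$x, FUN = prop.table)
dat

# Sanity check: the proportions within each group should add up to 1
tapply(dat$z, dat$x, sum)
```

The tapply call is only a sanity check and can be dropped in real code.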
Summing rows of specific columns then dividing by the sum
Select the columns by subsetting Df, then pass 1 as the margin to apply so that the function runs over each row:
Df_new = t(apply(Df[,c(1:3)], 1, \(x) x/sum(x)))
lose draw win
[1,] 0.5000 0.1428571 0.3571429
[2,] 0.0625 0.1250000 0.8125000
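The same row proportions can probably also be computed without apply and t, because dividing a data frame by a vector of length nrow recycles that vector down each column. The Df below is hypothetical: the original data was not shown, so the values (and the extra id column) are chosen only to be consistent with the printed output.

```r
# Hypothetical data consistent with the printed proportions; id stands in
# for any extra column that should be left out of the calculation
Df <- data.frame(lose = c(7, 1), draw = c(2, 2), win = c(5, 13),
                 id = c(10, 20))

cols <- c("lose", "draw", "win")

# Each element is divided by the sum of its own row
Df_new <- Df[cols] / rowSums(Df[cols])
Df_new
```

Unlike the apply version, this returns a data frame rather than a matrix.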
How to divide each element in a row by corresponding row value?
Here is one option with tidyverse. We divide all the columns except 'Ac' by 'Ac', then use summarise_all to return the sum if any non-NA element is present, or else return NA
library(tidyverse)
df %>%
  transmute_at(-1, list(~ ./Ac)) %>%
  summarise_all(list(~ if(all(is.na(.))) NA else sum(., na.rm = TRUE)))
# V1 V2 V3 V4 V5 V6 V7
#1 NA 0 9.821429 3.690476 0 0.8484848 0.9188312
It can also be done in a single step:
df %>%
  summarise_at(-1, list(~ if(all(is.na(.))) NA else sum(./Ac, na.rm = TRUE)))
# V1 V2 V3 V4 V5 V6 V7
#1 NA 0 9.821429 3.690476 0 0.8484848 0.9188312
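Update
Based on the comments: if every value in a column is NA, return NA; if exactly one value is NA, return the sum of the values divided by 'Ac'; otherwise return sum(Ac * .)/sum(Ac), the 'Ac'-weighted mean
df %>%
  summarise_at(-1, list(~ if(all(is.na(.))) NA
      else if(sum(is.na(.)) == 1) (sum(./Ac, na.rm = TRUE))
      else (sum(Ac * ., na.rm = TRUE)/sum(Ac, na.rm = TRUE))))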
# V1 V2 V3 V4 V5 V6 V7
#1 NA 0 9.821429 3.690476 0 2.464 2.904
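To see where the changed values come from: V6 and V7 contain no NA, so they fall through to the last branch, the 'Ac'-weighted mean. A quick base R check, using the values from the data section of this answer:

```r
Ac <- c(6.6, 8.4)
V6 <- c(5.6, 0)
V7 <- c(5.2, 1.1)

# Ac-weighted mean: sum(Ac * v) / sum(Ac)
sum(Ac * V6) / sum(Ac)   # approximately 2.464
sum(Ac * V7) / sum(Ac)   # approximately 2.904

# stats::weighted.mean gives the same result when there are no NAs
weighted.mean(V6, Ac)
```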
The same method can be translated to data.table as well
library(data.table)
setDT(df)[, lapply(.SD, function(x) if(all(is.na(x))) NA
    else sum(x/Ac, na.rm = TRUE)), .SDcols = 2:ncol(df)]
# V1 V2 V3 V4 V5 V6 V7
#1: NA 0 9.821429 3.690476 0 0.8484848 0.9188312
Updated data.table solution
setDT(df)[, lapply(.SD, function(x) if(all(is.na(x))) NA
    else if(sum(is.na(x)) == 1) (sum(x/Ac, na.rm = TRUE))
    else (sum(Ac * x, na.rm = TRUE)/sum(Ac, na.rm = TRUE))), .SDcols = 2:ncol(df)]
# V1 V2 V3 V4 V5 V6 V7
#1: NA 0 9.821429 3.690476 0 2.464 2.904
data
df <- structure(list(Ac = c(6.6, 8.4), V1 = c(NA_real_, NA_real_),
    V2 = c(NA, 0), V3 = c(NA, 82.5), V4 = c(NA, 31), V5 = c(0, 0),
    V6 = c(5.6, 0), V7 = c(5.2, 1.1)), class = "data.frame",
    row.names = c(NA, -2L))
Adding row values then dividing between data frames
For this, we can also use base R. Get the colSums of a subset of rows of both datasets, divide them, and rbind the result with the division of the 2nd rows of each dataset
cbind(Type = c('A', 'B'),
      rbind.data.frame(colSums(df1[-2, -1])/colSums(df2[-2, -1]),
                       df1[2, -1]/df2[2, -1]))
# Type 2016 2017
#1 A 0.5 0.3555556
#2 B 0.3 0.5000000
Here, rows and columns are subset by index. df2[-2, -1] means that we remove the 2nd row and the first column. The indexing is [row, column]. A positive index keeps those rows/columns; a negative index, as used here, removes them.
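A small illustration of the indexing, as a sketch. The data frame d is a stand-in with the same values as df2 in the data section:

```r
# Same values as df2 in the data section
d <- data.frame(ID = 1:3, X2016 = c(20L, 50L, 10L), X2017 = c(30L, 40L, 15L))

# Negative indices drop: row 2 and column 1 are removed,
# leaving rows 1 and 3 of X2016 and X2017
d[-2, -1]

# Positive row index keeps: only row 2, still without column 1
d[2, -1]

# Column totals over the remaining rows (the 'A' denominator)
colSums(d[-2, -1])
```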
data
df1 <- structure(list(ID = 1:3, `2016` = c(5L, 15L, 10L), `2017` = c(6L,
20L, 10L)), class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(ID = 1:3, X2016 = c(20L, 50L, 10L), X2017 = c(30L,
40L, 15L)), class = "data.frame", row.names = c(NA, -3L))
Divide the dataframe by the sum of each row
Here's a simple way with a for loop. I'll assume you have a list of column indices for each group:
groups = list(c(1, 2), c(3, 4))
result = dd
for (g in groups) {
  result[g] = dd[g] / rowSums(dd[g])
}
result
# a b c d
# 1 0.3333333 0.6666667 0.4285714 0.5714286
# 2 0.2500000 0.7500000 0.5555556 0.4444444
You could also use lapply like this:
result2 = do.call(cbind, lapply(groups, function(g) dd[g] / rowSums(dd[g])))
Using this input data:
dd = read.table(text = "a b c d
1 2 3 4
1 3 5 4", header = TRUE)
Group and divide multiple values
You shouldn't group by area(km^2):
df %>%
  group_by(year, country, disastertype) %>%
  mutate(proportion = `area(km^2)` / sum(`area(km^2)`)) %>%
  ungroup()
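Why the grouping matters can be shown with a small base R sketch. The data below is hypothetical (the original df was not shown), and the area column is renamed to plain area for brevity. When the value column is part of the grouping, each group only contains rows with the same area, so with distinct areas every proportion comes out as 1.

```r
# Hypothetical data; only the column roles follow the answer above
dis <- data.frame(year = c(2000, 2000, 2000),
                  country = c("A", "A", "A"),
                  disastertype = c("flood", "flood", "flood"),
                  area = c(10, 30, 60))

# Grouping by the keys only: each area as a share of the group total
good <- ave(dis$area, dis$year, dis$country, dis$disastertype,
            FUN = prop.table)
good

# Also grouping by the value column: every row becomes its own group
bad <- ave(dis$area, dis$year, dis$country, dis$disastertype, dis$area,
           FUN = prop.table)
bad
```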
Divide one value in a data.frame by another in an alternate data.frame based on row and column metadata
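We could use a join here. The V columns of df1 are first converted to numeric, then each is divided by the matching column of df2 (referenced with the i. prefix) in a join on Gene = Category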
library(data.table)
nm1 <- paste0("V", 1:4)
setDT(df1)[, (nm1) := lapply(.SD, as.numeric), .SDcols = nm1]
df1[df2, (nm1) := Map(`/`, mget(nm1),
mget(paste0("i.", nm1))), on = .(Gene = Category)]
Output
df1
Gene Transcript_ID V1 V2 V3 V4
1: ENSG00000000003.14 ENST00000612152.4 0 0.6666667 0 1
2: ENSG00000000003.14 ENST00000373020.8 1 0.0000000 1 0
3: ENSG00000000003.14 ENST00000614008.4 0 0.0000000 0 0
4: ENSG00000000003.14 ENST00000496771.5 0 0.3333333 0 0
data
df1 <- structure(list(Gene = c("ENSG00000000003.14", "ENSG00000000003.14",
"ENSG00000000003.14", "ENSG00000000003.14"), Transcript_ID = c("ENST00000612152.4",
"ENST00000373020.8", "ENST00000614008.4", "ENST00000496771.5"
), V1 = c(0L, 4L, 0L, 0L), V2 = c(6L, 0L, 0L, 3L), V3 = c(0L,
5L, 0L, 0L), V4 = c(3L, 0L, 0L, 0L)), class = "data.frame", row.names = c("1",
"2", "3", "4"))
df2 <- structure(list(Category = c("ENSG00000000003.14", "ENSG00000000005.6",
"ENSG00000000419.12", "ENSG00000000457.14"), V1 = c(4, 0, 61,
577.01), V2 = c(9, 0, 94, 698.2), V3 = c(5, 0, 103, 815.49),
V4 = c(3, 0, 71, 697.72)), class = "data.frame", row.names = c("1",
"2", "3", "4"))