Divide each data frame row by vector in R
sweep is useful for these sorts of operations. It is documented as taking an array (a matrix is the two-dimensional case), so a safe route with a data frame is to convert it to a matrix, do the operation, and then convert back. For example, here is some dummy data where we divide each element in the respective columns of the matrix mat by the corresponding value in the vector vec:
mat <- matrix(1:25, ncol = 5)
vec <- seq(2, by = 2, length.out = 5)
sweep(mat, 2, vec, `/`)
In use we have:
> mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
> vec
[1]  2  4  6  8 10
> sweep(mat, 2, vec, `/`)
     [,1] [,2]     [,3]  [,4] [,5]
[1,]  0.5 1.50 1.833333 2.000  2.1
[2,]  1.0 1.75 2.000000 2.125  2.2
[3,]  1.5 2.00 2.166667 2.250  2.3
[4,]  2.0 2.25 2.333333 2.375  2.4
[5,]  2.5 2.50 2.500000 2.500  2.5
> mat[,1] / vec[1]
[1] 0.5 1.0 1.5 2.0 2.5
To convert from a data frame use as.matrix(df) or data.matrix(df), and as.data.frame(mat) for the reverse.
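As a minimal sketch of that round trip, assuming a small all-numeric data frame (the names df, vec and res below are illustrative and not from the original question):

```r
# Hypothetical all-numeric data frame and one divisor per column
df  <- data.frame(a = c(2, 4), b = c(10, 20))
vec <- c(2, 10)

# data frame -> matrix, divide each column by its divisor, matrix -> data frame
res <- as.data.frame(sweep(as.matrix(df), 2, vec, `/`))
res
```

If df holds a non-numeric column, as.matrix(df) returns a character matrix, so such columns need to be dropped or converted first.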
Divide data frame by vector, by rows and not by columns
You can just use the transpose function:
> df[,2:4] <- t(t(df[,2:4]) / div)
> df
  type  V1   V2    V3
1    A 0.1 0.01 0.001
2    B 0.1 0.01 0.001
3    C 0.1 0.01 0.001
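The transpose trick works because R recycles a vector down the columns of a matrix: after t(), each original column becomes a row, so element k of div lines up with original column k. A self-contained sketch, assuming hypothetical inputs chosen only to be consistent with the output shown (the original df and div are not given):

```r
# Hypothetical inputs; only their shape is implied by the answer above
df  <- data.frame(type = c("A", "B", "C"),
                  V1 = c(1, 1, 1), V2 = c(1, 1, 1), V3 = c(1, 1, 1))
div <- c(10, 100, 1000)

# t() makes the numeric columns rows, so div[k] divides original column k;
# the outer t() restores the original orientation
df[, 2:4] <- t(t(df[, 2:4]) / div)
df
```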
Dividing each row of a dataframe matrix by a vector with a shorter length in R?
You can use sweep.
sweep(type.convert(d[, -1]), 2, crow_sqm, `/`)
# Crow_education_Omer Crow_education_Keisha Crow_education_Kate Crow_education_Winston
#[1,] 206.0 123 10.0 207.5
#[2,] 208.0 123 10.2 207.5
#[3,] 208.5 121 10.2 209.0
# Crow_education_Marlin
#[1,] NA
#[2,] NA
#[3,] NA
Or with transpose.
t(t(type.convert(d[, -1]))/crow_sqm)
The data is a matrix, and a matrix can hold data of only one type. The first column cannot be represented as a number, so all the values in the matrix become character. The -1 index drops the first column of the matrix, and type.convert changes the values from character to numeric for all the remaining columns.
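The coercion described above can be seen on a toy matrix. This is only a sketch: d, num and the divisors are made up for illustration, and as.is = TRUE is passed explicitly because recent R versions warn when type.convert is called without it.

```r
# Mixing an ID column with numbers forces the whole matrix to character
d <- cbind(id = c("x", "y"), a = c("10", "20"), b = c("3", "6"))
is.character(d)

# Drop the first column, then convert the remaining values back to numbers
num <- type.convert(d[, -1], as.is = TRUE)

# Divide column a by 10 and column b by 3
res <- sweep(num, 2, c(10, 3), `/`)
res
```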
Divide each column of a dataframe by one row of the dataframe
We replicate the vector to make the lengths the same and then do the division:
data.mat/unlist(vector)[col(data.mat)]
# FeO Total S SO4 Total N SiO2 Al2O3 Fe2O3 MnO MgO CaO Na2O K2O
#[1,] 0.10 16.5555556 NA NA 0.8908607 0.8987269 0.1835206 0.08333333 0.03680982 0.04175365 0.04823151 0.5738562
#[2,] 0.40 125.8333333 NA NA 0.5510204 0.4456019 0.2359551 0.08333333 0.04294479 0.01878914 0.04501608 0.2588235
#[3,] 0.85 0.6111111 NA NA 1.0021295 1.0162037 0.7715356 1.08333333 0.53987730 0.69728601 1.03858521 1.0457516
#[4,] 0.15 48.0555556 NA NA 1.1027507 0.2569444 NA 0.08333333 0.01840491 0.01878914 0.04180064 0.1647059
#[5,] 0.85 NA NA NA 1.0889086 1.0271991 0.6591760 0.75000000 0.59509202 0.53862213 1.02250804 1.1228758
#[6,] NA NA NA NA 1.3426797 0.6319444 0.0411985 0.08333333 0.03067485 0.11899791 0.65594855 0.7764706
# TiO2 P2O5 LOI LOI2 Total Total 2 Fe2O3(T)
#[1,] 0.7924528 0.3928571 7.0841837 6.6963855 0.9922233 0.9894632 0.14489796
#[2,] 0.5094340 0.3214286 14.5561224 13.7710843 0.9958126 0.9936382 0.31020408
#[3,] 0.8679245 0.6428571 1.5637755 1.5228916 0.9990030 0.9970179 0.80612245
#[4,] 1.4905660 0.2857143 7.4056122 7.0024096 0.9795613 0.9769384 0.05510204
#[5,] 1.0377358 0.2500000 0.3520408 0.3783133 0.9969093 0.9960239 0.74489796
#[6,] 0.3018868 0.2500000 1.2551020 1.1879518 1.0019940 1.0000000 0.04489796
Or use sweep
sweep(data.mat, MARGIN = 2, unlist(vector), FUN = `/`)
Or using mapply with asplit
mapply(`/`, asplit(data.mat, 2), vector)
data
data.mat <- structure(c(0.2, 0.8, 1.7, 0.3, 1.7, NA, 5.96, 45.3, 0.22, 17.3,
NA, NA, NA, 6.72, NA, 4.08, 0.06, 0.16, NA, NA, NA, NA, NA, NA,
50.2, 31.05, 56.47, 62.14, 61.36, 75.66, 15.53, 7.7, 17.56, 4.44,
17.75, 10.92, 0.49, 0.63, 2.06, NA, 1.76, 0.11, 0.01, 0.01, 0.13,
0.01, 0.09, 0.01, 0.06, 0.07, 0.88, 0.03, 0.97, 0.05, 0.2, 0.09,
3.34, 0.09, 2.58, 0.57, 0.15, 0.14, 3.23, 0.13, 3.18, 2.04, 4.39,
1.98, 8, 1.26, 8.59, 5.94, 0.42, 0.27, 0.46, 0.79, 0.55, 0.16,
0.11, 0.09, 0.18, 0.08, 0.07, 0.07, 27.77, 57.06, 6.13, 29.03,
1.38, 4.92, 27.79, 57.15, 6.32, 29.06, 1.57, 4.93, 99.52, 99.88,
100.2, 98.25, 99.99, 100.5, 99.54, 99.96, 100.3, 98.28, 100.2,
100.6, 0.71, 1.52, 3.95, 0.27, 3.65, 0.22), .Dim = c(6L, 19L), .Dimnames = list(
NULL, c("FeO", "Total S", "SO4", "Total N", "SiO2", "Al2O3",
"Fe2O3", "MnO", "MgO", "CaO", "Na2O", "K2O", "TiO2", "P2O5",
"LOI", "LOI2", "Total", "Total 2", "Fe2O3(T)")))
vector <- structure(list(FeO = 2, `Total S` = 0.36, SO4 = NA_real_, `Total N` = NA_real_,
SiO2 = 56.35, Al2O3 = 17.28, Fe2O3 = 2.67, MnO = 0.12, MgO = 1.63,
CaO = 4.79, Na2O = 3.11, K2O = 7.65, TiO2 = 0.53, P2O5 = 0.28,
LOI = 3.92, LOI2 = 4.15, Total = 100.3, `Total 2` = 100.6,
`Fe2O3(T)` = 4.9), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"))
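To see what the replication step does, it can help to run the three approaches on a tiny matrix. The sketch below uses made-up values (m and v are not from the question); col(m) returns the column index of every cell, so v[col(m)] repeats each divisor down its own column.

```r
# Hypothetical 2 x 3 matrix and one divisor per column
m <- matrix(c(2, 4, 10, 20, 30, 60), nrow = 2,
            dimnames = list(NULL, c("x", "y", "z")))
v <- c(x = 2, y = 10, z = 30)

col(m)                                     # column index of each cell

by_col    <- m / v[col(m)]                 # replicate, then divide
by_sweep  <- sweep(m, 2, v, `/`)           # same result via sweep
by_mapply <- mapply(`/`, asplit(m, 2), v)  # column by column
by_col
```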
How to divide each row of a matrix by elements of a vector in R
Here are a few ways in order of increasing code length:
t(t(mat) / dev)
mat / dev[col(mat)] # @DavidArenburg & @akrun
mat %*% diag(1 / dev)
sweep(mat, 2, dev, "/")
t(apply(mat, 1, "/", dev))
plyr::aaply(mat, 1, "/", dev)
mat / rep(dev, each = nrow(mat))
mat / t(replace(t(mat), TRUE, dev))
mapply("/", as.data.frame(mat), dev) # added later
mat / matrix(dev, nrow(mat), ncol(mat), byrow = TRUE) # added later
do.call(rbind, lapply(as.data.frame(t(mat)), "/", dev))
mat2 <- mat; for(i in seq_len(nrow(mat2))) mat2[i, ] <- mat2[i, ] / dev
Data Frames
All the solutions that begin with mat / also work if mat is a data frame and produce a data frame result. The same is true of the sweep solution and of the last solution, i.e. mat2. The mapply solution works with data frames but produces a matrix.
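A quick way to check these return types is to run a few of the solutions on a small data frame. The data below is invented for illustration:

```r
# Hypothetical numeric data frame and one divisor per column
df  <- data.frame(a = c(2, 4), b = c(10, 20))
dev <- c(2, 10)

r_col    <- df / dev[col(df)]              # data frame
r_rep    <- df / rep(dev, each = nrow(df)) # data frame
r_sweep  <- sweep(df, 2, dev, "/")         # data frame
r_mapply <- mapply("/", df, dev)           # matrix

sapply(list(r_col, r_rep, r_sweep, r_mapply), function(x) class(x)[1])
```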
Vector
If mat is a plain vector rather than a matrix, then either of these returns a one-column matrix:
t(t(mat) / dev)
mat / t(replace(t(mat), TRUE, dev))
and this one returns a vector:
plyr::aaply(mat, 1, "/", dev)
The others give an error, warning or not the desired answer.
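For the plain-vector case, a short sketch with made-up values (v stands in for mat here) shows the one-column result of the two base R expressions:

```r
# Hypothetical plain vector and divisors of the same length
v   <- c(2, 20)
dev <- c(2, 10)

# t(v) is a 1 x 2 matrix, so both expressions end up as 2 x 1 matrices
r1 <- t(t(v) / dev)
r2 <- v / t(replace(t(v), TRUE, dev))

dim(r1)
dim(r2)
```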
Benchmarks
The brevity and clarity of the code may be more important than speed but for purposes of completeness here are some benchmarks using 10 repetitions and then 100 repetitions.
library(microbenchmark)
library(plyr)
set.seed(84789)
mat<-matrix(runif(1e6),nrow=1e5)
dev<-runif(10)
microbenchmark(times=10L,
"1" = t(t(mat) / dev),
"2" = mat %*% diag(1/dev),
"3" = sweep(mat, 2, dev, "/"),
"4" = t(apply(mat, 1, "/", dev)),
"5" = mat / rep(dev, each = nrow(mat)),
"6" = mat / t(replace(t(mat), TRUE, dev)),
"7" = aaply(mat, 1, "/", dev),
"8" = do.call(rbind, lapply(as.data.frame(t(mat)), "/", dev)),
"9" = {mat2 <- mat; for(i in seq_len(nrow(mat2))) mat2[i, ] <- mat2[i, ] / dev},
"10" = mat/dev[col(mat)])
giving:
Unit: milliseconds
expr min lq mean median uq max neval
1 7.957253 8.136799 44.13317 8.370418 8.597972 366.24246 10
2 4.678240 4.693771 10.11320 4.708153 4.720309 58.79537 10
3 15.594488 15.691104 16.38740 15.843637 16.559956 19.98246 10
4 96.616547 104.743737 124.94650 117.272493 134.852009 177.96882 10
5 17.631848 17.654821 18.98646 18.295586 20.120382 21.30338 10
6 19.097557 19.365944 27.78814 20.126037 43.322090 48.76881 10
7 8279.428898 8496.131747 8631.02530 8644.798642 8741.748155 9194.66980 10
8 509.528218 524.251103 570.81573 545.627522 568.929481 821.17562 10
9 161.240680 177.282664 188.30452 186.235811 193.250346 242.45495 10
10 7.713448 7.815545 11.86550 7.965811 8.807754 45.87518 10
Re-running the test on all those that took <20 milliseconds with 100 repetitions:
microbenchmark(times=100L,
"1" = t(t(mat) / dev),
"2" = mat %*% diag(1/dev),
"3" = sweep(mat, 2, dev, "/"),
"5" = mat / rep(dev, each = nrow(mat)),
"6" = mat / t(replace(t(mat), TRUE, dev)),
"10" = mat/dev[col(mat)])
giving:
Unit: milliseconds
expr min lq mean median uq max neval
1 8.010749 8.188459 13.972445 8.560578 10.197650 299.80328 100
2 4.672902 4.734321 5.802965 4.769501 4.985402 20.89999 100
3 15.224121 15.428518 18.707554 15.836116 17.064866 42.54882 100
5 17.625347 17.678850 21.464804 17.847698 18.209404 303.27342 100
6 19.158946 19.361413 22.907115 19.772479 21.142961 38.77585 100
10 7.754911 7.939305 9.971388 8.010871 8.324860 25.65829 100
So on both these tests #2 (using diag) is fastest. The reason may lie in its almost direct appeal to the BLAS, whereas #1 relies on the costlier t.
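The diag approach works because right-multiplying by a diagonal matrix scales each column by the matching diagonal entry. A small check on made-up data (not the benchmark data) is sketched below; the nrow argument is an added precaution, since diag() called on a single number builds an identity matrix rather than a 1 x 1 matrix.

```r
# Hypothetical small matrix and divisors
mat <- matrix(1:6, nrow = 2)
dev <- c(1, 2, 4)

# Diagonal matrix holding the reciprocals of the divisors
scale_mat <- diag(1 / dev, nrow = length(dev))

via_diag      <- mat %*% scale_mat
via_transpose <- t(t(mat) / dev)
all.equal(via_diag, via_transpose)
```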
R: How to divide columns in matrix by a vector?
This is a perfect use case for sweep:
sweep(matrix, MARGIN = 2, STATS = vector, `/`)
     [,1] [,2]     [,3] [,4] [,5]
[1,]  0.5 1.25 3.000000  2.6  8.5
[2,]  1.0 1.50 3.333333  2.8  9.0
[3,]  1.5 1.75 3.666667  3.0  9.5
[4,]  2.0 2.00 4.000000  3.2 10.0
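The inputs are not shown in the answer. As a sketch, one pair of inputs that is consistent with the output above is a 4 x 5 matrix filled with 1:20 and the divisors 2, 4, 3, 5, 2. The names m and v are used here instead of matrix and vector to avoid masking the base R functions of the same name.

```r
# Assumed inputs, reconstructed only from the printed result
m <- matrix(1:20, nrow = 4)
v <- c(2, 4, 3, 5, 2)

# MARGIN = 2 applies STATS across columns: column j is divided by v[j]
res <- sweep(m, MARGIN = 2, STATS = v, FUN = `/`)
res
```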
Dividing a dataframe row wise using a vector with condition in r
No need to use apply; we can use vectorized/matrix operations:
df / t(ifelse(df > 500, dividing_factor, 1))
# col1 col2 col3
# 1 500.00 10000 4e+04
# 2 13333.33 500 1e+05
# 3 33333.33 25000 5e+02
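The original df and dividing_factor are not shown, so the following is only a sketch under stated assumptions: one factor per column, applied to a cell only when that cell exceeds 500. It uses col() to make the alignment between cells and factors explicit; the data is invented.

```r
# Hypothetical data: one dividing factor per column
df <- data.frame(a = c(1000, 100), b = c(200, 2000))
dividing_factor <- c(10, 100)

# Per-cell divisor: the column's factor where the cell exceeds 500, else 1
divisor <- ifelse(df > 500, dividing_factor[col(df)], 1)

res <- df / divisor
res
```

Here 1000 becomes 100 and 2000 becomes 20, while the two cells at or below 500 are left unchanged.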
How to divide each element in a row by corresponding row value?
Here is one option with tidyverse. We divide all the columns except 'Ac' by 'Ac', then use summarise_all to return the sum if any non-NA element is present, or NA otherwise:
library(tidyverse)
df %>%
transmute_at(-1, list(~ ./Ac)) %>%
summarise_all(list(~ if(all(is.na(.))) NA else sum(.,na.rm = TRUE)))
# V1 V2 V3 V4 V5 V6 V7
#1 NA 0 9.821429 3.690476 0 0.8484848 0.9188312
It can also be done in a single step
df %>%
summarise_at(-1, list(~ if(all(is.na(.))) NA else (sum(./Ac, na.rm = TRUE)) ))
# V1 V2 V3 V4 V5 V6 V7
#1 NA 0 9.821429 3.690476 0 0.8484848 0.9188312
Update
Based on the comments,
df %>%
summarise_at(-1, list(~ if(all(is.na(.))) NA
else if(sum(is.na(.)) == 1) (sum(./Ac, na.rm = TRUE))
else (sum(Ac* ., na.rm = TRUE)/sum(Ac, na.rm = TRUE)) ))
# V1 V2 V3 V4 V5 V6 V7
#1 NA 0 9.821429 3.690476 0 2.464 2.904
The same method can be translated to data.table as well:
library(data.table)
setDT(df)[, lapply(.SD, function(x) if(all(is.na(x))) NA
else sum(x/Ac, na.rm = TRUE)), .SDcols = 2:ncol(df)]
# V1 V2 V3 V4 V5 V6 V7
#1: NA 0 9.821429 3.690476 0 0.8484848 0.9188312
Updated data.table solution
setDT(df)[, lapply(.SD, function(x) if(all(is.na(x))) NA
else if(sum(is.na(x)) == 1) (sum(x/Ac, na.rm = TRUE))
else (sum(Ac* x, na.rm = TRUE)/sum(Ac, na.rm = TRUE)) ), .SDcols = 2:ncol(df)]
# V1 V2 V3 V4 V5 V6 V7
#1: NA 0 9.821429 3.690476 0 2.464 2.904
data
df <- structure(list(Ac = c(6.6, 8.4), V1 = c(NA_real_, NA_real_),
V2 = c(NA, 0), V3 = c(NA, 82.5), V4 = c(NA, 31), V5 = c(0,
0), V6 = c(5.6, 0), V7 = c(5.2, 1.1)), class = "data.frame",
row.names = c(NA,
-2L))
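For comparison, the first (non-updated) calculation can also be sketched in base R without any packages. The helper name ratio_sum is hypothetical; the data is the df given above.

```r
df <- data.frame(Ac = c(6.6, 8.4), V1 = c(NA_real_, NA_real_),
                 V2 = c(NA, 0), V3 = c(NA, 82.5), V4 = c(NA, 31),
                 V5 = c(0, 0), V6 = c(5.6, 0), V7 = c(5.2, 1.1))

# Sum of x / w, ignoring NAs, but NA when the whole column is missing
ratio_sum <- function(x, w) {
  if (all(is.na(x))) NA_real_ else sum(x / w, na.rm = TRUE)
}

# Apply to every column except the first ('Ac'), dividing by 'Ac'
res <- sapply(df[-1], ratio_sum, w = df$Ac)
res
```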
dividing row with a vector in R
I advise against including margin totals in raw data. As you found out, it makes things unnecessarily complicated.
That aside, here is an option
library(dplyr)
df %>%
mutate(across(b:c, ~ replace(.x, a != "total", .x[a != "total"] / last(.x))))
# a b c
#1 1a 0.4285714 0.25
#2 2a 0.8571429 0.50
#3 3a 0.4285714 0.75
#4 total 7.0000000 8.00
This assumes that totals are always in the last row (i.e. the total is the last entry in a column vector).
You can replace across(b:c, ...) with across(where(is.numeric), ...) if preferable.
Sample data
df <- read.table(text = " a b c
1 1a 3 2
2 2a 6 4
3 3a 3 6
4 total 7 8", header = TRUE)
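For readers not using dplyr, the same idea can be sketched in base R with sweep. It relies on the same assumption that the totals sit in the last row; the variable names num and n are illustrative.

```r
df <- data.frame(a = c("1a", "2a", "3a", "total"),
                 b = c(3, 6, 3, 7),
                 c = c(2, 4, 6, 8))

num <- as.matrix(df[, c("b", "c")])  # numeric part as a matrix
n   <- nrow(num)                     # index of the totals row

# Divide every non-total row by the totals row, column by column
df[-n, c("b", "c")] <- sweep(num[-n, , drop = FALSE], 2, num[n, ], "/")
df
```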