Divide Each Data Frame Row by Vector in R

Divide each data frame row by vector in R

sweep is useful for these sorts of operations. It is documented for arrays (matrices included), so a dependable approach is to convert your data frame to a matrix, do the operation, and then convert back. For example, with some dummy data, we divide each element in the respective columns of matrix mat by the corresponding value in the vector vec:

mat <- matrix(1:25, ncol = 5)
vec <- seq(2, by = 2, length.out = 5)

sweep(mat, 2, vec, `/`)

In use we have:

> mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
> vec
[1]  2  4  6  8 10
> sweep(mat, 2, vec, `/`)
     [,1] [,2]     [,3]  [,4] [,5]
[1,]  0.5 1.50 1.833333 2.000  2.1
[2,]  1.0 1.75 2.000000 2.125  2.2
[3,]  1.5 2.00 2.166667 2.250  2.3
[4,]  2.0 2.25 2.333333 2.375  2.4
[5,]  2.5 2.50 2.500000 2.500  2.5
> mat[,1] / vec[1]
[1] 0.5 1.0 1.5 2.0 2.5

To convert from a data frame use as.matrix(df) or data.matrix(df), and as.data.frame(mat) for the reverse.
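
The full round trip can be sketched as follows. The data frame df and the divisor vec here are made up for illustration and assume every column is numeric:

```r
# Hypothetical all-numeric data frame and divisor vector (illustrative only)
df  <- data.frame(a = c(2, 4), b = c(10, 20), c = c(9, 18))
vec <- c(2, 5, 9)

# Data frame -> matrix, divide column j by vec[j], then back to a data frame
mat <- as.matrix(df)
res <- as.data.frame(sweep(mat, 2, vec, `/`))
res
```

The column names survive the conversion because as.matrix keeps them as dimnames.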

Divide data frame by vector, by rows and not by columns

You can just use the transpose function:

> df[,2:4] <- t(t(df[,2:4]) / div)
> df
  type  V1   V2    V3
1    A 0.1 0.01 0.001
2    B 0.1 0.01 0.001
3    C 0.1 0.01 0.001
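
Why the double transpose works: R recycles a shorter vector down the columns of a matrix, so after t() each original column has become a row and div[j] lines up with original column j. A minimal sketch, using made-up df and div since the question's data is not shown here:

```r
# Hypothetical data; the question's df and div are not shown in the text
df  <- data.frame(type = c("A", "B", "C"),
                  V1 = c(1, 2, 3), V2 = c(10, 20, 30), V3 = c(100, 200, 300))
div <- c(10, 100, 1000)

# t() turns columns into rows, so recycling div down each column of the
# transposed matrix pairs div[j] with original column j
df[, 2:4] <- t(t(df[, 2:4]) / div)
df
```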

Dividing each row of a dataframe matrix by a vector with a shorter length in R?

You can use sweep.

sweep(type.convert(d[, -1], as.is = TRUE), 2, crow_sqm, `/`)

#     Crow_education_Omer Crow_education_Keisha Crow_education_Kate Crow_education_Winston
#[1,]               206.0                   123                10.0                  207.5
#[2,]               208.0                   123                10.2                  207.5
#[3,]               208.5                   121                10.2                  209.0

#     Crow_education_Marlin
#[1,]                    NA
#[2,]                    NA
#[3,]                    NA

Or with transpose.

t(t(type.convert(d[, -1], as.is = TRUE))/crow_sqm)

The data is a matrix, and a matrix can hold data of only one type. The 1st column cannot be represented as a number, so all the values in the matrix become character. -1 drops the 1st column of the matrix, and type.convert changes the remaining columns from character to numeric.
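
The same idea can be sketched on a small made-up character matrix. Note that this sketch swaps type.convert for storage.mode<-, which is just another way to turn an all-digits character matrix into a numeric one; d and divisor below are hypothetical stand-ins, not the question's data:

```r
# Hypothetical character matrix: a label column forces every cell to character
d <- cbind(id = c("x", "y"), a = c("2", "4"), b = c("9", "18"))
divisor <- c(2, 9)  # hypothetical stand-in for crow_sqm

num <- d[, -1]                 # drop the non-numeric first column
storage.mode(num) <- "double"  # character -> numeric, dims and dimnames kept
res <- sweep(num, 2, divisor, `/`)
res
```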

Divide each column of a dataframe by one row of the dataframe

We replicate the vector so that its length matches the matrix, and then do the division:

data.mat/unlist(vector)[col(data.mat)]
#  FeO     Total S SO4 Total N      SiO2     Al2O3     Fe2O3        MnO        MgO        CaO       Na2O       K2O
#[1,] 0.10  16.5555556  NA      NA 0.8908607 0.8987269 0.1835206 0.08333333 0.03680982 0.04175365 0.04823151 0.5738562
#[2,] 0.40 125.8333333  NA      NA 0.5510204 0.4456019 0.2359551 0.08333333 0.04294479 0.01878914 0.04501608 0.2588235
#[3,] 0.85   0.6111111  NA      NA 1.0021295 1.0162037 0.7715356 1.08333333 0.53987730 0.69728601 1.03858521 1.0457516
#[4,] 0.15  48.0555556  NA      NA 1.1027507 0.2569444        NA 0.08333333 0.01840491 0.01878914 0.04180064 0.1647059
#[5,] 0.85          NA  NA      NA 1.0889086 1.0271991 0.6591760 0.75000000 0.59509202 0.53862213 1.02250804 1.1228758
#[6,]   NA          NA  NA      NA 1.3426797 0.6319444 0.0411985 0.08333333 0.03067485 0.11899791 0.65594855 0.7764706
#          TiO2      P2O5        LOI       LOI2     Total   Total 2   Fe2O3(T)
#[1,] 0.7924528 0.3928571  7.0841837  6.6963855 0.9922233 0.9894632 0.14489796
#[2,] 0.5094340 0.3214286 14.5561224 13.7710843 0.9958126 0.9936382 0.31020408
#[3,] 0.8679245 0.6428571  1.5637755  1.5228916 0.9990030 0.9970179 0.80612245
#[4,] 1.4905660 0.2857143  7.4056122  7.0024096 0.9795613 0.9769384 0.05510204
#[5,] 1.0377358 0.2500000  0.3520408  0.3783133 0.9969093 0.9960239 0.74489796
#[6,] 0.3018868 0.2500000  1.2551020  1.1879518 1.0019940 1.0000000 0.04489796
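
To see what the col() indexing does, here is a small sketch on a made-up matrix m and divisor v (both hypothetical). col(m) holds the column number of every cell, so v[col(m)] expands v to one divisor per cell in column-major order:

```r
# Hypothetical 2 x 3 matrix and one divisor per column
m <- matrix(c(2, 4, 10, 20, 9, 18), nrow = 2)
v <- c(2, 5, 9)

col(m)     # column index of every cell
v[col(m)]  # v expanded in column-major order: 2 2 5 5 9 9

res <- m / v[col(m)]  # same shape as m; element [i, j] is m[i, j] / v[j]
res
```

In the answer above, unlist() is needed first because the divisor is stored as a one-row data frame rather than a plain vector.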

Or use sweep

sweep(data.mat, MARGIN = 2, unlist(vector), FUN = `/`)

Or using mapply with asplit

mapply(`/`, asplit(data.mat, 2), vector)

data

data.mat <- structure(c(0.2, 0.8, 1.7, 0.3, 1.7, NA, 5.96, 45.3, 0.22, 17.3, 
NA, NA, NA, 6.72, NA, 4.08, 0.06, 0.16, NA, NA, NA, NA, NA, NA, 
50.2, 31.05, 56.47, 62.14, 61.36, 75.66, 15.53, 7.7, 17.56, 4.44, 
17.75, 10.92, 0.49, 0.63, 2.06, NA, 1.76, 0.11, 0.01, 0.01, 0.13, 
0.01, 0.09, 0.01, 0.06, 0.07, 0.88, 0.03, 0.97, 0.05, 0.2, 0.09, 
3.34, 0.09, 2.58, 0.57, 0.15, 0.14, 3.23, 0.13, 3.18, 2.04, 4.39, 
1.98, 8, 1.26, 8.59, 5.94, 0.42, 0.27, 0.46, 0.79, 0.55, 0.16, 
0.11, 0.09, 0.18, 0.08, 0.07, 0.07, 27.77, 57.06, 6.13, 29.03, 
1.38, 4.92, 27.79, 57.15, 6.32, 29.06, 1.57, 4.93, 99.52, 99.88, 
100.2, 98.25, 99.99, 100.5, 99.54, 99.96, 100.3, 98.28, 100.2, 
100.6, 0.71, 1.52, 3.95, 0.27, 3.65, 0.22), .Dim = c(6L, 19L), .Dimnames = list(
    NULL, c("FeO", "Total S", "SO4", "Total N", "SiO2", "Al2O3", 
    "Fe2O3", "MnO", "MgO", "CaO", "Na2O", "K2O", "TiO2", "P2O5", 
    "LOI", "LOI2", "Total", "Total 2", "Fe2O3(T)")))

vector <- structure(list(FeO = 2, `Total S` = 0.36, SO4 = NA_real_, `Total N` = NA_real_, 
    SiO2 = 56.35, Al2O3 = 17.28, Fe2O3 = 2.67, MnO = 0.12, MgO = 1.63, 
    CaO = 4.79, Na2O = 3.11, K2O = 7.65, TiO2 = 0.53, P2O5 = 0.28, 
    LOI = 3.92, LOI2 = 4.15, Total = 100.3, `Total 2` = 100.6, 
    `Fe2O3(T)` = 4.9), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame"))

How to divide each row of a matrix by elements of a vector in R

Here are a few ways in order of increasing code length:

t(t(mat) / dev)

mat / dev[col(mat)] #  @DavidArenburg & @akrun

mat %*% diag(1 / dev)

sweep(mat, 2, dev, "/")

t(apply(mat, 1, "/", dev))

plyr::aaply(mat, 1, "/", dev)

mat / rep(dev, each = nrow(mat))

mat / t(replace(t(mat), TRUE, dev))

mapply("/", as.data.frame(mat), dev)  # added later

mat / matrix(dev, nrow(mat), ncol(mat), byrow = TRUE)  # added later

do.call(rbind, lapply(as.data.frame(t(mat)), "/", dev))

mat2 <- mat; for(i in seq_len(nrow(mat2))) mat2[i, ] <- mat2[i, ] / dev
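
A quick way to convince yourself that these variants agree is to compare them on a small input. The sketch below uses made-up mat and dev, takes the sweep result as the reference, and leaves out the plyr variant to avoid the extra dependency:

```r
# Hypothetical small inputs
mat <- matrix(1:12, nrow = 3)
dev <- c(1, 2, 4, 8)

ref <- sweep(mat, 2, dev, "/")  # reference result

results <- list(
  transpose = t(t(mat) / dev),
  col_index = mat / dev[col(mat)],
  diag      = mat %*% diag(1 / dev),
  apply     = t(apply(mat, 1, "/", dev)),
  rep_each  = mat / rep(dev, each = nrow(mat)),
  byrow     = mat / matrix(dev, nrow(mat), ncol(mat), byrow = TRUE),
  mapply    = mapply("/", as.data.frame(mat), dev)
)

# unname() drops the dimnames that some variants attach
same <- vapply(results, function(r) isTRUE(all.equal(unname(r), ref)), logical(1))
same
```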

Data Frames

All the solutions that begin with mat / also work if mat is a data frame and produce a data frame result. The same is true of the sweep solution and the last solution, i.e. the mat2 one. The mapply solution works with data frames but produces a matrix.
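
A small sketch of those return types, assuming a made-up numeric data frame df in place of mat:

```r
# Hypothetical numeric data frame
df  <- data.frame(a = c(2, 4), b = c(10, 20))
dev <- c(2, 5)

r_col    <- df / dev[col(df)]       # a "mat /" style solution: data frame
r_sweep  <- sweep(df, 2, dev, "/")  # data frame
r_mapply <- mapply("/", df, dev)    # matrix

class(r_col)
class(r_sweep)
class(r_mapply)
```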

Vector

If mat is a plain vector rather than a matrix, then each of these returns a one-column matrix:

t(t(mat) / dev)
mat / t(replace(t(mat), TRUE, dev))

and this one returns a vector:

plyr::aaply(mat, 1, "/", dev)

The others give an error, a warning, or a result that is not the desired answer.

Benchmarks

The brevity and clarity of the code may be more important than speed but for purposes of completeness here are some benchmarks using 10 repetitions and then 100 repetitions.

library(microbenchmark)
library(plyr)

set.seed(84789)

mat<-matrix(runif(1e6),nrow=1e5)
dev<-runif(10)

microbenchmark(times=10L,
  "1" = t(t(mat) / dev),
  "2" = mat %*% diag(1/dev),
  "3" = sweep(mat, 2, dev, "/"),
  "4" = t(apply(mat, 1, "/", dev)),
  "5" = mat / rep(dev, each = nrow(mat)),
  "6" = mat / t(replace(t(mat), TRUE, dev)),
  "7" = aaply(mat, 1, "/", dev),
  "8" = do.call(rbind, lapply(as.data.frame(t(mat)), "/", dev)),
  "9" = {mat2 <- mat; for(i in seq_len(nrow(mat2))) mat2[i, ] <- mat2[i, ] / dev},
 "10" = mat/dev[col(mat)])

giving:

Unit: milliseconds
 expr         min          lq       mean      median          uq        max neval
    1    7.957253    8.136799   44.13317    8.370418    8.597972  366.24246    10
    2    4.678240    4.693771   10.11320    4.708153    4.720309   58.79537    10
    3   15.594488   15.691104   16.38740   15.843637   16.559956   19.98246    10
    4   96.616547  104.743737  124.94650  117.272493  134.852009  177.96882    10
    5   17.631848   17.654821   18.98646   18.295586   20.120382   21.30338    10
    6   19.097557   19.365944   27.78814   20.126037   43.322090   48.76881    10
    7 8279.428898 8496.131747 8631.02530 8644.798642 8741.748155 9194.66980    10
    8  509.528218  524.251103  570.81573  545.627522  568.929481  821.17562    10
    9  161.240680  177.282664  188.30452  186.235811  193.250346  242.45495    10
   10    7.713448    7.815545   11.86550    7.965811    8.807754   45.87518    10

Re-running the test on all those that took <20 milliseconds with 100 repetitions:

microbenchmark(times=100L,
  "1" = t(t(mat) / dev),
  "2" = mat %*% diag(1/dev),
  "3" = sweep(mat, 2, dev, "/"),
  "5" = mat / rep(dev, each = nrow(mat)),
  "6" = mat / t(replace(t(mat), TRUE, dev)),
 "10" = mat/dev[col(mat)])

giving:

Unit: milliseconds
 expr       min        lq      mean    median        uq       max neval
    1  8.010749  8.188459 13.972445  8.560578 10.197650 299.80328   100
    2  4.672902  4.734321  5.802965  4.769501  4.985402  20.89999   100
    3 15.224121 15.428518 18.707554 15.836116 17.064866  42.54882   100
    5 17.625347 17.678850 21.464804 17.847698 18.209404 303.27342   100
    6 19.158946 19.361413 22.907115 19.772479 21.142961  38.77585   100
   10  7.754911  7.939305  9.971388  8.010871  8.324860  25.65829   100

So in both of these tests, #2 (using diag) is the fastest. The reason may lie in its almost direct appeal to the BLAS, whereas #1 relies on the costlier t().
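
For readers unfamiliar with the diag approach, here is a minimal sketch on made-up inputs: multiplying on the right by a diagonal matrix scales each column, so a diagonal of 1 / dev divides column j by dev[j].

```r
# Hypothetical small inputs
mat <- matrix(1:6, nrow = 2)
dev <- c(1, 2, 4)

D   <- diag(1 / dev)  # 3 x 3 matrix with 1 / dev on the diagonal, zeros elsewhere
res <- mat %*% D      # column j of mat is multiplied by 1 / dev[j]
res

# Caveat: diag() treats a single number as a matrix size, so for a
# length-one dev it is safer to pass nrow explicitly
D_safe <- diag(1 / dev, nrow = length(dev))
```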

R: How to divide columns in matrix by a vector?

This is a perfect use case for sweep:

sweep(matrix, MARGIN = 2, STATS = vector, `/`)

     [,1] [,2]     [,3] [,4] [,5]
[1,]  0.5 1.25 3.000000  2.6  8.5
[2,]  1.0 1.50 3.333333  2.8  9.0
[3,]  1.5 1.75 3.666667  3.0  9.5
[4,]  2.0 2.00 4.000000  3.2 10.0
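
The inputs below are inferred from the printed result, so treat them as an assumption rather than the question's actual data. The sketch also contrasts MARGIN = 2 (one divisor per column) with MARGIN = 1 (one divisor per row), using a hypothetical row divisor w:

```r
# Inputs inferred from the printed result (an assumption, not the question's data);
# named m and v here to avoid masking base::matrix and base::vector
m <- matrix(1:20, nrow = 4)
v <- c(2, 4, 3, 5, 2)

by_col <- sweep(m, MARGIN = 2, STATS = v, FUN = `/`)  # column j divided by v[j]
by_col

# MARGIN = 1 would instead divide row i by the i-th element of a length-4 vector
w <- c(1, 2, 4, 8)  # hypothetical row divisors
by_row <- sweep(m, MARGIN = 1, STATS = w, FUN = `/`)
by_row
```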

Dividing a dataframe row wise using a vector with condition in r

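No need to use apply; we can use vectorized matrix operations. The condition is evaluated on t(df) so that each element of dividing_factor lines up with its column:

df / t(ifelse(t(df) > 500, dividing_factor, 1))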
#       col1  col2  col3
# 1   500.00 10000 4e+04
# 2 13333.33   500 1e+05
# 3 33333.33 25000 5e+02
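
To check the alignment on a case that is neither square nor symmetric, here is a sketch on made-up data (df and dividing_factor below are hypothetical). The comparison is done on t(df), so that ifelse recycles dividing_factor down the columns of the transposed matrix and factor j ends up under column j:

```r
# Hypothetical 2 x 3 data frame and one factor per column
df <- data.frame(col1 = c(100, 1000), col2 = c(2000, 200), col3 = c(3000, 6000))
dividing_factor <- c(2, 4, 10)

# Divisor matrix: dividing_factor[j] where df[i, j] > 500, otherwise 1
divisor <- t(ifelse(t(df) > 500, dividing_factor, 1))
res <- df / divisor
res
```

Values at or below 500 are divided by 1, which leaves them unchanged.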

How to divide each element in a row by corresponding row value?

Here is one option with tidyverse. We divide all the columns except the 'Ac' column with the 'Ac', then summarise_all to return the sum if any non-NA element is present or else return NA

library(tidyverse)
df %>%
  transmute_at(-1, list(~ ./Ac)) %>% 
  summarise_all(list(~ if(all(is.na(.))) NA else sum(.,na.rm = TRUE)))
#  V1 V2       V3       V4 V5        V6        V7
#1 NA  0 9.821429 3.690476  0 0.8484848 0.9188312

It can also be done in a single step

df %>% 
  summarise_at(-1, list(~ if(all(is.na(.))) NA else (sum(./Ac, na.rm = TRUE)) ))
#  V1 V2       V3       V4 V5        V6        V7
#1 NA  0 9.821429 3.690476  0 0.8484848 0.9188312
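
As a cross-check without any packages, the single-step version can be sketched in base R. This is an equivalent written for illustration, not part of the original answer; the data is copied from the data block further down:

```r
# Data copied from the "data" block of this answer
df <- data.frame(Ac = c(6.6, 8.4), V1 = c(NA_real_, NA_real_),
                 V2 = c(NA, 0), V3 = c(NA, 82.5), V4 = c(NA, 31),
                 V5 = c(0, 0), V6 = c(5.6, 0), V7 = c(5.2, 1.1))

# For every column except Ac: NA if all values are missing,
# otherwise the sum of value / Ac over the non-missing rows
res <- sapply(df[-1], function(x) {
  if (all(is.na(x))) NA_real_ else sum(x / df$Ac, na.rm = TRUE)
})
res
```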

Update

Based on the comments,

df %>% 
    summarise_at(-1, list(~ if(all(is.na(.))) NA
       else if(sum(is.na(.)) == 1) (sum(./Ac, na.rm = TRUE)) 
      else (sum(Ac* ., na.rm = TRUE)/sum(Ac, na.rm = TRUE)) ))
#  V1 V2       V3       V4 V5    V6    V7
#1 NA  0 9.821429 3.690476  0 2.464 2.904

The same method can be translated to data.table as well

library(data.table)
setDT(df)[, lapply(.SD, function(x) if(all(is.na(x))) NA 
      else sum(x/Ac, na.rm = TRUE)), .SDcols = 2:ncol(df)]
#   V1 V2       V3       V4 V5        V6        V7
#1: NA  0 9.821429 3.690476  0 0.8484848 0.9188312

Updated data.table solution

setDT(df)[, lapply(.SD, function(x) if(all(is.na(x))) NA
       else if(sum(is.na(x)) == 1) (sum(x/Ac, na.rm = TRUE)) 
      else (sum(Ac* x, na.rm = TRUE)/sum(Ac, na.rm = TRUE)) ), .SDcols = 2:ncol(df)]
#   V1 V2       V3       V4 V5    V6    V7
#1: NA  0 9.821429 3.690476  0 2.464 2.904

data

df <- structure(list(Ac = c(6.6, 8.4), V1 = c(NA_real_, NA_real_), 
    V2 = c(NA, 0), V3 = c(NA, 82.5), V4 = c(NA, 31), V5 = c(0, 
    0), V6 = c(5.6, 0), V7 = c(5.2, 1.1)), class = "data.frame", 
    row.names = c(NA, 
-2L))

dividing row with a vector in R

I advise against including margin totals in raw data. As you found out, it makes things unnecessarily complicated.

That aside, here is an option

library(dplyr)
df %>%
    mutate(across(b:c, ~ replace(.x, a != "total", .x[a != "total"] / last(.x))))
#      a         b    c
#1    1a 0.4285714 0.25
#2    2a 0.8571429 0.50
#3    3a 0.4285714 0.75
#4 total 7.0000000 8.00

This assumes that totals are always in the last row (i.e. the total is the last entry in a column vector).

You can replace across(b:c, ...) with across(where(is.numeric), ...) if preferable.
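
For comparison, a base R sketch of the same division using sweep. It is an alternative written for illustration, not the approach above; it locates the total row by its label instead of assuming it is last, and it rebuilds the sample data with data.frame() so the block runs on its own:

```r
# Sample data rebuilt with data.frame() so the block is self-contained
df <- data.frame(a = c("1a", "2a", "3a", "total"),
                 b = c(3, 6, 3, 7), c = c(2, 4, 6, 8))

is_total <- df$a == "total"
totals   <- unlist(df[is_total, c("b", "c")])  # named vector: b = 7, c = 8

# Divide every non-total row by the totals, column by column
df[!is_total, c("b", "c")] <- sweep(df[!is_total, c("b", "c")], 2, totals, `/`)
df
```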

Sample data

df <- read.table(text = " a     b     c
1 1a    3     2
2 2a    6     4
3 3a    3     6
4 total 7     8", header = TRUE)
