How to Multiply Data Frame by Vector

What is the right way to multiply a data frame by a vector?

This works too:

data.frame(mapply(`*`,df,v))

In that solution, you are taking advantage of the fact that data.frame is a type of list, so you can iterate over both the elements of df and v at the same time with mapply.

Unfortunately, you are limited in what mapply can return: a simple list or a matrix. By default it simplifies the result to a matrix, so if your data are huge, this would likely be more efficient:

data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE))

With SIMPLIFY=FALSE, mapply returns a list, which is cheaper to convert to a data.frame than a matrix is.
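As a minimal sketch of the difference, assuming a small made-up df and v (neither is defined in the answer above), the two calls can be compared like this:

```r
# Hypothetical example data; not taken from the original question
df <- data.frame(a = 1:3, b = 4:6)
v <- c(10, 100)

# Default (SIMPLIFY = TRUE): the per-column results are bound into a matrix
m <- mapply(`*`, df, v)

# SIMPLIFY = FALSE: the per-column results stay in a named list
l <- mapply(`*`, df, v, SIMPLIFY = FALSE)

# A list of equal-length vectors converts directly to a data frame
res <- data.frame(l)
```

Both routes give the same numbers; only the intermediate object differs.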

Multiply columns in a data frame by a vector

Transposing the data frame works. Note that the result is a matrix, not a data frame.

c1 <- c(1,2,3)
c2 <- c(4,5,6)
c3 <- c(7,8,9)
d1 <- data.frame(c1,c2,c3)
v1 <- c(1,2,3)
t(t(d1)*v1)
#      c1 c2 c3
# [1,]  1  8 21
# [2,]  2 10 24
# [3,]  3 12 27

EDIT: If not all columns are numeric, you can do the following:

c1 <- c(1,2,3)
c2 <- c(4,5,6)
c3 <- c(7,8,9)
d1 <- data.frame(c1,c2,c3)

# Adding a column of characters for demonstration
d1$c4 <- c("rr", "t", "s")

v1 <- c(1,2,3)

#Choosing only numeric columns
index <- which(sapply(d1, is.numeric) == TRUE)
d1_mat <- as.matrix(d1[,index])

d1[,index] <- t(t(d1_mat)*v1)
d1
# c1 c2 c3 c4
#1 1 8 21 rr
#2 2 10 24 t
#3 3 12 27 s
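The same result can also be reached without transposing. A hedged sketch, using Map over the numeric columns only (this swaps in a different technique from the answer above, and num is a name introduced here for illustration):

```r
d1 <- data.frame(c1 = c(1, 2, 3), c2 = c(4, 5, 6), c3 = c(7, 8, 9))
d1$c4 <- c("rr", "t", "s")
v1 <- c(1, 2, 3)

# Logical mask of the numeric columns (hypothetical helper name)
num <- vapply(d1, is.numeric, logical(1))

# Multiply the j-th numeric column by v1[j]; other columns are untouched
d1[num] <- Map(`*`, d1[num], v1)
```

This assumes v1 has exactly one element per numeric column; the character column c4 is never coerced.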

Multiply each column of a data frame by the corresponding value of a vector

Here is another option using sweep

sweep(dframe, 2, vector, "*")
# V1 V2 V3
#1 2 12 28
#2 4 15 32
#3 6 18 36

Or using col

dframe*vector[col(dframe)]
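The answer does not show how dframe and vector were defined. As a sketch, the values below are assumptions chosen so that they reproduce the printed output; with them, both forms can be checked against each other:

```r
# Assumed inputs (not given in the original answer)
dframe <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)
vector <- c(2, 3, 4)

# MARGIN = 2 applies vector[j] to column j
by_sweep <- sweep(dframe, 2, vector, "*")

# col(dframe) holds each cell's column number, so vector[col(dframe)]
# expands the vector to one multiplier per cell
by_col <- dframe * vector[col(dframe)]
```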

Multiply each row of a dataframe by its vector in R

You may do this in a single mutate statement, using dplyr's cur_data() (note that cur_data() was deprecated in dplyr 1.1.0 in favor of pick()):

set.seed(2021)
x <- data.frame(age = c("one", "two", "three", "four", "five","one", "two", "three", "four", "five"),
replicate(10,sample(0:5,5,rep=TRUE)),
time = c("one", "two", "three", "four", "five","one", "two", "three", "four", "five"),
vector = c("1-2-9-4-5-1-5-6-1-2",
"3-2-3-4-5-2-6-6-1-2",
"1-2-4-4-2-4-5-4-2-1",
"9-2-3-1-5-5-5-3-1-2",
"1-1-3-4-5-1-5-6-3-2"))

library(tidyverse)

x %>% mutate(select(cur_data(), starts_with('X')) * t(map_dfc(strsplit(vector, '-'), as.numeric)))

#> age X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 time vector
#> 1 one 5 10 36 4 25 1 20 12 1 0 one 1-2-9-4-5-1-5-6-1-2
#> 2 two 15 10 0 8 5 4 0 6 4 0 two 3-2-3-4-5-2-6-6-1-2
#> 3 three 1 4 12 12 6 12 25 20 10 5 three 1-2-4-4-2-4-5-4-2-1
#> 4 four 27 10 6 4 20 20 5 15 2 8 four 9-2-3-1-5-5-5-3-1-2
#> 5 five 3 5 9 8 25 5 10 30 3 6 five 1-1-3-4-5-1-5-6-3-2
#> 6 one 5 10 36 4 25 1 20 12 1 0 one 1-2-9-4-5-1-5-6-1-2
#> 7 two 15 10 0 8 5 4 0 6 4 0 two 3-2-3-4-5-2-6-6-1-2
#> 8 three 1 4 12 12 6 12 25 20 10 5 three 1-2-4-4-2-4-5-4-2-1
#> 9 four 27 10 6 4 20 20 5 15 2 8 four 9-2-3-1-5-5-5-3-1-2
#> 10 five 3 5 9 8 25 5 10 30 3 6 five 1-1-3-4-5-1-5-6-3-2

Or even using across, as G.Grothendieck has suggested (that would eliminate the use of cur_data()):

x %>% mutate(across(starts_with('X')) * t(map_dfc(strsplit(vector, '-'), as.numeric)))
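For comparison, here is a base R sketch of the same idea, without dplyr or purrr. The tiny x below is made up for illustration (it is not the data from the question), and mult and cols are names introduced here:

```r
# Hypothetical data: two X columns and a per-row multiplier string
x <- data.frame(X1 = c(1, 2), X2 = c(3, 4), vector = c("1-2", "10-20"))

# Split each string on "-" and stack the numeric pieces as matrix rows
mult <- do.call(rbind, lapply(strsplit(x$vector, "-", fixed = TRUE), as.numeric))

# Select the X columns and multiply cell by cell
cols <- startsWith(names(x), "X")
x[cols] <- x[cols] * mult
```

This assumes every string splits into exactly as many numbers as there are X columns.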

tidyverse solution for multiplying columns by a vector

If it is by row, then one option is c_across

library(dplyr)
library(stringr)
library(tibble)
new <- as_tibble(setNames(as.list(v1), names(d1)))
d1 %>%
rowwise %>%
mutate(c_across(everything()) * new) %>%
rename_with(~ str_c("pro_", .x), everything()) %>%
bind_cols(d1, .)

-output

  c1 c2 c3 pro_c1 pro_c2 pro_c3
1 1 4 7 1 8 21
2 2 5 8 2 10 24
3 3 6 9 3 12 27

Or another option is map2

library(purrr)
map2_dfc(d1, v1, `*`) %>%
rename_with(~ str_c("pro_", .x), everything()) %>%
bind_cols(d1, .)

-output

 c1 c2 c3 pro_c1 pro_c2 pro_c3
1 1 4 7 1 8 21
2 2 5 8 2 10 24
3 3 6 9 3 12 27

Also, with the OP's approach, the result is a data.frame column. It can be unpacked:

library(tidyr)
d1 |>
mutate(pro = sweep(cur_data(), 2, v1, `*`)) |>
unpack(pro, names_sep = "_")

-output

# A tibble: 3 × 6
c1 c2 c3 pro_c1 pro_c2 pro_c3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 4 7 1 8 21
2 2 5 8 2 10 24
3 3 6 9 3 12 27

EDIT: Added names_sep, based on @deschen's comments.
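If pulling in the tidyverse is not an option, a base R sketch of the same result might look like this. It reuses d1 and v1 from the earlier answer; pro and res are names introduced here for illustration:

```r
d1 <- data.frame(c1 = c(1, 2, 3), c2 = c(4, 5, 6), c3 = c(7, 8, 9))
v1 <- c(1, 2, 3)

# Multiply column j by v1[j]
pro <- sweep(d1, 2, v1, `*`)

# Prefix the product columns, then bind them next to the originals
names(pro) <- paste0("pro_", names(d1))
res <- cbind(d1, pro)
```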

Matrix multiplication of each row in data frame

You can use apply:

data %>% mutate(multiplication = apply(., 1, function(x) x %*% vector))
#> X1 X2 X3 multiplication
#> 1 1 3 5 6
#> 2 2 4 6 8
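Since a dot product per row is just a matrix-vector product, the row-wise apply can be replaced by a single %*% call. A sketch in base R; the answer does not show vector, so c(1, 0, 1) below is an assumption that happens to reproduce the printed output:

```r
# data matches the printed X1..X3 columns; vector is an assumed value
data <- data.frame(X1 = c(1, 2), X2 = c(3, 4), X3 = c(5, 6))
vector <- c(1, 0, 1)

# (2 x 3 matrix) %*% (length-3 vector) gives a 2 x 1 matrix;
# as.vector drops it to a plain column
data$multiplication <- as.vector(as.matrix(data) %*% vector)
```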

R - Multiply every row of df or matrix with a vector

Option#1: Using data.table features:

Note: this works only because each value in a matches its column number, so x can double as the column index into b.

a[,lapply(.SD,function(x)(x*b[,x]))]
# V1 V2 V3 V4
#1: 2 4 6 8
#2: 2 4 6 8
#3: 2 4 6 8

Option#2: could be:

t(t(b) * (as.matrix(a)[1,]))
#      [,1] [,2] [,3] [,4]
# [1,]    2    4    6    8
# [2,]    2    4    6    8
# [3,]    2    4    6    8

UPDATE

Option#3: To handle decimal/actual values in a

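# Cases where `a` contains decimal values can be handled by indexing columns by position
a <- data.table(t(c(1, 0.24, 3, 4)))
b <- matrix(data=2, nrow=3, ncol=4)

# seq_along(a) gives 1:4 regardless of the values stored in a
# (V1:V4 would only give 1:4 because V1 is 1 and V4 is 4 here)
a[,lapply(seq_along(a),function(i)(a[[i]]*b[,i]))]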
# V1 V2 V3 V4
#1: 2 0.48 6 8
#2: 2 0.48 6 8
#3: 2 0.48 6 8
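If data.table is not needed, the same products can be sketched in base R with sweep. Here a_vec is a hypothetical plain numeric vector standing in for the one-row data.table a:

```r
# Same values as Option#3, held in an ordinary vector
a_vec <- c(1, 0.24, 3, 4)
b <- matrix(data = 2, nrow = 3, ncol = 4)

# MARGIN = 2 multiplies column j of b by a_vec[j], i.e. every row by a_vec
res <- sweep(b, 2, a_vec, `*`)
```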

R sum of vector multiplied by each row of data frame

Try:

transform(df, prod = rowSums(sweep(df[, 3:5], 2, v, `*`)))

Output:

   prod sd  c1   c2   c3
1 0.175 NA 0.5 0.25 0.25
2 0.150 NA 0.5 0.50 0.00
3 0.200 NA 0.5 0.00 0.50
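Summing each row after scaling the columns is the same as a matrix-vector product, which gives a quick cross-check. In this sketch the c1..c3 values come from the printed output, but v is not shown in the answer, so c(0.1, 0.2, 0.3) is an assumption that is merely consistent with those results:

```r
df <- data.frame(prod = NA, sd = NA,
                 c1 = c(0.5, 0.5, 0.5),
                 c2 = c(0.25, 0.5, 0),
                 c3 = c(0.25, 0, 0.5))
v <- c(0.1, 0.2, 0.3)  # assumed weights

# Scale column j by v[j], then add up each row
by_sweep <- rowSums(sweep(df[, 3:5], 2, v, `*`))

# Equivalent formulation as a single matrix product
by_matmul <- as.vector(as.matrix(df[, 3:5]) %*% v)
```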

Most efficient way to multiply a data frame by a vector

You could try (using df and v from Richard Scriven's answer):

df[-1] <- t(t(df[-1]) * v)
df
# a x y z
# 1 a 5 40 105
# 2 b 10 50 120
# 3 c 15 60 135

When you multiply a matrix by a vector, R recycles the vector down the columns, so element i of the vector multiplies row i. Since you want element j of the vector to multiply column j instead, we transpose df[-1] using t, multiply by v, and transpose back using t.
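The recycling behaviour is easy to see on a tiny matrix. A minimal sketch with made-up values (m and v here are unrelated to the benchmark data):

```r
m <- matrix(1:6, nrow = 2)   # rows: 1 3 5 / 2 4 6
v <- c(10, 100, 1000)

# Plain multiplication recycles v down the columns in storage order,
# so multipliers do not line up with columns
wrong <- m * v

# Transpose, multiply, transpose back: column j is scaled by v[j]
right <- t(t(m) * v)
```

Here right has columns (10, 20), (300, 400) and (5000, 6000), while wrong mixes the multipliers across columns.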

It seems like this approach has a slight edge in benchmarking over the Map approach, and a significant advantage over sweep:

library(microbenchmark)
rscriven <- function(df, v) cbind(df[1], Map(`*`, df[-1], v))
josilber <- function(df, v) cbind(df[1], t(t(df[-1]) * v))
dardisco <- function(df, v) cbind(df[1], sweep(df[-1], MARGIN=2, STATS=v, FUN="*"))
df2 <- cbind(data.frame(rep("a", 1000)), matrix(rnorm(100000), nrow=1000))
v2 <- rnorm(100)
all.equal(rscriven(df2, v2), josilber(df2, v2))
# [1] TRUE
all.equal(rscriven(df2, v2), dardisco(df2, v2))
# [1] TRUE

microbenchmark(rscriven(df2, v2), josilber(df2, v2), dardisco(df2, v2))
# Unit: milliseconds
# expr min lq median uq max neval
# rscriven(df2, v2) 5.276458 5.378436 5.451041 5.587644 9.470207 100
# josilber(df2, v2) 2.545144 2.753363 3.099589 3.704077 8.955193 100
# dardisco(df2, v2) 11.647147 12.761184 14.196678 16.581004 132.428972 100

Thanks to @thelatemail for pointing out that the Map approach is a good deal faster for 100x larger data frames:

df2 <- cbind(data.frame(rep("a", 10000)), matrix(rnorm(10000000), nrow=10000))
v2 <- rnorm(1000)
microbenchmark(rscriven(df2, v2), josilber(df2, v2), dardisco(df2, v2))
# Unit: milliseconds
# expr min lq median uq max neval
# rscriven(df2, v2) 75.74051 90.20161 97.08931 115.7789 259.0855 100
# josilber(df2, v2) 340.72774 388.17046 498.26836 514.5923 623.4020 100
# dardisco(df2, v2) 928.81128 1041.34497 1156.39293 1271.4758 1506.0348 100

It seems like you'll need to benchmark to determine which approach is fastest for your application.

What is the right way to multiply a named vector by a dataframe?

How about this:

r <- mapply('*', df, v[names(df)])
# or equivalently: mapply(function(x,y) x*y, df, v[names(df)])

# A B
#[1,] 0 4
#[2,] 0 6
#[3,] 0 8
#[4,] 0 10
#[5,] 0 12

v[names(df)] returns the elements of v in the same order as the columns of df, so each column is multiplied by the element that shares its name.

If you want to have r as data frame, just do as.data.frame(r).

This is from ?mapply

mapply is a multivariate version of sapply. mapply applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary.

FUN is * in our setting.
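To make the name matching concrete, here is a small sketch. The question's df and v are not shown, so the values below are assumptions picked to reproduce the printed matrix; note that v is deliberately stored in a different order from the columns:

```r
# Assumed inputs (not from the original question)
df <- data.frame(A = 1:5, B = 2:6)
v <- c(B = 2, A = 0)

# v[names(df)] reorders v to A, B before mapply pairs it with the columns
r <- mapply(`*`, df, v[names(df)])

# Optional: convert the resulting matrix back to a data frame
r_df <- as.data.frame(r)
```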


