What is the right way to multiply data frame by vector?
This works too:
data.frame(mapply(`*`,df,v))
In that solution, you are taking advantage of the fact that a data.frame is a type of list, so you can iterate over the elements of both df and v at the same time with mapply.
Unfortunately, you are limited in what mapply can return: a simple list or a matrix. If your data are huge, this would likely be more efficient:
data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE))
With SIMPLIFY=FALSE, mapply returns a list rather than a matrix, and a list is more efficient to convert to a data.frame.
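The difference between the two return shapes can be checked directly. The following is a minimal sketch using made-up data (the `df` and `v` here are assumptions for illustration, not the data from the original question):

```r
# Hypothetical example data (not from the original answer)
df <- data.frame(x = 1:3, y = 4:6, z = 7:9)
v  <- c(1, 2, 3)

# Default SIMPLIFY = TRUE: equal-length results are bound into a matrix
m <- mapply(`*`, df, v)
is.matrix(m)

# SIMPLIFY = FALSE: the result stays a named list, one element per column
l <- mapply(`*`, df, v, SIMPLIFY = FALSE)
is.list(l)

# Either result can be turned into a data frame afterwards
res_matrix <- data.frame(m)
res_list   <- data.frame(l)
```

Whether skipping the intermediate matrix matters in practice depends on the size of the data, so it is worth timing both on your own inputs.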
Multiply columns in a data frame by a vector
Transposing the dataframe works.
c1 <- c(1,2,3)
c2 <- c(4,5,6)
c3 <- c(7,8,9)
d1 <- data.frame(c1,c2,c3)
v1 <- c(1,2,3)
t(t(d1)*v1)
# c1 c2 c3
#[1,] 1 8 21
#[2,] 2 10 24
#[3,] 3 12 27
EDIT: If not all columns are numeric, you can do the following:
c1 <- c(1,2,3)
c2 <- c(4,5,6)
c3 <- c(7,8,9)
d1 <- data.frame(c1,c2,c3)
# Adding a column of characters for demonstration
d1$c4 <- c("rr", "t", "s")
v1 <- c(1,2,3)
#Choosing only numeric columns
index <- which(sapply(d1, is.numeric) == TRUE)
d1_mat <- as.matrix(d1[,index])
d1[,index] <- t(t(d1_mat)*v1)
d1
# c1 c2 c3 c4
#1 1 8 21 rr
#2 2 10 24 t
#3 3 12 27 s
Multiply each column of a data frame by the corresponding value of a vector
Here is another option using sweep
sweep(dframe, 2, vector, "*")
# V1 V2 V3
#1 2 12 28
#2 4 15 32
#3 6 18 36
Or using col
dframe*vector[col(dframe)]
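Neither snippet defines its inputs. As a sketch, the `dframe` and `vector` below are assumed values chosen so that the result agrees with the output printed above; the originals may have been different.

```r
# Assumed inputs, chosen so the result matches the output shown above
dframe <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)
vector <- c(2, 3, 4)

# sweep() applies "*" along margin 2 (columns):
# column j is multiplied by vector[j]
by_sweep <- sweep(dframe, 2, vector, "*")

# col(dframe) is a matrix of column indices, so vector[col(dframe)]
# repeats each multiplier down its own column before the
# elementwise product
by_col <- dframe * vector[col(dframe)]
```

Both expressions give the same values; `sweep` states the intent (operate along columns) more explicitly, while the `col` version is plain elementwise arithmetic.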
Multiply each row of a dataframe by its vector in R
You can do this in a single mutate statement, using dplyr's powerful cur_data():
set.seed(2021)
x <- data.frame(age = c("one", "two", "three", "four", "five","one", "two", "three", "four", "five"),
replicate(10,sample(0:5,5,rep=TRUE)),
time = c("one", "two", "three", "four", "five","one", "two", "three", "four", "five"),
vector = c("1-2-9-4-5-1-5-6-1-2",
"3-2-3-4-5-2-6-6-1-2",
"1-2-4-4-2-4-5-4-2-1",
"9-2-3-1-5-5-5-3-1-2",
"1-1-3-4-5-1-5-6-3-2"))
library(tidyverse)
x %>% mutate(select(cur_data(), starts_with('X')) * t(map_dfc(strsplit(vector, '-'), as.numeric)))
#> age X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 time vector
#> 1 one 5 10 36 4 25 1 20 12 1 0 one 1-2-9-4-5-1-5-6-1-2
#> 2 two 15 10 0 8 5 4 0 6 4 0 two 3-2-3-4-5-2-6-6-1-2
#> 3 three 1 4 12 12 6 12 25 20 10 5 three 1-2-4-4-2-4-5-4-2-1
#> 4 four 27 10 6 4 20 20 5 15 2 8 four 9-2-3-1-5-5-5-3-1-2
#> 5 five 3 5 9 8 25 5 10 30 3 6 five 1-1-3-4-5-1-5-6-3-2
#> 6 one 5 10 36 4 25 1 20 12 1 0 one 1-2-9-4-5-1-5-6-1-2
#> 7 two 15 10 0 8 5 4 0 6 4 0 two 3-2-3-4-5-2-6-6-1-2
#> 8 three 1 4 12 12 6 12 25 20 10 5 three 1-2-4-4-2-4-5-4-2-1
#> 9 four 27 10 6 4 20 20 5 15 2 8 four 9-2-3-1-5-5-5-3-1-2
#> 10 five 3 5 9 8 25 5 10 30 3 6 five 1-1-3-4-5-1-5-6-3-2
Or even using across, as G. Grothendieck has suggested (which eliminates the need for cur_data()):
x %>% mutate(across(starts_with('X')) * t(map_dfc(strsplit(vector, '-'), as.numeric)))
tidyverse solution for multiplying columns by a vector
If it is by row, then one option is c_across
library(dplyr)
library(stringr)
library(tibble)
new <- as_tibble(setNames(as.list(v1), names(d1)))
d1 %>%
rowwise %>%
mutate(c_across(everything()) * new) %>%
rename_with(~ str_c("pro_", .x), everything()) %>%
bind_cols(d1, .)
-output
c1 c2 c3 pro_c1 pro_c2 pro_c3
1 1 4 7 1 8 21
2 2 5 8 2 10 24
3 3 6 9 3 12 27
Or another option is map2
library(purrr)
map2_dfc(d1, v1, `*`) %>%
rename_with(~ str_c("pro_", .x), everything()) %>%
bind_cols(d1, .)
-output
c1 c2 c3 pro_c1 pro_c2 pro_c3
1 1 4 7 1 8 21
2 2 5 8 2 10 24
3 3 6 9 3 12 27
Also, with the OP's approach, the result is a data.frame column. It can be unpacked:
library(tidyr)
d1 |>
mutate(pro = sweep(cur_data(), 2, v1, `*`)) |>
unpack(pro, names_sep = "_")
-output
# A tibble: 3 × 6
c1 c2 c3 pro_c1 pro_c2 pro_c3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 4 7 1 8 21
2 2 5 8 2 10 24
3 3 6 9 3 12 27
EDIT: Based on @deschen comments with names_sep
Matrix multiplication of each row in data frame
You can use apply:
data %>% mutate(multiplication = apply(., 1, function(x) x %*% vector))
#> X1 X2 X3 multiplication
#> 1 1 3 5 6
#> 2 2 4 6 8
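Since each row is combined with the same vector, the per-row dot products can also be computed as one matrix-vector product. A minimal sketch, assuming a numeric-only data frame; the `vector` here is a hypothetical choice (the original answer does not show it) that happens to give the 6 and 8 seen above:

```r
# Hypothetical inputs; the original answer does not show `data` or `vector`
data   <- data.frame(X1 = c(1, 2), X2 = c(3, 4), X3 = c(5, 6))
vector <- c(1, 0, 1)

# Row-by-row dot products, as in the apply() call above
by_apply <- apply(data, 1, function(x) x %*% vector)

# A single matrix-vector product yields the same values
# without looping over rows at the R level
by_matmul <- as.vector(as.matrix(data) %*% vector)
```

The matrix form only applies when every column involved is numeric; with mixed column types, `as.matrix` would produce a character matrix and `%*%` would fail.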
R - Multiply every row of df or matrix with a vector
Option#1: Using data.table features:
Note: this works only because each value in a equals its column number.
a[,lapply(.SD,function(x)(x*b[,x]))]
# V1 V2 V3 V4
#1: 2 4 6 8
#2: 2 4 6 8
#3: 2 4 6 8
Option#2: could be:
t(t(b) * (as.matrix(a)[1,]))
[,1] [,2] [,3] [,4]
[1,] 2 4 6 8
[2,] 2 4 6 8
[3,] 2 4 6 8
UPDATE
Option#3: To handle decimal/actual values in a
#Cases when `a` contains decimal values can be handled as
a <- data.table(t(c(1, 0.24, 3, 4)))
b <- matrix(data=2, nrow=3, ncol=4)
# Index by column position; V1:V4 would only work while V1 is 1 and V4 is 4
a[,lapply(seq_along(a),function(i)(a[[i]]*b[,i]))]
# V1 V2 V3 V4
#1: 2 0.48 6 8
#2: 2 0.48 6 8
#3: 2 0.48 6 8
R sum of vector multiplied by each row of data frame
Try:
transform(df, prod = rowSums(sweep(df[, 3:5], 2, v, `*`)))
Output:
prod sd c1 c2 c3
1 0.175 NA 0.5 0.25 0.25
2 0.150 NA 0.5 0.50 0.00
3 0.200 NA 0.5 0.00 0.50
Most efficient way to multiply a data frame by a vector
You could try (using df and v from Richard Scriven's answer):
df[-1] <- t(t(df[-1]) * v)
df
# a x y z
# 1 a 5 40 105
# 2 b 10 50 120
# 3 c 15 60 135
When you multiply a matrix by a vector, R recycles the vector down the columns, so when the lengths match, element i of the vector multiplies row i. Since you want element j of v to multiply column j instead, we transpose df[-1] using t, multiply by v, and transpose back using t.
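The recycling behaviour is easiest to see on a small matrix. This sketch uses made-up values (`m` and `v` are assumptions for illustration only):

```r
# Hypothetical 2 x 3 matrix and a length-3 multiplier
m <- matrix(1:6, nrow = 2)   # columns are (1,2), (3,4), (5,6)
v <- c(10, 20, 30)

# Plain m * v recycles v down the columns in column-major order,
# so the multipliers do not line up with the columns
unaligned <- m * v

# After transposing, v runs along the original columns;
# the outer t() restores the original shape
scaled <- t(t(m) * v)
scaled
#      [,1] [,2] [,3]
# [1,]   10   60  150
# [2,]   20   80  180
```

Here column 1 is multiplied by 10, column 2 by 20, and column 3 by 30, which is the per-column scaling the question asks for.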
This approach seems to have a slight edge in benchmarking over the Map approach, and a significant advantage over sweep:
library(microbenchmark)
rscriven <- function(df, v) cbind(df[1], Map(`*`, df[-1], v))
josilber <- function(df, v) cbind(df[1], t(t(df[-1]) * v))
dardisco <- function(df, v) cbind(df[1], sweep(df[-1], MARGIN=2, STATS=v, FUN="*"))
df2 <- cbind(data.frame(rep("a", 1000)), matrix(rnorm(100000), nrow=1000))
v2 <- rnorm(100)
all.equal(rscriven(df2, v2), josilber(df2, v2))
# [1] TRUE
all.equal(rscriven(df2, v2), dardisco(df2, v2))
# [1] TRUE
microbenchmark(rscriven(df2, v2), josilber(df2, v2), dardisco(df2, v2))
# Unit: milliseconds
# expr min lq median uq max neval
# rscriven(df2, v2) 5.276458 5.378436 5.451041 5.587644 9.470207 100
# josilber(df2, v2) 2.545144 2.753363 3.099589 3.704077 8.955193 100
# dardisco(df2, v2) 11.647147 12.761184 14.196678 16.581004 132.428972 100
Thanks to @thelatemail for pointing out that the Map approach is a good deal faster for 100x larger data frames:
df2 <- cbind(data.frame(rep("a", 10000)), matrix(rnorm(10000000), nrow=10000))
v2 <- rnorm(1000)
microbenchmark(rscriven(df2, v2), josilber(df2, v2), dardisco(df2, v2))
# Unit: milliseconds
# expr min lq median uq max neval
# rscriven(df2, v2) 75.74051 90.20161 97.08931 115.7789 259.0855 100
# josilber(df2, v2) 340.72774 388.17046 498.26836 514.5923 623.4020 100
# dardisco(df2, v2) 928.81128 1041.34497 1156.39293 1271.4758 1506.0348 100
It seems like you'll need to benchmark to determine which approach is fastest for your application.
What is the right way to multiply a named vector by a dataframe?
How about this:
r <- mapply('*', df, v[names(df)])
# or equivalently: mapply(function(x,y) x*y, df, v[names(df)])
# A B
#[1,] 0 4
#[2,] 0 6
#[3,] 0 8
#[4,] 0 10
#[5,] 0 12
v[names(df)] returns the elements of v in the same order as the columns of df, so each multiplier is matched to its column by name.
If you want r as a data frame, just use as.data.frame(r).
This is from ?mapply
mapply is a multivariate version of sapply. mapply applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary.
FUN is * in our setting.
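To see why the reordering by name matters, here is a small sketch. The `df` and `v` are hypothetical, picked so that the name-matched result reproduces the matrix shown above; the vector is deliberately stored in a different order than the columns.

```r
# Hypothetical data, picked to reproduce the matrix shown above
df <- data.frame(A = 1:5, B = 2:6)
v  <- c(B = 2, A = 0)   # deliberately not in the order of names(df)

# Reordering v by names(df) pairs each column with its own multiplier
r <- mapply(`*`, df, v[names(df)])

# Without reordering, mapply pairs elements by position,
# so column A would be multiplied by the value meant for B
r_pos <- mapply(`*`, df, v)
```

With the name-based lookup, column A is multiplied by 0 and column B by 2 regardless of how `v` is ordered.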