cartesian product with dplyr R
Use crossing() from the tidyr package:
library(tidyr)
x <- data.frame(x=c("a","b","c"))
y <- data.frame(y=c(1,2,3))
crossing(x, y)
Result:
x y
1 a 1
2 a 2
3 a 3
4 b 1
5 b 2
6 b 3
7 c 1
8 c 2
9 c 3
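As a minimal sketch of how crossing() relates to expand_grid() (the vector v is a made-up illustration, not from the question): crossing() de-duplicates and sorts its inputs, while expand_grid() keeps them as given.

```r
library(tidyr)

# Hypothetical input with a repeated, unsorted value
v <- c("b", "a", "b")

g  <- expand_grid(v = v, n = 1:2)  # keeps duplicates and input order
cr <- crossing(v = v, n = 1:2)     # de-duplicates and sorts each input

nrow(g)   # 3 * 2 = 6 rows
nrow(cr)  # 2 * 2 = 4 rows
```

If the inputs are already unique and ordered, the two functions give the same combinations.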
Cross Join in dplyr in R
You just need a dummy column to join on:
cust_time$k <- 1
cust_time %>%
inner_join(cust_time, by='k') %>%
select(-k)
Or if you don't want to modify your original dataframe:
cust_time %>%
mutate(k = 1) %>%
replicate(2, ., simplify=FALSE) %>%
Reduce(function(a, b) inner_join(a, b, by='k'), .) %>%
select(-k)
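Assuming dplyr 1.1.0 or later is installed, cross_join() gives the same result without a dummy column. The cust_time data below is a hypothetical stand-in, since the original data frame is not shown.

```r
library(dplyr)

# Hypothetical stand-in for cust_time from the question
cust_time <- data.frame(cust = c("A", "B"), time = c(10, 20))

# cross_join() pairs every row of x with every row of y;
# clashing column names receive the default .x / .y suffixes
res <- cross_join(cust_time, cust_time)

dim(res)    # 2 * 2 = 4 rows, 4 columns
names(res)  # cust.x, time.x, cust.y, time.y
```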
How to filter by the elements of a Cartesian product of columns in R with dplyr?
This is an inner join of df and key:
library(dplyr)
df %>%
inner_join(key)
# a b c
# 1 1 1 1
# 2 1 1 3
# 3 2 2 4
# 4 2 2 6
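The df and key objects are not shown in the answer, so here is a rough sketch with made-up data: key is built as the Cartesian product of the allowed values of a and b, and the join keeps only the rows of df whose (a, b) pair appears in it.

```r
library(dplyr)

# Hypothetical data; the real df and key are not given in the answer
df  <- data.frame(a = c(1L, 1L, 2L, 3L),
                  b = c(1L, 2L, 2L, 1L),
                  c = 1:4)
key <- expand.grid(a = 1:2, b = 1:2)  # Cartesian product of allowed values

# Naming the join columns explicitly avoids the "Joining with by" message
kept <- df %>% inner_join(key, by = c("a", "b"))

nrow(kept)  # 3: the row with a = 3 has no match in key
```

When key contains no duplicate rows, semi_join(df, key, by = c("a", "b")) filters df the same way without adding any columns from key.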
Cartesian product data frame
You can use expand.grid(A, B, C).
EDIT: An alternative to using do.call to achieve the second part is the function mdply from the package plyr:
library(plyr)
d = expand.grid(x = A, y = B, z = C)
d = mdply(d, f)
To illustrate its usage using a trivial function 'paste', you can try
d = mdply(d, 'paste', sep = '+');
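For the do.call route mentioned above, a base-R sketch might look like the following. A, B, C and f are not defined in the answer, so the values here are purely illustrative.

```r
# Hypothetical inputs; A, B, C and f are not defined in the original answer
A <- 1:2
B <- 3:4
C <- 5:6
f <- function(x, y, z) x + y * z

d <- expand.grid(x = A, y = B, z = C)  # 2 * 2 * 2 = 8 combinations

# Call f once per row, matching the columns of d to f's arguments by name
d$result <- do.call(mapply, c(list(FUN = f), d))

d$result[1]  # first row is x = 1, y = 3, z = 5, so 1 + 3 * 5 = 16
```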
Is it possible to use outer() to generate a Cartesian product?
There is no FUN argument in expand_grid. An option is to use mutate to create a new column and then reshape back to 'wide' with pivot_wider:
library(dplyr)
library(tidyr)
expand_grid(x, y) %>%
mutate(out = x + y) %>%
pivot_wider(names_from = y, values_from = out) %>%
select(-x) %>%
as.matrix %>%
`dimnames<-`(., NULL)
-output
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 3 4 5
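For comparison, base outer() takes the function directly. This is a minimal sketch assuming x <- 1:2 and y <- 1:3, which is inferred from the shape and values of the output above rather than stated in the answer.

```r
# Assumed inputs, inferred from the 2 x 3 output shown above
x <- 1:2
y <- 1:3

# outer() applies FUN to every (x[i], y[j]) pair and returns a matrix
m <- outer(x, y, FUN = "+")

dim(m)  # 2 rows, 3 columns
```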
Regarding the second question, it seems that the OP wanted to store each element of the matrix as a list:
out1 <- outer(x, y, FUN = Vectorize(function(x, y) list(c(x, y))))
-output
out1
[,1] [,2] [,3]
[1,] integer,2 integer,2 integer,2
[2,] integer,2 integer,2 integer,2
Cartesian product without duplicate pairs in R
For Cartesian joins with merge, pass NULL into the by argument:
merge(SaleItems, SaleItems2, by=NULL)
Then, to remove reverse duplicates while keeping one copy of each pair, extend it with subset:
subset(merge(SaleItems, SaleItems2, by=NULL),
Appliance <= Appliance2)
And if fields are factors:
subset(merge(SaleItems, SaleItems2, by=NULL),
as.character(Appliance) <= as.character(Appliance2))
# Appliance Appliance2
# 1 Radio Radio
# 2 Laptop Radio
# 4 Fridge Radio
# 6 Laptop Laptop
# 8 Fridge Laptop
# 9 Radio TV
# 10 Laptop TV
# 11 TV TV
# 12 Fridge TV
# 16 Fridge Fridge
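To try this end to end, here is a sketch with stand-in data. SaleItems and SaleItems2 are not shown in the answer, so the four appliances below are reconstructed from the printed output and should be treated as an assumption.

```r
# Hypothetical stand-ins reconstructed from the output above
SaleItems  <- data.frame(Appliance  = c("Radio", "Laptop", "TV", "Fridge"),
                         stringsAsFactors = FALSE)
SaleItems2 <- data.frame(Appliance2 = c("Radio", "Laptop", "TV", "Fridge"),
                         stringsAsFactors = FALSE)

all_pairs <- merge(SaleItems, SaleItems2, by = NULL)   # 4 * 4 = 16 rows
uniq      <- subset(all_pairs, Appliance <= Appliance2)

nrow(uniq)  # n * (n + 1) / 2 = 10 unordered pairs, self-pairs included
```

Using < instead of <= would also drop the self-pairs such as Radio/Radio.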
join each row to the whole second table in R dplyr
There is no need to join; we can use tidyr::expand_grid:
library(dplyr)
library(tidyr)
table1 <- tibble(a = c("a1", "a2"),
b = c("b1", "b2"))
table2 <- tibble(c = c("c1","c2"),
d = c("d1", "d2"))
expand_grid(table1, table2)
#> # A tibble: 4 x 4
#> a b c d
#> <chr> <chr> <chr> <chr>
#> 1 a1 b1 c1 d1
#> 2 a1 b1 c2 d2
#> 3 a2 b2 c1 d1
#> 4 a2 b2 c2 d2
Created on 2021-09-17 by the reprex package (v2.0.1)
How to do cross join in R?
Is it just all=TRUE?
x<-data.frame(id1=c("a","b","c"),vals1=1:3)
y<-data.frame(id2=c("d","e","f"),vals2=4:6)
merge(x,y,all=TRUE)
From the documentation of merge:
If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y, i.e., dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y)).
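The documented dimensions can be checked with the same x and y, passing by = NULL to make the cross join explicit. This is a small sketch of the check, not part of the original answer.

```r
x <- data.frame(id1 = c("a", "b", "c"), vals1 = 1:3)
y <- data.frame(id2 = c("d", "e", "f"), vals2 = 4:6)

# by = NULL requests the Cartesian product explicitly
r <- merge(x, y, by = NULL)

dim(r)  # 9 rows, 4 columns
# Matches the formula quoted from the documentation
identical(dim(r), c(nrow(x) * nrow(y), ncol(x) + ncol(y)))
```

Here all=TRUE happens to give the same rows only because x and y share no column names, which leaves by with length 0.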
Cartesian product with filter data.table
Use the by = .EACHI feature. In data.table, joins and subsets are very closely linked; i.e., a join is just another subset, using a data.table instead of the usual integer / logical / row names. They are designed this way with these cases in mind.
Subset-based joins allow you to incorporate j-expressions and grouping operations together while joining.
require(data.table)
dt[dt, .SD[contract != i.contract & value + i.value < 4L], by = .EACHI, allow.cartesian = TRUE]
This is the idiomatic way (in case you'd like to use the i.* cols just for the condition, but not return them as well); however, .SD has not yet been optimised, and evaluating the j-expression on .SD for each group is costly.
system.time(dt[dt, .SD[contract != i.contract & value + i.value < 4L], by = .EACHI, allow.cartesian = TRUE])
# user system elapsed
# 2.874 0.020 2.983
Some cases using .SD have already been optimised. Until the remaining cases are taken care of, you can work around it this way:
dt[dt, {
idx = contract != i.contract & value + i.value < 4L
list(contract = contract[idx],
value = value[idx],
i.contract = i.contract[any(idx)],
i.value = i.value[any(idx)]
)
}, by = .EACHI, allow.cartesian = TRUE]
And this takes 0.045 seconds, as opposed to 0.005 seconds from your method. But by = .EACHI evaluates the j-expression separately for each row of i (and is therefore memory efficient). That's the trade-off you'll have to accept.