Cartesian Product with Dplyr R

Use crossing() from the tidyr package:

library(tidyr)

x <- data.frame(x = c("a", "b", "c"))
y <- data.frame(y = c(1, 2, 3))

crossing(x, y)

The result has one row for each of the nine combinations of x and y.
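As a quick check on the example above, the following sketch counts the rows and contrasts crossing() with expand_grid(), which tidyr documents as the same operation without de-duplicating or sorting the inputs. The vectors v and w are made up for illustration.

```r
library(tidyr)

x <- data.frame(x = c("a", "b", "c"))
y <- data.frame(y = c(1, 2, 3))

res <- crossing(x, y)
nrow(res)  # 3 * 3 combinations

# Illustrative inputs (not from the original post): v contains a duplicate.
dedup <- crossing(v = c(2, 1, 1), w = "k")     # duplicates dropped, v sorted
full  <- expand_grid(v = c(2, 1, 1), w = "k")  # inputs used as given
```

Use expand_grid() when the input order or repeated values matter, and crossing() when you want each distinct combination once.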

Cross Join in dplyr in R

You just need a dummy column to join on:

library(dplyr)

cust_time$k <- 1
cust_time %>%
  inner_join(cust_time, by = 'k') %>%
  select(-k)

Or if you don't want to modify your original dataframe:

cust_time %>%
  mutate(k = 1) %>%
  replicate(2, ., simplify=FALSE) %>%
  Reduce(function(a, b) inner_join(a, b, by='k'), .) %>%
  select(-k)
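
If your dplyr is version 1.1.0 or later, cross_join() does this directly, with no dummy column. A minimal sketch, assuming a small made-up cust_time (the column names cust and time are hypothetical):

```r
library(dplyr)

# Hypothetical stand-in for cust_time; the real columns are not shown above.
cust_time <- data.frame(cust = c("A", "B"), time = c(1, 2))

# Every row of cust_time paired with every row of cust_time.
cj <- cross_join(cust_time, cust_time)

# The dummy-column approach, for comparison. relationship = "many-to-many"
# tells dplyr >= 1.1.0 that the row multiplication is intended.
dummy <- cust_time %>%
  mutate(k = 1) %>%
  inner_join(mutate(cust_time, k = 1), by = "k",
             relationship = "many-to-many") %>%
  select(-k)

nrow(cj)   # nrow(cust_time)^2
names(cj)  # shared column names get the .x / .y suffixes
```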

How to filter by the elements of a Cartesian product of columns in R with dplyr?

This is an inner join of df and key. With no by argument, inner_join() joins on all the columns the two data frames share:

library(dplyr)

df %>% 
  inner_join(key)

#   a b c
# 1 1 1 1
# 2 1 1 3
# 3 2 2 4
# 4 2 2 6
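
The post does not show df or key, so here is a sketch with made-up data chosen to give output of the same shape. It also shows semi_join(), which filters df by key without adding columns or duplicating rows, assuming key only serves as a filter.

```r
library(dplyr)

# Hypothetical data; the original df and key are not shown.
df  <- data.frame(a = c(1, 1, 1, 2, 2, 2),
                  b = c(1, 2, 1, 2, 1, 2),
                  c = 1:6)
key <- data.frame(a = c(1, 2), b = c(1, 2))  # allowed (a, b) pairs

# Keep only the rows of df whose (a, b) pair appears in key.
joined   <- inner_join(df, key, by = c("a", "b"))
filtered <- semi_join(df, key, by = c("a", "b"))
```

The two results match here because key has no extra columns and no repeated pairs; if key could contain repeats, semi_join() is the safer filter.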

Cartesian product data frame

You can use expand.grid(A, B, C).

EDIT: An alternative to using do.call for the second part is the function mdply from the plyr package:

library(plyr)

d = expand.grid(x = A, y = B, z = C)
d = mdply(d, f)

To illustrate its usage with a trivial function, paste, you can try:

d = mdply(d, 'paste', sep = '+')
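
A, B, C and f are not defined in the answer above. The sketch below fills them in with made-up values and uses base R's mapply() instead of plyr, which is one way to get the same row-wise application without loading another package.

```r
# Hypothetical inputs and function, for illustration only.
A <- 1:2
B <- c(10, 20)
C <- 0:1
f <- function(x, y, z) x + y + z

# One row per combination; the first variable varies fastest.
d <- expand.grid(x = A, y = B, z = C)

# Call f once per row, passing the columns as parallel arguments.
d$value <- with(d, mapply(f, x, y, z))
```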

Is it possible to use outer() to generate a Cartesian product?

There is no FUN argument in expand_grid. One option is to use mutate to create a new column and then reshape back to 'wide' with pivot_wider:

library(dplyr)
library(tidyr)
expand_grid(x, y) %>% 
     mutate(out = x + y) %>% 
     pivot_wider(names_from = y, values_from = out) %>%
     select(-x) %>%
     as.matrix %>% 
     `dimnames<-`(., NULL)

-output

      [,1] [,2] [,3]
[1,]    2    3    4
[2,]    3    4    5

Regarding the second question, it seems that the OP wanted to store each element of the matrix as a list:

out1 <- outer(x, y, FUN = Vectorize(function(x, y) list(c(x, y))))

-output

out1
     [,1]      [,2]      [,3]     
[1,] integer,2 integer,2 integer,2
[2,] integer,2 integer,2 integer,2
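
The answer does not define x and y; the printed output is consistent with x <- 1:2 and y <- 1:3, which is the assumption made here. Under that assumption, this sketch shows the plain outer() call that the pipeline reproduces and how to pull one pair out of the list matrix.

```r
# Assumed inputs, inferred from the output shown above.
x <- 1:2
y <- 1:3

# For a vectorised function, outer() builds the matrix directly.
m <- outer(x, y, FUN = "+")

# A list matrix: each cell holds the pair c(x[i], y[j]).
out1 <- outer(x, y, FUN = Vectorize(function(x, y) list(c(x, y))))
pair <- out1[[1, 2]]  # the cell for x[1] and y[2]
```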

Cartesian product without duplicate pairs in R

For a Cartesian join with merge, pass NULL to the by argument:

merge(SaleItems, SaleItems2, by=NULL)

Then to remove equivalent matches and reverse duplicates, extend it with subset:

subset(merge(SaleItems, SaleItems2, by=NULL),
       Appliance <= Appliance2)

And if the fields are factors:

subset(merge(SaleItems, SaleItems2, by=NULL),
       as.character(Appliance) <= as.character(Appliance2))

#    Appliance Appliance2
# 1      Radio      Radio
# 2     Laptop      Radio
# 4     Fridge      Radio
# 6     Laptop     Laptop
# 8     Fridge     Laptop
# 9      Radio         TV
# 10    Laptop         TV
# 11        TV         TV
# 12    Fridge         TV
# 16    Fridge     Fridge
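
The input tables are not shown above. A sketch with made-up single-column tables, whose values are taken from the printed output, confirms the counts: the full product has n * n rows and the filtered one n * (n + 1) / 2.

```r
# Hypothetical inputs; values copied from the output above.
SaleItems  <- data.frame(Appliance  = c("Radio", "Laptop", "TV", "Fridge"))
SaleItems2 <- data.frame(Appliance2 = c("Radio", "Laptop", "TV", "Fridge"))

# Full Cartesian product: every Appliance with every Appliance2.
all_pairs <- merge(SaleItems, SaleItems2, by = NULL)

# Keep one ordering of each pair (self-pairs included).
unique_pairs <- subset(all_pairs,
                       as.character(Appliance) <= as.character(Appliance2))

nrow(all_pairs)     # 4 * 4
nrow(unique_pairs)  # 4 * 5 / 2
```

Using < instead of <= would also drop the self-pairs such as Radio / Radio.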

Join each row to the whole second table in R dplyr

There is no need to join, we can use tidyr::expand_grid:

library(dplyr)
library(tidyr)

table1 <- tibble(a = c("a1", "a2"),
                 b = c("b1", "b2"))

table2 <- tibble(c = c("c1","c2"),
                 d = c("d1", "d2"))

expand_grid(table1, table2)
#> # A tibble: 4 x 4
#>   a     b     c     d    
#>   <chr> <chr> <chr> <chr>
#> 1 a1    b1    c1    d1   
#> 2 a1    b1    c2    d2   
#> 3 a2    b2    c1    d1   
#> 4 a2    b2    c2    d2

Created on 2021-09-17 by the reprex package (v2.0.1)
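
expand_grid() also accepts a mix of data frames and plain vectors, which helps when each row of a table should be repeated for a set of values. A short sketch; the run column is made up for illustration:

```r
library(dplyr)
library(tidyr)

table1 <- tibble(a = c("a1", "a2"),
                 b = c("b1", "b2"))

# Rows of table1 stay intact; each is paired with every value of run.
res <- expand_grid(table1, run = 1:3)
```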

How to do cross join in R?

Is it just all=TRUE? It works here, but only because x and y share no column names, so by has length zero:

x<-data.frame(id1=c("a","b","c"),vals1=1:3)
y<-data.frame(id2=c("d","e","f"),vals2=4:6)
merge(x,y,all=TRUE)

From documentation of merge:

If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y, i.e., dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y)).
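
To make the intent explicit rather than relying on the two data frames having no column names in common, you can pass by = NULL. A short check against the dimension formula quoted above:

```r
x <- data.frame(id1 = c("a", "b", "c"), vals1 = 1:3)
y <- data.frame(id2 = c("d", "e", "f"), vals2 = 4:6)

# by = NULL requests the Cartesian product regardless of column names.
r <- merge(x, y, by = NULL)

# Compare with the documented shape of a Cartesian product.
expected <- c(nrow(x) * nrow(y), ncol(x) + ncol(y))
all(dim(r) == expected)
```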

Cartesian product with filter data.table

Use the by = .EACHI feature. In data.table, joins and subsets are very closely linked; i.e., a join is just another subset, one that uses a data.table as i instead of the usual integer / logical / row names. They are designed this way with these cases in mind.

Subset-based joins let you combine j-expressions and grouping operations with the join itself.

library(data.table)
dt[dt, .SD[contract != i.contract & value + i.value < 4L], by = .EACHI, allow.cartesian = TRUE]

This is the idiomatic way (in case you'd like to use the i.* columns only in the condition and not return them as well). However, .SD has not yet been optimised for this case, and evaluating the j-expression on .SD for each group is costly.

system.time(dt[dt, .SD[contract != i.contract & value + i.value < 4L], by = .EACHI, allow.cartesian = TRUE])
#    user  system elapsed 
#   2.874   0.020   2.983

Some cases using .SD have already been optimised. Until this one is taken care of, you can work around it this way:

dt[dt, {
        idx = contract != i.contract & value + i.value < 4L
        list(contract = contract[idx],
             value = value[idx], 
             i.contract = i.contract[any(idx)],
             i.value = i.value[any(idx)]
        )
       }, by = .EACHI, allow.cartesian = TRUE]

This takes 0.045 seconds, as opposed to 0.005 seconds for your method. But by = .EACHI evaluates the j-expression separately for each row of i (and is therefore memory efficient). That's the trade-off you'll have to accept.
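
For contrast, a straightforward alternative builds the full Cartesian product first and filters afterwards, holding every pair in memory at once. A minimal sketch, assuming a tiny made-up dt and a constant join column k (both hypothetical; the original dt and its key are not shown):

```r
library(data.table)

# Hypothetical data for illustration.
dt <- data.table(contract = c("a", "b", "c"), value = c(1L, 2L, 3L))
dt[, k := 1L]  # constant column so every row matches every row

# Materialise all nrow(dt)^2 pairs; columns from i get the i. prefix.
pairs <- dt[dt, on = "k", allow.cartesian = TRUE]

# Then apply the same condition as above.
res <- pairs[contract != i.contract & value + i.value < 4L]
```

On small tables this is simple and fast; the by = .EACHI form avoids materialising the intermediate pairs table when it would be too large.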