Cartesian Product with Dplyr R

Use crossing() from the tidyr package:

library(tidyr)

x <- data.frame(x = c("a", "b", "c"))
y <- data.frame(y = c(1, 2, 3))

# every row of x combined with every row of y
crossing(x, y)

Result:

   x y
1 a 1
2 a 2
3 a 3
4 b 1
5 b 2
6 b 3
7 c 1
8 c 2
9 c 3
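For comparison, base R can produce the same nine rows without tidyr. This is a minimal sketch that reuses the x and y above. merge() with by = NULL returns the Cartesian product, and the rows are then sorted to match the order crossing() prints. Note that crossing() also de-duplicates and sorts its inputs, which merge() does not do.

```r
x <- data.frame(x = c("a", "b", "c"), stringsAsFactors = FALSE)
y <- data.frame(y = c(1, 2, 3))

# by = NULL: no key columns, so every row of x is paired with every row of y
xy <- merge(x, y, by = NULL)

# merge() varies x fastest; sort by x, then y, to match crossing()'s output
xy <- xy[order(xy$x, xy$y), ]
rownames(xy) <- NULL
print(xy)
```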

Cross Join in dplyr in R

You just need a dummy column to join on:

library(dplyr)

# add a constant dummy key, then join the table to itself on it
cust_time$k <- 1
cust_time %>%
  inner_join(cust_time, by = 'k') %>%
  select(-k)

Or, if you don't want to modify your original data frame:

cust_time %>%
  mutate(k = 1) %>%
  replicate(2, ., simplify = FALSE) %>%   # two copies of the keyed table
  Reduce(function(a, b) inner_join(a, b, by = 'k'), .) %>%
  select(-k)
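Since dplyr 1.1.0 there is a dedicated verb for this, cross_join(), and it needs no dummy column. This sketch assumes dplyr 1.1.0 or later. The cust_time data below is a hypothetical stand-in for the question's table.

```r
library(dplyr)  # cross_join() was added in dplyr 1.1.0

# Hypothetical stand-in for cust_time
cust_time <- data.frame(customer = c("c1", "c2", "c3"),
                        time = c(10, 20, 30))

# Pair every row with every row; duplicated column names get .x / .y suffixes
pairs <- cross_join(cust_time, cust_time)
print(pairs)
```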

How to filter by the elements of a Cartesian product of columns in R with dplyr?

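This is just an inner join of df and key. Joining on their shared columns keeps only the rows of df whose combination of values appears in key.

library(dplyr)

# with no `by`, inner_join() joins on all columns the two tables share
df %>%
  inner_join(key)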

# a b c
# 1 1 1 1
# 2 1 1 3
# 3 2 2 4
# 4 2 2 6
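The same filter can be written in base R. In this sketch, df and key are hypothetical, chosen only so that the result matches the output above. merge() without by joins on the shared columns a and b, which acts as an inner join.

```r
# Hypothetical data: key lists the (a, b) combinations to keep
df  <- data.frame(a = c(1, 1, 1, 2, 2, 2),
                  b = c(1, 2, 1, 2, 1, 2),
                  c = 1:6)
key <- data.frame(a = c(1, 2), b = c(1, 2))

# Inner join on the common columns a and b
res <- merge(df, key)
print(res)
```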

Cartesian product data frame

You can use expand.grid(A, B, C).

EDIT: for the second part (applying a function f to every row of the grid), an alternative to using do.call is the function mdply from the plyr package:

library(plyr)

d <- expand.grid(x = A, y = B, z = C)
d <- mdply(d, f)   # calls f(x = ..., y = ..., z = ...) once per row

To illustrate with a trivial function, try paste in place of f:

d <- mdply(d, 'paste', sep = '+')
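For reference, the do.call route that the EDIT mentions can be sketched in base R with do.call() and mapply(), which call f once per row of the grid. A, B, C and f are hypothetical placeholders here. This is one way to do it, not necessarily the original answer's code.

```r
# Hypothetical inputs
A <- 1:2
B <- c("x", "y")
C <- c(TRUE, FALSE)
f <- function(x, y, z) paste(x, y, z, sep = "+")

# All 2 * 2 * 2 = 8 combinations; the first column varies fastest
d <- expand.grid(x = A, y = B, z = C, stringsAsFactors = FALSE)

# Call f(x = ..., y = ..., z = ...) once per row
d$out <- do.call(mapply, c(list(FUN = f), as.list(d)))
print(d)
```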

Is it possible to use outer() to generate a Cartesian product?

expand_grid() has no FUN argument. One option is to compute the new column with mutate() and then reshape back to wide format with pivot_wider():

library(dplyr)
library(tidyr)
expand_grid(x, y) %>%
  mutate(out = x + y) %>%                              # one row per (x, y) pair
  pivot_wider(names_from = y, values_from = out) %>%  # one column per y value
  select(-x) %>%
  as.matrix() %>%
  `dimnames<-`(., NULL)

Output:

     [,1] [,2] [,3]
[1,]    2    3    4
[2,]    3    4    5
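Because + is already vectorised, outer() itself returns this matrix directly. This sketch assumes x <- 1:2 and y <- 1:3, which is what the output above implies.

```r
x <- 1:2
y <- 1:3

# outer() evaluates FUN on every (x[i], y[j]) pair and returns a
# length(x) by length(y) matrix
m <- outer(x, y, `+`)
print(m)
```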

Regarding the second question, it seems the OP wanted a matrix whose cells each hold the pair c(x, y), stored as list elements:

out1 <- outer(x, y, FUN = Vectorize(function(x, y) list(c(x, y))))

Output:

out1
     [,1]      [,2]      [,3]
[1,] integer,2 integer,2 integer,2
[2,] integer,2 integer,2 integer,2
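Each cell of out1 holds one (x, y) pair. The following sketch again assumes x <- 1:2 and y <- 1:3. It shows how to read a single cell with [[ and how to stack all pairs into a two-column matrix.

```r
x <- 1:2
y <- 1:3
out1 <- outer(x, y, FUN = Vectorize(function(x, y) list(c(x, y))))

# A single cell: the pair (x[2], y[3])
cell <- out1[[2, 3]]
print(cell)

# c() drops the dim attribute; rbind() then stacks the pairs in column-major order
pairs <- do.call(rbind, c(out1))
print(pairs)
```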

Cartesian product without duplicate pairs in R

For a Cartesian join with merge(), pass NULL as the by argument:

merge(SaleItems, SaleItems2, by=NULL)

Then, to keep only one of each reversed pair (A, B) / (B, A), filter the result with subset(). Using <= keeps self-pairs such as Radio/Radio; use < to drop those as well.

subset(merge(SaleItems, SaleItems2, by = NULL),
       Appliance <= Appliance2)

If the columns are factors, convert them to character first, because <= is not meaningful for unordered factors:

subset(merge(SaleItems, SaleItems2, by = NULL),
       as.character(Appliance) <= as.character(Appliance2))

#    Appliance Appliance2
# 1      Radio      Radio
# 2     Laptop      Radio
# 4     Fridge      Radio
# 6     Laptop     Laptop
# 8     Fridge     Laptop
# 9      Radio         TV
# 10    Laptop         TV
# 11        TV         TV
# 12    Fridge         TV
# 16    Fridge     Fridge
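Here is a self-contained sketch of the same idea. The SaleItems tables are reconstructed from the printed output above (the same four appliances in both) and are assumptions. Setting stringsAsFactors = FALSE keeps the comparison on character values.

```r
# Assumed inputs, inferred from the output above
SaleItems  <- data.frame(Appliance  = c("Radio", "Laptop", "TV", "Fridge"),
                         stringsAsFactors = FALSE)
SaleItems2 <- data.frame(Appliance2 = c("Radio", "Laptop", "TV", "Fridge"),
                         stringsAsFactors = FALSE)

# 4 x 4 = 16 ordered pairs
all_pairs <- merge(SaleItems, SaleItems2, by = NULL)

# Keep one of each {A, B} pair: 4 * 5 / 2 = 10 rows, including self-pairs
with_self <- subset(all_pairs, Appliance <= Appliance2)

# The strict comparison also removes Radio/Radio and the like: 4 * 3 / 2 = 6 rows
no_self <- subset(all_pairs, Appliance < Appliance2)
print(with_self)
```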

join each row to the whole second table in R dplyr

There is no need for a join; tidyr::expand_grid() does this directly:

library(dplyr)
library(tidyr)

table1 <- tibble(a = c("a1", "a2"),
                 b = c("b1", "b2"))

table2 <- tibble(c = c("c1", "c2"),
                 d = c("d1", "d2"))

expand_grid(table1, table2)
#> # A tibble: 4 x 4
#>   a     b     c     d
#>   <chr> <chr> <chr> <chr>
#> 1 a1    b1    c1    d1
#> 2 a1    b1    c2    d2
#> 3 a2    b2    c1    d1
#> 4 a2    b2    c2    d2

Created on 2021-09-17 by the reprex package (v2.0.1)
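Base R gives the same four combinations with merge() and by = NULL. This sketch uses the same two tables as plain data frames. The row order differs because merge() cycles through the rows of table1 fastest.

```r
table1 <- data.frame(a = c("a1", "a2"), b = c("b1", "b2"),
                     stringsAsFactors = FALSE)
table2 <- data.frame(c = c("c1", "c2"), d = c("d1", "d2"),
                     stringsAsFactors = FALSE)

# No shared key: every row of table1 is paired with every row of table2
res <- merge(table1, table2, by = NULL)
print(res)
```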

How to do cross join in R?

Is it just all = TRUE?

x <- data.frame(id1 = c("a", "b", "c"), vals1 = 1:3)
y <- data.frame(id2 = c("d", "e", "f"), vals2 = 4:6)
merge(x, y, all = TRUE)

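Not quite. all = TRUE only controls whether unmatched rows are kept (an outer join). The call above returns a cross join because x and y have no column names in common, so the default by = intersect(names(x), names(y)) is empty. From the documentation of merge: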

If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y, i.e., dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y)).
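Passing by = NULL makes the intent explicit instead of relying on the two tables having no column names in common. With the x and y from the question, the dimensions follow the formula quoted above:

```r
x <- data.frame(id1 = c("a", "b", "c"), vals1 = 1:3)
y <- data.frame(id2 = c("d", "e", "f"), vals2 = 4:6)

# Explicit cross join: no key columns, so r has nrow(x) * nrow(y) rows
r <- merge(x, y, by = NULL)
dim(r)  # 9 4
```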

Cartesian product with filter data.table

Use the by = .EACHI feature. In data.table, joins and subsets are closely linked: a join is just another subset, using a data.table as the index instead of the usual integer, logical or row-name index. They were designed this way with cases like this in mind.

Subset-based joins let you combine j-expressions and grouping operations with the join itself.

library(data.table)
# dt is assumed to be keyed (setkey) as in the question; otherwise pass on =
dt[dt, .SD[contract != i.contract & value + i.value < 4L], by = .EACHI, allow.cartesian = TRUE]

This is the idiomatic way if you want to use the i.* columns only in the condition, without returning them. However, .SD has not yet been optimised for this case, so evaluating the j-expression on .SD for every group is costly:

system.time(dt[dt, .SD[contract != i.contract & value + i.value < 4L], by = .EACHI, allow.cartesian = TRUE])
# user system elapsed
# 2.874 0.020 2.983

Some uses of .SD have already been optimised. Until this one is, you can work around it like this:

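dt[dt, {
  # logical index over the x rows matched by the current row of i
  idx = contract != i.contract & value + i.value < 4L
  list(contract   = contract[idx],
       value      = value[idx],
       i.contract = i.contract[any(idx)],  # kept only if any row matched
       i.value    = i.value[any(idx)]
  )
}, by = .EACHI, allow.cartesian = TRUE]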

This takes 0.045 seconds, compared with 0.005 seconds for your method. However, by = .EACHI evaluates the j-expression once per row of i, which keeps it memory efficient. That's the trade-off you'll have to accept.
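The following base R sketch makes the filter concrete by showing what the j-expression computes. Each row is paired with every row that shares its key, and the result keeps pairs with different contracts whose values sum to less than 4. The grp key column and the data are hypothetical, since the question's table is not shown. The .i suffix stands in for data.table's i. prefix.

```r
# Hypothetical data: grp plays the role of dt's key
dt <- data.frame(grp      = c(1, 1, 1, 2, 2),
                 contract = c("A", "B", "C", "A", "B"),
                 value    = c(1, 2, 3, 1, 1),
                 stringsAsFactors = FALSE)

# Self-join on the key: all pairs of rows within each grp
pairs <- merge(dt, dt, by = "grp", suffixes = c("", ".i"))

# Same condition as the j-expression: different contract, value sum below 4
res <- subset(pairs, contract != contract.i & value + value.i < 4)
print(res)
```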
