How to do cross join in R?
Is it just all=TRUE?
x<-data.frame(id1=c("a","b","c"),vals1=1:3)
y<-data.frame(id2=c("d","e","f"),vals2=4:6)
merge(x,y,all=TRUE)
From the documentation of merge:
If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y, i.e., dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y)).
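The all=TRUE call in the question yields the Cartesian product only because x and y share no column names, so the default by (the intersection of the names) is empty. As a minimal sketch of what the quoted documentation implies, passing by = NULL makes the zero-length by explicit; the name r is only for illustration.

```r
x <- data.frame(id1 = c("a", "b", "c"), vals1 = 1:3)
y <- data.frame(id2 = c("d", "e", "f"), vals2 = 4:6)

# by = NULL is a zero-length 'by', so merge() returns the Cartesian product
r <- merge(x, y, by = NULL)

dim(r)  # nrow(x) * nrow(y) rows, ncol(x) + ncol(y) columns
```

Unlike relying on all=TRUE, this form should still give a cross join when the two data frames happen to have a column name in common.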
R: data.table cross-join not working
There is no cross join functionality for two data.tables available in data.table out of the box.
However, the optiRum package (available on CRAN) provides CJ.dt, a CJ-like function designed for data.tables, which computes the Cartesian product (cross join).
You can create the function:
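CJ.dt = function(X, Y) {
  stopifnot(is.data.table(X), is.data.table(Y))
  k = NULL                      # avoids an R CMD check note about the undefined variable k
  X = X[, c(k=1, .SD)]          # add a constant helper column k to X
  setkey(X, k)
  Y = Y[, c(k=1, .SD)]          # add the same helper column as the first column of Y
  setkey(Y, NULL)
  # join on k (every row matches every row), then drop the helper column
  X[Y, allow.cartesian=TRUE][, k := NULL][]
}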
CJ.dt(dtCustomers, dtDates1)
CJ.dt(dtCustomers, dtDates2)
There is also a feature request for a more convenient way to perform a cross join, filed as data.table#1717, so you can check there whether a nicer API for cross joins has become available.
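The calls above rely on tables that are not shown in this answer. A minimal sketch of how CJ.dt behaves, assuming two small hypothetical tables (the column names customer and date and their contents are made up for illustration), might look like this:

```r
library(data.table)

# Same helper as above, repeated so this block runs on its own
CJ.dt <- function(X, Y) {
  stopifnot(is.data.table(X), is.data.table(Y))
  k <- NULL
  X <- X[, c(k = 1, .SD)]
  setkey(X, k)
  Y <- Y[, c(k = 1, .SD)]
  setkey(Y, NULL)
  X[Y, allow.cartesian = TRUE][, k := NULL][]
}

# Hypothetical inputs; the real dtCustomers and dtDates1 are not shown above
dtCustomers <- data.table(customer = c("c1", "c2", "c3"))
dtDates1 <- data.table(date = as.Date("2020-01-01") + 0:1)

res <- CJ.dt(dtCustomers, dtDates1)

# Every customer is paired with every date
nrow(res) == nrow(dtCustomers) * nrow(dtDates1)
```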
Cartesian Product using data.table package
If you first construct full names from the first and last names in the cust data frame, you can then use CJ (cross join). You cannot pass all three vectors separately, since there would be 99 rows and the first names would get inappropriately mixed with the last names.
> nrow(CJ(dates$date, cust$first.name, cust$last.name ) )
[1] 99
This returns the desired data.table object:
> CJ(dates$date,paste(cust$first.name, cust$last.name) )
V1 V2
1: 2012-08-28 George Smith
2: 2012-08-28 Henry Smith
3: 2012-08-28 John Doe
4: 2012-08-29 George Smith
5: 2012-08-29 Henry Smith
6: 2012-08-29 John Doe
7: 2012-08-30 George Smith
8: 2012-08-30 Henry Smith
9: 2012-08-30 John Doe
10: 2012-08-31 John Doe
11: 2012-08-31 George Smith
12: 2012-08-31 Henry Smith
13: 2012-09-01 John Doe
14: 2012-09-01 George Smith
15: 2012-09-01 Henry Smith
16: 2012-09-02 George Smith
17: 2012-09-02 Henry Smith
18: 2012-09-02 John Doe
19: 2012-09-03 Henry Smith
20: 2012-09-03 John Doe
21: 2012-09-03 George Smith
22: 2012-09-04 Henry Smith
23: 2012-09-04 John Doe
24: 2012-09-04 George Smith
25: 2012-09-05 George Smith
26: 2012-09-05 Henry Smith
27: 2012-09-05 John Doe
28: 2012-09-06 George Smith
29: 2012-09-06 Henry Smith
30: 2012-09-06 John Doe
31: 2012-09-07 George Smith
32: 2012-09-07 Henry Smith
33: 2012-09-07 John Doe
V1 V2
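The same idea as a self-contained sketch, assuming hypothetical stand-ins for the dates and cust tables (only three dates are used here to keep it short). Naming the arguments to CJ also gives the result meaningful column names instead of V1 and V2:

```r
library(data.table)

# Hypothetical stand-ins for the dates and cust tables in the question
dates <- data.table(date = as.Date("2012-08-28") + 0:2)
cust <- data.table(first.name = c("George", "Henry", "John"),
                   last.name  = c("Smith", "Smith", "Doe"))

# Combine first and last name so each customer is one value, then cross join
res <- CJ(date = dates$date, name = paste(cust$first.name, cust$last.name))

nrow(res)  # one row per (date, customer) pair
```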
Cross join in data.table doesn't seem to retain column names
That names are retained is not mentioned in the main body of the help file ?CJ, that is, in the Details or Value sections. However, a comment in the Examples section of the help file does mention that names are retained (and it looks like this is where you got your example).
Digging around in the CJ function, which appears to be entirely implemented in R, there is a block near the end:
if (getOption("datatable.CJ.names", FALSE))
vnames = name_dots(...)$vnames
Running getOption("datatable.CJ.names", FALSE) returns FALSE with data.table version 1.12.0. When we set this to TRUE with
options("datatable.CJ.names"=TRUE)
then the code
x = c(1,1,2)
y = c(4,6,4)
CJ(x, y)
returns
x y
1: 1 4
2: 1 4
3: 1 4
4: 1 4
5: 1 6
6: 1 6
7: 2 4
8: 2 4
9: 2 6
However, you are also able to directly provide names (which is not mentioned in the help file).
CJ(uu=x, vv=y)
which returns
uu vv
1: 1 4
2: 1 4
3: 1 4
4: 1 4
5: 1 6
6: 1 6
7: 2 4
8: 2 4
9: 2 6
Note that this overrides the above option.
R data.table cross-join by three variables
You can also do this:
data[, .(date=dates_wanted), .(group,id)]
Output:
group id date
1: A frank 2020-01-01
2: A frank 2020-01-02
3: A frank 2020-01-03
4: A frank 2020-01-04
5: A frank 2020-01-05
---
120: B edward 2020-01-27
121: B edward 2020-01-28
122: B edward 2020-01-29
123: B edward 2020-01-30
124: B edward 2020-01-31
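A self-contained sketch of this grouping trick, assuming a hypothetical data table with one row per (group, id) pair and a month of dates in dates_wanted (the ids other than frank and edward are invented):

```r
library(data.table)

# Hypothetical data: one row per (group, id) pair
data <- data.table(group = c("A", "A", "B", "B"),
                   id    = c("frank", "mary", "tom", "edward"))
dates_wanted <- seq(as.Date("2020-01-01"), as.Date("2020-01-31"), by = "day")

# For each (group, id) group, j returns the full date vector
res <- data[, .(date = dates_wanted), by = .(group, id)]

nrow(res)  # 4 pairs * 31 dates
```

Because the grouping is on (group, id), only combinations that actually occur in data are expanded, which differs from a full three-way CJ over all groups, ids and dates.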
R data.table join two tables and keep all rows
This is a cross join. Assign a new constant key column to both tables and merge on it:
DT1$Key=1
DT2$Key=1
DT3=merge(DT1,DT2,by='Key')
DT3  # use DT3$Key=NULL to remove the helper key
Key ID_1 val_1 ID_2 val_2
1: 1 1 1 3 3
2: 1 1 1 4 4
3: 1 2 2 3 3
4: 1 2 2 4 4
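For tables larger than this 2 x 2 case, data.table may refuse the merge, because a constant key matches every row to every row and the result can exceed its default row limit for joins. A hedged sketch with hypothetical 3-row tables, passing allow.cartesian = TRUE to permit this:

```r
library(data.table)

# Hypothetical tables, one row larger than the 2 x 2 case above
DT1 <- data.table(ID_1 = 1:3, val_1 = 1:3)
DT2 <- data.table(ID_2 = 4:6, val_2 = 4:6)

# Add the constant key by reference
DT1[, Key := 1L]
DT2[, Key := 1L]

# allow.cartesian = TRUE permits the many-to-many match on Key
DT3 <- merge(DT1, DT2, by = "Key", allow.cartesian = TRUE)
DT3[, Key := NULL]  # drop the helper column

nrow(DT3)  # nrow(DT1) * nrow(DT2)
```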
Cross join in data.table within a function
You could use the ... (dots) argument, which refers to the arguments passed down from a calling function?
require( data.table )
f <- function( ... ){
CJ(...)
}
f( c(1:2) , c(3:4) )
# V1 V2
#1: 1 3
#2: 1 4
#3: 2 3
#4: 2 4
Edit: How about this?
do.call(CJ, replicate(n, vals, simplify=FALSE))
# V1 V2 V3 V4
# 1: no no no no
# 2: no no no yes
# 3: no no yes no
# 4: no no yes yes
# 5: no yes no no
# 6: no yes no yes
# 7: no yes yes no
# 8: no yes yes yes
# 9: yes no no no
# 10: yes no no yes
# 11: yes no yes no
# 12: yes no yes yes
# 13: yes yes no no
# 14: yes yes no yes
# 15: yes yes yes no
# 16: yes yes yes yes
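The snippet above does not define n or vals. A runnable sketch, assuming n = 4 and vals = c("no", "yes") to match the shape of the output shown:

```r
library(data.table)

# Assumed values; n and vals are not defined in the snippet above
n <- 4
vals <- c("no", "yes")

# replicate() builds a list of n copies of vals;
# do.call() passes the list elements to CJ as separate arguments
res <- do.call(CJ, replicate(n, vals, simplify = FALSE))

dim(res)  # length(vals)^n rows, n columns
```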