How to Perform a Cross Join or Cartesian Product in Excel

Google sheets - cross join / cartesian join from two separate columns

Use this:

=ARRAYFORMULA(
  SPLIT(
    FLATTEN(
      FILTER('input 1'!A2:A, 'input 1'!A2:A <> "")
      & ","
      & TRANSPOSE(FILTER('input 2'!A2:A, 'input 2'!A2:A <> ""))
    ),
    ","
  )
)

Enter the formula in cell A3, then format column B as a date (you can copy the format from column E).
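To make the steps of the formula easier to follow, here is a minimal Python sketch of the same logic. It is an illustration, not a replacement for the Sheets formula: the function name `cross_join_columns` and the sample values are assumptions. Like the formula, it drops blank cells, joins every pair with a comma, and splits each joined string back into two columns.

```python
def cross_join_columns(col1, col2, sep=","):
    """Mimic FILTER -> pairwise concatenation -> FLATTEN -> SPLIT (hypothetical helper)."""
    # FILTER(range, range <> ""): keep only non-blank cells.
    left = [v for v in col1 if v != ""]
    right = [v for v in col2 if v != ""]
    # Column & "," & TRANSPOSE(row) builds a grid; FLATTEN reads it row by row.
    joined = [f"{a}{sep}{b}" for a in left for b in right]
    # SPLIT turns each joined string back into two cells.
    return [s.split(sep) for s in joined]


rows = cross_join_columns(["a", "b", ""], ["2021-01-01", "2021-01-02"])
for row in rows:
    print(row)
```

As with the formula, this approach breaks if a value itself contains the separator, so pick a separator that cannot occur in the data.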

Excel - Generating the Cartesian product of three sets of numbers in each row

Here is a VBA function which can handle the special case of 3 number sets:

Function CartesianProduct(nums1 As Range, nums2 As Range, nums3 As Range) As Variant
    Dim n As Long 'number of products
    Dim i As Long, j As Long, k As Long, r As Long
    Dim products As Variant

    n = nums1.Cells.Count * nums2.Cells.Count * nums3.Cells.Count
    ReDim products(1 To n, 1 To 3)
    For i = 1 To nums1.Cells.Count
        For j = 1 To nums2.Cells.Count
            For k = 1 To nums3.Cells.Count
                r = r + 1 'current row
                products(r, 1) = nums1.Cells(i)
                products(r, 2) = nums2.Cells(j)
                products(r, 3) = nums3.Cells(k)
            Next k
        Next j
    Next i
    CartesianProduct = products
End Function

This can be called from another VBA function or sub, or used directly as an array formula in the sheet.

For example, I selected the range A3:C8 (its size has to be determined ahead of time: here 1 * 3 * 2 = 6 rows by 3 columns), entered the formula

=CartesianProduct(A1,B1:D1,E1:F1)

and then accepted it as an array formula by pressing Ctrl+Shift+Enter.

Once you get beyond three sets, things get a bit tricky since you can't hardwire in the necessary levels for a looping approach and would instead probably use a recursive approach, something along the lines of this answer: https://stackoverflow.com/a/31622856/4996248
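To illustrate the recursive idea, here is a minimal sketch in Python rather than VBA. It is not the code from the linked answer; the function name `cartesian_product` and the sample sets are assumptions chosen for illustration. The first set varies slowest, matching the loop order of the VBA function above.

```python
def cartesian_product(sets):
    """Return all combinations, one item per input set, as a list of tuples."""
    if not sets:
        # Base case: the product of zero sets is a single empty combination.
        return [()]
    first, rest = sets[0], sets[1:]
    tails = cartesian_product(rest)  # recurse on the remaining sets
    # Prefix every combination of the remaining sets with each item of the first.
    return [(item,) + tail for item in first for tail in tails]


result = cartesian_product([[1], [2, 3, 4], [5, 6], [7, 8]])
print(len(result))  # 1 * 3 * 2 * 2 = 12 combinations
```

Because the recursion depth equals the number of sets, no nesting level has to be hardwired. In Python itself, `itertools.product` from the standard library covers the same job.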

cartesian product in pandas

In recent versions of Pandas (>= 1.2) this is built into merge so you can do:

from pandas import DataFrame
df1 = DataFrame({'col1':[1,2],'col2':[3,4]})
df2 = DataFrame({'col3':[5,6]})

df1.merge(df2, how='cross')

This is equivalent to the pandas < 1.2 approach below but is easier to read.


For pandas < 1.2:

If you have a key that is repeated for each row, then you can produce a cartesian product using merge (like you would in SQL).

from pandas import DataFrame, merge
df1 = DataFrame({'key':[1,1], 'col1':[1,2],'col2':[3,4]})
df2 = DataFrame({'key':[1,1], 'col3':[5,6]})

merge(df1, df2,on='key')[['col1', 'col2', 'col3']]

Output:

   col1  col2  col3
0     1     3     5
1     1     3     6
2     2     4     5
3     2     4     6

See here for the documentation: http://pandas.pydata.org/pandas-docs/stable/merging.html

Create the cartesian product (cross join) of two csv files in python

I don't know of any out-of-the-box solution, so I made this:

import csv
from itertools import product

def main():
    # In Python 3, open CSV files in text mode with newline='' so the
    # csv module handles line endings itself.
    with open('file1.csv', newline='') as f1, open('file2.csv', newline='') as f2:
        reader1 = csv.reader(f1, dialect=csv.excel_tab)
        reader2 = csv.reader(f2, dialect=csv.excel_tab)

        # Step 1: Read and write the headers separately.
        header1, header2 = next(reader1), next(reader2)
        with open('output.csv', 'w', newline='') as out:
            writer = csv.writer(out, dialect=csv.excel_tab)
            writer.writerow(header1 + header2)
            # Step 2: Write the product of the rest of the rows.
            writer.writerows(
                row1 + row2 for row1, row2 in product(reader1, reader2))

main()

With files:

file1.csv

hour      Elevation  Azimuth
06:29:00  -0.833     67.72
07:00:00  6.28       68.75

file2.csv

ID  SURFACES
1   GROUND
2   ROOF

you get the following output.csv:

hour      Elevation  Azimuth  ID  SURFACES
06:29:00  -0.833     67.72    1   GROUND
06:29:00  -0.833     67.72    2   ROOF
07:00:00  6.28       68.75    1   GROUND
07:00:00  6.28       68.75    2   ROOF
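One thing to keep in mind: `itertools.product` reads both inputs fully into memory before it yields anything. If the first file is large, a nested loop that keeps only the second file's rows in memory may be a better fit. The following is a minimal sketch under that assumption, again for tab-separated input; the function name `cross_join` is hypothetical, and `io.StringIO` objects stand in for real files so the example is self-contained.

```python
import csv
import io


def cross_join(src1, src2, dst):
    """Write the cross join of two tab-separated streams to dst (hypothetical helper)."""
    reader1 = csv.reader(src1, dialect=csv.excel_tab)
    reader2 = csv.reader(src2, dialect=csv.excel_tab)
    writer = csv.writer(dst, dialect=csv.excel_tab)

    # Headers are combined once, before the data rows.
    header1 = next(reader1)
    header2 = next(reader2)
    writer.writerow(header1 + header2)

    rows2 = list(reader2)  # only the second input is held in memory
    for row1 in reader1:   # the first input is streamed row by row
        for row2 in rows2:
            writer.writerow(row1 + row2)


file1 = io.StringIO(
    "hour\tElevation\tAzimuth\n06:29:00\t-0.833\t67.72\n07:00:00\t6.28\t68.75\n")
file2 = io.StringIO("ID\tSURFACES\n1\tGROUND\n2\tROOF\n")
out = io.StringIO()
cross_join(file1, file2, out)
print(out.getvalue())
```

With real files, pass the handles opened with `newline=''` instead of the `StringIO` objects, and put the smaller file second.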

Generate all possible combinations for Columns(cross join or Cartesian product)

Update (October 2018)

The original formula crashes on large datasets. I described a way to do a cross join on data of any size here.


Try this formula:

=ArrayFormula(transpose(split(rept(concatenate(A2:A&char(9)),counta(B2:B)),char(9)))
&" "&transpose(split(concatenate(rept(B2:B&char(9),counta(A2:A))),char(9))))

The result:

car red
train red
car yellow
train yellow
car blue
train blue
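The formula works by building two equally long lists and joining them position by position. A minimal Python sketch of that idea follows; the function name `pair_by_repetition` is an assumption for illustration, and the sample lists are inferred from the result shown above.

```python
def pair_by_repetition(list_a, list_b):
    """Mirror the REPT/SPLIT trick: tile A, stretch B, then pair by position."""
    # rept(concatenate(A & tab), counta(B)): all of A, repeated once per item in B.
    tiled_a = list_a * len(list_b)
    # concatenate(rept(B & tab, counta(A))): each item of B, repeated once per item in A.
    stretched_b = [b for b in list_b for _ in list_a]
    # The two equal-length lists are joined element by element with a space.
    return [f"{a} {b}" for a, b in zip(tiled_a, stretched_b)]


print(pair_by_repetition(["car", "train"], ["red", "yellow", "blue"]))
```

The first list cycles fastest, which is why the output alternates between car and train before moving on to the next colour.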

You can apply it again to add another list. In this example the formulas are in cells C2 and E2.

C2 is:

=ArrayFormula(transpose(split(rept(concatenate(A2:A&char(9)),counta(B2:B)),char(9)))&" "&transpose(split(concatenate(rept(B2:B&char(9),counta(A2:A))),char(9))) )

and E2 is:

=ArrayFormula(transpose(split(rept(concatenate(C2:C&char(9)),counta(D2:D)),char(9)))&" "&transpose(split(concatenate(rept(D2:D&char(9),counta(C2:C))),char(9))) )

