Combine Two Data Frames with All Possible Combinations

Combine two data frames with all possible combinations

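For this particular example you can just use the merge function. By default, merge joins on the columns the two data frames have in common (by = intersect(names(x), names(y))). Since these data frames share no column names, there is nothing to join on, and merge returns the Cartesian product, i.e. all combinations of rows. (The all.x and all.y arguments default to FALSE and are not what produces this result.)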

df  <- data.frame(a = letters[1:10])
df1 <- data.frame(one = 1:10)

dfcomb <- merge(df, df1)
dim(dfcomb)
# [1] 100   2   (100 rows and 2 columns)

Merge DataFrames with all combinations of PKs

It looks like you want the Cartesian product of both dataframes. One approach is to add a key that has the same value in every row of both dataframes, and merge them on this key.

The reason is that, as stated in the documentation, when a key combination appears more than once in both dataframes, pd.merge produces the Cartesian product:

It is worth spending some time understanding the result of the many-to-many join case. In SQL / standard relational algebra, if a key combination appears more than once in both tables, the resulting table will have the Cartesian product of the associated data.

Note that every type of merge yields the same rows in this case. Here left is chosen simply so that the results keep the order of appearance in df1:

cp = df1.assign(key=0).merge(df2.assign(key=0), how='left', on = 'key')
cp[['USER_ID_hash','COUPON_ID_hash']]

USER_ID_hash COUPON_ID_hash
0 1118263 f4ecdf6a1ccd5c4ae83c7a83a9f68454
1 1118263 34a80aff2d963d28013110789cfdfdb9
2 1118263 f0ab02ced0e91e097abc1b3c89a9d9b0
3 1118263 e97a601bc242db11579f7eada35205fd
4 1118263 771941be288930b649c7bb214efe2b07
5 1118263 2faa0e168770fb01a21f3ddf1b23aeab
6 1118263 c340a7ab46f19c1ba3b57b513f16bead
7 1118263 d84eb4dfc6d0b7bf0e8c50c06d14e4fe
8 1118263 88419bf4acc9f7e5c3529fc3bde29da6
9 1118263 530aa1bf00c00f02255eeea81559b932
10 2669790 f4ecdf6a1ccd5c4ae83c7a83a9f68454
11 2669790 34a80aff2d963d28013110789cfdfdb9
12 2669790 f0ab02ced0e91e097abc1b3c89a9d9b0
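
As a quick sanity check of the constant-key idea, here is a minimal sketch on made-up data. The frames users and coupons and their values are hypothetical stand-ins, not the data from the question; only the column names are borrowed from the output above.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames in the answer above.
users = pd.DataFrame({"USER_ID_hash": ["u1", "u2", "u3"]})
coupons = pd.DataFrame({"COUPON_ID_hash": ["c1", "c2"]})

# Every row gets the same key value, so each left row matches each right row.
cp = (
    users.assign(key=0)
    .merge(coupons.assign(key=0), on="key", how="left")
    .drop(columns="key")
)

# A Cartesian product has len(left) * len(right) rows.
print(len(cp), len(users) * len(coupons))
print(cp)
```

Using assign instead of df['key'] = 0 leaves the original frames untouched, which is why no cleanup of the inputs is needed afterwards.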

Python: combine two data frames with all combinations of index

You can create a merging index with:

df['temp'] = 1

And then merge the dataframe to itself with:

merged_df = df.merge(df, on='temp', suffixes=('A', 'B')).drop('temp', axis=1)

If you don't want combinations of an id_number with itself, filter them out:

merged_df = merged_df[merged_df['id_numberA'] != merged_df['id_numberB']]

If you also don't want both orderings of the same pair (id_numberA, id_numberB), use this filter instead:

merged_df = merged_df[merged_df['id_numberA'] < merged_df['id_numberB']]
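
To see how the two filters differ, here is a small sketch on a hypothetical three-row frame. Only the id_number column name comes from the answer; the score column and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame; only the id_number column name comes from the answer above.
df = pd.DataFrame({"id_number": [1, 2, 3], "score": [10, 20, 30]})

# Self-merge on a constant column: 3 * 3 = 9 rows.
df["temp"] = 1
merged_df = df.merge(df, on="temp", suffixes=("A", "B")).drop("temp", axis=1)

# Ordered pairs without self-pairs: n * (n - 1) rows.
no_self = merged_df[merged_df["id_numberA"] != merged_df["id_numberB"]]

# Unordered pairs, each kept once: n * (n - 1) / 2 rows.
unique_pairs = merged_df[merged_df["id_numberA"] < merged_df["id_numberB"]]

print(len(merged_df), len(no_self), len(unique_pairs))
print(unique_pairs[["id_numberA", "id_numberB"]])
```

The < comparison assumes id_number values can be ordered (numbers or strings); it keeps exactly one of (1, 2) and (2, 1).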

How to concatenate combinations of rows from two different dataframes?

Use itertools.product():

import itertools
import pandas as pd

pd.DataFrame(list(itertools.product(df1.A, df2.B)), columns=['A', 'B'])

   A  B
0 1 a
1 1 b
2 1 c
3 2 a
4 2 b
5 2 c
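
For a version you can paste and run, here is the same idea with the inputs spelled out. The contents of df1 and df2 are an assumption, reconstructed from the output above.

```python
import itertools

import pandas as pd

# Assumed inputs, reconstructed from the output shown above.
df1 = pd.DataFrame({"A": [1, 2]})
df2 = pd.DataFrame({"B": ["a", "b", "c"]})

# product() yields one tuple per (A, B) pair, varying B fastest.
pairs = list(itertools.product(df1["A"], df2["B"]))
out = pd.DataFrame(pairs, columns=["A", "B"])
print(out)
```

This only combines the two selected columns. If the frames carry other columns that need to come along, a merge-based cross join is probably the simpler route.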

Create combination of two pandas dataframes in two dimensions

Update: in pandas 1.2.0+ you can use a cross merge directly:

df1.merge(df2, how='cross')

In older versions, set a common key in both dataframes and use pd.merge:

df1['key'] = 1
df2['key'] = 1

Merge and drop key column:

df3 = pd.merge(df1,df2,on='key').drop('key',axis=1)
df3

Output:

   df1_id  other_data_1  other_data_2  df2_id  other_data_3  other_data_4
0 1 0 1 1 0 1
1 1 0 1 3 2 2
2 2 1 5 1 0 1
3 2 1 5 3 2 2
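
To compare the cross merge with the common-key approach side by side, here is a sketch. The input values are an assumption, reconstructed from the output above, and how='cross' needs pandas 1.2.0 or later.

```python
import pandas as pd

# Assumed inputs, reconstructed from the output shown above.
df1 = pd.DataFrame({"df1_id": [1, 2], "other_data_1": [0, 1], "other_data_2": [1, 5]})
df2 = pd.DataFrame({"df2_id": [1, 3], "other_data_3": [0, 2], "other_data_4": [1, 2]})

# pandas >= 1.2.0: built-in cross join, no helper column needed.
cross = df1.merge(df2, how="cross")

# Older versions: merge on a constant key, then drop it.
# assign() works on copies, so df1 and df2 are not modified.
keyed = (
    df1.assign(key=1)
    .merge(df2.assign(key=1), on="key")
    .drop(columns="key")
)

print(cross)
print(cross.equals(keyed))
```

Unlike df1['key'] = 1, the assign calls leave no stray key column behind on the original frames.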

Merge two data frames with all combinations

You can first reshape the two data frames (using melt from reshape2), then merge them (using full_join from dplyr) and reshape that back into the original format:

library(dplyr)
library(reshape2)

df1.b <- melt(df1)
df2.b <- melt(df2)

df3 <- full_join(df1.b, df2.b)

df3 <- na.omit(df3)

df3.b <- dcast(df3, nr + Color + Part ~ variable)

nr Color Part mach1 mach2 mach3 mach4
1 000 Brown Tree NA NA NA 1
2 000 Cyan Tree NA NA 1 NA
3 000 Red Car NA 1 NA NA
4 000 Red Tree 1 NA NA NA
5 001 Blue Flag 1 1 1 1
6 002 Green Road 1 1 NA NA
7 002 Green Tree NA NA NA 1
8 002 Red Tree NA NA 1 NA
9 003 DeepBlue House 1 1 NA NA
10 003 DeepBlue Road NA NA 1 1
11 004 Cyan Plane 1 NA NA NA
12 004 Grey Plane NA NA 1 1
13 004 Orange Plane NA 1 NA NA
14 005 Magenta Car 1 NA NA NA
15 005 Magenta House NA 1 1 1

Combining two dataframes together to create a long dataframe with all combinations

This is called a cross join (or Cartesian product), and it is easy enough to do in base R:

merge(datA,datB,all=TRUE)

All combinations of two dataframes

Assuming you start with dataframes df1 and df2, first melt df2 (wide-to-long), convert your percentages to numeric, then merge with df1:

df2 = pd.melt(df2, id_vars='Week', value_vars=['Product1', 'Product2', 'Product3'])
df2['value'] = pd.to_numeric(df2['value'].str[:-1])

res = df1.merge(df2)\
.eval('Value = Value * value / 100')\
.drop(columns='value')

print(res)

id Country Channel Week Value variable
0 0 ES Train 2018-48 250.0 Product1
1 0 ES Train 2018-48 500.0 Product2
2 0 ES Train 2018-48 250.0 Product3
3 1 ES Train 2018-49 250.0 Product1
4 1 ES Train 2018-49 500.0 Product2
5 1 ES Train 2018-49 250.0 Product3
6 2 ES Train 2018-50 250.0 Product1
7 2 ES Train 2018-50 500.0 Product2
8 2 ES Train 2018-50 250.0 Product3
9 3 ES Train 2018-51 250.0 Product1
10 3 ES Train 2018-51 500.0 Product2
11 3 ES Train 2018-51 250.0 Product3
12 4 ES Train 2018-52 250.0 Product1
13 4 ES Train 2018-52 500.0 Product2
14 4 ES Train 2018-52 250.0 Product3
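
Here is a self-contained sketch of the same melt-then-merge steps. The input frames are hypothetical: the column names follow the answer, but the values (a Value of 1000 and a 25% / 50% / 25% split) are guesses that happen to match the output above. It uses plain column arithmetic instead of eval, which is a stylistic swap rather than the original method.

```python
import pandas as pd

# Hypothetical inputs; values are guesses consistent with the output above.
df1 = pd.DataFrame({
    "id": [0, 1],
    "Country": ["ES", "ES"],
    "Channel": ["Train", "Train"],
    "Week": ["2018-48", "2018-49"],
    "Value": [1000, 1000],
})
df2 = pd.DataFrame({
    "Week": ["2018-48", "2018-49"],
    "Product1": ["25%", "25%"],
    "Product2": ["50%", "50%"],
    "Product3": ["25%", "25%"],
})

# Wide to long: one row per (Week, product) with the share as a string like "25%".
long_df = pd.melt(df2, id_vars="Week", value_vars=["Product1", "Product2", "Product3"])

# Strip the trailing "%" and convert to a number.
long_df["value"] = pd.to_numeric(long_df["value"].str[:-1])

# Join on Week, then split each Value according to the product share.
res = df1.merge(long_df, on="Week")
res["Value"] = res["Value"] * res["value"] / 100
res = res.drop(columns="value")

print(res)
```

Each df1 row is repeated once per product for its week, so the result has len(df1) times the number of product columns rows.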

Merge all possible combinations of multiple data frames

You can do:

# Put all frames in a list
d <- list(frame1, frame2, frame3)
# get all merge-combinations
gr <- expand.grid(1:length(d), 1:length(d))

# function to merge and get the sum diff:
foo <- function(i, x, gr){
  tmp <- merge(x[[gr[i, 1]]], x[[gr[i, 2]]], by = c("a", "b"))
  sum(abs(tmp$c.x - tmp$c.y))
}

# result matrix
matrix(sapply(1:nrow(gr), foo, d, gr), length(d), length(d), byrow = TRUE)
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0

# The scores are arranged as follows:
matrix(apply(gr, 1, paste, collapse = "_"), 3, 3, byrow = TRUE)
[,1] [,2] [,3]
[1,] "1_1" "2_1" "3_1"
[2,] "1_2" "2_2" "3_2"
[3,] "1_3" "2_3" "3_3"


# alternative using apply:

# function to merge and get the sum diff:
foo <- function(y, x){
  tmp <- merge(x[[ y[1] ]], x[[ y[2] ]], by = c("a", "b"))
  sum(abs(tmp$c.x - tmp$c.y))
}
# result matrix
matrix(apply(gr, 1, foo, d), length(d), length(d), byrow = TRUE)

