Combine two data frames with all possible combinations
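With this particular example I think you can just use the merge function. Its by argument defaults to the columns the two data frames have in common. Since these data frames share no column names, by is empty, and in that case merge returns the Cartesian product, i.e. every combination of rows. (The all.x and all.y arguments default to FALSE and are not what produces the combinations.)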
df <- data.frame(a = letters[1:10])
df1 <- data.frame(one = 1:10)
dfcomb <- merge(df, df1)
dim(dfcomb)
[1] 100   2   # 100 rows and 2 columns
Merge DataFrames with all combinations of PKs
It looks like you want the Cartesian product of both dataframes. One approach is to add a key that has the same value in every row of both dataframes, and merge them on this key.
The reason this works is that, as stated in the documentation, when a key combination appears more than once in both dataframes, pd.merge produces the Cartesian product:
It is worth spending some time understanding the result of the many-to-many join case. In SQL / standard relational algebra, if a key combination appears more than once in both tables, the resulting table will have the Cartesian product of the associated data.
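To see the many-to-many rule from the quote in action, here is a minimal sketch on two tiny hypothetical frames (the names left, right, k, x and y are made up for illustration). A key that appears twice on one side and three times on the other yields 2 × 3 = 6 rows.

```python
import pandas as pd

# Hypothetical toy frames: key "k" appears twice on the left and three times on the right
left = pd.DataFrame({"k": [0, 0], "x": ["a", "b"]})
right = pd.DataFrame({"k": [0, 0, 0], "y": [1, 2, 3]})

# A many-to-many merge pairs every left row with every right row sharing the key
merged = left.merge(right, on="k")

print(len(merged))  # 6, i.e. 2 * 3
```

Using a constant key for every row is just the extreme case of this rule: all rows share one key, so all rows are paired.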
Note that all types of merge yield the same result in this case; here, left is chosen simply so that the results keep the same order of appearance as in df1:
cp = df1.assign(key=0).merge(df2.assign(key=0), how='left', on = 'key')
cp[['USER_ID_hash','COUPON_ID_hash']]
USER_ID_hash COUPON_ID_hash
0 1118263 f4ecdf6a1ccd5c4ae83c7a83a9f68454
1 1118263 34a80aff2d963d28013110789cfdfdb9
2 1118263 f0ab02ced0e91e097abc1b3c89a9d9b0
3 1118263 e97a601bc242db11579f7eada35205fd
4 1118263 771941be288930b649c7bb214efe2b07
5 1118263 2faa0e168770fb01a21f3ddf1b23aeab
6 1118263 c340a7ab46f19c1ba3b57b513f16bead
7 1118263 d84eb4dfc6d0b7bf0e8c50c06d14e4fe
8 1118263 88419bf4acc9f7e5c3529fc3bde29da6
9 1118263 530aa1bf00c00f02255eeea81559b932
10 2669790 f4ecdf6a1ccd5c4ae83c7a83a9f68454
11 2669790 34a80aff2d963d28013110789cfdfdb9
12 2669790 f0ab02ced0e91e097abc1b3c89a9d9b0
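The claim that every merge type gives the same rows when the key is constant can be checked with a small sketch. The frames below are hypothetical stand-ins that only reuse the column names USER_ID_hash and COUPON_ID_hash; the IDs are invented. Rows are sorted before comparing, because left, inner and outer joins may order their output differently.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 with only the two ID columns
df1 = pd.DataFrame({"USER_ID_hash": [101, 102]})
df2 = pd.DataFrame({"COUPON_ID_hash": ["c1", "c2", "c3"]})

results = {}
for how in ("left", "inner", "outer"):
    cp = df1.assign(key=0).merge(df2.assign(key=0), how=how, on="key")
    cp = cp.drop(columns="key")
    # Sort so that only row content, not row order, is compared across join types
    results[how] = cp.sort_values(["USER_ID_hash", "COUPON_ID_hash"]).values.tolist()

print(len(results["left"]))  # 6, i.e. 2 * 3
print(results["left"] == results["inner"] == results["outer"])  # True
```

Every row matches on the constant key, so no join type has unmatched rows to keep or drop, which is why the choice of how only affects ordering.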
Python: combine two data frames with all combinations of index
You can create a merging index with:
df['temp'] = 1
And then merge the dataframe with itself:
merged_df = df.merge(df, on='temp', suffixes=('A', 'B')).drop('temp', axis=1)
If you don't want combinations of an id_number with itself, finally do:
merged_df = merged_df[merged_df['id_numberA'] != merged_df['id_numberB']]
And if you also don't want the same pair of id_numberA and id_numberB to appear in both orders, do this instead:
merged_df = merged_df[merged_df['id_numberA'] < merged_df['id_numberB']]
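Putting the three steps together, a minimal sketch on a hypothetical three-row frame (only the id_number column name comes from the answer; score is invented) shows how many rows each filter keeps: n² for the full self-join, n² − n without self-pairs, and n(n − 1)/2 for unordered pairs.

```python
import pandas as pd

# Hypothetical frame; only the id_number column name comes from the answer above
df = pd.DataFrame({"id_number": [1, 2, 3], "score": [10, 20, 30]})

df["temp"] = 1
merged_df = df.merge(df, on="temp", suffixes=("A", "B")).drop("temp", axis=1)
df = df.drop(columns="temp")  # remove the helper column from the original frame again

# Drop pairs of an id with itself
no_self = merged_df[merged_df["id_numberA"] != merged_df["id_numberB"]]
# Keep each unordered pair once
unique_pairs = merged_df[merged_df["id_numberA"] < merged_df["id_numberB"]]

print(len(merged_df), len(no_self), len(unique_pairs))  # 9 6 3
```

The < filter assumes id_number values are unique and orderable; with duplicate ids the same pair could still appear more than once.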
How to concatenate combinations of rows from two different dataframes?
Use itertools.product():
import itertools
import pandas as pd
pd.DataFrame(list(itertools.product(df1.A, df2.B)), columns=['A', 'B'])
A B
0 1 a
1 1 b
2 1 c
3 2 a
4 2 b
5 2 c
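The same itertools.product idea extends from single columns to whole rows. The sketch below assumes hypothetical inputs with an extra column C; each row is read as a plain tuple via itertuples(index=False, name=None) and the tuples are concatenated.

```python
import itertools
import pandas as pd

# Hypothetical inputs; df1 has more than one column this time
df1 = pd.DataFrame({"A": [1, 2], "C": ["x", "y"]})
df2 = pd.DataFrame({"B": ["a", "b", "c"]})

# Pair every row of df1 with every row of df2; each row is a plain tuple
rows = [
    r1 + r2
    for r1, r2 in itertools.product(
        df1.itertuples(index=False, name=None),
        df2.itertuples(index=False, name=None),
    )
]
combined = pd.DataFrame(rows, columns=list(df1.columns) + list(df2.columns))
print(combined.shape)  # (6, 3)
```

This builds Python tuples row by row, so for large frames a merge-based cross join is likely to be faster.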
Create combination of two pandas dataframes in two dimensions
Update: in pandas 1.2.0+ you can do a cross join directly:
df1.merge(df2, how='cross')
In older versions, set a common key between the two dataframes and use pd.merge:
df1['key'] = 1
df2['key'] = 1
Merge and drop key column:
df3 = pd.merge(df1,df2,on='key').drop('key',axis=1)
df3
Output:
df1_id other_data_1 other_data_2 df2_id other_data_3 other_data_4
0 1 0 1 1 0 1
1 1 0 1 3 2 2
2 2 1 5 1 0 1
3 2 1 5 3 2 2
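Assigning df1['key'] = 1 modifies the input frames and would clash with an existing column named key. A sketch of a version-independent helper that avoids both problems is below; cross_join and the _cross_key column name are hypothetical, not part of pandas, and the input values are a reduced, invented version of the frames shown above.

```python
import pandas as pd

def cross_join(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Cartesian product of two frames (hypothetical helper, not a pandas API)."""
    # Pick a temporary key name that neither frame already uses
    key = "_cross_key"
    while key in left.columns or key in right.columns:
        key += "_"
    # assign() returns copies, so the caller's frames are left unchanged
    out = left.assign(**{key: 1}).merge(right.assign(**{key: 1}), on=key)
    return out.drop(columns=key)

df1 = pd.DataFrame({"df1_id": [1, 2], "other_data_1": [0, 1]})
df2 = pd.DataFrame({"df2_id": [1, 3], "other_data_3": [0, 2]})

df3 = cross_join(df1, df2)
print(df3.shape)  # (4, 4)
```

On pandas 1.2.0+ the result should match df1.merge(df2, how='cross').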
Merge two data frames with all combinations
You can first reshape the two data frames (using melt from reshape2), then merge them (using full_join from dplyr) and reshape that back into the original format:
library(dplyr)
library(reshape2)
df1.b <- melt(df1)
df2.b <- melt(df2)
df3 <- full_join(df1.b, df2.b)
df3 <- na.omit(df3)
df3.b <- dcast(df3, nr + Color + Part ~ variable)
nr Color Part mach1 mach2 mach3 mach4
1 000 Brown Tree NA NA NA 1
2 000 Cyan Tree NA NA 1 NA
3 000 Red Car NA 1 NA NA
4 000 Red Tree 1 NA NA NA
5 001 Blue Flag 1 1 1 1
6 002 Green Road 1 1 NA NA
7 002 Green Tree NA NA NA 1
8 002 Red Tree NA NA 1 NA
9 003 DeepBlue House 1 1 NA NA
10 003 DeepBlue Road NA NA 1 1
11 004 Cyan Plane 1 NA NA NA
12 004 Grey Plane NA NA 1 1
13 004 Orange Plane NA 1 NA NA
14 005 Magenta Car 1 NA NA NA
15 005 Magenta House NA 1 1 1
Combining two dataframes together to create a long dataframe with all combinations
It's called a cross join (or Cartesian product), and you can do it easily enough in base R. As long as datA and datB share no column names, merge returns every combination of rows:
merge(datA,datB,all=TRUE)
All combinations of two dataframes
Assuming you start with dataframes df1 and df2, first melt df2 (wide-to-long), convert your percentages to numeric, then merge with df1:
import pandas as pd

df2 = pd.melt(df2, id_vars='Week', value_vars=['Product1', 'Product2', 'Product3'])
df2['value'] = pd.to_numeric(df2['value'].str[:-1])  # strip the trailing '%'
res = df1.merge(df2)\
    .eval('Value = Value * value / 100')\
    .drop(columns='value')
print(res)
id Country Channel Week Value variable
0 0 ES Train 2018-48 250.0 Product1
1 0 ES Train 2018-48 500.0 Product2
2 0 ES Train 2018-48 250.0 Product3
3 1 ES Train 2018-49 250.0 Product1
4 1 ES Train 2018-49 500.0 Product2
5 1 ES Train 2018-49 250.0 Product3
6 2 ES Train 2018-50 250.0 Product1
7 2 ES Train 2018-50 500.0 Product2
8 2 ES Train 2018-50 250.0 Product3
9 3 ES Train 2018-51 250.0 Product1
10 3 ES Train 2018-51 500.0 Product2
11 3 ES Train 2018-51 250.0 Product3
12 4 ES Train 2018-52 250.0 Product1
13 4 ES Train 2018-52 500.0 Product2
14 4 ES Train 2018-52 250.0 Product3
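The question's input frames are not shown, so the sketch below runs the same melt, convert and merge steps on hypothetical inputs. The column names follow the output above; the two weeks, the Value of 1000 and the 25/50/25 percentages are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical inputs shaped like the question's frames
df1 = pd.DataFrame({
    "id": [0, 1],
    "Country": ["ES", "ES"],
    "Channel": ["Train", "Train"],
    "Week": ["2018-48", "2018-49"],
    "Value": [1000.0, 1000.0],
})
df2 = pd.DataFrame({
    "Week": ["2018-48", "2018-49"],
    "Product1": ["25%", "25%"],
    "Product2": ["50%", "50%"],
    "Product3": ["25%", "25%"],
})

# Wide-to-long: one row per (Week, product), in columns "variable" and "value"
long_df2 = pd.melt(df2, id_vars="Week", value_vars=["Product1", "Product2", "Product3"])
# Strip the trailing "%" and convert to numbers
long_df2["value"] = pd.to_numeric(long_df2["value"].str[:-1])

# merge() joins on the shared "Week" column, so each df1 row fans out to three products
res = (
    df1.merge(long_df2)
    .eval("Value = Value * value / 100")
    .drop(columns="value")
)
print(res)
```

Note that eval distinguishes the original Value column from the melted value column only by case.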
Merge all possible combinations of multiple data frames
You can do:
# Put all frames in a list
d <- list(frame1, frame2, frame3)
# get all merge-combinations
gr <- expand.grid(1:length(d), 1:length(d))
# function to merge and get the sum diff:
foo <- function(i, x, gr){
tmp <- merge(x[[gr[i, 1]]], x[[gr[i, 2]]], by=c("a", "b"))
sum(abs(tmp$c.x - tmp$c.y))
}
# result matrix
matrix(sapply(1:nrow(gr), foo, d, gr), length(d), length(d), byrow = TRUE)
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
# The scores are arranged as follows:
matrix(apply(gr, 1, paste, collapse="_"), 3, 3, byrow = TRUE)
[,1] [,2] [,3]
[1,] "1_1" "2_1" "3_1"
[2,] "1_2" "2_2" "3_2"
[3,] "1_3" "2_3" "3_3"
# alternative using apply:
# function to merge and get the sum diff:
foo <- function(y, x){
tmp <- merge(x[[ y[1] ]], x[[ y[2] ]], by=c("a", "b"))
sum(abs(tmp$c.x - tmp$c.y))
}
# result matrix
matrix(apply(gr, 1, foo, d), length(d), length(d), byrow = TRUE)