Pandas: combine data frames of different sizes
Just perform a left merge on the 'product_id' column:
In [12]: df.merge(df1, on='product_id', how='left')
Out[12]:
   product_id  count_white  total_count
0       12345            4           10
1       23456            7           30
2       34567            1           90
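The input frames are not shown in the answer, so here is a minimal runnable sketch of the same left merge. The contents of df and df1 are assumptions chosen to reproduce the output above, with df1 deliberately one row longer than df.

```python
import pandas as pd

# Hypothetical inputs: the original frames are not shown in the source.
df = pd.DataFrame({"product_id": [12345, 23456, 34567],
                   "count_white": [4, 7, 1]})
df1 = pd.DataFrame({"product_id": [12345, 23456, 34567, 45678],
                    "total_count": [10, 30, 90, 50]})

# how="left" keeps every row of df and pulls in the matching columns of df1.
# Rows of df1 with no match in df (product_id 45678 here) are dropped.
result = df.merge(df1, on="product_id", how="left")
print(result)
```

If a product_id in df had no match in df1, its total_count would come back as NaN instead of the row being dropped.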
Merge two dataframes of different sizes after a groupby function
merge performs an inner join by default, which is what we use here. The original df contains duplicate rows for the merge keys, so merging it directly returned duplicated rows. Calling drop_duplicates() on df before the merge solves that.
Code
df_cut.merge(df.drop_duplicates(), on=["COD","TEC","SET", "AZIM"])
Output
          COD  TEC  SET  AZIM STATE CITY
0  ALAAD_0001    4    1     0    AL  MAC
1  ALAAD_0001    4    2   120    AL  MAC
2  ALAAD_0001    4    3   240    AL  MAC
3  BAPID_0001    2    1    20    BA  SAL
4  BAPID_0001    2    2   100    BA  SAL
5  BAPID_0001    2    3   250    BA  SAL
6  CEMBC_0003    4    1    90    CE  FOR
7  CEMBC_0003    4    2   160    CE  FOR
8  CEMBC_0003    4    3   280    CE  FOR
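To see why the duplicates matter, here is a small sketch comparing the merge with and without drop_duplicates(). The data is hypothetical, modelled on the output above, and df_cut is built as a stand-in for the groupby result, which the answer does not show.

```python
import pandas as pd

# Hypothetical data modelled on the output above; the first row is repeated.
df = pd.DataFrame({
    "COD": ["ALAAD_0001", "ALAAD_0001", "ALAAD_0001", "ALAAD_0001"],
    "TEC": [4, 4, 4, 4],
    "SET": [1, 1, 2, 3],
    "AZIM": [0, 0, 120, 240],
    "STATE": ["AL"] * 4,
    "CITY": ["MAC"] * 4,
})
keys = ["COD", "TEC", "SET", "AZIM"]

# Stand-in for the groupby result (assumption): one row per key combination.
df_cut = df[keys].drop_duplicates()

# Each key in df_cut matches every row of df carrying that key,
# so the repeated row in df shows up twice in the result.
with_dupes = df_cut.merge(df, on=keys)

# Removing fully identical rows first gives one match per key.
deduped = df_cut.merge(df.drop_duplicates(), on=keys)

print(len(with_dupes), len(deduped))
```

Note that drop_duplicates() with no arguments only removes rows that are identical in every column. If rows share the merge keys but differ elsewhere, drop_duplicates(subset=keys) may be closer to what is wanted, at the cost of silently picking the first row per key.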
How to merge two Pandas DataFrames of different size based on condition
Try adding a constant TypeID column to o_type_df, so that it can only match rows of primary_df whose TypeID is 'O':
o_type_df['TypeID'] = 'O'
Then do a left merge on those columns:
merged = (
    primary_df.merge(o_type_df,
                     left_on=['RCID', 'TypeID'],
                     right_on=['O_ID', 'TypeID'],
                     how='left')
)
merged:
   RCID TypeID   Data   O_ID O_Data
0   777      D  Hello    NaN    NaN
1   777      O    Hey  777.0    Foo
2   778      O    Hey  778.0    Bar
3   779      D  Hello    NaN    NaN
Or with assign, which leaves o_type_df unmodified:
merged = (
    primary_df.merge(o_type_df.assign(TypeID='O'),
                     left_on=['RCID', 'TypeID'],
                     right_on=['O_ID', 'TypeID'],
                     how='left')
)
merged:
   RCID TypeID   Data   O_ID O_Data
0   777      D  Hello    NaN    NaN
1   777      O    Hey  777.0    Foo
2   778      O    Hey  778.0    Bar
3   779      D  Hello    NaN    NaN
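For a version that runs end to end, the sketch below rebuilds the two input frames from the printed result. Their exact contents are an assumption, since the question's inputs are not reproduced here.

```python
import pandas as pd

# Inputs reconstructed from the printed output (assumption).
primary_df = pd.DataFrame({
    "RCID": [777, 777, 778, 779],
    "TypeID": ["D", "O", "O", "D"],
    "Data": ["Hello", "Hey", "Hey", "Hello"],
})
o_type_df = pd.DataFrame({"O_ID": [777, 778], "O_Data": ["Foo", "Bar"]})

# assign() returns a copy with the extra column, so o_type_df itself is untouched.
# Matching on (ID, TypeID) means rows with TypeID 'D' never find a partner.
merged = primary_df.merge(
    o_type_df.assign(TypeID="O"),
    left_on=["RCID", "TypeID"],
    right_on=["O_ID", "TypeID"],
    how="left",
)
print(merged)
```

O_ID is printed as a float because the unmatched rows hold NaN, which an integer column cannot store by default.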
Concatenate two dataframes of different sizes (pandas)
In this case, use combine_first after setting 'id' as the index of both frames:
df1.set_index('id').combine_first(df2.set_index('id')).reset_index()
Out[766]:
   id  metric1  metric2
0   a    123.0      1.0
1   b     22.0      2.0
2   c    356.0      3.0
3   d    412.0      4.0
4   f     54.0      5.0
5   g    634.0      6.0
6   h     72.0      7.0
7   j    812.0      8.0
8   k    129.0      9.0
9   l    110.0     10.0
10  m    200.0     11.0
11  q    812.0      NaN
12  w    110.0      NaN
13  z    129.0      NaN
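A reduced, runnable sketch of the same pattern follows. The two frames are hypothetical (the originals are not shown) and are cut down to three ids so the effect is easy to follow.

```python
import pandas as pd

# Hypothetical inputs of different lengths; the original frames are not shown.
df1 = pd.DataFrame({"id": ["a", "b", "q"], "metric1": [123, 22, 812]})
df2 = pd.DataFrame({"id": ["a", "b"], "metric2": [1, 2]})

# combine_first aligns on the index and takes the union of rows and columns,
# keeping df1's values and filling the gaps from df2 where it can.
result = (
    df1.set_index("id")
       .combine_first(df2.set_index("id"))
       .reset_index()
)
print(result)
```

Here 'q' exists only in df1, so its metric2 stays NaN. For frames like these, with no overlapping value columns, df1.merge(df2, on='id', how='outer') should give an equivalent table; combine_first is the more natural choice when both frames carry the same column and one should fill holes in the other.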