Pandas: Merging Two Columns into One With Corresponding Values

python: combine two columns into one and duplicate corresponding cells

This is how to do it with stack, the approach you were attempting earlier:

  1. Step 1 - Call df.stack() on only col1 and col2 (dropping the NaNs as well), then use reset_index() to keep only the integer index, which is used for the merge in the next step.
  2. Step 2 - Use pd.merge() to join the initial DataFrame with the stacked one on their index.
  3. Step 3 - DONE!
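# Stack col1 and col2 into a single column, dropping NaNs and the column-name index level
a = pd.DataFrame(df[['col1', 'col2']].stack(dropna=True), columns=['col1']).reset_index(level=1, drop=True)
# Left-join on the index so that rows with no values (item_3) are kept as NaN
pd.merge(df[['ID']], a, how='left', left_index=True, right_index=True)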
       ID col1
0  item_1  abc
1  item_2  bcd
2  item_3  NaN
3  item_4  mnb
3  item_4  lkj
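
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The input frame is an assumption, reconstructed from the output shown (the original question's data is not included here). It calls dropna() explicitly instead of relying on the dropna argument of stack, which recent pandas versions have deprecated.

```python
import numpy as np
import pandas as pd

# Hypothetical input, reconstructed to be consistent with the output above
df = pd.DataFrame({
    "ID": ["item_1", "item_2", "item_3", "item_4"],
    "col1": ["abc", "bcd", np.nan, "mnb"],
    "col2": [np.nan, np.nan, np.nan, "lkj"],
})

# Step 1: stack col1/col2 into one Series, drop NaNs explicitly,
# then drop the inner index level that holds the old column names
stacked = (
    df[["col1", "col2"]]
    .stack()
    .dropna()
    .reset_index(level=1, drop=True)
    .rename("col1")
)

# Step 2: left-join on the index so IDs without any value are kept as NaN
result = df[["ID"]].merge(stacked, how="left", left_index=True, right_index=True)
print(result)
```

Because the stacked Series carries the original row index, item_4 appears twice in the result, once per non-null value.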

If you find this easier to understand, please update the accepted answer so that others looking for a similar solution can find it. Cheers!

How to merge only specific text parts of two columns into one column?

Split the values on . and select the first element of each resulting list with str[0] indexing:

df['ColNew'] = (df['Col1'].str.rsplit('.', n=1).str[0] + '.' +
                df['Col2'].astype(str).str.split('.').str[0])
print(df)
Col1 Col2 ColNew
0 11.50.199.1 12121.0 11.50.199.12121
1 12.55.222.1 12121.0 12.55.222.12121

Or use Series.str.replace to strip the final . and the digits after it in both columns:

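# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
df['ColNew'] = (df['Col1'].str.replace(r'\.\d+$', '', regex=True) + '.' +
                df['Col2'].astype(str).str.replace(r'\.\d+$', '', regex=True))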
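
Both variants can be checked side by side with a small runnable sketch. The sample frame is an assumption built from the printed output above, with Col1 as strings and Col2 as floats.

```python
import pandas as pd

# Sample data assumed from the printed output above
df = pd.DataFrame({
    "Col1": ["11.50.199.1", "12.55.222.1"],
    "Col2": [12121.0, 12121.0],
})

# Variant 1: split on the dot and keep the leading part
prefix = df["Col1"].str.rsplit(".", n=1).str[0]        # everything before the last dot
suffix = df["Col2"].astype(str).str.split(".").str[0]  # integer part of "12121.0"
df["ColNew"] = prefix + "." + suffix

# Variant 2: strip a trailing ".<digits>" with a regular expression
pattern = r"\.\d+$"
alt = (
    df["Col1"].str.replace(pattern, "", regex=True)
    + "."
    + df["Col2"].astype(str).str.replace(pattern, "", regex=True)
)

print(df)
print(alt.equals(df["ColNew"]))
```

The two variants agree on this data. They would differ if Col1 held a value with no dot, since rsplit would then return the whole string and the regex would leave it unchanged as well, so neither guards against malformed input.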

Merge DataFrames with Matching Values From Two Different Columns - Pandas

Use how='inner' in pd.merge:

merged_df = DF2.merge(DF1, how = 'inner', on = ['date', 'hours'])

This performs an inner join, which omits rows in either DataFrame that have no match in the other. Hence, the merge introduces no NaN in either the left or the right part of the merged DataFrame.
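
As a rough illustration of the inner join, here is a sketch with two small made-up frames. Only the names DF1, DF2, date and hours come from the answer; the load and temp columns and all values are hypothetical.

```python
import pandas as pd

# Hypothetical frames sharing the key columns 'date' and 'hours'
DF1 = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-02"],
    "hours": [1, 2, 1],
    "load": [10, 20, 30],   # made-up value column
})
DF2 = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "hours": [1, 1, 1],
    "temp": [5.0, 6.0, 7.0],  # made-up value column
})

# Keep only the (date, hours) pairs present in both frames
merged_df = DF2.merge(DF1, how="inner", on=["date", "hours"])
print(merged_df)
```

Only the two key pairs found in both frames survive. Note that how='inner' is also the default for merge, so spelling it out mainly documents the intent.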

Python Pandas merge only certain columns

You could merge the sub-DataFrame (with just those columns):

df2[list('xab')]  # df2 but only with columns x, a, and b

df1.merge(df2[list('xab')])
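
A small sketch of the same idea with made-up data. The column names x, a and b follow the answer; the extra columns y and c and all values are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical frames; 'x' is the only column they share
df1 = pd.DataFrame({"x": [1, 2, 3], "y": ["p", "q", "r"]})
df2 = pd.DataFrame({
    "x": [1, 2, 3],
    "a": [10, 20, 30],
    "b": [0.1, 0.2, 0.3],
    "c": ["unwanted", "unwanted", "unwanted"],
})

# list('xab') is just a shorthand for ['x', 'a', 'b']
subset = df2[list("xab")]

# With no 'on' argument, merge joins on the columns common to both frames ('x')
out = df1.merge(subset)
print(out)
```

The key column must be part of the selected subset, otherwise there is nothing left to join on. Column c never reaches the result.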

How to add or combine two columns into another one in a dataframe if they meet a condition

Use numpy.where, which evaluates a condition element-wise over whole columns. It is akin to an if statement in Python, but significantly faster because it is vectorized. I rarely use iterrows, since I don't find it as efficient as numpy.where.

dfn['c'] = np.where(dfn['a'] % 2 != 0,
                    dfn.a + dfn.b,
                    dfn.a)


a b c
0 1 6 7
1 2 7 2
2 3 8 11
3 4 9 4
4 5 10 15

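Basically, the first argument to np.where defines your condition, which in this case checks whether the value in column 'a' is odd. Where it is odd, the second argument (a + b) is used. Where it is even, the third argument (a) is used. You can think of it as an element-wise if-else statement, with the difference that both alternatives are computed for every row and the condition only selects between them.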
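
Here is a self-contained version of the snippet above that can be run as is. The input values are assumed from the printed table, and the Series.where line is an alternative added for comparison, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Input assumed from the printed table above
dfn = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [6, 7, 8, 9, 10]})

# np.where(condition, value_if_true, value_if_false), applied element-wise
is_odd = dfn["a"] % 2 != 0
dfn["c"] = np.where(is_odd, dfn["a"] + dfn["b"], dfn["a"])

# Alternative with pandas only: Series.where keeps values where the
# condition is True and takes them from 'other' where it is False
c_alt = dfn["a"].where(~is_odd, dfn["a"] + dfn["b"])

print(dfn)
```

Both expressions build the full a + b column first and then pick from it per row, which is why they stay fast on large frames.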


