Pandas Populate New Dataframe Column Based on Matching Columns in Another Dataframe

Pandas populate new dataframe column based on matching columns in another dataframe

Consider the following dataframes df and df2

df = pd.DataFrame(dict(
AUTHOR_NAME=list('AAABBCCCCDEEFGG'),
title= list('zyxwvutsrqponml')
))

df2 = pd.DataFrame(dict(
AUTHOR_NAME=list('AABCCEGG'),
title =list('zwvtrpml'),
CATEGORY =list('11223344')
))

option 1

merge

df.merge(df2, how='left')

option 2

join

cols = ['AUTHOR_NAME', 'title']
df.join(df2.set_index(cols), on=cols)

both options yield

enter image description here

Pandas: Add a new column in a data frame based on a value in another data frame

print (df1)
userId gender
0 1 F
1 2 M
2 3 F
3 4 M
4 5 M
5 6 M

print (df2)
userId itemClicked ItemBought date
0 1 123182 123212 02/02/2016
1 3 234256 123182 05/02/2016
2 5 986834 234256 04/19/2016
3 4 787663 787663 05/12/2016
4 20 465738 465738 03/20/2016
5 4 787223 787663 07/12/2016

You can use map:

df2['gender'] = df2.userId.map(df1.set_index('userId')['gender'].to_dict())

print (df2)
userId itemClicked ItemBought date gender
0 1 123182 123212 02/02/2016 F
1 3 234256 123182 05/02/2016 F
2 5 986834 234256 04/19/2016 M
3 4 787663 787663 05/12/2016 M
4 20 465738 465738 03/20/2016 NaN
5 4 787223 787663 07/12/2016 M

Another solution with merge and left join, parameter on can be omit if only column gender is same in both DataFrames:

df = pd.merge(df2, df1, how='left')

print (df)
userId itemClicked ItemBought date gender
0 1 123182 123212 02/02/2016 F
1 3 234256 123182 05/02/2016 F
2 5 986834 234256 04/19/2016 M
3 4 787663 787663 05/12/2016 M
4 20 465738 465738 03/20/2016 NaN
5 4 787223 787663 07/12/2016 M

Timings:

#len(df2) = 600k
df2 = pd.concat([df2]*100000).reset_index(drop=True)

def f(df1,df2):
df2['gender'] = df2.userId.map(df1.set_index('userId')['gender'].to_dict())
return df2


In [43]: %timeit f(df1,df2)
10 loops, best of 3: 34.2 ms per loop

In [44]: %timeit (pd.merge(df2, df1, how='left'))
10 loops, best of 3: 102 ms per loop

Add a column to pandas dataframe based on value present in different dataframe

You can use .isin(), as follows:

A['df_b_presence'] = A['ID'].isin(B['ID'])

Result:

print(A)

ID color df_b_presence
0 5 red False
1 6 blue False
2 7 blue True
3 8 NaN False
4 9 green True
5 10 NaN True

New column based on matching values from another dataframe pandas

Check with stack df1's list columns after re-create with DataFrame then map the value from df2


Also since you asking not using for loop I am using sum , and sum for this case is much slower than *for loop* or itertools


s=pd.DataFrame(df1.column2.tolist()).stack()
df1['New']=s.map(df2.set_index('column3').column4).sum(level=0).apply(set)
df1
Out[36]:
column1 column2 New
0 a1 [A, B] {2, 4, 3, 1}
1 a2 [A, B, C] {3, 5, 4, 2, 1}
2 a3 [B, C] {4, 3, 1, 5}

As I mentioned and most of us suggested , also you can check with For loops with pandas - When should I care?

import itertools
d=dict(zip(df2.column3,df2.column4))


l=[set(itertools.chain(*[d[y] for y in x ])) for x in df1.column2.tolist()]
df1['New']=l

Fill column of a dataframe from another dataframe

Use drop_duplicates with set_index and combine_first:

df = df2.set_index('Col1').combine_first(df1.drop_duplicates().set_index('Col1')).reset_index()

If need check dupes only in id column:

df = df2.set_index('Col1').combine_first(df1.drop_duplicates().set_index('Col1')).reset_index()

how do I succinctly create a new dataframe column based on matching existing column values with list of values?

Use str.extract: create a regex pattern of your search words and try to extract the matched pattern:

pattern = fr"\b({'|'.join(search_words1)})\b"
df3['col4'] = df3['col3'].str.extract(pattern)

Pattern:

>>> print(pattern)
\b(man|red)\b

\b matches the empty string, but only at the beginning or end of a word. The ( ) is the capture group.



Related Topics



Leave a reply



Submit