Creating a New Dataframe Column by Comparing Strings of Two Unequal Dataframes

Comparing strings in two different dataframes and adding a column

Using merge() should solve the problem.

df3 = pd.merge(df1, df2, on='Name')

Complete example:

import pandas as pd

df1 = pd.DataFrame({"Name": ["Bob1", "Bob2", "Bob3"], "Age": [20, 21, 22]})
df2 = pd.DataFrame({"Country": ["US", "UK", "US", "Canada", "Canada", "US", "UK", "UK", "UK", "Canada"],
                    "Name": ["Bob1", "Bob123", "Bob234", "Bob2", "Bob987", "Bob3", "Mary1", "Mary2", "Mary3", "Mary65"]})

df3 = pd.merge(df1, df2, on='Name')
df3

Output:

   Name  Age Country
0  Bob1   20      US
1  Bob2   21  Canada
2  Bob3   22      US
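The default merge is an inner join, so names that appear in only one frame are dropped. If the goal is a new column that flags matches while keeping every row, a left join with indicator=True is one option. This is a minimal sketch, not part of the original answer; the names df_all and in_df1 are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Bob1", "Bob2", "Bob3"], "Age": [20, 21, 22]})
df2 = pd.DataFrame({
    "Country": ["US", "UK", "US", "Canada", "Canada", "US", "UK", "UK", "UK", "Canada"],
    "Name": ["Bob1", "Bob123", "Bob234", "Bob2", "Bob987", "Bob3", "Mary1", "Mary2", "Mary3", "Mary65"],
})

# Left join: every row of df2 is kept; Age is NaN where Name has no match in df1.
# indicator=True adds a "_merge" column holding "both" or "left_only".
df_all = df2.merge(df1, on="Name", how="left", indicator=True)

# Hypothetical flag column derived from the indicator.
df_all["in_df1"] = df_all["_merge"] == "both"

print(df_all[["Name", "Country", "Age", "in_df1"]])
```

Rows of df2 without a partner in df1 stay in the result with a missing Age, which an inner join would silently discard.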

pandas: compare string columns from two different dataframes of different sizes

Use Series.isin if you need boolean True/False values:

df1['result'] = df1['text'].isin(df2['text'])
print(df1)
                      text  result
0      the old man is here    True
1  the young girl is there   False
2    the old woman is here   False
3   the young boy is there    True
4   the young girl is here   False
5     the old girl is here   False

This works the same as:

#removed '' from 'True', 'False' for boolean
df1['result'] = np.where(df1['text'].isin(df2['text']), True, False)

Your solution creates strings, so it fails if you use the result for filtering:

df1['result'] = np.where(df1['text'].isin(df2['text']), 'True', 'False')
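To see why the string version gets in the way of filtering, here is a small sketch. The two frames are made-up stand-ins (the question's full data is not reproduced here); only the column name text comes from the answer, and result_str is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name "text" comes from the answer.
df1 = pd.DataFrame({"text": ["the old man is here",
                             "the young girl is there",
                             "the young boy is there"]})
df2 = pd.DataFrame({"text": ["the old man is here",
                             "the young boy is there",
                             "something else"]})

# Boolean column: usable directly as a row mask.
df1["result"] = df1["text"].isin(df2["text"])
matched = df1[df1["result"]]

# String column: every non-empty string is truthy, 'False' included,
# so the values carry no boolean meaning on their own.
df1["result_str"] = np.where(df1["result"], "True", "False")
truthiness = [bool(s) for s in df1["result_str"].tolist()]

# Filtering on the string column needs an explicit comparison.
matched_from_str = df1[df1["result_str"] == "True"]
```

Keeping the column boolean avoids the extra comparison and lets the column be combined with other masks using & and |.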

How to compare two dataframes on one string column when the number of samples differs (pandas)

You can use apply on the smaller DataFrame (here dftest) and check each value against the unique() values of the larger DataFrame (here dftrain), as shown below:

>>> dftrain = pd.DataFrame({'col1': ['text', 'Hello', 'How are you?', 'Hello', 'Hello' , 'Hello']})

>>> dftest = pd.DataFrame({'col2': ['text', 'hello', 'How are you?', 'hello']})

>>> dftest.loc[dftest['col2'].apply(lambda x : x in dftrain.col1.unique()), 'col2']

0 text
2 How are you?
Name: col2, dtype: object

>>> dftest.loc[dftest['col2'].apply(lambda x : x in dftrain.col1.unique()), 'col2'].tolist()

['text', 'How are you?']
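Note that 'hello' does not match 'Hello' above because the comparison is case-sensitive. The sketch below shows a vectorized alternative with Series.isin and a case-insensitive variant; it assumes lower-casing is an acceptable normalization for the data, and the column name in_train is made up for illustration.

```python
import pandas as pd

dftrain = pd.DataFrame({"col1": ["text", "Hello", "How are you?", "Hello", "Hello", "Hello"]})
dftest = pd.DataFrame({"col2": ["text", "hello", "How are you?", "hello"]})

# Exact (case-sensitive) match; same rows as the apply/unique version.
exact = dftest.loc[dftest["col2"].isin(dftrain["col1"]), "col2"].tolist()

# Case-insensitive match: lower-case both sides before comparing.
# "in_train" is a hypothetical column name.
dftest["in_train"] = dftest["col2"].str.lower().isin(dftrain["col1"].str.lower())

print(exact)
print(dftest)
```

isin handles duplicates in the larger frame itself, so the explicit unique() call is not needed in this form.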

Create a new dataframe column by comparing two other columns in different dataframes

Use map after converting alpha2 to a mappable object.

First we make our map:

>> country_map = alpha2.set_index('Code')['Name'].to_dict()
>> # country_map = dict(alpha2[['Code', 'Name']].values)
>> # country_map = alpha2.set_index('Code')['Name']
>> print(country_map)
{'ES': 'Spain', 'UK': 'United Kingdom', 'GH': 'Ghana', 'SL': 'Sierra Leone'}

Then we map it on the Country Code column:

>> cube_data['Country'] = cube_data['Country Code'].map(country_map)
>> print(cube_data)
  Country Code         Country
0           UK  United Kingdom
1           ES           Spain
2           SL    Sierra Leone
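Codes that are missing from the map come back as NaN. The sketch below rebuilds the two frames from the printed output above (their exact original contents are an assumption) and adds a hypothetical "FR" row to show the unmatched case, with "Unknown" as an arbitrary placeholder.

```python
import pandas as pd

# Assumed inputs, rebuilt from the printed output above.
alpha2 = pd.DataFrame({
    "Code": ["ES", "UK", "GH", "SL"],
    "Name": ["Spain", "United Kingdom", "Ghana", "Sierra Leone"],
})
# "FR" is a hypothetical extra code with no entry in alpha2.
cube_data = pd.DataFrame({"Country Code": ["UK", "ES", "SL", "FR"]})

country_map = alpha2.set_index("Code")["Name"].to_dict()
cube_data["Country"] = cube_data["Country Code"].map(country_map)

# Unmatched codes are NaN after map(); count them, then fill with a placeholder.
n_missing = int(cube_data["Country"].isna().sum())
cube_data["Country"] = cube_data["Country"].fillna("Unknown")

print(n_missing)
print(cube_data)
```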

Comparing columns of two Data Frames and returning the values of a different column using Pandas

You can use a dataframe merge for this:

import pandas as pd

df_1 = pd.DataFrame({
    'product_id': ['p1', 'p2', 'p3', 'p4'],
    'product_price': [100, 200, 300, 400],
    'invoice_total': [200, 300, 600, 700]
})

df_2 = pd.DataFrame({
    'product_id': ['p1', 'p6', 'p2'],
    'quantity': [8, 3, 5],
    'invoice_total': [700, 900, 600]
})

df_merged = df_1.merge(
    df_2,
    on='product_id',
    suffixes=('_df1', '')
)

Contents of df_merged

  product_id  product_price  invoice_total_df1  quantity  invoice_total
0         p1            100                200         8            700
1         p2            200                300         5            600

Then filter to only the columns you need

df_merged = df_merged[['product_id', 'invoice_total']]

Final contents of df_merged

  product_id  invoice_total
0         p1            700
1         p2            600
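An alternative, sketched here rather than taken from the original answer, is to select the needed columns before merging. That way no suffixes are required and nothing has to be filtered afterwards. The name result is made up for illustration.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "product_id": ["p1", "p2", "p3", "p4"],
    "product_price": [100, 200, 300, 400],
    "invoice_total": [200, 300, 600, 700],
})
df_2 = pd.DataFrame({
    "product_id": ["p1", "p6", "p2"],
    "quantity": [8, 3, 5],
    "invoice_total": [700, 900, 600],
})

# Keep only the key from df_1 and the key plus the wanted column from df_2,
# so the inner join produces exactly the two columns of interest.
result = df_1[["product_id"]].merge(
    df_2[["product_id", "invoice_total"]],
    on="product_id",
)

print(result)
```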

Compare the values of two columns of different length in two different DataFrames and perform a math operation if matches a condition

Use a MultiIndex if the (ID, Class) pairs are unique in both frames:

df11 = df1.set_index(['ID','Class'])
df11['VALUE'] = df11['VALUE'].mul(df2.set_index(['ID','Class'])['NUMBER'])
df = df11.reset_index()

Or use a left join with DataFrame.merge and multiply column VALUE by NUMBER, using DataFrame.pop to remove NUMBER after the operation:

df = df1.merge(df2, on=['ID','Class'], how='left')
df['VALUE'] *= df.pop('NUMBER')

Or:

df1['VALUE'] *= df1.merge(df2, on=['ID','Class'], how='left')['NUMBER']
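With a left join, rows of df1 that have no partner in df2 get NaN for NUMBER, so their VALUE becomes NaN after the multiplication. The sketch below uses made-up data (the question's frames are not shown here; only the column names ID, Class, VALUE and NUMBER come from the answer) and assumes unmatched rows should keep their original VALUE, which is why NUMBER is filled with 1.

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer above.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Class": ["a", "b", "a"],
    "VALUE": [10.0, 20.0, 30.0],
})
df2 = pd.DataFrame({
    "ID": [1, 3],
    "Class": ["a", "a"],
    "NUMBER": [2, 3],
})

df = df1.merge(df2, on=["ID", "Class"], how="left")

# pop() removes NUMBER from df and returns it; fillna(1) leaves
# unmatched rows unchanged instead of turning them into NaN.
number = df.pop("NUMBER").fillna(1)
df["VALUE"] = df["VALUE"] * number

print(df)
```

If unmatched rows should instead be flagged or dropped, skip the fillna and handle the NaN values explicitly.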

How to compare two (2) unequal dataframes in Python and assign elements from the one to another?

Use df.merge():

In [240]: res = df1.merge(df2, on='number1')

In [241]: res
Out[241]:
   number1  start   end
0       10   17.8  17.8
1       20   25.0  28.0
2       30   18.4  19.5

