Conditionally Fill Column Values Based on Another Column's Value in Pandas

Conditionally fill column values based on another column's value in pandas

You probably want to use np.where:

import numpy as np
df['Normalized'] = np.where(df['Currency'] == '$', df['Budget'] * 0.78125, df['Budget'])
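A self-contained sketch of the line above. Only the Currency and Budget column names and the 0.78125 factor come from the answer; the sample rows are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    'Currency': ['$', '£', '$', '£'],
    'Budget': [100.0, 200.0, 64.0, 50.0],
})

# Where the currency is '$', convert the budget; otherwise keep it unchanged.
df['Normalized'] = np.where(df['Currency'] == '$', df['Budget'] * 0.78125, df['Budget'])

print(df)
```

np.where evaluates both branches for every row and then picks element-wise, so the multiplication is cheap and vectorized, with no Python-level loop.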

Set value of one Pandas column based on value in another column

One way to do this is to use indexing with .loc.

Example

In the absence of an example dataframe, I'll make one up here:

import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': list('abcdefg')})
df.loc[5, 'c1'] = 'Value'

>>> df
c1
0 a
1 b
2 c
3 d
4 e
5 Value
6 g

Suppose you want to create a new column c2 that is equal to c1, except where c1 is 'Value', in which case it should be 10.

First, create the new column c2 as a copy of c1, using either of the following two lines (they do essentially the same thing):

df = df.assign(c2 = df['c1'])
# OR:
df['c2'] = df['c1']
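The two lines are not quite identical: DataFrame.assign returns a new DataFrame and leaves the original untouched, which is why its result has to be assigned back to df, while df['c2'] = ... adds the column in place. A small sketch on a made-up three-row frame:

```python
import pandas as pd

df = pd.DataFrame({'c1': list('abc')})

# assign() returns a NEW DataFrame; df itself is not modified.
result = df.assign(c2=df['c1'])
print('c2' in result.columns)  # True
print('c2' in df.columns)      # False

# Item assignment, by contrast, adds the column to df in place.
df['c2'] = df['c1']
print('c2' in df.columns)      # True
```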

Then use .loc with a boolean mask to select the rows where c1 equals 'Value', and assign your desired value to c2 in those rows:

df.loc[df['c1'] == 'Value', 'c2'] = 10

And you end up with this:

>>> df
c1 c2
0 a a
1 b b
2 c c
3 d d
4 e e
5 Value 10
6 g g
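If you prefer to build c2 in a single expression, Series.mask can do the same job: it replaces values where the condition is True and keeps the rest. This is an alternative to the .loc approach above, not what the answer uses; the frame is the same made-up one:

```python
import pandas as pd

df = pd.DataFrame({'c1': list('abcdefg')})
df.loc[5, 'c1'] = 'Value'

# mask() swaps in `other` wherever the condition is True,
# so c2 is a copy of c1 with 'Value' replaced by 10.
df['c2'] = df['c1'].mask(df['c1'] == 'Value', other=10)

print(df)
```

Series.where is the mirror image (it keeps values where the condition is True), so df['c1'].where(df['c1'] != 'Value', 10) gives the same column.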

If, as you suggested in your question, you sometimes want to replace the values in the existing column rather than create a new one, skip the column creation and assign through .loc directly:

df.loc[df['c1'] == 'Value', 'c1'] = 10

Avoid the chained form df['c1'].loc[df['c1'] == 'Value'] = 10: chained assignment is not guaranteed to update df, and under Copy-on-Write (the default from pandas 3.0) it never does.

Giving you:

>>> df
c1
0 a
1 b
2 c
3 d
4 e
5 10
6 g
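When the condition is an exact match on a single value, Series.replace is another way to overwrite the existing column (an alternative, not the answer's method). A minimal sketch on the same made-up frame; like the .loc version, it leaves c1 with mixed string and integer values, i.e. object dtype:

```python
import pandas as pd

df = pd.DataFrame({'c1': list('abcdefg')})
df.loc[5, 'c1'] = 'Value'

# replace() returns a new Series with every exact 'Value' swapped for 10;
# assigning it back overwrites c1 without chained indexing.
df['c1'] = df['c1'].replace('Value', 10)

print(df['c1'].tolist())
```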

Conditionally fill column based off values in other columns in a pandas df

I believe you need:

df['Code_new'] = df['Code'].where(df['Code'].isin(['AA','BB'])).ffill()

df[['Numx','Numy']] = df[['Numx','Numy']].mask(df['Code_new'].duplicated())
mask = df['Code_new'] == 'BB'
df.loc[mask, ['Numx','Numy']] = df.loc[mask, ['Numx','Numy']].ffill()
print(df)
  Code  Numx  Numy Code_new
0   AA  30.2   1.9       AA
1        NaN   NaN       AA
2        NaN   NaN       AA
3   BB  35.6   2.2       BB
4       35.6   2.2       BB
5       35.6   2.2       BB
6       35.6   2.2       BB
7   CC  35.6   2.2       BB
8       35.6   2.2       BB
9   DD  35.6   2.2       BB

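Or, to blank out only the repeated AA rows and forward-fill only the missing values inside the BB block: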

df = df.replace('nan', np.nan)

df['Code_new'] = df['Code'].where(df['Code'].isin(['AA','BB'])).ffill()

m1 = df['Code_new'].duplicated() & (df['Code_new'] == 'AA')
df[['Numx','Numy']] = df[['Numx','Numy']].mask(m1)
m2 = df['Code_new'] == 'BB'
df.loc[m2, ['Numx','Numy']] = df.loc[m2, ['Numx','Numy']].ffill()
print(df)
  Code  Numx  Numy Code_new
0   AA  30.2   1.9       AA
1        NaN   NaN       AA
2        NaN   NaN       AA
3   BB  35.6   2.2       BB
4       40.2   2.5       BB
5       45.5   3.1       BB
6       45.5   3.1       BB
7   CC  45.5   3.1       BB
8       45.5   3.1       BB
9   DD  42.2   5.4       BB
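Both variants start from the same building block: Series.where keeps only the AA/BB labels (everything else becomes NaN) and ffill carries the last seen label forward, so every row is tagged with the block it belongs to. A sketch on made-up data, where empty strings stand in for the blank Code cells (an assumption, since the question's input frame isn't shown):

```python
import pandas as pd

# Hypothetical input; '' stands in for rows without their own code.
df = pd.DataFrame({'Code': ['AA', '', '', 'BB', '', 'CC', 'DD']})

# Keep only AA/BB (everything else becomes NaN), then forward-fill the labels.
df['Code_new'] = df['Code'].where(df['Code'].isin(['AA', 'BB'])).ffill()

# duplicated() is True for every row after the first occurrence of a label,
# i.e. the rows inside a block that don't start it (assuming each label
# forms one contiguous block).
df['is_repeat'] = df['Code_new'].duplicated()

print(df)
```

The is_repeat column is exactly the mask that the first variant passes to DataFrame.mask to blank out Numx and Numy.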

How to fill a column conditionally on values in another column being in a list?

This should work:

import numpy as np
df['type'] = np.where(df['food'].isin(['apple', 'banana', 'kiwi']), 'fruit', 'oth. food')
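A runnable version with a made-up food column (the column name, the list and the labels come from the answer; the rows are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'food': ['apple', 'bread', 'kiwi', 'cheese', 'banana']})

# isin() returns a boolean mask; np.where picks 'fruit' where it is True.
df['type'] = np.where(df['food'].isin(['apple', 'banana', 'kiwi']), 'fruit', 'oth. food')

print(df)
```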

Fill new column in dataframe using conditional logic

You can use the numpy.where function to get the required values: use the .isin method to check whether the value in column Game is one of ['Type A', 'Type B', 'Type C'], return 'Played' where that is True, and fall back to the Status column value where it is False:

>>> np.where(df['Game'].isin(['Type A', 'Type B', 'Type C']), 'Played', df['Status'])
array(['Played', 'Played', 'Played', 'Played', 'Won', nan], dtype=object)

You can assign it as a new column:

df['Result'] = np.where(df['Game'].isin(['Type A', 'Type B', 'Type C']),
                        'Played',
                        df['Status'])

     ID    Game Status  Result
0  AB01  Type A    Won  Played
1  AB02  Type B   Draw  Played
2  AB03  Type A    Won  Played
3  AB04  Type C    NaN  Played
4  AB05  Type D    Won     Won
5  AB06  Type D    NaN     NaN
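The same result can be had without NumPy by using Series.where on the Status column: keep Status where the game type is not in the list, and substitute 'Played' elsewhere. This swaps in a pandas method for np.where; the frame below is rebuilt from the table above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['AB01', 'AB02', 'AB03', 'AB04', 'AB05', 'AB06'],
    'Game': ['Type A', 'Type B', 'Type A', 'Type C', 'Type D', 'Type D'],
    'Status': ['Won', 'Draw', 'Won', np.nan, 'Won', np.nan],
})

played = df['Game'].isin(['Type A', 'Type B', 'Type C'])

# where() keeps Status where the condition is True (game NOT in the list)
# and uses 'Played' everywhere else; the NaN status for AB06 stays NaN.
df['Result'] = df['Status'].where(~played, 'Played')

print(df)
```

Unlike np.where, which returns a bare NumPy array, this keeps the result as a Series aligned on the DataFrame's index.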

Pandas conditional fillna based on another column values

You can use pandas.Series.map instead of numpy.where.

pandas.Series.map is handier for simple cases like this one: a dictionary makes multiple imputations easy and explicit (say {'0-1000': 'Small', '2000-3000': 'High'}).

numpy.where can express arbitrary element-wise logic (e.g. if a < 5 then a^2), which the OP's use case doesn't need, and that generality comes at a cost: multiple imputations require nested if-else logic, i.e. nested np.where calls.

Steps:

  1. Generate a mask that tags the rows of the pandas.DataFrame with a missing 'Outlet_Size', using pandas.Series.isna();
  2. Define a dictionary with the mappings, e.g. from '0-1000' to 'Small';
  3. Replace the 'Outlet_Size' values in that subset using pandas.Series.map, passing the dictionary as its arg argument;
  4. Use pandas.Series.fillna() to catch the missing 'Outlet_Size' values that the dictionary doesn't cover and impute a default value.

Example:

import pandas as pd
import numpy as np
fake_dataframe = pd.DataFrame({
'Outlet_Size' : ['Medium', 'Medium', 'Medium', np.nan, 'High', 'High', np.nan, 'Small', 'Medium', 'Small', np.nan, np.nan],
'sales_bin': ['3000-4000', '0-1000', '2000-3000', '0-1000', '0-1000', '2000-3000', '0-1000', '1000-2000', '1000-2000', '0-1000', '2000-3000', '1000-2000']
})
missing_mask = fake_dataframe['Outlet_Size'].isna()
mapping_dict = {'0-1000': 'Small'}
fake_dataframe.loc[missing_mask, 'Outlet_Size'] = fake_dataframe.loc[missing_mask, 'sales_bin'].map(mapping_dict)
fake_dataframe['Outlet_Size'] = fake_dataframe['Outlet_Size'].fillna('Medium')
print(fake_dataframe)
Outlet_Size sales_bin
0 Medium 3000-4000
1 Medium 0-1000
2 Medium 2000-3000
3 Small 0-1000
4 High 0-1000
5 High 2000-3000
6 Small 0-1000
7 Small 1000-2000
8 Medium 1000-2000
9 Small 0-1000
10 Medium 2000-3000
11 Medium 1000-2000

Example with multiple imputations:

import pandas as pd
import numpy as np
fake_dataframe = pd.DataFrame({
'Outlet_Size' : ['Medium', 'Medium', 'Medium', np.nan, 'High', 'High', np.nan, 'Small', 'Medium', 'Small', np.nan, np.nan],
'sales_bin': ['3000-4000', '0-1000', '2000-3000', '0-1000', '0-1000', '2000-3000', '0-1000', '1000-2000', '1000-2000', '0-1000', '2000-3000', '1000-2000']
})
missing_mask = fake_dataframe['Outlet_Size'].isna()
mapping_dict = {'0-1000': 'Small', '2000-3000': 'High'}
fake_dataframe.loc[missing_mask, 'Outlet_Size'] = fake_dataframe.loc[missing_mask, 'sales_bin'].map(mapping_dict)
fake_dataframe['Outlet_Size'] = fake_dataframe['Outlet_Size'].fillna('Medium')
print(fake_dataframe)
Outlet_Size sales_bin
0 Medium 3000-4000
1 Medium 0-1000
2 Medium 2000-3000
3 Small 0-1000
4 High 0-1000
5 High 2000-3000
6 Small 0-1000
7 Small 1000-2000
8 Medium 1000-2000
9 Small 0-1000
10 High 2000-3000
11 Medium 1000-2000
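To make the trade-off described above concrete, here is the multiple-imputation example written with nested np.where calls instead of map; each extra mapping adds another level of nesting. A sketch on the same fake data, checked against the map-based result:

```python
import numpy as np
import pandas as pd

fake_dataframe = pd.DataFrame({
    'Outlet_Size': ['Medium', 'Medium', 'Medium', np.nan, 'High', 'High', np.nan,
                    'Small', 'Medium', 'Small', np.nan, np.nan],
    'sales_bin': ['3000-4000', '0-1000', '2000-3000', '0-1000', '0-1000', '2000-3000',
                  '0-1000', '1000-2000', '1000-2000', '0-1000', '2000-3000', '1000-2000'],
})

missing = fake_dataframe['Outlet_Size'].isna()
bins = fake_dataframe['sales_bin']

# Nested np.where: one level per mapping, plus the 'Medium' default.
nested = np.where(~missing, fake_dataframe['Outlet_Size'],
                  np.where(bins == '0-1000', 'Small',
                           np.where(bins == '2000-3000', 'High', 'Medium')))

# The map + fillna version from the answer, for comparison.
mapped = fake_dataframe['Outlet_Size'].copy()
mapped.loc[missing] = bins[missing].map({'0-1000': 'Small', '2000-3000': 'High'})
mapped = mapped.fillna('Medium')

print((nested == mapped.to_numpy()).all())  # True
```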

How do I assign values based on multiple conditions for existing columns?

You can do this using np.where. Combine the conditions with the bitwise operators & (and) and | (or), and wrap each comparison in parentheses, because & and | bind more tightly than ==. Where the combined condition is True, 5 is returned, and 0 otherwise:

In [29]:
df['points'] = np.where(((df['gender'] == 'male') & (df['pet1'] == df['pet2']))
                        | ((df['gender'] == 'female') & (df['pet1'].isin(['cat', 'dog']))),
                        5, 0)
df

Out[29]:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
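Long one-line conditions are easier to read and debug if each part gets its own name. The sketch below rebuilds the frame from the output above, computes the same points column with named masks, and checks that dropping the parentheses fails, because & binds more tightly than ==:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female', 'squirrel'],
    'pet1': ['dog', 'cat', 'dog', 'cat', 'dog', 'squirrel', 'dog'],
    'pet2': ['dog', 'cat', 'cat', 'squirrel', 'dog', 'cat', 'cat'],
})

# Each sub-condition is a boolean Series; the parentheses are required.
male_same_pets = (df['gender'] == 'male') & (df['pet1'] == df['pet2'])
female_cat_or_dog = (df['gender'] == 'female') & (df['pet1'].isin(['cat', 'dog']))

df['points'] = np.where(male_same_pets | female_cat_or_dog, 5, 0)
print(df)

# Without parentheses this parses as df['gender'] == ('male' & df['pet1']) == ...,
# which raises an exception instead of producing a mask.
try:
    bad = df['gender'] == 'male' & df['pet1'] == df['pet2']
    precedence_error = False
except (TypeError, ValueError):
    precedence_error = True
print(precedence_error)  # True
```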

