Conditionally Fill Column Values Based on Another Column's Value in Pandas

Conditionally fill column values based on another column's value in pandas

You probably want np.where, which picks a value row by row depending on a condition:

df['Normalized'] = np.where(df['Currency'] == '$', df['Budget'] * 0.78125, df['Budget'])
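As a minimal sketch of what that line does, assuming a small made-up DataFrame (the rows are hypothetical; only the column names, the '$' test and the 0.78125 factor come from the answer):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    'Currency': ['$', 'EUR', '$'],
    'Budget': [100.0, 200.0, 64.0],
})

# Where Currency is '$', scale the budget; otherwise keep it unchanged.
df['Normalized'] = np.where(df['Currency'] == '$',
                            df['Budget'] * 0.78125,
                            df['Budget'])
print(df)
```

Both branches are evaluated for every row, and np.where only chooses between the two results, so this works best when both expressions are cheap and valid for all rows.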

Set value of one Pandas column based on value in another column

One way to do this is to use indexing with .loc.

Example

In the absence of an example dataframe, I'll make one up here:

import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': list('abcdefg')})
df.loc[5, 'c1'] = 'Value'

>>> df
      c1
0      a
1      b
2      c
3      d
4      e
5  Value
6      g

Suppose you want to create a new column c2 that is identical to c1, except that where c1 is 'Value' it should be set to 10.

First, create the new column c2 as a copy of c1, using either of the following two lines (they do essentially the same thing):

df = df.assign(c2 = df['c1'])
# OR:
df['c2'] = df['c1']

Then, find all the indices where c1 is equal to 'Value' using .loc, and assign your desired value in c2 at those indices:

df.loc[df['c1'] == 'Value', 'c2'] = 10

And you end up with this:

>>> df
      c1  c2
0      a   a
1      b   b
2      c   c
3      d   d
4      e   e
5  Value  10
6      g   g
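Put together, the two steps might look like the sketch below. One assumption that is not in the answer: the column is created with an explicit object dtype, so that it can hold both strings and the integer 10 regardless of the pandas version in use.

```python
import pandas as pd

# Assumption: object dtype is requested explicitly so the column can mix
# strings with the integer 10.
df = pd.DataFrame({'c1': pd.Series(list('abcde') + ['Value', 'g'], dtype=object)})

df['c2'] = df['c1']                      # step 1: c2 starts as a copy of c1
df.loc[df['c1'] == 'Value', 'c2'] = 10   # step 2: overwrite only the matching rows
print(df)
```

The boolean Series selects the rows and the 'c2' label selects the column, so the assignment touches a single cell here and leaves c1 alone.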

If, as you suggested in your question, you sometimes just want to replace the values in the existing column rather than create a new one, skip the column creation and assign directly. Use a single .loc call; the chained form df['c1'].loc[...] = 10 assigns through an intermediate Series and is not guaranteed to update df:

df.loc[df['c1'] == 'Value', 'c1'] = 10

Giving you:

>>> df
   c1
0   a
1   b
2   c
3   d
4   e
5  10
6   g

Conditionally fill column based off values in other columns in a pandas df

I believe you need:

df['Code_new'] = df['Code'].where(df['Code'].isin(['AA','BB'])).ffill()

df[['Numx','Numy']] = df[['Numx','Numy']].mask(df['Code_new'].duplicated())
mask = df['Code_new'] == 'BB'
df.loc[mask, ['Numx','Numy']] = df.loc[mask, ['Numx','Numy']].ffill()
print(df)
  Code  Numx  Numy Code_new
0   AA  30.2   1.9       AA
1        NaN   NaN       AA
2        NaN   NaN       AA
3   BB  35.6   2.2       BB
4       35.6   2.2       BB
5       35.6   2.2       BB
6       35.6   2.2       BB
7   CC  35.6   2.2       BB
8       35.6   2.2       BB
9   DD  35.6   2.2       BB

Or:

df = df.replace('nan', np.nan)

df['Code_new'] = df['Code'].where(df['Code'].isin(['AA','BB'])).ffill()

m1 = df['Code_new'].duplicated() & (df['Code_new'] == 'AA')
df[['Numx','Numy']] = df[['Numx','Numy']].mask(m1)
m2 = df['Code_new'] == 'BB'
df.loc[m2, ['Numx','Numy']] = df.loc[m2, ['Numx','Numy']].ffill()
print(df)
  Code  Numx  Numy Code_new
0   AA  30.2   1.9       AA
1        NaN   NaN       AA
2        NaN   NaN       AA
3   BB  35.6   2.2       BB
4       40.2   2.5       BB
5       45.5   3.1       BB
6       45.5   3.1       BB
7   CC  45.5   3.1       BB
8       45.5   3.1       BB
9   DD  42.2   5.4       BB
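Both variants start by building Code_new with where(...).ffill(). A minimal sketch of just that step, assuming a made-up Code column in which empty strings stand in for the blank cells (the original input frame is not shown in the answer):

```python
import pandas as pd

# Hypothetical input: empty strings stand in for the blank Code cells.
df = pd.DataFrame({'Code': ['AA', '', '', 'BB', '', 'CC', '', 'DD']})

# Keep only the codes of interest; every other value becomes missing ...
kept = df['Code'].where(df['Code'].isin(['AA', 'BB']))
# ... then carry the last kept code forward into the gaps.
df['Code_new'] = kept.ffill()
print(df)
```

Because 'CC' and 'DD' are not in the list, they are blanked out by where and then overwritten by the preceding 'BB', which is why every row after the first 'BB' ends up in the 'BB' group.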

How to fill a column conditionally on whether values in another column are in a list?

This should work:

df['type'] = np.where(df['food'].isin(['apple', 'banana', 'kiwi']), 'fruit', 'oth. food')
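A self-contained version of that line, assuming a small hypothetical food column (the sample rows and the fruits name are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'food': ['apple', 'bread', 'kiwi', 'rice']})

fruits = ['apple', 'banana', 'kiwi']
# isin gives one boolean per row; np.where maps True/False to the two labels.
df['type'] = np.where(df['food'].isin(fruits), 'fruit', 'oth. food')
print(df)
```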

Fill new column in dataframe using conditional logic

You can use the numpy.where function to get the required values. Use the .isin method to check whether the value of the Game column is one of ['Type A', 'Type B', 'Type C'], assign 'Played' where that is True, and take the value from the Status column where it is False:

>>> np.where(df['Game'].isin(['Type A', 'Type B', 'Type C']), ['Played'], df['Status'])
array(['Played', 'Played', 'Played', 'Played', 'Won', nan], dtype=object)

You can assign it as a new column:

df['Result'] = np.where(df['Game'].isin(['Type A', 'Type B', 'Type C']),
                        ['Played'],
                        df['Status'])

     ID    Game Status  Result
0  AB01  Type A    Won  Played
1  AB02  Type B   Draw  Played
2  AB03  Type A    Won  Played
3  AB04  Type C    NaN  Played
4  AB05  Type D    Won     Won
5  AB06  Type D    NaN     NaN
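To try this end to end, the frame can be rebuilt from the table above. The sketch below passes the plain string 'Played' instead of the one-element list; that is a small substitution on my part, and both forms broadcast to every row where the condition is True.

```python
import numpy as np
import pandas as pd

# Data copied from the table above.
df = pd.DataFrame({
    'ID': ['AB01', 'AB02', 'AB03', 'AB04', 'AB05', 'AB06'],
    'Game': ['Type A', 'Type B', 'Type A', 'Type C', 'Type D', 'Type D'],
    'Status': ['Won', 'Draw', 'Won', np.nan, 'Won', np.nan],
})

played = df['Game'].isin(['Type A', 'Type B', 'Type C'])
# True rows get the constant; False rows fall back to Status (NaN included).
df['Result'] = np.where(played, 'Played', df['Status'])
print(df)
```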

Pandas conditional fillna based on another column values

You can use pandas.Series.map instead of numpy.where.

pandas.Series.map is handier for simple cases like this one: a dictionary (say {'0-1000': 'Small', '2000-3000': 'High'}) makes multiple imputations easy and explicit.

numpy.where is designed to handle more general logic (e.g. if a < 5 then a^2), which the OP's use case does not need, and that generality comes at a cost: multiple imputations require nested if-else calls, which are tricky to handle.

Steps:

1. Generate a mask to tag the subset of the pandas.DataFrame with missing 'Outlet_Size' using pandas.Series.isna().
2. Define a dictionary with mappings, e.g. from '0-1000' to 'Small'.
3. Replace 'Outlet_Size' values in the defined pandas.DataFrame subset using pandas.Series.map with the defined dictionary as the arg argument.
4. Use pandas.Series.fillna() to catch the unmapped missing 'Outlet_Size' and impute them to a default value.

Example:

import pandas as pd
import numpy as np
fake_dataframe = pd.DataFrame({
    'Outlet_Size' : ['Medium', 'Medium', 'Medium', np.nan, 'High', 'High', np.nan, 'Small', 'Medium', 'Small', np.nan, np.nan],
    'sales_bin': ['3000-4000', '0-1000', '2000-3000', '0-1000', '0-1000', '2000-3000', '0-1000', '1000-2000', '1000-2000', '0-1000', '2000-3000', '1000-2000']
})
missing_mask = fake_dataframe['Outlet_Size'].isna()
mapping_dict = {'0-1000': 'Small'}
fake_dataframe.loc[missing_mask, 'Outlet_Size'] = fake_dataframe.loc[missing_mask, 'sales_bin'].map(mapping_dict)
fake_dataframe['Outlet_Size'] = fake_dataframe['Outlet_Size'].fillna('Medium')
print(fake_dataframe)
   Outlet_Size  sales_bin
0       Medium  3000-4000
1       Medium     0-1000
2       Medium  2000-3000
3        Small     0-1000
4         High     0-1000
5         High  2000-3000
6        Small     0-1000
7        Small  1000-2000
8       Medium  1000-2000
9        Small     0-1000
10      Medium  2000-3000
11      Medium  1000-2000

Example with multiple imputations:

import pandas as pd
import numpy as np
fake_dataframe = pd.DataFrame({
    'Outlet_Size' : ['Medium', 'Medium', 'Medium', np.nan, 'High', 'High', np.nan, 'Small', 'Medium', 'Small', np.nan, np.nan],
    'sales_bin': ['3000-4000', '0-1000', '2000-3000', '0-1000', '0-1000', '2000-3000', '0-1000', '1000-2000', '1000-2000', '0-1000', '2000-3000', '1000-2000']
})
missing_mask = fake_dataframe['Outlet_Size'].isna()
mapping_dict = {'0-1000': 'Small', '2000-3000': 'High'}
fake_dataframe.loc[missing_mask, 'Outlet_Size'] = fake_dataframe.loc[missing_mask, 'sales_bin'].map(mapping_dict)
fake_dataframe['Outlet_Size'] = fake_dataframe['Outlet_Size'].fillna('Medium')
print(fake_dataframe)
   Outlet_Size  sales_bin
0       Medium  3000-4000
1       Medium     0-1000
2       Medium  2000-3000
3        Small     0-1000
4         High     0-1000
5         High  2000-3000
6        Small     0-1000
7        Small  1000-2000
8       Medium  1000-2000
9        Small     0-1000
10        High  2000-3000
11      Medium  1000-2000
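For comparison, here is a rough sketch of the same two-rule imputation written once with nested numpy.where calls and once with map. The four-row frame and the names nested and mapped are illustrative, not from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened version of the frame used above.
df = pd.DataFrame({
    'Outlet_Size': ['Medium', np.nan, np.nan, np.nan],
    'sales_bin': ['3000-4000', '0-1000', '2000-3000', '1000-2000'],
})
missing = df['Outlet_Size'].isna()

# Nested np.where: every extra rule adds another level of nesting.
nested = np.where(
    missing & (df['sales_bin'] == '0-1000'), 'Small',
    np.where(missing & (df['sales_bin'] == '2000-3000'), 'High',
             np.where(missing, 'Medium', df['Outlet_Size'])))

# map: every extra rule is one more dictionary entry.
mapping_dict = {'0-1000': 'Small', '2000-3000': 'High'}
mapped = (df['Outlet_Size']
          .where(~missing, df['sales_bin'].map(mapping_dict))
          .fillna('Medium'))

print(list(nested))
print(mapped.tolist())
```

Both approaches should agree on this data; the difference is mainly in how the code grows as more bins are added.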

How do I assign values based on multiple conditions for existing columns?

You can do this using np.where. The conditions use the bitwise operators & and | for and and or, and each comparison needs its own parentheses because of operator precedence. Where the combined condition is true, 5 is returned, and 0 otherwise:

In [29]:
df['points'] = np.where( ( (df['gender'] == 'male') & (df['pet1'] == df['pet2'] ) ) | ( (df['gender'] == 'female') & (df['pet1'].isin(['cat','dog'] ) ) ), 5, 0)
df

Out[29]:
     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0
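The one-liner can be easier to read if each rule is given a name first. A sketch using the data from the table above (the names male_match and female_cat_dog are my own):

```python
import numpy as np
import pandas as pd

# Data copied from the output table above.
df = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female', 'squirrel'],
    'pet1': ['dog', 'cat', 'dog', 'cat', 'dog', 'squirrel', 'dog'],
    'pet2': ['dog', 'cat', 'cat', 'squirrel', 'dog', 'cat', 'cat'],
})

# Each comparison is parenthesised because & binds tighter than ==.
male_match = (df['gender'] == 'male') & (df['pet1'] == df['pet2'])
female_cat_dog = (df['gender'] == 'female') & df['pet1'].isin(['cat', 'dog'])

df['points'] = np.where(male_match | female_cat_dog, 5, 0)
print(df)
```

Naming the masks also makes it simple to inspect each rule on its own, for example with df[male_match].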
