How to Pass Another Entire Column as Argument to Pandas Fillna()

How to pass another entire column as argument to pandas fillna()

You can pass the other column directly to fillna (see docs); it uses that column's values at matching index labels to fill the gaps:

In [17]: df['Cat1'].fillna(df['Cat2'])
Out[17]:
0    cat
1    dog
2    cat
3    ant
Name: Cat1, dtype: object
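
A minimal, self-contained sketch of the call above. The input frame is hypothetical (the original data is not shown); it is only chosen so that the result matches the output printed above. The second part illustrates that the fill aligns on index labels rather than on position.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Cat1 has a gap, Cat2 holds the fallback values.
df = pd.DataFrame({
    "Cat1": ["cat", "dog", np.nan, "ant"],
    "Cat2": ["bird", "fish", "cat", "bee"],
})

# Only the missing row (index 2) takes its value from Cat2.
filled = df["Cat1"].fillna(df["Cat2"])
print(filled.tolist())

# Alignment is by index label: a fallback Series whose labels do not
# match the missing row leaves the gap unfilled.
fallback = pd.Series(["cat"], index=[99])
still_missing = df["Cat1"].fillna(fallback)
print(still_missing.isna().tolist())
```

Existing values in Cat1 are never overwritten; fillna only touches the rows that are missing.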

Fillna by referring to another column but copying values from the same column using pandas

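Back-fill sub_code within each grade group, then forward-fill whatever is still missing (the trailing ffill() runs on the ungrouped result):

df['sub_code'] = df.groupby(['grade'])['sub_code'].bfill().ffill()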

   sub_code  stud_level  grade
0     CSE01         101    STA
1     CSE01         101    STA
2     CSE03         101    PSA
3     CSE02         101    STA
4     CSE03         101    STA
5     CSE02         101    SSA
6     CSE03         101    PSA
7     CSE02         101    QSA
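
To make the two stages of that chain visible, here is a small sketch on made-up data (the frame below is an assumption, not the data from the question). It also shows a variant using transform, which is one possible way to keep both fills inside each group if values must not leak across grades.

```python
import numpy as np
import pandas as pd

# Hypothetical input: sub_code has gaps, grade defines the groups.
df = pd.DataFrame({
    "sub_code": ["CSE01", np.nan, "CSE03", "CSE02", np.nan, np.nan],
    "grade": ["STA", "STA", "PSA", "STA", "STA", "SSA"],
})

# Back-fill inside each grade; the result is a plain Series on df's index.
within_group = df.groupby("grade")["sub_code"].bfill()

# The chained ffill() is no longer grouped, so row 5 (SSA) inherits
# the value carried down from the STA rows above it.
chained = within_group.ffill()

# Variant that keeps both fills inside the group: row 5 stays missing
# because SSA has no known sub_code at all.
strict = df.groupby("grade")["sub_code"].transform(lambda s: s.bfill().ffill())

print(chained.tolist())
print(strict.tolist())
```

Which behavior is wanted depends on the data: the chained form guarantees no gaps as long as any value exists, while the transform form never borrows from another grade.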

Fill NaN based on another column's value

Try this:

df['MEAN'] = df['MEAN'].fillna(df['WFR'])
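
A short sketch of that line on hypothetical numbers (the column names MEAN and WFR come from the answer; the values are made up). It also shows the edge case where both columns are missing on the same row.

```python
import numpy as np
import pandas as pd

# Hypothetical values; only the column names come from the answer above.
df = pd.DataFrame({
    "MEAN": [1.5, np.nan, 3.0, np.nan],
    "WFR": [9.0, 2.0, 9.0, np.nan],
})

# Rows 1 and 3 are missing in MEAN; row 3 is also missing in WFR.
df["MEAN"] = df["MEAN"].fillna(df["WFR"])
print(df["MEAN"].tolist())
```

Row 1 is filled from WFR, while row 3 stays NaN because WFR has nothing to offer there. A second fillna with a scalar can be chained if a final default is needed.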

Pandas conditional fillna based on another column values

You can use pandas.Series.map instead of numpy.where.

pandas.Series.map is handier for simple cases like this one: a dictionary (say {'0-1000': 'Small', '2000-3000': 'High'}) makes multiple imputations easy to write and explicit.

numpy.where is designed for more general logic (e.g. if a < 5 then a^2), which the question's use case does not need, and that generality comes at a cost: multiple imputations require nested if-else calls, which are tricky to handle.

Steps:

  1. Generate a mask that tags the rows of the pandas.DataFrame with a missing 'Outlet_Size', using pandas.Series.isna();
  2. Define a dictionary with the mappings, e.g. from '0-1000' to 'Small';
  3. Replace the 'Outlet_Size' values in the tagged subset using pandas.Series.map, passing the dictionary as the arg argument;
  4. Use pandas.Series.fillna() to catch the missing 'Outlet_Size' values the mapping did not cover and impute them with a default value.

Example:

import pandas as pd
import numpy as np

fake_dataframe = pd.DataFrame({
    'Outlet_Size': ['Medium', 'Medium', 'Medium', np.nan, 'High', 'High', np.nan, 'Small', 'Medium', 'Small', np.nan, np.nan],
    'sales_bin': ['3000-4000', '0-1000', '2000-3000', '0-1000', '0-1000', '2000-3000', '0-1000', '1000-2000', '1000-2000', '0-1000', '2000-3000', '1000-2000']
})

# Step 1: tag the rows where 'Outlet_Size' is missing
missing_mask = fake_dataframe['Outlet_Size'].isna()
# Step 2: map a sales bin to the size to impute
mapping_dict = {'0-1000': 'Small'}
# Step 3: impute the tagged rows from 'sales_bin'
fake_dataframe.loc[missing_mask, 'Outlet_Size'] = fake_dataframe.loc[missing_mask, 'sales_bin'].map(mapping_dict)
# Step 4: default for missing rows the mapping did not cover
fake_dataframe['Outlet_Size'] = fake_dataframe['Outlet_Size'].fillna('Medium')
print(fake_dataframe)
    Outlet_Size  sales_bin
0        Medium  3000-4000
1        Medium     0-1000
2        Medium  2000-3000
3         Small     0-1000
4          High     0-1000
5          High  2000-3000
6         Small     0-1000
7         Small  1000-2000
8        Medium  1000-2000
9         Small     0-1000
10       Medium  2000-3000
11       Medium  1000-2000

Example with multiple imputations:

import pandas as pd
import numpy as np

fake_dataframe = pd.DataFrame({
    'Outlet_Size': ['Medium', 'Medium', 'Medium', np.nan, 'High', 'High', np.nan, 'Small', 'Medium', 'Small', np.nan, np.nan],
    'sales_bin': ['3000-4000', '0-1000', '2000-3000', '0-1000', '0-1000', '2000-3000', '0-1000', '1000-2000', '1000-2000', '0-1000', '2000-3000', '1000-2000']
})

# Tag the rows where 'Outlet_Size' is missing
missing_mask = fake_dataframe['Outlet_Size'].isna()
# Several bins can be imputed at once by adding entries to the dictionary
mapping_dict = {'0-1000': 'Small', '2000-3000': 'High'}
fake_dataframe.loc[missing_mask, 'Outlet_Size'] = fake_dataframe.loc[missing_mask, 'sales_bin'].map(mapping_dict)
# Default for missing rows the mapping did not cover
fake_dataframe['Outlet_Size'] = fake_dataframe['Outlet_Size'].fillna('Medium')
print(fake_dataframe)
    Outlet_Size  sales_bin
0        Medium  3000-4000
1        Medium     0-1000
2        Medium  2000-3000
3         Small     0-1000
4          High     0-1000
5          High  2000-3000
6         Small     0-1000
7         Small  1000-2000
8        Medium  1000-2000
9         Small     0-1000
10         High  2000-3000
11       Medium  1000-2000
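
For comparison, the sketch below writes the same two-rule imputation once with nested numpy.where calls and once with map, on a shortened, hypothetical version of the frame. It is only meant to illustrate the nesting cost mentioned above; the variable names are assumptions. The map variant here passes the mapped column straight to fillna instead of using a mask, which is a slightly different route to the same result.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical version of the frame used above.
df = pd.DataFrame({
    "Outlet_Size": ["Medium", np.nan, np.nan, np.nan],
    "sales_bin": ["3000-4000", "0-1000", "2000-3000", "1000-2000"],
})
missing = df["Outlet_Size"].isna()

# numpy.where: every extra rule adds one more level of nesting.
nested = np.where(
    missing & (df["sales_bin"] == "0-1000"), "Small",
    np.where(
        missing & (df["sales_bin"] == "2000-3000"), "High",
        np.where(missing, "Medium", df["Outlet_Size"]),
    ),
)

# map: every extra rule is one more dictionary entry.
mapping = {"0-1000": "Small", "2000-3000": "High"}
mapped = df["Outlet_Size"].fillna(df["sales_bin"].map(mapping)).fillna("Medium")

print(list(nested))
print(mapped.tolist())
```

Both produce the same column; the difference is in how the code grows when more bins need their own imputed value.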

How to fill a column's values with another column and keep the existing ones?

You can do it like this, treating empty strings as missing:

df['c1'] = df['c1'].replace('', np.nan).fillna(df['c2'])
df['c2'] = df['c2'].replace('', np.nan).fillna(df['c1'])

Output:

           c1          c2
0  HP_0003470  HP_0003470
1  HP_8362789  HP_0093723
2  MP_0000231  MP_0000231
3  MP_0000231  MP_0000231
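
A runnable sketch of the two lines above. The input frame is an assumption, reconstructed so that the result matches the output shown; the question's actual data is not included here.

```python
import numpy as np
import pandas as pd

# Hypothetical input: empty strings mark the gaps in each column.
df = pd.DataFrame({
    "c1": ["", "HP_8362789", "MP_0000231", "MP_0000231"],
    "c2": ["HP_0003470", "HP_0093723", "MP_0000231", ""],
})

# Turn '' into NaN so fillna can see it, then borrow from the other column.
df["c1"] = df["c1"].replace("", np.nan).fillna(df["c2"])
# c1 is already complete at this point, so it can fill the gaps in c2.
df["c2"] = df["c2"].replace("", np.nan).fillna(df["c1"])

print(df)
```

Row 1 shows the "keep existing" part: both columns already had a value there, so neither is overwritten even though they differ.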

Fillna in multiple columns in place in Python Pandas

You could use apply on the columns and check dtype.kind to decide whether each one is numeric ('b', 'i', 'u', 'f' and 'c' stand for boolean, signed integer, unsigned integer, float and complex):

res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))

print(res)
     A      B     City   Name
0  1.0   0.25  Seattle   Jack
1  2.1   0.00       SF    Sue
2  0.0   0.00       LA      .
3  4.7   4.00       OC    Bob
4  5.6  12.20        .  Alice
5  6.8  14.40        .   John
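
A self-contained version of that snippet. The input frame is a guess at what the original looked like, built so that the printed result above is reproduced; treat the NaN positions as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical input reconstructed from the output shown above.
df = pd.DataFrame({
    "A": [1.0, 2.1, np.nan, 4.7, 5.6, 6.8],
    "B": [0.25, np.nan, np.nan, 4.0, 12.2, 14.4],
    "City": ["Seattle", "SF", "LA", "OC", np.nan, np.nan],
    "Name": ["Jack", "Sue", np.nan, "Bob", "Alice", "John"],
})

# Numeric columns get 0, every other column gets '.'.
res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in "biufc" else x.fillna("."))
print(res)
```

Note that apply returns a new DataFrame; assign it back to df if the original should be replaced.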

How to Conditionally Set Column Values

Try this, which takes 'col 2' where 'col 1' is null and 'col 1' otherwise:

df['col 3'] = np.where(df['col 1'].isnull(), df['col 2'], df['col 1'])
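
A small sketch of that line on made-up numbers, together with the equivalent fillna call for comparison. The values are hypothetical; only the column names come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration.
df = pd.DataFrame({
    "col 1": [1.0, np.nan, 3.0],
    "col 2": [10.0, 20.0, 30.0],
})

# Where 'col 1' is null take 'col 2', otherwise keep 'col 1'.
df["col 3"] = np.where(df["col 1"].isnull(), df["col 2"], df["col 1"])

# The same result expressed with fillna and another column.
same = df["col 1"].fillna(df["col 2"])

print(df["col 3"].tolist())
print(same.tolist())
```

For this particular condition the two forms agree; numpy.where becomes the more natural choice once the condition is something other than "is missing".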

