How to pass another entire column as argument to pandas fillna()
You can provide this column to fillna (see the docs); it will use the values at matching indexes to fill the missing ones:
In [17]: df['Cat1'].fillna(df['Cat2'])
Out[17]:
0 cat
1 dog
2 cat
3 ant
Name: Cat1, dtype: object
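A minimal sketch with made-up data that reproduces the output above and shows the alignment. fillna only touches the missing Cat1 entries, and it takes each fill value from the passed Series by index label, not by position. The `partial` fill Series is a hypothetical illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Cat1 has gaps, Cat2 supplies the fallback values.
df = pd.DataFrame({
    'Cat1': ['cat', np.nan, 'cat', np.nan],
    'Cat2': ['dog', 'dog', 'bird', 'ant'],
})

# Each missing Cat1 value is replaced by the Cat2 value with the same index label;
# existing Cat1 values are left untouched.
filled = df['Cat1'].fillna(df['Cat2'])
print(filled.tolist())  # ['cat', 'dog', 'cat', 'ant']

# Alignment is by label: a fill Series that only has label 3
# fills row 3 and leaves row 1 missing.
partial = df['Cat1'].fillna(pd.Series({3: 'ant'}))
print(partial.tolist())
```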
fillna by referring to another column but copying the same column's value using pandas
df['sub_code'] = df.groupby(['grade'])['sub_code'].bfill().ffill()
Output:
sub_code stud_level grade
0 CSE01 101 STA
1 CSE01 101 STA
2 CSE03 101 PSA
3 CSE02 101 STA
4 CSE03 101 STA
5 CSE02 101 SSA
6 CSE03 101 PSA
7 CSE02 101 QSA
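A caveat worth checking, shown on hypothetical data: in the one-liner only bfill() is grouped. The trailing .ffill() runs on the plain Series that bfill() returns, so it forward-fills across the whole column and a grade with no sub_code at all inherits the previous row's code from another grade. If the fill must stay strictly within each grade, one option (an alternative, not part of the original answer) is to put both fills inside groupby(...).transform:

```python
import numpy as np
import pandas as pd

# Hypothetical input: sub_code gaps should be filled from rows with the same grade.
df = pd.DataFrame({
    'sub_code': [np.nan, 'CSE01', 'CSE03', np.nan, np.nan],
    'grade':    ['STA',  'STA',   'PSA',   'PSA',  'QSA'],
})

# Grouped bfill, then an ungrouped ffill over the whole column:
# the QSA row (no code in its group) picks up CSE03 from the PSA row above it.
mixed = df.groupby('grade')['sub_code'].bfill().ffill()

# Both fills stay inside each grade; QSA has nothing to copy, so it stays missing.
per_group = df.groupby('grade')['sub_code'].transform(lambda s: s.bfill().ffill())

print(mixed.tolist())
print(per_group.tolist())
```

Which behaviour is right depends on whether rows from other grades are acceptable donors.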
fill NaN based on another column's value
Try this:
df['MEAN'] = df['MEAN'].fillna(df['WFR'])
Pandas conditional fillna based on another column values
You can use pandas.Series.map instead of numpy.where.
pandas.Series.map is handier for these simple cases: it makes multiple imputations easy and explicit with a dictionary (say {'0-1000': 'Small', '2000-3000': 'High'}).
numpy.where can express more general logic (e.g. if a < 5 then a^2), which the OP's use case does not need, and that flexibility comes at a cost: multiple imputations become tricky to handle (nested if-else).
Steps:
- Generate a mask that tags the rows of the pandas.DataFrame with a missing 'Outlet_Size', using pandas.Series.isna();
- Define a dictionary with the mappings, e.g. from '0-1000' to 'Small';
- Replace the 'Outlet_Size' values in that subset using pandas.Series.map, passing the dictionary as the arg argument;
- Use pandas.Series.fillna() to catch the missing 'Outlet_Size' values that were not mapped and impute them with a default value.
Example:
import pandas as pd
import numpy as np
fake_dataframe = pd.DataFrame({
    'Outlet_Size': ['Medium', 'Medium', 'Medium', np.nan, 'High', 'High', np.nan, 'Small', 'Medium', 'Small', np.nan, np.nan],
    'sales_bin': ['3000-4000', '0-1000', '2000-3000', '0-1000', '0-1000', '2000-3000', '0-1000', '1000-2000', '1000-2000', '0-1000', '2000-3000', '1000-2000']
})
# rows where Outlet_Size is missing
missing_mask = fake_dataframe['Outlet_Size'].isna()
mapping_dict = {'0-1000': 'Small'}
# map only the missing rows; bins not in mapping_dict become NaN
fake_dataframe.loc[missing_mask, 'Outlet_Size'] = fake_dataframe.loc[missing_mask, 'sales_bin'].map(mapping_dict)
# default for whatever is still missing
fake_dataframe['Outlet_Size'] = fake_dataframe['Outlet_Size'].fillna('Medium')
print(fake_dataframe)
Outlet_Size sales_bin
0 Medium 3000-4000
1 Medium 0-1000
2 Medium 2000-3000
3 Small 0-1000
4 High 0-1000
5 High 2000-3000
6 Small 0-1000
7 Small 1000-2000
8 Medium 1000-2000
9 Small 0-1000
10 Medium 2000-3000
11 Medium 1000-2000
Example with multiple imputations:
import pandas as pd
import numpy as np
fake_dataframe = pd.DataFrame({
    'Outlet_Size': ['Medium', 'Medium', 'Medium', np.nan, 'High', 'High', np.nan, 'Small', 'Medium', 'Small', np.nan, np.nan],
    'sales_bin': ['3000-4000', '0-1000', '2000-3000', '0-1000', '0-1000', '2000-3000', '0-1000', '1000-2000', '1000-2000', '0-1000', '2000-3000', '1000-2000']
})
missing_mask = fake_dataframe['Outlet_Size'].isna()
# one dictionary entry per imputation rule
mapping_dict = {'0-1000': 'Small', '2000-3000': 'High'}
fake_dataframe.loc[missing_mask, 'Outlet_Size'] = fake_dataframe.loc[missing_mask, 'sales_bin'].map(mapping_dict)
fake_dataframe['Outlet_Size'] = fake_dataframe['Outlet_Size'].fillna('Medium')
print(fake_dataframe)
Outlet_Size sales_bin
0 Medium 3000-4000
1 Medium 0-1000
2 Medium 2000-3000
3 Small 0-1000
4 High 0-1000
5 High 2000-3000
6 Small 0-1000
7 Small 1000-2000
8 Medium 1000-2000
9 Small 0-1000
10 High 2000-3000
11 Medium 1000-2000
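For comparison, a sketch of the same multiple imputation written with nested numpy.where calls, reusing the fake data from the examples above. It should produce the same Outlet_Size column as the second example, but every extra rule adds another nesting level, and the existing values have to be passed through explicitly as the innermost "else" branch.

```python
import numpy as np
import pandas as pd

fake_dataframe = pd.DataFrame({
    'Outlet_Size': ['Medium', 'Medium', 'Medium', np.nan, 'High', 'High',
                    np.nan, 'Small', 'Medium', 'Small', np.nan, np.nan],
    'sales_bin': ['3000-4000', '0-1000', '2000-3000', '0-1000', '0-1000', '2000-3000',
                  '0-1000', '1000-2000', '1000-2000', '0-1000', '2000-3000', '1000-2000'],
})

missing = fake_dataframe['Outlet_Size'].isna()
bins = fake_dataframe['sales_bin']

# One np.where per rule, nested like if / elif / else:
#   missing and '0-1000'    -> 'Small'
#   missing and '2000-3000' -> 'High'
#   missing otherwise       -> 'Medium' (the default)
#   not missing             -> keep the existing value
fake_dataframe['Outlet_Size'] = np.where(
    missing & (bins == '0-1000'), 'Small',
    np.where(missing & (bins == '2000-3000'), 'High',
             np.where(missing, 'Medium', fake_dataframe['Outlet_Size'])))

print(fake_dataframe['Outlet_Size'].tolist())
```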
How to fill a column's value with another column and keep the existing ones?
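You can do it like this (empty strings are first turned into NaN so that fillna treats them as missing):
import numpy as np
df['c1'] = df['c1'].replace('', np.nan).fillna(df['c2'])
df['c2'] = df['c2'].replace('', np.nan).fillna(df['c1'])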
Output:
c1 c2
0 HP_0003470 HP_0003470
1 HP_8362789 HP_0093723
2 MP_0000231 MP_0000231
3 MP_0000231 MP_0000231
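If you would rather not convert empty strings to NaN, Series.mask(cond, other) is one alternative (not part of the original answer): wherever cond is True it takes the value from other at the same index. A sketch with hypothetical data chosen to reproduce the output above; a row where both columns are empty would simply stay empty.

```python
import pandas as pd

# Hypothetical input: '' marks a missing ID.
df = pd.DataFrame({
    'c1': ['HP_0003470', 'HP_8362789', '', 'MP_0000231'],
    'c2': ['', 'HP_0093723', 'MP_0000231', ''],
})

# Replace '' in c1 with the c2 value on the same row.
df['c1'] = df['c1'].mask(df['c1'] == '', df['c2'])
# Then fill c2 from the already-updated c1; rows with two values are untouched.
df['c2'] = df['c2'].mask(df['c2'] == '', df['c1'])
print(df)
```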
Fillna in multiple columns in place in Python Pandas
You could use apply on your columns and check whether each one is numeric via its dtype.kind:
res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))
print(res)
A B City Name
0 1.0 0.25 Seattle Jack
1 2.1 0.00 SF Sue
2 0.0 0.00 LA .
3 4.7 4.00 OC Bob
4 5.6 12.20 . Alice
5 6.8 14.40 . John
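Since the question asks for an in-place fill, here is a sketch of an alternative on made-up data: DataFrame.fillna also accepts a dict mapping column names to fill values. You can build that dict with the same dtype.kind test and fill every column at once with inplace=True, instead of creating a new frame with apply.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing numeric and text columns, with gaps in each.
df = pd.DataFrame({
    'A': [1.0, 2.1, np.nan, 4.7],
    'B': [0.25, np.nan, np.nan, 4.0],
    'City': ['Seattle', 'SF', 'LA', np.nan],
    'Name': ['Jack', 'Sue', np.nan, 'Bob'],
})

# One fill value per column: 0 for bool/int/uint/float/complex dtypes, '.' otherwise.
fill_values = {col: 0 if df[col].dtype.kind in 'biufc' else '.' for col in df.columns}

# Fills each column with its own value and modifies df itself (returns None).
df.fillna(fill_values, inplace=True)
print(df)
```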
How to Conditionally Set Column Values
Try this:
df['col 3'] = np.where(df['col 1'].isnull(), df['col 2'], df['col 1'])
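A small runnable check with hypothetical values. np.where picks 'col 2' wherever 'col 1' is null and keeps 'col 1' elsewhere. Because the condition here is only "is the value missing", the result should match df['col 1'].fillna(df['col 2']). np.where becomes the more useful tool when the condition is something other than missingness.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in 'col 1' (and one row missing in both columns).
df = pd.DataFrame({
    'col 1': [1.0, np.nan, 3.0, np.nan],
    'col 2': [10.0, 20.0, 30.0, np.nan],
})

# Take 'col 2' where 'col 1' is missing, otherwise keep 'col 1'.
df['col 3'] = np.where(df['col 1'].isnull(), df['col 2'], df['col 1'])

# Same result for this particular condition.
same = df['col 1'].fillna(df['col 2'])
print(df)
```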