Remove Rows with NaN Values

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

Don't drop, just take the rows where EPS is not NA:

df = df[df['EPS'].notna()]
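
A minimal sketch of that pattern (the column name EPS comes from the question; the sample data below is made up):

import pandas as pd
import numpy as np

# Made-up sample with one missing EPS value
df = pd.DataFrame({'ticker': ['A', 'B', 'C'],
                   'EPS': [1.2, np.nan, 0.7]})

# Keep only the rows where EPS is not missing
df = df[df['EPS'].notna()]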

Remove row with null value from pandas data frame

This should do the job:

df = df.dropna(how='any', axis=0)

It will drop every row (axis=0) that contains at least one null value ('any').

EXAMPLE:

import numpy as np
import pandas as pd

# Recreate a random DataFrame with NaN values
df = pd.DataFrame(index=pd.date_range('2017-01-01', '2017-01-10', freq='1d'))
# Average speed in miles per hour
df['A'] = np.random.randint(low=198, high=205, size=len(df.index))
df['B'] = np.random.random(size=len(df.index)) * 2

# Create dummy NaN values in 2 cells
df.iloc[2, 1] = np.nan
df.iloc[5, 0] = np.nan

print(df)
                A         B
2017-01-01  203.0  1.175224
2017-01-02  199.0  1.338474
2017-01-03  198.0       NaN
2017-01-04  198.0  0.652318
2017-01-05  199.0  1.577577
2017-01-06    NaN  0.234882
2017-01-07  203.0  1.732908
2017-01-08  204.0  1.473146
2017-01-09  198.0  1.109261
2017-01-10  202.0  1.745309

# Delete every row that contains a NaN value
df = df.dropna(how='any', axis=0)

print(df)

                A         B
2017-01-01  203.0  1.175224
2017-01-02  199.0  1.338474
2017-01-04  198.0  0.652318
2017-01-05  199.0  1.577577
2017-01-07  203.0  1.732908
2017-01-08  204.0  1.473146
2017-01-09  198.0  1.109261
2017-01-10  202.0  1.745309

See the pandas dropna documentation for further detail.

If everything is OK with your DataFrame, dropping NaNs should be as easy as that. If it is still not working, make sure your columns have the proper datatypes (pd.to_numeric comes to mind...).
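
For instance, if a numeric column was read in as text, its missing entries may be the literal string 'nan', which dropna will not touch. A minimal sketch (the column name is made up):

import pandas as pd

df = pd.DataFrame({'EPS': ['1.2', 'nan', '0.7']})   # object dtype: 'nan' is just text here

# Coerce to numeric; anything unparsable becomes a real NaN
df['EPS'] = pd.to_numeric(df['EPS'], errors='coerce')
df = df.dropna(subset=['EPS'])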

How to remove rows that contain NaN in both the 1st and 3rd columns?

dropna has an additional parameter, how:

how : {'any', 'all'}, default 'any'
    Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.
    'any' : If any NA values are present, drop that row or column.
    'all' : If all values are NA, drop that row or column.

If you set it to 'all', it will only drop the rows where every considered column is NaN. In your case, df.dropna(subset=['b', 'd'], how='all') would work.
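
A small sketch of the difference (the column names b and d come from the answer above; the data is made up):

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [np.nan, np.nan, 5],
                   'c': [6, 7, 8],
                   'd': [np.nan, 9, np.nan]})

# how='all' drops only row 0, where both b and d are NaN
print(df.dropna(subset=['b', 'd'], how='all'))

# how='any' drops every row with a NaN in either b or d (all three rows here)
print(df.dropna(subset=['b', 'd'], how='any'))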

How to drop rows with 'nan' in a column in a pandas DataFrame?

I think what you're doing is taking one column from a DataFrame, removing all the NaNs from it, and then assigning that column back to the same DataFrame, where any missing index positions get filled with NaN again.

Do you want to remove those rows from the entire DataFrame? If so, try df.dropna(subset=["col1"])
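
A minimal sketch of the pitfall and the fix (column names are made up):

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': [1.0, np.nan, 3.0], 'col2': ['x', 'y', 'z']})

# Pitfall: the cleaned column is realigned on the original index,
# so the dropped position simply becomes NaN again
df['col1'] = df['col1'].dropna()

# Fix: drop the whole row from the DataFrame instead
df = df.dropna(subset=['col1'])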

Remove a specific number of rows before and after NaN value in Pandas dataframe

I found another approach to solve this.

# Blank the value 3 rows before each NaN, then the value 3 rows after it
df['Value'] = np.where(df['Value'].shift(-3).isnull(), np.nan, df['Value'])
df['Value'] = np.where(df['Value'].shift(3).isnull(), np.nan, df['Value'])
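
A quick sketch of the effect on a made-up Value column with a single gap (note that shift also produces NaN at the edges, so the first and last rows get blanked too):

import numpy as np
import pandas as pd

df = pd.DataFrame({'Value': [1., 2., 3., 4., 5., np.nan, 7., 8., 9., 10., 11.]})

df['Value'] = np.where(df['Value'].shift(-3).isnull(), np.nan, df['Value'])
df['Value'] = np.where(df['Value'].shift(3).isnull(), np.nan, df['Value'])

# NaN at positions 2 and 8 (3 rows before/after the gap), at the gap itself (5),
# and at the boundary positions 0, 1, 9, 10
print(df['Value'].tolist())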

Remove Row if NaN in First Five Columns

If you want a strict check, dropping a row only when all of the first 5 columns are NaN:

df.iloc[:, :5].dropna(how='all')

Explanation:

df.iloc[:, :5] : select all rows and first 5 columns

.dropna(how='all') : drop a row only if all of its selected values are NaN

If you want to drop rows with NaN in any of the first 5 columns:

df.iloc[:, :5].dropna(how='any')

Since df.iloc[:, :5].dropna(...) returns only those 5 columns, grab the surviving index and use it to filter the original df:

In [2107]: ix = df.iloc[:, :5].dropna(how='all').index.tolist()

In [2110]: df = df.loc[ix]

In [2111]: df
Out[2111]:
      LotName    C15    C16    C17    C18  C19  Spots15  Spots16
    Cherry St  439.0  464.0  555.0  239.0  420      101    101.0
   Barton Lot   34.0   24.0   43.0   45.0   39       10      9.0
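
An equivalent one-step form, assuming the goal is still "drop rows where all of the first 5 columns are NaN", is to pass those column labels straight to subset (a sketch, keeping every column of df):

df = df.dropna(subset=list(df.columns[:5]), how='all')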

How to remove rows that contain NaN values, without taking a specific part of the row into account?

You can check all columns except total in seconds and datetime(utc) by building the subset parameter with Index.difference:

cols = ['total in seconds', 'datetime(utc)']
# All column labels except the two above
checked = df.columns.difference(cols)

# Drop rows where every checked column is NaN
df = df.dropna(subset=checked, how='all')
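
A small sketch of the whole thing on made-up data (the two excluded column names come from the question):

import pandas as pd
import numpy as np

df = pd.DataFrame({'total in seconds': [60, 120, 180],
                   'datetime(utc)': pd.date_range('2021-01-01', periods=3),
                   'speed': [1.0, np.nan, np.nan],
                   'distance': [5.0, np.nan, 4.0]})

checked = df.columns.difference(['total in seconds', 'datetime(utc)'])
# checked -> Index(['distance', 'speed'], dtype='object')

# Only the middle row is dropped: speed and distance are both NaN there
df = df.dropna(subset=checked, how='all')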

