How to drop rows of Pandas DataFrame whose value in a certain column is NaN
Don't drop, just take the rows where EPS is not NA:
df = df[df['EPS'].notna()]
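A minimal sketch of that filter on made-up data. Only the `EPS` column name comes from the answer above; the `ticker` column and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only the 'EPS' column name is taken from the answer.
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "EPS": [1.2, np.nan, 0.7],
})

# Keep only the rows whose EPS is present; the original index labels survive.
kept = df[df["EPS"].notna()]
print(kept)
```

If a fresh 0..n-1 index is wanted afterwards, `kept.reset_index(drop=True)` should do it.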
Python: How to drop a row whose particular column is empty/NaN?
Use dropna with the subset parameter to specify which columns to check for NaNs:
data = data.dropna(subset=['sms'])
print (data)
   id city department   sms  category
1   2  lhr    revenue  good         1
Another solution with boolean indexing and notnull:
data = data[data['sms'].notnull()]
print (data)
   id city department   sms  category
1   2  lhr    revenue  good         1
Alternative with query:
print (data.query("sms == sms"))
   id city department   sms  category
1   2  lhr    revenue  good         1
Timings
#[300000 rows x 5 columns]
data = pd.concat([data]*100000).reset_index(drop=True)
In [123]: %timeit (data.dropna(subset=['sms']))
100 loops, best of 3: 19.5 ms per loop
In [124]: %timeit (data[data['sms'].notnull()])
100 loops, best of 3: 13.8 ms per loop
In [125]: %timeit (data.query("sms == sms"))
10 loops, best of 3: 23.6 ms per loop
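To check that the three approaches select the same rows, something like the sketch below can be used. The frame is a made-up stand-in for the asker's data (only the `sms` column name and the `good` value come from the answer), and `engine="python"` is passed only so the example does not depend on the optional numexpr package. The `sms == sms` trick works because NaN never compares equal to itself.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data.
data = pd.DataFrame({
    "id": [1, 2, 3],
    "city": ["khi", "lhr", "isb"],
    "sms": [np.nan, "good", np.nan],
})

by_dropna = data.dropna(subset=["sms"])
by_mask = data[data["sms"].notnull()]
# NaN != NaN, so the comparison is False exactly on the missing rows.
by_query = data.query("sms == sms", engine="python")

same = by_dropna.equals(by_mask) and by_mask.equals(by_query)
print(same)
```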
how to drop rows with 'nan' in a column in a pandas dataframe?
I think what you're doing is taking one column from a DataFrame, removing all the NaNs from it, but then adding that column to the same DataFrame again - where any missing values from the index will be filled by NaNs again.
Do you want to remove that row from the entire DataFrame? If yes, try df.dropna(subset=["col1"])
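A small illustration of why the NaNs come back, assuming a hypothetical two-column frame (`col2` and the values are invented): assigning a shortened Series to a column aligns on the index, so the labels that were dropped are filled with NaN again.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 'col1' is the column name used in the answer.
df = pd.DataFrame({"col1": [1.0, np.nan, 3.0], "col2": ["a", "b", "c"]})

# The right-hand side has only index labels 0 and 2. On assignment pandas
# aligns on the index, so label 1 is filled with NaN again.
df["col1"] = df["col1"].dropna()
still_missing = int(df["col1"].isna().sum())
print(still_missing)

# Dropping the row from the whole frame is what actually removes it.
cleaned = df.dropna(subset=["col1"])
print(cleaned)
```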
Trying to Drop values by column (I convert these values to nan but could be anything) not working
Passing axis is not supported for Dask DataFrames as of now. You can also print the function's docstring via ddf.dropna? and it will tell you the same:
Signature: ddf.dropna(how='any', subset=None, thresh=None)
Docstring:
Remove missing values.
This docstring was copied from pandas.core.frame.DataFrame.dropna.
Some inconsistencies with the Dask version may exist.
See the :ref:`User Guide <missing_data>` for more on which values are
considered missing, and how to work with missing data.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0 (Not supported in Dask)
Determine if rows or columns which contain missing values are
removed.
* 0, or 'index' : Drop rows which contain missing values.
* 1, or 'columns' : Drop columns which contain missing value.
.. versionchanged:: 1.0.0
Pass tuple or list to drop on multiple axes.
Only a single axis is allowed.
how : {'any', 'all'}, default 'any'
Determine if row or column is removed from DataFrame, when we have
at least one NA or all NA.
* 'any' : If any NA values are present, drop that row or column.
* 'all' : If all values are NA, drop that row or column.
thresh : int, optional
Require that many non-NA values.
subset : array-like, optional
Labels along other axis to consider, e.g. if you are dropping rows
these would be a list of columns to include.
inplace : bool, default False (Not supported in Dask)
If True, do operation inplace and return None.
Returns
-------
DataFrame or None
DataFrame with NA entries dropped from it or None if ``inplace=True``.
It is worth noting that the Dask documentation is copied from pandas in many instances like this. Wherever it is, it specifically states:
This docstring was copied from pandas.core.frame.DataFrame.drop. Some
inconsistencies with the Dask version may exist.
Therefore it's always best to check the docstring of Dask's pandas-derived functions instead of relying on the documentation.
How to drop rows in a df based on NaN values in specific columns not using column names but integer position for the subset?
Instead of deleting the rows that you do not want, try keeping those that you want:
df[df.iloc[:,[2,3]].notnull().all(axis=1)]
But what is wrong with getting the column names by index?
df.dropna(subset=df.columns[[2,3]])
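Both variants can be compared on a toy frame. The column names and values below are hypothetical; only the positions 2 and 3 come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the columns at positions 2 and 3 are 'c' and 'd'.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": [np.nan, 1.0, 2.0],
    "d": [1.0, np.nan, 3.0],
})

# Keep rows where every positionally selected column is non-null.
by_mask = df[df.iloc[:, [2, 3]].notnull().all(axis=1)]

# Translate positions to labels and let dropna do the work.
by_names = df.dropna(subset=df.columns[[2, 3]])

print(by_mask.equals(by_names))
```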
Drop row if column entry contains NaN
You can identify all index positions that hold NaN in the exploded Series and then filter the original Series for those that are not in that index array:
ser = pd.DataFrame(data={"col": [[1, 2, 3, np.nan, np.nan], [3, 4, 5], [3, 9], [np.nan, 10]]})['col']
ser_exploded = ser.explode()
ser[~ser.index.isin(np.unique(ser_exploded[ser_exploded.isna()].index))]
--------------------------------------
1 [3, 4, 5]
2 [3, 9]
Name: col, dtype: object
--------------------------------------
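As an alternative sketch (not the approach used in the answer above), the same rows can be selected without `explode` by testing each list directly with `pd.isna`. The data is the same as in the answer.

```python
import numpy as np
import pandas as pd

ser = pd.Series(
    [[1, 2, 3, np.nan, np.nan], [3, 4, 5], [3, 9], [np.nan, 10]],
    name="col",
)

# pd.isna on a list returns a boolean array; keep rows where none is True.
has_nan = ser.apply(lambda values: bool(pd.isna(values).any()))
kept = ser[~has_nan]
print(kept)
```

For long lists the `explode` version may well be faster, since this one calls a Python function once per row.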
How to drop entire record if more than 90% of features have missing value in pandas
You can use df.dropna() and set the thresh parameter to the value that corresponds to 10% of your columns (the minimum number of non-NA values).
df.dropna(axis=0, thresh=50, inplace=True)
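Rather than hard-coding the threshold, it can be derived from the number of columns. The sketch below assumes a hypothetical 20-column frame and reads "more than 90% missing" as "fewer than 10% of the values present", so a row with exactly 10% non-NA values is kept.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical frame with 20 feature columns, initially all NaN.
df = pd.DataFrame(np.nan, index=range(3), columns=[f"f{i}" for i in range(20)])
df.loc[0, :] = 1.0              # no missing values
df.loc[1, ["f0", "f1"]] = 1.0   # exactly 90% missing: kept
df.loc[2, ["f0"]] = 1.0         # 95% missing: dropped

# Minimum number of non-NA values a row needs in order to survive.
min_non_na = math.ceil(df.shape[1] / 10)

result = df.dropna(axis=0, thresh=min_non_na)
print(result.index.tolist())
```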
Squeeze dataframe rows with missing values
For each row, remove the missing values with Series.dropna, rename the columns with a dictionary, and finally add the missing columns back with DataFrame.reindex:
df = (df1.apply(lambda x: pd.Series(x.dropna().to_numpy()), axis=1)
         .rename(columns=dict(enumerate(df1.columns)))
         .reindex(df1.columns, axis=1))
print (df)
A B C
0 1 100.0 NaN
1 2 20.0 NaN
2 300.0 NaN NaN
3 bla 400.0 NaN
Another idea:
df = (df1.apply(lambda x: x.sort_values(key=lambda x: x.isna()).to_numpy(),
                axis=1,
                result_type='expand')
         .set_axis(df1.columns, axis=1)
         .mask(lambda x: x.isna())
      )
print (df)
A B C
0 1 100.0 NaN
1 2 20.0 NaN
2 300.0 NaN NaN
3 bla 400.0 NaN
df = (df1.apply(lambda x: x.sort_values(key=lambda x: x.isna()).to_numpy(),
                axis=1,
                result_type='expand')
         .set_axis(df1.columns, axis=1)
      )
print (df)
A B C
0 1 100.0 <NA>
1 2 20.0 NaN
2 300.0 NaN NaN
3 bla 400.0 <NA>