How to count nan values in a pandas DataFrame?
If you want to count only the NaN values in column 'a' of a DataFrame df, use:
len(df) - df['a'].count()
Here count() returns the number of non-NaN values, which is subtracted from the total number of rows (given by len(df)).
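As a quick check of the subtraction approach, here is a minimal sketch on made-up data (the values in column 'a' are an assumption for illustration). It also compares the result against isna().sum(), which counts the NaN values directly:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column 'a' holds two NaN values.
df = pd.DataFrame({"a": [3, np.nan, 1, 3, np.nan, 3]})

# Total rows minus non-NaN values gives the NaN count.
nan_by_subtraction = len(df) - df["a"].count()

# Direct count of NaN values, for comparison.
nan_by_mask = df["a"].isna().sum()

print(nan_by_subtraction, nan_by_mask)
```

Both expressions should agree; which one reads better is largely a matter of taste.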
To count NaN values in every column of df, use:
len(df) - df.count()
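The per-column version can be sketched the same way. The frame below is invented for illustration; the result is a Series indexed by column name:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a different number of NaN values per column.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [np.nan, np.nan, "x"],
        "c": [1, 2, 3],
    }
)

# df.count() returns the non-NaN count per column, so the
# subtraction yields the NaN count per column.
nan_per_column = len(df) - df.count()
print(nan_per_column)
```

On this data, df.isna().sum() gives the same per-column counts.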
If you want to use value_counts, tell it not to drop NaN values by setting dropna=False (added in pandas 0.14.1):
dfv = dfd['a'].value_counts(dropna=False)
This allows the missing values in the column to be counted too:
3 3
NaN 2
1 1
Name: a, dtype: int64
The rest of your code should then work as you expect (note that it's not necessary to call sum; just print("nan: %d" % dfv[np.nan]) suffices).
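A self-contained sketch of the value_counts route, using made-up data chosen to match the output shown above. Picking out the NaN entry with counts.index.isna() is my own choice here, as an alternative to looking it up with np.nan as the label:

```python
import numpy as np
import pandas as pd

# Hypothetical column: three 3s, one 1, and two NaN values.
s = pd.Series([3, np.nan, 1, 3, np.nan, 3], name="a")

# dropna=False keeps NaN as its own entry in the result.
counts = s.value_counts(dropna=False)

# Select the NaN entry through a boolean mask on the index.
nan_count = counts[counts.index.isna()].sum()

print(counts)
print("nan: %d" % nan_count)
```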
How to count null values for each column as well as finding the percentage in a pandas DataFrame?
You could try this:
import pandas as pd
# Toy dataframe
ski_data = pd.DataFrame(
{
"A": [1, 1, 1],
"B": [2, 2, None],
"C": ["markers", "", "markers"],
"D": [None, 2, None],
"E": [4, "", 4],
}
)
counts = ski_data.isna().sum()
print(counts.sort_values())
# Outputs
A 0
C 0
E 0
B 1
D 2
percentages = round(ski_data.isna().mean() * 100, 1)
print(percentages.sort_values())
# Outputs
A 0.0
C 0.0
E 0.0
B 33.3
D 66.7
null_values = pd.concat([counts, percentages], axis=1, keys=["count", "%"])
print(null_values)
# Outputs
count %
A 0 0.0
B 1 33.3
C 0 0.0
D 2 66.7
E 0 0.0
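Note that the empty strings in columns C and E are not counted above, because isna() only flags None/NaN. If blanks should count as missing too (an assumption about the intent, not something the question states), one possible sketch is to combine isna() with an equality check against "":

```python
import pandas as pd

# Same toy dataframe as above.
ski_data = pd.DataFrame(
    {
        "A": [1, 1, 1],
        "B": [2, 2, None],
        "C": ["markers", "", "markers"],
        "D": [None, 2, None],
        "E": [4, "", 4],
    }
)

# True where a cell is NaN/None or an empty string.
blank_or_null = ski_data.isna() | ski_data.eq("")

summary = pd.DataFrame(
    {
        "count": blank_or_null.sum(),
        "%": (blank_or_null.mean() * 100).round(1),
    }
)
print(summary)
```

With this mask, C and E each report one missing value, while B and D are unchanged.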
How to count the number of non-NaN columns in a row in a Dataframe?
Use:
df['Number_of_non_nans'] = df.notna().sum(axis=1)
Or, as @Datanovice suggests in the comments, use:
df['Number_of_non_nans'] = df.count(axis=1)
Output:
| | 0 | 1 | 3 | Number_of_non_nans |
|----|-----|-----|-----|----------------------|
| a | 1 | 4 | nan | 2 |
| b | 2 | 5 | 7 | 3 |
| c | 3 | 6 | nan | 2 |
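For reference, a runnable sketch that rebuilds the frame shown in the output table (reconstructed from that table, so treat the exact dtypes as an assumption) and checks that both expressions agree:

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the output table above.
df = pd.DataFrame(
    {0: [1, 2, 3], 1: [4, 5, 6], 3: [np.nan, 7, np.nan]},
    index=["a", "b", "c"],
)

# Compute both variants before adding the new column, so the
# helper column is not itself included in the count.
non_nan_via_mask = df.notna().sum(axis=1)
non_nan_via_count = df.count(axis=1)

df["Number_of_non_nans"] = non_nan_via_count
print(df)
```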
Timings:
%timeit df.count(axis=1)
656 µs ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.isna().sum(axis=1)
437 µs ± 3.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
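The %timeit lines above need IPython. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The frame size, NaN fraction, and repeat count below are arbitrary assumptions, and the numbers will vary by machine and pandas version. Also note that the timing above uses isna().sum(axis=1), which counts NaN values rather than non-NaN ones; this sketch times notna() so that both expressions return the same result:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 1000 x 10 floats with roughly 20% NaN.
rng = np.random.default_rng(0)
values = rng.random((1_000, 10))
values[rng.random(values.shape) < 0.2] = np.nan
df = pd.DataFrame(values)

n = 50  # number of repetitions per measurement (arbitrary)
t_count = timeit.timeit(lambda: df.count(axis=1), number=n)
t_notna = timeit.timeit(lambda: df.notna().sum(axis=1), number=n)

print("count(axis=1):       %.1f µs per call" % (t_count / n * 1e6))
print("notna().sum(axis=1): %.1f µs per call" % (t_notna / n * 1e6))
```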
How to plot count of null values for each column in Pandas Dataframe
If you need the NaN count for each column that contains NaN values, shown as a bar plot, the following code may help:
df.isna().sum()[df.isna().sum()>0].plot(kind='bar')
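The one-liner computes df.isna().sum() twice. A slightly longer sketch that computes it once and labels the axis is shown below. The sample frame is made up, matplotlib is assumed to be installed, and the Agg backend is selected only so the snippet also runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in a notebook

import numpy as np
import pandas as pd

# Hypothetical frame: column A has no NaN values, B has two, C has one.
df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": [np.nan, 2, np.nan],
        "C": [np.nan, 5, 6],
    }
)

# Count NaN values once, then keep only columns that have any.
na_counts = df.isna().sum()
na_counts = na_counts[na_counts > 0]

ax = na_counts.plot(kind="bar")
ax.set_ylabel("NaN count")
```

In an interactive session, call matplotlib.pyplot.show() afterwards to display the figure.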
Count NaN per row with Pandas
IIUC, this should fulfill your needs.
nasum=df['First_Name'].isnull().sum()
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').replace(np.nan,nasum)
Or, as suggested by ALollz, the code below gives the same result:
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').fillna(nasum)
Input
First_Name Favorite_Color
0 Jared Blue
1 Lily Blue
2 Sarah Pink
3 Bill Red
4 Bill Yellow
5 Alfred Orange
6 None Red
7 None Pink
Output
First_Name Favorite_Color countNames
0 Jared Blue 1.0
1 Lily Blue 1.0
2 Sarah Pink 1.0
3 Bill Red 2.0
4 Bill Yellow 2.0
5 Alfred Orange 1.0
6 None Red 2.0
7 None Pink 2.0
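To reproduce the result end to end, here is a sketch that rebuilds the input shown above and applies the fillna variant. The final cast to int is my addition, on the assumption that whole-number counts are preferred over the floats in the output:

```python
import pandas as pd

# Input data as shown above; None marks a missing first name.
df = pd.DataFrame(
    {
        "First_Name": ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None, None],
        "Favorite_Color": ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red", "Pink"],
    }
)

# Number of rows with a missing name.
nasum = df["First_Name"].isna().sum()

# groupby drops missing keys by default, so rows with a missing
# name get NaN here and are filled with the missing-name total.
per_name = df.groupby("First_Name")["First_Name"].transform("count")
df["countNames"] = per_name.fillna(nasum).astype(int)

print(df)
```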