How to Count the NaN Values in a Column in a Pandas DataFrame

How to count NaN values in a pandas DataFrame?

If you want to count only NaN values in column 'a' of a DataFrame df, use:

len(df) - df['a'].count()

Here count() tells us the number of non-NaN values, and this is subtracted from the total number of values (given by len(df)).

To count NaN values in every column of df, use:

len(df) - df.count()
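
As a quick, self-contained sketch (the toy values below are an assumption, chosen so that column 'a' matches the value_counts output shown further down; column 'b' is just an extra column for illustration):

import numpy as np
import pandas as pd

# Toy data: column 'a' has 2 NaN values, column 'b' has 1
df = pd.DataFrame({'a': [3, 3, 3, np.nan, np.nan, 1],
                   'b': [1, 2, np.nan, 4, 5, 6]})

print(len(df) - df['a'].count())   # 2  -> NaN values in column 'a'
print(len(df) - df.count())        # a    2
                                   # b    1
                                   # dtype: int64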

If you want to use value_counts, tell it not to drop NaN values by setting dropna=False (added in pandas 0.14.1):

dfv = df['a'].value_counts(dropna=False)

This allows the missing values in the column to be counted too:

 3      3
NaN     2
 1      1
Name: a, dtype: int64

The rest of your code should then work as you expect (note that it's not necessary to call sum; just print("nan: %d" % dfv[np.nan]) suffices).
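
Continuing with the toy DataFrame from the sketch above, a minimal end-to-end version of the value_counts approach:

import numpy as np

dfv = df['a'].value_counts(dropna=False)
print(dfv)                        # NaN shows up in the index with its count
print("nan: %d" % dfv[np.nan])    # nan: 2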

How to count null values for each column, as well as find the percentage, in a pandas DataFrame?

You could try this:

import pandas as pd

# Toy dataframe
ski_data = pd.DataFrame(
    {
        "A": [1, 1, 1],
        "B": [2, 2, None],
        "C": ["markers", "", "markers"],
        "D": [None, 2, None],
        "E": [4, "", 4],
    }
)

counts = ski_data.isna().sum()
print(counts.sort_values())
# Outputs
A    0
C    0
E    0
B    1
D    2

percentages = round(ski_data.isna().mean() * 100, 1)
print(percentages.sort_values())
# Outputs
A     0.0
C     0.0
E     0.0
B    33.3
D    66.7

null_values = pd.concat([counts, percentages], axis=1, keys=["count", "%"])
print(null_values)
# Outputs
   count     %
A      0   0.0
B      1  33.3
C      0   0.0
D      2  66.7
E      0   0.0
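
If you need this report for several DataFrames, the same steps can be wrapped in a small helper (a sketch; the name null_report is just illustrative):

def null_report(df):
    """NaN count and percentage per column, sorted by count."""
    counts = df.isna().sum()
    percentages = round(df.isna().mean() * 100, 1)
    return (pd.concat([counts, percentages], axis=1, keys=["count", "%"])
              .sort_values("count"))

print(null_report(ski_data))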

How to count the number of non-NaN columns in a row in a DataFrame?

Use:

df['Number_of_non_nans'] = df.notna().sum(axis=1)

or, as @Datanovice suggests in the comments, use:

df['Number_of_non_nans'] = df.count(axis=1)

Output:

|    |   0 |   1 |   3 | Number_of_non_nans |
|----|-----|-----|-----|--------------------|
| a  |   1 |   4 | nan |                  2 |
| b  |   2 |   5 |   7 |                  3 |
| c  |   3 |   6 | nan |                  2 |
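
The toy DataFrame behind that output can be reconstructed like this (a sketch; the values are taken from the table above):

import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 2, 3],
                   1: [4, 5, 6],
                   3: [np.nan, 7, np.nan]},
                  index=['a', 'b', 'c'])

df['Number_of_non_nans'] = df.notna().sum(axis=1)   # or df.count(axis=1)
print(df)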

Timings:

%timeit df.count(axis=1)
656 µs ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.notna().sum(axis=1)
437 µs ± 3.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

How to plot the count of null values for each column in a Pandas DataFrame

If you need the NaN count for each column that contains NaN values, and want to show it as a bar plot, the following code may help:

df.isna().sum()[df.isna().sum()>0].plot(kind='bar')
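
A slightly fuller sketch that computes the sums only once and adds the usual matplotlib boilerplate (assuming df is your DataFrame):

import matplotlib.pyplot as plt

na_counts = df.isna().sum()
na_counts[na_counts > 0].plot(kind='bar')   # keep only columns that contain NaN
plt.ylabel('NaN count')
plt.tight_layout()
plt.show()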

Count NaN per row with Pandas

IIUC, this should fulfill your needs.

import numpy as np

nasum = df['First_Name'].isnull().sum()
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').replace(np.nan, nasum)

or, as suggested by ALollz, the code below will also produce the same result:

df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').fillna(nasum)

Input

  First_Name Favorite_Color
0      Jared           Blue
1       Lily           Blue
2      Sarah           Pink
3       Bill            Red
4       Bill         Yellow
5     Alfred         Orange
6       None            Red
7       None           Pink

Output

  First_Name Favorite_Color  countNames
0      Jared           Blue         1.0
1       Lily           Blue         1.0
2      Sarah           Pink         1.0
3       Bill            Red         2.0
4       Bill         Yellow         2.0
5     Alfred         Orange         1.0
6       None            Red         2.0
7       None           Pink         2.0
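
Put together as a runnable sketch (the input values are taken from the table above):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First_Name': ['Jared', 'Lily', 'Sarah', 'Bill', 'Bill', 'Alfred', None, None],
    'Favorite_Color': ['Blue', 'Blue', 'Pink', 'Red', 'Yellow', 'Orange', 'Red', 'Pink'],
})

nasum = df['First_Name'].isnull().sum()              # 2 missing names
df['countNames'] = (df.groupby('First_Name')['First_Name']
                      .transform('count')
                      .fillna(nasum))
print(df)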

