Counting Number of Zeros Per Row by Pandas Dataframe

Counting number of zeros per row by Pandas DataFrame?

Use a boolean comparison, which produces a boolean DataFrame. Cast it to int so that True becomes 1 and False becomes 0, then call sum with axis=1 to add up the zeros row-wise:

In [56]:

import pandas as pd

df = pd.DataFrame({'a':[1,0,0,1,3], 'b':[0,0,1,0,1], 'c':[0,0,0,0,0]})
df
Out[56]:
   a  b  c
0  1  0  0
1  0  0  0
2  0  1  0
3  1  0  0
4  3  1  0
In [64]:

(df == 0).astype(int).sum(axis=1)
Out[64]:
0    2
1    3
2    2
3    2
4    1
dtype: int64

Breaking the above down:

In [65]:

(df == 0)
Out[65]:
       a      b     c
0  False   True  True
1   True   True  True
2   True  False  True
3  False   True  True
4  False  False  True
In [66]:

(df == 0).astype(int)
Out[66]:
   a  b  c
0  0  1  1
1  1  1  1
2  1  0  1
3  0  1  1
4  0  0  1

EDIT

As pointed out by david, the astype(int) cast is unnecessary: Boolean values are upcast to int when sum is called, so this simplifies to:

(df == 0).sum(axis=1)
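
To check that the two spellings agree, here is a small self-contained sketch on the same frame. The n_zeros column name is only an illustration and does not come from the answer above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0, 1, 3], 'b': [0, 0, 1, 0, 1], 'c': [0, 0, 0, 0, 0]})

# Booleans are summed as 0/1, so no explicit cast is needed.
zeros_per_row = (df == 0).sum(axis=1)

# Same computation with the explicit cast from the longer version.
zeros_per_row_cast = (df == 0).astype(int).sum(axis=1)

# Attach the counts as a new column (n_zeros is a hypothetical name).
result = df.assign(n_zeros=zeros_per_row)
print(result)
```

Both Series hold the same counts, so the shorter form is usually the one to prefer.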

Counting leading & trailing zeros for every row in a dataframe in R

We could use rowCumsums from matrixStats along with rowSums

library(matrixStats)
cbind(df[1], total_zeros = rowSums(df[-1] == 0),
      Leading_zeros = rowSums(!rowCumsums(df[-1] != 0)))

-output

    key total_zeros Leading_zeros
1   10A           3             1
2  11xy           1             0
3 445pe           3             2

or in tidyverse, we may also use rowwise

library(dplyr)
df %>%
  mutate(total_zeros = rowSums(select(., starts_with("Obs")) == 0)) %>%
  rowwise %>%
  transmute(key, total_zeros,
            Leading_zeros = sum(!cumsum(c_across(starts_with('Obs')) != 0))) %>%
  ungroup

-output

# A tibble: 3 x 3
  key   total_zeros Leading_zeros
  <chr>       <dbl>         <int>
1 10A             3             1
2 11xy            1             0
3 445pe           3             2
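
For comparison, the same cumulative-sum idea can be sketched in pandas. The original input is not shown, so the data below is made up to reproduce the counts above, and the number of Obs columns is an assumption. The trailing_zeros column is an addition that is not part of the R answers: it reverses the column order before taking the cumulative sum.

```python
import pandas as pd

# Hypothetical data, chosen so the totals and leading counts match the output above.
df = pd.DataFrame(
    {
        "key": ["10A", "11xy", "445pe"],
        "Obs1": [0, 1, 0],
        "Obs2": [2, 0, 0],
        "Obs3": [0, 2, 5],
        "Obs4": [3, 3, 0],
        "Obs5": [0, 4, 6],
    }
)

obs = df.filter(like="Obs")
nonzero = obs.ne(0).astype(int)  # 1 where the value is non-zero

total_zeros = obs.eq(0).sum(axis=1)

# The running count of non-zero values stays at 0 only across the leading zeros.
leading_zeros = nonzero.cumsum(axis=1).eq(0).sum(axis=1)

# Reverse the column order so the same trick counts trailing zeros.
trailing_zeros = nonzero.iloc[:, ::-1].cumsum(axis=1).eq(0).sum(axis=1)

out = df[["key"]].assign(
    total_zeros=total_zeros,
    leading_zeros=leading_zeros,
    trailing_zeros=trailing_zeros,
)
print(out)
```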

In Python, check for zeros in each row, if row has 3 or more zeros, remove the row. Current code does nothing to the file

Update

import pandas as pd

df = pd.read_csv('GiftYearTotal.csv', encoding='ISO-8859-1')
# Strip surrounding whitespace so cells such as ' $0.00' match exactly
df = df.apply(lambda x: x.str.strip())
out = df[df.eq('$0.00').sum(axis=1) <= 3]

Old answer

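You can use the following, which keeps rows with at most three '$0.00' values (change <= 3 to < 3 to drop rows that have three or more):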

out = df[df.eq('$0.00').sum(1) <= 3]
print(out)

# Output
Year 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
1 Person_B $100.00 $150.00 $1.00 $50.00 $0.25 $100.00 $0.00 $50.00 $60.00 $50.00 $0.00 $0.00 $1000.00
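
The effect of the whitespace stripping and of the threshold can be seen in a small sketch that does not need the CSV file. The frame below is invented for illustration and only mimics the layout of GiftYearTotal.csv; the variable names are likewise assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents; note the stray spaces in some cells.
df = pd.DataFrame(
    {
        "Year": ["Person_A", "Person_B", "Person_C"],
        "2019": ["$0.00 ", "$10.00", " $0.00"],
        "2020": ["$0.00", "$0.00", "$5.00"],
        "2021": [" $0.00", "$0.00", "$0.00"],
        "2022": ["$1.00", "$3.00", "$2.50"],
    }
)

# Without stripping, '$0.00 ' would not compare equal to '$0.00'.
df = df.apply(lambda col: col.str.strip())

zero_counts = df.eq("$0.00").sum(axis=1)

# Drops rows with three or more zeros.
fewer_than_three = df[zero_counts < 3]

# Keeps rows with up to three zeros, as in the answer above.
at_most_three = df[zero_counts <= 3]

print(zero_counts.tolist())
print(fewer_than_three["Year"].tolist())
```

With this data Person_A has exactly three zeros, so it survives the <= 3 filter but not the < 3 one.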

Pandas: Counting the proportion of zeros in rows and columns of dataframe

Try this instead of the first function:

print(df[df == 0].count(axis=1)/len(df.columns))

UPDATE (correction):

print('rows')
print(df[df == 0].count(axis=1)/len(df.columns))
print('cols')
print(df[df == 0].count(axis=0)/len(df.index))

Input data (I've decided to add a few rows):

ID  var1  var2
1 2 3
2 5 0
3 4 5
4 10 10
5 1 0

Output:

rows
ID
1 0.0
2 0.5
3 0.0
4 0.0
5 0.5
dtype: float64
cols
var1 0.0
var2 0.4
dtype: float64
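
An alternative sketch, not taken from the answer above: because the mean of a boolean mask is the share of True values, the proportions can also be computed with mean directly. The frame is rebuilt from the input data shown above, assuming ID is the index.

```python
import pandas as pd

df = pd.DataFrame(
    {"var1": [2, 5, 4, 10, 1], "var2": [3, 0, 5, 10, 0]},
    index=pd.Index([1, 2, 3, 4, 5], name="ID"),
)

is_zero = df == 0

# Share of zeros in each row and in each column.
row_share = is_zero.mean(axis=1)
col_share = is_zero.mean(axis=0)

# The count-based form from the answer, for comparison.
row_share_count = df[df == 0].count(axis=1) / len(df.columns)

print(row_share)
print(col_share)
```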

Count number of zeros per row, and remove rows with more than n zeros

It's not only possible, but very easy:

DF[rowSums(DF == 0) <= 4, ]

You could also use apply:

DF[apply(DF == 0, 1, sum) <= 4, ]
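
For readers working in pandas rather than R, a rough counterpart of the rowSums filter might look like the sketch below. The helper name drop_rows_with_many_zeros and the sample data are hypothetical, and the frame is assumed to be numeric.

```python
import pandas as pd


def drop_rows_with_many_zeros(frame: pd.DataFrame, max_zeros: int) -> pd.DataFrame:
    """Keep only the rows that contain at most `max_zeros` zero entries."""
    zeros_per_row = (frame == 0).sum(axis=1)
    return frame[zeros_per_row <= max_zeros]


# Invented example data: rows have 3, 1 and 2 zeros respectively.
DF = pd.DataFrame({"x": [0, 1, 0], "y": [0, 2, 0], "z": [0, 0, 3]})

filtered = drop_rows_with_many_zeros(DF, max_zeros=1)
print(filtered)
```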

pandas groupby count the number of zeros in a column

I believe you need DataFrameGroupBy.agg with a comparison against 0 followed by sum:

a) To count no. of zero values:

df1 = df.groupby('Date').agg(lambda x: x.eq(0).sum())
print (df1)

            B  C
Date
20.07.2018  0  1
21.07.2018  1  1

b) To count no. of non-zero values:

df2 = df.groupby('Date').agg(lambda x: x.ne(0).sum())
print (df2)
            B  C
Date
20.07.2018  2  1
21.07.2018  1  1

Another idea to improve performance is to move Date into the index, compare the columns against 0, and then sum per index level. The sum(level=0) shorthand was removed in pandas 2.0, so group by the level instead:

df1 = df.set_index('Date').eq(0).groupby(level=0).sum()
print (df1)
            B  C
Date
20.07.2018  0  1
21.07.2018  1  1

df2 = df.set_index('Date').ne(0).groupby(level=0).sum()
print (df2)
            B  C
Date
20.07.2018  2  1
21.07.2018  1  1
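
Since the input frame is not shown, here is a minimal runnable sketch with made-up data chosen to reproduce the counts above. It compares the agg form with an index-level form that uses groupby(level=0), which should also work on current pandas versions.

```python
import pandas as pd

# Hypothetical input: two rows per date, picked to match the outputs shown above.
df = pd.DataFrame(
    {
        "Date": ["20.07.2018", "20.07.2018", "21.07.2018", "21.07.2018"],
        "B": [1, 2, 0, 3],
        "C": [0, 5, 0, 4],
    }
)

# Per-group lambda, as in variant a).
zeros_agg = df.groupby("Date").agg(lambda x: x.eq(0).sum())

# Compare once on the whole frame, then sum the boolean mask per index level.
indexed = df.set_index("Date")
zeros_level = indexed.eq(0).groupby(level=0).sum()
nonzeros_level = indexed.ne(0).groupby(level=0).sum()

print(zeros_level)
print(nonzeros_level)
```

The second form avoids calling a Python lambda for every group and column, which is where the performance gain is expected to come from.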

