# Counting Number of Zeros Per Row by Pandas Dataframe

## Counting number of zeros per row by Pandas DataFrame?

Use a boolean comparison, which produces a boolean DataFrame. Casting it to `int` turns `True` into 1 and `False` into 0; then call `sum` with `axis=1` to count row-wise:

```
In [56]:
df = pd.DataFrame({'a':[1,0,0,1,3], 'b':[0,0,1,0,1], 'c':[0,0,0,0,0]})
df

Out[56]:
   a  b  c
0  1  0  0
1  0  0  0
2  0  1  0
3  1  0  0
4  3  1  0

In [64]:
(df == 0).astype(int).sum(axis=1)

Out[64]:
0    2
1    3
2    2
3    2
4    1
dtype: int64
```

Breaking the above down:

```
In [65]:
(df == 0)

Out[65]:
       a      b     c
0  False   True  True
1   True   True  True
2   True  False  True
3  False   True  True
4  False  False  True

In [66]:
(df == 0).astype(int)

Out[66]:
   a  b  c
0  0  1  1
1  1  1  1
2  1  0  1
3  0  1  1
4  0  0  1
```

EDIT

As david pointed out, the `astype(int)` cast is unnecessary: boolean values are upcast to `int` when `sum` is called, so this simplifies to:

``(df == 0).sum(axis=1)``
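To see that the cast really is redundant, here is a minimal sketch that rebuilds the example frame and compares both forms. The variable names (`mask`, `zeros_per_row`, `zeros_with_cast`) are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0, 1, 3],
                   'b': [0, 0, 1, 0, 1],
                   'c': [0, 0, 0, 0, 0]})

# Boolean mask: True wherever a cell equals zero.
mask = df == 0

# Summing booleans treats True as 1, so no explicit cast is needed.
zeros_per_row = mask.sum(axis=1)
zeros_with_cast = mask.astype(int).sum(axis=1)

print(zeros_per_row.tolist())                      # [2, 3, 2, 2, 1]
print((zeros_per_row == zeros_with_cast).all())    # True
```

Swapping `axis=1` for `axis=0` would give the number of zeros per column instead.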

## counting leading & trailing zeros for every row in a dataframe in R

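We could use `rowCumsums` from `matrixStats` along with `rowSums`. The row-wise cumulative sum of `df[-1] != 0` stays at 0 until the first non-zero value, so negating it and taking `rowSums` counts the leading zeros: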

```
library(matrixStats)
cbind(df[1], total_zeros = rowSums(df[-1] == 0),
      Leading_zeros = rowSums(!rowCumsums(df[-1] != 0)))
```

-output

```
     key total_zeros Leading_zeros
1   10A           3              1
2  11xy           1              0
3 445pe           3              2
```

Or, in the tidyverse, we may also use `rowwise`:

```
library(dplyr)
df %>%
   mutate(total_zeros = rowSums(select(., starts_with("Obs")) == 0)) %>%
   rowwise %>%
   transmute(key, total_zeros,
      Leading_zeros = sum(!cumsum(c_across(starts_with('Obs')) != 0))) %>%
   ungroup
```

-output

```
# A tibble: 3 x 3
  key   total_zeros Leading_zeros
  <chr>       <dbl>         <int>
1 10A             3             1
2 11xy            1             0
3 445pe           3             2
```
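For readers working in pandas rather than R, the same cumulative-sum idea can be sketched as follows. The input values are an assumption: the original data is not shown, so the `Obs1` to `Obs4` columns below are made up to reproduce the `total_zeros` and `Leading_zeros` figures above. The trailing-zero count is an extension that applies the same trick to the reversed column order.

```python
import pandas as pd

# Hypothetical data chosen to match the totals in the R output above.
df = pd.DataFrame({
    'key': ['10A', '11xy', '445pe'],
    'Obs1': [0, 3, 0],
    'Obs2': [5, 0, 0],
    'Obs3': [0, 2, 7],
    'Obs4': [0, 1, 0],
})

obs = df.filter(regex='^Obs')          # only the observation columns
nonzero = obs.ne(0).astype(int)        # 1 where a cell is non-zero

df['total_zeros'] = obs.eq(0).sum(axis=1)
# The running count of non-zero cells stays at 0 only across the leading zeros.
df['leading_zeros'] = nonzero.cumsum(axis=1).eq(0).sum(axis=1)
# Reverse the column order to apply the same idea from the right-hand side.
df['trailing_zeros'] = nonzero.iloc[:, ::-1].cumsum(axis=1).eq(0).sum(axis=1)

print(df[['key', 'total_zeros', 'leading_zeros', 'trailing_zeros']])
```

A row made up entirely of zeros would report the full column count for both the leading and the trailing figure.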

## In Python, check for zeros in each row, if row has 3 or more zeros, remove the row. Current code does nothing to the file

Update

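```
import pandas as pd

df = pd.read_csv('GiftYearTotal.csv', encoding='ISO-8859-1')
# Strip surrounding whitespace so cells compare cleanly against '$0.00'
df = df.apply(lambda x: x.str.strip())
# Keep rows with at most three '$0.00' cells
out = df[df.eq('$0.00').sum(axis=1) <= 3]
```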

You can use:

```
out = df[df.eq('$0.00').sum(1) <= 3]
print(out)

# Output
       Year     2010     2011   2012    2013   2014     2015   2016    2017    2018    2019   2020   2021      2022
1  Person_B  $100.00  $150.00  $1.00  $50.00  $0.25  $100.00  $0.00  $50.00  $60.00  $50.00  $0.00  $0.00  $1000.00
```
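Since `GiftYearTotal.csv` is not available here, the following sketch uses a small made-up frame (the names, years and amounts are all hypothetical) to show how the filter behaves. It also illustrates that `<= 3` keeps rows with exactly three zeros, whereas `< 3` would be needed to remove rows with three or more, as the question title asks.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; names and amounts are made up.
df = pd.DataFrame({
    'Year': ['Person_A', 'Person_B', 'Person_C'],
    '2019': ['$0.00', ' $50.00', '$0.00 '],
    '2020': ['$0.00', '$0.00', '$0.00'],
    '2021': ['$0.00 ', '$10.00', '$0.00'],
    '2022': ['$0.00', '$5.00', '$20.00'],
})

# Strip stray whitespace so '$0.00 ' compares equal to '$0.00'.
df = df.apply(lambda col: col.str.strip())

# Number of '$0.00' cells in each row.
zero_counts = df.eq('$0.00').sum(axis=1)

max_zeros = 3  # hypothetical threshold
kept = df[zero_counts <= max_zeros]    # keeps rows with at most 3 zeros
strict = df[zero_counts < max_zeros]   # drops rows with 3 or more zeros

print(zero_counts.tolist())      # [4, 1, 3]
print(kept['Year'].tolist())     # ['Person_B', 'Person_C']
print(strict['Year'].tolist())   # ['Person_B']
```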

## Pandas: Counting the proportion of zeros in rows and columns of dataframe

Try this instead of the first function:

``print(df[df == 0].count(axis=1)/len(df.columns))``

UPDATE (correction):

```
print('rows')
print(df[df == 0].count(axis=1)/len(df.columns))
print('cols')
print(df[df == 0].count(axis=0)/len(df.index))
```

Input data (I've decided to add a few rows):

```
ID  var1  var2
1     2     3
2     5     0
3     4     5
4    10    10
5    1      0
```

Output:

```
rows
ID
1    0.0
2    0.5
3    0.0
4    0.0
5    0.5
dtype: float64
cols
var1    0.0
var2    0.4
dtype: float64
```
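An alternative sketch, not taken from the original answer: because the mean of a boolean mask is the share of `True` values, `df.eq(0).mean(...)` gives the same proportions without dividing by the length explicitly. The frame below rebuilds the input data shown above, assuming `ID` is the index.

```python
import pandas as pd

# Same input as above, with ID assumed to be the index.
df = pd.DataFrame({'var1': [2, 5, 4, 10, 1],
                   'var2': [3, 0, 5, 10, 0]},
                  index=pd.Index([1, 2, 3, 4, 5], name='ID'))

is_zero = df.eq(0)

# Mean of booleans = fraction of True values along the chosen axis.
row_share = is_zero.mean(axis=1)
col_share = is_zero.mean(axis=0)

# The count-based form from the answer, for comparison.
row_share_count = df[df == 0].count(axis=1) / len(df.columns)

print(row_share.tolist())   # [0.0, 0.5, 0.0, 0.0, 0.5]
print(col_share.tolist())   # [0.0, 0.4]
```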

## Count number of zeros per row, and remove rows with more than n zeros

It's not only possible, but very easy:

``DF[rowSums(DF == 0) <= 4, ]``

You could also use `apply`:

``DF[apply(DF == 0, 1, sum) <= 4, ]``

## pandas groupby count the number of zeros in a column

I believe you need `DataFrameGroupBy.agg`, comparing against `0` and then calling `sum`:

a) To count no. of zero values:

```
df1 = df.groupby('Date').agg(lambda x: x.eq(0).sum())
print (df1)
            B  C
Date            
20.07.2018  0  1
21.07.2018  1  1
```

b) To count no. of non-zero values:

```
df2 = df.groupby('Date').agg(lambda x: x.ne(0).sum())
print (df2)
            B  C
Date            
20.07.2018  2  1
21.07.2018  1  1
```

Another idea to improve performance is to move `Date` into the index, compare the columns, and finally `sum` per index level. `sum(level=0)` was removed in pandas 2.0, so the per-level sum is written as `groupby(level=0).sum()`:

```
df1 = df.set_index('Date').eq(0).groupby(level=0).sum()
print (df1)
            B  C
Date            
20.07.2018  0  1
21.07.2018  1  1

df2 = df.set_index('Date').ne(0).groupby(level=0).sum()
print (df2)
            B  C
Date            
20.07.2018  2  1
21.07.2018  1  1
```
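The answer does not show its input, so here is a self-contained sketch with assumed data, chosen only so that the counts match the outputs above. It checks that the `agg` form and the index-based form agree; the index-based form uses `groupby(level=0).sum()`, which works on current pandas releases.

```python
import pandas as pd

# Assumed input: values are made up to reproduce the counts shown above.
df = pd.DataFrame({
    'Date': ['20.07.2018', '20.07.2018', '21.07.2018', '21.07.2018'],
    'B': [10, 11, 0, 5],
    'C': [1, 0, 1, 0],
})

# Zeros per date via agg with a lambda.
zeros_agg = df.groupby('Date').agg(lambda x: x.eq(0).sum())

# Zeros and non-zeros per date via a boolean mask summed per index level.
zeros_lvl = df.set_index('Date').eq(0).groupby(level=0).sum()
nonzeros_lvl = df.set_index('Date').ne(0).groupby(level=0).sum()

print(zeros_lvl)
print(nonzeros_lvl)
print((zeros_agg == zeros_lvl).all().all())   # True
```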