Count Number of Zeros Per Row, and Remove Rows with More Than N Zeros


It's not only possible, but very easy. For example, to keep only the rows with at most 4 zeros:

DF[rowSums(DF == 0) <= 4, ]

You could also use apply:

DF[apply(DF == 0, 1, sum) <= 4, ]
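For comparison, here is the same filter in pandas, wrapped in a small helper. This is a minimal sketch: the function name `drop_rows_with_many_zeros` and the sample data are hypothetical. Like `rowSums(DF == 0) <= 4`, it keeps rows with at most `n` zeros.

```python
import pandas as pd

def drop_rows_with_many_zeros(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: return only the rows of df with at most n zeros."""
    zeros_per_row = (df == 0).sum(axis=1)  # boolean frame summed row-wise
    return df[zeros_per_row <= n]

# Hypothetical sample data: row 0 has 3 zeros, rows 1 and 2 have 1 zero each
df = pd.DataFrame({
    "a": [0, 1, 0],
    "b": [0, 2, 3],
    "c": [0, 0, 4],
})
print(drop_rows_with_many_zeros(df, 1))
```

Because the comparison is `<= n`, a row with exactly `n` zeros is kept.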

In Python, check for zeros in each row; if a row has 3 or more zeros, remove the row. Current code does nothing to the file

Update

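import pandas as pd
df = pd.read_csv('GiftYearTotal.csv', encoding='ISO-8859-1')
# Assumes every column holds strings; strip whitespace so ' $0.00' matches '$0.00'
df = df.apply(lambda x: x.str.strip())
# Keep rows with at most 3 cells equal to '$0.00'
out = df[df.eq('$0.00').sum(axis=1) <= 3]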
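If the cells may be formatted inconsistently (for example `$0` vs `$0.00`, or with thousands separators), comparing against the exact string `'$0.00'` can miss zeros. A hedged alternative, not part of the original answer, is to strip the `$` and `,` characters, convert to numbers, and count numeric zeros. The column names and values below are made up. The sketch assumes every column except `Year` holds currency strings, and it uses `< 3` to match the question's "3 or more zeros" wording.

```python
import pandas as pd

# Hypothetical data shaped like GiftYearTotal.csv: a name column plus currency strings
df = pd.DataFrame({
    "Year": ["Person_A", "Person_B"],
    "2020": [" $0.00", "$1,000.00"],
    "2021": ["$0", "$0.00 "],
    "2022": ["$0.00", "$50.00"],
})

money_cols = df.columns.drop("Year")  # assumption: all other columns are amounts
# Remove whitespace, dollar signs and thousands separators, then parse as numbers
amounts = df[money_cols].apply(
    lambda col: pd.to_numeric(col.str.strip().str.replace(r"[$,]", "", regex=True))
)
zeros_per_row = (amounts == 0).sum(axis=1)
out = df[zeros_per_row < 3]  # drop rows with 3 or more zero amounts
print(out)
```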

Old answer

You can use:

out = df[df.eq('$0.00').sum(1) <= 3]
print(out)

# Output
       Year     2010     2011   2012    2013   2014     2015   2016    2017    2018    2019   2020   2021      2022
1  Person_B  $100.00  $150.00  $1.00  $50.00  $0.25  $100.00  $0.00  $50.00  $60.00  $50.00  $0.00  $0.00  $1000.00
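A note on the threshold: `<= 3` keeps rows with exactly three `$0.00` cells (Person_B above has three and is kept). If rows with 3 or more zeros should be removed, as the question title asks, the comparison would be `< 3`. A small sketch with hypothetical rows contrasting the two:

```python
import pandas as pd

# Hypothetical rows with 2, 3 and 4 "$0.00" cells respectively
df = pd.DataFrame({
    "Year": ["P1", "P2", "P3"],
    "2019": ["$0.00", "$0.00", "$0.00"],
    "2020": ["$0.00", "$0.00", "$0.00"],
    "2021": ["$5.00", "$0.00", "$0.00"],
    "2022": ["$5.00", "$5.00", "$0.00"],
})

zero_counts = df.eq("$0.00").sum(axis=1)
keep_at_most_3 = df[zero_counts <= 3]    # drops only rows with 4 or more zeros
keep_fewer_than_3 = df[zero_counts < 3]  # drops rows with 3 or more zeros
print(keep_at_most_3["Year"].tolist())
print(keep_fewer_than_3["Year"].tolist())
```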

Remove rows in a dataframe if 0 is found X number of times

Here is a one-liner. Note that rowSums is coded in C and is fast.

df[!rowSums(df == 0) >= 2, , drop = FALSE]
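A detail worth checking if you do the same thing in pandas (this goes beyond the R answer): missing values compare unequal to 0, so `(df == 0)` does not count NaN cells as zeros. If missing values should count as zeros, one option is to fill them first. A sketch with hypothetical data, dropping rows where 0 appears 2 or more times and mirroring the negated test above:

```python
import numpy as np
import pandas as pd

# Hypothetical data containing one missing value in row 0
df = pd.DataFrame({"x": [0, 0, 1], "y": [np.nan, 0, 2], "z": [5, 3, 0]})

# NaN == 0 evaluates to False, so NaN is not counted as a zero here
zeros_ignoring_nan = (df == 0).sum(axis=1)
# Treat missing values as zeros by filling them before the comparison
zeros_with_nan_as_zero = (df.fillna(0) == 0).sum(axis=1)

print(df[~(zeros_ignoring_nan >= 2)])
print(df[~(zeros_with_nan_as_zero >= 2)])
```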

Counting number of zeros per row by Pandas DataFrame?

Use a boolean comparison, which produces a boolean df. We can then cast this to int (True becomes 1, False becomes 0) and call sum with axis=1 to count row-wise:

In [56]:

df = pd.DataFrame({'a':[1,0,0,1,3], 'b':[0,0,1,0,1], 'c':[0,0,0,0,0]})
df
Out[56]:
   a  b  c
0  1  0  0
1  0  0  0
2  0  1  0
3  1  0  0
4  3  1  0
In [64]:

(df == 0).astype(int).sum(axis=1)
Out[64]:
0    2
1    3
2    2
3    2
4    1
dtype: int64

Breaking the above down:

In [65]:

(df == 0)
Out[65]:
       a      b     c
0  False   True  True
1   True   True  True
2   True  False  True
3  False   True  True
4  False  False  True
In [66]:

(df == 0).astype(int)
Out[66]:
   a  b  c
0  0  1  1
1  1  1  1
2  1  0  1
3  0  1  1
4  0  0  1

EDIT

As pointed out by david, the astype to int is unnecessary because the Boolean values are upcast to int when calling sum, so this simplifies to:

(df == 0).sum(axis=1)
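Putting the simplified count to use: a sketch that filters on it and also attaches it as a column for inspection. It reuses the sample frame from the session above. The column name `n_zeros` is made up for illustration.

```python
import pandas as pd

# Same sample frame as in the session above
df = pd.DataFrame({'a': [1, 0, 0, 1, 3], 'b': [0, 0, 1, 0, 1], 'c': [0, 0, 0, 0, 0]})

# Row-wise zero count; booleans are summed as 1/0, so astype(int) is not needed
counts = (df == 0).sum(axis=1)

# Attach the count as a new column (hypothetical name) for inspection
with_counts = df.assign(n_zeros=counts)

# Filter using the count computed on the original columns only
fewer_than_two = df[counts < 2]
print(with_counts)
print(fewer_than_two)
```

Computing the count before adding the column matters: recounting zeros on `with_counts` would also see any `n_zeros` value that happens to be 0.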

Deleting rows that have most of their values zero

We can use rowSums

df[rowSums(df == 0) < 3, ]

#  i j  k l m n
#b 8 6 34 1 0 0
#d 7 9  3 7 0 5
#f 2 3  9 6 8 9
#g 0 1  0 3 1 5

We can also use apply to count the number of 0s in each row and then subset:

df[apply(df == 0, 1, sum) < 3, ]
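Since the question is about rows where most values are zero, a threshold relative to the number of columns may fit better than a fixed count of 3. A pandas sketch of that swapped-in variant, with hypothetical data: the row-wise mean of a boolean frame is the fraction of zero cells, and rows where that fraction is above one half are dropped.

```python
import pandas as pd

# Hypothetical data with six numeric columns
df = pd.DataFrame(
    {"i": [8, 0, 7], "j": [6, 0, 9], "k": [34, 0, 3],
     "l": [1, 5, 7], "m": [0, 0, 0], "n": [0, 1, 5]},
    index=["b", "c", "d"],
)

zero_fraction = (df == 0).mean(axis=1)     # share of zero cells in each row
mostly_nonzero = df[zero_fraction <= 0.5]  # drop rows where zeros are the majority
print(mostly_nonzero)
```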

Pandas dataframe drop rows which store certain number of zeros in it

This will work:

drop_indexes = []
for i in range(len(df)):
    # 4 is the minimum number of zeros a row must have to be dropped
    if (df.iloc[i, :] == 0).sum() >= 4:
        drop_indexes.append(i)
# i is a position, so convert to index labels before calling drop
updated_df = df.drop(df.index[drop_indexes])
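The loop can also be replaced by a single vectorized expression, which avoids visiting rows one at a time. A sketch with made-up data and a non-default index that checks both versions agree; the loop maps positions to labels with `df.index[...]` so it works with any index.

```python
import pandas as pd

# Hypothetical data with a non-default index
df = pd.DataFrame(
    {"a": [0, 1, 0], "b": [0, 2, 0], "c": [0, 3, 0], "d": [0, 4, 5], "e": [1, 0, 0]},
    index=["r1", "r2", "r3"],
)

# Loop version: collect positions of rows with 4 or more zeros
drop_positions = []
for i in range(len(df)):
    if (df.iloc[i, :] == 0).sum() >= 4:
        drop_positions.append(i)
loop_result = df.drop(df.index[drop_positions])

# Vectorized version: keep rows with fewer than 4 zeros
vectorized_result = df[(df == 0).sum(axis=1) < 4]

print(vectorized_result)
```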

Excluding rows containing consecutive zeros from data frame

If we are looking for any consecutive zeros in each row and want to exclude those rows, one way is to loop through the rows using apply with MARGIN = 1. For each row, check whether any adjacent elements are equal and zero, negate the result, and subset the rows.

df1[!apply(df1[-(1:2)], 1, FUN = function(x) any((c(FALSE, x[-1]==x[-length(x)])) & !x)),]
#  subj stimulus var1 var2 var3 var4
#1    1        A   25   30   15   36
#3    1        C   12    0   20   23
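A pandas/NumPy sketch of the same idea, not part of the original answer. The column names mirror the R example, but the data frame itself is made up. It builds a boolean matrix of zeros over the value columns and flags a row if any cell and its left neighbour are both zero.

```python
import pandas as pd

# Hypothetical data laid out like df1: two ID columns followed by value columns
df1 = pd.DataFrame({
    "subj": [1, 1, 1],
    "stimulus": ["A", "B", "C"],
    "var1": [25, 0, 12],
    "var2": [30, 0, 0],
    "var3": [15, 4, 20],
    "var4": [36, 7, 23],
})

zeros = df1.iloc[:, 2:].to_numpy() == 0  # skip the first two (ID) columns
# True where a cell and the cell to its left are both zero
adjacent_zero_pairs = zeros[:, 1:] & zeros[:, :-1]
has_consecutive_zeros = adjacent_zero_pairs.any(axis=1)
result = df1[~has_consecutive_zeros]
print(result)
```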

Or, if the run of consecutive zeros must have a specific length n, apply rle to each row, check whether any run of zeros has length n (here n = 2), negate, and subset the rows.

df1[!apply(df1[-(1:2)], 1, FUN = function(x) any(with(rle(x==0), lengths[values])==2)),]
#  subj stimulus var1 var2 var3 var4
#1    1        A   25   30   15   36
#3    1        C   12    0   20   23
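The run-length idea translates to Python with `itertools.groupby`, which groups consecutive equal items. The sketch below drops rows containing a run of exactly `n` consecutive zeros, matching the `== 2` test in the R code. The helper name `has_zero_run` and the data are hypothetical.

```python
from itertools import groupby

import pandas as pd

def has_zero_run(values, n):
    """Hypothetical helper: True if values contain a run of exactly n consecutive zeros."""
    for is_zero, run in groupby(v == 0 for v in values):
        if is_zero and sum(1 for _ in run) == n:
            return True
    return False

# Hypothetical data laid out like df1: two ID columns followed by value columns
df1 = pd.DataFrame({
    "subj": [1, 1, 1, 1],
    "stimulus": ["A", "B", "C", "D"],
    "var1": [25, 0, 12, 0],
    "var2": [30, 0, 0, 0],
    "var3": [15, 4, 20, 0],
    "var4": [36, 7, 23, 9],
})

values = df1.iloc[:, 2:]  # value columns only
mask = values.apply(lambda row: has_zero_run(row.tolist(), 2), axis=1)
result = df1[~mask]
print(result)
```

Row D has a run of three zeros, so with the "exactly n" test it is kept; use `>= n` inside the helper if longer runs should also be dropped.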

Pandas: drop row if more than one of multiple columns is zero

Apply the condition and count the True values.

(df == 0).sum(1)

ID1    2
ID2    0
ID3    1
dtype: int64

df[(df == 0).sum(1) < 2]

     col0  col1  col2  col3
ID2     1     1     2    10
ID3     0     1     3     4

Alternatively, convert the integers to bool and sum those; this is a little more direct.

# df[(~df.astype(bool)).sum(1) < 2]
df[df.astype(bool).sum(1) > len(df.columns)-2]  # no inversion needed

     col0  col1  col2  col3
ID2     1     1     2    10
ID3     0     1     3     4

For performance, you can use np.count_nonzero:

import numpy as np

# df[np.count_nonzero(df, axis=1) > len(df.columns)-2]
df[np.count_nonzero(df.values, axis=1) > len(df.columns)-2]

     col0  col1  col2  col3
ID2     1     1     2    10
ID3     0     1     3     4

df = pd.concat([df] * 10000, ignore_index=True)

%timeit df[(df == 0).sum(1) < 2]
%timeit df[df.astype(bool).sum(1) > len(df.columns)-2]
%timeit df[np.count_nonzero(df.values, axis=1) > len(df.columns)-2]

7.13 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.28 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
997 µs ± 38.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
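Two variations on the above, sketched with hypothetical data. The first checks only a subset of the columns; the list `cols_to_check` is an assumption about which columns matter. The second passes the boolean result of `== 0` to `np.count_nonzero` so it counts zeros directly instead of relying on the `len(df.columns)-2` arithmetic. `to_numpy()` is used in place of `.values`.

```python
import numpy as np
import pandas as pd

# Hypothetical data; col3 is deliberately excluded from the check
df = pd.DataFrame(
    {"col0": [0, 1, 0], "col1": [0, 1, 1], "col2": [5, 2, 3], "col3": [0, 10, 0]},
    index=["ID1", "ID2", "ID3"],
)

cols_to_check = ["col0", "col1", "col2"]  # assumption: only these columns matter
# count_nonzero on a boolean array counts the True cells, i.e. the zeros
zeros_in_subset = np.count_nonzero(df[cols_to_check].to_numpy() == 0, axis=1)
result = df[zeros_in_subset < 2]  # drop rows with more than one zero in the subset
print(result)
```

ID3 has two zeros across all four columns but only one in the checked subset, so it is kept.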
