Count Number of Zeros Per Row, and Remove Rows with More Than N Zeros


It's not only possible but very easy. In R, this keeps only the rows with at most 4 zeros, i.e. it removes rows with more than 4:

DF[rowSums(DF == 0) <= 4, ]

You could also use apply:

DF[apply(DF == 0, 1, sum) <= 4, ]
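For comparison, a pandas version of the same filter might look like the sketch below. It is a translation, not part of the original answer; the helper name `keep_rows_with_at_most_n_zeros` and the sample frame are hypothetical. `(df == 0).sum(axis=1)` plays the role of `rowSums(DF == 0)`.

```python
import pandas as pd


def keep_rows_with_at_most_n_zeros(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: pandas analogue of DF[rowSums(DF == 0) <= n, ]."""
    zeros_per_row = (df == 0).sum(axis=1)  # like rowSums(DF == 0)
    return df[zeros_per_row <= n]


# Made-up data: rows have 5, 1, 4 and 3 zeros respectively.
df = pd.DataFrame({
    "a": [0, 1, 0, 0],
    "b": [0, 2, 0, 3],
    "c": [0, 0, 0, 4],
    "d": [0, 5, 0, 0],
    "e": [0, 6, 7, 0],
})
print(keep_rows_with_at_most_n_zeros(df, 4))  # drops only row 0 (5 zeros)
```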

In Python, check for zeros in each row; if a row has 3 or more zeros, remove the row. Current code does nothing to the file

Update

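import pandas as pd
df = pd.read_csv('GiftYearTotal.csv', encoding='ISO-8859-1')
df = df.apply(lambda x: x.str.strip())  # strip stray spaces around '$0.00'; assumes all-text columns
out = df[df.eq('$0.00').sum(axis=1) <= 3]  # keep rows with at most 3 zero-dollar cells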
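The strip step matters because a cell such as ` $0.00 ` (with surrounding spaces) is not equal to `'$0.00'` and would not be counted. A minimal sketch of the same idea on an in-memory CSV that stands in for `GiftYearTotal.csv`; the data and the extra numeric `Id` column are assumptions. Calling `.str.strip()` on every column would raise on that integer column, so this version leaves numeric columns alone:

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical stand-in for GiftYearTotal.csv; note the stray spaces around $0.00.
csv_text = """Id,Year,2020,2021,2022
1,Person_A, $0.00 ,$0.00, $0.00
2,Person_B,$10.00, $0.00 ,$5.00
"""
df = pd.read_csv(io.StringIO(csv_text))

# Strip whitespace in text columns only; numeric columns have no .str accessor.
df = df.apply(lambda col: col if is_numeric_dtype(col) else col.str.strip())

# After stripping, every '$0.00' cell matches and is counted.
zeros_per_row = df.eq("$0.00").sum(axis=1)
out = df[zeros_per_row <= 1]
print(out)
```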

Old answer

You can use:

out = df[df.eq('$0.00').sum(1) <= 3]
print(out)

# Output
       Year     2010     2011   2012    2013   2014     2015   2016    2017    2018    2019   2020   2021      2022
1  Person_B  $100.00  $150.00  $1.00  $50.00  $0.25  $100.00  $0.00  $50.00  $60.00  $50.00  $0.00  $0.00  $1000.00
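The question asks to remove rows with 3 *or more* zeros, whereas `<= 3` keeps rows that have exactly three (Person_B in the output above has three `$0.00` cells). A small sketch with made-up data and a hypothetical `2023` column, showing how the two thresholds differ:

```python
import pandas as pd

# Made-up data: Person_A has 4 zero cells, Person_B has 3, Person_C has 1.
df = pd.DataFrame({
    "Year": ["Person_A", "Person_B", "Person_C"],
    "2020": ["$0.00", "$0.00", "$5.00"],
    "2021": ["$0.00", "$0.00", "$0.00"],
    "2022": ["$0.00", "$0.00", "$7.00"],
    "2023": ["$0.00", "$9.00", "$8.00"],
})
zeros = df.eq("$0.00").sum(axis=1)

at_most_three = df[zeros <= 3]    # keeps rows with exactly 3 zeros (as in the answer)
fewer_than_three = df[zeros < 3]  # drops rows with 3 or more zeros (as the title asks)
print(at_most_three["Year"].tolist())
print(fewer_than_three["Year"].tolist())
```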

Remove rows in a dataframe if 0 is found X number of times

Here is a one-liner. Note that rowSums is coded in C and is fast.

df[!rowSums(df == 0) >= 2, , drop = FALSE]
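A pandas counterpart, sketched with made-up data: `~` negates the boolean mask the way R's `!` does. Boolean-mask indexing in pandas returns a DataFrame even for a single column, so no equivalent of `drop = FALSE` is needed.

```python
import pandas as pd

# Made-up data: rows have 2, 1, 1 and 0 zeros.
df = pd.DataFrame({"x": [0, 1, 0, 2], "y": [0, 0, 3, 4]})

# Mirrors R's !(rowSums(df == 0) >= 2): keep rows with fewer than 2 zeros.
out = df[~((df == 0).sum(axis=1) >= 2)]
print(out)
```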

Counting number of zeros per row by Pandas DataFrame?

Use a boolean comparison, which produces a boolean df. We can then cast this to int (True becomes 1, False becomes 0) and call sum with axis=1 to add up the values row-wise:

In [56]:

df = pd.DataFrame({'a':[1,0,0,1,3], 'b':[0,0,1,0,1], 'c':[0,0,0,0,0]})
df
Out[56]:
a b c
0 1 0 0
1 0 0 0
2 0 1 0
3 1 0 0
4 3 1 0
In [64]:

(df == 0).astype(int).sum(axis=1)
Out[64]:
0 2
1 3
2 2
3 2
4 1
dtype: int64

Breaking the above down:

In [65]:

(df == 0)
Out[65]:
a b c
0 False True True
1 True True True
2 True False True
3 False True True
4 False False True
In [66]:

(df == 0).astype(int)
Out[66]:
a b c
0 0 1 1
1 1 1 1
2 1 0 1
3 0 1 1
4 0 0 1

EDIT

As pointed out by david, the astype(int) is unnecessary because the Boolean values are upcast to int when calling sum, so this simplifies to:

(df == 0).sum(axis=1)
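A quick check, using the `df` from this answer, that both forms give the same counts, plus one behaviour worth knowing: `NaN == 0` evaluates to False, so missing values are never counted as zeros. The second frame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0, 1, 3], 'b': [0, 0, 1, 0, 1], 'c': [0, 0, 0, 0, 0]})

with_cast = (df == 0).astype(int).sum(axis=1)
without_cast = (df == 0).sum(axis=1)  # booleans are summed as 0/1 directly
print(with_cast.tolist(), without_cast.tolist())

# NaN never compares equal to 0, so missing values are not counted as zeros.
df_nan = pd.DataFrame({'a': [0, np.nan], 'b': [np.nan, np.nan]})
print((df_nan == 0).sum(axis=1).tolist())  # [1, 0]
```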

Deleting rows that have most of their values zero

We can use rowSums

df[rowSums(df == 0) < 3, ]

# i j k l m n
#b 8 6 34 1 0 0
#d 7 9 3 7 0 5
#f 2 3 9 6 8 9
#g 0 1 0 3 1 5

We can also use apply to count the number of 0's in each row and then subset:

df[apply(df == 0, 1, sum) < 3, ] 
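The question speaks of rows where *most* values are zero; instead of a fixed count such as 3, one could threshold on the share of zeros per row. A minimal pandas sketch with made-up data, assuming "most" means more than half:

```python
import pandas as pd

# Made-up data: row "a" is 3/4 zeros, row "c" exactly half zeros.
df = pd.DataFrame(
    {"i": [0, 8, 0, 2], "j": [0, 6, 1, 3], "k": [5, 34, 0, 9], "l": [0, 1, 3, 6]},
    index=["a", "b", "c", "d"],
)

# Mean of a boolean frame gives the fraction of zeros in each row.
zero_share = (df == 0).mean(axis=1)
out = df[zero_share <= 0.5]  # drop rows where zeros are the majority
print(out)
```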

Pandas dataframe drop rows which store certain number of zeros in it

This will work:

drop_indexs = []
for i in range(len(df)):
    # 4 is the minimum number of zeros that marks a row for removal
    if (df.iloc[i, :] == 0).sum() >= 4:
        drop_indexs.append(df.index[i])  # store the label: drop() works on labels
updated_df = df.drop(drop_indexs)
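Because `df.drop` works on index labels rather than positions, a loop like this should store `df.index[i]`, not `i`, whenever the index is not the default `RangeIndex`. The sketch below (made-up data with a non-default index) compares such a loop with a vectorized boolean mask that keeps rows with fewer than 4 zeros:

```python
import pandas as pd

# Made-up data; labels 10..40 differ from positions 0..3 on purpose.
df = pd.DataFrame(
    {"w": [0, 1, 0, 0], "x": [0, 2, 0, 3], "y": [0, 0, 0, 4],
     "z": [0, 5, 0, 0], "v": [7, 6, 0, 8]},
    index=[10, 20, 30, 40],
)

# Loop version, collecting index labels of rows with 4 or more zeros.
drop_labels = [df.index[i] for i in range(len(df)) if (df.iloc[i, :] == 0).sum() >= 4]
loop_result = df.drop(drop_labels)

# Vectorized equivalent: one boolean mask, no Python-level loop.
vectorized_result = df[(df == 0).sum(axis=1) < 4]
print(vectorized_result)
```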

Excluding rows containing consecutive zeros from data frame

If we want to exclude every row that contains consecutive zeros, one way is to loop through the rows using apply with MARGIN = 1: check whether any adjacent elements are equal and zero, negate the result, and subset the rows.

df1[!apply(df1[-(1:2)], 1, FUN = function(x) any((c(FALSE, x[-1]==x[-length(x)])) & !x)),]
# subj stimulus var1 var2 var3 var4
#1 1 A 25 30 15 36
#3 1 C 12 0 20 23
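The same adjacent-pair test can be sketched in pandas/NumPy. The `df1` below is made up (only rows 1 and 3 mirror the output shown), and the first two columns are skipped as in `df1[-(1:2)]`:

```python
import pandas as pd

# Hypothetical data; rows 2 and 4 each contain a pair of adjacent zeros.
df1 = pd.DataFrame({
    "subj": [1, 1, 1, 1],
    "stimulus": ["A", "B", "C", "D"],
    "var1": [25, 0, 12, 4],
    "var2": [30, 0, 0, 5],
    "var3": [15, 7, 20, 0],
    "var4": [36, 9, 23, 0],
}, index=[1, 2, 3, 4])

# Zero-indicator matrix for the measurement columns only.
z = df1.iloc[:, 2:].eq(0).to_numpy()

# A row has consecutive zeros if some cell and its left neighbour are both zero.
has_consecutive_zeros = (z[:, 1:] & z[:, :-1]).any(axis=1)
out = df1[~has_consecutive_zeros]
print(out)
```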

Or, if a row should be excluded only when it contains a run of exactly n consecutive zeros, apply rle to each row, check whether any run of zeros (the lengths where 'values' is TRUE) has length n, negate, and subset the rows. Here n = 2:

df1[!apply(df1[-(1:2)], 1, FUN = function(x) any(with(rle(x==0), lengths[values])==2)),]
# subj stimulus var1 var2 var3 var4
#1 1 A 25 30 15 36
#3 1 C 12 0 20 23
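A Python analogue of the `rle` idea, as a sketch: `itertools.groupby` splits each row into runs of equal zero/non-zero flags, much as `rle(x == 0)` does. The helper name `has_zero_run_of_length` and the data are hypothetical. Because the test is for runs of *exactly* n, a row with three consecutive zeros is kept when n = 2:

```python
from itertools import groupby

import pandas as pd


def has_zero_run_of_length(values, n):
    """Hypothetical helper: True if values contain a run of exactly n zeros."""
    return any(is_zero and len(list(run)) == n
               for is_zero, run in groupby(v == 0 for v in values))


# Made-up data: row 2 has a run of 2 zeros, row 4 a run of 3.
df1 = pd.DataFrame({
    "subj": [1, 1, 1, 1],
    "stimulus": ["A", "B", "C", "D"],
    "var1": [25, 0, 12, 0],
    "var2": [30, 0, 0, 0],
    "var3": [15, 7, 20, 0],
    "var4": [36, 9, 23, 9],
}, index=[1, 2, 3, 4])

mask = df1.iloc[:, 2:].apply(lambda row: has_zero_run_of_length(row, 2), axis=1)
out = df1[~mask]
print(out)
```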

Pandas: drop row if more than one of multiple columns is zero

Apply the condition and count the True values.

(df == 0).sum(1)

ID1 2
ID2 0
ID3 1
dtype: int64

df[(df == 0).sum(1) < 2]

col0 col1 col2 col3
ID2 1 1 2 10
ID3 0 1 3 4
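If only some columns should be checked, the same count can be restricted to a column subset. A sketch with made-up data; the `checked` list and the extra text column `label` are hypothetical:

```python
import pandas as pd

# Made-up data; "label" is not part of the zero check.
df = pd.DataFrame(
    {"col0": [0, 1, 0], "col1": [0, 1, 1], "col2": [5, 2, 3], "col3": [0, 10, 4],
     "label": ["x", "y", "z"]},
    index=["ID1", "ID2", "ID3"],
)

checked = ["col0", "col1", "col2", "col3"]  # hypothetical: columns to test for zeros
out = df[(df[checked] == 0).sum(axis=1) < 2]
print(out)
```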

Alternatively, convert the integers to bool (zero becomes False, anything else True) and sum that to count the non-zero values per row. A little more direct:

# df[(~df.astype(bool)).sum(1) < 2]
df[df.astype(bool).sum(1) > len(df.columns)-2] # no inversion needed

col0 col1 col2 col3
ID2 1 1 2 10
ID3 0 1 3 4

For performance, you can use np.count_nonzero:

# df[np.count_nonzero(df, axis=1) > len(df.columns)-2]
df[np.count_nonzero(df.values, axis=1) > len(df.columns)-2]

col0 col1 col2 col3
ID2 1 1 2 10
ID3 0 1 3 4

df = pd.concat([df] * 10000, ignore_index=True)

%timeit df[(df == 0).sum(1) < 2]
%timeit df[df.astype(bool).sum(1) > len(df.columns)-2]
%timeit df[np.count_nonzero(df.values, axis=1) > len(df.columns)-2]

7.13 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.28 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
997 µs ± 38.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
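`%timeit` is an IPython/Jupyter magic. In a plain Python script, the standard-library `timeit` module can run a similar comparison. The sketch below uses random data (an assumption, not the answer's frame), first checks that the three filters select identical rows, then prints per-call times; actual numbers depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical benchmark data: 30,000 rows of values 0..2, many of them zero.
df = pd.DataFrame(rng.integers(0, 3, size=(30_000, 4)),
                  columns=["col0", "col1", "col2", "col3"])

candidates = {
    "eq_sum": lambda: df[(df == 0).sum(axis=1) < 2],
    "astype_bool": lambda: df[df.astype(bool).sum(axis=1) > len(df.columns) - 2],
    "count_nonzero": lambda: df[np.count_nonzero(df.values, axis=1) > len(df.columns) - 2],
}

# All three filters should keep exactly the same rows.
results = {name: fn() for name, fn in candidates.items()}
reference = results["eq_sum"]
for name, frame in results.items():
    assert frame.equals(reference), name

for name, fn in candidates.items():
    seconds = min(timeit.repeat(fn, number=20, repeat=5)) / 20
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```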

