counting the number of non-zero numbers in a column of a df in pandas/python
Use a double sum:
print(df)
a b c d e
0 0 1 2 3 5
1 1 4 0 5 2
2 5 8 9 6 0
3 4 5 0 0 0
print((df != 0).sum(axis=1))
0 4
1 4
2 4
3 2
dtype: int64
print((df != 0).sum(axis=1).sum())
14
If you only need the count for column c or d:
print((df['c'] != 0).sum())
2
print((df['d'] != 0).sum())
3
EDIT: Solution with numpy.sum:
print ((df != 0).values.sum())
14
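For reference, here is a self-contained sketch that rebuilds the sample frame printed above and computes the same counts. The per-column variant `(df != 0).sum()` answers the "count per column" question directly, and `np.count_nonzero` is shown as an alternative way to get the total; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the answer above
df = pd.DataFrame({
    "a": [0, 1, 5, 4],
    "b": [1, 4, 8, 5],
    "c": [2, 0, 9, 0],
    "d": [3, 5, 6, 0],
    "e": [5, 2, 0, 0],
})

per_row = (df != 0).sum(axis=1)       # non-zero count in each row
per_column = (df != 0).sum()          # non-zero count in each column
total = int((df != 0).to_numpy().sum())
same_total = int(np.count_nonzero(df.to_numpy()))

print(per_row.tolist())      # [4, 4, 4, 2]
print(per_column.to_dict())  # {'a': 3, 'b': 4, 'c': 2, 'd': 3, 'e': 2}
print(total, same_total)     # 14 14
```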
Running Count of Non-Zero Values in pd Dataframe Column
You can still use cumsum, applied per group with groupby:
df['running_total'] = df['payment'].ne(0).groupby(df['key_id']).cumsum()
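A minimal runnable sketch of the one-liner above, assuming a frame with the `key_id` and `payment` columns it refers to (the values here are made up). The boolean mask is cast to int before the grouped cumsum so the running count is an integer column regardless of pandas version.

```python
import pandas as pd

# Hypothetical sample data: 'key_id' defines the groups, 'payment' holds the values
df = pd.DataFrame({
    "key_id": [1, 1, 1, 2, 2, 2],
    "payment": [0, 10, 5, 0, 0, 7],
})

# Flag non-zero payments as 1/0, then accumulate within each key_id group
df["running_total"] = df["payment"].ne(0).astype(int).groupby(df["key_id"]).cumsum()
print(df["running_total"].tolist())  # [0, 1, 2, 0, 0, 1]
```

The count restarts at each new `key_id` because `cumsum` is evaluated separately for every group.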
Get count of non-zero values per row in Pandas DataFrame
Compare with gt (>), lt (<), or le, ge, ne, eq first, and then sum the resulting booleans; each True is counted as 1:
Bad: every comparison checks all current columns, so each newly added count column is also included in the next count:
df['> zero'] = df.gt(0).sum(axis=1)
df['< zero'] = df.lt(0).sum(axis=1)
df['== zero'] = df.eq(0).sum(axis=1)
print (df)
GOOG AAPL XOM IBM Value > zero < zero == zero
2011-01-10 0.0 0.0 0.0 0.0 0.0 0 0 7
2011-01-13 0.0 -1500.0 0.0 4000.0 -61900.0 1 2 2
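Correct: store the columns to check before adding any count columns:
cols = df.columns  # original columns only; adding new columns later does not change this Index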
df['> zero'] = df[cols].gt(0).sum(axis=1)
df['< zero'] = df[cols].lt(0).sum(axis=1)
df['== zero'] = df[cols].eq(0).sum(axis=1)
print (df)
GOOG AAPL XOM IBM Value > zero < zero == zero
2011-01-10 0.0 0.0 0.0 0.0 0.0 0 0 5
2011-01-13 0.0 -1500.0 0.0 4000.0 -61900.0 1 2 2
Detail:
print (df.gt(0))
GOOG AAPL XOM IBM Value
2011-01-10 False False False False False
2011-01-13 False False False True False
EDIT: To exclude some columns from cols, use difference (note that the returned labels are sorted):
cols = df.columns.difference(['Value'])
print (cols)
Index(['AAPL', 'GOOG', 'IBM', 'XOM'], dtype='object')
df['> zero'] = df[cols].gt(0).sum(axis=1)
df['< zero'] = df[cols].lt(0).sum(axis=1)
df['== zero'] = df[cols].eq(0).sum(axis=1)
print (df)
GOOG AAPL XOM IBM Value > zero < zero == zero
2011-01-10 0.0 0.0 0.0 0.0 0.0 0 0 4
2011-01-13 0.0 -1500.0 0.0 4000.0 -61900.0 1 1 2
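Another way to avoid the "counts include previous count columns" pitfall, sketched under the assumption that the data matches the two rows printed above: take a snapshot of the columns to check, build all three counts from that snapshot, and join them in a single step.

```python
import pandas as pd

# Rebuild the two sample rows shown above
df = pd.DataFrame(
    {"GOOG": [0.0, 0.0], "AAPL": [0.0, -1500.0], "XOM": [0.0, 0.0],
     "IBM": [0.0, 4000.0], "Value": [0.0, -61900.0]},
    index=pd.to_datetime(["2011-01-10", "2011-01-13"]),
)

# Snapshot of the columns to check; none of the count columns exist yet
checked = df[df.columns.difference(["Value"])]
counts = pd.DataFrame({
    "> zero": checked.gt(0).sum(axis=1),
    "< zero": checked.lt(0).sum(axis=1),
    "== zero": checked.eq(0).sum(axis=1),
})
df = df.join(counts)  # aligns on the shared DatetimeIndex
print(df)
```

Because every count reads from `checked`, the order in which the new columns are added no longer matters.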
How to count non-zero values in a dataframe inside a range of a column
You can use the following; note that .loc slicing is label-based and includes both i and j:
sum(df.loc[i:j,'data'].ne(0))
#or
df.loc[i:j,'data'].ne(0).sum()
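A small sketch with made-up data showing the inclusive `.loc` slice next to the positional `.iloc` slice, which excludes its end; the column name `data` comes from the answer, the values are hypothetical.

```python
import pandas as pd

# Hypothetical data; the default RangeIndex labels are 0..6
df = pd.DataFrame({"data": [0, 3, 0, 5, 2, 0, 1]})

i, j = 1, 4
count_loc = df.loc[i:j, "data"].ne(0).sum()      # labels 1..4 inclusive: 3, 0, 5, 2
count_iloc = df["data"].iloc[i:j].ne(0).sum()    # positions 1..3 (end excluded): 3, 0, 5
print(count_loc, count_iloc)  # 3 2
```

With a default integer index the two only differ at the end point, but with a non-default index `.loc` must be given labels while `.iloc` always takes positions.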
Get column indices for non-zero values in each row in pandas data frame
One quick option is to apply numpy.flatnonzero to each row:
import numpy as np
df.apply(np.flatnonzero, axis=1)
0 [0, 1]
1 [0]
2 [1]
3 [0, 1, 2, 5, 7, 8]
dtype: object
If you care about performance, here is a pure NumPy option. The caveat is that a row with no non-zero values is silently dropped from the result, so the list can be shorter than the frame and its positions no longer line up with the rows. Choose the method that fits your needs:
idx, idy = np.where(df != 0)
np.split(idy, np.flatnonzero(np.diff(idx) != 0) + 1)
[array([0, 1], dtype=int32),
array([0], dtype=int32),
array([1], dtype=int32),
array([0, 1, 2, 5, 7, 8], dtype=int32)]
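If empty rows must be kept, one possible variant (a sketch, not taken from the answer above) splits the column positions by the per-row non-zero counts instead of by changes in the row index, so a row without non-zero values yields an empty array. The frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; row 1 has no non-zero values
df = pd.DataFrame([[1, 0, 2], [0, 0, 0], [0, 3, 0], [4, 5, 6]])

arr = df.to_numpy()
_, cols = np.nonzero(arr)                 # column positions, in row-major order
counts = np.count_nonzero(arr, axis=1)    # number of non-zero cells in each row
per_row = np.split(cols, np.cumsum(counts)[:-1])
print([a.tolist() for a in per_row])      # [[0, 2], [], [1], [0, 1, 2]]
```

The split points are the cumulative counts, so the result always has exactly one array per row.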
pandas determine column labels that contribute to non-zero values in each row
To count the non-zeros in each row, you can use count_nonzero from NumPy and apply it row-wise with axis=1:
import numpy as np
df['non_zero_count'] = np.count_nonzero(df,axis=1)
>>> df
1 2 3 4 5 6 7 non_zero_count
0 8122 0 0 0 0 0 0 1
1 0 0 0 3292 0 1313 0 2
2 0 8675 0 0 0 0 0 1
3 0 0 1910 0 213 0 12312 3
4 0 0 0 0 4010 0 0 1
5 0 0 0 0 0 1002 0 1
6 0 0 0 0 0 0 1278 1
Then you can get the labels of the columns that hold a non-zero value in each row with apply. Because apply with axis=1 calls the lambda once per row in Python, it can be slow on a big dataset:
df['non_zero_label'] = df.drop('non_zero_count',axis=1)\
.apply(lambda r: r.index[r.ne(0)].to_list(), axis=1)
>>> df
1 2 3 4 5 6 7 non_zero_count non_zero_label
0 8122 0 0 0 0 0 0 1 [1]
1 0 0 0 3292 0 1313 0 2 [4, 6]
2 0 8675 0 0 0 0 0 1 [2]
3 0 0 1910 0 213 0 12312 3 [3, 5, 7]
4 0 0 0 0 4010 0 0 1 [5]
5 0 0 0 0 0 1002 0 1 [6]
6 0 0 0 0 0 0 1278 1 [7]
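If apply is too slow, one alternative (a sketch, not the answer's method) is to build a NumPy boolean mask once and index the column labels with each row of the mask. The small frame below is hypothetical but uses integer column labels like the example above.

```python
import pandas as pd

# Hypothetical frame with integer column labels, shaped like the example above
df = pd.DataFrame(
    [[8122, 0, 0, 0], [0, 0, 3292, 1313], [0, 0, 0, 0]],
    columns=[1, 2, 3, 4],
)

labels = df.columns            # captured before any new column is added
mask = df.to_numpy() != 0      # True where a cell is non-zero
df["non_zero_count"] = mask.sum(axis=1)
# Select the labels for each row's mask; still a Python loop, but over NumPy rows
df["non_zero_label"] = pd.Series([labels[m].tolist() for m in mask], index=df.index)
print(df)
```

The list comprehension still iterates over rows, but it avoids constructing a Series per row as apply does.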
PySpark: write a function to count non-zero values of given columns
You can use a list comprehension to generate the list of aggregation expressions:
import pyspark.sql.functions as F
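def count_non_zero(df, features, grouping):
    # F.when(...) yields null when the condition is false, and F.count skips nulls,
    # so each alias holds the number of non-zero values of that column per group
    return df.groupBy(*grouping).agg(*[F.count(F.when(F.col(c) != 0, 1)).alias(c) for c in features])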
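For comparison, a rough pandas counterpart of the PySpark helper might look like the sketch below; `count_non_zero_pd` and the sample data are hypothetical. It marks non-zero cells and sums the booleans within each group.

```python
import pandas as pd

def count_non_zero_pd(df, features, grouping):
    # Hypothetical pandas analogue of the PySpark function above:
    # True/False per cell, then the sum of Trues per group and column
    return df[features].ne(0).groupby([df[g] for g in grouping]).sum()

df = pd.DataFrame({
    "store": ["a", "a", "b", "b"],
    "x": [0, 1, 2, 0],
    "y": [3, 0, 0, 0],
})
result = count_non_zero_pd(df, ["x", "y"], ["store"])
print(result)
```

One behavioral difference: in Spark a null compared with `!= 0` gives null and is not counted, whereas in pandas `NaN.ne(0)` is True, so NaN cells would be counted unless they are filled or dropped first.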
Add and count non-zero values of rows based on current date
We can select the required columns with boolean indexing, then calculate and insert the total and active_months columns into df, where total is the sum of the values along axis=1 and active_months is the count of non-zero values along axis=1:
m = pd.to_datetime(df.columns, errors='coerce') <= '1 May, 2021'
c = df.loc[:, m]
df.insert(2, 'total', c.sum(1))
df.insert(3, 'active_months', c.ne(0).sum(1))
>>> df
account_id contract_id total active_months 2020-12-01 00:00:00 2021-01-01 00:00:00 2021-02-01 00:00:00 2021-03-01 00:00:00 2021-04-01 00:00:00 2021-05-01 00:00:00 2021-06-01 00:00:00
0 1 A 200.0 1 200.0 0.0 0.0 0.0 0.0 0.0 0.0
1 1 B 600.0 2 300.0 300.0 0.0 0.0 0.0 0.0 0.0
2 1 C 1200.0 3 0.0 0.0 0.0 400.0 400.0 400.0 400.0
3 2 K 300.0 3 100.0 100.0 100.0 0.0 0.0 0.0 0.0
4 2 F 200.0 4 0.0 0.0 50.0 50.0 50.0 50.0 50.0
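A self-contained sketch of the same idea on made-up data, assuming the month columns are pandas Timestamps as the printed output suggests. Instead of `pd.to_datetime(..., errors='coerce')`, this variant picks the month columns with an explicit `isinstance` check, which is simply a different way of building the boolean column mask.

```python
import pandas as pd

# Hypothetical wide frame: two ID columns followed by month-start Timestamp columns
months = pd.date_range("2021-01-01", periods=4, freq="MS")
df = pd.DataFrame(
    [[1, "A", 100.0, 0.0, 50.0, 70.0],
     [2, "B", 0.0, 0.0, 0.0, 20.0]],
    columns=["account_id", "contract_id", *months],
)

cutoff = pd.Timestamp("2021-03-01")
# Keep only Timestamp labels on or before the cutoff; the ID columns are skipped
m = [isinstance(col, pd.Timestamp) and col <= cutoff for col in df.columns]
c = df.loc[:, m]
df.insert(2, "total", c.sum(axis=1))
df.insert(3, "active_months", c.ne(0).sum(axis=1))
print(df[["account_id", "contract_id", "total", "active_months"]])
```

Here the April column is excluded by the cutoff, so account 2 ends up with a total of 0 and no active months.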
Count of non-zero values in multiple rows in Python?
You can do this using iloc for slicing and NumPy:
np.sum((df.iloc[[0, 1], 1:]!=0).any(axis=0))
Here df.iloc[[0, 1], 1:] selects the first two rows and every column except the first. (... != 0).any(axis=0) marks each column in which at least one of those rows is non-zero, and np.sum counts those columns. Change the row list [0, 1] to select any combination of rows.
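A worked sketch on hypothetical data, showing the `any` version from the answer next to an `all` variant that counts only the columns where every selected row is non-zero; which one you need depends on how "non-zero in multiple rows" is meant.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the first column is an identifier and is skipped by iloc[..., 1:]
df = pd.DataFrame({
    "name": ["r0", "r1", "r2"],
    "a": [0, 1, 0],
    "b": [2, 0, 0],
    "c": [0, 0, 3],
    "d": [4, 5, 0],
})

block = df.iloc[[0, 1], 1:] != 0         # rows 0 and 1, every column except the first
any_count = np.sum(block.any(axis=0))    # columns where at least one selected row is non-zero
all_count = np.sum(block.all(axis=0))    # columns where every selected row is non-zero
print(any_count, all_count)  # 3 1
```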