Efficient method to count consecutive positive values in pandas dataframe
Use consecutiveCount just once, on an unstacked series. Then reshape the result back into a data frame.
Using DSM's consecutiveCount, which I named c here for simplicity:
>>> c = lambda y: y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)
>>> c(df.unstack()).unstack().T
a b
0 0 0
1 1 0
2 0 0
3 1 0
4 2 1
5 0 2
6 0 0
7 0 1
8 1 2
9 2 3
10 0 0
11 1 0
12 0 0
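To reproduce the table above end to end, here is a minimal sketch. The input df is not shown in the answer, so the 0/1 values below are an assumption reconstructed from the printed output. One caveat worth keeping in mind: unstacking lays the columns end to end, so a run of ones that ends one column and continues at the top of the next would be counted as a single run.

```python
import pandas as pd

# Assumed input: 0/1 flags reconstructed from the output shown above.
df = pd.DataFrame({
    "a": [0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
    "b": [0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
})

# Running length of each run of equal values, zeroed where the value is 0.
c = lambda y: y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)

# Unstack to one long series, count once, then reshape back to rows x columns.
result = c(df.unstack()).unstack().T
print(result)
```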
Timings
# df2 is (65, 40)
df2 = pd.concat([pd.concat([df]*20, axis=1)]*5).T.reset_index(drop=True).T.reset_index(drop=True)
%timeit c(df2.unstack()).unstack().T
5.54 ms ± 296 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df2.apply(c)
82.5 ms ± 2.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
How can I get the count of consecutive positive numbers in each column of a 2-dimensional df in Python/pandas
Let us use cumsum to define your function:
def yourfun(x):
    return x[x.ge(0)].groupby(x.lt(0).cumsum()).size().iloc[-1]
df.loc['Count'] = df.apply(yourfun)
df
Out[62]:
X y
a 1.0 -1.0
b -2.0 2.0
c 3.0 -3.0
d 2.1 4.0
Count 2.0 1.0
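A self-contained version of the answer above, as a sketch. The frame is rebuilt from the printed output, and the variable name counts is only for illustration. As far as I can tell, the function reports the size of the last non-negative run it finds, so a column that ends in a negative value still reports the run before it.

```python
import pandas as pd

# Sample frame matching the output shown above.
df = pd.DataFrame(
    {"X": [1.0, -2.0, 3.0, 2.1], "y": [-1.0, 2.0, -3.0, 4.0]},
    index=list("abcd"),
)

def yourfun(x):
    # Each negative value bumps the run label; keep only the non-negative
    # rows, count rows per label, and take the last run.
    return x[x.ge(0)].groupby(x.lt(0).cumsum()).size().iloc[-1]

counts = df.apply(yourfun)
print(counts)
```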
Count rows with positive values and reset if negative
You first want to mark the positions where new segments (i.e., groups) start:
>>> df['Count'] = df.Slope.lt(0)
>>> df.head(7)
Slope Count
0 -25.0 True
1 -15.0 True
2 17.0 False
3 6.0 False
4 0.1 False
5 5.0 False
6 -3.0 True
Now you need to label each group using the cumulative sum: as True is evaluated as 1 in mathematical equations, the cumulative sum will label each segment with an incrementing integer. (This is a very powerful concept in pandas!)
>>> df['Count'] = df.Count.cumsum()
>>> df.head(7)
Slope Count
0 -25.0 1
1 -15.0 2
2 17.0 2
3 6.0 2
4 0.1 2
5 5.0 2
6 -3.0 3
Now you can use groupby to access each segment, then all you need to do is generate an incrementing sequence starting at zero for each group. There are many ways to do that, I'd just use the (reset'ed) index of each group, i.e., reset the index, get the fresh RangeIndex starting at 0, and turn it into a series:
>>> df.groupby('Count').apply(lambda x: x.reset_index().index.to_series())
Count
1 0 0
2 0 0
1 1
2 2
3 3
4 4
3 0 0
1 1
2 2
3 3
4 0 0
5 0 0
1 1
6 0 0
This results in the expected counts, but note that the final index doesn't match the original dataframe, so we need another reset_index() with drop=True to discard the grouped index before putting this into our original dataframe:
>>> df['Count'] = df.groupby('Count').apply(lambda x:x.reset_index().index.to_series()).reset_index(drop=True)
Et voilà:
>>> df
Slope Count
0 -25.0 0
1 -15.0 0
2 17.0 1
3 6.0 2
4 0.1 3
5 5.0 4
6 -3.0 0
7 5.0 1
8 1.0 2
9 3.0 3
10 -0.1 0
11 -0.2 0
12 1.0 1
13 -9.0 0
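The same counts can probably be obtained more directly with groupby(...).cumcount(), which numbers the rows of each group from 0. This is a swapped-in alternative to the reset_index approach above, not the answer's own code; the name segment is only for illustration, and the Slope values are taken from the output shown above.

```python
import pandas as pd

df = pd.DataFrame({"Slope": [-25.0, -15.0, 17.0, 6.0, 0.1, 5.0, -3.0,
                             5.0, 1.0, 3.0, -0.1, -0.2, 1.0, -9.0]})

# A negative slope starts a new segment; cumsum turns the flags into labels.
segment = df["Slope"].lt(0).cumsum()

# cumcount numbers the rows inside each segment, starting at 0.
df["Count"] = df.groupby(segment).cumcount()
print(df)
```

Because the result keeps the original index, no reset_index() step is needed before assigning it back.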
Pandas dataframe: count consecutive True / False values
You can get the group number of consecutive True/False by .cumsum() and put it into g.
Then, group by g and get the size/count of each group by .transform() + .size(). Set the sign by multiplying by the return value (1 or -1) of np.where(), as follows:
g = df['Mask'].ne(df['Mask'].shift()).cumsum()
df['Count'] = df.groupby(g)['Mask'].transform('size') * np.where(df['Mask'], 1, -1)
Result:
print(df)
Mask Count
0 True 3
1 True 3
2 True 3
3 False -2
4 False -2
5 True 1
6 False -2
7 False -2
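For completeness, a runnable sketch of the snippet above with its imports and a sample frame. The Mask values are copied from the printed result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Mask": [True, True, True, False, False, True, False, False]})

# A new run starts wherever the value differs from the previous row.
g = df["Mask"].ne(df["Mask"].shift()).cumsum()

# Run length per row, positive for True runs and negative for False runs.
df["Count"] = df.groupby(g)["Mask"].transform("size") * np.where(df["Mask"], 1, -1)
print(df)
```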
Python Pandas: Compute Consecutive Window Count of Positive Numbers
IIUC you just need to count backwards:
s = df["Col3"][::-1]
# group_keys=False keeps the original index so the result aligns with df
df["New"] = s.groupby((s<0).cumsum(), group_keys=False).apply(lambda d: (d>=0).cumsum())
print (df)
Col1 Col2 Col3 Col4 New
0 A 0.532 -0.234 2020-01-01 05:00:00 0
1 B 0.242 0.224 2020-01-01 06:00:00 1
2 A 0.152 -0.753 2020-01-01 08:00:00 0
3 C 0.149 0.983 2020-01-01 08:00:00 4
4 A 0.635 0.429 2020-01-01 09:00:00 3
5 A 0.938 0.365 2020-01-01 10:00:00 2
6 C 0.293 0.956 2020-01-02 05:00:00 1
7 A 0.294 -0.234 2020-01-02 06:00:00 0
8 E 0.294 0.394 2020-01-02 07:00:00 5
9 D 0.294 0.258 2020-01-02 08:00:00 4
10 A 0.687 0.666 2020-01-03 05:00:00 3
11 C 0.232 0.494 2020-01-03 06:00:00 2
12 D 0.575 0.845 2020-01-03 07:00:00 1
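A variant without apply, offered as a sketch rather than as the answer's own method: a grouped cumsum over the reversed column should give the same numbers. Only Col3 is reproduced here, with values copied from the output above; the names block and new are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Col3": [-0.234, 0.224, -0.753, 0.983, 0.429, 0.365, 0.956,
                            -0.234, 0.394, 0.258, 0.666, 0.494, 0.845]})

s = df["Col3"][::-1]        # walk the column from the bottom up
block = s.lt(0).cumsum()    # each negative value opens a new block

# Running count of non-negative values inside each block.
new = s.ge(0).astype(int).groupby(block).cumsum()

df["New"] = new             # assignment realigns on the original index
print(df)
```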
Count consecutive positive and negative values in a list
Count consecutive groups of positive/negative values using groupby:
s = pd.Series(y)
v = s.gt(0).ne(s.gt(0).shift()).cumsum()
pd.DataFrame(
    v.groupby(v).count().values.reshape(-1, 2), columns=['pos', 'neg']
)
pos neg
0 1 2
1 4 2
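A runnable sketch of the above. The list y is not shown in the answer, so the values below are hypothetical, chosen only so that the run lengths match the printed table. Note that reshape(-1, 2) assumes the data starts with a positive run and contains an even number of runs.

```python
import pandas as pd

# Hypothetical input: the original list y is not shown in the answer.
y = [1, -2, -3, 4, 5, 6, 7, -8, -9]

s = pd.Series(y)
# Label each run of same-signed values with an incrementing integer.
v = s.gt(0).ne(s.gt(0).shift()).cumsum()

# One count per run, paired up as (positive run, negative run).
run_lengths = v.groupby(v).count().values
runs = pd.DataFrame(run_lengths.reshape(-1, 2), columns=["pos", "neg"])
print(runs)
```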
Count Positive Consecutive Elements in Dataframe
Here's a similar approach in pandas:
In [792]: df_p = df > 0
In [793]: df_p
Out[793]:
0 1 2 3 4
0 False False True True True
1 True True False True True
2 True True True True False
3 False False True False True
4 False False True False False
In [794]: df_p['0'] * (df_p < df_p.shift(1, axis=1)).idxmax(axis=1).astype(int)
Out[794]:
0 0
1 2
2 4
3 0
4 0
dtype: int32
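The output above matches the length of the leading run of True values in each row. As a sketch of another way to get that number (a swapped-in technique, not the answer's expression), a row-wise cumprod can be summed. The boolean frame is copied from the df_p output above; the name leading is illustrative.

```python
import pandas as pd

# Boolean frame copied from the df_p output shown above.
df_p = pd.DataFrame([
    [False, False, True,  True,  True],
    [True,  True,  False, True,  True],
    [True,  True,  True,  True,  False],
    [False, False, True,  False, True],
    [False, False, True,  False, False],
])

# cumprod along each row stays 1 until the first False, then drops to 0,
# so the row sum is the length of the leading run of True values.
leading = df_p.astype(int).cumprod(axis=1).sum(axis=1)
print(leading)
```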
How to count consecutive repetitions in a pandas series
Here is another approach using fillna to handle NaN values:
s = df.id.fillna('nan')
mask = s.ne(s.shift())
ids = s[mask].to_numpy()
counts = s.groupby(mask.cumsum()).cumcount().add(1).groupby(mask.cumsum()).max().to_numpy()
# Convert 'nan' string back to `NaN`
ids[ids == 'nan'] = np.nan
ser_out = pd.Series(counts, index=ids, name='counts')
[out]
nan 2
1.0 2
2.0 3
nan 2
1.0 3
nan 1
Name: counts, dtype: int64
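If mixing the string 'nan' into a numeric column is undesirable, the runs can also be found by treating two neighbouring NaN values as equal. The following is a sketch of that alternative, not the answer's code; the input is an assumption reconstructed from the output above, and the names prev, same and run_id are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the output shown above.
df = pd.DataFrame({"id": [np.nan, np.nan, 1, 1, 2, 2, 2,
                          np.nan, np.nan, 1, 1, 1, np.nan]})

s = df["id"]
prev = s.shift()
# Two neighbours belong to the same run if they are equal or both NaN.
same = s.eq(prev) | (s.isna() & prev.isna())
same.iloc[0] = False                 # the first row always starts a run
run_id = (~same).cumsum()

counts = s.groupby(run_id).size().to_numpy()   # size() counts NaN rows too
ids = s[~same].to_numpy()                      # first value of each run
ser_out = pd.Series(counts, index=ids, name="counts")
print(ser_out)
```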