Cumulative Sum for Positive Numbers Only

One option is

x1 <- inverse.rle(within.list(rle(x), values[!!values] <- cumsum(values)[!!values]))
x[x1 != 0] <- ave(x[x1 != 0], x1[x1 != 0], FUN = seq_along)
x
#[1] 1 2 3 4 5 0 1 0 0 0 1 2

Or, as a one-liner:

x[x > 0] <- with(rle(x), sequence(lengths[!!values]))
x
#[1] 1 2 3 4 5 0 1 0 0 0 1 2
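For comparison, the same renumber-positives-within-each-run idea can be sketched in plain Python with itertools.groupby (an illustrative translation, not part of the R answer):

```python
from itertools import groupby

def renumber_positive_runs(xs):
    """Replace each positive value with its 1-based position within its
    run of consecutive positives; leave non-positive entries as 0."""
    out = []
    for positive, run in groupby(xs, key=lambda v: v > 0):
        n = len(list(run))
        out.extend(range(1, n + 1) if positive else [0] * n)
    return out

print(renumber_positive_runs([4, 7, 2, 9, 1, 0, 3, 0, 0, 0, 5, 6]))
# → [1, 2, 3, 4, 5, 0, 1, 0, 0, 0, 1, 2]
```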

Cumulative sum on time series split by consecutive negative or positive values

Grouping 0 with the positives, you can use the shift-compare-cumsum pattern:

In [33]: sign = df["values"] >= 0

In [34]: df["vsum"] = df["values"].groupby((sign != sign.shift()).cumsum()).cumsum()

In [35]: df
Out[35]:
        date  values  vsum
0 2017-05-01    1.00  1.00
1 2017-05-02    0.50  1.50
2 2017-05-03   -2.00 -2.00
3 2017-05-04   -1.00 -3.00
4 2017-05-05   -1.25 -4.25
5 2017-05-06    0.50  0.50
6 2017-05-07    0.50  1.00

which works because (sign != sign.shift()).cumsum() gives us a new number for each contiguous group:

In [36]: sign != sign.shift()
Out[36]:
0     True
1    False
2     True
3    False
4    False
5     True
6    False
Name: values, dtype: bool

In [37]: (sign != sign.shift()).cumsum()
Out[37]:
0    1
1    1
2    2
3    2
4    2
5    3
6    3
Name: values, dtype: int64
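Put together as a self-contained script (the frame is reconstructed from the output shown above), the whole pattern is:

```python
import pandas as pd

# Data reconstructed from the example output above
df = pd.DataFrame({
    "date": pd.date_range("2017-05-01", periods=7),
    "values": [1.0, 0.5, -2.0, -1.0, -1.25, 0.5, 0.5],
})

sign = df["values"] >= 0                 # True for non-negatives
group = (sign != sign.shift()).cumsum()  # new id for each contiguous run
df["vsum"] = df["values"].groupby(group).cumsum()

print(df["vsum"].tolist())
# → [1.0, 1.5, -2.0, -3.0, -4.25, 0.5, 1.0]
```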

Running total of positive and negative numbers where the sum cannot go below zero

Unfortunately, there is no way to do this without cycling through the records one-by-one. That, in turn, requires something like a recursive CTE.

with recursive t as (  -- RECURSIVE is required in Postgres
      select t.*, row_number() over (order by date) as seqnum
      from mytable t
     ),
cte as (
      select NULL as number, 0 as desired, 0 as seqnum
      union all
      select t.number,
             (case when cte.desired + t.number < 0 then 0
                   else cte.desired + t.number
              end),
             cte.seqnum + 1
      from cte join
           t
           on t.seqnum = cte.seqnum + 1
)
select cte.*
from cte
where cte.number is not null;

I would recommend this approach only if your data is rather small. But then again, if you have to do this, there are not many alternatives other than going through the table row-by-agonizing-row.

Here is a db<>fiddle (using Postgres).
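The row-by-row logic the recursive CTE implements is easier to see as a procedural sketch; here it is in plain Python (illustrative only, not part of the SQL answer):

```python
def clamped_running_total(numbers):
    """Running total that resets to 0 whenever it would go negative,
    mirroring the CASE expression in the recursive CTE."""
    total, out = 0, []
    for n in numbers:
        total = max(total + n, 0)
        out.append(total)
    return out

print(clamped_running_total([5, -3, -4, 6, -2]))
# → [5, 2, 0, 6, 4]
```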

Cumulative sum that resets when turning negative/positive

The data

I'm going to change the data in the example you provided.

df = pl.DataFrame(
    {
        "a": [11, 10, 10, 10, 9, 8, 8, 8, 8, 8, 15, 15, 15],
        "b": [11, 9, 9, 9, 9, 9, 10, 8, 8, 10, 11, 11, 15],
    }
)
print(df)
shape: (13, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 11 ┆ 11 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 10 ┆ 9 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 10 ┆ 9 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 10 ┆ 9 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 9 ┆ 9 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 8 ┆ 9 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 8 ┆ 10 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 8 ┆ 8 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 8 ┆ 8 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 8 ┆ 10 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 15 ┆ 11 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 15 ┆ 11 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 15 ┆ 15 │
└─────┴─────┘

Notice the cases where the two columns are the same. Your post didn't address what to do in these cases, so I made some assumptions as to what should happen. (You can adapt the code to handle those cases differently.)

The algorithm

df = (
    df
    .with_column((pl.col("a") - pl.col("b")).sign().alias("sign_a_minus_b"))
    .with_column(
        pl.when(pl.col("sign_a_minus_b") == 0)
        .then(None)
        .otherwise(pl.col("sign_a_minus_b"))
        .forward_fill()
        .alias("run_type")
    )
    .with_column(
        (pl.col("run_type") != pl.col("run_type").shift_and_fill(1, 0))
        .cumsum()
        .alias("run_id")
    )
    .with_column(pl.col("sign_a_minus_b").cumsum().over("run_id").alias("result"))
)
print(df)
shape: (13, 6)
┌─────┬─────┬────────────────┬──────────┬────────┬────────┐
│ a ┆ b ┆ sign_a_minus_b ┆ run_type ┆ run_id ┆ result │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ u32 ┆ i64 │
╞═════╪═════╪════════════════╪══════════╪════════╪════════╡
│ 11 ┆ 11 ┆ 0 ┆ null ┆ 1 ┆ 0 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 10 ┆ 9 ┆ 1 ┆ 1 ┆ 2 ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 10 ┆ 9 ┆ 1 ┆ 1 ┆ 2 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 10 ┆ 9 ┆ 1 ┆ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 9 ┆ 9 ┆ 0 ┆ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 8 ┆ 9 ┆ -1 ┆ -1 ┆ 3 ┆ -1 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 8 ┆ 10 ┆ -1 ┆ -1 ┆ 3 ┆ -2 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 8 ┆ 8 ┆ 0 ┆ -1 ┆ 3 ┆ -2 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 8 ┆ 8 ┆ 0 ┆ -1 ┆ 3 ┆ -2 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 8 ┆ 10 ┆ -1 ┆ -1 ┆ 3 ┆ -3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 15 ┆ 11 ┆ 1 ┆ 1 ┆ 4 ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 15 ┆ 11 ┆ 1 ┆ 1 ┆ 4 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 15 ┆ 15 ┆ 0 ┆ 1 ┆ 4 ┆ 2 │
└─────┴─────┴────────────────┴──────────┴────────┴────────┘

I've left the intermediate calculations in the output, merely to show how the algorithm works. (You can drop them.)

The basic idea is to calculate a run_id for each run of positive or negative values. We will then use the cumsum function and the over windowing expression to create a running count of positives/negatives over each run_id.

Key assumption: ties in columns a and b do not interrupt a run, but they do not contribute to the total for that run of positive/negative values.

sign_a_minus_b does two things: it identifies whether a run is positive/negative, and whether there is a tie in columns a and b.

run_type extends a run to cover any rows where columns a and b tie. The null value at the top of the column is intentional: it shows what happens when a tie occurs in the first row.

result is the output column. Note that tied columns do not interrupt a run, but they don't contribute to the totals for that run.

One final note: if ties in columns a and b are not allowed, then this algorithm can be simplified ... and run faster.
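To make the algorithm concrete, here is the same logic as a plain-Python sketch (an illustrative helper, not the Polars code), with the key assumption spelled out: a tie adds 0 to the current run's total, and a sign flip starts a new run:

```python
def signed_run_cumsum(a, b):
    """For each row, the running count of positive/negative sign(a - b)
    within the current run; ties keep the run alive but add nothing."""
    run_type, total, out = 0, 0, []
    for x, y in zip(a, b):
        s = (x > y) - (x < y)           # sign of a - b
        if s != 0 and s != run_type:    # sign flip: start a new run
            run_type, total = s, 0
        total += s                       # ties add 0
        out.append(total)
    return out

a = [11, 10, 10, 10, 9, 8, 8, 8, 8, 8, 15, 15, 15]
b = [11, 9, 9, 9, 9, 9, 10, 8, 8, 10, 11, 11, 15]
print(signed_run_cumsum(a, b))
# → [0, 1, 2, 3, 3, -1, -2, -2, -2, -3, 1, 2, 2]
```

This reproduces the result column of the table above, including the 0 for a tie in the first row.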

pandas calculate/show dataframe cumsum() only for positive values and other condition

Here is a solution (but there might be a more elegant one):

indexes = (df.col_2 == 'closed') & (df.col_values > 0)
df.loc[indexes, 'new_col'] = df.loc[indexes].groupby('col_1')['col_values'].cumsum()
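With some made-up data (the column names follow the question; the values are invented for illustration), it behaves like this; rows that fail either condition are left as NaN:

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "col_1": ["a", "a", "b", "a", "b"],
    "col_2": ["closed", "open", "closed", "closed", "closed"],
    "col_values": [10, 5, 3, -2, 4],
})

# Cumulative sum per col_1 group, only where col_2 is 'closed' and value > 0
indexes = (df.col_2 == "closed") & (df.col_values > 0)
df.loc[indexes, "new_col"] = df.loc[indexes].groupby("col_1")["col_values"].cumsum()

print(df)
# rows 1 and 3 stay NaN; the rest are per-group running totals
```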

Conditional cumulative sum

Here's another way.

> r <- rle(sign(t$v2))
> diff(c(0,cumsum(t$v1)[cumsum(r$lengths)]))[r$values==1]
[1] 8 10 12 2

It's easier to understand if you split it up; it works by picking out the right elements of the cumulative sum and subtracting them.

> (s <- cumsum(t$v1))
[1] 1 3 4 8 14 21 29 31 34 38 46 47 49
> (r <- rle(sign(t$v2)))
Run Length Encoding
lengths: int [1:7] 4 2 2 1 2 1 1
values : num [1:7] 1 -1 1 -1 1 -1 1
> (k <- cumsum(r$lengths))
[1] 4 6 8 9 11 12 13
> (a <- c(0,s[k]))
[1]  0  8 21 31 34 46 47 49
> (d <- diff(a))
[1] 8 13 10 3 12 1 2
> d[r$values==1]
[1] 8 10 12 2

Similarly, but without rle:

> k <- which(diff(c(sign(t$v2),0))!=0)
> diff(c(0,cumsum(t$v1)[k]))[t$v2[k]>0]
[1] 8 10 12 2
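The same computation can be sketched in plain Python (the vectors below are reconstructed from the cumulative sums and run lengths shown above; this is a translation, not part of the R answer):

```python
from itertools import groupby

def positive_run_sums(v1, v2):
    """Sum v1 over each run of consecutive positive values in v2,
    mirroring the rle-based R approach above."""
    out, i = [], 0
    for positive, run in groupby(v2, key=lambda v: v > 0):
        n = len(list(run))
        if positive:
            out.append(sum(v1[i:i + n]))
        i += n
    return out

v1 = [1, 2, 1, 4, 6, 7, 8, 2, 3, 4, 8, 1, 2]
v2 = [1, 1, 1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1]
print(positive_run_sums(v1, v2))
# → [8, 10, 12, 2]
```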

Cumulative sum of certain numbers in a dataframe ordered by date

You can do this with data.table:

library(data.table)
df <- as.data.table(df)

# Order by date
df <- df[order(date)]

# Perform the cumsum for positives and negatives separately
df[, expected := cumsum(values), by = sign(values)]

# Just for the negatives, get the previous positive value
df[, expected := ifelse(values > 0, expected, c(0, expected[-.N]))]

print(df)

         date values expected
1: 2016-12-05      5        5
2: 2016-12-07    -10        5
3: 2016-12-08     10       15
4: 2017-01-05      5       20
5: 2017-01-10     -7       20
6: 2017-01-11      8       28
7: 2017-01-11      8       36

Note that if there are two or more consecutive negative values, you have to repeat the operation. For instance, suppose your data frame is this one:

df <- data.frame(date = as.Date(c("2016-12-08", "2016-12-07", "2016-12-05", "2017-01-05",
                                  "2017-01-10", "2017-01-10", "2017-01-11", "2017-01-11")),
                 values = c(10, -10, 5, 5, -7, -15, 8, 8))

One single execution of the above code would produce the following output:

         date values expected
1: 2016-12-05      5        5
2: 2016-12-07    -10        5
3: 2016-12-08     10       15
4: 2017-01-05      5       20
5: 2017-01-10     -7       20
6: 2017-01-10    -15      -17
7: 2017-01-11      8       28
8: 2017-01-11      8       36

The value -17 is wrong. To avoid this problem, you can repeat the process until no negative values are left. So the full code would be:

df <- df[order(date)]
df[, expected := cumsum(values), by = sign(values)]

# If there are negative values, repeat the process
while (length(which(df$expected < 0))) {
  df[, expected := ifelse(values > 0, expected, c(0, expected[-.N]))]
}

print(df)
         date values expected
1: 2016-12-05      5        5
2: 2016-12-07    -10        5
3: 2016-12-08     10       15
4: 2017-01-05      5       20
5: 2017-01-10     -7       20
6: 2017-01-10    -15       20
7: 2017-01-11      8       28
8: 2017-01-11      8       36
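The fixed point this loop converges to is simply the running sum of the positive values, carried forward over negative rows; a plain-Python sketch (illustrative, not the data.table code):

```python
def positive_cumsum_carry(values):
    """Cumulative sum of positive values only; negative rows just
    repeat the running total accumulated so far."""
    total, out = 0, []
    for v in values:
        if v > 0:
            total += v
        out.append(total)
    return out

print(positive_cumsum_carry([5, -10, 10, 5, -7, -15, 8, 8]))
# → [5, 5, 15, 20, 20, 20, 28, 36]
```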

Perform cumulative sum over a column but reset to 0 if sum becomes negative in Pandas

A slight modification using np.frompyfunc (note that this method is slower than the numba solution below):

sumlm = np.frompyfunc(lambda a, b: 0 if a + b < 0 else a + b, 2, 1)
newx = sumlm.accumulate(df.Value.values, dtype=object)
newx
Out[147]: array([7, 9, 3, 0, 8, 8], dtype=object)

numba solution

from numba import njit

@njit
def cumli(x, lim):
    total = 0
    result = []
    for y in x:
        total += y
        if total < lim:
            total = 0
        result.append(total)
    return result

cumli(df.Value.values, 0)
Out[166]: [7, 9, 3, 0, 8, 8]
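A dependency-free alternative uses itertools.accumulate with a clamping reducer. The input vector below is reconstructed so the running sums match the output above; note that accumulate emits the first element unchanged, so this assumes it is non-negative:

```python
from itertools import accumulate

# Hypothetical input consistent with the running sums shown above
values = [7, 2, -6, -3, 8, 0]
clamped = list(accumulate(values, lambda total, v: max(total + v, 0)))
print(clamped)
# → [7, 9, 3, 0, 8, 8]
```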

