Cumulative Count of Each Value

The dplyr way:

library(dplyr)

foo <- data.frame(id=c(1, 2, 3, 2, 2, 1, 2, 3))
# row_number() gives the running count within each id group
foo <- foo %>% group_by(id) %>% mutate(count = row_number())
foo

# A tibble: 8 x 2
# Groups:   id [3]
     id count
  <dbl> <int>
1     1     1
2     2     1
3     3     1
4     2     2
5     2     3
6     1     2
7     2     4
8     3     2

That ends up grouped by id. If you want it not grouped, add %>% ungroup().
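For comparison, a minimal pandas sketch of the same per-group running count (cumcount() is zero-based, hence the + 1):

import pandas as pd

foo = pd.DataFrame({'id': [1, 2, 3, 2, 2, 1, 2, 3]})
# cumcount() numbers the rows within each id group starting at 0
foo['count'] = foo.groupby('id').cumcount() + 1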

How to get cumulative counts for each ID at that point in time, based on entries in another column, in a pandas DataFrame

Use isin to get True where the value in the CONDITION column is in the list manage_condition_list, then group by the CLIENT_ID column and take the cumulative sum (True counts as 1):

df['CONDITION_COUNTS'] = (
    df['CONDITION'].isin(manage_condition_list)  # True where the row is a condition of interest
      .groupby(df['CLIENT_ID'])                  # restart the count for each client
      .cumsum()
)
print(df)
    CLIENT_ID ENCOUNTER_DATE         CONDITION  CONDITION_COUNTS
0        8222     2020-01-01          positive                 1
1        8222     2020-03-02           treated                 2
2        8222     2020-04-18           treated                 3
3        8222     2020-07-31          negative                 3
4        8300     2017-06-10          negative                 0
5        8300     2017-09-11           treated                 1
6        8300     2018-02-01  future treatment                 2
7        8300     2018-04-01           treated                 3
8        8300     2018-05-31          negative                 3
9        8400     2020-12-31  future treatment                 1
10       8401     2017-08-29          negative                 0
11       8401     2017-09-15          positive                 1
12       8500     2018-10-10          positive                 1
Cumulative count, but if a value has already been seen, assign its previous count value

Building on the answer from Chris. Change sort to True if you want the groups numbered in alphabetical order.

import pandas as pd

dict_d = {'Item': ['b', 'a', 'c', 'd', 'c', 'd', 'e', 'b', 'f', 'g']}
df = pd.DataFrame(data=dict_d)
# ngroup() numbers groups in order of first appearance; +1 makes it 1-based
df['Order Number'] = df.groupby(pd.Grouper(key='Item', sort=False)).ngroup() + 1

print(df)

Output:

  Item  Order Number
0    b             1
1    a             2
2    c             3
3    d             4
4    c             3
5    d             4
6    e             5
7    b             1
8    f             6
9    g             7
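An equivalent one-liner, as a sketch, uses pd.factorize, which likewise numbers values in order of first appearance (pass sort=True to factorize for alphabetical numbering):

import pandas as pd

df = pd.DataFrame({'Item': ['b', 'a', 'c', 'd', 'c', 'd', 'e', 'b', 'f', 'g']})
# factorize returns zero-based codes in order of first appearance
df['Order Number'] = pd.factorize(df['Item'])[0] + 1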

Edit: to incorporate another column, simply add another grouper. For example (note that with multiple groupers, ngroup may number the groups in sorted order regardless of sort=False, depending on your pandas version, as the output below shows):

import pandas as pd

dict_d = {'Item': ['b', 'a', 'c', 'd', 'c', 'd', 'e', 'b', 'f', 'g'],
          'User_ID': ['111', '231', '431', '333', '132', '345', '253', '111', '335', '999']}
df = pd.DataFrame(data=dict_d)
df['Order Number'] = df.groupby([
    pd.Grouper(key='Item', sort=False),
    pd.Grouper(key='User_ID', sort=False)
]).ngroup() + 1

print(df)

Output:

  Item User_ID  Order Number
0    b     111             2
1    a     231             1
2    c     431             4
3    d     333             5
4    c     132             3
5    d     345             6
6    e     253             7
7    b     111             2
8    f     335             8
9    g     999             9

Getting a cumulative count of rows in each column excluding NAs in R

We can loop over the columns with lapply, convert each to logical with !is.na, take the cumsum, and assign the output back to the original (or to a copy of the original object). Make sure to use [] so the attributes of the original are preserved:

test1 <- test
test1[] <- lapply(test, function(x) cumsum(!is.na(x)))

Output:

head(test1)
     CT_CW.QA.RWL.H1A1Y CT_CW.QA.RWL.H1A1Z CT_CW.QA.RWL.H1A2Y CT_CW.QA.RWL.H1A2Z CT_CW.QA.RWL.H1A3Y CT_CW.QA.RWL.H1A3Z CT_CW.QA.RWL.H1A4Y
1812                  0                  0                  0                  0                  0                  0                  0
1813                  0                  0                  0                  0                  0                  0                  0
1814                  0                  0                  0                  0                  0                  0                  0
1815                  0                  0                  0                  0                  0                  0                  0
1816                  0                  0                  0                  0                  0                  0                  0
1817                  0                  0                  0                  0                  0                  0                  0
     CT_CW.QA.RWL.H1A4Z CT_CW.QA.RWL.H1A5Y CT_CW.QA.RWL.H1A5Z CT_CW.QA.RWL.H1A6Y CT_CW.QA.RWL.H1A6Z CT_CW.QA.RWL.H1A7Y CT_CW.QA.RWL.H1A7Z
1812                  0                  0                  1                  0                  0                  0                  0
1813                  0                  0                  2                  0                  0                  0                  0
1814                  0                  0                  3                  0                  1                  0                  0
1815                  0                  0                  4                  0                  2                  0                  0
1816                  0                  0                  5                  0                  3                  0                  0
1817                  0                  0                  6                  0                  4                  0                  0
     CT_CW.QA.RWL.H1A8Y
1812                  0
1813                  0
1814                  0
1815                  0
1816                  0
1817                  0
....

There is also a colCumsums from matrixStats:

library(matrixStats)
test1[] <- colCumsums(!is.na(test))

The issue with cumsum returning a single vector is that colSums returns one value per column; calling cumsum on that gives the cumulative sum across the column sums rather than a cumulative sum down each column.
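As an aside, the pandas analogue of this column-wise non-NA running count is a one-liner; a minimal sketch with made-up data:

import numpy as np
import pandas as pd

test = pd.DataFrame({'a': [np.nan, 1, np.nan, 2],
                     'b': [3, np.nan, 4, 5]})
# notna() gives a boolean frame; cumsum() then runs down each column
print(test.notna().cumsum())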

Cumulative count of column based on Month

Use groupby on the month (and year, to be safe) extracted from Period, then apply cumsum:

period = pd.to_datetime(df_2['Period'])
year_col = period.dt.year
month_col = period.dt.month
df_2['count'] = df_2.groupby([year_col, month_col])['size'].cumsum()

Result:

      Period Code  size  count
0 2022-04-29    A     2      2
1 2022-04-30    A     1      3
2 2022-05-01    A     3      3
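A slightly more compact variant, as a sketch assuming the same df_2, groups on a single monthly Period key instead of separate year and month columns:

# dt.to_period('M') packs year and month into one monthly key
month_key = pd.to_datetime(df_2['Period']).dt.to_period('M')
df_2['count'] = df_2.groupby(month_key)['size'].cumsum()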

Pandas groupby cumcount - one cumulative count rather than a cumulative count for each unique value

Let's sort the df, flag rows where game_id differs from the previous row of the same name, build the counter with a cumulative sum within each name, and then restore the original order:

new_df = df.sort_values(by=['name', 'game_id'])
# flag rows where game_id changes from the previous row of the same name
# (shifting within the group avoids a stray carry-over at name boundaries)
new_df = new_df.assign(rank=new_df['game_id'] != new_df.groupby('name')['game_id'].shift())
# running count of those changes per name, then back to the original order
new_df = new_df.assign(rank=new_df.groupby('name')['rank'].cumsum()).sort_index()
print(new_df)



  name  game_id  rank
0  pam        0     1
1  pam        0     1
2  bob        1     1
3  bob        1     1
4  pam        0     1
5  bob        2     2
6  pam        1     2
7  bob        2     2
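An alternative sketch gets the same result by factorizing game_id within each name. Note that factorize numbers values by first appearance rather than by sorted value; the two happen to coincide in this data:

import pandas as pd

# assumes the same df as above, with 'name' and 'game_id' columns
df['rank'] = (df.groupby('name')['game_id']
                .transform(lambda s: pd.factorize(s)[0] + 1))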

How to find out the cumulative count between numbers?

If performance is important, count consecutive 0 values in the diff column:

m = df['diff'].eq(0)   # True where the value repeats
b = m.cumsum()         # running total of repeats so far
# subtracting the total as of the last non-repeat row restarts
# the count at 0 after every break in the zero-run
df['out'] = b.sub(b.mask(m).ffill().fillna(0)).astype(int)
print(df)
    Value  diff need  out
0       6   NaN   na    0
1       5  -1.0    0    0
2       5   0.0    1    1
3       5   0.0    2    2
4       4  -1.0    0    0
5       4   0.0    1    1
6       4   0.0    2    2
7       4   0.0    3    3
8       4   0.0    4    4
9       5   1.0    0    0
10      5   0.0    1    1
11      5   0.0    2    2
12      5   0.0    3    3
13      6   1.0    0    0
14      7   1.0    0    0
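For completeness, a self-contained sketch of the whole pipeline, assuming diff is simply the row-to-row difference of Value:

import pandas as pd

df = pd.DataFrame({'Value': [6, 5, 5, 5, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 7]})
df['diff'] = df['Value'].diff()

m = df['diff'].eq(0)   # True where the value repeats
b = m.cumsum()         # running total of repeats
df['out'] = b.sub(b.mask(m).ffill().fillna(0)).astype(int)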

