Cumulative count of each value
The dplyr way:
library(dplyr)
foo <- data.frame(id=c(1, 2, 3, 2, 2, 1, 2, 3))
foo <- foo %>% group_by(id) %>% mutate(count=row_number())
foo
# A tibble: 8 x 2
# Groups: id [3]
id count
<dbl> <int>
1 1 1
2 2 1
3 3 1
4 2 2
5 2 3
6 1 2
7 2 4
8 3 2
That ends up grouped by id. If you want it not grouped, add %>% ungroup().
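For comparison, the same per-group running count can be sketched in pandas with groupby.cumcount (this equivalent is not part of the original answer):

```python
import pandas as pd

# Same data as the dplyr example above
foo = pd.DataFrame({"id": [1, 2, 3, 2, 2, 1, 2, 3]})

# cumcount() is 0-based, so add 1 to match dplyr's row_number()
foo["count"] = foo.groupby("id").cumcount() + 1
print(foo)
```

This produces the same counts as the dplyr output, without leaving the frame grouped.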
How to get cumulative counts for each ID at that point in time, based on entries in another column, in a pandas DataFrame
Use isin on the CONDITION column to get True where the value is in the list manage_condition_list, then take a groupby cumsum by the CLIENT_ID column:
df['CONDITION_COUNTS'] = (
df['CONDITION'].isin(manage_condition_list)
.groupby(df['CLIENT_ID']).cumsum()
)
print(df)
CLIENT_ID ENCOUNTER_DATE CONDITION CONDITION_COUNTS
0 8222 2020-01-01 positive 1
1 8222 2020-03-02 treated 2
2 8222 2020-04-18 treated 3
3 8222 2020-07-31 negative 3
4 8300 2017-06-10 negative 0
5 8300 2017-09-11 treated 1
6 8300 2018-02-01 future treatment 2
7 8300 2018-04-01 treated 3
8 8300 2018-05-31 negative 3
9 8400 2020-12-31 future treatment 1
10 8401 2017-08-29 negative 0
11 8401 2017-09-15 positive 1
12 8500 2018-10-10 positive 1
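The pattern works because isin yields booleans, and True sums as 1. A minimal self-contained sketch, where the contents of manage_condition_list are inferred from the output above (negative rows do not increment):

```python
import pandas as pd

# Small reproduction of the first five rows shown above
df = pd.DataFrame({
    "CLIENT_ID": [8222, 8222, 8222, 8222, 8300],
    "CONDITION": ["positive", "treated", "treated", "negative", "negative"],
})
manage_condition_list = ["positive", "treated", "future treatment"]

# isin gives a boolean Series; cumsum per client turns it into a running count
df["CONDITION_COUNTS"] = (
    df["CONDITION"].isin(manage_condition_list)
    .groupby(df["CLIENT_ID"]).cumsum()
)
print(df)
```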
Cumulative count but not if already seen, then assign previous count value
Building on the answer from Chris. Change sort to True if you want the numbering in alphabetical order.
import pandas as pd
dict_d = {'Item': ['b', 'a','c','d','c','d','e','b','f','g']}
df = pd.DataFrame(data=dict_d)
df['Order Number'] = df.groupby(pd.Grouper(key='Item',sort=False)).ngroup()+1
print(df)
Output:
Item Order Number
0 b 1
1 a 2
2 c 3
3 d 4
4 c 3
5 d 4
6 e 5
7 b 1
8 f 6
9 g 7
Edit: To incorporate another column, simply add another grouper. For example:
import pandas as pd
dict_d = {'Item': ['b', 'a','c','d','c','d','e','b','f','g'],
'User_ID': ['111', '231','431','333','132','345','253','111','335','999']}
df = pd.DataFrame(data=dict_d)
df['Order Number'] = df.groupby([
pd.Grouper(key='Item',sort=False),
pd.Grouper(key='User_ID',sort=False)
]).ngroup()+1
print(df)
Output:
Item User_ID Order Number
0 b 111 2
1 a 231 1
2 c 431 4
3 d 333 5
4 c 132 3
5 d 345 6
6 e 253 7
7 b 111 2
8 f 335 8
9 g 999 9
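For the single-column case, a shorter route to the same first-seen numbering is pd.factorize, which assigns 0-based codes in order of first appearance. This is a sketch of an alternative, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({"Item": ["b", "a", "c", "d", "c", "d", "e", "b", "f", "g"]})

# factorize codes values 0, 1, 2, ... in first-appearance order;
# +1 shifts to 1-based numbering like ngroup()+1 with sort=False
df["Order Number"] = pd.factorize(df["Item"])[0] + 1
print(df)
```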
Getting a cumulative count of rows in each column excluding NAs in R
We can loop over the columns with lapply, convert to logical, do the cumsum, and assign the output back to the original (or a copy of the original object). Make sure to use [] to preserve the attributes.
test1 <- test
test1[] <- lapply(test, function(x) cumsum(!is.na(x)))
Output:
head(test1)
CT_CW.QA.RWL.H1A1Y CT_CW.QA.RWL.H1A1Z CT_CW.QA.RWL.H1A2Y CT_CW.QA.RWL.H1A2Z CT_CW.QA.RWL.H1A3Y CT_CW.QA.RWL.H1A3Z CT_CW.QA.RWL.H1A4Y
1812 0 0 0 0 0 0 0
1813 0 0 0 0 0 0 0
1814 0 0 0 0 0 0 0
1815 0 0 0 0 0 0 0
1816 0 0 0 0 0 0 0
1817 0 0 0 0 0 0 0
CT_CW.QA.RWL.H1A4Z CT_CW.QA.RWL.H1A5Y CT_CW.QA.RWL.H1A5Z CT_CW.QA.RWL.H1A6Y CT_CW.QA.RWL.H1A6Z CT_CW.QA.RWL.H1A7Y CT_CW.QA.RWL.H1A7Z
1812 0 0 1 0 0 0 0
1813 0 0 2 0 0 0 0
1814 0 0 3 0 1 0 0
1815 0 0 4 0 2 0 0
1816 0 0 5 0 3 0 0
1817 0 0 6 0 4 0 0
CT_CW.QA.RWL.H1A8Y
1812 0
1813 0
1814 0
1815 0
1816 0
1817 0
....
There is also a colCumsums function in matrixStats:
library(matrixStats)
test1[] <- colCumsums(!is.na(test))
The issue is that colSums returns a single vector with one value per column, so calling cumsum on it gives the cumulative sum of the column sums rather than a cumulative sum within each column.
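The pandas analogue of this column-wise NA-excluding count is a short sketch (not from the original R answer): notna() gives a boolean frame, and cumsum() runs down each column independently.

```python
import numpy as np
import pandas as pd

# Toy frame with NAs, standing in for the wide `test` object above
test = pd.DataFrame({
    "a": [np.nan, 1.0, np.nan, 2.0],
    "b": [np.nan, np.nan, 3.0, 4.0],
})

# Per-column running count of non-NA values
counts = test.notna().cumsum()
print(counts)
```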
Cumulative count of column based on Month
Use groupby on the month (and year, to be safe) extracted from Period, and apply cumsum:
year_col = pd.to_datetime(df_2['Period']).dt.year
month_col = pd.to_datetime(df_2['Period']).dt.month
df_2['count'] = df_2.groupby([year_col, month_col])['size'].cumsum()
Result:
Period Code size count
0 2022-04-29 A 2 2
1 2022-04-30 A 1 3
2 2022-05-01 A 3 3
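An equivalent grouping key, assuming Period parses as dates, is a single monthly period via dt.to_period('M'), which folds year and month into one value:

```python
import pandas as pd

# Data as shown in the result above
df_2 = pd.DataFrame({
    "Period": ["2022-04-29", "2022-04-30", "2022-05-01"],
    "Code": ["A", "A", "A"],
    "size": [2, 1, 3],
})

# to_period('M') yields e.g. 2022-04, so year and month are one key
month = pd.to_datetime(df_2["Period"]).dt.to_period("M")
df_2["count"] = df_2.groupby(month)["size"].cumsum()
print(df_2)
```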
Pandas groupby cumcount - one cumulative count rather than a cumulative count for each unique value
Let's sort the df, check consecutive differences, create a new group with cumsum, and then restore the original order:
new_df=df.sort_values(by=['name','game_id'])
new_df=new_df.assign(rank=new_df['game_id']!=new_df['game_id'].shift())
new_df=new_df.assign(rank=new_df.groupby('name')['rank'].cumsum()).sort_index()
print(new_df)
name game_id rank
0 pam 0 1
1 pam 0 1
2 bob 1 1
3 bob 1 1
4 pam 0 1
5 bob 2 2
6 pam 1 2
7 bob 2 2
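If game_id is orderable, the same result can be sketched with a dense rank per name (a different approach from the original answer): equal game_ids share a rank and each new distinct value bumps it by one.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["pam", "pam", "bob", "bob", "pam", "bob", "pam", "bob"],
    "game_id": [0, 0, 1, 1, 0, 2, 1, 2],
})

# rank(method="dense") gives 1, 2, 3, ... over distinct values within each name
df["rank"] = df.groupby("name")["game_id"].rank(method="dense").astype(int)
print(df)
```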
How to find out the cumulative count between numbers?
If performance is important, count consecutive 0 values from the difference column:
m = df['diff'].eq(0)
b = m.cumsum()
df['out'] = b.sub(b.mask(m).ffill().fillna(0)).astype(int)
print (df)
Value diff need out
0 6 NaN na 0
1 5 -1.0 0 0
2 5 0.0 1 1
3 5 0.0 2 2
4 4 -1.0 0 0
5 4 0.0 1 1
6 4 0.0 2 2
7 4 0.0 3 3
8 4 0.0 4 4
9 5 1.0 0 0
10 5 0.0 1 1
11 5 0.0 2 2
12 5 0.0 3 3
13 6 1.0 0 0
14 7 1.0 0 0
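A self-contained sketch of the same reset-on-change counter, building the diff column from Value (column names match the example above): masking the running total at each non-repeat and forward-filling gives the baseline to subtract, which resets the count to 0.

```python
import pandas as pd

df = pd.DataFrame({"Value": [6, 5, 5, 5, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 7]})
df["diff"] = df["Value"].diff()

m = df["diff"].eq(0)   # True where the value repeats the previous one
b = m.cumsum()         # running total of repeats so far
# subtract the total as of the last non-repeat to restart each streak at 0
df["out"] = b.sub(b.mask(m).ffill().fillna(0)).astype(int)
print(df)
```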