Count Number of Rows Per Group and Add Result to Original Data Frame


Using data.table:

library(data.table)
dt = as.data.table(df)

# or coerce to data.table by reference:
# setDT(df)

dt[ , count := .N, by = .(name, type)]
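To make the mechanics of count := .N, by = .(name, type) concrete, here is a minimal plain-Python sketch (standard library only, not data.table itself). It counts the rows for each (name, type) pair in one pass, then copies each row's group size onto that row, keeping the original row order. It assumes rows are dicts keyed by column name; the helper add_group_count and the sample rows are hypothetical.

```python
from collections import Counter


def add_group_count(rows, keys, col="count"):
    """Return new row dicts, each carrying the size of its group
    (rows sharing the same values in `keys`) under `col`."""
    def key_of(row):
        return tuple(row[k] for k in keys)

    sizes = Counter(key_of(r) for r in rows)             # one pass: group -> row count
    return [{**r, col: sizes[key_of(r)]} for r in rows]  # broadcast back, order kept


# Hypothetical sample rows (not from the original question).
rows = [
    {"name": "a", "type": "x"},
    {"name": "a", "type": "x"},
    {"name": "a", "type": "y"},
    {"name": "b", "type": "x"},
]
for r in add_group_count(rows, ["name", "type"]):
    print(r)
```

The input rows are left untouched; the helper builds new dicts rather than adding the column in place the way := does.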

For an alternative that works with data.table versions before 1.8.2, see the original answer's edit history.


Using dplyr:

library(dplyr)
df %>%
  group_by(name, type) %>%
  mutate(count = n())

Or, more simply, use add_count(), which keeps every row and stores the count in a new column named n:

add_count(df, name, type)

Using plyr:

plyr::ddply(df, .(name, type), transform, count = length(num))
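Unlike the data.table and dplyr versions, plyr::ddply() rebuilds the result from per-group pieces, so rows come back ordered by group rather than in their original order. A minimal plain-Python sketch of that split-apply-combine flow, assuming dict rows; the helper split_apply_combine and the rows are hypothetical:

```python
def split_apply_combine(rows, keys, col="count"):
    """Split rows by `keys`, compute each group's size, and stitch the
    groups back together in sorted key order."""
    groups = {}
    for r in rows:                                   # split
        groups.setdefault(tuple(r[k] for k in keys), []).append(r)
    out = []
    for key in sorted(groups):                       # combine in key order
        members = groups[key]
        out.extend({**r, col: len(members)} for r in members)  # apply: group size
    return out


# Hypothetical rows, deliberately not sorted by group.
rows = [
    {"name": "b", "type": "x", "num": 1},
    {"name": "a", "type": "x", "num": 2},
    {"name": "b", "type": "x", "num": 3},
]
for r in split_apply_combine(rows, ["name", "type"]):
    print(r)
```

Within each group the rows keep their relative order; only the groups themselves are reordered.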

How to calculate number of rows per group in pandas dataframe and add it to original data

You are looking for a transform:

df['window_count'] = df.groupby(['ID','CHAMBER_TYPE','COMMODITY_CODE','DELIVERY_TYPE','DAY'])['ID'].transform('size')

By the way, there is no 'CHAMBER_TYPE' column in your sample data.
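transform() returns one value per original row, which is why its result can be assigned straight back as a column. With 'size' the selected column (['ID'] above) makes no difference, whereas 'count' only counts the non-missing values of that column. A plain-Python sketch of the two behaviours, assuming dict rows, None standing in for NaN, and grouping keys that are never missing; the helpers and rows are hypothetical:

```python
from collections import Counter


def group_key(row, keys):
    return tuple(row[k] for k in keys)


def transform_size(rows, keys):
    """Per row: number of rows in its group, missing values included
    (the idea behind groupby(...).transform('size'))."""
    sizes = Counter(group_key(r, keys) for r in rows)
    return [sizes[group_key(r, keys)] for r in rows]


def transform_count(rows, keys, col):
    """Per row: number of non-missing values of `col` in its group
    (the idea behind groupby(...)[col].transform('count'))."""
    counts = Counter(group_key(r, keys) for r in rows if r[col] is not None)
    return [counts[group_key(r, keys)] for r in rows]


# Hypothetical rows; the second one has a missing VALUE.
rows = [
    {"ID": 1, "DAY": "Mon", "VALUE": 10},
    {"ID": 1, "DAY": "Mon", "VALUE": None},
    {"ID": 2, "DAY": "Tue", "VALUE": 5},
]
print(transform_size(rows, ["ID", "DAY"]))            # [2, 2, 1]
print(transform_count(rows, ["ID", "DAY"], "VALUE"))  # [1, 1, 1]
```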

Count number of rows in a data frame in R based on group

Here's an example showing how table() (or, to match your desired output more closely, data.frame(table())) does what you seem to be asking for.

Note also how the sample data is shared in a reproducible form that others can copy and paste into their own session.

Here's the (reproducible) sample data:

mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L),
                       MONTH.YEAR = c("JAN. 2012", "JAN. 2012",
                                      "FEB. 2012", "FEB. 2012",
                                      "MAR. 2012"),
                       VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)),
                  .Names = c("ID", "MONTH.YEAR", "VALUE"),
                  class = "data.frame", row.names = c(NA, -5L))

mydf
#    ID MONTH.YEAR VALUE
# 1 110  JAN. 2012  1000
# 2 111  JAN. 2012  2000
# 3 121  FEB. 2012  3000
# 4 131  FEB. 2012  4000
# 5 141  MAR. 2012  5000

Here's the calculation of the number of rows per group, in two output display formats:

table(mydf$MONTH.YEAR)
#
# FEB. 2012 JAN. 2012 MAR. 2012
#         2         2         1

data.frame(table(mydf$MONTH.YEAR))
#        Var1 Freq
# 1 FEB. 2012    2
# 2 JAN. 2012    2
# 3 MAR. 2012    1
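The same frequency table can be sketched in plain Python with collections.Counter, using the MONTH.YEAR values from mydf. The helper freq_table is hypothetical. It sorts values by code point, which matches table()'s alphabetical level order here but can differ from R's locale-dependent ordering for other data:

```python
from collections import Counter


def freq_table(values):
    """(value, n) pairs for each distinct value, sorted by value."""
    return sorted(Counter(values).items())


month_year = ["JAN. 2012", "JAN. 2012", "FEB. 2012", "FEB. 2012", "MAR. 2012"]
for value, n in freq_table(month_year):
    print(value, n)
```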

Add a column that counts the number of rows until the first 1, by group, in R

df <- data.frame(Group = c(1, 1, 1, 1, 2, 2),
                 var1 = c(1, 0, 0, 1, 1, 1),
                 var2 = c(0, 0, 1, 1, 0, 0),
                 var3 = c(0, 1, 0, 0, 0, 1))

This works for any number of variables, as long as the data has the same structure as in the example (a Group column plus any number of variables that are 0 or 1).

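library(dplyr)
library(tidyr)

df %>%
  mutate(rownr = row_number()) %>%     # row id so pivot_wider() can rebuild each row
  pivot_longer(-c(Group, rownr)) %>%   # one row per original row and variable
  group_by(Group, name) %>%
  # 1 + number of leading 0s seen so far, so max(out) is the position of the first 1
  mutate(out = cumsum(value != 1 & (cumsum(value) < 1)) + 1,
         # if a group has no 1 at all, max(out) exceeds n(): use 0 instead
         out = ifelse(max(out) > n(), 0, max(out))) %>%
  pivot_wider(names_from = name, values_from = c(value, out)) %>%
  select(-rownr)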

Returns:

  Group value_var1 value_var2 value_var3 out_var1 out_var2 out_var3
  <dbl>      <dbl>      <dbl>      <dbl>    <dbl>    <dbl>    <dbl>
1     1          1          0          0        1        3        2
2     1          0          0          1        1        3        2
3     1          0          1          0        1        3        2
4     1          1          1          0        1        3        2
5     2          1          0          0        1        0        2
6     2          1          0          1        1        0        2
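Each out_<var> column holds the 1-based position of the first 1 within the row's group, or 0 when the group has no 1. A plain-Python sketch of that rule on the same data, independent of the tidyr reshaping; it assumes rows are dicts in their original order, and the helper names are hypothetical:

```python
def first_one_position(values):
    """1-based position of the first 1 in `values`, or 0 if there is none."""
    for i, v in enumerate(values, start=1):
        if v == 1:
            return i
    return 0


def add_first_one_columns(rows, group_col, var_cols):
    """Give every row an out_<var> field: where the first 1 of <var>
    occurs within the row's group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group_col], []).append(r)
    positions = {
        (g, v): first_one_position([r[v] for r in members])
        for g, members in groups.items()
        for v in var_cols
    }
    return [
        {**r, **{f"out_{v}": positions[(r[group_col], v)] for v in var_cols}}
        for r in rows
    ]


# Same data as the R example, stored column-wise and turned into rows.
columns = {
    "Group": [1, 1, 1, 1, 2, 2],
    "var1": [1, 0, 0, 1, 1, 1],
    "var2": [0, 0, 1, 1, 0, 0],
    "var3": [0, 1, 0, 0, 0, 1],
}
rows = [dict(zip(columns, vals)) for vals in zip(*columns.values())]
for r in add_first_one_columns(rows, "Group", ["var1", "var2", "var3"]):
    print(r)
```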

Pandas, group by count and add count to original dataframe?

If I understand correctly, you want transform():

In [247]: df['count'] = df.groupby('kind')['kind'].transform('count')

In [248]: df
Out[248]:
kind msg count
0 aaa aaa text 1 3
1 aaa aaa text 2 3
2 aaa aaa text 3 3
3 bb bb text 1 4
4 bb bb text 2 4
5 bb bb text 3 4
6 bb bb text 4 4
7 cccc cccc text 1 2
8 cccc cccc text 2 2
9 dd dd text 1 1
10 e e text 1 1
11 fff fff text 1 1

Sorting by the new count column:

In [249]: df.sort_values('count', ascending=False)
Out[249]:
kind msg count
3 bb bb text 1 4
4 bb bb text 2 4
5 bb bb text 3 4
6 bb bb text 4 4
0 aaa aaa text 1 3
1 aaa aaa text 2 3
2 aaa aaa text 3 3
7 cccc cccc text 1 2
8 cccc cccc text 2 2
9 dd dd text 1 1
10 e e text 1 1
11 fff fff text 1 1
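Rows with equal counts (for example the three aaa rows) stay in their original order above. Python's sorted() guarantees this, even with reverse=True. pandas' sort_values() uses quicksort by default, which is not guaranteed to be stable, so pass kind='stable' if the tie order matters. A plain-Python sketch using the kind values from this example, where idx stands for the original index:

```python
from collections import Counter

# The kind column from the example above; msg is omitted.
kinds = ["aaa"] * 3 + ["bb"] * 4 + ["cccc"] * 2 + ["dd", "e", "fff"]
rows = [{"idx": i, "kind": k} for i, k in enumerate(kinds)]

# Attach each row's group size.
sizes = Counter(r["kind"] for r in rows)
for r in rows:
    r["count"] = sizes[r["kind"]]

# sorted() is stable, also with reverse=True, so ties keep their original order.
ordered = sorted(rows, key=lambda r: r["count"], reverse=True)
print([r["idx"] for r in ordered])  # [3, 4, 5, 6, 0, 1, 2, 7, 8, 9, 10, 11]
```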

Count number of rows within each group

Current best practice (tidyverse) is:

library(dplyr)
df1 %>% count(Year, Month)
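Note that count() collapses the data to one row per Year/Month combination; to keep every original row and add the count as a column, use add_count() instead. A plain-Python sketch of the collapsed shape, with a hypothetical helper count_by and made-up rows, since df1 is not shown:

```python
from collections import Counter


def count_by(rows, keys):
    """One (key..., n) tuple per distinct key combination, ordered by key."""
    sizes = Counter(tuple(r[k] for k in keys) for r in rows)
    return [(*key, n) for key, n in sorted(sizes.items())]


# Hypothetical rows (df1 from the question is not shown).
rows = [
    {"Year": 2020, "Month": 1},
    {"Year": 2020, "Month": 1},
    {"Year": 2020, "Month": 2},
    {"Year": 2021, "Month": 1},
]
print(count_by(rows, ["Year", "Month"]))
# [(2020, 1, 2), (2020, 2, 1), (2021, 1, 1)]
```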

Count rows in data table with certain values by group

You can solve it as follows:

library(data.table)
# df must already be a data.table (e.g. after setDT(df))
cols <- c("number_of_offices", "number_of_apartments")
df[, (cols) := .(sum(Type == "office"), sum(Type == "apartment")), by = Property]

# Property Type number_of_offices number_of_apartments
# 1: 1 apartment 1 1
# 2: 1 office 1 1
# 3: 2 office 2 0
# 4: 2 office 2 0
# 5: 3 apartment 1 2
# 6: 3 apartment 1 2
# 7: 3 office 1 2
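The same conditional counts can be sketched in plain Python by counting (Property, Type) pairs once and looking up each wanted type per row; a pair that never occurs counts as 0, as for the apartments of property 2. The helper add_type_counts is hypothetical; the rows reproduce the output above:

```python
from collections import Counter


def add_type_counts(rows, group_col, type_col, wanted):
    """Attach, to every row, how many rows of its group have each wanted type.

    `wanted` maps an output column name to the type value it counts."""
    pair_counts = Counter((r[group_col], r[type_col]) for r in rows)
    return [
        {**r, **{col: pair_counts[(r[group_col], t)] for col, t in wanted.items()}}
        for r in rows
    ]


pairs = [(1, "apartment"), (1, "office"), (2, "office"), (2, "office"),
         (3, "apartment"), (3, "apartment"), (3, "office")]
rows = [{"Property": p, "Type": t} for p, t in pairs]
wanted = {"number_of_offices": "office", "number_of_apartments": "apartment"}
for r in add_type_counts(rows, "Property", "Type", wanted):
    print(r)
```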

Count observations of distinct values per group and add a new column of counts for each value

Or, without any additional package, you can simply use base R's table():

table(df$group, df$letter)

As you seem to be working with data.table, you can also use dcast():

dcast(df, group ~ letter, fun.aggregate = length)
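Both calls build a two-way table with one row per group and one column per letter, zero cells included. A plain-Python sketch of that cross-tabulation, with a hypothetical helper crosstab and made-up rows (the question's df is not shown):

```python
from collections import Counter


def crosstab(rows, row_key, col_key):
    """Two-way count table {row_value: {col_value: n}}, zero cells included."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_vals = sorted({r[row_key] for r in rows})
    col_vals = sorted({r[col_key] for r in rows})
    return {rv: {cv: counts[(rv, cv)] for cv in col_vals} for rv in row_vals}


# Hypothetical rows.
rows = [
    {"group": 1, "letter": "a"},
    {"group": 1, "letter": "a"},
    {"group": 1, "letter": "b"},
    {"group": 2, "letter": "b"},
    {"group": 2, "letter": "c"},
]
for group, cells in crosstab(rows, "group", "letter").items():
    print(group, cells)
```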

