Count Number of Records and Generate Row Number Within Each Group in a data.table

Count number of records and generate row number within each group in a data.table

Using .N:

DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ]
# VAL COUNT IDX
# 1: 1 3 1
# 2: 2 4 1
# 3: 2 4 2
# 4: 3 3 1
# 5: 1 3 2
# 6: 3 3 2
# 7: 3 3 3
# 8: 2 4 3
# 9: 2 4 4
#10: 1 3 3

.N is the number of records in each group, with groups defined by "VAL". Because .N is the group size, 1:.N numbers the rows within each group.
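For a version that can be pasted into a fresh session, here is a minimal sketch. The VAL values are an assumption, chosen so that the result matches the output printed above; seq_len(.N) is used in place of 1:.N, which gives the same numbering.

```r
library(data.table)

# Assumed sample data: VAL values chosen to match the output printed above
DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# COUNT = size of the VAL group, IDX = running row number within the group
DT[, `:=`(COUNT = .N, IDX = seq_len(.N)), by = VAL]

# := updates DT by reference and prints nothing; append [] to print the result
DT[]
```

Note that the original row order is kept; only the two new columns are added.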

Count number of groups with single rows in an R data.table

If you only need to count the groups that contain exactly one row, you can do

library(data.table)
nrow(dt[ , .(count = .N), by = .(name, type)][count == 1])

Or:

sum(dt[ , .(count = .N), by = .(name, type)]$count == 1)

If you want to subset the rows that belong to groups with exactly one row, you can do

dt[, .SD[.N == 1], by = .(name, type)]

Calling nrow on this result again gives the number of such groups.
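The approaches above can be tried on a small table. This is only a sketch: the data, including the num column, is hypothetical and built so that two of the three (name, type) groups have a single row.

```r
library(data.table)

# Hypothetical data: three (name, type) groups, two of them with one row
dt <- data.table(
  name = c("a", "a", "b", "c"),
  type = c("x", "x", "x", "y"),
  num  = 1:4
)

# One row per group with its size, then count the groups of size 1
counts <- dt[, .(count = .N), by = .(name, type)]
n_single <- counts[count == 1, .N]

# The rows that belong to single-row groups
singles <- dt[, .SD[.N == 1], by = .(name, type)]
```

Inside .(), columns are named with = as in a normal list; := is only for adding or updating columns in the original table.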

Count rows in data table with certain values by group

You can solve it as follows:

cols <- c("number_of_offices", "number_of_apartments")
df[, (cols) := .(sum(Type == "office"), sum(Type == "apartment")), Property]

# Property Type number_of_offices number_of_apartments
# 1: 1 apartment 1 1
# 2: 1 office 1 1
# 3: 2 office 2 0
# 4: 2 office 2 0
# 5: 3 apartment 1 2
# 6: 3 apartment 1 2
# 7: 3 office 1 2
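A self-contained sketch of the same idea, with the input rebuilt from the output shown above (so the exact column types are an assumption). It relies on the fact that summing a logical vector counts its TRUE values.

```r
library(data.table)

# Sample data reconstructed from the output above
df <- data.table(
  Property = c(1, 1, 2, 2, 3, 3, 3),
  Type = c("apartment", "office", "office", "office",
           "apartment", "apartment", "office")
)

cols <- c("number_of_offices", "number_of_apartments")

# (cols) makes := use the names stored in cols instead of creating a column "cols";
# each sum() counts the rows of the group that match the condition
df[, (cols) := .(sum(Type == "office"), sum(Type == "apartment")), by = Property]
```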

Numbering rows within groups in a data frame

Use ave, ddply, dplyr or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

or (the most memory efficient, as it assigns by reference within DT):

library(data.table)
DT <- data.table(df)

DT[, id := seq_len(.N), by = cat]
# or, equivalently, with rowid():
DT[, id := rowid(cat)]
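To check that the two data.table variants give the same numbering, here is a small sketch on hypothetical data with an unsorted grouping column. The column names id_seq and id_rowid are made up for the comparison.

```r
library(data.table)

# Hypothetical data: the groups in cat are deliberately interleaved
DT <- data.table(
  cat = c("a", "b", "a", "c", "b", "a"),
  val = c(10, 20, 30, 40, 50, 60)
)

DT[, id_seq := seq_len(.N), by = cat]  # grouped assignment
DT[, id_rowid := rowid(cat)]           # same numbering without by

identical(DT$id_seq, DT$id_rowid)
```

Both number the rows in their order of appearance within each group, so no sorting is needed beforehand.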

Count number of rows in a data frame in R based on group

Here's an example that shows how table(.) (or, more closely matching your desired output, data.frame(table(.))) does what it sounds like you are asking for.

Note also how to share reproducible sample data in a way that others can copy and paste into their session.

Here's the (reproducible) sample data:

mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L), 
MONTH.YEAR = c("JAN. 2012", "JAN. 2012",
"FEB. 2012", "FEB. 2012",
"MAR. 2012"),
VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)),
.Names = c("ID", "MONTH.YEAR", "VALUE"),
class = "data.frame", row.names = c(NA, -5L))

mydf
# ID MONTH.YEAR VALUE
# 1 110 JAN. 2012 1000
# 2 111 JAN. 2012 2000
# 3 121 FEB. 2012 3000
# 4 131 FEB. 2012 4000
# 5 141 MAR. 2012 5000

Here's the calculation of the number of rows per group, in two output display formats:

table(mydf$MONTH.YEAR)
#
# FEB. 2012 JAN. 2012 MAR. 2012
# 2 2 1

data.frame(table(mydf$MONTH.YEAR))
# Var1 Freq
# 1 FEB. 2012 2
# 2 JAN. 2012 2
# 3 MAR. 2012 1
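If the default Var1 and Freq names are not wanted, one possible variation (a sketch, not part of the original answer) is to name the table dimension and pass responseName to as.data.frame. The name COUNT is just an illustration.

```r
# Sample data from above, restated so the snippet runs on its own
mydf <- data.frame(
  ID = c(110L, 111L, 121L, 131L, 141L),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012", "FEB. 2012", "MAR. 2012"),
  VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)
)

# Naming the table dimension and the response gives descriptive column names
counts <- as.data.frame(table(MONTH.YEAR = mydf$MONTH.YEAR), responseName = "COUNT")
counts
```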

How do I count the numbers of occurrences for each group in a tidy data.table?

One option is to sum the logical marker column within each id, since every TRUE counts as 1:

DT[, num_markers := sum(marker), by = id ][]

# id marker num_markers
# 1: 1 TRUE 1
# 2: 1 FALSE 1
# 3: 1 FALSE 1
# 4: 2 TRUE 3
# 5: 2 FALSE 3
# 6: 2 TRUE 3
# 7: 2 TRUE 3
# 8: 2 FALSE 3
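A runnable sketch with the input rebuilt from the output above. The second step, which returns one row per id, is an addition for illustration, and per_id is a made-up name.

```r
library(data.table)

# Sample data reconstructed from the output above
DT <- data.table(
  id = c(1, 1, 1, 2, 2, 2, 2, 2),
  marker = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Summing a logical column counts its TRUE values; := repeats the total on every row
DT[, num_markers := sum(marker), by = id]

# Alternative: one summary row per id instead of a repeated column
per_id <- DT[, .(num_markers = sum(marker)), by = id]
```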

Count number of rows per group and add result to original data frame

Using data.table:

library(data.table)
dt = as.data.table(df)

# or coerce to data.table by reference:
# setDT(df)

dt[ , count := .N, by = .(name, type)]
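The difference between as.data.table and setDT can be seen in a short sketch. The data frame here is hypothetical, and in_df_before / in_df_after are names made up for the check.

```r
library(data.table)

# Hypothetical data frame with the columns used above
df <- data.frame(
  name = c("a", "a", "b"),
  type = c("x", "x", "y"),
  num  = c(1, 2, 3)
)

# as.data.table returns a separate object, so df itself is left untouched
dt <- as.data.table(df)
dt[, count := .N, by = .(name, type)]
in_df_before <- "count" %in% names(df)

# setDT converts df itself; := then adds the column to df directly
setDT(df)
df[, count := .N, by = .(name, type)]
in_df_after <- "count" %in% names(df)
```

Here in_df_before is FALSE and in_df_after is TRUE.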


Using dplyr:

library(dplyr)
df %>%
  group_by(name, type) %>%
  mutate(count = n())

Or simply:

add_count(df, name, type)

Using plyr:

# without library(plyr) the .() helper is not available, so pass the grouping columns by name
plyr::ddply(df, c("name", "type"), transform, count = length(num))

data.table approach for creating a running sequential number for each row in a group

How about this data.table solution:

library(data.table)
setDT(x)
x[, days_between := c(0, diff(recording_date)), by = .(artist_id)
][, course_number := 1L + cumsum(days_between > 7), by = .(artist_id)
][, session_in_course := seq_len(.N), by = .(artist_id, course_number)]
# artist_id session_number_total CustomerRecordId SiteRecordId recording_date control_panel year days_between course_number session_in_course
# <int> <int> <int> <int> <Date> <char> <int> <num> <int> <int>
# 1: 257 1 4 5 2013-12-23 Left 2013 0 1 1
# 2: 257 2 4 5 2013-12-24 Left 2013 1 1 2
# 3: 257 3 4 5 2013-12-26 Left 2013 2 1 3
# 4: 257 4 4 5 2013-12-27 Left 2013 1 1 4
# 5: 257 5 4 5 2014-01-04 Left 2014 8 2 1
# 6: 257 6 4 5 2014-01-09 Left 2014 5 2 2
# 7: 257 7 4 5 2014-01-17 Left 2014 8 3 1
# 8: 257 8 4 5 2014-01-22 Left 2014 5 3 2
# 9: 421 1 5 10 2013-11-18 Bilateral 2013 0 1 1
# 10: 421 2 5 10 2013-11-19 Bilateral 2013 1 1 2
# 11: 421 3 5 10 2013-11-26 Bilateral 2013 7 1 3
# 12: 421 4 5 10 2013-11-29 Bilateral 2013 3 1 4
# 13: 421 5 5 10 2013-12-17 Bilateral 2013 18 2 1
# 14: 421 6 5 10 2013-12-19 Bilateral 2013 2 2 2
# 15: 421 7 5 10 2013-12-26 Bilateral 2013 7 2 3
# 16: 421 8 5 10 2014-01-02 Bilateral 2014 7 2 4
# 17: 421 9 5 10 2014-01-03 Bilateral 2014 1 2 5
# 18: 421 10 5 10 2014-01-07 Bilateral 2014 4 2 6
# 19: 421 11 5 10 2014-01-09 Bilateral 2014 2 2 7
# 20: 421 12 5 10 2014-01-13 Bilateral 2014 4 2 8
# 21: 421 13 5 10 2014-01-16 Bilateral 2014 3 2 9
# 22: 421 14 5 10 2014-01-17 Bilateral 2014 1 2 10
# 23: 421 15 5 10 2014-01-20 Bilateral 2014 3 2 11
# 24: 421 16 5 10 2014-01-21 Bilateral 2014 1 2 12
# 25: 421 17 5 10 2014-01-24 Bilateral 2014 3 2 13
# 26: 421 18 5 10 2014-02-10 Bilateral 2014 17 3 1
# artist_id session_number_total CustomerRecordId SiteRecordId recording_date control_panel year days_between course_number session_in_course
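To try the three steps on their own, here is a reduced sketch. It assumes only the artist_id and recording_date columns and uses the dates of artist 257 from the output above; as.numeric() is added around diff() so the gap is stored as a plain number of days.

```r
library(data.table)

# Assumed subset of the data above: artist 257 only, two columns
x <- data.table(
  artist_id = 257L,
  recording_date = as.Date(c("2013-12-23", "2013-12-24", "2013-12-26", "2013-12-27",
                             "2014-01-04", "2014-01-09", "2014-01-17", "2014-01-22"))
)

# Gap in days to the previous recording (0 for the first row of each artist)
x[, days_between := c(0, as.numeric(diff(recording_date))), by = artist_id]

# A gap of more than 7 days starts a new course
x[, course_number := 1L + cumsum(days_between > 7), by = artist_id]

# Row number within each artist/course pair
x[, session_in_course := seq_len(.N), by = .(artist_id, course_number)]
```

The rows must already be ordered by recording_date within each artist for diff() to give meaningful gaps.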

