Count number of records and generate row number within each group in a data.table
Using .N
...
DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ]
# VAL COUNT IDX
# 1: 1 3 1
# 2: 2 4 1
# 3: 2 4 2
# 4: 3 3 1
# 5: 1 3 2
# 6: 3 3 2
# 7: 3 3 3
# 8: 2 4 3
# 9: 2 4 4
#10: 1 3 3
.N is the number of records in each group, with groups defined by "VAL".
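As a minimal sketch of how .N behaves per group, the snippet below rebuilds the input from the printed output above (the VAL values are copied from that output, so the construction of DT is an assumption) and compares the per-row COUNT column with a one-row-per-group summary. The name group_sizes is introduced here for illustration only.

```r
library(data.table)

# Assumed input: VAL values copied from the printed output above
DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# .N is evaluated once per group: the group size becomes COUNT,
# and seq_len(.N) numbers the rows 1..size within the group
DT[, `:=`(COUNT = .N, IDX = seq_len(.N)), by = VAL]

# Without :=, the same .N yields one row per group instead
group_sizes <- DT[, .N, by = VAL]
```

seq_len(.N) is used here in place of 1:.N; the two agree whenever a group has at least one row, which is always the case inside by.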
Count number of groups with single rows in r data table
If you only need to count the groups that contain exactly one row, you can do:
library(data.table)
nrow(dt[ , .(count = .N), by = .(name, type)][count == 1])
Or :
sum(dt[ , .(count = .N), by = .(name, type)]$count == 1)
If you want to subset the rows that belong to a group with only one row, you can do
dt[, .SD[.N == 1], by = .(name, type)]
and calling nrow on the result would again give you the count of such groups.
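To make the two approaches concrete, here is a small sketch on made-up data. The table contents and the names counts, n_single and single_rows are hypothetical; only the name and type columns come from the answer above.

```r
library(data.table)

# Hypothetical data: (a, x) occurs twice, (a, y) and (b, x) once each
dt <- data.table(name = c("a", "a", "a", "b"),
                 type = c("x", "x", "y", "x"),
                 num  = 1:4)

# Group sizes, one row per (name, type) combination
counts <- dt[, .(count = .N), by = .(name, type)]

# Number of groups with exactly one row
n_single <- sum(counts$count == 1)

# The rows that belong to those single-row groups
single_rows <- dt[, .SD[.N == 1], by = .(name, type)]
```

Because each selected group contributes exactly one row, nrow(single_rows) and n_single agree.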
Count rows in data table with certain values by group
You can solve it as follows:
cols <- c("number_of_offices", "number_of_apartments")
df[, (cols) := .(sum(Type == "office"), sum(Type == "apartment")), Property]
# Property Type number_of_offices number_of_apartments
# 1: 1 apartment 1 1
# 2: 1 office 1 1
# 3: 2 office 2 0
# 4: 2 office 2 0
# 5: 3 apartment 1 2
# 6: 3 apartment 1 2
# 7: 3 office 1 2
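If one row per Property is enough, the same sums can be computed without := so that the result is a summary table. This is a sketch rather than part of the original answer: the input is reconstructed from the printed output above, and summary_dt is a name chosen for illustration.

```r
library(data.table)

# Assumed input, reconstructed from the printed output above
df <- data.table(Property = c(1, 1, 2, 2, 3, 3, 3),
                 Type = c("apartment", "office", "office", "office",
                          "apartment", "apartment", "office"))

# Type == "office" is a logical vector; sum() counts its TRUE values,
# so each sum is the number of matching rows within the group
summary_dt <- df[, .(number_of_offices    = sum(Type == "office"),
                     number_of_apartments = sum(Type == "apartment")),
                 by = Property]
```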
Numbering rows within groups in a data frame
Use ave, ddply, dplyr or data.table:
df$num <- ave(df$val, df$cat, FUN = seq_along)
or:
library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))
or:
library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())
or (the most memory efficient, as it assigns by reference within DT):
library(data.table)
DT <- data.table(df)
DT[, id := seq_len(.N), by = cat]
# or, equivalently:
DT[, id := rowid(cat)]
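A quick way to check that these approaches agree is to run two of them on the same input. The data frame below is hypothetical; only the column names cat and val follow the answer.

```r
library(data.table)

# Hypothetical data frame with a grouping column `cat` and a value column `val`
df <- data.frame(cat = c("a", "b", "a", "a", "b"),
                 val = c(10, 20, 30, 40, 50))

# Base R: ave() applies seq_along within each level of `cat`
df$num <- ave(df$val, df$cat, FUN = seq_along)

# data.table: rowid() numbers the rows within each distinct value of `cat`
DT <- as.data.table(df)
DT[, id := rowid(cat)]
```

Note that ave() keeps the type of its first argument, so num is numeric here because val is numeric, whereas rowid() returns integers.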
count number of rows in a data frame in R based on group
Here's an example that shows how table(.) (or, more closely matching your desired output, data.frame(table(.))) does what it sounds like you are asking for.
Note also how to share reproducible sample data in a way that others can copy and paste into their session.
Here's the (reproducible) sample data:
mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L),
MONTH.YEAR = c("JAN. 2012", "JAN. 2012",
"FEB. 2012", "FEB. 2012",
"MAR. 2012"),
VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)),
.Names = c("ID", "MONTH.YEAR", "VALUE"),
class = "data.frame", row.names = c(NA, -5L))
mydf
# ID MONTH.YEAR VALUE
# 1 110 JAN. 2012 1000
# 2 111 JAN. 2012 2000
# 3 121 FEB. 2012 3000
# 4 131 FEB. 2012 4000
# 5 141 MAR. 2012 5000
Here's the calculation of the number of rows per group, in two output display formats:
table(mydf$MONTH.YEAR)
#
# FEB. 2012 JAN. 2012 MAR. 2012
# 2 2 1
data.frame(table(mydf$MONTH.YEAR))
# Var1 Freq
# 1 FEB. 2012 2
# 2 JAN. 2012 2
# 3 MAR. 2012 1
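If the default Var1 and Freq column names are not wanted, one option is to name them while converting. The sketch below uses the dnn argument of table() and the responseName argument of as.data.frame(); the column name COUNT and the variable counts are chosen here for illustration, and the sample data is reduced to its grouping column.

```r
# Same MONTH.YEAR values as the sample data above
mydf <- data.frame(MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012",
                                  "FEB. 2012", "MAR. 2012"))

# dnn names the grouping column, responseName names the count column
counts <- as.data.frame(table(mydf$MONTH.YEAR, dnn = "MONTH.YEAR"),
                        responseName = "COUNT")
```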
How do I count the numbers of occurrences for each group in a tidy data.table?
Maybe use sum on the marker column:
DT[, num_markers := sum(marker), by = id ][]
# id marker num_markers
# 1: 1 TRUE 1
# 2: 1 FALSE 1
# 3: 1 FALSE 1
# 4: 2 TRUE 3
# 5: 2 FALSE 3
# 6: 2 TRUE 3
# 7: 2 TRUE 3
# 8: 2 FALSE 3
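The reason sum works here is that a logical column sums to its number of TRUE values. The sketch below rebuilds the input from the printed output above (an assumption) and contrasts it with .N, which counts every row in the group; group_size is a hypothetical column name added for the comparison.

```r
library(data.table)

# Assumed input, reconstructed from the printed output above
DT <- data.table(id = c(1, 1, 1, 2, 2, 2, 2, 2),
                 marker = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE))

# Summing a logical column counts its TRUE values (TRUE is 1, FALSE is 0)
DT[, num_markers := sum(marker), by = id]

# .N counts every row in the group, regardless of marker
DT[, group_size := .N, by = id]
```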
Count number of rows per group and add result to original data frame
Using data.table:
library(data.table)
dt = as.data.table(df)
# or coerce to data.table by reference:
# setDT(df)
dt[ , count := .N, by = .(name, type)]
For a pre-data.table 1.8.2 alternative, see edit history.
Using dplyr:
library(dplyr)
df %>%
group_by(name, type) %>%
mutate(count = n())
Or simply:
add_count(df, name, type)
Using plyr:
# .() is a plyr function, so it needs the plyr:: prefix unless plyr is attached
plyr::ddply(df, plyr::.(name, type), transform, count = length(num))
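The difference between as.data.table() and setDT() mentioned in the data.table variant can be seen on a small example. The data below is hypothetical; only the column names name, type and num follow the answers above.

```r
library(data.table)

# Hypothetical data with the name/type/num columns used above
df <- data.frame(name = c("a", "a", "b", "b", "b"),
                 type = c("x", "x", "x", "y", "y"),
                 num  = 1:5)

# as.data.table() returns a copy; df itself stays a plain data.frame
dt <- as.data.table(df)
dt[, count := .N, by = .(name, type)]

# setDT() converts df in place, so := then adds the column to df itself
setDT(df)
df[, count := .N, by = .(name, type)]
```

Either way the original row order is kept and every row carries the size of its (name, type) group.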
data.table approach for creating a running sequential number for each row in a group
How about this data.table solution:
library(data.table)
setDT(x)
x[, days_between := c(0, diff(recording_date)), by = .(artist_id)
][, course_number := 1L + cumsum(days_between > 7), by = .(artist_id)
][, session_in_course := seq_len(.N), by = .(artist_id, course_number)]
# artist_id session_number_total CustomerRecordId SiteRecordId recording_date control_panel year days_between course_number session_in_course
# <int> <int> <int> <int> <Date> <char> <int> <num> <int> <int>
# 1: 257 1 4 5 2013-12-23 Left 2013 0 1 1
# 2: 257 2 4 5 2013-12-24 Left 2013 1 1 2
# 3: 257 3 4 5 2013-12-26 Left 2013 2 1 3
# 4: 257 4 4 5 2013-12-27 Left 2013 1 1 4
# 5: 257 5 4 5 2014-01-04 Left 2014 8 2 1
# 6: 257 6 4 5 2014-01-09 Left 2014 5 2 2
# 7: 257 7 4 5 2014-01-17 Left 2014 8 3 1
# 8: 257 8 4 5 2014-01-22 Left 2014 5 3 2
# 9: 421 1 5 10 2013-11-18 Bilateral 2013 0 1 1
# 10: 421 2 5 10 2013-11-19 Bilateral 2013 1 1 2
# 11: 421 3 5 10 2013-11-26 Bilateral 2013 7 1 3
# 12: 421 4 5 10 2013-11-29 Bilateral 2013 3 1 4
# 13: 421 5 5 10 2013-12-17 Bilateral 2013 18 2 1
# 14: 421 6 5 10 2013-12-19 Bilateral 2013 2 2 2
# 15: 421 7 5 10 2013-12-26 Bilateral 2013 7 2 3
# 16: 421 8 5 10 2014-01-02 Bilateral 2014 7 2 4
# 17: 421 9 5 10 2014-01-03 Bilateral 2014 1 2 5
# 18: 421 10 5 10 2014-01-07 Bilateral 2014 4 2 6
# 19: 421 11 5 10 2014-01-09 Bilateral 2014 2 2 7
# 20: 421 12 5 10 2014-01-13 Bilateral 2014 4 2 8
# 21: 421 13 5 10 2014-01-16 Bilateral 2014 3 2 9
# 22: 421 14 5 10 2014-01-17 Bilateral 2014 1 2 10
# 23: 421 15 5 10 2014-01-20 Bilateral 2014 3 2 11
# 24: 421 16 5 10 2014-01-21 Bilateral 2014 1 2 12
# 25: 421 17 5 10 2014-01-24 Bilateral 2014 3 2 13
# 26: 421 18 5 10 2014-02-10 Bilateral 2014 17 3 1
# artist_id session_number_total CustomerRecordId SiteRecordId recording_date control_panel year days_between course_number session_in_course
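The chain above is easier to follow step by step on a tiny input. The sketch below uses made-up dates for a single artist and keeps the same rule as the answer, namely that a gap of more than 7 days starts a new course.

```r
library(data.table)

# Hypothetical recordings for one artist
x <- data.table(artist_id = 1L,
                recording_date = as.Date(c("2014-01-01", "2014-01-03",
                                           "2014-01-12", "2014-01-13")))

# Days since the previous recording (0 for the first row of each artist)
x[, days_between := c(0, diff(recording_date)), by = artist_id]

# Each gap longer than 7 days increments the running course counter
x[, course_number := 1L + cumsum(days_between > 7), by = artist_id]

# Restart the session counter within each (artist, course) pair
x[, session_in_course := seq_len(.N), by = .(artist_id, course_number)]
```

Here the 9-day gap before the third recording opens course 2, so session_in_course restarts at 1 on that row.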