data.table key indices or group counter
Update: from v1.8.3, you can simply use the built-in special symbol .GRP:
DT[ , i := .GRP, by = key(DT)]
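As a minimal sketch of the line above, the following builds a small keyed table and numbers its key groups. The table contents and the column names g and v are assumptions made up for illustration; only .GRP, setkey and key come from data.table itself.

```r
library(data.table)

# Hypothetical example data; g and v are illustrative names, not from the question.
DT <- data.table(g = c("b", "a", "b", "c", "a"), v = 1:5)
setkey(DT, g)                  # sorts the rows by g and marks g as the key

# .GRP is the running group number: 1 for the first group, 2 for the second, ...
DT[, i := .GRP, by = key(DT)]
DT$i
# 1 1 2 2 3
```

Because setkey sorts the table, the counter follows the sorted order of the key. Without a key, by = g would number the groups in order of first appearance instead.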
Group counter that restarts (with R data.table)
You can achieve that with the rleid function on the y column, grouped by x. rleid is a counter that increases each time the value changes and stays the same otherwise.
library(data.table)
tab <- fread("
x y i d
A B 1 1
A B 1 1
A C 2 2
A D 3 3
B A 1 4
B A 1 4
C A 1 4
C A 1 4
C B 2 5
C C 3 6
C C 3 6
C D 4 7")
dt <- tab[, .(x, y, i)]
dt[, d:= rleid(y), by = .(x)]
dt
#> x y i d
#> 1: A B 1 1
#> 2: A B 1 1
#> 3: A C 2 2
#> 4: A D 3 3
#> 5: B A 1 1
#> 6: B A 1 1
#> 7: C A 1 1
#> 8: C A 1 1
#> 9: C B 2 2
#> 10: C C 3 3
#> 11: C C 3 3
#> 12: C D 4 4
Created on 2018-06-03 by the reprex package (v0.2.0).
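To make the "increases on every change" behaviour concrete, here is a small sketch on a plain vector. The vector is hypothetical; the comparison with match(y, unique(y)) is added here only as a contrast and is not part of the answer above.

```r
library(data.table)

# Hypothetical vector in which "B" comes back after a "C".
y <- c("B", "B", "C", "B", "B")

run_id   <- rleid(y)              # new id on every change:      1 1 2 3 3
value_id <- match(y, unique(y))   # one id per distinct value:   1 1 2 1 1
```

If the data are not sorted so that equal values sit next to each other, the two counters differ, so it is worth checking which one the task actually calls for.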
data.table within group id
dat[, z:=.GRP,by=list(x,y)]
dat
# x y z
# 1: A 1 1
# 2: A 1 1
# 3: A 1 1
# 4: A 1 1
# 5: A 3 2
# 6: A 3 2
# 7: A 3 2
# 8: A 3 2
# 9: B 2 3
# 10: B 2 3
# ...
adding a record counter to a data.table
In the spirit of completeness, in case someone stumbles across this:
DT[, newcol := 1:.N]
is how I solved the problem. Thanks go to @thelatemail and @Simon.
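A short sketch of the same idea, extended to a counter that restarts within each group. The table and the column names are hypothetical. seq_len(.N) is used here in place of 1:.N because it also behaves sensibly on an empty table; for non-empty tables the two give the same result.

```r
library(data.table)

# Hypothetical data; g is an illustrative grouping column.
DT <- data.table(g = c("a", "a", "b", "a", "b"))

DT[, row_in_table  := seq_len(.N)]           # 1..nrow over the whole table
DT[, row_in_group  := seq_len(.N), by = g]   # restarts at 1 in every group
DT[, row_in_group2 := rowid(g)]              # same counter without by =
```

Both per-group variants keep the original row order, so the third row (the first "b") gets 1 and the fourth row (the third "a") gets 3.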
Enumerate groups within groups in a data.table
Try
library(data.table)
dt[, id := rleid(cl), by=gr]
dt
# gr cl id
# 1: a a 1
# 2: a a 1
# 3: a a 1
# 4: a b 2
# 5: a b 2
# 6: a b 2
# 7: b c 1
# 8: b c 1
# 9: b c 1
#10: b d 2
#11: b d 2
#12: b d 2
determining row indices of data.table group members
Since data.table 1.8.3, you can use .I in the j of a data.table to get the row indices by group:
DT[ , list( yidx = list(.I) ) , by = y ]
# y yidx
#1: 1 1,4,7
#2: 3 2,5,8
#3: 6 3,6,9
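The output above can be reproduced with a sketch like the following. The table is reconstructed from the printed result (y cycling through 1, 3, 6) and the extra column v is an assumption added for illustration. The last line shows a common use of .I: picking the first row of each group.

```r
library(data.table)

# Assumed input: y repeats 1, 3, 6 three times; v is a hypothetical payload column.
DT <- data.table(y = rep(c(1, 3, 6), times = 3), v = 1:9)

# One list element of row numbers per group of y.
idx <- DT[, .(yidx = list(.I)), by = y]
idx$yidx[[1]]
# 1 4 7

# .I[1L] is the row number of the first row in each group;
# using those numbers in i returns the first row per group.
first_rows <- DT[DT[, .I[1L], by = y]$V1]
```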
data.table - keep first row per group OR based on condition
Try this.
Using mpg >= 50, we should get one row per carb:
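# Assumption: x is mtcars as a data.table (the printed columns match mtcars).
x <- as.data.table(mtcars)
x[ rowid(carb) == 1 | mpg >= 50,]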
# mpg cyl disp hp drat wt qsec vs am gear carb
# <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
# 1: 21.0 6 160.0 110 3.90 2.62 16.46 0 1 4 4
# 2: 22.8 4 108.0 93 3.85 2.32 18.61 1 1 4 1
# 3: 18.7 8 360.0 175 3.15 3.44 17.02 0 0 3 2
# 4: 16.4 8 275.8 180 3.07 4.07 17.40 0 0 3 3
# 5: 19.7 6 145.0 175 3.62 2.77 15.50 0 1 5 6
# 6: 15.0 8 301.0 335 3.54 3.57 14.60 0 1 5 8
Using mpg >= 30 (a threshold of 10 would keep every row, since all(mpg > 10)), we should get all of the above plus a few more:
x[ rowid(carb) == 1 | mpg >= 30,]
# mpg cyl disp hp drat wt qsec vs am gear carb
# <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
# 1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 2: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 3: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# 4: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
# 5: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# 6: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# 7: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# 8: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
# 9: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
# 10: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
An alternative, in case you need more grouping variables:
x[, .SD[seq_len(.N) == 1L | mpg >= 30,], by = carb]
though I've been informed that rowid(...) is more efficient than seq_len(.N).
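For several grouping variables, rowid also accepts more than one column, so the by-group .SD form is not the only option. The sketch below assumes x is mtcars converted to a data.table and uses carb and gear as an arbitrary pair of grouping columns; it checks only that both forms select the same rows, not which one is faster.

```r
library(data.table)

# Assumption: x is mtcars as a data.table.
x <- as.data.table(mtcars)

# First row of every (carb, gear) combination, or any row with mpg >= 30.
a <- x[rowid(carb, gear) == 1 | mpg >= 30]

# The same selection written with .SD and by =.
b <- x[, .SD[seq_len(.N) == 1L | mpg >= 30], by = .(carb, gear)]
```

The two results hold the same rows, but b is ordered by group and has carb and gear moved to the front, whereas a keeps the original row and column order.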
What is the most efficient method for finding row indices by group in a data.table in R?
Using .I, this option returns a data.table with two columns. The first column holds the unique values in a, and the second is a list of the indices at which each value appears in k. The form differs from the OP's m, but the information is all there and just as easily accessible.
k[, .(idx = .(.I)), a]
Benchmarking:
library(data.table)
k <- data.table(a = sample(factor(seq_len(200)), size = 1e6, replace = TRUE))
microbenchmark::microbenchmark(
A = {
u <- unique(k$a)
m <- lapply(u, function(x) k[a == x, which = TRUE])
},
B = {
m2 <- k[, .(idx = .(.I)), a]
},
times = 100
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> A 282.0331 309.2662 335.30146 325.3355 350.51080 525.7929 100
#> B 9.7870 10.3598 13.04379 10.8292 12.73785 65.4864 100
all.equal(m, m2$idx)
#> [1] TRUE
all.equal(u, m2$a)
#> [1] TRUE
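To show that the indices are "just as easily accessible", here is a small sketch that turns the two-column result into a named list. It uses a much smaller, seeded table than the benchmark; the names by_value and some_value are illustrative.

```r
library(data.table)

set.seed(1)  # small, reproducible stand-in for the benchmark data
k <- data.table(a = sample(factor(seq_len(5)), size = 20, replace = TRUE))

m2 <- k[, .(idx = .(.I)), by = a]

# Named list: one integer vector of row numbers per value of a.
by_value <- setNames(m2$idx, as.character(m2$a))

# Look up the rows for one value by name.
some_value <- names(by_value)[1]
by_value[[some_value]]
```

Each element equals what k[a == value, which = TRUE] would return for that value, but the whole list is built in a single grouped pass.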
for each key after index number create table , key counter Twig
Use the split filter followed by the slice filter (https://twig.symfony.com/doc/3.x/filters/slice.html), like this:
{% set shoeSizeArray = key.shoesize|split(',')|slice(3) %}
{% set sortimentArray = key.sortiment|split(',')|slice(4) %}