Data.Table "Key Indices" or "Group Counter"

data.table key indices or group counter

Update: From v1.8.3, you can simply use the inbuilt special .GRP:

DT[ , i := .GRP, by = key(DT)]

See history for older answers.

Group counter that restarts (with R data.table)

You can achieve that with rleid function on y column grouped by x. rleid is a type of counter that increase each time there is a change and stay the same otherwise

library(data.table)
tab <- fread("
x y i d
A B 1 1
A B 1 1
A C 2 2
A D 3 3
B A 1 4
B A 1 4 
C A 1 4
C A 1 4 
C B 2 5
C C 3 6
C C 3 6
C D 4 7")

dt <- tab[, .(x, y, i)]
dt[, d:= rleid(y), by = .(x)]
dt
#>     x y i d
#>  1: A B 1 1
#>  2: A B 1 1
#>  3: A C 2 2
#>  4: A D 3 3
#>  5: B A 1 1
#>  6: B A 1 1
#>  7: C A 1 1
#>  8: C A 1 1
#>  9: C B 2 2
#> 10: C C 3 3
#> 11: C C 3 3
#> 12: C D 4 4

Created on 2018-06-03 by the reprex package (v0.2.0).

data.table within group id

dat[, z:=.GRP,by=list(x,y)]
dat
#     x y z
#  1: A 1 1
#  2: A 1 1
#  3: A 1 1
#  4: A 1 1
#  5: A 3 2
#  6: A 3 2
#  7: A 3 2
#  8: A 3 2
#  9: B 2 3
# 10: B 2 3
# ...

adding a record counter to a data.table

In the spirit of completeness in case someone stumbles across this.

DT[, newcol := 1:.N]

is how I solved the problem. Thanks to go @thelatemail and @Simon

Enumerate groups within groups in a data.table

Try

library(data.table)
dt[, id := rleid(cl), by=gr]
dt
#    gr cl id
# 1:  a  a  1
# 2:  a  a  1
# 3:  a  a  1
# 4:  a  b  2
# 5:  a  b  2
# 6:  a  b  2
# 7:  b  c  1
# 8:  b  c  1
# 9:  b  c  1
#10:  b  d  2
#11:  b  d  2
#12:  b  d  2

determining row indices of data.table group members

Available since data.table 1.8.3 you can use .I in the j of a data.table to get the row indices by groups...

DT[ , list( yidx = list(.I) ) , by = y ]
#   y  yidx
#1: 1 1,4,7
#2: 3 2,5,8
#3: 6 3,6,9

data.table - keep first row per group OR based on condition

Try this.

Using mpg >= 50, we should get one row per carb:

x[ rowid(carb) == 1 | mpg >= 50,]
#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#    <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
# 1:  21.0     6 160.0   110  3.90  2.62 16.46     0     1     4     4
# 2:  22.8     4 108.0    93  3.85  2.32 18.61     1     1     4     1
# 3:  18.7     8 360.0   175  3.15  3.44 17.02     0     0     3     2
# 4:  16.4     8 275.8   180  3.07  4.07 17.40     0     0     3     3
# 5:  19.7     6 145.0   175  3.62  2.77 15.50     0     1     5     6
# 6:  15.0     8 301.0   335  3.54  3.57 14.60     0     1     5     8

Using mpg >= 30 (since all(mpg > 10)), we should get all of the above plus a few more:

x[ rowid(carb) == 1 | mpg >= 30,]
#       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#     <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#  1:  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4
#  2:  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
#  3:  18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2
#  4:  16.4     8 275.8   180  3.07 4.070 17.40     0     0     3     3
#  5:  32.4     4  78.7    66  4.08 2.200 19.47     1     1     4     1
#  6:  30.4     4  75.7    52  4.93 1.615 18.52     1     1     4     2
#  7:  33.9     4  71.1    65  4.22 1.835 19.90     1     1     4     1
#  8:  30.4     4  95.1   113  3.77 1.513 16.90     1     1     5     2
#  9:  19.7     6 145.0   175  3.62 2.770 15.50     0     1     5     6
# 10:  15.0     8 301.0   335  3.54 3.570 14.60     0     1     5     8

An alternative, in case you need more grouping variables:

x[, .SD[seq_len(.N) == 1L | mpg >= 30,], by = carb]

though I've been informed that rowid(...) is more efficient than seq_len(.N).

What is the most efficient method for finding row indices by group in a data.table in R?

Using .I, this option returns a data.table with two columns. The first column is the unique values in a, and the second is a list of indices where each value appears in k. The form is different than the OP's m, but the information is all there and just as easily accessible.

k[, .(idx = .(.I)), a]

Benchmarking:

library(data.table)

k <- data.table(a = sample(factor(seq_len(200)), size = 1e6, replace = TRUE))

microbenchmark::microbenchmark(
  A = {
    u <- unique(k$a)
    m <- lapply(u, function(x) k[a == x, which = TRUE])
  },
  B = {
    m2 <- k[, .(idx = .(.I)), a]
  },
  times = 100
)
#> Unit: milliseconds
#>  expr      min       lq      mean   median        uq      max neval
#>     A 282.0331 309.2662 335.30146 325.3355 350.51080 525.7929   100
#>     B   9.7870  10.3598  13.04379  10.8292  12.73785  65.4864   100

all.equal(m, m2$idx)
#> [1] TRUE

all.equal(u, m2$a)
#> [1] TRUE

for each key after index number create table , key counter Twig

use slice with split filter https://twig.symfony.com/doc/3.x/filters/slice.html

like

{% set shoeSizeArray = key.shoesize|split(',')|slice(3) %} 
{% set sortimentArray = key.sortiment|split(',')|slice(4) %}

Data.Table "Key Indices" or "Group Counter"