Data.Table "Key Indices" or "Group Counter"

Update: From v1.8.3, you can simply use the built-in special symbol .GRP:

DT[ , i := .GRP, by = key(DT)]

See history for older answers.
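For anyone who wants to try this without a table at hand, here is a minimal sketch. The table DT and its columns g and v are made up for illustration; only .GRP, key() and setkey() come from data.table itself.

```r
library(data.table)

# Hypothetical example table; column names are assumptions for illustration.
DT <- data.table(g = c("b", "a", "b", "c", "a"), v = 1:5)
setkey(DT, g)                    # sorts DT by g and marks g as the key

# .GRP is the running number of the current group in the by= grouping
DT[, i := .GRP, by = key(DT)]
DT$i
```

Because setkey() sorts the table first, the counter follows the sorted order of the key values.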

Group counter that restarts (with R data.table)

You can achieve that with the rleid function applied to the y column, grouped by x. rleid is a run-length counter: it increases each time the value changes and stays the same otherwise.

library(data.table)
tab <- fread("
x y i d
A B 1 1
A B 1 1
A C 2 2
A D 3 3
B A 1 4
B A 1 4
C A 1 4
C A 1 4
C B 2 5
C C 3 6
C C 3 6
C D 4 7")

dt <- tab[, .(x, y, i)]
dt[, d:= rleid(y), by = .(x)]
dt
#> x y i d
#> 1: A B 1 1
#> 2: A B 1 1
#> 3: A C 2 2
#> 4: A D 3 3
#> 5: B A 1 1
#> 6: B A 1 1
#> 7: C A 1 1
#> 8: C A 1 1
#> 9: C B 2 2
#> 10: C C 3 3
#> 11: C C 3 3
#> 12: C D 4 4

Created on 2018-06-03 by the reprex package (v0.2.0).
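One thing to keep in mind: rleid counts runs, not distinct values, so a value that comes back after an interruption gets a new id. The small sketch below contrasts it with a value-based id built from match() and unique(); the vector y here is made up for illustration and is not the data above.

```r
library(data.table)

# Hypothetical vector where "B" reappears after an interruption
y <- c("B", "B", "C", "B")

run_id   <- rleid(y)             # run-based: new id every time the value changes
value_id <- match(y, unique(y))  # value-based: same id for the same value

run_id
value_id
```

If the rows are sorted by y within each x, as in the example above, the two approaches give the same counter.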

data.table within group id

dat[, z:=.GRP,by=list(x,y)]
dat
# x y z
# 1: A 1 1
# 2: A 1 1
# 3: A 1 1
# 4: A 1 1
# 5: A 3 2
# 6: A 3 2
# 7: A 3 2
# 8: A 3 2
# 9: B 2 3
# 10: B 2 3
# ...
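The answer does not show how dat was built, so here is a small self-contained sketch with made-up data (not the OP's table). It also illustrates that with by= the groups are numbered in order of first appearance, so sorting first changes the numbering.

```r
library(data.table)

# Hypothetical, deliberately unsorted data
dat <- data.table(x = c("B", "A", "A", "B"), y = c(2, 1, 1, 2))

# Groups are numbered in order of first appearance
dat[, z := .GRP, by = list(x, y)]
z_unsorted <- dat$z

# Sort first if the ids should follow the sorted order of (x, y)
setorder(dat, x, y)
dat[, z := .GRP, by = list(x, y)]
z_sorted <- dat$z
```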

adding a record counter to a data.table

In the spirit of completeness in case someone stumbles across this.

DT[, newcol := 1:.N] 

is how I solved the problem. Thanks go to @thelatemail and @Simon.
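A small sketch of the same idea, plus a per-group variant in case the counter should restart within groups. The table and the column names g, in_group and in_group2 are assumptions for illustration; seq_len(.N) is used in place of 1:.N only because it also behaves sensibly on an empty table.

```r
library(data.table)

# Hypothetical example table
DT <- data.table(g = c("a", "a", "b", "a", "b"))

DT[, newcol := seq_len(.N)]            # overall record counter
DT[, in_group := seq_len(.N), by = g]  # counter that restarts per group
DT[, in_group2 := rowid(g)]            # same per-group counter via rowid()
DT
```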

Enumerate groups within groups in a data.table

Try

library(data.table)
dt[, id := rleid(cl), by=gr]
dt
# gr cl id
# 1: a a 1
# 2: a a 1
# 3: a a 1
# 4: a b 2
# 5: a b 2
# 6: a b 2
# 7: b c 1
# 8: b c 1
# 9: b c 1
#10: b d 2
#11: b d 2
#12: b d 2
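The input table is not shown in the answer; the sketch below rebuilds one that is consistent with the printed output (the construction with rep() is my assumption, not the OP's code), so the snippet can be run as is.

```r
library(data.table)

# Assumed reconstruction of the input, matching the output printed above
dt <- data.table(
  gr = rep(c("a", "b"), each = 6),
  cl = rep(c("a", "b", "c", "d"), each = 3)
)

# Number the runs of cl separately within each gr
dt[, id := rleid(cl), by = gr]
dt
```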

determining row indices of data.table group members

Since data.table 1.8.3, you can use .I in the j of a data.table to get the row indices by group:

DT[ , list( yidx = list(.I) ) , by = y ]
# y yidx
#1: 1 1,4,7
#2: 3 2,5,8
#3: 6 3,6,9
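A self-contained sketch of the above, with a made-up DT chosen to be consistent with the printed result. It also shows a common variation, picking the first row of each group through .I; the names res and first_rows are only for illustration.

```r
library(data.table)

# Hypothetical table consistent with the output above
DT <- data.table(y = rep(c(1, 3, 6), 3), v = 1:9)

# One list element of row indices per group
res <- DT[, list(yidx = list(.I)), by = y]
res$yidx[[1]]

# Variation: row index of the first member of each group
first_rows <- DT[, .I[1L], by = y]$V1
DT[first_rows]
```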

data.table - keep first row per group OR based on condition

Try this.

Using mpg >= 50, which no row satisfies, we should get exactly one row per carb (the first one):

x[ rowid(carb) == 1 | mpg >= 50,]
# mpg cyl disp hp drat wt qsec vs am gear carb
# <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
# 1: 21.0 6 160.0 110 3.90 2.62 16.46 0 1 4 4
# 2: 22.8 4 108.0 93 3.85 2.32 18.61 1 1 4 1
# 3: 18.7 8 360.0 175 3.15 3.44 17.02 0 0 3 2
# 4: 16.4 8 275.8 180 3.07 4.07 17.40 0 0 3 3
# 5: 19.7 6 145.0 175 3.62 2.77 15.50 0 1 5 6
# 6: 15.0 8 301.0 335 3.54 3.57 14.60 0 1 5 8

Using mpg >= 30 (a threshold of 10 would keep every row, since all(mpg > 10)), we should get all of the above plus a few more:

x[ rowid(carb) == 1 | mpg >= 30,]
# mpg cyl disp hp drat wt qsec vs am gear carb
# <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
# 1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 2: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 3: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# 4: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
# 5: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# 6: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# 7: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# 8: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
# 9: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
# 10: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8

An alternative, in case you need more grouping variables:

x[, .SD[seq_len(.N) == 1L | mpg >= 30,], by = carb]

though I've been informed that rowid(...) is more efficient than seq_len(.N).
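For a runnable version, here is a sketch that assumes x is simply as.data.table(mtcars), which matches the printed rows but is not stated in the answer. It checks that both forms keep the same rows (the .SD form returns them grouped by carb, so the order differs) and shows that rowid() also accepts several grouping columns.

```r
library(data.table)

# Assumption: x is the built-in mtcars data as a data.table
x <- as.data.table(mtcars)

a <- x[rowid(carb) == 1 | mpg >= 30]
b <- x[, .SD[seq_len(.N) == 1L | mpg >= 30], by = carb]

# Same rows, different order
same_rows <- identical(sort(a$mpg), sort(b$mpg))

# rowid() with more than one grouping variable
ab <- x[rowid(carb, gear) == 1 | mpg >= 30]
```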

What is the most efficient method for finding row indices by group in a data.table in R?

Using .I, this option returns a data.table with two columns. The first column is the unique values in a, and the second is a list of indices where each value appears in k. The form is different than the OP's m, but the information is all there and just as easily accessible.

k[, .(idx = .(.I)), a]

Benchmarking:

library(data.table)

k <- data.table(a = sample(factor(seq_len(200)), size = 1e6, replace = TRUE))

microbenchmark::microbenchmark(
  A = {
    u <- unique(k$a)
    m <- lapply(u, function(x) k[a == x, which = TRUE])
  },
  B = {
    m2 <- k[, .(idx = .(.I)), a]
  },
  times = 100
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> A 282.0331 309.2662 335.30146 325.3355 350.51080 525.7929 100
#> B 9.7870 10.3598 13.04379 10.8292 12.73785 65.4864 100

all.equal(m, m2$idx)
#> [1] TRUE

all.equal(u, m2$a)
#> [1] TRUE
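To show that the indices are "just as easily accessible", here is a small sketch on a tiny made-up k (not the benchmark data). Turning the list column into a named list is one possible way to look up the indices for a given value; the name lookup is an assumption for illustration.

```r
library(data.table)

# Tiny hypothetical table instead of the 1e6-row benchmark data
k <- data.table(a = factor(c("x", "y", "x", "z", "y", "x")))

m2 <- k[, .(idx = .(.I)), a]

# Named list: value of a -> row indices where it appears
lookup <- setNames(m2$idx, as.character(m2$a))
lookup[["x"]]
```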

for each key after index number create table , key counter Twig

Use the slice filter together with split (https://twig.symfony.com/doc/3.x/filters/slice.html), for example:

{% set shoeSizeArray = key.shoesize|split(',')|slice(3) %} 
{% set sortimentArray = key.sortiment|split(',')|slice(4) %}

