How to Number/Label Data-Table by Group-Number from Group_By

How to number/label data-table by group-number from group_by?

Updated answer

get_group_number = function(){
i = 0
function(){
i <<- i+1
i
}
}
group_number = get_group_number()
df %>% group_by(u,v) %>% mutate(label = group_number())

You can also consider the following slightly unreadable version

group_number = (function(){i = 0; function() i <<- i+1 })()
df %>% group_by(u,v) %>% mutate(label = group_number())

using iterators package

library(iterators)

counter = icount()
df %>% group_by(u,v) %>% mutate(label = nextElem(counter))

How to give numbers to each group of a dataframe with dplyr::group_by?

Use mutate to add a column which is just a numeric form of from as a factor:

df %>% mutate(group_no = as.integer(factor(from)))

# from dest group_no
# 1 a b 1
# 2 a c 1
# 3 b d 2

Note group_by isn't necessary here, unless you're using it for other purposes. If you want to group by the new column for use later, you can use group_by instead of mutate to add the column.

How to label a double grouped data frame with a group number with group_indices in dplyr?

We can use dense_rank.

library(dplyr)

db2 <- db %>%
group_by(ID) %>%
mutate(rank = dense_rank(date)) %>%
ungroup()
db2
# # A tibble: 10 x 3
# ID date rank
# <dbl> <date> <int>
# 1 1. 2001-01-01 1
# 2 1. 2001-01-01 1
# 3 1. 2001-01-01 1
# 4 1. 2001-01-03 2
# 5 1. 2001-01-03 2
# 6 2. 2011-01-01 3
# 7 2. 2011-01-01 3
# 8 2. 2010-03-12 2
# 9 2. 2010-03-12 2
# 10 2. 2001-01-01 1

Numbering of groups in dplyr?

Using as.numeric will do the trick.

S <- rep(letters[1:12],each=6)
R = sort(replicate(9, sample(5000:6000,4)))
df <- data.frame(R,S)

result <- df %>% mutate(label = as.numeric(S)) %>% group_by(S)

result
Source: local data frame [72 x 3]
Groups: S

R S label
1 5018 a 1
2 5042 a 1
3 5055 a 1
4 5066 a 1
5 5081 a 1
6 5133 a 1
7 5149 b 2
8 5191 b 2
9 5197 b 2
10 5248 b 2
.. ... . ...

R: add a dplyr group label as a number

I think in this case something as simple as :

df %>%
mutate(group_no = as.integer(name))

will work

# A tibble: 20 x 4
# Groups: id [2]
id name val group_no
<fct> <fct> <dbl> <int>
1 a N1 0.647 1
2 a N1 0.530 1
3 a N1 0.245 1
4 a N2 0.693 2
5 a N2 0.478 2
6 a N2 0.861 2
7 a N3 0.821 3
8 a N3 0.0995 3
9 a N3 0.662 3
10 b N1 0.553 1
11 b N1 0.0233 1
12 b N1 0.519 1
13 b N2 0.783 2
14 b N2 0.789 2
15 b N2 0.477 2
16 b N2 0.438 2
17 b N2 0.407 2
18 b N3 0.732 3
19 b N3 0.0707 3
20 b N3 0.316 3

Numbering rows within groups in a data frame

Use ave, ddply, dplyr or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

or (the most memory efficient, as it assigns by reference within DT):

library(data.table)
DT <- data.table(df)

DT[, id := seq_len(.N), by = cat]
DT[, id := rowid(cat)]

Enumerate groups within groups in a data.table

Try

library(data.table)
dt[, id := rleid(cl), by=gr]
dt
# gr cl id
# 1: a a 1
# 2: a a 1
# 3: a a 1
# 4: a b 2
# 5: a b 2
# 6: a b 2
# 7: b c 1
# 8: b c 1
# 9: b c 1
#10: b d 2
#11: b d 2
#12: b d 2

R grouping data by numeric numbers in a column

We can use tidvyerse to do a group by operation. Create a group of ranges with cut, summarise the frequency count based on the cut and the 'variants', then paste them together in summarise

library(dplyr)
patient1 %>%
group_by(group = cut(position, breaks = c(-Inf, seq(1, 100,
by = 10))), variants) %>%
summarise(n = n()) %>%
summarise(tally = paste(n, variants, collapse=' ', sep=""))

NOTE: Another option is findInterval which does similar option as cut but without the labels as it will output numeric index

R - Group by variable and then assign a unique ID

dplyr has a group_indices function for creating unique group IDs

library(dplyr)
data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
gender = c("M", "F", "M", "M"),
temperature = c(99.6, 98.2, 97.8, 95.5))

data$group_id <- data %>% group_indices(personal_id)
data <- data %>% select(-personal_id)

data
gender temperature group_id
1 M 99.6 1
2 F 98.2 3
3 M 97.8 2
4 M 95.5 1

Or within the same pipeline (https://github.com/tidyverse/dplyr/issues/2160):

data %>% 
mutate(group_id = group_indices(., personal_id))


Related Topics



Leave a reply



Submit