R Creating a Sequence Table from Two Columns

R creating a sequence table from two columns

With base R, you could do something like:

data.frame(product=rep(dd$product, dd$max-dd$min+1),
mm=unlist(mapply(seq, dd$min, dd$max)))
# product mm
# 1 a 1
# 2 b 5
# 3 b 6
# 4 b 7
# 5 c 3
# 6 c 4
# 7 c 5
# 8 c 6
# 9 c 7

How to create a sequence numerical column based on two columns in r?

This is a perfect use case for the rleid function from the data.table package:

# example data
xx <- rep(Sys.Date(), 5)
xx <- c(xx, xx + lubridate::days(1))
id <- rep(c(1:4), c(2,3,3,2))
dat <- data.frame(date = xx, id = id)

# date id
# 1 2021-03-29 1
# 2 2021-03-29 1
# 3 2021-03-29 2
# 4 2021-03-29 2
# 5 2021-03-29 2
# 6 2021-03-30 3
# 7 2021-03-30 3
# 8 2021-03-30 3
# 9 2021-03-30 4
# 10 2021-03-30 4

library(data.table)
dat_dt <- as.data.table(dat)
dat_dt[,target_id := rleid(date, id)]

# date id target_id
# 1: 2021-03-29 1 1
# 2: 2021-03-29 1 1
# 3: 2021-03-29 2 2
# 4: 2021-03-29 2 2
# 5: 2021-03-29 2 2
# 6: 2021-03-30 3 3
# 7: 2021-03-30 3 3
# 8: 2021-03-30 3 3
# 9: 2021-03-30 4 4
#10: 2021-03-30 4 4

And here's how you could use %>% and mutate to solve it:

library(tidyverse)
dat %>%
mutate(target_id = data.table::rleid(date, id))

How to take 2 columns to generate sequence of length N and add as columns in R?

One way using apply :

cbind(df, t(apply(df, 1, function(x) x[1]:x[2])))

# mn mx V1 V2 V3 V4
#1: 1 4 1 2 3 4
#2: 2 5 2 3 4 5
#3: 3 6 3 4 5 6

Or mapply :

cbind(df, t(mapply(`:`, df$mn, df$mx)))

Rows sequence by group using two columns

library(data.table)
library(dplyr)

data %>%
group_by(ID) %>%
mutate(Result2 = rleid(Value))

This gives us:

     ID Value Result Result2
<dbl> <dbl> <dbl> <int>
1 1 1 1 1
2 1 1 1 1
3 1 0 2 2
4 1 1 3 3
5 1 0 4 4
6 1 1 5 5
7 1 1 5 5
8 2 1 1 1
9 2 0 2 2
10 2 0 2 2
11 2 1 3 3
12 3 0 1 1
13 3 0 1 1
14 3 0 1 1

Create sequence of values based on multiple column values in R

Since V1 should always be increasing, we can take first value of V1 as reference and subtract all the values from 2nd row by this first_value and take the one which gives the minimum difference. Since, we also want to consider priority one way is to multiply the difference by incremental number. In this example, I have just multiplied it by integers 1, 2 and 3. So the first difference is multiplied by 1, second by 2 and so on. More complex methods can be thought of to assign priority if some edge case are found.

first_value <- search_result$V1[1]
search_result$ordered <- c(first_value, apply(search_result[-1, ], 1, function(x) {
x <- x[x > first_value]
x[which.min((x - first_value) * seq_along(x))]
}))

search_result
# V1 V2 V3 ordered
#1 1350 1351 1349 1350
#2 1390 1391 1389 1390
#3 1411 1410 1940 1411
#4 1437 1438 1913 1437
#5 1444 1907 1445 1444
#6 1895 1456 1894 1456
#7 1895 1456 1894 1456
#8 1467 1466 1884 1467
#9 1478 1477 1479 1478
#10 1500 1499 1501 1500

This also works for the second dataset, consider it as df

first_value <- df$V1[1]
df$ordered <- c(first_value, apply(df[-1, ], 1, function(x) {
x <- x[x > first_value]
x[which.min((x - first_value) * seq_along(x))]
}))

df
# V1 V2 V3 ordered
#1 1881 1470 1880 1881
#2 1457 1893 1894 1893
#3 1907 1444 1906 1907
#4 1442 1443 1908 1908
#5 1433 1918 1432 1918
#6 1402 1949 1401 1949
#7 1968 1969 1967 1968
#8 1985 1986 1984 1985
#9 1992 1993 1991 1992

Generate sequence within sub group in data.table

We group by 'id1', sort the 'val' and then create 'grp' as row_number()

input %>%
group_by(id1) %>%
mutate(val = sort(val), grp= row_number())

Or another option is to arrange

input %>%
arrange(id1, val) %>%
group_by(id1) %>%
mutate(grp = row_number())

Or using data.table

library(data.table)
setDT(input)[, c("grp", "val") := .(seq_len(.N), sort(val)), by = id1]
input
# id1 val grp
#1: 1 1 1
#2: 1 2 2
#3: 1 3 3
#4: 1 4 4
#5: 2 3 1
#6: 2 4 2
#7: 2 5 3

If we need to sort as well, use setorder based on the 'id1' and 'val' to order in place, then create the 'grp' as the rowid of 'id1'

input <- data.frame("id1"=c(1,1,1,1,2,2,2),val=c(2,3,4,1,4,3,5), 
achar=c('a','a','b','b','d','c','e'))
setorder(setDT(input), id1, val)[, grp := rowid(id1)][]
# id1 val achar grp
#1: 1 1 b 1
#2: 1 2 a 2
#3: 1 3 a 3
#4: 1 4 b 4
#5: 2 3 c 1
#6: 2 4 d 2
#7: 2 5 e 3

Sequence a column based on two other columns with a restarting sequence

1) dplyr Create a factor and extract its levels:

library(dplyr)
df %>%
arrange(name, year) %>%
group_by(name) %>%
mutate(Year_id = as.numeric(factor(year))) %>%
ungroup()

giving:

# A tibble: 10 x 3
name year Year_id
<chr> <int> <dbl>
1 A 2000 1
2 A 2000 1
3 A 2000 1
4 A 2001 2
5 A 2001 2
6 B 2000 1
7 B 2000 1
8 B 2001 2
9 B 2001 2
10 B 2001 2

1a) The mutate could alternately be written as mutate(Year_id = match(year, unique(year))) as per @nicola's comment.

2) no packages Without package it could be written:

o <- with(df, order(name, year))
transform(df[o, ], Year_id = ave(year, name, FUN = function(x) as.numeric(factor(x))))

or using match.



Related Topics



Leave a reply



Submit