R creating a sequence table from two columns
With base R, you could do something like:
data.frame(product=rep(dd$product, dd$max-dd$min+1),
mm=unlist(mapply(seq, dd$min, dd$max)))
# product mm
# 1 a 1
# 2 b 5
# 3 b 6
# 4 b 7
# 5 c 3
# 6 c 4
# 7 c 5
# 8 c 6
# 9 c 7
How to create a sequence numerical column based on two columns in r?
This is a perfect use case for the rleid
function from the data.table
package:
# example data
xx <- rep(Sys.Date(), 5)
xx <- c(xx, xx + lubridate::days(1))
id <- rep(c(1:4), c(2,3,3,2))
dat <- data.frame(date = xx, id = id)
# date id
# 1 2021-03-29 1
# 2 2021-03-29 1
# 3 2021-03-29 2
# 4 2021-03-29 2
# 5 2021-03-29 2
# 6 2021-03-30 3
# 7 2021-03-30 3
# 8 2021-03-30 3
# 9 2021-03-30 4
# 10 2021-03-30 4
library(data.table)
dat_dt <- as.data.table(dat)
dat_dt[,target_id := rleid(date, id)]
# date id target_id
# 1: 2021-03-29 1 1
# 2: 2021-03-29 1 1
# 3: 2021-03-29 2 2
# 4: 2021-03-29 2 2
# 5: 2021-03-29 2 2
# 6: 2021-03-30 3 3
# 7: 2021-03-30 3 3
# 8: 2021-03-30 3 3
# 9: 2021-03-30 4 4
#10: 2021-03-30 4 4
And here's how you could use %>%
and mutate
to solve it:
library(tidyverse)
dat %>%
mutate(target_id = data.table::rleid(date, id))
How to take 2 columns to generate sequence of length N and add as columns in R?
One way using apply
:
cbind(df, t(apply(df, 1, function(x) x[1]:x[2])))
# mn mx V1 V2 V3 V4
#1: 1 4 1 2 3 4
#2: 2 5 2 3 4 5
#3: 3 6 3 4 5 6
Or mapply
:
cbind(df, t(mapply(`:`, df$mn, df$mx)))
Rows sequence by group using two columns
library(data.table)
library(dplyr)
data %>%
group_by(ID) %>%
mutate(Result2 = rleid(Value))
This gives us:
ID Value Result Result2
<dbl> <dbl> <dbl> <int>
1 1 1 1 1
2 1 1 1 1
3 1 0 2 2
4 1 1 3 3
5 1 0 4 4
6 1 1 5 5
7 1 1 5 5
8 2 1 1 1
9 2 0 2 2
10 2 0 2 2
11 2 1 3 3
12 3 0 1 1
13 3 0 1 1
14 3 0 1 1
Create sequence of values based on multiple column values in R
Since V1
should always be increasing, we can take first value of V1
as reference and subtract all the values from 2nd row by this first_value
and take the one which gives the minimum difference. Since, we also want to consider priority one way is to multiply the difference by incremental number. In this example, I have just multiplied it by integers 1, 2 and 3. So the first difference is multiplied by 1, second by 2 and so on. More complex methods can be thought of to assign priority if some edge case are found.
first_value <- search_result$V1[1]
search_result$ordered <- c(first_value, apply(search_result[-1, ], 1, function(x) {
x <- x[x > first_value]
x[which.min((x - first_value) * seq_along(x))]
}))
search_result
# V1 V2 V3 ordered
#1 1350 1351 1349 1350
#2 1390 1391 1389 1390
#3 1411 1410 1940 1411
#4 1437 1438 1913 1437
#5 1444 1907 1445 1444
#6 1895 1456 1894 1456
#7 1895 1456 1894 1456
#8 1467 1466 1884 1467
#9 1478 1477 1479 1478
#10 1500 1499 1501 1500
This also works for the second dataset, consider it as df
first_value <- df$V1[1]
df$ordered <- c(first_value, apply(df[-1, ], 1, function(x) {
x <- x[x > first_value]
x[which.min((x - first_value) * seq_along(x))]
}))
df
# V1 V2 V3 ordered
#1 1881 1470 1880 1881
#2 1457 1893 1894 1893
#3 1907 1444 1906 1907
#4 1442 1443 1908 1908
#5 1433 1918 1432 1918
#6 1402 1949 1401 1949
#7 1968 1969 1967 1968
#8 1985 1986 1984 1985
#9 1992 1993 1991 1992
Generate sequence within sub group in data.table
We group by 'id1', sort
the 'val' and then create 'grp' as row_number()
input %>%
group_by(id1) %>%
mutate(val = sort(val), grp= row_number())
Or another option is to arrange
input %>%
arrange(id1, val) %>%
group_by(id1) %>%
mutate(grp = row_number())
Or using data.table
library(data.table)
setDT(input)[, c("grp", "val") := .(seq_len(.N), sort(val)), by = id1]
input
# id1 val grp
#1: 1 1 1
#2: 1 2 2
#3: 1 3 3
#4: 1 4 4
#5: 2 3 1
#6: 2 4 2
#7: 2 5 3
If we need to sort as well, use setorder
based on the 'id1' and 'val' to order in place, then create the 'grp' as the rowid
of 'id1'
input <- data.frame("id1"=c(1,1,1,1,2,2,2),val=c(2,3,4,1,4,3,5),
achar=c('a','a','b','b','d','c','e'))
setorder(setDT(input), id1, val)[, grp := rowid(id1)][]
# id1 val achar grp
#1: 1 1 b 1
#2: 1 2 a 2
#3: 1 3 a 3
#4: 1 4 b 4
#5: 2 3 c 1
#6: 2 4 d 2
#7: 2 5 e 3
Sequence a column based on two other columns with a restarting sequence
1) dplyr Create a factor and extract its levels:
library(dplyr)
df %>%
arrange(name, year) %>%
group_by(name) %>%
mutate(Year_id = as.numeric(factor(year))) %>%
ungroup()
giving:
# A tibble: 10 x 3
name year Year_id
<chr> <int> <dbl>
1 A 2000 1
2 A 2000 1
3 A 2000 1
4 A 2001 2
5 A 2001 2
6 B 2000 1
7 B 2000 1
8 B 2001 2
9 B 2001 2
10 B 2001 2
1a) The mutate
could alternately be written as mutate(Year_id = match(year, unique(year)))
as per @nicola's comment.
2) no packages Without package it could be written:
o <- with(df, order(name, year))
transform(df[o, ], Year_id = ave(year, name, FUN = function(x) as.numeric(factor(x))))
or using match
.
Related Topics
Here We Go Again: Append an Element to a List in R
Use Superscripts in R Axis Labels
How to Break Out of a Foreach Loop
Where Should I Put Data for Automated Tests with Testthat
Group by in R, Ddply with Weighted.Mean
Plots with Good Resolution for Printing and Screen Display
Load a Small Random Sample from a Large CSV File into R Data Frame
How to Make a Ggplot2 Contour Plot Analogue to Lattice:Filled.Contour()
When Using Ggplot in R, How to Remove Margins Surrounding the Plot Area
Geom_Col Is Assigning the Wrong Independent Variable
Coding Practice in R:What Are the Advantages and Disadvantages of Different Styles
How to Change X-Axis Tick Label Names, Order and Boxplot Colour Using R Ggplot
Differencebetween Names and Colnames