## Expand ranges defined by from and to columns

You can use the `plyr` package:

```r
library(plyr)

ddply(presidents, "name", summarise, year = seq(from, to))
#           name year
# 1 Barack Obama 2009
# 2 Barack Obama 2010
# 3 Barack Obama 2011
# 4 Barack Obama 2012
# 5 Bill Clinton 1993
# 6 Bill Clinton 1994
# [...]
```

If it is important that the data be sorted by year, you can use the `arrange` function:

```r
df <- ddply(presidents, "name", summarise, year = seq(from, to))

# arrange() evaluates its arguments inside df, so the bare column name is enough
arrange(df, year)
#            name year
# 1  Bill Clinton 1993
# 2  Bill Clinton 1994
# 3  Bill Clinton 1995
# [...]
# 21 Barack Obama 2011
# 22 Barack Obama 2012
```

Edit 1: Following @edgester's "Update 1", a more appropriate approach is to use `adply`, which works row by row and so accounts for presidents with non-consecutive terms:

```r
adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]
```
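To see why working row by row matters, here is a minimal base R sketch of the same idea. The data frame is hypothetical (illustrative names and years following the `from`/`to` convention above), and `Map` stands in for `adply`; it is not the original answer's code.

```r
# Hypothetical data: one name appears in two separate rows (non-consecutive terms).
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1888, 1892, 1896),
  stringsAsFactors = FALSE
)

# Build one year sequence per row (not per name), so each term is expanded separately.
years <- Map(seq, foo$from, foo$to)

# Repeat each name once per year in its own sequence and flatten the sequences.
expanded <- data.frame(
  name = rep(foo$name, lengths(years)),
  year = unlist(years)
)
```

Grouping by `name` instead would hand `seq()` two `from` values for the repeated name, which is the situation the row-wise approach avoids.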

## Expand a data frame to have as many rows as range of two columns in original row

With `dplyr`, we can use `rowwise` with `do`:

```r
library(dplyr)

df1 %>%
  rowwise() %>%
  do(data.frame(symbol = .$symbol, value = .$start:.$end)) %>%
  arrange(symbol)
# A tibble: 30 x 2
#   symbol value
#   <chr>  <int>
# 1 a          7
# 2 a          8
# 3 a          9
# 4 a         10
# 5 a         11
# 6 i          8
# 7 i          9
# 8 i         10
# 9 i         11
#10 i         12
# ... with 20 more rows
```

## Expand number range to the individual numbers

I have a `data.table` solution in mind.

I am assuming that your `label` variable is unique per observation. Otherwise, you should use a row number to group your data.

```r
library(data.table)

df <- data.frame(start = c(10, 20), end = c(15, 33), label = c('ex1', 'ex2'))
setDT(df)

# One group per label, so seq() receives a single start and a single end
df[, seq(.SD[['start']], .SD[['end']]), by = label]
#     label V1
#  1:   ex1 10
#  2:   ex1 11
#  3:   ex1 12
#  4:   ex1 13
#  5:   ex1 14
#  6:   ex1 15
#  7:   ex2 20
#  8:   ex2 21
#  9:   ex2 22
# 10:   ex2 23
# 11:   ex2 24
# 12:   ex2 25
# 13:   ex2 26
# 14:   ex2 27
# 15:   ex2 28
# 16:   ex2 29
# 17:   ex2 30
# 18:   ex2 31
# 19:   ex2 32
# 20:   ex2 33
```

In terms of efficiency, it might be hard to find a faster solution than `data.table`, which is designed for this kind of grouped operation.

If you can't use `label` as a unique identifier, you can do

```r
# Add a row number by reference and group on it together with label
df[, 'rn' := seq(.N)]

df[, seq(.SD[['start']], .SD[['end']]), by = c('rn', 'label')]
#     rn label V1
#  1:  1   ex1 10
#  2:  1   ex1 11
#  3:  1   ex1 12
#  4:  1   ex1 13
#  5:  1   ex1 14
#  6:  1   ex1 15
#  7:  2   ex2 20
#  8:  2   ex2 21
#  9:  2   ex2 22
# 10:  2   ex2 23
# 11:  2   ex2 24
# 12:  2   ex2 25
# 13:  2   ex2 26
# 14:  2   ex2 27
# 15:  2   ex2 28
# 16:  2   ex2 29
# 17:  2   ex2 30
# 18:  2   ex2 31
# 19:  2   ex2 32
# 20:  2   ex2 33
```

and you can drop the intermediate row number using `df[,'rn' := NULL]`.
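The uniqueness assumption can be illustrated without `data.table`. The following base R sketch uses a hypothetical data frame in which `ex1` appears twice: grouping by `label` alone gives `seq()` two start values and fails, while grouping by a row number works. It only mirrors the grouping logic and is not a substitute for the `data.table` calls above.

```r
# Hypothetical data: the label "ex1" is duplicated on purpose.
df <- data.frame(start = c(10, 20, 40), end = c(12, 22, 41),
                 label = c("ex1", "ex2", "ex1"), stringsAsFactors = FALSE)

# Grouping by label: the "ex1" group has two rows, so seq() gets length-2 inputs.
# The error is caught and its message stored instead of stopping the script.
by_label <- tryCatch(
  lapply(split(df, df$label), function(g) seq(g$start, g$end)),
  error = function(e) conditionMessage(e)
)

# Grouping by row number: every group has exactly one start and one end.
df$rn <- seq_len(nrow(df))
by_row <- lapply(split(df, df$rn), function(g) {
  data.frame(rn = g$rn, label = g$label, value = seq(g$start, g$end))
})
expanded <- do.call(rbind, by_row)
```

Here `by_label` ends up holding an error message rather than a list of sequences, whereas `expanded` has one row per value in each range.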

### Efficiency

`data.table` brings a good speedup (in this example it does not matter much whether you group by one or two columns):

```
Unit: microseconds
                                                           expr      min       lq     mean   median       uq      max neval cld
                                  df %>% rowwise() %>% do(f(.)) 1549.408 1808.669 2309.332 2292.525 2555.888 7141.964   100   b
          df[, seq(.SD[["start"]], .SD[["end"]]), by = "label"] 1011.608 1302.249 1555.808 1490.542 1779.543 3061.487   100  a
 df[, seq(.SD[["start"]], .SD[["end"]]), by = c("label", "rn")]  968.124 1095.703 1387.556 1253.023 1592.483 2953.598   100  a
```

If you want to go even faster, you can set a key (see `?setkeyv`). If your data frame is large, this might bring substantial performance gains (in this small example it won't).
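Another option sometimes worth trying on large inputs is a fully vectorised expansion with no per-group calls. The sketch below is base R, is not part of the benchmark above, and has not been timed against it. It assumes integer `start` and `end` with `start <= end`, and R >= 4.0.0, where `sequence()` accepts a `from` argument.

```r
df <- data.frame(start = c(10L, 20L), end = c(15L, 33L),
                 label = c("ex1", "ex2"), stringsAsFactors = FALSE)

# Number of values each row expands to (the range is inclusive).
n <- df$end - df$start + 1L

expanded <- data.frame(
  label = rep(df$label, times = n),      # repeat each label n[i] times
  value = sequence(n, from = df$start)   # start[i], start[i] + 1, ... for each row
)
```

Because `rep()` and `sequence()` each run once over the whole column, duplicated labels need no special handling here.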

## Expand range of dates by another column by inserting rows in R

Here's a very pedestrian way of doing it:

```r
do.call(rbind, lapply(split(df, seq_along(df$idnum)), function(x) {
  # Nothing to split when 'between' already coincides with 'end'
  if (x$between[1] == x$end[1]) return(x)
  # Duplicate the row: the first copy ends at 'between', the second starts the day after
  x <- x[c(1, 1), ]
  x$end[1] <- x$between[1]
  x$start[2] <- x$between[1] + 1
  x$between[2] <- NA
  x
}))
#>       idnum var      start        end    between
#> 1.1      17   A 1993-03-01 1993-03-01 1993-03-01
#> 1.1.1    17   A 1993-03-02 1993-03-12       <NA>
#> 2.2      17   B 1993-01-02 1993-04-03 1993-04-03
#> 2.2.1    17   B 1993-04-04 1993-04-09       <NA>
#> 3        20   A 1993-02-01 1993-02-01 1993-02-01
#> 4.4      21   C 1993-05-09 1993-07-10 1993-07-10
#> 4.4.1    21   C 1993-07-11 1993-07-12       <NA>
```

Created on 2020-07-26 by the reprex package (v0.3.0)
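The answer does not show its input, so here is a self-contained sketch for trying the approach. The data frame is reconstructed from the printed output above and may differ from the asker's actual data; the helper name `split_row` is made up for readability.

```r
# Input reconstructed from the printed result (an assumption, not the original data).
df <- data.frame(
  idnum   = c(17, 17, 20, 21),
  var     = c("A", "B", "A", "C"),
  start   = as.Date(c("1993-03-01", "1993-01-02", "1993-02-01", "1993-05-09")),
  end     = as.Date(c("1993-03-12", "1993-04-09", "1993-02-01", "1993-07-12")),
  between = as.Date(c("1993-03-01", "1993-04-03", "1993-02-01", "1993-07-10")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one row at its 'between' date.
split_row <- function(x) {
  if (x$between[1] == x$end[1]) return(x)  # nothing to split
  x <- x[c(1, 1), ]                        # duplicate the row
  x$end[1] <- x$between[1]                 # first part ends at 'between'
  x$start[2] <- x$between[1] + 1           # second part starts the next day
  x$between[2] <- NA
  x
}

# Apply the helper to each row separately and stack the pieces.
result <- do.call(rbind, lapply(split(df, seq_len(nrow(df))), split_row))
```

Rows whose `between` date equals `end` pass through unchanged; every other row becomes two rows.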

## Expand data set to fill in with sequential values in R

We can get the row-wise sequence from 'Score2_Min' to 'Score2_Max' with `map2` in a `list` column and then `unnest` the `list` column:

```r
library(dplyr)
library(tidyr)
library(purrr)

data %>%
  transmute(Score1, Score2 = map2(Score2_Min, Score2_Max, `:`)) %>%
  unnest(Score2)
# A tibble: 14 x 2
#   Score1 Score2
#    <dbl>  <int>
# 1    286    108
# 2    286    109
# 3    286    110
# 4    286    111
# 5    287    112
# 6    287    113
# 7    288    112
# 8    288    113
# 9    289    112
#10    289    113
#11    290    112
#12    290    113
#13    291    112
#14    291    113
```
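The backtick-quoted `:` passed to `map2` is just the range operator used as an ordinary two-argument function. A small base R sketch of that step, using `Map` in place of `map2` and only the first two score ranges from the output above (the vector names are made up):

```r
# First two ranges taken from the printed result; the names are illustrative.
score1     <- c(286, 287)
score2_min <- c(108, 112)
score2_max <- c(111, 113)

# `:` called as a function is the same as writing 108:111.
first_range <- `:`(108, 111)

# Apply `:` pairwise to get one integer sequence per row.
ranges <- Map(`:`, score2_min, score2_max)

# Flatten, repeating Score1 once per value in its range (what unnest() does).
expanded <- data.frame(
  Score1 = rep(score1, lengths(ranges)),
  Score2 = unlist(ranges)
)
```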

## Split a column consisting of number range and use the resulting numbers as range values in R

We can split 'Speed' into two columns with `separate`, then create a sequence `list` column based on the values of 'start' and 'end', and `unnest` the `list` column:

```r
library(dplyr)
library(tidyr)
library(purrr)

df1 %>%
  separate(Speed, into = c('start', 'end'), remove = FALSE, convert = TRUE) %>%
  mutate(AcutalSpeed = map2(start, end, `:`), start = NULL, end = NULL) %>%
  unnest(c(AcutalSpeed))
# A tibble: 101 x 3
#   Speed SpeedLevel AcutalSpeed
#   <chr>      <dbl>       <int>
# 1 0-20           1           0
# 2 0-20           1           1
# 3 0-20           1           2
# 4 0-20           1           3
# 5 0-20           1           4
# 6 0-20           1           5
# 7 0-20           1           6
# 8 0-20           1           7
# 9 0-20           1           8
#10 0-20           1           9
# … with 91 more rows
```
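For readers who want to see the parsing step spelled out, here is a rough base R sketch of what `separate(..., convert = TRUE)` followed by the sequence expansion amounts to. The sample rows are hypothetical (only the `"0-20"` / level 1 pair appears in the output above), and the sketch assumes every `Speed` value has the form `start-end` with non-negative integers.

```r
# Hypothetical sample rows; the second range is made up for illustration.
df1 <- data.frame(Speed = c("0-20", "21-23"), SpeedLevel = c(1, 2),
                  stringsAsFactors = FALSE)

# Split each "start-end" string on the hyphen and convert both pieces to integers.
bounds <- do.call(rbind, lapply(strsplit(df1$Speed, "-", fixed = TRUE), as.integer))
colnames(bounds) <- c("start", "end")

# One integer sequence per row.
speeds <- Map(seq, bounds[, "start"], bounds[, "end"])

# Repeat each original row once per value in its sequence.
idx <- rep(seq_len(nrow(df1)), lengths(speeds))
expanded <- data.frame(
  Speed       = df1$Speed[idx],
  SpeedLevel  = df1$SpeedLevel[idx],
  AcutalSpeed = unlist(speeds)   # same column name as in the pipeline above
)
```

A fixed hyphen split would mishandle negative bounds such as `"-5-10"`, which is one reason to check the input format first.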

