Repeating Rows of Data.Frame in Dplyr

Repeating rows of data.frame in dplyr

This is rife with peril if the data.frame has other columns (there, I said it!), because the derived data.frame below contains only column and silently drops everything else. Still, the do block will allow you to generate a derived data.frame within a dplyr pipe (though, ceci n'est pas un pipe):

library(dplyr)
df <- data.frame(column = letters[1:4], stringsAsFactors = FALSE)
df %>%
do( data.frame(column = rep(.$column, each = 4), stringsAsFactors = FALSE) )
# column
# 1 a
# 2 a
# 3 a
# 4 a
# 5 b
# 6 b
# 7 b
# 8 b
# 9 c
# 10 c
# 11 c
# 12 c
# 13 d
# 14 d
# 15 d
# 16 d

As @Frank suggested, a much better alternative could be

df %>% slice(rep(1:n(), each=4))
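
To make the "other columns" point concrete, here is a minimal base R sketch of the index-repetition idea that slice(rep(1:n(), each = 4)) relies on. The extra column other is hypothetical, added only to show that whole rows (not just one column) are carried along:

```r
# Hypothetical data: the question's column plus a made-up second column.
df <- data.frame(column = letters[1:4], other = 1:4, stringsAsFactors = FALSE)

# Repeat each row index 4 times: 1 1 1 1 2 2 2 2 ...
idx <- rep(seq_len(nrow(df)), each = 4)

# Indexing by row keeps every column aligned with its original row.
out <- df[idx, ]
rownames(out) <- NULL  # drop the 1, 1.1, 1.2, ... row names
```

Because only row positions are repeated, this should behave the same no matter how many columns the data frame has.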

R dplyr repeat dataframe rows by group

Give only one value to the times argument of rep. Since you want to do this by group, you can use any value from the ntimes column of that group, for example the first one.

library(dplyr)
df %>% group_by(my.group) %>% slice(rep(1:n(), first(ntimes)))
# Other similar variations:
#df %>% group_by(my.group) %>% slice(rep(seq_len(n()), first(ntimes)))
#df %>% group_by(my.group) %>% slice(rep(seq_along(ntimes), first(ntimes)))

# my.group vals ntimes
# <fct> <dbl> <int>
# 1 a 0.110 3
# 2 a 0.273 3
# 3 a 0.491 3
# 4 a 0.110 3
# 5 a 0.273 3
# 6 a 0.491 3
# 7 a 0.110 3
# 8 a 0.273 3
# 9 a 0.491 3
#10 b 0.318 1
#11 b 0.559 1
#12 b 0.263 1
#13 z 0.202 2
#14 z 0.388 2
#15 z 0.888 2
#16 z 0.202 2
#17 z 0.388 2
#18 z 0.888 2
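
The reason for passing a single value: when times is a vector as long as its first argument, rep repeats each element in place instead of repeating the whole block. A small sketch with plain vectors (the values mirror group a above):

```r
rows   <- 1:3            # row positions within one group
ntimes <- c(3L, 3L, 3L)  # that group's ntimes column

# Vector `times`: each element is repeated individually.
elementwise <- rep(rows, ntimes)     # 1 1 1 2 2 2 3 3 3

# Scalar `times`: the whole sequence is repeated as a block.
blockwise <- rep(rows, ntimes[1])    # 1 2 3 1 2 3 1 2 3
```

The output above follows the blockwise order, which is why first(ntimes) is used inside slice.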

Doing this in base R is surprisingly convoluted, or maybe there is a simpler way that I can't figure out:

df[unlist(Map(rep, split(1:nrow(df), df$my.group), 
tapply(df$ntimes, df$my.group, head, 1))), ]
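
One possibly more readable base R alternative is to split the data frame itself rather than its row indices. This is only a sketch on a small made-up data frame with the same column names as the question's df; it is not claimed to be faster or better than the one-liner above:

```r
# Small hypothetical data frame with the same columns as the question's df.
df <- data.frame(
  my.group = c("a", "a", "b"),
  vals     = c(1, 2, 3),
  ntimes   = c(2L, 2L, 1L)
)

# For each group, repeat its block of rows ntimes[1] times.
pieces <- lapply(split(df, df$my.group), function(g) {
  g[rep(seq_len(nrow(g)), times = g$ntimes[1]), ]
})

out <- do.call(rbind, pieces)
rownames(out) <- NULL  # rbind builds compound row names; reset them
```

Note that split orders the pieces by group level, so the result comes back sorted by group rather than in the original row order.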

data

df <- structure(list(my.group = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L), .Label = c("a", "b", "z"), class = "factor"), vals = c(0.110453,
0.2732849, 0.4905132, 0.318404, 0.5591728, 0.2625931, 0.2018752,
0.3875257, 0.8878698), ntimes = c(3L, 3L, 3L, 1L, 1L, 1L, 2L,
2L, 2L)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9"))

R: Repeating row of dataframe with respect to multiple count columns

Here is a tidyverse option. We can use uncount from tidyr to duplicate the rows according to the count in value (i.e., from the var columns) after pivoting to long format.

library(tidyverse)

df %>%
pivot_longer(starts_with("var"), names_to = "class") %>%
filter(value != 0) %>%
uncount(value) %>%
mutate(class = str_extract(class, "\\d+"))

Output

  f1    f2    class
  <chr> <chr> <chr>
1 a     c     1
2 a     c     3
3 a     c     3
4 a     c     3
5 b     d     1
6 b     d     2
7 b     d     2
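
The original df is not shown here, so the sketch below uses a hypothetical input chosen to be consistent with the output above. It spells out, in base R, roughly what the pivot-then-uncount pipeline does: lay the counts out in long form, then repeat each long row value times (rows with a count of 0 simply vanish):

```r
# Hypothetical input, chosen to be consistent with the output shown above.
df <- data.frame(f1 = c("a", "b"), f2 = c("c", "d"),
                 var1 = c(1L, 1L), var2 = c(0L, 2L), var3 = c(3L, 0L),
                 stringsAsFactors = FALSE)

var_cols <- grep("^var", names(df), value = TRUE)

# Long form, row by row: one entry per (original row, var column) pair.
long <- data.frame(
  df[rep(seq_len(nrow(df)), each = length(var_cols)), c("f1", "f2")],
  class = rep(sub("^var", "", var_cols), times = nrow(df)),
  value = as.vector(t(as.matrix(df[var_cols]))),
  stringsAsFactors = FALSE
)

# Repeat each long row `value` times; a count of 0 drops the row.
out <- long[rep(seq_len(nrow(long)), times = long$value), c("f1", "f2", "class")]
rownames(out) <- NULL
```

Since zero counts drop out on their own in this version, no separate filtering step is needed.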

Another slight variation is to use expandRows from splitstackshape in conjunction with the tidyverse.

library(splitstackshape)

df %>%
pivot_longer(starts_with("var"), names_to = "class") %>%
filter(value != 0) %>%
expandRows("value") %>%
mutate(class = str_extract(class, "\\d+"))

R: Create duplicate rows based on a variable (dplyr preferred)

A nice tidyr function for this is uncount():

df %>%
uncount(sales) %>%
rename(salesTime = time)

salesTime
1 0
2 1
3 2
3.1 2
4 3
5 4
6 5
6.1 5
6.2 5
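
uncount() also has two arguments that can be handy here, .remove and .id. A small sketch, using a made-up df that is consistent with the output above (the original data is not shown); the column name copy is arbitrary:

```r
library(tidyr)

# Hypothetical input consistent with the output above.
df <- data.frame(time = 0:5, sales = c(1, 1, 2, 1, 1, 3))

# .remove = FALSE keeps the sales column;
# .id adds a column numbering the copies made from each original row.
out <- uncount(df, sales, .remove = FALSE, .id = "copy")
```

Keeping an id column like this makes it easy to tell the duplicated rows apart later.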

Repeat/duplicate specific row of data frame and append

You could select the row that you want to duplicate and add it to the original dataframe:

library(dplyr)

var1_variable <- 'A'
df %>%
filter(var1 == var1_variable) %>%
slice_max(var2, n = 1) %>%
#For dplyr < 1.0.0
#slice(which.max(var2)) %>%
bind_rows(df, .)

# var1 var2 val
#1 A 1 21
#2 A 2 31
#3 A 3 54
#4 B 4 65
#5 B 5 76
#6 A 3 54

In base R, that can be done as:

df1 <- subset(df, var1 == var1_variable)
rbind(df, df1[which.max(df1$var2), ])
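
For a runnable version, the first five rows of the output above give the original df. Here the base R approach is wrapped in a small helper (the name append_max_row is made up for this sketch):

```r
# The original df, read off the first five rows of the output above.
df <- data.frame(var1 = c("A", "A", "A", "B", "B"),
                 var2 = 1:5,
                 val  = c(21, 31, 54, 65, 76),
                 stringsAsFactors = FALSE)

# Hypothetical helper: append a copy of the row with the largest var2
# among the rows where var1 equals `value`.
append_max_row <- function(data, value) {
  sub_data <- data[data$var1 == value, ]
  rbind(data, sub_data[which.max(sub_data$var2), ])
}

out <- append_max_row(df, "A")
```

One difference to keep in mind: slice_max(var2, n = 1) keeps ties by default (with_ties = TRUE), whereas which.max() returns only the first maximum.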

Following this post, we can save the intermediate result in a temporary variable and then bind rows to it, so that we don't break the chain and don't have to bind to the original dataframe df.

df %>%
#Previous list of commands
{
{. -> temp} %>%
filter(var1 == var1_variable) %>%
slice_max(var2, n = 1) %>%
bind_rows(temp)
}

Add column but duplicate all other row values in dataframe in R

You can repeat each row index 24 times and then assign a new hour column from 1 to 24, relying on vector recycling.

newdata <- mydata[rep(seq_len(nrow(mydata)), each = 24),]
newdata$hour <- 1:24
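
The recycling step works because the number of rows in newdata is an exact multiple of 24; for a data frame, assigning a vector whose length does not divide the row count is an error. A sketch with a made-up two-day mydata (the Day column name comes from the tidyverse variants of this answer; temp is hypothetical):

```r
# Hypothetical input: one row per day.
mydata <- data.frame(Day = c(1L, 2L), temp = c(10.5, 12.0))

# Repeat each row 24 times, then let 1:24 recycle across the 48 rows.
newdata <- mydata[rep(seq_len(nrow(mydata)), each = 24), ]
newdata$hour <- 1:24
rownames(newdata) <- NULL

# Being explicit avoids relying on recycling at all.
explicit_hour <- rep(1:24, times = nrow(mydata))
```

If the recycling feels too implicit, assigning explicit_hour directly gives the same column.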

A couple of tidyverse options:

library(dplyr)
mydata %>% tidyr::uncount(24) %>% group_by(Day) %>% mutate(hour = 1:24)

and

mydata %>% group_by(Day) %>% slice(rep(row_number(), 24)) %>% mutate(hour = 1:24)

Repeat rows with specific value

You can create a new column specifying the number of times each row should be repeated and then use uncount to repeat the rows.

library(dplyr)
library(tidyr)

df %>%
mutate(repeat_row = ifelse(name1 %in% c('x', 'y'), 2, 1)) %>%
uncount(repeat_row)

# name1 name2
#1 x 0
#2 x 0
#3 y 1
#4 y 1
#5 z 2
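
The same idea can be sketched in base R by passing the per-row counts straight to rep. The input below is read off the output above, presumably one row each for x, y and z:

```r
# Presumed original df, inferred from the output above.
df <- data.frame(name1 = c("x", "y", "z"), name2 = 0:2,
                 stringsAsFactors = FALSE)

# Per-row repeat counts: 2 for x and y, 1 otherwise.
times <- ifelse(df$name1 %in% c("x", "y"), 2L, 1L)

# A vector `times` repeats each row index by its own count.
out <- df[rep(seq_len(nrow(df)), times = times), ]
rownames(out) <- NULL
```

This skips the helper column entirely, at the cost of leaving the pipe.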

Repeat rows of a data.frame N times

EDIT: updated to a better modern R answer.

You can use replicate(), then rbind the result back together. The row names are automatically renumbered to run from 1 to the new number of rows.

d <- data.frame(a = c(1,2,3),b = c(1,2,3))
n <- 3
do.call("rbind", replicate(n, d, simplify = FALSE))

A more traditional way is to use indexing, but here the row names come out less neat (though more informative, since they record which original row each copy came from):

d[rep(seq_len(nrow(d)), n), ]
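
To see the row-name difference between the two approaches side by side, here is a short sketch using the same d and n as above (the variable names bound and indexed are just for illustration):

```r
d <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
n <- 3

bound   <- do.call("rbind", replicate(n, d, simplify = FALSE))
indexed <- d[rep(seq_len(nrow(d)), n), ]

rownames(bound)    # "1" "2" "3" "4" "5" "6" "7" "8" "9"
indexed_names <- rownames(indexed)
indexed_names      # "1" "2" "3" "1.1" "2.1" "3.1" "1.2" "2.2" "3.2"

# Same contents either way once the row names are reset.
rownames(indexed) <- NULL
same <- isTRUE(all.equal(bound, indexed, check.attributes = FALSE))
```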

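Here are improvements on the above. The first two use purrr functional programming (note that map_dfr() is superseded as of purrr 1.0.0, where map() followed by list_rbind() is recommended instead). Idiomatic purrr: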

purrr::map_dfr(seq_len(3), ~d)

and less idiomatic purrr (identical result, though more awkward):

purrr::map_dfr(seq_len(3), function(x) d)

and finally, using dplyr, via indexing rather than applying over a list:

d %>% slice(rep(row_number(), 3))

Repeat rows of a data.frame

df <- data.frame(a = 1:2, b = letters[1:2]) 
df[rep(seq_len(nrow(df)), each = 2), ]

