Repeat Rows in Data Frame n Times

Use a combination of pd.DataFrame.loc and pd.Index.repeat

test.loc[test.index.repeat(test.times)]

  id  times
0  a      2
0  a      2
1  b      3
1  b      3
1  b      3
2  c      1
3  d      5
3  d      5
3  d      5
3  d      5
3  d      5

To mimic your exact output, use reset_index

test.loc[test.index.repeat(test.times)].reset_index(drop=True)

   id  times
0   a      2
1   a      2
2   b      3
3   b      3
4   b      3
5   c      1
6   d      5
7   d      5
8   d      5
9   d      5
10  d      5
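
The original frame is not shown in the answer, so here is a minimal self-contained sketch of both steps. The contents of `test` are an assumption reconstructed from the outputs above.

```python
import pandas as pd

# Hypothetical input, reconstructed from the outputs shown above.
test = pd.DataFrame({"id": ["a", "b", "c", "d"], "times": [2, 3, 1, 5]})

# Repeat each index label `times` times, then select those labels with loc.
repeated = test.loc[test.index.repeat(test["times"])]

# Renumber the rows 0..n-1 and discard the repeated labels.
repeated_reset = repeated.reset_index(drop=True)
print(repeated_reset)
```

Each row is repeated according to its own `times` value, so the result has 2 + 3 + 1 + 5 = 11 rows.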

replicate rows by n times in python

Another method could be:

df.assign(Times = df.Times.apply(lambda x: range(1, x + 1))).explode('Times')
Out[]: 
  String Times
0      a     1
0      a     2
1      b     1
1      b     2
1      b     3
2      c     1
2      c     2
2      c     3
2      c     4
2      c     5
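
A runnable sketch of the same idea, in case it helps. The input frame is an assumption inferred from the output above (column names `String` and `Times`), and the range is wrapped in `list` here only to keep the exploded values plain integers.

```python
import pandas as pd

# Hypothetical input inferred from the output shown above.
df = pd.DataFrame({"String": ["a", "b", "c"], "Times": [2, 3, 5]})

# Replace each count with the list [1, ..., count], then give every list
# element its own row. The original index labels are kept by explode.
out = (
    df.assign(Times=df["Times"].apply(lambda x: list(range(1, x + 1))))
      .explode("Times")
)

# explode leaves an object column, so cast back to integers.
out["Times"] = out["Times"].astype(int)
print(out)
```

Unlike the `loc` plus `repeat` approach, this replaces the count column with a running counter per original row.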

How can I replicate rows of a Pandas DataFrame?

Solutions:

Use `np.repeat`:

Version 1:

Try using np.repeat:

newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0))
newdf.columns = df.columns
print(newdf)

The above code will output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female

np.repeat with axis=0 repeats each row of df.values three times.

We then restore the column names by assigning newdf.columns = df.columns.

Version 2:

You could also assign the column names in the first line, like below:

newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf)

The above code will also output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female
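
One thing worth checking with Versions 1 and 2: `df.values` on a frame with mixed column types is an object array, so the numeric columns of `newdf` come back as `object`. The sketch below shows this and one possible way to restore the original dtypes with `astype`; the restore step is my addition, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

# Version 2 from above: repeat the raw values, then rebuild the frame.
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf.dtypes)  # the integer columns are now object dtype

# Assumed fix: cast every column back to the dtype it had in df.
restored = newdf.astype(df.dtypes.to_dict())
print(restored.dtypes)
```

The `loc`-based versions select rows from the original frame, so they keep the dtypes without this extra step.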

Version 3:

You could shorten it with loc and only repeat the index, like below:

newdf = df.loc[np.repeat(df.index, 3)].reset_index(drop=True)
print(newdf)

The above code will also output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female

I use reset_index(drop=True) to replace the repeated index labels with a fresh monotonic index (0, 1, 2, 3, 4, ...).

Without `np.repeat`:

Version 4:

You could use the built-in pd.Index.repeat method (called as df.index.repeat), like below:

newdf = df.loc[df.index.repeat(3)].reset_index(drop=True)
print(newdf)

The above code will also output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female

Remember to add reset_index(drop=True) so that the index is renumbered from 0.
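
A caveat on the `loc` versions: they select by label, so they assume the index labels are unique. If a label occurs more than once, `loc` returns every row carrying that label and the result can have more rows than intended. The sketch below uses a hypothetical frame with a duplicated label and shows a positional alternative with `iloc`, which is my suggestion rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index contains a duplicate label (0 appears twice).
df = pd.DataFrame({"id": ["a", "b", "c"]}, index=[0, 0, 1])

# Label-based: each repeated label 0 matches two rows, so rows are over-counted.
by_label = df.loc[df.index.repeat(3)]

# Position-based: repeat the row positions 0..len-1 instead of the labels.
by_position = df.iloc[np.repeat(np.arange(len(df)), 3)].reset_index(drop=True)

print(len(by_label), len(by_position))
```

With a default RangeIndex, as in the examples above, both approaches give the same rows.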

Version 5:

Or by using concat with sort_index, like below:

newdf = pd.concat([df] * 3).sort_index().reset_index(drop=True)
print(newdf)

The above code will also output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female

Version 6:

You could also use loc with Python list multiplication and sorted, like below:

newdf = df.loc[sorted([*df.index] * 3)].reset_index(drop=True)
print(newdf)

The above code will also output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female
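
To confirm that the versions really agree, here is a small check that builds each result and compares it against Version 4. The example frame is the one used in the timing code; the dtype cast on Versions 1 and 2 is an assumption I add so that the comparison is not thrown off by object columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

candidates = {
    # Versions 1 and 2 go through df.values, so cast back to df's dtypes.
    "version1/2": pd.DataFrame(
        np.repeat(df.values, 3, axis=0), columns=df.columns
    ).astype(df.dtypes.to_dict()),
    "version3": df.loc[np.repeat(df.index, 3)].reset_index(drop=True),
    "version4": df.loc[df.index.repeat(3)].reset_index(drop=True),
    "version5": pd.concat([df] * 3).sort_index().reset_index(drop=True),
    "version6": df.loc[sorted([*df.index] * 3)].reset_index(drop=True),
}

reference = candidates["version4"]
for name, frame in candidates.items():
    # Raises AssertionError if values, dtypes, index or columns differ.
    pd.testing.assert_frame_equal(frame, reference)
print("all versions match")
```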

Timings:

Timing with the following code:

import timeit
import pandas as pd
import numpy as np

df = pd.DataFrame({'Person': {0: 12345, 1: 32917, 2: 18273}, 'ID': {0: 882, 1: 271, 2: 552}, 'ZipCode': {0: 38182, 1: 88172, 2: 90291}, 'Gender': {0: 'Female', 1: 'Male', 2: 'Female'}})

def version1():
    newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0))
    newdf.columns = df.columns
    
def version2():
    newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)

    
def version3():
    newdf = df.loc[np.repeat(df.index, 3)].reset_index(drop=True)

    
def version4():
    newdf = df.loc[df.index.repeat(3)].reset_index(drop=True)

    
def version5():
    newdf = pd.concat([df] * 3).sort_index().reset_index(drop=True)

    
def version6():
    newdf = df.loc[sorted([*df.index] * 3)].reset_index(drop=True)
    
print('Version 1 Speed:', timeit.timeit('version1()', 'from __main__ import version1', number=20000))
print('Version 2 Speed:', timeit.timeit('version2()', 'from __main__ import version2', number=20000))
print('Version 3 Speed:', timeit.timeit('version3()', 'from __main__ import version3', number=20000))
print('Version 4 Speed:', timeit.timeit('version4()', 'from __main__ import version4', number=20000))
print('Version 5 Speed:', timeit.timeit('version5()', 'from __main__ import version5', number=20000))
print('Version 6 Speed:', timeit.timeit('version6()', 'from __main__ import version6', number=20000))

Output:

Version 1 Speed: 9.879425965991686
Version 2 Speed: 7.752138633004506
Version 3 Speed: 7.078321029010112
Version 4 Speed: 8.01169377300539
Version 5 Speed: 19.853051771002356
Version 6 Speed: 9.801617017001263

Versions 2 and 3 are the fastest in this run. Both use np.repeat, and NumPy functions are fast because they are implemented in C.

Version 3 is marginally faster than Version 2 here; it selects rows with loc instead of constructing a new DataFrame from a raw array.

Version 5 is significantly slower because it calls both concat and sort_index: concat copies the data of every frame it is given, and sort_index then has to reorder the combined result.

Fastest Version: Version 3.
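
The benchmark above runs on a three-row frame, so it mostly measures per-call overhead and the ranking may change with size. Below is a rough sketch for rerunning a few of the versions on a larger frame. The helper names `make_frame` and `time_versions`, the column names and the sizes are all hypothetical choices of mine.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(rows):
    """Build a hypothetical frame with one numeric and one string column."""
    return pd.DataFrame({"num": np.arange(rows), "label": ["x"] * rows})


def time_versions(df, n=3, number=5):
    """Return total seconds for `number` runs of each approach."""
    versions = {
        "np.repeat on values": lambda: pd.DataFrame(
            np.repeat(df.values, n, axis=0), columns=df.columns
        ),
        "loc + np.repeat": lambda: df.loc[np.repeat(df.index, n)].reset_index(drop=True),
        "loc + Index.repeat": lambda: df.loc[df.index.repeat(n)].reset_index(drop=True),
        "concat + sort_index": lambda: pd.concat([df] * n).sort_index().reset_index(drop=True),
    }
    return {name: timeit.timeit(fn, number=number) for name, fn in versions.items()}


timings = time_versions(make_frame(10_000))
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

No expected numbers are given here because they depend on the machine and the pandas version.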

Repeat rows of a data.frame

df <- data.frame(a = 1:2, b = letters[1:2]) 
df[rep(seq_len(nrow(df)), each = 2), ]

Repeat rows of a data.frame N times

EDIT: updated to a better modern R answer.

You can use replicate(), then rbind the result back together. The rownames are automatically altered to run from 1:nrows.

d <- data.frame(a = c(1,2,3),b = c(1,2,3))
n <- 3
do.call("rbind", replicate(n, d, simplify = FALSE))

A more traditional way is to use indexing, but here the rowname altering is not quite so neat (but more informative):

 d[rep(seq_len(nrow(d)), n), ]

Here are some improvements on the above. The first two use purrr's functional programming style. The idiomatic purrr version is:

purrr::map_dfr(seq_len(3), ~d)

and less idiomatic purrr (identical result, though more awkward):

purrr::map_dfr(seq_len(3), function(x) d)

and finally via indexing rather than list apply using dplyr:

d %>% slice(rep(row_number(), 3))

Pandas: repeat dataframe n times

Use:

N = 3
df = pd.concat([df] * N, ignore_index=True)
print(df)
    col
0     0
1    60
2   300
3   320
4     0
5    60
6   300
7   320
8     0
9    60
10  300
11  320
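
Note that this repeats the frame as a whole block, which gives a different row order from repeating each row in place. A short sketch of the difference, with the input frame assumed from the output above:

```python
import pandas as pd

# Hypothetical input inferred from the output shown above.
df = pd.DataFrame({"col": [0, 60, 300, 320]})
N = 3

# Whole frame repeated N times: 0, 60, 300, 320, 0, 60, ...
whole_frame = pd.concat([df] * N, ignore_index=True)

# Each row repeated N times in place: 0, 0, 0, 60, 60, 60, ...
per_row = df.loc[df.index.repeat(N)].reset_index(drop=True)

print(whole_frame["col"].tolist())
print(per_row["col"].tolist())
```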

Dataframe groupby certain column and repeat the row n times

You can use GroupBy.apply per date, and pandas.concat:

N = 2
out = (df_input
      .groupby(['date'], group_keys=False)
      .apply(lambda d: pd.concat([d]*N))
      )

output:

    date type value
0  01/01    1    10
1  01/01    2     5
0  01/01    1    10
1  01/01    2     5
2  01/02    1     9
3  01/02    2     7
2  01/02    1     9
3  01/02    2     7

With "repeat" column:

N = 2
out = (df_input
      .groupby(['date'], group_keys=False)
      .apply(lambda d: pd.concat([d.assign(repeat=n+1) for n in range(N)]))
      )

output:

    date type value  repeat
0  01/01    1    10       1
1  01/01    2     5       1
0  01/01    1    10       2
1  01/01    2     5       2
2  01/02    1     9       1
3  01/02    2     7       1
2  01/02    1     9       2
3  01/02    2     7       2
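
How GroupBy.apply treats the grouping column has changed between pandas releases, so here is an alternative sketch that iterates over the groups directly instead of using apply. This is a swapped-in technique, not the method from the answer, and `df_input` is reconstructed from the outputs above.

```python
import pandas as pd

# Hypothetical input reconstructed from the outputs shown above.
df_input = pd.DataFrame({
    "date": ["01/01", "01/01", "01/02", "01/02"],
    "type": [1, 2, 1, 2],
    "value": [10, 5, 9, 7],
})
N = 2

# For each date group, emit N copies, tagging each copy with its repeat number.
pieces = [
    group.assign(repeat=n + 1)
    for _, group in df_input.groupby("date")
    for n in range(N)
]
out = pd.concat(pieces)
print(out)
```

Iterating over the groupby yields each sub-frame with all of its columns, so the `date` column is kept in the result.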

How do I repeat the last row of a data frame n times, while changing 1 or 2 variables?

You can repeat the last row n times by indexing with rep(nrow(df), n), then add seq(n) to Age and Year so that each repeated row increases by 1. For example, with n = 3:

rbind(df, transform(df[rep(nrow(df), 3),], Age = Age + seq(3), Year = Year + seq(3)))

#    Year Age x   y
#1   2000   0 1 0.3
#2   2001   1 2 0.7
#3   2002   2 3 0.5
#31  2003   3 3 0.5
#3.1 2004   4 3 0.5
#3.2 2005   5 3 0.5
