Repeat Rows in Data Frame n Times
Use a combination of pd.DataFrame.loc and pd.Index.repeat:
test.loc[test.index.repeat(test.times)]
  id  times
0  a      2
0  a      2
1  b      3
1  b      3
1  b      3
2  c      1
3  d      5
3  d      5
3  d      5
3  d      5
3  d      5
To mimic your exact output (a fresh 0, 1, 2, ... index instead of the repeated labels), use reset_index(drop=True):
test.loc[test.index.repeat(test.times)].reset_index(drop=True)
   id  times
0   a      2
1   a      2
2   b      3
3   b      3
4   b      3
5   c      1
6   d      5
7   d      5
8   d      5
9   d      5
10  d      5
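The snippet above relies on a `test` frame from the question. Below is a self-contained sketch; the input is an assumption reconstructed from the printed output. Because `Index.repeat` accepts one count per label, each row can be repeated a different number of times.

```python
import pandas as pd

# Assumed input, reconstructed from the output shown above.
test = pd.DataFrame({"id": ["a", "b", "c", "d"], "times": [2, 3, 1, 5]})

# One repeat count per index label: label i appears test["times"][i] times.
repeated = test.loc[test.index.repeat(test["times"])]

# Swap the duplicated labels (0, 0, 1, 1, 1, ...) for a fresh 0..n-1 index.
result = repeated.reset_index(drop=True)
print(result)
```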
Replicate rows n times in Python
Another method, using explode (available since pandas 0.25):
df.assign(Times=df.Times.apply(lambda x: range(1, x + 1))).explode('Times')
Output:
  String Times
0      a     1
0      a     2
1      b     1
1      b     2
1      b     3
2      c     1
2      c     2
2      c     3
2      c     4
2      c     5
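A self-contained sketch of the explode approach. The `df` below is an assumption reconstructed from the output, since the question's frame is not shown. One detail worth knowing: `explode` leaves the exploded column as `object` dtype, so the sketch converts it back to integers.

```python
import pandas as pd

# Assumed input: "String" is the value, "Times" is how many copies to make.
df = pd.DataFrame({"String": ["a", "b", "c"], "Times": [2, 3, 5]})

# Replace each count with the sequence 1..count, then explode that list-like
# column so each element gets its own row (index labels are repeated).
out = (
    df.assign(Times=df["Times"].apply(lambda x: range(1, x + 1)))
    .explode("Times")
    .astype({"Times": int})  # explode yields object dtype; restore integers
)
print(out)
```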
How can I replicate rows of a Pandas DataFrame?
Solutions using np.repeat:
Version 1:
Try using np.repeat:
import numpy as np
import pandas as pd
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0))
newdf.columns = df.columns
print(newdf)
The above code will output:
   Person   ID  ZipCode  Gender
0   12345  882    38182  Female
1   12345  882    38182  Female
2   12345  882    38182  Female
3   32917  271    88172    Male
4   32917  271    88172    Male
5   32917  271    88172    Male
6   18273  552    90291  Female
7   18273  552    90291  Female
8   18273  552    90291  Female
np.repeat repeats each row of df.values 3 times (axis=0 means along the rows).
Then we restore the column names with newdf.columns = df.columns.
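One side effect of Versions 1 and 2 worth checking: `df.values` on a frame with mixed column types is a single `object`-dtype NumPy array, so every column of `newdf` comes back as `object`, while the `loc`-based versions further down keep the original dtypes. A small sketch using the same data as the timing code below:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

# Repeating raw values: mixed int/str columns collapse into one object array.
via_values = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)

# Repeating the index and selecting with loc keeps each column's own dtype.
via_loc = df.loc[np.repeat(df.index, 3)].reset_index(drop=True)

print(via_values.dtypes)
print(via_loc.dtypes)

# With the values-based version, infer_objects() can recover numeric dtypes.
restored = via_values.infer_objects()
```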
Version 2:
You could also assign the column names in the first line, like below:
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf)
This produces the same output as Version 1.
Version 3:
You could shorten it with loc and only repeat the index, like below:
newdf = df.loc[np.repeat(df.index, 3)].reset_index(drop=True)
print(newdf)
This also produces the same output as Version 1.
I use reset_index(drop=True) to replace the repeated index labels with a fresh, monotonically increasing index (0, 1, 2, 3, 4, ...).
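To see what `drop=True` does, here is a small sketch on a hypothetical two-row frame: without it, `reset_index` keeps the old (duplicated) labels as a new `index` column.

```python
import pandas as pd

# Hypothetical two-row frame, just to show what happens to the index.
df = pd.DataFrame({"Gender": ["Female", "Male"]})

repeated = df.loc[df.index.repeat(3)]
print(repeated.index.tolist())  # [0, 0, 0, 1, 1, 1]

# drop=True discards the old labels and installs a fresh 0..n-1 index.
clean = repeated.reset_index(drop=True)

# Without drop=True the old labels survive as a new "index" column.
with_old = repeated.reset_index()
print(with_old.columns.tolist())  # ['index', 'Gender']
```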
Without np.repeat:
Version 4:
You could use the built-in Index.repeat method (df.index.repeat), like below:
newdf = df.loc[df.index.repeat(3)].reset_index(drop=True)
print(newdf)
This also produces the same output as Version 1.
Remember to add reset_index(drop=True) to renumber the index.
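One caveat for the label-based versions (3 and 4 here, and Version 6 below): `loc` selects by label, so if the index already contains duplicate labels, each repeated label matches several rows and the result has too many rows. Repeating row positions and selecting with `iloc` avoids that. This is a sketch on a hypothetical frame with a duplicated index, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index has a duplicated label (e.g. after a concat).
df = pd.DataFrame({"Gender": ["Female", "Male"]}, index=[7, 7])

# Label-based: each of the 6 repeated "7" labels matches both rows -> 12 rows.
by_label = df.loc[df.index.repeat(3)]

# Position-based: repeat positions 0..len(df)-1 and select with iloc -> 6 rows.
by_position = df.iloc[np.repeat(np.arange(len(df)), 3)].reset_index(drop=True)
```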
Version 5:
Or use concat with sort_index, like below:
newdf = pd.concat([df] * 3).sort_index().reset_index(drop=True)
print(newdf)
This also produces the same output as Version 1.
Version 6:
You could also use loc with Python list multiplication and sorted, like below:
newdf = df.loc[sorted([*df.index] * 3)].reset_index(drop=True)
print(newdf)
This also produces the same output as Version 1.
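Versions 5 and 6 order the result by index label rather than by original row position. With the default `RangeIndex` that makes no difference, but with an unsorted index the output differs from Versions 3 and 4. A sketch on a hypothetical frame with index `[2, 0, 1]`:

```python
import pandas as pd

# Hypothetical frame with an unsorted index.
df = pd.DataFrame({"Person": [12345, 32917, 18273]}, index=[2, 0, 1])

# Version 4 style: keeps the original row order (labels 2, 0, 1).
keep_order = df.loc[df.index.repeat(3)].reset_index(drop=True)

# Version 5 style: sort_index reorders rows by label (0, 1, 2).
by_label = pd.concat([df] * 3).sort_index().reset_index(drop=True)

# Version 6 style: sorting the label list has the same effect.
by_sorted = df.loc[sorted([*df.index] * 3)].reset_index(drop=True)
```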
Timings:
Timed with the following code:
import timeit
import pandas as pd
import numpy as np
df = pd.DataFrame({'Person': {0: 12345, 1: 32917, 2: 18273}, 'ID': {0: 882, 1: 271, 2: 552}, 'ZipCode': {0: 38182, 1: 88172, 2: 90291}, 'Gender': {0: 'Female', 1: 'Male', 2: 'Female'}})
def version1():
    newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0))
    newdf.columns = df.columns
def version2():
    newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
def version3():
    newdf = df.loc[np.repeat(df.index, 3)].reset_index(drop=True)
def version4():
    newdf = df.loc[df.index.repeat(3)].reset_index(drop=True)
def version5():
    newdf = pd.concat([df] * 3).sort_index().reset_index(drop=True)
def version6():
    newdf = df.loc[sorted([*df.index] * 3)].reset_index(drop=True)
print('Version 1 Speed:', timeit.timeit('version1()', 'from __main__ import version1', number=20000))
print('Version 2 Speed:', timeit.timeit('version2()', 'from __main__ import version2', number=20000))
print('Version 3 Speed:', timeit.timeit('version3()', 'from __main__ import version3', number=20000))
print('Version 4 Speed:', timeit.timeit('version4()', 'from __main__ import version4', number=20000))
print('Version 5 Speed:', timeit.timeit('version5()', 'from __main__ import version5', number=20000))
print('Version 6 Speed:', timeit.timeit('version6()', 'from __main__ import version6', number=20000))
Output:
Version 1 Speed: 9.879425965991686
Version 2 Speed: 7.752138633004506
Version 3 Speed: 7.078321029010112
Version 4 Speed: 8.01169377300539
Version 5 Speed: 19.853051771002356
Version 6 Speed: 9.801617017001263
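We can see that Versions 2 and 3 were the fastest in this run. Both use np.repeat, and NumPy functions are fast because they are implemented in C. That said, with only three rows these timings probably reflect per-call pandas overhead more than the repetition itself, so the ranking may differ on larger frames.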
Version 3 beats Version 2 by a small margin, presumably because it selects rows with loc instead of constructing a new DataFrame from df.values.
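Version 5 is significantly slower, likely because it does more work: concat builds a new frame from the three copies, then sort_index sorts it before reset_index renumbers it. Note that a single concat call copies the data only once; the well-known quadratic cost arises when concat is called repeatedly in a loop.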
Fastest version in this benchmark: Version 3.
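The timings above use a three-row frame, where fixed per-call overhead probably dominates. The sketch below re-runs the comparison on a larger, randomly generated frame and first checks that the versions return the same rows. The frame size, the column values and the `candidates`/`check_and_time` names are assumptions for illustration, and no results are claimed here; run it locally to compare.

```python
import functools
import timeit

import numpy as np
import pandas as pd

# Hypothetical 10,000-row frame; size and values are assumptions.
rng = np.random.default_rng(0)
n_rows = 10_000
big = pd.DataFrame({
    "Person": rng.integers(10_000, 100_000, size=n_rows),
    "ID": rng.integers(100, 1_000, size=n_rows),
    "Gender": rng.choice(["Female", "Male"], size=n_rows),
})

# The same strategies as Versions 2-5, rewritten to return their result.
candidates = {
    "v2 np.repeat(values)": lambda d: pd.DataFrame(np.repeat(d.values, 3, axis=0), columns=d.columns),
    "v3 loc + np.repeat": lambda d: d.loc[np.repeat(d.index, 3)].reset_index(drop=True),
    "v4 loc + Index.repeat": lambda d: d.loc[d.index.repeat(3)].reset_index(drop=True),
    "v5 concat + sort_index": lambda d: pd.concat([d] * 3).sort_index().reset_index(drop=True),
}


def check_and_time(frame, number=10):
    """Check every candidate returns the same rows, then time each one."""
    expected = frame.loc[frame.index.repeat(3)].to_numpy().tolist()
    timings = {}
    for name, fn in candidates.items():
        # Compare plain lists so dtype differences (object vs int64) are ignored.
        assert fn(frame).to_numpy().tolist() == expected, name
        # Best of 3 runs, each making `number` calls; result in seconds.
        timings[name] = min(timeit.repeat(functools.partial(fn, frame), number=number, repeat=3))
    return timings


for name, seconds in check_and_time(big).items():
    print(f"{name}: {seconds:.4f} s")
```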
Repeat rows of a data.frame
df <- data.frame(a = 1:2, b = letters[1:2])
df[rep(seq_len(nrow(df)), each = 2), ]
Repeat rows of a data.frame N times
EDIT: updated to a better modern R answer.
You can use replicate(), then rbind the result back together. The row names are automatically renumbered to run from 1 to the new number of rows.
d <- data.frame(a = c(1,2,3),b = c(1,2,3))
n <- 3
do.call("rbind", replicate(n, d, simplify = FALSE))
A more traditional way is to use indexing, but here the row-name changes are not quite as neat (though more informative):
d[rep(seq_len(nrow(d)), n), ]
Here are improvements on the above. The first two use purrr functional programming, starting with idiomatic purrr:
purrr::map_dfr(seq_len(3), ~d)
and less idiomatic purrr (identical result, though more awkward):
purrr::map_dfr(seq_len(3), function(x) d)
and finally, via indexing rather than list-apply, using dplyr:
library(dplyr)
d %>% slice(rep(row_number(), 3))
Pandas: repeat dataframe n times
Use:
N = 3
df = pd.concat([df] * N, ignore_index=True)
print(df)
    col
0     0
1    60
2   300
3   320
4     0
5    60
6   300
7   320
8     0
9    60
10  300
11  320
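A self-contained sketch, with `df` reconstructed from the output above (an assumption). It also shows an alternative that is not from the original answer: tiling row positions with `np.tile` and selecting with `iloc`. `np.tile` repeats the whole sequence (0, 1, 2, 3, 0, 1, ...), whereas `np.repeat` repeats each element (0, 0, 0, 1, ...); that is the difference between repeating the frame and repeating each row.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the output above.
df = pd.DataFrame({"col": [0, 60, 300, 320]})
N = 3

# Stack N copies of the whole frame; ignore_index renumbers rows 0..len*N-1.
stacked = pd.concat([df] * N, ignore_index=True)

# Alternative: tile the row positions and select them with iloc.
tiled = df.iloc[np.tile(np.arange(len(df)), N)].reset_index(drop=True)
```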
Dataframe groupby certain column and repeat the row n times
You can use GroupBy.apply per date, and pandas.concat:
N = 2
out = (df_input
.groupby(['date'], group_keys=False)
.apply(lambda d: pd.concat([d]*N))
)
output:
    date  type  value
0  01/01     1     10
1  01/01     2      5
0  01/01     1     10
1  01/01     2      5
2  01/02     1      9
3  01/02     2      7
2  01/02     1      9
3  01/02     2      7
With an extra "repeat" column numbering the copies:
N = 2
out = (df_input
.groupby(['date'], group_keys=False)
.apply(lambda d: pd.concat([d.assign(repeat=n+1) for n in range(N)]))
)
output:
    date  type  value  repeat
0  01/01     1     10       1
1  01/01     2      5       1
0  01/01     1     10       2
1  01/01     2      5       2
2  01/02     1      9       1
3  01/02     2      7       1
2  01/02     1      9       2
3  01/02     2      7       2
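A self-contained sketch, with `df_input` reconstructed from the output above (an assumption). Instead of `GroupBy.apply`, it stacks `N` tagged copies and sorts by `date`, copy number, then original position, which should give the same rows, order and index as the output shown. This is a swapped-in technique rather than the answer's; one reason to consider it is that pandas 2.2 deprecated `DataFrameGroupBy.apply` operating on the grouping columns, so the `apply` version may emit a warning there. The `_pos` helper column is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the output above.
df_input = pd.DataFrame({
    "date": ["01/01", "01/01", "01/02", "01/02"],
    "type": [1, 2, 1, 2],
    "value": [10, 5, 9, 7],
})
N = 2

# N copies, each tagged with its copy number; "_pos" (hypothetical helper)
# records each row's original position so ties sort deterministically.
copies = pd.concat(
    [df_input.assign(repeat=n + 1, _pos=np.arange(len(df_input))) for n in range(N)]
)

# Like groupby's default, dates are ordered as strings.
out = copies.sort_values(["date", "repeat", "_pos"]).drop(columns="_pos")
print(out)
```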
How do I repeat the last row of a data frame n times, while changing 1 or 2 variables?
You can repeat the last row n times by indexing with rep(nrow(df), n), then add seq(n) to Age (and Year) so each new row increases them by 1. Here n = 3:
rbind(df, transform(df[rep(nrow(df), 3),], Age = Age + seq(3), Year = Year + seq(3)))
#     Year Age x   y
#1    2000   0 1 0.3
#2    2001   1 2 0.7
#3    2002   2 3 0.5
#31   2003   3 3 0.5
#3.1  2004   4 3 0.5
#3.2  2005   5 3 0.5
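For readers doing the same in pandas, a rough equivalent might look like the sketch below. The frame is an assumption reconstructed from the first three rows of the output; the last row is repeated with `iloc`, and `Year` and `Age` are shifted by 1..n. Unlike the R version, `ignore_index=True` simply renumbers the rows 0 to 5 instead of producing row names such as `3.1`.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the first three rows of the output above.
df = pd.DataFrame({
    "Year": [2000, 2001, 2002],
    "Age": [0, 1, 2],
    "x": [1, 2, 3],
    "y": [0.3, 0.7, 0.5],
})
n = 3
steps = np.arange(1, n + 1)  # 1..n, like seq(n) in R

# Repeat the last row n times, then shift the changing columns by 1..n.
tail = df.iloc[[-1] * n].assign(
    Year=df["Year"].iloc[-1] + steps,
    Age=df["Age"].iloc[-1] + steps,
)

result = pd.concat([df, tail], ignore_index=True)
print(result)
```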