Repeat Rows of a Data.Frame

How can I replicate rows in Pandas?

Use np.repeat:

Version 1:

import numpy as np
import pandas as pd

newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0))
newdf.columns = df.columns
print(newdf)

The above code will output:

  Person   ID ZipCode  Gender
0 12345 882 38182 Female
1 12345 882 38182 Female
2 12345 882 38182 Female
3 32917 271 88172 Male
4 32917 271 88172 Male
5 32917 271 88172 Male
6 18273 552 90291 Female
7 18273 552 90291 Female
8 18273 552 90291 Female

np.repeat with axis=0 repeats each row of df.values 3 times, so the copies of a row stay next to each other.

Then we restore the column names by assigning newdf.columns = df.columns.

Version 2:

You could also assign the column names in the first line, like below:

newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf)

The above code will also output:

  Person   ID ZipCode  Gender
0 12345 882 38182 Female
1 12345 882 38182 Female
2 12345 882 38182 Female
3 32917 271 88172 Male
4 32917 271 88172 Male
5 32917 271 88172 Male
6 18273 552 90291 Female
7 18273 552 90291 Female
8 18273 552 90291 Female
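
One caveat with both versions: df.values collapses a mixed-type frame into a single object array, so the numeric columns of newdf come back as object dtype. A minimal sketch of a dtype-preserving alternative, assuming a small df reconstructed from the printed output above (the real df is not shown in the answer):

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame, reconstructed from the printed output above.
df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

# np.repeat on df.values goes through one object array, losing the int dtypes.
via_values = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)

# Repeating the index labels and selecting with .loc keeps each column's dtype.
via_index = df.loc[df.index.repeat(3)].reset_index(drop=True)

print(via_values.dtypes)
print(via_index.dtypes)
```

If the original dtypes matter downstream (arithmetic, merges), the index-based form may be the safer default.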

Repeat rows of a data.frame

df <- data.frame(a = 1:2, b = letters[1:2]) 
df[rep(seq_len(nrow(df)), each = 2), ]

Repeat rows of a data.frame N times

EDIT: updated to a better modern R answer.

You can use replicate(), then rbind the result back together. The rownames are automatically altered to run from 1:nrows.

d <- data.frame(a = c(1,2,3),b = c(1,2,3))
n <- 3
do.call("rbind", replicate(n, d, simplify = FALSE))

A more traditional way is to use indexing, but here the rowname altering is not quite so neat (but more informative):

 d[rep(seq_len(nrow(d)), n), ]

Here are some alternatives to the above. The first two use purrr's functional-programming style. Idiomatic purrr:

purrr::map_dfr(seq_len(3), ~d)

and less idiomatic purrr (identical result, though more awkward):

purrr::map_dfr(seq_len(3), function(x) d)

and finally, using dplyr, via indexing rather than applying over a list:

d %>% slice(rep(row_number(), 3))
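
Since the rest of this page uses pandas, here is a rough pandas counterpart of the two row orders produced by the R snippets above: each = n keeps the copies of a row together, while repeating the whole index sequence cycles through the frame n times. This is only a sketch; the frame d mirrors the R example and the other variable names are illustrative.

```python
import pandas as pd

# Mirrors the R example: d <- data.frame(a = c(1,2,3), b = c(1,2,3))
d = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})
n = 3

# Like rep(..., each = n): the copies of each row are adjacent.
each_row = d.loc[d.index.repeat(n)].reset_index(drop=True)

# Like rep(..., n) or rbind of n copies: the whole frame is cycled n times.
whole_frame = pd.concat([d] * n, ignore_index=True)

print(each_row["a"].tolist())
print(whole_frame["a"].tolist())
```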

Repeat Rows in Data Frame n Times

Use a combination of pd.DataFrame.loc and pd.Index.repeat:

test.loc[test.index.repeat(test.times)]

id times
0 a 2
0 a 2
1 b 3
1 b 3
1 b 3
2 c 1
3 d 5
3 d 5
3 d 5
3 d 5
3 d 5

To mimic your exact output, use reset_index

test.loc[test.index.repeat(test.times)].reset_index(drop=True)

id times
0 a 2
1 a 2
2 b 3
3 b 3
4 b 3
5 c 1
6 d 5
7 d 5
8 d 5
9 d 5
10 d 5
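
The answer does not show how test was built. A self-contained sketch, assuming test has the id and times columns visible in the output above:

```python
import pandas as pd

# Assumed input, inferred from the output shown above.
test = pd.DataFrame({"id": ["a", "b", "c", "d"], "times": [2, 3, 1, 5]})

# Repeat each index label `times` times, then select those labels with .loc.
repeated = test.loc[test.index.repeat(test["times"])]

# Optionally renumber the rows 0..n-1.
renumbered = repeated.reset_index(drop=True)
print(renumbered)
```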

How to repeat rows until a certain number of rows is reached in R

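We may use rep with length.out to recycle the row indices of the shorter data frame up to the length of the longer one, and sample to shuffle them:

if (nrow(df2) > nrow(df1)) {
  i1 <- sample(rep(seq_len(nrow(df1)), length.out = nrow(df2)))
  out <- cbind(df1[i1, ], df2)
} else {
  i1 <- sample(rep(seq_len(nrow(df2)), length.out = nrow(df1)))
  out <- cbind(df1, df2[i1, ])
}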

row.names(out) <- NULL

-output

> out
A B C D
1 12 13 19 20
2 12 13 20 30
3 15 16 10 13
4 12 13 54 32
5 15 16 34 10

data

df1 <- structure(list(A = c(12L, 15L), B = c(13L, 16L)), 
class = "data.frame", row.names = c("x",
"y"))

df2 <- structure(list(C = c(19L, 20L, 10L, 54L, 34L), D = c(20L, 30L,
13L, 32L, 10L)), class = "data.frame", row.names = c("z", "w",
"r", "k", "f"))

Repeat rows in pandas data frame with a sequential change in a column value

I took a different approach, pivoting and then melting.
It seems to be working. Does anybody see an issue?

import pandas as pd

data = {'year': ['2000', '2000', '2005', '2005', '2007', '2007', '2007', '2009'],
        'country': ['UK', 'US', 'FR', 'US', 'UK', 'FR', 'US', 'UK'],
        'sales': [10, 21, 20, 10, 12, 20, 10, 12],
        'rep': ['john', 'john', 'claire', 'claire', 'kyle', 'kyle', 'kyle', 'amy']
        }
df = pd.DataFrame(data)


year country sales rep
0 2000 UK 10 john
1 2000 US 21 john
2 2005 FR 20 claire
3 2005 US 10 claire
4 2007 UK 12 kyle
5 2007 FR 20 kyle
6 2007 US 10 kyle
7 2009 UK 12 amy

First doing a pivot...

dfp=pd.pivot_table(df,index=['country','rep'],values=['sales'],columns=['year']).fillna(0)
dfp=dfp.xs('sales', axis=1, drop_level=True)

year 2000 2005 2007 2009
country rep
FR claire 0.0 20.0 0.0 0.0
kyle 0.0 0.0 20.0 0.0
UK amy 0.0 0.0 0.0 12.0
john 10.0 0.0 0.0 0.0
kyle 0.0 0.0 12.0 0.0
US claire 0.0 10.0 0.0 0.0
john 21.0 0.0 0.0 0.0
kyle 0.0 0.0 10.0 0.0

Then a little logic to replicate the columns..

cols = dfp.columns.astype(int).values
dft = dfp.copy()
i = 0
for col in cols:
    if col != cols[-1]:
        for newcol in range(col + 1, cols[i + 1]):
            dft[str(newcol)] = dft[str(col)]
    i += 1

year 2000 2005 2007 2009 2001 2002 2003 2004 2006 2008
country rep
FR claire 0.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0 20.0 0.0
kyle 0.0 0.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0 20.0
UK amy 0.0 0.0 0.0 12.0 0.0 0.0 0.0 0.0 0.0 0.0
john 10.0 0.0 0.0 0.0 10.0 10.0 10.0 10.0 0.0 0.0
kyle 0.0 0.0 12.0 0.0 0.0 0.0 0.0 0.0 0.0 12.0
US claire 0.0 10.0 0.0 0.0 0.0 0.0 0.0 0.0 10.0 0.0
john 21.0 0.0 0.0 0.0 21.0 21.0 21.0 21.0 0.0 0.0
kyle 0.0 0.0 10.0 0.0 0.0 0.0 0.0 0.0 0.0 10.0

Then I did a melt to get them back into the original format:

dfm = dft.reset_index()
dfm = dfm.melt(id_vars=['country', 'rep'], value_vars=dfm.columns.values[2:], var_name='Year', value_name='sales')
dfm = dfm.loc[dfm.sales > 0].reset_index(drop=True)

country rep Year sales
0 UK john 2000 10.0
1 US john 2000 21.0
2 FR claire 2005 20.0
3 US claire 2005 10.0
4 FR kyle 2007 20.0
5 UK kyle 2007 12.0
6 US kyle 2007 10.0
7 UK amy 2009 12.0
8 UK john 2001 10.0
9 US john 2001 21.0
10 UK john 2002 10.0
11 US john 2002 21.0
12 UK john 2003 10.0
13 US john 2003 21.0
14 UK john 2004 10.0
15 US john 2004 21.0
16 FR claire 2006 20.0
17 US claire 2006 10.0
18 FR kyle 2008 20.0
19 UK kyle 2008 12.0
20 US kyle 2008 10.0
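
On the "any issue?" question: one thing to watch is that fillna(0) followed by the sales > 0 filter would also drop rows whose sales are genuinely 0. A possible alternative that avoids the pivot is sketched here, under the assumption that each row should be repeated for every year up to (but not including) the next year present in the data. It uses DataFrame.explode instead of the pivot/melt approach above.

```python
import pandas as pd

data = {'year': ['2000', '2000', '2005', '2005', '2007', '2007', '2007', '2009'],
        'country': ['UK', 'US', 'FR', 'US', 'UK', 'FR', 'US', 'UK'],
        'sales': [10, 21, 20, 10, 12, 20, 10, 12],
        'rep': ['john', 'john', 'claire', 'claire', 'kyle', 'kyle', 'kyle', 'amy']}
df = pd.DataFrame(data)

# Map each distinct year to the next distinct year; the last year maps to
# year + 1 so that it is kept exactly once.
years = sorted(int(y) for y in df['year'].unique())
next_year = dict(zip(years, years[1:] + [years[-1] + 1]))

# Give every row the list of years it should cover (note: years become ints),
# then expand to one row per year.
covered = df['year'].astype(int).map(lambda y: list(range(y, next_year[y])))
out = df.assign(year=covered).explode('year').reset_index(drop=True)
print(out)
```

Unlike the pivot version, this keeps the rows grouped by their original record rather than by year, so sort on year afterwards if that order matters.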

How do you repeat each row of a dataframe for each value in a separate dataframe and then combine the two into a single dataframe?

You can assign a redundant key column to each DataFrame (without mutating the original DataFrames) and join on it, then drop it before returning the final result:

import pandas as pd

df1 = pd.DataFrame({
    'id': list(range(1, 5))
})

df2 = pd.DataFrame({
    'month': ['2010-01', '2010-02', '2010-03']
})

df_merged = pd.merge(
    df1.assign(key=1),
    df2.assign(key=1),
    on='key'
).drop('key', axis=1)

+----+----+---------+
| | id | month |
+----+----+---------+
| 0 | 1 | 2010-01 |
| 1 | 1 | 2010-02 |
| 2 | 1 | 2010-03 |
| 3 | 2 | 2010-01 |
| 4 | 2 | 2010-02 |
| 5 | 2 | 2010-03 |
| 6 | 3 | 2010-01 |
| 7 | 3 | 2010-02 |
| 8 | 3 | 2010-03 |
| 9 | 4 | 2010-01 |
| 10 | 4 | 2010-02 |
| 11 | 4 | 2010-03 |
+----+----+---------+
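
On pandas 1.2 or later, merge also accepts how='cross', which builds the same Cartesian product without the helper key column. A minimal sketch using the same two frames (df_cross is an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({'id': list(range(1, 5))})
df2 = pd.DataFrame({'month': ['2010-01', '2010-02', '2010-03']})

# Cross join: every row of df1 is paired with every row of df2.
df_cross = df1.merge(df2, how='cross')
print(df_cross)
```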

Repeat rows in a pandas DataFrame based on column value

Use reindex + repeat:

df.reindex(df.index.repeat(df.persons))
Out[951]:
code . role ..1 persons
0 123 . Janitor . 3
0 123 . Janitor . 3
0 123 . Janitor . 3
1 123 . Analyst . 2
1 123 . Analyst . 2
2 321 . Vallet . 2
2 321 . Vallet . 2
3 321 . Auditor . 5
3 321 . Auditor . 5
3 321 . Auditor . 5
3 321 . Auditor . 5
3 321 . Auditor . 5

PS: you can add .reset_index(drop=True) to get a new sequential index.
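
Both reindex and loc work on index labels, so they assume the index is unique; reindex in particular raises an error when the index contains duplicate labels. If that could be the case, a position-based variant is a safer sketch. The frame below is a hypothetical stand-in with a deliberately non-unique index, not the data from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a duplicated index label (0 appears twice).
df = pd.DataFrame(
    {"role": ["Janitor", "Analyst", "Auditor"], "persons": [3, 2, 1]},
    index=[0, 0, 1],
)

# Repeat row positions instead of labels, then select with .iloc.
positions = np.repeat(np.arange(len(df)), df["persons"].to_numpy())
out = df.iloc[positions].reset_index(drop=True)
print(out)
```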

Pandas data frame repeat each row a certain number of times

Create a dictionary giving the number of repeats for each Minutiae value, map it onto the column with Series.map, repeat the index with Index.repeat, and finally use DataFrame.loc to select the repeated rows:

print (df)
Minutiae LR
0 1 1.975476
1 2 1.082983
2 3 0.269608
3 4 0.878350

d = {1:2, 2:1, 3:5, 4:3}

df1 = df.loc[df.index.repeat(df['Minutiae'].map(d))]
print (df1)
Minutiae LR
0 1 1.975476
0 1 1.975476
1 2 1.082983
2 3 0.269608
2 3 0.269608
2 3 0.269608
2 3 0.269608
2 3 0.269608
3 4 0.878350
3 4 0.878350
3 4 0.878350

Detail:

print (df['Minutiae'].map(d))
0 2
1 1
2 5
3 3
Name: Minutiae, dtype: int64

print (df.index.repeat(df['Minutiae'].map(d)))
Int64Index([0, 0, 1, 2, 2, 2, 2, 2, 3, 3, 3], dtype='int64')

Or create a new column holding the repeat counts:

df['repeat'] = [2,1,5,3]
print (df)
Minutiae LR repeat
0 1 1.975476 2
1 2 1.082983 1
2 3 0.269608 5
3 4 0.878350 3

df2 = df.loc[df.index.repeat(df['repeat'])]
print (df2)
Minutiae LR repeat
0 1 1.975476 2
0 1 1.975476 2
1 2 1.082983 1
2 3 0.269608 5
2 3 0.269608 5
2 3 0.269608 5
2 3 0.269608 5
2 3 0.269608 5
3 4 0.878350 3
3 4 0.878350 3
3 4 0.878350 3
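
If the dictionary does not cover every value in the column, Series.map returns NaN for the missing keys, and a float column containing NaN is not valid as repeat counts. One way to handle that is sketched here with a hypothetical dictionary that omits the key 4 and treats missing keys as zero repeats (that default is an assumption; 1 may suit other cases):

```python
import pandas as pd

df = pd.DataFrame({"Minutiae": [1, 2, 3, 4],
                   "LR": [1.975476, 1.082983, 0.269608, 0.878350]})

d = {1: 2, 2: 1, 3: 5}  # hypothetical: no entry for Minutiae == 4

# Missing keys map to NaN; replace them with 0 and cast back to integers.
counts = df["Minutiae"].map(d).fillna(0).astype(int)

df1 = df.loc[df.index.repeat(counts)]
print(df1)
```

With a count of 0 the row for Minutiae == 4 disappears from the result entirely.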

