Repeat Rows of a Data.Frame

How can I replicate rows in Pandas?

Use `np.repeat`:

Version 1:

import numpy as np
import pandas as pd

newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0))
newdf.columns = df.columns
print(newdf)

The above code will output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female

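np.repeat(df.values, 3, axis=0) repeats each row of the underlying NumPy array 3 times; axis=0 makes it repeat along rows instead of flattening the array first.

The array carries no column labels, so we restore them by assigning newdf.columns = df.columns. Note that when the columns have mixed types, df.values is an object array, so the new columns may not keep their original dtypes.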

Version 2:

You could also assign the column names in the first line, like below:

newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf)

The above code will also output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female
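A self-contained sketch of both versions, for trying them out. The df below is reconstructed from the printed output, so its exact contents are an assumption. For comparison it also shows the Index.repeat/.loc technique used in later answers on this page, which selects rows instead of going through a NumPy array and therefore keeps each column's dtype.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the output shown above (assumed, not the original df)
df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

# np.repeat approach: repeat each row of the raw array 3 times, then restore the labels
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)

# Index-based alternative: repeat the row labels, select with .loc, then renumber
newdf2 = df.loc[df.index.repeat(3)].reset_index(drop=True)

print(newdf2)
print(newdf2.dtypes)
```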

Repeat rows of a data.frame

df <- data.frame(a = 1:2, b = letters[1:2]) 
df[rep(seq_len(nrow(df)), each = 2), ]

Repeat rows of a data.frame N times

EDIT: updated to a better modern R answer.

You can use replicate(), then rbind the result back together. The rownames are automatically altered to run from 1:nrows.

d <- data.frame(a = c(1,2,3),b = c(1,2,3))
n <- 3
do.call("rbind", replicate(n, d, simplify = FALSE))

A more traditional way is to use indexing, but here the rowname altering is not quite so neat (but more informative):

 d[rep(seq_len(nrow(d)), n), ]

Here are some further alternatives. The first two use purrr's functional programming; this is the idiomatic purrr version:

purrr::map_dfr(seq_len(3), ~d)

and less idiomatic purrr (identical result, though more awkward):

purrr::map_dfr(seq_len(3), function(x) d)

and finally, using dplyr, via indexing rather than a list apply:

library(dplyr)
d %>% slice(rep(row_number(), 3))
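For comparison with the pandas answers on this page, a rough pandas counterpart of repeating the whole frame n times might look like this (a sketch, not part of the original answer). Like rep(seq_len(nrow(d)), n), it tiles the block of rows (1, 2, 3, 1, 2, 3, ...), whereas each = 2 in the previous answer repeats every row in place, which in pandas corresponds to Index.repeat.

```python
import numpy as np
import pandas as pd

# Same toy data as the R example: d <- data.frame(a = c(1,2,3), b = c(1,2,3))
d = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})
n = 3

# Like do.call("rbind", replicate(n, d, simplify = FALSE)): stack n copies, renumber rows
stacked = pd.concat([d] * n, ignore_index=True)

# Like d[rep(seq_len(nrow(d)), n), ]: tile the row positions, keeping the original labels
tiled = d.iloc[np.tile(np.arange(len(d)), n)]

print(stacked)
print(tiled.index.tolist())
```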

Repeat Rows in Data Frame n Times

Use a combination of pd.DataFrame.loc and pd.Index.repeat: repeat each index label test.times times, then select those rows with .loc.

test.loc[test.index.repeat(test.times)]

  id  times
0  a      2
0  a      2
1  b      3
1  b      3
1  b      3
2  c      1
3  d      5
3  d      5
3  d      5
3  d      5
3  d      5

To get a fresh 0 to n-1 index (matching your desired output), chain reset_index(drop=True):

test.loc[test.index.repeat(test.times)].reset_index(drop=True)

   id  times
0   a      2
1   a      2
2   b      3
3   b      3
4   b      3
5   c      1
6   d      5
7   d      5
8   d      5
9   d      5
10  d      5
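A self-contained sketch to try this out; the test frame is reconstructed from the output above, so its contents are an assumption about the question's data.

```python
import pandas as pd

# Sample frame reconstructed from the output above (assumed to match the question's data)
test = pd.DataFrame({"id": ["a", "b", "c", "d"], "times": [2, 3, 1, 5]})

# Each index label is repeated test.times times; .loc then selects those rows
repeated = test.loc[test.index.repeat(test.times)]
print(repeated.index.tolist())

# Fresh 0..n-1 index, as in the second output
flat = repeated.reset_index(drop=True)
print(flat)
```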

How to repeat rows until a certain number of rows is reached in R

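We can use rep with sample: rep(seq_len(nrow(...)), length.out = ...) recycles the row indices of the shorter data frame up to the row count of the longer one, sample shuffles them, and cbind then puts the two data frames side by side. The final row.names(out) <- NULL resets the row names.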

if(nrow(df2) > nrow(df1)) {

  i1 <- sample(rep(seq_len(nrow(df1)), length.out = nrow(df2)))
  out <- cbind(df1[i1,], df2)
} else {

  i1 <- sample(rep(seq_len(nrow(df2)), length.out = nrow(df1)))
  out <- cbind(df1, df2[i1,])
}

row.names(out) <- NULL

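Output (sample() shuffles the indices, so the pairing differs between runs; call set.seed() first for a reproducible result):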

> out
   A  B  C  D
1 12 13 19 20
2 12 13 20 30
3 15 16 10 13
4 12 13 54 32
5 15 16 34 10

Data:

df1 <- structure(list(A = c(12L, 15L), B = c(13L, 16L)),
                 class = "data.frame", row.names = c("x", "y"))

df2 <- structure(list(C = c(19L, 20L, 10L, 54L, 34L),
                      D = c(20L, 30L, 13L, 32L, 10L)),
                 class = "data.frame", row.names = c("z", "w", "r", "k", "f"))

Repeat rows in pandas data frame with a sequential change in a column value

I took a different approach: pivoting and then melting.
It seems to work. Does anybody see an issue?

import pandas as pd

data = {'year': ['2000', '2000', '2005', '2005', '2007', '2007', '2007', '2009'],
        'country': ['UK', 'US', 'FR', 'US', 'UK', 'FR', 'US', 'UK'],
        'sales': [10, 21, 20, 10, 12, 20, 10, 12],
        'rep': ['john', 'john', 'claire', 'claire', 'kyle', 'kyle', 'kyle', 'amy']}
df = pd.DataFrame(data)


   year country  sales     rep
0  2000      UK     10    john
1  2000      US     21    john
2  2005      FR     20  claire
3  2005      US     10  claire
4  2007      UK     12    kyle
5  2007      FR     20    kyle
6  2007      US     10    kyle
7  2009      UK     12     amy

First, pivot so that each year becomes a column:

dfp=pd.pivot_table(df,index=['country','rep'],values=['sales'],columns=['year']).fillna(0)
dfp=dfp.xs('sales', axis=1, drop_level=True)

year            2000  2005  2007  2009
country rep                           
FR      claire   0.0  20.0   0.0   0.0
        kyle     0.0   0.0  20.0   0.0
UK      amy      0.0   0.0   0.0  12.0
        john    10.0   0.0   0.0   0.0
        kyle     0.0   0.0  12.0   0.0
US      claire   0.0  10.0   0.0   0.0
        john    21.0   0.0   0.0   0.0
        kyle     0.0   0.0  10.0   0.0

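Then a little logic to fill in the missing years: each observed year's column is copied into every year up to (but not including) the next observed year.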

cols = dfp.columns.astype(int).values
dft = dfp.copy()
# copy each year's column into the missing years before the next observed year
for i, col in enumerate(cols[:-1]):
    for newcol in range(col + 1, cols[i + 1]):
        dft[str(newcol)] = dft[str(col)]

year            2000  2005  2007  2009  2001  2002  2003  2004  2006  2008
country rep                                                               
FR      claire   0.0  20.0   0.0   0.0   0.0   0.0   0.0   0.0  20.0   0.0
        kyle     0.0   0.0  20.0   0.0   0.0   0.0   0.0   0.0   0.0  20.0
UK      amy      0.0   0.0   0.0  12.0   0.0   0.0   0.0   0.0   0.0   0.0
        john    10.0   0.0   0.0   0.0  10.0  10.0  10.0  10.0   0.0   0.0
        kyle     0.0   0.0  12.0   0.0   0.0   0.0   0.0   0.0   0.0  12.0
US      claire   0.0  10.0   0.0   0.0   0.0   0.0   0.0   0.0  10.0   0.0
        john    21.0   0.0   0.0   0.0  21.0  21.0  21.0  21.0   0.0   0.0
        kyle     0.0   0.0  10.0   0.0   0.0   0.0   0.0   0.0   0.0  10.0

Finally, melt to get back to the original long format, keeping only the rows with sales:

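dfm = dft.reset_index()
dfm = dfm.melt(id_vars=['country', 'rep'], value_vars=dfm.columns.values[2:], var_name='Year', value_name='sales')
# keep only rows with sales; drop=True (a boolean, not the string 'True') discards the old index
dfm = dfm.loc[dfm.sales > 0].reset_index(drop=True)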

    country rep Year    sales
0   UK  john    2000    10.0
1   US  john    2000    21.0
2   FR  claire  2005    20.0
3   US  claire  2005    10.0
4   FR  kyle    2007    20.0
5   UK  kyle    2007    12.0
6   US  kyle    2007    10.0
7   UK  amy     2009    12.0
8   UK  john    2001    10.0
9   US  john    2001    21.0
10  UK  john    2002    10.0
11  US  john    2002    21.0
12  UK  john    2003    10.0
13  US  john    2003    21.0
14  UK  john    2004    10.0
15  US  john    2004    21.0
16  FR  claire  2006    20.0
17  US  claire  2006    10.0
18  FR  kyle    2008    20.0
19  UK  kyle    2008    12.0
20  US  kyle    2008    10.0
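A shorter sketch of the same pipeline, swapping the explicit loop for DataFrame.reindex with method='ffill', which copies the nearest earlier year's column into each missing year (this requires the integer year columns to be sorted). Passing values='sales' as a string rather than a list also avoids the extra column level, so xs() is no longer needed. One caveat that applies to both versions: the final sales > 0 filter also drops any rows whose recorded sales really were 0.

```python
import pandas as pd

data = {'year': ['2000', '2000', '2005', '2005', '2007', '2007', '2007', '2009'],
        'country': ['UK', 'US', 'FR', 'US', 'UK', 'FR', 'US', 'UK'],
        'sales': [10, 21, 20, 10, 12, 20, 10, 12],
        'rep': ['john', 'john', 'claire', 'claire', 'kyle', 'kyle', 'kyle', 'amy']}
df = pd.DataFrame(data)

# values as a string (not a list) gives single-level year columns
dfp = pd.pivot_table(df, index=['country', 'rep'], values='sales', columns='year').fillna(0)
dfp.columns = dfp.columns.astype(int)

# Reindex to every year in the range; method='ffill' fills each new column
# from the closest earlier existing column (what the explicit loop does)
years = range(dfp.columns.min(), dfp.columns.max() + 1)
dft = dfp.reindex(columns=years, method='ffill')

dfm = dft.reset_index().melt(id_vars=['country', 'rep'], var_name='Year', value_name='sales')
dfm = dfm[dfm['sales'] > 0].reset_index(drop=True)
print(dfm)
```

Unlike the loop, the filled-in years come out in chronological order, so the rows of dfm are ordered by year.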

How do you repeat each row of a dataframe for each value in a separate dataframe and then combine the two into a single dataframe?

You can assign a redundant key column to each DataFrame (without mutating the original DataFrames) and join on it, then drop it before returning the final result:

import pandas as pd

df1 = pd.DataFrame({
    'id': list(range(1, 5))
})

df2 = pd.DataFrame({
    'month': ['2010-01', '2010-02', '2010-03']
})

df_merged = pd.merge(
    df1.assign(key=1),
    df2.assign(key=1),
    on='key'
).drop('key', axis=1)

+----+----+---------+
|    | id |  month  |
+----+----+---------+
|  0 |  1 | 2010-01 |
|  1 |  1 | 2010-02 |
|  2 |  1 | 2010-03 |
|  3 |  2 | 2010-01 |
|  4 |  2 | 2010-02 |
|  5 |  2 | 2010-03 |
|  6 |  3 | 2010-01 |
|  7 |  3 | 2010-02 |
|  8 |  3 | 2010-03 |
|  9 |  4 | 2010-01 |
| 10 |  4 | 2010-02 |
| 11 |  4 | 2010-03 |
+----+----+---------+
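On pandas 1.2 and later, merge also supports how='cross', which builds the same Cartesian product without the temporary key column. A minimal sketch comparing the two:

```python
import pandas as pd

df1 = pd.DataFrame({'id': list(range(1, 5))})
df2 = pd.DataFrame({'month': ['2010-01', '2010-02', '2010-03']})

# Key-column approach from the answer above
df_merged = pd.merge(df1.assign(key=1), df2.assign(key=1), on='key').drop('key', axis=1)

# Built-in Cartesian product (pandas 1.2+): no helper column needed
df_cross = pd.merge(df1, df2, how='cross')

print(df_cross)
```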

Repeat rows in a pandas DataFrame based on column value

Use reindex together with Index.repeat:

df.reindex(df.index.repeat(df.persons))
   code  .     role ..1  persons
0   123  .  Janitor   .        3
0   123  .  Janitor   .        3
0   123  .  Janitor   .        3
1   123  .  Analyst   .        2
1   123  .  Analyst   .        2
2   321  .   Vallet   .        2
2   321  .   Vallet   .        2
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5

PS: chain .reset_index(drop=True) to get a fresh 0 to n-1 index.
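A self-contained version of the answer, using only the visible columns from the output (the elided "." columns are left out, so this frame is an assumption). reindex needs the frame's own index to be unique, but the target labels may repeat; the PS's reset_index(drop=True) is chained on at the end.

```python
import pandas as pd

# Only the visible columns from the output above; the elided "." columns are left out
df = pd.DataFrame({
    'code': [123, 123, 321, 321],
    'role': ['Janitor', 'Analyst', 'Vallet', 'Auditor'],
    'persons': [3, 2, 2, 5],
})

# Repeat each index label `persons` times, then pull those rows with reindex
out = df.reindex(df.index.repeat(df.persons)).reset_index(drop=True)
print(out)
```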

Pandas data frame repeat each row a certain number of times

Create a dictionary with the number of repeats for each Minutiae value, turn it into per-row counts with Series.map, repeat the index with Index.repeat, and finally select the repeated rows with DataFrame.loc:

print (df)
   Minutiae        LR
0         1  1.975476
1         2  1.082983
2         3  0.269608
3         4  0.878350

d = {1:2, 2:1, 3:5, 4:3}

df1 = df.loc[df.index.repeat(df['Minutiae'].map(d))]
print (df1)
   Minutiae        LR
0         1  1.975476
0         1  1.975476
1         2  1.082983
2         3  0.269608
2         3  0.269608
2         3  0.269608
2         3  0.269608
2         3  0.269608
3         4  0.878350
3         4  0.878350
3         4  0.878350

Detail:

print (df['Minutiae'].map(d))
0    2
1    1
2    5
3    3
Name: Minutiae, dtype: int64

print (df.index.repeat(df['Minutiae'].map(d)))
Int64Index([0, 0, 1, 2, 2, 2, 2, 2, 3, 3, 3], dtype='int64')

Or create a new column holding the repeat counts:

df['repeat'] = [2,1,5,3]
print (df)
   Minutiae        LR  repeat
0         1  1.975476       2
1         2  1.082983       1
2         3  0.269608       5
3         4  0.878350       3

df2 = df.loc[df.index.repeat(df['repeat'])]
print (df2)
   Minutiae        LR  repeat
0         1  1.975476       2
0         1  1.975476       2
1         2  1.082983       1
2         3  0.269608       5
2         3  0.269608       5
2         3  0.269608       5
2         3  0.269608       5
2         3  0.269608       5
3         4  0.878350       3
3         4  0.878350       3
3         4  0.878350       3
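One practical detail when the counts come from Series.map: any Minutiae value without a key in d maps to NaN, and Index.repeat needs integer counts, so the call typically fails with a TypeError. A minimal sketch that fills those gaps with 0, which drops the unmatched rows; the extra Minutiae = 5 row is made-up data added to show the case.

```python
import pandas as pd

# The question's data plus a made-up fifth row whose Minutiae value is not in d
df = pd.DataFrame({'Minutiae': [1, 2, 3, 4, 5],
                   'LR': [1.975476, 1.082983, 0.269608, 0.878350, 0.5]})
d = {1: 2, 2: 1, 3: 5, 4: 3}   # no entry for Minutiae == 5

# map() yields NaN for unmapped values; fill with 0 (row dropped) and cast to int
counts = df['Minutiae'].map(d).fillna(0).astype(int)
df1 = df.loc[df.index.repeat(counts)].reset_index(drop=True)
print(df1)
```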
