
# Repeat Dataframe Rows N Times According to the Unique Column Values and to Each Row Repeat Create a New Column With Different Values

## Repeat dataframe rows n times according to the unique column values and to each row repeat create a new column with different values

One solution is to convert `'Cs'` to a Categorical. Then use `GroupBy` + `first`:

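```python
df['Cs'] = df['Cs'].astype('category')
# observed=False keeps every (Samp, Cs) combination, including unobserved ones;
# newer pandas versions no longer do this by default
res = df.groupby(['Samp', 'Cs'], observed=False).first().reset_index()
res['Age'] = res.groupby('Samp')['Age'].transform('first').astype(int)
```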

Result

```
   Samp   Cs  Age
0     A  cin   51
1     A  ebv   51
2     A   gs   51
3     A  msi   51
4     B  cin   62
5     B  ebv   62
6     B   gs   62
7     B  msi   62
8     C  cin   55
9     C  ebv   55
10    C   gs   55
11    C  msi   55
12    D  cin   70
13    D  ebv   70
14    D   gs   70
15    D  msi   70
16    E  cin   56
17    E  ebv   56
18    E   gs   56
19    E  msi   56
```
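The following is a minimal, self-contained sketch of the same idea. The input frame is hypothetical (the original question's data is not shown here), and `observed=False` is passed explicitly on the assumption that you want every `Samp` paired with every category of `Cs`, whether or not that pair occurs in the data.

```python
import pandas as pd

# Hypothetical input: one row per sample, each with a single 'Cs' value.
df = pd.DataFrame({
    "Samp": ["A", "B", "C"],
    "Cs": ["cin", "ebv", "gs"],
    "Age": [51, 62, 55],
})

# A categorical column lets groupby emit unobserved categories as well.
df["Cs"] = df["Cs"].astype("category")

# One row per (Samp, Cs) combination; 'Age' is NaN where the pair was absent.
res = df.groupby(["Samp", "Cs"], observed=False).first().reset_index()

# Broadcast each sample's first non-null age to all of its rows.
res["Age"] = res.groupby("Samp")["Age"].transform("first").astype(int)

print(res)
```

With three samples and three categories this should give nine rows, each sample's age repeated across all `Cs` values.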

## Repeat Rows in Data Frame n Times

Use a combination of `pd.DataFrame.loc` and `pd.Index.repeat`:

```python
test.loc[test.index.repeat(test.times)]
```

```
  id  times
0  a      2
0  a      2
1  b      3
1  b      3
1  b      3
2  c      1
3  d      5
3  d      5
3  d      5
3  d      5
3  d      5
```

To mimic your exact output, use `reset_index`:

```python
test.loc[test.index.repeat(test.times)].reset_index(drop=True)
```

```
   id  times
0   a      2
1   a      2
2   b      3
3   b      3
4   b      3
5   c      1
6   d      5
7   d      5
8   d      5
9   d      5
10  d      5
```
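For a runnable version, the sketch below rebuilds a `test` frame from the output shown above (the construction itself is an assumption; the original question's code is not reproduced here) and applies the same `loc` + `Index.repeat` pattern.

```python
import pandas as pd

# Input reconstructed from the printed output above.
test = pd.DataFrame({"id": ["a", "b", "c", "d"], "times": [2, 3, 1, 5]})

# Index.repeat yields each label 'times' times; .loc then selects those rows.
repeated = test.loc[test.index.repeat(test["times"])]

# Drop the duplicated labels in favour of a fresh 0..n-1 index.
repeated = repeated.reset_index(drop=True)

print(repeated)
```

The number of output rows equals `test["times"].sum()`, which is a quick sanity check when the counts come from real data.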

## Repeat rows in DataFrame N times based on len(list) in column with different list values

Use `DataFrame.explode` (available in pandas 0.25+) to get one row per list element, then build the new columns with the `DataFrame` constructor:

```
print (date_df)
   a                                               date
0  4       [[2017-02-01 00:00:00, 2017-03-01 00:00:00]]
1  7  [[2017-02-01 00:00:00, 2017-04-01 00:00:00], [...

df = date_df.explode('date')
print (df)
   a                                        date
0  4  [2017-02-01 00:00:00, 2017-03-01 00:00:00]
1  7  [2017-02-01 00:00:00, 2017-04-01 00:00:00]
1  7  [2017-02-01 00:00:00, 2017-04-01 00:00:00]

df[['date_start','date_end']] = pd.DataFrame(df.pop('date').values.tolist(), index=df.index)
print (df)
   a date_start   date_end
0  4 2017-02-01 2017-03-01
1  7 2017-02-01 2017-04-01
1  7 2017-02-01 2017-04-01
```
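As a self-contained sketch of the `explode` approach: the frame below is illustrative only (the second pair for `a == 7` is made up so the rows are distinguishable), and the index is reset after exploding, which the answer above does not do, so that the new columns are assigned against unique labels.

```python
import pandas as pd

# Hypothetical input: each 'date' cell holds a list of [start, end] pairs.
date_df = pd.DataFrame({
    "a": [4, 7],
    "date": [
        [[pd.Timestamp("2017-02-01"), pd.Timestamp("2017-03-01")]],
        [
            [pd.Timestamp("2017-02-01"), pd.Timestamp("2017-04-01")],
            [pd.Timestamp("2017-03-01"), pd.Timestamp("2017-05-01")],
        ],
    ],
})

# One output row per [start, end] pair.
exploded = date_df.explode("date").reset_index(drop=True)

# Split each remaining two-element list into separate columns.
pairs = pd.DataFrame(exploded.pop("date").tolist(), index=exploded.index)
exploded[["date_start", "date_end"]] = pairs

print(exploded)
```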

EDIT:

Solution for older pandas versions:

```
s = date_df.pop('date')
df = date_df.loc[date_df.index.repeat(s.str.len())]
df[['date_start','date_end']] = pd.DataFrame(np.concatenate(s), index=df.index)
df = df.reset_index(drop=True)
print (df)
   a date_start   date_end
0  4 2017-02-01 2017-03-01
1  7 2017-02-01 2017-04-01
2  7 2017-02-01 2017-04-01
```

## pandas - Copy each row 'n' times depending on column value

Use `Index.repeat`, `DataFrame.loc`, `DataFrame.assign` and `DataFrame.reset_index`:

```python
new_df = df.loc[df.index.repeat(df['orig_qty'])].assign(fifo_qty=1).reset_index(drop=True)
```

Output:

```
         date  orig_qty  price  fifo_qty
0  2019-04-08         4  115.0         1
1  2019-04-08         4  115.0         1
2  2019-04-08         4  115.0         1
3  2019-04-08         4  115.0         1
4  2019-04-09         2  103.0         1
5  2019-04-09         2  103.0         1
```
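A runnable sketch with an input frame reconstructed from the output above (the construction is an assumption) shows a useful invariant of this pattern: because every repeated row gets `fifo_qty=1`, the per-date sum of `fifo_qty` should equal the original `orig_qty`.

```python
import pandas as pd

# Input reconstructed from the printed output above.
df = pd.DataFrame({
    "date": ["2019-04-08", "2019-04-09"],
    "orig_qty": [4, 2],
    "price": [115.0, 103.0],
})

# Repeat each row orig_qty times and tag every copy as a single unit.
new_df = (
    df.loc[df.index.repeat(df["orig_qty"])]
    .assign(fifo_qty=1)
    .reset_index(drop=True)
)

# Sanity check: unit rows per date add back up to the original quantity.
units_per_date = new_df.groupby("date")["fifo_qty"].sum()
print(units_per_date)
```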

## How to keep duplicated rows that repeat exactly n times in pandas DataFrame

Use `.transform` with `count` to get each row's group size, then filter with a boolean mask.

```python
s = df.groupby('peak_start')['peak_start'].transform('count')
```

```
df[s == 2]
   peak_start  peak_end  motif_start  motif_end strand
0         948       177      3210085    3210103      -
1         948       177      3210047    3210065      +

print(df[s == 3])
   peak_start  peak_end  motif_start  motif_end strand
2          62       419      3223269    3223287      -
3          62       419      3223229    3223247      +
4          62       419      3223232    3223250      +
```
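The sketch below runs the same filter on a small made-up frame (the values and the extra singleton row are illustrative, not the asker's data). `transform('count')` returns a Series aligned with the original rows, so it can be compared against `n` and used directly as a mask.

```python
import pandas as pd

# Hypothetical data: 948 appears twice, 62 three times, 500 once.
df = pd.DataFrame({
    "peak_start": [948, 948, 62, 62, 62, 500],
    "strand": ["-", "+", "-", "+", "+", "-"],
})

# Group size broadcast back to every row of the group.
s = df.groupby("peak_start")["peak_start"].transform("count")

exactly_twice = df[s == 2]    # rows whose peak_start occurs exactly 2 times
exactly_thrice = df[s == 3]   # rows whose peak_start occurs exactly 3 times

print(exactly_twice)
print(exactly_thrice)
```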

## Repeat rows in a pandas DataFrame based on column value

Use `reindex` + `repeat`:

```python
df.reindex(df.index.repeat(df.persons))
```

```
   code  .     role ..1  persons
0   123  .  Janitor   .        3
0   123  .  Janitor   .        3
0   123  .  Janitor   .        3
1   123  .  Analyst   .        2
1   123  .  Analyst   .        2
2   321  .   Vallet   .        2
2   321  .   Vallet   .        2
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
```

PS: you can append `.reset_index(drop=True)` to get a fresh index.

## Duplicating rows n times, where n is a value of a string

You could extend your example with the following code:

```r
set.seed(5)
df <- data.frame(state = c('A','B'), city = c('Other (3)','Other (2)'), count = c('250','50'))
times <- as.numeric(gsub(".*\\((.*)\\).*", "\\1", df$city))
df$count <- as.numeric(df$count)/times
output <- df[rep(seq_along(times),times),]
```

The key addition is the line creating output, which uses row indexing on the input dataframe to repeat each row as required.
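For readers following the pandas sections above, here is a rough Python analogue of the R answer. It is a sketch, not part of the original answer: it assumes the repeat count is always a run of digits in parentheses inside `city`, and uses `Series.str.extract` to pull it out.

```python
import pandas as pd

# Same toy data as the R example.
df = pd.DataFrame({
    "state": ["A", "B"],
    "city": ["Other (3)", "Other (2)"],
    "count": ["250", "50"],
})

# Pull the number inside the parentheses, e.g. "Other (3)" -> 3.
times = df["city"].str.extract(r"\((\d+)\)", expand=False).astype(int)

# Split each count evenly across the rows it will be repeated into.
df["count"] = df["count"].astype(float) / times

# Repeat each row 'times' times via its index label.
output = df.loc[df.index.repeat(times)]

print(output)
```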