Repeat Rows in a Pandas Dataframe Based on Column Value

Repeat rows in a pandas DataFrame based on column value

reindex + repeat

df.reindex(df.index.repeat(df.persons))
Output:

   code     role  persons
0   123  Janitor        3
0   123  Janitor        3
0   123  Janitor        3
1   123  Analyst        2
1   123  Analyst        2
2   321   Vallet        2
2   321   Vallet        2
3   321  Auditor        5
3   321  Auditor        5
3   321  Auditor        5
3   321  Auditor        5
3   321  Auditor        5

PS: you can append .reset_index(drop=True) to replace the repeated index labels with a fresh 0..n-1 index.
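For a version that can be run as-is, here is a minimal sketch. The DataFrame is rebuilt from the output shown above, so the exact input (and the variable name repeated) is an assumption rather than the asker's original data.

```python
import pandas as pd

# Sample data reconstructed from the output above (assumed input)
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# Repeat each index label `persons` times, then reindex on the repeated labels
repeated = df.reindex(df.index.repeat(df["persons"]))

# Optional: replace the duplicated labels with a fresh 0..n-1 index
repeated = repeated.reset_index(drop=True)
print(repeated)
```

Note that reindex needs the original index to be unique; if it is not, df.loc[df.index.repeat(df["persons"])] is one alternative.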

Repeat rows in a Polars DataFrame based on column value

You were close. What you were looking for was the repeat_by expression.

First some data. I'm going to add an ID column, just to show how to apply the repeat_by expression to multiple columns (but exclude Quantity).

import polars as pl

df = (
    pl.DataFrame({
        'ID': [100, 200],
        'Fruit': ["Apple", "Banana"],
        'Quantity': [2, 3],
    })
)
df
shape: (2, 3)
┌─────┬────────┬──────────┐
│ ID  ┆ Fruit  ┆ Quantity │
│ --- ┆ ---    ┆ ---      │
│ i64 ┆ str    ┆ i64      │
╞═════╪════════╪══════════╡
│ 100 ┆ Apple  ┆ 2        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 3        │
└─────┴────────┴──────────┘

The Algorithm

(
    df
    .select(
        pl.exclude('Quantity').repeat_by('Quantity').explode()
    )
    .with_columns(
        pl.lit(1).alias('Quantity')
    )
)
shape: (5, 3)
┌─────┬────────┬──────────┐
│ ID  ┆ Fruit  ┆ Quantity │
│ --- ┆ ---    ┆ ---      │
│ i64 ┆ str    ┆ i32      │
╞═════╪════════╪══════════╡
│ 100 ┆ Apple  ┆ 1        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 100 ┆ Apple  ┆ 1        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 1        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 1        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 1        │
└─────┴────────┴──────────┘

How it works

The repeat_by expression will repeat a value in a Series by the value in another column/expression. In this case, we want to repeat by the value in Quantity.

We'll also use the exclude expression to apply repeat_by to all columns except Quantity (which we'll replace later).

Note that the result of repeat_by is a list.

(
    df
    .select(
        pl.exclude('Quantity').repeat_by('Quantity')
    )
)
shape: (2, 2)
┌─────────────────┬────────────────────────────────┐
│ ID              ┆ Fruit                          │
│ ---             ┆ ---                            │
│ list[i64]       ┆ list[str]                      │
╞═════════════════╪════════════════════════════════╡
│ [100, 100]      ┆ ["Apple", "Apple"]             │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [200, 200, 200] ┆ ["Banana", "Banana", "Banana"] │
└─────────────────┴────────────────────────────────┘

Next, we use explode, which will take each element of each list and place it on its own row.

(
    df
    .select(
        pl.exclude('Quantity').repeat_by('Quantity').explode()
    )
)
shape: (5, 2)
┌─────┬────────┐
│ ID  ┆ Fruit  │
│ --- ┆ ---    │
│ i64 ┆ str    │
╞═════╪════════╡
│ 100 ┆ Apple  │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 100 ┆ Apple  │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana │
└─────┴────────┘

From there, we use the lit expression to add Quantity back to the DataFrame.
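If the two-step idea is easier to follow without Polars, here is a plain-Python sketch of it. The helper functions repeat_by and explode below are hypothetical stand-ins written for illustration; they only mimic what the Polars expressions do conceptually and are not the library's implementation.

```python
from itertools import chain


def repeat_by(values, counts):
    """Mimic repeat_by: one list per value, holding that value `count` times."""
    return [[value] * count for value, count in zip(values, counts)]


def explode(nested):
    """Mimic explode: flatten the lists so each element gets its own row."""
    return list(chain.from_iterable(nested))


fruit = ["Apple", "Banana"]
quantity = [2, 3]

nested = repeat_by(fruit, quantity)  # one list per original row
flat = explode(nested)               # one element per output row

print(nested)
print(flat)
```

The intermediate nested value corresponds to the list column shown earlier, and flat corresponds to the exploded Fruit column.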

Create duplicate row in Pandas dataframe based on condition, and change values for a specific column

One way to solve this is to create a second DataFrame containing every row where Interval is not 0. Taking a .copy() avoids a SettingWithCopyWarning when that DataFrame is modified:

df2 = df[df.Interval != 0].copy()

Then map the Specs values from the rows with Interval == 0 onto the Specs column of the new DataFrame, matching on Item:

df2.loc[:, 'Specs'] = df2['Item'].map(df[df.Interval == 0].set_index('Item')['Specs'])

Finally, concatenate the two DataFrames:

df = pd.concat([df, df2], axis=0)

This will give you the desired output.
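The original question's data is not shown here, so the following is a sketch on made-up data. The values in Item, Interval and Specs are hypothetical, and it assumes each Item has exactly one row with Interval == 0 (map needs a unique lookup index).

```python
import pandas as pd

# Hypothetical data: each Item has one baseline row (Interval == 0)
df = pd.DataFrame({
    "Item": ["A", "A", "B", "B"],
    "Interval": [0, 5, 0, 3],
    "Specs": ["base_A", "x", "base_B", "y"],
})

# Rows to duplicate; .copy() so the assignment below does not touch df
df2 = df[df.Interval != 0].copy()

# Look up each Item's baseline Specs and write it into the duplicates
baseline = df[df.Interval == 0].set_index("Item")["Specs"]
df2["Specs"] = df2["Item"].map(baseline)

# Stack the originals and the modified duplicates
result = pd.concat([df, df2], axis=0, ignore_index=True)
print(result)
```

Here ignore_index=True is a choice made for the sketch so the result gets a clean index; drop it to keep the original row labels on the duplicates.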

Python: How to replicate rows in Dataframe with column value but changing the column value to its range

Repeat each row Table times with index.repeat, then number the copies of each original row with groupby().cumcount():

out = df.loc[df.index.repeat(df['Table'])]
out['Table'] = out.groupby(level=0).cumcount() + 1

Output:

   Store  Aisle  Table
0     11     59      1
0     11     59      2
1     11     61      1
1     11     61      2
1     11     61      3
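A self-contained version of the same two lines, with the input reconstructed from the output above (so the starting DataFrame is an assumption):

```python
import pandas as pd

# Input inferred from the output above: Table holds the repeat count
df = pd.DataFrame({"Store": [11, 11], "Aisle": [59, 61], "Table": [2, 3]})

# Repeat each row `Table` times; the repeated rows keep their original label
out = df.loc[df.index.repeat(df["Table"])].copy()

# Group by that original label (level=0) and count 1..n within each group
out["Table"] = out.groupby(level=0).cumcount() + 1
print(out)
```

Grouping on level=0 works because the repeated rows still share the index label of the row they came from, so it should run before any reset_index.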

python - Repeat rows in a pandas DataFrame based on column value and add 1 day in date in each repeated row

First build a date range for each row, then explode it so every date gets its own row. Finally, find the row holding the earliest date for each id and blank out add_days and status (with an empty string or NaN) on all the other rows.

df['date'] = pd.to_datetime(df['date'], yearfirst=True)
# One date range per row: add_days consecutive days starting at date
df['date'] = df.apply(
    lambda row: pd.date_range(row['date'],
                              row['date'] + pd.to_timedelta(row['add_days'] - 1, 'D')),
    axis=1,
)
df = df.explode('date', ignore_index=True)
# Keep add_days and status only on the earliest date of each id
df.loc[~df.index.isin(df.groupby('id')['date'].idxmin()), ['add_days', 'status']] = ''

OUTPUT:

    id       date add_days     status
0    1 2021-01-01        3  Completed
1    1 2021-01-02
2    1 2021-01-03
3    2 2021-03-05        5  Completed
4    2 2021-03-06
5    2 2021-03-07
6    2 2021-03-08
7    2 2021-03-09
8    3 2021-02-27        3    Pending
9    3 2021-02-28
10   3 2021-03-01
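As an alternative sketch that avoids the row-wise apply, the same result can be approached with index.repeat plus cumcount. This is a different technique from the answer above, not a restatement of it. The input is reconstructed from the output shown, and this version blanks the repeated rows with NaN rather than an empty string, which also turns add_days into a float column.

```python
import pandas as pd

# Input inferred from the output above (assumed)
df = pd.DataFrame({
    "id": [1, 2, 3],
    "date": ["2021-01-01", "2021-03-05", "2021-02-27"],
    "add_days": [3, 5, 3],
    "status": ["Completed", "Completed", "Pending"],
})
df["date"] = pd.to_datetime(df["date"])

# Repeat each row add_days times, keeping the original label on each copy
out = df.loc[df.index.repeat(df["add_days"])].copy()

# 0, 1, 2, ... within each original row: the number of days to add
offset = out.groupby(level=0).cumcount().to_numpy()
out["date"] = out["date"] + pd.to_timedelta(offset, unit="D")

# Keep add_days and status only on the first copy of each row
is_first = offset == 0
out = out.reset_index(drop=True)
out["add_days"] = out["add_days"].where(is_first)
out["status"] = out["status"].where(is_first)
print(out)
```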

Is there a way to repeat row values in a column in pandas?

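df[column].ffill()

This forward-fills each missing value with the last valid value above it. Older answers often write df[column].fillna(method="ffill"); the method argument of fillna (one of 'backfill', 'bfill', 'pad', 'ffill' or None) is deprecated in recent pandas releases in favor of ffill() and bfill(). See the docs for an explanation and examples: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html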
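A small sketch of forward filling on made-up data (the column name group and its values are hypothetical):

```python
import pandas as pd

# Hypothetical column where a label appears once and blanks follow it
df = pd.DataFrame({
    "group": ["a", None, None, "b", None],
    "value": [1, 2, 3, 4, 5],
})

# Carry the last non-missing label down into the gaps below it
df["group"] = df["group"].ffill()
print(df)
```

Leading missing values, if any, stay missing because there is nothing above them to copy.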


