Repeat rows in a pandas DataFrame based on column value
Use reindex + repeat:
df.reindex(df.index.repeat(df.persons))
Out[951]:
code . role ..1 persons
0 123 . Janitor . 3
0 123 . Janitor . 3
0 123 . Janitor . 3
1 123 . Analyst . 2
1 123 . Analyst . 2
2 321 . Vallet . 2
2 321 . Vallet . 2
3 321 . Auditor . 5
3 321 . Auditor . 5
3 321 . Auditor . 5
3 321 . Auditor . 5
3 321 . Auditor . 5
PS: you can add .reset_index(drop=True) to get a new index.
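As a self-contained sketch of the approach above: the sample frame below is an assumption, reconstructed from the printed output (the stray "." columns are left out), so treat the data as illustrative only.

```python
import pandas as pd

# Hypothetical sample data modeled on the output shown above.
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# Repeat each index label `persons` times, then pull the rows for those labels.
repeated = df.reindex(df.index.repeat(df["persons"]))

# Replace the duplicated labels (0, 0, 0, 1, 1, ...) with a fresh 0..n-1 index.
repeated = repeated.reset_index(drop=True)

print(repeated)
```

Note that reindex needs the original index to be unique; with a default RangeIndex that is always the case.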
Repeat rows in a Polars DataFrame based on column value
You were close. What you were looking for is the repeat_by expression.
First, some data. I'm going to add an ID column, just to show how to apply the repeat_by expression to multiple columns (but exclude Quantity).
import polars as pl
df = (
pl.DataFrame({
'ID' : [100, 200],
'Fruit': ["Apple", "Banana"],
'Quantity': [2, 3],
})
)
df
shape: (2, 3)
┌─────┬────────┬──────────┐
│ ID ┆ Fruit ┆ Quantity │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞═════╪════════╪══════════╡
│ 100 ┆ Apple ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 3 │
└─────┴────────┴──────────┘
The Algorithm
(
df
.select(
pl.exclude('Quantity').repeat_by('Quantity').explode()
)
.with_columns(
pl.lit(1).alias('Quantity')
)
)
shape: (5, 3)
┌─────┬────────┬──────────┐
│ ID ┆ Fruit ┆ Quantity │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i32 │
╞═════╪════════╪══════════╡
│ 100 ┆ Apple ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 100 ┆ Apple ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 1 │
└─────┴────────┴──────────┘
How it works
The repeat_by expression will repeat a value in a Series by the value in another column/expression. In this case, we want to repeat by the value in Quantity.
We'll also use the exclude expression to apply repeat_by to all columns except Quantity (which we'll replace later).
Note that the result of repeat_by is a list.
(
df
.select(
pl.exclude('Quantity').repeat_by('Quantity')
)
)
shape: (2, 2)
┌─────────────────┬────────────────────────────────┐
│ ID ┆ Fruit │
│ --- ┆ --- │
│ list[i64] ┆ list[str] │
╞═════════════════╪════════════════════════════════╡
│ [100, 100] ┆ ["Apple", "Apple"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [200, 200, 200] ┆ ["Banana", "Banana", "Banana"] │
└─────────────────┴────────────────────────────────┘
Next, we use explode, which will take each element of each list and place it on its own row.
(
df
.select(
pl.exclude('Quantity').repeat_by('Quantity').explode()
)
)
shape: (5, 2)
┌─────┬────────┐
│ ID ┆ Fruit │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪════════╡
│ 100 ┆ Apple │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 100 ┆ Apple │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana │
└─────┴────────┘
From there, we use the lit expression to add Quantity back to the DataFrame.
Create duplicate row in Pandas dataframe based on condition, and change values for a specific column
One way of solving this is to create a second dataframe with all rows where Interval is not 0 (the .copy() avoids a SettingWithCopyWarning in the next step):
df2 = df[df.Interval != 0].copy()
Then map the Specs values from the rows with Interval == 0 onto the Specs column of the new dataframe, matching on Item:
df2['Specs'] = df2['Item'].map(df[df.Interval == 0].set_index('Item')['Specs'])
Finally, concatenate the two dataframes:
df = pd.concat([df, df2], axis=0)
This will give you the desired output.
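The original question's data is not shown here, so the following is only a sketch with made-up values. It assumes the columns Item, Interval and Specs, and that each Item has exactly one row with Interval == 0 (otherwise the lookup Series would have duplicate labels and map would fail).

```python
import pandas as pd

# Hypothetical data: one Interval == 0 "reference" row per Item.
df = pd.DataFrame({
    "Item": ["A", "A", "B", "B"],
    "Interval": [0, 5, 0, 7],
    "Specs": ["base-A", "x", "base-B", "y"],
})

# Rows to duplicate: everything that is not a reference row.
df2 = df[df["Interval"] != 0].copy()

# Lookup table Item -> Specs, built from the reference rows.
base_specs = df[df["Interval"] == 0].set_index("Item")["Specs"]

# Overwrite Specs in the duplicates with the reference value of the same Item.
df2["Specs"] = df2["Item"].map(base_specs)

# Append the modified duplicates below the original rows.
result = pd.concat([df, df2], axis=0, ignore_index=True)

print(result)
```

Passing ignore_index=True is a choice made for this sketch; without it, the duplicated rows keep their original index labels.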
Python: How to replicate rows in Dataframe with column value but changing the column value to its range
Repeat the rows with index.repeat, then use groupby().cumcount() to number the copies of each original row:
out = df.loc[df.index.repeat(df['Table'])]
out['Table'] = out.groupby(level=0).cumcount() + 1
Output:
Store Aisle Table
0 11 59 1
0 11 59 2
1 11 61 1
1 11 61 2
1 11 61 3
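For a version that runs on its own, the input frame below is an assumption inferred from the output above (two rows with Table values 2 and 3).

```python
import pandas as pd

# Hypothetical input, inferred from the printed output.
df = pd.DataFrame({"Store": [11, 11], "Aisle": [59, 61], "Table": [2, 3]})

# Repeat each row `Table` times; the copies keep the original index label.
out = df.loc[df.index.repeat(df["Table"])]

# Within each original row (same index label), number the copies 1..n.
out["Table"] = out.groupby(level=0).cumcount() + 1

print(out)
```

Grouping with level=0 works because the repeated rows share their index label; resetting the index before this step would break the numbering.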
python - Repeat rows in a pandas DataFrame based on column value and add 1 day in date in each repeated row
First, build the date range for each row, then explode it. Finally, find the index of the minimum date for each id and assign an empty string (or NaN) to add_days and status in every other row.
df['date'] = pd.to_datetime(df['date'], yearfirst=True)
df['date']=df.apply(lambda row: pd.date_range(row['date'], row['date'] + \
pd.to_timedelta(row['add_days']-1, 'D')),
axis=1)
df = df.explode('date', ignore_index=True)
df.loc[~(df.index.isin(df.groupby('id')['date'].idxmin())),['add_days', 'status']] = ''
OUTPUT:
id date add_days status
0 1 2021-01-01 3 Completed
1 1 2021-01-02
2 1 2021-01-03
3 2 2021-03-05 5 Completed
4 2 2021-03-06
5 2 2021-03-07
6 2 2021-03-08
7 2 2021-03-09
8 3 2021-02-27 3 Pending
9 3 2021-02-28
10 3 2021-03-01
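A possible alternative that avoids the row-wise apply is sketched below. It swaps in index.repeat plus groupby().cumcount() instead of date_range and explode, so it is not the method used above, and the sample data is made up.

```python
import pandas as pd

# Hypothetical input with the same columns as the output above.
df = pd.DataFrame({
    "id": [1, 2],
    "date": ["2021-01-01", "2021-03-05"],
    "add_days": [3, 2],
    "status": ["Completed", "Pending"],
})
df["date"] = pd.to_datetime(df["date"])

# Repeat each row `add_days` times.
out = df.loc[df.index.repeat(df["add_days"])].copy()

# 0, 1, 2, ... within each original row: the number of days to add.
offset = out.groupby(level=0).cumcount()
out["date"] = out["date"] + pd.to_timedelta(offset, unit="D")

# Keep add_days/status only on the first row of each group.
is_repeat = offset.to_numpy() > 0
out = out.reset_index(drop=True)
# Cast to object first so that mixing numbers and "" is allowed.
out[["add_days", "status"]] = out[["add_days", "status"]].astype(object)
out.loc[is_repeat, ["add_days", "status"]] = ""

print(out)
```

Unlike the groupby('id') step above, this marks repeats per original row, so it does not rely on id being unique.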
Is there a way to repeat row values in a column in pandas?
df[column].ffill()
This is equivalent to df[column].fillna(method="ffill"), which is deprecated as of pandas 2.1. For an explanation and examples of the fill methods ({‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}), see the docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html
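A minimal sketch of forward filling, using a made-up column named group in which only the first row of each block carries a value:

```python
import pandas as pd

# Hypothetical column where the value appears only on the first row of a block.
df = pd.DataFrame({"group": ["a", None, None, "b", None]})

# Propagate the last seen non-missing value downward.
df["group"] = df["group"].ffill()

print(df)
```

Note that ffill returns a new Series, so the result has to be assigned back to the column.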