Repeating Rows Based on Column Value in Each Row

Assuming you won't generate more than 1,000 rows per source row:

with num as (select level as rnk from dual connect by level <= 1000)
select job, quantity, status, repeat, rnk
from t join num on (num.rnk <= repeat)
order by job, rnk;

Here is a test:
http://sqlfiddle.com/#!4/4519f/12

UPDATE: As Jeffrey Kemp said, you can "detect" the maximum with a subquery:

with num as (
  select level as rnk
  from dual
  connect by level <= (select max(repeat) from t)
)
select job, quantity, status, repeat, rnk
from t join num on (num.rnk <= repeat)
order by job, rnk;
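The same numbers-table idea can be sketched in pandas (the data here is hypothetical, chosen to match the query's columns):

```python
import pandas as pd

# Hypothetical rows matching the columns in the query above
t = pd.DataFrame({
    'job': ['A', 'B'],
    'quantity': [10, 20],
    'status': ['OK', 'HOLD'],
    'repeat': [2, 3],
})

# Build a "numbers table" from 1 to max(repeat), cross-join it to t,
# then keep rows where rnk <= repeat -- the same join condition as the SQL.
num = pd.DataFrame({'rnk': range(1, t['repeat'].max() + 1)})
result = (
    t.merge(num, how='cross')
     .query('rnk <= repeat')
     .sort_values(['job', 'rnk'])
     .reset_index(drop=True)
)
```

Each input row appears once per value of `rnk` up to its `repeat`, exactly as in the SQL join.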

Repeat rows in a Polars DataFrame based on column value

You were close. What you were looking for was the repeat_by expression.

First some data. I'm going to add an ID column, just to show how to apply the repeat_by expression to multiple columns (but exclude Quantity).

import polars as pl

df = pl.DataFrame({
    'ID': [100, 200],
    'Fruit': ["Apple", "Banana"],
    'Quantity': [2, 3],
})
df
shape: (2, 3)
┌─────┬────────┬──────────┐
│ ID ┆ Fruit ┆ Quantity │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞═════╪════════╪══════════╡
│ 100 ┆ Apple ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 3 │
└─────┴────────┴──────────┘

The Algorithm

(
    df
    .select(
        pl.exclude('Quantity').repeat_by('Quantity').explode()
    )
    .with_columns(
        pl.lit(1).alias('Quantity')
    )
)
shape: (5, 3)
┌─────┬────────┬──────────┐
│ ID ┆ Fruit ┆ Quantity │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i32 │
╞═════╪════════╪══════════╡
│ 100 ┆ Apple ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 100 ┆ Apple ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 1 │
└─────┴────────┴──────────┘

How it works

The repeat_by expression will repeat a value in a Series by the value in another column/expression. In this case, we want to repeat by the value in Quantity.

We'll also use the exclude expression to apply repeat_by to all columns except Quantity (which we'll replace later).

Note that the result of repeat_by is a list.

(
    df
    .select(
        pl.exclude('Quantity').repeat_by('Quantity')
    )
)
shape: (2, 2)
┌─────────────────┬────────────────────────────────┐
│ ID ┆ Fruit │
│ --- ┆ --- │
│ list[i64] ┆ list[str] │
╞═════════════════╪════════════════════════════════╡
│ [100, 100] ┆ ["Apple", "Apple"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [200, 200, 200] ┆ ["Banana", "Banana", "Banana"] │
└─────────────────┴────────────────────────────────┘

Next, we use explode, which will take each element of each list and place it on its own row.

(
    df
    .select(
        pl.exclude('Quantity').repeat_by('Quantity').explode()
    )
)
shape: (5, 2)
┌─────┬────────┐
│ ID ┆ Fruit │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪════════╡
│ 100 ┆ Apple │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 100 ┆ Apple │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana │
└─────┴────────┘

From there, we use the lit expression to add Quantity back to the DataFrame.

Repeat rows in a pandas DataFrame based on column value

reindex + repeat

df.reindex(df.index.repeat(df.persons))
Out[951]:
   code     role  persons
0   123  Janitor        3
0   123  Janitor        3
0   123  Janitor        3
1   123  Analyst        2
1   123  Analyst        2
2   321   Vallet        2
2   321   Vallet        2
3   321  Auditor        5
3   321  Auditor        5
3   321  Auditor        5
3   321  Auditor        5
3   321  Auditor        5

PS: you can add .reset_index(drop=True) to get a fresh, sequential index.
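As a self-contained sketch (with made-up data matching the output above):

```python
import pandas as pd

df = pd.DataFrame({
    'code': [123, 123, 321, 321],
    'role': ['Janitor', 'Analyst', 'Vallet', 'Auditor'],
    'persons': [3, 2, 2, 5],
})

# Repeat each index label `persons` times, then reindex to duplicate the rows.
out = df.reindex(df.index.repeat(df.persons)).reset_index(drop=True)
```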

Repeat Rows N Times According to Column Value

You could do that with a recursive CTE using UNION ALL:

;WITH cte AS
(
    SELECT * FROM Table1

    UNION ALL

    SELECT cte.[ID], cte.ProductFK, (cte.[Order] - 1) [Order], cte.Price
    FROM cte
    INNER JOIN Table1 t ON cte.[ID] = t.[ID]
    WHERE cte.[Order] > 1
)
SELECT [ID], ProductFK, 1 [Order], Price
FROM cte
ORDER BY 1

Here's a working SQLFiddle.

Here's a longer explanation of this technique.


Since your input is too large for this recursion, you could instead use an auxiliary table containing "many" dummy rows and then take SELECT TOP([Order]) from it for each input row (via CROSS APPLY):

;WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
      E02(N) AS (SELECT 1 FROM E00 a, E00 b),
      E04(N) AS (SELECT 1 FROM E02 a, E02 b),
      E08(N) AS (SELECT 1 FROM E04 a, E04 b),
      E16(N) AS (SELECT 1 FROM E08 a, E08 b)
SELECT t.[ID], t.ProductFK, 1 [Order], t.Price
FROM Table1 t
CROSS APPLY (
    SELECT TOP(t.[Order]) N
    FROM E16
) ca
ORDER BY 1

(The auxiliary table is borrowed from here; it allows up to 65,536 rows per input row and can be extended if required.)

Here's a working SQLFiddle.

Repeating rows but changing column value each time

I think your best bet here is some recursive CTE:

WITH RECURSIVE quantitySpreader AS
(
    /* Recursive seed (starting point) */
    SELECT
        ID,
        CASE WHEN Quantity >= 200 THEN 200 ELSE Quantity END AS Quantity,
        Status,
        1 AS Depth,
        CASE WHEN test.Quantity >= 200 THEN test.Quantity - 200 ELSE 0 END AS remainder
    FROM test

    UNION ALL

    /* Recursive member (SQL that iterates until the join fails) */
    SELECT
        quantitySpreader.ID,
        CASE WHEN remainder >= 200 THEN 200 ELSE remainder END,
        quantitySpreader.Status,
        depth + 1,
        CASE WHEN remainder >= 200 THEN remainder - 200 ELSE 0 END
    FROM quantitySpreader
    INNER JOIN test
        ON quantitySpreader.ID = test.ID
        AND quantitySpreader.Quantity >= 200
    WHERE depth <= 10
)
SELECT id, quantity, status
FROM quantitySpreader
ORDER BY id, quantity DESC;

This can get a little heady, but recursive SQL like this is split into two chunks inside the CTE:

  1. The recursive seed (starting point). This defines where iteration begins. Here we want every record (so no WHERE clause is present) and we establish the first iteration: output 200 unless the quantity is less than 200, in which case output the quantity itself. We also track the recursion depth (to keep us from cycling endlessly) and the remainder left after subtracting that 200.
  2. After the UNION ALL is the recursive member. This SELECT repeats over and over, referring to its own result set (quantitySpreader), until the JOIN fails and returns nothing. Each iteration applies the same logic as the seed: if the remainder is at least 200, output 200 and recompute the remainder for the next iteration; otherwise output the remainder itself.

Here's an SQLFiddle of this in action. It's running on Postgres, but the syntax is nearly identical for SQL Server, so it should just be a copy/paste job.

Input:

CREATE TABLE test (id int, Quantity int, Status varchar(10));
INSERT INTO test VALUES (1, 250, 'OK');
INSERT INTO test VALUES (2, 440, 'HOLD');

Output:

id  quantity  status
 1       200  OK
 1        50  OK
 2       200  HOLD
 2       200  HOLD
 2        40  HOLD
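The splitting logic the CTE performs can be sketched in plain Python, which may help make the recursion clearer:

```python
def spread_quantity(qty, chunk=200):
    """Split qty into pieces of at most `chunk`, largest first,
    mirroring what successive iterations of the recursive CTE emit."""
    pieces = []
    while qty > 0:
        pieces.append(min(qty, chunk))
        qty -= chunk
    return pieces

print(spread_quantity(250))  # [200, 50]
print(spread_quantity(440))  # [200, 200, 40]
```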

Duplicate rows based on other columns containing values, then return row with split column value

Try this:

df.assign(Group=df['Group'].str.split('-')).explode('Group')

Output:

        Date  End Time Group Assignment
0   2/2/2021      1130     A       quiz
0   2/2/2021      1130     B       quiz
0   2/2/2021      1130     C       quiz
1   2/2/2021      1230   XYZ       test
2  1/22/2021      1330     B      paper
2  1/22/2021      1330     D      paper
3  1/22/2021      1130     A   homework
3  1/22/2021      1130     E   homework
3  1/22/2021      1130     C   homework

Using assign, we reassign Group as a list of strings by splitting on '-' with the str accessor's split method. Then pd.DataFrame.explode creates a row in the dataframe for each element of those lists.
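A self-contained version (data made up to match the output above):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2/2/2021', '2/2/2021', '1/22/2021', '1/22/2021'],
    'End Time': [1130, 1230, 1330, 1130],
    'Group': ['A-B-C', 'XYZ', 'B-D', 'A-E-C'],
    'Assignment': ['quiz', 'test', 'paper', 'homework'],
})

# Split Group on '-' into lists, then explode one row per list element.
out = df.assign(Group=df['Group'].str.split('-')).explode('Group')
```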

How to create duplicate rows based on a column values

You can use a hierarchical query:

select t1.ID, t1.TEXT
from TestTable1 t1
join TestTable2 t2 on t1.ID = t2.ID
connect by level <= t2.repeat
   and prior t1.ID = t1.ID
   and prior sys_guid() is not null;

Demo


