Repeat Dataframe Rows N Times According to the Unique Column Values and to Each Row Repeat Create a New Column With Different Values

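One solution is to convert 'Cs' to a Categorical, then use GroupBy + first. Grouping by a categorical column with observed=False returns every (Samp, Cs) combination, including ones absent from the data (their Age is NaN), and transform('first') then copies each sample's Age into those new rows:

import pandas as pd

df['Cs'] = df['Cs'].astype('category')  # every distinct Cs value becomes a category

# observed=False keeps all (Samp, Cs) pairs; unobserved pairs get NaN in Age
res = df.groupby(['Samp', 'Cs'], observed=False).first().reset_index()
res['Age'] = res.groupby('Samp')['Age'].transform('first').astype(int)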

Result:

    Samp   Cs  Age
0      A  cin   51
1      A  ebv   51
2      A   gs   51
3      A  msi   51
4      B  cin   62
5      B  ebv   62
6      B   gs   62
7      B  msi   62
8      C  cin   55
9      C  ebv   55
10     C   gs   55
11     C  msi   55
12     D  cin   70
13     D  ebv   70
14     D   gs   70
15     D  msi   70
16     E  cin   56
17     E  ebv   56
18     E   gs   56
19     E  msi   56
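A self-contained sketch of the same idea. The input is hypothetical: sample A only has 'cin' and 'gs', sample B only has 'ebv', and the explicit categories list stands in for whatever the full set of Cs values is in your data:

```python
import pandas as pd

# Hypothetical input: each sample is missing some of the Cs values.
df = pd.DataFrame({
    "Samp": ["A", "A", "B"],
    "Cs": ["cin", "gs", "ebv"],
    "Age": [51, 51, 62],
})

# Assumed full set of Cs values, declared as categories.
df["Cs"] = pd.Categorical(df["Cs"], categories=["cin", "ebv", "gs", "msi"])

# observed=False yields every (Samp, Cs) pair; unseen pairs get NaN in Age.
res = df.groupby(["Samp", "Cs"], observed=False).first().reset_index()

# Fill each sample's rows with its first non-null Age.
res["Age"] = res.groupby("Samp")["Age"].transform("first").astype(int)
print(res)
```

Passing observed=False explicitly matters on recent pandas: from 2.1 the implicit default emits a FutureWarning because it is scheduled to change to observed=True, which would drop the unobserved pairs this answer relies on.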

Repeat Rows in Data Frame n Times

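Use a combination of pd.DataFrame.loc and pd.Index.repeat: Index.repeat repeats each index label by the matching value in times, and .loc selects those labels, duplicating the rows: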

test.loc[test.index.repeat(test.times)]

   id  times
0   a      2
0   a      2
1   b      3
1   b      3
1   b      3
2   c      1
3   d      5
3   d      5
3   d      5
3   d      5
3   d      5

To mimic your exact output, use reset_index(drop=True):

test.loc[test.index.repeat(test.times)].reset_index(drop=True)

    id  times
0    a      2
1    a      2
2    b      3
3    b      3
4    b      3
5    c      1
6    d      5
7    d      5
8    d      5
9    d      5
10   d      5
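A runnable version of the same approach. It rebuilds test from the output above and adds a hypothetical row 'e' with times = 0, to show that a zero count removes the row rather than keeping one copy:

```python
import pandas as pd

# Data from the output above, plus a hypothetical row 'e' with a count of 0.
test = pd.DataFrame({"id": ["a", "b", "c", "d", "e"], "times": [2, 3, 1, 5, 0]})

# Index.repeat repeats each label by the matching 'times' value;
# a count of 0 drops the label, so row 'e' is absent from the result.
repeated = test.loc[test.index.repeat(test["times"])]

# Replace the duplicated labels with a fresh 0..n-1 index.
flat = repeated.reset_index(drop=True)
print(flat)
```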

Repeat rows in DataFrame N times based on len(list) in column with different list values

Use DataFrame.explode (available since pandas 0.25) to give each inner list its own row, then split the lists into new columns with the DataFrame constructor:

print (date_df)
   a  date
0  4  [[2017-02-01 00:00:00, 2017-03-01 00:00:00]]
1  7  [[2017-02-01 00:00:00, 2017-04-01 00:00:00], [...

df = date_df.explode('date')
print (df)
   a  date
0  4  [2017-02-01 00:00:00, 2017-03-01 00:00:00]
1  7  [2017-02-01 00:00:00, 2017-04-01 00:00:00]
1  7  [2017-02-01 00:00:00, 2017-04-01 00:00:00]

df[['date_start','date_end']] = pd.DataFrame(df.pop('date').values.tolist(), index=df.index)
print (df)
   a  date_start    date_end
0  4  2017-02-01  2017-03-01
1  7  2017-02-01  2017-04-01
1  7  2017-02-01  2017-04-01
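A self-contained sketch of the explode approach. The second row of date_df is truncated in the printout above, so its second pair (2017-05-01 to 2017-06-01) is hypothetical. This version also resets the index right after explode, so the new columns are built on a unique index:

```python
import pandas as pd

T = pd.Timestamp
# Hypothetical input: each 'date' cell holds a list of [start, end] pairs.
date_df = pd.DataFrame({
    "a": [4, 7],
    "date": [
        [[T("2017-02-01"), T("2017-03-01")]],
        [[T("2017-02-01"), T("2017-04-01")], [T("2017-05-01"), T("2017-06-01")]],
    ],
})

# One row per [start, end] pair, renumbered 0..n-1.
df = date_df.explode("date").reset_index(drop=True)

# Split each pair into two columns aligned on the new index.
df[["date_start", "date_end"]] = pd.DataFrame(df.pop("date").tolist(), index=df.index)
print(df)
```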

EDIT:

Solution for older pandas versions (before 0.25, where explode is not available):

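import numpy as np

s = date_df.pop('date')
# repeat each row once per [start, end] pair in its list
df = date_df.loc[date_df.index.repeat(s.str.len())]
# stack all pairs into one (n_pairs, 2) array, in the same order as the repeated rows
df[['date_start','date_end']] = pd.DataFrame(np.concatenate(s), index=df.index)
df = df.reset_index(drop=True)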
print (df)
   a  date_start    date_end
0  4  2017-02-01  2017-03-01
1  7  2017-02-01  2017-04-01
2  7  2017-02-01  2017-04-01
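The same hypothetical input run through the pre-0.25 approach. Here s.map(len) stands in for s.str.len() (both give the number of pairs per row), and the index is reset before the new columns are assigned so they line up on a unique index:

```python
import numpy as np
import pandas as pd

T = pd.Timestamp
# Hypothetical input, as in the explode sketch.
date_df = pd.DataFrame({
    "a": [4, 7],
    "date": [
        [[T("2017-02-01"), T("2017-03-01")]],
        [[T("2017-02-01"), T("2017-04-01")], [T("2017-05-01"), T("2017-06-01")]],
    ],
})

s = date_df.pop("date")

# Repeat each row once per pair, then renumber the rows 0..n-1.
df = date_df.loc[date_df.index.repeat(s.map(len))].reset_index(drop=True)

# np.concatenate stacks every row's pairs into one (n_pairs, 2) array, in row order.
df[["date_start", "date_end"]] = pd.DataFrame(np.concatenate(s.tolist()), index=df.index)
print(df)
```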

pandas - Copy each row 'n' times depending on column value

Use Index.repeat, DataFrame.loc, DataFrame.assign and DataFrame.reset_index: repeat each row orig_qty times, then set fifo_qty to 1 on every copy:

new_df = df.loc[df.index.repeat(df['orig_qty'])].assign(fifo_qty=1).reset_index(drop=True)

Output:

         date  orig_qty  price  fifo_qty
0  2019-04-08         4  115.0         1
1  2019-04-08         4  115.0         1
2  2019-04-08         4  115.0         1
3  2019-04-08         4  115.0         1
4  2019-04-09         2  103.0         1
5  2019-04-09         2  103.0         1
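A self-contained sketch that rebuilds the two input rows implied by the output above. The final groupby check is an addition, not part of the original answer; it confirms that the unit rows for each date add back up to orig_qty:

```python
import pandas as pd

# Input rows implied by the output above.
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-04-08", "2019-04-09"]),
    "orig_qty": [4, 2],
    "price": [115.0, 103.0],
})

# One row per unit of orig_qty, each carrying fifo_qty = 1.
new_df = (
    df.loc[df.index.repeat(df["orig_qty"])]
    .assign(fifo_qty=1)
    .reset_index(drop=True)
)

# Sanity check: summing the unit rows per date recovers orig_qty.
totals = new_df.groupby("date")["fifo_qty"].sum()
print(new_df)
print(totals)
```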

How to keep duplicated rows that repeat exactly n times in pandas DataFrame

Use .transform('count') to give every row the size of its peak_start group, then filter with a boolean mask:

s = df.groupby('peak_start')['peak_start'].transform('count')

print(df[s == 2])

   peak_start  peak_end  motif_start  motif_end  strand
0         948       177      3210085    3210103       -
1         948       177      3210047    3210065       +

print(df[s == 3])

   peak_start  peak_end  motif_start  motif_end  strand
2          62       419      3223269    3223287       -
3          62       419      3223229    3223247       +
4          62       419      3223232    3223250       +
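A runnable sketch using only the five rows visible in the outputs above (the real frame may contain more). For comparison it also uses GroupBy.filter, a different technique that keeps whole groups by size; on this data it returns the same rows as the transform mask:

```python
import pandas as pd

# Only the rows shown in the outputs above.
df = pd.DataFrame({
    "peak_start": [948, 948, 62, 62, 62],
    "peak_end": [177, 177, 419, 419, 419],
    "motif_start": [3210085, 3210047, 3223269, 3223229, 3223232],
    "motif_end": [3210103, 3210065, 3223287, 3223247, 3223250],
    "strand": ["-", "+", "-", "+", "+"],
})

# For each row, how many rows share its peak_start value.
s = df.groupby("peak_start")["peak_start"].transform("count")

pairs = df[s == 2]
triples = df[s == 3]

# Alternative: keep groups whose size is exactly 2.
via_filter = df.groupby("peak_start").filter(lambda g: len(g) == 2)
print(pairs)
print(triples)
```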

Repeat rows in a pandas DataFrame based on column value

Use reindex + Index.repeat:

df.reindex(df.index.repeat(df.persons))

   code  .     role  ..1  persons
0   123  .  Janitor    .        3
0   123  .  Janitor    .        3
0   123  .  Janitor    .        3
1   123  .  Analyst    .        2
1   123  .  Analyst    .        2
2   321  .   Vallet    .        2
2   321  .   Vallet    .        2
3   321  .  Auditor    .        5
3   321  .  Auditor    .        5
3   321  .  Auditor    .        5
3   321  .  Auditor    .        5
3   321  .  Auditor    .        5

PS: you can append .reset_index(drop=True) to replace the repeated labels with a fresh 0..n-1 index.
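A self-contained sketch that rebuilds the frame from the output above, leaving out the two placeholder '.' columns for brevity. reindex accepts repeated target labels as long as the frame's own index is unique, and the result matches the .loc-based approach used in the earlier answers:

```python
import pandas as pd

# Data from the output above, without the '.' placeholder columns.
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# reindex with repeated labels (works because df.index itself is unique).
out = df.reindex(df.index.repeat(df["persons"])).reset_index(drop=True)

# The same repetition done with .loc, for comparison.
same = df.loc[df.index.repeat(df["persons"])].reset_index(drop=True)
print(out)
print(out.equals(same))
```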

Duplicating rows n times, where n is a value of a string

You could extend your example with the following R code:

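# stringsAsFactors = FALSE keeps 'count' as character on R < 4.0;
# otherwise as.numeric() would return factor level codes, not the numbers
df <- data.frame(state = c('A','B'), city = c('Other (3)','Other (2)'), count = c('250','50'),
                 stringsAsFactors = FALSE)
# number inside the parentheses, e.g. 'Other (3)' -> 3
times <- as.numeric(gsub(".*\\((.*)\\).*", "\\1", df$city))
# split each count evenly across its copies
df$count <- as.numeric(df$count)/times
# row i appears times[i] times
output <- df[rep(seq_along(times),times),]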

The key addition is the line creating output: rep(seq_along(times), times) builds a row index in which row i appears times[i] times, so indexing the input data frame with it repeats each row as required.
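For readers working in pandas rather than R, a rough equivalent of the same steps on the same example data: str.extract pulls the number out of 'city', and Index.repeat plays the role of rep(seq_along(times), times). This is a translation sketch, not part of the original answer:

```python
import pandas as pd

# Same example data as the R snippet.
df = pd.DataFrame({
    "state": ["A", "B"],
    "city": ["Other (3)", "Other (2)"],
    "count": ["250", "50"],
})

# Number inside the parentheses, e.g. 'Other (3)' -> 3.
times = df["city"].str.extract(r"\((\d+)\)", expand=False).astype(int)

# Split each count evenly across its copies.
df["count"] = df["count"].astype(float) / times

# Repeat row i times[i] times, then renumber the rows.
output = df.loc[df.index.repeat(times)].reset_index(drop=True)
print(output)
```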


