Repeat dataframe rows n times according to the unique column values and to each row repeat create a new column with different values
One solution is to convert 'Cs' to a Categorical, then use GroupBy + first:
df['Cs'] = df['Cs'].astype('category')
res = df.groupby(['Samp', 'Cs'], observed=False).first().reset_index()
res['Age'] = res.groupby('Samp')['Age'].transform('first').astype(int)
Result
Samp Cs Age
0 A cin 51
1 A ebv 51
2 A gs 51
3 A msi 51
4 B cin 62
5 B ebv 62
6 B gs 62
7 B msi 62
8 C cin 55
9 C ebv 55
10 C gs 55
11 C msi 55
12 D cin 70
13 D ebv 70
14 D gs 70
15 D msi 70
16 E cin 56
17 E ebv 56
18 E gs 56
19 E msi 56
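The question's input frame is not shown here, so the following is only a minimal sketch with an assumed three-sample input (the values are hypothetical). It illustrates why the Categorical conversion matters: with observed=False, the groupby emits one row per (Samp, Cs) combination, including combinations that never occur in the data, and the per-sample Age is then filled back in.

```python
import pandas as pd

# Hypothetical input: one row per sample, each with a single 'Cs' value.
df = pd.DataFrame({
    "Samp": ["A", "B", "C"],
    "Cs": ["msi", "cin", "gs"],
    "Age": [51, 62, 55],
})

df["Cs"] = df["Cs"].astype("category")

# observed=False keeps every (Samp, Cs) combination, even unobserved ones;
# those rows get NaN in 'Age' at this point.
res = df.groupby(["Samp", "Cs"], observed=False).first().reset_index()

# Fill each sample's Age from its first non-null value, then restore ints.
res["Age"] = res.groupby("Samp")["Age"].transform("first").astype(int)

print(res)
```

Passing observed explicitly avoids depending on the default, which differs between pandas versions.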
Repeat Rows in Data Frame n Times
Use a combination of pd.DataFrame.loc and pd.Index.repeat:
test.loc[test.index.repeat(test.times)]
id times
0 a 2
0 a 2
1 b 3
1 b 3
1 b 3
2 c 1
3 d 5
3 d 5
3 d 5
3 d 5
3 d 5
To mimic your exact output, use reset_index
test.loc[test.index.repeat(test.times)].reset_index(drop=True)
id times
0 a 2
1 a 2
2 b 3
3 b 3
4 b 3
5 c 1
6 d 5
7 d 5
8 d 5
9 d 5
10 d 5
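For a runnable version, the frame below is rebuilt from the output shown above (the construction itself is an assumption, since the question's setup code is not included). It also prints the intermediate repeated index, which is what .loc uses to select each row several times.

```python
import pandas as pd

# Input frame reconstructed from the printed output above.
test = pd.DataFrame({"id": ["a", "b", "c", "d"], "times": [2, 3, 1, 5]})

# Each index label is repeated 'times' times: 0, 0, 1, 1, 1, 2, 3, ...
repeated_index = test.index.repeat(test["times"])
print(list(repeated_index))

# Selecting by the repeated labels duplicates the rows; reset_index
# replaces the duplicated labels with a fresh 0..n-1 index.
repeated = test.loc[repeated_index].reset_index(drop=True)
print(repeated)
```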
Repeat rows in DataFrame N times based on len(list) in column with different list values
Use DataFrame.explode (available in pandas 0.25+) and create the new columns with the DataFrame constructor:
print (date_df)
a date
0 4 [[2017-02-01 00:00:00, 2017-03-01 00:00:00]]
1 7 [[2017-02-01 00:00:00, 2017-04-01 00:00:00], [...
df = date_df.explode('date')
print (df)
a date
0 4 [2017-02-01 00:00:00, 2017-03-01 00:00:00]
1 7 [2017-02-01 00:00:00, 2017-04-01 00:00:00]
1 7 [2017-02-01 00:00:00, 2017-04-01 00:00:00]
df[['date_start','date_end']] = pd.DataFrame(df.pop('date').values.tolist(), index=df.index)
print (df)
a date_start date_end
0 4 2017-02-01 2017-03-01
1 7 2017-02-01 2017-04-01
1 7 2017-02-01 2017-04-01
EDIT:
Solution for older pandas versions:
s = date_df.pop('date')
df = date_df.loc[date_df.index.repeat(s.str.len())]
df[['date_start','date_end']] = pd.DataFrame(np.concatenate(s), index=df.index)
df = df.reset_index(drop=True)
print (df)
a date_start date_end
0 4 2017-02-01 2017-03-01
1 7 2017-02-01 2017-04-01
2 7 2017-02-01 2017-04-01
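A self-contained sketch of the explode approach follows. The input is an assumption reconstructed from the printed frame: each 'date' cell is taken to hold a list of [start, end] Timestamp pairs. The index is reset right after exploding so the new columns are assigned against unique labels.

```python
import pandas as pd

# Assumed input: each 'date' cell is a list of [start, end] pairs.
date_df = pd.DataFrame({
    "a": [4, 7],
    "date": [
        [[pd.Timestamp("2017-02-01"), pd.Timestamp("2017-03-01")]],
        [[pd.Timestamp("2017-02-01"), pd.Timestamp("2017-04-01")],
         [pd.Timestamp("2017-02-01"), pd.Timestamp("2017-04-01")]],
    ],
})

# One row per [start, end] pair; drop the duplicated index labels.
df = date_df.explode("date").reset_index(drop=True)

# Split each pair into two columns, removing the original 'date' column.
df[["date_start", "date_end"]] = pd.DataFrame(
    df.pop("date").tolist(), index=df.index
)
print(df)
```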
pandas - Copy each row 'n' times depending on column value
Use Index.repeat, DataFrame.loc, DataFrame.assign and DataFrame.reset_index:
new_df = df.loc[df.index.repeat(df['orig_qty'])].assign(fifo_qty=1).reset_index(drop=True)
[output]
date orig_qty price fifo_qty
0 2019-04-08 4 115.0 1
1 2019-04-08 4 115.0 1
2 2019-04-08 4 115.0 1
3 2019-04-08 4 115.0 1
4 2019-04-09 2 103.0 1
5 2019-04-09 2 103.0 1
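As a quick check of this chain, the sketch below rebuilds a two-row input from the output above (the construction is assumed) and confirms that the unit quantities add back up to the original total.

```python
import pandas as pd

# Input reconstructed from the printed output above.
df = pd.DataFrame({
    "date": ["2019-04-08", "2019-04-09"],
    "orig_qty": [4, 2],
    "price": [115.0, 103.0],
})

# Repeat each row orig_qty times, then mark every copy as a single unit.
new_df = (
    df.loc[df.index.repeat(df["orig_qty"])]
      .assign(fifo_qty=1)
      .reset_index(drop=True)
)

# Sanity check: the unit rows sum back to the original quantity.
print(new_df["fifo_qty"].sum() == df["orig_qty"].sum())
```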
How to keep duplicated rows that repeat exactly n times in pandas DataFrame
Use .transform and count with a boolean filter.
s = df.groupby('peak_start')['peak_start'].transform('count')
df[s == 2]
peak_start peak_end motif_start motif_end strand
0 948 177 3210085 3210103 -
1 948 177 3210047 3210065 +
print(df[s == 3])
peak_start peak_end motif_start motif_end strand
2 62 419 3223269 3223287 -
3 62 419 3223229 3223247 +
4 62 419 3223232 3223250 +
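To make the filter reproducible, here is a small sketch using a subset of the columns shown above (the frame construction is assumed). transform('count') returns a Series aligned with the original rows, so it can be compared against n and used directly as a mask.

```python
import pandas as pd

# Subset of the columns from the output above.
df = pd.DataFrame({
    "peak_start": [948, 948, 62, 62, 62],
    "peak_end": [177, 177, 419, 419, 419],
    "strand": ["-", "+", "-", "+", "+"],
})

# Group size for every row, aligned with df's index.
s = df.groupby("peak_start")["peak_start"].transform("count")

exactly_two = df[s == 2]    # rows whose peak_start occurs exactly twice
exactly_three = df[s == 3]  # rows whose peak_start occurs exactly three times
print(exactly_two)
print(exactly_three)
```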
Repeat rows in a pandas DataFrame based on column value
Use reindex + repeat:
df.reindex(df.index.repeat(df.persons))
Out[951]:
code . role ..1 persons
0 123 . Janitor . 3
0 123 . Janitor . 3
0 123 . Janitor . 3
1 123 . Analyst . 2
1 123 . Analyst . 2
2 321 . Vallet . 2
2 321 . Vallet . 2
3 321 . Auditor . 5
3 321 . Auditor . 5
3 321 . Auditor . 5
3 321 . Auditor . 5
3 321 . Auditor . 5
PS: you can add .reset_index(drop=True) to get a new index.
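A runnable sketch of the reindex variant, with the frame reduced to three columns taken from the output above (an assumption, since the original setup is not shown). Note that reindex requires the source index to be unique, which holds for the default RangeIndex used here.

```python
import pandas as pd

# Columns reduced to those needed for the illustration.
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# Repeat each label 'persons' times, then reindex to duplicate the rows.
out = df.reindex(df.index.repeat(df["persons"])).reset_index(drop=True)
print(out)
```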
Duplicating rows n times, where n is a value of a string
You could extend your example with the following code:
set.seed(5)
df <- data.frame(state = c('A','B'), city = c('Other (3)','Other (2)'), count = c('250','50'))
times <- as.numeric(gsub(".*\\((.*)\\).*", "\\1", df$city))
df$count <- as.numeric(df$count)/times
output <- df[rep(seq_along(times),times),]
The key addition is the line creating output, which uses row indexing on the input dataframe to repeat each row as required.
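For comparison with the pandas answers above, a rough Python equivalent of this R snippet might look like the following. It is a sketch, not a translation endorsed by the original answer: the regular expression assumes the repeat count is a run of digits inside parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["A", "B"],
    "city": ["Other (3)", "Other (2)"],
    "count": ["250", "50"],
})

# Pull the number inside the parentheses out of 'city'.
times = df["city"].str.extract(r"\((\d+)\)", expand=False).astype(int)

# Spread the count evenly over the repeated rows.
df["count"] = df["count"].astype(float) / times

# Repeat each row 'times' times via its index label.
output = df.loc[df.index.repeat(times)]
print(output)
```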