Pandas data frame repeat each row a certain number of times
Create a dictionary giving the number of repeats for each Minutiae value, apply it with Series.map, repeat the index with Index.repeat, and finally use DataFrame.loc to select the repeated rows:
print (df)
Minutiae LR
0 1 1.975476
1 2 1.082983
2 3 0.269608
3 4 0.878350
d = {1:2, 2:1, 3:5, 4:3}
df1 = df.loc[df.index.repeat(df['Minutiae'].map(d))]
print (df1)
Minutiae LR
0 1 1.975476
0 1 1.975476
1 2 1.082983
2 3 0.269608
2 3 0.269608
2 3 0.269608
2 3 0.269608
2 3 0.269608
3 4 0.878350
3 4 0.878350
3 4 0.878350
Detail:
print (df['Minutiae'].map(d))
0 2
1 1
2 5
3 3
Name: Minutiae, dtype: int64
print (df.index.repeat(df['Minutiae'].map(d)))
Int64Index([0, 0, 1, 2, 2, 2, 2, 2, 3, 3, 3], dtype='int64')
Or create new column for repeating:
df['repeat'] = [2,1,5,3]
print (df)
Minutiae LR repeat
0 1 1.975476 2
1 2 1.082983 1
2 3 0.269608 5
3 4 0.878350 3
df2 = df.loc[df.index.repeat(df['repeat'])]
print (df2)
Minutiae LR repeat
0 1 1.975476 2
0 1 1.975476 2
1 2 1.082983 1
2 3 0.269608 5
2 3 0.269608 5
2 3 0.269608 5
2 3 0.269608 5
2 3 0.269608 5
3 4 0.878350 3
3 4 0.878350 3
3 4 0.878350 3
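One thing the dictionary approach leaves open is what happens when a Minutiae value has no entry in the dictionary: Series.map then yields NaN, which cannot be used as a repeat count. Here is a minimal sketch of one way to handle that, assuming a missing key should mean "drop the row" (that choice, and the incomplete dictionary, are assumptions made for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Minutiae": [1, 2, 3, 4],
    "LR": [1.975476, 1.082983, 0.269608, 0.878350],
})

# Hypothetical dictionary with no entry for Minutiae == 4.
d = {1: 2, 2: 1, 3: 5}

# map() gives NaN for the missing key; treat it as zero repeats
# and convert back to integers so Index.repeat accepts the counts.
counts = df["Minutiae"].map(d).fillna(0).astype(int)

df1 = df.loc[df.index.repeat(counts)]
print(df1)
```

Rows with a count of zero simply disappear from the result. Use fillna(1) instead if an unmapped row should be kept once.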
Repeat each row of data.frame the number of times specified in a column
Here's one solution:
df.expanded <- df[rep(row.names(df), df$freq), 1:2]
Result:
var1 var2
1 a d
2 b e
2.1 b e
3 c f
3.1 c f
3.2 c f
Repeat Rows in Data Frame n Times
Use a combination of pd.DataFrame.loc and pd.Index.repeat:
test.loc[test.index.repeat(test.times)]
id times
0 a 2
0 a 2
1 b 3
1 b 3
1 b 3
2 c 1
3 d 5
3 d 5
3 d 5
3 d 5
3 d 5
To mimic your exact output, use reset_index
test.loc[test.index.repeat(test.times)].reset_index(drop=True)
id times
0 a 2
1 a 2
2 b 3
3 b 3
4 b 3
5 c 1
6 d 5
7 d 5
8 d 5
9 d 5
10 d 5
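Note that the loc-based approach selects by label, so it relies on the index being unique. If the frame might carry duplicate index labels, a positional variant avoids the problem. This is a rough sketch; the duplicated index below is made up purely to show the case:

```python
import numpy as np
import pandas as pd

# Same data as above, but with deliberately duplicated index labels (assumption).
test = pd.DataFrame(
    {"id": ["a", "b", "c", "d"], "times": [2, 3, 1, 5]},
    index=[0, 0, 1, 1],
)

# Repeat row positions rather than labels, then select with iloc.
positions = np.repeat(np.arange(len(test)), test["times"].to_numpy())
out = test.iloc[positions].reset_index(drop=True)
print(out)
```

Because iloc works on positions, each row is repeated exactly `times` times regardless of what the index looks like.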
Repeat each Row in a Dataframe different N times according to the difference between two value in the Time Column
Create another column to hold the difference between the two columns, use it as the repetition count, and then repeat the rows like this:
import pandas as pd
# Sample dataframe
df = pd.DataFrame({
'id' : ['a', 'b', 'c', 'd'],
'col1' : [4, 5, 6, 7],
'col2' : [3, 2, 4, 3]
})
# Create a new column to hold the difference in column values,
# i.e. the number of times each row must be repeated.
df['times'] = df.col1 - df.col2
# create the finalDf with repeated rows
finalDf = df.loc[df.index.repeat(df.times)].reset_index(drop=True)
print(finalDf.head())
The output of the print statement looks like:
id col1 col2 times
0 a 4 3 1
1 b 5 2 3
2 b 5 2 3
3 b 5 2 3
4 c 6 4 2
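If col2 can ever exceed col1, the difference becomes negative, and a negative repeat count is rejected with an error. A small sketch of one way to guard against that, assuming negative differences should be treated as "repeat zero times" (the extra row 'e' is hypothetical, added only to trigger the case):

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e"],
    "col1": [4, 5, 6, 7, 2],   # row 'e' is a made-up example
    "col2": [3, 2, 4, 3, 5],   # where col2 > col1
})

# Clip negative differences to zero so they are valid repeat counts.
df["times"] = (df["col1"] - df["col2"]).clip(lower=0)

final_df = df.loc[df.index.repeat(df["times"])].reset_index(drop=True)
print(final_df)
```

Whether clipping is the right choice depends on the data. Raising an error on negative differences may be preferable if they indicate bad input.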
R: Repeating row of dataframe with respect to multiple count columns
Here is a tidyverse option. We can use uncount from tidyr to duplicate the rows according to the count in value (i.e., from the var columns) after pivoting to long format.
library(tidyverse)
df %>%
pivot_longer(starts_with("var"), names_to = "class") %>%
filter(value != 0) %>%
uncount(value) %>%
mutate(class = str_extract(class, "\\d+"))
Output
f1 f2 class
<chr> <chr> <chr>
1 a c 1
2 a c 3
3 a c 3
4 a c 3
5 b d 1
6 b d 2
7 b d 2
Another slight variation is to use expandRows from splitstackshape in conjunction with tidyverse.
library(splitstackshape)
df %>%
pivot_longer(starts_with("var"), names_to = "class") %>%
filter(value != 0) %>%
expandRows("value") %>%
mutate(class = str_extract(class, "\\d+"))
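For readers working in pandas instead, the same pivot-then-repeat idea can be sketched roughly as follows. The input frame is not shown in the answer, so the data below is an assumption reconstructed from the output above:

```python
import pandas as pd

# Assumed input, inferred from the printed result (not given in the answer).
df = pd.DataFrame({
    "f1": ["a", "b"], "f2": ["c", "d"],
    "var1": [1, 1], "var2": [0, 2], "var3": [3, 0],
})

# Pivot to long format, keeping the original row number for ordering.
long = df.reset_index().melt(
    id_vars=["index", "f1", "f2"], var_name="class", value_name="value"
)

# Drop zero counts and restore the original row order.
long = long[long["value"] != 0].sort_values(["index", "class"])

# Repeat each long row `value` times, then keep only the digits of the class name.
out = long.loc[long.index.repeat(long["value"])]
out = out.assign(**{"class": out["class"].str.extract(r"(\d+)", expand=False)})
out = out[["f1", "f2", "class"]].reset_index(drop=True)
print(out)
```

Here melt plays the role of pivot_longer, and Index.repeat with loc plays the role of uncount.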
repeat each row of a data.frame as often as nrow() of another data.frame
You can repeat the row index:
df1[rep(1:nrow(df1), nrow(df1)), ]
Or using tidyr::uncount
library(dplyr)
tidyr::uncount(df1, n())
Repeat rows of a data.frame
df <- data.frame(a = 1:2, b = letters[1:2])
df[rep(seq_len(nrow(df)), each = 2), ]
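The difference between rep(x, n) and rep(x, each = n) matters here: the first cycles through the whole frame n times, the second repeats each row n times in place. A small pandas and NumPy sketch of the same distinction, for comparison only (np.tile and np.repeat are stand-ins for the two rep forms):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["a", "b"]})
rows = np.arange(len(df))

# Like rep(seq_len(nrow(df)), each = 2): rows 0, 0, 1, 1
each = df.iloc[np.repeat(rows, 2)]

# Like rep(seq_len(nrow(df)), 2): rows 0, 1, 0, 1
tiled = df.iloc[np.tile(rows, 2)]

print(each)
print(tiled)
```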
De-aggregate a data frame
Here's a tidyverse solution.
As you say, it's easy to repeat a row an arbitrary number of times. If you know that row_number() counts rows within groups when a data frame is grouped, then it's easy to convert grouped counts to presence/absence flags. across gives you a way to succinctly convert multiple count columns.
library(tidyverse)
tibble(group=c("A", "B"), total_N=c(4,5), measure_A=c(1,4), measure_B=c(2,3)) %>%
uncount(total_N) %>%
group_by(group) %>%
mutate(
across(
starts_with("measure"),
function(x) as.numeric(row_number() <= x)
)
) %>%
ungroup()
# A tibble: 9 × 3
group measure_A measure_B
<chr> <dbl> <dbl>
1 A 1 1
2 A 0 1
3 A 0 0
4 A 0 0
5 B 1 1
6 B 1 1
7 B 1 1
8 B 1 0
9 B 0 0
As you say, this approach takes no account of correlations between the outcome columns, as this cannot be deduced from the grouped data.
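The same de-aggregation can be sketched in pandas, with groupby().cumcount() standing in for row_number(). This is a rough equivalent under the same assumptions as the answer, not a translation of any particular library routine:

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "B"],
    "total_N": [4, 5],
    "measure_A": [1, 4],
    "measure_B": [2, 3],
})

# One row per individual: repeat each group row total_N times.
expanded = df.loc[df.index.repeat(df["total_N"])].reset_index(drop=True)

# 1-based position of each row within its group (like row_number()).
pos = expanded.groupby("group").cumcount() + 1

# Turn each count into a 0/1 flag: the first `count` rows of a group get 1.
for col in ["measure_A", "measure_B"]:
    expanded[col] = (pos <= expanded[col]).astype(int)

expanded = expanded.drop(columns="total_N")
print(expanded)
```

As in the R version, the flags for the different measures are assigned independently, so nothing can be read into which rows share a 1.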