Repeat Each Row of Data.Frame the Number of Times Specified in a Column

Pandas data frame repeat each row a certain number of times

Create a dictionary that maps each Minutiae value to its number of repeats, look up the counts with Series.map, repeat the index labels with Index.repeat, and finally select the repeated rows with DataFrame.loc:

print (df)
   Minutiae        LR
0         1  1.975476
1         2  1.082983
2         3  0.269608
3         4  0.878350

d = {1:2, 2:1, 3:5, 4:3}

df1 = df.loc[df.index.repeat(df['Minutiae'].map(d))]
print (df1)
   Minutiae        LR
0         1  1.975476
0         1  1.975476
1         2  1.082983
2         3  0.269608
2         3  0.269608
2         3  0.269608
2         3  0.269608
2         3  0.269608
3         4  0.878350
3         4  0.878350
3         4  0.878350

Detail:

print (df['Minutiae'].map(d))
0    2
1    1
2    5
3    3
Name: Minutiae, dtype: int64

print (df.index.repeat(df['Minutiae'].map(d)))
Int64Index([0, 0, 1, 2, 2, 2, 2, 2, 3, 3, 3], dtype='int64')
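The three steps above can be put together as a minimal, self-contained sketch. The frame is rebuilt from the printed values, and the intermediate names counts and repeated_index are only illustrative. On recent pandas releases the repeated index may print as Index([...]) rather than Int64Index([...]).

```python
import pandas as pd

# Sample data copied from the printed frame above
df = pd.DataFrame({
    "Minutiae": [1, 2, 3, 4],
    "LR": [1.975476, 1.082983, 0.269608, 0.878350],
})

# Repeat counts keyed by Minutiae value
d = {1: 2, 2: 1, 3: 5, 4: 3}

# Step 1: look up each row's repeat count
counts = df["Minutiae"].map(d)

# Step 2: repeat each index label that many times
repeated_index = df.index.repeat(counts)

# Step 3: select rows by the repeated labels
df1 = df.loc[repeated_index]
print(df1)
```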

Or create a new column that holds the repeat counts:

df['repeat'] = [2,1,5,3]
print (df)
   Minutiae        LR  repeat
0         1  1.975476       2
1         2  1.082983       1
2         3  0.269608       5
3         4  0.878350       3

df2 = df.loc[df.index.repeat(df['repeat'])]
print (df2)
   Minutiae        LR  repeat
0         1  1.975476       2
0         1  1.975476       2
1         2  1.082983       1
2         3  0.269608       5
2         3  0.269608       5
2         3  0.269608       5
2         3  0.269608       5
2         3  0.269608       5
3         4  0.878350       3
3         4  0.878350       3
3         4  0.878350       3
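If the helper column is only needed for the expansion, it can be dropped afterwards. The sketch below also uses a hypothetical count of 0 for the second row (not part of the original data) to show that such a row simply disappears from the result.

```python
import pandas as pd

df = pd.DataFrame({
    "Minutiae": [1, 2, 3, 4],
    "LR": [1.975476, 1.082983, 0.269608, 0.878350],
})

# Hypothetical counts: the second row gets 0, so it is left out entirely
df["repeat"] = [2, 0, 5, 3]

# Repeat the rows, then drop the helper column from the result
df2 = df.loc[df.index.repeat(df["repeat"])].drop(columns="repeat")
print(df2)
```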

Repeat each row of data.frame the number of times specified in a column

Here's one solution:

df.expanded <- df[rep(row.names(df), df$freq), 1:2]

Result:

    var1 var2
1      a    d
2      b    e
2.1    b    e
3      c    f
3.1    c    f
3.2    c    f
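For comparison, a rough pandas counterpart of the same idea. The input frame is an assumption, reconstructed from the result above (the freq values 1, 2, 3 are inferred from how often each row appears). Unlike R, pandas keeps the duplicated labels 0, 1, 1, 2, 2, 2 instead of adding suffixes such as 2.1.

```python
import pandas as pd

# Assumed input, reconstructed from the printed result
df = pd.DataFrame({
    "var1": ["a", "b", "c"],
    "var2": ["d", "e", "f"],
    "freq": [1, 2, 3],
})

# Repeat each row label freq times and keep only the first two columns
expanded = df.loc[df.index.repeat(df["freq"]), ["var1", "var2"]]
print(expanded)
```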

Repeat Rows in Data Frame n Times

Use a combination of pd.DataFrame.loc and pd.Index.repeat

test.loc[test.index.repeat(test.times)]

  id  times
0  a      2
0  a      2
1  b      3
1  b      3
1  b      3
2  c      1
3  d      5
3  d      5
3  d      5
3  d      5
3  d      5

To mimic your exact output, use reset_index

test.loc[test.index.repeat(test.times)].reset_index(drop=True)

   id  times
0   a      2
1   a      2
2   b      3
3   b      3
4   b      3
5   c      1
6   d      5
7   d      5
8   d      5
9   d      5
10  d      5
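One caveat: selecting with .loc and repeated labels relies on the index labels being unique, because a duplicated label matches every row that carries it. A possible workaround, sketched here with a hypothetical non-unique index on the same data, is to repeat row positions and select with .iloc instead.

```python
import numpy as np
import pandas as pd

# Same values as the output above, but with a hypothetical non-unique index
test = pd.DataFrame(
    {"id": ["a", "b", "c", "d"], "times": [2, 3, 1, 5]},
    index=["x", "x", "y", "z"],
)

# Repeat positional row numbers instead of labels
positions = np.repeat(np.arange(len(test)), test["times"].to_numpy())

# Select by position, then renumber the rows 0..n-1
result = test.iloc[positions].reset_index(drop=True)
print(result)
```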

Repeat each Row in a Dataframe a different N times according to the difference between two values in the Time Column

Create another column that holds the difference between the two columns, use it as the repeat count, and then repeat the rows like this:

import pandas as pd

# Sample dataframe
df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd'],
    'col1': [4, 5, 6, 7],
    'col2': [3, 2, 4, 3]
})

# Create a new column to hold the difference in column values,
# i.e. the number of times each row must be repeated.
df['times'] = df.col1 - df.col2

# create the finalDf with repeated rows
finalDf = df.loc[df.index.repeat(df.times)].reset_index(drop=True)
print(finalDf.head())

The output of the print statement looks like:

  id  col1  col2  times
0  a     4     3      1
1  b     5     2      3
2  b     5     2      3
3  b     5     2      3
4  c     6     4      2
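A difference can come out negative, and a negative repeat count is expected to raise an error rather than be ignored. One way to guard against that, sketched with hypothetical data in which row 'b' has col1 < col2, is to clip the counts at zero so those rows are dropped.

```python
import pandas as pd

# Hypothetical data: row 'b' has col1 < col2, so its difference is negative
df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "col1": [4, 2, 6],
    "col2": [3, 5, 4],
})

# Treat negative differences as zero repeats
df["times"] = (df["col1"] - df["col2"]).clip(lower=0)

# Rows with a count of 0 do not appear in the result
final_df = df.loc[df.index.repeat(df["times"])].reset_index(drop=True)
print(final_df)
```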

R: Repeating row of dataframe with respect to multiple count columns

Here is a tidyverse option. We can use uncount from tidyr to duplicate the rows according to the count in value (i.e., from the var columns) after pivoting to long format.

library(tidyverse)

df %>%
  pivot_longer(starts_with("var"), names_to = "class") %>%
  filter(value != 0) %>%
  uncount(value) %>%
  mutate(class = str_extract(class, "\\d+"))

Output

  f1    f2    class
  <chr> <chr> <chr>
1 a     c     1
2 a     c     3
3 a     c     3
4 a     c     3
5 b     d     1
6 b     d     2
7 b     d     2

Another slight variation is to use expandRows from splitstackshape in conjunction with the tidyverse.

library(tidyverse)
library(splitstackshape)

df %>%
  pivot_longer(starts_with("var"), names_to = "class") %>%
  filter(value != 0) %>%
  expandRows("value") %>%
  mutate(class = str_extract(class, "\\d+"))
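The same pivot, filter and repeat pattern can be sketched in pandas. The input frame is an assumption, reconstructed from the output shown above; the question's actual df may differ. After melting, the index is reset so that every long-format row has a unique label before the rows are repeated.

```python
import pandas as pd

# Assumed input, reconstructed from the printed output
df = pd.DataFrame({
    "f1": ["a", "b"],
    "f2": ["c", "d"],
    "var1": [1, 1],
    "var2": [0, 2],
    "var3": [3, 0],
})

# Long format: one row per (original row, var column)
long = df.melt(id_vars=["f1", "f2"], var_name="class", ignore_index=False)

# Restore the original row order (stable sort), then give each long row a unique label
long = long.sort_index(kind="mergesort").reset_index(drop=True)

# Drop zero counts and repeat each remaining row `value` times
long = long[long["value"] != 0]
out = long.loc[long.index.repeat(long["value"])]
out = out.drop(columns="value").reset_index(drop=True)

# Keep only the digits of the former column name, e.g. "var3" -> "3"
out["class"] = out["class"].str.extract(r"(\d+)", expand=False)
print(out)
```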

repeat each row of a data.frame as often as nrow() of another data.frame

You can repeat the row index:

df1[rep(1:nrow(df1), nrow(df1)), ]

Or use tidyr::uncount:

library(dplyr)
tidyr::uncount(df1, n())
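In pandas, a single integer can be passed to Index.repeat, so the row count of a second frame can serve as the repeat count. Both frames below are hypothetical, and each row of df1 is repeated consecutively.

```python
import pandas as pd

# Hypothetical frames: df1 is expanded, df2 only supplies the count
df1 = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})
df2 = pd.DataFrame({"z": [10, 20, 30]})

# Repeat every row of df1 len(df2) times
repeated = df1.loc[df1.index.repeat(len(df2))].reset_index(drop=True)
print(repeated)
```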

Repeat rows of a data.frame

df <- data.frame(a = 1:2, b = letters[1:2]) 
df[rep(seq_len(nrow(df)), each = 2), ]
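R's rep distinguishes each (1, 1, 2, 2) from times (1, 2, 1, 2). A rough pandas sketch of both orderings, on the same two-row frame, might look like this; the variable names each and times are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["a", "b"]})

# Like rep(..., each = 2): every row repeated consecutively (0, 0, 1, 1)
each = df.loc[df.index.repeat(2)]

# Like rep(..., times = 2): the whole frame repeated as a block (0, 1, 0, 1)
times = pd.concat([df] * 2)

print(each)
print(times)
```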

De-aggregate a data frame

Here's a tidyverse solution.

As you say, it's easy to repeat a row an arbitrary number of times. If you know that row_number() counts rows within groups when a data frame is grouped, then it's easy to convert grouped counts to presence/absence flags. across gives you a way to succinctly convert multiple count columns.

library(tidyverse)

tibble(group=c("A", "B"), total_N=c(4,5), measure_A=c(1,4), measure_B=c(2,3)) %>%
  uncount(total_N) %>%
  group_by(group) %>%
  mutate(
    across(
      starts_with("measure"),
      function(x) as.numeric(row_number() <= x)
    )
  ) %>%
  ungroup()

# A tibble: 9 × 3
  group measure_A measure_B
  <chr>     <dbl>     <dbl>
1 A             1         1
2 A             0         1
3 A             0         0
4 A             0         0
5 B             1         1
6 B             1         1
7 B             1         1
8 B             1         0
9 B             0         0

As you say, this approach takes no account of correlations between the outcome columns, as this cannot be deduced from the grouped data.
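The same de-aggregation can be sketched in pandas: repeat each group total_N times, number the rows within each group, and flag a row as 1 while its position is below the measure's count. This mirrors the tidyverse code above under the same data, using a 0-based cumcount in place of row_number().

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "B"],
    "total_N": [4, 5],
    "measure_A": [1, 4],
    "measure_B": [2, 3],
})

# One row per individual: repeat each group total_N times
expanded = df.loc[df.index.repeat(df["total_N"])].reset_index(drop=True)

# 0-based position of each row within its group
pos = expanded.groupby("group").cumcount()

# The first `count` rows of each group get 1, the rest 0
for col in ["measure_A", "measure_B"]:
    expanded[col] = (pos < expanded[col]).astype(int)

expanded = expanded.drop(columns="total_N")
print(expanded)
```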


