Replicate Each Row of Data.Frame and Specify the Number of Replications for Each Row

Replicate each row of data.frame and specify the number of replications for each row?

Update

Upon revisiting this question, I have a feeling that @Codoremifa was correct in their assumption that your "frequency" column might be a factor.

Here's an example if that were the case. It won't match your actual data since I don't know what other levels are in your dataset.

mydf$F2 <- factor(as.character(mydf$frequency))
## expandRows(mydf, "F2")
mydf[rep(rownames(mydf), mydf$F2), ]
# a b frequency F2
# 1 5 3 2 2
# 1.1 5 3 2 2
# 1.2 5 3 2 2
# 2 5 7 1 1
# 3 9 1 40 40
# 3.1 9 1 40 40
# 3.2 9 1 40 40
# 3.3 9 1 40 40
# 4 12 4 5 5
# 4.1 12 4 5 5
# 4.2 12 4 5 5
# 4.3 12 4 5 5
# 4.4 12 4 5 5
# 5 12 5 13 13
# 5.1 12 5 13 13

Hmmm. That doesn't look like 61 rows to me. Why not? Because rep uses the numeric values underlying the factor, which is quite different in this case from the displayed value:

as.numeric(mydf$F2)
# [1] 3 1 4 5 2

To properly convert it, you would need:

as.numeric(as.character(mydf$F2))
# [1] 2 1 40 5 13

Original answer

A while ago I wrote a function that is a bit more of a generalization of @Simono101's answer. The function looks like this:

expandRows <- function(dataset, count, count.is.col = TRUE) {
if (!isTRUE(count.is.col)) {
if (length(count) == 1) {
dataset[rep(rownames(dataset), each = count), ]
} else {
if (length(count) != nrow(dataset)) {
stop("Expand vector does not match number of rows in data.frame")
}
dataset[rep(rownames(dataset), count), ]
}
} else {
dataset[rep(rownames(dataset), dataset[[count]]),
setdiff(names(dataset), names(dataset[count]))]
}
}

For your purposes, you could just use expandRows(mydf, "frequency")

head(expandRows(mydf, "frequency"))
# a b
# 1 5 3
# 1.1 5 3
# 2 5 7
# 3 9 1
# 3.1 9 1
# 3.2 9 1

Other options are to repeat each row the same number of times:

expandRows(mydf, 2, count.is.col=FALSE)
# a b frequency
# 1 5 3 2
# 1.1 5 3 2
# 2 5 7 1
# 2.1 5 7 1
# 3 9 1 40
# 3.1 9 1 40
# 4 12 4 5
# 4.1 12 4 5
# 5 12 5 13
# 5.1 12 5 13

Or to specify a vector of how many times to repeat each row.

expandRows(mydf, c(1, 2, 1, 0, 2), count.is.col=FALSE)
# a b frequency
# 1 5 3 2
# 2 5 7 1
# 2.1 5 7 1
# 3 9 1 40
# 5 12 5 13
# 5.1 12 5 13

Note the required count.is.col = FALSE argument in those last two options.

Repeat each row of data.frame the number of times specified in a column

Here's one solution:

df.expanded <- df[rep(row.names(df), df$freq), 1:2]

Result:

    var1 var2
1 a d
2 b e
2.1 b e
3 c f
3.1 c f
3.2 c f

How can I replicate rows in Pandas?

Use np.repeat:

Version 1:

Try using np.repeat:

newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0))
newdf.columns = df.columns
print(newdf)

The above code will output:

  Person   ID ZipCode  Gender
0 12345 882 38182 Female
1 12345 882 38182 Female
2 12345 882 38182 Female
3 32917 271 88172 Male
4 32917 271 88172 Male
5 32917 271 88172 Male
6 18273 552 90291 Female
7 18273 552 90291 Female
8 18273 552 90291 Female

np.repeat repeats the values of df, 3 times.

Then we add the columns with assigning new_df.columns = df.columns.

Version 2:

You could also assign the column names in the first line, like below:

newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf)

The above code will also output:

  Person   ID ZipCode  Gender
0 12345 882 38182 Female
1 12345 882 38182 Female
2 12345 882 38182 Female
3 32917 271 88172 Male
4 32917 271 88172 Male
5 32917 271 88172 Male
6 18273 552 90291 Female
7 18273 552 90291 Female
8 18273 552 90291 Female

Repeat rows of a data.frame

df <- data.frame(a = 1:2, b = letters[1:2]) 
df[rep(seq_len(nrow(df)), each = 2), ]

Repeat Rows in Data Frame n Times

Use a combination of pd.DataFrame.loc and pd.Index.repeat

test.loc[test.index.repeat(test.times)]

id times
0 a 2
0 a 2
1 b 3
1 b 3
1 b 3
2 c 1
3 d 5
3 d 5
3 d 5
3 d 5
3 d 5

To mimic your exact output, use reset_index

test.loc[test.index.repeat(test.times)].reset_index(drop=True)

id times
0 a 2
1 a 2
2 b 3
3 b 3
4 b 3
5 c 1
6 d 5
7 d 5
8 d 5
9 d 5
10 d 5

Pandas - replicate rows with new column value from a list for each replication

Here is a way using the keys paramater of pd.concat():

(pd.concat([df]*len(New_Cost_List),
keys = New_Cost_List,
names = ['New_Cost',None])
.reset_index(level=0))

Output:

   New_Cost State  Cost
0 1 A 2
1 1 B 9
2 1 C 8
3 1 D 4
0 5 A 2
1 5 B 9
2 5 C 8
3 5 D 4
0 10 A 2
1 10 B 9
2 10 C 8
3 10 D 4

Repeat rows in a data frame AND add an increment field

You can use uncount from tidyr

library(dplyr)
library(tidyr)

df %>%
uncount(weights = freq, .id = "n", .remove = F) %>%
mutate(value = freq + n - 1)

data startValue freq n value
1 a 3.4 3 1 3
2 a 3.4 3 2 4
3 a 3.4 3 3 5
4 b 2.1 2 1 2
5 b 2.1 2 2 3
6 c 6.3 1 1 1

Python Pandas replicate rows in dataframe

You can put df_try inside a list and then do what you have in mind:

>>> df.append([df_try]*5,ignore_index=True)

Store Dept Date Weekly_Sales IsHoliday
0 1 1 2010-02-05 24924.50 False
1 1 1 2010-02-12 46039.49 True
2 1 1 2010-02-19 41595.55 False
3 1 1 2010-02-26 19403.54 False
4 1 1 2010-03-05 21827.90 False
5 1 1 2010-03-12 21043.39 False
6 1 1 2010-03-19 22136.64 False
7 1 1 2010-03-26 26229.21 False
8 1 1 2010-04-02 57258.43 False
9 1 1 2010-02-12 46039.49 True
10 1 1 2010-02-12 46039.49 True
11 1 1 2010-02-12 46039.49 True
12 1 1 2010-02-12 46039.49 True
13 1 1 2010-02-12 46039.49 True


Related Topics



Leave a reply



Submit