Create a Variable That Identifies the Original Data.Frame After Rbind Command in R

The combine function in the gdata package does just that: it stacks the data frames and adds a source column holding the name of each original object.

df1 <- data.frame(a = seq(1, 5, by = 1),
                  b = seq(21, 25, by = 1))

df2 <- data.frame(a = seq(6, 10, by = 1),
                  b = seq(26, 30, by = 1))

library(gdata)
combine(df1, df2)

    a  b source
1   1 21    df1
2   2 22    df1
3   3 23    df1
4   4 24    df1
5   5 25    df1
6   6 26    df2
7   7 27    df2
8   8 28    df2
9   9 29    df2
10 10 30    df2
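If installing gdata is not an option, roughly the same result can be sketched in base R. This is only an illustration of the idea, not how combine is implemented, and the names dfs and stacked are made up for the example.

```r
df1 <- data.frame(a = seq(1, 5, by = 1), b = seq(21, 25, by = 1))
df2 <- data.frame(a = seq(6, 10, by = 1), b = seq(26, 30, by = 1))

# Hypothetical helper object: a named list of the data frames to stack
dfs <- list(df1 = df1, df2 = df2)

# unname() keeps rbind from prefixing the row names with the list names
stacked <- do.call(rbind, unname(dfs))

# Repeat each list name once per row of the corresponding data frame
stacked$source <- rep(names(dfs), times = vapply(dfs, nrow, integer(1)))

stacked
```

Unlike combine, this leaves source as a character column rather than a factor.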

Combine (rbind) data frames and create column with name of original data frames

It's not exactly what you asked for, but it's pretty close. Put your objects in a named list and use do.call(rbind, ...):

> do.call(rbind, list(df1 = df1, df2 = df2))
      x y
df1.1 1 2
df1.2 3 4
df2.1 5 6
df2.2 7 8

Notice that the row names now reflect the source data.frames.
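If a real column is wanted rather than row names, the prefix can be pulled back out of them. The sketch below assumes df1 and df2 hold the values implied by the output above (the original definitions are not shown), and the column name source is just a choice for the example.

```r
# Assumed inputs, reconstructed from the printed output
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

stacked <- do.call(rbind, list(df1 = df1, df2 = df2))

# Row names look like "df1.1"; strip the trailing ".<row number>"
stacked$source <- sub("\\.[0-9]+$", "", rownames(stacked))

# Reset the row names to the default 1, 2, 3, ...
rownames(stacked) <- NULL

stacked
```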

Update: Use cbind and rbind

Another option is to make a basic function like the following:

AppendMe <- function(dfNames) {
  do.call(rbind, lapply(dfNames, function(x) {
    cbind(get(x), source = x)
  }))
}

This function then takes a character vector of the data.frame names that you want to "stack", as follows:

> AppendMe(c("df1", "df2"))
  x y source
1 1 2    df1
2 3 4    df1
3 5 6    df2
4 7 8    df2
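Because AppendMe looks the objects up by name with get(), it depends on the data frames being visible from where the function is defined. A variant that takes the data frames themselves in a named list avoids that lookup. The function name append_with_source is hypothetical, and the inputs are assumed from the output above.

```r
# Assumed inputs, reconstructed from the printed output
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# Hypothetical helper: tag every data frame with its list name, then stack
append_with_source <- function(dfs) {
  tagged <- Map(function(d, nm) cbind(d, source = nm), dfs, names(dfs))
  out <- do.call(rbind, tagged)
  rownames(out) <- NULL  # drop the "df1.1"-style row names
  out
}

result <- append_with_source(list(df1 = df1, df2 = df2))
result
```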

Update 2: Use combine from the "gdata" package

> library(gdata)
> combine(df1, df2)
  x y source
1 1 2    df1
2 3 4    df1
3 5 6    df2
4 7 8    df2

Update 3: Use rbindlist from "data.table"

A more recent option is rbindlist from "data.table" with its idcol argument. With that, the approach could be:

> rbindlist(mget(ls(pattern = "df\\d+")), idcol = TRUE)
   .id x y
1: df1 1 2
2: df1 3 4
3: df2 5 6
4: df2 7 8
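The ls(pattern = ...) lookup picks up every object in the workspace whose name matches, so passing an explicit named list may be safer. A minimal sketch, again assuming the df1 and df2 values implied by the output, and using idcol to choose the column name directly:

```r
library(data.table)

# Assumed inputs, reconstructed from the printed output
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# A character idcol names the identifier column; the list names fill it
stacked <- rbindlist(list(df1 = df1, df2 = df2), idcol = "source")

stacked
```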

Update 4: use map_df from "purrr"

Similar to rbindlist, you can also use map_df from "purrr" with I or c as the function to apply to each list element.

> mget(ls(pattern = "df\\d+")) %>% map_df(I, .id = "src")
Source: local data frame [4 x 3]

    src     x     y
  (chr) (int) (int)
1   df1     1     2
2   df1     3     4
3   df2     5     6
4   df2     7     8
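In purrr 1.0.0 and later, map_df is marked as superseded in favor of list_rbind, which has a names_to argument for the same purpose. A sketch under that assumption (a purrr version that provides list_rbind), with inputs assumed from the output above:

```r
library(purrr)

# Assumed inputs, reconstructed from the printed output
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# names_to stores each list element's name in a new column
stacked <- list_rbind(list(df1 = df1, df2 = df2), names_to = "src")

stacked
```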

do.call(rbind, list(data, frames)) but also index each row by its original data frame

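Here is one way. It appears to rely on older tidyr releases, in which unnest() accepted a plain list; the names column holds each data frame's position in the list.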

library(dplyr)
library(tidyr)

foo <- list(df1, df2)

unnest(foo, names) %>%
  mutate(names = gsub("^X", "", names))

#  names a b
#1     1 1 3
#2     1 2 4
#3     2 5 7
#4     2 6 8
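With a current dplyr, a similar index column can be produced by bind_rows and its .id argument, which falls back to a numeric sequence when the list has no names. The data below is an assumption reconstructed from the output above.

```r
library(dplyr)

# Assumed inputs, reconstructed from the printed output
foo <- list(data.frame(a = 1:2, b = 3:4),
            data.frame(a = 5:6, b = 7:8))

# For an unnamed list, .id is filled with "1", "2", ... (as character)
out <- bind_rows(foo, .id = "names")

out
```

If the list is named, the names are used instead of positions.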

Combine two data frames by rows (rbind) when they have different sets of columns

rbind.fill from the package plyr might be what you are looking for.
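As a small illustration, rbind.fill stacks data frames whose columns only partly overlap and fills the missing cells with NA. The data frames left and right are made up for this example.

```r
library(plyr)

# Hypothetical inputs: both share column a, but b and c each exist in only one
left  <- data.frame(a = 1:2, b = c("x", "y"))
right <- data.frame(a = 3:4, c = c(TRUE, FALSE))

# Columns missing from one input are filled with NA in its rows
filled <- rbind.fill(left, right)

filled
```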

R rbind while preserving order of rows in each data frame

Try this one-liner

do.call("rbind", Map("rbind", split(x, 1:nrow(x)), split(y, 1:nrow(y))))

which gives this data.frame if x and y are as in the question:

      a  b  c
1.1   1  2  3
1.2  10 20 30
2.2   2  3  4
2.21 20 30 40
3.3   3  4  5
3.31 30 40 50

It splits each data frame into one-row pieces, rbinds the corresponding pieces pairwise, and then rbinds the resulting pairs. Note that this one-liner works even if the columns have different types. For example, it will work even if:

x <- data.frame(a = letters[1:3], b = 1:3, c = c(TRUE, FALSE, TRUE))
y <- data.frame(a = LETTERS[1:3], b = 11:13, c = c(FALSE, TRUE, FALSE))
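For reference, here is the one-liner run on inputs reconstructed from the output shown earlier (an assumption, since the question's data is not reproduced here), next to an alternative sketch that stacks first and then reorders by row position. The alternative is a different technique, not part of the original answer.

```r
# Assumed inputs, reconstructed from the printed output
x <- data.frame(a = 1:3, b = 2:4, c = 3:5)
y <- data.frame(a = c(10, 20, 30), b = c(20, 30, 40), c = c(30, 40, 50))

# The one-liner: pair up row i of x with row i of y, then stack the pairs
interleaved <- do.call("rbind", Map("rbind", split(x, 1:nrow(x)), split(y, 1:nrow(y))))

# Alternative: stack everything, then sort by original row position.
# order() leaves ties in their original order, so x's row comes before y's.
stacked <- rbind(x, y)
alt <- stacked[order(c(seq_len(nrow(x)), seq_len(nrow(y)))), ]

interleaved
alt
```

The two results hold the same values in the same order; only the row names differ.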

In R, reorganize list based on element names (rbind and indicator variable)

It sounds like you're doing a lot of gymnastics because you have a specific form in mind. What I would suggest is first trying to make the data tidy. Without reading the link, the quick summary is to put your data into a single data frame, where it can be easily processed.

The quick version of the answer (here I've used lst instead of list for the name to avoid confusion with the built-in list) is to do this:

do.call(rbind,
  lapply(seq_along(lst), function(i) {
    # tag every row with the name of the list element it came from
    lst[[i]]$type <- names(lst)[i]; lst[[i]]
  })
)

What this will do is create a single data frame, with a column, "type", that contains the name of the list item in which that row appeared.

Using a slightly simplified version of your initial data:

lst <- list(A1=data.frame(x=rnorm(5)), A2=data.frame(x=rnorm(3)), B=data.frame(x=rnorm(5)))
lst
$A1
           x
1  1.3386071
2  1.9875317
3  0.4942179
4 -0.1803087
5  0.3094100

$A2
           x
1 -0.3388195
2  1.1993115
3  1.9524970

$B
           x
1 -0.1317882
2 -0.3383545
3  0.8864144
4  0.9241305
5 -0.8481927

And then applying the magic function:

df <- do.call(rbind,
  lapply(seq_along(lst), function(i) {
    lst[[i]]$type <- names(lst)[i]; lst[[i]]
  })
)
df
            x type
1   1.3386071   A1
2   1.9875317   A1
3   0.4942179   A1
4  -0.1803087   A1
5   0.3094100   A1
6  -0.3388195   A2
7   1.1993115   A2
8   1.9524970   A2
9  -0.1317882    B
10 -0.3383545    B
11  0.8864144    B
12  0.9241305    B
13 -0.8481927    B

From here we can process to our heart's content: operations like df$subject <- gsub("[0-9]*", "", df$type) extract the non-numeric portion of type, and tools like split can generate the sub-lists that you mention in your question.

In addition, once it is in this form, you can use functions like by and aggregate or libraries like dplyr or data.table to do more advanced split-apply-combine operations for data analysis.
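To make those follow-up steps concrete, here is a short sketch that builds the stacked data frame, derives subject, and then uses split and aggregate. The seed and the object names by_subject and means are choices made for this example only.

```r
set.seed(1)  # arbitrary seed so the example is reproducible
lst <- list(A1 = data.frame(x = rnorm(5)),
            A2 = data.frame(x = rnorm(3)),
            B  = data.frame(x = rnorm(5)))

# Stack the list, tagging each row with the name of its list element
df <- do.call(rbind, lapply(seq_along(lst), function(i) {
  lst[[i]]$type <- names(lst)[i]
  lst[[i]]
}))

# Drop the digits from type: "A1" and "A2" both become "A"
df$subject <- gsub("[0-9]+", "", df$type)

# One data frame per subject
by_subject <- split(df, df$subject)

# Mean of x per subject
means <- aggregate(x ~ subject, data = df, FUN = mean)
means
```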

Combine loop with Rbind

You can get and assign variables by their names assuming your data frames are stored in the R global environment:

library(tidyverse)

x <- c(1,2,3)
y <- c(1,2,3)

df_a_1 <- data.frame(x,y)
df_a_2 <- data.frame(x,y)

df_b_1 <- data.frame(x,y)
df_b_2 <- data.frame(x,y)

df_c_1 <- data.frame(x,y)
df_c_2 <- data.frame(x,y)

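groups <- c("a", "b", "c")  # avoid masking the built-in letters vector

for (g in groups) {
  prefix <- str_glue("df_{g}")
  # match only the pieces (df_a_1, df_a_2, ...), not an already combined df_a
  pattern <- paste0("^", prefix, "_[0-9]+$")
  res <- ls(globalenv()) %>%
    keep(~ str_detect(.x, pattern)) %>%
    map(get) %>%
    reduce(rbind)
  assign(prefix, res)
}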

df_a
#>   x y
#> 1 1 1
#> 2 2 2
#> 3 3 3
#> 4 1 1
#> 5 2 2
#> 6 3 3
df_b
#>   x y
#> 1 1 1
#> 2 2 2
#> 3 3 3
#> 4 1 1
#> 5 2 2
#> 6 3 3

Created on 2021-11-10 by the reprex package (v2.0.1)
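If the data frames can be kept in a named list instead of as separate global variables, the same grouping can be sketched in base R without get and assign. The list name pieces and the result name combined are hypothetical; the grouping key is the name with its trailing number removed.

```r
x <- c(1, 2, 3)
y <- c(1, 2, 3)

# Hypothetical container: the pieces live in one named list
pieces <- list(df_a_1 = data.frame(x, y), df_a_2 = data.frame(x, y),
               df_b_1 = data.frame(x, y), df_b_2 = data.frame(x, y))

# "df_a_1" -> "df_a": strip the trailing "_<number>" to get the group
group <- sub("_[0-9]+$", "", names(pieces))

# Split the list by group and stack each group's data frames
combined <- lapply(split(pieces, group), function(g) do.call(rbind, unname(g)))

combined$df_a
```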


