How to Add an Index by Set of Data When Using rbindlist

How to add an index by set of data when using rbindlist?

This is an enhanced version of Nicolás' answer, which adds the file names instead of numbers:

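library(data.table)
x2csv <- rbindlist(lapply(files, fread), idcol = "origin")
# levels = seq_along(files) keeps the labels aligned even if a file yields no rows
x2csv[, origin := factor(origin, levels = seq_along(files), labels = basename(files))]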
  • fread() uses stringsAsFactors = FALSE by default, so we can save some keystrokes.
  • Also, fill = TRUE (an argument of rbindlist()) is only required if the files have differing structure, e.g., different positions, names, or numbers of columns.
  • The id column can be given a name (with idcol = TRUE it is called .id); for an unnamed list it is populated with the position of each list element.
  • Then, this number is converted into a factor whose levels are labeled with the file names, since a file name is easier to remember than a mere number. basename() strips the directory path from the file name.
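If a plain character column is enough, a possible variant skips the factor step entirely: when the list passed to rbindlist() is named, idcol is filled with those names instead of positions. The helper name read_with_origin below is hypothetical, and files is assumed to be a character vector of CSV paths.

```r
library(data.table)

# Hypothetical helper: read every file and record its base name in `origin`
read_with_origin <- function(files) {
  # Naming the list makes idcol hold the file names instead of 1, 2, 3, ...
  rbindlist(lapply(setNames(files, basename(files)), fread), idcol = "origin")
}

# Usage, assuming the CSV files sit in the working directory:
# x2csv <- read_with_origin(list.files(pattern = "\\.csv$"))
```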

Add the index of a list to bind_rows?

data.table solution

Use rbindlist() from the data.table package, which has built-in id support that respects NULL data frames: every element keeps its list position as its id.

library(data.table)
rbindlist( dat, idcol = TRUE )

.id Group.1 Pr1
1: 1 C 65
2: 1 D 75
3: 3 C 81
4: 3 D 4
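The question's dat is not shown here. A hypothetical reconstruction consistent with the output above, with NULL as the second element, lets you check the behaviour yourself:

```r
library(data.table)

# Hypothetical reconstruction of `dat`: the second element is NULL
dat <- list(
  data.frame(Group.1 = c("C", "D"), Pr1 = c(65, 75)),
  NULL,
  data.frame(Group.1 = c("C", "D"), Pr1 = c(81, 4))
)

res <- rbindlist(dat, idcol = TRUE)
# .id keeps the list positions, so the NULL element leaves a gap: 1, 1, 3, 3
res$.id
```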

dplyr: partial solution

bind_rows() also supports an id column, but it drops empty (NULL) elements before numbering them:

bind_rows( dat, .id = "id" )

id Group.1 Pr1
1 1 C 65
2 1 D 75
3 2 C 81
4 2 D 4

Note that the ID of the third element of dat becomes 2, not 3.
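One possible workaround that stays within dplyr, sketched here under the assumption of a hypothetical dat whose second element is NULL: name the elements by position before binding. bind_rows() still drops the NULL, but the surviving names carry the original positions into the id column, which then holds character values rather than integers.

```r
library(dplyr)

# Hypothetical `dat`: the second element is NULL
dat <- list(
  data.frame(Group.1 = c("C", "D"), Pr1 = c(65, 75)),
  NULL,
  data.frame(Group.1 = c("C", "D"), Pr1 = c(81, 4))
)

# Names "1", "2", "3" survive the removal of the NULL element,
# so `id` reflects the original list positions
res <- bind_rows(setNames(dat, seq_along(dat)), .id = "id")
res$id
```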

Binding lists of data frames while preserving index

We can use imap() from purrr to get the index; .y is each element's name, or its position if the list is unnamed:

library(purrr)
library(dplyr)  # for mutate() and %>%
imap_dfr(foo, ~ .x %>% mutate(index = .y))

Or with map_dfr() and its .id argument:

map_dfr(foo, .f = cbind, .id = 'index')
# index x y
#1 1 a 1
#2 1 b 2
#3 1 c 3
#4 2 d 4
#5 2 e 5
#6 2 f 6

Or use Map() from base R: it loops through the elements of foo together with the corresponding sequence seq_along(foo), uses cbind() to add the index as a new column, and then rbind() stacks the resulting data frames:

do.call(rbind, Map(cbind, index = seq_along(foo), foo))
# index x y
#1 1 a 1
#2 1 b 2
#3 1 c 3
#4 2 d 4
#5 2 e 5
#6 2 f 6
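To run the base R version without the original question's data, here is a hypothetical foo reconstructed from the printed output above; stringsAsFactors = FALSE is set only so older R versions behave the same.

```r
# Hypothetical `foo`, reconstructed from the printed output
foo <- list(
  data.frame(x = c("a", "b", "c"), y = 1:3, stringsAsFactors = FALSE),
  data.frame(x = c("d", "e", "f"), y = 4:6, stringsAsFactors = FALSE)
)

# Map() pairs each data frame with its position; cbind() prepends `index`
out <- do.call(rbind, Map(cbind, index = seq_along(foo), foo))
out
```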

R: Combine list of data frames into single data frame, add column with list index

Try data.table::rbindlist

library(data.table) # v1.9.5+
rbindlist(dfList, idcol = "index")
# index a b c
# 1: 1 g 1.27242932 -0.005767173
# 2: 1 j 0.41464143 2.404653389
# 3: 1 o -1.53995004 0.763593461
# 4: 1 x -0.92856703 -0.799009249
# 5: 1 f -0.29472045 -1.147657009
# 6: 2 k -0.04493361 0.918977372
# 7: 2 a -0.01619026 0.782136301
# 8: 2 j 0.94383621 0.074564983
# 9: 2 w 0.82122120 -1.989351696
# 10: 2 i 0.59390132 0.619825748
# 11: 3 m -1.28459935 -0.649471647
# 12: 3 w 0.04672617 0.726750747
# 13: 3 l -0.23570656 1.151911754
# 14: 3 g -0.54288826 0.992160365
# 15: 3 b -0.43331032 -0.429513109
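A small reproducible sketch, assuming a hypothetical dfList of three random five-row data frames (the values will not match the numbers printed above). It also shows that a named list puts the names, rather than the positions, into the idcol column.

```r
library(data.table)

set.seed(1)
# Hypothetical dfList: three data frames with five rows each
dfList <- lapply(1:3, function(i) {
  data.frame(a = sample(letters, 5), b = rnorm(5), c = rnorm(5))
})

res <- rbindlist(dfList, idcol = "index")  # index = 1, 2, 3 by position

# With a named list, idcol holds the names instead of positions
names(dfList) <- c("first", "second", "third")
res_named <- rbindlist(dfList, idcol = "index")
```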

rbindlist - how to get an additional column with info about a source?

Thanks, everyone, for commenting on my question. In the end, I came up with this solution:

library(data.table)  # rbindlist()
library(purrr)       # map()
library(tibble)      # rowid_to_column()

ASC_files <- list.files(pattern = "\\.asc$")
# simplify = FALSE keeps a named list of data frames instead of a matrix
ASC_all <- sapply(ASC_files, function(x) read.csv(x, header = FALSE,
                  col.names = paste0('V', 1:1268), sep = "", stringsAsFactors = FALSE),
                  simplify = FALSE)
#adding a new column with the name of the source file
ASC_all <- mapply(cbind, ASC_all, "source" = ASC_files, SIMPLIFY = FALSE)
#adding a new column with the row number
ASC_all <- map(ASC_all, ~rowid_to_column(.x, var = "row"))
#removing the last and first 25 rows of each data frame in the list
ASC_all <- lapply(ASC_all, function(x) x[x$row <= (nrow(x) - 25), ])
ASC_all <- lapply(ASC_all, function(x) x[x$row > 25, ])
#transforming the list into a single data frame with all the data
ASC_all <- rbindlist(ASC_all)
#appending the row number to the source column (result: filename.asc.rownumber)
ASC_all$source <- paste0(ASC_all$source, '.', ASC_all$row)
#removing the column with row numbers
ASC_all$row <- NULL

Maybe it's not the most elegant or efficient code, but at least it works.
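A more compact variant could rely on data.table alone, sketched here as a hypothetical helper read_trimmed. It swaps read.csv() for fread() (which auto-detects the whitespace separator and names the columns V1, V2, ...) and assumes the files have no header row. Naming the list by file lets rbindlist(idcol = "source") record the file name, so no cbind() or mapply() step is needed.

```r
library(data.table)

# Hypothetical helper: drop the first and last `trim` rows of every file and
# tag each remaining row with "<file name>.<row number>"
read_trimmed <- function(files, trim = 25) {
  # Naming the list makes rbindlist() fill idcol with file names
  dts <- lapply(setNames(files, basename(files)), function(f) {
    dt <- fread(f, header = FALSE)       # columns are named V1, V2, ...
    n <- nrow(dt)
    keep <- seq_len(n)
    keep <- keep[keep > trim & keep <= n - trim]
    out <- dt[keep]
    out[, row := keep]                   # original row number within the file
    out
  })
  res <- rbindlist(dts, idcol = "source")
  res[, source := paste0(source, ".", row)]
  res[, row := NULL]
  res[]
}

# Usage, assuming the .asc files sit in the working directory:
# ASC_all <- read_trimmed(list.files(pattern = "\\.asc$"))
```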

Merging list of data frames, each as a factor R

For additional reference,

dplyr::bind_rows([list], .id='year')

is probably the easiest way, assuming you've already named the list elements by year.
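A minimal sketch with a hypothetical by_year list whose element names are the years. bind_rows() fills the .id column with those names as character strings, so one extra line turns it into the factor the question asks for.

```r
library(dplyr)

# Hypothetical list whose element names are the years
by_year <- list(
  `2019` = data.frame(value = c(10, 20)),
  `2020` = data.frame(value = c(30, 40))
)

combined <- bind_rows(by_year, .id = "year")  # year is character: "2019", ...
combined$year <- factor(combined$year)        # make the source a factor
```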

unlist a list of data.tables with list index

As @akrun points out, the idcol argument of rbindlist() is available from data.table v1.9.6:

rbindlist(l1, idcol = 'g')

r rbind dataframes in each list using lapply function

You can use Map(), which applies a function element-wise: first to the first elements of each of its arguments, then to the second elements, and so on.

Map(rbind, odtl, adtl)
# [[1]]
# x index
# 1 1 1.0
# 2 2 2.0
# 3 3 3.0
# 4 4 4.0
# 5 5 5.0
# 6 NA 1.5
# 7 NA 2.5
# 8 NA 3.5
# 9 NA 4.5
# 10 NA 5.5
#
# [[2]]
# x index
# 1 1 1.0
# 2 2 2.0
# 3 3 3.0
# 4 4 4.0
# 5 NA 1.5
# 6 NA 2.5
# 7 NA 3.5
# 8 NA 4.5
#
# [[3]]
# x index
# 1 1 1.0
# 2 2 2.0
# 3 3 3.0
# 4 NA 1.5
# 5 NA 2.5
# 6 NA 3.5

Data

odtl  <- list(data.frame(x=1:5, index=1:5),
data.frame(x=1:4, index=1:4),
data.frame(x=1:3, index=1:3))
adtl <- list(data.frame(x=NA, index=seq(1.5, 5.5, 1)),
data.frame(x=NA, index=seq(1.5, 4.5, 1)),
data.frame(x=NA, index=seq(1.5, 3.5, 1)))
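Since the question asks about lapply(), the same pairing can also be written by looping over the positions, as a sketch using the data above; it gives the same list as Map(rbind, odtl, adtl).

```r
odtl <- list(data.frame(x = 1:5, index = 1:5),
             data.frame(x = 1:4, index = 1:4),
             data.frame(x = 1:3, index = 1:3))
adtl <- list(data.frame(x = NA, index = seq(1.5, 5.5, 1)),
             data.frame(x = NA, index = seq(1.5, 4.5, 1)),
             data.frame(x = NA, index = seq(1.5, 3.5, 1)))

# Loop over positions and rbind the i-th element of each list
res <- lapply(seq_along(odtl), function(i) rbind(odtl[[i]], adtl[[i]]))
```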

Combine (rbind) data frames and create column with name of original data frames

It's not exactly what you asked for, but it's pretty close. Put your objects in a named list and use do.call(rbind, ...):

> do.call(rbind, list(df1 = df1, df2 = df2))
x y
df1.1 1 2
df1.2 3 4
df2.1 5 6
df2.2 7 8

Notice that the row names now reflect the source data.frames.
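If you want a real column instead of row names, one possible follow-up is to strip the ".<row>" suffix from the row names. This sketch assumes the data frames have default (numeric) row names; a single-row data frame simply gets the list name as its row name, which the pattern leaves unchanged.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

out <- do.call(rbind, list(df1 = df1, df2 = df2))
# Row names look like "df1.1", "df1.2", ...; drop the ".<row>" suffix
out$source <- sub("\\.[0-9]+$", "", rownames(out))
rownames(out) <- NULL
```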

Update: Use cbind and rbind

Another option is to make a basic function like the following:

AppendMe <- function(dfNames) {
  do.call(rbind, lapply(dfNames, function(x) {
    cbind(get(x), source = x)
  }))
}

This function then takes a character vector of the data.frame names that you want to "stack", as follows:

> AppendMe(c("df1", "df2"))
x y source
1 1 2 df1
2 3 4 df1
3 5 6 df2
4 7 8 df2

Update 2: Use combine from the "gdata" package

> library(gdata)
> combine(df1, df2)
x y source
1 1 2 df1
2 3 4 df1
3 5 6 df2
4 7 8 df2

Update 3: Use rbindlist from "data.table"

Another approach is to use rbindlist() from "data.table" with its idcol argument:

> rbindlist(mget(ls(pattern = "df\\d+")), idcol = TRUE)
.id x y
1: df1 1 2
2: df1 3 4
3: df2 5 6
4: df2 7 8

Update 4: use map_df from "purrr"

Similar to rbindlist, you can also use map_df from "purrr" with I or c as the function to apply to each list element.

> mget(ls(pattern = "df\\d+")) %>% map_df(I, .id = "src")
Source: local data frame [4 x 3]

src x y
(chr) (int) (int)
1 df1 1 2
2 df1 3 4
3 df2 5 6
4 df2 7 8

