How to add an index by set of data when using rbindlist?
This is an enhanced version of Nicolás' answer which adds the file names instead of numbers:
x2csv <- rbindlist(lapply(files, fread), idcol = "origin")
x2csv[, origin := factor(origin, labels = basename(files))]
- fread() uses stringsAsFactors = FALSE by default, so we can save some keystrokes.
- Also, fill = TRUE is only required if we want to read files with differing structure, e.g., differing position, name, or number of columns.
- The id col can be named (the default is .id) and is populated with the sequence number of the list element.
- Then, this number is converted into a factor whose levels are labeled with the file names. A file name might be easier to remember than a mere number. basename() strips the path off the file name.
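As a variation on the approach above, the list can be named after the files before binding; rbindlist() then fills the id column with those names directly, so no factor relabeling is needed. This is only a minimal sketch: the temporary directory, the two tiny CSV files, and the names tables and combined are made up for illustration.

```r
library(data.table)

# Write two small CSV files to a temporary directory (illustrative data only)
tmp <- tempfile("csvdemo")
dir.create(tmp)
fwrite(data.table(x = 1:2, y = c("a", "b")), file.path(tmp, "first.csv"))
fwrite(data.table(x = 3:4, y = c("c", "d")), file.path(tmp, "second.csv"))

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Name each list element after its file; for a named list,
# rbindlist() uses the names (not the positions) in the id column
tables <- setNames(lapply(files, fread), basename(files))
combined <- rbindlist(tables, idcol = "origin")
print(combined)
```

Here origin is a character column; wrap it in factor() if a factor is preferred.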
Add the index of list to bind_rows?
data.table solution
Use rbindlist() from the data.table package, which has built-in id support that respects NULL data frames.
library(data.table)
rbindlist( dat, idcol = TRUE )
.id Group.1 Pr1
1: 1 C 65
2: 1 D 75
3: 3 C 81
4: 3 D 4
dplyr - partial solution
bind_rows also has ID-support, but it 'skips' empty elements...
bind_rows( dat, .id = "id" )
id Group.1 Pr1
1 1 C 65
2 1 D 75
3 2 C 81
4 2 D 4
Note that the ID of the third element from dat becomes 2, and not 3.
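One possible workaround, sketched here under the assumption that dat is a list whose second element is NULL (the original dat is not shown, so the data below is a hypothetical stand-in): name every element by its position before binding. bind_rows() then takes the ids from the names, so the dropped NULL no longer shifts the ids that follow.

```r
library(dplyr)

# Hypothetical stand-in for 'dat': the second element is NULL
dat <- list(
  data.frame(Group.1 = c("C", "D"), Pr1 = c(65, 75)),
  NULL,
  data.frame(Group.1 = c("C", "D"), Pr1 = c(81, 4))
)

# Name each element by its position; the names survive when the NULL is dropped
names(dat) <- seq_along(dat)

out <- bind_rows(dat, .id = "id")
out$id <- as.integer(out$id)  # names are character, so convert back to integer
print(out)
```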
Binding lists of data frames while preserving index
We can use imap to get the index
library(purrr)
imap_dfr(foo, ~ .x %>%
mutate(index = .y))
Or with map
map_dfr(foo, .f = cbind, .id = 'index')
# index x y
#1 1 a 1
#2 1 b 2
#3 1 c 3
#4 2 d 4
#5 2 e 5
#6 2 f 6
Or use Map from base R, where we loop through the elements of 'foo' and a corresponding sequence of 'foo', cbind to create a new column, and then rbind the list elements
do.call(rbind, Map(cbind, index = seq_along(foo), foo))
# index x y
#1 1 a 1
#2 1 b 2
#3 1 c 3
#4 2 d 4
#5 2 e 5
#6 2 f 6
R: Combine list of data frames into single data frame, add column with list index
Try data.table::rbindlist
library(data.table) # v1.9.5+
rbindlist(dfList, idcol = "index")
# index a b c
# 1: 1 g 1.27242932 -0.005767173
# 2: 1 j 0.41464143 2.404653389
# 3: 1 o -1.53995004 0.763593461
# 4: 1 x -0.92856703 -0.799009249
# 5: 1 f -0.29472045 -1.147657009
# 6: 2 k -0.04493361 0.918977372
# 7: 2 a -0.01619026 0.782136301
# 8: 2 j 0.94383621 0.074564983
# 9: 2 w 0.82122120 -1.989351696
# 10: 2 i 0.59390132 0.619825748
# 11: 3 m -1.28459935 -0.649471647
# 12: 3 w 0.04672617 0.726750747
# 13: 3 l -0.23570656 1.151911754
# 14: 3 g -0.54288826 0.992160365
# 15: 3 b -0.43331032 -0.429513109
rbindlist - how to get an additional column with info about a source?
Thanks, everyone, for commenting on my question. In the end, I came up with this solution:
library(data.table) # rbindlist()
library(purrr) # map()
library(tibble) # rowid_to_column()
ASC_files <- list.files(pattern = "\\.asc$")
# lapply() always returns a list of data frames; sapply() could simplify it to a matrix
ASC_all <- lapply(ASC_files, function(x) read.csv(x, header = FALSE,
col.names = paste0('V', 1:1268), sep = "", stringsAsFactors = FALSE))
# adding a new column with the name of the source file
ASC_all <- mapply(cbind, ASC_all, "source" = ASC_files, SIMPLIFY = FALSE)
# adding a new column with the row number
ASC_all <- map(ASC_all, ~rowid_to_column(.x, var = "row"))
# removing the last and first 25 rows in each data frame of the list
ASC_all <- lapply(ASC_all, function(x) x[x$row <= (nrow(x) - 25), ])
ASC_all <- lapply(ASC_all, function(x) x[x$row > 25, ])
# transforming the list into a single table with all data
ASC_all <- rbindlist(ASC_all)
# appending the row number to the source column (result: filename.asc.rownumber)
ASC_all$source <- paste0(ASC_all$source, '.', ASC_all$row)
# removing the column with row numbers
ASC_all$row <- NULL
Maybe it's not the most elegant and efficient code but at least it works.
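For comparison, the same steps can be sketched more compactly with data.table alone by letting rbindlist() create the source column from the list names and doing the trimming by group. This is a rough sketch, not the poster's code: the in-memory tables, the single column V1, and the names tables, trim, combined, and trimmed are assumptions standing in for the real .asc files.

```r
library(data.table)

# Illustrative stand-ins for the files read from disk (60 and 70 rows)
tables <- list(
  a.asc = data.table(V1 = 1:60),
  b.asc = data.table(V1 = 101:170)
)
trim <- 25L  # number of rows to drop at the start and end of each file

# For a named list, the id column holds the element names
combined <- rbindlist(tables, idcol = "source")

# Row number within each source, then flag the rows to keep
combined[, row := seq_len(.N), by = source]
combined[, keep := row > trim & row <= .N - trim, by = source]
trimmed <- combined[keep == TRUE]

# Append the row number to the source name and drop the helper columns
trimmed[, source := paste0(source, ".", row)]
trimmed[, c("row", "keep") := NULL]
print(trimmed)
```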
Merging list of data frames, each as a factor R
For additional reference, dplyr::bind_rows([list], .id='year') is probably the easiest way, assuming you've already named the list elements by year.
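A minimal sketch of that call, assuming a list named by year (the list by_year and its value column are made up for illustration):

```r
library(dplyr)

# Hypothetical list of data frames, named by year
by_year <- list(
  `2019` = data.frame(value = 1:2),
  `2020` = data.frame(value = 3:4)
)

# The list names become the 'year' column (as character)
combined <- bind_rows(by_year, .id = "year")

# Convert to a factor if the year should be treated as a grouping variable
combined$year <- factor(combined$year)
print(combined)
```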
unlist a list of data.tables with list index
As @akrun points out, idcol is available in data.table from v1.9.6
rbindlist(l1, idcol = 'g')
r rbind dataframes in each list using lapply function
You may use Map(), which applies a function element-wise: to the first elements of each of its arguments, then to the second elements, and so on.
Map(rbind, odtl, adtl)
# [[1]]
# x index
# 1 1 1.0
# 2 2 2.0
# 3 3 3.0
# 4 4 4.0
# 5 5 5.0
# 6 NA 1.5
# 7 NA 2.5
# 8 NA 3.5
# 9 NA 4.5
# 10 NA 5.5
#
# [[2]]
# x index
# 1 1 1.0
# 2 2 2.0
# 3 3 3.0
# 4 4 4.0
# 5 NA 1.5
# 6 NA 2.5
# 7 NA 3.5
# 8 NA 4.5
#
# [[3]]
# x index
# 1 1 1.0
# 2 2 2.0
# 3 3 3.0
# 4 NA 1.5
# 5 NA 2.5
# 6 NA 3.5
Data
odtl <- list(data.frame(x=1:5, index=1:5),
data.frame(x=1:4, index=1:4),
data.frame(x=1:3, index=1:3))
adtl <- list(data.frame(x=NA, index=seq(1.5, 5.5, 1)),
data.frame(x=NA, index=seq(1.5, 4.5, 1)),
data.frame(x=NA, index=seq(1.5, 3.5, 1)))
Combine (rbind) data frames and create column with name of original data frames
It's not exactly what you asked for, but it's pretty close. Put your objects in a named list and use do.call(rbind...)
> do.call(rbind, list(df1 = df1, df2 = df2))
x y
df1.1 1 2
df1.2 3 4
df2.1 5 6
df2.2 7 8
Notice that the row names now reflect the source data.frames.
Update: Use cbind and rbind
Another option is to make a basic function like the following:
AppendMe <- function(dfNames) {
do.call(rbind, lapply(dfNames, function(x) {
cbind(get(x), source = x)
}))
}
This function then takes a character vector of the data.frame names that you want to "stack", as follows:
> AppendMe(c("df1", "df2"))
x y source
1 1 2 df1
2 3 4 df1
3 5 6 df2
4 7 8 df2
Update 2: Use combine from the "gdata" package
> library(gdata)
> combine(df1, df2)
x y source
1 1 2 df1
2 3 4 df1
3 5 6 df2
4 7 8 df2
Update 3: Use rbindlist from "data.table"
Another approach that can be used now is rbindlist from "data.table" and its idcol argument. With that, the approach could be:
> rbindlist(mget(ls(pattern = "df\\d+")), idcol = TRUE)
.id x y
1: df1 1 2
2: df1 3 4
3: df2 5 6
4: df2 7 8
Update 4: Use map_df from "purrr"
Similar to rbindlist, you can also use map_df from "purrr" with I or c as the function to apply to each list element.
> mget(ls(pattern = "df\\d+")) %>% map_df(I, .id = "src")
Source: local data frame [4 x 3]
src x y
(chr) (int) (int)
1 df1 1 2
2 df1 3 4
3 df2 5 6
4 df2 7 8