How to Make a List of Data Frames

How do I make a list of data frames?

This isn't related to your question, but you want to use = and not <- within the function call. If you use <-, you'll end up creating variables y1 and y2 in whatever environment you're working in:

d1 <- data.frame(y1 <- c(1, 2, 3), y2 <- c(4, 5, 6))
y1
# [1] 1 2 3
y2
# [1] 4 5 6

This won't have the seemingly desired effect of creating column names in the data frame:

d1
#   y1....c.1..2..3. y2....c.4..5..6.
# 1                1                4
# 2                2                5
# 3                3                6

The = operator, on the other hand, will associate your vectors with arguments to data.frame.

As for your question, making a list of data frames is easy:

d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
my.list <- list(d1, d2)

You access the data frames just like you would access any other list element:

my.list[[1]]
#   y1 y2
# 1  1  4
# 2  2  5
# 3  3  6
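
As a small extension of the above (a sketch, not part of the original answer): if you name the list elements, you can also access the data frames by name, and sapply or lapply lets you run the same operation on every data frame. The element names first and second are made up for illustration.

```r
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))

# Naming the elements is optional; "first" and "second" are hypothetical names
my.list <- list(first = d1, second = d2)

# [[ ]] and $ return the data frame itself
my.list[["first"]]
my.list$second

# [ ] returns a list that contains the selected element(s)
class(my.list["first"])

# Apply the same function to every data frame, e.g. count the rows
row.counts <- sapply(my.list, nrow)
row.counts
```

Named elements tend to make later code easier to read than positional indices, especially once the list grows.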

Loop through a list of dataframes to create dataframes in R

You should definitely give your demo data frame an "ID" column as well. Then you don't have to hope that the demographics are matched to the right observations, especially while the script is still changing. This is easily done with transform (in this example I simply use the consecutive IDs 1:3).

res <- lapply(list(df1, df2, df3, df4), merge, transform(demo, ID=1:3))
res
# [[1]]
#   ID b  c df sex age vital_sts
# 1  1 x gh  z   m  30         a
# 2  2 y fg  x   m  50         a
# 3  3 z xv  y   f  62         d
#
# [[2]]
#   ID  v  hg fd sex age vital_sts
# 1  1  a yty  z   m  30         a
# 2  2 mm  zc  x   m  50         a
# 3  3 xc  cx  y   f  62         d
#
# [[3]]
#   ID  t   j sd sex age vital_sts
# 1  1 ae ewr  z   m  30         a
# 2  2 yw  zd  x   m  50         a
# 3  3 zs   x  y   f  62         d
#
# [[4]]
#   ID  u  k f sex age vital_sts
# 1  1 df df z   m  30         a
# 2  2  y zs x   m  50         a
# 3  3  z xf y   f  62         d

If you have gazillions of data frames in your workspace, as seems to be the case, you can collect them by name pattern using mget(ls(pattern=)). (Or better yet, change your code so that they end up in a list in the first place.)

lapply(mget(ls(pattern='^df\\d+')), merge, transform(demo, ID=1:3))

Edit

If I understand your comment correctly, you have a large data frame DAT from which you want to assemble smaller data frames of variable groups, merging demo into each of them. In this case I would put the variable names of these groups in a named list vgroups. Then lapply over it to subset DAT to "ID" plus the group's columns and merge the result with demo in a single step.

demo should still have an "ID" column, because you don't want to rely on all rows being sorted in the same order. Consider, for example, sort(c(3, 10, 1, 100)) vs. sort(as.character(c(3, 10, 1, 100))), or rows that were omitted for whatever reason.
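
To make the sorting point concrete, here is a minimal sketch. The data frames people and scores are hypothetical and only serve to show that merging on an explicit key does not depend on row order.

```r
ids <- c(3, 10, 1, 100)

# Numeric sort orders by value
sort(ids)

# Character sort compares the strings character by character,
# so "10" and "100" end up before "3"
sort(as.character(ids))

# Merging on an explicit key is independent of the row order
people <- data.frame(ID = 1:3, sex = c("m", "m", "f"))
scores <- data.frame(ID = c(3, 1, 2), b = c("z", "x", "y"))
merged <- merge(people, scores, by = "ID")
merged
```

Even though scores is stored in a different row order, each b value ends up next to the matching ID.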

demo <- transform(demo, ID=1:3)  ## identify demo observations

vgroups <- list(g1=c("b", "c", "df"), g2=c("v", "hg", "fd"), g3=c("t", "j", "sd"),
g4=c("u", "k", "f"))

res1 <- lapply(vgroups, \(x) merge(demo, DAT[, c('ID', x)], by="ID"))
## stating by="ID" explicitly is even safer
res1
# $g1
#   ID sex age vital_sts b  c df
# 1  1   m  30         a x gh  z
# 2  2   m  50         a y fg  x
# 3  3   f  62         d z xv  y
#
# $g2
#   ID sex age vital_sts  v  hg fd
# 1  1   m  30         a  a yty  z
# 2  2   m  50         a mm  zc  x
# 3  3   f  62         d xc  cx  y
#
# $g3
#   ID sex age vital_sts  t   j sd
# 1  1   m  30         a ae ewr  z
# 2  2   m  50         a yw  zd  x
# 3  3   f  62         d zs   x  y
#
# $g4
#   ID sex age vital_sts  u  k f
# 1  1   m  30         a df df z
# 2  2   m  50         a  y zs x
# 3  3   f  62         d  z xf y

Access individual data frames:

res1$g1
#   ID sex age vital_sts b  c df
# 1  1   m  30         a x gh  z
# 2  2   m  50         a y fg  x
# 3  3   f  62         d z xv  y

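If you still want the individual data frames in your environment, use list2env and name the target environment explicitly (without the envir argument, the objects are put into a newly created environment instead):

list2env(res1, envir = .GlobalEnv)
ls()
# [1] "DAT" "demo" "g1" "g2" "g3" "g4" "res1" "vgroups"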

Data:

DAT <- structure(list(ID = 1:3, b = c("x", "y", "z"), c = c("gh", "fg", 
"xv"), df = c("z", "x", "y"), f = c("z", "x", "y"), fd = c("z",
"x", "y"), hg = c("yty", "zc", "cx"), j = c("ewr", "zd", "x"),
k = c("df", "zs", "xf"), sd = c("z", "x", "y"), t = c("ae",
"yw", "zs"), u = c("df", "y", "z"), v = c("a", "mm", "xc"
), x1 = c("gs", "gs", "gs"), x2 = c("cs", "cs", "cs"), x3 = c("tv",
"tv", "tv"), x4 = c("fb", "fb", "fb")), row.names = c(NA,
-3L), class = "data.frame")

demo <- data.frame(sex = c('m', 'm', 'f'), age = c('30', '50', '62'), vital_sts = c('a', 'a', 'd'))

Python: Store multiple dataframe in list

If you pass the parameter sheet_name=None:

dfs = pd.read_excel(..., sheet_name=None)

it returns a dictionary of DataFrames keyed by sheet name (call list(dfs.values()) if you need a plain list):

sheet_name : string, int, mixed list of strings/ints, or None, default 0

Strings are used for sheet names, Integers are used in zero-indexed
sheet positions.

Lists of strings/integers are used to request multiple sheets.

Specify None to get all sheets.

str|int -> DataFrame is returned.
list|None -> Dict of DataFrames is returned, with keys representing
sheets.

Available Cases

* Defaults to 0 -> 1st sheet as a DataFrame
* 1 -> 2nd sheet as a DataFrame
* "Sheet1" -> 1st sheet as a DataFrame
* [0,1,"Sheet5"] -> 1st, 2nd & 5th sheet as a dictionary of DataFrames
* None -> All sheets as a dictionary of DataFrames

Convert a list to a data frame

Update July 2020:

The default for the parameter stringsAsFactors is now default.stringsAsFactors() which in turn yields FALSE as its default.


Assuming your list of lists is called l:

df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE))

In R versions before 4.0.0, the above converts all character columns to factors. To avoid this, add a parameter to the data.frame() call:

df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE), stringsAsFactors=FALSE)
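
A minimal, self-contained sketch of this conversion follows. The input list l is hypothetical and assumes every inner list has the same number of elements. Note that unlist() coerces everything to one common type, so mixed numeric and character fields would all end up as character.

```r
# Hypothetical input: three records with two fields each
l <- list(list("a", "x"), list("b", "y"), list("c", "z"))

# unlist() flattens record by record, so the matrix is filled by row
df <- data.frame(matrix(unlist(l), nrow = length(l), byrow = TRUE),
                 stringsAsFactors = FALSE)
df
str(df)
```

Because the matrix has no column names, data.frame() assigns the default names X1 and X2. Set names(df) afterwards if you need meaningful ones.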

Combining a list of data frames into a new data frame in R

Note that in your list of data frames (df_list) the columns all have different names (Area1, Area2, Area3), whereas in your desired output they have been combined into a single column. So you need to rename those columns to a common name and then bind the data frames together.

library(dplyr)
library(purrr)

result <- map_df(df_list, ~.x %>%
rename_with(~"Area", contains('Area')), .id = 'FileName')
result

#   FileName Area
# 1 a1_areaX  100
# 2 a2_areaX  200
# 3 a3_areaX  300
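
If you would rather avoid the extra packages, the same idea can be sketched in base R. This is an alternative to the code above, not a description of how map_df works internally. The df_list defined here is an assumption, reconstructed from the column names and output shown in this answer.

```r
# Hypothetical input, modelled on the output above
df_list <- list(a1_areaX = data.frame(Area1 = 100),
                a2_areaX = data.frame(Area2 = 200),
                a3_areaX = data.frame(Area3 = 300))

# Rename the "Area*" column in each data frame and add the list name as a column
pieces <- Map(function(d, nm) {
  names(d)[grepl("Area", names(d))] <- "Area"
  data.frame(FileName = nm, d, stringsAsFactors = FALSE)
}, df_list, names(df_list))

# Stack the pieces and drop the row names inherited from the list names
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```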

How to make a functional list with the data.frames from the environment in R?

If we have multiple data.frames in the global environment that we want to merge, we can use mget and ls:

file_1 = data.frame(id = c(1,2), a = c(1,2))
file_2 = data.frame(id = c(1,2), b = c(3,4))
file_3 = data.frame(id = c(3,4), a = c(5,6))

Reduce(\(...) merge(..., all = TRUE), mget(ls(pattern = "file")))
#   id a  b
# 1  1 1  3
# 2  2 2  4
# 3  3 5 NA
# 4  4 6 NA
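
One caveat with ls(pattern = ...) is that it matches names, not types, so a non-data-frame object with a matching name would also be picked up. A minimal sketch of guarding against that with Filter is shown below. It uses a separate environment env and a made-up object file_notes purely for illustration.

```r
# Use a separate environment so the example does not depend on your workspace
env <- new.env()
assign("file_1", data.frame(id = c(1, 2), a = c(1, 2)), envir = env)
assign("file_2", data.frame(id = c(1, 2), b = c(3, 4)), envir = env)
assign("file_notes", "not a data frame", envir = env)

# Collect everything whose name starts with "file" ...
objs <- mget(ls(envir = env, pattern = "^file"), envir = env)

# ... and keep only the elements that really are data frames
dfs <- Filter(is.data.frame, objs)
names(dfs)
```

The resulting dfs is a named list that can be passed to Reduce or lapply as in the examples above.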

