How do I make a list of data frames?
This isn't related to your question, but you want to use = and not <- within the function call. If you use <-, you'll end up creating variables y1 and y2 in whatever environment you're working in:
d1 <- data.frame(y1 <- c(1, 2, 3), y2 <- c(4, 5, 6))
y1
# [1] 1 2 3
y2
# [1] 4 5 6
This won't have the seemingly desired effect of creating column names in the data frame:
d1
# y1....c.1..2..3. y2....c.4..5..6.
# 1 1 4
# 2 2 5
# 3 3 6
The = operator, on the other hand, will associate your vectors with arguments to data.frame.
As for your question, making a list of data frames is easy:
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
my.list <- list(d1, d2)
You access the data frames just like you would access any other list element:
my.list[[1]]
# y1 y2
# 1 1 4
# 2 2 5
# 3 3 6
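As a small extension of the above, the list elements can also be given names, which makes access by name possible and lets you apply a function to every data frame at once. This is only a sketch: the element names first and second are made up for illustration.

```r
# Same two data frames as above
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))

# Named list; "first" and "second" are illustrative names
my.list <- list(first = d1, second = d2)

# Apply a function to every data frame in the list
row_counts <- sapply(my.list, nrow)

# Access a column of one data frame by element name
second_y2 <- my.list$second$y2
```

Named elements tend to be easier to keep track of than positions once the list grows.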
Loop through a list of dataframes to create dataframes in R
You should definitely give your demo data frame an "ID" column as well! Then you do not have to hope that the demographics are correctly assigned to the observations, especially if the script is still changing during the work process. That may easily be done using transform (I simply use the consecutive IDs 1:3 here in the example).
res <- lapply(list(df1, df2, df3, df4), merge, transform(demo, ID=1:3))
res
# [[1]]
# ID b c df sex age vital_sts
# 1 1 x gh z m 30 a
# 2 2 y fg x m 50 a
# 3 3 z xv y f 62 d
#
# [[2]]
# ID v hg fd sex age vital_sts
# 1 1 a yty z m 30 a
# 2 2 mm zc x m 50 a
# 3 3 xc cx y f 62 d
#
# [[3]]
# ID t j sd sex age vital_sts
# 1 1 ae ewr z m 30 a
# 2 2 yw zd x m 50 a
# 3 3 zs x y f 62 d
#
# [[4]]
# ID u k f sex age vital_sts
# 1 1 df df z m 30 a
# 2 2 y zs x m 50 a
# 3 3 z xf y f 62 d
If you have gazillions of data frames in your workspace, as it looks like, you may list them by pattern using mget(ls(pattern=)). (Or better yet, change your code to get them in a list in the first place.)
lapply(mget(ls(pattern='^df\\d+')), merge, transform(demo, ID=1:3))
Edit
If I understand you correctly, according to your comment you have a large data frame DAT from which you want to assemble smaller data frames of variable groups and merge the demo to them. In this case I would put the variable names of these groups in a named list vgroups. Next, lapply over it to simultaneously subset DAT with "ID" concatenated and merge it to demo.
demo should still have an "ID", because you don't want to trust that all rows are sorted in the same order. Just consider, for example, sort(c(3, 10, 1, 100)) vs. sort(as.character(c(3, 10, 1, 100))), or rows omitted for whatever reason.
demo <- transform(demo, ID=1:3) ## identify demo observations
vgroups <- list(g1=c("b", "c", "df"), g2=c("v", "hg", "fd"), g3=c("t", "j", "sd"),
g4=c("u", "k", "f"))
res1 <- lapply(vgroups, \(x) merge(demo, DAT[, c('ID', x)], by="ID"))
## specifying by="ID" is even safer --^
res1
# $g1
# ID sex age vital_sts b c df
# 1 1 m 30 a x gh z
# 2 2 m 50 a y fg x
# 3 3 f 62 d z xv y
#
# $g2
# ID sex age vital_sts v hg fd
# 1 1 m 30 a a yty z
# 2 2 m 50 a mm zc x
# 3 3 f 62 d xc cx y
#
# $g3
# ID sex age vital_sts t j sd
# 1 1 m 30 a ae ewr z
# 2 2 m 50 a yw zd x
# 3 3 f 62 d zs x y
#
# $g4
# ID sex age vital_sts u k f
# 1 1 m 30 a df df z
# 2 2 m 50 a y zs x
# 3 3 f 62 d z xf y
Access individual data frames:
res1$g1
# ID sex age vital_sts b c df
# 1 1 m 30 a x gh z
# 2 2 m 50 a y fg x
# 3 3 f 62 d z xv y
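If you still want the individual data frames in your environment, use list2env with envir=.GlobalEnv (without envir, list2env creates a new environment instead of assigning into the current one):
list2env(res1, envir=.GlobalEnv)
ls()
# [1] "DAT" "demo" "g1" "g2" "g3" "g4" "res1" "vgroups"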
Data:
DAT <- structure(list(ID = 1:3, b = c("x", "y", "z"), c = c("gh", "fg",
"xv"), df = c("z", "x", "y"), f = c("z", "x", "y"), fd = c("z",
"x", "y"), hg = c("yty", "zc", "cx"), j = c("ewr", "zd", "x"),
k = c("df", "zs", "xf"), sd = c("z", "x", "y"), t = c("ae",
"yw", "zs"), u = c("df", "y", "z"), v = c("a", "mm", "xc"
), x1 = c("gs", "gs", "gs"), x2 = c("cs", "cs", "cs"), x3 = c("tv",
"tv", "tv"), x4 = c("fb", "fb", "fb")), row.names = c(NA,
-3L), class = "data.frame")
demo <- data.frame(sex = c('m', 'm', 'f'), age = c('30', '50', '62'), vital_sts = c('a', 'a', 'd'))
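To illustrate why matching on "ID" is safer than relying on row order, here is a minimal sketch. The data frames demo_small and obs_shuffled are made up for this illustration and are not part of the original data.

```r
# Hypothetical demographics with an explicit ID column
demo_small <- data.frame(ID = 1:3, sex = c("m", "m", "f"))

# Hypothetical observations whose rows arrive in a different order
obs_shuffled <- data.frame(ID = c(3, 1, 2), b = c("z", "x", "y"))

# Binding by position silently pairs the wrong rows
by_position <- cbind(demo_small, b = obs_shuffled$b)

# Merging by ID pairs each observation with the right person
by_id <- merge(demo_small, obs_shuffled, by = "ID")
```

In by_position the first person gets the value "z", which actually belongs to ID 3, while by_id keeps "x" with ID 1.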
Python: Store multiple dataframe in list
If you use the parameter sheet_name=None:
dfs = pd.read_excel(..., sheet_name=None)
it will return a dictionary of DataFrames:
sheet_name : string, int, mixed list of strings/ints, or None, default 0
Strings are used for sheet names, Integers are used in zero-indexed
sheet positions.
Lists of strings/integers are used to request multiple sheets.
Specify None to get all sheets.
str|int -> DataFrame is returned.
list|None -> Dict of DataFrames is returned, with keys representing
sheets.
Available Cases
* Defaults to 0 -> 1st sheet as a DataFrame
* 1 -> 2nd sheet as a DataFrame
* "Sheet1" -> 1st sheet as a DataFrame
* [0,1,"Sheet5"] -> 1st, 2nd & 5th sheet as a dictionary of DataFrames
* None -> All sheets as a dictionary of DataFrames
Convert a list to a data frame
Update July 2020:
The default for the parameter stringsAsFactors is now default.stringsAsFactors() which in turn yields FALSE as its default.
Assuming your list of lists is called l:
df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE))
In R versions before 4.0.0, the above will convert all character columns to factors. To avoid this, you can add a parameter to the data.frame() call:
df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE), stringsAsFactors=FALSE)
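A small worked example may help here. The list l below is invented for illustration; the approach assumes every inner list has the same length, since unlist() flattens everything into one vector before matrix() reshapes it row by row.

```r
# Hypothetical list of lists: three records with two fields each
l <- list(list("a", "b"), list("c", "d"), list("e", "f"))

# Flatten, reshape one record per row, then convert to a data frame
df <- data.frame(matrix(unlist(l), nrow = length(l), byrow = TRUE),
                 stringsAsFactors = FALSE)

# Columns get the default names X1, X2
first_column <- df$X1
```

Note that unlist() coerces everything to a common type, so mixed numeric and character fields would all end up as character columns.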
Combining a list of data frames into a new data frame in R
Note that in your list of dataframes (df_list) all the columns have different names (Area1, Area2, Area3) whereas in your output dataframe they all have been combined into one single column. So for that you need to change the different column names to the same one and bind the dataframes together.
library(dplyr)
library(purrr)
result <- map_df(df_list, ~.x %>%
rename_with(~"Area", contains('Area')), .id = 'FileName')
result
# FileName Area
#1 a1_areaX 100
#2 a2_areaX 200
#3 a3_areaX 300
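If you would rather avoid the extra packages, the same idea (rename to a common column, then bind) can be sketched in base R. The df_list below is a guess at the shape of the data, reconstructed from the output above, and it assumes each data frame has exactly one column whose name contains "Area".

```r
# Hypothetical input: one differently named Area column per data frame
df_list <- list(
  a1_areaX = data.frame(Area1 = 100),
  a2_areaX = data.frame(Area2 = 200),
  a3_areaX = data.frame(Area3 = 300)
)

# Rebuild each data frame with a common column name plus its list name
renamed <- Map(
  function(d, nm) {
    area_col <- grep("Area", names(d))[1]  # first column containing "Area"
    data.frame(FileName = nm, Area = d[[area_col]])
  },
  df_list, names(df_list)
)

# Stack the pieces and drop the row names inherited from the list
result_base <- do.call(rbind, renamed)
rownames(result_base) <- NULL
```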
How to make a functional list with the data.frames from the environment in R?
If we have multiple data.frames in the global environment that we want to merge, we can use mget and ls:
file_1 = data.frame(id = c(1,2), a = c(1,2))
file_2 = data.frame(id = c(1,2), b = c(3,4))
file_3 = data.frame(id = c(3,4), a = c(5,6))
Reduce(\(...) merge(..., all = TRUE), mget(ls(pattern = "file")))
# id a b
# 1 1 1 3
# 2 2 2 4
# 3 3 5 NA
# 4 4 6 NA
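Since ls(pattern = "file") picks up every object whose name contains "file", a slightly more defensive variant is sketched below. It uses a stricter pattern and a separate environment (env, a name chosen for this illustration) so that the example does not depend on what else is in your workspace.

```r
# Put the example data frames in their own environment
env <- new.env()
assign("file_1", data.frame(id = c(1, 2), a = c(1, 2)), envir = env)
assign("file_2", data.frame(id = c(1, 2), b = c(3, 4)), envir = env)
assign("file_3", data.frame(id = c(3, 4), a = c(5, 6)), envir = env)

# Collect only objects named file_<digits> into a named list
frames <- mget(ls(env, pattern = "^file_\\d+$"), envir = env)

# Merge them pairwise, keeping unmatched rows
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), frames)
```

The intermediate list frames is the "functional list" itself, so it can also be passed to lapply or indexed by name before merging.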