How to subset data.frames stored in a list?
lapply's second argument is a function (subset), and any extra arguments to subset are passed through the ... argument of lapply. Hence:
my.ls <- list(d1 = d1, d2 = d2)
my.lsNA <- lapply(my.ls, subset, is.na(b))
(I am also showing you how to easily create the list of data.frames without using get, and I recommend you don't use ls as a variable name, since it is also the name of a rather common function.)
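The question's d1 and d2 are not shown, so here is a minimal sketch with two made-up data frames (the columns a and b and their values are assumptions, not the asker's data). It also shows an equivalent form that uses standard indexing instead of subset.

```r
# Hypothetical data frames; column names and values are made up for illustration
d1 <- data.frame(a = 1:3, b = c(NA, 2, NA))
d2 <- data.frame(a = 4:6, b = c(5, NA, 6))

my.ls <- list(d1 = d1, d2 = d2)

# is.na(b) is forwarded through ... to subset() and evaluated inside each data frame
my.lsNA <- lapply(my.ls, subset, is.na(b))

# Same selection with an explicit anonymous function and standard indexing
my.lsNA2 <- lapply(my.ls, function(df) df[is.na(df$b), , drop = FALSE])

sapply(my.lsNA, nrow)
```

The anonymous-function form is more verbose, but it avoids the non-standard evaluation that subset relies on.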
How to subset a list of data.frames?
If we want to subset the list elements based on names:
mainlist_new <- lapply(mainlist, `[`, c("rainfall", "yield"))
-output
> str(mainlist_new)
List of 2
$ :List of 2
..$ rainfall:'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
..$ yield :'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
$ :List of 2
..$ rainfall:'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
..$ yield :'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
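As a self-contained sketch of the same idea, the block below builds a small nested list and keeps only two named elements from each inner list. The contents of mainlist and the extra temperature element are assumptions made up for illustration.

```r
# Hypothetical nested list: each element holds several named data frames
station_df <- data.frame(
  station = c("MADA1", "MADA2"),
  rainfall = c(0, 5),
  yield = c(2000, 3000)
)
mainlist <- list(
  list(rainfall = station_df, yield = station_df, temperature = station_df),
  list(rainfall = station_df, yield = station_df, temperature = station_df)
)

# `[` keeps the selected elements as a list; `[[` would extract a single element
mainlist_new <- lapply(mainlist, `[`, c("rainfall", "yield"))

names(mainlist_new[[1]])
```

Passing `[` as the function works because the character vector is forwarded to it as the index, so each inner list is subset by name.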
Subset Data Based On Elements In List
Classic lapply.
x <- lapply(variableData, function(x){subset(Data, Column_X == x)})
x
# [[1]]
# Data_x Data_y Column_X
# 1 -34 12 A
# 6 -35 24 A
#
# [[2]]
# Data_x Data_y Column_X
# 5 -34 10 B
# 7 -35 16 B
# 8 -33 22 B
It returns a list of all the subsets. To rbind all these list elements, just call:
do.call(rbind, x)
# Data_x Data_y Column_X
# 1 -34 12 A
# 6 -35 24 A
# 5 -34 10 B
# 7 -35 16 B
# 8 -33 22 B
However, as @Frank pointed out, you could use basic subsetting in your code:
Data[Data$Column_X %in% variableData,]
# Data_x Data_y Column_X
# 1 -34 12 A
# 5 -34 10 B
# 6 -35 24 A
# 7 -35 16 B
# 8 -33 22 B
"Warning: This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences." (?subset)
Furthermore, with basic subsetting the original order of your rows is kept.
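The difference in row order can be seen in a small sketch. The data below are made up to mirror the shape of the example (the values are assumptions), and the lapply version uses standard indexing rather than subset, in line with the warning quoted above.

```r
# Hypothetical data mirroring the shape of the example above
Data <- data.frame(
  Data_x = c(-34, -33, -34, -35),
  Data_y = c(12, 15, 10, 24),
  Column_X = c("A", "C", "B", "A")
)
variableData <- c("A", "B")

# One subset per value, then stacked: rows come out grouped by value
by_value <- do.call(
  rbind,
  lapply(variableData, function(v) Data[Data$Column_X == v, ])
)

# A single logical index: rows keep their original order
in_order <- Data[Data$Column_X %in% variableData, ]

as.character(by_value$Column_X)  # "A" "A" "B"
as.character(in_order$Column_X)  # "A" "B" "A"
```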
Subset a list of data.frames and return list of data.frames
This should do it for you:
subsetl <- lapply(l,function(x) {
x[x<4] <- NA
return(x)
})
Result:
>subsetl
[[1]]
a b c
1 NA 4 NA
2 NA 5 4
3 NA 6 5
4 4 5 6
5 5 5 7
6 6 5 8
[[2]]
a b c
1 NA 4 NA
2 4 5 NA
3 5 6 NA
4 6 5 4
5 7 5 5
6 8 5 6
[[3]]
a b c
1 4 NA NA
2 5 4 NA
3 6 5 NA
4 5 6 4
5 5 7 5
6 5 8 6
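Note that the code above masks values with NA and keeps every row. If the goal is to drop rows instead, a row filter inside lapply is one option. The sketch below shows both variants on a made-up list; the list l, its columns, and the threshold on column a are assumptions for illustration.

```r
# Hypothetical list of data frames; values are made up for illustration
l <- list(
  data.frame(a = 1:6, b = c(4, 5, 6, 5, 5, 5)),
  data.frame(a = 3:8, b = c(4, 5, 6, 5, 5, 5))
)

# Variant 1: replace values below 4 with NA, keeping every row
masked <- lapply(l, function(x) {
  x[x < 4] <- NA
  x
})

# Variant 2: drop rows where column a is below 4, so each element can shrink
filtered <- lapply(l, function(x) x[x$a >= 4, , drop = FALSE])

sapply(filtered, nrow)  # 3 5
```

Both variants return a list of data.frames with the same length as the input list.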
How to subset a dataframe based on values in a list column
df %>%
filter(map_lgl(problem, ~any('thing 3' == .x)))
name problem
1 sue thing 1,....
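The same per-row test can be written in base R with vapply, which may help show what map_lgl is doing. This is a sketch on made-up data: the second row and the contents of the problem column are assumptions.

```r
# Hypothetical data frame with a list column; rows and values are made up
df <- data.frame(name = c("sue", "bob"), stringsAsFactors = FALSE)
df$problem <- list(c("thing 1", "thing 3"), c("thing 2"))

# One TRUE/FALSE per row: does this row's vector contain "thing 3"?
keep <- vapply(df$problem, function(p) any(p == "thing 3"), logical(1))

df[keep, ]
```

vapply with logical(1) plays the role of map_lgl here: it applies the test to each element of the list column and returns a plain logical vector.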
How to create a loop which creates multiple subset dataframes from a larger data frame?
Your code works fine. Just remove list so you create a vector of color names and not a list. If you only want distinct values, use unique.
mydata <- data.frame(x = c(1,2,3), y = c('a','b','c'), z = c('red','red','yellow'))
colors <- unique(mydata$z)
for (i in seq_along(colors)) {
assign(paste0("mydata_",i), subset(mydata, z == colors[[i]]))
}
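As an alternative to assign, which is a different technique from the loop above, split can return all the subsets in one named list. This sketch reuses the same mydata; the name by_color is an assumption.

```r
mydata <- data.frame(
  x = c(1, 2, 3),
  y = c("a", "b", "c"),
  z = c("red", "red", "yellow")
)

# split() returns a named list with one data frame per distinct value of z
by_color <- split(mydata, mydata$z)

names(by_color)  # "red" "yellow"
by_color[["red"]]
```

Keeping the pieces in a list means they can be processed together with lapply instead of being looked up by constructed variable names.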
Selecting Entries in a Data Frame Stored in a List
You are trying to subset one data.frame to rows that match the unique values of a column in another data.frame.
Your attempted solution returns no elements because your unique object is a data.frame, and when you coerce it to a list you are stuck with a list instead of a vector that can be used to subset rows. When subsetting using foo[bar, ], bar should be a vector with either the indices of the rows to keep (e.g. foo[c(1,2), ]) or a logical value for each row of the data.frame. All you need to do is use %in% with the vector of unique values itself.
You don't need to use list() for this, and which() is redundant since you can subset the data.frame using a logical vector instead of row indices. The logic behind this latter point is that %in% returns TRUE or FALSE for each row of my_data, which can be used to subset directly. All that which() does is get the indices of the rows that are TRUE so you can subset by index, which is entirely redundant here.
# Your example data
my_data = data.frame(col1 = c("abc", "bcd", "bfg", "eee", "eee") , id = 1:5)
my_data_1 = data.frame(col1 = c("abc", "byd", "bgg", "fef", "eee") , id = 1:5)
unique = unique(my_data_1[c("col1")])
# Show that unique is a data.frame
str(unique)
#> 'data.frame': 5 obs. of 1 variable:
#> $ col1: chr "abc" "byd" "bgg" "fef" ...
# Show that unique$col1 is a vector
str(unique$col1)
#> chr [1:5] "abc" "byd" "bgg" "fef" "eee"
# Show what a logical test with the character vector does
my_data$col1 %in% unique$col1
#> [1] TRUE FALSE FALSE TRUE TRUE
# We can use this to subset
my_data[my_data$col1 %in% unique$col1, ]
#> col1 id
#> 1 abc 1
#> 4 eee 4
#> 5 eee 5
You could also combine steps and simply use:
my_data[my_data$col1 %in% unique(my_data_1$col1), ]
#> col1 id
#> 1 abc 1
#> 4 eee 4
#> 5 eee 5
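To check the point about which() being redundant, the sketch below subsets the same example data both ways and compares the results. The names keep, by_logical and by_index are assumptions introduced for illustration.

```r
my_data <- data.frame(col1 = c("abc", "bcd", "bfg", "eee", "eee"), id = 1:5)
my_data_1 <- data.frame(col1 = c("abc", "byd", "bgg", "fef", "eee"), id = 1:5)

# Logical vector with one value per row of my_data
keep <- my_data$col1 %in% unique(my_data_1$col1)

by_logical <- my_data[keep, ]        # subset with the logical vector
by_index <- my_data[which(keep), ]   # subset with the row indices 1, 4, 5

all.equal(by_logical, by_index)
```

Because %in% never returns NA, the logical vector and the indices from which() select the same rows.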