How to Subset Data.Frames Stored in a List

How to subset data.frames stored in a list?

The second argument of lapply is the function to apply (here subset), and any further arguments are passed through lapply's ... to that function. Hence:

my.ls <- list(d1 = d1, d2 = d2)
my.lsNA <- lapply(my.ls, subset, is.na(b))

(This also shows how to create the list of data.frames directly, without get(). I'd also recommend not using ls as a variable name, since it is also the name of a commonly used base function.)
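A self-contained sketch of the idiom. The contents of d1 and d2 are made up for illustration; the only assumption is that both have a column b. The second lapply call is an equivalent written with [ instead of subset(), which avoids the non-standard evaluation that ?subset warns about:

```r
# Hypothetical example data: both data frames have a column `b` with some NAs
d1 <- data.frame(a = 1:4, b = c(NA, 2, NA, 4))
d2 <- data.frame(a = 5:7, b = c(1, NA, 3))
my.ls <- list(d1 = d1, d2 = d2)

# subset is the function; is.na(b) is forwarded to it through lapply's ...
my.lsNA <- lapply(my.ls, subset, is.na(b))

# Same rows with standard [ indexing and an anonymous function
my.lsNA2 <- lapply(my.ls, function(df) df[is.na(df$b), , drop = FALSE])
```

Both calls return a named list of data frames, with the names taken from my.ls.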

How to subset a list of data.frames?

If we want to keep only the components with certain names from each list element, use lapply with the [ function and the names to keep:

mainlist_new <- lapply(mainlist, `[`, c("rainfall", "yield"))

Output:

> str(mainlist_new)
List of 2
$ :List of 2
..$ rainfall:'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
..$ yield :'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
$ :List of 2
..$ rainfall:'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
..$ yield :'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
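A minimal reproducible sketch. The data frame is reconstructed from the str() output above; the extra component other in each sublist is a hypothetical addition, there only so that [ has something to drop. [[ is shown for the case where you want a single component rather than a sub-list:

```r
# Data frame reconstructed from the str() output above
df <- data.frame(station  = paste0("MADA", 1:5),
                 rainfall = c(0, 5, 10, 15, 20),
                 yield    = c(2000, 3000, 4000, 5000, 6000),
                 stringsAsFactors = FALSE)

# Hypothetical mainlist: an unnamed list of named lists ("other" is made up)
mainlist <- list(list(rainfall = df, yield = df, other = df),
                 list(rainfall = df, yield = df, other = df))

# `[` keeps the named components, so each element is still a list
mainlist_new <- lapply(mainlist, `[`, c("rainfall", "yield"))

# `[[` extracts one component, giving a plain list of data frames
rain_only <- lapply(mainlist, `[[`, "rainfall")

# The same `[` trick applied to data frames selects columns by name
cols_only <- lapply(rain_only, `[`, c("station", "yield"))
```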

Subset Data Based On Elements In List

Classic lapply.

x <- lapply(variableData, function(val) subset(Data, Column_X == val))
x
# [[1]]
# Data_x Data_y Column_X
# 1 -34 12 A
# 6 -35 24 A
#
# [[2]]
# Data_x Data_y Column_X
# 5 -34 10 B
# 7 -35 16 B
# 8 -33 22 B

This returns a list with one subset per value. To rbind these list elements into a single data frame, just use:

do.call(rbind, x)
# Data_x Data_y Column_X
# 1 -34 12 A
# 6 -35 24 A
# 5 -34 10 B
# 7 -35 16 B
# 8 -33 22 B

However, as @Frank pointed out, you could simply use basic subsetting in your code:

Data[Data$Column_X %in% variableData,]
# Data_x Data_y Column_X
# 1 -34 12 A
# 5 -34 10 B
# 6 -35 24 A
# 7 -35 16 B
# 8 -33 22 B

Warning from ?subset: "This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences."

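Furthermore, with this approach the original row order of Data is kept (rows 1, 5, 6, 7, 8), whereas do.call(rbind, x) groups the rows by value (1, 6, 5, 7, 8).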
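A small reproducible sketch of both approaches. Rows 1 and 5 to 8 of Data match the printed output above; rows 2 to 4 (with hypothetical values "C" and "D" in Column_X) are invented filler that neither approach should keep:

```r
# Rows 1 and 5-8 match the output above; rows 2-4 are invented filler
Data <- data.frame(Data_x   = c(-34, -30, -31, -32, -34, -35, -35, -33),
                   Data_y   = c(12, 14, 18, 20, 10, 24, 16, 22),
                   Column_X = c("A", "C", "C", "D", "B", "A", "B", "B"),
                   stringsAsFactors = FALSE)
variableData <- c("A", "B")

# lapply + do.call(rbind): rows come out grouped by value, in variableData order
by_value <- do.call(rbind, lapply(variableData,
                                  function(val) Data[Data$Column_X == val, ]))

# %in%: a single logical index, so the original row order is kept
in_order <- Data[Data$Column_X %in% variableData, ]
```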

Subset a list of data.frames and return list of data.frames

This should do it for you:

subsetl <- lapply(l, function(x) {
  x[x < 4] <- NA  # replace every value below 4 with NA
  return(x)
})

Result:

> subsetl
[[1]]
a b c
1 NA 4 NA
2 NA 5 4
3 NA 6 5
4 4 5 6
5 5 5 7
6 6 5 8

[[2]]
a b c
1 NA 4 NA
2 4 5 NA
3 5 6 NA
4 6 5 4
5 7 5 5
6 8 5 6

[[3]]
a b c
1 4 NA NA
2 5 4 NA
3 6 5 NA
4 5 6 4
5 5 7 5
6 5 8 6
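A reproducible sketch with a hypothetical l, chosen so that masking reproduces the three printed results (the original values below 4 are not shown, so those are assumptions). The approach above keeps every row and only masks values below 4; if the goal is instead to drop rows, a rowSums() filter is one option. That second version is an assumption about the intent, not part of the original answer:

```r
# Hypothetical input consistent with the printed output above
l <- list(data.frame(a = 1:6, b = c(4, 5, 6, 5, 5, 5), c = 3:8),
          data.frame(a = 3:8, b = c(4, 5, 6, 5, 5, 5), c = 1:6),
          data.frame(a = c(4, 5, 6, 5, 5, 5), b = 3:8, c = 1:6))

# Version 1 (as above): keep every row, but set values below 4 to NA
masked <- lapply(l, function(x) {
  x[x < 4] <- NA
  x
})

# Version 2 (alternative reading): drop rows that contain any value below 4
filtered <- lapply(l, function(x) x[rowSums(x < 4) == 0, , drop = FALSE])
```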

How to subset a dataframe based on values in a list column

library(dplyr)
library(purrr)
df %>%
  filter(map_lgl(problem, ~ any("thing 3" == .x)))  # keep rows whose problem vector contains "thing 3"

name problem
1 sue thing 1,....
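The same filter can be written in base R without dplyr or purrr. A minimal sketch with a hypothetical df (the names and list-column contents are made up; only the "thing 3" test comes from the answer above):

```r
# Hypothetical data: `problem` is a list column holding character vectors
df <- data.frame(name = c("sue", "bob"), stringsAsFactors = FALSE)
df$problem <- list(c("thing 1", "thing 2", "thing 3"), c("thing 1", "thing 2"))

# One TRUE/FALSE per row: does that row's vector contain "thing 3"?
has_thing3 <- vapply(df$problem, function(p) "thing 3" %in% p, logical(1))
df[has_thing3, ]
```

vapply() is used instead of sapply() so that the result is guaranteed to be a logical vector with one value per row.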

How to create a loop which creates multiple subset dataframes from a larger data frame?

Your code works fine. Just remove list() so that you create a vector of color names rather than a list. If you only want distinct values, use unique().

mydata <- data.frame(x = c(1,2,3), y = c('a','b','c'), z = c('red','red','yellow'))

colors <- unique(mydata$z)

for (i in seq_along(colors)) {
  assign(paste0("mydata_", i), subset(mydata, z == colors[i]))
}
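As an alternative to creating numbered objects with assign(), split() returns the subsets as a named list, which is usually easier to work with afterwards. A minimal sketch using the same mydata (stringsAsFactors = FALSE is added so the result does not depend on the R version):

```r
mydata <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"),
                     z = c("red", "red", "yellow"), stringsAsFactors = FALSE)

# One data frame per distinct value of z, collected in a named list
by_color <- split(mydata, mydata$z)

# Access a subset by its color name, or loop over all of them
by_color$red
sapply(by_color, nrow)
```

The list is named by color ("red", "yellow") rather than by position, and nothing extra is written into the workspace.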

Selecting Entries in a Data Frame Stored in a List

You are trying to subset one data.frame to rows that match the unique values of a column in another data.frame.

Your attempted solution returns no elements because unique is a data.frame, and when you coerce it to a list you end up with a list rather than a vector that can be used to subset rows. When subsetting with foo[bar, ], bar should be either a vector with the indices of the rows to keep (e.g. foo[c(1,2), ]) or a logical vector with one value per row of the data.frame. All you need to do is use %in% with the vector of unique values itself.

You don't need list() for this, and which() is redundant, since you can subset the data.frame with a logical vector instead of row indices. %in% returns TRUE or FALSE for each row of my_data, and that vector can be used to subset directly. All which() does is convert it to the indices of the TRUE rows, which are then used to subset by index, so it adds nothing here.

# Your example data
my_data = data.frame(col1 = c("abc", "bcd", "bfg", "eee", "eee") , id = 1:5)
my_data_1 = data.frame(col1 = c("abc", "byd", "bgg", "fef", "eee") , id = 1:5)
unique = unique(my_data_1[c("col1")])

# Show that unique is a data.frame
str(unique)
#> 'data.frame': 5 obs. of 1 variable:
#> $ col1: chr "abc" "byd" "bgg" "fef" ...

# Show that unique$col1 is a vector
str(unique$col1)
#> chr [1:5] "abc" "byd" "bgg" "fef" "eee"

# Show what a logical test with the character vector does
my_data$col1 %in% unique$col1
#> [1] TRUE FALSE FALSE TRUE TRUE

# We can use this to subset
my_data[my_data$col1 %in% unique$col1, ]
#> col1 id
#> 1 abc 1
#> 4 eee 4
#> 5 eee 5

You could also combine steps and simply use:

my_data[my_data$col1 %in% unique(my_data_1$col1), ]
#> col1 id
#> 1 abc 1
#> 4 eee 4
#> 5 eee 5
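A sketch of why the list version matches nothing while the vector version works. The exact original attempt isn't shown, so the as.list() call below is an assumption standing in for it; the variable is named keep_vals here to avoid masking the unique() function:

```r
my_data   <- data.frame(col1 = c("abc", "bcd", "bfg", "eee", "eee"), id = 1:5)
my_data_1 <- data.frame(col1 = c("abc", "byd", "bgg", "fef", "eee"), id = 1:5)
keep_vals <- unique(my_data_1[c("col1")])  # still a one-column data.frame

# %in% against a list holding one vector finds no matches, so no rows are kept
bad <- my_data[my_data$col1 %in% as.list(keep_vals), ]

# %in% against the column itself gives one TRUE/FALSE per row
good <- my_data[my_data$col1 %in% keep_vals$col1, ]

# which() only converts the logical vector to row indices; same rows result
good_which <- my_data[which(my_data$col1 %in% keep_vals$col1), ]
```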

