How to Subset Data.Frames Stored in a List

How to subset data.frames stored in a list?

The second argument to lapply is a function (here, subset), and any extra arguments to that function are passed through lapply's ... argument. Hence:

my.ls <- list(d1 = d1, d2 = d2)
my.lsNA <- lapply(my.ls, subset, is.na(b))

(I am also showing you how to easily create the list of data.frames without using get, and recommend you don't use ls as a variable name since it is also the name of a rather common function.)
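For a self-contained illustration, here is a minimal sketch with made-up d1 and d2 (the original data are not shown in the question). It indexes with `[` and is.na() inside an anonymous function instead of calling subset, which should select the same rows while avoiding non-standard evaluation:

```r
# Hypothetical example data; the original d1 and d2 are not shown
d1 <- data.frame(a = 1:3, b = c(NA, 2, NA))
d2 <- data.frame(a = 4:6, b = c(1, NA, 3))

# Build the named list directly, without get()
my.ls <- list(d1 = d1, d2 = d2)

# Keep only the rows where b is NA in each data.frame
my.lsNA <- lapply(my.ls, function(d) d[is.na(d$b), , drop = FALSE])

str(my.lsNA)
```

The result is a list with the same names as my.ls, so my.lsNA$d1 and my.lsNA$d2 hold the filtered data.frames.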

How to subset a list of data.frames?

To subset each element of the list by name, pass `[` to lapply:

mainlist_new <- lapply(mainlist, `[`, c("rainfall", "yield"))

Output:

> str(mainlist_new)
List of 2
 $ :List of 2
  ..$ rainfall:'data.frame':    5 obs. of  3 variables:
  .. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
  .. ..$ rainfall: num [1:5] 0 5 10 15 20
  .. ..$ yield   : num [1:5] 2000 3000 4000 5000 6000
  ..$ yield   :'data.frame':    5 obs. of  3 variables:
  .. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
  .. ..$ rainfall: num [1:5] 0 5 10 15 20
  .. ..$ yield   : num [1:5] 2000 3000 4000 5000 6000
 $ :List of 2
  ..$ rainfall:'data.frame':    5 obs. of  3 variables:
  .. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
  .. ..$ rainfall: num [1:5] 0 5 10 15 20
  .. ..$ yield   : num [1:5] 2000 3000 4000 5000 6000
  ..$ yield   :'data.frame':    5 obs. of  3 variables:
  .. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
  .. ..$ rainfall: num [1:5] 0 5 10 15 20
  .. ..$ yield   : num [1:5] 2000 3000 4000 5000 6000
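To try this out, here is a small sketch. The structure of mainlist is an assumption (a list whose elements are themselves named lists of data.frames), and the extra element named other is invented only to show that it gets dropped:

```r
# Hypothetical data.frame reused for every element
df <- data.frame(station  = c("MADA1", "MADA2"),
                 rainfall = c(0, 5),
                 yield    = c(2000, 3000))

# Assumed structure: each element is a named list of data.frames
mainlist <- list(
  list(rainfall = df, yield = df, other = df),
  list(rainfall = df, yield = df, other = df)
)

# `[` with a character vector keeps only the named elements
mainlist_new <- lapply(mainlist, `[`, c("rainfall", "yield"))

names(mainlist_new[[1]])
```

If the elements of the outer list were data.frames rather than lists, the same call would select the named columns instead.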

Subset Data Based On Elements In List

Classic lapply.

x <- lapply(variableData, function(x){subset(Data, Column_X == x)})
x
# [[1]]
# Data_x Data_y Column_X
# 1    -34     12        A
# 6    -35     24        A
# 
# [[2]]
# Data_x Data_y Column_X
# 5    -34     10        B
# 7    -35     16        B
# 8    -33     22        B

This returns a list of all the subsets. To rbind all of these list elements, use do.call:

do.call(rbind, x)
#   Data_x Data_y Column_X
# 1    -34     12        A
# 6    -35     24        A
# 5    -34     10        B
# 7    -35     16        B
# 8    -33     22        B

However, as @Frank pointed out, you could use basic subsetting instead:

Data[Data$Column_X %in% variableData,]
#   Data_x Data_y Column_X
# 1    -34     12        A
# 5    -34     10        B
# 6    -35     24        A
# 7    -35     16        B
# 8    -33     22        B

"Warning

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences." (?subset)

Furthermore, indexing with %in% keeps the rows in their original order, whereas the lapply and rbind approach groups them by value.
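The two approaches can be compared side by side. In this sketch, rows 1 and 5 to 8 follow the output shown above, while rows 2 to 4 and the exact contents of variableData are invented for illustration. The lapply version uses `[` rather than subset, in line with the warning quoted from ?subset:

```r
# Rows 2 to 4 are made up; the other rows follow the printed output
Data <- data.frame(
  Data_x   = c(-34, -30, -31, -32, -34, -35, -35, -33),
  Data_y   = c(12, 1, 2, 3, 10, 24, 16, 22),
  Column_X = c("A", "C", "C", "C", "B", "A", "B", "B")
)
variableData <- c("A", "B")  # assumed contents

# One subset per value, then stacked: rows come out grouped by value
by_value <- lapply(variableData, function(v) Data[Data$Column_X == v, ])
stacked  <- do.call(rbind, by_value)

# Single logical index: rows keep their original order
direct <- Data[Data$Column_X %in% variableData, ]

rownames(stacked)
rownames(direct)
```

Both results contain the same rows; only the ordering differs.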

Subset a list of data.frames and return list of data.frames

This replaces every value below 4 with NA in each data.frame and returns a list of data.frames:

subsetl <- lapply(l,function(x) {
    x[x<4] <- NA
    return(x)
})

Result:

> subsetl
[[1]]
   a b  c
1 NA 4 NA
2 NA 5  4
3 NA 6  5
4  4 5  6
5  5 5  7
6  6 5  8

[[2]]
   a b  c
1 NA 4 NA
2  4 5 NA
3  5 6 NA
4  6 5  4
5  7 5  5
6  8 5  6

[[3]]
  a  b  c
1 4 NA NA
2 5  4 NA
3 6  5 NA
4 5  6  4
5 5  7  5
6 5  8  6
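For a reproducible version, here is a sketch with input data chosen to resemble the first two results above. The original l is not shown, so these values are an assumption:

```r
# Assumed input; the original list l is not given
l <- list(
  data.frame(a = 1:6, b = c(4, 5, 6, 5, 5, 5), c = 3:8),
  data.frame(a = 3:8, b = c(4, 5, 6, 5, 5, 5), c = 1:6)
)

# x < 4 yields a logical matrix, which can index the data.frame directly
subsetl <- lapply(l, function(x) {
  x[x < 4] <- NA
  x
})

subsetl
```

Note that this masks values rather than dropping rows, so every data.frame keeps its original dimensions.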

How to subset a dataframe based on values in a list column

library(dplyr)  # filter() and %>%
library(purrr)  # map_lgl()
df %>%
  filter(map_lgl(problem, ~any('thing 3' == .x)))

  name      problem
1  sue thing 1,....
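If you would rather not depend on dplyr and purrr, a base R sketch of the same idea is below. The contents of df are invented here (the original data are not shown); the only assumption carried over is that problem is a list column of character vectors:

```r
# Hypothetical data with a list column
df <- data.frame(name = c("sue", "bob"))
df$problem <- list(c("thing 1", "thing 3"), c("thing 2"))

# One TRUE/FALSE per row: does this row's vector contain "thing 3"?
keep <- vapply(df$problem, function(p) any(p == "thing 3"), logical(1))

result <- df[keep, ]
result
```

vapply plays the role of map_lgl here: it guarantees a logical vector with one element per row, which can then index the rows.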

How to create a loop which creates multiple subset dataframes from a larger data frame?

Your code works fine. Just remove the call to list() so that you create a vector of color names rather than a list. If you only want distinct values, use unique.

mydata <- data.frame(x = c(1,2,3), y = c('a','b','c'), z = c('red','red','yellow'))

colors <- unique(mydata$z)

for (i in seq_along(colors)) {
    assign(paste0("mydata_", i), subset(mydata, z == colors[[i]]))
}
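As an alternative to creating numbered variables with assign, split() returns all the subsets in one named list. This is a different technique from the loop above, sketched here on the same example data; the name by_color is just for illustration:

```r
mydata <- data.frame(x = c(1, 2, 3),
                     y = c("a", "b", "c"),
                     z = c("red", "red", "yellow"))

# One data.frame per distinct value of z, named after that value
by_color <- split(mydata, mydata$z)

names(by_color)
by_color$red
```

Keeping the pieces in a list means you can go straight on to lapply over them, instead of looking up mydata_1, mydata_2, and so on by name.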

Selecting Entries in a Data Frame Stored in a List

You are trying to subset one data.frame to rows that match the unique values of a column in another data.frame.

Your attempted solution returns no elements because unique is a data.frame, and when you coerce it to a list you are stuck with a list instead of a vector that can be used to subset rows. When subsetting using foo[bar, ], bar should be a vector containing either the indices of the rows to keep (e.g. foo[c(1,2), ]) or a logical value for each row of the data.frame. All you need to do is use %in% with the vector of unique values itself.

You don't need list() for this, and which() is redundant: %in% returns TRUE or FALSE for each row of my_data, and that logical vector can subset the data.frame directly. All which() does is convert the TRUE positions to row indices before subsetting by index, which adds nothing here.

# Your example data
my_data = data.frame(col1 = c("abc", "bcd", "bfg", "eee", "eee") , id = 1:5)
my_data_1 = data.frame(col1 = c("abc", "byd", "bgg", "fef", "eee") , id = 1:5)
unique = unique(my_data_1[c("col1")])

# Show that unique is a data.frame
str(unique)
#> 'data.frame':    5 obs. of  1 variable:
#>  $ col1: chr  "abc" "byd" "bgg" "fef" ...

# Show that unique$col1 is a vector
str(unique$col1)
#>  chr [1:5] "abc" "byd" "bgg" "fef" "eee"

# Show what a logical test with the character vector does
my_data$col1 %in% unique$col1
#> [1]  TRUE FALSE FALSE  TRUE  TRUE

# We can use this to subset
my_data[my_data$col1 %in% unique$col1, ]
#>   col1 id
#> 1  abc  1
#> 4  eee  4
#> 5  eee  5

You could also combine steps and simply use:

my_data[my_data$col1 %in% unique(my_data_1$col1), ]
#>   col1 id
#> 1  abc  1
#> 4  eee  4
#> 5  eee  5
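To check the point about which() for yourself, this sketch subsets both ways on the same example data and compares the results. The names keep, by_logical and by_index are made up for the illustration:

```r
my_data   <- data.frame(col1 = c("abc", "bcd", "bfg", "eee", "eee"), id = 1:5)
my_data_1 <- data.frame(col1 = c("abc", "byd", "bgg", "fef", "eee"), id = 1:5)

# Logical vector: one TRUE/FALSE per row of my_data
keep <- my_data$col1 %in% unique(my_data_1$col1)

by_logical <- my_data[keep, ]         # subset with the logical vector
by_index   <- my_data[which(keep), ]  # subset with row indices

# Compare the two results
all.equal(by_logical, by_index)
```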
