How to Subset from a List in R

How to subset a list of data.frames?

To keep only certain named elements inside each element of the list, pass `[` to lapply together with the names:

mainlist_new <- lapply(mainlist, `[`, c("rainfall", "yield"))

Output:

> str(mainlist_new)
List of 2
$ :List of 2
..$ rainfall:'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
..$ yield :'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
$ :List of 2
..$ rainfall:'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
..$ yield :'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
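
As a self-contained illustration of the call above, here is a minimal sketch. The structure of mainlist is an assumption (a list of lists of data frames), and make_df, the temperature element and the values are hypothetical names and data introduced only for this example.

```r
# Hypothetical data: 'mainlist' is assumed to be a list of lists of data frames
make_df <- function() {
  data.frame(station = c("MADA1", "MADA2"), value = c(0, 5))
}
mainlist <- list(
  list(rainfall = make_df(), yield = make_df(), temperature = make_df()),
  list(rainfall = make_df(), yield = make_df(), temperature = make_df())
)

# `[` with a character vector keeps the named elements and returns a list
mainlist_new <- lapply(mainlist, `[`, c("rainfall", "yield"))

# Each inner list now holds only the two requested data frames
sapply(mainlist_new, names)
```

The temperature element is dropped from both inner lists, while the outer list keeps its length.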

How can I subset a list in R by extracting the elements that contain a string?

The grep should be applied to the names of the list, not to its values:

mylist_sub <- mylist[grep('pt', names(mylist))]
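
To make the names-versus-values distinction concrete, here is a small sketch. The contents of mylist are hypothetical; one element deliberately has the pattern in its value but not in its name.

```r
# Hypothetical list: the names carry the pattern, the values do not
mylist <- list(pt_a = 1:3, xy_b = "pt", pt_c = letters[1:2])

# grep() on the names returns the positions of the matching names
idx <- grep("pt", names(mylist))
mylist_sub <- mylist[idx]
names(mylist_sub)

# grepl() gives a logical vector instead, which subsets the same way
identical(mylist_sub, mylist[grepl("pt", names(mylist))])
```

The element xy_b is not selected even though its value is "pt", because only the names are searched.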

How do you subset data from a list in R?

We could either use lapply from base R

library(rvest)  # provides html_nodes(), html_text() and the %>% pipe
out <- lapply(ret, function(x) x %>%
    html_nodes(".iCIMS_JobContent, #jobDescriptionText") %>%
    html_text())

or loop with map from purrr

library(rvest)  # provides html_nodes(), html_text() and the %>% pipe
library(purrr)
out <- map(ret, ~ .x %>%
    html_nodes(".iCIMS_JobContent, #jobDescriptionText") %>%
    html_text())

NOTE: Both approaches loop over the elements of the list. Here x (in lapply) and .x (in map) stand for the individual element passed to the anonymous function, i.e. a function created on the fly, written as function(x) in base R or with the ~ shorthand in the tidyverse.
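
The same looping pattern can be tried without any web scraping. In this sketch, ret is a hypothetical stand-in (a list of character vectors rather than parsed HTML pages), and toupper replaces the html_nodes/html_text steps. The purrr part runs only if that package is installed.

```r
# Hypothetical input standing in for 'ret': a list of character vectors
ret <- list(c("a", "bb"), c("ccc", "dddd", "e"))

# Anonymous function: x is one element of 'ret' per call
out_base <- lapply(ret, function(x) toupper(x))

# The same loop with purrr, only if the package is installed
if (requireNamespace("purrr", quietly = TRUE)) {
  out_purrr <- purrr::map(ret, ~ toupper(.x))
  stopifnot(identical(out_base, out_purrr))
}

out_base
```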

Subset Data Based On Elements In List

Classic lapply.

x <- lapply(variableData, function(x){subset(Data, Column_X == x)})
x
# [[1]]
# Data_x Data_y Column_X
# 1 -34 12 A
# 6 -35 24 A
#
# [[2]]
# Data_x Data_y Column_X
# 5 -34 10 B
# 7 -35 16 B
# 8 -33 22 B

This returns a list of all the subsets. To rbind all these list elements into one data frame, just use do.call:

do.call(rbind, x)
# Data_x Data_y Column_X
# 1 -34 12 A
# 6 -35 24 A
# 5 -34 10 B
# 7 -35 16 B
# 8 -33 22 B

However, as @Frank pointed out, you could use basic subsetting in your code:

Data[Data$Column_X %in% variableData,]
# Data_x Data_y Column_X
# 1 -34 12 A
# 5 -34 10 B
# 6 -35 24 A
# 7 -35 16 B
# 8 -33 22 B

Basic subsetting is also preferable because of this warning in ?subset:

"This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences."

Furthermore, basic subsetting keeps the original order of your rows.
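
The difference in row order between the two approaches can be checked with a small sketch. The data frame below is hypothetical: it is built to resemble the output shown above, and the rows with Column_X values "C" and "D" are invented filler.

```r
# Hypothetical data, chosen to resemble the output shown above
Data <- data.frame(
  Data_x   = c(-34, -33, -36, -36, -34, -35, -35, -33),
  Data_y   = c(12, 14, 18, 20, 10, 24, 16, 22),
  Column_X = c("A", "C", "C", "D", "B", "A", "B", "B")
)
variableData <- c("A", "B")

# lapply + rbind groups the rows by value: all "A" rows, then all "B" rows
by_group <- do.call(
  rbind,
  lapply(variableData, function(v) Data[Data$Column_X == v, ])
)

# %in% keeps the rows in their original order
in_order <- Data[Data$Column_X %in% variableData, ]

rownames(by_group)
rownames(in_order)
```

Both results contain the same five rows. Only the ordering differs, which matters if later steps rely on row position.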

How to subset a list of a list based on the name

We can extract the named elements with `[`, here using map from purrr:

library(purrr)
map(L, `[`, select_names)
#[[1]]
#[[1]]$A
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4

#[[2]]
#[[2]]$A
# [,1] [,2]
#[1,] 5 7
#[2,] 6 8

Or using lapply

lapply(L, function(x) x[select_names])

Or without anonymous function call

lapply(L, `[`, select_names)
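
A runnable sketch of the same idea in base R follows. L and select_names are hypothetical reconstructions modelled on the output above (the B matrices are invented). It also contrasts `[` with `[[`, which is a related choice the calls above do not show.

```r
# Hypothetical nested list modelled on the output shown above
L <- list(
  list(A = matrix(1:4, 2), B = matrix(1:6, 2)),
  list(A = matrix(5:8, 2), B = matrix(1:6, 2))
)
select_names <- "A"

# `[` keeps the list wrapper, so each result is a list of length 1
kept <- lapply(L, `[`, select_names)

# `[[` instead returns the bare matrix (and accepts only one name here)
bare <- lapply(L, `[[`, "A")

str(kept[[1]])
str(bare[[1]])
```

Use `[` when select_names may hold several names, and `[[` when exactly one element should be pulled out of each inner list.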

How to subset dataframe using list that includes partial strings of another variable

You were on the right track, grepl is your friend. To use the countries with it, paste them together into a single pattern, collapsing on the regex "or" operator |.

Then, using subset

EU_p <- paste(EU, collapse='|')

subset(df, grepl(EU_p, a))
# a b
# 2 Croatia USA 2
# 4 Switzerland Hungary 4
# 5 Lithuania Indonesia 5

or as you indicated using brackets

df[grepl(EU_p, df$a), ]
# a b
# 2 Croatia USA 2
# 4 Switzerland Hungary 4
# 5 Lithuania Indonesia 5

The result is any row of df containing at least one country of the EU vector, since the pattern as is doesn't distinguish the position.


Data:

df <- structure(list(a = c("Albania Canada", "Croatia USA", "Mexico Egypt", 
"Switzerland Hungary", "Lithuania Indonesia"), b = c(1, 2, 3,
4, 5)), class = "data.frame", row.names = c(NA, -5L))
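
If the position of the country does matter, the pattern can be anchored. This is only a sketch: the EU vector below is a hypothetical stand-in because the original vector is not shown, and first_pat is a name introduced for the example.

```r
df <- data.frame(
  a = c("Albania Canada", "Croatia USA", "Mexico Egypt",
        "Switzerland Hungary", "Lithuania Indonesia"),
  b = c(1, 2, 3, 4, 5)
)

# Hypothetical lookup vector; the original EU vector is not shown
EU <- c("Croatia", "Hungary", "Lithuania")

# Unanchored: matches a country anywhere in the string
any_pos <- df[grepl(paste(EU, collapse = "|"), df$a), ]

# Anchored with ^: matches only when the string starts with a country
first_pat <- paste0("^(", paste(EU, collapse = "|"), ")")
first_pos <- df[grepl(first_pat, df$a), ]

any_pos$b
first_pos$b
```

With the anchor, "Switzerland Hungary" drops out because Hungary is not the first word.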

Create object with loop that subsets a list in R

I think I understand now; thank you for clarifying your question in the comments. If there's something I missed or you have any questions, please let me know.

Terminology, quickly

I believe you are interested in splitting a vector of strings into multiple shorter vectors of strings based on a pattern within each element. A list is simply a generic vector whose elements can themselves be vectors.

g is a vector of 20 string elements (see Data code chunk below).

is.vector(g)
#> [1] TRUE

Here's a list that only contains one vector.

str(list(g))
#> List of 1
#> $ : chr [1:20] "New AS Plate 21_AS Plate_Sample 12_50.fcs" "New AS Plate 21_AS Plate_Sample 1_100.fcs" "New AS Plate 21_AS Plate_Sample 1_25.fcs" "New AS Plate 21_AS Plate_Sample 1_250.fcs" ...

Now onto the question...

In your question, you specifically ask about using assign(). Although using assign() can be convenient, [it is usually not recommended][1]. But sometimes you gotta do what you gotta do, no shame in that. Here's how you could use it manually, on one group at a time (like you show in your question).

# Using assign() one group at a time
h <- g[grep("Sample 1_", g)]
assign(x = "sample_1_group", value = h)

It is pretty straightforward (and seemingly logical) to use assign() in a for-loop.

The first step in defining a for-loop is defining what the loop will "loop over," or in other words, what will change during each iteration of the loop. In your case, we are looking for a number that defines your groups. We can define a vector of those numbers manually or programmatically.

# Define groups manually
ids <- c(12,1,10,11)
ids
#> [1] 12 1 10 11

# Pattern match groups
all_ids <- gsub(pattern = ".*Sample (\\d+).*", replacement = "\\1", x = g)
all_ids
#> [1] "12" "1" "1" "1" "1" "1" "10" "10" "10" "10" "10" "11" "11" "11" "11"
#> [16] "11" "12" "12" "12" "12"
ids <- unique(all_ids)
ids
#> [1] "12" "1" "10" "11"

After we know what we are looping over, we can define the structure of the loop and the functions within it. paste0() can be a workhorse here. The loop below iterates over ids (one id at a time), finds matching strings in g, and writes them to your environment as a vector. Because we are using assign(), we expect a new vector to appear in our environment after each iteration of the loop.

# For-loop with assign
for(i in ids){
  a <- paste0("Sample ", i, "_")
  h <- g[grep(a, g)]
  h_name <- paste0("sample_", i, "_group")
  assign(x = h_name, value = h)
}
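
One detail in the pattern is worth checking: the trailing underscore added by paste0(). The sketch below uses g_small, a shortened and hypothetical version of g, to show what happens without it.

```r
# A shortened, hypothetical version of g
g_small <- c("Plate_Sample 1_100.fcs", "Plate_Sample 10_100.fcs",
             "Plate_Sample 11_25.fcs", "Plate_Sample 12_50.fcs")

# Without the underscore, "Sample 1" also matches samples 10, 11 and 12
loose <- grep("Sample 1", g_small, value = TRUE)

# With the underscore, only sample 1 matches
strict <- grep("Sample 1_", g_small, value = TRUE)

length(loose)
strict
```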

That technically works, but it's not the best. You may find that it is actually more convenient to use lists (a vector of vectors) to store information from a for-loop. It's fast to program, you don't have a bunch of new objects crowding your workspace, and all the scary things (not really) in that link above won't be a problem. Here's how you could do that:

# Save the results of a for-loop in a list!
# First, make a blank list to hold the results
results <- list()
for(i in ids){
  a <- paste0("Sample ", i, "_")
  h <- g[grep(a, g)]
  h_name <- paste0("sample_", i, "_group")
  results[[h_name]] <- h
}
results
#> $sample_12_group
#> [1] "New AS Plate 21_AS Plate_Sample 12_50.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 12_100.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 12_25.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 12_250.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 12_500.fcs"
#>
#> $sample_1_group
#> [1] "New AS Plate 21_AS Plate_Sample 1_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 1_25.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 1_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 1_50.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 1_500.fcs"
#>
#> $sample_10_group
#> [1] "New AS Plate 21_AS Plate_Sample 10_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 10_25.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 10_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 10_50.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 10_500.fcs"
#>
#> $sample_11_group
#> [1] "New AS Plate 21_AS Plate_Sample 11_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 11_25.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 11_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 11_50.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 11_500.fcs"

Extra credit

For-loops are great: it's easy to see what's going on inside of them, it's easy to do a lot of data handling in them, and they are usually reasonably fast to execute. But sometimes it's all about speed. R is vectorized ([I'm honestly not exactly sure what this means][2] besides "it can do multiple calculations simultaneously"), but a for-loop doesn't take advantage of this very well. The apply() family of functions is often described as vectorized, although lapply() still loops over its input internally; its main benefits are compact code and a return value that is already a list. These functions are usually easy to implement in cases where you might also use a for-loop. Here's how you could do that with your data:

# lapply() instead of a for-loop
lapply(ids, function(i) g[grep(paste0("Sample ", i, "_"), g)])
#> [[1]]
#> [1] "New AS Plate 21_AS Plate_Sample 12_50.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 12_100.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 12_25.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 12_250.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 12_500.fcs"
#>
#> [[2]]
#> [1] "New AS Plate 21_AS Plate_Sample 1_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 1_25.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 1_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 1_50.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 1_500.fcs"
#>
#> [[3]]
#> [1] "New AS Plate 21_AS Plate_Sample 10_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 10_25.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 10_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 10_50.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 10_500.fcs"
#>
#> [[4]]
#> [1] "New AS Plate 21_AS Plate_Sample 11_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 11_25.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 11_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 11_50.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 11_500.fcs"
Created on 2021-10-14 by the reprex package (v2.0.1)
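
As a further alternative that is not part of the answer above, base R's split() can build the same kind of named list in one step. This is a sketch under assumptions: g_small is a shortened, hypothetical version of g, and ids_small and groups are names introduced for the example.

```r
# A shortened, hypothetical version of g
g_small <- c("Plate_Sample 12_50.fcs", "Plate_Sample 1_100.fcs",
             "Plate_Sample 1_25.fcs", "Plate_Sample 10_100.fcs",
             "Plate_Sample 12_100.fcs")

# Extract the sample number from each string
ids_small <- sub(".*Sample (\\d+)_.*", "\\1", g_small)

# split() groups the strings by id; the factor levels fix the group order
groups <- split(g_small, factor(ids_small, levels = unique(ids_small)))
names(groups) <- paste0("sample_", names(groups), "_group")

names(groups)
groups$sample_12_group
```

Because the grouping key is extracted once for the whole vector, no pattern has to be rebuilt per group.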

Data:

g <- c("New AS Plate 21_AS Plate_Sample 12_50.fcs", 
"New AS Plate 21_AS Plate_Sample 1_100.fcs",
"New AS Plate 21_AS Plate_Sample 1_25.fcs",
"New AS Plate 21_AS Plate_Sample 1_250.fcs",
"New AS Plate 21_AS Plate_Sample 1_50.fcs",
"New AS Plate 21_AS Plate_Sample 1_500.fcs",
"New AS Plate 21_AS Plate_Sample 10_100.fcs",
"New AS Plate 21_AS Plate_Sample 10_25.fcs",
"New AS Plate 21_AS Plate_Sample 10_250.fcs",
"New AS Plate 21_AS Plate_Sample 10_50.fcs",
"New AS Plate 21_AS Plate_Sample 10_500.fcs",
"New AS Plate 21_AS Plate_Sample 11_100.fcs",
"New AS Plate 21_AS Plate_Sample 11_25.fcs",
"New AS Plate 21_AS Plate_Sample 11_250.fcs",
"New AS Plate 21_AS Plate_Sample 11_50.fcs",
"New AS Plate 21_AS Plate_Sample 11_500.fcs",
"New AS Plate 21_AS Plate_Sample 12_100.fcs",
"New AS Plate 21_AS Plate_Sample 12_25.fcs",
"New AS Plate 21_AS Plate_Sample 12_250.fcs",
"New AS Plate 21_AS Plate_Sample 12_500.fcs")

[1]: Why is using assign bad?
[2]: How do I know a function or an operation in R is vectorized?

R - Subsetting list by parameters of nested list

We can use base R to do this. No packages are needed

lapply(List_tot, `[`, c("a", "c", "d"))

or with anonymous function

lapply(List_tot, function(x) x[c("a", "c", "d")])

If we need the top 2, order the elements by their number of rows. Here lengths works because each element is a single-column matrix, so the number of rows equals the total number of elements. Then take the head of the names of that ordered vector and use it to extract the inner list elements:

lapply(List_tot, function(x) {
  x1 <- x[c("a", "c", "d")]
  v1 <- lengths(x1)
  x1[head(names(v1)[order(-v1)], 2)]
})
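
To see the top-2 selection in action, here is a sketch with invented data. List_tot below is hypothetical (a list of lists of single-column matrices with different row counts), and top2 is a name introduced for the example.

```r
# Hypothetical nested list of single-column matrices
List_tot <- list(
  list(a = matrix(1:3), b = matrix(1:9), c = matrix(1:5), d = matrix(1:2)),
  list(a = matrix(1:4), b = matrix(1:9), c = matrix(1:1), d = matrix(1:6))
)

top2 <- lapply(List_tot, function(x) {
  x1 <- x[c("a", "c", "d")]           # keep the candidate elements
  v1 <- lengths(x1)                   # element counts = row counts here
  x1[head(names(v1)[order(-v1)], 2)]  # the two with the most rows
})

lapply(top2, names)
```

Element b is never considered, even though it is the largest, because it is excluded by the initial name-based subset.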

Subsetting nested lists based on condition (values) in R

You could use Filter from base R:

Filter(function(x) sum(x$co) !=0, dummy_list)

Or you can use purrr:

library(tidyverse)

dummy_list %>%
keep( ~ sum(.$co) != 0)

Output

$`first-group`
$`first-group`$val
[1] 534 582 298 645 314 237 418 348 363 133 493 721 722 210 467 474 145 638 545 330 709 712 674 492 262 663 609 142 428 254

$`first-group`$co
[1] 0 0 1 1 0 1 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 1 1 1 1 0

$`third-group`
$`third-group`$val
[1] 713 721 683 526 699 555 563 672 619 603 588 533 622 724 616 644 730 716 660 663 611 669 644 664 679 514 579 525 533 541 530 564 584 673 592 726 548 563 727
[40] 646 708 557 586 592 693 620 548 705 510 677 539 603 726 525 597 563 712

$`third-group`$co
[1] 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

$`fourth-group`
$`fourth-group`$val
[1] 142 317 286 174 656 299 676 206 645 755 514 424 719 741 711 552 550 372 551 520 650 503 667 162 644 595 322 247

$`fourth-group`$co
[1] 0 0 0 0 1 0 1 1 1 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1

However, if you also want to exclude any co that have all 1s, then we can add an extra condition.

Filter(function(x) sum(x$co) !=0 & sum(x$co == 0) > 0, dummy_list)

Or with purrr:

dummy_list %>%
keep( ~ sum(.$co) != 0 & sum(.$co == 0) > 0)
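
Both conditions can be tried on a small example in base R. The list below is hypothetical: the values are invented and much shorter than in the output above. The second filter is written with any() and &&, which is assumed to be equivalent to the sum-based test when co contains only 0s and 1s.

```r
# Hypothetical list: each group has values and a 0/1 indicator 'co'
dummy_list <- list(
  `first-group`  = list(val = c(10, 20, 30), co = c(0, 1, 0)),
  `second-group` = list(val = c(40, 50),     co = c(0, 0)),
  `third-group`  = list(val = c(60, 70),     co = c(1, 1))
)

# Keep groups with at least one 1 in 'co'
has_one <- Filter(function(x) sum(x$co) != 0, dummy_list)

# Keep groups whose 'co' contains both a 1 and a 0
mixed <- Filter(function(x) any(x$co == 1) && any(x$co == 0), dummy_list)

names(has_one)
names(mixed)
```

The first filter drops only the all-zero group. The second also drops the all-ones group.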

