How to subset a list of data.frames?
If we want to subset the list elements based on names:
mainlist_new <- lapply(mainlist, `[`, c("rainfall", "yield"))
Output:
> str(mainlist_new)
List of 2
$ :List of 2
..$ rainfall:'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
..$ yield :'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
$ :List of 2
..$ rainfall:'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
..$ yield :'data.frame': 5 obs. of 3 variables:
.. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
.. ..$ rainfall: num [1:5] 0 5 10 15 20
.. ..$ yield : num [1:5] 2000 3000 4000 5000 6000
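A minimal, self-contained sketch of the same call. It assumes a hypothetical mainlist whose inner lists also hold an extra data.frame named soil (not in the original), which we want to drop. One caveat: `[` with a name that is absent from an inner list returns a NULL element named NA, so intersecting with the available names may be safer when not every inner list has every name.

```r
# Hypothetical example data: one data.frame reused for every element
df <- data.frame(station  = paste0("MADA", 1:5),
                 rainfall = seq(0, 20, by = 5),
                 yield    = seq(2000, 6000, by = 1000))

# Each element of mainlist is a named list of data.frames;
# "soil" is a hypothetical element we want to drop
mainlist <- list(
  list(rainfall = df, yield = df, soil = df),
  list(rainfall = df, yield = df, soil = df)
)

# `[` is called once per inner list with the vector of names to keep
mainlist_new <- lapply(mainlist, `[`, c("rainfall", "yield"))
str(mainlist_new, max.level = 2)

# Safer variant when some inner lists may lack one of the names
wanted <- c("rainfall", "yield")
mainlist_safe <- lapply(mainlist, function(x) x[intersect(wanted, names(x))])
```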
How can I subset a list in R by extracting the elements that contain a string?
The grep should be applied to the names of the list, not to its values:
mylist_sub <- mylist[grep('pt', names(mylist))]
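A small sketch with a hypothetical mylist shows that only the names are searched: the element abc is dropped even though its value contains "pt". grepl() gives a logical index instead of positions and selects the same elements here.

```r
# Hypothetical named list; only the names are searched, not the values
mylist <- list(pt1 = 1:3, abc = "pt inside a value", xpt = TRUE, other = 4)

# grep() returns the positions of the names that contain "pt"
mylist_sub <- mylist[grep("pt", names(mylist))]
names(mylist_sub)
#> [1] "pt1" "xpt"

# grepl() returns a logical vector instead; the result is the same here
mylist_sub2 <- mylist[grepl("pt", names(mylist))]
```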
How do you subset data from a list in R?
We could either use lapply from base R:
library(rvest)  # html_nodes() and html_text(); rvest also re-exports %>%
out <- lapply(ret, function(x) x %>%
  html_nodes(".iCIMS_JobContent, #jobDescriptionText") %>%
  html_text())
or loop with map from purrr:
library(purrr)
out <- map(ret, ~ .x %>%
html_nodes(".iCIMS_JobContent, #jobDescriptionText") %>%
html_text())
NOTE: Both approaches loop over the elements of the list; x (in lapply) or .x (in map) is the individual element, passed to an anonymous function, i.e. a function created on the fly (function(x) in base R, or the ~ shorthand in the tidyverse).
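To see the anonymous-function pattern without rvest or network access, here is a sketch where ret is a hypothetical stand-in: a named list of character vectors instead of parsed HTML pages, with trimws() playing the role of the html_nodes()/html_text() pipeline. When the list element is the first argument of the function, the function can also be passed directly, without an anonymous wrapper. With purrr, map(ret, ~ trimws(.x)) returns the same named list.

```r
# Hypothetical stand-in for `ret`: character vectors instead of HTML pages
ret <- list(page1 = c("  Data analyst ", "R required"),
            page2 = c("Statistician  "))

# Anonymous function: `x` is one list element per iteration
out_anon <- lapply(ret, function(x) trimws(x))

# Equivalent: pass the function itself, since x is its first argument
out_named <- lapply(ret, trimws)

identical(out_anon, out_named)
#> [1] TRUE
```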
Subset Data Based On Elements In List
Classic lapply:
x <- lapply(variableData, function(x){subset(Data, Column_X == x)})
x
# [[1]]
# Data_x Data_y Column_X
# 1 -34 12 A
# 6 -35 24 A
#
# [[2]]
# Data_x Data_y Column_X
# 5 -34 10 B
# 7 -35 16 B
# 8 -33 22 B
It returns a list of all the subsets. To rbind all these list elements into one data frame:
do.call(rbind, x)
# Data_x Data_y Column_X
# 1 -34 12 A
# 6 -35 24 A
# 5 -34 10 B
# 7 -35 16 B
# 8 -33 22 B
However, as @Frank pointed out, you could use basic subsetting in your code:
Data[Data$Column_X %in% variableData,]
# Data_x Data_y Column_X
# 1 -34 12 A
# 5 -34 10 B
# 6 -35 24 A
# 7 -35 16 B
# 8 -33 22 B
"Warning: This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences." (?subset)
Furthermore, with this approach the original order of your rows is kept.
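The ordering difference can be checked with a small, hypothetical Data (the question's full data is not shown; rows 2 and 4 here carry a value "C" that is not in variableData). Following the ?subset advice, the lapply version below uses [ instead of subset(). Binding the per-value subsets groups the rows by the order of variableData, while the single %in% index keeps the original row order.

```r
# Hypothetical data: rows 2 and 4 have a value ("C") not in variableData
Data <- data.frame(Data_x   = c(-34, -33, -35, -36, -34),
                   Data_y   = c(12, 14, 10, 18, 24),
                   Column_X = c("A", "C", "B", "C", "A"))
variableData <- c("A", "B")

# lapply + do.call(rbind): rows come out grouped by variableData
by_group <- do.call(rbind,
                    lapply(variableData, function(v) Data[Data$Column_X == v, ]))

# %in%: one logical index, rows keep their original order
in_order <- Data[Data$Column_X %in% variableData, ]

rownames(by_group)
#> [1] "1" "5" "3"
rownames(in_order)
#> [1] "1" "3" "5"
```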
How to subset a list of a list based on the name
We can use the Extract operator [ with map from purrr:
library(purrr)
map(L, `[`, select_names)
#[[1]]
#[[1]]$A
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
#[[2]]
#[[2]]$A
# [,1] [,2]
#[1,] 5 7
#[2,] 6 8
Or using lapply
lapply(L, function(x) x[select_names])
Or without an anonymous function call:
lapply(L, `[`, select_names)
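A self-contained sketch with a hypothetical L that reproduces the output above (the extra element B is an assumption). It also contrasts `[`, which keeps each result as a sublist, with `[[`, which returns the element itself; `[[` only works with a single name.

```r
# Hypothetical list of lists, mirroring the output above
L <- list(list(A = matrix(1:4, 2), B = letters[1:3]),
          list(A = matrix(5:8, 2), B = letters[4:6]))
select_names <- "A"

# `[` keeps each result as a sublist: L_sub[[1]] is list(A = <matrix>)
L_sub <- lapply(L, `[`, select_names)

# `[[` extracts the element itself: L_mat[[1]] is the matrix
L_mat <- lapply(L, `[[`, "A")
```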
How to subset dataframe using list that includes partial strings of another variable
You were on the right track; grepl is your friend. To use all of the countries with it, paste them together into a single pattern, collapsing on an or (|). Then, using subset:
EU_p <- paste(EU, collapse='|')
subset(df, grepl(EU_p, a))
# a b
# 2 Croatia USA 2
# 4 Switzerland Hungary 4
# 5 Lithuania Indonesia 5
or, as you indicated, using brackets:
df[grepl(EU_p, df$a), ]
# a b
# 2 Croatia USA 2
# 4 Switzerland Hungary 4
# 5 Lithuania Indonesia 5
The result is every row of df whose column a contains at least one country from the EU vector, since the pattern as written doesn't distinguish where in the string the match occurs.
Data:
df <- structure(list(a = c("Albania Canada", "Croatia USA", "Mexico Egypt",
"Switzerland Hungary", "Lithuania Indonesia"), b = c(1, 2, 3,
4, 5)), class = "data.frame", row.names = c(NA, -5L))
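The EU vector itself is not shown, so the sketch below assumes a hypothetical one that reproduces the output above. Because the pattern matches anywhere, a country name could also match inside a longer word (for example "Niger" inside "Nigeria"); wrapping the alternation in word boundaries with perl = TRUE is one way to guard against that, and here it returns the same rows.

```r
df <- data.frame(a = c("Albania Canada", "Croatia USA", "Mexico Egypt",
                       "Switzerland Hungary", "Lithuania Indonesia"),
                 b = c(1, 2, 3, 4, 5))

# Hypothetical EU vector; the one from the question is not shown
EU <- c("Croatia", "Hungary", "Lithuania", "Germany")

# One alternation pattern: "Croatia|Hungary|Lithuania|Germany"
EU_p <- paste(EU, collapse = "|")
rows_any <- df[grepl(EU_p, df$a), ]

# Word boundaries stop a name from matching inside a longer word
EU_w <- paste0("\\b(", paste(EU, collapse = "|"), ")\\b")
rows_word <- df[grepl(EU_w, df$a, perl = TRUE), ]
```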
Create object with loop that subsets a list in R
I think I understand now; thank you for clarifying your question in the comments. If there's something I missed or you have any questions, please let me know.
Terminology, quickly
I believe you are interested in splitting a vector of strings into multiple shorter vectors of strings based on a pattern within each element. A list is, loosely speaking, a vector of vectors: a generic vector whose elements can themselves be vectors (or any other R object).
g is a vector of 20 string elements (see the Data code chunk below).
is.vector(g)
#> [1] TRUE
Here's a list that only contains one vector.
str(list(g))
#> List of 1
#> $ : chr [1:20] "New AS Plate 21_AS Plate_Sample 12_50.fcs" "New AS Plate 21_AS Plate_Sample 1_100.fcs" "New AS Plate 21_AS Plate_Sample 1_25.fcs" "New AS Plate 21_AS Plate_Sample 1_250.fcs" ...
Now onto the question...
In your question, you specifically ask about using assign(). Although using assign() can be convenient, [it is usually not recommended][1]. But sometimes you gotta do what you gotta do, no shame in that. Here's how you could use it manually, on one group at a time (like you show in your question).
# Using assign() one group at a time
h <- g[grep("Sample 1_", g)]
assign(x = "sample_1_group", value = h)
It is pretty straightforward (and seemingly logical) to use assign() in a for-loop.
The first step in defining a for-loop is deciding what the loop will "loop over"; in other words, what will change during each iteration of the loop. In your case, we are looking for a number that defines your groups. We can define a vector of those numbers manually or programmatically.
# Define groups manually
ids <- c(12,1,10,11)
ids
#> [1] 12 1 10 11
# Pattern match groups
all_ids <- gsub(pattern = ".*Sample (\\d+).*", replacement = "\\1", x = g)
all_ids
#> [1] "12" "1" "1" "1" "1" "1" "10" "10" "10" "10" "10" "11" "11" "11" "11"
#> [16] "11" "12" "12" "12" "12"
ids <- unique(all_ids)
ids
#> [1] "12" "1" "10" "11"
After we know what we are looping over, we can define the structure of the loop and the functions within it. paste0() can be a workhorse here. The loop below iterates over ids (one id at a time), finds the matching strings in g, and writes them to your environment as a vector. Because we are using assign(), we expect a new vector to appear in our environment after each iteration of the loop.
# For-loop with assign
for(i in ids){
a <- paste0("Sample ", i, "_")
h <- g[grep(a, g)]
h_name <- paste0("sample_", i, "_group")
assign(x = h_name, value = h)
}
That technically works, but it's not the best. You may find that it is actually more convenient to use lists (a vector of vectors) to store information from a for-loop. It's fast to program, you don't have a bunch of new objects crowding your workspace, and all the scary things (not really) in that link above won't be a problem. Here's how you could do that:
# Save the results of a for-loop in a list!
# First, make a blank list to hold the results
results <- list()
for(i in ids){
a <- paste0("Sample ", i, "_")
h <- g[grep(a, g)]
h_name <- paste0("sample_", i, "_group")
results[[h_name]] <- h
}
results
#> $sample_12_group
#> [1] "New AS Plate 21_AS Plate_Sample 12_50.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 12_100.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 12_25.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 12_250.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 12_500.fcs"
#>
#> $sample_1_group
#> [1] "New AS Plate 21_AS Plate_Sample 1_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 1_25.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 1_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 1_50.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 1_500.fcs"
#>
#> $sample_10_group
#> [1] "New AS Plate 21_AS Plate_Sample 10_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 10_25.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 10_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 10_50.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 10_500.fcs"
#>
#> $sample_11_group
#> [1] "New AS Plate 21_AS Plate_Sample 11_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 11_25.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 11_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 11_50.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 11_500.fcs"
Extra credit
For-loops are great: it's easy to see what's going on inside of them, it's easy to do a lot of data handling in them, and they are usually reasonably fast to execute. But sometimes it's all about speed. R is vectorized ([I'm honestly not exactly sure what this means][2] besides "it can do multiple calculations simultaneously"), but a for-loop doesn't take advantage of this very well. The apply() family of functions is a compact alternative that is usually easy to use wherever you might also use a for-loop (strictly speaking, lapply() still iterates over the elements one at a time, so any speed gain over a for-loop is often modest). Here's how you could do that with your data:
# With lapply() instead of a for-loop
lapply(ids, function(i) g[grep(paste0("Sample ", i, "_"), g)])
#> [[1]]
#> [1] "New AS Plate 21_AS Plate_Sample 12_50.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 12_100.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 12_25.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 12_250.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 12_500.fcs"
#>
#> [[2]]
#> [1] "New AS Plate 21_AS Plate_Sample 1_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 1_25.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 1_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 1_50.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 1_500.fcs"
#>
#> [[3]]
#> [1] "New AS Plate 21_AS Plate_Sample 10_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 10_25.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 10_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 10_50.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 10_500.fcs"
#>
#> [[4]]
#> [1] "New AS Plate 21_AS Plate_Sample 11_100.fcs"
#> [2] "New AS Plate 21_AS Plate_Sample 11_25.fcs"
#> [3] "New AS Plate 21_AS Plate_Sample 11_250.fcs"
#> [4] "New AS Plate 21_AS Plate_Sample 11_50.fcs"
#> [5] "New AS Plate 21_AS Plate_Sample 11_500.fcs"
Created on 2021-10-14 by the reprex package (v2.0.1)
Data:
g <- c("New AS Plate 21_AS Plate_Sample 12_50.fcs",
"New AS Plate 21_AS Plate_Sample 1_100.fcs",
"New AS Plate 21_AS Plate_Sample 1_25.fcs",
"New AS Plate 21_AS Plate_Sample 1_250.fcs",
"New AS Plate 21_AS Plate_Sample 1_50.fcs",
"New AS Plate 21_AS Plate_Sample 1_500.fcs",
"New AS Plate 21_AS Plate_Sample 10_100.fcs",
"New AS Plate 21_AS Plate_Sample 10_25.fcs",
"New AS Plate 21_AS Plate_Sample 10_250.fcs",
"New AS Plate 21_AS Plate_Sample 10_50.fcs",
"New AS Plate 21_AS Plate_Sample 10_500.fcs",
"New AS Plate 21_AS Plate_Sample 11_100.fcs",
"New AS Plate 21_AS Plate_Sample 11_25.fcs",
"New AS Plate 21_AS Plate_Sample 11_250.fcs",
"New AS Plate 21_AS Plate_Sample 11_50.fcs",
"New AS Plate 21_AS Plate_Sample 11_500.fcs",
"New AS Plate 21_AS Plate_Sample 12_100.fcs",
"New AS Plate 21_AS Plate_Sample 12_25.fcs",
"New AS Plate 21_AS Plate_Sample 12_250.fcs",
"New AS Plate 21_AS Plate_Sample 12_500.fcs")
[1]: Why is using assign bad?
[2]: How do I know a function or an operation in R is vectorized?
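Two further base R options, sketched on a shortened copy of g from the Data chunk above: split() (not used in the answer) builds the whole named list in one call from the extracted sample numbers, and setNames() gives the lapply() result the same names as the for-loop version. Note that split() orders the groups by their sorted labels, not by first appearance.

```r
# A shortened copy of the vector g from the Data chunk above
g <- c("New AS Plate 21_AS Plate_Sample 12_50.fcs",
       "New AS Plate 21_AS Plate_Sample 1_100.fcs",
       "New AS Plate 21_AS Plate_Sample 1_25.fcs",
       "New AS Plate 21_AS Plate_Sample 10_100.fcs",
       "New AS Plate 21_AS Plate_Sample 12_100.fcs")

# Sample number of each file name (same regex as in the answer)
all_ids <- gsub(pattern = ".*Sample (\\d+).*", replacement = "\\1", x = g)

# split() groups g by sample number; groups are ordered "1", "10", "12"
groups <- split(g, all_ids)
names(groups) <- paste0("sample_", names(groups), "_group")

# Or keep the lapply() version and name its result with setNames()
ids <- unique(all_ids)
results <- setNames(
  lapply(ids, function(i) g[grep(paste0("Sample ", i, "_"), g)]),
  paste0("sample_", ids, "_group")
)
```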
R - Subsetting list by parameters of nested list
We can use base R to do this; no packages are needed:
lapply(List_tot, `[`, c("a", "c", "d"))
or with an anonymous function:
lapply(List_tot, function(x) x[c("a", "c", "d")])
If we need only the top 2, order the elements by their number of rows (lengths works here because these are single-column matrices, so the number of rows equals the total number of elements), take the head of the names of the ordered vector, and use those names to extract the inner list elements:
lapply(List_tot, function(x) {
x1 <- x[c("a", "c", "d")]
v1 <- lengths(x1)
x1[head(names(v1)[order(-v1)], 2)]
})
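A self-contained sketch of the top-2 selection with a hypothetical List_tot (the original data is not shown). It swaps lengths() for vapply(..., nrow, integer(1)), which counts rows directly and therefore also works if the matrices have more than one column.

```r
# Hypothetical nested list: every inner element is a single-column matrix
List_tot <- list(
  list(a = matrix(1:3), b = matrix(1:9), c = matrix(1:5), d = matrix(1:2)),
  list(a = matrix(1:4), b = matrix(1:1), c = matrix(1:2), d = matrix(1:6))
)

top2 <- lapply(List_tot, function(x) {
  x1 <- x[c("a", "c", "d")]
  # Number of rows of each kept element, named by element
  v1 <- vapply(x1, nrow, integer(1))
  # Names of the two elements with the most rows, largest first
  x1[head(names(v1)[order(-v1)], 2)]
})
lapply(top2, names)
```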
Subsetting nested lists based on condition (values) in R
You could use Filter from base R:
Filter(function(x) sum(x$co) !=0, dummy_list)
Or you can use purrr:
library(tidyverse)
dummy_list %>%
keep( ~ sum(.$co) != 0)
Output
$`first-group`
$`first-group`$val
[1] 534 582 298 645 314 237 418 348 363 133 493 721 722 210 467 474 145 638 545 330 709 712 674 492 262 663 609 142 428 254
$`first-group`$co
[1] 0 0 1 1 0 1 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 1 1 1 1 0
$`third-group`
$`third-group`$val
[1] 713 721 683 526 699 555 563 672 619 603 588 533 622 724 616 644 730 716 660 663 611 669 644 664 679 514 579 525 533 541 530 564 584 673 592 726 548 563 727
[40] 646 708 557 586 592 693 620 548 705 510 677 539 603 726 525 597 563 712
$`third-group`$co
[1] 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
$`fourth-group`
$`fourth-group`$val
[1] 142 317 286 174 656 299 676 206 645 755 514 424 719 741 711 552 550 372 551 520 650 503 667 162 644 595 322 247
$`fourth-group`$co
[1] 0 0 0 0 1 0 1 1 1 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1
However, if you also want to exclude any group whose co is all 1s, you can add an extra condition:
Filter(function(x) sum(x$co) !=0 & sum(x$co == 0) > 0, dummy_list)
Or with purrr:
dummy_list %>%
keep( ~ sum(.$co) != 0 & sum(.$co == 0) > 0)
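A base R sketch on a small, hypothetical dummy_list (the values are made up, not the question's data). For 0/1 indicators, sum(co) != 0 is the same test as any(co != 0), and sum(co == 0) > 0 is the same as any(co == 0); the any() form reads a little more directly.

```r
# Hypothetical dummy_list with one mixed, one all-0 and one all-1 group
dummy_list <- list(
  `first-group`  = list(val = c(534, 582, 298), co = c(0, 1, 1)),
  `second-group` = list(val = c(120, 455, 310), co = c(0, 0, 0)),
  `third-group`  = list(val = c(713, 721, 683), co = c(1, 1, 1))
)

# Drop groups whose co is all 0
keep_nonzero <- Filter(function(x) any(x$co != 0), dummy_list)

# Also drop groups whose co is all 1
keep_mixed <- Filter(function(x) any(x$co != 0) && any(x$co == 0), dummy_list)

names(keep_mixed)
#> [1] "first-group"
```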