subset with pattern
It is possible to do this via
subset(df, select = grepl("1", names(df)))
For automating this as a function, one can use use [
to do the subsetting. Couple that with one of R's regular expression functions and you have all you need.
By way of an example, here is a custom function implementing the ideas I mentioned above.
Subset <- function(df, pattern) {
ind <- grepl(pattern, names(df))
df[, ind]
}
Note this does not error checking etc and just relies upon grepl
to return a logical vector indicating which columns match pattern
, which is then passed to [
to subset by columns. Applied to your df
this gives:
> Subset(df, pattern = "1")
a1 b1
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
6 6 7
7 7 8
8 8 9
9 9 10
10 10 11
Subset around values that follow a pattern
There are no wild card characters in string equality. You need to use a function. You could use substr()
to extract the first three charcters
test <- subset(data, State == "AL" & substr(JobCode,1,3) == ("15-"))
Also note that you don't need to use data$
inside the subset()
parameter. Variables are evaulated in the context of the data frame for that function.
How would I create a subset by matching multiple patterns at a specific location in column names?
We could use a combination of str_locate
and which
to select columns. If you have a list of search terms, then those can be collapsed into one list with paste0
. Then, we can locate the search terms at particular positions (i.e., 11
and 12
), and select those columns.
library(tidyverse)
key_chr <- c("JG", "HB", "KU")
search_terms <- paste0(key_chr, collapse = "|")
df %>%
select(which(str_locate(names(df), search_terms)[,1] == 11 & str_locate(names(df), search_terms)[,2] == 12))
Or in base R, we could write it as:
df <- df[, which(regexpr(search_terms, names(df)) == 11)]
Output
TCGA.OR.A5JG.01A TCGA.PK.A5HB.01A TCGA.OR.A5KU.01A
cg00000029 0.9091428 0.8603163 0.08972934
cg00000108 NA NA NA
cg00000109 NA NA NA
cg00000165 0.8705515 0.2839199 0.16676025
cg00000236 0.9170243 0.9235076 0.92036744
subsetting a data frame according to specific pattern
Of course you are able to use subset
, i.e.,
res <- subset(fru,grepl("a",rownames(fru)))
Subset pattern implementation
The instructions provided seem to lend themselves more to a c++
style than a C#
style. I believe there are better ways than manually building arrays to get a list of subsets in C#
. That said, here's how I would go about implementing the instructions as they are written.
To avoid having to repeatedly grow the array of subsets, we should calculate its length before we allocate it.
Assuming n
elements in the input, we can determine the number of possible subsets by adding:
- All subsets with
0
elements (the empty set) - All subsets with
1
element - All subsets with
2
elements - ...
- All subsets with
n-1
elements - All subsets with
n
elements (the set itself)
Mathematically, this is the summation of the binomial coefficient. We take the sum from 0
to n
of n
choose k
which evaluates to 2^n
.
The jagged array should then contain 2^n
arrays whose length will vary from 0
to n
.
var input = new int[] { 1, 3, 5 };
var numberOfSubsets = (int)Math.Pow(2, input.Length);
var subsets = new int[numberOfSubsets][];
As the instructions in your article state, we start by adding the empty set to our list of subsets.
int nextEmptyIndex = 0;
subsets[nextEmptyIndex++] = new int[0];
Then, for each element in our input, we record the end of the existing subsets (so we don't end up in an infinite loop chasing the new subsets we will be adding) and add the new subset(s).
foreach (int element in input)
{
int stopIndex = nextEmptyIndex - 1;
// Build a new subset by adding the new element
// to the end of each existing subset.
for (int i = 0; i <= stopIndex; i++)
{
int newSubsetLength = subsets[i].Length + 1;
int newSubsetIndex = nextEmptyIndex++;
// Allocate the new subset array.
subsets[newSubsetIndex] = new int[newSubsetLength];
// Copy the elements from the existing subset.
Array.Copy(subsets[i], subsets[newSubsetIndex], subsets[i].Length);
// Add the new element at the end of the new subset.
subsets[newSubsetIndex][newSubsetLength - 1] = element;
}
}
With some logging at the end, we can see our result:
for (int i = 0; i < subsets.Length; i++)
{
Console.WriteLine($"subsets[{ i }] = { string.Join(", ", subsets[i]) }");
}
subsets[0] =
subsets[1] = 1
subsets[2] = 3
subsets[3] = 1, 3
subsets[4] = 5
subsets[5] = 1, 5
subsets[6] = 3, 5
subsets[7] = 1, 3, 5
Try it out!
Subsetting rows of data frame by charater patterns (grepl) in a for loop
You can try this:
df[!df$organism %in% c("bat","virus","pangolian"),]
organism size
1 human 6
2 cat 4
3 bird 2
Update: Based on new data, here an approach using grepl()
. These functions can be used to avoid loops:
#Vectors
vectors<-c("bat","virus","pangolian")
#Format
vectors2 <- paste0(vectors,collapse = '|')
#Avoid loop
df[!grepl(pattern = vectors2,df$organism),]
organism size
1 human_longname 6
2 cat_longname 4
3 bird_longname 2
Also just for curious, here maybe a not optimal loop to do the same task creating a new dataframe and an index:
#Create index
index <- c()
#Loop
for(i in 1:dim(df)[1])
{
if(grepl(vectors2,df$organism[i])==F)
{
index <- c(index,i)
}
ndf <- df[index,]
}
ndf
organism size
1 human_longname 6
2 cat_longname 4
3 bird_longname 2
subset a list within a list by column's name start with all requested pattern
Try this:
new_list <- lapply(L, \(x) x[
all(
any(grepl("^A", names(x))),
any(grepl("^B", names(x)))
)
]
)
This will return an empty list in place of L1
, and the contents of L2
to L4
.
If you don't want an empty list for L1
you can subset it again:
new_list[sapply(new_list, length)>0]
Extract(subset) two ords same time in any order with stringr in R
Using str_subset
- regex
would be to specify text-1
followed by characters (.*
) and then text-2
or (|
) in the reverse way
library(stringr)
str_subset(texts, 'text-1.*text-2|text-2.*text-1')
[1] "I-have-text-1-and-text-2" "I-have-text-2-and-text-1"
Related Topics
Plot Line on Top of Stacked Bar Chart in Ggplot2
Scale Back Linear Regression Coefficients in R from Scaled and Centered Data
Percentage of Overlap Between Polygons
Remove Zombie Processes Using Parallel Package
Installing R Studio with Anaconda
Build Word Co-Occurence Edge List in R
Why Does ".." Work to Pass Column Names in a Character Vector Variable
Can You More Clearly Explain Lazy Evaluation in R Function Operators
Ggplot2: Dashed Line in Legend
Use Loop to Split a List into Multiple Dataframes
Compute Only Diagonals of Matrix Multiplication in R
How to Use an R Script from Github
Use Object Names as List Names in R
Changing the Appearance of Facet Labels Size
R Calculate the Average of One Column Corresponding to Each Bin of Another Column
How to Plot a Boxplot from Previously-Calculated Statistics Easily (In R)
R: Row-Wise Dplyr::Mutate Using Function That Takes a Data Frame Row and Returns an Integer