Subset with Pattern

It is possible to do this via

subset(df, select = grepl("1", names(df)))

To automate this as a function, one can use [ to do the subsetting. Couple that with one of R's regular expression functions and you have all you need.

By way of an example, here is a custom function implementing the ideas I mentioned above.

Subset <- function(df, pattern) {
  ind <- grepl(pattern, names(df))
  df[, ind]
}

Note that this does no error checking and simply relies on grepl to return a logical vector indicating which columns match pattern, which is then passed to [ to subset by columns. Applied to your df this gives:

> Subset(df, pattern = "1")
   a1 b1
1   1  2
2   2  3
3   3  4
4   4  5
5   5  6
6   6  7
7   7  8
8   8  9
9   9 10
10 10 11
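
As noted, the function above does no error checking. A minimal sketch of a slightly more defensive variant follows; the name subset_cols and the example data frame are assumptions for illustration, not part of the original answer. It also passes drop = FALSE, because df[, ind] returns a bare vector rather than a data frame when exactly one column matches.

```r
# Hypothetical variant of Subset() with basic input checks.
subset_cols <- function(df, pattern) {
  stopifnot(is.data.frame(df), is.character(pattern), length(pattern) == 1)
  ind <- grepl(pattern, names(df))
  # drop = FALSE keeps a data frame even when only one column matches
  df[, ind, drop = FALSE]
}

# Example data assumed to resemble the df used above (a2 is made up)
df <- data.frame(a1 = 1:10, b1 = 2:11, a2 = 3:12)

one_cols <- subset_cols(df, "1")   # columns whose names contain "1"
b_cols   <- subset_cols(df, "^b")  # a single matching column
names(one_cols)
is.data.frame(b_cols)
```

Whether to stop or return an empty data frame when nothing matches is a design choice; this sketch returns a data frame with zero columns in that case.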

Subset around values that follow a pattern

String equality has no wildcard characters, so you need a function instead. You could use substr() to extract the first three characters:

test <- subset(data, State == "AL" & substr(JobCode,1,3) == ("15-"))

Also note that you don't need to use data$ inside the subset() call. Variables are evaluated in the context of the data frame for that function.
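
For comparison, here is a small sketch of the same prefix test written three ways. The data frame is made up; only the column names State and JobCode come from the answer above.

```r
# Made-up data; State and JobCode mirror the column names used above
data <- data.frame(
  State   = c("AL", "AL", "GA", "AL"),
  JobCode = c("15-1132", "29-1141", "15-1121", "15-2041"),
  stringsAsFactors = FALSE
)

# Fixed-position comparison, as in the answer
by_substr <- subset(data, State == "AL" & substr(JobCode, 1, 3) == "15-")

# Anchored regular expression: "^" ties the match to the start of the string
by_regex <- subset(data, State == "AL" & grepl("^15-", JobCode))

# Literal prefix test, no regular expression involved
by_prefix <- subset(data, State == "AL" & startsWith(JobCode, "15-"))

by_substr$JobCode
```

All three should select the same rows here. startsWith() needs character input, which is why the sketch sets stringsAsFactors = FALSE.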

How would I create a subset by matching multiple patterns at a specific location in column names?

We could use a combination of str_locate and which to select columns. If you have several search terms, they can be collapsed into a single regular expression with paste0. Then we can keep only the columns whose match starts at position 11 and ends at position 12.

library(tidyverse)

key_chr <- c("JG", "HB", "KU")
search_terms <- paste0(key_chr, collapse = "|")

df %>% 
  select(which(str_locate(names(df), search_terms)[,1] == 11 & str_locate(names(df), search_terms)[,2] == 12))

Or in base R, we could write it as:

df <- df[, which(regexpr(search_terms, names(df)) == 11)]

Output

           TCGA.OR.A5JG.01A TCGA.PK.A5HB.01A TCGA.OR.A5KU.01A
cg00000029        0.9091428        0.8603163       0.08972934
cg00000108               NA               NA               NA
cg00000109               NA               NA               NA
cg00000165        0.8705515        0.2839199       0.16676025
cg00000236        0.9170243        0.9235076       0.92036744
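
Because regexpr and str_locate report only the first match in each name, a column name that also contains one of the keys earlier in the string would be skipped by the position test. A possible alternative, sketched here with made-up column names in the same layout, is to compare characters 11 and 12 directly with substr.

```r
# Made-up columns following the naming layout shown above
df <- data.frame(
  TCGA.OR.A5JG.01A = c(0.91, 0.87),
  TCGA.OR.A5XX.01A = c(0.10, 0.20),
  TCGA.PK.A5HB.01A = c(0.86, 0.28),
  TCGA.HB.A5KU.01A = c(0.50, 0.60)  # "HB" at position 6, "KU" at position 11
)
key_chr <- c("JG", "HB", "KU")

# Compare characters 11 and 12 of each column name against the keys
keep <- substr(names(df), 11, 12) %in% key_chr
by_position <- df[, keep, drop = FALSE]

# The first-match approach, for comparison
search_terms <- paste0(key_chr, collapse = "|")
by_regexpr <- df[, which(regexpr(search_terms, names(df)) == 11), drop = FALSE]

names(by_position)
names(by_regexpr)
```

In this sketch the last column is kept by the substr test but not by the regexpr test, since the first match in that name starts at position 6.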

subsetting a data frame according to specific pattern

You can also use subset() here, e.g.,

res <- subset(fru,grepl("a",rownames(fru)))

Subset pattern implementation

The instructions provided seem to lend themselves more to a C++ style than a C# style. I believe there are better ways than manually building arrays to get a list of subsets in C#. That said, here's how I would go about implementing the instructions as they are written.

To avoid having to repeatedly grow the array of subsets, we should calculate its length before we allocate it.

Assuming n elements in the input, we can determine the number of possible subsets by adding:

All subsets with 0 elements (the empty set)
All subsets with 1 element
All subsets with 2 elements
...
All subsets with n-1 elements
All subsets with n elements (the set itself)

Mathematically, this is the sum of the binomial coefficients: the sum from k = 0 to n of n choose k, which evaluates to 2^n.

The jagged array should then contain 2^n arrays whose length will vary from 0 to n.
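
Written out, the count used above is the standard binomial identity. For the three-element input below it gives eight subsets, which matches the size of the jagged array.

```latex
\[
\sum_{k=0}^{n} \binom{n}{k} = 2^{n},
\qquad \text{e.g. } n = 3:\quad
\binom{3}{0} + \binom{3}{1} + \binom{3}{2} + \binom{3}{3}
  = 1 + 3 + 3 + 1 = 8 .
\]
```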

var input = new int[] { 1, 3, 5 };

var numberOfSubsets = (int)Math.Pow(2, input.Length);

var subsets = new int[numberOfSubsets][];

As the instructions in your article state, we start by adding the empty set to our list of subsets.

int nextEmptyIndex = 0;

subsets[nextEmptyIndex++] = new int[0];

Then, for each element in our input, we record the end of the existing subsets (so we don't end up in an infinite loop chasing the new subsets we will be adding) and add the new subset(s).

foreach (int element in input)
{
    int stopIndex = nextEmptyIndex - 1;

    // Build a new subset by adding the new element
    // to the end of each existing subset.
    for (int i = 0; i <= stopIndex; i++)
    {
        int newSubsetLength = subsets[i].Length + 1;
        int newSubsetIndex = nextEmptyIndex++;
        
        // Allocate the new subset array.
        subsets[newSubsetIndex] = new int[newSubsetLength];

        // Copy the elements from the existing subset.
        Array.Copy(subsets[i], subsets[newSubsetIndex], subsets[i].Length);
        
        // Add the new element at the end of the new subset.
        subsets[newSubsetIndex][newSubsetLength - 1] = element;
    }

}

With some logging at the end, we can see our result:

for (int i = 0; i < subsets.Length; i++)
{
    Console.WriteLine($"subsets[{ i }] = { string.Join(", ", subsets[i]) }");
}

subsets[0] = 
subsets[1] = 1
subsets[2] = 3
subsets[3] = 1, 3
subsets[4] = 5
subsets[5] = 1, 5
subsets[6] = 3, 5
subsets[7] = 1, 3, 5

Subsetting rows of data frame by character patterns (grepl) in a for loop

You can try this:

df[!df$organism %in% c("bat","virus","pangolian"),]

  organism size
1    human    6
2      cat    4
3     bird    2

Update: Based on the new data, here is an approach using grepl(). Collapsing the terms into a single pattern avoids the loop:

#Vectors
vectors<-c("bat","virus","pangolian")
#Format
vectors2 <- paste0(vectors,collapse = '|')
#Avoid loop
df[!grepl(pattern = vectors2,df$organism),]

        organism size
1 human_longname    6
2   cat_longname    4
3  bird_longname    2
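
One thing to keep in mind is that grepl matches anywhere in the string, so a short term can hit unintended rows. The sketch below uses made-up data (including a wombat row that is not in the original question) to show the difference between the unanchored pattern and one anchored to the start of the string. Whether anchoring is appropriate depends on how the real organism names are formed.

```r
# Made-up data; "wombat_longname" is added only to illustrate the pitfall
df <- data.frame(
  organism = c("human_longname", "bat_longname",
               "wombat_longname", "virus_longname"),
  size = c(6, 1, 3, 0.1),
  stringsAsFactors = FALSE
)
vectors <- c("bat", "virus", "pangolian")

# Unanchored alternation: "bat" also matches inside "wombat_longname"
loose <- paste0(vectors, collapse = "|")
kept_loose <- df[!grepl(loose, df$organism), ]

# Anchored alternation: a term only counts at the start of the string
strict <- paste0("^(", paste0(vectors, collapse = "|"), ")")
kept_strict <- df[!grepl(strict, df$organism), ]

kept_loose$organism
kept_strict$organism
```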

Also, just for the curious, here is a loop (probably not optimal) that does the same task by building an index and then creating a new data frame:

#Create index
index <- c()
#Loop over the rows, keeping those that do not match
for(i in seq_len(nrow(df)))
{
  if(!grepl(vectors2,df$organism[i]))
  {
    index <- c(index,i)
  }
}
#Subset once, after the loop
ndf <- df[index,]

ndf

        organism size
1 human_longname    6
2   cat_longname    4
3  bird_longname    2

Subset a list within a list by column names that start with all requested patterns

Try this:

new_list  <- lapply(L, \(x) x[
    all(
        any(grepl("^A", names(x))),
        any(grepl("^B", names(x)))
    )
]
)

This will return an empty list in place of L1, and the contents of L2 to L4.

If you don't want an empty list for L1 you can subset it again:

new_list[sapply(new_list, length)>0]
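
If the empty elements are never needed, the two steps can probably be merged with Filter(). This is a sketch on a made-up list of data frames; the helper name has_all_prefixes is hypothetical and the real L may be structured differently.

```r
# Made-up nested list: each element is a data frame
L <- list(
  L1 = data.frame(A1 = 1:2, C1 = 3:4),
  L2 = data.frame(A1 = 1:2, B1 = 3:4),
  L3 = data.frame(B2 = 1:2, A2 = 3:4, C2 = 5:6)
)

# TRUE only if every prefix starts at least one column name
has_all_prefixes <- function(x, prefixes = c("A", "B")) {
  all(vapply(prefixes,
             function(p) any(startsWith(names(x), p)),
             logical(1)))
}

# Filter() keeps the elements for which the predicate is TRUE
kept <- Filter(has_all_prefixes, L)
names(kept)
```

Here L1 is dropped because it has no column starting with "B", while L2 and L3 are kept unchanged.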

Extract (subset) two words at the same time in any order with stringr in R

Using str_subset, the regex specifies text-1 followed by any characters (.*) and then text-2, or (|) the same two in the reverse order:

library(stringr)
str_subset(texts, 'text-1.*text-2|text-2.*text-1')
[1] "I-have-text-1-and-text-2" "I-have-text-2-and-text-1"
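
Writing out both orders gets unwieldy with more than two terms. A possible base R sketch is to test each term separately and combine the results with a logical AND. The texts vector here is made up to resemble the output above, and fixed = TRUE treats each term as a literal string rather than a regex.

```r
# Made-up input resembling the strings in the output above
texts <- c("I-have-text-1-and-text-2", "I-have-text-2-and-text-1",
           "I-have-text-1-only", "I-have-text-2-only")
terms <- c("text-1", "text-2")

# One logical vector per term, combined with elementwise AND
hits <- Reduce(`&`, lapply(terms, function(p) grepl(p, texts, fixed = TRUE)))
both <- texts[hits]
both
```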
