Pattern Matching Using a Wildcard

To examine elements inside a data frame, do not use ls(): it only lists the names of objects in the current workspace (or, when called inside a function, in that function's environment). Row names and elements inside those objects are not visible to ls() (unless, of course, you add an environment argument to the ls() call). Use grep() instead, which is the workhorse function for pattern matching on character vectors:

result <- a[ grep("blue", a$x) , ]  # Note need to use `a$` to get at the `x`

If you want to use subset(), consider the closely related function grepl(), which returns a logical vector that can be used as the subset argument:

subset(a, grepl("blue", a$x))
x
2 blue1
3 blue2

Edit: Adding one "proper" use of glob2rx within subset():

result <- subset(a,  grepl(glob2rx("blue*") , x) )
result
x
2 blue1
3 blue2

I don't think I actually understood glob2rx until I came back to this question. (I did understand the scoping issues that were at the root of the questioner's difficulties. Anybody reading this should now scroll down to Gavin's answer and upvote it.)

Pattern matching on a string that already has wildcards in it

As I wrote in my comment, for the most general cases you would have to build the minimal deterministic finite automaton for each of the two expressions and compare the two automata. Having said that, there may be a brute-force (poor man's) solution to your question.

Based on your examples, it sounds like you want to know whether one of input/pattern matches all the strings matched by the other.

IsMatch("XYZ%", "?Y%") // returns true because ?Y% matches a superset of strings matched by "XYZ%"
IsMatch("%", "?Y%") // returns true because "%" matches a superset of "?Y%"

You can check whether input matches a subset of the strings matched by pattern, as long as

  • you really can limit yourself to the % and ? operators as specified
  • your input/pattern strings are reasonably short; more specifically, the number of occurrences of % in either input or pattern is less than about 20.

The basic idea is to generate a list of representative strings for input and match each one against pattern using your favorite regex engine. If all the representatives match, input matches a subset of pattern. This algorithm for IsSubset can be described as follows:

let c = some character not in `pattern` (lexically speaking)
let searchString = replace all occurrences of '?' in input with c
add searchString to setOfSearchStrings
foreach occurrence of '%' in input
    foreach str in setOfSearchStrings
        replace str with two strings - {str with c in place of '%', str without the '%'}

foreach str in setOfSearchStrings
    if str doesn't "regex" match with pattern
        return false

return true

For example, if input is ?X%YZ% and the pattern doesn't contain the character A, the list generated would be

AXYZ

AXYZA

AXAYZ

AXAYZA

It's easy to see that the number of strings in this list is 2^n where n is the number of '%' in input.

It's also easy to swap the order of the arguments and check the relationship the other way round. So, in effect, your

IsMatch(input,pattern) = IsSubset(input,pattern) || IsSubset(pattern,input)
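The pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function names (`wildcard_to_regex`, `representatives`, `is_subset`, `is_match`) are hypothetical, `%` is assumed to mean "any run of characters" and `?` "exactly one character", and the filler character is chosen to be absent from both arguments for simplicity.

```python
import re
from itertools import product


def wildcard_to_regex(pattern):
    """Translate '%' to '.*' and '?' to '.', escaping everything else."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def representatives(input_pattern, filler):
    """Yield the 2^n representative strings for an input with n '%' signs."""
    pieces = input_pattern.replace("?", filler).split("%")
    # Each '%' is replaced either by nothing or by one filler character.
    for choice in product(("", filler), repeat=len(pieces) - 1):
        yield pieces[0] + "".join(c + p for c, p in zip(choice, pieces[1:]))


def is_subset(input_pattern, pattern):
    """True if every representative of input_pattern matches pattern."""
    used = set(input_pattern + pattern)
    # Pick a filler that appears in neither argument ('%' and '?' sort below 'A').
    filler = next(chr(i) for i in range(ord("A"), 0x110000) if chr(i) not in used)
    regex = wildcard_to_regex(pattern)
    return all(regex.fullmatch(s) for s in representatives(input_pattern, filler))


def is_match(input_pattern, pattern):
    """IsMatch as defined above: a subset relationship in either direction."""
    return is_subset(input_pattern, pattern) or is_subset(pattern, input_pattern)


print(list(representatives("?X%YZ%", "A")))  # ['AXYZ', 'AXYZA', 'AXAYZ', 'AXAYZA']
print(is_match("XYZ%", "?Y%"))               # True
print(is_match("%", "?Y%"))                  # True
```

Because the number of representatives doubles with every `%`, the generator keeps memory use flat, but the running time is still exponential in the number of `%` signs in the input.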

Java string matching with wildcards

Just convert the bash-style (glob) pattern to a Java regex pattern:

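import java.util.Arrays;
import java.util.List;

// The class name is arbitrary; it only makes the snippet compilable.
public class GlobDemo {
    public static void main(String[] args) {
        String patternString = createRegexFromGlob("abc*");
        List<String> list = Arrays.asList("abf", "abc_fgh", "abcgafa", "fgabcafa");
        // Prints false, true, true, false
        list.forEach(it -> System.out.println(it.matches(patternString)));
    }

    // Translates a glob into an anchored regex: '*' becomes ".*", '?' becomes ".".
    // Only '.' and '\' are escaped; other regex metacharacters pass through as-is.
    private static String createRegexFromGlob(String glob) {
        StringBuilder out = new StringBuilder("^");
        for (int i = 0; i < glob.length(); ++i) {
            final char c = glob.charAt(i);
            switch (c) {
                case '*': out.append(".*"); break;
                case '?': out.append('.'); break;
                case '.': out.append("\\."); break;
                case '\\': out.append("\\\\"); break;
                default: out.append(c);
            }
        }
        out.append('$');
        return out.toString();
    }
}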

Is there an equivalent of java.util.regex for “glob” type patterns?

Convert wildcard to a regex expression

How to use wildcard in string matching

You will want to look at Python's re module. It lets you write a regular expression that accomplishes the same thing as * does on the Linux command line.
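As a rough illustration, the shell pattern `report*` corresponds to the regular expression `report.*`. The sketch below uses made-up file names and also shows the standard library's fnmatch module, which accepts shell-style wildcards directly; whether that fits the original question is an assumption.

```python
import fnmatch
import re

# Hypothetical sample data for illustration only.
names = ["report.txt", "report.csv", "notes.txt", "summary_report.txt"]

# Regex form: '.' matches any character and '*' repeats it zero or more times.
regex = re.compile(r"report.*")
by_regex = [n for n in names if regex.fullmatch(n)]

# Glob form: fnmatchcase does a case-sensitive shell-style match.
by_glob = [n for n in names if fnmatch.fnmatchcase(n, "report*")]

print(by_regex)  # ['report.txt', 'report.csv']
print(by_glob)   # ['report.txt', 'report.csv']
```

Note that `re.fullmatch` requires the whole string to match, which is how a shell glob behaves; `re.search` would also accept `summary_report.txt`.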

String Matching with wildcard in Python

The idea is to convert what you are looking for, ABCDEF in this case, into the following regular expression:

([A]|\.)([B]|\.)([C]|\.)([D]|\.)([E]|\.)([F]|\.)

Each character is placed in [] in case it turns out to be a regex special character. The only complication is if one of the search characters is ^, as in ABCDEF^. The ^ character should just be escaped and is therefore handled specially.

Then you search the string for that pattern using re.search:

import re

substring = 'ABCDEF'
large_string = 'QQQQQABC.EF^QQQQQ'

new_substring = re.sub(r'([^^])', r'([\1]|\\.)', substring)
new_substring = re.sub(r'\^', r'(\\^|\\.)', new_substring)
print(new_substring)
regex = re.compile(new_substring)
m = regex.search(large_string)
if m:
    print(m.span())

Prints:

([A]|\.)([B]|\.)([C]|\.)([D]|\.)([E]|\.)([F]|\.)
(5, 11)

Matching strings with wildcard

You could use the VB.NET Like operator:

string text = "x is not the same as X and yz not the same as YZ";
bool contains = LikeOperator.LikeString(text,"*X*YZ*", Microsoft.VisualBasic.CompareMethod.Binary);

Use CompareMethod.Text if you want to ignore the case.

You need to add using Microsoft.VisualBasic.CompilerServices; and add a reference to the Microsoft.VisualBasic.dll.

Since it's part of the .NET framework and will always be, it's not a problem to use this class.

Python 3.10 pattern matching (PEP 634) - wildcard in string

You can use a guard:

for event in data:
    match event:
        case {'id': x} if x.startswith("matchme"): # guard
            print(event["message"])
        case {'id':'anotherid'}:
            print(event["message"])

Quoting from the official documentation,

Guard

We can add an if clause to a pattern, known as a “guard”. If the guard is false, match goes on to try the next case block. Note that value capture happens before the guard is evaluated:

match point:
    case Point(x, y) if x == y:
        print(f"The point is located on the diagonal Y=X at {x}.")
    case Point(x, y):
        print(f"Point is not on the diagonal.")

See also:

  • PEP 622 - Guards
  • PEP 636 - Adding conditions to patterns
  • PEP 634 - Guards

How to match a wildcard for strings?

You are attempting to use wildcard syntax, but Groovy expects regular expression syntax for its pattern matching.

What went wrong with your attempt:

Attempt #1: p10.7.*

A regular expression of . matches any single character and .* matches 0 or more characters. This means:

p10{exactly one character of any kind here}7{zero or more characters of any kind here}

You didn't realize it, but the . character in your first attempt was acting as a single-character wildcard too. The pattern matches p10x7abcdefg, for example. It does also match p10.7.8. But be careful: it matches p10.78 as well, because the .* at the end of your pattern will happily match any sequence of characters, so any and all characters following p10.7 are accepted.

Attempt #2: p10_7_*

_ matches only a literal underscore. But _* means to match zero or more underscores. It does not mean to match characters of any kind. So p10_7_* matches things like p10_7_______. Literally:

p10_7{zero or more underscores here}

What you can do instead:

You probably want a regular expression like p10_7_\d+

This will match things like p10_7_3 or p10_7_422. It works by matching the literal text p10_7_ followed by one or more digits where a digit is 0 through 9. \d matches any digit, and + means to match one or more of the preceding thing. Literally:

p10_7_{one or more digits here}
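The three patterns behave the same way in most regex engines, so they can be checked quickly outside Groovy. The sketch below uses Python's re module rather than Groovy, and assumes whole-string matching (`re.fullmatch`); the helper name `matches` is hypothetical.

```python
import re


def matches(pattern, text):
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# Attempt #1: '.' is a single-character wildcard, '.*' accepts anything.
print(matches(r"p10.7.*", "p10x7abcdefg"))  # True
print(matches(r"p10.7.*", "p10.7.8"))       # True
print(matches(r"p10.7.*", "p10.78"))        # True

# Attempt #2: '_*' only repeats the underscore.
print(matches(r"p10_7_*", "p10_7_______"))  # True
print(matches(r"p10_7_*", "p10_7_3"))       # False

# Suggested pattern: literal prefix followed by one or more digits.
print(matches(r"p10_7_\d+", "p10_7_3"))     # True
print(matches(r"p10_7_\d+", "p10_7_422"))   # True
print(matches(r"p10_7_\d+", "p10_7_"))      # False
```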


