Pattern matching using a wildcard
If you want to examine elements inside a dataframe, you should not be using ls(), which only looks at the names of objects in the current workspace (or, if used inside a function, in the current environment). Rownames or elements inside such objects are not visible to ls() (unless of course you add an environment argument to the ls(.)-call). Try using grep(), which is the workhorse function for pattern matching of character vectors:
result <- a[ grep("blue", a$x) , ] # Note need to use `a$` to get at the `x`
If you want to use subset(), then consider the closely related function grepl(), which returns a logical vector that can be used in the subset argument:
subset(a, grepl("blue", a$x))
x
2 blue1
3 blue2
Edit: Adding one "proper" use of glob2rx within subset():
result <- subset(a, grepl(glob2rx("blue*") , x) )
result
x
2 blue1
3 blue2
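For comparison, Python's standard fnmatch.translate plays a role similar to glob2rx: it turns a shell-style glob into a regular expression string. The sketch below is only an analogy and assumes a hypothetical column whose first value (red1) is invented; the other two values mirror the blue1/blue2 rows above.

```python
import fnmatch
import re

# Hypothetical stand-in for the column a$x; "red1" is made up for illustration.
x = ["red1", "blue1", "blue2"]

# fnmatch.translate converts the glob into a regex, much like glob2rx("blue*").
pattern = re.compile(fnmatch.translate("blue*"))

# Keep the 1-based row numbers whose value matches, mimicking subset()'s row labels.
rows = [i for i, value in enumerate(x, start=1) if pattern.match(value)]
print(rows)  # [2, 3]
```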
I don't think I actually understood glob2rx until I came back to this question. (I did understand the scoping issues that were at the root of the questioner's difficulties. Anybody reading this should now scroll down to Gavin's answer and upvote it.)
Pattern matching on a string that already has wildcards in it
As I wrote in my comment, for the most general case you'd have to build the minimal deterministic finite automata of the two expressions and compare them. That said, there may be a brute-force, poor man's solution to your question.
Based on your examples it sounds like you're interested in seeing if one of input/pattern matches all the strings generated by the other.
IsMatch("XYZ%", "?Y%") // returns true because ?Y% matches a superset of strings matched by "XYZ%"
IsMatch("%", "?Y%") // returns true because "%" matches a superset of "?Y%"
You can check whether input indeed matches a subset of the strings generated by pattern, as long as
- you really can limit yourselves to the % and ? operators as specified
- your input/pattern strings are reasonably short; more specifically, there are fewer than about 20 occurrences of % in either input or pattern.
The basic idea is that you generate a list of representative strings for input and match each one against pattern using your favorite regex engine. If all the representatives match, input matches a subset of pattern. This algorithm for IsSubset can be described as follows:
let c = some character not in `pattern` (lexically speaking)
let searchString = replace all occurrences of '?' in input with c
add searchString to setOfSearchStrings
foreach occurrence of '%' in input
    foreach str in setOfSearchStrings
        replace str with two strings - {str with c in place of '%', str without the '%'}
foreach str in setOfSearchStrings
    if str doesn't "regex" match with pattern
        return false
return true
For example, if input is ?X%YZ% and the pattern doesn't contain the character A, the list generated would be
AXYZ
AXYZA
AXAYZ
AXAYZA
It's easy to see that the number of strings in this list is 2^n where n is the number of '%' in input.
Also it's easy to swap the order of arguments and figure out the relationship the other way round. So in effect your
IsMatch(input,pattern) = IsSubset(input,pattern) || IsSubset(pattern,input)
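A minimal Python sketch of the brute-force IsSubset/IsMatch idea above, assuming % means any run of characters, ? means exactly one character, and every other character is literal. The helper names are hypothetical, and the filler c is simply the first character from A upward that does not occur in pattern.

```python
import itertools
import re


def wildcard_to_regex(pattern):
    """Compile a %/? wildcard pattern into a regex ('%' = any run, '?' = one char)."""
    parts = []
    for ch in pattern:
        if ch == '%':
            parts.append('.*')
        elif ch == '?':
            parts.append('.')
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile(''.join(parts), re.DOTALL)


def pick_filler(pattern):
    """Return the first character from 'A' upward that does not occur in pattern."""
    for code in itertools.count(ord('A')):
        if chr(code) not in pattern:
            return chr(code)


def representatives(input_, filler):
    """Each '?' becomes filler; each '%' becomes filler or nothing (2**n strings)."""
    pieces = input_.replace('?', filler).split('%')
    result = []
    for choice in itertools.product(('', filler), repeat=len(pieces) - 1):
        s = pieces[0]
        for sep, piece in zip(choice, pieces[1:]):
            s += sep + piece
        result.append(s)
    return result


def is_subset(input_, pattern):
    """True if pattern matches every representative string of input_."""
    regex = wildcard_to_regex(pattern)
    filler = pick_filler(pattern)
    return all(regex.fullmatch(s) for s in representatives(input_, filler))


def is_match(input_, pattern):
    return is_subset(input_, pattern) or is_subset(pattern, input_)


print(is_match("XYZ%", "?Y%"))  # True
print(is_match("%", "?Y%"))     # True
```

Calling representatives('?X%YZ%', 'A') reproduces the four strings listed above, in the same order.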
Java string matching with wildcards
Just use a converter from bash-style glob patterns to Java regex patterns:
import java.util.Arrays;
import java.util.List;

public class GlobMatch {
    public static void main(String[] args) {
        String patternString = createRegexFromGlob("abc*");
        List<String> list = Arrays.asList("abf", "abc_fgh", "abcgafa", "fgabcafa");
        // prints false, true, true, false
        list.forEach(it -> System.out.println(it.matches(patternString)));
    }

    // Converts a shell-style glob (* and ?) into an anchored Java regex.
    private static String createRegexFromGlob(String glob) {
        StringBuilder out = new StringBuilder("^");
        for (int i = 0; i < glob.length(); ++i) {
            final char c = glob.charAt(i);
            switch (c) {
                case '*': out.append(".*"); break; // any run of characters
                case '?': out.append('.'); break;  // exactly one character
                default:
                    // escape regex metacharacters so they match literally
                    if ("\\.[]{}()+^$|".indexOf(c) >= 0) {
                        out.append('\\');
                    }
                    out.append(c);
            }
        }
        out.append('$');
        return out.toString();
    }
}
See also:
- Is there an equivalent of java.util.regex for “glob” type patterns?
- Convert wildcard to a regex expression
How to use wildcard in string matching
You are going to want to look at the re module. It lets you write a regular expression that does the same job as * on the Linux command line: in a regex, .* matches any run of characters.
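A small sketch of that idea, assuming a hypothetical list of file names: the shell glob report*.txt becomes the regex report.*\.txt, where .* stands in for * and the dot is escaped so it matches literally. fullmatch is used so the whole name has to match, as it would for a shell glob.

```python
import re

# Hypothetical file names for illustration.
names = ["report.txt", "report_v2.txt", "summary.txt", "report.csv"]

# Regex equivalent of the glob  report*.txt
pattern = re.compile(r"report.*\.txt")

matches = [n for n in names if pattern.fullmatch(n)]
print(matches)  # ['report.txt', 'report_v2.txt']
```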
String Matching with wildcard in Python
The idea is to convert what you are looking for, ABCDEF in this case, into the following regular expression:
([A]|\.)([B]|\.)([C]|\.)([D]|\.)([E]|\.)([F]|\.)
Each character is placed in [] in case it turns out to be a regex special character. The main complication is a search character of ^, as in ABCDEF^: at the start of [] a ^ negates the class, so the ^ character is simply escaped instead and is therefore handled specially.
Then you compile that pattern and search the string with it:
import re
substring = 'ABCDEF'
large_string = 'QQQQQABC.EF^QQQQQ'
new_substring = re.sub(r'([^^])', r'([\1]|\\.)', substring)
new_substring = re.sub(r'\^', r'(\\^|\\.)', new_substring)
print(new_substring)
regex = re.compile(new_substring)
m = regex.search(large_string)
if m:
    print(m.span())
Prints:
([A]|\.)([B]|\.)([C]|\.)([D]|\.)([E]|\.)([F]|\.)
(5, 11)
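If you would rather not special-case ^ (a backslash in the search string would also break the [] trick), re.escape can quote each character instead. This is a variant sketch, not the answer's code, and wildcard_search is a hypothetical name.

```python
import re


def wildcard_search(substring, large_string):
    # Each search character matches either itself or a literal '.' in the text.
    # re.escape quotes every regex metacharacter, including '^' and backslash.
    pattern = ''.join('(?:' + re.escape(ch) + r'|\.)' for ch in substring)
    return re.search(pattern, large_string)


m = wildcard_search('ABCDEF', 'QQQQQABC.EF^QQQQQ')
if m:
    print(m.span())  # (5, 11)
```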
Matching strings with wildcard
You could use the VB.NET Like-Operator:
string text = "x is not the same as X and yz not the same as YZ";
bool contains = LikeOperator.LikeString(text,"*X*YZ*", Microsoft.VisualBasic.CompareMethod.Binary);
Use CompareMethod.Text if you want to ignore case.
You need to add using Microsoft.VisualBasic.CompilerServices; and a reference to Microsoft.VisualBasic.dll.
Since the class ships as part of the .NET Framework, it's not a problem to use it.
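If you are working in Python rather than .NET, the standard fnmatch module offers something comparable: fnmatchcase does case-sensitive glob matching, roughly like CompareMethod.Binary, and lower-casing both sides approximates CompareMethod.Text. Note that fnmatch understands *, ? and [...] but not the Like operator's # digit wildcard, so treat this as an analogy rather than a port of LikeString.

```python
from fnmatch import fnmatchcase

text = "x is not the same as X and yz not the same as YZ"

# Case-sensitive match, roughly like CompareMethod.Binary.
print(fnmatchcase(text, "*X*YZ*"))  # True
print(fnmatchcase(text, "*Yz*"))    # False: the text has "yz" and "YZ" but no "Yz"

# Case-insensitive match, approximating CompareMethod.Text by lower-casing both sides.
print(fnmatchcase(text.lower(), "*Yz*".lower()))  # True
```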
Python 3.10 pattern matching (PEP 634) - wildcard in string
You can use a guard:
for event in data:
    match event:
        case {'id': x} if x.startswith("matchme"):  # guard
            print(event["message"])
        case {'id': 'anotherid'}:
            print(event["message"])
Quoting from the official documentation:
Guard
We can add an if clause to a pattern, known as a “guard”. If the guard is false, match goes on to try the next case block. Note that value capture happens before the guard is evaluated:
match point:
    case Point(x, y) if x == y:
        print(f"The point is located on the diagonal Y=X at {x}.")
    case Point(x, y):
        print(f"Point is not on the diagonal.")
See also:
- PEP 622 - Guards
- PEP 636 - Adding conditions to patterns
- PEP 634 - Guards
How to match a wildcard for strings?
You are attempting to use wildcard syntax, but Groovy expects regular expression syntax for its pattern matching.
What went wrong with your attempt:
Attempt #1: p10.7.*
In a regular expression, . matches any single character and .* matches 0 or more characters. This means:
p10{exactly one character of any kind here}7{zero or more characters of any kind here}
You didn't realize it, but the . character in your first attempt was acting like a single-character wildcard too. This would match p10x7abcdefg, for example. It does also match p10.7.8, though. But be careful: it also matches p10.78, because the .* expression at the end of your pattern will happily match any sequence of characters, so any and all characters following p10.7 are accepted.
Attempt #2: p10_7_*
_ matches only a literal underscore. But _* means to match zero or more underscores; it does not mean to match characters of any kind. So p10_7_* matches things like p10_7_______. Literally:
p10_7{zero or more underscores here}
What you can do instead:
You probably want a regular expression like this one: p10_7_\d+
This will match things like p10_7_3 or p10_7_422. It works by matching the literal text p10_7_ followed by one or more digits, where a digit is 0 through 9. \d matches any digit, and + means to match one or more of the preceding thing. Literally:
p10_7_{one or more digits here}
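To see the three patterns side by side, here is a quick check with Python's re.fullmatch, which requires the entire string to match. This assumes your Groovy check is also a whole-string match (for example the ==~ operator); with a find-style check the results could differ.

```python
import re

cases = {
    r"p10.7.*": ["p10x7abcdefg", "p10.7.8", "p10.78"],
    r"p10_7_*": ["p10_7_______", "p10_7_3"],
    r"p10_7_\d+": ["p10_7_3", "p10_7_422", "p10_7_", "p10.7.8"],
}

for pattern, samples in cases.items():
    for s in samples:
        # fullmatch succeeds only if the pattern covers the entire string.
        print(f"{pattern!r:14} {s!r:16} {bool(re.fullmatch(pattern, s))}")
```

The first pattern accepts all three strings, including p10.78; the second accepts p10_7_______ but rejects p10_7_3; only the third accepts p10_7_3 and p10_7_422 while rejecting p10_7_ and p10.7.8.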