Need to Perform Wildcard (*, ?, etc.) Search on a String Using Regex

Regular expression wildcard

The wildcard * is equivalent to the Regex pattern ".*" (greedy) or ".*?" (not-greedy), so you'll want to perform a string.Replace():

string pattern = Regex.Escape(inputPattern).Replace("\\*", ".*?");

Note the Regex.Escape(inputPattern) at the beginning. Since inputPattern may contain characters that have a special meaning in a regex, you need to escape them. If you don't, the pattern can throw an ArgumentException (for example, on an unbalanced "(") or silently match text you didn't intend:

Regex.IsMatch(input, ".NET"); // may match ".NET", "aNET", "FNET", "7NET" and many more

Regex.Escape also escapes the wildcard itself: * becomes \* (written "\\*" as a C# string literal), which is why we replace the escaped wildcard rather than just the wildcard itself.
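The one-liner above only translates *, and the resulting pattern is not anchored, so Regex.IsMatch succeeds whenever the wildcard matches any substring of the input. Below is a minimal sketch of a fuller conversion, assuming you want whole-string matching and want ? to stand for exactly one character (as in file-name globs). The class name Wildcard and its methods are hypothetical, not part of .NET.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: converts a wildcard pattern into an anchored regex.
// Assumes '*' means "any run of characters" and '?' means "exactly one character".
public static class Wildcard
{
    public static string ToRegexPattern(string wildcard)
    {
        string body = Regex.Escape(wildcard)   // neutralise regex metacharacters first
            .Replace("\\*", ".*?")             // escaped '*' -> lazy "anything"
            .Replace("\\?", ".");              // escaped '?' -> any single character
        return "^" + body + "$";               // anchor so the whole input must match
    }

    public static bool IsMatch(string input, string wildcard) =>
        // Singleline lets '.' (and therefore '*') also match newline characters.
        Regex.IsMatch(input, ToRegexPattern(wildcard), RegexOptions.Singleline);
}
```

With these assumptions, "*.txt" matches "report.txt" but not "report.txt.bak" or "reportXtxt"; drop the ^ and $ if you really want a "contains" test.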



To use the pattern, you can do either:

Regex.IsMatch(input, pattern);

or

var regex = new Regex(pattern);
regex.IsMatch(input);


Difference between greedy and not-greedy

The difference is in how much the pattern will try to match.

Consider the following string: "hello (x+1)(x-1) world". You want to match the opening parenthesis ( and the closing parenthesis ) as well as anything in between.

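A greedy pattern such as \(.*\) produces a single match, "(x+1)(x-1)": .* consumes as much as it can, so the match extends to the last closing parenthesis. It basically matches the longest substring it can find.

A not-greedy pattern such as \(.*?\) produces two matches, "(x+1)" and "(x-1)": .*? stops at the first closing parenthesis it reaches. In other words: the shortest substrings possible.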
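The two behaviours can be checked directly with Regex.Matches. This is a small illustrative sketch (the GreedyDemo class is hypothetical) that collects every match of each pattern from the example string:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    const string Input = "hello (x+1)(x-1) world";

    // Greedy: .* runs to the end of the string, then backtracks to the LAST ')'.
    public static string[] Greedy() =>
        Regex.Matches(Input, @"\(.*\)").Cast<Match>().Select(m => m.Value).ToArray();

    // Not-greedy: .*? expands one character at a time and stops at the FIRST ')'.
    public static string[] Lazy() =>
        Regex.Matches(Input, @"\(.*?\)").Cast<Match>().Select(m => m.Value).ToArray();
}
```

Greedy() returns the single match "(x+1)(x-1)", while Lazy() returns "(x+1)" and "(x-1)".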

Matching strings with wildcard

You could use the VB.NET Like-Operator:

string text = "x is not the same as X and yz not the same as YZ";
bool contains = LikeOperator.LikeString(text,"*X*YZ*", Microsoft.VisualBasic.CompareMethod.Binary);

Use CompareMethod.Text if you want to ignore the case.

You need to add using Microsoft.VisualBasic.CompilerServices; and add a reference to the Microsoft.VisualBasic.dll.

Since it's part of the .NET Framework, there's no problem using this class from C#.
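A small sketch of the case-sensitivity switch, and of two more of Like's wildcards (? for exactly one character, # for exactly one digit), assuming the Microsoft.VisualBasic reference mentioned above is in place. The LikeDemo class is illustrative only:

```csharp
using Microsoft.VisualBasic;                  // CompareMethod
using Microsoft.VisualBasic.CompilerServices; // LikeOperator

public static class LikeDemo
{
    // Case-sensitive: characters are compared by their binary values.
    public static bool CaseSensitive(string text, string pattern) =>
        LikeOperator.LikeString(text, pattern, CompareMethod.Binary);

    // Case-insensitive: CompareMethod.Text ignores case.
    public static bool IgnoreCase(string text, string pattern) =>
        LikeOperator.LikeString(text, pattern, CompareMethod.Text);
}
```

For example, "abc" does not match "A*" with CompareMethod.Binary but does with CompareMethod.Text, and "file7.txt" matches "file#.txt" while "fileX.txt" does not.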

Wildcard search in C#

If you wish to evaluate each pattern in your document against the input string, you'll have to create a Regex for each pattern, as you mention. There's no shortcut.

I guess you're worried about performance. Are you sure it's actually a problem? If so, you should try to find a different approach altogether.

Are you going to match many input strings? In that case, you should keep your Regex objects (in a list, say) rather than creating them each time. A Regex can be reused.
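A minimal sketch of that reuse, assuming the * to .*? translation from the first answer and whole-string matching; WildcardSet and MatchesAny are hypothetical names. RegexOptions.Compiled trades slower construction for faster matching, which only pays off if each Regex is used many times:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical matcher: builds one Regex per wildcard pattern up front,
// then reuses those objects for every input string.
public sealed class WildcardSet
{
    private readonly List<Regex> _regexes;

    public WildcardSet(IEnumerable<string> wildcardPatterns)
    {
        _regexes = wildcardPatterns
            // Escape, translate '*', and anchor each pattern once.
            .Select(p => "^" + Regex.Escape(p).Replace("\\*", ".*?") + "$")
            .Select(p => new Regex(p, RegexOptions.Compiled))
            .ToList();
    }

    // True if the input matches at least one of the stored patterns.
    public bool MatchesAny(string input) => _regexes.Any(r => r.IsMatch(input));
}
```

Build the WildcardSet once (for example, when the document is loaded) and call MatchesAny for each input.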

Otherwise, I can see no big problem with your proposed approach.

Regular Expression Wildcard Matching

Unless you want some funny behaviour, I would recommend you use \w instead of . (the dot).

The dot matches whitespace and other non-word characters, which you might not want it to do.

So I would replace ? with \w and replace * with \w*.

Also, if you want * to match at least one character, replace it with \w+ instead. This would mean that ben* would match bend and bending but not ben; it's up to you and depends on your requirements.
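A sketch of the \w-based translation, with a hypothetical requireOne flag choosing between \w* and \w+. It escapes the input first, as in the Regex.Escape approach from the first answer, and anchors the result so the whole input must match; WordWildcard is an illustrative name:

```csharp
using System.Text.RegularExpressions;

public static class WordWildcard
{
    // '?' -> exactly one word character;
    // '*' -> zero or more word characters (one or more when requireOne is true).
    public static Regex Build(string wildcard, bool requireOne = false)
    {
        string star = requireOne ? "\\w+" : "\\w*";
        string body = Regex.Escape(wildcard)
            .Replace("\\*", star)
            .Replace("\\?", "\\w");
        return new Regex("^" + body + "$");
    }
}
```

With this version, ben* matches "ben", "bend" and "bending", but not "ben bend", because the space is not a word character.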

How to check if a string contains substring with wildcard? like abc*xyz

If the asterisk is the only wildcard character you wish to allow, you could replace all asterisks with .*? and use regular expressions:

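using System.Linq;
using System.Text.RegularExpressions;
var filter = "[quick*jumps*lazy dog]";
// Escape each literal segment, then rejoin the segments with a lazy "anything"
var parts = filter.Split('*').Select(s => Regex.Escape(s)).ToArray();
var regex = string.Join(".*?", parts);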

This produces the regex \[quick.*?jumps.*?lazy\ dog], which is suitable for testing whether an input contains the pattern.
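Wrapped into a reusable check, the idea might look like this (the ContainsWildcard class is hypothetical). Because the regex is deliberately left unanchored, IsMatch returns true when the pattern occurs anywhere in the text:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ContainsWildcard
{
    // True if `text` contains a substring matching `filter`,
    // where '*' in the filter stands for any (possibly empty) run of characters.
    public static bool Contains(string text, string filter)
    {
        var parts = filter.Split('*').Select(s => Regex.Escape(s)).ToArray();
        var regex = string.Join(".*?", parts);
        // No ^/$ anchors: a match anywhere in `text` is enough.
        // Singleline lets the gap between parts span newlines.
        return Regex.IsMatch(text, regex, RegexOptions.Singleline);
    }
}
```

For instance, "123abc456xyz789" contains abc*xyz, but "xyz then abc" does not, because the parts must appear in order.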
