Why \b does not match word using .net regex
C# and .NET regular expressions each have their own set of backslash escape sequences. The C# compiler processes the "\b" in your regular string literal first and converts it into a backspace character (U+0008), so the Regex class never sees the backslash. You need to either make the string verbatim (prefix it with an at-sign) or escape the backslash itself so that it is passed through to Regex, like so:
@"\bCOMPILATION UNIT";
Or
"\\bCOMPILATION UNIT"
I'll say the .NET RegEx documentation does not make this clear. It took me a while to figure this out at first too.
Fun fact: \r and \n (carriage return and line feed, respectively), along with some other escapes, are recognized by both Regex and the C# language, so the end result is the same even though the string handed to Regex is different.
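The difference between the three literals can be checked directly. The following is a minimal sketch; the class name and the sample input are made up for illustration and are not from the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackspaceVsBoundary
{
    // Regular literal: the C# compiler turns \b into U+0008 (backspace).
    public const string Regular = "\bCOMPILATION UNIT";
    // Verbatim literal: the backslash and the 'b' reach Regex unchanged.
    public const string Verbatim = @"\bCOMPILATION UNIT";
    // Escaped backslash: yields the same characters as the verbatim literal.
    public const string Escaped = "\\bCOMPILATION UNIT";

    // Hypothetical sample input, not taken from the original question.
    public const string Input = "BEGIN COMPILATION UNIT";

    public static void Main()
    {
        Console.WriteLine((int)Regular[0]);                 // 8, the backspace character
        Console.WriteLine(Verbatim == Escaped);             // True
        Console.WriteLine(Regex.IsMatch(Input, Regular));   // False: Input contains no backspace
        Console.WriteLine(Regex.IsMatch(Input, Verbatim));  // True: word boundary before the C
    }
}
```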
Using \b in a .NET regex
Sure. Your \b is actually the backspace character, not the regex \b. You need to either write "\\b" to embed it in a regular C# string literal, or use a verbatim string literal: @"\b".
Remember: the backslash is an escape character for C# strings just as it is for regex, so unless you use a verbatim string literal you need to escape things twice, once for the string literal and once for the regex.
Another thing: stay away from \b, and likewise from \w. \b is an anchor defined in terms of \w, and \w is a character class that's pretty much useless for anything except quick one-off tasks where you have very tight control over everything you want to match. \b simply means that on one side of the anchor there is a character matching \w and on the other side there isn't (either the end of the string or a character matching \W). Now, \w includes things like digits and _. When searching for vaguely word-like things, I at least tend not to think of digits and underscores as parts of words. I often prefer to make explicit what I'm actually looking for, e.g. via lookaround assertions: (?<!\p{L}) specifies that there is no letter directly preceding the current point in the match, which effectively replaces \b at the start of the pattern. Likewise, (?!\p{L}) can be used in place of the \b at the end of the pattern. Written like this, you have much more control over what you consider suitable "boundaries" for the things you're looking for, e.g. maybe you want to find foo only when it's bounded by whitespace: (?<!\S)foo(?!\S) (note the double negative here: a negative lookaround around the negated class \S, rather than a positive lookaround around \s, so that the assertions also succeed at the start and end of the string).
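To see how the choice of boundary changes the result, here is a small sketch that counts matches of foo under three boundary definitions. The sample input and the helper name Count are assumptions for illustration; the whitespace-bounded variant is written as (?<!\S)foo(?!\S), i.e. "no non-whitespace character on either side".

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryChoices
{
    // Hypothetical helper: number of matches of pattern in input.
    public static int Count(string input, string pattern) =>
        Regex.Matches(input, pattern).Count;

    public static void Main()
    {
        string input = "foo_bar foo2 foo"; // made-up sample

        // \b treats '_' and '2' as word characters, so only the last foo qualifies.
        Console.WriteLine(Count(input, @"\bfoo\b"));                // 1

        // Letter-based boundaries: '_' and '2' are not letters, so all three qualify.
        Console.WriteLine(Count(input, @"(?<!\p{L})foo(?!\p{L})")); // 3

        // Whitespace (or string start/end) on both sides: only the last foo qualifies.
        Console.WriteLine(Count(input, @"(?<!\S)foo(?!\S)"));       // 1
    }
}
```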
Why is this word boundary regex not matching
. is not a word character. \b checks word boundaries, i.e. boundaries between word characters and characters not considered to be part of words. Therefore you cannot expect . to be inside the "word" 1. because these two characters do not form a word.
The Quick Reference document describes \b as:
The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character.
And \w is described as:
Matches any word character.
If you check what a word character is, you will find it includes the Unicode categories
Ll [Letter, Lowercase];
Lu [Letter, Uppercase];
Lt [Letter, Titlecase];
Lo [Letter, Other];
Lm [Letter, Modifier];
Mn [Mark, Nonspacing];
Nd [Number, Decimal Digit] and
Pc [Punctuation, Connector].
But . has the Unicode category Po [Punctuation, Other], which is not listed above.
So if you expect \b to match a word boundary in 1., that boundary lies right between the 1 and the . character. This answers your question of why.
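A short sketch makes the boundary positions visible. The helper BoundaryIndexes and the test strings are hypothetical, chosen only to illustrate the explanation above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DotBoundary
{
    // Hypothetical helper: the indexes at which \b matches in the input.
    public static int[] BoundaryIndexes(string input) =>
        Regex.Matches(input, @"\b").Cast<Match>().Select(m => m.Index).ToArray();

    public static void Main()
    {
        // Boundaries exist before the 1 and between 1 and '.', but not after the '.'.
        Console.WriteLine(string.Join(",", BoundaryIndexes("1.")));  // 0,1

        // '.' followed by a space: two non-word characters, so the trailing \b fails.
        Console.WriteLine(Regex.IsMatch("1. item", @"\b1\.\b"));     // False

        // Placing \b between the 1 and the dot works.
        Console.WriteLine(Regex.IsMatch("1. item", @"\b1\b\."));     // True

        // '.' followed by a digit: there is a boundary after the dot.
        Console.WriteLine(Regex.IsMatch("1.5", @"\b1\.\b"));         // True
    }
}
```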
Note: .NET regular expressions should preferably be tested on sites dedicated to them, for example Regex Storm. If you test your regex against the PCRE flavour (as on the site you linked), you can get different results from .NET.
regular expressions with word boundaries fail in .NET Regex
Replace
Regex regFail = new Regex(@"\b§pattern§\b");
with
Regex regFail = new Regex(@"§\bpattern\b§");
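§ is a non-word character, so a \b placed directly outside it only matches when the character on the other side is a word character, which prevents pattern from being matched. Perhaps you do not even need the \b here, since pattern is already enclosed in non-word characters?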
Regex regFail = new Regex(@"§pattern§");
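The three variants can be compared on a sample string. This is only a sketch: the input strings and the helper name Matches are assumptions, since the original input is not shown.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SectionSignBoundary
{
    // Hypothetical helper wrapping Regex.IsMatch.
    public static bool Matches(string input, string pattern) =>
        Regex.IsMatch(input, pattern);

    public static void Main()
    {
        string spaced = "price §pattern§ end"; // made-up sample, § surrounded by spaces

        // Space and § are both non-word characters: no boundary, no match.
        Console.WriteLine(Matches(spaced, @"\b§pattern§\b"));         // False

        // The boundaries sit between § and the letters of "pattern".
        Console.WriteLine(Matches(spaced, @"§\bpattern\b§"));         // True

        // Without \b the delimiters alone are enough.
        Console.WriteLine(Matches(spaced, @"§pattern§"));             // True

        // The outer \b only succeeds when word characters touch the § signs.
        Console.WriteLine(Matches("a§pattern§b", @"\b§pattern§\b"));  // True
    }
}
```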
.Net Regular Expression matching the string C#
The \b does not match between the pound sign and a space because both are non-word characters, but it does match between the pound sign and the d character.
Instead of a second word boundary \b, you could assert that what is on the right is not a non-whitespace character \S using a negative lookahead (?!...):
\bC#(?!\S)
As pointed out in the comments by @elgonzo, to prevent breaking the match when a non-word char follows C#, you could use a positive lookahead to assert that what is on the right is either a non-word char \W or the end of the string $:
\bC#(?=\W|$)
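The following sketch compares the three patterns by listing the index of every match. The sample sentence and the helper MatchIndexes are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CSharpToken
{
    // Hypothetical helper: comma-separated start indexes of all matches.
    public static string MatchIndexes(string input, string pattern) =>
        string.Join(",", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Index));

    public static void Main()
    {
        // C# occurs at index 7 (before a space), 14 (before 'd'), 26 (before '.').
        string input = "I like C# and C#dev, also C#.";

        // A trailing \b only matches where a word character follows the '#'.
        Console.WriteLine(MatchIndexes(input, @"\bC#\b"));        // 14

        // Whitespace or end of string must follow.
        Console.WriteLine(MatchIndexes(input, @"\bC#(?!\S)"));    // 7

        // Any non-word character or end of string may follow.
        Console.WriteLine(MatchIndexes(input, @"\bC#(?=\W|$)"));  // 7,26
    }
}
```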
Regex to match a string which does not contain a specific word next to the match string
I want regex which does not contain not(in first string), I want to match only 2nd string.
That means you should check that the This is... pattern is not followed by a newline sequence + spaces* + not as a whole word, with backtracking disabled. We can disable backtracking using an atomic group in .NET:
(?>This\s+is(?:\s+\d+)+ *)(?![\r\n]+\p{Zs}*not\b)
Part 1 of the regex, This\s+is(?:\s+\d+)+ *, matches This is followed by one or more sequences of one or more whitespace characters followed by one or more digits, and then zero or more spaces. The (?>...) prevents backtracking into this part of the pattern. The lookahead (?![\r\n]+\p{Zs}*not\b) fails the match if the previously matched text is followed by one or more line break characters, zero or more space separators (\p{Zs}), and the whole word not (where \b stands for a word boundary).
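To illustrate why the atomic group matters, the sketch below runs the pattern with and without (?>...). The two input strings are assumptions, since the original question's data is not reproduced here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicLookahead
{
    // Pattern from the answer: atomic group followed by the negative lookahead.
    public const string Atomic =
        @"(?>This\s+is(?:\s+\d+)+ *)(?![\r\n]+\p{Zs}*not\b)";

    // Same pattern without the atomic group, for comparison only.
    public const string NonAtomic =
        @"This\s+is(?:\s+\d+)+ *(?![\r\n]+\p{Zs}*not\b)";

    // Hypothetical inputs: the first is followed by "not" on the next line.
    public const string First = "This is 1 2 3\n   not ok";
    public const string Second = "This is 4 5 6\n   fine";

    public static void Main()
    {
        // Atomic: the lookahead fails and the engine may not give back " 3".
        Console.WriteLine(Regex.IsMatch(First, Atomic));       // False

        // Non-atomic: the engine backtracks to a shorter prefix and still matches.
        Console.WriteLine(Regex.IsMatch(First, NonAtomic));    // True

        // No "not" on the next line: the whole first line is matched.
        Console.WriteLine(Regex.Match(Second, Atomic).Value);  // This is 4 5 6
    }
}
```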
How would I write a regular expression to match numeric or alphanumeric words, but not words without numbers?
You may use
(?xi) # Enable free-spacing and case insensitive mode
\b # Word boundary
(?=[A-Z.]*[0-9]) # After any 0+ letters/dots there must be a digit
[A-Z0-9]+ # 1+ letters or digits
(?:\.[A-Z0-9]+)* # 0+ repetitions of a . and then 1+ letters/digits
\b # Word boundary
See the regex demo at regex101.com and a .NET regex demo showing it really works in a .NET environment.
In C# code, you may use
var Pattern = new Regex(@"
\b # Word boundary
(?=[A-Z.]*[0-9]) # After any 0+ letters/dots there must be a digit
[A-Z0-9]+ # 1+ letters or digits
(?:\.[A-Z0-9]+)* # 0+ repetitions of a . and then 1+ letters/digits
\b # Word boundary",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
where (?x) = RegexOptions.IgnorePatternWhitespace and (?i) = RegexOptions.IgnoreCase.
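As a quick check of what the pattern accepts, here is a sketch that runs it over a made-up sentence. The class name, the helper Find, and the sample text are assumptions for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordsWithDigits
{
    // Same pattern as above, with the inline modifiers expressed as RegexOptions.
    public static readonly Regex Pattern = new Regex(@"
        \b                   # Word boundary
        (?=[A-Z.]*[0-9])     # After any 0+ letters/dots there must be a digit
        [A-Z0-9]+            # 1+ letters or digits
        (?:\.[A-Z0-9]+)*     # 0+ repetitions of a . and then 1+ letters/digits
        \b                   # Word boundary",
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: all matched values, in order.
    public static string[] Find(string input) =>
        Pattern.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // "abc" and "hello" contain no digit and are skipped.
        string input = "abc 123 a1b2 v1.2.x hello";
        Console.WriteLine(string.Join(" | ", Find(input))); // 123 | a1b2 | v1.2.x
    }
}
```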
Regex: Match word not containing
Your ^((?!Drive).)*$ did not work at all because you tested it against a multiline input.
You should use the /m modifier to see what the regex matches. It just matches lines that do not contain Drive, but that tempered greedy token does not check whether EFI is inside the string.
Actually, the $ anchor is redundant here since .* matches any zero or more characters other than line break characters. You may simply remove it from your pattern.
(NOTE: In .NET, you will need to use [^\r\n]* instead of .*, since . in a .NET pattern matches any char except a newline (LF) and therefore does match all other line break chars, such as a carriage return (CR).)
Use something like
^(?!.*Drive).*EFI.*
Or, if you need to fail the match only when Drive is present as a whole word:
^(?!.*\bDrive\b).*EFI.*
Or, if there are more words you want to signal the failure with:
^(?!.*(?:Drive|SomethingElse)).*EFI.*
^(?!.*\b(?:Drive|SomethingElse)\b).*EFI.*
Here,
^ - matches the start of the string
(?!.*Drive) - makes sure there is no "Drive" in the string (so, Drives are NOT allowed)
(?!.*\bDrive\b) - makes sure there is no "Drive" as a whole word in the string (so, Drives are allowed)
.* - any 0+ chars other than line break chars, as many as possible
EFI - an EFI substring
.* - any 0+ chars other than line break chars, as many as possible.
If your string has newlines, either use the /s dotall modifier (RegexOptions.Singleline in .NET) or replace . with [\s\S].
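The substring and whole-word variants can be compared line by line with RegexOptions.Multiline, the .NET counterpart of /m. This is a sketch under assumptions: the sample lines and the helper MatchingLines are invented, and the lines are separated by LF only so that . does not pick up a stray CR.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EfiWithoutDrive
{
    // Hypothetical sample: four lines separated by LF.
    public const string Input =
        "EFI System Partition\nEFI Drive C\nDrives with EFI\nLocal Disk";

    // Hypothetical helper: every line matched by the pattern, joined with '|'.
    public static string MatchingLines(string pattern) =>
        string.Join("|", Regex.Matches(Input, pattern, RegexOptions.Multiline)
                              .Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        // Any "Drive" substring disqualifies a line, including "Drives".
        Console.WriteLine(MatchingLines(@"^(?!.*Drive).*EFI.*"));
        // EFI System Partition

        // Only the whole word "Drive" disqualifies a line, so "Drives" passes.
        Console.WriteLine(MatchingLines(@"^(?!.*\bDrive\b).*EFI.*"));
        // EFI System Partition|Drives with EFI
    }
}
```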