Order of regular expression operator (..|.. ... ..|..)
Left to right, and the first alternative matched "wins", others are not checked for. This is a typical NFA regex behavior. A good description of that behavior is provided at regular-expressions.info Alternation page.
Note that RegexOptions.RightToLeft only makes the regex engine examine the input string from right to left; the modifier does not change how the regex engine processes the pattern itself.
Let me illustrate: if you have a (aaa|bb|a) regex and try to find a match in bbac using Regex.Match, the value you will obtain is bb, because the only a in the input appears after bb. If you use Regex.Matches, you will get all matches, and both bb and a will land in your results.
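The same behavior can be sketched in Python, whose re module is also a backtracking engine. This is only an analogy: re.search stands in for Regex.Match and re.finditer for Regex.Matches, and Python has no counterpart to RegexOptions.RightToLeft.

```python
import re

pattern = re.compile(r"(aaa|bb|a)")
text = "bbac"

# Stand-in for Regex.Match: stop at the first (leftmost) match.
first = pattern.search(text).group(0)

# Stand-in for Regex.Matches: keep scanning after each match.
all_matches = [m.group(0) for m in pattern.finditer(text)]

print(first)        # bb
print(all_matches)  # ['bb', 'a']
```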
Also, the fact that the regex pattern is examined from left to right makes it clear that inside a non-anchored alternative group, the order of alternatives matters. If you use a (a|aa|aaa) regex to match against abbccaa, the first a alternative will match each a in the string (see the regex demo). Once you add word boundaries, you can place the alternatives in any order (see one more regex demo).
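A minimal Python sketch of both claims follows. The abbccaa input comes from the answer; the second input, "a aa aaa", is an assumption chosen here so that the word boundaries have something to anchor on.

```python
import re

text = "abbccaa"

# Shortest alternative first: every a is consumed one character at a time.
short_first = [m.group(0) for m in re.finditer(r"(a|aa|aaa)", text)]

# Longest alternative first: the trailing "aa" is now taken as one match.
long_first = [m.group(0) for m in re.finditer(r"(aaa|aa|a)", text)]

print(short_first)  # ['a', 'a', 'a']
print(long_first)   # ['a', 'aa']

# With word boundaries the order no longer changes the result,
# because a too-short alternative fails at \b and the next one is tried.
words = "a aa aaa"  # hypothetical input, not from the original answer
bounded_short_first = re.findall(r"\b(?:a|aa|aaa)\b", words)
bounded_long_first = re.findall(r"\b(?:aaa|aa|a)\b", words)

print(bounded_short_first)  # ['a', 'aa', 'aaa']
print(bounded_long_first)   # ['a', 'aa', 'aaa']
```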
Operator precedence in regular expressions
Given the Oracle doc:
Table 4-2 lists the list of metacharacters supported for use in regular expressions passed to SQL regular expression functions and conditions. These metacharacters conform to the POSIX standard; any differences in behavior from the standard are noted in the "Description" column.
And taking a look at the | value in that table:
The expression a|b matches character a or character b.
Plus taking a look at the POSIX doc:
Operator precedence
The order of precedence of operators is as follows:
Collation-related bracket symbols [==] [::] [..]
Escaped characters \
Character set (bracket expression) []
Grouping ()
Single-character-ERE duplication * + ? {m,n}
Concatenation
Anchoring ^$
Alternation |
I would say that H|ha+ would be the same as (?:H|ha+).
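The precedence can be checked quickly in Python rather than Oracle, on the assumption that both follow the same rule here (concatenation and duplication bind tighter than alternation). The narrow pattern and the sample strings are illustrative additions, showing the reading that H|ha+ does not have.

```python
import re

bare = re.compile(r"H|ha+")
grouped = re.compile(r"(?:H|ha+)")
narrow = re.compile(r"(?:H|h)a+")  # what H|ha+ would mean if | bound tighter

samples = ["H", "ha", "haaa", "Ha", "h"]

bare_results = [bool(bare.fullmatch(s)) for s in samples]
grouped_results = [bool(grouped.fullmatch(s)) for s in samples]
narrow_results = [bool(narrow.fullmatch(s)) for s in samples]

print(bare_results)     # [True, True, True, False, False]
print(grouped_results)  # [True, True, True, False, False]
print(narrow_results)   # [False, True, True, True, False]
```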
How is the AND/OR operator represented as in Regular Expressions?
I'm going to assume you want to build the regex dynamically to contain words other than part1 and part2, and that you want order not to matter. If so, you can use something like this:
((^|, )(part1|part2|part3))+$
Positive matches:
part1
part2, part1
part1, part2, part3
Negative matches:
part1, //with and without trailing spaces.
part3, part2,
otherpart1
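Building that pattern dynamically might look like the following Python sketch. The helper name build_list_pattern is hypothetical, and re.escape is added here as a precaution in case a word contains regex metacharacters.

```python
import re
from typing import Iterable


def build_list_pattern(words: Iterable[str]) -> "re.Pattern[str]":
    """Build ((^|, )(w1|w2|...))+$ from a list of allowed words."""
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(rf"((^|, )({alternatives}))+$")


pattern = build_list_pattern(["part1", "part2", "part3"])

positives = ["part1", "part2, part1", "part1, part2, part3"]
negatives = ["part1,", "part1, ", "part3, part2,", "otherpart1"]

positive_results = [bool(pattern.search(s)) for s in positives]
negative_results = [bool(pattern.search(s)) for s in negatives]

print(positive_results)  # [True, True, True]
print(negative_results)  # [False, False, False, False]
```

Because ^ is only one of the two alternatives in (^|, ), a search would also accept input such as "junk, part1"; matching from the start of the string (for example with pattern.match) avoids that.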
Regex AND operator
It is impossible for both (?=foo) and (?=baz) to match at the same time. It would require the next character to be both f and b simultaneously, which is impossible.
Perhaps you want this instead:
(?=.*foo)(?=.*baz)
This says that foo must appear anywhere and baz must appear anywhere, not necessarily in that order and possibly overlapping (although overlapping is not possible in this specific case because the letters themselves don't overlap).
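A short Python check of the two patterns, with sample strings made up for illustration:

```python
import re

same_position = re.compile(r"(?=foo)(?=baz)")   # both must start at one spot
anywhere = re.compile(r"(?=.*foo)(?=.*baz)")    # each may appear anywhere

samples = ["foo baz", "baz then foo", "only foo", "foobaz"]

same_results = [bool(same_position.search(s)) for s in samples]
anywhere_results = [bool(anywhere.search(s)) for s in samples]

print(same_results)      # [False, False, False, False]
print(anywhere_results)  # [True, True, False, True]
```

Note that . does not match a newline by default, so multi-line input would need the DOTALL flag for the second pattern to look across lines.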
Why does the order of alternatives matter in regex?
The regular expression engine tries to match the alternatives in the order in which they are specified. So when the pattern is (foo|foobar)&? it matches foo immediately and continues trying to find matches. The next bit of the input string is bar& b which cannot be matched.
In other words, because foo is a prefix of foobar, there is no way (foo|foobar) on its own will ever match foobar, since it will always match foo first.
Occasionally, this can be a very useful trick, actually. The pattern (o|a|(\w)) will allow you to capture \w and a or o differently:
Regex.Replace("a foobar& b", "(o|a|(\\w))", "$2") // " fbr& b" (the leading space is kept)
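Both points can be reproduced in Python as a rough analogue of the .NET calls. The reordered pattern (foobar|foo)&? is an addition for contrast, and the replacement uses a function so that an unset group 2 becomes an empty string explicitly.

```python
import re

text = "a foobar& b"

# foo is listed first, so the longer alternative is never reached.
short_first = re.findall(r"(?:foo|foobar)&?", text)

# Listing the longer alternative first lets it win.
long_first = re.findall(r"(?:foobar|foo)&?", text)

print(short_first)  # ['foo']
print(long_first)   # ['foobar&']

# The capture trick: o and a match without setting group 2,
# so only the other word characters survive the replacement.
kept = re.sub(r"(o|a|(\w))", lambda m: m.group(2) or "", text)

print(repr(kept))  # ' fbr& b'
```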
Regular expressions - what determines the precedence of a conditional?
+? is a lazy quantifier, meaning that it tries to match as few characters as possible before going further.
Normally, quantifiers are greedy: they try to match as much as possible, from left to right, and if the rest of the expression fails, they backtrack to a shorter match. Lazy quantifiers work the other way around: they try to match as few characters as possible, and if the remaining expression doesn't match, they expand the current match.
So, the first part, (\b\w+?), will try to match 1 character (g), and see if what follows is an es or an s, and a word boundary. Since that fails, it adds one more letter, and so on, until the first part matches glass. In this phase, the second part does match the remaining es.
If you replace that with a non-lazy, greedy quantifier, as in (\b\w+)(?=(?:es|s)\b), it will go the other way around. First, it assigns glasses to the first part, (\b\w+), but fails to match an additional es or s, so it backtracks to glasse, which succeeds in matching the remaining s with the second part of the expression.
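The two walkthroughs can be verified with a small Python sketch, assuming the input word is glasses as in the explanation above:

```python
import re

word = "glasses"

# Lazy: grow one character at a time until the lookahead succeeds.
lazy = re.search(r"(\b\w+?)(?=(?:es|s)\b)", word).group(1)

# Greedy: take the whole word, then give back characters until it succeeds.
greedy = re.search(r"(\b\w+)(?=(?:es|s)\b)", word).group(1)

print(lazy)    # glass
print(greedy)  # glasse
```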
JavaScript regex: why is alternation not ordered?
The alternative graph does not match starting at the third character, but the alternative photograph does. The engine proceeds through the string from left to right.
The ordering you refer to in the question applies when alternatives match from a common starting point in the string. Otherwise, while proceeding through the "haystack" string, the alternatives are all considered. If there's a single match starting from a particular character, then the rest of the regex will proceed with that (and may of course backtrack later).
Whether the engine prefers longer matches from a set of alternatives when there are multiple matches from the same character in the source, I can't say off the top of my head. I would guess it would try the longer one first, to consume more of the string optimistically, because it can always backtrack. However, I don't know that to be actual specified behavior and just thinking about reading the regex semantics in the spec makes my head hurt.
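Both situations can be tried out in Python. The haystack "a photograph" is an assumption (the question's actual string is not quoted here), picked so that photograph starts at the third character. In Python's re, at least, alternatives that match from the same starting point are tried in the order written rather than longest first; as far as I know, ECMAScript specifies the same left-to-right order for alternation.

```python
import re

text = "a photograph"  # hypothetical haystack

# Different starting points: the leftmost match wins, whatever the order.
leftmost = re.search(r"graph|photograph", text)
print(leftmost.group(0), leftmost.start())  # photograph 2

# Same starting point: the first listed alternative wins.
first_listed_short = re.search(r"photo|photograph", text).group(0)
first_listed_long = re.search(r"photograph|photo", text).group(0)

print(first_listed_short)  # photo
print(first_listed_long)   # photograph
```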