Unexpected match of regex
The behavior with {,2} is not expected; it is a bug. If you look at the tre_parse_bound function in the TRE source code, you will see that the min variable is set to -1 before the engine tries to initialize the minimum bound. It seems that when the minimum value is missing from the quantifier, the number of repeats becomes the maximum value + 1 (as if the repeat count were max - min = max - (-1) = max + 1).
So, a{,} matches one occurrence of a. The same holds for a{, } and a{ , }. See the R demo, where only abc is matched by ab{,}c:
grepl("ab{,}c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
grepl("ab{, }c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
grepl("ab{ , }c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
## => [1] FALSE TRUE FALSE FALSE FALSE
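For comparison only, and not as a description of TRE: Python's re module documents that omitting the minimum in {m,n} means a lower bound of zero, which is the behavior one would normally expect from {,2}. A minimal sketch, reusing the test strings from the R demo above (the variable names are illustrative):

```python
import re

words = ["ac", "abc", "abbc", "abbbc", "abbbbc"]

# In Python's re, b{,2} means "zero to two b's" (a missing minimum is 0).
matches = [bool(re.fullmatch(r"ab{,2}c", w)) for w in words]
print(matches)  # [True, True, True, False, False]
```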
python regex unexpected match groups
You have 2 groups in your regex, so you get 2 groups back. You also need to match at least 1 digit after the _ or ^.
Try this:
([_\^][1-9]+)
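A quick way to check this, sketched in Python with a made-up input string (the original input is not shown, so x_12^3 is purely an assumption). With a single capturing group, findall returns one string per match:

```python
import re

# Hypothetical sample; the question's real input is not shown.
sample = "x_12^3"

# One capturing group: an underscore or caret followed by one or more digits 1-9.
found = re.findall(r"([_\^][1-9]+)", sample)
print(found)  # ['_12', '^3']
```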
Regular expression unexpected pattern matching
It matches 10( because 1 matches [^+-], 0 matches 0 and ( matches [^0-9].
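This can be reproduced in Python with a rough stand-in pattern. The question's full regex is not shown here, so the pattern below is an assumption: just the three pieces named above, concatenated.

```python
import re

# Assumed pattern: [^+-] then a literal 0 then [^0-9], nothing else.
m = re.search(r"[^+-]0[^0-9]", "10(")
print(m.group())  # 10(
```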
The reason I used the above expression, instead of the much simpler one, (0|[+-]?[1-9][0-9]*) is due to inability of the parser to recognise incorrect expressions such as 012.
How so? Using the above regex, 012 would be recognized as two tokens: 0 and 12. Would this not cause an error in your parser?
Admittedly, this would not produce a very good error message, so a better approach might be to just use [0-9]+ as the regex and then use the action to check for a leading zero. That way 012 would be a single token and the lexer could produce an error or warning about the leading zero (I'm assuming here that you actually want to disallow leading zeros rather than use them for octal literals).
Instead of a check in the action, you could also keep your regex and add another one for integers with a leading zero (like 0[0-9]+ { warn("Leading zero"); return INT; }), but I'd go with the check in the action since it's an easy check and it keeps the regex short and simple.
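The "check in the action" idea can be sketched in Python. This is not lex/flex code; tokenize_ints is a hypothetical helper that only stands in for a lexer rule plus its action:

```python
import re

def tokenize_ints(text):
    """Return (tokens, warnings) for runs of digits in text."""
    tokens, warnings = [], []
    for m in re.finditer(r"[0-9]+", text):
        lexeme = m.group()
        # The "action": flag a leading zero on multi-digit integers.
        if len(lexeme) > 1 and lexeme.startswith("0"):
            warnings.append(f"Leading zero in {lexeme!r}")
        tokens.append(("INT", lexeme))
    return tokens, warnings

tokens, warnings = tokenize_ints("012 7 0")
print(tokens)    # [('INT', '012'), ('INT', '7'), ('INT', '0')]
print(warnings)  # ["Leading zero in '012'"]
```

Here 012 stays a single token, so the warning can name the whole lexeme.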
PS: If you make - and + part of the integer token, something like 2+3 will be seen as the integer 2 followed by the integer +3, rather than the integers 2 and 3 with a + token in between. Therefore it is generally a better idea not to make the sign part of the integer token and instead to allow prefix + and - operators in the parser.
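The effect on 2+3 is easy to see with two toy token patterns in Python (both patterns are simplified assumptions, not the question's actual lexer rules):

```python
import re

expr = "2+3"

# Sign folded into the integer token: the + is swallowed by the second integer.
signed = re.findall(r"[+-]?[0-9]+", expr)
print(signed)    # ['2', '+3']

# Sign kept as its own token: the parser sees INT, '+', INT.
unsigned = re.findall(r"[0-9]+|[+-]", expr)
print(unsigned)  # ['2', '+', '3']
```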
Unexpected match in .Net for a regex on path that works in JavaScript
If you have the right security permissions on the filesystem where the code runs, you don't need Regex for this; the framework already provides what you need in FileInfo:
new FileInfo(@"c:\level0\level1\level2\filename.ext").Directory.Name
If you don't have access to the file system, you can also use Path.GetDirectoryName(), which I think is a bit more straightforward than Regex:
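using System.IO;
using System.Linq; // needed for Last()
string directoryPath = Path.GetDirectoryName(@"c:\level0\level1\level2\filename.ext");
// directoryPath is "c:\level0\level1\level2"
string folder = directoryPath.Split(Path.DirectorySeparatorChar).Last();
// on Windows, folder is "level2"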
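The same "use the path library, not a regex" idea exists in other standard libraries. As a sketch only, here it is with Python's pathlib (this is not the .NET API discussed above; PureWindowsPath is used so the example parses Windows paths on any OS without touching the file system):

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r"c:\level0\level1\level2\filename.ext")

directory = path.parent   # everything up to the containing folder
folder = directory.name   # just the last folder name
print(directory)  # c:\level0\level1\level2
print(folder)     # level2
```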
PCRE negative lookahead gives unexpected match
This solution uses a possessive quantifier, \w++, on the foo part of the pattern. That means it will refuse to backtrack after finding a series of "word" characters, even if the rest of the pattern (the negative look-ahead) then fails. I've also had to change the negative look-behind to reject word characters or colons (:) to prevent things like baz::std::foo from matching.
It is mostly a tidy-up of the answer from Sebastian Proske. It uses \w instead of the literal character class, adds layout using the /x modifier, and removes unnecessary parentheses. It also provides a working example:
use strict;
use warnings 'all';
use feature 'say';
my $s = 'match std::foo but not match std::foo::bar.';
say $1 while $s =~ / (?<![\w:]) ( \w+::\w++) (?!:) /gx;
output
std::foo
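To see why refusing to backtrack matters, here is a rough Python sketch (not the Perl above). It first runs the pattern with a plain greedy \w+, which backtracks into a partial match, and then with a swapped-in alternative that emulates the effect by widening the look-ahead to (?![\w:]). Python 3.11 and later also accept \w++ directly; the emulation is used here only so the sketch runs on older versions.

```python
import re

s = "match std::foo but not match std::foo::bar."

# Greedy \w+ backtracks from "foo" to "fo" so that (?!:) can succeed.
greedy = re.findall(r"(?<![\w:])(\w+::\w+)(?!:)", s)
print(greedy)    # ['std::foo', 'std::fo']

# Emulation (not a true possessive quantifier): also forbid a following
# word character, so giving back characters can never rescue the match.
emulated = re.findall(r"(?<![\w:])(\w+::\w+)(?![\w:])", s)
print(emulated)  # ['std::foo']
```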
unexpected regex match with IGNORECASE
In your code, you swapped the pattern and the string. The call should look like re.search(pattern, string[, flags]):
import re

names = ['PA', 'SB', 'PA Solid', 'SB Solid']
for name in names:
    print("Name:", name)
    # Pattern first, then the string to search
    print(re.search(r'PartBody|Part_Body', name, re.IGNORECASE))
Output:
('Name:', 'PA')
None
('Name:', 'SB')
None
('Name:', 'PA Solid')
None
('Name:', 'SB Solid')
None
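The question's original call is not shown, so the following is only a guess at what the swap looks like. With the arguments reversed, the name is treated as the pattern and searched for inside the pattern text, which is how a surprising match can appear:

```python
import re

pattern = r"PartBody|Part_Body"
name = "PA"

# Swapped arguments: "PA" becomes the pattern and matches the "Pa" of
# "PartBody" because of IGNORECASE.
swapped = re.search(name, pattern, re.IGNORECASE)
print(swapped.group())  # Pa

# Correct order: pattern first, then the string to search.
correct = re.search(pattern, name, re.IGNORECASE)
print(correct)  # None
```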
Unexpected match with java regex
The .*? quantifier means that it will find as few characters as possible to satisfy the match; it doesn't mean that it will stop searching at the first > it finds. So in your example, the <x.*?> will match all of:
<x>ipsum <x>dolor sit amet</x>
with all the characters between the first x and the final > satisfying the .*?. To fix this, you can simply change your pattern to:
<x[^>]*> +</x>
On a side note, it's been stated many times before, but you should not use regular expressions to parse xml/html/xhtml.
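The difference between the lazy and the negated-class version can be sketched in Python, whose re module treats .*? and [^>]* the same way for this purpose. Both the input string and the full lazy pattern <x.*?> +</x> are assumptions, since the question's originals are not shown in full:

```python
import re

# Hypothetical input for illustration.
text = "lorem <x>ipsum</x> <x> </x>"

# Lazy .*? keeps expanding past the first > until the rest of the pattern fits.
lazy = re.search(r"<x.*?> +</x>", text).group()
print(lazy)   # <x>ipsum</x> <x> </x>

# [^>]* cannot cross a >, so only the empty element matches.
fixed = re.search(r"<x[^>]*> +</x>", text).group()
print(fixed)  # <x> </x>
```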
Regex pattern matching unexpected value
This is not a direct answer to your question but to the problem you seem to be solving and therefore maybe still helpful:
To parse emails I always make extensive use of Python's email library.
In your case you could use something like this:
from email import message_from_string
from email.utils import getaddresses

# str_with_msg_source holds the raw message source as a string
msg = message_from_string(str_with_msg_source)
tos = msg.get_all('to', [])
ccs = msg.get_all('cc', [])
resent_tos = msg.get_all('resent-to', [])
resent_ccs = msg.get_all('resent-cc', [])
all_recipients = getaddresses(tos + ccs + resent_tos + resent_ccs)
for name, address in all_recipients:
    # do some postprocessing on name or address if necessary
    print(name, address)
In my experience, this has reliably taken care of splitting names and addresses in mail headers.
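As a small worked example of the approach above, here it is applied to a made-up message (the addresses and headers are invented for illustration):

```python
from email import message_from_string
from email.utils import getaddresses

# Made-up message source for illustration only.
raw = (
    "From: sender@example.com\n"
    "To: Alice Example <alice@example.com>, bob@example.com\n"
    "Cc: Carol <carol@example.com>\n"
    "Subject: hello\n"
    "\n"
    "Body text.\n"
)

msg = message_from_string(raw)
recipients = getaddresses(msg.get_all("to", []) + msg.get_all("cc", []))
print(recipients)
# [('Alice Example', 'alice@example.com'), ('', 'bob@example.com'), ('Carol', 'carol@example.com')]
```

Addresses without a display name come back with an empty string as the name.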