Unexpected Match of Regex

The behavior with {,2} is not expected; it is a bug. If you look at the tre_parse_bound function in the TRE source code, you will see that the min variable is set to -1 before the engine tries to parse the minimum bound. When the minimum value is missing from the quantifier, the number of repeats appears to be the maximum value + 1 (as if the repeat count were max - min = max - (-1) = max + 1).

So, a{,} matches exactly one occurrence of a. The same goes for a{, } and a{ , }. See the R demo, where only abc is matched with ab{,}c:

grepl("ab{,}c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
grepl("ab{, }c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
grepl("ab{ , }c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
## => [1] FALSE TRUE FALSE FALSE FALSE
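For contrast, here is a minimal sketch of what a missing minimum is documented to mean in another engine. It uses Python's re module (not R or TRE), where omitting the lower bound means zero; the word list is borrowed from the demo above and the variable names are just for illustration.

```python
import re

words = ["ac", "abc", "abbc", "abbbc"]

# In Python's re, an omitted lower bound is documented as zero,
# so b{,2} and b{0,2} should behave identically.
open_min = [bool(re.fullmatch(r"ab{,2}c", w)) for w in words]
explicit_min = [bool(re.fullmatch(r"ab{0,2}c", w)) for w in words]

print(open_min)      # [True, True, True, False]
print(explicit_min)  # [True, True, True, False]
```

Writing the lower bound explicitly, as in {0,2}, is probably the safer habit in any engine, since it does not depend on how a missing minimum is interpreted.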

python regex unexpected match groups

You have 2 groups in your regex, so you're getting 2 groups back. You also need to match at least 1 digit that follows.

Try this:

([_\^][1-9]+)
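A small sketch of the difference, assuming re.findall is used. The sample string and the two-group pattern are hypothetical, since the original regex and input are not shown here.

```python
import re

# Hypothetical input; the asker's real text is not shown.
text = "x_12 + y^3 + z_"

# Hypothetical two-group version: findall returns one tuple per match.
two_groups = re.findall(r"([_\^])([1-9]+)", text)

# Single group from the answer: findall returns one string per match.
one_group = re.findall(r"([_\^][1-9]+)", text)

print(two_groups)  # [('_', '12'), ('^', '3')]
print(one_group)   # ['_12', '^3']
```

The trailing z_ is skipped in both cases because + requires at least one digit after the _ or ^.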

Regular expression unexpected pattern matching

It matches 10( because 1 matches [^+-], 0 matches 0 and ( matches [^0-9].

The reason I used the above expression, instead of the much simpler one, (0|[+-]?[1-9][0-9]*) is due to inability of the parser to recognise incorrect expressions such as 012.

How so? Using the above regex, 012 would be recognized as two tokens: 0 and 12. Would this not cause an error in your parser?

Admittedly, this would not produce a very good error message, so a better approach might be to just use [0-9]+ as the regex and then use the action to check for a leading zero. That way 012 would be a single token and the lexer could produce an error or warning about the leading zero (I'm assuming here that you actually want to disallow leading zeros - not use them for octal literals).

Instead of a check in the action, you could also keep your regex and then add another one for integers with a leading zero (like 0[0-9]+ { warn("Leading zero"); return INT; }), but I'd go with the check in the action since it's an easy check and it keeps the regex short and simple.

PS: If you make - and + part of the integer token, something like 2+3 will be seen as the integer 2, followed by the integer +3, rather than the integers 2 and 3 with a + token in between. Therefore it is generally a better idea to not make the sign a part of the integer token and instead allow prefix + and - operators in the parser.
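To make the suggestion concrete, here is a rough sketch of a hand-written tokenizer in Python rather than a lexer-generator spec. It uses [0-9]+ for integers, checks for a leading zero in the "action", and keeps + and - as separate operator tokens. The token names, the tokenize function, and the warning text are all made up for illustration.

```python
import re

# INT is just a run of digits; signs are separate operator tokens.
TOKEN_RE = re.compile(r"\s*(?:(?P<INT>[0-9]+)|(?P<OP>[+\-*/()]))")

def tokenize(text):
    """Return (tokens, warnings) for a simple arithmetic expression."""
    tokens, warnings = [], []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}: {text[pos]!r}")
        if m.lastgroup == "INT":
            lexeme = m.group("INT")
            # The "action": 012 stays one token, but gets flagged.
            if len(lexeme) > 1 and lexeme[0] == "0":
                warnings.append(f"leading zero in {lexeme!r}")
            tokens.append(("INT", lexeme))
        else:
            tokens.append(("OP", m.group("OP")))
        pos = m.end()
    return tokens, warnings

print(tokenize("2+3"))  # ([('INT', '2'), ('OP', '+'), ('INT', '3')], [])
print(tokenize("012"))  # ([('INT', '012')], ["leading zero in '012'"])
```

With the sign left out of the integer token, 2+3 comes out as INT, OP, INT, and a parser can treat a leading + or - as a prefix operator.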

Unexpected match in .Net for a regex on path that works in JavaScript

If you have the right security permissions on the filesystem you run the code on, you don't need to use Regex for this. The framework already contains what you need by way of FileInfo:

new FileInfo(@"c:\level0\level1\level2\filename.ext").Directory.Name

If you don't have access to the file system, you can also use Path.GetDirectoryName(), which I think is a bit more straightforward than Regex:

string directoryPath = Path.GetDirectoryName(@"c:\level0\level1\level2\filename.ext");

//outputs "c:\level0\level1\level2"

string folder = directoryPath.Split(Path.DirectorySeparatorChar).Last();

PCRE negative lookahead gives unexpected match

This solution uses a possessive quantifier on the foo part of the pattern, \w++. That means it will refuse to backtrack after finding a series of "word" characters, even if the rest of the pattern (the negative look-ahead) then fails. I've also had to change the negative look-behind to reject word characters or colons, to prevent things like baz::std::foo from matching.

It is mostly a tidy-up of the answer from Sebastian Proske. It uses \w instead of the literal character class, adds layout using the /x modifier, and removes unnecessary parentheses. It also provides a working example:

use strict;
use warnings 'all';
use feature 'say';

my $s = 'match std::foo but not match std::foo::bar.';

say $1 while $s =~ / (?<![\w:]) ( \w+::\w++) (?!:) /gx;

output

std::foo
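For engines without possessive quantifiers (Python's re module only gained them in 3.11), the backtracking problem can be shown and worked around with a stricter look-ahead. This is a substitute technique, not the possessive-quantifier approach used above: the look-ahead rejects a following word character as well as a colon, so backing off to std::fo no longer helps. The variable names are illustrative only.

```python
import re

s = "match std::foo but not match std::foo::bar."

# Plain \w+ can give back a character, so the (?!:) check is
# satisfied by the shorter "std::fo".
naive = re.findall(r"(?<![\w:])(\w+::\w+)(?!:)", s)

# Also rejecting a following word character closes that loophole.
strict = re.findall(r"(?<![\w:])(\w+::\w+)(?![\w:])", s)

print(naive)   # ['std::foo', 'std::fo']
print(strict)  # ['std::foo']
```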

unexpected regex match with IGNORECASE

In your code, you swapped the arguments. The pattern comes first: re.search(pattern, string[, flags])

import re

names = ['PA','SB','PA Solid','SB Solid']
for name in names:
    print("Name:",name)
    print(re.search(r'PartBody|Part_Body', name,re.IGNORECASE))

Output:

('Name:', 'PA')
None
('Name:', 'SB')
None
('Name:', 'PA Solid')
None
('Name:', 'SB Solid')
None
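A quick sketch of why the swapped call can produce a surprising match, assuming the original code passed the name as the first argument. With the arguments reversed, the name becomes the pattern and is searched for inside the string 'PartBody|Part_Body', where 'PA' matches the 'Pa' of 'PartBody' once case is ignored.

```python
import re

pattern = r"PartBody|Part_Body"
names = ["PA", "SB", "PA Solid", "SB Solid"]

# Swapped: each name is used as the regex, the real pattern as the text.
swapped = [n for n in names if re.search(n, pattern, re.IGNORECASE)]

# Correct order: pattern first, then the string to search.
correct = [n for n in names if re.search(pattern, n, re.IGNORECASE)]

print(swapped)  # ['PA']
print(correct)  # []
```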

Unexpected match with java regex

The .*? quantifier means that it will find as few characters as possible to satisfy the match, it doesn't mean that it will stop searching at the first > it finds. So in your example, the <x.*?> will match all of:

<x>ipsum <x>dolor sit amet</x>

All the characters between the first x and the final > satisfy the .*?. To fix this, you can simply change your pattern to:

<x[^>]*> +</x>
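The effect is easy to reproduce outside Java. Below is a small Python sketch, assuming the original pattern was <x.*?> +</x>; the sample text is made up, since the asker's input is not shown in full.

```python
import re

# Hypothetical input with one genuinely blank <x> element at the end.
text = "<x>ipsum <x>dolor sit amet</x> </x> and <x>  </x>"

# Lazy .*? keeps expanding until the rest of the pattern can match,
# so it runs past the first '>' it sees.
lazy = re.search(r"<x.*?> +</x>", text).group()

# [^>]* cannot cross a '>', so only the blank element matches.
negated = re.findall(r"<x[^>]*> +</x>", text)

print(lazy)     # <x>ipsum <x>dolor sit amet</x> </x>
print(negated)  # ['<x>  </x>']
```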

On a side note, it's been stated many times before, but you should not use regular expressions to parse xml/html/xhtml.

Regex pattern matching unexpected value

This is not a direct answer to your question but to the problem you seem to be solving and therefore maybe still helpful:

To parse emails I always make extensive use of Python's email library.

In your case you could use something like this:

from email.utils import getaddresses
from email import message_from_string

msg = message_from_string(str_with_msg_source)
tos = msg.get_all('to', [])
ccs = msg.get_all('cc', [])
resent_tos = msg.get_all('resent-to', [])
resent_ccs = msg.get_all('resent-cc', [])
all_recipients = getaddresses(tos + ccs + resent_tos + resent_ccs)
for name, address in all_recipients:
    # do some postprocessing on name or address if necessary
    print(name, address)

This always took reliable care of splitting names and addresses in mail headers in my cases.


