Case Insensitive Regular Expression Without re.compile

Case insensitive regular expression without re.compile?

Pass re.IGNORECASE to the flags param of search, match, or sub:

re.search('test', 'TeSt', re.IGNORECASE)
re.match('test', 'TeSt', re.IGNORECASE)
re.sub('test', 'xxxx', 'Testing', flags=re.IGNORECASE)
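A minimal sketch of the three calls above with their results. The sample strings are illustrative. Note that re.sub takes count before flags, which is why the flag is passed by keyword there; the inline (?i) form is shown as an equivalent alternative.

```python
import re

# flags is the third positional parameter of re.search and re.match.
found = re.search('test', 'a TeSt', re.IGNORECASE)
print(found.group())      # TeSt

# re.sub takes count before flags, so pass flags by keyword.
replaced = re.sub('test', 'xxxx', 'Testing', flags=re.IGNORECASE)
print(replaced)           # xxxxing

# The inline flag (?i) at the start of the pattern has the same effect.
inline = re.sub('(?i)test', 'xxxx', 'Testing')
print(inline)             # xxxxing
```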

Python: why does regex compiled with re.IGNORECASE drop first chars?

The problem here is that you're passing the re.IGNORECASE flag in the wrong place. Since listing_re is a compiled regex, listing_re.search has a signature like this (docs):

Pattern.search(string[, pos[, endpos]])

[...]

The optional second parameter pos gives an index in the string where the search is to start; it defaults to 0. This is not completely equivalent to slicing the string; the '^' pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start.

As you can see, you've passed re.IGNORECASE as the value of the pos parameter. Since re.IGNORECASE happens to have a value of 2, you end up skipping the first 2 characters.

>>> re.IGNORECASE
<RegexFlag.IGNORECASE: 2>

The correct usage would be to pass the flags to re.compile:

listing_re = re.compile('(.*,?) and (.*)', re.IGNORECASE)
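To make the difference concrete, here is a small sketch comparing the two placements of the flag. The sample text and the name plain_re are assumptions for illustration; only listing_re and its pattern come from the answer.

```python
import re

text = 'Apples and pears'

# Flag passed to search(): it is taken as pos, so matching starts at index 2.
plain_re = re.compile('(.*,?) and (.*)')
skipped = plain_re.search(text, re.IGNORECASE)
print(skipped.groups())   # ('ples', 'pears')

# Flag passed to re.compile(): the whole string is searched, ignoring case.
listing_re = re.compile('(.*,?) and (.*)', re.IGNORECASE)
fixed = listing_re.search('Apples AND pears')
print(fixed.groups())     # ('Apples', 'pears')
```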

case insensitive regex returning original pattern

import re

words = ['Cat', 'Dog', 'Horse']
reg = re.compile(r"\b(?:(" + ")|(".join(words) + r"))\b", flags=re.I)

match = reg.search('My grandma owned no cats, only a black doG named Morli.'
                   ' Oh, and no horse, of course.')
if match:
    print(words[match.lastindex - 1])

prints

Dog

This builds a regex like \b(?:(Cat)|(Dog)|(Horse))\b, i.e., a non-capturing group (this is the meaning of ?:) surrounded by word boundaries (the \bs), which is an alternation of capturing groups. The index of the last (and only, if any) matching capturing group is returned in match.lastindex, but this is also the index in the words list (because of how the regex was constructed), except it is offset by 1 because group 0 is the full match.

The set of words found in the text is easily constructed by

words_found = {words[match.lastindex - 1] for match in reg.finditer(text)}
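A self-contained sketch of that set construction, reusing the sample sentence from the first snippet as text. The re.escape call is an added precaution (not in the original answer) in case a word contains regex metacharacters; it leaves these three words unchanged.

```python
import re

words = ['Cat', 'Dog', 'Horse']
# re.escape is an assumption added here; it is a no-op for plain words.
reg = re.compile(r"\b(?:(" + ")|(".join(map(re.escape, words)) + r"))\b",
                 flags=re.I)

text = ('My grandma owned no cats, only a black doG named Morli.'
        ' Oh, and no horse, of course.')

# 'cats' and 'course' are rejected by the word boundaries.
words_found = {words[match.lastindex - 1] for match in reg.finditer(text)}
print(sorted(words_found))   # ['Dog', 'Horse']
```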



The following is an edit by Patrick Artner. I don't think it's an improvement, but I'll leave it here because it is food for thought.

Edit (please incorporate):

all_matches = reg.findall('My grandma owned no cat, only a black doG named Morli.A cat named tinker came by.'
                          ' Oh, and no horse, of course.')
found = [words[idx] for k in all_matches for idx, m in enumerate(k) if m.strip()]

print(found) # ['Cat', 'Dog', 'Cat', 'Horse']

Python efficient non-case-sensitive match?

You can use regular expressions with re.IGNORECASE in Python; see "Case insensitive regular expression without re.compile?"

import re
s = 'padding aAaA padding'
print(bool(re.findall("aaaa",s,re.IGNORECASE)))

Prints True.
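Since the question asks about efficiency, here is a hedged sketch of two alternatives to findall. Neither is from the original answer: re.search stops at the first hit instead of collecting all of them, and for a literal substring a casefolded `in` test avoids the regex engine entirely.

```python
import re

s = 'padding aAaA padding'

# re.search returns as soon as one match is found.
via_search = re.search('aaaa', s, re.IGNORECASE) is not None
print(via_search)     # True

# For a literal substring, compare casefolded copies without any regex.
via_casefold = 'aaaa'.casefold() in s.casefold()
print(via_casefold)   # True
```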

?! in regular expression

I don't understand the expression "?!" in Python regular expressions

Well, ?! is not an expression, regular or otherwise.

The documentation says:

  • (?=...)

    Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.

  • (?!...)

    Matches if ... doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not followed by 'Asimov'.

(including the definition of a positive lookahead assertion for comparison).

So, the expression is really (?!...).

Note that the (? sequence is used for several extensions to plain regular expressions, and they're all documented.
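The documentation's Isaac Asimov examples can be checked directly. This is a small sketch; the 'Isaac Newton' input is an illustrative assumption.

```python
import re

# Positive lookahead: 'Isaac ' matches only when 'Asimov' follows.
positive = re.search(r'Isaac (?=Asimov)', 'Isaac Asimov')
print(repr(positive.group()))      # 'Isaac '

# Negative lookahead: the same text is rejected...
negative_fail = re.search(r'Isaac (?!Asimov)', 'Isaac Asimov')
print(negative_fail)               # None

# ...but 'Isaac ' followed by anything else is accepted.
negative_ok = re.search(r'Isaac (?!Asimov)', 'Isaac Newton')
print(repr(negative_ok.group()))   # 'Isaac '
```

In both successful cases the match is 'Isaac ' alone, because a lookahead checks the following text without consuming it.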

Why is only a searched in the first case?

Because a is the only character the pattern asks for. The lookahead group could prevent a from matching (if it was followed by abc), but it wouldn't be part of the match either way.
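As an illustration, assume the pattern in question was a(?!abc); the original pattern is not shown here, so this is a guess based on the answer's wording. The sketch shows that the lookahead can veto a position but never appears in the matched text.

```python
import re

# Hypothetical pattern: an "a" that is not followed by "abc".
pattern = re.compile(r'a(?!abc)')

# The first "a" is followed by "abc", so it is skipped; the second one matches.
m = pattern.search('aabc')
print(m.group(), m.start())   # a 1
```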

Can you make just part of a regex case-insensitive?

Perl lets you make part of your regular expression case-insensitive by using the (?i:) pattern modifier.

Modern regex flavors allow you to apply modifiers to only part of the regular expression. If you insert the modifier (?ism) in the middle of the regex, the modifier only applies to the part of the regex to the right of the modifier. You can turn off modes by preceding them with a minus sign. All modes after the minus sign will be turned off. E.g. (?i-sm) turns on case insensitivity, and turns off both single-line mode and multi-line mode.

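Not all regex flavors support this. Historically, JavaScript and Python applied all mode modifiers to the entire regular expression. They didn't support the (?-ismx) syntax, since turning off an option is pointless when mode modifiers apply to the whole regular expression. All options are off by default. Python 3.6 and later accept the scoped group forms (?i:...) and (?-i:...), but still reject a bare (?-i) in the middle of a pattern.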

You can quickly test how the regex flavor you're using handles mode modifiers. The regex (?i)te(?-i)st should match test and TEst, but not teST or TEST.
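In Python 3.6 and later, the group form (?i:...) scopes a flag to part of a pattern. Here is a minimal sketch of the test above, rewritten with that form because Python does not accept (?-i) in mid-pattern:

```python
import re

# (?i:...) limits case-insensitivity to the group; "st" stays case-sensitive.
pattern = re.compile(r'(?i:te)st')

results = {word: bool(pattern.fullmatch(word))
           for word in ('test', 'TEst', 'teST', 'TEST')}
print(results)
# {'test': True, 'TEst': True, 'teST': False, 'TEST': False}
```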

Source

How do I do a case insensitive regular expression in Go?

You can set a case-insensitive flag as the first item in the regex.

You do this by adding "(?i)" to the beginning of a regex.

reg, err := regexp.Compile("(?i)"+strings.Replace(s.Name, " ", "[ \\._-]", -1))

For a fixed regex it would look like this.

r := regexp.MustCompile(`(?i)CaSe`)

For more information about flags, search the regexp/syntax package documentation (or the syntax documentation) for the term "flags".


