Case insensitive regular expression without re.compile?
Pass re.IGNORECASE to the flags parameter of search, match, or sub:
re.search('test', 'TeSt', re.IGNORECASE)
re.match('test', 'TeSt', re.IGNORECASE)
re.sub('test', 'xxxx', 'Testing', flags=re.IGNORECASE)
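The re.sub call uses the flags= keyword for a reason: in re.sub the fourth positional parameter is count, not flags. A minimal sketch of the difference follows; the sample string is made up for illustration.

```python
import re
import warnings

text = "Testing TEST test"  # sample input, chosen for illustration

# Keyword form: the flag is applied, so every case variant is replaced.
correct = re.sub("test", "xxxx", text, flags=re.IGNORECASE)
print(correct)  # xxxxing xxxx xxxx

# Positional form: re.sub's fourth positional parameter is count, so
# re.IGNORECASE (integer value 2) is taken as count=2 and the match
# stays case-sensitive. Newer Python versions warn about this call,
# hence the warning filter.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    wrong = re.sub("test", "xxxx", text, re.IGNORECASE)
print(wrong)  # Testing TEST xxxx
```

With re.search and re.match the third positional parameter really is flags, which is why those two calls work without the keyword.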
Python: why does regex compiled with re.IGNORECASE drop first chars?
The problem here is that you're passing the re.IGNORECASE flag in the wrong place. Since listing_re is a compiled regex, listing_re.search has a signature like this (docs):
Pattern.search(string[, pos[, endpos]])
[...]
The optional second parameter pos gives an index in the string where the search is to start; it defaults to 0. This is not completely equivalent to slicing the string; the '^' pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start.
As you can see, you've passed re.IGNORECASE as the value of the pos parameter. Since re.IGNORECASE happens to have a value of 2, you end up skipping the first 2 characters.
>>> re.IGNORECASE
<RegexFlag.IGNORECASE: 2>
The correct usage would be to pass the flags to re.compile:
listing_re = re.compile('(.*,?) and (.*)', re.IGNORECASE)
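To make the effect concrete, here is a small sketch that reproduces the dropped characters and then the fix. The input strings and the name plain_re are made up for illustration; only listing_re and its pattern come from the answer.

```python
import re

text = "Apples and Pears"  # sample input, chosen for illustration

# Wrong: on a compiled pattern the second argument of search() is pos,
# so re.IGNORECASE (value 2) makes the search start at index 2.
plain_re = re.compile(r"(.*,?) and (.*)")
wrong = plain_re.search(text, re.IGNORECASE)
print(wrong.group(1))  # ples

# Right: the flag goes to re.compile, and search() gets only the string.
listing_re = re.compile(r"(.*,?) and (.*)", re.IGNORECASE)
right = listing_re.search("Apples AND Pears")
print(right.group(1), right.group(2))  # Apples Pears
```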
case insensitive regex returning original pattern
import re
words = ['Cat', 'Dog', 'Horse']
reg = re.compile(r"\b(?:(" + ")|(".join(words) + r"))\b", flags=re.I)
match = reg.search('My grandma owned no cats, only a black doG named Morli.'
                   ' Oh, and no horse, of course.')
if match:
    print(words[match.lastindex - 1])
prints
Dog
This builds a regex like \b(?:(Cat)|(Dog)|(Horse))\b, i.e., a non-capturing group (this is the meaning of ?:) surrounded by word boundaries (the \bs), which is an alternation of capturing groups. The index of the last (and only, if any) matching capturing group is returned in match.lastindex. Because of how the regex was constructed, this is also the index into the words list, except it is offset by 1 because capturing groups are numbered from 1 (group 0 is the full match).
The set of words found in the text is easily constructed by
words_found = {words[match.lastindex - 1] for match in reg.finditer(text)}
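The one-liner assumes that reg and text already exist. A self-contained sketch might look like this; text reuses the sentence from the example above, and the re.escape call is an added safeguard (not part of the answer) for words that contain regex metacharacters.

```python
import re

words = ['Cat', 'Dog', 'Horse']

# re.escape is an assumption added here: it leaves plain words unchanged
# but protects the pattern if a word contains characters such as '.' or '+'.
reg = re.compile(r"\b(?:(" + ")|(".join(map(re.escape, words)) + r"))\b",
                 flags=re.I)

text = ('My grandma owned no cats, only a black doG named Morli.'
        ' Oh, and no horse, of course.')

# lastindex is 1-based, so subtract 1 to index into words.
words_found = {words[match.lastindex - 1] for match in reg.finditer(text)}
print(sorted(words_found))  # ['Dog', 'Horse']
```

"cats" is not reported because the trailing \b requires the word to end right after "cat".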
The following is an edit by Patrick Artner, of which I don't think it's an improvement - but I'll leave it here because it is food for thought.
Edit - pls, incorporate:
all_matches = reg.findall('My grandma owned no cat, only a black doG named Morli.A cat named tinker came by.'
' Oh, and no horse, of course.')
found = [ words[idx] for k in all_matches for idx,m in enumerate(k) if m.strip() ]
print(found) # ['Cat', 'Dog', 'Cat', 'Horse']
Python efficient non-case-sensitive match?
You can use regular expressions with re.IGNORECASE, as described in "Case insensitive regular expression without re.compile?":
import re
s = 'padding aAaA padding'
print(bool(re.findall("aaaa",s,re.IGNORECASE)))
Prints True.
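Since the question asks about efficiency, two possible variations are sketched below. They are suggestions rather than part of the answer: re.search stops at the first match instead of collecting all of them, and for a fixed substring, case-folding both sides avoids regular expressions altogether.

```python
import re

s = 'padding aAaA padding'

# re.search returns as soon as one match is found.
found_with_regex = re.search("aaaa", s, re.IGNORECASE) is not None

# For a literal substring, compare case-folded copies instead.
found_with_casefold = "aaaa".casefold() in s.casefold()

print(found_with_regex, found_with_casefold)  # True True
```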
?! in regular expression
I don't understand the expression "?!" in python Regular expression
Well, ?! is not an expression, regular or otherwise.
The documentation says:
(?=...)
Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
(?!...)
Matches if ... doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not followed by 'Asimov'.
(including the definition of a positive lookahead assertion for comparison).
So, the expression is really (?!...).
Note that the (? sequence is used for several extensions to plain regular expressions, and they're all documented.
Why is only a searched in the first case?
Because a is the only character the pattern asks for. The lookahead group could prevent a from matching (if it was followed by abc), but it wouldn't be part of the match either way.
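The behaviour can be checked directly. The first three searches use the examples from the quoted documentation. The last one assumes the pattern in question was something like a(?!abc), which is a guess based on the explanation above.

```python
import re

# Positive lookahead: 'Isaac ' matches only when 'Asimov' follows,
# and 'Asimov' itself is not part of the match.
positive = re.search(r"Isaac (?=Asimov)", "Isaac Asimov")
print(repr(positive.group()))  # 'Isaac '

# Negative lookahead: the same text is rejected, other surnames pass.
rejected = re.search(r"Isaac (?!Asimov)", "Isaac Asimov")
print(rejected)  # None
negative = re.search(r"Isaac (?!Asimov)", "Isaac Newton")
print(repr(negative.group()))  # 'Isaac '

# Assumed pattern: the first 'a' is followed by 'abc' and is skipped;
# the second 'a' matches, and the match is the single character 'a'.
first = re.search(r"a(?!abc)", "aabc")
print(first.group(), first.start())  # a 1
```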
Can you make just part of a regex case-insensitive?
Perl lets you make part of your regular expression case-insensitive by using the (?i:) pattern modifier.
Modern regex flavors allow you to apply modifiers to only part of the regular expression. If you insert the modifier (?ism) in the middle of the regex, the modifier only applies to the part of the regex to the right of the modifier. You can turn off modes by preceding them with a minus sign. All modes after the minus sign will be turned off. E.g. (?i-sm) turns on case insensitivity, and turns off both single-line mode and multi-line mode.
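Not all regex flavors support this. JavaScript has traditionally applied all mode modifiers to the entire regular expression, as flags outside the pattern. Python accepts a bare modifier such as (?i) only as a whole-pattern setting (since Python 3.11 it must appear at the start of the expression) and has no bare (?-i) to switch a mode off again. Since Python 3.6, however, the scoped group form (?i:...) or (?-i:...) turns a mode on or off for just the enclosed part. All options are off by default.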
You can quickly test how the regex flavor you're using handles mode modifiers. The regex (?i)te(?-i)st should match test and TEst, but not teST or TEST.
Source
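In Python 3.6 and later, the scoped form (?i:...) gives the same result as the test described above. A minimal sketch, with the variable names chosen here for illustration:

```python
import re

# (?i:te) makes only "te" case-insensitive; "st" stays case-sensitive.
pattern = re.compile(r"(?i:te)st")

candidates = ["test", "TEst", "teST", "TEST"]
results = {s: bool(pattern.fullmatch(s)) for s in candidates}
print(results)
# {'test': True, 'TEst': True, 'teST': False, 'TEST': False}
```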
How do I do a case insensitive regular expression in Go?
You can set a case-insensitive flag as the first item in the regex.
You do this by adding "(?i)" to the beginning of a regex.
reg, err := regexp.Compile("(?i)"+strings.Replace(s.Name, " ", "[ \\._-]", -1))
For a fixed regex it would look like this.
r := regexp.MustCompile(`(?i)CaSe`)
For more information about flags, search the regexp/syntax package documentation (or the syntax documentation) for the term "flags".