Python Regex Engine - "Look-Behind Requires Fixed-Width Pattern" Error


Python lookbehind assertions need to be fixed width, but you can try this:

>>> s = '"It "does "not "make "sense", Well, "Does "it"'
>>> re.sub(r'\b\s*"(?!,|$)', '" "', s)
'"It" "does" "not" "make" "sense", Well, "Does" "it"'

Explanation:

\b      # Start the match at the end of a "word"
\s*     # Match optional whitespace
"       # Match a quote
(?!,|$) # unless it's followed by a comma or end of string
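For reference, the error from the title can be reproduced with any look-behind whose width is not fixed. This is a minimal sketch; the pattern below is a made-up example, not the one from the original question.

```python
import re

# Hypothetical pattern: \w+ can match any number of characters,
# so the look-behind has no fixed width and compilation fails.
try:
    re.compile(r'(?<=\w+)\s*"')
    message = None
except re.error as exc:
    message = str(exc)

print(message)
```

The failure happens at compile time, before any input string is examined.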

Python regex error: look-behind requires fixed-width pattern

In Python, you can use this workaround to avoid the error:

(?:^|(?<=[\s:]))(:[^\s:]+:)(?=[\s:]|$)

The anchors ^ and $ are zero-width matchers anyway, so ^ can be moved out of the look-behind into an ordinary non-capturing alternation.
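A quick way to try the workaround pattern is shown in the sketch below. The sample string is an assumption made up for illustration; only the pattern comes from the answer.

```python
import re

# Pattern from the answer: ^ sits outside the look-behind,
# so the look-behind itself ([\s:]) is exactly one character wide.
pattern = re.compile(r'(?:^|(?<=[\s:]))(:[^\s:]+:)(?=[\s:]|$)')

# Hypothetical input: two delimited tokens and one non-match ("no:pe:x").
sample = ":smile: text :wink: no:pe:x"

tokens = pattern.findall(sample)
print(tokens)
```

The colons inside "no:pe:x" are skipped because they are preceded by a letter rather than by whitespace, a colon, or the start of the string.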


Regex - look-behind requires fixed-width pattern error

If you want to assert that what is on the left is not eq followed by a space, use a negative lookbehind (?<!...) instead of a positive lookbehind.

You can write the pattern using 2 lookbehind assertions.

(?<!\()(?<!eq )'(?!\)|\Z)


Example code

import re
text = "('hel'lo') eq 'some 'variable he're'"
print(re.compile(r"(?<!\()(?<!eq )'(?!\)|\Z)").sub(string=text, repl="''"))

Output

('hel''lo') eq 'some ''variable he''re'
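The reason for splitting the assertion in two can be checked directly. In the sketch below, the single look-behind with an alternation is an assumed form of the failing attempt; its alternatives are 1 and 3 characters wide, which Python's re rejects. The helper name compiles is made up for this example.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Assumed failing form: one look-behind, alternatives of different widths.
single = r"(?<!\(|eq )'(?!\)|\Z)"
# Form from the answer: two look-behinds, each with one fixed width.
chained = r"(?<!\()(?<!eq )'(?!\)|\Z)"

print(compiles(single), compiles(chained))
```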

Python - error: look-behind requires fixed-width pattern

Here are 2 approaches that will solve the issue:

Chained Lookbehinds

Convert an alternation-based lookbehind into several chained negative lookbehinds. The logic stays the same, because consecutive negative lookbehinds are combined with AND:

import re
phrase = '5 hampshire road bradford on avon avon dinas powys powys north somerset hampshire avon'
c_except = [r"on\s",r"dinas\s"]
c_out = ["avon", "powys", "somerset","hampshire"]
rx = r"(?<!\b{0})({1})".format(r")(?<!\b".join(c_except), "|".join(c_out))
print(re.sub(rx, "", phrase))
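To see what the format call actually builds, the generated pattern can be printed and tried on a shorter phrase. This is a sketch; the shortened phrase is an assumption chosen so that the result is easy to check by hand.

```python
import re

c_except = [r"on\s", r"dinas\s"]
c_out = ["avon", "powys", "somerset", "hampshire"]

# Each exception becomes its own fixed-width negative look-behind.
rx = r"(?<!\b{0})({1})".format(r")(?<!\b".join(c_except), "|".join(c_out))
print(rx)

# Shortened, hypothetical phrase: the first "avon" and the first "powys"
# are protected by "on " and "dinas ", the repeats are removed.
cleaned = re.sub(rx, "", "bradford on avon avon dinas powys powys")
print(repr(cleaned))
```

Note that the removed words leave their surrounding spaces behind, so some extra whitespace cleanup may be needed afterwards.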


Capturing Approach

Capture what you need to keep and match only what you need to remove, and use \1 backreference to restore Group 1 value:

import re
phrase = '5 hampshire road bradford on avon avon dinas powys powys north somerset hampshire avon'
c_except = [r"on\s+",r"dinas\s+"]
c_out = ["avon", "powys", "somerset","hampshire"]
rx = r"(\b(?:{0})(?:{1}))|(?:{1})".format(r"|".join(c_except), "|".join(c_out))
print(re.sub(rx, r"\1", phrase))

Note that this approach is preferable because you can use variable-width patterns inside c_except.

The regex will look like

(\b(?:on\s+|dinas\s+)(?:avon|powys|somerset|hampshire))|(?:avon|powys|somerset|hampshire)

The first alternative matches on or dinas as whole words (due to the \b word boundary) followed by any of the terms you need to remove. Since that alternative is wrapped in a capturing group, the \1 backreference in the replacement puts the matched text back. In all other contexts, the c_out terms are matched by the |(?:avon|powys|somerset|hampshire) alternative and removed.

NOTE: The \1 replacement works in Python 3.5+, where a backreference to a group that did not participate in the match is replaced with an empty string. For older versions, including Python 2.x, replace it with a lambda:

re.sub(rx, lambda m: m.group(1) if m.group(1) else "", phrase)
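The variable-width benefit can be sketched with a phrase that has several spaces after the protected word, which a fixed-width look-behind could not handle. The short phrase is a made-up example, and the callable replacement is used so the sketch does not depend on the Python version.

```python
import re

c_except = [r"on\s+", r"dinas\s+"]   # variable width: one or more whitespaces
c_out = ["avon", "powys", "somerset", "hampshire"]
rx = r"(\b(?:{0})(?:{1}))|(?:{1})".format("|".join(c_except), "|".join(c_out))

# Hypothetical phrase with three spaces between "on" and "avon".
phrase = "bradford on   avon avon"

# Keep Group 1 when it matched, otherwise replace with an empty string.
cleaned = re.sub(rx, lambda m: m.group(1) or "", phrase)
print(repr(cleaned))
```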

Python look-behind regex issue: Invalid regular expression: look-behind requires fixed-width pattern

Python's re module, like most regex engines (with the notable exception of .NET), doesn't support variable-length lookbehind.

Can't you use a capturing group instead?

“[^”]*(</p>\s*<p[^>]*>)

The data you need will be in the first capturing group.
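A minimal sketch of reading that group follows. The HTML fragment and the idea of replacing the paragraph break with a space are assumptions for illustration; the original input is not shown in the answer.

```python
import re

# Hypothetical fragment: a paragraph break inside a typographic quotation.
html = 'He said “hello</p>\n<p class="x">world”'

# Pattern from the answer: the prefix is matched but not captured,
# the part of interest ends up in Group 1.
pattern = re.compile(r'“[^”]*(</p>\s*<p[^>]*>)')
match = pattern.search(html)
print(repr(match.group(1)))

# Use the span of Group 1 to replace only the captured part.
joined = html[:match.start(1)] + " " + html[match.end(1):]
print(joined)
```

Working with match.start(1) and match.end(1) gives the same positional control a look-behind would, without the fixed-width restriction.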

Python look-behind regex fixed-width pattern error while looking for consecutive repeated words

Maybe regexes are not needed at all.

Using itertools.groupby does the job. It's designed to group equal occurrences of consecutive items.

  • group by words (after splitting on dots)
  • convert each group to a list and emit a (value, count) tuple only if its length is > 1

like this:

import itertools

s = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die"

matches = [(l[0],len(l)) for l in (list(v) for k,v in itertools.groupby(s.split("."))) if len(l)>1]

result:

[('name', 2), ('father', 3)]

From here we can do whatever we want with this list of tuples (for instance, filter it on the number of occurrences).
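The one-liner above can also be unrolled into a small function, which may be easier to read and reuse. This is a sketch of the same groupby logic; the function name consecutive_repeats and the sep parameter are made up here.

```python
from itertools import groupby

def consecutive_repeats(text, sep="."):
    """Return (word, run_length) for every word repeated back to back."""
    runs = []
    for word, group in groupby(text.split(sep)):
        count = sum(1 for _ in group)   # length of this run of equal words
        if count > 1:
            runs.append((word, count))
    return runs

s = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die"
print(consecutive_repeats(s))
```

The comparison is case-sensitive, so "My" and "my" are treated as different words.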

Bonus (as I misread the question at first, so I'm leaving it in): to remove the duplicates from the sentence
- group by words (after splitting according to dots) like above
- take only key (value) of the values returned in a list comp (we don't need the values since we don't count)
- join back with dot

In one line (still using itertools):

new_s = ".".join([k for k,_ in itertools.groupby(s.split("."))])

result:

My.name.is.Inigo.Montoya.You.killed.my.father.Prepare.to.die

Regex Pattern doesn't work using look behind without validating the fixed-width pattern

You may use

rx = r'(?:(?:Ave|Rd|St|Blvd|Dr|Way|Pl|Ln|Ct)\.|Beach|Way|Walk)\s*(.+?)\s*\d{3}-\d{3}-\d{4}'
zagat['city'] = zagat['raw'].str.extract(rx, expand=False)


Details

  • (?:(?:Ave|Rd|St|Blvd|Dr|Way|Pl|Ln|Ct)\.|Beach|Way|Walk) - either Ave, Rd, St, Blvd, Dr, Way, Pl, Ln or Ct followed with a ., or Beach, Way or Walk
  • \s* - 0+ whitespaces
  • (.+?) - Group 1 (this value will be returned by .extract): any one or more chars other than line break chars, as few as possible
  • \s* - 0+ whitespaces
  • \d{3}-\d{3}-\d{4} - 3 digits, -, 3 digits, - and 4 digits.
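The pattern can be tried with plain re before applying it to a DataFrame. The raw string below is invented for illustration (the actual zagat data is not shown), so treat this as a sketch of how Group 1 is picked out rather than a check against the real column.

```python
import re

rx = r'(?:(?:Ave|Rd|St|Blvd|Dr|Way|Pl|Ln|Ct)\.|Beach|Way|Walk)\s*(.+?)\s*\d{3}-\d{3}-\d{4}'

# Hypothetical row: name, street address, city, phone number.
raw = "Some Cafe 123 Main St. Los Angeles 310-555-0100"

match = re.search(rx, raw)
city = match.group(1) if match else None
print(city)
```

Because (.+?) is lazy and is followed by \s*, the whitespace before the phone number is left out of the captured value.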

Regex to extract unique string to new column, getting error look-behind requires fixed-width pattern

You may use

.*\s/(?:\s+XO[A-Z0-9\s]*\b)?\s+(.+)


Details

  • .* - 0+ chars other than line break chars, as many as possible
  • \s - a whitespace
  • / - a / char
  • (?:\s+XO[A-Z0-9\s]*\b)? - an optional pattern:

    • \s+ - 1+ whitespaces
    • XO - XO
    • [A-Z0-9\s]* - 0+ uppercase letters, digits or whitespaces, followed with
    • \b - a word boundary
  • \s+ - 1+ whitespaces
  • (.+) - Group 1 (what str.extract will return): any 1+ chars other than line break chars, as many as possible

In Pandas, use

df['Result'] = df['File Name'].str.extract(r'.*\s/(?:\s+XO[A-Z0-9\s]*\b)?\s+(.+)', expand=False).fillna('')

Result:

                                   Result  
0 File Name Type
1 Document Internal Only
2
3 Location Site 3: Park Triangle
4 Block 4 Beach/Dock Camp
5 Blue-print/Register Info Site (RISs)
6 Location Place 5: Drive Place (Active)
7 Area Place 1: Beach Drive
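The same pattern can be exercised without pandas. In this sketch the input strings are hypothetical (the original File Name column is not shown), and the helper extract_name is a made-up name; it returns an empty string on no match, mirroring the fillna('') call.

```python
import re

rx = re.compile(r'.*\s/(?:\s+XO[A-Z0-9\s]*\b)?\s+(.+)')

def extract_name(value):
    """Return Group 1 of the first match, or '' when nothing matches."""
    m = rx.search(value)
    return m.group(1) if m else ''

# Hypothetical inputs: with an XO code, without one, and without a slash.
samples = [
    "ABC 123 / XO 1A File Name Type",
    "ABC 123 / Document Internal Only",
    "no separator here",
]
results = [extract_name(value) for value in samples]
print(results)
```

In the first sample the optional XO group backtracks to the word boundary after "1A", so the capture starts at "File".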

Python regex look-behind requires fixed-width pattern

If you just want to get the title tag,

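import urllib.request

# Python 3 version; "http://somewhere" is a placeholder URL and UTF-8 is assumed
html = urllib.request.urlopen("http://somewhere").read().decode("utf-8", errors="replace")
for item in html.split("</title>"):
    if "<title>" in item:
        # 7 == len("<title>"), so this prints the text after the opening tag
        print(item[item.find("<title>") + 7:])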

Python regex look-behind strange behaviour with character '^'

The first regex does not work in Python because ^ is a zero-width match, and Python's regex engine doesn't support mixing zero-width and non-zero-width alternatives inside a lookbehind assertion.

This is, however, supported in other engines such as Java, PHP, Perl, C#, etc.

To solve this problem, you can use this regex:

(?:^|(?<=b))[0-9]
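The difference can be sketched as follows. The failing pattern is an assumed form of the original attempt (the question's regex is not quoted in the answer), and the test string is made up.

```python
import re

failing = r'(?<=^|b)[0-9]'      # assumed original: ^ (width 0) and b (width 1) mixed
working = r'(?:^|(?<=b))[0-9]'  # pattern from the answer

try:
    re.compile(failing)
    failing_compiles = True
except re.error:
    failing_compiles = False

# A digit matches at the start of the string or right after "b".
digits = re.findall(working, '1a2b3')
print(failing_compiles, digits)
```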



