Issue with a Look-Behind Regular Expression (Ruby)

Lookbehind has restrictions:

   (?<=subexp)        look-behind
   (?<!subexp)        negative look-behind

Subexp of look-behind must be fixed character length.
But different character length is allowed in top level
alternatives only.
ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.

In negative-look-behind, captured group isn't allowed,
but shy group(?:) is allowed.

You cannot put alternatives in a non-top level within a (negative) lookbehind.

Put them at the top level. You also don't need to escape some characters that you did.

/(?<=href="|src=").*?"/
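As a quick check, here is a minimal sketch of the top-level alternation in use. The sample html string is made up for illustration and is not from the original question. Note that the pattern as written also consumes the closing quote.

```ruby
# Hypothetical input, not taken from the original question.
html = '<a href="page.html"><img src="pic.png"></a>'

# Alternatives of different lengths are accepted at the top level of a lookbehind.
link_re = /(?<=href="|src=").*?"/

matches = html.scan(link_re)
p matches  # each match still ends with the closing double quote
```

If the trailing quote is unwanted, swapping .*?" for [^"]* is one possible variation.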

Ruby look behind regex error: invalid pattern in look-behind

You may use

s = s.gsub(/\A([^.]*\.[^.]*)\..*/, '\1')

Details

  • \A - start of a string
  • ([^.]*\.[^.]*) - Group 1: 0+ non-dots, a dot and 0+ non-dots
  • \. - a dot
  • .* - any 0 or more chars other than line break chars.
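A small sketch of how this substitution behaves, using made-up sample strings (the original input is not shown in the answer):

```ruby
# Keep everything up to, but not including, the second dot.
pattern = /\A([^.]*\.[^.]*)\..*/

trimmed   = "a.b.c.d".gsub(pattern, '\1')  # has a second dot, so the tail is dropped
unchanged = "a.b".gsub(pattern, '\1')      # no second dot, so the pattern does not match
no_dots   = "nodots".gsub(pattern, '\1')   # no dot at all, also left as-is

p trimmed, unchanged, no_dots
```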

How to do a negative lookbehind within a %r<…>-delimited regexp in Ruby?

As others have mentioned, this seems like an oversight, given how this character differs from other paired delimiters.

As far as "Is there really no way to escape the < here?" there is a way... but you're not going to like it:

%r<(?#{'<'}!foo)> == %r((?<!foo))

Using interpolation to insert the < character seems to work. But given that there are much better options, I would avoid it unless you were planning on splitting the regex into sections anyway...
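For comparison, a minimal sketch of the "better options": picking a delimiter other than < > avoids the clash entirely. The variable names and test strings here are illustrative only.

```ruby
# Three ways to write the same negative lookbehind without %r<...>.
re_braces = %r{(?<!foo)bar}
re_slash  = /(?<!foo)bar/
re_new    = Regexp.new('(?<!foo)bar')

# All three carry the same pattern source.
same_source = [re_braces, re_slash, re_new].map(&:source).uniq.size == 1

blocked = re_braces.match?("foobar")  # "bar" directly preceded by "foo"
allowed = re_braces.match?("bazbar")  # "bar" preceded by something else

p same_source, blocked, allowed
```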

Problem with quantifiers and look-behind

The issue is that Ruby doesn't support variable-length lookbehinds. Quantifiers aren't out per se, but they can't cause the length of the lookbehind to be nondeterministic.

Perl has the same restriction, as does just about every major language featuring regexes.

Try using the straightforward match (\w*)\W*?o instead of the lookbehind.
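A rough sketch of the capture-group approach, on a made-up input string (the asker's actual text is not shown here). Instead of asserting what precedes the o, the pattern consumes it and the wanted part is read from group 1.

```ruby
# Hypothetical sample text.
text = "hello, world"

m = text.match(/(\w*)\W*?o/)

whole = m[0]  # everything the pattern consumed, including the final "o"
word  = m[1]  # the word characters captured before the "o"

p whole, word
```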

Unable to get my Ruby negative look ahead regex to work properly

But, wait, negative lookaheads can be variable length!

R = /
  \b                   # match word break
  #{'apples'.reverse}  # match 'selppa'
  \b                   # match word break
  (?!                  # begin a negative lookahead
    \s+                # match one or more whitespaces
    #{'bad'.reverse}   # match 'dab'
    \b                 # match word break
  )                    # close negative lookahead
/ix                    # case-insensitive and free-spacing regex definition modes
#=> /
#     \b                   # match word break
#     selppa               # match 'selppa'
#     \b                   # match word break
#     (?!                  # begin a negative lookahead
#       \s+                # match one or more whitespaces
#       dab                # match 'dab'
#       \b                 # match word break
#     )                    # close negative lookahead
#   /ix

def avoid_bad_apples(str)
str.reverse.match? R
end

avoid_bad_apples("good apples") #=> true
avoid_bad_apples("Simbad apples") #=> true
avoid_bad_apples("bad pears") #=> false
avoid_bad_apples("bad apples") #=> false
avoid_bad_apples("good applesauce") #=> false
avoid_bad_apples("Very bad apples. BAD!") #=> false

SyntaxError: (irb):4: invalid pattern in look-behind (positive look-behind/ahead)

The reason is that Ruby's Onigmo regex engine does not support infinite-width lookbehind patterns.

In a general case, positive lookbehinds that contain quantifiers like *, + or {x,} can often be substituted with a consuming pattern followed with \K:

/(?: |\t*[a-zA-Z0-9_]+: |\t+)\K\d+(?=.*)/
#^^^                         ^^

However, you do not even need that complicated pattern. (?=.*) is redundant: it requires nothing, since .* matches even an empty string. Every alternative in the group ends with a space or a tab, so the positive lookbehind succeeds whenever there is a space or tab immediately to the left of the current location. The regex is therefore equivalent to

.gsub(/(?<=[ \t])\d+/, "321")

where the pattern matches

  • (?<=[ \t]) - a location immediately preceded with a space/tab
  • \d+ - one or more digits.
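To illustrate, a minimal sketch comparing the \K version with the simplified lookbehind on one made-up input line. It only shows that the two agree on this sample, not a general proof of equivalence.

```ruby
# Hypothetical input: digits after "name: ", after a tab, and glued to a letter.
line = "name: 12\t34 x56"

with_k          = line.gsub(/(?: |\t*[a-zA-Z0-9_]+: |\t+)\K\d+(?=.*)/, "321")
with_lookbehind = line.gsub(/(?<=[ \t])\d+/, "321")

p with_k
p with_lookbehind
```

In both cases the digits in x56 are left alone because they are preceded by a letter rather than a space or tab.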

Is there a bug in Ruby lookbehind assertions (1.9/2.0)?

This has been officially classified as a bug and subsequently fixed, together with another problem concerning \Z anchors in multiline strings.

Regex negative lookbehinds with a wildcard

You are thinking about it the right way. Unfortunately, lookbehinds usually have to be of fixed length. The only major exception is .NET's regex engine, which allows repetition quantifiers inside lookbehinds. But since you need only a negative lookbehind and not a lookahead as well, there is a hack for you: reverse the string, then try to match:

/rab(?!.{0,10}oof)/

Then reverse the result of the match or subtract the matching position from the string's length, if that's what you are after.
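A minimal sketch of the reversal trick, including the position arithmetic. The helper name index_of_free_bar and the sample strings are assumptions made for illustration.

```ruby
# Returns the index (in the original string) of a "bar" that has no "foo"
# within the 10 characters before it, or nil if there is none.
# Because the search runs on the reversed string, the rightmost such "bar" wins.
def index_of_free_bar(str)
  m = str.reverse.match(/rab(?!.{0,10}oof)/)
  m && str.length - m.end(0)
end

blocked = index_of_free_bar("foo 123 bar")  # "foo" is 5 characters before "bar"
found   = index_of_free_bar("baz 123 bar")  # no "foo" anywhere

p blocked, found
```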

Now from the regex you have given, I suppose that this was only a simplified version of what you actually need. Of course, if bar is a complex pattern itself, some more thought needs to go into how to reverse it correctly.

Note that if your pattern required both variable-length lookbehinds and lookaheads, you would have a harder time solving this. Also, in your case, it would be possible to deconstruct your lookbehind into multiple fixed-length ones (because you use neither + nor *):

/(?<!foo)(?<!foo.)(?<!foo.{2})(?<!foo.{3})(?<!foo.{4})(?<!foo.{5})(?<!foo.{6})(?<!foo.{7})(?<!foo.{8})(?<!foo.{9})(?<!foo.{10})bar/
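If you do go this route, the chain does not have to be typed by hand. A rough sketch that builds the same pattern programmatically; the names lookbehinds and re and the test strings are made up.

```ruby
# One fixed-length negative lookbehind per allowed gap size (0 through 10).
lookbehinds = (0..10).map do |n|
  gap = n.zero? ? "" : ".{#{n}}"
  "(?<!foo#{gap})"
end.join

re = Regexp.new(lookbehinds + "bar")

near = re.match?("foo 123 bar")               # "foo" within 10 chars: rejected
none = re.match?("baz 123 bar")               # no "foo" at all: accepted
far  = re.match?("foo" + "x" * 11 + "bar")    # "foo" 11 chars away: accepted

p near, none, far
```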

But that's not all that nice, is it?

Use of \K and lookahead not working as expected

The (?<=^|,)(?=,|$) matches like this: the first match is the start of the string as it is followed with ,; the second match is between the second and the third comma; after checking the position after the second comma, the position after the third comma is checked, and the third match is found; the last match is at the end of the string, as expected, as there is a , followed with $ (end of string).

The behavior of the (^|,)\K(?=,|$) pattern differs between Ruby (Onigmo regex engine) and PCRE; you can easily check this at regex101.com. While in PCRE the \K construct matches the empty string/location right after the third comma, the Onigmo regex engine cannot match it because the regex index is moved/set "manually" to skip the currently tested char if the match is an empty string. It means that after matching and consuming the second ,, the matched text is omitted, and then the regex engine is forced to jump to the location after the third comma. And that means that there is no way for the (^|,)\K(?=,|$) pattern to match between , and b.
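A minimal sketch of the lookbehind form filling in empty fields. The input string and the "NULL" placeholder are made up for illustration, and the \K variant is deliberately left out because its result depends on the engine.

```ruby
# Hypothetical comma-separated line with empty first, middle and last fields.
csv = ",a,,b,"

# Zero-width match: preceded by line start or a comma, followed by a comma or line end.
filled = csv.gsub(/(?<=^|,)(?=,|$)/, "NULL")

p filled
```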


