Issue with a Look-behind Regular expression (Ruby)
Lookbehind has restrictions:
(?<=subexp) look-behind
(?<!subexp) negative look-behind
Subexp of look-behind must be fixed character length.
But different character length is allowed in top level
alternatives only.
ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.
In negative-look-behind, captured group isn't allowed,
but shy group(?:) is allowed.
You cannot put alternatives in a non-top level within a (negative) lookbehind.
Put them at the top level. You also don't need to escape some characters that you did.
/(?<=href="|src=").*?"/
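A minimal sketch of the top-level alternation in use. The sample HTML string is an assumption made up for illustration; only the pattern comes from the answer above.

```ruby
# Hypothetical sample input, not taken from the original question.
html = '<a href="x.html"><img src="y.png">'

# Alternatives of different lengths are allowed at the top level of a lookbehind.
pattern = /(?<=href="|src=").*?"/

# Each match runs from just after the attribute's opening quote
# through the closing quote (the closing quote is part of the match).
matches = html.scan(pattern)
p matches
```

Note that the pattern consumes the closing quote, so it would need trimming if only the attribute value is wanted.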
Ruby look behind regex error: invalid pattern in look-behind
You may use
s = s.gsub(/\A([^.]*\.[^.]*)\..*/, '\1')
Details
\A - start of a string
([^.]*\.[^.]*) - Group 1: 0+ non-dots, a dot and 0+ non-dots
\. - a dot
.* - any 0 or more chars other than line break chars.
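As a quick illustration of the breakdown above, here is the substitution applied to two sample strings. The inputs are assumptions chosen for the sketch.

```ruby
# Pattern from the answer: keep everything up to (not including) the second dot.
pattern = /\A([^.]*\.[^.]*)\..*/

# Hypothetical inputs: one with several dots, one with a single dot.
several_dots = "a.b.c.d".gsub(pattern, '\1')
single_dot   = "a.b".gsub(pattern, '\1')

p several_dots
p single_dot
```

A string with fewer than two dots does not match at all, so gsub returns it unchanged.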
How to do a negative lookbehind within a %r … -delimited regexp in Ruby?
As others have mentioned, this seems like an oversight based on how this character differs from other paired boundaries.
As far as "Is there really no way to escape the < here?" there is a way... but you're not going to like it:
%r<(?#{'<'}!foo)> == %r((?<!foo))
Using interpolation to insert the < character seems to work. But given that there are much better options, I would avoid it unless you were planning on splitting the regex into sections anyway...
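For comparison, a short sketch of the "better options" mentioned above: the same negative lookbehind written with delimiters that do not clash with <. The bar suffix and the test strings are assumptions added for illustration.

```ruby
# The same regex with three different delimiters; none of them needs
# any trick to get the "<" of the lookbehind through the parser.
with_slashes = /(?<!foo)bar/
with_braces  = %r{(?<!foo)bar}
with_parens  = %r((?<!foo)bar)

# Regexp#== compares source and options, so all three are equal.
same = with_slashes == with_braces && with_braces == with_parens

after_foo = with_braces.match?("foobar")  # "bar" directly preceded by "foo"
after_baz = with_braces.match?("bazbar")  # "bar" preceded by something else
p [same, after_foo, after_baz]
```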
Problem with quantifiers and look-behind
The issue is that Ruby doesn't support variable-length lookbehinds. Quantifiers aren't out per se, but they can't cause the length of the lookbehind to be nondeterministic.
Perl has the same restriction, as does just about every major language featuring regexes.
Try using the straightforward match (\w*)\W*?o instead of the lookbehind.
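A minimal sketch of that approach: the text a variable-length lookbehind would have asserted is consumed instead and read back from a capture group. The input string is an assumption, since the original question's data is not shown here.

```ruby
# Hypothetical input.
text = "word -- other"

# Consume the word and the separators, then read the word from Group 1
# instead of asserting it with a (variable-length) lookbehind.
m = text.match(/(\w*)\W*?o/)
word = m[1]
p word
```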
Unable to get my Ruby negative look ahead regex to work properly
But, wait, negative lookaheads can be variable length!
R = /
  \b                    # match word break
  #{'apples'.reverse}   # match 'selppa'
  \b                    # match word break
  (?!                   # begin a negative lookahead
    \s+                 # match one or more whitespaces
    #{'bad'.reverse}    # match 'dab'
    \b                  # match word break
  )                     # close negative lookahead
  /ix                   # case-indifferent and free-spacing regex definition modes
#=> /
#     \b                # match word break
#     selppa            # match 'selppa'
#     \b                # match word break
#     (?!               # begin a negative lookahead
#       \s+             # match one or more whitespaces
#       dab             # match 'dab'
#       \b              # match word break
#     )                 # close negative lookahead
#     /ix

# Reverse the string so the "not preceded by" test becomes a lookahead.
def avoid_bad_apples(str)
  str.reverse.match? R
end

avoid_bad_apples("good apples")            #=> true
avoid_bad_apples("Simbad apples")          #=> true
avoid_bad_apples("bad pears")              #=> false
avoid_bad_apples("bad apples")             #=> false
avoid_bad_apples("bad    apples")          #=> false
avoid_bad_apples("good applesauce")        #=> false
avoid_bad_apples("Very bad apples. BAD!")  #=> false
SyntaxError: (irb):4: invalid pattern in look-behind (positive look-behind/ahead)
The reason is that Ruby's Onigmo regex engine does not support infinite-width lookbehind patterns.
In a general case, positive lookbehinds that contain quantifiers like *, + or {x,} can often be substituted with a consuming pattern followed with \K:
/(?: |\t*[a-zA-Z0-9_]+: |\t+)\K\d+(?=.*)/
#^^^                         ^^
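A small sketch of the \K substitution on a simpler case. The pattern and the input line are assumptions for illustration: the prefix \w+:\s* has no fixed length, so it is consumed and then dropped from the match with \K rather than placed in a lookbehind.

```ruby
# Hypothetical input line.
line = "total_count:   42"

# Consume the variable-length prefix, then discard it from the match with \K.
pattern = /\w+:\s*\K\d+/

matched  = line[pattern]             # only the part after \K is reported
replaced = line.sub(pattern, "321")  # only the part after \K is replaced
p matched
p replaced
```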
However, you do not even need that complicated pattern. (?=.*) is redundant, as it does not require anything: .* matches even an empty string. The positive lookbehind pattern will get triggered if there is a space or tab immediately to the left of the current location. The regex is equal to
.gsub(/(?<=[ \t])\d+/, "321")
where the pattern matches
(?<=[ \t]) - a location immediately preceded with a space/tab
\d+ - one or more digits.
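To see the two patterns side by side, here is a sketch that runs both on one sample line. The input string is an assumption; agreement on this one line illustrates the claim but does not prove it for every input.

```ruby
# Hypothetical input: digits after a space, after a tab, and after a letter.
line = "count: 42\t7 x9"

# The \K-based pattern and the simplified lookbehind quoted in the answer.
with_keep       = line.gsub(/(?: |\t*[a-zA-Z0-9_]+: |\t+)\K\d+(?=.*)/, "321")
with_lookbehind = line.gsub(/(?<=[ \t])\d+/, "321")

p with_keep
p with_lookbehind
```

In both cases the 9 after x is left alone, because it is not preceded by a space or a tab.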
Is there a bug in Ruby lookbehind assertions (1.9/2.0)?
This has been officially classified as a bug and subsequently fixed, together with another problem concerning \Z anchors in multiline strings.
Regex negative lookbehinds with a wildcard
You are thinking about it the right way. But unfortunately lookbehinds usually have to be of fixed length. The only major exception to that is .NET's regex engine, which allows repetition quantifiers inside lookbehinds. But since you only need a negative lookbehind and not a lookahead as well, there is a hack for you: reverse the string, then try to match:
/rab(?!.{0,10}oof)/
Then reverse the result of the match or subtract the matching position from the string's length, if that's what you are after.
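The reversal hack can be sketched as follows. The helper name and the sample strings are hypothetical; foo, bar and the 0..10 gap come from the pattern above.

```ruby
# Returns the index of a "bar" with no "foo" in the 0..10 characters
# before it, or nil if there is none. Hypothetical helper name.
def index_of_unguarded_bar(str)
  # In the reversed string the lookbehind becomes a lookahead, which may be
  # variable-length. The leftmost match here is the rightmost one in str.
  pos = str.reverse =~ /rab(?!.{0,10}oof)/
  return nil if pos.nil?

  # Map the position in the reversed string back to the original string.
  str.length - pos - "bar".length
end

near = index_of_unguarded_bar("foo 123 bar")           # foo is 5 chars before bar
far  = index_of_unguarded_bar("foo 0123456789ab bar")  # foo is 14 chars before bar
none = index_of_unguarded_bar("baz bar")               # no foo at all
p [near, far, none]
```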
Now from the regex you have given, I suppose that this was only a simplified version of what you actually need. Of course, if bar is a complex pattern itself, some more thought needs to go into how to reverse it correctly.
Note that if your pattern required both variable-length lookbehinds and lookaheads, you would have a harder time solving this. Also, in your case, it would be possible to deconstruct your lookbehind into multiple fixed-length ones (because you use neither + nor *):
/(?<!foo)(?<!foo.)(?<!foo.{2})(?<!foo.{3})(?<!foo.{4})(?<!foo.{5})(?<!foo.{6})(?<!foo.{7})(?<!foo.{8})(?<!foo.{9})(?<!foo.{10})bar/
But that's not all that nice, is it?
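If the long form is needed anyway, it does not have to be typed by hand. A sketch, assuming the same foo/bar pattern and a gap of up to 10 characters; the variable names and test strings are made up for illustration.

```ruby
max_gap = 10

# One fixed-length negative lookbehind per possible gap size (0..max_gap).
lookbehinds = (0..max_gap).map do |n|
  n.zero? ? "(?<!foo)" : "(?<!foo.{#{n}})"
end

pattern = Regexp.new(lookbehinds.join + "bar")

near = pattern.match?("foo 123 bar")           # foo within 10 chars: rejected
far  = pattern.match?("foo 0123456789ab bar")  # foo 14 chars away: accepted
none = pattern.match?("baz bar")               # no foo: accepted
p [near, far, none]
```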
Use of \K and lookahead not working as expected
The (?<=^|,)(?=,|$) pattern matches like this: the first match is at the start of the string, as it is followed with ,; the second match is between the second and the third comma; after checking the position after the second comma, the position after the third comma is checked, and the third match is found; the last match is at the end of the string, as expected, as there is a , followed with $ (end of string).
The behavior of the (^|,)\K(?=,|$) pattern differs between Ruby (Onigmo regex engine) and PCRE; you may easily check this at regex101.com. While in PCRE the \K construct matches the empty string/location right after the third comma, the Onigmo regex engine cannot match it, because the regex index is moved/set "manually" to skip the currently tested char if the match is an empty string. It means that after matching and consuming the second ,, the matched text is omitted, and then the regex engine is forced to jump to the location after the third comma. And that means that there is no way for the (^|,)\K(?=,|$) pattern to match between , and b.
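A minimal sketch of the lookbehind form filling in empty comma-separated fields. The sample string and the NULL placeholder are assumptions, not the original question's data.

```ruby
# Hypothetical CSV-like line with empty first, middle and last fields.
line = ",a,,b,"

# Zero-width match at every position that is preceded by the line start or
# a comma AND followed by a comma or the line end, i.e. every empty field.
filled = line.gsub(/(?<=^|,)(?=,|$)/, "NULL")
p filled
```

Because both assertions are zero-width, nothing is consumed, so adjacent empty fields do not interfere with one another.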