Is There a Bug in Ruby Lookbehind Assertions (1.9/2.0)

This has been officially classified as a bug and subsequently fixed, together with another problem concerning \Z anchors in multiline strings.

Issue with a Look-behind Regular expression (Ruby)

Lookbehind has restrictions:

  (?<=subexp)        look-behind
  (?<!subexp)        negative look-behind

Subexp of look-behind must be fixed character length.
But different character length is allowed in top level
alternatives only.
ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.

In negative-look-behind, captured group isn't allowed,
but shy group(?:) is allowed.

You cannot put alternatives in a non-top level within a (negative) lookbehind.

Put them at the top level. You also don't need to escape some characters that you did.

/(?<=href="|src=").*?"/
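As a quick check of the rule above, this sketch runs a lookbehind with top-level alternatives against a made-up HTML fragment. The `html` string is an assumption for illustration, not taken from the original question, and the tail of the pattern is changed to `[^"]*` so that the closing quote is left out of the match.

```ruby
# Hypothetical input; only the lookbehind structure comes from the answer above.
html = '<a href="page.html"><img src="pic.png">'

# Alternatives of different lengths are allowed at the top level of a lookbehind.
urls = html.scan(/(?<=href="|src=")[^"]*/)

p urls
```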

Why does free-spacing mode stop negative lookbehind from working?

/(?x)(?<!big )dog/.match('I have a big dog.')
#           ^

Note the space after big. In extended (free-spacing) mode, unescaped whitespace in the pattern is ignored, so the lookbehind is effectively (?<!big) and dog still matches.

You have some options:

  • Use a pattern such as \s or \p{Space}.
  • Use escaped whitespace, i.e. a space preceded by a
    backslash (\ ).
  • Use a character class such as [ ].

For example:

/(?x)(?<!big\s)dog/.match('I have a big dog.')
# => nil
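The three options can be compared side by side. This is a minimal sketch using the same sentence as above; the variable names are only for illustration.

```ruby
text = 'I have a big dog.'

# The unescaped space is dropped in extended mode, so this is really (?<!big)dog.
ignored_space = /(?x)(?<!big )dog/.match(text)

# Each of these keeps the space significant inside the lookbehind.
with_class_escape = /(?x)(?<!big\s)dog/.match(text)
with_backslash    = /(?x)(?<!big\ )dog/.match(text)
with_char_class   = /(?x)(?<!big[ ])dog/.match(text)

p ignored_space
p [with_class_escape, with_backslash, with_char_class]
```

The first pattern matches "dog" because the lookbehind no longer includes the space; the other three return nil.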

Why does `ObjectSpace._id2ref` give different outputs on Ruby 1.9 and Ruby 2.0?

Simply because the garbage collector in 2.0 is quicker to reclaim unreferenced objects. The error

# RangeError: 0x124af7c is recycled object

means that the object had already been garbage-collected by the time _id2ref was called.

Update: we can get close to the requested behaviour with a Mutex:

2.0.0 (main):0 > Mutex.new.synchronize {
2.0.0 (main):0 * class Foo ; end
2.0.0 (main):0 * id = Foo.new.singleton_class.object_id
2.0.0 (main):0 * puts id
2.0.0 (main):0 * puts ObjectSpace._id2ref(id)
2.0.0 (main):0 * }
# 23172260
# <Class:#<Foo:0x00000002c32970>>
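Another way to avoid the RangeError is to keep a live reference to the object, so that the garbage collector cannot reclaim it before the lookup. This is a sketch under that assumption, not something the answer above proposes; the variable names are hypothetical.

```ruby
class Foo; end

# Holding the object in a local variable keeps it reachable,
# so the garbage collector cannot recycle it.
foo = Foo.new
id = foo.object_id

GC.start # force a collection to show that the reference survives it

same = ObjectSpace._id2ref(id).equal?(foo)
puts same
```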

Multiline mode in Perl and Ruby different: Ruby is wrong?

Ruby and Perl's /m work differently.

Ruby's /m changes only the behavior of the dot (.). It is equivalent to Perl's /s.

  • Ruby /m: Treat a newline as a character matched by .

  • Perl /s: Treat the string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.

Perl's /m changes the behavior of ^ and $.

  • Perl /m: Treat the string being matched against as multiple lines. That is, change "^" and "$" from matching the start of the string's first line and the end of its last line to matching the start and end of each line within the string.

^ and $ always work this way in Ruby. Ruby effectively always has Perl's /m.

Ruby and Perl share \A, \z, and \Z, which match at the beginning of the string, at the very end of the string, and at the end of the string or just before a final newline, respectively.

Which is correct? Neither; each does its own thing. Perl's default behavior for ^ and $ is the same as POSIX regular expressions, but they are incompatible in other ways. Python uses the equivalent of Perl's multi-line and single-line modes (MULTILINE and DOTALL). Ruby simplifies the behavior of ^ and $ and makes regexes more explicit.
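The Ruby side of this comparison can be checked with a short sketch. The sample string and variable names are assumptions chosen for illustration.

```ruby
text = "first\nsecond\n"

# Without /m the dot does not match a newline; with /m it does (Perl's /s).
dot_default   = text =~ /first.second/
dot_multiline = text =~ /first.second/m

# ^ and $ always match at line boundaries in Ruby (Perl's /m behavior).
lines = text.scan(/^\w+$/)

# \A anchors to the start of the string, \z to the very end,
# and \Z to the end or just before a final newline.
at_string_start = text =~ /\Asecond/
before_final_nl = text =~ /second\Z/
at_very_end     = text =~ /second\z/

p dot_default, dot_multiline, lines
p at_string_start, before_final_nl, at_very_end
```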

See Also

  • Ruby Regexp Anchors
  • Ruby Regexp Options
  • Perl Regexp Metacharacters
  • Perl Regexp Modifiers Overview

How do I create a multiline regex?

I'm not exactly sure why, but changing the .* to .*? allows this to match.

Rubular: http://www.rubular.com/r/GaQj6cM0rk

It seems like it should match fine with .* as well, but for some reason it doesn't appear to be backtracking.

Here is the Rubular when .* is used instead: http://www.rubular.com/r/jKf0bDZi7T

Note that regardless of the reason for this behavior, you should be using .*? anyway, otherwise you would only find a single match from the beginning of the first block to the end of the last block (if there were multiple blocks in a string).
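The difference between .* and .*? across several blocks can be seen in a small sketch. The original pattern and input are not shown here, so the begin/end markers and the sample text below are hypothetical stand-ins.

```ruby
# Hypothetical input with two blocks delimited by "begin" and "end".
text = "begin\na\nend\nbegin\nb\nend\n"

# Greedy: .* runs to the last "end", so both blocks become one match.
greedy = text.scan(/begin.*end/m)

# Lazy: .*? stops at the first "end", giving one match per block.
lazy = text.scan(/begin.*?end/m)

p greedy.length
p lazy
```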

Why am I unable to grab all 4 matchdata from Regexp?

Use String#scan instead:

def convert_to_decimal(binary_string)
  octet1, octet2, octet3, octet4 = binary_string.scan(/\d{8}/)
  puts octet1
  puts octet2
  puts octet3
  puts octet4
end

convert_to_decimal('10000000001000000000101000000001')

output:

10000000
00100000
00001010
00000001

If you want to use MatchData#captures, modify the regular expression to contain four capture groups, because MatchData#captures returns an array of captured groups, not an array of matches:

octet1,octet2,octet3,octet4 =
/(\d{8})(\d{8})(\d{8})(\d{8})/.match(binary_string).captures
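Since the method is named convert_to_decimal, a natural next step is to turn each octet into a number. The sketch below assumes that goal, which the answer above does not state, and uses String#to_i with base 2.

```ruby
binary_string = '10000000001000000000101000000001'

# Without capture groups, MatchData#captures is empty and only the
# first 8 digits are matched.
first_only = /\d{8}/.match(binary_string)

# scan returns every non-overlapping match; to_i(2) parses base-2 digits.
octets = binary_string.scan(/\d{8}/).map { |octet| octet.to_i(2) }

p first_only.captures
p octets
puts octets.join('.')
```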

No line is printed out, in command box, when running a .Net Console app through Ruby system call

Something broke when the Ruby interpreter in SketchUp was upgraded to 2.0. We don't know what caused it. The current workaround is to redirect the output to a temp file and then read that file.

`C:/s/Test.exe > sometempfile.txt`
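The workaround could be wrapped in a small helper along these lines. This is only a sketch: run_and_capture is a hypothetical name, the temp file location is an assumption, and `echo hello` stands in for the real executable (for example C:/s/Test.exe) so that the code runs anywhere.

```ruby
require 'tmpdir'

# Hypothetical helper: runs a shell command with stdout redirected to a
# temporary file, then returns the file's contents.
def run_and_capture(command)
  path = File.join(Dir.tmpdir, "capture_#{Process.pid}.txt")
  system("#{command} > \"#{path}\"")
  File.read(path)
ensure
  # Clean up the temporary file whether or not the read succeeded.
  File.delete(path) if path && File.exist?(path)
end

output = run_and_capture('echo hello')
puts output
```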

Prevent this variable from changing

You're right. When you assign the array to bm, you're really assigning a reference to the same object as a[seed], not a copy of it. You can see this by printing out the object_id of both variables:

> a[seed].object_id
=> 70347648205960
> bm.object_id
=> 70347648205960

Note that they are pointing to the same internal object. The solution is to use dup to duplicate the array and assign the new one to bm2.

> bm2 = a[seed].dup
=> [1, 1, 1, 1]
> bm2.object_id
=> 70347649948520

Note the object_id has changed. And now if I make a change...

> a[seed][0] = 'WRONG'
=> "WRONG"
> a[seed]
=> ["WRONG", 1, 1, 1]
> bm
=> ["WRONG", 1, 1, 1]
> bm2
=> [1, 1, 1, 1]

You might want to read up on object_id, dup, and also clone, which is similar to dup but additionally copies the object's frozen state and singleton methods.
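The whole session can be reproduced as a standalone script. The contents of a and the value of seed are not shown in the answer, so the ones below are hypothetical; the last part adds a caveat that dup makes a shallow copy.

```ruby
a = { 7 => [1, 1, 1, 1] } # hypothetical data; the original a and seed are not shown
seed = 7

bm  = a[seed]     # same object as a[seed]
bm2 = a[seed].dup # new Array holding the same elements

a[seed][0] = 'WRONG'

p bm  # follows the change, because it is the same object
p bm2 # unaffected, because it is a separate Array

# dup is shallow: nested objects are still shared between the copies.
nested = [[1, 2], [3, 4]]
copy = nested.dup
copy[0] << 99
p nested
```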


