Using $1, $2, etc. Global Variables Inside Method Definition

Why is the output different?

A proc in Ruby has lexical scope: when it refers to a variable that is not defined inside it, the name is resolved in the context where the proc was defined, not where it is called. This explains the behavior of your code.
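A minimal sketch of this effect, assuming MRI (CRuby) behavior. The names run_gsub and demo_lexical_backref are hypothetical, chosen only for illustration. The proc reads $1 from the scope in which it was written, so it sees the earlier match on "xyz" rather than the match that gsub performs inside run_gsub:

```ruby
# Hypothetical helper: calls gsub with a block that was created elsewhere.
def run_gsub(callback)
  "hello".gsub(/(o)/, &callback)
end

# Hypothetical demo: returns what the proc observed as [m, $1] pairs.
def demo_lexical_backref
  seen = []
  callback = proc { |m| seen << [m, $1] }
  "xyz" =~ /(y)/       # sets $1 to "y" in *this* method's scope
  run_gsub(callback)   # gsub sets $1 to "o" inside run_gsub only
  seen
end

p demo_lexical_backref # => [["o", "y"]]
```

The block parameter m carries the current match ("o"), while $1 still belongs to the scope where the proc was created.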

The block is defined before the regexp is matched, which can be confusing. The problem involves one of Ruby's magic variables, which work quite differently from ordinary variables. Citing @JörgWMittag:

It's rather simple, really: the reason why $SAFE doesn't behave like you would expect from a global variable is because it isn't a global variable. It's a magic unicorn thingamajiggy.

There are quite a few of those magic unicorn thingamajiggies in Ruby, and they are unfortunately not very well documented (not at all documented, in fact), as the developers of the alternative Ruby implementations found out the hard way. These thingamajiggies all behave differently and (seemingly) inconsistently, and pretty much the only two things they have in common is that they look like global variables but don't behave like them.

Some have local scope. Some have thread-local scope. Some magically change without anyone ever assigning to them. Some have magic meaning for the interpreter and change how the language behaves. Some have other weird semantics attached to them.

If you really want to find out exactly how the $1 and $2 variables work, I assume the only "documentation" you will find is RubySpec, a spec for Ruby written the hard way by the Rubinius folks. Happy hacking, but be prepared for the pain.



Is there a way to pass a block to gsub from another context with $1, $2 variables setup the right way?

You can achieve what you want with the following modification (but I bet you already know that):

require 'pp'

def hello(z)
  # z = proc { |m| pp $1 }
  "hello".gsub(/(o)/, &z)
end

z = proc { |m| pp m }
hello(z)

I'm not aware of a way to change the scope of a proc on the fly. But would you really want to do this?

Declaring global variable inside a function

You could construct your var=value as a string and evaluate it using the bash builtin command eval.

setting a global within a proc

It's rather simple, really: the reason why $SAFE doesn't behave like you would expect from a global variable is because it isn't a global variable. It's a magic unicorn thingamajiggy.

There are quite a few of those magic unicorn thingamajiggies in Ruby, and they are unfortunately not very well documented (not at all documented, in fact), as the developers of the alternative Ruby implementations found out the hard way. These thingamajiggies all behave differently and (seemingly) inconsistently, and pretty much the only two things they have in common is that they look like global variables but don't behave like them.

Some have local scope. Some have thread-local scope. Some magically change without anyone ever assigning to them. Some have magic meaning for the interpreter and change how the language behaves. Some have other weird semantics attached to them.

$SAFE has almost all of the above: it is thread-local, meaning that if you change it in one thread, it doesn't affect other threads. It is local, meaning if you change it in a local scope (like a class, module, method or block), it doesn't affect the outer scope (as you have discovered). It has magic meaning for the interpreter, since setting it to a value different than 0 makes certain things not work. And it has additional weird semantics in that you can only ever increase its value, never decrease it.

Ruby: What does $1 mean?

According to Avdi Grimm from RubyTapas

$1 is a global variable which can be used in later code:

if "foobar" =~ /foo(.*)/
  puts "The matching word was #{$1}"
end

Output:

"The matching word was bar"

In short, $1, $2, ... are global variables set by some of the Ruby library methods, especially those dealing with regular expressions, so that programmers can use the match results in later code.

See this for more such variables available in Ruby.
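As a small sketch, the same capture can be read in three equivalent ways after a successful match. The method name first_capture_three_ways is hypothetical:

```ruby
# Hypothetical helper: returns the first capture read via $1, $~ and
# Regexp.last_match, or nil when the pattern does not match.
def first_capture_three_ways(text, pattern)
  return nil unless text =~ pattern
  [$1, $~[1], Regexp.last_match(1)]
end

p first_capture_three_ways("foobar", /foo(.*)/) # => ["bar", "bar", "bar"]
```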

How to pass Regexp.last_match to a block in Ruby

Here is a way as per the question (Ruby 2). It is not pretty, and is not quite 100% perfect in all aspects, but does the job.

def newsub(str, *rest, &bloc)
  str =~ rest[0]  # => ArgumentError if rest[0].nil?
  bloc.binding.tap do |b|
    b.local_variable_set(:_, $~)
    b.eval("$~=_")
  end if bloc
  str.sub(*rest, &bloc)
end

With this, the result is as follows:

_ = (/(xyz)/ =~ 'xyz')
p $1 # => "xyz"
p _ # => 0

p newsub("abcd", /ab(c)/, '\1') # => "cd"
p $1 # => "xyz"
p _ # => 0

p newsub("abcd", /ab(c)/){|m| $1} # => "cd"
p $1 # => "c"
p _ # => #<MatchData "abc" 1:"c">

v, _ = $1, newsub("efg", /ef(g)/){$1.upcase}
p [v, _] # => ["c", "G"]
p $1 # => "g"
p Regexp.last_match # => #<MatchData "efg" 1:"g">

In-depth analysis

In the method newsub defined above, when a block is given, the local variables $1 etc. in the caller's thread are (re)set after the block is executed, which is consistent with String#sub. However, when no block is given, $1 etc. are not reset, whereas String#sub always resets them regardless of whether a block is given.

Also, the caller's local variable _ is reset in this algorithm. In Ruby's convention, the local variable _ is used as a dummy variable and its value should not be read or referred to. Therefore, this should not cause any practical problems. If the statement local_variable_set(:$~, $~) was valid, no temporary local variables would be needed. However, it is not, in Ruby (as of Version 2.5.1 at least). See a comment (in Japanese) by Kazuhiro NISHIYAMA in [ruby-list:50708].
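If the calling code can be changed, a simpler alternative to rebinding $~ is to hand the MatchData to the block explicitly. This is a different technique from newsub above, sketched under the assumption that the caller is happy to use a block parameter instead of $1. The name sub_with_match is hypothetical:

```ruby
# Hypothetical wrapper: yields the MatchData instead of relying on $1
# being visible inside the caller's block.
def sub_with_match(str, pattern)
  # This inner block is written inside sub_with_match, so
  # Regexp.last_match here refers to the match String#sub just made.
  str.sub(pattern) { yield Regexp.last_match }
end

result = sub_with_match("abcd", /ab(c)/) { |md| md[1].upcase }
p result # => "Cd"
```

The trade-off is that the caller's own $1 is left untouched, so this does not mimic String#sub as closely as newsub does.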

General background (Ruby's specification) explained

Here is a simple example to highlight Ruby's specification related to this issue:

s = "abcd"
/b(c)/ =~ s
p $1      # => "c"
1.times do |i|
  p s     # => "abcd"
  p $1    # => "c"
end

The special variables $&, $1, $2, etc. (and the related $~ (Regexp.last_match), $' and the like) work in the local scope. In Ruby, a block inherits the variables of the scope in which it is written. In the example above, the variable s is inherited, and so is $1. The do block is yield-ed by 1.times, and the method 1.times has no control over the variables inside the block except for the block parameters (i in the example above, which Integer#times sets to the iteration index).

This means a method that yield-s a block has no control over $1, $2, etc in the block, which are local variables (even though they may look like global variables).

Case of String#sub

Now, let us analyse how String#sub with the block works:

'abc'.sub(/.(.)./){ |m| $1 }

Here, the method sub first performs a Regexp match, and hence the local variables like $1 are automatically set. The block then sees those variables, because the block is written in the same scope as the call to sub. They are not passed from sub to the block, unlike the block parameter m (which is the matched String, equivalent to $&).

For that reason, if a sub-like method is defined in a different scope from the block, that method has no control over local variables inside the block, including $1. A different scope here means any method written and defined in Ruby code; in practice, that is every Ruby method except some of those implemented not in Ruby but in the language used to write the Ruby interpreter itself.

Ruby's official document (Ver.2.5.1) explains in the section of String#sub:

In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately.

Correct. In practice, the methods that can and do set the Regexp-match-related special variables such as $1, $2, etc. are limited to some built-in methods, including Regexp#match, Regexp#=~, Regexp#===, String#=~, String#sub, String#gsub, String#scan, Enumerable#all?, and Enumerable#grep.

Tip 1: String#split seems to always reset $~ to nil.

Tip 2: Regexp#match? and String#match? do not update $~ and hence are much faster.
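Tip 2 can be checked with a short sketch, assuming a Ruby version that provides String#match? (2.4 or later). Both method names are hypothetical:

```ruby
# Hypothetical demo: match? leaves the earlier $1 untouched.
def backref_after_match_p
  "seed" =~ /(e+)/      # $1 is now "ee"
  "abc".match?(/(b)/)   # succeeds, but does not update $~
  $1
end

# Hypothetical demo: match (without the question mark) overwrites $1.
def backref_after_match
  "seed" =~ /(e+)/      # $1 is now "ee"
  "abc".match(/(b)/)    # updates $~, so $1 becomes "b"
  $1
end

p backref_after_match_p # => "ee"
p backref_after_match   # => "b"
```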

Here is a little code snippet to highlight how the scope works:

def sample(str, *rest, &bloc)
  str.sub(*rest, &bloc)
  $1  # non-nil if matches
end

sample('abc', /(c)/){} # => "c"
p $1 # => nil

Here, $1 in the method sample() is set by str.sub in the same scope. That implies the method sample() would not be able to (simply) refer to $1 in the block given to it.

I would point out that this statement in the Regular expression section of Ruby's official documentation (Ver. 2.5.1)

Using =~ operator with a String and Regexp the $~ global variable is set after a successful match.

is rather misleading, because

  1. $~ is a pre-defined local-scope variable (not global variable), and
  2. $~ is set (maybe nil) regardless of whether the last attempted match is successful or not.
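The second point can be illustrated with a small sketch. The method name backref_after_failed_match is hypothetical:

```ruby
# Hypothetical demo: a failed match resets $~ (and therefore $1) to nil.
def backref_after_failed_match
  "abc" =~ /(b)/
  before = $1           # "b" after the successful match
  "abc" =~ /(z)/        # no match
  [before, $~, $1]
end

p backref_after_failed_match # => ["b", nil, nil]
```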

The fact the variables like $~ and $1 are not global variables may be slightly confusing. But hey, they are useful notations, aren't they?

How to explain $1,$2 in JavaScript when using regular expression?

It's not a "variable" - it's a placeholder that is used in the .replace() call. $n represents the nth capture group of the regular expression.

var num = "11222333";

// This regex captures the last 3 digits as capture group #2
// and all preceding digits as capture group #1
var re = /(\d+)(\d{3})/;

console.log(re.test(num));

// This replace call replaces the match of the regex (which happens
// to match everything) with the first capture group ($1) followed by
// a comma, followed by the second capture group ($2)
console.log(num.replace(re, "$1,$2"));

What are Ruby's numbered global variables

They're captures from the most recent pattern match (just as in Perl; Ruby initially lifted a lot of syntax from Perl, although it's largely gotten over it by now :). $1, $2, etc. refer to parenthesized captures within a regex: given /a(.)b(.)c/, $1 will be the character between a and b and $2 the character between b and c. $` and $' mean the strings before and after the string that matched the entire regex (which is itself in $&), respectively.
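A short sketch of these variables for the pattern /a(.)b(.)c/ mentioned above. The method name match_parts and the sample input are hypothetical:

```ruby
# Hypothetical helper: returns [pre-match, match, post-match, $1, $2],
# or nil when the pattern does not match.
def match_parts(text)
  return nil unless text =~ /a(.)b(.)c/
  [$`, $&, $', $1, $2]
end

p match_parts("xxa1b2cyy") # => ["xx", "a1b2c", "yy", "1", "2"]
```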

There is actually some sense to these, if only historically; you can find it in perldoc perlvar, which generally does a good job of documenting the intended mnemonics and history of Perl variables, and mostly still applies to the globals in Ruby. The numbered captures are replacements for the capture backreference regex syntax (\1, \2, etc.); Perl switched from the backreference syntax to the numbered captures somewhere in the 3.x versions, because using the backreference syntax outside of the regex complicated parsing too much. (By the time Perl 5 rolled around, the parser had been sufficiently rewritten that the syntax was again available, and promptly reused for references/"pointers". Ruby opted for using a name-quote : instead, which is closer to the Lisp and Smalltalk style; since Ruby started out as a Perl-alike with Smalltalk-style OO, this made more sense linguistically.) The same applies to $&, which in historical regex syntax is simply & (but you can't use that outside the replacement part of a substitution, so it became a variable $& instead). $` and $' are both "cutesy": "back-quote" and "forward-quote" from the matched string.

$1 and \1 in Ruby

\1 is a backreference which will only work in the same sub or gsub method call, e.g.:

"foobar".sub(/foo(.*)/, '\1\1') # => "barbar"

$1 is a global variable which can be used in later code:

if "foobar" =~ /foo(.*)/
  puts "The matching word was #{$1}"
end

Output:

"The matching word was bar"
# => nil
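The difference matters most in replacement strings. The following sketch (all three method names are hypothetical) contrasts the '\1' form, the block form, and a common pitfall where "#{$1}" is interpolated before sub even runs:

```ruby
# '\1' is expanded by sub itself, using the current match.
def doubled_with_backslash(text)
  text.sub(/foo(.*)/, '\1\1')
end

# In block form, $1 is read after sub has performed the match.
def doubled_with_block(text)
  text.sub(/foo(.*)/) { $1 * 2 }
end

# Pitfall: the double-quoted string is built *before* sub is called,
# so it contains $1 from the earlier, unrelated match.
def interpolated_too_early(text)
  "seed" =~ /(e+)/               # $1 is "ee"
  text.sub(/foo(.*)/, "#{$1}")
end

p doubled_with_backslash("foobar") # => "barbar"
p doubled_with_block("foobar")     # => "barbar"
p interpolated_too_early("foobar") # => "ee"
```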

How is Ruby /lib/time.rb thread-safe?

$1, $2, etc. are special global variables that hold regular expression match results. Internally they are thread-local, and hence code that uses them is thread-safe.

Here is an excerpt from the Ruby documentation:

Special global variables


Pattern matching sets some global variables :

  • $~ is equivalent to ::last_match;
  • $& contains the complete matched text;
  • $` contains string before match;
  • $' contains string after match;
  • $1, $2 and so on contain text matching first, second, etc capture group;
  • $+ contains last capture group.

These global variables are thread-local and method-local variables.

Example below:

Thread.new {
  "A B C".match(/(\w)/)
  p $1 # Prints "A"

  Thread.new {
    "X Y Z".match(/(\w)/)
    p $1 # Prints "X"
  }.join

  p $1 # Prints "A", shows that $1 was not corrupted by inner thread
}.join

