Ruby Global Match Regexp

Ruby global match regexp?

You can use the scan method. The scan method will either give you an array of all the matches or, if you pass it a block, pass each match to the block.

"hello1 hello2".scan(/(hello\d+)/)   # => [["hello1"], ["hello2"]]

"hello1 hello2".scan(/(hello\d+)/) do |m|
  puts m
end
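Here is a small sketch of both forms of scan on the same string, with no capture group. The names text, flat, seen and returned are only illustrative. Without a group, scan returns plain strings rather than one-element arrays. With a block, scan yields each match and then returns the original string.

```ruby
text = "hello1 hello2"

# No capture group: scan returns the matched strings themselves
flat = text.scan(/hello\d+/)   # => ["hello1", "hello2"]

# Block form: scan yields each match to the block and returns the receiver
seen = []
returned = text.scan(/hello\d+/) { |m| seen << m }
# seen                    => ["hello1", "hello2"]
# returned.equal?(text)   => true
```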

I've written about this method; you can read more about it near the end of my article.

Ruby regex without global flag

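It looks like you just need either ^ and $ at the beginning and end of your regex, or, better yet, \A and \Z to mark the beginning and end of the entire string. ^ and $ only work as long as the string is a single line, because in Ruby they match at every line boundary. Strictly speaking, \Z also tolerates one trailing newline; use \z if nothing at all may follow.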

This tells Ruby that the pattern must match the input from beginning to end. Also, the 'i' at the end isn't necessary and may yield incorrect results.

The following modified regex will work.

myregex = /\A(https|http):\/\/(kwagmire)\.(com)\/(embed)\/([a-zA-Z0-9]+)\/?\Z/

Note that besides \A and \Z, I also added \/?, which allows an optional / at the end of the URL. I also removed the i at the end because you don't actually want the entire regex to be case-insensitive. The last part, ([a-zA-Z0-9]+), already accepts both lower- and upper-case letters because it lists both a-z and A-Z.

myregex.match("http://kwagmire.com/embed/1QgJVmCa/?onload(alert('asdfadsf'))")
# => nil

myregex.match("http://kwagmire.com/embed/1QgJVmCam/")
# => #<MatchData "http://kwagmire.com/embed/1QgJVmCam/" 1:"http" 2:"kwagmire" 3:"com" 4:"embed" 5:"1QgJVmCam">
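The choice of anchors matters, and the following sketch shows why. It uses simplified versions of the pattern: https? replaces (https|http) and the capture groups are dropped. The variable names and the injected input are made up for the example. In Ruby, ^ and $ are line anchors, so a second line can satisfy them. \Z still accepts one trailing newline, and only \z rejects it.

```ruby
url = "http://kwagmire.com/embed/1QgJVmCam/"

# Simplified variants of the pattern above (no capture groups)
line_anchored = /^https?:\/\/kwagmire\.com\/embed\/[a-zA-Z0-9]+\/?$/
loose         = /\Ahttps?:\/\/kwagmire\.com\/embed\/[a-zA-Z0-9]+\/?\Z/
strict        = /\Ahttps?:\/\/kwagmire\.com\/embed\/[a-zA-Z0-9]+\/?\z/

injected = "javascript:alert(1)\n" + url

line_hit     = line_anchored.match?(injected) # true: ^ matches at the start of line 2
loose_hit    = loose.match?(injected)         # false: \A is the start of the whole string
loose_nl     = loose.match?(url + "\n")       # true: \Z allows one trailing newline
strict_nl    = strict.match?(url + "\n")      # false: \z allows nothing after the URL
strict_plain = strict.match?(url)             # true
```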

How to match all occurrences of a regex

Using scan should do the trick:

string.scan(/regex/)
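Here is a concrete sketch with a made-up input. When the regex has several capture groups, scan returns one array of captures per match. Inside scan's block, $~ holds the MatchData for the current match. That is useful when you also need positions.

```ruby
text = "a=1, b=2"

# Two groups: each element is [key, value]
pairs = text.scan(/(\w+)=(\d+)/)   # => [["a", "1"], ["b", "2"]]
as_hash = pairs.to_h               # => {"a"=>"1", "b"=>"2"}

# In the block form, $~ is the MatchData of the match being yielded
offsets = []
text.scan(/(\w+)=(\d+)/) { offsets << $~.begin(0) }
# offsets => [0, 5]
```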

Using Regex Global Variable with Ruby gsub

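Use '\1' instead of $1. $1 is evaluated before gsub runs, so it refers to a previous match (or is nil), not to the match gsub is about to make. Keep the single quotes: in a double-quoted string, "\1" is an octal escape for a control character, so you would have to write "\\1".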

Also, "my regexp isn't working" makes it difficult to help. A better phrase would be one which explains why it isn't working (string is same afterwards, or an error is raised, or whatever), and provides the data (string and regex) necessary to reproduce the problem.

str = "abcdefg"
str.gsub!(/a(.)c/, '\1')
str # => "bdefg"
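Here is a short sketch of the quoting rules and of the block form, reusing the same string. Non-bang gsub is used so that str is not modified. In the block form, gsub runs the block after each match, so $1 inside the block is the capture of the current match.

```ruby
str = "abcdefg"

# Single quotes keep the backslash, so gsub receives the two characters \1
single = str.gsub(/a(.)c/, '\1')          # => "bdefg"

# In double quotes the backslash itself must be escaped
double = str.gsub(/a(.)c/, "\\1")         # => "bdefg"

# Unescaped, "\1" is an octal escape: a one-character control string
control = "\1"                            # => "\u0001"

# Block form: $1 refers to the current match inside the block
block = str.gsub(/a(.)c/) { $1.upcase }   # => "Bdefg"
```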

Ruby RegExp - Match all CSS selectors and directives

You may use

^(?!.*@media) *[a-zA-Z#.:*][^{]*{[\s\S]*?}


Details

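  • ^ - start of a line (in Ruby, ^ matches after every newline, not only at the start of the string)
  • (?!.*@media) - no @media allowed after any 0+ chars other than line break chars
  • " *" (a space followed by *) - 0+ spaces
  • [a-zA-Z#.:*] - a letter, #, ., : or *
  • [^{]* - zero or more chars other than {
  • { - a { char
  • [\s\S]*? - 0+ chars of any kind, including line breaks, as few as possible
  • } - a } char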
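Here is a minimal sketch of the pattern in use with scan. The CSS input is made up. The braces are escaped here as \{ and \} only to make it explicit that they are literal characters. Because ^ is a line anchor in Ruby, each rule that starts a line is a candidate. The @media line is skipped because the lookahead sees @media on that same line.

```ruby
css = "body { color: red; }\n" \
      "@media print { display: none; }\n" \
      "#main, .nav:hover {\n  margin: 0;\n}\n"

# Same pattern as above, with the literal braces escaped
selector_re = /^(?!.*@media) *[a-zA-Z#.:*][^{]*\{[\s\S]*?\}/

rules = css.scan(selector_re)
# => ["body { color: red; }", "#main, .nav:hover {\n  margin: 0;\n}"]
```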

How to pass Regexp.last_match to a block in Ruby

Here is a way as per the question (Ruby 2). It is not pretty, and is not quite 100% perfect in all aspects, but does the job.

def newsub(str, *rest, &bloc)
  str =~ rest[0] # sets $~ here; if rest is empty, str.sub below raises ArgumentError
  bloc.binding.tap do |b|
    b.local_variable_set(:_, $~)
    b.eval("$~=_")
  end if bloc
  str.sub(*rest, &bloc)
end

With this, the result is as follows:

_ = (/(xyz)/ =~ 'xyz')
p $1 # => "xyz"
p _ # => 0

p newsub("abcd", /ab(c)/, '\1') # => "cd"
p $1 # => "xyz"
p _ # => 0

p newsub("abcd", /ab(c)/){|m| $1} # => "cd"
p $1 # => "c"
p _ # => #<MatchData "abc" 1:"c">

v, _ = $1, newsub("efg", /ef(g)/){$1.upcase}
p [v, _] # => ["c", "G"]
p $1 # => "g"
p Regexp.last_match # => #<MatchData "efg" 1:"g">

In-depth analysis

In the above-defined method newsub, when a block is given, the local variables $1 etc. in the caller's scope are (re)set and remain set after the call, which is consistent with String#sub. However, when a block is not given, $1 etc. are not reset, whereas String#sub always resets them, whether or not a block is given.

Also, this method overwrites the caller's local variable _. By Ruby convention, _ is used as a dummy variable whose value should not be read or referred to, so this should not cause any practical problems. If the statement local_variable_set(:$~, $~) were valid, no temporary local variable would be needed. However, it is not valid in Ruby (as of version 2.5.1 at least). See a comment (in Japanese) by Kazuhiro NISHIYAMA in [ruby-list:50708].
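The restriction can be checked directly. In this sketch, Binding#local_variable_set accepts only local variable names, so a special variable such as $~ is rejected with a NameError. An ordinary name such as _ is accepted, and that is why newsub routes the MatchData through _. The names b, error and value are illustrative.

```ruby
b = binding

# Special variables are not local variable names, so this raises NameError
error =
  begin
    b.local_variable_set("$~".to_sym, nil)
    nil
  rescue NameError => e
    e
  end

# A plain local name works (the variable is created in the binding)
b.local_variable_set(:_, 42)
value = b.local_variable_get(:_)   # => 42
```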

General background (Ruby's specification) explained

Here is a simple example to highlight Ruby's specification related to this issue:

s = "abcd"
/b(c)/ =~ s
p $1 # => "c"
1.times do |i|
  p s # => "abcd"
  p $1 # => "c"
end

The special variables $&, $1, $2, etc. (and the related $~ (Regexp.last_match), $' and the like) are local to the scope in which the match happened. In Ruby, a block's scope inherits the local variables of the same names from its enclosing scope. In the example above, the variable s is inherited, and so is $1.
The do block is yielded by 1.times, and the method Integer#times has no control over the variables inside the block except for the block parameters (i in the example above, which times sets to the iteration index).

This means a method that yield-s a block has no control over $1, $2, etc in the block, which are local variables (even though they may look like global variables).
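The same point can be shown from the method's side. In this sketch, with_internal_match is a hypothetical helper. It performs its own match before yielding, but that match only sets $1 in the method's own frame. The block still sees the caller's $1.

```ruby
# Hypothetical helper: matches internally, then yields
def with_internal_match
  "xyz" =~ /(y)/   # sets $1 only in this method's frame
  inner = $1
  [inner, yield]   # the block reads $1 from the caller's scope
end

"abc" =~ /(b)/
inner, seen_in_block = with_internal_match { $1 }
# inner         => "y"
# seen_in_block => "b"
```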

Case of String#sub

Now, let us analyse how String#sub with the block works:

'abc'.sub(/.(.)./){ |m| $1 }

Here, the method sub first performs a Regexp match, and hence the local variables like $1 are automatically set. Then they are inherited by the block, because this block is in the same scope as the method sub. They are not passed from sub to the block, unlike the block parameter m (which is the matched String, equivalent to $&).

For that reason, if the method sub is defined in a different scope from the block, it has no control over local variables inside the block, including $1. A different scope means the case where the sub method is written in Ruby code. In practice, that covers all Ruby methods except some of those implemented not in Ruby but in the language the interpreter itself is written in (C, for the standard interpreter).
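Here is a minimal sketch of the block form of String#sub described above. m is the whole match (the same as $&), while $1 is the first capture group, which sub sets in the caller's scope. $1 is still set after the call returns.

```ruby
result = 'abc'.sub(/.(.)./) { |m| "#{m}->#{$1}" }
captured_after = $1
# result         => "abc->b"
# captured_after => "b"
```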

Ruby's official document (Ver.2.5.1) explains in the section of String#sub:

In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately.

Correct. In practice, the methods that can and do set the Regexp-match-related special variables such as $1, $2, etc. are limited to some built-in methods, including Regexp#match, Regexp#=~, Regexp#===, String#=~, String#sub, String#gsub, String#scan, Enumerable#all?, and Enumerable#grep.

Tip 1: String#split seems to always reset $~ to nil.

Tip 2: Regexp#match? and String#match? do not update $~ and hence are much faster.
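Here is a quick check of Tip 2. A match? call succeeds but leaves $~, and hence $1, exactly as the previous =~ left it.

```ruby
"abc" =~ /(b)/                # =~ sets $~ (and hence $1)
before = $1                   # "b"
found = "xyz".match?(/(y)/)   # true, but $~ is left untouched
after = $1                    # still "b"
```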

Here is a little code snippet to highlight how the scope works:

def sample(str, *rest, &bloc)
  str.sub(*rest, &bloc)
  $1 # non-nil if matches
end

sample('abc', /(c)/){} # => "c"
p $1 # => nil

Here, $1 in the method sample() is set by str.sub in the same scope. That implies the method sample() would not be able to (simply) refer to $1 in the block given to it.

Note that the following statement in the Regular Expressions section of Ruby's official documentation (Ver. 2.5.1)

Using =~ operator with a String and Regexp the $~ global variable is set after a successful match.

is rather misleading, because

  1. $~ is a pre-defined local-scope variable (not global variable), and
  2. $~ is set (possibly to nil) regardless of whether the last attempted match is successful or not.
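Here is a minimal check of point 2. A failed =~ does not leave the previous MatchData in place; it resets $~ to nil.

```ruby
"abc" =~ /b/
after_success = $~   # a MatchData object
"abc" =~ /z/
after_failure = $~   # nil: the failed match reset $~
```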

The fact the variables like $~ and $1 are not global variables may be slightly confusing. But hey, they are useful notations, aren't they?

Regex - Match all words in parenthesis

Use scan() instead. It returns an array with all the matches. match() will only return the first match.

"(e), (f), and (g)".scan(/\(\w+\)/)
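Here is what that returns, plus a variant that drops the parentheses. With a capture group, each element of scan's result is an array of captures, so flatten gives plain words. match, by contrast, stops at the first hit.

```ruby
text = "(e), (f), and (g)"

with_parens = text.scan(/\(\w+\)/)        # => ["(e)", "(f)", "(g)"]
words = text.scan(/\((\w+)\)/).flatten    # => ["e", "f", "g"]
first_only = text.match(/\(\w+\)/)[0]     # => "(e)"
```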

Replace all words which don't match a RegExp pattern in Ruby

From your example it appears that you want to replace all words with 'foo' except words that contain 's'; namely, 'string', 'surrounded' and 'quotes'. For that you can simplify /(.+)?s/ to /s/ (e.g., 'beeswax'.match?(/s/) #=> true).

It's best to use String#gsub on the entire string, as it preserves extra spaces between words. If one instead splits the string on spaces, substitutes for each word in the resulting array, and then joins those elements to form a new string, the extra spaces will be removed. For example, if one is old-school and inserts two spaces between sentences, we might have the following.

str = "Hello, I use a string of words, surrounded by quotes.  So there."

and want to preserve the two spaces following the period in the resulting string. Moreover, splitting on spaces and then joining the modified words creates an unnecessary array.

Suppose we wish to replace words that do not contain 's' or 'S' with 'foo'. Words that contain 's' or 'S' match the regular expression

r = /s/i

We may then write:

str.gsub(/\w+/) { |s| s.match?(r) ? s : 'foo' }
#=> "foo, foo use foo string foo words, surrounded foo quotes.  So foo."

gsub's argument is a regular expression that matches words.

Consider a second example. Suppose we wish to replace all words that neither begin nor end with 's' or 'S' with 'foo'; that is, words that do not match the regular expression

r = /\As|s\z/i

We can do that in the same way:

str.gsub(/\w+/) { |s| s.match?(r) ? s : 'foo' }
#=> "foo, foo foo foo string foo words, surrounded foo quotes.  So foo."
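To make the point about splitting concrete, here is a sketch that compares a split/join version, the approach advised against above, with the gsub version, using the first rule (keep words containing 's' or 'S'). via_split and via_gsub are illustrative names. The split version collapses the double space after "quotes.". In this naive form, it also treats punctuation as part of each word, so "Hello," and "there." lose their punctuation.

```ruby
str = "Hello, I use a string of words, surrounded by quotes.  So there."
r = /s/i

# Split on whitespace, replace each chunk, join with single spaces
via_split = str.split(' ').map { |w| w.match?(r) ? w : 'foo' }.join(' ')
# => "foo foo use foo string foo words, surrounded foo quotes. So foo"

# Replace only the \w+ runs; everything between them is kept verbatim
via_gsub = str.gsub(/\w+/) { |s| s.match?(r) ? s : 'foo' }
# => "foo, foo use foo string foo words, surrounded foo quotes.  So foo."
```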

