Regex that matches valid Ruby local variable names
Identifiers are pretty straightforward. They begin with a letter or an underscore, and contain letters, underscores and digits. A local variable can't begin with an uppercase ASCII letter (Ruby parses such a name as a constant), so you could just use a regex like this:
/^[a-z_][a-zA-Z_0-9]*$/
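One caveat, offered as a sketch rather than a correction of the answer above: in Ruby, ^ and $ anchor at line boundaries, so a multi-line string can slip through the check. Anchoring with \A and \z avoids that. The constant and method names below are made up for illustration.

```ruby
# Hypothetical helper: LOCAL_VARIABLE_NAME and local_variable_name? are
# illustrative names, not part of the original answer.
LOCAL_VARIABLE_NAME = /\A[a-z_][a-zA-Z_0-9]*\z/

def local_variable_name?(candidate)
  LOCAL_VARIABLE_NAME.match?(candidate)
end

p local_variable_name?("foo_bar1")   # => true
p local_variable_name?("Foo")        # => false
p local_variable_name?("foo\n1 + 1") # => false

# The line-anchored pattern accepts the same multi-line string,
# because $ matches just before the newline:
p "foo\n1 + 1".match?(/^[a-z_][a-zA-Z_0-9]*$/) # => true
```

Note that neither pattern rejects reserved words such as if or end.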
RegEx matching variable names but not string values
I interpret variable names as all word character sequences with a min length of 1 and starting with a letter. Your regexp was almost correct then:
^[A-Za-z]\w*$
What is the regular expression to find $name from abc $name,efg in Ruby code?
Just call match() on your regex:
/\$name/.match(yourString)
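As a small sketch of what that call gives back, using the sample text from the question. The helper name dollar_names and the broader pattern \$\w+ (any $-prefixed word, not only $name) are assumptions added for illustration.

```ruby
line = 'abc $name,efg'

match = /\$name/.match(line)
p match[0]       # => "$name"
p match.begin(0) # => 4

# Hypothetical helper: collects every $-prefixed word in the text.
def dollar_names(text)
  text.scan(/\$\w+/)
end

p dollar_names(line) # => ["$name"]
```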
Ruby regular expression using variable name
The code you think doesn't work, does:
var = "Value"
str = "a test Value"
p str.gsub( /#{var}/, 'foo' ) # => "a test foo"
Things get more interesting if var can contain regular expression metacharacters. If it does and you want those metacharacters to do what they usually do in a regular expression, then the same gsub will work:
var = "Value|a|test"
str = "a test Value"
str.gsub( /#{var}/, 'foo' ) # => "foo foo foo"
However, if your search string contains metacharacters and you do not want them interpreted as metacharacters, then use Regexp.escape like this:
var = "*This*"
str = "*This* is a string"
p str.gsub( /#{Regexp.escape(var)}/, 'foo' )
# => "foo is a string"
Or just give gsub a string instead of a regular expression. In MRI >= 1.8.7, gsub will treat a string pattern argument as a plain string, not a regular expression:
var = "*This*"
str = "*This* is a string"
p str.gsub(var, 'foo' ) # => "foo is a string"
(It used to be that a string pattern argument to gsub was automatically converted to a regular expression. I know it was that way in 1.6. I don't recall which version introduced the change.)
As noted in other answers, you can use Regexp.new as an alternative to interpolation:
var = "*This*"
str = "*This* is a string"
p str.gsub(Regexp.new(Regexp.escape(var)), 'foo' )
# => "foo is a string"
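To make the role of Regexp.escape concrete, here is a short sketch of what it returns for the string used above, and what happens when the escaping is left out. Regexp.union is shown as a further option; it is not mentioned in the answer itself.

```ruby
var = "*This*"

# Each metacharacter gets a backslash in front of it.
p Regexp.escape(var) # => "\\*This\\*"

# Without escaping, the leading * has nothing to repeat.
begin
  Regexp.new(var)
rescue RegexpError => e
  p e.class # => RegexpError
end

# Regexp.union escapes plain strings on its own.
p Regexp.union(var).source # => "\\*This\\*"
```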
How to pass Regexp.last_match to a block in Ruby
Here is a way as per the question (Ruby 2). It is not pretty, and is not quite 100% perfect in all aspects, but does the job.
def newsub(str, *rest, &bloc)
  str =~ rest[0] # => ArgumentError if rest[0].nil?
  bloc.binding.tap do |b|
    b.local_variable_set(:_, $~)
    b.eval("$~=_")
  end if bloc
  str.sub(*rest, &bloc)
end
With this, the result is as follows:
_ = (/(xyz)/ =~ 'xyz')
p $1 # => "xyz"
p _ # => 0
p newsub("abcd", /ab(c)/, '\1') # => "cd"
p $1 # => "xyz"
p _ # => 0
p newsub("abcd", /ab(c)/){|m| $1} # => "cd"
p $1 # => "c"
p _ # => #<MatchData "abc" 1:"c">
v, _ = $1, newsub("efg", /ef(g)/){$1.upcase}
p [v, _] # => ["c", "G"]
p $1 # => "g"
p Regexp.last_match # => #<MatchData "efg" 1:"g">
In-depth analysis
In the above-defined method newsub, when a block is given, the local variables $1 etc. in the caller's thread are (re)set after the block is executed, which is consistent with String#sub. However, when a block is not given, the local variables $1 etc. are not reset, whereas in String#sub, $1 etc. are always reset regardless of whether a block is given or not.
Also, the caller's local variable _ is reset in this algorithm. In Ruby's convention, the local variable _ is used as a dummy variable and its value should not be read or referred to. Therefore, this should not cause any practical problems. If the statement local_variable_set(:$~, $~) were valid, no temporary local variables would be needed. However, it is not, in Ruby (as of Version 2.5.1 at least). See a comment (in Japanese) by Kazuhiro NISHIYAMA in [ruby-list:50708].
General background (Ruby's specification) explained
Here is a simple example to highlight Ruby's specification related to this issue:
s = "abcd"
/b(c)/ =~ s
p $1 # => "c"
1.times do |i|
  p s # => "abcd"
  p $1 # => "c"
end
The special variables $&, $1, $2, etc. (and the related $~ (Regexp.last_match), $' and the like) work in the local scope. In Ruby, a local scope inherits the variables of the same names in the parent scope.
In the example above, the variable s is inherited, and so is $1.
The do block is yield-ed by 1.times, and the method 1.times has no control over the variables inside the block except for the block parameters (i in the example above, which receives the iteration index).
This means a method that yield-s a block has no control over $1, $2, etc. in the block, which are local variables (even though they may look like global variables).
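The point that $1 is local rather than global can be checked with a minimal sketch. The method names digits_in and caller_view are invented for this illustration: a match performed inside one method leaves the caller's $1 untouched.

```ruby
# Hypothetical method: performs its own match and returns its own $1.
def digits_in(text)
  /(\d+)/ =~ text # sets $~ and $1 for this method's frame only
  $1
end

# Hypothetical caller: matches first, then calls digits_in.
def caller_view
  /(\w+)/ =~ "hello"
  inner = digits_in("id 42")
  [inner, $1] # $1 here still refers to the caller's own match
end

p caller_view # => ["42", "hello"]
```

If $1 were a true global, the second element would be "42" as well.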
Case of String#sub
Now, let us analyse how String#sub with the block works:
'abc'.sub(/.(.)./){ |m| $1 }
Here, the method sub first performs a Regexp match, and hence the local variables like $1 are automatically set. Then, they (the variables like $1) are inherited in the block, because this block is in the same scope as the method "sub". They are not passed from sub to the block, being different from the block parameter m (which is a matched String, or equivalent to $&).
For that reason, if the method sub is defined in a different scope from the block, the sub method has no control over local variables inside the block, including $1. A different scope means the case where the sub method is written and defined in Ruby code, or in practice, all the Ruby methods except some of those written not in Ruby but in the same language as used to write the Ruby interpreter.
Ruby's official document (Ver.2.5.1) explains in the section of String#sub:
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately.
Correct. In practice, the methods that can and do set the Regexp-match-related special variables such as $1, $2, etc. are limited to some built-in methods, including Regexp#match, Regexp#=~, Regexp#===, String#=~, String#sub, String#gsub, String#scan, Enumerable#all?, and Enumerable#grep.
Tip 1: String#split seems to always reset $~ to nil.
Tip 2: Regexp#match? and String#match? do not update $~ and hence are much faster.
Here is a little code snippet to highlight how the scope works:
def sample(str, *rest, &bloc)
  str.sub(*rest, &bloc)
  $1 # non-nil if matches
end
sample('abc', /(c)/){} # => "c"
p $1 # => nil
Here, $1 in the method sample() is set by str.sub in the same scope. That implies the method sample() would not be able to (simply) refer to $1 in the block given to it.
I point out that the following statement in the section of Regular expression of Ruby's official document (Ver.2.5.1) is rather misleading:
Using =~ operator with a String and Regexp the $~ global variable is set after a successful match.
It is misleading because
$~ is a pre-defined local-scope variable (not global variable), and
$~ is set (maybe nil) regardless of whether the last attempted match is successful or not.
The fact that $~ and $1 are not global variables may be slightly confusing. But hey, they are useful notations, aren't they?
Ruby Regexp group matching, assign variables on 1 line
You don't want scan for this, as it makes little sense. You can use String#match, which will return a MatchData object; you can then call #captures to return an Array of captures. Something like this:
#!/usr/bin/env ruby
string = "RyanOnRails: This is a test"
one, two, three = string.match(/(^.*)(:)(.*)/i).captures
p one #=> "RyanOnRails"
p two #=> ":"
p three #=> " This is a test"
Be aware that if no match is found, String#match will return nil, so something like this might work better:
if match = string.match(/(^.*)(:)(.*)/i)
  one, two, three = match.captures
end
Although scan makes little sense for this, it does still do the job; you just need to flatten the returned Array first:
one, two, three = string.scan(/(^.*)(:)(.*)/i).flatten
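The nil case can also be folded into a one-line assignment. The following is a sketch only: label_parts is a hypothetical helper that reuses the pattern from the answer and falls back to three nils, so the destructuring never raises.

```ruby
# Hypothetical helper: returns the three captures, or nils on no match.
def label_parts(line)
  match = line.match(/(^.*)(:)(.*)/i)
  match ? match.captures : [nil, nil, nil]
end

one, two, three = label_parts("RyanOnRails: This is a test")
p [one, two, three] # => ["RyanOnRails", ":", " This is a test"]

p label_parts("no separator here") # => [nil, nil, nil]
```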
Regex to test criteria for class name
I believe you want the following.
r = /
  \A # match the beginning of the string
  [A-Z] # match an upper case English letter
  \p{Alnum}* # match zero or more Unicode letters or digits
  \z # match the end of the string
/x # free-spacing regex definition mode
'ThisIsATest'.match? r #=> true
'TIsAT22Test'.match? r #=> true
'thisIsATest'.match? r #=> false
'ThisIsATest?'.match? r #=> false
'T'.match? r #=> true
'LeMêmeTest'.match? r #=> true
'Être'.match? r #=> false
''.match? r #=> false
One can only test the first character (which must be a letter) for case, as any combination of upper and lower case for remaining letters can be interpreted as corresponding to a camel-case name. For example, 'TIsAT22Test'.match? r #=> true as it could be viewed as 'T Is A T22 Test'. Similarly 'TIsAT22test'.match? r #=> true because it could be regarded as 'T Is A T22test'.
It is curious that, while names of constants may contain Unicode letters, they must begin with one of the 26 English letters A-Z. That's through Ruby MRI 2.5.x anyway. However, one of the changes coming in Ruby MRI v2.6 (to be released December 25, 2018) is that constants can begin with some 1,853 additional characters (source). Presumably (I will investigate and edit to show my findings), any character s that satisfies s.match? /\p{Upper}/ #=> true can begin the name of a constant, and hence, the name of a module. If so, the regular expression above should be changed accordingly.
1. In Ruby v2.5.1 it can be seen that Même is a valid name for a constant: Même = 4; Même = 5 #=> warning: already initialized constant. However, Être is not. In fact, Être is the name of a local variable: Être = 7; binding.local_variable_get(:Être) #=> 7.
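If the presumption above holds, the adjusted pattern might look like the following sketch. It only demonstrates how the regular expression behaves; whether every \p{Upper} character is accepted by the parser as the start of a constant is the answer's unverified assumption, and the name UPPER_START is made up here.

```ruby
# Sketch only: replaces [A-Z] with \p{Upper} for the first character.
UPPER_START = /
  \A          # beginning of the string
  \p{Upper}   # any uppercase letter, not only A-Z
  \p{Alnum}*  # zero or more Unicode letters or digits
  \z          # end of the string
/x

p 'ThisIsATest'.match?(UPPER_START)   # => true
p "\u00CAtre".match?(UPPER_START)     # => true  ("\u00CAtre" is Être)
p 'thisIsATest'.match?(UPPER_START)   # => false
p 'ThisIsATest?'.match?(UPPER_START)  # => false
p ''.match?(UPPER_START)              # => false
```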
Working with Named Regex Groups in Ruby
From the Ruby Regexp docs:
When named capture groups are used with a literal regexp on the left-hand side of an expression and the =~ operator, the captured text is also assigned to local variables with corresponding names.
So it needs to be a literal regex that is used in order to create the local variables.
In your case you are using a variable to reference the regex, not a literal.
For example:
regex = /(?<day>.*)/
regex =~ 'whatever'
puts day
produces NameError: undefined local variable or method `day' for main:Object, but this
/(?<day>.*)/ =~ 'whatever'
puts day
prints whatever.
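When the regex has to live in a variable, the named groups are still reachable through the MatchData object. A minimal sketch, with a made-up helper named named_parts and an illustrative date-like pattern:

```ruby
# Hypothetical helper: returns the named captures as a Hash,
# or an empty Hash when the regex does not match.
def named_parts(regex, text)
  match = regex.match(text)
  match ? match.named_captures : {}
end

regex = /(?<day>\d+)-(?<month>\w+)/

p named_parts(regex, '12-March') # => {"day"=>"12", "month"=>"March"}
p regex.match('12-March')[:day]  # => "12"
p named_parts(regex, 'nothing')  # => {}
```

MatchData#named_captures needs Ruby 2.4 or later; MatchData#[] with a symbol or string key works on older versions too.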
Regular expression to recognize variable declarations in C
A pattern to recognize variable declarations in C. Looking at a conventional declaration, we see:
int variable;
If that's the case, one should test for the type keyword before anything, to avoid matching something else, like a string or a constant defined with the preprocessor:
(?:\w+\s+)([a-zA-Z_][a-zA-Z0-9]+)
The variable name resides in \1.
The feature you need is look-behind/look-ahead.
UPDATE July 11 2015
The previous regex fails to match some variables with _ anywhere in the middle. To fix that, one just has to add the _ to the second part of the first capture group. It also assumed variable names of two or more characters. This is how it looks after the fix:
(?:\w+\s+)([a-zA-Z_][a-zA-Z0-9_]*)
However, this regular expression has many false positives, goto jump; being one of them. Frankly, it's not suitable for the job. Because of that, I decided to create another regex to cover a wider range of cases, though it's far from perfect. Here it is:
\b(?:(?:auto\s*|const\s*|unsigned\s*|signed\s*|register\s*|volatile\s*|static\s*|void\s*|short\s*|long\s*|char\s*|int\s*|float\s*|double\s*|_Bool\s*|complex\s*)+)(?:\s+\*?\*?\s*)([a-zA-Z_][a-zA-Z0-9_]*)\s*[\[;,=)]
I've tested this regex with Ruby, Python and JavaScript and it works very well for the common cases; however, it fails in some cases. Also, the regex may need some optimizations, though it is hard to do optimizations while maintaining portability across several regex engines.
Test summary
unsignedchar *var; /* OK, doesn't match */
goto **label; /* OK, doesn't match */
int function(); /* OK, doesn't match */
char **a_pointer_to_a_pointer; /* OK, matches +a_pointer_to_a_pointer+ */
register unsigned char *variable; /* OK, matches +variable+ */
long long factorial(int n) /* OK, matches +n+ */
int main(int argc, int *argv[]) /* OK, matches +argc+ and +argv+ (needs two passes) */
const * char var; /* OK, matches +var+, however, it doesn't consider +const *+ as part of the declaration */
int i=0, j=0; /* 50%, matches +i+ but it will not match j after the first pass */
int (*functionPtr)(int,int); /* FAIL, doesn't match (too complex) */
False positives
The following case is hard to cover with a portable regular expression; text editors use contexts to avoid highlighting text inside quotes.
printf("int i=%d", i); /* FAIL, match i inside quotes */
False positives (syntax errors)
This can be fixed if one tests the syntax of the source file before applying the regular expression. With GCC and Clang one can just pass the -fsyntax-only flag to test the syntax of a source file without compiling it.
int char variable; /* matches +variable+ */
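For readers who want to try the pattern from Ruby, here is a sketch that rebuilds the same regular expression in free-spacing mode and scans C source one line at a time. The names C_TYPE_WORDS, C_DECLARATION and declared_names are invented for this example, and the line-by-line scan is an assumption about how the pattern would be applied.

```ruby
# Type and qualifier keywords taken from the regex above.
C_TYPE_WORDS = %w[auto const unsigned signed register volatile static void
                  short long char int float double _Bool complex].freeze

# Same pattern as above, split up for readability (assumed equivalent).
C_DECLARATION = /
  \b(?:(?:#{C_TYPE_WORDS.join('|')})\s*)+  # one or more type keywords
  \s+\*?\*?\s*                             # whitespace, up to two stars
  ([a-zA-Z_][a-zA-Z0-9_]*)                 # the declared name
  \s*[\[;,=)]                              # what may follow the name
/x

# Hypothetical helper: returns the first declared name found in each
# declaration the pattern recognises, scanning line by line.
def declared_names(c_source)
  c_source.each_line.flat_map { |line| line.scan(C_DECLARATION).flatten }
end

source = <<~C
  int variable;
  char **a_pointer_to_a_pointer;
  register unsigned char *buffer;
  int function();
  goto jump;
C

p declared_names(source)
# => ["variable", "a_pointer_to_a_pointer", "buffer"]

# The known false positive from the list above still shows up:
p declared_names('printf("int i=%d", i);') # => ["i"]
```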