Regex with named capture groups getting all matches in Ruby
Named captures are suitable only for one matching result.
Ruby's analogue of findall is String#scan. You can either use the scan result as an array, or pass a block to it:
irb> s = "123--abc,123--abc,123--abc"
=> "123--abc,123--abc,123--abc"
irb> s.scan(/(\d*)--([a-z]*)/)
=> [["123", "abc"], ["123", "abc"], ["123", "abc"]]
irb> s.scan(/(\d*)--([a-z]*)/) do |number, chars|
irb* p [number,chars]
irb> end
["123", "abc"]
["123", "abc"]
["123", "abc"]
=> "123--abc,123--abc,123--abc"
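Because scan returns plain arrays, you can still get named results for every match by pairing each result with the group names from Regexp#names. Below is a minimal sketch. The sample string and the group names number and chars are assumptions chosen for illustration.

```ruby
s = "123--abc,456--def"
# Hypothetical named groups; scan still returns one array per match.
pattern = /(?<number>\d+)--(?<chars>[a-z]+)/
# Zip each array of captures with the group names to build a Hash per match.
matches = s.scan(pattern).map { |captures| pattern.names.zip(captures).to_h }
# Each element is a Hash such as {"number"=>"123", "chars"=>"abc"}.
p matches.first["chars"] # => "abc"
```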
Ruby - best way to extract regex capture groups?
Since Ruby 2.4, MatchData has had named_captures, which can be used like this. Just add the ?<some_name> syntax inside a capture group.
/(\w)(\w)/.match("ab").captures # => ["a", "b"]
/(\w)(\w)/.match("ab").named_captures # => {}
/(?<some_name>\w)(\w)/.match("ab").captures # => ["a"]
/(?<some_name>\w)(\w)/.match("ab").named_captures # => {"some_name"=>"a"}
Even more relevant, you can reference a named capture by name!
result = /(?<some_name>\w)(\w)/.match("ab")
result["some_name"] # => "a"
Capturing groups don't work as expected with Ruby scan method
See the scan documentation:
If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.
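To make the documented behavior concrete, here is a small sketch (the sample text is an assumption) comparing the same pattern without a group, with a capturing group, and with a non-capturing group:

```ruby
text = "a1 b22 c333"
# No groups: each element is the whole match ($&).
plain = text.scan(/[a-z]\d+/)
p plain         # => ["a1", "b22", "c333"]
# One capturing group: each element is an array with one entry per group.
grouped = text.scan(/([a-z])\d+/)
p grouped       # => [["a"], ["b"], ["c"]]
# Non-capturing group: back to plain matched strings.
non_capturing = text.scan(/(?:[a-z])\d+/)
p non_capturing # => ["a1", "b22", "c333"]
```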
You should remove capturing groups (if they are redundant), or make them non-capturing (if you just need to group a sequence of patterns to be able to quantify them), or use extra code/group in case a capturing group cannot be avoided.
- In this scenario, the capturing group is used to quantify a pattern sequence, so all you need to do is convert the capturing group into a non-capturing one by replacing every unescaped ( with (?: (there is only one occurrence here):
text = " -45.124, 1124.325"
puts text.scan(/[+-]?\d+(?:\.\d+)?/)
Output:
-45.124
1124.325
Well, if you also need to match floats like .04, you can use [+-]?\d*\.?\d+.
- There are cases when you cannot get rid of a capturing group, e.g. when the regex contains a backreference to a capturing group. In that case, you may a) declare a variable to store all matches and collect them inside a scan block, b) enclose the whole pattern with another capturing group and map the results to get the first item from each match, or c) use gsub with just a regex as its single argument to get an Enumerator, then call .to_a on it to get the array of matches:
text = "11234566666678"
# Variant a:
results = []
text.scan(/(\d)\1+/) { results << Regexp.last_match(0) }
p results # => ["11", "666666"]
# Variant b:
p text.scan(/((\d)\2+)/).map(&:first) # => ["11", "666666"]
# Variant c:
p text.gsub(/(\d)\1+/).to_a # => ["11", "666666"]
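The alternative float pattern [+-]?\d*\.?\d+ from the first bullet can be checked the same way. This is a sketch on an assumed sample string that also contains a leading-dot float and a plain integer:

```ruby
text = " -45.124, .04, 7"
# [+-]?  optional sign
# \d*    optional integer part, so ".04" is allowed
# \.?    optional decimal point
# \d+    at least one digit
numbers = text.scan(/[+-]?\d*\.?\d+/)
p numbers # => ["-45.124", ".04", "7"]
```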
Ruby regex multiple repeating captures
Data captured by a repeated capturing group isn't stored separately in most programming languages, so you can't refer to each repetition individually. This is a good reason to use the \G anchor. \G makes a match start where the previous match ended; on the first attempt it matches at the beginning of the string, just like \A.
So we need its first capability:
(?:foo:|\G(?!\A))\s*(\d+)\s*(?:,|and)?
Breakdown:
- (?: starts a non-capturing group
- foo: matches foo:
- | or
- \G(?!\A) continues the match from where the previous match ended
- ) ends the non-capturing group
- \s* matches any number of whitespace characters
- (\d+) matches and captures digits
- \s* matches any number of whitespace characters
- (?:,|and)? optionally matches , or and
This regex begins a match when it meets foo: in the input string. It then tries to find digits that precede a comma or and (whitespace is allowed around the digits). Every following match has to start exactly where the previous one ended.
The \K token resets the match: it tells the engine to forget whatever has been matched so far (but keep whatever has been captured) and continue from the current position. I used \K in the Rubular regex so that the result set would show the captured digits rather than the matched strings. However, Rubular seems to work differently and didn't need \K. It's not required at all.
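In Ruby itself, scan with the \G pattern above returns one captured group per match, so flattening gives the digits directly and no \K is needed. A minimal sketch; the input string, including the ignored "bar:" prefix, is an assumption:

```ruby
# Hypothetical input: the numbers after "bar:" must be ignored.
text = "bar: 9, 8 foo: 1, 2 and 3"
# Start at "foo:", or continue exactly where the previous match ended (\G),
# but never at the very start of the string ((?!\A)).
pattern = /(?:foo:|\G(?!\A))\s*(\d+)\s*(?:,|and)?/
numbers = text.scan(pattern).flatten
p numbers # => ["1", "2", "3"]
```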
How to match all occurrences of a regular expression in Ruby
Using scan should do the trick:
string.scan(/regex/)
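As a concrete sketch (the sample string and pattern are assumptions), scan without a block returns every match, and with a block Regexp.last_match holds the MatchData of the current match, which is handy when you also need positions:

```ruby
string = "a1b22c333"
# Without a block: an Array of all matched substrings.
all = string.scan(/\d+/)
p all # => ["1", "22", "333"]
# With a block: Regexp.last_match ($~) is the MatchData for the current match.
offsets = []
string.scan(/\d+/) { offsets << Regexp.last_match.begin(0) }
p offsets # => [1, 3, 6]
```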
How to return first match sub-string of a string using Ruby regex?
scan will return all substrings that match the pattern. You can use match, scan or [] to achieve your goal:
report_path = '/usr/share/filebeat/reports/ui/local/20200904_151507/API/API_Test_suite/20200904_151508/20200904_151508.csv'
report_path.match(/\d{8}_\d{6}/)[0]
# => "20200904_151507"
report_path.scan(/\d{8}_\d{6}/)[0]
# => "20200904_151507"
# String#[] supports regex
report_path[/\d{8}_\d{6}/]
# => "20200904_151507"
Note that match returns a MatchData object, which may contain multiple captures (if we use capture groups), while scan returns an Array containing all matches.
Here we're calling [0] on the MatchData to get the whole (first) match.
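The three approaches differ when nothing matches: String#[] returns nil, scan returns an empty Array (so [0] is nil), and match returns nil, so calling [0] on its result directly would raise NoMethodError. A sketch with an assumed path that contains no timestamp:

```ruby
path = "/tmp/no_timestamp_here.csv" # hypothetical path without a match
pattern = /\d{8}_\d{6}/
p path[pattern]         # => nil
p path.scan(pattern)[0] # => nil (scan returned [])
md = path.match(pattern)
p md                    # => nil
# md[0] would raise NoMethodError here, so guard it first.
first = md ? md[0] : nil
```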
Capture groups:
A regex allows us to capture multiple substrings using one pattern. We can use () to create capture groups, and (?'some_name'<pattern>) to create named capture groups.
report_path = '/usr/share/filebeat/reports/ui/local/20200904_151507/API/API_Test_suite/20200904_151508/20200904_151508.csv'
matches = report_path.match(/(\d{8})_(\d{6})/)
matches[0] #=> "20200904_151507"
matches[1] #=> "20200904"
matches[2] #=> "151507"
matches = report_path.match(/(?'date'\d{8})_(?'id'\d{6})/)
matches[0] #=> "20200904_151507"
matches["date"] #=> "20200904"
matches["id"] #=> "151507"
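Ruby also accepts the (?<name>...) spelling for named groups, and MatchData#captures can be destructured into local variables. A sketch reusing the same path:

```ruby
report_path = '/usr/share/filebeat/reports/ui/local/20200904_151507/API/API_Test_suite/20200904_151508/20200904_151508.csv'
# (?<name>...) is equivalent to (?'name'...)
matches = report_path.match(/(?<date>\d{8})_(?<id>\d{6})/)
# captures returns the group values in order, ready for multiple assignment.
date, id = matches.captures
p date           # => "20200904"
p id             # => "151507"
p matches[:date] # => "20200904"
```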
We can even use (named) capture groups with []. From the String#[] documentation:
If a Regexp is supplied, the matching portion of the string is returned. If a capture follows the regular expression, which may be a capture group index or name, follows the regular expression that component of the MatchData is returned instead.
report_path = '/usr/share/filebeat/reports/ui/local/20200904_151507/API/API_Test_suite/20200904_151508/20200904_151508.csv'
# returns the full match if no second parameter is passed
report_path[/(\d{8})_(\d{6})/]
# => "20200904_151507"
# returns the capture group n°2
report_path[/(\d{8})_(\d{6})/, 2]
# => "151507"
# returns the capture group called "date"
report_path[/(?'date'\d{8})_(?'id'\d{6})/, 'date']
# => "20200904"
Ruby regular expression matching enumerator with named capture support
This is very similar to the answer you have already seen, but slightly different:
str = "Sun rises at 6:23 am & sets at 5:45 pm; Moon comes up by 7:20 pm ..."
str.gsub(/(?<time>\d:\d{2}) (?<meridiem>am|pm)/).map{ Regexp.last_match }
#=> [#<MatchData "6:23 am" time:"6:23" meridiem:"am">, #<MatchData "5:45 pm" ...
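If you want plain Hashes instead of MatchData objects, one option is to call scan with a block and collect Regexp.last_match.named_captures for each match. This is a sketch, not the answer's own method; named_captures requires Ruby 2.4 or later:

```ruby
str = "Sun rises at 6:23 am & sets at 5:45 pm; Moon comes up by 7:20 pm ..."
times = []
str.scan(/(?<time>\d:\d{2}) (?<meridiem>am|pm)/) do
  # Inside the block, Regexp.last_match is the MatchData for the current match.
  times << Regexp.last_match.named_captures
end
p times.map { |t| t["time"] } # => ["6:23", "5:45", "7:20"]
```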
Named capture group doesn't work with dynamic regex
The problem with the first approach is that using string interpolation in the regex literal disables the assignment of the local variables. From Regexp#=~:
If =~ is used with a regexp literal with named captures, captured strings (or nil) is assigned to local variables named by the capture names. ... snipped ...
This assignment is implemented in the Ruby parser. The parser detects ‘regexp-literal =~ expression’ for the assignment. The regexp must be a literal without interpolation and placed at left hand side.
... snipped ...
A regexp interpolation, #{}, also disables the assignment.
You can always just use Regexp#match to get the captures, but I'm not aware of any way to automatically assign local variables like this (honestly, I didn't know =~ would do so):
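permitted_keys = %w[banner logo] # hypothetical values for illustration
key = 'banner_content_type'      # hypothetical input
match_data = /(?<g1>#{permitted_keys.join('|')})_content_type/.match(key)
match_data['g1']
# => "banner"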
or if you like dealing with globals:
/(?<g1>#{permitted_keys.join('|')})_content_type/ =~ key
$~['g1']
# => "banner"
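To contrast the two cases from the quoted documentation, here is a sketch: with a plain regexp literal on the left of =~, the named groups become local variables; once the regexp is interpolated, it is safer to read the groups from a MatchData. The dates and the separator variable are assumptions for illustration:

```ruby
# Literal regexp (no interpolation) on the left of =~:
# the parser turns the named groups into local variables.
if /(?<year>\d{4})-(?<month>\d{2})/ =~ "2020-09"
  puts year  # prints 2020
  puts month # prints 09
end

# Interpolated regexp: don't rely on local-variable assignment;
# read the groups from the MatchData returned by Regexp#match instead.
separator = "-" # hypothetical value that makes the regexp dynamic
md = /(?<year>\d{4})#{separator}(?<month>\d{2})/.match("2021-10")
puts md[:year]  # prints 2021
puts md[:month] # prints 10
```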
How to do named capture in ruby
You should use match with named captures, not scan:
m = "555-333-7777".match(/(?<area>\d{3})-(?<city>\d{3})-(?<number>\d{4})/)
m # => #<MatchData "555-333-7777" area:"555" city:"333" number:"7777">
m[:area] # => "555"
m[:city] # => "333"
If you want an actual hash, you can use something like this:
m.names.zip(m.captures).to_h # => {"area"=>"555", "city"=>"333", "number"=>"7777"}
Or this (Ruby 2.4 or later):
m.named_captures # => {"area"=>"555", "city"=>"333", "number"=>"7777"}
Ruby one-liner to capture regular expression matches
string = "the quick brown fox jumps over the lazy dog."
extract_string = string[/fox (.*?) dog/, 1]
# => "jumps over the lazy"
extract_array = string.scan(/the (.*?) fox .*?the (.*?) dog/).first
# => ["quick brown", "lazy"]
This approach will also return nil (instead of raising an error) if no match is found.
extract_string = string[/MISSING_CAT (.*?) dog/, 1]
# => nil
extract_array = string.scan(/the (.*?) MISSING_CAT .*?the (.*?) dog/).first
# => nil