Find both pattern and position of multiple regex matches in Ruby
Use MatchData, which scan exposes as $~ inside the block:
string.scan(regex) do
  $1            # Pattern at first position
  $2            # Pattern at second position
  $~.offset(1)  # Starting and ending position of $1
  $~.offset(2)  # Starting and ending position of $2
end
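As a concrete sketch of the snippet above: the helper name scan_with_offsets and the sample date string are made up for illustration, and Regexp.last_match is used as the spelled-out form of $~.

```ruby
# Hypothetical helper: for every match, pair each capture's text
# with its [start, end) offsets in the original string.
def scan_with_offsets(string, regex)
  results = []
  string.scan(regex) do
    md = Regexp.last_match # same object as $~
    results << (1...md.size).map { |i| [md[i], md.offset(i)] }
  end
  results
end

p scan_with_offsets("2021-03-04 and 2022-11-30", /(\d{4})-(\d{2})/)
# => [[["2021", [0, 4]], ["03", [5, 7]]], [["2022", [15, 19]], ["11", [20, 22]]]]
```

The end offset is exclusive, so string[start...finish] gives back the captured text.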
How to match multiple regex patterns in ruby
Because "ll" is inside "Hello", capturing both in the same scan call requires a slightly clumsy-looking expression that double-captures the "ll". The result below is close, but note that the sequence interleaves "Hello" and "ll", unlike the expected output. As far as I can see, that is unavoidable for any regular expression that makes a single pass through the string:
str = "Hello, how are you. Hello, I am lloyds"
a = str.scan( /(He(ll)o|ll)/ ).flatten.compact
=> ["Hello", "ll", "Hello", "ll", "ll"]
The compact is necessary because a lone "ll" will not match the inner capture, so the array may contain unwanted nils.
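To see where the nil comes from, it may help to look at the nested array before it is flattened. This small sketch reuses the string and regex from the answer; the variable names raw and cleaned are only illustrative.

```ruby
str = "Hello, how are you. Hello, I am lloyds"

# Each match yields one sub-array: [outer capture, inner capture].
raw = str.scan(/(He(ll)o|ll)/)
p raw
# => [["Hello", "ll"], ["Hello", "ll"], ["ll", nil]]

# The lone "ll" in "lloyds" takes the second alternative, so the
# inner group never participates and its slot is nil.
cleaned = raw.flatten.compact
p cleaned
# => ["Hello", "ll", "Hello", "ll", "ll"]
```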
ruby regex to match multiple occurrences of pattern
Try this:
=> str.match(/\[\[(.*)\]\].*\[\[(.*)\]\]/).captures
=> ["lead:first_name", "client:last_name"]
With many occurrences:
=> str
=> "Some [[lead:first_name]] random text[[lead:first_name]] and more [[lead:first_name]] stuff [[client:last_name]]"
=> str.scan(/\[(\w+:\w+)\]/)
=> [["lead:first_name"], ["lead:first_name"], ["lead:first_name"], ["client:last_name"]]
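If a flat list of names is wanted rather than an array of one-element arrays, a variation along these lines might work. It is a sketch that assumes the placeholders are always wrapped in double square brackets; the non-greedy (.*?) is a substitution for the \w+:\w+ pattern used above, not part of the original answer.

```ruby
str = "Some [[lead:first_name]] random text[[lead:first_name]] and more [[lead:first_name]] stuff [[client:last_name]]"

# (.*?) is non-greedy, so each match stops at the nearest "]]".
names = str.scan(/\[\[(.*?)\]\]/).flatten
p names
# => ["lead:first_name", "lead:first_name", "lead:first_name", "client:last_name"]

# uniq drops the repeated placeholders if only distinct names matter.
p names.uniq
# => ["lead:first_name", "client:last_name"]
```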
How to match all occurrences of a regular expression in Ruby
Using scan should do the trick:
string.scan(/regex/)
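The shape of the return value depends on whether the regex contains capture groups. A short illustration with a made-up sample string:

```ruby
text = "a1b22c333"

# Without capture groups, scan returns the matched substrings.
whole = text.scan(/\d+/)
p whole
# => ["1", "22", "333"]

# With capture groups, scan returns one array of captures per match.
grouped = text.scan(/([a-z])(\d+)/)
p grouped
# => [["a", "1"], ["b", "22"], ["c", "333"]]
```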
Find the position of each capture group in a string
I can't find an easy way to find the position in a string of the beginning of a capture group.
Like this:
str = 'aa123bb456'
str.scan(/(.)(.)(\d+)/) do
  md = Regexp.last_match
  p md.offset(1)
  p md.offset(2)
  p md.offset(3)
  puts '-' * 20
end
--output:--
[0, 1]
[1, 2]
[2, 5]
--------------------
[5, 6]
[6, 7]
[7, 10]
--------------------
In the first match, the capture groups begin at positions 0, 1, and 2 in the string; in the second match, they begin at 5, 6, and 7.
Alternatively, if you only want the start of each capture, as hwnd demonstrated, you can do this:
str = 'aa123bb456'
str.scan(/(.)(.)(\d+)/) do
  md = Regexp.last_match
  p md.begin(1)
  p md.begin(2)
  p md.begin(3)
  puts '-' * 20
end
--output:--
0
1
2
--------------------
5
6
7
--------------------
Yes, but I don't know how many times the regex will match
How is that relevant?
Response to edit:
str = "ggtgtcaactatccgccgcgaagcacgtaacgtctctcttgacaccgaatcataggtgcgacagt"
regex = /cg(.)a(.)/
results = []
str.scan(regex) do
  md = Regexp.last_match
  results << md.begin(1) << md.begin(2)
end
p results
--output:--
[20, 22, 27, 29, 47, 49]
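Since the block runs once per match, the number of matches does not need to be known in advance. As a possible variation (not from the original answer), the groups can be named so that each match contributes a hash of start positions. The helper name named_capture_starts and the group names first and second are invented for this sketch.

```ruby
# Hypothetical helper: one hash per match, mapping each named
# capture group to the index where it starts.
def named_capture_starts(str, regex)
  results = []
  str.scan(regex) do
    md = Regexp.last_match
    results << md.names.map { |name| [name, md.begin(name)] }.to_h
  end
  results
end

str = "ggtgtcaactatccgccgcgaagcacgtaacgtctctcttgacaccgaatcataggtgcgacagt"
p named_capture_starts(str, /cg(?<first>.)a(?<second>.)/)
# => [{"first"=>20, "second"=>22}, {"first"=>27, "second"=>29}, {"first"=>47, "second"=>49}]
```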
Better way to check for multiple regex conditions in Ruby?
I find it difficult to comprehend what Bob#hey is doing. One possibility is to define some appropriately-named helper methods in the String class:
class String
  def contains_a_digit?() !!(self =~ /\d/) end
  def contains_no_digits?() !self.contains_a_digit? end
  def contains_an_uppercase_char?() !!(self =~ /[A-Z]/) end
  def contains_no_lowercase_chars?() self !~ /[a-z]/ end
  def ends_with_questionmark?() !!(self =~ /[\?]$/) end
end
Note that (for example) self =~ /\d+/ and self =~ /\d/ are here interchangeable; both return a truthy value if and only if self contains at least one digit. Incidentally, the receiver, self, must be explicit here.
Aside to readers unfamiliar with !!: if truthy is a variable holding any value other than false or nil, then !!(truthy) => true, !!(nil) => false and !!(false) => false. In other words, !! is a trick to convert truthy values to true and falsy values to false (which I've used merely to improve readability).
Let's try these methods:
str = 'U2d?' #=> "U2d?"
str.contains_a_digit? #=> true
str.contains_no_digits? #=> false
str.contains_an_uppercase_char? #=> true
str.contains_no_lowercase_chars? #=> false
str.ends_with_questionmark? #=> true
With Ruby 2.1+, if one is disinclined to monkey-patch the String class, one can use Refinements.
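A minimal sketch of what the refinement route could look like. The module name StringChecks and the class RemarkChecker are invented for illustration, only two of the helpers are shown, and String#match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical refinement holding two of the helper methods.
module StringChecks
  refine String do
    def contains_a_digit?
      match?(/\d/)
    end

    def ends_with_questionmark?
      end_with?('?')
    end
  end
end

# The refined methods are visible only where `using` is in effect,
# here from the `using` line to the end of this class body.
class RemarkChecker
  using StringChecks

  def self.question?(remark)
    remark.ends_with_questionmark?
  end

  def self.numeric?(remark)
    remark.contains_a_digit?
  end
end

p RemarkChecker.question?("What's that?") # => true
p RemarkChecker.numeric?("I give up.")    # => false
```

Outside RemarkChecker, String is untouched, which is the main attraction over reopening the class.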
Now the method Bob#hey can be defined in a natural way:
class Bob
  def hey(remark)
    case
    when remark.contains_no_digits? &&
         remark.contains_an_uppercase_char? &&
         remark.contains_no_lowercase_chars?
      'Whoa, chill out! (1st)'
    when remark.ends_with_questionmark?
      'Sure.'
    when remark.contains_a_digit? &&
         remark.contains_an_uppercase_char? &&
         remark.contains_no_lowercase_chars?
      'Whoa, chill out! (2nd)'
    else
      'Whatever.'
    end
  end
end
Let's try it.
bob = Bob.new
bob.hey("I PAID IN $US!") #=> "Whoa, chill out! (1st)"
bob.hey("What's that?") #=> "Sure."
bob.hey("I FLEW ON A 777!") #=> "Whoa, chill out! (2nd)"
bob.hey("I give up.") #=> "Whatever."
ruby regex: match and get position(s) of
Using Ruby 1.8.6+, you can do this:
require 'enumerator' #Only for 1.8.6, newer versions should not need this.
s = "AustinTexasDallasTexas"
positions = s.enum_for(:scan, /Texas/).map { Regexp.last_match.begin(0) }
This will create an array with:
=> [6, 17]
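An alternative that avoids the enumerator and Regexp.last_match is to loop with Regexp#match and its optional start position. This is a different technique from the one above, sketched under the assumption that overlapping matches are not wanted; the method name match_positions is made up.

```ruby
# Hypothetical helper: start index of every non-overlapping match.
def match_positions(str, regex)
  positions = []
  pos = 0
  while (md = regex.match(str, pos))
    positions << md.begin(0)
    # Step past the match; advance by one on an empty match
    # so the loop cannot get stuck at the same index.
    pos = md.end(0) > md.begin(0) ? md.end(0) : md.end(0) + 1
  end
  positions
end

p match_positions("AustinTexasDallasTexas", /Texas/)
# => [6, 17]
```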
Trouble matching multiple patterns in Ruby (regex)
HEAD|POST and (HEAD|POST) match the same strings (either HEAD or POST); the second one captures the string while the first doesn't.
[HEAD|POST] matches a single character, any of ADEHOPST or |. So "This is HEAD and a POST".match("[HEAD|POST]") matches the single character T in This.
On the other hand, "This is HEAD 1 and a POST 2".match("[HEAD|POST] (.)") can't match the leading T because it isn't followed by a space - instead it matches the single D at the end of HEAD, plus the space and 1 following, capturing the 1.
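The difference is easy to check in irb. The sketch below reuses the sample sentence from the explanation; the variable names are only illustrative.

```ruby
text = "This is HEAD 1 and a POST 2"

# Alternation: matches the whole word HEAD or POST.
alternation = text[/HEAD|POST/]
p alternation # => "HEAD"

# Same match, but the parentheses also capture it as group 1.
captured = text.match(/(HEAD|POST)/)[1]
p captured # => "HEAD"

# Character class: matches ONE character out of H, E, A, D, |, P, O, S, T.
single_char = text[/[HEAD|POST]/]
p single_char # => "T"

# Character class followed by a space and any character.
md = text.match(/[HEAD|POST] (.)/)
p md[0] # => "D 1"
p md[1] # => "1"

# What was probably intended: the word, then the character after the space.
pairs = text.scan(/(HEAD|POST) (.)/)
p pairs # => [["HEAD", "1"], ["POST", "2"]]
```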
Ruby Regex: Return multiple matches after a pattern
Does this help?
s.scan(/-\s(\w+)\s/)
#=> [["monday"], ["tuesday"], ["wednesday"], ["thursday"], ["friday"]]
Or:
s.scan(/-\s(\w+)\s/).map(&:first).join(" ")
#=> "monday tuesday wednesday thursday friday"
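The input string s is not shown above. As an assumption, suppose it is a dash-prefixed list with one day per line; the sample below is invented purely to make the two calls runnable.

```ruby
# Hypothetical input: each entry is "- <word>" followed by whitespace.
s = "- monday\n- tuesday\n- wednesday\n"

nested = s.scan(/-\s(\w+)\s/)
p nested
# => [["monday"], ["tuesday"], ["wednesday"]]

# map(&:first) unwraps each one-element array before joining.
joined = nested.map(&:first).join(" ")
p joined
# => "monday tuesday wednesday"
```

Note that the trailing \s in the pattern means the last entry only matches if it is followed by whitespace, such as a final newline.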
regular expression in ruby for strings with multiple patterns
Assumptions:
- Location and time start with @, and @ appears nowhere else.
- Date starts with on surrounded with obligatory white spaces, and on appears nowhere else.
- Task is obligatory.
- Location and date are optional and independent of one another.
- Time appears only when there is date.
- Task, location, date, time only appear in this order.
Also, it should be taken for granted that the regex engine is oniguruma since named capture is mentioned.
regex = /
  (?<task>.*?)
  (?:\s*@\s*(?<location>.*?))?
  (?:\s+on\s+(?<date>.*?)
    (?:\s*@\s*(?<time>.*))?
  )?
\z/x
string4.match(regex)
# => #<MatchData
#      "bike wash @ bike point on 13 may 11 @ 10 AM"
#      task: "bike wash"
#      location: "bike point"
#      date: "13 may 11"
#      time: "10 AM"
#    >
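To check the optional parts, the same regex can be run against shorter inputs. In this sketch the constant TASK_REGEX, the helper parse_task, and the shortened sample strings are assumptions added for illustration; MatchData#named_captures needs Ruby 2.4 or later.

```ruby
TASK_REGEX = /
  (?<task>.*?)
  (?:\s*@\s*(?<location>.*?))?
  (?:\s+on\s+(?<date>.*?)
    (?:\s*@\s*(?<time>.*))?
  )?
\z/x

# Hypothetical helper: hash of named captures, or nil if nothing matches.
def parse_task(string)
  md = TASK_REGEX.match(string)
  md && md.named_captures
end

p parse_task("bike wash @ bike point on 13 may 11 @ 10 AM")
# => {"task"=>"bike wash", "location"=>"bike point", "date"=>"13 may 11", "time"=>"10 AM"}

p parse_task("bike wash on 13 may 11")
# => {"task"=>"bike wash", "location"=>nil, "date"=>"13 may 11", "time"=>nil}

p parse_task("bike wash")
# => {"task"=>"bike wash", "location"=>nil, "date"=>nil, "time"=>nil}
```

Groups that do not participate in the match come back as nil, so callers can tell a missing location from an empty one.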