Find Both Pattern and Position of Multiple Regex Matches in Ruby

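Use MatchData. Inside the block passed to scan, $~ (also available as Regexp.last_match) holds the MatchData for the current match:

string.scan(regex) do
  $1           # Text captured by the first group
  $2           # Text captured by the second group
  $~.offset(1) # Starting and ending position of $1
  $~.offset(2) # Starting and ending position of $2
end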
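
As a concrete sketch of the pattern above, the following records each capture together with its offsets. The input string, the key=value regex, and the helper name pairs_with_offsets are made up for illustration.

```ruby
# Hypothetical helper: for every "key=value" pair, record the text of both
# captures and their [start, end) character offsets.
def pairs_with_offsets(string)
  found = []
  string.scan(/(\w+)=(\w+)/) do
    found << {
      key: $1,   key_at: $~.offset(1),
      value: $2, value_at: $~.offset(2)
    }
  end
  found
end

pairs_with_offsets("key=value; name=ruby").each { |pair| p pair }
```

Each offset pair is [start, end), so the captured text is string[start...end].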

How to match multiple regex patterns in ruby

Because "ll" is inside "Hello", matching both in the same scan call requires a slightly clumsy-looking expression that double-captures the "ll". The result below is close, but note that the sequence interleaves "Hello" and "ll", unlike the expected output. As far as I can see, that interleaving is unavoidable for any regular expression that makes a single pass through the string:

str = "Hello, how are you. Hello, I am lloyds"
a = str.scan( /(He(ll)o|ll)/ ).flatten.compact
=> ["Hello", "ll", "Hello", "ll", "ll"]

The compact is necessary because a lone "ll" does not match the inner capture, so the flattened array would otherwise contain unwanted nils.
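
If the positions matter as well, one alternative (not what the answer above does) is to scan once per pattern and record where each match starts. This is only a sketch; the helper name matches_with_positions is hypothetical.

```ruby
# Hypothetical helper: scans the string once per pattern and returns
# [matched_text, start_position] pairs, sorted by start position.
def matches_with_positions(string, patterns)
  hits = []
  patterns.each do |pattern|
    string.scan(pattern) { hits << [$~[0], $~.begin(0)] }
  end
  hits.sort_by { |_text, start| start }
end

str = "Hello, how are you. Hello, I am lloyds"
p matches_with_positions(str, [/Hello/, /ll/])
```

Because each pattern gets its own pass, the "ll" inside "Hello" is found without a double capture, at the cost of scanning the string more than once.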

ruby regex to match multiple occurrences of pattern

Try this:

str.match(/\[\[(.*)\]\].*\[\[(.*)\]\]/).captures
#=> ["lead:first_name", "client:last_name"]

With many occurrences:

str
#=> "Some [[lead:first_name]] random text[[lead:first_name]] and more [[lead:first_name]] stuff [[client:last_name]]"
str.scan(/\[(\w+:\w+)\]/)
#=> [["lead:first_name"], ["lead:first_name"], ["lead:first_name"], ["client:last_name"]]
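
The match-based pattern above is written for exactly two tokens, while scan copes with any number. A possible variant, assuming the tokens are always wrapped in double brackets, uses a lazy quantifier and flattens the result; the helper name bracket_tokens is hypothetical.

```ruby
# Hypothetical helper: returns the text of every [[...]] token.
def bracket_tokens(text)
  # .*? is lazy, so each match stops at the first closing "]]".
  text.scan(/\[\[(.*?)\]\]/).flatten
end

str = "Some [[lead:first_name]] random text[[lead:first_name]] and more [[lead:first_name]] stuff [[client:last_name]]"
p bracket_tokens(str)
```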

How to match all occurrences of a regular expression in Ruby

Using scan should do the trick:

string.scan(/regex/)
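
The shape of the result depends on whether the regex has capture groups: without groups, scan returns an array of matched strings; with groups, it returns an array of arrays of captures. A small illustration with a made-up input and a hypothetical helper name:

```ruby
# Hypothetical helper: shows both result shapes of String#scan.
def scan_shapes(text)
  {
    whole:  text.scan(/\d+/),          # no groups: array of strings
    groups: text.scan(/([a-z])(\d+)/)  # groups: array of capture arrays
  }
end

p scan_shapes("a1b22c333")
```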

Find the position of each capture group in a string

I can't find an easy way to find the position in a string of the beginning of a capture group.

Like this:

str = 'aa123bb456'

str.scan(/(.)(.)(\d+)/) do
  md = Regexp.last_match
  p md.offset(1)
  p md.offset(2)
  p md.offset(3)
  puts '-' * 20
end

--output:--
[0, 1]
[1, 2]
[2, 5]
--------------------
[5, 6]
[6, 7]
[7, 10]
--------------------

In the first match, the capture groups begin at positions 0, 1, and 2 in the string; in the second match, they begin at 5, 6, and 7.

Alternatively, if you only want the start of each capture, as hwnd demonstrated, you can do this:

str = 'aa123bb456'

str.scan(/(.)(.)(\d+)/) do
  md = Regexp.last_match
  p md.begin(1)
  p md.begin(2)
  p md.begin(3)
  puts '-' * 20
end

--output:--
0
1
2
--------------------
5
6
7
--------------------

Yes, but I don't know how many times the regex will match

How is that relevant?

Response to edit:

str = "ggtgtcaactatccgccgcgaagcacgtaacgtctctcttgacaccgaatcataggtgcgacagt"
regex = /cg(.)a(.)/

results = []

str.scan(regex) do
  md = Regexp.last_match
  results << md.begin(1) << md.begin(2)
end

p results

--output:--
[20, 22, 27, 29, 47, 49]
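
The same idea can be written without hard-coding the group numbers, so it works for any number of matches and any number of capture groups. This is a sketch only; the helper name capture_starts is hypothetical.

```ruby
# Hypothetical helper: for every match, collect the start position of
# every capture group (nil for a group that did not participate).
def capture_starts(string, regex)
  starts = []
  string.scan(regex) do
    md = Regexp.last_match
    # md.size counts the whole match plus all groups, so groups are 1...md.size
    starts << (1...md.size).map { |i| md.begin(i) }
  end
  starts
end

p capture_starts('aa123bb456', /(.)(.)(\d+)/)
```

Calling flatten on the result gives one flat list of positions, like the results array above.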

Better way to check for multiple regex conditions in Ruby?

I find it difficult to comprehend what Bob#hey is doing. One possibility is to define some appropriately-named helper methods in the String class:

class String
  def contains_a_digit?()           !!(self =~ /\d/)     end
  def contains_no_digits?()         !self.contains_a_digit? end
  def contains_an_uppercase_char?() !!(self =~ /[A-Z]/)  end
  def contains_no_lowercase_chars?() self !~ /[a-z]/     end
  def ends_with_questionmark?()     !!(self =~ /[\?]$/)  end
end

Note that (for example) self =~ /\d+/ and self =~ /\d/ are interchangeable here; both return a truthy value if and only if self contains at least one digit. Incidentally, the receiver self must be written explicitly in these expressions, because =~ and !~ are binary operators that need a left-hand operand.

Aside to readers unfamiliar with !!: if truthy is a variable holding any value other than false or nil, !!(truthy) => true, !!(nil) => false and !!(false) => false. In other words, !! is a trick to convert truthy values to true and falsy values to false (which I've used merely to improve readability).

Let's try these methods:

str = 'U2d?'                      #=> "U2d?"
str.contains_a_digit?             #=> true
str.contains_no_digits?           #=> false
str.contains_an_uppercase_char?   #=> true
str.contains_no_lowercase_chars?  #=> false
str.ends_with_questionmark?       #=> true

With Ruby 2.1+, if one is disinclined to monkey-patch the String class, one can use Refinements.
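
A minimal sketch of the refinement approach, showing two of the helper methods. The module names RemarkPredicates and RefinedDemo are hypothetical, and the refinement is activated inside a module body so that it stays scoped to that module.

```ruby
# Hypothetical refinement module: adds helper predicates to String
# without changing String globally.
module RemarkPredicates
  refine String do
    def contains_a_digit?
      !!(self =~ /\d/)
    end

    def ends_with_questionmark?
      !!(self =~ /[\?]$/)
    end
  end
end

module RefinedDemo
  using RemarkPredicates # refinement is active until the end of this module body

  def self.check(remark)
    [remark.contains_a_digit?, remark.ends_with_questionmark?]
  end
end

p RefinedDemo.check('U2d?')

begin
  'U2d?'.contains_a_digit? # outside the refined scope
rescue NoMethodError
  puts 'String is unchanged outside RefinedDemo'
end
```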

Now the method Bob#hey can be defined in a natural way:

class Bob
  def hey(remark)
    case
    when remark.contains_no_digits? &&
         remark.contains_an_uppercase_char? &&
         remark.contains_no_lowercase_chars?
      'Whoa, chill out! (1st)'
    when remark.ends_with_questionmark?
      'Sure.'
    when remark.contains_a_digit? &&
         remark.contains_an_uppercase_char? &&
         remark.contains_no_lowercase_chars?
      'Whoa, chill out! (2nd)'
    else
      'Whatever.'
    end
  end
end

Let's try it.

bob = Bob.new
bob.hey("I PAID IN $US!") #=> "Whoa, chill out! (1st)"
bob.hey("What's that?") #=> "Sure."
bob.hey("I FLEW ON A 777!") #=> "Whoa, chill out! (2nd)"
bob.hey("I give up.") #=> "Whatever."

ruby regex: match and get position(s) of

Using Ruby 1.8.6+, you can do this:

require 'enumerator' # Only needed on 1.8.6; newer versions do not need this.

s = "AustinTexasDallasTexas"
positions = s.enum_for(:scan, /Texas/).map { Regexp.last_match.begin(0) }

This will create an array with:

=> [6, 17]
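
A different approach, sketched here on the assumption of Ruby 1.9 or later, loops with String#match and its start-position argument. Unlike scan, this version steps forward one character at a time, so it also reports overlapping matches. The helper name match_positions is hypothetical.

```ruby
# Hypothetical helper: start positions of all matches, including
# overlapping ones, found by re-searching from one past each match start.
def match_positions(string, regex)
  positions = []
  pos = 0
  while (m = string.match(regex, pos))
    positions << m.begin(0)
    pos = m.begin(0) + 1 # step one character past the start of this match
  end
  positions
end

p match_positions("AustinTexasDallasTexas", /Texas/)
p match_positions("aaaa", /aa/)
```

For non-overlapping patterns such as /Texas/ the result is the same as with the scan-based version.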

Trouble matching multiple patterns in Ruby (regex)

HEAD|POST and (HEAD|POST) match the same strings (either HEAD or POST); the second one captures the string while the first doesn't.

[HEAD|POST] matches a single character, any of ADEHOPST or |. So "This is HEAD and a POST".match("[HEAD|POST]") matches the single character T in This.

On the other hand, "This is HEAD 1 and a POST 2".match("[HEAD|POST] (.)") can't match the leading T because it isn't followed by a space - instead it matches the single D at the end of HEAD, plus the space and 1 following, capturing the 1.
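
The difference can be checked directly. The following sketch reuses the strings from the explanation above; the helper name first_match is hypothetical, and the last line uses the grouped alternation that was presumably intended.

```ruby
# Hypothetical helper: the whole first match followed by its captures,
# or nil when nothing matches.
def first_match(text, pattern)
  m = text.match(pattern)
  m && m.to_a
end

# Character class: matches one character out of A, D, E, H, O, P, S, T or |
p first_match("This is HEAD and a POST", "[HEAD|POST]")
p first_match("This is HEAD 1 and a POST 2", "[HEAD|POST] (.)")

# Grouped alternation: matches the whole word HEAD or POST
p "This is HEAD 1 and a POST 2".scan(/(HEAD|POST) (.)/)
```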

Ruby Regex: Return multiple matches after a pattern

Does this help?

s.scan(/-\s(\w+)\s/) 
#=> [["monday"], ["tuesday"], ["wednesday"], ["thursday"], ["friday"]]

Or:

s.scan(/-\s(\w+)\s/).map(&:first).join(" ") 
#=> "monday tuesday wednesday thursday friday"
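
One thing to watch with this pattern is the trailing \s: a word at the very end of the string is skipped unless whitespace follows it. The sketch below uses made-up input strings (the original s is not shown) and a hypothetical helper name.

```ruby
# Hypothetical helper: words that follow "- " and are followed by whitespace.
def words_after_dash(text)
  text.scan(/-\s(\w+)\s/).map(&:first)
end

p words_after_dash("- monday - tuesday - friday\n") # trailing newline present
p words_after_dash("- monday - tuesday - friday")   # no trailing whitespace
```

If the last word may end the string, dropping the final \s from the regex is one way to include it.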

regular expression in ruby for strings with multiple patterns

Assumptions:

  • Location and time start with @, and @ appears nowhere else.
  • Date starts with "on" surrounded by mandatory whitespace, and "on" appears nowhere else.
  • Task is mandatory.
  • Location and date are optional and independent of one another.
  • Time appears only when there is a date.
  • Task, location, date, and time appear only in this order.

Also, it should be taken for granted that the regex engine is oniguruma since named capture is mentioned.

regex = /
  (?<task>.*?)
  (?:\s*@\s*(?<location>.*?))?
  (?:\s+on\s+(?<date>.*?)
    (?:\s*@\s*(?<time>.*))?
  )?
\z/x

string4.match(regex)
# => #<MatchData
  "bike wash @ bike point on 13 may 11 @ 10 AM"
  task: "bike wash"
  location: "bike point"
  date: "13 may 11"
  time: "10 AM"
>
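
To see how the optional parts behave, the regex can be wrapped in a small helper that returns the named captures as a hash. This is a sketch under the assumptions listed above; the constant TASK_REGEX, the helper parse_task, and the shorter input strings are made up for illustration.

```ruby
# Same regex as above, stored in a (hypothetical) constant.
TASK_REGEX = /
  (?<task>.*?)
  (?:\s*@\s*(?<location>.*?))?
  (?:\s+on\s+(?<date>.*?)
    (?:\s*@\s*(?<time>.*))?
  )?
\z/x

# Hypothetical helper: named captures as a hash, nil for absent parts.
def parse_task(line)
  m = line.match(TASK_REGEX)
  return nil unless m
  m.names.zip(m.captures).to_h
end

p parse_task("bike wash @ bike point on 13 may 11 @ 10 AM")
p parse_task("bike wash on 13 may 11")
p parse_task("bike wash")
```

Parts that are missing from the input come back as nil rather than as empty strings.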

