How to Find Out the Starting Point for Each Match in Ruby

How to find out the starting point for each match in Ruby

Getting the offset of every match is not trivial, and it has been discussed in several other questions on SO. This is the most common solution:

string = "#Sachin is Indian cricketer. #Tendulkar is right hand batsman. #Sachin has been honoured with the Padma Vibhushan award "
new_string = string.to_enum(:scan,/#\S+/i).inject(''){|s,m| s + "#{m}|#{$`.size}|#{m.length};"}
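If structured data is easier to work with than a delimited string, the same idea can be sketched with the block form of scan. The helper name match_spans and the short sample string are hypothetical, chosen only for illustration:

```ruby
# Hypothetical helper: one hash per match with its text, start offset and length.
def match_spans(string, pattern)
  spans = []
  string.scan(pattern) do
    m = Regexp.last_match # MatchData for the match currently being yielded
    spans << { text: m[0], start: m.begin(0), length: m[0].length }
  end
  spans
end

spans = match_spans("#a bb #cc", /#\S+/)
# spans.map { |s| s[:start] } gives [0, 6]
```

Offsets here are character offsets into the original string, the same values that $`.size yields in the one-liner above.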

Ruby regex match starting at specific position

How about using StringScanner, as below?

require 'strscan'

scanner = StringScanner.new 'xay'
scanner.pos = 1
!!scanner.scan(/a/) # => true

scanner = StringScanner.new 'xnnay'
scanner.pos = 1
!!scanner.scan(/a/) # => false
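The two checks above could be wrapped in a small helper. This is only a sketch: match_at? is a hypothetical name, and it uses StringScanner#match?, which tests the pattern at the scan pointer without advancing it. Note that pos is a byte offset, so this assumes ASCII or otherwise single-byte text.

```ruby
require 'strscan'

# Hypothetical helper: true if `pattern` matches starting exactly at byte offset `pos`.
def match_at?(string, pattern, pos)
  scanner = StringScanner.new(string)
  scanner.pos = pos
  # match? returns the match length, or nil when the pattern does not match at pos
  !scanner.match?(pattern).nil?
end

match_at?('xay', /a/, 1)   # true: "a" sits at offset 1
match_at?('xnnay', /a/, 1) # false: offset 1 holds "n"
```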

How to match all occurrences of a regular expression in Ruby

Using scan should do the trick:

string.scan(/regex/)
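As a quick illustration (the sample strings are made up), scan returns an array of matched strings when the pattern has no capture groups, and an array of group arrays when it does:

```ruby
# Without groups: each element is the full text of one match.
numbers = "a1b22c333".scan(/\d+/)
# numbers is ["1", "22", "333"]

# With groups: each element is an array holding the captures of one match.
pairs = "k1=v1;k2=v2".scan(/(\w+)=(\w+)/)
# pairs is [["k1", "v1"], ["k2", "v2"]]
```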

Ruby Regexp#match to match start of string with given position (Python re-like)

How about using a string slice?

/^qwe/.match(test_string[def_pos..-1])

The pos parameter tells the regex engine where to start the match, but it doesn't change the behaviour of the start-of-line (and other) anchors. ^ still only matches at the start of a line (and qwe_pos is still in the middle of test_string).

Also, in Ruby, \A is the "start-of-string" anchor, \z is the "end-of-string" anchor. ^ and $ match starts/ends of lines, too, and there is no option to change that behavior (which is special to Ruby, just like the charmingly confusing use of (?m) which does what (?s) does in other regex flavors)...
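A small sketch of the difference, using a made-up sample string and offset. The last line uses the \G anchor, which is not part of the answer above: according to the Ruby Regexp documentation it matches where the search begins when an offset is passed to match, so it may serve as an alternative to slicing.

```ruby
test_string = "abcdefqwe" # hypothetical sample
qwe_pos = 6               # hypothetical offset where "qwe" starts

# ^ is a start-of-line anchor, so passing pos does not make it match mid-line.
with_pos = /^qwe/.match(test_string, qwe_pos)         # nil

# Slicing creates a new string that really does start with "qwe".
sliced = /^qwe/.match(test_string[qwe_pos..-1])       # matches, offsets relative to the slice

# \G anchors at the position where the search starts.
anchored = /\Gqwe/.match(test_string, qwe_pos)        # matches, anchored.begin(0) is 6
```

One trade-off: with the slice, reported offsets are relative to the slice, while with \G and pos they stay relative to the original string.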

ruby regex: match and get position(s) of

Using Ruby 1.8.6+, you can do this:

require 'enumerator' # Only needed on 1.8.6; newer versions do not need this.

s = "AustinTexasDallasTexas"
positions = s.enum_for(:scan, /Texas/).map { Regexp.last_match.begin(0) }

This will create an array with:

=> [6, 17]

Match only beginning of line in Ruby regexp

In Ruby, the caret and dollar always match before and after newlines. Ruby does not have a modifier to change this. Use \A to match at the start of the string and \z to match at the very end of the string; \Z also matches just before a trailing newline.

See: http://www.regular-expressions.info/ruby.html
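A short sketch of how these anchors differ, using made-up strings:

```ruby
text = "foo\nbar"

caret_hit   = text =~ /^bar/   # 4: ^ matches right after the newline
string_miss = text =~ /\Abar/  # nil: \A only matches at the very start of the string

line = "foo\n"

dollar_hit  = line =~ /foo$/   # 0: $ matches before the newline
big_z_hit   = line =~ /foo\Z/  # 0: \Z tolerates one trailing newline
small_z_hit = line =~ /foo\z/  # nil: \z requires the true end of the string
```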

How do I get the match data for all occurrences of a Ruby regular expression in a string?

You want

"abc12def34ghijklmno567pqrs".to_enum(:scan, /\d+/).map { Regexp.last_match }

which gives you

[#<MatchData "12">, #<MatchData "34">, #<MatchData "567">] 

The "trick" is, as you see, to build an enumerator in order to get each last_match.

Ruby - Find REGEX match position and with the closest match apply the regex?

sample_text = 'lots of text'

regexes = [
/ stuff 1 /,
/ different stuff 2 /,
/ different stuff 3 /,
/ different stuff 4 /,
/ different stuff 5 /
]

infinity = Float::INFINITY
regex_to_use = regexes.min_by{ |re| sample_text.index(re) || infinity }

You just put the regexes into an array and try them one after another. The one with the lowest match index wins. In the above code, we classify regexes that don't match at all as infinitely away from the start of the string. If more than one regex with the same proximity is found, the first is returned.
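A runnable variant of the same idea is sketched below. The method name earliest_regex and the sample data are hypothetical, and returning nil when nothing matches is an added assumption; the snippet above would still return one of the regexes in that case.

```ruby
# Hypothetical helper: the regex whose first match starts earliest in `text`,
# or nil when none of the regexes match at all.
def earliest_regex(text, regexes)
  best = regexes.min_by { |re| text.index(re) || Float::INFINITY }
  best if best && text.index(best)
end

sample_text = "foo bar baz"
candidates = [/baz/, /bar/, /missing/]

earliest_regex(sample_text, candidates) # /bar/, which matches at index 4
earliest_regex(sample_text, [/missing/]) # nil
```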

Find both pattern and position of multiple regex matches in Ruby

Use MatchData, which is available as $~ inside the scan block:

string.scan(regex) do
  $1            # text captured by the first group
  $2            # text captured by the second group
  $~.offset(1)  # [start, end] offsets of $1
  $~.offset(2)  # [start, end] offsets of $2
end
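For a concrete picture, here is a sketch that collects both the captured text and its offsets for a two-group pattern. The helper name group_offsets, the hash keys and the sample input are hypothetical:

```ruby
# Hypothetical helper: for each match, pair every group's text with its [start, end] offsets.
def group_offsets(string, regex)
  results = []
  string.scan(regex) do
    m = Regexp.last_match # same object as $~
    results << { first: [m[1], m.offset(1)], second: [m[2], m.offset(2)] }
  end
  results
end

found = group_offsets("k1=v1;k2=v2", /(\w+)=(\w+)/)
# found[0] is { first: ["k1", [0, 2]], second: ["v1", [3, 5]] }
# found[1] is { first: ["k2", [6, 8]], second: ["v2", [9, 11]] }
```

The end offset is exclusive, so the captured text is string[start...end].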

