Extract Text Between Two Tags Using Regex in Ruby

Extract text between two tags using regex in Ruby

You can use:

html = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

html[/>(.*)</, 1]
#=> "Berlin-Treptow-Köpenick"
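The greedy .* above works here because the snippet holds a single element: (.*) runs to the last < in the string. Here is a minimal sketch, using a hypothetical two-element string, of what happens when there is more than one element. It also shows a negated character class, [^<]*, which stops at the first <:

```ruby
# Hypothetical input with two elements on one line
html = '<b>one</b><b>two</b>'

# Greedy: (.*) runs to the LAST "<", swallowing the tags in between
greedy = html[/>(.*)</, 1]
#=> "one</b><b>two"

# Negated class: [^<]* cannot cross a "<", so it stops at the first one
first = html[/>([^<]*)</, 1]
#=> "one"

# scan returns one [capture] array per match; flatten gives a plain list
all_texts = html.scan(/>([^<]+)</).flatten
#=> ["one", "two"]
```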

When your HTML partials are more complex, I recommend using a library like Nokogiri:

require 'nokogiri'

html = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

# Parse the fragment and read the text of the first <a> element
Nokogiri::HTML(html).at('a').text
#=> "Berlin-Treptow-Köpenick"

Ruby regex to capture part of content between two strings

Match all substrings between <code> and </code> and replace all \n with <br> in those matches only:

# "\n" must be double-quoted: '\n' in single quotes is a literal backslash followed by n
html = html.gsub(/<code>.*?<\/code>/m) { |match| match.gsub("\n", '<br>') }
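A small sketch, with hypothetical input, showing that only the newlines inside the <code> element are replaced. The block receives the matched text, and the inner gsub uses the double-quoted "\n" (a real newline), not the single-quoted '\n' (a backslash followed by n):

```ruby
html = "<p>a\nb</p><code>x\ny</code>"

# /m lets . match newlines so a <code> block can span lines;
# .*? stops at the first </code> instead of the last one
converted = html.gsub(%r{<code>.*?</code>}m) { |block| block.gsub("\n", '<br>') }
#=> "<p>a\nb</p><code>x<br>y</code>"
```

The newline inside the <p> element is left untouched, because it is never part of a match.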

Regular expression string between tags string

The regex for this problem is simple: /<(.*?)>/

For the array part, I would refer to the answer on how to use a one-line regular expression to get matched content.

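For an array of the insides of the tags, use scan with the same capture group and flatten the result (scan returns one array of captures per match):

"<wpf><xaml><wpf-controls>".scan(/<(.*?)>/).flatten
#=> ["wpf", "xaml", "wpf-controls"]

The ? makes .*? lazy, so each match stops at the first > rather than the last one. Wrapping the pattern as (?:<(.*?)>)* does not work with scan: a repeated capture group keeps only its last iteration, so you would get [["wpf-controls"], [nil]].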

Extract text between two strings repeating multiple times

Here are two ways to extract the desired substring, if it is present. We are given the following.

str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"
before_str = "abc;"
date_str = ";11-nov-2017"

I assume that the value of date_str appears at most once in str.

#1 Use a regular expression

r = /
.* # match any number of characters greedily
#{before_str} # match the content of the variable 'before_str'
(.*) # match any number characters greedily, in capture group 1
#{date_str} # match the content of the variable 'date_str'
/x # free-spacing regex definition mode
#=> /.*abc;(.*);11-nov-2017/x

str[r, 1]
#=> "222"

The key here is the .* at the beginning of the regular expression. Because it is greedy, it causes the next match to be the last instance of "abc;" (the value of before_str) that precedes ";11-nov-2017" (the value of date_str).
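To see why the leading .* matters, here is a sketch comparing the pattern with and without it on the same str. Regexp.escape is added as a precaution, since before_str and date_str are plain strings rather than patterns:

```ruby
str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"
before_str = "abc;"
date_str = ";11-nov-2017"

b = Regexp.escape(before_str)
d = Regexp.escape(date_str)

# Without the leading .*, the match starts at the FIRST "abc;"
without_prefix = str[/#{b}(.*)#{d}/, 1]
#=> "111;10-nov-2017 2;abc;222"

# With it, greedy backtracking leaves only the LAST "abc;" before date_str
with_prefix = str[/.*#{b}(.*)#{d}/, 1]
#=> "222"
```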

#2 Determine indices for the beginning and end of the desired substring

idx_date = str.index(date_str)
#=> str.index(";11-nov-2017") => 31
idx_before = idx_date && str.rindex(before_str, idx_date - before_str.size)
#=> str.rindex("abc;", 27) => 24
idx_before && str[idx_before + before_str.size..idx_date - 1]
#=> str[24+4..31-1] => str[28..30] => "222"

If either idx_date or idx_before is nil, nil is returned and the slice in the last expression is never evaluated.

See String#rindex, especially the function of the optional second argument.

(One could write str[idx_before + before_str.size...idx_date], but I find three-dot ranges a potential source of error, so I always use two dots.)
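A quick check of the two range forms, with the indices computed as in #2: a two-dot range includes its end index, a three-dot range excludes it, so the end index differs by one:

```ruby
str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"
before_str = "abc;"
date_str = ";11-nov-2017"

idx_date = str.index(date_str)                                  # 31
idx_before = str.rindex(before_str, idx_date - before_str.size) # 24

two_dots = str[idx_before + before_str.size..idx_date - 1]  # end index included
three_dots = str[idx_before + before_str.size...idx_date]   # end index excluded
#=> both "222"
```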

Ruby regex: extract text between quotes

I think you want:


It works in Rubular.
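A minimal sketch, assuming the goal is to pull out double-quoted substrings and that the quoted text contains no escaped quotes. A negated character class keeps each match from running past the closing quote:

```ruby
text = 'say "hello" and "good bye"'  # hypothetical input

# "([^"]*)" : an opening quote, a captured run of non-quote chars, a closing quote
quoted = text.scan(/"([^"]*)"/).flatten
#=> ["hello", "good bye"]

# Only the first quoted string
first_quoted = text[/"([^"]*)"/, 1]
#=> "hello"
```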

ruby, using regex to find something in between two strings

Between 1st + and 1st @:


Between 1st + and last @:


Between last + and last @:


Between last + and 1st @:


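A sketch of the four variants, assuming a hypothetical input in which the first @ follows the last +. The differences come down to where the match starts (a leading greedy .* pushes it to the last +) and whether the capture is lazy (.*?, stops at the first @) or greedy (.*, runs to the last @):

```ruby
s = 'a+b+c@d@e'  # hypothetical input

first_plus_first_at = s[/\+(.*?)@/, 1]    #=> "b+c"
first_plus_last_at  = s[/\+(.*)@/, 1]     #=> "b+c@d"
last_plus_last_at   = s[/.*\+(.*)@/, 1]   #=> "c@d"
last_plus_first_at  = s[/.*\+(.*?)@/, 1]  #=> "c"
```

The + must be escaped as \+ because an unescaped + is a quantifier.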
Ruby Regex to capture everything between two strings (inclusive)

I believe you're looking for a non-greedy regex, like this:

/<div class="the_class">(.*?)<\/div>/m

Note the added ?. Now the capturing group captures as little as possible (non-greedy) instead of as much as possible (greedy).
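A sketch with a hypothetical two-div string contrasting the greedy and non-greedy forms. Because the question asks for an inclusive match, it also shows the whole match (tags included), which String#[] returns when no group index is given:

```ruby
html = %(<div class="the_class">first</div>\n<div class="the_class">second</div>)

# Greedy: runs from the first opening tag to the LAST </div>
greedy = html[/<div class="the_class">(.*)<\/div>/m, 1]
#=> "first</div>\n<div class=\"the_class\">second"

# Non-greedy: stops at the first </div>
lazy = html[/<div class="the_class">(.*?)<\/div>/m, 1]
#=> "first"

# Whole match, tags included
inclusive = html[/<div class="the_class">.*?<\/div>/m]
#=> "<div class=\"the_class\">first</div>"

# Every block, tags included (no groups, so scan returns whole matches)
all_blocks = html.scan(/<div class="the_class">.*?<\/div>/m)
```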

How to split by HTML tags using a regex

If you really need to use regex to do this, you pretty much had it already.

irb(main):010:0> string.split(/<span.+?span>/)
=> ["Energia Eltrica kWh", " 10.942 ", " 0,74999294 ", " 8.206,39"]

You just needed the ? to tell it to match as little as possible.
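A self-contained sketch, using a hypothetical string shaped like the one above, comparing the greedy .+ with the lazy .+?:

```ruby
string = 'Total kWh<span class="a">x</span> 10.942 <span>y</span> 0,74999294 <span>z</span> 8.206,39'

# Greedy: a single match from the first "<span" to the last "span>"
greedy_parts = string.split(/<span.+span>/)
#=> ["Total kWh", " 8.206,39"]

# Lazy: each match ends at the nearest "span>"
lazy_parts = string.split(/<span.+?span>/)
#=> ["Total kWh", " 10.942 ", " 0,74999294 ", " 8.206,39"]
```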

How to return the substring of a string between two strings in Ruby?

input_string = "blahblahblahSTARTfoofoofooENDwowowowowo"
str1_markerstring = "START"
str2_markerstring = "END"

input_string[/#{str1_markerstring}(.*?)#{str2_markerstring}/m, 1]
#=> "foofoofoo"
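The /m flag matters when the markers sit on different lines, because without it . does not match a newline. A sketch with a hypothetical multi-line input:

```ruby
text = "xxSTARTfoo\nbarENDyy"

with_m = text[/START(.*?)END/m, 1]
#=> "foo\nbar"

without_m = text[/START(.*?)END/, 1]
#=> nil (the newline stops .*? before END is reached)
```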

or to put it in a method:

class String
  # Returns the text between the first marker1 and the next marker2, or nil
  def string_between_markers(marker1, marker2)
    self[/#{Regexp.escape(marker1)}(.*?)#{Regexp.escape(marker2)}/m, 1]
  end
end

"blahblahblahSTARTfoofoofooENDwowowowowo".string_between_markers("START", "END")
#=> "foofoofoo"
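The Regexp.escape calls are what let markers containing regex metacharacters match literally; unescaped, "[" and "]" would be read as the bounds of a character class. A standalone sketch (between_markers is a hypothetical helper, not a built-in String method) with such markers:

```ruby
# Hypothetical helper mirroring string_between_markers without reopening String
def between_markers(str, marker1, marker2)
  str[/#{Regexp.escape(marker1)}(.*?)#{Regexp.escape(marker2)}/m, 1]
end

between_markers("key[value]rest", "[", "]")   #=> "value"
between_markers("f(x) + g(y)", "(", ")")      #=> "x"
between_markers("no markers here", "[", "]")  #=> nil
```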

