Extract Text Between Two Tags Using Regex in Ruby

Extract text between two tags using regex in Ruby

You can use:

html = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

html[/>(.*)</, 1]
#=> "Berlin-Treptow-Köpenick"

When your HTML partials are more complex, I recommend using a library such as Nokogiri:

html = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

require 'nokogiri'

Nokogiri::HTML(html).text
#=> "Berlin-Treptow-Köpenick"
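
The `>(.*)<` pattern relies on the string holding exactly one tag pair. Here is a minimal sketch of what happens with two sibling links, with a lazy, tag-anchored pattern as one possible alternative. The sample string and variable names are made up for illustration:

```ruby
# Hypothetical input with two sibling links (not from the original question).
two_links = '<a href="x.html">Berlin</a> <a href="y.html">Hamburg</a>'

# Greedy: runs from the first ">" to the last "<", swallowing the markup in between.
greedy = two_links[/>(.*)</, 1]
#=> "Berlin</a> <a href=\"y.html\">Hamburg"

# Lazy and anchored on the tag name: one capture per link.
link_texts = two_links.scan(/<a\b[^>]*>(.*?)<\/a>/m).flatten
#=> ["Berlin", "Hamburg"]
```

This still breaks on nested or malformed markup, which is where a real parser earns its keep.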

Ruby regex to capture part of content between two strings

Match all substrings between <code> and </code> and replace all \n with <br> in those matches only:

html = html.gsub(/<code>.*?<\/code>/m) { $~[0].gsub("\n", '<br>') }
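
A small self-contained check of the same idea, assuming the newlines are real newline characters (hence the double-quoted "\n"; in single quotes, '\n' is a backslash followed by the letter n). The sample string is hypothetical:

```ruby
# Hypothetical input: one newline outside <code>, one inside.
sample = "<p>one\ntwo</p><code>a\nb</code>"

# The block receives each matched <code>...</code> span; only that span is rewritten.
converted = sample.gsub(/<code>.*?<\/code>/m) { |block| block.gsub("\n", "<br>") }
#=> "<p>one\ntwo</p><code>a<br>b</code>"
```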

Regular expression string between tags string

The regex for this problem is really simple: /<(.*?)>/

For the array part, I would refer to the answer on how to use a one-line regular expression to get matched content.

EDIT:
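for an array of the insides of the tags, use "<wpf><xaml><wpf-controls>".scan(/<(.*?)>/).flatten

scan returns one match per tag, and flatten unwraps the one-element capture arrays into ["wpf", "xaml", "wpf-controls"]. Wrapping the tag in a repeated group such as (?:<(.*?)>)* would not help, because a capture group inside a repetition keeps only its last iteration.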
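
As a quick sketch of how String#scan treats capture groups (variable names are illustrative only): with a group it returns the captures, without one it returns the whole matches.

```ruby
tags = "<wpf><xaml><wpf-controls>"

# With a capture group, scan returns an array of capture arrays.
with_group = tags.scan(/<(.*?)>/)
#=> [["wpf"], ["xaml"], ["wpf-controls"]]

# flatten turns that into a plain list of tag names.
names = with_group.flatten
#=> ["wpf", "xaml", "wpf-controls"]

# Without a capture group, scan returns the full matched text.
whole_tags = tags.scan(/<.*?>/)
#=> ["<wpf>", "<xaml>", "<wpf-controls>"]
```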

Extract text between two strings repeating multiple times

Here are two ways to extract the desired substring, if it is present. We are given the following.

str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"
before_str = "abc;"
date_str = ";11-nov-2017"

I assume that the value of date_str appears at most once in str.

#1 Use a regular expression

r = /
    .*             # match any number of characters greedily
    #{before_str}  # match the content of the variable 'before_str'
    (.*)           # match any number of characters greedily, in capture group 1
    #{date_str}    # match the content of the variable 'date_str'
    /x             # free-spacing regex definition mode
  #=> /.*abc;(.*);11-nov-2017/x

str[r,1]
#=> "222"

The key here is .* at the beginning of the regular expression. Because it is greedy, it forces the following match of "abc;" (the value of before_str) to be the last instance that precedes ";11-nov-2017" (the value of date_str).
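
To see the effect of the leading .*, here is a minimal sketch that wraps the pattern in a helper. The method name last_between is hypothetical, and Regexp.escape is added on the assumption that the markers should be matched literally:

```ruby
# Hypothetical helper: text between the LAST `before` preceding `after`, and `after`.
def last_between(str, before, after)
  str[/.*#{Regexp.escape(before)}(.*)#{Regexp.escape(after)}/, 1]
end

str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"

with_lead = last_between(str, "abc;", ";11-nov-2017")
#=> "222"

# Without the leading .*, the match starts at the FIRST "abc;" instead.
without_lead = str[/abc;(.*);11-nov-2017/, 1]
#=> "111;10-nov-2017 2;abc;222"
```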

#2 Determine indices for the beginning and end of the desired substring

idx_date = str.index(date_str)
  #=> str.index(";11-nov-2017") => 31
idx_before = idx_date && str.rindex(before_str, idx_date - before_str.size)
  #=> str.rindex("abc;", 27) => 24
idx_before && str[idx_before + before_str.size..idx_date - 1]
  #=> str[24+4..31-1] => str[28..30] => "222"

If either idx_date or idx_before is nil, the && guards short-circuit, so nil is returned and the last expression is not evaluated.

See String#rindex, especially the function of the optional second argument.

(One could write str[idx_before + before_str.size...idx_date], but I find the use of three dots in ranges to be a potential source of error, so I always use two dots.)
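
The index-based approach can be sketched as a small method with explicit nil handling. The name between_by_index and the early returns are assumptions for illustration, not part of the original answer:

```ruby
# Hypothetical helper: same result as the index arithmetic above, or nil if a marker is missing.
def between_by_index(str, before, after)
  idx_after = str.index(after)
  # No `after` marker, or no room for `before` ahead of it.
  return nil if idx_after.nil? || idx_after < before.size

  # Search backwards, starting where `before` could end right at `after`.
  idx_before = str.rindex(before, idx_after - before.size)
  return nil if idx_before.nil?

  str[(idx_before + before.size)..(idx_after - 1)]
end

str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"
found = between_by_index(str, "abc;", ";11-nov-2017")
#=> "222"
```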

Ruby regex: extract text between quotes

I think you want:

text.scan(/"([^"]*)"/)

It works in Rubular.
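
A short sketch on a made-up string: because the pattern has a capture group, scan returns nested arrays, so flatten is one way to get a flat list of the quoted parts.

```ruby
# Hypothetical sample text with two quoted phrases.
text = 'He said "hello" and then "goodbye".'

nested = text.scan(/"([^"]*)"/)
#=> [["hello"], ["goodbye"]]

quoted = nested.flatten
#=> ["hello", "goodbye"]
```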

ruby, using regex to find something in between two strings

Between 1st + and 1st @:

to[/\+(.*?)@/,1]

Between 1st + and last @:

to[/\+(.*)@/,1]

Between last + and last @:

to[/.*\+(.*)@/,1]

Between last + and 1st @:

to[/.*\+(.*?)@/,1]
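
The four variants side by side, on a made-up value of to that contains two + signs and two @ signs:

```ruby
# Hypothetical sample with two "+" and two "@" characters.
to = "a+b+c@d@e"

first_plus_first_at = to[/\+(.*?)@/, 1]    #=> "b+c"
first_plus_last_at  = to[/\+(.*)@/, 1]     #=> "b+c@d"
last_plus_last_at   = to[/.*\+(.*)@/, 1]   #=> "c@d"
last_plus_first_at  = to[/.*\+(.*?)@/, 1]  #=> "c"
```

In the last case, "1st @" means the first @ that follows the last +.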

Ruby Regex to capture everything between two strings (inclusive)

I believe you're looking for a non-greedy regex, like this:

/<div class="the_class">(.*?)<\/div>/m

Note the added ?. Now the capturing group will capture as little as possible (non-greedy), instead of as much as possible (greedy).
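
A minimal sketch of the difference, using a made-up string with two such divs:

```ruby
# Hypothetical input with two divs of the same class.
html = '<div class="the_class">one</div><div class="the_class">two</div>'

# Greedy: the group stretches to the LAST closing tag.
greedy = html[/<div class="the_class">(.*)<\/div>/m, 1]
#=> "one</div><div class=\"the_class\">two"

# Non-greedy: each group stops at the nearest closing tag.
lazy = html.scan(/<div class="the_class">(.*?)<\/div>/m).flatten
#=> ["one", "two"]
```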

How to split by HTML tags using a regex

If you really need to use regex to do this, you pretty much had it already.

irb(main):010:0> string.split(/<span.+?span>/)
=> ["Energia Eltrica kWh", " 10.942 ", " 0,74999294 ", " 8.206,39"]

You just needed the ? to tell it to match as little as possible.
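
The same effect on a shortened, made-up string (the original input is not shown in the answer):

```ruby
# Hypothetical input: three values separated by <span> elements.
string = "Energia<span>x</span>10<span>y</span>20"

# Lazy: each <span>...</span> is a separate delimiter.
lazy_parts = string.split(/<span.+?span>/)
#=> ["Energia", "10", "20"]

# Greedy: everything from the first <span to the last span> is one delimiter.
greedy_parts = string.split(/<span.+span>/)
#=> ["Energia", "20"]
```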

How to return the substring of a string between two strings in Ruby?

input_string = "blahblahblahSTARTfoofoofooENDwowowowowo"
str1_markerstring = "START"
str2_markerstring = "END"

input_string[/#{str1_markerstring}(.*?)#{str2_markerstring}/m, 1]
#=> "foofoofoo"

or to put it in a method:

class String
  def string_between_markers marker1, marker2
    self[/#{Regexp.escape(marker1)}(.*?)#{Regexp.escape(marker2)}/m, 1]
  end
end

"blahblahblahSTARTfoofoofooENDwowowowowo".string_between_markers("START", "END")
#=> "foofoofoo"
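
If reopening String is not desirable, the same logic can be sketched as a standalone method. The name text_between is hypothetical. The example uses parentheses as markers to show why Regexp.escape matters: unescaped, they would be read as regex grouping rather than literal characters.

```ruby
# Hypothetical standalone variant of string_between_markers.
def text_between(str, open_marker, close_marker)
  # Regexp.escape makes metacharacters in the markers match literally.
  str[/#{Regexp.escape(open_marker)}(.*?)#{Regexp.escape(close_marker)}/m, 1]
end

price = text_between("price: (42) USD", "(", ")")
#=> "42"

missing = text_between("no markers here", "(", ")")
#=> nil
```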

