Extract text between two tags using regex in Ruby
You can use:
html = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'
html[/>(.*)</, 1]
#=> "Berlin-Treptow-Köpenick"
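One caveat with the one-liner above: `.*` is greedy, so with more than one tag on a line it captures everything from the first `>` to the last `<`. A minimal sketch of the difference, assuming a hypothetical two-link string that is not part of the original answer:

```ruby
# Hypothetical input with two links on one line (illustration only).
two_links = '<a href="a.html">Berlin</a> <a href="b.html">Hamburg</a>'

# Greedy: runs from the first ">" to the last "<".
greedy = two_links[/>(.*)</, 1]
# greedy == 'Berlin</a> <a href="b.html">Hamburg'

# Negated character class: stops at the first "<" after the opening tag.
first_text = two_links[/>([^<]*)</, 1]
# first_text == "Berlin"

# Collect the text of every link at once.
all_texts = two_links.scan(/<a\b[^>]*>([^<]*)<\/a>/).flatten
# all_texts == ["Berlin", "Hamburg"]
```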
When your HTML fragments are more complex, I recommend using a library such as Nokogiri:
html = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'
require 'nokogiri'
Nokogiri::HTML(html).text
#=> "Berlin-Treptow-Köpenick"
Ruby regex to capture part of content between two strings
Match all substrings between <code> and </code>, and replace every \n with <br> in those matches only:
# "\n" must be double-quoted; '\n' in single quotes is a literal backslash followed by n
html = html.gsub(/<code>.*?<\/code>/m) { |code| code.gsub("\n", '<br>') }
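A small sketch of the block form of gsub on a sample string. The sample text and variable names are assumptions made for illustration. Only a double-quoted "\n" denotes a newline in Ruby, which the last two lines check:

```ruby
# Hypothetical sample: one newline outside <code>, two inside it.
sample = "intro\n<code>a = 1\nb = 2\n</code>"

# The outer gsub finds each <code>...</code> span (/m lets "." match newlines);
# the inner gsub rewrites newlines inside that span only.
converted = sample.gsub(/<code>.*?<\/code>/m) do |code_block|
  code_block.gsub("\n", "<br>")
end
# converted == "intro\n<code>a = 1<br>b = 2<br></code>"

single_quoted_length = '\n'.length  # 2: a backslash and the letter n
double_quoted_length = "\n".length  # 1: a real newline
```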
Regular expression string between tags string
The regex for this problem is really simple: /<(.*?)>/
For the array part, I would refer to the answer on how to use a one-line regular expression to get matched content.
EDIT:
For an array of the insides of the tags, use
"<wpf><xaml><wpf-controls>".scan(/<(.*?)>/).flatten
#=> ["wpf", "xaml", "wpf-controls"]
scan already repeats the match across the whole string, so the tag does not need to be wrapped in a repeated (?: .. )* group; a capture inside a repeated group keeps only its last repetition.
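To see what scan returns for the tag string, and how a capture inside a repeated group behaves, here is a short sketch. The variable names are illustrative only:

```ruby
tags = "<wpf><xaml><wpf-controls>"

# scan walks the string itself and returns one capture array per match.
nested = tags.scan(/<(.*?)>/)
# nested == [["wpf"], ["xaml"], ["wpf-controls"]]

# flatten turns the nested result into a plain array of tag names.
names = tags.scan(/<(.*?)>/).flatten
# names == ["wpf", "xaml", "wpf-controls"]

# A capture group inside a repeated group keeps only its last repetition.
last_only = tags[/(?:<(.*?)>)*/, 1]
# last_only == "wpf-controls"
```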
Extract text between two strings repeating multiple times
Here are two ways to extract the desired substring, if it is present. We are given the following.
str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"
before_str = "abc;"
date_str = ";11-nov-2017"
I assume that the value of date_str appears at most once in str.
#1 Use a regular expression
r = /
    .*             # match any number of characters greedily
    #{before_str}  # match the content of the variable 'before_str'
    (.*)           # match any number of characters greedily, in capture group 1
    #{date_str}    # match the content of the variable 'date_str'
    /x             # free-spacing regex definition mode
#=> /.*abc;(.*);11-nov-2017/x
str[r,1]
#=> "222"
The key here is .* at the beginning of the regular expression. Being a greedy match, it causes the next match to be the last instance of "abc;" (the value of before_str) that precedes ";11-nov-2017" (the value of date_str).
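The effect of the leading .* can be checked by running the pattern with and without it. The sketch below also wraps the idea in a helper; the method name between_last is hypothetical, and Regexp.escape is added on the assumption that the marker strings may contain regex metacharacters:

```ruby
# Hypothetical helper: text between the last `before_str` that precedes
# `after_str` and `after_str` itself, or nil if there is no match.
def between_last(str, before_str, after_str)
  pattern = /.*#{Regexp.escape(before_str)}(.*)#{Regexp.escape(after_str)}/
  str[pattern, 1]
end

str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"

with_prefix = between_last(str, "abc;", ";11-nov-2017")
# with_prefix == "222"

# Without the leading .*, matching starts at the FIRST "abc;".
without_prefix = str[/abc;(.*);11-nov-2017/, 1]
# without_prefix == "111;10-nov-2017 2;abc;222"

missing = between_last(str, "abc;", ";13-nov-2017")
# missing == nil
```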
#2 Determine indices for the beginning and end of the desired substring
idx_date = str.index(date_str)
#=> str.index(";11-nov-2017") => 31
idx_before = str.rindex(before_str, idx_date-before_str.size)
#=> str.rindex("abc;", 27) => 24
str[idx_before + before_str.size..idx_date-1]
#=> str[24+4..31-1] => str[28..30] => "222"
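If either idx_date or idx_before were nil, the desired substring would not be present. As written, the lines above would then raise a NoMethodError on the nil index, so check for nil after each lookup and return nil before evaluating the last expression.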
See String#rindex, especially the function of the optional second argument.
(One could write str[idx_before + before_str.size...idx_date], but I find the use of three dots in ranges to be a potential source of error, so I always use two dots.)
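The index-based steps can be wrapped in a method that returns nil when either marker is missing. This is a sketch under stated assumptions: the method name between_by_index and the explicit nil guards are illustrative additions, not part of the original answer:

```ruby
# Hypothetical helper: same result as the three index steps, with nil guards.
def between_by_index(str, before_str, after_str)
  idx_after = str.index(after_str)
  return nil if idx_after.nil?

  # Latest position at which before_str may start and still end by idx_after.
  start_limit = idx_after - before_str.size
  # A negative start would make rindex count from the end of the string.
  return nil if start_limit.negative?

  idx_before = str.rindex(before_str, start_limit)
  return nil if idx_before.nil?

  str[idx_before + before_str.size..idx_after - 1]
end

str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"

found      = between_by_index(str, "abc;", ";11-nov-2017")  # "222"
no_date    = between_by_index(str, "abc;", ";13-nov-2017")  # nil
no_before  = between_by_index(str, "xyz;", ";11-nov-2017")  # nil
```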
Ruby regex: extract text between quotes
I think you want:
text.scan(/"([^"]*)"/)
It works in Rubular.
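Because the pattern contains a capture group, scan returns an array of one-element arrays; flatten gives a plain list of the quoted strings. A brief sketch with a made-up sample sentence:

```ruby
# Hypothetical sample text (illustration only).
text = 'He said "hello" and then "good bye", followed by "".'

pairs = text.scan(/"([^"]*)"/)
# pairs == [["hello"], ["good bye"], [""]]

quoted = text.scan(/"([^"]*)"/).flatten
# quoted == ["hello", "good bye", ""]
```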
ruby, using regex to find something in between two strings
Between 1st + and 1st @:
to[/\+(.*?)@/,1]
Between 1st + and last @:
to[/\+(.*)@/,1]
Between last + and last @:
to[/.*\+(.*)@/,1]
Between last + and 1st @:
to[/.*\+(.*?)@/,1]
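Running all four patterns against one string makes the differences visible. The value of to below is a hypothetical sample with two + signs and two @ signs; note that "1st @" means the first @ after the chosen +:

```ruby
# Hypothetical sample with two "+" and two "@" characters.
to = "a+b+c@d@e.com"

first_plus_first_at = to[/\+(.*?)@/, 1]    # "b+c"
first_plus_last_at  = to[/\+(.*)@/, 1]     # "b+c@d"
last_plus_last_at   = to[/.*\+(.*)@/, 1]   # "c@d"
last_plus_first_at  = to[/.*\+(.*?)@/, 1]  # "c"
```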
Ruby Regex to capture everything between two strings (inclusive)
I believe you're looking for a non-greedy regex, like this:
/<div class="the_class">(.*?)<\/div>/m
Note the added ?. Now the capturing group will capture as little as possible (non-greedy), instead of as much as possible (greedy).
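A short comparison on a hypothetical two-div fragment shows what the ? changes. Omitting the capture index returns the whole match, tags included, which is the "inclusive" result the question title asks for:

```ruby
# Hypothetical fragment with two matching divs (illustration only).
html = <<~HTML
  <div class="the_class">first
  block</div>
  <div class="the_class">second</div>
HTML

# Non-greedy: stops at the first closing tag; /m lets "." match newlines.
lazy = html[/<div class="the_class">(.*?)<\/div>/m, 1]
# lazy == "first\nblock"

# Greedy: runs on to the last closing tag.
greedy = html[/<div class="the_class">(.*)<\/div>/m, 1]
# greedy == "first\nblock</div>\n<div class=\"the_class\">second"

# Whole match, including the surrounding tags.
inclusive = html[/<div class="the_class">.*?<\/div>/m]
# inclusive == "<div class=\"the_class\">first\nblock</div>"
```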
How to split by HTML tags using a regex
If you really need to use regex to do this, you pretty much had it already.
irb(main):010:0> string.split(/<span.+?span>/)
=> ["Energia Eltrica kWh", " 10.942 ", " 0,74999294 ", " 8.206,39"]
You just needed the ? to tell it to match as little as possible.
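The original input string is not shown, so the sketch below uses a hypothetical one with two span separators to compare the lazy and greedy forms of the split pattern:

```ruby
# Hypothetical input (the original string is not shown in the answer).
string = 'kWh<span class="sep">|</span> 10.942 <span class="sep">|</span> 8.206,39'

# Lazy: each <span ...>...</span> is matched separately.
lazy_parts = string.split(/<span.+?span>/)
# lazy_parts == ["kWh", " 10.942 ", " 8.206,39"]

# Greedy: one match from the first "<span" to the last "span>".
greedy_parts = string.split(/<span.+span>/)
# greedy_parts == ["kWh", " 8.206,39"]
```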
How to return the substring of a string between two strings in Ruby?
input_string = "blahblahblahSTARTfoofoofooENDwowowowowo"
str1_markerstring = "START"
str2_markerstring = "END"
input_string[/#{str1_markerstring}(.*?)#{str2_markerstring}/m, 1]
#=> "foofoofoo"
or to put it in a method:
class String
  # Returns the text between the first marker1 and the next marker2, or nil.
  def string_between_markers(marker1, marker2)
    self[/#{Regexp.escape(marker1)}(.*?)#{Regexp.escape(marker2)}/m, 1]
  end
end
"blahblahblahSTARTfoofoofooENDwowowowowo".string_between_markers("START", "END")
#=> "foofoofoo"