Disable HTML within XML escaping with Nokogiri
Because I don't have the Google Directions API installed, I can't access the XML, but I strongly suspect the problem comes from telling Nokogiri you're dealing with XML. As a result, it returns the HTML to you entity-encoded, as it should be inside XML.
You can unescape the HTML using something like:
CGI::unescape_html('Head &lt;b&gt;south&lt;/b&gt; on &lt;b&gt;Hidden Pond Dr&lt;/b&gt; toward &lt;b&gt;Ironwood Ct&lt;/b&gt;')
=> "Head <b>south</b> on <b>Hidden Pond Dr</b> toward <b>Ironwood Ct</b>"
unescape_html is an alias to unescapeHTML:
Unescape a string that has been HTML-escaped
CGI::unescapeHTML("Usage: foo &quot;bar&quot; &lt;baz&gt;")
# => "Usage: foo \"bar\" <baz>"
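As a rough illustration of the escaping round trip, here is a minimal sketch using only the standard library. The sample string is made up for the example; CGI.escapeHTML is used only to produce escaped input to feed back into CGI.unescapeHTML.

```ruby
require 'cgi'

# Hypothetical sample string; any text with HTML special characters works.
original = 'Head <b>south</b> on <b>Hidden Pond Dr</b>'

# Escape the markup the way it would appear inside an XML text node...
escaped = CGI.escapeHTML(original)

# ...and turn it back into plain markup.
unescaped = CGI.unescapeHTML(escaped)

puts escaped
puts unescaped
```

The second line printed should match the original string, since unescaping reverses the escaping step.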
I had to think about this a bit more. It's something I've run into, but it was one of those things that escaped me during the rush at work. The fix is simple: You're using the wrong method to retrieve the content. Instead of:
puts h.inner_html
Use:
puts h.text
I proved this using:
require 'httpclient'
require 'nokogiri'
# This URL comes from: https://developers.google.com/maps/documentation/directions/#XML
url = 'http://maps.googleapis.com/maps/api/directions/xml?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'
clnt = HTTPClient.new
doc = Nokogiri::XML(clnt.get_content(url))
doc.search('html_instructions').each do |html|
  puts html.text
end
Which outputs:
Head <b>south</b> on <b>S Federal St</b> toward <b>W Van Buren St</b>
Turn <b>right</b> onto <b>W Congress Pkwy</b>
Continue onto <b>I-290 W</b>
[...]
The difference is that inner_html reads the content of the node directly, without decoding, while text decodes it for you. text, to_str and inner_text are aliased to content internally in Nokogiri::XML::Node for our parsing pleasure.
Reading malformed XML with Nokogiri: Unescaped Ampersands in URL field
Had the same issue parsing SVGs with image links containing ampersands.
Parsing SVGs as HTML seems to correctly handle the links, escaping &.
fixed_svg = Nokogiri::HTML.fragment(raw_svg).to_html
# proceed with XML parsing
svg = Nokogiri::XML(fixed_svg)
How to save my changes in XML file with Nokogiri
Read the file into an in-memory XML document, modify the document as needed, then serialize the document back into the original file:
filename = 'exam.xml'
xml = File.read(filename)
doc = Nokogiri::XML(xml)
# ... make changes to doc ...
File.write(filename, doc.to_xml)
Preventing Nokogiri from escaping characters?
You are obliged to escape some characters in text elements like:
" &quot;
' &apos;
< &lt;
> &gt;
& &amp;
If you want your text verbatim use a CDATA section since everything inside a CDATA section is ignored by the parser.
Nokogiri example:
builder = Nokogiri::HTML::Builder.new do |b|
  b.html do
    b.head do
      b.cdata "<%= stylesheet_link_tag 'style'%>"
    end
  end
end
builder.to_html
This should keep your ERB tags intact!
How to get Nokogiri inner_HTML object to ignore/remove escape sequences
page.at_css("td[custom-attribute='foo']")
.parent
.css('td')
.css('a')
.text # since you need a text, not inner_html
.strip # this will strip a result
See String#strip.
Sidenote: css('td a') is likely more efficient than css('td').css('a').
How to unescape HTML in Nokogiri Ruby, so & remains & and not &amp;
Use content instead of inner_html to get the content as plain text instead of (X)HTML.
irb(main):011:0> doc.at('head/title').content
=> "Foo & Bar"