How to convert a nested hash into XML using Nokogiri
How about this?
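class Hash
  def to_xml
    map do |k, v|
      text = Hash === v ? v.to_xml : v
      "<%s>%s</%s>" % [k, text, k]
    end.join
  end
end
# sample input, inferred from the output shown below
h = { foo: "bar", foo1: { foo2: "bar2", foo3: "bar3", foo4: { foo5: "bar5" } } }
h.to_xml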
#=> "<foo>bar</foo><foo1><foo2>bar2</foo2><foo3>bar3</foo3><foo4><foo5>bar5</foo5></foo4></foo1>"
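Two caveats with the snippet above: text values are not escaped, and reopening Hash replaces the Hash#to_xml that ActiveSupport already defines in a Rails app. A minimal sketch of the same idea as a standalone method follows; the name hash_to_xml and the sample data are hypothetical, and String#encode(xml: :text) is used here for escaping.

```ruby
# Hypothetical helper (not from the original answer): same recursion as above,
# but as a plain method, and with text values escaped for XML.
def hash_to_xml(hash)
  hash.map do |key, value|
    body = value.is_a?(Hash) ? hash_to_xml(value) : value.to_s.encode(xml: :text)
    "<#{key}>#{body}</#{key}>"
  end.join
end

sample = { foo: "bar", foo1: { foo2: "bar2", foo3: "a < b & c" } }
xml = hash_to_xml(sample)
puts xml
# prints <foo>bar</foo><foo1><foo2>bar2</foo2><foo3>a &lt; b &amp; c</foo3></foo1>
```

Like the original, this does not validate keys as XML element names and does not handle arrays.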
Convert xml to hash using Nokogiri but keep the anchor tags
The only way I see now is to escape the HTML inside <description> throughout the whole document, and then call Hash.from_xml:
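require 'cgi'
require 'nokogiri'
require 'active_support'
require 'active_support/core_ext/hash/conversions' # provides Hash.from_xml
doc = File.open(xml_file) { |f| Nokogiri::XML(f) }
# escape HTML inside <description>
doc.css("description").each do |node|
  node.inner_html = CGI.escapeHTML(node.inner_html)
end
data = Hash.from_xml(doc.to_s) # =>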
# {"blah"=>
# {
# "tag"=>[
# {
# "name"=>"My Name",
# "url"=>"www.url.com",
# "file"=>"myfile.zip",
# "description"=>"Today is a <a href=\"www.sunny.com\">sunny</a>"
# },
# {
# "name"=>"Someones Name",
# "url"=>"www.url2.com",
# "file"=>"myfile2.zip",
# "description"=>"Today is a <a href=\"www.rainy.com\">rainy</a>"
# }
# ]
# }
# }
Nokogiri is used here just for HTML escaping. You don't really need it if you find some other way to escape. For example:
require 'cgi'
xml = File.read(xml_file)
# escaping inner HTML (maybe not the best way, just an example)
# block form, so that $1 is read for each match
xml.gsub!(/<description>(.*?)<\/description>/m) { "<description>#{CGI.escapeHTML($1)}</description>" }
data = Hash.from_xml(xml)
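One thing to watch with a regex-based escape: a replacement string containing #{...$1...} is interpolated once, before gsub runs, so it cannot see the current match. The sketch below uses the block form with a non-greedy pattern instead; the sample XML is a hypothetical input shaped like the output shown earlier.

```ruby
require "cgi"

# Hypothetical sample input, shaped like the document in the answer above.
xml = <<~XML
  <blah>
    <tag>
      <name>My Name</name>
      <description>Today is a <a href="www.sunny.com">sunny</a></description>
    </tag>
    <tag>
      <name>Someones Name</name>
      <description>Today is a <a href="www.rainy.com">rainy</a></description>
    </tag>
  </blah>
XML

# Non-greedy match plus block form: each <description> body is escaped on its own.
escaped = xml.gsub(%r{<description>(.*?)</description>}m) do
  "<description>#{CGI.escapeHTML(Regexp.last_match(1))}</description>"
end

puts escaped
```

The escaped string can then be handed to Hash.from_xml as in the answer, which should turn the entities back into the literal anchor markup inside the "description" values.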
How to build a node from a hash using Nokogiri
Okay so I realised building nodes from a hash wasn't my best option (in my case, I'd want to have the full XML structure, even if some nodes are empty because of missing hash content).
Therefore I'm using an XML template node that already contains the full structure I want, with only arrays of length 1. So rather than building new nodes, I'm duplicating the existing one as many times as I need (preprocessing), and then I replace the content.
Because this is only painful for arrays, let's assume that my contents variable is, for the very first call, only a hash that contains arrays (but these arrays' items can then be values, hashes, ...).
Duplicating template nodes of custom_xml
contents.each do |content, items|
  tmp = custom_xml.search(content.to_s) # Should be unique!
  raise "ERROR: multiple nodes match XPath //#{content}" if tmp.count > 1
  raise "ERROR: no node matches \"search #{content}\" DEBUG: #{custom_xml.serialize}" if tmp.count == 0
  array_node = tmp.first # <array><item>...</item></array>
  template_node = array_node.first_element_child
  # Okay, we have the customXML node corresponding to the first item of the content
  # We need to duplicate it as many times as needed
  items.each_with_index do |item, item_index|
    next if item_index == 0 # Skip the first one.
    array_node << template_node.dup
  end
end
Then, after this preprocessing is done, it's possible to actually substitute the variables of the array_node(s) by calling replace_node_vars_recursively(array_node, items).
Note that for the very first call we indeed have an array_node with items, but the recursive function needs to handle hashes and values as well. So let's use the word content to designate this data, and node for the XML node that receives it.
Recursively change the nodes' text with a "content"
require 'date' # for Date and DateTime

def replace_node_vars_recursively(node, content)
  if content.nil?
    puts "WARNING : nil content trying to be assigned to node #{node.name}"
  elsif content.is_a?(Hash)
    # Every key in content SHOULD have a matching child node !
    content.each do |key, val|
      candidates = node.search(key.to_s) # Should be unique !
      if candidates.count > 1
        puts "WARNING : multiple child_nodes match -->#{key}<--, skipping"
        next
      elsif candidates.count == 0
        puts "WARNING : No child node matches \"#{key}\" "
        next
      end
      replace_node_vars_recursively(candidates.first, val)
    end
  # Array recursion (note: the array contains either hashes or values)
  elsif content.is_a?(Array)
    # Let's rename the variables !
    array_items = content
    array_node = node
    # /!\ using just "children" would also return empty (whitespace) nodes !!!
    if array_items.count != array_node.element_children.count
      raise "ERROR : array length (#{array_items.count}) != number of nodes of #{array_node.name} (#{array_node.element_children.count}) !"
    end
    # Assume each item is another content hash. It wouldn't make sense (for me) to have just a value there...
    array_node.element_children.each_with_index do |child_node, index|
      replace_node_vars_recursively(child_node, array_items[index])
    end
  # Value termination
  elsif content.is_a?(String) || content.is_a?(Integer) || content.is_a?(Float) || content.is_a?(Symbol) || content.is_a?(Date) || content.is_a?(DateTime)
    node.content = content.to_s
    puts "Replacing variable #{node.name} by #{content}"
  else
    puts content
    raise "ERROR: unknown variable type for variable replacement !"
  end
end
Ruby hash to XML: How would I create duplicate keys in a hash for repeated XML xpaths?
Okay, I believe I figured it out. You have to use Hash#compare_by_identity. I believe this makes the hash compare keys by object identity (object id) rather than by string equality, so two equal but distinct strings can both be keys.
I found it in "Ruby Hash with duplicate keys?".
{}.compare_by_identity
h1 = {}
h1.compare_by_identity
h1["a"] = 1
h1["a"] = 2
p h1 # => {"a"=>1, "a"=>2}
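The "duplicate" keys only stay separate while they are distinct objects. A short sketch of the consequences follows, with hypothetical variable names; String.new is used so that the keys are guaranteed to be different objects, since under `# frozen_string_literal: true` two identical literals are the same object and would collapse into one key.

```ruby
# compare_by_identity switches key comparison from eql?/hash to object identity.
h = {}.compare_by_identity

k1 = String.new("a")
k2 = String.new("a") # equal content, different object

h[k1] = 1
h[k2] = 2
h[k1] = 3 # same object as before, so this overwrites instead of adding

p h.size                 # prints 2
p h[k1]                  # prints 3
p h[String.new("a")]     # prints nil: an equal string is not the same object
p h.to_a                 # prints [["a", 3], ["a", 2]]
```

The lookup returning nil is the trade-off: values can only be read back with the original key objects, so iterating over the hash is usually the practical way to consume it.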
Nokogiri XML to hash using attribute names
There is no automatic way to do this, because the structure of the XML does not match the structure of your required hash. You must pick out the desired nodes from the XML manually and construct the hash from their values. Using XPath is probably the easiest; the code might look something like this:
@details = []
detail_doc.xpath("/eSummaryResult/DocSum").each do |node|
  detail = {}
  detail[:title] = node.xpath("Item[@Name='Title']").text
  detail[:journal] = node.xpath("Item[@Name='Journal']").text
  detail[:authors] = node.xpath("Item[@Name='AuthorList']/Item[@Name='Author']").map { |n| n.text }
  @details.push(detail)
end
How do I convert XML into a hash in Rails?
I used to use XML::Simple in Perl because parsing XML using Perl was a PITA.
When I switched to Ruby I ended up using Nokogiri, and found it to be very easy to use for parsing HTML and XML. It's so easy that I think in terms of CSS or XPath selectors and don't miss an XML-to-hash converter.
require 'ap'
require 'date'
require 'time'
require 'nokogiri'
xml = %{
<soap:Body>
<TimesInMyDAY>
<TIME_DATA>
<StartTime>2010-11-10T09:00:00</StartTime>
<EndTime>2010-11-10T09:20:00</EndTime>
</TIME_DATA>
<TIME_DATA>
<StartTime>2010-11-10T09:20:00</StartTime>
<EndTime>2010-11-10T09:40:00</EndTime>
</TIME_DATA>
<TIME_DATA>
<StartTime>2010-11-10T09:40:00</StartTime>
<EndTime>2010-11-10T10:00:00</EndTime>
</TIME_DATA>
<TIME_DATA>
<StartTime>2010-11-10T10:00:00</StartTime>
<EndTime>2010-11-10T10:20:00</EndTime>
</TIME_DATA>
<TIME_DATA>
<StartTime>2010-11-10T10:40:00</StartTime>
<EndTime>2010-11-10T11:00:00</EndTime>
</TIME_DATA>
</TimesInMyDAY>
</soap:Body>
}
time_data = []
doc = Nokogiri::XML(xml)
doc.search('//TIME_DATA').each do |t|
  start_time = t.at('StartTime').inner_text
  end_time = t.at('EndTime').inner_text
  time_data << {
    :start_time => DateTime.parse(start_time),
    :end_time => Time.parse(end_time)
  }
end
puts time_data.first[:start_time].class
puts time_data.first[:end_time].class
ap time_data[0, 2]
with the output looking like:
DateTime
Time
[
  [0] {
    :start_time => #<DateTime: 2010-11-10T09:00:00+00:00 (19644087/8,0/1,2299161)>,
    :end_time => 2010-11-10 09:20:00 -0700
  },
  [1] {
    :start_time => #<DateTime: 2010-11-10T09:20:00+00:00 (22099598/9,0/1,2299161)>,
    :end_time => 2010-11-10 09:40:00 -0700
  }
]
The time values are deliberately parsed into DateTime and Time objects to show that either could be used.
Access deep nested node from document.xml using nokogiri
The error is telling you that in your XPath query, //a:blip, Nokogiri doesn’t know what namespace a refers to. You need to specify the namespaces that you are targeting in your query, not just the prefix. The fact that the prefix a is defined in the document doesn’t really matter; it is the actual namespace URI that is important. It is possible to use completely different prefixes in the query than those used in the document, as long as the namespace URIs match.
You may be wondering why the query //w:drawing works. You don’t include the full XML, but I suspect that the w prefix is defined on the root node (something like xmlns:w="http://some.uri.here"). If you don’t specify any namespaces, Nokogiri will automatically register any defined in the root node so they will be available in your query. The namespace corresponding to the a prefix isn’t defined on the root, so it is unavailable, and so you get the error you see.
To specify namespaces in Nokogiri you pass a hash, mapping the prefix (as used in the query) to the namespace URI, to the xpath method (or whichever query method you’re using). Since you are providing your own namespace mappings, you also need to include any you use from the root node; Nokogiri doesn’t include them in this case.
In your case, the code would look something like this:
namespaces = {
  'w' => 'http://some.uri', # whatever the URI is for this namespace
  'a' => 'http://schemas.openxmlformats.org/drawingml/2006/main'
}
# You can combine this into a single query.
# Also note you don’t want a double slash in front of
# the `/a:blip` part, just one.
xml.xpath('//w:drawing/a:blip', namespaces)
Have a look at the Nokogiri tutorial section on namespaces for more info.
Looping through XML to create an array of hashes in Ruby
I would use a different gem, Nokogiri, which may well be the best gem for parsing HTML/XML right now.
require 'nokogiri'
str = "<CallResult> ......"
doc = Nokogiri::XML(str)
zones = [] # a local variable; a capitalized name would define a constant
doc.xpath('//ZoneInfo').each do |zone|
  zones << { "Id" => zone.xpath('Id').text, "Name" => zone.xpath('Name').text, "NId" => zone.xpath("NId").text }
end