How to Convert a Nested Hash into XML Using Nokogiri

How about this?

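class Hash
  # Naive string-based serializer: each key becomes an element and nested
  # hashes recurse. Values are not XML-escaped, and this replaces
  # ActiveSupport's Hash#to_xml if ActiveSupport is loaded.
  def to_xml
    map do |k, v|
      text = Hash === v ? v.to_xml : v
      "<%s>%s</%s>" % [k, text, k]
    end.join
  end
end

h = { foo: "bar", foo1: { foo2: "bar2", foo3: "bar3", foo4: { foo5: "bar5" } } }
h.to_xml
#=> "<foo>bar</foo><foo1><foo2>bar2</foo2><foo3>bar3</foo3><foo4><foo5>bar5</foo5></foo4></foo1>"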

Convert xml to hash using Nokogiri but keep the anchor tags

The only way I see right now is to escape the HTML inside <description> throughout the document, then call Hash.from_xml (from ActiveSupport):

require 'cgi'
require 'nokogiri'
require 'active_support'
require 'active_support/core_ext/hash/conversions' # provides Hash.from_xml outside Rails

doc = File.open(xml_file) { |f| Nokogiri::XML(f) }

# escape HTML inside <description>
doc.css("description").each do |node|
  node.inner_html = CGI.escapeHTML(node.inner_html)
end

data = Hash.from_xml(doc.to_s) # =>

# {"blah"=>
#   {
#     "tag"=>[
#       {
#         "name"=>"My Name",
#         "url"=>"www.url.com",
#         "file"=>"myfile.zip",
#         "description"=>"Today is a <a href=\"www.sunny.com\">sunny</a>"
#       },
#       {
#         "name"=>"Someones Name",
#         "url"=>"www.url2.com",
#         "file"=>"myfile2.zip",
#         "description"=>"Today is a <a href=\"www.rainy.com\">rainy</a>"
#       }
#     ]
#   }
# }

Nokogiri is used here only to get at the contents of each <description> so they can be escaped. You don't really need it if you find some other way to escape them. For example:

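require 'cgi'

xml = File.read(xml_file)

# Escaping the inner HTML (maybe not the best way, just an example).
# Use the block form so $1 refers to the current match, and a non-greedy,
# multiline pattern so each <description> is handled separately.
xml.gsub!(%r{<description>(.*?)</description>}m) { "<description>#{CGI.escapeHTML($1)}</description>" }

data = Hash.from_xml(xml)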
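Why the block form and the non-greedy pattern matter: with a plain replacement string, `"#{CGI.escapeHTML($1)}"` would be evaluated once, before `gsub!` runs, so `$1` would not refer to the current match; and a greedy `(.*)` would run from the first <description> to the last </description> on the same line. The sample string below is hypothetical.

```ruby
require 'cgi'

# Hypothetical sample: two <description> elements on a single line.
xml = '<tags><tag><description>Today is a <a href="x">sunny</a></description></tag>' \
      '<tag><description>Today is <b>rainy</b></description></tag></tags>'

# The block runs once per match, so $1 is the current capture.
# Non-greedy (.*?) stops at the first </description>; /m lets . match newlines.
escaped = xml.gsub(%r{<description>(.*?)</description>}m) do
  "<description>#{CGI.escapeHTML($1)}</description>"
end

puts escaped
```

A regex is still fragile for XML (attributes on <description>, CDATA, and so on), which is why the Nokogiri version above is the safer choice.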

How to build a node from a hash using Nokogiri

Okay, so I realised building nodes from a hash wasn't my best option (in my case, I want the full XML structure, even if some nodes are empty because the hash is missing that content).

Therefore I'm using an XML template node that already contains the full structure I want, with every array holding a single item. So rather than building new nodes, I duplicate that item as many times as I need (preprocessing), and then replace the content.

Because this is only painful for arrays, let's assume that my contents variable is, for the very first call, a hash whose values are all arrays (the items of those arrays can in turn be values, hashes, and so on).

Duplicating template nodes of custom_xml

contents.each do |content, items|
  tmp = custom_xml.search(content.to_s) # Should be unique!
  raise "ERROR: multiple nodes match XPath //#{content}" if tmp.count > 1
  raise "ERROR: no node matches \"search #{content}\". DEBUG: #{custom_xml.serialize}" if tmp.count == 0
  array_node = tmp.first # <array><item>...</item></array>
  template_node = array_node.first_element_child
  # Okay, we have the customXML node corresponding to the first item of the content.
  # The template already provides one item, so add (count - 1) copies.
  (items.count - 1).times do
    array_node << template_node.dup # Node#dup copies the whole subtree by default
  end
end

Then, once this preprocessing is done, it's possible to actually substitute the variables of the array_node(s) by calling replace_node_vars_recursively(array_node, items).

Note that for the very first call we do have an array_node with items, but the recursive function needs to handle hashes and values as well. So let's use the word content to designate any of these, and node for the XML node it is assigned to.

Recursively change the nodes text with a "content"

require 'date'

def replace_node_vars_recursively(node, content)
  case content
  when nil
    warn "WARNING: nil content trying to be assigned to node #{node.name}"
  when Hash
    # Every key in content SHOULD have a matching descendant node!
    content.each do |key, val|
      candidates = node.search(key.to_s) # Should be unique!
      if candidates.count > 1
        warn "WARNING: multiple child nodes match -->#{key}<--, skipping"
        next
      elsif candidates.count == 0
        warn "WARNING: no child node matches \"#{key}\""
        next
      end
      replace_node_vars_recursively(candidates.first, val)
    end
  when Array
    # Array recursion (note: the array contains either hashes or values).
    array_items = content
    array_node = node
    # /!\ Use element_children: plain "children" also returns whitespace text nodes!
    if array_items.count != array_node.element_children.count
      raise "ERROR: array length (#{array_items.count}) != number of nodes of #{array_node.name} (#{array_node.element_children.count})!"
    end
    array_node.element_children.each_with_index do |child_node, index|
      # Each item is assumed to be another content hash (a plain value is handled too).
      replace_node_vars_recursively(child_node, array_items[index])
    end
  when String, Integer, Float, Symbol, Date, DateTime
    # Value termination (DateTime is a subclass of Date; listed for clarity).
    node.content = content.to_s
    puts "Replacing variable #{node.name} by #{content}"
  else
    raise "ERROR: unknown variable type (#{content.class}) for variable replacement!"
  end
end

Ruby hash to XML: How would I create duplicate keys in a hash for repeated XML xpaths?

Okay, I believe I figured it out: you have to use Hash#compare_by_identity. It makes the hash compare keys by object identity (equal?) instead of by value (eql? and hash), so two distinct String objects with the same text count as two different keys.

I found it in "Ruby Hash with duplicate keys?".

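h1 = {}
h1.compare_by_identity # same as {}.compare_by_identity; returns the hash itself
# Each "a" literal below is a separate String object, unless the file has
# "# frozen_string_literal: true", which makes identical literals the same object.
h1["a"] = 1
h1["a"] = 2
p h1 # => {"a"=>1, "a"=>2}   (Ruby 3.4+ prints {"a" => 1, "a" => 2})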
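A plain-Ruby sketch of how such a hash can then yield repeated elements when serialized. The one-line serializer is hypothetical (it is not ActiveSupport's `to_xml`), and `String.new` is used so the keys are guaranteed to be distinct objects even under `# frozen_string_literal: true`.

```ruby
# Identity hash: keys match only if they are the very same object.
h = {}.compare_by_identity

# String.new always allocates a fresh object, so these are two different keys.
h[String.new("item")] = "first"
h[String.new("item")] = "second"

# Hypothetical minimal serializer: one <key>value</key> per pair, in insertion order.
xml = h.map { |k, v| "<#{k}>#{v}</#{k}>" }.join

puts h.size
puts xml
```

This prints `2` and then `<item>first</item><item>second</item>`. The trade-off is that lookups such as `h["item"]` no longer find anything, because a new literal is a different object from the stored keys.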

Nokogiri XML to hash using attribute names

There is no automatic way to do this: the structure of the XML does not match the structure of the hash you want. You have to pick out the desired nodes from the XML manually and build the hash from their values. Using XPath is probably the easiest approach; the code might look something like this:

@details = []
detail_doc.xpath("/eSummaryResult/DocSum").each do |node|
  detail = {}
  detail[:title]   = node.xpath("Item[@Name='Title']").text
  detail[:journal] = node.xpath("Item[@Name='Journal']").text
  detail[:authors] = node.xpath("Item[@Name='AuthorList']/Item[@Name='Author']").map(&:text)
  @details.push(detail)
end

How do I convert XML into a hash in Rails?

I used to use XML::Simple in Perl because parsing XML using Perl was a PITA.

When I switched to Ruby I ended up using Nokogiri, and found it very easy to use for parsing HTML and XML. It's so easy that I think in terms of CSS or XPath selectors and don't miss an XML-to-hash converter.

require 'ap' # awesome_print, for the pretty-printed output below
require 'date'
require 'time'
require 'nokogiri'

xml = %{
<soap:Body xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <TimesInMyDAY>
    <TIME_DATA>
      <StartTime>2010-11-10T09:00:00</StartTime>
      <EndTime>2010-11-10T09:20:00</EndTime>
    </TIME_DATA>
    <TIME_DATA>
      <StartTime>2010-11-10T09:20:00</StartTime>
      <EndTime>2010-11-10T09:40:00</EndTime>
    </TIME_DATA>
    <TIME_DATA>
      <StartTime>2010-11-10T09:40:00</StartTime>
      <EndTime>2010-11-10T10:00:00</EndTime>
    </TIME_DATA>
    <TIME_DATA>
      <StartTime>2010-11-10T10:00:00</StartTime>
      <EndTime>2010-11-10T10:20:00</EndTime>
    </TIME_DATA>
    <TIME_DATA>
      <StartTime>2010-11-10T10:40:00</StartTime>
      <EndTime>2010-11-10T11:00:00</EndTime>
    </TIME_DATA>
  </TimesInMyDAY>
</soap:Body>
}

time_data = []

doc = Nokogiri::XML(xml)
doc.search('//TIME_DATA').each do |t|
  start_time = t.at('StartTime').inner_text
  end_time = t.at('EndTime').inner_text
  time_data << {
    :start_time => DateTime.parse(start_time),
    :end_time => Time.parse(end_time)
  }
end

puts time_data.first[:start_time].class
puts time_data.first[:end_time].class
ap time_data[0, 2]

with the output looking like:

DateTime
Time
[
  [0] {
    :start_time => #<DateTime: 2010-11-10T09:00:00+00:00 (19644087/8,0/1,2299161)>,
    :end_time => 2010-11-10 09:20:00 -0700
  },
  [1] {
    :start_time => #<DateTime: 2010-11-10T09:20:00+00:00 (22099598/9,0/1,2299161)>,
    :end_time => 2010-11-10 09:40:00 -0700
  }
]

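The time values are deliberately parsed into DateTime and Time objects to show that either could be used. Note that DateTime.parse assumes UTC (+00:00) when the string has no offset, while Time.parse uses the local time zone (-0700 in the output above).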

Access deep nested node from document.xml using nokogiri

The error is telling you that in your XPath query, //a:blip, Nokogiri doesn’t know what namespace a refers to. You need to specify the namespaces that you are targeting in your query, not just the prefix. The fact that the prefix a is defined in the document doesn’t really matter; it is the actual namespace URI that is important. It is possible to use completely different prefixes in the query than those used in the document, as long as the namespace URIs match.

You may be wondering why the query //w:drawing works. You don’t include the full XML, but I suspect that the w prefix is defined on the root node (something like xmlns:w="http://some.uri.here"). If you don’t specify any namespaces, Nokogiri will automatically register any defined in the root node so they will be available in your query. The namespace corresponding to the a prefix isn’t defined on the root, so it is unavailable, and so you get the error you see.

To specify namespaces in Nokogiri, you pass a hash mapping each prefix (as used in the query) to its namespace URI to the xpath method (or whichever query method you’re using). Since you are providing your own namespace mappings, you also need to include any you use from the root node; Nokogiri doesn’t add them automatically in this case.

In your case, the code would look something like this:

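namespaces = {
  'w' => 'http://some.uri', # whatever the URI is for this namespace
  'a' => 'http://schemas.openxmlformats.org/drawingml/2006/main'
}

# You can combine this into a single query. Don't chain a second query that
# starts with //, because that searches from the document root again.
# In document.xml, a:blip is not a direct child of w:drawing (it sits several
# levels down, inside a:graphic), so use // between the two steps.
xml.xpath('//w:drawing//a:blip', namespaces)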

Have a look at the Nokogiri tutorial section on namespaces for more info.

Looping through XML to create an array of hashes in Ruby

I use a different gem, Nokogiri, which is probably the best gem for parsing HTML/XML right now.

require 'nokogiri'

str = "<CallResult> ......"
doc = Nokogiri::XML(str)
# A lowercase local variable: an uppercase name like Zones would be a constant.
zones = doc.xpath('//ZoneInfo').map do |zone|
  { "Id" => zone.xpath('Id').text, "Name" => zone.xpath('Name').text, "NId" => zone.xpath('NId').text }
end

