Rails Xml Parsing

Rails XML parsing

There are a lot of Ruby XML parsing libraries. However, if your XML is small, you can use the ActiveSupport Hash extension .from_xml:

Hash.from_xml(x)["message"]["param"].inject({}) do |result, elem| 
result[elem["name"]] = elem["value"]
result
end
# => {"msg"=>"xxxxxxxxxxxxx", "messageType"=>"SMS", "udh"=>nil, "id"=>"xxxxxxxxxxxxxx", "target"=>"xxxxxxxxxxxxx", "source"=>"xxxxxxxxxxx"}

Parsing XML with Ruby

As @pguardiario mentioned, Nokogiri is the de facto XML and HTML parsing library. If you wanted to print out the Id and Name values in your example, here is how you would do it:

require 'nokogiri'

xml_str = <<EOF
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
EOF

doc = Nokogiri::XML(xml_str)

thing = doc.at_xpath('//things')
puts "ID = " + thing.at_xpath('//Id').content
puts "Name = " + thing.at_xpath('//Name').content

A few notes:

  • at_xpath is for matching one thing. If you know you have multiple items, you want to use xpath instead.
  • Depending on your document, namespaces can be problematic, so calling doc.remove_namespaces! can help (see this answer for a brief discussion).
  • You can use the css methods instead of xpath if you're more comfortable with those.
  • Definitely play around with this in irb or pry to investigate methods.

Resources

  • Parsing an HTML/XML document
  • Getting started with Nokogiri

Update

To handle multiple items, you need a root element, and you need to remove the // in the xpath query.

require 'nokogiri'

xml_str = <<EOF
<root>
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name1</PART1:Name>
</THING1:things>
<THING2:things type="Container">
<PART2:Id type="Property">2234</PART2:Id>
<PART2:Name type="Property">The Name2</PART2:Name>
</THING2:things>
</root>
EOF

doc = Nokogiri::XML(xml_str)
doc.xpath('//things').each do |thing|
puts "ID = " + thing.at_xpath('Id').content
puts "Name = " + thing.at_xpath('Name').content
end

This will give you:

ID = 1234
Name = The Name1
ID = 2234
Name = The Name2

If you are more familiar with CSS selectors, you can use this nearly identical bit of code:

doc.css('things').each do |thing|
puts "ID = " + thing.at_css('Id').content
puts "Name = " + thing.at_css('Name').content
end

Parse xml file with nokogiri

Problem #1

In this line:

       @parentN =parent.xpath('///ancestor::*/@name')

you override the previous value of @parentN.

Problem #2

By running

<% for x in 0...@parentN.count %>

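Be careful with the bounds here. .count is equivalent to the last index + 1 (for an array with only [0], .count is 1). The exclusive range 0...@parentN.count stops at the last index, whereas an inclusive range (0..@parentN.count) would give you 2 values for a single-valued array. Also note that your @parentN is assigned to an object (a Nokogiri NodeSet), not a plain array.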
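A plain-Ruby sketch of how .count relates to index ranges may help here. The names array is a made-up stand-in for the values pulled from the XML.

```ruby
names = ["Example Parent 1"] # a single-valued array (hypothetical data)

exclusive = (0...names.count).to_a # three dots: the end value is excluded
inclusive = (0..names.count).to_a  # two dots: the end value is included

puts exclusive.inspect # [0]
puts inclusive.inspect # [0, 1]
puts names[1].inspect  # nil, because index 1 is past the end

# Iterating the collection directly avoids index arithmetic altogether.
names.each { |name| puts name }
```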

Recommendation (simple)

Use a single array to hold the nested values (as a hash) rather than two variables.

#xmlController.rb
@codes = []
doc.xpath('Report/Node').each do |parent|
  # one single-pair hash per parent: name => array of child texts
  @codes << { parent.xpath('@name') => parent.xpath('Node').map { |child| child.text } }
end

#show.html.erb

<% @codes.each do |code| %>
  <% code.each do |parent, children| %>
    <p> PARENT: <%= parent %> </p>
    <p> CHILDREN: <%= children.join(', ') %> </p>
  <% end %>
<% end %>

Recommendation based on comments below

The above was shown to demonstrate the simplest way to think about the problem. Now that we are ready to parse all the data in the node, we need to change our xpath and our map. The doc.xpath('Report/Node') is used to select the parent node, and that can stay the same. We will want to set the @codes key to the actual string value embedded in the Node, which is not parent.xpath('@name') but parent.xpath('@name')[0].value. The xpath call returns a collection of every matching 'name' attribute, and we want the first ([0]) one. The value of the name attribute is returned using the .value method.

Make a class so the nodes become objects

Your Parent node has a name and a color and your children have name, color, and rank. It looks like you have a model for Node that looks like:

class Node
include ActiveModel::Model
attr_accessor :name, :color, :rank, :children
end

I'm simplifying things by not using persistence here, but you may want to save your records to disk. If you do, look into the slew of things ActiveRecord does on RailsGuides.

Now when we go through the XML document, we will create an array of objects rather than the hash of strings (which both happen to be objects, but I'll leave that quandary for you to check out).

Parse the Xpath to get attributes of Node Objects

A quick way to set the name and color attributes of the parent looks like this:

@node = Node.new(doc.xpath('Report/Node').first.attributes.inject({}) { |attrs, value| attrs[value[0].to_sym] = value[1].value; attrs })

OK, so maybe that wasn't all that easy. What we do is take the Enumerable result of the XPath, navigate to the first node's attributes, and make a hash of attribute names (name, color, rank) as symbols and their corresponding values. Once we have the hash, we pass it to our Node class' new method to instantiate (create) a node. This gives us an object that we can use:

@node.name
#=> "Example Parent 1"

Extend the Class for children

Once we have the parent node, we can give it children, creating new nodes in an array. To facilitate this, we extend the definition of the model to include an overridden initializer (new()).

class Node
include ActiveModel::Model
attr_accessor :name, :color, :rank, :children

def initialize(*args)
self.children = []
super(*args)
end
end

Adding children

@node.children << Node.new(doc.xpath('Report/Node').first.xpath('Node').first.attributes.inject({}) { |attrs, value| attrs[value[0].to_sym] = value[1].value; attrs })

Now that we know how to build a Node from the first parent element and a child from that parent's first child element, we can automate the process by iterating over all of them:

doc.xpath('Report/Node').each do |parent|
  node = Node.new(parent.attributes.inject({}) { |attrs, value| attrs[value[0].to_sym] = value[1].value; attrs })
  node.children = parent.xpath('Node').map do |child|
    Node.new(child.attributes.inject({}) { |attrs, value| attrs[value[0].to_sym] = value[1].value; attrs })
  end
end

Ugly controller code

Move it to the model

But Wait! That isn't very DRY! Let's move the logic that hurts our eyes to look at into the model to make it easier to work with.

class Node
include ActiveModel::Model
attr_accessor :name, :color, :rank, :children

def initialize(*args)
self.children = []
super(*args)
end

def self.new_from_xpath(xml_node)
self.new(xml_node.attributes.inject({}) { |attrs, value| attrs[value[0].to_sym] = value[1].value; attrs })
end
end

Final controller

Now the controller looks like this:

@nodes = []
doc.xpath('Report/Node').each do |parent|
node = Node.new_from_xpath(parent)
node.children = parent.xpath('Node').map do |child|
Node.new_from_xpath(child)
end
@nodes << node
end

Using this in the view

In the view you can use the @nodes like this:

<% @nodes.each do |node| %>
  Parent: <%= node.name %>
  Children: <% node.children.each do |child| %>
    <%= child.name %> is <%= child.color %>
  <% end %>
<% end %>
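To try the view loop outside Rails, the standard library's ERB can render a similar template. This is only a sketch: the Struct and the sample data are made up, and block-local variables are used instead of instance variables.

```ruby
require 'erb'

# Plain stand-in for the Node model (hypothetical data).
Node = Struct.new(:name, :color, :children)

nodes = [
  Node.new("Example Parent 1", "red", [
    Node.new("Child A", "blue", []),
    Node.new("Child B", "green", [])
  ])
]

# trim_mode '-' lets "-%>" swallow the newline after a control tag.
template = ERB.new(<<~VIEW, trim_mode: '-')
  <% nodes.each do |node| -%>
  Parent: <%= node.name %>
  <% node.children.each do |child| -%>
  <%= child.name %> is <%= child.color %>
  <% end -%>
  <% end -%>
VIEW

output = template.result_with_hash(nodes: nodes)
puts output
```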

Rails nokogiri parse XML file

You're on the right track. parts = xml_doc.xpath('/root/rows/row') gives you back a NodeSet i.e. a list of the <row> elements.

You can loop through these using each or use row indexes like parts[0], parts[1] to access specific rows. You can then get the values of child nodes using xpath on the individual rows.

e.g. you could build a list of the AnalogueCode for each part with:

codes = []
parts.each do |row|
codes << row.xpath('AnalogueCode').text
end

Looking at the full example of the XML you're processing there are 2 issues preventing your XPath from matching:

  1. the <root> tag isn't actually the root element of the XML so /root/.. doesn't match

  2. The XML is using namespaces so you need to include these in your XPaths

so there are a couple of possible solutions:

  1. use CSS selectors rather than XPaths (i.e. use search) as suggested by the Tin Man

  2. after xml_doc = Nokogiri::XML(response.body), call xml_doc.remove_namespaces! and then use parts = xml_doc.xpath('//root/rows/row'), where the double slash is XPath syntax to locate the <root> element anywhere in the document

  3. specify the namespaces:

e.g.

xml_doc  = Nokogiri::XML(response.body)
ns = xml_doc.collect_namespaces
parts = xml_doc.xpath('//xmlns:rows/xmlns:row', ns)

codes = []
parts.each do |row|
codes << row.xpath('xmlns:AnalogueCode', ns).text
end

I would go with 1. or 2. :-)

Ruby on Rails error when parsing XML

Depending on the versions of Ruby and Rails you use (&. needs Ruby 2.3 or later; try comes from ActiveSupport), you can change the following line to one of the options below it:

action.file_name = [doc.xpath("//field[@index='103']").first.content]

Updating to:

action.file_name = [doc.xpath("//field[@index='103']").first&.content]
# or
action.file_name = [doc.xpath("//field[@index='103']").first.try(:content)]

Both of these options protect against calling content on nil (the NoMethodError for NilClass). If you don't necessarily need a value for action.file_name, this will fix the error.

Otherwise, it's a case of ensuring the selector (doc.xpath("//field[@index='103']")) is definitely correct (it seems to be, as you're not getting an error calling first) and, if so, that there is definitely data in the array it returns.
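The effect of the safe navigation operator can be seen without Nokogiri. In this sketch plain arrays stand in for the NodeSet returned by xpath, and Field is a hypothetical Struct with a content method.

```ruby
Field = Struct.new(:content)

empty_matches   = []                        # the selector matched nothing
present_matches = [Field.new("report.pdf")] # the selector matched one node

# first returns nil on an empty collection; &. then skips the call.
missing = empty_matches.first&.content
found   = present_matches.first&.content

puts missing.inspect # nil
puts found           # report.pdf

# Without &. the same call raises instead of returning nil.
begin
  empty_matches.first.content
rescue NoMethodError => e
  puts "without &.: #{e.class}"
end
```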

Hope that helps - let me know if you've any questions.

How to use XML with Ruby on Rails

You can use from_xml to parse XML data into a hash:

xml = <<-XML
<?xml version="1.0" encoding="UTF-8"?>
<hash>
<foo type="integer">1</foo>
<bar type="integer">2</bar>
</hash>
XML

hash = Hash.from_xml(xml)
# => {"hash"=>{"foo"=>1, "bar"=>2}}

Reading from a local file:

# reading the file content into a variable  
xml_file = File.read("my_xml_file.xml")
hash = Hash.from_xml(xml_file)

Reference:
https://apidock.com/rails/v4.2.7/Hash/from_xml/class

Xml parsing in rails

You can use Nokogiri here.
Suppose this is your error.xml:

<?xml version="1.0" encoding="UTF-8"?>
<responseParam>
<RESULT>-1</RESULT>
<ERROR_CODE>509</ERROR_CODE>
</responseParam>

You can do something like:

@doc = Nokogiri::XML(File.open("error.xml"))
@doc.xpath("//ERROR_CODE")

which will give you something like:

# => ["<ERROR_CODE>509</ERROR_CODE>"]

The Node methods xpath and css actually return a NodeSet, which acts very much like an array, and contains matching nodes from the document.


