How to Convert Nokogiri Document Object into JSON

Here's one way to do it. As noted in my comment, the 'right' answer depends on what your output should look like. There is no canonical representation of XML nodes in JSON, and hence no such capability is built into the libraries involved:

require 'nokogiri'
require 'json'

class Nokogiri::XML::Node
  def to_json(*a)
    {"$name"=>name}.tap do |h|
      kids = children.to_a
      h.merge!(attributes)
      h.merge!("$text"=>text) unless text.empty?
      h.merge!("$kids"=>kids) unless kids.empty?
    end.to_json(*a)
  end
end

class Nokogiri::XML::Document
  def to_json(*a); root.to_json(*a); end
end

class Nokogiri::XML::Text
  def to_json(*a); text.to_json(*a); end
end

class Nokogiri::XML::Attr
  def to_json(*a); value.to_json(*a); end
end

xml = Nokogiri::XML '<root a="b" xmlns:z="zzz">
<z:a>Hello <b z:x="y">World</b>!</z:a>
</root>'
puts xml.to_json
{
"$name":"root",
"a":"b",
"$text":"Hello World!",
"$kids":[
{
"$name":"a",
"$text":"Hello World!",
"$kids":[
"Hello ",
{
"$name":"b",
"x":"y",
"$text":"World",
"$kids":[
"World"
]
},
"!"
]
}
]
}

Note that the above completely ignores namespaces, which may or may not be what you want.


Converting to JsonML

Here's another alternative that converts to JsonML. While this is a lossy conversion (it does not support comment nodes, DTDs, or namespace URLs) and the format is a little bit "goofy" by design (the first child element is at [1] or [2] depending on whether or not attributes are present), it does indicate namespace prefixes for elements and attributes:

require 'nokogiri'
require 'json'

class Nokogiri::XML::Node
  def namespaced_name
    "#{namespace && "#{namespace.prefix}:"}#{name}"
  end
end

class Nokogiri::XML::Element
  def to_json(*a)
    [namespaced_name].tap do |parts|
      unless attributes.empty?
        parts << Hash[ attribute_nodes.map{ |a| [a.namespaced_name,a.value] } ]
      end
      parts.concat(children.select{|n| n.text? ? (n.text=~/\S/) : n.element? })
    end.to_json(*a)
  end
end

class Nokogiri::XML::Document
  def to_json(*a); root.to_json(*a); end
end

class Nokogiri::XML::Text
  def to_json(*a); text.to_json(*a); end
end

class Nokogiri::XML::Attr
  def to_json(*a); value.to_json(*a); end
end

xml = Nokogiri::XML '<root a="b" xmlns:z="zzz">
<z:a>Hello <b z:x="y">World</b>!</z:a>
</root>'
puts xml.to_json
#=> ["root",{"a":"b"},["z:a","Hello ",["b",{"z:x":"y"},"World"],"!"]]
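Because the attribute hash is optional in JsonML, code that reads the result has to check whether the second array slot is a Hash before it looks for children. The following is a minimal sketch of that check using only the standard `json` library; `jsonml_parts` is a hypothetical helper name, not part of JsonML or Nokogiri.

```ruby
require 'json'

# Hypothetical helper: split one JsonML node into name, attributes, children.
# The attribute hash sits at index 1 only when the element had attributes.
def jsonml_parts(node)
  name, *rest = node
  attrs = rest.first.is_a?(Hash) ? rest.shift : {}
  { name: name, attributes: attrs, children: rest }
end

# The JsonML string printed by the example above.
jsonml = JSON.parse('["root",{"a":"b"},["z:a","Hello ",["b",{"z:x":"y"},"World"],"!"]]')

root = jsonml_parts(jsonml)
za   = jsonml_parts(root[:children].first)

puts root[:attributes].inspect  # attributes of <root>
puts za[:attributes].inspect    # <z:a> has no attributes, so this is an empty hash
puts za[:children].length       # text, <b> element, text
```

Text children come back as plain strings and element children as nested arrays, so a caller can recurse with `jsonml_parts` on every child that is an Array.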

How to return a JSON object with Nokogiri

I'm not sure why you'd use Nokogiri to parse JSON objects. Nokogiri is for parsing XML/HTML. The default response type of Facebook's API is already JSON.

You are better off using HTTParty, Faraday, or plain old OpenURI together with JSON.parse.
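As a rough illustration of the "HTTP client plus JSON.parse" approach, here is a minimal sketch that uses only Ruby's standard library (Net::HTTP rather than the gems named above). `fetch_json` is a hypothetical helper name, and the sample body is made up for the example; it is not a real Facebook response.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical helper: GET a URL and parse the response body as JSON.
# It is defined here but not called, so the example runs without network access.
def fetch_json(url)
  response = Net::HTTP.get_response(URI(url))
  raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end

# The parsing step on its own, using a made-up response body.
sample_body = '{"id":"123","name":"Example User"}'
profile = JSON.parse(sample_body)
puts profile["name"]
```

With a real access token you would call `fetch_json` with the Graph API URL instead of parsing the literal string.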

And here is the same request with cURL:

curl -i -H "Content-type: application/json" -H "Accept: application/json" -X GET "https://graph.facebook.com/v2.8/me?fields=id%2Cname&access_token=<ACCESS_TOKEN>"

Convert a Nokogiri document to a Ruby Hash

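I use this code with libxml-ruby (1.1.3). I have not used Nokogiri myself; as far as I know it wraps the same underlying libxml2 library rather than the libxml-ruby gem, so the code below works on libxml-ruby nodes and would need adapting for Nokogiri objects. I would also encourage you to look at ROXML (http://github.com/Empact/roxml/tree), which maps XML elements to Ruby objects; it is built atop libxml.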

# USAGE: Hash.from_libxml(YOUR_XML_STRING)
require 'xml/libxml'
# adapted from
# http://movesonrails.com/articles/2008/02/25/libxml-for-active-resource-2-0

class Hash
  class << self
    def from_libxml(xml, strict=true)
      begin
        XML.default_load_external_dtd = false
        XML.default_pedantic_parser = strict
        result = XML::Parser.string(xml).parse
        return { result.root.name.to_s => xml_node_to_hash(result.root)}
      rescue Exception => e
        # raise your custom exception here
      end
    end

    def xml_node_to_hash(node)
      # If we are at the root of the document, start the hash
      if node.element?
        if node.children?
          result_hash = {}

          node.each_child do |child|
            result = xml_node_to_hash(child)

            if child.name == "text"
              if !child.next? and !child.prev?
                return result
              end
            elsif result_hash[child.name.to_sym]
              if result_hash[child.name.to_sym].is_a?(Object::Array)
                result_hash[child.name.to_sym] << result
              else
                result_hash[child.name.to_sym] = [result_hash[child.name.to_sym]] << result
              end
            else
              result_hash[child.name.to_sym] = result
            end
          end

          return result_hash
        else
          return nil
        end
      else
        return node.content.to_s
      end
    end
  end
end

XML to JSON using Ruby and save it for separate file

I'm assuming you want to print the listing blocks to individual files as JSON. If you have access to 'active_support/core_ext' and 'nokogiri', and you aren't too concerned about how your XML is converted to JSON, you can just do:

require 'active_support/core_ext'
require 'nokogiri'

xml = Nokogiri::XML(File.read "yourfile")

xml.search("//listing").each do |l|
  filename = l.at_xpath("id").content
  File.open(filename + '.json', 'w') do |file|
    file.print Hash.from_xml(l.to_xml).to_json
  end
end

Scraping Table with Nokogiri and need JSON output

I think the general approach is:

  1. Create a hash for each table where the key is the employee
  2. Merge the results from both tables together
  3. Convert to JSON

Create a hash for each table where the key is the employee

You can do this part in either Watir or Nokogiri. It only makes sense to use Nokogiri if Watir performs poorly because the tables are large.

Watir:

# I assume you would have a better way to identify the tables than by index
hours_table = browser.table(:index, 0)
wage_table = browser.table(:index, 1)

# Turn the tables into a hash
employee_hours = {}
hours_table.trs.drop(1).each do |tr|
  tds = tr.tds
  employee_hours[ tds[0].text ] = {"Reg Hours" => tds[1].text, "OT Hours" => tds[2].text}
end
#=> {"Employee 1"=>{"Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Reg Hours"=>"5", "OT Hours"=>"10"}}

employee_wage = {}
wage_table.trs.drop(1).each do |tr|
  tds = tr.tds
  employee_wage[ tds[0].text ] = {"Revenue" => tds[1].text}
end
#=> {"Employee 2"=>{"Revenue"=>"$10"}, "Employee 1"=>{"Revenue"=>"$50"}}

Nokogiri:

page = Nokogiri::HTML.parse(browser.html)

hours_table = page.search('table')[0]
wage_table = page.search('table')[1]

employee_hours = {}
hours_table.search('tr').drop(1).each do |tr|
  tds = tr.search('td')
  employee_hours[ tds[0].text ] = {"Reg Hours" => tds[1].text, "OT Hours" => tds[2].text}
end
#=> {"Employee 1"=>{"Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Reg Hours"=>"5", "OT Hours"=>"10"}}

employee_wage = {}
wage_table.search('tr').drop(1).each do |tr|
  tds = tr.search('td')
  employee_wage[ tds[0].text ] = {"Revenue" => tds[1].text}
end
#=> {"Employee 2"=>{"Revenue"=>"$10"}, "Employee 1"=>{"Revenue"=>"$50"}}

Merge the results from both tables together

You want to merge the two hashes together so that for a specific employee, the hash will include their hours as well as their revenue.

employee = employee_hours.merge(employee_wage){ |key, old, new| new.merge(old) }
#=> {"Employee 1"=>{"Revenue"=>"$50", "Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Revenue"=>"$10", "Reg Hours"=>"5", "OT Hours"=>"10"}}
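One detail of `Hash#merge` worth knowing here: the block only runs for keys present in both hashes, so an employee who appears in just one table keeps whatever data that table had. A small self-contained sketch with made-up data ("Employee 3" is invented for the illustration):

```ruby
# Made-up sample data: Employee 2 has no wage row, Employee 3 has no hours row.
hours = {
  "Employee 1" => {"Reg Hours" => "10"},
  "Employee 2" => {"Reg Hours" => "5"}
}
wages = {
  "Employee 1" => {"Revenue" => "$50"},
  "Employee 3" => {"Revenue" => "$7"}
}

# The block is called only for names found in both hashes.
combined = hours.merge(wages) do |_name, from_hours, from_wages|
  from_hours.merge(from_wages)
end

puts combined.inspect
```

If rows missing from one table should be treated as errors instead, you could compare `hours.keys` and `wages.keys` before merging.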

Convert to JSON

Once the `json` library is loaded, you can convert the merged hash to JSON by calling to_json on it.

require 'json'
employee.to_json
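For reference, a short sketch of what the conversion produces, using a one-employee hash copied from the merged result above. `JSON.pretty_generate` is the standard-library alternative when the output is meant to be read by people.

```ruby
require 'json'

# One entry from the merged hash shown earlier.
employee = {
  "Employee 1" => {"Revenue" => "$50", "Reg Hours" => "10", "OT Hours" => "20"}
}

compact = employee.to_json               # single-line JSON string
pretty  = JSON.pretty_generate(employee) # indented, multi-line JSON string

puts compact
puts pretty

# Parsing the string gives back an equal hash, since all keys are strings.
round_trip = JSON.parse(compact)
```

Note that a round trip only restores the original hash exactly when the keys are strings; symbol keys come back as strings unless you pass `symbolize_names: true` to `JSON.parse`.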

How to convert Nokogiri object to xml file in rails

Ahh okay now I see... try this:

data = Hash.from_xml(client.profile.to_s)

hf...

Extract some JSON using Nokogiri

Here's how you can access the script tags (that don't reference an external file) from a URL:

require 'open-uri'
require 'nokogiri'
# Kernel#open no longer accepts URLs on Ruby 3.0+; use URI.open instead
doc = Nokogiri::HTML(URI.open('http://www.highcharts.com/demo/'))
inline_script = doc.xpath('//script[not(@src)]')
inline_script.each do |script|
  puts "-"*50, script.text
end

Now you just need to find the script block you want and extract just the data you want (using regex). Without more details, it's hard to guess what you want and are relying upon.

Here's a fairly fragile regex that finds what I'm guessing you were looking for:

inline = doc.xpath('//script[not(@src)]').map(&:text)
data = inline.map{ |js| js[/new Highcharts\.Chart\((.+?\})\);/m,1] }.compact[0]
puts data

Here's what you get out:

{
chart: {
renderTo: 'container',
defaultSeriesType: 'line',
marginRight: 130,
marginBottom: 25
},
title: {
text: 'Monthly Average Temperature',
x: -20 //center
},
subtitle: {
text: 'Source: WorldClimate.com',
x: -20
},
xAxis: {
categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
},
yAxis: {
title: {
text: 'Temperature (°C)'
},
plotLines: [{
value: 0,
width: 1,
color: '#808080'
}]
},
tooltip: {
formatter: function() {
return '<b>'+ this.series.name +'</b><br/>'+
this.x +': '+ this.y +'°C';
}
},
legend: {
layout: 'vertical',
align: 'right',
verticalAlign: 'top',
x: -10,
y: 100,
borderWidth: 0
},
series: [{
name: 'Tokyo',
data: [7.0, 6.9, 9.5, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3, 18.3, 13.9, 9.6]
}, {
name: 'New York',
data: [-0.2, 0.8, 5.7, 11.3, 17.0, 22.0, 24.8, 24.1, 20.1, 14.1, 8.6, 2.5]
}, {
name: 'Berlin',
data: [-0.9, 0.6, 3.5, 8.4, 13.5, 17.0, 18.6, 17.9, 14.3, 9.0, 3.9, 1.0]
}, {
name: 'London',
data: [3.9, 4.2, 5.7, 8.5, 11.9, 15.2, 17.0, 16.6, 14.2, 10.3, 6.6, 4.8]
}]
}

Note that this is not JSON; this is a string representing JavaScript code with object, string, array, numeric, and function literals.
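To see the difference in practice, here is a minimal sketch showing that Ruby's JSON parser rejects a JavaScript object literal like the one above (unquoted keys, single-quoted strings), while the strict JSON equivalent parses fine. The snippet strings are shortened stand-ins, not the full extracted script.

```ruby
require 'json'

# A shortened JavaScript object literal in the style of the extracted data.
js_literal = "{ chart: { renderTo: 'container' }, x: -20 }"

begin
  JSON.parse(js_literal)
  parsed_js = true
rescue JSON::ParserError => e
  parsed_js = false
  puts "Not valid JSON: #{e.class}"
end

# The same data written as strict JSON: double-quoted keys and strings.
strict_json = '{"chart":{"renderTo":"container"},"x":-20}'
data = JSON.parse(strict_json)
puts data["chart"]["renderTo"]
```

Turning the extracted string into real data therefore needs either a JavaScript engine or further text processing, and function literals such as the `formatter` above have no JSON representation at all.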


