How to Access Attributes Using Nokogiri

How to get the value of an attribute using Nokogiri

It's idiomatic to access attribute values by treating the node as a hash:

require 'nokogiri'

doc = Nokogiri::HTML('<div class="foo"></div>')
doc.at('div')['class'] # => "foo"

And, just like a hash, you can assign to it too:

doc.at('div')['class'] = 'bar'
puts doc.to_html

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><div class="bar"></div></body></html>

See [] and []= under "Modifying Nodes and Attributes" in the documentation.

Nokogiri - Get attributes?

Meditate on this:

require 'nokogiri'
doc = Nokogiri::XML("<root attr=1></root>")
doc.errors # => [#<Nokogiri::XML::SyntaxError: 1:12: FATAL: AttValue: " or ' expected>, #<Nokogiri::XML::SyntaxError: 1:12: FATAL: attributes construct error>, #<Nokogiri::XML::SyntaxError: 1:12: FATAL: Couldn't find end of Start Tag root line 1>, #<Nokogiri::XML::SyntaxError: 1:12: FATAL: Extra content at the end of the document>]

XML requires attribute values to be quoted, and the parser reports exactly that. doc.errors is your friend.

Nokogiri to Find All Data Attributes Using a Wildcard

You can search for img tags with an attribute that starts with "data-" using the following:

//img[@*[starts-with(name(),'data-')]]

To break this down:

  • // - Anywhere in the document
  • img - img tag
  • @* - All Attributes
  • starts-with(name(),'data-') - Attribute's name starts with "data-"

Example:

require 'nokogiri'

doc = Nokogiri::HTML(<<-END_OF_HTML)
<img src='' />
<img data-method='a' src= ''>
<img data-info='b' src= ''>
<img data-type='c' src= ''>
<img src= ''>
END_OF_HTML

imgs = doc.xpath("//img[@*[starts-with(name(),'data-')]]")

puts imgs
# <img data-method="a" src="">
# <img data-info="b" src="">
# <img data-type="c" src="">

Or, if you prefer to filter in a Ruby loop:

doc.css('img').select do |img|
  img.xpath(".//@*[starts-with(name(),'data-')]").any?
end
#[#<Nokogiri::XML::Element:0x384 name="img" attributes=[#<Nokogiri::XML::Attr:0x35c name="data-method" value="a">, #<Nokogiri::XML::Attr:0x370 name="src">]>,
# #<Nokogiri::XML::Element:0x3c0 name="img" attributes=[#<Nokogiri::XML::Attr:0x398 name="data-info" value="b">, #<Nokogiri::XML::Attr:0x3ac name="src">]>,
# #<Nokogiri::XML::Element:0x3fc name="img" attributes=[#<Nokogiri::XML::Attr:0x3d4 name="data-type" value="c">, #<Nokogiri::XML::Attr:0x3e8 name="src">]>]

UPDATE: To remove the attributes:

doc.css('img').each do |img|
  img.xpath(".//@*[starts-with(name(),'data-')]").each(&:remove)
end

puts doc.to_s
#<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
#<html>
#<body>
# <img src="">
# <img src="">
# <img src="">
# <img src="">
# <img src="">
#</body>
#</html>

This can be simplified to:

doc.xpath("//img/@*[starts-with(name(),'data-')]").each(&:remove)

How can I get some attributes when using Nokogiri

You can use a CSS selector:

result.css("attr[name='English']").children.to_s

This will give you "B".

How to get an attribute of the children of a Nokogiri nodeset

You can use @ to get the value of an attribute:

file.xpath('//w:ins/w:r/@w:rsidR|//w:del/w:r/@w:rsidDel').each do |id|
  puts id
end

The w:r element inside the w:del element doesn't have a w:rsidR attribute, only a w:rsidDel attribute.

How to get attribute values using Nokogiri

To select all attributes of an element that is selected using the XPath expression someExpr, you need to evaluate a new XPath expression:

someExpr/@*

where someExpr must be substituted with the real XPath expression used to select the particular element.

This selects all attributes of every element (we assume there is just one) selected by the XPath expression someExpr.

For example, if the element we want is selected by:

/a/b/c 

then all of its attributes are selected by:

/a/b/c/@*

