How to Make Empty Tags Self-Closing with Nokogiri

Create non-self-closed empty tag with Nokogiri

You can use Nokogiri's NO_EMPTY_TAGS save option. (XML calls self-closing tags empty-element tags.)

require 'nokogiri'

builder = Nokogiri::XML::Builder.new do |xml|
  xml.my_tag({key: :value})
end

puts builder.to_xml(save_with: Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS)
<?xml version="1.0"?>
<my_tag key="value"></my_tag>

Each option is a single bit flag, so you can mix and match the ones you want. For example, setting NO_EMPTY_TAGS by itself leaves your XML on one line, without spacing or indentation. If you still want it formatted for humans, bitwise OR (|) it with the FORMAT option.

builder = Nokogiri::XML::Builder.new do |xml|
  xml.my_tag({key: :value}) do |my_tag|
    my_tag.nested({another: :value})
  end
end

puts builder.to_xml(
  save_with: Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS
)
puts
puts builder.to_xml(
  save_with: Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS |
             Nokogiri::XML::Node::SaveOptions::FORMAT
)
<?xml version="1.0"?>
<my_tag key="value"><nested another="value"></nested></my_tag>

<?xml version="1.0"?>
<my_tag key="value">
  <nested another="value"></nested>
</my_tag>

SaveOptions also defines a handful of DEFAULT_* constants (DEFAULT_XML and DEFAULT_HTML, for example) that already combine flags for common uses.

Your update mentions "it saves all tags as non-self-closed", as if perhaps you only want this single tag instance to be non-self-closed, and the rest to self close. Nokogiri won't produce an inconsistent document like that, but if you must, you can concatenate some XML strings together that you built with different options.

Nokogiri removes the closing tags from some nodes where I want a closing tag

You can use the NO_EMPTY_TAGS option:

doc.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS)

or the rather more concise:

doc.to_xml &:no_empty_tags

With Nokogiri, how can I close unmatched tags?

Nokogiri fixes them automatically.

You can use inner_html to get the corrected HTML:

require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML.parse('<p>')
doc.inner_html # => "<html><body><p></p></body></html>"

Nokogiri pull parser (Nokogiri::XML::Reader) issue with self closing tag

There is a feature request on the project page regarding this issue (with a corresponding failing test).

Until it is fixed and released, we'll stick with the good ol' workaround:

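# Capture the tag name and the attributes separately, so that only the name is repeated in the closing tag.
input_text.gsub!(/<([^\s<>\/]+)([^<>]*?)\s*\/>/, '<\1\2></\1>')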

Unclosed tags and Nokogiri

Give this a try:

require 'nokogiri'

# open-uri isn't needed for a local file; File.read also avoids leaving the file handle open.
@doc = Nokogiri::HTML(File.read('t.html'))
@doc.at_css('#qcbody').to_html

In IRB:

>> @doc.at_css('#qcbody').to_html
=> "<div id=\"qcbody\"> \r\n <form method=\"post\" name=\"form\" id=\"form\" action=\"#\">\r\n <input type=\"hidden\" name=\"Search Engine\" id=\"Search Engine\"><input type=\"hidden\" name=\"Keyword\" id=\"Keyword\"><input type=\"button\" onclick=\"javascript:validate()\" name=\"sendsubmit\" id=\"sendsubmit\" class=\"submit\">\n</form>\r\n <div class=\"clear\"></div>\r\n </div>"

The difference between using Nokogiri::XML and Nokogiri::HTML is the leniency when parsing the document. XML is required to be well-formed, and some XML parsers will reject a file that isn't. Nokogiri lets us set how picky it is. (And in the case of XML, you can look at the document's errors array after parsing to see if there was a problem.)

For HTML, Nokogiri relaxes the parser so there's a better chance of handling real-world HTML. I've seen it handle some really ugly markup and keep on going when lesser parsers blew their lunch. If you look at Nokogiri::HTML.parse it has options = XML::ParseOptions::DEFAULT_HTML defined, which are the relaxed settings. You can override that if you want to make sure the HTML conforms.

Building blank XML tags with Nokogiri?

SaveOptions::NO_EMPTY_TAGS will get you what you want.

require 'nokogiri'

builder = Nokogiri::XML::Builder.new do |xml|
  xml.blah(nil)
end

puts 'broken:'
puts builder.to_xml
puts 'fixed:'
puts builder.to_xml(save_with: Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS)

output:

$ ruby derp.rb
broken:
<?xml version="1.0"?>
<blah/>
fixed:
<?xml version="1.0"?>
<blah></blah>

