How to Remove a Node with Nokogiri

How do I remove a node with Nokogiri?

Give this a try:

f = Nokogiri::XML.fragment(str)

f.search('.//img').remove
puts f

How to remove a node using Nokogiri


1st problem

To remove all the script nodes:

require 'nokogiri'

html = "<div>
This is
<p> very
<script>
some code
</script>
</p>
important.
</div>"

doc = Nokogiri::HTML(html)

doc.xpath("//script").remove

p doc.text
#=> "\n This is\n very\n \n \n important.\n"

Thanks to @theTinMan for his tip (calling remove on one NodeSet instead of each Node).

2nd problem

To remove the unneeded whitespace, you can use:

  • strip to remove whitespace (spaces, tabs, newlines, ...) at the beginning and end of the string
  • gsub to replace each run of whitespace with a single space


p doc.text.strip.gsub(/[[:space:]]+/,' ')
#=> "This is very important."

How to remove an XML node searching by child node value using Nokogiri?

I solved the question with this code:

xml.xpath("//Carga//Imoveis//Imovel[CodigoImovel='#{re_code}']").remove

The XPath predicate selects an Imovel node by the value of its CodigoImovel child, so the query returns only the matching node. Calling remove on the result then deletes it from the document.

The accepted answer to this question helped me find the solution.

Using nokogiri how do I remove all elements with a certain classname

Should be:

doc.css('a.target').remove
puts doc.at('html').to_s

How to remove an element from XML using Nokogiri

This is more idiomatic Nokogiri and Ruby code:

require 'nokogiri'

xml = <<EOT
<products>
<product>
<name> product1 </name>
<price> 21 </price>
</product>
<product>
<name> product2 </name>
<price> 0 </price>
</product>
<product>
<name> product3 </name>
<price> 10 </price>
</product>
</products>
EOT

doc = Nokogiri::XML(xml)

# strip the offending nodes
doc.xpath('//product/price[text()=" 0 "]/..').remove

At this point the resulting XML looks like:

doc.to_xml
# => "<?xml version=\"1.0\"?>\n" +
# "<products>\n" +
# " <product>\n" +
# " <name> product1 </name>\n" +
# " <price> 21 </price>\n" +
# " </product>\n" +
# " \n" +
# " <product>\n" +
# " <name> product3 </name>\n" +
# " <price> 10 </price>\n" +
# " </product>\n" +
# " </products>\n"

Then simply write it:

File.write('myfile.xml', doc.to_xml)

How to delete a Node from a Nokogiri Nodeset?

It is not strange to me.

my_nodeset.last.remove means:

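take the NodeSet my_nodeset, go to its last Node member, and call remove, a method that belongs to that Node and not to the NodeSet. Node#remove unlinks the node from its document; it does not touch the NodeSet that happens to reference it. Expecting a Node method to modify a NodeSet is semantically wrong to me.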

my_nodeset.delete(my_nodeset.last) is how it should be.

Ruby, Nokogiri Remove ul element selected by class= foo

You haven't provided much context or detail here, but the following code should remove the item you want, provided you are selecting it correctly. Please add more details, such as the output you received and the output you expected.

Given the limited information, you could try this bit:

UPDATE:

html.html

<div class="metis manual-toogle" id="tocList">...
<li id="tocElement-ebook_cs_1111111_11">...
<a data-content href="url" class=" "></a> <!-- only this urls I want -->
<ul class="foo">
<!-- the following content and urls I want to remove -->
<li class id="tocElement-ebook_cs_1111111_cs12">
<a data-content href="url" class=" "></a>
...
<a data-content href="url" class=" "></a>
</li>
</ul>
</li>
</div>

main.rb

require 'nokogiri'

# html.html is a local file, so plain File.read is enough here
doc = Nokogiri::HTML(File.read('html.html'))

# drop every <ul class="foo"> together with the links inside it
doc.xpath('//ul[@class="foo"]').remove

doc.xpath('//a').each do |item|
  puts item
end

Output:

$ ruby main.rb
<a data-content href="urliwant" class=" "></a>

We worked this out through chat. The example above works, but for his specific case the messy HTML required this instead:

document = Nokogiri::HTML(File.read('html.html'))

# '//ul//ul//ul' is XPath syntax, so query it with xpath
document.xpath('//ul//ul//ul').remove
document.css('ul .collapse').remove

links = document.xpath('//*[@id="toc"]//ul')

File.open("input.html", "a") do |output_txt|
  links.each do |item|
    output_txt.write(item)
  end
end

