I can't remove whitespaces from a string parsed by Nokogiri
strip only removes ASCII whitespace, and the character you've got here is a Unicode non-breaking space (U+00A0).
Removing the character is easy. You can use gsub with a regex that matches its character code:
gsub(/\u00a0/, '')
You could also call
gsub(/[[:space:]]/, '')
to remove all Unicode whitespace. For details, check the Regexp documentation.
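To see the difference between the calls above, here is a minimal sketch. The sample string is made up for illustration; it pads a number with non-breaking spaces the way scraped HTML often does.

```ruby
# Hypothetical input: U+00A0, an ASCII space, "42", then another U+00A0.
raw = "\u00a0 42\u00a0"

# strip leaves the string untouched because both ends are U+00A0.
stripped = raw.strip

# Removing only the non-breaking spaces keeps the ASCII space.
without_nbsp = raw.gsub(/\u00a0/, '')

# The POSIX bracket class covers Unicode whitespace, so everything goes.
without_any_space = raw.gsub(/[[:space:]]/, '')

p stripped == raw      # => true
p without_nbsp         # => " 42"
p without_any_space    # => "42"
```

If only the ends of the string matter, anchoring the same class, as in raw.gsub(/\A[[:space:]]+|[[:space:]]+\z/, ''), behaves like a Unicode-aware strip.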
I can't remove white spaces from nokogiri node content
Two things to try:
If you're checking the population variable, your method doesn't actually put the substitution in it. Change the last line to:
population << value.gsub(/\s+/, "")
If that still doesn't work, perhaps there is some non-space character that looks like a space in your terminal? Try replacing non-digits instead:
population << value.gsub(/\D/, "")
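Both suggestions can be tried side by side. This is only a sketch: the values array stands in for the node contents from the question, which are not shown here, so the sample strings are assumptions.

```ruby
# Hypothetical scraped values; the second one contains a non-breaking space.
values = ["1 234 567", "89\u00a0012", " 3,456 "]

population = []
values.each do |value|
  # gsub returns a new string, so the result itself must be pushed.
  population << value.gsub(/\D/, "")
end

p population  # => ["1234567", "89012", "3456"]

# For comparison, \s only matches ASCII whitespace and leaves U+00A0 alone.
p "89\u00a0012".gsub(/\s+/, "") == "89\u00a0012"  # => true
```

Note that \D also drops minus signs and decimal points, so it only suits values that are plain non-negative integers.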
How to remove white space from HTML text
Consider this:
require 'nokogiri'
doc = Nokogiri::HTML('<div class="address-thoroughfare mobile-inline-comma ng-binding">Kühlungsborner Straße
10
</div>')
doc.search('div').text
# => "Kühlungsborner Straße\n 10\n "
puts doc.search('div').text
# >> Kühlungsborner Straße
# >> 10
# >>
The given HTML doesn't replicate the problem you're having. It's really important to present valid input that duplicates the problem. Moving on....
Don't use xpath, css or search with text. You usually won't get what you expect:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<html>
<body>
<div>
<span>foo</span>
<span>bar</span>
</div>
</body>
</html>
EOT
doc.search('span').class # => Nokogiri::XML::NodeSet
doc.search('span') # => [#<Nokogiri::XML::Element:0x3fdb6981bcd8 name="span" children=[#<Nokogiri::XML::Text:0x3fdb6981b5d0 "foo">]>, #<Nokogiri::XML::Element:0x3fdb6981aab8 name="span" children=[#<Nokogiri::XML::Text:0x3fdb6981a054 "bar">]>]
doc.search('span').text
# => "foobar"
Note that text returned the concatenated text of all nodes found.
Instead, walk the NodeSet and grab the individual node's text:
doc.search('span').map(&:text)
# => ["foo", "bar"]
Using nokogiri how do I remove all elements with a certain classname
Should be:
doc.css('a.target').remove
puts doc.at('html').to_s
Rails - strip xml import from whitespace and line break
You could use XSLT to remove all the unnecessary characters.
remove whitespace from xml document using ruby
Following should give you what you are looking for
string.gsub(/\\n/, '').gsub(/>\s*/, ">").gsub(/\s*</, "<")
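As a rough illustration of what that chain does, here is a small sketch. The compact_xml helper and both sample strings are hypothetical. Note that /\\n/ matches a literal backslash followed by n, not a real newline; real newlines between tags are removed by the two \s* substitutions.

```ruby
# Hypothetical wrapper around the one-liner above.
def compact_xml(string)
  string.gsub(/\\n/, '').gsub(/>\s*/, ">").gsub(/\s*</, "<")
end

# XML containing real newlines and indentation.
with_newlines = "<root>\n  <item> a </item>\n</root>"

# XML containing the two-character sequence backslash + n (single quotes).
with_escaped = '<root>\n<item>b</item>\n</root>'

puts compact_xml(with_newlines)  # <root><item>a</item></root>
puts compact_xml(with_escaped)   # <root><item>b</item></root>
```

Because the regexes work on the raw string, they also trim leading and trailing whitespace inside text content (" a " becomes "a"), which a parser-based approach could avoid.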
How to remove a node using Nokogiri
1st problem
To remove all the script nodes:
require 'nokogiri'
html = "<div>
This is
<p> very
<script>
some code
</script>
</p>
important.
</div>"
doc = Nokogiri::HTML(html)
doc.xpath("//script").remove
p doc.text
#=> "\n This is\n very\n \n \n important.\n"
Thanks to @theTinMan for his tip (calling remove on one NodeSet instead of each Node).
2nd problem
To remove the unneeded whitespace, you can use:
strip to remove whitespace (spaces, tabs, newlines, ...) at the beginning and end of the string
gsub to replace runs of whitespace with a single space
p doc.text.strip.gsub(/[[:space:]]+/,' ')
#=> "This is very important."