Removing Non-Breaking Spaces from Strings Using Python

Replace non-breaking spaces (\xa0) inside a string

You can use

df = df.replace('\xa0', '', regex=True)

Passing regex=True triggers re.sub behind the scenes, which replaces every occurrence of the non-breaking space inside each string with an empty string. Without regex=True, replace only matches cells whose entire value equals '\xa0'.
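
As a rough illustration of what that substitution does to each string cell, here is a standard-library sketch. The sample values and the name cells are made up, and the code mimics the per-cell effect rather than pandas' actual internals.

```python
import re

# Hypothetical cell values; "\xa0" is the non-breaking space (U+00A0).
cells = ["1\xa0000", "no\xa0break\xa0here", "plain"]

# Substring replacement, as with regex=True: every occurrence inside
# each string is removed, not just cells that equal "\xa0" exactly.
cleaned = [re.sub("\xa0", "", cell) for cell in cells]
print(cleaned)  # ['1000', 'nobreakhere', 'plain']
```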

Python: replace nonbreaking space in Unicode

  1. In Python 2, chr(160) is a byte string of length one whose only byte has value 160, or hex a0. There's no meaning attached to it except in the context of a specific encoding.
  2. I'm not familiar with Eclipse, but it may be playing encoding tricks of its own.
  3. If you want the Unicode character NO-BREAK SPACE, i.e. code point 160, that's unichr(160).

E.g.,

>>> u"hello\u00a0world".replace(unichr(160), "X")
u'helloXworld'
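
For comparison, a minimal sketch of the same idea assuming Python 3, where every str is Unicode, unichr no longer exists, and chr(160) already yields NO-BREAK SPACE. The variable names are only illustrative.

```python
import unicodedata

nbsp = chr(160)  # in Python 3, chr() returns a Unicode string
assert nbsp == "\u00a0" == "\xa0"
print(unicodedata.name(nbsp))  # NO-BREAK SPACE

# The byte representation depends on the encoding that is chosen.
print(nbsp.encode("latin-1"))  # b'\xa0'
print(nbsp.encode("utf-8"))    # b'\xc2\xa0'

replaced = "hello\u00a0world".replace(nbsp, "X")
print(replaced)  # helloXworld
```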

Remove/Exclude Non-Breaking Space from Scrapy result

\xa0 is the non-breaking space (U+00A0), which is the single byte 0xA0 in Latin-1. Replace it with a regular space like this:

string = string.replace(u'\xa0', u' ')
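
Besides an explicit replace, there are a couple of other ways to get rid of the character. The sketch below assumes Python 3 and uses a made-up sample value; which option fits depends on whether other characters in the string should be left untouched.

```python
import unicodedata

raw = "1\xa0299,00\xa0kr"  # made-up scraped value

# Option 1: explicit replacement, touching only U+00A0.
replaced = raw.replace("\xa0", " ")

# Option 2: NFKC normalization maps U+00A0 to a regular space
# (it also rewrites other compatibility characters, which may not be wanted).
normalized = unicodedata.normalize("NFKC", raw)

# Option 3: str.split() with no arguments treats U+00A0 as whitespace,
# so this also collapses runs of whitespace into single spaces.
collapsed = " ".join(raw.split())

print(replaced, normalized, collapsed, sep=" | ")
```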

Update:

You can apply the code as follows:

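for post in response.xpath('//body'):
    item = myItem()
    item['article_name'] = post.xpath('//a[@class="title-link"]/span/text()').extract()
    # extract_first() returns a single string (u'' if nothing matched),
    # so .replace() can be called on it; extract() would return a list
    price = post.xpath('//p[@class="display-price"]/span/text()').extract_first(default=u'')
    item['price'] = price.replace(u'\xa0', u' ')
    # Skip items whose price is empty or whitespace-only
    if item['price'].strip():
        yield item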

Here the character is replaced first, and the item is yielded only if the price is not empty.
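
Note that .extract() returns a list of strings, so when all matches are kept the replacement has to be applied to each element. A minimal sketch of such a helper follows; clean_values and the sample data are hypothetical and not part of Scrapy.

```python
def clean_values(values):
    """Replace non-breaking spaces, trim, and drop empty strings."""
    cleaned = (value.replace("\xa0", " ").strip() for value in values)
    return [value for value in cleaned if value]


# Made-up stand-in for the list that .extract() would return.
extracted = ["\xa019,99\xa0€", "\xa0", "", "5,00\xa0€\n"]
print(clean_values(extracted))  # ['19,99 €', '5,00 €']
```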

remove non-breaking space from python code

This subject is discussed on the GitHub page of jupyter-notebook.

Links:

Strip trailing whitespace (Closed)

Trailing whitespace in editor (Open)

remove (non-breaking) space character in string

You may shorten the creation of test to just 2 steps, using just 1 PCRE regex (note the perl=TRUE parameter):

test = sub(",", ".", gsub("(*UCP)[\\s\\p{L}]+|\\W+$", "", area_cult10$V5, perl=TRUE), fixed=TRUE)

Result:

 [1] "11846.4" "6529.2"  "3282.7"  "616.0"   "1621.8"  "125.7"   "14.2"
 [8] "401.6"   "455.5"   "11.7"    "160.4"   "79.1"    "37.6"    "29.6"
[15] ""        "13.9"    "554.1"   "236.7"   "312.8"   "4.6"     "136.9"
[22] "1374.4"  "1332.3"  "1281.8"  "3.7"     "5.0"     "18.4"    "23.4"
[29] "42.0"    "2746.2"  "106.6"   "2100.4"  "267.8"   "258.4"   "13.1"
[36] "23.5"    "11.6"    "310.2"

The gsub regex is worth special attention:

  • (*UCP) - the PCRE verb that makes the pattern Unicode-aware, so that \\s also matches the non-breaking space
  • [\\s\\p{L}]+ - matches 1+ whitespace or letter characters
  • | - or (an alternation operator)
  • \\W+$ - 1+ non-word chars at the end of the string.

Then, sub(",", ".", x, fixed=TRUE) replaces the first , with a ., treating both as literal strings. fixed=TRUE also saves time, since no regex has to be compiled.
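
For readers working in Python rather than R, the same two steps can be approximated with the standard re module. This is a sketch under assumptions: re does not support \p{L}, so [^\W\d_] stands in for it as an approximation, no (*UCP) is needed because \s already matches U+00A0 for str patterns in Python 3, and the sample values are made up since the contents of area_cult10$V5 are not shown.

```python
import re

# Whitespace (including U+00A0) or letters anywhere, or trailing non-word chars.
# [^\W\d_] approximates \p{L}, which the standard re module does not support.
JUNK = re.compile(r"(?:\s|[^\W\d_])+|\W+$")


def to_decimal_string(value):
    """Strip spaces, letters and trailing punctuation, then swap the first ',' for '.'."""
    stripped = JUNK.sub("", value)
    return stripped.replace(",", ".", 1)  # count=1 mirrors R's sub()


# Hypothetical inputs in the spirit of the original column.
samples = ["11\xa0846,4\xa0ha", "616,0 ha", "14,2\xa0ha*", "ha"]
print([to_decimal_string(s) for s in samples])
# ['11846.4', '616.0', '14.2', '']
```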


