Replace non breaking spaces \xa0 inside string
You can use
df = df.replace('\xa0', '', regex=True)
Passing the regex=True option makes pandas call re.sub behind the scenes, which replaces every occurrence of a non-breaking space with an empty string.
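As a rough illustration of what that substitution does to each string cell, here is a minimal standard-library sketch. The helper name strip_nbsp and the sample values are hypothetical, and this is not how pandas implements it internally; it only mirrors the per-string effect of re.sub.

```python
import re

NBSP = "\xa0"  # U+00A0 NO-BREAK SPACE


def strip_nbsp(value):
    """Remove every non-breaking space from a string; leave other types alone."""
    if isinstance(value, str):
        return re.sub(NBSP, "", value)
    return value


# Hypothetical cell values, standing in for one DataFrame column
cells = ["1\xa0000", "plain", 42]
cleaned = [strip_nbsp(c) for c in cells]
print(cleaned)  # ['1000', 'plain', 42]
```

If the non-breaking space separates words rather than digit groups, replacing it with a regular space instead of an empty string is probably the safer choice.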
Python: replace nonbreaking space in Unicode
- In Python 2, chr(160) is a byte string of length one whose only byte has value 160, or hex a0. There's no meaning attached to it except in the context of a specific encoding.
- I'm not familiar with Eclipse, but it may be playing encoding tricks of its own.
- If you want the Unicode character NO-BREAK SPACE, i.e. code point 160, that's unichr(160).
E.g.,
>>> u"hello\u00a0world".replace(unichr(160), "X")
u'helloXworld'
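The answer above targets Python 2. As a sketch of the same idea in Python 3, where str is already Unicode and chr() plays the role of unichr(), the following shows the replacement and how the byte representation depends on the encoding. The variable names are illustrative only.

```python
# Python 3: chr(160) is the Unicode character NO-BREAK SPACE (U+00A0)
NBSP = chr(160)
assert NBSP == "\u00a0" == "\xa0"

text = "hello\u00a0world"
replaced = text.replace(NBSP, "X")
print(replaced)  # helloXworld

# The bytes only acquire meaning through an encoding:
encoded_latin1 = NBSP.encode("latin-1")  # b'\xa0'
encoded_utf8 = NBSP.encode("utf-8")      # b'\xc2\xa0'
print(encoded_latin1, encoded_utf8)
```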
Remove/Exclude Non-Breaking Space from Scrapy result
\xa0 is a non-breaking space in Latin-1. Replace it like this:
string = string.replace(u'\xa0', u' ')
Update:
You can apply the code as follows:
for post in response.xpath('//body'):
    item = myItem()
    item['article_name'] = post.xpath('//a[@class="title-link"]/span/text()').extract()
    # extract() returns a list, which has no .replace(); extract_first() returns one string
    price = post.xpath('//p[@class="display-price"]/span/text()').extract_first(default=u'')
    item['price'] = price.replace(u'\xa0', u' ')
    if item['price'].strip():
        yield item
Here the character is replaced first, and the item is yielded only if the price is not empty.
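Because .extract() returns a list of strings rather than a single string, it can help to move the cleanup into a small helper. The sketch below is plain Python with no Scrapy dependency; the name clean_price and the sample values are hypothetical.

```python
def clean_price(extracted):
    """Join extracted text nodes and turn non-breaking spaces into normal spaces."""
    if isinstance(extracted, str):
        extracted = [extracted]  # accept a single string as well as a list
    text = u"".join(extracted or [])
    return text.replace(u"\xa0", u" ").strip()


# Hypothetical values, shaped like the result of .extract()
price = clean_price([u"1\xa0299,00\xa0\u20ac"])
empty = clean_price([u"\xa0"])
print(price)        # 1 299,00 €
print(repr(empty))  # ''
```

An empty result is falsy, so the same "yield only if the price is not empty" check still applies to the helper's return value.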
remove non-breaking space from python code
This subject has been discussed on the GitHub page of jupyter-notebook.
Links:
Strip trailing whitespace (Closed)
Trailing whitespace in editor (Open)
remove (non-breaking) space character in string
You may shorten the test creation to just 2 steps, using just 1 PCRE regex (note the perl=TRUE parameter):
test = sub(",", ".", gsub("(*UCP)[\\s\\p{L}]+|\\W+$", "", area_cult10$V5, perl=TRUE), fixed=TRUE)
Result:
[1] "11846.4" "6529.2" "3282.7" "616.0" "1621.8" "125.7" "14.2"
[8] "401.6" "455.5" "11.7" "160.4" "79.1" "37.6" "29.6"
[15] "" "13.9" "554.1" "236.7" "312.8" "4.6" "136.9"
[22] "1374.4" "1332.3" "1281.8" "3.7" "5.0" "18.4" "23.4"
[29] "42.0" "2746.2" "106.6" "2100.4" "267.8" "258.4" "13.1"
[36] "23.5" "11.6" "310.2"
The gsub regex is worth special attention:
- (*UCP) - the PCRE verb that makes the pattern Unicode-aware
- [\\s\\p{L}]+ - matches 1+ whitespace or letter characters
- | - or (an alternation operator)
- \\W+$ - 1+ non-word chars at the end of the string.
Then, sub(",", ".", x, fixed=TRUE) replaces the first , with a . as literal strings; fixed=TRUE saves performance since no regex has to be compiled.
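For readers working in Python rather than R, the same two steps can be approximated as below. This is a sketch under assumptions: Python's built-in re module has no \p{L}, so letters are approximated with [^\W\d_], and Python 3 str patterns are Unicode-aware by default, so no (*UCP) equivalent is needed. The function name and sample inputs are hypothetical and are not taken from the original data.

```python
import re

# (?:\s|[^\W\d_])+  -> 1+ whitespace or (approximately) letter characters
# \W+$              -> 1+ non-word characters at the end of the string
PATTERN = re.compile(r"(?:\s|[^\W\d_])+|\W+$")


def to_decimal_string(raw):
    """Strip whitespace/letters, then turn the first comma into a dot."""
    stripped = PATTERN.sub("", raw)
    return stripped.replace(",", ".", 1)  # literal replacement, first match only


# Hypothetical inputs containing non-breaking spaces
samples = ["11\xa0846,4 ha", "616,0\xa0", "\xa0"]
results = [to_decimal_string(s) for s in samples]
print(results)  # ['11846.4', '616.0', '']
```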