How to remove this \xa0 from a string in python?
If you know for sure that \xa0 is the only character you don't want, you can .replace it:
>>> word.replace(u'\xa0', ' ')
u'Buffalo, IL 60625'
If you need to handle all non-ASCII characters, encoding to ASCII and replacing the characters that cannot be encoded might be a good start:
>>> word.encode('ascii', 'replace')
'Buffalo,?IL?60625'
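The session above is Python 2 (note the u prefix). As a minimal sketch of the same two approaches in Python 3, assuming a hypothetical sample string named word: .replace returns a str, while .encode returns bytes that need decoding before they can be used as text again.

```python
# Hypothetical sample input containing two non-breaking spaces (U+00A0).
word = "Buffalo,\xa0IL\xa060625"

# Targeted fix: swap only the non-breaking space for a regular space.
cleaned = word.replace("\xa0", " ")

# Broad fix: every character that ASCII cannot represent becomes "?".
# In Python 3, encode() returns bytes, so decode to get a str back.
ascii_bytes = word.encode("ascii", "replace")
ascii_text = ascii_bytes.decode("ascii")

print(cleaned)     # Buffalo, IL 60625
print(ascii_text)  # Buffalo,?IL?60625
```

The broad version loses information (every non-ASCII character turns into the same "?"), so the targeted replace is usually preferable when \xa0 is the only problem.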
How to remove the â\xa0 from list of strings in python
Use the unicodedata library. That way you can preserve more of the information in each word.
import unicodedata
final_list = [[unicodedata.normalize("NFKD", word) for word in ls] for ls in my_list]
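To also replace â with a, encode to ASCII and ignore whatever cannot be encoded (NFKD has split â into a plus a combining accent, and only the accent is dropped):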
very_final_list = [[word.encode('ascii', 'ignore') for word in ls] for ls in final_list]
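If you want to remove â completely instead, note that NFKD has already split it into a followed by the combining circumflex U+0302, so match that decomposed pair:
very_final_list = [[word.replace('a\u0302', '') for word in ls] for ls in final_list]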
To remove the b' prefix in front of every string (encode returns bytes), decode each result back to a str using utf-8.
So putting everything together,
import unicodedata
final_list = [[unicodedata.normalize("NFKD", word) for word in ls] for ls in my_list]
very_final_list = [[word.encode('ascii', 'ignore').decode('utf-8') for word in ls] for ls in final_list]
#very_final_list = [[word.replace('a\u0302', '') for word in ls] for ls in final_list]  # alternative: drop the decomposed â entirely
And here is the final result:
[['the', 'production', 'business', 'environmenta evaluating', 'the'], ['impact', 'of', 'the', 'environmental', 'influences', 'such'], ['as', 'political', 'economic', 'technological', 'sociodemographica ']]
If you swap which very_final_list statement is commented out, this is the output:
[['the', 'production', 'business', 'environment evaluating', 'the'], ['impact', 'of', 'the', 'environmental', 'influences', 'such'], ['as', 'political', 'economic', 'technological', 'sociodemographic ']]
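To make the difference between the two outputs concrete, here is a small self-contained sketch. The sample my_list and the helper names to_ascii and drop_accented are hypothetical, chosen only for illustration. It relies on the fact that NFKD turns \xa0 into a regular space and decomposes â into a plus the combining circumflex U+0302.

```python
import unicodedata

# Hypothetical sample data: each affected word contains "â" (U+00E2)
# followed by a non-breaking space (U+00A0).
my_list = [["environment\u00e2\u00a0evaluating", "the"],
           ["sociodemographic\u00e2\u00a0"]]

def to_ascii(word):
    """Normalize, then drop anything ASCII cannot represent (â becomes a)."""
    decomposed = unicodedata.normalize("NFKD", word)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def drop_accented(word):
    """Normalize, then remove the decomposed â (a + U+0302) entirely."""
    decomposed = unicodedata.normalize("NFKD", word)
    return decomposed.replace("a\u0302", "")

ascii_list = [[to_ascii(w) for w in ls] for ls in my_list]
stripped_list = [[drop_accented(w) for w in ls] for ls in my_list]

print(ascii_list)     # [['environmenta evaluating', 'the'], ['sociodemographica ']]
print(stripped_list)  # [['environment evaluating', 'the'], ['sociodemographic ']]
```

Searching for the precomposed 'â' after NFKD normalization would find nothing, which is why the second helper matches the decomposed pair.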
How to remove \xa0 in unicode from a list
You are missing the u prefix at the beginning of your string literal (in Python 2 it is what makes '\xa0' a unicode character rather than a raw byte):
[el.replace(u'\xa0',' ') for el in list]
I would also avoid using list as a variable name, since it shadows the built-in list type in Python.
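A short sketch of both points in Python 3, assuming hypothetical sample data stored under the name addresses: the comprehension works on a non-shadowing name, and the u prefix is accepted but no longer changes anything.

```python
# Hypothetical sample data; "addresses" avoids shadowing the built-in list type.
addresses = ["Buffalo,\xa0IL\xa060625", "Chicago, IL 60601"]

# Replace the non-breaking space in every element.
cleaned = [el.replace("\xa0", " ") for el in addresses]

# In Python 3 every str literal is Unicode, so the u prefix is redundant.
same_literal = u"\xa0" == "\xa0"

# Because nothing was assigned to the name "list", the built-in still works.
still_builtin = list("ab")

print(cleaned)  # ['Buffalo, IL 60625', 'Chicago, IL 60601']
```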
Removing \xa0 from string in a list
The easiest way:
lista = [el.replace('\xa0',' ') for el in lista]
Replace non breaking spaces \xa0 inside string
You can use
df = df.replace('\xa0', '', regex=True)
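Passing regex=True makes '\xa0' be treated as a regular expression and substituted wherever it occurs inside a string (re.sub behind the scenes), so every non-breaking space is replaced with an empty string. Without it, only cells whose entire value equals '\xa0' would be replaced.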
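As a plain-Python illustration of the per-value substitution described above (this is not the DataFrame call itself; the cells list is a hypothetical stand-in for the values of a column), the sketch below applies re.sub to each string. It also shows a design choice worth considering: replacing with an empty string glues neighbouring words together, while replacing with a regular space keeps them apart.

```python
import re

# Hypothetical stand-in for the string values of one column.
cells = ["Buffalo,\xa0IL", "60625\xa0", "plain"]

# Pattern matching a single non-breaking space (U+00A0).
nbsp = re.compile("\xa0")

# Same replacement as in the answer: delete the character.
removed = [nbsp.sub("", cell) for cell in cells]

# Alternative: keep word boundaries by substituting a regular space.
spaced = [nbsp.sub(" ", cell) for cell in cells]

print(removed)  # ['Buffalo,IL', '60625', 'plain']
print(spaced)   # ['Buffalo, IL', '60625 ', 'plain']
```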