How to Remove \xa0 from a String in Python

How to remove this \xa0 from a string in Python?

If you know for sure that it is the only character you don't want, you can replace it with str.replace:

>>> word.replace(u'\xa0', ' ')
u'Buffalo, IL 60625'

If you need to handle all non-ASCII characters, encoding to ASCII with the 'replace' error handler, which substitutes a ? for every character that cannot be encoded, might be a good start:

>>> word.encode('ascii', 'replace')
'Buffalo,?IL?60625'
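
The session above is Python 2 (note the u prefix). As a minimal Python 3 sketch of the same two options, assuming a sample value for word (the original post does not show how it was built):

```python
# Hypothetical sample value, chosen to match the output shown above.
word = "Buffalo,\xa0IL\xa060625"

# Option 1: swap only the non-breaking space for a regular space.
replaced = word.replace("\xa0", " ")

# Option 2: replace every non-ASCII character with "?".
# In Python 3, encode() returns bytes, so decode to get a str back.
ascii_only = word.encode("ascii", "replace").decode("ascii")

print(repr(replaced))
print(repr(ascii_only))
```

Option 1 keeps the rest of the text untouched, while option 2 is lossy for any accented or non-Latin characters in the string.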

How to remove the â\xa0 from a list of strings in Python

Use the standard-library unicodedata module. Normalizing the text, rather than deleting characters outright, preserves more of each word.

import unicodedata
final_list = [[unicodedata.normalize("NFKD", word) for word in ls] for ls in my_list]

To also replace â with a:

very_final_list = [[word.encode('ascii', 'ignore') for word in ls] for ls in final_list]

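If you want to remove â completely, note that NFKD has already decomposed it into a plus a combining circumflex (U+0302), so remove that pair:

very_final_list = [[word.replace('a\u0302', '') for word in ls] for ls in final_list]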

To remove the b' in front of every string (encode returns bytes in Python 3), decode the result back to str:

So putting everything together,

import unicodedata
final_list = [[unicodedata.normalize("NFKD", word) for word in ls] for ls in my_list]
very_final_list = [[word.encode('ascii', 'ignore').decode('utf-8') for word in ls] for ls in final_list]
# Alternative: drop â entirely (NFKD has decomposed it to 'a' + U+0302)
#very_final_list = [[word.replace('a\u0302', '') for word in ls] for ls in final_list]

And here is the final result:

[['the', 'production', 'business', 'environmenta evaluating', 'the'], ['impact', 'of', 'the', 'environmental', 'influences', 'such'], ['as', 'political', 'economic', 'technological', 'sociodemographica ']]

If you use the commented-out very_final_list statement instead, this is the output:

[['the', 'production', 'business', 'environment evaluating', 'the'], ['impact', 'of', 'the', 'environmental', 'influences', 'such'], ['as', 'political', 'economic', 'technological', 'sociodemographic ']]
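
To try the steps above end to end, here is a small self-contained sketch. The input my_list is a hypothetical sample modelled on the outputs shown, and the helper names normalize_nested and to_ascii are made up for illustration:

```python
import unicodedata

# Hypothetical input; "\u00e2" is â and "\xa0" is the non-breaking space.
my_list = [
    ["the", "environment\u00e2\xa0evaluating"],
    ["technological", "sociodemographic\u00e2\xa0"],
]

def normalize_nested(rows):
    """Apply NFKD to every word: â becomes 'a' + U+0302, \xa0 becomes ' '."""
    return [[unicodedata.normalize("NFKD", word) for word in row] for row in rows]

def to_ascii(rows):
    """Drop whatever is still non-ASCII (here, the combining circumflex)."""
    return [[word.encode("ascii", "ignore").decode("ascii") for word in row]
            for row in rows]

final_list = normalize_nested(my_list)

# Variant 1: keep the base letter, so â ends up as a plain "a".
kept_a = to_ascii(final_list)

# Variant 2: remove the decomposed â ("a" + U+0302) altogether.
dropped_a = [[word.replace("a\u0302", "") for word in row] for row in final_list]

print(kept_a)
print(dropped_a)
```

Because NFKD turns the non-breaking space into a regular space, neither variant needs a separate replace call for \xa0.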

How to remove \xa0 in unicode from a list

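You are missing the u at the beginning of your string. In Python 2 an unprefixed literal is a byte string, so it needs the u prefix to match inside unicode text: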

[el.replace(u'\xa0',' ') for el in list]

I would also avoid using list as a variable name, since it shadows Python's built-in list type.

Removing \xa0 from strings in a list

The easiest way:

lista = [el.replace('\xa0',' ') for el in lista]
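
The same comprehension can be wrapped in a small helper. This is only a sketch: the function name replace_nbsp and the sample data are hypothetical, not from the original answer.

```python
def replace_nbsp(items, replacement=" "):
    """Return a new list with every U+00A0 replaced by `replacement`."""
    return [item.replace("\xa0", replacement) for item in items]

# Hypothetical sample data.
lista = ["New\xa0York", "plain", "\xa0padded\xa0"]

with_spaces = replace_nbsp(lista)    # swap for regular spaces
removed = replace_nbsp(lista, "")    # drop the character entirely

print(with_spaces)
print(removed)
```

The helper returns a new list and leaves the input untouched, which is why the one-liner above reassigns lista to the result.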

Replace non breaking spaces \xa0 inside string

You can use

df = df.replace('\xa0', '', regex=True)

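Passing regex=True makes pandas treat '\xa0' as a regular-expression pattern and substitute it wherever it occurs inside a string, in the manner of re.sub, so every non-breaking space is replaced with an empty string. Without regex=True, only cells whose entire value equals '\xa0' are replaced.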
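
To see why the option matters, here is a plain-Python analogy. It does not call pandas and makes no claim about pandas internals; the cell values are hypothetical. It only contrasts matching a whole value with substituting a pattern inside each value:

```python
import re

# Hypothetical cell values, standing in for one DataFrame column.
cells = ["Buffalo,\xa0IL", "\xa0", "plain"]

# Whole-value match: only a cell that is exactly "\xa0" changes.
whole_value = ["" if cell == "\xa0" else cell for cell in cells]

# Pattern substitution: every occurrence inside each cell is removed.
substituted = [re.sub("\xa0", "", cell) for cell in cells]

print(whole_value)
print(substituted)
```

In the first result the non-breaking space inside "Buffalo,\xa0IL" survives; in the second it is gone.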
