Suppress the u'prefix indicating unicode' in Python strings

You could use Python 3. The default string type there is Unicode, so the u'' prefix is no longer required.

In short, no. You cannot turn this off.

The u comes from the unicode.__repr__ method, which the REPL uses to display values:

>>> print repr(unicode('a'))
u'a'
>>> unicode('a')
u'a'

If I'm not mistaken, you cannot override this without recompiling Python.

The simplest way around this is to print the string:

>>> print unicode('a')
a
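To make the repr-versus-print distinction concrete, here is a minimal sketch. It assumes Python 3 (so there is no u prefix to see), but the mechanism is the same one described above: the interactive prompt echoes repr(), while print writes str(). The variable names are only illustrative.

```python
# Python 3 sketch: repr() is what the interactive prompt echoes,
# str() is what print() writes.
text = "a"

echoed = repr(text)    # includes the quotes (and, in Python 2, the u prefix)
printed = str(text)    # just the characters themselves

print(echoed)
print(printed)
```

The prefix and the quotes belong to the repr() form only, which is why printing the string directly makes them disappear.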

If you use the unicode() builtin to construct all your strings, you could do something like:

>>> class unicode(unicode):
...     def __repr__(self):
...         return __builtins__.unicode.__repr__(self).lstrip("u")
...
>>> unicode('a')
'a'

...but don't do that; it's horrible.

What's the u prefix in a Python string?

You're right, see 3.1.3. Unicode Strings.

It's been the syntax since Python 2.0.

Python 3 made them redundant, as the default string type is Unicode. Versions 3.0 through 3.2 removed them, but they were re-added in 3.3+ for compatibility with Python 2, to aid the 2-to-3 transition.
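As a quick check of that redundancy, the following sketch assumes Python 3.3 or later, where the u prefix is accepted again but changes nothing. The variable names are made up for illustration.

```python
# Python 3.3+ sketch: the u prefix is accepted but has no effect.
legacy = u"sm\xf6rg\xe5s"   # literal written in the Python 2 style
plain = "sm\xf6rg\xe5s"     # ordinary Python 3 literal

same_value = legacy == plain
same_type = type(legacy) is type(plain) is str
shown = repr(legacy)        # no u prefix in the repr on Python 3

print(same_value, same_type, shown)
```

Both literals produce the same str object type, and the repr no longer carries a prefix.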

Removing u in list

That 'u' is part of the external representation of the string, meaning it's a Unicode string as opposed to a byte string. It's not in the string, it's part of the type.

As an example, you can create a new Unicode string literal by using the same syntax. For instance:

>>> sandwich = u"smörgås"
>>> sandwich
u'sm\xf6rg\xe5s'

This creates a new Unicode string whose value is the Swedish word for sandwich. The non-ASCII characters are shown as escape sequences for their Unicode code points: ö is \xf6 and å is \xe5. The 'u' prefix appears, just as in your example, to signify that this string holds Unicode text.

To get rid of those, you need to encode the Unicode string into some byte-oriented representation, such as UTF-8. You can do that with e.g.:

>>> sandwich.encode("utf-8")
'sm\xc3\xb6rg\xc3\xa5s'

Here, we get a new string without the prefix 'u', since this is a byte string. It contains the bytes representing the characters of the Unicode string, with the Swedish characters resulting in multiple bytes due to the wonders of the UTF-8 encoding.
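For comparison, here is roughly the same round trip as a Python 3 sketch. Note the assumption: in Python 3 the text string has no prefix, and it is the encoded bytes object whose repr carries a b prefix instead.

```python
# Python 3 sketch: text (str) versus its UTF-8 encoded form (bytes).
sandwich = "sm\u00f6rg\u00e5s"          # the Swedish word, 7 characters

encoded = sandwich.encode("utf-8")      # bytes; ö and å take two bytes each
round_trip = encoded.decode("utf-8")    # back to the original text

print(repr(encoded))
print(len(sandwich), len(encoded))
```

The length difference (7 characters, 9 bytes) comes from the two non-ASCII letters, each of which UTF-8 encodes as two bytes.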

How to get rid of the unicode 'u' from output ? Python

Change this

tab  = [str(rule.to_port), "0.0.0.0/0", str(securityGroup.name), str(getTag(connection, instanceId.split(':')[1]))]

to

tab  = [str(rule.to_port), "0.0.0.0/0", str(securityGroup.name), tuple(list(i.encode('UTF8') for i in getTag(connection, instanceId.split(':')[1])[0:2] ) + [getTag(connection, instanceId.split(':')[1])[2]] )]

What is the difference between u' ' prefix and unicode() in python?

  • u'..' is a string literal, and decodes the characters according to the source encoding declaration.

  • unicode() is a function that converts another type to a unicode object. You've given it a byte string literal, which it decodes according to the default ASCII codec.

So you created a byte string object using a different type of literal notation, then tried to convert it to a unicode() object, which fails because the default codec for str -> unicode conversions is ASCII.

The two are quite different beasts. If you want to use the latter, you need to give it an explicit codec:

print unicode('上午', 'utf8')

The two are related in the same way that using 0xFF and int('0xFF', 0) are related; the former defines an integer of value 255 using hex notation, the latter uses the int() function to extract an integer from a string.
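The integer analogy can be checked directly. This is just a small sketch with illustrative variable names.

```python
# A literal is evaluated by the parser; int() converts a string at run time.
literal_value = 0xFF              # hex literal in the source code
parsed_value = int("0xFF", 0)     # base 0 lets int() infer the base from the 0x prefix

print(literal_value, parsed_value)
```

Both routes end at the same value, but only the second one involves a function interpreting a string.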

An alternative method would be to use the str.decode() method:

print '上午'.decode('utf8')

Don't be tempted to use an error handler (such as 'ignore' or 'replace') unless you know what you are doing. 'ignore' especially can mask underlying issues, such as having picked the wrong codec.
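A hedged Python 3 sketch of these points follows. It assumes the bytes are UTF-8, and it uses bytes.decode() in place of Python 2's str.decode(). The ASCII attempt stands in for the implicit conversion that unicode() performs in Python 2.

```python
# Python 3 sketch: decoding bytes with the right codec, the wrong codec,
# and the wrong codec plus an error handler.
raw = "上午".encode("utf-8")       # the bytes a UTF-8 source would contain

text = raw.decode("utf-8")         # explicit, correct codec

try:
    raw.decode("ascii")            # mirrors Python 2's default str -> unicode codec
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# 'ignore' silences the error by dropping every byte ASCII cannot decode.
silently_lost = raw.decode("ascii", errors="ignore")

print(text, ascii_failed, repr(silently_lost))
```

With 'ignore', the call succeeds but returns an empty string, so the wrong codec choice goes unnoticed and the data is gone.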

You may want to read up on Python and Unicode:

  • Pragmatic Unicode by Ned Batchelder

  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

  • The Python Unicode HOWTO

Remove 'u' prefix in an output coming from an API call

nameout.json() is presumably the JSON response, already parsed into Python objects.

nameout.json()["tags"] will return the list object with tags. You really don't want to convert it to str in the first place.

The u prefix just indicates that these are Unicode strings. You don't need to remove it. It is not printed when you print the list elements properly, e.g.:

tags = [u'tomcat', u'app', u'all', u'subt', u'biz', u'sub1t']
print(', '.join(tags))
for tag in tags:
    print(tag)

output:

tomcat, app, all, subt, biz, sub1t
tomcat
app
all
subt
biz
sub1t
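To show the whole path from response body to printed tags, here is a minimal sketch. It assumes the API returns a JSON object with a "tags" array. The body string is made-up sample data, and json.loads from the standard library stands in for nameout.json().

```python
import json

# Hypothetical response body; a real one would come from the API call.
body = '{"tags": ["tomcat", "app", "all", "subt", "biz", "sub1t"]}'

data = json.loads(body)        # parsed into ordinary dicts, lists and strings
tags = data["tags"]            # already a list; no str() conversion needed

joined = ", ".join(tags)       # format the elements, not the list's repr
print(joined)
```

Working with the list elements directly avoids the repr of the list altogether, so there is nothing to strip afterwards.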

