Python __str__ versus __unicode__
__str__() is the old method -- it returns bytes. __unicode__() is the new, preferred method -- it returns characters. The names are a bit confusing, but in 2.x we're stuck with them for compatibility reasons. Generally, you should put all your string formatting in __unicode__(), and create a stub __str__() method:
def __str__(self):
    return unicode(self).encode('utf-8')
In 3.0, str contains characters, so the same methods are named __bytes__() and __str__(). These behave as expected.
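As a rough Python 3 illustration of that naming, here is a minimal sketch. The Card class and its suit attribute are hypothetical, invented only for this example:

```python
class Card:
    """Hypothetical class used only to illustrate __str__ and __bytes__."""

    def __init__(self, suit):
        self.suit = suit

    def __str__(self):
        # Python 3: __str__ returns text (str).
        return "Card(%s)" % self.suit

    def __bytes__(self):
        # Python 3: __bytes__ returns bytes; here, the text form encoded as UTF-8.
        return str(self).encode("utf-8")


card = Card("\u2660")
text = str(card)    # calls __str__
raw = bytes(card)   # calls __bytes__
```

Deriving __bytes__ from __str__ mirrors the 2.x stub above, where __str__ was derived from __unicode__.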
Unicode in django 2
What's happening here is that print() needs a string representation of the object. It gets one by first looking for a __str__() method and falling back on the __repr__() method if that doesn't exist.
If the class defines neither, the final fallback is object.__repr__() at the end of the inheritance chain. Or, in the case of Django model objects, django.db.models.Model.__str__(), which gives you the output seen in the question.
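The lookup order can be checked with a small Python 3 sketch. The three class names below are made up for the illustration:

```python
class Both:
    def __repr__(self):
        return "Both()"

    def __str__(self):
        return "a Both instance"


class OnlyRepr:
    # No __str__ here, so str() and print() fall back to __repr__.
    def __repr__(self):
        return "OnlyRepr()"


class Neither:
    # Defines neither method, so object.__repr__ is used.
    pass


both_text = str(Both())
only_repr_text = str(OnlyRepr())
neither_text = str(Neither())
```

For Neither, the result is the generic form that includes the class name and a memory address.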
In Django versions before 2.0 running on Python 2, the __unicode__ method was used instead of __str__ in template rendering. The reason was Python 2 compatibility. In current versions of Django, use __str__() instead.
Why __unicode__ doesn't work but __str__ does?
It looks like you are using Python 3.x. Here is the relevant documentation on the str and unicode methods:
In Python 2, the object model specifies __str__() and __unicode__() methods. If these methods exist, they must return str (bytes) and unicode (text) respectively. The print statement and the str() built-in call __str__() to determine the human-readable representation of an object. The unicode() built-in calls __unicode__() if it exists, and otherwise falls back to __str__() and decodes the result with the system encoding. Conversely, the Model base class automatically derives __str__() from __unicode__() by encoding to UTF-8.
In Python 3, there’s simply __str__(), which must return str (text).
So
On Python 3, the decorator is a no-op. On Python 2, it defines appropriate __unicode__() and __str__() methods (replacing the original __str__() method in the process).
Python unicode character in __str__
Where does that UnicodeEncodeError occur exactly? I can think of two possible issues here:
- The UnicodeEncodeError occurs in your __unicode__ method.
- Your __unicode__ method returns a byte string instead of a unicode object and that byte string contains non-ASCII characters.
Do you have a __unicode__ method in your class?
I tried this on the Python console according to the actual data from your comment:
>>> u'\u2660'.encode('utf-8')
'\xe2\x99\xa0'
>>> print '\xe2\x99\xa0'
♠
It seems to work. Could you please try to print the same on your console? Maybe your console encoding is the problem.
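To see how the target encoding matters, here is a small Python 3 sketch of the same character. It encodes cleanly to UTF-8 but raises UnicodeEncodeError for an ASCII-only target, which is roughly what an ASCII console encoding would trigger:

```python
spade = "\u2660"

# UTF-8 can represent the character; this gives the same three bytes as above.
utf8_bytes = spade.encode("utf-8")

# ASCII cannot represent it, so encoding fails.
try:
    spade.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```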
Python print isn't using __repr__, __unicode__ or __str__ for unicode subclass?
The problem is that print doesn't respect __str__ on unicode subclasses.
From PyFile_WriteObject, used by print:
int
PyFile_WriteObject(PyObject *v, PyObject *f, int flags)
{
...
    if ((flags & Py_PRINT_RAW) &&
        PyUnicode_Check(v) && enc != Py_None) {
        char *cenc = PyString_AS_STRING(enc);
        char *errors = fobj->f_errors == Py_None ?
          "strict" : PyString_AS_STRING(fobj->f_errors);
        value = PyUnicode_AsEncodedString(v, cenc, errors);
        if (value == NULL)
            return -1;
PyUnicode_Check(v) returns true if v's type is unicode or a subclass. This code therefore writes unicode objects directly, without consulting __str__.
Note that subclassing str and overriding __str__ works as expected:
>>> class mystr(str):
...     def __str__(self): return "str"
...     def __repr__(self): return "repr"
...
>>> print mystr()
str
as does calling str or unicode explicitly:
>>> class myuni(unicode):
...     def __str__(self): return "str"
...     def __repr__(self): return "repr"
...     def __unicode__(self): return "unicode"
...
>>> print myuni()
>>> str(myuni())
'str'
>>> unicode(myuni())
u'unicode'
I believe this could be construed as a bug in Python as currently implemented.
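For comparison, here is a sketch of the same experiment on Python 3, where the unicode type is gone and str is the text type. As far as I can tell, print() there goes through str() and so does consult __str__ on a str subclass. The output is captured in an io.StringIO buffer so it can be inspected:

```python
import io


class mystr(str):
    def __str__(self):
        return "str"

    def __repr__(self):
        return "repr"


buffer = io.StringIO()
print(mystr(), file=buffer)   # Python 3 print() function, not the 2.x statement
printed = buffer.getvalue()
explicit = str(mystr())
```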
The __str__ method returning a unicode string works in one environment but fails in another
See this related question: Python __str__ versus __unicode__
Basically, you should probably be implementing the special method __unicode__ rather than __str__, and add a stub __str__ that calls __unicode__:
def __str__(self):
    return unicode(self).encode('utf-8')
What is the difference between __str__ and __repr__?
Alex summarized well but, surprisingly, was too succinct.
First, let me reiterate the main points in Alex’s post:
- The default implementation is useless (it’s hard to think of one which wouldn’t be, but yeah)
- __repr__ goal is to be unambiguous
- __str__ goal is to be readable
- Container’s __str__ uses contained objects’ __repr__
Default implementation is useless
This is mostly a surprise because Python’s defaults tend to be fairly useful. However, in this case, having a default for __repr__ which would act like:
return "%s(%r)" % (self.__class__, self.__dict__)
would have been too dangerous (for example, too easy to get into infinite recursion if objects reference each other). So Python cops out. Note that there is one default which is true: if __repr__ is defined, and __str__ is not, the object will behave as though __str__=__repr__.
This means, in simple terms: almost every object you implement should have a functional __repr__ that’s usable for understanding the object. Implementing __str__ is optional: do that if you need a “pretty print” functionality (for example, used by a report generator).
The goal of __repr__ is to be unambiguous
Let me come right out and say it — I do not believe in debuggers. I don’t really know how to use any debugger, and have never used one seriously. Furthermore, I believe that the big fault in debuggers is their basic nature — most failures I debug happened a long long time ago, in a galaxy far far away. This means that I do believe, with religious fervor, in logging. Logging is the lifeblood of any decent fire-and-forget server system. Python makes it easy to log: with maybe some project specific wrappers, all you need is a
log(INFO, "I am in the weird function and a is", a, "and b is", b, "but I got a null C — using default", default_c)
But you have to do the last step: make sure every object you implement has a useful repr, so code like that can just work. This is why the “eval” thing comes up: if you have enough information so eval(repr(c))==c, that means you know everything there is to know about c. If that’s easy enough, at least in a fuzzy way, do it. If not, make sure you have enough information about c anyway. I usually use an eval-like format: "MyClass(this=%r,that=%r)" % (self.this,self.that). It does not mean that you can actually construct MyClass, or that those are the right constructor arguments, but it is a useful form to express “this is everything you need to know about this instance”.
Note: I used %r above, not %s. You always want to use repr() [or %r formatting character, equivalently] inside __repr__ implementation, or you’re defeating the goal of repr. You want to be able to differentiate MyClass(3) and MyClass("3").
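A minimal sketch of that eval-like format follows. The constructor signature and the __eq__ method are assumptions added here so the round trip can actually be checked; the format string is the one quoted above:

```python
class MyClass:
    def __init__(self, this, that):
        self.this = this
        self.that = that

    def __repr__(self):
        # %r calls repr() on each field, so 3 and "3" stay distinguishable.
        return "MyClass(this=%r,that=%r)" % (self.this, self.that)

    def __eq__(self, other):
        # Assumed for this example only, so eval(repr(c)) == c is testable.
        return (
            isinstance(other, MyClass)
            and (self.this, self.that) == (other.this, other.that)
        )


c = MyClass(3, "3")
c_repr = repr(c)
round_trip = eval(c_repr, {"MyClass": MyClass})
```

Here the repr happens to be valid constructor syntax, so eval(repr(c)) == c holds. With %s instead of %r, both fields would render as 3 and the distinction would be lost.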
The goal of __str__ is to be readable
Specifically, it is not intended to be unambiguous: notice that str(3)==str("3"). Likewise, if you implement an IP abstraction, having the str of it look like 192.168.1.1 is just fine. When implementing a date/time abstraction, the str can be "2010/4/12 15:35:22", etc. The goal is to represent it in a way that a user, not a programmer, would want to read it. Chop off useless digits, pretend to be some other class; as long as it supports readability, it is an improvement.
Container’s __str__ uses contained objects’ __repr__
This seems surprising, doesn’t it? It is a little, but how readable would it be if it used their __str__?
[moshe is, 3, hello
world, this is a list, oh I don't know, containing just 4 elements]
Not very. Specifically, the strings in a container would find it way too easy to disturb its string representation. In the face of ambiguity, remember, Python resists the temptation to guess. If you want the above behavior when you’re printing a list, just
print("[" + ", ".join(str(x) for x in l) + "]")
(you can probably also figure out what to do about dictionaries).
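The difference is easy to check with a short list of strings (the list contents here are just sample data):

```python
items = ["moshe is", "3", "hello\nworld"]

# list.__str__ formats each element with repr(), so quotes and the
# escaped newline stay visible and element boundaries are unambiguous.
container_text = str(items)

# Joining the str() of each element by hand gives the "readable" variant,
# where the embedded newline breaks the line.
joined_text = "[" + ", ".join(str(x) for x in items) + "]"
```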
Summary
Implement __repr__ for any class you implement. This should be second nature. Implement __str__ if you think it would be useful to have a string version which errs on the side of readability.
Is __repr__ supposed to return bytes or unicode?
The type is str (for both python2.x and python3.x):
>>> type(repr(object()))
<class 'str'>
This has to be the case because __str__ defaults to calling __repr__ if the former is not present, but __str__ has to return a str.
For those not aware, in python3.x, str is the type that represents unicode. In python2.x, str is the type that represents bytes.
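On Python 3 this is enforced at runtime. A small sketch, using a deliberately wrong class named BadRepr for the illustration:

```python
class BadRepr:
    def __repr__(self):
        # Wrong on purpose: __repr__ must return str, not bytes.
        return b"bytes are not allowed here"


repr_type = type(repr(object()))

try:
    repr(BadRepr())
    rejected = False
except TypeError:
    # Python 3 refuses a non-str result from __repr__.
    rejected = True
```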