What Are "Named Tuples" in Python

What are named tuples in Python?

Named tuples are basically easy-to-create, lightweight object types. Named tuple instances can be referenced using object-like variable dereferencing or the standard tuple syntax. They can be used similarly to struct or other common record types, except that they are immutable. They were added in Python 2.6 and Python 3.0, although there is a recipe for implementation in Python 2.4.

For example, it is common to represent a point as a tuple (x, y). This leads to code like the following:

pt1 = (1.0, 5.0)
pt2 = (2.5, 1.5)

from math import sqrt
line_length = sqrt((pt1[0]-pt2[0])**2 + (pt1[1]-pt2[1])**2)

Using a named tuple it becomes more readable:

from collections import namedtuple
Point = namedtuple('Point', 'x y')
pt1 = Point(1.0, 5.0)
pt2 = Point(2.5, 1.5)

from math import sqrt
line_length = sqrt((pt1.x-pt2.x)**2 + (pt1.y-pt2.y)**2)

However, named tuples are still backwards compatible with normal tuples, so the following will still work:

Point = namedtuple('Point', 'x y')
pt1 = Point(1.0, 5.0)
pt2 = Point(2.5, 1.5)

from math import sqrt
# use index referencing
line_length = sqrt((pt1[0]-pt2[0])**2 + (pt1[1]-pt2[1])**2)
# use tuple unpacking
x1, y1 = pt1

Thus, you should use named tuples instead of tuples anywhere you think object notation will make your code more pythonic and more easily readable. I personally have started using them to represent very simple value types, particularly when passing them as parameters to functions. It makes the functions more readable, without seeing the context of the tuple packing.
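As a small illustration of the function-parameter case, here is a minimal sketch. The midpoint helper is hypothetical and only serves to show that the function body reads clearly without knowing how the points were packed:

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')

def midpoint(a, b):
    """Return the point halfway between a and b (hypothetical helper)."""
    return Point((a.x + b.x) / 2, (a.y + b.y) / 2)

mid = midpoint(Point(1.0, 5.0), Point(2.5, 1.5))
print(mid)    # Point(x=1.75, y=3.25)
print(mid.x)  # 1.75
```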

Furthermore, you can also use them to replace ordinary immutable classes that have no methods, only fields. You can even use your named tuple types as base classes:

class Point(namedtuple('Point', 'x y')):
    [...]
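To make the elided class body concrete, here is one possible sketch. The distance_to method is an illustrative assumption, not something from the original answer:

```python
from collections import namedtuple
from math import hypot

class Point(namedtuple('Point', 'x y')):
    # Empty __slots__ avoids a per-instance __dict__, so instances
    # stay as compact as plain tuples.
    __slots__ = ()

    def distance_to(self, other):
        """Euclidean distance to another Point (illustrative method)."""
        return hypot(self.x - other.x, self.y - other.y)

print(Point(0.0, 0.0).distance_to(Point(3.0, 4.0)))  # 5.0
```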

However, as with tuples, attributes in named tuples are immutable:

>>> Point = namedtuple('Point', 'x y')
>>> pt1 = Point(1.0, 5.0)
>>> pt1.x = 2.0
AttributeError: can't set attribute
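If a modified copy is enough, the _replace method of named tuples builds a new instance with some fields changed and leaves the original untouched. A short sketch:

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
pt1 = Point(1.0, 5.0)

pt2 = pt1._replace(x=2.0)  # builds a new Point; pt1 is not modified
print(pt1)  # Point(x=1.0, y=5.0)
print(pt2)  # Point(x=2.0, y=5.0)
```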

If you want to be able to change the values, you need another type. There is a handy recipe for mutable recordtypes, which allow you to set new values to attributes.

>>> from rcdtype import *
>>> Point = recordtype('Point', 'x y')
>>> pt1 = Point(1.0, 5.0)
>>> pt1.x = 2.0
>>> print(pt1[0])
2.0

I am not aware of any form of "named list" that lets you add new fields, however. You may just want to use a dictionary in this situation. Named tuples can be converted to dictionaries using pt1._asdict() which returns {'x': 1.0, 'y': 5.0} and can be operated upon with all the usual dictionary functions.
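The dictionary round trip can be sketched as follows. Point3D is a hypothetical second type, introduced only to show one way of getting an extra field by defining a new named tuple from the extended dictionary:

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
pt1 = Point(1.0, 5.0)

d = pt1._asdict()   # a dict (an OrderedDict on older Python versions)
d['z'] = 0.0        # dictionaries accept new keys freely
print(dict(d))      # {'x': 1.0, 'y': 5.0, 'z': 0.0}

# Hypothetical wider type that reuses the existing field names
Point3D = namedtuple('Point3D', Point._fields + ('z',))
pt3 = Point3D(**d)
print(pt3)          # Point3D(x=1.0, y=5.0, z=0.0)
```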

As already noted, check the documentation, from which these examples were constructed, for more information.

Dictionary of (named) tuples in Python and speed/RAM performance

Cython's cdef classes might be what you want: they use less memory than pure Python classes, even if this comes at the cost of more overhead when accessing members (because fields are stored as C values and not as Python objects).

For example:

%%cython
cdef class CTuple:
    cdef public unsigned long long int id
    cdef public str name
    cdef public bint isvalid

    def __init__(self, id, name, isvalid):
        self.id = id
        self.name = name
        self.isvalid = isvalid

which can be used as desired:

ob = CTuple(1, "mmm", 3)
ob.id, ob.name, ob.isvalid  # evaluates to (1, 'mmm', True); the bint field coerces 3 to True

Timings/memory consumption:

First, the baseline on my machine:

0.258 s  252.4 MB  # tuples
0.343 s  417.5 MB  # dict
1.181 s  264.0 MB  # namedtuple collections

with CTuple we get:

0.306 s  191.0 MB

which is almost as fast as plain tuples and needs considerably less memory.
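The benchmark script behind these numbers is not shown here. As a rough sketch of how comparable figures for the pure-Python containers might be collected, the following builds a dictionary of records and reports elapsed time and peak traced memory. The measure helper, the record count, and the sample values are all assumptions, and the results will not match the numbers above (tracemalloc itself adds overhead):

```python
import time
import tracemalloc
from collections import namedtuple

# Field names borrowed from CTuple above; Record itself is hypothetical.
Record = namedtuple('Record', 'id name isvalid')

def measure(make_record, n=50_000):
    """Return (seconds, peak_bytes) for building a dict of n records."""
    tracemalloc.start()
    start = time.perf_counter()
    data = {i: make_record(i, "mmm", True) for i in range(n)}
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak

factories = {
    'tuple': lambda i, name, ok: (i, name, ok),
    'dict': lambda i, name, ok: {'id': i, 'name': name, 'isvalid': ok},
    'namedtuple': Record,
}

for label, factory in factories.items():
    seconds, peak = measure(factory)
    print(f"{label:10s} {seconds:.3f} s  {peak / 1e6:.1f} MB")
```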

If the C type of the members isn't clear at compile time, one could use simple Python objects:

%%cython
cdef class PTuple:
    cdef public object id
    cdef public object name
    cdef public object isvalid

    def __init__(self, id, name, isvalid):
        self.id = id
        self.name = name
        self.isvalid = isvalid

The timings are a little bit surprising:

0.648 s  249.8 MB

I didn't expect it to be so much slower than the CTuple version, but at least it is almost twice as fast as named tuples.


One disadvantage of this approach is that it needs compilation. Cython, however, offers cython.inline, which can be used to compile Cython code created on the fly.

I've released cynamedtuple, which can be installed via pip install cynamedtuple and is based on the prototype below:

import cython

# for generation of cython code:
tab = "    "
def create_members_definition(name_to_ctype):
    members = []
    for my_name, my_ctype in name_to_ctype.items():
        members.append(tab+"cdef public "+my_ctype+" "+my_name)
    return members

def create_signature(names):
    return tab + "def __init__(self,"+", ".join(names)+"):"

def create_initialization(names):
    inits = [tab+tab+"self."+x+" = "+x for x in names]
    return inits

def create_cdef_class_code(classname, names):
    code_lines = ["cdef class " + classname + ":"]
    code_lines.extend(create_members_definition(names))
    code_lines.append(create_signature(names.keys()))
    code_lines.extend(create_initialization(names.keys()))
    return "\n".join(code_lines)+"\n"

# utilize cython.inline to generate and load pyx-module:
def create_cnamedtuple_class(classname, names):
    code = create_cdef_class_code(classname, names)
    code = code + "GenericClass = " + classname +"\n"
    ret = cython.inline(code)
    return ret["GenericClass"]

Which can be used as follows, to dynamically define CTuple from above:

CTuple = create_cnamedtuple_class("CTuple",
                                  {"id": "unsigned long long int",
                                   "name": "str",
                                   "isvalid": "bint"})

ob = CTuple(1, "mmm", 3)
...

Another alternative could be to use JIT compilation and Numba's jitted classes, which offer this possibility. They, however, seem to be much slower:

from numba import types
# In Numba releases before 0.49 this was: from numba import jitclass
from numba.experimental import jitclass

spec = [
    ('id', types.uint64),
    ('name', types.string),
    ('isvalid', types.uint8),
]

@jitclass(spec)
class NBTuple(object):
    def __init__(self, id, name, isvalid):
        self.id = id
        self.name = name
        self.isvalid = isvalid

and the results are:

20.622 s  394.0 MB

So Numba's jitted classes are not (yet?) a good choice.

NamedTuples, Hashable and Python

NamedTuple is based on the tuple class. See collections.namedtuple().

The hash of a tuple is the combined hash of all its elements. See tupleobject.c.

Since a set is unhashable, it is not possible to hash a tuple or NamedTuple containing a set.

And since the hashing is implemented in C, you don't see it in the traceback.
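A short sketch of this behaviour, using a hypothetical Tagged record type: a frozenset field keeps the named tuple hashable, while a set field makes hash() fail.

```python
from collections import namedtuple

Tagged = namedtuple('Tagged', 'name tags')  # hypothetical record type

ok = Tagged('a', frozenset({'x', 'y'}))
# The hash is computed from the elements, exactly as for a plain tuple.
print(hash(ok) == hash(('a', frozenset({'x', 'y'}))))  # True

bad = Tagged('a', {'x', 'y'})
try:
    hash(bad)
except TypeError as exc:
    print(exc)  # the message names the unhashable 'set' type
```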

Can namedtuples be used in set()?

Yes, they are hashable, and can be used in sets, like tuples.

The gotcha is that a tuple of mutable objects can change underneath you.

Tuples composed of immutable objects are safe in this regard.

Not sure it is a gotcha, but it is worth noting @user2357112supportsMonica's remark in the comments:

The other gotcha is that a namedtuple is still a tuple, and its
__hash__ and __eq__ are ordinary tuple hash and equality. A namedtuple will compare equal to ordinary tuples or instances of unrelated
namedtuple classes if the contents match.
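The equality behaviour described in that remark can be seen in a few lines. Point and Size are hypothetical, unrelated named tuple types:

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
Size = namedtuple('Size', 'width height')

print(Point(1, 2) == (1, 2))       # True
print(Point(1, 2) == Size(1, 2))   # True

# Equal values with equal hashes collapse into a single set element.
print(len({Point(1, 2), Size(1, 2), (1, 2)}))  # 1
```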

When and why should I use a namedtuple instead of a dictionary?

In dicts, only the keys have to be hashable, not the values. namedtuples don't have keys, so hashability isn't an issue.

However, they have a more stringent restriction -- their key-equivalents, "field names", have to be strings.
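To illustrate the restriction: field names that are keywords or not valid identifiers are rejected, unless rename=True is passed, in which case they are replaced by positional names. The Row type here is a made-up example:

```python
from collections import namedtuple

try:
    namedtuple('Row', ['id', 'class'])  # 'class' is a Python keyword
except ValueError as exc:
    print(exc)

# With rename=True, invalid names are replaced by _<position>.
Row = namedtuple('Row', ['id', 'class', 'first name'], rename=True)
print(Row._fields)  # ('id', '_1', '_2')
```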

Basically, if you were going to create a bunch of instances of a class like:

class Container:
    def __init__(self, name, date, foo, bar):
        self.name = name
        self.date = date
        self.foo = foo
        self.bar = bar

mycontainer = Container(name, date, foo, bar)

and not change the attributes after you set them in __init__, you could instead use

Container = namedtuple('Container', ['name', 'date', 'foo', 'bar'])

mycontainer = Container(name, date, foo, bar)

as a replacement.
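One practical difference between the two versions, sketched below with made-up values: the namedtuple gets value-based equality and a readable repr for free, while the plain class (renamed PlainContainer here to keep both side by side) compares by identity unless you write __eq__ yourself.

```python
from collections import namedtuple

class PlainContainer:
    def __init__(self, name, date, foo, bar):
        self.name = name
        self.date = date
        self.foo = foo
        self.bar = bar

Container = namedtuple('Container', ['name', 'date', 'foo', 'bar'])

a = Container('n', '2020-12-24', 1, 2)
b = Container('n', '2020-12-24', 1, 2)
print(a == b)  # True
print(a)       # Container(name='n', date='2020-12-24', foo=1, bar=2)

p = PlainContainer('n', '2020-12-24', 1, 2)
q = PlainContainer('n', '2020-12-24', 1, 2)
print(p == q)  # False: default equality compares identity
```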

Of course, you could create a bunch of dicts where you used the same keys in each one, but assuming you will have only valid Python identifiers as keys and don't need mutability,

mynamedtuple.fieldname

is prettier than

mydict['fieldname']

and

mynamedtuple = MyNamedTuple(firstvalue, secondvalue)

is prettier than

mydict = {'fieldname': firstvalue, 'secondfield': secondvalue}

Finally, namedtuples are ordered, so you get the items in the order you defined the fields. Regular dicts only guarantee insertion order from Python 3.7 onward.

How are named tuples implemented internally in Python?

Actually, it's very easy to find out how a given namedtuple is implemented: if you pass the keyword argument verbose=True when creating it, its class definition is printed (this works up to Python 3.6; the verbose parameter was removed in Python 3.7):

>>> Point = namedtuple('Point', "x y", verbose=True)
from builtins import property as _property, tuple as _tuple
from operator import itemgetter as _itemgetter
from collections import OrderedDict

class Point(tuple):
    'Point(x, y)'

    __slots__ = ()

    _fields = ('x', 'y')

    def __new__(_cls, x, y):
        'Create new instance of Point(x, y)'
        return _tuple.__new__(_cls, (x, y))

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new Point object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != 2:
            raise TypeError('Expected 2 arguments, got %d' % len(result))
        return result

    def _replace(_self, **kwds):
        'Return a new Point object replacing specified fields with new values'
        result = _self._make(map(kwds.pop, ('x', 'y'), _self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % list(kwds))
        return result

    def __repr__(self):
        'Return a nicely formatted representation string'
        return self.__class__.__name__ + '(x=%r, y=%r)' % self

    @property
    def __dict__(self):
        'A new OrderedDict mapping field names to their values'
        return OrderedDict(zip(self._fields, self))

    def _asdict(self):
        '''Return a new OrderedDict which maps field names to their values.
           This method is obsolete.  Use vars(nt) or nt.__dict__ instead.
        '''
        return self.__dict__

    def __getnewargs__(self):
        'Return self as a plain tuple.  Used by copy and pickle.'
        return tuple(self)

    def __getstate__(self):
        'Exclude the OrderedDict from pickling'
        return None

    x = _property(_itemgetter(0), doc='Alias for field number 0')

    y = _property(_itemgetter(1), doc='Alias for field number 1')

So, it's a subclass of tuple with some extra methods to give it the required behaviour, a _fields class-level constant containing the field names, and property methods for attribute access to the tuple's members.

As for the code behind actually building this class definition, that's deep magic.
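For intuition, a stripped-down, hand-written class along the lines of the generated definition above might look like the sketch below. It is a simplification for illustration, not the code the collections module actually produces, and it omits helpers such as _make and _replace:

```python
from operator import itemgetter

class Point(tuple):
    __slots__ = ()          # no per-instance __dict__
    _fields = ('x', 'y')

    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))

    def __repr__(self):
        return '%s(x=%r, y=%r)' % ((type(self).__name__,) + tuple(self))

    # Read-only properties that simply index into the underlying tuple
    x = property(itemgetter(0), doc='Alias for field number 0')
    y = property(itemgetter(1), doc='Alias for field number 1')

p = Point(1.0, 5.0)
print(p)           # Point(x=1.0, y=5.0)
print(p.x, p[0])   # 1.0 1.0
```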

list of namedtuples; how to calculate the sum of individual elements

If you don't care what value the non-numeric values will have, you could use zip to form a new tuple with the totals:

R = Trade(*((max,sum)[isinstance(v[0],(int,float))](v)
            for v in zip(*listofstuff)))

print(R)
Trade(Ticker='foo', Date='{2020:12:24}', QTY=150, Sell=550.0,
      Buy=600.0, Profit=-50.0)

Alternatively, you could place a None value in the non-total fields:

R = Trade(*(sum(v) if isinstance(v[0],(int,float)) else None
            for v in zip(*listofstuff)))

print(R)
Trade(Ticker=None, Date=None, QTY=150, Sell=550.0, Buy=600.0, Profit=-50.0)

If the aggregation type varies with each field (e.g. min for some fields, sum for others, etc), you can prepare a dictionary of aggregation functions and use it in the comprehension:

aggregate = {'QTY':sum, 'Sell':min, 'Buy':max, 'Profit':lambda v:sum(v)/len(v)}

R = Trade(*(aggregate.get(f,lambda _:None)(v)
            for f,v in zip(Trade._fields,zip(*listofstuff))))

print(R)
Trade(Ticker=None, Date=None, QTY=150, Sell=50.0, Buy=500.0, Profit=-25.0)
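The snippets above rely on Trade and listofstuff being defined elsewhere. A self-contained sketch of the per-field aggregation follows; the field names come from the printed output, but the two sample rows are hypothetical values chosen only for illustration:

```python
from collections import namedtuple

Trade = namedtuple('Trade', 'Ticker Date QTY Sell Buy Profit')

# Hypothetical sample data, not taken from the original question
listofstuff = [
    Trade('foo', '{2020:12:24}', 100, 50.0, 100.0, -50.0),
    Trade('foo', '{2020:12:24}', 50, 500.0, 500.0, 0.0),
]

aggregate = {'QTY': sum, 'Sell': min, 'Buy': max,
             'Profit': lambda v: sum(v) / len(v)}

columns = zip(*listofstuff)  # one tuple of values per field
R = Trade(*(aggregate.get(field, lambda _: None)(values)
            for field, values in zip(Trade._fields, columns)))

print(R)
# Trade(Ticker=None, Date=None, QTY=150, Sell=50.0, Buy=500.0, Profit=-25.0)
```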

Python namedtuples elementwise addition

This is a possible solution:

from collections import namedtuple

class Point(namedtuple("Point", "x y")):
    def __add__(self, other):
        return Point(**{field: getattr(self, field) + getattr(other, field)
                        for field in self._fields})
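A usage sketch follows. It uses a slightly different, zip-based variant of __add__ (an alternative spelling, not the answer's exact code) and shows why the override matters: on plain tuples, + concatenates instead of adding elementwise.

```python
from collections import namedtuple

class Point(namedtuple("Point", "x y")):
    __slots__ = ()

    def __add__(self, other):
        # Pair up the fields positionally and add them one by one.
        return type(self)(*(a + b for a, b in zip(self, other)))

print(Point(1, 2) + Point(3, 4))                      # Point(x=4, y=6)
print(sum([Point(1, 2), Point(3, 4)], Point(0, 0)))   # Point(x=4, y=6)

# Without the override, + on tuples concatenates:
print(tuple(Point(1, 2)) + tuple(Point(3, 4)))        # (1, 2, 3, 4)
```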

