What Are Data Classes and How Are They Different from Common Classes

What are data classes and how are they different from common classes?

Data classes are just regular classes that are geared towards storing state, rather than containing a lot of logic. Every time you create a class that mostly consists of attributes, you make a data class.

The dataclasses module makes it easier to create data classes. It takes care of a lot of boilerplate for you.

This is especially useful when your data class must be hashable, because that requires a __hash__ method as well as an __eq__ method. If you also add a custom __repr__ method for ease of debugging, the class can become quite verbose:

class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def __init__(
            self, 
            name: str, 
            unit_price: float,
            quantity_on_hand: int = 0
        ) -> None:
        self.name = name
        self.unit_price = unit_price
        self.quantity_on_hand = quantity_on_hand

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
    
    def __repr__(self) -> str:
        return (
            'InventoryItem('
            f'name={self.name!r}, unit_price={self.unit_price!r}, '
            f'quantity_on_hand={self.quantity_on_hand!r})'
        )

    def __hash__(self) -> int:
        return hash((self.name, self.unit_price, self.quantity_on_hand))

    def __eq__(self, other) -> bool:
        if not isinstance(other, InventoryItem):
            return NotImplemented
        return (
            (self.name, self.unit_price, self.quantity_on_hand) == 
            (other.name, other.unit_price, other.quantity_on_hand))

With dataclasses you can reduce it to:

from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

The same class decorator can also generate comparison methods (__lt__, __gt__, etc.) and handle immutability.
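As a minimal sketch of the comparison methods, assuming a hypothetical Version class that is not part of the original example: passing order=True asks the decorator to generate __lt__, __le__, __gt__ and __ge__, which compare instances field by field in declaration order.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:
    # Hypothetical example class. With order=True the fields are compared
    # in declaration order, as if each instance were the tuple (major, minor).
    major: int
    minor: int = 0


releases = [Version(3, 7), Version(2, 7), Version(3, 6)]

# sorted() relies on the generated __lt__ method.
oldest_first = sorted(releases)
print(oldest_first)
```

Without order=True the same sorted() call would raise a TypeError, because only __eq__ is generated by default.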

namedtuple classes are also data classes, but are immutable by default (as well as being sequences). dataclasses are much more flexible in this regard, and can easily be structured such that they can fill the same role as a namedtuple class.

The PEP was inspired by the attrs project, which can do even more (including slots, validators, converters, metadata, etc.).

If you want to see some examples, I recently used dataclasses for several of my Advent of Code solutions; see the solutions for day 7, day 8, day 11 and day 20.

If you want to use the dataclasses module in Python versions < 3.7, you can install the backported module (requires 3.6) or use the attrs project mentioned above.

What is the difference between a normal class and a data class in Kotlin?

For a data class, the compiler automatically derives the following members from all
properties declared in the primary constructor:

equals()/hashCode() pair,

toString() of the form "User(name=John, age=42)",

componentN() functions corresponding to the properties in their order
of declaration,

copy() function (see below).

See https://kotlinlang.org/docs/reference/data-classes.html

Dataclass - why attributes are treated differently from normal classes?

In a nutshell, the @dataclass decorator transforms the definition of the class by extracting variables from the type annotations. The best way to understand what's going on when you can't find the documentation is by looking at the source code.

We can first go to the definition of dataclass and see that it returns a class processed by _process_class(). Inside the function, you can find that it gives a new initializer to the class being decorated, which is basically what you have guessed.

As @juanpa.arrivillaga has pointed out, your Normal.i differs from Data.i because @dataclass makes Data.i an instance attribute, while your Normal.i is a class attribute. This is also why setting Data.s has no effect on your object_4.s.

Lastly, this behavior is not covered in much detail in the docs themselves but in the linked PEP 557, which states the exact effects of adding @dataclass.
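The question's code is not reproduced here, so the following is only a sketch under assumptions: the class names Normal and Data and the attribute names i and s come from the answer above, while the default values are made up. It shows that the generated __init__ stores the values on each instance, whereas the plain class leaves them on the class.

```python
from dataclasses import dataclass


class Normal:
    # Annotated class attributes; no __init__ is generated.
    i: int = 0
    s: str = "a"


@dataclass
class Data:
    # Same body, but @dataclass generates __init__(self, i=0, s="a"),
    # which assigns self.i and self.s on every new instance.
    i: int = 0
    s: str = "a"


normal_obj = Normal()
data_obj = Data()

# The Normal instance has no attributes of its own; lookups fall through
# to the class. The Data instance carries its own copies.
print(vars(normal_obj))  # {}
print(vars(data_obj))    # {'i': 0, 's': 'a'}

# Rebinding the class attributes changes what normal_obj sees,
# but not what data_obj sees.
Normal.s = "changed"
Data.s = "changed"
print(normal_obj.s)  # changed
print(data_obj.s)    # a
```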

Data Classes vs typing.NamedTuple primary use cases

It depends on your needs. Each of them has its own benefits.

Here is a good explanation of Dataclasses from PyCon 2018: Raymond Hettinger - Dataclasses: The code generator to end all code generators

In Dataclass the whole implementation is written in Python, whereas in NamedTuple all of these behaviors come for free because NamedTuple inherits from tuple. And because tuple is implemented in C, the standard methods (hashing, comparison, etc.) are faster in NamedTuple.

Note also that Dataclass is based on dict whereas NamedTuple is based on tuple, so each comes with the advantages and disadvantages of its underlying structure. For example, a NamedTuple uses less space, but attribute access is faster with a Dataclass.
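The session that follows does not include the class definitions. A minimal sketch of what they might look like, assuming two int fields named as in the session (the types are a guess), together with a quick look at where each instance keeps its data:

```python
import sys
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class PageDimensionsDC:
    # Assumed definition: field names taken from the session, types guessed.
    width: int
    height: int


class PageDimensionsNT(NamedTuple):
    # Assumed definition, mirroring the dataclass above.
    width: int
    height: int


dc = PageDimensionsDC(width=10, height=10)
nt = PageDimensionsNT(width=10, height=10)

# The dataclass instance keeps its fields in a per-instance dict.
print(vars(dc))  # {'width': 10, 'height': 10}

# The NamedTuple instance is a plain tuple with no per-instance dict.
print(isinstance(nt, tuple), hasattr(nt, "__dict__"))

# Sizes depend on the Python version and platform, so none are asserted here.
print(sys.getsizeof(dc) + sys.getsizeof(vars(dc)), sys.getsizeof(nt))
```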

See my experiment:

In [33]: a = PageDimensionsDC(width=10, height=10)

In [34]: sys.getsizeof(a) + sys.getsizeof(vars(a))
Out[34]: 168

In [35]: %timeit a.width
43.2 ns ± 1.05 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [36]: a = PageDimensionsNT(width=10, height=10)

In [37]: sys.getsizeof(a)
Out[37]: 64

In [38]: %timeit a.width
63.6 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

However, as the number of attributes of a NamedTuple increases, access time stays equally small, because a property with the attribute's name is created for each attribute. In our case, part of the namespace of the new class will look like this:

from operator import itemgetter

class_namespace = {
    # ...
    'width': property(itemgetter(0), doc="Alias for field number 0"),
    'height': property(itemgetter(1), doc="Alias for field number 1"),
}

In which cases namedtuple is still a better choice?

When your data structure needs to be (or can be) immutable, hashable, iterable, unpackable and comparable, you can use NamedTuple. If you need something more complicated, for example inheritance for your data structure, then use Dataclass.
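As a short illustration of those tuple-derived properties, here is a sketch using a hypothetical Point type (not taken from the question):

```python
from typing import NamedTuple


class Point(NamedTuple):
    # Hypothetical example type.
    x: int
    y: int


p = Point(1, 2)

x, y = p                      # unpackable
coords = list(p)              # iterable
lookup = {p: "first point"}   # hashable, so usable as a dict key
is_smaller = p < Point(1, 3)  # comparable, element by element like a tuple

try:
    p.x = 10                  # immutable: assignment raises AttributeError
except AttributeError as exc:
    print("cannot assign:", exc)
```

All of this comes from tuple itself; a dataclass needs frozen=True and order=True to get the immutability, hashing and ordering, and does not support unpacking or iteration out of the box.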

Is DataClass a good fit to replace a dictionary?

Dataclasses are more of a replacement for NamedTuples than for dictionaries.

NamedTuples are designed to be immutable. Dataclasses can offer the same by setting frozen=True in the decorator, while providing much more flexibility overall.
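A minimal sketch of frozen=True, assuming a hypothetical Coordinates class with placeholder values:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Coordinates:
    # Hypothetical example class.
    latitude: float
    longitude: float


home = Coordinates(1.5, 2.5)

try:
    home.latitude = 0.0  # assignment is blocked on a frozen instance
except FrozenInstanceError as exc:
    print("read-only:", exc)

# With frozen=True and the default eq=True, a __hash__ is generated too,
# so equal instances collapse to one element in a set.
visited = {home, Coordinates(1.5, 2.5)}
print(len(visited))  # 1
```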

If you are into type hints in your Python code, they really come into play.

The other advantage is like you said - complex nested dictionaries. You can define Dataclasses as your types, and represent them within Dataclasses in a clear and concise way.

Consider the following:

from dataclasses import dataclass
from typing import List

@dataclass
class City:
    code: str
    population: int

@dataclass
class Country:
    code: str
    currency: str
    cities: List[City]

@dataclass
class Locations:
    countries: List[Country]

You can then write functions where you annotate the function parameter with the dataclass name as a type hint and access its attributes (similar to passing in a dictionary and accessing its keys), or alternatively construct the dataclass and return it, i.e.

def get_locations(...) -> Locations:
    ...
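For the other direction, taking the dataclass as a parameter, a sketch might look like this. The total_population helper and all the sample values are hypothetical placeholders; the three classes repeat the definitions from the answer so the block runs on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class City:
    code: str
    population: int


@dataclass
class Country:
    code: str
    currency: str
    cities: List[City]


@dataclass
class Locations:
    countries: List[Country]


def total_population(locations: Locations) -> int:
    # Hypothetical helper: attribute access replaces nested dict lookups
    # such as data["countries"][0]["cities"][0]["population"].
    return sum(
        city.population
        for country in locations.countries
        for city in country.cities
    )


# Placeholder data, not real codes or figures.
locations = Locations(
    countries=[
        Country(code="XX", currency="AAA", cities=[City("C1", 100), City("C2", 200)]),
        Country(code="YY", currency="BBB", cities=[City("C3", 300)]),
    ]
)
print(total_population(locations))  # 600
```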

It makes the code very readable as opposed to a large, complicated dictionary.

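You can also set defaults, as you can with dictionaries. (Older namedtuples did not allow this; collections.namedtuple gained a defaults argument in Python 3.7, and typing.NamedTuple accepts default values in the class body.)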

@dataclass
class Stock:
    quantity: int = 0

You can also control in the decorator whether the dataclass supports ordering comparisons (order=True), just as you control whether it is frozen, whereas normal dictionaries do not support ordering comparisons. See the dataclasses documentation for more information.

You get all the benefits of object comparison if you want them, i.e. __eq__() etc. They also come with __init__ and __repr__ by default, so you don't have to type out those methods manually as with normal classes.

There is also substantially more control over fields, allowing metadata etc.
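A small sketch of that per-field control, assuming a hypothetical Measurement class. The "unit" metadata key is our own convention here; the dataclasses module stores metadata but does not interpret it.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Measurement:
    # Hypothetical class. `length` carries metadata; `label` has a default
    # and is left out of the generated __repr__.
    length: float = field(metadata={"unit": "m"})
    label: str = field(default="", repr=False)


m = Measurement(2.5, label="shelf")
print(m)  # Measurement(length=2.5)

# fields() exposes each Field object, including its read-only metadata mapping.
units = {f.name: f.metadata.get("unit") for f in fields(m)}
print(units)  # {'length': 'm', 'label': None}
```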

And lastly, you can convert it into a dictionary at the end by importing asdict: from dataclasses import dataclass, asdict
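A minimal sketch of the conversion with asdict, using trimmed-down versions of the City and Country classes and placeholder values:

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class City:
    code: str
    population: int


@dataclass
class Country:
    code: str
    cities: List[City]


# Placeholder data, not real codes or figures.
country = Country(code="XX", cities=[City(code="C1", population=100)])

# asdict recurses into nested dataclasses, lists, tuples and dicts.
as_plain_dict = asdict(country)
print(as_plain_dict)
# {'code': 'XX', 'cities': [{'code': 'C1', 'population': 100}]}
```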

Using Python class as a data container

If you're really never defining any class methods, a dict or a namedtuple makes far more sense, in my opinion. Simple+builtin is good! To each his own, though.

Classes vs. Functions

Create a function. Functions do specific things, classes are specific things.

Classes often have methods, which are functions that are associated with a particular class, and do things associated with the thing that the class is - but if all you want is to do something, a function is all you need.

Essentially, a class is a way of grouping functions (as methods) and data (as properties) into a logical unit revolving around a certain kind of thing. If you don't need that grouping, there's no need to make a class.

Mixing instance attributes and fields in Python data classes

You can manage fields by listing them in the dataclass using field() and passing init=False and compare=False to the field call, although I am not sure this is what you are looking for.
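A sketch of what that could look like, assuming a hypothetical Job class where log is per-instance state rather than a constructor argument (the original question's class is not shown, so this may not match it):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    # Hypothetical class: `name` is a regular field, while `log` is not an
    # __init__ parameter and is ignored by the generated __eq__.
    name: str
    log: List[str] = field(default_factory=list, init=False, compare=False)


a = Job("backup")
b = Job("backup")
a.log.append("started")

print(a == b)  # True: `log` is excluded from comparison
print(a)       # Job(name='backup', log=['started'])
```

Passing repr=False as well would also hide the attribute from the generated __repr__.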
