Validating detailed types in Python dataclasses
Instead of checking for type equality, you should use isinstance. But you cannot use a parametrized generic type (typing.List[int]) to do so; you must use the "generic" version (typing.List). So you will be able to check for the container type, but not the contained types. Parametrized generic types define an __origin__ attribute that you can use for that.
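A quick demonstration of the point above, assuming Python 3.7+ (where __origin__ is the bare container class): isinstance() rejects a subscripted hint outright, but accepts its __origin__.

```python
import typing

value = [1, 2, 3]
hint = typing.List[int]

# isinstance() rejects parametrized generics outright...
try:
    isinstance(value, hint)
except TypeError as error:
    print("isinstance() raised:", error)

# ...but __origin__ gives back the bare container class, which
# checks only the container type, not the contained types
print(hint.__origin__)  # <class 'list'> on Python 3.7+
print(isinstance(value, hint.__origin__))
print(isinstance(["not", "ints"], hint.__origin__))  # also True: contents unchecked
```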
Contrary to Python 3.6, in Python 3.7 most type hints have a useful __origin__ attribute. Compare:
# Python 3.6
>>> import typing
>>> typing.List.__origin__
>>> typing.List[int].__origin__
typing.List
and
# Python 3.7
>>> import typing
>>> typing.List.__origin__
<class 'list'>
>>> typing.List[int].__origin__
<class 'list'>
Python 3.8 introduces even better support with the typing.get_origin() introspection function:
# Python 3.8
>>> import typing
>>> typing.get_origin(typing.List)
<class 'list'>
>>> typing.get_origin(typing.List[int])
<class 'list'>
Notable exceptions being typing.Any, typing.Union and typing.ClassVar… Well, anything that is a typing._SpecialForm does not define __origin__. Fortunately:
>>> isinstance(typing.Union, typing._SpecialForm)
True
>>> isinstance(typing.Union[int, str], typing._SpecialForm)
False
>>> typing.get_origin(typing.Union[int, str])
typing.Union
But parametrized types define an __args__ attribute that stores their parameters as a tuple; Python 3.8 introduces the typing.get_args() function to retrieve them:
# Python 3.7
>>> typing.Union[int, str].__args__
(<class 'int'>, <class 'str'>)
# Python 3.8
>>> typing.get_args(typing.Union[int, str])
(<class 'int'>, <class 'str'>)
So we can improve type checking a bit:
for field_name, field_def in self.__dataclass_fields__.items():
    if isinstance(field_def.type, typing._SpecialForm):
        # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
        continue
    try:
        actual_type = field_def.type.__origin__
    except AttributeError:
        # In case of non-typing types (such as <class 'int'>, for instance)
        actual_type = field_def.type
    # In Python 3.8 one would replace the try/except with
    # actual_type = typing.get_origin(field_def.type) or field_def.type
    if isinstance(actual_type, typing._SpecialForm):
        # case of typing.Union[…] or typing.ClassVar[…]
        actual_type = field_def.type.__args__
    actual_value = getattr(self, field_name)
    if not isinstance(actual_value, actual_type):
        print(f"\t{field_name}: '{type(actual_value)}' instead of '{field_def.type}'")
        ret = False
This is not perfect as it won't account for typing.ClassVar[typing.Union[int, str]] or typing.Optional[typing.List[int]], for instance, but it should get things started.
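A minimal demonstration of that limitation (assuming Python 3.8+ for typing.get_args()): the argument tuple of typing.Optional[typing.List[int]] still contains a parametrized generic, so a flat isinstance() check fails instead of validating.

```python
import typing

hint = typing.Optional[typing.List[int]]

# The argument tuple still contains a parametrized generic...
print(typing.get_args(hint))  # (typing.List[int], <class 'NoneType'>)

# ...so feeding it straight to isinstance() raises instead of validating
try:
    isinstance([1, 2], typing.get_args(hint))
except TypeError as error:
    print("isinstance() raised:", error)
```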
Next is the way to apply this check. Instead of using __post_init__, I would go the decorator route: this could be used on anything with type hints, not only dataclasses:
import inspect
import typing
from contextlib import suppress
from functools import wraps


def enforce_types(callable):
    spec = inspect.getfullargspec(callable)

    def check_types(*args, **kwargs):
        parameters = dict(zip(spec.args, args))
        parameters.update(kwargs)
        for name, value in parameters.items():
            with suppress(KeyError):  # Assume un-annotated parameters can be any type
                type_hint = spec.annotations[name]
                if isinstance(type_hint, typing._SpecialForm):
                    # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
                    continue
                try:
                    actual_type = type_hint.__origin__
                except AttributeError:
                    # In case of non-typing types (such as <class 'int'>, for instance)
                    actual_type = type_hint
                # In Python 3.8 one would replace the try/except with
                # actual_type = typing.get_origin(type_hint) or type_hint
                if isinstance(actual_type, typing._SpecialForm):
                    # case of typing.Union[…] or typing.ClassVar[…]
                    actual_type = type_hint.__args__
                if not isinstance(value, actual_type):
                    raise TypeError('Unexpected type for \'{}\' (expected {} but found {})'.format(name, type_hint, type(value)))

    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            check_types(*args, **kwargs)
            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)
Usage being:
@enforce_types
@dataclasses.dataclass
class Point:
    x: float
    y: float


@enforce_types
def foo(bar: typing.Union[int, str]):
    pass
Apart from the limited type-hint validation discussed in the previous section, this approach still has some drawbacks:
- type hints using strings (class Foo: def __init__(self: 'Foo'): pass) are not taken into account by inspect.getfullargspec: you may want to use typing.get_type_hints and inspect.signature instead;
- a default value which is not of the appropriate type is not validated:
  @enforce_types
  def foo(bar: int = None):
      pass
  foo() does not raise any TypeError. You may want to use inspect.Signature.bind in conjunction with inspect.BoundArguments.apply_defaults if you want to account for that (thus forcing you to define def foo(bar: typing.Optional[int] = None));
- a variable number of arguments can't be validated, as you would have to define something like def foo(*args: typing.Sequence, **kwargs: typing.Mapping);
- and, as said at the beginning, we can only validate containers and not contained objects.
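The first drawback is easy to reproduce. The snippet below shows inspect.getfullargspec leaving a string annotation unresolved, while typing.get_type_hints evaluates it to the actual class:

```python
import inspect
import typing


def foo(bar: 'int'):
    pass


# getfullargspec keeps the annotation as the raw string...
print(inspect.getfullargspec(foo).annotations)  # {'bar': 'int'}

# ...while get_type_hints resolves it to the actual class
print(typing.get_type_hints(foo))  # {'bar': <class 'int'>}
```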
Update
After this answer got some popularity and a library heavily inspired by it got released, the need to lift the shortcomings mentioned above became a reality. So I played a bit more with the typing module and will propose a few findings and a new approach here.
For starters, typing is doing a great job of finding when an argument is optional:
>>> def foo(a: int, b: str, c: typing.List[str] = None):
... pass
...
>>> typing.get_type_hints(foo)
{'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.Union[typing.List[str], NoneType]}
This is pretty neat and definitely an improvement over inspect.getfullargspec, so better use that instead, as it can also properly handle strings as type hints. But typing.get_type_hints will bail out for other kinds of default values:
>>> def foo(a: int, b: str, c: typing.List[str] = 3):
... pass
...
>>> typing.get_type_hints(foo)
{'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.List[str]}
So you may still need extra strict checking, even though such cases feel very fishy.
Next is the case of typing hints used as arguments of a typing._SpecialForm, such as typing.Optional[typing.List[str]] or typing.Final[typing.Union[typing.Sequence, typing.Mapping]]. Since the __args__ of these typing._SpecialForms is always a tuple, it is possible to recursively find the __origin__ of the hints contained in that tuple. Combined with the above checks, we will then need to filter out any typing._SpecialForm left.
Proposed improvements:
import inspect
import typing
from functools import wraps


def _find_type_origin(type_hint):
    if isinstance(type_hint, typing._SpecialForm):
        # case of typing.Any, typing.ClassVar, typing.Final, typing.Literal,
        # typing.NoReturn, typing.Optional, or typing.Union without parameters
        return

    actual_type = typing.get_origin(type_hint) or type_hint  # requires Python 3.8
    if isinstance(actual_type, typing._SpecialForm):
        # case of typing.Union[…] or typing.ClassVar[…] or …
        for origins in map(_find_type_origin, typing.get_args(type_hint)):
            yield from origins
    else:
        yield actual_type


def _check_types(parameters, hints):
    for name, value in parameters.items():
        type_hint = hints.get(name, typing.Any)
        actual_types = tuple(_find_type_origin(type_hint))
        if actual_types and not isinstance(value, actual_types):
            raise TypeError(
                f"Expected type '{type_hint}' for argument '{name}'"
                f" but received type '{type(value)}' instead"
            )


def enforce_types(callable):
    def decorate(func):
        hints = typing.get_type_hints(func)
        signature = inspect.signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            parameters = dict(zip(signature.parameters, args))
            parameters.update(kwargs)
            _check_types(parameters, hints)

            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)


def enforce_strict_types(callable):
    def decorate(func):
        hints = typing.get_type_hints(func)
        signature = inspect.signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            parameters = dict(zip(signature.parameters, bound.args))
            parameters.update(bound.kwargs)
            _check_types(parameters, hints)

            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)
Thanks to @Aran-Fey, who helped me improve this answer.
How to validate typing attributes in Python 3.7
You can use the __post_init__ method of dataclasses to do your validations. Below I just confirm that everything is an instance of the indicated type:
from dataclasses import dataclass, fields


def validate(instance):
    for field in fields(instance):
        attr = getattr(instance, field.name)
        if not isinstance(attr, field.type):
            msg = "Field {0.name} is of type {1}, should be {0.type}".format(field, type(attr))
            raise ValueError(msg)


@dataclass
class Car:
    color: str
    name: str
    wheels: int

    def __post_init__(self):
        validate(self)
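To see it in action, here is a quick round-trip (the validate/Car definitions are repeated so the snippet runs standalone):

```python
from dataclasses import dataclass, fields


def validate(instance):
    for field in fields(instance):
        attr = getattr(instance, field.name)
        if not isinstance(attr, field.type):
            msg = "Field {0.name} is of type {1}, should be {0.type}".format(field, type(attr))
            raise ValueError(msg)


@dataclass
class Car:
    color: str
    name: str
    wheels: int

    def __post_init__(self):
        validate(self)


car = Car(color="red", name="beetle", wheels=4)  # passes validation
try:
    Car(color="red", name="beetle", wheels="four")
except ValueError as error:
    print(error)  # Field wheels is of type <class 'str'>, should be <class 'int'>
```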
How to enforce dataclass fields' types?
You can declare a custom __post_init__ method (see Python's docs) and put all checks there to force type checking. This method can be declared in a parent class to reduce the amount of changes.
import dataclasses


@dataclasses.dataclass()
class Parent:
    def __post_init__(self):
        for (name, field_type) in self.__annotations__.items():
            if not isinstance(self.__dict__[name], field_type):
                current_type = type(self.__dict__[name])
                raise TypeError(f"The field `{name}` was assigned by `{current_type}` instead of `{field_type}`")

        print("Check is passed successfully")


@dataclasses.dataclass()
class MyClass(Parent):
    value: str


obj1 = MyClass(value="1")
obj2 = MyClass(value=1)
The results:
Check is passed successfully
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 3, in __init__
File "<stdin>", line 7, in __post_init__
TypeError: The field `value` was assigned by `<class 'int'>` instead of `<class 'str'>`
Validating input when mutating a dataclass
Dataclasses are a mechanism to provide a default initialization to accept the attributes as parameters, and a nice representation, plus some niceties like the __post_init__ hook.
Fortunately, they do not mess with any other mechanism for attribute access in Python - and you can still have your dataclasses' attributes be created as property descriptors, or a custom descriptor class if you want. That way, any attribute access will go through your getter and setter functions automatically.
The only drawback of using the property built-in is that you have to use it the "old way", and not with the decorator syntax - that allows you to create annotations for your attributes.
So, "descriptors" are special objects assigned to class attributes in Python in such a way that any access to that attribute will call the descriptor's __get__, __set__ or __delete__ methods. The property built-in is a convenience to build a descriptor from 1 to 3 functions that will be called from those methods.
So, with no custom descriptor-thing, you could do:
@dataclass
class MyClass:
    def setname(self, value):
        if not isinstance(value, str):
            raise TypeError(...)
        self.__dict__["name"] = value

    def getname(self):
        return self.__dict__.get("name")

    name: str = property(getname, setname)
    # optionally, you can delete the getter and setter from the class body:
    del setname, getname
By using this approach you will have to write each attribute's access as two methods/functions, but you will no longer need to write your __post_init__: each attribute will validate itself.
Also note that this example took the somewhat unusual approach of storing the attributes normally in the instance's __dict__. In the examples around the web, the practice is to use normal attribute access, but prepending the name with a _. This will leave these attributes polluting a dir on your final instance, and the private attributes will be unguarded.
Another approach is to write your own descriptor class, and let it check the instance and other properties of the attributes you want to guard. This can be as sophisticated as you want, culminating with your own framework. For a descriptor class that will check for attribute type and accept a validator list, you will need:
def positive_validator(name, value):
    if value <= 0:
        raise ValueError(f"values for {name!r} have to be positive")


class MyAttr:
    def __init__(self, type, validators=()):
        self.type = type
        self.validators = validators

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if not instance:
            return self
        return instance.__dict__[self.name]

    def __delete__(self, instance):
        del instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            raise TypeError(f"{self.name!r} values must be of type {self.type!r}")
        for validator in self.validators:
            validator(self.name, value)
        instance.__dict__[self.name] = value


# And now
@dataclass
class Person:
    name: str = MyAttr(str)
    age: float = MyAttr((int, float), [positive_validator,])
That is it - creating your own descriptor class requires a bit more knowledge about Python, but the code given above should be good for use, even in production - you are welcome to use it.
Note that you could easily add a lot of other checks and transforms for each of your attributes - and the code in __set_name__ itself could be changed to introspect the __annotations__ in the owner class to automatically take note of the types - so that the type parameter would not be needed for the MyAttr class itself. But as I said before: you can make this as sophisticated as you want.
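As a sketch of that last idea, here is a hypothetical TypedAttr variant (not from the original answer): __set_name__ receives the owner class, so the descriptor can read the declared annotation instead of taking a type parameter.

```python
from dataclasses import dataclass


class TypedAttr:
    """Hypothetical variant of MyAttr that infers its type from the owner's annotations."""

    def __init__(self, validators=()):
        self.validators = validators

    def __set_name__(self, owner, name):
        self.name = name
        # the owner class is available here, so look up the declared annotation
        self.type = owner.__annotations__[name]

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            raise TypeError(f"{self.name!r} values must be of type {self.type!r}")
        for validator in self.validators:
            validator(self.name, value)
        instance.__dict__[self.name] = value


@dataclass
class Person:
    name: str = TypedAttr()
    age: int = TypedAttr()


person = Person("Ada", 36)
print(person.age)  # 36
```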
Python dataclass, what's a pythonic way to validate initialization arguments?
Define a __post_init__ method on the class; the generated __init__ will call it if defined:
from dataclasses import dataclass


@dataclass
class MyClass:
    is_good: bool = False
    is_bad: bool = False

    def __post_init__(self):
        if self.is_good:
            assert not self.is_bad
This will even work when the replace function is used to make a new instance.
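A quick check of that claim (the class is repeated so the snippet runs on its own): dataclasses.replace() builds the new instance through __init__, so __post_init__ runs again.

```python
import dataclasses


@dataclasses.dataclass
class MyClass:
    is_good: bool = False
    is_bad: bool = False

    def __post_init__(self):
        if self.is_good:
            assert not self.is_bad


obj = MyClass(is_good=True)
try:
    dataclasses.replace(obj, is_bad=True)  # re-runs __post_init__ on the new instance
except AssertionError:
    print("replace() triggered the validation")
```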
Validate dataclass field with custom defined method?
The ideal approach would be to use a modified version of the Validator example from the Python how-to guide on descriptors. For example:
from abc import ABC, abstractmethod
from dataclasses import dataclass, MISSING


class Validator(ABC):

    def __set_name__(self, owner, name):
        self.private_name = '_' + name

    def __get__(self, obj, obj_type=None):
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        self.validate(value)
        setattr(obj, self.private_name, value)

    @abstractmethod
    def validate(self, value):
        """Note: subclasses must implement this method"""


class String(Validator):

    # You may or may not want a default value
    def __init__(self, default: str = MISSING, minsize=None, maxsize=None, predicate=None):
        self.default = default
        self.minsize = minsize
        self.maxsize = maxsize
        self.predicate = predicate

    # override __get__() to return a default value if one is not passed in to __init__()
    def __get__(self, obj, obj_type=None):
        return getattr(obj, self.private_name, self.default)

    def validate(self, value):
        if not isinstance(value, str):
            raise TypeError(f'Expected {value!r} to be an str')
        if self.minsize is not None and len(value) < self.minsize:
            raise ValueError(
                f'Expected {value!r} to be no smaller than {self.minsize!r}'
            )
        if self.maxsize is not None and len(value) > self.maxsize:
            raise ValueError(
                f'Expected {value!r} to be no bigger than {self.maxsize!r}'
            )
        if self.predicate is not None and not self.predicate(value):
            raise ValueError(
                f'Expected {self.predicate} to be true for {value!r}'
            )


@dataclass
class A:
    y: str = String(default='DEFAULT', minsize=5, maxsize=10, predicate=str.isupper)  # Descriptor instance
    x: int = 5


a = A()
print(a)

a = A('TESTING!!')
print(a)

try:
    a.y = 'testing!!'
except Exception as e:
    print('Error:', e)

try:
    a = A('HEY')
except Exception as e:
    print('Error:', e)

try:
    a = A('HELLO WORLD!')
except Exception as e:
    print('Error:', e)

try:
    a.y = 7
except Exception as e:
    print('Error:', e)
Output:
A(y='DEFAULT', x=5)
A(y='TESTING!!', x=5)
Error: Expected <method 'isupper' of 'str' objects> to be true for 'testing!!'
Error: Expected 'HEY' to be no smaller than 5
Error: Expected 'HELLO WORLD!' to be no bigger than 10
Error: Expected 7 to be an str