Trouble creating DefaultDict in Dataclass - python-3.x

I am having trouble setting up a simple dataclass using a new defaultdict(dict).
If I tell the factory to use dict as below, instantiation fails with TypeError: 'collections.defaultdict' object is not callable.
from collections import defaultdict
from dataclasses import dataclass, field
@dataclass
class ResultSet:
    changed: bool = False
    mqttdata: defaultdict(dict) = field(default_factory=defaultdict(dict))  # does not work!
It does sorta work using field(default_factory=defaultdict), but then my code fails later when it encounters missing keys - presumably because the defaultdict was not set up with dict as its default factory.
How do I properly set up a new defaultdict(dict) in a dataclass?

You have a few problems with the code and how you are using dataclasses currently:
Type generics in annotations generally need to be specified through square brackets [], so defaultdict[dict] instead of defaultdict(dict), for example.
The default_factory argument to dataclasses.field() needs to be a no-arg callable which returns a new object with default values set. For example, assuming you have a nested dataclass Inner which specifies defaults for all fields, you could use default_factory=Inner to create a new Inner object each time the main dataclass is instantiated; see the sketch after this list.
Note that the default_factory argument is mainly useful for mutable types such as set, list, and dict, so that the same object isn't shared (and potentially mutated) between dataclass instances.
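Here is a minimal sketch of that Inner example; Inner and Outer are hypothetical names, not taken from the question:

from dataclasses import dataclass, field

@dataclass
class Inner:
    count: int = 0

@dataclass
class Outer:
    # default_factory=Inner calls Inner() for each new Outer,
    # so instances never share the same mutable default
    inner: Inner = field(default_factory=Inner)

print(Outer())                         # Outer(inner=Inner(count=0))
print(Outer().inner is Outer().inner)  # False - separate objects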
Putting it all together, here is the working code which sets a default value for a field of type defaultdict[dict]:
from collections import defaultdict
from dataclasses import dataclass, field
@dataclass
class ResultSet:
    changed: bool = False
    mqttdata: defaultdict[dict] = field(default_factory=lambda: defaultdict(dict))  # works!

print(ResultSet())  # ResultSet(changed=False, mqttdata=defaultdict(<class 'dict'>, {}))
In Python versions earlier than 3.9, which is when PEP 585 was introduced, you'll need to add the following import at the top so that any type annotations are lazy-evaluated:
from __future__ import annotations
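For instance, a sketch of the same dataclass on Python 3.7 or 3.8: with lazy annotations, defaultdict[dict] is never evaluated at runtime, so the subscript is accepted.

from __future__ import annotations  # must come before the other imports

from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class ResultSet:
    changed: bool = False
    mqttdata: defaultdict[dict] = field(default_factory=lambda: defaultdict(dict))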

Related

MyPy not considering dataclass attribute mechanics

I am developing a Python 3.8 project that uses typing and dataclasses, and the automatic tests include mypy. This leads to a strange behavior that I do not really understand...
In short: mypy seems not to understand the dataclass attribute mechanics that, to my understanding, make them instance attributes.
Here's a minimal example, with a package and two modules:
__init__.py: empty
app_events.py:
class AppEvent:
    pass
main.py:
import dataclasses
import typing

from . import app_events

class B:
    """Class with *app_events* as instance attribute."""
    def __init__(self):
        self.app_events: typing.List[app_events.AppEvent] = []

    def bar(self) -> app_events.AppEvent:
        # no mypy complaint here: the import is correctly distinguished
        # from the attribute
        ...

class C:
    """Class with *app_events* as class attribute."""
    app_events: typing.List[app_events.AppEvent]

    def chew(self) -> app_events.AppEvent:
        # mypy considers app_events to be the class attribute
        ...

@dataclasses.dataclass
class D:
    app_events: typing.List[app_events.AppEvent] = \
        dataclasses.field(default_factory=list)

    def doo(self) -> app_events.AppEvent:
        # same here: mypy considers app_events to be the class attribute
        ...
And the typecheck result:
PyCharm complains, for methods C.chew and D.doo: Unresolved attribute reference 'AppEvent' for class 'list'.
mypy complains, also for methods C.chew and D.doo: error: Name 'app_events.AppEvent' is not defined.
There is no issue for B.bar as written, though if the app_events attribute is declared as a class attribute (instead of being defined in self.__init__), mypy raises the same complaint.
-> any idea how to understand/solve/circumvent this elegantly?
I'd really like not to rename my module and attributes, but if you have nice names in mind, please do not hesitate to propose :-)
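One possible workaround, sketched under the assumption that an import alias is acceptable (the module file and the attributes keep their names): bind the module to a different local name so it no longer collides with the attribute.

import typing

# hypothetical alias for the module import
from . import app_events as app_events_mod

class C:
    app_events: typing.List[app_events_mod.AppEvent]

    def chew(self) -> app_events_mod.AppEvent:
        # the alias unambiguously names the module, so mypy resolves
        # AppEvent instead of looking it up on the class attribute
        ...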

Sphinx autodoc does not display all types or circular import error

I am trying to auto-document types with sphinx autodoc, napoleon and autodoc_typehints, but I am having problems as it does not work with most of my types. I am using the deap package for a genetic optimization algorithm, which means I have some very specific types that I guess sphinx cannot handle.
My conf.py file looks like this:
import os
import sys

sys.path.insert(0, os.path.abspath('../python'))

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.viewcode',
    'sphinx.ext.napoleon',
    'sphinx_autodoc_typehints'
]

set_type_checking_flag = False
always_document_param_types = False
I have an Algo.rst file with:
.. automodule:: python.algo.algo
   :members: crossover_worker,
             test
and my python.algo.algo module looks like this (I've added a dummy test function to show it works whenever I have no special types specified):
# Type hinting imports
from config.config import Config
from typing import List, Set, Dict, NamedTuple, Union, Tuple
from types import ModuleType
from numpy import ndarray
from numpy import float64
from multiprocessing.pool import MapResult
from deap.tools.support import Logbook, ParetoFront
from deap.base import Toolbox
from pandas.core.frame import DataFrame
from deap import creator
import random  # used by crossover_worker below

...

def crossover_worker(sindices: List[creator.Individual, creator.Individual]) -> Tuple[creator.Individual, creator.Individual]:
    """
    Uniform crossover using fixed threshold

    Args:
        sindices: list of two individuals on which we want to perform crossover

    Returns:
        tuple of the two individuals with crossover applied
    """
    ind1, ind2 = sindices
    size = len(ind1)
    for i in range(size):
        if random.random() < 0.4:
            ind1[i], ind2[i] = ind2[i], ind1[i]
    return ind1, ind2

def test(a: DataFrame, b: List[int]) -> float:
    """
    test function

    Args:
        a: something
        b: something

    Returns:
        something
    """
    return b
With the settings in conf.py as above, I get no error, and the types for my test function are correct, but the types for my crossover_worker function are missing.
However, when I set set_type_checking_flag = True to force using all types, I get a circular import error:
reading sources... [100%] index
WARNING: autodoc: failed to import module 'algo' from module 'python.algo'; the following exception was raised:
cannot import name 'ArrayLike' from partially initialized module 'pandas._typing' (most likely due to a circular import) (/usr/local/lib/python3.8/site-packages/pandas/_typing.py)
looking for now-outdated files... none found
And I never import ArrayLike, so I don't understand where it comes from or how to solve it.
Also, how can I force sphinx to import the creator.Individual types that appear everywhere in my code?
My sphinx versions:
sphinx==3.0.1
sphinx-autodoc-typehints==1.10.3
After some searching, it turned out there were some flaws in my approach:
Firstly, a "list is a homogeneous structure containing values of one type. As such, List only takes a single type, and every element of that list has to have that type." (source). Consequently, I cannot do something like List[creator.Individual, creator.Individual]; it should be List[creator.Individual], or, if the list holds multiple types, a Union such as List[Union[int, float]].
Secondly, the type creator.Individual is not recognized by sphinx as a valid type. Instead, I should define it using TypeVar, like so:
from typing import TypeVar, List
CreatorIndividual = TypeVar("CreatorIndividual", bound=List[int])
So by transforming my crossover_worker function to this, it all worked:
def crossover_worker(sindices: List[CreatorIndividual]) -> Tuple[CreatorIndividual, CreatorIndividual]:
Note: "By contrast, a tuple is an example of a product type, a type consisting of a fixed set of types, and whose values are a collection of values, one from each type in the product type. Tuple[int,int,int], Tuple[str,int] and Tuple[int,str] are all distinct types, distinguished both by the number of types in the product and the order in which they appear."(source)

default_factory not being called on dataclasses' fields

Here is an example of what happens:
@dataclass
class D:
    prop1: str
    prop2: dict = field(default_factory=lambda: defaultdict(set))
d = D("spam")
print(d)
# D(prop1='spam', prop2=Field(name=None,type=None,default=<dataclasses._MISSING_TYPE object at 0x10274c650>,default_factory=<function D.<lambda> at 0x103ad3a70>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=None))
As you can see, prop2 was not initialized using a default value from the default_factory; it is still a Field object. And if I try to do d.prop2["some key"], I get TypeError: 'Field' object is not subscriptable.
You have probably imported the dataclass decorator from the wrong module. This can happen if you use automatic imports in your IDE.
The behaviour described happens when you import dataclass from attr (from attr import dataclass).
If you do from dataclasses import dataclass, everything will work as expected: the default_factory will be called to generate the value of the field.
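A minimal sketch with the correct imports, for comparison (expected output shown in the comments):

from collections import defaultdict
from dataclasses import dataclass, field  # not: from attr import dataclass

@dataclass
class D:
    prop1: str
    prop2: dict = field(default_factory=lambda: defaultdict(set))

d = D("spam")
print(d)                    # D(prop1='spam', prop2=defaultdict(<class 'set'>, {}))
d.prop2["some key"].add(1)  # the defaultdict creates the missing set on access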

__post_init__ of python 3.x dataclasses is not called when loaded from yaml

Please note that I have already referred to the StackOverflow question here. I am posting this question to investigate whether calling __post_init__ manually is safe or not. Please read the question till the end.
Check the code below. In step 3, where we load dataclass A from a yaml string, note that it does not call the __post_init__ method.
import dataclasses
import yaml
@dataclasses.dataclass
class A:
    a: int = 55

    def __post_init__(self):
        print("__post_init__ got called", self)
print("\n>>>>>>>>>>>> 1: create dataclass object")
a = A(33)
print(a) # print dataclass
print(dataclasses.fields(a))
print("\n>>>>>>>>>>>> 2: dump to yaml")
s = yaml.dump(a)
print(s) # print yaml repr
print("\n>>>>>>>>>>>> 3: create class from str")
a_ = yaml.load(s)
print(a_) # print dataclass loaded from yaml str
print(dataclasses.fields(a_))
The solution that I see for now is calling __post_init__ on my own at the end, as in the snippet below:
a_.__post_init__()
I am not sure if this is a safe recreation of a yaml-serialized dataclass. Also, it will pose a problem when __post_init__ takes kwargs, in the case where dataclass fields are of type dataclasses.InitVar.
This behavior is working as intended. You are dumping an existing object, so when you load it, pyyaml intentionally avoids initializing the object again. The direct attributes of the dumped object will be saved even if they are created in __post_init__, because that function runs prior to the dump. When you want the side effects that come from __post_init__, like the print statement in your example, you will need to ensure that initialization occurs.
There are a few ways to accomplish this. You can use either the metaclass approach or the add-constructor/representer approaches described in pyyaml's documentation. You could also manually alter the dumped string in your example to read !!python/object/new: instead of !!python/object:. If your eventual goal is to have the yaml file generated in a different manner, then this might be a solution.
See below for an update to your code that uses the metaclass approach and calls __post_init__ when loading from the dumped class object. The call to cls(**fields) in from_yaml ensures that the object is initialized. yaml.load uses cls.__new__ to create objects tagged with !!python/object: and then loads the saved attributes into the object manually.
import dataclasses
import yaml
@dataclasses.dataclass
class A(yaml.YAMLObject):
    a: int = 55

    def __post_init__(self):
        print("__post_init__ got called", self)

    yaml_tag = '!A'
    yaml_loader = yaml.SafeLoader

    @classmethod
    def from_yaml(cls, loader, node):
        fields = loader.construct_mapping(node, deep=True)
        return cls(**fields)
print("\n>>>>>>>>>>>> 1: create dataclass object")
a = A(33)
print(a) # print dataclass
print(dataclasses.fields(a))
print("\n>>>>>>>>>>>> 2: dump to yaml")
s = yaml.dump(a)
print(s) # print yaml repr
print("\n>>>>>>>>>>>> 3: create class from str")
a_ = yaml.load(s, Loader=A.yaml_loader)
print(a_) # print dataclass loaded from yaml str
print(dataclasses.fields(a_))

How to specify the type of a unittest.mock.sentinel?

I am using unittest.mock.sentinel to provide dumb values to my test functions and then assert calls.
I'd like to be able to specify the type of the sentinel so that it passes type checking in the methods.
MWE:
import collections
from unittest.mock import sentinel
def fun(x):
    if not isinstance(x, collections.Iterable):
        raise TypeError('x should be iterable')
    pass

def test_fun_pass_if_x_is_instance_iterable():
    # this does not work and raises because sentinel is not iterable
    assert fun(sentinel.x) is None
EDIT
I have tried to do sentinel.x = collections.Iterable() but got the error:
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
So far I can do sentinel.x = tuple() or sentinel.x = list(), for instance, but these are special cases of an iterable.
I think the problem here is that collections.Iterable is an abstract base class (ABC) and cannot be instantiated directly. That is exactly what the error message says: the method __iter__ is abstract, without a body. You have to use a derived class or write one of your own.
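A minimal sketch of that suggestion; EmptyIterable is a hypothetical one-off class, and collections.abc.Iterable is used since that is the ABC's current home:

import collections.abc
from unittest.mock import sentinel

# a minimal concrete Iterable: only the abstract __iter__ needs a body
class EmptyIterable(collections.abc.Iterable):
    def __iter__(self):
        return iter(())

sentinel.x = EmptyIterable()

def fun(x):
    if not isinstance(x, collections.abc.Iterable):
        raise TypeError('x should be iterable')

fun(sentinel.x)  # no TypeError: the sentinel value is now iterable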
