default_factory not being called on dataclasses' fields - python-3.x

Here is an example of what happens:
@dataclass
class D:
    prop1: str
    prop2: dict = field(default_factory=lambda: defaultdict(set))

d = D("spam")
print(d)
# D(prop1='spam', prop2=Field(name=None,type=None,default=<dataclasses._MISSING_TYPE object at 0x10274c650>,default_factory=<function D.<lambda> at 0x103ad3a70>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=None))
As you can see, prop2 was not initialized with a default value from the default_factory; it is still a Field object. And if I try to do d.prop2["some key"] I get TypeError: 'Field' object is not subscriptable.

You have probably imported the dataclass decorator from the wrong module. This can happen if you use automatic imports in your IDE.
The described behaviour happens when you import dataclass from attr (from attr import dataclass).
If you do from dataclasses import dataclass instead, everything works as expected (the default_factory is called to generate the value of the field).
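For reference, a minimal sketch of the corrected snippet with the standard-library imports spelled out (my own addition):

from collections import defaultdict
from dataclasses import dataclass, field  # the stdlib decorator, not attr's

@dataclass
class D:
    prop1: str
    prop2: dict = field(default_factory=lambda: defaultdict(set))

d = D("spam")
print(d)  # D(prop1='spam', prop2=defaultdict(<class 'set'>, {}))
d.prop2["some key"].add(1)  # works: a missing key now yields a fresh set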

Related

Why can't I create a standalone HttpUrl object in pydantic?

from pydantic import BaseModel, Field, HttpUrl
from typing import Optional

class TestClass(BaseModel):
    url: Optional[HttpUrl] = None
Creating a TestClass object with url="https://www.test.com" works. Here the imported HttpUrl and BaseModel are classes. But when I try to create an HttpUrl object standalone, I get a TypeError, e.g. with the code below:

from pydantic import HttpUrl

myurl = HttpUrl("https://www.test.com")

Why can't it be used to convert a string into an HttpUrl object like this? It results in errors complaining about missing keyword-only arguments and, once those are provided, about the number of positional arguments (2 provided, 3 required).
You can use the parse_obj_as function. This way there will be fewer dependencies:

from pydantic import parse_obj_as, HttpUrl

url = parse_obj_as(HttpUrl, "https://www.test.com")

pydantic types are valid only when used as class variables inside BaseModel derivatives. Under the hood, pydantic fires the validate method of the AnyUrl class, which is inconvenient for external usage.
Maybe the urllib library will come in handy for you?

from urllib.parse import urlparse

myurl = urlparse("https://www.test.com")
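As a follow-up note (my own addition, assuming pydantic v1, where parse_obj_as lives): the practical difference between the two suggestions is that parse_obj_as actually validates the string, while urlparse merely splits it:

from urllib.parse import urlparse
from pydantic import HttpUrl, ValidationError, parse_obj_as

print(parse_obj_as(HttpUrl, "https://www.test.com"))  # a validated HttpUrl
try:
    parse_obj_as(HttpUrl, "not a url")
except ValidationError as exc:
    print("rejected:", exc.errors()[0]["msg"])

print(urlparse("not a url"))  # no validation: ParseResult(scheme='', path='not a url', ...)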

Trouble creating DefaultDict in Dataclass

I am having trouble setting up a simple dataclass using a new defaultdict(dict).
If I tell the factory to use dict as below, instantiation fails with TypeError: 'collections.defaultdict' object is not callable:
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class ResultSet:
    changed: bool = False
    mqttdata: defaultdict(dict) = field(default_factory=defaultdict(dict))  # does not work!
It does sorta work using field(default_factory=defaultdict) but then my code will fail later when it encounters missing keys - presumably because defaultdict was not set up for dict.
How do I properly set up a new defaultdict(dict) in a dataclass?
You have a few problems with the code and how you are using dataclasses currently:
Type generics in annotations generally need to be specified through square brackets [], so defaultdict[dict] instead of defaultdict(dict), for example.
The default_factory argument to dataclasses.field() needs to be a no-arg callable which returns a new object with default values set. For example, assuming you have a nested dataclass Inner which specifies defaults for all fields, you could use default_factory=Inner to create a new Inner object each time the main dataclass is instantiated (see the sketch below).
Note that the default_factory argument is mainly useful for mutable types such as set, list, and dict, so that the same object isn't shared (and potentially mutated) between dataclass instances.
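A minimal sketch of that nested-dataclass pattern (the names Inner and Outer are illustrative, not from the question):

from dataclasses import dataclass, field

@dataclass
class Inner:
    count: int = 0

@dataclass
class Outer:
    # default_factory=Inner is a no-arg callable: each Outer() gets its own Inner
    inner: Inner = field(default_factory=Inner)

a, b = Outer(), Outer()
a.inner.count += 1
print(a.inner, b.inner)  # Inner(count=1) Inner(count=0) -- not shared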
Putting it all together, here is the working code which sets a default value for a field of type defaultdict[dict]:
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class ResultSet:
    changed: bool = False
    mqttdata: defaultdict[dict] = field(default_factory=lambda: defaultdict(dict))  # works!

print(ResultSet())
In Python versions earlier than 3.9, which is when PEP 585 was introduced, you'll need to add the following import at the top so that any type annotations are lazy-evaluated:
from __future__ import annotations
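As a quick check (my own addition), the factory now produces a real defaultdict(dict), so missing keys yield fresh inner dicts instead of raising KeyError:

rs = ResultSet()
rs.mqttdata["sensor1"]["temperature"] = 21.5  # the missing key auto-creates a dict
print(rs.mqttdata)  # defaultdict(<class 'dict'>, {'sensor1': {'temperature': 21.5}})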

MyPy not considering dataclass attribute mechanics

I am developing a Python 3.8 project that uses typing and dataclasses, and the automatic tests include mypy. This brings me to a strange behaviour that I do not really understand...
In short: mypy seems not to understand the dataclass attribute mechanics that, to my understanding, make these attributes instance attributes.
Here's a minimal example, with a package and two modules:
__init__.py: empty
app_events.py:
class AppEvent:
    pass
main.py:
import dataclasses
import typing

from . import app_events


class B:
    """Class with *app_events* as instance attribute."""
    def __init__(self):
        self.app_events: typing.List[app_events.AppEvent] = []

    def bar(self) -> app_events.AppEvent:
        # no mypy complaint here: the import is correctly distinguished
        # from the attribute
        ...


class C:
    """Class with *app_events* as class attribute."""
    app_events: typing.List[app_events.AppEvent]

    def chew(self) -> app_events.AppEvent:
        # mypy considers app_events to be the class attribute
        ...


@dataclasses.dataclass
class D:
    app_events: typing.List[app_events.AppEvent] = \
        dataclasses.field(default_factory=list)

    def doo(self) -> app_events.AppEvent:
        # same here: mypy considers app_events to be the class attribute
        ...
And the typecheck results:
PyCharm complains, for methods C.chew and D.doo: Unresolved attribute reference 'AppEvent' for class 'list'.
mypy complains, also for methods C.chew and D.doo: error: Name 'app_events.AppEvent' is not defined.
There is no issue for B.bar as written, though if the app_events attribute is declared as a class attribute (instead of being assigned in __init__), mypy raises the same complaint.
-> any idea how to understand/solve/circumvent this elegantly?
I'd really like not to rename my module and attributes, but if you have nice names in mind, please do not hesitate to propose :-)
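A minimal sketch of one possible circumvention (my own suggestion, with an illustrative alias name): inside the class body the attribute name app_events shadows the imported module, so binding the module to a second name at import time sidesteps the clash without renaming the attribute or the module:

import dataclasses
import typing

from . import app_events
from . import app_events as app_events_mod  # alias that no attribute shadows


@dataclasses.dataclass
class D:
    app_events: typing.List[app_events_mod.AppEvent] = \
        dataclasses.field(default_factory=list)

    def doo(self) -> app_events_mod.AppEvent:
        ...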

__post_init__() for multiple inherited dataclasses

Trying to evaluate if dataclasses are suitable for an upcoming project, but right now I'm stuck with this code:
from dataclasses import dataclass

@dataclass
class MixinA:
    attrA: int
    def __post_init__(self):
        print('MixinA post_init')
        self.attrA = [self.attrA]

@dataclass
class MixinB:
    attrB: str
    def __post_init__(self):
        print('MixinB post_init')
        self.attrB = [self.attrB]

@dataclass
class MixinC:
    attrC: bool
    def __post_init__(self):
        print('MixinC post_init')
        self.attrC = [self.attrC]

@dataclass
class Inherited(MixinC, MixinB, MixinA):
    pass

obj = Inherited(4, 'Hello', False)
print(obj.attrA, obj.attrB, obj.attrC)
print(obj.__class__.mro())
It surprises me that only the __post_init__() of the first base class in the MRO is called, when I expected all three to be invoked:
MixinC post_init
4 Hello [False]
[<class '__main__.Inherited'>, <class '__main__.MixinC'>, <class '__main__.MixinB'>, <class '__main__.MixinA'>, <class 'object'>]
Besides, changing the inheritance doesn't do me any good. The following linear hierarchy generates the exact same output as above:
class MixinA:
class MixinB(MixinA):
class MixinC(MixinB):
class Inherited(MixinC):
Did I write the test code in a wrong way, or is the current behaviour an oversight or intentional?
The core issue for me is that I want to transform each attribute before generating the final dataclass instances. The actual hierarchy is of a larger scale, and doing the transformation within each and every class would be very redundant.
If __post_init__() is a no-go, is there any alternative approach (such as InitVar or a custom __init__())?
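A minimal sketch of one common pattern (my own suggestion, not from the original post): the generated __init__ calls self.__post_init__() exactly once, and ordinary MRO lookup finds only MixinC's implementation, so each mixin has to chain to super().__post_init__() itself (guarded, since plain object has no such hook):

from dataclasses import dataclass

@dataclass
class MixinA:
    attrA: int
    def __post_init__(self):
        if hasattr(super(), '__post_init__'):
            super().__post_init__()  # keep the chain going up the MRO
        print('MixinA post_init')
        self.attrA = [self.attrA]

@dataclass
class MixinB:
    attrB: str
    def __post_init__(self):
        if hasattr(super(), '__post_init__'):
            super().__post_init__()
        print('MixinB post_init')
        self.attrB = [self.attrB]

@dataclass
class MixinC:
    attrC: bool
    def __post_init__(self):
        if hasattr(super(), '__post_init__'):
            super().__post_init__()
        print('MixinC post_init')
        self.attrC = [self.attrC]

@dataclass
class Inherited(MixinC, MixinB, MixinA):
    pass

obj = Inherited(4, 'Hello', False)
print(obj.attrA, obj.attrB, obj.attrC)  # [4] ['Hello'] [False]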

__post_init__ of python 3.x dataclasses is not called when loaded from yaml

Please note that I have already referred to the StackOverflow question here. I am posting this question to investigate whether calling __post_init__ manually is safe or not. Please read the question to the end.
Check the code below. In step 3, we load dataclass A from a yaml string; note that this does not call the __post_init__ method.
import dataclasses
import yaml

@dataclasses.dataclass
class A:
    a: int = 55
    def __post_init__(self):
        print("__post_init__ got called", self)

print("\n>>>>>>>>>>>> 1: create dataclass object")
a = A(33)
print(a)  # print dataclass
print(dataclasses.fields(a))

print("\n>>>>>>>>>>>> 2: dump to yaml")
s = yaml.dump(a)
print(s)  # print yaml repr

print("\n>>>>>>>>>>>> 3: create class from str")
a_ = yaml.load(s)
print(a_)  # print dataclass loaded from yaml str
print(dataclasses.fields(a_))
The solution that I see for now is calling __post_init__ on my own at the end, as in the snippet below:

a_.__post_init__()

I am not sure whether this is a safe recreation of a yaml-serialized dataclass. Also, it will pose a problem when __post_init__ takes kwargs, i.e. when some dataclass fields are of dataclasses.InitVar type.
This behavior is working as intended. You are dumping an existing object, so when you load it, pyyaml intentionally avoids initializing the object again. The direct attributes of the dumped object will be saved even if they are created in __post_init__, because that function runs prior to the dump. When you want the side effects that come from __post_init__, like the print statement in your example, you will need to ensure that initialization occurs.
There are a few ways to accomplish this. You can use either the metaclass approach or the adding-constructors/representers approach described in pyyaml's documentation. You could also manually alter the dumped string in your example to be !!python/object/new: instead of !!python/object:. If your eventual goal is to have the yaml file generated in a different manner, then this might be a solution.
See below for an update to your code that uses the metaclass approach and calls __post_init__ when loading from the dumped class object. The call to cls(**fields) in from_yaml ensures that the object is initialized. yaml.load uses cls.__new__ to create objects tagged with !!python/object: and then loads the saved attributes into the object manually.
import dataclasses
import yaml

@dataclasses.dataclass
class A(yaml.YAMLObject):
    a: int = 55

    def __post_init__(self):
        print("__post_init__ got called", self)

    yaml_tag = '!A'
    yaml_loader = yaml.SafeLoader

    @classmethod
    def from_yaml(cls, loader, node):
        fields = loader.construct_mapping(node, deep=True)
        return cls(**fields)

print("\n>>>>>>>>>>>> 1: create dataclass object")
a = A(33)
print(a)  # print dataclass
print(dataclasses.fields(a))

print("\n>>>>>>>>>>>> 2: dump to yaml")
s = yaml.dump(a)
print(s)  # print yaml repr

print("\n>>>>>>>>>>>> 3: create class from str")
a_ = yaml.load(s, Loader=A.yaml_loader)
print(a_)  # print dataclass loaded from yaml str
print(dataclasses.fields(a_))
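For reference (my own note on the expected behaviour of the code above): the metaclass behind yaml.YAMLObject registers both the representer and the constructor for the !A tag, so yaml.dump(a) now emits a document tagged !A, and step 3 prints "__post_init__ got called" again because from_yaml builds the object through cls(**fields) rather than through cls.__new__.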
