Can't use collections.defaultdict() in google-app-engine - python-3.x

Trying to use collections.defaultdict() to create a histogram in google-app-engine:
class myDS(ndb.Model):
    values = ndb.PickleProperty()
    hist = ndb.PickleProperty()

class Handler:
    my_ds = myDS()
    my_ds.values = {}
    my_ds.hist = defaultdict(lambda: 0)
And got this error (from the log):
File "/base/alloc/tmpfs/dynamic_runtimes/python27/277b61042b697c7a_unzipped/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1331, in call
newvalue = method(self, value)
File "/base/alloc/tmpfs/dynamic_runtimes/python27/277b61042b697c7a_unzipped/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1862, in _to_base_type
return pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
Any way to solve this?

A PickleProperty field requires a value that is serializable using Python's pickle protocol (see the docs for more info):
PickleProperty: Value is a Python object (such as a list or a dict or a
string) that is serializable using Python's pickle protocol; Cloud
Datastore stores the pickle serialization as a blob. Unindexed by
default. Optional keyword argument: compressed.
See also this answer from Martijn Pieters:
Pickle cannot handle lambdas; pickle only ever handles data, not code,
and lambdas contain code. Functions can be pickled, but just like
class definitions only if the function can be imported. A function
defined at the module level can be imported. Pickle just stores a
string in that case, the full 'path' of the function to be imported
and referenced when unpickling again.
There are multiple options to work with default values, depending on your use case.
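For example (a minimal sketch reusing the question's model, not part of the original answer): if the histogram only needs integer counts, a module-level callable such as int can replace the lambda, since pickle can reference int by name; alternatively, store a plain dict and only wrap it in a defaultdict in memory.
from collections import defaultdict

# Option 1: int is importable by name, unlike a lambda, so the
# defaultdict itself can be pickled by PickleProperty.
my_ds.hist = defaultdict(int)

# Option 2: build the histogram in memory and store a plain dict.
hist = defaultdict(int)
hist['some_bucket'] += 1        # 'some_bucket' is just an illustrative key
my_ds.hist = dict(hist)         # convert before writing to the datastore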

Related

__post_init__ of python 3.x dataclasses is not called when loaded from yaml

Please note that I have already referred to the StackOverflow question here. I am posting this question to investigate whether calling __post_init__ manually is safe or not. Please read the question to the end.
Check the code below. In step 3, where we load dataclass A from the YAML string, note that the __post_init__ method is not called.
import dataclasses
import yaml

@dataclasses.dataclass
class A:
    a: int = 55

    def __post_init__(self):
        print("__post_init__ got called", self)

print("\n>>>>>>>>>>>> 1: create dataclass object")
a = A(33)
print(a)  # print dataclass
print(dataclasses.fields(a))

print("\n>>>>>>>>>>>> 2: dump to yaml")
s = yaml.dump(a)
print(s)  # print yaml repr

print("\n>>>>>>>>>>>> 3: create class from str")
a_ = yaml.load(s)
print(a_)  # print dataclass loaded from yaml str
print(dataclasses.fields(a_))
The solution I see for now is calling __post_init__ on my own at the end, as in the snippet below:
a_.__post_init__()
I am not sure whether this is a safe way to recreate a YAML-serialized dataclass. It also poses a problem when __post_init__ takes keyword arguments, i.e. when dataclass fields are of type dataclasses.InitVar.
This behavior is working as intended. You are dumping an existing object, so when you load it pyyaml intentionally avoids initializing the object again. The direct attributes of the dumped object will be saved even if they are created in __post_init__ because that function runs prior to being dumped. When you want the side effects that come from __post_init__, like the print statement in your example, you will need to ensure that initialization occurs.
There are a few ways to accomplish this. You can use either the metaclass approach or the add-constructor/representer approach described in pyyaml's documentation. You could also manually alter the dumped string in your example to be '!!python/object/new:' instead of '!!python/object:'. If your eventual goal is to have the yaml file generated in a different manner, then this might be a solution.
See below for an update to your code that uses the metaclass approach and calls __post_init__ when loading from the dumped object. The call to cls(**fields) in from_yaml ensures that the object is initialized. yaml.load uses cls.__new__ to create objects tagged with '!!python/object:' and then loads the saved attributes into the object manually.
import dataclasses
import yaml

@dataclasses.dataclass
class A(yaml.YAMLObject):
    a: int = 55

    def __post_init__(self):
        print("__post_init__ got called", self)

    yaml_tag = '!A'
    yaml_loader = yaml.SafeLoader

    @classmethod
    def from_yaml(cls, loader, node):
        fields = loader.construct_mapping(node, deep=True)
        return cls(**fields)

print("\n>>>>>>>>>>>> 1: create dataclass object")
a = A(33)
print(a)  # print dataclass
print(dataclasses.fields(a))

print("\n>>>>>>>>>>>> 2: dump to yaml")
s = yaml.dump(a)
print(s)  # print yaml repr

print("\n>>>>>>>>>>>> 3: create class from str")
a_ = yaml.load(s, Loader=A.yaml_loader)
print(a_)  # print dataclass loaded from yaml str
print(dataclasses.fields(a_))

Instance attributes in a subclass of Chem.Atom in rdkit cannot be accessed

I defined a subclass of Atom in rdkit.Chem. I also defined an instance attribute in it, but I could not get that attribute back from the RWMol object in rdkit.
Below is sample code for my problem:
from rdkit import Chem

class MyAtom(Chem.Atom):
    def __init__(self, symbol, **kwargs):
        super().__init__(symbol, **kwargs)
        self.my_attribute = 0

    def get_my_attribute(self):
        return self.my_attribute

if __name__ == '__main__':
    rw_mol = Chem.RWMol()
    # I created a MyAtom object, then added it to the RWMol. But I couldn't get it back.
    my_atom = MyAtom('C')
    my_atom.my_attribute = 3
    rw_mol.AddAtom(my_atom)
    atom_in_mol = rw_mol.GetAtoms()[0]
    # I can access the newly defined attributes on my_atom.
    print(my_atom.get_my_attribute())
    # The two lines below give: AttributeError: 'Atom' object has no attribute 'get_my_attribute'
    print(atom_in_mol.get_my_attribute())
    print(atom_in_mol.my_attribute)
    # type(my_atom): <class '__main__.MyAtom'>
    # type(atom_in_mol): <class 'rdkit.Chem.rdchem.Atom'>
    # Why are these two atom types different? Due to polymorphism, the two objects should have the same type.
Normally this code should run, but it gives an error on the last lines because atom_in_mol is of type Chem.Atom. Shouldn't it be MyAtom? I also cannot access my_attribute directly.
The rdkit Python library is a wrapper around C++. Is that the problem? Can I not use inheritance with this library?
Note: I researched the rdkit documentation and there is a SetProp method for saving values on atoms. It uses a dictionary to save the values. It works fine, but it is too slow for my project. I want to use instance attributes to store my extra values. Is there any solution to this inheritance problem, or a faster alternative?
The Python RDKit library is a wrapper around C++, so sometimes it does not follow conventional Python object handling.
To go deeper, you will have to dig through the source code:
rw_mol.AddAtom(my_atom)
The line above executes the AddAtom method in rdkit/Code/GraphMol/Wrap/Mol.cpp, which in turn calls the addAtom method in rdkit/Code/GraphMol/RWMol.h, which then calls the addAtom method in rdkit/Code/GraphMol/ROMol.cpp with the default arguments updateLabel = true and takeOwnership = false.
The takeOwnership = false condition causes the argument atom to be duplicated:
// rdkit/Code/GraphMol/ROMol.cpp
if (!takeOwnership)
  atom_p = atom_pin->copy();
else
  atom_p = atom_pin;
Finally, if you look at what the copy method does in rdkit/Code/GraphMol/Atom.cpp:
Atom *Atom::copy() const {
  auto *res = new Atom(*this);
  return res;
}
So it re-instantiates the plain Atom class and returns that copy - the molecule stores a plain Atom, which is why your MyAtom subclass and its instance attributes are not reachable from the molecule.
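A common workaround (just a sketch, not part of the answer above) is to keep the extra values outside the atoms, keyed by atom index, since the molecule only ever stores its own plain-Atom copies:
from rdkit import Chem

rw_mol = Chem.RWMol()
extra = {}                              # atom index -> custom value, kept in plain Python

idx = rw_mol.AddAtom(Chem.Atom('C'))    # AddAtom copies the atom and returns its index
extra[idx] = 3                          # attach the custom value to the index instead

atom_in_mol = rw_mol.GetAtomWithIdx(idx)
print(extra[atom_in_mol.GetIdx()])      # -> 3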

How to specify the type of a unittest.mock.sentinel?

I am using unittest.mock.sentinel to provide dummy values to my test functions and then assert calls.
I'd like to be able to specify the type of the sentinel so that it passes type checking in the methods.
MWE:
import collections
from unittest.mock import sentinel

def fun(x):
    if not isinstance(x, collections.Iterable):
        raise TypeError('x should be iterable')
    pass

def test_fun_pass_if_x_is_instance_iterable():
    # this does not work and raises because sentinel is not iterable
    assert fun(sentinel.x) is None
EDIT
I have tried to do sentinel.x = collections.Iterable() but got the error:
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
So far I can do sentinel.x = tuple() or sentinel.x = list() for instance, but these are special cases of an iterable.
I think the problem here is that collections.Iterable is an abstract base class (ABC) and cannot be instantiated directly. That's what the error message says: the method __iter__ is abstract and has no body. You have to use a derived class or write one of your own.
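One way around it (a sketch, assuming a trivial wrapper object is acceptable for the test) is to pass a concrete Iterable that merely carries the sentinel, so the isinstance check passes while the sentinel stays available for assertions:
import collections.abc
from unittest.mock import sentinel

class IterableSentinel(collections.abc.Iterable):
    """Concrete Iterable that carries a sentinel for later assertions."""
    def __init__(self, token):
        self.token = token

    def __iter__(self):                  # the only abstract method to implement
        return iter(())

def test_fun_pass_if_x_is_instance_iterable():
    x = IterableSentinel(sentinel.x)
    assert fun(x) is None                # fun() is the function from the MWE above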

Can't pickle <class 'a class'>: attribute lookup inner class on a class failed

I was using PySpark to process some call data. As you can see, I added some inner classes to the class GetInfoFromCalls dynamically by using a metaclass.
The code below is located in the package for_test, which exists on all nodes:
class StatusField(object):
    """
    some alias.
    """
    failed = "failed"
    succeed = "succeed"
    status = "status"
    getNothingDefaultValue = "-999999"

class Result(object):
    """
    Result that stores a result and some info about it.
    """
    def __init__(self, result, status, message=None):
        self.result = result
        self.status = status
        self.message = message

structureList = [
    ("user_mobile", str, None),
    ("real_name", str, None),
    ("channel_attr", str, None),
    ("channel_src", str, None),
    ("task_data", dict, None),
    ("bill_info", list, "task_data"),
    ("account_info", list, "task_data"),
    ("payment_info", list, "task_data"),
    ("call_info", list, "task_data")
]

def inner_get(self, defaultValue=StatusField.getNothingDefaultValue):
    try:
        return self.holder.get(self)
    except Exception as e:
        print(e)
        return Result(defaultValue, StatusField.failed)

class call_meta(type):
    def __init__(cls, name, bases, attrs):
        for name_str, type_class, pLevel_str in structureList:
            setattr(cls, name_str, type(
                name_str,
                (object,),
                {})
            )

class GetInfoFromCalls(object, metaclass=call_meta):
    def __init__(self, call_deatails):
        for name_str, type_class, pLevel_str in structureList:
            inn = getattr(self.__class__, name_str)()
            object_dict = {
                "name": name_str,
                "type": type_class,
                "pLevel": None if pLevel_str is None else getattr(self, pLevel_str),
                "context": None,
                "get": inner_get,
                "holder": self,
            }
            for attr_str, real_attr in object_dict.items():
                setattr(inn, attr_str, real_attr)
            setattr(self, name_str, inn)
        self.call_details = call_deatails
When I ran
import pickle
pickle.dumps(GetInfoFromCalls("foo"))
it raised an error like this:
Traceback (most recent call last):
File "<ipython-input-11-b2d409e35eb4>", line 1, in <module>
pickle.dumps(GetInfoFromCalls("foo"))
PicklingError: Can't pickle <class '__main__.user_mobile'>: attribute lookup user_mobile on __main__ failed
It seems that I can't pickle the inner classes because they were added dynamically by code. When the class is pickled, the inner classes do not exist yet; is that right?
I really don't want to write out these classes by hand, since they are nearly identical to each other. Does anyone have a good way to avoid this problem?
Python's pickle actually does not serialize classes: it serializes instances and puts in the serialization a reference to each instance's class - and that reference is based on the class being bound to a name in a well-defined module. So instances of classes that don't have a module-level name, but rather live as attributes on other classes, or as data inside lists and dictionaries, typically will not work.
One straightforward thing to try is using dill instead of pickle. It is a third-party package that works like pickle but has extensions to serialize arbitrary dynamic classes.
While using dill may help other people reaching this page, it does not fit your case, because in order to use dill you'd have to monkey-patch the underlying RPC mechanism PySpark uses to make it use dill instead of pickle, and that might not be trivial nor consistent enough for production use.
If the problem really is about dynamically created classes being unpicklable, what you can do is create an extra metaclass for the dynamic classes themselves, instead of using type, and on that metaclass define proper __getstate__ and __setstate__ methods (or the other helper methods described in the pickle documentation) - that might enable these classes to be pickled by ordinary pickle. That is, a separate metaclass with pickler helper methods, used instead of type(..., (object,), ...) in your code.
However, "unpicklable object" is not the error you are getting - it is an attribute lookup error, which suggests the structure you are building is not good enough for pickle to introspect and retrieve all the members of one of your instances - it is not (yet) related to the unpicklability of the class objects. Since your dynamic classes live as attributes on the class (which is not itself pickled) and not on the instance, it is quite possible that pickle does not care about them. Check the pickle docs mentioned above; maybe all you need is a proper pickling helper method on your class, with nothing different on the metaclass, for everything you have there to work properly.
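For illustration only - a sketch that assumes the structureList, call_meta and __init__ from the question are in scope - the pickle helper methods mentioned above could simply drop the dynamically built field objects from the serialized state and rebuild them on load:
class GetInfoFromCalls(object, metaclass=call_meta):
    # __init__ exactly as in the question ...

    def __getstate__(self):
        # persist only the plain data; the per-field inner-class instances
        # are rebuilt from structureList when the object is unpickled
        return {"call_details": self.call_details}

    def __setstate__(self, state):
        # re-run construction so the dynamically created classes exist again
        self.__init__(state["call_details"])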

lxml - use default class element lookup and TreeBuilder parser target at the same time

With lxml, is there a way to use an ElementDefaultClassLookup and a parser target derived from the TreeBuilder class at the same time?
I have tried:
from lxml import etree

class MyElement(etree.ElementBase):
    pass

class MyComment(etree.CommentBase):
    pass

class MyTreeBuilder(etree.TreeBuilder):
    pass

parser_lookup = etree.ElementDefaultClassLookup(element=MyElement, comment=MyComment)
# (My)TreeBuilder accepts a `parser` keyword argument (which will then use the
# class lookup from that parser), but we haven't created the parser yet!
parser = etree.XMLParser(target=MyTreeBuilder())
parser.set_element_class_lookup(parser_lookup)

xml_string = '<root xmlns:test="hello"><element test:foobar="1" /><!-- world --></root>'
root = etree.fromstring(xml_string, parser=parser)
print(type(root), type(root.xpath('//comment()[1]')[0]), etree.tostring(root))
which results in:
<class 'lxml.etree._Element'> <class 'lxml.etree._Comment'> b'<root><element xmlns:ns0="hello" ns0:foobar="1"/><!-- world --></root>'
I want:
<class 'MyElement'> <class 'MyComment'> b'<root xmlns:test="hello"><element test:foobar="1"/><!-- world --></root>'
Notice the XML namespace differences as well as the Python classes.
I'm using lxml 3.4.4.
I can get the correct namespace prefix on the attribute by using:
parser = etree.XMLParser(target=etree.TreeBuilder())
which doesn't make any sense to me - why doesn't my derived class behave the same way? (I realize that omitting the target argument will by default use a TreeBuilder anyway.)
and I can get the correct class and namespace prefix by using:
parser = etree.XMLParser()
but I specifically want to use my own "target" to create a tree, and ideally don't want to have to reinvent the wheel.
I have tried to set the target after the parser is initialized, like so:
parser.target = MyTreeBuilder(parser=parser)
but this gives an error:
AttributeError: attribute 'target' of 'lxml.etree._BaseParser' objects is not writable
I checked the source code for the TreeBuilder class, and tried:
class MyTreeBuilder(etree.TreeBuilder):
    def set_parser(self, parser):
        super()._parser = parser

parser_lookup = etree.ElementDefaultClassLookup(element=MyElement, comment=MyComment)
tb = MyTreeBuilder()
parser = etree.XMLParser(target=tb)
parser.set_element_class_lookup(parser_lookup)
tb.set_parser(parser)
which gives:
AttributeError: 'super' object has no attribute '_parser'
and I tried:
parser_lookup = etree.ElementDefaultClassLookup(element=MyElement, comment=MyComment)
fake_parser = etree.XMLParser()
fake_parser.set_element_class_lookup(parser_lookup)
tb = MyTreeBuilder(parser=fake_parser)
parser = etree.XMLParser(target=tb)
parser.set_element_class_lookup(parser_lookup)
which gives the correct classes, but still not the correct attribute namespaces. I therefore imagine it needs some information from the correct parser to be able to build the tree correctly.
I tried setting the element_factory keyword argument instead of, and as well as, parser - but got the same result:
tb = MyTreeBuilder(element_factory=fake_parser.makeelement)
# and
tb = MyTreeBuilder(element_factory=fake_parser.makeelement, parser=fake_parser)
EDIT: looking at the lxml class lookup code, it seems that one can set a class attribute called PARSER on the custom Element.
I tried that:
MyElement.PARSER = parser
but the result was the same as without it.
However, the makeelement method of the parser works as expected:
test = parser.makeelement('root', attrib={ '{hello}foobar': '1' }, nsmap={ 'test': 'hello' })
print(type(test), etree.tostring(test))
as it gives (obviously the attribute is on a different node than when I parse the string):
<class 'MyElement'> b'<root xmlns:test="hello" test:foobar="1"/>'
Combining these two approaches using:
def m(tag, attrib=None, nsmap=None, *children):
    return MyElement.PARSER.makeelement(tag, attrib=attrib, nsmap=nsmap, *children)

parser = etree.XMLParser(target=MyTreeBuilder(element_factory=m))
gives the correct classes but still incorrect namespaces.
Is what I want possible? Why do the namespaces "go wrong" as soon as I use a custom target that isn't a pure TreeBuilder? Is there something I need to do differently somewhere, to manually correct the namespace behavior? Will I have to make my own tree builder implementation?
