How to avoid a class having a __dict__ - python-3.x

I noticed that many built-in classes do not have a __dict__, and even classes in modules such as numpy lack one, because they are implemented in C.
I want to define __getattr__, but I'm worried about a recursive loop creeping into the (long) code of one of my classes (e.g. see the second answer to Understanding the difference between __getattr__ and __getattribute__).
Is there a way I can disable creation of __dict__? Do I need to use __slots__?

Not having a __dict__ will not prevent you from writing an infinite recursive loop.
And yes, defining __slots__ will prevent the creation of a __dict__.
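A minimal sketch for illustration (the class and attribute names are made up):
class Point:
    __slots__ = ('x', 'y')  # declaring __slots__ suppresses the per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(hasattr(p, '__dict__'))  # False

try:
    p.z = 3  # only the slotted names are assignable
except AttributeError as e:
    print(e)
Note that this only removes the instance __dict__: a __getattr__ can still recurse if its own body looks up an attribute that does not exist, so make sure it raises AttributeError for any name it does not handle.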

Related

How can I get deterministic hash values for class objects?

I have an application running in Python 3.9.4 where I store class objects in sets (along with many other kinds of objects). I'm getting non-deterministic behavior even when PYTHONHASHSEED=0 because class objects get non-deterministic hash codes. I assume that's because class objects' hash codes come from their addresses in memory.
For example, here are two runs of a little test program, where Before and Equation are classes:
print(hash(Before), hash(Equation), hash(int))
304555224 304593057 271715397
print(hash(Before), hash(Equation), hash(int))
326601328 293027788 273337413
How can I get Python to generate deterministic hash values for class objects?
Is there a metaclass or something that I could monkey-patch so that all class objects, even int, get a hash function that I specify?
Hashes for classes are deterministic within the same process. Yes, in CPython they are memory-based - but then you can't simply "move" a class object to another memory address using Python code.
If you happen to use some serialization/de-serialization transforms with the classes, the de-serialized objects will ordinarily be new objects, distinct from the original ones, and therefore will hash differently.
For the record: I could not reproduce the behavior you stated in the question: within the same process, the hashes for the class objects stay the same.
If you are calculating the hashes in different processes, though, they will differ. So, although you don't mention multiprocessing, I assume that is your actual scenario.
In that case, indeed, implementing proper __hash__ and __eq__ methods on the metaclass can give you stable, across-process hashing - but you can't do that for built-in classes such as int: those are implemented in native code and can't be changed from the Python side. On the other hand, even though the hash numbers shown differ for these built-in classes, whatever you use to serialize/deserialize your classes (which is what Python does to communicate data across processes, even if you don't do any explicit de/serializing) records built-in classes by name, so they resolve back to the very same object on the receiving side.
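A quick illustration of that by-name round trip:
import pickle

# pickle records classes by qualified name rather than by value, so a
# round trip hands back the identical class object - its hash is moot:
assert pickle.loads(pickle.dumps(int)) is int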
That brings us to the point that, while it is straightforward to add __eq__ and __hash__ methods to a metaclass for your classes, it would be better to ensure that deserializing always yields the same object (with the same ID). Hash stability, as you put it, could possibly ensure you always have the same class, but that depends on how you write your code: it is a bit tricky to retrieve the object instance that is already inside a set when you get a positive containment check for another instance that matches it - the most straightforward way is to build an identity dictionary out of the set, and then use the value:
# map each element to itself, so a lookup returns the stored (canonical) object
my_registry_dict = {element: element for element in my_registry_set}
# an equal incoming key retrieves the instance that was originally registered
my_class = my_registry_dict[incoming_class]
With this in mind, we can have a custom metaclass that not only adds __eq__ and __hash__ (you have to pick which attributes of the classes you want to compare for equality - __qualname__ can be a simple and functional choice), but also customizes __new__ so that de-serializing the same class a second time always reuses the first class object defined in the current process (i.e. it restores the "singleton" behavior Python classes enjoy outside corner cases like yours):
class Meta(type):
    registry = {}

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        if cls not in mcls.registry:
            mcls.registry[cls] = cls
        else:
            # reuse the previously created class
            cls = mcls.registry[cls]
        return cls

    def __hash__(cls):
        # when working with metaclasses, using the name `cls` instead of `self`
        # helps remind us that the instances we are dealing with are
        # actually classes
        return hash(cls.__qualname__)

    def __eq__(cls, other):
        return cls.__qualname__ == other.__qualname__
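A hypothetical usage sketch, with the second class definition standing in for what a deserializer would re-create:
class Payload(metaclass=Meta):
    pass

original = Payload

class Payload(metaclass=Meta):  # a duplicate, as de-serializing would produce
    pass

print(Payload is original)  # True - Meta.__new__ returned the registered object
print(hash(Payload) == hash(Payload.__qualname__))  # True - qualname-based hash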

In Python, what are the differences between a function outside a class definition and a method inside it using staticmethod?

I have been working on a very dense set of calculations, all in support of a specific problem I have.
But the nature of the problem is no different than this. Suppose I develop a class called 'Matrix' that has the machinery to implement matrices. Instantiation would presumably take a list of lists, which would be the matrix entries.
Now I want to provide a multiply method. I have two choices. First, I could define a method like so:
class Matrix():
    def __init__(self, entries):
        # do the obvious here
        return

    def determinant(self):
        # again, do the obvious here
        return result_of_calcs

    def multiply(self, b):
        # again do the obvious here
        return
If I do this, the call signature for two matrix objects, a and b, is
a.multiply(b)...
The other choice is a @staticmethod. Then, the definition looks like:
@staticmethod
def multiply(a, b):
    # do the obvious thing.
Now the call signature is:
z = Matrix.multiply(a, b)
I am unclear when one is better than the other. The free-standing function is not truly part of the class definition, but who cares? It gets the job done, and because Python allows "reaching into an object" from outside, it seems able to do everything. In practice they'll (the class and the function) end up in the same module, so they're at least linked there.
On the other hand, my understanding of the @staticmethod approach is that the function is now part of the class definition (it defines one of the methods), but the method gets no "self" passed in. In a way this is nice because the call signature is the better-looking:
z = Matrix.multiply(a, b)
and the function can access all of its arguments' methods and attributes.
Is this the right way to view it? Are there strong reasons to do one or the other? In what ways are they not equivalent?
I have done quite a bit of Python programming since asking this question.
Suppose we have a file named matrix.py, and it has a bunch of code for manipulating matrices. We want to provide a matrix multiply method.
The two approaches are:
define a free-standing function with the signature multiply(x,y)
make it a method of all matrices: x.multiply(y)
Matrix multiply is what I will call a dyadic function. In other words, it always takes two arguments.
The temptation is to use #2, so that a matrix object "carries with it everywhere" the ability to be multiplied. However, the only thing it makes sense to multiply it with is another matrix object. In such cases there are two equally good ways to do that, viz:
z=x.multiply(y)
or
z=y.multiply(x)
However, a better way is to define a free-standing function inside the file:
multiply(x,y)
multiply(), as such, is a function that any code using the 'library' expects to have available. It need not be attached to each matrix. And, since the user will be doing an 'import', they will get the multiply function. This is better code.
What I was wrongly conflating was two kinds of functions, and that led me to attach the method to every object instance:
1. Functions which need to be generally available inside the file and should be exposed outside it; and
2. Functions which are needed only inside the file.
multiply() is an example of type 1. Any matrix 'library' likely ought to define matrix multiplication.
What I was worried about was needing to expose all the 'internal' functions. For example, suppose we want to make add(), multiply() and invert() externally available. Suppose, however, we did not want to make determinant() externally available - but needed it inside.
One way to 'protect' users is to make determinant a function (a def statement) inside the class declaration for matrices. Then it is protected from exposure. However, nothing stops a user of the code from reaching in if they know the internals, by using the method matrix.determinant().
In the end it comes down to convention, largely. It makes more sense to expose a matrix multiply function which takes two matrices and is called like multiply(x,y). As for the determinant function, instead of 'wrapping it' in the class, it makes more sense to define it as __determinant(x) at the same level as the class definition for matrices.
You can never truly protect internal functions by their declaration, it seems. The best you can do is warn users: the leading-underscore naming gives the warning 'this is not expected to be called outside the code in this file'.
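A compact sketch of that layout (the Matrix entry format is the list-of-rows shape from the question):
# matrix.py
class Matrix:
    def __init__(self, entries):
        self.entries = entries  # a list of rows, each row a list of numbers

def __determinant(m):
    # internal helper: the leading underscores keep it out of
    # "from matrix import *" and flag it as module-private
    ...

def multiply(x, y):
    # the public, dyadic entry point: multiply(x, y)
    rows = [[sum(a * b for a, b in zip(row, col))
             for col in zip(*y.entries)]
            for row in x.entries]
    return Matrix(rows)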

What is the purpose of concrete methods in abstract classes in Python?

I feel like this subject is touched in some other questions but it doesn't get into Python (3.7) specifically, which is the language I'm most familiar with.
I'm starting to get the hang of abstract classes and how to use them as blueprints for subclasses I'm creating.
What I don't understand though, is the purpose of concrete methods in abstract classes.
If I'm never going to instantiate my parent abstract class, why would a concrete method be needed at all? Shouldn't I just stick with abstract methods to guide the creation of my subclasses and make the expected behavior explicit?
Thanks.
This question is not Python specific, but general object oriented.
There may be cases in which all your sub-classes need a certain method with common behavior. It would be tedious to implement the same method in every sub-class. If you instead implement it in the parent class, all your sub-classes inherit it automatically, and callers can invoke it on a sub-class even though it is implemented in the parent class. This is one of the basic mechanics of class inheritance.
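For example, a minimal sketch (the Shape/Square names are made up):
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self):
        ...

    def describe(self):  # concrete method, shared by every sub-class
        return f"{type(self).__name__} with area {self.area()}"

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):  # each sub-class supplies only the varying part
        return self.side ** 2

print(Square(3).describe())  # Square with area 9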

Metaclass of metaclass?

At my OOP theory exam I was asked: "What is a metaclass? What is a metaclass of a metaclass?". I answered the first one easily, but have no idea about the second. Is there even such a thing as a "metaclass of a metaclass" in any programming language, or even theoretically? I tried implementing something like that in Python 3, but it seems it's a bit too complicated for me (I've only had a simple Python course for one semester).
Yes.
I will use the Python language to explain how that can be - it is perhaps the language where metaclasses are most palpable (at least it is the language with the majority of questions involving metaclasses here).
So, in an OOP language where classes are first-class objects - that is, they are themselves objects, and all the rules that apply to other objects apply to classes as well; they just happen to be the "template" for other kinds of objects - these classes are themselves instances of "something". This something is the "metaclass" - which is just a word to designate the "class of a class".
And it happens that this "metaclass", the class of a class, is an object as well, just like other classes. As such, it too is an instance of a class - and this class of the metaclass is what can be named the "metametaclass".
Now, bringing it down to a concrete example in Python - by default, the metaclass for classes in Python is type. That is, type is the class whose instances are classes.
The language does have ingenious mechanisms that make it practical at some points to inherit a class from type, and then work with custom metaclasses.
So, what is the class of type? That will be the "metametaclass" for most (or all) classes in Python.
And as it goes, the class of type is type itself. Yes - it is a circular reference, without which the object hierarchy of the language could not be bootstrapped. I don't know if other languages do it differently - but in Python it is quite plain to see:
>>> class A:
... pass
...
>>> type(A)
<class 'type'>
>>> type(type(A))
<class 'type'>
>>> type(type(A)) is type(A)
True
Now, besides working like that as a concept, type actually can be used as a metametaclass, and there are mechanisms of the language that can be tweaked by inheriting from type for that purpose. (Normally it is inherited to be customized as an "ordinary" metaclass.)
By coincidence, I had a use case for a "metametaclass" exemplified this very week, in a question here - by customizing Python's type __call__ method, instead of __new__ or __init__, and using that custom class as the "metaclass for a metaclass", one can actually control how those methods are called and which parameters are passed to them when an ordinary class is created.
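A minimal sketch of that arrangement (all names here are illustrative):
class MetaMeta(type):
    # runs whenever one of its instances - a metaclass - is called,
    # i.e. whenever an ordinary class is being created
    def __call__(mcls, name, bases, namespace, **kwargs):
        print(f"creating class {name!r}")
        return super().__call__(name, bases, namespace, **kwargs)

class Meta(type, metaclass=MetaMeta):
    pass

class Ordinary(metaclass=Meta):  # triggers MetaMeta.__call__
    pass

print(type(Ordinary))        # <class '__main__.Meta'>
print(type(type(Ordinary)))  # <class '__main__.MetaMeta'>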

Two classes differing only in base class

I have come across a problem where I need two classes that will have identical implementation and the only difference between them will be different name and base class. What is a reasonable way of doing this?
One obvious solution is to violate DRY and copy the implementations like this:
class FooA(BaseA):
    def frobnicate(self):
        print("frob")

class FooB(BaseB):
    def frobnicate(self):
        print("frob")
You can use multiple inheritance to implement interfaces and common functionality in a single mixin class. Given the clear desire to frobnicate in many classes, just implement a frobnicator. Python resolves attributes left to right along the method resolution order, so mixins go left-most.
class Frobnicator(object):
    def frobnicate(self):
        print("frob")

class FooA(Frobnicator, BaseA):
    pass

class FooB(Frobnicator, BaseB):
    pass
Note that mixins usually do not implement their own __init__ - that's the job of the base class.
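A quick check of the lookup order (BaseA here is just a stand-in base class):
class BaseA:
    def __init__(self):
        self.ready = True  # state set up by the base, not by the mixin

class FooA(Frobnicator, BaseA):
    pass

foo = FooA()      # __init__ resolves to BaseA, frobnicate to the mixin
foo.frobnicate()  # frob
print([c.__name__ for c in FooA.__mro__])
# ['FooA', 'Frobnicator', 'BaseA', 'object']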
