Let's for example define a class and a function
class class1(object):
"""description of class"""
pass
def fun2(x,y):
x.test=1;
return x.value+y
then define a class instance and run it as a local variable in the function
z=class1()
z.value=1
fun2(z,y=2)
However, if you try to run the code
z.test
a result 1 would be returned.
That was, though the attribute to x was done inside the fun2() locally, it extended to class instance x globally as well. This seemed to violate the first thing one learn about the python function, the argument stays local unless being defined nonlocal or global.
How could this happen? Why the attribute to class inside a function extend outside the function.
I have even stranger example:
def fun3(a):
b=a
b.append(3)
mya = [1]
fun3(mya)
print(mya)
[1, 3]
>
I "copy" the array to a local variable and when I change it, the global one changes as well.
The problem is that the parameters are not passed by a value (basically as a copy of the values). In python they are passed by reference. In C terminology the function gets a pointer to the memory location. It's much faster that way.
Some languages will not let you to play with private attributes of an instance, but in Python it's your responsibility to make sure that does not happen. One other rule of OOP is that you should change the internal state of an instance just by calling its methods.
But you change the value directly.
Python is very flexible and allows you to do even the bad things. But it does not push you.
I always argue to have always at least vague understanding of the underlaying structure of any higher level language (memory model, how the variables are passed etc.). There is another argument for having some C/C++ knowledge. Most of the higher level languages are written in them or at least are inspired by them. A C++ programmer would see clearly what is going on.
Related
I have an application running in Python 3.9.4 where I store class objects in sets (along with many other kinds of objects). I'm getting non-deterministic behavior even when PYTHONHASHSEED=0 because class objects get non-deterministic hash codes. I assume that's because class objects' hash codes come from their addresses in memory.
For example, here are two runs of a little test program, where Before and Equation are classes:
print(hash(Before), hash(Equation), hash(int))
304555224 304593057 271715397
print(hash(Before), hash(Equation), hash(int))
326601328 293027788 273337413
How can I get Python to generate deterministic hash values for class objects?
Is there a metaclass or something that I could monkey-patch so that all class objects, even int, get a hash function that I specify?
Hash for classes is deterministic within the same process . Yes, in cPython it is memory based - but then you can't simply "move" a class object to another memory address using Python code.
If you happen to use some serialization/de-serialization transforms with the classes, the de-serialized objects will ordinarily be new objects, distinct from the original ones, and therefore will hash differently.
For the note: I could not reproduce the behavior you stated in the question: on the same process, the hashes for the class objects will be the same.
If you are calculating the hashes in different processes, though, the will differ. So, although you don't mention multiprocessing there, I assume that is your working case.
Then, indeed, implementing __hash__ and __eq__ proper methods on the metaclass can allow you a stable, across process, hashing - but you can't do that with built-in classes such as int: those are built in native code and can't be changed on the Python side. On the other hand, despite the hash number shown being different for these built-in classes, whatever you are using to serialize/deserialize your classes (that is what Python does for communicating data across processes, even if you don't do any explicit de/serializing) .
Then we come to, while it is straightforward to add __eq__ and __hash__ methods to a metaclass to your classes, it'd be better to ensure that on deserializing, it would always yield the same object (with the same ID). hash stability, as you put it, could possibly ensure you have always the same class, but it would depend on how you'd write your code: it is a bit tricky to retrieve the object instance that is already inside a set, if you check positively for containship of another instance that matches it - the most straightfoward way would be building a identity-dictionary out of a set, and then use the value:
my_registry_dict = {element: element for element in my_registry_set}
my_class = my_registry_dict[incoming_class]
With this in mind, we can have a custom metaclass that not only add __eq__ and __hash__- and you have to pick what elements of the classes you will want to compare for equality - class.__qualname__ can be a simple and functional attribute to use - but also customize the __new__ method so that upon de-serializing the same class a second time will always re-use the first class object defined in the current process (i.e.: ensuring the "singleton" behavior Python classes enjoy in non-corner cases like yours seems to be)
class Meta(type):
registry = {}
def __new__(mcls, name, bases, namespace):
cls = super().__new__(mcls, name, bases, namespace)
if cls not in mcls.registry:
mcls.registry[cls] = cls
else:
# reuse the previously created class
cls = mcls.registry[cls]
return cls
def __hash__(cls):
# when working with metaclasses, using the name `cls` instead of `self``
# helps reminding us that we are dealing with instances that are
# actually classes.
return hash(cls.__qualname__)
def __eq__(cls, other):
return cls.__qualname__ == other.__qualname__
I have been working a a very dense set of calculations. It all is to support a specific problem I have.
But the nature of the problem is no different than this. Suppose I develop a class called 'Matrix' that has the machinery to implement matrices. Instantiation would presumably take a list of lists, which would be the matrix entries.
Now I want to provide a multiply method. I have two choices. First, I could define a method like so:
class Matrix():
def __init__(self, entries)
# do the obvious here
return
def determinant(self):
# again, do the obvious here
return result_of_calcs
def multiply(self, b):
# again do the obvious here
return
If I do this, the call signature for two matrix objects, a and b, is
a.multiply(b)...
The other choice is a #staticmethod. Then, the definition looks like:
#staticethod
def multiply(a,b):
# do the obvious thing.
Now the call signature is:
z = multiply(a,b)
I am unclear when one is better than the other. The free-standing function is not truly part of the class definition, but who cares? it gets the job done, and because Python allows "reaching into an object" references from outside, it seems able to do everything. In practice they'll (the class and the method) end up in the same module, so they're at least linked there.
On the other hand, my understanding of the #staticmethod approach is that the function is now part of the class definition (it defines one of the methods), but the method gets no "self" passed in. In a way this is nice because the call signature is the much better looking:
z = multiply(a,b)
and the function can access all the instances' methods and attributes.
Is this the right way to view it? Are there strong reasons to do one or the other? In what ways are they not equivalent?
I have done quite a bit of Python programming since answering this question.
Suppose we have a file named matrix.py, and it has a bunch of code for manipulating matrices. We want to provide a matrix multiply method.
The two approaches are:
define a free:standing function with the signature multiply(x,y)
make it a method of all matrices: x.multiply(y)
Matrix multiply is what I will call a dyadic function. In other words, it always takes two arguments.
The temptation is to use #2, so that a matrix object "carries with it everywhere" the ability to be multiplied. However, the only thing it makes sense to multiply it with is another matrix object. In such cases there are two equally good ways to do that, viz:
z=x.multiply(y)
or
z=y.multiply(x)
However, a better way to do it is to define a function inside the file that is:
multiply(x,y)
multiply(), as such, is a function any code using the 'library' expects to have available. It need not be associated with each matrix. And, since the user will be doing an 'import', they will get the multiply method. This is better code.
What I was wrongly confounding was two things that led me to the method attached to every object instance:
Functions which need to be generally available inside the file that should be
exposed outside it; and
Functions which are needed only inside the file.
multiply() is an example of type 1. Any matrix 'library' ought to likely define matrix multiplication.
What I was worried about was needing to expose all the 'internal' functions. For example, suppose we want to make externally available matrix add(), multiple() and invert(). Suppose, however, we did not want to make externally available - but needed inside - determinant().
One way to 'protect' users is to make determinant a function (a def statement) inside the class declaration for matrices. Then it is protected from exposure. However, nothing stops a user of the code from reaching in if they know the internals, by using the method matrix.determinant().
In the end it comes down to convention, largely. It makes more sense to expose a matrix multiply function which takes two matrices, and is called like multiply(x,y). As for the determinant function, instead of 'wrapping it' in the class, it makes more sense to define it as __determinant(x) at the same level as the class definition for matrices.
You can never truly protect internal methods by their declaration, it seems. The best you can do is warn users. the "dunder" approach gives warning 'this is not expected to be called outside the code in this file'.
How can you test whether your function is getting [1,2,4,3] or l?
That might be useful to decide whether you want to return, for example, an ordered list or replace it in place.
For example, if it gets [1,2,4,3] it should return [1,2,3,4]. If it gets l, it should link the ordered list to l and do not return anything.
You can't tell the difference in any reasonable way; you could do terrible things with the gc module to count references, but that's not a reasonable way to do things. There is no difference between an anonymous object and a named variable (aside from the reference count), because it will be named no matter what when received by the function; "variables" aren't really a thing, Python has "names" which reference objects, with the object utterly unconcerned with whether it has named or unnamed references.
Make a consistent API. If you need to have it operate both ways, either have it do both things (mutate in place and return the mutated copy for completeness), or make two distinct APIs (one of which can be written in terms of the other, by having the mutating version used to implement the return new version by making a local copy of the argument, passing it to the mutating version, then returning the mutated local copy).
Pass-by-value semantics are easy to implement in an interpreter (for, say, your run-of-the-mill imperative language). For each scope, we maintain an environment that maps identifiers to their values. Processing a function call involves creating a new environment and populating it with copies of the arguments.
This won't work if we allow arguments that are passed by reference. How is this case typically handled?
First, your interpreter must check that the argument is something that can be passed by reference – that the argument is something that is legal in the left-hand side of an assignment statement. For example, if f has a single pass-by-reference parameter, f(x) is okay (since x := y makes sense) but f(1+1) is not (1+1 := y makes no sense). Typical qualifying arguments are variables and variable-like constructs like array indexing (if a is an array for which 5 is a legal index, f(a[5]) is okay, since a[5] = y makes sense).
If the argument passes that check, it will be possible for your interpreter to determine while processing the function call which precise memory location it refers to. When you construct the new environment, you put a reference to that memory location as the value of the pass-by-reference parameter. What that reference concretely looks like depends on the design of your interpreter, particularly on how you represent variables: you could simply use a pointer if your implementation language supports it, but it can be more complex if your design calls for it (the important thing is that the reference must make it possible for you to retrieve and modify the value contained in the memory location being referred to).
while your interpreter is interpreting the body of a function, it may have to treat pass-by-referece parameters specially, since the enviroment does not contain a proper value for it, just a reference. Your interpreter must recognize this and go look what the reference points to. For example, if x is a local variable and y is a pass-by-reference parameter, computing x+1 and y+1 may (depending on the details of your interpreter) work differently: in the former, you just look up the value of x, and then add one to it; in the latter, you must look up the reference that y happens to be bound to in the environment and go look what value is stored in the variable on the far side of the reference, and then you add one to it. Similarly, x = 1 and y = 1 are likely to work differently: the former just goes to modify the value of x, while the latter must first see where the reference points to and modify whatever variable or variable-like thing (such as an array element) it finds there.
You could simplify this by having all variables in the environment be bound to references instead of values; then looking up the value of a variable is the same process as looking up the value of a pass-by-reference parameter. However, this creates other issues, and it depends on your interpreter design and on the details of the language whether that's worth the hassle.
Background: I am programming a .NET compiler (very similar to C#) for a school project. One of the features I am currently trying to add is tailcall recursion within methods.
More info: In CIL, the "this" is passed into instance methods as if it were just another argument. So, accessing the first argument of a static method, you would emit ldarg.0, but accessing the first argument of an instance method, you would emit ldarg.1, and accessing "this" in an instance method you would emit ldarg.0. (Instance methods are even more similar to extension methods than I ever imagined.)
Question: Can you set "this" using starg.0 without any side effects?
Why this is in question: Whether or not a method is an instance method is set with the MethodBuilder, which is a bit of a black box. Although "this" seems just like any other argument, for all I know some JIT compilers keep track of "this" separately and change their behavior depending on this value. If there are side effects when you set "this" in an instance method, then how can I avoid them?
You may want to have a look at how F# implements tail-call.
You can extract this as a local variable. This way you will know that you can set it safely. (I hope I understand your question correctly)