Update different dataframes in imported module from inside a class instance, depending on parameters sent to class instance - python-3.x

I believe my problem is adequately described in the picture; in short:
In the first function, I assign one of two imported dataframes as a param to be sent to the class instance.
In the class instance, this param becomes a dataframe held in a local variable, with no connection to the shared resource I want to update.
When I run a concatenate operation on that dataframe, the shared resource in the imported module is not updated; only the local copy in the class instance is.
How do I get around this? I cannot use a global declaration, since a variable cannot be both a parameter and global at the same time (that throws an error).
This is the part where I find Python quirky compared to most other languages. Any guidance or advice is highly appreciated. =)
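One way around it: since pd.concat returns a new dataframe, rebind the module attribute instead of a bare local name. A minimal sketch, assuming the shared dataframes live in a module named shared_data and that pandas is in use (both names are illustrative, not from the question):

import pandas as pd
import shared_data  # hypothetical module holding the two shared dataframes

class Updater:
    def __init__(self, which):
        # store the *name* of the dataframe attribute, not the dataframe itself
        self.which = which

    def append_rows(self, new_rows):
        current = getattr(shared_data, self.which)
        # pd.concat returns a new object, so write it back to the module;
        # every other importer of shared_data then sees the updated dataframe
        setattr(shared_data, self.which,
                pd.concat([current, new_rows], ignore_index=True))

Because the class stores the attribute name and reads/writes through the module object, there is no local copy to fall out of sync.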

Related

How can I get deterministic hash values for class objects?

I have an application running in Python 3.9.4 where I store class objects in sets (along with many other kinds of objects). I'm getting non-deterministic behavior even when PYTHONHASHSEED=0 because class objects get non-deterministic hash codes. I assume that's because class objects' hash codes come from their addresses in memory.
For example, here are two runs of a little test program, where Before and Equation are classes:
print(hash(Before), hash(Equation), hash(int))
304555224 304593057 271715397
print(hash(Before), hash(Equation), hash(int))
326601328 293027788 273337413
How can I get Python to generate deterministic hash values for class objects?
Is there a metaclass or something that I could monkey-patch so that all class objects, even int, get a hash function that I specify?
Hash for classes is deterministic within the same process. Yes, in CPython it is memory based - but then you can't simply "move" a class object to another memory address using Python code.
If you happen to use some serialization/deserialization transforms with the classes, the deserialized objects will ordinarily be new objects, distinct from the original ones, and will therefore hash differently.
For the record: I could not reproduce the behavior you state in the question: within the same process, the hashes for the class objects stay the same.
If you are calculating the hashes in different processes, though, they will differ. So, although you don't mention multiprocessing there, I assume that is your working case.
Then, indeed, implementing proper __hash__ and __eq__ methods on the metaclass can give you stable, across-process hashing - but you can't do that with built-in classes such as int: those are built in native code and can't be changed on the Python side. On the other hand, despite the hash number shown being different for these built-in classes, whatever you are using to serialize/deserialize your classes (which is what Python does for communicating data across processes, even if you don't do any explicit de/serializing) will resolve a built-in such as int back to the one-and-the-same class object in the receiving process, so its hash is consistent wherever it is actually used.
Which brings us to the point: while it is straightforward to add __eq__ and __hash__ methods to a metaclass for your classes, it would be better to ensure that deserializing always yields the same object (with the same ID). Hash stability, as you put it, can ensure you always get an equivalent class, but it depends on how you write your code: it is a bit tricky to retrieve the object that is already inside a set when all you have is another instance that merely compares equal to it - the most straightforward way is to build an identity dictionary out of the set, and then use the stored value:
my_registry_dict = {element: element for element in my_registry_set}
my_class = my_registry_dict[incoming_class]
With this in mind, we can have a custom metaclass that not only adds __eq__ and __hash__ (you have to pick which attribute of the classes to compare for equality - __qualname__ is a simple and functional choice) but also customizes __new__, so that deserializing the same class a second time always reuses the first class object defined in the current process (i.e., it ensures the "singleton" behavior Python classes enjoy outside of corner cases like yours):
class Meta(type):
    registry = {}

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        if cls not in mcls.registry:
            mcls.registry[cls] = cls
        else:
            # reuse the previously created class
            cls = mcls.registry[cls]
        return cls

    def __hash__(cls):
        # when working with metaclasses, using the name `cls` instead of `self`
        # helps remind us that we are dealing with instances that are
        # actually classes
        return hash(cls.__qualname__)

    def __eq__(cls, other):
        return cls.__qualname__ == other.__qualname__
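A small usage sketch (the class name is illustrative): because the hash now depends only on the qualified name, it is reproducible across processes - and across runs too, once PYTHONHASHSEED is pinned as in the question, since string hashing is what remains seed-dependent:

class Before(metaclass=Meta):
    pass

# the class hashes exactly like its qualified name, in every process
assert hash(Before) == hash(Before.__qualname__)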

Python class instance changed through a local function variable

Let's for example define a class and a function
class class1(object):
    """description of class"""
    pass

def fun2(x, y):
    x.test = 1
    return x.value + y
then create a class instance and pass it to the function, where it becomes a local variable
z = class1()
z.value = 1
fun2(z, y=2)
However, if you then evaluate
z.test
the result 1 is returned.
That is, although the attribute was set on x locally, inside fun2(), it reached the class instance globally as well. This seems to violate the first thing one learns about Python functions: an argument stays local unless declared nonlocal or global.
How can this happen? Why does an attribute set on a class instance inside a function extend outside the function?
I have even stranger example:
def fun3(a):
    b = a
    b.append(3)

mya = [1]
fun3(mya)
print(mya)
[1, 3]
I "copy" the array to a local variable and when I change it, the global one changes as well.
The problem is that the parameters are not passed by value (basically as a copy of the values). In Python they are passed as references to objects - in C terminology, the function gets a pointer to the memory location rather than a copy. It's much faster that way. The consequence is that mutating the object through that reference is visible to the caller, while rebinding the parameter name to a new object is not.
Some languages will not let you play with the private attributes of an instance, but in Python it's your responsibility to make sure that does not happen. One rule of OOP is that you should change the internal state of an instance only by calling its methods.
But here you change the value directly.
Python is very flexible and allows you to do even the bad things - but it does not push you to do them.
I always argue for having at least a vague understanding of the underlying structure of any higher-level language (the memory model, how variables are passed, etc.). That is another argument for having some C/C++ knowledge: most higher-level languages are written in them, or at least inspired by them. A C++ programmer would see clearly what is going on.
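The distinction is easy to demonstrate. A minimal sketch (function names are illustrative) contrasting mutation, which the caller sees, with rebinding, which the caller does not:

def mutate(x):
    x.append(99)   # changes the one shared list object

def rebind(x):
    x = [99]       # only rebinds the local name; the caller is unaffected

items = [1, 2]
mutate(items)
print(items)   # [1, 2, 99]
rebind(items)
print(items)   # still [1, 2, 99]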

Puppet Include vs Class and Best Practices

When should I be using an include vs a class declaration? I am exploring creating a profile module right now, but am struggling with methodology and how I should lay things out.
A little background, I'm using the puppet-labs java module which can be found here.
My ./modules/profile/manifests/init.pp looks like this:
class profile {
  ## Hiera Lookups
  $java_version = hiera('profile::jdk::package')

  class { 'java':
    package => $java_version,
  }
}
This works fine, but I know that I can also remove the class {'java': ...} block of the code and instead use include java. My question relates to two things. One, if I wanted to use an include statement for whatever reason, how could I still pass the package version from Hiera to it? Second, is there a preferred method of doing this? Is include something I really shouldn't be using, or are there advantages and disadvantages to each method?
My long-term goal will be building out profile-like modules for my environment. Likely I would have a default profile that applies to all of my servers, and then profiles for different application loadouts. I could include the profiles into a role and apply things to my individual nodes at that level. Does this make sense?
Thanks!
When should I be using an include vs a class declaration?
Where a class declares another, internal-only class that belongs to the same module, you can consider using a resource-like class declaration. That leverages your knowledge of the implementation details of the module, as you need to be able to prove that no other declaration of the class in question will be evaluated before the resource-like one. If ever that constraint is violated, catalog building will fail.
Under all other circumstances, you should use include or one of its siblings, require and contain.
One, if I wanted to use an include statement for whatever reason, how
could I still pass the package version from hiera to it?
Exactly the same way you would specify any other class parameter via Hiera. I already answered that for you.
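For instance, reusing the package parameter from the question's own resource-like declaration, a minimal sketch would place the value in the Hiera YAML data under the automatic-binding key:

java::package: "openjdk-8-jdk"

after which the profile shrinks to:

class profile {
  include java
}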
Second, is
there a preferred method of doing this?
Yes, see above.
Is the include something I
really shouldn't be using, or are there advantages and disadvantages
to each method?
The include is what you should be using. This is your default, with require and contain as alternatives for certain situations. The resource-like declaration syntax seemed good to the Puppet team when they first introduced it, in Puppet 2.6, along with parameterized classes themselves. But it turns out that that syntax introduced deep design problems into the language, and it has been a source of numerous bugs and headaches. Automatic data binding was introduced in Puppet 3 in part to address many of those, allowing you to assign values to class parameters without using resource-like declarations.
The resource-like syntax has the single advantage -- if you want to consider it one -- that the parameter values are expressed directly in the manifest. Conventional Puppet wisdom holds that it is better to separate data from code, however, so as to avoid needing to modify manifests as configuration requirements change. Thus, expressing parameter values directly in the manifest is a good idea only if you are confident that they will never change. The most significant category of such cases is when a class has read data from an external source (i.e. looked it up via Hiera), and wants to pass those values on to another class.
The resource-like syntax has the great disadvantage that if a resource-like declaration of a given class is evaluated anywhere during the construction of a catalog for a given target node, then it must be the first declaration of that class that is evaluated. In contrast, any number of include-like declarations of the same class can be evaluated, whether instead of or in addition to a resource-like declaration.
Classes are singletons, so multiple declarations have no more effect on the target node than a single declaration. Allowing them is extremely convenient. Evaluation order of Puppet manifests is notoriously hard to predict, however, so if there is a resource-like declaration of a given class somewhere in the manifest set, it is very difficult in the general case to ensure that it is the first declaration of that class that is evaluated. That difficulty can be managed in the special case I described above. This falls into the more general category of evaluation-order dependencies, and you should take care to ensure that your manifest set is free of those.
There are other issues with the resource-like syntax, but none as significant as the evaluation-order dependency.
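A minimal sketch of the failure mode (the class name is illustrative): if the include happens to be evaluated first, the later resource-like declaration makes catalog building fail.

# evaluated first, perhaps from some unrelated part of the manifest set
include myapp

# evaluated later: compilation fails with a duplicate-declaration error,
# because Class[Myapp] has already been declared
class { 'myapp':
  version => '1.2.3',
}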
Clarification with respect to automated data binding
Automated data binding, mentioned above, associates keys identifying class parameters with corresponding values for those parameters. Compound values are supported if the back end supports them, which the default YAML back end in fact does. Your comments on this answer suggest that you do not yet fully appreciate these details, and in particular that you do not recognize the significance of keys identifying (whole) class parameters.
I take your example of a class that could on one hand be declared via this resource-like declaration:
class { 'elasticsearch':
  config => { 'cluster.name' => 'clustername', 'node.name' => 'nodename' }
}
To use an include-like declaration instead, we must provide a value for the class's "config" parameter in the Hiera data. The key for this value will be elasticsearch::config (<fully-qualified classname> :: <parameter name>). The associated value is wanted puppet-side as a hash (a.k.a. "associative array", a.k.a. "map"), so that's how it is specified in the YAML-format Hiera data:
elasticsearch::config:
  "cluster.name": "clustername"
  "node.name": "nodename"
The hash nature of the value is clearer because there is more than one entry. If you're unfamiliar with YAML, then it would probably be worth your while to at least skim a primer, such as the one at yaml.org.
With that data in place, we can now declare the class in our Puppet manifests simply via
include 'elasticsearch'
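If you want to confirm what automatic data binding will see, Puppet 4 and later ship a lookup subcommand; a hypothetical invocation for the key above:

puppet lookup elasticsearch::config --node agent01.example.com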

How are Chef attributes stored internally

I know where we can define the Chef attributes, the attribute types, and their precedence levels. I just want to understand how they are stored internally.
Suppose I declare an attribute
default[:app][:install] = "/etc/app"
1) How is it stored internally? Is it a tree structure (hierarchy) in the node object, or hashmaps, or a list of variables in the node object?
2) Also, in most cookbooks I see that attributes are declared in 2 or 3 levels, something like the above. I don't understand whether that is a standard or a best practice. Are there any guidelines for the way attributes have to be declared? Does it have something to do with their internal storage? Can't I declare the attribute as
default[:appinstall]= "/etc/app"
and access it as below in my recipe?
node[:appinstall]
Just four Mashes (a Mash is a subclass of Hash which does the string-vs-symbol key fixups) - one each for the default, normal, override, and automatic precedence groups. When you access the merged view via node['foo'], a Chef::Node::Attribute object traverses all four in parallel until it finds a leaf value.
What you have shown is correct for setting and using attributes, though string keys are preferred over symbols. You should also in general scope your attributes with the name of the cookbook like:
default['mycookbook']['appinstall'] = '/etc/app'
This will reduce the chances of collisions with other cookbooks.
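Put together, a minimal sketch (the cookbook name and resource are illustrative):

# attributes/default.rb
default['mycookbook']['appinstall'] = '/etc/app'

# recipes/default.rb - read the merged view back through the node object
directory node['mycookbook']['appinstall'] do
  recursive true
end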

how to use parametrized classes to reduce code base

I wrote puppet manifests and I use puppet to deploy my system.
I am now refactoring manifests in order to make it maintainable.
One of sub systems is tomcat with webapplications.
I have ~10 webapps. Each of those has almost the same procedure to deploy.
For now I use classes. 10 files - almost identical.
When I tried to use a parametrized class, Puppet let me instantiate it just once.
Then I tried to create 'empty' classes which inherit from the webapp class.
That does not work either, because Puppet complains that the parameters are not passed to the parent class.
I do not see any way to abstract the code. How can I do it?
I would like to achieve:
node {
  class { "webapp::first":  param1 => "one" }
  class { "webapp::second": param1 => "two" }
}
where first and second are applications using the same recipes.
I know there is define, but the recipe is pretty big, and even if it were possible I find a class more readable.
You can use parameters in your classes, but defines are more what you want. Quoting the official documentation:
Classes and defined types are created similarly, but they are used very differently.
Defined types are used to define reusable objects which will have multiple instances on a given host, so they cannot include any resources that will only have one instance. For instance, multiple uses of the same define cannot create the same file.
see http://docs.puppetlabs.com/guides/language_guide.html#resource-collections
Try to use a user-defined type; classes are singletons by nature.
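A minimal sketch of that refactoring (the resource names and body are illustrative): the shared deployment steps move into one define, which - unlike a class - can be declared any number of times:

define webapp ($param1) {
  # ... the shared tomcat webapp deployment steps,
  # parametrized by $param1 and the instance's $title ...
}

node 'appserver' {
  webapp { 'first':  param1 => 'one' }
  webapp { 'second': param1 => 'two' }
}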
