python (3.7) dataclass for self referenced structure [duplicate] - python-3.x

This question already has answers here:
How do I type hint a method with the type of the enclosing class?
(7 answers)
Closed 3 years ago.
class Node:
    def append_child(self, node: Node):
        if node != None:
            self.first_child = node
            self.child_nodes += [node]
How do I do node: Node? Because when I run it, it says name 'Node' is not defined.
Should I just remove the : Node and instance check it inside the function?
But then how could I access node's properties (which I would expect to be instance of Node class)?
I don't know how implement type casting in Python, BTW.

"self" references in type checking are typically done using strings:
class Node:
    def append_child(self, node: 'Node'):
        if node != None:
            self.first_child = node
            self.child_nodes += [node]
This is described in the "Forward references" section of PEP-0484.
Please note that this doesn't do any type-checking or casting. This is a type hint which Python (normally) disregards completely.[1] However, third party tools (e.g. mypy) use type hints to do static analysis on your code and can generate errors before runtime.
Also, starting with Python 3.7, you can implicitly convert all of your type hints to strings within a module by using from __future__ import annotations (and in Python 4.0, this will be the default).
[1] The hints are introspectable, so you could use them to build some kind of runtime checker using decorators or the like if you really wanted to, but Python doesn't do this by default.
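For example, a minimal sketch of that introspection (nothing here is enforced; it only reads the stored hints back):
import typing

class Node:
    def append_child(self, node: 'Node'):
        ...

# The raw annotation is stored as the string 'Node' ...
print(Node.append_child.__annotations__)         # {'node': 'Node'}
# ... but it can be resolved back to the class once Node exists:
print(typing.get_type_hints(Node.append_child))  # {'node': <class '__main__.Node'>}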

Python 3.7 (opt-in) and Python 3.10 (default) onwards
PEP 563 introduced postponed evaluations, stored in __annotations__ as strings. A user can enable this through the __future__ directive:
from __future__ import annotations
This makes it possible to write:
class C:
    a: C
    def foo(self, b: C):
        ...
Starting in Python 3.10 (release planned 2021-10-04), this behaviour will be default.
Edit 2020-11-15: Originally it was announced to be mandatory starting in Python 4.0, but now it appears this will be the default already in Python 3.10, which is expected 2021-10-04. This surprises me as it appears to be a violation of the promise in __future__ that this backward compatibility would not be broken until Python 4.0. Maybe the developers consider that 3.10 is 4.0, or maybe they have changed their minds. See also Why did __future__ MandatoryRelease for annotations change between 3.7 and 3.8?.
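A small sketch of what the postponed evaluation looks like at runtime (the annotation stays a plain string until you resolve it, e.g. with typing.get_type_hints):
from __future__ import annotations
import typing

class C:
    a: C

print(C.__annotations__)         # {'a': 'C'} -- kept as a string
print(typing.get_type_hints(C))  # {'a': <class '__main__.C'>}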

In Python 3.7 and later you can use a dataclass. You can also annotate the dataclass.
In this particular example Node references itself and if you run it you will get
NameError: name 'Node' is not defined
To overcome this error you have to include:
from __future__ import annotations
It must be the first statement in a module (only a docstring or comments may precede it). In Python 4.0 and above you won't have to include this import.
from __future__ import annotations
from dataclasses import dataclass
@dataclass
class Node:
    value: int
    left: Node
    right: Node

    @property
    def is_leaf(self) -> bool:
        """Check if node is a leaf"""
        return not self.left and not self.right
Example:
node5 = Node(5, None, None)
node25 = Node(25, None, None)
node40 = Node(40, None, None)
node10 = Node(10, None, None)
# balanced tree
node30 = Node(30, node25, node40)
root = Node(20, node10, node30)
# unbalanced tree
node30 = Node(30, node5, node40)
root = Node(20, node10, node30)
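If you would rather not rely on the __future__ import, a variant sketch (my own adjustment, not part of the answer above) is to quote the forward reference; and since left and right may be None in the example, Optional with a default makes the leaf nodes shorter to write:
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        """Check if node is a leaf"""
        return not self.left and not self.right

node5 = Node(5)       # leaves no longer need explicit None arguments
print(node5.is_leaf)  # True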

If you just want an answer to the question, go read mgilson's answer.
mgilson's answer provides a good explanation of how you should work around this limitation of Python. But I think it's also important to have a good understanding of why this doesn't work, so I'm going to provide that explanation.
Python is a little different from other languages. In Python, there's really no such thing as a "declaration." As far as Python is concerned, code is just code. When you import a module, Python creates a new namespace (a place where global variables can live), and then executes each line of the module from top to bottom.
def foo(args): code is just a compound statement that bundles a bunch of source code together into a function and binds that function to the name foo. Similarly, class Bar(bases): code creates a class, executes all of the code immediately (inside a separate namespace which holds any class-level variables that might be created by the code, particularly including methods created with def), and then binds that class to the name Bar. It has to execute the code immediately, because all of the methods need to be created immediately.
Because the code gets executed before the name has been bound, you can't refer to the class at the top level of the code. It's perfectly fine to refer to the class inside of a method, however, because that code doesn't run until the method gets called.
(You might be wondering why we can't just bind the name first and then execute the code. It turns out that, because of the way Python implements classes, you have to know which methods exist up front, before you can even create the class object. It would be possible to create an empty class and then bind all of the methods to it one at a time with attribute assignment (and indeed, you can manually do this, by writing class Bar: pass and then doing def method1():...; Bar.method1 = method1 and so on), but this would result in a more complicated implementation, and be a little harder to conceptualize, so Python does not do this.)
To summarize in code:
class C:
    C  # NameError: C doesn't exist yet.
    def method(self):
        return C  # This is fine. By the time the method gets called, C will exist.
C  # This is fine; the class has been created by the time we hit this line.
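To illustrate the parenthetical two paragraphs up, here is a small sketch of building a class by binding methods after the fact (legal, but not how the class statement actually works):
class Bar:
    pass

def method1(self):
    return "hello from method1"

# bind the function onto the already-created (empty) class
Bar.method1 = method1

print(Bar().method1())  # hello from method1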

Related

TypeError: 'ABCMeta' object is not subscriptable

Here is my code:
from collections.abc import Sequence
from typing import TypeVar
T = TypeVar('T')
def first(a: Sequence[T]) -> T:
    return a[0]
In my understanding, I can pass any Sequence-like object as parameter into first function, like:
first([1,2,3])
and it returns 1
However, it raises a TypeError: 'ABCMeta' object is not subscriptable. What is going on here? How can I make it work so that I have a function, using the typing module, which can take the first element whatever its type?
UPDATE
If I use from typing import Sequence, it runs alright. What is the difference between from collections.abc import Sequence and from typing import Sequence?
Two things.
The first is that the typing module will not raise errors at runtime if you pass arguments that do not respect the type you indicated. The typing module helps with general clarity and with tools such as IntelliSense.
Regarding the error that you encounter: it is probably because of the Python version you are using. Try upgrading to Python >= 3.9.
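A sketch of both spellings; collections.abc.Sequence only became subscriptable at runtime in Python 3.9 (PEP 585), while the typing alias works on older 3.x versions too:
from collections.abc import Sequence  # subscriptable at runtime on Python 3.9+
from typing import TypeVar

T = TypeVar('T')

def first(a: Sequence[T]) -> T:
    return a[0]

print(first([1, 2, 3]))  # 1

# On older Python 3 versions, use the alias instead:
# from typing import Sequence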

How to implement a Meta Path Importer on Python 3

I'm struggling to refactor some working import-hook-functionality that served us very well on Python 2 the last years... And honestly I wonder if something is broken in Python 3? But I'm unable to see any reports of that around so confidence in doing something wrong myself is still stronger! Ok. Code:
Here is a cooked down version for Python 3 with PathFinder from importlib.machinery:
import sys
from importlib.machinery import PathFinder
class MyImporter(PathFinder):
    def __init__(self, name):
        self.name = name
    def find_spec(self, fullname, path=None, target=None):
        print('MyImporter %s find_spec fullname: %s' % (self.name, fullname))
        return super(MyImporter, self).find_spec(fullname, path, target)
sys.meta_path.insert(0, MyImporter('BEFORE'))
sys.meta_path.append(MyImporter('AFTER'))
print('sys.meta_path:', sys.meta_path)
# import an example module
import json
print(json)
So you see: I insert an instance of the class right in front and one at the end of sys.meta_path. Turns out ONLY the first one triggers! I never see any calls to the last one. That was different in Python 2!
Looking at the implementation in six I thought, well THEY need to know how to do this properly! ... 🤨 I don't see this working either! When I try to step in there or just put some prints... Nada!
After all: if I actually put my importer first in the sys.meta_path list, trigger on a certain import and patch my module (which all works fine), it still gets overridden by the other importers in the list!
* How can I prevent that?
* Do I need to do that? It seems dirty!
I have been heavily studying the meta_path in Python 3.8.
The entire import mechanism has been moved from C to Python and manifests itself as sys.meta_path, which contains 3 importers. The Python import machinery is cleverly stupid, i.e. uncomplex.
The source code of the entire Python import machinery is to be found in importlib/.
meta_path[1] pulls the importer from frozen something: bytecode=?
__import__ is still the central hook called when you "import mymod"
__import__() first checks if the module has already been imported, in which case it retrieves it from sys.modules
if that doesn't work it calls find_spec() on each "spec finder" in meta_path.
If the "spec finder" is successful it returns a "spec" needed by the next stage
If none of them find it, the import fails
sys.meta_path is an array of "spec finders"
0: is the builtin spec finder: (sys, _sre)
1: is the frozen import lib: It imports the importer (importlib)
2: is the path finder and it finds both library modules: (os, re, inspect)
and your application modules based on sys.path
So regarding the question above, it shouldn't be happening. If your spec finder is first in the meta_path and it returns a valid spec then the module is found, and remaining entries in sys.meta_path won't even be asked.
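A rough sketch (not the real implementation) of the lookup order described above, assuming every entry on sys.meta_path defines find_spec():
import sys

def find_spec_roughly(fullname):
    # 1. already imported? then sys.modules wins and no finder is asked
    if fullname in sys.modules:
        return sys.modules[fullname].__spec__
    # 2. otherwise ask each spec finder in order; the first non-None spec wins
    for finder in sys.meta_path:
        spec = finder.find_spec(fullname, None)
        if spec is not None:
            return spec
    return None  # a real import would raise ModuleNotFoundError here

print(find_spec_roughly("json"))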

How do I implement a global "oracle" in python? [duplicate]

I've run into a bit of a wall importing modules in a Python script. I'll do my best to describe the error, why I run into it, and why I'm tying this particular approach to solve my problem (which I will describe in a second):
Let's suppose I have a module in which I've defined some utility functions/classes, which refer to entities defined in the namespace into which this auxiliary module will be imported (let "a" be such an entity):
module1:
def f():
    print a
And then I have the main program, where "a" is defined, into which I want to import those utilities:
import module1
a=3
module1.f()
Executing the program will trigger the following error:
Traceback (most recent call last):
  File "Z:\Python\main.py", line 10, in <module>
    module1.f()
  File "Z:\Python\module1.py", line 3, in f
    print a
NameError: global name 'a' is not defined
Similar questions have been asked in the past (two days ago, d'uh) and several solutions have been suggested, however I don't really think these fit my requirements. Here's my particular context:
I'm trying to make a Python program which connects to a MySQL database server and displays/modifies data with a GUI. For cleanliness sake, I've defined the bunch of auxiliary/utility MySQL-related functions in a separate file. However they all have a common variable, which I had originally defined inside the utilities module, and which is the cursor object from MySQLdb module.
I later realised that the cursor object (which is used to communicate with the db server) should be defined in the main module, so that both the main module and anything that is imported into it can access that object.
End result would be something like this:
utilities_module.py:
def utility_1(args):
    code which references a variable named "cur"
def utility_n(args):
    etcetera
And my main module:
program.py:
import MySQLdb, Tkinter
db=MySQLdb.connect(#blahblah) ; cur=db.cursor() #cur is defined!
from utilities_module import *
And then, as soon as I try to call any of the utilities functions, it triggers the aforementioned "global name not defined" error.
A particular suggestion was to have a "from program import cur" statement in the utilities file, such as this:
utilities_module.py:
from program import cur
#rest of function definitions
program.py:
import Tkinter, MySQLdb
db=MySQLdb.connect(#blahblah) ; cur=db.cursor() #cur is defined!
from utilities_module import *
But that's cyclic import or something like that and, bottom line, it crashes too. So my question is:
How in hell can I make the "cur" object, defined in the main module, visible to those auxiliary functions which are imported into it?
Thanks for your time and my deepest apologies if the solution has been posted elsewhere. I just can't find the answer myself and I've got no more tricks in my book.
Globals in Python are global to a module, not across all modules. (Many people are confused by this, because in, say, C, a global is the same across all implementation files unless you explicitly make it static.)
There are different ways to solve this, depending on your actual use case.
Before even going down this path, ask yourself whether this really needs to be global. Maybe you really want a class, with f as an instance method, rather than just a free function? Then you could do something like this:
import module1
thingy1 = module1.Thingy(a=3)
thingy1.f()
If you really do want a global, but it's just there to be used by module1, set it in that module.
import module1
module1.a=3
module1.f()
On the other hand, if a is shared by a whole lot of modules, put it somewhere else, and have everyone import it:
import shared_stuff
import module1
shared_stuff.a = 3
module1.f()
… and, in module1.py:
import shared_stuff
def f():
    print shared_stuff.a
Don't use a from import unless the variable is intended to be a constant. from shared_stuff import a would create a new a variable initialized to whatever shared_stuff.a referred to at the time of the import, and this new a variable would not be affected by assignments to shared_stuff.a.
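A minimal sketch of that pitfall (the two hypothetical files are shown together for brevity):
# shared_stuff.py
a = 0

# main.py
import shared_stuff
from shared_stuff import a  # copies the current value into a new name

shared_stuff.a = 3
print(shared_stuff.a)  # 3
print(a)               # still 0 -- the from-imported name was never updated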
Or, in the rare case that you really do need it to be truly global everywhere, like a builtin, add it to the builtin module. The exact details differ between Python 2.x and 3.x. In 3.x, it works like this:
import builtins
import module1
builtins.a = 3
module1.f()
As a workaround, you could consider setting environment variables in the outer layer, like this.
main.py:
import os
os.environ['MYVAL'] = str(myintvariable)
mymodule.py:
import os
myval = None
if 'MYVAL' in os.environ:
    myval = os.environ['MYVAL']
As an extra precaution, handle the case when MYVAL is not defined inside the module.
This post is just an observation of Python behaviour I encountered. Maybe the advice you read above doesn't work for you if you did the same thing I did below.
Namely, I have a module which contains global/shared variables (as suggested above):
#sharedstuff.py
globaltimes_randomnode=[]
globalist_randomnode=[]
Then I had the main module which imports the shared stuff with:
import sharedstuff as shared
and some other modules that actually populated these arrays. These are called by the main module. When exiting these other modules I can clearly see that the arrays are populated. But when reading them back in the main module, they were empty. This was rather strange for me (well, I am new to Python). However, when I change the way I import the sharedstuff.py in the main module to:
from sharedstuff import *
it worked (the arrays were populated).
Just sayin'
A function uses the globals of the module it's defined in. Instead of setting a = 3, for example, you should be setting module1.a = 3. So, if you want cur available as a global in utilities_module, set utilities_module.cur.
A better solution: don't use globals. Pass the variables you need into the functions that need it, or create a class to bundle all the data together, and pass it when initializing the instance.
The easiest solution to this particular problem would have been to add another function within the module that would have stored the cursor in a variable global to the module. Then all the other functions could use it as well.
module1:
cursor = None
def setCursor(cur):
    global cursor
    cursor = cur
def method(some, args):
    global cursor
    do_stuff(cursor, some, args)
main program:
import module1
cursor = get_a_cursor()
module1.setCursor(cursor)
module1.method()
Since globals are module specific, you can add the following function to all imported modules, and then use it to:
* Add singular variables (in dictionary format) as globals for those modules
* Transfer your main module's globals to it
addglobals = lambda x: globals().update(x)
Then all you need to do to pass on the current globals is:
import module
module.addglobals(globals())
Since I haven't seen it in the answers above, I thought I would add my simple workaround, which is just to add a global_dict argument to the function requiring the calling module's globals, and then pass the dict into the function when calling; e.g:
# external_module
def imported_function(global_dict=None):
    print(global_dict["a"])
# calling_module
a = 12
from external_module import imported_function
imported_function(global_dict=globals())
>>> 12
The OOP way of doing this would be to make your module a class instead of a set of unbound methods. Then you could use __init__ or a setter method to set the variables from the caller for use in the module methods.
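A minimal sketch of that OOP approach (class and method names are illustrative, not from the original code):
class DBUtilities:
    """Bundle the cursor with the utility functions instead of using a global."""
    def __init__(self, cur):
        self.cur = cur  # set once by the caller

    def utility_1(self, query):
        self.cur.execute(query)  # every method can reach the cursor
        return self.cur.fetchall()

# in the main program:
# db = MySQLdb.connect(...)
# utils = DBUtilities(db.cursor())
# rows = utils.utility_1("SELECT ...")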
Update
To test the theory, I created a module and put it on pypi. It all worked perfectly.
pip install superglobals
Short answer
This works fine in Python 2 or 3:
import inspect
def superglobals():
    _globals = dict(inspect.getmembers(
        inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
    return _globals
save as superglobals.py and employ in another module thusly:
from superglobals import *
superglobals()['var'] = value
Extended Answer
You can add some extra functions to make things more attractive.
def superglobals():
    _globals = dict(inspect.getmembers(
        inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
    return _globals

def getglobal(key, default=None):
    """
    getglobal(key[, default]) -> value

    Return the value for key if key is in the global dictionary, else default.
    """
    _globals = dict(inspect.getmembers(
        inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
    return _globals.get(key, default)

def setglobal(key, value):
    _globals = superglobals()
    _globals[key] = value

def defaultglobal(key, value):
    """
    defaultglobal(key, value)

    Set the value of global variable `key` if it is not otherwise set.
    """
    _globals = superglobals()
    if key not in _globals:
        _globals[key] = value
Then use thusly:
from superglobals import *
setglobal('test', 123)
defaultglobal('test', 456)
assert(getglobal('test') == 123)
Justification
The "python purity league" answers that litter this question are perfectly correct, but in some environments (such as IDAPython, which is basically single threaded with a large globally instantiated API) it just doesn't matter as much.
It's still bad form and a bad practice to encourage, but sometimes it's just easier. Especially when the code you are writing isn't going to have a very long life.

Python: Instantiate a class from strings in file with arguments

I have several classes imported on a code but I need to instantiate only those classes that are listed on a text file. So I have something like this
from module1 import c1
from module2 import c2
...
and in the text file I have a list of only those classes I want to instantiate like
c1()
c2(True)
...
so I want to read the file lines to a list (classes) and do something like
for i in classes:
    classes_list.append(i)
so that each element of the list is an instantiated class. I tried doing this based on other solutions I found here
for i in classes:
    classes_list.append(globals()[i])
but I always get this error
KeyError: 'c1()'
or
KeyError: 'c2(True)'
Any ideas how something like this could be possible?
You are implementing a mini-language that expresses how to call certain functions. This can get difficult, but it turns out python already implements its own mini language with the eval function. With eval, python will compile and execute python expressions.
This is considered risky for stuff coming from anonymous and potentially malicious users on the network but may be a reasonable solution for people who have some level of trust. For instance, if the people writing these files are in your organization and could mess with you a thousand ways anyway, you may be able to trust them with this. I implemented a system where people could write fragments of test code and my system would wrap it all up and turn it into a test suite. No problem because those folks already had complete access to the systems under test.
module1.py
def c1(p=1):
    return p
def c2(p=1):
    return p
def c3(p=1):
    return p
test.py
import module1
my_globals = {
    'c1': module1.c1,
    'c2': module1.c2,
    'c3': module1.c3,
}
test = ["c1()",
        "c2(p=c1())",
        "c3('i am a string')",
        "c1(100)"]
for line in test:
    print(line.strip() + ':', eval(line, my_globals))
result
c1(): 1
c2(p=c1()): 1
c3('i am a string'): i am a string
c1(100): 100
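One extra hedge worth mentioning (it narrows what the expressions can reach, but it is not a real security boundary): you can put an explicit, empty __builtins__ entry into the globals dict so that only the names you list are visible to eval:
import module1

my_globals = {
    '__builtins__': {},  # hide the builtins; only the names below are visible
    'c1': module1.c1,
    'c2': module1.c2,
    'c3': module1.c3,
}
print(eval("c1(100)", my_globals))  # 100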

restart python (or reload modules) in py.test tests

I have a (python3) package that has completely different behaviour depending on how it's init()ed (perhaps not the best design, but rewriting is not an option). The module can only be init()ed once, a second time gives an error. I want to test this package (both behaviours) using py.test.
Note: the nature of the package makes the two behaviours mutually exclusive, there is no possible reason to ever want both in a singular program.
I have several test_xxx.py modules in my test directory. Each module will init the package in the way it needs (using fixtures). Since py.test starts the python interpreter once, running all test-modules in one py.test run fails.
Monkey-patching the package to allow a second init() is not something I want to do, since there is internal caching etc that might result in unexplained behaviour.
* Is it possible to tell py.test to run each test module in a separate python process (thereby not being influenced by inits in another test-module)?
* Is there a way to reliably reload a package (including all sub-dependencies, etc.)?
* Is there another solution (I'm thinking of importing and then unimporting the package in a fixture, but this seems excessive)?
To reload a module, try using reload() from the importlib library.
Example:
from importlib import reload
import some_lib
#do something
reload(some_lib)
Also, launching each test in a new process is viable, but multiprocessed code is kind of painful to debug.
Example
import some_test
from multiprocessing import Manager, Process
#create new return value holder, in this case a list
manager = Manager()
return_value = manager.list()
#create new process
process = Process(target=some_test.some_function, args=(arg, return_value))
#execute process
process.start()
#finish and return process
process.join()
#you can now use your return value as if it were a normal list,
#as long as it was assigned in your subprocess
Delete all your module imports and also your test imports that also import your modules:
import sys
for key in list(sys.modules.keys()):
    if key.startswith("your_package_name") or key.startswith("test"):
        del sys.modules[key]
You can use this as a fixture by configuring, in your conftest.py file, a fixture using the @pytest.fixture decorator.
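For example, a sketch of such a conftest.py fixture (the package name is a placeholder):
# conftest.py
import sys
import pytest

@pytest.fixture(autouse=True)
def fresh_modules():
    yield  # run the test first
    # afterwards, drop the package (and the test modules) from the module cache
    for key in list(sys.modules.keys()):
        if key.startswith("your_package_name") or key.startswith("test"):
            del sys.modules[key]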
I once had a similar problem, quite bad design though..
@pytest.fixture()
def module_type1():
    mod = importlib.import_module('example')
    mod._init(10)
    yield mod
    del sys.modules['example']

@pytest.fixture()
def module_type2():
    mod = importlib.import_module('example')
    mod._init(20)
    yield mod
    del sys.modules['example']

def test1(module_type1):
    pass

def test2(module_type2):
    pass
The example/__init__.py had something like this:
def _init(val):
    if 'sample' in globals():
        logger.info(f'example already imported, val: {sample}')
    else:
        globals()['sample'] = val
        logger.info(f'importing example with val : {val}')
output:
importing example with val : 10
importing example with val : 20
No clue as to how complex your package is, but if its just global variables, then this probably helps.
I have the same problem, and found three solutions:
reload(some_lib)
patch the SUT: as the imported method is a key and value in the SUT, you can patch the SUT. For example, if you use f2 of m2 in m1, you can patch m1.f2 instead of m2.f2 (see the sketch after this list).
import module, and use module.function.
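A small sketch of the patch-point idea from the second item with unittest.mock; m1 and m2 are the hypothetical modules from that example and are assumed to exist:
from unittest import mock

# Suppose m1.py contains:  from m2 import f2
# The test should patch the name m1 actually uses, i.e. m1.f2 (not m2.f2):
with mock.patch("m1.f2", return_value=42):
    ...  # code under test that goes through m1 now sees the fake f2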
