I have a project with a function foo in a module my_project.my_functions. I want to pickle that function in a way that I can unpickle it from somewhere else without requiring my_project to be importable. foo has no side effects and no dependencies outside the function body.
I'm using dill to pickle foo, but dill is saving it as a <function my_project.my_functions.foo>, and complains about the unknown my_project module when I try to unpickle it.
Any solution?
I solved it by recreating the function from its code object, giving it an empty globals dictionary.
In /my_project/module.py:
def f(n):
    return n + 1
In my_project, before pickling the function:
import dill
import types
import module

f = types.FunctionType(module.f.__code__, {})
with open("my_func.pkl", 'wb') as fs:
    dill.dump(f, fs)
Somewhere else:
import dill
with open("my_func.pkl", 'rb') as fs:
    f = dill.load(fs)
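If dill is not available on the loading side, the same trick can be sketched with only the standard library: marshal serializes the raw code object, and types.FunctionType rebuilds a function from it with an empty globals dict. This is a minimal sketch, not a drop-in replacement; marshal output is CPython-version-specific.

```python
import marshal
import types

def f(n):
    return n + 1

# Serialize only the code object; an empty globals dict is supplied on rebuild,
# so the function must not reference any names outside itself.
payload = marshal.dumps(f.__code__)

restored = types.FunctionType(marshal.loads(payload), {})
print(restored(1))  # 2
```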
I have features and a target variable from which I want to generate a decision tree. However, the code is throwing an error. Since export_graphviz with out_file did not raise an error, I figured there wouldn't be one for Source.from_file either, but there is.
import os
from graphviz import Source
from sklearn.tree import export_graphviz
f = open("C:/Users/julia/Desktop/iris_tree.dot", 'w')
f = open("C:/Users/julia/Desktop/iris_tree.dot", 'w')
export_graphviz(
    tree_clf,
    out_file=f,
    feature_names=sample2[0:2],
    class_names=sample2[5],
    rounded=True,
    filled=True
)
Source.from_file(f)
As noted in the docs, from_file accepts a string path, not a file object:
filename – Filename for loading/saving the source.
Just pass the path in, and make sure the file is closed first so the .dot source is flushed to disk before graphviz reads it:
from graphviz import Source
from sklearn.tree import export_graphviz

path = "C:/Users/julia/Desktop/iris_tree.dot"
with open(path, 'w') as f:
    export_graphviz(
        tree_clf,
        out_file=f,
        feature_names=sample2[0:2],
        class_names=sample2[5],
        rounded=True,
        filled=True
    )
Source.from_file(path)
The code below works as expected: it prints 5 random numbers.
import numpy as np
class test_class():
    def __init__(self):
        self.rand_nums = self.create_rand_num()

    def create_rand_num(self):
        numbers = np.random.rand(5)
        return numbers
myclass = test_class()
myclass.rand_nums
However, the following does not work; it raises NameError: name 'np' is not defined.
import numpy as np
from test.calc import create_rand_num
class test_class():
    def __init__(self):
        self.rand_nums = create_rand_num()
myclass = test_class()
myclass.rand_nums
# contents of calc.py in test folder:
def create_rand_num():
    print(np.random.rand(5))
But, this works:
from test.calc import create_rand_num
class test_class():
    def __init__(self):
        self.rand_nums = create_rand_num()
myclass = test_class()
myclass.rand_nums
# contents of calc.py in test folder:
import numpy as np
def create_rand_num():
    print(np.random.rand(5))
Why must I have 'import numpy as np' inside calc.py? I already have this import before my class definition. I am sure I am misunderstanding something here, but I was trying to follow the general rule to have all the import statements at the top of the main code.
What I find confusing is that when I say "from test.calc import create_rand_num," how does Python know whether "import numpy as np" is included at the top of calc.py or not? It must know somehow, because when I include it, the code works, but when I leave it out, the code does not work.
EDIT: After reading the response from @DeepSpace, I want to ask the following:
Suppose I have the following file.py module with contents listed as shown:
import numpy as np
import pandas as pd
import x as y
def myfunc():
    pass
So, if I have another file, file1.py, and in it I say from file import myfunc, do I get access to np, pd, and y? This is exactly what seems to be happening in my third example above.
In my third example, notice that np is NOT defined anywhere in the main file, it is only defined in calc.py file, and I am not importing * from calc.py, I am only importing create_rand_num. Why do I not get the same NameError error?
Python is not like C. Importing a module does not copy-paste its source into the importing file. It simply binds the module object to a name in the importer's namespace. import numpy as np in one file does not make np magically available in all other files.
You have to import numpy as np in every file where you want to use np.
Perhaps worthwhile reading: https://docs.python.org/3.7/reference/simple_stmts.html#the-import-statement
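A minimal, self-contained demonstration of the same point (the file and function names here are made up for illustration): a helper module does its own import of math, the importer only receives the names it explicitly imports, and math never appears in the importing file's namespace.

```python
import os
import sys
import tempfile
import textwrap

# Create a throwaway directory holding a small helper module
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "calc_demo.py"), "w") as fh:
    fh.write(textwrap.dedent("""
        import math  # calc_demo has its own namespace, so it imports what it needs

        def area(r):
            return math.pi * r * r
    """))

sys.path.insert(0, tmp)
from calc_demo import area

print(area(1.0))            # works: area() resolves math via calc_demo's globals
print("math" in globals())  # False: importing area did not pull math in here
```

The function carries a reference to its defining module's globals (its `__globals__`), which is why it keeps working even though the importer never imported math.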
This might have been answered before, but I could not find anything that addresses my issue.
So, I have 2 files.
|
|-- test.py
|-- test1.py
test1.py is as below
def fnc():
    return np.ndarray([1, 2, 3, 4])
I'm importing test1 from test and calling the function like
from test1 import *
x = fnc()
Now naturally I'm getting NameError: name 'np' is not defined.
I tried to write the import both in test and test1 as
import numpy as np
But still, I'm getting the error. This might be silly, but what exactly I'm missing?
Any help is appreciated. Thanks in advance.
Each Python module has its own namespace, so if a function in test1.py depends on numpy, you have to import numpy in test1.py:
# test1.py
import numpy as np

def fnc():
    return np.ndarray([1, 2, 3, 4])
If test.py doesn't directly use numpy, you don't have to import it again, ie:
# test.py
# NB: do NOT use 'from xxx import *' in production code, be explicit
# about what you import
from test1 import fnc
if __name__ == "__main__":
    result = fnc()
    print(result)
Now if test.py also wants to use numpy, it has to import it too; as I said, each module has its own namespace:
# test.py
# NB: do NOT use 'from xxx import *' in production code, be explicit
# about what you import
import numpy as np
from test1 import fnc
def other():
    return np.ndarray([3, 44, 5])

if __name__ == "__main__":
    result1 = fnc()
    print(result1)
    result2 = other()
    print(result2)
Note that if you are testing your code in a Python shell, just modifying the source and re-importing the module will not work: modules are only loaded once per process, and subsequent imports fetch the already-loaded module from the sys.modules cache. You have to either exit the shell and open a new one, or use importlib.reload.
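The caching behaviour is easy to demonstrate. The module name below is made up for the sketch, and sys.dont_write_bytecode is set so stale .pyc files cannot interfere with the demo:

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # keep the demo free of .pyc caching effects

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "reload_demo.py")
with open(path, "w") as fh:
    fh.write("VALUE = 1\n")

sys.path.insert(0, tmp)
import reload_demo
print(reload_demo.VALUE)        # 1

with open(path, "w") as fh:     # edit the source on disk
    fh.write("VALUE = 2\n")

import reload_demo              # no-op: fetched from the sys.modules cache
print(reload_demo.VALUE)        # still 1

importlib.reload(reload_demo)   # re-executes the module's source
print(reload_demo.VALUE)        # 2
```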
You may also need an __init__.py in the directory where these files live.
Try creating an empty __init__.py file like below in the directory where your .py files are present and see if it helps.
touch __init__.py
I try to serialize (dill) a list containing dill-able objects which is nested inside a dict. The dict itself is imported into my main script using importlib. Calling dill.dump() raises a TypeError: can't pickle SwigPyObject objects. Here is some code with which I managed to reproduce the error for more insight.
some_config.py located under config/some_config.py:
from tensorflow.keras.optimizers import SGD
from app.feature_building import Feature
config = {
    "optimizer": SGD(lr=0.001),
    "features": [
        Feature('method', lambda v: v + 1)
    ],
}
Here is the code which imports the config and tries to dill config["features"]:
import dill
import importlib.util
from config.some_config import config

spec = importlib.util.spec_from_file_location("undillable.config", "config/some_config.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
undillable_config = module.config

# Works perfectly fine
with open("dillable_config.pkl", "wb") as f:
    dill.dump(config["features"], f)

# Raises TypeError: can't pickle SwigPyObject objects
with open("undillable_config.pkl", "wb") as f:
    dill.dump(undillable_config["features"], f)
Now the part that made me wonder: when the config dict is imported with importlib, the error is raised, and after some debugging I found that not only config["features"] but also config["optimizer"] gets dilled. Using a normal import, however, works, and dill only tries to serialize config["features"].
So my question is why does dill try to serialize the whole dict if it is imported by importlib instead of only the feature-list and how may this error be fixed?
After reading the answer to this question I managed to get it working by avoiding importlib and instead import the config using __import__.
import os
import sys
import dill

filename = "config/some_config.py"
dir_name = os.path.dirname(filename)
if dir_name not in sys.path:
    sys.path.append(dir_name)
file = os.path.splitext(os.path.basename(filename))[0]
config_module = __import__(file)

# Works perfectly fine now
with open("dillable_config.pkl", "wb") as f:
    dill.dump(config_module.config["features"], f)
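An alternative that keeps importlib is the recipe from the importlib docs: register the module in sys.modules before executing it. With the module findable under its name, dill should be able to pickle objects from it by reference instead of serializing the whole module's globals (which is what appears to drag in the un-picklable optimizer). A sketch with made-up file contents:

```python
import importlib.util
import os
import sys
import tempfile

# Stand-in config module (contents made up for the sketch)
tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "some_config.py")
with open(path, "w") as fh:
    fh.write("config = {'features': [1, 2, 3]}\n")

spec = importlib.util.spec_from_file_location("some_config", path)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module   # register BEFORE exec_module, per the importlib docs
spec.loader.exec_module(module)

print(module.config["features"])  # [1, 2, 3]
```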
I am trying to put together a unittest to test whether my function that reads in big data files produces the correct result in the shape of a numpy array. However, these files and arrays are huge and cannot be typed in. I believe I need to save input and output files and test using them. This is what my test module looks like:
import numpy as np
from myFunctions import fun1
import unittest
class TestMyFunctions(unittest.TestCase):
    def setUp(self):
        self.inputFile1 = "input1.txt"
        self.inputFile2 = "input2.txt"
        self.outputFile = "output.txt"

    def test_fun1(self):
        m1 = np.genfromtxt(self.inputFile1)
        m2 = np.genfromtxt(self.inputFile2)
        R = np.genfromtxt(self.outputFile)
        self.assertEqual(fun1(m1, m2), R)

if __name__ == '__main__':
    unittest.main(exit=False)
I'm not sure if there is a better/neater way of testing huge results.
Edit:
Also getting an attribute error now:
AttributeError: TestMyFunctions object has no attribute '_testMethodName'
Update: AttributeError solved. 'def __init__()' is not allowed there; I changed it to 'def setUp()'!
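As an aside, assertEqual cannot compare numpy arrays: the comparison produces an element-wise array whose truth value is ambiguous, raising ValueError. numpy's own test helpers check shape and values within a tolerance. A minimal sketch with made-up data:

```python
import unittest

import numpy as np

class TestArrayResult(unittest.TestCase):
    def test_elementwise_sum(self):
        a = np.array([1.0, 2.0])
        b = np.array([3.0, 4.0])
        expected = np.array([4.0, 6.0])
        # assert_allclose verifies shape and per-element closeness;
        # self.assertEqual(a + b, expected) would raise ValueError here
        np.testing.assert_allclose(a + b, expected)

if __name__ == '__main__':
    unittest.main(exit=False)
```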