I am trying to load my saved model from s3 using joblib
import pandas as pd
import numpy as np
import json
import subprocess
import sqlalchemy
from sklearn.externals import joblib

ENV = 'dev'

def load_d2v(fname, env):
    model_name = fname
    if env == 'dev':
        try:
            model = joblib.load(model_name)
        except:
            s3_base_path = 's3://sd-flikku/datalake/doc2vec_model'
            path = s3_base_path + '/' + model_name
            command = "aws s3 cp {} {}".format(path, model_name).split()
            print('loading...' + model_name)
            subprocess.call(command)
            model = joblib.load(model_name)
    else:
        s3_base_path = 's3://sd-flikku/datalake/doc2vec_model'
        path = s3_base_path + '/' + model_name
        command = "aws s3 cp {} {}".format(path, model_name).split()
        print('loading...' + model_name)
        subprocess.call(command)
        model = joblib.load(model_name)
    return model

model_d2v = load_d2v('model_d2v_version_002', ENV)
But I get this error:
from sklearn.externals import joblib
ImportError: cannot import name 'joblib' from 'sklearn.externals' (C:\Users\prane\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\externals\__init__.py)
Then I tried installing joblib directly and using
import joblib
but it gave me this error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 8, in load_d2v_from_s3
File "/home/ec2-user/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/ec2-user/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
obj = unpickler.load()
File "/usr/lib64/python3.7/pickle.py", line 1088, in load
dispatch[key[0]](self)
File "/usr/lib64/python3.7/pickle.py", line 1376, in load_global
klass = self.find_class(module, name)
File "/usr/lib64/python3.7/pickle.py", line 1426, in find_class
__import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.externals.joblib'
Can you tell me how to solve this?
You should directly use
import joblib
instead of
from sklearn.externals import joblib
It looks like your existing pickle save file (model_d2v_version_002) encodes a reference to a module in a non-standard location: a joblib that lives at sklearn.externals.joblib rather than at the top level.
The current scikit-learn documentation only talks about a top-level joblib (e.g. in its 3.4.1 persistence example), but I do see a reference in someone else's old issue to a DeprecationWarning in scikit-learn version 0.21 about an older sklearn.externals.joblib variant going away:
Python37\lib\site-packages\sklearn\externals\joblib\__init__.py:15:
DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and
will be removed in 0.23. Please import this functionality directly
from joblib, which can be installed with: pip install joblib. If this
warning is raised when loading pickled models, you may need to
re-serialize those models with scikit-learn 0.21+.
'Deprecation' means marking something as inadvisable to rely upon, as it is likely to be discontinued in a future release (often, but not always, with a recommended newer way to do the same thing).
I suspect your model_d2v_version_002 file was saved from an older version of scikit-learn, and you're now using scikit-learn (aka sklearn) version 0.23+, which has totally removed the sklearn.externals.joblib variant. Thus your file can't be directly or easily loaded into your current environment.
But, per the DeprecationWarning, you can probably temporarily use an older scikit-learn version to load the file the old way once, then re-save it with the now-preferred way. Given the warning info, this would probably require scikit-learn version 0.21.x or 0.22.x, but if you know exactly which version your model_d2v_version_002 file was saved from, I'd try to use that. The steps would roughly be:
create a temporary working environment (or roll back your current working environment) with the older sklearn
do imports something like:
import sklearn.externals.joblib as extjoblib
import joblib
extjoblib.load() your old file as you'd planned, but then immediately re-joblib.dump() the file using the top-level joblib. (You likely want to use a distinct name, to keep the older file around, just in case.)
move/update to your real, modern environment, and import only the top-level joblib to use joblib.load() - no longer having any references to sklearn.externals.joblib in either your code or your stored pickle files.
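A rough sketch of that one-time conversion (the output filename is just an example; this assumes the temporary environment has scikit-learn 0.21.x or 0.22.x and that model_d2v_version_002 is already local):
# run inside the temporary environment (scikit-learn 0.21.x or 0.22.x)
import sklearn.externals.joblib as extjoblib  # old location, still importable here
import joblib                                 # modern top-level package

model = extjoblib.load('model_d2v_version_002')      # one-time load via the old path
joblib.dump(model, 'model_d2v_version_002_resaved')  # re-save with top-level joblib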
You can import joblib directly by installing it as a dependency and using import joblib; see the joblib documentation.
Maybe your code is outdated. For anyone who aims to use fetch_mldata in a handwritten-digit project: you should use fetch_openml instead. (link)
In old versions of sklearn:
from sklearn.externals import joblib
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')
In sklearn 0.23 (stable release):
import joblib
import numpy as np
from sklearn import datasets

dataset = datasets.fetch_openml("mnist_784")
features = np.array(dataset.data, 'int16')
labels = np.array(dataset.target, 'int')
For more info about the deprecation of fetch_mldata, see the scikit-learn docs.
None of the other answers worked for me; with a few small changes, this modification was OK for me:
import sklearn.externals as extjoblib
import joblib
For this error, I had to directly use the following, and it worked like a charm:
import joblib
Simple!
In case the call to joblib happens inside another .py program instead of your own (in such a case, even if you have installed joblib, it still raises the error from within the calling Python program unless you change its code, which I thought would be messy), I tried creating a hardlink:
(windows version)
Python> import joblib
then inside your sklearn path >......\Lib\site-packages\sklearn\externals
mklink /J ./joblib .....\Lib\site-packages\joblib
(you can run the above with a ! or % prefix, i.e. !mklink ... or %mklink ..., inside your Jupyter notebook, or use a Python OS command...)
This effectively creates a virtual folder for joblib within the "externals" folder.
Remarks:
Of course, to be more version resilient, your code should first check that the sklearn version is >= 0.23 before applying this (a sketch of such a check follows below).
This is an alternative to changing the sklearn version.
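A sketch of such a version guard (assuming the packaging module is available; it ships with setuptools):
import sklearn
from packaging import version

# the junction workaround only applies where sklearn.externals.joblib
# was removed (scikit-learn 0.23 and later)
needs_workaround = version.parse(sklearn.__version__) >= version.parse("0.23")
print(needs_workaround)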
When you get the error:
from sklearn.externals import joblib
it is because that import path is deprecated and belongs to older versions. For newer versions:
conda install -c anaconda scikit-learn (install using the "Anaconda Prompt")
import joblib (in the Jupyter Notebook)
I had the same problem. What I did not realize was that joblib was already installed!
So what you have to do is replace
from sklearn.externals import joblib
with
import joblib
and that is it.
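For completeness, a minimal save/load round trip with the top-level joblib (the model and filename here are only illustrative):
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

joblib.dump(clf, 'clf.joblib')    # save with top-level joblib
clf2 = joblib.load('clf.joblib')  # load it back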
After a long investigation, given my computer setup, I found that this was because an SSL certificate was required to download the dataset.
Related
I want to import a standard library (of a given version) in a Databricks notebook job. I do not want to install the library every time a job cluster is created for this job. I want to install the library in a DBFS location and import it directly from that location (by changing sys.path or something similar).
This is working in local:
I installed a library in a given path using:
pip install --target=customLocation library==major.minor
Appended the custom location to the sys.path variable:
sys.path.insert(0, 'customLocation')
When I import the library and check the location, I get the expected response:
import library
print(library.__file__)
#Output - customLocation\library\__init__.py
However, the same exact sequence does not work in Databricks:
I installed the library in DBFS location:
%pip install --target='dbfs/customFolder/' numpy==1.19.4
Appended to the sys.path variable:
sys.path.insert(0, 'dbfs/customFolder/')
Tried to find numpy version and file location:
import numpy
print(numpy.__version__)
print(numpy.__file__)
#Output - 1.21.4 (Databricks Runtime 10.4 Default Numpy)
#         /dbfs/customFolder/numpy/__init__.py
The customFolder has version 1.19.4, and the imported numpy shows that location, but the version number does not match. How exactly do imports work in Databricks to create this behaviour?
I also tried importing via importlib, and the result remains the same. Link: https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly
import importlib.util
import sys
spec = importlib.util.spec_from_file_location('numpy', '/dbfs/customFolder/')
module = importlib.util.module_from_spec(spec)
sys.modules['numpy'] = module
spec.loader.exec_module(module)
import numpy
print(numpy.__version__)
print(numpy.__file__)
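One thing worth noting: importlib.util.spec_from_file_location expects a file path, not a directory, so a sketch of that approach pointed at the package's __init__.py (assuming numpy 1.19.4 really does live under /dbfs/customFolder/) would look like:
import importlib.util
import sys

# point at the package's __init__.py, not the folder
spec = importlib.util.spec_from_file_location(
    'numpy', '/dbfs/customFolder/numpy/__init__.py')
module = importlib.util.module_from_spec(spec)
sys.modules['numpy'] = module   # register before exec so internal imports resolve
spec.loader.exec_module(module)

print(module.__version__)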
Related:
How to install Python Libraries in a DBFS Location and access them through Job Clusters (without installing it for every job)?
I was trying a Kaggle kernel on Bayesian hyperparameter optimization of random forests, and I couldn't import sklearn.gaussian_process.GaussianProcess. Please help this poor scikit-learn newbie.
from sklearn.gaussian_process import GaussianProcess as GP
error:
Traceback (most recent call last):
File "C:/Users/Develop/PycharmProjects/reinforcement recommandation system/BNP/bayesianoptimization-of-random-forest.py", line 24, in <module>
from sklearn.gaussian_process import GaussianProcess as GP
ImportError: cannot import name 'GaussianProcess' from 'sklearn.gaussian_process' (C:\Users\Develop\PycharmProjects\reinforcement recommandation system\lib\site-packages\sklearn\gaussian_process\__init__.py)
Process finished with exit code 1
Depending on whether you need the regressor or classifier:
from sklearn.gaussian_process import GaussianProcessRegressor as GP
from sklearn.gaussian_process import GaussianProcessClassifier as GP
Also, have a look at the different modules
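A minimal sketch of the modern regressor API (the data here is toy data for illustration):
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.array([[1.0], [3.0], [5.0]])  # toy training inputs
y = np.array([1.0, 9.0, 25.0])       # toy targets

gp = GaussianProcessRegressor().fit(X, y)
mean, std = gp.predict(np.array([[4.0]]), return_std=True)
print(mean, std)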
It seems that you have to use the old scikit-learn version 0.15-git.
The documentation for sklearn.gaussian_process.GaussianProcess is located here:
https://scikit-learn.org/0.15/modules/generated/sklearn.gaussian_process.GaussianProcess.html
I just had the same problem, but I will move on and try to work out whether sklearn.gaussian_process.GaussianProcessRegressor in the current version, scikit-learn 1.0.2, will work for my purposes.
I'm attempting to import nashpy:
import nashpy as nash
but I keep receiving the error message below, despite having already pip-installed nashpy:
File "G:\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\scipy\__init__.py", line 61, in <module>
from numpy._distributor_init import NUMPY_MKL # requires numpy+mkl
ImportError: cannot import name 'NUMPY_MKL'
>>>
Could you help me understand it?
If you look at the line which is causing the error, you'll see this:
from numpy._distributor_init import NUMPY_MKL # requires numpy+mkl
The comment on this line states the dependency: numpy+mkl (NumPy built with the Intel Math Kernel Library). This means that you installed numpy via pip, but scipy was installed from a precompiled archive, which expects numpy+mkl.
This problem can be easily solved by installing numpy+mkl from a .whl file from here.
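To check whether the NumPy you have installed was built against MKL, you can inspect its build configuration:
import numpy as np
np.show_config()  # lists the BLAS/LAPACK libraries NumPy was linked against;
                  # an MKL build mentions 'mkl' in this output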
The script normally works on a local machine, but it fails when uploaded to the server. The program fails on this line:
from keras.models import model_from_json
All versions of libraries on the laptop and server are the same: Python 3.6.5, Keras 2.2.4, Tensorflow 1.5.0, Numpy 1.14.3.
Error message:
(base) C:\classX5>python app.py
Using TensorFlow backend.
Traceback (most recent call last):
  File "app.py", line 8, in <module>
    from keras.models import model_from_json
  File "C:\Anaconda3\lib\site-packages\keras\__init__.py", line 3, in <module>
    from . import utils
  File "C:\Anaconda3\lib\site-packages\keras\utils\__init__.py", line 27, in <module>
    from .multi_gpu_utils import multi_gpu_model
  File "C:\Anaconda3\lib\site-packages\keras\utils\multi_gpu_utils.py", line 7, in <module>
    from ..layers.merge import concatenate
  File "C:\Anaconda3\lib\site-packages\keras\layers\__init__.py", line 4, in <module>
    from ..engine.base_layer import Layer
  File "C:\Anaconda3\lib\site-packages\keras\engine\__init__.py", line 8, in <module>
    from .training import Model
  File "C:\Anaconda3\lib\site-packages\keras\engine\training.py", line 21, in <module>
    from . import training_arrays
  File "C:\Anaconda3\lib\site-packages\keras\engine\training_arrays.py", line 8, in <module>
    from scipy.sparse import issparse
  File "C:\Anaconda3\lib\site-packages\scipy\__init__.py", line 119, in <module>
    from scipy._lib._ccallback import LowLevelCallable
  File "C:\Anaconda3\lib\site-packages\scipy\_lib\_ccallback.py", line 1, in <module>
    from . import _ccallback_c
ImportError: cannot import name '_ccallback_c'
All versions of libraries on the laptop and server are the same: Python 3.6.5, Keras 2.2.4, Tensorflow 1.5.0, Numpy 1.14.3. But NOT the scipy package. Just uninstall (pip3 uninstall scipy) and reinstall the scipy package. It solved the problem for many people.
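That is:
pip3 uninstall scipy
pip3 install scipy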
Posting here the same answer I posted on the linked question:
This is likely a compatibility issue between the SciPy module installed in your site-packages folder and the OS architecture you're running it on.
After digging in, to give the full background: first of all, SciPy relies on NumPy already being installed, because the SciPy wheel's setup.py file uses NumPy functionality to configure and install the wheel.
SciPy setup.py:
...
if __name__ == '__main__':
    from numpy.distutils.core import setup
    setup(**configuration(top_path='').todict())
Secondly, when just trying to use the wheel, if you run into this error, inspecting the wheel's files shows the reason: binary wheels have a naming convention where the shared object file (here called _ccallback_c.so) is instead named based on the architecture that the binary wheel supports. When /_lib/_ccallback.py tries to import the shared object by file name, it can't find it, hence this error (line 1 in /_lib/_ccallback.py): instead of being named _ccallback_c.so, it's called _ccallback_c.cpython-36m-x86_64-linux-gnu.so or another architecture variation:
from . import _ccallback_c
These file names may be an artifact of not having run the NumPy setup process yet, or something related to Cython; I'm not quite sure. But the easiest fix is to change the .whl extension to .zip, rename all the relevant .so files to drop the architecture snippet, then change .zip back to .whl, and it should be good to go.
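A hedged sketch of that rename trick in Python (the wheel filename is illustrative, not a real file):
import os
import shutil
import zipfile

wheel = 'scipy-1.0.0-cp36-cp36m-manylinux1_x86_64.whl'  # use your actual wheel

shutil.copy(wheel, 'scipy.zip')        # a .whl is just a zip archive
with zipfile.ZipFile('scipy.zip') as zf:
    zf.extractall('scipy_wheel')

# strip the architecture snippet:
# _ccallback_c.cpython-36m-x86_64-linux-gnu.so -> _ccallback_c.so
for root, _, files in os.walk('scipy_wheel'):
    for name in files:
        if '.cpython-' in name and name.endswith('.so'):
            plain = name.split('.cpython-')[0] + '.so'
            os.rename(os.path.join(root, name), os.path.join(root, plain))

# re-zip the tree and rename it back to .whl
shutil.make_archive('scipy_fixed', 'zip', 'scipy_wheel')
os.replace('scipy_fixed.zip', wheel)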
Following the tutorial:
http://www.pyimagesearch.com/2016/08/10/imagenet-classification-with-python-and-keras/#comment-419896
Using these files:
https://github.com/fchollet/deep-learning-models
I get two separate errors depending on how I execute it:
Running in PyCharm:
Using TensorFlow backend.
usage: test_imagenet.py [-h] -i IMAGE
test_imagenet.py: error: the following arguments are required: -i/--image
Running in cmd line:
C:\Users\AppData\Local\Programs\Python\Python35\Scripts>python deep-learning-models/test_imagenet.py --image deep-learning-models/images/dog.jpg
Traceback (most recent call last):
File "deep-learning-models/test_imagenet.py", line 2, in <module>
from keras.preprocessing import image as image_utils
ImportError: No module named keras.preprocessing
How do I resolve this?
It's best to solve this problem outside of the above script. Here is what you can try in your command-line environment to make sure it works outside your script:
>>> import keras
Using TensorFlow backend.
>>> keras.__version__
'1.2.1'
>>> keras.preprocessing
<module 'keras.preprocessing' from '/usr/local/lib/python2.7/dist-packages/keras/preprocessing/__init__.pyc'>
>>> from keras.preprocessing import image as image_utils
>>>
Make sure you have the latest version of keras installed. If you get the above working, then it could be an environment issue where the script is not able to find the keras package. However, if the above does not work, or works only partially, you will need to install keras again, removing it first:
$ pip install keras --user
Every dependency in a Python project needs to be installed using pip or easy_install, or from the source code. You will have to install the keras module as mentioned here.
This happened to me. It turned out I was working in a pyvenv that wasn't activated. Just run source bin/activate on Linux/Mac, or Scripts\activate.bat on Windows.
from keras.models import Sequential
from keras.preprocessing import image as image_utils
from keras.preprocessing.text import Tokenizer
import pandas as pd
from sklearn.model_selection import train_test_split