Error with scikit-learn module in AWS Lambda - python-3.x

I'm using AWS Lambda to host a scikit-learn model. I can do this successfully with a model built in Python 3.6, but I get the following error with one built in Python 3.7:
[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'sklearn.__check_build._check_build'
___________________________________________________________________________
Contents of /opt/python/lib/python3.7/site-packages/sklearn/__check_build:
__init__.py __pycache__ _check_build.cpython-37m-darwin.so
setup.py
___________________________________________________________________________
It seems that scikit-learn has not been built correctly.
If you have installed scikit-learn from source, please do not forget
to build the package before using it: run `python setup.py install` or
`make` in the source directory.
If you have used an installer, please check that it is suited for your
Python version, your operating system and your platform.
I created my sklearn layer by uploading a zipped copy of the library, built by pip-installing it inside a virtual environment.
Does anyone know what I'm doing wrong? Has anyone been able to successfully install an sklearn layer for Python 3.7 in AWS Lambda?
I uploaded my zip file here: https://github.com/aos226/Sklearn_AWSLambda
Thanks for your help!
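One detail worth noting in the directory listing from the error above: _check_build.cpython-37m-darwin.so is a macOS (darwin) binary, while Lambda runs on Amazon Linux, so a layer zipped from a Mac virtual environment carries incompatible C extensions. Below is a minimal sketch of building the layer from Linux (manylinux) wheels instead, using pip's cross-platform flags; the exact platform/ABI tags and the python/ directory layout follow Lambda's documented layer convention but are assumptions here, not something confirmed in the thread:
# fetch prebuilt Linux wheels only; never compile against the local (macOS) platform
pip install \
    --platform manylinux1_x86_64 \
    --implementation cp \
    --python-version 37 \
    --abi cp37m \
    --only-binary=:all: \
    --target python/lib/python3.7/site-packages \
    scikit-learn
zip -r sklearn-layer.zip python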

Related

Install pywin32 package in google colab or kaggle notebook environment

The pywin32 package was required as part of the environment setup for a pix2pix implementation codebase; pywin32 is used to access the Win32 API from Python. I tried to set up the environment in Google Colab and got the following error during pywin32 installation:
ERROR: Could not find a version that satisfies the requirement pywin32
(from versions: none) ERROR: No matching distribution found for
pywin32
I encountered a similar issue with the following message while trying the same in Kaggle:
ERROR: Could not find a version that satisfies the requirement pywin32
ERROR: No matching distribution found for pywin32
The same issue occurred when I tried my local Python environment (Python 3.6.10) on my Mac.
I also attempted to install pywin32 from source, using the latest tag build-300 as suggested for Python 3.5+. No luck: installation terminated because the winreg dependency could not be found, and the following message was shown.
ModuleNotFoundError: No module named 'winreg'
Likewise, I tried fake-winreg, but no luck at all. I checked the platform in Google Colab with print(sys.platform); it shows linux. Please advise if there is any workaround to install the pywin32 package in Colab, or a resolution to any of the issues reported in the steps above. Thank you in advance.
Note:
The issue can be replicated by simply running pip install pywin32 in a native Python environment, or !pip install pywin32 in a Colab or Kaggle environment.
Unfortunately you can't install it on Linux: pywin32 is a set of extension modules for accessing the Windows C and COM APIs from Windows Python. Its own description reads:
Python extensions for Microsoft Windows. Provides access to much of the Win32 API, the ability to create and use COM objects, and the Pythonwin environment.
(Both Google Colab and Kaggle notebooks run on Linux.)
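If code that depends on pywin32 must still be importable on Linux, one common pattern is to guard the Windows-only import behind a platform check. A generic sketch, not from this thread (win32api is one of the modules pywin32 provides):
import sys

if sys.platform == "win32":
    import win32api  # Windows-only module provided by pywin32
else:
    win32api = None  # on Linux (Colab/Kaggle), stub out or skip the Win32 features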

Unable to import numpy 1.19.1 in AWS Lambda: No module named 'numpy.core._multiarray_umath'

I am unable to import numpy 1.19.1 with Python 3.8 on AWS Lambda.
I am using the following dependencies:
pandas 1.1.0
pyarrow 1.0.0
numpy 1.19.1
psycopg2 2.8.5
Because I work in a Windows environment, I created an EC2 Linux instance, installed Python 3.8, and downloaded all the required libraries, then added them to the project. But the moment I try to import pandas I get the following:
[ERROR] ImportError: Unable to import required dependencies:
numpy:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy c-extensions failed.
- Try uninstalling and reinstalling numpy.
- If you have already done that, then:
1. Check that you expected to use Python3.8 from "/var/lang/bin/python3.8",
and that you have no directories in your PATH or PYTHONPATH that can
interfere with the Python and numpy version "1.18.2" you're trying to use.
2. If (1) looks fine, you can open a new issue at
https://github.com/numpy/numpy/issues. Please include details on:
- how you installed Python
- how you installed numpy
- your operating system
- whether or not you have multiple versions of Python installed
- if you built from source, your compiler versions and ideally a build log
- If you're working with a numpy git repository, try `git clean -xdf`
(removes all files not under version control) and rebuild numpy.
Note: this error has many possible causes, so please don't comment on
an existing issue about this - open a new one instead.
Original error was: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "/var/task/src/py38-lib-test.py", line 28, in py38test
    import pandas
  File "/tmp/lib/pandas/__init__.py", line 16, in <module>
    raise ImportError(END RequestId: 07762380-1fc4)
Lastly, I noticed AWS Lambda provides a layer with numpy and scikit. I tried removing my numpy copy, keeping the rest, and adding that layer to the function, but the same error occurs.
Thanks in advance for your comments.
I used the layer provided by Klayers to solve the problem.
Suppose you're running Python 3.8 in the us-east-1 region. According to this Klayers document, you can use arn:aws:lambda:us-east-1:770693421928:layer:Klayers-p38-numpy:9 as your layer, after which you can run import numpy in the Lambda function.
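For reference, attaching such a layer from the command line might look like this (the function name is a placeholder; update-function-configuration is the standard AWS CLI call for changing a function's layers):
aws lambda update-function-configuration \
    --function-name my-function \
    --layers arn:aws:lambda:us-east-1:770693421928:layer:Klayers-p38-numpy:9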
AWS Lambda functions don't work this way. If you open the pandas package you'll see it bundles the numpy package with it, but those bundled binaries won't work on Lambda.
The easy solution is to first download the required packages separately from this site, choosing the builds that match your Python version and target environment, unzip them, and add them to your project directory. Then create a .zip of your project and deploy it to your AWS Lambda function. It will work this way.
You can refer to this site to follow the complete procedure.
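A sketch of that procedure using pip to fetch Linux-compatible (manylinux) wheels directly, rather than browsing for them by hand; the platform/ABI tags are assumptions, and the package versions mirror those listed in the question:
pip download \
    --platform manylinux1_x86_64 \
    --implementation cp \
    --python-version 38 \
    --abi cp38 \
    --only-binary=:all: \
    numpy==1.19.1 pandas==1.1.0
# each .whl is a zip archive: unpack them into the project root, then
zip -r deployment.zip .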
Is your EC2 instance an Amazon Linux 2 machine? You could also try building and running a Docker image for Amazon Linux 2 to get Python libs compatible with the environment your Lambda needs, by volume-mounting to your host.
Something similar to docker-lambda:
https://github.com/lambci/docker-lambda/tree/master/python3.8
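Using that build image, installing the dependencies into a host directory might look like this (a requirements.txt mirroring the versions listed in the question is an assumption):
docker run --rm \
    -v "$PWD":/var/task \
    lambci/lambda:build-python3.8 \
    pip install -r requirements.txt -t .
The container compiles and installs the libraries against Amazon Linux, and the volume mount leaves the results in the host directory, ready to be zipped.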
I had the same issue: I tried packaging all the libs with my base code, and tried a custom Lambda layer separating the numpy and pandas libs. Nothing worked.
Then I used the default AWS layers. Among these, AWS provides layers such as AWSSDKPandas, CodeGuru, and Lambda Insights. The AWSSDKPandas layer is packaged with the pandas libs and other dependencies such as numpy.
So I removed the numpy dependency from my base package and added AWSSDKPandas as a Lambda layer. It worked well.

Creating a .exe file from a Python package for Windows 10

I implemented a package with a GUI based on PySide2. The package has the following structure:
Repository
    Function
    GUI_class
    Main_GUI.py
That is, Main_GUI.py uses classes and functions from the GUI_class and Function folders. Now I want to create an executable version of it for Windows 10. I tried PyInstaller for this, with the following command:
(conda env)F:\Repository> Pyinstaller Main_GUI.py
But when I run the *.exe file in the dist folder, it gives me the following error:
ModuleNotFoundError: No module named 'pkg_resources.py2_warn'
[15728] Failed to execute script pyi_rth_pkgre
Would you please help me convert this Python package to a *.exe? Thank you in advance.
These are the versions of my dependencies:
Python 3.6.10
PyInstaller 3.6
PySide2 5.13.2
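For this particular pkg_resources.py2_warn error, a commonly reported workaround (an assumption on my part, not confirmed in this thread) is to tell PyInstaller to bundle the module explicitly as a hidden import:
(conda env)F:\Repository> pyinstaller --hidden-import pkg_resources.py2_warn Main_GUI.py
Alternatively, upgrading PyInstaller beyond 3.6 is reported to resolve it, since later releases account for this setuptools change.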

ClobberError when trying to install the nltk_data package using conda?

I am trying to install the nltk_data package into my environment natlang using conda, with the following command:
(natlang) C:\Users\asus>conda install -c conda-forge nltk_data
I receive the following errors:
Verifying transaction: failed

CondaVerificationError: The package for nltk_data located at C:\Users\asus\Anaconda3\pkgs\nltk_data-2017.10.22-py_0 appears to be corrupted. The path 'lib/nltk_data/corpora/propbank/frames/con.xml' specified in the package manifest cannot be found.

ClobberError: This transaction has incompatible packages due to a shared path.
packages: conda-forge::nltk_data-2017.10.22-py_0, conda-forge::nltk_data-2017.10.22-py_0
path: 'lib/nltk_data/corpora/nombank.1.0/readme'

ClobberError: This transaction has incompatible packages due to a shared path.
packages: conda-forge::nltk_data-2017.10.22-py_0, conda-forge::nltk_data-2017.10.22-py_0
path: 'lib/nltk_data/corpora/nombank.1.0/readme-dictionaries'

ClobberError: This transaction has incompatible packages due to a shared path.
packages: conda-forge::nltk_data-2017.10.22-py_0, conda-forge::nltk_data-2017.10.22-py_0
path: 'lib/nltk_data/corpora/nombank.1.0/readme-nombank-proposition-structure'
I am working on Anaconda 3, Python 3.6.5, Windows 10 Enterprise.
Can someone please tell me why this error is occurring and how I can fix it?
Background: I originally wanted to use punkt in one of my programs using the code lines:
import nltk_data
nltk.download()
This would open the NLTK downloader, and after installing all the packages, including punkt, and running the program again, I would still encounter the following error:
LookupError:
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
I tried rerunning nltk.download() and nltk.download('punkt') a couple of times with no change. So I decided to simply install the nltk_data package into my environment, on the assumption that if the package is installed in the env itself, I won't have to use the nltk.download function to use punkt.
Summarizing, I have the following two questions:
If I install the nltk_data package into my env, do I still need to use the nltk.download function in my code? If yes, how do I resolve the LookupError?
If installing into the env is enough, then how do I resolve the ClobberError?
(ps: I apologize if this sounds stupid, I am very new to machine learning and working with python in general.)
The nltk_data repository is a collection of zip files and XML metadata. Usually, it is not installed through packaging tools such as conda or pip.
But there is a utility from conda-forge that tries to install nltk_data: https://github.com/conda-forge/nltk_data-feedstock
To use it, on the terminal/command prompt/console, first add the conda-forge channel:
conda config --add channels conda-forge
Then you shouldn't need the -c option, and just use:
conda install nltk_data
Please try the above and see whether that gets rid of the ClobberError.
This error is asking you to download a specific NLTK dataset called punkt:
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Running nltk.download() without specifying which dataset you want will bring up a tkinter GUI, which normally isn't possible if you are accessing your machine remotely without a GUI.
If you're unsure which resource you need, I would suggest using the popular collection:
import nltk
nltk.download('popular')
Answering question 2 first: there have been similar issues across Windows machines. It's better to use the nltk.download() function if you want to use punkt or a similar module.
1) The LookupError can easily be resolved. It was caused by a typo. Instead of
import nltk_data
it should be
import nltk.data
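To confirm that both the import and the downloaded resource are in place, a short sanity check can help (a sketch using nltk's standard data lookup; 'tokenizers/punkt' is the path where the downloader stores punkt):
import nltk
import nltk.data

nltk.download('punkt')                # fetch the tokenizer data if missing
nltk.data.find('tokenizers/punkt')    # raises LookupError if still not found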

Bazel extension file not found error when installing TensorFlow in Python 3?

I currently have an error while installing TensorFlow with Python 3. I had previously installed TensorFlow successfully with Python 2, after commenting out several sections for iOS/Android. I decided to copy over the exact TensorFlow files that worked for Python 2 and reconfigure the settings for Python 3. Is this correct? I figured the configuration would be overwritten when I reconfigure the same files, while the edited files I need for TensorFlow to work would still be present.
Here is the error:
error loading package 'bazel-tensorflow/external/bazel_tools/tools/build_defs/docker/testdata': Extension file not found. Unable to load package for '//tools/build_defs/docker:docker.bzl': BUILD file not found on package path.
