Unable to import Pandas in AWS Lambda - python-3.x

I am new to AWS Lambda and I want to run code on Lambda for a machine learning API. In summary, the functions I want to run on Lambda are: one that reads some CSV files into a pandas DataFrame and searches it, and another that runs some pickled machine learning models in response to requests from a Flask application. To do this, I need to import pandas, joblib and possibly scikit-learn, built so that they are compatible with Amazon Linux. I am using a Windows machine.
In general, I am taking the approach of using Lambda layers by uploading zip files. Since Lambda has a pre-built layer with SciPy and NumPy, I will not include them; including them would exceed Lambda's layer size limit anyway.
To be more specific, I have done the following:
Downloaded and extracted Linux-compatible versions of the libraries listed above. For example, from this link I downloaded "pandas-0.25.0-cp35-cp35m-manylinux1_x86_64.whl" and unzipped it into a folder.
The unzipped libraries are in the following directory:
lambda_layers\python\lib\python3.7\site-packages
They are zipped into a file and uploaded to an S3 bucket to create a layer.
I imported the packages:
import json
import boto3
import pandas as pd
I got the following error from Lambda:
{
"errorMessage": "Unable to import module 'lambda_function': C extension: No module named 'pandas._libs.tslibs.conversion' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.",
"errorType": "Runtime.ImportModuleError"
}
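One detail worth checking in the steps above: the tags in "pandas-0.25.0-cp35-cp35m-manylinux1_x86_64.whl" say the wheel was built for CPython 3.5 (cp35), while the layer path targets python3.7. Compiled extensions built for one CPython minor version (outside the stable abi3 ABI) will not load on another, which fits the "C extension ... not built" message. The helpers below are a rough sketch with hypothetical names (not part of pip) that read the tags from a wheel filename and compare them with a target runtime; they ignore finer points such as the glibc version behind manylinux tags.

```python
# Hypothetical helpers (not part of pip or AWS): parse a wheel filename
# ({name}-{version}[-{build}]-{python}-{abi}-{platform}.whl) and roughly check
# whether its tags fit a given CPython version on 64-bit Linux.


def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its tag components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:  # optional build tag present
        name, version, build, py, abi, plat = parts
    elif len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    # Each tag field may hold several dot-separated values, e.g. "py2.py3".
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": py.split("."),
        "abi": abi.split("."),
        "platform": plat.split("."),
    }


def is_compatible(filename: str, major: int, minor: int, arch: str = "x86_64") -> bool:
    """Rough check: can this wheel load on CPython major.minor on Linux/arch?"""
    tags = parse_wheel_filename(filename)
    if tags["platform"] == ["any"]:
        # Pure-Python wheel: only the Python tag matters.
        wanted = {f"py{major}", f"py{major}{minor}", f"cp{major}{minor}"}
        return any(t in wanted for t in tags["python"])
    if not any("linux" in p and p.endswith(arch) for p in tags["platform"]):
        return False
    if "abi3" in tags["abi"]:
        # Stable-ABI wheel built for cpXY also loads on later 3.x versions.
        prefix = f"cp{major}"
        return any(
            t.startswith(prefix)
            and t[len(prefix):].isdigit()
            and int(t[len(prefix):]) <= minor
            for t in tags["python"]
        )
    # Ordinary compiled wheel: the CPython tag must match exactly.
    return f"cp{major}{minor}" in tags["python"]


print(is_compatible("pandas-0.25.0-cp35-cp35m-manylinux1_x86_64.whl", 3, 7))  # prints False
```

For a python3.7 layer, you would look for a wheel tagged cp37-cp37m instead.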

The folder structure should be the standard one. You can also use Docker to create the zipped, Linux-compatible library and upload it as an AWS Lambda layer. Below are the tested commands to create the zipped library for an AWS Lambda layer:
Create and navigate to a directory:
$ mkdir aws1
$ cd aws1
Write the following commands into a Dockerfile and exit with CTRL + D:
$ cat > Dockerfile
FROM amazonlinux:2017.03
RUN yum -y install git \
python36 \
python36-pip \
zip \
&& yum clean all
RUN python3 -m pip install --upgrade pip \
&& python3 -m pip install boto3
You can give the image any name:
$ docker build -t pythn1/lambda .
Run the image:
$ docker run --rm -it -v ${PWD}:/var/task pythn1/lambda:latest bash
Specify the packages you want to zip in requirements.txt and exit with CTRL + D:
$ cat > requirements.txt
pandas
scikit-learn
You could try installing into the correct folder structure (python/lib/python3.6/site-packages/) here, but I have not tested that yet:
$ pip install -r requirements.txt -t /usr/lib/python3.6/dist-packages/
Navigate to the following directory:
$ cd /var/task
Create a zip file:
$ zip -r ./layers.zip /usr/lib/python3.6/dist-packages/
You should now see a layers.zip file in the aws1 folder. If you install into the correct folder structure, the following steps are not required. With the folder structure I used, however, these commands are required:
Unzip layers.zip.
Exit Docker or open a new terminal and navigate to the folder where you unzipped the file. The unzipped files will be in the folder structure usr/lib/python3.6/dist-packages/.
Copy these files into the correct folder structure:
$ mkdir -p ./python/lib/python3.6/site-packages
$ cp -r ./usr/lib/python3.6/dist-packages/. ./python/lib/python3.6/site-packages/
Zip them again:
$ zip -r ./lib_python.zip ./python
Upload the zip file as a layer and add that layer to your Lambda function. Also, make sure you select the right runtime when creating the layer.
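The unzip/copy/re-zip steps above exist only to get the python/lib/python3.6/site-packages/ prefix into the archive. As an alternative sketch (the function name is hypothetical), Python's zipfile module can write the installed files straight under that prefix, for example by running it with python3 inside the same container:

```python
import os
import zipfile
from pathlib import Path


def build_layer_zip(src_dir: str, zip_path: str, python_version: str = "3.6") -> list:
    """Zip everything under src_dir into python/lib/python<ver>/site-packages/.

    Returns the archive names written, so the layout can be inspected.
    """
    prefix = f"python/lib/python{python_version}/site-packages"
    src = Path(src_dir)
    written = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src):
            for name in files:
                full = Path(root) / name
                # Archive name = layer prefix + path relative to src_dir,
                # always using forward slashes.
                arcname = f"{prefix}/{full.relative_to(src).as_posix()}"
                zf.write(full, arcname)
                written.append(arcname)
    return written
```

For example, build_layer_zip("/usr/lib/python3.6/dist-packages", "/var/task/lib_python.zip") would leave the archive in the mounted aws1 folder, already in the layout Lambda expects.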

According to this document - https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html#configuration-layers-path, you should zip the python\lib\python3.7\site-packages\pandas folder (and the other dependencies) for your Python layers.
Make sure you add the layer to your function and follow the documentation for the right permissions.
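Per that documentation, Lambda extracts layers into /opt and, for Python runtimes, puts python/ and python/lib/python3.x/site-packages/ on the import path. A small, hypothetical checker like this can flag zip entries that would end up outside those locations before you upload:

```python
import zipfile


def misplaced_entries(zip_path: str, python_version: str) -> list:
    """Return archive entries Lambda would not import for python<version>."""
    good_prefix = f"python/lib/python{python_version}/site-packages/"
    bad = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entries carry no code
            if not name.startswith("python/"):
                bad.append(name)  # outside python/: never on sys.path
            elif name.startswith("python/lib/python") and not name.startswith(good_prefix):
                bad.append(name)  # site-packages for a different Python version
    return bad
```

An empty list means every file sits under one of the two documented locations for that runtime.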

I appreciate the answers given; I'm just posting my own answer (which I found after a whole day of searching) here for reference.
I followed this guide and also this guide.
In summary, these are the steps I took:
Connect to my Amazon EC2 instance (running Linux) through ssh. I wanted to deploy an application on Beanstalk, so the instance was already up for me anyway.
Follow the steps in the first guide to install Python 3.7.
Follow the steps in the second guide to install the libraries. One of the key notes is not to install with pip install -t, since that will lead to the libraries' C extensions not being built.
Zip the directory python\lib\python3.7\site-packages\ as mentioned in the answers here (although I did follow the directory guide in my first attempts).
Get the file from the EC2 instance through FileZilla.
Follow the Lambda layers guide, and it is done.
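Whichever way the libraries are built, a quick smoke test in the build environment (or as a throwaway Lambda handler attached to the layer) can confirm that the C extensions actually import before you go further. This is a minimal sketch; the helper name and the "modules" event key are assumptions. EXT_SUFFIX shows the filename suffix this interpreter expects for compiled extension modules, which is a quick way to spot a cp35-versus-3.7 mismatch.

```python
import importlib
import sysconfig


def check_imports(module_names):
    """Try to import each module; map name -> None on success or the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:  # also covers ModuleNotFoundError
            results[name] = str(exc)
    return results


def lambda_handler(event, context):
    # Hypothetical throwaway handler: report the extension-module suffix this
    # interpreter expects (e.g. ".cpython-37m-x86_64-linux-gnu.so") and
    # whether the layer's packages import cleanly.
    return {
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
        "imports": check_imports(event.get("modules", ["pandas", "joblib", "sklearn"])),
    }
```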

Related

Convert Databricks notebooks (.dbc) into standard .py files

I often get sent Databricks notebooks from various sources to move around / look at / refactor. Due to different tenancies, I can't log into the actual environment. These are usually sent as .dbc files, and I can convert them by opening up a new Databricks environment and re-saving them as .py files. I was wondering if there is a way to do this from the command line, like nbconvert for Jupyter?
It's a bit of a pain to import a whole host of files and then re-convert them to Python just for the sake of reading code.
Source control is not always an option due to permissions.
Import the .dbc into your Databricks workspace, for example into the Shared directory.
Then, as suggested by Carlos, install the Databricks CLI on your local computer and set it up.
pip install databricks-cli
databricks configure --token
Then run the following to export the notebooks as .py files into a local folder:
mkdir export_notebooks
cd export_notebooks
databricks workspace export_dir /Shared ./
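If importing into a workspace is not possible at all, one hedged alternative is to read the .dbc locally. The sketch below assumes, without official documentation to back it, that a .dbc is a ZIP archive whose notebook entries are JSON objects with a "commands" list of cells; verify that against your own files before relying on it. It joins the cells using the "# Databricks notebook source" header and "# COMMAND ----------" separators that Databricks uses in exported .py source files. The function name is hypothetical.

```python
import json
import zipfile


def dbc_to_py(dbc_path: str) -> dict:
    """Map each notebook entry in a .dbc archive to Python source text.

    ASSUMPTION: the .dbc is a ZIP archive whose notebook entries are JSON
    objects with a "commands" list; each command holds its cell source in
    "command" and, optionally, its order in "position".
    """
    notebooks = {}
    with zipfile.ZipFile(dbc_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue
            try:
                data = json.loads(zf.read(name).decode("utf-8"))
            except (UnicodeDecodeError, json.JSONDecodeError):
                continue  # not a JSON notebook entry; skip it
            if not isinstance(data, dict) or "commands" not in data:
                continue
            cells = sorted(data["commands"], key=lambda c: c.get("position", 0))
            # Same cell separator Databricks uses in exported .py source files.
            source = "\n\n# COMMAND ----------\n\n".join(c.get("command", "") for c in cells)
            notebooks[name] = "# Databricks notebook source\n" + source + "\n"
    return notebooks
```

Each value in the returned dict can then be written to its own .py file for reading.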

Uploaded pypi package missing module on install

I have created a Python package and would like to distribute it on PyPI (https://pypi.org/project/catapi.py/). My initial v0.1.1 upload worked without issue. I decided to add a subdirectory to store abstract classes because there was a lot of code reuse. Upon uploading this to PyPI and installing it, I get the message that the abc module does not exist.
I did some research and found that I must include the subdirectory in the MANIFEST.in file, so I did. Upon uploading and attempting an install again, I get the same error. I downloaded the package directly and extracted the files to find the abc directory does indeed exist. Next I checked the site-packages version of catapi only to find it does not have the abc module.
Has anyone encountered this and knows how to fix it? Here's a script that shows the issue:
# make a temp dir to hold this in
mkdir catapi
cd catapi
# Prepare python venv
python -m venv env-catapi
source env-catapi/bin/activate
pip install catapi.py==0.3.4
# Download file for comparison
wget https://files.pythonhosted.org/packages/ac/ee/044c1cc53e7c994fe4a7d57362651da8adff54eb34680c66f62a1d4fb57d/catapi.py-0.3.4.tar.gz
tar -xvf catapi.py-0.3.4.tar.gz
diff catapi.py-0.3.4/catapi env-catapi/lib/python3.8/site-packages/catapi
deactivate
cd ../
# Prints out
# Only in catapi: abc
# Only in env-catapi/lib/python3.8/site-packages/catapi: __pycache__
It's necessary to add the subdirectories to the
packages=['package1', 'package2', 'etc']
part of setup.py. In my case, I had to add the abc directory so that it was included in the catapi install:
packages=['catapi', 'catapi.abc'],
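To catch this before uploading, a small hypothetical checker can list the subpackages (directories containing __init__.py) that exist on disk but are missing from packages=. It follows roughly the same rule as setuptools.find_packages(): it only descends into directories that are themselves packages.

```python
import os


def undeclared_packages(root: str, declared: list) -> list:
    """List packages under root that are missing from `declared`."""
    found = []
    for dirpath, dirnames, _files in os.walk(root):
        keep = []
        for d in sorted(dirnames):
            full = os.path.join(dirpath, d)
            # Like find_packages(): only dirs with __init__.py count (and are walked).
            if "." not in d and os.path.isfile(os.path.join(full, "__init__.py")):
                keep.append(d)
                found.append(os.path.relpath(full, root).replace(os.sep, "."))
        dirnames[:] = keep  # prune non-package directories from the walk
    return [pkg for pkg in found if pkg not in declared]
```

Run against the project root with ['catapi'], it would report 'catapi.abc'. Alternatively, setup.py can use packages=find_packages(include=["catapi", "catapi.*"]) from setuptools so that new subpackages are picked up automatically, as long as each one has an __init__.py.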

Google Cloud Platform API for Python and AWS Lambda Incompatibility: Cannot import name 'cygrpc'

I am trying to use Google Cloud Platform (specifically, the Vision API) for Python with AWS Lambda, so I have to create a deployment package for my dependencies. However, when I try to create this deployment package, I get several compilation errors regardless of the Python version (3.6 or 2.7). With Python 3.6, I get the error "Cannot import name 'cygrpc'". With 2.7, I get some unknown error with the .path file. I am following the AWS Lambda Deployment Package instructions here. They recommend two options, and neither works; both result in the same issue. Is GCP just not compatible with AWS Lambda for some reason? What's the deal?
Neither Python 3.6 nor 2.7 work for me.
NOTE: I am posting this question here to answer it myself because it took me quite a while to find a solution, and I would like to share my solution.
TL;DR: You cannot compile the deployment package on your Mac or whatever PC you use. You have to do it on a specific OS/"setup", the same one that AWS Lambda uses to run your code. To do this, you have to use EC2.
I will provide here an answer on how to get Google Cloud Vision working on AWS Lambda for Python 2.7. This answer is potentially extendable to other APIs and other programming languages on AWS Lambda.
My journey to a solution began with this initial posting on GitHub by others who had the same issue. One solution someone posted was:
I had the same issue "cannot import name 'cygrpc'" while running the lambda. Solved it with pip install google-cloud-vision in the AMI amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2 instance and exported the lib/python3.6/site-packages to AWS Lambda. Thank you @tseaver
This is partially correct, unless I read it wrong, but regardless it led me down the right path. You will have to use EC2. Here are the steps I took:
Set up an EC2 instance by going to EC2 on Amazon. Do a quick read about AWS EC2 if you have not already. Set one up with amzn-ami-hvm-2018.03.0.20180811-x86_64-gp2 or something along those lines (i.e., the most recent one).
Get your EC2 .pem file. Open your Terminal, cd into the folder containing your .pem file, and ssh into your instance using:
ssh -i "your-file-name-here.pem" ec2-user@ec2-ip-address-here.compute-1.amazonaws.com
Create the following folders on your instance using mkdir: google-cloud-vision, protobuf, google-api-python-client, httplib2, uritemplate, google-auth-httplib2.
On your EC2 instance, cd into google-cloud-vision. Run the command:
pip install google-cloud-vision -t .
Note: if you get "bash: pip: command not found", enter "sudo easy_install pip" (source).
Repeat the previous step with the following packages, cd'ing into the respective folder each time: protobuf, google-api-python-client, httplib2, uritemplate, google-auth-httplib2.
Copy each folder to your computer. You can do this using the scp command. Run it in a local Terminal window, not on your EC2 instance and not in the Terminal window you used to access your EC2 instance (below is an example for your "google-cloud-vision" folder; repeat this for every folder):
sudo scp -r -i your-pem-file-name.pem ec2-user@ec2-ip-address-here.compute-1.amazonaws.com:~/google-cloud-vision ~/Documents/your-local-directory/
Stop your EC2 instance from the AWS console so you don't get overcharged.
For your deployment package, you will need a single folder containing all your modules and your Python scripts. To begin combining all of the modules, create an empty folder titled "modules." Copy and paste all of the contents of the "google-cloud-vision" folder into the "modules" folder. Now place only the folder titled "protobuf" from the "protobuf" (sic) main folder in the "Google" folder of the "modules" folder. Also from the "protobuf" main folder, paste the Protobuf .pth file and the -info folder in the Google folder.
For each module after protobuf, copy and paste in the "modules" folder the folder titled with the module name, the .pth file, and the "-info" folder.
You now have all of your modules properly combined (almost). To finish combination, remove these two files from your "modules" folder: googleapis_common_protos-1.5.3-nspkg.pth and google_cloud_vision-0.34.0-py3.6-nspkg.pth. Copy and paste everything in the "modules" folder into your deployment package folder. Also, if you're using GCP, paste in your .json file for your credentials as well.
Finally, put your Python scripts in this folder, zip the contents (not the folder), upload to S3, and paste the link in your AWS Lambda function and get going!
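The manual copy-and-paste in the steps above amounts to merging several pip install -t folders into one and dropping a few files. A sketch of that merge in Python (hypothetical helper; shutil.copytree's dirs_exist_ok needs Python 3.8+), assuming each package folder contains its own google/ tree so that merging places protobuf under google/. It does not reproduce every detail of the manual steps; for example, it leaves each .pth file and -info folder at the top level, where pip put them.

```python
import shutil
from pathlib import Path


def merge_pip_targets(sources, dest, skip=()):
    """Copy the contents of several `pip install -t` folders into one folder.

    Files or folders whose name is in `skip` (e.g. the *-nspkg.pth files
    mentioned above) are left out. Later sources overwrite earlier ones.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    skip = set(skip)
    for src in sources:
        shutil.copytree(
            src,
            dest,
            dirs_exist_ok=True,  # Python 3.8+: merge into existing directories
            ignore=lambda _dir, names: [n for n in names if n in skip],
        )
    return dest
```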
If something here doesn't work as described, please forgive me and either message me or feel free to edit my answer. Hope this helps.
Building on the answer from @Josh Wolff (thanks a lot, btw!), this can be streamlined a bit by using the lambci/lambda Docker images, a community-maintained set of images that mimic the AWS Lambda environment.
You can either bundle the libraries with your project source or, as I did below in a Makefile script, upload them as an AWS Lambda layer.
layer:
set -e ;\
docker run -v "$(PWD)/src":/var/task "lambci/lambda:build-python3.6" /bin/sh -c "rm -R python; pip install -r requirements.txt -t python/lib/python3.6/site-packages/; exit" ;\
pushd src ;\
zip -r my_lambda_layer.zip python > /dev/null ;\
rm -R python ;\
aws lambda publish-layer-version --layer-name my_lambda_layer --description "Lambda layer" --zip-file fileb://my_lambda_layer.zip --compatible-runtimes "python3.6" ;\
rm my_lambda_layer.zip ;\
popd ;
The above script will:
Pull the Docker image if you don't have it yet (the above uses Python 3.6)
Delete the python directory (only useful when running it a second time)
Install all requirements into the python directory, created in your project's src directory
ZIP the python directory
Upload the AWS layer
Delete the python directory and zip file
Make sure your requirements.txt file includes the modules listed above by Josh: google-cloud-vision, protobuf, google-api-python-client, httplib2, uritemplate, google-auth-httplib2
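For reference, the same upload can be done from Python with the boto3 Lambda client's publish_layer_version call, which takes the layer name, description, zip contents and compatible runtimes that the CLI command in the Makefile passes. A sketch with a hypothetical wrapper name; direct zip uploads are size-limited, so larger layers would go through S3 instead.

```python
from pathlib import Path


def publish_layer(client, zip_path, layer_name, runtime="python3.6",
                  description="Lambda layer"):
    """Upload a layer zip with boto3, mirroring the Makefile's CLI call.

    `client` is expected to be a boto3 Lambda client, e.g. boto3.client("lambda").
    Returns the ARN of the new layer version.
    """
    response = client.publish_layer_version(
        LayerName=layer_name,
        Description=description,
        Content={"ZipFile": Path(zip_path).read_bytes()},
        CompatibleRuntimes=[runtime],
    )
    return response["LayerVersionArn"]
```

Usage would look like publish_layer(boto3.client("lambda"), "src/my_lambda_layer.zip", "my_lambda_layer").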
There's a fast solution that doesn't require much coding.
Cloud9 runs on an Amazon Linux AMI, so using pip in its virtual environment should make it work.
I created a Lambda from the Cloud9 UI and, from the console, activated the venv for the EC2 machine. I then installed google-cloud-speech with pip. That was enough to fix the issue.
I was facing the same error using the Google Ads API.
{
"errorMessage": "Unable to import module 'lambda_function': cannot import name 'cygrpc' from 'grpc._cython' (/var/task/grpc/_cython/__init__.py)",
"errorType": "Runtime.ImportModuleError",
"stackTrace": []
}
My Lambda runtime was Python 3.9 and architecture x86_64.
If anybody encounters a similar ImportModuleError, see my answer here: Cannot import name 'cygrpc' from 'grpc._cython' - Google Ads API

Create data files in pip3 editable install mode

I'm trying to install python package in editable mode with:
pip3 install -e ./
setup.py file contains:
data_files=[
(os.path.expanduser("~") + "/.xxx", ["xxx/yyy.data"])
],
After installation the yyy.data file is not copied to .xxx folder.
Is there an option to create data files outside of the package folder when working in editable mode?
The truth is that data_files has caveats. See the "No single, complete solution for packaging data" issue on the list of Problems in Python Packaging, the note in the data_files section of the Packaging and Distributing Projects tutorial in the Python Packaging User Guide, pip's bug "All packages that contain non-package data are now likely installed in a broken way since 7.0.0", and wheel's bug "bdist_wheel makes absolute data_files relative to site-packages".
According to the sources above, your data was most likely installed into the site-packages directory instead of your home directory as you expected.
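One common workaround, sketched here under assumptions rather than as a setuptools feature: ship yyy.data inside the package (for example via package_data) and copy it to ~/.xxx at runtime the first time it is needed. Because the source path is resolved from the imported package, this behaves the same in editable mode. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def ensure_user_data(packaged_file, dest_dir=None):
    """Copy a data file shipped inside the package into ~/.xxx on first use.

    Hypothetical workaround for data_files: existing copies are not overwritten.
    """
    src = Path(packaged_file)
    dest_dir = Path(dest_dir) if dest_dir is not None else Path.home() / ".xxx"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    if not dest.exists():
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
    return dest
```

From a module inside the xxx package, this would be called as ensure_user_data(Path(__file__).parent / "yyy.data").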

Can I install python-unicodecsv in Python 3.0?

I'm trying to install python-unicodecsv in Python 3.0 for Odoo, but it says "unable to locate package python-unicodecsv".
You can download it directly from the Git repository:
https://github.com/jdunck/python-unicodecsv
Installation steps:
Step 1. Download and Extract the .zip file
Download the repository directly as a .zip file (python-unicodecsv-master.zip, or whatever your .zip file is named) and extract it in one of the following ways:
Extract into the same directory:
unzip python-unicodecsv-master.zip
Extract into another directory:
unzip python-unicodecsv-master.zip -d <directory path>
Step 2. Install the package from the terminal in Ubuntu
Go to the extracted directory in the terminal, then run the following command:
sudo python3 setup.py install
The library should then be installed successfully, and you can import unicodecsv and use it in your Python files.
I hope this is helpful :)
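Note that unicodecsv exists to give Python 2's csv module Unicode support. On Python 3, the standard csv module already works with str, so if your own code only needs to read and write non-ASCII CSV, a sketch like this may be enough (it will not help if Odoo itself imports unicodecsv). The helper names are hypothetical.

```python
import csv


def write_rows(path, rows):
    # newline="" is what the csv docs recommend; the encoding handles non-ASCII text.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


def read_rows(path):
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.reader(f))
```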
