Can Zappa Update be used with the output package from Zappa Package? - python-3.x

I'm trying to load a model on AWS Lambda using Zappa. The problem is that the total unzipped size of the package created by Zappa and uploaded to S3 is about 550 MB, which exceeds the limit. One of the packages I'm using is Spacy (a very large NLP dependency), and I'm able to reduce its size by manually removing unused languages in the lang folder. Doing this I can get the unzipped size under 500 MB. The problem is that Zappa automatically downloads the full Spacy version (spacy==2.1.4: Using locally cached manylinux wheel) on deploy and update.
I've learned that I can call Zappa Package, and it will generate a package that I can then upload myself. What I've done is unzip the generated package, remove the unnecessary lang files, and zip it back up. Is it possible to call Zappa Deploy/Update with the modified package and the handler created by Zappa Package? That way Zappa can still handle the deployment.

For me the following two things fixed that issue:
AWS Lambda requires your environment to have a maximum size of 50 MB, but our packaged environment will be around 100 MB. Luckily for us, it is possible for Lambdas to load code from Amazon S3 without much performance loss (only a few milliseconds).
To activate this feature, you must add a new line to your zappa_settings.json:
"slim_handler": true
Installing only spacy itself, without the language packages (i.e. skipping python3 -m spacy download en). Afterwards, I uploaded the language package manually to S3 and then loaded the spacy language "model" similarly to what is described here: Sklearn joblib load function IO error from AWS S3
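For illustration, here is a minimal sketch of that last step, assuming the spaCy language data was uploaded to S3 as a .tar.gz; the bucket name, key, and local paths are hypothetical, and the download is cached under /tmp (the only writable path on Lambda):

import os
import tarfile

import boto3
import spacy

# Hypothetical bucket/key; point these at wherever you uploaded the model.
BUCKET = "my-model-bucket"
KEY = "spacy/en_core_web_sm.tar.gz"
ARCHIVE_PATH = "/tmp/en_core_web_sm.tar.gz"
MODEL_DIR = "/tmp/en_core_web_sm"

def load_spacy_model():
    # Download and extract only on cold starts; warm invocations reuse /tmp.
    if not os.path.isdir(MODEL_DIR):
        boto3.client("s3").download_file(BUCKET, KEY, ARCHIVE_PATH)
        with tarfile.open(ARCHIVE_PATH, "r:gz") as archive:
            archive.extractall("/tmp")
    # spacy.load also accepts a filesystem path to a model directory.
    return spacy.load(MODEL_DIR)

nlp = load_spacy_model()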

Here's how I solved the issue; there are two ways:
The first is to simply move the dependency folder from the site-packages directory to the root folder, and then make any modifications there. This forces Zappa not to download the wheels-on-Linux (manylinux) version of the dependency on upload.
The simpler solution is to remove the *dist folder for the specific module you modify. Removing it forces Zappa to skip re-downloading that module from wheels on Linux, meaning your modified module will be packaged during deployment. A sketch of this clean-up is shown below.
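As a rough sketch of that clean-up, assuming the *dist folder refers to the package's dist-info metadata directory; the path to the copied package and the set of kept languages are placeholders to adapt to your layout:

import shutil
from pathlib import Path

# Hypothetical location of the copied spacy package (e.g. after moving it out
# of site-packages into the project root, as described above).
SPACY_DIR = Path("spacy")
KEEP_LANGS = {"en", "xx"}  # languages actually used by the app

# Delete unused language subpackages to shrink the unzipped size.
for lang_dir in (SPACY_DIR / "lang").iterdir():
    if lang_dir.is_dir() and lang_dir.name not in KEEP_LANGS:
        shutil.rmtree(lang_dir)

# Remove the package's dist metadata so Zappa doesn't treat it as an installed
# wheel and re-download the unmodified manylinux version during deployment.
for dist_dir in SPACY_DIR.parent.glob("spacy-*.dist-info"):
    shutil.rmtree(dist_dir)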

Related

python package installation error while creating a webjob in azure

I am creating a WebJob which has the following Python dependencies (azure-storage-blob==12.8.1, azure) along with other dependencies. The problem is that my code has been stuck at the output below for almost 3-4 hours:
Downloading azure_common-1.1.8-py2.py3-none-any.whl (7.9 kB)
pip is looking at multiple versions of azure-core to determine which version is compatible
with other requirements. This could take a while.
[08/12/2021 19:55:54 > d827c9: INFO] INFO: This is taking longer than usual. You might need to
provide the dependency resolver with stricter constraints to reduce runtime. If you want to
abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what
happened here: https://pip.pypa.io/surveys/backtracking
The thing is that if I install a specific version of azure, it is not compatible with azure-storage-blob and throws an error when importing blob storage; and if I don't install azure, or install a version of azure that is not compatible with azure-storage-blob==12.8.1, it throws the error below:
from azure.keyvault import KeyVaultAuthentication, KeyVaultClient
ImportError: cannot import name 'KeyVaultAuthentication'
Does anyone know how to install Python packages while creating an Azure WebJob, and how to overcome this issue?
I have another question related to a triggered WebJob: if the packages install successfully, will they be reinstalled on every run, or only on the first run, with the packages then saved in the environment?
Check what the dependency tree looks like locally by running pip freeze, and then pin those exact versions to prevent the dependency-resolution timeouts.
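For example, after finding a combination that imports cleanly on your machine, the pinned requirements.txt could look like the following (the azure-storage-blob and azure-common pins come from the question and its log; pin every other package to whatever pip freeze reports locally):

azure-storage-blob==12.8.1
azure-common==1.1.8
# ...pin the remaining packages from pip freeze the same way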

How can I use modules in Azure ML studio designer pipeline?

I am currently using a python script in my Azure pipeline
Import data as Dataframe --> Run Python Script --> Export Dataframe
My script was developed locally, and I get import errors when trying to import tensorflow... No problem, I guess I just have to add it to the environment dependencies somewhere, and this is where the documentation fails me. It seems to rely on the SDK without touching the GUI, but I am using the designer.
At this point I have already built some environments with the dependencies, but how to use these environments at the run or script level is not obvious to me.
It seems trivial, so any help on how to use modules is greatly appreciated.
To use modules that are not preinstalled (see Preinstalled Python packages), you need to add the zipped file containing the new Python packages to the Script Bundle port. See the description below from the document:
To include new Python packages or code, connect the zipped file that contains these custom resources to Script bundle port. Or if your script is larger than 16 KB, use the Script Bundle port to avoid errors like CommandLine exceeds the limit of 16597 characters.
Bundle the script and other custom resources to a zip file.
Upload the zip file as a File Dataset to the studio.
Drag the dataset module from the Datasets list in the left module pane in the designer authoring page.
Connect the dataset module to the Script Bundle port of Execute Python Script module.
Please check out the document How to configure Execute Python Script.
For more information about how to prepare and upload these resources, see Unpack Zipped Data.
You can also check out this similar thread.
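As a rough sketch of how the bundled code is then used from inside the Execute Python Script module (the azureml_main entry-point signature is the documented one; my_module and transform are hypothetical names for whatever you ship in the zip):

# Entry point required by the Execute Python Script module: it receives up to
# two pandas DataFrames and must return a DataFrame (or a tuple of DataFrames).
def azureml_main(dataframe1=None, dataframe2=None):
    # Hypothetical module shipped inside the zip connected to the Script
    # Bundle port; the bundle's contents become importable from the script.
    import my_module

    dataframe1 = my_module.transform(dataframe1)
    return dataframe1,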

Using JTR in Cloud Functions?

I am trying to use JTR (John the Ripper) to brute-force a PDF file.
The PDF's password has the form of four letters followed by four numbers, e.g. ABCD1234 or ZDSC1977.
I've downloaded the jumbo source code from GitHub, and using pdf2john.pl I've extracted the hash.
But the documentation says I need to configure and install john, which is not going to work in my case.
Cloud Functions and Firebase Functions do not allow sudo apt-get installs, and that's the reason we can't use tools like Poppler utils, which include the amazing pdftotext.
How can I use JTR in Cloud Functions properly without needing to install it?
Is there a portable or prebuilt version of JTR for Ubuntu 18.04?
It is important to keep in mind that you can't arrange for packages to be installed on Cloud Functions instances. This is because your code doesn't run with root privileges.
If you need binaries to be available to your code deployed to Cloud Functions, you will have to build them yourself for Debian and include them in your function's directory so they get deployed along with the rest of your code.
Even if you're able to do that, there's no guarantee it will work, because the Cloud Functions images may not include all the shared libraries required for the executables to work.
You can request that new packages be added to the runtime using the Public Issue Tracker.
Otherwise, you can use Cloud Run or Compute Engine.
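For reference, here is a minimal sketch of the bundle-and-call approach under those caveats, assuming a statically linked john binary built for Debian is shipped next to the function's source and the extracted hash has already been written to /tmp; the paths, flags, and mask are illustrative only:

import os
import subprocess

# Hypothetical layout: the john binary lives in bin/ next to main.py and is
# marked executable before deployment.
JOHN = os.path.join(os.path.dirname(__file__), "bin", "john")

def crack_pdf(request):
    # Everything outside /tmp is read-only, so keep john's pot and session
    # files there; the mask matches four uppercase letters plus four digits.
    result = subprocess.run(
        [JOHN, "--session=/tmp/session", "--pot=/tmp/john.pot",
         "--mask=?u?u?u?u?d?d?d?d", "/tmp/hash.txt"],
        capture_output=True,
        text=True,
    )
    return result.stdout or result.stderr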

How to bundle headless chromium module with AWS Lambda?

I'm attempting to use Puppeteer with Lambda; however, on serverless deploy, the Lambda errors out due to exceeding the 250 MB unzipped package size limit.
So, to get under the limit, I've switched to puppeteer-core, which doesn't come packaged with Chromium. This requires referencing a path to an executable to launch Chrome (e.g. puppeteer.launch({executablePath: headlessChromiumPath})).
However, I'm not sure how to load a headless Chromium into my container so that I can later reference it.
To solve this I'm trying a couple of things:
First, I've downloaded a headless Chromium binary and included it in my API.
File structure:
-run-puppeteer.js
-headless_shell.tar.gz
Referenced like:
const browser = await puppeteer.launch({
executablePath: "../headless_shell.tar.gz"
});
However, I can't import or require it, so my Lambda doesn't recognize that it exists and doesn't include it in my deployment package.
My question here is how do I correctly include the headless file into my API so that I can reference it from within this lambda?
If that isn't an option - I see that I can upload the binary to S3 and then download it on container startup. Any references on where to begin tackling this would be much appreciated.
You can use chrome-aws-lambda to either package chrome with your Lambda or create a Lambda Layer to avoid the package size.
I did something similar here based on chrome-aws-lambda
chrome-aws-lambda is indeed big, adding about 40 MB to the deployment package; using a Layer could potentially reduce the package size, but it could also increase it, because the 250 MB unzipped limit includes both the layer and the Lambda code. If you use chrome-aws-lambda then definitely do NOT use puppeteer; instead use puppeteer-core for a smaller size. I did a very similar setup to this; hopefully it helps.

Why does serverless lambda deploy error with: No module named '_sqlite3'?

There are other questions similar to mine, but I don't think any of them looks complete or fits/answers my case.
I'm deploying a Python 3.6 application on AWS Lambda via the Serverless Framework.
With this application I'm using diskcache to perform some small file caching (not actually using sqlite at all).
I'm using the "serverless-python-requirements" plugin in order to have all my dependencies (defined in the requirements.txt file, diskcache in this case) packed up and uploaded.
When the application is live on AWS and I request it, I get back a 500 error, and in my logs I can read:
Unable to import module 'handler': No module named '_sqlite3'
From the answer below I gather that the sqlite module should not need to be installed:
Python: sqlite no matching distribution found for sqlite
So there is no need (and it won't work) to add sqlite as a requirement...
Then I wonder why AWS Lambda is unable to find sqlite once deployed.
Any hints please?
Thanks
