I'm attempting to use Puppeteer with Lambda; however, on serverless deploy, the Lambda errors out because it exceeds the 250 MB unzipped package size limit.
So, to get under the limit, I've switched to puppeteer-core, which doesn't come packaged with Chromium. This requires passing the path to a Chrome executable when launching, e.g. puppeteer.launch({ executablePath: headlessChromiumPath }).
However, I'm not sure how to load a headless Chromium into my container so that I can later reference it.
To solve this I'm trying a couple of things:
First, I've downloaded a headless Chromium binary and included it in my API.
File structure:
-run-puppeteer.js
-headless_shell.tar.gz
Referenced like:
const browser = await puppeteer.launch({
  executablePath: "../headless_shell.tar.gz"
});
However, I can't import or require it so my lambda doesn't recognize that it exists and doesn't include it in my deployment package.
My question here is how do I correctly include the headless file into my API so that I can reference it from within this lambda?
If that isn't an option, I see that I can upload the binary to S3 and then download it on container startup. Any references on where to begin tackling this would be much appreciated.
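Something along these lines is what I have in mind for the S3 route: download and extract the archive into /tmp on a cold start, then point puppeteer-core at the extracted binary (the bucket, key, and paths below are just placeholders):
const AWS = require("aws-sdk");
const fs = require("fs");
const { execSync } = require("child_process");
const puppeteer = require("puppeteer-core");

const s3 = new AWS.S3();
const ARCHIVE_PATH = "/tmp/headless_shell.tar.gz";
const EXECUTABLE_PATH = "/tmp/headless_shell";

async function getBrowser() {
  // /tmp is the only writable path in Lambda; warm containers keep it between invocations
  if (!fs.existsSync(EXECUTABLE_PATH)) {
    const { Body } = await s3
      .getObject({ Bucket: "my-chromium-bucket", Key: "headless_shell.tar.gz" }) // placeholder bucket/key
      .promise();
    fs.writeFileSync(ARCHIVE_PATH, Body);
    execSync(`tar -xzf ${ARCHIVE_PATH} -C /tmp`);
    fs.chmodSync(EXECUTABLE_PATH, 0o755);
  }
  return puppeteer.launch({
    executablePath: EXECUTABLE_PATH,
    args: ["--no-sandbox", "--disable-dev-shm-usage"]
  });
}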
You can use chrome-aws-lambda to either package Chromium with your Lambda or create a Lambda Layer to avoid hitting the package size limit.
I did something similar here based on chrome-aws-lambda
chrome-aws-lambda is indeed big, adding ~40 MB to the deployment package. Using a Layer could potentially reduce the package size, but it could also increase it, because the 250 MB unzipped limit includes both the layer and the Lambda code. If you use chrome-aws-lambda, then definitely do NOT use puppeteer; use puppeteer-core instead for a smaller size. I did a very similar setup like this; hopefully it helps.
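For reference, the chrome-aws-lambda + puppeteer-core launch pattern looks roughly like this (the handler shape and example URL are just illustrative):
const chromium = require("chrome-aws-lambda");

exports.handler = async () => {
  const browser = await chromium.puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath, // resolves the packaged/Layer binary
    headless: chromium.headless
  });
  const page = await browser.newPage();
  await page.goto("https://example.com");
  const pdf = await page.pdf({ format: "A4" });
  await browser.close();
  return pdf.toString("base64");
};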
Related
I have been trying to create a working image resizer in Lambda, following various examples and code I've seen out there for doing it in Node.js.
I tried gm with the ImageMagick tools, but there seems to be a built-in buffer limit which causes it to fail in the Lambda environment on large images.
I tried using sharp, but it keeps running into errors looking for libvips, the documentation is a cluster##$^, and I can't seem to find a succinct "do this to get it to work" instruction anywhere.
So I'm yet again looking for some kind of tool that can run in Node.js in the Lambda environment to resize an image from an S3 download stream and re-upload the result to another S3 bucket. I also need to get the image's pixel dimensions while resizing it.
It needs to handle large images without puking and not require a doctorate in Amazon Linux to install on Lambda. I've wasted too much time on this aspect of the project already.
Any help or suggestions are greatly appreciated.
Alas, after much intermittent banging of my face on the keyboard, I eventually found a magic combination that got it working on my third attempt: using the docker run 'npm install' syntax from the sharp installation page, combined with setting that particular function to the Node.js 10.x runtime. (I have no idea what was different from the first attempts, but I'm still figuring out how serverless deploy works for combined functions as well; too much 'new stuff' all in one project, sigh.)
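For what it's worth, once sharp is built for the Lambda runtime, the resize step itself can stay small. A rough sketch follows; the bucket names, key handling, and target width are placeholders, and it buffers the object rather than streaming, which keeps the dimension lookup simple:
const AWS = require("aws-sdk");
const sharp = require("sharp");

const s3 = new AWS.S3();

exports.handler = async (event) => {
  // Placeholder bucket/key values for illustration
  const srcBucket = "source-bucket";
  const destBucket = "resized-bucket";
  const key = event.key;

  const { Body } = await s3.getObject({ Bucket: srcBucket, Key: key }).promise();

  const image = sharp(Body);
  const { width, height } = await image.metadata(); // original pixel dimensions

  const resized = await image
    .resize({ width: 800, withoutEnlargement: true })
    .toBuffer();

  await s3
    .putObject({ Bucket: destBucket, Key: key, Body: resized, ContentType: "image/jpeg" }) // assumes JPEG for simplicity
    .promise();

  return { width, height };
};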
I'm trying to load a model on AWS Lambda using Zappa. The problem is that the total unzipped size of the package created by Zappa and uploaded to S3 is about 550 MB, which exceeds the limit. One of the packages I'm using is Spacy (an NLP dependency that is very large), and I'm able to reduce its size by manually removing unused languages in the lang folder. Doing this, I can get the unzipped size under 500 MB. The problem is that Zappa automatically downloads the full Spacy version (spacy==2.1.4: Using locally cached manylinux wheel) on deploy and update.
I've learned that I can call zappa package, and it will generate a package that I can then upload myself. What I've done is unzip the generated package, remove the unnecessary lang files, and zip it back up. Is it possible to call zappa deploy/update with the modified package and the handler that zappa package created? That way Zappa can still handle the deployment.
For me, the following two things fixed the issue:
1. Enabling Zappa's slim_handler. AWS Lambda requires your environment to have a maximum size of 50 MB, but our packaged environment will be around 100 MB. Lucky for us, it is possible for Lambdas to load code from Amazon S3 without much performance loss (only a few milliseconds). To activate this feature, you must add a new line to your zappa_settings.json:
"slim_handler": true
2. Installing only spacy and not the language packages (i.e., skipping python3 -m spacy download en). Afterwards, I uploaded the language package manually to S3 and then loaded the spacy language "model" similarly to what is described here: Sklearn joblib load function IO error from AWS S3
Here's how I solved the issue; there are two ways:
The first is to simply move the dependency folder from the site-packages directory to the root folder and then make any modifications there. This forces Zappa not to download the manylinux-wheel version of the dependency on upload.
The simpler solution is to remove the *dist folder for the specific module you modified. Removing it forces Zappa to bypass re-downloading the module from a manylinux wheel, meaning your modified module will be packaged during deployment.
I have a project which uses Puppeteer to print PDFs. The problem is that the Chromium download is too large to work with on servers, so I want to migrate to chrome-remote-interface instead. Is there a better way to do that? Will I have to change much of my code?
You don't even need to switch to such libraries for this problem. Puppeteer already has a solution for that.
puppeteer-core
Puppeteer has the puppeteer-core library, which comes without the Chromium download and will work with the remote interface.
The only difference between puppeteer-core and puppeteer at the moment is that puppeteer-core doesn't install Chromium. So you can just swap it in.
The difference between the two is described here. The documentation for .connect is here.
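Connecting puppeteer-core to an already-running Chrome over the remote debugging protocol then looks roughly like this (the WebSocket endpoint below is a placeholder; use whatever your Chrome instance reports):
const puppeteer = require("puppeteer-core");

(async () => {
  const browser = await puppeteer.connect({
    browserWSEndpoint: "ws://127.0.0.1:9222/devtools/browser/<id>" // placeholder endpoint
  });
  const page = await browser.newPage();
  await page.goto("https://example.com");
  await page.pdf({ path: "output.pdf", format: "A4" });
  await browser.disconnect(); // leaves the remote Chrome running
})();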
Using the environment variable
You can use puppeteer as usual, except set the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable to skip the Chromium download when running npm install.
I am creating an Alexa skill and I am using AWS Lambda to handle the intents. I found several tutorials online and decided to use Node.js with the alexa-sdk. After installing the alexa-sdk with npm, the zipped archive takes up about 6 MB on disk. If I upload it to Amazon, it tells me:
The deployment package of your Lambda function "..." is too large to enable inline code editing. However, you can still invoke your function right now.
My index.js is < 4 KB, but the dependencies are large. If I want to change something, I have to zip everything together (index.js and the "node_modules" folder with the dependencies), upload it to Amazon, and wait until it's processed, because online editing isn't available anymore. So every single change to index.js wastes more than a minute of my time zipping and uploading. Is there a way to use the alexa-sdk dependency (and other dependencies) without uploading the same code every time I change something? Is there a way to keep the online editing feature even though I am using large dependencies? I just want to edit index.js.
If the size of your Lambda function's zipped deployment package exceeds 3 MB, you will not be able to use the inline code editing feature in the Lambda console. You can still use the console to invoke your Lambda function.
It's mentioned here under AWS Lambda Deployment Limits.
ASK-CLI
The ASK Command Line Interface lets you manage your Alexa skills and their associated AWS Lambda functions from your local machine. Once you set it up, you can make the necessary changes to your Lambda code or skill and use the deploy command to deploy the skill. The optional target lets you deploy only the associated Lambda code.
ask deploy [--no-wait] [-t| --target <target>] [--force] [-p| --profile <profile>] [--debug]
More info about the ASK CLI is here, and more about the deploy command is here.
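For example, after editing index.js locally, redeploying just the function code would be something like the following (assuming the skill project was created or cloned with the ASK CLI):
ask deploy -t lambda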
I have some problems with serverless deploy: when I deploy my Lambda function, the Serverless Framework starts packaging my node_modules, and it takes a lot of time.
I mean, why upload node_modules again if it hasn't been updated? Does anybody know how to deploy only the Lambda function code without packaging the binaries?
You need to add packaging configuration.
In your serverless.yml file, add:
package:
  exclude:
    - node_modules/**
It is useful to remove the aws-sdk modules (because if you don't upload them, Lambda will use the version AWS provides, which is better) and to remove dev modules (like testing frameworks). However, all other modules are dependencies and will need to be uploaded for your function to work properly. So, configure the package settings to include/exclude exactly what you need.
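For example, a more selective setup might look like this in serverless.yml (the exact patterns depend on your project; these are only illustrative):
package:
  exclude:
    - node_modules/aws-sdk/**   # provided by the Lambda runtime
    - node_modules/.bin/**
    - test/**
    - coverage/**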
Regarding your other question
why upload node_modules again if it hasn't been updated
It is not a limitation of the Serverless Framework; it is a limitation of the AWS Lambda service. You can't do a partial upload of a Lambda function. Lambda always requires that the uploaded zip package contains the updated code and all required dependencies.
If your deploy is taking too long, maybe you should consider breaking this Lambda function into smaller units.