How do I run puppeteer on a server/in the cloud - node.js

Feels like I've searched the entire web for an answer...to no avail. I have a puppeteer script that works perfectly locally. My local machine is a little unreliable, so I've been trying to push this script to the cloud so that it can run there. But I have no idea where to start. I'm sitting here with an IBM cloud account with no idea what to do. Can anyone help me out?

Running Puppeteer scripts can be done on any cloud platform that
exposes a Node.js environment
enables running a browser (Puppeteer will need to start Chromium)
This could be achieved, for example, using AWS EC2.
AWS Lambda, Google Cloud Functions and IBM Cloud Functions (and similar services) might also work but they might need additional work on your side to get the browser running.
For a step-by-step guide, I would suggest checking out this article and this follow-up.
Also, it might just be easier to look into services like Checkly (disclaimer: I work for Checkly), Browserless and similar (a quick search for something along the lines of "run puppeteer online" will return several of those), which allow you to run Puppeteer checks online without requiring any additional setup. Useful if you are serious about using Puppeteer for testing or synthetic monitoring in the long run.

Related

Build an extensible system for scraping websites

Currently, I have a server running. Whenever I receive a request, I want some mechanism to start the scraping process on some other resource(preferably dynamically created) as I don't want to perform scraping on my main instance. Further, I don't want the other instance to keep running and charging me when I am not scraping data.
So, preferably a system that I can request to start scraping the site and close when it finishes.
Currently, I have looked in google cloud functions but they have a cap at 9 min max for every function so it won't fit my requirement as scraping would take much more time than that. I have also looked in AWS SDK it allows us to create VMs on runtime and also close them but I can't figure out how to push my API script onto the newly created AWS instance.
Further, the system should be extensible. Like I have many different scripts that scrape different websites. So, a robust solution would be ideal.
I am open to using any technology. Any help would be greatly appreciated. Thanks
I can't figure out how to push my API script onto the newly created AWS instance.
This is achieved by using UserData:
When you launch an instance in Amazon EC2, you have the option of passing user data to the instance that can be used to perform common automated configuration tasks and even run scripts after the instance starts.
So basically, you would construct your UserData to install your scripts, all dependencies and run them. This would be executed when new instances are launched.
If you want the system to be scalable, you can lunch your instances in Auto Scaling Group and scale it up or down as you require.
The other option is running your scripts as Docker containers. For example using AWS Fargate.
By the way, AWS Lambda has limit of 15 minutes, so not much more than Google functions.

How is it possible to debug an AWS Lambda function from remote?

We are taking over a whole application from another company, and they have built the whole pipeline for deploying, but we still don't have access to it. What we know, that there's a lambda function is running triggered by certain SNS messages, and all the code is in Node.js, and the development is in VS Code. We also have issues debugging it locally, but it's a bigger problem, that we need to debug it remotely.
Since I am new in AWS services, I'd really appreciate if somebody could help me in this.
Does it necessary to open a port? How is it possible to connect to a lambda? Do we need serverless to setup? Many unresolved questions.
I don't think there is way you can debug a lambda function remotely. Your best bet is to download the code on local machine, setup the env variables as you have set up on your lambda function and take it from there.
Remember at the end of the day lambda is just a container which is running the code for you. AWS doesn't allow any ssh or connection with those container. In your case you should be able to debug it on local till you have the same env variables. There are other things as well which are lambda specific but considering it is a running code which you have got so you should be able to find out the issue.
Hope it makes sense.
Thundra (https://www.thundra.io/aws-lambda-debugger) has live/remote debugging support for AWS Lambda through its native IDE plugins (VSCode and IntelliJ IDEA).
The way AWS have you 'remote' debug is to execute the lambda locally through Docker as it proxies the requests to the cloud for you, using AWS Toolkit. You have a lambda running on your local computer via docker that can access resources on the cloud, such as databases, api's etc. You can step through debug them using editors like vscode.
I use SAM with a template.yaml . This way, I can pass event data to the handler, reference dependency layers (shared code libraries) and have a deployment manifest to create a Cloudformation stack (release instance with history and resource management).
Debugging can be a bit slow as it compiles, deploys to Docker and invokes, but allows step through debugging and variable inspection.
https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-using-debugging.html
While far from ideal, any console-printing actions would likely get logged to CloudWatch, which you could then access to go through printed data.
For local debugging, there are many Github projects with Dockerfiles which which you can build a docker container locally, just like AWS does when your Lambda is invoked.

Use Google App Engine or Google Cloud Compute VM to Test Run My App?

I'm moving my Three.js app and its customized node.js environment, which I've been running on my local machine to Google Cloud. I want to test things out there, and hopefully soon get some early alpha testing going with other people.
I'm not sure which is the wiser way to go... to upload the repo I've been running locally as-is onto a VM which users would then access via the VM's external IP until I get a good name to call this app... or merge my local node.js environment with what's available via the Google App Engine and run it on GAE.
Issues I'm running into with the linux VM approach... I'm not sure how to do the equivalent on the VM of what I've been doing locally. In Windows Powershell I cd into the app directory and then enter node index.js. I'm assuming by this method of deployment that I can get the app running as soon as the browser hits the external IP. I should mention too that the app will allow users to save content as well as upload images, and eventually, 3D models as well as json datasets.
Issues I'm running into with the App Engine approach: it looks like I only have access to a linux-based command line, and have to install all the node.js modules manually. Meanwhile I have a bunch of files to upload, both the server-side node files and all the frontend stuff. I don't see where to upload those files, and ultimately what I'd like to do is have access to a visual, editable file-tree interface, as I have in Windows and FileZilla, so I can swap files in and out, etc. Alternatively I suppose I could import a repo from Github? Github would be fine as long as I can visually see what's happening. Is there a visual interface for file structure available in GAE somewhere? Am I missing something?
I went through the GAE "Hello World" tutorial and that worked fine, but was left scratching my head afterward regarding how to actually see and edit the guts of the tutorial app, or even where to look for the files.
So first off, I want to determine what's the better approach, and then if possible, determine how to make the experience of getting my app up there and running a more visual, user-friendly experience.
Thanks.
There are many things to consider when choosing how to run an app, but my instinct for your use case is to simply use a VM on GCE. The most compelling reason for this is that it's the most similar thing to what you have now. You can SSH into the machine and run nohup node index.js & (or node index.js inside tmux/screen if you prefer) and it will start the app and not stop it when you log out of SSH. You can use SCP / SFTP with whatever GUI client you want to upload files. You don't have to learn anything new! If you wanted to, you could even use a Windows VM (although I think you have to pay a little more than for a comparable Linux VM due to the licensing fees).
That said, the other way is arguably more "correct" by modern development standards, but it will involve a lot more learning that will prevent you from getting your app running somewhere other than your laptop in the short term:
First, you'll need to learn about Docker and stateless containers, which is basically what your app runs inside of on AppEngine.
Next, you'll need to learn how to hook up a separate stateful service (database, file server, ...) to your app's container so you can store your files, etc. in it, and then probably rewrite your app somewhat to use it to store stuff.
Next, you'll probably want some way to automatically deploy this from code instead of manually doing it, which gets you into build systems, package managers, artifact storage, continuous integration systems, and on and on and on.
This latter path is certainly what you should choose for a long-running production service if you work with a big team of developers -- but that doesn't mean that it's necessarily the right path for your project today. If you don't care about scaling up automatically, load balancing between nodes, redundant copies of your app running in different regions in case there's a natural disaster, etc., then go with the easy way for now, and you can learn new ways to improve the service when they're actually needed.

Can I manually run scripts on App Engine?

Is there a way to run something like "node testscript.js" remotely?
If not, how do you test particular functions on App Engine? I can test them locally, but there are difference when running on App Engine.
If you want to run something in App Engine, you will have to deploy it, and whenever you make changes to the source code, you will have to redeploy it again to be able to run the updated code on App Engine. You should test your application in local thoroughly to be sure it will be working as expected when deployed.
With respect to the timeouts, please keep in mind that there are two environments: flexible and standard, where the timeout deadlines differ (60 sec for standard vs 60 min for flexible). Also, you can have long-running requests on App Engine standard if you use the manual scaling option.
You might also look at Cloud Functions, depending on what your scripts do. Some of the options to trigger Cloud Functions are HTTP requests or Direct Triggers.

How to run node.js script on ftp server?

I've developed a wallpaper app whose backend is written in node.js . Everything in development was fine then today I bought a vps server with cpanel for 3 months from a hosting service provider .But I'm a complete beginner in server management so now how to run this node.js script in my ftp server . Can anyone please guide me to run node.js script on server ?
It may sound like off-topic but please do help me to fix my problem.
I'm a complete beginner so please do excuse me if my question is foolish.
I think there's no support for using Node.js with cPanel. But I can recommend you to use for example Heroku for testing or hobby development. Heroku is free, very simple and cool for small projects.
If it's just a simple, single node.js script, I'd recommend running this as a lambda function – basically, use a service that will run this script at a given HTTP endpoint without needing to manage a server. This can be free depending on how much it's used. For example, here are some places to run it:
https://blog.cloudflare.com/cloudflare-workers-unleashed/
https://www.netlify.com/docs/functions/
https://aws.amazon.com/lambda/
As CenyGG suggested, Heroku is also a great place to do this and will be free if you don't need the app to be live 24/7 (it will go to sleep and take a few seconds to "wake up" if it hasn't been used in a while).

Resources