Automating NodeJS scripts with Google Cloud Platform - node.js

My question is a request for clarification and/or anybody's previous experience with NodeJS and Google Cloud Platform (GCP).
I have developed numerous NodeJS scripts that read and transform several JSON sports feeds in order to populate a Google Firebase database backend.
The NodeJS scripts work exactly as desired, with the exception that I need to run the NodeJS script manually in order to populate the backend. I obviously want this to happen automatically, let's say at an interval of every 2 minutes.
I am unclear on how to achieve this. Does GCP offer a cron job that can execute my NodeJS script at a specific time interval? If so, how should I implement this?

If you are planning on using Compute Engine, you can just use a cron job, which comes with both the Debian and Red Hat Linux public images available within Google Cloud Platform.
You could create an entry like this to run the script every 2 minutes (for every 2 hours, use 0 */2 * * * as the schedule instead):
*/2 * * * * /usr/local/bin/node /home/example/script.js
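To make the schedule syntax concrete: a step value such as `*/2` in the first (minute) field means "every second minute". Here is a small JavaScript sketch that expands a minute field into the minutes it matches. The helper `expandMinuteField` is hypothetical and only handles `*`, `*/n`, and a single number; real cron accepts more syntax than this.

```javascript
// Hypothetical helper: expand a cron minute field into the list of matching minutes.
// Supports only "*", "*/n", and a single literal minute (a simplification of real cron).
function expandMinuteField(field) {
  const minutes = [];
  if (field === '*') {
    for (let m = 0; m < 60; m++) minutes.push(m);
    return minutes;
  }
  const step = /^\*\/(\d+)$/.exec(field);
  if (step) {
    const n = Number(step[1]);
    if (n < 1) throw new Error(`invalid step: ${field}`);
    // Steps are counted from minute 0, so "*/2" matches 0, 2, 4, ...
    for (let m = 0; m < 60; m += n) minutes.push(m);
    return minutes;
  }
  if (/^\d+$/.test(field) && Number(field) < 60) return [Number(field)];
  throw new Error(`unsupported minute field: ${field}`);
}

// Print the first few minutes matched by "*/2".
console.log(expandMinuteField('*/2').slice(0, 5));
```

Note that a `*` left in the minute field combined with `*/2` in the hour field would fire every minute during every second hour, which is a common mistake when an hourly schedule is intended.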

Here are two examples of how to do this using cron and App Engine:
https://github.com/firebase/functions-cron
https://mhaligowski.github.io/blog/2017/05/25/scheduled-cloud-function-execution.html
The basic idea is the same: one App Engine app for cron, where you tell it which URL to request and at what frequency. What is serving at that URL is immaterial to cron; you would have your NodeJS app in an App Engine instance, serving URLs that match those given to cron. The cron portion of the examples is independent of language, since it only issues HTTP requests.
So the steps for you would be:
1. Set up your NodeJS app in GAE the standard way (regardless of the fact that you want your app URLs called at intervals).
2. Set up your cron app in GAE as explained in those examples.
3. Notice your NodeJS app from step 1 being called as you specified in step 2!
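As a rough illustration of the NodeJS side of those steps, here is a minimal sketch of a handler for the URL that cron requests. The path `/tasks/import` and the function `runFeedImport` are hypothetical placeholders for your own route and script logic. The check relies on the `X-Appengine-Cron: true` header that App Engine adds to cron requests.

```javascript
// Hypothetical stand-in for the existing script logic (read feeds, write to the database).
async function runFeedImport() {
  return 'import finished';
}

// App Engine sets "X-Appengine-Cron: true" on cron requests and strips the header
// from external traffic, so it can be used to reject outside callers.
// Node lowercases incoming header names.
function isCronRequest(headers) {
  return headers['x-appengine-cron'] === 'true';
}

// Compatible with the handler signature of Node's http.createServer().
async function handleCron(req, res) {
  if (req.url !== '/tasks/import' || !isCronRequest(req.headers)) {
    res.statusCode = 403;
    res.end('forbidden');
    return;
  }
  try {
    const message = await runFeedImport();
    res.statusCode = 200;
    res.end(message);
  } catch (err) {
    res.statusCode = 500; // a non-2xx status marks the cron run as failed
    res.end(String(err));
  }
}

// In the deployed app (not started here):
// require('http').createServer(handleCron).listen(process.env.PORT || 8080);
```

The matching cron configuration would then point at the same path; the examples linked above show that part.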

Related

GCP App Engine use for non web applications

I have a use case where I'd like to have an app running on GCP on a schedule. Every X hours my main.py would execute a function, but I don't think I need a web app or Flask (which is what the examples I've found use).
I did try to use the functions-framework. Would this be an option within App Engine (with the functions-framework entrypoint as the entrypoint for the app)?
Conceptually, I don't know if App Engine is the right way forward, although it does look like the simplest option (excluding Cloud Functions, which I can't use because of the time restrictions).
Thanks!
You can use a Cloud Run Job (note that it's still in preview). As its documentation says:
Unlike a Cloud Run service, which listens for and serves requests, a Cloud Run job only runs its tasks and exits when finished. A job does not listen for or serve requests, and cannot accept arbitrary parameters at execution.
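To show what "runs its tasks and exits" looks like in code, here is a minimal sketch of a job entrypoint, written in JavaScript to match the other examples on this page (the same shape applies to a Python main.py). `CLOUD_RUN_TASK_INDEX` and `CLOUD_RUN_TASK_COUNT` are environment variables Cloud Run sets for each task; the item list and `processItem` are hypothetical.

```javascript
// Split the work across tasks: each task handles every taskCount-th item,
// starting at its own index.
function itemsForTask(items, taskIndex, taskCount) {
  return items.filter((_, i) => i % taskCount === taskIndex);
}

// Run this task's share of the work. There is no server and no request handling.
async function runJob(env, items, processItem) {
  const taskIndex = Number(env.CLOUD_RUN_TASK_INDEX || 0);
  const taskCount = Number(env.CLOUD_RUN_TASK_COUNT || 1);
  for (const item of itemsForTask(items, taskIndex, taskCount)) {
    await processItem(item);
  }
}

// In the container's main script (not executed here), exit explicitly so the
// task is reported as succeeded (0) or failed (non-zero):
// runJob(process.env, allItems, processItem)
//   .then(() => process.exit(0))
//   .catch((err) => { console.error(err); process.exit(1); });
```

Since a job cannot take arbitrary parameters at execution, any configuration would have to come from environment variables or an external store.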
You can also still use App Engine (Python + Flask). Using Cloud Scheduler, you schedule a call to a URL of your web app. However, because your task is long running, you should use Cloud Tasks, which allows you to run longer processes. Essentially, you'll have a 2-step process:
a. Cloud Scheduler invokes a URL on your GAE app.
b. This URL in turn pushes a task into your task queue, which then executes the task. This is a blog article (with sample code) we wrote for using tasks in GAE. It's for Django, but you can easily replace it with Flask.
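For step (b), the task pushed onto the queue is essentially a description of an HTTP request that Cloud Tasks will later send to your app. Here is a sketch of building that description, in JavaScript to stay consistent with the other examples here. The path `/worker` and the payload are hypothetical; the object shape follows the `appEngineHttpRequest` task format used by the `@google-cloud/tasks` client.

```javascript
// Build a Cloud Tasks task that targets an App Engine handler.
// relativeUri is the worker URL in your app (a hypothetical "/worker" below).
function buildAppEngineTask(relativeUri, payload) {
  return {
    appEngineHttpRequest: {
      httpMethod: 'POST',
      relativeUri,
      headers: { 'Content-Type': 'application/json' },
      // The request body is sent as bytes; base64 is used to encode it here.
      body: Buffer.from(JSON.stringify(payload)).toString('base64'),
    },
  };
}

// With the @google-cloud/tasks client, the object would be submitted roughly as
// (project, location, and queue are values you would supply; not executed here):
// const client = new CloudTasksClient();
// await client.createTask({ parent: client.queuePath(project, location, queue), task });
const task = buildAppEngineTask('/worker', { job: 'example' });
console.log(task.appEngineHttpRequest.relativeUri);
```

The handler called by Cloud Scheduler in step (a) would only create this task and return quickly, leaving the long-running part to the worker URL.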
If you just need to run some backend logic and then shut down until the next run, Cloud Functions is made for that.
You can set up a Cloud Scheduler job to invoke the function on a time basis.
Make sure to keep the function private (not publicly invokable from the internet), and configure a service account for Cloud Scheduler to use that has the rights to invoke the private function.
Be aware of the function configuration options to fit your use case https://cloud.google.com/functions/docs/configuring , as well as the limits https://cloud.google.com/functions/quotas#resource_limits
Good tutorial to implement it: https://cloud.google.com/community/tutorials/using-scheduler-invoke-private-functions-oidc
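For reference, an HTTP-triggered function that Cloud Scheduler invokes can be quite small. The following is only a sketch, in JavaScript for consistency with the rest of this page: `makeScheduledJob` and the work function passed to it are hypothetical names, and the `req`/`res` objects are the Express-style ones that Cloud Functions passes to HTTP functions.

```javascript
// Wrap some backend work in an HTTP function handler.
// "work" is a hypothetical async function containing the real logic.
function makeScheduledJob(work) {
  return async function scheduledJob(req, res) {
    try {
      const result = await work();
      res.status(200).send(JSON.stringify(result));
    } catch (err) {
      console.error(err);
      // A 5xx response lets the scheduler record the run as failed.
      res.status(500).send('job failed');
    }
  };
}

// In a deployed function this would be exported under the entry point name, e.g.:
// exports.scheduledJob = makeScheduledJob(doBackendWork);
```

Authentication is not handled in the code: with a private function, the platform rejects callers that do not present a valid identity token, which is what the service account on the scheduler job is for.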

How to deploy a Cron Job Node Js Script to Google Cloud?

I have a NodeJS script that is run by a cron job using the node-cron module.
The purpose of this NodeJS script is to loop over items in my MongoDB and run some function.
Is it possible to deploy this NodeJS script/app to GCP and have it run every Sunday?
In the cron job config in my NodeJS app, I already have it run only every Sunday.
However, I was wondering whether I could use GCP's scheduler or should just keep the cron job in my NodeJS app.
I've achieved this before by using Heroku Scheduler; however, I have been having problems deploying Puppeteer to Heroku, so I am using GCP, since Puppeteer works fine in the Google Cloud NodeJS environment.
If anyone can give me some insight or instructions on what I have to do, I would appreciate it.
Thank you
What you are trying to achieve could be done by setting up a MongoDB Atlas with Google Cloud. Here you can find the documentation.
Then, you could use Cloud Scheduler and Pub/Sub to trigger a Cloud Function (in NodeJS, like your script). Here is an example tutorial.
Then, in order to connect your Cloud Function to your MongoDB cluster, this detailed guide will show you how to do so.
This should give you enough to start searching for more information by yourself. Keep in mind that there are alternatives. For example, instead of using MongoDB, you could use Firestore with your Cloud Functions and set the cron schedule with Pub/Sub as previously mentioned.
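To give an idea of the function side, here is a minimal sketch of a Pub/Sub-triggered function. It assumes a Cloud Scheduler job with the schedule `0 0 * * 0` (00:00 every Sunday) publishing to a topic of your choosing; `makeWeeklyJob` and `processItems` are hypothetical names standing in for your loop over the MongoDB items.

```javascript
// Pub/Sub delivers the message payload base64-encoded in message.data.
function decodeMessage(message) {
  return message && message.data
    ? Buffer.from(message.data, 'base64').toString('utf8')
    : '';
}

// Build the background function. processItems is a hypothetical async function
// containing the existing "loop over items and run some function" logic.
function makeWeeklyJob(processItems) {
  return async function weeklyJob(message, context) {
    const payload = decodeMessage(message);
    await processItems(payload);
  };
}

// In a deployed function this would be exported under the entry point name, e.g.:
// exports.weeklyJob = makeWeeklyJob(processItems);
```

With this setup the schedule lives in Cloud Scheduler, so the node-cron schedule inside the app would no longer be needed; the function only runs when a message arrives.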

How do I run puppeteer on a server/in the cloud

Feels like I've searched the entire web for an answer...to no avail. I have a puppeteer script that works perfectly locally. My local machine is a little unreliable, so I've been trying to push this script to the cloud so that it can run there. But I have no idea where to start. I'm sitting here with an IBM cloud account with no idea what to do. Can anyone help me out?
Running Puppeteer scripts can be done on any cloud platform that
exposes a Node.js environment
enables running a browser (Puppeteer will need to start Chromium)
This could be achieved, for example, using AWS EC2.
AWS Lambda, Google Cloud Functions and IBM Cloud Functions (and similar services) might also work but they might need additional work on your side to get the browser running.
For a step-by-step guide, I would suggest checking out this article and this follow-up.
Also, it might just be easier to look into services like Checkly (disclaimer: I work for Checkly), Browserless and similar (a quick search for something along the lines of "run puppeteer online" will return several of those), which allow you to run Puppeteer checks online without requiring any additional setup. Useful if you are serious about using Puppeteer for testing or synthetic monitoring in the long run.
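As a hint of what the "additional work" to get the browser running often looks like, here is a sketch of launch options that are commonly needed on servers and in containers. The helper `buildLaunchOptions` and the `CHROME_PATH` variable name are assumptions for illustration; the flags are standard Chromium switches, and whether you need them depends on the platform.

```javascript
// Hypothetical helper that builds an options object for puppeteer.launch().
function buildLaunchOptions(env) {
  const options = {
    headless: true, // no display is available on a typical server
    args: [
      '--no-sandbox',            // often required when Chromium runs as root in a container
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage', // avoids the small /dev/shm found in many containers
    ],
  };
  // CHROME_PATH is an assumed variable name pointing at a system-installed browser.
  if (env.CHROME_PATH) options.executablePath = env.CHROME_PATH;
  return options;
}

// Usage in the real script (requires the puppeteer package; not run here):
// const browser = await require('puppeteer').launch(buildLaunchOptions(process.env));
```

Disabling the sandbox reduces isolation, so it is best kept to environments where the pages being visited are trusted or the container itself is the isolation boundary.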

Build an extensible system for scraping websites

Currently, I have a server running. Whenever I receive a request, I want some mechanism to start the scraping process on some other resource (preferably dynamically created), as I don't want to perform scraping on my main instance. Further, I don't want the other instance to keep running and charging me when I am not scraping data.
So, preferably a system that I can ask to start scraping the site and that shuts down when it finishes.
Currently, I have looked at Google Cloud Functions, but they have a cap of 9 minutes per function, so they won't fit my requirement, as scraping would take much more time than that. I have also looked at the AWS SDK, which allows us to create VMs at runtime and also shut them down, but I can't figure out how to push my API script onto the newly created AWS instance.
Further, the system should be extensible, as I have many different scripts that scrape different websites. So, a robust solution would be ideal.
I am open to using any technology. Any help would be greatly appreciated. Thanks
"I can't figure out how to push my API script onto the newly created AWS instance."
This is achieved by using UserData:
"When you launch an instance in Amazon EC2, you have the option of passing user data to the instance that can be used to perform common automated configuration tasks and even run scripts after the instance starts."
So basically, you would construct your UserData to install your scripts, all dependencies and run them. This would be executed when new instances are launched.
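As a rough sketch of constructing such a UserData value from NodeJS: the S3 location and file paths below are hypothetical, and the script assumes an image that already has Node.js and the AWS CLI installed plus an instance role that can read the bucket. The EC2 RunInstances API expects UserData as a base64-encoded string.

```javascript
// Build a user-data script that fetches a scraper, runs it, then powers off.
// scriptS3Uri is a hypothetical location such as "s3://my-bucket/scrapers/site-a.js".
function buildUserData(scriptS3Uri) {
  const script = [
    '#!/bin/bash',
    `aws s3 cp ${scriptS3Uri} /opt/scraper.js`,
    'node /opt/scraper.js',
    // Runs whether or not the scraper succeeded, so the instance does not keep running.
    'shutdown -h now',
  ].join('\n');
  // RunInstances takes UserData as base64-encoded text.
  return Buffer.from(script).toString('base64');
}

console.log(buildUserData('s3://example-bucket/scraper.js').length > 0);
```

Passing a different script location per launch keeps the setup extensible to many scrapers. If the instance is launched with `InstanceInitiatedShutdownBehavior` set to `terminate`, the final `shutdown` terminates it rather than just stopping it.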
If you want the system to be scalable, you can launch your instances in an Auto Scaling group and scale it up or down as you require.
The other option is running your scripts as Docker containers, for example using AWS Fargate.
By the way, AWS Lambda has a limit of 15 minutes, so not much more than Google Cloud Functions.

Can I manually run scripts on App Engine?

Is there a way to run something like "node testscript.js" remotely?
If not, how do you test particular functions on App Engine? I can test them locally, but there are differences when running on App Engine.
If you want to run something in App Engine, you will have to deploy it, and whenever you make changes to the source code, you will have to redeploy it to run the updated code on App Engine. You should test your application thoroughly locally to be sure it will work as expected when deployed.
With respect to timeouts, please keep in mind that there are two environments, flexible and standard, whose request deadlines differ (60 sec for standard vs 60 min for flexible). Also, you can have long-running requests on App Engine standard if you use the manual scaling option.
You might also look at Cloud Functions, depending on what your scripts do. Some of the options to trigger Cloud Functions are HTTP requests or Direct Triggers.

Resources