How to run scripts for more than 15 minutes in App Engine - Node.js

I wrote a Node script that fetches data about WP.org themes and plugins. The theme script takes around 4-5 hours to complete (scraping and inserting data into BigQuery).
The problem arose when I deployed the script to Google App Engine: it works fine for 15 minutes, then it stops. Is there any way to increase the execution time of scripts in App Engine?
These scripts will run weekly or fortnightly and need to run until they are done, but App Engine stops them after 15 minutes. They work fine on my localhost, so it's not an issue with Node.

The maximum allowed run time of a request depends on the scaling type you select, so it sounds like you will need to create a separate service for this task with the scaling type set to Basic or Manual.
https://cloud.google.com/appengine/docs/standard/nodejs/how-instances-are-managed#scaling_types
You could also try breaking your task up into multiple 10-minute tasks and chaining them together.
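As a rough illustration of the chaining idea, the sketch below processes a list in time-boxed chunks and returns a cursor so the next request can pick up where the previous one stopped. The names runChunk, handleItem, and budgetMs are made up for this example, and how the next chunk gets triggered (Cloud Tasks, a cron entry, and so on) is deliberately left out.

```javascript
// Process items starting at `cursor` until the time budget is spent.
// Always handles at least one item, so a chain of chunks makes progress.
// `now` is injectable only so the function can be exercised with a fake clock.
async function runChunk(items, cursor, budgetMs, handleItem, now = Date.now) {
  const deadline = now() + budgetMs;
  let i = cursor;
  while (i < items.length) {
    await handleItem(items[i]);
    i += 1;
    if (now() >= deadline) break; // budget used up: stop and hand over
  }
  // nextCursor is what the next chained task should resume from.
  return { nextCursor: i, done: i >= items.length };
}
```

Each request would call runChunk with the stored cursor and a budget safely under the request limit, persist nextCursor somewhere durable, and trigger the next request until done is true.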

Related

Deploying Trading Bots on Google Cloud with Cloud Scheduler

I have 5 trading bots and, along with these, a script that updates prices by scraping. All of these files use Node.js. I was able to deploy all 6 scripts on DigitalOcean, but because the 6 scripts run together as 6 different processes, CPU usage hit 100% even on their most expensive plan. I then decided to move to Google Cloud, but it turns out that with a GPU it is very expensive.
Essentially, what I want to do is run the 6 scripts at 3 distinct times a day for 10 minutes. Outside those times the 6 scripts do nothing.
I have a file named concurrently.js that runs all these scripts using the concurrently command.
Is it possible to run concurrently.js at 3 particular times of the day and then, after 10 minutes when the job is done, shut down the virtual machine?
Say the machine turns on at 12.00 pm, the 6 files work for 10 minutes, and the machine shuts off at 12.10 pm. Then it turns on again at, say, 3.05 pm, and so on.
If I can schedule the VM to turn on and off, I can afford Google Cloud.
I learned about cron and Google Cloud Scheduler, but they need an app URL to schedule tasks. I don't have an app URL because I don't have an app; I just want to run the concurrently.js file that sits on the virtual machine along with the other files. Can I do the scheduling?
Any help is highly appreciated!!!
You can do this with Google Cloud. Here is the process:
Cloud Scheduler starts your Compute Engine VM.
At startup, the Compute Engine VM runs a startup script that runs your process.
At the end of the process, the VM shuts itself down.
So for that you need to:
Call the Compute Engine start API.
Set a startup script on your VM.
Shut down the VM automatically at the end of the processing.
If you are stuck on one step, let me know and I can narrow my help.
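The last two steps could look roughly like the sketch below, which the VM's startup script might invoke with node. It runs the job and then issues a shutdown command, even if the job fails. The helper names run and runThenShutdown are hypothetical, and the shutdown command shown in the usage comment assumes a Linux VM where the script has permission to power off the machine.

```javascript
const { spawn } = require('child_process');

// Run a command, streaming its output, and resolve with its exit code.
function run(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: 'inherit' });
    child.on('error', reject);
    child.on('close', (code) => resolve(code));
  });
}

// Run the job, then always run the shutdown command, even when the job throws.
// `runner` is injectable so the flow can be tried without touching a real VM.
async function runThenShutdown(job, shutdown, runner = run) {
  try {
    return await runner(job.command, job.args);
  } finally {
    await runner(shutdown.command, shutdown.args);
  }
}

// Example wiring (left as a comment so loading this file never powers anything off):
// runThenShutdown(
//   { command: 'node', args: ['concurrently.js'] },
//   { command: 'shutdown', args: ['-h', 'now'] }
// );
```

Putting the shutdown in a finally block means a crashing bot does not leave the VM running and billing until the next scheduled start.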

Delayed or Random Tasks with Google Cloud Scheduler

I'm currently running 2 scripts on a weekly schedule on a Raspberry Pi with the following configuration:
Cron executes a Python script at a fixed time each week. This Python script waits between 0 and 50 hours, then runs Python script A. It waits about 16 hours, then runs script A 3 more times, once every 8 hours (the script takes about 4x longer on its first run). 8 hours after the 4th run, it runs script B.
I would like to move my scripts to a Google Cloud VM for improved reliability, but running the VM 24/7 just to run 30 hours' worth of computations over a 100-hour period is inefficient and expensive.
I know I can use Cloud Scheduler as my cron to start the VM weekly, but I still risk letting it run for up to 50 hours while waiting for script A to run. I understand cron supports adding a random sleep interval, as in this example:
30 8-21/* * * * sleep ${RANDOM:0:2}m ; /path/to/script.php
However, from what I've discovered, Google Cloud Scheduler is limited to 60 minutes, and rightfully so. In this case, what are my options? Does Cloud Tasks support delayed triggering of a VM (up to 50 hours)? Is this something Pub/Sub would support instead?
My scripts use a Python library that I don't think is compatible with Google App Engine, so I would also need to figure out how to trigger a specific script in the VM when the trigger fires.
You can use Cloud Scheduler and Pub/Sub to trigger a Cloud Function that starts your VM and executes your script. If you do not want your Compute Engine instance running 24/7, you can have a Cloud Function stop the VM once your script finishes.
You can find how to schedule Compute Engine instances with Cloud Scheduler, and how to use functions in Cloud Functions [3] to start and stop your Compute Engine instance, in [1].
Most importantly, here is the documentation on how to use Cloud Scheduler and Pub/Sub to trigger a Cloud Function [2].
[1] https://cloud.google.com/scheduler/docs/start-and-stop-compute-engine-instances-on-a-schedule
[2] https://cloud.google.com/scheduler/docs/tut-pub-sub
[3] Cloud Functions: https://cloud.google.com/functions/docs/concepts/overview
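
A minimal sketch of the Pub/Sub-triggered function described above is shown here. It assumes the Pub/Sub message is JSON with project, zone, and instance fields (a payload shape chosen for this example) and that the client passed in exposes a start({ project, zone, instance }) method, as the InstancesClient in the @google-cloud/compute package does. The names parsePayload and makeStartInstanceHandler are hypothetical.

```javascript
// Decode the Pub/Sub message body, which arrives base64-encoded in event.data.
function parsePayload(event) {
  const payload = JSON.parse(Buffer.from(event.data, 'base64').toString('utf8'));
  for (const key of ['project', 'zone', 'instance']) {
    if (!payload[key]) throw new Error(`Missing "${key}" in Pub/Sub payload`);
  }
  return payload;
}

// Build the function body around an injected Compute Engine client so the
// logic can be tried locally with a stub instead of real credentials.
function makeStartInstanceHandler(instancesClient) {
  return async function startInstance(event) {
    const { project, zone, instance } = parsePayload(event);
    await instancesClient.start({ project, zone, instance });
    return `Start requested for ${instance} in ${zone}`;
  };
}

// Deployment wiring (assumes @google-cloud/compute is installed):
// const compute = require('@google-cloud/compute');
// exports.startInstancePubSub = makeStartInstanceHandler(new compute.InstancesClient());
```

A matching stop handler would look the same with the client's stop call, and the script to run on boot would still be configured on the VM itself rather than in the function.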

Is it possible to use the Node.js App Engine Standard Environment with basic scaling to execute a long-running task?

I would like to perform a video processing task that can take a long time to complete.
I had thought of using Cloud Functions, but I found that a function can run for a maximum of 540 seconds.
Browsing the internet, I found that App Engine can be used to execute long-running processes.
I need the 'scale to zero' functionality, so I cannot use the Flexible environment.
On https://cloud.google.com/appengine/docs/the-appengine-environments, I find that the 'Maximum request timeout' in the standard environment is 60 seconds.
Is there a way to execute a long-running task in the standard environment?
You can use Cloud Tasks. From the documentation:
"All workers must send an HTTP response code (200-299) to the Cloud Tasks service before a deadline based on the instance scaling type of the service: 10 minutes for automatic scaling or up to 24 hours for manual scaling."
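For illustration, the sketch below builds the kind of task object that could be handed to Cloud Tasks for an App Engine worker. The field names (appEngineHttpRequest, httpMethod, relativeUri, headers, body) follow the Cloud Tasks task resource, while the helper name buildVideoTask, the /process-video route, and the payload shape are assumptions made up for this example. Sending the task to a queue with a client library is not shown.

```javascript
// Build a Cloud Tasks task that POSTs a JSON payload to an App Engine handler.
// The body is base64-encoded, which is how the task resource carries bytes.
function buildVideoTask(relativeUri, payload) {
  return {
    appEngineHttpRequest: {
      httpMethod: 'POST',
      relativeUri,
      headers: { 'Content-Type': 'application/json' },
      body: Buffer.from(JSON.stringify(payload)).toString('base64'),
    },
  };
}

// Hypothetical usage: the route and payload are placeholders.
const task = buildVideoTask('/process-video', { videoId: 'example-id' });
```

The worker behind the route then has until the deadline quoted above to finish and return a 2xx status.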

Allow users to set up schedule for server-side scripts to run in Node

I'm creating a project in Node & Express that allows users to schedule the server to run test scripts, e.g. once every ten minutes. I looked into node-schedule, which looks great; however, it seems that all scheduled tasks disappear if the server ever restarts Node.
Cron looks good too, but it has the problem that it doesn't seem to have a way to delete scheduled tasks after they have been set up.
If you were doing this, how would you go about it? I really don't want anything complex; I just need to schedule tasks, be able to delete individual tasks, and keep tasks in the event of a server reboot.
The simplest solution is to store the cron configurations in a database (since cron takes a string as a parameter) and load the jobs from the database every time the app starts.
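A minimal sketch of that approach follows, with a JSON file standing in for the database to keep the example self-contained. JobStore and Scheduler are hypothetical names, and the schedule callback is assumed to wrap whatever cron library you use and to return a handle with a stop() method.

```javascript
const fs = require('fs');

// Persists { jobId: cronExpression } pairs. A JSON file stands in for the database.
class JobStore {
  constructor(filePath) { this.filePath = filePath; }
  load() {
    if (!fs.existsSync(this.filePath)) return {};
    return JSON.parse(fs.readFileSync(this.filePath, 'utf8'));
  }
  save(jobs) { fs.writeFileSync(this.filePath, JSON.stringify(jobs, null, 2)); }
}

class Scheduler {
  // schedule: (cronExpression, jobId) => handle with stop(); supplied by the caller.
  constructor(store, schedule) {
    this.store = store;
    this.schedule = schedule;
    this.running = new Map();
  }
  // Call once at app start to re-create every persisted job.
  restore() {
    for (const [id, expr] of Object.entries(this.store.load())) {
      this.running.set(id, this.schedule(expr, id));
    }
  }
  // Persist the job first, then start it. An existing job with the same id is replaced.
  add(id, expr) {
    this.remove(id);
    const jobs = this.store.load();
    jobs[id] = expr;
    this.store.save(jobs);
    this.running.set(id, this.schedule(expr, id));
  }
  // Stop the live job (if any) and delete it from storage.
  remove(id) {
    const handle = this.running.get(id);
    if (handle) handle.stop();
    this.running.delete(id);
    const jobs = this.store.load();
    delete jobs[id];
    this.store.save(jobs);
  }
}
```

Keeping the handle returned by the scheduling library is what makes deleting an individual task possible, and calling restore() at startup is what brings the jobs back after a reboot.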

How to parallelize crontab executions to increase user base for web app based on mongodb and mysql?

I have a Symfony-based web application that runs on a MongoDB and MySQL backend. The principle of the application is that, for each user, a Python script runs 4-12 times a day via cron jobs and populates the MySQL and MongoDB databases. The script takes between 1.5 and 2 minutes to execute. At the moment the cron jobs run sequentially: the system executes one job and waits for it to end before executing the next one. The moment my web application gets a new user, the cron jobs are created automatically for a set duration. With 24 hours in a day I can run only a limited number of cron jobs, and therefore support only a limited number of users (around 250-300).
What would I need to do if I wanted to host anywhere from 1,000 to a million users on my web application? Can I run my script on a multithreaded basis? That is, instead of waiting for a job to finish, launch hundreds of jobs at the same time. This way I can grow my user base exponentially.
But what concurrency will MongoDB and MySQL be able to sustain? How many jobs can I execute in parallel? What system factors do I need to consider to grow my user base? Do I need to add more machines to my application?
