GCP App Engine use for non web applications - python-3.x

I have a use case where I'd like an app running on GCP on a schedule: every X hours my main.py would execute a function. I don't think I need a web app or Flask (which is what all the examples I've found use).
I did try the functions-framework; would that be an option within App Engine (i.e. using the functions-framework entrypoint as the app's entrypoint)?
Conceptually I don't know if App Engine is the right way forward, although it does look like the simplest option (excluding Cloud Functions, which I can't use because of their time limits).
Thanks!

You can use a Cloud Run Job (note that, at the time of writing, it's still in preview). As its documentation says:
Unlike a Cloud Run service, which listens for and serves requests, a Cloud Run job only runs its tasks and exits when finished. A job does not listen for or serve requests, and cannot accept arbitrary parameters at execution.
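Since a job just runs its process to completion and exits, the container entrypoint can be a plain script with no server at all. A minimal sketch (the work itself is a placeholder computation):

```python
# main.py - minimal sketch of a Cloud Run job entrypoint: no HTTP server,
# it just does its work and exits with a status code.
import sys

def run_task() -> int:
    # stand-in for the real periodic work
    return sum(range(10))

if __name__ == "__main__":
    print(f"task finished, result={run_task()}")
    sys.exit(0)
```

Cloud Scheduler can then trigger the job on your "every X hours" schedule.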
You can also still use App Engine (Python + Flask). With Cloud Scheduler, you schedule invocations of a URL on your web app. However, because your task is long-running, you should use Cloud Tasks, which allow you to run longer processes. Essentially, you'll have a two-step process:
a. Cloud Scheduler invokes a URL on your GAE app.
b. This URL in turn pushes a task onto your task queue, which then executes it. This is a blog article (with sample code) we wrote about using tasks in GAE. It's for Django, but you can easily adapt it to Flask.
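The two-step flow above can be sketched in Flask; this is only an outline, with the project, region, queue name, and route paths all placeholders, and the actual Cloud Tasks client calls left as comments since they require a deployed GAE app and the google-cloud-tasks package:

```python
# Sketch: a GAE endpoint Cloud Scheduler hits, which enqueues a Cloud Task
# targeting the long-running worker endpoint.
from flask import Flask

app = Flask(__name__)

def build_task(relative_uri: str) -> dict:
    # Task payload for an App Engine HTTP target
    return {"app_engine_http_request": {"http_method": "POST",
                                        "relative_uri": relative_uri}}

@app.route("/cron/start")
def start():
    # Uncomment when running on GCP (requires google-cloud-tasks):
    # from google.cloud import tasks_v2
    # client = tasks_v2.CloudTasksClient()
    # parent = client.queue_path("my-project", "us-central1", "default")
    # client.create_task(parent=parent, task=build_task("/tasks/run"))
    return "queued", 200

@app.route("/tasks/run", methods=["POST"])
def run():
    # long-running work happens here, up to the task deadline
    return "done", 200
```

Cloud Scheduler calls `/cron/start`, which returns quickly; the queued task then drives `/tasks/run` with a much longer deadline.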

If you just need to run some backend logic and then shut down until the next run, Cloud Functions is designed for exactly that.
You can set up a Cloud Scheduler job to invoke the function on a schedule.
Make sure to keep the function private (not publicly invokable), and configure a service account for Cloud Scheduler to use that has the rights to invoke it.
Be aware of the function configuration options so they fit your use case (https://cloud.google.com/functions/docs/configuring), as well as the limits (https://cloud.google.com/functions/quotas#resource_limits).
A good tutorial for implementing this: https://cloud.google.com/community/tutorials/using-scheduler-invoke-private-functions-oidc
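The function itself stays tiny in this setup; a sketch of an HTTP-triggered entrypoint in the functions-framework style (the body is a placeholder for the real work, and authentication is handled by IAM/OIDC outside the code):

```python
# Sketch: HTTP-triggered Cloud Function entrypoint. Cloud Scheduler invokes
# its URL with an OIDC token; IAM restricts who may call it, so the function
# body contains only the scheduled work.
def scheduled_job(request):
    total = sum(range(100))  # stand-in for the real periodic work
    return f"done: {total}", 200
```

Locally you can run this with the functions-framework and exercise it before wiring up the scheduler.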

Related

gcloud app deploy service restarts automatically

I recently deployed a Node JS app via
gcloud app deploy
Inside my code, I have a setInterval that triggers a function every hour. Unfortunately, the deployed server restarts automatically and, as a result, destroys my timer. Does anyone know how I could prevent auto-restarts for such a deployment with gcloud?
Thanks
The answer is to schedule this outside the GAE app itself; GAE instances are not meant to run in-process timers like this, since instances can be restarted or scaled down at any time. You should use cron jobs for this instead.
How to do this is very well documented.
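For reference, a GAE cron definition is just a small YAML file deployed alongside the app; a minimal sketch, where the URL and interval are placeholders for your own:

```yaml
# cron.yaml - deploy with `gcloud app deploy cron.yaml`; App Engine then
# issues a GET to the URL on the given schedule.
cron:
- description: "hourly job"
  url: /tasks/hourly
  schedule: every 1 hours
```

Your app then handles the work in the `/tasks/hourly` request handler instead of a setInterval.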
Another option would be to run your code on a GCE instance instead.

Prevent duplicate scheduled jobs on Azure App Service when scaling out

I have a Nuxt app deployed on Azure App Service that uses the cron library to run scheduled jobs. However, I found that if more than one instance is running, the scheduled jobs are duplicated. What is the proper way to handle this? Thanks!
If you have more than one instance of your app running, then you have more than one instance of cron running (I'm assuming you are referring to the npm module). Both will fire on the same schedule, since the schedule is coded into each instance, and if you scale beyond that you would have three, four, five copies of the job running.
There are a few options for running singleton jobs on a timer: adding a WebJob to your App Service, creating a Logic App, or running an Azure Function. For a basic JS script I would recommend the Function. You define the schedule in a JSON file (just like cron); since it's JS, you can probably copy your code over along with any npm modules you need; and you can set Configuration just as for web apps, so if your job needs to connect to storage or a database, you can keep connection strings and other settings there just like you do for your existing web app.
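The JSON schedule file mentioned above is the function's `function.json` binding definition; a minimal sketch for a timer trigger, where the NCRONTAB expression (seconds, minutes, hours, day, month, day-of-week) is an assumed hourly schedule you would replace with your own:

```json
{
  "bindings": [
    {
      "name": "myTimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 0 * * * *"
    }
  ]
}
```

Because only one Function invocation fires per schedule tick, scaling the App Service out no longer duplicates the job.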

Build an extensible system for scraping websites

Currently, I have a server running. Whenever I receive a request, I want some mechanism to start the scraping process on some other resource (preferably dynamically created), as I don't want to perform scraping on my main instance. Further, I don't want the other instance to keep running, and charging me, when I am not scraping data.
So, preferably a system that I can request to start scraping the site and close when it finishes.
Currently, I have looked at Google Cloud Functions, but they cap execution time at 9 minutes, so they won't fit my requirement, as scraping takes much longer than that. I have also looked at the AWS SDK; it lets you create VMs at runtime and terminate them, but I can't figure out how to push my API script onto the newly created AWS instance.
Further, the system should be extensible. Like I have many different scripts that scrape different websites. So, a robust solution would be ideal.
I am open to using any technology. Any help would be greatly appreciated. Thanks
I can't figure out how to push my API script onto the newly created AWS instance.
This is achieved by using UserData:
When you launch an instance in Amazon EC2, you have the option of passing user data to the instance that can be used to perform common automated configuration tasks and even run scripts after the instance starts.
So basically, you would construct your UserData to install your scripts and all their dependencies, and then run them. This is executed when new instances launch.
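A sketch of this with boto3, where the AMI id, instance type, and script paths are all placeholders; the UserData script runs at boot, does the scrape, and then powers the machine off so billing stops:

```python
# Sketch: launch a short-lived scraper VM whose UserData script does the
# work and then shuts the OS down.
USER_DATA = """#!/bin/bash
cd /opt/scraper && pip3 install -r requirements.txt
python3 run.py
shutdown -h now
"""

def launch_scraper(ec2, ami_id):
    # With InstanceInitiatedShutdownBehavior="terminate", the OS-level
    # `shutdown -h now` terminates the instance rather than just stopping it.
    return ec2.run_instances(
        ImageId=ami_id,
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        UserData=USER_DATA,
        InstanceInitiatedShutdownBehavior="terminate",
    )
```

In real use you would pass `boto3.client("ec2")` as `ec2`; for extensibility, different scrapers can be selected by parameterizing the UserData.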
If you want the system to be scalable, you can launch your instances in an Auto Scaling group and scale it up or down as you require.
The other option is running your scripts as Docker containers. For example using AWS Fargate.
By the way, AWS Lambda has a limit of 15 minutes, so not much more than Google Cloud Functions.

Can I manually run scripts on App Engine?

Is there a way to run something like "node testscript.js" remotely?
If not, how do you test particular functions on App Engine? I can test them locally, but there are differences when running on App Engine.
If you want to run something on App Engine, you will have to deploy it, and whenever you change the source code you will have to redeploy to run the updated code on App Engine. You should test your application locally and thoroughly to be sure it will work as expected when deployed.
With respect to timeouts, keep in mind that there are two environments, standard and flexible, whose request deadlines differ (60 seconds for standard vs. 60 minutes for flexible). Also, you can have long-running requests on App Engine standard if you use the manual scaling option.
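A sketch of what the manual-scaling option mentioned above looks like in a standard-environment app.yaml; the service name and runtime here are placeholders:

```yaml
# app.yaml fragment: manual scaling keeps resident instances, which lifts
# the automatic-scaling request deadline for this service.
service: worker
runtime: python39
manual_scaling:
  instances: 1
```

With this in place, requests to the `worker` service are not subject to the short automatic-scaling deadline.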
You might also look at Cloud Functions, depending on what your scripts do. Some of the options to trigger Cloud Functions are HTTP requests or Direct Triggers.

Cloud-based node.js console app needs to run once a day

I'm looking for what I assume is quite a standard solution: I have a Node app that doesn't do any web work; it simply runs, writes output to the console, and exits. I want to host it, preferably on Azure, and have it run once a day, ideally also logging the output or sending it to me.
The only solution I can find is to create a VM on Azure and set up a cron job; then I'd need to either fetch the debug logs daily or write Node code to email me the output. Is anything more efficient available?
Azure Functions would be worth investigating. It can be timer triggered and would avoid the overhead of a VM.
I would also investigate Azure Container Instances; this is a good match for your use case. You can run a container image with your Node app on an ACI instance. https://learn.microsoft.com/en-us/azure/container-instances/container-instances-tutorial-deploy-app
