How can I execute asynchronous tasks in the background as scheduled in Google Cloud Platform? - python-3.x

Problem
I want to fetch a large amount of game data at 9 o'clock every morning, so I currently use App Engine with a cron job. However, I would like to add Cloud Tasks, and I don't know how to do this.
Question
How can I execute asynchronous tasks in the background as scheduled in Google Cloud Platform?
Which is more natural to implement: (Cloud Scheduler + Cloud Tasks) or (cron job + Cloud Tasks)?
Development Environment
App Engine Python (Flexible environment).
Python 3.6
Best regards,

Cloud Tasks are asynchronous by design. As you mentioned, the best way would be to pair them with Cloud Scheduler.
First of all, since Cloud Scheduler needs either a Pub/Sub topic or an HTTP endpoint to call when it runs a job, I recommend creating an App Engine handler that Cloud Scheduler will call, and which in turn creates and sends the task.
You can do so by following this documentation. First you will have to create a queue, and afterwards I recommend deploying a simple application that has a handler to create the tasks. A small example:
from google.cloud import tasks_v2beta3
from flask import Flask, request

app = Flask(__name__)

@app.route('/createTask', methods=['POST'])
def create_task_handler():
    client = tasks_v2beta3.CloudTasksClient()
    parent = client.queue_path('YOUR_PROJECT', 'PROJECT_LOCATION', 'YOUR_QUEUE_NAME')
    task = {
        'app_engine_http_request': {
            'http_method': 'POST',
            'relative_uri': '/handler_to_call'
        }
    }
    response = client.create_task(parent, task)
    return 'Created task: {}'.format(response.name)
Here 'relative_uri' is the path of the handler that the task will call to process your data; a minimal sketch of such a handler follows.
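A minimal sketch of that worker handler, assuming it is added to the same Flask application shown above (imports are repeated so the snippet stands alone; fetch_game_data is a hypothetical placeholder for your actual data-gathering logic):

from flask import Flask, request

app = Flask(__name__)  # in practice, reuse the app object from the snippet above

@app.route('/handler_to_call', methods=['POST'])
def handle_task():
    # Cloud Tasks delivers the task as an HTTP POST to this route.
    payload = request.get_data(as_text=True)
    fetch_game_data(payload)
    # A 2xx response tells Cloud Tasks the task succeeded; anything else triggers a retry.
    return 'OK', 200

def fetch_game_data(payload):
    # Hypothetical placeholder for the real data-gathering work.
    pass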
Once that is done, follow the Cloud Scheduler documentation to create a job: specify the target as App Engine HTTP, set the URL to '/createTask', the service to whichever service handles that URL, and the HTTP method to POST. Fill in the rest of the parameters as required; since you want the data every morning at 9, a frequency such as the unix-cron expression '0 9 * * *' (daily at 09:00) will do.
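If you prefer to create the scheduler job programmatically rather than through the console, a rough sketch with the google-cloud-scheduler client library might look like this (project, location, service name, schedule and time zone are placeholders to substitute):

from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
parent = client.location_path('YOUR_PROJECT', 'PROJECT_LOCATION')

job = {
    'app_engine_http_target': {
        'http_method': 'POST',
        'relative_uri': '/createTask',
        # Route the request to the service that serves /createTask.
        'app_engine_routing': {'service': 'default'},
    },
    # unix-cron: every day at 09:00 in the given time zone.
    'schedule': '0 9 * * *',
    'time_zone': 'Etc/UTC',
}

response = client.create_job(parent, job)
print('Created job: {}'.format(response.name))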

Related

Waiting for an azure function durable orchestration to complete

Currently working on a project where I'm using the storage queue to pick up items for processing. The Storage Queue triggered function picks up the item from the queue and starts a durable orchestration. According to the documentation, the storage queue trigger picks up 16 messages in parallel for processing by default (https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue), but since starting the orchestration is a simple and quick operation, if I have a lot of messages in the queue I will end up with a lot of orchestrations running at the same time. I would like to be able to start the orchestration and wait for it to complete before the next batch of messages is picked up for processing, in order to avoid overloading my systems. The solution I came up with, which seems to work, is:
public class QueueTrigger
{
    [FunctionName(nameof(QueueTrigger))]
    public async Task Run(
        [QueueTrigger("queue-processing-test", Connection = "AzureWebJobsStorage")] Activity activity,
        [DurableClient] IDurableOrchestrationClient starter,
        ILogger log)
    {
        log.LogInformation($"C# Queue trigger function processed: {activity.ActivityId}");
        string instanceId = await starter.StartNewAsync<Activity>(nameof(ActivityProcessingOrchestrator), activity);
        log.LogInformation($"Started orchestration with ID = '{instanceId}'.");
        var status = await starter.GetStatusAsync(instanceId);
        do
        {
            status = await starter.GetStatusAsync(instanceId);
        } while (status.RuntimeStatus == OrchestrationRuntimeStatus.Running || status.RuntimeStatus == OrchestrationRuntimeStatus.Pending);
    }
}
which basically picks up the message, starts the orchestration, and then waits in a do/while loop while the status is Pending or Running.
Am I missing something here, or is there a better way of doing this? (I could not find much online.)
Thanks in advance for your comments or suggestions!
This might not work, since you could either hit timeouts (causing duplicate orchestration runs) or just force your function app to scale out, defeating the purpose of your code altogether.
Instead, you could rely on the concurrency throttles that Durable Functions come with. While the queue trigger would queue up orchestration runs, only the defined maximum would run at any time on a single instance of the function app.
This would still cause your function app to scale out, so you would have to consider that as well when setting this limit; you could also set the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting to control how many instances your function app can scale out to.
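For reference, the Durable Functions concurrency throttles mentioned above are configured in host.json; a minimal sketch (the limit of 5 is purely illustrative, not a recommendation):

{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentOrchestratorFunctions": 5,
      "maxConcurrentActivityFunctions": 5
    }
  }
}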
It could be that the function app's built-in scaling throttling does not reduce load on downstream services, because it is per app and will just cause the app to scale out more. In that case what is needed is a distributed max instance count that all app instances adhere to. I have built this functionality into my Durable Functions orchestration app with a scaleGroupId and its max instance count. It has an API call to save this info, and the scaleGroupId is a string that can be set to anything that describes the resource you want to protect from overloading. Here is my app that can do this:
Microflow

How to use callbacks for long-running Azure Functions in a DevOps pipeline?

I have an Azure DevOps release pipeline that deploys a Python Azure function and then invokes it. The Python function does some heavy lifting, so it takes a few minutes to execute.
There are two options for the completion event for the Invoke Azure Function task: Api Response and Callback.
The maximum response time when using Api Response is 20 seconds, so I need to use Callback. OK, fine. Using this documentation, I implemented an Azure function that returns an HTTPResponse immediately, and then posts completion data to the specified endpoint. Here's the complete code for my Azure function:
import logging
import time
import threading

import requests
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    t = threading.Thread(target=do_work, args=(req,))
    t.start()
    return func.HttpResponse("Succeeded", status_code=200)


def do_work(req: func.HttpRequest):
    logging.info("Starting asynchronous worker task")
    # time.sleep(21)
    try:
        planUrl = req.headers["PlanUrl"]
        projectId = req.headers["ProjectId"]
        hubName = req.headers["HubName"]
        planId = req.headers["PlanId"]
        taskInstanceId = req.headers["TaskInstanceId"]
        jobId = req.headers["JobId"]
        endpoint = f"{planUrl}/{projectId}/_apis/distributedtask/hubs/{hubName}/plans/{planId}/events?api-version=2.0-preview.1"
        body = {
            "name": "TaskCompleted",
            "taskId": taskInstanceId,
            "jobId": jobId,
            "result": "succeeded"
        }
        logging.info(endpoint)
        logging.info(body)
        requests.post(endpoint, data=body)
    except Exception:
        pass
    finally:
        logging.info("Completed asynchronous worker task")
Now, the Invoke Azure Function task doesn't time out, but it doesn't complete either. It just looks like it's waiting for something else to happen.
Not sure what I'm supposed to do here. I'm also following this thread on GitHub, but it's not leading to a resolution.
Unless you are using Durable Functions (which shouldn't be necessary here), background jobs like this running as part of a function execution are not supported with Azure Functions.
Once a response is returned by a function, that channel is closed, and any background operations would not work as expected (the exception again being Durable Functions).
If you need something to work asynchronously like this and you are OK with not waiting for a response, the recommended approach would be to use multiple functions.
In this case, your HttpTrigger could drop a message onto a queue (or any other messaging service) and return a response immediately. Then a queue trigger (or other event-based trigger) would pick up events from the queue, do your heavy lifting, and once completed post to the callback endpoint in your example.
If you want to implement this with just one function, your DevOps pipeline could drop a message directly onto a messaging service and have your function trigger off of that.
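A rough sketch of that two-function split for the Python function in the question, assuming the v1 programming model with a queue output binding named 'msg' and a queue trigger declared in the respective function.json files (the bindings, queue name and payload shape are illustrative assumptions):

# HTTP-triggered function: accept the pipeline call, queue the work, return immediately.
import json
import azure.functions as func

def main(req: func.HttpRequest, msg: func.Out[str]) -> func.HttpResponse:
    # Forward the callback details from the request headers into the queue message.
    payload = {name: req.headers.get(name) for name in
               ("PlanUrl", "ProjectId", "HubName", "PlanId", "TaskInstanceId", "JobId", "AuthToken")}
    msg.set(json.dumps(payload))
    return func.HttpResponse("Queued", status_code=200)

# Queue-triggered function (would be main() in its own function folder): do the
# heavy lifting, then post the TaskCompleted event exactly as do_work() does above.
def process_queue_item(msg: func.QueueMessage) -> None:
    payload = json.loads(msg.get_body().decode("utf-8"))
    # ... heavy lifting and the callback POST go here ...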
Hope this helps!
I can't find it explicitly documented anywhere, but it appears you need to add an authorization header (at least when running locally).
Debugging my (very similar C#) function, the callback response gives me a 204 status code with an empty body when I include the header, and a 203 response with an Azure DevOps sign-in page in the body when I do not.
The bearer token is passed in the same way as the other system variables, as AuthToken, so you need to add the Authorization header with a value of "Bearer " + req.headers["AuthToken"].
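Applied to the Python function in the question, that means changing the callback request to something like this (a sketch; everything except the added header matches the question's code):

headers = {"Authorization": "Bearer " + req.headers["AuthToken"]}
requests.post(endpoint, data=body, headers=headers)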

Related Scheduler Job not created - Firebase Scheduled Function

I have written a scheduled function in node.js using TypeScript that successfully deploys. The related pub/sub topic gets created automatically, but somehow the related scheduler job does not.
This is even after getting these lines in the deploy output:
i scheduler: ensuring necessary APIs are enabled...
i pubsub: ensuring necessary APIs are enabled...
+ scheduler: all necessary APIs are enabled
+ pubsub: all necessary APIs are enabled
+ functions: created scheduler job firebase-schedule-myFunction-us-central1
+ functions[myFunction(us-central1)]: Successful create operation.
+ Deploy complete!
I have cloned the sample at https://github.com/firebase/functions-samples/tree/master/delete-unused-accounts-cron which deploys and automatically creates both the related pub/sub topic and scheduler job.
What could I be missing?
Try to change .timeZone('utc') (per the docs) to .timeZone('Etc/UTC') (also per the self-contradictory docs).
It seems that when using the 'every 5 minutes' syntax, the deploy does not create the scheduler job.
Switching to the cron syntax (for example '*/5 * * * *') solved the problem for me.
Maybe your cron syntax isn't correct. There are some tools online to validate the syntax.
Check your firebase-debug.log
At some point, it will invoke a POST request to:
>> HTTP REQUEST POST https://cloudscheduler.googleapis.com/v1beta1/projects/*project_name*/locations/*location*/jobs
This must be a 200 response.

How to avoid callback-hell using asyncio in python

I have the following situation.
I have 3 services, JobInitiator, Mediator and Executor, that talk to each other in the following manner:
The JobInitiator publishes a requested job to a queue (RabbitMQ) once every X minutes.
The Executor service sends a REST API call to the Mediator service every Y minutes and asks if there are any jobs to be done. If so, the Mediator pulls a message from the queue and returns it to the Executor service in the response.
After the Executor finishes executing the job, it posts the job results to an API in the Mediator service, which publishes them to a queue that the JobInitiator listens to.
Side notes + restrictions and limitations:
The Mediator service is just a REST API wrapper around my queue. The main issue is that the Executor service can't be accessed publicly; only outgoing API calls are allowed.
I cannot connect the queue directly from the JobInitiator to the Executor service.
Up until now, nothing really special about this process. The thing I was wondering about is whether it's possible to write this with asyncio in Python so I won't have to deal with callback hell. Something like this (pseudo code):
class JobInitiator(object):
    async def do_job(self):
        token = await get_token()
        applicative_results = await get_results(token=token)
where get_token() and get_results() both run through the process described above.
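For illustration, a minimal sketch of how the pseudo code above could be fleshed out with asyncio and no callbacks (aiohttp, the Mediator URLs, the JSON shapes and the poll interval are all assumptions made for this sketch, not part of the question):

import asyncio
import aiohttp

MEDIATOR_URL = "https://mediator.example.com"  # hypothetical Mediator endpoint

async def get_token(session):
    # Hypothetical auth call against the Mediator.
    async with session.post(f"{MEDIATOR_URL}/token") as resp:
        return (await resp.json())["token"]

async def get_results(session, token):
    # Poll the Mediator until a job is available, run it, then report the result.
    headers = {"Authorization": f"Bearer {token}"}
    while True:
        async with session.get(f"{MEDIATOR_URL}/jobs/next", headers=headers) as resp:
            job = await resp.json()
        if job:
            result = await execute(job)  # the Executor's actual work
            async with session.post(f"{MEDIATOR_URL}/jobs/{job['id']}/result",
                                    json=result, headers=headers) as resp:
                return await resp.json()
        await asyncio.sleep(60)  # poll interval (the question's "every Y minutes")

async def execute(job):
    # Placeholder for the real job execution.
    return {"status": "done"}

async def do_job():
    async with aiohttp.ClientSession() as session:
        token = await get_token(session)
        return await get_results(session, token)

if __name__ == "__main__":
    asyncio.run(do_job())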

Azure subscription and webjob questions

So I'm trying to get a small project of mine going that I want to host on Azure. It's a web app, which works fine, and I've recently found WebJobs, which I now want to use to run a task that does data gathering and updating; I have a Console App for that.
My problem is that I can't set a schedule, since the job is published to the web app, which doesn't support scheduling. So I tried using the Azure WebJobs SDK with a timer, but it won't run without an AzureWebJobsStorage connection string, which I cannot get since my Azure account is a Dreamspark account and I cannot create an Azure Storage Account with it.
So I was wondering if there is some way to get this WebJob to run on a schedule (every hour or so). Otherwise, if I just upgraded my account to "Pay-As-You-Go", would I still retain my free features, namely SQL Server?
I'm not sure if this is the right place to ask, but I tried googling for it without success.
Update: I decided to just make the console app run in an infinite loop, and I'll monitor it through the portal. The code below is what I am using to make that loop.
using System;
using System.Timers;

class Program
{
    static void Main()
    {
        var time = 1000 * 60 * 30;
        Timer myTimer = new Timer(time);
        myTimer.Start();
        myTimer.Elapsed += new ElapsedEventHandler(myTimer_Elapsed);
        Console.ReadLine();
    }

    public static void myTimer_Elapsed(object sender, ElapsedEventArgs e)
    {
        Functions.PullAndUpdateDatabase();
    }
}
The simplest way to get your Web Job on a schedule is detailed in Amit Apple's blog titled "How to add a schedule to a triggered WebJob".
It's as simple as adding a JSON file called settings.job to your console application and in it describing the schedule you want as a cron expression like so:
{"schedule": "the schedule as a cron expression"}
For example, to run your job every 30 minutes you'd have this in your settings.job file:
{"schedule": "0 0,30 * * * *"}
Amit's blog also goes into detail on how to write a cron expression; the six space-separated fields are seconds, minutes, hours, day of month, month, and day of week.
Caveat: The scheduling mechanism used in this method is hosted on the instance where your web application is running. If your web application is not configured as Always On and is not in constant use it might be unloaded and the scheduler will then stop running.
To prevent this you will either need to set your web application to Always On or choose an alternative scheduling option - based on the Azure Scheduler service, as described in a blog post titled "Hooking up a scheduler job to a WebJob" written by David Ebbo.
