Best way to start a background process from GCP HTTP function call? - node.js

So, according to the docs here https://cloud.google.com/functions/docs/writing/http
Terminating HTTP functions
If a function creates background tasks (such as threads, futures, Node.js Promise objects, callbacks, or system processes), you must terminate or otherwise resolve these tasks before returning an HTTP response. Any tasks not terminated prior to an HTTP response may not be completed, and may also cause undefined behavior.
So, if one needs to launch a long-running background task from within an HTTP function, but still return from the function quickly, there is no straightforward way.
I have tried the Pub/Sub approach (calling await topic.publishJSON(pars)), but it turns out that publishing to a topic is quite a time-consuming operation, taking 2-3 seconds.
The Pub/Sub-triggered function itself then runs fine, but this 2-3 second delay makes the approach useless.
P.S.: the approach of starting a Promise from inside the function does actually work, but it seems error-prone, since it goes against the docs.

If you need a quick answer, you have two types of solutions:
Async
With Cloud Functions, you can invoke (perform an HTTP call to) another function (or Cloud Run or App Engine service) without waiting for the answer, and then answer back to the requester. The call that you performed runs in the background and eventually answers a Cloud Function that is no longer listening!
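A minimal sketch of that fire-and-forget call, assuming Node 18+ (for the global fetch) and a hypothetical WORKER_URL environment variable pointing at the long-running service:

```js
const functions = require('@google-cloud/functions-framework');

functions.http('startJob', (req, res) => {
  // Fire the call to the worker but don't await its answer.
  fetch(process.env.WORKER_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req.body),
  }).catch((err) => console.error('worker call failed:', err));

  // Answer the requester immediately.
  res.status(202).send('accepted');
});
```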
With Pub/Sub, it's similar. Instead of invoking a Cloud Function (or Cloud Run or App Engine), you publish a message to a Pub/Sub topic, then create a subscription that triggers your long-running process.
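For example, the publishing side with the @google-cloud/pubsub client could look like this; the long-running-jobs topic name is hypothetical:

```js
const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();

// Publish the job parameters; a subscription-triggered function
// (or a push subscription to Cloud Run) picks the message up and
// does the long-running work.
async function enqueueJob(params) {
  await pubsub
    .topic('long-running-jobs') // hypothetical topic name
    .publishMessage({ json: params });
}
```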
The idea is the same with Cloud Tasks, except that you create a task in a queue.
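A sketch with the @google-cloud/tasks client; the project, location, queue, and worker URL are all placeholders:

```js
const { CloudTasksClient } = require('@google-cloud/tasks');

const client = new CloudTasksClient();

// Create a task that makes Cloud Tasks call the worker URL for us.
async function enqueueTask(payload) {
  const parent = client.queuePath('my-project', 'us-central1', 'my-queue');
  await client.createTask({
    parent,
    task: {
      httpRequest: {
        httpMethod: 'POST',
        url: process.env.WORKER_URL, // hypothetical worker endpoint
        headers: { 'Content-Type': 'application/json' },
        body: Buffer.from(JSON.stringify(payload)).toString('base64'),
      },
    },
  });
}
```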
Sync
If you use Cloud Run instead of Cloud Functions, you can send partial answers to the requester. That way, you can immediately answer back with a partial response that says "OK", continue the process in the request context, and send another partial response whenever you want, or at the end of the long-running process to tell the user their process has finished.
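A sketch of that partial-response idea on Cloud Run with Express; longRunningProcess here is a hypothetical stand-in for the real work:

```js
const express = require('express');
const app = express();

// Hypothetical stand-in for the real long-running work.
const longRunningProcess = () => new Promise((r) => setTimeout(r, 30_000));

app.post('/process', async (req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.write('OK\n'); // partial answer, flushed to the requester right away

  await longRunningProcess();

  res.end('done\n'); // final chunk once the work has completed
});

app.listen(process.env.PORT || 8080);
```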

Related

Preventing Potential Race Condition in Calls to an API?

There's an API that my node.js server accesses quite a bit. It requires me to change my password every 3 months. Fortunately, there's also an API call for changing the password. :) I have a cron job that runs regularly and changes the password when necessary.
If my app is accessing the API at the exact time the password is being changed, there's a potential race condition and the API call could fail. What are some good patterns for dealing with this?
I could put all the API calls into a queue, and use a cron job to pull the most recent one off the queue and run it. If the API call fails, it would stay in the queue and get run next time the cron job runs. But that seems like it might be overkill.
I could use a try/catch handler with the API call, inside a while loop, and just run the while loop until the API call completes successfully. But that's going to block the rest of my app.
I could use a try/catch handler with the API call, inside a setTimeout, and just re-run the setTimeout until the API call completes successfully. This way the API call would only run when the main thread is done with other work and gets around to it. But would this be a mistake if the server is under heavy load?
Is there a better pattern for dealing with this sort of thing?
The try/catch handlers would lose data in the event of a server crash, so I went with the cron job/queue approach. I'm using a queue maintained as a table in my db, so that if something interrupts the server, nothing will be lost.
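A rough sketch of that DB-backed queue being drained by the cron job; db and callApi are hypothetical stand-ins for the actual database client and API wrapper:

```js
// Run this from the cron job. Failed jobs stay in the table,
// so the next run retries them and nothing is lost on a crash.
async function drainQueue(db, callApi) {
  const jobs = await db.query(
    'SELECT id, payload FROM api_jobs WHERE done = false ORDER BY id');
  for (const job of jobs) {
    try {
      await callApi(job.payload);
      await db.query('UPDATE api_jobs SET done = true WHERE id = ?', [job.id]);
    } catch (err) {
      // Leave the row in place; the next cron run will retry it.
      console.error(`job ${job.id} failed, will retry:`, err);
    }
  }
}
```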

Is it possible for multiple AWS Lambdas to service a single HTTP request?

On AWS, is it possible to have one HTTP request execute a Lambda, which in turn triggers a cascade of Lambdas running in serial, where the final Lambda returns the result to the user?
I know one way to achieve this is for the initial Lambda to "stay running" and orchestrate the other Lambdas, but I'd be paying for that orchestration Lambda to effectively do nothing most of the time, i.e. paying for the time it's waiting on the others. If it were non-lambda code, that would be like blocking (and paying for) an entire thread while the other threads do their work.
Unless AWS stops the billing clock while async Lambdas are "sleeping"/waiting on network IO?
Unfortunately, as you've found, only a single Lambda function can be invoked by the HTTP request, and that function becomes an orchestrator.
This is not ideal, but it has to be the case if you want to use multiple Lambda functions while serving an HTTP request. You can either use that Lambda to call a number of other Lambdas, or instead create a Step Function which can orchestrate the individual steps. You would still need the Lambda to start the Step Function, and then poll its status before returning the results.
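A rough sketch of that start-and-poll pattern with the AWS SDK v3 Step Functions client; STATE_MACHINE_ARN is a hypothetical environment variable:

```js
const { SFNClient, StartExecutionCommand, DescribeExecutionCommand } =
  require('@aws-sdk/client-sfn');

// The orchestrating Lambda: start the state machine, poll until
// it finishes, then return its output to the HTTP caller.
exports.handler = async (event) => {
  const sfn = new SFNClient({});
  const { executionArn } = await sfn.send(new StartExecutionCommand({
    stateMachineArn: process.env.STATE_MACHINE_ARN,
    input: JSON.stringify(event),
  }));

  // Poll until the execution leaves the RUNNING state.
  let execution;
  do {
    await new Promise((r) => setTimeout(r, 1000));
    execution = await sfn.send(new DescribeExecutionCommand({ executionArn }));
  } while (execution.status === 'RUNNING');

  return { statusCode: 200, body: execution.output };
};
```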

How does a Cloud Function instance handle multiple requests?

I'm trying to wrap my head around Cloud Functions' instances and how they work.
I'm asking about an example of an HTTP function, but I think the concept applies to any kind of function.
Let's say I have this cloud function that handles SSR for my app, named ssrApp.
And let's assume that it takes 1 second to complete every time it gets a request.
When Cloud Functions receives the 1st request, it will spin up an instance to respond to it.
QUESTION
How does that instance behave when multiple requests are coming?
From: https://cloud.google.com/functions/docs/concepts/exec
Each instance of a function handles only one concurrent request at a time. This means that while your code is processing one request, there is no possibility of a second request being routed to the same instance. Thus the original request can use the full amount of resources (CPU and memory) that you requested.
Does it mean that during that 1 second when my ssrApp function is running, if somebody hits my app URL, it is guaranteed that Cloud Function will spin up another instance for that second request? Does it matter if the function does only sync calls or some async calls in its execution? What I mean is, could an async call free the instance to respond to another request in parallel?
Does it mean that during that 1 second when my ssrApp function is running, if somebody hits my app URL, it is guaranteed that Cloud Function will spin up another instance for that second request?
That's the general behavior, although there are no guarantees around scheduling.
Does it matter if the function does only sync calls or some async calls in its execution? What I mean is, could an async call free the instance to respond to another request in parallel?
No, that makes no difference. If the container is waiting for an async call, it is still considered to be in-use.
2022 Update
For future searchers, Cloud Functions Gen2 now supports concurrency: https://cloud.google.com/functions/docs/2nd-gen/configuration-settings#concurrency

Calling Azure function synchronously, without HttpTrigger

I want to test a Queue-triggered Azure Function over HTTP (integration test).
Is there any general method to call a deployed Azure Function, synchronously?
I have successfully called it with the admin/functions/{function} endpoint as shown here. But I get 202 Accepted, which is no good: my test needs to wait for the function to complete (and fail if the function failed).
That behavior is driven by the Function, not the client. So if your Function properly closes the http connection, but continues processing, there's nothing the client can do about that.
So, you can either test through the queue, or add a side function with an HTTP trigger that calls the same processing method(s) and only returns when it's done.
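A hedged sketch of that side function in the Node.js v3 programming model; processMessage stands in for whatever shared code your queue-triggered function calls:

```js
// Hypothetical shared processing logic, also used by the queue trigger.
const processMessage = async (msg) => {
  // ... same logic your queue-triggered function runs ...
  return { processed: msg };
};

// HTTP-triggered function that only responds once the work is done,
// so an integration test can await the result and assert on failures.
module.exports = async function (context, req) {
  try {
    const result = await processMessage(req.body);
    context.res = { status: 200, body: result };
  } catch (err) {
    context.res = { status: 500, body: String(err) };
  }
};
```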
This is not possible. The Azure Functions host does not support returning the output of a queue trigger to the admin HTTP endpoint. I suggest looking into the suggestion by Johns-305.

Nodejs callback mechanism - which thread handles the callback?

I'm new to Node.js and was wondering about its single-threaded model.
In a simple Node.js application, when some blocking operation is handled asynchronously with callbacks, does the main thread running Node.js handle the callback as well?
If the request is to get some data off the database, there are hundreds of concurrent users, and each db operation takes a couple of seconds, then when the callback is finally fired (for each of the connections), is the main thread that accepted these requests also used for executing the callbacks? If so, how does Node.js scale, and how does it respond so fast?
Each instance of Node.js runs in a single thread. Period. When you make an async call, say a network request, it doesn't wait around for it, not in your code or anywhere else. It has an event loop that keeps running, and when the response is ready, it invokes your callback.
This can be incredibly performant, because it doesn't need lots of threads and all the memory overhead, but it means you need to be careful not to do synchronous blocking stuff.
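A tiny demonstration of that single thread at work; the timer callback can only run once the synchronous loop below has finished:

```js
console.log('start');

setTimeout(() => console.log('callback runs on the main thread'), 0);

// Synchronous work blocks everything, including the timer above.
for (let i = 0; i < 1e7; i += 1) {}

console.log('end'); // prints before the timeout callback fires
```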
There is a pretty decent explanation of the event loop at http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/ and the original jsconf presentation by Ryan Dahl http://www.youtube.com/watch?v=ztspvPYybIY is worth watching. Ever seen an engineer get a standing ovation for a technical presentation?
