How to implement a pull-queue using Cloud Tasks in Node.js

I am trying to implement a pull-queue using Cloud Tasks + App Engine standard environment in Node.js. So basically I am trying to lease tasks from a queue. The problem is that I can only find examples in other languages, and I can find no mention of creating or leasing tasks for pull queues in the GCP Node.js documentation.
Please tell me this is possible and I do not need to start using a different language in my project only to implement a pull-queue mechanism.
Here is a link to the equivalent Python documentation
--- edit ---
I managed to find a reference in the types that allowed me to do this:
import { v2beta2 } from "@google-cloud/tasks";

const client = new v2beta2.CloudTasksClient();
const [{ tasks }] = await client.leaseTasks({
  parent: client.queuePath(project, location, "my-pull-queue"),
  maxTasks: 100,
});
...but it is giving me some odd quota error:
Error: Failed to lease tasks from the my-pull-queue queue: 8
RESOURCE_EXHAUSTED: Quota exceeded for quota metric 'Alpha API
requests' and limit 'Alpha API requests per minute (should be 0 unless
whitelisted)' of service 'cloudtasks.googleapis.com' for consumer
'project_number:xxx'.
I can hardly find sources referencing this type of quota error, but it seems to stem from APIs that are not made public yet and can only be used when access is granted explicitly (which would explain the whitelisting).
Another thing I find very odd is that there seem to be two beta clients, v2beta2 and v2beta3, but only the v2beta2 types define methods for leasing a task. Both beta APIs define types for creating a pull-queue task.
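For completeness, the v2beta2 types also appear to expose the acknowledge half of the lease/acknowledge workflow. A minimal, untested sketch, assuming the generated acknowledgeTask method mirrors the REST API (task name plus the scheduleTime returned by the lease); moot anyway while the quota error above persists:

// hypothetical follow-up to the lease above: acknowledge each task once processed
for (const task of tasks) {
  await client.acknowledgeTask({
    name: task.name,
    scheduleTime: task.scheduleTime, // must match the value returned by leaseTasks
  });
}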

I just found this statement that pull-queues are not supported in Node.js.
https://github.com/googleapis/nodejs-tasks/issues/123#issuecomment-445090253

Related

Google Pub/Sub with distributed subscribers in Node.js

We are attempting to migrate a message processing app from Kafka to Google Pub/Sub and it's just not working as expected.
We are running in Kubernetes (Google Cloud) where there may be multiple pods processing messages on the same subscription. Topics and subscriptions are all created using terraform and are more or less permanent. They are not created/destroyed on the fly by the application.
In our development environment, where message throughput is rather low, everything works just fine. But when we scale up to production levels, everything seems to fall apart. We get big backlogs of unacked messages, and yet some pods are not receiving any messages at all. And then, all of a sudden, the backlog will just go away, but then climb again.
We are using the Node.js client library provided by Google: @google-cloud/pubsub:3.1.0
Each instance of the application subscribes to the same named subscription, and according to the documentation, messages should be distributed to each subscriber. But that is not happening. Some pods will be consuming messages rapidly, while others sit idle.
Every message is processed in a try/catch block and we are not observing any errors being thrown. So, as far as we know, every received message is getting acked.
I suspect that, as pods are terminated by autoscaling or updated deployments, we are not properly closing subscriptions, but there are no examples addressing a distributed environment and I have not found any documentation that specifically addresses how to properly manage these resources. It is also worth mentioning that the app has multiple subscriptions to different topics.
When a pod shuts down, what actions should be taken on the Subscription object and the PubSub client object? Maybe that's not even the issue, but it seems like a reasonable place to start.
When we start a subscription we do something like this:
private exampleSubscribe(): Subscription {
  // one suggestion for having multiple subscriptions in the same app
  // was to use separate clients for each
  const pubSubClient = new PubSub({
    // use a regional endpoint for message ordering
    apiEndpoint: 'us-central1-pubsub.googleapis.com:443',
  });
  pubSubClient.projectId = 'my-project-id';
  const sub = pubSubClient.subscription('my-subscription-name', {
    // have tried various values for maxMessages from 5 to the default of 1000
    flowControl: { maxMessages: 250, allowExcessMessages: false },
    ackDeadline: 30,
  });
  sub.on('message', async (message) => {
    await this.exampleMessageProcessing(message);
  });
  return sub;
}
private async exampleMessageProcessing(message: Message): Promise<void> {
  try {
    // do some cool stuff
  } catch (error) {
    // log the error
  } finally {
    message.ack();
  }
}
Upon termination of a pod, we do this:
private async exampleCloseSub(sub: Subscription) {
  try {
    sub.removeAllListeners('message');
    await sub.close();
    // note that we do nothing with the PubSub
    // client object -- should it also be closed?
  } catch (error) {
    // ignore error, we are shutting down
  }
}
When running with Kafka, we can easily keep up with the message pace with usually no more than 2 pods. So I know that we are not running into issues of it simply taking too long to process each message.
Why are messages being left unacked? Why are pods not receiving messages when there is clearly a large backlog? What is the correct way to shut down one subscriber on a shared subscription?
It turns out that the issue was an improper implementation of message ordering.
The official docs for message ordering in Pub/Sub are rather brief:
https://cloud.google.com/pubsub/docs/ordering
Not much there regarding how to implement an ordering key or the implications of message ordering on horizontal scaling.
Though they do link to some external resources, one of which is this blog post:
https://medium.com/google-cloud/google-cloud-pub-sub-ordered-delivery-1e4181f60bc8
In our case, we did not have enough distinct ordering keys to allow for proper distribution of messages across subscribers/pods.
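For anyone hitting the same symptom, here is a rough sketch of what the publish side of "distinct ordering keys" looks like with @google-cloud/pubsub (names are placeholders, and it assumes the subscription itself was created with message ordering enabled):

import { PubSub } from '@google-cloud/pubsub';

const pubSubClient = new PubSub();
// ordering must be enabled on the publisher for orderingKey to take effect
const topic = pubSubClient.topic('my-topic', { messageOrdering: true });

async function publishOrdered(entityId: string, payload: object): Promise<void> {
  await topic.publishMessage({
    data: Buffer.from(JSON.stringify(payload)),
    // all messages sharing an orderingKey are delivered to the same subscriber
    // stream in order, so too few distinct keys leaves other pods idle
    orderingKey: entityId,
  });
}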
So this was definitely an RTFM situation, or more accurately: Read The Fine Blog Post Referred To By The Manual. I would have much preferred that the important details were actually in the official documentation. Is that too much to ask for?

Name or Service not known - intermittent error in Azure

I have a TimerTrigger which calls my own Azure Function at a relatively high rate - a few times per second. Every call takes only about 100 ms, and the purpose of the test is not a stress test.
This call to my own endpoint works about 9999 times out of 10000 but just once in a while I get the following error:
System.Net.Http.HttpRequestException: Name or service not known (app.mycustomdomain.com:443)
---> System.Net.Sockets.SocketException (0xFFFDFFFF): Name or service not known
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
I replaced my actual domain with "app.mycustomdomain.com" in the error message above. It is a custom domain set up to point to the Azure Function App using CNAME.
The Function App does not show any downtime in the Azure Portal, and I have Application Insights enabled and do not see any errors there. So I assume the issue is somehow on the caller's side and the call never actually happens.
What does this error indicate? And how can I alleviate the problem?
For your second question - alleviating the problem - one option would certainly be to build in retries using a library like Polly. At a high level you create a policy, e.g. for a simple retry:
var myPolicy = Policy
    .Handle<SomeExceptionType>()
    .RetryAsync(3);
This would retry 3 times. To use the policy you call Execute for a synchronous policy built with Retry, or ExecuteAsync for an async policy built with RetryAsync as above:
await myPolicy.ExecuteAsync(async () =>
{
    // do stuff that might fail up to three times
});
More complete samples are available. This library has lots of support for other approaches, e.g. fixed delays, exponential back-off, etc.

App Engine Google Cloud Storage - Error 500 when downloading a file

I'm getting an error 500 when I download a JSON file (approx. 2 MB) using the nodejs-storage library. The file gets downloaded without any problem, but once I render the view and pass the file contents as a parameter, the app crashes with "The server encountered an error and could not complete your request."
file.download(function(err, contents) {
  var messages = JSON.parse(contents);
  res.render('_myview.ejs', {
    "messages": messages
  });
});
I am using the App Engine Standard Environment and have this further error detail:
Exceeded soft memory limit of 256 MB with 282 MB after servicing 11 requests total. Consider setting a larger instance class in app.yaml
Can someone give me a hint? Thank you in advance.
500 error messages are quite hard to troubleshoot due to all the possible scenarios that could go wrong with App Engine instances. A good way to start debugging this type of error with App Engine is to go to Stackdriver Logging, query for the 500 error messages, click on the expander arrow, and check for the specific error code. In the specific case of the Exceeded soft memory limit... error message in the App Engine Standard environment, my suggestion would be to choose an instance class better suited to your application's load.
Assuming you are using automatic scaling, you could try an F2 instance class (which has higher memory and CPU limits than the default F1) and start from there. Adding or modifying the instance_class element of your app.yaml file to instance_class: F2 is enough to make that change, or you can pick whichever instance class is better suited to your application's load.
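For example, the relevant app.yaml change would look something like this (keep the rest of your existing app.yaml as-is):

# app.yaml
instance_class: F2  # the default F1 is what produced the 256 MB soft limit above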
Notice that increasing the instance class directly affects your billing; you can use the Google Cloud Platform Pricing Calculator to get an estimate of the costs associated with using a different instance class for your App Engine application.

How to set Infinite Timeout for Azure Function app v2.0

I have a very long-running process hosted in an Azure Function App (even though that is not recommended for long-running processes) targeting v2.0. Earlier it targeted the v1.0 runtime, so I didn't face any function timeout issue.
But now, after updating the runtime to target v2.0, I am not able to find any way to set the function timeout to infinite, as was possible with v1.0.
Can someone please help me out with this?
From your comments it looks like breaking the work up into smaller functions or using something other than Functions isn't an option for you currently. In that case, AFAIK you can still do it with v2.0 as long as you're ready to use an App Service plan.
The 10-minute maximum only applies to the Consumption plan.
In fact, the documentation explicitly suggests that if you have functions that run continuously or nearly continuously, an App Service plan can be more cost-effective as well.
You can use the "Always On" setting. Read about it on Microsoft Docs here.
Azure Functions scale and hosting
Also, the documentation clearly states that the default timeout with an App Service plan is 30 minutes, but it can be set to unlimited manually.
Changes in features and functionality
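Concretely, that manual setting is the functionTimeout value in host.json; on an App Service plan the host.json reference allows -1 for an unbounded timeout (a sketch - double-check the reference for your exact runtime version):

{
  "version": "2.0",
  "functionTimeout": "-1"
}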
UPDATE
From our discussion in the comments, as a null value isn't working for you like it did in version 1.x, please try taking out the "functionTimeout" setting completely.
I came across 2 different SO posts mentioning something similar, and the Microsoft documentation also says there is no real limit. Here are the links:
SO Post 1
SO Post 2
One way of doing it is to implement eternal orchestrations with Durable Functions. They allow you to implement an infinite loop with dynamic intervals. Of course, you need to slightly modify your code so that the work can be stopped and restarted at any point (you must pass the state between calls).
[FunctionName("Long_Running_Process")]
public static async Task Run(
    [OrchestrationTrigger] DurableOrchestrationContext context)
{
    var initialState = context.GetInput<object>();
    var state = await context.CallActivityAsync<object>("Run_Long_Running_Process", initialState);
    if (state == null) // placeholder check: stop when the long-running process reports completion
    {
        return;
    }
    context.ContinueAsNew(state);
}
You cannot set an Azure Function App timeout to infinite. I believe the longest any Azure Function App will consistently run is 10 minutes. As you stated, Azure Functions are not meant for long-running processes. You may need to find a new solution for your app, especially if you will need to scale it up at all in the future.

Calling hundreds of Azure Functions in parallel

I have an application that executes rules using a rules engine. I have around 500+ rules, and our application will receive around 10,000 entries; every one of those 10,000 entries should go through the 500 rules for validation individually. We are currently planning to migrate all our rules into Azure Functions, so that each Azure Function corresponds to a single rule.
So I was trying a proof of concept and created a couple of Azure Functions with Task.Delay to mock real rule validation. I need to execute all 500 rules for each and every request. So I created a Durable Function with an orchestrator calling all 500 activity triggers (rules) using Task.WhenAll, but that didn't work: the functions got hung with just one request. So instead I created individual HttpTrigger functions for the rules. From a C# class library I called all 500 functions using Task.WhenAll for one request and it worked like a charm. However, when I tried calling the Azure Functions for all entries (starting with 50), Task.WhenAll started throwing errors saying a task got cancelled, or a TimeoutException for the HTTP call. Below is the code:
var tasksForCallingAzureFunctions = new List<List<Task<HttpResponseMessage>>>();

Parallel.For(0, 50, (index) =>
{
    var tasksForUser = new List<Task<HttpResponseMessage>>();

    // call the mocked azure function 500 times
    for (int j = 0; j < 500; j++)
    {
        tasksForUser.Add(client.SendAsync(CreateRequestObject()));
    }

    // add the tasks for each user into main list to call all the requests in parallel.
    tasksForCallingAzureFunctions.Add(tasksForUser);
});

// call all the requests - 50 * 500 requests in parallel.
Parallel.ForEach(tasksForCallingAzureFunctions, (tasks) => Task.WhenAll(tasks).GetAwaiter().GetResult());
So my question is: is this a recommended approach? I'm making 50 * 500 I/O calls, and the requirement is that all 10,000 entries go through the 500 rules.
The other option I considered was to send all the entries to each Azure Function, so that each function has logic to loop through the entries and validate them. That way I would only have to make 500 calls.
Please advise.
I would implement this scenario with queue-triggered Functions. Send a queue message per entity per rule; each function invocation will pick one up, do its work, and save the result to some storage. This should be more resilient than doing a gazillion sync HTTP requests.
Now, Durable Functions basically do the same thing behind the scenes, so you were on the right path. Maybe add a new question with a minimal repro for your hanging problem.
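A rough sketch of that shape, shown in TypeScript with the Azure Functions Node.js v4 programming model purely for illustration (the C# QueueTrigger equivalent is structurally the same); the queue name, message shape, and runRule below are all placeholders:

import { app, InvocationContext } from "@azure/functions";

interface RuleWorkItem {
  entryId: string;
  ruleId: string;
  payload: unknown;
}

// one queue message = one (entry, rule) pair; the platform scales out the workers
app.storageQueue("validateEntryAgainstRule", {
  queueName: "rule-work-items",      // placeholder queue name
  connection: "AzureWebJobsStorage",
  handler: async (message: unknown, context: InvocationContext): Promise<void> => {
    const item = message as RuleWorkItem;
    const passed = runRule(item.ruleId, item.payload); // placeholder rule runner
    // persist the result to durable storage (table, blob, another queue, ...)
    context.log(`rule ${item.ruleId} on entry ${item.entryId}: ${passed ? "pass" : "fail"}`);
  },
});

function runRule(ruleId: string, payload: unknown): boolean {
  // stand-in for the real rules engine
  return true;
}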
