How to avoid a memory leak when using Pub/Sub to call a function? - node.js

I'm stuck on a performance issue when using Pub/Sub to trigger a function.
// this is called from index.ts
export function downloadService() {
  // References an existing subscription
  const subscription = pubsub.subscription("DOWNLOAD-sub");
  // Create an event handler to handle messages
  // let messageCount = 0;
  const messageHandler = async (message: any) => {
    console.log(`Received message ${message.id}:`);
    console.log(`\tData: ${message.data}`);
    console.log(`\tAttributes: ${message.attributes.type}`);
    // "Ack" (acknowledge receipt of) the message
    message.ack();
    await exportExcel(message); // my function
    // messageCount += 1;
  };
  // Listen for new messages until timeout is hit
  subscription.on("message", messageHandler);
}

async function exportExcel(message: any) {
  // get data from database
  const movies = await Sales.findAll({
    attributes: [
      "SALES_STORE",
      "SALES_CTRNO",
      "SALES_TRANSNO",
      "SALES_STATUS",
    ],
    raw: true,
  });
  // ... processing to excel (800k rows)
  // ... bucket.upload to gcs
}
The function above works fine if I trigger ONLY one Pub/Sub message.
However, the function hits a memory leak or a database connection timeout if I trigger many Pub/Sub messages in a short period of time.
The problem I found is that the first run hasn't finished yet, but further requests from Pub/Sub go straight to calling the function again, so they are all processed at the same time.
I have no idea how to resolve this, but I was thinking that implementing a queue worker or Google Cloud Tasks might solve the problem?

As mentioned by @chovy in the comments, the exportExcel function calls need to be queued up, since the function's execution is not keeping up with the rate of invocation. One of the modules that can be used to queue function calls is async. Please note that the async module is not officially supported by Google.
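To make the queuing idea concrete, here is a minimal sketch that does not depend on the async module. It chains promises so that each job starts only after the previous one has settled. `SerialQueue` and `fakeExport` are hypothetical names; `fakeExport` merely stands in for a long-running call such as exportExcel(message).

```typescript
// A minimal serial queue: each task starts only after the previous one settles.
// `SerialQueue` and `fakeExport` are hypothetical names used for illustration.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  // Schedule `task` behind everything already queued and return its result.
  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(() => task());
    // Keep the chain usable even if a task rejects.
    this.tail = run.catch(() => undefined);
    return run;
  }
}

let running = 0;
let maxRunning = 0;

// Stand-in for a long-running job such as exportExcel(message).
async function fakeExport(id: number): Promise<number> {
  running += 1;
  maxRunning = Math.max(maxRunning, running);
  await new Promise((resolve) => setTimeout(resolve, 10));
  running -= 1;
  return id;
}

const queue = new SerialQueue();
// Three "messages" arrive at once, but the jobs run one after another.
const results = Promise.all(
  [1, 2, 3].map((id) => queue.enqueue(() => fakeExport(id)))
);
```

In the message handler this would look like `queue.enqueue(() => exportExcel(message))`. Note that an in-memory queue like this only limits work inside one process; the pending jobs are lost if the process exits.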
As an alternative, you can employ flow control features on the subscriber side. Data pipelines often receive sporadic spikes in published traffic, which can overwhelm subscribers as they try to catch up. The usual response to high published throughput on a subscription would be to dynamically autoscale subscriber resources to consume more messages. However, this can incur unwanted costs (for instance, you may need to use more VMs), which can lead to additional capacity planning. Flow control features on the subscriber side can help control the unhealthy behavior of these tasks on the pipeline by allowing the subscriber to regulate the rate at which messages are ingested. Please refer to this blog for more information on flow control features.
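A rough sketch of what subscriber-side flow control could look like for the question's handler. The `flowControl` option names are the ones the Node.js client accepts on `pubsub.subscription(name, options)`; the value 1 is an assumption chosen for a heavy export job. As far as I understand, flow control counts messages that have not yet been acked or nacked, so the handler below acks only after the work finishes (the question's handler acks first, which would let the next message in immediately). `MessageLike` and `makeHandler` are hypothetical names.

```typescript
// Minimal shape of a Pub/Sub message as used here; the real client's Message
// class has more members. `MessageLike` is a hypothetical local type.
interface MessageLike {
  id: string;
  ack(): void;
  nack(): void;
}

// Subscriber options: at most one outstanding message is held by this process.
// Assumed usage: pubsub.subscription("DOWNLOAD-sub", subscriberOptions).
const subscriberOptions = {
  flowControl: { maxMessages: 1, allowExcessMessages: false },
};

// Wrap a processing function so the message is acked only after the work
// finishes, and nacked (for redelivery) if the work throws.
function makeHandler(work: (message: MessageLike) => Promise<void>) {
  return async (message: MessageLike): Promise<void> => {
    try {
      await work(message);
      message.ack();
    } catch (error) {
      console.error(`Processing ${message.id} failed:`, error);
      message.nack();
    }
  };
}
```

With the real client this would be wired up as `subscription.on("message", makeHandler(exportExcel))`. For jobs that run for a long time, the ack deadline also has to stay extended for the whole run, which is worth checking against the client's lease settings.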

Related

Google Pub/Sub with distributed subscribers in Node.js

We are attempting to migrate a message processing app from Kafka to Google Pub/Sub and it's just not working as expected.
We are running in Kubernetes (Google Cloud) where there may be multiple pods processing messages on the same subscription. Topics and subscriptions are all created using terraform and are more or less permanent. They are not created/destroyed on the fly by the application.
In our development environment, where message throughput is rather low, everything works just fine. But when we scale up to production levels, everything seems to fall apart. We get big backlogs of unacked messages, and yet some pods are not receiving any messages at all. And then, all of a sudden, the backlog will just go away, but then climb again.
We are using the Node.js client library provided by Google: @google-cloud/pubsub:3.1.0
Each instance of the application subscribes to the same named subscription, and according to the documentation, messages should be distributed to each subscriber. But that is not happening. Some pods will be consuming messages rapidly, while others sit idle.
Every message is processed in a try/catch block and we are not observing any errors being thrown. So, as far as we know, every received message is getting acked.
I suspect that, as pods are terminated by autoscaling or updated deployments, we are not properly closing subscriptions, but there are no examples addressing a distributed environment and I have not found any document that specifically addresses how to properly manage resources. It is also worth mentioning that the app has multiple subscriptions to different topics.
When a pod shuts down, what actions should be taken on the Subscription object and the PubSub client object? Maybe that's not even the issue, but it seems like a reasonable place to start.
When we start a subscription we do something like this:
private exampleSubscribe(): Subscription {
  // one suggestion for having multiple subscriptions in the same app
  // was to use separate clients for each
  const pubSubClient = new PubSub({
    // use a regional endpoint for message ordering
    apiEndpoint: 'us-central1-pubsub.googleapis.com:443',
  });
  pubSubClient.projectId = 'my-project-id';
  const sub = pubSubClient.subscription('my-subscription-name', {
    // have tried various values for maxMessage from 5 to the default of 1000
    flowControl: { maxMessages: 250, allowExcessMessages: false },
    ackDeadline: 30,
  });
  sub.on('message', async (message) => {
    await this.exampleMessageProcessing(message);
  });
  return sub;
}

private async exampleMessageProcessing(message: Message): Promise<void> {
  try {
    // do some cool stuff
  } catch (error) {
    // log the error
  } finally {
    message.ack();
  }
}
Upon termination of a pod, we do this:
private async exampleCloseSub(sub: Subscription) {
  try {
    sub.removeAllListeners('message');
    await sub.close();
    // note that we do nothing with the PubSub
    // client object -- should it also be closed?
  } catch (error) {
    // ignore error, we are shutting down
  }
}
When running with Kafka, we can easily keep up with the message pace with usually no more than 2 pods. So I know that we are not running into issues of it simply taking too long to process each message.
Why are messages being left unacked? Why are pods not receiving messages when there is clearly a large backlog? What is the correct way to shut down one subscriber on a shared subscription?
It turns out that the issue was an improper implementation of message ordering.
The official docs for message ordering in Pub/Sub are rather brief:
https://cloud.google.com/pubsub/docs/ordering
Not much there regarding how to implement an ordering key or the implications of message ordering on horizontal scaling.
Though they do link to some external resources, one of which is this blog post:
https://medium.com/google-cloud/google-cloud-pub-sub-ordered-delivery-1e4181f60bc8
In our case, we did not have enough distinct ordering keys to allow for proper distribution of messages across subscribers/pods.
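To illustrate why too few ordering keys starve subscribers, here is a toy model. It is not Pub/Sub's actual assignment algorithm, which is internal to the service. It only assumes that all messages sharing an ordering key are handled by one subscriber at a time, so the number of busy subscribers can never exceed the number of distinct keys. `assignKeys` and the key names are hypothetical.

```typescript
// Toy model only: Pub/Sub's real key-to-subscriber assignment is internal.
// It assumes every message with a given ordering key goes to one subscriber.
function assignKeys(
  orderingKeys: string[],
  subscriberCount: number
): Map<number, string[]> {
  const assignment = new Map<number, string[]>();
  const distinctKeys = Array.from(new Set(orderingKeys));
  distinctKeys.forEach((key, index) => {
    const subscriber = index % subscriberCount; // best case: spread evenly
    const keys = assignment.get(subscriber) ?? [];
    keys.push(key);
    assignment.set(subscriber, keys);
  });
  return assignment;
}

// Two distinct keys across ten pods: at most two pods ever receive messages.
const busy = assignKeys(["tenant-a", "tenant-b", "tenant-a", "tenant-b"], 10);
console.log(`${busy.size} of 10 subscribers receive messages`);
```

Even in this best-case spread, eight of the ten pods sit idle, which matches the symptom described in the question.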
So this was definitely an RTFM situation, or more accurately: Read The Fine Blog Post Referred To By The Manual. I would have much preferred that the important details were actually in the official documentation. Is that too much to ask for?

Waiting for an azure function durable orchestration to complete

I'm currently working on a project where I use a storage queue to pick up items for processing. The Storage Queue triggered function picks up an item from the queue and starts a durable orchestration. Normally, according to the documentation, the queue trigger picks up 16 messages (by default) in parallel for processing (https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue). But since each invocation only starts an orchestration (a simple and quick process), if I have a lot of messages in the queue I will end up with a lot of orchestrations running at the same time. I would like to start the orchestration and wait for it to complete before the next batch of messages is picked up for processing, in order to avoid overloading my systems. The solution I came up with, which seems to work, is:
public class QueueTrigger
{
    [FunctionName(nameof(QueueTrigger))]
    public async Task Run(
        [QueueTrigger("queue-processing-test", Connection = "AzureWebJobsStorage")] Activity activity,
        [DurableClient] IDurableOrchestrationClient starter,
        ILogger log)
    {
        log.LogInformation($"C# Queue trigger function processed: {activity.ActivityId}");
        string instanceId = await starter.StartNewAsync<Activity>(nameof(ActivityProcessingOrchestrator), activity);
        log.LogInformation($"Started orchestration with ID = '{instanceId}'.");
        var status = await starter.GetStatusAsync(instanceId);
        do
        {
            status = await starter.GetStatusAsync(instanceId);
        } while (status.RuntimeStatus == OrchestrationRuntimeStatus.Running || status.RuntimeStatus == OrchestrationRuntimeStatus.Pending);
    }
}
which basically picks up the message, starts the orchestration and then, in a do/while loop, waits while the status is Pending or Running.
Am I missing something here, or is there a better way of doing this? (I could not find much online.)
Thanks in advance for your comments or suggestions!
This might not work, since you could either hit timeouts, causing duplicate orchestration runs, or just force your function app to scale out, defeating the purpose of your code altogether.
Instead, you could rely on the concurrency throttles that Durable Functions comes with. While the queue trigger would queue up orchestration runs, only the defined maximum would run at any time on a single instance of a function.
This would still cause your function app to scale out, so you would have to consider that as well when setting this limit. You could also set the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting to control how many instances your function app can scale out to.
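For reference, the throttles live in host.json. The sketch below builds the relevant fragment as a TypeScript object and prints it as JSON. The setting names are the ones in the host.json schema for the Durable Task and Storage Queue extensions; the numeric values are placeholders chosen for illustration, not recommendations.

```typescript
// Sketch of a host.json fragment (Functions runtime 2.x+ schema).
// The numeric values are placeholders, not recommendations.
const hostJson = {
  version: "2.0",
  extensions: {
    durableTask: {
      // Per-instance caps on concurrently executing orchestrators/activities.
      maxConcurrentOrchestratorFunctions: 5,
      maxConcurrentActivityFunctions: 10,
    },
    queues: {
      // How many queue messages one instance fetches and processes in parallel.
      batchSize: 4,
      newBatchThreshold: 2,
    },
  },
};

// Print the fragment in the form it would take inside host.json.
console.log(JSON.stringify(hostJson, null, 2));
```

These limits apply per instance, so the total concurrency is roughly the limit multiplied by the number of instances the app scales out to.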
It could be that the Function app's built-in scaling throttling does not reduce load on downstream services, because it is per app and will just cause the app to scale more. What is needed then is a distributed max instance count that all app instances adhere to. I have built this functionality into my Durable Function orchestration app with a scaleGroupId and its max instance count. It has an API call to save this info, and the scaleGroupId is a string that can be set to anything that describes the resource you want to protect from overloading. Here is my app that can do this:
Microflow

How to persist Saga instances using storage engines and avoid race condition

I tried persisting Saga Instances using RedisSagaRepository; I wanted to run Saga in load balancing setup, so I cannot use InMemorySagaRepository.
However, after I switched, I noticed that some of the events published by Consumers were not getting processed by Saga. I checked the queue and did not see any messages.
What I noticed is that it is likely to occur when the Consumer takes little to no time to process the command and publish the event.
This issue will not occur if I use InMemorySagaRepository or add Task.Delay() in Consumer.Consume()
Am I using it incorrectly?
Also, if I want to run the Saga in a load-balanced setup, and the Saga needs to send multiple commands of the same type, using a dictionary to track completeness (similar logic to Handling transition to state for multiple events): when multiple Consumers publish events at the same time, would I have a race condition if two Sagas are processing two different events at the same time? In that case, would the Dictionary in the State object be set correctly?
The code is available here
SagaService.ConfigureSagaEndPoint() is where I switch between InMemorySagaRepository and RedisSagaRepository
private void ConfigureSagaEndPoint(IRabbitMqReceiveEndpointConfigurator endpointConfigurator)
{
    var stateMachine = new MySagaStateMachine();
    try
    {
        var redisConnectionString = "192.168.99.100:6379";
        var redis = ConnectionMultiplexer.Connect(redisConnectionString);
        ///If we switch to RedisSagaRepository and Consumer publish its response too quick,
        ///It seems like the consumer published event reached Saga instance before the state is updated
        ///When it happened, Saga will not process the response event because it is not in the "Processing" state
        //var repository = new RedisSagaRepository<SagaState>(() => redis.GetDatabase());
        var repository = new InMemorySagaRepository<SagaState>();
        endpointConfigurator.StateMachineSaga(stateMachine, repository);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.ToString());
    }
}
LeafConsumer.Consume is where we add the Task.Delay()
public class LeafConsumer : IConsumer<IConsumerRequest>
{
    public async Task Consume(ConsumeContext<IConsumerRequest> context)
    {
        ///If MySaga project is using RedisSagaRepository, uncomment await Task.Delay() below
        ///Otherwise, it seems that the Publish message from Consumer will not be processed
        ///If using InMemorySagaRepository, code will work without needing Task.Delay
        ///Maybe I am doing something wrong here with these projects
        ///Or in real life, we probably have code in Consumer that will take a few milliseconds to complete
        ///However, we cannot predict latency between Saga and Redis
        //await Task.Delay(1000);
        Console.WriteLine($"Consuming CorrelationId = {context.Message.CorrelationId}");
        await context.Publish<IConsumerProcessed>(new
        {
            context.Message.CorrelationId,
        });
    }
}
When you have events published in this manner, and are using multiple service instances with a non-transactional saga repository (such as Redis), you need to design your saga such that a unique identifier is used and enforced by Redis. This prevents multiple instances of the same saga from being created.
You also need to accept the events in more than the "expected" state. For instance, expecting to receive a Start, which puts the saga into a processing state, before receiving another event only in processing, is likely to fail. Allowing the saga to be started (Initially, in Automatonymous) by any of the sequence of events is recommended, to avoid out-of-order message delivery issues. As long as the events all move the dial from the left to the right, the eventual state will be reached. If an earlier event is received after a later event, it shouldn't move the state backwards (or to the left, in this example) but only add information to the saga instance and leave it at the later state.
If two events are processed on separate service instances, they'll both try to insert the saga instance to Redis, which will fail as a duplicate. The message should then retry (add UseMessageRetry() to your receive endpoint), which would then pick up the now existing saga instance and apply the event.
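The "move the dial only from left to right" idea can be sketched independently of MassTransit or Automatonymous. The state names, event names, and the `applyEvent` helper below are hypothetical; the point is only that any event may create the instance, and an event that arrives late is recorded without moving the state backwards.

```typescript
// Forward-only state progression, independent of any saga framework.
// State names and event names here are hypothetical.
const ORDER = ["Initial", "Processing", "Completed"] as const;
type State = (typeof ORDER)[number];

interface SagaInstance {
  correlationId: string;
  state: State;
  seen: string[]; // names of the events applied so far
}

// The state each event would move the saga to when it arrives "on time".
const TARGET: Record<string, State> = {
  Started: "Processing",
  ConsumerProcessed: "Completed",
};

// Apply an event in whatever order it arrives: record it, and advance the
// state only if the event's target is further along than the current state.
function applyEvent(
  saga: SagaInstance | undefined,
  correlationId: string,
  event: string
): SagaInstance {
  const current: SagaInstance =
    saga ?? { correlationId, state: "Initial", seen: [] };
  const target = TARGET[event] ?? current.state;
  const state =
    ORDER.indexOf(target) > ORDER.indexOf(current.state) ? target : current.state;
  return { ...current, state, seen: [...current.seen, event] };
}

// Out-of-order delivery: the "later" event arrives first.
let saga = applyEvent(undefined, "abc", "ConsumerProcessed");
saga = applyEvent(saga, "abc", "Started");
console.log(saga.state); // Completed
```

In the question's scenario this corresponds to letting the consumer's response create or advance the saga even if the saga's own transition to "Processing" has not been persisted yet.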

With the retry options in durable functions, what happens after the last attempt?

I'm using a durable function that's triggered off a queue. I'm sending messages off the queue to a service that is pretty flaky, so I set up the RetryPolicy. Even still, I'd like to be able to see the failed messages even if the max retries has been exhausted.
Do I need to manually throw those to a dead-letter queue (and if so, it's not clear to me how I know when a message has been retried any number of times), or will the function naturally throw those to some kind of dead-letter/poison queue?
When an activity fails in Durable Functions, an exception is marshalled back to the orchestration with FunctionFailedException thrown. It doesn't matter whether you used automatic retry or not - at the very end, the whole activity fails and it's up to you to handle the situation. As per documentation:
try
{
    await context.CallActivityAsync("CreditAccount",
        new
        {
            Account = transferDetails.DestinationAccount,
            Amount = transferDetails.Amount
        });
}
catch (Exception)
{
    // Refund the source account.
    // Another try/catch could be used here based on the needs of the application.
    await context.CallActivityAsync("CreditAccount",
        new
        {
            Account = transferDetails.SourceAccount,
            Amount = transferDetails.Amount
        });
}
The only thing retry changes is the handling of transient errors (so you do not have to take the safe route each time you have, e.g., network issues).
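The same pattern in a framework-neutral sketch (this is not the Durable Functions API): retries absorb transient failures, the final failure surfaces as an exception, and the caller decides what to do with the message, for example recording it in a dead-letter store. `callWithRetry`, `handle`, and `deadLetters` are hypothetical names.

```typescript
// Generic illustration, not the Durable Functions API: retry an operation a
// fixed number of times, then let the last error reach the caller.
async function callWithRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error; // transient failures are absorbed here
    }
  }
  throw lastError; // retries exhausted: the failure surfaces to the caller
}

// Hypothetical dead-letter store; in practice this could be a queue or table.
const deadLetters: { payload: string; reason: string }[] = [];

async function handle(
  payload: string,
  send: (payload: string) => Promise<void>
): Promise<void> {
  try {
    await callWithRetry(() => send(payload), 3);
  } catch (error) {
    // In this sketch nothing dead-letters the message automatically,
    // so the failed payload is recorded explicitly.
    deadLetters.push({ payload, reason: String(error) });
  }
}
```

In an orchestrator, the `catch` block around the activity call plays the role of `handle` here: that is the place to write the failed message somewhere you can inspect it later.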

How to listen to a queue using azure service-bus with Node.js?

Background
I have several clients sending messages to an azure service bus queue. To match it, I need several machines reading from that queue and consuming the messages as they arrive, using Node.js.
Research
I have read the azure service bus queues tutorial and I am aware I can use receiveQueueMessage to read a message from the queue.
However, the tutorial does not mention how one can listen to a queue and read messages as soon as they arrive.
I know I can simply poll the queue for messages, but this spams the servers with requests for no real benefit.
After searching in SO, I found a discussion where someone had a similar issue:
Listen to Queue (Event Driven no polling) Service-Bus / Storage Queue
And I know they ended up using the C# async method ReceiveAsync, but it is not clear to me if:
That method is available for Node.js
If that method reads messages from the queue as soon as they arrive, like I need.
Problem
The documentation for Node.js is close to nonexistent, with that one tutorial being the only major document I found.
Question
How can my workers be notified of an incoming message in Azure Service Bus queues?
Answer
According to Azure support, it is not possible to be notified when a queue receives a message. This is valid for every language.
Workarounds
There are 2 main workarounds for this issue:
Use Azure topics and subscriptions. This way you can have all clients subscribed to a new-message event and have them check the queue once they receive the notification. This has several problems though: first, you have to pay for yet another Azure service, and second, you can have multiple clients trying to read the same message.
Continuous polling. Have the clients check the queue every X seconds. This solution is horrible, as you end up paying for the network traffic you generate and you spam the service with useless requests. To help minimize this there is a concept called long polling, which is so poorly documented it might as well not exist. I did find this NPM module though: https://www.npmjs.com/package/azure-awesome-queue
Alternatives
Honestly, at this point, you may be wondering why you should be using this service. I agree...
As an alternative there is RabbitMQ which is free, has a community, good documentation and a ton more features.
The downside here is that maintaining a RabbitMQ fault tolerant cluster is not exactly trivial.
Another alternative is Apache Kafka which is also very reliable.
You can receive messages from the Service Bus queue via the subscribe method, which listens to a stream of values. Example from the Azure documentation below:
const { delay, ServiceBusClient, ServiceBusMessage } = require("@azure/service-bus");

// connection string to your Service Bus namespace
const connectionString = "<CONNECTION STRING TO SERVICE BUS NAMESPACE>"
// name of the queue
const queueName = "<QUEUE NAME>"

async function main() {
  // create a Service Bus client using the connection string to the Service Bus namespace
  const sbClient = new ServiceBusClient(connectionString);
  // createReceiver() can also be used to create a receiver for a subscription.
  const receiver = sbClient.createReceiver(queueName);

  // function to handle messages
  const myMessageHandler = async (messageReceived) => {
    console.log(`Received message: ${messageReceived.body}`);
  };

  // function to handle any errors
  const myErrorHandler = async (error) => {
    console.log(error);
  };

  // subscribe and specify the message and error handlers
  receiver.subscribe({
    processMessage: myMessageHandler,
    processError: myErrorHandler
  });

  // Wait long enough for messages to be received before closing the receiver
  await delay(20000);
  await receiver.close();
  await sbClient.close();
}

// call the main function
main().catch((err) => {
  console.log("Error occurred: ", err);
  process.exit(1);
});
source :
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-nodejs-how-to-use-queues
I asked myself the same question; here is what I found.
Use Google PubSub, it does exactly what you are looking for.
If you want to stay with Azure, the following is possible:
cloud functions can be triggered from SBS messages
trigger an event-hub event with that cloud function
receive the event and fetch the message from SBS
You can make use of serverless functions with a "ServiceBusQueueTrigger",
which are invoked as soon as a message arrives in the queue.
It's pretty straightforward to do in Node.js; you need bindings defined in function.json which have the type
"type": "serviceBusTrigger",
This article (https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-service-bus#trigger---javascript-example) would probably help with more detail.

Resources