can I limit consumption of kafka-node consumer? - node.js

It seems like my kafka node consumer:
var kafka = require('kafka-node');
var consumer = new Consumer(client, [], {
...
});
is fetching way too many messages than I can handle in certain cases.
Is there a way to limit it (for example accept no more than 1000 messages per second, possibly using the pause api?)
I'm using kafka-node, which seems to have a limited api comparing to the Java version

In Kafka, poll and process should happen in a coordinated/synchronized way. Ie, after each poll, you should process all received data first, before you do the next poll. This pattern will automatically throttle the number of messages to the max throughput your client can handle.
Something like this (pseudo-code):
while(isRunning) {
messages = poll(...)
for(m : messages) {
process(m);
}
}
(That is the reason, why there is not parameter "fetch.max.messages" -- you just do not need it.)

I had a similar situation where I was consuming messages from Kafka and had to throttle the consumption because my consumer service was dependent on a third party API which had its own constraints.
I used async/queue along with a wrapper of async/cargo called asyncTimedCargo for batching purpose.
The cargo gets all the messages from the kafka-consumer and sends it to queue upon reaching a size limit batch_config.batch_size or timeout batch_config.batch_timeout.
async/queue provides saturated and unsaturated callbacks which you can use to stop the consumption if your queue task workers are busy. This would stop the cargo from filling up and your app would not run out of memory. The consumption would resume upon unsaturation.
//cargo-service.js
module.exports = function(key){
return new asyncTimedCargo(function(tasks, callback) {
var length = tasks.length;
var postBody = [];
for(var i=0;i<length;i++){
var message ={};
var task = JSON.parse(tasks[i].value);
message = task;
postBody.push(message);
}
var postJson = {
"json": {"request":postBody}
};
sms_queue.push(postJson);
callback();
}, batch_config.batch_size, batch_config.batch_timeout)
};
//kafka-consumer.js
cargo = cargo-service()
consumer.on('message', function (message) {
if(message && message.value && utils.isValidJsonString(message.value)) {
var msgObject = JSON.parse(message.value);
cargo.push(message);
}
else {
logger.error('Invalid JSON Message');
}
});
// sms-queue.js
var sms_queue = queue(
retryable({
times: queue_config.num_retries,
errorFilter: function (err) {
logger.info("inside retry");
console.log(err);
if (err) {
return true;
}
else {
return false;
}
}
}, function (task, callback) {
// your worker task for queue
callback()
}), queue_config.queue_worker_threads);
sms_queue.saturated = function() {
consumer.pause();
logger.warn('Queue saturated Consumption paused: ' + sms_queue.running());
};
sms_queue.unsaturated = function() {
consumer.resume();
logger.info('Queue unsaturated Consumption resumed: ' + sms_queue.running());
};

From FAQ in the README
Create a async.queue with message processor and concurrency of one (the message processor itself is wrapped with setImmediate function so it will not freeze up the event loop)
Set the queue.drain to resume() the consumer
The handler for consumer's message event to pause() the consumer and pushes the message to the queue.

As far as I know the API does not have any kind of throttling. But both consumers (Consumer and HighLevelConsumer) have a 'pause()' function. So you could stop consuming if you get to much messages. Maybe that already offers what you need.
Please keep in mind what's happening. You send a fetch request to the broker and get a batch of message back. You can configure the min and max size of the messages (according to the documentation not the number of messages) you want to fetch:
{
....
// This is the minimum number of bytes of messages that must be available to give a response, default 1 byte
fetchMinBytes: 1,
// The maximum bytes to include in the message set for this partition. This helps bound the size of the response.
fetchMaxBytes: 1024 * 1024,
}

I was facing the same issue, initially fetchMaxBytes value was
fetchMaxBytes: 1024 * 1024 * 10 // 10MB
I just chanbed it to
fetchMaxBytes: 1024
It worked very smoothly after the change.

Related

Cloud Run PubSub high latency

I'm building a microservice application consisting of many microservices build with Node.js and running on Cloud Run. I use PubSub in several different ways:
For streaming data daily. The microservices responsible for gathering analytical data from different advertising services (Facebook Ads, LinkedIn Ads, etc.) use PubSub to stream data to a microservice responsible for uploading data to Google BigQuery. There also are services that stream a higher load of data (> 1 Gb) from CRMs and other services by splitting it into smaller chunks.
For messaging among microservices about different events that don't require an immediate response.
Earlier, I experienced some insignificant latency with PubSub. I know it's an open issue considering up to several seconds latency with low messages throughput. But in my case, we are talking about several minutes latency.
Also, I occasionally get an error message
Received error while publishing: Total timeout of API google.pubsub.v1.Publisher exceeded 60000 milliseconds before any response was received.
I this case a message is not sent at all or is highly delayed.
This is how my code looks like.
const subscriptions = new Map<string, Subscription>();
const topics = new Map<string, Topic>();
const listenForMessages = async (
subscriptionName: string,
func: ListenerCallback,
secInit = 300,
secInter = 300
) => {
let logger = new TestLogger("LISTEN_FOR_MSG");
let init = true;
const _setTimeout = () => {
let timer = setTimeout(() => {
console.log(`Subscription to ${subscriptionName} cancelled`);
subscription.removeListener("message", messageHandler);
}, (init ? secInit : secInter) * 1000);
init = false;
return timer;
};
const messageHandler = async (msg: Message) => {
msg.ack();
await func(JSON.parse(msg.data.toString()));
// wait for next message
timeout = _setTimeout();
};
let subscription: Subscription;
if (subscriptions.has(subscriptionName)) {
subscription = subscriptions.get(subscriptionName);
} else {
subscription = pubSubClient.subscription(subscriptionName);
subscriptions.set(subscriptionName, subscription);
}
let timeout = _setTimeout();
subscription.on("message", messageHandler);
console.log(`Listening for messages: ${subscriptionName}`);
};
const publishMessage = async (
data: WithAnyProps,
topicName: string,
options?: PubOpt
) => {
const serializedData = JSON.stringify(data);
const dataBuffer = Buffer.from(serializedData);
try {
let topic: Topic;
if (topics.has(topicName)) {
topic = topics.get(topicName);
} else {
topic = pubSubClient.topic(topicName, {
batching: {
maxMessages: options?.batchingMaxMessages,
maxMilliseconds: options?.batchingMaxMilliseconds,
},
});
topics.set(topicName, topic);
}
let msg = {
data: dataBuffer,
attributes: options.attributes,
};
await topic.publishMessage(msg);
console.log(`Publishing to ${topicName}`);
} catch (err) {
console.error(`Received error while publishing: ${err.message}`);
}
};
A listenerForMessage function is triggered by an HTTP request.
What I have already checked
PubSub client is created only once outside the function.
Topics and Subscriptions are reused.
I made at least one instance of each container running to eliminate the possibility of delays triggered by cold start.
I tried to increase the CPU and Memory capacity of containers.
batchingMaxMessages and batchingMaxMilliseconds are set to 1
I checked that the latest version of #google-cloud/pubsub is installed.
Notes
High latency problem occurs only in the cloud environment. With local tests, everything works well.
Timeout error sometimes occurs in both environments.
The problem was in my understanding of Cloud Run Container's lifecycle. I used to send HTTP response 202 while having PubSub working in the background. After sending the response, the container switched to the idling state, what looked like high latency in my logs.

Any suggestions about how to publish a huge amount of messages within one round of request / response?

If I publish 50K messages using Promise.all like below:
const pubsub = new PubSub({ projectId: PUBSUB_PROJECT_ID });
const topic = pubsub.topic(topicName, {
batching: {
maxMessages: 1000,
maxMilliseconds: 100,
},
});
const n = 50 * 1000;
const dataBufs: Buffer[] = [];
for (let i = 0; i < n; i++) {
const data = `message payload ${i}`;
const dataBuffer = Buffer.from(data);
dataBufs.push(dataBuffer);
}
const tasks = dataBufs.map((d, idx) =>
topic.publish(d).then((messageId) => {
console.log(`[${new Date().toISOString()}] Message ${messageId} published. index: ${idx}`);
})
);
// publish messages concurrencly
await Promise.all(tasks);
// send response to front-end
res.json(data);
I will hit this issue: pubsub-emulator throw error and publisher throw "Retry total timeout exceeded before any response was received" when publish 50k messages
If I use for loop and async/await. The issue is gone.
const n = 50 * 1000;
for (let i = 0; i < n; i++) {
const data = `message payload ${i}`;
const dataBuffer = Buffer.from(data);
const messageId = await topic.publish(dataBuffer)
console.log(`[${new Date().toISOString()}] Message ${messageId} published. index: ${i}`)
}
// some logic ...
// send response to front-end
res.json(data);
But it will block the execution of subsequent logic because of async/await until all messages have been published. It takes a long time to post 50k messages.
Any suggestions about how to publish a huge amount of messages(about 50k) without blocking the execution of subsequent logic? Do I need to use child_process or some queue like bull to publish the huge amount of messages in the background without blocking request/response workflow of the API? This means I need to respond to the front-end as soon as possible, the 50k messages should be the background tasks.
It seems there is a memory queue inside #google/pubsub library. I am not sure if I should use another queue like bull again.
The time it will take to publish large amounts of data depends on a lot of factors:
Message size. The larger the messages, the longer it takes to send them.
Network capacity (both of the connection between wherever the publisher is running and Google Cloud and, if relevant, of the virtual machine itself). This puts an upper bound on the amount of data that can be transmitted. It is not atypical to see smaller virtual machines with limits in the 40MB/s range. Note that if you are testing via Wifi, the limits could be even lower than this.
Number of threads and number of CPU cores. When having to run a lot of asynchronous callbacks, the ability to schedule them to run can be limited by the parallel capacity of the machine or runtime environment.
Typically, it is not good to try to send 50,000 publishes simultaneously from one instance of a publisher. It is likely that the above factors will cause the client to get overloaded and result in deadline exceeded errors. The best way to prevent this is to limit the number of messages that can be outstanding for publish at one time. Some of the libraries like Java support this natively. The Node.js library does not yet support this feature, but likely will in the future.
In the meantime, you'd want to keep a counter of the number of messages outstanding and limit it to whatever the client seems to be able to handle. Start with 1000 and work up or down from there based on the results. A semaphore would be a pretty standard way to achieve this behavior. In your case the code would look something like this:
var sem = require('semaphore')(1000);
var publishes = []
const tasks = dataBufs.map((d, idx) =>
sem.take(function() => {
publishes.push(topic.publish(d).then((messageId) => {
console.log(`[${new Date().toISOString()}] Message ${messageId} published. index: ${idx}`);
sem.leave();
}));
})
);
// Await the start of publishing all messages
await Promise.all(tasks);
// Await the actual publishes
await Promise.all(publishes);

Google Cloud PubSub not ack messages

We have the system of publisher and subscriber systems based on GCP PubSub. Subscriber processing single message quite long, about 1 minute. We already set subscribers ack deadline to 600 seconds (10 minutes) (maximal one) to make sure, that pubsub will not start redelivery too earlier, as basically we have long running operation here.
I'm seeing this behavior of PubSub. While code sending ack, and monitor confirms that PubSub acknowledgement request has been accepted and acknowledgement itself completed with success status, total number of unacked messages still the same.
Metrics on the charts showing the same for sum, count and mean aggregation aligner. On the picture above aligner is mean and no reducers enabled.
I'm using #google-cloud/pubsub Node.js library. Different versions has been tried (0.18.1, 0.22.2, 0.24.1), but I guess issue not in them.
The following class can be used to check.
TypeScript 3.1.1, Node 8.x.x - 10.x.x
import { exponential, Backoff } from "backoff";
const pubsub = require("#google-cloud/pubsub");
export interface IMessageHandler {
handle (message): Promise<void>;
}
export class PubSubSyncListener {
private readonly client;
private listener: Backoff;
private runningOperations: Promise<unknown>[] = [];
constructor (
private readonly handler: IMessageHandler,
private readonly options: {
/**
* Maximal messages number to be processed simultaniosly.
* Listener will try to keep processing number as close to provided value
* as possible.
*/
maxMessages: number;
/**
* Formatted full subscrption name /projects/{projectName}/subscriptions/{subscriptionName}
*/
subscriptionName: string;
/**
* In milliseconds
*/
minimalListenTimeout?: number;
/**
* In milliseconds
*/
maximalListenTimeout?: number;
}
) {
this.client = new pubsub.v1.SubscriberClient();
this.options = Object.assign({
minimalListenTimeout: 300,
maximalListenTimeout: 30000
}, this.options);
}
public async listen () {
this.listener = exponential({
maxDelay: this.options.maximalListenTimeout,
initialDelay: this.options.minimalListenTimeout
});
this.listener.on("ready", async () => {
if (this.runningOperations.length < this.options.maxMessages) {
const [response] = await this.client.pull({
subscription: this.options.subscriptionName,
maxMessages: this.options.maxMessages - this.runningOperations.length
});
for (const m of response.receivedMessages) {
this.startMessageProcessing(m);
}
this.listener.reset();
this.listener.backoff();
} else {
this.listener.backoff();
}
});
this.listener.backoff();
}
private startMessageProcessing (message) {
const index = this.runningOperations.length;
const removeFromRunning = () => {
this.runningOperations.splice(index, 1);
};
this.runningOperations.push(
this.handler.handle(this.getHandlerMessage(message))
.then(removeFromRunning, removeFromRunning)
);
}
private getHandlerMessage (message) {
message.message.ack = async () => {
const ackRequest = {
subscription: this.options.subscriptionName,
ackIds: [message.ackId]
};
await this.client.acknowledge(ackRequest);
};
return message.message;
}
public async stop () {
this.listener.reset();
this.listener = null;
await Promise.all(
this.runningOperations
);
}
}
This is basically partial implementation of async pulling of the messages and immediate acknowledgment. Because one of the proposed solutions was in usage of the synchronous pulling.
I found similar reported issue in java repository, if I'm not mistaken in symptoms of the issue.
https://github.com/googleapis/google-cloud-java/issues/3567
The last detail here is that acknowledgment seems to work on the low number of requests. In case if I fire single message in pubsub and then immediately process it, undelivered messages number decreases (drops to 0 as only one message was there before).
The question itself - what is happening and why unacked messages number is not reducing as it should when ack has been received?
To quote from the documentation, the subscription/num_undelivered_messages metric that you're using is the "Number of unacknowledged messages (a.k.a. backlog messages) in a subscription. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds."
You should not expect this metric to decrease immediately upon acking a message. In addition, it sounds as if you are trying to use pubsub for an exactly once delivery case, attempting to ensure the message will not be delivered again. Cloud Pub/Sub does not provide these semantics. It provides at least once semantics. In other words, even if you have received a value, acked it, received the ack response, and seen the metric drop from 1 to 0, it is still possible and correct for the same worker or another to receive an exact duplicate of that message. Although in practice this is unlikely, you should focus on building a system that is duplicate tolerant instead of trying to ensure your ack succeeded so your message won't be redelivered.

Can I filter an Azure ServiceBusService using node.js SDK?

I have millions of messages in a queue and the first ten million or so are irrelevant. Each message has a sequential ActionId so ideally anything < 10000000 I can just ignore or better yet delete from the queue. What I have so far:
let azure = require("azure");
function processMessage(sb, message) {
// Deserialize the JSON body into an object representing the ActionRecorded event
var actionRecorded = JSON.parse(message.body);
console.log(`processing id: ${actionRecorded.ActionId} from ${actionRecorded.ActionTaken.ActionTakenDate}`);
if (actionRecorded.ActionId < 10000000) {
// When done, delete the message from the queue
console.log(`Deleting message: ${message.brokerProperties.MessageId} with ActionId: ${actionRecorded.ActionId}`);
sb.deleteMessage(message, function(deleteError, response) {
if (deleteError) {
console.log("Error deleting message: " + message.brokerProperties.MessageId);
}
});
}
// immediately check for another message
checkForMessages(sb);
}
function checkForMessages(sb) {
// Checking for messages
sb.receiveQueueMessage("my-queue-name", { isPeekLock: true }, function(receiveError, message) {
if (receiveError && receiveError === "No messages to receive") {
console.log("No messages left in queue");
return;
} else if (receiveError) {
console.log("Receive error: " + receiveError);
} else {
processMessage(sb, message);
}
});
}
let connectionString = "Endpoint=sb://<myhub>.servicebus.windows.net/;SharedAccessKeyName=KEYNAME;SharedAccessKey=[mykey]"
let serviceBusService = azure.createServiceBusService(connectionString);
checkForMessages(serviceBusService);
I've tried looking at the docs for withFilter but it doesn't seem like that applies to queues.
I don't have access to create or modify the underlying queue aside from the operations mentioned above since the queue is provided by a client.
Can I either
Filter my results that I get from the queue
speed up the queue processing somehow?
Filter my results that I get from the queue
As you found, filters as a feature are only applicable to Topics & Subscriptions.
speed up the queue processing somehow
If you were to use the #azure/service-bus package which is the newer, faster library to work with Service Bus, you could receive the messages in ReceiveAndDelete mode until you reach the message with ActionId 9999999, close that receiver and then create a new receiver in PeekLock mode. For more on these receive modes, see https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-transfers-locks-settlement#settling-receive-operations

Amazon SQS with aws-sdk receiveMessage Stall

I'm using the aws-sdk node module with the (as far as I can tell) approved way to poll for messages.
Which basically sums up to:
sqs.receiveMessage({
QueueUrl: queueUrl,
MaxNumberOfMessages: 10,
WaitTimeSeconds: 20
}, function(err, data) {
if (err) {
logger.fatal('Error on Message Recieve');
logger.fatal(err);
} else {
// all good
if (undefined === data.Messages) {
logger.info('No Messages Object');
} else if (data.Messages.length > 0) {
logger.info('Messages Count: ' + data.Messages.length);
var delete_batch = new Array();
for (var x=0;x<data.Messages.length;x++) {
// process
receiveMessage(data.Messages[x]);
// flag to delete
var pck = new Array();
pck['Id'] = data.Messages[x].MessageId;
pck['ReceiptHandle'] = data.Messages[x].ReceiptHandle;
delete_batch.push(pck);
}
if (delete_batch.length > 0) {
logger.info('Calling Delete');
sqs.deleteMessageBatch({
Entries: delete_batch,
QueueUrl: queueUrl
}, function(err, data) {
if (err) {
logger.fatal('Failed to delete messages');
logger.fatal(err);
} else {
logger.debug('Deleted recieved ok');
}
});
}
} else {
logger.info('No Messages Count');
}
}
});
receiveMessage is my "do stuff with collected messages if I have enough collected messages" function
Occasionally, my script is stalling because I don't get a response for Amazon at all, say for example there are no messages in the queue to consume and instead of hitting the WaitTimeSeconds and sending a "no messages object", the callback isn't called.
(I'm writing this up to Amazon Weirdness)
What I'm asking is whats the best way to detect and deal with this, as I have some code in place to stop concurrent calls to receiveMessage.
The suggested answer here: Nodejs sqs queue processor also has code that prevents concurrent message request queries (granted it's only fetching one message a time)
I do have the whole thing wrapped in
var running = false;
runMonitorJob = setInterval(function() {
if (running) {
} else {
running = true;
// call SQS.receive
}
}, 500);
(With a running = false after the delete loop (not in it's callback))
My solution would be
watchdogTimeout = setTimeout(function() {
running = false;
}, 30000);
But surely this would leave a pile of floating sqs.receive's lurking about and thus much memory over time?
(This job runs all the time, and I left it running on Friday, it stalled Saturday morning and hung till I manually restarted the job this morning)
Edit: I have seen cases where it hangs for ~5 minutes and then suddenly gets messages BUT with a wait time of 20 seconds it should throw a "no messages" after 20 seconds. So a WatchDog of ~10 minutes might be more practical (depending on the rest of ones business logic)
Edit: Yes Long Polling is already configured Queue Side.
Edit: This is under (latest) v2.3.9 of aws-sdk and NodeJS v4.4.4
I've been chasing this (or a similar) issue for a few days now and here's what I've noticed:
The receiveMessage call does eventually return although only after 120 seconds
Concurrent calls to receiveMessage are serialised by the AWS.SDK library so making multiple calls in parallel have no effect.
The receiveMessage callback does not error - in fact after the 120 seconds have passed, it may contain messages.
What can be done about this? This sort of thing can happen for a number of reasons and some/many of these things can't necessarily be fixed. The answer is to run multiple services each calling receiveMessage and processing the messages as they come - SQS supports this. At any time, one of these services may hit this 120 second lag but the other services should be able to continue on as normal.
My particular problem is that I have some critical singleton services that can't afford 120 seconds of down time. For this I will look into either 1) use HTTP instead of SQS to push messages into my service or 2) spawn slave processes around each of the singletons to fetch the messages from SQS and push them into the service.
I also ran into this issue, but not when calling receiveMessage but sendMessage. I also saw hangups of exactly 120 seconds. I also saw it with a few other services, like Firehose.
That lead me to this line in the AWS SDK:
SQS Constructor
httpOptions:
timeout [Integer] — Sets the socket to timeout after timeout milliseconds of inactivity on the socket. Defaults to two minutes (120000).
to implement a fix, I override the timeout for my SQS client that performs the sendMessage to timeout after 10 seconds, and another with 25 seconds for receiving (where I long poll for 20 seconds):
var sendClient = new AWS.SQS({httpOptions:{timeout:10*1000}});
var receiveClient = new AWS.SQS({httpOptions:{timeout:25*1000}});
I've had this out in production for a week now and I've noticed that all of my SQS stalling issues have been eliminated.

Resources