Apache Pulsar Client - Broker notification of Closed consumer - how to resume data feed?

TL;DR: We use the Python client library to subscribe to a Pulsar topic. The logs show "Broker notification of Closed consumer" when something happens server-side. According to the logs the subscription appears to be re-established, but we later find that the backlog is growing on the cluster because no messages are being delivered to our subscription.
We are running into an issue where an Apache Pulsar cluster we use, which is opaque to us and has a namespace defined where we publish and consume topics, is losing its connection with our consumer.
We have a python client consuming from a topic (with one Pulsar Client subscription per thread).
We have run into a situation where, due to an issue on the Pulsar cluster, we see the following entry in our client logs:
"Broker notification of Closed consumer"
followed by:
"Created connection for pulsar://houpulsar05.mycompany.com:6650"
....for every thread in our agent.
Then we see the usual periodic log entries like this:
{"log":"2022-09-01 04:23:30.269 INFO [139640375858944] ConsumerStatsImpl:63 | Consumer [persistent://tenant/namespace/topicname, subscription-name, 0] , ConsumerStatsImpl (numBytesRecieved_ = 0, totalNumBytesRecieved_ = 6545742, receivedMsgMap_ = {}, ackedMsgMap_ = {}, totalReceivedMsgMap_ = {[Key: Ok, Value: 3294], }, totalAckedMsgMap_ = {[Key: {Result: Ok, ackType: 0}, Value: 3294], })\n","stream":"stdout","time":"2022-09-01T04:23:30.270009746Z"}
This gives the appearance that some connection has been re-established to some other broker.
However, we do not get any messages being consumed. We have an alert on a Grafana dashboard which shows us the backlog on topics and subscriptions. Eventually it hits a count or rate threshold and alerts us that there is a problem. When we restart our agent, the subscription is re-established and the backlog can immediately be seen heading to 0.
Has anyone experienced such an issue?
Our code is typical:
consumer = client.subscribe(
    topic='my-topic',
    subscription_name='my-subscription',
    consumer_type=my_consumer_type,
    consumer_name=my_agent_name
)

while True:
    msg = consumer.receive()
    ex = msg.value()
I haven't yet found a readily available way (docker-compose or otherwise) to run a multi-broker Pulsar installation locally on Docker Desktop so that I can try killing off a broker and see how the consumer reacts.

Currently the Python client only supports configuring a single broker address and doesn't support retrying lookups yet. Here are two related PRs to support it:
https://github.com/apache/pulsar/pull/17162
https://github.com/apache/pulsar/pull/17410
Therefore, setting up a multi-node cluster might behave no differently from a standalone.
If you only specified one broker in the service URL, you can simply test with a standalone. Run a consumer and a producer that sends messages periodically, then restart the standalone. The "Broker notification of Closed consumer" log appears when the broker actively closes the connection, e.g. when your consumer has sent a SEEK command (via a seek call); the broker then disconnects the consumer and the log appears.
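For reference, here is a minimal sketch of that test, assuming the pulsar-client Python package and a local standalone at pulsar://localhost:6650 (the topic and subscription names are placeholders, not taken from your setup). Start it, restart the standalone, and watch whether consumption resumes:

import time
import pulsar

client = pulsar.Client('pulsar://localhost:6650')

producer = client.create_producer('my-topic')
consumer = client.subscribe('my-topic', 'my-subscription',
                            consumer_type=pulsar.ConsumerType.Shared)

try:
    while True:
        try:
            producer.send(('ping %d' % int(time.time())).encode('utf-8'))
        except Exception:
            # send can fail while the broker is down; keep trying
            print('send failed (broker restarting?)')
        try:
            msg = consumer.receive(timeout_millis=5000)
            print('received:', msg.data())
            consumer.acknowledge(msg)
        except Exception:
            # receive() raises on timeout; keep looping so we can observe
            # whether consumption resumes after the restart
            print('no message received in the last 5 seconds')
        time.sleep(1)
finally:
    client.close()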
BTW, it's better to show your Python client version. And GitHub issues might be a better place to track the issue.

Related

Azure Service Bus message receipt and publishing fails after changing database DTUs

I have an Azure Kubernetes Service that subscribes to a topic on an Azure Service Bus. As messages are received, a number of operations happen before calling a stored proc in an Azure Database. It then publishes a message to another topic on the same Service Bus. This runs, processing thousands of messages without issue.
When the DBA changes the DTUs on that Azure Database, the k8s service stops receiving messages: the logs indicate a message exists but none is received. It also begins showing "Send Link Closed" errors for the topic the app attempts to publish to.
It never corrects out of this state.
Subscribed to topic messages
my-subscribed-topic/Subscriptions/my-subscription-d363a5a7-2262-4c74-a134-3a94f6b3c290-Receiver: RenewLockAsync start. MessageCount = 1, LockToken = 5abf6b8a-21fe-4b16-938a-b179b29ebadc
my-subscribed-topic/Subscriptions/my-subscription-d363a5a7-2262-4c74-a134-3a94f6b3c290-Receiver: RenewLockAsync done. LockToken = 5abf6b8a-21fe-4b16-938a-b179b29ebadc
my-subscribed-topic/Subscriptions/my-subscription-d363a5a7-2262-4c74-a134-3a94f6b3c290: Processor RenewMessageLock complete. LockToken = 5abf6b8a-21fe-4b16-938a-b179b29ebadc
my-subscribed-topic/Subscriptions/my-subscription-d363a5a7-2262-4c74-a134-3a94f6b3c290: Processor RenewMessageLock start. MessageCount = 1, LockToken = 5abf6b8a-21fe-4b16-938a-b179b29ebadc
my-subscribed-topic/Subscriptions/my-subscription-eaf2b5f2-2d34-43e0-88b1-76414175422e-Receiver: ReceiveBatchAsync done. Received '0' messages. LockTokens =
Published to topic messages
Send Link Closed. Identifier: published-to-topic-995469e6-d697-433a-aaea-112366bdc58a, linkException: Azure.Messaging.ServiceBus.ServiceBusException: The link 'G24:260656346:amqps://my-sb-resource.servicebus.windows.net/-34d9f631;1:454:455' is force detached. Code: publisher(link83580724). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00. (GeneralError). For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot..
Send Link Closed. Identifier: published-to-topic-6ba720c4-8894-474e-b321-0f84f569e6fc, linkException: Azure.Messaging.ServiceBus.ServiceBusException: The link 'G24:260657004:amqps://my-sb-resource.servicebus.windows.net/-34d9f631;1:456:457' is force detached. Code: publisher(link83581007). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00. (GeneralError). For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot..
Send Link Closed. Identifier: published-to-topic-865efa89-0775-4f5f-a5d0-9fde35fdabce, linkException: Azure.Messaging.ServiceBus.ServiceBusException: The link 'G24:260657815:amqps://my-sb-resource.servicebus.windows.net/-34d9f631;1:458:459' is force detached. Code: publisher(link83581287). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00. (GeneralError). For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot..
I can't think of a reason why changing DTUs would have any impact on maintaining a connection with the Service Bus. We've replicated the behavior three straight times, though.

Google Pub/Sub with distributed subscribers in Node.js

We are attempting to migrate a message processing app from Kafka to Google Pub/Sub and it's just not working as expected.
We are running in Kubernetes (Google Cloud) where there may be multiple pods processing messages on the same subscription. Topics and subscriptions are all created using terraform and are more or less permanent. They are not created/destroyed on the fly by the application.
In our development environment, where message throughput is rather low, everything works just fine. But when we scale up to production levels, everything seems to fall apart. We get big backlogs of unacked messages, and yet some pods are not receiving any messages at all. And then, all of a sudden, the backlog will just go away, but then climb again.
We are using the nodejs client library provided by google: #google-cloud/pubsub:3.1.0
Each instance of the application subscribes to the same named subscription, and according to the documentation, messages should be distributed to each subscriber. But that is not happening. Some pods will be consuming messages rapidly, while others sit idle.
Every message is processed in a try/catch block and we are not observing any errors being thrown. So, as far as we know, every received message is getting acked.
I suspect that, as pods are terminated by autoscaling or updated deployments, we are not properly closing subscriptions, but there are no examples addressing a distributed environment and I have not found any documentation that specifically addresses how to properly manage these resources. It is also worth mentioning that the app has multiple subscriptions to different topics.
When a pod shuts down, what actions should be taken on the Subscription object and the PubSub client object? Maybe that's not even the issue, but it seems like a reasonable place to start.
When we start a subscription we do something like this:
private exampleSubscribe(): Subscription {
    // one suggestion for having multiple subscriptions in the same app
    // was to use separate clients for each
    const pubSubClient = new PubSub({
        // use a regional endpoint for message ordering
        apiEndpoint: 'us-central1-pubsub.googleapis.com:443',
    });
    pubSubClient.projectId = 'my-project-id';

    const sub = pubSubClient.subscription('my-subscription-name', {
        // have tried various values for maxMessages from 5 to the default of 1000
        flowControl: { maxMessages: 250, allowExcessMessages: false },
        ackDeadline: 30,
    });

    sub.on('message', async (message) => {
        await this.exampleMessageProcessing(message);
    });

    return sub;
}
private async exampleMessageProcessing(message: Message): Promise<void> {
    try {
        // do some cool stuff
    } catch (error) {
        // log the error
    } finally {
        message.ack();
    }
}
Upon termination of a pod, we do this:
private async exampleCloseSub(sub: Subscription) {
    try {
        sub.removeAllListeners('message');
        await sub.close();
        // note that we do nothing with the PubSub
        // client object -- should it also be closed?
    } catch (error) {
        // ignore error, we are shutting down
    }
}
When running with Kafka, we can easily keep up with the message pace with usually no more than 2 pods. So I know that we are not running into issues of it simply taking too long to process each message.
Why are messages being left unacked? Why are pods not receiving messages when there is clearly a large backlog? What is the correct way to shut down one subscriber on a shared subscription?
It turns out that the issue was an improper implementation of message ordering.
The official docs for message ordering in Pub/Sub are rather brief:
https://cloud.google.com/pubsub/docs/ordering
Not much there regarding how to implement an ordering key or the implications of message ordering on horizontal scaling.
Though they do link to some external resources, one of which is this blog post:
https://medium.com/google-cloud/google-cloud-pub-sub-ordered-delivery-1e4181f60bc8
In our case, we did not have enough distinct ordering keys to allow for proper distribution of messages across subscribers/pods.
So this was definitely an RTFM situation, or more accurately: Read The Fine Blog Post Referred To By The Manual. I would have much preferred that the important details were actually in the official documentation. Is that too much to ask for?
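For illustration, here is a rough sketch of ordering-key publishing written in Python (assuming the google-cloud-pubsub package; the app in the question uses the Node.js library, where the API is analogous, and the project/topic names are placeholders). Messages sharing an ordering key are delivered in order and effectively pinned together, so too few distinct keys limits how widely delivery can spread across subscriber pods:

from google.cloud import pubsub_v1

# Message ordering requires a regional endpoint, matching the subscriber code above.
publisher = pubsub_v1.PublisherClient(
    client_options={'api_endpoint': 'us-central1-pubsub.googleapis.com:443'},
    publisher_options=pubsub_v1.types.PublisherOptions(enable_message_ordering=True),
)
topic_path = publisher.topic_path('my-project-id', 'my-topic')

futures = []
for i in range(100):
    # Only 4 distinct keys here, so delivery is limited to a few streams.
    # A key per logical entity (e.g. an account id) spreads messages across
    # many more subscribers while still preserving per-key order.
    futures.append(publisher.publish(
        topic_path,
        ('message %d' % i).encode('utf-8'),
        ordering_key='entity-%d' % (i % 4),
    ))

for f in futures:
    f.result()  # block until each publish is acknowledged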

Active message count in Azure Service Bus keeps decreasing after killing the app

I am using ServiceBusProcessorClient to consume events from a topic:
ServiceBusProcessorClient serviceBusProcessorClient = new ServiceBusClientBuilder()
        .connectionString(busConnectionString)
        .processor()
        .disableAutoComplete()
        .topicName(topicName)
        .subscriptionName(subscriptionName)
        .processMessage(processMessage)
        .processError(context -> processError(context, countdownLatch))
        .maxConcurrentCalls(maxConcurrentCalls)
        .buildProcessorClient();

serviceBusProcessorClient.start();
But after killing the app, the message count in Azure Service Bus keeps decreasing until it reaches 0.
I cannot understand what goes wrong in my implementation.
The topic configuration: (screenshot: topic config)
The subscription configuration: (screenshot: subscription config)
Looks like helm deletes using the background propagation policy, which lets the garbage collector delete in the background. This is probably why your service keeps processing messages even after you run uninstall.
You would have to kill the process directly, in addition to helm uninstall, to stop any more messages from being processed.

nodejs rhea npm for amqp couldn't create subscription queue on address in activemq artemis

I have an address "pubsub.foo" already configured as multicast in broker.xml.
<address name="pubsub.foo">
<multicast/>
</address>
As per the Artemis documentation:
When clients connect to an address with the multicast element, a subscription queue for the client will be automatically created for the client.
I am creating a simple utility using rhea AMQP Node.js npm to publish messages to the address.
var connection = require('rhea').connect({ port: args.port, host: args.host, username: 'admin', password: 'xxxx' });
var sender = connection.open_sender('pubsub.foo');
sender.on('sendable', function(context) {
    var m = 'Hii test';
    console.log('sent ' + m);
    sender.send({body: m});
    connection.close();
});
I enabled debug logging, and while running the client code I see a message like this:
2020-02-03 22:43:25,071 DEBUG [org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl] Message org.apache.activemq.artemis.protocol.amqp.broker.AMQPMessage#68933e4b is not going anywhere as it didn't have a binding on address:pubsub.foo
I also tried different variations of the topic, for example client1.pubsub.foo and pubsub.foo::client1, but no luck from the client code. Please share your thoughts. I am new to ActiveMQ Artemis.
What you're observing actually is the expected behavior.
Unfortunately, the documentation you cited isn't as clear as it could be. When it says a subscription queue will be created in response to a client connecting, it really means a subscriber, not a producer. That's why it creates a subscription queue. The semantics of a multicast address (and publish/subscribe in general) dictate that a message sent when there are no subscribers will be dropped. Therefore, you need to create a subscriber first and then send a message.
If you want different semantics then I recommend you use anycast.
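To make that concrete, here is a rough sketch in Python using the qpid-proton library (chosen only for illustration; the same attach-receiver-before-sending pattern applies with rhea). The broker address, credentials, and SASL options are assumptions, not values from your setup:

from proton import Message
from proton.handlers import MessagingHandler
from proton.reactor import Container

class SubscribeThenSend(MessagingHandler):
    def on_start(self, event):
        # Broker address and credentials are placeholders; PLAIN over a
        # non-TLS connection may require allow_insecure_mechs on some setups.
        conn = event.container.connect('amqp://localhost:5672',
                                       user='admin', password='xxxx',
                                       allowed_mechs='PLAIN',
                                       allow_insecure_mechs=True)
        # Attach a receiver first so the broker creates a subscription queue
        # (a binding) on the multicast address; without one the message is
        # simply dropped, as the debug log in the question shows.
        self.receiver = event.container.create_receiver(conn, 'pubsub.foo')
        self.sender = event.container.create_sender(conn, 'pubsub.foo')

    def on_sendable(self, event):
        event.sender.send(Message(body='Hii test'))

    def on_message(self, event):
        print('received:', event.message.body)
        event.connection.close()

Container(SubscribeThenSend()).run()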

RabbitMQ: keep requests after stopping the RabbitMQ process and queue

I made a connection app with RabbitMQ. It works fine, but when I stop the RabbitMQ process all of my requests get lost. I want my requests to be saved even after killing the RabbitMQ service, so that after restarting the service all of my requests return to their own places.
Here is my rabitmq.py:
import pika
import SimilarURLs

data = ''

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

def rabit_mq_start(Parameter):
    channel.queue_declare(queue='req')
    a = (take(datas=Parameter.decode()))
    channel.basic_publish(exchange='',
                          routing_key='req',
                          body=str(a))
    print(" [x] Sent {}".format(a))
    return a
    channel.start_consuming()

def take(datas):
    returns = SimilarURLs.start(data=datas)
    return returns
In addition, I'm sorry for writing mistakes in my question.
You need to enable publisher confirms (via the confirm_delivery method on your channel object). Then your application must keep track of what messages have been confirmed as published, and what messages have not. You will have to implement this yourself. When RabbitMQ is stopped and started again, your application can re-publish the messages that weren't confirmed.
It would be best to use the asynchronous publisher example as a guide. If you use BlockingConnection you won't get the async notifications when a message is confirmed, defeating their purpose.
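For example, here is a minimal sketch of publisher confirms using the blocking connection, kept deliberately simple (per the note above, the asynchronous publisher is the better fit, since confirms then arrive as callbacks instead of blocking each publish). The queue name matches the question; the tracking list merely illustrates what your application would need to persist and re-publish:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='req')
channel.confirm_delivery()  # enable publisher confirms on this channel

unconfirmed = []  # messages the broker has not yet confirmed

def publish(body):
    unconfirmed.append(body)
    try:
        # With confirms enabled, basic_publish blocks until the broker
        # confirms the message (and raises if it is nacked or unroutable).
        channel.basic_publish(exchange='', routing_key='req',
                              body=body, mandatory=True)
        unconfirmed.remove(body)
    except pika.exceptions.AMQPError:
        # Not confirmed (nacked, unroutable, or the broker went away):
        # keep it in `unconfirmed` so it can be re-published later.
        pass

publish('hello')
print('still unconfirmed:', unconfirmed)
connection.close()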
If you need further assistance after trying to implement this yourself I suggest following up on the pika-python mailing list.
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
