What is the most efficient way to delete/expire all messages in an Apache Pulsar topic?

I'm trying to figure out the best way to remove all messages from a Pulsar topic (either logically or physically), so that they are no longer consumable by subscriptions.
I know we can simply do $ pulsar-admin persistent delete persistent://tenant/namespace/topic.
But this solution has some drawbacks: it removes the topic completely (so we have to recreate it later), and it requires that no active clients (i.e. subscriptions or producers) are connected to it.
Alternatively, is there a way to programmatically make all messages between two MessageIds unavailable to subscriptions?
Thanks

There are a couple of options you can choose from.
You can use topics skip to skip N messages for a specific subscription of a given topic. https://pulsar.apache.org/docs/en/admin-api-persistent-topics/#skip-messages
You can use topics skip-all to skip all the old messages for a specific subscription for a given topic. https://pulsar.apache.org/docs/en/admin-api-persistent-topics/#skip-all-messages
You can use topics clear-backlog to clear the backlog of a specific subscription. It is the same as topics skip-all.
You can also use topics reset-cursor to move the subscription cursor back to a specific message ID or timestamp.
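These CLI subcommands map onto Pulsar's admin REST API, so the same cleanup can be done programmatically. A minimal sketch in Node; the admin endpoint and the tenant/namespace/topic/subscription names are placeholders:

```javascript
// Sketch: clearing a subscription's backlog via Pulsar's admin REST API,
// the same operation as `pulsar-admin topics clear-backlog` / `skip-all`.
// The admin endpoint below is a placeholder for your broker's HTTP port.
const ADMIN = "http://localhost:8080/admin/v2";

// POST .../persistent/{tenant}/{namespace}/{topic}/subscription/{sub}/skip_all
function skipAllUrl(tenant, ns, topic, sub) {
  return `${ADMIN}/persistent/${tenant}/${ns}/${topic}` +
         `/subscription/${encodeURIComponent(sub)}/skip_all`;
}

// Usage against a running broker:
//   await fetch(skipAllUrl("my-tenant", "my-namespace", "my-topic",
//                          "my-subscription"), { method: "POST" });
```

The sibling endpoints follow the same pattern (`.../skip/{numMessages}` and `.../resetcursor/{timestamp}`), which is why skip, skip-all, and reset-cursor all operate per subscription rather than per topic.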

From Sijie Guo's answer, I tried skip-all, but got:
Expected a command, got skip-all
Invalid command, please use pulsar-admin --help to check out how to use
I retried with clear-backlog, which succeeded.
https://github.com/apache/pulsar/issues/5685#issuecomment-664751216
The doc is updated here:
https://pulsar.apache.org/docs/pulsar-admin/#list-5
but not here:
https://pulsar.apache.org/docs/admin-api-topics#skip-all-messages
So it's confusing

Related

Why Are There 2 Events In Some Azure Eventhub Records?

sorry if this is a dumb question, but I cannot seem to find the answer:
I am using an external source to read Audit Log events from Azure Event Hub. I have the data flowing and working, but I see that some messages have a records field containing two events, while others contain only one. For the records that contain two JSON events, why is this the case? They appear to be related.
What I mean is that in some logs I will see, for example:
category:NoninteractiveSignin:
records:[{..},{..}]
Event Hub messages are binary, and opaque to Event Hubs. It’s entirely up to the sender what’s in each one.
So you’ll need to ask whatever application creates the messages about that.
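That said, Azure's log exports commonly batch events: each message body carries a records array that may hold one or many events, so a consumer should flatten that array rather than assume one event per message. A minimal sketch (the sample categories are illustrative):

```javascript
// Sketch: each message body from an Azure log export carries a "records"
// array that may contain one or more events; flatten before processing
// instead of assuming one event per message.
function flattenRecords(messageBodies) {
  return messageBodies.flatMap(body => body.records ?? []);
}

// Example: two messages, one carrying two records and one carrying one,
// yield three individual events.
const events = flattenRecords([
  { records: [{ category: "NoninteractiveSignin" },
              { category: "NoninteractiveSignin" }] },
  { records: [{ category: "SignInLogs" }] },
]);
console.log(events.length); // → 3
```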

Is there a way to acknowledge specific message in Pulsar?

Is there a way to acknowledge a particular message in a topic on behalf of a specific subscriber?
I couldn't find anything related to this in the api, both the admin and client api.
Yes, the Consumer.acknowledge(msg) method acknowledges the consumption of one specific message.
If I understand the question correctly, it sounds like you're trying to do this on an administrative basis, rather than via the typical consumer acknowledgment behavior of a subscriber using the Pulsar client. If your consumer is unable to process the message, for example, and you want to remove it from the backlog, you can skip the message.
pulsar-admin topics skip \
--count 1 --subscription my-subscription \
persistent://my-tenant/my-namespace/my-topic
Keep in mind that if you are using retention, skipping a message bypasses the mechanism that retains it, so unless the message was already acknowledged and is only stuck due to a bug, skipping will schedule the message for deletion.

How to populate MQTT topic list dynamically on node js

I am using mqtt-node to receive subscribed messages. The problem is that the topic list for subscribing is appended to through an API, but the appended topics are not picked up by the existing MQTT connection when subscribing. Please advise or suggest a suitable way to solve this issue.
There is no topic list.
The only way to discover what topics are in use is to either maintain a list external to the broker or to subscribe to a wildcard and see what messages are published.
It's important to remember that topics only really exist at the moment a message is published to one. Subscribers supply a list of patterns (they can include wildcards like + or #) to match against those published topics and any matching messages are delivered.
You maintain an array of Topics
var topics = [
  "test/1",
  "test/2",
  "test/3"
]
When a new Topic arrives via the API, you will need to first unsubscribe from the existing Topics
client.unsubscribe(topics)
then add the new Topic
topics.push(newTopic)
then re-subscribe
client.subscribe(topics)
This is what worked best for me when I have this use case.
Keep in mind that in the time between unsubscribing and re-subscribing, messages could be published that your client would not see, because it was not subscribed at that moment. This is easy to overcome if you can use the RETAIN flag on your publishers, but in some use cases that isn't practical.
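The unsubscribe → append → re-subscribe steps above can be wrapped in a single helper. A sketch, assuming an MQTT.js-style client that exposes subscribe(topics) and unsubscribe(topics):

```javascript
// Sketch: wrap the unsubscribe / append / re-subscribe steps in one helper.
// Assumes an MQTT.js-style client exposing subscribe() and unsubscribe().
function addTopic(client, topics, newTopic) {
  client.unsubscribe(topics); // drop the current subscription set
  topics.push(newTopic);      // append the topic delivered by the API
  client.subscribe(topics);   // re-subscribe with the updated list
  return topics;
}
```

The gap between the two calls is the window mentioned above in which publishes can be missed.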

Azure service bus: is it wise to create a separate topic for every event you broadcast?

I am trying to design the strategy that my organization will employ to create topics, and which messages will go to which one. I am looking at either creating a separate topic for each event, or a single topic to hold messages from all events, and then to triage with filters. I am convinced that using a separate topic for every event is better because:
- Filters will be less complex and thus more performant, since each event is already separated into its own topic.
- There will be less chance of message congestion in any given topic.
- Messages are less likely to be needlessly copied into any given subscription.
- More topics means more messaging stores, which means better message retrieval and sending.
- From a risk-management perspective, having more topics seems better. If I only used a single topic, an outage would affect all subscribers for all messages; if I use many topics, an outage might only affect some topics and leave the others operational.
- I get 12 more shared access keys per topic. It's easier to have granular control over which topics are exposed to which client apps, since I can add or revoke access via the shared access key for each app on a per-topic basis.
Any thoughts would be appreciated
Like Sean already mentioned, there is really no one answer but here are some details about topics that could help you.
Topics are designed for a large number of recipients, fanning messages out to multiple (up to 2,000) subscriptions, which are what actually hold the filters.
Topics don't really store messages; subscriptions do.
For outages, unless you have topics across regions, I'm not sure having more of them would help as such.
The 12-key limit is on shared access authorization rules per entity; you should use one of these rules to generate SAS tokens for your clients.
Also, chaining Service Bus entities together with auto-forwarding is something you could consider as needed.

How do I remove events from Eventhub

I might be confused about how Event Hubs is supposed to be used, or I need guidance on how to reliably process events posted into an event hub. I export the Azure Activity Log to Event Hubs and currently just use a console application to read those messages. What I don't understand is what I'm supposed to do with events I have already read and processed. Say I want to write the content of all messages into a Storage account AppendLog; for that I would need to delete the messages I have already processed (as would be done with a message queue). How do I do that with Event Hubs?
You cannot delete them. From the docs:
Event Hubs retains data for a configured retention time that applies across all partitions in the event hub. Events expire on a time basis; you cannot explicitly delete them.
Back to your question:
Say I want to write content of all messages into Storage account AppendLog. For this I need to delete messages which I already processed
I am not sure why you need this, though. You can keep a pointer to the last read message so that you process only new messages. Why would you need to delete the older ones? You can read about offsets and checkpointing here.
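The "keep a pointer" idea can be sketched without any SDK: track the last processed offset per partition and skip anything at or below it. The offsets and the AppendLog write below are placeholders:

```javascript
// Sketch of checkpointing: Event Hubs never deletes events on read, so the
// consumer tracks the last processed offset per partition and skips
// anything it has already handled. Offsets here are illustrative numbers.
const checkpoints = new Map(); // partitionId -> last processed offset

function processNewEvents(partitionId, events) {
  const last = checkpoints.get(partitionId) ?? -1;
  const fresh = events.filter(e => e.offset > last); // drop already-seen events
  for (const e of fresh) {
    // ...append e.body to the Storage-account AppendLog here...
    checkpoints.set(partitionId, e.offset);
  }
  return fresh;
}
```

Re-reading the same batch then yields nothing new, which is exactly why deleting processed events is unnecessary.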
What technique are you using for reading the messages?
If you need a pattern of popping messages, you need a Queue or Topic from Azure Service Bus.
There, when you ack a message, it is removed from the queue.
