Azure ServiceBus QueueClient.ReceiveBatch issue - azure

I have Azure Service Bus set up with a number of queues, some of which are partitioned. While reading dead-letter messages from one of those queues I deferred some messages, did some massaging, and then tried to complete those deferred messages. And this is where the trouble came in. On the call to QueueClient.ReceiveBatch() I'm getting an InvalidOperationException with the following message:
ReceiveBatch of sequence numbers from different partitions is not supported
for an entity with partitioning enabled.
The inner exception contains the following justification:
BR0012ReceiveBatch of sequence numbers from different partitions is not
supported for an entity with partitioning enabled.
Here is the actual line of code that produces the error:
var deferredMessages = queueClient?.ReceiveBatch(lstSequenceNums);
where lstSequenceNums is of type List<long> and contains the sequence numbers of the deferred messages, and queueClient is of type QueueClient.
So how should this situation be handled? I don't quite understand why that exception is thrown in the first place. If this is expected behavior, how can I figure out the relation between a partition and a Service Bus message sequence number?
Any help would be greatly appreciated.

From the Azure Service Bus partitioning documentation, on how a single requested message is handled:
When a client wants to receive a message from a partitioned queue, or from a subscription to a partitioned topic, Service Bus queries all fragments for messages, then returns the first message that is obtained from any of the messaging stores to the receiver.
and
Each partitioned queue or topic consists of multiple fragments. Each fragment is stored in a different messaging store and handled by a different message broker.
I suspect that when you request a batch using sequence numbers from different partitions, Service Bus would be forced to query too many brokers, and the service doesn't allow that.
The SequenceNumber can help you determine which partition your messages are coming from: the top 16 bits are used to encode the partition ID. You could use that to break your batch into separate per-partition calls, as sketched below.
The documentation for ReceiveBatch(IEnumerable<Int64>) doesn't say anything about this. I suggest raising an issue with the team to clarify the documentation.
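If the top-16-bits assumption above holds, a minimal sketch of splitting the batch per partition might look like this (the bit shift is inferred from that claim, not a documented contract; it uses the same Microsoft.ServiceBus.Messaging QueueClient plus System.Linq):
//Group the deferred sequence numbers by their (assumed) partition ID, taken from
//the top 16 bits of the 64-bit sequence number, and issue one ReceiveBatch call
//per partition so no single call spans partitions.
var deferredMessages = new List<BrokeredMessage>();
foreach (var partitionGroup in lstSequenceNums.GroupBy(seq => seq >> 48))
{
    var batch = queueClient?.ReceiveBatch(partitionGroup.ToList());
    if (batch != null)
    {
        deferredMessages.AddRange(batch);
    }
}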

I made it work by reading the messages one by one (getting all the messages in a loop) and abandoning the batch-reading approach completely, because it's too fragile and issue-prone. It's the only option one has at the moment, imho, if there are partitioned queues on one's Azure Service Bus. Sean Feldman provided some insights (thank you again), but they don't offer any acceptable way of making ReceiveBatch a viable solution for the task at hand.
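For completeness, a rough sketch of that one-by-one approach, assuming the same Microsoft.ServiceBus.Messaging QueueClient; each deferred message is received individually by its sequence number, so partitioning never comes into play:
//Receive each deferred message individually by sequence number.
var deferredMessages = new List<BrokeredMessage>();
foreach (var sequenceNumber in lstSequenceNums)
{
    var message = queueClient?.Receive(sequenceNumber);
    if (message != null)
    {
        deferredMessages.Add(message);
    }
}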

Related

Event Hub -- how to prevent duplicate handling when consumers scale out

When we have multiple consumers of an Event Hub (or any messaging service, for that matter), how do we make sure that no message is processed twice, especially when a consumer auto-scales out to multiple instances?
I know we could keep track of the last message processed, but then again, between checking whether a message was processed and actually processing it, another instance could already have processed it (race condition?).
So, how can that be solved in a scalable way?
[UPDATE]
I am aware there is a recommendation to have at least as many partitions as there are consumers, but what should I do when a single consumer cannot process the messages directed to it and needs to scale out to multiple instances?
Each processor takes a lease on a partition; see the docs:
An event processor instance typically owns and processes events from one or more partitions. Ownership of partitions is evenly distributed among all the active event processor instances associated with an event hub and consumer group combination.
So scaling out doesn't result in duplicate message processing because a new processor cannot take a lease on a partition that is already being handled by another processor.
Then, regarding your comment:
i am aware there is a recommendation to have at least as many partitions as there are consumers
It is the other way around: it is recommended to have as many consumers as you have partitions. If you have more consumers than partitions the consumers will compete with each other to obtain a lock on a partition.
Now, regarding duplicate messages: since Event Hub guarantees at-least-once delivery, there isn't much you can do to prevent this. There aren't that many scalable services that offer at-most-once delivery; I know that Azure Service Bus queues do offer this if you really need it.
The question may arise what can cause duplicate message processing. Well, when processing messages the processor does some checkpointing: once in a while it stores its position within a partition's event sequence (remember, a partition is bound to a single processor). Now, when the processor instance crashes between two checkpoints, a new instance will resume processing messages from the position of the last checkpoint. That may very well lead to older messages being processed again.
If a reader disconnects from a partition, when it reconnects it begins reading at the checkpoint that was previously submitted by the last reader of that partition in that consumer group.
So that means you need to make sure your processing logic is idempotent. How you do that is up to you, as I don't know your use case.
One option is to track each individual message to see whether it has already been processed or not. If you do not have a unique ID to check on, maybe you can generate a hash of the whole message and compare against that.
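A minimal sketch of that idea, assuming a durable store behind a hypothetical IProcessedMessageStore interface (the interface, names, and hash fallback are illustrative, not a prescribed design):
//Requires System.Security.Cryptography, System.Text and System.Threading.Tasks.
//The store interface is hypothetical and would typically be backed by a
//database table keyed on the ID (or hash), shared by all consumer instances.
public interface IProcessedMessageStore
{
    //Returns false if the key was already recorded, i.e. the message was seen before.
    Task<bool> TryMarkProcessedAsync(string key);
}

public async Task HandleAsync(IProcessedMessageStore store, string messageId, byte[] body)
{
    //Prefer a real unique ID; fall back to a hash of the whole message body.
    string key = messageId;
    if (string.IsNullOrEmpty(key))
    {
        using (var sha = SHA256.Create())
        {
            key = Convert.ToBase64String(sha.ComputeHash(body));
        }
    }
    if (!await store.TryMarkProcessedAsync(key))
    {
        return; //already processed by this or another instance
    }
    //...actual processing goes here...
}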

Spark Streaming job fails with ReceiverDisconnectedException Class

I have a Spark Streaming job that captures near-real-time data from an Azure Event Hub and runs 24/7.
More interestingly, my job fails at least two times a day with the error below. If I google the error, the Microsoft docs tell me 'This exception is thrown if two or more PartitionReceiver instances connect to the same partition with different epoch values'. I'm not worried about data loss because Spark checkpointing will automatically take care of the data when I restart the job, but my question is why the Spark Streaming job fails 2-3 times a day with the same error.
Has anybody faced the same issue? Is there any solution/workaround available for this? Any help would be much appreciated.
error:
This exception is thrown if two or more PartitionReceiver instances connect to the same partition with different epoch values.
What is a PartitionReceiver?
This is a logical representation of receiving from an Event Hub partition.
A PartitionReceiver is tied to a ConsumerGroup + Partition combination. If you are creating an epoch based PartitionReceiver (i.e. PartitionReceiver.Epoch != 0) you cannot have more than one active receiver per ConsumerGroup + Partition combo. You can have multiple receivers per ConsumerGroup + Partition combination with non-epoch receivers.
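To illustrate that rule with the Microsoft.Azure.EventHubs .NET client (an assumption here; the Spark connector manages its own receivers internally), an epoch receiver evicts any lower-epoch receiver on the same ConsumerGroup + Partition combination, which is what shows up as ReceiverDisconnectedException on the evicted side:
//Placeholders: the connection string, consumer group and partition id are examples.
var client = EventHubClient.CreateFromConnectionString(eventHubConnectionString);

//Non-epoch receiver: several of these may coexist on one ConsumerGroup + Partition.
var plainReceiver = client.CreateReceiver("$Default", "0", EventPosition.FromStart());

//Epoch receiver (epoch = 1 here): only the receiver with the highest epoch survives
//for this ConsumerGroup + Partition; any lower-epoch receiver gets disconnected.
var epochReceiver = client.CreateEpochReceiver("$Default", "0", EventPosition.FromStart(), 1);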
It sounds like you are running two instances of the application, two concurrent classes, or two applications that use the same event hub consumer group. Event hub consumer groups are effectively pointers to a point in time on the event stream. If you try to use one consumer group with two instances of code, you get a conflict like the one you are seeing.
Either:
Ensure you only have a single instance reading the consumer group at a time.
Use two consumer groups when you need two separate programs or sets of functionality to process the event hub at the same time.
If you are looking to parallelize for performance, look in to event hub Partitioning and how to take advantage of processing each partition independently.
There is also an alternative scenario where an event hub partition is switched over to another host as part of the event hub's internal load balancing. You may see the same error in that case; just log it and continue on.
For more details, refer "Features and terminology in Azure Event Hubs" and "Event Hubs Receiver Epoch".
Hope this helps.

How does Azure Service Bus Queue guarantees at most once delivery?

According to this doc, Service Bus supports two modes: Receive-and-Delete and Peek-Lock.
When using Peek-Lock mode, if the consumer crashes/hangs/does a very long GC right after processing the message, but before the message is "Completed" and the visibility time expires, there's a chance that the same message is delivered twice.
Then how can Microsoft say that Service Bus supports an at-most-once delivery mode? Is it because of the Receive-and-Delete mode, which sends messages only once? But then again, if something happens while the consumer is processing the message, that valuable info is lost.
If yes, then what is the best way to ensure exactly-once delivery using Azure Service Bus as the queue and Azure Functions as consumers?
P.S. The one approach I can think of is storing MessageIDs in a blob, but since in my case the number of MessageIDs could be very large, storing and loading all of them is not the right approach.
Azure Functions will always consume Service Bus messages in Peek-Lock mode. Exactly-once delivery is basically not possible in the general case: there's always a chance that the consuming application will crash at the wrong time just before completing the message, and then the message will be re-delivered.
You should strive to implement Effectively Once processing. This is usually achieved with an idempotent message processor.
Storing MessageIDs (consumer-side de-duplication) is one option. You could have a policy to clean up old Message IDs to keep the size of such storage manageable. To make this 100% reliable, you would have to store the Message ID in the same transaction as the other modifications done by the processor.
Other options really depend on your processing scenario. Find a way to make it idempotent - so that processing the same message multiple times is functionally the same as processing it just once.
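A minimal sketch of the "Message ID in the same transaction" idea, assuming a SQL Server store via System.Data.SqlClient; the table, columns, business update, and the message/orderId variables are made up for illustration, and a unique index on ProcessedMessages.MessageId is what rejects a duplicate and rolls back the business change with it:
//message is the received Service Bus message; orderId comes from its body.
using (var connection = new SqlConnection(sqlConnectionString))
{
    await connection.OpenAsync();
    using (var transaction = connection.BeginTransaction())
    {
        var dedup = new SqlCommand(
            "INSERT INTO ProcessedMessages (MessageId) VALUES (@id)", connection, transaction);
        dedup.Parameters.AddWithValue("@id", message.MessageId);

        var work = new SqlCommand(
            "UPDATE Orders SET Status = @status WHERE OrderId = @orderId", connection, transaction);
        work.Parameters.AddWithValue("@status", "Processed");
        work.Parameters.AddWithValue("@orderId", orderId);

        try
        {
            await dedup.ExecuteNonQueryAsync();
            await work.ExecuteNonQueryAsync();
            transaction.Commit();
        }
        catch (SqlException)
        {
            //Duplicate MessageId: the unique index rejected the insert, so the whole
            //transaction rolls back and the message is treated as already processed.
            transaction.Rollback();
        }
    }
}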

what is best practice to consume messages from multiple kafka topics?

I need to consume messages from different Kafka topics.
Should I create a different consumer instance per topic and then start a new processing thread per partition,
or
should I subscribe to all topics from a single consumer instance and then start different processing threads?
Thanks & regards,
Megha
The only rule is that you have to account for what Kafka does and doesn't guarantee:
Kafka only guarantees message order for a single topic/partition. Edit: this also means you can get messages out of order if your single-topic Consumer switches partitions for some reason.
When you subscribe to multiple topics with a single Consumer, that Consumer is assigned a topic/partition pair for each requested topic.
That means the order of incoming messages for any one topic will be correct, but you cannot guarantee that ordering between topics will be chronological.
You also can't guarantee that you will get messages from any particular subscribed topic in any given period of time.
I recently had a bug because my application subscribed to many topics with a single Consumer. Each topic was a live feed of images at one image per message. Since all the topics always had new images, each poll() was only returning images from the first topic to register.
If processing all messages is important, you'll need to be certain that each Consumer can process messages from all of its subscribed topics faster than the messages are created. If it can't, you'll either need more Consumers committing reads in the same group, or you'll have to be OK with the fact that some messages may never be processed.
Obviously one Consumer/topic is the simplest, but it does add some overhead to have the additional Consumers. You'll have to determine whether that's important based on your needs.
The only way to correctly answer your question is to evaluate your application's specific requirements and capabilities, and build something that works within those and within Kafka's limitations.
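For reference, the "single consumer subscribed to all topics" option would look roughly like this with the Confluent.Kafka .NET client (an assumption, since the question doesn't name a client; the topic names, group id and servers are placeholders):
//One consumer, several topics. Each poll may return a record from any assigned
//topic/partition; ordering is only guaranteed within a single topic/partition.
var config = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",
    GroupId = "my-consumer-group",
    AutoOffsetReset = AutoOffsetReset.Earliest
};

using (var consumer = new ConsumerBuilder<Ignore, string>(config).Build())
{
    consumer.Subscribe(new[] { "topic-a", "topic-b" });
    while (true)
    {
        var result = consumer.Consume(TimeSpan.FromSeconds(1));
        if (result != null)
        {
            Console.WriteLine($"{result.Topic}/{result.Partition}@{result.Offset}: {result.Message.Value}");
        }
    }
}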
This really depends on the logic of your application - whether it needs to see all messages together in one place or not. Sometimes consumption from a single topic can be easier to implement in terms of the business logic of your application.

Azure event hubs and multiple consumer groups

Need help on using Azure event hubs in the following scenario. I think consumer groups might be the right option for this scenario, but I was not able to find a concrete example online.
Here is the rough description of the problem and the proposed solution using the event hubs (I am not sure if this is the optimal solution. Will appreciate your feedback)
I have multiple event sources that generate a lot of event data (telemetry data from sensors) which needs to be saved to our database, and some analysis, like running average and min-max, should be performed in parallel.
The sender can only send data to a single endpoint, but the event-hub should make this data available to both the data handlers.
I am thinking about using two consumer groups, first one will be a cluster of worker role instances that take care of saving the data to our key-value store and the second consumer group will be an analysis engine (likely to go with Azure Stream Analysis).
Firstly, how do I set up the consumer groups, and is there something I need to do on the sender/receiver side so that copies of events appear in all consumer groups?
I did read many examples online, but they either use client.GetDefaultConsumerGroup(); and/or have all partitions processed by multiple instances of the same worker role.
For my scenario, when an event is triggered, it needs to be processed by two different worker roles in parallel (one that saves the data and a second one that does some analysis).
Thank You!
TLDR: Looks reasonable, just make two Consumer Groups by using different names with CreateConsumerGroupIfNotExists.
Consumer Groups are primarily a concept so exactly how they work depends on how your subscribers are implemented. As you know, conceptually they are a group of subscribers working together so that each group receives all the messages and under ideal (won't happen) circumstances probably consumes each message once. This means that each Consumer Group will "have all partitions processed by multiple instances of the same worker role." You want this.
This can be implemented in different ways. Microsoft has provided two ways to consume messages from Event Hubs directly plus the option to use things like Streaming Analytics which are probably built on top of the two direct ways. The first way is the Event Hub Receiver, the second which is higher level is the Event Processor Host.
I have not used Event Hub Receiver directly so this particular comment is based on the theory of how these sorts of systems work and speculation from the documentation: While they are created from EventHubConsumerGroups this serves little purpose as these receivers do not coordinate with one another. If you use these you will need to (and can!) do all the coordination and committing of offsets yourself which has advantages in some scenarios such as writing the offset to a transactional DB in the same transaction as computed aggregates. Using these low level receivers, having different logical consumer groups using the same Azure consumer group probably shouldn't (normative not practical advice) be particularly problematic, but you should use different names in case it either does matter or you change to EventProcessorHosts.
Now onto more useful information, EventProcessorHosts are probably built on top of EventHubReceivers. They are a higher level thing and there is support to enable multiple machines to work together as a logical consumer group. Below I've included a lightly edited snippet from my code that makes an EventProcessorHost with a bunch of comments left in explaining some choices.
//We need an identifier for the lease. It must be unique across concurrently
//running instances of the program. There are three main options for this. The
//first is a static value from a config file. The second is the machine's NETBIOS
//name ie System.Environment.MachineName. The third is a random value unique per run which
//we have chosen here, if our VMs have very weak randomness bad things may happen.
string hostName = Guid.NewGuid().ToString();
//It's not clear if we want this here long term or if we prefer that the Consumer
//Groups be created out of band. Nor are there necessarily good tools to discover
//existing consumer groups.
NamespaceManager namespaceManager =
NamespaceManager.CreateFromConnectionString(eventHubConnectionString);
EventHubDescription ehd = namespaceManager.GetEventHub(eventHubPath);
namespaceManager.CreateConsumerGroupIfNotExists(ehd.Path, consumerGroupName);
host = new EventProcessorHost(hostName, eventHubPath, consumerGroupName,
eventHubConnectionString, storageConnectionString, leaseContainerName);
//Call something like this when you want it to start
host.RegisterEventProcessorFactoryAsync(factory);
You'll notice that I told Azure to make a new Consumer Group if it doesn't exist; you'll get a lovely error message if it doesn't. I honestly don't know what the purpose of this is, because it doesn't include the Storage connection string, which needs to be the same across instances in order for the EventProcessorHost's coordination (and presumably commits) to work properly.
Here I've provided a picture from Azure Storage Explorer of the leases, and presumably the offsets, from a Consumer Group I was experimenting with in November. Note that while I have a testhub and a testhub-testcg container, this is due to manually naming them. If they were in the same container it would be things like "$Default/0" vs "testcg/0".
As you can see there is one blob per partition. My assumption is that these blobs are used for two things. The first of these is the blob leases for distributing partitions amongst instances (see here); the second is storing the committed offsets within each partition.
Rather than the data getting pushed to the Consumer Groups the consuming instances are asking the storage system for data at some offset in one partition. EventProcessorHosts are a nice high level way of having a logical consumer group where each partition is only getting read by one consumer at a time, and where the progress the logical consumer group has made in each partition is not forgotten.
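To complete the picture around the snippet above, here is a minimal sketch of an IEventProcessor that the registered factory could produce, assuming the same Microsoft.ServiceBus.Messaging SDK; the save-to-store step and the checkpointing frequency are placeholders, not the answerer's actual code:
//Requires Microsoft.ServiceBus.Messaging, System.Text and System.Threading.Tasks.
class SaveEventsProcessor : IEventProcessor
{
    public Task OpenAsync(PartitionContext context)
    {
        //Called once this host has won the lease for context.Lease.PartitionId.
        return Task.FromResult(false);
    }

    public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        foreach (var eventData in messages)
        {
            var payload = Encoding.UTF8.GetString(eventData.GetBytes());
            //...save payload to the key-value store (placeholder)...
        }
        //Record progress so a restarted or rebalanced host resumes from here.
        await context.CheckpointAsync();
    }

    public async Task CloseAsync(PartitionContext context, CloseReason reason)
    {
        if (reason == CloseReason.Shutdown)
        {
            await context.CheckpointAsync();
        }
    }
}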
Remember that the throughput per partition is measured, so that if you're maxing out ingress you can only have two logical consumers that are all up to speed (a throughput unit allows roughly twice as much egress as ingress). As such you'll want to make sure you have enough partitions, and throughput units, that you can:
Read all the data you send.
Catch up within the 24 hour retention period if you fall behind for a few hours due to issues.
In conclusion: consumer groups are what you need. The examples you read that use a specific consumer group are good, within each logical consumer group use the same name for the Azure Consumer Group and have different logical consumer groups use different ones.
I haven't yet used Azure Stream Analytics, but at least during the preview release you are limited to the default consumer group. So don't use the default consumer group for something else, and if you need two separate lots of Azure Stream Analytics you may need to do something nasty. But it's easy to configure!
