Inconsistent Send and SendBatch latency for event hub sender - azure

We are using EventHubSender.Send and EventHubSender.SendBatch API methods to send data packets on Azure Event Hub. Each data packet is typically of 6KB in size and there are 13 such data packets every second from 13 different client machines (Each machine sends one data packet every second). We have two event hubs in single namespace each with [6 KB * 13] packets as ingress and egress. Since the total ingress and egress is much lower than the capacity of one Throughput Unit, there is no throttling observed.
However, the send latency does not remain consistent for packets sent every second. Sometimes the send latency goes as high as 3 to 4 seconds. This behaviour was tested for on-premise client machines as well as client machines in Azure data-centre (only for testing purpose).
Client Initialization code snippet:
var factory = MessagingFactory.CreateFromConnectionString(EventHubConnectionString);
EventHubClient eventHubClient = this.factory.CreateEventHubClient(EventHubName);
this.eventHubSender = eventHubClient.CreatePartitionedSender(EventHubPartitionId);
The sender code snippet:
using (EventData eventData = CreateEventDataPacket(data, settings))
{
this.eventHubSender.Send(eventData);
}
Please note: The eventHubSender instance is being reused for every subsequent send request and the transport type used in EventHubConnectionString is AMQP.
Please suggest if the latency can be reduced and made consistent for both Send and SendBatch methods.

You are creating PartitionedSender which sends event to a specific partition. On the service side, partition might move from one backend to another when performing service upgrade, which will make partition unavailable for couple seconds and might contribute to the send latency. If you do not use partitionedSender, event hub client will send to each partition round-robin way which should mitigate the situation when partitions are moved.
Besides that, send latency is also subject to many factors such as network, load balancing, service availability etc.

Related

Is there any message receiving limit per device on Azure IoTHub?

Is there any message receiving limit per device on Azure IoTHub?
If any, can I remove or raise the upper limit without registering additional devices?
I tested 2 things to make sure if I can place enough load (ideally, 18000 message/s)on Azure IoT Hub in the future load tests.
① Send a certain amount of mqtt messages from a VM.
② Send a certain amount of mqtt messages from two VMs.
I expected that the traffic of ② would be twice as large as that of ①. But it wasn't. Maximum messages per minute on IoTHub of ② is not so different from that of ①. Both of them are around 3.6k [message/min]. At that time, I registered only one device on IoT Hub. So I added another device and tested ② again to see if the second device could increase the traffic. As a result, it increased the traffic and IoT Hub had bigger messages per minute.
Judging from this result, I thought IoTHub has some kind of limit on receiving message per device. But I am not sure. So if anyone know about the limit, could you tell me what kind of limit it is and how to raise the upper limit without registering additional devices because in production we use only one device.
For your information, I know there is a "unit" to increase the throughput in IoTHub. To increase the load I changed the number of unit from 2 to 20 in both ① and ②. However, it did not make messages/min in IotHub bigger. I'd also like to know why the "unit" did not work as expected.
Thank you for reading, in advance. Any comment would be my help.
Every basic (B1,B2, B3) or standard unit of IoT Hub SKU (S1, S2, S3) has specific daily message quota as per https://azure.microsoft.com/en-us/pricing/details/iot-hub/. A single IoTHub can support 1 million devices and there is no per device cost associated, only the msg/day quota as above.
e.g. S1 SKU has 400,000 msg/day quota and you can add multiple units of S1 to increase capacity. S2 has 6000,000 msg/day and S3 has 300,000,000 msg/day quota per unit and more units can be added.
Before this limit is reached IoTHub will raise alert which can be used to automatically add more units or jump to higher SKU.
Regarding your test, there are specific throttling limits to avoid misuse of the service here -
https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-quotas-throttling
As an example, for 18000 msg/sec you will need 3 units of S3 SKU (each with 6000 msg/sec rate limit). In addition there are other limits like how quickly connections can be attempted, if using Azure IoT SDK's the built-in retry logic helps overcome this otherwise you need to have retry policy. Basically you dont want million device trying to connect at the same time, IoTHub will only accept connections at a certain rate. This is not concurrent connection limit but a rate at which new connnections are accepted.

Azure Service Bus Send throughput .Net SDK

I am currently implementing a library to send the messages faster to the Service bus queue. What is observed is that, if I used the same ServiceBusClient and use the same sender to send the messages in Parallel.For, the throughput is not so high and my network upload speed is not fully utilized. The moment I make individual clients and use them to send, the throughput increases drastically and even utilizes my upload bandwidth very well.
Is my understanding correct or a single client-sender must do? Also, I am averse to create multiple clients as it will use a lot of resources to establish the client connection. Any articles that throw some light on this?
There is a throughput test tool and its code also creates multiple client.
protected override Task OnStartAsync()
{
for (int i = 0; i < this.Settings.SenderCount; i++)
{
this.senders.Add(Task.Run(SendTask));
}
return Task.WhenAll(senders);
}
async Task SendTask()
{
var client = new ServiceBusClient(this.Settings.ConnectionString);
ServiceBusSender sender = client.CreateSender(this.Settings.SendPath);
var payload = new byte[this.Settings.MessageSizeInBytes];
var semaphore = new DynamicSemaphoreSlim(this.Settings.MaxInflightSends.Value);
var done = new SemaphoreSlim(1);
done.Wait();
long totalSends = 0;
https://github.com/Azure-Samples/service-bus-dotnet-messaging-performance
Is there a library to manage the connections in a pool?
From the patterns in your code, I'm assuming that you're using the Azure.Messaging.ServiceBus package. If that isn't the case, please ignore the remainder of this post.
ServiceBusClient represents a single AMQP connection to the service. Any senders, receivers, and processors spawned from this client will share that connection. This gives your application the ability to control the number of connections used and pool them in the manner that works best in your context.
It is recommended to reuse clients, senders, receivers, and processors for the lifetime of your application; though the connection is shared, each time a new child type is spawned, it must establish a new AMQP link and perform the authorization handshake - which is non-trivial overhead.
These types are self-managing with respect to resources. For idle periods, connections and links will be closed to avoid waste, and they'll automatically be recreated for the first operation that requires them.
With respect to using multiple clients, senders, receivers, and processors - it is a valid approach and can yield better performance in some scenarios. The one caveat that I'll mention is that using more clients than the number of CPU cores in your host environment comes with an increased risk of causing contention in the thread pool. The Service Bus library is highly asynchronous, and its performance relies on continuations for async calls being scheduled in a timely manner.
Unfortunately, performance tuning is very difficult to generalize due to how much it varies for different application and hosting contexts. To find the right number of senders to maximize throughput for your application, we recommend that you spend time testing different values and observing the performance characteristics in your specific system.
For the new SDK, the same principle of connection management is applied i.e., re-creating connections is expensive.
You can connect client objects directly to the bus or by creating a ServiceBusConnection, can share a single connection between client
This is for the scenario to send as many messages as possible to a single queue then you can increase throughput by spinning up multiple ServiceBusConnection and client objects on separate threads.
Is there a library to manage the connections in a pool?
There’s no connection pooling happening under the hood and new connections are relatively expensive to create. With the previous SDK the advice was to re-use factories and clients where possible.
Refer this article for more information.

How to achieve high speed processing receiving from Azure Event Hub?

I am working on the POC for Azure Event hubs to implement the same into our application.
Quick Brief on flow.
Created tool to read the CSV data from local folder and send it to event hub.
We are sending Event Data in Batch to event hub.
With 12 instance of tool (Parallel), I can send a total of 600 000 lines of messages to Event hub within 1 min.
But, On receiver side, to receive the 600 000 lines of data, it takes more than 10 mins.
Need to achieve
I would like to Match/double my egress speed on the receiver to
process the data. Existing Configuration
The configuration I have made user are
TU - 10 One Event hub with 32 Partition.
Coding logic goes same as mentioned in MSDN
Only difference is, I am sending line of data in a batch.
EventProcessorhost with options {MaxBatchSize= 1000000,
PrefetchCount=1000000
To achieve higher egress rate (aka faster processing pipeline) in eventhubs:
Create a Scaled-out pipeline - each partition in EventHub is the unit-of-scale for processing events out of EventHub. With the Scale you described (6Lakh events per min --> 10K events per sec - with 32 partitions - you already got this right). Make sure you create as many partitions as you envision your pipeline need in near future. Imagine analyzing traffic on a Highway and no. of lanes is the only limitation for the amount of traffic.
Equal load distribution across partitions: if you are using SendToASpecificPartition or SendUsingPartitionKey - you will need to take care of equal load distribution. If you use EventHubClient.Send(EventDataWithOutPartitionKey) - EventHubs service will make sure all of your partitions are equally loaded. If a single EventHub Partition is heavily loaded - the amount of time you can process all events on EventHub will be bound by no. of events on this Partition.
Scale-out physical resources on the Receiver/EventProcessorHost: most importantly Network (Sockets & bandwidth) & after-a-point, CPU & Memory. Use PartitionManagerOptions.MaxReceiveClients to increase the maximum number of EventHubClients (which has a dedicated MessagingFactory, which maps to 1 socket) created per EventProcessorHost instance. By default it is 16.
Let me know how it went... :)
More on Event Hubs.

High throughput send to EventHubs resulting into MessagingException / TimeoutException / Server was unable to process the request errors

We are experiencing lots of these exceptions sending events to EventHubs during peak traffic:
"Failed to send event to EventHub. Exception : Microsoft.ServiceBus.Messaging.MessagingException: The server was unable to process the request; please retry the operation. If the problem persists, please contact your Service Bus administrator and provide the tracking id."
or
"Failed to send event to EventHub. Exception : System.TimeoutException: The operation did not complete within the allocated time "
You can see it clearly here:
As you can see, we got lots of Internal Errors, Server Busy Errors, Failed Request when Incoming messages are over 400K events/hour (or ~270 MB/hour). This is not just a transient issue. It's clearly related to throughput.
Our EH has 32 partitions, message retention of 7 days, and 5 throughput units assigned. OperationTimeout is set to 5 mins, and we are using the default RetryPolicy.
Is it anything we still need to tweak here? We are really concerned about the scalability of EH.
Thanks
Send throughput tuning can be achieved using efficient partition distribution strategies. There isn't any single knob which can do this. Below is the basic information you will need to be able to design for High-Thruput Scenarios.
1) Lets start from the Namespace: Throughput Units(aka TUs) are configured at Namespace level. Pls. bear in mind, that, TUs configured is applied - aggregate of all EventHubs under that Namespace. If you have 5 TUs on your Namespace and 5 eventhubs under it - it will be divided among all 5 eventhubs.
2) Now lets look at EventHub level: If the EventHub is allocated with 5 TUs and it has 32 partitions - No single partition can use all 5 TUs. For ex. if you are trying to send 5TU of data to 1 partition and 'Zero' to all other 31 partitions - this is not possible. Maximum you should plan per Partition is 1 TU. In general, you will need to ensure that the data is distributed evenly across all partitions. EventHubs support 3 types of sends - which gives users different level of control on Partition distribution:
EventHubClient.Send(EventDataWithoutPartitionKey) -> if you are using this API to send - eventhub will take care of evenly distributing the data across all partitions. EventHubs service gateway will round-robin the data to all partitions. When a specific partition is down - the Gateways auto-detect and ensure Clients doesn't see any impact. This is the most recommended way to Send to EventHubs.
EventHubClient.Send(EventDataWithPartitionKey) -> if you are using this API to send to EventHubs - the partitionKey will determine the distribution of your data. PartitionKey is used to Hash the EventData to the appropriate partition (algo. to hash is Microsoft Proprietary and not Shared). Typically users who require correlation of a group of messages will use this variant of Send.
EventHubSender.Send(EventData) -> In this variant, the Sender is already attached to the Partition. So - this gives complete control of Distribution across partitions to the Client.
To measure your present distribution of Data - use EventHubClient.GetPartitionRuntimeInfo Api to estimate which Partition is overloaded. The difference b/w BeginSequenceNumber and LastEnqueuedSequenceNumber is supposed to give an estimate of that partitions load compared to others.
3) Last but not the least - you can tune performance (not Throughput) at send operation level - using the SendBatch API.
1 TU can buy a Max of 1000 msgs/sec or 1MBPS - you will be throttled with whichever limit hits first - this cannot be changed.
If your messages are small - lets say 100 bytes and you can send only 1000 msgs/sec (as per the TU limit) - you will first hit the 1000 events/sec limit. However, overall using SendBatch API - you can batch lets say 10 of 100byte msgs and push at the same rate - 1000 msgs/sec with just 100 API calls and improve the end-to-end latency of the system (as it helps service also to persist messages efficiently). Remember, the only limitation here is the Max. Msg Size that can be sent - which is 256 kb (this limit will apply on your BatchSize if you use SendBatch API).
Given that background, in your case:
- Having 32 partitions and 5 TUs - I would really double-check the Partition distribution strategy.
here's some more general reading on Event Hubs...
After a lot of digging we decided to stop setting the PK for posted messages, and the issue simply went away!. We were using GUID as PK. We start to get very few erros on the Azure Portal, and no more exceptions. Hope this helps someone else

Cloud Architecture On Azure for Internet of Things

I'm working on a server architecture for sending/receiving messages from remote embedded devices, which will be hosted on Windows Azure. The front-facing servers are going to be maintaining persistent TCP connections with these devices, and I need a way to communicate with them on the backend.
Problem facts:
Devices: ~10,000
Frequency of messages device is sending up to servers: 1/min
Frequency of messages originating server side (e.g. from user actions, scheduled triggers, etc.): 100/day
Average size of message payload: 64 bytes
Upward communication
The devices send up messages very frequently (sensor readings). The constraints for that data are not very strong, due to the fact that we can aggregate/insert those sensor readings in a batched manner, and that they don't require in-order guarantees. I think the best way of handling them is to put them in a Storage Queue, and have a worker process poll the queue at intervals and dump that data. Of course, I'll have to be careful about making sure the worker process does this frequently enough so that the queue doesn't infinitely back up. The max batch size of Azure Storage Queues is 32, but I'm thinking of potentially pulling in more than that: something like publishing to the data store every 1,000 readings or 30 seconds, whichever comes first.
Downward communication
The server sends down updates and notifications much less frequently. This is a slightly harder problem, as I can see two viable paradigms here (with some blending in between). Could either:
Create a Service Bus Queue for each device (or one queue with thousands of subscriptions - limit is for number of queues is 10,000)
Have a state table housed in a DB that contains the latest "state" of a specific message type that the devices will get sent to them
With option 1, the application server simply enqueues a message in a fire-and-forget manner. On the front-end servers, however, there's quite a bit of things that have to happen. Concerns I can see include:
Monitoring 10k queues (or many subscriptions off of a queue - the
Azure SDK apparently reuses connections for subscriptions to the same
queue)
Connection Management
Should no longer monitor a queue if device disconnects.
Need to expire messages if device is disconnected for an extended period of time (so that queue isn't backed up)
Need to enable some type of "refresh" mechanism to update device's complete state when it goes back online
The good news is that service bus queues are durable, and with sessions can arrange messages to come in a FIFO manner.
With option 2, the DB would house a table that would maintain state for all of the devices. This table would be checked periodically by the front-facing servers (every few seconds or so) for state changes written to it by the application server. The front-facing servers would then dispatch to the devices. This removes the requirement for queueing of FIFO, the reasoning being that this message contains the latest state, and doesn't have to compete with other messages destined for the same device. The message is ephemeral: if it fails, then it will be resent when the device reconnects and requests to be refreshed, or at the next check interval of the front-facing server.
In this scenario, the need for queues seems to be removed, but the DB becomes the bottleneck here, and I fear it's not as scalable.
These are both viable approaches, and I feel this question is already becoming too large (although I can provide more descriptions if necessary). Just wanted to get a feel for what's possible, what's usually done, if there's something fundamental I'm missing, and what things in the cloud can I take advantage of to not reinvent the wheel.
If you can identify the device (may be device id/IMEI/Mac address) by the the message it sends then you can reduce the number of queues from 10,000 to 1 queue and not have 10000 subscriptions too. This could also help you in the downward communication as you will be able to identify the device and send the message to the appropriate socket.
As you mentioned the connections last longer you could deliver the command to the device that is connected and decide what to do with the commands to the device that are not connected.
Hope it helps

Resources