Azure Service Bus send throughput with the .NET SDK

I am currently implementing a library to send messages to a Service Bus queue faster. What I observe is that if I use the same ServiceBusClient and the same sender to send messages in a Parallel.For, the throughput is not very high and my network upload bandwidth is not fully utilized. As soon as I create individual clients and use them to send, the throughput increases drastically and my upload bandwidth is utilized very well.
Is my understanding correct, or should a single client/sender suffice? Also, I am reluctant to create multiple clients, as establishing each client connection uses a lot of resources. Are there any articles that shed some light on this?
There is a throughput test tool (https://github.com/Azure-Samples/service-bus-dotnet-messaging-performance), and its code also creates multiple clients:
protected override Task OnStartAsync()
{
    for (int i = 0; i < this.Settings.SenderCount; i++)
    {
        // one independent send loop per configured sender
        this.senders.Add(Task.Run(SendTask));
    }
    return Task.WhenAll(senders);
}

async Task SendTask()
{
    // each send loop creates its own ServiceBusClient, and therefore its own AMQP connection
    var client = new ServiceBusClient(this.Settings.ConnectionString);
    ServiceBusSender sender = client.CreateSender(this.Settings.SendPath);
    var payload = new byte[this.Settings.MessageSizeInBytes];
    var semaphore = new DynamicSemaphoreSlim(this.Settings.MaxInflightSends.Value);
    var done = new SemaphoreSlim(1);
    done.Wait();
    long totalSends = 0;
    // ... (remainder of SendTask not shown)
}
Is there a library to manage the connections in a pool?

From the patterns in your code, I'm assuming that you're using the Azure.Messaging.ServiceBus package. If that isn't the case, please ignore the remainder of this post.
ServiceBusClient represents a single AMQP connection to the service. Any senders, receivers, and processors spawned from this client will share that connection. This gives your application the ability to control the number of connections used and pool them in the manner that works best in your context.
It is recommended to reuse clients, senders, receivers, and processors for the lifetime of your application. Though the connection is shared, each time a new child type is spawned it must establish a new AMQP link and perform the authorization handshake, which is non-trivial overhead.
These types are self-managing with respect to resources. For idle periods, connections and links will be closed to avoid waste, and they'll automatically be recreated for the first operation that requires them.
Using multiple clients, senders, receivers, and processors is a valid approach and can yield better performance in some scenarios. The one caveat I'll mention is that using more clients than the number of CPU cores in your host environment increases the risk of contention in the thread pool. The Service Bus library is highly asynchronous, and its performance relies on continuations for async calls being scheduled in a timely manner.
Unfortunately, performance tuning is very difficult to generalize due to how much it varies for different application and hosting contexts. To find the right number of senders to maximize throughput for your application, we recommend that you spend time testing different values and observing the performance characteristics in your specific system.
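Since the library does not pool connections for you, one way to act on this advice is to create a small, fixed set of clients/senders once and spread sends across them. The following is a minimal sketch in Java (the other language used on this page), not the Azure SDK: `Sender` is a hypothetical interface standing in for a client plus its sender, and the pool size is capped at the CPU core count to reflect the thread-pool caveat above. It assumes, as the answer describes, that each sender can be shared by concurrent callers, so senders are handed out round-robin rather than borrowed and returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical stand-in for a client/sender pair that owns its own connection (not an Azure SDK type). */
interface Sender {
    void send(byte[] payload);
}

/** Fixed set of long-lived senders, created once and handed out round-robin. */
final class RoundRobinSenderPool {
    private final List<Sender> senders;
    private final AtomicLong counter = new AtomicLong();

    RoundRobinSenderPool(int requested, Supplier<Sender> factory) {
        // Cap at the core count, per the thread-pool contention caveat.
        int cores = Runtime.getRuntime().availableProcessors();
        int size = Math.max(1, Math.min(requested, cores));
        List<Sender> created = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            created.add(factory.get()); // the expensive step: opens a connection
        }
        this.senders = List.copyOf(created);
    }

    int size() {
        return senders.size();
    }

    /** Thread-safe: concurrent callers rotate through the same senders. */
    Sender next() {
        long n = counter.getAndIncrement();
        return senders.get((int) Math.floorMod(n, (long) senders.size()));
    }

    public static void main(String[] args) {
        int[] ids = {0};
        RoundRobinSenderPool pool = new RoundRobinSenderPool(4, () -> {
            int id = ids[0]++; // hypothetical: pretend this opened connection #id
            return payload -> System.out.println("sender " + id + " sent " + payload.length + " bytes");
        });
        for (int i = 0; i < 8; i++) {
            pool.next().send(new byte[1024]);
        }
    }
}
```

The cap is only a starting point; the right number of senders is exactly what the answer recommends measuring in your own environment.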

The new SDK applies the same principle of connection management: re-creating connections is expensive.
You can connect client objects directly to the bus, or create a ServiceBusConnection and share that single connection between clients.
If your scenario is to send as many messages as possible to a single queue, you can increase throughput by spinning up multiple ServiceBusConnection and client objects on separate threads.
Is there a library to manage the connections in a pool?
There’s no connection pooling happening under the hood and new connections are relatively expensive to create. With the previous SDK the advice was to re-use factories and clients where possible.
Refer to this article for more information.

Related

Need suggestion on best EventLoopGroup configuration for particular case

I was going through multiple docs and examples about the correct configuration of EventLoopGroup but couldn't decide what's best for my use case.
I am using Netty 4.1.86.Final on linux.
Use case:
We have a web server that accepts 50-100k connections (both HTTP on :80 and HTTPS on :443) and handles 20-30k requests per second. It's server-to-server communication, where multiple servers keep connections open for a short duration (keep-alive of a few minutes to an hour) and then recycle them (close and open a new connection). Every request typically takes 50-100 ms to process. Processing involves parsing a JSON payload, running some complex business logic, and making some async network calls to internal servers (cache, DB, etc.).
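To put these numbers in perspective, Little's law (an addition here, not part of the original question) relates the average number of requests in flight $L$ to the arrival rate $\lambda$ and the time each request spends in the system $W$. With the stated 20-30k requests per second and 50-100 ms per request:

```latex
\begin{aligned}
L &= \lambda\,W \\
L_{\min} &= 20{,}000\ \mathrm{req/s} \times 0.05\ \mathrm{s} = 1{,}000 \\
L_{\max} &= 30{,}000\ \mathrm{req/s} \times 0.1\ \mathrm{s} = 3{,}000
\end{aligned}
```

If every in-flight request held a handler thread for its full duration, the handler pool would need on the order of 1,000 to 3,000 threads; because part of each request is spent in async network calls that do not hold a thread, the real requirement is likely lower.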
Question:
As far as I understand, there are a few options I can try:
1. Use the same EventLoopGroup for boss and worker, and use a separate EventExecutorGroup in the pipeline for my business logic, like: pipeline.addLast(handlerExecutorGroup, "handler", new MyHttpEndServer());
2. Use different EventLoopGroups for boss and worker (.group(ioThreadExecutors, workerThreadExecutors)) and use a very high number of threads for the worker EventLoopGroup (cpu * 8).
3. Use 3 different EventLoopGroups for boss, worker, and handler (cpu * 8 threads).
4. Use the same EventLoopGroup for boss and worker, and use SO_REUSEPORT to bind multiple times to the same port, so that instead of 1 I/O thread per port there are several I/O threads per port accepting connections, along with a separate handlerExecutorGroup in the pipeline for my business logic, as in option 1:
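    .group(ioThreadExecutors)
    // SO_REUSEPORT is applied by Netty's native epoll transport (EpollServerSocketChannel) on Linux
    .option(UnixChannelOption.SO_REUSEPORT, true)

// each bind() opens another listening socket on the same port;
// the Linux kernel then spreads incoming connections across those sockets
for (int i = 0; i < channelThreadCount; i++) {
    ChannelFuture f = httpServer.bind(port).sync();
    channelFutureSet.add(f);
}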
Which of these seems the best option for my use case? Or is there some other, better approach I should try?
Note: I load-tested all 4 options but didn't notice much difference.
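Option 1 boils down to keeping the I/O threads free by moving the slow 50-100 ms handler work onto a separate group of threads. The sketch below shows that idea with plain `java.util.concurrent` executors rather than Netty's `EventExecutorGroup`, so it runs on its own; the class name, thread names, pool sizes, and the 50 ms sleep standing in for JSON parsing and business logic are all assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: a small "io" pool only hands requests off; a "business" pool does the slow work. */
final class OffloadSketch {
    private final ExecutorService ioThreads;
    private final ExecutorService businessThreads;

    OffloadSketch(int ioCount, int businessCount) {
        this.ioThreads = namedPool(ioCount, "io");
        this.businessThreads = namedPool(businessCount, "business");
    }

    private static ExecutorService namedPool(int size, String name) {
        return Executors.newFixedThreadPool(size, runnable -> {
            Thread t = new Thread(runnable, name);
            t.setDaemon(true);
            return t;
        });
    }

    /** Entry point on the I/O side: decode cheaply, then hand the slow part to the business pool. */
    CompletableFuture<String> handle(String rawRequest) {
        return CompletableFuture
                .supplyAsync(rawRequest::trim, ioThreads)                       // cheap "decode" on an I/O thread
                .thenApplyAsync(OffloadSketch::businessLogic, businessThreads); // slow work never blocks I/O
    }

    /** Stand-in for JSON parsing + business rules (~50 ms); reports which thread ran it. */
    static String businessLogic(String request) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "handled:" + request + "@" + Thread.currentThread().getName();
    }

    void shutdown() {
        ioThreads.shutdown();
        businessThreads.shutdown();
    }

    public static void main(String[] args) {
        OffloadSketch server = new OffloadSketch(1, 8);
        System.out.println(server.handle(" req ").join());
        server.shutdown();
    }
}
```

With this split, the size of the business pool (rather than the I/O pool) is what tracks the number of requests in flight.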

ActiveMQ Java NIO transport connector vs PooledConnectionFactory

What are the different use cases of the Java NIO transport connector vs PooledConnectionFactory in ActiveMQ? Both seem to serve a pool of connections. I want thousands of clients to connect to the broker, with a separate queue for each client. What is the use case for each of these in this scenario?
The NIO transport connector is a server-side API for incoming connections. It uses a selector-based event loop to share the load of many active connections, whereas the normal transport connector creates a single thread per connection to process I/O, which leads to higher thread counts when large numbers of connections are active.
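To make "selector-based event loop" concrete, here is a minimal Java NIO sketch (not ActiveMQ code): a single thread registers several channels with one `Selector` and reads only from the ones that report data. `Pipe` channels stand in for client sockets so the example runs without a network, and `SelectorLoopSketch` is a hypothetical name; a thread-per-connection transport would instead park one blocked thread on each of these channels.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class SelectorLoopSketch {

    /** One thread, many channels: read from whichever channels are ready until expectedBytes have arrived. */
    static List<String> eventLoop(Selector selector, int expectedBytes) throws IOException {
        List<String> received = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.allocate(256);
        int total = 0;
        while (total < expectedBytes) {
            selector.select(); // blocks until at least one registered channel is readable
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove(); // selected keys must be removed by hand
                Pipe.SourceChannel channel = (Pipe.SourceChannel) key.channel();
                buf.clear();
                int n = channel.read(buf);
                if (n > 0) {
                    total += n;
                    received.add(key.attachment() + ":" + new String(buf.array(), 0, n, StandardCharsets.UTF_8));
                }
            }
        }
        return received;
    }

    /** Simulates `clients` connections with pipes, each sending "hi", all served by the calling thread. */
    static List<String> demo(int clients) {
        try (Selector selector = Selector.open()) {
            List<Pipe> pipes = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                Pipe pipe = Pipe.open();
                pipe.source().configureBlocking(false); // selectable channels must be non-blocking
                pipe.source().register(selector, SelectionKey.OP_READ, "client" + i);
                pipes.add(pipe);
            }
            for (Pipe pipe : pipes) {
                pipe.sink().write(ByteBuffer.wrap("hi".getBytes(StandardCharsets.UTF_8)));
            }
            List<String> received = eventLoop(selector, 2 * clients);
            for (Pipe pipe : pipes) {
                pipe.sink().close();
                pipe.source().close();
            }
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(3));
    }
}
```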
The PooledConnectionFactory is a client-side device that provides a pool of one or more open connections that application code can use to reduce the number of connection create/destroy events. This can make client-side code faster in some cases, and it lowers overhead on the remote broker, which no longer has to process connection create/destroy events from an application whose model causes that sort of behavior. Depending on how you've coded your application, or what API layering you use (such as Camel or Spring), a pool may or may not be of benefit.
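The client-side pooling idea can be sketched generically as borrow, use, and return over a fixed set of already-open connections. `SimplePool` below is a hypothetical class written for illustration, not ActiveMQ's `PooledConnectionFactory`; the connection type is a type parameter, and the factory stands for whatever opens a real connection in your code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical borrow/return pool: connections are opened once and reused instead of created per operation. */
final class SimplePool<C> {
    private final BlockingQueue<C> idle;
    private final AtomicInteger created = new AtomicInteger();

    SimplePool(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // the only place connections are created
            created.incrementAndGet();
        }
    }

    /** Borrows a connection (waiting if all are in use), runs the work, and always returns it. */
    <R> R withConnection(Function<C, R> work) {
        C conn;
        try {
            conn = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
        try {
            return work.apply(conn);
        } finally {
            idle.add(conn); // hand it back even if the work throws
        }
    }

    int createdCount() {
        return created.get();
    }

    public static void main(String[] args) {
        int[] opened = {0};
        SimplePool<String> pool = new SimplePool<>(2, () -> "connection-" + opened[0]++);
        for (int i = 0; i < 5; i++) {
            System.out.println(pool.withConnection(conn -> "used " + conn));
        }
        System.out.println("connections opened: " + pool.createdCount());
    }
}
```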
The two things are not related and should not be equated with one another.
At a low level, the NIO transport uses a selector, which is much more performant than PooledConnectionFactory.
This means it gets notified whenever new data is ready, while the pool waits on each connection. For your use case I would strongly suggest the NIO connector.

Sharing EventHub between Azure Fabric reliable actors

I have an application where I map devices from the physical world to Reliable Actors in Azure Service Fabric. Each time I receive a message from a device, I want to push a message to an event hub.
What I'm doing right now is creating/using/closing the EventHubClient object for each message.
This is very inefficient (it takes about 1500ms) but it solves an issue I had in the past where I was keeping the EventHubClient in memory. When I have a lot of devices, the underlying virtual machine can quickly run out of network connections.
I'm thinking about creating a new actor that would be responsible for pushing data to the EventHub (by keeping the EventHubClient alive). Because of the turn-based concurrency model of Reliable Actors, I'm not sure it's a good idea. If 10,000 devices push data "at the same time", each of their actors will block while pushing the message to the new actor that sends messages to the EventHub.
What is the recommended approach for this scenario?
One approach would be to create a stateless service that is responsible for pushing messages to the EventHub. Each time an Actor receives a message from the device (by the way, how are the devices communicating with the actors?), the Actor calls the stateless service. The stateless service in turn is responsible for creating, maintaining, and disposing of one EventHubClient per service instance. A Reliable Service does not introduce the same "overhead" for handling incoming messages that a Reliable Actor does. If it is important for your application that the messages reach the EventHub in strictly the same order they were produced in, you would have to do this with a Stateful Service and a Reliable Queue. (Note, on the other hand, that there is no guarantee that Actors finish handling incoming messages in the same order as they are produced.)
You could then fine-tune the solution by experimenting with the instance count (https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-availability-services) to make sure you have enough instances to handle the throughput of incoming messages. The number of instances needed is roughly determined by the number of nodes and cores per node, although other factors may also play a role.
Devices communicate with your Actors, the Actors in turn communicate with the Service (Stateless, or Stateful if you want to queue messages; see below), and each Service manages an EventHubClient that can push messages to the EventHub.
If your cluster cannot support a high enough instance count for this service (a little simplified: more instances = higher throughput), you may need to create it as a Stateful Service instead, put messages in a Reliable Queue in the Service, and then have the Service's RunAsync process the queue in order. This can take the pressure off during load peaks.
The Service Fabric Azure-Samples WordCount sample shows how to work with different partitions so that messages from Actors target different instances (or, really, partitions).
A general tip is not to try to use Actors for everything (for the right things they are great and reduce complexity a lot). The Reliable Services model supports many more scenarios and requirements and can complement your Actors, rather than forcing Actors to do something they are not really designed for.
You could use a pub/sub pattern here (use the BrokerService).
By decoupling event publishing from event processing, you don't need to worry about the turn based concurrency model.
Publishers:
The Actor sends out messages by simply publishing them to a BrokerService.
Subscribers:
Then you use one or more Stateless Services or (different) Actors as subscribers of the events.
They send the events to the EventHub at their own pace.
Event Hub Client:
Using this approach you'd have full control over the EventHubClient instance counts and lifetimes.
You could increase event processing power by simply adding more subscribers.
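A toy, in-process version of this publish/subscribe decoupling is sketched below. `ToyBroker` is a hypothetical stand-in for the BrokerService idea, not its API: publishers enqueue and return immediately, and each subscriber drains its own queue on its own thread, which is where a long-lived EventHubClient would live.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical in-memory broker: one queue and one worker thread per subscriber. */
final class ToyBroker implements AutoCloseable {
    private final List<BlockingQueue<String>> queues = new CopyOnWriteArrayList<>();
    private final ExecutorService workers = Executors.newCachedThreadPool();

    /** Registers a subscriber; `handler` stands in for "send to EventHub". */
    void subscribe(Consumer<String> handler) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queues.add(queue);
        workers.submit(() -> {
            try {
                while (true) {
                    handler.accept(queue.take()); // consumes at the subscriber's own pace
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // close() interrupts the worker to stop it
            }
        });
    }

    /** Never waits on subscribers: the event is just enqueued for each of them. */
    void publish(String event) {
        for (BlockingQueue<String> queue : queues) {
            queue.add(event);
        }
    }

    @Override
    public void close() {
        workers.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        try (ToyBroker broker = new ToyBroker()) {
            broker.subscribe(e -> System.out.println("forwarding to EventHub (pretend): " + e));
            broker.publish("device-42 temperature=21.5");
            Thread.sleep(100); // give the subscriber a moment before shutdown
        }
    }
}
```

Adding throughput then means adding subscribers, without touching the publishing actors.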
In my opinion, you should call the event hub directly from your actors, on a background thread with an internal in-memory queue. Aggregate messages and use SendBatch to improve performance.
The event hub is able to handle that load by itself.
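A minimal sketch of the in-memory queue plus background batching approach, in Java rather than C#: callers enqueue and return immediately, and a single background thread drains up to `maxBatch` messages at a time and passes each batch to `sendBatch`, a hypothetical callback standing in for the Event Hubs batch-send call. `BatchingSender`, the batch size, and the 50 ms poll interval are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical batching sender: enqueue is cheap; one background thread sends in batches. */
final class BatchingSender implements AutoCloseable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread worker;
    private volatile boolean running = true;

    BatchingSender(int maxBatch, Consumer<List<String>> sendBatch) {
        if (maxBatch < 1) throw new IllegalArgumentException("maxBatch must be >= 1");
        worker = new Thread(() -> {
            List<String> batch = new ArrayList<>(maxBatch);
            // Keep going until closed AND everything already queued has been sent.
            while (running || !queue.isEmpty()) {
                try {
                    String first = queue.poll(50, TimeUnit.MILLISECONDS); // wait briefly for work
                    if (first == null) continue;
                    batch.add(first);
                    queue.drainTo(batch, maxBatch - 1); // grab whatever else is already waiting
                    sendBatch.accept(new ArrayList<>(batch));
                    batch.clear();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "batching-sender");
        worker.start();
    }

    /** Called from the actor; never touches the network. */
    void enqueue(String message) {
        queue.add(message);
    }

    /** Stops the worker after it has flushed everything already enqueued. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try (BatchingSender sender = new BatchingSender(100,
                batch -> System.out.println("sending batch of " + batch.size()))) {
            for (int i = 0; i < 250; i++) {
                sender.enqueue("reading-" + i);
            }
        }
    }
}
```

Batch sizes adapt to load: under light traffic batches are small and latency stays low, and under bursts `drainTo` fills each batch up to the limit.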

Queue vs Non Blocking I/O

So, we're designing a new microservice architecture. One of the biggest challenges is internal communication. For communication where a response is required, we're using REST APIs. But for services that just want to relay information, this API processing is unnecessary overhead.
One way is to use a queue. Service1 pushes the information into a queue, and service2 consumes it from there, so service1 doesn't have to wait (unlike with an API call). (If there is any error in processing the information, service2 can inform service1 via a callback URL or some other way; this is not a concern at this point [1].)
With a queue, there are two options: RabbitMQ and AWS SQS. With RabbitMQ I have to worry about server setup and everything else (which can be done, but I want to avoid it). After a POC of SQS, it seems like a good option, but SQS internally uses REST APIs to communicate with the AWS servers, so there will be overhead at both points (service1 when pushing, service2 when consuming). So now I'm thinking: why not do it in Node.js? Service1 would call service2 with the information, and service2 would respond immediately, acknowledging that it has received it; if there is any error, then [1].
Here are the pros and cons as I see them:
RabbitMQ:
Easy to implement
If the receiver is unavailable, the sender doesn't have to worry about retrying.
Server setup cost + maintenance (+ tuning)
SQS:
Easiest to implement
Pricing
Constant polling for messages
Overhead at push/receive
Non-blocking APIs:
No third medium required for communication
Service1 has to manage the retry mechanism
Less overhead relative to SQS
Information will be in memory until processed
To sum up, my question is: is it a good idea to go with non-blocking APIs? Or which approach is better in terms of making the system scalable?
Edit:
Can a pub/sub provider like PubNub or Pusher be used instead of a queue?
SQS uses XML over HTTP and RabbitMQ uses AMQP; all protocols have overhead. Serializing/deserializing has a cost. Both Amazon SQS and AMQP are very efficient. I would exclude these "overheads" from your calculations and instead focus on your other requirements.
One of the big advantages of using a queue is the handling of surge activity. If you get 100K hits, and need to send 100K messages, and you try to implement this as inter-service calls (non-blocking or otherwise), you will hit real limits on the scalability of your system (from a port count if nothing else). If you instead put 100K messages on a queue, those messages can be processed basically at the remote server's "leisure".
Additionally, as you mentioned above, queues provide persistence that is much more difficult to implement on your own. If your data is not critical, this is not a big concern, but if it is of higher importance, you really want something that pushes to a persistent store (like SQS, or RabbitMQ persistent queues).
I'm late here, but lately I have started working with non-blocking I/O and see a great benefit of NIO, especially when calling external services that cannot be given access to a message queue. Using a fixed connection pool ensures that the 100K problem is handled with non-blocking I/O without creating too many connections.
For calling internal services a message queue is preferred, but if you do not have that option, you can leverage NIO with a retry mechanism and connection pooling to get the same scalability that message queues would give. This assumes the receivers are able to handle the load of the NIO calls.
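The "retry mechanism and connection pooling" combination can be sketched as follows: a `Semaphore` caps the number of concurrent calls (standing in for a fixed-size connection pool), and failed calls are retried with exponential backoff. `BoundedRetryingCaller` is a hypothetical helper and the limits are illustrative; a real client would also need to decide which failures are safe to retry.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

/** Hypothetical helper: bounded concurrency plus retry with exponential backoff. */
final class BoundedRetryingCaller {
    private final Semaphore permits;
    private final int maxAttempts;
    private final long baseDelayMillis;

    BoundedRetryingCaller(int maxInFlight, int maxAttempts, long baseDelayMillis) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.permits = new Semaphore(maxInFlight);
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
    }

    <T> T call(Callable<T> remoteCall) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            permits.acquire(); // wait for one of maxInFlight slots
            try {
                return remoteCall.call();
            } catch (Exception e) {
                last = e;
            } finally {
                permits.release(); // free the slot before backing off
            }
            if (attempt < maxAttempts - 1) {
                Thread.sleep(baseDelayMillis << attempt); // 1x, 2x, 4x, ... the base delay
            }
        }
        throw last; // all attempts failed: surface the last error
    }

    public static void main(String[] args) throws Exception {
        BoundedRetryingCaller caller = new BoundedRetryingCaller(10, 3, 100);
        int[] calls = {0};
        String reply = caller.call(() -> {
            if (++calls[0] < 2) throw new IllegalStateException("receiver busy");
            return "ack";
        });
        System.out.println(reply + " after " + calls[0] + " attempts");
    }
}
```

Unlike a queue, nothing here survives a crash of the sender: pending retries live only in memory, which matches the "in memory until processed" point in the question.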

When does a single JMS connection with multiple producing sessions start becoming a bottleneck?

I've recently read a lot about best practices with JMS, Spring (and TIBCO EMS) around connections, sessions, consumers and producers.
When working within the Spring world, the prevailing wisdom seems to be:
for consuming/incoming flows, use an AbstractMessageListenerContainer with a number of consumers/threads;
for producing/publishing flows, use a CachingConnectionFactory underneath a JmsTemplate to maintain a single connection to the broker and then cache sessions and producers.
For producing/publishing, this is what my (largeish) server application is now doing, where previously it was creating a new connection/session/producer for every single message it was publishing (bad!) due to use of the raw connection factory under JmsTemplate. The old behaviour would sometimes lead to 1,000s of connections being created and closed on the broker in a short period of time in high peak periods and even hitting socket/file handle limits as a result.
However, after switching to this model, I am having trouble understanding the performance limitations/considerations of using a single TCP connection to the broker. I understand that the JMS provider is expected to ensure the connection can be used from multiple threads, but from a practical perspective:
it's just a single TCP connection;
the JMS provider to some degree needs to coordinate writes down the pipe so they don't end up as an interleaved jumble, even if its internal protocol uses some chunking;
surely this involves some contention between the threads/sessions using the single connection;
with certain network semantics (high latency to the broker? unstable throughput?), surely a single connection will not be ideal?
Assuming I'm somewhat on the right track:
Am I off base here and misunderstanding how the underlying connections work and are shared by a JMS provider?
Is any contention mitigated by having more connections, or does that just move the contention to the broker?
Does anyone have practical experience of hitting such a limit that they could share, whether with particular message or network throughput, or caused by the number of threads/sessions sharing a connection in parallel?
In a single-connection scenario, should one be concerned about sessions that write very large messages blocking other sessions that write small messages?
I would appreciate any thoughts, pointers to further reading on the subject, or experience, even with other brokers.
When thinking about the bottleneck, keep in mind two facts:
TCP is a streaming protocol, and almost all JMS providers use a TCP-based protocol.
Many of the actions from the TIBCO EMS client to the EMS server are request/reply. For example, when you publish a message, acknowledge a received message, or commit a transacted session, under the hood the client sends some TCP packets and the server responds with packets of its own. Because TCP is a stream, those actions have to be serialized if they are initiated from the same connection; otherwise, if one thread publishes a message at the exact moment another thread commits a session, the packets would be mixed on the wire and the server would have no way to reconstruct the right messages. [Note: the synchronization is done at the EMS client library level, so users can freely share one connection across multiple threads/sessions/consumers/producers.]
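The serialization described above can be sketched as follows: several threads share one "connection" and write length-prefixed frames to it, and a lock makes each header-plus-body write atomic so frames never interleave. A `ByteArrayOutputStream` stands in for the TCP socket and `SharedConnection` is a hypothetical class, not EMS client code; the same lock is also exactly where threads contend, which is the bottleneck the question asks about.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shared connection: one byte stream, writes serialized by a single lock. */
final class SharedConnection {
    private final ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
    private final DataOutputStream out = new DataOutputStream(wire);
    private final Object writeLock = new Object();

    void sendFrame(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        synchronized (writeLock) { // header + body go out as one unit; other threads wait here
            try {
                out.writeInt(body.length);
                out.write(body);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Decodes everything written so far; would fail or garble if frames had interleaved. */
    List<String> decodeAll() {
        byte[] bytes;
        synchronized (writeLock) {
            bytes = wire.toByteArray();
        }
        List<String> frames = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            while (in.available() > 0) {
                byte[] body = new byte[in.readInt()];
                in.readFully(body);
                frames.add(new String(body, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return frames;
    }

    /** Many threads ("sessions") publishing over the one connection at the same time. */
    static List<String> runConcurrently(int threads, int framesPerThread) {
        SharedConnection conn = new SharedConnection();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                for (int i = 0; i < framesPerThread; i++) {
                    conn.sendFrame("t" + id + "-" + i);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return conn.decodeAll();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 3));
    }
}
```

With several connections, each would have its own lock, so threads on different connections no longer wait on each other; that is one plausible reason multiple connections can outperform a single one.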
My own experience is that multiple connections always outperform a single connection. In a lossy network, using multiple connections is definitely a must. Under the best network conditions, with multiple connections a single client can nearly saturate the network bandwidth between client and server.
That said, it really depends on your clients' performance requirements; under good network conditions, a single connection can already provide good enough performance.
Even if you use one connection and 100 sessions, in the end you are using 100 threads, which is the same as using 10 connections × 10 sessions = 100 threads.
You are fine until you reach your system's resource limits.
