What is the different use cases of Java NIO transport connector vs PoolConnectionFactory in ActiveMQ. Both serves the pool of connections.I want to use thousand of clients connect to the broker and maintain a seperate queue for each client. Where is is use case for both of this in the scenario?
The NIO Transport connector is a server side incoming connection API that utilizes a selector based event loop to share the load of multiple active connections where normally on the normal transport connector a single thread is created per connection to process IO leading to higher thread counts when large numbers of connections are active.
The PooledConnectionFactory is a client side device that provides a pool of one or more open connections that can be used by application code to reduce the number of connection create / destroy events thereby leading to faster client side code in some cases and lower overhead on the remote broker as it would not need to process connection create / destroy events from an application whose model causes this sort of behavior. Depending on how you've coded your application or what API layering you have such as Camel or Spring etc a pool may or may not be of benefit.
The two things are not related and should not be equated with one another.
NIO transport uses on low level the selector which is much more performant then Pool connectionfactory.
It means it get notification if any new data is ready while Pool wait for each Connection. For your use case i would strongly suggest NIO Connector
Related
I am currently implementing a library to send the messages faster to the Service bus queue. What is observed is that, if I used the same ServiceBusClient and use the same sender to send the messages in Parallel.For, the throughput is not so high and my network upload speed is not fully utilized. The moment I make individual clients and use them to send, the throughput increases drastically and even utilizes my upload bandwidth very well.
Is my understanding correct or a single client-sender must do? Also, I am averse to create multiple clients as it will use a lot of resources to establish the client connection. Any articles that throw some light on this?
There is a throughput test tool and its code also creates multiple client.
protected override Task OnStartAsync()
{
for (int i = 0; i < this.Settings.SenderCount; i++)
{
this.senders.Add(Task.Run(SendTask));
}
return Task.WhenAll(senders);
}
async Task SendTask()
{
var client = new ServiceBusClient(this.Settings.ConnectionString);
ServiceBusSender sender = client.CreateSender(this.Settings.SendPath);
var payload = new byte[this.Settings.MessageSizeInBytes];
var semaphore = new DynamicSemaphoreSlim(this.Settings.MaxInflightSends.Value);
var done = new SemaphoreSlim(1);
done.Wait();
long totalSends = 0;
https://github.com/Azure-Samples/service-bus-dotnet-messaging-performance
Is there a library to manage the connections in a pool?
From the patterns in your code, I'm assuming that you're using the Azure.Messaging.ServiceBus package. If that isn't the case, please ignore the remainder of this post.
ServiceBusClient represents a single AMQP connection to the service. Any senders, receivers, and processors spawned from this client will share that connection. This gives your application the ability to control the number of connections used and pool them in the manner that works best in your context.
It is recommended to reuse clients, senders, receivers, and processors for the lifetime of your application; though the connection is shared, each time a new child type is spawned, it must establish a new AMQP link and perform the authorization handshake - which is non-trivial overhead.
These types are self-managing with respect to resources. For idle periods, connections and links will be closed to avoid waste, and they'll automatically be recreated for the first operation that requires them.
With respect to using multiple clients, senders, receivers, and processors - it is a valid approach and can yield better performance in some scenarios. The one caveat that I'll mention is that using more clients than the number of CPU cores in your host environment comes with an increased risk of causing contention in the thread pool. The Service Bus library is highly asynchronous, and its performance relies on continuations for async calls being scheduled in a timely manner.
Unfortunately, performance tuning is very difficult to generalize due to how much it varies for different application and hosting contexts. To find the right number of senders to maximize throughput for your application, we recommend that you spend time testing different values and observing the performance characteristics in your specific system.
For the new SDK, the same principle of connection management is applied i.e., re-creating connections is expensive.
You can connect client objects directly to the bus or by creating a ServiceBusConnection, can share a single connection between client
This is for the scenario to send as many messages as possible to a single queue then you can increase throughput by spinning up multiple ServiceBusConnection and client objects on separate threads.
Is there a library to manage the connections in a pool?
There’s no connection pooling happening under the hood and new connections are relatively expensive to create. With the previous SDK the advice was to re-use factories and clients where possible.
Refer this article for more information.
I am now working on the application saving data into the database using the REST API. The basic flow is: REST API -> object -> save to database. I wanted to introduce the queue to the application, having in mind the idea of the producer and consumer being a part of one, abovementioned application.
Is it possible for the Node.js application to act as both producer and consumer of the queue? Knowing that Node.js is single-threaded language, does it give me any other choice instead of creating two applications - one producing to the queue and the second one - waiting actively for messages in a queue and saving to the database?
Also, the requirement here would be for an application to process any item that hasn't been acknowledged on the queue on the restart. That also makes me think that the 'two applications' architecture is the best idea here.
Thank you for the help.
Yes, nodejs is able to do that and is well suited for every I/O intensive application use case. The point here is "what are you trying to achieve"? message queues are meant to make different applications communicate together, while if you need an in-process event bus is a total overkill. There are many easier and efficient ways to propagate messages between decoupled components of the same nodejs app; one of these way is EventEmitter that let your components collaborate in a pubsub fashion
If you are convinced that an AMQP broker is you solution, you just need to
Define a "producer" class that publishes data on an exchange myExchange
Define a "consumer" queue that declares a queue myQueue
Create a binding at application startup between myExchange and myQueue, based on some routing key. Then, when a message is received from "consumer" you need to acknowledge after db saving. When a message is acked, it will be destroyed since it's already been consumed. You can decide, after an error, to recover the message via NACK
There are nodejs libraries that make code easier, such as Rascal
Short answer: YES and use two separate connections for publishing and consuming
Is it possible for the NodeJS application to act as both producer and consumer of the queue?
I would even state that it is a good usecase matching extremely well with NodeJS philosophy and threading mechanism.
Knowing that Node.js is single-threaded language, does it give me any other choice instead of creating two applications - one producing to the queue and the second one - waiting actively for messages in a queue and saving to the database?
You can have one application handling both, just be aware that if your client is publish too fast for the server to handle, RabbitMQ can apply back pressure on the TCP connection, thus consuming on a back-pressured TCP connection would greatly affect consumer performance.
I'm writing a socket.io based server in Node.js (6.9.0). I am using the builtin cluster module to enable multiple processes. For now, there is only two process: a master and a worker. The master receives the connections and maintains an in-memory global data structure (which the worker can query via IPC). The worker process does the majority of work by handling each incoming connection.
I am finding a hanging condition that I cannot attribute to any internal failure when the server is stressed at 300 concurrent users. Under lower concurrency, I don't see the hanging condition.
I'm enabling all forms of debugging (using the debug module: socket.io:socket, socket.io:client as well as my own custom calls to debug).
The last activity I can see is in socket.io, however, the messages indicate that sockets are closing ("reason client namespace disconnect") due to their own "end of test" cycle. It just seems like incoming connections are not be serviced.
I'm using Artillery.io as the test client.
In the server application, I have handlers for uncaught exceptions and try-catch blocks around everything.
In a prior iteration, I also used cluster, but reversed the responsibilities so that the master process handled the connections (with the worker handling global data). That didn't exhibit the same failure. Not sure if something is wrong with the connection distribution. For that, I have also dumped internalMessage events to monitor the internal workings of cluster.
I am not using any other module for connection distribution or sticky sessions. As there is only a single process handling connections (at this time), it doesn't seem relevant.
I was able to remove the hanging condition by changing the cluster scheduling policy from Round Robin (SCHED_RR) to None, which is OS specific (SCHED_NONE). I can't tell whether this is due to a bug in connection distribution (or something else inherent in the scheduling policy), but this one change seems to prevent the hanging condition.
So, we're designing a new micro-service architecture. One of the biggest challenge is internal communication. For communication, in which response is required, we're using REST APIs. But for the services, which just wants to relay the information, this API processing is unnecessary overhead.
One way is to use Queue. The service1 will push the information into a queue, and service2 can consume from there. Therefore service1 don't have to wait (unlike an API call). (If there is any error in processing the information, service2 can either inform via a callback URL to service1, or any other way; this is not a concern at this point [1])
Now with Queue, there are two options, one is RabbitMQ. And another is AWS SQS. With RabbitMQ I've to worry about server-setup and everything (which can be done, but wants to avoid it). So after a POC of SQS, it seems like a good option, but the thing is SQS internally uses Rest APIs to communicate with AWS servers, at both point (service1 when pushing, service2 when consuming), there will be overhead. So now I'm thinking why not do it in NodeJS, service1 will hit the service2 with information. Service2 will respond immediately, acknowledging that it has received the information, if there is any error then [1].
Now Pros/Cons I could summarise is -
RabbitMQ
Easy to implement
In case of unavailability of receiver, sender won't have to worry about retrying.
Server Setup Cost + Maintenance (+ Tuning)
SQS
Easiest to implement
Pricing
Constant Polling for Messages
Overhead at push/receive
Non-blocking APIs
No 3rd medium required for communication
Service1 has to manage retry mechanism
Relative to SQS, less overhead
Information will be in-memory until processed
So to some up, my question is, is it a good idea to go with Non-blocking APIs? Or which one will be better approach, in terms of making system scalable.
Edit -
Can a PubSub provider like PubNub or Pusher can be used instead of Queue?
SQS uses XML over http, RabbitMQ uses AMQP, all protocols have overhead. Serializing/deserializing has a cost. Both the amazon SQS and AMQP are very efficient. I would exclude these "overheads" from your calculations, and instead focus on your other requirements.
One of the big advantages of using a queue is the handling of surge activity. If you get 100K hits, and need to send 100K messages, and you try to implement this as inter-service calls (non-blocking or otherwise), you will hit real limits on the scalability of your system (from a port count if nothing else). If you instead put 100K messages on a queue, those messages can be processed basically at the remote server's "leisure".
Additionally, as you have mentioned above, queues have a persistence that is much more difficult to implement on your own. If you data is not critical, this is not a big concern, but if this data is of higher importance, you really want something that pushes to a persistent store (Like SQS, or Rabbit persistent queues)...
I am late here but off late I have started working with NON Blocking I/O and see a great benefit of NIO especially when you are calling external services which cannot be given access to a message queue. Using a fixed connection pool will ensure that 100K problem is handled with non blocking I/O and too many connections are not created.
While calling internal services a message queue is prefered, but lets say you do not have that option, you can leverage NIO with a retry mechanism and connection pooling to given you the same scalability message queues would give. This is assuming that receivers are able to handle the load of NIO calls.
I've recently read a lot about best practices with JMS, Spring (and TIBCO EMS) around connections, sessions, consumers & producers
When working within the Spring world, the prevailing wisdom seems to be
for consuming/incoming flows - to use an AbstractMessageListenerContainer with a number of consumers/threads.
for producing/publishing flows - to use a CachingConnectionFactory underneath a JmsTemplate to maintain a single connection to the broker and then cache sessions and producers.
For producing/publishing, this is what my (largeish) server application is now doing, where previously it was creating a new connection/session/producer for every single message it was publishing (bad!) due to use of the raw connection factory under JmsTemplate. The old behaviour would sometimes lead to 1,000s of connections being created and closed on the broker in a short period of time in high peak periods and even hitting socket/file handle limits as a result.
However, when switching to this model I am having trouble understanding what the performance limitations/considerations are with the use of a single TCP connection to the broker. I understand that the JMS provider is expected to ensure it can be used in the multi-threaded way etc - but from a practical perspective
it's just a single TCP connection
the JMS provider to some degree needs to co-ordinate writes down the pipe so they don't end up an interleaved jumble, even if it has some chunking in its internal protocol
surely this involves some contention between threads/sessions using the single connection
with certain network semantics (high latency to broker? unstable throughput?) surely a single connection will not be ideal?
On the assumption that I'm somewhat on the right track
Am I off base here and misunderstanding how the underlying connections work and are shared by a JMS provider?
is any contention a problem mitigated by having more connections or does it just move the contention to the broker?
Does anyone have any practical experience of hitting such a limit they could share? Either with particular message or network throughput, or even caused by # of threads/sessions sharing a connection in parallel
Should one be concerned in a single-connection scenario about sessions that write very large messages blocking other sessions that write small messages?
Would appreciate any thoughts or pointers to more reading on the subject or experience even with other brokers.
When thinking about the bottleneck, keep in mind two facts:
TCP is a streaming protocol, almost all JMS providers use a TCP based protocol
lots of the actions from TIBCO EMS client to EMS server are in the form of request/reply. For example, when you publish a message / acknowledge a receive message / commit a transactional session, what's happening under the hood is that some TCP packets are sent out from client and the server will respond with some packets as well. Because of the nature of TCP streaming, those actions have to be serialised if they are initiated from the same connection -- otherwise say if from one thread you publish a message and in the exact same time from another thread you commit a session, the packets will be mixed on the wire and there is no way server can interpret the right message from the packets. [ Note: the synchronisation is done from the EMS client library level, hence user can feel free to share one connection with multiple threads/sessions/consumers/producers ]
My own experience is multiple connections always output perform single connection. In a lossy network situation, it is definitely a must to use multiple connections. Under best network condition, with multiple connections, a single client can nearly saturate the network bandwidth between client and server.
That said, it really depends on what is your clients' performance requirement, a single connection under good network can already provides good enough performance.
Even if you use one connection and 100 sessions it means finally you
are using 100threads, it is same as using 10connections* 10 sessions =
100threads.
You are good until you reach your system resource limits