NodeJS with Redis message queue - How to set multiple consumers (threads)


I have a Node.js project that exposes a simple REST API for an external web application. This webhook must cope with a large number of requests per second and return 200 OK to the caller very quickly. To achieve that, I am investigating Redis Simple Message Queue (rsmq): each request would be enqueued and handled asynchronously later on (via a consumer thread).
rsmq seems like an easy way to achieve this (https://github.com/smrchy/rsmq).
1) Is rsmq.receiveMessage() { ....... } a blocking method? If this handler is slow, will it impact my server's performance?
2) If the answer to question 1 is yes, is it recommended to extract the consumption of the messages to an external microservice (a dedicated consumer)? What are the best practices for creating multi-threaded consumers in such an environment?

You can use the pub/sub feature provided by Redis: https://redis.io/topics/pubsub
You can publish to various channels without any knowledge of the subscribers. Subscribers can subscribe to the channels they wish.
sreeni

1) No, it won't block the event loop. However, you will only start processing a second message once you call the "next" method, i.e., you will process one message at a time. To overcome this, you can start multiple workers in parallel. Take a look here: https://stackoverflow.com/a/45984677/7201847
2) That's an architectural decision that depends on the load you have to support and the hardware capacity you have. I would recommend at least two Node.js processes, one for adding the messages to the queue and another one for actually processing them, with the option to start additional worker processes if needed, depending on the results of your performance tests.
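A minimal sketch of the "multiple workers" idea, written against a hypothetical async `receive` function rather than a real rsmq client. It assumes `receive` resolves to the next message, or to `null` when the queue is empty; with rsmq that function would wrap `receiveMessage` and `deleteMessage`. The names `worker` and `runWorkers` are illustrative only.

```javascript
// One worker: pulls messages one at a time until `receive` reports an empty queue.
// Assumption: `receive` resolves to a message, or to null when nothing is left.
async function worker(receive, handle) {
  let processed = 0;
  for (;;) {
    const message = await receive();
    if (message === null) return processed; // a long-running worker would wait and poll again
    await handle(message);
    processed += 1;
  }
}

// Starts `concurrency` workers on the same queue; resolves with per-worker counts.
function runWorkers(receive, handle, concurrency) {
  const workers = [];
  for (let i = 0; i < concurrency; i += 1) {
    workers.push(worker(receive, handle));
  }
  return Promise.all(workers);
}
```

Each worker still handles one message at a time, but a slow handler only delays its own worker. The same loop can run in a separate Node.js process when the consumers should not share the API server's event loop.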

Related

How splitting in spring integration works for web container?

I want to use Spring Integration for HTTP inbound message processing.
I know that a Spring Integration channel would run on a container thread, but if I want to use splits,
what threads would be used?
How would the result of the split be returned to the initial web request thread?

(Note: I am not 100% sure I understand your use case, but as a general remark:)
The Spring Integration splitter splits a message into multiple "smaller" messages. This is unrelated to multi-threading; that is, it does not by itself imply that the smaller messages are processed in parallel. It is still a sequential stream of smaller messages.
You can then process the smaller messages in parallel by defining a handler with a given parallelism, and you can specify that this handler uses a dedicated thread pool.
(Sorry if this does not answer your question, please clarify).

Node.js application acting as producer and consumer

I am working on an application that saves data into a database using a REST API. The basic flow is: REST API -> object -> save to database. I want to introduce a queue into the application, with the producer and the consumer both being part of this one application.
Is it possible for the Node.js application to act as both producer and consumer of the queue? Knowing that Node.js is single-threaded, do I have any other choice than creating two applications: one producing to the queue and a second one actively waiting for messages in the queue and saving them to the database?
Also, a requirement here is that, on restart, the application processes any item on the queue that hasn't been acknowledged. That also makes me think that the 'two applications' architecture is the best idea here.
Thank you for the help.

Yes, Node.js is able to do that and is well suited to I/O-intensive applications. The real question is what you are trying to achieve. Message queues are meant to let different applications communicate with each other; if all you need is an in-process event bus, a broker is total overkill. There are many easier and more efficient ways to propagate messages between decoupled components of the same Node.js app. One of them is EventEmitter, which lets your components collaborate in a pub/sub fashion.
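A minimal sketch of the EventEmitter approach, using only the built-in `events` module. The event name `record:received`, the `handleRequest` function and the `savedRecords` array (a stand-in for the database) are assumptions made for illustration.

```javascript
const { EventEmitter } = require('events');

// In-process bus shared by the REST handler (producer) and the persistence code (consumer).
const bus = new EventEmitter();

// Stand-in for the database; a real consumer would call the database driver here.
const savedRecords = [];

// Consumer side: reacts to every published record.
bus.on('record:received', (record) => {
  savedRecords.push(record);
});

// Producer side (hypothetical request handler): publishes the record and returns a status code.
function handleRequest(body) {
  bus.emit('record:received', body);
  return 202;
}
```

Note that `emit` calls its listeners synchronously and nothing is persisted, so events that were in flight are gone after a restart. If unacknowledged items must survive a restart, as the question requires, an external queue is the safer choice.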
If you are convinced that an AMQP broker is your solution, you just need to:
1) Define a "producer" class that publishes data on an exchange myExchange.
2) Define a "consumer" class that declares a queue myQueue.
3) Create a binding at application startup between myExchange and myQueue, based on some routing key.
Then, when the "consumer" receives a message, acknowledge it only after the database save has succeeded. Once a message is acked, the broker discards it, since it has been consumed. After an error, you can NACK the message so that it can be recovered.
There are Node.js libraries that make the code easier, such as Rascal.

Short answer: YES and use two separate connections for publishing and consuming
Is it possible for the NodeJS application to act as both producer and consumer of the queue?
I would even state that it is a good use case that matches the Node.js philosophy and threading model extremely well.
Knowing that Node.js is single-threaded language, does it give me any other choice instead of creating two applications - one producing to the queue and the second one - waiting actively for messages in a queue and saving to the database?
You can have one application handling both. Just be aware that if your client publishes faster than the server can handle, RabbitMQ can apply back pressure on the TCP connection, and consuming on a back-pressured TCP connection would greatly hurt consumer performance. That is why publishing and consuming should use separate connections.

Queue vs Non Blocking I/O

So, we're designing a new microservice architecture. One of the biggest challenges is internal communication. For communication in which a response is required, we're using REST APIs. But for services that just want to relay information, this API processing is unnecessary overhead.
One way is to use a queue. Service1 pushes the information into a queue, and service2 consumes it from there. Therefore service1 doesn't have to wait (unlike with an API call). (If there is an error in processing the information, service2 can inform service1 via a callback URL or in some other way; this is not a concern at this point [1].)
Now, with a queue, there are two options: one is RabbitMQ and the other is AWS SQS. With RabbitMQ I have to worry about server setup and everything that comes with it (which can be done, but I want to avoid it). After a POC, SQS seems like a good option, but SQS internally uses REST APIs to communicate with the AWS servers, so there will be overhead at both points (service1 when pushing, service2 when consuming). So now I'm thinking: why not do it in Node.js, where service1 hits service2 directly with the information? Service2 responds immediately, acknowledging that it has received the information; if there is an error, then [1].
The pros and cons I could summarise are:
RabbitMQ
  - Easy to implement
  - In case of unavailability of receiver, sender won't have to worry about retrying.
  - Server Setup Cost + Maintenance (+ Tuning)
SQS
  - Easiest to implement
  - Pricing
  - Constant Polling for Messages
  - Overhead at push/receive
Non-blocking APIs
  - No 3rd medium required for communication
  - Service1 has to manage retry mechanism
  - Relative to SQS, less overhead
  - Information will be in-memory until processed
So, to sum up, my question is: is it a good idea to go with non-blocking APIs? Or which approach would be better in terms of making the system scalable?
Edit -
Can a PubSub provider like PubNub or Pusher be used instead of a queue?

SQS uses XML over HTTP and RabbitMQ uses AMQP; all protocols have overhead, and serializing/deserializing has a cost. Both Amazon SQS and AMQP are very efficient. I would exclude these "overheads" from your calculations and instead focus on your other requirements.
One of the big advantages of using a queue is the handling of surge activity. If you get 100K hits, and need to send 100K messages, and you try to implement this as inter-service calls (non-blocking or otherwise), you will hit real limits on the scalability of your system (from a port count if nothing else). If you instead put 100K messages on a queue, those messages can be processed basically at the remote server's "leisure".
Additionally, as you have mentioned above, queues have a persistence that is much more difficult to implement on your own. If your data is not critical, this is not a big concern, but if the data is of higher importance, you really want something that pushes to a persistent store (like SQS, or Rabbit persistent queues)...

I am late here, but of late I have started working with non-blocking I/O (NIO) and I see a great benefit in it, especially when you are calling external services that cannot be given access to a message queue. Using a fixed connection pool ensures that the 100K problem is handled with non-blocking I/O and that too many connections are not created.
For calls to internal services a message queue is preferred, but if you do not have that option, you can use NIO with a retry mechanism and connection pooling to get the same scalability that message queues would give you. This assumes that the receivers are able to handle the load of the NIO calls.

How to design a scalable rpc call listener?

I have to listen for RPC calls, stack them somewhere, process them, and answer. The thing is that they are not run as soon as they arrive. The response is an ACK for each RPC call received.
The problem is that I want to design it so that many listening servers can write to the same stack of calls, piling them up as they come.
My objective is to listen to as many calls as possible. How should I achieve this?
My main technologies are Perl and Node.js, but I would use any open source software for this task.

It sounds like any kind of job queue will do what you need; I'm personally a big fan of using Redis for this kind of thing. Since Redis lists maintain insertion order, you can simply LPUSH your RPC call info onto the head of the list from any number of web servers listening for the RPC calls, and somewhere else (in another process or on another machine, I assume) RPOP (or BRPOP) it off the tail and process it, which gives you first-in, first-out order.
Since Node.js uses fully asynchronous IO, assuming you're not doing a lot of processing in your RPC listeners (that is, you're only listening for requests, sending an ACK, and pushing onto Redis), my guess is that Node would be exceedingly efficient at this.
An aside on using Redis for a queue: if you want to ensure that, in the event of a catastrophic failure, jobs are not lost, you'll need to implement a little more logic; from the RPOPLPUSH documentation:
Pattern: Reliable queue
Redis is often used as a messaging server to implement processing of background jobs or other kinds of messaging tasks. A simple form of queue is often obtained pushing values into a list in the producer side, and waiting for this values in the consumer side using RPOP (using polling), or BRPOP if the client is better served by a blocking operation.
However in this context the obtained queue is not reliable as messages can be lost, for example in the case there is a network problem or if the consumer crashes just after the message is received but it is still to process.
RPOPLPUSH (or BRPOPLPUSH for the blocking variant) offers a way to avoid this problem: the consumer fetches the message and at the same time pushes it into a processing list. It will use the LREM command in order to remove the message from the processing list once the message has been processed.
An additional client may monitor the processing list for items that remain there for too much time, and will push those timed out items into the queue again if needed.
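The reliable-queue pattern quoted above can be illustrated with a small in-memory model. This is only a sketch: the `lists` object stands in for two Redis lists, and `lpush`, `rpoplpush` and `lrem` are local functions that imitate the commands of the same name (they are not calls to a Redis client). In Redis 6.2 and later, LMOVE is the recommended replacement for RPOPLPUSH.

```javascript
// In-memory stand-ins for two Redis lists; index 0 is the head (left), the last index is the tail (right).
const lists = { queue: [], processing: [] };

// Imitates LPUSH: add a value at the head of a list.
function lpush(key, value) {
  lists[key].unshift(value);
}

// Imitates RPOPLPUSH: move the tail of `source` to the head of `destination`.
// Returns the moved value, or null when `source` is empty.
function rpoplpush(source, destination) {
  if (lists[source].length === 0) return null;
  const value = lists[source].pop();
  lists[destination].unshift(value);
  return value;
}

// Imitates LREM with count 1: remove the first occurrence of `value`, scanning from the head.
// Returns the number of removed elements (0 or 1).
function lrem(key, value) {
  const index = lists[key].indexOf(value);
  if (index === -1) return 0;
  lists[key].splice(index, 1);
  return 1;
}

// Consumer step: claim a job, process it, and only then remove it from the processing list.
// Returns false when there was nothing to consume.
function consumeOne(handler) {
  const job = rpoplpush('queue', 'processing');
  if (job === null) return false;
  handler(job); // if this throws, the job stays in `processing` for a monitor to requeue
  lrem('processing', job);
  return true;
}
```

If the handler throws (or the consumer crashes) between the claim and the LREM step, the job is still sitting in the processing list, which is what lets a monitoring client push it back onto the queue.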

How to design a service that processes messages arriving in a queue

I have a design question for a multi-threaded windows service that processes messages from multiple clients.
The rules are
Each message asks for something to be processed for an entity (with a unique id), and the operation can differ, e.g. DoA, DoB, DoC etc. The entity id is in the payload of the message.
The processing may take some time (up to few seconds).
Messages must be processed in the order they arrive for each entity (with same id).
Messages can however be processed for another entity concurrently (i.e as long as they are not the same entity id)
The number of messages processed concurrently is configurable (generally 8).
Messages cannot be lost. If there is an error in processing a message, then that message and all other messages for the same entity must be stored for manual processing later.
The messages arrive in a transactional MSMQ queue.
How would you design the service? I have a working solution but would like to know how others would tackle this.

The first thing to do is step back and think about how critical performance is for this application. Do you really need to process messages concurrently? Is it mission critical? Or do you just think that you need it? Have you run a profiler on your service to find the real bottlenecks of the process and optimized those?
The reason I ask is because you mention you want 8 concurrent processes. However, if you make this app single-threaded, it will greatly reduce the complexity and the development and testing time... And since you only want 8, it almost seems not worth it...
Secondly, since you can only process concurrent messages on the same entity, how often will you really get concurrent requests from your client to process the same entity? Is it worth adding so many layers of complexity for a use case that might not come up very often?
I would KISS. I'd use MSMQ via WCF and keep my WCF service as a singleton. Now you have the power and ordered reliability of MSMQ, and you are meeting your actual requirements. Then I'd test it at high load with realistic data, and run a profiler to find bottlenecks if I found it was too slow. Only then would I go through all the extra trouble of building a much more complex app to manage concurrency for only specific use cases...
One design to consider is creating a central 'gate keeper' or 'service bus' service that receives all the messages from the clients and then passes them down to the actual worker service(s). When it gets a request, it checks whether one of its workers is already processing a message for the same entity; if so, it sends the request to that same worker. This way you can process the same messages for a given entity concurrently and nothing more... And you get seamless scalability... However, I would only do this if I absolutely had to and it was proved out via profiling and testing, and not because 'we think we need it' (see the YAGNI principle :))

My approach would be the following:
Create a threadpool with your configurable number of threads.
Keep a map of entity ids and associate each id with a queue of messages.
When you receive a message place it in the queue of the corresponding entity id.
Each thread will only look at the entity id dedicated to it (e.g. make a class that is initialized as such Service(EntityID id)).
Let the thread only process messages from the queue of its dedicated entity id.
Once all the messages are processed for the given entity id remove the id from the map and exit the loop of the thread.
If there is room in the threadpool, then add a new thread to deal with the next available entity id.
You'll have to manage the messages that can't be processed at the time, including the situations where the message processing fails. Create a backlog of messages, etc.
If you have access to a concurrent map (a lock-free/wait-free map), then you can have multiple readers and writers to the map without the need for locking or waiting. If you can't get a concurrent map, then all the contention will be on the map: whenever you add messages to a queue in the map, or add new entity ids, you have to lock it. The best thing to do is wrap the map in a structure that offers methods for reading and writing with appropriate locking.
I don't think you will see any significant performance impact from locking, but if you do start seeing one I would suggest that you create your own lock-free hash map: http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
Implementing this system will not be a trivial task, so take my comments as a general guideline... it's up to the engineer to implement the ideas that apply.
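The bookkeeping behind this approach (a map from entity id to a queue of messages, at most one message in flight per entity, and a cap on total concurrency) can be sketched as a small scheduler. This is an illustrative model only: the class name `EntityDispatcher`, its methods and the `entityId` field are assumptions, the threads themselves are left out, and messages that arrive for an entity after a failure are not parked automatically.

```javascript
class EntityDispatcher {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.queues = new Map();   // entityId -> pending messages, in arrival order
    this.inFlight = new Set(); // entity ids that currently have a message being processed
    this.parked = [];          // messages set aside for manual processing
  }

  // Adds a message to the queue of its entity, creating the queue if needed.
  enqueue(message) {
    if (!this.queues.has(message.entityId)) this.queues.set(message.entityId, []);
    this.queues.get(message.entityId).push(message);
  }

  // Returns the next message that may start now, or null if none can
  // (concurrency limit reached, or every entity with pending work is busy).
  take() {
    if (this.inFlight.size >= this.maxConcurrent) return null;
    for (const [entityId, queue] of this.queues) {
      if (!this.inFlight.has(entityId) && queue.length > 0) {
        this.inFlight.add(entityId);
        return queue.shift();
      }
    }
    return null;
  }

  // Call after successful processing; frees the entity for its next message.
  complete(message) {
    this.inFlight.delete(message.entityId);
    const queue = this.queues.get(message.entityId);
    if (queue && queue.length === 0) this.queues.delete(message.entityId);
  }

  // Call after a failure; parks the failed message and everything queued behind it.
  fail(message) {
    const queue = this.queues.get(message.entityId) || [];
    this.parked.push(message, ...queue);
    this.queues.delete(message.entityId);
    this.inFlight.delete(message.entityId);
  }
}
```

A worker would call `take()`, process the returned message, and then call `complete()` or `fail()`. In a multi-threaded service these calls would need the locking (or the concurrent map) discussed above.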

While my requirements were different from yours, I did have to deal with the concurrent processing from a message queue. My solution was to have a service which would look at each incoming message and hand it off to an agent process to consume. The service has a setting which controls how many agents it can have running.

I would look at having n threads, each reading from its own thread-safe queue. I would then hash the EntityId to decide which queue to put an incoming message on.
Sometimes some threads will have nothing to do, but is this a problem if you have a few more threads than CPUs?
(Also, you may wish to group entities by type into the queues so as to reduce the number of locking conflicts in your database.)
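A minimal sketch of the hashing idea: every message for a given entity id lands on the same queue, so per-entity order is preserved as long as each queue is drained by a single consumer. The hash function, `createQueues`, `queueIndexFor`, `route` and the `entityId` field are all assumptions chosen for illustration; any stable hash would do.

```javascript
// Simple deterministic string hash (a djb2-style variant), kept in the unsigned 32-bit range.
function hashString(text) {
  let hash = 5381;
  for (let i = 0; i < text.length; i += 1) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Maps an entity id to a queue index in the range [0, queueCount).
function queueIndexFor(entityId, queueCount) {
  return hashString(String(entityId)) % queueCount;
}

// Creates `queueCount` empty queues; each one would be drained by its own thread or worker.
function createQueues(queueCount) {
  return Array.from({ length: queueCount }, () => []);
}

// Appends the message to the queue chosen by its entity id and returns that queue's index.
function route(queues, message) {
  const index = queueIndexFor(message.entityId, queues.length);
  queues[index].push(message);
  return index;
}
```

The trade-off is the one mentioned above: with a fixed mapping some queues may sit idle while others are busy, in exchange for very simple ordering guarantees.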
