How to automatically assign a worker for message processing?

How to automatically assign a worker for message processing? - node.js

After the master has forked the workers and now wants to start sending messages to the worker processes, is specifying a worker before sending a message the only way to pass the message? The documentation suggests so.
const worker = cluster.fork();
worker.send('hi there');
If yes, what is the scheduling policy all about? Is there a way where we could:
master.sendToWorker('Hi there!');
and it automatically selects the worker according to the default/configured algorithm?

The scheduling policy is for handling incoming connections. If you have 3 workers that are express applications, when a user connects, only one worker will handle the request. It will either be Round Robin, by default, or OS's choice. So that does not give you lots of flexibility.
Now, that does not help us on your request, which is to send messages from the master. The correct solution depends on the nature of the message you'd like to send.
If you are sending a message to make the worker start a task, messages might not be the best solution, you might like to use a job queue instead. But if you'd like to use messages anyways, your master could simply take note of available workers and arbitrarily send the message to a free one, removing it from the available workers until it reports to have finished.
You could simply use your round robin implementation, in one line of code it would look like this:
workersList[++messageCount%workersList.length].send("message");
If you wanted to use the native policy, you could have your workers listen on a specific port and have your master send a message to that port on localhost, it should work, but you'll have to implement your own messaging system...
IMO, if you want to send a message, you know who you want to send it to. If you want to send a message to a "random" recipient, it may be because a message might not be the appropriate way to communicate for that scenario.

Related

How to access to worker's queued requests?

I'm implementing a web server using nodejs which must serve a lot of concurrent requests. As nodejs processes the requests one by one, it keeps them in an internal queue (in libuv, I guess).
I also want to run my web server using cluster module, so there will be one requests queue per worker.
Questions:
If any worker dies, how can I retrieve its queued
requests?
How can I put retrieved requests into other workers' queues?
Is there any API to access to alive workers' requests queue?
By No. 3 I want to keep queued requests somewhere such as Redis (if possible), so in case of server crash, failure or even hardware restart I can retrieve them.

As you mentioned in the tags that you are-already-using/want-to-use redis, you can use queue-manager based on redis to do all the work for you.
Checkout https://github.com/OptimalBits/bull (or it's alternatives).
bull has a concept of queue. you add jobs to the queue and listen to the same queue from different processes/vms. bull will send the same job to only one listener and you have the ability to control how many jobs each listener is processing at the same time (concurrency-level).
In addition, if one of the jobs fails to run (in other words, the listener of the queue threw an error), bull will try to give the same job to different listener.

Netty multi threading per connection

I am new to netty. I would like to develop a server which aims at receiving requests from possibly few(say Max is of 2) clients. But each client will be sending many requests to server continuously. Server has to process such requests and respond to client. So, here I assume that even though if I configure multiple worker threds,it may not be useful as there are only 2 active connections. Worker thread again block till it process and respond to client. So, please let me know how to handle these type of problems.
If I use threadpoolexecutor in worker thread to process both clients requests in multi threaded manner, will it be efficient? Or if it cane achieved through netty framework, plz let me know how to do this?
Thanks in advance...

If I understand correctly: your clients (2) will send many messages, each of them implying an answear as quickly as possible from the server.
2 options can be seen:
The answear process is short time (short enough to not be an isssue for the rate you want to reach, meaning 1 thread is able to answear as fast as you need for 1 client): then you can stay with the standard threads from Netty (1 worker thread for 1 client at a time) set up in the server bootstrap. This is the shortest path.
The answear process is not short time enough (the rate will be terrible, for instance because there is a "long time" process, such as blocking call, database access, file writing, ...): then you can add a thread pool (a group) in the Netty pipeline for you ChannelHandler doing such blocking/long process.
Here is an extract of the API documentation taken from ChannelPipeline:
http://netty.io/4.0/api/io/netty/channel/ChannelPipeline.html
// Tell the pipeline to run MyBusinessLogicHandler's event handler methods
// in a different thread than an I/O thread so that the I/O thread is not blocked by
// a time-consuming task.
// If your business logic is fully asynchronous or finished very quickly, you don't
// need to specify a group.
pipeline.addLast(group, "handler", new MyBusinessLogicHandler());

just add a ChannelHandler with a special EventExecutorGroup to the ChannelPipeline. For example UnorderedThreadPoolEventExecutor (src).
something like this.
UnorderedThreadPoolEventExecutor executorGroup = ...;
pipeline.addLast(executorGroup, new MyChannelHandler());

Distributing topics between worker instances with minimum overlap

I'm working on a Twitter project, using their streaming API, built on Heroku with Node.js.
I have a collection of topics that my app needs to process, which are pulled from MongoDB. I need to track each of these topics via the API, however it needs to be done such that each topic is tracked only once. As each worker process expires after approximately 1 hour, when a worker receives SIGTERM it needs to untrack each topic assigned, and release it back to the pool again.
I've been using RabbitMQ to communicate between app and worker processes, however with this I'm a little stuck. Are there any good examples, or advice you can offer on the correct way to do this?

Couldn't the worker just send a message via the messagequeue to the application when it receives a SIGTERM? According to the heroku docs on shutdown the process is allowed a couple of seconds (10) before it will be forecefully killed.
So you can do something like this:
// listen for SIGTERM sent by heroku
process.on('SIGTERM', function () {
// - notify app that this worker is shutting down
messageQueue.sendSomeMessageAboutShuttingDown();
// - shutdown process (might need to wait for async completion
// of message delivery to not prevent it from being delivered)
process.exit()
});
Alternatively you could break up your work in much smaller chunks and have workers only 'take' work that will run for a couple of minutes or even seconds max. Your main application should be the bookkeeper and if a process doesn't complete its task within a specified time assume it has gone missing and make the task available for another process to handle. You can probably also implement this behavior using confirms in rabbitmq.

RabbitMQ won't do this for you.
It will allow you to distribute the work to another process and/or computer, but it won't provide the kind of mechanism you need to prevent more than one process / computer from working on a particular topic.
What you want is a semaphore - a way to control access to a particular "resource" from multiple processes... a way to ensure only one process is working on a particular resource at a given time. In your case the "resource" will be the topic... but it will still be the resource that you want to control access to.
FWIW, there has been discussion of using RabbitMQ to implement a distributed semaphore in the past:
https://www.rabbitmq.com/blog/2014/02/19/distributed-semaphores-with-rabbitmq/
https://aphyr.com/posts/315-call-me-maybe-rabbitmq
but the general consensus is that this is a bad idea. there are too many edge cases and scenarios in which RabbitMQ will fail to work as proper semaphore.
There are some node.js semaphore libraries available. I would recommend looking at them, and using one of them. Have a single process manage the semaphore and decide which other process can / cannot work on which topic.

Distributed pub/sub with single consumer per message type

I have no clue if it's better to ask this here, or over on Programmers.SE, so if I have this wrong, please migrate.
First, a bit about what I'm trying to implement. I have a node.js application that takes messages from one source (a socket.io client), and then does processing on the message, which might result in zero or more messages back out, either to the sender, or other clients within that group.
For the processing, I would like to essentially just shove the message into a queue, then it works its way through various message processors that might kick off their own items, and eventually, the bit running socket.io is informed "Hey, send this message back"
As a concrete example, say a user signs into the service, that sign in message is then placed in the queue, where the authorization processor gets it, does it's thing, then places a message back in the queue saying the client's been authorized. This goes back to the socket.io socket that is connected to the client, along with other clients that might be interested. It can also go to other subsystems that might want to do more processing on authorization (looking up user info, sending more info to the client based on their data, etc).
If I wanted strong coupling, this would be easy, but I tried that before, and it just goes to a mess of spaghetti code that's very fragile, and I would like to avoid that. Another wrench in the setup is this should be cluster-able, which is where the real problem comes in. There might be more than one, say, authorization processor running. But the authorization message should be processed only once.
So, in short, I'm looking for a pattern/technique that will allow me to, essentially, have multiple "groups" of subscribers for a message, and the message will be processed only once per group.
I thought about maybe having each instance of a processor generate a unique name that would be used as a list in Reids. This name would then be registered with some sort of dispatch handler, and placed into a set for that group of subscribers. Then when a message arrives, the dispatch pulls a random member out of that set, and places it into that list. While it seems like this would work, it seems somewhat over-complicated and fragile.
The core problem is I've never designed a system like this, so I'm not even sure the proper terms to use or look up. If anyone can point me in the right direction for this, I would be most appreciative.

I think what your describing is similar to https://www.getbridge.com/ service. I it but ended up writing my own based on zeromq, it allows you to register services, req -> <- rec and channels which are pub / sub workers.
As for the design, I used a client -> broker -> services & channels which are all plug and play using auto discovery, you have the services register their schema with the brokers who open a tcp connection so that brokers on other servers can communicate with that broker groups services. Then internal services and clients connect via unix sockets or ipc channels which ever is preferred.

I ended up wrapping around the redis publish/subscribe functions a bit to do this. Each type of message processor gets a "group name", and there can be multiple instances of the processor within that group (so multiple instances of the program can run for clustering).
When publishing a message, I generate an incremental ID, then store the message in a string key with that ID, then publish the message ID.
On the receiving end, the first thing the subscriber does is attempt to add the message ID it just got from the publisher into a set of received messages for that group with sadd. If sadd returns 0, the message has already been grabbed by another instance, and it just returns. If it returns 1, the full message is pulled out of the string key and sent to the listener.
Of course, this relies on redis being single threaded, which I imagine will continue to be the case.

What you might be looking for is an AMQP protocol implementation,where you can have queue get custom exchanges,and implement a pub-sub model.
RabbitMQ - a popular amqp protocol implementation with lots of libraries
it also has node.js library

How to avoid flooding a message queue?

I'm working on an application that is divided in a thin client and a server part, communicating over TCP. We frequently let the server make asynchronous calls (notifications) to the client to report state changes. This avoids that the server loses too much time waiting for an acknowledgement of the client. More importantly, it avoids deadlocks.
Such deadlocks can happen as follows. Suppose the server would send the state-changed-notification synchronously (please note that this is a somewhat constructed example). When the client handles the notification, the client needs to synchronously ask the server for information. However, the server cannot respond, because he is waiting for an answer to his question.
Now, this deadlock is avoided by sending the notification asynchronously, but this introduces another problem. When asynchronous calls are made more rapidly than they can be processed, the call queue keeps growing. If this situation is maintained long enough, the call queue will get totally full (flooded with messages). My question is: what can be done when that happens?
My problem can be summarized as follows. Do I really have to choose between sending notifications without blocking at the risk of flooding the message queue, or blocking when sending notifications at the risk of introducing a deadlock? Is there some trick to avoid flooding the message queue?
Note: To repeat, the server does not stall when sending notifications. They are sent asynchronously.
Note: In my example I used two communicating processes, but the same problem exists with two communicating threads.

If the server is sending informational messages to the client, which you yourself say are asynchronous, it should not have to wait for a reply from the client. If they are not informational, in other words they require an answer, I would say a server should never send such messages to a client, and their presence indicates a poor design.

If you have a constant congestion problem, there is little you can do other than gracefully fail and notify the client that no new messages can be posted; then it is up to the client to maintain a backlog of messages to be posted.
Introducing a priority queue and using message expiration/filtering could allow you to free up space in the queue, but that really just postpones the problem. If possible, you could also aggregate messages or ignore duplicate messages, but again the problem does not seem to be the queue itself. (Not to mention that the more complex queue logic could eat up valuable resources that would be better used actually processing messages.)
Depending on what the server side does, you could introduce result hashing for long computations, offload some types of messages to a dedicated device, check if the server waits unreasonably long for I/O operations, and a myriad of other techniques. Profile if possible, at least try to find out which message(s) causes congestion.
Oh, and the business solution: Compare cost of estimated development time to the cost of better hardware and conclude that you should just buy a more powerful server (or an additional one).

Depending on how important these messages are you might want to look into Message Expiration, or perhaps a Message Filter, though it sounds like your architecture may be incorrect.

I would rather fix the logic in the server side. The message queue should not stall waiting for the answer. Rather have a state machine which can also receive those info queries while it is waiting for the answer from the client.
Of course you can still flood your message queue, but with TCP you can handle it pretty easily.

The best way, I believe, would be to add another state to your client. This I borrowed from the SMPP protocol specs.
Add a congestion state to the client, whereby it always checks the queue length, assuming this is possible, and therefore once a certain threshold is attained, say 1000 unprocessed messages, the client sends the server a message indicating that it's congested and the server will be required to cease all messaging until it receives a notification indicating that the client is no longer congested.
Alternatively, on the server side, if there is a certain number of pending replies, the server could simply cease sending messages until the client replies a certain number of them.
These thresholds can be dynamically calculated or fixed, depending.....

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string