I have a Hazelcast application that runs as a multi-node cluster and uses ITopics. I'm trying to understand whether, in order for things to be "cleaned up" properly when a node crashes, my application should detect the node crash and remove that node's registration IDs, or whether Hazelcast automatically takes care of that.
By "node crash" I mean that an app that is part of a Hazelcast cluster terminates ungracefully, without calling ITopic.removeMessageListener or HazelcastInstance.shutdown. This could be due to the app crashing or being killed or the host crashing.
Here's the long story, in case it helps. I don't know the internals of Hazelcast and couldn't find anything relevant in the documentation. However, I can think of a couple of ways this "automatic" cleanup could work:
1. On each node, Hazelcast keeps a list of all subscribers, both local and remote. When it detects that another node is unavailable, Hazelcast automatically removes that other node's listeners from the list of ITopic subscribers.
2. On each node, Hazelcast only keeps a list of local subscribers. When a publisher calls ITopic.publish, Hazelcast sends the message to all nodes. Upon receiving the message, Hazelcast on each node calls onMessage on all local subscribers.
Here's a sample scenario. Let's suppose I have a Hazelcast cluster with 2 nodes, A and B. Both node A and node B register listeners to the same ITopic via ITopic.addMessageListener.
Let's suppose that node B crashes without calling ITopic.removeMessageListener or HazelcastInstance.shutdown.
Eventually, Hazelcast on node A detects that node B is unavailable.
Now let's suppose that a publisher on node A calls ITopic.publish. Does Hazelcast on A still try to send the message to the subscriber on B? And let's suppose that after some time node B is restarted, and a publisher on A calls ITopic.publish. Does Hazelcast on A still try to send the message to the old subscriber on B?
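For reference, here is a minimal sketch of that scenario, assuming the Hazelcast 3.x API and an embedded member on each node (the topic name and listener body are illustrative):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ITopic;
import com.hazelcast.core.Message;
import com.hazelcast.core.MessageListener;

public class TopicScenario {
    public static void main(String[] args) {
        // Started on both node A and node B; they join the same cluster.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        ITopic<String> topic = hz.getTopic("sample-topic"); // illustrative name

        // Each node registers a listener; the returned registration ID is what
        // would normally be passed to removeMessageListener on graceful shutdown.
        String registrationId = topic.addMessageListener(new MessageListener<String>() {
            @Override
            public void onMessage(Message<String> message) {
                System.out.println("Received: " + message.getMessageObject());
            }
        });

        // A publisher on node A; the question is what happens to B's
        // registration if B dies without calling removeMessageListener.
        topic.publish("hello");
    }
}
```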
Thank you in advance.
Hazelcast will remove listeners for dead nodes automatically on death detection. If this doesn't happen (I guess there might be a reason for you to ask), it's a bug.
Hazelcast will also not try to send events to the dead node after it has been recognized as dead. That said, events sent in the absence of node B won't be redelivered when the node comes back. There is no correlation between the old, dead node B and the newly connected one.
Does that answer the question? :)
Related
I am running a Node.js Express app in pm2 cluster mode. Everything is working fine; however, I have noticed that incoming connections to my Express routes only ever hit the forked worker app instances and never the primary (master) process.
In the pm2 documentation (https://pm2.keymetrics.io/docs/usage/cluster-mode/) on cluster mode they say
Under the hood, this uses the Node.js cluster module
In the "how it works" section on the Node.js website (https://nodejs.org/api/cluster.html#cluster_how_it_works) it says
The cluster module supports two methods of distributing incoming connections. The first one (and the default one on all platforms except Windows) is the round-robin approach, where the primary process listens on a port, accepts new connections and distributes them across the workers in a round-robin fashion, with some built-in smarts to avoid overloading a worker process.
Does this mean the primary process will never actually handle any incoming requests? That can't be!! That would make the entire primary process a glorified load balancer and essentially dead weight, with a bunch of code and a full CPU never really getting used.
If the above IS accurate, does that mean that the primary process is a bottleneck for all incoming Express connections?
What am I understanding incorrectly, or doing wrong, such that the primary (master) process never actually handles any requests?
After I completely removed and reinstalled pm2 and then re-added all my Node apps in cluster mode via the CLI, the first instance (app 0) started receiving messages. I didn't change any code, so I'm not exactly sure what the issue was. Thank you to #JonePolvora for your time with comments that led me to troubleshoot more.
This is something that I haven't been able to find in the official documentation nor anywhere else yet; the situation I propose is basically this:
I have a cluster of N Vert.x instances of the same service, same codebase.
At some point in time I register an EventBus consumer C with an address A cluster-wide. I subscribe a completion handler to get notified when the registration completes on all nodes of the cluster.
Everything is working fine, but now I add a new node to the cluster.
My question is actually two-fold:
Will the C consumer be propagated to the new-joiner? That is, if I do an eventBus().publish(A, ...) from the new-joiner, will the handler get executed?
Will the completion handler be called again? (My guess is no, but just in case.)
When you add a new node to the cluster, the app will be started again on this node (if I understood the situation you described correctly).
So on the new node, you'll register an EventBus consumer for address A cluster-wide.
The new node will be aware of all registrations created previously on the cluster. The previous nodes will be aware of the new registration.
When you do eventBus().publish(A, ...) from the new-joiner, all nodes, including the new one, will invoke the consumer registered for this address.
On the new-joiner, the completion handler will be called when the registration has been persisted. There could be a (very small) delay before the new registration is visible from other nodes because the process is asynchronous.
The completion handler on previous nodes will not be invoked again (because the registration of the corresponding consumer already happened).
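A minimal sketch of that registration pattern, assuming Vert.x 3.x with a clustered event bus (address A comes from the question; the cluster manager configuration and everything else is illustrative):

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

public class ConsumerRegistration {
    public static void main(String[] args) {
        // Every node in the cluster runs this same code.
        Vertx.clusteredVertx(new VertxOptions(), res -> {
            if (res.failed()) {
                res.cause().printStackTrace();
                return;
            }
            Vertx vertx = res.result();

            // Register a consumer for address "A" cluster-wide.
            vertx.eventBus().<String>consumer("A", msg ->
                    System.out.println("Handled: " + msg.body()))
                // Completion handler: called once this node's registration
                // has been propagated to the cluster.
                .completionHandler(ar -> {
                    if (ar.succeeded()) {
                        System.out.println("Consumer registered cluster-wide");
                    }
                });

            // publish() delivers to every registered consumer on every node;
            // a new-joiner sees existing registrations once it is in the cluster.
            vertx.eventBus().publish("A", "hello");
        });
    }
}
```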
Scenario: " 2 nodes hazelcast cluster" The ReliableItopic with topic name sample_topic is registered and messages were consumed in 2nd node. Node 1 publishes the message to the ReliableItopic with topic name sample_topic. The two nodes were up and messages were published and received.
After sometime the 2nd node got separated(member removed in hazelcast logs) and when joined(members joined and size was 2 )back, the ReliableItopic message listener stopped working and messages were not consumed.
I'm facing this issue because of a Hazelcast split brain.
Hazelcast version: 3.11.2
So whenever a Hazelcast split brain happens, do we need to re-register the reliable ITopic message listener?
That's not split brain; that's a normal downscaling and then upscaling of the cluster. Whatever events were generated while the 2nd node was out will not be delivered. If you are using the same configuration as before the disconnection, then as soon as the node comes back in, it should start normal operations.
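For reference, this is roughly how such a listener is registered on startup, assuming the Hazelcast 3.x API (the topic name is taken from the question; the listener body is illustrative):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ITopic;
import com.hazelcast.core.Message;
import com.hazelcast.core.MessageListener;

public class ReliableTopicSetup {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // getReliableTopic returns an ITopic backed by a Ringbuffer.
        ITopic<String> topic = hz.getReliableTopic("sample_topic");

        // Register the listener; keep the registration ID if you ever need
        // to remove and re-register it explicitly.
        String registrationId = topic.addMessageListener(new MessageListener<String>() {
            @Override
            public void onMessage(Message<String> message) {
                System.out.println("Consumed: " + message.getMessageObject());
            }
        });
    }
}
```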
I have a Hazelcast cluster with multiple nodes, each consisting of identical instances of a "Daemon" server process. These daemons are Java applications with embedded Hazelcast caches as well as logic that forms the core of my platform. I need to distribute certain events on the platform to listeners across the cluster which can reside in any (or all) of the connected nodes. From my reading of the documentation it seems to me that if I attach an EntryEventListener to the maps on daemon startup then whenever the event happens in that map my callback will be called in every running instance of the daemon.
What I would like is for the callback to be called once (on any single node) across the cluster for an event. So if I have 10 nodes in the cluster, and each node registers an EntryEventListener on a map when it joins, I would like any single one of those listener instances (on any of the nodes) to be triggered when that event happens and not all of them... I don't care which node's listener handles the event, as long as it is only a single instance of the listener and not every registered listener. How can I do this?
I saw this old question which sounds like the same question, but I'm not certain and the answer doesn't make sense to me.
hazelcast entry listener on a multinode cluster
In the Hazelcast documentation there is this:
There is also another attribute called local, which is not shown in the above examples. It is also a boolean attribute that is optional, and if you set it to true, you can listen to the items on the local member. Its default value is false.
Does that "local" attribute mean that the event would be triggered only on the node that is the primary owner of the key?
Thanks,
Troy
Yes, setting local to true will make the listener fire events only if the member is the primary owner of the key. You can achieve what you want using local listeners.
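A minimal sketch of that approach, assuming an embedded member and the Hazelcast 3.x API (the map name and event handling are illustrative):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.map.listener.EntryAddedListener;

public class LocalListenerExample {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, String> map = hz.getMap("events"); // illustrative name

        // addLocalEntryListener fires only on the member that owns the key,
        // so an event is handled by exactly one node even if every node
        // registers this listener on startup.
        map.addLocalEntryListener((EntryAddedListener<String, String>) event ->
                System.out.println("Handled once, on the owner: " + event.getKey()));
    }
}
```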
There's an application stack consisting of:
2 embedded Hazelcast apps (app A);
2 apps using Hazelcast clients (app B).
App B needs to coordinate task execution among the nodes, so only one node executes a particular task.
With app A it's rather easy to implement by creating a gatekeeper as a library, which needs to be queried for a task execution permit. The gatekeeper would keep track of Hazelcast members in the cluster and assign the permit to only a single node. It would register a MembershipListener in order to track changes in the cluster.
However, app B, being a Hazelcast client, can't make use of such a gatekeeper, as clients can't access ClientService (via hazelcastInstance.getClientService()), and are thus unable to register a ClientListener (similar to MembershipListener, but for client nodes) to be notified of added or removed clients.
How could such a coordination gatekeeper be implemented for applications that join the cluster as Hazelcast clients?
You would probably have to use a listener on a member (take the oldest member in the cluster and update the listener when the "master" changes) and use an ITopic to inform other clients.
Can't think of another way right now.
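A rough sketch of that idea on the member side, assuming the Hazelcast 3.x API and treating the first member in the cluster's member set as the oldest (the topic name and the permit payload are illustrative):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.Member;
import com.hazelcast.core.MemberAttributeEvent;
import com.hazelcast.core.MembershipEvent;
import com.hazelcast.core.MembershipListener;

public class OldestMemberCoordinator {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        hz.getCluster().addMembershipListener(new MembershipListener() {
            @Override
            public void memberAdded(MembershipEvent event) {
                assignIfOldest(hz);
            }

            @Override
            public void memberRemoved(MembershipEvent event) {
                assignIfOldest(hz); // re-evaluate when the "master" may have changed
            }

            @Override
            public void memberAttributeChanged(MemberAttributeEvent event) {
                // not relevant for this sketch
            }
        });

        assignIfOldest(hz);
    }

    // Only the oldest member (first in the member set) publishes task permits,
    // so clients subscribed to the topic get a single, consistent assignment.
    private static void assignIfOldest(HazelcastInstance hz) {
        Member oldest = hz.getCluster().getMembers().iterator().next();
        if (oldest.localMember()) {
            hz.getTopic("task-permits").publish("permit-for-client-X"); // illustrative payload
        }
    }
}
```

The Hazelcast clients (app B) would subscribe to the same ITopic and act only when they receive a permit addressed to them.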