When subscriptions are created and maintained by Crossbar, where are they stored? From a quick look through the source code, it appears they are all stored in local process memory. Is that right? What is the horizontal scale-out model if state is stored in memory? Are connections expected to be stuck to a given node? What happens if a connection breaks and is re-established, or if a server node goes offline? Do those connections lose all their state (subscription information)?
The scale-out model which Crossbar.io will implement (upcoming in 2015) is described here. On a Crossbar.io node, subscription state is transiently stored in process memory (of each router process) and synchronized across router processes. A given client is always connected to a single node. When it loses its connection, its subscriptions are gone. When a node goes down, the client will automatically reconnect - to another node in the cluster. The client will need to re-establish its subscriptions at the new node. Two clients connected to two different nodes (and the same realm), where both nodes are part of one cluster, will communicate transparently.
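To make that concrete, here is a minimal sketch of the reconnect-and-resubscribe pattern the client has to implement. The WampClient interface and its methods are hypothetical stand-ins for whatever WAMP client library you use; the point is only that the client keeps its own record of subscriptions and replays them after every (re)connect.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical WAMP client interface; real libraries (e.g. Autobahn) differ.
interface WampClient {
    void connect(String routerUrl, Runnable onOpen);
    void subscribe(String topic, Consumer<Object> handler);
}

// The client keeps its own record of desired subscriptions and replays them
// after every (re)connect, because the router's subscription state for this
// client is lost when the connection drops.
class ResubscribingClient {
    private final Map<String, Consumer<Object>> subscriptions = new ConcurrentHashMap<>();
    private final WampClient client;

    ResubscribingClient(WampClient client) { this.client = client; }

    void addSubscription(String topic, Consumer<Object> handler) {
        subscriptions.put(topic, handler);
    }

    void connect(String routerUrl) {
        client.connect(routerUrl, () ->
            // Re-establish all subscriptions at whichever node we are now attached to.
            subscriptions.forEach(client::subscribe));
    }
}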
Using worker_threads from Node 12, is it suitable to establish remote connections within the workers and keep those connections alive?
I don't mean sharing the socket between the master and the workers like we could do with node cluster and fork.
The idea would be to have pools of secure connections already established within the workers to use if needed.
Let's say I have a pool of 10 workers. When a worker is created, some pre-established "TLS" connections (streams) are created to servers X, Y, and Z, and the worker is marked as "ready".
Each time I use a worker to process a "heavy" task (mapReduce, etc.), if I need to post or get data to/from server X, Y, or Z during the process, I use the appropriate "TLS" connection already established from the pool.
Once the task is completed, the result is returned to the master and the worker just executes the next task.
1) Do you see any side effects / impacts of doing so?
2) Would it be better to have the pool of "TLS" connections on the "main thread" (master)? If "remote" data is needed within the workers during a task, they could use the "postMessage" method to communicate with the "master" (and vice versa).
Thanks
Worker Threads do not work for remote connections. However, you can build your own system that works similarly, using TLS sockets. In such a system I would definitely recommend keeping these types of connections alive: there is significant latency in setting up these connections, while keeping them active in memory uses a minimal amount of resources.
Keep in mind that a system like this has some drawbacks:
You are working with different machines, and each of these machines can have its own set of failure conditions.
You are communicating over a network, connections with remote servers might suddenly drop, for any reason imaginable.
You are increasing the physical distance, which will add latency.
So keep this in the back of your mind.
Would I recommend building a system like this? It is really hard to say; it depends on your use case, time, and money. You mentioned the cluster nodes are processing "heavy" tasks, by which I reckon you mean CPU/GPU-intensive tasks. So a system like this might be a good solution; however, a simple REST API in front of your processing servers might be good enough. Or maybe even database-synchronized servers that just check the database for tasks to execute.
There are many solutions to the same problem; you just have to consider what works best for your project(s).
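For illustration, here is a minimal sketch of the pattern under discussion: a fixed pool of pre-established TLS connections that worker tasks borrow and return, so no task pays the handshake cost on the hot path. It is written in Java to match the other examples in this thread (the same idea applies with Node's tls module inside a worker); host, port, and pool size are placeholders.

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Fixed pool of pre-established TLS connections. Workers borrow a socket,
// use it, and put it back, avoiding the TLS handshake during task execution.
public class TlsConnectionPool {
    private final BlockingQueue<SSLSocket> pool = new LinkedBlockingQueue<>();

    public TlsConnectionPool(String host, int port, int size) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        for (int i = 0; i < size; i++) {
            SSLSocket socket = (SSLSocket) factory.createSocket(host, port);
            socket.startHandshake(); // pay the handshake cost up front
            pool.put(socket);
        }
    }

    public SSLSocket borrow() throws InterruptedException { return pool.take(); }

    public void release(SSLSocket socket) { pool.offer(socket); }
}

Note that a real pool would also detect dropped sockets and re-establish them, which is exactly the failure handling described in the drawbacks above.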
According to this answer from an Azure Redis Cache team member, the Azure Redis Cache exposes a single endpoint. That endpoint is automatically routed to either the master or the slave node (on failover I assume). That answer also states that:
Azure... requires checks on the client side to ensure that the node is indeed Master or Slave
So clients see a single endpoint and sometimes have to check which instance they are talking to - that raises some questions:
When should a Redis client care whether it talks to the master or the slave node? Is it only to prevent inconsistency during failover, or are there other concerns here?
How (and when) should a client check whether it's connected to the master or the slave instance? Is it by running info replication?
From the docs:
When the master node is rebooted, Azure Redis Cache fails over to the replica node and promotes it to master. During this failover, there may be a short interval in which connections may fail to the cache.
My understanding is that you never connect to the slave, because it is never exposed to you. If the master goes down, the slave is promoted to master, and that is what you reconnect to.
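If you still want to verify the role on the client side (question 2 above), a minimal sketch with Jedis looks like this. The host and port are placeholders; an Azure cache additionally requires TLS and an access key, which are omitted here.

import redis.clients.jedis.Jedis;

// Ask the server for its replication section and check the reported role.
public class RedisRoleCheck {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("your-cache.redis.cache.windows.net", 6379)) {
            String replication = jedis.info("replication"); // contains e.g. "role:master"
            boolean isMaster = replication.contains("role:master");
            System.out.println(isMaster ? "Connected to the master" : "Connected to a replica");
        }
    }
}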
There is already a question on the difference between Hazelcast Instance and Hazelcast client.
And it is mentioned that
HazelcastInstance = HazelcastClient + AnotherFeatures
So is it right to say that the client just reads from and writes to the cluster without getting involved in it, i.e. the client does not store the data?
This is important to know, since we can configure JVM memory according to usage: the instances forming the cluster will be allocated more memory than the ones just connecting as clients.
It is a little bit more complicated than that. The Hazelcast Lite Member is a full-blown cluster member, except that it is not assigned any partitions. That means it doesn't store any data, but otherwise it behaves like a normal member.
Clients, on the other hand, are simple proxies that have to forward everything to a cluster member to get any operation done. You can imagine a Hazelcast client as something like a JDBC client: it has just enough code to connect to the cluster and redirect requests / retrieve responses.
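To make the distinction concrete, here is a minimal sketch of the three ways a JVM can join, assuming the Hazelcast 3.x API:

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class JoinModes {
    public static void main(String[] args) {
        // Full member: participates in the cluster and owns data partitions.
        HazelcastInstance member = Hazelcast.newHazelcastInstance();

        // Lite member: cluster member without partitions, so it stores no data.
        Config liteConfig = new Config();
        liteConfig.setLiteMember(true);
        HazelcastInstance lite = Hazelcast.newHazelcastInstance(liteConfig);

        // Client: connects to the cluster and forwards every operation to a member.
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
    }
}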
There's an application stack consisting of:
2 embedded Hazelcast apps (app A);
2 apps using Hazelcast clients (app B).
App B needs to coordinate task execution among the nodes, so only one node executes a particular task.
With app A it's rather easy to implement by creating a gatekeeper as a library which has to be queried for a task execution permit. The gatekeeper keeps track of Hazelcast members in the cluster and grants the permit to only a single node. It registers a MembershipListener in order to track changes in the cluster.
However, app B, being a Hazelcast client, can't make use of such a gatekeeper: clients can't access the ClientService (via hazelcastInstance.getClientService()), and thus are unable to register a ClientListener (similar to a MembershipListener, but for client nodes) to be notified of added or removed clients.
How could such coordination gatekeeper be implemented for applications that join the cluster as HazelcastClients?
You would probably have to use a listener on a member (take the oldest member in the cluster and update the listener when the "master" changes) and use an ITopic to inform other clients.
Can't think of another way right now.
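A rough sketch of that idea, assuming the Hazelcast 3.x API: the member that finds itself oldest in the cluster acts as the gatekeeper and publishes permits on an ITopic, which the clients subscribe to. The topic name and permit format are made up for illustration.

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.*;

public class GatekeeperSketch {
    // Member side: on every membership change, the member that is now the
    // oldest in the cluster (first in getMembers(), which is ordered by join
    // time) publishes which client may run the task.
    static void runOnMember() {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        ITopic<String> permits = hz.getTopic("task-permits");
        hz.getCluster().addMembershipListener(new MembershipAdapter() {
            @Override
            public void memberRemoved(MembershipEvent event) {
                if (hz.getCluster().getMembers().iterator().next().localMember()) {
                    permits.publish("permit-holder-id"); // illustrative payload
                }
            }
        });
    }

    // Client side: subscribe to the permit topic and execute the task only
    // when the permit names this client.
    static void runOnClient() {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
        client.<String>getTopic("task-permits")
              .addMessageListener(msg -> {
                  if ("my-client-id".equals(msg.getMessageObject())) {
                      // execute the task
                  }
              });
    }
}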
I am running an ExecutorService with more than 50 concurrent threads. Each thread opens a connection to Cassandra and performs inserts using springframework.data.cassandra. The problem is that when I open more than 50 connections at a time, I get the following error:
Caused by: org.jboss.netty.channel.ChannelException: Failed to create a selector.
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:343)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:100)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:52)
at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28)
at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:143)
at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:81)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:33)
at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:151)
at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:116)
at com.datastax.driver.core.Connection$Factory.<init>(Connection.java:532)
at com.datastax.driver.core.Cluster$Manager.<init>(Cluster.java:1201)
at com.datastax.driver.core.Cluster$Manager.<init>(Cluster.java:1144)
at com.datastax.driver.core.Cluster.<init>(Cluster.java:121)
at com.datastax.driver.core.Cluster.<init>(Cluster.java:108)
at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:177)
at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1109)
If I open exactly 50 threads (or fewer), it works fine. Is there a way to configure this so I can allow more? In my cassandra.yaml file, the comments for rpc_max_threads state that "The default is unlimited".
My guess is that you are overwhelming your OS by creating too many connections. You should only create one Cluster instance per Cassandra cluster. Clusters create Sessions, which manage their own connection pools. Both Cluster and Session are thread-safe, so you can share them between threads.
Four simple rules for coding with the driver distills these concepts well:
When writing code that uses the driver, there are four simple rules that you should follow that will also make your code efficient:
Use one cluster instance per (physical) cluster (per application lifetime)
Use at most one session instance per keyspace, or use a single Session and explicitly specify the keyspace in your queries
...
A Cluster instance allows you to configure different important aspects of the way connections and queries are handled. At this level you can configure everything from the contact points (addresses of the nodes contacted initially, before the driver performs node discovery) to the request routing policy, retry and reconnection policies, and so forth. Generally such settings are set once at the application level.
While the Session instance is centered around query execution, it also manages the per-node connection pools. The Session instance is a long-lived object and should not be used in a short-lived, request-response fashion. Your code should share the same Cluster and Session instances across the application.
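Applied to the original problem, the fix is to build one Cluster and one Session at startup and share them across all ExecutorService threads, instead of opening a connection per thread. A minimal sketch with the DataStax Java driver (contact point, keyspace, and table are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedSessionExample {
    public static void main(String[] args) throws InterruptedException {
        // One Cluster per physical cluster, one Session shared by all threads.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace"); // thread-safe, share it

        ExecutorService pool = Executors.newFixedThreadPool(50);
        for (int i = 0; i < 1000; i++) {
            final int id = i;
            // Every task reuses the shared Session and its connection pools.
            pool.submit(() -> session.execute("INSERT INTO events (id) VALUES (?)", id));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        cluster.close();
    }
}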