Is each Cassandra node required to communicate with all other nodes? - cassandra

The cluster I want to deploy contains multiple datacenters. The problem is that not every datacenter can communicate directly with every other datacenter. Indirectly, though, they would be "connected" through a few datacenters that can reach the whole cluster.
Testing has shown that this doesn't work out of the box. So my question is: can you somehow make it work? Is there a way to use the fully connected datacenters as "intermediate" nodes for the more isolated ones?
Thanks in advance for your ideas.

I believe it's impossible in current versions of Cassandra. The gossip protocol requires every node to be able to communicate with every other node.
Every second a node tries to communicate with a few other nodes, and it doesn't remember which nodes it reached last time.
There's currently no way to restrict this communication to only some DCs.
Even if you could, the intermediate datacenters would become a single point of failure. And we don't like this. :)

Related

Multiple instances of Cassandra on each node in the cluster

Is it possible to have a Cassandra cluster where each server runs multiple instances of Cassandra (each instance being part of the same cluster)?
I'm aware that if there's a single server in the cluster, it's possible to run multiple instances of Cassandra on it, but is it also possible to have multiple such servers in the cluster? If yes, what would the configuration look like (listen address, ports, etc.)?
Even if it is possible, I understand that there might not be any performance benefit at all; I just want to know if it's theoretically possible.
Yes, it's possible, and such a setup is often used for testing, for example with CCM, although it creates multiple interfaces on loopback (127.0.0.2, ...). DataStax Enterprise also has a so-called Multi-Instance feature.
You need to configure your instances carefully, separating ports, etc. Nowadays, using Docker could be a simpler way to implement it.
But why do you need to do it? Unless you have a really beefy machine, with a lot of RAM and multiple SSDs, this won't bring you additional performance.
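As a rough illustration of the address and port separation mentioned above, here is a minimal Python sketch that lays out per-instance settings in the CCM style, with one loopback address per instance. listen_address and rpc_address are real cassandra.yaml settings; the helper name, the directory layout, and the JMX port numbering are assumptions made up for this example, not something CCM or DSE prescribes.

```python
from dataclasses import dataclass


@dataclass
class InstanceConfig:
    name: str
    listen_address: str  # cassandra.yaml: address used for node-to-node traffic
    rpc_address: str     # cassandra.yaml: address clients connect to
    data_dir: str        # hypothetical layout: one directory tree per instance
    jmx_port: int        # must differ per instance (numbering here is arbitrary)


def loopback_instances(count, base_dir="/var/lib/cassandra"):
    """Return one config per instance, each bound to its own loopback address."""
    configs = []
    for i in range(1, count + 1):
        address = f"127.0.0.{i}"
        configs.append(
            InstanceConfig(
                name=f"node{i}",
                listen_address=address,
                rpc_address=address,
                data_dir=f"{base_dir}/node{i}",
                jmx_port=7199 + (i - 1),
            )
        )
    return configs


for config in loopback_instances(3):
    print(config)
```

Because every instance gets its own address, the storage and client ports can stay at their defaults; only the data directories and the JMX port need to differ. If the instances had to share one address instead, every port would need to be made unique.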
Yes, it is possible. I have even worked with 5 instances running on one server in a production cluster.
It is still running, but the recurring issues I had were constantly high GC, dropped mutations, and high latency, so of course this kind of setup is not a good idea.
But to answer your question: yes, it is possible, and it can be done in production too.

How do I determine the number of Node Types, Number of nodes and VM size in Service Fabric cluster for a relatively simple but high throughput API?

I have an ASP.NET Core 2.0 Web API with relatively simple logic (a simple select on a SQL Azure DB returning about 1000-2000 records; no joins, aggregates, functions, etc.). I have only one GET API, which is called from an Angular SPA. Both are deployed in Service Fabric as stateless services, hosted in Kestrel as self-hosting exes.
Considering the number of users and how often they refresh, I've determined there will be around 15000 requests per minute, in other words 250 req/sec.
I'm trying to understand the different settings when creating my Service Fabric cluster.
I want to know:
How many Node Types? (I've determined as Front-End, and Back-End)
How many nodes per node type?
What is the VM size I need to select?
I have read the Azure documentation on cluster capacity planning. While I understand the concepts, I don't have a frame of reference to determine the actual values I need to provide for the above questions.
Most places that discuss cluster planning will tell you that the subject is part science and part art, because there is no easy answer to this question. It's hard to answer because it depends a lot on the complexity of your application; without knowing the internals of how it works, we can only guess at a solution.
Based on your questions, the best guidance I can give you is: measure first, measure again, measure... plan later. Your application might be memory intensive, network intensive, CPU intensive, disk intensive, and so on; the only way to find the best configuration is to understand it.
To understand your application before you make any decision on the SF structure, you could deploy a simple cluster with multiple node types, each containing one node of a different VM size, and measure your application's behavior on each of them. Then you would add more nodes, spread multiple instances of your service across them, and see which configuration is the best fit for each service.
1. How many Node Types?
I like to map node types 1:1 to the roles in your application, but this is not a law; it depends on how many resources each service consumes. If a service consumes enough resources (memory, CPU, disk, IO) to keep a single VM (node) busy, it is a good candidate for its own node type. Other services are so lightweight that provisioning an entire VM (node) just for them would be a waste of resources; examples are scheduled jobs, backups, and so on. In this case you could provision a set of machines shared by these services. One important thing to keep in mind when multiple services share a node type is that they will compete for resources (memory, CPU, network, disk), so the performance measurements you took for each service in isolation might no longer hold and the services might require more resources. The way to find out is to test them together.
Another point is the number of replicas. A single instance of your service is not reliable, so you have to create replicas of it (I describe the right number in the next answer). You then end up with the service load split across multiple nodes, which can leave the node type underutilized; this is where you would consider placing several services on the same node type.
2. How many nodes per node type?
As stated before, it depends on your service's resource consumption, but a very basic rule is a minimum of 3 per node type.
Why 3?
Because 3 is the lowest number at which you can do a rolling update and still guarantee a quorum of 51% of nodes/services/instances running.
1 node: If your service runs 1 instance in a node type of 1 node, then when you deploy a new version you have to bring down this instance before the new one comes up, so no instance is available to serve the load while upgrading.
2 nodes: Similar to 1 node, but in this case only 1 node keeps running during the upgrade. If it fails, there is no failover to handle the load until the new instance comes up. It is worse if you are running a stateful service, because you have only one copy of your data during the upgrade and a failure might lose data.
3 nodes: During an update you still have 2 nodes available. When the node being updated comes back, the next one is taken down and you still have 2 nodes running. If one node fails, the other can support the load until a new node is deployed.
3 nodes does not mean that your cluster will be highly reliable; it means the chances of failure and data loss are lower, but you might be unlucky and lose 2 nodes at the same time. As suggested in the docs, in production it is better to always keep the number of nodes at 5 or more and plan to have a quorum of 51% of nodes/services available. I would recommend 5, 7 or 9 nodes in cases where you really need higher uptime (99.9999...%).
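The 1, 2 and 3 node cases above come down to simple majority arithmetic. A minimal Python sketch, assuming "quorum" means a strict majority of the nodes in the node type; the function names are made up for illustration and are not a Service Fabric API.

```python
def majority(node_count):
    """Smallest number of nodes that is more than half of node_count."""
    return node_count // 2 + 1


def keeps_quorum_during_upgrade(node_count, extra_failures=0):
    """True if a majority is still up while one node is being upgraded.

    extra_failures is the number of nodes that fail unexpectedly
    on top of the one taken down by the rolling upgrade.
    """
    available = node_count - 1 - extra_failures
    return available >= majority(node_count)


for n in (1, 2, 3, 5, 7, 9):
    print(
        n,
        "upgrade only:", keeps_quorum_during_upgrade(n),
        "upgrade + 1 failure:", keeps_quorum_during_upgrade(n, extra_failures=1),
    )
```

Under this assumption, 3 is the first size that keeps a majority during an upgrade, and 5 is the first size that also survives one unexpected failure at the same time.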
3. What is the VM size I need to select?
As said before, only measurements will give this answer.
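Once you have measured how many requests per second one node of a given VM size sustains, the node count can be estimated with simple arithmetic. A minimal sketch, using the 15000 requests per minute from the question; the per-node throughput, the 50% headroom, and the function name are hypothetical placeholders, not measured values or Service Fabric guidance.

```python
import math


def nodes_needed(target_rps, measured_rps_per_node, headroom=0.5, min_nodes=3):
    """Estimate the node count from a measured per-node throughput."""
    # Plan to use only part of each node so that spikes have room.
    usable_rps = measured_rps_per_node * (1.0 - headroom)
    # One extra node covers the node that is down during a rolling upgrade.
    needed = math.ceil(target_rps / usable_rps) + 1
    # Never go below the basic minimum of 3 nodes per node type.
    return max(needed, min_nodes)


target_rps = 15000 / 60  # 250 requests per second, from the question
# 200 req/sec per node is a made-up figure; replace it with your own measurement.
print(nodes_needed(target_rps, measured_rps_per_node=200))
```

Repeating the calculation for each VM size you measured lets you compare a few large nodes against many small ones on cost.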
Observations:
These recommendations do not take into account the planning for primary node types. It is recommended to have at least 5 nodes on primary node types, since that is where the SF system services are placed. They are responsible for managing the cluster, so they must be highly reliable, otherwise you risk losing control of your cluster. If you plan to share these nodes with your application services, keep in mind that your services might impact them, so you always have to monitor them for any impact.

Whether replicationFactor=2 makes sense in SolrCloud?

We are building our SolrCloud servers and want to increase replicationFactor, but we don't want to set it to 3 because we have a lot of data.
So I am wondering whether it makes sense to set replicationFactor to 2, and what the impact would be. Would this cause problems for replica leader election, such as split brain?
Thanks
replicationFactor does not affect whether a split-brain situation arises. The cluster details are stored in ZooKeeper, so as long as you have a working ZooKeeper ensemble, Solr will not have this issue. This means you should make sure you have 2xF+1 ZooKeeper nodes (minimum 3), where F is the number of machine failures you want to tolerate.
From the ZooKeeper documentation:
For the ZooKeeper service to be active, there must be a majority of
non-failing machines that can communicate with each other.
To create a deployment that can tolerate the failure of F machines,
you should count on deploying 2xF+1 machines.
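The 2xF+1 rule from the quote can be worked through in a few lines of Python. This is only the arithmetic; the function names are made up for illustration.

```python
def ensemble_size(failures_to_tolerate):
    """Machines to deploy so that a majority survives F failures (2xF+1)."""
    return 2 * failures_to_tolerate + 1


def tolerated_failures(ensemble):
    """Failures an ensemble can absorb while a majority is still running."""
    return (ensemble - 1) // 2


for n in range(1, 8):
    print(f"{n} ZooKeeper nodes tolerate {tolerated_failures(n)} failure(s)")
```

The loop shows why odd sizes are preferred: going from 3 to 4 nodes does not increase the number of failures the ensemble can tolerate.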
Here are some links explaining it further:
http://lucene.472066.n3.nabble.com/SolrCloud-and-split-brain-tp3989857p3989868.html
http://lucene.472066.n3.nabble.com/Whether-replicationFactor-2-makes-sense-tp4300204p4300206.html

What would be the fastest way to connect to a Cassandra cluster?

I have an HTTP server receiving new client connections all the time. Each time, I have to reconnect to the Cassandra cluster (each client is attached to a new process via a fork() call.)
I have two problems:
Speed: I'd like to make use of a connection as fast as possible;
Robustness: any one of the Cassandra nodes could be down.
I would imagine that the best mechanism will work with any cluster, not just Cassandra.
We use thrift to connect, although we may change that later. Either way, as far as network connections are concerned, we just do the regular socket(), bind(), and connect() call sequence.
Most of the code I have seen dealing with similar problems is very serial: it tries to connect to host 1; if that times out, it tries host 2, and so on until all hosts are exhausted.
I was thinking I could instead create one thread per connection attempt (with some sort of limit like 3, 4, or 5 parallel attempts; the number would depend on the size of the Cassandra cluster). However, I am also thinking that if all connections succeed, I am probably going to waste a lot of time on the cluster side...
Is there a specific way this sort of a thing is generally resolved?
Most (if not all) of these features (failover, smart request routing, retries) are available in the DataStax drivers for Cassandra.
If possible, you should migrate away from Thrift.
If you really, really have to create your own (please consider the time needed to develop and maintain such a solution on a protocol that has been deprecated for a long while), you could take some inspiration from the DataStax drivers.
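If you do end up rolling your own, the parallel-attempt idea from the question can be sketched in Python with the standard library. This is a hand-written illustration at the plain TCP level, not how the DataStax drivers work and not a Thrift or Cassandra API; the function names and default limits are assumptions.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed


def _close_if_connected(future):
    """Close a connection that finished after a winner was already chosen."""
    if future.cancelled():
        return
    if future.exception() is None:
        future.result().close()


def connect_first(hosts, port, timeout=1.0, max_parallel=4):
    """Try several hosts at once; return (host, socket) for the first success."""
    pool = ThreadPoolExecutor(max_workers=max_parallel)
    futures = {
        pool.submit(socket.create_connection, (host, port), timeout): host
        for host in hosts
    }
    try:
        for future in as_completed(futures):
            try:
                sock = future.result()
            except OSError:
                continue  # host down, refused, or timed out; wait for the next one
            for other in futures:
                if other is not future:
                    # Losers clean up after themselves; attempts not yet started are dropped.
                    other.add_done_callback(_close_if_connected)
                    other.cancel()
            return futures[future], sock
        raise ConnectionError("none of the hosts accepted a connection")
    finally:
        pool.shutdown(wait=False)
```

Usage would look like `host, sock = connect_first(hosts, port)` with your own list of node addresses. The concern raised in the question still applies: every extra attempt that succeeds costs the cluster an accept and a close, so keep max_parallel small.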

Apache Cassandra overwhelming bandwidth overhead

While testing Apache Cassandra, I inserted 1000 rows of data and allowed them to propagate to the other machine on the LAN. This is a 2-machine cluster. I monitored the network connection between the two machines. The total data I expected to flow between the two servers was around 25Mb (including all column names, column values and timestamps). But the actual data sent and received between them was a whopping 362Mb! Does anybody know why there is such an overwhelming overhead? Thank you.
That's interesting. It's probably easier to figure out what's going on if you look at a single operation at a time, though.
It is probably due to the gossip protocol implemented to handle things like cluster membership, management and replication.
