Can I have different partitioners in a multiple-datacenter configuration in Cassandra?

Can I have RandomPartitioner in the cluster in datacenter1 and Murmur3Partitioner in the cluster in datacenter2?

No, you need to have the same partitioner on all nodes in the cluster.
If you are asking because you want to migrate from RandomPartitioner to Murmur3Partitioner, unfortunately this won't work. I don't know of a way to move to Murmur3Partitioner on a live cluster, but the benefit is small, so it is unlikely to be worth doing.
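To illustrate why the partitioner must be the same on every node, the Python sketch below hashes a key the way RandomPartitioner does (the absolute value of the MD5 digest read as a signed 128-bit integer). Murmur3Partitioner would give the same key an unrelated token from a different 64-bit range, so nodes with different partitioners would disagree about which node owns which data. This is an illustration, not Cassandra's code; only the RandomPartitioner side is modelled, and the key `user:42` is hypothetical.

```python
import hashlib

# RandomPartitioner: MD5-based, non-negative tokens up to 2**127.
# Murmur3Partitioner: 64-bit MurmurHash3-based, tokens in [-2**63, 2**63 - 1].
MURMUR3_MIN, MURMUR3_MAX = -(2**63), 2**63 - 1


def random_partitioner_token(key: bytes) -> int:
    """Mimic RandomPartitioner: read the MD5 digest as a signed
    128-bit big-endian integer and take its absolute value."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def in_murmur3_range(token: int) -> bool:
    """True if the token lies inside the Murmur3Partitioner token space."""
    return MURMUR3_MIN <= token <= MURMUR3_MAX


token = random_partitioner_token(b"user:42")  # hypothetical partition key
print(token, in_murmur3_range(token))
```

The two token spaces overlap only on [0, 2**63 - 1], so almost every RandomPartitioner token lies outside the range a Murmur3Partitioner node would ever assign.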

Related

Three-node Cassandra cluster with all nodes configured in different racks in the same DC

I am new to Cassandra.
In our production environment, a three-node Cassandra cluster is running and serving production traffic, but I have the following questions:
1) All nodes are configured in different racks (rack 1, rack 2 and rack 3) in the same DC. Is this fine, or does this configuration have drawbacks?
2) We use RF 2 and NetworkTopologyStrategy for all keyspaces except the system keyspaces, which are configured with RF 2 and SimpleStrategy. Is this fine, or does it need to change? Should we increase the replication factor of system_auth?
3) Now I want to add another node to the same DC. What is the best procedure to do this without impacting live traffic?
The Cassandra version is Apache Cassandra 3.11.
Thanks in advance.
Ans 1) Having Cassandra nodes in different racks is good for availability and fault tolerance.
Ans 2) You must increase the RF of system_auth to avoid cqlsh login issues from other nodes.
Ans 3) You can add a new node without affecting live traffic on the existing cluster. Follow the procedure described here:
http://cassandra.apache.org/doc/latest/operating/topo_changes.html
Cassandra is designed as a distributed system, and its architecture is specifically tailored for multiple-data-center deployments. These features are robust and flexible enough that you can configure the cluster for optimal geographic distribution, redundancy, failover and disaster recovery.
Multiple-data-center deployments are excellent for global solutions in which some applications run in one region and others in another region, yet all use a single Cassandra cluster spanning data centers across those regions.
Even for single-region applications, multiple data centers are the preferred option because they provide disaster recovery if one region goes down.
Ans 1) For a single-DC Cassandra cluster, the recommendation is to have 4 nodes with RF 3: 2 nodes in rack 1 and 2 nodes in rack 2. Remember that nodes in the same rack have a faster network between them than nodes in different racks. With two nodes in the same rack, LOCAL_QUORUM queries will be faster than on a cluster where every node is in a different rack.
If you are not concerned about query latency, putting all nodes in different racks (3 racks) gives better disaster recovery than a two-rack deployment. That said, multi-DC deployments are always recommended for production clusters.
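The rack trade-off comes from how NetworkTopologyStrategy spreads replicas within a DC. Below is a simplified Python model of that placement, not Cassandra's implementation: one DC only, hypothetical node and rack names, and duplicate vnode entries simply skipped. It walks the ring clockwise from the key's token, prefers racks that hold no replica yet, and only reuses a rack once every rack has a replica.

```python
from typing import List, Set, Tuple

# One DC's ring as (token, node, rack) entries; names are hypothetical.
Ring = List[Tuple[int, str, str]]


def place_replicas(ring: Ring, key_token: int, rf: int) -> List[str]:
    """Simplified rack-aware placement for a single DC: walk the ring
    clockwise from the key's token, take nodes from racks that hold no
    replica yet, and only use same-rack nodes (skipped ones first) once
    every rack has a replica."""
    ring = sorted(ring)
    all_racks: Set[str] = {rack for _, _, rack in ring}
    rf = min(rf, len({node for _, node, _ in ring}))
    # First entry whose token is >= the key's token, wrapping to index 0.
    start = next((i for i, (t, _, _) in enumerate(ring) if t >= key_token), 0)
    replicas: List[str] = []
    skipped: List[str] = []
    seen_racks: Set[str] = set()
    for step in range(len(ring)):
        if len(replicas) == rf:
            break
        _, node, rack = ring[(start + step) % len(ring)]
        if node in replicas or node in skipped:
            continue
        if rack not in seen_racks:
            replicas.append(node)
            seen_racks.add(rack)
            if seen_racks == all_racks:
                # Every rack now holds a replica; reuse skipped nodes in ring order.
                while skipped and len(replicas) < rf:
                    replicas.append(skipped.pop(0))
        elif seen_racks == all_racks:
            replicas.append(node)
        else:
            skipped.append(node)
    return replicas


three_racks = [(0, "a", "rack1"), (10, "b", "rack2"), (20, "c", "rack3")]
print(place_replicas(three_racks, key_token=15, rf=2))  # ['c', 'a']
```

With three racks, every replica of a key lands in a different rack. With the four-node, two-rack layout and RF 3, the model necessarily puts two of the three replicas in the same rack.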
Ans 2) It's always recommended to increase the replication factor of the system_auth keyspace and change its replication class to NetworkTopologyStrategy. See this documentation for more details: https://docs.datastax.com/en/security/6.0/security/secSystemKeyspace.html
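The change itself is a single CQL ALTER KEYSPACE statement. As a minimal sketch, here is a hypothetical Python helper that builds it; the DC name `dc1` and RF 3 are assumptions, so use the DC names reported by `nodetool status` and an RF that fits your cluster.

```python
from typing import Dict


def alter_keyspace_cql(keyspace: str, rf_per_dc: Dict[str, int]) -> str:
    """Build an ALTER KEYSPACE statement that switches a keyspace to
    NetworkTopologyStrategy with one replication factor per DC."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(rf_per_dc.items()))
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# 'dc1' is a hypothetical DC name.
print(alter_keyspace_cql("system_auth", {"dc1": 3}))
# ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
```

After altering the keyspace, run `nodetool repair --full system_auth` on each node so the existing auth data is copied to the new replicas.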
Ans 3) Yes, you can easily add a new node to an existing cluster without impacting traffic. See this documentation for more details: https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddNodeToCluster.html

How to generate tokens for Murmur3Partitioner for multiple nodes?

I have a Cassandra cluster with multiple datacenters and nodes.
How can I generate token values for a multi-datacenter Cassandra cluster that uses Murmur3Partitioner?
I found a site that generates tokens for Murmur3Partitioner and RandomPartitioner, but only for one DC.
https://www.geroba.com/cassandra/cassandra-token-calculator/
I will update this if I find a better link.
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configGenTokens.html
You just need to make sure that, within each DC, the nodes' tokens are spread evenly across the token range.
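One common way to do this, sketched in Python under stated assumptions: divide the Murmur3 token range (-2**63 to 2**63 - 1) evenly among the nodes of each DC, then shift each additional DC by a small offset so no two nodes in the cluster get the same token. The helper name and the offset of 100 are arbitrary choices for illustration. Each value would go into a node's `initial_token` with `num_tokens: 1`.

```python
from typing import Dict, List

MURMUR3_MIN = -(2**63)
RING_SIZE = 2**64  # Murmur3Partitioner tokens span -2**63 .. 2**63 - 1


def murmur3_tokens(nodes_per_dc: Dict[str, int], dc_offset: int = 100) -> Dict[str, List[int]]:
    """Evenly spaced tokens per DC; the k-th DC (in dict order) is
    shifted by k * dc_offset so tokens never collide across DCs."""
    tokens: Dict[str, List[int]] = {}
    for k, (dc, n) in enumerate(nodes_per_dc.items()):
        step = RING_SIZE // n
        tokens[dc] = [MURMUR3_MIN + i * step + k * dc_offset for i in range(n)]
    return tokens


for dc, toks in murmur3_tokens({"dc1": 3, "dc2": 3}).items():
    print(dc, toks)
```

Each DC's ring is balanced on its own, which is what matters because NetworkTopologyStrategy places replicas per DC.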

In Cassandra 3.x, is there a way to set a limit on the cluster usage for each keyspace?

I am currently working on setting up a Cassandra cluster that will be used by different applications each with their own keyspace (in a multi-tenancy fashion).
So I was wondering whether I could limit the usage of my cluster for each keyspace individually.
For example, if keyspace1 is using 65% of the cluster resources, every new request on that keyspace would be put in a queue so that it doesn't impact requests on other keyspaces.
I know I can get statistics for each keyspace using nodetool cfstats, but I don't know how to take countermeasures.
"Cluster resources" also needs defining: it could be total CPU usage, JVM heap usage, or the proportion of reads/writes on each keyspace across the cluster at instant t.
Also, if you have strategies to avoid getting into this kind of situation, I'd be glad to hear about them!
No, Cassandra doesn't have such functionality. That's why it's recommended to set up separate clusters to isolate applications from noisy neighbors.
Theoretically you could do this with Docker/Kubernetes/etc., but it could take a lot of resources to build something that works reliably.

Working of vnodes in Cassandra

Can someone explain how vnode allocation works in Cassandra?
If we have a cluster of N nodes and a new node is added, how are token ranges allocated to this new node?
Rebalancing happens automatically when nodes are added or removed. When a node joins the cluster, it assumes responsibility for an even portion of the data from the other nodes in the cluster. If a node fails, its load is spread evenly across the other nodes in the cluster.
Here is some reading that might help you better understand how vnodes work and how ranges are allocated: Virtual nodes in Cassandra 1.2
As I said above, Cassandra automatically calculates the token ranges for each node in proportion to its num_tokens value. When allocate_tokens_for_keyspace is set (Cassandra 3.0 and later), token assignments for vnodes are calculated by the org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocator class; otherwise the joining node picks num_tokens random tokens.
When a new node joins the cluster, its tokens split existing ranges, so it takes over part of the ranges previously owned by the existing nodes. This video might also help.
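To make "takes over part of the ranges" concrete, here is a minimal Python model of random vnode token selection, roughly what happens when no allocation algorithm is configured. It is not Cassandra's code: `random_tokens` and `ownership` are hypothetical helpers, the seed is arbitrary, and 256 mirrors the `num_tokens` default in Cassandra 3.x.

```python
import random
from typing import Dict, List, Tuple

MIN_TOKEN = -(2**63)  # Murmur3Partitioner token range starts here
RING_SIZE = 2**64     # and spans 2**64 values


def random_tokens(rng: random.Random, num_tokens: int) -> List[int]:
    """Pick num_tokens random tokens for a joining node."""
    return [rng.randrange(MIN_TOKEN, MIN_TOKEN + RING_SIZE) for _ in range(num_tokens)]


def ownership(ring: List[Tuple[int, str]]) -> Dict[str, float]:
    """Fraction of the ring owned by each node. A token owns the range
    (previous token, token]; the first token wraps to the last one.
    Assumes at least two distinct tokens."""
    ring = sorted(ring)
    owned: Dict[str, int] = {}
    for i, (token, node) in enumerate(ring):
        size = (token - ring[i - 1][0]) % RING_SIZE  # ring[-1] when i == 0
        owned[node] = owned.get(node, 0) + size
    return {node: size / RING_SIZE for node, size in owned.items()}


rng = random.Random(42)  # fixed seed so the run is repeatable
ring = [(t, f"node{n}") for n in range(1, 4) for t in random_tokens(rng, 256)]
before = ownership(ring)
ring += [(t, "node4") for t in random_tokens(rng, 256)]  # node4 joins
after = ownership(ring)
for node in sorted(after):
    print(node, round(before.get(node, 0.0), 3), "->", round(after[node], 3))
```

Each of node4's tokens splits an existing range, so every old node gives up a slice and node4 ends up owning roughly a quarter of the ring. With many tokens per node, the losses are spread fairly evenly across the existing nodes.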

Configure Cassandra to use different network interfaces for data streaming and client connections?

I have a Cassandra cluster of 3 nodes with a replication factor of 3. A lot of data is written to Cassandra on a daily basis (10-15 GB). I have provisioned these Cassandra nodes on commodity hardware, as suggested by the "Big data community", and I expect nodes to go down frequently, which is handled by the redundancy Cassandra provides.
My problem is that I have observed Cassandra writes slowing down when a new node is provisioned and data is streamed to it during bootstrap. To overcome this hurdle, we have decided to use one network interface for inter-node communication and a separate one for client applications writing data to Cassandra. How can this be configured, if it is possible at all?
Any help is appreciated.
I think you are chasing the wrong solution.
I am confused by the fact that you only have 3 nodes, yet your concern is about slow writes while bootstrapping. Why? Are you planning to grow your cluster regularly? What consistency level do you use for writes? It has a big impact on performance. Obviously, if you only have 2 or 3 nodes and you bootstrap a new one, you will see a slowdown, because you are tying up a significant percentage of your cluster to do the streaming.
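A back-of-the-envelope sketch of how large that percentage is, assuming perfectly balanced token ownership (real clusters only approximate this). With RF 3 on 3 nodes, every node holds the whole data set. A fourth node must then receive about three quarters of all the data, streamed from nodes that are also serving writes.

```python
from fractions import Fraction


def share_per_node(rf: int, nodes: int) -> Fraction:
    """Fraction of the whole data set each node stores, assuming a
    perfectly balanced ring."""
    return Fraction(min(rf, nodes), nodes)


print(share_per_node(3, 3))  # 1: every node already holds a full copy
print(share_per_node(3, 4))  # 3/4: what a 4th node must receive by streaming
```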
Note that "commodity hardware" doesn't mean cheap, low-performance hardware. It just means you don't need the super high-end database-class machines used for databases like Oracle. You should still use really good commodity hardware. You may also need more nodes, as setting RF equal to cluster size is not typically a great idea.
Having said that, you can set listen_address to the inter-node interface and rpc_address to the client interface if you feel that will help.
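The split comes down to two settings in each node's cassandra.yaml: listen_address is used for inter-node traffic (gossip and streaming), and rpc_address is where clients connect. As a minimal sketch, here is a hypothetical Python helper that renders those two lines after checking that the addresses are valid and distinct. The IP addresses are made up.

```python
import ipaddress


def interface_settings(internode_ip: str, client_ip: str) -> str:
    """Render the two cassandra.yaml settings that separate traffic:
    listen_address for inter-node traffic, rpc_address for clients."""
    for ip in (internode_ip, client_ip):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    if internode_ip == client_ip:
        raise ValueError("both settings point at the same interface")
    return f"listen_address: {internode_ip}\nrpc_address: {client_ip}\n"


# Hypothetical addresses for the two NICs on one node.
print(interface_settings("10.0.0.11", "192.168.1.11"), end="")
```

Each node needs its own addresses, and the change takes effect after a restart.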
