CouchDB cluster database distribution - couchdb

We have a CouchDB cluster with 24 nodes, with q=8 and n=3 as default cluster settings, and 100 databases already created. If we added 24 more nodes to the cluster and started creating new databases, would they be created on the new nodes, or not necessarily? How does CouchDB decide where to put new databases?
We are running all nodes on version 2.3.1.

By default CouchDB will assign the shards for databases across all the nodes in the cluster randomly, so your new databases will have shards on both the new nodes and the old ones. It will spread the shards across as many nodes as possible, and guarantee that two replicas of a shard are never co-located on the same node.
If you wanted to have shards hosted only on the new nodes (e.g. because the old ones are filling up) you could take advantage of the “placement” feature. There are two steps involved:
Walk through the documents in the _nodes database and set a "zone" attribute on each document, e.g. "zone": "old" for your old nodes and "zone": "new" for the new ones (see the sketch after the config below).
Define a [cluster] placement config setting that tells the server to place 3 copies of each shard in your new zone:
[cluster]
placement = new:3
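
Step 1 can be scripted against CouchDB's node-local admin interface. Here is a minimal Java 11 sketch, assuming the node-local port 5986 used by CouchDB 2.x and a hypothetical node name couchdb@node25.example.com; admin authentication is omitted for brevity.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SetNodeZone {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Hypothetical node name; the _nodes database lives on the node-local port.
        String nodeDoc = "http://localhost:5986/_nodes/couchdb@node25.example.com";

        // Fetch the current document to read its _rev (naive regex extraction;
        // a real client would use a JSON library).
        String body = client.send(
                HttpRequest.newBuilder(URI.create(nodeDoc)).GET().build(),
                HttpResponse.BodyHandlers.ofString()).body();
        Matcher m = Pattern.compile("\"_rev\":\"([^\"]+)\"").matcher(body);
        if (!m.find()) {
            throw new IllegalStateException("unexpected response: " + body);
        }

        // Write the document back with the zone attribute added.
        String updated = String.format(
                "{\"_id\":\"couchdb@node25.example.com\",\"_rev\":\"%s\",\"zone\":\"new\"}",
                m.group(1));
        HttpResponse<String> put = client.send(
                HttpRequest.newBuilder(URI.create(nodeDoc))
                        .header("Content-Type", "application/json")
                        .PUT(HttpRequest.BodyPublishers.ofString(updated))
                        .build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(put.statusCode() + " " + put.body());
    }
}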
You could also use this feature for other purposes; for example, by splitting your nodes up into zones based on rack location, availability zone, etc. to ensure that replicas have maximum isolation from each other. You can read more about the setting here:
http://docs.couchdb.org/en/2.3.1/cluster/databases.html#placing-a-database-on-specific-nodes


Hazelcast High Availability in case of 3 nodes cluster

We are using Hazelcast IMDG as an in-memory grid. Our cluster has three nodes and one sync backup, and the cluster is partition aware. In that case, I expect the distributed map to be spread across the 3 nodes more or less evenly. If a node breaks down, ownership of its partitions should be transferred to a healthy node (the one holding the sync backup of the lost data). If there is a write request to this newly assigned owner node, the same partition should be replicated synchronously to one of the surviving nodes. Does this mean that, in case of node failure, approximately one third of the distributed map has to be replicated, and that all reads are blocked during that replication? Is availability hit when one of the three nodes is down (with one sync backup) until approximately one third of the distribution is restored?
If a node goes down, the cluster will promote the backup partitions to primary.
And the migrations will start to create backups of these new primary partitions.
Please check the Data Partitioning section.
During migrations, read operations are not blocked.
Only write operations are blocked on the partition that is actively migrating.
Since the partitions are migrated one by one, the effect on availability is minimal.
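
To make this observable, here is a minimal sketch against the Hazelcast 3.x Java API: it configures one synchronous backup for a hypothetical map named "orders" and registers a listener that logs each partition migration, so you can see them happening one partition at a time.

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MigrationEvent;
import com.hazelcast.core.MigrationListener;

public class MigrationWatcher {
    public static void main(String[] args) {
        Config config = new Config();
        config.getMapConfig("orders").setBackupCount(1); // one synchronous backup

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

        // Log partition migrations; they run one partition at a time.
        hz.getPartitionService().addMigrationListener(new MigrationListener() {
            @Override
            public void migrationStarted(MigrationEvent e) {
                System.out.println("migration started: partition " + e.getPartitionId());
            }

            @Override
            public void migrationCompleted(MigrationEvent e) {
                System.out.println("migration completed: partition " + e.getPartitionId());
            }

            @Override
            public void migrationFailed(MigrationEvent e) {
                System.out.println("migration failed: partition " + e.getPartitionId());
            }
        });
    }
}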

Members in Hazelcast Map are removed when adding new cluster node

I am using Hazelcast version 3.9.1, and when I tried to add a new node, many map members were removed.
We are using Discovering Members by TCP, adding all of the IPs to the new node, and then restarting the Hazelcast service.
What is the reason for the map members being removed?
In a running cluster, Map data is not removed but repartitioned i.e. data is rebalanced across the cluster. In other words, when a new member is added to the cluster, some of the existing entries move to the new member.
If you only wanted to add members to an existing cluster, then there is no need to restart; Hazelcast automatically rebalances the data.
If no backups are enabled and you restart the cluster, the data is lost, since it is stored only in memory. To prevent data loss, enable backups (by default, 1 synchronous backup is enabled) and do not shut down the entire cluster at once.
As wildnez said in his response, adding a new node will cause rebalancing to happen; so while map entries may be removed from the map of a particular node, they will remain in the cluster.
As a concrete example, if you are using the default of 1 backup, and start up a cluster with one node and 1000 entries, then all 1000 entries will be in the map of that node. There will be no backups in that case because even though you specified 1 backup, backups won't be stored on the same node as their primary.
If you add a second node, then approximately 50% of the map entries will be migrated to the new node. For the entries that migrate, their backups will be added on the first node, and for the map entries that don't migrate, their backups will be created on the second node.
If you add a third node, migration will happen again, and each node will have roughly one third of the primary entries and one third of the backups.
Adding a node should never cause map entries to be removed from the cluster, but it will cause migration of entries between cluster members.
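
A small sketch illustrating the point, using the Hazelcast 3.x Java API (both members started in one JVM purely for demonstration): the cluster-wide map size is unchanged after a second member joins, even though roughly half the entries migrate.

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class RebalanceDemo {
    public static void main(String[] args) {
        // Start one member and load 1000 entries into a map.
        HazelcastInstance first = Hazelcast.newHazelcastInstance();
        IMap<Integer, String> map = first.getMap("demo");
        for (int i = 0; i < 1000; i++) {
            map.put(i, "value-" + i);
        }

        // A second member joins: roughly half the entries (and their backups)
        // migrate, but the cluster-wide view of the map is unchanged.
        HazelcastInstance second = Hazelcast.newHazelcastInstance();
        System.out.println("size after join: " + second.getMap("demo").size()); // still 1000

        Hazelcast.shutdownAll();
    }
}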

Cassandra: Who creates/distributes virtual Nodes among nodes - Leader?

In Cassandra, virtual nodes are created and distributed among nodes as described in http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2. But who performs that process of creating the virtual nodes and distributing them among peers? Is it some sort of leader? How does it work?
Also, when a node joins, virtual nodes are redistributed. Many more similar actions take place. Who performs all of those?
Edit: Is it that, when a node joins, it takes over some of the virtual nodes from the existing cluster, thus eliminating the need for a leader?
A new node retrieves data about the cluster from the seed nodes.
The new node then takes over its share of the ring, based on the num_tokens parameter (by default, tokens are distributed evenly among all existing nodes), and bootstraps its part of the data.
The rest of the cluster learns about the change via the gossip protocol.
Apart from the seed nodes, there is no need for any "master" in the process.
Old nodes will not automatically delete the partitions they no longer own; you need to run nodetool cleanup on the old nodes after adding a new node.
Here's a good article about it:
http://cassandra.apache.org/doc/latest/operating/topo_changes.html
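
If you want to watch this from the client side, here is a rough sketch using the DataStax Java driver 2.x that counts the tokens owned by each node via the system.local and system.peers tables; the contact point is an assumption.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class TokenInspector {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Tokens owned by the node we are connected to.
            Row local = session.execute("SELECT tokens FROM system.local").one();
            System.out.println("local node owns "
                    + local.getSet("tokens", String.class).size() + " tokens");

            // Tokens owned by every other node, as learned via gossip.
            for (Row peer : session.execute("SELECT peer, tokens FROM system.peers")) {
                System.out.println(peer.getInet("peer") + " owns "
                        + peer.getSet("tokens", String.class).size() + " tokens");
            }
        }
    }
}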

What are the differences between a node, a cluster and a datacenter in a cassandra nosql database?

I am trying to duplicate data in a Cassandra NoSQL database for a school project using DataStax OpsCenter. From what I have read, there are three keywords: cluster, node, and datacenter, and from what I have understood, the data in a node can be duplicated in another node that exists in another cluster. And all the nodes that contain the same (duplicated) data compose a datacenter. Is that right?
If it is not, what is the difference?
The hierarchy of elements in Cassandra is:
Cluster
Data center(s)
Rack(s)
Server(s)
Node (more accurately, a vnode)
A Cluster is a collection of Data Centers.
A Data Center is a collection of Racks.
A Rack is a collection of Servers.
A Server contains 256 virtual nodes (or vnodes) by default.
A vnode is the data storage layer within a server.
Note: A server is the Cassandra software. A server is installed on a machine, where a machine is either a physical server, an EC2 instance, or similar.
Now to specifically address your questions.
An individual unit of data is called a partition. And yes, partitions are replicated across multiple nodes. Each copy of the partition is called a replica.
In a multi-data center cluster, the replication is per data center. For example, if you have a data center in San Francisco named dc-sf and another in New York named dc-ny then you can control the number of replicas per data center.
As an example, you could set dc-sf to have 3 replicas and dc-ny to have 2 replicas.
Those numbers are called the replication factor. You would specifically say dc-sf has a replication factor of 3, and dc-ny has a replication factor of 2. In simple terms, dc-sf would have 3 copies of the data spread across three vnodes, while dc-ny would have 2 copies of the data spread across two vnodes.
While each server has 256 vnodes by default, Cassandra is smart enough to pick vnodes that exist on different physical servers.
To summarize:
Data is replicated across multiple virtual nodes (each server contains 256 vnodes by default)
Each copy of the data is called a replica
The unit of data is called a partition
Replication is controlled per data center
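
For reference, the per-data-center replication described above could be declared like this via the DataStax Java driver 2.x; the data center names dc-sf and dc-ny come from the example, and the contact point is an assumption.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class PerDcReplication {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // 3 replicas in dc-sf, 2 replicas in dc-ny, as in the example above.
            session.execute("CREATE KEYSPACE IF NOT EXISTS products "
                    + "WITH replication = {'class': 'NetworkTopologyStrategy', "
                    + "'dc-sf': 3, 'dc-ny': 2}");
        }
    }
}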
A node is a single machine that runs Cassandra. A collection of nodes holding similar data is grouped in what is known as a "ring" or cluster.
Sometimes if you have a lot of data, or if you are serving data in different geographical areas, it makes sense to group the nodes of your cluster into different data centers. A good use case of this, is for an e-commerce website, which may have many frequent customers on the east coast and the west coast. That way your customers on the east coast connect to your east coast DC (for faster performance), but ultimately have access to the same dataset (both DCs are in the same cluster) as the west coast customers.
More information on this can be found here: About Apache Cassandra - How does Cassandra work?
And all the nodes that contain the same (duplicated) data compose a datacenter. Is that right?
Close, but not necessarily. The level of data duplication you have is determined by your replication factor, which is set on a per-keyspace basis. For instance, let's say that I have 3 nodes in my single DC, all storing 600GB of product data. My products keyspace definition might look like this:
CREATE KEYSPACE products
WITH replication = {'class': 'NetworkTopologyStrategy', 'MyDC': '3'};
This will ensure that my product data is replicated equally to all 3 nodes. The size of my total dataset is 600GB, duplicated on all 3 nodes.
But let's say that we're rolling out a new, fairly large product line, and I estimate that we're going to have another 300GB of data coming, which may start pushing the max capacity of our hard drives. If we can't afford to upgrade all of our hard drives right now, I can alter the replication factor like this:
ALTER KEYSPACE products
WITH replication = {'class': 'NetworkTopologyStrategy', 'MyDC': '2'};
This will create 2 copies of all of our data, and store it in our current cluster of 3 nodes. The size of our dataset is now 900GB, but since there are only two copies of it (each node is essentially responsible for 2/3 of the data) our size on-disk is still 600GB. The drawback here, is that (assuming I read and write at a consistency level of ONE) I can only afford to suffer a loss of 1 node. Whereas with 3 nodes and a RF of 3 (again reading and writing at consistency ONE), I could lose 2 nodes and still serve requests.
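
A quick sketch of reading at consistency ONE, as assumed in the failure math above (DataStax Java driver 2.x; the catalog table and contact point are hypothetical):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class ReadAtOne {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("products")) {
            // One replica's answer is enough; the coordinator returns as soon
            // as a single replica responds.
            SimpleStatement stmt = new SimpleStatement("SELECT * FROM catalog WHERE id = 42");
            stmt.setConsistencyLevel(ConsistencyLevel.ONE);
            System.out.println(session.execute(stmt).one());
        }
    }
}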
Edit 20181128
When I make a network request, am I making it against the server or against the node? Or, if I make a request against the server, does it then route it and read from the node, or something else?
So real quick explanation: server == node
As far as making a request against the nodes in your cluster goes, that behavior is actually dictated by the driver on the application side. In fact, the driver maintains a copy of the current network topology, as it reads the cluster gossip similarly to how the nodes do.
On the application side, you can set a load balancing policy. Specifically, a token-aware policy (TokenAwarePolicy in the Java driver) will examine the partition key of each request, figure out which node(s) have the data, and send the request directly there.
For the other load balancing policies, or for queries where a single partition key cannot be determined, the request will be sent to a single node, which then acts as a "coordinator." The coordinator handles routing the request to the nodes responsible for it, as well as compiling and returning any result sets.
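
A minimal sketch wiring that up in the DataStax Java driver 2.x, wrapping a token-aware policy around DC-aware round robin; "dc1" and the contact point are assumptions.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class TokenAwareSetup {
    public static void main(String[] args) {
        // Token awareness wraps another policy; requests with a computable
        // partition key go straight to a replica in the local DC.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(new DCAwareRoundRobinPolicy("dc1")))
                .build();
        System.out.println("connected to: " + cluster.getMetadata().getClusterName());
        cluster.close();
    }
}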
Node:
A machine which stores some portion of your entire database. This may include data replicated from other nodes as well as its own data. What data it is responsible for is determined by its token ranges and the replication strategy of the keyspace holding the data.
Datacenter:
A logical grouping of nodes which can be separated from other nodes. A common use case is AWS-EAST vs. AWS-WEST. The NetworkTopologyStrategy replication strategy is used to specify how many replicas of a keyspace should exist in any given datacenter. This is how Cassandra users achieve cross-DC replication. In addition, there are consistency levels that require acknowledgement only within the datacenter of the coordinator (LOCAL_*).
Cluster:
The sum total of all the machines in your database, including all datacenters. There is no cross-cluster replication.
As per the documentation below:
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/architecture/archIntro.html
Node
Where you store your data. It is the basic infrastructure component of Cassandra.
Datacenter
A collection of related nodes. A datacenter can be a physical datacenter or a virtual datacenter. Different workloads should use separate datacenters, either physical or virtual. Replication is set per datacenter. Using separate datacenters prevents Cassandra transactions from being impacted by other workloads and keeps requests close to each other for lower latency. Depending on the replication factor, data can be written to multiple datacenters. Datacenters must never span physical locations.
Cluster
A cluster contains one or more datacenters. It can span physical locations.

Ability to write to a particular cassandra node

Is there a possibility to write to a particular node using datastax driver?
For example, I have three nodes in datacenter 1 and three nodes in datacenter 2.
Existing
If I build up the cluster with any one of them as a seed, all the nodes will be detected by the DataStax Java driver. So, in this case, if I insert data using the driver, it will automatically choose one of the nodes as the coordinator (preferably in the local data center) and proceed.
Requirement
I want a way to contact any node in datacenter 2 and hand over the coordinator job to one of the nodes in datacenter 2.
Why i need this
I am trying to use the trigger functionality from datacenter 2 alone. Since triggers are handled by the coordinator, I want the coordinator to be selected from datacenter 2 so that datacenter 1 doesn't have to do this operation.
You may be able to use the DCAwareRoundRobinPolicy load balancing policy to achieve this by creating the policy such that DC2 is considered the "local" DC.
Cluster.Builder builder = Cluster.builder().withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("dc2"));
In the above example, remote (non-DC2) nodes will be ignored.
There is also a new WhiteListPolicy in driver version 2.0.2 that wraps another load balancing policy and restricts the nodes to a specific list you provide.
Cluster.Builder builder = Cluster.builder().withLoadBalancingPolicy(new WhiteListPolicy(new DCAwareRoundRobinPolicy("dc2"), whiteList));
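
Putting it together, here is a fuller sketch of the whitelist approach that defines the whiteList collection the snippet above assumes; the DC2 node addresses and port are hypothetical.

import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.WhiteListPolicy;

public class Dc2OnlyClient {
    public static void main(String[] args) {
        // Hypothetical addresses of the three DC2 nodes (native port 9042).
        List<InetSocketAddress> whiteList = Arrays.asList(
                new InetSocketAddress("10.0.2.1", 9042),
                new InetSocketAddress("10.0.2.2", 9042),
                new InetSocketAddress("10.0.2.3", 9042));

        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.2.1")
                .withLoadBalancingPolicy(
                        new WhiteListPolicy(new DCAwareRoundRobinPolicy("dc2"), whiteList))
                .build();
        System.out.println("connected to: " + cluster.getMetadata().getClusterName());
        cluster.close();
    }
}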
For multi-DC scenarios, Cassandra provides EACH_* and LOCAL_* consistency levels, where EACH_QUORUM requires a successful acknowledgement in every DC and LOCAL_* only in the local one.
If I understood correctly, what you are trying to achieve is DC failover in your application. This is not a good practice. Let's assume your application is hosted in DC1 alongside Cassandra. If DC1 goes down, your entire application is unavailable. If DC2 goes down, your application can still write with a LOCAL consistency level, and C* will replicate the changes when DC2 is back.
If you want to achieve HA, you need to deploy the application in each DC, use CL=LOCAL_*, and finally do failover at the DNS level (e.g. using AWS Route 53).
See data consistency docs and this blog post for more info about consistency levels for multiple DCs.
