Say I have 2 datacenters - DC1 and DC2. DC1 has 3 nodes with replication 3 (fully replicated) and DC2 has 1 node with replication 1 (fully replicated).
Say the lone node in DC2 is up, all nodes in DC1 are down, and my read/write consistency is at LOCAL_QUORUM everywhere.
I try to do a transaction on DC2 but it fails due to UnavailableException, which of course means not enough nodes are online. But why? Does the LOCAL part of LOCAL_QUORUM get ignored because I only have one node in that data center?
The lone node in DC2 has 100% of the data so why can't I do anything unless 2 nodes are also up in DC1, regardless of read/write consistency settings?
With your settings, 2 replicas need to be written for a write to succeed. The partition of the failed write may belong to the down nodes, because the hash of the partition key decides where it has to go. Once you decommission those nodes, the ring gets re-adjusted and things work fine.
But as long as they are simply down, some writes will succeed and some will fail. You can check which writes succeed and which fail by comparing the partition hash against the ring tokens.
e.g.: Imagine we get a request for the node owning token range 41-50, and according to the replication strategy the next replicas should go to the nodes owning 1-10 and 11-20. If those nodes are down, LOCAL_QUORUM is not satisfied, so your write fails.
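You can check this with nodetool (the keyspace, table, and key below are just placeholders for whatever your write used):
nodetool ring                                          # shows each node's tokens and whether it is Up or Down
nodetool getendpoints my_keyspace my_table some_key    # shows which replicas own that partition key
If enough of the local replicas that getendpoints returns are down, LOCAL_QUORUM cannot be met for that key.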
From https://groups.google.com/forum/#!topic/aureliusgraphs/fJYH1de5wBw
"titan uses an internal consistency for locking and id allocation, the level it uses is quorum.
as a result no matter what I do titan will always access both DC."
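That also explains the original failure: a plain QUORUM is computed over the sum of the replicas in all DCs, while LOCAL_QUORUM only counts the coordinator's DC. A rough cqlsh illustration (my_ks.my_table and the values are made up):
-- QUORUM = (3 + 1) / 2 + 1 = 3 replicas (integer division) -> needs DC1 nodes, fails while DC1 is down
-- LOCAL_QUORUM from a DC2 coordinator = 1 / 2 + 1 = 1 replica -> the lone DC2 node is enough
CONSISTENCY QUORUM;
INSERT INTO my_ks.my_table (id, val) VALUES ('k', 'v');   -- UnavailableException with DC1 down
CONSISTENCY LOCAL_QUORUM;
INSERT INTO my_ks.my_table (id, val) VALUES ('k', 'v');   -- succeeds against the DC2 node
Since Titan pins its internal locking and ID allocation to QUORUM, those operations always hit the first case, regardless of the consistency level your application uses.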
Related
I'm new to Cassandra and I'm wondering what will happen if I have multiple datacenters and at some point one datacenter no longer has enough disk space to store all the data.
Assume we have 2 DCs. The first DC can store 1 TB and the second DC can only hold 500 GB. Furthermore, let's say we have a replication factor of 1 for both DCs. As I understand it, both DCs own the complete token ring, so every DC will hold the full data set. What happens now if I push data to DC1 and the total amount of storage needed exceeds 500 GB?
To keep things simple, I will assume that you write the data through DC1, so that one will be the local DC in every scenario, while DC2 (the unavailable one) is always remote. So what really matters here is which consistency level you use for your writes:
consistency level of type LOCAL (LOCAL_QUORUM, ONE, LOCAL_ONE) - you can write your data.
consistency level of type REMOTE (ALL, EACH_QUORUM, QUORUM, TWO, THREE) - you cannot write your data.
I suggest reading about consistency levels.
Also, a very quick test using ccm and the cassandra-stress tool might be helpful to reproduce different scenarios.
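For instance, something along these lines (a rough sketch: the Cassandra version, the node counts, and the dc1/dc2 names that ccm assigns are assumptions to verify with nodetool status):
# two-DC test cluster: 1 node in dc1, 1 node in dc2, started right away
ccm create cltest -v 3.11.4 -n 1:1 -s
# let cassandra-stress create its default keyspace1, then give it the RF=1-per-DC layout
cassandra-stress write n=1000 -node 127.0.0.1
cqlsh 127.0.0.1 -e "ALTER KEYSPACE keyspace1 WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 1, 'dc2': 1};"
# take the dc2 node down
ccm node2 stop
# LOCAL-type consistency levels keep working through the dc1 node
cassandra-stress write n=10000 cl=LOCAL_QUORUM -node 127.0.0.1
# ALL / EACH_QUORUM now fail, since the dc2 replica is unreachable
cassandra-stress write n=10000 cl=ALL -node 127.0.0.1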
Another comment regards your free space: when a node hits the 250 GB mark (half of 500 GB) you will start having compaction issues. The recommendation is to keep about half of the disk empty so compactions can run.
Let's say, however, that you keep writing data to that node and hit the 500 GB mark: Cassandra will stop on that node.
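To keep an eye on this, the per-node data size is visible in nodetool and the free space on the data volume from the OS (the path below is the packaged default; adjust it to your data_file_directories):
nodetool status                  # the Load column shows how much data each node currently holds
df -h /var/lib/cassandra/data    # compare against the remaining disk space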
I have a cassandra 3.11 production cluster with 15 nodes. Each node has ~500GB total with replication factor 3. Unfortunately the cluster is setup with Replication 'SimpleStrategy'. I am switching it to 'NetworkTopologyStrategy'. I am looking to understand the caveats of doing so on a production cluster. What should I expect?
Switching from SimpleStrategy to NetworkTopologyStrategy in a single data center configuration is very simple. The only caveat of which I would warn is to make sure you spell the data center name correctly. Failure to do so will cause operations to fail.
One way to ensure that you use the right data center, is to query it from system.local.
cassdba#cqlsh> SELECT data_center FROM system.local;
data_center
-------------
west_dc
(1 rows)
Then adjust your keyspace to replicate to that DC:
ALTER KEYSPACE stackoverflow WITH replication = {'class': 'NetworkTopologyStrategy',
'west_dc': '3'};
Now for multiple data centers, you'll want to make sure that you specify your new data center names correctly, AND that you run a repair (on all nodes) when you're done. This is because SimpleStrategy treats all nodes as a single data center, regardless of their actual DC definition. So you could have 2 replicas in one DC, and only 1 in another.
I have changed RFs for keyspaces on-the-fly several times. Usually, there are no issues. But it's a good idea to run nodetool describecluster when you're done, just to make sure all nodes have schema agreement.
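For example, for two data centers the keyspace change and the follow-up might look like this ('east_dc' is a made-up second DC name):
ALTER KEYSPACE stackoverflow WITH replication = {'class': 'NetworkTopologyStrategy',
'west_dc': '3', 'east_dc': '3'};
Then, on each node in turn:
nodetool repair --full stackoverflow
And finally confirm that every node reports the same schema version:
nodetool describecluster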
Pro-tip: For future googlers, there is NO BENEFIT to creating keyspaces using SimpleStrategy. All it does, is put you in a position where you have to fix it later. In fact, I would argue that SimpleStrategy should NEVER BE USED.
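Put differently, even with a single data center there is no reason not to start with NetworkTopologyStrategy ('new_keyspace' is just a placeholder; use the real DC name from system.local):
CREATE KEYSPACE new_keyspace WITH replication = {'class': 'NetworkTopologyStrategy',
'west_dc': '3'};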
So when will the data movement commence? In my case, since I have specific rack IDs now, I expect my replicas to switch nodes upon this ALTER KEYSPACE action.
This alone will not cause any adjustment of token range responsibility. If you already have an RF of 3 and so does your new DC definition, you won't need to run a repair, so nothing will stream.
I have a 15-node cluster which is divided into 5 racks, so each rack has 3 nodes belonging to it. Since I previously had replication factor 3 and SimpleStrategy, more than 1 replica could have belonged to the same rack, whereas NetworkTopologyStrategy guarantees that no two replicas will belong to the same rack. So shouldn't this cause data to move?
In that case, if you run a repair your secondary or tertiary replicas may find a new home. But your primaries will stay the same.
So are you saying that nothing changes until I run a repair?
Correct.
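One way to convince yourself: nodetool getendpoints reports where replicas should live under the current strategy and topology, but the data only actually lands on those nodes once a repair streams it (the table and key below are placeholders):
nodetool getendpoints stackoverflow my_table some_key    # new placement, computed immediately after the ALTER
nodetool repair --full stackoverflow                     # run on every node to actually move the data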
I have a three node Cassandra (DSE) cluster where I don't care about data loss so I've set my RF to 1. I was wondering how Cassandra would respond to read/write requests if a node goes down (I have CL=ALL in my requests right now).
Ideally, I'd like these requests to succeed if the data exists - just on the remaining available nodes till I replace the dead node. This keyspace is essentially a really huge cache; I can replace any of the data in the event of a loss.
(Disclaimer: I'm a ScyllaDB employee)
Assuming your partition key was unique enough, when using RF=1 each of your 3 nodes contains 1/3 of your data. BTW, in this case CL=ONE/ALL is basically the same as there's only 1 replica for your data and no High Availability (HA).
Requests for "existing" data served by the 2 up nodes will succeed. Still, when one of the 3 nodes is down, 1/3 of your client requests (for existing data) will not succeed, as basically 1/3 of your data is not available until the down node comes back up (note that nodetool repair is irrelevant when using RF=1), so I guess restoring from a snapshot (if you have one available) is the only option.
While the node is down, once you remove it from the ring (nodetool removenode for a dead node; nodetool decommission only works when run on a live node), its token ranges will be re-distributed between the 2 remaining nodes, but that will apply only to new writes and reads.
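For a dead node that is not coming back, the removal looks roughly like this (the UUID is a placeholder for the Host ID that nodetool status shows next to the DN node):
nodetool status                                            # note the Host ID of the DN node
nodetool removenode 11111111-2222-3333-4444-555555555555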
You can read more about the ring architecture here:
http://docs.scylladb.com/architecture/ringarchitecture/
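You can also see the split directly ('my_keyspace' is a placeholder):
nodetool status my_keyspace
With RF=1 the Owns column shows roughly 33.3% per node with no overlap, so whatever share the down node owns is simply unreachable until it returns.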
I am new to Cassandra.
I have a cluster which contains two nodes, and I have set the replication factor to 1. Now if one node goes down, I can insert data with no errors with Consistency = ONE. After the insert, if I try to read the same data, it gives me an error:
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ONE" info={'required_replicas': 1, 'alive_replicas': 0, 'consistency': 'ONE'}
Why didn't Cassandra read the data from the coordinator node? If one node is up, then alive_replicas should be 1, shouldn't it?
I am using cqlsh client.
A replication factor of 1 means that each piece of data exists only once (it is not the number of additional copies, but the total number of copies). With a cluster of two nodes and RF=1, roughly 50% of your data will be on node1 and the other half will reside on node2.
You can verify this with the command (check the percentages under the Owns column)
nodetool status your_keyspace_name
Now, if one of your nodes is down, then only those keys are accessible that are stored on the live node. This applies for both read and write. Thus the operations affecting the live node will succeed while the ones affecting the dead node will fail. You can check which node is responsible for any given partition key with the command
nodetool getendpoints your_keyspace your_table your_key
So to answer your question, I suppose that the successful write affected the live node, while the failing read affected the node which was down.
I would like to understand the following:
Suppose we have two data centers DC1 and DC2, each with two nodes.
Now I have formed a token ring with the order DC1:1 - DC2:1 - DC1:2 - DC2:2.
Let us assume, I have not configured my replicas across DCs.
Now my question is, if I write a data into say DC2, will the key be mapped only to the nodes in DC2 or will it get mapped to any of the nodes in the token ring?
If your keyspace replication options are set to
{DC1:2}
(I assume this is what you mean by replicas not being configured across DCs), then data will only be stored in DC1, because implicitly the replication factor for DC2 is zero. You can write data to any node (in DC1 or DC2) and it will be forwarded. This is because, in Cassandra, the destination of a write does not depend on which node the write was sent to.
If, however, you use
{DC1:2, DC2:2}
then all data will be written to all nodes, again regardless of where the write is made.
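As a quick check, you can create both variants and ask Cassandra where a key would live (keyspace and table names are illustrative, and the DC names must match what your snitch reports):
CREATE KEYSPACE only_dc1 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2};
CREATE KEYSPACE both_dcs WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2};
After creating a table in each, nodetool getendpoints only_dc1 my_table some_key returns nodes from DC1 only, while nodetool getendpoints both_dcs my_table some_key returns nodes from both DCs, no matter which node the write was sent to.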