My cluster consists of 20 nodes, all in the same DC and rack.
Keyspace DDL is:
create keyspace hello with replication = {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'};
Table DDL is:
create table world
(
foo text,
bar text,
baz text,
primary key (foo, bar)
)
with compression = {'chunk_length_in_kb': '16', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
and gc_grace_seconds = 86400;
create index idx_world_bar
on world (bar);
One of the nodes has had a disk failure, so nodetool status now reports it as DN.
In this situation, when I run a query like:
select * from hello.world where bar='..';
every query fails with NoHostAvailable.
(I know this query is a bad pattern.)
I think the reason is:
the secondary index is a local index.
the rf of the hello keyspace is 1.
so the coordinator has to search every node.
one of the nodes is down, so NoHostAvailable is raised.
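The reasoning above can be sketched as a toy model. This is only an illustration of the asker's hypothesis, not Cassandra's actual query planner; the node names and the "one token range per node" layout are assumptions made for the example.

```python
# Toy model: 20 nodes, RF=1, so each token range lives on exactly one node.
# Node names and the one-range-per-node layout are illustrative assumptions.
NODES = [f"node{i}" for i in range(1, 21)]


def ranges_without_live_replica(down_nodes):
    """Return the nodes whose token range has no live replica (RF=1)."""
    return [node for node in NODES if node in down_nodes]


def index_query_can_succeed(down_nodes):
    """A query on the indexed column alone carries no partition key, so it
    cannot be routed to a single replica; every range must be reachable."""
    return not ranges_without_live_replica(down_nodes)


print(index_query_can_succeed(set()))
print(index_query_can_succeed({"node7"}))
```

With every node up the query can be served; with any single node down, one range has no live replica and the whole scan fails.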
These are the solutions I have thought of, but neither is ideal.
1st solution
stop Cassandra on the down node, replace the faulty disk, and restart Cassandra.
but replacing the disk and restarting takes a long time, and the node stays down during that time.
2nd solution
remove the down node with nodetool removenode.
replace the faulty disk, clear the data, and bootstrap Cassandra.
this solution loses data because the data is cleared, and nodetool repair is useless because rf=1.
this solution may also cause token redistribution.
Is there any other way
to avoid NoHostAvailable while keeping the node down?
Or
to restore the data and token range after removing the node?
Or could you suggest the best solution for this situation?
Related
I have a development Cassandra cluster of two nodes (let's call them NodeA and NodeB). I also have a script that is continuously sending data to NodeA. I have created the database with the following parameters:
CREATE KEYSPACE test_database WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
Now, for some reason NodeB stops after some time. The issue is that as soon as NodeB stops, the script that is sending data to NodeA starts giving data insertion errors.
Can anyone point out a probable reason for this?
Update: Both the nodes are seed nodes.
How Cassandra partitions data
Each key in Cassandra is converted to a token. When you set up your cluster, each node works out which range of tokens it is responsible for.
Let's take a simple example:
You have two nodes, and tokens that go from 0 to 9. A simple split would be: node A stores every token from 0 to 4 and node B stores every token from 5 to 9.
How Cassandra handles a write
You choose a coordinator (in your case node A) that receives the data. This node then calculates a token from the key. As in the example above, every node has a range of tokens assigned to it. So if the key is converted to token 4, the data goes to node A (here also the coordinator). If the token is 8, the data is sent to node B.
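The two-node example can be written down as a small sketch. The 0-9 token space and the MD5-modulo-10 hash are assumptions chosen to match the example; real Cassandra partitioners use a far larger token space.

```python
# Toy model of the example: tokens 0-9, node A owns 0-4, node B owns 5-9.
# The md5 % 10 hash is an illustrative assumption, not Cassandra's partitioner.
import hashlib

TOKEN_RANGES = {"A": range(0, 5), "B": range(5, 10)}


def token_for(key: str) -> int:
    """Map a partition key to a token in the toy 0-9 token space."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 10


def owner_of(token: int) -> str:
    """Return the node whose token range contains the given token."""
    for node, tokens in TOKEN_RANGES.items():
        if token in tokens:
            return node
    raise ValueError(f"token {token} is outside the ring")


print(owner_of(4))  # A
print(owner_of(8))  # B
```

Whichever node acts as coordinator, it performs this lookup and forwards the write to the owning node.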
What the replication factor is
The replication factor is how many times your data is stored in your cluster. For a single datacenter with no racks (your case), the data is first sent to the node that owns the token associated with the key, and the additional replicas are sent to the next nodes in the ring.
If one node fails, the replicas will help that node restore its data.
In your case the replication factor is 1, so there are no additional replicas: if a node is down, Cassandra can't store the data whose tokens belong to that node and throws an error. With a replication factor of 2, Cassandra should be able to store a replica on node A and not fail.
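To make the difference between replication factor 1 and 2 concrete, here is a minimal sketch of the two-node case. The ring order, the "next node in the ring" placement, and the single required acknowledgement are simplifying assumptions.

```python
# Toy two-node ring; replicas are placed on the owner and then on the next
# nodes in ring order (a simplified view of SimpleStrategy placement).
RING = ["A", "B"]


def replicas_for(owner, replication_factor):
    """Return the nodes that hold a copy of data owned by `owner`."""
    start = RING.index(owner)
    count = min(replication_factor, len(RING))
    return [RING[(start + i) % len(RING)] for i in range(count)]


def can_write(owner, replication_factor, alive, required_acks=1):
    """True if enough replicas are alive to acknowledge the write."""
    live = [node for node in replicas_for(owner, replication_factor) if node in alive]
    return len(live) >= required_acks


# Node B is down; the key's token belongs to node B.
print(can_write("B", 1, alive={"A"}))  # False
print(can_write("B", 2, alive={"A"}))  # True
```

Writes whose token belongs to node A keep working at replication factor 1, which is why only some inserts fail when node B stops.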
Cassandra's Replication Factor:
Let's say the replication factor is 'n'. This means each piece of data is stored on, and can be read from, 'n' nodes.
If you set the replication factor to '1', only one node will have the data.
Partitioning:
Let's say we have 2 nodes. Whenever you insert data, each of these nodes will receive some of it, based on the partitioning algorithm in use.
For example:
You are inserting 10 records. Based on the hashing and partitioning algorithm, Cassandra chooses which node each record is written to. Of course, the node is identified by the coordinator :)
Durable Writes:
By default, Cassandra always writes to the commit log as well as the memtable, before the data is flushed to disk as an SSTable. If you set durable_writes to false, writes bypass the commit log and go only to the memtable, so data that has not yet been flushed to an SSTable can be lost if the node crashes.
For the problem you have mentioned, let's say you are inserting 10 rows.
For simplicity, assume the partitioning/hashing sends half of the rows (n/2) to each node.
So the coordinator node splits your data into two halves (10/2 = 5 rows each). It tries to write the first half to the first node and succeeds. It then tries to write the second half to the second node, and since that node is unavailable, it throws an error.
So how do we fix this issue? Let's say I want to batch multiple insert queries while 1 node in the cluster is down. It returns:
Connection to Cassandra cluster associated with connection cs1 not available due to Host not available. Host Address: cassandra1
If your table is not a counter table, you can use a consistency level of ANY, which gives high availability for writes.
See this to learn more about it: https://www.datastax.com/blog/2011/05/understanding-hinted-handoff-cassandra-08
I am a complete newbie at Cassandra and am just setting it up and playing around with it and testing different scenarios using cqlsh.
I currently have 4 nodes in 2 datacenters something like this (with proper IPs of course):
a.b.c.d=DC1:RACK1
a.b.c.d=DC1:RACK1
a.b.c.d=DC2:RACK1
a.b.c.d=DC2:RACK1
default=DCX:RACKX
Everything seems to make sense so far except that I brought down a node on purpose just to see the resulting behaviour and I notice that I can no longer query/insert data on the remaining nodes as it results in "Unable to complete request: one or more nodes were unavailable."
I get that a node is unavailable (I did that on purpose), but isn't one of the main points of a distributed DB that it continues to work even as some nodes go down? Why does bringing one node down put a complete stop to everything?
What am I missing?
Any help would be greatly appreciated!!
You're correct in assuming that one node down should still allow you to query the cluster, but there are a few things to consider.
I'm assuming that "nodetool status" returns the expected results for that DC (i.e. "UN" for the UP node, "DN" for the DOWNed node)
Check the following:
Connection's Consistency level (default is ONE)
Keyspace replication strategy and factor (default is Simple, rack/dc unaware)
In cqlsh, "describe keyspace "
Note that if you've been playing around with replication factor you'll need to run a "nodetool repair" on the nodes.
More reading here
Is it possible that you did not set the replication factor on your keyspace with a value greater than 1? For example:
CREATE KEYSPACE "Excalibur"
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2};
This will configure your keyspace so that data is replicated to 2 nodes in each of the dc1 and dc2 datacenters.
If your replication factor is 1 and the node that owns the data you are querying goes down, you will not be able to retrieve the data, and C* will fail fast with an unavailable error. In general, if C* detects that the consistency level cannot be met for your query, it will fail fast.
I have a Cassandra cluster with three nodes, two of which are up. They are all in the same DC. When my Java application goes to write to the cluster, I get an error in my application that seems to be caused by some problem with Cassandra:
Caused by: com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive)
at com.datastax.driver.core.exceptions.UnavailableException.copy(UnavailableException.java:79)
The part that doesn't make sense is that "1 required but only 0 alive" statement. There are two nodes up, which means that one should be "alive" for replication.
Or am I misunderstanding the error message?
Thanks.
You are likely getting this error because the keyspace that contains the table you are querying has a replication factor of one. Is that correct?
If the partition you are reading / updating does not have enough available replicas (nodes with that data) to meet the consistency level, you will get this error.
If you want to be able to handle more than 1 node being unavailable, what you could do is look into altering your keyspace to set a higher replication factor, preferably three in this case, and then running a nodetool repair on each node to get all of your data on all nodes. With this change, you would be able to survive the loss of 2 nodes to read at a consistency level of one.
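The availability check described here can be sketched as follows. It covers only the ONE, QUORUM and ALL levels in a single datacenter and is a simplified model, not the driver's or the server's actual code; the function names are hypothetical.

```python
def required_replicas(consistency, replication_factor):
    """Number of live replicas a request needs at the given consistency level."""
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        return replication_factor // 2 + 1
    if consistency == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency}")


def is_available(consistency, replication_factor, alive_replicas):
    """True if the request can be served; otherwise Cassandra fails fast."""
    return alive_replicas >= required_replicas(consistency, replication_factor)


# RF=1 and the only replica of the partition is down:
# "1 required but only 0 alive".
print(is_available("ONE", 1, alive_replicas=0))  # False

# RF=3 with two of the three replicas down still satisfies ONE.
print(is_available("ONE", 3, alive_replicas=1))  # True
```

Note that `alive_replicas` counts only the nodes holding the partition in question, not every live node in the cluster, which is why two live nodes can still produce "0 alive".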
This cassandra parameters calculator is a good reference for understanding the considerations of node count, replication factor, and consistency levels.
I hit this today because the datacenter field is case sensitive. If your dc is 'somedc01' this isn't going to work:
replication =
{
'class': 'NetworkTopologyStrategy',
'SOMEDC01': '3' # <-- BOOM!
}
AND durable_writes = true;
Anyway, it's not that intuitive, hope this helps.
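A quick way to catch this kind of mismatch is to compare the datacenter names in the replication map with the names the cluster actually reports (for example in nodetool status). A minimal sketch, where the helper name and the hard-coded inputs are hypothetical:

```python
def unknown_datacenters(replication, known_dcs):
    """Return replication-map keys that do not exactly match a known DC name.

    The comparison is deliberately case sensitive, like Cassandra's.
    """
    return sorted(
        name for name in replication
        if name != "class" and name not in known_dcs
    )


replication = {"class": "NetworkTopologyStrategy", "SOMEDC01": "3"}
known_dcs = {"somedc01"}  # the name as the cluster reports it

print(unknown_datacenters(replication, known_dcs))  # ['SOMEDC01']
```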
In my case, I got a "0 available" message, but Cassandra was up and cqlsh worked correctly. The problem was accessing from Java: the query was for a complete table, and some records were not accessible (all nodes containing them were down). From cqlsh, select * from table works and only shows the accessible records. So the solution is to recover the down nodes, and maybe to change replication factors with:
ALTER KEYSPACE ....
nodetool repair -all
Then run nodetool status to see the changes and the cluster structure.
For me it was that my endpoint_snitch was still set to SimpleSnitch instead of something like GossipingPropertyFileSnitch. This was preventing the multi-DC cluster from connecting properly and manifesting in the error above.
Let's say I've created a keyspace and table:
CREATE KEYSPACE IF NOT EXISTS keyspace_rep_0
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 0};
CREATE TABLE IF NOT EXISTS some_table (
some_key ascii,
some_data ascii,
PRIMARY KEY (some_key)
);
I don't want any replica of this data. I can insert into this table with consistency level ANY. But I couldn't select any data from this table.
I got the following errors when querying with consistency levels ANY and ONE, respectively:
message="ANY ConsistencyLevel is only supported for writes"
message="Cannot achieve consistency level ONE"
info={'required_replicas': 1, 'alive_replicas': 0, 'consistency': 1}
I've tried other read consistency levels but none of them worked for me.
This is very similar to choosing 'replication_factor': 1 and shutting down a node. Again I couldn't select any data. All read consistency levels require at least one replica to be up. Is this how Cassandra works? You cannot select data without replication? What am I missing?
Every copy of the data, including the original, is a replica. Replication factor is not a count of additional copies, it is the total number of copies. You need RF >= 1.
I'm rather surprised that it allows RF == 0. With no replicas available, there's nothing to read. However, a comment on CASSANDRA-4486 indicates that this is intentionally allowed, but for special purposes:
. . . the point is that it's legitimate to set up a zero-replication keyspace (this is common when adding a new datacenter) and change it later. In the meantime, it's correct to reject writes to it.
And the write does not result in an error probably due to hinted handoff as mentioned in the descriptions for consistency levels, for ANY:
A write must be written to at least one node. If all replica nodes for the given partition key are down, the write can still succeed after a hinted handoff has been written. If all replica nodes are down at write time, an ANY write is not readable until the replica nodes for that partition have recovered.
So, if you want confirmation that your write was persisted to at least one node and not rely on the hinted handoff (which can expire), then write with consistency level ONE and not ANY.
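The difference between ANY and ONE can be summarized in a small model. This is a simplified sketch of the behaviour quoted above, with hypothetical function names, not Cassandra's implementation.

```python
def write_outcome(consistency, alive_replicas):
    """Outcome of a write for the partition's replicas at ANY or ONE."""
    if alive_replicas >= 1:
        return "written to a replica"
    if consistency == "ANY":
        # No replica is up: the coordinator stores a hint and reports success.
        return "hint stored only"
    if consistency == "ONE":
        return "unavailable"
    raise ValueError(f"unsupported consistency level: {consistency}")


def read_outcome(consistency, alive_replicas):
    """Outcome of a read; ANY is rejected because it is write-only."""
    if consistency == "ANY":
        return "invalid: ANY is only supported for writes"
    return "ok" if alive_replicas >= 1 else "unavailable"


print(write_outcome("ANY", alive_replicas=0))  # hint stored only
print(write_outcome("ONE", alive_replicas=0))  # unavailable
print(read_outcome("ONE", alive_replicas=0))   # unavailable
```

A write that only produced a hint cannot be read back until a replica for that partition is available, which matches the errors in the question.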
I'm trying to perform an insert on a brand new install of Cassandra 2, and while I was able to set up a new keyspace and table just fine, I get the error mentioned above when attempting to perform an insert.
I don't have any fancy multi-server setup; it's just running on one computer with a test db, hence my confusion about node configuration.
Commands used to create said items are:
CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 };
USE demodb;
CREATE TABLE users (user_name varchar, state varchar, birth_year bigint, PRIMARY KEY (user_name));
INSERT INTO users (user_name, state, birth_year) VALUES ('canadiancreed', 'PA', 1976);
CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 };
Is most likely your culprit. It says that data in the demodb keyspace belongs in DC1 and should be replicated 3 times. If your single test node is not specified as being a member of DC1, any request to insert into this keyspace will fail. In addition, if it is a member of DC1 and the consistency level is greater than 1, all requests will fail because it will be impossible for the write to get more than one acknowledgment.
Check what your datacenter is named (nodetool status) and adjust the keyspace replication details to match. That will most likely solve your problems.
---- Edited for more Details and Better Formatting ----
This is one of the most common errors new users have with Cassandra. Basically, in Cassandra there are logical units of hardware we call datacenters. A datacenter is supposed to represent a group of machines that is distinct geographically or in some other way. You can create many of these to prevent a failure in one geographic location from taking your application offline.
Keyspaces are a logical structure for organizing groups of information; a keyspace is analogous to a database in the relational world. Each keyspace specifies which machines, and how many of them, it should replicate to. If we use the NetworkTopologyStrategy, the replication is specified on a per-datacenter basis. We specify these details at creation time (although they can be modified later) using "CREATE KEYSPACE .... WITH REPLICATION ".
In your statement above you have specified that all information within the keyspace demodb should be placed in the datacenter "DC1" and that there should be 3 copies of the data in that datacenter. This basically assumes you have at least 3 nodes in DC1 and want a copy of the data on each of them. This by itself will not cause an insert to fail unless the entire datacenter is unknown to the Cassandra cluster. This would be the case if you did no initial configuration of your C* cluster and are just running off the stock yaml.
Running nodetool status will show you what a current node believes about the state of the cluster. Here is the output from C* running off my local machine.
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Owns (effective)  Host ID                               Token                Rack
UN  127.0.0.1  93.37 KB  100.0%            50be3bec-7e30-4385-bd4a-918055a29292  4731866028208108826  rack1
This output shows that I have a single node operating within a datacenter named "Cassandra". This means any inserts to keyspaces which require replicas in other datacenters will fail because the cluster doesn't know how to handle those requests. (If the nodes were simply down but we had seen them before, we could save hints, but if the other DC has never been seen we reject the request because the cluster has most likely been misconfigured.)
To fix this situation I would modify my keyspace using
cqlsh:demodb> ALTER KEYSPACE demodb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'Cassandra' : 1 };
Now demodb requires a copy of the data on 1 machine in the datacenter Cassandra. This is great because, as my nodetool output states, I have one node in a datacenter named Cassandra. If I try an insert now, it passes.
cqlsh:demodb> INSERT INTO users (user_name, state, birth_year) VALUES ('canadiancreed', 'PA', 1976);
cqlsh:demodb> select * from users where user_name = 'canadiancreed' ;
user_name | birth_year | state
---------------+------------+-------
canadiancreed | 1976 | PA
(1 rows)
I would also change my schema setup script to use the correct datacenter name:
CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'Cassandra' : 1 };
In case you end up here after a Google search: I found that you can get this error if you are using the consistency level ALL (this might also be the case for QUORUM with specific replication factor numbers) and the keyspace you use is set up to be replicated to a non-existent or dead datacenter.
Updating the keyspace replication to remove reference to the non-existent datacenter solves the issue.
(and the message is entirely logical in this case: you want results from nodes that don't exist anymore).