Cassandra: node becomes unavailable while ingesting with Spark

After a few successful ingestions into Cassandra with Spark,
an error is now returned every time I try to ingest data with Spark (either after a few minutes or instantly):
Caused by: com.datastax.oss.driver.api.core.AllNodesFailedException: Could not reach any contact point, make sure you've provided valid addresses
I checked with plain cqlsh (not Spark), and a similar error is returned there too (for 2 nodes out of 4):
Connection error: ('Unable to connect to any servers', {'1.2.3.4': error(111, "Tried connecting to [('1.2.3.4', 9042)]. Last error: Connection refused")})
So basically, when I ingest into Cassandra with Spark, some nodes go down at some point, and I have to reboot a node before I can reach it again through cqlsh (and Spark).
What is strange is that nodetool status still reports the node as "UP" while cqlsh gets connection refused for that same node.
I tried to investigate the logs, but there is a big problem: there is nothing in them, not a single exception raised server-side.
What should I do in this case? Why does a node go down or become unresponsive? How can I prevent it?
Thanks
!!! edit !!!
Some of the details that were asked for are below:
Cassandra infrastructure :
network : 10 gbps
two datacenters : datacenter1 and datacenter2
4 nodes in each datacenter
2 replicas per datacenter :
CREATE KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '2', 'datacenter2': '2'} AND durable_writes = true;
consistency used for input and output : LOCAL_QUORUM
total physical memory per node : 128GB.
memory allocation per node : 64GB dedicated to each Cassandra instance, and 64GB dedicated to each Spark worker (colocated on each Cassandra node)
storage : 4 TB NVME for each node
Spark application config :
total executors cores : 24 cores (4 instances * 6 cores each)
total executors ram : 48 GB (4 instances * 8 GB each)
cassandra config on spark :
spark.sql.catalog.cassandra.spark.cassandra.output.batch.size.rows 1
spark.sql.catalog.cassandra.spark.cassandra.output.concurrent.writes 100
spark.sql.catalog.cassandra.spark.cassandra.output.batch.grouping.key none
spark.sql.catalog.cassandra.spark.cassandra.output.throughputMBPerSec 80
spark.sql.catalog.cassandra.spark.cassandra.output.consistency.level LOCAL_QUORUM
spark.sql.catalog.cassandra.spark.cassandra.output.metrics false
spark.sql.catalog.cassandra.spark.cassandra.connection.timeoutMS 90000
spark.sql.catalog.cassandra.spark.cassandra.query.retry.count 10
spark.sql.catalog.cassandra com.datastax.spark.connector.datasource.CassandraCatalog
spark.sql.extensions com.datastax.spark.connector.CassandraSparkExtensions

(2 nodes of 4)
Just curious, but what is the replication factor (RF) of the keyspace, and what consistency level is being used for the write operation?
I'll echo Alex, and say that usually this happens because Spark is writing faster than Cassandra can process. That leaves you with two options:
Increase the size of the cluster to handle the write load.
Throttle-back the write throughput of the Spark job.
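To get a feel for how much write pressure the posted settings allow, here is a rough back-of-the-envelope sketch. It assumes, as I read the Spark Cassandra Connector reference, that output.throughputMBPerSec is a per-core limit and output.concurrent.writes is a per-task limit, with one task running per core; the function and variable names are made up for illustration.

```python
# Rough upper bounds on write pressure from the Spark settings in the question.
# Assumption: throughputMBPerSec applies per executor core and
# concurrent.writes applies per running task (one task per core).

def write_pressure(total_cores, throughput_mb_per_core, concurrent_writes_per_task):
    """Return (aggregate MB/s ceiling, max in-flight write batches)."""
    aggregate_mb_per_sec = total_cores * throughput_mb_per_core
    in_flight_batches = total_cores * concurrent_writes_per_task
    return aggregate_mb_per_sec, in_flight_batches

# Values from the question: 4 executors * 6 cores, 80 MB/s, 100 concurrent writes.
mb_per_sec, in_flight = write_pressure(24, 80, 100)
print(f"ceiling: {mb_per_sec} MB/s, in-flight batches: {in_flight}")
```

Under those assumptions the throttle only kicks in at 1920 MB/s across the job, which is more than a 10 Gbps link (about 1250 MB/s) can carry, so it is probably not limiting anything. Lowering both values would be one way to throttle back.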
One thing worth calling out:
2 replicas per datacenter
consistency used for input and output : LOCAL_QUORUM
So you'll probably get more throughput by dropping the write consistency to LOCAL_ONE.
Remember, quorum == RF / 2 + 1 (integer division), which means LOCAL_QUORUM with an RF of 2 is 2.
So I do recommend dropping to LOCAL_ONE, because right now Spark is effectively operating at ALL consistency within its local datacenter.
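The arithmetic can be checked with a minimal sketch. The replication map is copied from the CREATE KEYSPACE statement in the question; the helper name quorum is just for illustration.

```python
def quorum(replicas):
    """Replicas that must acknowledge: floor(replicas / 2) + 1."""
    return replicas // 2 + 1

# Replication map from the CREATE KEYSPACE statement in the question.
replication = {"datacenter1": 2, "datacenter2": 2}

for dc, rf in replication.items():
    needed = quorum(rf)
    print(f"{dc}: LOCAL_QUORUM needs {needed} of {rf}, tolerates {rf - needed} down")
```

With 2 replicas per datacenter, LOCAL_QUORUM needs both local replicas, so a single slow or unresponsive replica is enough to make writes fail.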
Which JMX indicators do I need to care about?
Can't remember the exact name of it, but if you can find the metric for disk IOPS or throughput, I wonder if it's hitting a threshold and plateauing.

Related

ResponseError: Not enough replicas available for query at consistency SERIAL (2 required but only 1 alive)

I am a newcomer to Cassandra and I have run into an issue. My Cassandra setup is as follows:
1 DC, 1 Cluster
3 Nodes.
SimpleStrategy
durable write : true
Replication factor : 2 when creating keyspace.
Use IF NOT EXISTS to insert data into table.
Seed node: 2 of them
Then I brought down one seed node and got the following error:
ResponseError: Not enough replicas available for query at consistency SERIAL (2 required but only 1 alive)
That's normal: SERIAL requires a Paxos transaction with a quorum of replicas. For RF 2, the quorum is 2; in other words, you cannot tolerate any replica being down when writing at SERIAL to a keyspace with RF 2.
Rule of thumb: don't use RF 2, it's useless. Your quorum is (2/2)+1 = 2, and for RF 3 the quorum is also 2 (3/2 rounds down to 1), so RF 3 gives you one spare replica for the same quorum size. So you should always prefer RF 3. If you change your keyspace to RF 3, your application will be able to write at SERIAL even if one replica is down.
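As a quick illustration of that rule of thumb, the sketch below tabulates the quorum size and the number of replicas that can be down for a few replication factors. The helper names are illustrative only.

```python
def quorum(rf):
    """Quorum size for a replication factor: floor(rf / 2) + 1."""
    return rf // 2 + 1

def tolerated_failures(rf):
    """Replicas that can be down while a quorum is still reachable."""
    return rf - quorum(rf)

for rf in range(1, 6):
    print(f"RF={rf}  quorum={quorum(rf)}  tolerated down={tolerated_failures(rf)}")
```

RF 2 and RF 3 share the same quorum size, but only RF 3 leaves a replica to spare.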
Also see https://www.ecyrd.com/cassandracalculator/
As per my understanding, consistency SERIAL is equivalent to QUORUM. You have RF=2 in a 3-node cluster, and Cassandra places data according to the hash of the partition key, so the data you inserted may have been placed on both seed nodes. When you then access that data with one seed node down, you get this error because the cluster cannot achieve the required consistency level.
Please refer to the link below for more details.
https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbInternals/dbIntConfigSerialConsistency.html

JanusGraph Not enough replicas available for query at consistency QUORUM

We have a 7-node storage cluster with RF=4 and CL=ONE. JanusGraph has the settings below in its properties file:
storage.cql.replication-factor=4
storage.cql.read-consistency-level=ONE
storage.cql.write-consistency-level=ONE
log.tx.key-consistent=true
When we stopped 2 nodes (out of 7), JanusGraph started failing with the errors below:
gremlin-server.log:Caused by: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency QUORUM (3 required but only 2 alive)
I tried log.tx.key-consistent=true, but it's not working.
Can you please assist here?
Obviously there is a quorum operation going on. It seems the CL=ONE configuration wasn't enough.
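To see why "3 required but only 2 alive" shows up once two nodes are stopped, here is a toy sketch. It assumes a simplified ring with one token per node, where each token range is replicated on its owner plus the next RF - 1 nodes; real clusters with vnodes or rack-aware placement will differ in detail, and all names are hypothetical.

```python
from itertools import combinations

def replica_sets(num_nodes, rf):
    """One replica set per token range: the owner plus the next rf - 1 nodes."""
    return [{(start + k) % num_nodes for k in range(rf)} for start in range(num_nodes)]

def ranges_below_quorum(num_nodes, rf, down):
    """Count token ranges whose live replicas fall short of a quorum."""
    needed = rf // 2 + 1
    return sum(1 for replicas in replica_sets(num_nodes, rf)
               if len(replicas - set(down)) < needed)

# 7 nodes, RF=4 (quorum 3), every possible pair of stopped nodes.
counts = [ranges_below_quorum(7, 4, pair) for pair in combinations(range(7), 2)]
best, worst = min(counts), max(counts)
print(f"ranges below quorum with 2 nodes down: between {best} and {worst} of 7")
```

In this toy model every choice of two stopped nodes leaves at least one range with only 2 live replicas, so any operation that internally runs at QUORUM can hit the error.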

Cassandra clustering failover - High Availability

I have configured a Cassandra cluster with 3 nodes:
Node1(192.168.0.2) , Node2(192.168.0.3), Node3(192.168.0.4)
Created a keyspace 'test' with a replication factor of 2.
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
When I stop either Node2 or Node3 (one at a time, or both at once), I am able to do CRUD operations on keyspace.table.
When I stop Node1 and try to update/create a row from Node4 or Node3, I get the following error, although Node3 and Node4 are up and running:
All host(s) tried for query failed (tried: /192.168.0.4:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)))
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /192.168.0.4:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)))
I am not sure how Cassandra elects a leader if a leader node dies.
You are using replication_factor 2, so each piece of data in your keyspace is stored on only 2 nodes (not on all 3).
My first advice is to change the RF to 3.
You have to pay attention to the consistency level you are using. If you have only 2 copies of your data (RF 2) and you use consistency level QUORUM, each request must be acknowledged by half of the replicas + 1, which in this case means both of them. So if 1 node is down, you will not be able to read or write the data it holds.
To verify where the data is replicated, look at the ring of your cluster. As you are using SimpleStrategy, replicas are placed on consecutive nodes in clockwise direction around the ring. In your case the data is on the nodes at 192.168.0.2 and 192.168.0.3.
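The clockwise placement can be sketched as follows. This is a simplified model, not Cassandra's actual code: the tokens 0, 100 and 200 are hypothetical (the real ones come from nodetool ring), and each node is assumed to own a single token.

```python
from bisect import bisect_left

# Hypothetical single-token ring; real tokens come from `nodetool ring`.
RING = [(0, "192.168.0.2"), (100, "192.168.0.3"), (200, "192.168.0.4")]

def replicas_for(token, rf, ring=RING):
    """Owner of the token plus the next rf - 1 nodes clockwise."""
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping past the end.
    start = bisect_left(tokens, token) % len(ring)
    return [ring[(start + k) % len(ring)][1] for k in range(rf)]

for key_token in (50, 150, 250):
    print(key_token, replicas_for(key_token, rf=2))
```

In this toy ring a given key lives on two of the three nodes, and which two depends on where the key's token falls, so stopping any single node removes one replica for part of the data.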
Take a look at the concepts of replication factor: http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html
And Consistency Level: http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
Great answer about RF vs CL: https://stackoverflow.com/a/24590299/6826860
You can use this calculator to find out whether your setup has decent consistency. In your case the result is: "You can survive the loss of no nodes without impacting the application".
I think I wasn't clear in my response. The replication factor is how many copies of your data will exist. The consistency level is how many copies must be acknowledged before the client gets a response from the server.
Example: all your nodes are up and RF is 3. The client issues a write with CL QUORUM; the coordinator sends it to all 3 replicas and replies to the client as soon as 2 of them (3/2 + 1) have acknowledged, while the third replica completes in the background.
In your example, if you shut down 2 nodes of a 3-node cluster you can never reach a QUORUM (with CL QUORUM), so you have to use consistency level ONE; once the nodes are up again, Cassandra will copy the data to them. One thing that can happen: before Cassandra has copied the data to the other 2 nodes, the client sends a request to node1 or node2 and the data is not there yet.

Consistency level in cassandra issue

Environment :
A 5-machine Cassandra 2.1.15 cluster.
RF = 3, CL = QUORUM
1 machine went down for more than 3 hours, with no possibility of bringing it back.
We decided to do a noderemove and replace it.
The problem I saw is this:
I put a heavy load on the cluster:
cassandra-stress write n=50000000 cl=QUORUM -rate threads=1000 -node 192.168.0.171,192.168.0.177,192.168.0.178,192.168.0.179,192.168.0.220
At one point it gave me this error:
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency QUORUM (3 replica were required but only 2 acknowledged the write)
To my knowledge, QUORUM = RF/2 + 1 with the division rounded down, so only 2 replicas should be required.
Is this some kind of bug? Does it have any negative impact?
Are you certain that cassandra-stress is using your keyspace? If you have not configured it to do so, it is probably using its own default keyspace, with a replication factor equal to the number of nodes (RF 5 gives a quorum of 5/2 + 1 = 3, which would match the error). Try using the -schema switch for cassandra-stress.
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html
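One way to sanity-check this is to work backwards from the error message: which replication factors produce a quorum of 3? A minimal sketch, with illustrative helper names:

```python
def quorum(rf):
    """Quorum size for a replication factor: floor(rf / 2) + 1."""
    return rf // 2 + 1

def rfs_requiring(required, max_rf=10):
    """Replication factors whose quorum equals the reported 'required' count."""
    return [rf for rf in range(1, max_rf + 1) if quorum(rf) == required]

print(rfs_requiring(3))  # the error reported 3 required replicas
print(rfs_requiring(2))  # what a keyspace with RF 3 would report
```

A required count of 3 only fits RF 4 or RF 5, so the writes cannot have been going to a keyspace with RF 3.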

cassandra replica exception HUnavailableException

I have a Cassandra pair of 2 datacenters with single replication; each datacenter contains a single node, and the two datacenters are on separate physical servers on the network. The idea is that if one datacenter crashes, the other one will continue to be available for reads and writes. I started up my Java application on a 3rd server, and everything is running OK: it's reading and writing to Cassandra.
Next I disconnected the 2nd datacenter server from the network by pulling the network cable.
I expected the application to continue running with no exceptions against the 1st datacenter, but that was not the case.
The following exception started to occur in the application:
me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:60)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl$9.execute(KeyspaceServiceImpl.java:354)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl$9.execute(KeyspaceServiceImpl.java:343)
at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSuperColumn(KeyspaceServiceImpl.java:360)
at me.prettyprint.cassandra.model.thrift.ThriftSuperColumnQuery$1.doInKeyspace(ThriftSuperColumnQuery.java:51)
at me.prettyprint.cassandra.model.thrift.ThriftSuperColumnQuery$1.doInKeyspace(ThriftSuperColumnQuery.java:45)
at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
at me.prettyprint.cassandra.model.thrift.ThriftSuperColumnQuery.execute(ThriftSuperColumnQuery.java:44)
Once I reconnected the network cable to the 2nd server, the error stopped.
Here are more details, on Cassandra 1.0.10:
1) Here's the describe output from Cassandra on both datacenters
Keyspace: AdvancedAds:
Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
Durable Writes: true
Options: [DC2:1, DC1:1]
2) I ran nodetool ring against each instance
./nodetool -h 111.111.111.111 -p 11000 ring
Address          DC   Rack  Status  State   Load     Owns     Token
                                                              1
111.111.111.111  DC1  RAC1  Up      Normal  1.07 GB  100.00%  0      # <-- us
111.111.111.222  DC2  RAC1  Up      Normal  1.1 GB   0.00%    1
./nodetool -h 111.111.111.222 ring -port 11000
Address          DC   Rack  Status  State   Load     Owns     Token
                                                              1
111.111.111.111  DC1  RAC1  Up      Normal  1.07 GB  100.00%  0
111.111.111.222  DC2  RAC1  Up      Normal  1.1 GB   0.00%    1      # <-- us
3) I checked the cassandra.yaml
the seeds are 111.111.111.111, 111.111.111.222
4) I checked the cassandra-topology.properties
111.111.111.111
# Cassandra Node IP=Data Center:Rack
# datacenter 1
111.111.111.111=DC1:RAC1 # <-- us
# datacenter 2
111.111.111.222=DC2:RAC1
default=DC1:r1
111.111.111.222
# Cassandra Node IP=Data Center:Rack
# datacenter 1
111.111.111.111=DC1:RAC1
# datacenter 2
111.111.111.222=DC2:RAC1 # <-- us
default=DC1:r1
5) We set the consistencyLevel to LOCAL_QUORUM in our Java application as follows:
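// Hector imports this method relies on.
import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

// clusterMap and consistencyLevel are fields of the enclosing class.
public Keyspace getKeyspace(final String keyspaceName, final String serverAddresses)
{
    Keyspace ks = null;
    Cluster c = clusterMap.get(serverAddresses);
    if (c != null)
    {
        // Use the same consistency level for reads and writes.
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        policy.setDefaultReadConsistencyLevel(consistencyLevel);
        policy.setDefaultWriteConsistencyLevel(consistencyLevel);
        // Create Keyspace
        ks = HFactory.createKeyspace(keyspaceName, c, policy);
    }
    // Null when no cluster is registered for these addresses.
    return ks;
}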
I was told this configuration would work, but maybe I'm missing something.
Thanks for any insight
Hector is known to return spurious unavailable errors. The native protocol Java driver does not have this problem: https://github.com/datastax/java-driver
If you have only two nodes and the data lives on the node that is down, you may not get full write availability at the required consistency. Cassandra would cover for the down node with hinted handoff, but at the QUORUM consistency level the UnavailableException is thrown anyway.
The same is true when requesting data belonging to the down node.
However, it seems your cluster is not well balanced. Node 111.111.111.111 owns 100% and 111.111.111.222 seems to own 0%; judging by your tokens (0 and 1), they are the reason for that.
Check out how to set the initial token here: http://www.datastax.com/docs/0.8/install/cluster_init#token-gen-cassandra
Additionally, you may want to check another question, which contains an answer with more reasons why a situation like this may happen.
LOCAL_QUORUM won't work if you configure NetworkTopologyStrategy like this:
Options: [DC2:1, DC1:1] # this will make LOCAL_QUORUM and QUORUM fail always
LOCAL_QUORUM and (in my experience) QUORUM require data centers to have at least 2 replicas up. If you want a quorum spanning your data centers you have to set consistency level to data center agnostic TWO.
More examples:
Options: [DC2:3, DC1:1] # LOCAL_QUORUM for clients in DC2 works, QUORUM fails
Options: [DC2:2, DC1:1] # LOCAL_QUORUM in DC2 works, but down after 1 node failure
# QUORUM fails, TWO works.
