Cassandra NoHostAvailable and Failed to open native connection to Cassandra - apache-spark

I'm trying to get Cassandra set up and running into some issues where Google and other questions here haven't been helpful.
From cqlsh, I get NoHostAvailable: when I try to query tables after creating them:
Connected to DS Cluster at 10.101.49.129:9042.
[cqlsh 5.0.1 | Cassandra 3.0.9 | CQL spec 3.4.0 | Native protocol v4]
Use HELP for help.
cqlsh> use test;
cqlsh:test> describe kv;
CREATE TABLE test.kv (
key text PRIMARY KEY,
value int
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
cqlsh:test> select * from kv;
NoHostAvailable:
All of the nodes are up and running according to nodetool.
When I try to connect from Spark, I get something similar: everything works fine, and I can connect to and manipulate tables, until I try to access any data, at which point it fails.
val df = sql.read.format("org.apache.spark.sql.cassandra").options(Map("keyspace" -> "test2", "table" -> "words")).load
df: org.apache.spark.sql.DataFrame = [word: string, count: int]
df.show
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 25, HOSTNAME): java.io.IOException: Failed to open native connection to Cassandra at {10.101.49.129, 10.101.50.24, 10.101.61.251, 10.101.49.141, 10.101.60.94, 10.101.63.27, 10.101.49.5}:9042
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:325)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
...
Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.Futures.withFallback(Lcom/google/common/util/concurrent/ListenableFuture;Lcom/google/common/util/concurrent/FutureFallback;Ljava/util/concurrent/Executor;)Lcom/google/common/util/concurrent/ListenableFuture;
at com.datastax.driver.core.Connection.initAsync(Connection.java:177)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:731)
at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:251)
I apologize if this is a naive question, and thank you in advance.

NoHostAvailable
Replication in Cassandra is done via one of two strategies, either of which can be specified on a particular keyspace.
SimpleStrategy :
Represents a naive approach and spreads data globally among nodes based on the token ranges owned by each node. There is no differentiation between nodes that are in different datacenters.
SimpleStrategy takes a single parameter, the replication factor, which chooses how many replicas of any partition will exist within the entire cluster.
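A minimal sketch (the keyspace name here is just an illustration):
CREATE KEYSPACE example_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};
This asks for three replicas of every partition, spread across the whole cluster with no awareness of datacenters.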
NetworkTopologyStrategy :
Represents a per Datacenter replication strategy. With this strategy data is replicated based on the token ranges owned by nodes but only within a datacenter.
This means that if you have two datacenters, with each node shown here by the token it owns, and a full token range of [0-20]:
Datacenter A : [1], [11]
Datacenter B : [2], [12]
Then with SimpleStrategy the range would be viewed as being split like this:
[1] [2-10] [11] [12-20]
Which means we would end up with two very unbalanced nodes that each own only a single token.
If instead we use NetworkTopologyStrategy, the responsibilities look like:
Datacenter A : [1-10], [11-20]
Datacenter B : [2-11], [12-1] (wrapping around)
The strategy takes a dictionary as its parameter, listing each datacenter and how many replicas should exist in that datacenter.
For example you can set the replication as
'A' : '1'
'B' : '2'
Which would create 3 replicas for the data, 2 replicas in B but only 1 in A.
This is where a lot of users run into trouble, since you could specify
a_mispelled : '4'
which would mean that a datacenter which doesn't exist is supposed to hold replicas for that particular keyspace. Cassandra would then respond to every request against that keyspace that it cannot obtain replicas, because it can't find that datacenter (see the sketch below).
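As a hedged illustration using the datacenter names from the example above (keyspace names are placeholders):
-- correct: both datacenter names match what the snitch reports
CREATE KEYSPACE good_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'A': '1', 'B': '2'};
-- broken: 'a_mispelled' matches no datacenter, so requests against this
-- keyspace cannot find the replicas they need
CREATE KEYSPACE bad_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'a_mispelled': '4'};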
With vnodes you can get deliberately skewed data distribution (if required) by giving different nodes different numbers of vnodes. Without vnodes, it requires shrinking the token ranges covered by nodes which have less capacity.
How data gets read
Regardless of the replication settings, a read can be served through any node because the mapping is completely deterministic. Given a keyspace, table, and partition key, Cassandra can determine on which nodes any particular token should exist and obtain that information, as long as the Consistency Level for the query can be met (see the cqlsh example below).
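For instance, from cqlsh you can make the consistency level explicit before reading (a small sketch reusing the table from the question; the key value is a placeholder):
CONSISTENCY;                -- show the current consistency level
CONSISTENCY ONE;            -- lower it to ONE for this session
SELECT * FROM test.kv WHERE key = 'some-key';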
Guava Error
The error you are seeing most commonly comes from a bad packaging of the Spark Cassandra Connector. Working with the Java Cassandra Driver and Hadoop together is difficult since they require different (incompatible) versions of Guava. To get around this, the SCC provides builds with its Guava dependency shaded, but re-including the Java Driver as an explicit dependency, or using an old build, can break things (see the build sketch below).
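As an illustration, in an sbt build the usual pattern is to depend only on the connector and let it bring its shaded Guava, rather than also declaring the Java driver yourself (a hedged sketch; the version shown is illustrative):
// build.sbt (sketch): rely on the connector's shaded Guava.
// Do NOT also add "com.datastax.cassandra" % "cassandra-driver-core" explicitly.
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.5"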

To me it looks like two issues:
First, for cqlsh: you seem to have misconfigured the replication factor of your keyspace. What RF did you use there?
See also the DataStax documentation; a quick way to check (and fix) it is sketched below.
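A quick, hedged way to check from cqlsh (keyspace name taken from the question; 'DC1' is a placeholder, use whatever nodetool status reports):
-- show the replication settings currently in effect for the keyspace
SELECT keyspace_name, replication
FROM system_schema.keyspaces
WHERE keyspace_name = 'test';
-- if the strategy, RF, or datacenter name is wrong, fix it and then repair
-- so the replicas actually get built
ALTER KEYSPACE test
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'};
-- afterwards: run "nodetool repair test" on each node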
For the Spark issue, it seems that the Google Guava dependency on your classpath isn't compatible with your driver.
In a more recent Guava release there was an API change. See:
java.lang.NoClassDefFoundError: com/google/common/util/concurrent/FutureFallback

Related

Why does data become inaccessible on Cassandra when it is scaled out with replication factor 1

I was experimenting with a scale-out operation in Cassandra and noticed that data becomes inaccessible (it disappears from the client's point of view) when you scale horizontally with the replication factor (RF) set to 1.
Before scale out:
2 nodes cluster (1 seed + 1 non-seed)
replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}
After scale out:
4 nodes cluster (3 seeds + 1 non-seed)
replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}
In both cases, I tried writing/reading data using cqlsh with CONSISTENCY ONE.
I also checked the backup data, and the written data actually still exists after scale-out, i.e. it is not data loss. Setting RF=3 somehow made the data available again.
I understand this is not an ideal setup anyway, but why can't cqlsh read an existing replica?
Version info in case it matters:
[cqlsh 5.0.1 | Cassandra 3.11.6 | CQL spec 3.4.4 | Native protocol v4]
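For reference, the RF change described above is normally paired with a repair so that each node actually builds the replicas it now owns (a hedged sketch; the keyspace name is a placeholder):
ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};
-- then, on every node:
--   nodetool repair -full my_keyspace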

uneven data size on cassandra nodes

I am struggling to understand why my Cassandra nodes have uneven data size.
I have a cluster of three nodes. According to nodetool ring, each node owns 33.33%. Still disk space usages are uneven.
Node1: 4.7 GB (DC: logg_2, RAC: RAC1)
Node2: 13.9 GB (DC: logg_2, RAC:RAC2)
Node3: 9.3 GB (DC: logg_2, RAC:RAC1)
There is only one keyspace.
keyspace_definition: |
CREATE KEYSPACE stresscql_cass_logg WITH replication = { 'class': 'NetworkTopologyStrategy', 'logg_2' : 2, 'logg_1' : 1};
And there is only one table named blogposts.
table_definition: |
CREATE TABLE blogposts (
domain text,
published_date timeuuid,
url text,
author text,
title text,
body text,
PRIMARY KEY(domain, published_date)
) WITH CLUSTERING ORDER BY (published_date DESC)
AND compaction = { 'class':'LeveledCompactionStrategy' }
AND comment='A table to hold blog posts'
Please help me understand why each node has an uneven data size.
Ownership is how much data is owned by the node: the percentage of the data owned by the node per datacenter times the replication factor. For example, a node can own 33% of the ring, but show 100% if the replication factor is 3.
Attention: If your cluster uses keyspaces having different replication strategies or replication factors, specify a keyspace when you run nodetool status to get meaningful ownership information (see the example after the link below).
More information can be found here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsStatus.html#toolsStatus__description
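For example, using the keyspace from the question:
nodetool status stresscql_cass_logg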
NetworkTopologyStrategy places replicas in the same datacenter by walking the ring clockwise until reaching the first node in another rack.
NetworkTopologyStrategy attempts to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) often fail at the same time due to power, cooling, or network issues.
Because you only have two racks (RAC1 and RAC2), and RAC1 contains two of the three nodes, a replica of node 1's data and a replica of node 3's data both land on node 2 (the only node in RAC2), which is why it is bigger.
https://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archDataDistributeReplication.html

Not enough replicas available for query at consistency LOCAL_ONE (1 required but only 0 alive)

I am running spark-cassandra-connector and hitting a weird issue:
I run the spark-shell as:
bin/spark-shell --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.1
Then I run the following commands:
import com.datastax.spark.connector._
val rdd = sc.cassandraTable("test_spark", "test")
println(rdd.first)
# CassandraRow{id: 2, name: john, age: 29}
The problem is that the following command gives an error:
rdd.take(1).foreach(println)
# CassandraRow{id: 2, name: john, age: 29}
rdd.take(2).foreach(println)
# Caused by: com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency LOCAL_ONE (1 required but only 0 alive)
# at com.datastax.driver.core.exceptions.UnavailableException.copy(UnavailableException.java:128)
# at com.datastax.driver.core.Responses$Error.asException(Responses.java:114)
# at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:467)
# at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1012)
# at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:935)
# at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
And the following command just hangs:
println(rdd.count)
My Cassandra keyspace seems to have the right replication factor:
describe test_spark;
CREATE KEYSPACE test_spark WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;
How can I fix both of the above errors?
I assume you hit the issue with SimpleStrategy and multi-DC when using LOCAL_ONE (the Spark connector default) consistency. It will look for a node in the local DC to make the request to, but there's a chance that all the replicas exist in a different DC and won't meet the requirement (CASSANDRA-12053).
If you change your consistency level (input.consistency.level to ONE), I think it will be resolved; one way to set it is shown below. You should also really consider using NetworkTopologyStrategy instead.
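A hedged sketch of one way to set that, reusing the spark-shell invocation from the question (spark.cassandra.input.consistency.level is the connector's read consistency setting):
bin/spark-shell --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.1 \
  --conf spark.cassandra.input.consistency.level=ONE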

New cassandra node should finish compaction before joining ring

I'd like to know if there is a way to have a Cassandra node join the ring only after it has finished streaming and compaction. The issue I'm experiencing is that when I add a node to my cluster, it streams data from the other nodes and then joins the ring. At that point it begins a lot of compactions, which take a very long time to complete (more than a day). During this time, CPU utilization on that node is nearly 100%, and the bloom filter false positive ratio is very high as well, which happens to be relevant to my use case. This causes the whole cluster to experience increased read latency, with the newly joined node in particular having 10x the typical read latency.
I read this post http://www.datastax.com/dev/blog/bootstrapping-performance-improvements-for-leveled-compaction which has this snippet about one way to possibly improve read latency when adding a node.
“Operators usually avoid this issue by passing -Dcassandra.join_ring=false when starting the new node and wait for the bootstrap to finish along with the followup compactions before putting the node into the ring manually with nodetool join”
The documentation on the join_ring option is pretty limited, but after experimenting with it, it seems that streaming data and the later compactions can't be initiated until after I run nodetool join for the new host, so I'd like to know how, or whether, this can be achieved.
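For reference, the sequence the quoted blog post describes looks roughly like this (a hedged sketch; how the JVM flag is passed depends on your install, and, as noted above, it did not behave this way in the asker's experiments):
# start the new node without putting it in the ring
cassandra -Dcassandra.join_ring=false
# watch streaming and the follow-up compactions until they finish
nodetool netstats
nodetool compactionstats
# then add the node to the ring
nodetool join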
Right now my use case is just deduping records being processed by a kafka consumer application. The table in cassandra is very simple, just a primary key, and the queries are just inserting new keys with a ttl of several days and checking existence of a key. The cluster needs to perform 50k reads and 50k writes per second at peak traffic.
I'm running Cassandra 3.7. My cluster is in EC2, originally on 18 m3.2xlarge hosts. Those hosts were running at very high (~90%) CPU utilization during compactions, which was the impetus for trying to add new nodes to the cluster. I've since switched to c3.4xlarge to get more CPU without actually adding hosts, but it would be helpful to know at what CPU threshold I should be adding new hosts, since waiting until 90% is clearly not safe, and adding new hosts exacerbates the CPU issue on the newly added host.
CREATE TABLE dedupe_hashes (
app int,
hash_value blob,
PRIMARY KEY ((app, hash_value))
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '90PERCENTILE';
One thing to check that may help avoid the 100% CPU due to compactions is the setting for "compaction_throughput_mb_per_sec".
By default it should be 16 MB/s, but check whether you have it disabled (set to 0):
nodetool getcompactionthroughput
You can also set the value dynamically
nodetool setcompactionthroughput 16
(Also set compaction_throughput_mb_per_sec in cassandra.yaml so the value persists after a restart.)

Cassandra timeout during read query at consistency LOCAL_ONE

I have a single-node Cassandra cluster with 32 CPU cores, 32 GB of memory, and a RAID of 3 SSDs totalling around 2.5 TB. I also have another host with 32 cores and 32 GB of memory, on which I run Apache Spark.
I have a huge amount of historical data in Cassandra, maybe 600 GB. More than a million new records arrive every day from Kafka, and I need to query these new rows every day. But Cassandra fails, and I'm confused.
The schema of my Cassandra table is:
CREATE TABLE rainbow.activate (
rowkey text,
qualifier text,
act_date text,
info text,
log_time text,
PRIMARY KEY (rowkey, qualifier)
) WITH CLUSTERING ORDER BY (qualifier ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX activate_act_date_idx ON rainbow.activate (act_date);
CREATE INDEX activate_log_time_idx ON rainbow.activate (log_time);
Because the source data may contain some duplicate records, I need to use a primary key to drop the duplicates. There are two indexes on this table: act_date is a date string like '20151211', and log_time is a datetime string like '201512111452', i.e. log_time splits the records more finely.
If I select records using log_time, Cassandra works, but it fails when using act_date.
At first, the Spark job exits with an error:
java.io.IOException: Exception during execution of SELECT "rowkey", "qualifier", "info" FROM "rainbow"."activate" WHERE token("rowkey") > ? AND token("rowkey") <= ? AND log_time = ? ALLOW FILTERING: All host(s) tried for query failed (tried: noah-cass01/192.168.1.124:9042 (com.datastax.driver.core.OperationTimedOutException: [noah-cass01/192.168.1.124:9042] Operation timed out))
I tried increasing spark.cassandra.read.timeout_ms to 60000, but the job reports another error, as follows:
java.io.IOException: Exception during execution of SELECT "rowkey", "qualifier", "info" FROM "rainbow"."activate" WHERE token("rowkey") > ? AND token("rowkey") <= ? AND act_date = ? ALLOW FILTERING: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)
I don't know how to solve this problem; I read the docs on spark-cassandra-connector but didn't find any tips.
Would you please give me some advice to help solve this problem?
Thanks very much!
Sounds like an unusual setup. If you have two machines it would be more efficient to configure Cassandra as two nodes and run Spark on both nodes. That would spread the data load and you'd generate a lot less traffic between the two machines.
Ingesting so much data every day and then querying arbitrary ranges of it sounds like a ticking time bomb. When you start getting frequent timeout errors, it is usually a sign of an inefficient schema, where Cassandra cannot do what you are asking in an efficient way.
I don't see the specific cause of the problem, but I'd consider adding another field to the partition key, such as the day so that you could restrict your queries to a smaller subset of your data.
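A hedged sketch of what that remodel could look like (the table name and key layout below are assumptions for illustration, not the asker's actual schema):
-- partition by day so a day's rows are a direct partition lookup instead of a
-- secondary-index scan; if ~1M+ rows per day makes partitions too wide, add a
-- bucket column (e.g. hour) to the partition key
CREATE TABLE rainbow.activate_by_day (
    act_date  text,
    rowkey    text,
    qualifier text,
    info      text,
    log_time  text,
    PRIMARY KEY ((act_date), rowkey, qualifier)
);
SELECT rowkey, qualifier, info
FROM rainbow.activate_by_day
WHERE act_date = '20151211';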
