Why does a read fail in cqlsh when a huge number of tombstones is present - apache-spark

I have a table with a huge number of tombstones. When I ran a Spark job (which reads) against that specific table, it returned results without any issues. But when I executed the same query using cqlsh, it gave me an error because of the huge number of tombstones in that table:
Cassandra failure during read query at consistency ONE (1 replica
needed but 0 replicas responded, 1 failed)
I know the tombstones should not be there and I can run a repair to get rid of them, but apart from that, why did Spark succeed while cqlsh failed? They both use the same sessions and queries.
How does the spark-cassandra connector work? Is it different from cqlsh?
Please let me know.
Thank you.

The Spark Cassandra Connector is different from cqlsh in a few ways:
It uses the Java driver and not the Python driver.
It has significantly more lenient retry policies.
It performs full table scans by breaking the request up into many smaller pieces.
Any of these items could contribute to why the read works in the SCC and not in cqlsh; a small sketch of the last point follows.
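As a rough illustration of that last point, here is a minimal sketch of breaking a full table scan into token-range slices with the DataStax Python driver (the contact point, keyspace, table, and partition key column are assumptions; the connector also sizes its splits by estimated data volume rather than a fixed token width):

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])              # assumption: contact point
session = cluster.connect("my_keyspace")      # assumption: keyspace name

# Walk the ring in fixed-width token slices instead of one big scan.
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1      # Murmur3Partitioner token range
step = 2**60                                  # assumption: slice width, for illustration only

start = MIN_TOKEN
while start < MAX_TOKEN:
    end = min(start + step, MAX_TOKEN)
    rows = session.execute(
        "SELECT * FROM my_table WHERE token(pk) > %s AND token(pk) <= %s",
        (start, end),
    )
    for row in rows:
        pass  # process this slice; a failure here only affects one token range
    start = end

Each slice is a much smaller request than the single scan cqlsh issues, and together with the connector's more lenient retry policy that can make the difference between a job that completes and a query that trips the tombstone failure threshold.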

Related

How to flush data in all tables of a keyspace in Cassandra?

I am currently writing tests in Golang and I want to get rid of all the data in the tables after the tests finish. I was wondering if it is possible to flush the data of all tables in Cassandra.
FYI: I am using Cassandra version 3.11.
The term "flush" is ambiguous in this case.
In Cassandra, "flush" is an operation where data is "flushed" from memory and written to disk as SSTables. Flushing can happen automatically based on certain triggers or can be done manually with the nodetool flush command.
However, based on your description, what you want is to "truncate" the contents of the tables. You can do this with the following CQL command:
cqlsh> TRUNCATE ks_name.table_name;
You will need to iterate over each table in the keyspace. For more info, see the CQL TRUNCATE command. Cheers!
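If it helps, here is a minimal sketch of that iteration using the DataStax Python driver (you mentioned Go, but the idea is the same: read the table list from the schema metadata and truncate each one; the contact point and keyspace name are assumptions):

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # assumption: local contact point
session = cluster.connect()

keyspace = "my_test_keyspace"             # assumption: the keyspace used by the tests
# The driver's schema metadata exposes every table in the keyspace.
for table_name in cluster.metadata.keyspaces[keyspace].tables:
    session.execute(f"TRUNCATE {keyspace}.{table_name}")

cluster.shutdown()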

Data Inconsistency in Cassandra Cluster after migration of data to a new cluster

I see some data inconsistency after moving data to a new cluster.
The old cluster has 9 nodes in total and each holds 2+ TB of data.
The new cluster has the same set of nodes as the old one and the configuration is the same.
Here is what I've performed in order:
nodetool snapshot.
Copied that snapshot to destination
Created a new Keyspace on Destination Cluster.
Used sstableloader utility to load.
Restarted all nodes.
After the transfer completed successfully, I ran a few queries to compare (old vs new cluster) and found that the new cluster is not consistent, although the data looks properly distributed across the nodes (nodetool status).
The same query returns different sets of results for some of the partitions: I get zero rows the first time, 100 rows the second time, then 200 rows, and eventually it becomes consistent for a few partitions and the record count matches the old cluster.
A few partitions have no data in the new cluster whereas the old cluster has data for those partitions.
I tried running queries in cqlsh with CONSISTENCY ALL but the problem still exists.
Did I miss any important steps to take before or after the migration?
Is there any procedure to find out the root cause of this?
I am currently running "nodetool repair", but I doubt that will solve it since I already tried with CONSISTENCY ALL.
I highly appreciate your help!
The fact that the results eventually become consistent indicates that the replicas are out-of-sync.
You can verify this by reviewing the logs around the time that you were loading data, particularly for dropped mutations. You can also check the output of nodetool netstats. If you're seeing blocking read repairs, that's another confirmation that the replicas are out-of-sync.
If you still have other partitions you can test, enable TRACING ON in cqlsh when you query with CONSISTENCY ALL. You will see if there are digest mismatches in the trace output which should also trigger read repairs. Cheers!
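For reference, the same check can be done programmatically with the DataStax Python driver: query a suspect partition at CONSISTENCY ALL with tracing enabled and look for digest mismatch events (the contact point, keyspace, table, and partition key below are assumptions):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])                        # assumption: contact point
session = cluster.connect("my_keyspace")                # assumption: keyspace name

# Read one suspect partition at CONSISTENCY ALL with tracing enabled.
stmt = SimpleStatement(
    "SELECT * FROM my_table WHERE pk = %s",             # assumption: table / partition key
    consistency_level=ConsistencyLevel.ALL,
)
rs = session.execute(stmt, ("suspect-partition-key",), trace=True)

trace = rs.get_query_trace()
for event in trace.events:
    # A "Digest mismatch" event means the replicas disagreed and a
    # blocking read repair was triggered for this partition.
    if "mismatch" in event.description.lower():
        print(event.source, event.description)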
[EDIT] Based on your comments below, it sounds like you possibly did not load the snapshots from ALL the nodes in the source cluster with sstableloader. If you've missed loading SSTables to the target cluster, then that would explain why data is missing.

How does spark copy data between cassandra tables?

Can anyone please explain how Spark works internally when reading data from one table and writing it to another in Cassandra?
Here is my use case:
I am ingesting data coming in from an IoT platform into Cassandra through a Kafka topic. I have a small Python script that parses each message from Kafka to get the table name it belongs to, prepares a query, and writes it to Cassandra using DataStax's cassandra-driver for Python. With that script I am able to ingest around 300,000 records per minute into Cassandra. However, my incoming data rate is 510,000 records per minute, so the Kafka consumer lag keeps increasing.
The Python script is already making concurrent calls to Cassandra. If I increase the number of Python executors, the cassandra-driver starts failing because the Cassandra nodes become unavailable to it. I am assuming there is a limit of Cassandra calls per second that I am hitting there. Here is the error message that I get:
ERROR Operation failed: ('Unable to complete the operation against any hosts', {<Host: 10.128.1.3 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 10.128.1.1 datacenter1>: ConnectionException('Pool is shutdown',)})"
Recently, I ran a PySpark job to copy data from a couple of columns in one table to another. The table had around 168 million records in it. The PySpark job completed in around 5 hours, so it processed over 550,000 records per minute.
Here is the pyspark code I am using:
df = spark.read \
    .format("org.apache.spark.sql.cassandra") \
    .options(table=sourcetable, keyspace=sourcekeyspace) \
    .load().cache()
df.createOrReplaceTempView("data")
query = ("select dev_id,datetime,DATE_FORMAT(datetime,'yyyy-MM-dd') as day, " + field + " as value from data ")
vgDF = spark.sql(query)
vgDF.show(50)
vgDF.write \
    .format("org.apache.spark.sql.cassandra") \
    .mode('append') \
    .options(table=newtable, keyspace=newkeyspace) \
    .save()
Versions:
Cassandra 3.9.
Spark 2.1.0.
Datastax's spark-cassandra-connector 2.0.1
Scala version 2.11
Cluster:
Spark setup with 3 workers and 1 master node.
The 3 worker nodes also host the Cassandra cluster (each Cassandra node is co-located with one Spark worker node).
Each worker was allowed 10 GB ram and 3 cores.
So I am wondering:
Does Spark read all the data from Cassandra first and then write it to the new table, or is there some kind of optimization in the Spark Cassandra connector that allows it to move the data between Cassandra tables without reading all the records?
If I replace my Python script with a Spark Streaming job in which I parse the packet to get the table name for Cassandra, will that help me ingest data into Cassandra more quickly?
The Spark connector is optimized in that it parallelizes processing and reads/inserts data on the nodes that own that data. You may get better throughput by using the Cassandra Spark Connector, but this will require more resources.
Regarding your task: 300,000 inserts/minute is 5,000/second, and frankly that is not a very big number. You can increase throughput with several optimizations:
Use asynchronous calls to submit requests. You only need to make sure that you submit more requests than can be handled by one connection (you can also increase that number - I'm not sure how to do it in Python, but check the Java driver docs to get an idea); see the sketch after this list.
Use a correct consistency level (LOCAL_ONE should give you very good performance).
Use a correct load balancing policy.
Run several copies of your script in parallel, making sure that they are all in the same Kafka consumer group.
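Here is a minimal sketch of the first three points using the DataStax Python driver (the contact points, data center name, keyspace, and table schema are assumptions for illustration; tune the concurrency to your cluster):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Token-aware + DC-aware routing sends each write to a replica that owns it.
cluster = Cluster(
    ["10.128.1.1", "10.128.1.3"],  # assumption: your contact points
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc="datacenter1")
    ),
)
session = cluster.connect("my_keyspace")  # assumption: target keyspace

# Prepare once, reuse for every Kafka message.
insert = session.prepare(
    "INSERT INTO my_table (dev_id, datetime, value) VALUES (?, ?, ?)"  # assumption: schema
)
insert.consistency_level = ConsistencyLevel.LOCAL_ONE

def write_rows(rows):
    """rows: list of (dev_id, datetime, value) tuples parsed from Kafka."""
    # Keeps up to `concurrency` asynchronous requests in flight at once.
    results = execute_concurrent_with_args(session, insert, rows, concurrency=100)
    for ok, result_or_exc in results:
        if not ok:
            print("write failed:", result_or_exc)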

All host(s) tried for query failed - com.datastax.driver.core.OperationTimedOutException

While performing Cassandra operations (batch execution: insert and update operations on two tables) using a Spark job, I am getting an "All host(s) tried for query failed - com.datastax.driver.core.OperationTimedOutException" error.
Cluster information:
Cassandra 2.1.8.621 | DSE 4.7.1
spark-cassandra-connector-java_2.10 version - 1.2.0-rc1 | cassandra-driver-core version - 2.1.7
Spark 1.2.1 | Hadoop 2.7.1 => 3 nodes
Cassandra 2.1.8 => 5 nodes
Each node has 28 GB memory and 24 cores.
While searching for a solution I came across some discussions which say you should not use BATCHES. However, I would like to find the root cause of this error. Also, how and where do I set/get "SocketOptions.setReadTimeout"? As per the standard guideline, this timeout limit must be greater than the Cassandra request timeout to avoid possible errors.
Are request_timeout_in_ms and SocketOptions.setReadTimeout the same? Can anyone help me with this?
While performing Cassandra operations (batch execution: insert and update operations on two tables) using a Spark job, I am getting an "All host(s) tried for query failed - com.datastax.driver.core.OperationTimedOutException" error.
Directly from the docs:
Why are my write tasks timing out/failing?
The most common cause of this is that Spark is able to issue write requests much more quickly than Cassandra can handle them. This can lead to GC issues and a build-up of hints. If this is the case with your application, try lowering the number of concurrent writes and the batch size using the following options:
spark.cassandra.output.batch.size.rows
spark.cassandra.output.concurrent.writes
or in versions of the Spark Cassandra Connector greater than or equal to 1.2.0 set
spark.cassandra.output.throughput_mb_per_sec
which will allow you to control the amount of data written to C* per Spark core per second.
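For illustration, here is how those write-throttling options could be set from PySpark (shown with the Spark 2.x DataFrame API for brevity; the contact point, table names, and the specific values are assumptions, meant only as starting points to tune):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("throttled-cassandra-writes")
    .config("spark.cassandra.connection.host", "10.0.0.1")      # assumption: contact point
    .config("spark.cassandra.output.batch.size.rows", "100")    # smaller batches
    .config("spark.cassandra.output.concurrent.writes", "2")    # fewer writes in flight
    # connector >= 1.2.0: cap the MB/s written to Cassandra per Spark core
    .config("spark.cassandra.output.throughput_mb_per_sec", "5")
    .getOrCreate()
)

df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(table="src_table", keyspace="src_ks")               # assumption: source table
    .load()
)

(
    df.write.format("org.apache.spark.sql.cassandra")
    .mode("append")
    .options(table="dst_table", keyspace="dst_ks")               # assumption: destination table
    .save()
)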
you should not use BATCHES
This is not always true; the connector uses local token-aware batches for faster reads and writes, but this is tricky to get right in a custom app. In many cases async queries are better or just as good.
setReadTimeout
This is a DataStax Java driver method. The connector takes care of this for you; there is no need to change it.

spark datastax cassandra connector slow to read from a heavy cassandra table

I am new to Spark and the Spark Cassandra Connector. We are trying Spark for the first time in our team and we are using the Spark Cassandra connector to connect to our Cassandra database.
I wrote a query which uses a heavy table of the database, and I saw that the Spark task didn't start until the query had fetched all the records from the table.
It is taking more than 3 hours just to fetch all the records from the database.
To get the data from the DB we use:
CassandraJavaUtil.javaFunctions(sparkContextManager.getJavaSparkContext(SOURCE).sc())
.cassandraTable(keyspaceName, tableName);
Is there a way to tell Spark to start working even if not all of the data has finished downloading?
Is there an option to tell the spark-cassandra-connector to use more threads for the fetch?
Thanks,
Kokou.
If you look at the Spark UI, how many partitions is your table scan creating? I just did something like this and found that Spark was creating too many partitions for the scan, and it was taking much longer as a result. The way I decreased the time on my job was by setting the configuration parameter spark.cassandra.input.split.size_in_mb to a value higher than the default. In my case it took a 20-minute job down to about four minutes. There are also a couple more Cassandra-read-specific Spark settings you can tune; see the connector's reference documentation.
These stackoverflow questions are what I referenced originally, I hope they help you out as well.
Iterate large Cassandra table in small chunks
Set number of tasks on Cassandra table scan
EDIT:
After doing some performance testing and fiddling with some Spark configuration parameters, I found that Spark was creating far too many table partitions when I wasn't giving the Spark executors enough memory. In my case, upping the memory by a gigabyte was enough to render the input split size parameter unnecessary. If you can't give the executors more memory, you may still need to set spark.cassandra.input.split.size_in_mb higher as a workaround.
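For reference, a minimal PySpark sketch of the read-tuning settings mentioned above (the contact point, table names, and values are assumptions, not recommendations):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cassandra-read-tuning")
    .config("spark.cassandra.connection.host", "10.0.0.1")       # assumption: contact point
    # Larger splits -> fewer Spark partitions for the table scan.
    .config("spark.cassandra.input.split.size_in_mb", "256")
    # Rows fetched per round trip while paging through each split.
    .config("spark.cassandra.input.fetch.size_in_rows", "5000")
    .getOrCreate()
)

df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(table="heavy_table", keyspace="my_ks")               # assumption: table/keyspace
    .load()
)
print(df.rdd.getNumPartitions())  # check how many scan partitions were created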
