Cassandra Read Timeouts on Specific Servers - cassandra

We have a five-node Cassandra cluster with replication factor 3. We are experiencing a lot of read timeouts in our application. When we checked tpstats on each Cassandra node, we saw that three of the nodes have a lot of dropped read requests and high CPU utilisation, whereas on the other two nodes read request drops are zero and CPU utilisation is moderate. Note that the total number of read requests is almost the same on all servers.
After taking thread dumps we found that the high CPU utilisation on the three nodes is caused by Parallel GC running far more often there than on the other two nodes. What we are not able to understand is why GC should run more on three nodes and less on the other two, when the distribution of our partition key and our queries is almost uniform.
Cassandra version is 2.2.3.

Related

Cassandra read latency increases while writing

I have a Cassandra cluster whose read latency increases during writes. The writes mostly happen via Spark jobs during the night, in huge bursts. Writes use LOCAL_QUORUM and reads use LOCAL_ONE. Is there a way to reduce read latency while writes are happening?
Cassandra Cluster Configs
10 Node cassandra cluster (5 in DC1, 5 in DC2)
CPU: 8 Core
Memory: 32GB
Grafana Metrics
I can give some advice:
Use the LCS (LeveledCompactionStrategy) compaction strategy.
Prefer a round-robin load-balancing policy for reads.
Choose the partition key wisely so that requests are not concentrated on a single partition.
Partition size also plays an important role. Cassandra recommends keeping partitions small. However, I have tested partitions of 10,000 rows (each row about 800 bytes), and they performed better than partitions of 3,000 rows (or even 1 row). Very tiny partitions tend to increase CPU usage when the stored data is large in terms of row count; however, very large partitions should be avoided as well.
The replication factor should be chosen strategically. The write consistency level should be decided considering the replication of all keyspaces.
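For the first point, switching a table to LCS is a one-line schema change in CQL (the keyspace and table names below are placeholders, not from the question):

```sql
-- Move an existing table to LeveledCompactionStrategy;
-- existing SSTables will be recompacted in the background.
ALTER TABLE my_keyspace.my_table
WITH compaction = {'class': 'LeveledCompactionStrategy'};
```

LCS trades extra compaction I/O at write time for fewer SSTables per read, which is why it tends to help read latency under write-heavy bursts.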

Partition Count in Hazelcast

Can we set the partition count of a Hazelcast IMap equal to the number of nodes in the cluster?
What are the pitfalls?
I understand parallelism could be one.
With only one partition per node, there is no parallelism within a node, so the CPU won't be well utilized.
If a new node is added, it won't get assigned any partitions.
If a node crashes, one of the remaining nodes will hold two partitions, hence double the CPU and memory load.
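For reference, the partition count is controlled by the hazelcast.partition.count system property (the default is 271). A sketch of setting it to the cluster's node count in hazelcast.xml, which triggers the pitfalls above (the value 5 is an example):

```xml
<hazelcast>
  <properties>
    <!-- default is 271; one partition per node removes intra-node parallelism -->
    <property name="hazelcast.partition.count">5</property>
  </properties>
</hazelcast>
```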

How to speedup node joining process in cassandra cluster

I have a cluster of 4 Cassandra nodes. I recently added a new node, but the data streaming is taking too long. Is there a way to make this process faster? Output of nodetool:
Less data per node. Your screenshot shows 80TB per node, which is insanely high.
The recommendation is 1TB per node, 2TB at most. The logic behind this is that bootstrap times get too long (as you have noticed). A good Cassandra ring should be able to rapidly recover from node failure. What happens if other nodes fail while the first one is rebuilding?
Keep in mind that the typical model for Cassandra is lots of smaller nodes, in contrast to SQL where you would have a few really powerful servers. (Scale out vs scale up)
So, I would fix the problem by growing your cluster to have 10X - 20X the number of nodes.
https://groups.google.com/forum/m/#!topic/nosql-databases/FpcSJcN9Opw

Hadoop/Spark : How replication factor and performance are related?

Without discussing all the other performance factors (disk space, NameNode objects, etc.), how does the replication factor improve the performance of MR, Tez and Spark?
If we have, for example, 5 datanodes, is it better for the execution engine to set the replication to 5? What are the best and worst values?
How can this help aggregations, joins, and map-only jobs?
One of the major tenets of Hadoop is moving the computation to the data.
If you set the replication factor approximately equal to the number of datanodes, you're guaranteed that every machine will hold a local copy of the data and be able to process it.
However, as you mention, NameNode overhead is very important, and more files or replicas cause slower requests. More replicas can also saturate your network in an unhealthy cluster. I've never seen anything higher than 5, and that was only for the company's most critical data. Everything else was left at 2 replicas.
The execution engine doesn't matter too much (other than Tez/Spark outperforming MR in most cases); what matters more is the size of your files and the format they are stored in - that will be a major driver of execution performance.
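Note that replication need not be cluster-wide: HDFS lets you raise it per file or directory, so only the hottest data pays the storage and NameNode cost (the path below is an example):

```shell
# Cluster-wide default is dfs.replication in hdfs-site.xml (commonly 3).
# Raise replication for one hot dataset only, waiting for it to complete:
hdfs dfs -setrep -w 5 /data/critical
```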

Setting the right number of partitions on an RDD

I read some comments saying that a good number of partitions for an RDD is 2-3 times the number of cores. I have 8 nodes, each with two 12-core processors, so I have 192 cores. I set the partition count between 384 and 576, but it doesn't seem to work efficiently; I tried 8 partitions with the same result. Maybe I have to set other parameters so that my job works better on the cluster rather than on my machine. I should add that the file I analyse has 150k lines.
val data = sc.textFile("/img.csv",384)
The primary effects come from specifying too few partitions or far too many partitions.
Too few partitions: you will not utilize all of the cores available in the cluster.
Too many partitions: there will be excessive overhead in managing many small tasks.
Of the two, the first is far more impactful on performance. Scheduling too many small tasks has a relatively small impact for partition counts below 1000; if you have on the order of tens of thousands of partitions, then Spark gets very slow.
Now, considering your case: you are getting the same results from 8 and from 384-576 partitions. The general rule of thumb says:
NoOfPartitions = (NumberOfWorkerNodes * NoOfCoresPerWorkerNode) - 1
Tasks are processed by CPU cores, so the partition count should equal the total number of cores in the cluster minus 1 (reserved for the driver's Application Master). Each core then processes one partition at a time.
That means 191 partitions may improve performance in your case. Otherwise, the impact of setting fewer or more partitions is explained at the beginning.
Hope this will help!!!
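The rule of thumb above can be sketched in plain Scala (the helper name is mine; the Spark call is commented out since it needs a running cluster):

```scala
// Rule-of-thumb partition count: total cores in the cluster minus one,
// the spare core being reserved for the driver / Application Master.
def ruleOfThumbPartitions(workerNodes: Int, coresPerNode: Int): Int =
  workerNodes * coresPerNode - 1

// 8 nodes x two 12-core processors = 24 cores per node, 192 total
val partitions = ruleOfThumbPartitions(8, 24)
// val data = sc.textFile("/img.csv", partitions)
println(partitions)  // 191
```

That said, for a 150k-line file the scheduling overhead of 191 tasks may dominate; measuring a few partition counts around this value is the safest approach.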
