Is there a relation between Cassandra's configuration parameters (given below with current values), DataStax's C++ driver configuration parameters (given below with current values) and the node's hardware specifications (no. of processors, RAM, no. of disks, etc.)?
Cassandra's Configuration Parameters (in YAML)
concurrent_reads set as 16
concurrent_writes set as 256
native_transport_max_threads set as 256
native_transport_max_frame_size_in_mb set as 512
DataStax's C++ Driver Configuration Parameters
cass_cluster_set_num_threads_io set as 10
cass_cluster_set_core_connections_per_host set as 1
cass_cluster_set_max_connections_per_host set as 20
cass_cluster_set_max_requests_per_flush set as 10000
Node's specs
No. of processors: 32
RAM: >150 GB
No. of hard disks: 1
Cassandra's Version: 3.11.2
Datastax C++ driver version: 2.7
RHEL version: 6.5
I have a cluster of 2 nodes and I've been getting dismal throughput (12,000 ops/second). 1 operation = read + write (I can't use the row cache). Is there any parameter that should be set higher or lower (considering the nodes' specs)?
Please also note that my read+write application is multi-threaded (10 threads), and I'm doing asynchronous reads and asynchronous writes (using futures).
Replication factor is 2, both nodes are in the same DC, and the consistency level for both reads and writes is also 2.
Some of the configuration properties in Cassandra are computed from available CPU cores and drives.
concurrent_reads = 16 * (number of drives)
concurrent_writes = 8 * (CPU cores)
It looks like you've done that, although I would question whether your 32 CPUs are all physical cores or hyper-threaded ones.
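For this node (1 data disk, 32 reported CPUs) those rules give concurrent_reads = 16 * 1 = 16 and concurrent_writes = 8 * 32 = 256, which is exactly what you have set.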
I have a cluster of 2 nodes and I've been getting dismal throughput (12,000 ops/second).
Just my opinion, but I think 12k ops/sec is pretty good. Actually, REALLY good for a two-node cluster. Cassandra scales horizontally, and linearly at that. So the solution here is an easy one: add more nodes.
What is your target operations per second? Right now, you're proving that you can get 6k ops/second per node, which means that if you add another node, the cluster should support 18k/sec. If you go to six nodes, you should be able to support 36k/sec. Basically, figure out your target and do the math.
One thing you might consider is trying ScyllaDB. Scylla is a drop-in replacement for Cassandra that trumpets its ability to hit very high throughput requirements. The drawback is that I think Scylla is only Cassandra 2.1 or 2.2 compatible ATM, but it might be worth a try based on what you're trying to do.
Does anybody have any tips when moving Spark execution from a few large nodes to many smaller nodes?
I am running a system with 4 executors, each with 24 GB of RAM and 12 cores. If I try to scale that out to 12 executors with 4 cores and 8 GB of RAM each (same total RAM, same total cores, just distributed differently), I run into out-of-memory errors:
Container killed by YARN for exceeding memory limits. 8.8 GB of 8.8 GB physical memory used.
I have increased the number of partitions by a factor of 3 to create more (yet smaller) partitions, but this didn't help.
Does anybody have any tips & tricks for trying to scale Spark horizontally?
This is a pretty broad question. Executor sizing in Spark is a very complicated kind of black magic, and the rules of thumb that were correct in, say, 2015 are obsolete now, just as whatever I say will be obsolete in 6 months with the next release of Spark. A lot comes down to exactly what you are doing and to avoiding key skew in your data.
This is a good place to start to learn and develop your own understanding:
https://spark.apache.org/docs/latest/tuning.html
There is also a multitude of presentations on SlideShare about tuning Spark; try to read or watch the most recent ones. Be sceptical of anything older than 18 months, and just ignore anything older than 2 years.
I will make the assumption that you are using at least Spark 2.x.
The error you're encountering is indeed because of poor executor sizing. What is happening is that your executors are attempting to do too much at once, and running themselves into the ground as they run out of memory.
All other things being equal, these are the current rules of thumb as I apply them:
The short version
3-4 virtual (hyper-threaded) cores and 29 GB of RAM is a reasonable default executor size (I will explain why later). If you know nothing else, partition your data well and use that.
You should normally aim for a data partition size (in memory) on the order of ~100 MB to ~3 GB.
The formulae I apply
Executor memory = number of executor cores * partition size * 1.3 (safety factor)
Partition size = (size of data on disk / number of partitions) * deserialisation ratio
The deserialisation ratio is the ratio between the size of the data on disk and the size of data in memory. The Java memory representation of the same data tends to be a decent bit larger than on disk.
You also need to account for whether your data is compressed; many common formats like Parquet and ORC use compression such as gzip or Snappy.
For Snappy-compressed text data (very easily compressed), I use ~10X - 100X.
For Snappy-compressed data with a mix of text, floats, dates, etc., I typically see between 3X and 15X.
number of executor cores = 3 to 4
The number of executor cores depends entirely on how compute- versus memory-intensive your calculation is. Experiment and see what is best for your use case. I have never seen anyone well informed on Spark advocate more than 6 cores per executor.
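To make the formulae concrete, here is a purely hypothetical worked example (none of these figures come from the question): 200 GB of data on disk split into 2,000 partitions with a deserialisation ratio of 5X gives a partition size of 200 GB / 2000 * 5 = 500 MB in memory, comfortably inside the ~100 MB to ~3 GB range above; a 4-core executor then needs roughly 4 * 500 MB * 1.3 ≈ 2.6 GB of heap just for the partitions it is actively processing.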
Spark is smart enough to take advantage of data locality, so the larger your executor, the better the chance that your data is PROCESS_LOCAL.
More data locality is good, up to a point.
When a JVM heap gets too large (> 50 GB), it begins to operate outside what it was originally designed for, and depending on your garbage collection algorithm, you may begin to see degraded performance and high GC time.
https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
There also happens to be a performance trick in Java: if your JVM heap is smaller than 32 GB, it can use 32-bit compressed pointers rather than 64-bit pointers, which saves space and reduces cache pressure.
https://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
It also happens that YARN adds 7% of the executor memory or 384 MB of RAM (whichever is larger) to your executor size as an overhead / safety factor, which is where the 29 GB rule of thumb comes from: 29 GB + 7% ≈ 32 GB.
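To spell that out: a 29 GB executor request keeps the JVM heap itself under the 32 GB compressed-pointer threshold, and YARN's overhead of max(0.07 * 29 GB, 384 MB) ≈ 2 GB brings the whole container to roughly 31 GB, still just below that boundary.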
You mentioned that you are using 12-core, 24 GB RAM executors. This sends up a red flag for me.
Why?
Because every "core" in an executor is assigned one "task" at a time. A task is equivalent to the work required to compute the transformation of one partition from "stage" A to "stage" B.
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskscheduler-tasks.html
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-DAGScheduler-Stage.html
If your executor has 12 cores, then it is going to try to do 12 tasks simultaneously with a 24 GB memory budget. 24 GB / 12 cores = 2 GB per core. If your partitions are larger than 2 GB, you will get an out-of-memory error. If the particular transformation doubles the size of the input (even intermediately), then you need to account for that as well.
I have a few questions with regard to the in-memory feature in Cassandra:
1.) I have a 4-node datacenter, and in OpsCenter, under memory usage, it shows there is 100 GB of in-memory capacity available. Does that mean that each of the 4 nodes has 100 GB available, or is the 100 GB the total in-memory capacity for my datacenter?
2.) If 100 GB really is available for in-memory use in a datacenter, is it advisable to use the full capacity? Do I need to factor in the replication factor as well? Say I have 15 GB of data which I want to store in-memory; if the replication factor is 2, will we then have 30 GB of data in-memory for the datacenter?
3.) In the dse.yaml file, there is a property, "max_memory_to_lock_fraction", whose value is a percentage of system memory, 20% by default. As per the guidelines from DataStax, we need to ensure that in-memory usage does not exceed 45% of the total available system memory on each node. Is this "max_memory_to_lock_fraction" the parameter that needs to be set to 45%?
4.) The DataStax documentation says compression needs to be removed for in-memory tables. If compression is set anyway, will it affect read/write performance?
5.) The output of dsetool inmemorystatus has a parameter called "Current Total memory not able to lock". Does the value of this parameter denote the available memory? Say the value is 1024 MB; does that mean 1 GB of in-memory capacity is still available for use?
I am using DSE version 4.8.11. Please help me, as I am trying to understand this feature so as to leverage it best.
Thanks in advance.
1) It depends on how you configure it: it can be shown per cluster (all of the available memory), or you can view graphs of individual nodes.
2) Yes, the replication factor multiplies the total amount of data by that factor, so your 15 GB at replication factor 2 becomes roughly 30 GB across the datacenter. You will have to factor that in at the cluster level. A very nice tool to help you get started: https://www.ecyrd.com/cassandracalculator/
3) Yes, max_memory_to_lock_fraction is what you are looking for (see the rough example after this list).
4) It will increase processing time, and since writes in Cassandra are actually CPU-bound, that might not be the best idea performance-wise.
5) Yes, this means there is still memory (of the specified amount), but due to the settings Cassandra is unable to lock it.
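To illustrate point 3 with made-up numbers (the question does not state the node size): on a node with 64 GB of RAM, setting max_memory_to_lock_fraction to 0.45 would let DSE lock up to roughly 0.45 * 64 GB ≈ 29 GB for in-memory tables on that node, in line with the 45%-of-system-memory guideline.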
I'm doing some prototyping/benchmarking on Titan, a NoSQL graph database. Titan uses Cassandra as its back-end.
I've got one Titan-Cassandra VM running and two Cassandra VMs.
Each of them owns roughly 33% of the data (replication factor 1).
All the machines have 4GB of RAM and 4 i7 cores (shared).
I'm interested in all adjacent vertices, so I call Rexster (a REST API) with: http://192.168.33.10:8182/graphs/graph/vertices/35082496/both
These are the results (in seconds):
Note that in the two-node test, the setup was exactly the same as described above, except there was one Cassandra node fewer. The two nodes (Titan-Cassandra and Cassandra) each owned 50% of the data.
Titan is fastest with 1 node, and performance tends to degrade when more nodes are added. This is the opposite of what distribution should accomplish, so obviously I'm doing something wrong, right?
This is my Cassandra config:
Cassandra YAML: http://pastebin.com/ZrsRdtuD
Node 2 and node 3 have the exact same YAML file. The only difference is the listen_address (which is equal to the node's IP).
How can I improve this performance?
If you need any additional information, don't hesitate to reply.
Thanks
Imagine we have a Cassandra cluster with 8 nodes, and we have stored 1 million objects, of which 100 are popular.
So there is a high probability that these 100 objects are distributed across all 8 nodes. This means that the whole cluster becomes heavily loaded because of only a few popular objects, doesn't it?
In such a case, increasing the replication of the objects will have a huge effect on the system.
Do you think that in such a situation a higher degree of replication would degrade the system's performance?
How does the replication factor affect the access distribution in such a system?
We have a test cluster of 4 nodes, and we've turned on vnodes. It seems that reading is somewhat slower than with the old method (initial_token). Is there some performance overhead to using vnodes? Do we have to increase/decrease the default num_tokens (256) if we only have 4 physical nodes?
Another scenario we would like to test is to change the num_tokens of the cluster on the fly. Is it possible, or do we have to recreate the whole cluster? If possible, how can we accomplish that?
We're using Cassandra 2.0.4.
It really depends on your application, but if you are running Spark queries on top of Cassandra, then a high number of vnodes can significantly slow down your queries, by at least 2x to 5x. This is because Spark cannot subdivide queries across vnodes, each vnode results in one Spark partition, and a high number of partitions slows down low-latency queries.
The recommended number of vnodes is more like 16. In theory this lets you split a two-node cluster out to a maximum of 32 nodes, which is more than enough of an expansion ratio for most folks.
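To put numbers on that, following the one-Spark-partition-per-vnode reasoning above: your 4-node cluster with the default num_tokens of 256 gives a full-table scan on the order of 4 * 256 = 1024 token ranges to process, whereas num_tokens of 16 would give only 4 * 16 = 64, which is far friendlier to low-latency Spark queries.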