YCSB for Cassandra 3.0 Benchmarking - cassandra

I have a Cassandra cluster running on Ubuntu virtual machines and need to benchmark it.
I am trying to do it with Yahoo's YCSB (without using Maven, if possible).
I use Cassandra 3.0.1, but I can't find a suitable version of YCSB.
I don't want to switch to an older version of Cassandra (the latest YCSB Cassandra binding is for Cassandra 2.x).
What should I do?

As suggested here, although Cassandra 3.x is not officially supported, you can use the cassandra-cql binding.
For instance:
./bin/ycsb load cassandra-cql -threads 4 -P workloads/workloada
I just tested it on Cassandra 3.11.0 and it works for both load and run.
That said, which benchmarking tool to use depends on what you want to test. If you want to benchmark only Cassandra, then @gsteiner's solution might be the best. If you want to benchmark different databases with the same tool to avoid variability, then YCSB is the right one.
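For the YCSB route, the load and run phases can be sketched as follows. This is only a sketch under assumptions: the contact point 127.0.0.1, the YCSB_HOSTS and DRY_RUN variables, and the helper names run and ycsb_phase are made up for illustration. As far as I know, the cassandra-cql binding takes the node address from the hosts property and expects the ycsb keyspace and usertable table to exist already (the binding's README has the schema). By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch only: YCSB load + run against Cassandra via the cassandra-cql binding.
# Assumed (not in the original answer): contact point, variable and helper names.
YCSB_HOSTS="${YCSB_HOSTS:-127.0.0.1}"  # assumed Cassandra contact point
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# $1 is the YCSB phase: "load" inserts the data set, "run" executes the workload.
ycsb_phase() {
  run ./bin/ycsb "$1" cassandra-cql -threads 4 -P workloads/workloada \
    -p hosts="$YCSB_HOSTS"
}

# Load first, then run the workload only if the load succeeded.
ycsb_phase load && ycsb_phase run
```

Run it from the directory where YCSB is unpacked, since the ./bin/ycsb and workloads/ paths are relative.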

I would recommend using cassandra-stress to perform a load/performance test on your Cassandra cluster. It is very customizable: you can test different data models and distributions, and specify how hard you want to push your cluster.
Here is a link to the DataStax documentation, which covers how to use the tool in depth.
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html
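To give an idea of what a cassandra-stress session looks like, here is a small sketch: a write test to populate data, followed by a mixed read/write test. The node address, the operation count, the thread count, and the helper names (run, stress_write, stress_mixed) are assumptions for illustration, not recommended values. The script prints the commands by default; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: a write test followed by a mixed test with cassandra-stress.
# Assumed: node address, n=100000 operations, 50 client threads.
NODE="${NODE:-127.0.0.1}"  # assumed node to connect to
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Insert rows so that later reads have data to hit.
stress_write() {
  run cassandra-stress write n=100000 -rate threads=50 -node "$NODE"
}

# One write for every three reads; the ratio is quoted because of the parentheses.
stress_mixed() {
  run cassandra-stress mixed 'ratio(write=1,read=3)' n=100000 \
    -rate threads=50 -node "$NODE"
}

stress_write && stress_mixed
```

Note that the dry-run output shows the ratio argument without quotes, so quote or escape the parentheses if you paste it into a shell.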

Related

Running nodes with different minor versions of Cassandra in one cluster, and upgrading between them

Excuse me,
Can 3.11.10 nodes be added to the 3.11.4 cluster?
If I want to upgrade from 3.11.4 to 3.11.10, do I need to run upgradesstables?
Thank you!
It is usually not recommended to mix different versions of Cassandra inside a cluster, except while you are performing an upgrade. The reason is that there may be differences in the streaming protocol that is used for bootstrapping or removing nodes and for running repairs. It may be fine for versions within the same release line (3.11), but it makes sense to check the changelog for any changes that may affect streaming.
For an upgrade from 3.11.4 to 3.11.10 you don't need to run upgradesstables. This step is always optional, as SSTables are written in the new format when compaction happens. The usual recommendation to run it explicitly is mostly for cases where you can benefit from better performance with the new file format, or from bug fixes.
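The per-node steps of such a rolling upgrade could look roughly like the sketch below. It assumes a package install controlled through the service command, and the REWRITE_SSTABLES and DRY_RUN variables and the helper names are hypothetical; how you install the new binaries depends on your setup, so that step is left as a comment. The script prints the commands by default; set DRY_RUN=0 to execute them, one node at a time.

```shell
#!/bin/sh
# Sketch only: steps for upgrading ONE node from 3.11.4 to 3.11.10.
# Assumed: Cassandra runs as a service named "cassandra".
DRY_RUN="${DRY_RUN:-1}"  # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

upgrade_node_steps() {
  run nodetool drain                # flush memtables, stop accepting connections
  run sudo service cassandra stop
  # ... install the 3.11.10 binaries here (depends on how Cassandra was installed) ...
  run sudo service cassandra start
  run nodetool version              # once the node is up, confirm the new release
  if [ "${REWRITE_SSTABLES:-0}" = "1" ]; then
    # Optional: rewrite SSTables now instead of waiting for compaction.
    run nodetool upgradesstables
  fi
}

upgrade_node_steps
```

Leaving REWRITE_SSTABLES unset matches the advice above: the rewrite is skipped and compaction takes care of it over time.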

Which Apache cassandra version to use for production

We are exploring apache cassandra and are going to use it for Production soon.
We are going to use mostly Datastax community edition of apache cassandra.
But after reading :
http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
https://www.pythian.com/blog/cassandra-version-production/
With this sentence from above blog “If you don’t mind facing serious bugs and contribute to the development pick 3.x”
I am confused about which version to choose for our production deployment.
I just need to know whether 3.5.0 and 3.0.6 are production ready.
Datastax community : 3.5.0 from http://www.planetcassandra.org/cassandra/
Datastax community : 3.0.6 from
http://www.planetcassandra.org/archived-versions-of-datastaxs-distribution-of-apache-cassandra/
or
Datastax community : 2.2.6 from
http://www.planetcassandra.org/archived-versions-of-datastaxs-distribution-of-apache-cassandra/
The version provided by DataStax is supposed to be stable and production ready. You get an application to monitor your cluster, which is nice if you don't have any ops people who know Cassandra in the first place, and you can pay to get support.
However, you don't get the latest version of Cassandra, and you may miss interesting features.
As for Cassandra 3.x, as said above, you get more features (for example JSON support) and better performance, but if you find a critical bug and can't fix it, you can only file a ticket and hope it gets taken care of quickly. Still, it is production ready and could work well for you.
In conclusion, go for the latest version only if you need a special feature, or if you have the developers in your team to back your choice. Go for Datastax if you want something that works with less effort.

How to change cassandra standalone mode to distributed

I have installed Cassandra 2.1 in standalone mode on two nodes separately.
Is there any way to switch both to distributed mode, that is, to make both nodes part of one cluster?
Please help.
This is what you're looking for: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html
I also suggest taking a look at this hands on training course: https://academy.datastax.com/courses/ds201-cassandra-core-concepts
It's free and definitely worth your time if you're thinking about using Cassandra in production.

For Cassandra kundera.client.lookup.class options

In order to configure kundera for Cassandra, I notice there are 3 possible options for kundera.client.lookup.class as below
com.impetus.client.cassandra.pelops.PelopsClientFactory
com.impetus.kundera.client.cassandra.dsdriver.DSClientFactory
com.impetus.client.cassandra.thrift.ThriftClientFactory
I am not sure of the pros and cons of these three, and hence not sure which one to use. Please help me decide.
I suggest you use com.impetus.client.cassandra.thrift.ThriftClientFactory. It is the implementation that uses just Cassandra's Thrift API.
PelopsClient is not in active development.
DSClient is built on top of the DataStax driver for Cassandra.
There is no real advantage of DSClient over ThriftClient, or vice versa.
After further research, I found the following
Don't use PelopsClient, as it's not in active development, as mentioned by @karthik, but more importantly because of the issue reported here.
The DataStax driver is better than the Thrift client, as it overcomes a few limitations of Thrift and uses a different, Cassandra-specific binary protocol that gives better performance. Refer to "Datastax java driver support for Cassandra using Kundera".

What is the best way to test Cassandra applications?

I am currently using Achilles Embedded to spin up a local, temporary Cassandra instance and test my functionality there. While this works to some extent, there seems to be a memory leak: the more tests I run, the more I see messages like PS Scavenge GC in xx ms, and my system slows to a crawl, even freezing the mouse pointer.
So, is there a better way to automatically spin up a small Cassandra instance to run my tests against?
The tool I use for quickly creating a local Cassandra cluster is the ccm (Cassandra Cluster Manager) utility. You can easily create a multi-node cluster on your local machine for any release. See more information here.
I believe some of the Cassandra developers use ccm for their development work, so ccm is kept up to date with the newest releases.
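A typical ccm session around a test run could be sketched like this. The cluster name test_cluster, the version 3.11.10, the three-node size, and the helper names are all assumptions for illustration; pick whatever matches your application. The script prints the commands by default; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: create a throwaway local cluster with ccm, then remove it.
# Assumed: cluster name, Cassandra version and node count.
CCM_CLUSTER="${CCM_CLUSTER:-test_cluster}"  # hypothetical cluster name
CCM_VERSION="${CCM_VERSION:-3.11.10}"       # assumed Cassandra release
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Create a 3-node cluster (-n 3), start it right away (-s), then show node states.
ccm_up() {
  run ccm create "$CCM_CLUSTER" -v "$CCM_VERSION" -n 3 -s && run ccm status
}

# Stop the current cluster and delete all of its data.
ccm_down() {
  run ccm remove
}

ccm_up
# ... run your test suite against the local nodes here ...
ccm_down
```

Because the cluster is removed at the end, every test run starts from a clean state, which avoids the buildup described in the question.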
I agree, you can use CCM. If you have a test cluster, try using the cassandra-stress tool (either standalone or with a YAML profile). If I understand your question correctly, it will solve your problem.

Resources