Performance testing in Cassandra

I'm currently making some improvements to Apache Cassandra 1.2.8, and I want to do some performance testing on the database. What is the best way to performance-test this kind of NoSQL database? Are there any tools or standards we can use for performance testing?

Check out YCSB. While it is not a standard, it has been used with quite a few products, including Cassandra.

Related

JanusGraph backend: Cassandra vs BigTable

I am planning to use JanusGraph to build a graph of the different uses our team handles, and I see that JanusGraph can use either BigTable or Cassandra as its storage backend. I am looking for a recommendation on which backend performs better with JanusGraph (I mainly care about Gremlin query performance on the 2-hop neighbors of a node).
I understand that performance is pretty subjective and varies with data size, graph connectivity, and use case, so the best approach is to try it out myself, which I am planning to do. But has anyone else done a similar performance comparison? Is there any general recommendation about the storage backend here?
You're right that performance is both subjective and largely dependent on data size.
I can tell you that I have done this exercise as well. To that end, I think it's important to share this comparison from DB-Engines.com.
In terms of performance, the biggest thing I'd be looking at is how each handles consistency. As a general rule, databases which enforce stronger levels of consistency typically have to sacrifice performance.
BigTable == strong-consistent
Cassandra == eventually consistent
Another factor worth considering is that BigTable limits you to Google Cloud (GCP). And if you don't want to lose performance over the network, you'll also need to pay for more (Janus) instances on GCP for data locality.
In terms of raw DB-Engines "score," Cassandra is currently at 114.112, while BigTable is at a paltry 3.582. These scores change month to month, but in general this signifies that Cassandra has a much stronger community around it. Similarly, Cassandra has 18182 questions on this site, while BigTable only has 449. The bottom line is that it'll be much easier to get support and answers to questions about Cassandra.
Just based on the underlying strength of the community, Cassandra is the better option here.
Having supported JanusGraph on Cassandra for the last few years, I can tell you that overall it's been solid. The difficulties tend to come into play with bulk data loading. But outside of that, things seem to run pretty well.

Performance Difference between Apache Ignite compute grid and Spark

I need to do some computation on a large set of data. In this computation, data will be manipulated and stored (persisted) to Database.
Can anyone suggest which technology is preferable in terms of performance and resource utilization?
The best way here is to implement your logic and check how it works with both frameworks. Also, you can run different benchmarks in your environment.
You can read about benchmarks for Apache Ignite here:
https://apacheignite.readme.io/docs/perfomance-benchmarking
If you have any technical questions, you can ask the community.

DF.write.mode("append") at scale

Is the Spark SQL family of APIs for writing to a database, like this:
DF.write.mode("append").jdbc(url, table, prop)
able to work at scale?
Or is there a point at which Sqoop should be used instead?
In general, writes over JDBC are limited by the capabilities of the destination system. JDBC connectors are typically not designed for batch data migrations, and the majority of vendors have their own platform-specific bulk insert tools.
The specific write mode, such as append, has little or no impact.
And as always, if you have questions about the performance implications of a specific choice, it's best to benchmark it yourself on the platform you use, with data that reflects the properties of the real input, and with resources comparable to the ones you have at your disposal in production.
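As a rough illustration of benchmarking it yourself, here is a minimal Python sketch that times how long it takes to push a set of rows through a write function at different batch sizes. The names `time_batched_write` and `fake_write_batch` are hypothetical, and `fake_write_batch` is only a stand-in (fixed per-call overhead plus a small per-row cost) for whatever JDBC or bulk insert call you actually use.

```python
import statistics
import time


def time_batched_write(write_batch, rows, batch_size, repeats=3):
    """Return the median wall-clock seconds to push all rows in batches."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(0, len(rows), batch_size):
            write_batch(rows[i:i + batch_size])
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def fake_write_batch(batch):
    # Hypothetical stand-in for a real insert call (assumption, not a real API):
    # a fixed per-call overhead plus a small per-row cost.
    time.sleep(0.001 + 0.00001 * len(batch))


if __name__ == "__main__":
    rows = list(range(2000))
    for batch_size in (10, 100, 1000):
        seconds = time_batched_write(fake_write_batch, rows, batch_size)
        print(f"batch_size={batch_size}: {seconds:.3f}s")
```

Swapping `fake_write_batch` for a real write against a test copy of the destination table gives numbers that reflect your own platform rather than someone else's.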

Best way to benchmark Cassandra and HBase for performance?

What's the best way to benchmark Cassandra and HBase for performance?
I'm working on an application where usage through a web application is about 80% reads and 20% writes. Users can also perform CRUD (Create, Read, Update, Delete) operations on the data. Our data is all structured (it comes from an RDBMS). I have heard about YCSB (Yahoo! Cloud Serving Benchmark).
Has anyone benchmarked Cassandra vs HBase for a similar use case?
I assume your Cassandra is sitting behind a web app?
If so (as you mentioned CRUD), just benchmark the endpoints of your CRUD for writes (the Create) and reads with ApacheBench (ab) or Siege under load (i.e. concurrent calls, etc.).
Update
If you purely want to test whether your Cassandra configuration is right for raw power:
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html
But if you want to test the application as a whole, ApacheBench and Siege will test your app.
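To make "under load" concrete, here is a minimal Python sketch of the kind of measurement such load tools perform: a fixed number of calls issued by a pool of concurrent workers, with latency summarized afterwards. It is not a replacement for those tools. `call_endpoint` is a hypothetical stand-in that just sleeps; in a real test it would issue an HTTP request to one of your CRUD endpoints.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def call_endpoint(i):
    # Hypothetical stand-in for one HTTP request to a CRUD endpoint.
    time.sleep(0.002)
    return 200


def timed_call(fn, arg):
    """Run fn(arg) and return (status, elapsed seconds)."""
    start = time.perf_counter()
    status = fn(arg)
    return status, time.perf_counter() - start


def load_test(fn, total_requests, concurrency):
    """Issue total_requests calls from `concurrency` workers and summarize latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda i: timed_call(fn, i), range(total_requests)))
    latencies = sorted(elapsed for _, elapsed in results)
    p95_index = max(0, int(len(latencies) * 0.95) - 1)
    return {
        "requests": len(results),
        "ok": sum(1 for status, _ in results if status == 200),
        "median_s": statistics.median(latencies),
        "p95_s": latencies[p95_index],
    }


if __name__ == "__main__":
    print(load_test(call_endpoint, total_requests=200, concurrency=20))
```

Running it at several concurrency levels shows where latency starts to climb, which is usually more informative than a single average.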
Most databases provide some tool for performance testing. In my opinion, the best way to get an unbiased view is to use a third-party tool like https://github.com/brianfrankcooper/YCSB, which supports testing different types of ACID and NoSQL databases.
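YCSB workloads are defined largely by their operation mix (its core workload files use properties such as `readproportion` and `updateproportion`). The idea behind an 80% read / 20% write mix like the one in the question can be sketched in a few lines of Python. This is only an illustration against an in-memory dict, not YCSB itself; `run_mixed_workload` and its parameters are hypothetical names.

```python
import random
from collections import Counter


def run_mixed_workload(store, operations, read_fraction=0.8, seed=42):
    """Apply a random read/update mix to `store` and count each operation type."""
    rng = random.Random(seed)
    keys = list(store)
    counts = Counter()
    for _ in range(operations):
        key = rng.choice(keys)
        if rng.random() < read_fraction:
            _ = store[key]  # read path
            counts["read"] += 1
        else:
            store[key] = rng.randrange(1_000_000)  # update path
            counts["update"] += 1
    return counts


if __name__ == "__main__":
    store = {f"user{i}": i for i in range(1000)}
    print(run_mixed_workload(store, operations=10_000))
```

Replacing the dict reads and writes with client calls to each database under test keeps the operation mix identical across systems, which is the property that makes the comparison fair.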

How to optimize a Sybase ASE database?

What are the tricks to optimize a Sybase database?
What are the dos and don'ts?
Your question seems rather broad and open-ended.
For performance tuning guidelines across the entire product, I would probably start with the several performance tuning books that are in the online documentation.
Ongoing performance optimization can often include monitoring by 3rd party products such as Confio's Ignite (I don't work for them, but it is impressive software).
