Cassandra compaction strategy for data that is updated frequently during the day - cassandra

In my use case I have data that will be updated frequently during one day after its insertion and after that will be used rarely for reads only. What would be the best compaction strategy for that? TWCS or DTCS?

TWCS was created because DTCS requires a lot of tuning and operational care in order to achieve & maintain good performance. TWCS provides a similar level of performance to DTCS and is much easier to work with, so it is definitely the one to use for 99% of cases where time-series data is involved and there will be no inserts/updates after the first window.
Take a look at CASSANDRA-9666 for the details.

Twcs is the best compaction strategy for time serious data if you have frequent updates with less number of reads .
http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

Related

How should I choose parameters for size tiered compaction strategy?

I have these 2 particular use cases:
Streaming jobs, writing 30mb every 5 seconds
Batch jobs, writing 500 gb every morning
The TTL of my tables in 1,5 years.
These writes can contain many updates, so, according to this table right here:
I should use the SizeTieredCompactionStrategy. However, how do I choose the correct parameters for it?
It has several parameters:
bucket_high
bucket_low
min_sstable_size
min_threshold
max_threshold
As a general proposition, it is very rare for operators to have to configure the size-tiered compaction sub-properties.
Unless you're very experienced with Cassandra, there just isn't any reason to reconfigure the defaults for STCS. That is why it is default compaction strategy out-of-the-box and is suitable for majority of workloads.
The exceptions are using TWCS for true time-series use cases and LCS for very read-heavy with hardly any writes. Cheers!

What is the difference between scylla read path and cassandra read path?

What is the difference between Scylla read path and Cassandra read path? When I stress Cassandra and Scylla then Scylla read performance poor by 5 times than Cassandra using 16 core and normal HDD.
I expect better read performance on Scylla compared to Cassandra using normal HDD, because my company doesn't provide SSD's.
Can someone please confirm, is it possible to achieve better read performance using normal HDD or not?
If yes, what changes required scylla config?. Please guide me!
Some other responses focused on write performance, but this isn't what you asked about - you asked about reads.
Uncached read performance on HDDs is bound to be poor in both Cassandra and Scylla, because reads from disk each requires several seeks on the HDD, and even the best HDD cannot do more than, say, 200 of those seeks per second. Even with a RAID of several of these disks, you will rarely be able to do more than, say, 1000 requests per second. Since a modern multi-core can do orders of magnitude more CPU work than 1000 requests per second, in both Scylla and Cassandra cases, you'll likely see free CPU. So Scylla's main benefit, of using much less CPU per request, will not even matter when the disk is the performance bottleneck. In such cases I would expect Scylla's and Cassandra's performance (I am assuming that you're measuring throughput when you talk about performance?) should be roughly the same.
If, still, you're seeing better throughput from Cassandra than Scylla, there are several details that may explain why, beyond the general client mis-configuration issues raised in other responses:
If you have low amounts of data, that can fit in memory, Cassandra's caching policy is better for your workload. Cassandra uses the OS's page cache, which reads whole disk pages and may cache multiple items in one read, as well as multiple index entries. While Scylla works differently, and has a row cache - only caching the specific data read. Scylla's caching is better for large volumes of data that do not fit in memory, but much worse when the data can fit in memory, until the entire data set has been cached (after everything is cached, it becomes very efficient again).
On HDDs, the details of compaction are very important for read performance - if in one setup you have more sstables to read, it can increase the number of reads and lower the performance. This can change depending on your compaction configuration, or even randomly (depending on when compaction was run last). You can check if this explains your performance issues by doing a major compaction ("nodetool compact") on both systems and checking the read performance afterwards. You can switch the compaction strategy to LCS to ensure that random-access read performance is better, at the cost of more write work (on HDDs, this can be a worthwhile compromise).
If you are measuring scan performance (reading an entire table) instead of reading individual rows, other issues become relevant: As you may have heard, Scylla subdivides each nodes into shards (each shard is a single CPU). This is fantastic for CPU-bounded work, but could be worse for scanning tables which aren't huge, because each sstable is now smaller and the amount of contiguous data you can read before needing to seek again is lower.
I don't know which of these differences - or something else - is causing performance of your use-case to be lower in Scylla, but I please keep in mind that whatever you fix, your performance is always going to be bad with HDDs. With SDDs, we've measured in the past more than a million random-access read requests per second on a single node. HDDs cannot come anything close. If you really need optimum performance or performance per dollar, SDDs are really the way to go.
There can be various reasons why you are not getting the most out of your Scylla Cluster.
Number of concurrent connections from your clients/loaders is not high enough, or you're not using sufficient amount of loaders. In such case, some shards will be doing all the work, while others will be mostly idle. You want to keep your parallelism high.
Scylla likes have a minimum of 2 connections per shard (you can see the number of shards in /etc/scylla.d/cpuset.conf)
What's the size of your dataset? Are you reading a large amount of partitions or just a few? You might be hitting a hot partition situation
I strongly recommend reading the following docs that will provide you more insights:
https://www.scylladb.com/2019/03/27/best-practices-for-scylla-applications/
https://docs.scylladb.com/operating-scylla/benchmarking-scylla/
#Sateesh, I want to add to the answer by #TomerSan that both Cassandra and ScyllaDB utilize the same disk storage architecture (LSM). That means that they have relatively the same disk access patterns because the algorithms are largely the same. The LSM trees were built with the idea in mind that it is not necessary to do instant in-place updates. It consists of immutable data buckets that are large continuous pieces of data on disk. That means less random IO, more sequential IO for which the HDD works great (not counting utilized parallelism by modern database implementations).
All the above means that the difference that you see, is not induced by the difference in how those databases use a disk. It must be related to the configuration differences and what happens underneath. Maybe ScyllaDB tries to utilize more parallelism or more aggressively do compaction. It depends.
In order to be able to say anything specific, please share your tests, envs, and configurations.
Both databases use LSM tree but Scylla has thread-per-core architecture on top plus we use O_Direct while C* uses the page cache. Scylla also has a sophisticated IO scheduler that makes sure not to overload the disk and thus scylla_setup runs a benchmark automatically to tune. Check your output of it in io.conf.
There are far more things to review, better to send your data to the mailing list. In general, Scylla should perform better in this case as well but your disk is likely to be the bottleneck in both cases.
As a summary I would say Scylladb and cassandra have the same read / write path
memtable, commitlog, sstable.
However implementation is very different:
- cassandra rely on OS for low level IO and network (most DBMS does)
- scylladb rely on its own lib (seastar) to handle IO and network at a low level independently from OS page cache etc. This is why they can provide feature such as workload scheduling within the same cluster that would be very hard to implement in cassandra.

Is RDBMS with redundancy as good as nosql dbs?

I am reading about NoSQL DBs (Specifically Cassandra) and It says that Cassandra is faster for writing and queries are fast as well. Schema design is done more based on queries than based on data. For example, You have queries like in this example
then I have a question, Suppose I design the RDBMS schema similar to Cassandra's way and I ensure that no joins are required for queries. Will I get any significant performance gains still by using Cassandra(NoSql DBs)?
Cannot have an exact answer but few points,
JOIN is just of the many things - Cassandra stores the data physically based on the partition keys and hence making the read by partition as fast as possible.
On the performance side - its not about the performance at the beginning but keeping the performance consistent over a period of time. Say for example you have a time series like requirement where data is inserted every second, RDBMS performance will usually degrade as the data grows and not easy to keep up the index and stats up to date etc, while cassandra will fit better for a time series pattern and as the data grows its easy to scale up by adding nodes.
On the write performance - Cassandra's write workflow itself is different and is designed in a way to take up faster (the complicated process like merging sstabls, compaction etc happens in the background without affecting the actual write).
In short - you need to review the business case and make decision.

Compaction = LeveledCompactionStrategy and Counter ColumnFamily

I consider a counter ColumnFamily. Since it holds only counters, I expect to see a large number of updates in this table.
Following http://www.datastax.com/dev/blog/when-to-use-leveled-compaction, I consider using compaction = LeveledCompactionStrategy.
Is this a good idea? If yes, I would have expected counter ColumnFamilies to have compaction=LeveledCompactionStrategy by default, which seems not to be the case.
All tables (ColumnFamilies) are set to size tiered compaction strategy by default. There is no distinction based on the type. This is due to the fact that choosing LCS requires thoughtful consideration. The url you link to is the standard for determining whether or not you should enable LCS. In general, it is not a good idea to do so unless your storage is SSDs or if your use case is write-heavy.
As is mentioned in the article, you can test the impact of LCS using write-survey mode. I would urge you to make use of this feature before making the switch.

Cassandra repair - lots of streaming in case of incremental repair with Leveled Compaction enabled

I use Cassandra for gathering time series measurements. To enable nice partitioning, beside device-id I added day-from-UTC-beginning and a bucket created on the basis of a written measurement. The time is added as a clustering key. The final key can be written as
((device-id, day-from-UTC-beginning, bucket), measurement-uuid)
Queries against this schema in majority of cases take whole rows with the given device-id and day-from-UTC-beginning using IN for buckets. Because of this query schema Leveled Compaction looked like a perfect match, as it ensures with great probability that a row is held by one SSTable.
Running incremental repair was fine, when appending to the table was disabled. Once, the repair was run under the write pressure, lots of streaming was involved. It looked like more data was streamed than was appended after the last repair.
I've tried using multiple tables, one for each day. When a day ended and no further writes were made to a given table, repair was running smoothly. I'm aware of thousands of tables overhead though it looks like it's only one feasible solution.
What's the correct way of combining Leveled Compaction with incremental repairs under heavy write scenario?
Leveled Compaction is not a good idea when you have a write heavy workload. It is better for a read/write mixed workload when read latency matters. Also if your cluster is already pressed for I/O, switching to leveled compaction will almost certainly only worsen the problem. So ensure you have SSDs.
At this time size tiered is the better choice for a write heavy workload. There are some improvements in 2.1 for this though.

Resources