What compaction strategy would perform better for range queries on a clustering column in Cassandra?

I have Cassandra table
CREATE TABLE schema1 (
key bigint,
lowerbound bigint,
upperbound bigint,
data blob,
PRIMARY KEY (key, lowerbound, upperbound)
) WITH COMPACT STORAGE;
I want to perform a range query using CQL:
SELECT lowerbound, upperbound FROM schema1 WHERE key = (some key) AND lowerbound <= 123 ORDER BY lowerbound DESC LIMIT 1 ALLOW FILTERING;
Any suggestions regarding the compaction strategy, please?
Note: my read:write ratio is 1:1.

Size-tiered compaction is the default, and should be appropriate for most use-cases. In 2012 DataStax posted an article titled When To Use Leveled Compaction, in which it specified three (main) conditions for which leveled compaction was a good idea:
High Sensitivity to Read Latency (your queries need to meet a latency SLA in the 99th percentile).
High Read/Write Ratio
Rows Are Frequently Updated
It also identifies three scenarios when leveled compaction is not a good idea:
Your Disks Can’t Handle the Compaction I/O
Write-heavy Workloads
Rows Are Write-Once
Note how none of the six scenarios I mentioned above are specific to range queries.
My question would be: "what problem are you trying to fix?" You mentioned "performing better," but I have found that query performance issues tend to be tied more to data model design than to compaction. Switching the compaction strategy won't help much if you're running with an inefficient primary key strategy, and the fact that your query requires ALLOW FILTERING suggests the data model is where the real problem lies.
The DataStax docs contain a section on Slicing over partition rows, which appears to be somewhat similar to your query. Give it a look and see if it helps.

Leveled compaction means fewer SSTables are involved in your queries on a key, but it requires extra I/O. Also, during compaction it needs about 10% of extra disk space on top of your data, whereas with size-tiered compaction you need double. Which is better depends on your setup, queries, etc. Are you experiencing performance problems? If not, and if I could handle the extra I/O, I might choose leveled, as it means I don't have to keep 50+% of headroom in terms of disk space for compaction. But again, there's no "one right way".
Perhaps read this:
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
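If you do decide to try leveled compaction, the switch itself is a single table change. Here is a minimal sketch against the schema1 table above; sstable_size_in_mb is a standard LCS sub-option, and the 160 MB value is just the common default rather than anything from the question:
ALTER TABLE schema1
WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': '160'
};
Keep in mind that after changing the strategy, Cassandra gradually recompacts the existing SSTables into levels in the background, so expect some extra I/O for a while.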

When Rows Are Frequently Updated
From the DataStax article:
Whether you're dealing with skinny rows where columns are overwritten frequently (like a "last access" timestamp in a Users column family) or wide rows where new columns are constantly added, when you update a row with size-tiered compaction, it will be spread across multiple SSTables. Leveled compaction, on the other hand, keeps the number of SSTables that the row is spread across very low, even with frequent row updates.

Related

Cassandra: Is manual bucketing still needed when applying TWCS?

I am just about to start exploring Cassandra for saving (long-term) time-series data (written only once) that can potentially grow quite large.
Assuming the probably most simple time series:
CREATE TABLE raw_data (
sensor uuid,
timestamp timestamp,
value int,
primary key(sensor, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
To make sure partitions don't grow too large, many posts on the internet recommend bucketing, e.g. introducing a day component or simply an up-counting bucket number, like
primary key((sensor, day, bucket), timestamp)
However, these strategies need to be managed manually, which seems quite cumbersome, especially for an unknown number of buckets.
But what if I, say, add:
AND compaction = {
'class': 'TimeWindowCompactionStrategy',
'compaction_window_size': 1,
'compaction_window_unit': 'DAYS'
};
As said e.g. in https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html:
TWCS aims at simplifying DTCS by creating time windowed buckets of SSTables that are compacted with each other using the Size Tiered Compaction Strategy.
As far as I understand, this means that Cassandra, when using TWCS, internally creates read-only buckets anyway. So I am wondering whether I still need to manually implement the day bucketing key?
The purpose of the bucket is to stop the partition growing too large. Without the bucket the growth of the partition is unbounded - that is, the more data you collect for a particular sensor, the larger the partition becomes, with no ultimate limit.
Changing the compaction strategy alone will not stop growth of the partition, so you would still need the bucket.
(You wrote "Cassandra when using TWCS internally creates readonly buckets". Don't confuse this with the 'bucket' column. The same word is being used for two completely different things.)
On the other hand, if you were to set a TTL on the data, then this would effectively limit the size of the partition, because data older than the TTL would (eventually) be deleted from disk. So, if the TTL were small enough, you would no longer need the bucket. In this particular scenario - time-series data collected in order with a TTL - TWCS is the optimal compaction strategy.
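A minimal sketch of that TTL-plus-TWCS variant, using the raw_data schema from the question; the 30-day default_time_to_live is only an example retention period chosen for illustration:
CREATE TABLE raw_data (
sensor uuid,
timestamp timestamp,
value int,
PRIMARY KEY (sensor, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC)
AND default_time_to_live = 2592000  -- 30 days in seconds; expired rows are eventually purged
AND compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_size': '1',
  'compaction_window_unit': 'DAYS'
};
Because the TTL caps how much data a single sensor's partition can accumulate, the partition size stays bounded without a manual bucket column, which is exactly the scenario described above.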

Why Cassandra COUNT(*) on a specific partition takes really long on relatively small datasets

I have a table defined like:
Keyspace:
CREATE KEYSPACE messages WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
Table:
CREATE TABLE messages.textmessages (
categoryid int,
date timestamp,
messageid timeuuid,
message text,
userid int,
PRIMARY KEY ((categoryid, date), messageid)
) WITH CLUSTERING ORDER BY (messageid ASC);
The goal is to have wide-row time-series storage such that categoryid and date (beginning of day) constitute my partition key and messageid provides the clustering. This enables me to do queries like:
SELECT * FROM messages.textmessages WHERE categoryid=2 AND date='2019-05-14 00:00:00.000+0300' AND messageId > maxTimeuuid('2019-05-14 00:00:00.000+0300') AND messageId < minTimeuuid('2019-05-15 00:00:00.000+0300')
to get the messages for a given day; it works well and it's fast!
Problem
I need to be able to count the messages in a given day by substituting SELECT COUNT(*) for the SELECT * above. This takes very long even with slightly fewer than 100K entries in the column family; it actually times out in cqlsh.
I have read and understood quite a bit why COUNT is an expensive operation for a distributed database like Cassandra in Counting keys? Might as well be counting stars
Question
Why would this query take so long even when:
SELECT COUNT(*) FROM messages.textmessages WHERE categoryid=2 AND date='2019-05-14 00:00:00.000+0300' AND messageId > maxTimeuuid('2019-05-14 00:00:00.000+0300') AND messageId < minTimeuuid('2019-05-15 00:00:00.000+0300')
The count is on a specific partition with less than 100K records
I have only one Cassandra node on a performant Macbook Pro
No active writes/reads in the instance; less than 20 partitions on development laptop
This is caused by a common pitfall: overlooking the 'everything is a write' concept in Cassandra, which is also why tombstones happen.
When executing a scan, within or across a partition, we need to keep the tombstones seen in memory so we can return them to the coordinator, which will use them to make sure other replicas also know about the deleted rows. With workloads that generate a lot of tombstones, this can cause performance problems and even exhaust the server heap.
Thanks to #JimWartnick's suggestion about possible tombstone-related latency: this was caused by an overwhelming amount of tombstones generated by my inserts, which had NULL fields. I did not expect this to cause tombstones, nor did I expect tombstones to be such a big deal for query performance, especially for COUNT.
Solution
Use unset values for fields that are not present, or omit the columns altogether in the inserts/updates (see the sketch after this list)
Be cognisant of the facts below, as outlined in Common Problems with Cassandra Tombstones by Alla Babkina:
One common misconception is that tombstones only appear when the client issues DELETE statements to Cassandra. Some developers assume that it is safe to choose a way of operations which relies on Cassandra being completely tombstone free. In reality there are other many other things causing tombstones apart from issuing DELETE statements. Inserting null values, inserting collections and expiring data using TTL are common sources of tombstones.
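To illustrate the NULL pitfall with the textmessages table above (the literal values are just examples): the first insert binds an explicit null and therefore writes a cell tombstone for userid, while the second simply omits the column and writes none.
INSERT INTO messages.textmessages (categoryid, date, messageid, message, userid)
VALUES (2, '2019-05-14 00:00:00.000+0300', now(), 'hello', null);  -- explicit null creates a tombstone for userid
INSERT INTO messages.textmessages (categoryid, date, messageid, message)
VALUES (2, '2019-05-14 00:00:00.000+0300', now(), 'hello');  -- column omitted, no tombstone written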

Optimal way to insert highly-duplicated data into Cassandra

I have a set-like table:
It consists of 2 primary columns and a dummy boolean non-primary column.
The table is replicated.
I write massively into this table and very often the entry already exists in the database.
Deletion of entries happens due to TTL and sometimes (not so often) due to DELETE queries.
What is the most performant way to write values into this table?
First option:
Just blindly write values.
Second option:
Check if the value already exists and write only if it is missing.
The second approach requires one more lookup before each write but saves database capacity because it doesn't propagate unnecessary writes to the other replicas.
I would go with option 1, and then tune the compaction strategy. Option 2 will add much more load to the cluster, as reads are always slower than writes, and if in your case new inserts arrive while the previous data is still in the memtable, they will simply be overwritten in place (so you may consider tuning the memtable as well).
If you have a high read/write ratio, you can go with leveled compaction - it may be better optimized for this use case. If the ratio isn't very high, leave the default compaction strategy.
But in any case you'll need to tune compaction (see the sketch after this list):
decrease gc_grace_seconds to an acceptable value, depending on how fast you can bring back nodes that are down;
change table options like tombstone_compaction_interval, and maybe unchecked_tombstone_compaction.
You may also tune things like concurrent_compactors & compaction_throughput_mb_per_sec to perform more aggressive compactions.
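As a concrete illustration of those table-level settings, here is a minimal sketch; the keyspace/table name myks.settable and all values are hypothetical examples you would adjust to your own recovery window and workload:
ALTER TABLE myks.settable
WITH gc_grace_seconds = 43200  -- e.g. 12 hours; only safe if down nodes can be repaired or replaced within this window
AND compaction = {
  'class': 'LeveledCompactionStrategy',       -- or keep SizeTieredCompactionStrategy if the read/write ratio isn't high
  'tombstone_compaction_interval': '86400',   -- consider an SSTable for a tombstone compaction after one day
  'unchecked_tombstone_compaction': 'true'    -- run single-SSTable tombstone compactions without the usual pre-check
};
Note that concurrent_compactors and compaction_throughput_mb_per_sec are node-level settings (cassandra.yaml, or nodetool setcompactionthroughput for the latter), not table options.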

cassandra blobs, tombstones and space reclamation

I'm trying to understand how quickly space is reclaimed in Cassandra after deletes. I've found a number of articles that describe tombstoning and the problems this can create when you are doing range queries and Cassandra has to scan through lots of tombstoned rows to find the much more scarce live ones. And I get that you can't set gc_grace_seconds too low or you will have zombie records that can pop up if a node goes offline and comes back after the tombstones disappeared off the remaining machines. That all makes sense.
However, if the tombstone is placed on the key, then it should be possible for the space from the rest of the row data to be reclaimed.
So my question is, for this table:
create table somedata (
category text,
id timeuuid,
data blob,
primary key ((category), id)
);
If I insert and then remove a number of records in this table and take care not to run into the tombstone+range issues described above and at length elsewhere, when will the space for those blobs be reclaimed?
In my case, the blobs may be larger than the recommended size (1mb I believe) but they should not be larger than ~15mb, which I think is still workable. But it makes a big space difference if all of those blobs stick around for 10 days (default gc_grace_seconds value) vs if only the keys stick around for 10 days.
When I looked I couldn't find this particular aspect described anywhere.
The space will be reclaimed after gc_grace_seconds has elapsed, and until then you will have both the keys and the blobs sticking around. You'll also need to consider that this may increase if you have updates (which will be different versions of the same record, identified by the timestamp of when they were created) and depending on the replication factor used (the number of copies of the same record distributed across the nodes).
You will always have trade-offs between fault resilience and disk usage; the tuning of your settings (gc_grace_seconds, TTL, replication factor, consistency level) will depend on your use case and the SLAs you need to fulfill.
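If reclaiming blob space sooner matters more to you than a long repair window, one option is to lower gc_grace_seconds and rely on TTLs instead of explicit DELETEs. A sketch with example values, assuming you can finish repairs within the shorter window:
ALTER TABLE somedata
WITH gc_grace_seconds = 86400;  -- 1 day instead of the 10-day default; repairs must complete within this window
INSERT INTO somedata (category, id, data)
VALUES ('images', now(), 0xcafebabe)
USING TTL 604800;  -- expires after 7 days; the blob's space is freed once the expired data is compacted away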

Cassandra repair - lots of streaming in case of incremental repair with Leveled Compaction enabled

I use Cassandra for gathering time-series measurements. To enable nice partitioning, besides device-id I added day-from-UTC-beginning and a bucket computed from the written measurement. The time is added as a clustering key. The final key can be written as
((device-id, day-from-UTC-beginning, bucket), measurement-uuid)
Queries against this schema in the majority of cases fetch whole rows for a given device-id and day-from-UTC-beginning, using IN for the buckets. Because of this query pattern, Leveled Compaction looked like a perfect match, as it ensures with great probability that a row is held by one SSTable.
Running incremental repair was fine when appending to the table was disabled. Once the repair was run under write pressure, lots of streaming was involved. It looked like more data was streamed than had been appended since the last repair.
I've tried using multiple tables, one for each day. When a day ended and no further writes were made to a given table, repair ran smoothly. I'm aware of the overhead of thousands of tables, though it looks like the only feasible solution.
What's the correct way of combining Leveled Compaction with incremental repairs under heavy write scenario?
Leveled Compaction is not a good idea when you have a write-heavy workload. It is better for a mixed read/write workload where read latency matters. Also, if your cluster is already pressed for I/O, switching to leveled compaction will almost certainly only worsen the problem, so make sure you have SSDs.
At this time, size-tiered is the better choice for a write-heavy workload. There are some improvements for this in 2.1, though.
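If you do move the measurement table back to size-tiered, it is again a single table change; a minimal sketch, assuming the table is named measurements (min_threshold and max_threshold are shown at their defaults):
ALTER TABLE measurements
WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'min_threshold': '4',   -- compact once 4 similarly sized SSTables accumulate
  'max_threshold': '32'   -- never compact more than 32 SSTables at once
};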
