Cassandra UUID partition key and partition size

Given a table
CREATE TABLE sensors_by_id (
id uuid,
time timeuuid,
some_text text,
PRIMARY KEY (id, time)
);
Will this scale when there are a lot of entries? I'm not sure whether a UUID column is sufficient as a partition key, or whether I need to create some artificial key like week_first_day or something similar.

It really depends on how you insert your data: if you generate the UUID randomly for every insert, the chance of duplicates is very low and you'll get so-called "skinny rows" (a lot of partitions with one row each). Even if you do start to get duplicates, there will not be many rows per partition...

Partition size could become a problem, because Cassandra has practical limits on the disk size of a single partition.
A good rule of thumb is to keep the number of rows per partition below 100,000 and the partition size on disk under 100 MB.
It is easy to estimate the partition size with the usual formula:
number of values = rows * (columns - primary key columns - static columns) + static columns
partition size on disk = size of partition key columns + size of static columns + (rows * average row size) + (8 bytes of metadata * number of values)
You can read more about data modeling here.
So in your case, with the current schema, 1,000,000 rows per partition and an average size of 100 bytes for the some_text column, you get:
Number of values: 1,000,000 * (3 - 2 - 0) + 0 = 1,000,000
Partition size on disk: 16 + 0 + (1,000,000 * 116) + (8 * 1,000,000) = 124,000,016 bytes (118.26 MB)
So, as you can see, at 118.26 MB per partition you are over the limit, and you need to optimize your partition key.
I calculated it using my open source project - cql-calculator.
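One common fix, shown here only as a sketch (the week_first_day bucket column is borrowed from the question and is an assumption, not part of the original schema), is to add a time bucket to the partition key so each sensor's data is split across bounded partitions:
CREATE TABLE sensors_by_id_and_week (
id uuid,
week_first_day date, -- bucket column: first day of the week the reading falls into
time timeuuid,
some_text text,
PRIMARY KEY ((id, week_first_day), time)
);
-- reads then always name the bucket, for example:
SELECT some_text FROM sensors_by_id_and_week
WHERE id = 123e4567-e89b-12d3-a456-426614174000
AND week_first_day = '2018-01-01';
How large each bucket grows depends on the insert rate, so the bucket width (week, day, etc.) should be chosen using the same size formula as above.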


How do I calculate the size of a Cassandra Partition?

CREATE TABLE IF NOT EXISTS video (key int, value int, PRIMARY KEY (key, value));
Here the partition key is key and the clustering key is value; there are no regular columns.
Assume there are 1,000,000 rows in this partition.
What is the size of the partition?
To calculate the partition size, you need the following data points:
size of the partition key columns
size of static columns
size of cells in the partition (clustering + regular columns)
size of metadata overhead per row
In your case:
the size of the partition key (a single int column) is 4 bytes
the size of static columns (there are none) is 0 bytes
the size of the cells (clustering int + 0 regular columns) is 4 + 0 bytes
the size of the metadata overhead is 8 bytes on average
So for 1M rows:
partition size = 4B + 0B + (4B x 1,000,000 cells) + (8B x 1,000,000 rows)
= 12,000,004 bytes
= 11.44 MB
Cheers!
Insert the desired number of records into your Cassandra table.
Wait for the flush to happen, persisting the records to disk, or invoke nodetool flush manually on your cluster node(s).
Navigate to the data directory. By default, data_file_directories persists data to /var/lib/cassandra/data. Switch to the <your_table_name-timeuuid> formatted directory.
List the <sstable_version-Data.db> file to view its size. Note that this is just the size on a single node; if you have more than one node in your cluster, you'd have to repeat the steps to calculate the size on each node.
Alternatively, you could also run nodetool tablestats command on each node to understand statistics about a particular table.

Cassandra multi/single partition batch explanation

I read the Cassandra docs about good use of the BATCH statement -
single partition batch example
I want to understand multi- vs single-partition batches.
According to the docs this is a single-partition batch.
CREATE TABLE cycling.cyclist_expenses (
cyclist_name text,
balance float STATIC,
expense_id int,
amount float,
description text,
paid boolean,
PRIMARY KEY (cyclist_name, expense_id)
);
BEGIN BATCH
INSERT INTO cycling.cyclist_expenses (cyclist_name, expense_id, amount, description, paid) VALUES ('Vera ADRIAN', 2, 13.44, 'Lunch', true);
INSERT INTO cycling.cyclist_expenses (cyclist_name, expense_id, amount, description, paid) VALUES ('Vera ADRIAN', 3, 25.00, 'Dinner', false);
...
APPLY BATCH;
The first partition is 'Vera ADRIAN', 2.
The second partition is 'Vera ADRIAN', 3.
Could you please explain why this is a single-partition batch?
In another doc I found an example of a multi-partition batch:
Create table shopping_chart
(cart_id UUID,item_id UUID,price Decimal, total Decimal static,
primary key ((cart_id),item_id));
insert into shopping_chart(cart_id,item_id,price,total)
values (ABC12345,ABCITEM12345,0.01,0.01);
Begin Batch
insert into shopping_chart(cart_id,item_id,price) values ( ABC12345,ABCITEM123451,1.00);
insert into shopping_chart(cart_id,item_id,price) values ( ABC12345,ABCITEM1234512,2.00);
Update …. cart_id=ABC12345 IF total =0.01;
Apply Batch;
And I can't understand why this is a multi-partition batch. Could you please explain? It works with only one partition = ABC12345.
First partition is 'Vera ADRIAN', 2. Second partition is 'Vera ADRIAN', 3.
Could you please explain why this is a single-partition batch?
Sure. Because the expense_id is not part of the partition key. Therefore, Vera ADRIAN is the same partition key value used in both INSERTs.
For the 2nd part of your question, you're right in that the 2nd example does not appear to be a multi-partition query as the cart_ids are the same. Following your link above, I quickly found a bad use of BATCH (multi-partition): https://docs.datastax.com/en/dse/6.8/cql/cql/cql_using/useBatchBadExample.html
A single-partition batch is when your queries target the same partition - in this case, Cassandra packs all of the queries into a single operation (also called a "mutation").
The description of the second example is incorrect - it's still a single-partition batch.
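For contrast, a genuinely multi-partition batch against the same shopping_chart table would use different cart_id values, so the coordinator has to forward mutations to more than one partition. A minimal sketch (the cart_id and item_id values are placeholders, just like in the quoted docs example):
BEGIN BATCH
INSERT INTO shopping_chart (cart_id, item_id, price) VALUES (ABC12345, ABCITEM123453, 3.00); -- partition ABC12345
INSERT INTO shopping_chart (cart_id, item_id, price) VALUES (XYZ67890, XYZITEM678901, 4.00); -- partition XYZ67890
APPLY BATCH;
This is the pattern the "bad use of BATCH" page warns about: the batch adds coordinator overhead and does not give isolation across the two partitions.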

Performance of query with only partition key

Is the performance impacted if I provide only the partition key while querying a table containing both partition key and clustering key?
For example, for a table with partition key p1 and clustering key c1, would
SELECT * FROM table1 where p1 = 'abc';
be less efficient than
SELECT * FROM table1 where p1 = 'abc' and c1 >= 'some range start value' and c1 <= 'some range end value';
My goal is to fetch all rows with p1 = 'abc'.
The main cost of going to a particular row, versus just a particular partition, is the extra work of deserializing the clustering key index at the beginning of the partition. It's a bit old and based on Thrift, but the gist of the following remains true:
http://thelastpickle.com/blog/2011/07/04/Cassandra-Query-Plans.html
(note: the row-level bloom filter has since been removed)
When reading from the beginning of a partition you save a little work, which improves latency.
I wouldn't worry too much about it as long as your queries are not spanning multiple partitions. You will generally only have issues if partitions get to be hundreds of MB or GBs in size.

Slow range queries in Cassandra

I am working on a single node. I have the following table to store a list of documents:
CREATE TABLE my_keyspace.document (
status text,
date timestamp,
doc_id text,
raw_content text,
title text,
PRIMARY KEY (status, date, doc_id)
) WITH CLUSTERING ORDER BY (date ASC, doc_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX doc_id_idx ON my_keyspace.document (doc_id);
I am doing a lot of queries like:
SELECT * FROM my_keyspace.document WHERE status='PROCESSED' AND date>=start_date AND date<=end_date;
For some reason it is very slow; at first, the warnings I got were these:
[2016-07-26 18:10:46] {cassandra.protocol:378} WARNING - Server warning: Read 5000 live rows and 19999 tombstone cells for query SELECT * FROM my_keyspace.document WHERE token(status) >= token(PROCESSED) AND token(status) <= token(PROCESSED) AND date >= 2016-07-08 02:00+0200 AND date <= 2016-07-23 01:59+0200 LIMIT 5000 (see tombstone_warn_threshold)
[2016-07-26 18:10:52] {cassandra.protocol:378} WARNING - Server warning: Read 5000 live rows and 19999 tombstone cells for query SELECT * FROM my_keyspace.document WHERE token(status) >= token(PROCESSED) AND token(status) <= token(PROCESSED) AND date >= 2016-07-08 02:00+0200 AND date <= 2016-07-23 01:59+0200 LIMIT 5000 (see tombstone_warn_threshold)
Thinking the issue was linked to having too many tombstones, I did:
ALTER TABLE my_keyspace.document WITH gc_grace_seconds = '0';
and then:
nodetool compact my_keyspace document
Now I don't have any warnings, but the queries are still very slow and often time out. No message concerning the timeout is displayed in any log. The number of documents I have is roughly 200k. Those documents are distributed over a 20-day period, with about 4,500 documents having status='PROCESSED' each day. The query response time varies with the date range: about 3 seconds for a one-day range, 15 seconds for 4 days, and a timeout for 2 weeks. Also, I disabled swap. The version of Cassandra I am using is 3.5.
Recently I've noticed that listing the exact columns to fetch instead of * improves the response time a bit, but the system is still too slow.
EDIT: Computing partition size as proposed by Reveka
So, following the formula:
Number of rows = 20 * 4500 = 90,000
Number of columns = 19
Number of primary key columns = 3
Number of static columns = 0
So the number of values is 90,000 * (19 - 3) = 1,440,000
For the size of the partition, I arrived at an estimate of about 1.2 GB.
This might be a bit big. But how can I modify my partition key so that I can still do the same range queries while having smaller partitions? I could have a composite partition key containing the status and the day extracted from date, but wouldn't I then have to specify the day before being able to query by range:
SELECT * FROM my_keyspace.document WHERE status='PROCESSED' AND day='someday' AND date>='start_date' AND date<='end_date';
Which forces me to do one query per day.
I see that your primary key consists of status, date and doc_id, and you only use status as your partition key. That means that all documents with the same status, regardless of date, are put in the same partition. I suspect that is a lot of data for one partition. Cassandra works well with partitions around 100 MB (or a couple of hundred MB in later versions); see here. The DataStax D220 course (it is free, you just need to create an account) has a video that shows you how to calculate your partition size. You can post the results of your analysis so we can further help you. :)
EDIT: After the size analysis
You will have to partition by date in order to have smaller partitions. That means you will no longer be able to query an arbitrary range with a single query. A workaround is to do multiple queries based on the range you want. For example, if you want to query the range 12 August to 14 August, you split it by day and do three queries: one for 12 August, one for 13 August and one for 14 August. Again, though, if your range is big you will end up retrieving GBs of data. I do not know your use case, but I am going to guess that you don't need GBs worth of files every time you do a date-range query. Can you give me more info on your use case (i.e. what do you want to do)?
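For illustration only, a day-bucketed version of the table could look roughly like this (the day column, its type, and the example dates are assumptions, not part of the original schema):
CREATE TABLE my_keyspace.document_by_day (
status text,
day date,
date timestamp,
doc_id text,
raw_content text,
title text,
PRIMARY KEY ((status, day), date, doc_id)
) WITH CLUSTERING ORDER BY (date ASC, doc_id ASC);
-- each (status, day) pair becomes its own bounded partition, and a two-week
-- range turns into ~14 single-partition queries such as:
SELECT doc_id, title FROM my_keyspace.document_by_day
WHERE status = 'PROCESSED' AND day = '2016-07-12'
AND date >= '2016-07-12 00:00+0200' AND date <= '2016-07-12 23:59+0200';
With roughly 4,500 processed documents per day, each such partition would hold about one twentieth of the 1.2 GB estimated above.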
P.S. I can't write comments yet, so I can only advise you through this answer.

fetching timeseries/range data in cassandra

I am new to Cassandra and trying to see if it fits my data query needs. I am populating test data in a table and fetching it using a CQL client in Golang.
I am storing time series data in Cassandra, sorted by timestamp. I store data on a per-minute basis.
Schema is like this:
parent: string
child: string
bytes: int
val2: int
timestamp: date/time
I need to answer queries where a timestamp range is provided and a child name is given. The result needs to be the bytes value in that time range (a single value, not a series). I made the primary key (child, timestamp). I followed this approach rather than the column-family/comparator-type approach with timeuuid, since that was not supported in CQL.
Since the data stored at every timestamp (every minute) is an accumulated value, when I get a range query for time t1 to t2 I need to find the bytes value at t2 and the bytes value at t1, and subtract the two values before returning. This works fine if t1 and t2 actually have entries in the table. If they do not, I need to find the times between (t1, t2) that have data and return the difference.
One approach I can think of is to "select * from tablename WHERE timestamp <= t2 AND timestamp >= t1;" and then find the difference between the first and last entries in the returned rows. Is this the best way to do it? Since MIN and MAX queries are not supported, is there a way to find the maximum timestamp in the table less than a given value? Thanks for your time.
Are you storing each entry as a new row with a different partition key (the first column in the primary key)? If so, select * from x where f < a and f > b is a cluster-wide query, which will cause you problems. Consider adding a "fake" partition key, or use a partition key per date/week/month etc., so that your queries hit a single partition.
Also, your queries in Cassandra are >= and <= even if you specify > and <. If you need strictly greater-than or less-than, you'll need to filter client-side.
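As a rough sketch of both suggestions (the table and column names below are illustrative assumptions based on the fields described in the question), bucket the partition key by day so a range query stays inside one partition, and use the clustering order to find the last accumulated value at or before a given time:
CREATE TABLE bytes_by_child (
child text,
day date,
ts timestamp,
bytes int,
PRIMARY KEY ((child, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
-- latest value at or before t2 (the "maximum timestamp less than a given value" question):
SELECT ts, bytes FROM bytes_by_child
WHERE child = 'child-1' AND day = '2016-08-12' AND ts <= '2016-08-12 10:30'
LIMIT 1;
-- earliest value at or after t1, reversing the clustering order for this query:
SELECT ts, bytes FROM bytes_by_child
WHERE child = 'child-1' AND day = '2016-08-12' AND ts >= '2016-08-12 09:30'
ORDER BY ts ASC LIMIT 1;
Subtracting the two bytes values client-side gives the consumption over the range; if t1 and t2 fall in different day buckets, each bucket is queried separately.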
