Using a partition key along with a secondary index - Cassandra
Following are the two queries that I need to perform.
select * from <table_name> where dept = 100 and emp_id = 1;
select * from <table_name> where dept = 100 and name = 'One';
Which of the options below is better?
Option 1: Use a secondary index along with the partition key. I assume the query will execute faster this way, since there is no need to contact other nodes and the index only has to be searched locally.
cqlsh:d2> desc table emp_by_dept;
CREATE TABLE d2.emp_by_dept (
dept int,
emp_id int,
name text,
PRIMARY KEY (dept, emp_id)
) WITH CLUSTERING ORDER BY (emp_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX emp_by_dept_name_idx ON d2.emp_by_dept (name);
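For reference, the three rows shown in the traces below can be loaded with plain INSERTs (the values are taken from the query output that follows):
INSERT INTO d2.emp_by_dept (dept, emp_id, name) VALUES (100, 1, 'One');
INSERT INTO d2.emp_by_dept (dept, emp_id, name) VALUES (100, 2, 'Two');
INSERT INTO d2.emp_by_dept (dept, emp_id, name) VALUES (100, 10, 'Ten');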
cqlsh:d2> select * from emp_by_dept where dept = 100;
dept | emp_id | name
------+--------+------
100 | 1 | One
100 | 2 | Two
100 | 10 | Ten
(3 rows)
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
Execute CQL3 query | 2015-06-15 17:36:55.860000 | 10.0.2.16 | 0
Parsing select * from emp_by_dept where dept = 100; [SharedPool-Worker-1] | 2015-06-15 17:36:55.861000 | 10.0.2.16 | 202
Preparing statement [SharedPool-Worker-1] | 2015-06-15 17:36:55.861000 | 10.0.2.16 | 418
Executing single-partition query on emp_by_dept [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10525
Acquiring sstable references [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10564
Merging memtable tombstones [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10635
Key cache hit for sstable 1 [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10748
Seeking to partition beginning in data file [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10757
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 | 18141
Merging data from memtables and 1 sstables [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 | 18166
Read 3 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 | 18335
Request complete | 2015-06-15 17:36:55.928174 | 10.0.2.16 | 68174
cqlsh:d2> select * from emp_by_dept where dept = 100 and name = 'One';
dept | emp_id | name
------+--------+------
100 | 1 | One
(1 rows)
Tracing session: c56e70a0-1357-11e5-ab8b-fb5400f1b4af
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
Execute CQL3 query | 2015-06-15 17:42:20.010000 | 10.0.2.16 | 0
Parsing select * from emp_by_dept where dept = 100 and name = 'One'; [SharedPool-Worker-1] | 2015-06-15 17:42:20.010000 | 10.0.2.16 | 12
Preparing statement [SharedPool-Worker-1] | 2015-06-15 17:42:20.010000 | 10.0.2.16 | 19
Computing ranges to query [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 | 881
Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 | 1144
Submitting range requests on 1 ranges with a concurrency of 1 (0.003515625 rows per range expected) [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 | 1238
Executing indexed scan for [100, 100] [SharedPool-Worker-2] | 2015-06-15 17:42:20.011000 | 10.0.2.16 | 1703
Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 1827
Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 1929
Executing single-partition query on emp_by_dept.emp_by_dept_name_idx [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 2058
Acquiring sstable references [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 2087
Merging memtable tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 2173
Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 2352
Seeking to partition indexed section in data file [SharedPool-Worker-2] | 2015-06-15 17:42:20.012001 | 10.0.2.16 | 2377
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.014000 | 10.0.2.16 | 4300
Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-06-15 17:42:20.014000 | 10.0.2.16 | 4322
Submitted 1 concurrent range requests covering 1 ranges [SharedPool-Worker-1] | 2015-06-15 17:42:20.031000 | 10.0.2.16 | 21798
Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 21989
Executing single-partition query on emp_by_dept [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22374
Acquiring sstable references [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22385
Merging memtable tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22433
Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22514
Seeking to partition indexed section in data file [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22523
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 | 22963
Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 | 22972
Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 | 22991
Scanned 1 rows and matched 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 | 23096
Request complete | 2015-06-15 17:42:20.033227 | 10.0.2.16 | 23227
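Note that including the partition key in the WHERE clause is what keeps the index lookup local to a single partition's replicas. For contrast (a sketch only, not traced here), the same index can also be queried without the partition key, in which case the coordinator has to fan the request out across the token ranges of the cluster:
select * from emp_by_dept where name = 'One';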
Option 2: Create 2 tables as below.
CREATE TABLE d2.emp_by_dept (
dept int,
emp_id int,
name text,
PRIMARY KEY (dept, emp_id)
) WITH CLUSTERING ORDER BY (emp_id ASC);
select * from emp_by_dept where dept = 100 and emp_id = 1;
CREATE TABLE d2.emp_by_dept_name (
dept int,
emp_id int,
name text,
PRIMARY KEY (dept, name)
) WITH CLUSTERING ORDER BY (name ASC);
select * from emp_by_dept_name where dept = 100 and name = 'One';
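With Option 2 the application is responsible for writing both tables on every change. A minimal sketch, using illustrative values, is to wrap the two INSERTs in a logged batch so they succeed or fail together:
BEGIN BATCH
  INSERT INTO d2.emp_by_dept (dept, emp_id, name) VALUES (100, 1, 'One');
  INSERT INTO d2.emp_by_dept_name (dept, emp_id, name) VALUES (100, 1, 'One');
APPLY BATCH;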
Normally it is a good approach to use secondary indexes together with the partition key, because - as you say - the secondary key lookup can be performed on a single machine.
The other thing that needs to be taken into account is the cardinality of the indexed column. In your case emp_id is probably unique and name is almost unique, so the index will most probably return a single row per lookup, which is not where secondary indexes are efficient. For a good explanation I recommend this article: http://www.wentnet.com/blog/?p=77.
As a consequence, if query time is critical and you can update both tables at the same time, I recommend your option 2.
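To illustrate the cardinality point with a made-up example (the emp_by_id table and its index are hypothetical, not part of the question): a column with few distinct values shared by many rows, such as dept, is a better fit for a secondary index than a nearly unique column like name:
CREATE TABLE d2.emp_by_id (
    emp_id int PRIMARY KEY,
    dept int,
    name text
);
CREATE INDEX emp_by_id_dept_idx ON d2.emp_by_id (dept);
select * from emp_by_id where dept = 100;   -- one indexed value maps to many rows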
It would also be interesting to measure the two options with some generated data.
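One low-effort way to do that measurement, assuming both schemas exist and hold the same generated rows, is to turn tracing on in cqlsh and compare the two read paths directly:
TRACING ON;
select * from emp_by_dept where dept = 100 and name = 'One';        -- Option 1, through the index
select * from emp_by_dept_name where dept = 100 and name = 'One';   -- Option 2, clustering-key read
TRACING OFF;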
Option one won't be possible, as Cassandra does not support queries using both primary keys and secondary keys. Your best bet would be to go with option two.
Although the similarities are many, don't think of it as a 'relational table'. Instead think of it as a nested, sorted map data structure.
Cassandra believes in de-normalization and duplication of data for better read performance. Therefore, option 2 is completely normal and within the best practices of Cassandra.
A few links you might find useful: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
How do secondary indexes work in Cassandra?
Hope this helps.
Since maintaining two tables is harder than maintaining a single one, the first option would seem preferable.
Query1 = select * from <> where dept = 100 and emp_id = 1;
Query2 = select * from <> where dept = 100 and name = 'One';
Option 1:
Write : time to write to emp_by_dept + time to update index
Read : Query1 will be a direct read from emp_by_dept; Query2 will read the index table to find the location of the matching row and then read that row from emp_by_dept
Option 2:
Write : time to write to emp_by_dept + time to write to emp_by_dept_name
Read: Query1 will be a direct read from emp_by_dept, Query2 will be a direct read from emp_by_dept_name (the required data is already stored sorted by name)
So I assume the write time should be almost the same in both cases (I have not tested this).
If read response time is more important to you, go for Option 2.
If you are worried about maintaining two tables, go for Option 1.
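Spelled out against the table names from the question (filling in the placeholder table names used above), the two options come down to:
-- Option 1: one table plus the name index
select * from emp_by_dept where dept = 100 and emp_id = 1;          -- Query1
select * from emp_by_dept where dept = 100 and name = 'One';        -- Query2, served through emp_by_dept_name_idx
-- Option 2: two tables, no index
select * from emp_by_dept where dept = 100 and emp_id = 1;          -- Query1
select * from emp_by_dept_name where dept = 100 and name = 'One';   -- Query2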
Thanks everyone for your inputs.