Using a partition key along with a secondary index - Cassandra
Following are the two queries that I need to perform.
select * from <table_name> where dept = 100 and emp_id = 1;
select * from <table_name> where dept = 100 and name = 'One';
Which of the options below is better?
Option 1: Use a secondary index along with the partition key. I assume the query will execute faster this way, since there is no need to contact other nodes and the index only has to be searched locally.
cqlsh:d2> desc table emp_by_dept;
CREATE TABLE d2.emp_by_dept (
dept int,
emp_id int,
name text,
PRIMARY KEY (dept, emp_id)
) WITH CLUSTERING ORDER BY (emp_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX emp_by_dept_name_idx ON d2.emp_by_dept (name);
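For reference, the three rows shown in the traces below can be loaded with plain INSERTs (the values are taken from the query output that follows):
INSERT INTO d2.emp_by_dept (dept, emp_id, name) VALUES (100, 1, 'One');
INSERT INTO d2.emp_by_dept (dept, emp_id, name) VALUES (100, 2, 'Two');
INSERT INTO d2.emp_by_dept (dept, emp_id, name) VALUES (100, 10, 'Ten');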
cqlsh:d2> select * from emp_by_dept where dept = 100;
dept | emp_id | name
------+--------+------
100 | 1 | One
100 | 2 | Two
100 | 10 | Ten
(3 rows)
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
Execute CQL3 query | 2015-06-15 17:36:55.860000 | 10.0.2.16 | 0
Parsing select * from emp_by_dept where dept = 100; [SharedPool-Worker-1] | 2015-06-15 17:36:55.861000 | 10.0.2.16 | 202
Preparing statement [SharedPool-Worker-1] | 2015-06-15 17:36:55.861000 | 10.0.2.16 | 418
Executing single-partition query on emp_by_dept [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10525
Acquiring sstable references [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10564
Merging memtable tombstones [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10635
Key cache hit for sstable 1 [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10748
Seeking to partition beginning in data file [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10757
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 | 18141
Merging data from memtables and 1 sstables [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 | 18166
Read 3 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 | 18335
Request complete | 2015-06-15 17:36:55.928174 | 10.0.2.16 | 68174
cqlsh:d2> select * from emp_by_dept where dept = 100 and name = 'One';
dept | emp_id | name
------+--------+------
100 | 1 | One
(1 rows)
Tracing session: c56e70a0-1357-11e5-ab8b-fb5400f1b4af
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
Execute CQL3 query | 2015-06-15 17:42:20.010000 | 10.0.2.16 | 0
Parsing select * from emp_by_dept where dept = 100 and name = 'One'; [SharedPool-Worker-1] | 2015-06-15 17:42:20.010000 | 10.0.2.16 | 12
Preparing statement [SharedPool-Worker-1] | 2015-06-15 17:42:20.010000 | 10.0.2.16 | 19
Computing ranges to query [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 | 881
Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 | 1144
Submitting range requests on 1 ranges with a concurrency of 1 (0.003515625 rows per range expected) [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 | 1238
Executing indexed scan for [100, 100] [SharedPool-Worker-2] | 2015-06-15 17:42:20.011000 | 10.0.2.16 | 1703
Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 1827
Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 1929
Executing single-partition query on emp_by_dept.emp_by_dept_name_idx [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 2058
Acquiring sstable references [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 2087
Merging memtable tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 2173
Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 2352
Seeking to partition indexed section in data file [SharedPool-Worker-2] | 2015-06-15 17:42:20.012001 | 10.0.2.16 | 2377
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.014000 | 10.0.2.16 | 4300
Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-06-15 17:42:20.014000 | 10.0.2.16 | 4322
Submitted 1 concurrent range requests covering 1 ranges [SharedPool-Worker-1] | 2015-06-15 17:42:20.031000 | 10.0.2.16 | 21798
Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 21989
Executing single-partition query on emp_by_dept [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22374
Acquiring sstable references [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22385
Merging memtable tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22433
Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22514
Seeking to partition indexed section in data file [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22523
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 | 22963
Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 | 22972
Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 | 22991
Scanned 1 rows and matched 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 | 23096
Request complete | 2015-06-15 17:42:20.033227 | 10.0.2.16 | 23227
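Note that including the partition key in the WHERE clause is what keeps the index lookup local to a single partition's replicas. For contrast (a sketch only, not traced here), the same index can also be queried without the partition key, in which case the coordinator has to fan the request out across the token ranges of the cluster:
select * from emp_by_dept where name = 'One';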
Option 2: Create 2 tables as below.
CREATE TABLE d2.emp_by_dept (
dept int,
emp_id int,
name text,
PRIMARY KEY (dept, emp_id)
) WITH CLUSTERING ORDER BY (emp_id ASC);
select * from emp_by_dept where dept = 100 and emp_id = 1;
CREATE TABLE d2.emp_by_dept_name (
dept int,
emp_id int,
name text,
PRIMARY KEY (dept, name)
) WITH CLUSTERING ORDER BY (name ASC);
select * from emp_by_dept_name where dept = 100 and name = 'One';
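With Option 2 the application is responsible for writing both tables on every change. A minimal sketch, using illustrative values, is to wrap the two INSERTs in a logged batch so they succeed or fail together:
BEGIN BATCH
  INSERT INTO d2.emp_by_dept (dept, emp_id, name) VALUES (100, 1, 'One');
  INSERT INTO d2.emp_by_dept_name (dept, emp_id, name) VALUES (100, 1, 'One');
APPLY BATCH;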
Normally it is a good approach to use secondary indexes together with the partition key, because - as you say - the secondary key lookup can be performed on a single machine.
The other thing that needs to be taken into account is the cardinality of the indexed column. In your case emp_id is probably unique and name is almost unique, so the index will most probably return a single row per lookup, which is not where secondary indexes are efficient. For a good explanation I recommend this article: http://www.wentnet.com/blog/?p=77.
As a consequence, if query time is critical and you can update both tables at the same time, I recommend your option 2.
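To illustrate the cardinality point with a made-up example (the emp_by_id table and its index are hypothetical, not part of the question): a column with few distinct values shared by many rows, such as dept, is a better fit for a secondary index than a nearly unique column like name:
CREATE TABLE d2.emp_by_id (
    emp_id int PRIMARY KEY,
    dept int,
    name text
);
CREATE INDEX emp_by_id_dept_idx ON d2.emp_by_id (dept);
select * from emp_by_id where dept = 100;   -- one indexed value maps to many rows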
It would also be interesting to measure the two options with some generated data.
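One low-effort way to do that measurement, assuming both schemas exist and hold the same generated rows, is to turn tracing on in cqlsh and compare the two read paths directly:
TRACING ON;
select * from emp_by_dept where dept = 100 and name = 'One';        -- Option 1, through the index
select * from emp_by_dept_name where dept = 100 and name = 'One';   -- Option 2, clustering-key read
TRACING OFF;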
Option one won't be possible, as Cassandra does not support queries using both primary keys and secondary keys. Your best bet would be to go with option two.
Although the similarities are many, don't think of it as a 'relational table'. Instead think of it as a nested, sorted map data structure.
Cassandra believes in de-normalization and duplication of data for better read performance. Therefore, option 2 is completely normal and within the best practices of Cassandra.
A few links you might find useful: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
How do secondary indexes work in Cassandra?
Hope this helps.
Since maintaining two tables is harder than maintaining a single one, the first option would seem preferable.
Query1 = select * from <> where dept = 100 and emp_id = 1;
Query2 = select * from <> where dept = 100 and name = 'One';
Option 1:
Write : time to write to emp_by_dept + time to update index
Read : Query1 will be a direct read from emp_by_dept; Query2 will read the index table to find the location of the matching row and then read that row from emp_by_dept
Option 2:
Write : time to write to emp_by_dept + time to write to emp_by_dept_name
Read: Query1 will be a direct read from emp_by_dept, Query2 will be a direct read from emp_by_dept_name (the required data is already stored sorted by name)
So I assume the write time should be almost the same in both cases (I have not tested this).
If read response time is more important to you, go for Option 2.
If you are worried about maintaining two tables, go for Option 1.
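Spelled out against the table names from the question (filling in the placeholder table names used above), the two options come down to:
-- Option 1: one table plus the name index
select * from emp_by_dept where dept = 100 and emp_id = 1;          -- Query1
select * from emp_by_dept where dept = 100 and name = 'One';        -- Query2, served through emp_by_dept_name_idx
-- Option 2: two tables, no index
select * from emp_by_dept where dept = 100 and emp_id = 1;          -- Query1
select * from emp_by_dept_name where dept = 100 and name = 'One';   -- Query2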
Thanks everyone for your inputs.