I have a C* table which is wide, and some of its columns are read-heavy. I am considering using the row cache, but I don't know whether the row cache can store specific columns. If all the cells of a row need to be stored, the cached content may grow too fast and defeat the purpose.
The schema is as follows:
CREATE TABLE tb1 (
pk1 int,
ck1 int,
read_heavy_col1 int,
read_heavy_col2 int,
normal_col1 int,
normal_col2 int,
...
PRIMARY KEY (pk1, ck1)
)
The question is whether the row cache is able to cache only pk1, ck1, read_heavy_col1, and read_heavy_col2, ignoring normal_col1, normal_col2, ....
According to DataStax's "Configuring data caches" documentation:
If the newly cached data does not include all cells configured by
user, Cassandra performs another read.
Does that mean C* can cache only the columns of interest?
Instead of using a large number of columns, you can remodel the table with a clustering key under which the read-heavy values sort first, then limit rows_per_partition so the cache captures just those.
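A minimal sketch of that idea (table and column names are assumed, not from the original schema): give the hot values a leading clustering column so they sort first in the partition, then cap what the row cache keeps per partition:
CREATE TABLE tb1_remodeled (
    pk1 int,
    hot int,          -- 0 for read-heavy values so they sort first
    ck1 int,
    col_name text,
    col_value int,
    PRIMARY KEY (pk1, hot, ck1, col_name)
) WITH caching = {'keys': 'ALL', 'rows_per_partition': '10'};
Since rows_per_partition always caches from the head of the partition, the cached rows here are the hot ones.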
You will probably not get much benefit versus just caching the entire row. Also, while the key cache is generally a huge win for read performance, the row cache only helps in very specific scenarios (and hurts in some), so be sure to benchmark it; the OS page cache is usually sufficient to keep the important bits in memory.
Answering my own question:
Cassandra will cache all columns of that row.
From "Cassandra The Definitive Guide" (2nd edition) by Jeff Carpenter & Eben Hewitt:
The row cache caches entire rows and can speed up read access for frequently accessed rows, at the cost of more memory usage.
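(For context, the experiment below assumes the row cache is on: row_cache_size_in_mb set above 0 in cassandra.yaml, and row caching enabled on the table with something like the following; the exact rows_per_partition value is an assumption:)
ALTER TABLE tb1 WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};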
Experiment:
abc#cqlsh:test> SELECT pk1, ck1, read_heavy_col1 FROM tb1 WHERE pk1=2 and ck1=1;
pk1 | ck1 | read_heavy_col1
-----+-----+-----------------
2 | 1 | 1
(1 rows)
Tracing session: 53aac630-11be-11e9-9cb3-fbffb1c1e13b
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------
Execute CQL3 query | 2019-01-06 22:21:15.667000 | x.x.x.x | 0
Parsing SELECT pk1, ck1, read_heavy_col1 FROM tb1 WHERE pk1=2 and ck1=1; [Native-Transport-Requests-1] | 2019-01-06 22:21:15.668000 | x.x.x.x | 310
Preparing statement [Native-Transport-Requests-1] | 2019-01-06 22:21:15.668000 | x.x.x.x | 568
Executing single-partition query on roles [ReadStage-2] | 2019-01-06 22:21:15.669000 | x.x.x.x | 1259
Acquiring sstable references [ReadStage-2] | 2019-01-06 22:21:15.669000 | x.x.x.x | 1345
Key cache hit for sstable 2 [ReadStage-2] | 2019-01-06 22:21:15.669000 | x.x.x.x | 1475
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-2] | 2019-01-06 22:21:15.669000 | x.x.x.x | 1569
Merged data from memtables and 1 sstables [ReadStage-2] | 2019-01-06 22:21:15.669000 | x.x.x.x | 1768
Read 1 live rows and 0 tombstone cells [ReadStage-2] | 2019-01-06 22:21:15.669000 | x.x.x.x | 1869
Executing single-partition query on roles [ReadStage-3] | 2019-01-06 22:21:15.672000 | x.x.x.x | 4579
Acquiring sstable references [ReadStage-3] | 2019-01-06 22:21:15.672000 | x.x.x.x | 4701
Key cache hit for sstable 2 [ReadStage-3] | 2019-01-06 22:21:15.672000 | x.x.x.x | 4812
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-3] | 2019-01-06 22:21:15.672000 | x.x.x.x | 4891
Merged data from memtables and 1 sstables [ReadStage-3] | 2019-01-06 22:21:15.672000 | x.x.x.x | 5001
Read 1 live rows and 0 tombstone cells [ReadStage-3] | 2019-01-06 22:21:15.673000 | x.x.x.x | 5081
Row cache miss [ReadStage-3] | 2019-01-06 22:21:15.674000 | x.x.x.x | 6227
Executing single-partition query on tb1 [ReadStage-3] | 2019-01-06 22:21:15.674000 | x.x.x.x | 6387
Acquiring sstable references [ReadStage-3] | 2019-01-06 22:21:15.674000 | x.x.x.x | 6445
Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-3] | 2019-01-06 22:21:15.674001 | x.x.x.x | 6514
Caching 3 rows [ReadStage-3] | 2019-01-06 22:21:15.674001 | x.x.x.x | 6647
Merged data from memtables and 0 sstables [ReadStage-3] | 2019-01-06 22:21:15.675000 | x.x.x.x | 7231
Read 1 live rows and 0 tombstone cells [ReadStage-3] | 2019-01-06 22:21:15.675000 | x.x.x.x | 7443
Request complete | 2019-01-06 22:21:15.676482 | x.x.x.x | 9482
Then, with the rows cached, fetching a normal column:
abc#cqlsh:test> SELECT pk1, ck1, normal_col2 FROM tb1 WHERE pk1=2 and ck1=1;
pk1 | ck1 | normal_col2
-----+-----+-------------
2 | 1 | 1
(1 rows)
Tracing session: a178ae90-11be-11e9-9cb3-fbffb1c1e13b
activity | timestamp | source | source_elapsed
----------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------
Execute CQL3 query | 2019-01-06 22:23:26.201000 | x.x.x.x | 0
Parsing SELECT pk1, ck1, normal_col2 FROM tb1 WHERE pk1=2 and ck1=1; [Native-Transport-Requests-1] | 2019-01-06 22:23:26.202000 | x.x.x.x | 205
Preparing statement [Native-Transport-Requests-1] | 2019-01-06 22:23:26.202000 | x.x.x.x | 393
Executing single-partition query on roles [ReadStage-3] | 2019-01-06 22:23:26.202000 | x.x.x.x | 968
Acquiring sstable references [ReadStage-3] | 2019-01-06 22:23:26.203000 | x.x.x.x | 1413
Key cache hit for sstable 2 [ReadStage-3] | 2019-01-06 22:23:26.203000 | x.x.x.x | 1564
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-3] | 2019-01-06 22:23:26.203000 | x.x.x.x | 1685
Merged data from memtables and 1 sstables [ReadStage-3] | 2019-01-06 22:23:26.203000 | x.x.x.x | 1841
Read 1 live rows and 0 tombstone cells [ReadStage-3] | 2019-01-06 22:23:26.203000 | x.x.x.x | 1930
Executing single-partition query on roles [ReadStage-5] | 2019-01-06 22:23:26.204000 | x.x.x.x | 2307
Acquiring sstable references [ReadStage-5] | 2019-01-06 22:23:26.204000 | x.x.x.x | 2375
Key cache hit for sstable 2 [ReadStage-5] | 2019-01-06 22:23:26.204000 | x.x.x.x | 2475
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-5] | 2019-01-06 22:23:26.204000 | x.x.x.x | 2584
Merged data from memtables and 1 sstables [ReadStage-5] | 2019-01-06 22:23:26.204000 | x.x.x.x | 2691
Read 1 live rows and 0 tombstone cells [ReadStage-5] | 2019-01-06 22:23:26.204000 | x.x.x.x | 2761
Row cache hit [ReadStage-3] | 2019-01-06 22:23:26.205000 | x.x.x.x | 3301
Read 1 live rows and 0 tombstone cells [ReadStage-3] | 2019-01-06 22:23:26.205000 | x.x.x.x | 3489
Request complete | 2019-01-06 22:23:26.204726 | x.x.x.x | 3726
Clearly, C* caches the entire row rather than specific columns.
Versions:
- com.datastax.oss:java-driver-core:4.5.0
- com.datastax.oss:java-driver-query-builder:4.5.0
- com.datastax.oss:java-driver-mapper-runtime:4.5.0
- cassandra:3.11.5 Docker image
- JDK 11.1
I'm running a deployment of feast that I've modified to use Cassandra as a low-latency serving backend for machine learning features. I'm successfully writing and reading rows, but the reads are inconsistent with respect to the results returned: sometimes the payloads are empty and I don't know why. I have already tried updating to the latest DataStax driver and synchronizing time using ntp/time.google.com. I've also tried changing the write consistency to ALL and the read consistency to LOCAL_ONE/LOCAL_QUORUM, without success. I'm really struggling to figure out why the selects aren't consistent. Any insight would be great! :) Here is the process:
I write the rows into Cassandra using CassandraIO:
@Override
public Future<Void> saveAsync(CassandraMutation entityClass) {
  return mapper.saveAsync(
      entityClass,
      // write timestamp and TTL come from the mutation itself
      Option.timestamp(entityClass.getWriteTime()),
      Option.ttl(entityClass.getTtl()),
      Option.consistencyLevel(ConsistencyLevel.LOCAL_QUORUM),
      Option.tracing(true));
}
This seems to successfully write the rows into my Cassandra cluster, which I then query in my application as follows:
List<InetSocketAddress> contactPoints =
Arrays.stream(cassandraConfig.getBootstrapHosts().split(","))
.map(h -> new InetSocketAddress(h, cassandraConfig.getPort()))
.collect(Collectors.toList());
CqlSession session =
CqlSession.builder()
.addContactPoints(contactPoints)
.withLocalDatacenter(storeProperties.getCassandraDcName())
.build();
....
PreparedStatement query =
session.prepare(
String.format(
"SELECT entities, feature, value, WRITETIME(value) as writetime FROM %s.%s WHERE entities = ?",
keyspace, tableName));
session.execute(
    query
        .bind(key)
        .setTracing(true)
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM));
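To correlate application reads with server-side traces, one option is to log the bound partition key and the tracing id on every read. A minimal debugging sketch, assuming an SLF4J logger named logger is available; note that entities is the full partition key, so any byte-level difference in the bound string addresses a different partition:
// Debugging sketch (logger is assumed): pair each read with its trace session
BoundStatement bound =
    query
        .bind(key)
        .setTracing(true)
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
ResultSet rs = session.execute(bound);
logger.info(
    "entities={} rowsAvailable={} tracingId={}",
    key,
    rs.getAvailableWithoutFetching(),
    rs.getExecutionInfo().getTracingId());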
My issue is that selects do not return consistent results. I have been recording traces for a while now; for example, here are two select queries with the same coordinator, where the first succeeds and the subsequent one fails to return results.
Success:
cqlsh> select * from system_traces.sessions where session_id=be023400-6a1e-11ea-97ca-6b8bbe3a2a36;
session_id | client | command | coordinator | duration | parameters | request | started_at
--------------------------------------+---------------+---------+---------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------
be023400-6a1e-11ea-97ca-6b8bbe3a2a36 | xx.xx.xxx.189 | QUERY | xx.xx.xxx.158 | 41313 | {'bound_var_0_entities': '''ml_project/test_test_entity:1:entity2_uuid=TenderGreens_8755fff7|entity1_uuid=Zach_Yang_fe7fea92''', 'consistency_level': 'LOCAL_QUORUM', 'page_size': '5000', 'query': 'SELECT entities, feature, value, WRITETIME(value) as writetime FROM feast.feature_store WHERE entities = ?', 'serial_consistency_level': 'SERIAL'} | Execute CQL3 prepared query | 2020-03-19 20:18:05.760000+0000
select event_id, activity, source_elapsed, thread from system_traces.events where session_id=be023400-6a1e-11ea-97ca-6b8bbe3a2a36;
event_id | activity | source_elapsed | thread
--------------------------------------+---------------------------------------------------------------------------------------------------------------+----------------+-----------------------------------------------------------------------------------------------------------
be0652b0-6a1e-11ea-97ca-6b8bbe3a2a36 | Read-repair DC_LOCAL | 27087 | Native-Transport-Requests-1
be0679c0-6a1e-11ea-97ca-6b8bbe3a2a36 | reading data from /xx.xx.xxx.161 | 28034 | Native-Transport-Requests-1
be06a0d0-6a1e-11ea-97ca-6b8bbe3a2a36 | Sending READ message to /xx.xx.xxx.161 | 28552 | MessagingService-Outgoing-/xx.xx.xxx.161-Small
be06a0d1-6a1e-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xxx.162 | 28595 | Native-Transport-Requests-1
be06a0d2-6a1e-11ea-97ca-6b8bbe3a2a36 | Executing single-partition query on feature_store | 28598 | ReadStage-3
be06a0d3-6a1e-11ea-97ca-6b8bbe3a2a36 | Acquiring sstable references | 28689 | ReadStage-3
be06a0d4-6a1e-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xx.138 | 28852 | Native-Transport-Requests-1
be06a0d5-6a1e-11ea-97ca-6b8bbe3a2a36 | Sending READ message to /xx.xx.xxx.162 | 28904 | MessagingService-Outgoing-/xx.xx.xxx.162-Small
be06a0d6-6a1e-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 56 | 28937 | ReadStage-3
be06a0d7-6a1e-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xxx.171 | 28983 | Native-Transport-Requests-1
be06a0d8-6a1e-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 55 | 29020 | ReadStage-3
be06a0d9-6a1e-11ea-97ca-6b8bbe3a2a36 | Sending READ message to cassandra-feature-store-1.cassandra-feature-store.team-data/xx.xx.xx.138 | 29071 | MessagingService-Outgoing-cassandra-feature-store-1.cassandra-feature-store.team-data/xx.xx.xx.138-Small
be06a0da-6a1e-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 54 | 29181 | ReadStage-3
be06a0db-6a1e-11ea-97ca-6b8bbe3a2a36 | Sending READ message to cassandra-feature-store-0.cassandra-feature-store.team-data/xx.xx.xxx.171 | 29201 | MessagingService-Outgoing-cassandra-feature-store-0.cassandra-feature-store.team-data/xx.xx.xx.171-Small
be06c7e0-6a1e-11ea-80ad-dffaf3fb56b4 | READ message received from /xx.xx.xxx.158 | 33 | MessagingService-Incoming-/xx.xx.xxx.158
be06c7e0-6a1e-11ea-8693-577fec389856 | READ message received from /xx.xx.xxx.158 | 34 | MessagingService-Incoming-/xx.xx.xxx.158
be06c7e0-6a1e-11ea-8b1a-e5aa876f7d0d | READ message received from /xx.xx.xxx.158 | 29 | MessagingService-Incoming-/xx.xx.xxx.158
be06c7e0-6a1e-11ea-8d2e-c5837edad3d1 | READ message received from /xx.xx.xxx.158 | 44 | MessagingService-Incoming-/xx.xx.xxx.158
be06c7e0-6a1e-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 41 | 29273 | ReadStage-3
be06c7e1-6a1e-11ea-80ad-dffaf3fb56b4 | Executing single-partition query on feature_store | 389 | ReadStage-1
be06c7e1-6a1e-11ea-8b1a-e5aa876f7d0d | Executing single-partition query on feature_store | 513 | ReadStage-1
be06c7e1-6a1e-11ea-97ca-6b8bbe3a2a36 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 29342 | ReadStage-3
be06c7e2-6a1e-11ea-80ad-dffaf3fb56b4 | Acquiring sstable references | 457 | ReadStage-1
be06c7e3-6a1e-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 55 | 620 | ReadStage-1
be06c7e4-6a1e-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 54 | 659 | ReadStage-1
be06c7e5-6a1e-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 41 | 677 | ReadStage-1
be06c7e6-6a1e-11ea-80ad-dffaf3fb56b4 | Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 695 | ReadStage-1
be06eef0-6a1e-11ea-80ad-dffaf3fb56b4 | Merged data from memtables and 0 sstables | 1039 | ReadStage-1
be06eef0-6a1e-11ea-8693-577fec389856 | Executing single-partition query on feature_store | 372 | ReadStage-1
be06eef0-6a1e-11ea-8b1a-e5aa876f7d0d | Acquiring sstable references | 583 | ReadStage-1
be06eef0-6a1e-11ea-8d2e-c5837edad3d1 | Executing single-partition query on feature_store | 454 | ReadStage-1
be06eef0-6a1e-11ea-97ca-6b8bbe3a2a36 | Merged data from memtables and 0 sstables | 30372 | ReadStage-3
be06eef1-6a1e-11ea-80ad-dffaf3fb56b4 | Read 16 live rows and 0 tombstone cells | 1125 | ReadStage-1
be06eef1-6a1e-11ea-8693-577fec389856 | Acquiring sstable references | 493 | ReadStage-1
be06eef1-6a1e-11ea-8b1a-e5aa876f7d0d | Bloom filter allows skipping sstable 54 | 703 | ReadStage-1
be06eef1-6a1e-11ea-8d2e-c5837edad3d1 | Acquiring sstable references | 530 | ReadStage-1
be06eef1-6a1e-11ea-97ca-6b8bbe3a2a36 | Read 16 live rows and 0 tombstone cells | 30484 | ReadStage-3
be06eef2-6a1e-11ea-80ad-dffaf3fb56b4 | Enqueuing response to /xx.xx.xxx.158 | 1155 | ReadStage-1
be06eef2-6a1e-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 56 | 721 | ReadStage-1
be06eef2-6a1e-11ea-8b1a-e5aa876f7d0d | Bloom filter allows skipping sstable 41 | 740 | ReadStage-1
be06eef2-6a1e-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 56 | 655 | ReadStage-1
be06eef3-6a1e-11ea-80ad-dffaf3fb56b4 | Sending REQUEST_RESPONSE message to cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158 | 1492 | MessagingService-Outgoing-cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158-Small
be06eef3-6a1e-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 55 | 780 | ReadStage-1
be06eef3-6a1e-11ea-8b1a-e5aa876f7d0d | Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | 761 | ReadStage-1
be06eef3-6a1e-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 55 | 686 | ReadStage-1
be06eef4-6a1e-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 54 | 815 | ReadStage-1
be06eef4-6a1e-11ea-8b1a-e5aa876f7d0d | Merged data from memtables and 0 sstables | 1320 | ReadStage-1
be06eef4-6a1e-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 54 | 705 | ReadStage-1
be06eef5-6a1e-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 41 | 839 | ReadStage-1
be06eef5-6a1e-11ea-8b1a-e5aa876f7d0d | Read 16 live rows and 0 tombstone cells | 1495 | ReadStage-1
be06eef5-6a1e-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 41 | 720 | ReadStage-1
be06eef6-6a1e-11ea-8693-577fec389856 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 871 | ReadStage-1
be06eef6-6a1e-11ea-8b1a-e5aa876f7d0d | Enqueuing response to /xx.xx.xxx.158 | 1554 | ReadStage-1
be06eef6-6a1e-11ea-8d2e-c5837edad3d1 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 738 | ReadStage-1
be06eef7-6a1e-11ea-8d2e-c5837edad3d1 | Merged data from memtables and 0 sstables | 1157 | ReadStage-1
be06eef8-6a1e-11ea-8d2e-c5837edad3d1 | Read 16 live rows and 0 tombstone cells | 1296 | ReadStage-1
be06eef9-6a1e-11ea-8d2e-c5837edad3d1 | Enqueuing response to /xx.xx.xxx.158 | 1325 | ReadStage-1
be071600-6a1e-11ea-8693-577fec389856 | Merged data from memtables and 0 sstables | 1592 | ReadStage-1
be071600-6a1e-11ea-8b1a-e5aa876f7d0d | Sending REQUEST_RESPONSE message to cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158 | 1783 | MessagingService-Outgoing-cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158-Small
be071600-6a1e-11ea-8d2e-c5837edad3d1 | Sending REQUEST_RESPONSE message to /xx.xx.xxx.158 | 1484 | MessagingService-Outgoing-/xx.xx.xxx.158-Small
be071600-6a1e-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.161 | 31525 | MessagingService-Incoming-/xx.xx.xxx.161
be071601-6a1e-11ea-8693-577fec389856 | Read 16 live rows and 0 tombstone cells | 1754 | ReadStage-1
be071601-6a1e-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.161 | 31650 | RequestResponseStage-4
be071602-6a1e-11ea-8693-577fec389856 | Enqueuing response to /xx.xx.xxx.158 | 1796 | ReadStage-1
be071602-6a1e-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xx.138 | 31795 | MessagingService-Incoming-/xx.xx.xx.138
be071603-6a1e-11ea-8693-577fec389856 | Sending REQUEST_RESPONSE message to /xx.xx.xxx.158 | 1973 | MessagingService-Outgoing-/xx.xx.xxx.158-Small
be071603-6a1e-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xx.138 | 31872 | RequestResponseStage-4
be071604-6a1e-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.162 | 31918 | MessagingService-Incoming-/xx.xx.xxx.162
be071605-6a1e-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.162 | 32047 | RequestResponseStage-4
be073d10-6a1e-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.171 | 32688 | MessagingService-Incoming-/xx.xx.xx.171
be073d11-6a1e-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.171 | 32827 | RequestResponseStage-2
be073d12-6a1e-11ea-97ca-6b8bbe3a2a36 | Initiating read-repair | 32985 | RequestResponseStage-2
Failure:
cqlsh> select * from system_traces.sessions where session_id=472551e0-6a1f-11ea-97ca-6b8bbe3a2a36;
session_id | client | command | coordinator | duration | parameters | request | started_at
--------------------------------------+---------------+---------+---------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------
472551e0-6a1f-11ea-97ca-6b8bbe3a2a36 | xx.xx.xxx.189 | QUERY | xx.xx.xxx.158 | 3044 | {'bound_var_0_entities': '''ml_project/test_test_entity:1:entity1_uuid=Zach_Yang_fe7fea92|entity2_uuid=TenderGreens_8755fff7''', 'consistency_level': 'LOCAL_QUORUM', 'page_size': '5000', 'query': 'SELECT entities, feature, value, WRITETIME(value) as writetime FROM feast.feature_store WHERE entities = ?', 'serial_consistency_level': 'SERIAL'} | Execute CQL3 prepared query | 2020-03-19 20:21:55.838000+0000
cqlsh> select event_id, activity, source_elapsed, thread from system_traces.events where session_id=472551e0-6a1f-11ea-97ca-6b8bbe3a2a36;
event_id | activity | source_elapsed | thread
--------------------------------------+---------------------------------------------------------------------------------------------------------------+----------------+-----------------------------------------------------------------------------------------------------------
472578f0-6a1f-11ea-80ad-dffaf3fb56b4 | READ message received from /xx.xx.xxx.158 | 18 | MessagingService-Incoming-/xx.xx.xxx.158
472578f0-6a1f-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xxx.138 | 619 | Native-Transport-Requests-1
472578f1-6a1f-11ea-97ca-6b8bbe3a2a36 | Executing single-partition query on feature_store | 708 | ReadStage-2
472578f2-6a1f-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xxx.161 | 755 | Native-Transport-Requests-1
472578f3-6a1f-11ea-97ca-6b8bbe3a2a36 | Acquiring sstable references | 768 | ReadStage-2
472578f4-6a1f-11ea-97ca-6b8bbe3a2a36 | Sending READ message to cassandra-feature-store-1.cassandra-feature-store.team-data/xx.xx.xxx.138 | 836 | MessagingService-Outgoing-cassandra-feature-store-1.cassandra-feature-store.team-data/xx.xx.xxx.138-Small
472578f5-6a1f-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 56 | 859 | ReadStage-2
472578f6-6a1f-11ea-97ca-6b8bbe3a2a36 | speculating read retry on /xx.xx.xx.171 | 862 | Native-Transport-Requests-1
472578f7-6a1f-11ea-97ca-6b8bbe3a2a36 | Sending READ message to /xx.xx.xxx.161 | 893 | MessagingService-Outgoing-/xx.xx.xxx.161-Small
472578f8-6a1f-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 55 | 903 | ReadStage-2
472578f9-6a1f-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 54 | 929 | ReadStage-2
472578fa-6a1f-11ea-97ca-6b8bbe3a2a36 | Sending READ message to cassandra-feature-store-0.cassandra-feature-store.team-data/xx.xx.xx.171 | 982 | MessagingService-Outgoing-cassandra-feature-store-0.cassandra-feature-store.team-data/xx.xx.xx.171-Small
472578fb-6a1f-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 41 | 996 | ReadStage-2
472578fc-6a1f-11ea-97ca-6b8bbe3a2a36 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 1039 | ReadStage-2
472578fd-6a1f-11ea-97ca-6b8bbe3a2a36 | Merged data from memtables and 0 sstables | 1227 | ReadStage-2
472578fe-6a1f-11ea-97ca-6b8bbe3a2a36 | Read 0 live rows and 0 tombstone cells | 1282 | ReadStage-2
4725a000-6a1f-11ea-80ad-dffaf3fb56b4 | Executing single-partition query on feature_store | 226 | ReadStage-2
4725a000-6a1f-11ea-8693-577fec389856 | READ message received from /xx.xx.xxx.158 | 12 | MessagingService-Incoming-/xx.xx.xxx.158
4725a000-6a1f-11ea-8d2e-c5837edad3d1 | READ message received from /xx.xx.xxx.158 | 15 | MessagingService-Incoming-/xx.xx.xxx.158
4725a001-6a1f-11ea-80ad-dffaf3fb56b4 | Acquiring sstable references | 297 | ReadStage-2
4725a001-6a1f-11ea-8693-577fec389856 | Executing single-partition query on feature_store | 258 | ReadStage-1
4725a001-6a1f-11ea-8d2e-c5837edad3d1 | Executing single-partition query on feature_store | 230 | ReadStage-1
4725a002-6a1f-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 55 | 397 | ReadStage-2
4725a002-6a1f-11ea-8693-577fec389856 | Acquiring sstable references | 327 | ReadStage-1
4725a002-6a1f-11ea-8d2e-c5837edad3d1 | Acquiring sstable references | 297 | ReadStage-1
4725a003-6a1f-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 54 | 433 | ReadStage-2
4725a003-6a1f-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 56 | 451 | ReadStage-1
4725a003-6a1f-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 56 | 439 | ReadStage-1
4725a004-6a1f-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 41 | 450 | ReadStage-2
4725a004-6a1f-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 55 | 512 | ReadStage-1
4725a004-6a1f-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 55 | 492 | ReadStage-1
4725a005-6a1f-11ea-80ad-dffaf3fb56b4 | Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 466 | ReadStage-2
4725a005-6a1f-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 54 | 570 | ReadStage-1
4725a005-6a1f-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 54 | 513 | ReadStage-1
4725a006-6a1f-11ea-80ad-dffaf3fb56b4 | Merged data from memtables and 0 sstables | 648 | ReadStage-2
4725a006-6a1f-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 41 | 606 | ReadStage-1
4725a006-6a1f-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 41 | 526 | ReadStage-1
4725a007-6a1f-11ea-80ad-dffaf3fb56b4 | Read 0 live rows and 0 tombstone cells | 708 | ReadStage-2
4725a007-6a1f-11ea-8693-577fec389856 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 631 | ReadStage-1
4725a007-6a1f-11ea-8d2e-c5837edad3d1 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 542 | ReadStage-1
4725a008-6a1f-11ea-80ad-dffaf3fb56b4 | Enqueuing response to /xx.xx.xxx.158 | 727 | ReadStage-2
4725a008-6a1f-11ea-8d2e-c5837edad3d1 | Merged data from memtables and 0 sstables | 700 | ReadStage-1
4725a009-6a1f-11ea-80ad-dffaf3fb56b4 | Sending REQUEST_RESPONSE message to cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158 | 838 | MessagingService-Outgoing-cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158-Small
4725a009-6a1f-11ea-8d2e-c5837edad3d1 | Read 0 live rows and 0 tombstone cells | 756 | ReadStage-1
4725a00a-6a1f-11ea-8d2e-c5837edad3d1 | Enqueuing response to /xx.xx.xxx.158 | 772 | ReadStage-1
4725a00b-6a1f-11ea-8d2e-c5837edad3d1 | Sending REQUEST_RESPONSE message to /xx.xx.xxx.158 | 914 | MessagingService-Outgoing-/xx.xx.xxx.158-Small
4725c710-6a1f-11ea-8693-577fec389856 | Merged data from memtables and 0 sstables | 845 | ReadStage-1
4725c710-6a1f-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.161 | 2327 | MessagingService-Incoming-/xx.xx.xxx.161
4725c711-6a1f-11ea-8693-577fec389856 | Read 0 live rows and 0 tombstone cells | 905 | ReadStage-1
4725c711-6a1f-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.161 | 2443 | RequestResponseStage-2
4725c712-6a1f-11ea-8693-577fec389856 | Enqueuing response to /xx.xx.xxx.158 | 929 | ReadStage-1
4725c712-6a1f-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.138 | 2571 | MessagingService-Incoming-/xx.xx.xxx.138
4725c713-6a1f-11ea-8693-577fec389856 | Sending REQUEST_RESPONSE message to /xx.xx.xxx.158 | 1023 | MessagingService-Outgoing-/xx.xx.xxx.158-Small
4725c713-6a1f-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.138 | 2712 | RequestResponseStage-2
4725c714-6a1f-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xx.171 | 2725 | MessagingService-Incoming-/xx.xx.xx.171
4725c715-6a1f-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xx.171 | 2797 | RequestResponseStage-2
4725c716-6a1f-11ea-97ca-6b8bbe3a2a36 | Initiating read-repair | 2855 | RequestResponseStage-2
Keyspace info
cqlsh> describe keyspace feast;
CREATE KEYSPACE feast WITH replication = {'class': 'NetworkTopologyStrategy', 'stage-us-west1': '5'} AND durable_writes = true;
CREATE TABLE feast.feature_store (
entities text,
feature text,
value blob,
PRIMARY KEY (entities, feature)
) WITH CLUSTERING ORDER BY (feature ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
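One sanity check that can help isolate replica divergence (a cqlsh sketch; the key value is elided) is to read the same partition at different consistency levels and compare:
CONSISTENCY LOCAL_ONE;
SELECT entities, feature, WRITETIME(value) FROM feast.feature_store WHERE entities = '...';
CONSISTENCY ALL;
SELECT entities, feature, WRITETIME(value) FROM feast.feature_store WHERE entities = '...';
If LOCAL_ONE intermittently returns nothing while ALL always returns rows, some replicas are missing the data and repair is needed.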
We have had problems with our Cassandra nodes going OOM for some time, so we finally got them configured to produce a heap dump, to try to see what was causing the OOM.
In the dump there were 16 threads (named SharedPool-Worker-XX), each executing a SliceFromReadCommand from the same table. Each of the 16 threads had a RangeTombstoneList object retaining between 200 and 240 MB of memory.
The table in question is used as a queue (I know, far from the ideal use case for Cassandra, but that is how it is) between two applications, where one writes and the other reads. So it is not unlikely that there is a large number of tombstones in the table... however, I have been unable to find them.
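For context, in the pre-3.0 storage engine a CQL row deletion on a table with clustering columns is stored as a range tombstone covering that row's clustering prefix, which would match the RangeTombstoneList objects in the dump. A consumer that deletes each entry after processing it (a sketch; the key values are made up) produces one such tombstone per consumed row:
DELETE FROM file_download
WHERE ti = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
  AND uuid = 123e4567-e89b-12d3-a456-426655440000;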
I did a trace on the query issued against the table, and it resulted in the following:
cqlsh:ddp> select json_data
... from file_download
... where ti = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';
ti | uuid | json_data
----+------+-----------
(0 rows)
Tracing session: b2f2be60-01c8-11e8-ae90-fd18acdea80d
activity | timestamp | source | source_elapsed
------------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------
Execute CQL3 query | 2018-01-25 12:10:14.214000 | 10.60.73.232 | 0
Parsing select json_data\nfrom file_download\nwhere ti = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'; [SharedPool-Worker-1] | 2018-01-25 12:10:14.260000 | 10.60.73.232 | 105
Preparing statement [SharedPool-Worker-1] | 2018-01-25 12:10:14.262000 | 10.60.73.232 | 197
Executing single-partition query on file_download [SharedPool-Worker-4] | 2018-01-25 12:10:14.263000 | 10.60.73.232 | 442
Acquiring sstable references [SharedPool-Worker-4] | 2018-01-25 12:10:14.264000 | 10.60.73.232 | 491
Merging memtable tombstones [SharedPool-Worker-4] | 2018-01-25 12:10:14.265000 | 10.60.73.232 | 517
Bloom filter allows skipping sstable 2444 [SharedPool-Worker-4] | 2018-01-25 12:10:14.270000 | 10.60.73.232 | 608
Bloom filter allows skipping sstable 8 [SharedPool-Worker-4] | 2018-01-25 12:10:14.271000 | 10.60.73.232 | 665
Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-4] | 2018-01-25 12:10:14.273000 | 10.60.73.232 | 700
Merging data from memtables and 0 sstables [SharedPool-Worker-4] | 2018-01-25 12:10:14.274000 | 10.60.73.232 | 707
Read 0 live and 0 tombstone cells [SharedPool-Worker-4] | 2018-01-25 12:10:14.274000 | 10.60.73.232 | 754
Request complete | 2018-01-25 12:10:14.215148 | 10.60.73.232 | 1148
cfstats also shows avg./max tombstones per slice to be 0.
This makes me question whether the OOM was actually due to a large number of tombstones, or to something else.
We run Cassandra v2.1.17.
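(For what it's worth, tombstone estimates can also be read directly off the sstables with the sstablemetadata tool that ships with Cassandra; the path below is illustrative:)
sstablemetadata /var/lib/cassandra/data/ddp/file_download-*/*-Data.db | grep -i tombstone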
I'm implementing a unique entry counter in Cassandra. The counter may be represented simply as a set of tuples:
counter_id = broadcast:12345, token = user:123
counter_id = broadcast:12345, token = user:321
where the value for counter broadcast:12345 may be computed as the size of the corresponding set of entries. Such a counter can be stored effectively as a table with counter_id as the partition key. My first thought was that since a single counter value is basically the size of a partition, I could issue a count(1) WHERE counter_id = ? query, which wouldn't need to read the data and would be super fast. However, I see the following trace output:
cqlsh > select count(1) from token_counter_storage where id = '1';
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+------------+----------------
Execute CQL3 query | 2016-06-10 11:22:42.809000 | 172.17.0.2 | 0
Parsing select count(1) from token_counter_storage where id = '1'; [SharedPool-Worker-1] | 2016-06-10 11:22:42.809000 | 172.17.0.2 | 260
Preparing statement [SharedPool-Worker-1] | 2016-06-10 11:22:42.810000 | 172.17.0.2 | 565
Executing single-partition query on token_counter_storage [SharedPool-Worker-2] | 2016-06-10 11:22:42.810000 | 172.17.0.2 | 1256
Acquiring sstable references [SharedPool-Worker-2] | 2016-06-10 11:22:42.810000 | 172.17.0.2 | 1350
Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2016-06-10 11:22:42.810000 | 172.17.0.2 | 1465
Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2016-06-10 11:22:42.810000 | 172.17.0.2 | 1546
Read 10 live and 0 tombstone cells [SharedPool-Worker-2] | 2016-06-10 11:22:42.811000 | 172.17.0.2 | 1826
Request complete | 2016-06-10 11:22:42.811410 | 172.17.0.2 | 2410
I guess this trace confirms that the data is being read from disk. Am I right in this conclusion, and if so, is there any way to simply fetch the partition size from an index without any excessive disk hits?
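(As an aside, the closest built-in shortcut I am aware of is the coarse per-token-range estimate Cassandra keeps in system.size_estimates since 2.1.5; it is approximate and per range, not per individual partition. The keyspace name below is a placeholder:)
SELECT range_start, range_end, partitions_count, mean_partition_size
FROM system.size_estimates
WHERE keyspace_name = 'my_keyspace' AND table_name = 'token_counter_storage';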
I have a table with 45 million keys.
Compaction strategy is LCS.
Single node Cassandra version 2.0.5.
No key or row caching.
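(For reference, per-table key caching in 2.0 is a string option, alongside key_cache_size_in_mb in cassandra.yaml; a minimal sketch of re-enabling it, in case it was deliberately turned off:)
ALTER TABLE object_version_info WITH caching = 'keys_only';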
Queries are very slow; they were several times faster when the database was smaller. Here is a trace of one query:
activity | timestamp | source | source_elapsed
-----------------------------------------------------------------------------------------------+--------------+-------------+----------------
execute_cql3_query | 17:47:07,891 | 10.72.9.151 | 0
Parsing select * from object_version_info where key = 'test10-Client9_99900' LIMIT 10000; | 17:47:07,891 | 10.72.9.151 | 80
Preparing statement | 17:47:07,891 | 10.72.9.151 | 178
Executing single-partition query on object_version_info | 17:47:07,893 | 10.72.9.151 | 2513
Acquiring sstable references | 17:47:07,893 | 10.72.9.151 | 2539
Merging memtable tombstones | 17:47:07,893 | 10.72.9.151 | 2597
Bloom filter allows skipping sstable 1517 | 17:47:07,893 | 10.72.9.151 | 2652
Bloom filter allows skipping sstable 1482 | 17:47:07,893 | 10.72.9.151 | 2677
Partition index with 0 entries found for sstable 1268 | 17:47:08,560 | 10.72.9.151 | 669935
Seeking to partition beginning in data file | 17:47:08,560 | 10.72.9.151 | 669956
Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 17:47:09,411 | 10.72.9.151 | 1520279
Merging data from memtables and 1 sstables | 17:47:09,411 | 10.72.9.151 | 1520302
Read 1 live and 0 tombstoned cells | 17:47:09,411 | 10.72.9.151 | 1520351
Request complete | 17:47:09,411 | 10.72.9.151 | 1520615
activity | timestamp | source | source_elapsed
------------------------------------------------------------------------------------------------------+--------------+---------------+----------------
execute_cql3_query | 06:30:52,479 | 192.168.11.23 | 0
Parsing select adid from userlastadevents where userid = '90000012' and type in (1,2,3) LIMIT 10000; | 06:30:52,479 | 192.168.11.23 | 44
Peparing statement | 06:30:52,479 | 192.168.11.23 | 146
Executing single-partition query on userlastadevents | 06:30:52,480 | 192.168.11.23 | 665
Acquiring sstable references | 06:30:52,480 | 192.168.11.23 | 680
Executing single-partition query on userlastadevents | 06:30:52,480 | 192.168.11.23 | 696
Acquiring sstable references | 06:30:52,480 | 192.168.11.23 | 704
Merging memtable tombstones | 06:30:52,480 | 192.168.11.23 | 706
Merging memtable tombstones | 06:30:52,480 | 192.168.11.23 | 721
Bloom filter allows skipping sstable 37398 | 06:30:52,480 | 192.168.11.23 | 758
Bloom filter allows skipping sstable 37426 | 06:30:52,480 | 192.168.11.23 | 762
Bloom filter allows skipping sstable 35504 | 06:30:52,480 | 192.168.11.23 | 768
Bloom filter allows skipping sstable 36671 | 06:30:52,480 | 192.168.11.23 | 771
Merging data from memtables and 0 sstables | 06:30:52,480 | 192.168.11.23 | 777
Merging data from memtables and 0 sstables | 06:30:52,480 | 192.168.11.23 | 780
Executing single-partition query on userlastadevents | 06:30:52,480 | 192.168.11.23 | 782
Acquiring sstable references | 06:30:52,480 | 192.168.11.23 | 791
Read 0 live and 0 tombstoned cells | 06:30:52,480 | 192.168.11.23 | 797
Read 0 live and 0 tombstoned cells | 06:30:52,480 | 192.168.11.23 | 800
Merging memtable tombstones | 06:30:52,480 | 192.168.11.23 | 815
Bloom filter allows skipping sstable 37432 | 06:30:52,480 | 192.168.11.23 | 857
Bloom filter allows skipping sstable 36918 | 06:30:52,480 | 192.168.11.23 | 866
Merging data from memtables and 0 sstables | 06:30:52,480 | 192.168.11.23 | 874
Read 0 live and 0 tombstoned cells | 06:30:52,480 | 192.168.11.23 | 898
Request complete | 06:30:52,479 | 192.168.11.23 | 990
Above is the tracing output from the Cassandra cqlsh for a single query, but I couldn't understand some of the entries. First, the column "source_elapsed": does it mean the time taken to execute a particular task, or the cumulative time elapsed up to that task? Second, "timestamp" doesn't maintain chronology: the "Request complete" timestamp is 06:30:52,479, but "Merging data from memtables and 0 sstables", which is supposed to happen earlier, shows the later timestamp 06:30:52,480.
I also couldn't understand some of the activities:
Executing single-partition query -- does this mean the whole task, or is it a starting point? What jobs does it include? And why does it repeat three times -- is that linked to the replication factor?
Acquiring sstable references -- what does it mean? Does it check each sstable's bloom filter to see whether it contains the key we are searching for, and then find the reference in the data file with the help of the partition index?
Bloom filter allows skipping sstable -- when and how does this happen? It takes about the same amount of time as acquiring the sstable references.
Request complete -- what does it mean? Is it just the finish line, or is it a job that itself takes most of the time?
Did you see the Request tracing in Cassandra link, which explains different tracing scenarios?
source_elapsed: the cumulative execution time on a specific node (if you check the above link, it will be clearer)
Executing single-partition query: (seems to represent) the start time
Request complete: all work has been done for this request
For the rest, you'd be better off reading the Reads in Cassandra docs, as they are much more detailed than anything I could summarize here.
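As an illustration, such a trace can be reproduced interactively from cqlsh using the query from the trace above:
TRACING ON;
SELECT adid FROM userlastadevents WHERE userid = '90000012' AND type IN (1,2,3) LIMIT 10000;
TRACING OFF;
Each source_elapsed value then reads as microseconds accumulated on the node named in source since it began handling the request, which is also why timestamps written by different nodes and threads need not be strictly ordered.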