Why is cassandra not reading data from the local node? - cassandra

I'm running a three-node cassandra cluster. I created a table with the following configuration:
cassandra#cqlsh> describe docstore;
CREATE KEYSPACE docstore WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;
CREATE TABLE docstore.docs (
docid uuid PRIMARY KEY
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = 'NONE';
I traced a read command via CQL, and I see that a message is being sent from the coordinator node to another node:
Execute CQL3 query | 2022-12-21 22:06:53.438000 | 10.155.32.67 | 0 | 10.155.32.56
Parsing select * from docstore.docs where docId = f8e682f0-b4d0-9b83-bcdb-4a182453d9c9; [Native-Transport-Requests-1] | 2022-12-21 22:06:53.438000 | 10.155.32.67 | 97 | 10.155.32.56
Preparing statement [Native-Transport-Requests-1] | 2022-12-21 22:06:53.438000 | 10.155.32.67 | 182 | 10.155.32.56
Executing single-partition query on roles [ReadStage-3] | 2022-12-21 22:06:53.438000 | 10.155.32.67 | 411 | 10.155.32.56
Acquiring sstable references [ReadStage-3] | 2022-12-21 22:06:53.438000 | 10.155.32.67 | 425 | 10.155.32.56
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-3] | 2022-12-21 22:06:53.438000 | 10.155.32.67 | 443 | 10.155.32.56
Key cache hit for sstable 1 [ReadStage-3] | 2022-12-21 22:06:53.438001 | 10.155.32.67 | 467 | 10.155.32.56
Merged data from memtables and 1 sstables [ReadStage-3] | 2022-12-21 22:06:53.438001 | 10.155.32.67 | 533 | 10.155.32.56
Read 1 live rows and 0 tombstone cells [ReadStage-3] | 2022-12-21 22:06:53.438001 | 10.155.32.67 | 554 | 10.155.32.56
Executing single-partition query on roles [ReadStage-2] | 2022-12-21 22:06:53.438001 | 10.155.32.67 | 763 | 10.155.32.56
Acquiring sstable references [ReadStage-2] | 2022-12-21 22:06:53.438001 | 10.155.32.67 | 776 | 10.155.32.56
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-2] | 2022-12-21 22:06:53.438001 | 10.155.32.67 | 785 | 10.155.32.56
Key cache hit for sstable 1 [ReadStage-2] | 2022-12-21 22:06:53.438001 | 10.155.32.67 | 800 | 10.155.32.56
Merged data from memtables and 1 sstables [ReadStage-2] | 2022-12-21 22:06:53.439000 | 10.155.32.67 | 832 | 10.155.32.56
Read 1 live rows and 0 tombstone cells [ReadStage-2] | 2022-12-21 22:06:53.439000 | 10.155.32.67 | 847 | 10.155.32.56
reading data from /10.155.32.56 [Native-Transport-Requests-1] | 2022-12-21 22:06:53.439000 | 10.155.32.67 | 956 | 10.155.32.56
Sending READ message to cassandra-0.cassandra-headless.default.svc.cluster.local/10.155.32.56 [MessagingService-Outgoing-cassandra-0.cassandra-headless.default.svc.cluster.local/10.155.32.56-Small] | 2022-12-21 22:06:53.439000 | 10.155.32.67 | 1015 | 10.155.32.56
REQUEST_RESPONSE message received from /10.155.32.56 [MessagingService-Incoming-/10.155.32.56] | 2022-12-21 22:06:53.440000 | 10.155.32.67 | 1957 | 10.155.32.56
Processing response from /10.155.32.56 [RequestResponseStage-4] | 2022-12-21 22:06:53.440000 | 10.155.32.67 | 1987 | 10.155.32.56
READ message received from /10.155.32.67 [MessagingService-Incoming-/10.155.32.67] | 2022-12-21 22:06:53.451000 | 10.155.32.56 | 4 | 10.155.32.56
Executing single-partition query on docs [ReadStage-3] | 2022-12-21 22:06:53.451000 | 10.155.32.56 | 115 | 10.155.32.56
Acquiring sstable references [ReadStage-3] | 2022-12-21 22:06:53.451000 | 10.155.32.56 | 135 | 10.155.32.56
Merging memtable contents [ReadStage-3] | 2022-12-21 22:06:53.451000 | 10.155.32.56 | 150 | 10.155.32.56
Key cache hit for sstable 1 [ReadStage-3] | 2022-12-21 22:06:53.452000 | 10.155.32.56 | 258 | 10.155.32.56
Read 1 live rows and 0 tombstone cells [ReadStage-3] | 2022-12-21 22:06:53.452000 | 10.155.32.56 | 368 | 10.155.32.56
Enqueuing response to /10.155.32.67 [ReadStage-3] | 2022-12-21 22:06:53.452000 | 10.155.32.56 | 552 | 10.155.32.56
Sending REQUEST_RESPONSE message to /10.155.32.67 [MessagingService-Outgoing-/10.155.32.67-Small] | 2022-12-21 22:06:53.452000 | 10.155.32.56 | 664 | 10.155.32.56
Request complete | 2022-12-21 22:06:53.440087 | 10.155.32.67 | 2087 | 10.155.32.56
Why is a request being sent from the coordinator node to another node, considering the coordinator node contains the data that's being requested? I have a replication factor of 3 on a three-node cluster, so every node contains all the data.

What consistency level do you use for your read? If it's higher than ONE/LOCAL_ONE then the coordinator will request from another node to ensure consistency. I'd try running 'consistency LOCAL_ONE;' before the query with the same trace output to see if it changes.

Related

Cassandra select not stable using datastax driver

Versions:
com.datastax.oss
-java-driver-core:4.5.0
-java-driver-query-builder:4.5.0
-java-driver-mapper-runtime:4.5.0
cassandra:3.11.5 docker image
jdk 11.1
I'm running a deployment of feast that I've modified to use cassandra as a backend low latency serving db for machine learning features. I'm sucessfully writing and reading rows, but the read is inconsistent with respect to results returned. Sometimes the payloads are empty and I don't know why. I have already tried updating to the latest datastax driver and coordinating time using ntp/time.google.com. I've also tried to change the consistency of write to ALL and read to LOCAL_ONE/LOCAL_QUOROM, without success. I'm really struggling to figure out why select isn't consistent. Any insight would be great! :) Here is the process:
I write the rows into cassandra using CassandraIO
#Override
public Future<Void> saveAsync(CassandraMutation entityClass) {
return mapper.saveAsync(
entityClass,
Option.timestamp(entityClass.getWriteTime()),
Option.ttl(entityClass.getTtl()),
Option.consistencyLevel(ConsistencyLevel.LOCAL_QUORUM),
Option.tracing(true));
}
This seems to successfully map rows into my cassandra cluster, which I then query in my application as follows
List<InetSocketAddress> contactPoints =
Arrays.stream(cassandraConfig.getBootstrapHosts().split(","))
.map(h -> new InetSocketAddress(h, cassandraConfig.getPort()))
.collect(Collectors.toList());
CqlSession session =
CqlSession.builder()
.addContactPoints(contactPoints)
.withLocalDatacenter(storeProperties.getCassandraDcName())
.build();
....
PreparedStatement query =
session.prepare(
String.format(
"SELECT entities, feature, value, WRITETIME(value) as writetime FROM %s.%s WHERE entities = ?",
keyspace, tableName));
session.execute(
query
.bind(key)
.setTracing(true)
.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)));
My issue is that there doesn't seem to be consistent selects happening. I have been recording various bits for a while now and for example here are two select queries, with the same coordinator, one succeeded, then after that the subsequent select fails to return results.
cqlsh> select * from system_traces.sessions where session_id=be023400-6a1e-11ea-97ca-6b8bbe3a2a36;
session_id | client | command | coordinator | duration | parameters | request | started_at
--------------------------------------+---------------+---------+---------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------
be023400-6a1e-11ea-97ca-6b8bbe3a2a36 | xx.xx.xxx.189 | QUERY | xx.xx.xxx.158 | 41313 | {'bound_var_0_entities': '''ml_project/test_test_entity:1:entity2_uuid=TenderGreens_8755fff7|entity1_uuid=Zach_Yang_fe7fea92''', 'consistency_level': 'LOCAL_QUORUM', 'page_size': '5000', 'query': 'SELECT entities, feature, value, WRITETIME(value) as writetime FROM feast.feature_store WHERE entities = ?', 'serial_consistency_level': 'SERIAL'} | Execute CQL3 prepared query | 2020-03-19 20:18:05.760000+0000
select event_id, activity, source_elapsed, thread from system_traces.events where session_id=be023400-6a1e-11ea-97ca-6b8bbe3a2a36;
event_id | activity | source_elapsed | thread
--------------------------------------+---------------------------------------------------------------------------------------------------------------+----------------+-----------------------------------------------------------------------------------------------------------
be0652b0-6a1e-11ea-97ca-6b8bbe3a2a36 | Read-repair DC_LOCAL | 27087 | Native-Transport-Requests-1
be0679c0-6a1e-11ea-97ca-6b8bbe3a2a36 | reading data from /xx.xx.xxx.161 | 28034 | Native-Transport-Requests-1
be06a0d0-6a1e-11ea-97ca-6b8bbe3a2a36 | Sending READ message to /xx.xx.xxx.161 | 28552 | MessagingService-Outgoing-/xx.xx.xxx.161-Small
be06a0d1-6a1e-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xxx.162 | 28595 | Native-Transport-Requests-1
be06a0d2-6a1e-11ea-97ca-6b8bbe3a2a36 | Executing single-partition query on feature_store | 28598 | ReadStage-3
be06a0d3-6a1e-11ea-97ca-6b8bbe3a2a36 | Acquiring sstable references | 28689 | ReadStage-3
be06a0d4-6a1e-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xx.138 | 28852 | Native-Transport-Requests-1
be06a0d5-6a1e-11ea-97ca-6b8bbe3a2a36 | Sending READ message to /xx.xx.xxx.162 | 28904 | MessagingService-Outgoing-/xx.xx.xxx.162-Small
be06a0d6-6a1e-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 56 | 28937 | ReadStage-3
be06a0d7-6a1e-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xxx.171 | 28983 | Native-Transport-Requests-1
be06a0d8-6a1e-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 55 | 29020 | ReadStage-3
be06a0d9-6a1e-11ea-97ca-6b8bbe3a2a36 | Sending READ message to cassandra-feature-store-1.cassandra-feature-store.team-data/xx.xx.xx.138 | 29071 | MessagingService-Outgoing-cassandra-feature-store-1.cassandra-feature-store.team-data/xx.xx.xx.138-Small
be06a0da-6a1e-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 54 | 29181 | ReadStage-3
be06a0db-6a1e-11ea-97ca-6b8bbe3a2a36 | Sending READ message to cassandra-feature-store-0.cassandra-feature-store.team-data/xx.xx.xxx.171 | 29201 | MessagingService-Outgoing-cassandra-feature-store-0.cassandra-feature-store.team-data/xx.xx.xx.171-Small
be06c7e0-6a1e-11ea-80ad-dffaf3fb56b4 | READ message received from /xx.xx.xxx.158 | 33 | MessagingService-Incoming-/xx.xx.xxx.158
be06c7e0-6a1e-11ea-8693-577fec389856 | READ message received from /xx.xx.xxx.158 | 34 | MessagingService-Incoming-/xx.xx.xxx.158
be06c7e0-6a1e-11ea-8b1a-e5aa876f7d0d | READ message received from /xx.xx.xxx.158 | 29 | MessagingService-Incoming-/xx.xx.xxx.158
be06c7e0-6a1e-11ea-8d2e-c5837edad3d1 | READ message received from /xx.xx.xxx.158 | 44 | MessagingService-Incoming-/xx.xx.xxx.158
be06c7e0-6a1e-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 41 | 29273 | ReadStage-3
be06c7e1-6a1e-11ea-80ad-dffaf3fb56b4 | Executing single-partition query on feature_store | 389 | ReadStage-1
be06c7e1-6a1e-11ea-8b1a-e5aa876f7d0d | Executing single-partition query on feature_store | 513 | ReadStage-1
be06c7e1-6a1e-11ea-97ca-6b8bbe3a2a36 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 29342 | ReadStage-3
be06c7e2-6a1e-11ea-80ad-dffaf3fb56b4 | Acquiring sstable references | 457 | ReadStage-1
be06c7e3-6a1e-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 55 | 620 | ReadStage-1
be06c7e4-6a1e-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 54 | 659 | ReadStage-1
be06c7e5-6a1e-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 41 | 677 | ReadStage-1
be06c7e6-6a1e-11ea-80ad-dffaf3fb56b4 | Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 695 | ReadStage-1
be06eef0-6a1e-11ea-80ad-dffaf3fb56b4 | Merged data from memtables and 0 sstables | 1039 | ReadStage-1
be06eef0-6a1e-11ea-8693-577fec389856 | Executing single-partition query on feature_store | 372 | ReadStage-1
be06eef0-6a1e-11ea-8b1a-e5aa876f7d0d | Acquiring sstable references | 583 | ReadStage-1
be06eef0-6a1e-11ea-8d2e-c5837edad3d1 | Executing single-partition query on feature_store | 454 | ReadStage-1
be06eef0-6a1e-11ea-97ca-6b8bbe3a2a36 | Merged data from memtables and 0 sstables | 30372 | ReadStage-3
be06eef1-6a1e-11ea-80ad-dffaf3fb56b4 | Read 16 live rows and 0 tombstone cells | 1125 | ReadStage-1
be06eef1-6a1e-11ea-8693-577fec389856 | Acquiring sstable references | 493 | ReadStage-1
be06eef1-6a1e-11ea-8b1a-e5aa876f7d0d | Bloom filter allows skipping sstable 54 | 703 | ReadStage-1
be06eef1-6a1e-11ea-8d2e-c5837edad3d1 | Acquiring sstable references | 530 | ReadStage-1
be06eef1-6a1e-11ea-97ca-6b8bbe3a2a36 | Read 16 live rows and 0 tombstone cells | 30484 | ReadStage-3
be06eef2-6a1e-11ea-80ad-dffaf3fb56b4 | Enqueuing response to /xx.xx.xxx.158 | 1155 | ReadStage-1
be06eef2-6a1e-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 56 | 721 | ReadStage-1
be06eef2-6a1e-11ea-8b1a-e5aa876f7d0d | Bloom filter allows skipping sstable 41 | 740 | ReadStage-1
be06eef2-6a1e-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 56 | 655 | ReadStage-1
be06eef3-6a1e-11ea-80ad-dffaf3fb56b4 | Sending REQUEST_RESPONSE message to cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158 | 1492 | MessagingService-Outgoing-cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158-Small
be06eef3-6a1e-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 55 | 780 | ReadStage-1
be06eef3-6a1e-11ea-8b1a-e5aa876f7d0d | Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | 761 | ReadStage-1
be06eef3-6a1e-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 55 | 686 | ReadStage-1
be06eef4-6a1e-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 54 | 815 | ReadStage-1
be06eef4-6a1e-11ea-8b1a-e5aa876f7d0d | Merged data from memtables and 0 sstables | 1320 | ReadStage-1
be06eef4-6a1e-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 54 | 705 | ReadStage-1
be06eef5-6a1e-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 41 | 839 | ReadStage-1
be06eef5-6a1e-11ea-8b1a-e5aa876f7d0d | Read 16 live rows and 0 tombstone cells | 1495 | ReadStage-1
be06eef5-6a1e-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 41 | 720 | ReadStage-1
be06eef6-6a1e-11ea-8693-577fec389856 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 871 | ReadStage-1
be06eef6-6a1e-11ea-8b1a-e5aa876f7d0d | Enqueuing response to /xx.xx.xxx.158 | 1554 | ReadStage-1
be06eef6-6a1e-11ea-8d2e-c5837edad3d1 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 738 | ReadStage-1
be06eef7-6a1e-11ea-8d2e-c5837edad3d1 | Merged data from memtables and 0 sstables | 1157 | ReadStage-1
be06eef8-6a1e-11ea-8d2e-c5837edad3d1 | Read 16 live rows and 0 tombstone cells | 1296 | ReadStage-1
be06eef9-6a1e-11ea-8d2e-c5837edad3d1 | Enqueuing response to /xx.xx.xxx.158 | 1325 | ReadStage-1
be071600-6a1e-11ea-8693-577fec389856 | Merged data from memtables and 0 sstables | 1592 | ReadStage-1
be071600-6a1e-11ea-8b1a-e5aa876f7d0d | Sending REQUEST_RESPONSE message to cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158 | 1783 | MessagingService-Outgoing-cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158-Small
be071600-6a1e-11ea-8d2e-c5837edad3d1 | Sending REQUEST_RESPONSE message to /xx.xx.xxx.158 | 1484 | MessagingService-Outgoing-/xx.xx.xxx.158-Small
be071600-6a1e-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.161 | 31525 | MessagingService-Incoming-/xx.xx.xxx.161
be071601-6a1e-11ea-8693-577fec389856 | Read 16 live rows and 0 tombstone cells | 1754 | ReadStage-1
be071601-6a1e-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.161 | 31650 | RequestResponseStage-4
be071602-6a1e-11ea-8693-577fec389856 | Enqueuing response to /xx.xx.xxx.158 | 1796 | ReadStage-1
be071602-6a1e-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xx.138 | 31795 | MessagingService-Incoming-/xx.xx.xx.138
be071603-6a1e-11ea-8693-577fec389856 | Sending REQUEST_RESPONSE message to /xx.xx.xxx.158 | 1973 | MessagingService-Outgoing-/xx.xx.xxx.158-Small
be071603-6a1e-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xx.138 | 31872 | RequestResponseStage-4
be071604-6a1e-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.162 | 31918 | MessagingService-Incoming-/xx.xx.xxx.162
be071605-6a1e-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.162 | 32047 | RequestResponseStage-4
be073d10-6a1e-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.171 | 32688 | MessagingService-Incoming-/xx.xx.xx.171
be073d11-6a1e-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.171 | 32827 | RequestResponseStage-2
be073d12-6a1e-11ea-97ca-6b8bbe3a2a36 | Initiating read-repair | 32985 | RequestResponseStage-2
Failure:
cqlsh> select * from system_traces.sessions where session_id=472551e0-6a1f-11ea-97ca-6b8bbe3a2a36;
session_id | client | command | coordinator | duration | parameters | request | started_at
--------------------------------------+---------------+---------+---------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------
472551e0-6a1f-11ea-97ca-6b8bbe3a2a36 | xx.xx.xxx.189 | QUERY | xx.xx.xxx.158 | 3044 | {'bound_var_0_entities': '''ml_project/test_test_entity:1:entity1_uuid=Zach_Yang_fe7fea92|entity2_uuid=TenderGreens_8755fff7''', 'consistency_level': 'LOCAL_QUORUM', 'page_size': '5000', 'query': 'SELECT entities, feature, value, WRITETIME(value) as writetime FROM feast.feature_store WHERE entities = ?', 'serial_consistency_level': 'SERIAL'} | Execute CQL3 prepared query | 2020-03-19 20:21:55.838000+0000
cqlsh> select event_id, activity, source_elapsed, thread from system_traces.events where session_id=472551e0-6a1f-11ea-97ca-6b8bbe3a2a36;
event_id | activity | source_elapsed | thread
--------------------------------------+---------------------------------------------------------------------------------------------------------------+----------------+-----------------------------------------------------------------------------------------------------------
472578f0-6a1f-11ea-80ad-dffaf3fb56b4 | READ message received from /xx.xx.xxx.158 | 18 | MessagingService-Incoming-/xx.xx.xxx.158
472578f0-6a1f-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xxx.138 | 619 | Native-Transport-Requests-1
472578f1-6a1f-11ea-97ca-6b8bbe3a2a36 | Executing single-partition query on feature_store | 708 | ReadStage-2
472578f2-6a1f-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xxx.161 | 755 | Native-Transport-Requests-1
472578f3-6a1f-11ea-97ca-6b8bbe3a2a36 | Acquiring sstable references | 768 | ReadStage-2
472578f4-6a1f-11ea-97ca-6b8bbe3a2a36 | Sending READ message to cassandra-feature-store-1.cassandra-feature-store.team-data/xx.xx.xxx.138 | 836 | MessagingService-Outgoing-cassandra-feature-store-1.cassandra-feature-store.team-data/xx.xx.xxx.138-Small
472578f5-6a1f-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 56 | 859 | ReadStage-2
472578f6-6a1f-11ea-97ca-6b8bbe3a2a36 | speculating read retry on /xx.xx.xx.171 | 862 | Native-Transport-Requests-1
472578f7-6a1f-11ea-97ca-6b8bbe3a2a36 | Sending READ message to /xx.xx.xxx.161 | 893 | MessagingService-Outgoing-/xx.xx.xxx.161-Small
472578f8-6a1f-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 55 | 903 | ReadStage-2
472578f9-6a1f-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 54 | 929 | ReadStage-2
472578fa-6a1f-11ea-97ca-6b8bbe3a2a36 | Sending READ message to cassandra-feature-store-0.cassandra-feature-store.team-data/xx.xx.xx.171 | 982 | MessagingService-Outgoing-cassandra-feature-store-0.cassandra-feature-store.team-data/xx.xx.xx.171-Small
472578fb-6a1f-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 41 | 996 | ReadStage-2
472578fc-6a1f-11ea-97ca-6b8bbe3a2a36 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 1039 | ReadStage-2
472578fd-6a1f-11ea-97ca-6b8bbe3a2a36 | Merged data from memtables and 0 sstables | 1227 | ReadStage-2
472578fe-6a1f-11ea-97ca-6b8bbe3a2a36 | Read 0 live rows and 0 tombstone cells | 1282 | ReadStage-2
4725a000-6a1f-11ea-80ad-dffaf3fb56b4 | Executing single-partition query on feature_store | 226 | ReadStage-2
4725a000-6a1f-11ea-8693-577fec389856 | READ message received from /xx.xx.xxx.158 | 12 | MessagingService-Incoming-/xx.xx.xxx.158
4725a000-6a1f-11ea-8d2e-c5837edad3d1 | READ message received from /xx.xx.xxx.158 | 15 | MessagingService-Incoming-/xx.xx.xxx.158
4725a001-6a1f-11ea-80ad-dffaf3fb56b4 | Acquiring sstable references | 297 | ReadStage-2
4725a001-6a1f-11ea-8693-577fec389856 | Executing single-partition query on feature_store | 258 | ReadStage-1
4725a001-6a1f-11ea-8d2e-c5837edad3d1 | Executing single-partition query on feature_store | 230 | ReadStage-1
4725a002-6a1f-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 55 | 397 | ReadStage-2
4725a002-6a1f-11ea-8693-577fec389856 | Acquiring sstable references | 327 | ReadStage-1
4725a002-6a1f-11ea-8d2e-c5837edad3d1 | Acquiring sstable references | 297 | ReadStage-1
4725a003-6a1f-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 54 | 433 | ReadStage-2
4725a003-6a1f-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 56 | 451 | ReadStage-1
4725a003-6a1f-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 56 | 439 | ReadStage-1
4725a004-6a1f-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 41 | 450 | ReadStage-2
4725a004-6a1f-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 55 | 512 | ReadStage-1
4725a004-6a1f-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 55 | 492 | ReadStage-1
4725a005-6a1f-11ea-80ad-dffaf3fb56b4 | Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 466 | ReadStage-2
4725a005-6a1f-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 54 | 570 | ReadStage-1
4725a005-6a1f-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 54 | 513 | ReadStage-1
4725a006-6a1f-11ea-80ad-dffaf3fb56b4 | Merged data from memtables and 0 sstables | 648 | ReadStage-2
4725a006-6a1f-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 41 | 606 | ReadStage-1
4725a006-6a1f-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 41 | 526 | ReadStage-1
4725a007-6a1f-11ea-80ad-dffaf3fb56b4 | Read 0 live rows and 0 tombstone cells | 708 | ReadStage-2
4725a007-6a1f-11ea-8693-577fec389856 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 631 | ReadStage-1
4725a007-6a1f-11ea-8d2e-c5837edad3d1 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 542 | ReadStage-1
4725a008-6a1f-11ea-80ad-dffaf3fb56b4 | Enqueuing response to /xx.xx.xxx.158 | 727 | ReadStage-2
4725a008-6a1f-11ea-8d2e-c5837edad3d1 | Merged data from memtables and 0 sstables | 700 | ReadStage-1
4725a009-6a1f-11ea-80ad-dffaf3fb56b4 | Sending REQUEST_RESPONSE message to cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158 | 838 | MessagingService-Outgoing-cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158-Small
4725a009-6a1f-11ea-8d2e-c5837edad3d1 | Read 0 live rows and 0 tombstone cells | 756 | ReadStage-1
4725a00a-6a1f-11ea-8d2e-c5837edad3d1 | Enqueuing response to /xx.xx.xxx.158 | 772 | ReadStage-1
4725a00b-6a1f-11ea-8d2e-c5837edad3d1 | Sending REQUEST_RESPONSE message to /xx.xx.xxx.158 | 914 | MessagingService-Outgoing-/xx.xx.xxx.158-Small
4725c710-6a1f-11ea-8693-577fec389856 | Merged data from memtables and 0 sstables | 845 | ReadStage-1
4725c710-6a1f-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.161 | 2327 | MessagingService-Incoming-/xx.xx.xxx.161
4725c711-6a1f-11ea-8693-577fec389856 | Read 0 live rows and 0 tombstone cells | 905 | ReadStage-1
4725c711-6a1f-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.161 | 2443 | RequestResponseStage-2
4725c712-6a1f-11ea-8693-577fec389856 | Enqueuing response to /xx.xx.xxx.158 | 929 | ReadStage-1
4725c712-6a1f-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.138 | 2571 | MessagingService-Incoming-/xx.xx.xxx.138
4725c713-6a1f-11ea-8693-577fec389856 | Sending REQUEST_RESPONSE message to /xx.xx.xxx.158 | 1023 | MessagingService-Outgoing-/xx.xx.xxx.158-Small
4725c713-6a1f-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.138 | 2712 | RequestResponseStage-2
4725c714-6a1f-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xx.171 | 2725 | MessagingService-Incoming-/xx.xx.xx.171
4725c715-6a1f-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xx.171 | 2797 | RequestResponseStage-2
4725c716-6a1f-11ea-97ca-6b8bbe3a2a36 | Initiating read-repair | 2855 | RequestResponseStage-2
Keyspace info
cqlsh> describe keyspace feast;
CREATE KEYSPACE feast WITH replication = {'class': 'NetworkTopologyStrategy', 'stage-us-west1': '5'} AND durable_writes = true;
CREATE TABLE feast.feature_store (
entities text,
feature text,
value blob,
PRIMARY KEY (entities, feature)
) WITH CLUSTERING ORDER BY (feature ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

Can Cassandra row cache only cache on specific columns?

I have a C* table which are wide and some columns are read-heavy. I am considering using row cache but do not know if row cache can store specific columns. If all the cells of that row need to be stored, the content to be cached may grow too fast and defeat the purpose.
The schema is like follows:
CREATE TABLE tb1 (
pk1 int,
ck1 int,
read_heavy_col1 int,
read_heavy_col2 int,
normal_col1 int,
normal_col2 int,
...
PRIMARY KEY (pk1, ck1)
)
The question is if row cache is able to cache pk1, ck1, read_heavy_col1, read_heavy_col2 only and ignores normal_col1, normal_col2, ....
According to DataStax Configuring data caches,
If the newly cached data does not include all cells configured by
user, Cassandra performs another read.
Does that mean C* can cache only the columns of interest?
You can instead of using a large number of columns, instead have a single clustering key that the heavy read columns sort earlier as, then limit the rows_per_partition to just capture those.
You will probably not get a whole lot of benefit vs just caching entire row. Also while the key cache is generally huge for read performance the row cache really only helps in very specific scenarios (and hurts in some) so be sure to benchmark it as the os cache is usually sufficient to keep the important bits in memory.
Answering my own questions:
Cassandra will cache all columns of that row.
From "Cassandra The Definitive Guide" (2nd edition) by Jeff Carpenter & Eben Hewitt:
The row cache caches entire rows and can speed up read access for frequently accessed rows, at the cost of more memory usage.
Experiment:
abc#cqlsh:test> SELECT pk1, ck1, read_heavy_col1 FROM tb1 WHERE pk1=2 and ck1=1;
pk1 | ck1 | read_heavy_col1
-----+-----+-----------------
2 | 1 | 1
(1 rows)
Tracing session: 53aac630-11be-11e9-9cb3-fbffb1c1e13b
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------
Execute CQL3 query | 2019-01-06 22:21:15.667000 | x.x.x.x | 0
Parsing SELECT pk1, ck1, read_heavy_col1 FROM tb1 WHERE pk1=2 and ck1=1; [Native-Transport-Requests-1] | 2019-01-06 22:21:15.668000 | x.x.x.x | 310
Preparing statement [Native-Transport-Requests-1] | 2019-01-06 22:21:15.668000 | x.x.x.x | 568
Executing single-partition query on roles [ReadStage-2] | 2019-01-06 22:21:15.669000 | x.x.x.x | 1259
Acquiring sstable references [ReadStage-2] | 2019-01-06 22:21:15.669000 | x.x.x.x | 1345
Key cache hit for sstable 2 [ReadStage-2] | 2019-01-06 22:21:15.669000 | x.x.x.x | 1475
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-2] | 2019-01-06 22:21:15.669000 | x.x.x.x | 1569
Merged data from memtables and 1 sstables [ReadStage-2] | 2019-01-06 22:21:15.669000 | x.x.x.x | 1768
Read 1 live rows and 0 tombstone cells [ReadStage-2] | 2019-01-06 22:21:15.669000 | x.x.x.x | 1869
Executing single-partition query on roles [ReadStage-3] | 2019-01-06 22:21:15.672000 | x.x.x.x | 4579
Acquiring sstable references [ReadStage-3] | 2019-01-06 22:21:15.672000 | x.x.x.x | 4701
Key cache hit for sstable 2 [ReadStage-3] | 2019-01-06 22:21:15.672000 | x.x.x.x | 4812
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-3] | 2019-01-06 22:21:15.672000 | x.x.x.x | 4891
Merged data from memtables and 1 sstables [ReadStage-3] | 2019-01-06 22:21:15.672000 | x.x.x.x | 5001
Read 1 live rows and 0 tombstone cells [ReadStage-3] | 2019-01-06 22:21:15.673000 | x.x.x.x | 5081
Row cache miss [ReadStage-3] | 2019-01-06 22:21:15.674000 | x.x.x.x | 6227
Executing single-partition query on tb1 [ReadStage-3] | 2019-01-06 22:21:15.674000 | x.x.x.x | 6387
Acquiring sstable references [ReadStage-3] | 2019-01-06 22:21:15.674000 | x.x.x.x | 6445
Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-3] | 2019-01-06 22:21:15.674001 | x.x.x.x | 6514
Caching 3 rows [ReadStage-3] | 2019-01-06 22:21:15.674001 | x.x.x.x | 6647
Merged data from memtables and 0 sstables [ReadStage-3] | 2019-01-06 22:21:15.675000 | x.x.x.x | 7231
Read 1 live rows and 0 tombstone cells [ReadStage-3] | 2019-01-06 22:21:15.675000 | x.x.x.x | 7443
Request complete | 2019-01-06 22:21:15.676482 | x.x.x.x | 9482
Then the rows are cached, fetching a normal column:
abc#cqlsh:test> SELECT pk1, ck1, normal_col2 FROM tb1 WHERE pk1=2 and ck1=1;
pk1 | ck1 | normal_col2
-----+-----+-------------
2 | 1 | 1
(1 rows)
Tracing session: a178ae90-11be-11e9-9cb3-fbffb1c1e13b
activity | timestamp | source | source_elapsed
----------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------
Execute CQL3 query | 2019-01-06 22:23:26.201000 | x.x.x.x | 0
Parsing SELECT pk1, ck1, normal_col2 FROM tb1 WHERE pk1=2 and ck1=1; [Native-Transport-Requests-1] | 2019-01-06 22:23:26.202000 | x.x.x.x | 205
Preparing statement [Native-Transport-Requests-1] | 2019-01-06 22:23:26.202000 | x.x.x.x | 393
Executing single-partition query on roles [ReadStage-3] | 2019-01-06 22:23:26.202000 | x.x.x.x | 968
Acquiring sstable references [ReadStage-3] | 2019-01-06 22:23:26.203000 | x.x.x.x | 1413
Key cache hit for sstable 2 [ReadStage-3] | 2019-01-06 22:23:26.203000 | x.x.x.x | 1564
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-3] | 2019-01-06 22:23:26.203000 | x.x.x.x | 1685
Merged data from memtables and 1 sstables [ReadStage-3] | 2019-01-06 22:23:26.203000 | x.x.x.x | 1841
Read 1 live rows and 0 tombstone cells [ReadStage-3] | 2019-01-06 22:23:26.203000 | x.x.x.x | 1930
Executing single-partition query on roles [ReadStage-5] | 2019-01-06 22:23:26.204000 | x.x.x.x | 2307
Acquiring sstable references [ReadStage-5] | 2019-01-06 22:23:26.204000 | x.x.x.x | 2375
Key cache hit for sstable 2 [ReadStage-5] | 2019-01-06 22:23:26.204000 | x.x.x.x | 2475
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-5] | 2019-01-06 22:23:26.204000 | x.x.x.x | 2584
Merged data from memtables and 1 sstables [ReadStage-5] | 2019-01-06 22:23:26.204000 | x.x.x.x | 2691
Read 1 live rows and 0 tombstone cells [ReadStage-5] | 2019-01-06 22:23:26.204000 | x.x.x.x | 2761
Row cache hit [ReadStage-3] | 2019-01-06 22:23:26.205000 | x.x.x.x | 3301
Read 1 live rows and 0 tombstone cells [ReadStage-3] | 2019-01-06 22:23:26.205000 | x.x.x.x | 3489
Request complete | 2019-01-06 22:23:26.204726 | x.x.x.x | 3726
Clearly C* will cache the entire row rather than specific columns.

Fetching data from Cassandra is too slow

We have a cluster of 3 cassandra nodes. All nodes are working fine, but fetching results is EXTREMELY slow. I run a SELECT-query in cql-shell to fetch ~100k of rows and before starting to show me first results it may take up to 30 seconds to warm-up.
Why it may happen? Is there any way to speed it up?
Here is a trace log:
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 17:41:34,923 | *.*.*.211 | 0
Parsing SELECT * FROM ga_page_visits WHERE site_global_key='GLOBAL_KEY' AND reported_at < '2014-05-17' AND reported_at > '2014-05-01' LIMIT 100000; | 17:41:34,923 | *.*.*.211 | 87
Preparing statement | 17:41:34,923 | *.*.*.211 | 290
Sending message to /*.*.*.213 | 17:41:34,924 | *.*.*.211 | 1579
Sending message to /*.*.*.212 | 17:41:34,924 | *.*.*.211 | 1617
Executing single-partition query on users | 17:41:34,924 | *.*.*.211 | 1666
Message received from /*.*.*.211 | 17:41:34,925 | *.*.*.213 | 48
Message received from /*.*.*.211 | 17:41:34,925 | *.*.*.212 | 40
Acquiring sstable references | 17:41:34,925 | *.*.*.211 | 1704
Merging memtable tombstones | 17:41:34,925 | *.*.*.211 | 1767
Key cache hit for sstable 2 | 17:41:34,925 | *.*.*.211 | 1886
Seeking to partition beginning in data file | 17:41:34,925 | *.*.*.211 | 1908
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | 17:41:34,925 | *.*.*.211 | 2337
Merging data from memtables and 1 sstables | 17:41:34,925 | *.*.*.211 | 2366
Read 1 live and 0 tombstoned cells | 17:41:34,925 | *.*.*.211 | 2410
Executing single-partition query on users | 17:41:34,926 | *.*.*.213 | 702
Executing single-partition query on users | 17:41:34,926 | *.*.*.212 | 813
Acquiring sstable references | 17:41:34,926 | *.*.*.213 | 747
Acquiring sstable references | 17:41:34,926 | *.*.*.212 | 848
Merging memtable tombstones | 17:41:34,926 | *.*.*.213 | 838
Merging memtable tombstones | 17:41:34,926 | *.*.*.212 | 922
Key cache hit for sstable 5 | 17:41:34,926 | *.*.*.213 | 1006
Key cache hit for sstable 1 | 17:41:34,926 | *.*.*.212 | 1044
Seeking to partition beginning in data file | 17:41:34,926 | *.*.*.213 | 1034
Seeking to partition beginning in data file | 17:41:34,926 | *.*.*.212 | 1066
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | 17:41:34,927 | *.*.*.213 | 1635
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | 17:41:34,927 | *.*.*.212 | 1543
Merging data from memtables and 1 sstables | 17:41:34,927 | *.*.*.213 | 1676
Merging data from memtables and 1 sstables | 17:41:34,927 | *.*.*.212 | 1571
Read 1 live and 0 tombstoned cells | 17:41:34,927 | *.*.*.213 | 1760
Read 1 live and 0 tombstoned cells | 17:41:34,927 | *.*.*.212 | 1634
Enqueuing response to /*.*.*.211 | 17:41:34,927 | *.*.*.213 | 2014
Enqueuing response to /*.*.*.211 | 17:41:34,927 | *.*.*.212 | 1846
Sending message to /*.*.*.211 | 17:41:34,927 | *.*.*.213 | 2173
Sending message to /*.*.*.211 | 17:41:34,927 | *.*.*.212 | 1960
Message received from /*.*.*.213 | 17:41:34,928 | *.*.*.211 | 4759
Message received from /*.*.*.212 | 17:41:34,928 | *.*.*.211 | 4775
Processing response from /*.*.*.213 | 17:41:34,928 | *.*.*.211 | 5439
Processing response from /*.*.*.212 | 17:41:34,928 | *.*.*.211 | 5668
Sending message to /*.*.*.40 | 17:41:34,929 | *.*.*.211 | 5954
Executing single-partition query on ga_page_visits | 17:41:34,930 | *.*.*.211 | 6792
Acquiring sstable references | 17:41:34,930 | *.*.*.211 | 6868
Merging memtable tombstones | 17:41:34,930 | *.*.*.211 | 6936
Key cache hit for sstable 2 | 17:41:34,930 | *.*.*.211 | 7075
Seeking to partition indexed section in data file | 17:41:34,930 | *.*.*.211 | 7102
Key cache hit for sstable 1 | 17:41:34,930 | *.*.*.211 | 7211
Seeking to partition indexed section in data file | 17:41:34,930 | *.*.*.211 | 7232
Skipped 1/3 non-slice-intersecting sstables, included 0 due to tombstones | 17:41:34,930 | *.*.*.211 | 7290
Merging data from memtables and 2 sstables | 17:41:34,930 | *.*.*.211 | 7311
Message received from /*.*.*.211 | 17:41:34,964 | *.*.*.40 | 45
Executing single-partition query on users | 17:41:34,965 | *.*.*.40 | 658
Acquiring sstable references | 17:41:34,965 | *.*.*.40 | 695
Merging memtable tombstones | 17:41:34,965 | *.*.*.40 | 762
Key cache hit for sstable 14 | 17:41:34,965 | *.*.*.40 | 852
Seeking to partition beginning in data file | 17:41:34,965 | *.*.*.40 | 876
Key cache hit for sstable 16 | 17:41:34,965 | *.*.*.40 | 1275
Seeking to partition beginning in data file | 17:41:34,965 | *.*.*.40 | 1293
Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | 17:41:34,966 | *.*.*.40 | 1504
Merging data from memtables and 2 sstables | 17:41:34,966 | *.*.*.40 | 1526
Read 1 live and 0 tombstoned cells | 17:41:34,966 | *.*.*.40 | 1592
Enqueuing response to /*.*.*.211 | 17:41:34,966 | *.*.*.40 | 1755
Sending message to /*.*.*.211 | 17:41:34,966 | *.*.*.40 | 1834
Message received from /*.*.*.40 | 17:41:34,980 | *.*.*.211 | 57679
Processing response from /*.*.*.40 | 17:41:34,981 | *.*.*.211 | 57785
Read 99880 live and 0 tombstoned cells | 17:41:36,315 | *.*.*.211 | 1392676
Request complete | 17:41:37,541 | *.*.*.211 | 2618390
The table schema is:
CREATE TABLE ga_page_visits (
site_global_key ascii,
reported_at timestamp,
timeuuid ascii,
bounces int,
campaign text,
channel ascii,
conversions int,
device ascii,
keyword text,
medium text,
new_visits int,
page_id int,
page_views int,
referral_path text,
site_search_engine_id int,
social_network text,
source text,
value decimal,
visits int,
PRIMARY KEY (site_global_key, reported_at, timeuuid)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
I suspect you've got an issue with your data model. It looks like you're pushing in an absolute ton of data into a single partition. That will become a problem as your data set gets bigger. I recommend this type of structure instead:
CREATE TABLE ga_page_visits (
site_global_key ascii,
day timestamp,
timeuuid ascii,
bounces int,
campaign text,
channel ascii,
conversions int,
device ascii,
keyword text,
medium text,
new_visits int,
page_id int,
page_views int,
referral_path text,
site_search_engine_id int,
social_network text,
source text,
value decimal,
visits int,
PRIMARY KEY ((site_global_key, day), timeuuid)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
You already have a timeuuid, that has a time embedded in it, so you don't need a reported_at timestamp.
You are going to have a ridiculous number of rows in this partition. You should use a composite primary key of site_global_key & the date (do not add the time, let it be 00:00:00.)
This way, each day will live on a separate partition and is more easily queried. Do 1 query for each day. This will perform much better.
As I'm looking at your model requirements, this is really looking like a data warehouse type design. Jon nailed it with the rollups. If you need low latency, high frequency creating those tighter views of your data is done often in production with high scaling requirements. I don't want to spray out links but there are examples of this if you google.

Cassandra delayed / denied updates

I'm having trouble with a small Cassandra cluster that used to work well. I used to have 3 nodes. When I added the 4th, I started seeing some issues with values not updating, so I did nodetool repair (a few times now) on the entire cluster. I should mention I did the switch at the same time as the upgrade from python-cql to the new python cassandra driver.
Essentially the weirdness falls into two cases:
Denied Updates:
cqlsh:analytics> select * from metrics where id = '36122cc69a7a12e266ab40f5b7756daee75bd0d2735a707b369302acb879eedc';
(0 rows)
Tracing session: 19e5bbd0-d172-11e3-a039-67dcdc0d02de
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 20:49:34,221 | 10.128.214.245 | 0
Parsing select * from metrics where id = '36122cc69a7a12e266ab40f5b7756daee75bd0d2735a707b369302acb879eedc' LIMIT 10000; | 20:49:34,221 | 10.128.214.245 | 176
Preparing statement | 20:49:34,222 | 10.128.214.245 | 311
Sending message to /10.128.180.108 | 20:49:34,222 | 10.128.214.245 | 773
Message received from /10.128.214.245 | 20:49:34,224 | 10.128.180.108 | 67
Row cache hit | 20:49:34,225 | 10.128.180.108 | 984
Read 0 live and 0 tombstoned cells | 20:49:34,225 | 10.128.180.108 | 1079
Message received from /10.128.180.108 | 20:49:34,227 | 10.128.214.245 | 5760
Enqueuing response to /10.128.214.245 | 20:49:34,227 | 10.128.180.108 | 3045
Processing response from /10.128.180.108 | 20:49:34,227 | 10.128.214.245 | 5942
Sending message to /10.128.214.245 | 20:49:34,227 | 10.128.180.108 | 3302
Request complete | 20:49:34,227 | 10.128.214.245 | 6282
cqlsh:analytics> update metrics set n = n + 1 where id = '36122cc69a7a12e266ab40f5b7756daee75bd0d2735a707b369302acb879eedc';
Tracing session: 20845ff0-d172-11e3-a039-67dcdc0d02de
activity | timestamp | source | source_elapsed
---------------------------------------------------------------------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 20:49:45,328 | 10.128.214.245 | 0
Parsing update metrics set n = n + 1 where id = '36122cc69a7a12e266ab40f5b7756daee75bd0d2735a707b369302acb879eedc'; | 20:49:45,328 | 10.128.214.245 | 129
Preparing statement | 20:49:45,328 | 10.128.214.245 | 227
Determining replicas for mutation | 20:49:45,328 | 10.128.214.245 | 298
Enqueuing counter update to /10.128.194.70 | 20:49:45,328 | 10.128.214.245 | 425
Sending message to /10.128.194.70 | 20:49:45,329 | 10.128.214.245 | 598
Message received from /10.128.214.245 | 20:49:45,330 | 10.128.194.70 | 37
Acquiring switchLock read lock | 20:49:45,331 | 10.128.194.70 | 623
Message received from /10.128.194.70 | 20:49:45,331 | 10.128.214.245 | 3335
Appending to commitlog | 20:49:45,331 | 10.128.194.70 | 645
Processing response from /10.128.194.70 | 20:49:45,331 | 10.128.214.245 | 3431
Adding to metrics memtable | 20:49:45,331 | 10.128.194.70 | 692
Sending message to /10.128.214.245 | 20:49:45,332 | 10.128.194.70 | 1120
Row cache miss | 20:49:45,332 | 10.128.194.70 | 1611
Executing single-partition query on metrics | 20:49:45,332 | 10.128.194.70 | 1687
Acquiring sstable references | 20:49:45,332 | 10.128.194.70 | 1692
Merging memtable tombstones | 20:49:45,332 | 10.128.194.70 | 1692
Key cache hit for sstable 13958 | 20:49:45,332 | 10.128.194.70 | 1714
Seeking to partition beginning in data file | 20:49:45,332 | 10.128.194.70 | 1856
Key cache hit for sstable 14036 | 20:49:45,333 | 10.128.194.70 | 2271
Seeking to partition beginning in data file | 20:49:45,333 | 10.128.194.70 | 2271
Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | 20:49:45,333 | 10.128.194.70 | 2540
Merging data from memtables and 2 sstables | 20:49:45,333 | 10.128.194.70 | 2564
Read 0 live and 1 tombstoned cells | 20:49:45,333 | 10.128.194.70 | 2632
Sending message to /10.128.195.149 | 20:49:45,335 | 10.128.194.70 | null
Message received from /10.128.194.70 | 20:49:45,335 | 10.128.180.108 | 43
Sending message to /10.128.180.108 | 20:49:45,335 | 10.128.194.70 | null
Acquiring switchLock read lock | 20:49:45,335 | 10.128.180.108 | 297
Appending to commitlog | 20:49:45,335 | 10.128.180.108 | 312
Message received from /10.128.194.70 | 20:49:45,336 | 10.128.195.149 | 53
Adding to metrics memtable | 20:49:45,336 | 10.128.180.108 | 374
Enqueuing response to /10.128.194.70 | 20:49:45,336 | 10.128.180.108 | 445
Sending message to /10.128.194.70 | 20:49:45,336 | 10.128.180.108 | 677
Message received from /10.128.180.108 | 20:49:45,337 | 10.128.194.70 | null
Processing response from /10.128.180.108 | 20:49:45,337 | 10.128.194.70 | null
Acquiring switchLock read lock | 20:49:45,338 | 10.128.195.149 | 1874
Appending to commitlog | 20:49:45,338 | 10.128.195.149 | 1970
Adding to metrics memtable | 20:49:45,338 | 10.128.195.149 | 2027
Enqueuing response to /10.128.194.70 | 20:49:45,338 | 10.128.195.149 | 2147
Sending message to /10.128.194.70 | 20:49:45,338 | 10.128.195.149 | 2572
Message received from /10.128.195.149 | 20:49:45,339 | 10.128.194.70 | null
Processing response from /10.128.195.149 | 20:49:45,339 | 10.128.194.70 | null
Request complete | 20:49:45,331 | 10.128.214.245 | 3556
cqlsh:analytics> select * from metrics where id = '36122cc69a7a12e266ab40f5b7756daee75bd0d2735a707b369302acb879eedc';
(0 rows)
Tracing session: 28f1f7b0-d172-11e3-a039-67dcdc0d02de
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 20:49:59,468 | 10.128.214.245 | 0
Parsing select * from metrics where id = '36122cc69a7a12e266ab40f5b7756daee75bd0d2735a707b369302acb879eedc' LIMIT 10000; | 20:49:59,468 | 10.128.214.245 | 119
Preparing statement | 20:49:59,468 | 10.128.214.245 | 235
Sending message to /10.128.180.108 | 20:49:59,468 | 10.128.214.245 | 574
Message received from /10.128.214.245 | 20:49:59,469 | 10.128.180.108 | 49
Row cache miss | 20:49:59,470 | 10.128.180.108 | 817
Executing single-partition query on metrics | 20:49:59,470 | 10.128.180.108 | 877
Acquiring sstable references | 20:49:59,470 | 10.128.180.108 | 888
Merging memtable tombstones | 20:49:59,470 | 10.128.180.108 | 938
Key cache hit for sstable 5399 | 20:49:59,470 | 10.128.180.108 | 1025
Seeking to partition beginning in data file | 20:49:59,470 | 10.128.180.108 | 1033
Message received from /10.128.180.108 | 20:49:59,471 | 10.128.214.245 | 3378
Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 20:49:59,471 | 10.128.180.108 | 1495
Processing response from /10.128.180.108 | 20:49:59,471 | 10.128.214.245 | 3466
Merging data from memtables and 1 sstables | 20:49:59,471 | 10.128.180.108 | 1507
Read 0 live and 1 tombstoned cells | 20:49:59,471 | 10.128.180.108 | 1660
Read 0 live and 0 tombstoned cells | 20:49:59,471 | 10.128.180.108 | 1759
Enqueuing response to /10.128.214.245 | 20:49:59,471 | 10.128.180.108 | 1817
Sending message to /10.128.214.245 | 20:49:59,471 | 10.128.180.108 | 1977
Request complete | 20:49:59,471 | 10.128.214.245 | 3638
So this is pretty straight forward. From y reading, it might have to do with tombstones timestamps that would have somehow gotten messed up. However, it's been several days and "now" still hasn't caught up to the future timestamp of the tombstone. Is there any way to just reset every single timestamp of the entire cluster to 0 while activity is stopped, I can live with slightly inacurate data right now.
The second issue is on some of my tables. Some tables reflect changes instantly, but for others, the changes will get reflected between 30 minutes to an hour later. I can't figure out how timestamps might relate to this.
I've synced all nodes of my cluster using NTP, not the most precise, but won't be out of sync in the scale of days or anything. All the nodes have been synced like this from the beginning, at no point did I have wildly out of sync times.
Can anybody help? As I was saying, by this point I'd settle for shutting down access to the cluster and resetting all timestamps to 0, I don't care about getting some of the order wrong, I just want this thing to work.
Thanks
Timestamps are immutable. You'd have to truncate the table and rebuild it. The easiest way to rebuild is to just insert correct data, but if that's not an option, you can round trip through sstable2json -> edit timestamps -> json2sstable.

Cassandra 1.2 merging data from memtables and sstables takes too long

Here is a trace from a 4 node cassandra cluster, running 1.2.6. I'm seeing a timeout with a simple select when the cluster is under no load and I need some help getting to the bottom of it.
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------+--------------+---------------+----------------
execute_cql3_query | 05:21:00,848 | 100.69.176.51 | 0
Parsing select * from user_scores where user_id='26257166' LIMIT 10000; | 05:21:00,848 | 100.69.176.51 | 77
Peparing statement | 05:21:00,848 | 100.69.176.51 | 225
Executing single-partition query on user_scores | 05:21:00,849 | 100.69.176.51 | 589
Acquiring sstable references | 05:21:00,849 | 100.69.176.51 | 626
Merging memtable tombstones | 05:21:00,849 | 100.69.176.51 | 676
Key cache hit for sstable 34 | 05:21:00,849 | 100.69.176.51 | 817
Seeking to partition beginning in data file | 05:21:00,849 | 100.69.176.51 | 836
Key cache hit for sstable 32 | 05:21:00,849 | 100.69.176.51 | 1135
Seeking to partition beginning in data file | 05:21:00,849 | 100.69.176.51 | 1153
Merging data from memtables and 2 sstables | 05:21:00,850 | 100.69.176.51 | 1394
Request complete | 05:21:20,881 | 100.69.176.51 | 20033807
Here is the schema. You can see that is includes a few collections.
create table user_scores
(
user_id varchar,
post_type varchar,
score double,
team_to_score_map map<varchar, double>,
affiliation_to_score_map map<varchar, double>,
campaign_to_score_map map<varchar, double>,
person_to_score_map map<varchar, double>,
primary key(user_id, post_type)
)
with compaction =
{
'class' : 'LeveledCompactionStrategy',
'sstable_size_in_mb' : 10
};
I added the leveled compaction strategy as it was supposed to help with read latency.
I'd like to understand what could cause the cluster to timeout during the merge phase. Not all queries timeout. It appears to happen more frequently with rows that have maps with a larger number of entries.
Here is another trace of a failure for good measure. It is very reproducable:
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 05:51:34,557 | 100.69.176.51 | 0
Message received from /100.69.176.51 | 05:51:34,195 | 100.69.184.134 | 102
Executing single-partition query on user_scores | 05:51:34,199 | 100.69.184.134 | 3512
Acquiring sstable references | 05:51:34,199 | 100.69.184.134 | 3741
Merging memtable tombstones | 05:51:34,199 | 100.69.184.134 | 3890
Key cache hit for sstable 5 | 05:51:34,199 | 100.69.184.134 | 4040
Seeking to partition beginning in data file | 05:51:34,199 | 100.69.184.134 | 4059
Merging data from memtables and 1 sstables | 05:51:34,200 | 100.69.184.134 | 4412
Parsing select * from user_scores where user_id='26257166' LIMIT 10000; | 05:51:34,558 | 100.69.176.51 | 91
Peparing statement | 05:51:34,558 | 100.69.176.51 | 238
Enqueuing data request to /100.69.184.134 | 05:51:34,558 | 100.69.176.51 | 567
Sending message to /100.69.184.134 | 05:51:34,558 | 100.69.176.51 | 979
Request complete | 05:51:54,562 | 100.69.176.51 | 20005209
And a trace from when it works:
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 05:55:07,772 | 100.69.176.51 | 0
Message received from /100.69.176.51 | 05:55:07,408 | 100.69.184.134 | 53
Executing single-partition query on user_scores | 05:55:07,409 | 100.69.184.134 | 1014
Acquiring sstable references | 05:55:07,409 | 100.69.184.134 | 1087
Merging memtable tombstones | 05:55:07,410 | 100.69.184.134 | 1209
Partition index with 0 entries found for sstable 5 | 05:55:07,410 | 100.69.184.134 | 1681
Seeking to partition beginning in data file | 05:55:07,410 | 100.69.184.134 | 1732
Merging data from memtables and 1 sstables | 05:55:07,411 | 100.69.184.134 | 2415
Read 1 live and 0 tombstoned cells | 05:55:07,412 | 100.69.184.134 | 3274
Enqueuing response to /100.69.176.51 | 05:55:07,412 | 100.69.184.134 | 3534
Sending message to /100.69.176.51 | 05:55:07,412 | 100.69.184.134 | 3936
Parsing select * from user_scores where user_id='305722020' LIMIT 10000; | 05:55:07,772 | 100.69.176.51 | 96
Peparing statement | 05:55:07,772 | 100.69.176.51 | 262
Enqueuing data request to /100.69.184.134 | 05:55:07,773 | 100.69.176.51 | 600
Sending message to /100.69.184.134 | 05:55:07,773 | 100.69.176.51 | 847
Message received from /100.69.184.134 | 05:55:07,778 | 100.69.176.51 | 6103
Processing response from /100.69.184.134 | 05:55:07,778 | 100.69.176.51 | 6341
Request complete | 05:55:07,778 | 100.69.176.51 | 6780
Looks like I was running into a performance issue with 1.2. Fortunately a patch had just been applied to the 1.2 branch, so when I built from source my problem went away.
see https://issues.apache.org/jira/browse/CASSANDRA-5677 for a detailed explanation.

Resources