Query tracing in Cassandra

Since I could not find a query-explain type of feature, I wrote the Python program below to list the different phases a query goes through. While I am able to get the various phases and steps, it does not give me the time spent in each phase/step. Is there a way to print the time as well?
qlist = qf.read_query_file("query.txt")
for query in qlist:
    rows = session.execute(query, trace=True)
    trace = rows.get_query_trace()
    duration = trace.duration.total_seconds()
    print('----------------------------------------')
    print("Query Name:{query}, Duration:{duration}".format(query=query, duration=duration))
    for event in trace.events:
        print(event)
    print('------------------------------------------')
The output is as shown below
Query Name:select * from iris_with_id, Duration:0.006939
Parsing select * from iris_with_id on xx.xxx.xxx.x[Native-Transport-Requests-1] at 2022-01-13 03:05:34.544000
Preparing statement on xx.xxx.xxx.x[Native-Transport-Requests-1] at 2022-01-13 03:05:34.544000
reading digest from /xx.xxx.xxx.x on xx.xxx.xxx.x[Native-Transport-Requests-1] at 2022-01-13 03:05:34.544000
Executing single-partition query on roles on xx.xxx.xxx.x[ReadStage-2] at 2022-01-13 03:05:34.545000

A Python app is probably a little overkill.
You can trace a query in cqlsh using the TRACING command:
cqlsh> TRACING ON;
You can then see all the stages of the execution including the elapsed time for each stage:
cqlsh> SELECT * FROM community.users_by_email WHERE email = 'alice@getvaccinated.now';
email | colour | name
-------------------------+--------+-------
alice@getvaccinated.now | red | Alice
(1 rows)
Tracing session: 4ed2b930-74dd-11ec-afd3-85873f459502
activity | timestamp | source | source_elapsed | client
--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------+-----------------------------------------
Execute CQL3 query | 2022-01-14 01:57:20.068000 | 172.28.223.3 | 0 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Parsing SELECT * FROM users_by_email WHERE email = 'alice@getvaccinated.now'; [CoreThread-9] | 2022-01-14 01:57:20.070000 | 172.28.223.3 | 1831 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Preparing statement [CoreThread-9] | 2022-01-14 01:57:20.071000 | 172.28.223.3 | 2656 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Reading data from [/172.28.128.3] [CoreThread-11] | 2022-01-14 01:57:20.071000 | 172.28.223.3 | 3526 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Reading digests from [/172.28.237.4] [CoreThread-11] | 2022-01-14 01:57:20.072000 | 172.28.223.3 | 3887 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Sending READS.SINGLE_READ message to /172.28.128.3, size=190 bytes [CoreThread-4] | 2022-01-14 01:57:20.072000 | 172.28.223.3 | 4235 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Sending READS.SINGLE_READ message to /172.28.237.4, size=191 bytes [CoreThread-2] | 2022-01-14 01:57:20.072000 | 172.28.223.3 | 4340 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
READS.SINGLE_READ message received from /172.28.223.3 [CoreThread-4] | 2022-01-14 01:57:20.076000 | 172.28.128.3 | 434 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Merged data from memtables and 3 sstables during single-partition query on users_by_email for key 'alice@getvaccinated.now' [CoreThread-4] | 2022-01-14 01:57:20.077000 | 172.28.128.3 | 3167 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
READS.SINGLE_READ message received from /172.28.223.3 [CoreThread-5] | 2022-01-14 01:57:20.077000 | 172.28.237.4 | 461 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Read 1 live rows and 0 tombstone ones [CoreThread-4] | 2022-01-14 01:57:20.078000 | 172.28.128.3 | 3354 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Merged data from memtables and 3 sstables during single-partition query on users_by_email for key 'alice@getvaccinated.now' [CoreThread-5] | 2022-01-14 01:57:20.078000 | 172.28.237.4 | 3322 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Enqueuing READS.SINGLE_READ response to /172.28.128.3 [CoreThread-4] | 2022-01-14 01:57:20.078000 | 172.28.128.3 | 3487 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Read 1 live rows and 0 tombstone ones [CoreThread-5] | 2022-01-14 01:57:20.078000 | 172.28.237.4 | 3510 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Sending READS.SINGLE_READ message to /172.28.223.3, size=169 bytes [CoreThread-7] | 2022-01-14 01:57:20.078000 | 172.28.128.3 | 3846 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Enqueuing READS.SINGLE_READ response to /172.28.237.4 [CoreThread-5] | 2022-01-14 01:57:20.078000 | 172.28.237.4 | 3634 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Sending READS.SINGLE_READ message to /172.28.223.3, size=131 bytes [CoreThread-0] | 2022-01-14 01:57:20.078000 | 172.28.237.4 | 4012 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
READS.SINGLE_READ message received from /172.28.128.3 [CoreThread-7] | 2022-01-14 01:57:20.082000 | 172.28.223.3 | 14426 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Processing response from /172.28.128.3 [CoreThread-11] | 2022-01-14 01:57:20.082000 | 172.28.223.3 | 14792 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
READS.SINGLE_READ message received from /172.28.237.4 [CoreThread-8] | 2022-01-14 01:57:20.082000 | 172.28.223.3 | 14914 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Processing response from /172.28.237.4 [CoreThread-11] | 2022-01-14 01:57:20.083000 | 172.28.223.3 | 15165 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
Request complete | 2022-01-14 01:57:20.085874 | 172.28.223.3 | 17874 | fa5f:ba11:8ac8:4fe7:a863:8df7:5cf2:7efd
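If you do want to keep the per-step timing inside your Python script, the driver also exposes it: each trace event carries a source_elapsed value (a datetime.timedelta measured on the node named in event.source). A rough sketch, reusing the session and table from the examples above (assumed to exist):
rows = session.execute(
    "SELECT * FROM community.users_by_email WHERE email = 'alice@getvaccinated.now'",
    trace=True)
trace = rows.get_query_trace()

for event in trace.events:
    # source_elapsed can be None for some events, so guard before converting to microseconds
    elapsed_us = int(event.source_elapsed.total_seconds() * 1000000) if event.source_elapsed else 0
    print("{} | {} | {} us".format(event.description, event.source, elapsed_us))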

Related

Cassandra select not stable using datastax driver

Versions:
com.datastax.oss
- java-driver-core:4.5.0
- java-driver-query-builder:4.5.0
- java-driver-mapper-runtime:4.5.0
cassandra:3.11.5 docker image
jdk 11.1
I'm running a deployment of Feast that I've modified to use Cassandra as a low-latency serving backend for machine learning features. I'm successfully writing and reading rows, but the reads are inconsistent in the results they return: sometimes the payloads are empty and I don't know why. I have already tried updating to the latest DataStax driver and synchronizing time using NTP against time.google.com. I've also tried changing the write consistency to ALL and the read consistency to LOCAL_ONE/LOCAL_QUORUM, without success. I'm really struggling to figure out why the selects aren't consistent. Any insight would be great! :) Here is the process:
I write the rows into cassandra using CassandraIO
@Override
public Future<Void> saveAsync(CassandraMutation entityClass) {
    return mapper.saveAsync(
        entityClass,
        Option.timestamp(entityClass.getWriteTime()),
        Option.ttl(entityClass.getTtl()),
        Option.consistencyLevel(ConsistencyLevel.LOCAL_QUORUM),
        Option.tracing(true));
}
This seems to successfully map rows into my Cassandra cluster, which I then query in my application as follows:
List<InetSocketAddress> contactPoints =
    Arrays.stream(cassandraConfig.getBootstrapHosts().split(","))
        .map(h -> new InetSocketAddress(h, cassandraConfig.getPort()))
        .collect(Collectors.toList());

CqlSession session =
    CqlSession.builder()
        .addContactPoints(contactPoints)
        .withLocalDatacenter(storeProperties.getCassandraDcName())
        .build();
....
PreparedStatement query =
    session.prepare(
        String.format(
            "SELECT entities, feature, value, WRITETIME(value) as writetime FROM %s.%s WHERE entities = ?",
            keyspace, tableName));

session.execute(
    query
        .bind(key)
        .setTracing(true)
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM));
My issue is that the selects don't seem to behave consistently. I have been recording various bits for a while now; for example, here are two select queries with the same coordinator: the first succeeded, and the subsequent select failed to return results.
Success:
cqlsh> select * from system_traces.sessions where session_id=be023400-6a1e-11ea-97ca-6b8bbe3a2a36;
session_id | client | command | coordinator | duration | parameters | request | started_at
--------------------------------------+---------------+---------+---------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------
be023400-6a1e-11ea-97ca-6b8bbe3a2a36 | xx.xx.xxx.189 | QUERY | xx.xx.xxx.158 | 41313 | {'bound_var_0_entities': '''ml_project/test_test_entity:1:entity2_uuid=TenderGreens_8755fff7|entity1_uuid=Zach_Yang_fe7fea92''', 'consistency_level': 'LOCAL_QUORUM', 'page_size': '5000', 'query': 'SELECT entities, feature, value, WRITETIME(value) as writetime FROM feast.feature_store WHERE entities = ?', 'serial_consistency_level': 'SERIAL'} | Execute CQL3 prepared query | 2020-03-19 20:18:05.760000+0000
select event_id, activity, source_elapsed, thread from system_traces.events where session_id=be023400-6a1e-11ea-97ca-6b8bbe3a2a36;
event_id | activity | source_elapsed | thread
--------------------------------------+---------------------------------------------------------------------------------------------------------------+----------------+-----------------------------------------------------------------------------------------------------------
be0652b0-6a1e-11ea-97ca-6b8bbe3a2a36 | Read-repair DC_LOCAL | 27087 | Native-Transport-Requests-1
be0679c0-6a1e-11ea-97ca-6b8bbe3a2a36 | reading data from /xx.xx.xxx.161 | 28034 | Native-Transport-Requests-1
be06a0d0-6a1e-11ea-97ca-6b8bbe3a2a36 | Sending READ message to /xx.xx.xxx.161 | 28552 | MessagingService-Outgoing-/xx.xx.xxx.161-Small
be06a0d1-6a1e-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xxx.162 | 28595 | Native-Transport-Requests-1
be06a0d2-6a1e-11ea-97ca-6b8bbe3a2a36 | Executing single-partition query on feature_store | 28598 | ReadStage-3
be06a0d3-6a1e-11ea-97ca-6b8bbe3a2a36 | Acquiring sstable references | 28689 | ReadStage-3
be06a0d4-6a1e-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xx.138 | 28852 | Native-Transport-Requests-1
be06a0d5-6a1e-11ea-97ca-6b8bbe3a2a36 | Sending READ message to /xx.xx.xxx.162 | 28904 | MessagingService-Outgoing-/xx.xx.xxx.162-Small
be06a0d6-6a1e-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 56 | 28937 | ReadStage-3
be06a0d7-6a1e-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xxx.171 | 28983 | Native-Transport-Requests-1
be06a0d8-6a1e-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 55 | 29020 | ReadStage-3
be06a0d9-6a1e-11ea-97ca-6b8bbe3a2a36 | Sending READ message to cassandra-feature-store-1.cassandra-feature-store.team-data/xx.xx.xx.138 | 29071 | MessagingService-Outgoing-cassandra-feature-store-1.cassandra-feature-store.team-data/xx.xx.xx.138-Small
be06a0da-6a1e-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 54 | 29181 | ReadStage-3
be06a0db-6a1e-11ea-97ca-6b8bbe3a2a36 | Sending READ message to cassandra-feature-store-0.cassandra-feature-store.team-data/xx.xx.xxx.171 | 29201 | MessagingService-Outgoing-cassandra-feature-store-0.cassandra-feature-store.team-data/xx.xx.xx.171-Small
be06c7e0-6a1e-11ea-80ad-dffaf3fb56b4 | READ message received from /xx.xx.xxx.158 | 33 | MessagingService-Incoming-/xx.xx.xxx.158
be06c7e0-6a1e-11ea-8693-577fec389856 | READ message received from /xx.xx.xxx.158 | 34 | MessagingService-Incoming-/xx.xx.xxx.158
be06c7e0-6a1e-11ea-8b1a-e5aa876f7d0d | READ message received from /xx.xx.xxx.158 | 29 | MessagingService-Incoming-/xx.xx.xxx.158
be06c7e0-6a1e-11ea-8d2e-c5837edad3d1 | READ message received from /xx.xx.xxx.158 | 44 | MessagingService-Incoming-/xx.xx.xxx.158
be06c7e0-6a1e-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 41 | 29273 | ReadStage-3
be06c7e1-6a1e-11ea-80ad-dffaf3fb56b4 | Executing single-partition query on feature_store | 389 | ReadStage-1
be06c7e1-6a1e-11ea-8b1a-e5aa876f7d0d | Executing single-partition query on feature_store | 513 | ReadStage-1
be06c7e1-6a1e-11ea-97ca-6b8bbe3a2a36 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 29342 | ReadStage-3
be06c7e2-6a1e-11ea-80ad-dffaf3fb56b4 | Acquiring sstable references | 457 | ReadStage-1
be06c7e3-6a1e-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 55 | 620 | ReadStage-1
be06c7e4-6a1e-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 54 | 659 | ReadStage-1
be06c7e5-6a1e-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 41 | 677 | ReadStage-1
be06c7e6-6a1e-11ea-80ad-dffaf3fb56b4 | Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 695 | ReadStage-1
be06eef0-6a1e-11ea-80ad-dffaf3fb56b4 | Merged data from memtables and 0 sstables | 1039 | ReadStage-1
be06eef0-6a1e-11ea-8693-577fec389856 | Executing single-partition query on feature_store | 372 | ReadStage-1
be06eef0-6a1e-11ea-8b1a-e5aa876f7d0d | Acquiring sstable references | 583 | ReadStage-1
be06eef0-6a1e-11ea-8d2e-c5837edad3d1 | Executing single-partition query on feature_store | 454 | ReadStage-1
be06eef0-6a1e-11ea-97ca-6b8bbe3a2a36 | Merged data from memtables and 0 sstables | 30372 | ReadStage-3
be06eef1-6a1e-11ea-80ad-dffaf3fb56b4 | Read 16 live rows and 0 tombstone cells | 1125 | ReadStage-1
be06eef1-6a1e-11ea-8693-577fec389856 | Acquiring sstable references | 493 | ReadStage-1
be06eef1-6a1e-11ea-8b1a-e5aa876f7d0d | Bloom filter allows skipping sstable 54 | 703 | ReadStage-1
be06eef1-6a1e-11ea-8d2e-c5837edad3d1 | Acquiring sstable references | 530 | ReadStage-1
be06eef1-6a1e-11ea-97ca-6b8bbe3a2a36 | Read 16 live rows and 0 tombstone cells | 30484 | ReadStage-3
be06eef2-6a1e-11ea-80ad-dffaf3fb56b4 | Enqueuing response to /xx.xx.xxx.158 | 1155 | ReadStage-1
be06eef2-6a1e-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 56 | 721 | ReadStage-1
be06eef2-6a1e-11ea-8b1a-e5aa876f7d0d | Bloom filter allows skipping sstable 41 | 740 | ReadStage-1
be06eef2-6a1e-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 56 | 655 | ReadStage-1
be06eef3-6a1e-11ea-80ad-dffaf3fb56b4 | Sending REQUEST_RESPONSE message to cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158 | 1492 | MessagingService-Outgoing-cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158-Small
be06eef3-6a1e-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 55 | 780 | ReadStage-1
be06eef3-6a1e-11ea-8b1a-e5aa876f7d0d | Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | 761 | ReadStage-1
be06eef3-6a1e-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 55 | 686 | ReadStage-1
be06eef4-6a1e-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 54 | 815 | ReadStage-1
be06eef4-6a1e-11ea-8b1a-e5aa876f7d0d | Merged data from memtables and 0 sstables | 1320 | ReadStage-1
be06eef4-6a1e-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 54 | 705 | ReadStage-1
be06eef5-6a1e-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 41 | 839 | ReadStage-1
be06eef5-6a1e-11ea-8b1a-e5aa876f7d0d | Read 16 live rows and 0 tombstone cells | 1495 | ReadStage-1
be06eef5-6a1e-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 41 | 720 | ReadStage-1
be06eef6-6a1e-11ea-8693-577fec389856 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 871 | ReadStage-1
be06eef6-6a1e-11ea-8b1a-e5aa876f7d0d | Enqueuing response to /xx.xx.xxx.158 | 1554 | ReadStage-1
be06eef6-6a1e-11ea-8d2e-c5837edad3d1 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 738 | ReadStage-1
be06eef7-6a1e-11ea-8d2e-c5837edad3d1 | Merged data from memtables and 0 sstables | 1157 | ReadStage-1
be06eef8-6a1e-11ea-8d2e-c5837edad3d1 | Read 16 live rows and 0 tombstone cells | 1296 | ReadStage-1
be06eef9-6a1e-11ea-8d2e-c5837edad3d1 | Enqueuing response to /xx.xx.xxx.158 | 1325 | ReadStage-1
be071600-6a1e-11ea-8693-577fec389856 | Merged data from memtables and 0 sstables | 1592 | ReadStage-1
be071600-6a1e-11ea-8b1a-e5aa876f7d0d | Sending REQUEST_RESPONSE message to cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158 | 1783 | MessagingService-Outgoing-cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158-Small
be071600-6a1e-11ea-8d2e-c5837edad3d1 | Sending REQUEST_RESPONSE message to /xx.xx.xxx.158 | 1484 | MessagingService-Outgoing-/xx.xx.xxx.158-Small
be071600-6a1e-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.161 | 31525 | MessagingService-Incoming-/xx.xx.xxx.161
be071601-6a1e-11ea-8693-577fec389856 | Read 16 live rows and 0 tombstone cells | 1754 | ReadStage-1
be071601-6a1e-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.161 | 31650 | RequestResponseStage-4
be071602-6a1e-11ea-8693-577fec389856 | Enqueuing response to /xx.xx.xxx.158 | 1796 | ReadStage-1
be071602-6a1e-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xx.138 | 31795 | MessagingService-Incoming-/xx.xx.xx.138
be071603-6a1e-11ea-8693-577fec389856 | Sending REQUEST_RESPONSE message to /xx.xx.xxx.158 | 1973 | MessagingService-Outgoing-/xx.xx.xxx.158-Small
be071603-6a1e-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xx.138 | 31872 | RequestResponseStage-4
be071604-6a1e-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.162 | 31918 | MessagingService-Incoming-/xx.xx.xxx.162
be071605-6a1e-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.162 | 32047 | RequestResponseStage-4
be073d10-6a1e-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.171 | 32688 | MessagingService-Incoming-/xx.xx.xx.171
be073d11-6a1e-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.171 | 32827 | RequestResponseStage-2
be073d12-6a1e-11ea-97ca-6b8bbe3a2a36 | Initiating read-repair | 32985 | RequestResponseStage-2
Failure:
cqlsh> select * from system_traces.sessions where session_id=472551e0-6a1f-11ea-97ca-6b8bbe3a2a36;
session_id | client | command | coordinator | duration | parameters | request | started_at
--------------------------------------+---------------+---------+---------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------
472551e0-6a1f-11ea-97ca-6b8bbe3a2a36 | xx.xx.xxx.189 | QUERY | xx.xx.xxx.158 | 3044 | {'bound_var_0_entities': '''ml_project/test_test_entity:1:entity1_uuid=Zach_Yang_fe7fea92|entity2_uuid=TenderGreens_8755fff7''', 'consistency_level': 'LOCAL_QUORUM', 'page_size': '5000', 'query': 'SELECT entities, feature, value, WRITETIME(value) as writetime FROM feast.feature_store WHERE entities = ?', 'serial_consistency_level': 'SERIAL'} | Execute CQL3 prepared query | 2020-03-19 20:21:55.838000+0000
cqlsh> select event_id, activity, source_elapsed, thread from system_traces.events where session_id=472551e0-6a1f-11ea-97ca-6b8bbe3a2a36;
event_id | activity | source_elapsed | thread
--------------------------------------+---------------------------------------------------------------------------------------------------------------+----------------+-----------------------------------------------------------------------------------------------------------
472578f0-6a1f-11ea-80ad-dffaf3fb56b4 | READ message received from /xx.xx.xxx.158 | 18 | MessagingService-Incoming-/xx.xx.xxx.158
472578f0-6a1f-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xxx.138 | 619 | Native-Transport-Requests-1
472578f1-6a1f-11ea-97ca-6b8bbe3a2a36 | Executing single-partition query on feature_store | 708 | ReadStage-2
472578f2-6a1f-11ea-97ca-6b8bbe3a2a36 | reading digest from /xx.xx.xxx.161 | 755 | Native-Transport-Requests-1
472578f3-6a1f-11ea-97ca-6b8bbe3a2a36 | Acquiring sstable references | 768 | ReadStage-2
472578f4-6a1f-11ea-97ca-6b8bbe3a2a36 | Sending READ message to cassandra-feature-store-1.cassandra-feature-store.team-data/xx.xx.xxx.138 | 836 | MessagingService-Outgoing-cassandra-feature-store-1.cassandra-feature-store.team-data/xx.xx.xxx.138-Small
472578f5-6a1f-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 56 | 859 | ReadStage-2
472578f6-6a1f-11ea-97ca-6b8bbe3a2a36 | speculating read retry on /xx.xx.xx.171 | 862 | Native-Transport-Requests-1
472578f7-6a1f-11ea-97ca-6b8bbe3a2a36 | Sending READ message to /xx.xx.xxx.161 | 893 | MessagingService-Outgoing-/xx.xx.xxx.161-Small
472578f8-6a1f-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 55 | 903 | ReadStage-2
472578f9-6a1f-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 54 | 929 | ReadStage-2
472578fa-6a1f-11ea-97ca-6b8bbe3a2a36 | Sending READ message to cassandra-feature-store-0.cassandra-feature-store.team-data/xx.xx.xx.171 | 982 | MessagingService-Outgoing-cassandra-feature-store-0.cassandra-feature-store.team-data/xx.xx.xx.171-Small
472578fb-6a1f-11ea-97ca-6b8bbe3a2a36 | Bloom filter allows skipping sstable 41 | 996 | ReadStage-2
472578fc-6a1f-11ea-97ca-6b8bbe3a2a36 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 1039 | ReadStage-2
472578fd-6a1f-11ea-97ca-6b8bbe3a2a36 | Merged data from memtables and 0 sstables | 1227 | ReadStage-2
472578fe-6a1f-11ea-97ca-6b8bbe3a2a36 | Read 0 live rows and 0 tombstone cells | 1282 | ReadStage-2
4725a000-6a1f-11ea-80ad-dffaf3fb56b4 | Executing single-partition query on feature_store | 226 | ReadStage-2
4725a000-6a1f-11ea-8693-577fec389856 | READ message received from /xx.xx.xxx.158 | 12 | MessagingService-Incoming-/xx.xx.xxx.158
4725a000-6a1f-11ea-8d2e-c5837edad3d1 | READ message received from /xx.xx.xxx.158 | 15 | MessagingService-Incoming-/xx.xx.xxx.158
4725a001-6a1f-11ea-80ad-dffaf3fb56b4 | Acquiring sstable references | 297 | ReadStage-2
4725a001-6a1f-11ea-8693-577fec389856 | Executing single-partition query on feature_store | 258 | ReadStage-1
4725a001-6a1f-11ea-8d2e-c5837edad3d1 | Executing single-partition query on feature_store | 230 | ReadStage-1
4725a002-6a1f-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 55 | 397 | ReadStage-2
4725a002-6a1f-11ea-8693-577fec389856 | Acquiring sstable references | 327 | ReadStage-1
4725a002-6a1f-11ea-8d2e-c5837edad3d1 | Acquiring sstable references | 297 | ReadStage-1
4725a003-6a1f-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 54 | 433 | ReadStage-2
4725a003-6a1f-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 56 | 451 | ReadStage-1
4725a003-6a1f-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 56 | 439 | ReadStage-1
4725a004-6a1f-11ea-80ad-dffaf3fb56b4 | Bloom filter allows skipping sstable 41 | 450 | ReadStage-2
4725a004-6a1f-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 55 | 512 | ReadStage-1
4725a004-6a1f-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 55 | 492 | ReadStage-1
4725a005-6a1f-11ea-80ad-dffaf3fb56b4 | Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 466 | ReadStage-2
4725a005-6a1f-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 54 | 570 | ReadStage-1
4725a005-6a1f-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 54 | 513 | ReadStage-1
4725a006-6a1f-11ea-80ad-dffaf3fb56b4 | Merged data from memtables and 0 sstables | 648 | ReadStage-2
4725a006-6a1f-11ea-8693-577fec389856 | Bloom filter allows skipping sstable 41 | 606 | ReadStage-1
4725a006-6a1f-11ea-8d2e-c5837edad3d1 | Bloom filter allows skipping sstable 41 | 526 | ReadStage-1
4725a007-6a1f-11ea-80ad-dffaf3fb56b4 | Read 0 live rows and 0 tombstone cells | 708 | ReadStage-2
4725a007-6a1f-11ea-8693-577fec389856 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 631 | ReadStage-1
4725a007-6a1f-11ea-8d2e-c5837edad3d1 | Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones | 542 | ReadStage-1
4725a008-6a1f-11ea-80ad-dffaf3fb56b4 | Enqueuing response to /xx.xx.xxx.158 | 727 | ReadStage-2
4725a008-6a1f-11ea-8d2e-c5837edad3d1 | Merged data from memtables and 0 sstables | 700 | ReadStage-1
4725a009-6a1f-11ea-80ad-dffaf3fb56b4 | Sending REQUEST_RESPONSE message to cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158 | 838 | MessagingService-Outgoing-cassandra-feature-store-2.cassandra-feature-store.team-data/xx.xx.xxx.158-Small
4725a009-6a1f-11ea-8d2e-c5837edad3d1 | Read 0 live rows and 0 tombstone cells | 756 | ReadStage-1
4725a00a-6a1f-11ea-8d2e-c5837edad3d1 | Enqueuing response to /xx.xx.xxx.158 | 772 | ReadStage-1
4725a00b-6a1f-11ea-8d2e-c5837edad3d1 | Sending REQUEST_RESPONSE message to /xx.xx.xxx.158 | 914 | MessagingService-Outgoing-/xx.xx.xxx.158-Small
4725c710-6a1f-11ea-8693-577fec389856 | Merged data from memtables and 0 sstables | 845 | ReadStage-1
4725c710-6a1f-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.161 | 2327 | MessagingService-Incoming-/xx.xx.xxx.161
4725c711-6a1f-11ea-8693-577fec389856 | Read 0 live rows and 0 tombstone cells | 905 | ReadStage-1
4725c711-6a1f-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.161 | 2443 | RequestResponseStage-2
4725c712-6a1f-11ea-8693-577fec389856 | Enqueuing response to /xx.xx.xxx.158 | 929 | ReadStage-1
4725c712-6a1f-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xxx.138 | 2571 | MessagingService-Incoming-/xx.xx.xxx.138
4725c713-6a1f-11ea-8693-577fec389856 | Sending REQUEST_RESPONSE message to /xx.xx.xxx.158 | 1023 | MessagingService-Outgoing-/xx.xx.xxx.158-Small
4725c713-6a1f-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xxx.138 | 2712 | RequestResponseStage-2
4725c714-6a1f-11ea-97ca-6b8bbe3a2a36 | REQUEST_RESPONSE message received from /xx.xx.xx.171 | 2725 | MessagingService-Incoming-/xx.xx.xx.171
4725c715-6a1f-11ea-97ca-6b8bbe3a2a36 | Processing response from /xx.xx.xx.171 | 2797 | RequestResponseStage-2
4725c716-6a1f-11ea-97ca-6b8bbe3a2a36 | Initiating read-repair | 2855 | RequestResponseStage-2
Keyspace info
cqlsh> describe keyspace feast;
CREATE KEYSPACE feast WITH replication = {'class': 'NetworkTopologyStrategy', 'stage-us-west1': '5'} AND durable_writes = true;
CREATE TABLE feast.feature_store (
    entities text,
    feature text,
    value blob,
    PRIMARY KEY (entities, feature)
) WITH CLUSTERING ORDER BY (feature ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
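For reference on the consistency settings above: with replication_factor 5 in the local datacenter, a LOCAL_QUORUM read or write needs floor(5/2) + 1 = 3 replica responses, so a LOCAL_QUORUM read should always overlap with at least one replica that acknowledged a LOCAL_QUORUM write.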

GCP Dataproc - Slow read speed from GCS

I have a GCP Dataproc cluster where I'm running a job. The input of the job is a folder containing 200 part files, each approximately 1.2 GB.
My job is just map operations:
val df = spark.read.parquet("gs://bucket/data/src/....")
df.withColumn("a", lit("b")).write.save("gs://bucket/data/dest/...")
The property parquet.block.size is set to 128 MB, which means each part file should be read in about 10 requests during the job.
I enabled bucket access logging, looked at the stats, and was surprised to see that each part file is accessed a whopping 85 times. Only about 10 of those requests return the actual data; the other requests return either 0 bytes or a very small amount.
I understand that reading a big Parquet file in splits is standard Spark behavior, and that some metadata-exchange requests are to be expected, but 8x the calls is very strange. Also, looking at the amount of data transferred and the time taken, the data appears to move at about 100 MB/min, which is very slow for Google's internal data transfer (from GCS to Dataproc). I am attaching a CSV with bytes, time taken, and URL for one part file.
Has anybody experienced such behavior with Dataproc? Is there an explanation for so many requests per file and such slow transfer rates?
As a side note, both the bucket and the Dataproc cluster are in the same region. There are 50 workers with n1-standard-16 machines.
Since I could not attach the file, I'm pasting the formatted contents here:
| sc_bytes | time_taken_micros | cs_uri |
|-----------|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| 0 | 21000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 22000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 709922 | 164000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 709922 | 86000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 709922 | 173000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 8 | 47000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 8 | 51000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 12000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 8 | 103000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 709922 | 98000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 8 | 42000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 709922 | 88000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 8 | 42000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 20000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 20000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 8 | 40000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 20000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 15000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 19000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 143092175 | 63484000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 16000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 19000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 32000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 16000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 19000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 14000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 137585202 | 66010000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 136726977 | 66732000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 176684024 | 101921000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 32000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 709922 | 113000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 16000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 23000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 16000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 19000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 134187229 | 64401000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 135450987 | 73632000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 24000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 21000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 15000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 15000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 27000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 709922 | 106000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 137020002 | 66333000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 24000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 8 | 41000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 25000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 16000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 8 | 39000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 20000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 709922 | 135000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 16000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 19000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 709922 | 126000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 8 | 41000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 135686216 | 71676000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 179573683 | 90877000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
A relatively large number of GCS metadata requests (the URLs without the ?alt=media parameter in your table) is expected in this case. The job driver performs metadata requests to list the files and get their sizes in order to generate splits; after that, for each split, workers perform multiple metadata requests to check whether files exist, get their size, etc. This seeming inefficiency stems from the fact that Spark accesses GCS through the HDFS interface, and because HDFS requests have much lower latency than GCS requests, I don't think the whole Hadoop/Spark stack was ever heavily optimized to reduce the number of them.
To address this at the Spark level, you may want to enable metadata caching with the spark.sql.parquet.cacheMetadata=true property.
At the GCS connector level, you can reduce the number of GCS metadata requests by enabling the metadata cache with the fs.gs.performance.cache.enable=true property (with the spark.hadoop. prefix for Spark), but it can introduce some metadata staleness.
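As a rough sketch of how those two properties could be wired in at session-build time (shown in PySpark purely for illustration; only the property names come from above, the application name and the rest of the setup are assumed):
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = (
    SparkSession.builder
    .appName("gcs-parquet-read")                                     # illustrative name
    .config("spark.sql.parquet.cacheMetadata", "true")               # Spark-level Parquet metadata caching
    .config("spark.hadoop.fs.gs.performance.cache.enable", "true")   # GCS connector metadata cache (may serve stale metadata)
    .getOrCreate()
)

df = spark.read.parquet("gs://bucket/data/src/....")
df.withColumn("a", lit("b")).write.save("gs://bucket/data/dest/...")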
Also, to take advantage of the latest improvements in the GCS connector (including a reduced number of GCS metadata requests and support for random reads), you may want to update it in your cluster to the latest version, or use Dataproc 1.3, which has it pre-installed.
Regarding read speed, you may want to allocate more worker tasks per VM, which will increase read speed by increasing the number of simultaneous reads.
Also, you may want to check whether read speed is limited by write speed for your workload, by removing the write to GCS at the end entirely, or replacing it with a write to HDFS or some computation instead.

Cassandra replicates to a node different from the one in nodetool getendpoints

I have a table in Cassandra:
CREATE TABLE imdb.movies_by_actor (
    actor text,
    movie_id uuid,
    character text,
    movie_title text,
    salary int,
    PRIMARY KEY (actor, movie_id)
) WITH CLUSTERING ORDER BY (movie_id ASC)
actor | movie_id | character | movie_title | salary
-----------+--------------------------------------+-----------+-------------+--------
Tom Hanks | 767b7a89-868c-46ce-8fa6-f6184dfb6d69 | Dad | Seattle | 25000
Tom Hanks | a9a64b89-a19d-46e9-b5ee-991ac4939891 | Officer | Green mile | 20000
Then I find out which nodes are responsible for the 'Tom Hanks' partition:
select token(actor) from movies_by_actor ;
system.token(actor)
----------------------
-4258050846863339499
-4258050846863339499
root@2f5aa8d649e2:/# nodetool getendpoints imdb movies_by_actor -4258050846863339499
172.13.0.6
172.13.0.3
172.13.0.4
Then I shut the node corresponding to 172.13.0.6 down:
docker stop cassandra6
root@2f5aa8d649e2:/# ping 172.13.0.6
PING 172.13.0.6 (172.13.0.6): 56 data bytes
92 bytes from 2f5aa8d649e2 (172.13.0.2): Destination Host Unreachable
When I try to update the row and look at the tracing info, it looks like the data is sent to 172.13.0.2, 172.13.0.4 and 172.13.0.5:
cqlsh:imdb> update movies_by_actor set salary = 26000 where actor = 'Tom Hanks' and movie_id = 767b7a89-868c-46ce-8fa6-f6184dfb6d69;
Tracing session: f44dbd70-4228-11e7-89c9-cf534e0135c6
activity | timestamp | source | source_elapsed | client
----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+------------+----------------+-----------
Execute CQL3 query | 2017-05-26 15:35:32.295000 | 172.13.0.2 | 0 | 127.0.0.1
Parsing update movies_by_actor set salary = 26000 where actor = 'Tom Hanks' and movie_id = 767b7a89-868c-46ce-8fa6-f6184dfb6d69; [Native-Transport-Requests-1] | 2017-05-26 15:35:32.295000 | 172.13.0.2 | 303 | 127.0.0.1
Preparing statement [Native-Transport-Requests-1] | 2017-05-26 15:35:32.295000 | 172.13.0.2 | 646 | 127.0.0.1
Determining replicas for mutation [Native-Transport-Requests-1] | 2017-05-26 15:35:32.296000 | 172.13.0.2 | 1181 | 127.0.0.1
Appending to commitlog [MutationStage-3] | 2017-05-26 15:35:32.296000 | 172.13.0.2 | 1420 | 127.0.0.1
Adding to movies_by_actor memtable [MutationStage-3] | 2017-05-26 15:35:32.296000 | 172.13.0.2 | 1557 | 127.0.0.1
Sending MUTATION message to /172.13.0.4 [MessagingService-Outgoing-/172.13.0.4-Small] | 2017-05-26 15:35:32.296000 | 172.13.0.2 | 1567 | 127.0.0.1
Sending MUTATION message to /172.13.0.5 [MessagingService-Outgoing-/172.13.0.5-Small] | 2017-05-26 15:35:32.296000 | 172.13.0.2 | 1583 | 127.0.0.1
MUTATION message received from /172.13.0.2 [MessagingService-Incoming-/172.13.0.2] | 2017-05-26 15:35:32.297000 | 172.13.0.4 | 27 | 127.0.0.1
MUTATION message received from /172.13.0.2 [MessagingService-Incoming-/172.13.0.2] | 2017-05-26 15:35:32.297000 | 172.13.0.5 | 23 | 127.0.0.1
Appending to commitlog [MutationStage-1] | 2017-05-26 15:35:32.297000 | 172.13.0.4 | 332 | 127.0.0.1
Adding to movies_by_actor memtable [MutationStage-1] | 2017-05-26 15:35:32.297000 | 172.13.0.4 | 577 | 127.0.0.1
Enqueuing response to /172.13.0.2 [MutationStage-1] | 2017-05-26 15:35:32.298000 | 172.13.0.4 | 884 | 127.0.0.1
Appending to commitlog [MutationStage-2] | 2017-05-26 15:35:32.298000 | 172.13.0.5 | 1526 | 127.0.0.1
Sending REQUEST_RESPONSE message to /172.13.0.2 [MessagingService-Outgoing-/172.13.0.2-Small] | 2017-05-26 15:35:32.298000 | 172.13.0.4 | 1122 | 127.0.0.1
Adding to movies_by_actor memtable [MutationStage-2] | 2017-05-26 15:35:32.299000 | 172.13.0.5 | 1854 | 127.0.0.1
Enqueuing response to /172.13.0.2 [MutationStage-2] | 2017-05-26 15:35:32.299000 | 172.13.0.5 | 2187 | 127.0.0.1
Sending REQUEST_RESPONSE message to /172.13.0.2 [MessagingService-Outgoing-/172.13.0.2-Small] | 2017-05-26 15:35:32.299000 | 172.13.0.5 | 2423 | 127.0.0.1
REQUEST_RESPONSE message received from /172.13.0.4 [MessagingService-Incoming-/172.13.0.4] | 2017-05-26 15:35:32.300000 | 172.13.0.2 | 56 | 127.0.0.1
REQUEST_RESPONSE message received from /172.13.0.5 [MessagingService-Incoming-/172.13.0.5] | 2017-05-26 15:35:32.300000 | 172.13.0.2 | 15 | 127.0.0.1
Processing response from /172.13.0.5 [RequestResponseStage-5] | 2017-05-26 15:35:32.300000 | 172.13.0.2 | 273 | 127.0.0.1
Processing response from /172.13.0.4 [RequestResponseStage-4] | 2017-05-26 15:35:32.300000 | 172.13.0.2 | 774 | 127.0.0.1
Request complete | 2017-05-26 15:35:32.296887 | 172.13.0.2 | 1887 | 127.0.0.1
Selecting with consistency level ALL also works even though 172.13.0.6 is down. Could someone explain this, please?
The command nodetool getendpoints takes the partition key value as its parameter, not the token.
However, there is an issue with nodetool getendpoints when the parameter value contains a space.
You could use the script from this answer: https://stackoverflow.com/a/43155224/2320144
Or
You could run nodetool ring to list the token ranges for each node, and see which nodes are responsible for that range.
Source: https://stackoverflow.com/a/30515201/2320144
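As an alternative to nodetool, the DataStax Python driver can resolve replicas from its own token metadata. A rough sketch, assuming a reachable contact point and that actor is the table's only partition key column (the key is passed as its serialized bytes, which for a single text column is just its UTF-8 encoding):
from cassandra.cluster import Cluster

cluster = Cluster(['172.13.0.2'])   # any reachable node from the cluster above
session = cluster.connect()         # connecting populates the cluster's token/metadata map

replicas = cluster.metadata.get_replicas('imdb', 'Tom Hanks'.encode('utf-8'))
for host in replicas:
    print(host.address)

cluster.shutdown()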

Cassandra delayed / denied updates

I'm having trouble with a small Cassandra cluster that used to work well. I used to have 3 nodes. When I added the 4th, I started seeing some issues with values not updating, so I did nodetool repair (a few times now) on the entire cluster. I should mention I did the switch at the same time as the upgrade from python-cql to the new python cassandra driver.
Essentially the weirdness falls into two cases:
Denied Updates:
cqlsh:analytics> select * from metrics where id = '36122cc69a7a12e266ab40f5b7756daee75bd0d2735a707b369302acb879eedc';
(0 rows)
Tracing session: 19e5bbd0-d172-11e3-a039-67dcdc0d02de
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 20:49:34,221 | 10.128.214.245 | 0
Parsing select * from metrics where id = '36122cc69a7a12e266ab40f5b7756daee75bd0d2735a707b369302acb879eedc' LIMIT 10000; | 20:49:34,221 | 10.128.214.245 | 176
Preparing statement | 20:49:34,222 | 10.128.214.245 | 311
Sending message to /10.128.180.108 | 20:49:34,222 | 10.128.214.245 | 773
Message received from /10.128.214.245 | 20:49:34,224 | 10.128.180.108 | 67
Row cache hit | 20:49:34,225 | 10.128.180.108 | 984
Read 0 live and 0 tombstoned cells | 20:49:34,225 | 10.128.180.108 | 1079
Message received from /10.128.180.108 | 20:49:34,227 | 10.128.214.245 | 5760
Enqueuing response to /10.128.214.245 | 20:49:34,227 | 10.128.180.108 | 3045
Processing response from /10.128.180.108 | 20:49:34,227 | 10.128.214.245 | 5942
Sending message to /10.128.214.245 | 20:49:34,227 | 10.128.180.108 | 3302
Request complete | 20:49:34,227 | 10.128.214.245 | 6282
cqlsh:analytics> update metrics set n = n + 1 where id = '36122cc69a7a12e266ab40f5b7756daee75bd0d2735a707b369302acb879eedc';
Tracing session: 20845ff0-d172-11e3-a039-67dcdc0d02de
activity | timestamp | source | source_elapsed
---------------------------------------------------------------------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 20:49:45,328 | 10.128.214.245 | 0
Parsing update metrics set n = n + 1 where id = '36122cc69a7a12e266ab40f5b7756daee75bd0d2735a707b369302acb879eedc'; | 20:49:45,328 | 10.128.214.245 | 129
Preparing statement | 20:49:45,328 | 10.128.214.245 | 227
Determining replicas for mutation | 20:49:45,328 | 10.128.214.245 | 298
Enqueuing counter update to /10.128.194.70 | 20:49:45,328 | 10.128.214.245 | 425
Sending message to /10.128.194.70 | 20:49:45,329 | 10.128.214.245 | 598
Message received from /10.128.214.245 | 20:49:45,330 | 10.128.194.70 | 37
Acquiring switchLock read lock | 20:49:45,331 | 10.128.194.70 | 623
Message received from /10.128.194.70 | 20:49:45,331 | 10.128.214.245 | 3335
Appending to commitlog | 20:49:45,331 | 10.128.194.70 | 645
Processing response from /10.128.194.70 | 20:49:45,331 | 10.128.214.245 | 3431
Adding to metrics memtable | 20:49:45,331 | 10.128.194.70 | 692
Sending message to /10.128.214.245 | 20:49:45,332 | 10.128.194.70 | 1120
Row cache miss | 20:49:45,332 | 10.128.194.70 | 1611
Executing single-partition query on metrics | 20:49:45,332 | 10.128.194.70 | 1687
Acquiring sstable references | 20:49:45,332 | 10.128.194.70 | 1692
Merging memtable tombstones | 20:49:45,332 | 10.128.194.70 | 1692
Key cache hit for sstable 13958 | 20:49:45,332 | 10.128.194.70 | 1714
Seeking to partition beginning in data file | 20:49:45,332 | 10.128.194.70 | 1856
Key cache hit for sstable 14036 | 20:49:45,333 | 10.128.194.70 | 2271
Seeking to partition beginning in data file | 20:49:45,333 | 10.128.194.70 | 2271
Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | 20:49:45,333 | 10.128.194.70 | 2540
Merging data from memtables and 2 sstables | 20:49:45,333 | 10.128.194.70 | 2564
Read 0 live and 1 tombstoned cells | 20:49:45,333 | 10.128.194.70 | 2632
Sending message to /10.128.195.149 | 20:49:45,335 | 10.128.194.70 | null
Message received from /10.128.194.70 | 20:49:45,335 | 10.128.180.108 | 43
Sending message to /10.128.180.108 | 20:49:45,335 | 10.128.194.70 | null
Acquiring switchLock read lock | 20:49:45,335 | 10.128.180.108 | 297
Appending to commitlog | 20:49:45,335 | 10.128.180.108 | 312
Message received from /10.128.194.70 | 20:49:45,336 | 10.128.195.149 | 53
Adding to metrics memtable | 20:49:45,336 | 10.128.180.108 | 374
Enqueuing response to /10.128.194.70 | 20:49:45,336 | 10.128.180.108 | 445
Sending message to /10.128.194.70 | 20:49:45,336 | 10.128.180.108 | 677
Message received from /10.128.180.108 | 20:49:45,337 | 10.128.194.70 | null
Processing response from /10.128.180.108 | 20:49:45,337 | 10.128.194.70 | null
Acquiring switchLock read lock | 20:49:45,338 | 10.128.195.149 | 1874
Appending to commitlog | 20:49:45,338 | 10.128.195.149 | 1970
Adding to metrics memtable | 20:49:45,338 | 10.128.195.149 | 2027
Enqueuing response to /10.128.194.70 | 20:49:45,338 | 10.128.195.149 | 2147
Sending message to /10.128.194.70 | 20:49:45,338 | 10.128.195.149 | 2572
Message received from /10.128.195.149 | 20:49:45,339 | 10.128.194.70 | null
Processing response from /10.128.195.149 | 20:49:45,339 | 10.128.194.70 | null
Request complete | 20:49:45,331 | 10.128.214.245 | 3556
cqlsh:analytics> select * from metrics where id = '36122cc69a7a12e266ab40f5b7756daee75bd0d2735a707b369302acb879eedc';
(0 rows)
Tracing session: 28f1f7b0-d172-11e3-a039-67dcdc0d02de
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 20:49:59,468 | 10.128.214.245 | 0
Parsing select * from metrics where id = '36122cc69a7a12e266ab40f5b7756daee75bd0d2735a707b369302acb879eedc' LIMIT 10000; | 20:49:59,468 | 10.128.214.245 | 119
Preparing statement | 20:49:59,468 | 10.128.214.245 | 235
Sending message to /10.128.180.108 | 20:49:59,468 | 10.128.214.245 | 574
Message received from /10.128.214.245 | 20:49:59,469 | 10.128.180.108 | 49
Row cache miss | 20:49:59,470 | 10.128.180.108 | 817
Executing single-partition query on metrics | 20:49:59,470 | 10.128.180.108 | 877
Acquiring sstable references | 20:49:59,470 | 10.128.180.108 | 888
Merging memtable tombstones | 20:49:59,470 | 10.128.180.108 | 938
Key cache hit for sstable 5399 | 20:49:59,470 | 10.128.180.108 | 1025
Seeking to partition beginning in data file | 20:49:59,470 | 10.128.180.108 | 1033
Message received from /10.128.180.108 | 20:49:59,471 | 10.128.214.245 | 3378
Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 20:49:59,471 | 10.128.180.108 | 1495
Processing response from /10.128.180.108 | 20:49:59,471 | 10.128.214.245 | 3466
Merging data from memtables and 1 sstables | 20:49:59,471 | 10.128.180.108 | 1507
Read 0 live and 1 tombstoned cells | 20:49:59,471 | 10.128.180.108 | 1660
Read 0 live and 0 tombstoned cells | 20:49:59,471 | 10.128.180.108 | 1759
Enqueuing response to /10.128.214.245 | 20:49:59,471 | 10.128.180.108 | 1817
Sending message to /10.128.214.245 | 20:49:59,471 | 10.128.180.108 | 1977
Request complete | 20:49:59,471 | 10.128.214.245 | 3638
So this is pretty straightforward. From my reading, it might have to do with tombstone timestamps that somehow got messed up. However, it's been several days and "now" still hasn't caught up to the future timestamp of the tombstone. Is there any way to just reset every single timestamp in the entire cluster to 0 while activity is stopped? I can live with slightly inaccurate data right now.
The second issue is with some of my tables. Some tables reflect changes instantly, but for others the changes only get reflected between 30 minutes and an hour later. I can't figure out how timestamps might relate to this.
I've synced all the nodes of my cluster using NTP; it's not the most precise, but they won't be out of sync on the scale of days or anything. All the nodes have been synced like this from the beginning; at no point did I have wildly out-of-sync times.
Can anybody help? As I was saying, by this point I'd settle for shutting down access to the cluster and resetting all timestamps to 0; I don't care about getting some of the ordering wrong, I just want this thing to work.
Thanks
Timestamps are immutable. You'd have to truncate the table and rebuild it. The easiest way to rebuild is to just insert correct data, but if that's not an option, you can round trip through sstable2json -> edit timestamps -> json2sstable.

Cassandra 1.2 merging data from memtables and sstables takes too long

Here is a trace from a 4-node Cassandra cluster running 1.2.6. I'm seeing a timeout with a simple select when the cluster is under no load, and I need some help getting to the bottom of it.
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------+--------------+---------------+----------------
execute_cql3_query | 05:21:00,848 | 100.69.176.51 | 0
Parsing select * from user_scores where user_id='26257166' LIMIT 10000; | 05:21:00,848 | 100.69.176.51 | 77
Peparing statement | 05:21:00,848 | 100.69.176.51 | 225
Executing single-partition query on user_scores | 05:21:00,849 | 100.69.176.51 | 589
Acquiring sstable references | 05:21:00,849 | 100.69.176.51 | 626
Merging memtable tombstones | 05:21:00,849 | 100.69.176.51 | 676
Key cache hit for sstable 34 | 05:21:00,849 | 100.69.176.51 | 817
Seeking to partition beginning in data file | 05:21:00,849 | 100.69.176.51 | 836
Key cache hit for sstable 32 | 05:21:00,849 | 100.69.176.51 | 1135
Seeking to partition beginning in data file | 05:21:00,849 | 100.69.176.51 | 1153
Merging data from memtables and 2 sstables | 05:21:00,850 | 100.69.176.51 | 1394
Request complete | 05:21:20,881 | 100.69.176.51 | 20033807
Here is the schema. You can see that it includes a few collections.
create table user_scores
(
    user_id varchar,
    post_type varchar,
    score double,
    team_to_score_map map<varchar, double>,
    affiliation_to_score_map map<varchar, double>,
    campaign_to_score_map map<varchar, double>,
    person_to_score_map map<varchar, double>,
    primary key(user_id, post_type)
)
with compaction =
{
    'class' : 'LeveledCompactionStrategy',
    'sstable_size_in_mb' : 10
};
I added the leveled compaction strategy because it was supposed to help with read latency.
I'd like to understand what could cause the cluster to time out during the merge phase. Not all queries time out; it appears to happen more frequently with rows whose maps have a larger number of entries.
Here is another trace of a failure for good measure. It is very reproducible:
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 05:51:34,557 | 100.69.176.51 | 0
Message received from /100.69.176.51 | 05:51:34,195 | 100.69.184.134 | 102
Executing single-partition query on user_scores | 05:51:34,199 | 100.69.184.134 | 3512
Acquiring sstable references | 05:51:34,199 | 100.69.184.134 | 3741
Merging memtable tombstones | 05:51:34,199 | 100.69.184.134 | 3890
Key cache hit for sstable 5 | 05:51:34,199 | 100.69.184.134 | 4040
Seeking to partition beginning in data file | 05:51:34,199 | 100.69.184.134 | 4059
Merging data from memtables and 1 sstables | 05:51:34,200 | 100.69.184.134 | 4412
Parsing select * from user_scores where user_id='26257166' LIMIT 10000; | 05:51:34,558 | 100.69.176.51 | 91
Peparing statement | 05:51:34,558 | 100.69.176.51 | 238
Enqueuing data request to /100.69.184.134 | 05:51:34,558 | 100.69.176.51 | 567
Sending message to /100.69.184.134 | 05:51:34,558 | 100.69.176.51 | 979
Request complete | 05:51:54,562 | 100.69.176.51 | 20005209
And a trace from when it works:
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 05:55:07,772 | 100.69.176.51 | 0
Message received from /100.69.176.51 | 05:55:07,408 | 100.69.184.134 | 53
Executing single-partition query on user_scores | 05:55:07,409 | 100.69.184.134 | 1014
Acquiring sstable references | 05:55:07,409 | 100.69.184.134 | 1087
Merging memtable tombstones | 05:55:07,410 | 100.69.184.134 | 1209
Partition index with 0 entries found for sstable 5 | 05:55:07,410 | 100.69.184.134 | 1681
Seeking to partition beginning in data file | 05:55:07,410 | 100.69.184.134 | 1732
Merging data from memtables and 1 sstables | 05:55:07,411 | 100.69.184.134 | 2415
Read 1 live and 0 tombstoned cells | 05:55:07,412 | 100.69.184.134 | 3274
Enqueuing response to /100.69.176.51 | 05:55:07,412 | 100.69.184.134 | 3534
Sending message to /100.69.176.51 | 05:55:07,412 | 100.69.184.134 | 3936
Parsing select * from user_scores where user_id='305722020' LIMIT 10000; | 05:55:07,772 | 100.69.176.51 | 96
Peparing statement | 05:55:07,772 | 100.69.176.51 | 262
Enqueuing data request to /100.69.184.134 | 05:55:07,773 | 100.69.176.51 | 600
Sending message to /100.69.184.134 | 05:55:07,773 | 100.69.176.51 | 847
Message received from /100.69.184.134 | 05:55:07,778 | 100.69.176.51 | 6103
Processing response from /100.69.184.134 | 05:55:07,778 | 100.69.176.51 | 6341
Request complete | 05:55:07,778 | 100.69.176.51 | 6780
It looks like I was running into a performance issue with 1.2. Fortunately a patch had just been applied to the 1.2 branch, so when I built from source my problem went away.
See https://issues.apache.org/jira/browse/CASSANDRA-5677 for a detailed explanation.

Resources