Presto with Hudi - select * from table

I have a Parquet dataset created with Hudi from a Spark Kinesis stream and stored in S3.
An AWS Glue table is generated from this dataset. I updated the table's input format to org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat as per the instructions at https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi
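For reference, the same change expressed as Hive DDL would look roughly like the sketch below (the table name is a placeholder; in Glue the input format, output format, and serde live on the table's storage descriptor and can also be edited in the console):
ALTER TABLE my_hudi_table SET FILEFORMAT
INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe';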
From the Presto CLI I run
presto-cli --catalog hive --schema my-schema --server my-server:8889
presto:my-schema> select * from table
This returns
Query 20200211_185222_00050_hej8h, FAILED, 1 node
Splits: 17 total, 0 done (0.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20200211_185222_00050_hej8h failed: No value present
However, when I run
select id from table
it returns
id
----------
34551832
(1 row)
Query 20200211_185250_00051_hej8h, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 93B] [2 rows/s, 213B/s]
Is this expected behaviour, or is there an underlying issue with the setup between Hudi, AWS Glue, and Presto?
Update 12-Feb-2020
Stack trace using the --debug option:
presto:schema> select * from table;
Query 20200212_092259_00006_hej8h, FAILED, 1 node
http://xx-xxx-xxx-xxx.xx-xxxxx-xxx.compute.amazonaws.com:8889/ui/query.html?20200212_092259_00006_hej8h
Splits: 17 total, 0 done (0.00%)
CPU Time: 0.0s total, 0 rows/s, 0B/s, 23% active
Per Node: 0.1 parallelism, 0 rows/s, 0B/s
Parallelism: 0.1
Peak Memory: 0B
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20200212_092259_00006_hej8h failed: No value present
java.util.NoSuchElementException: No value present
at java.util.Optional.get(Optional.java:135)
at com.facebook.presto.parquet.reader.ParquetReader.readArray(ParquetReader.java:156)
at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:282)
at com.facebook.presto.parquet.reader.ParquetReader.readStruct(ParquetReader.java:193)
at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:276)
at com.facebook.presto.parquet.reader.ParquetReader.readStruct(ParquetReader.java:193)
at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:276)
at com.facebook.presto.parquet.reader.ParquetReader.readBlock(ParquetReader.java:268)
at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:247)
at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:225)
at com.facebook.presto.spi.block.LazyBlock.assureLoaded(LazyBlock.java:283)
at com.facebook.presto.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:274)
at com.facebook.presto.spi.Page.getLoadedPage(Page.java:261)
at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:254)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:379)
at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:283)
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675)
at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
at com.facebook.presto.$gen.Presto_0_227____20200211_134743_1.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

It appears the problem may be elsewhere; an issue has been raised with the Hudi team here --> https://github.com/apache/incubator-hudi/issues/1325

Related

CounterMutationStage and ViewMutationStage metrics are missing in Cassandra 4.0

When invoking nodetool tpstats on Cassandra 4.0 (output screenshot in the original post), there is no CounterMutationStage and no ViewMutationStage. Where are they?
Those metrics are still there. The issue, though, is that they expose their data "lazily," which basically means they won't show up at all while their value is zero. Once you start writing to counters or views, those metrics execute their "lazy initialization," and only then are they exposed. I tested this out using Cassandra 4.0 beta4.
Running a baseline nodetool tpstats | head -n 4:
Pool Name                      Active   Pending   Completed   Blocked   All time blocked
MutationStage                       0         0           1         0                  0
ReadStage                           0         0          27         0                  0
CompactionExecutor                  0         0          41         0                  0
Next, I'll create a simple counter table.
CREATE TABLE games_popularity (game text PRIMARY KEY, popularity counter);
I'll increment the counter a few times and SELECT it.
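The increments themselves aren't shown here; they were something along these lines (a sketch, assuming the table lives in the stackoverflow keyspace, as the SELECT below suggests):
UPDATE stackoverflow.games_popularity SET popularity = popularity + 1 WHERE game = 'Cyberpunk 2077';
UPDATE stackoverflow.games_popularity SET popularity = popularity + 1 WHERE game = 'Cyberpunk 2077';
UPDATE stackoverflow.games_popularity SET popularity = popularity + 1 WHERE game = 'Cyberpunk 2077';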
aploetz@cqlsh> SELECT * FROM stackoverflow.games_popularity;
game | popularity
----------------+------------
Cyberpunk 2077 | 3
(1 rows)
Now rerunning nodetool tpstats | head -n 4 indeed shows CounterMutationStage:
Pool Name                      Active   Pending   Completed   Blocked   All time blocked
MutationStage                       0         0          12         0                  0
CounterMutationStage                0         0           3         0                  0
ReadStage                           0         0          96         0                  0
Note that in 4.0 these metrics are also exposed in the system_views.thread_pools virtual table, which you can view with SELECT * FROM system_views.thread_pools;
Thanks to the good work done by the Cassandra developers, these metrics are now lazily initialised to improve performance.
The best way to "wake up" all lazy metrics is:
nodetool getconcurrency

Cassandra query on secondary index with pagination becomes slower as data grows

When I query a secondary index with pagination, the query becomes slower as the data grows.
I thought that with pagination, no matter how large the data grows, it would take the same time to fetch one page. Is that true? Why does my query get slower?
My simplified table is:
CREATE TABLE closed_executions (
  domain_id uuid,
  workflow_id text,
  start_time timestamp,
  workflow_type_name text,
  PRIMARY KEY ((domain_id), start_time)
) WITH CLUSTERING ORDER BY (start_time DESC)
  AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
  AND gc_grace_seconds = 172800;
And I create a secondary index as
CREATE INDEX closed_by_type ON closed_executions (workflow_type_name);
I query with the following CQL:
SELECT workflow_id, start_time, workflow_type_name
FROM closed_executions
WHERE domain_id = ?
AND start_time >= ?
AND start_time <= ?
AND workflow_type_name = ?
and the (Go) code is:
query := v.session.Query(templateGetClosedWorkflowExecutionsByType,
    request.DomainUUID,
    common.UnixNanoToCQLTimestamp(request.EarliestStartTime),
    common.UnixNanoToCQLTimestamp(request.LatestStartTime),
    request.WorkflowTypeName).Consistency(gocql.One)
iter := query.PageSize(request.PageSize).PageState(request.NextPageToken).Iter()
// PageSize is 10, but could be thousands
Environment:
MacBook Pro
Cassandra: 3.11.0
GoCql: github.com/gocql/gocql master
Observations:
10K rows: within a second
100K rows: ~3 seconds
1M rows: ~17 seconds
Debug log:
INFO [ScheduledTasks:1] 2018-09-11 16:29:48,349 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
DEBUG [ScheduledTasks:1] 2018-09-11 16:29:48,357 MonitoringTask.java:173 - 1 operations were slow in the last 5005 msecs:
<SELECT * FROM cadence_visibility.closed_executions WHERE workflow_type_name = code.uber.internal/devexp/cadence-bench/load/basic.stressWorkflowExecute AND token(domain_id, domain_partition) >= token(d3138e78-abe7-48a0-adb9-8c466a9bb3fa, 0) AND token(domain_id, domain_partition) <= token(d3138e78-abe7-48a0-adb9-8c466a9bb3fa, 0) AND start_time >= 2018-09-11 16:29-0700 AND start_time <= 1969-12-31 16:00-0800 LIMIT 10>, time 2747 msec - slow timeout 500 msec
DEBUG [COMMIT-LOG-ALLOCATOR] 2018-09-11 16:31:47,774 AbstractCommitLogSegmentManager.java:107 - No segments in reserve; creating a fresh one
DEBUG [ScheduledTasks:1] 2018-09-11 16:40:22,922 ColumnFamilyStore.java:899 - Enqueuing flush of size_estimates: 23.997MiB (2%) on-heap, 0.000KiB (0%) off-heap
Related references (no answers to my questions):
https://lists.apache.org/thread.html/%3CCAAiKoBidknHVOz8oQQmncZFZHdFiDfW6HTs63vxXCOhisQYZgg#mail.gmail.com%3E
https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive
https://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/
-- Edit
nodetool tablestats returns:
Total number of tables: 105
----------------
Keyspace : cadence_visibility
Read Count: 19
Read Latency: 0.5125263157894736 ms.
Write Count: 3220964
Write Latency: 0.04900822269357869 ms.
Pending Flushes: 0
Table: closed_executions
SSTable count: 1
SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 20.3 MiB
Space used (total): 20.3 MiB
Space used by snapshots (total): 0 bytes
Off heap memory used (total): 6.35 KiB
SSTable Compression Ratio: 0.40192660515179696
Number of keys (estimate): 3
Memtable cell count: 28667
Memtable data size: 7.35 MiB
Memtable off heap memory used: 0 bytes
Memtable switch count: 9
Local read count: 9
Local read latency: NaN ms
Local write count: 327024
Local write latency: NaN ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 16 bytes
Bloom filter off heap memory used: 8 bytes
Index summary off heap memory used: 38 bytes
Compression metadata off heap memory used: 6.3 KiB
Compacted partition minimum bytes: 150
Compacted partition maximum bytes: 62479625
Compacted partition mean bytes: 31239902
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0 bytes
----------------
Why doesn't pagination scale the way it does on the main table?
The data behind your secondary index is dispersed, and pagination only applies its logic once matching rows have been found. Since the indexed data is not clustered by time, Cassandra still has to sift through lots and lots of rows before it can find your first 10, for example.
Query tracing shows that pagination only comes into play at a very late phase.
Why is the secondary index slow?
Cassandra first reads the index table to retrieve the primary keys of all matching rows, and then, for each of them, reads the original table to fetch the data. This is a known anti-pattern with low-cardinality indexes (reference: https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive).
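A common way around this anti-pattern is to denormalize so that the filtered column is part of the partition key instead of being covered by a secondary index. A sketch based on the schema in the question (an additional table the application would have to write to, not a drop-in replacement):
CREATE TABLE closed_executions_by_type (
  domain_id uuid,
  workflow_type_name text,
  start_time timestamp,
  workflow_id text,
  PRIMARY KEY ((domain_id, workflow_type_name), start_time)
) WITH CLUSTERING ORDER BY (start_time DESC);
With this layout, a query filtering on domain_id, workflow_type_name, and a start_time range hits a single partition and pages through it in clustering order.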

Cassandra tombstones not deleted a month after actual record TTL

I'm running into an issue with DSE 4.7.
Tombstones are not being deleted even after compactions, cleanup, rebuild_index, and repair. Records have a 15-day TTL.
The sstablemetadata output suggests that ~90% of the SSTable is droppable tombstones.
Any ideas?
sstablemetadata output
SSTable: ./abcd-abcd-ka-478675
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Minimum timestamp: 1527521280829593
Maximum timestamp: 1527596173976435
SSTable max local deletion time: 1528892173
Compression ratio: 0.36967428395684393
Estimated droppable tombstones: 0.9073013816277629
SSTable Level: 0
Repaired at: 0
ReplayPosition(segmentId=1520529283052, position=4626679)
Estimated tombstone drop times:%n
1528817679: 18318196
1528818619: 20753822
1528819513: 24176310
.
.
.
Count Row Size Cell Count
1 0 0
2 0 1752560
3 0 0
4 0 6355421
5 0 0
6 0 687302
7 0 0
8 0 529613
10 0 444801
12 0 410107
14 0 456011
17 0 1347893
20 0 184960
24 0 152814
.
.
.
770 1347893 137
924 184960 109
1109 220403 68
1331 121620 86
1597 2044030 102
1916 185601 195
2299 184816 158273
2759 868754 0
3311 62795 0
3973 1668 0
4768 2143 0
5722 1812541 0
6866 828 0
.
.
.
Ancestors: [476190, 474027, 475201, 478160]
Estimated cardinality: 20059264
Cassandra marks TTL data with a tombstone after the requested amount of time has expired. A tombstone exists for gc_grace_seconds. After data is marked with a tombstone, the data is automatically removed during the normal compaction process.
You can try running a major compaction to evict the tombstones.
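If the default 10-day gc_grace_seconds (on top of the 15-day TTL) is what keeps the tombstones around, it can also be tuned per table. A sketch with placeholder keyspace/table names, since the real ones are redacted above; only lower this if repairs reliably complete within the new window:
ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 86400; // one day instead of the default ten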
Tombstones get deleted during normal compaction, but you can still sometimes find stale tombstone data, even in production. One reason can be that one of the nodes in the cluster was down, so the tombstoned data could not be removed consistently because of that node. Also, inserting null values into columns creates tombstone data.

Cassandra NoHostAvailableException when deletes are executed with cqlsh

We have a cluster with 7 nodes and we use the DataStax Java driver to connect to the cluster. The problem is that I am getting constant NoHostAvailableExceptions like this:
Caused by:
com.datastax.driver.core.exceptions.NoHostAvailableException: All
host(s) tried for query failed (tried: /172.31.7.243:9042
(com.datastax.driver.core.exceptions.DriverException: Timeout while
trying to acquire available connection (you may want to increase the
driver number of per-host connections)), /172.31.7.245:9042
(com.datastax.driver.core.exceptions.DriverException: Timeout while
trying to acquire available connection (you may want to increase the
driver number of per-host connections)), /172.31.7.246:9042
(com.datastax.driver.core.exceptions.DriverException: Timeout while
trying to acquire available connection (you may want to increase the
driver number of per-host connections)), /172.31.7.247:9042,
/172.31.7.232:9042, /172.31.7.233:9042, /172.31.7.244:9042 [only
showing errors of first 3 hosts, use getErrors() for more details])
All the nodes are up:
UN 172.31.7.244 152.21 GB 256 14.5% 58abea69-e7ba-4e57-9609-24f3673a7e58 RAC1
UN 172.31.7.245 168.4 GB 256 14.5% bc11b4f0-cf96-4ca5-9a3e-33cc2b92a752 RAC1
UN 172.31.7.246 177.71 GB 256 13.7% 8dc7bb3d-38f7-49b9-b8db-a622cc80346c RAC1
UN 172.31.7.247 158.57 GB 256 14.1% 94022081-a563-4042-81ab-75ffe4d13194 RAC1
UN 172.31.7.243 176.83 GB 256 14.6% 0dda3410-db58-42f2-9351-068bdf68f530 RAC1
UN 172.31.7.233 159 GB 256 13.6% 01e013fb-2f57-44fb-b3c5-fd89d705bfdd RAC1
UN 172.31.7.232 166.05 GB 256 15.0% 4d009603-faa9-4add-b3a2-fe24ec16a7c1 RAC1
but two of them have high CPU load, especially .232, because I am running a lot of deletes using cqlsh on that node.
I know that deletes generate tombstones, but with 7 nodes in the cluster I do not think it is normal that all the hosts become inaccessible.
Our configuration for the Java connection is:
com.datastax.driver.core.Cluster cluster = null;
// Get contact points
String[] contactPoints = this.environment.getRequiredProperty(CASSANDRA_CLUSTER_URL).split(",");
cluster = com.datastax.driver.core.Cluster.builder()
        .addContactPoints(contactPoints)
        .withCredentials(this.environment.getRequiredProperty(CASSANDRA_CLUSTER_USERNAME),
                this.environment.getRequiredProperty(CASSANDRA_CLUSTER_PASSWORD))
        .withQueryOptions(new QueryOptions()
                .setConsistencyLevel(ConsistencyLevel.QUORUM))
        .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
        .withRetryPolicy(new LoggingRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE))
        .withPort(Integer.parseInt(this.environment.getRequiredProperty(CASSANDRA_CLUSTER_PORT)))
        .build();
Metadata metadata = cluster.getMetadata();
for (Host host : metadata.getAllHosts()) {
    LOG.info("Datacenter: " + host.getDatacenter() + "; Host: " + host.getAddress() + "; DC: " + host.getDatacenter() + "\n");
}
and the contact points are:
172.31.7.244,172.31.7.243,172.31.7.245,172.31.7.246,172.31.7.247
Does anyone know how I can solve this problem, or at least have some hint about how to deal with this situation?
Update: If I get the error messages with e.getErrors() I obtain:
/172.31.7.243:9042=com.datastax.driver.core.OperationTimedOutException: [/172.31.7.243:9042] Operation timed out,
/172.31.7.244:9042=com.datastax.driver.core.OperationTimedOutException: [/172.31.7.244:9042] Operation timed out,
/172.31.7.245:9042=com.datastax.driver.core.OperationTimedOutException: [/172.31.7.245:9042] Operation timed out,
/172.31.7.246:9042=com.datastax.driver.core.OperationTimedOutException: [/172.31.7.246:9042] Operation timed out,
/172.31.7.247:9042=com.datastax.driver.core.OperationTimedOutException: [/172.31.7.247:9042] Operation timed out}
UPDATE:
The replication factor of the keyspace is 3.
For the deletes, I'm running them from different files containing the CQL queries:
cqlsh ip_node_1 -f script-1.duplicates
cqlsh ip_node_1 -f script-2.duplicates
cqlsh ip_node_1 -f script-3.duplicates
...
I am not specifying any consistency level, so it is using the default one, which is ONE.
Each of the previous files contains deletes like this:
DELETE FROM keyspace_name.search WHERE idline1 = 837 and idline2 = 841 and partid = 8558 and id = 18c04c20-8a3a-11e5-9e20-0025905a2ab2;
And the column family is:
CREATE TABLE search (
idline1 bigint,
idline2 bigint,
partid int,
id uuid,
field3 int,
field4 int,
field5 int,
field6 int,
field7 int,
field8 int,
field9 double,
field10 bigint,
field11 bigint,
field12 bigint,
field13 boolean,
field14 boolean,
field15 int,
field16 bigint,
field17 int,
field18 int,
field19 int,
field20 int,
field21 uuid,
field22 boolean,
PRIMARY KEY ((idline1, idline2, partid), id)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='Table with the snp between lines' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=0 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX search_partid ON search (partid);
CREATE INDEX search_field8 ON search (field8);
UPDATE (18-03-2016):
After the deletes started being executed, I found that the CPU usage of some of the nodes increased a lot.
I checked the processes on those nodes and only Cassandra is running, but it is consuming a lot of CPU. The rest of the nodes are barely using any CPU.
UPDATE (04-04-2016): I do not know if it is related, but I checked the nodes with high CPU (near 96%) and the GC activity remains at 1.6% (using only 3 GB of the 10 GB assigned).
Checking the thread pool stats:
nodetool tpstats
Pool Name                  Active   Pending   Completed   Blocked   All time blocked
ReadStage                       0         0    20042001         0                  0
RequestResponseStage            0         0   149365845         0                  0
MutationStage                  32    117720   181498576         0                  0
ReadRepairStage                 0         0      799373         0                  0
ReplicateOnWriteStage           0         0    13624173         0                  0
GossipStage                     0         0     5580503         0                  0
CacheCleanupExecutor            0         0           0         0                  0
AntiEntropyStage                0         0       32173         0                  0
MigrationStage                  0         0           9         0                  0
MemtablePostFlusher             0         0       45044         0                  0
MemoryMeter                     0         0        9553         0                  0
FlushWriter                     0         0        9425         0                 18
ValidationExecutor              0         0       15980         0                  0
MiscStage                       0         0           0         0                  0
PendingRangeCalculator          0         0           7         0                  0
CompactionExecutor              0         0     1293147         0                  0
commitlog_archiver              0         0           0         0                  0
InternalResponseStage           0         0           0         0                  0
HintedHandoff                   0         0         273         0                  0

Message type        Dropped
RANGE_SLICE               0
READ_REPAIR               0
PAGED_RANGE               0
BINARY                    0
READ                      0
MUTATION                  0
_TRACE                    0
REQUEST_RESPONSE          0
COUNTER_MUTATION          0
I realize that the pending mutations are growing while the active value remains the same. Could this be the problem?
I see two problems with your data model.
You use two secondary indexes, one of them on a column of the partition key. I don't know exactly how Cassandra behaves in this case. The worst case is that, even if you use the complete partition key (as you do in your example delete), Cassandra still does a lookup in the secondary index. That would mean a full cluster scan, because secondary indexes are stored locally per node; since only part of the partition key is indexed, Cassandra does not know on which node the index information lies. This behaviour would at least explain the timeouts.
You said you delete a lot of rows in a specific partition. That is also a problem. For each deletion Cassandra creates a tombstone, and the more tombstones there are, the slower reads become. Sooner or later this leads to timeouts or exceptions (by default, Cassandra logs a warning when a read scans 1,000 tombstones and aborts the read at 100,000). By the way, these tombstones are also created in the secondary index. By default Cassandra removes tombstones after gc_grace_seconds (10 days by default) when a compaction is performed. You can change this property per table; more information can be found in the DataStax "Table Properties" documentation.
I believe the first point could be the reason for the timeouts.
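If the first point is indeed the culprit, one option is to drop the index on the partition-key column (and denormalize into a separate table if that query path is still needed). A sketch using the index name from the schema above:
DROP INDEX search_partid; // run while connected to the keyspace that owns the search table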

Coordinator gets response from one node notably later than from other nodes

Please help me understand what I missed.
I see strange behaviour from one cluster node on a SELECT with LIMIT and ORDER BY DESC clauses:
SELECT cid FROM test_cf WHERE uid = 0x50236b6de695baa1140004bf ORDER BY tuuid DESC LIMIT 1000;
TRACING (only part):
…
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:25.117000 | 10.0.23.15 | 7862
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:25.136000 | 10.0.25.57 | 6283
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:38.568000 | 10.0.24.51 | 457931
…
10.0.25.56 - coordinator node
10.0.23.15, 10.0.24.51, 10.0.25.57 - nodes with data
The coordinator gets the response from 10.0.24.51 13 seconds later than from the other nodes! Why is that, and how can I fix it?
The number of rows for the partition key (uid = 0x50236b6de695baa1140004bf) is about 300.
Everything is fine if we use ORDER BY ASC (our clustering order) or a LIMIT value smaller than the number of rows for this partition key.
The Cassandra (v2.2.5) cluster contains 25 nodes.
Every node holds about 400 GB of data.
The cluster runs in AWS. Nodes are evenly distributed over 3 subnets in a VPC. The instance type for the nodes is c3.4xlarge (16 CPU cores, 30 GB RAM). We use EBS-backed storage (1 TB GP SSD).
The keyspace RF equals 3.
Column family:
CREATE TABLE test_cf (
uid blob,
tuuid timeuuid,
cid text,
cuid blob,
PRIMARY KEY (uid, tuuid)
) WITH CLUSTERING ORDER BY (tuuid ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction ={'class':'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression ={'sstable_compression':'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 86400
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
nodetool gcstats (10.0.25.57):
Interval (ms)   Max GC Elapsed (ms)   Total GC Elapsed (ms)   Stdev GC Elapsed (ms)   GC Reclaimed (MB)   Collections   Direct Memory Bytes
      1208504                   368                    4559                      73        553798792712            58             305691840
nodetool gcstats (10.0.23.15):
Interval (ms)   Max GC Elapsed (ms)   Total GC Elapsed (ms)   Stdev GC Elapsed (ms)   GC Reclaimed (MB)   Collections   Direct Memory Bytes
      1445602                   369                    3120                      57        381929718000            38             277907601
nodetool gcstats (10.0.24.51):
Interval (ms)   Max GC Elapsed (ms)   Total GC Elapsed (ms)   Stdev GC Elapsed (ms)   GC Reclaimed (MB)   Collections   Direct Memory Bytes
      1174966                   397                    4137                      69       1900387479552            45             304448986
This could be due to a number of factors, both related and not related to Cassandra.
Non-Cassandra Specific
How does the hardware (CPU/RAM/disk type (SSD vs. rotational)) on this node compare to the other nodes?
How is the network configured? Is traffic to this node slower than to other nodes? Do you have a routing issue between the nodes?
How does the load on this server compare to the other nodes?
Cassandra Specific
Is the JVM properly configured? Is GC running significantly more frequently than the other nodes? Check nodetool gcstats on this and other nodes to compare.
Has compaction been run on this node recently? Check nodetool compactionhistory
Are there any issues with corrupted files on disk?
Have you checked the system.log to see if it contains any relevant information?
Besides general Linux troubleshooting, I would suggest comparing some of the specific C* functionality using nodetool and looking for differences:
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsNodetool_r.html
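One Cassandra-specific angle the question itself hints at: reads are fine with ORDER BY ASC (the table's clustering order) but slow with ORDER BY DESC and a large LIMIT, which forces a reversed slice over the partition. If DESC is the common access pattern, a variant of the table clustered in that order avoids the reversal. A sketch (not part of the original answer) reusing the schema from the question:
CREATE TABLE test_cf_by_tuuid_desc (
  uid blob,
  tuuid timeuuid,
  cid text,
  cuid blob,
  PRIMARY KEY (uid, tuuid)
) WITH CLUSTERING ORDER BY (tuuid DESC);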
