Query is slow - large result set - Cassandra

This table:
CREATE TABLE buckets (
    year_month text,
    source_id uuid,
    nano_since_epoch bigint,
    max double,
    min double,
    num_values int,
    sum double,
    value double,
    PRIMARY KEY ((year_month, source_id), nano_since_epoch)
)
and this query:
SELECT value FROM buckets WHERE year_month = '2020_9' AND source_id = 699ddcc0-5896-4b35-a901-8eec2f221499;
It works, but the query takes ~4 seconds to complete. The result set has ~48,000 rows.
The cluster has 3 nodes running on GCP, n1-standard-2 (2 vCPUs, 7.5 GB memory), not using SSDs. Performance is almost the same on the second DC, which has only one node on a much larger bare-metal machine.
Table histograms:
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         0.00      51.01          73.46         182785          17084
75%         0.00      61.21          182.79        1629722         126934
95%         3.00      182.79         654.95        4055269         315852
98%         3.00      263.21         785.94        4866323         454826
99%         4.00      315.85         943.13        5839588         454826
Min         0.00      4.77           20.50         43              0
Max         4.00      2816.16        943.13        5839588         454826
Is it possible to speed up fetching of the results, or is the size of the result set the limiting factor here?
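For reference, a minimal driver-side sketch of this read (not from the original post; assumptions: Go with gocql as in the related question below, a made-up keyspace name, and gocql's default page size of 5000, so raising it reduces round trips for a ~48,000-row partition):

package main

import (
	"fmt"
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("10.164.0.58") // seed address from the compose file below
	cluster.Keyspace = "mykeyspace"            // assumed keyspace name
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	sourceID, err := gocql.ParseUUID("699ddcc0-5896-4b35-a901-8eec2f221499")
	if err != nil {
		log.Fatal(err)
	}

	// Raise the page size so the ~48,000-row partition is fetched in a few
	// round trips instead of ~10 with the 5000-row default.
	iter := session.Query(
		`SELECT value FROM buckets WHERE year_month = ? AND source_id = ?`,
		"2020_9", sourceID,
	).PageSize(20000).Iter()

	var value float64
	count := 0
	for iter.Scan(&value) {
		count++
	}
	if err := iter.Close(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("rows fetched:", count)
}

Whether larger pages help depends on row size and heap pressure; on spinning disks, much of the 4 seconds is likely spent reading a multi-MB partition (see the histograms above), so faster storage or smaller partitions may matter more than driver settings.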
EDIT:
Cassandra is running in Docker on the VMs. Memory utilization on the VMs is ~50%.
docker-compose.yml:
services:
  elassandra:
    image: strapdata/elassandra:6.2.3.23
    container_name: elassandra
    environment:
      - CASSANDRA_BROADCAST_ADDRESS=10.164.0.58
      - CASSANDRA_SEEDS=10.164.0.58
      - CASSANDRA_CLUSTER_NAME=Prod Cluster 2
      - CASSANDRA_NUM_TOKENS=256
      - CASSANDRA_DC=datacenter-prod
      - CASSANDRA_RACK=rack1
      - LOCAL_JMX=no
    volumes:
      - '/opt/elassandra/data:/opt/elassandra/data'
      - '/var/log/cassandra:/var/log/cassandra'
      - '/var/lib/cassandra:/var/lib/cassandra'
    ports:
      - 7000:7000
      - 7001:7001
      - 7199:7199
      - 8080:8080
      - 8081:8081
      - 9042:9042
      - 9160:9160
      - 9200:9200
      - 9300:9300
    ulimits:
      memlock: -1
      nproc: 32768
      nofile: 100000
    network_mode: "host"
    logging:
      driver: syslog
      options:
        tag: cassandra-0

Related

Cassandra query on secondary index with pagination becomes slower as data grows

When I query a secondary index with pagination, the query becomes slower as the data grows.
I thought that with pagination, no matter how large the data grows, querying one page would take the same time. Is that true? Why does my query get slower?
My simplified table is
CREATE TABLE closed_executions (
    domain_id uuid,
    workflow_id text,
    start_time timestamp,
    workflow_type_name text,
    PRIMARY KEY ((domain_id), start_time)
) WITH CLUSTERING ORDER BY (start_time DESC)
  AND COMPACTION = {
    'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  }
  AND GC_GRACE_SECONDS = 172800;
And I create a secondary index as
CREATE INDEX closed_by_type ON closed_executions (workflow_type_name);
I query with the following CQL:
SELECT workflow_id, start_time, workflow_type_name
FROM closed_executions
WHERE domain_id = ?
AND start_time >= ?
AND start_time <= ?
AND workflow_type_name = ?
and this code:
query := v.session.Query(templateGetClosedWorkflowExecutionsByType,
    request.DomainUUID,
    common.UnixNanoToCQLTimestamp(request.EarliestStartTime),
    common.UnixNanoToCQLTimestamp(request.LatestStartTime),
    request.WorkflowTypeName).Consistency(gocql.One)
iter := query.PageSize(request.PageSize).PageState(request.NextPageToken).Iter()
// PageSize is 10, but could be thousands
Environment:
MacBook Pro
Cassandra: 3.11.0
GoCql: github.com/gocql/gocql master
Observation:
10K rows: within a second
100K rows: ~3 seconds
1M rows: ~17 seconds
Debug log:
INFO [ScheduledTasks:1] 2018-09-11 16:29:48,349 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
DEBUG [ScheduledTasks:1] 2018-09-11 16:29:48,357 MonitoringTask.java:173 - 1 operations were slow in the last 5005 msecs:
<SELECT * FROM cadence_visibility.closed_executions WHERE workflow_type_name = code.uber.internal/devexp/cadence-bench/load/basic.stressWorkflowExecute AND token(domain_id, domain_partition) >= token(d3138e78-abe7-48a0-adb9-8c466a9bb3fa, 0) AND token(domain_id, domain_partition) <= token(d3138e78-abe7-48a0-adb9-8c466a9bb3fa, 0) AND start_time >= 2018-09-11 16:29-0700 AND start_time <= 1969-12-31 16:00-0800 LIMIT 10>, time 2747 msec - slow timeout 500 msec
DEBUG [COMMIT-LOG-ALLOCATOR] 2018-09-11 16:31:47,774 AbstractCommitLogSegmentManager.java:107 - No segments in reserve; creating a fresh one
DEBUG [ScheduledTasks:1] 2018-09-11 16:40:22,922 ColumnFamilyStore.java:899 - Enqueuing flush of size_estimates: 23.997MiB (2%) on-heap, 0.000KiB (0%) off-heap
Related references (no answers to my questions):
https://lists.apache.org/thread.html/%3CCAAiKoBidknHVOz8oQQmncZFZHdFiDfW6HTs63vxXCOhisQYZgg#mail.gmail.com%3E
https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive
https://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/
-- Edit
tablestats returns
Total number of tables: 105
----------------
Keyspace : cadence_visibility
Read Count: 19
Read Latency: 0.5125263157894736 ms.
Write Count: 3220964
Write Latency: 0.04900822269357869 ms.
Pending Flushes: 0
Table: closed_executions
SSTable count: 1
SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 20.3 MiB
Space used (total): 20.3 MiB
Space used by snapshots (total): 0 bytes
Off heap memory used (total): 6.35 KiB
SSTable Compression Ratio: 0.40192660515179696
Number of keys (estimate): 3
Memtable cell count: 28667
Memtable data size: 7.35 MiB
Memtable off heap memory used: 0 bytes
Memtable switch count: 9
Local read count: 9
Local read latency: NaN ms
Local write count: 327024
Local write latency: NaN ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 16 bytes
Bloom filter off heap memory used: 8 bytes
Index summary off heap memory used: 38 bytes
Compression metadata off heap memory used: 6.3 KiB
Compacted partition minimum bytes: 150
Compacted partition maximum bytes: 62479625
Compacted partition mean bytes: 31239902
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0 bytes
----------------
Why doesn't pagination scale as it does on the main table?
The data in your secondary index is dispersed. Pagination only applies its logic until it fills the requested page; since the data is not clustered by time, Cassandra still has to sift through lots and lots of rows before it can find your first 10, for example. Query tracing does show that pagination comes into play only at a very late phase.
Why is the secondary index slow?
First, Cassandra reads the index table to retrieve the primary keys of all matching rows, and for each of them it reads the original table to fetch the data. This is a known anti-pattern with low-cardinality indexes (reference: https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive).
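The usual alternative (a sketch, assuming denormalization is acceptable; the table name is made up) is to maintain a second table whose partition key includes the workflow type, so a type-filtered, time-ordered page becomes a single clustered slice:

CREATE TABLE closed_executions_by_type (
    domain_id uuid,
    workflow_type_name text,
    start_time timestamp,
    workflow_id text,
    PRIMARY KEY ((domain_id, workflow_type_name), start_time)
) WITH CLUSTERING ORDER BY (start_time DESC);

With this layout, the WHERE clause of the original query maps onto one partition plus a clustering range on start_time, so page N costs about the same as page 1.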

Cassandra tombstones not deleted a month after actual record TTL

Running into an issue with DSE 4.7.
The tombstones are not being deleted even after compactions, cleanup, rebuild_index, and repair. Records have a 15-day TTL.
The sstablemetadata output suggests that about 90% of the data is droppable tombstones.
Any ideas?
sstablemetadata output
SSTable: ./abcd-abcd-ka-478675
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Minimum timestamp: 1527521280829593
Maximum timestamp: 1527596173976435
SSTable max local deletion time: 1528892173
Compression ratio: 0.36967428395684393
Estimated droppable tombstones: 0.9073013816277629
SSTable Level: 0
Repaired at: 0
ReplayPosition(segmentId=1520529283052, position=4626679)
Estimated tombstone drop times:%n
1528817679: 18318196
1528818619: 20753822
1528819513: 24176310
.
.
.
Count Row Size Cell Count
1 0 0
2 0 1752560
3 0 0
4 0 6355421
5 0 0
6 0 687302
7 0 0
8 0 529613
10 0 444801
12 0 410107
14 0 456011
17 0 1347893
20 0 184960
24 0 152814
.
.
.
770 1347893 137
924 184960 109
1109 220403 68
1331 121620 86
1597 2044030 102
1916 185601 195
2299 184816 158273
2759 868754 0
3311 62795 0
3973 1668 0
4768 2143 0
5722 1812541 0
6866 828 0
.
.
.
Ancestors: [476190, 474027, 475201, 478160]
Estimated cardinality: 20059264
Cassandra marks TTLed data with a tombstone after the requested amount of time has expired. A tombstone exists for gc_grace_seconds; after that, the tombstoned data is removed automatically during the normal compaction process.
You can try running a major compaction to evict the tombstones.
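For example, a hedged sketch (keyspace and table names are placeholders; unchecked_tombstone_compaction and tombstone_threshold are standard compaction sub-options):

# major compaction on the affected table
nodetool compact my_keyspace my_table

-- optionally make single-SSTable tombstone compactions more aggressive
ALTER TABLE my_keyspace.my_table
  WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'unchecked_tombstone_compaction': 'true',
                     'tombstone_threshold': '0.2'};

Note that a tombstone can only be dropped once gc_grace_seconds has passed and the SSTables holding the shadowed data take part in the same compaction.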
Tombstones get deleted during normal compaction, but you may still sometimes find stale tombstone data, even in production. One reason can be that one of the nodes in the cluster is down, so the tombstoned data cannot be dropped because of that node. Also, explicitly inserting null values into regular columns creates tombstones.
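For instance, a hypothetical illustration of the null-column case (made-up table and column names): binding NULL for a regular column writes a cell tombstone rather than "no value":

-- inserting null into 'payload' creates a tombstone for that cell
INSERT INTO ks.events (id, payload) VALUES (1, null);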

Cassandra query fails for some WHERE clauses

I have a large table that looks essentially like:
CREATE TABLE keyspace.table (
    node bigint,
    time bigint,
    core bigint,
    set bigint,
    value1 bigint,
    value2 bigint,
    PRIMARY KEY (node, time, core)
);
And a secondary index (possibly irrelevant) on column set.
When I do a stupidly simple query:
SELECT * FROM keyspace.table LIMIT 10;
It succeeds.
However, for some WHERE clauses, it fails, e.g.:
SELECT * FROM keyspace.table WHERE node = 12 LIMIT 10;
Traceback (most recent call last):
File "/usr/bin/cqlsh.py", line 1297, in perform_simple_statement
result = future.result()
File "/usr/share/cassandra/lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cluster.py", line 3122, in result
raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ONE" info={'required_replicas': 1, 'alive_replicas': 0, 'consistency': 'ONE'}
And nothing shows up in the cassandra system log.
Details
The datacenter looks like:
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns  Host ID                               Rack
UN  192.168.64.8   234.84 GB  256     ?     478f59e1-4797-45df-81d1-77559e393e8a  RAC1
DN  192.168.64.9   217.85 GB  256     ?     84d4bfd6-054a-433e-b6d4-3eead77609ac  RAC1
DN  192.168.64.10  208.99 GB  256     ?     c6ac565e-5897-4205-9439-779becf7fafe  RAC1
UN  192.168.64.11  225.69 GB  256     ?     4a941e8f-db80-48f3-8eca-c4430b795694  RAC1
DN  192.168.64.12  208.57 GB  256     ?     34e57e18-e8b9-40d6-85e8-40a309e91b49  RAC1
DN  192.168.64.13  240.22 GB  256     ?     7a312c4f-01c0-4ed4-be7f-417b8f14f940  RAC1
DN  192.168.64.4   208.5 GB   256     ?     41a49d5c-e569-46f3-9f0e-97de43a22690  RAC1
UN  192.168.64.5   213.19 GB  256     ?     b5bfb4e3-30a2-46b5-ba41-cf1a58a7355d  RAC1
UN  192.168.64.6   212.21 GB  256     ?     9f6781ca-09b7-4923-8fa1-5b91079e2a18  RAC1
UN  192.168.64.7   235.97 GB  256     ?     5f429e7b-2e16-4796-b746-834500aeb884  RAC1
The keyspace:
CREATE KEYSPACE keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC2': '2'} AND durable_writes = true;
This table is continuously ingesting 2000 rows per second using cassandra-loader and is fairly large. nodetool tablestats output (EDIT: tombstones cleaned up):
Keyspace: keyspace
Read Count: 0
Read Latency: NaN ms.
Write Count: 31945
Write Latency: 7.547574268273595 ms.
Pending Flushes: 0
Table: table
SSTable count: 9
Space used (live): 15414599520
Space used (total): 15414599520
Space used by snapshots (total): 0
Off heap memory used (total): 4506364
SSTable Compression Ratio: 0.41688805047558564
Number of keys (estimate): 251
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 3
Local read count: 0
Local read latency: NaN ms
Local write count: 31945
Local write latency: 8.261 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 1528
Bloom filter off heap memory used: 1456
Index summary off heap memory used: 260
Compression metadata off heap memory used: 4504648
Compacted partition minimum bytes: 2300
Compacted partition maximum bytes: 74975550
Compacted partition mean bytes: 36544449
Average live cells per slice (last five minutes): 1.0222222222222221
Maximum live cells per slice (last five minutes): 2
Average tombstones per slice (last five minutes): 1.0222222222222221
Maximum tombstones per slice (last five minutes): 2
Dropped Mutations: 0
Output with tracing enabled
cqlsh> TRACING ON;
Now Tracing is enabled
cqlsh> SELECT * FROM keyspace.table WHERE node = 1223 LIMIT 10;
Traceback (most recent call last):
File "/usr/bin/cqlsh.py", line 1297, in perform_simple_statement
result = future.result()
File "/usr/share/cassandra/lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cluster.py", line 3122, in result
raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ONE" info={'required_replicas': 1, 'alive_replicas': 0, 'consistency': 'ONE'}
Let me know any other details I can provide.
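For reference, one way to check which nodes hold the replicas for the failing partition (a sketch; substitute the real keyspace and table names) is nodetool getendpoints:

# prints the replica nodes that own partition key 'node = 12'
nodetool getendpoints keyspace table 12

With replication factor 2 and five of the ten nodes down, any partition whose two replicas both sit on down nodes cannot be read even at consistency ONE, which matches the "alive_replicas: 0" in the error.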

Sphinx claiming memory is too low and my ids are null

I am trying to index about 3,000 documents, but here is what I am getting:
[root@domU-12-31-39-0A-19-CB data]# /usr/local/sphinx/bin/indexer --all
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'catalog'...
WARNING: Attribute count is 0: switching to none docinfo
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 12288 kb
WARNING: source catalog: skipped 3558 document(s) with zero/NULL ids
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.040 sec, 0 bytes/sec, 0.00 docs/sec
total 1 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 5 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
I have rt_mem_limit = 512M set, so why is it telling me I don't have enough memory?
rt_mem_limit != mem_limit - they are different settings, with different purposes.
mem_limit is the value used by indexer during indexing:
http://sphinxsearch.com/docs/current.html#conf-mem-limit
It lives in the 'indexer' section of your config file. You must have it set too low. Either leave it out entirely (to use the 32M default) or change it to a better value; see the sketch below.
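A minimal sketch of the relevant config section (the 256M value is just an illustration):

indexer
{
    # RAM the indexer may use while building indexes; defaults to 32M if omitted
    mem_limit = 256M
}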
But you also have no document IDs in your dataset. Check that your sql_query actually works; a sketch of the expected shape follows.
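For reference, a hedged sketch of a source definition; Sphinx requires the first column of sql_query to be a unique, non-NULL document ID, and rows with zero/NULL ids are skipped, as in the warning above (database, table, and column names are assumptions):

source catalog
{
    type     = mysql
    sql_host = localhost
    sql_user = user
    sql_pass = pass
    sql_db   = mydb

    # the first selected column must be a unique, non-NULL document id
    sql_query = SELECT id, title, content FROM catalog WHERE id IS NOT NULL
}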

phpcassa get_range is too slow

I have a CF with 1280 rows.
Each row has 6 columns. I'm trying $cf->get_range('pq_questions', '', '', 1200) and it gets all rows, but too slowly (about 4-6 seconds).
Column Family: pq_questions
SSTable count: 1
Space used (live): 668363
Space used (total): 668363
Number of Keys (estimate): 1280
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 1000
Key cache hit rate: 0.10998439937597504
Row cache capacity: 1000
Row cache size: 1000
Row cache hit rate: 0.0
Compacted row minimum size: 373
Compacted row maximum size: 1331
Compacted row mean size: 574
It is strange, but the read latency in cfstats is NaN ms.
When I run htop on Debian, I see that most of the load comes from phpcassa.
I have only one node and use consistency level ONE.
What can cause such slow querying?
I'm guessing you don't have the C extension installed. Without it, a similar query takes 1-2 seconds for me. With it installed, the same query takes about 0.2 seconds.
Regarding the NaN read latency, latencies aren't captured for get_range_slices (get_range in phpcassa).
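For reference, a hedged way to check from the shell whether the Thrift native extension (thrift_protocol) that phpcassa can use for faster serialization is loaded:

# prints bool(true) when the C extension is available
php -r "var_dump(extension_loaded('thrift_protocol'));"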
