Cassandra READ requests dropping

I have a column family as follows:
CREATE TABLE keyspace.tableName (
key text PRIMARY KEY,
A blob,
B text,
C text,
D blob,
"E" text
H int
) WITH COMPACT STORAGE
AND bloom_filter_fp_chance = 0.2
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.SnappyCompressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 7776000
AND gc_grace_seconds = 3600
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.1
AND speculative_retry = '99.0PERCENTILE';
While making read requests with the following query:
SELECT A, B, C, D, E, WRITETIME(A), WRITETIME(B) FROM TABLE_NAME WHERE KEY = ?;
I'm getting the following exception:
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)
at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:88)
This is a huge cluster handling billions of requests per day, and I only get this exception on less than 0.1% of requests; my 99th percentile latency is fine. GC pauses are under 1 second, but I do see some READ messages being dropped in tpstats. Median SSTable count is less than 3. My queries are such that I don't read the same key again for some time, so my key cache hit rate is less than 10%. My read request timeout is 200ms.
Could anyone please give me any pointers on how to debug this?
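One client-side angle worth checking while debugging: whether the 200ms budget leaves any room for retries. This is only a hedged sketch, assuming the DataStax Java driver 3.x (the package in the stack trace); the contact point, keyspace name, and bound key are placeholders, and the timeout/speculative-execution values are illustrative, not recommendations.
// Hedged sketch: raise the client read timeout and allow speculative executions so a
// rare slow replica does not surface as a ReadTimeoutException. Assumes driver 3.x.
import com.datastax.driver.core.*;
import com.datastax.driver.core.policies.ConstantSpeculativeExecutionPolicy;

public class ReadTimeoutTuning {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")                      // placeholder contact point
                .withSocketOptions(new SocketOptions()
                        .setReadTimeoutMillis(500))               // client-side socket read timeout
                .withSpeculativeExecutionPolicy(
                        new ConstantSpeculativeExecutionPolicy(
                                100,                              // launch a second attempt after 100 ms
                                2))                               // at most 2 extra attempts
                .build();
        Session session = cluster.connect("keyspace");

        PreparedStatement ps = session.prepare(
                "SELECT A, B, C, D, E, WRITETIME(A), WRITETIME(B) FROM tableName WHERE key = ?");
        BoundStatement bound = ps.bind("some-key");
        bound.setIdempotent(true);                                // speculative retries only fire for idempotent statements
        Row row = session.execute(bound).one();
        System.out.println(row);
        cluster.close();
    }
}
Whether speculative executions are acceptable depends on your latency budget and on the reads being idempotent; they only mask slow replicas, they do not explain the dropped READ messages in tpstats.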

Related

Cassandra table did not get compacted in 2 years?

I have the following table definition:
CREATE TABLE snap_websites.backend (
key blob,
column1 blob,
value blob,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (column1 ASC)
AND bloom_filter_fp_chance = 0.100000001
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = 'backend'
AND compaction = {'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', 'max_threshold': '10', 'min_threshold': '4', 'tombstone_threshold': '0.02'}
AND compression = {'enabled': 'false'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 3600
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 3600000
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Looking at the compaction setup, it seems that it should get compacted once in a while... However, after about 2 years the table was really slow on a SELECT and I could see 12,281 files in the corresponding data folder! I only checked one node; I would imagine that all the nodes had similar piles of files.
Why does that happen? Could it be because I never give Cassandra a break, so it never really gets time to run the compaction? (i.e. I pretty much always have some process running against that table, but I did not expect things to get this bad! Wow!)
The command line worked well:
nodetool compact snapwebsites backend
and the number of files went all the way down to 9 (after all, I have just 2 lines of data in that table at the moment!)
What I really need to know is: what is preventing Cassandra from running the compaction process?
I don't remember much about DTCS, but if you can, I'd consider switching to TWCS (TimeWindowCompactionStrategy) to replace it. It works well for time series data (DTCS was mentioned to be going away in the near future).
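For reference, the switch is a single ALTER TABLE. A hedged sketch via the Java driver, assuming a Cassandra version that ships TimeWindowCompactionStrategy (3.0.8 / 3.8 or later); the table name is taken from the CREATE TABLE above and the one-day window is an illustrative choice, not a recommendation for this exact workload.
// Hedged sketch: switch the table from DTCS to TWCS. Window size is illustrative.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SwitchToTwcs {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            session.execute(
                "ALTER TABLE snap_websites.backend WITH compaction = {"
              + " 'class': 'TimeWindowCompactionStrategy',"
              + " 'compaction_window_unit': 'DAYS',"
              + " 'compaction_window_size': '1' }");
        }
    }
}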

Executing a LOGGED BATCH warning in Cassandra logs

Our Java application is doing batch inserts on one of the tables. That table's schema is something like:
CREATE TABLE "My_KeySpace"."my_table" (
key text,
column1 varint,
column2 bigint,
column3 text,
column4 boolean,
value blob,
PRIMARY KEY (key, column1, column2, column3, column4)
) WITH CLUSTERING ORDER BY ( column1 DESC, column2 DESC, column3 ASC, column4 ASC )
AND COMPACT STORAGE
AND bloom_filter_fp_chance = 0.1
AND comment = ''
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.1
AND speculative_retry = 'NONE'
AND caching = {
'keys' : 'ALL',
'rows_per_partition' : 'NONE'
}
AND compression = {
'chunk_length_in_kb' : 64,
'class' : 'LZ4Compressor',
'enabled' : true
}
AND compaction = {
'class' : 'LeveledCompactionStrategy',
'sstable_size_in_mb' : 5
};
gc_grace_seconds = 0 in the above schema. Because of this I am getting the following warning:
2019-02-05 01:59:53.087 WARN [SharedPool-Worker-5 - org.apache.cassandra.cql3.statements.BatchStatement:97] Executing a LOGGED BATCH on table [My_KeySpace.my_table], configured with a gc_grace_seconds of 0. The gc_grace_seconds is used to TTL batchlog entries, so setting gc_grace_seconds too low on tables involved in an atomic batch might cause batchlog entries to expire before being replayed.
I have looked at the Cassandra code; this warning is emitted for obvious reasons at this line.
Is there any solution that doesn't require changing the batch code in the application?
Should I increase gc_grace_seconds?
In Cassandra, batches aren't the way to optimize inserts into the database - they are mostly used for coordinating writes into multiple tables, etc. If you're using batches for inserts into multiple partitions, you'll get even worse performance.
You can get better throughput for inserts by executing commands asynchronously (via executeAsync), and/or by using batches only for inserts that target the same partition.
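A hedged sketch of the executeAsync approach, assuming the DataStax Java driver 3.x; the INSERT matches the my_table schema above (varint binds as BigInteger), while the contact point, the sample values, and the in-flight cap are illustrative assumptions.
// Hedged sketch: asynchronous single-partition inserts instead of a LOGGED BATCH.
import com.datastax.driver.core.*;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.concurrent.Semaphore;

public class AsyncInserts {
    public static void main(String[] args) throws InterruptedException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("\"My_KeySpace\"");   // keyspace name is case-sensitive, so quote it

        PreparedStatement insert = session.prepare(
                "INSERT INTO my_table (key, column1, column2, column3, column4, value) "
              + "VALUES (?, ?, ?, ?, ?, ?)");

        Semaphore inFlight = new Semaphore(128);                // cap concurrent async requests
        for (int i = 0; i < 10_000; i++) {
            inFlight.acquire();
            ResultSetFuture f = session.executeAsync(insert.bind(
                    "key-" + (i % 10), BigInteger.valueOf(i), (long) i,
                    "c3", true, ByteBuffer.wrap(new byte[] {1, 2, 3})));
            f.addListener(inFlight::release, Runnable::run);    // free the slot when the write completes
        }
        cluster.close();                                        // blocks until shutdown completes
    }
}
The semaphore is just one simple way to bound concurrency so the cluster isn't flooded; with this pattern the batchlog (and hence the gc_grace_seconds warning) is not involved at all.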

[Cassandra] Filter at row key level in Cassandra query in Spark job causing over-utilization of CPU

In my Spark job I am reading data from Cassandra using the Java Cassandra util. My query looks like:
JavaRDD<CassandraRow> cassandraRDD = functions.cassandraTable("keyspace", "column_family")
        .select("timeline_id", "shopper_id", "product_id")
        .where("action=?", "Viewed");
My row key (partition key) is the action column. When I run my Spark job it causes over-utilization of CPU, but when I remove the filter on the action column it works fine.
Please find below the CREATE TABLE script for the column family:
CREATE TABLE keyspace.column_family (
action text,
timeline_id timeuuid,
shopper_id text,
product_id text,
publisher_id text,
referer text,
remote_ip text,
seed_product text,
strategy text,
user_agent text,
PRIMARY KEY (action, timeline_id, shopper_id)
) WITH CLUSTERING ORDER BY (timeline_id DESC, shopper_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
What I am suspecting is that since action is the row key, all data is getting served from a single node (hot spot), and that's why that node's CPU might be shooting up. Also, while reading, only a single RDD partition is getting created in the Spark job. Any help will be appreciated.
OK, you're having a data model issue here. action = partition key, so all rows with the same action are stored in a single partition = (one node + replicas).
How many distinct actions do you have in total? Your intuition about the hotspot is justified.
You probably need a different partition key OR you need to add an extra column to the partition key to let Cassandra distribute the data evenly across the cluster (a sketch follows below the link).
Read this blog post : http://www.planetcassandra.org/blog/the-most-important-thing-to-know-in-cassandra-data-modeling-the-primary-key/
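A hedged sketch of the second option, adding a synthetic bucket column to the partition key; the table and column names mirror the question's placeholders, and the bucket count and hashing choice are illustrative assumptions, not a prescription.
// Hedged sketch: spread rows for the same action over BUCKETS sub-partitions.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class BucketedActionTable {
    static final int BUCKETS = 32;                      // illustrative spread factor

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            session.execute(
                "CREATE TABLE IF NOT EXISTS keyspace.column_family_bucketed ("
              + "  action text, bucket int, timeline_id timeuuid, shopper_id text,"
              + "  product_id text,"
              + "  PRIMARY KEY ((action, bucket), timeline_id, shopper_id)"
              + ") WITH CLUSTERING ORDER BY (timeline_id DESC, shopper_id ASC)");
        }
    }

    // Writers derive the bucket from a value already in the row (here the shopper id);
    // readers either target one bucket or fan out over all BUCKETS sub-partitions.
    static int bucketFor(String shopperId) {
        return Math.floorMod(shopperId.hashCode(), BUCKETS);
    }
}
With a layout like this, the Spark read for a single action becomes a union over the buckets instead of one giant partition served by a single replica set, and the RDD gets more than one partition.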

Cassandra - Poor Query Performance using NetworkTopologyStrategy

Having performance issues using NetworkTopologyStrategy on a production keyspace with replication factor 4 across multiple datacenters (DCs located in 4 worldwide locations). Each DC has 3 nodes with pretty good hardware (70GB RAM, 5TB SSDs, etc.).
The same keyspace performs well with SimpleStrategy on a 4-node cluster in AWS, but running the same queries in the production environment results in poor query times (select * from my_table is 6ms in AWS and 271ms in production).
Table "my_table" (name changed for privacy) is defined as:
CREATE TABLE my_table (
rec_type text,
group_id int,
rec_id timeuuid,
user_id int,
content text,
created_on timestamp,
PRIMARY KEY ((rec_type, group_id), rec_id)
) WITH bloom_filter_fp_chance = 0.1
AND comment = ''
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE'
AND caching = {
'keys' : 'ALL',
'rows_per_partition' : 'NONE'
}
AND compression = {
'sstable_compression' : 'LZ4Compressor'
}
AND compaction = {
'class' : 'LeveledCompactionStrategy'
};
This is a newly-created table in Production with occasional updates and low tombstone count.
Query trace is below:
It looks like range requests are taking the most time. What would be the cause of the delay?
Network latency between nodes in the same DC is <1ms and latency between DCs is around 50-60ms.
EDIT:
Below is the query trace for a select * from my_table where rec_type = 'abc' and group = 1 LIMIT 300 query (2 screenshots because trace is so long):

OperationTimedOut during cqlsh alter table

I am receiving an OperationTimedOut error while running an ALTER TABLE command in cqlsh. How is that possible? Since this is just a table metadata update, shouldn't this operation run almost instantaneously?
Specifically, this is an excerpt from my cqlsh session
cqlsh:metric> alter table metric with gc_grace_seconds = 86400;
OperationTimedOut: errors={}, last_host=sandbox73vm230
The metric table currently has a gc_grace_seconds of 864000. I am seeing this behavior in a 2-node cluster and in a 6-node 2-datacenter cluster. My nodes seem to be communicating fine in general (e.g. I can insert in one and read from the other). Here is the full table definition (a cyanite 0.1.3 schema with DateTieredCompactionStrategy, clustering and caching changes):
CREATE TABLE metric.metric (
tenant text,
period int,
rollup int,
path text,
time bigint,
data list<double>,
PRIMARY KEY ((tenant, period, rollup, path), time)
) WITH CLUSTERING ORDER BY (time ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'timestamp_resolution': 'SECONDS', 'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = 'NONE';
I realize at this point the question is pretty old, and you may have either figured out the answer or otherwise moved on, but I wanted to post this in case others stumble upon it.
The default cqlsh request timeout is 10 seconds. You can adjust this by starting up cqlsh with the --request-timeout option set to some value that allows your ALTER TABLE to run to completion, e.g.:
cqlsh --request-timeout=1000000
