TTL-expired sstables not getting dropped on Cassandra nodes

I am inserting data into the table below with a TTL of 2 days. SSTables whose newest data is older than 2 days should be dropped, but that is not happening. There are no deletes and no updates. The compaction window size is 1 hour and gc_grace_seconds is 7200. read_repair_chance has been set to 0.0.
CREATE TABLE events."290" (
key text PRIMARY KEY,
raw_log text
) WITH COMPACT STORAGE
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'HOURS', 'max_threshold': '64', 'min_threshold': '32'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.SnappyCompressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 7200
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = 'NONE';
On restarting cassandra the expired sstables got removed as expected.
Should I be setting unchecked_tombstone_compaction = true?
Ideally I would like the sstables to be dropped completely rather than go through multiple compactions to get removed.
Earlier I was using DateTieredCompactionStrategy on Cassandra 2.2 and everything worked fine. After upgrading to Cassandra 3.0.13 I observed the behaviour above. Changing the compaction strategy from DTCS to TWCS made no difference.
If the expired sstables had also not been removed on restart, I would have considered the overlapping-sstables case: https://issues.apache.org/jira/browse/CASSANDRA-13418
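For reference, a TWCS SSTable in which every cell carries a TTL can only be dropped wholesale once its newest cell has expired and gc_grace_seconds has passed on top of that (and no older overlapping SSTable blocks the drop). A minimal sketch of that timing, as a simplified model rather than Cassandra's actual implementation:

```python
# Sketch: earliest time a fully TTL'ed SSTable becomes droppable.
# Simplified model of the TWCS "fully expired" rule; not Cassandra's code.

def droppable_at(newest_write_ts: int, ttl_seconds: int, gc_grace_seconds: int) -> int:
    """Earliest epoch-seconds timestamp at which the whole SSTable can be
    dropped, assuming no overlapping older SSTables hold it back."""
    # Each cell expires at write time + TTL; the resulting tombstone must
    # then survive gc_grace_seconds before it may be purged.
    return newest_write_ts + ttl_seconds + gc_grace_seconds

# 2-day TTL and gc_grace_seconds = 7200, as in the schema above:
t = droppable_at(newest_write_ts=0, ttl_seconds=2 * 86400, gc_grace_seconds=7200)
print(t)  # 180000 seconds, i.e. 2 days + 2 hours after the newest write
```

So with these settings an SSTable should become droppable roughly two hours after its newest cell's TTL expires, provided no overlap exists.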

Related

Reduce gc_grace_seconds to 0 for TTL'ed data in Cassandra

Does it make sense to reduce gc_grace_seconds to 0 (or some other very low number) if a table only contains TTL'ed data (with no manual deletes)? The table has default_time_to_live set to 30 days. Also, as mentioned here:
In a single-node cluster, this property can safely be set to zero. You can also reduce this value for tables whose data is not explicitly deleted, for example tables containing only data with TTL set…
More details of the schema.
CREATE TABLE Foo (
user_uuid uuid,
ts bigint,
... //skipped a few columns
PRIMARY KEY (user_uuid, ts, event_uuid)
) WITH CLUSTERING ORDER BY (ts DESC, event_uuid ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '24', 'compaction_window_unit': 'HOURS', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 2592000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
You need to be careful: with gc_grace_seconds set to 0 you effectively disable hint collection, so if a node is down even for 5 minutes, you'll need to run a repair. In Cassandra 3.0, hints obey gc_grace_seconds: if it is shorter than the hint window (max_hint_window_in_ms), hints will be collected only for that shorter period. But you can reduce the value to several hours if necessary, as hinted in the linked documentation.
See this very good blog post on that topic.
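The interaction described above can be summarized as: the effective hint window for a table is the smaller of the cluster-wide hint window and the table's gc_grace_seconds. A minimal illustration of that rule (simplified; not Cassandra's actual implementation):

```python
# Sketch of the cap described above: in Cassandra 3.0 the time hints are
# stored for a down node is bounded by the table's gc_grace_seconds.
# Illustration only, not Cassandra's actual code.

def effective_hint_window(max_hint_window_s: int, gc_grace_seconds: int) -> int:
    """Seconds during which hints are collected for a down replica."""
    return min(max_hint_window_s, gc_grace_seconds)

# Default max_hint_window_in_ms is 3 hours; gc_grace reduced to 1 hour:
print(effective_hint_window(3 * 3600, 3600))  # 3600 -> hints kept for only 1 hour
```

This is why dropping gc_grace_seconds to a few hours is a safer compromise than 0: hints still cover short outages.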

cassandra read high iowait

(Screenshot in the original question: data read per second.)
I have a three-node Cassandra cluster. When I query the cluster from multiple threads, the I/O load is very high. Each node holds about 80 GB of data. I use TimeWindowCompactionStrategy with a 10-hour window, and one SSTable is about 1 GB. Can somebody help me with this? Thank you. (Screenshot in the original question: one SSTable's information.)
Data is written at a rate of about 10,000 records per second, and the cluster holds about 10 billion records.
below is the schema information
CREATE TABLE point_warehouse.point_period (
point_name text,
year text,
time timestamp,
period int,
time_end timestamp,
value text,
PRIMARY KEY ((point_name, year), time)
) WITH CLUSTERING ORDER BY (time DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '10', 'compaction_window_unit': 'HOURS', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 2592000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
The query is: SELECT * FROM point_period WHERE point_name = ? AND year = '2017' AND time > '2017-05-23 12:53:24' ORDER BY time ASC LIMIT 1 ALLOW FILTERING
When this query is executed concurrently, the I/O load becomes extremely high, around 200 MB/s. Thank you.
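One back-of-the-envelope check of the settings above: with a 10-hour TWCS window and default_time_to_live of 2592000 seconds (30 days), a partition that is written to continuously can have fragments in one SSTable per live window, so a read of that partition may have to touch dozens of SSTables. A rough count, as an illustration of the arithmetic rather than a statement about this specific cluster:

```python
# Rough sketch: how many TWCS windows (and hence potential SSTables per
# partition) live data can span with the settings from the schema above.

def live_windows(ttl_seconds: int, window_seconds: int) -> int:
    # TWCS compacts each window into (ideally) one SSTable; a partition
    # written continuously can leave a fragment in every live window.
    return -(-ttl_seconds // window_seconds)  # ceiling division

# default_time_to_live = 2592000 s (30 days), window = 10 hours:
print(live_windows(2_592_000, 10 * 3600))  # 72 SSTables a hot partition may span
```

Since the partition key here is (point_name, year), a whole year of a point's data shares one partition, which makes this read amplification (and the resulting iowait) worse.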

Connection timed out while dropping keyspace

I'm having trouble deleting a keyspace.
The keyspace in question has 4 tables similar to this one:
CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = false;
CREATE TABLE demo.t1 (
t11 int,
t12 timeuuid,
t13 int,
t14 text,
t15 text,
t16 boolean,
t17 boolean,
t18 int,
t19 timeuuid,
t110 text,
PRIMARY KEY (t11, t12)
) WITH CLUSTERING ORDER BY (t12 DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX t1_idx ON demo.t1 (t14);
CREATE INDEX t1_deleted_idx ON demo.t2 (t15);
When I want to delete the keyspace using the command:
Session session = cluster.connect();
PreparedStatement prepared = session.prepare("drop keyspace if exists " + schemaName);
BoundStatement bound = prepared.bind();
session.execute(bound);
Then the query gets timed out (or takes over 10 seconds to execute), even when the tables are empty:
com.datastax.driver.core.exceptions.OperationTimedOutException: [/192.168.0.1:9042] Timed out waiting for server response
at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:44)
at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:26)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:64)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39)
I tried this on multiple machines and the result was the same. I'm using Cassandra 3.9. A similar thing happens using cqlsh. I know that I can increase the read timeout in the cassandra.yaml file, but how can I make this dropping faster? Another thing is that if I do two consecutive requests, the first one gets timed out and the second one goes through fast.
Try running it with an increased timeout:
cqlsh --request-timeout=3600 (value in seconds; the default is 10 seconds)
There should also be an equivalent setting at the driver level. Review the timeout section at this link:
http://docs.datastax.com/en/developer/java-driver/3.1/manual/socket_options/
Increasing the timeout just hides the issue and is usually a bad idea. Have a look at this answer: https://stackoverflow.com/a/16618888/7413631

Cassandra 3.10 DateTieredCompactionStrategy options for our environment

Our Cassandra 3.10 table runs with the options below. My question is about the optimal DateTieredCompactionStrategy option values for our environment.
WRITE about 1000 events per second.
Data is stored for 10 years.
Queries for recent data and aggregation queries by time window are frequent.
Updates are not frequent.
Current table options:
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'max_sstable_age_days': '365', 'base_time_seconds': '3600', 'max_threshold': '32', 'timestamp_resolution': 'MILLISECONDS', 'enabled': 'true', 'tombstone_compaction_interval': '1', 'min_threshold': '4', 'tombstone_threshold': '.1', 'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 315360000
AND gc_grace_seconds = 60
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0
AND speculative_retry = 'NONE';
Please suggest appropriate option values.
DTCS has many problems, which is why TWCS was developed.
Please consider moving to that compaction strategy.
Here's an article about TWCS and its benefits, including an explanation of its parameters:
http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
About your DTCS parameters:
'base_time_seconds': '3600'
This can result in large sstables. If you stay with DTCS, consider lowering it based on your record size and throughput.
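If you do switch to TWCS, a common rule of thumb (discussed in the linked article) is to size the window so the retention period spans only a few dozen windows, keeping the SSTable count per partition manageable. A quick sketch of that sizing arithmetic, with the target window count as an illustrative assumption:

```python
# Sketch: choose a TWCS window size so the retention period is covered by
# roughly `target_windows` windows. The target of ~30 is a rule of thumb
# from the linked TWCS article, not a hard requirement.

def window_hours(retention_days: int, target_windows: int = 30) -> float:
    return retention_days * 24 / target_windows

# 10-year retention, as described in the question above:
print(round(window_hours(3650)))  # ~2920 hours, i.e. roughly 4 months per window
```

A window that coarse suggests 10-year retention in a single TWCS table is awkward; splitting data into per-period tables, or accepting a larger window count, are the usual trade-offs to weigh.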

Cassandra: Fetching data but not populating cache as query does not query from the start of the partition

I'm running Cassandra version 3.3 on a fairly beefy machine. I would like to try out the row cache, so I have allocated 2 GB of RAM for row caching and configured the target tables to cache a number of their rows.
If I run a query twice on a very small table (under 1 MB) with tracing on, I see a cache hit on the second query. However, when I query a large table (34 GB), I only get cache misses, and I see this message after every miss:
Fetching data but not populating cache as query does not query from the start of the partition
What does this mean? Do I need a bigger row cache to be able to handle a 34 GB table with 90 million keys?
Taking a look at the row cache source code on github, I see that clusteringIndexFilter().isHeadFilter() must be evaluating to false in this case. Is this a function of my partitions being too big?
My schema is:
CREATE TABLE ap.account (
email text PRIMARY KEY,
added_at timestamp,
data map<int, int>
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': '100000'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
The query is simply SELECT * FROM account WHERE email = 'sample@test.com'
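Separately from the head-filter question, a quick sizing check on the numbers in the question shows the cache could not hold the whole table anyway: 2 GB spread across 90 million keys leaves only a few dozen bytes per row. A sketch of that arithmetic, using the figures stated above:

```python
# Rough sketch: row-cache bytes available per key if the entire table were
# cached. Figures (2 GB cache, 90 million keys) are from the question above.

def cache_bytes_per_key(cache_bytes: int, keys: int) -> float:
    return cache_bytes / keys

per_key = cache_bytes_per_key(2 * 1024**3, 90_000_000)
print(round(per_key, 1))  # ~23.9 bytes per key: far too little to cache every row
```

So even if every read populated the cache, only a small fraction of the 34 GB table could ever be resident at once.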