I'm having trouble deleting a keyspace.
The keyspace in question has 4 tables similar to this one:
CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = false;
CREATE TABLE demo.t1 (
t11 int,
t12 timeuuid,
t13 int,
t14 text,
t15 text,
t16 boolean,
t17 boolean,
t18 int,
t19 timeuuid,
t110 text,
PRIMARY KEY (t11, t12)
) WITH CLUSTERING ORDER BY (t12 DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX t1_idx ON demo.t1 (t14);
CREATE INDEX t1_deleted_idx ON demo.t1 (t15);
When I try to drop the keyspace with the following code:
Session session = cluster.connect();
PreparedStatement prepared = session.prepare("drop keyspace if exists " + schemaName);
BoundStatement bound = prepared.bind();
session.execute(bound);
the query times out (or takes over 10 seconds to execute), even when the tables are empty:
com.datastax.driver.core.exceptions.OperationTimedOutException: [/192.168.0.1:9042] Timed out waiting for server response
at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:44)
at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:26)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:64)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39)
I tried this on multiple machines with the same result. I'm using Cassandra 3.9, and the same thing happens from cqlsh. I know I can increase the read timeout in cassandra.yaml, but how can I make the drop itself faster? Also, if I issue two consecutive drop requests, the first one times out and the second one completes quickly.
Try running it with an increased timeout:
cqlsh --request-timeout=3600 (in seconds; the default is 10 seconds)
There should be a similar setting at the driver level. Review the timeout section at this link:
http://docs.datastax.com/en/developer/java-driver/3.1/manual/socket_options/
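At the driver level (DataStax Java driver 3.x), the matching knob is the socket read timeout; a minimal configuration sketch, with the contact point and the 60-second value as placeholder examples only:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.SocketOptions;

// Raise the per-request read timeout (the 3.x driver default is 12 s).
Cluster cluster = Cluster.builder()
        .addContactPoint("192.168.0.1")
        .withSocketOptions(new SocketOptions().setReadTimeoutMillis(60000))
        .build();
```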
Increasing the timeout just hides the issue and is usually a bad idea. Have a look at this answer: https://stackoverflow.com/a/16618888/7413631
Does it make sense to reduce gc_grace_seconds to 0 (or some other really low value) if the table only contains TTL'ed data (with no manual deletes)? The table has default_time_to_live set to 30 days. Also, as mentioned here:
In a single-node cluster, this property can safely be set to zero. You
can also reduce this value for tables whose data is not explicitly
deleted — for example, tables containing only data with TTL set,
More details of the schema:
CREATE TABLE Foo (
user_uuid uuid,
ts bigint,
... //skipped a few columns
PRIMARY KEY (user_uuid, ts, event_uuid)
) WITH CLUSTERING ORDER BY (ts DESC, event_uuid ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '24', 'compaction_window_unit': 'HOURS', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 2592000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
You need to be careful: with gc_grace_seconds = 0 you effectively disable hint collection, so if a node is down for even 5 minutes you'll need to run a repair. In Cassandra 3.0+, hints obey gc_grace_seconds, and if it's shorter than max_hint_window_in_ms the hints will be collected for that shorter period only. But you can reduce this value to several hours if necessary, as hinted in the linked documentation.
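If you do decide to lower it for that table, it's a one-line schema change. The 6-hour value below is just an example; keep it longer than your worst-case node outage/repair window:

```sql
-- Example only: shrink gc_grace_seconds to 6 hours (21600 s) for a TTL-only table.
ALTER TABLE Foo WITH gc_grace_seconds = 21600;
```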
See this very good blog post on that topic.
I need to select a distinct count from a table in Cassandra.
As I understand it, a direct distinct count is not supported in Cassandra, and neither are nested queries as in an RDBMS:
select count(*) from (select distinct key_part_one from stackoverflow_composite) as count;
SyntaxException: line 1:21 no viable alternative at input '(' (select count(*) from [(]...)
What are the ways to get it? Can I get it directly from Cassandra, or do I need to use add-on tools/languages?
Below is my CREATE TABLE statement:
CREATE TABLE nishant_ana.ais_profile_table (
profile_key text,
profile_id text,
last_update_day date,
last_transaction_timestamp timestamp,
last_update_insertion_timestamp timeuuid,
profile_data blob,
PRIMARY KEY ((profile_key, profile_id), last_update_day)
) WITH CLUSTERING ORDER BY (last_update_day DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
I have just started using Cassandra.
From Cassandra itself you can only do SELECT DISTINCT partition_key FROM ... (and DISTINCT works only on the full partition key, which here is (profile_key, profile_id)).
If you need something like this, you can use Spark with the Spark Cassandra Connector. It will work, but don't expect real-time answers, since it has to read the necessary data from all nodes and then compute the result.
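For modest tables, a client-side sketch with the Java driver (3.x) can also work: stream SELECT DISTINCT over the partition key and count rows as the driver pages through them. Keyspace/table names are taken from the question; the contact point is a placeholder:

```java
import com.datastax.driver.core.*;

try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
     Session session = cluster.connect("nishant_ana")) {
    // DISTINCT is only allowed on the full partition key.
    Statement stmt = new SimpleStatement(
            "SELECT DISTINCT profile_key, profile_id FROM ais_profile_table")
            .setFetchSize(1000); // fetch in pages rather than all at once
    long count = 0;
    for (Row ignored : session.execute(stmt)) {
        count++;
    }
    System.out.println("distinct partitions: " + count);
}
```

This pulls every partition key to the client, so it is only reasonable when the number of partitions is manageable; for billions of partitions, Spark is the better fit.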
I am inserting data into the table below with a TTL of 2 days. SSTables whose data is older than 2 days should have been dropped, but that is not happening. There are no deletes and no updates. The compaction window size is 1 hour and gc_grace_seconds is 7200; read_repair_chance is set to 0.0.
CREATE TABLE events."290" (
key text PRIMARY KEY,
raw_log text
) WITH COMPACT STORAGE
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'HOURS', 'max_threshold': '64', 'min_threshold': '32'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.SnappyCompressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 7200
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = 'NONE';
On restarting Cassandra, the expired SSTables got removed as expected.
Should I be setting unchecked_tombstone_compaction = true?
Ideally I would like the SSTables to be dropped wholesale rather than go through multiple compactions to get removed.
Earlier I was using DateTieredCompactionStrategy on Cassandra 2.2 and everything worked fine. After upgrading to Cassandra 3.0.13 I observed the behaviour above; changing compaction from DTCS to TWCS made no difference.
If the expired SSTables had not been removed on restart, I would have considered the case of overlapping SSTables: https://issues.apache.org/jira/browse/CASSANDRA-13418
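For reference, enabling the sub-property asked about above is a schema change. Note that setting compaction replaces the whole option map, so the existing TWCS options have to be restated (a sketch based on the table definition above; whether it helps depends on the overlap situation):

```sql
-- Sketch: allow single-SSTable tombstone compactions despite overlapping SSTables.
ALTER TABLE events."290"
  WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_size': '1',
                     'compaction_window_unit': 'HOURS',
                     'max_threshold': '64',
                     'min_threshold': '32',
                     'unchecked_tombstone_compaction': 'true'};
```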
I have a three-node Cassandra cluster. When multiple threads query the cluster, the IO load becomes very high. Each node holds about 80 GB of data. I use TimeWindowCompactionStrategy with a 10-hour window, and one SSTable is about 1 GB. Data arrives at about 10,000 records per second, and the cluster holds about 10 billion records. Can somebody help me with this? Thank you.
Below is the schema information:
CREATE TABLE point_warehouse.point_period (
point_name text,
year text,
time timestamp,
period int,
time_end timestamp,
value text,
PRIMARY KEY ((point_name, year), time)
) WITH CLUSTERING ORDER BY (time DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '10', 'compaction_window_unit': 'HOURS', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 2592000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
and the query is:
SELECT * FROM point_period WHERE point_name = ? AND year = '2017' AND time > '2017-05-23 12:53:24' ORDER BY time ASC LIMIT 1 ALLOW FILTERING
When I execute this query concurrently, the IO load becomes extremely high, around 200 MB/s. Thank you.
I'm running Cassandra 3.3 on a fairly beefy machine. I would like to try out the row cache, so I have allocated 2 GB of RAM for row caching and configured the target tables to cache a number of their rows.
If I run a query on a very small table (under 1 MB) twice with tracing on, on the 2nd query I see a cache hit. However, when I run a query on a large table (34 GB) I only get cache misses and see this message after every cache miss:
Fetching data but not populating cache as query does not query from the start of the partition
What does this mean? Do I need a bigger row cache to be able to handle a 34 GB table with 90 million keys?
Taking a look at the row cache source code on GitHub, I see that clusteringIndexFilter().isHeadFilter() must be evaluating to false in this case. Is this a function of my partitions being too big?
My schema is:
CREATE TABLE ap.account (
email text PRIMARY KEY,
added_at timestamp,
data map<int, int>
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': '100000'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
The query is simply SELECT * FROM account WHERE email = 'sample@test.com'