Commenting Cassandra's keyspace, table, column


In Oracle it is possible to add a comment about a table, view, materialized view, or column to the data dictionary, e.g.:
COMMENT ON COLUMN employees.job_id
IS 'abbreviated job title';
As a tester, I found this particularly useful for understanding the ideas behind names that are not necessarily self-explanatory, especially in large databases (over 200 tables).
Is there such a feature in Cassandra?

You can use the WITH comment table option:
cqlsh:d2>
cqlsh:d2> create table employee (id int primary key, name text) with comment = 'Employee id and name';
cqlsh:d2> desc table employee;
CREATE TABLE d2.employee (
id int PRIMARY KEY,
name text
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = 'Employee id and name'
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
See the Cassandra documentation.
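A table's comment can also be set or changed later with ALTER TABLE and read back from the schema tables. Reading it back is handy for listing the comments of every table in a large keyspace. This is a minimal sketch that assumes Cassandra 3.0 or later, where the schema lives in system_schema. On 2.x the equivalent table is system.schema_columnfamilies, keyed by columnfamily_name. The comment text is only an example. As far as I know, CQL in these versions has no comment option for keyspaces or individual columns, so column descriptions have to go into the table comment.

```cql
-- Set or replace the comment on the existing table d2.employee
-- (the comment text is only an example)
ALTER TABLE d2.employee
  WITH comment = 'Employee id and name (id: internal employee number)';

-- Read one table's comment back (Cassandra 3.0+ schema tables)
SELECT table_name, comment
FROM system_schema.tables
WHERE keyspace_name = 'd2' AND table_name = 'employee';

-- List the comments of every table in the keyspace
SELECT table_name, comment
FROM system_schema.tables
WHERE keyspace_name = 'd2';
```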

Related

Cassandra queries perform a full table scan if no rows exist for a specific partition key

I have a very large table like this:
CREATE TABLE IF NOT EXISTS profile (
account_id text,
user_id uuid,
user_data text,
creation_date timestamp,
update_date timestamp,
PRIMARY KEY ((account_id, user_id))
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
The following query runs a full table scan if the table has no rows matching the partial partition key (account_id = 'D-F-8CX7PGX'):
SELECT * FROM profile WHERE account_id = 'D-F-8CX7PGX' AND user_id = 123e4567-e89b-12d3-a456-426614174000;
I expected Cassandra to return quickly with no rows found, not to scan the full table.
Someone suggested that inserting a dummy row with (account_id = 'D-F-8CX7PGX' AND user_id = '00000000-0000-0000-0000-000000000000') could avoid the full table scan, but I don't understand why that would be needed.
Has anyone encountered a similar issue?

A single partition query does not do a full table scan.
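The partition key is (account_id, user_id), and your query restricts both columns. The coordinator therefore hashes them to a single token and asks only the replica(s) that own that token. On each replica, the SSTable bloom filters (bloom_filter_fp_chance = 0.01 on this table) let Cassandra skip SSTables that cannot contain the partition. A missing partition is usually answered without reading the data files at all, let alone scanning the whole table. Cheers!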
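To check this on your own cluster, you can trace the query in cqlsh. This sketch reuses the query from the question, with the uuid literal unquoted because CQL expects uuid values without quotes. The trace prints each step taken by the coordinator and the replicas, so you can see which replicas were contacted and how much work each one did.

```cql
-- cqlsh only: print a trace after every statement
TRACING ON;

-- Single-partition read: both partition key columns are restricted
SELECT * FROM profile
WHERE account_id = 'D-F-8CX7PGX'
  AND user_id = 123e4567-e89b-12d3-a456-426614174000;

TRACING OFF;
```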

Problems performing an update on Cassandra having a compound partitioning key

I have this table in Cassandra:
CREATE TABLE wear_dealer.product_color_size_stock (
productcode text,
colorcode text,
sizecode text,
ean text,
shortdescription text,
stock int,
PRIMARY KEY (productcode, colorcode, sizecode)
) WITH CLUSTERING ORDER BY (colorcode ASC, sizecode ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX product_color_size_stock_stock_idx ON wear_dealer.product_color_size_stock (stock);
How can I update shortdescription when I only have the value of productcode?
When I perform this query:
cqlsh:wear_dealer> update seasons_product_color_size
set shortdescription ='AAA'
where productcode='RUNTS';
I get the following error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Some partition key parts are missing: seasoncode"
Is there any strategy to overcome this?
Many thanks in advance!

Unfortunately, CQL does not allow writes with a partial key. Remember that Cassandra treats INSERTs and UPDATEs the same. So when this:
UPDATE seasons_product_color_size
SET shortdescription ='AAA'
WHERE productcode='RUNTS';
Returns this: "Some partition key parts are missing: seasoncode"
It's saying that Cassandra doesn't know which node to write the data to, because the partition key is incomplete. A SQL database would simply iterate through all rows in the table and update those matching your WHERE clause, but Cassandra is specifically designed not to allow operations like that.
For this query you will need to figure out the missing seasoncodes separately, and UPDATE each row individually.
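Here is a minimal sketch of that two-step approach. It is written against the wear_dealer.product_color_size_stock schema from the question. In that table productcode alone is the partition key, and colorcode and sizecode are clustering columns. The colorcode and sizecode values in the UPDATEs are hypothetical placeholders for whatever the SELECT returns. For seasons_product_color_size, where seasoncode is also part of the partition key, the seasoncode values have to come from somewhere else first, as described above.

```cql
-- Step 1: list the full primary key of every row for this product.
-- Works because productcode is the whole partition key of this table.
SELECT productcode, colorcode, sizecode
FROM wear_dealer.product_color_size_stock
WHERE productcode = 'RUNTS';

-- Step 2: update each row, giving every primary key column.
-- 'BLACK', 'WHITE' and '42' are hypothetical values taken from step 1.
-- Separate UPDATE statements work too; since all rows share one partition,
-- grouping them in an unlogged batch is cheap and applies them together.
BEGIN UNLOGGED BATCH
  UPDATE wear_dealer.product_color_size_stock
    SET shortdescription = 'AAA'
    WHERE productcode = 'RUNTS' AND colorcode = 'BLACK' AND sizecode = '42';
  UPDATE wear_dealer.product_color_size_stock
    SET shortdescription = 'AAA'
    WHERE productcode = 'RUNTS' AND colorcode = 'WHITE' AND sizecode = '42';
APPLY BATCH;
```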

Cassandra writes are based on the partition key. Since you supplied only part of the partition key, you cannot update with it:
UPDATE seasons_product_color_size SET shortdescription ='AAA' WHERE productcode='RUNTS' and sizecode='10'

How do I model data in Cassandra for faster reads?

We have modeled data in Cassandra. Writes arrive continuously, driven by events generated by different systems. The table schema is defined below. Writes perform fine, but reads with a WHERE clause on id take up to 9 s at the 99th percentile. Kindly help me with a better design for this table. The data column contains a JSON string of up to 2 KB.
CREATE TABLE table (
id text,
p1 text,
o1 text,
s1 text,
data text,
enabled boolean,
PRIMARY KEY (id, p1, o1, s1)
) WITH CLUSTERING ORDER BY (p1 ASC, o1 ASC, s1 ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX table_enabled_idx ON table (enabled);

The table_enabled_idx index will be very slow and will eventually break; ditch it. A secondary index on a boolean column has only two distinct values, so on each node the index consists of just two partitions that keep growing with the data.
LeveledCompactionStrategy will flat out improve read performance. STCS is only better if you never read the data, or on ancient disks, imho. Set dclocal_read_repair_chance to zero (it won't really make a difference, but you might as well).
You need a trace to tell whether it's something else, such as partitions that are too wide or too many tombstones; what you provided doesn't show that. It can also be GC pauses caused by unrelated things: compactions, bad JVM settings, other data models on the same cluster, etc. If the GC pauses are infrequent, enable speculative execution in the driver to work around them.
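The changes suggested above might look like this in cqlsh. This sketch rests on a few assumptions. The question names the table table, which is a reserved word in CQL, so the statements use a hypothetical name, my_ks.events. The read_repair_chance options exist in 2.x and 3.x but were removed in Cassandra 4.0. The id value in the traced query is a placeholder. Switching compaction strategy makes each node recompact its existing SSTables, so expect extra I/O for a while.

```cql
-- Hypothetical keyspace/table name: "table" from the question is reserved.
-- 1. Drop the secondary index on the low-cardinality boolean column.
DROP INDEX IF EXISTS my_ks.table_enabled_idx;

-- 2. Switch to leveled compaction so most reads touch few SSTables.
ALTER TABLE my_ks.events
  WITH compaction = {'class': 'LeveledCompactionStrategy'};

-- 3. Turn off DC-local read repair (3.x option, removed in 4.0).
ALTER TABLE my_ks.events
  WITH dclocal_read_repair_chance = 0.0;

-- 4. Trace a slow read to look for wide partitions or tombstones.
TRACING ON;
SELECT * FROM my_ks.events WHERE id = 'some-id';
TRACING OFF;
```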

Cassandra Delete query with If condition not working

I've got a Cassandra table and want to delete a row, but only if one column has a specific value.
Even though Cassandra reports that the delete succeeded (it returns [applied] True), the message is still present.
Let's create the table and insert some data:
CREATE TABLE IF NOT EXISTS test
(
id uuid PRIMARY KEY,
recipient text,
message text
);
INSERT INTO test (id, recipient, message)
VALUES (7ee055ee-b5dd-4bfd-b184-614d51e268d5, 'felix', 'foo');
INSERT INTO test (id, recipient, message)
VALUES (86c9d632-dc24-4635-8277-c987c78bd242, 'andrew', 'bar');
Now I want to delete one message, but only if the user who requests the deletion (in this case felix) is the recipient and thus has permissions to do so:
cqlsh:service_message> DELETE FROM test WHERE id=7ee055ee-b5dd-4bfd-b184-614d51e268d5 IF recipient='felix';
[applied]
-----------
True
So I would now think that the query did succeed, but if we have a look at the table we'll see that the message still exists.
cqlsh:service_message> SELECT * FROM test;
id | message | recipient
--------------------------------------+---------+-----------
86c9d632-dc24-4635-8277-c987c78bd242 | bar | andrew
7ee055ee-b5dd-4bfd-b184-614d51e268d5 | foo | felix
(2 rows)
Some additional information:
cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4
cqlsh> DESCRIBE KEYSPACE service_message
CREATE KEYSPACE service_message WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;
CREATE TABLE service_message.test (
id uuid PRIMARY KEY,
message text,
recipient text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

INSERT and UPDATE statements that use the IF clause support lightweight transactions.
From Datastax docs on CQL: https://docs.datastax.com/en/cql/3.3/cql/cql_using/useInsertLWT.html
I'm pretty sure deletes are not supported. If you want to effectively delete your information, you could instead set the cell values to null in an UPDATE statement. Whether you delete or set values to null, you are still creating tombstones.
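The null-update alternative could look like this, using the test table and row from the question. This is a sketch in which the IF condition keeps the permission check on recipient. The row was created with INSERT, so it keeps its primary-key liveness. SELECT is therefore expected to still return the id, with null message and recipient, unless the row itself is deleted.

```cql
-- Clear the cells only if felix is the recipient (lightweight transaction)
UPDATE test
SET message = null, recipient = null
WHERE id = 7ee055ee-b5dd-4bfd-b184-614d51e268d5
IF recipient = 'felix';

-- Check the result: the id is expected to remain, with null columns
SELECT * FROM test WHERE id = 7ee055ee-b5dd-4bfd-b184-614d51e268d5;
```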

Select distinct gives incorrect values even if performed on primary key Cassandra

I'm running Cassandra 2.1.2 and cqlsh 5.0.1.
Here is the table weather.log; weather is the keyspace, and I query it at consistency level ONE.
I have 2 nodes configured.
CREATE KEYSPACE weather WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': '1'} AND durable_writes = true;
CREATE TABLE weather.log (
ip inet,
ts timestamp,
city text,
country text,
PRIMARY KEY (ip, ts)
) WITH CLUSTERING ORDER BY (ts DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
When we run the query:
select distinct ip from weather.log
we get inconsistent, wrong results: one run returns 99 IPs, the next returns 1600, and so on, whereas the actual number should be more than 2000.
I have also tried this query with the consistency level set to ALL. It didn't help.
Why is this happening? I need to get all the keys. How can I get all the primary keys?

It looks like you might be affected by CASSANDRA-8940. I'd suggest updating to the latest 2.1.x release and verifying whether this issue is fixed for you.
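After upgrading, you can confirm from cqlsh which release every node is running before re-testing the query. In this sketch, system.local describes the node cqlsh is connected to, and system.peers lists the other nodes in the cluster.

```cql
-- Release of the node this cqlsh session is connected to
SELECT release_version FROM system.local;

-- Releases of the other nodes in the cluster
SELECT peer, release_version FROM system.peers;

-- Then re-run the query from the question
SELECT DISTINCT ip FROM weather.log;
```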
