Cassandra table schema changing - cassandra

I am using DataStax Cassandra 3.0.
When I create a table using cqlsh, the schema changes: the column names are rearranged into alphabetical order. Please see below.
This is the structure I used when creating the table:
cqlsh> CREATE TABLE tutorialspoint.SupplierItemData_input15(partnumber BIGINT PRIMARY KEY,
... supplier text,
... monthyear varchar,
... allocation int,
... evdate date,
... paymentterms int,
... actualdays int,
... percentageofpayment int,
... variation int,
... paymenttermsummary text,
... copq int,
... year int,
... month int,
... postingdate date);
But when I run DESCRIBE on the table, the structure has changed:
cqlsh> DESCRIBE tutorialspoint.SupplierItemData_input15;
CREATE TABLE tutorialspoint.supplieritemdata_input15 (
partnumber bigint PRIMARY KEY,
actualdays int,
allocation int,
copq int,
evdate date,
month int,
monthyear text,
paymentterms int,
paymenttermsummary text,
percentageofpayment int,
postingdate date,
supplier text,
variation int,
year int
)
WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Please help me with this.
Thank you,
Ravi

If you want to import data using cqlsh COPY from a CSV file, add your column names as a header at the top of the CSV file and list the same columns, in the same order, in the COPY command. That way it doesn't matter what order DESCRIBE shows them in.
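The idea behind that advice can be sketched in a few lines of Python: once each value is addressed by its column name, the position of the column stops mattering. This is only an illustration of the principle, not how cqlsh COPY is implemented; the CSV content and the shortened four-column list are made up for the example.

```python
import csv
import io

# Column order as DESCRIBE reports it (key first, then the rest alphabetically).
# Shortened to four columns for illustration.
TABLE_ORDER = ["partnumber", "allocation", "monthyear", "supplier"]

# Hypothetical CSV content whose header follows the original CREATE TABLE order.
CSV_TEXT = (
    "partnumber,supplier,monthyear,allocation\n"
    "1001,Acme,2016-05,40\n"
    "1002,Globex,2016-06,60\n"
)


def rows_in_table_order(csv_text, table_order):
    """Read rows by header name and return each one arranged in table_order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[row[name] for name in table_order] for row in reader]


reordered = rows_in_table_order(CSV_TEXT, TABLE_ORDER)
for values in reordered:
    print(values)
```

Because every value is looked up by name, the same rows come out correctly whichever order the table or the file happens to use.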

Related

Cassandra queries perform a full table scan if no rows exist for a specific partition key

I have a very large table like
CREATE TABLE IF NOT EXISTS profile (
account_id text,
user_id uuid,
user_data text,
creation_date timestamp,
update_date timestamp,
PRIMARY KEY ((account_id, user_id))
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
The following query runs a full table scan if the table has no rows matching the partial partition key (account_id = 'D-F-8CX7PGX'):
SELECT * FROM profile WHERE account_id = 'D-F-8CX7PGX' AND user_id = 123e4567-e89b-12d3-a456-426614174000;
I expected Cassandra to return quickly with no rows found rather than scan the full table.
Someone suggested that inserting a dummy row with (account_id = 'D-F-8CX7PGX' AND user_id = 00000000-0000-0000-0000-000000000000) could avoid the full table scan, but I don't understand why that would be needed.
Has anyone encountered a similar issue?
A single partition query does not do a full table scan.
Since the partition key is (account_id, user_id) and your query filters on a single partition, Cassandra will attempt to retrieve the partition from the relevant replica(s) without scanning the whole table. Cheers!
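As a rough mental model of why no scan is needed, the sketch below hashes the full composite key to a token, uses the token to pick a node, and looks the partition up directly. It is a simplification under stated assumptions: MD5 stands in for Cassandra's Murmur3 partitioner, a modulo stands in for the token ring, and the node names and dict-based storage are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def token_for(account_id, user_id):
    """Hash the full composite partition key to a token (MD5 as a stand-in)."""
    key = f"{account_id}:{user_id}".encode("utf-8")
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big")


def owner_of(account_id, user_id):
    """Pick the owning node; a real cluster walks a token ring instead."""
    return NODES[token_for(account_id, user_id) % len(NODES)]


# Each node keeps its partitions indexed by token; a dict stands in for that.
storage = {node: {} for node in NODES}


def write(account_id, user_id, user_data):
    storage[owner_of(account_id, user_id)][token_for(account_id, user_id)] = user_data


def read(account_id, user_id):
    """Direct lookup on the owning node; other partitions are never visited."""
    return storage[owner_of(account_id, user_id)].get(token_for(account_id, user_id))


write("D-F-8CX7PGX", "123e4567-e89b-12d3-a456-426614174000", "some data")
found = read("D-F-8CX7PGX", "123e4567-e89b-12d3-a456-426614174000")
missing = read("D-F-8CX7PGX", "00000000-0000-0000-0000-000000000000")
print(found, missing)
```

In this model a lookup for a key that was never written simply finds nothing on the owning node, which matches the answer's point that both key parts together identify one partition.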

Problems performing an update on Cassandra having a compound partitioning key

I have this table in Cassandra:
CREATE TABLE wear_dealer.product_color_size_stock (
productcode text,
colorcode text,
sizecode text,
ean text,
shortdescription text,
stock int,
PRIMARY KEY (productcode, colorcode, sizecode)
) WITH CLUSTERING ORDER BY (colorcode ASC, sizecode ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX product_color_size_stock_stock_idx ON wear_dealer.product_color_size_stock (stock);
How can I update shortdescription when I only have the value for productcode?
When I perform this query:
cqlsh:wear_dealer> update seasons_product_color_size
set shortdescription ='AAA'
where productcode='RUNTS';
I get the following error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Some partition key parts are missing: seasoncode"
Is there any strategy to overcome this?
Many thanks in advance!
Unfortunately, CQL does not allow writes for a partial key. Remember that Cassandra treats INSERTs and UPDATEs the same. So when this:
UPDATE seasons_product_color_size
SET shortdescription ='AAA'
WHERE productcode='RUNTS';
Returns this: "Some partition key parts are missing: seasoncode"
It's saying that Cassandra doesn't know which node to write the data to, because the partition key is incomplete. In SQL, it would just iterate through all rows in the table and update them according to your WHERE clause. But Cassandra is specifically designed not to allow operations like that.
For this query you will need to figure out the missing seasoncodes separately, and UPDATE each row individually.
Cassandra supports writes only by the full partition key. Since you supplied only part of the partition key, you cannot update with it.
UPDATE seasons_product_color_size SET shortdescription ='AAA' WHERE productcode='RUNTS' and sizecode=10
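The "update each row individually" approach can be sketched as follows. This is a minimal sketch that assumes the product_color_size_stock table shown in the question and a hypothetical list of (colorcode, sizecode) pairs obtained beforehand with a SELECT on productcode. If the real table also has seasoncode in its key, that column would have to be added in the same way. The `%s` placeholder style is an assumption about the client driver in use.

```python
# Hypothetical (colorcode, sizecode) pairs already read for productcode 'RUNTS'.
EXISTING_KEYS = [("BLK", "M"), ("BLK", "L")]

UPDATE_TEMPLATE = (
    "UPDATE wear_dealer.product_color_size_stock "
    "SET shortdescription = %s "
    "WHERE productcode = %s AND colorcode = %s AND sizecode = %s"
)


def build_updates(productcode, keys, shortdescription):
    """Return one (statement, parameters) pair per fully specified primary key."""
    return [
        (UPDATE_TEMPLATE, (shortdescription, productcode, colorcode, sizecode))
        for colorcode, sizecode in keys
    ]


updates = build_updates("RUNTS", EXISTING_KEYS, "AAA")
for statement, params in updates:
    print(statement, params)
```

Each generated statement names every primary key column, so it targets exactly one row.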

How to do pagination and sorting on post table in cassandra database?

I am using Cassandra 3.0 as my database.
My table schema is as follows:
CREATE TABLE db_name.post (
postcreatedby timeuuid,
contenttype text,
createdat bigint,
friendid timeuuid,
posttype text,
id timeuuid,
PRIMARY KEY (postcreatedby, contenttype, createdat, friendid, posttype, id)
) WITH CLUSTERING ORDER BY (contenttype ASC, createdat ASC, friendid ASC, posttype ASC, id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
And I am trying to run the following query:
SELECT * from post WHERE postcreatedby = timeuuid AND contenttype IN ('text', 'text') AND createdat < bigint AND friendid = timeuuid AND posttype = 'text';
And I get the following error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering column "friendid" cannot be restricted (preceding column "createdat" is restricted by a non-EQ relation)"
My question is:
I need to filter on all of these columns and sort the results as well. I use createdat to maintain sorting and pagination.
My problem is this: if I make createdat the last clustering column, I can filter on all columns but cannot sort the data. If I put createdat before the other columns, I cannot filter on the columns that follow it.
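The rule behind the error message can be modelled with a small check: walking the clustering columns in order, once one of them is restricted by a non-EQ relation, no later clustering column may be restricted. The function below is a simplified sketch of that single rule only (it ignores other CQL validation, such as skipped columns), and its name is made up for illustration.

```python
def first_invalid_restriction(clustering_columns, restrictions):
    """Return (column, preceding_column) for the first clustering column that is
    restricted after a non-EQ relation, or None if the rule is satisfied.

    restrictions maps a column name to its operator, e.g. '=', 'IN', '<'.
    """
    slice_column = None
    for column in clustering_columns:
        operator = restrictions.get(column)
        if operator is None:
            continue
        if slice_column is not None:
            return (column, slice_column)
        if operator not in ("=", "IN"):
            slice_column = column
    return None


QUERY = {"contenttype": "IN", "createdat": "<", "friendid": "=", "posttype": "="}

# Clustering order from the table in the question.
current_order = ["contenttype", "createdat", "friendid", "posttype", "id"]
# Alternative order with createdat after the equality-filtered columns.
alternative_order = ["contenttype", "friendid", "posttype", "createdat", "id"]

problem = first_invalid_restriction(current_order, QUERY)
no_problem = first_invalid_restriction(alternative_order, QUERY)
print(problem, no_problem)
```

With the current order the check flags friendid after createdat, which is the pair named in the error. With createdat placed after the equality-filtered columns, the same restrictions pass this particular rule.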

Cassandra Delete Records

I'm new to Cassandra and I've been having some issues trying to delete records. I have a table defined as follows:
CREATE TABLE wire_journal (
persistence_id text,
partition_nr bigint,
sequence_nr bigint,
timestamp timeuuid,
timebucket text,
event blob,
event_manifest text,
message blob,
ser_id int,
ser_manifest text,
tag1 text,
tag2 text,
tag3 text,
used boolean static,
writer_uuid text,
PRIMARY KEY ((persistence_id, partition_nr), sequence_nr, timestamp, timebucket)
) WITH CLUSTERING ORDER BY (sequence_nr ASC, timestamp ASC, timebucket ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'bucket_high': '1.5', 'bucket_low': '0.5', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'enabled': 'true', 'max_threshold': '32', 'min_sstable_size': '50', 'min_threshold': '4', 'tombstone_compaction_interval': '86400', 'tombstone_threshold': '0.2', 'unchecked_tombstone_compaction': 'false'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
The indexes are defined as follows:
CREATE CUSTOM INDEX timestamp_idx ON wire_journal (timestamp) USING 'org.apache.cassandra.index.sasi.SASIIndex';
CREATE CUSTOM INDEX manifest_idx ON wire_journal (event_manifest) USING 'org.apache.cassandra.index.sasi.SASIIndex';
I would like to be able to delete by timestamp and event_manifest.
I can query by an event manifest for example:
select event_manifest, dateOf(timestamp) from wire_journal where event_manifest = '011000028';
The query above works. However, if I try to delete using the same criteria:
delete from wire_journal where event_manifest = '011000028';
I get the following error:
InvalidRequest: code=2200 [Invalid query] message="Some partition key parts are missing: persistence_id, partition_nr"
I've tried including those columns in my delete as follows:
delete persistence_id, partition_nr from wire_journal where event_manifest = 'aba:011000028';
and I get the following error:
InvalidRequest: code=2200 [Invalid query] message="Invalid identifier persistence_id for deletion (should not be a PRIMARY KEY part)"
How can I go about deleting all the records that match that condition?
Your partition key is (persistence_id, partition_nr), and Cassandra only deletes records when the partition key is specified.
So your query needs to look like this:
delete from wire_journal where persistence_id = x AND partition_nr = y AND event_manifest = 'aba:011000028';
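One way to apply this to "all records that match" is to read the key columns of the matching rows first (the SELECT on event_manifest already works in the question) and then issue one DELETE per row. The sketch below only builds the statements. The sample rows are hypothetical, the `%s` placeholder style is an assumption about the client driver, and it assumes the WHERE clause of a DELETE should name primary key columns only.

```python
KEY_COLUMNS = ["persistence_id", "partition_nr", "sequence_nr", "timestamp", "timebucket"]

DELETE_TEMPLATE = "DELETE FROM wire_journal WHERE " + " AND ".join(
    f"{column} = %s" for column in KEY_COLUMNS
)


def build_deletes(rows):
    """Return one (statement, parameters) pair per row, keyed by its full primary key."""
    return [(DELETE_TEMPLATE, tuple(row[column] for column in KEY_COLUMNS)) for row in rows]


# Hypothetical rows, as if selected with WHERE event_manifest = '011000028'.
matching_rows = [
    {
        "persistence_id": "wire-1",
        "partition_nr": 0,
        "sequence_nr": 17,
        "timestamp": "e4a1c2f0-0b6e-11e7-93ae-92361f002671",
        "timebucket": "20170318",
    },
]

deletes = build_deletes(matching_rows)
for statement, params in deletes:
    print(statement, params)
```

Each statement identifies exactly one row, so nothing outside the matched set is removed.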

Cassandra Undefined name in where clause

I'm querying a Cassandra table by executing the following command:
select * from oap.purchase_events where clientNumber = '100'
The table contains a row with clientNumber 100; however, I get this error:
InvalidRequest: code=2200 [Invalid query] message="Undefined name clientnumber in where clause ('clientnumber = 100')"
The table definition:
CREATE TABLE oap.purchase_events (
"parentId" text,
"childId" text,
"clientNumber" text,
cost double,
description text,
"eventDate" timestamp,
"logDate" timestamp,
message text,
"operationalChannel" text,
"productDuration" bigint,
"productId" text,
"transactionId" text,
volume double,
"volumeUnit" text,
PRIMARY KEY ("parentId", "childId")
) WITH CLUSTERING ORDER BY ("childId" ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX purchase_events_clientNumber_idx ON gestor.purchase_events ("clientNumber");
Any help?
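Just enclose clientNumber in double quotes. Unquoted identifiers in CQL are case-insensitive and are folded to lowercase, which is why the error mentions clientnumber, while the column was created with quotes as "clientNumber".
Example: select * from purchase_events where "clientNumber" = '100';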
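The identifier rule can be sketched as a small function: unquoted names are lowercased, quoted names keep their case. This is an illustrative model rather than Cassandra's parser, and the function name and the shortened column set are made up for the example.

```python
def resolve_identifier(identifier):
    """Return the column name that a CQL identifier refers to.

    Quoted identifiers keep their exact case; unquoted ones are lowercased.
    """
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        # Inside quotes, a doubled double quote stands for one literal quote.
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


# A few column names as stored for the table in the question.
TABLE_COLUMNS = {"parentId", "childId", "clientNumber", "cost", "description"}

unquoted = resolve_identifier("clientNumber")
quoted = resolve_identifier('"clientNumber"')

print(unquoted, unquoted in TABLE_COLUMNS)
print(quoted, quoted in TABLE_COLUMNS)
```

The unquoted form resolves to clientnumber, which is not a column of the table, while the quoted form resolves to the stored name.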
