Migrating DDL from Cassandra to YugabyteDB YCQL

[Question posted by a user on YugabyteDB Community Slack]
I am trying to migrate DDLs from Apache Cassandra to YugabyteDB YCQL, but I am getting this error:
cassandra#ycqlsh:killrvideo> CREATE TABLE killrvideo.videos (
... video_id timeuuid PRIMARY KEY,
... added_date timestamp,
... title text
... ) WITH additional_write_policy = '99p'
... AND bloom_filter_fp_chance = 0.01
... AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
... AND cdc = false
... AND comment = ''
... AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
... AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
... AND crc_check_chance = 1.0
... AND default_time_to_live = 0
... AND extensions = {}
... AND gc_grace_seconds = 864000
... AND max_index_interval = 2048
... AND memtable_flush_period_in_ms = 0
... AND min_index_interval = 128
... AND read_repair = 'BLOCKING'
... AND speculative_retry = '99p';
SyntaxException: Invalid SQL Statement. syntax error, unexpected '}', expecting SCONST
CREATE TABLE killrvideo.videos (
video_id timeuuid PRIMARY KEY,
added_date timestamp,
title text
) WITH additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND extensions = {}
^
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p';
(ql error -11)
Are these optional parameters after CREATE TABLE not supported in YugabyteDB? (They were pulled from describing the keyspace killrvideo in Cassandra.)
Not sure what I am missing here. Any help is really appreciated.

Please have a look at: https://docs.yugabyte.com/latest/api/ycql/ddl_create_table/#table-properties-1
You can remove everything starting from WITH, which should allow you to create the table.
Taking this statement from the docs:
The other YCQL table properties are allowed in the syntax but are currently ignored internally (have no effect).
("Other" here means properties outside the limited set of table properties that the YCQL engine actually uses.)
The probable reason you are getting this error is that you are using properties that are specific to stock Cassandra and are not implemented in YugabyteDB, because it uses a different storage layer.
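For example, trimming everything from WITH onwards leaves a DDL that YCQL should accept (a minimal sketch; add back only the properties listed on the linked docs page if you need them):
CREATE TABLE killrvideo.videos (
video_id timeuuid PRIMARY KEY,
added_date timestamp,
title text
);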

Related

Compaction - TimeWindowCompactionStrategy Cassandra 3

I am trying to migrate from Cassandra 2 to 3, but I am having trouble with TimeWindowCompactionStrategy.
Cassandra 2
compaction = {'compaction_window_size': '3', 'compaction_window_unit': 'DAYS', 'class': 'TimeWindowCompactionStrategy'}
Any idea in Cassandra 3? Thank you!
The following works for me in Cassandra 3.0:
compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '3', 'compaction_window_unit': 'DAYS'}
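For example, applied with an ALTER TABLE statement (the keyspace and table names below are just placeholders):
ALTER TABLE my_keyspace.my_table
WITH compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '3', 'compaction_window_unit': 'DAYS'};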

Cassandra - .db files bigger than actual data

We're currently switching Cassandra (2.x to 3.11.1), and when I exported the data as plain-text prepared INSERT statements and checked the file size, I was shocked.
The actual data size in txt was 11.7GB.
The actual file size of all .db files is 127GB.
All keyspaces are configured with SizeTieredCompactionStrategy compaction and LZ4 compression:
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
So why are the files on the disk 10x bigger than the real data? And how do I shrink those files to reflect (kinda) the real data size?
Just a note: all data is simple time-series with timestamp and values (min, max, avg, count, strings, ...)
The schema:
CREATE TABLE prod.data (
datainput bigint,
aggregation int,
timestamp bigint,
avg double,
count double,
flags int,
max double,
min double,
sum double,
val_d double,
val_l bigint,
val_str text,
PRIMARY KEY (datainput, aggregation, timestamp)
) WITH CLUSTERING ORDER BY (aggregation ASC, timestamp ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 1.0
AND speculative_retry = '99PERCENTILE';
Thanks all!
UPDATE: added the schema; fixed the Cassandra version (3.1 => 3.11.1).

Unmatched column names/values when inserting data in Cassandra cqlsh

So I am using the current query to insert data into my column family:
INSERT INTO airports (apid, name, iata, icao, x, y, elevation, code, name, oa_code, dst, cid, name, timezone, tz_id) VALUES (12012,'Ararat Airport','ARY','YARA','142.98899841308594','-37.30939865112305',1008,{ code: 'AS', name: 'Australia', oa_code: 'AU', dst: 'U',city: { cid: 1, name: '', timezone: '', tz_id: ''}});
Now I'm getting the error Unmatched column names/values. This is my current model for the airports column family:
CREATE TYPE movielens.cityudt (
cid varint,
name text,
timezone varint,
tz_id text
);
CREATE TYPE movielens.countryudt (
code text,
name text,
oa_code text,
dst text,
city frozen<cityudt>
);
CREATE TABLE movielens.airports (
apid varint PRIMARY KEY,
country frozen<countryudt>,
elevation varint,
iata text,
icao text,
name text,
x varint,
y varint
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
But I can't see the problem with the insert! Can someone help me figure out where I am going wrong?
OK, so I did manage to get this to work after adjusting your x and y columns to doubles:
INSERT INTO airports (apid, name, iata, icao, x, y, elevation, country)
VALUES (12012,'Ararat Airport','ARY','YARA',142.98899841308594,-37.30939865112305,
1008,{ code: 'AS', name: 'Australia', oa_code: 'AU', dst: 'U',
city:{ cid: 1, name: '', timezone: 0, tz_id: ''}});
cassdba#cqlsh:stackoverflow> SELECT * FROM airports WHERE apid=12012;
apid | country | elevation | iata | icao | name | x | y
-------+------------------------------------------------------------------------------------------------------------+-----------+------+------+----------------+---------+----------
12012 | {code: 'AS', name: 'Australia', oa_code: 'AU', dst: 'U', city: {cid: 1, name: '', timezone: 0, tz_id: ''}} | 1008 | ARY | YARA | Ararat Airport | 142.989 | -37.3094
(1 rows)
Remember that VARINTs don't take single quotes (like timezone).
Also, you were specifying each of the UDT's fields as columns, when you just needed to specify country in your column list (as you mentioned).
When inserting into a frozen UDT, you do not have to list every field inside the UDT as a separate column, so to fix the query I removed all the column names after elevation and added country.
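For reference, a sketch of the adjusted table that the corrected INSERT assumes, with x and y changed to double (the UDT definitions are unchanged):
CREATE TABLE movielens.airports (
apid varint PRIMARY KEY,
country frozen<countryudt>,
elevation varint,
iata text,
icao text,
name text,
x double,
y double
);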

How to store/show timestamp and double columns with 15 decimal precision?

I would like to insert experimental data into Cassandra where each value has a precision of 15 decimal places. The sample dataset is as follows:
+------------------+-------------------+
| Sampling_Rate | Value1 |
+------------------+-------------------+
| 2.48979187011719 | 0.144110783934593 |
+------------------+-------------------+
I would like to see the Sampling_Rate as an Epoch time (i.e. 1970-01-01 00:00:02.48979187011719+0000), and Value1 to store its full precision value.
For this, I created the table as follows (taken from DESCRIBE TABLE):
CREATE TABLE project_fvag.temp (
sampling_rate timestamp PRIMARY KEY,
value1 double ) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
I also changed the cqlshrc file, increasing the precision for both float and double, and changed the datetimeformat:
datetimeformat = %Y-%m-%d %H:%M:%S.%.15f%z ;float_precision = 5 ;double_precision = 15
In spite of these changes, the result is stored with only 6 decimal places for both the timestamp and the value. What would be a better strategy to store/see the data as I expect?
For the sampling value: since you set it up as a timestamp, Cassandra will store it with millisecond precision. One way around this would be to store it as a decimal.
The same applies to value1: recreate your table with decimal instead of double for value1.
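A minimal sketch of the suggested change, assuming the same keyspace and column names, with both columns declared as decimal to keep the full precision:
CREATE TABLE project_fvag.temp (
sampling_rate decimal PRIMARY KEY,
value1 decimal
);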

Cassandra SELECT query failed with code=1300

I have a Cassandra cluster of 4 nodes with a replication factor of 3. We have about 400M records in table1, and after restarting the cluster I got the following error when selecting from the table:
ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
We tried these queries on cqlsh
select id from table1; ----> Failed
select id from table1 limit 10; -----> Failed
select id from table1 limit 1; ----> Succeed
select id from table1 where id=22; ----> Succeed
select id from othertable; ----> Succeed
And the schema for the table is:
CREATE TABLE ks.table1 (
id int,
name text,
PRIMARY KEY (id))
WITH read_repair_chance = 0.0
AND dclocal_read_repair_chance = 0.1
AND gc_grace_seconds = 864000
AND bloom_filter_fp_chance = 0.01
AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
AND comment = ''
AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy' }
AND compression = { 'chunk_length_in_kb' : 64, 'class' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
AND default_time_to_live = 0
AND speculative_retry = '99PERCENTILE'
AND min_index_interval = 128
AND max_index_interval = 2048
AND crc_check_chance = 1.0;
How can I trace the error and pinpoint the problem?
We are using Cassandra 3.0.8.1293
