I am trying to migrate from Cassandra 2 to 3, but I am having trouble with TimeWindowCompactionStrategy.
In Cassandra 2, the table is configured with:
compaction = {'compaction_window_size': '3', 'compaction_window_unit': 'DAYS', 'class': 'TimeWindowCompactionStrategy'}
Any idea how to do this in Cassandra 3? Thank you!
The following works for me in Cassandra 3.0:
compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '3', 'compaction_window_unit': 'DAYS'}
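If the table already exists, the same settings can be applied in place with ALTER TABLE; a minimal sketch, assuming a hypothetical table ks.events (substitute your own keyspace and table name):

ALTER TABLE ks.events
  WITH compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
                     'compaction_window_size': '3',
                     'compaction_window_unit': 'DAYS'};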
[Question posted by a user on YugabyteDB Community Slack]
I am trying to migrate DDLs from Apache Cassandra to YugabyteDB YCQL, but I am getting this error:
cassandra#ycqlsh:killrvideo> CREATE TABLE killrvideo.videos (
... video_id timeuuid PRIMARY KEY,
... added_date timestamp,
... title text
... ) WITH additional_write_policy = '99p'
... AND bloom_filter_fp_chance = 0.01
... AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
... AND cdc = false
... AND comment = ''
... AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
... AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
... AND crc_check_chance = 1.0
... AND default_time_to_live = 0
... AND extensions = {}
... AND gc_grace_seconds = 864000
... AND max_index_interval = 2048
... AND memtable_flush_period_in_ms = 0
... AND min_index_interval = 128
... AND read_repair = 'BLOCKING'
... AND speculative_retry = '99p';
SyntaxException: Invalid SQL Statement. syntax error, unexpected '}', expecting SCONST
CREATE TABLE killrvideo.videos (
video_id timeuuid PRIMARY KEY,
added_date timestamp,
title text
) WITH additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND extensions = {}
^
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p';
(ql error -11)
Are these optional parameters after the CREATE TABLE not supported in YugabyteDB? (They were pulled from describing the keyspace killrvideo.)
Not sure what I am missing here. Any help is really appreciated.
Please have a look at: https://docs.yugabyte.com/latest/api/ycql/ddl_create_table/#table-properties-1
You can remove everything starting from WITH, which will probably allow you to create the table.
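For example, a trimmed version of your statement with the whole WITH clause dropped (a sketch that keeps only the column definitions from your DDL):

CREATE TABLE killrvideo.videos (
    video_id timeuuid PRIMARY KEY,
    added_date timestamp,
    title text
);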
Taking this statement from the docs:
The other YCQL table properties are allowed in the syntax but are currently ignored internally (have no effect).
(Here, "other" means the properties outside the limited set of table properties that the YCQL engine actually implements.)
The probable reason you are getting this error is that you are using properties that are specific to stock Cassandra and are not implemented in YugabyteDB, because it uses a different storage layer.
We're currently upgrading Cassandra (2.x to 3.11.1), and when I exported the data as plain text (as prepared INSERT statements) and checked the file size, I was shocked.
The actual data size as text was 11.7 GB.
The actual size of all .db files on disk is 127 GB.
All keyspaces are configured with SizeTieredCompactionStrategy compaction and LZ4 compression:
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
So why are the files on the disk 10x bigger than the real data? And how do I shrink those files to reflect (kinda) the real data size?
Just a note: all data is simple time-series with timestamp and values (min, max, avg, count, strings, ...)
The schema:
CREATE TABLE prod.data (
datainput bigint,
aggregation int,
timestamp bigint,
avg double,
count double,
flags int,
max double,
min double,
sum double,
val_d double,
val_l bigint,
val_str text,
PRIMARY KEY (datainput, aggregation, timestamp)
) WITH CLUSTERING ORDER BY (aggregation ASC, timestamp ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 1.0
AND speculative_retry = '99PERCENTILE';
Thanks all!
UPDATE: added the schema and fixed the Cassandra version (3.1 => 3.11.1).
So I am using the following query to insert data into my column family:
INSERT INTO airports (apid, name, iata, icao, x, y, elevation, code, name, oa_code, dst, cid, name, timezone, tz_id) VALUES (12012,'Ararat Airport','ARY','YARA','142.98899841308594','-37.30939865112305',1008,{ code: 'AS', name: 'Australia', oa_code: 'AU', dst: 'U',city: { cid: 1, name: '', timezone: '', tz_id: ''}});
Now I'm getting the error Unmatched column names/values, even though this is my current model for the airports column family:
CREATE TYPE movielens.cityudt (
cid varint,
name text,
timezone varint,
tz_id text
);
CREATE TYPE movielens.countryudt (
code text,
name text,
oa_code text,
dst text,
city frozen<cityudt>
);
CREATE TABLE movielens.airports (
apid varint PRIMARY KEY,
country frozen<countryudt>,
elevation varint,
iata text,
icao text,
name text,
x varint,
y varint
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
But I can't see the problem with the insert! Can someone help me figure out where I am going wrong?
Ok, so I did manage to get this to work after adjusting your x and y columns to doubles:
INSERT INTO airports (apid, name, iata, icao, x, y, elevation, country)
VALUES (12012,'Ararat Airport','ARY','YARA',142.98899841308594,-37.30939865112305,
1008,{ code: 'AS', name: 'Australia', oa_code: 'AU', dst: 'U',
city:{ cid: 1, name: '', timezone: 0, tz_id: ''}});
cassdba#cqlsh:stackoverflow> SELECT * FROM airports WHERE apid=12012;
apid | country | elevation | iata | icao | name | x | y
-------+------------------------------------------------------------------------------------------------------------+-----------+------+------+----------------+---------+----------
12012 | {code: 'AS', name: 'Australia', oa_code: 'AU', dst: 'U', city: {cid: 1, name: '', timezone: 0, tz_id: ''}} | 1008 | ARY | YARA | Ararat Airport | 142.989 | -37.3094
(1 rows)
Remember that VARINTs don't take single quotes (like timezone).
Also, you were specifying each UDT field as its own column, when you just needed to specify country in your column list (as you mentioned).
When inserting into a frozen UDT, you do not list its individual fields as separate columns in the INSERT; therefore, to fix the query I removed all of the UDT field names after the elevation column and added country.
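For reference, here is a sketch of the airports table with x and y changed to double. Changing a column's type in place is generally not possible in Cassandra 3.x, so this assumes dropping and recreating the table; the WITH options are left out for brevity:

CREATE TABLE movielens.airports (
    apid varint PRIMARY KEY,
    country frozen<countryudt>,
    elevation varint,
    iata text,
    icao text,
    name text,
    x double,
    y double
);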
I would like to insert experimental data into Cassandra where each value has a precision of 15 decimal places. A sample dataset is as follows:
+------------------+-------------------+
| Sampling_Rate | Value1 |
+------------------+-------------------+
| 2.48979187011719 | 0.144110783934593 |
+------------------+-------------------+
I would like to see Sampling_Rate as an epoch time (i.e. 1970-01-01 00:00:02.48979187011719+0000), and Value1 stored with its full precision.
For this, I inserted the data into the following table (shown as DESCRIBE TABLE output):
CREATE TABLE project_fvag.temp (
sampling_rate timestamp PRIMARY KEY,
value1 double ) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
I also changed the cqlshrc file, increasing the precision for both float and double, and changed the datetimeformat:
datetimeformat = %Y-%m-%d %H:%M:%S.%.15f%z
float_precision = 5
double_precision = 15
In spite of these changes, the result is stored/displayed with only 6 decimal places for both the timestamp and the value. What would be a better strategy to store and see the data as I expect?
For the sampling rate: since you set it up as a timestamp, Cassandra will store it with millisecond precision. One way around this would be to store it as decimal.
The same applies to value1: recreate your table with decimal instead of double for value1.
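A minimal sketch of that suggestion, reusing your table name but storing the raw epoch value and the measurement as decimal so neither is truncated (WITH options omitted; this is one possible model, not the only option):

CREATE TABLE project_fvag.temp (
    sampling_rate decimal PRIMARY KEY,
    value1 decimal
);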
I'm currently trying to cache a lot of stuff in my web game.
It's kind of a speedrun game, where you have specific sections, and it saves the duration of each section in a dict.
So right now, I have a pretty long dict:
dict = [{'account_id': '10', 'Damage': 4874, 'duration': 50.020756483078},
{'account_id': '10', 'Damage': 5920, 'duration': 20.020756483078},
{'account_id': '10', 'Damage': 2585, 'duration': 30.02078},
{'account_id': '4', 'Damage': 3145, 'duration': 21.020756483078},
{'account_id': '4', 'Damage': 4202, 'duration': 60.020756483078},
{'account_id': '4', 'Damage': 5252, 'duration': 66.020756483078}]
(It's much more, up to 10 sections per account_id, but I just created an example to use for this question.)
Then we need to assign those times to an account_id:
enterDict = {}
for x in dict:
    enterDict[x["account_id"]] = x
But when I try to get the times from the cache, via
account_id = "4"
EnterInfo = enterDict[account_id]
print(EnterInfo)
It only returns one entry:
{'Damage': 5252, 'account_id': '4', 'duration': 66.020756483078}
As an addition:
If we resolve this, is there a way I can give them an order? The dicts are messing up the ordering for me.
It should go from the lowest duration to the highest, since that's the correct order.
So I could just use [0] for the lowest duration of 21, [1] for 60, and [2] for 66. I wouldn't mind if there were a new key/value pair, e.g. "order": "0", "order": "1", and "order": "2":
[{'account_id': '4', 'Damage': 3145, 'duration': 21.020756483078},
 {'account_id': '4', 'Damage': 4202, 'duration': 60.020756483078},
 {'account_id': '4', 'Damage': 5252, 'duration': 66.020756483078}]
What you actually want to do is store a list of dictionaries as the value behind each 'account_id' key. Right now, with each addition you are replacing the previous value for the key instead of appending to it.
See 1 for an example of how to initialize lists within a dictionary.
See 2 for an example of how to order a list of dicts by a certain value. However, I could also suggest a pandas DataFrame for this job.
This could be a solution:
from collections import defaultdict
from operator import itemgetter
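# group every entry that shares an account_id into one list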
enterDict = defaultdict(list)
for item in dict:
enterDict[item['account_id']].append(item)
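# then sort each account's entries by duration, lowest first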
sortDict = defaultdict(list)
for key_of_dict in enterDict:
sorted_list = sorted(enterDict[key_of_dict], key=itemgetter('duration'))
sortDict[key_of_dict] = sorted_list
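With this, each account's entries end up sorted by ascending duration, so index [0] is the fastest section, as you asked. A quick check against the sample data above (the expected output is shown as comments):

print(sortDict['4'][0])
# {'account_id': '4', 'Damage': 3145, 'duration': 21.020756483078}
print([entry['duration'] for entry in sortDict['4']])
# [21.020756483078, 60.020756483078, 66.020756483078]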