When I try to read from a Cassandra table I get what looks like binary output:
cqlsh 10.243.128.4 --debug -e "select enduser from test.cert limit 2;"
enduser
---------------------------------------
*7UDdnLg\x1135J"\x15%(
\x10\x1c\x1aHa\x7fO\x19)1#3b\x17\x
I am not sure why this is happening. The other fields are displayed correctly.
Table def:
CREATE TABLE test.cert (
enduser text,
cert_id int,
PRIMARY KEY (enduser, cert_id)
) WITH CLUSTERING ORDER BY (cert_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 1024
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
I have tried forcing UTF-8 encoding on the command line, but it did not help.
The CQL text data type is a UTF-8 encoded string, so cqlsh just displays whatever is stored in the cell.
It looks like you've stored data that isn't valid UTF-8, which is why cqlsh falls back to hex escapes. If you post examples of what the data should be versus how it is displayed, it will provide clues as to what's going on. Cheers!
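If you want to inspect the raw bytes from cqlsh, the blob conversion functions can help. A minimal sketch against the table from the question:

```cql
-- Cast the text column to blob to see the raw bytes as hex.
-- If the hex is not valid UTF-8, whatever wrote the row stored
-- binary (or differently encoded) data in the text column.
SELECT cert_id, textAsBlob(enduser) FROM test.cert LIMIT 2;
```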
Does it make sense to reduce gc_grace_seconds to 0 (or some other very low value) if the table only contains TTL'ed data (with no manual deletes)? The table has default_time_to_live set to 30 days. Also, as mentioned here:
In a single-node cluster, this property can safely be set to zero. You
can also reduce this value for tables whose data is not explicitly
deleted — for example, tables containing only data with TTL set,
More details of the schema.
CREATE TABLE Foo (
user_uuid uuid,
ts bigint,
... //skipped a few columns
PRIMARY KEY (user_uuid, ts, event_uuid)
) WITH CLUSTERING ORDER BY (ts DESC, event_uuid ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '24', 'compaction_window_unit': 'HOURS', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 2592000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
You need to be careful: with gc_grace_seconds = 0 you effectively disable hint collection, so if a node is down for even 5 minutes, you'll need to run a repair. In Cassandra 3.0, hints obey gc_grace_seconds: if it is shorter than the maximum hint window, hints will be collected only for that period. But you can reduce the value to several hours if necessary, as the linked documentation suggests.
See this very good blog post on that topic.
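If you do decide to lower it, a few hours is a reasonable compromise between tombstone cleanup and the hint window. A minimal sketch against the table from the question:

```cql
-- Keep gc_grace_seconds at 3 hours instead of 0 so that short node
-- outages can still be covered by hints rather than a full repair.
ALTER TABLE Foo WITH gc_grace_seconds = 10800;
```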
I need to select a distinct count from a table in Cassandra.
As I understand it, a direct distinct count is not supported in Cassandra, and neither are nested queries as in an RDBMS.
select count(*) from (select distinct key_part_one from stackoverflow_composite) as count;
SyntaxException: line 1:21 no viable alternative at input '(' (select count(*) from [(]...)
What are the ways to get it? Can I get it directly from Cassandra, or do I need to use add-on tools/languages?
Below is my CREATE TABLE statement.
CREATE TABLE nishant_ana.ais_profile_table (
profile_key text,
profile_id text,
last_update_day date,
last_transaction_timestamp timestamp,
last_update_insertion_timestamp timeuuid,
profile_data blob,
PRIMARY KEY ((profile_key, profile_id), last_update_day)
) WITH CLUSTERING ORDER BY (last_update_day DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
I have just started using Cassandra.
From Cassandra itself you can only do select distinct partition_key from ....
If you need something like a distinct count, you can use Spark with the Spark Cassandra Connector. It will work, but don't expect truly real-time answers, as it needs to read the necessary data from all nodes and then compute the result.
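For completeness, the only DISTINCT form Cassandra accepts here targets the full partition key; the count itself then has to happen client-side (or in Spark). A sketch against the schema above:

```cql
-- Valid: returns one row per partition. DISTINCT requires the full
-- composite partition key, and counting the rows must be done
-- client-side, since COUNT over a subquery is not supported.
SELECT DISTINCT profile_key, profile_id
FROM nishant_ana.ais_profile_table;
```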
I am inserting data into the table below with a TTL of 2 days. SSTables with a maximum timestamp older than 2 days should have been dropped, but that is not happening. There are no deletes and no updates. The compaction window size is 1 hour and gc_grace_seconds is 7200. read_repair_chance has been set to 0.0.
CREATE TABLE events."290" (
key text PRIMARY KEY,
raw_log text
) WITH COMPACT STORAGE
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'HOURS', 'max_threshold': '64', 'min_threshold': '32'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.SnappyCompressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 7200
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = 'NONE';
On restarting Cassandra, the expired SSTables were removed as expected.
Should I be setting unchecked_tombstone_compaction = true?
Ideally I would like the SSTables to be dropped outright rather than going through multiple compactions before removal.
Earlier I was using DateTieredCompactionStrategy on Cassandra 2.2 and everything worked fine. After upgrading to Cassandra 3.0.13 I observed the behaviour above; changing the compaction strategy from DTCS to TWCS made no difference.
If the expired SSTables had not been removed on restart, I would have suspected overlapping SSTables: https://issues.apache.org/jira/browse/CASSANDRA-13418
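Enabling unchecked_tombstone_compaction, as the question suggests, is worth trying: it lets the strategy compact a single SSTable on its own once its tombstone ratio passes the threshold, rather than waiting for enough SSTables to accumulate. A sketch, not verified on 3.0.13 specifically:

```cql
-- Allow single-SSTable tombstone compactions so fully expired
-- SSTables can be dropped without waiting for a multi-SSTable pass.
ALTER TABLE events."290" WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': '1',
    'unchecked_tombstone_compaction': 'true'
};
```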
data read per second
I have a three-node Cassandra cluster. When I query it from multiple threads, the I/O load is very high. The cluster holds about 80 GB of data per node. I use TimeWindowCompactionStrategy with a ten-hour window; one SSTable is about 1 GB. Can somebody help me with this? Thank you.
Data arrives at a rate of about 10,000 records per second, and the cluster holds about 10 billion records.
Below is the schema information.
CREATE TABLE point_warehouse.point_period (
point_name text,
year text,
time timestamp,
period int,
time_end timestamp,
value text,
PRIMARY KEY ((point_name, year), time)
) WITH CLUSTERING ORDER BY (time DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '10', 'compaction_window_unit': 'HOURS', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 2592000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
and the query is: SELECT * FROM point_period WHERE point_name = ? AND year = '2017' AND time > '2017-05-23 12:53:24' ORDER BY time ASC LIMIT 1 ALLOW FILTERING
When I execute this query concurrently, the I/O load becomes extremely high, around 200 MB/s. Thank you.
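One thing to check: the table clusters by time DESC, so ORDER BY time ASC forces a reversed slice over the partition. If the newest matching row (rather than the oldest) is acceptable, querying in the native clustering order avoids the reversal. This is a sketch, not a drop-in replacement, since it returns a different row:

```cql
-- Reads in the table's natural clustering order (time DESC),
-- returning the newest row after the bound instead of the oldest.
SELECT * FROM point_warehouse.point_period
WHERE point_name = ? AND year = '2017'
  AND time > '2017-05-23 12:53:24'
LIMIT 1;
```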
What is the difference between the varchar and text data types in Cassandra CQL?
https://docs.datastax.com/en/cql/3.0/cql/cql_reference/cql_data_types_c.html
When I create a table with a field of data type varchar, it is created as text:
CREATE TABLE test (
    empID int,
    first_name varchar,
    last_name varchar,
    PRIMARY KEY (empID)
);
DESCRIBE TABLE test gives me the result below.
CREATE TABLE test (
empid int PRIMARY KEY,
first_name text,
last_name text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
[cqlsh 5.0.1 | Cassandra 3.0.7.1158 | DSE 5.0.0 | CQL spec 3.4.0 |
Native protocol v4]
Per the CQL data types documentation, both text and varchar are UTF-8 encoded strings; varchar is just an alias for text.
So both are one and the same.