My question is about the logs of queries run in Cassandra.
I have a Cassandra cluster. If I run a query on it that takes a long time (say, 1 hour) to complete, is there any way to trace the status of that query, and to do so without using any Cassandra API?
What I found is that we can run 'tracing on;' in cqlsh, and then for any query I run, I get a detailed step-by-step status of that query.
For example :
cqlsh> use demo;
cqlsh:demo> CREATE TABLE test ( a int PRIMARY KEY, b text );
cqlsh:demo> tracing on;
Now tracing requests.
cqlsh:demo> INSERT INTO test (a, b) VALUES (1, 'example');
Unable to complete request: one or more nodes were unavailable.
Tracing session: 4dc5f950-6625-11e3-841a-b7e2b08eed3e
activity | timestamp | source | source_elapsed
--------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 13:10:15,627 | 192.168.171.87 | 0
Parsing INSERT INTO test (a, b) VALUES (1, 'example'); | 13:10:15,640 | 192.168.171.87 | 13770
Preparing statement | 13:10:15,657 | 192.168.171.87 | 30090
Determining replicas for mutation | 13:10:15,669 | 192.168.171.87 | 42689
Unavailable | 13:10:15,682 | 192.168.171.87 | 55131
Request complete | 13:10:15,682 | 192.168.171.87 | 55303
But this does not meet my requirement, as I need to see the status of queries that have already been run.
Can anyone suggest a solution?
Thanks
Saurabh
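Take a look at the events and sessions tables in the system_traces keyspace. Note that only queries that were traced (for example with tracing on in cqlsh, or sampled via nodetool settraceprobability) are recorded there, and trace data expires after 24 hours by default.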
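As a rough illustration of how the two tables relate, here is a minimal Python sketch that joins rows already fetched from system_traces (how you fetch them is out of scope). It assumes each `sessions` row carries `session_id`, `request` and `duration` (microseconds), and each `events` row carries `session_id`, `activity`, `source` and `source_elapsed`; the sample rows are hypothetical and only loosely modelled on the trace in the question.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    # A subset of the system_traces.sessions columns.
    session_id: str
    request: str
    duration: int  # microseconds


@dataclass
class Event:
    # A subset of the system_traces.events columns.
    session_id: str
    activity: str
    source: str
    source_elapsed: int  # microseconds since the request started on `source`


def summarize(sessions, events):
    """Join events to their session; keeps events in the order they were supplied."""
    by_session = defaultdict(list)
    for ev in events:
        by_session[ev.session_id].append(ev)

    summaries = []
    for s in sessions:
        evs = by_session.get(s.session_id, [])
        summaries.append({
            "session_id": s.session_id,
            "request": s.request,
            "duration_us": s.duration,
            "nodes": sorted({ev.source for ev in evs}),  # every node that logged an event
            "steps": [ev.activity for ev in evs],
        })
    return summaries


# Hypothetical sample rows (shortened session id).
SAMPLE_SESSIONS = [Session("4dc5f950", "execute_cql3_query", 55303)]
SAMPLE_EVENTS = [
    Event("4dc5f950", "Preparing statement", "192.168.171.87", 30090),
    Event("4dc5f950", "Determining replicas for mutation", "192.168.171.87", 42689),
]

for summary in summarize(SAMPLE_SESSIONS, SAMPLE_EVENTS):
    print(summary)
```

Keeping the join on the client side mirrors the table layout: events are keyed by session_id, so one session row fans out to many event rows.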
I am trying to measure query execution time in order to compare Cassandra with other SQL and NoSQL databases. For this I have used the tracing on command, but I can't tell which value is the query execution time, and I can't find precise information about it online. For table creation, I am using this query:
Tracing on;
CREATE TABLE statistics(country_name text, dt date, confirmed_cases bigint, deaths bigint,
PRIMARY KEY(country_name, deaths))with clustering order by (deaths DESC);
cqlsh is showing the result like this:
activity | timestamp | source | source_elapsed | client
---------------------------------------------------+----------------------------+------------+----------------+-----------------------------------------
Execute CQL3 query | 2022-05-10 10:38:06.084000 | 172.24.2.2 | 0 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
Parsing CREATE TABLE statistics(country_name text, dt date, confirmed_cases bigint, deaths bigint, PRIMARY KEY(country_name, deaths))with clustering order by (deaths DESC); [CoreThread-6] | 2022-05-10 10:38:06.085000 | 172.24.2.2 | 254 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
Preparing statement [CoreThread-6] | 2022-05-10 10:38:06.085000 | 172.24.2.2 | 457 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
Adding to tables memtable [SchemaUpdatesStage:1] | 2022-05-10 10:38:06.092000 | 172.24.2.2 | 8175 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
Adding to keyspaces memtable [SchemaUpdatesStage:1] | 2022-05-10 10:38:06.092000 | 172.24.2.2 | 8244 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
Adding to columns memtable [SchemaUpdatesStage:1] | 2022-05-10 10:38:06.092000 | 172.24.2.2 | 8320 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
Request complete | 2022-05-10 10:38:06.141445 | 172.24.2.2 | 57445 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
So which value is the actual execution time of the table creation query? I also need to trace the execution time of an insert query and of retrieving the highest value of a column per partition. Please help!
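Note: the source_elapsed column is the time elapsed on the source node, in microseconds, from the start of the request until that event.
source_elapsed is cumulative per node, so the Request complete row gives the total execution time on the coordinator: 57445 µs, or about 57 ms. This matches the difference between that row's timestamp and the first event's timestamp (10:38:06.141445 - 10:38:06.084000 = 0.057445 s):
Request complete | 2022-05-10 10:38:06.141445 | 172.24.2.2 | 57445 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0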
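To make the arithmetic concrete, here is a small Python sketch that reads the total from a few rows copied from the trace above in two ways: from the source_elapsed of the Request complete row, and from the difference between the last and first timestamps. The tuple layout (activity, timestamp, source, source_elapsed) is an assumption of this sketch, not a driver API.

```python
from datetime import datetime, timedelta

# A subset of the trace rows above: (activity, timestamp, source, source_elapsed in µs)
TRACE = [
    ("Execute CQL3 query", "2022-05-10 10:38:06.084000", "172.24.2.2", 0),
    ("Preparing statement [CoreThread-6]", "2022-05-10 10:38:06.085000", "172.24.2.2", 457),
    ("Request complete", "2022-05-10 10:38:06.141445", "172.24.2.2", 57445),
]

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def total_from_elapsed(rows):
    """source_elapsed is cumulative per node, so 'Request complete' holds the total."""
    for activity, _ts, _source, elapsed in rows:
        if activity == "Request complete":
            return elapsed
    raise ValueError("no 'Request complete' row in trace")


def total_from_timestamps(rows):
    """Microseconds between the first and the last event timestamps."""
    start = datetime.strptime(rows[0][1], TS_FORMAT)
    end = datetime.strptime(rows[-1][1], TS_FORMAT)
    return (end - start) // timedelta(microseconds=1)


print(total_from_elapsed(TRACE))     # 57445
print(total_from_timestamps(TRACE))  # 57445
```

The same reading applies to an INSERT or SELECT trace: take the Request complete row on the coordinator.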
I have a table with a structure like this:
CREATE TABLE kaefko.se_vi_f55dfeebae00d2b3 (
value text PRIMARY KEY,
id text,
popularity bigint);
With data that looks like this:
value | id | popularity
--------+------------------+------------
rally | 4eff16cb91f96cd6 | 2
reddit | 11aa39686ed66ba5 | 3
red | 552d7e95af481415 | 1
really | 756bfa499965863c | 1
right | c5850c6b08f7966b | 1
redis | 7f1d251f399442d7 | 1
And I've created a materialized view that should sort these values by popularity, from highest to lowest:
CREATE MATERIALIZED VIEW kaefko.se_vi_f55dfeebae00d2b3_by_popularity AS
SELECT *
FROM kaefko.se_vi_f55dfeebae00d2b3
WHERE popularity IS NOT null
PRIMARY KEY (value, popularity)
WITH CLUSTERING ORDER BY (popularity DESC);
But the data in the materialized view looks like this:
value | popularity | id
--------+------------+------------------
rally | 2 | 4eff16cb91f96cd6
reddit | 3 | 11aa39686ed66ba5
really | 1 | 756bfa499965863c
right | 1 | c5850c6b08f7966b
redis | 1 | 7f1d251f399442d7
As you can see, there are two main issues:
* The data is not sorted as defined in the materialized view.
* Only part of the data appears in the materialized view.
I'm not very experienced with Cassandra and I've already spent hours trying to find out why this happens, to no avail. Could somebody please help me? Thank you <3
I'm using ScyllaDB 4.1.9-0 and cqlsh shows this:
[cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.3.1 | Native protocol v4]
Alex's comment is 100% correct: the ordering applies only within a partition.
PRIMARY KEY (value, popularity)
WITH CLUSTERING ORDER BY (popularity DESC);
This means that popularity is sorted in descending order only among rows that share the same value in the 'value' column. If I were to alter the data you used to illustrate this, you would get the following:
value | popularity | id
--------+------------+------------------
rally | 3 | 4eff16cb91f96cd6
rally | 2 | 11aa39686ed66ba5
really | 3 | 756bfa499965863c
really | 2 | c5850c6b08f7966b
really | 1 | 7f1d251f399442d7
The order is on a per partition key basis, not globally ordered.
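The same behaviour can be modelled in a few lines of plain Python. This is only a model, not how Cassandra or ScyllaDB store data: it groups the hypothetical rows from the altered example above by the partition key (value) and sorts each group by popularity descending, which is all that CLUSTERING ORDER BY (popularity DESC) promises. The groups themselves come out in first-seen dict order here, standing in for the partition order you do not control.

```python
from collections import defaultdict

# (value, popularity, id) rows from the altered example above, in arbitrary order
ROWS = [
    ("really", 1, "7f1d251f399442d7"),
    ("rally", 2, "11aa39686ed66ba5"),
    ("really", 3, "756bfa499965863c"),
    ("rally", 3, "4eff16cb91f96cd6"),
    ("really", 2, "c5850c6b08f7966b"),
]


def materialize(rows):
    """Group by partition key, then order each group by popularity DESC."""
    partitions = defaultdict(list)
    for value, popularity, row_id in rows:
        partitions[value].append((popularity, row_id))
    # Only rows *inside* a partition are sorted; partition order is left untouched.
    return {
        key: sorted(group, key=lambda r: r[0], reverse=True)
        for key, group in partitions.items()
    }


for value, group in materialize(ROWS).items():
    for popularity, row_id in group:
        print(value, popularity, row_id)
```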
I created a table in Cassandra to monitor inserts from an application.
My partition key is an int composed of year+month+day, my clustering key is a timestamp, followed by my username and some other fields.
I would like to display the last 5 inserts, but it seems that the partition key takes precedence over the "order by desc".
How can I get the correct result? Normally the clustering key determines the order, so why do I get this result? (Thanks in advance)
Information:
Query: select tsp_insert, txt_name from ks_myKeyspace.myTable limit 5;
Result:
idt_day | tsp_insert | txt_name
----------+--------------------------+----------
20161028 | 2016-10-28 15:21:09+0000 | Jean
20161028 | 2016-10-28 15:21:01+0000 | Michel
20161028 | 2016-10-28 15:20:44+0000 | Quentin
20161031 | 2016-10-31 09:24:32+0000 | Jacquie
20161031 | 2016-10-31 09:23:32+0000 | Gabriel
Wanted:
idt_day | tsp_insert | txt_name
----------+--------------------------+----------
20161031 | 2016-10-31 09:24:32+0000 | Jacquie
20161031 | 2016-10-31 09:23:32+0000 | Gabriel
20161028 | 2016-10-28 15:21:09+0000 | Jean
20161028 | 2016-10-28 15:21:01+0000 | Michel
20161028 | 2016-10-28 15:20:44+0000 | Quentin
My table:
CREATE TABLE ks_myKeyspace.myTable(
idt_day int,
tsp_insert timestamp,
txt_name text, ...
PRIMARY KEY (idt_day, tsp_insert)) WITH CLUSTERING ORDER BY (tsp_insert DESC);
Ultimately, you are seeing the current order because you are not using a WHERE clause. You can see what's going on if you use the token function on your partition key:
aploetz@cqlsh:stackoverflow> SELECT idt_day,tsp_insert,token(idt_day),txt_name FROM mytable ;
idt_day | tsp_insert | system.token(idt_day) | txt_name
----------+---------------------------------+-----------------------+----------
20161028 | 2016-10-28 15:21:09.000000+0000 | 810871225231161248 | Jean
20161028 | 2016-10-28 15:21:01.000000+0000 | 810871225231161248 | Michel
20161028 | 2016-10-28 15:20:44.000000+0000 | 810871225231161248 | Quentin
20161031 | 2016-10-31 09:24:32.000000+0000 | 5928478420752051351 | Jacquie
20161031 | 2016-10-31 09:23:32.000000+0000 | 5928478420752051351 | Gabriel
(5 rows)
Results in Cassandra CQL will always come back in order of the hashed token value of the partition key (which you can see by using token). Within each partition, your CLUSTERING ORDER will be enforced.
That's key to understand: result set ordering in Cassandra can only be enforced within a partition. You have no control over the order in which the partition keys come back.
In short, use a WHERE clause on your idt_day and you'll see the order you expect.
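A rough Python model of the two queries: without a WHERE clause, partitions are returned in token order and rows within each partition follow the clustering order; with WHERE idt_day = ?, only one partition is read. The stand_in_token function is a hypothetical substitute (a slice of SHA-256, not Cassandra's Murmur3 partitioner), so its tokens and partition order will not match the cqlsh output above; only the shape of the behaviour is the point.

```python
import hashlib
from collections import defaultdict

# (idt_day, tsp_insert, txt_name) rows from the question, in arbitrary order
ROWS = [
    (20161028, "2016-10-28 15:20:44", "Quentin"),
    (20161031, "2016-10-31 09:23:32", "Gabriel"),
    (20161028, "2016-10-28 15:21:09", "Jean"),
    (20161031, "2016-10-31 09:24:32", "Jacquie"),
    (20161028, "2016-10-28 15:21:01", "Michel"),
]


def stand_in_token(idt_day: int) -> int:
    """Hypothetical token: a signed 64-bit slice of SHA-256, NOT Murmur3."""
    digest = hashlib.sha256(str(idt_day).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def _partitions(rows):
    parts = defaultdict(list)
    for row in rows:
        parts[row[0]].append(row)
    return parts


def in_clustering_order(rows):
    """CLUSTERING ORDER BY (tsp_insert DESC): newest first within one partition."""
    return sorted(rows, key=lambda r: r[1], reverse=True)


def select_all(rows, limit):
    """Model of SELECT ... FROM mytable LIMIT n (no WHERE clause)."""
    parts = _partitions(rows)
    result = []
    for idt_day in sorted(parts, key=stand_in_token):  # partitions in token order
        result.extend(in_clustering_order(parts[idt_day]))
    return result[:limit]


def select_day(rows, idt_day):
    """Model of SELECT ... FROM mytable WHERE idt_day = ?"""
    return in_clustering_order(_partitions(rows).get(idt_day, []))


print(select_all(ROWS, 5))
print(select_day(ROWS, 20161031))
```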
It seems to me that you have misunderstood how ordering works. Partition keys are not used for ordering data; they are used only to determine where your data lives in the cluster, specifically on which node. Moreover, order only matters inside a partition.
From your application's point of view, the order of your query results is effectively arbitrary, because it follows the partition tokens rather than your data. You should avoid selecting without partition-key restrictions; such queries don't scale.
You can, however, change your queries to perform one SELECT per day. Each query then returns data already ordered by your clustering key, and you control the order of the days yourself by choosing the order in which you query them. As a side note, this would also be faster, because you could query multiple partitions in parallel.
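A minimal Python sketch of that approach: walk backwards one day (partition) at a time, starting from the newest, and stop once enough rows are collected. fetch_day is a hypothetical stand-in for running SELECT tsp_insert, txt_name FROM myTable WHERE idt_day = ? LIMIT ? with your driver; here it just reads an in-memory dict. The max_days bound is also an assumption, and the parallel variant is left out for brevity.

```python
from datetime import date, timedelta

# Hypothetical in-memory stand-in for the table: idt_day -> rows already in
# clustering order (tsp_insert DESC), as a single-partition query returns them.
TABLE = {
    20161031: [("2016-10-31 09:24:32", "Jacquie"), ("2016-10-31 09:23:32", "Gabriel")],
    20161028: [("2016-10-28 15:21:09", "Jean"), ("2016-10-28 15:21:01", "Michel"),
               ("2016-10-28 15:20:44", "Quentin")],
}


def fetch_day(idt_day, limit):
    """Stand-in for one single-partition SELECT with a LIMIT."""
    return TABLE.get(idt_day, [])[:limit]


def day_key(d: date) -> int:
    """Encode a date as the yyyymmdd int used for idt_day."""
    return d.year * 10000 + d.month * 100 + d.day


def last_inserts(today: date, n: int, max_days: int = 30):
    """Query newest day first until n rows are found or max_days is exhausted."""
    rows = []
    for offset in range(max_days):
        idt_day = day_key(today - timedelta(days=offset))
        for tsp_insert, txt_name in fetch_day(idt_day, n - len(rows)):
            rows.append((idt_day, tsp_insert, txt_name))
        if len(rows) >= n:
            break
    return rows


for row in last_inserts(date(2016, 10, 31), 5):
    print(row)
```

Because each day is its own query, the "last 5" order comes from the loop over days plus the clustering order inside each day, which is exactly the ordering the table can guarantee.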
I am trying to evaluate the number of tombstones being created in one of the tables in our application. For that I am using nodetool cfstats. Here is what I am doing:
create table demo.test(a int, b int, c int, primary key (a));
insert into demo.test(a, b, c) values(1,2,3);
Now I run the same insert again, so I expect 3 tombstones to be created. But when I run cfstats for this column family, I still see no tombstones.
nodetool cfstats demo.test
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0
Then I tried deleting the record, but I still don't see any tombstones being created. Is there anything I am missing here? Please advise.
A few other details:
* We are using version 2.1.1 of the Java driver
* We are running against Cassandra 2.1.0
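For tombstone counts on a query, your best bet is to enable tracing. This gives you an in-depth history of the query, including how many tombstones had to be read to complete it. It won't give you the total tombstone count, but it is most likely more relevant for performance tuning. (The "Average tombstones per slice" figure in cfstats is likewise a read-side metric: it describes tombstones scanned by reads in the last five minutes, so it stays at 0.0 until you read from the table.)
In cqlsh you can enable tracing with: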
cqlsh> tracing on;
Now tracing requests.
cqlsh> SELECT * FROM ascii_ks.ascii_cs where pkey = 'One';
pkey | ckey1 | data1
------+-------+-------
One | One | One
(1 rows)
Tracing session: 2569d580-719b-11e4-9dd6-557d7f833b69
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------+--------------+-----------+----------------
execute_cql3_query | 08:26:28,953 | 127.0.0.1 | 0
Parsing SELECT * FROM ascii_ks.ascii_cs where pkey = 'One' LIMIT 10000; | 08:26:28,956 | 127.0.0.1 | 2635
Preparing statement | 08:26:28,960 | 127.0.0.1 | 6951
Executing single-partition query on ascii_cs | 08:26:28,962 | 127.0.0.1 | 9097
Acquiring sstable references | 08:26:28,963 | 127.0.0.1 | 10576
Merging memtable contents | 08:26:28,963 | 127.0.0.1 | 10618
Merging data from sstable 1 | 08:26:28,965 | 127.0.0.1 | 12146
Key cache hit for sstable 1 | 08:26:28,965 | 127.0.0.1 | 12257
Collating all results | 08:26:28,965 | 127.0.0.1 | 12402
Request complete | 08:26:28,965 | 127.0.0.1 | 12638
http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
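Because source_elapsed is cumulative per node, the time spent between two steps is the difference between consecutive events from the same node. A small Python sketch that computes those gaps for the SELECT trace above (the tuples are copied from the output; this is not a driver API); for this trace the largest gap, 4316 µs, falls between the Parsing and Preparing statement events.

```python
# (activity, source, source_elapsed in µs), copied from the trace above
TRACE = [
    ("execute_cql3_query", "127.0.0.1", 0),
    ("Parsing SELECT * FROM ascii_ks.ascii_cs where pkey = 'One' LIMIT 10000;", "127.0.0.1", 2635),
    ("Preparing statement", "127.0.0.1", 6951),
    ("Executing single-partition query on ascii_cs", "127.0.0.1", 9097),
    ("Acquiring sstable references", "127.0.0.1", 10576),
    ("Merging memtable contents", "127.0.0.1", 10618),
    ("Merging data from sstable 1", "127.0.0.1", 12146),
    ("Key cache hit for sstable 1", "127.0.0.1", 12257),
    ("Collating all results", "127.0.0.1", 12402),
    ("Request complete", "127.0.0.1", 12638),
]


def gaps(rows):
    """Return (from_activity, to_activity, gap_us) for consecutive events on the same node."""
    previous = {}  # source -> (activity, source_elapsed) of its last event
    out = []
    for activity, source, elapsed in rows:
        if source in previous:
            prev_activity, prev_elapsed = previous[source]
            out.append((prev_activity, activity, elapsed - prev_elapsed))
        previous[source] = (activity, elapsed)
    return out


slowest = max(gaps(TRACE), key=lambda g: g[2])
print(f"{slowest[2]} us between {slowest[0]!r} and {slowest[1]!r}")
```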
When saving data to Cassandra, about 70% of the writes take around 4-8 ms, but the other 30% take around 80-90 ms. I'm trying to figure out why some of the requests take so long. I suspect these requests might be going across data centers, but I can't confirm it.
Also, with Astyanax we pin connections to localhost, so the client connects to the local Cassandra coordinator. The primary key used here is a generated UUID.
I would really appreciate it if someone could help with this issue.
Write Consistency: CL_ONE
Read Consistency: CL_LOCAL_QUORUM
Java client: Astyanax 1.56.37
Cassandra version: 1.2.5
Here's the keyspace and table info:
CREATE KEYSPACE grd WITH replication = {
'class': 'NetworkTopologyStrategy',
'HYWRCA02': '2',
'CHRLNCUN': '2'
};
CREATE TABLE route (
routeid uuid PRIMARY KEY,
allowdynamicstickyness boolean,
businesskey uuid,
createdby text,
createdtimestamp timestamp,
datapartitionkeyselectorref text,
deletedby text,
deletedtimestamp timestamp,
envcontext text,
lockedbyuser text,
partner text,
routelocationlatitudeselector double,
routelocationlongitudeselector double,
routelocationmaxdistanceselector double,
routename text,
sequence int,
serviceidentifier text,
stalenessinmins int,
status text,
stickykeyselector text,
tags set<text>,
type text,
updatedby text,
updatedtimestamp timestamp,
versionmapnameref text,
versionselector text
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='ALL' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'LeveledCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
Thanks.
Log in to one of the nodes of your cluster, run cqlsh, and select your keyspace with USE:
[root@yournode ~]# cqlsh
Connected to YourCluster at localhost:9160.
[cqlsh 3.1.2 | Cassandra 1.2.6 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> use yourkeyspacehere ;
Then run the tracing on command in cqlsh:
cqlsh:yourkeyspacehere> tracing on
Now tracing requests.
Then try running different queries on your table with different keys, and check the timings and the IPs of the nodes involved in each query to see whether there is an inter-datacenter connection. A sample output might look like this:
select * from your_table_name_is_here limit 1;
Tracing session: 1ab19ff0-2fa3-11e3-a9aa-2face31554b7
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------+--------------+----------------+----------------
execute_cql3_query | 22:52:12,528 | XXX.XX.XXX.XXX | 0
Parsing select * from your_table_name_is_here limit 1; | 22:52:12,529 | XXX.XX.XXX.XXX | 1108
Peparing statement | 22:52:12,530 | XXX.XX.XXX.XXX | 1555
Determining replicas to query | 22:52:12,530 | XXX.XX.XXX.XXX | 1643
Message received from /XXX.XX.XXX.XXX | 22:52:12,534 | YYY.YY.YYY.YYY | 34
Enqueuing request to /YYY.YY.YYY.YYY | 22:52:12,536 | XXX.XX.XXX.XXX | 7549
Sending message to /YYY.YY.YYY.YYY | 22:52:12,536 | XXX.XX.XXX.XXX | 7812
Executing seq scan across 9 sstables for [min(-9223372036854775808), max(-8721075978151533877)] | 22:52:12,538 | YYY.YY.YYY.YYY | 3609
Scanned 1 rows and matched 1 | 22:52:12,550 | YYY.YY.YYY.YYY | 15977
Enqueuing response to /XXX.XX.XXX.XXX | 22:52:12,550 | YYY.YY.YYY.YYY | 16035
Sending message to /XXX.XX.XXX.XXX | 22:52:12,550 | YYY.YY.YYY.YYY | 16202
Message received from /YYY.YY.YYY.YYY | 22:52:12,557 | XXX.XX.XXX.XXX | 28494
Processing response from /YYY.YY.YYY.YYY | 22:52:12,557 | XXX.XX.XXX.XXX | 28647
Request complete | 22:52:12,556 | XXX.XX.XXX.XXX | 28884
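One way to make that check systematic is to list every node that appears in the source column and compare its data center with the coordinator's. The Python sketch below assumes you build the IP-to-data-center mapping yourself (for example from the Datacenter sections of nodetool status); the addresses are hypothetical placeholders because the real ones are masked above, and the DC names are taken from the keyspace definition in the question.

```python
# Hypothetical mapping from node address to data center; build the real one yourself.
NODE_DC = {
    "10.0.1.11": "HYWRCA02",
    "10.0.2.21": "CHRLNCUN",
}

# (activity, source) pairs from a trace; addresses are hypothetical placeholders.
TRACE = [
    ("execute_cql3_query", "10.0.1.11"),
    ("Sending message to /10.0.2.21", "10.0.1.11"),
    ("Message received from /10.0.1.11", "10.0.2.21"),
    ("Request complete", "10.0.1.11"),
]


def remote_dc_nodes(trace, node_dc):
    """Nodes in the trace outside the coordinator's data center.

    The coordinator is taken to be the source of the first event.
    Nodes missing from node_dc are reported too, since their DC is unknown.
    """
    coordinator = trace[0][1]
    local_dc = node_dc[coordinator]
    sources = {source for _activity, source in trace}
    return sorted(s for s in sources if node_dc.get(s) != local_dc)


print(remote_dc_nodes(TRACE, NODE_DC))  # ['10.0.2.21']
```

If the slow requests consistently list a node from the other data center, that supports the cross-DC suspicion; if they stay local, the latency is coming from somewhere else.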
Hope it helps!