I am trying to measure query execution time in order to compare it with other SQL and NoSQL databases. For this I have used the TRACING ON command, but I can't tell which value is the query execution time, and I can't find exact information on the internet. For table creation, I am using this query:
Tracing on;
CREATE TABLE statistics(country_name text, dt date, confirmed_cases bigint, deaths bigint,
PRIMARY KEY(country_name, deaths))with clustering order by (deaths DESC);
cqlsh is showing the result like this:
activity | timestamp | source | source_elapsed | client
---------+-----------+--------+----------------+-------
Execute CQL3 query | 2022-05-10 10:38:06.084000 | 172.24.2.2 | 0 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
Parsing CREATE TABLE statistics(country_name text, dt date, confirmed_cases bigint, deaths bigint, PRIMARY KEY(country_name, deaths))with clustering order by (deaths DESC); [CoreThread-6] | 2022-05-10 10:38:06.085000 | 172.24.2.2 | 254 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
Preparing statement [CoreThread-6] | 2022-05-10 10:38:06.085000 | 172.24.2.2 | 457 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
Adding to tables memtable [SchemaUpdatesStage:1] | 2022-05-10 10:38:06.092000 | 172.24.2.2 | 8175 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
Adding to keyspaces memtable [SchemaUpdatesStage:1] | 2022-05-10 10:38:06.092000 | 172.24.2.2 | 8244 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
Adding to columns memtable [SchemaUpdatesStage:1] | 2022-05-10 10:38:06.092000 | 172.24.2.2 | 8320 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
Request complete | 2022-05-10 10:38:06.141445 | 172.24.2.2 | 57445 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0
So which one is the actual execution time of the table creation query? I also need to trace the execution time of an insert query and of retrieving the highest value of a column per partition. Please help!
Note: the source_elapsed column value is the elapsed time of the event on the source node, in microseconds.
source_elapsed is the cumulative execution time on a specific node, so the last row of the trace carries the total for the request, here 57445 microseconds (about 57.4 ms):
"Request complete | 2022-05-10 10:38:06.141445 | 172.24.2.2 | 57445 | 41e1:cbdc:b845:42f6:aa06:27ea:d549:3af0"
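As a rough illustration of reading the trace, the sketch below takes the source_elapsed values from the output in the question and pulls out the total plus the time between consecutive events. The function names are made up for this example, the long "Parsing" activity text is abbreviated, and the approach assumes all rows come from a single source node (as they do here, 172.24.2.2), since the counter is cumulative per node.

```python
# (activity, source_elapsed in microseconds), copied from the trace in the question.
# The "Parsing ..." activity text is abbreviated for readability.
trace_rows = [
    ("Execute CQL3 query", 0),
    ("Parsing CREATE TABLE statistics(...)", 254),
    ("Preparing statement", 457),
    ("Adding to tables memtable", 8175),
    ("Adding to keyspaces memtable", 8244),
    ("Adding to columns memtable", 8320),
    ("Request complete", 57445),
]


def total_elapsed_us(rows):
    """Return the cumulative elapsed time reported by the 'Request complete' row."""
    for activity, elapsed_us in rows:
        if activity == "Request complete":
            return elapsed_us
    raise ValueError("trace has no 'Request complete' row")


def step_durations_us(rows):
    """Subtract consecutive cumulative values to get the time between events."""
    return [
        (rows[i][0], rows[i][1] - rows[i - 1][1])
        for i in range(1, len(rows))
    ]


total_us = total_elapsed_us(trace_rows)
print(f"total: {total_us} us = {total_us / 1000:.3f} ms")
for activity, delta_us in step_durations_us(trace_rows):
    print(f"{delta_us:>6} us  {activity}")
```

The same reading applies to traces of INSERT and SELECT statements: the value on the final "Request complete" row is the figure to compare.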
I have a table with a structure like this:
CREATE TABLE kaefko.se_vi_f55dfeebae00d2b3 (
value text PRIMARY KEY,
id text,
popularity bigint);
With data that looks like this:
value | id | popularity
--------+------------------+------------
rally | 4eff16cb91f96cd6 | 2
reddit | 11aa39686ed66ba5 | 3
red | 552d7e95af481415 | 1
really | 756bfa499965863c | 1
right | c5850c6b08f7966b | 1
redis | 7f1d251f399442d7 | 1
And I've created a materialized view that should sort these values by the popularity from the biggest to the smallest ones:
CREATE MATERIALIZED VIEW kaefko.se_vi_f55dfeebae00d2b3_by_popularity AS
SELECT *
FROM kaefko.se_vi_f55dfeebae00d2b3
WHERE popularity IS NOT null
PRIMARY KEY (value, popularity)
WITH CLUSTERING ORDER BY (popularity DESC);
But the data in the materialized view looks like this:
value | popularity | id
--------+------------+------------------
rally | 2 | 4eff16cb91f96cd6
reddit | 3 | 11aa39686ed66ba5
really | 1 | 756bfa499965863c
right | 1 | c5850c6b08f7966b
redis | 1 | 7f1d251f399442d7
As you can see there are two main issues:
Data is not sorted as defined in the materialized view
There is just a part of all data in the materialized view
I'm not very experienced in Cassandra and I've already spent hours trying to find the reason why this happens, to no avail. Could somebody please help me? Thank you <3
__
I'm using ScyllaDB 4.1.9-0 and cqlsh shows this:
[cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.3.1 | Native protocol v4]
Alex's comment is 100% correct, the order is within the partition.
PRIMARY KEY (value, popularity)
WITH CLUSTERING ORDER BY (popularity DESC);
This means that popularity is ordered descending only among rows that share the same 'value'. If I were to alter the data you used to show what this would look like, you would get the following:
value | popularity | id
--------+------------+------------------
rally | 3 | 4eff16cb91f96cd6
rally | 2 | 11aa39686ed66ba5
really | 3 | 756bfa499965863c
really | 2 | c5850c6b08f7966b
really | 1 | 7f1d251f399442d7
The order is on a per partition key basis, not globally ordered.
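To make the per-partition ordering concrete, here is a small in-memory sketch (plain Python, not driver code) that groups the example rows by partition key and sorts each group by popularity descending. The function name is hypothetical, and the dictionary only imitates clustering. In a real cluster the partitions themselves come back in token order, not insertion order.

```python
from collections import defaultdict

# Rows from the example above, deliberately shuffled: (value, popularity, id).
rows = [
    ("rally", 2, "11aa39686ed66ba5"),
    ("really", 1, "7f1d251f399442d7"),
    ("rally", 3, "4eff16cb91f96cd6"),
    ("really", 3, "756bfa499965863c"),
    ("really", 2, "c5850c6b08f7966b"),
]


def cluster(rows):
    """Group rows by partition key, then sort each partition by popularity DESC."""
    partitions = defaultdict(list)
    for value, popularity, row_id in rows:
        partitions[value].append((popularity, row_id))
    return {
        value: sorted(items, key=lambda item: item[0], reverse=True)
        for value, items in partitions.items()
    }


view = cluster(rows)
for value, items in view.items():
    for popularity, row_id in items:
        print(value, popularity, row_id)
```

Within "rally" and within "really" the popularity values descend, but nothing orders one partition relative to another.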
We are designing a Twitter-like follower/following model in Cassandra, and found something similar here: https://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376/13-Data_Model_simplified_13
So I think ItemLike is a table?
Is itemid1=>(userid1, userid2...) a row in the table?
What do you think the CREATE TABLE statement for this ItemLike table would be?
Yes, ItemLike is a table.
The schema of the ItemLike table would look like this:
CREATE TABLE itemlike(
itemid bigint,
userid bigint,
timeuuid timeuuid,
PRIMARY KEY(itemid, userid)
);
The picture on the slide shows the internal structure of the above table.
Let's insert some data:
itemid | userid | timeuuid
--------+--------+--------------------------------------
2 | 100 | f172e3c0-67a6-11e7-8e08-371a840aa4bb
2 | 103 | eaf31240-67a6-11e7-8e08-371a840aa4bb
1 | 100 | d92f7e90-67a6-11e7-8e08-371a840aa4bb
Internally, Cassandra will store the data like this:
--------------------------------------------------------------------------------------|
| | 100:timeuuid | 103:timeuuid |
| +---------------------------------------+----------------------------------------|
|2 | f172e3c0-67a6-11e7-8e08-371a840aa4bb | eaf31240-67a6-11e7-8e08-371a840aa4bb |
--------------------------------------------------------------------------------------|
---------------------------------------------|
| | 100:timeuuid |
| +---------------------------------------|
|1 | d92f7e90-67a6-11e7-8e08-371a840aa4bb |
---------------------------------------------|
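The layout in the diagram can be imitated with nested dictionaries: one outer entry per itemid (the partition), holding one cell per userid (the clustering column). This is only a mental model of the storage layout, with made-up function names, not how a client would query the table.

```python
# Rows from the example above: (itemid, userid, timeuuid).
rows = [
    (2, 100, "f172e3c0-67a6-11e7-8e08-371a840aa4bb"),
    (2, 103, "eaf31240-67a6-11e7-8e08-371a840aa4bb"),
    (1, 100, "d92f7e90-67a6-11e7-8e08-371a840aa4bb"),
]


def to_partitions(rows):
    """Build {itemid: {userid: timeuuid}}, with userids kept in clustering order."""
    partitions = {}
    for itemid, userid, timeuuid in rows:
        partitions.setdefault(itemid, {})[userid] = timeuuid
    return {
        itemid: dict(sorted(cells.items()))
        for itemid, cells in partitions.items()
    }


def users_who_liked(partitions, itemid):
    """Equivalent in spirit to: SELECT userid FROM itemlike WHERE itemid = ?"""
    return list(partitions.get(itemid, {}))


partitions = to_partitions(rows)
print(users_who_liked(partitions, 2))
```

Each outer key corresponds to one of the boxes in the diagram, which is why "itemid1=>(userid1, userid2...)" reads as a single wide row.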
I'm working on smart parking data stored in a Cassandra database, and I'm trying to get the last status of each device.
The dataset is self-made.
Here's the description of the table:
[image: table description]
[image: select * from parking.meters]
Any help would be appreciated!
"trying to get the last status of each device"
In Cassandra, you need to design your tables according to your query patterns. Building a table, filling it with data, and then trying to fulfill a query requirement is a very backward approach. The point is that if you really need to satisfy that query, then your table should have been designed to serve that query from the beginning.
That being said, there may still be a way to make this work. You haven't mentioned which version of Cassandra you are using, but if you are on 3.6+, you can use the PER PARTITION LIMIT clause on your SELECT.
If I build your table structure and INSERT some of your rows:
aploetz@cqlsh:stackoverflow> SELECT * FROM meters ;
parking_id | device_id | date | status
------------+-----------+----------------------+--------
1 | 20 | 2017-01-12T12:14:58Z | False
1 | 20 | 2017-01-10T09:11:51Z | True
1 | 20 | 2017-01-01T13:51:50Z | False
1 | 7 | 2017-01-13T01:20:02Z | False
1 | 7 | 2016-12-02T16:50:04Z | True
1 | 7 | 2016-11-24T23:38:31Z | False
1 | 19 | 2016-12-14T11:36:26Z | True
1 | 19 | 2016-11-22T15:15:23Z | False
(8 rows)
And I consider your PRIMARY KEY and CLUSTERING ORDER definitions:
PRIMARY KEY ((parking_id, device_id), date, status)
) WITH CLUSTERING ORDER BY (date DESC, status ASC);
You are at least clustering by date (which should be an actual date type, not a text), so that will order your rows in a way that helps you here:
aploetz@cqlsh:stackoverflow> SELECT * FROM meters PER PARTITION LIMIT 1;
parking_id | device_id | date | status
------------+-----------+----------------------+--------
1 | 20 | 2017-01-12T12:14:58Z | False
1 | 7 | 2017-01-13T01:20:02Z | False
1 | 19 | 2016-12-14T11:36:26Z | True
(3 rows)
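For readers who want to see what PER PARTITION LIMIT 1 amounts to with DESC clustering on date, the sketch below reproduces the same result in plain Python over the eight sample rows. It is an illustration of the semantics only. The function name is hypothetical, and it relies on the timestamps sharing one ISO 8601 format so that string comparison matches chronological order.

```python
# Sample rows from the output above: (parking_id, device_id, date, status).
rows = [
    (1, 20, "2017-01-12T12:14:58Z", False),
    (1, 20, "2017-01-10T09:11:51Z", True),
    (1, 20, "2017-01-01T13:51:50Z", False),
    (1, 7, "2017-01-13T01:20:02Z", False),
    (1, 7, "2016-12-02T16:50:04Z", True),
    (1, 7, "2016-11-24T23:38:31Z", False),
    (1, 19, "2016-12-14T11:36:26Z", True),
    (1, 19, "2016-11-22T15:15:23Z", False),
]


def latest_per_device(rows):
    """Keep only the newest row for each (parking_id, device_id) partition."""
    latest = {}
    for parking_id, device_id, date, status in rows:
        key = (parking_id, device_id)
        # Same-format ISO 8601 strings compare in chronological order.
        if key not in latest or date > latest[key][0]:
            latest[key] = (date, status)
    return latest


latest = latest_per_device(rows)
for (parking_id, device_id), (date, status) in latest.items():
    print(parking_id, device_id, date, status)
```

In Cassandra the server does this work for free, because the newest row is already first in each partition.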
Let's say I have a table, something like this:
CREATE TABLE Users (
user UUID,
seq INT,
group TEXT,
time BIGINT,
PRIMARY KEY ((user), seq)
);
This follows the desired pattern of Cassandra, with good distribution across partitions (assuming the default Murmur3 hash partitioner).
However, I also need to (rarely) perform range queries in time order. This doesn't seem possible in Cassandra with the table above. In reality I do need to access the data by group, so (group, time) is acceptable. Since there doesn't seem to be a way for a secondary index to cover multiple columns, I guess the right thing is to denormalize into something like this:
CREATE TABLE UsersByGroupTime (
user UUID,
seq INT,
group TEXT,
time BIGINT,
PRIMARY KEY ((group), time)
) WITH CLUSTERING ORDER BY (time ASC);
This works entirely as it should, except that group has really low cardinality, let's say ('A','B','C'), with an uneven distribution of users across groups. Since queries on that table are rare, I'm not worried about hot nodes, but I am worried about uneven data distribution, perhaps even a single node getting all of it.
Is this a common scenario and is there any way to mitigate this or are there alternative solutions?
One technique to help avoid hot spots in Cassandra time series models is to make use of a "time bucket." Essentially, you determine the "happy medium" level of time precision that provides adequate data distribution while also being known and semi-convenient to query by.
For the purposes of this example, I'll choose year and month ("yyyyMM"). Note: I have no idea if year and month will work for you...it's just an example. Once you determine your time bucket, you would add it as an additional partition key, like this:
CREATE TABLE UsersByGroupTime (
user UUID,
seq INT,
group TEXT,
time TIMEUUID,
yearmonth BIGINT,
PRIMARY KEY ((group, yearmonth), time)
) WITH CLUSTERING ORDER BY (time DESC);
After inserting some rows, queries like this will work:
aploetz@cqlsh:stackoverflow2> SELECT group, yearmonth, dateof(time), time, seq, user
FROM usersbygrouptime WHERE group='B' AND yearmonth=201505;
group | yearmonth | dateof(time) | time | seq | user
-------+-----------+--------------------------+--------------------------------------+-----+--------------------------------------
B | 201505 | 2015-05-16 10:04:10-0500 | ceda56f0-fbdc-11e4-bd43-21b264d4c94d | 1 | d57ba8a4-db24-440c-a983-b1dd6b0d2e27
B | 201505 | 2015-05-16 10:04:09-0500 | ce1cac40-fbdc-11e4-bd43-21b264d4c94d | 1 | 66d07cbb-a2ff-4d56-8fa1-14dfaf684474
B | 201505 | 2015-05-16 10:04:08-0500 | cd525760-fbdc-11e4-bd43-21b264d4c94d | 1 | 07b589ac-4d5f-401e-a34f-e3479e269e01
B | 201505 | 2015-05-16 10:04:06-0500 | cc76c470-fbdc-11e4-bd43-21b264d4c94d | 1 | 984f85b5-ea58-4cf8-b512-43abacb227c9
(4 rows)
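Because yearmonth is part of the partition key, the client has to compute it both when writing and when querying. A minimal sketch of that bookkeeping follows. The helper names are invented for this example, and whether buckets are derived from UTC or local time is an assumption you would need to fix for your own application.

```python
from datetime import datetime, timezone


def year_month_bucket(ts: datetime) -> int:
    """Turn a timestamp into a yyyyMM bucket, e.g. May 2015 -> 201505."""
    return ts.year * 100 + ts.month


def buckets_between(start: datetime, end: datetime) -> list:
    """List every yyyyMM bucket a time-range query from start to end must visit."""
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(year * 100 + month)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return buckets


written_at = datetime(2015, 5, 16, 15, 4, 10, tzinfo=timezone.utc)
print(year_month_bucket(written_at))
print(buckets_between(datetime(2014, 11, 1), datetime(2015, 2, 15)))
```

A range query spanning several months then becomes one query per (group, yearmonth) pair, which is the trade-off that comes with picking a bucket size.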
Now that may or may not help you query-wise, so you will need to spend some time ensuring that you pick an appropriate time bucket. But, this does help in terms of data distribution in the ring, which you can see with the token function:
aploetz@cqlsh:stackoverflow2> SELECT group, yearmonth, token(group,yearmonth)
FROM usersbygrouptime ;
group | yearmonth | token(group, yearmonth)
-------+-----------+-------------------------
A | 201503 | -3784784210711042553
A | 201504 | -610775546464185720
B | 201505 | 6232834565276653514
B | 201505 | 6232834565276653514
B | 201505 | 6232834565276653514
B | 201505 | 6232834565276653514
A | 201505 | 8281745497436252453
A | 201505 | 8281745497436252453
A | 201505 | 8281745497436252453
A | 201505 | 8281745497436252453
A | 201505 | 8281745497436252453
A | 201505 | 8281745497436252453
(12 rows)
Notice how different tokens are generated for each group/yearmonth pair, even though some of them have the same group ("A").
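The reason is that the partitioner hashes the whole composite partition key, not just the first column. The sketch below illustrates the idea with SHA-256 from the Python standard library as a stand-in. This is NOT the Murmur3 partitioner Cassandra uses, so the numbers will not match the token() output above. Only the behaviour (same pair, same token; different pair, different token) carries over. The function name and the "group:yearmonth" encoding are invented for this example.

```python
import hashlib


def stand_in_token(group: str, yearmonth: int) -> int:
    """Hash the full (group, yearmonth) key into a signed 64-bit integer.

    Stand-in only: Cassandra's default partitioner is Murmur3, not SHA-256.
    """
    digest = hashlib.sha256(f"{group}:{yearmonth}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


pairs = [("A", 201503), ("A", 201504), ("A", 201505), ("B", 201505)]
tokens = {pair: stand_in_token(*pair) for pair in pairs}
for (group, yearmonth), token in tokens.items():
    print(group, yearmonth, token)
```

Rows for group "A" land on three different tokens here, one per month, so a low-cardinality group no longer pins all of its data to a single position in the ring.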
I am trying to evaluate the number of tombstones being created in one of the tables in our application. For that I am trying to use nodetool cfstats. Here is how I am doing it:
create table demo.test(a int, b int, c int, primary key (a));
insert into demo.test(a, b, c) values(1,2,3);
Now I am making the same insert as above. So I expect 3 tombstones to be created. But on running cfstats for this columnfamily, I still see that there are no tombstones created.
nodetool cfstats demo.test
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0
Now I tried deleting the record, but I still don't see any tombstones being created. Is there anything that I am missing here? Please suggest.
BTW a few other details,
* We are using version 2.1.1 of the Java driver
* We are running against Cassandra 2.1.0
For tombstone counts on a query, your best bet is to enable tracing. This will give you the in-depth history of a query, including how many tombstones had to be read to complete it. This won't give you the total tombstone count, but it is most likely more relevant for performance tuning.
In cqlsh you can enable this with
cqlsh> tracing on;
Now tracing requests.
cqlsh> SELECT * FROM ascii_ks.ascii_cs where pkey = 'One';
pkey | ckey1 | data1
------+-------+-------
One | One | One
(1 rows)
Tracing session: 2569d580-719b-11e4-9dd6-557d7f833b69
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------+--------------+-----------+----------------
execute_cql3_query | 08:26:28,953 | 127.0.0.1 | 0
Parsing SELECT * FROM ascii_ks.ascii_cs where pkey = 'One' LIMIT 10000; | 08:26:28,956 | 127.0.0.1 | 2635
Preparing statement | 08:26:28,960 | 127.0.0.1 | 6951
Executing single-partition query on ascii_cs | 08:26:28,962 | 127.0.0.1 | 9097
Acquiring sstable references | 08:26:28,963 | 127.0.0.1 | 10576
Merging memtable contents | 08:26:28,963 | 127.0.0.1 | 10618
Merging data from sstable 1 | 08:26:28,965 | 127.0.0.1 | 12146
Key cache hit for sstable 1 | 08:26:28,965 | 127.0.0.1 | 12257
Collating all results | 08:26:28,965 | 127.0.0.1 | 12402
Request complete | 08:26:28,965 | 127.0.0.1 | 12638
http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
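If you collect trace activity lines programmatically, a small parser can total the tombstones a query touched. The sketch below assumes read events are reported in a form like "Read N live and M tombstone cells". The exact wording differs between Cassandra versions, so treat the pattern as an assumption to adjust against your own traces. The sample lines are hypothetical and are not taken from the trace above.

```python
import re

# Assumed message shape; the wording varies by Cassandra version, so verify it
# against real trace output before relying on this pattern.
READ_PATTERN = re.compile(r"Read (\d+) live(?: rows)? and (\d+) tombstoned? cells")


def count_cells(activities):
    """Sum live and tombstone counts over all matching trace activity lines."""
    live = tombstones = 0
    for activity in activities:
        match = READ_PATTERN.search(activity)
        if match:
            live += int(match.group(1))
            tombstones += int(match.group(2))
    return live, tombstones


# Hypothetical activity lines for illustration only.
sample = [
    "Executing single-partition query on ascii_cs",
    "Read 1 live and 3 tombstone cells",
    "Request complete",
]
print(count_cells(sample))
```

A trace with no matching line, like the one shown above, simply yields zero for both counts.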