I have the following table structure in Cassandra:
CREATE TABLE ssession (
sessionid text PRIMARY KEY,
session_start_time timestamp,
updated_time timestamp
);
session_start_time is the time when a particular session becomes active, and updated_time is the time up to which the user has been doing some activity. Here, sessionid and session_start_time are inserted once, and updated_time keeps updating for as long as the user is active.
I want to include only sessionid as the primary key.
A normal update statement would be:
UPDATE ssession SET session_start_time = '2015-07-31 10:43:13+0530',
updated_time = '2015-07-31 10:43:13+0530' WHERE sessionid = '22_865624098';
Here, the first time around I'll insert the same session_start_time and updated_time. But from then on I'll only have to update updated_time.
And I need a single query to do so, since I'll be getting data continuously (using Storm to process the data).
Is there any way to achieve this?
When you INSERT or UPDATE data (updates and inserts are the same in Cassandra) you do not need to provide all columns. If you just want to update updated_time, your query should be:
UPDATE ssession SET updated_time = '2015-07-31 10:43:13+0530' WHERE sessionid = '22_865624098';
But it sounds like you want to make sure that session_start_time is set the first time that sessionid is created and only the first time, correct?
What you could do is make use of lightweight transactions and IF NOT EXISTS to create the row with session_start_time. If there is already a row with that sessionid, the insert will not be applied:
INSERT INTO ssession (sessionid, session_start_time, updated_time) VALUES ('22_865624098', '2015-07-31 10:43:13+0530', '2015-07-31 10:43:13+0530') IF NOT EXISTS;
Cassandra returns a column [applied] in this case, with a value of true or false depending on whether the insert was applied. If false is returned, you can then simply run an update query that only updates updated_time:
UPDATE ssession SET updated_time = '2015-07-31 10:43:14+0530' WHERE sessionid = '22_865624098';
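For reference, when the IF NOT EXISTS insert is not applied, cqlsh echoes the existing row along with the [applied] column, something like this (the values shown here are illustrative):
 [applied] | sessionid    | session_start_time       | updated_time
-----------+--------------+--------------------------+--------------------------
     False | 22_865624098 | 2015-07-31 05:13:13+0000 | 2015-07-31 05:13:13+0000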
Note that lightweight transactions introduce some performance cost. They use the SERIAL consistency level, which is a multi-phase QUORUM, and they follow a read-then-write pattern, which is not going to be as fast as blindly writing the data. You should test the performance of this solution and see if it is adequate for you.
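If you go this route in a multi-datacenter cluster, one related knob (an aside, not part of the answer above) is the serial consistency level used for the conditional phase. In cqlsh it can be set like this; LOCAL_SERIAL keeps the Paxos round trips within the local data center:
SERIAL CONSISTENCY LOCAL_SERIAL; -- applies to subsequent conditional (LWT) statements in this session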
I have the following data model:
campaigns {
id int PRIMARY KEY,
scheduletime text,
SchduleStartdate text,
SchduleEndDate text,
enable boolean,
actionFlag boolean,
.... etc
}
Here I need to fetch the data based on start date and end date, without ALLOW FILTERING.
I have gotten several suggestions to re-design the schema to fulfill the requirement, but I cannot filter the data based on id, since I need the data between the dates.
Can someone give me a good suggestion for this scenario, so that I can execute the following query:
select * from campaigns WHERE startdate='XXX' AND endDate='XXX'; // without the ALLOW FILTERING thing
CREATE TABLE campaigns (
SchduleStartdate text,
SchduleEndDate text,
id int,
scheduletime text,
enable boolean,
PRIMARY KEY ((SchduleStartdate, SchduleEndDate),id));
You can then run the following queries against the table:
select * from campaigns where SchduleStartdate = 'xxx' and SchduleEndDate = 'xx'; -- answers the above question
select * from campaigns where SchduleStartdate = 'xxx' and SchduleEndDate = 'xx' and id = 1; -- if you want to narrow the results further to a specific id
Here SchduleStartdate and SchduleEndDate are used as the partition key, and id is used as the clustering key to make sure the entries are unique.
This way, you can filter based on start date and end date, and then on id if needed.
One downside is that filtering by id alone won't be possible, since you always need to restrict the partition key first.
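If querying by id alone does become a requirement, the usual Cassandra approach (a sketch, not part of the original answer; the table name is illustrative) is to maintain a second, denormalized table keyed by id and write to both tables on every insert:
CREATE TABLE campaigns_by_id (
id int PRIMARY KEY,
SchduleStartdate text,
SchduleEndDate text,
scheduletime text,
enable boolean
);
-- application code writes each campaign to both tables
INSERT INTO campaigns_by_id (id, SchduleStartdate, SchduleEndDate, scheduletime, enable)
VALUES (1, '2018-01-01', '2018-01-31', '10:00', true);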
I'm in the process of learning Cassandra and using it on a small pilot project at work. I've got one table that is filtered by 3 fields:
CREATE TABLE webhook (
event_id text,
entity_type text,
entity_operation text,
callback_url text,
create_timestamp timestamp,
webhook_id text,
last_mod_timestamp timestamp,
app_key text,
status_flag int,
PRIMARY KEY ((event_id, entity_type, entity_operation))
);
Then I can pull records like so, which is exactly the query I need for this:
select * from webhook
where event_id = '11E7DEB1B162E780AD3894B2C0AB197A'
and entity_type = 'user'
and entity_operation = 'insert';
However, I also have an update query to set a record inactive (a soft delete), which would be most convenient to do by webhook_id alone in the same table. Of course, this isn't possible:
update webhook
set status_flag = 0
where webhook_id = '11e8765068f50730ac964b31be21d64e'
An example of why I'd want to do this is a simple DELETE from an API endpoint:
http://myapi.com/webhooks/11e8765068f50730ac964b31be21d64e
Naturally, if I update based on the composite key, I'd potentially inactivate more records than I intend to.
Seems like my only choice, doing it the "Cassandra way", is to use two tables: the one I already have, and one that tracks status_flag by webhook_id so I can update based on that id. I'd then have to select by webhook_id and disable the row in the first table as well? Otherwise, I'd have to force users to pass all the compound key values in the URL of the API's DELETE request.
Simple things you take for granted in relational databases seem to get complex very quickly in Cassandraland. Is this the case, or am I making it more complicated than it really is?
You can add webhook_id to your primary key.
So your table definition becomes something like this:
CREATE TABLE webhook (
event_id text,
entity_type text,
entity_operation text,
callback_url text,
create_timestamp timestamp,
webhook_id text,
last_mod_timestamp timestamp,
app_key text,
status_flag int,
PRIMARY KEY ((event_id, entity_type, entity_operation), webhook_id)
);
Now let's say you insert 2 records:
INSERT INTO dev_cybs_rtd_search.webhook(event_id,entity_type,entity_operation,status_flag,webhook_id) VALUES('11E7DEB1B162E780AD3894B2C0AB197A','user','insert',1,'web_id');
INSERT INTO dev_cybs_rtd_search.webhook(event_id,entity_type,entity_operation,status_flag,webhook_id) VALUES('12313131312313','user','insert',1,'web_id_1');
And you can update like the following:
update webhook
set status_flag = 0
where webhook_id = 'web_id' AND event_id = '11E7DEB1B162E780AD3894B2C0AB197A' AND entity_type = 'user'
AND entity_operation = 'insert';
It will only update 1 record.
However, you have to supply all the columns defined in your primary key.
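If the API really can only be given the webhook_id, the two-table approach described in the question would look something like this (a sketch; the lookup table name is illustrative): a table keyed by webhook_id stores the compound key, which you read first and then use to update the main table.
CREATE TABLE webhook_by_id (
webhook_id text PRIMARY KEY,
event_id text,
entity_type text,
entity_operation text
);
-- step 1: resolve the compound key from the webhook_id in the URL
select event_id, entity_type, entity_operation from webhook_by_id where webhook_id = 'web_id';
-- step 2: soft-delete in the main table using the resolved key
update webhook set status_flag = 0 where event_id = '11E7DEB1B162E780AD3894B2C0AB197A'
and entity_type = 'user' and entity_operation = 'insert' and webhook_id = 'web_id';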
My application needs to get some basic data from a user table with primary key user_id, and various other data about the user from secondary tables, each of which has user_id as a foreign key. There are a bunch of these secondary tables, such as name, address, phone, etcetera: things about a person that can change over time.
More specifically, I need only some values from the most recent row of each secondary table. Each table has a "latest" column, which is the unix timestamp of the most recent UPDATE or INSERT (we must not delete in this application).
The following works correctly:
SELECT u.username, u.user_id, u.password, u.email, u.active
, n.first , n.middle , n.last
, uo.organization_id /* , other_cols_from_other_tables */
FROM user u
LEFT JOIN user_org uo ON (uo.user_id = u.user_id AND
uo.latest in (select max(latest) from user_org uo1
where uo1.user_id = u.user_id))
/* here, other LEFT JOINs like the above one */
WHERE u.username = :username
However, a correlated subquery like this is widely discouraged as slow, and some of these queries will run on every request. So I came up with the following, which works in some cases and gets rid of the correlated subquery:
SELECT u.username, u.user_id, u.password, u.email, u.active
, n.first , n.middle , n.last
, uo.organization_id /* , other_cols_from_other_tables, etc. */
FROM user u
INNER JOIN
( SELECT user_id, MAX(latest) utd
FROM user_org
GROUP BY user_id
) uo1 ON uo1.user_id = u.user_id
LEFT JOIN user_org uo
ON (uo.user_id = u.user_id and uo.latest = uo1.utd)
/* here, other clauses like the part from 'FROM' to here */
WHERE u.username = :username
The latter, unfortunately, creates a hard dependency on data in the secondary table: the INNER JOIN makes the whole query return nothing if any secondary table lacks rows for the particular user.
I've researched this on SO and the web, and there are many solutions for avoiding subqueries, but everything I've found on the subject has the issue in the main query, not in a LEFT JOIN.
The logic I need is "if there's data for this user in this secondary table, get the specified column(s) from the most recent row in that table, otherwise a null".
It seems to me that putting a "current row" marker column on the most recent row of each table would avoid the whole issue and run faster than any other solution, but it would be against normalization (I would still have to keep the 'latest' column to maintain an order-able history of previous data).
Is there a solution that gets both normalization and speed? This is MariaDB, so it needs MySQL syntax.
EDIT: I would still like a better way, but I decided to go with the extra column. The problem described above is now avoided, and the SELECT SQL is much simplified and presumably faster. The downside is added complexity in saves, but SELECTs are more frequent.
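For the record, maintaining such a marker column on save could look something like this (a sketch under the question's assumptions; the is_current column name is illustrative, and the two statements belong in one transaction):
START TRANSACTION;
-- clear the marker on the previous current row for this user
UPDATE name SET is_current = 0 WHERE user_id = 123 AND is_current = 1;
-- insert the new row as the current one
INSERT INTO name (user_id, first, middle, last, latest, is_current)
VALUES (123, 'Jane', 'Q', 'Public', UNIX_TIMESTAMP(), 1);
COMMIT;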
MariaDB supports ROW_NUMBER as of version 10.2:
SELECT
u.username,
u.user_id,
u.password,
u.email,
u.active,
uo.organization_id,
...
FROM user u
LEFT JOIN
(
select
user_org.*,
row_number() over(partition by user_id order by latest desc) as rn
from user_org
) uo ON uo.user_id = u.user_id AND uo.rn = 1
...
WHERE u.username = :username;
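Note that the uo.rn = 1 test belongs in the ON clause: moving it into the WHERE clause would discard users who have no user_org rows at all (their uo.rn would be NULL), silently turning the LEFT JOIN back into an inner join.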
I am seeing some interesting behavior with Cassandra lightweight transactions in a standalone Cassandra instance. I am using DataStax Enterprise 5.0.2 for my testing. The issue is that a table updated using a lightweight transaction returns true, which means it was updated, but a subsequent query on the same table shows that the row is NOT updated. Please note that I tried the same in a clustered environment and it worked absolutely fine! So I am just trying to understand what's going wrong in my environment.
Here is a simple example of what I am doing.
I create a simple table as provided below:
CREATE TABLE smart.TOPICSCONSUMERASSNSTATUS (
TOPNM text,
PARTID int,
STATUS text,
PRIMARY KEY (TOPNM,PARTID)
);
I put the following set of preload data in for testing purposes:
insert into smart.topicsconsumerassnstatus (topnm, partid, status) values ('ESP', 0, 'UNASSIGNED');
insert into smart.topicsconsumerassnstatus (topnm, partid, status) values ('ESP', 1, 'UNASSIGNED');
insert into smart.topicsconsumerassnstatus (topnm, partid, status) values ('ESP', 2, 'UNASSIGNED');
insert into smart.topicsconsumerassnstatus (topnm, partid, status) values ('ESP', 3, 'UNASSIGNED');
insert into smart.topicsconsumerassnstatus (topnm, partid, status) values ('ESP', 4, 'UNASSIGNED');
Now, I put the first select statement to get the details from the table:
select * from smart.topicsconsumerassnstatus where topnm='ESP';
It lists all partids with status UNASSIGNED. To assign partid 0, I then fire the following update statement:
update smart.topicsconsumerassnstatus set status='ASSIGNED' where topnm='ESP' and partid=0 if status='UNASSIGNED';
It returns true. And now, when I fire the above select query again, it lists all 5 rows with status UNASSIGNED. Interestingly, repeated execution of the update statement keeps returning true every time - this clearly means that the actual data is not getting updated in the table.
I have seen the query trace as well and the update seems to be working fine as CAS is returned successful.
Also note that this behavior shows up specifically once a query with ALLOW FILTERING has been used at least once, and it persists from then on...
Can anyone please shed some light on what could be the issue? Is it something to do with the ALLOW FILTERING clause?
I have an issue with my CQL, and Cassandra is giving me a no viable alternative at input '(' (...WHERE id = ? if [(]...) error message. I think there is a problem with my statement.
UPDATE <TABLE> USING TTL 300
SET <attribute1> = 13381990-735b-11e5-9bed-2ae6d3dfc201
WHERE <attribute2> = dfa2efb0-7247-11e5-a9e5-0242ac110003
IF (<attribute1> = null OR <attribute1> = 13381990-735b-11e5-9bed-2ae6d3dfc201) AND <attribute3> = 0;
Any idea where the problem is in the statement above?
It would help to have your complete table structure, so to test your statement I made a couple of educated guesses.
With this table:
CREATE TABLE lwtTest (attribute1 timeuuid, attribute2 timeuuid PRIMARY KEY, attribute3 int);
This statement works, as long as I don't add the lightweight transaction on the end:
UPDATE lwttest USING TTL 300 SET attribute1=13381990-735b-11e5-9bed-2ae6d3dfc201
WHERE attribute2=dfa2efb0-7247-11e5-a9e5-0242ac110003;
Your lightweight transaction...
IF (attribute1=null OR attribute1=13381990-735b-11e5-9bed-2ae6d3dfc201) AND attribute3 = 0;
...has a few issues.
"null" in Cassandra is not similar (at all) to its RDBMS counterpart. Not every row needs to have a value for every column. Those CQL rows without values for certain column values in a table will show "null." But you cannot query by "null" since it isn't really there.
The OR keyword does not exist in CQL.
You cannot use extra parentheses to separate conditions in your WHERE clause or your lightweight transaction.
Bearing those points in mind, the following UPDATE and lightweight transaction runs without error:
UPDATE lwttest USING TTL 300 SET attribute1=13381990-735b-11e5-9bed-2ae6d3dfc201
WHERE attribute2=dfa2efb0-7247-11e5-a9e5-0242ac110003
IF attribute1=13381990-735b-11e5-9bed-2ae6d3dfc201 AND attribute3=0;
[applied]
-----------
False
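(The False result just means the conditions were not met by my test data; the important part is that the statement now parses and executes without the "no viable alternative" error.)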