Update child table with different where condition in jOOQ

I have two tables (A & B). A.Id is a foreign key in table B (and a record in table B may or may not exist).
I want to update table A based on some condition, irrespective of whether a record exists in table B.
I also want to update table B if it contains a record with A.Id.
How do we do this multiple-table update with different conditions in a single execute in jOOQ?

In most database products, you can't update multiple tables in a single statement, so you'll have to run separate statements in jOOQ. By "single execute", you probably mean you want to have the experience of a single operation in the jOOQ API (irrespective of how many statements are being generated), but that also doesn't exist.
So, you just run multiple statements:
ctx.update(A)
   .set(A.X, someValue)
   .where(someCondition)
   .execute();

ctx.update(B)
   .set(B.Y, someValue)
   .where(someCondition)
   .execute();
If, for some reason, the single round trip to the server is important to you, you can use the procedural language API in jOOQ to wrap the above in an anonymous block:
ctx.begin(
    update(A)
        .set(A.X, someValue)
        .where(someCondition),
    update(B)
        .set(B.Y, someValue)
        .where(someCondition)
).execute();
Or, you can batch the statements, but they will be separate statements.
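A batch would look something like this (a sketch reusing the names from above; ctx is assumed to be a DSLContext, and A, B, someValue, someCondition are placeholders from the question):

```java
// Sketch: send both updates in a single JDBC batch. This is one
// round trip to the server, but still two separate statements.
ctx.batch(
    ctx.update(A)
       .set(A.X, someValue)
       .where(someCondition),
    ctx.update(B)
       .set(B.Y, someValue)
       .where(someCondition)
).execute();
```

Note that unlike the anonymous block, a batch returns one update count per statement rather than a single combined result.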

Related

BigQuery - Batch run ARRAY_AGG(STRUCT()) to avoid exceeding resource limit

I am trying to create some nested fields in Bigquery using the ARRAY_AGG(STRUCT()) method, but I am exceeding my resource limit whilst doing so. Is there a way of breaking the query down into batches to overcome this problem?
Example query
SELECT
  cust_id,
  ARRAY_AGG(
    STRUCT(status_s_date, status_e_date, status_desc, current_status_flag, active_flag, price, payment_freq, product_group)
  ) AS status
FROM table1
GROUP BY cust_id
I need all of these fields in the STRUCT, but trying to do so all at the same time for all of the data does not work. Is there a way of doing any of the following? If so, which method is best?
a) Creating multiple structs and then joining them under a common name?
E.g. run the following script, creating structs 'status1' and 'status2'...
SELECT
  cust_id,
  ARRAY_AGG(
    STRUCT(status_s_date, status_e_date, status_desc, current_status_flag, active_flag)
  ) AS status1,
  ARRAY_AGG(
    STRUCT(status_s_date, status_e_date, price, payment_freq, product_group)
  ) AS status2
FROM table1
GROUP BY cust_id
...and then join the two structs on status_s_date and status_e_date to create a single struct called 'Status'
b) Partitioning the original table on status_s_date and then running the ARRAY_AGG(STRUCT()) step in batches over the partitioned field.
c) Finally, some of the fields I want to put into the struct are string fields, which I understand take up more resource when nesting. Can I nest their numeric value equivalents and then apply a join to a lookup table afterwards to get their plain-English values?
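One caveat with option (b): batching over status_s_date would split a customer's statuses across batches, producing several partial arrays per cust_id that would then need re-aggregating. Sharding on cust_id keeps each customer's array whole. A sketch using BigQuery's FARM_FINGERPRINT (the shard count of 10 is an arbitrary assumption; run once per shard value 0..9 and append each result to a destination table):

```sql
-- Sketch: process only shard 0 of 10; repeat for shards 1..9.
SELECT
  cust_id,
  ARRAY_AGG(
    STRUCT(status_s_date, status_e_date, status_desc, current_status_flag,
           active_flag, price, payment_freq, product_group)
  ) AS status
FROM table1
WHERE MOD(ABS(FARM_FINGERPRINT(cust_id)), 10) = 0
GROUP BY cust_id
```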
I am very new to this process so I appreciate some of the above may not make sense. Any help gratefully received.
Regards
An update from me which I would appreciate if someone could confirm.
My original unnested table contained an 'ORDER BY' command, which I believe leads to increased resource requirements in the subsequent nesting process. I have removed the 'ORDER BY' (I'm not sure why it was there in the first place) and the ARRAY_AGG(STRUCT()) process is now running fine.
Is it the case that querying ORDERED tables is more resource intensive?

Cassandra delete/update a row and get its previous value

How can I delete a row from Cassandra and get the value it had just before the deletion?
I could execute a SELECT and DELETE query in series, but how can I be sure that the data was not altered concurrently between the execution of those two queries?
I've tried to execute the SELECT and DELETE queries in a batch but that seems to be not allowed.
cqlsh:foo> BEGIN BATCH
... SELECT * FROM data_by_user WHERE user = 'foo';
... DELETE FROM data_by_user WHERE user = 'foo';
... APPLY BATCH;
SyntaxException: line 2:4 mismatched input 'SELECT' expecting K_APPLY (BEGIN BATCH [SELECT]...)
In my use case I have one main table that stores data for items, and I've built several tables that allow looking up items based on that information.
If I delete an item from the main table, I must also remove it from the other tables.
CREATE TABLE items (id text PRIMARY KEY, owner text, liking_users set<text>, ...);
CREATE TABLE owned_items_by_user (user text, item_id text, PRIMARY KEY ((user), item_id));
CREATE TABLE liked_items_by_user (user text, item_id text, PRIMARY KEY ((user), item_id));
...
I'm afraid the tables might contain wrong data if I delete an item and at the same time someone e.g. hits the like button of that same item.
The deleteItem method executes a SELECT query to fetch the current row of the item from the main table.
The likeItem method that runs at the same time executes an UPDATE query and inserts the item into the owned_items_by_user, liked_items_by_user, ... tables. This happens after the SELECT statement was executed, and the UPDATE query is executed before the DELETE query.
The deleteItem method then deletes the item from the owned_items_by_user, liked_items_by_user, ... tables based on the data just retrieved via the SELECT statement. This data does not yet contain the just-added like. The item is therefore deleted, but the just-added like remains in the liked_items_by_user table.
You can do a select beforehand, then do a lightweight transaction on the delete to ensure that the data still looks exactly like it did when you selected. If it does, you know the latest state before you deleted. If it does not, keep retrying the whole procedure until it sticks.
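In CQL that select-then-conditionally-delete sequence might look like the following sketch (using the items table from the question; the column value 'alice' is a placeholder for whatever the SELECT returned):

```sql
-- 1. Read the current state of the row.
SELECT owner FROM items WHERE id = 'item1';

-- 2. Delete only if the row still matches what was read
--    (a lightweight transaction / compare-and-set).
DELETE FROM items WHERE id = 'item1' IF owner = 'alice';
```

If the DELETE comes back with [applied] = false, the row changed in between: re-run the SELECT and retry the whole procedure.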
Unfortunately you cannot do a SELECT query inside a batch statement. If you read the docs here, only insert, update, and delete statements can be used.
What you're looking for is atomicity on the execution, but batch statements are not going to be the way forward. If the data has been altered, your worst case situation is zombies, or data that could reappear.
Cassandra uses a grace period mechanism to deal with this; you can find the details here. If, for whatever reason, this is critical to your business logic, the "best" thing you can do in this situation is to increase the consistency level, or restructure the read pattern at the application level to not rely on perfect atomicity, whichever is the right trade-off for you. So either you give up some of the performance, or you tune down the requirement.
In practice, QUORUM should be more than enough to satisfy most situations most of the time. Alternatively, you can do an ALL, and you pay the performance penalty, but that means all replicas for the given foo partition key will have to acknowledge the write both in the commitlog and the memtable. Note, this still means a flush from the commitlog will need to happen before the delete is complete, but you can tune the consistency to the level you require.
You don't have atomicity in the SQL sense, but depending on throughput it's unlikely that you will need it (touch wood).
TLDR:
USE CONSISTENCY ALL;
DELETE FROM data_by_user WHERE user = 'foo';
That should do the trick. The error you're seeing now comes from the ANTLR3 grammar parser for CQL 3, which is not designed to accept SELECT queries inside batches simply because they are not supported; you can see that here.

Cassandra changing Primary Key vs Firing multiple select queries

I have a table that stores list products that a user has. The table looks like this.
create table my_keyspace.userproducts (
  userid text,
  username text,
  productid text,
  productname text,
  producttype text,
  primary key (userid)
);
All users belong to a group, there could be min 1 to max 100 users in a group
userid | groupid | groupname
1      | g1      | grp1
2      | g2      | grp2
3      | g3      | grp3
We have new requirement to display all products for all users in a single group.
So do I change userproducts so that my partition key is now groupid and make userid my clustering key, so that I get all my results in one single query?
Or do I keep my table design as it is and fire multiple select queries: select all users in a group from the second table, then fire one select query per user, consolidate the data in my code, and return it to the users?
Thanks.
Even before getting to your question, your data modelling as you presented it has a problem: You say that you want to store "a list products that a user has". But this is not what the table you presented has - your table has a single product for each userid. The "userid" is the key of your table, and each entry in the table, i.e, each unique userid, has one combination of the other fields.
If you really want each user to have a list of products, you need the primary key to be (userid, productid). This means that each record is indexed by both a userid and a productid, or in other words - a userid has a list of records each with its own productid. Cassandra allows you to efficiently fetch all the productid records for a single userid because it implements the first part of the key as a "partition key" but the second part is a "clustering key".
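A sketch of that key change (column types assumed to be text, matching the rest of the discussion):

```sql
CREATE TABLE my_keyspace.userproducts (
  userid text,
  productid text,
  productname text,
  producttype text,
  -- userid is the partition key, productid the clustering key:
  PRIMARY KEY (userid, productid)
);
```

With this key, SELECT * FROM userproducts WHERE userid = '1' efficiently returns all of that user's products from a single partition.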
Regarding your actual question, you indeed have two options: Either do multiple queries on your original tables, or do so-called denormalization, i.e., create a second table with exactly what you want searchable immediately. For the second option you can either do it manually (update both tables every time you have new data), or let Cassandra update the second table for you automatically, using a feature called Materialized Views.
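A manually maintained denormalized table for the group query might look like this sketch (products_by_group is an assumed name; the application would write to it alongside userproducts):

```sql
CREATE TABLE my_keyspace.products_by_group (
  groupid text,
  userid text,
  productid text,
  productname text,
  PRIMARY KEY (groupid, userid, productid)
);

-- One single-partition query now serves the whole group:
SELECT * FROM my_keyspace.products_by_group WHERE groupid = 'g1';
```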
Which of the two options - multiple queries or multiple updates - to use really depends on your workload. If it has many updates and rare queries, it is better to leave updates quick and make queries slower. If, on the other hand, it has few updates but many queries, it is better to make updates slower (when each update needs to update both tables) but make queries faster. Another important issue is how much query latency is important for you - the multiple queries option not only increases the load on the cluster (which you can solve by throwing more hardware at the problem) but also increases the latency - a problem which does not go away with more hardware and for some use cases may become a problem.
You can also achieve a similar goal in Cassandra by using the Secondary Index feature, which has its own performance characteristics (in some respects it is similar to the "multiple queries" solution).

Selecting from multiple tables in Cassandra CQL

So I have two tables in the query I am using:
SELECT
R.dst_ap, B.name
FROM airports as A, airports as B, routes as R
WHERE R.src_ap = A.iata
AND R.dst_ap = B.iata;
However it is throwing the error:
mismatched input 'as' expecting EOF (..., B.name FROM airports [as] A...)
Is there anyway I can do what I am attempting to do (which is how it works relationally) in Cassandra CQL?
The short answer, is that there are no joins in Cassandra. Period. So using SQL-based JOIN syntax will yield an error similar to what you posted above.
The idea with Cassandra (or any distributed database) is to ensure that your queries can be served by a single node (cutting down on network time). There really isn't a way to guarantee that data from different tables could be queried from a single node. For this reason, distributed joins are typically seen as an anti-pattern. To that end, Cassandra simply doesn't allow them.
In Cassandra you need to take a query-based modeling approach. So you could solve this by building a table from your post-join result set, consisting of desired combinations of dst_ap and name. You would have to find an appropriate way to partition this table, but ultimately you would want to build it based on A) the result set you expect to see and B) the properties you expect to filter on in your WHERE clause.
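For example, the post-join result set could be stored in a table partitioned by the source airport (a sketch; destinations_by_src_airport is an assumed name, and the rows would be populated at write time from the joined data):

```sql
CREATE TABLE destinations_by_src_airport (
  src_ap text,
  dst_ap text,
  dst_name text,
  PRIMARY KEY (src_ap, dst_ap)
);

-- The original relational join becomes a single-partition query:
SELECT dst_ap, dst_name FROM destinations_by_src_airport WHERE src_ap = 'ATL';
```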

Do lightweight transactions support delete statement with if exists?

I've read that lightweight transactions support update and insert statements with an "IF" and "IF EXISTS" clause. Do they also support the delete statement with an "IF EXISTS" clause?
E.g.: create table user (userid text, email text, primary key (email))
delete from user where userid = 'kris' if exists
Do lightweight transactions support the above delete statement?
Yes, the CQL DELETE statement does support the IF EXISTS clause. From the DELETE documentation:
In Cassandra 2.0.7 and later, you can conditionally delete columns using IF or IF EXISTS. Deleting a column is similar to making an insert or update conditionally. Conditional deletions incur a non-negligible performance cost and should be used sparingly.
However, to Carlo's point, take note of that last sentence. From a performance standpoint, the conditional delete is not free.
The real question is: why do you need it? Compare-and-set is useful to handle race conditions -- e.g., I don't want two users to register with the same username, so a second attempt to register with the same username must fail. But why would you delete data only if it exists, when delete operations are idempotent on your data? A delete already behaves as if it had an implicit IF EXISTS condition.
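The username race described above maps to a conditional insert, which is the canonical lightweight-transaction use case (using the user table from the question; the email value is a placeholder):

```sql
-- Only the first registration succeeds; a concurrent second
-- attempt for the same email returns [applied] = false.
INSERT INTO user (email, userid) VALUES ('kris@example.com', 'kris') IF NOT EXISTS;
```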
