We are using Postgres as part of our backend in Node.js (using pg). This is a highly concurrent, multi-process environment with a number of microservices that all query the same table. There is a column, 'status', which functions as a lock: its value is either 'pending' for unlocked or 'in_process' for locked.
There are two queries which select data from the table and lock the corresponding rows:
UPDATE market AS m
SET status='in_process', status_update_timestamp='${timestamp}'
WHERE m.guid IN
(SELECT guid FROM market
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1 FOR UPDATE)
RETURNING *
UPDATE market AS m
SET status = 'in_process', status_update_timestamp = '${timestamp}'
WHERE m.guid IN
(SELECT guid FROM market
WHERE status = 'pending' AND asset_id IN (${myArray.join(",")})
FOR UPDATE)
RETURNING *
And one query which unlocks rows based on guids:
UPDATE market
SET status='pending', status_update_timestamp='${timestamp}'
WHERE guid IN ('${guids.join("','")}')
There are cases where the two selecting queries can block each other, and also cases where the unlocking query and one of the selecting queries block each other.
All of these queries can be executed in parallel from multiple services, and even though they are supposed to be atomic according to the documentation (link), we still get a 'deadlock detected' error from Postgres. We tried wrapping the queries in BEGIN and END, different isolation levels, and different ORDER BYs, but still without any improvement.
Is there any problem in the queries that gives rise to deadlocks? Is this a problem that has to be solved in the application logic? Any help is welcome.
Table structure:
CREATE TABLE market
(
id BIGSERIAL not null constraint market_pkey primary key,
guid UUID DEFAULT uuid_generate_v4(),
asset_id BIGINT,
created_at TIMESTAMP DEFAULT current_timestamp,
status_update_timestamp TIMESTAMP DEFAULT current_timestamp,
status VARCHAR DEFAULT 'pending'
);
"Atomic" doesn't mean "can't fail". It just means that if it does fail, the whole thing gets rolled back completely.
You could solve the problem in the app by catching the deadlock errors and retrying them.
Perhaps you could redesign your transactions to be less prone to deadlock, but without knowing the rationale behind each query it is hard to suggest how you would go about doing that.
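For what it's worth, one redesign to consider for the first query (a sketch only, assuming PostgreSQL 9.5+ and that each worker claims one row at a time; the $1 placeholder stands in for your timestamp value) is FOR UPDATE SKIP LOCKED, which makes competing workers skip rows another transaction has already locked instead of waiting on them:
UPDATE market AS m
SET status = 'in_process', status_update_timestamp = $1
WHERE m.guid IN
  (SELECT guid FROM market
   WHERE status = 'pending'
   ORDER BY created_at
   LIMIT 1
   FOR UPDATE SKIP LOCKED)
RETURNING *;
The same idea applies to the second query; there, also locking the candidate rows in a deterministic order (e.g. adding ORDER BY guid to the subquery) helps, since two transactions locking the same set of rows in different orders is the classic deadlock scenario.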
Related
I am facing a timeout issue while executing a query on a Cassandra database. We have tried increasing the read timeout fields "read_request_timeout_in_ms" and "range_request_timeout_in_ms" in cassandra.yaml, but the query still times out after 10 seconds.
Is there any way we can increase the timeout value to 1-2 minutes?
Sample Product Table Schema:
- product_id string (primary key)
- product_name string
- created_on timestamp (secondary index)
- updated_on timestamp
Requirement: I want to query all the products which were created on a particular day using the 'created_on' field.
Sample Query: select * from "Product" where created_on > 1632906232 AND created_on < 1632906232
Note: Query uses the secondary index field in filter.
Environment details: Cassandra database with 2 node cluster setup.
The underlying problem is that range queries are expensive, which is why they take so long to complete. By the way, it looks like you posted the wrong query because you have the same value on both sides of the range.
The default timeouts are in place to prevent nodes from getting overloaded by expensive queries so they don't go down. Increasing the server-side timeouts is not the right approach. And in your case, it's most likely the client-side timeout getting triggered.
You need to review your data model and instead create a table partitioned by the creation date so it will perform better. Cheers!
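A minimal sketch of what that could look like, using the columns from your sample schema and a hypothetical day-based partition key (adjust names and types to your actual model):
CREATE TABLE product_by_created_day (
    created_day date,          -- partition key: the calendar day the product was created
    created_on timestamp,      -- clustering column: orders products within the day
    product_id text,
    product_name text,
    updated_on timestamp,
    PRIMARY KEY ((created_day), created_on, product_id)
);
-- Querying one day now hits a single partition instead of fanning out across the cluster:
SELECT * FROM product_by_created_day WHERE created_day = '2021-09-29';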
How can I delete a row from Cassandra and get the value it had just before the deletion?
I could execute a SELECT and DELETE query in series, but how can I be sure that the data was not altered concurrently between the execution of those two queries?
I've tried to execute the SELECT and DELETE queries in a batch but that seems to be not allowed.
cqlsh:foo> BEGIN BATCH
... SELECT * FROM data_by_user WHERE user = 'foo';
... DELETE FROM data_by_user WHERE user = 'foo';
... APPLY BATCH;
SyntaxException: line 2:4 mismatched input 'SELECT' expecting K_APPLY (BEGIN BATCH [SELECT]...)
In my use case I have one main table that stores data for items. And I've built several tables that allow me to look up items based on that information.
If I delete an item from the main table, I must also remove it from the other tables.
CREATE TABLE items (id text PRIMARY KEY, owner text, liking_users set<text>, ...);
CREATE TABLE owned_items_by_user (user text, item_id text, PRIMARY KEY ((user), item_id));
CREATE TABLE liked_items_by_user (user text, item_id text, PRIMARY KEY ((user), item_id));
...
I'm afraid the tables might contain wrong data if I delete an item and at the same time someone e.g. hits the like button of that same item.
The deleteItem method executes a SELECT query to fetch the current row of the item from the main table.
The likeItem method that gets executed at the same time runs an UPDATE query and inserts the item into the owned_items_by_user, liked_items_by_user, ... tables. This happens after the SELECT statement was executed, and the UPDATE query is executed before the DELETE query.
The deleteItem method deletes the items from the owned_items_by_user, liked_items_by_user, ... tables based on the data just retrieved via the SELECT statement. This data does not yet contain the just added like. The item is therefore deleted, but the just added like remains in the liked_items_by_user table.
You can do a select beforehand, then do a lightweight transaction on the delete to ensure that the data still looks exactly like it did when you selected. If it does, you know the latest state before you deleted. If it does not, keep retrying the whole procedure until it sticks.
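A minimal sketch of that pattern against the items table from your question (the concrete values are made up; [applied] comes back false if another write changed the row in between, in which case you retry from the SELECT):
SELECT owner, liking_users FROM items WHERE id = 'item-1';
-- Delete only if the row still looks exactly like what was just read:
DELETE FROM items WHERE id = 'item-1'
IF owner = 'alice' AND liking_users = {'bob', 'carol'};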
Unfortunately you cannot do a SELECT query inside a batch statement. If you read the docs here, only insert, update, and delete statements can be used.
What you're looking for is atomicity on the execution, but batch statements are not going to be the way forward. If the data has been altered, your worst case situation is zombies, or data that could reappear.
Cassandra uses a grace period mechanism to deal with this; you can find the details here. If for whatever reason this is critical to your business logic, the "best" thing you can do in this situation is to increase the consistency level, or restructure the read pattern at the application level to not rely on perfect atomicity, whichever is the right trade-off for you. So either you give up some of the performance, or tune down the requirement.
In practice, QUORUM should be more than enough to satisfy most situations most of the time. Alternatively, you can do an ALL, and you pay the performance penalty, but that means all replicas for the given foo partition key will have to acknowledge the write both in the commitlog and the memtable. Note, this still means a flush from the commitlog will need to happen before the delete is complete, but you can tune the consistency to the level you require.
You don't have atomicity in the SQL sense, but depending on throughput it's unlikely that you will need it (touch wood).
TLDR:
USE CONSISTENCY ALL;
DELETE FROM data_by_user WHERE user = 'foo';
That should do the trick. The error you're seeing now comes from the ANTLR3 grammar parser for CQL 3, which is not designed to accept SELECT queries inside batches simply because they are not supported; you can see that here.
I have a table in Cassandra where I am storing events as they come in; different processing is done on the events at different stages. The events are entered into the table with the event occurrence time. I need to get all the events whose event time is less than a certain time and do some processing on them. As it's a range select query it will invariably use scatter-gather. Can someone suggest the best way to do this? This process is going to happen every 5 seconds, and frequent scatter-gather in Cassandra is not a good idea as it's an overhead on Cassandra itself which will degrade my overall application performance.
The table is as below:
PAS_REQ_STAGE (partition key = EndpointID, category; clustering key = Automation_flag, AlertID)
AlertID
BatchPickTime: Timestamp
Automation_Threshold
ResourceID
ConditionID
category
Automation_time: Timestamp
Automation_flag
FilterValue
The event time which I referred to above is the BatchPickTime.
A scheduler wakes up at a regular interval, gets all the records whose BatchPickTime is less than the current scheduler wake-up time, and sweeps them off the table to process them.
Because of this use case I cannot provide any specific partition key for the query, as it will have to get all data which has expired and is less than the current scheduler wake-up time.
Hi and welcome to Stackoverflow.
Please post your schema and maybe some example code with your question - you can edit it :)
The Cassandra-way of doing this is to denormalize data if necessary and build your schema around your queries. In your case I would suggest putting your events in to a table together with a time bucket:
CREATE TABLE events (
    event_source int,
    bucket timestamp,
    event_time timestamp,
    event_text text,
    PRIMARY KEY ((event_source, bucket), event_time)
);
The reason for this is that it is very efficient in Cassandra to select a row by its so-called partition key (in this example (event_source, bucket)), as such a query hits only one node. The remainder of the primary key is called the clustering columns and defines the order of data; here all events for a day inside the bucket are sorted by event_time.
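For example, a query for one bucket (purely illustrative values, assuming a daily bucket) hits a single partition and can still restrict on event_time, because it is a clustering column:
SELECT * FROM events
WHERE event_source = 1
  AND bucket = '2017-03-01'
  AND event_time < '2017-03-01 12:00:00';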
Try to model your event table in a way that you do not need to make multiple queries. There is a good and free data modeling course from DataStax available: https://academy.datastax.com/resources/ds220-data-modeling
One note - be careful when using Cassandra as a queue - this is maybe an antipattern and you might be better off with a message queue such as ActiveMQ or RabbitMQ or similar.
I'm facing a dilemma that my small knowledge of Cassandra doesn't allow me to solve.
I have an index table used to retrieve data for an item (a notification) using an external id. However, the data contained in that table (in this case the status of the notification) can be modified, so I need to update the index table as well. Here is the table design:
TABLE notification_by_external_id (
external_id text,
partition_key_date text,
id uuid,
status text,
...
PRIMARY KEY (external_id, partition_key_date, id)
);
TABLE notification (
partition_key_date text,
status text,
id uuid,
...
PRIMARY KEY (partition_key_date, status, id)
);
The problem is that when I want to update the notification status (and hence the notification_by_external_id table), I don't have access to the external ID.
So far I came up to 2 solutions, none of which seems optimal, and I can't decide which one to go with.
Solution 1
Create an index on notification_by_external_id.id, but this will obviously be a high-cardinality column. There can be several external IDs for each notification, but we're talking about something around 5-10 to one at most.
Solution 2
Create a table
TABLE external_id_notification (
notification_id uuid,
external_id text
PRIMARY KEY (notification_id, external_id)
);
but that would mean making one extra read operation (and of course maintaining another table), which I understand is also a bad practice.
The thing to understand about secondary indexes is that their scalability issue is not with the number of rows in the table, but with the number of nodes in your cluster. A select on an index column means that every single node will have to process it and respond to it; each node can process its own part of the select efficiently, but all of them have to take part.
Use secondary indexes for administrative purposes (i.e. you on cqlsh) only. Do not use them for production purposes.
That being said, you could duplicate all the information into your external_id_notification table. That would alleviate the need for an extra read operation. I know that relational databases taught you that duplicate data is bad (what if it differs?), and that you should always normalize. But you are not on a relational database. Denormalization is a thing, and on Cassandra you should always go for that, unless you absolutely cannot.
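A sketch of what that denormalized lookup table could look like, keeping the columns from your question (exactly which columns you duplicate is up to you; this is just an illustration):
CREATE TABLE external_id_notification (
    notification_id uuid,
    external_id text,
    partition_key_date text,
    status text,
    -- plus any other notification columns you need at read time
    PRIMARY KEY (notification_id, external_id)
);
-- Everything you need comes back from the notification id alone, without a second read:
SELECT external_id, partition_key_date, status FROM external_id_notification WHERE notification_id = ?;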
I have a table with the following schema.
CREATE TABLE IF NOT EXISTS group_friends(
groupId timeuuid,
friendId bigint,
time bigint,
PRIMARY KEY(groupId,friendId));
I need to keep track of time if any changes happen in a group (such as changing the group name or adding a new friend to the table, etc.). So I need to update the value of the time field by groupId every time there is any change in any related table.
As an update in Cassandra requires mentioning all primary key columns in the WHERE clause, this query will not run.
update group_friends set time = 123456 where groupId = 100;
So I can do something like this.
update group_friends set time=123456 where groupId=100 and friendId in (...);
But it is showing the following error:
[Invalid query] message="Invalid operator IN for PRIMARY KEY part friendid"
Is there any way to perform an update operation using IN operator in clustering column? If not then what are the possible ways to do this?
Thanks in advance.
Since friendId is a clustering column, a batch operation is probably a reasonable and well-performing choice in this case, since all updates would be made in the same partition (assuming you are using the same group id for the update). For example, with the Java driver you could do the following:
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.utils.UUIDs;
import com.google.common.collect.Lists;
import java.util.List;
import java.util.UUID;

Cluster cluster = new Cluster.Builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("friends");
PreparedStatement updateStmt = session.prepare("update group_friends set time = ? where groupId = ? and friendId = ?");

long time = 123456;
UUID groupId = UUIDs.startOf(0);
List<Long> friends = Lists.newArrayList(1L, 2L, 4L, 8L, 22L, 1002L);

// Group all updates for the same partition into one unlogged batch.
BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
for (Long friendId : friends) {
    batch.add(updateStmt.bind(time, groupId, friendId));
}
session.execute(batch);
cluster.close();
The other advantage of this is that since the partition key can be inferred from the BatchStatement, the driver will use token-aware routing to send a request to a replica that would own this data, skipping a network hop.
Although this will effectively be a single write, be careful with the size of your batches. You should take care not to make them too large.
In the general case, you can't really go wrong by executing each statement individually instead of using a batch. The CQL transport allows many requests on a single connection and is asynchronous in nature, so you can have many requests in flight at a time without the typical performance cost of a request per connection.
For more about writing data in batch see: Cassandra: Batch loading without the Batch keyword
Alternatively, there may be an even easier way to accomplish what you want. If what you are really trying to accomplish is to maintain a group update time and you want it to be the same for all friends in the group, you can make time a static column. This is a new feature in Cassandra 2.0.6. What this does is shares the column value for all rows in the groupId partition. This way you would only have to update time once, you could even set the time in the query you use to add a friend to the group so it's done as one write operation.
CREATE TABLE IF NOT EXISTS friends.group_friends(
groupId timeuuid,
friendId bigint,
time bigint static,
PRIMARY KEY(groupId,friendId)
);
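As a small illustrative example (the ids and time value are made up), adding a friend and setting the shared group time then becomes a single write, and every row in that groupId partition will see the same time:
INSERT INTO friends.group_friends (groupId, friendId, time)
VALUES (50554d6e-29bb-11e5-b345-feff819cdc9f, 42, 123456);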
If you can't use Cassandra 2.0.6+ just yet, you can create a separate table called group_metadata that maintains the time for a group, i.e.:
CREATE TABLE IF NOT EXISTS friends.group_metadata(
groupId timeuuid,
time bigint,
PRIMARY KEY(groupId)
);
The downside here being that whenever you want to get at this data you need to select from this table, but that seems manageable.
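With the separate table, keeping the time current is one extra statement per change, and reading it back is one extra select (values are illustrative):
UPDATE friends.group_metadata SET time = 123456 WHERE groupId = 50554d6e-29bb-11e5-b345-feff819cdc9f;
SELECT time FROM friends.group_metadata WHERE groupId = 50554d6e-29bb-11e5-b345-feff819cdc9f;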