We are trying to create an algorithm/heuristic that schedules a delivery for a certain time period, but there is definitely a race condition: two conflicting scheduled items could both be written to the DB, because the check and the write are not atomic.
To my knowledge, the only way to truly prevent race conditions is some kind of atomic insert operation.
The server receives a request to schedule something for a certain time period, and it has to check whether that period is still available before writing the data to the DB. But in the meantime, the server could receive a similar request and end up writing conflicting data.
How can we circumvent this? Is there some way to create a script in the DB itself that hooks into the write operation to make the whole thing atomic, perhaps by putting a locking mechanism on that script? What makes the whole thing non-atomic is the read plus the network round trip between the server and the DB.
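To make the race in the question concrete, here is a minimal JavaScript sketch built on a hypothetical in-memory stand-in for the DB (`FakeDb`, `scheduleNaive` and `scheduleAtomic` are illustrative names, not a real API). Every method yields to the event loop once, mimicking a round trip. The naive version checks and then writes in two separate round trips, so two concurrent requests for the same slot both pass the check. The atomic version does the check and the write in one step, which is what a database-side UNIQUE constraint on the slot gives you (or, in PostgreSQL, an exclusion constraint when slots are overlapping time ranges).

```javascript
// Yield to the event loop once, like a network round trip to the DB would.
const roundTrip = () => new Promise((resolve) => setImmediate(resolve));

// Hypothetical in-memory stand-in for the bookings table.
class FakeDb {
  constructor() {
    this.bookings = [];
  }

  async findBySlot(slot) {
    await roundTrip();
    return this.bookings.filter((b) => b.slot === slot);
  }

  async insert(booking) {
    await roundTrip();
    this.bookings.push(booking);
  }

  // Models a DB-side atomic "insert unless the slot is taken": the check and
  // the write happen in one step, and no other request can run in between.
  async insertIfSlotFree(booking) {
    await roundTrip();
    if (this.bookings.some((b) => b.slot === booking.slot)) return false;
    this.bookings.push(booking);
    return true;
  }
}

// Racy: read, then write in a separate round trip.
async function scheduleNaive(db, booking) {
  const existing = await db.findBySlot(booking.slot);
  if (existing.length > 0) return false;
  await db.insert(booking);
  return true;
}

// Safe: a single atomic operation decides who gets the slot.
async function scheduleAtomic(db, booking) {
  return db.insertIfSlotFree(booking);
}
```

Running two `scheduleNaive` calls for the same slot concurrently (for example via `Promise.all`) leaves two bookings in `FakeDb`; two `scheduleAtomic` calls leave one, and the loser gets `false` back.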
Whenever I run into a race condition, one solution immediately comes to mind: a QUEUE.
Step 1) Instead of adding data to the database directly, add it to a queue without checking anything.
Step 2) A single, separate reader takes items from the queue one at a time, checks the DB for any conflict, and takes the necessary action.
This is one of the ways to solve this. If you implement a better solution, please do share it.
Hope that helps
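The two steps above can be sketched in JavaScript with an in-process queue. This is a minimal sketch: `BookingQueue` and the `InMemoryStore` it writes to are hypothetical names, and because the queue lives in one Node process it only serializes requests handled by that process; with several servers you would need a shared queue with a single consumer instead. The property that matters is that only one consumer drains the queue, so the conflict check and the write for one request never interleave with another request's.

```javascript
// Hypothetical async store; each call yields once, like a DB round trip.
class InMemoryStore {
  constructor() {
    this.bookings = [];
  }
  async hasSlot(slot) {
    await new Promise((resolve) => setImmediate(resolve));
    return this.bookings.some((b) => b.slot === slot);
  }
  async insert(booking) {
    await new Promise((resolve) => setImmediate(resolve));
    this.bookings.push(booking);
  }
}

class BookingQueue {
  constructor(store) {
    this.store = store;
    this.items = [];
    this.draining = false;
  }

  // Step 1: accept the request without checking anything.
  enqueue(booking) {
    return new Promise((resolve, reject) => {
      this.items.push({ booking, resolve, reject });
      if (!this.draining) this.drain();
    });
  }

  // Step 2: a single consumer checks for conflicts and writes, one item at a time.
  async drain() {
    this.draining = true;
    while (this.items.length > 0) {
      const { booking, resolve, reject } = this.items.shift();
      try {
        if (await this.store.hasSlot(booking.slot)) {
          resolve({ accepted: false, reason: "slot already taken" });
        } else {
          await this.store.insert(booking);
          resolve({ accepted: true });
        }
      } catch (err) {
        reject(err);
      }
    }
    // No await between the last length check and this line, so nothing is lost.
    this.draining = false;
  }
}
```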
Premise: I have a calendar-like system that allows the creation/deletion of 'events' at a scheduled time in the future. The end goal is to perform an action (send a message/reminder) before and at the start of the event. I've done a bit of searching and have narrowed it down to what seem to be my two most viable choices:
Unix cron jobs
Bree
I'm not quite sure which will best suit my end goal, though. It also feels like there must be other established ways to do things like this that I just don't know about, or that I'm entirely overlooking.
My questions:
If, theoretically, the system were handling an arbitrarily large number of 'events', all for arbitrary times in the future, which of these options is more practical in terms of system resources? Is my concern in this regard even valid?
Is there any foreseeable problem with filling up a crontab with a large volume of jobs, or, in bree's case, scheduling a large number of jobs?
Is there a better idea I've just completely missed so far?
This concern mainly stems from bree's use of Node 'worker threads'. I'm very unfamiliar with this concept, and I'm worried that since a worker thread is spawned for every job, I could very quickly tie up all of my available threads and grind... something to a halt. This, however, sounds somewhat silly and possibly wrong (possibly indicative of my complete lack of knowledge here), hence my question.
Thanks, Stark.
For a calendar-like system, it seems you could query your database to find all events occurring in the next hour, then create a setTimeout() for each of them. An hour later, do the same thing again, and do it again upon any server restart. You don't really need to worry about events that aren't imminent; they can just sit in the database until shortly before their time. You will just need an efficient way to query the database for imminent events and use a timer for each of them.
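The polling-window approach above could look roughly like this in Node. This is a sketch under assumptions: `findEventsBetween` is a hypothetical stand-in for the efficient "imminent events" database query (here it just filters an array), events are plain objects with a millisecond `startsAt` timestamp, and `startScheduler` is an illustrative name.

```javascript
// Hypothetical stand-in for the DB query: events starting in [fromMs, toMs).
function findEventsBetween(events, fromMs, toMs) {
  return events.filter((e) => e.startsAt >= fromMs && e.startsAt < toMs);
}

// Every `windowMs`, load the events of the next window and give each a setTimeout.
// Windows are contiguous, so an event that exists when its window is scanned is
// scheduled exactly once; if a scan runs late, overdue events fire immediately.
function startScheduler(events, windowMs, onFire) {
  const pending = new Set();
  let from = Date.now();

  const scanNextWindow = () => {
    const to = from + windowMs;
    for (const event of findEventsBetween(events, from, to)) {
      const delay = Math.max(0, event.startsAt - Date.now());
      const timer = setTimeout(() => {
        pending.delete(timer);
        onFire(event);
      }, delay);
      pending.add(timer);
    }
    from = to;
  };

  scanNextWindow(); // this is also what you would run again after a server restart
  const interval = setInterval(scanNextWindow, windowMs);

  return function stop() {
    clearInterval(interval);
    for (const timer of pending) clearTimeout(timer);
    pending.clear();
  };
}
```

In production `windowMs` would be something like one hour (`60 * 60 * 1000`). One caveat this approach implies: an event created inside a window that has already been scanned is not picked up by the next scan, so the code that saves a new imminent event should also schedule its timer directly.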
Worker threads are fairly heavyweight items in Node.js, as each one creates a whole separate heap and a whole new instance of the V8 JavaScript engine. You would definitely not want a separate worker thread for each event.
I should add that timers in Node.js are very lightweight items and it is not a problem to have lots of them. Internally, Node groups timers with the same duration into linked lists and keeps those lists in a priority queue ordered by expiry time, so adding a timer stays cheap even when there are many of them. There is no continuous run-time overhead from having lots of timers: the event loop just checks the earliest expiry to see whether it's time yet for the next timer to fire. If so, it removes that timer and calls its callback. If not, it goes about the rest of the event loop work and checks again the next time through the event loop.
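To illustrate why many pending timers are cheap, here is a simplified model in JavaScript (not Node's actual implementation, which groups timers by duration): pending timers live in a binary min-heap ordered by expiry time, so adding a timer costs O(log n) and the loop only ever looks at the earliest entry. `TimerQueue` and `runDue` are hypothetical names.

```javascript
// Simplified model: pending timers in a binary min-heap keyed by expiry time.
class TimerQueue {
  constructor() {
    this.heap = [];
  }

  get size() {
    return this.heap.length;
  }

  // O(log n): append, then sift the new entry up towards the root.
  add(expiresAt, callback) {
    const heap = this.heap;
    heap.push({ expiresAt, callback });
    let i = heap.length - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (heap[parent].expiresAt <= heap[i].expiresAt) break;
      [heap[parent], heap[i]] = [heap[i], heap[parent]];
      i = parent;
    }
  }

  // Remove and return the earliest timer, then restore the heap property.
  popMin() {
    const heap = this.heap;
    const top = heap[0];
    const last = heap.pop();
    if (heap.length > 0) {
      heap[0] = last;
      let i = 0;
      for (;;) {
        const left = 2 * i + 1;
        const right = left + 1;
        let smallest = i;
        if (left < heap.length && heap[left].expiresAt < heap[smallest].expiresAt) smallest = left;
        if (right < heap.length && heap[right].expiresAt < heap[smallest].expiresAt) smallest = right;
        if (smallest === i) break;
        [heap[smallest], heap[i]] = [heap[i], heap[smallest]];
        i = smallest;
      }
    }
    return top;
  }

  // One "event loop turn": only the head is inspected; due timers are fired.
  runDue(now) {
    while (this.heap.length > 0 && this.heap[0].expiresAt <= now) {
      this.popMin().callback();
    }
  }
}
```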
I have a Node.js web app with a route that marks some entity as deleted by flipping a boolean field in a database. This route returns that entity. Right now I have code that looks like this:
UPDATE entity SET is_deleted=true WHERE entity.id = ?
SELECT * FROM entity WHERE entity.id = ?
For the moment, I can't use the RETURNING clause for other reasons.
So I got into an argument with a colleague. I think that putting both the UPDATE and the SELECT inside a transaction is unnecessary, because we are not doing anything significant with the data, just returning it. As a user of the app, I would expect the returned data to be as fresh as possible, meaning that I would get the same results on a page refresh.
My question is: what is the best practice for reading data after a write? Do you always wrap the read together with the write inside a transaction, or does it depend?
Well, for performance reasons you want to keep your transactions as small and quick as possible. This minimizes the chance of lock contention and deadlocks that could bring your application to its knees. So, unless there is a very good reason to do otherwise, keep your SELECT statements outside the transaction. This is especially important if you need to execute a long-running SELECT: putting it inside the transaction keeps the update locks held much longer than needed.
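Applied to the UPDATE/SELECT pair from the question, keeping the SELECT outside a transaction could look like this. A minimal sketch, assuming a node-postgres-style `pool.query(text, values)` that resolves to `{ rows }` (node-postgres uses `$1` placeholders rather than `?`); `markDeleted` is a hypothetical name. Each call is a standalone statement, so PostgreSQL runs it in its own implicit transaction: the UPDATE commits, releasing its row lock, before the SELECT starts, and the SELECT then sees the committed change.

```javascript
// Two standalone statements, no explicit BEGIN/COMMIT: the UPDATE autocommits,
// so its row lock is held only for the duration of that one statement.
async function markDeleted(pool, id) {
  await pool.query("UPDATE entity SET is_deleted = true WHERE entity.id = $1", [id]);
  const { rows } = await pool.query("SELECT * FROM entity WHERE entity.id = $1", [id]);
  return rows[0] ?? null; // null if the entity does not exist
}
```

The trade-off is that a write committed by someone else between the two statements would be visible in the returned row.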
Here is a nice article that describes what event sourcing (ES) is and how to deal with it.
Everything is fine there, but one image is bothering me. Here it is:
I understand that in distributed event-based systems we can only achieve eventual consistency. Even so, how do we ensure that we don't book more seats than are available? This is especially a problem when there are many concurrent requests.
It may happen that n aggregates are populated with the same number of reserved seats, and all of these aggregate instances allow reservations.
I understand that in distributed event-based systems we are able to achieve only eventual consistency. Anyway... how do we avoid booking more seats than we have, especially with many concurrent requests?
All events are private to the command running them until the book of record acknowledges a successful write. So we don't share the events at all, and we don't report back to the caller, without knowing that our version of "what happened next" was accepted by the book of record.
The write of events is analogous to a compare-and-swap of the tail pointer in the aggregate history. If another command has changed the tail pointer while we were running, our swap fails, and we have to mitigate/retry/fail.
In practice, this is usually implemented by having the write command to the book of record include an expected position for the write. (Example: ES-ExpectedVersion in GES).
The book of record is expected to reject the write if the expected position is in the wrong place. Think of the position as a unique key in a table in an RDBMS, and you have the right idea.
This means, effectively, that the writes to the event stream are actually consistent -- the book of record only permits the write if the position you write to is correct, which means that the position hasn't changed since the copy of the history you loaded was written.
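The expected-version check can be sketched with a hypothetical in-memory book of record. This is not GES's API: `InMemoryEventStore`, `append(streamId, expectedVersion, events)`, `decideReservation`, `reserveSeats` and the `SeatsReserved` event are illustrative names, and here the version is simply the number of events in the stream. Two commands that load the same history both believe they can reserve the seats, but only the first append succeeds; the second must reload the history and decide again.

```javascript
class ConcurrencyError extends Error {}

// Hypothetical book of record: an append succeeds only if the caller's
// expectedVersion equals the stream's current version (compare-and-swap on the tail).
class InMemoryEventStore {
  constructor() {
    this.streams = new Map();
  }

  read(streamId) {
    const events = this.streams.get(streamId) ?? [];
    return { events: [...events], version: events.length };
  }

  append(streamId, expectedVersion, newEvents) {
    const events = this.streams.get(streamId) ?? [];
    if (events.length !== expectedVersion) {
      throw new ConcurrencyError(`expected version ${expectedVersion}, stream is at ${events.length}`);
    }
    this.streams.set(streamId, [...events, ...newEvents]);
    return events.length + newEvents.length; // the new version
  }
}

// Fold the loaded history into state and validate the command against it.
function decideReservation(history, seats, capacity) {
  const reserved = history.reduce((sum, e) => (e.type === "SeatsReserved" ? sum + e.seats : sum), 0);
  if (reserved + seats > capacity) throw new Error("not enough seats");
  return [{ type: "SeatsReserved", seats }];
}

// Command handler: load, decide, append at the loaded version; retry on conflict.
function reserveSeats(store, streamId, seats, capacity, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    const { events, version } = store.read(streamId);
    const newEvents = decideReservation(events, seats, capacity);
    try {
      return store.append(streamId, version, newEvents);
    } catch (err) {
      if (!(err instanceof ConcurrencyError) || attempt >= maxAttempts) throw err;
    }
  }
}
```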
It's typical for commands to read event streams directly from the book of record, rather than the eventually consistent read models.
It may happen that n AggregateRoots are populated with the same number of reserved seats, which means that having validation in the reserve method won't help. Then all n AggregateRoots will emit the successful-reservation event.
Every bit of state needs to be supervised by a single aggregate root. You can have n different copies of that root running, all competing to write to the same history, but the compare and swap operation will only permit one winner, which ensures that "the" aggregate has a single internally consistent history.
There are a couple of ways to deal with such a scenario.
First off, an event stream's current version is the version of the last event added. This means that you would not, or should not, be able to persist to the event stream if it is no longer at the version it was at when you loaded it. Since the first write increases the version of the event stream, the second write would not be permitted. Since events are not emitted per se, but are rather the result of event sourcing, we would not have the type of race condition in your example.
Well, if your commands are processed behind a queue, any failures should be retried. Should it not be possible to process the request, you would enter the normal "I'm sorry, Dave. I'm afraid I can't do that" scenario by letting the user know that they should try something else.
Another option is to start the processing by issuing an update against some table row to serialize all calls to the aggregate. It's probably not the most elegant approach, but it does cause a system-wide block on the processing.
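As an in-process stand-in for that row-lock idea, here is a JavaScript sketch that chains all calls for the same aggregate id so they run strictly one after another. This is a swapped-in technique: it only serializes within a single Node process, whereas the update against a shared table row (for example a `SELECT ... FOR UPDATE` on a per-aggregate row inside the transaction) serializes across every process that touches the database. `KeyedSerializer` is a hypothetical name.

```javascript
// Serializes async tasks per key by chaining each new task onto the previous one.
class KeyedSerializer {
  constructor() {
    this.tails = new Map(); // key -> promise that settles when the last queued task is done
  }

  run(key, task) {
    const previous = this.tails.get(key) ?? Promise.resolve();
    const result = previous.then(() => task());
    // Keep the chain alive even if this task fails, so later callers still run.
    const tail = result.catch(() => {});
    this.tails.set(key, tail);
    // Drop the entry once the chain for this key is idle, to avoid unbounded growth.
    tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }
}
```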
I guess, to a large extent, one cannot really trust the read store when it comes to transactional processing.
Hope that helps :)
I am load testing my Node.js application. At some point I reach a state where requests are pending, and my best guess is that it's because of a locked transaction. This is the last log statement:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
And in pg_locks I've got 4 rows for the above query with granted = true and mode ExclusiveLock.
Where should I start looking for a bug?
If this locking request performs a lot of insert and update operations, should the isolation level be REPEATABLE READ?
Is there any way to debug/process that kind of situation?
Is there any mechanism to time out those locks so that they are released easily/automatically and don't block further requests?
Side question (since I'm not looking for a tool directly): are there any tools to monitor and spot that kind of situation? (I was hoping to use Munin.)
I am using Node.js 4.2.1 with Express 4.13.3 and Sequelize 3.19.3 as the ORM for Postgres 9.4.1.
Welcome to PostgreSQL transaction locks hell :)
You can spend a lot of time trying to figure out where exactly the lock happens and why, but there is very little chance that it will help you resolve the situation.
The general recipe for solving this kind of situation is as follows:
Keep your transaction size to the bare minimum required by the business logic of your application. For example, avoid many separate inserts or updates of the same type, replacing them with multi-row equivalents, because query IO is expensive.
Do not use transactions while executing only a single query that modifies data, i.e. avoid unnecessary transactions.
Implement error handling that can detect a transaction lock failure and retry the transaction. Logging such retries will help you understand the weak spots of your system and how to redesign it better.
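A minimal sketch of that retry step in JavaScript, assuming the driver exposes PostgreSQL's SQLSTATE as `err.code` (node-postgres does; Sequelize wraps driver errors, so the code may sit on a nested property there). `withRetry` and its options are hypothetical names. It re-runs the whole transaction function, backs off with jitter, and logs each retry so that weak spots show up in the logs.

```javascript
// PostgreSQL SQLSTATE codes that usually mean "re-run the whole transaction".
const RETRYABLE_CODES = new Set([
  "40001", // serialization_failure
  "40P01", // deadlock_detected
  "55P03", // lock_not_available (e.g. raised by FOR UPDATE NOWAIT)
]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `runTransaction` must run the complete transaction (BEGIN ... COMMIT) each time.
// `maxAttempts` counts the first attempt too.
async function withRetry(runTransaction, { maxAttempts = 5, baseDelayMs = 20, log = console.warn } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await runTransaction();
    } catch (err) {
      const retryable = err != null && RETRYABLE_CODES.has(err.code);
      if (!retryable || attempt >= maxAttempts) throw err;
      log(`transaction failed with ${err.code}, retry ${attempt} of ${maxAttempts - 1}`);
      // Exponential backoff with jitter so competing transactions spread out.
      await sleep(baseDelayMs * 2 ** (attempt - 1) * (0.5 + Math.random()));
    }
  }
}
```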
Even in a well-engineered system the last step often becomes a necessity, don't let it scare you ;)
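For the first point in the recipe (replacing many same-type inserts with one multi-row statement), here is a small JavaScript helper that builds a single parameterized INSERT. A sketch under assumptions: `buildMultiRowInsert` is a hypothetical name, it emits PostgreSQL-style `$1, $2, ...` placeholders as used by node-postgres, and the table and column names are interpolated directly, so they must be trusted identifiers, never user input.

```javascript
// Build one INSERT with a VALUES tuple per row instead of one query per row.
function buildMultiRowInsert(table, columns, rows) {
  if (rows.length === 0) throw new Error("no rows to insert");
  const values = [];
  const tuples = rows.map((row) => {
    const placeholders = columns.map((column) => {
      values.push(row[column]);
      return `$${values.length}`;
    });
    return `(${placeholders.join(", ")})`;
  });
  const text = `INSERT INTO ${table} (${columns.join(", ")}) VALUES ${tuples.join(", ")}`;
  return { text, values };
}
```

The result can be passed straight to a `query(text, values)` call. PostgreSQL limits the number of bind parameters per statement, so very large batches still need to be split into chunks.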
I encountered a similar situation where I started 5 parallel transactions requesting the same update lock, and the first one also continued with work that required more Postgres calls. The entire system deadlocks, and the first transaction is listed as idle in transaction in pg_stat_activity and has been granted all the locks it requested in pg_locks.
What I think is happening:
The first transaction got the lock granted, and then finished the query. After this it drops its connection to Postgres.
The following 4 transactions open a connection each and block on the lock that is held by the first transaction.
Since they are blocked, the first transaction gets to continue, but when it tries to get a connection to Postgres to make a query, it deadlocks, because Sequelize has run out of connections.
When I changed my Sequelize initialisation and added more connections to the pool (the default being 5), the deadlock disappeared.
I am not sure who is using the 5th connection, or whether the default happens to be 4 rather than 5 for some reason, but this explanation still seems to tick all the boxes.
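The situation described above can be reproduced with a toy model in JavaScript (all names are hypothetical; nothing here is Sequelize's API). It assumes the first transaction keeps its own connection while holding the lock, which matches it being listed as idle in transaction, and then issues a query that is not bound to the transaction, so that query needs a second connection from the pool. With a pool of 5 and 4 other transactions each holding a connection while waiting for the lock, nobody can proceed; with 6 connections everything completes. In Sequelize, passing the transaction to each query inside it (the `transaction` option) makes that query reuse the transaction's connection instead of asking the pool for another one.

```javascript
// Counting semaphore used both as the connection pool and as the row lock.
class Semaphore {
  constructor(size) {
    this.free = size;
    this.waiters = [];
  }
  async acquire() {
    if (this.free > 0) {
      this.free--;
      return;
    }
    await new Promise((resolve) => this.waiters.push(resolve));
  }
  release() {
    const next = this.waiters.shift();
    if (next) next(); // hand the slot directly to the next waiter
    else this.free++;
  }
}

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runScenario(poolSize, waiting = 4) {
  const pool = new Semaphore(poolSize);
  const rowLock = new Semaphore(1);

  // Transaction 1: connection + lock, some work, then a query that is NOT
  // bound to the transaction and therefore asks the pool for a second connection.
  const first = (async () => {
    await pool.acquire();
    await rowLock.acquire();
    await delay(10); // meanwhile, the other transactions grab connections
    await pool.acquire();
    pool.release();
    rowLock.release();
    pool.release();
  })();

  // Transactions 2..5: each holds a connection while blocked on the lock.
  const others = Array.from({ length: waiting }, async () => {
    await pool.acquire();
    await rowLock.acquire();
    rowLock.release();
    pool.release();
  });

  const finished = Promise.all([first, ...others]).then(() => "completed");
  return Promise.race([finished, delay(200).then(() => "stuck")]);
}
```

A larger pool only moves the threshold: with enough concurrent transactions queued on the same lock, the same deadlock returns, which is why binding every query to its transaction is the more robust fix.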
Another solution, depending on your use case, is to use the NOWAIT option in Postgres, so that a transaction aborts when it asks for a lock and doesn't get it.
Hope it helps if someone else encounters the same issue.
I've come up against a tricky synchronization issue in Node.js for which I haven't been able to find an elegant solution:
I set up an Express/Node.js web app for retrieving statistics data from a one-row database table.
If the table is empty, populate it with a long calculation task
If the record in the table is older than 15 minutes, update it with a long calculation task
Otherwise, respond with a web page showing the record in DB.
The problem is that when multiple users issue requests simultaneously and the record is old, the long calculation task is executed once per request instead of just once.
Is there any elegant way that only one request triggers the calculation task, and all others wait for the updated DB record?
Yes, it is called locks.
Put an additional column in your table, say lock, of timestamp type. Once a process starts working on that record, set the column to now + timeout (as a rule of thumb, I choose the timeout to be 2x the average processing time). When the process finishes processing, set that column back to NULL.
At the beginning of processing, check that column. If value > now, return some status code to the client, such as 409 Conflict (don't force the client to wait: it's a bad user experience, since the user doesn't know what's going on, unless the processing time is really short). Otherwise, start processing (ideally in a separate thread/process so that the user won't have to wait; respond with an appropriate status code like 202 Accepted).
This now + timeout value is needed in case your processing process crashes (so we avoid deadlocks). Also remember that you have to "check and set" this lock column in a transaction because of race conditions (this might be quite difficult if you are working with MongoDB-like databases).
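The lock-column approach can be sketched in JavaScript with the single row held in memory. A minimal sketch under assumptions: `record`, `lockUntil`, `tryClaim` and `handleRequest` are hypothetical names, the clock is injected so the logic is easy to test, and `tryClaim` is synchronous, which makes its check-and-set atomic within one Node process. Across processes the same check-and-set has to happen in the database as a single conditional UPDATE, as shown in the comment at the top.

```javascript
// Against a real table (names hypothetical), the atomic check-and-set is one
// conditional UPDATE; the caller owns the lock only if it updated exactly 1 row:
//   UPDATE stats SET lock_until = now() + interval '2 minutes'
//   WHERE id = 1 AND (lock_until IS NULL OR lock_until < now());
const FRESH_FOR_MS = 15 * 60 * 1000; // recompute when the record is older than this

// Claim the lock if it is free or expired (an expired lock means a crashed worker).
function tryClaim(record, now, timeoutMs) {
  if (record.lockUntil !== null && record.lockUntil > now) return false;
  record.lockUntil = now + timeoutMs;
  return true;
}

function handleRequest(record, { clock, timeoutMs, recalculate }) {
  const now = clock();
  const fresh = record.updatedAt !== null && now - record.updatedAt < FRESH_FOR_MS;
  if (fresh) return { status: 200, body: record.value };
  if (!tryClaim(record, now, timeoutMs)) return { status: 409 }; // someone else is computing

  const claimedUntil = record.lockUntil;
  // Start the long calculation in the background and answer right away.
  Promise.resolve()
    .then(() => recalculate())
    .then((value) => {
      record.value = value;
      record.updatedAt = clock();
    })
    .catch((err) => console.error("recalculation failed:", err))
    .finally(() => {
      // Release only our own claim, not a newer one taken after ours expired.
      if (record.lockUntil === claimedUntil) record.lockUntil = null;
    });
  return { status: 202 };
}
```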