Imagine you were working on time-tracking software. Two different managers are adding time worked for an employee in different columns of a time-sheet; each is adding 8 hours to a different day. But the time-sheet is already at 32 hours and should not go over 40 hours (that's our new business rule). Right now, both operations will read from the database that the time-sheet currently has 32 hours. By the time the first operation has actually finished adding its 8 hours, the other operation has already fetched the old state of the time-sheet: 32 hours worked. The result is that both succeed, and we are left with 48 hours on a time-sheet!
I can solve this problem by moving the Friday working hours and Monday working hours into a single aggregate with a method addHour(int hours, Enum Day) which will check the total hours, or I can make Friday, Monday, and Employee separate aggregates. When the Monday manager decides to add hours, the Employee aggregate will receive an addHours event, check the total hours, and send back a HoursAdded event if the total hours do not exceed 40, or a HoursNotAdded event otherwise. Then the Monday aggregate will handle the event and add the hours to its total hours.
What you describe is a persistence issue that arises when concurrent updates are made to a record.
To solve it you must detect concurrent updates of the time sheet. You usually do this with a version number that is incremented whenever the time sheet's persistence representation is updated. If the version has changed since the time sheet was read, someone else has updated it in the meantime. In this case the repository can raise an exception.
This technique using a version is called optimistic locking.
There are several ways to deal with such an optimistic lock exception. You can either show the user a dialog and reload the data, or you can retry the use case and execute it again. If you execute the use case again, it will reload the time sheet from the database and thus see the updates made by someone else. Now a domain exception will be raised, since the use case cannot update the hours: the hours to add would exceed the weekly limit (your business rule).
Since the version number exists only for persistence reasons, it should not be modeled in the domain entity. To keep track of the version number, a repository can use an identity map so that it can map the domain entity to its initial persistence state. This identity map is scoped to a use case invocation so that you do not affect other use cases. There are a lot of different ways to implement this scope. For example, you could use a thread local, or you could make the repository stateful, which usually makes the use case stateful as well since it has a dependency on the repository. I can't go into depth here.
When you implement optimistic locking you should ensure that the version check and the record update happen as one atomic operation; otherwise you haven't really solved it. How to implement optimistic locking depends on the database you use (SQL, NoSQL, etc.). You usually pass the version and the id as selection criteria in your update request.
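A minimal sketch of that atomic check-and-update against a SQL store; the table and column names are illustrative, and execute is a hypothetical helper that runs a statement and resolves to the number of affected rows:

// Hypothetical helper: runs a SQL statement and resolves to the number of affected rows.
declare function execute(sql: string, params: unknown[]): Promise<number>;

class OptimisticLockError extends Error {}

// Repository save: the WHERE clause makes "check the version and update" one atomic statement.
async function saveTimeSheet(sheet: { id: string; hours: number }, loadedVersion: number): Promise<void> {
  const affected = await execute(
    'UPDATE time_sheet SET hours = ?, version = ? WHERE id = ? AND version = ?',
    [sheet.hours, loadedVersion + 1, sheet.id, loadedVersion],
  );
  if (affected === 0) {
    // Zero rows matched: someone else updated the time sheet since we read it.
    throw new OptimisticLockError('time sheet ' + sheet.id + ' was modified concurrently');
  }
}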
I have a table, say CREDIT_POINTS. It has the columns below.
Company    Credit points    Amount
A          100              50
B          200              94
C          250              80
There are multiple threads that update this table. There is a method that reads the credit points, does some calculations, and updates both the amount and the credit points. These calculations take quite some time.
Suppose thread A reads the data and is doing its calculations. At the same time, before A writes back, thread B reads the data from the table, does its calculations, and updates the data. Here I am losing the update that thread A made. In many cases the credit points and the amount will not be in sync, since multiple threads are reading and updating the table.
One thing we could do here is use a synchronized method.
I am thinking of using Spring transactions. Are Spring transactions thread safe? What else would be a good option here?
Any help greatly appreciated.
Note: I am using iBatis (ORM) and MySQL.
You definitely need transactions to make sure that your updates are based on the data you previously read. The transaction must include both the read and the write operations.
To make sure that multiple threads cooperate you do not need synchronized; you have two options:
pessimistic locking: you use SELECT ... FOR UPDATE. This sets a lock that is released at the end of the transaction.
optimistic locking: during your update you find out whether the data has changed in the meantime; if so, you have to repeat the read and the calculation. You can achieve this in your update statement by not only searching for the company (the primary key, I hope), but also for the amount and credit points previously read.
Both methods have their merits. I recommend making yourself familiar with these concepts before finishing this application; as soon as there is heavy load, if you got anything wrong, your amounts and credit points may end up wrongly calculated.
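To illustrate both options, here is a rough sketch. The question uses iBatis/Java, so treat this purely as an illustration of the SQL involved; the sketch uses the mysql2 Node.js driver, and the table and column names come from the question (the calculation itself is a stand-in).

import { createConnection } from 'mysql2/promise';
import type { RowDataPacket, ResultSetHeader } from 'mysql2';

const conn = await createConnection({ host: 'localhost', user: 'app', database: 'billing' });

// Pessimistic locking: SELECT ... FOR UPDATE holds a row lock until the transaction ends.
async function updatePessimistic(company: string): Promise<void> {
  await conn.beginTransaction();
  try {
    const [rows] = await conn.query<RowDataPacket[]>(
      'SELECT credit_points, amount FROM CREDIT_POINTS WHERE company = ? FOR UPDATE',
      [company],
    );
    const newPoints = rows[0].credit_points - 10;  // stand-in for the real (slow) calculation
    const newAmount = rows[0].amount - 5;
    await conn.query(
      'UPDATE CREDIT_POINTS SET credit_points = ?, amount = ? WHERE company = ?',
      [newPoints, newAmount, company],
    );
    await conn.commit();
  } catch (err) {
    await conn.rollback();
    throw err;
  }
}

// Optimistic locking: repeat the previously read values in the WHERE clause;
// zero affected rows means someone else changed the row, so re-read and recalculate.
async function updateOptimistic(company: string, oldPoints: number, oldAmount: number,
                                newPoints: number, newAmount: number): Promise<boolean> {
  const [result] = await conn.query<ResultSetHeader>(
    'UPDATE CREDIT_POINTS SET credit_points = ?, amount = ? ' +
    'WHERE company = ? AND credit_points = ? AND amount = ?',
    [newPoints, newAmount, company, oldPoints, oldAmount],
  );
  return result.affectedRows === 1;
}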
It seems to me that using IF would make the statement possibly fail if re-tried. Therefore, the statement is not idempotent. For instance, given the CQL below, if it fails because of a timeout or system problem and I retry it, then it may not work because another person may have updated the version between retries.
UPDATE users
SET name = 'foo', version = 4
WHERE userid = 1
IF version = 3
Best practices for updates in Cassandra are to make updates idempotent, yet the IF operator is in direct opposition to this. Am I missing something?
If your application is idempotent, then generally you wouldn't need to use the expensive IF clause, since all your clients would be trying to set the same value.
For example, suppose your clients were aggregating some values and writing the result to a roll up table. Each client would calculate the same total and write the same value, so it wouldn't matter if multiple clients wrote to it, or what order they wrote to it, since it would be the same value.
If what you are actually looking for is mutual exclusion, such as keeping a bank balance, then the IF clause could be used. You might read a row to get the current balance, then subtract some money and update the balance only if the balance hadn't changed since you read it. If another client was trying to add a deposit at the same time, then it would fail and would have to try again.
But another way to do that without mutual exclusion is to write each withdrawal and deposit as a separate clustered transaction row, and then calculate the balance as an idempotent result of applying all the transaction rows.
You can use the IF clause for idempotent writes, but it seems pointless. The first client to do the write would succeed and Cassandra would return the value "applied=True". And the next client to try the same write would get back "applied=False, version=4", indicating that the row had already been updated to version 4 so nothing was changed.
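For illustration, executing the question's conditional update and checking whether it was applied looks roughly like this with the DataStax Node.js driver (cassandra-driver); the contact point, data center, and keyspace are placeholders:

import { Client } from 'cassandra-driver';

const client = new Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'my_keyspace',
});

// Returns true if the conditional update was applied, false if the version had already moved on.
async function renameUser(userid: number, expectedVersion: number, name: string): Promise<boolean> {
  const result = await client.execute(
    'UPDATE users SET name = ?, version = ? WHERE userid = ? IF version = ?',
    [name, expectedVersion + 1, userid, expectedVersion],
    { prepare: true },
  );
  // LWT results carry an "[applied]" column; when it is false, the returned row also
  // contains the current version, so the caller can re-read and decide whether to retry.
  return result.wasApplied();
}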
This question is more about linearizability (ordering) than idempotency, I think. This query uses Paxos to try to determine the state of the system before applying a change. If the state of the system is identical, the query can be retried many times without changing the result. This provides a weak form of ordering (and is expensive), unlike most Cassandra writes. Generally you should only use CAS operations if you are attempting to record the state of a system (rather than a history or log).
Do not use many of these queries if you can help it; the guidelines suggest having only a small percentage of your queries rely on this behavior.
I am implementing a session table with Node.js which will grow to a huge number of items. Each hash key is a UUID representing a user.
In order to delete the expired sessions, I must scan the table for the expired attribute and delete old sessions. I am planning to do this scan once every few days, and other than that I don't really need high read capacity.
I came up with two solutions, and I would like to hear some feedback about them.
1) UpdateTable to higher capacities just for that scheduled routine, and after the scan is done, simply reduce the table capacities to their original values.
2) Perform the scan, and when retrieving the 'LastEvaluatedKey' after an x MB read, introduce a delay (so as not to consume all the read units per second), and then continue the scan with 'ExclusiveStartKey'.
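For reference, option 2 would look roughly like this with the AWS SDK for JavaScript (v2 DocumentClient); the table name, attribute names, page size, and pause length are all placeholders:

import AWS from 'aws-sdk';

const docClient = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function deleteExpiredSessions(): Promise<void> {
  const now = Math.floor(Date.now() / 1000);
  let startKey: AWS.DynamoDB.DocumentClient.Key | undefined;

  do {
    const page = await docClient.scan({
      TableName: 'sessions',
      FilterExpression: '#exp < :now',               // "#exp" stands in for the expiry attribute
      ExpressionAttributeNames: { '#exp': 'expires' },
      ExpressionAttributeValues: { ':now': now },
      Limit: 100,                                    // small pages keep the read units consumed per call low
      ExclusiveStartKey: startKey,
    }).promise();

    for (const item of page.Items ?? []) {
      await docClient.delete({ TableName: 'sessions', Key: { userId: item.userId } }).promise();
    }

    startKey = page.LastEvaluatedKey;
    await sleep(1000);                               // pause between pages so the scan doesn't hog read capacity
  } while (startKey);
}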
If you're doing a scan, option 1 is your best bet. This is the only real way to guarantee that you won't affect your application's performance while the scan is ongoing.
The only thing you need to be sure of is that you only run this operation once a day -- I believe you can only downgrade throughput units on a DynamoDB table twice per day (at most).
This is an old question, but I saw it through a related question.
There is now a much better native solution: DynamoDB Time to Live.
It allows you to specify one attribute per table that serves as the time-to-live value for each item. You can then set that attribute on each item to a Unix timestamp that specifies when the item should be deleted.
Within about 24 hours of that timestamp, the item will be deleted at no additional charge.
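For example (a sketch with the AWS SDK for JavaScript v2; the table name and the expiresAt attribute name are placeholders):

import AWS from 'aws-sdk';

const dynamodb = new AWS.DynamoDB({ region: 'us-east-1' });
const docClient = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

// One-time setup: tell DynamoDB which attribute holds each item's expiry time.
async function enableTtl(): Promise<void> {
  await dynamodb.updateTimeToLive({
    TableName: 'sessions',
    TimeToLiveSpecification: { AttributeName: 'expiresAt', Enabled: true },
  }).promise();
}

// Per-session write: this item becomes eligible for deletion roughly 24 hours from now.
async function putSession(userId: string, data: Record<string, unknown>): Promise<void> {
  await docClient.put({
    TableName: 'sessions',
    Item: { userId, data, expiresAt: Math.floor(Date.now() / 1000) + 24 * 60 * 60 },  // Unix time in seconds
  }).promise();
}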
What is the best practice for running a database query once any document in a collection reaches a certain age?
Let's say this is a Node.js web system with MongoDB, with a collection of posts. After a new post is inserted, it should be updated with some data 60 minutes later.
Would a cron job that checks all posts with (age < one hour) every minute or two be the best solution? What would be the least stressful solution if this system has >10,000 active users?
Some ideas:
Create a second collection as a queue with a "time to update" field which would contain the time at which the source record needs to be updated. Index it, and scan through looking for values older than "now".
Include the field mentioned above in the original document and index it the same way
You could just clear the value when done or reset it to the next 60 minutes depending on behavior (rather than inserting/deleting/inserting documents into the collection).
By keeping the update-collection distinct, you have a better chance of always keeping the entire working set of queued updates in memory (compared to storing the update info in your posts).
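A rough sketch of the queue-collection idea with the official Node.js MongoDB driver; the database, collection, and field names ('post_update_queue', 'updateAt', 'postId') and the follow-up update itself are illustrative:

import { MongoClient, ObjectId } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');
await client.connect();
const db = client.db('blog');
const posts = db.collection('posts');
const queue = db.collection('post_update_queue');

// One-time: index the "time to update" field so the sweep is a cheap index scan.
await queue.createIndex({ updateAt: 1 });

// When a post is inserted, enqueue its follow-up update for 60 minutes later.
async function enqueueFollowUp(postId: ObjectId): Promise<void> {
  await queue.insertOne({ postId, updateAt: new Date(Date.now() + 60 * 60 * 1000) });
}

// The periodic sweep (run from a cron job or a separate worker process).
async function sweep(): Promise<void> {
  const due = await queue.find({ updateAt: { $lte: new Date() } }).toArray();
  for (const job of due) {
    await posts.updateOne({ _id: job.postId }, { $set: { followedUp: true } });  // the "some data" update
    await queue.deleteOne({ _id: job._id });
  }
}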
I'd kick off the update not as a web request to the same instance of Node but instead as a separate process so as to not block user-requests.
As to how you schedule it -- that's up to you and your architecture and what's best for your system. There's no right "best" answer, especially if you have multiple web servers or a sharded data system.
You might use a capped collection, though you'd run the risk of potentially losing records that still need to be updated (although you'd gain performance).
I am new to CouchDB, but that is not related to the problem. The question is simple, yet not clear to me.
For example: Ivan, viewing Boris's profile, sees that Boris was on the site 5 seconds ago.
How do I correctly implement this feature (users' last-access time)?
The problem is that if we update the user's profile document in CouchDB (for example, the last_access_time property) each time a page is refreshed, then we will have the most up-to-date information (with MySQL we did it this way), but on the other hand the document's _rev will be somewhere around 100000++ by the end of the day.
So, how do you do that or do you have any ideas?
This is not a full answer but a possible optimization. It will work in addition to any other answers here.
Instead of always storing the very latest timestamp, update the stored timestamp only when it has changed by some threshold, e.g. 5 seconds or 60 seconds.
Assume a user refreshes every second for a day. That is 86,400 updates. But if you only update the timestamp at 5-second intervals, that is 17,280; for 60-second intervals it is 1,440.
You can do this on the client side. When you want to update the timestamp, fetch the current document and check the old timestamp. If it is less than 5 seconds old, don't do anything. Otherwise, update it normally.
You can also do it on the server side. Write an _update function in CouchDB, which you can call with, e.g., POST /db/_design/my_app/_update/last-access/the_doc_id?time=2011-01-31T05:05:31.872Z. The update function will do the same thing: check the old timestamp and either do nothing or update it, depending on the elapsed time.
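A sketch of such an update handler (CouchDB runs it as plain JavaScript inside a design document); the handler name, the last_access_time field, and the 5-second threshold are illustrative:

// Stored in a design document, e.g. as updates["last-access"].
function lastAccess(doc, req) {
  if (!doc) {
    return [null, 'missing'];                     // unknown doc id: change nothing
  }
  var newTime = new Date(req.query.time).getTime();
  var oldTime = new Date(doc.last_access_time || 0).getTime();
  if (newTime - oldTime < 5000) {
    return [null, 'skipped'];                     // too recent: no new revision is written
  }
  doc.last_access_time = req.query.time;          // stale enough: record the new timestamp
  return [doc, 'updated'];
}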
If there was (a large) part of a document that is relatively static, and (a small) part that is highly dynamic, I would consider splitting it into two different documents.
Another option might be to use something more suited to the high write throughput of small pieces of data of that nature such as Redis or possibly MongoDB, and (if necessary) have a background task to occasionally write the info to CouchDB.
CouchDB has no problem with rapid document updates. Just do it, like MySQL. High _rev is no problem.
The only thing is, you have to be responsible about your couch from day 1. All CouchDB users must do this anyway, however you may have to do it sooner. (Applications with few updates have lower risk of a full disk, so developers can postpone this work.)
Poll your database and run compaction if it needs it (based on size, document count, seq_id number)
Poll your views and run compaction too
Always have enough disk capacity and I/O bandwidth to support compaction. Mathematical worst case: you need 2x the database size and 2x the write speed; however, most applications require less. Since you are updating documents, not adding them, you will need far less.
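A loose sketch of that housekeeping against CouchDB's HTTP API (Node 18+ fetch; the URL, credentials, and the size threshold are placeholders):

const couch = 'http://localhost:5984';
const auth = 'Basic ' + Buffer.from('admin:secret').toString('base64');
const dbName = 'users';

async function compactIfNeeded(): Promise<void> {
  const res = await fetch(`${couch}/${dbName}`, { headers: { Authorization: auth } });
  const info: any = await res.json();
  // Older couches report disk_size/data_size; newer ones report sizes.file/sizes.active.
  const fileSize = info.sizes ? info.sizes.file : info.disk_size;
  const liveSize = info.sizes ? info.sizes.active : info.data_size;

  // Compact when the file is much larger than the live data it still contains.
  if (fileSize > liveSize * 2) {
    await fetch(`${couch}/${dbName}/_compact`, {
      method: 'POST',
      headers: { Authorization: auth, 'Content-Type': 'application/json' },
    });
  }
  // Views can be compacted the same way via POST /{db}/_compact/{design_doc_name}.
}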