I need to update 2 tables in one transaction.
The current version of RethinkDB doesn't support transactions out of the box. So how can I achieve this?
I can make the update in 2 ways:
Update the 1st table. If it succeeds, update the second table.
Update the 2nd table asynchronously.
But how can I handle the case where one of the two updates completes successfully and the other doesn't? Yes, I can check the result of the update and revert it if an error occurred. But there can still be a case where something happens to the application (a lost connection to RethinkDB, or the script simply crashing) after only one of the two updates has completed.
So my database would be left in an inconsistent state, with no way to resolve it.
So, is it possible to simulate transactional behavior in Node.js for RethinkDB?
The best you can do is a two-phase commit. (MongoDB has a good document on how to do this, and the exact same technique should work in RethinkDB: http://docs.mongodb.org/master/tutorial/perform-two-phase-commits/ .)
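For illustration, a rough sketch of that flow with the rethinkdb Node.js driver might look like the following. The transfer scenario, the transactions/accounts table names and the state values are assumptions borrowed from the MongoDB tutorial, not anything built into RethinkDB:

// Sketch only: a transfer recorded via a transaction document that moves through
// the states initial -> pending -> applied -> done, as in the MongoDB tutorial.
import * as r from 'rethinkdb';

async function transfer(conn: r.Connection, fromId: string, toId: string, amount: number) {
  const txId = `${fromId}:${toId}:${Date.now()}`;

  // 1. Durably record the intent before touching either account.
  await r.table('transactions')
    .insert({ id: txId, from: fromId, to: toId, amount, state: 'initial' })
    .run(conn);
  await r.table('transactions').get(txId).update({ state: 'pending' }).run(conn);

  // 2. Apply the change to each account. The full tutorial also tags each account
  //    document with the transaction id so this step can be retried idempotently.
  await r.table('accounts').get(fromId)
    .update({ balance: r.row('balance').sub(amount) }).run(conn);
  await r.table('accounts').get(toId)
    .update({ balance: r.row('balance').add(amount) }).run(conn);

  // 3. Mark the transaction applied, then done. A recovery job can scan the
  //    transactions table for rows stuck in 'pending' or 'applied' after a crash
  //    and either roll them forward or compensate.
  await r.table('transactions').get(txId).update({ state: 'applied' }).run(conn);
  await r.table('transactions').get(txId).update({ state: 'done' }).run(conn);
}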
RethinkDB supports per-key linearizability and compare-and-set (document-level atomicity), and that is known to be enough to implement application-level transactions. Moreover, you have several options to choose from:
If you need the Serializable isolation level, then you can follow the same algorithm that Google uses for the Percolator system, or Cockroach Labs uses for CockroachDB. I've blogged about it and created a step-by-step visualization; I hope it helps you understand the main idea behind the algorithm.
If you expect high contention but the Read Committed isolation level is fine for you, then please take a look at the RAMP transactions by Peter Bailis.
The third approach is to use compensating transactions, also known as the saga pattern. It was described in the late '80s in the Sagas paper but became more relevant with the rise of distributed systems. Please see the Applying the Saga Pattern talk for inspiration.
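For the third option, a minimal compensating-transaction runner can be sketched in a few lines; the run/compensate callbacks are placeholders for your actual table updates and their inverses:

// Sketch only: run each step in order and, on failure, undo the completed steps
// in reverse order.
interface SagaStep {
  name: string;
  run(): Promise<void>;        // forward action, e.g. update the first table
  compensate(): Promise<void>; // undo action, e.g. revert that update
}

async function runSaga(steps: SagaStep[]): Promise<void> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step);
    } catch (err) {
      // Undo whatever already succeeded, newest first; compensations should be
      // idempotent so they can themselves be retried after a crash.
      for (const done of completed.reverse()) {
        await done.compensate();
      }
      throw err;
    }
  }
}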
We had a similar requirement to implement transactional support in RethinkDB, as we wanted transactions extending across MySQL and RethinkDB boundaries. We came up with this micro-library, thinktrans (https://github.com/jaladankisuresh/thinktrans), a promise-based declarative JavaScript library for RethinkDB supporting atomic transactions. However, it is still in its alpha stages.
If you have a specific requirement, you may want to understand its approach, Implementing Transactions in NoSQL Databases, and implement your own.
Disclaimer: I am the author of this library
Related
I am migrating a legacy application from Oracle to CosmosDB (MongoDB 3.6 API). At the current stage of the project, and with respect to its current goals, refactoring IDs from sequence numbering to GUIDs cannot be done. I am aware of the many reasons why this is bad design, but it is what it is - I need sequence generation :(
I am trying to find a reliable solution. What came to mind is to create a sequences collection and keep documents such as seq_foo, seq_bar, and so on inside it. Upon each insert I would first do a findAndModify and then insert with the custom id.
The question I currently struggle to answer is: is findAndModify atomic if used on a single document with eventual consistency?
In case it is not, do I need to use multiple DB accounts with different consistency levels, or is there another solution to this problem?
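For what it's worth, here is a sketch of the counter-collection approach described in the question, using the official mongodb Node.js driver. The sequences collection and the seq_foo id come from the question; the value field, database name and driver-version handling are assumptions. On a single document, findAndModify/findOneAndUpdate is applied atomically on the server, and the returned value comes from that write rather than from a separate read:

// Sketch only, assuming the mongodb driver v6+ (older drivers wrap the returned
// document in a ModifyResult, so adjust accordingly).
import { MongoClient } from 'mongodb';

interface Counter { _id: string; value: number; }

async function nextSequence(client: MongoClient, name: string): Promise<number> {
  const counters = client.db('appdb').collection<Counter>('sequences');
  const doc = await counters.findOneAndUpdate(
    { _id: name },                            // e.g. 'seq_foo'
    { $inc: { value: 1 } },                   // atomic increment on one document
    { upsert: true, returnDocument: 'after' } // creates the counter on first use
  );
  return doc!.value;
}

// Usage: const id = await nextSequence(client, 'seq_foo');
//        then insert the new document with that id as its custom _id.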
Could you please give me a hint: is there any way to atomically update multiple documents in Couchbase using the Java SDK? I know it's possible to use embedding of documents, thus guaranteeing the required atomicity, but unfortunately that doesn't work out for me.
In my case, updating one document means that another document needs to be invalidated (a special flag set to false), and this should happen atomically.
I appreciate any help or suggestions from your side.
Thank you!
While there is no built-in way to perform atomic changes to multiple documents, you can use two-phase commit to achieve the same result. Note that in this case 2PC doesn't provide other transactional features, like isolation and consistency, only atomicity - which is what you're asking for. There is no reference 2PC implementation in Java for Couchbase, but there are two in Ruby and PHP in the documentation. I recommend reading the docs on providing transactional logic in Couchbase for an in-depth description of how to implement this. Porting the example code to Java should be fairly straightforward.
Generally speaking, to implement a set of changes on multiple documents atomically, you perform atomic writes to each document in turn, plus a temporary "state" document, in such a way that each step in the process is unique. This way you're able to continue from the same step or roll back your changes if the transaction gets interrupted in the middle for any reason.
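As a small illustration of the per-document building block used inside such a flow, here is an optimistic compare-and-swap (CAS) update sketched with the Couchbase Node.js SDK; the Java SDK exposes the equivalent get/replace-with-CAS calls, and the bucket name, credentials and field name below are placeholders:

// Sketch only: set the "invalid" flag on a document, failing (so you can retry)
// if someone else modified it since we read it.
import { connect } from 'couchbase';

async function invalidateDocument(docId: string): Promise<void> {
  const cluster = await connect('couchbase://localhost', { username: 'user', password: 'pass' });
  const collection = cluster.bucket('app').defaultCollection();

  const { cas, content } = await collection.get(docId);
  content.valid = false; // the "invalidate" step from the question

  // The replace succeeds only if the document still carries the CAS value we read;
  // otherwise the SDK throws a CAS-mismatch error and we should re-read and retry.
  await collection.replace(docId, content, { cas });
}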
Unfortunately, that boils down to a transaction, and Couchbase doesn't offer native transaction support. The level of atomicity you can achieve with Couchbase is at the level of a single document.
I know that some Couchbase users have implemented manual transaction management in their application code layer, but that's quite a complicated topic and there's no openly available solution that I know of.
I'm currently studying Eric Evans's Domain-Driven Design. The idea of aggregates is clear to me and I find it very interesting. Now I'm thinking of an example of an aggregate like:
BankAccount (1) ----> (*) Transaction.
BankAccount
BigDecimal calculateTurnover();
BankAccount is an aggregate. To calculate turnover I would have to traverse all transactions and sum up all the amounts. Evans assumes that I should use repositories to load only aggregates. In the above case there could be a few thousand transactions, which I don't want to load into memory at once.
In the context of the repository pattern, aggregate roots are the only objects your client code loads from the repository.
The repository encapsulates access to child objects - from a caller's perspective it automatically loads them, either at the same time the root is loaded or when they're actually needed (as with lazy loading).
What would be your suggestion for implementing calculateTurnover in a DDD aggregate?
As you have pointed out, to load 1000s of entities in an aggregate is not a scalable solution. Not only will you run into performance problems but you will likely also experience concurrency issues, as emphasised by Vaughn Vernon in his Effective Aggregate Design series.
Do you want every transaction to be available in the BankAccount aggregate or are you only concerned with turnover?
If it is only the turnover that you need, then you should establish this value when instantiating your BankAccount aggregate. This could likely be calculated efficiently by your data store technology (indexed JOINs, for example, if you are using SQL). Perhaps you also need to consider keeping this as a precalculated value in your data store (what happens when you start dealing with millions of transactions per bank account)?
But perhaps you still require the transactions available in your domain? Then you should consider having a separate Transaction repository.
I would highly recommend reading Vaughn Vernon's series on aggregate design, as linked above.
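To make the precalculated-turnover suggestion concrete, a rough sketch (illustrative names, with a plain number standing in for a proper money type) could look like this:

// Sketch only: turnover is loaded as a precomputed value and kept up to date as
// transactions are recorded, so loading the aggregate never requires pulling
// thousands of Transaction rows.
class BankAccount {
  constructor(
    readonly id: string,
    private turnover: number // hydrated from a precalculated column/projection
  ) {}

  recordTransaction(amount: number): void {
    this.turnover += Math.abs(amount);
    // persisting the Transaction entity itself is the job of a separate
    // Transaction repository, as suggested above
  }

  calculateTurnover(): number {
    return this.turnover;
  }
}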
You have managed to pick a very interesting example :)
I actually use Account1->*Transaction when explaining event sourcing (ES) to anyone not familiar with it.
As a developer I was taught (way back) to use what we can now refer to as entity interaction. So we have a Customer record and it has a current state. We change the state of the record in some way (address, tax details, discount, etc.) and store the result. We never quite know what happened but we have the latest state and, since that is the current state of our business, it is just fine. Of course one of the first issues we needed to deal with was concurrency but we had ways of handling that and even though not fantastic it "worked".
For some reason the accounting discipline didn't quite buy into this. Why do we not simply have the latest state of an Account? We would load the related record, change the balance, and save the state. Oddly enough, most people would probably cringe at the thought, yet it seems to be OK for the rest of our data.
The accounting domain got around this by registering the change events as a series of Transaction entries. So should you lose your account record and the latest balance, you can always run through all the transactions to obtain the latest balance. That is event sourcing.
In ES one typically loads the entire list of events for an aggregate root (AR) to obtain its latest state. There is also, typically, a mechanism to deal with a huge number of events when loading all of them would cause performance issues: snapshots. Usually only the latest snapshot is stored. The snapshot contains the full latest state of the aggregate, and only events after the snapshot version are applied.
One of the huge advantages of ES is that one could come up with new queries and then simply apply all the events to the query handler and determine the outcome. Perhaps something like: "How many customers do I have that have moved twice in the last year?" Quite arbitrary, but using the "traditional" approach the answer would quite likely be that we'll start gathering that information from today and have it available next year, as we have not been saving the CustomerMoved events. With ES we can search for the CustomerMoved events and get a result at any point.
So this brings me back to your example. You probably do not want to be loading all the transactions. Instead, store the "Turnover" and calculate it on the go. Should the "Turnover" be a new requirement, then a one-off processing of all the ARs should get it up to speed. You can still have a calculateTurnover() method somewhere, but that would be something you wouldn't run all too often. And in those cases you would need to load all the transactions for an AR.
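A minimal sketch of the snapshot-plus-replay loading described above, with the running turnover kept as part of the state (the event and snapshot shapes are invented for illustration):

// Sketch only: restore the latest snapshot if there is one, then apply only the
// events recorded after the snapshot's version.
interface AccountEvent { version: number; type: 'Deposited' | 'Withdrawn'; amount: number; }
interface AccountState { version: number; balance: number; turnover: number; }

function loadAccount(snapshot: AccountState | null, events: AccountEvent[]): AccountState {
  const state: AccountState = snapshot ? { ...snapshot } : { version: 0, balance: 0, turnover: 0 };
  for (const e of events) {
    if (e.version <= state.version) continue; // already covered by the snapshot
    state.balance += e.type === 'Deposited' ? e.amount : -e.amount;
    state.turnover += e.amount;               // kept up to date "on the go"
    state.version = e.version;
  }
  return state;
}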
As I was reading up about CouchDB I stumbled upon a question about transactions and CouchDB. Apparently the way to handle transactions in Couch is to pull the latest version and compare it to the version you are currently working with. This can present problems if data is changing quickly. The other way is a map-reduce and separating the transactional data out into multiple documents. This also seems less than optimal.
I was thinking about using Redis for this sort of data. The increment and decrement functions seem fairly amazing for this sort of purpose.
So I could just write some sort of string for a transactional key like:
//some user document
{
name: "guy",
id: 10,
page_views: "redis user:page_views:10"
}
Then if I read something like "redis" inside some piece of transactional data, I know to go get that information from Redis. I suppose I could decide these things beforehand, but since a document-oriented database's primary mission is to be flexible and not bind data to columns, I figured there might be an easier way?
Is there an easy way to link Redis data to CouchDB? Should I be doing this all manually, and only for the few fields that come up? Any other thoughts? Would it be better to update this transactional data "eventually" in the user document, or simply not store it there?
Both Redis and CouchDB are "easy" (that is, simple). So in that regard, what you are describing is easy. Of course, by using two databases, you have increased the complexity of your application. But on the other hand, the CouchDB+Redis combination is gaining popularity.
The only tool I know that integrates the two is Mikeal Rogers's redcouch. It is a simple tool. Perhaps you could extend it to add what you need (and send a pull request!).
A more broad consideration is that Redis does not have the full replication feature set that CouchDB does. So Redis might restrict your future options with CouchDB. Specifically, Redis does not support multi-master replication. In contrast with CouchDB, you will always have a centralized Redis database. (Correct me if I'm wrong—I am stronger with CouchDB than with Redis.)
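For reference, resolving the convention from the question by hand only takes a few lines. This sketch uses nano for CouchDB and ioredis for Redis; the database name, connection details and the "redis " prefix handling are assumptions based on the example document:

// Sketch only: if a field's value starts with "redis ", treat the rest as a
// Redis key and fetch the live value from Redis instead of the document.
import nano from 'nano';
import Redis from 'ioredis';

const couch = nano('http://localhost:5984');
const users = couch.db.use<any>('users');
const redis = new Redis(); // localhost:6379 by default

async function getPageViews(userDocId: string): Promise<number> {
  const doc = await users.get(userDocId);
  const field = doc.page_views;
  if (typeof field === 'string' && field.startsWith('redis ')) {
    // e.g. "redis user:page_views:10" -> an INCR-able counter in Redis
    const value = await redis.get(field.slice('redis '.length));
    return Number(value ?? 0);
  }
  return Number(field); // plain value stored directly in the document
}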
According to the CAP theorem, Cassandra can only offer eventual consistency. To make things worse, if we have multiple reads and writes during one request without proper handling, we may even lose logical consistency. In other words, if we do things fast, we may do them wrong.
Meanwhile, the best practice for designing a data model for Cassandra is to think about the queries we are going to have, and then add a CF for each. This way, adding/updating one entity means updating many views/CFs in many cases. Without an atomic transaction feature, it's hard to do this right. But with it, we lose the A and P parts again.
I don't see this concerning many people, so I wonder why.
Is this because we can always find a way to design our data model to avoid doing multiple reads and writes in one session?
Is this because we can just ignore the 'right' part?
In real practice, do we always have ACID features somewhere in the middle? I mean, maybe implemented in the application layer, or by adding middleware to handle it?
It does concern people, but presumably you are using Cassandra because a single database server is unable to meet your needs due to scaling or reliability concerns. Because of this, you are forced to work around the limitations of a distributed system.
In real practice, do we always have ACID features somewhere in the middle? I mean, maybe implemented in the application layer, or by adding middleware to handle it?
No, you don't usually have ACID somewhere else, as presumably that somewhere else would have to be distributed over multiple machines as well. Instead, you design your application around the limitations of a distributed system.
If you are updating multiple columns to satisfy queries, you can look at the "eventually atomic" section in this presentation for ideas on how to do that. Basically, you write enough information about your update to Cassandra before you do your write. That way, if the write fails, you can retry it later.
If you can structure your application in such a way, using a coordination service like ZooKeeper or Cages may be useful.
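A rough sketch of that "write enough info before the write" idea with the cassandra-driver package; every table and column name here is invented for illustration:

// Sketch only: record the intended update in a log table first, apply the
// denormalized writes, then clear the log entry. A recovery job can replay any
// log entries left behind by a crash, so each write must be idempotent (plain
// Cassandra upserts are).
import { Client } from 'cassandra-driver';

const client = new Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'app',
});

async function updateUserEmail(userId: string, email: string): Promise<void> {
  const logId = `${userId}:${Date.now()}`;

  // 1. Durably record what we are about to do.
  await client.execute(
    'INSERT INTO update_log (id, user_id, new_email) VALUES (?, ?, ?)',
    [logId, userId, email], { prepare: true });

  // 2. Apply the update to every table/CF that denormalizes this field.
  await client.execute(
    'UPDATE users SET email = ? WHERE user_id = ?',
    [email, userId], { prepare: true });
  await client.execute(
    'UPDATE users_by_email SET user_id = ? WHERE email = ?',
    [userId, email], { prepare: true });

  // 3. Only once all writes succeed, remove the log entry. If we crash before
  //    this point, the recovery job re-reads update_log and retries the writes.
  await client.execute(
    'DELETE FROM update_log WHERE id = ?',
    [logId], { prepare: true });
}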