It is unclear to me how transactions are implemented in Google's Datastore.
How do they determine which resource I'm trying to reach? We receive a global transaction object that we must then use for all requests, so I suspect this is done at the client library level rather than at the database level.
Do these transactions work for both reads and writes? In other words, can I use a transaction to implement a document locking mechanism, so that two users cannot access the same document simultaneously (like a mutex in a multithreaded/multiprocess application)?
And the final question: does anybody know how the transactional mechanism is implemented in Datastore? I mean the high-level architecture, process diagrams, or a short description of the internals, to get a better understanding of what I am working with.
It shouldn't be relevant, but I'm using Google Cloud Functions with a Node.js environment for this project. I don't believe that imposes any hard restrictions on usage.
It's not entirely clear what you are asking, but there is some good documentation on transactions with Cloud Datastore. I suggest you take a read through the concept documentation for this topic: Transactions
Transactions are a database level concept (server-side).
Transactions support both reads and writes. They are implemented with optimistic locking, so a transaction won't block another client from reading the data the way a mutex would, but a conflict will cause the transaction to fail and roll back as appropriate.
The documentation covers the details relevant to using it:
Outside of transactions, Cloud Datastore's isolation level is closest to read committed. Inside of transactions, serializable isolation is enforced. This means that another transaction cannot concurrently modify the data that is read or modified by this transaction. Read the serializable isolation wiki and the Transaction Isolation article for more information on isolation levels.
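For context, here is a minimal sketch of a read-modify-write transaction with the Node.js client library (@google-cloud/datastore). The Account kind, its balance property, and the transfer scenario are invented for illustration; only the transaction calls themselves (run, get, save, commit, rollback) come from the client API.

```js
const { Datastore } = require('@google-cloud/datastore');
const datastore = new Datastore();

// Move `amount` between two hypothetical 'Account' entities.
// If another client modifies either entity concurrently, commit() rejects
// and the whole function can simply be retried.
async function transfer(fromId, toId, amount) {
  const transaction = datastore.transaction();
  const fromKey = datastore.key(['Account', fromId]);
  const toKey = datastore.key(['Account', toId]);

  await transaction.run();
  try {
    const [[from], [to]] = await Promise.all([
      transaction.get(fromKey),
      transaction.get(toKey),
    ]);
    if (from.balance < amount) throw new Error('insufficient funds');
    from.balance -= amount;
    to.balance += amount;
    transaction.save([
      { key: fromKey, data: from },
      { key: toKey, data: to },
    ]);
    await transaction.commit(); // fails here if a conflicting write happened
  } catch (err) {
    await transaction.rollback();
    throw err;
  }
}
```

Note that nothing is blocked while the transaction is open; a conflicting writer only finds out at commit time, which is the optimistic-locking behaviour described above.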
I noticed QLDB does not support LIMIT or SKIP query parameters required to implement basic pagination.
Is this going to be supported in the future or is there some other way to implement pagination in QLDB?
LIMIT/SKIP is not currently supported. QLDB is purpose built for data ingestion. We recommend doing reporting and analytics in another purpose built database.
Let's consider a banking application with 2 use-cases:
Moving money between accounts
Providing monthly statements
The first is a very good fit for QLDB, where indexes are used to read balances and then a few documents are updated or created. Under OCC, QLDB makes it easy to write these transactions correctly, and performance should be very good. For example, if an account has $50 remaining and two competing transactions try to deduct $50, only one will succeed (the other will fail to commit). Meanwhile, other transactions will continue to succeed. Beyond being simple and performant, you also get integrity via the QLDB hash chain and proof system.
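As a sketch of that first use-case with the Node.js driver (amazon-qldb-driver-nodejs): the ledger name, the Accounts table and its fields are made up here, and the result-handling calls should be checked against the driver/ion-js docs, but executeLambda is the documented entry point and it re-runs the function when an OCC conflict makes the commit fail.

```js
const { QldbDriver } = require('amazon-qldb-driver-nodejs');

const driver = new QldbDriver('bank-ledger'); // hypothetical ledger name

async function deduct(accountId, amount) {
  return driver.executeLambda(async (txn) => {
    const result = await txn.execute(
      'SELECT balance FROM Accounts WHERE accountId = ?', accountId);
    const rows = result.getResultList();
    if (rows.length === 0) throw new Error('no such account');

    const balance = rows[0].get('balance').numberValue();
    if (balance < amount) throw new Error('insufficient funds');

    await txn.execute(
      'UPDATE Accounts AS a SET a.balance = ? WHERE a.accountId = ?',
      balance - amount, accountId);
    // If a competing transaction touched this document, the commit fails
    // under OCC and executeLambda retries this whole function.
  });
}
```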
The second is not a good fit. To compute a statement, we would need to lookup transactions for an account. But, what happens if that account changes (maybe somebody just sent you some money!) while we're doing the lookup? Again, under OCC, we will fail the transaction and the statement generation will need to retry. For a small bank, that's probably fine, but I think you can see where this is going. QLDB is purpose built for data ingestion, and the further you stray from what it was built for, the poorer the performance will be.
This raises the question of how to actually run these queries in another database. You can use the S3 Export or Kinesis Data Streaming features to get data out. S3 Exports are better suited for bulk operations (which many analytic databases prefer, e.g. Redshift), while Streams are better for real-time analytics (e.g. using ElasticSearch).
Conversely, I would not recommend using Redshift or ElasticSearch for the first use-case as you will not get the performance, integrity or durability that databases designed for OLTP use-cases offer (e.g. QLDB, DynamoDb, Aurora).
Abstract
I am modelling a generic authorization subdomain for my application. The requirements are quite complicated, as it needs to cope with multiple tenants, a hierarchical organisation structure, resource groups, user groups, permissions, user-editable permissions and so on. It's a mixture of RBAC (users assigned to roles, roles having permissions, permissions able to execute commands) with claims-based auth.
Problem
When checking for business rule invariants, I have to traverse the permission "graph" to find a permission for a user to execute a command on a resource in an environment. The traversal depth is arbitrary, on multiple dimensions.
I could model this using code, but it would be best represented using a graph database as queries/updates on this aggregate would be faster. Also, it would reduce the complexity of the code itself. But this would require the graph database to be immediately consistent.
Still, I need to use CQRS/ES, and enable a distributed architecture.
So the graph database needs to be
Immediately consistent
And this introduces some drawbacks
When loading events from event-store, we have to reconstruct the graph database each time
Or, we have to introduce some kind of graph database snapshotting
Overhead when communicating with the graph database
But it has advantages
Reduced complexity of performing complex queries
Complex queries are resolved faster than with code
The graph database is perfect for this job
Why this question?
In other aggregates I modelled, I often have an EntityList instance or an EntityHierarchy instance. They are basically ordered/hierarchical collections of sub-entities. Their implementation is arbitrary; they can be backed by anything from indexes to key-value pairs to dynamic arrays, as long as they implement the interfaces I declared for them. I often even have methods like findById() or findByName() on those entities (lists). Those methods are similar to methods that could be executed on a database, but they're executed in-memory.
Thus, why not have an implementation of such a list that could be bound to a database? For example, instead of having a TMemoryEntityList, I would have a TMySQLEntityList. In the case at hand, perhaps having an implementation of a TGraphAuthorizationScheme that would live inside a TOrgAuthPolicy aggregate would be desirable, as long as it behaves like a collection, is iterable, and supports the defined interfaces.
I'm building my application with JavaScript on Node.js. There is an in-memory implementation of this called LevelGraph. Maybe I could use that as well. But let's continue.
Proposal
I know that in DDD terms the infrastructure should not leak into the domain. That's what I'm trying to prevent. That's also one of the reasons I'm asking this question: it's the first time I've encountered such a technical need, and I'm asking people who are used to coping with this kind of problem for advice.
The interface for the collection is IAuthorizationScheme. The implementation has to support deep traversal, authorization lookups, and so on. This is the interface I am thinking of backing with a graph database.
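To make the idea concrete, here is a minimal sketch of what that contract could look like in plain Node.js; every name and method below (link, grant, isAuthorized) is invented for illustration, not taken from the question or from any library. A TGraphAuthorizationScheme would expose the same methods but delegate the traversal to a graph database query instead of this in-memory adjacency map.

```js
// In-memory stand-in for IAuthorizationScheme; a graph-backed implementation
// would keep the same interface and push the traversal into the database.
class MemoryAuthorizationScheme {
  constructor() {
    this.edges = new Map();  // node -> Set of parent nodes (member-of / inherits-from)
    this.grants = new Map(); // node -> Set of "command:resource" strings
  }

  link(child, parent) {
    if (!this.edges.has(child)) this.edges.set(child, new Set());
    this.edges.get(child).add(parent);
  }

  grant(node, command, resource) {
    if (!this.grants.has(node)) this.grants.set(node, new Set());
    this.grants.get(node).add(`${command}:${resource}`);
  }

  // Arbitrary-depth traversal: walk everything reachable from `user`
  // (groups, roles, organisation units) and look for the permission.
  isAuthorized(user, command, resource) {
    const wanted = `${command}:${resource}`;
    const seen = new Set();
    const stack = [user];
    while (stack.length > 0) {
      const node = stack.pop();
      if (seen.has(node)) continue;
      seen.add(node);
      if (this.grants.get(node)?.has(wanted)) return true;
      for (const parent of this.edges.get(node) || []) stack.push(parent);
    }
    return false;
  }
}

// Usage sketch:
const scheme = new MemoryAuthorizationScheme();
scheme.link('user:alice', 'group:accounting');
scheme.link('group:accounting', 'role:invoice-approver');
scheme.grant('role:invoice-approver', 'ApproveInvoice', 'invoices');
console.log(scheme.isAuthorized('user:alice', 'ApproveInvoice', 'invoices')); // true
```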
Sequence:
1. When a user asks to execute a command, I first authenticate him. I find his organisation and ask the OrgAuthPolicyRepository to load that organisation's corresponding OrgAuthPolicy.
2. The OrgAuthPolicyRepository loads the events from the EventStore.
3. The OrgAuthPolicyRepository creates a new OrgAuthPolicy with a dependency-injected TGraphAuthorizationScheme instance.
4. The OrgAuthPolicyRepository applies all previous events to the OrgAuthPolicy, which in turn runs queries on the graph database to sync the graph database's state with the aggregate (a rough sketch of this follows after the list).
5. The command handler executes the business rule validation checks. Some of them might include checks against the aggregate's IAuthorizationScheme.
6. Once the business rules have been validated, a domain event is dispatched.
7. The aggregate handles this event and applies it to itself. This might include changes to the IAuthorizationScheme.
8. The eventBus dispatches the event to all listening eventHandlers on the read side.
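A rough sketch of steps 2 to 4, reusing the names from the question (OrgAuthPolicyRepository, OrgAuthPolicy, the event store); loadEvents() and apply() are assumed method names and the OrgAuthPolicy body is only a stub, so treat this as an outline rather than a working design.

```js
class OrgAuthPolicy {
  constructor(organisationId, authorizationScheme) {
    this.organisationId = organisationId;
    this.authorizationScheme = authorizationScheme; // e.g. a graph-backed scheme
  }

  async apply(event) {
    // Domain-specific handling goes here, e.g. updating the authorization
    // scheme (and therefore the graph database) when a permission-granted
    // event is replayed or raised.
  }
}

class OrgAuthPolicyRepository {
  constructor(eventStore, createAuthorizationScheme) {
    this.eventStore = eventStore;                                // step 2 dependency
    this.createAuthorizationScheme = createAuthorizationScheme;  // DI factory, step 3
  }

  async load(organisationId) {
    // 2. Load the organisation's past events (loadEvents is an assumed API).
    const events = await this.eventStore.loadEvents(`org-auth-policy/${organisationId}`);

    // 3. Create the aggregate with an injected (graph-backed) scheme.
    const policy = new OrgAuthPolicy(organisationId, this.createAuthorizationScheme());

    // 4. Replay: each apply() re-syncs the scheme with the aggregate's state.
    for (const event of events) {
      await policy.apply(event);
    }
    return policy;
  }
}
```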
In summary
Is it conceivable/desirable to implement entities using external databases (e.g. a graph database) so that their implementation is easier? If yes, are there examples of such an implementation, or guidelines? If not, what are the drawbacks of using such a technique?
To solve your task I would consider the following options, going from top to bottom:
Reduce task complexity by employing security frameworks or identity management solutions. An existing out-of-the-box identity management solution might do the job; if it doesn't, take a look at frameworks that help you implement your own. Unfortunately I'm not familiar enough with the Node.js world to recommend any; in the Java world that could be Apache Shiro or Spring Security. This could be a good option from both a cost and a security perspective.
Maintain a single model instead of CQRS. This eliminates the consistency problems you would get if you decided to store your models in separate resources. From my understanding, permissions will not be changed frequently but will be accessed frequently. This means you can live with one model optimised for reads, avoiding both the consistency issues and the cost of maintaining two models. To track user behaviour you can implement auditing separately; in my experience, security auditing can require additional data which most likely is not in your data model anyway.
Do it with CQRS. Here I would first revisit the requirements to find a way to accept eventual consistency instead of strong consistency. This opens up many implementation options.
Regarding the question of whether you should introduce a dedicated graph database, it's impossible to answer without knowledge of your domain, budget, desired system throughput and performance, existing infrastructure, team knowledge and setup, etc. You need to estimate the costs of the solution with a dedicated graph database and without it. My feeling is that unless permission management is the main idea of your project, or your project is mature enough (in terms of number of users and R&D capacity), a dedicated database is unlikely to pay back its costs for this task.
To understand the potential benefits of a dedicated graph database, compare it against your existing storage solutions. These two articles explain such benefits pretty well:
http://neo4j.com/developer/graph-db-vs-nosql/
http://neo4j.com/developer/graph-db-vs-rdbms/
I'm spending my evenings evaluating Azure Service Fabric as a replacement for our current WebApps/CloudServices stack, and feel a little bit unsure about how to decide when services/actors with state should be stateful actors, and when they should be stateless actors with externally persisted state (Azure SQL, Azure Storage and DocumentDB). I know this is a fairly new product (to the general public at least), so there's probably not a lot of best practices in regards to this yet, but I've read through most of the documentation made available by Microsoft without finding a definite answer for this.
The current problem domain I'm approaching is our event store; parts of our applications are based on event sourcing and CQRS, and I'm evaluating how to move this event store over to the Service Fabric platform. The event store is going to contain a lot of time-series data, and as it's our only source of truth for the data being persisted there, it must be consistent, replicated and stored in some form of durable storage.
One way I have considered doing this is with a stateful "EventStream" actor; each instance of an aggregate using event sourcing stores its events within an isolated stream. This means the stateful actor could keep track of all the events for its own stream, and I'd have met my requirements as to how the data is stored (transactional, replicated and durable). However, some streams may grow very large (hundreds of thousands, if not millions, of events), and this is where I'm starting to get unsure. Having an actor with a large amount of state will, I imagine, impact the performance of the system when these large data models need to be serialized to or deserialized from disk.
Another option is to keep these actors stateless, and have them just read their data from some external storage like Azure SQL - or just go with stateless services instead of actors.
Basically, when is the amount of state for an actor/service "too much" and you should start considering other ways of handling state?
Also, this section in the Service Fabric Actors design pattern: Some anti-patterns documentation leaves me a little puzzled:
Treat Azure Service Fabric Actors as a transactional system. Azure Service Fabric Actors is not a two phase commit-based system offering ACID. If we do not implement the optional persistence, and the machine the actor is running on dies, its current state will go with it. The actor will be coming up on another node very fast, but unless we have implemented the backing persistence, the state will be gone. However, between leveraging retries, duplicate filtering, and/or idempotent design, you can achieve a high level of reliability and consistency.
What does "if we do not implement the optional persistance" indicate here? I was under the impression that as long as your transaction modifying the state succeeded, your data was persisted to durable storage and replicated to at least a subset of the replicas. This paragraph leaves me wondering if there are situations where state within my actors/services will get lost, and if this is something I need to handle myself. The impression I got from the stateful model in other parts of the documentation seems to counteract this statement.
One option you have is to keep 'some' of the state in the actor (say, what could be considered hot data that needs to be quickly available) and store everything else on 'traditional' storage infrastructure such as SQL Azure, DocDB, etc.
It is difficult to have a general rule about too much local state but, maybe, it helps to think about hot vs. cold data.
Reliable Actors also offer the ability to customize the StateProvider so you can also consider implementing a customized StateProvider (by implementing the IActorStateProvider) with the specific policies that you need to be more efficient with the requirements that you have in terms of amount of data, latency, reliability and so on (note: documentation is still very minimal on the StateProvider interface but we can publish some sample code if this is something you want to pursue).
About the anti-patterns: the note is more about implementing transactions across multiple actors. Reliable Actors provides full guarantees on the reliability of the data within the boundaries of a single actor. Because of the distributed and loosely coupled nature of the Actor model, implementing transactions that involve multiple actors is not a trivial task. If 'distributed' transactions are a strong requirement, the Reliable Services programming model is probably a better fit.
I know this has been answered, but recently found myself in the same predicament with a CQRS/ES system and here's how I went about it:
Each Aggregate was an actor with only the current state stored in it.
On a command, the aggregate would effect a state change and raise an event.
Events themselves were stored in a DocDb.
On activation, an AggregateActor instance reads events from DocDb, if available, to recreate its state. This is obviously only performed once per actor activation. This took care of the case where an actor instance is migrated from one node to another.
To answer #Trond's secondary question, which is: "What does 'if we do not implement the optional persistence' indicate here?"
An actor is always a stateful service, and its state can be configured, using an attribute on the actor class, to operate in one of three modes:
Persisted. The state is replicated to all replica instances and also written to disk. The state is maintained even if all replicas are shut down.
Volatile. The state is replicated to all replica instances, but kept in memory only. This means the state is maintained as long as at least one replica instance is alive, but when all replicas are shut down the state is lost and cannot be recovered after they are restarted.
No persistence. The state is not replicated to other replica instances, nor written to disk. This provides the least state protection.
A full discussion of the topic can be found in the Microsoft documentation
I've been doing some research but have reached the point where I think MongoDB/Mongoose (on Node.js) is not the right tool for the job. Here is the scenario...
Two documents: Account (money) information and Inventory information
Check if user's account has enough money
If so, check and deduct inventory
Deduct funds from Account Information
It seems like I really need a transaction system to prevent other events from altering the data in between steps.
Am I correct, or can this still be handled in MongoDB/Mongoose? If not, is there a NoSQL db that I should check out, preferably with Node.JS support?
Implementing transactional safety is usually tricky and requires more than just transactions on the database, e.g. if you need to communicate with external parties in a reliable fashion or if the transaction runs over minutes, hours or even days. But that leads too far afield here.
Anyhow, on the db side you can do transactions in MongoDB using two-phase commits, but it's not exactly trivial.
There's a ton of NoSQL databases with transaction support, e.g. redis, cassandra (using the Paxos protocol) and foundationdb.
However, this seems rather random to me because the idea of NoSQL databases is to use one that fits your particular problem. If you just need 'anything' with transactions, an SQL db might do the job, right?
You can always implement your own locking mechanism within your application to lock out other sections of the app while you are making your account and inventory checks and updates. That combined with findAndModify() http://docs.mongodb.org/manual/reference/command/findAndModify/#dbcmd.findAndModify may be enough for your transaction needs while also maintaining the flexibility of a NoSQL solution.
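For instance, a conditional update keeps the check and the deduction atomic on a single document. The Account model and field names below are assumptions for illustration; the pattern itself is plain Mongoose/MongoDB (findOneAndUpdate with a filter plus $inc):

```js
const mongoose = require('mongoose');

// Hypothetical schema; only the conditional-update pattern matters here.
const Account = mongoose.model('Account', new mongoose.Schema({
  userId: String,
  balance: Number,
}));

// Deduct `amount` only if the balance is still sufficient. The filter and the
// $inc run as one atomic operation on the server, so no other write can slip
// in between the check and the update on this document.
async function deductFunds(userId, amount) {
  const updated = await Account.findOneAndUpdate(
    { userId, balance: { $gte: amount } },
    { $inc: { balance: -amount } },
    { new: true }
  );
  if (!updated) throw new Error('insufficient funds or unknown account');
  return updated;
}
```

That only covers each document individually; coordinating the account and the inventory documents together is where the application-level lock (or a two-phase commit) comes in.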
For the distributed lock I'd look at Warlock (https://www.npmjs.org/package/node-redis-warlock). I've not used it myself, but it's Node.js-based and built on top of Redis, although implementing your own via Redis is not that hard to begin with.
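A rough sketch of that approach, assuming the lock(key, ttl, callback) API shown in the node-redis-warlock README, with a hypothetical checkAndUpdateAccountAndInventory() helper standing in for the Mongo work:

```js
const redis = require('redis');
const Warlock = require('node-redis-warlock');

const client = redis.createClient();
const warlock = Warlock(client);

// Hypothetical helper wrapping the account/inventory checks and updates.
function checkAndUpdateAccountAndInventory(userId, itemId, cb) {
  // ... e.g. the conditional deduction above plus an inventory update ...
  cb(null, { userId, itemId });
}

// Serialise the account + inventory check-and-update per user.
// `unlock` is only a function when the lock was actually acquired.
function purchase(userId, itemId, done) {
  warlock.lock(`purchase:${userId}`, 10000, (err, unlock) => {
    if (err) return done(err);
    if (typeof unlock !== 'function') {
      return done(new Error('another purchase is already in progress'));
    }
    checkAndUpdateAccountAndInventory(userId, itemId, (err, result) => {
      unlock(); // release as soon as the critical section is done
      done(err, result);
    });
  });
}
```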
I have scoured the Internet, posted to the Spring forums, and read nearly the whole of the online documentation, but I cannot figure out whether Spring Integration can process more than one message within a single multi-resource (JTA) transaction. This is critical for my purposes, in order to get the throughput necessary. Does anyone know if this is possible? (And a little guidance on how to make it work would be appreciated.)
Once a transaction is started, as long as you don't pass a thread boundary all work will remain in that transaction.
This means that, if your transaction manager supports multi-resource transactions and you avoid introducing concurrency within the transaction, you will be OK.
In other words: it depends, but it is possible.