I know it's an old discussion but still an open one.
The scenario is simple: you have an entity, say account, which contains an attribute named "AccountId" that should be auto-incremented.
A prime candidate for this is a plugin registered on the Pre event.
There are different options available for this:
1. Get the max number, increment it and assign it to the AccountId attribute.
2. Rely on some external source, e.g. a web service or database, to perform this job (which is not a good approach).
These approaches are discussed here.
Personally I am in favour of Approach 1 but I have concerns:
1- Duplication on concurrent requests
Locking and mutexes can reduce that, but what can be done to avoid this problem in a "farm environment"?
The problem in a "farm environment", which actually means multiple servers with the front-end role installed, is that you can hardly avoid duplicate counter values.
With locks or mutexes, you can only achieve consistency in a single-machine environment.
If you need reliable numbering, you should use either a service that generates the numbers or a dedicated database (that is, not the CRM database, as this would not be supported) as a back end where you can coordinate the requests with locks.
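To make the "dedicated database" idea concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in for that dedicated database; in a real farm a server database would play this role. The table name `counters` and the function names are assumptions made up for the illustration.

```python
import sqlite3

def create_counter_store(path=":memory:"):
    # isolation_level=None means autocommit mode, so transactions are explicit.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counters ("
        "name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    return conn

def next_number(conn, name):
    # BEGIN IMMEDIATE takes the write lock up front, so two callers
    # cannot both read the same value before one of them increments it.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "INSERT OR IGNORE INTO counters (name, value) VALUES (?, 0)", (name,)
        )
        conn.execute(
            "UPDATE counters SET value = value + 1 WHERE name = ?", (name,)
        )
        (value,) = conn.execute(
            "SELECT value FROM counters WHERE name = ?", (name,)
        ).fetchone()
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The point of the sketch is that the lock lives in the database, not in the process, so every front-end server that calls `next_number` is serialized by the same back end.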
CosmosDB can geo-replicate collections and clients can be configured to make (read-only) queries to these "follower" regions.
Is there a built-in way for CosmosDB to provide a "follower" collection in the same region?
The scenario for using that is to use the "main" collection for fast interactive queries, and use the "follower" collection for slower, heavier backend queries, without the possibility of hitting limits and causing throttling that would impact the interactive case.
The usual answer for "copying" collections is to use a change feed (possibly via an Azure function), but this is "manual" work and the client (me) would have to take care of general dev-ops overhead like provisioning, telemetry, monitoring, alerting, key rotation etc.
I'd like to know if there's a "managed" way to do this, like there is for geo-replication.
The built-in geo-replication feature only works when replicating to different regions. You cannot replicate the same collection(s) back to the same region.
You'll need to set this up yourself. As you've already mentioned, you can use Change Feed to do this (though you called it a "manual" process and I don't see it as such, since this can be completely automated in code). You can also incorporate a messaging/event pattern: subscribe to database update events, and have multiple consumers writing to different database collections, per your querying needs.
Also: by having an independent collection where you provide the data-movement code, you can choose a different data model for your slower, heavier backend queries (maybe with a different partition key; maybe with some helpful aggregations; etc.).
There's really no way to avoid the added infrastructure setup.
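As a rough illustration of the data-movement code described above, here is a sketch in Python that does not use any Cosmos DB SDK. A batch of changed documents (as a change feed would deliver them) is upserted into a second, in-memory "collection" that is grouped by a different partition key. The field names `region` and `customerId` and the function name are assumptions for the example.

```python
from collections import defaultdict

def apply_changes(changes, reporting_store, partition_key_field):
    """Upsert each changed document into a store grouped by another key."""
    for doc in changes:
        partition = doc[partition_key_field]
        # Later versions of the same id overwrite earlier ones (upsert).
        reporting_store[partition][doc["id"]] = doc
    return reporting_store

reporting = defaultdict(dict)
batch = [
    {"id": "1", "customerId": "c1", "region": "eu", "total": 10},
    {"id": "2", "customerId": "c2", "region": "eu", "total": 5},
    {"id": "1", "customerId": "c1", "region": "eu", "total": 12},  # newer version
]
apply_changes(batch, reporting, "region")
```

Note that the sketch assumes the value of the new partition key never changes for a document; if it could, the consumer would also have to delete the stale copy.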
Replication is limited to a single container/collection. For most scenarios like yours, one would use an alternate partition key to make the second collection read-optimized. You should also review your top queries and consider using an alternate database that is more read-optimized.
You could use this new tool:
https://github.com/Azure-Samples/azure-cosmosdb-live-data-migrator
We are currently working on a design using Azure functions with Azure storage queue binding.
Each message in the queue represents a complete transaction. An Azure function will be bound to that queue so that the function will be triggered as soon as there is a new message in the queue.
The function will then commit the transaction in a SQL DB.
The first-cut implementation is also complete, and it's working fine. However, in retrospect, we are considering the following:
In a typical DAL, there are well-established design patterns using Entity Framework, repository patterns, etc. However, we didn't find similar guidance or best practices for implementing a DAL within serverless code.
Therefore, my question is: should such patterns be implemented with Azure Functions (this would be challenging :) ), should the serverless code be kept as light as possible, or is this not a use case for Azure Functions at all?
It doesn't take anything too special. We're using a routine set of library DLLs for all kinds of things -- database, interacting with other parts of Azure (like retrieving Key Vault secrets for connection strings), parsing file uploads, business rules, and so on. The libraries are targeting netstandard20 so we can more easily migrate to Functions v2 when the right triggers become available.
Mainly just design your libraries so they're highly modularized, so you can minimize how much you load to get the job done (assuming reuse in other areas of the system is important, which it usually is).
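A minimal sketch of that split, written in Python rather than our actual .NET libraries: the data-access code sits in an ordinary class, and the function entry point only parses the queue message and delegates. sqlite3 stands in for the SQL DB, and the names `TransactionRepository` and `handle_queue_message` are hypothetical.

```python
import json
import sqlite3

class TransactionRepository:
    """Data-access code kept in a plain library class, independent of the host."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transactions ("
            "id TEXT PRIMARY KEY, amount REAL NOT NULL)"
        )

    def save(self, transaction_id, amount):
        # `with conn` commits on success and rolls back on an exception.
        with self.conn:
            self.conn.execute(
                "INSERT INTO transactions (id, amount) VALUES (?, ?)",
                (transaction_id, amount),
            )

def handle_queue_message(body, repository):
    """What a queue-triggered function would call: parse, then delegate."""
    message = json.loads(body)
    repository.save(message["id"], message["amount"])
    return message["id"]
```

Because the repository is passed in as an argument, the same handler can be exercised in a unit test with an in-memory database and no function host at all.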
It would be easier if dependency injection was available today. See this for a few ways some of us have hacked it together until we get official DI support. (DI is on the roadmap for Functions, I believe the 3.0 release.)
At first I was a little worried about startup time with the library approach, but the underlying WebJobs stack itself is already pretty heavy, and Functions startup performance seems to vary wildly anyway (on the cheaper tiers, at least). During testing, one of our infrequently-executed Functions has varied from about 300ms to a peak of about 3800ms to parse the exact same test file, with all but ~55ms spent on startup.
should such patterns be implemented with Azure Functions (this would
be challenging :) ), should the serverless code be kept as light
as possible, or is this not a use case for Azure Functions at all?
My answer is NO.
There should be patterns to follow, but the traditional repository patterns and CRUD operations do not seem to be valid in the cloud era.
Many strong concepts we were raised to adhere to have become invalid these days.
Denormalizing the database has become not only acceptable but preferable.
Designing a pattern now depends on the database you selected for your solution, and also on the type of your application and the type of your data.
This is a link to general guidelines for Table Storage design.
Is your application read-heavy or write-heavy? The design will vary accordingly.
Are you using Azure Tables or Mongo? There are design decisions based on that. Indexing is important in Mongo, while in Azure Tables there is none that you can do.
Sharding considerations.
Redundancy considerations.
In modern development and architecture many principles have changed: each microservice has its own database, which might be totally different from that of any other microservice.
If you read along the guidelines that I provided, you will see what I mean.
Designing your Table service solution to be read efficient:
Design for querying in read-heavy applications. When you are designing your tables, think about the queries (especially the latency sensitive ones) that you will execute before you think about how you will update your entities. This typically results in an efficient and performant solution.
Specify both PartitionKey and RowKey in your queries. Point queries such as these are the most efficient table service queries.
Consider storing duplicate copies of entities. Table storage is cheap so consider storing the same entity multiple times (with different keys) to enable more efficient queries.
Consider denormalizing your data. Table storage is cheap so consider denormalizing your data. For example, store summary entities so that queries for aggregate data only need to access a single entity.
Use compound key values. The only keys you have are PartitionKey and RowKey. For example, use compound key values to enable alternate keyed access paths to entities.
Use query projection. You can reduce the amount of data that you transfer over the network by using queries that select just the fields you need.
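To illustrate a few of the read-side points above (point queries, duplicate copies under compound keys, and projection), here is a small in-memory sketch in Python. It does not call the Table service; `TableSketch`, `store_employee` and the key prefixes `id_` and `email_` are assumptions chosen for the example.

```python
class TableSketch:
    """In-memory stand-in for a table addressed by (PartitionKey, RowKey)."""

    def __init__(self):
        self.rows = {}

    def insert(self, partition_key, row_key, entity):
        self.rows[(partition_key, row_key)] = dict(entity)

    def point_query(self, partition_key, row_key, select=None):
        # Both keys are given, so this is a single lookup, not a scan.
        entity = self.rows.get((partition_key, row_key))
        if entity is None or select is None:
            return entity
        # Projection: return only the requested fields.
        return {field: entity[field] for field in select if field in entity}

def store_employee(table, employee):
    # Store the same entity twice under different compound RowKeys so it
    # can be found by id or by email with a point query either way.
    department = employee["department"]
    table.insert(department, "id_" + employee["id"], employee)
    table.insert(department, "email_" + employee["email"], employee)
```

The duplicate copy costs extra storage and a second write, which is the trade the guidelines suggest making when reads dominate.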
Designing your Table service solution to be write efficient:
Do not create hot partitions. Choose keys that enable you to spread your requests across multiple partitions at any point of time.
Avoid spikes in traffic. Smooth the traffic over a reasonable period of time and avoid spikes in traffic.
Don't necessarily create a separate table for each type of entity. When you require atomic transactions across entity types, you can store these multiple entity types in the same partition in the same table.
Consider the maximum throughput you must achieve. You must be aware of the scalability targets for the Table service and ensure that your design will not cause you to exceed them.
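One common way to avoid a hot partition is to derive the PartitionKey from a hash of the entity id, so that writes arriving at the same moment land in different partitions. The Python sketch below shows only the key derivation; the bucket count of 16 and the `bucket-NN` naming are arbitrary assumptions.

```python
import hashlib

def partition_key_for(entity_id, buckets=16):
    """Map an entity id to one of `buckets` partitions, deterministically."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return f"bucket-{digest[0] % buckets:02d}"

def spread(entity_ids, buckets=16):
    """Count how many of the given ids fall into each partition."""
    counts = {}
    for entity_id in entity_ids:
        key = partition_key_for(entity_id, buckets)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

The trade-off is on the read side: a query that is not a point query has to fan out over all buckets, so this fits write-heavy tables better than read-heavy ones.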
Another good source is this link:
Abstract
I am modelling a generic authorization subdomain for my application. The requirements are quite complicated as it needs to cope with multi tenants, hierarchical organisation structure, resource groups, user groups, permissions, user-editable permissions and so on. It's a mixture of RBAC (users assigned to roles, roles having permissions, permissions can execute commands) with claims-based auth.
Problem
When checking for business rule invariants, I have to traverse the permission "graph" to find a permission for a user to execute a command on a resource in an environment. The traversal depth is arbitrary, on multiple dimensions.
I could model this using code, but it would be best represented using a graph database as queries/updates on this aggregate would be faster. Also, it would reduce the complexity of the code itself. But this would require the graph database to be immediately consistent.
Still, I need to use CQRS/ES, and enable a distributed architecture.
So the graph database needs to be
Immediately consistent
And this introduces some drawbacks
When loading events from event-store, we have to reconstruct the graph database each time
Or, we have to introduce some kind of graph database snapshotting
Overhead when communicating with the graph database
But it has advantages
Reduced complexity of performing complex queries
Complex queries are resolved faster than with code
The graph database is perfect for this job
Why this question?
In other aggregates I modelled, I often have an EntityList instance or EntityHierarchy instance. They are basically ordered/hierarchical collections of sub-entities. Their implementation is arbitrary. They can support anything from indexing, key-value pairs, dynamic arrays, etc., as long as they implement the interfaces I declared for them. I often even have methods like findById() or findByName() on those entities (lists). Those methods are similar to methods that could be executed on a database, but they're executed in-memory.
Thus, why not have an implementation of such a list that could be bound to a database? For example, instead of having a TMemoryEntityList, I would have a TMySQLEntityList. In the case at hand, perhaps having an implementation of a TGraphAuthorizationScheme that would live inside a TOrgAuthPolicy aggregate would be desirable, as long as it behaves like a collection, is iterable and supports the defined interfaces.
I'm building my application with JavaScript on Node.js. There is an in-memory implementation of this called LevelGraph. Maybe I could use that as well. But let's continue.
Proposal
I know that in DDD terms the infrastructure should not leak into the domain. That's what I'm trying to prevent. That's also one of the reasons I asked this question: it's the first time I have encountered such a technical need, and I am asking people who are used to coping with this kind of problem for some advice.
The interface for the collection is IAuthorizationScheme. The implementation has to support deep traversal, authorization finding, etc. This is the interface I am thinking about implementing by supporting it with a graph database.
Sequence :
1. When a user asks to execute a command, I first authenticate him. I find his organisation, and ask the OrgAuthPolicyRepository to load up his organisation's corresponding OrgAuthPolicy.
2. The OrgAuthPolicyRepository loads the events from the EventStore.
3. The OrgAuthPolicyRepository creates a new OrgAuthPolicy, with a dependency-injected TGraphAuthorizationScheme instance.
4. The OrgAuthPolicyRepository applies all previous events to the OrgAuthPolicy, which in turn runs queries on the graph database to sync the state of the graph database with the aggregate.
5. The command handler executes the business rule validation checks. Some of them might include checks with the aggregate's IAuthorizationScheme.
6. The business rules have been validated, and a domain event is dispatched.
7. The aggregate handles this event, and applies it to itself. This might include changes to the IAuthorizationScheme.
8. The eventBus dispatches the event to all listening eventHandlers on the read-side.
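The loading part of this sequence can be sketched as follows. This is Python rather than my actual JavaScript, and the graph is a plain in-memory dictionary standing in for the graph database; the event shape (`type`, `source`, `target`) and the class `InMemoryAuthorizationScheme` are assumptions for the illustration.

```python
class InMemoryAuthorizationScheme:
    """Stand-in for a graph-backed IAuthorizationScheme implementation."""

    def __init__(self):
        self.edges = {}

    def grant(self, source, target):
        self.edges.setdefault(source, set()).add(target)

    def is_allowed(self, user, permission):
        # Iterative depth-first traversal of arbitrary depth, cycle-safe.
        seen, stack = set(), [user]
        while stack:
            node = stack.pop()
            if node == permission:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

class OrgAuthPolicy:
    def __init__(self, scheme):
        self.scheme = scheme  # dependency-injected collection

    def apply(self, event):
        if event["type"] == "Granted":
            self.scheme.grant(event["source"], event["target"])

class OrgAuthPolicyRepository:
    def __init__(self, event_store, scheme_factory):
        self.event_store = event_store
        self.scheme_factory = scheme_factory

    def load(self, org_id):
        # Create the aggregate, then replay all previous events into it.
        policy = OrgAuthPolicy(self.scheme_factory())
        for event in self.event_store.get(org_id, []):
            policy.apply(event)
        return policy
```

The aggregate only sees `grant` and `is_allowed`, so swapping the dictionary for a real graph database would, in principle, not change the domain code.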
Example :
In summary
Is it conceivable/desirable to implement entities using external databases (e.g. a graph database) so that their implementation is easier? If yes, are there examples of such implementations, or guidelines? If not, what are the drawbacks of using such a technique?
To solve your task I would consider the following variants going from top to bottom:
Reduce task complexity by employing security frameworks or identity management solutions. Some existing out-of-the-box identity management solution might do the job. If it doesn't, take a look at frameworks that help you implement your own. Unfortunately I'm not familiar enough with the Node.js world to recommend any. In the Java world that could be Apache Shiro or Spring Security. This could be a good option from both a cost and a security perspective.
Maintain a single model instead of CQRS. This eliminates consistency problems (if you decide to have separate resources to store your models). From my understanding, permissions should not be changed frequently but they will be accessed frequently. This means you can live with one model optimised for reads, avoiding consistency issues and the need to maintain 2 models. To track user behaviour you can implement auditing separately. From my experience security auditing can require some additional data which most likely is not in your data model.
Do it with CQRS. Here I would first consider revisiting the requirements to find a way to accept eventual consistency instead of strong consistency. This opens many options for implementation.
Regarding the question of whether you should introduce a dedicated graph database: it's impossible to answer without knowledge of your domain, budget, desired system throughput and performance, existing infrastructure, team knowledge and setup, etc. You need to estimate the costs of the solution with a dedicated graph database and without it. My feeling is that unless permission management is the main idea of your project, or your project is mature enough (by number of users and R&D capacity), a dedicated database is unlikely to pay back its costs for your task.
To understand what the benefits of a dedicated graph database could be, compare it against your existing storage solutions. These 2 articles explain such benefits pretty well:
http://neo4j.com/developer/graph-db-vs-nosql/
http://neo4j.com/developer/graph-db-vs-rdbms/
We have recently decided to adopt DDD in my team for our new projects because of its many obvious benefits (we come from the Active-Record pattern school), and there are a couple of things that are still unclear.
Say I have an entity Transaction that depends on the following entities (each of which in turn depends on many other entities):
1. Customer
2. Account
3. Currency
When I make use of factories to instantiate a Transaction entity to pass to a Domain Service for some fancy business rules, do I have to make that many queries to set up all these dependent instances?
If I have overloads in my factory that skip such dependencies then those will be null in some cases and it will become too complicated to differentiate when I can access those properties and when I cannot. With Active-Record pattern I just use lazy loading and have them load only on demand. Any ideas with DDD?
EDIT:
In my scenario “Transaction” seems to be the best candidate for an Aggregate root. I have defined a method in my Application Service “InitiateTransaction” (also have a “FinalizeTransaction” as it involves a redirect to PayPal) and takes as parameters the DTOs needed to carry AccountId, CurrencyId, LanguageId and various other foreign keys as well as Transaction attributes.
When calling my Domain Services (Transaction Processor and Fraud Rule Evaluator), I need to specify the “Transaction” Aggregate with all dependencies loaded (“Transaction.Customer”, “Transaction.Currency”, etc.).
So if I am correct the steps required are:
1. Call some repository(ies) to retrieve Customer, Currency etc.
2. Call TransactionFactory with dependencies specified above to get a Transaction object
3. Call Domain Services with fully loaded Transaction object for business rules to take place
Correct? Additionally, my concern was about steps 1 and 2.
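The three steps above could be sketched like this. It is Python rather than the language of my project, with in-memory repositories; every name in it (`InMemoryRepository`, `create_transaction`, `is_suspicious`, the fraud limit) is made up for the illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    id: str
    name: str

@dataclass(frozen=True)
class Currency:
    code: str

@dataclass
class Transaction:
    customer: Customer
    currency: Currency
    amount: float

class InMemoryRepository:
    def __init__(self, items):
        self._items = dict(items)

    def get_by_id(self, key):
        # Raises KeyError for an unknown id instead of returning None.
        return self._items[key]

def create_transaction(customer, currency, amount):
    """Factory: refuses to build a Transaction in an invalid state."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return Transaction(customer, currency, amount)

def is_suspicious(transaction, limit=10_000):
    """Stand-in for a domain service such as a fraud rule evaluator."""
    return transaction.amount > limit

def initiate_transaction(customers, currencies, customer_id, currency_code, amount):
    customer = customers.get_by_id(customer_id)                   # step 1
    currency = currencies.get_by_id(currency_code)                # step 1
    transaction = create_transaction(customer, currency, amount)  # step 2
    if is_suspicious(transaction):                                # step 3
        raise ValueError("transaction rejected by fraud rule")
    return transaction
```

Only the dependencies that the rules actually read are loaded here, which is the part I am unsure about once those dependencies have dependencies of their own.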
“Customer”, “Currency” and the other Entities/Value Objects that “Transaction” depends on have, in turn, other dependencies. Do I try to set up those as well? It seems to me that if I do, I will end up with very bloated code in my Application Service that is not reusable enough to place in a separate method. However, if I don’t, and just retrieve those from a repository with a “GetById(id)” as you suggested, my code could end up buggy: say I need the property “Transaction.Customer.CreatedByUser”, which returns a “User” instance; it will be null because repositories only load flat instances.
EDIT:
I ended up using GetById(id) to load only the dependencies I knew were needed in my Services. I'm not a big fan of accidentally accessing null instances due to flat loading, but I have my unit tests to protect me from taking it to production!!
I highly doubt that Currency is an entity; however, it's important to model things the way they are defined and used by the real Domain. Forget factories or other implementation details like the db; you need to make sure you have defined the concepts right.
Once you've done that, you'll have already identified the aggregate root as well. Btw, the entities should encapsulate the relevant business rules. Use Services to implement use cases, i.e. to manage the interaction between the domain objects and other parts such as the repository.
You should keep EVERYTHING related to db and CRUD in the repository, and have the repo work only with the aggregate roots. Also, for querying purposes, you should use CQRS so that all the queries would be done on a read model. For Domain purposes, a Get(id) is 99% enough and that method returns an aggregate root.
Be aware that DDD is very tricky. The most difficult part is modeling the Domain correctly; all the buzzwords are useless if the model is wrong.
I need to create incremental reports in the table storage. I need to be able to update the same records from several different worker role instances (different roles with several instances each).
My reports consist mainly of values that I need to increment after I parse the raw data I initially stored.
The optimistic solution I found is to use a retry mechanism: Try to update the record. If you get a 412 result code (you don't have the latest ETAG value), retry. This solution becomes less efficient and more costly the more users you have and the more data you need to update simultaneously (my case exactly).
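For reference, the retry loop I mean looks roughly like the sketch below. It is Python against an in-memory store, not the Azure storage client; `EntityStore`, `PreconditionFailed` (standing in for the 412 response) and the integer ETag are assumptions for the example.

```python
class PreconditionFailed(Exception):
    """Stands in for the HTTP 412 response."""

class EntityStore:
    def __init__(self):
        self._data = {}

    def read(self, key):
        # Returns (value, etag); a missing entity reads as (0, 0).
        return self._data.get(key, (0, 0))

    def replace(self, key, value, etag):
        _, current_etag = self._data.get(key, (0, 0))
        if etag != current_etag:
            raise PreconditionFailed(key)
        self._data[key] = (value, current_etag + 1)

def increment_with_retry(store, key, delta, max_attempts=5):
    """Read, add, write back; start over whenever the ETag is stale."""
    for attempt in range(1, max_attempts + 1):
        value, etag = store.read(key)
        try:
            store.replace(key, value + delta, etag)
            return attempt
        except PreconditionFailed:
            continue
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

Every failed attempt costs a read and a write, which is why the cost grows with the number of writers hitting the same record.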
Another solution that comes to mind is to have only one instance of one worker role that can possibly update any given record. This is very problematic because this means that I will by-design create bottlenecks in my architecture, which is the opposite of the scale I want to reach with Azure.
If anyone here has some best practices in mind for such a use case, I would love to hear it.
Most cloud storage services (Table Storage is one of them) do not offer scalable writes on a single entity/blob/whatever. There is no quick fix for this limitation, as it comes from the core tradeoff that has been made to create cloud storage in the first place.
Basically, a storage unit (entity/blob/whatever) can be updated about once every 20ms, and that's about it. Having a dedicated worker or not will not change anything in this respect.
Instead, you need to address your task from a different angle. For counters, the most usual approach is the use of sharded counters (link is for GAE, but you can implement an equivalent behavior on Azure).
Also, another way to ease the pain is to go for an asynchronous architecture à la CQRS, where the performance constraints you put on the update latency of entities are significantly relaxed.
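A sharded counter, in its simplest form, splits one logical counter into several independent storage units and sums them on read. The Python sketch below keeps the shards in a list purely for illustration; on Azure each shard would be its own entity. The class name and the shard count are assumptions.

```python
import random

class ShardedCounter:
    """One logical counter stored as several independently updated shards."""

    def __init__(self, shards=8, rng=None):
        self.shards = [0] * shards
        self.rng = rng or random.Random()

    def increment(self, delta=1):
        # Each write picks a random shard, so concurrent writers rarely
        # contend on the same storage unit.
        index = self.rng.randrange(len(self.shards))
        self.shards[index] += delta

    def total(self):
        # Reading costs one access per shard.
        return sum(self.shards)
```

More shards means more write throughput but a more expensive read, so the shard count is a tuning knob rather than a fixed number.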
I believe the approach needs re-architecture. In order to ensure scalability and limit the amount of contention, you want to make sure that every write can work optimistically by providing a unique Table/PartitionKey/RowKey.
If you need those values to be merged together for reports, have a separate process/worker that aggregates/merges the records afterwards for reporting purposes. You can use a queue or a timing mechanism to start the aggregation/merging.
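A minimal sketch of this idea in Python, with a dictionary standing in for the table: each worker writes its increment as a new row under a unique RowKey, so no two writes ever touch the same entity, and a separate step sums the rows per report. The function names and the use of a random UUID as RowKey are assumptions for the example.

```python
import uuid
from collections import defaultdict

def record_increment(table, report_id, delta):
    """Insert-only write: a fresh RowKey means there is nothing to conflict with."""
    row_key = uuid.uuid4().hex
    table[(report_id, row_key)] = delta
    return row_key

def aggregate(table):
    """Run by a separate worker: merge all rows into one total per report."""
    totals = defaultdict(int)
    for (report_id, _row_key), delta in table.items():
        totals[report_id] += delta
    return dict(totals)
```

The report is then only as fresh as the last aggregation run, which is the price of never contending on a write.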