multi-master in cosmosdb/documentdb - azure

How can I set up multiple write regions in cosmosdb so that I do not need to combine query results of two or more different regions in my application layer? From this documentation, it seems like cosmosdb global distribution is global replication with one writer and multiple read secondarys, not true multi-master. https://learn.microsoft.com/en-us/azure/documentdb/documentdb-multi-region-writers

As of May 2018, Cosmos DB now supports multi-master natively using a combination of CRDT data types and automatic conflict resolution.
Multi-master in Azure Cosmos DB provides high levels of availability
(99.999%), single-digit millisecond latency to write data and
scalability with built-in comprehensive and flexible conflict
resolution support.
Multi-master is composed of multiple master regions that equally
participate in a write-anywhere model (active-active pattern) and it
is used to ensure that data is available at any time where you need
it. Updates made to an individual region are asynchronously propagated
to all other regions (which in turn are master regions in their own).
Azure Cosmos DB regions operating as master regions in a multi-master
configuration automatically work to converge the data of all replicas
and ensure global consistency and data integrity.
Azure Cosmos DB implements the logic for handling conflicting writes
inside the database engine itself. Azure Cosmos DB offers
comprehensive and flexible conflict resolution support by offering
several conflict resolution models, including Automatic (CRDT-
conflict-free replicated data types), Last Write Wins (LWW), and
Custom (Stored Procedure) for automatic conflict resolution. The
conflict resolution models provide correctness and consistency
guarantees and remove the burden from developers to have to think
about consistency, availability, performance, replication latency, and
complex combinations of events under geo-failovers and cross-region
write conflicts.
More details here: https://learn.microsoft.com/en-us/azure/cosmos-db/multi-region-writers
It's currently in preview and might require approval before you can use it:

According to your supplied link, based on my understanding. Multi-master in cosmosdb/documentdb is implemented by multiple documentdbs separately for write regions and read the documents from the combined query. Currently it seems that it is not supported to set up multiple write regions in cosmosdb so that don't need to combine query results of two or more different regions .

The referenced article states how to implement multi-master in Cosmosdb, while explicitly stating that it is not a multi-master database.
There are ways to "simulate" multi-master scenarios by configuring the consistency level (e.g. session) which will allow callers to see their local copy without having it written to the write region. You can find the details of the various levels here: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels.
Aside from that, consider if you truly need multi-master by working with the consistency levels, considering what acceptable latency is, etc. There are few scenarios that can't tolerate latency, particularly when you have adequate tools to provide a user experience that approximates a local write master. There is no such thing as real-time when remote networks are involved ;)

Related

Does cosmosdb stores data differently depending on which api you use?

What is the ramification of using each api?
For example if I am using sql api, am I sacrificing ACID, and which part of CAP am I using? How do azure achieve horizontal scalability
If I am using document api or key value api, is the internal data layout different?
The internal storage format for the various database API's in Cosmos DB doesn't have any bearing on ACID or CAP. The API you choose should be driven by the appropriate use case needed and/or your familiarity with it. For instance, both SQL API and Mongo DB API are document databases. But if you have experience using Mongo, then it's probably a better choice rather than say Gremlin which is a graph, Table which is a key/value store, or Cassandra which is a columnar store.
Cosmos DB provides ACID support for data within the same logical partition. With regards to CAP that depends on which consistency level/model you choose. Strong consistency trades "A" availability for "C" consistency. All the other consistency models trade consistency for availability but at varying degrees. For instance, bounded staleness defines an upper bound that data can lag in consistency by time or updates. As that boundary approaches, Cosmos will throttle new writes to allow the replication queue to catch up, ensuring the consistency guarantees are met.
Achieving horizontal scalability is a function of your partitioning strategy. For write heavy workloads, the objective should be choosing a partition key that will allow for writes to be distributed across a wide range of values. For read heavy workloads, you want for queries to be served by one or a bounded number of partitions. If it's both, then using change feed to copy data into 2 or more containers as needed such that the cost of copying data is cheaper than running it in a container that would result in cross-partition queries. At the end of the day, choosing a good partition key requires testing under production levels of load and data.

Spark errors when writing to Synapse DWH pool

I am trying to write a dataframe in either append/overwrite mode into a Synapse table using ("com.databricks.spark.sqldw") connector .The official docs doesn't mention much about ACID properties of this write operation. My question is that , if the write operation fails in the middle of the write, would the actions preformed previously be rolled back?
One thing that the docs does mention is that there are two classes of exception that could be thrown during this operation: SqlDWConnectorException and SqlDWSideException .My logic is that if the write operation is ACID compliant,then we do not do anything,but if not,then we plan to encapsulate this operation in a try-catch block and look for other options(maybe retry,or timeout).
As a good practice you should write your code to be re-runnable, eg delete potentially duplicate records. Imagine you are re-running a file for a failed day or someone want to reprocess a certain period. However SQL pools does implement ACID through transaction isolation levels:
Use transactions in a SQL pool in Azure Synapse
SQL pool implements ACID transactions. The isolation level of the
transactional support is default to READ UNCOMMITTED. You can change
it to READ COMMITTED SNAPSHOT ISOLATION by turning ON the
READ_COMMITTED_SNAPSHOT database option for a user SQL pool when
connected to the master database.
You should bear in mind that the default transaction isolation level for dedicated SQL pools is READ UNCOMMITTED which does allow dirty reads. So the way I think about it is, ACID (Atomic, Consistent, Isolated, Durable) is a standard and each provider implements the standard to different degrees through transaction isolation levels. Each transaction isolation level can be strongly meeting ACID or weakly meeting ACID. Here is my summary for READ UNCOMMITTED:
A - you should reasonably expect your transaction to be atomic but you should (IMHO) write your code to be re-runnable
C - you should reasonably expect your transaction to be consistent but bear in mind dedicated SQL pools does not support foreign keys and the NOT ENFORCED keyword is applied to unique indexes on creation.
I - READ UNCOMMITED does not meet 'I' Isolated criteria of ACID, allowing dirty reads (uncommitted data) but the gain is concurrency. You can change the default to READ COMMITTED SNAPSHOT ISOLATION as described above, but you would need a good reason to do so and conduct extensive tests on your application as the impacts on behaviour, performance, concurrency etc
D - you should reasonably expect your transaction to be durable
So the answer to your question is, depending on your transaction isolation level (bearing in mind the default is READ UNCOMMITTED in a dedicated SQL pool), each transaction meets ACID to a degree, most notably Isolation (I) is not fully met. You have the opportunity to change this by altering the default transaction
at the cost of reducing concurrency and the now obligatory regression test. I think you are most interested in Atomicity and my advice is there, make sure your code is re-runnable anyway.
You tend to see the 'higher' transaction isolation levels (READ SERIALIZABLE) in more OLTP systems rather than MPP systems like Synapse, the cost being concurrency. You want your bank withdrawal to work right?
It has guaranteed ACID transaction behavior.
Refer: What is Delta Lake, where it states:
Azure Synapse Analytics is compatible with Linux Foundation Delta
Lake. Delta Lake is an open-source storage layer that brings ACID
(atomicity, consistency, isolation, and durability) transactions to
Apache Spark and big data workloads. This is fully managed using
Apache Spark APIs available in Azure Synapse.

Is there a built-in method to replicate a collection to a "follower" collection in the same region?

CosmosDB can geo-replicate collections and clients can be configured to make (read-only) queries to these "follower" regions.
Is there a built-in way for CosmosDB to provide a "follower" collection in the same region?
The scenario for using that is to use the "main" collection for fast interactive queries, and use the "follower" collection for slower, heavier backend queries, without the possibility of hitting limits and causing throttling that would impact the interactive case.
The usual answer for "copying" collections is to use a change feed (possibly via an Azure function), but this is "manual" work and the client (me) would have to take care of general dev-ops overhead like provisioning, telemetry, monitoring, alerting, key rotation etc.
I'd like to know if there's a "managed" way to do this, like there is for geo-replication.
The built-in geo-replication feature only works when replicating to different regions. You cannot replicate the same collection(s) back to the same region.
You'll need to set this up yourself. As you've already mentioned, you can use Change Feed to do this (though you called it a "manual" process and I don't see it as such, since this can be completely automated in code). You can also incorporate a messaging/event pattern: subscribe to database update events, and have multiple consumers writing to different database collections, per your querying needs.
Also: by having an independent collection where you provide the data-movement code, you can choose a different data model for your slower, heavier backend queries (maybe with a different partition key; maybe with some helpful aggregations; etc.).
There's really no way to avoid the added infrastructure setup.
Replication is limited to a single container/collection. For most scenarios like yours, one would use an alternate partition key to make the second collection read optimized. You should also review your top queries and consider using an alternate database which is more read optimize.
You could use this new tool:
https://github.com/Azure-Samples/azure-cosmosdb-live-data-migrator

Data access layer patterns using azure function

We are currently working on a design using Azure functions with Azure storage queue binding.
Each message in the queue represents a complete transaction. An Azure function will be bound to that queue so that the function will be triggered as soon as there is a new message in the queue.
The function will then commit the transaction in a SQL DB.
The first-cut implementation is also complete; and it's working fine. However, on retrospective, we are considering the following:
In a typical DAL, there are well-established design patterns using entity framework, repository patterns, etc. However, we didn't find a similar guidance/best practices when implementing DAL within a server-less code.
Therefore, my question is: should such patterns be implemented with Azure functions (this would be challenging :) ), or should the server-less code be kept as light as possible or this is not a use-case for azure functions, at all?
It doesn't take anything too special. We're using a routine set of library DLLs for all kinds of things -- database, interacting with other parts of Azure (like retrieving Key Vault secrets for connection strings), parsing file uploads, business rules, and so on. The libraries are targeting netstandard20 so we can more easily migrate to Functions v2 when the right triggers become available.
Mainly just design your libraries so they're highly modularized, so you can minimize how much you load to get the job done (assuming reuse in other areas of the system is important, which it usually is).
It would be easier if dependency injection was available today. See this for a few ways some of us have hacked it together until we get official DI support. (DI is on the roadmap for Functions, I believe the 3.0 release.)
At first I was a little worried about startup time with the library approach, but the underlying WebJobs stack itself is already pretty heavy, and Functions startup performance seems to vary wildly anyway (on the cheaper tiers, at least). During testing, one of our infrequently-executed Functions has varied from just ~300ms to a peak of about ~3800ms to parse the exact same test file, with all but ~55ms spent on startup).
should such patterns be implemented with Azure functions (this would
be challenging :) ), or should the server-less code be kept as light
as possible or this is not a use-case for azure functions, at all?
My answer is NO.
There should be patterns to follow, but the traditional repository patterns and CRUD operations do not seem to be valid in the cloud era.
Many strong concepts we were raised up to adhere to, became invalid these days.
Denormalizing the data base became something not only acceptable but preferable.
Now designing a pattern will depend on the database you selected for your solution and also depends of the type of your application and the type of your data.
This is a link for general guideline when you do Table Storage design Guidelines.
Is your application read-heavy or write-heavy ? The design will vary accordingly.
Are you using Azure Tables or Mongo? There are design decisions based on that. Indexing is important in Mongo while there is non in Azure table that you can do.
Sharding consideration.
Redundancy Consideration.
In modern development/Architecture many principles has changed, each Microservice has its own database that might be totally different that any other Microservices'.
If you read along the guidelines that I provided, you will see what I mean.
Designing your Table service solution to be read efficient:
Design for querying in read-heavy applications. When you are designing your tables, think about the queries (especially the latency sensitive ones) that you will execute before you think about how you will update your entities. This typically results in an efficient and performant solution.
Specify both PartitionKey and RowKey in your queries. Point queries such as these are the most efficient table service queries.
Consider storing duplicate copies of entities. Table storage is cheap so consider storing the same entity multiple times (with different keys) to enable more efficient queries.
Consider denormalizing your data. Table storage is cheap so consider denormalizing your data. For example, store summary entities so that queries for aggregate data only need to access a single entity.
Use compound key values. The only keys you have are PartitionKey and RowKey. For example, use compound key values to enable alternate keyed access paths to entities.
Use query projection. You can reduce the amount of data that you transfer over the network by using queries that select just the fields you need.
Designing your Table service solution to be write efficient:
Do not create hot partitions. Choose keys that enable you to spread your requests across multiple partitions at any point of time.
Avoid spikes in traffic. Smooth the traffic over a reasonable period of time and avoid spikes in traffic.
Don't necessarily create a separate table for each type of entity. When you require atomic transactions across entity types, you can store these multiple entity types in the same partition in the same table.
Consider the maximum throughput you must achieve. You must be aware of the scalability targets for the Table service and ensure that your design will not cause you to exceed them.
Another good source is this link:

Replicate part of data to different region

I am trying a use case with cosmoseDB where we want to maintain one CosmoseDB but split the data into US region and Europe region with some partition key?
And for inserting/updating documents, application know which region(US/Europe) the documents go so is it possible to point to the right region while inserting/updating the document?
As I know , Cosmos DB global distribution mechanism guarantees consistency of all replica sets.
When you create the distribute cosmos db account, you enable the geo-redundancy.
You will see the regions of read and write separation.
Write operations are completed in write region and replicated to other read regions to ensure consistency. On the client side, there is no need to point to specific region to write data. From perspective of consistency , all region data supposed to be same.
More details, you could refer to this document.
Hope it helps you.
Can you have multiple write regions?
DocumentDB has good build-in features to bring read operations closer to consumers by adding read-regions to your documentDB account. You can read about it in documentation: "How to setup Azure Cosmos DB global distribution using the SQL API".
My understanding based on this is that there is always only 1 write region at any given time. I would not bet my thumbs on it, but it's hinted at in documentation. For example in "Connecting to a preferred region using the SQL API":
The SDK will automatically send all writes to the current write region.
All reads will be sent to the first available region in the PreferredLocations list. If the request fails, the client will fail down the list to the next region, and so on.
What you can do..
Things get more complicated when you also want to distribute writes (espcially if you care about consistency and latency). DocumentDB's own documentation suggest you implement this as a combination of multiple accounts, each of which has its own local write region and automatic distribution to read/fallback node in other regions.
The downside is that your application would have to configure and implement reading from all accounts in your application code and merging the results. Having data well partitioned by geography could help avoid full fan-out at times but your DAL would still have to manage multiple storages internally.
This scenario is explained in more detail in documentation page
"Multi-master globally replicated database architectures with Azure Cosmos DB".
I would seriously consider if adding such complexity would be justified, or if distributing just the reads would suffice.
All regions for a given account have the same replicated data. If you want to separate data across regions, you'd need to split it into two accounts.
Given partition A in the US and partition B in the EU – there is very little difference if A and B were under the same account, or under different accounts… the collection/db/account are all just logical wrappers on top of the partition.

Resources