One database, supports multi-region application

One database, supports multi-region application - node.js

Is there a way to have one database (MongoDB) that is able to support multi-region applications with minimal latency?

this is a perfect use case for use replica set with 3 members (one per region)
One of them become a master - that means it will receive all writes and propagate them to others.
This also introduce extra layer of safety as data will be in more than one place, so network outage in one area will not stop entire application.
more here

Related

Node Architecture with Multiple Mongo DBs

We have a SaaS platform that allows clients to create new Instances and a separate DB when they sign up. The issue is creating a new instance for every new client is expensive and difficult to maintain.
We have one easy solution -
Merge all DBs into one and re-write the code accordingly (expensive to complete and we want to keep client DBs separate)
Ideally,
We would want to have separate DBs but common application instance as it's easy to maintain and lowers our server costs significantly.
Is there a proper way so that the application runs a single instance but connects with different DBs as per the client logged in? And what would be its performance implications compared to both separate instances and separate DB?
What we Have Already
What we Want Ideally

I think that one instance which connects to a huge number of DBs might not be a good fit (it may quickly become a bottleneck).
Another (maybe better) middle-ground option would be to have multiple multi-tenant instances (each instance is able to process requests from multiple clients).
That way, you spread the inbound load more efficiently (because you'll have less instances than the number of clients, but more than one in order to avoid a bottleneck), but also the outbound traffic will be distributed, because one multi-tenant instance might not have to be connected to all DBs, but to a subset of them (depending on which client is logged-in).
One important vector here is the throughput that you get per tenant. Depending on that, you may decide whether one instance is enough, or you need to add more of them.

Event Sourcing - Event Store

I am trying to understand the DDD / Event-sourcing / CQRS etc.
Lets consider an e-comm application with below Microservices.
order-service
shipping-service
payment-service
Can you clarify these questions?
We can relate domain as a large application and bounded-context as an individual microservice, rt?
Will each bounded-context/Microservice maintain its own event-store? (Basically 1 domain can have multiple event-sotre?)
If it is going to be 1 event-store per domain, who takes the ownership of event-store?

Typically, a (logical) service will have exclusive authority to modify one or more streams.
Whether those streams are all together in a single durable store, or distributed across multiple stores, isn't particularly important so long as the service knows how to find the streams.
Similarly, it's not typically all that important that each service has its own store. Functionally, the important thing is that the different services not write to streams that are outside of their jurisdiction. So long as you can be confident that two services won't be trying to use the same stream identifier, it should be fine.
Note that both of these guides are the same that you would use if your services were writing rows into tables in an RDBMS. Tables don't have to be in the same database, so long as the service knows which database holds which tables. Similarly, two different services can share the same database so long as they don't write into each other's tables.
There are, of course, non functional reasons that you might want the storage for different services to be isolated. For instance, if one service wants to upgrade to a new version of storage, while another needs to lag behind, then it will be a lot more convenient if the services are not sharing a database. Similarly, certain kinds of audits will be more easily satisfied by isolating data storage.
If I go with CQRS for order-service, My question is - who is supposed to consume payment events. command side or read side of order-service?
If your ordering domain dynamics need information from payments, then the command side of ordering will need a copy of the information from payments.
The payments information is an unlocked copy of the data - the authoritative copy of that information in payments may be changing even as we are updating orders.
Assuming you don't want to tightly couple ordering to the domain dynamics of payments, the copy of the payments information used by ordering will normally be a report (aka a "read model") rather than a copy of the entire history.

In an Event-Driven Microservice, how to I update private database with older data

I'm working on a new project, and I am still learning about how to use Microservice/Domain Driven Design.
If the recommended architecture is to have a Database-Per-Service, and use Events to achieve eventual consistency, how does the service's database get initialized with all the data that it needs?
If the events indicating an update to the database occurred before the new service/db was ever designed, do I need to start with a copy of the previous database?
Or should I publish a 'New Service On The Block' event, and allow all the other services to vomit back everything back to me again? Which could be a LOT of chatty-ness, and cause performance issues.

how does the service's database get initialized with all the data that it needs?
It asks for it; which is to say that you design a protocol so that the service that is spinning up can get copies of all of the information that it needs. That often includes tracking checkpoints, and queries that allow you to ask what has happened since some checkpoint.
Think "pull", rather than "push".
Part of the point of "services": designing the right data boundaries. The need to copy a lot of data between services often indicates that the service boundaries need to be reconsidered.

There is a special streaming platform named Apache Kafka, that solves something similar.
With Kafka you would publish events for other services to consume. What makes Kafka special is the fact, that events never (depends on configuration) get deleted and can be consumed again by new services spinning up. This feature can be used for initially populating the database (by setting the offset for a Topic to 0 and therefore re-read the history of events).
There also is another feature, called GlobalKTable what is a TableView of all events for a particular Topic. The GlobalKTable holds the latest value for each key (like primary key) and can be turned into an state-store (RocksDB under the hood), what makes it queryable. This state-store initializes itself whenever the application starts up. So the application does not need to have a database itself, because the state-store would be kept up-to-date automatically (consistency still is a thing to keep in mind). Only for more complex queries that state-store would need to be accompanied with a database (with kafka you would try to pre-compute the results of those queries and make them accessible to a distinct state-store itself).
This would be a complex endeavor, but if it suits your needs it is a fun thing to do!

multi-master in cosmosdb/documentdb

How can I set up multiple write regions in cosmosdb so that I do not need to combine query results of two or more different regions in my application layer? From this documentation, it seems like cosmosdb global distribution is global replication with one writer and multiple read secondarys, not true multi-master. https://learn.microsoft.com/en-us/azure/documentdb/documentdb-multi-region-writers

As of May 2018, Cosmos DB now supports multi-master natively using a combination of CRDT data types and automatic conflict resolution.
Multi-master in Azure Cosmos DB provides high levels of availability
(99.999%), single-digit millisecond latency to write data and
scalability with built-in comprehensive and flexible conflict
resolution support.
Multi-master is composed of multiple master regions that equally
participate in a write-anywhere model (active-active pattern) and it
is used to ensure that data is available at any time where you need
it. Updates made to an individual region are asynchronously propagated
to all other regions (which in turn are master regions in their own).
Azure Cosmos DB regions operating as master regions in a multi-master
configuration automatically work to converge the data of all replicas
and ensure global consistency and data integrity.
Azure Cosmos DB implements the logic for handling conflicting writes
inside the database engine itself. Azure Cosmos DB offers
comprehensive and flexible conflict resolution support by offering
several conflict resolution models, including Automatic (CRDT-
conflict-free replicated data types), Last Write Wins (LWW), and
Custom (Stored Procedure) for automatic conflict resolution. The
conflict resolution models provide correctness and consistency
guarantees and remove the burden from developers to have to think
about consistency, availability, performance, replication latency, and
complex combinations of events under geo-failovers and cross-region
write conflicts.
More details here: https://learn.microsoft.com/en-us/azure/cosmos-db/multi-region-writers
It's currently in preview and might require approval before you can use it:

According to your supplied link, based on my understanding. Multi-master in cosmosdb/documentdb is implemented by multiple documentdbs separately for write regions and read the documents from the combined query. Currently it seems that it is not supported to set up multiple write regions in cosmosdb so that don't need to combine query results of two or more different regions .

The referenced article states how to implement multi-master in Cosmosdb, while explicitly stating that it is not a multi-master database.
There are ways to "simulate" multi-master scenarios by configuring the consistency level (e.g. session) which will allow callers to see their local copy without having it written to the write region. You can find the details of the various levels here: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels.
Aside from that, consider if you truly need multi-master by working with the consistency levels, considering what acceptable latency is, etc. There are few scenarios that can't tolerate latency, particularly when you have adequate tools to provide a user experience that approximates a local write master. There is no such thing as real-time when remote networks are involved ;)

Creating incremental reports using Azure Tables

I need to create incremental reports in the table storage. I need to be able to update the same records from several different worker role instances (different roles with several instances each).
My reports consist mainly of values that I need to increment after I parse the raw data I initially stored.
The optimistic solution I found is to use a retry mechanism: Try to update the record. If you get a 412 result code (you don't have the latest ETAG value), retry. This solution becomes less efficient and more costly the more users you have and the more data you need to update simultaneously (my case exactly).
Another solution that comes to mind is to have only one instance of one worker role that can possibly update any given record. This is very problematic because this means that I will by-design create bottlenecks in my architecture, which is the opposite of the scale I want to reach with Azure.
If anyone here has some best practices in mind for such a use case, I would love to hear it.

Most cloud storages (Table Storage is one of those) do not offer scalable writes on a single entity/blob/whatever. There is no quick-fix for this limitation, as this limitation comes from the core tradeoff that have being made to create cloud storage in the first place.
Basically, a storage unit (entity/blob/whatever) can be updated about once every 20ms, and that's about it. Having a dedicated worker or not will not change anything to this aspect.
Instead, you need to address your task from from a different angle. For counters, the most usual approach is the use of sharded counters (link is for GAE, but you can implement an equivalent behavior on Azure).
Also, another way to ease the pain to go for an asynchronous architecture ala CQRS where the performance constraints you put on the update latency of entities is significantly relaxed.

I believe the approach needs re-architecture. In order to ensure scalability and limit amount of contention, you want to make sure that every write can work optimistically by providing unique Table/PartitionKey/RowKey
If you need those values for reports to be merged together, have a separate process/worker that will post-aggregated/merge the records for reporting purposes. You can use a queue or a timing mechanism to start aggregation/merging

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string