Cosmos DB selective regional replication - azure

We are planning to use cosmos db single master deployment where all master data are maintained from a single region. The application is spread across various regions and we need to provide read access to the individual regions. However we would like to have filtered replication as not all regions will be interested in all data in cosmos DB. Is there any way to use selective region specific replication? I am aware that we could use Cosmos DB trigger and then have function app etc to replicate traffic but that is an overhead in terms of maintenance and monitoring. Hence would be interested to know if we can make use of any native functionality.

The built-in geo-replication mechanism is completely transparent to you. You can't see it and you can't do anything about it. There is no way to do what you described without writing something custom.
If you really want to have selected data replicated then you would need to do the following (It's a terrible solution and you should NOT go with it):
Create a main source of truth Cosmos DB account. That's "single master" that you described.
Create a few other accounts in whichever region you want.
Use a Cosmos DB trigger Azure Function or the Change Feed Processor library to listen to changes on the main account and then use your filtering logic to replicate them into the other accounts that need to use them.
Use a different connection string per application based on it's deployment environment
What's wrong with just having your data replicated across all regions though? There are no drawbacks.

Related

Azure SQL Databases with bi-directional replication having read/write on all database instances

We are looking to try to implement the following in Azure SQL Server / databases. Our solution we provide has the following resources:
2 azure app services
database backend in Azure SQL Server with SQL Databases within an elastic pool
Goal:
We would like to have the above resources in the West and in the UK, so basically complete solution in each area of the globe listed
Have the databases be able to be read/write in each region we setup the solution while having bi-directional replication
(Not so important right now) ultimately, we would have azure front door in front of this to direct users based on their location where they get directed to. Obvious reason we need the databases to replicate to each other in order to ensure if a user is traveling, they get their tenants data as expected no matter where they log in from.
What we looked at so far:
Azure SQL Geo Replication will not do what we need as the replicas are read only which means we would have to have the Azure App Service in the UK or West point to the SQL server databases in the US East 2 region. We attempted that once and it was super slow but thats expected I would think.
Azure Data Sync, this has some caveats and issues which were that certain types of data do not replicate, certain tables are not replicable, if we add tables theres an added complexity with that.
Side Note: I tried setting this up just with the azure sample database and there we even tables in that you could not data sync.
I cant seem to find a solution that literally mirrors the databases without stipulations or caveats that require database changes on our end or some complexities being added.
I think David's response to this thread applies well here as well. I believe there's only Cosmos DB that gives you a multi-master feature "without stipulations or caveats that require database changes on our end or some complexities being added".

Convert CosmosDB Serverless to Provisioned throughput DB

I am getting ready to create a brand new mobile application that communicates with CosmosDB and I will probably go the serverless way. The serverless way has some little disadvantages compared to the provisioned throughput (eg. only 50GB per container, no Geo-Redundancy, no Multi-region Writes, etc.).
If I need later on to convert my DB to a provisioned throughput one, can I do it somehow?
I know that I can probably use the change-feed and from that (I guess) recreate a new DB from it (provisioned throughput one) but this might open the Pandora's box especially while a mobile app connects to a specific DB.
As Gaurav mentioned, there is no way to change to Provisioned from Serverless plan once you create an account.
You will need to recreate the account with Serverless as type and follow the below ways to migrate the data,
(i) Data Migration Tool - You can easily migrate from one account to another
(ii) ChangeFeed and Restore - push the changes to the new the instance of Azure Cosmos DB
Once you are synced switch back to the new one.
Based on the documentation available here: https://learn.microsoft.com/en-us/azure/cosmos-db/serverless#using-serverless-resources, it is currently not possible to change a Cosmos DB server less account to provisioned throughput.

How do I achieve data span across multiple regions (not replication) with Azure SQL

I have a single Azure SQL Server and a single database in it. I want a solution to store specific records of selected tables in this database in different regions.
as an example, I have a users table with all PII data in it. these users can be from anywhere from the world. but i would want to store user records who are from EU region to be stored only in EU region.
To add it - i want all the other table records related to a specific user as well to get stored in that user's region.
from application perspective, i would be able to query across all users and all related tables to have dashboard data for the global users.
Any pointers to solve this scenario would be helpful for me.
Another approach could be sharding the database. Use horizontal sharding to store the rows for each country/region in a separate database in that country/region. The Elastic Database Client library will use a shardmap do most of the sharding work for you (assuming you are using .NET). You can use the country code in your shardmap to split regional data.
Reference Architecture: https://learn.microsoft.com/en-us/azure/architecture/patterns/sharding
Elastic Database Client: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-elastic-database-client-library
Here is one approach... When your user/tenant registers for your service they will need to pick where their data should reside. This is referred to as data residency. Then on subsequent requests to read or write data your application's repository layer needs to be aware of who the request is executing as so it can lookup the appropriate connection string and connect to that database to retrieve/write the data.
The routing data can be replicated to multiple regions and/or housed in a single location as it would not contain PII. The Azure Web App can be single region hosted (as depicted in the image below) or it can be replicated to multiple regions and traffic routed to it via a global traffic manager.
This approach supports the case where an European user picks to have their data reside in France but happens to be visiting the united states.
This picture shows how this might look. A guy named Barry Luijbregts has a nice pluralsight video that delves into this approach. https://www.pluralsight.com/courses/azure-paas-building-global-app
Good luck!

Azure Storage Account for Tables

So first of all I'd like to say I'm no DBA nor coder, I'm just a regular IT person that works as support for network and infrastructure, however, I like to get familiar with technologies in general and understand the basics of it, let's say how they work, implemented with no additional specific details.
I've been reading about Azure Storage Accounts in regards to tables. As IT, I had to implement simple file shares via SMB 3.0 in order to have them mapped on our network, I've come across other options such as blobs, tables and queues. I've read about them however I'm trying to get the main functionality of tables for a coder.
Correct me if I am wrong, when you code an app with a database, you can put the database on same/different server, and that can be on premise or on the cloud and you kind of link both together.
And as far as Im concerned and what I was able to find out investigating on the web, these tables are NoSQL and no constraints, you create the tables and data through Visual Studio thanks to an API, then that information is reflect on your storage.
How is this is useful when using it for the app you're developing?
I've been reading about Azure Storage Accounts in regards to tables. As IT, I had to implement simple file shares via SMB 3.0 in order to have them mapped on our network, I've come across other options such as blobs, tables and queues. I've read about them however I'm trying to get the main functionality of tables for a coder.
And as far as Im concerned and what I was able to find out investigating on the web, these tables are NoSQL and no constraints, you create the tables and data through Visual Studio thanks to an API, then that information is reflect on your storage.
Azure Storage Accounts is a "box" to keep your Blobs, Tables, Queues, Files organised from the management point of view and for the access control. Each storage type is good for it's specific tasks.
If the world would have just one super storage which will solve all our possible cases for storing, querying and managing the data then there would not be such variety of different databases, storage types etc. available.
If you need to share the files as a "network folder" - try Azure Files.
If your coders need a database storage, then the first question would be what are the requirements to the database do they have? What is the purpose of that database would be, etc. Azure, particularly, has a lot of different database solutions, and again, each of them good for some specific task, and can be not a good choice for other tasks.
As to Azure Tables, from the official docs:
Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schemaless design.
So, if your coders do need to store such data, then yes, that would be one of the possible choices.
Correct me if I am wrong, when you code an app with a database, you can put the database on same/different server, and that can be on premise or on the cloud and you kind of link both together.
Correct. But also you can have your own server with the database which you need to manage yourself, or you can choose some cloud service which will provide the database for you but will keep the underlying server and other maintenance activity managed for you, so you no need to worry/spend your time on that.
How is this is useful when using it for the app you're developing?
It is important to understand what your requirements are for data storage in order to pick a proper one. This question perhaps should be addressed not to you, but to your coders, who are building the app and can consolidate their requirements to the database store. Usually, they will tell you exactly what they need, and you may give them some ideas or advice of the alternatives, if any (That may be a similar solution with extra functionality or the way how the data is stored or processed, or have more built in integrations that may be important for you, or a decision whether keep own installation or use cloud managed service)
For your further possible question about When should I use a NoSQL database instead of a relational database? Is it okay to use both on the same site? see this thread
Update based on further questions:
If I develop an application with a database whose tables are on Azure, can I call let's say functions or data from it to my main application that is hosted on premise? What's the benefit of doing that versus hosting the tables on premise other than it's largely scalable and highly available?
Perhaps you need to better understand the relationship between App (Application) and DB (Database). The Database is a standalone system, which store the data, reply to the incoming queries (receive request, process it, return the result). In overall to the DB is not important who is requesting the data. It is a "passive" system. (There are some cases when DB can trigger further processes in data processing pipelines, but that is beyond this scope).
The App in opposite is an active system in App<->DB relationship. (Also leave behind more advanced designs where App is not just a 1 system). App receive requests, process them (may do external requests to other "services" if that is necessary), give a response (with or without data) to the requester. In App<->DB relationship the external requests is what happening. At some point App need some data from the DB, so App make a request to the DB, obtain the response and continue its own logic.
Where App server and DB server are placed is not that important (for simplicity). The important part is whether DB server is accessable for the requests. DB can be on-prem with public static IP address, it can be in cloud on your own server which has public static IP address (sometimes that is archived in different ways but we skip that for simplicity), that can be a Database as a Service cloud solution, where you do not need to have a server and configure the database, but have a url endpoint which you need to use to query the DB.
I appreciate the answer, and I pretty much agree with what you're saying.
But my questions goes beyond what the requirements are for the developers.
I'll modify the question. If I develop an application with a database whose tables are on Azure, can I call let's say functions or data from it to my main application that is hosted on premise? What's the benefit of doing that versus hosting the tables on premise other than it's largely scalable and highly available?
Azure Storage Tables are the "Notepad" of NoSQL Databases. If you want quick and easy key/value pairs, tables is the way to go. If you are looking for the "Word" of NoSQL in Azure then Cosmos DB is where it's at. Cosmos DB offers global distrobution, better features and better SLA (see comparison). Tables are cheaper too.
Azure also supports MySQL, PostGreSQL, MariaDB and MSSQL as PaaS offerings if you wish to use a traditional database.

How does azure handles geo-replication under the hood

We have an azure web app & a db we want to replicate all over the world.
So, we use Traffic manager to redirect the User to the closest hosted Web app , and with a location setting in the web app, It knows to which database it should go against.
Now, my question is , as the mode is One database Writeable (Primary) and the replicas being read only , how do me or azure handle that at the moment of calling the database?
For example, if from my app I am going to Add a record to database, I cant use the nearest DB connection string, I need to go against the Primary one.
Should I handle this? or I will go always against the nearest one even if its read-only an azure will handle the write transferring it to the primary db ?
In the case I am the one that should manage that, then I should handle 2 connection strings, one for the primary DB writeable, and one for the closest db readable, and I should split my services , categorized by write/read actions
and following this scenario, if I have a Store procedure which WIRTES AND READS, how would I handle that?
This is a common issue when it comes to using Azure SQL in geo-replication mode. You cannot use traditional LB techniques such as Azure Traffic Manager. In this case, you should be using the retry pattern on your database connections, working from the primary down to the alternate names as required.
AFAIK, there is no easy way to tell, after connected to a database, if you are on a primary or a read-only secondary. As per this link there are some stored procs you can call to understand the topology. You can understand this using Azure PS/API, but then you would have to build that logic in to your application.
In short:
You need to handle your database connections and employ retry
patterns,etc
You should implement CQRS to separate read/write workloads from
each other if you want to take advantage of read-only secondaries
Hope that helps.

Resources