Azure Elastic Scale / database-per-tenant: how to implement data-dependent routing

Based on the Microsoft Azure Elastic Scale sample apps online, I have been able to create my Shard Map Manager (SMM) and elastic pool databases in Azure. My architecture is a separate database per tenant. I am using Entity Framework in my web application, and a byte[] hash of an alphanumeric customer name as my shard key. The customer name is entered as part of customer login, so I can determine the unique shard key at login time and pass it to the SMM.
My questions are:
1.) Since each tenant has its own database, do I still need to include the hashed customer name/shard Key in each row of the customer tables?
2.) I don't understand where the shard key information gets passed to the SMM during a call to the server. Is it within the context of the entity or does it need to be a part of the query itself? Any sample of this would be greatly appreciated!

You only access the Shard Map Manager database to find the connection string for a particular tenant. Once you have that connection string, you connect to the tenant-specific database; inside that database you don't need to use the shard key at all.
The Elastic Database Tools library has an implementation of data-dependent routing (DDR), but you might find it overkill for a simple database-per-tenant sharding pattern. You can always just query the shard map database (or a custom configuration store) at startup and load a Dictionary<string, string> to hold the CustomerName -> ConnectionString lookup.
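To make question 2 concrete, here is a rough sketch of the DDR call with the Elastic Database client library: the shard key is handed to the shard map when you open the connection, not embedded in your EF queries. The entity, the context class and the map name "TenantShardMap" are placeholders, not names from your solution.

```csharp
using System.Data.Entity;
using System.Data.SqlClient;
using Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement;

public class Customer            // placeholder entity
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class MyTenantContext : DbContext
{
    // contextOwnsConnection: true -> the context disposes the shard connection for us
    public MyTenantContext(SqlConnection shardConnection)
        : base(shardConnection, contextOwnsConnection: true) { }

    public DbSet<Customer> Customers { get; set; }
}

public static class TenantRouting
{
    public static MyTenantContext CreateContext(byte[] shardKey)
    {
        // Connect to the shard map manager (root) database; cache this object in real code.
        ShardMapManager smm = ShardMapManagerFactory.GetSqlShardMapManager(
            "<SMM connection string>", ShardMapManagerLoadPolicy.Lazy);

        ListShardMap<byte[]> map = smm.GetListShardMap<byte[]>("TenantShardMap");

        // This is where the shard key is passed: DDR resolves it to the right
        // tenant database and opens a connection there. The second argument is a
        // credentials-only connection string (no server or database name).
        SqlConnection conn = map.OpenConnectionForKey(
            shardKey, "<user id/password only>", ConnectionOptions.Validate);

        return new MyTenantContext(conn);
    }
}
```

If you go the simpler route instead, the OpenConnectionForKey call is just replaced by a dictionary lookup of the tenant's connection string; either way the key never appears inside your EF queries.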

Related

Regarding Geo-Replication in Azure

I've configured active geo-replication for an Azure SQL DB. I have a primary and a secondary database without a failover policy. I also have an App Service which fetches data using the primary database's connection string. After doing a forced failover, reads work fine most of the time, but my inserts/updates fail.
My question is: do I need to update the connection string in the App Service (to point to the secondary database, which has become the primary after the failover) to make inserts/updates work, or is there another way to make my application work without changing the connection string in my App Service?
Thanks In Advance!!!
P.S - I am new to Azure.
There are two levels of authentication:
server level
database level
If you are using a SQL admin server user (server-level authentication), then you must update the connection string manually in the application after a failover.
If you do not want to update the connection string, handle the situation by creating contained DB users in the primary database, waiting until the next replication or manual failover takes place, and then trying again; in that case authentication works at the database level.
For more information: https://learn.microsoft.com/en-us/sql/relational-databases/security/contained-database-users-making-your-database-portable?view=sql-server-2017
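For reference, a contained user is created with a short piece of T-SQL run against the primary database; a minimal C# sketch of that step (the user name, password and connection string are placeholders) might look like this:

```csharp
using System.Data.SqlClient;

class CreateContainedUser
{
    static void Main()
    {
        // Server-admin connection to the PRIMARY database (placeholder values).
        const string primary =
            "Server=tcp:myserver.database.windows.net;Database=MyDb;" +
            "User ID=serveradmin;Password=<admin password>;Encrypt=True;";

        using (var conn = new SqlConnection(primary))
        {
            conn.Open();
            // A contained user lives inside the database, so it geo-replicates
            // with the database and can authenticate on the secondary without a
            // server-level login existing there.
            const string sql = @"
                CREATE USER [app_user] WITH PASSWORD = '<strong password>';
                ALTER ROLE db_datareader ADD MEMBER [app_user];
                ALTER ROLE db_datawriter ADD MEMBER [app_user];";
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.ExecuteNonQuery();
            }
        }
    }
}
```

The application then connects with the app_user database-level credentials instead of the server admin login.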

Azure Table Storage for housing Application Configuration

I am exploring Azure functionality and am wondering if Azure Table Storage can be an easy way for holding application configuration for an entire environment. It would be easy to see and change (adding list values, etc.). Can someone please guide me on whether this is a good idea? I would expect this table to hold no more than 2000 rows if all our applications were moved over to Azure.
Partition Key --> Project Name + Component Name (Azure Function/Logic App)
Row Key --> Parameter Key
Value column --> Parameter Value
For securing password/keys, I can use the Azure Key Vault.
There are different ways of storing application configuration:
Key Vault (as you stated) for sensitive information, e.g. tokens, keys, connection strings. It can be standardized and extended to any type of resource for ease of storing and retrieving these.
Application Settings, found under each App Service. This approach assumes you have an App Service for each of your apps.
Release pipeline variables, e.g. in Azure DevOps Services (AzDo). AzDo has variables that can be global to the release pipeline or specific to each stage.
I am exploring Azure functionality and am wondering if Azure Table
Storage can be an easy way for holding application configuration for
an entire environment. It would be easy to see and change (adding list
values etc.). Can someone please guide me on whether this is a good
idea?
Considering Azure Tables is a key/value store, it is certainly a good idea to store application configuration values there. The only thing I would recommend is that you incorporate some kind of caching layer between your application and Table Storage, so that you don't end up making calls to Table Storage every time you need to fetch a setting.
I would expect this table to hold no more than 2000 rows if all our
applications were moved over to Azure.
Considering the number of entities is going to be fewer than 2000, querying them should not be a problem, and I think your design is good. For best performance, please ensure that you include both PartitionKey and RowKey when querying; at the very least, include PartitionKey in your query.
Please see this for more details: https://learn.microsoft.com/en-us/azure/cosmos-db/table-storage-design-guide.
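A minimal sketch of both suggestions above (point reads by PartitionKey + RowKey, fronted by an in-memory cache), assuming the Azure.Data.Tables SDK; the "AppConfiguration" table name and "Value" column follow the hypothetical design in the question:

```csharp
using System;
using Azure.Data.Tables;
using Microsoft.Extensions.Caching.Memory;

public class ConfigStore
{
    private readonly TableClient _table;
    private readonly IMemoryCache _cache = new MemoryCache(new MemoryCacheOptions());

    public ConfigStore(string storageConnectionString)
    {
        _table = new TableClient(storageConnectionString, "AppConfiguration");
    }

    public string GetSetting(string projectAndComponent, string parameterKey)
    {
        // Cache so we don't hit Table Storage on every read.
        return _cache.GetOrCreate($"{projectAndComponent}/{parameterKey}", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);

            // Point read: both PartitionKey and RowKey -> the fastest possible query.
            TableEntity entity =
                _table.GetEntity<TableEntity>(projectAndComponent, parameterKey).Value;
            return entity.GetString("Value");
        });
    }
}
```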
For securing password/keys, I can use the Azure Key Vault.
That's the way to go for storing sensitive data in Azure.
Have you looked at the App Configuration service?
There are client libraries in .NET, Java, TypeScript and Python to interact with the service that you can leverage in your application.
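For example, a hedged sketch using the .NET configuration provider (Microsoft.Extensions.Configuration.AzureAppConfiguration); the environment variable and key names are placeholders:

```csharp
using System;
using Microsoft.Extensions.Configuration;

class Program
{
    static void Main()
    {
        // Pull configuration from an App Configuration store at startup.
        IConfiguration config = new ConfigurationBuilder()
            .AddAzureAppConfiguration(
                Environment.GetEnvironmentVariable("APP_CONFIG_CONNECTION_STRING"))
            .Build();

        // Keys can be namespaced per project/component, mirroring the
        // PartitionKey/RowKey idea, e.g. "MyProject:MyFunction:BatchSize".
        string batchSize = config["MyProject:MyFunction:BatchSize"];
        Console.WriteLine(batchSize);
    }
}
```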

How to design a multi-tenant node.js application?

I am currently facing a technology decision and am not able to reach a conclusion myself.
I am developing a multi-tenant database.
The structure would be the following:
There is one core database which stores data and relations about specific tenants
There are multiple tenant database instances (a query against the core database determines which tenant ID I should connect to)
Each tenant is on a separate database instance (on a separate server)
Each tenant has specific data which should not be accessible by any other tenant
Each database would preferably be MySQL (but if there are better options, I am open to suggestions)
The backend is written with the Koa framework
The database models differ between the core database and the tenant databases
Each tenant database's largest table could be around 1 million records (without auditing)
Optimistically, the number of tenants could grow to around 50
Additional data about the project:
All of the project's data is available to the owner
Each client will have access to their own tenant's data
Each tenant will have their own website
The database structure remains the same for each tenant
The project is mainly a logistics service whose data is segregated by region
The question:
Is this the correct approach to design a multi-tenant architecture or should there be a redesign in the architecture?
If multi-tenancy with multiple servers is possible, is there a preferable tool/technology stack? (I would love to know more about this specifically.)
It would be preferable to use an ORM. I am currently trying to use Sequelize but am already facing problems at an early stage (multiple databases can't share the same models, managing multiple connections).
The ideal goal would be the possibility of adding additional tenants without much additional configuration.
EDIT:
- The databases would currently be hosted in Azure, but we'd prefer to keep the option of migrating them away if that becomes a requirement
There are several ways to architect the data layer in a multi-tenant architecture.
It's hard to say which is the better choice, but I will try to help with what I know.
First option:
Segregate your databases across distributed servers; for example, each tenant has its own, totally isolated database server.
This can be good because tenant data is strongly isolated: we can ensure that one tenant never sees another tenant's data.
I see some problems in this case. In terms of cost, it can get expensive because we need a machine for each client and perhaps software licenses, depending on your environment. In terms of DevOps, we will need a complex strategy to create and deploy a new instance for every new tenant.
Second option:
Separate databases: we have one server on which we create a separate database for each tenant.
This is often used if you need to provide isolation for each customer, because we can associate different logins, permissions and so on with each database.
Some other cons: a different connection pool is required per database, updates must be replicated across all the databases, there is no resource sharing (unless you use Elastic Database Pools), you need backup strategies across all the databases, and you still need a complex DevOps strategy to deploy and create new tenants.
Third option:
Separate schemas: a good strategy to implement a multi-tenant architecture. We can share some resources since everything is inside the same database, but the schemas used are different, with a separate schema for each tenant. That allows you to customize a specific tenant without affecting others, and you save costs by only paying for one database.
Some of the cons: you need to replicate all the database objects in every schema, so the number of objects can grow indefinitely; updates must be replicated across all the schemas; the connection pool for the database must maintain a different connection (or set of credentials) per tenant; a different user is required per tenant (which is stored at server level); and you have to back up that user independently.
Fourth option:
Row isolation.
Everything is shared in this option: server, database and schema. All tenants' data lives in the same tables in the same database. The only way rows are differentiated is by a TenantId or some other column that exists at the table level.
Another good point is that you will not need a complex DevOps strategy, and if you are using SQL Server there is a feature called Row-Level Security which ensures a logged-in user only gets the data they have permission to see.
But in this case, if you have thousands of users hitting the database at the same time, you will need some approach to achieve good scalability.
So you need to think about your case and how your system will grow in order to choose the best option.
It seems quite fine to me.
Where I see a bottleneck is having every tenant on a separate DB server or DB instance. It would mean that you need to hold a separate connection pool for every tenant, or create a new connection for every request depending on the tenant. Try using any concept where you can have one DB connection for all the tenants (namespaces, schemas, or just prefixing tenant table names with some tenant-specific prefix).
But if you need to keep the tenants' DBs separate, e.g. because of different backup policies, resource limits, etc., you can't do this and will have to manage a separate connection pool for every tenant. It also depends on how many tenants you will have: tens, or thousands?
I would also suggest caching the tenant -> DB mapping somewhere in the app instead of querying the core database for it every time.

How does Azure handle geo-replication under the hood

We have an Azure web app and a DB we want to replicate all over the world.
So we use Traffic Manager to redirect the user to the closest hosted web app, and with a location setting in the web app it knows which database it should go against.
Now, my question is: since the model is one writable database (the primary) with the replicas being read-only, how do I, or Azure, handle that at the moment of calling the database?
For example, if from my app I am going to add a record to the database, I can't use the nearest DB's connection string; I need to go against the primary one.
Should I handle this, or do I always go against the nearest one even if it's read-only, with Azure handling the write by transferring it to the primary DB?
In the case where I am the one who should manage that, I would have to handle two connection strings, one for the writable primary DB and one for the closest readable DB, and I should split my services, categorized by write/read actions.
And following this scenario, if I have a stored procedure which WRITES AND READS, how would I handle that?
This is a common issue when it comes to using Azure SQL in geo-replication mode. You cannot use traditional LB techniques such as Azure Traffic Manager. In this case, you should be using the retry pattern on your database connections, working from the primary down to the alternate names as required.
AFAIK, there is no easy way to tell, after connecting to a database, whether you are on the primary or a read-only secondary. As per this link, there are some stored procs you can call to understand the topology. You can also discover this using Azure PowerShell or the REST API, but then you would have to build that logic into your application.
In short:
You need to handle your database connections and employ retry patterns, etc.
You should implement CQRS to separate read/write workloads from each other if you want to take advantage of read-only secondaries; see the sketch below.
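A rough sketch of those two points: separate read and write connections (reads go to the nearest replica, writes always go to the primary) plus a naive retry loop. The connection strings and the retry policy are placeholders; in a real app you would use a retry library or EF's built-in connection resiliency.

```csharp
using System;
using System.Data.SqlClient;
using System.Threading;

public class GeoAwareDb
{
    private readonly string _primary;   // writable primary
    private readonly string _nearest;   // closest read-only secondary

    public GeoAwareDb(string primaryConnectionString, string nearestConnectionString)
    {
        _primary = primaryConnectionString;
        _nearest = nearestConnectionString;
    }

    // Reads can be served by the nearest (read-only) replica.
    public T Read<T>(Func<SqlConnection, T> query) => Execute(_nearest, query);

    // Writes must always go to the primary.
    public void Write(Action<SqlConnection> command) =>
        Execute<object>(_primary, conn => { command(conn); return null; });

    private static T Execute<T>(string connectionString, Func<SqlConnection, T> work)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                using (var conn = new SqlConnection(connectionString))
                {
                    conn.Open();
                    return work(conn);
                }
            }
            catch (SqlException) when (attempt < 3)
            {
                Thread.Sleep(TimeSpan.FromSeconds(attempt)); // back off and retry
            }
        }
    }
}
```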
Hope that helps.

SQL Azure Federation Splitting Design and Querying

I have a few questions regarding Microsoft SQL Azure Federations:
1) Can I create a federated DB on an active database, or do I need to deploy federations ahead of time?
2) Do I need to make any changes to my SQL queries to comply with how federations are queried, or can I continue to use my regular queries as if I were working against one SQL Server database?
3) When I split my database and after some time see that one of the shards is very busy and almost full, how do I tackle this problem using federations? Do I need to split only the single federated table that is 90% full, or do I need to recreate the splitting strategy using a narrower range? The problem is that one specific user can be very active, so what strategy do I use to make sure I won't need to recreate the federation strategy because of one very active federated table/user?
4) When I have different tables that I want to split on different primary keys, how will the sharding work then? For example:
From what I understand:
[Blogs]
blog_id
info
[Blog_Posts]
id
blog_id
post_content
So if I decide to shard based on blog_id, say ranges 0-1000 and 1001-2000, I will have two federated tables. But how many more federated tables will I have if I add more tables with keys other than blog_id? Will I have more federated tables?
Thanks
Please be more precise and concrete, and ask one question at a time; you have a better chance of getting an answer to all of your questions when they are asked separately. Now let me try to cover some of them.
1) Can I create a federated DB on an active database or do I need to
deploy federations ahead of time?
You can certainly create a federation (or several) within an existing DB; federations are not limited to new/empty DBs. However, creating a federation in an active DB will do nothing for you by itself. You have to realize that federations are separate DBs: a federation (or federation member) knows nothing about the federation root DB (the DB where you created the federation). So you have to think about migrating schema/data out of the active DB (the federation root) once you create your federation.
2) Do I need to make any changes to the SQL queries to comply with how
I query federations, or can I continue to use my regular queries as if I
were working against one SQL Server database?
Most probably yes. Windows Azure SQL Database Federations is a scale-out mechanism for the DB tier. This means that, just as a web application needs a "special" design to work in a farm-like environment (i.e. a scale-out environment like Windows Azure), a database will also need a "special" design to work in a scale-out environment. There is no magic wand in SQL Azure Federations that will make your existing code work; you have to design it to work.
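As a rough illustration of what that design change looks like (SQL Azure Federations is a retired feature, and the federation and column names here are hypothetical): every connection first has to be routed to the federation member that owns the key, and only then do the "regular" queries run against that member's data.

```csharp
using System.Data.SqlClient;

class FederationRoutingExample
{
    static void QueryPosts(string federationRootConnectionString, long blogId)
    {
        using (var conn = new SqlConnection(federationRootConnectionString))
        {
            conn.Open();

            // Route the connection to the federation member owning this blog_id
            // (value inlined here for simplicity; it is numeric, so no injection risk).
            using (var route = new SqlCommand(
                $"USE FEDERATION Blog_Federation (blog_id = {blogId}) WITH RESET, FILTERING = OFF",
                conn))
            {
                route.ExecuteNonQuery();
            }

            // Only now do the "regular" queries run, against that member's data only.
            using (var cmd = new SqlCommand(
                "SELECT post_content FROM Blog_Posts WHERE blog_id = @id", conn))
            {
                cmd.Parameters.AddWithValue("@id", blogId);
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read()) { /* consume rows */ }
                }
            }
        }
    }
}
```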
3) When I split my database and after some time I see that one of the
shards is very busy and almost full, how do I tackle this problem using
federations? Do I need to split only that single federated table
that is 90% full, or do I need to recreate the splitting strategy by
using a narrower range? The problem is that one specific user
can be very active, so what strategy do I use to make sure that I won't
need to re-create the federation strategy due to one very active
federated table / user?
This is all about partitioning strategy. You have to design your federation key, and how you partition your data across different shards, very carefully. You can always SPLIT any federation, as long as you keep each atomic unit in a single shard.
4) When I have different tables that I want to split with different
primary keys, how will the sharding work then?
If you want to split different tables on different keys, then you will have different federations, each one with its own federation key and its own tables.
A good video worth watching if you are up for SQL Federations: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI408
