I have an ASP.NET MVC 2 Azure application that I am trying to switch from being single tenant to multi-tenant. I have been reviewing many blogs and posts and questions here on Stack Overflow, but am still trying to wrap my head around the specifics of what's right for this particular app.
Currently the application stores some information in a SQL Azure database, as well as some other info in an Azure Storage account. I'm considering writing the tenant provisioning code to simply create a new database for each new tenant, along with a new Azure Storage account. This brings me to the following question:
How will I go about testing this approach locally? As far as I can tell, the local Azure Storage Emulator only has one storage account, and I'm not sure I can create others locally. Will I be able to test this locally at all?
There are many aspects to consider with multitenancy, one of which is data architecture. You also have billing, performance, security and so forth.
Regarding data architecture, let's first explore SQL storage. You have the following options available to you:

- add a CustomerID (or other identifier) column that your code uses to filter records
- use a different schema container per customer (each customer has its own copy of all the database objects, owned by a dedicated schema in a shared database)
- linear sharding, in which each customer has its own database
- Federations, a feature of SQL Azure that offers progressive sharding based on performance and scalability needs

All of these options are valid, but they have different implications for performance, scalability, security, maintenance (such as backups), cost and, of course, database design. I couldn't tell you which one to choose based on the information you provided; some models are easier to implement than others if you already have a code base. Generally speaking, a linear shard is the simplest model and provides strong customer isolation, but it is perhaps the most expensive of all. Schema-based separation is not too hard, but it requires a good handle on security requirements and can introduce cross-customer performance issues, because the approach is not shared-nothing for customers in the same database. Finally, Federations requires the use of a customer identifier and has a few limitations; however, it gives you more control over performance distribution and long-term scalability (because, like a linear shard, Federations uses a shared-nothing architecture).
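For the first option, the tenant filter is typically enforced centrally in a data-access layer so that no query can run unscoped. A minimal sketch of the idea (the table and column names are made up for illustration):

```csharp
using System.Data.SqlClient;

public class OrderRepository
{
    private readonly string _connectionString;
    private readonly int _customerId; // resolved once per request from the tenant context

    public OrderRepository(string connectionString, int customerId)
    {
        _connectionString = connectionString;
        _customerId = customerId;
    }

    public int CountOrders()
    {
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(
            "SELECT COUNT(*) FROM Orders WHERE CustomerID = @customerId", conn))
        {
            // Every query is parameterized on the tenant identifier, so one
            // tenant can never read another tenant's rows.
            cmd.Parameters.AddWithValue("@customerId", _customerId);
            conn.Open();
            return (int)cmd.ExecuteScalar();
        }
    }
}
```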
Regarding storage accounts, using a separate storage account per customer is definitely the way to go. The primary issue you will face if you don't is performance limits, such as the maximum number of transactions per second that can be executed against a single storage account. As you point out, however, testing locally may be a problem. Consider this, though: the local emulator does not offer 100% parity with a real Azure Storage Account (some functions are not supported in the emulator), so I would only use the local emulator for initial development and troubleshooting. Any serious testing, including multitenant testing, should be done against real storage accounts. That is the only way you can fully test an application.
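On the local-testing point, one pragmatic pattern is to resolve the storage account per tenant behind a small abstraction, and have development builds fall back to the emulator's single well-known account, separating tenants by naming convention instead. A rough sketch with the classic storage client, where GetTenantConnectionString stands in for your own provisioning lookup:

```csharp
using Microsoft.WindowsAzure.Storage;

public static class TenantStorage
{
    public static CloudStorageAccount GetAccount(string tenantId, bool isDevelopment)
    {
        if (isDevelopment)
        {
            // The emulator exposes only the single well-known dev account, so
            // locally all tenants share it and are separated by name prefixes.
            return CloudStorageAccount.DevelopmentStorageAccount;
        }

        // In the cloud, each tenant gets the account created for it during
        // provisioning (GetTenantConnectionString is a stand-in for your lookup).
        return CloudStorageAccount.Parse(GetTenantConnectionString(tenantId));
    }

    private static string GetTenantConnectionString(string tenantId)
    {
        // e.g. read from the core tenant database or configuration
        throw new System.NotImplementedException();
    }
}
```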
You should consider not creating separate databases, but instead creating different object namespaces within a single SQL database. Each tenant can have their own set of tables.
Depending on how you are using storage, you can create separate storage containers or message queues per client.
Given these constraints, you should be able to test locally with the storage emulator and a local SQL instance.
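For illustration, with the classic storage client that could look like the following; the naming scheme is made up, and container/queue names must be lowercase:

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

public static class TenantProvisioning
{
    public static void ProvisionTenant(CloudStorageAccount account, string tenantId)
    {
        // One blob container and one queue per tenant, e.g. "tenant-contoso-files".
        CloudBlobContainer container = account.CreateCloudBlobClient()
            .GetContainerReference("tenant-" + tenantId.ToLowerInvariant() + "-files");
        container.CreateIfNotExists();

        CloudQueue queue = account.CreateCloudQueueClient()
            .GetQueueReference("tenant-" + tenantId.ToLowerInvariant() + "-jobs");
        queue.CreateIfNotExists();
    }
}
```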
Please let me know if you need further explanation.
Short Version
We want to migrate to v4, and our app names are fewer than 32 characters.
Should we migrate to dedicated Storage Accounts or not?
Long Version
We use Azure Functions v3. From the start, one Storage Account has been shared between 10+ Azure Function Apps. It may have been luck, but the names are all fewer than 32 characters, and that is not going to change. We are not using slots, as they were initially not recommended and were then made generally available without any adoption period or updated recommendation.
Pre-question research turned up this question, but it seems more related to Durable Functions. Another question looks more on point, but it is outdated, and the accepted answer states that one Storage Account can be used.
Firstly, the official documentation has a page with storage considerations, and it states (props to ijabit for pointing to it):
It's possible for multiple function apps to share the same storage account without any issues. For example, in Visual Studio you can develop multiple apps using the Azure Storage Emulator. In this case, the emulator acts like a single storage account. The same storage account used by your function app can also be used to store your application data. However, this approach isn't always a good idea in a production environment.
Unfortunately it does not elaborate further on the rationale behind the last sentence.
The page with best practices for Azure Functions mentions:
To improve performance in production, use a separate storage account for each function app. This is especially true with Durable Functions and Event Hub triggered functions.
Adding to my confusion, there used to be a subsection on that page saying "Avoid sharing storage accounts", but it was later removed.
That issue is only superficially related to this question, though the recommendation is mentioned in the thread.
Secondly, we had contacted Azure Support about other issues unrelated to this question, and two different support engineers shared different opinions on this one: one said that we can share a Storage Account among Function Apps, and the other said that we should not. So the recommendation from support was mixed.
Thirdly, we want to migrate to v4, and the migration notes state:
Function apps that share storage accounts will fail to start if their computed hostnames are the same. Use a separate storage account for each function app. (#2049)
Digging deeper into the topic, the only hard issue is the collision of the function host names used to obtain the lock, which was known as early as Oct 2017. One can follow the thread and see how in Jan 2020 a recommendation was made to update the official Azure naming guidance, but that only happened in late Nov 2021. I also see that a non-intrusive solution, i.e. one without renaming, is to set the host id manually. The two arguments raised by balag0 are the single point of failure and better isolation. They sound good from the perspective of a cleaner architecture, but pragmatically I personally find Storage Accounts reliable, especially after reading about redundancy or considering that MS dog-foods them for other services. So to me a Storage Account looks more like a backbone of Azure.
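For reference, the manual host id override is an app setting; locally it would look something like this in local.settings.json (the id value is just an example and must be unique among the apps sharing the account, up to 32 characters):

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "<shared storage account connection string>",
    "AzureFunctionsWebHost__hostId": "myapp-prod-eastus-01"
  }
}
```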
Finally, as we want to migrate to v4, should we migrate to dedicated Storage Accounts or not?
For the large project with 30+ Azure Functions that I work on, we went with dedicated Storage Accounts. The reason is the Azure Storage account service limits. As the docs mention, this really comes into play with Durable Functions, but it can also come into play in other high-volume scenarios. There's a hard limit of 20k requests per second for a Storage Account. Hit that limit, and requests fail with HTTP 429 responses, which means your Azure Function invocations fail too. We run some high-volume scenarios and have hit this limit.
Sharing an account can also cause problems with Durable Functions if two function apps use the same task hub name in host.json. That causes a collision when the Durable Task Framework does its internal bookkeeping in Storage Queues and Table Storage, and there's lots of pain and agony as things fail in spectacular fashion.
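If apps do share an account, giving each app a distinct task hub name in host.json at least avoids that particular collision (the hub name here is illustrative):

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "hubName": "OrdersAppHub"
    }
  }
}
```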
Note that the 20k requests per second service limit can be raised with a support ticket to Azure. If approved, the max they'll raise it to is 50k requests/second.
So avoid the potential headaches and go with a Storage Account per Function App.
I have managed to get the C# and database setup working using ListMappings. However, when I try to deploy the split/merge tool to Azure Cloud Services (classic), it states: 'The requested VM tier is currently not available in East US for this subscription. Please try another tier or deploy to a different location.' We tried a few other regions with the same result. Do you know if there is a workaround or an updated version? Is the split/merge service even still relevant? Has anyone got this service to run on Azure lately?
https://learn.microsoft.com/en-us/azure/azure-sql/database/elastic-scale-overview-split-and-merge
The answer to the question of whether it is still relevant is, in my opinion, no. Split/merge is no longer relevant given the maturation of elastic pools. Elastic pools with one database per tenant seem to be the sustainable way to implement multi-tenancy with legacy code. Our initial plan was to add tenant keys to each of our tables and host multiple tenants per database; elastic pools give us the same flexibility without having to make breaking changes to our existing code.
Late post here, but we are implementing Elastic Scale for a client to split ~50 clients into a database-per-tenant model. I don't think the split/merge tool will be used over the long term, just for the initial data migration from one database to many shards, but it has been handy for that purpose. We are using the Elastic Scale SDK so that a single API can route queries to the appropriate shard(s) based on the sharding key. Happy to compare notes with you if you are still working on this.
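For anyone comparing notes, the data-dependent routing piece of the Elastic Scale (Elastic Database) client library looks roughly like this; the shard map name and key type are whatever you registered when setting up the shard map:

```csharp
using Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement;
using System.Data.SqlClient;

public class ShardRouter
{
    private readonly ListShardMap<int> _shardMap;

    public ShardRouter(string shardMapManagerConnectionString, string shardMapName)
    {
        ShardMapManager smm = ShardMapManagerFactory.GetSqlShardMapManager(
            shardMapManagerConnectionString, ShardMapManagerLoadPolicy.Lazy);
        _shardMap = smm.GetListShardMap<int>(shardMapName);
    }

    public SqlConnection OpenConnectionForTenant(int tenantId, string userCredentials)
    {
        // The library resolves which shard holds this key, validates the
        // mapping against its cache, and hands back an open connection.
        return _shardMap.OpenConnectionForKey(
            tenantId, userCredentials, ConnectionOptions.Validate);
    }
}
```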
First of all, I'd like to say I'm no DBA or coder; I'm just a regular IT person who works in network and infrastructure support. However, I like to get familiar with technologies in general and understand the basics of how they work, without the implementation-specific details.
I've been reading about Azure Storage Accounts with regard to tables. On the IT side, I had to implement simple file shares via SMB 3.0 so they could be mapped on our network, and along the way I came across the other options: blobs, tables and queues. I've read about them, but I'm trying to understand what tables mainly offer to a coder.
Correct me if I am wrong: when you code an app with a database, you can put the database on the same or a different server, which can be on-premises or in the cloud, and you link the two together.
As far as I can tell from investigating on the web, these tables are NoSQL and have no constraints; you create the tables and data through Visual Studio thanks to an API, and that information is then reflected in your storage.
How is this useful for the app you're developing?
I've been reading about Azure Storage Accounts with regard to tables. On the IT side, I had to implement simple file shares via SMB 3.0 so they could be mapped on our network, and along the way I came across the other options: blobs, tables and queues. I've read about them, but I'm trying to understand what tables mainly offer to a coder.
As far as I can tell from investigating on the web, these tables are NoSQL and have no constraints; you create the tables and data through Visual Studio thanks to an API, and that information is then reflected in your storage.
An Azure Storage Account is a "box" that keeps your Blobs, Tables, Queues and Files organised from a management point of view and for access control. Each storage type is good for its specific tasks.
If the world had just one super-storage that solved all our possible cases for storing, querying and managing data, there would not be such a variety of databases, storage types, etc. available.
If you need to share the files as a "network folder" - try Azure Files.
If your coders need database storage, the first question would be: what requirements do they have for the database? What would the purpose of that database be, and so on. Azure in particular has a lot of different database solutions and, again, each of them is good for some specific tasks and can be a poor choice for others.
As to Azure Tables, from the official docs:
Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schemaless design.
So, if your coders do need to store such data, then yes, that would be one of the possible choices.
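To make that concrete, here is a tiny sketch with the current Azure.Data.Tables client (the names and values are made up, and the connection string points at the local emulator):

```csharp
using Azure.Data.Tables;

string connectionString = "UseDevelopmentStorage=true"; // local emulator/Azurite
var client = new TableClient(connectionString, "hotels");
client.CreateIfNotExists();

// PartitionKey + RowKey form the unique key; everything else is just
// loosely typed attributes, with no schema enforced by the service.
var entity = new TableEntity(partitionKey: "hotels-eu", rowKey: "hotel-042")
{
    ["Name"] = "Grand Budapest",
    ["Rating"] = 4.8
};
client.AddEntity(entity);

TableEntity fetched = client.GetEntity<TableEntity>("hotels-eu", "hotel-042");
```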
Correct me if I am wrong: when you code an app with a database, you can put the database on the same or a different server, which can be on-premises or in the cloud, and you link the two together.
Correct. But note that you can either have your own server with the database, which you need to manage yourself, or choose a cloud service that provides the database for you and keeps the underlying server and other maintenance activities managed for you, so you don't need to worry about or spend your time on that.
How is this useful for the app you're developing?
It is important to understand what your requirements are for data storage in order to pick the proper one. This question should perhaps be addressed not to you but to your coders, who are building the app and can consolidate their requirements for the data store. Usually they will tell you exactly what they need, and you may give them ideas or advice about alternatives, if any: a similar solution with extra functionality, a different way the data is stored or processed, more built-in integrations that may be important to you, or a decision on whether to keep your own installation or use a cloud-managed service.
For your further possible question, "When should I use a NoSQL database instead of a relational database? Is it okay to use both on the same site?", see this thread.
Update based on further questions:
If I develop an application with a database whose tables are on Azure, can I call, say, functions or data from it into my main application hosted on-premises? What's the benefit of doing that versus hosting the tables on-premises, other than it being massively scalable and highly available?
Perhaps you need to better understand the relationship between the App (Application) and the DB (Database). The database is a standalone system which stores data and replies to incoming queries (receives a request, processes it, returns the result). Overall, it does not matter to the DB who is requesting the data; it is a "passive" system. (There are cases where a DB can trigger further processes in data-processing pipelines, but that is beyond this scope.)
The App, by contrast, is the active system in the App<->DB relationship. (Let's also leave aside more advanced designs where the App is not just one system.) The App receives requests, processes them (making external requests to other services where necessary), and gives a response (with or without data) to the requester. The DB query is one of those external requests: at some point the App needs some data from the DB, so it makes a request to the DB, obtains the response and continues its own logic.
Where the App server and the DB server are placed is not that important (keeping it simple). The important part is whether the DB server is accessible to requests. The DB can be on-prem with a public static IP address; it can be in the cloud on your own server with a public static IP address (sometimes this is achieved in different ways, but let's skip that for simplicity); or it can be a Database-as-a-Service cloud solution, where you do not need to run a server or configure the database yourself, but instead get a URL endpoint to query the DB against.
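So, to the on-prem question directly: yes, an on-premises app can query an Azure-hosted database the same way it queries a local one; only the endpoint in the connection string changes. A rough sketch (the server name and credentials are placeholders, and the Azure SQL firewall must allow your outbound IP):

```csharp
using System;
using System.Data.SqlClient;

string connectionString =
    "Server=tcp:myserver.database.windows.net,1433;" +
    "Database=mydb;User ID=appuser;Password=...;Encrypt=True;";

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT TOP 10 Name FROM Products", conn))
{
    conn.Open(); // same ADO.NET code whether the server is on-prem or in Azure
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            Console.WriteLine(reader.GetString(0));
        }
    }
}
```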
I appreciate the answer, and I pretty much agree with what you're saying.
But my question goes beyond what the requirements are for the developers.
I'll modify the question. If I develop an application with a database whose tables are on Azure, can I call, say, functions or data from it into my main application hosted on-premises? And what's the benefit of doing that versus hosting the tables on-premises, other than it being massively scalable and highly available?
Azure Storage Tables are the "Notepad" of NoSQL databases. If you want quick and easy key/value pairs, Tables is the way to go. If you are looking for the "Word" of NoSQL in Azure, then Cosmos DB is where it's at. Cosmos DB offers global distribution, better features and a better SLA (see the comparison). Tables are cheaper, too.
Azure also supports MySQL, PostgreSQL, MariaDB and SQL Server as PaaS offerings if you wish to use a traditional relational database.
I am currently facing a technology decision and am not able to find the solution myself.
I am in the process of developing a multi-tenant database setup.
The structure would be the following:
- There is one core database which stores data and relationships for the specific tenants
- There are multiple tenant database instances (a query against the core database determines which tenant database I should connect to)
- Each tenant is on a separate database instance (on a separate server)
- Each tenant has specific data which should not be accessible to any of the other tenants
- Each database would preferably be MySQL (but if there are better options, I am open to suggestions)
- The backend is written with the koa framework
- The database models differ between the core database and the tenant databases
- Each tenant database's largest table could be around 1 million records (without auditing)
- Optimistically, the number of tenants could grow to 50
Additional data about the project:
- All of the project's data is available to the owner
- Each client will have access to their own tenant's data
- Each tenant will have their own website
- The database structure is the same for each tenant
- The project is mainly a logistics service whose data is segregated per region
The question:
Is this the correct approach to designing a multi-tenant architecture, or should the architecture be redesigned?
If multi-tenancy across multiple servers is possible, is there a preferable tool/technology stack it should be built with? (I would love to know more specifics on this.)
It would be preferred to use an ORM. I am currently trying to use Sequelize, but I am already facing problems at an early stage (multiple databases can't share the same models; managing multiple connections).
The ideal goal would be the possibility of adding additional tenants without much additional configuration.
EDIT:
- The databases would currently be hosted in Azure, but we'd prefer to keep the option of migrating them away if that becomes a requirement
There are several ways to architect the data structure in a multi-tenant system.
It's hard to say which is the best choice, but I will try to help with what I know.
First option:
Segregate your database across distributed servers, so that each tenant has its own database server, totally isolated.
This can be good because tenant data gets strong security; we can ensure that one tenant never sees another tenant's data.
I see some problems with this case. Thinking about cost, it can increase a lot, because we need a machine for each client and perhaps software licences, depending on your environment. Thinking about devops, we will need a complex strategy to create and deploy a new instance for every new tenant.
Second option:
Separate databases: we have one server on which we create a separate database for each tenant.
This is often used when you need to provide isolation for each customer, because we can associate different logins, permissions and so on with each database.
Some other cons: a different connection pool is required per database; updates must be replicated across all the databases; there is no resource sharing (unless you use Elastic Database Pools); you need multiple backup strategies across all the databases; and you need a complex devops strategy to deploy and create new tenants.
Third option:
Separate schemas: a good strategy for implementing a multi-tenant architecture. We can share some resources, since everything is inside the same database, but the schemas used are different, with a separate schema for each tenant. That even allows you to customize a specific tenant without affecting the others, and you save costs by paying for only one database.
Some of the cons: you need to replicate all the database objects in every schema, so the number of objects can grow indefinitely; updates must be replicated across all the schemas; the connection pool for the database must maintain a different connection (or set of credentials) per tenant; a different user is required per tenant (stored at the server level); and you have to back up that user independently.
Fourth option:
Row isolation.
Everything is shared in this option: server, database and schema. All data for the tenants lives in the same tables in the same database; the only way rows are differentiated is by a TenantId or some other column that exists at the table level.
Another good point is that you will not need a complex devops strategy, and if you are using SQL Server there is a feature called Row-Level Security that ensures you get back only the data the logged-in user has permission to see (a sketch of how the app side plugs into it follows below).
But in this case, if you have thousands of users hitting the database at the same time, you will need some approach for good scalability.
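Here is a sketch of how that Row-Level Security option typically plugs in from application code: the app stamps the tenant on the connection via SESSION_CONTEXT, and a server-side security policy (not shown) filters every query by it. The key name is illustrative:

```csharp
using System.Data.SqlClient;

public static class TenantConnections
{
    public static SqlConnection OpenTenantConnection(string connectionString, int tenantId)
    {
        var conn = new SqlConnection(connectionString);
        conn.Open();

        // The RLS filter predicate on the server reads this value, so every
        // subsequent query on this connection is scoped to the tenant.
        using (var cmd = new SqlCommand(
            "EXEC sp_set_session_context @key = N'TenantId', @value = @tid", conn))
        {
            cmd.Parameters.AddWithValue("@tid", tenantId);
            cmd.ExecuteNonQuery();
        }
        return conn;
    }
}
```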
So think about your case and how your system will grow, in order to choose the best option.
It seems quite fine to me.
Where I see a bottleneck is in having every tenant on a separate DB server or DB instance. It means you need to hold a separate connection pool for every tenant, or create a new connection for every request depending on the tenant. Try using any concept where you can have one DB connection pool for all the tenants (namespaces, schemas, or just prefixing tenant table names with some tenant-specific prefix).
But if you need the tenants' DBs to be separate, e.g. because of different backup policies, resource limits, etc., you can't do this and will have to manage a separate connection pool for every tenant. It also depends on how many tenants you will have: tens? thousands?
I would also suggest caching the tenant->DB mapping somewhere in the app instead of querying it from the core database every time.
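The mapping cache can be as simple as a lazily filled map in front of the core database. Sketched here in C# for consistency with the rest of this page, though the same shape works fine in a koa app; the core-DB lookup is a stand-in for your own query:

```csharp
using System;
using System.Collections.Concurrent;

public class TenantDirectory
{
    private readonly ConcurrentDictionary<string, string> _cache =
        new ConcurrentDictionary<string, string>();
    private readonly Func<string, string> _lookupFromCoreDb;

    public TenantDirectory(Func<string, string> lookupFromCoreDb)
    {
        _lookupFromCoreDb = lookupFromCoreDb;
    }

    public string GetConnectionString(string tenantId)
    {
        // Hit the core database only on the first request per tenant;
        // afterwards the mapping is served from memory. Add expiry if
        // mappings can change at runtime.
        return _cache.GetOrAdd(tenantId, _lookupFromCoreDb);
    }
}
```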
If I write an Azure add-on, can it access the WADPerformanceCountersTable table of the business application that provisioned the add-on? I'm asking especially in terms of security/permissions.
E.g. say I wanted my add-on to monitor some performance counters and send an email alert when they pass certain thresholds (regardless of whether such commercial products already exist; I'm just interested in the technical capability). What would I have to do? I'm guessing WADPerformanceCountersTable isn't publicly exposed to the entire world, so how can I make it accessible to my add-on?
Thanks very much.
WADPerformanceCountersTable is no different from other Azure tables; it lives in the storage account defined by Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString in the configuration file. You will need that storage account's name/key pair to read from the table.
FYI, here is an article about efficiently fetching performance counter data from this table: http://gauravmantri.com/2012/02/17/effective-way-of-fetching-diagnostics-data-from-windows-azure-diagnostics-table-hint-use-partitionkey/
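Putting the pieces together, here's a minimal read of the table with the classic storage client, using the PartitionKey trick from that article ("0" + the tick count of the event time, so a range filter avoids a full table scan); the account name and key are placeholders:

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Table;

var account = new CloudStorageAccount(
    new StorageCredentials("accountname", "accountkey"), useHttps: true);
CloudTable table = account.CreateCloudTableClient()
    .GetTableReference("WADPerformanceCountersTable");

// PartitionKey is "0" + event tick count, so querying the last hour of
// counters is a cheap range scan on the key.
string from = "0" + DateTime.UtcNow.AddHours(-1).Ticks;
TableQuery query = new TableQuery().Where(
    TableQuery.GenerateFilterCondition(
        "PartitionKey", QueryComparisons.GreaterThanOrEqual, from));

foreach (DynamicTableEntity entity in table.ExecuteQuery(query))
{
    Console.WriteLine("{0}: {1}", entity.RowKey,
        entity.Properties["CounterValue"].DoubleValue);
}
```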