How to scale a multi-tenant application for almost 2000 tenants - Azure

I want to create a multi-tenant application. How should I scale or partition the database, given that there will be more than 2000 tenants?
Is it correct to have an individual database for each tenant?
Is it correct to split tenants by geographical region? Reporting will be a problem, whether pulling it from an individual tenant or generating a report across multiple tenants.

This question may be off-topic as too opinion-based, but here are a few things to note about DocumentDB:
You are limited to 5 databases by default.
Pricing is by collection, so it would be very costly to partition tenants into their own collection.
The common way to do this is to have a tenantID field on each document and put all of your tenants into one collection. You may be surprised how much data fits in one collection, and you can spill over to a new collection when one fills up or you are constantly exceeding your request unit limit.
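As a rough illustration of the shared-collection approach (DocumentDB is now Azure Cosmos DB; the database, container, and field names below are assumptions, not from the question), every document carries a tenantID and every query filters on it:

```typescript
import { CosmosClient } from "@azure/cosmos";

// Endpoint/key and the "app"/"documents" names are placeholders.
const client = new CosmosClient({
  endpoint: process.env.COSMOS_ENDPOINT!,
  key: process.env.COSMOS_KEY!,
});
const container = client.database("app").container("documents");

// All tenants share one collection; the tenantID field scopes every query.
async function documentsForTenant(tenantId: string) {
  const { resources } = await container.items
    .query({
      query: "SELECT * FROM c WHERE c.tenantID = @tenantId",
      parameters: [{ name: "@tenantId", value: tenantId }],
    })
    .fetchAll();
  return resources;
}
```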

Related

Azure Active Directory: How many groups max?

This is a two part question.
What is the maximum number of Azure AD groups you can create?
Is there a best practice? We have over 3000 groups, and I’m wondering if it’s slowing things down.
What is the maximum number of Azure AD groups you can create?
There is no specific limit on the number of groups, but there are limits on objects (which include groups). By default, a maximum of 50,000 objects can be created in a single directory by users of the Free edition of Azure Active Directory. See more details here.
Is there a best practice? We have over 3000 groups, and I'm wondering if it's slowing things down.
The official documentation does not say that this number of groups will affect performance, and I haven't seen any feedback suggesting that it does.

How to design a multi-tenant node.js application?

I am currently facing a technology decision and am not able to find the solution myself.
I am in the process of developing a multi-tenant database.
The structure would be the following:
There is one core database which saves data and relations about specific tenants
There are multiple tenant database instances (a query against the core database determines which tenant ID I should connect to)
Each tenant is on a separate database instance (on a separate server)
Each tenant has specific data which should not be accessible by any other tenant
Each database would preferably be MySQL (but if there are better options, I am open to suggestions)
The backend is written with the Koa framework
The database models are different in the core database and tenant databases
Each tenant database's largest table could be around 1 million records (without auditing)
Optimistically, the number of tenants could grow to 50
Additional data about the project:
All of the project's data is available to the owner
Each client will have data available for their own tenant
Each tenant will have their own website
Database structure remains the same for each tenant
The project is mainly a logistics service, whose data is segregated by region
The question:
Is this the correct approach to designing a multi-tenant architecture, or should the architecture be redesigned?
If multi-tenancy across multiple servers is possible, is there a preferable tool/technology stack? (I would love to know more about this specifically.)
It would be preferred to use an ORM. I am currently trying to use Sequelize, but I am already facing problems at an early stage (multiple databases can't share the same models, and managing multiple connections).
The ideal goal would be the possibility of adding additional tenants without much additional configuration.
EDIT:
- The databases would currently be hosted in Azure, but we'd prefer to keep the option of migrating away if that becomes a requirement
There are several ways to architect the data structure in a multi-tenant architecture.
It's hard to say which is the better choice, but I will try to help with what I know.
First Option:
Segregate your databases onto distributed servers, for example each tenant has its own, totally isolated database server.
This can be good because it gives strong security for tenant data; we can ensure that one tenant never sees another tenant's data.
I see some problems with this approach. In terms of cost, it can increase a lot because we need a machine for each client and perhaps software licenses, depending on your environment. In terms of devops, we will need a complex strategy to create and deploy a new instance for every new tenant.
Second Option:
Separate databases: we have one server where we create a separate database for each tenant.
This is often used if you need to provide isolation for each customer, because we can associate different logins, permissions and so on to each database.
Some other cons: a different connection pool is required per database, updates must be replicated across all the databases, there is no resource sharing (unless you use Elastic Database Pools), you need multiple backup strategies across all the databases, and you need a complex devops strategy to deploy and create new tenants.
Third Option:
Separate schemas: this is a good strategy for implementing a multi-tenant architecture. We can share some resources, since everything is inside the same database, but the schemas used are different, with a separate schema for each tenant. That even allows you to customize a specific tenant without affecting others, and you save costs by only paying for one database.
Some of the cons: you need to replicate all the database objects in every schema, so the number of objects can increase indefinitely; updates must be replicated across all the schemas; the connection pool for the database must maintain a different connection per tenant (or set of credentials); a different user is required per tenant (which is stored at the server level); and you have to back up that user independently.
Fourth Option:
Row Isolation.
Everything is shared in this option: server, database, and schema. All data for the tenants is in the same tables in the same database. The only way they are differentiated is by a TenantId or some other column that exists at the table level.
Another good point is that you will not need a complex devops strategy, and if you are using SQL Server, there is a feature called Row-Level Security so that you only get the data the logged-in user has permission to see.
But in this case, if you have thousands of users hitting the database at the same time, you will need some approach to get good scalability.
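As a minimal sketch of this row-isolation option in the question's Node.js/Sequelize setting (the model, column names, and connection details are illustrative):

```typescript
import { Sequelize, DataTypes } from "sequelize";

// Single shared database; connection details are placeholders.
const sequelize = new Sequelize("shared_db", "app_user", "secret", {
  host: "localhost",
  dialect: "mysql",
});

// Every shared table carries a tenantId column.
const Shipment = sequelize.define("Shipment", {
  tenantId: { type: DataTypes.STRING, allowNull: false },
  reference: { type: DataTypes.STRING },
});

// All reads and writes are scoped by tenantId at the application level
// (or enforced by Row-Level Security where the database supports it).
async function shipmentsFor(tenantId: string) {
  return Shipment.findAll({ where: { tenantId } });
}
```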
So you need to think about your case and how your system will grow in order to choose the best option.
It seems quite fine to me.
Where I see a bottleneck is having every tenant on a separate DB server or DB instance. It would mean that you need to hold a separate connection pool for every tenant, or to create a new connection for every request depending on the tenant. Try using any concept where you can have one DB connection for all the tenants (namespaces, schemas, or just prefixing tenant table names with some tenant-specific prefix).
But if you need to keep the tenants' DBs separate, e.g. because of different backup policies, resource limits, etc., you can't do this and will have to manage a separate connection pool for every tenant. It also depends on how many tenants you will have. Tens? Thousands?
I would also suggest caching the tenant->DB mapping somewhere in the app instead of querying the core database for it every time.
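A minimal sketch of that cached tenant->DB mapping with Sequelize (the core-database lookup and its fields are placeholders, not from the question):

```typescript
import { Sequelize } from "sequelize";

// Shape of the connection info stored in the core database (illustrative).
interface TenantDb {
  host: string;
  name: string;
  user: string;
  password: string;
}

// Placeholder for the real query against the core database.
async function lookupTenantDb(tenantId: string): Promise<TenantDb> {
  throw new Error(`lookup for ${tenantId} not implemented in this sketch`);
}

// Cache tenant -> Sequelize instance so the core DB is hit only once per tenant
// and each tenant keeps a single small connection pool.
const connections = new Map<string, Sequelize>();

async function getTenantConnection(tenantId: string): Promise<Sequelize> {
  const cached = connections.get(tenantId);
  if (cached) return cached;

  const db = await lookupTenantDb(tenantId);
  const sequelize = new Sequelize(db.name, db.user, db.password, {
    host: db.host,
    dialect: "mysql",
    pool: { max: 5, min: 0 }, // keep per-tenant pools small
  });

  connections.set(tenantId, sequelize);
  return sequelize;
}
```

Models can then be defined through a factory function that takes a tenant's Sequelize instance, so the same model definitions are reused for every connection.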

Windows Azure Cache with a multi-tenant application

I have a multi-tenant application in development that I am deploying to Azure.
I would like to take advantage of the Windows Azure Cache service, as it looks like it will be a great performance improvement versus hitting the database for each call.
Let's say I have 2 tables: Businesses and Customers. A business can have multiple customers, and the Businesses table contains details about the business.
Business details don't change often but customer information is changing constantly for each of the different tenants.
I assume I need 2 named caches (1 for business details and 1 for customers).
Are 2 named caches enough, or do I need to separate these for each of the tenants? I think 2 would be OK, as creating separate caches for each tenant would get expensive pretty quickly.
Thank you.
Using different named caches is interesting if you have different cache requirements (Expiry policy, default TTL, Notifications, High Availability, ...).
In your case you could simply look at using different regions per tenant:
Windows Azure Cache supports the creation and use of user-defined regions. A region is a subgroup for cached items. Regions also support the annotation of cached items with additional descriptive strings called tags. Regions support the ability to perform search operations on any tagged items in that region.
This would allow you to split your named cache (you would only need one) into regions per tenant, holding the businesses and customers for that tenant. And since the businesses don't change that often, you can simply set the TTL for those items to 1, 2, ... hours.
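As a generic, in-memory illustration of the idea only (this is not the Windows Azure Cache client API, just the per-tenant region keying and TTL scheme):

```typescript
// Stand-in for a single named cache, keyed by tenant + region.
interface Entry {
  value: unknown;
  expiresAt: number;
}

const cache = new Map<string, Entry>();

function put(tenant: string, region: "business" | "customer", id: string, value: unknown, ttlSeconds: number): void {
  cache.set(`${tenant}:${region}:${id}`, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
}

function get(tenant: string, region: "business" | "customer", id: string): unknown | undefined {
  const entry = cache.get(`${tenant}:${region}:${id}`);
  if (!entry || entry.expiresAt <= Date.now()) return undefined;
  return entry.value;
}

// Business details change rarely: long TTL. Customer data changes often: short TTL.
put("tenant-42", "business", "acme", { name: "Acme Ltd" }, 2 * 60 * 60);
put("tenant-42", "customer", "c-1001", { name: "Jane Doe" }, 60);
```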

Multi Tenant Data Architecture in Azure

I want to implement a multi-tenant architecture for the database. The plan is to have the same database but with a schema per tenant, each containing the same tables, sprocs, triggers, etc. A tenant will be mapped to a schema, and adding a tenant is like adding a schema.
And depending on the sub-domain, I will figure out the tenant and pull/push information to the respective database schema.
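As a minimal sketch of what I mean by that sub-domain lookup (the mapping and schema names are purely illustrative):

```typescript
// Illustrative mapping from sub-domain to database schema; in practice this
// would live in a lookup table rather than in code.
const tenantSchemas: Record<string, string> = {
  contoso: "tenant_contoso",
  fabrikam: "tenant_fabrikam",
};

// Resolve the schema from the request's Host header, e.g. "contoso.example.com".
function schemaForHost(host: string): string {
  const subdomain = host.split(".")[0].toLowerCase();
  const schema = tenantSchemas[subdomain];
  if (!schema) {
    throw new Error(`Unknown tenant sub-domain: ${subdomain}`);
  }
  return schema;
}

console.log(schemaForHost("contoso.example.com")); // "tenant_contoso"
```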
However, while looking for a way to implement this, I came across many articles and blogs and am confused about whether the word 'schema' is right in my context or whether I should go for Federation. And if I have to go with federation, does it mean that each tenant will be a federated member which will be mapped to a schema?
Can someone throw some more light on it?
I wrote a short series of articles for BusinessCloud9.com that may be of interest:
http://www.businesscloud9.com/user/2688
(I'd be grateful if you'd ignore the incorrect statement on the number of tables in a SQL Azure database! Unfortunately, I don't have the ability to edit the offending post to fix it)
SQL Azure federations can likely do the trick for you, but it is possible to accidentally create a fan-out query where multiple databases will be queried and results unintentionally intermixed. If you want to separate the schemas completely with no accidental mixing of data due to buggy code, you'll want multiple and distinct SQL Azure databases or schemas. You'll want to provision them as new tenants are brought onboard.
Here's a good link on the subject: http://geekswithblogs.net/hroggero/archive/2011/10/05/solving-schema-separation-challenges.aspx from one of SQL Azure MVPs

Limitations on Windows Azure Table Storage accounts

I am designing a multi-tenant web-based SaaS application that will be hosted on Windows Azure and use Table Storage.
The only limits I have found so far are:
5 storage accounts per subscription
100 TB maximum per storage account
1 MB per entity
I am deciding how to best partition my storage for multiple customers:
Option 1: Give each customer their own storage account. Not likely, considering the 5 account default limit.
Option 2: Give each customer their own set of tables. Prefix the table names with customer identifiers, such as a Books table split as "CustA_Books", "CustB_Books", etc.
Option 3: Have one set of tables, but prefix the partition keys to split the customers. So one "Books" table with partition keys of "CustA_Fiction", "CustA_NonFiction", "CustB_Fiction", "CustB_NonFiction", etc.
What are the pros and cons for options 2 and 3? Is there a limit to the number of tables in a single account that might affect option 2?
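To make option 3 concrete, here is a rough sketch using the current @azure/data-tables SDK (which postdates this question; the account name, key, and table are placeholders):

```typescript
import { TableClient, AzureNamedKeyCredential, odata } from "@azure/data-tables";

// Placeholder account name and key.
const credential = new AzureNamedKeyCredential("mystorageacct", process.env.STORAGE_KEY!);
const books = new TableClient("https://mystorageacct.table.core.windows.net", "Books", credential);

async function demo() {
  // Option 3: one shared "Books" table, customer prefix on the partition key.
  await books.createEntity({
    partitionKey: "CustA_Fiction",
    rowKey: "978-0000000001",
    title: "Example Title",
  });

  // Reading back one customer's partition.
  for await (const entity of books.listEntities<{ title: string }>({
    queryOptions: { filter: odata`PartitionKey eq ${"CustA_Fiction"}` },
  })) {
    console.log(entity.rowKey, entity.title);
  }
}
```

Option 2 would look the same except that the customer prefix moves into the table name (the question's "CustA_Books" style), with one TableClient per customer table.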
There are no limits on the number of tables you can create in Windows Azure. Your only limits are the ones you have already listed. Well... I guess there are other limits if you consider that the size of an entity attribute is always 64 KB or less, or if you consider batch operations (100 entities or 4 MB, whichever is less).
Anyhow, the thing to keep in mind here is that your PartitionKey is going to be the most important thing you make. If you create a PK with the customer name in it, you get some good partitioning benefits. The downside to this is that if you mix the customer data in the same table, you make it harder on yourself to delete data (if you ever need to delete a customer). So, you can use the table as another level of partitioning. The PK you create is scoped to the table you create it under.
What I would consider here is if you ever need to delete the data in bulk or if you ever need to query data across customers (tenants). For the first one, it makes a ton of sense to use separate tables per customer so a delete is one operation versus at best 1 per 100 entities. However, if you need to query across tenants it is harder to join this data when you have multiple tables (that would require multiple queries).
All things being equal, I would use the tables as another level of partitioning if there is no overlap in tenant functionality and make my life easier should I want to delete a tenant. So, I guess that is option 2.
HTH
I highly suggest Option 2
We are also going this route because it adds a nice level of federation for the customer data. As the accepted answer mentions, it is easier to manage adding/deleting customers. Another benefit that we have noticed is the 'copy-ability' of a customer's data. This approach makes it much easier to move customer-specific data to other storage accounts or to development environments for testing without affecting the entire lot.
In the SaaS world it also enables customers to get a copy of their own data with little effort, which is also a concern of many SaaS users.
Another alternative:
Imagine you have N storage accounts; the limit is 100 storage accounts per subscription. Each storage account has a table per customer.
For table request operations with a partition key, like insert, update, delete, or a point query, you calculate the hash of the customer name + partition key, take it modulo N (the total number of storage accounts), find the index of the target storage account, and forward the request to the correct storage account/table.
For read requests with no partition key, like a range query, you would need to broadcast the request to all storage accounts and merge the results.
One other thing to keep in mind concerns naming multiple storage accounts. Avoid naming the accounts lexicographically; that will cause them to be served from the same partition server on the Azure backend, which goes against the recommended scalability best practices. If you have N storage accounts, prefix each storage account name with a 3-digit hash so they are evenly distributed.
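A rough sketch of that routing (the account names, hash choice, and N are illustrative; real names would carry the distinguishing prefix recommended above):

```typescript
import { createHash } from "crypto";

// N storage accounts; names are placeholders, prefixed so they are not
// lexicographically adjacent.
const storageAccounts = ["a1xtenantdata", "k9qtenantdata", "p4mtenantdata", "z7btenantdata"];

// Route a point operation (insert/update/delete/point query) to one account.
function accountFor(customer: string, partitionKey: string): string {
  const digest = createHash("md5").update(`${customer}|${partitionKey}`).digest();
  const index = digest.readUInt32BE(0) % storageAccounts.length;
  return storageAccounts[index];
}

// A range query with no partition key would instead be broadcast to every
// account in storageAccounts and the results merged by the caller.
console.log(accountFor("CustA", "Fiction")); // always the same account for this pair
```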
