How to design a multi-tenant Node.js application?

I am facing a technical decision that I have not been able to resolve on my own.
I am currently developing a multi-tenant database setup.
The structure would be the following:
There is one core database which stores data and relations about specific tenants
There are multiple tenant database instances (a query against the core database determines which tenant database I should connect to)
Each tenant is on a separate database instance (on a separate server)
Each tenant has specific data which must not be accessible to any other tenant
Each database would preferably be MySQL (but if there are better options, I am open to suggestions)
The backend is written with the Koa framework
The database models differ between the core database and the tenant databases
The largest table in each tenant database could hold around 1 million records (without auditing)
Optimistically, the number of tenants could grow to 50
Additional data about the project:
All of the project's data is available to the owner
Each client will have access to the data of their own tenant
Each tenant will have their own website
The database structure is the same for every tenant
The project is mainly a logistics service whose data is segregated by region
The question:
Is this the correct approach to design a multi-tenant architecture or should there be a redesign in the architecture?
If multi-tenancy across multiple servers is possible, is there a preferable tool/technology stack that should be used? (Would love to know more specifically about this)
It would be preferred to use an ORM. I am currently trying to use Sequelize, but I am already facing problems at an early stage (multiple databases can't share the same models, management of multiple connections).
The ideal goal would be the possibility of adding additional tenants without much additional configuration.
EDIT:
- The databases would currently be hosted in Azure, but we'd prefer to keep the option of migrating them away if that becomes a requirement

There are several ways to structure the data in a multi-tenant architecture.
It is hard to say which is the best choice, but I will try to help with what I know.
First option:
Distribute your databases across servers, so that each tenant has its own database server, fully isolated from the others.
This is good for security: we can ensure that one tenant never sees another tenant's data.
I see some problems with this approach. Cost can increase a lot, because we need a machine for each client and perhaps a software license, depending on your environment. On the devops side, we will need a complex strategy to create and deploy a new instance for every new tenant.
Second option:
Separate databases: we have one server on which we create a separate database for each tenant.
This is often used if you need to provide isolation for each customer, because we can associate different logins, permissions and so on with each database.
Some cons: a different connection pool is required per database, updates must be replicated across all the databases, there is no resource sharing (unless using Elastic Database Pools), you need multiple backup strategies across all the databases, and you need a complex devops strategy to create and deploy new tenants.
Third option:
Separate schemas. This is a good strategy for implementing multi-tenancy: we can share some resources since everything is inside the same database, but each tenant has its own schema. That even allows you to customize a specific tenant without affecting the others, and you save costs by paying for only one database.
Some of the cons: you need to replicate all the database objects in every schema, so the number of objects grows with every tenant; updates must be replicated across all the schemas; the connection pool for the database must maintain a different connection per tenant (or set of credentials); and a different user is required per tenant (which is stored at server level) and has to be backed up independently.
Fourth option:
Row isolation.
Everything is shared in this option: server, database and schema. All data for all tenants lives in the same tables in the same database. Rows are differentiated only by a TenantId or some other column that exists on every table.
Another advantage is that you will not need a complex devops strategy, and if you are using SQL Server, there is a feature called Row-Level Security that returns only the rows the logged-in user has permission to see.
But in this case, if you have thousands of users hitting the database at the same time, you will need a plan for scaling.
So you need to think about your case and how your system will grow in order to choose the best option.
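As a rough illustration of the fourth option, here is a minimal TypeScript sketch of row isolation enforced in application code. The `ShipmentRow` shape, the `TenantScopedShipments` class and the in-memory array standing in for a shared table are all hypothetical; a real application would issue SQL with a `WHERE tenant_id = ?` filter (or rely on a database feature such as Row-Level Security) instead.

```typescript
// Hypothetical row shape: every shared table carries a tenantId column.
interface ShipmentRow {
  id: number;
  tenantId: string;
  destination: string;
}

// Repository bound to one tenant, so callers cannot forget the tenant filter.
class TenantScopedShipments {
  private readonly rows: ShipmentRow[]; // stand-in for a shared table
  private readonly tenantId: string;

  constructor(rows: ShipmentRow[], tenantId: string) {
    this.rows = rows;
    this.tenantId = tenantId;
  }

  // Similar in spirit to: SELECT * FROM shipments WHERE tenant_id = ?
  findAll(): ShipmentRow[] {
    return this.rows.filter((row) => row.tenantId === this.tenantId);
  }

  // Inserts always stamp the bound tenant, never a caller-supplied one.
  insert(id: number, destination: string): ShipmentRow {
    const row: ShipmentRow = { id, tenantId: this.tenantId, destination };
    this.rows.push(row);
    return row;
  }
}

// Usage: two tenants share one table, but each repository sees only its own rows.
const sharedTable: ShipmentRow[] = [
  { id: 1, tenantId: "north", destination: "Oslo" },
  { id: 2, tenantId: "south", destination: "Rome" },
];
const northRepo = new TenantScopedShipments(sharedTable, "north");
northRepo.insert(3, "Bergen");
const northRows = northRepo.findAll(); // rows 1 and 3 only
```

Binding the tenant once, at construction time, keeps the filter out of individual call sites, which is where it tends to get forgotten.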

It seems quite fine to me.
Where I see a bottleneck is having every tenant on a separate DB server or DB instance. It means that you need to hold a separate connection pool for every tenant, or create a new connection for every request depending on the tenant. Try using a concept that lets you have one DB connection for all the tenants (namespaces, schemas, or just prefixing table names with a tenant-specific prefix).
But if you need to keep the tenant DBs separate, e.g. because of different backup policies, resource limits etc., you can't do this and will have to manage a separate connection pool for every tenant. It also depends on how many tenants you will have. Tens, thousands?
I would also suggest caching the tenant->DB mapping somewhere in the app instead of querying it from the core database every time.
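A minimal sketch of that caching idea, assuming one pool per tenant that is created lazily and then reused. `TenantDbConfig`, `TenantPoolRegistry` and the host names are hypothetical, and the lookup is synchronous only to keep the example self-contained; a real core-database query would be asynchronous, and the same pattern would then cache promises.

```typescript
// Hypothetical shape of a row in the core database's tenant table.
interface TenantDbConfig {
  host: string;
  database: string;
}

// Keeps one pool per tenant so the core database is consulted once per tenant.
class TenantPoolRegistry<Pool> {
  private readonly pools = new Map<string, Pool>();
  private readonly lookupConfig: (tenantId: string) => TenantDbConfig | undefined;
  private readonly createPool: (config: TenantDbConfig) => Pool;

  constructor(
    lookupConfig: (tenantId: string) => TenantDbConfig | undefined,
    createPool: (config: TenantDbConfig) => Pool,
  ) {
    this.lookupConfig = lookupConfig;
    this.createPool = createPool;
  }

  getPool(tenantId: string): Pool {
    const cached = this.pools.get(tenantId);
    if (cached !== undefined) return cached;

    const config = this.lookupConfig(tenantId);
    if (config === undefined) throw new Error(`Unknown tenant: ${tenantId}`);

    const pool = this.createPool(config);
    this.pools.set(tenantId, pool);
    return pool;
  }
}

// Usage with in-memory stand-ins for the core database and the DB driver.
const coreTenantTable = new Map<string, TenantDbConfig>([
  ["acme", { host: "db-acme.example.internal", database: "acme" }],
]);
let coreLookups = 0;
const registry = new TenantPoolRegistry(
  (tenantId) => {
    coreLookups += 1; // counts how often the "core database" is queried
    return coreTenantTable.get(tenantId);
  },
  (config) => ({ target: `${config.host}/${config.database}` }),
);
const firstPool = registry.getPool("acme");
const secondPool = registry.getPool("acme"); // served from the cache
```

With around 50 tenants this keeps at most 50 pools alive in each app process, so the per-pool connection limit is the number to watch.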

Related

Designing and implementing SaaS Application with Multi-tenancy GraphDB (Neo4J / ArangoDB)

I am developing a SaaS Application with the following Technology:
NestJS (Node)
DB (NEO4J, ArangoDB)
Nginx for proxy (Micro-services Approach)
The SaaS Application will be hosting many distinct companies, as clients.
The data from 2 different companies must be fully isolated in the GraphDB.
2 different companies may have different data structures and models.
ENQUIRIES
Here are my enquiries:
How to set up multi-tenancy on a GraphDB (Neo4J / ArangoDB)?
Is a totally separate GraphDB instance required for each company?
Is it possible to host 2 companies on the same GraphDB, yet maintain isolation?
Can anyone please suggest an optimal solution for this type of architecture?
Thanks for your time
Best regards
Since Neo4j 4.0, multi-tenancy is supported via the multi-database feature.
Using the system database you can create as many databases as you want, and a client selects the database to talk to on a session-by-session basis, so you can use one database per tenant.
Here is the JS API:
https://neo4j.com/docs/api/javascript-driver/current/class/src/driver.js~Driver.html#instance-method-session
Each Neo4j instance can handle hundreds or thousands of databases.
With Neo4j Fabric enabled you can run federated queries across databases.
Here are some more examples:
https://adamcowley.co.uk/neo4j/multi-tenancy-neo4j-4.0/
https://graphaware.com/neo4j/2020/02/06/multi-tenancy-neo4j.html
https://neo4j.com/developer/multi-tenancy-worked-example/
With ArangoDB you only need one instance and can simply use a database per tenant.
Each database is isolated, for example, AQL queries run in the context of a single database and you can only access the collections and named graphs of that database.
You can create an ArangoDB user for each customer and restrict its access to the respective database to achieve the desired isolation.
For scalability and resilience, there is also the OneShard feature (Enterprise Edition / managed service). It enables you to have a cluster where each database is treated like a single shard, i.e. all collections of a customer are stored on one DB-Server (excluding replicas), so that queries can be executed locally on that node. This is especially beneficial for graph traversals.
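The database-per-tenant approach described for both products can be sketched as below. The Neo4j JavaScript driver accepts a `database` field in its session config; the `SessionOpener` interface here only mirrors that one field and is otherwise hypothetical, as are `tenantDatabaseName`, the `tenant-` prefix and the deliberately conservative name pattern (check the naming rules of your database before relying on it).

```typescript
// Minimal, hypothetical shape of a driver that opens a session on a named database.
interface SessionOpener<S> {
  session(config: { database: string }): S;
}

// Conservative assumption: lowercase letter first, then letters, digits or dashes, 3-63 chars.
const TENANT_DB_PATTERN = /^[a-z][a-z0-9-]{2,62}$/;

// Derives the database name for a tenant and rejects anything outside the pattern.
function tenantDatabaseName(tenantSlug: string): string {
  const name = `tenant-${tenantSlug.trim().toLowerCase()}`;
  if (!TENANT_DB_PATTERN.test(name)) {
    throw new Error(`Invalid tenant database name: ${name}`);
  }
  return name;
}

// Every session is opened against exactly one tenant's database.
function openTenantSession<S>(driver: SessionOpener<S>, tenantSlug: string): S {
  return driver.session({ database: tenantDatabaseName(tenantSlug) });
}

// Usage with a stub driver that just records which database was requested.
const stubDriver: SessionOpener<{ database: string }> = {
  session: (config) => ({ database: config.database }),
};
const acmeSession = openTenantSession(stubDriver, "Acme");
```

Validating the name before it reaches the driver matters because the tenant identifier often comes from a request (subdomain, token claim) rather than from trusted code.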

Azure, SQL Server and Database

One of my customers is developing a multi-tenant solution, and I'm working as a developer on the automation of the resource provisioning part. The solution is designed so that each tenant has its resources separate from the others.
So, for example, a single tenant will require a SQL database (PaaS), a storage account, and many other resources.
One of the requirements the customer set is that X databases be hosted on one SQL server (a logical server, not a VM), which I don't think is a valid requirement when using SQL as PaaS.
So my question is: should we create a SQL server and a SQL database for each tenant?
Or
Should we create a SQL server and host X databases on it, and when the server reaches its limit (X databases), create another server and repeat the same logic?
In either scenario, what difference does it make from a database performance, pricing and database security point of view?
FYI, my thinking is that whether I host X databases on a single SQL logical server or create X SQL logical servers to host X SQL databases, it won't make any difference from a pricing and database performance point of view.
A few differences I can think of if you go with a single server for all clients:
1. The administrator password is per server, and with it one client could get access to the other databases as well.
2. Azure limits how many DTUs can be allocated under one server, so having many databases under one server may lead to issues such as:
a) frequent DTU increase requests
b) automated backups sometimes failing if no DTUs are available (a backup needs to copy the whole database, so the DTUs needed will equal those of the database being backed up)
Your question is too broad, as there are many opinions and approaches.
In any case, you should take a look at elastic database pools: https://azure.microsoft.com/en-us/documentation/articles/sql-database-elastic-pool/ which is a feature designed exactly for multi-tenant SaaS solutions.
Your end solution may be a combination of both: you may want to dedicate a single server to each "bigger" tenant, while hosting multiple small tenants together on a single server.
Security should not carry much weight as a factor, because when you use contained database credentials for application access, it does not really matter whether the databases are located on a single logical server or not.
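The second provisioning option from the question (fill a logical server with up to X databases, then create the next server) can be sketched as a small allocator. `ServerAllocator`, the `sql-server-N` and `db-<tenant>` names are hypothetical, and the sketch only tracks placement in memory; it does not call any Azure API.

```typescript
// In-memory record of which tenant databases live on which logical server.
interface LogicalServer {
  name: string;
  databases: string[];
}

class ServerAllocator {
  private readonly servers: LogicalServer[] = [];
  private readonly maxDatabasesPerServer: number;

  constructor(maxDatabasesPerServer: number) {
    if (maxDatabasesPerServer < 1) {
      throw new Error("maxDatabasesPerServer must be at least 1");
    }
    this.maxDatabasesPerServer = maxDatabasesPerServer;
  }

  // Places the tenant's database on the first server with room, or on a new server.
  allocate(tenantId: string): LogicalServer {
    let server = this.servers.find(
      (candidate) => candidate.databases.length < this.maxDatabasesPerServer,
    );
    if (server === undefined) {
      server = { name: `sql-server-${this.servers.length + 1}`, databases: [] };
      this.servers.push(server);
    }
    server.databases.push(`db-${tenantId}`);
    return server;
  }

  serverCount(): number {
    return this.servers.length;
  }
}

// Usage: with X = 2, the third tenant triggers creation of a second server.
const allocator = new ServerAllocator(2);
const placements = ["a", "b", "c"].map((tenantId) => allocator.allocate(tenantId).name);
```

In a real provisioning pipeline the placement record would have to be persisted (for example in a management database), since it is the only source of truth for which server hosts which tenant.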

Multi Tenant Data Architecture in Azure

I want to implement a multi-tenant architecture for my database. The plan is to have a single database with one schema per tenant, each schema containing the same tables, sprocs, triggers, etc. A tenant will be mapped to a schema, and adding a tenant means adding a schema.
Depending on the sub-domain, I will figure out the tenant and pull / push information from / to the respective database schema.
However, while looking for a way to implement this I came across many articles and blogs, and I am confused about whether the word 'schema' is right in my context or whether I should go for Federation. And if I have to go with Federation, does it mean that each tenant will be a federation member which is mapped to a schema?
Can someone shed some more light on this?
I wrote a short series of articles for BusinessCloud9.com that may be of interest:
http://www.businesscloud9.com/user/2688
(I'd be grateful if you'd ignore the incorrect statement on the number of tables in a SQL Azure database! Unfortunately, I don't have the ability to edit the offending post to fix it)
SQL Azure federations can likely do the trick for you, but it is possible to accidentally create a fan-out query where multiple databases will be queried and results unintentionally intermixed. If you want to separate the schemas completely with no accidental mixing of data due to buggy code, you'll want multiple and distinct SQL Azure databases or schemas. You'll want to provision them as new tenants are brought onboard.
Here's a good link on the subject: http://geekswithblogs.net/hroggero/archive/2011/10/05/solving-schema-separation-challenges.aspx from one of the SQL Azure MVPs
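The sub-domain-to-schema mapping described in the question can be sketched as follows. The `schemaByTenant` map, the `tenant_` schema names and both function names are hypothetical. The map acts as an allow-list, so the schema name placed into SQL never comes directly from the request.

```typescript
// Hypothetical allow-list of tenant sub-domains and their schemas, e.g. loaded at startup.
const schemaByTenant = new Map<string, string>([
  ["acme", "tenant_acme"],
  ["globex", "tenant_globex"],
]);

// Extracts the single sub-domain label: "acme.example.com:443" -> "acme".
function tenantFromHost(host: string, baseDomain: string): string | undefined {
  const [hostname = ""] = host.toLowerCase().split(":"); // drop an optional port
  const suffix = `.${baseDomain.toLowerCase()}`;
  if (!hostname.endsWith(suffix)) return undefined;
  const subdomain = hostname.slice(0, hostname.length - suffix.length);
  if (subdomain === "" || subdomain.includes(".")) return undefined;
  return subdomain;
}

// Returns "schema.table" for a known tenant. `table` is expected to be a
// constant from application code, never user input.
function qualifiedTable(host: string, baseDomain: string, table: string): string {
  const tenant = tenantFromHost(host, baseDomain);
  const schema = tenant === undefined ? undefined : schemaByTenant.get(tenant);
  if (schema === undefined) {
    throw new Error(`No tenant schema for host: ${host}`);
  }
  return `${schema}.${table}`;
}

// Usage: the Host header decides which schema a query is routed to.
const ordersTable = qualifiedTable("acme.example.com:443", "example.com", "orders");
```

Onboarding a tenant then means creating the schema and adding one entry to the map, which matches the "adding a tenant is like adding a schema" plan.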

SQL Azure Federation Splitting Design and Querying

I have a few questions regarding Microsoft SQL Azure Federations:
1) Can I create a federated DB on an active database, or do I need to deploy federations ahead of time?
2) Do I need to make any changes to my SQL queries to comply with how federations are queried, or can I continue to use my regular queries as if I were working against one SQL Server database?
3) When I split my database and after some time I see that one of the shards is very busy and almost full, how do I tackle this problem using federations? Do I need to split only that single federated table that is 90% full, or do I need to recreate the splitting strategy using a narrower range? The problem is that one specific user can be very active, so what strategy do I use to make sure that I won't need to re-create the federation strategy because of one very active federated table / user?
4) When I have different tables that I want to split on different primary keys, how will the sharding work? For example:
From what I understand:
[Blogs]
blog_id
info
[Blog_Posts]
id
blog_id
post_content
So if I decide to shard based on the blog_id ranges 0-1000 and 1001-2000, I will have two federated tables. But how many more federated tables will I have if I add more tables that have keys other than blog_id?
Thanks
Please be more precise and concrete, and ask one question at a time. You have a better chance of getting answers to all of your questions when they are asked separately. Now let me try to cover some of them.
1) Can I create a federated DB on an active database, or do I need to
deploy federations ahead of time?
You can certainly create one or more Federations within an existing DB. Creating Federations is not limited to new/empty DBs. However, creating a federation in an active DB will do nothing for you by itself. You have to realize that Federations are separate DBs. A Federation (or Federation member) knows nothing about the Federation Root DB (the DB where you created the federation). So you have to think about migrating schema/data from the active DB (the Federation Root) once you create your federation.
2) Do I need to make any changes to the SQL queries to comply with how
I query federations, or I can continue to use my regular queries as I
was working against one SQL Server Database?
Most probably YES. Windows Azure SQL Database Federations is a scale-out mechanism for the DB tier. This means that, just as any web application needs a "special" design to work in a farm-like environment (i.e. a scale-out environment like Windows Azure), a database also needs a "special" design to work in a scale-out environment. There is no magic wand in SQL Azure Federations that will make your code work. You have to design it to work.
3) When I split my database and after some time I see that one of the
shards is very busy and almost full, how do I tackle this problem using
federations? Do I need to split only that single federated table
that is 90% full, or do I need to recreate the splitting strategy using
a narrower range? The problem is that one specific user can be very
active, so what strategy do I use to make sure that I won't need to
re-create the federation strategy because of one very active
federated table / user?
This is all about partitioning strategy. You have to design your federation key, and how you partition your data across the different shards, very carefully. You can always SPLIT any federation, as long as you keep each Atomic Unit in a single shard.
4) When I have different tables that I want to split with different
primary keys, how the sharding will work then.
If you want to split different tables on different keys, then you will have different federations, each one with its own federation key and its own tables.
A good video worth watching if you are up for SQL Federations: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI408
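To make the range-partitioning idea from answers 3) and 4) concrete, here is a small sketch of routing by a single federation key (`blog_id` in the question) and splitting a range at a boundary. `RangeFederation` and the `member-N` names are hypothetical, and the sketch models only the routing logic, not SQL Azure itself. Because both `Blogs` and `Blog_Posts` carry `blog_id`, all rows for one blog route to the same member, which is the "atomic unit" mentioned above.

```typescript
// One federation member covers the half-open key range [low, high).
interface FederationMember {
  name: string;
  low: number; // inclusive lower bound
  high: number; // exclusive upper bound
}

class RangeFederation {
  private readonly members: FederationMember[];

  constructor(low: number, high: number) {
    this.members = [{ name: "member-1", low, high }];
  }

  // Finds the member whose range contains the federation key.
  memberFor(key: number): FederationMember {
    const member = this.members.find((m) => key >= m.low && key < m.high);
    if (member === undefined) {
      throw new Error(`Key ${key} is outside the federation range`);
    }
    return member;
  }

  // Splits the member containing `boundary` into [low, boundary) and [boundary, high).
  splitAt(boundary: number): void {
    const member = this.members.find((m) => boundary > m.low && boundary < m.high);
    if (member === undefined) {
      throw new Error(`Cannot split at ${boundary}`);
    }
    const upper: FederationMember = {
      name: `member-${this.members.length + 1}`,
      low: boundary,
      high: member.high,
    };
    member.high = boundary; // the existing member keeps the lower half
    this.members.push(upper);
  }
}

// Usage: one member for blog_id 0..1999, then a split at 1000.
const blogFederation = new RangeFederation(0, 2000);
blogFederation.splitAt(1000);
const memberForBlog42 = blogFederation.memberFor(42).name; // lower half
const memberForBlog1500 = blogFederation.memberFor(1500).name; // upper half
```

A split only ever moves whole keys, so a single very active blog cannot be divided further; that is why the choice of federation key matters more than the initial ranges.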

Windows Azure and multiple storage accounts

I have an ASP.NET MVC 2 Azure application that I am trying to switch from being single tenant to multi-tenant. I have been reviewing many blogs and posts and questions here on Stack Overflow, but am still trying to wrap my head around the specifics of what's right for this particular app.
Currently the application stores some information in a SQL Azure database, as well as some other info in an Azure Storage Account. I'm considering writing the tenant provisioning code to simply create a new database for a new tenant, along with a new azure storage account. This brings me to the following question:
How will I go about testing this approach locally? As far as I can tell, the local Azure Storage Emulator only has 1 storage account. I'm not sure if I'm able to create others locally. How will I be able to test this locally? Or will it be possible?
There are many aspects to consider with multitenancy, one of which is data architecture. You also have billing, performance, security and so forth.
Regarding data architecture, let's first explore SQL storage. You have the following options available to you: add a CustomerID (or other identifier) that your code will use to filter records, use different schema containers for different customers (each customer has its own copy of all the database objects, owned by a dedicated schema in a database), linear sharding (in which each customer has its own database) and Federation (a feature of SQL Azure that offers progressive sharding based on performance and scalability needs). All these options are valid, but they have different implications for performance, scalability, security, maintenance (such as backups), cost and of course database design. I couldn't tell you which one to choose based on the information you provided; some models are easier to implement than others if you already have a code base. Generally speaking, a linear shard is the simplest model and provides strong customer isolation, but it is perhaps the most expensive of all. A schema-based separation is not too hard, but it requires a good handle on security requirements and can introduce cross-customer performance issues, because this approach is not shared-nothing (for customers on the same database). Finally, Federation requires the use of a customer identifier and has a few limitations; however, this technology gives you more control over performance distribution and long-term scalability (because, like a linear shard, Federation uses a shared-nothing architecture).
Regarding storage accounts, using a different storage account per customer is definitely the way to go. The primary issue you will face if you don't use separate storage accounts is performance limits, such as the maximum number of transactions per second that can be executed against a single storage account. As you point out, testing locally may be a problem; however, consider this: the local emulator does not offer 100% parity with an Azure Storage Account (some functions are not supported in the emulator). So I would only use the local emulator for initial development and troubleshooting. Any serious testing, including multitenant testing, should be done using real storage accounts. This is the only way you can fully test an application.
You should consider not creating separate databases, but instead creating different object namespaces within a single SQL database. Each tenant can have their own set of tables.
Depending on how you are using storage, you can create separate storage containers or message queues per client.
Given these constraints you should be able to test locally with the storage emulator and local SQL instance.
Please let me know if you need further explanation.
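A sketch of the per-client container idea, assuming one blob container per tenant inside a single storage account. `containerNameForTenant` and the `tenant-` prefix are hypothetical; the pattern reflects the Azure container naming rules (3 to 63 characters, lowercase letters, digits and single hyphens, starting and ending with a letter or digit).

```typescript
// Lowercase letters and digits, with single hyphens only between them.
const CONTAINER_NAME_PATTERN = /^[a-z0-9](?:-?[a-z0-9])*$/;

// Derives a container name from a tenant identifier.
function containerNameForTenant(tenantId: string): string {
  const slug = tenantId
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything else into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  const name = `tenant-${slug}`;
  if (slug === "" || name.length > 63 || !CONTAINER_NAME_PATTERN.test(name)) {
    throw new Error(`Cannot derive a container name from tenant id: ${tenantId}`);
  }
  return name;
}

// Usage: each tenant's blobs go into its own container.
const contosoContainer = containerNameForTenant("Contoso Ltd.");
```

Note that two different tenant ids can map to the same slug (for example "Contoso Ltd." and "contoso-ltd"), so a real system would want to check for collisions when a tenant is provisioned.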

Resources