Designing and implementing a SaaS Application with Multi-tenancy GraphDB (Neo4j / ArangoDB)

I am developing a SaaS application with the following technologies:
NestJS (Node)
DB (Neo4j, ArangoDB)
Nginx as proxy (microservices approach)
The SaaS Application will be hosting many distinct companies, as clients.
The data from 2 different companies must be fully isolated in the GraphDB.
2 different companies may have different data structures and models.
ENQUIRIES
Here are my enquiries:
How do I set up multi-tenancy on a GraphDB (Neo4j / ArangoDB)?
Is a totally separate GraphDB instance required for each company?
Is it possible to host 2 companies on the same GraphDB, yet maintain isolation?
Can anyone please suggest an optimal solution for this type of architecture?
Thanks for your time
Best regards

Since Neo4j 4.0, multi-tenancy is supported via the multi-database feature.
Using the system database, you can create as many databases as you want, and a client can select which database to talk to on a session-by-session basis, so you can use one database per tenant.
Here is the JS API:
https://neo4j.com/docs/api/javascript-driver/current/class/src/driver.js~Driver.html#instance-method-session
Each Neo4j instance can handle hundreds or thousands of databases.
With Neo4j Fabric enabled, you can run cross-database federated queries.
Here are some more examples:
https://adamcowley.co.uk/neo4j/multi-tenancy-neo4j-4.0/
https://graphaware.com/neo4j/2020/02/06/multi-tenancy-neo4j.html
https://neo4j.com/developer/multi-tenancy-worked-example/
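
As a rough illustration of selecting the database per session, here is a minimal TypeScript sketch. It does not import the real driver: `SessionFactory` is a structural stand-in for the `driver.session({ database })` call of `neo4j-driver`, and the `tenant-<id>` naming convention and the helper names are assumptions made up for this example.

```typescript
// Structural stand-in for the part of the Neo4j driver used here.
// With the real `neo4j-driver` package, `driver.session({ database })` fits this shape.
interface SessionConfig {
  database: string;
}

interface SessionFactory<S> {
  session(config: SessionConfig): S;
}

// Hypothetical naming convention: one database per tenant, e.g. "tenant-acme".
function tenantDatabaseName(tenantId: string): string {
  const slug = tenantId.trim().toLowerCase();
  // Conservative check (assumption): a leading letter, then letters, digits or dashes.
  if (!/^[a-z][a-z0-9-]{0,40}$/.test(slug)) {
    throw new Error(`Invalid tenant id: ${tenantId}`);
  }
  return `tenant-${slug}`;
}

// Open a session that is bound to the tenant's own database.
function openTenantSession<S>(driver: SessionFactory<S>, tenantId: string): S {
  return driver.session({ database: tenantDatabaseName(tenantId) });
}
```

The tenant id should come from something trustworthy, such as the authenticated request context, and never from raw user input. The tenant databases themselves would be created beforehand with a `CREATE DATABASE` command run against the system database.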

With ArangoDB you only need one instance and can simply use a database per tenant.
Each database is isolated, for example, AQL queries run in the context of a single database and you can only access the collections and named graphs of that database.
You can create an ArangoDB user for each customer and restrict its access to the respective database to achieve the desired isolation.
For scalability and resilience, there is also the OneShard feature (Enterprise Edition / managed service). It enables you to have a cluster where each database is treated like a single shard, i.e. all collections of a customer are stored on one DB-Server (excluding replicas), so that queries can be executed locally on that node. This is especially beneficial for graph traversals.
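
To make the database-per-tenant idea a bit more concrete, here is a small TypeScript sketch that only builds the JSON body for ArangoDB's `POST /_api/database` endpoint, which accepts a database name and an optional list of users to create along with it. The `tenant_<id>` naming convention and the function name are assumptions for illustration. As far as I know, users listed in that request are granted access to the new database, but check the permissions afterwards instead of relying on this.

```typescript
// Shape of the JSON body accepted by ArangoDB's `POST /_api/database` endpoint.
interface CreateDatabaseRequest {
  name: string;
  users: { username: string; passwd: string; active: boolean }[];
}

// Hypothetical convention: database "tenant_<id>" with a user of the same name.
function buildTenantDatabaseRequest(tenantId: string, password: string): CreateDatabaseRequest {
  const slug = tenantId.trim().toLowerCase();
  // Conservative check (assumption): a leading letter, then letters, digits or underscores.
  if (!/^[a-z][a-z0-9_]{0,40}$/.test(slug)) {
    throw new Error(`Invalid tenant id: ${tenantId}`);
  }
  const name = `tenant_${slug}`;
  return {
    name,
    users: [{ username: name, passwd: password, active: true }],
  };
}
```

The application would then connect to each tenant's database with that tenant's own credentials, so a query can never reach another tenant's collections.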

Related

Azure Storage Account for Tables

First of all, I'd like to say I'm neither a DBA nor a coder. I'm a regular IT person who works in network and infrastructure support. However, I like to get familiar with technologies in general and understand the basics: how they work and how they are implemented, without too much specific detail.
I've been reading about Azure Storage Accounts in regards to tables. As IT, I had to implement simple file shares via SMB 3.0 in order to have them mapped on our network, and I've come across other options such as blobs, tables and queues. I've read about them; however, I'm trying to understand what tables are mainly used for from a coder's perspective.
Correct me if I am wrong: when you code an app with a database, you can put the database on the same server or a different one, on premises or in the cloud, and you link the two together.
As far as I know, and from what I was able to find out on the web, these tables are NoSQL and have no constraints. You create the tables and data through Visual Studio using an API, and that information is then reflected in your storage.
How is this useful for the app you're developing?

I've been reading about Azure Storage Accounts in regards to tables. As IT, I had to implement simple file shares via SMB 3.0 in order to have them mapped on our network, and I've come across other options such as blobs, tables and queues. I've read about them; however, I'm trying to understand what tables are mainly used for from a coder's perspective.
As far as I know, and from what I was able to find out on the web, these tables are NoSQL and have no constraints. You create the tables and data through Visual Studio using an API, and that information is then reflected in your storage.
An Azure Storage Account is a "box" that keeps your Blobs, Tables, Queues and Files organised from a management and access-control point of view. Each storage type is good for its specific tasks.
If the world had just one super storage that solved every possible case for storing, querying and managing data, there would not be such a variety of databases, storage types, etc. available.
If you need to share files as a "network folder", try Azure Files.
If your coders need database storage, then the first question would be: what are their requirements for the database? What would its purpose be? Azure in particular has a lot of different database solutions, and again, each of them is good for some specific tasks and may not be a good choice for others.
As to Azure Tables, from the official docs:
Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schemaless design.
So, if your coders do need to store such data, then yes, that would be one of the possible choices.
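
To give a feel for what a "key/attribute store with a schemaless design" means for a coder, here is a tiny in-memory TypeScript sketch. It is not the Azure SDK. It only mimics the idea that every entity is identified by a partition key plus a row key and may otherwise carry any set of properties. All names in it are made up for illustration.

```typescript
type PropertyValue = string | number | boolean;

// Every entity has the two key fields; all other properties are free-form.
interface TableEntity {
  partitionKey: string;
  rowKey: string;
  [property: string]: PropertyValue;
}

class InMemoryTable {
  private readonly entities = new Map<string, TableEntity>();

  // Partition key and row key together identify exactly one entity.
  private static key(partitionKey: string, rowKey: string): string {
    return JSON.stringify([partitionKey, rowKey]);
  }

  // Insert the entity, or replace the one stored under the same key pair.
  upsert(entity: TableEntity): void {
    this.entities.set(InMemoryTable.key(entity.partitionKey, entity.rowKey), entity);
  }

  get(partitionKey: string, rowKey: string): TableEntity | undefined {
    return this.entities.get(InMemoryTable.key(partitionKey, rowKey));
  }

  // Return all entities that share one partition key.
  listPartition(partitionKey: string): TableEntity[] {
    return Array.from(this.entities.values()).filter((e) => e.partitionKey === partitionKey);
  }
}
```

Two entities in the same table can have completely different properties, which is the "no constraints" part: there are no fixed columns, no foreign keys and no joins, just lookups by key.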
Correct me if I am wrong: when you code an app with a database, you can put the database on the same server or a different one, on premises or in the cloud, and you link the two together.
Correct. You can either run your own server with a database that you manage yourself, or choose a cloud service that provides the database for you and takes care of the underlying server and other maintenance, so you don't need to worry or spend your time on that.
How is this useful for the app you're developing?
It is important to understand what your requirements for data storage are in order to pick a proper one. This question should perhaps be addressed not to you but to your coders, who are building the app and can consolidate their requirements for the database store. Usually, they will tell you exactly what they need, and you may give them some ideas or advice on alternatives, if any (for example, a similar solution with extra functionality, a different way of storing or processing the data, more built-in integrations that may be important for you, or the decision whether to keep your own installation or use a managed cloud service).
For your possible follow-up question, "When should I use a NoSQL database instead of a relational database? Is it okay to use both on the same site?", see this thread.
Update based on further questions:
If I develop an application with a database whose tables are on Azure, can I call let's say functions or data from it to my main application that is hosted on premise? What's the benefit of doing that versus hosting the tables on premise other than it's largely scalable and highly available?
Perhaps you need to better understand the relationship between the App (Application) and the DB (Database). The database is a standalone system which stores the data and replies to incoming queries (receives a request, processes it, returns the result). In general, it does not matter to the DB who is requesting the data. It is a "passive" system. (There are some cases where the DB can trigger further processes in data processing pipelines, but that is beyond this scope.)
The App, by contrast, is the active system in the App<->DB relationship. (We also leave aside more advanced designs where the App is not just one system.) The App receives requests, processes them (it may make external requests to other "services" if necessary), and gives a response (with or without data) to the requester. In the App<->DB relationship, this kind of external request is what happens: at some point the App needs some data from the DB, so it makes a request to the DB, obtains the response and continues its own logic.
Where the App server and DB server are placed is not that important (for simplicity). The important part is whether the DB server is accessible for requests. The DB can be on-prem with a public static IP address; it can be in the cloud on your own server which has a public static IP address (sometimes that is achieved in different ways, but we skip that for simplicity); or it can be a Database-as-a-Service cloud solution, where you do not need to have a server and configure the database, but have a URL endpoint which you use to query the DB.

I appreciate the answer, and I pretty much agree with what you're saying.
But my question goes beyond what the requirements are for the developers.
I'll modify the question. If I develop an application with a database whose tables are on Azure, can I call let's say functions or data from it to my main application that is hosted on premise? What's the benefit of doing that versus hosting the tables on premise other than it's largely scalable and highly available?

Azure Storage Tables are the "Notepad" of NoSQL databases. If you want quick and easy key/value pairs, Tables is the way to go. If you are looking for the "Word" of NoSQL in Azure, then Cosmos DB is where it's at. Cosmos DB offers global distribution, better features and a better SLA (see comparison). On the other hand, Tables are cheaper.
Azure also supports MySQL, PostgreSQL, MariaDB and MSSQL as PaaS offerings if you wish to use a traditional database.

How to design a multi-tenant node.js application?

I am currently facing a technology decision and am not able to find the solution myself.
I am currently developing a multi-tenant database.
The structure would be the following:
There is one core database which saves data and relations about specific tenants
There are multiple tenant database instances (a query against the core database determines which tenant database I should connect to)
Each tenant is on a separate database instance (on a separate server)
Each tenant has specific data which should not be accessible by any other tenant
Each database would preferably be MySQL (but if there are better options, I am open to suggestions)
The backend is written in the Koa framework
The database models are different in the core database and tenant databases
Each tenant database's largest table could be around 1 million records (without auditing)
Optimistically, the number of tenants could grow up to 50
Additional data about the project:
All of project's data is available for the owner
Each client will have data available for their own tenant
Each tenant will have their own website
Database structure remains the same for each tenant
The project is mainly a logistics service whose data is segregated by region
The question:
Is this the correct approach to designing a multi-tenant architecture, or should the architecture be redesigned?
If multi-tenancy with multiple servers is possible, is there a preferred tool/technology stack that should be used? (I would love to know more specifically about this.)
It would be preferred to use an ORM. I am currently trying to use Sequelize, but I am already facing problems at an early stage (multiple databases can't share the same models, management of multiple connections).
The ideal goal would be the possibility of adding additional tenants without much additional configuration.
EDIT:
- The databases would be currently hosted in Azure, but we'd prefer the option that they can be migrated away if it becomes a requirement

There are several ways to architect the data structure in a multi-tenant architecture.
It's hard to say which is the better choice, but I will try to help with what I know.
First Option:
Segregate your databases across distributed servers, so that each tenant has its own, totally isolated database server.
This can be good because it gives strong security for tenant data: we can ensure that one tenant never sees another tenant's data.
I see some problems in this case. Costs can increase a lot, because we need a machine for each client and perhaps a software license, depending on your environment. On the DevOps side, we will need a complex strategy to create and deploy a new instance for every new tenant.
Second Option:
Separate databases: we have one server where we create a separate database for each tenant.
This is often used if you need to provide isolation for each customer, because we can associate different logins, permissions and so on with each database.
Some cons: a different connection pool is required per database, updates must be replicated across all the databases, there is no resource sharing (unless using Elastic Database Pools), and you need multiple backup strategies across all the databases as well as a complex DevOps strategy to create and deploy new tenants.
Third Option:
Separate schemas: this is a good strategy for implementing a multi-tenant architecture. We can share some resources since everything is inside the same database, but each tenant has its own schema. That even allows you to customize a specific tenant without affecting others, and you save costs by only paying for one database.
Some of the cons: you need to replicate all the database objects in every schema, so the number of objects can increase indefinitely; updates must be replicated across all the schemas; the connection pool for the database must maintain a different connection per tenant (or set of credentials); and a different user is required per tenant (which is stored at server level), which you have to back up independently.
Fourth Option:
Row isolation.
Everything is shared in this option: server, database and schema. All data for all tenants is in the same tables in the same database. The only way rows are differentiated is by a TenantId or some other column that exists at the table level.
Another good point is that you will not need a complex DevOps strategy, and if you are using SQL Server, there is a feature called Row-Level Security that returns only the data the logged-in user has permission to see.
But in this case, if you have thousands of users hitting the database at the same time, you will need some approach to scale well.
So you need to think about your case and how your system will grow in order to choose the best option.
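
To illustrate the row isolation option, here is a minimal in-memory TypeScript sketch. The `Shipment` type and the class name are hypothetical, and a plain array stands in for the shared table. The point is only that every read and write goes through an object that is bound to one tenant id, so a query cannot forget the TenantId filter.

```typescript
interface Shipment {
  tenantId: string;
  id: number;
  destination: string;
}

// All access to the shared table goes through a view bound to one tenant.
class TenantScopedShipments {
  private readonly table: Shipment[];
  private readonly tenantId: string;

  constructor(table: Shipment[], tenantId: string) {
    this.table = table;
    this.tenantId = tenantId;
  }

  // Only rows carrying this tenant's id are ever returned.
  findAll(): Shipment[] {
    return this.table.filter((row) => row.tenantId === this.tenantId);
  }

  findById(id: number): Shipment | undefined {
    return this.table.find((row) => row.tenantId === this.tenantId && row.id === id);
  }

  // The tenant id is stamped on by the view, never supplied by the caller.
  insert(row: { id: number; destination: string }): Shipment {
    const stored: Shipment = { ...row, tenantId: this.tenantId };
    this.table.push(stored);
    return stored;
  }
}
```

With a real database, the same idea becomes a mandatory `WHERE TenantId = ...` condition added in one central place (or enforced by the database itself, as with Row-Level Security), not repeated by hand in every query.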

It seems quite fine to me.
Where I see a bottleneck is having every tenant on a separate DB server or DB instance. It means you need to hold a separate connection pool for every tenant, or create a new connection for every request depending on the tenant. Try using a concept where you can have one DB connection for all the tenants (namespaces, schemas, or just prefixing tenant table names with a tenant-specific prefix).
But if you need to keep the tenants' DBs separate, e.g. because of different backup policies, resource limits, etc., you can't do this and will have to manage a separate connection pool for every tenant. It also depends on how many tenants you will have. Tens? Thousands?
I would also suggest caching the tenant->DB mapping somewhere in the app instead of querying it from the core database every time.
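
A minimal TypeScript sketch of that caching idea, under a few assumptions: `createPool` is a hypothetical factory that would look up the tenant's connection details and build a pool (with Sequelize, for instance, it could construct one instance per tenant), and here it is kept synchronous for brevity.

```typescript
// Caches one pool (or ORM instance) per tenant so that the core database
// is consulted only the first time a tenant is seen.
class TenantPoolCache<P> {
  private readonly pools = new Map<string, P>();
  private readonly createPool: (tenantId: string) => P;

  constructor(createPool: (tenantId: string) => P) {
    this.createPool = createPool;
  }

  // Return the cached pool, creating it lazily on first use.
  get(tenantId: string): P {
    let pool = this.pools.get(tenantId);
    if (pool === undefined) {
      pool = this.createPool(tenantId);
      this.pools.set(tenantId, pool);
    }
    return pool;
  }

  // Number of tenants that currently have a pool.
  get size(): number {
    return this.pools.size;
  }
}
```

In a Koa app, a middleware could resolve the tenant from the request (host name, token, etc.), call `cache.get(tenantId)` and attach the result to the request context. With around 50 tenants, keeping all pools open is probably fine. With thousands, you would likely want to close and evict idle ones.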

Azure, SQL Server and Database

One of my customers is developing a multi-tenant solution, and I'm working as a developer on the automation of resource provisioning. The solution is designed so that each tenant's resources are separate from every other tenant's.
So, for example, a single tenant will require a SQL database (PaaS), a Storage Account, and many other resources.
One of the requirements the customer set is that he wants X databases to be hosted on one SQL server (a logical server, not a VM), which, having used SQL as PaaS, I don't think is valid.
So my question is: should we create a SQL server and a SQL database for each tenant?
Or
Should we create a SQL server and host X databases on it, and when the server reaches the limit (X databases), create another server and repeat the same logic?
In either scenario, what difference does it make from a database performance, pricing and database security point of view?
FYI, my thinking is that whether I host X databases on a single SQL logical server or create X SQL logical servers to host X databases, it won't make any difference from a pricing and database performance point of view.

A few differences I can think of if you go with a single server for all clients:
1. The administrator password is per server, and with it one client could access other clients' databases as well.
2. Azure has a limit on how many DTUs can be allocated under one server, so having many databases under one server may lead to issues such as:
a) frequent DTU increase requests
b) automated backups sometimes failing if there are no DTUs available (a backup needs to copy the whole database, so the DTUs needed in this process equal those of the database being backed up)

Your question is too broad, as there are many opinions and approaches here.
In any case, you should take a look at elastic database pools: https://azure.microsoft.com/en-us/documentation/articles/sql-database-elastic-pool/ which is a feature designed exactly for multi-tenant SaaS solutions.
Your end solution may be a combination of both: you may want to give each of the "bigger" tenants its own server, while hosting multiple small tenants together on a single server.
Security should not weigh heavily in the decision because, when you use contained database credentials for application access, it does not really matter whether the databases sit on a single logical server or not.
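
For the provisioning automation, the "fill a server up to X databases, then create the next one" logic from the question can be sketched roughly like this in TypeScript. Everything here is in memory and hypothetical: the `sql-server-N` naming and the `LogicalServer` type are made up, and a real implementation would call the Azure management API where this sketch only pushes to arrays.

```typescript
interface LogicalServer {
  name: string;
  databases: string[];
}

// Place a new tenant database on the first server that still has room,
// or register a new logical server when all existing ones are full.
function placeDatabase(
  servers: LogicalServer[],
  databaseName: string,
  maxPerServer: number
): LogicalServer {
  let target = servers.find((server) => server.databases.length < maxPerServer);
  if (target === undefined) {
    // Hypothetical naming scheme for newly created logical servers.
    target = { name: `sql-server-${servers.length + 1}`, databases: [] };
    servers.push(target);
  }
  target.databases.push(databaseName);
  return target;
}
```

A "bigger" tenant that should get its own server could simply be placed with `maxPerServer` set to 1 on a separate server list.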

What is the recommended way to create a database per client with Azure SQL Database?

I will try to explain my question with an example:
Let's say that I have a client, a cellular company. I need to create a database which includes data such as customer lists, accounts, payment options and so on.
Now I have another client, another cellular company, and of course the skeleton of the database will be identical to the first client's.
How can I create a database that, on the one hand, includes information about all the companies (clients), while on the other hand each company (client) has its own database?
I hope that the question is understandable, I'd love some help.

Did you have a look at Azure SQL Database Elastic Scale?
You can find more information here: https://azure.microsoft.com/en-us/documentation/articles/sql-database-elastic-scale-introduction/
The elastic tools library is designed for this kind of workload.
Hope this helps
Julien

In addition, please also see Elastic Database Pools. The Elastic Scale features provide the tools to create a single application in which each customer receives their own database, and Elastic Database Pools provide the mechanism to share resources between such databases to reduce costs.

Multiple Apps with CouchDB

What is the recommended security model for running multiple apps with CouchDB? The apps are separate from each other, apps and DBs are in a 1:1 relationship, and it makes sense for them not to be able to access each other's data.
Should the databases run in their own CouchDB instances, or is there a way to combine them? I've seen a little about authentication and authorization, but not enough to tell whether it's viable to support different users on the same instance or, on the other hand, whether there's much overhead to running separate instances.

You can create a _security document for each database, restricting access by user name or role.
http://wiki.apache.org/couchdb/Security_Features_Overview#Authorization
The primary consideration when running multiple applications on one CouchDB server is that all user accounts will be shared. There is one central _users database for everybody.
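
As a sketch of what such a _security document can look like, here is a small TypeScript example. The `admins` / `members` structure with `names` and `roles` lists is CouchDB's document format. The per-app role names (`<app>-admin`, `<app>-user`) are an assumed convention, and `isMember` is only a simplified model of the membership check, not CouchDB's actual code.

```typescript
// Structure of a CouchDB per-database _security document.
interface SecurityDocument {
  admins: { names: string[]; roles: string[] };
  members: { names: string[]; roles: string[] };
}

interface UserContext {
  name: string;
  roles: string[];
}

// Hypothetical convention: each app gets its own pair of roles.
function securityDocumentFor(appName: string): SecurityDocument {
  return {
    admins: { names: [], roles: [`${appName}-admin`] },
    members: { names: [], roles: [`${appName}-user`] },
  };
}

// Simplified model: a database with no members listed is open to everyone;
// otherwise the user must match a name or role in admins or members.
function isMember(doc: SecurityDocument, user: UserContext): boolean {
  if (doc.members.names.length === 0 && doc.members.roles.length === 0) {
    return true;
  }
  if (user.roles.indexOf("_admin") !== -1) {
    return true; // server admins always have access
  }
  return [doc.admins, doc.members].some(
    (list) =>
      list.names.indexOf(user.name) !== -1 ||
      list.roles.some((role) => user.roles.indexOf(role) !== -1)
  );
}
```

The document would be stored with a `PUT` to `/<database>/_security`. Since all accounts live in the one shared _users database, giving each app its own roles is what keeps a user of one app out of another app's database.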
