I am designing a basic ERP (Node.js/Express/PostgreSQL on the back end, Vue 3/Quasar on the front end) that will manage several businesses belonging to different clients. Some of these clients have several businesses, each with some branches. Should I implement a server/database instance per customer, or should I plan to load balance and scale a single database in the future?
That is a question of database tenancy approach. Here is a nice article on that.
Personally, I would recommend schema-based multi-tenancy to start with (one schema per client): since it is a basic ERP, a single DB is easier to manage and maintain, and you can still make client-specific changes to the table design if needed.
You can use SET search_path on the pg connection for each client to direct queries to a specific schema.
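A minimal sketch of the search_path idea, in TypeScript. The `client_<id>` naming scheme and the helper names are assumptions made for illustration, not something from the question; the point is only that the schema name should be validated before it is spliced into the statement.

```typescript
// Hypothetical naming scheme: one schema per client, named client_<id>.
const SAFE_SCHEMA = /^[a-z_][a-z0-9_]*$/;

function schemaForClient(clientId: number): string {
  if (!Number.isInteger(clientId) || clientId <= 0) {
    throw new Error(`invalid client id: ${clientId}`);
  }
  return `client_${clientId}`;
}

// SET cannot take the schema as a bind parameter, so the identifier is
// validated against a strict pattern and then double-quoted.
function searchPathSql(schema: string): string {
  if (!SAFE_SCHEMA.test(schema)) {
    throw new Error(`unsafe schema name: ${schema}`);
  }
  return `SET search_path TO "${schema}", public`;
}
```

With node-postgres this string would be run on the connection before the client's queries, e.g. `await client.query(searchPathSql(schemaForClient(42)))`. If connections are pooled, the setting stays on the connection after it is released, so it has to be set on every checkout (or with SET LOCAL inside a transaction).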
PostgreSQL was not designed for VLDBs (very large databases), so you must estimate the final volume over 3 to 5 years.
If this volume will be over 300 GB, it is preferable to split your customers into one database each.
If it will be under that, you can use SQL schemas.
Beware of the number of files: PG creates several files for each table. If there are too many files, this will consume a lot of resources. In that case, it will be necessary to split your system over several PG clusters.
I have a system that I modeled with one database per customer.
Today I control the database version manually, but I want to use Prisma for that. How do I run my migrations against multiple databases at the same time?
You would need to create multiple PrismaClients, one for each of your customers.
Here's a Feature Request on supporting Multiple Databases. There are various ways suggested in the Feature Request which would allow you to use multiple databases with Prisma.
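As a rough sketch of "one client per customer", the cache below creates a client the first time a customer is seen and reuses it afterwards. The class name, `urlFor` and `create` are hypothetical; the factory is injected so the sketch runs without Prisma installed.

```typescript
// Generic per-customer client cache. C is whatever client type you use.
class CustomerClients<C> {
  private readonly clients = new Map<string, C>();

  constructor(
    private readonly urlFor: (customerId: string) => string,
    private readonly create: (url: string) => C,
  ) {}

  // Returns the cached client for this customer, creating it on first use.
  get(customerId: string): C {
    let client = this.clients.get(customerId);
    if (client === undefined) {
      client = this.create(this.urlFor(customerId));
      this.clients.set(customerId, client);
    }
    return client;
  }

  get size(): number {
    return this.clients.size;
  }
}
```

With Prisma, `create` could be something like `(url) => new PrismaClient({ datasources: { db: { url } } })`, assuming the datasource in your schema is named `db`. For the migrations themselves, one option is to loop over the same URLs and run `prisma migrate deploy` once per database with the connection URL set in the environment, assuming the schema reads its URL from an environment variable.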
I made a CRM app using NestJS with Node.js. I designed it so that each team has its own database, because every team's data is different and has no relation to other teams' data, and it also made backups much easier.
However, now that I want to deploy my service, I noticed that I must create a separate Node.js instance for each team, which makes RAM usage very high. For just 10 teams I may need around 500 MB of RAM, which will hurt me economically even in the short run.
Solutions
I used TypeORM in NestJS, so my first thought was to find a way to have multiple databases (not multiple connections) that share the same schema, and to dynamically use one of them based on the request's scope and details. This seems like the best solution, since I could avoid creating another Node.js instance and still have a separate database for each team.
I read the NestJS and TypeORM documentation but didn't find any way to accomplish that. So my other solution was to use one database for everyone and add something like a team_id column to each table to filter the data for each team.
Is that a good way?
Are there any other solutions for using one NestJS instance with the same schema across multiple databases?
I recommend using one database.
The database can have a table storing all of the teams, and the other tables will have a new team_id column, as you suggest.
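To make the team_id idea concrete, here is a small in-memory sketch; the `TeamScoped` class and the row shapes are hypothetical stand-ins for real tables. The idea is that every read is filtered by the team and every write is stamped with it, so application code never handles team_id by hand.

```typescript
interface TeamOwned {
  teamId: number;
}

// Wraps a shared "table" and only exposes rows of a single team.
class TeamScoped<T extends TeamOwned> {
  constructor(
    private readonly rows: T[],
    private readonly teamId: number,
  ) {}

  // Reads always include the team filter, on top of any extra predicate.
  find(predicate: (row: T) => boolean = () => true): T[] {
    return this.rows.filter((r) => r.teamId === this.teamId && predicate(r));
  }

  // Writes are stamped with the team, so callers cannot set it themselves.
  insert(row: Omit<T, "teamId">): T {
    const full = { ...row, teamId: this.teamId } as unknown as T;
    this.rows.push(full);
    return full;
  }
}
```

With TypeORM the same effect would come from always adding the team to the `where` clause of the repository calls, typically taken from the authenticated request.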
One database for each team has disadvantages.
Multiple DB Connections
Since you need to use the same entities for all of the team databases, you cannot use a single database connection. For every incoming API request, the server will have to switch DB connections.
DB Configuration in TypeORM
For multiple databases, the configuration will look like this:
imports: [
  ...,
  TypeOrmModule.forRoot({
    name,
    type,
    host,
    port,
    username,
    password,
    ...
  }),
  TypeOrmModule.forRoot({
    name,
    type,
    host,
    port,
    username,
    password,
  }),
  ...
]
If you need to add a new team, you have to update your code base to add a new DB for the team and then redeploy your application. (Maybe you will also have to create the new database and run migrations?)
Backup
I agree with you that it's easier to back up a single team with multiple databases. But what about when you want to back up all teams? In most cases, I believe you will need to back up all teams, not just a specific one.
Teams Management
Where do you save a team's information? How do you know which team has which DB?
Maybe you saved the teams somewhere (in a separate DB?). To know which database connection should be used for each request, does the server need to make an extra query?
Cost
If there are 100 teams, are you going to make 100 databases? Also, each application has development and production environments, and in some cases more, like staging. Two environments will double the number of DBs.
Conclusion
Of course, some of the items in the list above can be automated, and it is still possible to use multiple databases in NestJS + TypeORM for your project, but it does not look like a good approach or worth the effort.
I have seen some big multi-tenant applications (like Grafana), and they weren't using a multiple-database strategy.
I don't know how you are storing users, but since you are speaking about teams, I suppose you have a place where users are stored and assigned to a team. Could it be a table in a common login database?
A solution could be to bind each team to its own database: once a user logs in (using data from the common login database), you read the team the user belongs to and the database for its data, and then you can access the CRM data from the database bound to that team.
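The lookup described above could be sketched like this. The `UserRow` and `TeamRow` shapes and the database names are assumptions; in a real application the two arrays would be tables in the common login database.

```typescript
interface UserRow {
  login: string;
  teamId: number;
}

interface TeamRow {
  id: number;
  database: string; // name of the database bound to this team
}

// Resolves which team database holds the data for a given login.
function databaseForUser(
  login: string,
  users: readonly UserRow[],
  teams: readonly TeamRow[],
): string {
  const user = users.find((u) => u.login === login);
  if (user === undefined) {
    throw new Error(`unknown user: ${login}`);
  }
  const team = teams.find((t) => t.id === user.teamId);
  if (team === undefined) {
    throw new Error(`user ${login} belongs to unknown team ${user.teamId}`);
  }
  return team.database;
}
```

The result would usually be resolved once at login and kept in the session or token, so that it does not cost an extra query on every request.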
Hi, we are planning to use Cassandra for an ad server implementation. We have a requirement where a client can create advertisers, publishers and new ads (typical relational requirements), as well as an interface to monitor analytical data such as ad hits and conversions. We also need an interface where the client can apply filters based on master fields such as name and location, as well as on analytical data, such as ad revenue > x, along with quite a few other similar criteria.
Is it OK to use a single database like Cassandra to maintain both types of data? Since Cassandra has fairly limited querying capacity on fields unless you create views and indexes, we are skeptical. If we keep two separate database products, will that add complexity and redundancy? How do companies such as Facebook and LinkedIn handle both master and analytical data requirements? Any suggestions are appreciated. Thanks.
The typical solution in Cassandra is to have multiple datacenters: one for online transaction processing, and another for Spark analytical queries. The separate datacenters let you query them independently, so Spark doesn't impact production. Alternatively, you can denormalize and insert into multiple tables using BATCH.
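A sketch of the denormalize-with-BATCH option: one ad hit is written to two query-specific tables together. The table and column names (`hits_by_ad`, `hits_by_advertiser`) are made up for illustration.

```typescript
interface AdHit {
  adId: string;
  advertiserId: string;
  hitAt: string; // ISO timestamp
}

interface BatchQuery {
  query: string;
  params: unknown[];
}

// Builds one INSERT per denormalized table; the same hit is stored once
// per access pattern (by ad, by advertiser).
function adHitBatch(hit: AdHit): BatchQuery[] {
  return [
    {
      query: "INSERT INTO hits_by_ad (ad_id, hit_at, advertiser_id) VALUES (?, ?, ?)",
      params: [hit.adId, hit.hitAt, hit.advertiserId],
    },
    {
      query: "INSERT INTO hits_by_advertiser (advertiser_id, hit_at, ad_id) VALUES (?, ?, ?)",
      params: [hit.advertiserId, hit.hitAt, hit.adId],
    },
  ];
}
```

With the Node.js cassandra-driver, an array of this shape can be passed to `client.batch(queries, { prepare: true })`, which sends the statements as a single batch.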
I'm working on a management/planning application that will have 1,000+ users, each with 30+ data collections.
For instance, each user might have a collection of, say, client contacts with as few as 10 and as many as several hundred items/records.
Would ArangoDB be a suitable choice for this application?
Is there a better choice?
Many thanks,
LRP
I assume that all the user databases should be kept separate, so user 1 should not see any data of user 2 etc.
If so, there is the option to create a separate database for each user, or, as you mentioned, 30+ databases for each user. That would result in 30,000+ databases. I think this wouldn't be an ideal usage of ArangoDB, as each database incurs some overhead, and you may want to keep the total number of databases in an ArangoDB instance relatively small; at the very least, you wouldn't want to create 30,000+ databases in it.
The alternative option is to create not that many databases but that many collections, maybe all in the same database. While this provides good separation of user data, it would also be rather expensive in terms of resource usage (as each collection may need a separate storage file if it contains data). I think it could work if not all users/collections need to be active at the same time and the server has plenty of resources (or you split the data across multiple servers).
The solution that would use the least resources in ArangoDB would be to put the data of multiple users into just a few collections. For each record you could store a user-id, and have your application use the user-id in each query.
This would ensure the application only accesses the records of one specific user at a time. Additionally, as this would use just a few collections, there would be no need to create empty or mostly empty databases/collections for users with little data. From the resource usage point of view, this should be relatively efficient.
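A small sketch of "use the user-id in each query": the helper below builds an AQL query that always filters on the owner. The `userId` attribute name and the `contacts` collection are assumptions for the example.

```typescript
interface UserQuery {
  query: string;
  bindVars: Record<string, unknown>;
}

// Builds an AQL query over a shared collection, restricted to one user.
// @@collection is a collection bind parameter, @userId a value bind parameter.
function documentsForUser(collection: string, userId: string): UserQuery {
  return {
    query: "FOR doc IN @@collection FILTER doc.userId == @userId RETURN doc",
    bindVars: { "@collection": collection, userId },
  };
}
```

With arangojs, an object of this shape can be passed to `db.query(...)`, for example `db.query(documentsForUser("contacts", "user-1"))`. An index on the user-id attribute would keep the filter cheap as the shared collections grow.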
Given the following "facts" I have gleaned from reading around this.
Federations are separate databases from the moment they are created.
As copies of the original, they will not alter automatically if I alter the original's schema.
As separate databases you cannot cross join.
Each federation is priced as a separate db.
I will have to provide a TenantId field to each table I want to federate.
If these are correct, what are the advantages of using federation to achieve multi-tenancy over simply using separate DBs? Or, if they're not correct, please put me straight.
Note, we have a small number of tenants, maybe 20.
Your understanding is correct.
There are a few interesting aspects of Federations that you may find useful. First it is a relatively flexible partitioning environment. For example you can group 10 tenants into the first member, and 50 in the second, based on usage patterns of your customers. Or you could simply isolate a single customer that is using the system more than the others.
Another important concept is that you can have multiple federations per database. So you could have a Customer federation and a SalesHistory federation for example.
Last but not least you may want to read this article that discusses connection pool fragmentation that occurs in traditional sharding models, but is not an issue with SQL Database Federations.
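To illustrate the flexible grouping mentioned above (say, 10 tenants in the first member and 50 in the second), here is a sketch of range-based routing from a tenant id to a federation member. The `Member` shape, the member names and the boundaries are hypothetical; this only models the range idea and is not the Federations API.

```typescript
interface Member {
  name: string;
  low: number; // first tenant id served by this member (inclusive)
}

// Picks the member whose range contains the tenant id. A member's range
// runs from its own low boundary up to the next member's low boundary.
function memberFor(tenantId: number, members: readonly Member[]): Member {
  const sorted = [...members].sort((a, b) => a.low - b.low);
  let found: Member | undefined;
  for (const m of sorted) {
    if (m.low <= tenantId) {
      found = m;
    } else {
      break;
    }
  }
  if (found === undefined) {
    throw new Error(`no member covers tenant ${tenantId}`);
  }
  return found;
}
```

Isolating a single heavy customer then amounts to adding boundaries immediately before and after that tenant's id.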