CouchDB/PouchDB schema design

I have an app that manages prospects. These prospects are assigned to collaborators, who each have a tablet that they use to contact our clients. The mobile app has offline support and syncs with the desktop app. Some users manage these collaborators and need access to all of their collaborators' prospects.
Our current design is a Cordova app built with Angular and PouchDB that syncs to CouchDB. The desktop app is a Laravel app.
We create a new CouchDB database for every collaborator, which syncs with their account in the mobile app.
The admins and supervisors need to access, list, and edit all prospects across all CouchDB databases. So a Node.js process syncs all prospects from the CouchDB databases into a MySQL prospects table, so that we can easily list them.
My question is whether this design makes sense or whether we should take a different approach.
There is a similar question and design here

To increase security and reduce the design problems of syncing multiple databases, I store AES-encrypted data in the documents. For me it is the best solution: everyone can read the documents, but the data is securely protected.
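For the sync step described in the question (per-collaborator CouchDB databases flattened into one SQL table), here is a minimal sketch of the idea. It assumes the changes arrive shaped like a CouchDB `_changes` response fetched with `include_docs=true`; the table layout, the `name` field, and the function names are hypothetical, and `sqlite3` stands in for MySQL only so the example is self-contained.

```python
import sqlite3


def open_reporting_db() -> sqlite3.Connection:
    """Create the flat reporting table (sqlite3 is a stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE prospects (
            source_db TEXT NOT NULL,  -- CouchDB database the document came from
            doc_id    TEXT NOT NULL,  -- CouchDB _id
            rev       TEXT NOT NULL,  -- CouchDB _rev, needed to write edits back
            name      TEXT,           -- hypothetical prospect field
            PRIMARY KEY (source_db, doc_id)
        )
        """
    )
    return conn


def apply_changes(conn: sqlite3.Connection, source_db: str, changes: list) -> None:
    """Upsert or delete one row per change from a single collaborator database."""
    for change in changes:
        if change.get("deleted"):
            conn.execute(
                "DELETE FROM prospects WHERE source_db = ? AND doc_id = ?",
                (source_db, change["id"]),
            )
            continue
        doc = change["doc"]
        conn.execute(
            "INSERT OR REPLACE INTO prospects (source_db, doc_id, rev, name) "
            "VALUES (?, ?, ?, ?)",
            (source_db, doc["_id"], doc["_rev"], doc.get("name")),
        )
    conn.commit()
```

Keying the table on `(source_db, doc_id)` keeps documents with the same `_id` in different collaborator databases apart, and storing `_rev` lets the desktop side send an edit back to the right database and revision.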

Related

Designing and implementing SaaS Application with Multi-tenancy GraphDB (Neo4J / ArangoDB)

I am developing a SaaS Application with the following Technology:
NestJS (Node)
DB (NEO4J, ArangoDB)
Nginx for proxy (Micro-services Approach)
The SaaS Application will be hosting many distinct companies, as clients.
The data from 2 different companies must be fully isolated in the GraphDB.
2 different companies may have different data structures and models.
ENQUIRIES
Here are my enquiries:
How to setup Multi-tenancy on a GraphDB (Neo4J / ArangoDB)?
Is a totally separate GraphDB instance required for each company?
Is it possible to host 2 companies on the same GraphDB, yet maintain isolation?
Can anyone please suggest an optimal solution for this type of architecture?
Thanks for your time
Best regards

Since Neo4j 4.0, multi-tenancy is supported via multi-database.
Through the system database you can create as many databases as you want, and a client selects the database to talk to on a session-by-session basis, so you can use one database per tenant.
Here is the JS API:
https://neo4j.com/docs/api/javascript-driver/current/class/src/driver.js~Driver.html#instance-method-session
Each Neo4j instance can handle hundreds or thousands of databases.
With Neo4j Fabric enabled you can do cross-database federated queries.
Here are some more examples:
https://adamcowley.co.uk/neo4j/multi-tenancy-neo4j-4.0/
https://graphaware.com/neo4j/2020/02/06/multi-tenancy-neo4j.html
https://neo4j.com/developer/multi-tenancy-worked-example/
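As a rough sketch of the database-per-tenant idea above, the helper below maps a tenant to a database name and builds the Cypher statement that would be run against the system database. The `tenant-` prefix and the function names are hypothetical, and the naming rule in the regular expression (3 to 63 characters, starting with a letter, then lowercase letters, digits, dots, or dashes) is an assumption that should be checked against the Neo4j documentation for your version.

```python
import re

# Assumed naming rule, to be verified against the Neo4j docs for your version.
_DB_NAME = re.compile(r"[a-z][a-z0-9.-]{2,62}")


def tenant_database(tenant_slug: str) -> str:
    """Return the database name used for one tenant (hypothetical convention)."""
    name = f"tenant-{tenant_slug.strip().lower()}"
    if not _DB_NAME.fullmatch(name):
        raise ValueError(f"cannot derive a database name from {tenant_slug!r}")
    return name


def create_database_statement(tenant_slug: str) -> str:
    """Build the Cypher to run against the system database for a new tenant."""
    # Backticks quote the name so that the dash is accepted.
    return f"CREATE DATABASE `{tenant_database(tenant_slug)}` IF NOT EXISTS"
```

The resulting name is what a client would then select per session, for example through the `database` option of the driver's `session` call linked above.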

With ArangoDB you only need one instance and can simply use a database per tenant.
Each database is isolated, for example, AQL queries run in the context of a single database and you can only access the collections and named graphs of that database.
You can create an ArangoDB user for each customer and restrict its access to the respective database to achieve the desired isolation.
For scalability and resilience, there is also the OneShard feature (Enterprise Edition / managed service). It enables you to have a cluster where each database is treated like a single shard, i.e. all collections of a customer are stored on one DB-Server (excluding replicas), so that queries can be executed locally on that node. This is especially beneficial for graph traversals.

How to design a multi-tenant node.js application?

I am facing a technology decision and have not been able to find a solution myself.
I am currently developing a multi-tenant database.
The structure would be the following:
There is one core database which stores data and relations about specific tenants
There are multiple tenant database instances (a query against the core database determines which tenant database I should connect to)
Each tenant is on a separate database instance (on a separate server)
Each tenant has specific data which must not be accessible to any other tenant
Each database would preferably be MySQL (but if there are better options, I am open to suggestions)
The backend is written with the Koa framework
The database models are different in the core database and the tenant databases
Each tenant database's largest table could hold around 1 million records (without auditing)
Optimistically, the number of tenants could grow to 50
Additional data about the project:
All of project's data is available for the owner
Each client will have data available for their own tenant
Each tenant will have their own website
Database structure remains the same for each tenant
The project is mainly a logistics service whose data is segregated by region
The question:
Is this the correct approach to design a multi-tenant architecture or should there be a redesign in the architecture?
If multi-tenancy across multiple servers is possible, is there a preferable tool or technology stack for it? (I would love to know more specifically about this.)
It would be preferred to use an ORM. I am currently trying Sequelize, but I am already facing problems at an early stage (multiple databases can't share the same models, and multiple connections have to be managed).
The ideal goal would be the possibility of adding additional tenants without much additional configuration.
EDIT:
- The databases would be currently hosted in Azure, but we'd prefer the option that they can be migrated away if it becomes a requirement

There are several ways to structure data in a multi-tenant architecture.
It is hard to say which is the best choice, but I will try to help with what I know.
First option:
Separate database servers: each tenant has its own, totally isolated database server.
This is good for security, because we can ensure that one tenant never sees another tenant's data.
I see some problems with it, though. Cost can increase a lot, because we need a machine for each client and perhaps a software license, depending on your environment. On the DevOps side, we need a complex strategy to create and deploy a new instance for every new tenant.
Second option:
Separate databases: one server on which we create a separate database for each tenant.
This is often used if you need to provide isolation for each customer, because we can associate different logins, permissions, and so on with each database.
Some cons: a different connection pool is required per database, updates must be replicated across all the databases, there is no resource sharing (unless using Elastic Database Pools), and you need multiple backup strategies across all the databases, plus a complex DevOps strategy to create and deploy new tenants.
Third option:
Separate schemas: this is a good strategy for implementing a multi-tenant architecture. We share some resources, since everything is inside the same database, but each tenant has its own schema. That even allows you to customize a specific tenant without affecting the others, and you save costs by paying for only one database.
Some of the cons: you need to replicate all the database objects in every schema, so the number of objects can increase indefinitely; updates must be replicated across all the schemas; the connection pool for the database must maintain a different connection per tenant (or set of credentials); and a different user is required per tenant (stored at server level), which you have to back up independently.
Fourth option:
Row isolation: everything is shared in this option (server, database, and schema), and all tenants' data lives in the same tables in the same database. Rows are differentiated only by a TenantId or some other column on each table.
Another good point is that you do not need a complex DevOps strategy, and if you are using SQL Server, there is a feature called Row-Level Security that returns only the data the logged-in user has permission to see.
But in this case, if you have thousands of users hitting the database at the same time, you will need some approach to scale it well.
So you need to think about your case and how your system will grow in order to choose the best option.
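To make the row-isolation option concrete, here is a minimal sketch in which every query is scoped by a tenant column. The `shipments` table, its columns, and the sample rows are hypothetical, and `sqlite3` is used only so the example runs on its own; it does not have SQL Server's Row-Level Security, so the filter is applied in the application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipments ("
    " id INTEGER PRIMARY KEY,"
    " tenant_id TEXT NOT NULL,"  # the column that separates tenants
    " destination TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO shipments (tenant_id, destination) VALUES (?, ?)",
    [("north", "Oslo"), ("north", "Bergen"), ("south", "Porto")],
)


def shipments_for(conn: sqlite3.Connection, tenant_id: str) -> list:
    """Return destinations for one tenant; the tenant filter is never optional."""
    rows = conn.execute(
        "SELECT destination FROM shipments WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return [row[0] for row in rows]


print(shipments_for(conn, "north"))  # ['Oslo', 'Bergen']
```

Routing every read through a helper like this keeps the tenant filter in one place, which is the main safeguard when the database itself does not enforce it.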

It seems quite fine to me.
The bottleneck I see is having every tenant on a separate DB server or DB instance. It means that you need to hold a separate connection pool for every tenant, or create a new connection for every request depending on the tenant. Try using a concept where you can have one DB connection for all the tenants (namespaces, schemas, or just prefixing tenant table names with a tenant-specific prefix).
But if you need to keep the tenants' DBs separate, e.g. because of different backup policies, resource limits, etc., you can't do this and will have to manage a separate connection pool for every tenant. It also depends on how many tenants you will have. Tens, thousands?
I would also suggest caching the tenant->DB mapping somewhere in the app instead of querying it from the core database every time.
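The caching suggestion could look roughly like the sketch below: an in-process cache with a time-to-live in front of the core-database lookup. `TenantDb`, `TenantDirectory`, and the `lookup` callable are hypothetical names, and the lookup itself (the query against the core database) is left to the caller.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class TenantDb:
    """Where one tenant's data lives (hypothetical shape)."""
    host: str
    database: str


class TenantDirectory:
    """Caches tenant -> database lookups for ttl_seconds."""

    def __init__(
        self,
        lookup: Callable[[str], TenantDb],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup  # e.g. a query against the core database
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, TenantDb]] = {}

    def resolve(self, tenant_id: str) -> TenantDb:
        now = self._clock()
        hit = self._cache.get(tenant_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no round trip to the core database
        target = self._lookup(tenant_id)
        self._cache[tenant_id] = (now, target)
        return target
```

The expiry means a tenant that is moved to another server is picked up after at most `ttl_seconds`, without restarting the app. The same dictionary pattern could hold one connection pool per tenant if the databases have to stay separate.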

RavenDB collection level security / permissions (for read-only replication)?

We have a RavenDB server with a Master-Slave setup. There is one master DB where documents get written to via one process, which are then replicated to the slaves.
The slaves are accessed via a web application, but we would like to make all replicated documents read-only.
When looking for security options, we find it at DB level and at Document level (using Authorization bundle). The first makes the entire DB read-only which should not be the case. The second is too convoluted and is for more fine-grained security which we don't need.
We were hoping that the web application could connect to RavenDB with certain credentials context (not the application users, but the system user which the website runs as), where certain collections would then be read-only for that identity. This is perfectly possible in SQL Server for example.
Any alternative solutions, such as replicating documents as read-only, are also appreciated. Our current best idea is to put triggers on the server which make the required documents read-only on the slave databases.

You can easily use API Keys for this:
https://ravendb.net/docs/article-page/3.0/Csharp/server/configuration/authentication-and-authorization#oauth-authentication
One API Key (read/write) for the replication
One API Key (read only) for the application

Scale out scenarios with Azure Sql (geolocation)

I have my Azure SQL database located in West Europe and am considering having a database in the States as well. Deploying my website in the States was easy, but that website then queries the database in Europe, which causes delays.
What do people do in these cases? Having separate databases for different users could work, I guess, but it fails if a user who is normally on one server gets routed to the other server, because his data is not in that database. Are there easy solutions for having the same data available in two Azure SQL servers, with Azure maintaining the data sync? What about conflicts when syncing?

It really depends on your requirements and how you implement routing. You can design your distributed application so that user A, once authenticated, always goes to the US server, for instance, even if he/she is currently in Europe or Asia.
If you want to sync everything everywhere, there is a preview feature named "SQL Data Sync". It can sync data between multiple instances of SQL Server (including on-premises SQL Server installations). It is quite flexible in terms of configuration and syncing options. But again, it really depends on the application requirements. If I were building a distributed system, I would not sync data across continents. I would design the app so that user-specific data lives in only one Data Centre. This, of course, is impossible if my user has access to a lot more data than just what relates to his/her profile.
The best option would be to keep user-specific data in user's designated Data Centre, and sync the data that must be available to all users at all locations.
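A minimal sketch of that routing rule, assuming the application knows each user's designated Data Centre. The user names, region names, and the `region_for_query` function are hypothetical.

```python
from typing import Dict

# Hypothetical data: which Data Centre holds each user's own records.
HOME_REGION: Dict[str, str] = {"alice": "west-europe", "bob": "us-east"}
DEFAULT_REGION = "west-europe"


def region_for_query(user_id: str, nearest_region: str, user_specific: bool) -> str:
    """Pick the database region for one query.

    User-specific data lives only in the user's designated region, so those
    queries always go there. Shared data is synced to every region, so it can
    be read from whichever region is nearest to the web server.
    """
    if user_specific:
        return HOME_REGION.get(user_id, DEFAULT_REGION)
    return nearest_region


# bob is browsing via the European site, but his own data stays in the US.
print(region_for_query("bob", "west-europe", user_specific=True))   # us-east
print(region_for_query("bob", "west-europe", user_specific=False))  # west-europe
```

With this split, only the shared data needs cross-region sync, which keeps the set of rows that can conflict small.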

Multiple Apps with CouchDB

What is the recommended security model for running multiple apps with CouchDB? The apps are separate from each other, apps and DBs are in a 1:1 relationship, and it makes sense for them not to be able to access each other's data.
Should the databases run in their own CouchDB instances, or is there a way to combine them? I've seen a little about authentication and authorization, but not enough to tell whether it's viable to support different users on the same instance, or, on the other hand, whether there's much overhead in running separate instances.

You can create a _security document for each database to restrict access by username or role.
http://wiki.apache.org/couchdb/Security_Features_Overview#Authorization
The primary consideration when running multiple applications on one CouchDB server is that all user accounts will be shared. There is one central _users database for everybody.
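As an illustration, the sketch below builds a `_security` document for one app's database and the request that would store it. The database and user names are hypothetical, and sending the request (as a server or database admin) is left out so the example stays self-contained.

```python
import json
from typing import Dict, Iterable, Tuple


def security_document(
    admin_names: Iterable[str],
    member_names: Iterable[str],
    member_roles: Iterable[str] = (),
) -> Dict[str, Dict[str, list]]:
    """Build a _security document limiting a database to the given users/roles."""
    return {
        "admins": {"names": list(admin_names), "roles": []},
        # If members has no names and no roles, the database is readable by anyone.
        "members": {"names": list(member_names), "roles": list(member_roles)},
    }


def security_request(db_name: str, doc: dict) -> Tuple[str, str, str]:
    """Return (method, path, body) for storing the document in CouchDB."""
    return ("PUT", f"/{db_name}/_security", json.dumps(doc, sort_keys=True))


doc = security_document(["app1_admin"], ["app1_user"])
method, path, body = security_request("app1", doc)
print(method, path)  # PUT /app1/_security
```

Because the `_users` database is shared, one way to keep apps apart is to give each app its own role and list only that role under `members` of its database.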
