1000+ users, 30+ private data collections - arangodb

I'm working on a management/planning application that will have 1,000+ users, each with 30+ data collections.
For instance, each user might have a collection of, say, client contacts with as few as 10 and as many as several hundred items/records.
Would ArangoDB be a suitable choice for this application?
Is there a better choice?
Many thanks,
LRP

I assume that each user's data should be kept separate, so user 1 should not see any data of user 2, etc.
If so, one option is to create a separate database for each user or, as you mentioned, 30+ databases for each user. That would result in 30,000+ databases. I don't think that would be an ideal use of ArangoDB: each database incurs some overhead, so you will want to keep the total number of databases in an ArangoDB instance relatively small, and certainly not create 30,000+ of them.
The alternative is not to create that many databases but that many collections, perhaps all in the same database. While this separates user data well, it would also be rather expensive in terms of resource usage (each collection may need a separate storage file once it contains data). I think it could work if not all users/collections need to be active at the same time and the server has plenty of resources (or you split the data across multiple servers).
The solution that would use the fewest resources in ArangoDB is to put the data of multiple users into just a few collections. For each record you store a user id, and your application uses that user id in every query.
This ensures that the application only accesses the records of one specific user at a time. Additionally, since this uses only a few collections, there is no need to create empty or mostly empty databases/collections for users with little data. From a resource-usage point of view, this should be relatively efficient.
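A minimal sketch of the shared-collection approach, assuming a hypothetical `contacts` collection whose documents carry a hypothetical `userId` attribute. It builds an AQL query that filters on the owner and passes both the collection name (`@@coll`) and the user id as bind parameters, so the id is never spliced into the query text:

```python
from typing import Any


def user_scoped_query(collection: str, user_id: str) -> tuple[str, dict[str, Any]]:
    """Build an AQL query returning only the documents owned by one user.

    The collection name (@@coll) and the user id (@userId) are bind parameters,
    so neither value is ever spliced into the query text.
    """
    # Hypothetical attribute name: every document stores its owner in `userId`.
    query = "FOR doc IN @@coll FILTER doc.userId == @userId RETURN doc"
    # Collection bind parameters use a leading "@" in the bind-variable key.
    bind_vars: dict[str, Any] = {"@coll": collection, "userId": user_id}
    return query, bind_vars
```

With the python-arango driver, the returned pair could be run as `db.aql.execute(query, bind_vars=bind_vars)`. A persistent index on `userId` would keep each per-user lookup from scanning the whole shared collection.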

Related

SaaS with one database per client

I am designing a basic ERP (Node.js/Express/PostgreSQL, Vue 3/Quasar) that will manage several businesses belonging to different clients; some of these clients have several businesses, each with some branches. Should I implement a server/database instance per customer, or should I plan to load-balance and scale a single database in the future?
That is the question of which database multi-tenancy approach to use. Here is a nice article on that.
Personally, I would recommend schema-based multi-tenancy to start with (one schema per client): it is a basic ERP, a single database is easier to manage and maintain, and you can still make client-specific changes to the table design if needed.
You can run SET search_path on each client's PostgreSQL connection to direct queries to that client's schema.
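A small sketch of that idea, assuming one schema per client with a hypothetical name such as `client_acme`. The helper quotes the schema name as a PostgreSQL identifier and, by default, uses `SET LOCAL`, which only lasts until the current transaction ends; that matters when pooled connections are reused by different clients. Searching a shared `public` schema after the client schema is an assumption, not part of the answer:

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap it in double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def search_path_sql(client_schema: str, local: bool = True, shared_schema: str = "public") -> str:
    """Return a statement that resolves unqualified table names in one client's schema first.

    SET LOCAL only lasts until the end of the current transaction, which is safer
    when pooled connections are reused for different clients.
    """
    keyword = "SET LOCAL" if local else "SET"
    # Assumption: shared lookup tables live in `shared_schema`, searched after the client schema.
    return f"{keyword} search_path TO {quote_ident(client_schema)}, {quote_ident(shared_schema)}"
```

In real code, psycopg2's `psycopg2.sql.Identifier` can do the identifier quoting instead of the hand-written `quote_ident` shown here.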
PostgreSQL has not been designed for VLDBs (very large databases), so you must estimate the final data volume for the next 3 to 5 years.
If this volume will be over 300 GB, it is preferable to split your customers into one database each.
If it will be below that, you can use SQL schemas.
Beware of the number of files: PostgreSQL creates several files for each table, and too many files consume a lot of resources. In that case, it will be necessary to split your system over several PostgreSQL clusters.
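A rough back-of-the-envelope helper for the two concerns above, under stated assumptions: the 300 GB threshold is the rule of thumb from this answer, and `files_per_table=5` is an assumed average (main data file, free space map, visibility map and a couple of index or TOAST files), not a PostgreSQL constant. All names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class TenancyEstimate:
    projected_gb: float
    relation_files: int
    recommendation: str


def estimate_tenancy(
    customers: int,
    tables_per_customer: int,
    projected_gb: float,
    files_per_table: int = 5,     # assumption: average on-disk files per table
    threshold_gb: float = 300.0,  # rule of thumb from the answer above
) -> TenancyEstimate:
    # With one schema per customer, every customer gets its own copy of every
    # table, and each table is stored as several files on disk.
    files = customers * tables_per_customer * files_per_table
    if projected_gb > threshold_gb:
        recommendation = "one database per customer"
    else:
        recommendation = "one schema per customer"
    return TenancyEstimate(projected_gb, files, recommendation)
```

For example, 200 customers with 60 tables each at 5 files per table comes to 60,000 files. Switching from schemas to one database per customer does not reduce that count, which is why the answer suggests spreading customers over several PostgreSQL clusters once the file count becomes a problem.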

What are best practices for partitioning data in MongoDB?

I'm creating a social site using the MEAN stack and I need some suggestions regarding MongoDB and Mongoose.
I'm part of a startup and we decided to use these amazing technologies to fulfil our task.
Basically, I need some suggestions.
So far, I have finished a simple CRUD and implemented the Passport.js local strategy. I currently have a single collection in my MongoDB database, called users.
Our social site will have a blog, a marketplace and many other pages (features) that will be related to individual users.
Since I have never worked with MongoDB before, I'm curious whether I should use one collection per user or separate collections for each feature.
To clarify, let's say I use a User model for user registration, a Blog model for blogs, etc.
It would mean a lot to me if you could briefly explain how to structure my Mongoose models: should all data be inside one collection, or should each user have separate collections for different features? And if you recommend multiple collections, how do I link these collections together and make sure that all data is saved for one user?
Thanks a lot in advance!
I will explain partitioning at two levels.
First, of course, you're going to create different collections for different models, such as Users, Blogs, Messages, etc.
The second level comes in when we are talking about millions of documents: how do you partition them for faster lookups?
For example, say you have 1M users, all in one big Users collection. If you look for a user whose first name is 'Imdad' and whose age is 28, then without a suitable index your query scans all 1M documents in that single collection, which takes a good amount of time.
To solve this problem, the users collection can be partitioned horizontally into ranges, for example Users1 (ages 10-19), Users2 (ages 20-29) and Users3 (ages 30-39). Based on your query predicate, only the relevant partition(s) need to be searched. MongoDB applies this idea, as many SQL databases do, through sharding: a sharded collection is split into chunks by a shard key and distributed across shards. You don't have to direct your query to a particular chunk yourself; MongoDB takes care of that.
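A conceptual sketch of the routing idea, not MongoDB's actual implementation: hypothetical half-open age ranges stand in for chunks, and a query on an age range is sent only to the partitions whose ranges overlap it:

```python
# Hypothetical half-open age ranges standing in for chunks: (low inclusive, high exclusive, name).
PARTITIONS = [
    (10, 20, "users_10_19"),
    (20, 30, "users_20_29"),
    (30, 40, "users_30_39"),
]


def route(age_min: int, age_max: int) -> list[str]:
    """Return only the partitions whose range overlaps ages in [age_min, age_max].

    Every other partition is skipped, instead of scanning all of them.
    """
    return [
        name
        for low, high, name in PARTITIONS
        if age_min < high and age_max >= low
    ]
```

In MongoDB itself, ranged sharding on a field is enabled with `sh.shardCollection()` in `mongosh` (for example with the key `{ age: 1 }`), and the `mongos` router performs this targeting for you. Choosing a good shard key involves more than the query pattern (cardinality and write distribution also matter), so `age` here is purely illustrative.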
See also: Shard key generation; Mongoose shard key.
If you are using MongoDB as a backend for a REST interface, the best practice is to create one collection per resource. For example, if you intend to have an /api/users endpoint, you should have a users collection, and it should contain everything you intend to return on that endpoint.
If you are using Node to render server-side templates, the structure can be more flexible. In this case the above still applies (as you will probably eventually want to expose a REST service), but there is more flexibility. In fact, if a many-to-many relationship is appropriate, it is easier to keep these collections separate and load them together on the same page.
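The question also asked how to link collections together. A common pattern is to store the owning user's `_id` in each document of the other collections and join on it when reading. The sketch below simulates that with in-memory lists; the data and the `author` field name are hypothetical:

```python
# Hypothetical in-memory "collections": each blog post stores its author's _id
# instead of embedding the whole user document.
users = [
    {"_id": "u1", "name": "Imdad"},
    {"_id": "u2", "name": "Sara"},
]
blogs = [
    {"_id": "b1", "title": "Hello", "author": "u1"},
    {"_id": "b2", "title": "Second post", "author": "u1"},
    {"_id": "b3", "title": "Market tips", "author": "u2"},
]


def posts_of(user_id: str) -> list[dict]:
    """All blog posts belonging to one user, found through the reference field."""
    return [b for b in blogs if b["author"] == user_id]


def populate(docs: list[dict], field: str, source: list[dict]) -> list[dict]:
    """Return copies of `docs` with the id in `field` replaced by the referenced document."""
    by_id = {d["_id"]: d for d in source}
    return [{**doc, field: by_id.get(doc[field])} for doc in docs]
```

In Mongoose the same relationship is usually declared as `author: { type: Schema.Types.ObjectId, ref: 'User' }` and resolved with `.populate('author')`; in plain MongoDB the `$lookup` aggregation stage performs the join on the server.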
As an aside, you mention having users and a marketplace. A bigger issue than how to split data into collections is the use of transactions. Any time you need to perform a transaction on data, it should be performed within a SQL transaction. At the time of writing, MongoDB had no notion of multi-document transactions (only single-document writes are atomic); this was by design, as MongoDB is designed to be a fast, scalable data store. (MongoDB 4.0 and later do support multi-document transactions.) In this case, it is not unreasonable to amalgamate SQL and NoSQL data.

Cloudant limit on the number of databases

I'm planning on storing my database in Cloudant.
Our application is multi-tenant. We currently separate tenants based on a value in some of our tables, which will naturally translate to a field value in each document. Another way is to have a database per tenant. We currently have around 100 tenants and hope to grow to 500-2000 in our best projections.
What are the pros and cons of putting all tenants in one database vs. one database per tenant?
Is there a limit on the number of databases we can create and work with concurrently?
This is a good and involved question. There are pros and cons to both models. The main advantage of one large database is that you can analyze (search, MapReduce, etc.) across all users very easily. The main advantage of one database per user is that every user has their own data "sandbox", which may be nice for your SLA. Additionally, it means that the amount of data in each user database can be kept relatively small.
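For the one-database model, a common convention is to prefix every document `_id` with the tenant, so a single range request over `_all_docs` returns only that tenant's documents. A minimal sketch, assuming tenant names never contain `:`; the names are hypothetical and `select` simulates the range scan locally:

```python
# "\ufff0" is the conventional high sentinel used to end a key-prefix range in CouchDB.
HIGH_SENTINEL = "\ufff0"


def doc_id(tenant: str, local_id: str) -> str:
    """Hypothetical id scheme "<tenant>:<local id>", so one tenant's docs sort together."""
    return f"{tenant}:{local_id}"


def tenant_range(tenant: str) -> dict[str, str]:
    """startkey/endkey parameters selecting only one tenant's docs from _all_docs."""
    prefix = f"{tenant}:"
    return {"startkey": prefix, "endkey": prefix + HIGH_SENTINEL}


def select(ids: list[str], rng: dict[str, str]) -> list[str]:
    """Local stand-in for the range scan: ids are compared by code point, like _all_docs."""
    return [i for i in sorted(ids) if rng["startkey"] <= i <= rng["endkey"]]
```

Over HTTP, the `startkey`/`endkey` values are sent JSON-encoded as query parameters to `_all_docs`. Cloudant's partitioned databases (and CouchDB 3.x) build on the same `partition:docid` id format, which is worth checking before rolling your own scheme. Tenant isolation in this model is still enforced by your application, not by the database.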
If you can provide more details about the data you are storing, the relational modeling, and the queries you hope to be able to do, I can probably give you a more satisfying answer.

SaaS, central database, database per user, or combination?

The problem at hand is as follows:
SaaS to keep maintenance records
95% of data would be specific to each user i.e. no need to be accessed by other users
5% of data shared (and contributed by all users), like parts that are used in maintenance
SaaS to be delivered as a CouchApp, i.e. with a public-facing CouchDB
So I am torn between database per user, and single database for all users.
Database per user seems to offer much easier backup and maintenance, a smaller data set, and easier access control. On the negative side, how would I handle shared data?
Is it possible to have a database per user plus one common database for shared information (parts)? Then replicate parts documents from all user databases to the central one, and from there back to all user databases? How would I handle conflicts in that case (or, even better, avoid them if possible)?
Or any much simpler approach? Or bite the bullet and go with just one central database?
It depends on the nature of the shared data, I guess. It seems natural to have filtered replication flowing from the user databases to the shared database and unfiltered replication from the shared database to the user databases; I think that covers your requirements? That way each user only has to read from and write to their own database, while you can still distribute the shared docs.
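A sketch of that topology as `_replicator` documents, assuming a hypothetical shared database named `parts`, a hypothetical CouchDB base URL, and that shared documents carry a hypothetical `"type": "part"` field. The user-to-shared direction uses a replication `selector` (available since CouchDB 2.0) as the filter; the shared-to-user direction is unfiltered. Credentials are omitted:

```python
COUCH_URL = "http://localhost:5984"  # hypothetical server; credentials omitted
SHARED_DB = "parts"                  # hypothetical shared database name


def replication_docs(user_dbs: list[str]) -> list[dict]:
    """Build one filtered and one unfiltered continuous replication per user database."""
    docs = []
    for db in user_dbs:
        # user db -> shared db: filtered, so only shared "part" documents leave
        docs.append({
            "_id": f"{db}-to-{SHARED_DB}",
            "source": f"{COUCH_URL}/{db}",
            "target": f"{COUCH_URL}/{SHARED_DB}",
            "selector": {"type": "part"},
            "continuous": True,
        })
        # shared db -> user db: unfiltered, so every user receives all shared parts
        docs.append({
            "_id": f"{SHARED_DB}-to-{db}",
            "source": f"{COUCH_URL}/{SHARED_DB}",
            "target": f"{COUCH_URL}/{db}",
            "continuous": True,
        })
    return docs
```

Each dict would be saved as a document in the `_replicator` database. A JavaScript filter function referenced via `"filter": "ddoc/name"` is the older alternative to `selector`.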
It may be easier to query from the shared database directly instead of replicating it back into the user databases, but that really depends on what kind of data would be in there.

Restricting resource access in CouchDB to exactly 2 users

Currently I'm in the process of evaluating CouchDB for a new project.
Key constraint for this project is strong privacy. There need to be resources that are readable by exactly two users.
One use case may be something similar to Direct Messages (DMs) on Twitter. Another use case would be User / SuperUser access levels.
I currently don't have any ideas about how to solve this kind of problem with CouchDB other than creating one database that is accessible only by these 2 users. How would I then build views aggregating data from several databases?
Do you have any hints / suggestions for me?
I've asked this question several times on couchdb mailing lists, and never got an answer.
There are a number of things that couchdb is missing.
One of them is document-level security, which would:
allow only certain users to view a doc
filter the documents indexed in a view based on user-level permissions
I don't think that there is a solution to the permission considerations with the current couchdb implementation.
One solution would be to use an external indexing tool such as Lucene: tag your documents with user rights, then issue a Lucene query that includes the user's rights in order to get the docs. This also implies extra load on your server(s) (Lucene requires a JVM) and an extra delay before the data is available (Lucene indexing time).
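The tagging idea amounts to an inverted index from each user to the documents they may read. A minimal in-memory sketch, assuming each document carries a hypothetical `readers` list; with Lucene the equivalent would be indexing `readers` as a field and adding a term clause for the current user to every query:

```python
from collections import defaultdict

# Hypothetical docs tagged with the users allowed to read them.
docs = {
    "dm-1": {"text": "hi bob", "readers": ["alice", "bob"]},
    "dm-2": {"text": "hi carol", "readers": ["alice", "carol"]},
    "note-1": {"text": "admin only", "readers": ["admin"]},
}


def build_reader_index(documents: dict[str, dict]) -> dict[str, set[str]]:
    """Map each user to the ids of the docs they may read (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, doc in documents.items():
        for reader in doc["readers"]:
            index[reader].add(doc_id)
    return index


def visible_docs(index: dict[str, set[str]], user: str) -> list[str]:
    """Doc ids the user may read; unknown users see nothing."""
    return sorted(index.get(user, set()))
```

As the answer points out, this check lives in the indexing layer and your application, so every read path must go through it; CouchDB itself still serves any document to a user who has access to the database.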
As for the several-databases solution, some language frameworks simply don't allow using more than one database (for instance couch_potato for Ruby).
Having several databases also means that you'll have several replication processes if your databases are replicated.
It also means that views are updated separately for each database. In some cases this is better than having huge views indexed in a single database, but it also means that different users may see different states of the same information (i.e. some will have their views updated, others won't). So you cannot guarantee that the data is consistent for all users.
So unless something is implemented in the CouchDB core to manage document-level authorization, CouchDB does not seem appropriate for managing data with privacy constraints.
There are a number of details missing about what you are trying to accomplish and what the data looks like, so it's hard to make a specific recommendation. You may be able to create a database per user and copy items into each user's database (for the DM use case you described). Each user would only be able to access their own database, and an admin user could access all databases. If you later need to update those records, copying them to multiple databases might not be a good idea, and you might then consider controlling permissions at a different level than storage.
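A sketch of the database-per-user (or per-pair) layout using CouchDB's per-database `_security` object, which restricts access to the listed members and admins. Database names hex-encode the user names so they are always valid, in the style of CouchDB's `couch_peruser` `userdb-` naming; the `admin` role and the `dm-` prefix are hypothetical:

```python
def hex_name(user: str) -> str:
    # Hex-encode so any user name yields a valid CouchDB database name
    # (lowercase letters and digits only).
    return user.encode("utf-8").hex()


def private_db_name(user: str) -> str:
    """One private database per user, named like couch_peruser's userdb-<hex>."""
    return "userdb-" + hex_name(user)


def pair_db_name(a: str, b: str) -> str:
    """Hypothetical DM database shared by exactly two users."""
    # Sort the two names so (alice, bob) and (bob, alice) map to the same database.
    first, second = sorted([a, b])
    return f"dm-{hex_name(first)}-{hex_name(second)}"


def security_doc(members: list[str], admin_roles: tuple[str, ...] = ("admin",)) -> dict:
    """Body for PUT /{db}/_security: only the listed members (plus admins) get access."""
    return {
        "admins": {"names": [], "roles": list(admin_roles)},
        "members": {"names": sorted(members), "roles": []},
    }
```

The result of `security_doc(...)` would be sent with `PUT /{db}/_security`. Sorting the pair means a conversation between two users always maps to one database regardless of who starts it. Note that an empty `members` list leaves a CouchDB database readable by everyone, so the list must never be empty here.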
For views that aggregate data from several databases, I recommend looking at CouchDB Lounge and BigCouch, which take different approaches.
http://tilgovi.github.com/couchdb-lounge/
http://support.cloudant.com/faqs/views/chained-mapreduce-views
