Restricting resource access in CouchDB to exactly 2 users

I'm currently evaluating CouchDB for a new project.
A key constraint for this project is strong privacy: some resources must be readable by exactly two users.
One use case is something similar to Direct Messages (DMs) on Twitter. Another is a User / SuperUser access level.
I currently don't have any ideas about how to solve this kind of problem with CouchDB other than creating one database that is accessible only by these 2 users. But how would I then build views that aggregate data from several databases?
Do you have any hints / suggestions for me?

I've asked this question several times on the CouchDB mailing lists and never got an answer.
There are a number of things that CouchDB is missing.
One of them is document-level security, which would:
allow only certain users to view a doc
filter the documents indexed in a view based on per-user permissions
I don't think there is a solution to the permission problem with the current CouchDB implementation.
One solution would be to use an external indexing tool like Lucene: tag your documents with user rights, then issue a Lucene query that includes the user's rights in order to get the docs. This implies extra load on your server(s) (Lucene requires a JVM) and an extra delay before the data is available (Lucene indexing time).
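To make the tagging idea concrete, here is a minimal in-memory sketch. It does not use Lucene or CouchDB at all; the `readers` field, the sample documents and the `search` helper are all made up for illustration. The point is only that the permission check becomes one more clause of every query.

```python
from typing import Dict, Iterable, List

# Hypothetical documents: each one carries a "readers" tag listing who may see it.
docs = [
    {"_id": "dm-1", "readers": ["alice", "bob"], "text": "hi bob"},
    {"_id": "dm-2", "readers": ["alice", "carol"], "text": "hi carol"},
    {"_id": "dm-3", "readers": ["bob", "carol"], "text": "lunch?"},
]


def visible_to(user: str, documents: Iterable[Dict]) -> List[Dict]:
    """Return only the documents whose "readers" tag includes the user."""
    return [d for d in documents if user in d.get("readers", [])]


def search(user: str, term: str, documents: Iterable[Dict]) -> List[str]:
    """Combine a text match with the permission clause, as an index query would."""
    return [d["_id"] for d in visible_to(user, documents) if term in d["text"]]
```

With this data, `search("bob", "hi", docs)` only ever considers `dm-1` and `dm-3`, so `dm-2` cannot leak to bob even though its text matches.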
As for the several-databases solution, some language frameworks simply don't allow the use of more than one database (for instance couch_potato for Ruby).
Having several databases also means that you'll have several replication processes if your databases are replicated.
Also, this means that the views will be updated separately for each database. In some cases this is better than having huge views indexed in a single database, but it also means that different users might not be up to date on a single source of information (i.e. some will have their views updated, others won't). So you cannot guarantee that the data is consistent for all users.
So unless something is implemented in the CouchDB core to manage document-level authorization, CouchDB does not seem appropriate for managing data with privacy constraints.

There are a bunch of details missing about what you are trying to accomplish and what the data looks like, so it's hard to make a specific recommendation. You may be able to create a database per user and copy items into each user's database (for the DM use case you described). Each user would only be able to access their own database, and you could have an admin user that can access all databases. If you later need to update those records, copying them to multiple databases might not be a good idea, and you might then consider whether you want to control permissions at a different level from storage.
For views that aggregate data from several databases, I recommend looking at Lounge and BigCouch, which take different approaches.
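A rough sketch of the copy-per-user idea, using plain Python dicts in place of real CouchDB databases. The names `user_dbs`, `send_dm` and `inbox` are hypothetical. It also shows the drawback mentioned above: once copied, the two copies are independent.

```python
from collections import defaultdict

# One in-memory "database" per user; a stand-in for per-user CouchDB databases.
user_dbs = defaultdict(dict)


def send_dm(msg_id, sender, recipient, text):
    """Write an independent copy of the message into both users' databases."""
    doc = {"_id": msg_id, "from": sender, "to": recipient, "text": text}
    for owner in (sender, recipient):
        user_dbs[owner][msg_id] = dict(doc)


def inbox(user):
    """A user only ever reads from their own database."""
    return sorted(user_dbs[user])


send_dm("m1", "alice", "bob", "hello")

# Editing alice's copy does not touch bob's copy: updates must be fanned out too.
user_dbs["alice"]["m1"]["text"] = "hello (edited)"
```

After this runs, bob's copy still says "hello", which is why later updates make the copying approach awkward.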
http://tilgovi.github.com/couchdb-lounge/
http://support.cloudant.com/faqs/views/chained-mapreduce-views

Related

What are best practices for partitioning data in MongoDB?

I'm creating a social site using the MEAN stack and I need some suggestions regarding MongoDB and Mongoose.
I'm part of a startup and we decided to use these amazing technologies to fulfil our task.
Basically, I need some suggestions.
So far I have finished a simple CRUD implementation and set up local authentication with Passport.js. I currently have a single collection in MongoDB called users.
Our social site will have a blog, marketplace and many other pages (features) that will be related to a single user.
Since I have never worked with MongoDB before, I'm curious whether I should use one collection per user or separate collections for each feature.
To clarify, let's say I use a User model for user registration, a Blog model for blogs, etc.
It would really mean a lot to me if you could briefly explain how to structure my Mongoose models: whether all data should be inside one collection or whether each user should have separate collections for different features. And if you recommend multiple collections, how do I then link these collections together and make sure that all data is saved for one user?
Thanks a lot in advance!
I will explain partitioning/dividing at two levels.
Of course, you're going to create different collections for different models, such as Users, Blogs, Messages etc.
Now comes the second part: when you have millions of records, how do you partition them for faster lookup?
For example, you have 1M users, which you put in one big 'Users' collection. If you look for a user whose first name is 'Imdad' and whose age is 28, your query has to look through all 1M items in that single collection, which takes a good amount of time.
To solve this problem, the users collection can be divided into multiple chunks through horizontal partitioning, e.g. Users1 (ages 10-19), Users2 (ages 20-29), Users3 (ages 30-39). Based on your query predicate, MongoDB then looks only in the relevant chunk(s). MongoDB applies this idea through sharding, much as SQL databases do with partitioning. You don't have to direct your query at a particular chunk yourself; MongoDB takes care of that.
Shard key generation
Mongoose shard key
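The age-range routing described above can be sketched in a few lines. This is only an illustration of range partitioning, not how MongoDB implements sharding internally; `BOUNDS`, `PARTITIONS` and `partition_for` are made-up names, and the ranges are assumed to be half-open (20 belongs to the second range).

```python
import bisect

# Exclusive upper bound of each hypothetical age range, with matching partition names.
BOUNDS = [20, 30, 40]
PARTITIONS = ["Users1", "Users2", "Users3"]
LOWEST_AGE = 10


def partition_for(age):
    """Pick the partition whose range [lower, upper) contains the given age."""
    if age < LOWEST_AGE or age >= BOUNDS[-1]:
        raise ValueError("age %r is outside the partitioned range" % age)
    # bisect_right counts how many upper bounds are <= age, which is the index we want.
    return PARTITIONS[bisect.bisect_right(BOUNDS, age)]
```

A query for age 28 is routed to `partition_for(28)`, i.e. `Users2`, so the other two partitions are never scanned.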
If you are using MongoDB as a backend for a REST interface, the best practice is to create one collection per resource. For example, if you intend to have a /api/users endpoint, you should have a users collection and it should contain everything you intend to return on that endpoint.
If you are using node to compile server-side templates, structure can be more flexible. In this case, the above still applies (as you will probably eventually want to expose a REST service), but there is more flexibility. In fact, if a many-to-many style relationship is appropriate, it is easier to separate these collections and load them together in the same page.
As an aside, you mention having users and a marketplace. A bigger issue than the separation of data into collections is the use of transactions. Any time you intend to perform a transaction on data, it should be performed within a SQL transaction. MongoDB did not originally support multi-document transactions (they were added in version 4.0); this was by design, as MongoDB is designed to be a fast, scalable data store. It is not unreasonable to combine SQL and NoSQL data stores in this case.
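For what a SQL transaction buys you in the marketplace case, here is a small sketch using Python's built-in sqlite3 module. The `accounts` table, the `pay` helper and the balances are assumptions made up for the example; the idea is that both updates succeed together or neither is applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "owner TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("buyer", 50), ("seller", 0)])
conn.commit()


def pay(conn, payer, payee, amount):
    """Move funds atomically; return False and change nothing if a step fails."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                (amount, payee),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE owner = ?",
                (amount, payer),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit was rolled back.
        return False


def balances(conn):
    return dict(conn.execute("SELECT owner, balance FROM accounts"))
```

If the buyer tries to pay more than they have, the seller's credit is rolled back along with the failed debit, so no money appears out of nowhere.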

Using federations to partition for multiple tenants

Given the following "facts" I have gleaned from reading around this.
Federations are separate databases from the moment they are created.
As copies of the original, they will not alter automatically if I alter the original's schema.
As separate databases you cannot cross join.
Each federation is priced as a separate db.
I will have to provide a TenantId field to each table I want to federate.
If these are correct, what are the advantages of using federation to achieve multi-tenancy over simply using separate dbs? Or if they're not correct, please set me straight.
Note, we have a small number of tenants, maybe 20.
Your understanding is correct.
There are a few interesting aspects of Federations that you may find useful. First it is a relatively flexible partitioning environment. For example you can group 10 tenants into the first member, and 50 in the second, based on usage patterns of your customers. Or you could simply isolate a single customer that is using the system more than the others.
Another important concept is that you can have multiple federations per database. So you could have a Customer federation and a SalesHistory federation for example.
Last but not least you may want to read this article that discusses connection pool fragmentation that occurs in traditional sharding models, but is not an issue with SQL Database Federations.

SaaS, central database, database per user, or combination?

Problem at hand is as follows:
SaaS to keep maintenance records
95% of data would be specific to each user i.e. no need to be accessed by other users
5% of data shared (and contributed by all users), like parts that are used in maintenance
SaaS to be delivered as CouchApp i.e. with public facing CouchDB
So I am torn between database per user, and single database for all users.
Database per user seems to offer much easier backup and maintenance, smaller data set, and easier access control. On the negative side how could I handle shared data?
Is it possible to have database per user, and one common database for shared information (parts)? Then replicate parts documents from all user databases to central one, from there back to all user databases? How to handle conflicts in that case (or even better avoid if possible)?
Or any much simpler approach? Or bite the bullet and go with just one central database?
It depends on the nature of the shared data, I guess. It seems natural to have filtered replication flowing from the user databases to the shared databases and unfiltered replication from the shared database to the user databases; I think that covers your requirements? It makes it so that each user only has to read/write from/to their specific database, while you can still distribute out the shared docs.
It may be easier to query from the shared database directly instead of replicating it back into the user databases, but that really depends on what kind of data would be in there.
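The replication flow suggested above can be simulated with plain dicts to see what ends up where. This is not CouchDB's replicator (it ignores revisions and conflicts entirely); `is_shared`, `replicate`, the `type` field and the sample documents are all assumptions for the sketch.

```python
def is_shared(doc):
    """Filter: only documents marked as shared parts leave a user database."""
    return doc.get("type") == "part"


def replicate(source, target, doc_filter=None):
    """Copy documents from source to target, optionally through a filter."""
    for doc_id, doc in source.items():
        if doc_filter is None or doc_filter(doc):
            target[doc_id] = dict(doc)


alice_db = {
    "rec-1": {"type": "record", "note": "oil change"},
    "part-9": {"type": "part", "name": "oil filter"},
}
bob_db = {"rec-7": {"type": "record", "note": "brake pads"}}
shared_db = {}

# Filtered replication: user databases -> shared database.
for user_db in (alice_db, bob_db):
    replicate(user_db, shared_db, is_shared)

# Unfiltered replication: shared database -> every user database.
for user_db in (alice_db, bob_db):
    replicate(shared_db, user_db)
```

After both passes, bob sees alice's `part-9` but never her private `rec-1`, and each user still reads and writes only their own database.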

Can you have a DbContext that is associated with multiple databases?

I have a User database that houses all of the user information plus permissions to applications, etc. If I have a general database as described and then other databases for each web application, can I link the databases to define relationships between them using the Fluent API or Code First? There are some not-so-elegant ways to do this, but I wanted to ask the question first before getting involved with a custom solution.
For example: 1 DbContext, DbSets for each table in the 2 databases. Ability to relate entities between databases with Fluent API.
Thanks in advance.
The answer is no. A context is tied to a single database. There isn't even an easy way to hack around this, because the context can only generate queries for a single database. If you want access to multiple databases, you need either a context for each database (with no cross-context queries or relationships) or you need to expose all tables from the other databases as views or aliases in the database used by the context.

Hibernate Security Apprehension: Hibernate vs. Stored Procedures

At the company that I work with, we often have to integrate with client’s infrastructure.
Recently, after hearing that we use Hibernate, one client raised the following concern: since the user under which Hibernate connects to the database has direct access to tables and Hibernate generates SQL dynamically, that user can do pretty much anything in the database.
If the user only had permission to execute stored procedures, the SPs could limit the data and, more importantly, the type of queries that can be issued to the database: basically no dynamic or injected SQL. So, if there is a stored procedure that deletes a row, a malicious person who got hold of the user's credentials would be able to delete a single row at a time, but would not be able to issue a DELETE *. I know Hibernate can also map views, but again this limits the data and not the operations the user can perform. Hibernate can also execute SPs, but that to a great extent defeats the purpose of using Hibernate and would imply a complete rewrite of the application.
While I don’t see this as a major concern, since application servers also provide security, I had trouble convincing the client. What’s your take on this? Is Hibernate really less secure than an application using stored procedures? What additional security measures can be put in place when working with Hibernate?
NHibernate can map to sprocs instead of tables
You can map read operations onto tables / views and insert / update / delete operations onto sprocs if you like
NHibernate generates parameterised SQL, which guards against SQL injection as long as user input is never concatenated into the query text
User permissions can always be restricted to certain operations on certain tables, if you decide to map onto tables and / or views
Most projects using sprocs start off by generating CRUD procedures for every table and assigning execute permissions on them all - this is not really much more secure than allowing table access
I would assume Hibernate uses parameterized queries. That should alleviate much of the concern about SQL injection. You can also prevent the user account from being able to do everything in the database. It doesn't need to be the SA account after all.
If I am not mistaken, NHibernate uses parameterized SQL queries. This will stop injection.
Hibernate of course is just an ORM layer over SQL.
Turn the show_sql=true property on, show them what SQL is being generated, and they'll see exactly what it does (parameterized queries, as was mentioned).
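To show the client why parameterized queries matter, a tiny demonstration helps. This sketch uses Python's built-in sqlite3 module rather than Hibernate, and the `users` table and the malicious input are made up, but the principle of binding input as a value instead of splicing it into the SQL text is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "alice' OR '1'='1"

# Unsafe: string concatenation lets the input rewrite the WHERE clause.
unsafe_rows = conn.execute(
    "SELECT name FROM users WHERE name = '" + malicious + "'"
).fetchall()

# Parameterized: the input is bound as a single value and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()
```

The concatenated query matches every row in the table, while the parameterized one matches none, because no user is literally named `alice' OR '1'='1`.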
Hibernate can be less secure than using stored procedures, since in theory, DBAs can limit user access to only calling stored procedures, rather than direct access to the underlying data structures.
In practice and in my experience, it's extremely rare for this style of security to be implemented in a meaningful way. If a stored procedure is written for each CRUD operation, and a user is granted access to all of the stored procedures, there is no real difference between that and just granting rights to the underlying structures themselves.
If the company is audited for SOX or security compliance, they might get dinged for not using stored procedures.
It is possible to use Hibernate over stored procedures, but it seems like a pain in the ass.
