I want to use PouchDB with CouchDB to save user data for my web application, but I cannot find a way to control access on a per-user basis. My DB would simply consist of documents keyed by user ID. I know of some solutions:
One database per user - however, this requires detecting whenever a new user wants to save data so that a new DB can be created, and it may create a lot of DBs;
A proxy between the client and CouchDB - however, I don't want PouchDB to sync changes for the whole DB, including other users' documents, in the _all_docs and _revs_diff requests.
Are there any suggestions for per-user access control with PouchDB for a user base of around 1 million (only about 10 thousand active users)?
The topic of a million or more databases has come up on the mailing list in the past. The conclusion was that it depends on how your operating system deals with that many files. CouchDB is just accessing parts of the .couch file when requested. Performance is related to how quickly it can find, open, access, and close that file.
There are tricks for some file systems like putting / delimiters in the database name--which will cause CouchDB to store them in matching directory structures such as groupA/userA.couch or using email-style database names com/bigbluehat/byoung.couch (or some similar).
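As a minimal sketch of the email-style naming trick, the helper below turns an address into a slash-delimited database name. The function names and the base URL are made up for illustration, and the validation regex reflects CouchDB's documented database-name rules as I understand them, so treat it as an assumption to verify against your version.

```python
import re
from urllib.parse import quote

# Assumed CouchDB naming rule: lowercase letter first, then a-z, 0-9, _ $ ( ) + - /
VALID_DB_NAME = re.compile(r"[a-z][a-z0-9_$()+/-]*")

def email_to_db_name(email: str) -> str:
    """Map 'byoung@bigbluehat.com' to 'com/bigbluehat/byoung'."""
    local, _, domain = email.lower().partition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    # Reverse the domain labels so users of one domain share a directory.
    name = "/".join(list(reversed(domain.split("."))) + [local])
    if not VALID_DB_NAME.fullmatch(name):
        raise ValueError(f"not a valid database name: {name!r}")
    return name

def db_url(base_url: str, db_name: str) -> str:
    """Build the database URL; slashes in the name must be percent-encoded."""
    return f"{base_url.rstrip('/')}/{quote(db_name, safe='')}"

print(email_to_db_name("byoung@bigbluehat.com"))
print(db_url("http://localhost:5984", "com/bigbluehat/byoung"))
```

Addresses whose local part contains characters outside the allowed set (a dot, for example) are rejected here rather than rewritten; a real application would need its own escaping scheme.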
If that's not sufficient, Apache CouchDB 2.0 brings in BigCouch code (which IBM Cloudant uses) to provide a fully auto-sharded CouchDB. It's not done yet, but it will provide scalability across multiple nodes using an Amazon Dynamo style sharding system.
Another option is to do your own username-based partitioning between multiple CouchDB servers or use IBM Cloudant (which is built for this level of scale).
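Username-based partitioning could look roughly like the sketch below, which hashes the username to pick one of several CouchDB servers. The server URLs and the function name are hypothetical, and simple modulo hashing is just one possible scheme, not something prescribed by CouchDB.

```python
import hashlib

def server_for_user(username: str, servers: list[str]) -> str:
    """Pick a server deterministically from a hash of the username."""
    if not servers:
        raise ValueError("at least one server is required")
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer bucket selector.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# Hypothetical server list.
SERVERS = [
    "https://couch-a.example.com",
    "https://couch-b.example.com",
    "https://couch-c.example.com",
]

print(server_for_user("userA", SERVERS))
```

With plain modulo hashing, changing the number of servers remaps most users, so the server list would have to stay fixed or the affected databases would have to be migrated.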
All these options provide the same Apache CouchDB replication protocol and will work just fine with PouchDB sitting on the user's computer, phone, or tablet.
The user's device would then have its own database, plus or minus any shared databases. The apps on those million user devices would only have the scalability of their own content (aka hard drive space) to be concerned about. The app would replicate directly to the "cloud"-side user database for backup, web use, etc.
Hopefully something in there sounds promising. :)
Related
I have a news database that needs to work online and offline. I'm using CouchDB (IBM Cloudant) and PouchDB to sync it with the app.
The problem is that the news items are relatively "heavy" because they include photos, and I am having sync problems because of the size of the docs. I also don't see any need to synchronize the whole news database; that would only fill the user's mobile phone with unnecessary records.
I need to sync only some of the news, approximately five records. How can I do this in CouchDB or PouchDB?
I looked at the sync and filter documentation, but it does not answer my question about limiting the number of synced docs (or at least I did not see whether it is possible).
I'm using a view to pull the news.
Since you're using a view to pull the news, you can use limit to cap the number of documents you fetch. You can also use since (a parameter of the changes feed) to determine when you need to fetch the next batch of documents; this will have to be executed periodically to check for new documents.
If you go down this route and your app doesn't need client -> server replication, then you could use something lighter than PouchDB to store the documents and other info on the client.
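To make the two requests concrete, here is a small sketch that only builds the URLs. limit, descending and since are CouchDB query parameters; the host, database, design document and view names are placeholders invented for the example.

```python
from urllib.parse import urlencode

def view_url(base: str, db: str, ddoc: str, view: str, limit: int) -> str:
    """URL that fetches at most `limit` rows of a view, highest keys first."""
    params = {"limit": limit, "descending": "true"}
    return f"{base}/{db}/_design/{ddoc}/_view/{view}?{urlencode(params)}"

def changes_url(base: str, db: str, since: str) -> str:
    """URL that asks the changes feed for everything after sequence `since`."""
    return f"{base}/{db}/_changes?{urlencode({'since': since})}"

# Placeholder names: a "news" database with a view "by_date" in design doc "app".
print(view_url("http://localhost:5984", "news", "app", "by_date", 5))
print(changes_url("http://localhost:5984", "news", "42"))
```

The client would store the last_seq value returned by the changes request and pass it as since on the next poll.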
Yes, you can. Use filtered replication for this: https://pouchdb.com/api.html#filtered-replication
Credits: Nolan Lawson (PouchDB core team).
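One way to replicate only a handful of documents is to name them explicitly. PouchDB's replicate accepts a doc_ids option, and CouchDB's _replicate endpoint accepts a doc_ids field with the same meaning. The sketch below builds such a request body from view rows; the database URLs, document IDs and helper names are assumptions made up for the example.

```python
import json

def latest_doc_ids(view_rows: list[dict], count: int) -> list[str]:
    """Return the document IDs of the first `count` view rows."""
    return [row["id"] for row in view_rows[:count]]

def replication_body(source: str, target: str, doc_ids: list[str]) -> dict:
    """Body for a replication request restricted to the given documents."""
    if not doc_ids:
        raise ValueError("doc_ids must not be empty")
    return {"source": source, "target": target, "doc_ids": list(doc_ids)}

# Hypothetical view result, newest first.
rows = [
    {"id": "news-3", "key": "2016-03-01"},
    {"id": "news-2", "key": "2016-02-01"},
    {"id": "news-1", "key": "2016-01-01"},
]

body = replication_body(
    "https://example.cloudant.com/news",  # placeholder source
    "news-mobile",                        # placeholder target
    latest_doc_ids(rows, 2),
)
print(json.dumps(body, indent=2))
```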
I need to sync some documents from a Cloudant server to my iOS app, written in Swift.
For that I use this official library:
https://github.com/cloudant/CDTDatastore#overview
I need to understand how to replicate only a user's own documents.
I need to figure out the right approach.
Imagine a company's ticket assistance system.
All users can create tickets, and these are saved on the Cloudant/CouchDB server.
When a user is on a mobile platform, I would like to synchronize only that user's tickets.
How can I do it?
Thanks, all.
CDTDatastore is designed to sync the whole database, and Cloudant/CouchDB doesn't provide per-document ACLs. In order to sync only a specific user's data, you either need to use a filter function, which will significantly hurt the performance of the replication, or use the one-database-per-user model.
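For the one-database-per-user model you need a stable mapping from username to database name. A minimal sketch follows, using the "userdb-" prefix plus hex-encoded username convention that CouchDB's couch_peruser feature uses as far as I know; whether your server follows the same convention is an assumption to check.

```python
def user_db_name(username: str) -> str:
    """Name of the per-user database: 'userdb-' + hex of the UTF-8 username."""
    if not username:
        raise ValueError("username must not be empty")
    # Hex encoding keeps the name within the characters CouchDB allows.
    return "userdb-" + username.encode("utf-8").hex()

print(user_db_name("bob"))
```

The mobile client would then replicate only with its own database instead of filtering a shared one.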
I am creating a couchdb database per user of my application, in which the application is granted database admin privileges. This is done so that the application can sync design docs -- but I do not want to expose my server to any risks.
There is no legitimate reason for a user to run a view on my server (they only use the server for 2-way syncing), so it shouldn't be hard to filter out requests that attempt to query views, should it?
Are there other security risks or DoS attacks I'm missing?
Every user who has read access to your database is able to run views. That's not an issue, since a view index is built once and then updated incrementally.
But database admins can create whatever new views they like. Views cannot consume a lot of CPU time, since CouchDB limits their execution with a timeout (default 5 sec), but they can consume a lot of disk space, especially if the full doc content is emitted from the view; this can make a single view index bigger than the whole database.
Moreover, database admins can run database and view index compactions. These operations are very heavy on disk IO (and sometimes on CPU too), especially for large databases (100 GiB+). A single compaction probably won't hurt, but several running at the peak of your users' activity can easily slow your server down significantly.
Things can get worse if you're using a custom view server without a sandbox (like Python, Erlang, etc.). In effect, these allow your db admins to execute custom code on your server through CouchDB. In that case, losing all your databases and finding a remote shell on your server are just the tip of the iceberg.
Summary: don't make database admins of people you cannot trust, and you'll be safe.
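In practice that advice means listing the user as a database member rather than an admin. The sketch below builds a _security document of that shape; the layout (admins and members, each with names and roles) follows CouchDB's security object, while the helper name and username are illustrative.

```python
import json

def member_only_security(username: str) -> dict:
    """Security object that lets `username` read and write docs, but not administer."""
    return {
        # No database admins: only server admins can change design docs.
        "admins": {"names": [], "roles": []},
        # The user is a plain member of their own database.
        "members": {"names": [username], "roles": []},
    }

# This body would be PUT to /<db>/_security by a server admin.
print(json.dumps(member_only_security("userA"), indent=2))
```

With this setup the application can no longer push design docs with the user's credentials, so design doc updates would have to be done server-side by an admin process.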
Problem at hand is as follows:
SaaS to keep maintenance records
95% of data would be specific to each user i.e. no need to be accessed by other users
5% of data shared (and contributed by all users), like parts that are used in maintenance
SaaS to be delivered as CouchApp i.e. with public facing CouchDB
So I am torn between database per user, and single database for all users.
Database per user seems to offer much easier backup and maintenance, smaller data set, and easier access control. On the negative side how could I handle shared data?
Is it possible to have database per user, and one common database for shared information (parts)? Then replicate parts documents from all user databases to central one, from there back to all user databases? How to handle conflicts in that case (or even better avoid if possible)?
Or any much simpler approach? Or bite the bullet and go with just one central database?
It depends on the nature of the shared data, I guess. It seems natural to have filtered replication flowing from the user databases to the shared databases and unfiltered replication from the shared database to the user databases; I think that covers your requirements? It makes it so that each user only has to read/write from/to their specific database, while you can still distribute out the shared docs.
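The replication layout described here could be set up with one pair of replication documents per user database, roughly as sketched below. The filter name "app/shared_parts", the database names and the _id scheme are assumptions for illustration; the filter function itself would live in a design document in each user database.

```python
def replication_docs(user_dbs: list[str], shared_db: str, filter_name: str) -> list[dict]:
    """Build replication docs: filtered user -> shared, unfiltered shared -> user."""
    docs = []
    for db in user_dbs:
        # Only shared documents (selected by the filter) leave the user database.
        docs.append({
            "_id": f"{db}-to-{shared_db}",
            "source": db,
            "target": shared_db,
            "filter": filter_name,
            "continuous": True,
        })
        # Everything in the shared database flows back to the user.
        docs.append({
            "_id": f"{shared_db}-to-{db}",
            "source": shared_db,
            "target": db,
            "continuous": True,
        })
    return docs

for doc in replication_docs(["user_a", "user_b"], "parts", "app/shared_parts"):
    print(doc["_id"], "filter" in doc)
```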
It may be easier to query from the shared database directly instead of replicating it back into the user databases, but that really depends on what kind of data would be in there.
Currently I'm in the process of evaluating CouchDB for a new project.
Key constraint for this project is strong privacy. There need to be resources that are readable by exactly two users.
One use case may be something similar to Direct Messages (DMs) on Twitter. Another use case would be a User / SuperUser access level.
I currently don't have any ideas about how to solve this kind of problem with CouchDB, other than creating one database that is accessible only by these 2 users. I wonder how I would then build views aggregating data from several databases?
Do you have any hints / suggestions for me?
I've asked this question several times on the CouchDB mailing lists, and never got an answer.
There are a number of things that CouchDB is missing.
One of them is document-level security, which would:
allow only certain users to view a doc
filter the documents indexed in a view on a user level permission base
I don't think that there is a solution to the permission considerations with the current couchdb implementation.
One solution would be to use an external indexing tool like Lucene, tag your documents with user rights, and then issue a Lucene query that includes the user's rights in order to get the docs. This also implies extra load on your server(s) (Lucene requires a JVM) and an extra delay before the data is available (Lucene indexing time).
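The tagging idea can be illustrated without Lucene: each document carries a list of users allowed to read it, and every query is restricted by that list. The "readers" field name and the sample documents are hypothetical, and the in-memory filter merely stands in for the indexed query.

```python
def readable_by(docs: list[dict], user: str) -> list[dict]:
    """Return only the documents whose 'readers' tag includes `user`."""
    return [doc for doc in docs if user in doc.get("readers", [])]

# Hypothetical direct messages, each readable by exactly two users.
docs = [
    {"_id": "dm-1", "text": "hi", "readers": ["alice", "bob"]},
    {"_id": "dm-2", "text": "hello", "readers": ["alice", "carol"]},
    {"_id": "dm-3", "text": "untagged"},
]

print([doc["_id"] for doc in readable_by(docs, "bob")])
```

This is application-level filtering only: it protects nothing if clients can reach the CouchDB database directly.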
As for the several-databases solution, there are language framework implementations that simply don't allow the use of more than one database (for instance couch_potato for Ruby).
Having several databases also means that you'll have several replication processes if your databases are replicated.
Also, this means that the views will be updated separately for each database. In some cases this is better than having huge views indexed in a single database, but it also means that different users might not be up to date on a single source of information (i.e. some will have their views updated, others won't). So you cannot guarantee that the data is consistent for all users.
So unless something is implemented in the couch core in order to manage document level authorizations, CouchDB does not seem appropriate for managing data with privacy constraints.
There are a bunch of details missing about what you are trying to accomplish and what the data looks like, so it's hard to make a specific recommendation. You may be able to create a database per user and copy items into each user's database (for the DM use case you described). Each user would only be able to access their own database, and then you could have an admin user that could access all databases. If you need to update those records later, copying them to multiple databases might not be a good idea, and then you might consider whether you want to control permissions at a different level from storage.
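A rough sketch of the copy-into-each-user's-database idea for the DM case: the message is duplicated once per participant, keyed by that participant's database. The "userdb-" naming and the message fields are assumptions for illustration, and nothing here addresses later updates to the copies.

```python
def fan_out(message: dict, participants: list[str]) -> dict[str, dict]:
    """Map each participant's database name to its own copy of the message."""
    # dict(message) makes a separate (shallow) copy per database.
    return {f"userdb-{name}": dict(message) for name in participants}

message = {"_id": "dm-2016-01-01-001", "from": "alice", "to": "bob", "text": "hi"}
copies = fan_out(message, [message["from"], message["to"]])

for db_name, doc in copies.items():
    print(db_name, doc["_id"])
```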
For views that aggregate data from several databases, I recommend looking at lounge and bigcouch, which take different approaches.
http://tilgovi.github.com/couchdb-lounge/
http://support.cloudant.com/faqs/views/chained-mapreduce-views