I am creating a couchdb database per user of my application, in which the application is granted database admin privileges. This is done so that the application can sync design docs -- but I do not want to expose my server to any risks.
There is no legitimate reason for a user to run a view on my server (they only use the server for two-way syncing), so it shouldn't be hard to filter out requests that attempt to access views, right?
Are there other security risks or DoS attacks I'm missing?
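To make the filtering idea concrete, what I have in mind is roughly a small reverse proxy in front of CouchDB that rejects anything hitting a view (or show/list) path before it reaches the database. This is only a sketch - the http-proxy package, the port, and the path check are placeholders, not something I've settled on:

```javascript
// Sketch: reject design-doc query requests, proxy everything else to CouchDB.
const http = require('http');
const httpProxy = require('http-proxy');

const proxy = httpProxy.createProxyServer({ target: 'http://localhost:5984' });

http.createServer((req, res) => {
  // Illustrative check, not exhaustive: block _view, _show and _list paths.
  if (/\/_design\/[^\/]+\/(_view|_show|_list)\//.test(req.url)) {
    res.writeHead(403, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'forbidden', reason: 'views are disabled' }));
    return;
  }
  proxy.web(req, res);
}).listen(8080);
```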
Every user that has read access to your database is able to run views. That's not an issue in itself, since a view index is built once and then updated incrementally.
But database admins can create whatever new views they like. Views can't consume a lot of CPU time, since CouchDB limits their execution with a timeout (5 seconds by default), but they can consume a lot of disk space, especially if full document content is emitted from the view - this can make a single view index bigger than the whole database.
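To illustrate the disk-space point with a made-up example: a map function that emits the whole document copies every document into the view index, while emitting a minimal value (and using include_docs at query time) keeps the index small:

```javascript
// Bloated: the index stores a full copy of every document.
function (doc) {
  emit(doc._id, doc);
}

// Lean: emit only a key; fetch document bodies with ?include_docs=true when querying.
function (doc) {
  emit(doc._id, null);
}
```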
Moreover, database admins can run database and view index compactions - these operations are very heavy on disk I/O (and sometimes CPU too), especially for large databases (100 GiB+). Such tasks may significantly slow down your server (a single compaction probably won't, but multiple ones easily will) if they run at the peak of your users' activity.
Things get worse if you're using a custom view server without a sandbox feature (like Python, Erlang, etc.). In effect, such servers allow your db admins to execute custom code on your server through CouchDB. In that case, losing all your databases and finding a remote shell on your server are just the tip of the iceberg of possibilities.
In short: don't grant database admin rights to people you cannot trust, and you'll be safe.
I am building a Node.js application which uses a few global variables to track data such as online users and statuses, information about other servers, and ongoing events, but having this information be lost in the event of server restart/crash is not ideal.
As these things are frequently read and modified, I figure it would not be a good idea to put that extra strain on my existing MySQL database. I have looked into Redis, but unfortunately my application is hosted on a Windows server, so I would have to use an old, unsupported version of it, which isn't ideal.
I'm currently considering setting up a NoSQL database such as MongoDB, but I'm not sure if this is an efficient solution and if it would be too much on my relatively weak server to have an application and 2 different databases running.
What would be the best solution for persistent storage of data that needs to be frequently accessed and updated by an application?
Making my comments into an answer...
If it's a reasonable amount of data, you can just write JSON to a single data file. No database required. Just overwrite the file with a new block of JSON to save the new state. This is very fast, efficient and simple. I've used this before as a quick and easy way to regularly save snapshots of state that you want to be able to reload if your server restarts. Read the state into memory upon server start, then use it from memory, then regularly save a new snapshot to disk however often your application desires.
If some data changes a lot and some data doesn't change very much, you can break the data into multiple files so you're writing less data on the more frequent interval. Obviously, there is a threshold of amount of data or frequency of writes or complexity of data access where a database would be warranted, but you should at least consider the simpler option first and only add a new database when you think you really need it.
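To sketch that out (the file name, interval and state shape are arbitrary): writing to a temporary file and then renaming it avoids being left with a half-written snapshot if the process dies mid-write.

```javascript
const fs = require('fs');

const STATE_FILE = 'state.json';
let state = { onlineUsers: {}, servers: {}, events: [] };

// Load the previous snapshot (if any) on startup.
try {
  state = JSON.parse(fs.readFileSync(STATE_FILE, 'utf8'));
} catch (e) {
  // No snapshot yet, or it was unreadable - start with a fresh state.
}

// Regularly write a new snapshot: write a temp file, then rename it into place.
setInterval(() => {
  const tmp = STATE_FILE + '.tmp';
  fs.writeFile(tmp, JSON.stringify(state), (err) => {
    if (!err) fs.rename(tmp, STATE_FILE, () => {});
  });
}, 15 * 1000);
```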
If you cluster your servers in the future, that argues for a multi-user database (one with appropriate concurrency management features) as your master keeper of state. But you're going to have other design issues to work through if you're trying to share multi-user state (like online status) across all clustered servers: you can no longer keep that state in memory on any one server unless every state change is broadcast to all servers so they can update their in-memory copies, or unless you make users sticky to a particular server (which complicates load balancing in clustering). That does somewhat call for a redis-like central store that all clustered servers can access.
I chose to use Docker with Node.js and MongoDB to make a game, and to split all of my game sections (battles, chat, resources, etc.) into different servers so that each section has its own process. For example, my resource server will be busy constantly, since it does all the resource calculations for each user every second and handles requests from other servers, such as whether a user has enough resources to build a building, or how many resources a player will lose if attacked, etc.
For my alpha and beta versions I'm planning on running one server that will run all the sections of my game, but the way I'm doing it now, every section has its own MongoDB instance, so I have resourcesDB, mainDB (user info, login info and so on), and so forth. None of the instances will have many collections; for example, resourcesDB has only 2 collections: a resources collection, in which each user has one document with resource-related data, and a logs collection that stores all user activity (building upgrades, battles lost, etc.).
I have read about running multiple instances on the same server, and everything I've read says it's not the best way to do it, as it will have an impact on performance. In the future I'm planning to separate all of my sections onto different servers, so each section will have its own app server and DB server if needed.
Should I instead build one instance and separate things at the database level? That means I will have to connect to the same instance from multiple servers: the resource server will update every user's resource count each second; combine that with the battle server updating the state of troops in the DB for each ongoing war, and a chat database being updated with every private or chat message. Will that cause any issues? Or will it only cause problems if I connect to a single database and separate things at the collection level?
Is there a reason why I should not continue with creating multiple instances? I read that there might be a problem with config files when using multiple instances on the same server, but I assume Docker handles that part.
UPDATE - I have found an excellent answer here: https://dba.stackexchange.com/questions/156811/mongodb-in-micro-services-structure/156984#156984
Using a microservice architecture for your app is a good idea, but I don't see a good reason to separate the databases. MongoDB can handle the scaling for you (sharding). Also, MongoDB does not block writes to collection A when you write to collection B. If you split your app across multiple databases, it will be harder to do backups and to maintain the app.
You can read here about concurrency options for MongoDB and see how MongoDB handles collection-level locking:
https://docs.mongodb.com/manual/faq/concurrency/
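To illustrate that point (the names below are invented), all of your sections can share one instance and one logical database and still write to their own collections concurrently:

```javascript
const { MongoClient } = require('mongodb');

async function main() {
  // One MongoDB instance, one connection pool shared by the whole app.
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('game');

  // Concurrent writes to different collections do not block each other.
  await Promise.all([
    db.collection('resources').updateOne(
      { userId: 'user-1' },
      { $inc: { gold: 10 } },
      { upsert: true }
    ),
    db.collection('battles').insertOne({ userId: 'user-1', troopsLost: 3 }),
    db.collection('chat').insertOne({ from: 'user-1', text: 'hello' }),
  ]);

  await client.close();
}

main().catch(console.error);
```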
I have an API service in Node.js; basically what it does is get an id from the request, read the record with that id from the database, and return it in the response.
While there are many clients with different ids, usually only about 10-20 of them are used in a given timespan.
Is it a good idea to create an object with ids as keys and store the resulting record along with a last_requested time, to emulate a small database with fast access? Whenever a record is requested I will update the last_requested field with new Date(). Also, I'd create a setInterval() to delete the keys that have not been used for some time.
Records in the database do not change often, and when they do I can restart the service (there are several instances running simultaneously via PM2, so they can be gracefully restarted).
If the required id is not found in this "database", a request to the real database will be performed and the result will be stored in the object under a new key.
You're talking about caching. And it's very useful if:
You have a lot of reads, but not a lot of writes. i.e. Lots of people request a record, and it changes rarely.
You have a lot of free memory, or not many records.
You have a good indication of when to invalidate the cache.
For trivial usecases (i.e. under 50 requests / second), you probably don't need an in-memory cache for the database. Moreover, database access is very fast if you use the tools the database gives you (like persistent connection pools, consistent parameterized queries, query cache, etc).
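For example, a pooled, parameterized lookup by id is already very cheap - sketched here with the mysql2 package purely as an illustration; any driver with pooling works along the same lines:

```javascript
const mysql = require('mysql2/promise');

// A persistent connection pool: connections are reused across requests
// instead of being re-established for every query.
const pool = mysql.createPool({
  host: 'localhost',
  user: 'app',
  database: 'app',
  connectionLimit: 10,
});

// A parameterized query: the id is passed as a placeholder value,
// never interpolated into the SQL string.
async function getRecord(id) {
  const [rows] = await pool.execute('SELECT * FROM records WHERE id = ?', [id]);
  return rows[0];
}
```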
It all depends on your specific usecase. But I wouldn't do it until I actually start encountering performance problems, and determine that the database is the bottleneck.
It's not just a good idea; caching is a necessity at many levels of a computational system. Caching starts at the CPU level (L1, L2, L3) and the OS level, and goes up to the application level, which must be handled by the developer.
Even if you have a well-structured database with good indexes, there is still overhead from the TCP/IP communication between your app and the database. So if you are going to access some rows frequently, it's a must to have them in your app's process.
The good news is that a Node.js app is a single long-lived process resident in memory (unlike PHP or other scripting setups whose processes come and go). So you can load frequently required data once and skip the database access.
The best mechanism to store the records is an LRU (least-recently-used) cache. There are several LRU cache packages available for Node.js:
https://github.com/adzerk/node-lru-native
https://github.com/isaacs/node-lru-cache
https://www.npmjs.com/package/simple-lru-cache
In an LRU cache you can define how much memory the cache can use, the expiry age of each item, and how many items it can store! Or you can write your own!
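If you do write your own, a minimal version of exactly what the question describes (a map keyed by id, a last-requested timestamp, and a periodic sweep) might look like this - the sizes and intervals are arbitrary:

```javascript
const MAX_IDLE_MS = 10 * 60 * 1000; // evict entries unused for 10 minutes

const cache = new Map(); // id -> { record, lastRequested }

async function getRecord(id, fetchFromDb) {
  const hit = cache.get(id);
  if (hit) {
    hit.lastRequested = Date.now();
    return hit.record;
  }
  const record = await fetchFromDb(id); // fall back to the real database
  cache.set(id, { record, lastRequested: Date.now() });
  return record;
}

// Periodically drop records that haven't been requested for a while.
setInterval(() => {
  const now = Date.now();
  for (const [id, entry] of cache) {
    if (now - entry.lastRequested > MAX_IDLE_MS) cache.delete(id);
  }
}, 60 * 1000);
```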
I wish to use PouchDB - CouchDB for saving user data in my web application, but I cannot find a way to control access on a per-user basis. My DB would simply consist of documents keyed by user id. I know there are some solutions:
One database per user - however, it requires monitoring for whenever a new user wants to save data so that a new DB can be created, and may create a lot of DBs;
Proxy between client and CouchDB - however, I don't want PouchDB to sync changes for the whole DB, including documents of other users, in those _all_docs and _revs_diff requests.
Is there any suggestion for user access control in PouchDB for a user base of around 1 million (only about 10 thousand active users)?
The topic of a million or more databases has come up on the mailing list in the past. The conclusion was that it depends on how your operating system deals with that many files. CouchDB is just accessing parts of the .couch file when requested. Performance is related to how quickly it can find, open, access, and close that file.
There are tricks for some file systems, like putting / delimiters in the database name - which will cause CouchDB to store the files in matching directory structures such as groupA/userA.couch - or using email-style database names like com/bigbluehat/byoung.couch (or similar).
If that's not sufficient, Apache CouchDB 2.0 brings in BigCouch code (which IBM Cloudant uses) to provide a fully auto-sharded CouchDB. It's not done yet, but it will provide scalability across multiple nodes using an Amazon Dynamo style sharding system.
Another option is to do your own username-based partitioning between multiple CouchDB servers or use IBM Cloudant (which is built for this level of scale).
All these options provide the same Apache CouchDB replication protocol and will work just fine with PouchDB sitting on the user's computer, phone, or tablet.
The user's device would then have their own database, plus or minus any shared databases. The apps on those million user devices would only have the scalability of their own content (aka hard drive space) to be concerned about. The app would replicate directly to the "cloud"-side user database for backup, web use, etc.
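On the device side that replication is only a few lines of PouchDB - a sketch, with the remote per-user database URL invented for the example:

```javascript
// In the browser PouchDB is a global; in Node you'd require it.
const PouchDB = require('pouchdb');

// Local database on the user's device (the name is arbitrary).
const local = new PouchDB('userdata');

// The per-user database on the server (example URL).
const remote = new PouchDB('https://example.com/userdb-abc123');

// Continuous two-way replication: local edits go up, server changes come down.
local.sync(remote, { live: true, retry: true })
  .on('error', (err) => console.error('sync error', err));
```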
Hopefully something in there sounds promising. :)
Problem at hand is as follows:
SaaS to keep maintenance records
95% of data would be specific to each user i.e. no need to be accessed by other users
5% of data shared (and contributed by all users), like parts that are used in maintenance
SaaS to be delivered as CouchApp i.e. with public facing CouchDB
So I am torn between database per user, and single database for all users.
A database per user seems to offer much easier backup and maintenance, a smaller data set, and easier access control. On the negative side, how would I handle the shared data?
Is it possible to have database per user, and one common database for shared information (parts)? Then replicate parts documents from all user databases to central one, from there back to all user databases? How to handle conflicts in that case (or even better avoid if possible)?
Or any much simpler approach? Or bite the bullet and go with just one central database?
It depends on the nature of the shared data, I guess. It seems natural to have filtered replication flowing from the user databases to the shared database and unfiltered replication from the shared database back to the user databases; I think that covers your requirements? It makes it so that each user only has to read/write from/to their own database, while you can still distribute the shared docs.
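To make that concrete (the database names and the doc type are invented for the example), the user-to-shared direction would use a filter function shipped in a design doc, and each direction can be set up as a continuous replication via the _replicator database:

```javascript
// Design doc saved into each user database: only "part" documents pass the filter.
const designDoc = {
  _id: '_design/shared',
  filters: {
    // Stored as a string in CouchDB; shown as a function here for readability.
    parts: (function (doc, req) {
      return doc.type === 'part';
    }).toString(),
  },
};

// Replication documents for the _replicator database:
// user DB -> shared DB is filtered, shared DB -> user DB is not.
const userToShared = {
  source: 'http://localhost:5984/userdb-abc123',
  target: 'http://localhost:5984/parts',
  filter: 'shared/parts',
  continuous: true,
};

const sharedToUser = {
  source: 'http://localhost:5984/parts',
  target: 'http://localhost:5984/userdb-abc123',
  continuous: true,
};
```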
It may be easier to query from the shared database directly instead of replicating it back into the user databases, but that really depends on what kind of data would be in there.