CouchDB - Figuring out database security

CouchDB offers validation prior to allowing an object/row to be inserted into the database. This makes sure that if you have a public-facing couch application, your database won't be filled with junk by just anyone.
User <-> CouchDB
However, I'm trying to figure out what that looks like coming from the standard application design process, where you have a trusted middle layer that does much of the auth work. For example, most apps place Ruby or PHP between the database and the user agent, which allows the application to figure out information about the user agent before allowing something like a post to be saved to the database.
User -> Ruby -> MySQL
User <- Ruby <- MySQL
How do you trust the user to do administrative tasks when the user can't be trusted?
For example, how would you do something like "email verification" prior to inserting a user row using just couchDB? You can't let the user agent insert the row - because they would fill the system with spam accounts. On the other hand, there is no middle layer either that can insert the row after they click the link in the email.
How about this, I would assume that you would allow anyone to enter their email by creating a new record in a public table like email_verify. This is something that a public user agent could do as the table would not do anything in the application - it would just be a holding tank.
Then node.js could track the _changes feed and send an activation email while creating a new entry in a private table (like email_confirm) (node.js would serve as a trusted middle layer). If the user clicks that link and comes back then... [unknown] ... and node.js could finally create a record in the private user table (user).
At this point we could then rely on couchdb validation for the rest of the application since we got a confirmed user account created.
As more background, let's imagine a discussion system built on couchdb that anyone can register for. We don't want to allow just anyone to directly submit content without some kind of verification - yet the user agents all directly run the system. (Tables would be Thread, Comment, & User). How would this work?

For this problem, I would think about adding roles to existing users.
Using couchdb's validation and changing _design/_auth can be a good way to add email, email_verified and a randomly generated email_verification_code to the _users database when the user first registers.
To send mail, get confirmation and resend confirmation, you can use external processes (for an example of an external process, see couchdb-lucene).
Finally, in the user update process you can do a quick check in _design/_auth to see whether the verification code matches, and add the verified_user role for that user.
This way all your requests would go through couchdb; you would use the external process only when you need to send mail and get confirmation.
Edit: Forgot to add (since it was pretty obvious), I would add the verified_user role to the database readers.
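To illustrate the role check, here is a minimal sketch of a validate_doc_update function for the application database. CouchDB calls such a function with (newDoc, oldDoc, userCtx, secObj) on every write. The role name verified_user comes from the answer above; the function name and the error message are only illustrative.

```javascript
// Sketch: reject writes from users that lack the verified_user role.
// In a design document this would be stored under "validate_doc_update".
function validateDocUpdate(newDoc, oldDoc, userCtx, secObj) {
  var isAdmin = userCtx.roles.indexOf('_admin') !== -1;
  var isVerified = userCtx.roles.indexOf('verified_user') !== -1;
  if (!isAdmin && !isVerified) {
    // Throwing an object with a "forbidden" key makes CouchDB answer 403.
    throw({ forbidden: 'Only verified users may write to this database.' });
  }
}
```

Adding verified_user to the database readers, as suggested above, covers read access; a function like this one covers writes.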

Couldn't you just make use of CouchDB's validation?
Users could be flagged. Upon registration, a User is added to the Users database. He gets his mail and is then flagged "valid:true" or something like this upon replying to that mail or clicking a link.
With validation, users could not only be "logged in/out"; access authorization with more granular access rights can also be implemented. E.g.: only mark threads solved if one is the author, admin, whatever...
Or does this seem impracticable?
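The "only mark threads solved if one is the author or admin" rule could look roughly like this as a validation function. This is a sketch under assumptions: the document fields type, author and solved are hypothetical and not taken from the question.

```javascript
// Sketch: only the thread's author or an admin may change the solved flag.
function validateThread(newDoc, oldDoc, userCtx) {
  if (newDoc.type !== 'thread') return; // other document types are not checked here
  var isAdmin = userCtx.roles.indexOf('_admin') !== -1;
  var wasSolved = oldDoc ? !!oldDoc.solved : false;
  var solvedChanged = !!newDoc.solved !== wasSolved;
  // For an existing thread, trust the stored author, not the submitted one.
  var author = oldDoc ? oldDoc.author : newDoc.author;
  if (solvedChanged && !isAdmin && userCtx.name !== author) {
    throw({ forbidden: 'Only the author or an admin may change the solved flag.' });
  }
}
```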

After talking with some people on #couchdb IRC, it seems that they can't figure out a way to do something administrative (like activating users who click on an email link) without using a "backend" process like a node.js server which keeps track of the _changes feed.
I was hoping for a pure couchdb app - but it seems like couchdb still has a little way to go.
Still, the good news is that you can hand off 80% of your application's logic/processing to your users. The other 20% will be 1) a node.js instance for things like sending emails or checking recaptcha, 2) record validation functions running in your couchdb, and 3) map/reduce (query) functions. These three things cannot be offloaded to something "untrusted" like a user-agent.
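As a rough sketch of what the trusted node.js process would do per row of the _changes feed (requested with include_docs=true), the function below turns one change from the public holding database into a record for the private confirmation database. The document shape, the "confirm:" id prefix and the makeToken helper are assumptions; the HTTP calls to CouchDB and the actual mail sending are left out.

```javascript
// Sketch: build the private confirmation record for one _changes row.
// Returns null for rows that should be ignored.
function confirmationFor(change, makeToken) {
  if (change.deleted || !change.doc || !change.doc.email) return null;
  return {
    _id: 'confirm:' + change.id, // hypothetical id scheme
    email: change.doc.email,
    token: makeToken(),          // random value that goes into the email link
    confirmed: false
  };
}
```

The process would save the returned record with its own (admin) credentials and mail the token to the address; only when the link is clicked would it create the real user record.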

Related

Web user is not authorized to access a database despite having Editor access in the ACL

In my XPages application, web users can perform a self-registration. In the registration process, a user document for the web user is created in the address book and the user is added to a group that has Editor access for the database. After executing show nlcache reset on the Domino server, the user can login to and access the application.
In ~98% of all registrations this works perfectly fine. However, sometimes new users cannot enter the application after the login because, according to the Domino server, they "are not authorized to access" the database. The login must have worked because the user id is correct. The exact same user id can also be found in the Members field of the group that has Editor access to the database. To additionally verify the user's access level, I executed NotesDatabase.queryAccess() with the user's id. It returned 0, which is the ACL default and means "No Access". Yet, there are dozens of users in the same ACL group which have absolutely no problem with accessing the database.
At the moment, we "circumvent" this problem by manually removing the user's document from the address book and removing him/her from the Members of the ACL group. Afterwards we ask the user to redo the self-registration with the exact same information as before. Up to now, this second registration has always worked and the user can access the application. Yet, this is not a real solution, which is why I have to ask if anyone knows what the problem could be.

Don't create entries in the address book directly. Use the adminp process for registration. To minimize perceived delay send a validation/confirmation message the user has to click.

Comment of 12/02/2015 seems to be the correct Answer:
Check whether the self-registered user has TWO consecutive spaces in his name (this could also be caused by a trailing space).
In the group, Domino does a FullTrim. So we have
John<space><space>Smith
that is not in group XXX because in the members it's:
John<space>Smith.
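One way to avoid the mismatch would be to normalize the name the same way before creating the user document. A minimal sketch in plain JavaScript, assuming the registration code can run such a helper (the function name is hypothetical):

```javascript
// Sketch: collapse runs of spaces and strip leading/trailing spaces,
// mirroring what a FullTrim does to the group member entry.
function fullTrim(name) {
  return name.replace(/ +/g, ' ').replace(/^ | $/g, '');
}
```

With this, "John<space><space>Smith" and "John<space>Smith<space>" both become "John<space>Smith" before they are written anywhere.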

This may have something to do with the frequency at which the view indexes in names.nsf are refreshed.
Since access control is done through groups in the ACL, the server will "know" which user belongs to which group only after the view indexes have been updated.
In a normal setting, this can take a couple of minutes.
You can test this hypothesis by forcing an index refresh, either with CTRL-SHIFT-F9 from your Notes client (warning: this can take very long depending on the network and the number of entries in names.nsf) or with the command
load updall -v names.nsf
... or by having the users wait a little while and try again 5min later.

Ok, first a question. If you let the user wait a couple of minutes will the access then work? I.e. is it a refresh/caching problem - or an inconsistency in the way you add the user to the group?
I assume that the format of the user name is correct as it works in most cases (i.e. fully hierarchical name)... Is there anything "special" about the names that do not work?
I do a similar thing (and have done so several times) - although with some differences :-)
I typically use Directory Assistance to include my database with a "($Users)" view. When I update anything in this view I do a view.refresh() on the view (using Java). I typically do not use groups in this type of application (either not applicable - or I use OUs or roles for specific users). I am not sure how the group membership is calculated - but I guess you could try to locate the relevant view (though none of them seemed obvious when I looked) - and do a refresh on it.
/John

How to ease CouchDB read/write restrictions on _users database

In my couchapp two databases are being used
1 Is for application data
2 Is "_users" database.
In my application, in one form I'm trying to implement autocomplete where the data source is a "view" created in the "_users" database.
Now, when I log in with a normal user id (not admin) and try to access the view inside the "_users" database, I get a 403 error:
{"error":"forbidden","reason":"Only admins can access design document actions for system databases."}
Is it possible to give non-admin users access to that view only, so I can get the list of users from the _users database into my application?

I've never been able to do many tasks that require much customization with CouchDB by itself. I've always needed a script somewhere else that gives me the info that I need.
What works for me is this setup:
A gatekeeper Sinatra app that has admin access to my CouchDB
Using CouchDB's config to proxy to my Sinatra app:
[httpd_global_handlers]
_my_service = {couch_httpd_proxy, handle_proxy_req, <<"http://127.0.0.1:9999">>}
The reason for the proxy is because any request that comes through to your gatekeeper will have the AuthSession token set. Inside your gatekeeper, you can GET localhost:5984/_session passing the AuthSession cookie along, it will tell you who is making the request, allowing you to look them up and see if they have access, or just give everyone access to whatever you like. Another reason for the proxy is to avoid any CORS nonsense since you're making the request to yourserver:5984/_my_service.
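The access decision inside the gatekeeper can be sketched as follows. The answer uses Sinatra; the sketch below is in JavaScript only to show the logic, and the function name and the requiredRole parameter are assumptions. It takes the parsed JSON body that CouchDB returns for GET /_session when the caller's AuthSession cookie is passed along.

```javascript
// Sketch: decide whether the caller behind a /_session response may proceed.
function isAllowed(sessionBody, requiredRole) {
  var ctx = sessionBody && sessionBody.userCtx;
  if (!ctx || ctx.name === null) return false;        // anonymous caller
  if (ctx.roles.indexOf('_admin') !== -1) return true; // admins always pass
  return ctx.roles.indexOf(requiredRole) !== -1;
}
```

If the check passes, the gatekeeper would query the _users view with its own admin credentials and return only the fields the autocomplete needs.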
Update
A purely client-side/javascript solution means that it will be fundamentally insecure at some point, since, well, everything is on the client side. But perhaps your application doesn't need to be that secure. That's up to you.
One workaround could be to make your application authenticate as a predefined admin, and then create more admin users that way. You could authenticate once when your application boots or on an as needed basis.
The "problem" is that CouchDB sees the _users database as fundamentally special, and doesn't give you the opportunity to change the credential requirements like other databases. Normally you would be able to use the _security document to give role based or user based access. But that's not possible with _users.
An alternative implementation might be to keep track of your own users and forgo the _users database altogether. In that case you could set your own cookies and have your own login and logout methods that don't depend on CouchDB's authentication scheme. You could query your own _view/users because it would be in your main database. Things wouldn't be locked down tight but they would work fine as long as no one was interested in hacking your system. :)

Node.js user system

I'm currently working on a web application which deals with multiple users. Whilst it currently works, it relies on some really bad practices which I'll outline in a minute.
We're using MySQL as the database system, since we're updating our current application, we want to ensure everything is backwards compatible. Otherwise I'd look at MongoDB etc.
Our users are stored in a table aptly named login. This contains their username, email, hashed password etc and a field which contains a JSON encoded object of their preferences. There is no real reason for doing this over using a meta table.
So the bad practices:
We're storing the user's entire login row, excluding their password (although this is an internal-only app), in a cookie. It's JSON encoded.
Once the user logs in we have a secure HTTP cookie, readable only via Node.js for their username and their password so that we can continue to keep the user logged in automatically.
We have a app.get('*') route which constantly ensures that the user has their three cookies and updates their acc cookie with new preferences. This means that every time the user switches page or accesses a new AJAX item (all under the same routes) they have an updated cookie.
Every time a user performs an action we do this to get their user id: JSON.parse(res.cookies.acc).agent_id yuck!
Now, each user is able to perform actions on certain elements on the page; this affects everyone, as the application is internal and anybody can work on the data inside of it.
I know what I want to achieve and how it should be done in say PHP, but I can't figure out the most effective way in Node.js.
I've started creating a User module which would allow us to get the user who performed the action and neatly update their preferences etc. You can see this here, bearing in mind that it's a WIP. The issue I'm having with the module is that it doesn't have access to the user's cookies, since it's not "a part of" Express. Which explains the last bad practice.
What would be the best way to handle such a system and remain bad-practice free?

I doubt it meets all of your requirements, but it's worth checking out Drywall, a website and user system for Node.js.
Hopefully it (or parts of it) could be helpful to you.
http://jedireza.github.io/drywall/

Is this safe for client side code?

I'm writing a GWT application where users login and interact with their profile. I understand that each form entry needs to be validated on the server, however, I am unsure about potential security issues once the user has logged in.
Let me explain. My application (the relevant parts) works as follows:
1 - user enters email/pass
2 - this info is sent back to the server, a DB is queried, passwords are checked (which are salted and hashed)
3 - if the passwords match the profile associated w/ the email, this is considered success
Now I am unsure whether or not it is safe to pass the profile ID back to the client, which would then be used to query the DB for information relevant to the user to be displayed on the profile page.
Is there a possibility for a potential user to manually provide this profile ID and load a profile that way? My concern is that somebody w/ bad intentions could, if they knew the format of the profile ID, load an arbitrary amount of information from my DB without providing credentials.
-Nick

What you are dealing with here is a session management issue. Ideally, you want a way to keep track of logged in users (using random values as the session key), know how long they have been idle, be able to extend sessions as the user is using the site, and expire sessions.
Simply passing the profile ID to the client, and relying on it to send it back for each request is not sufficient - you are correct with your concern.
You want to keep a list of sessions with expiration times in a database. Every time an action is executed that needs user permissions (which should be pretty much everything), check to see if the session is still valid, if it is, extend it by however long you want. If it is expired, kill the session completely and log the user out.
You can store your session keys in a cookie (you have to trust the client at some point), but make sure they are non-deterministic and have a very large keyspace so it cannot be brute forced to get a valid session.
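A minimal sketch of such a session list, using an in-memory Map as a stand-in for the database table. The class and method names are hypothetical; the 32 random bytes give a 256-bit keyspace.

```javascript
const crypto = require('crypto');

// Sketch: sessions keyed by a random value, with sliding expiration.
class SessionStore {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.sessions = new Map(); // key -> { userId, expiresAt }
  }

  // Called after a successful login; the returned key goes into the cookie.
  create(userId, now = Date.now()) {
    const key = crypto.randomBytes(32).toString('hex');
    this.sessions.set(key, { userId, expiresAt: now + this.ttlMs });
    return key;
  }

  // Called on every request: returns the user id and extends the session,
  // or returns null (and drops the session) if it is unknown or expired.
  touch(key, now = Date.now()) {
    const s = this.sessions.get(key);
    if (!s) return null;
    if (s.expiresAt <= now) {
      this.sessions.delete(key);
      return null;
    }
    s.expiresAt = now + this.ttlMs;
    return s.userId;
  }
}
```

The server looks up the profile from the user id returned by touch(), so the client never needs to send a profile ID at all.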

Since you're logging a user in, you must be using a backend that supports sessions (PHP, .Net, Java, etc), as Stefan H. said. That means that you shouldn't keep any ids on your client side, since a simple id substitution might grant me full access to another user's account (depending on what functionality you expose on your client, of course).
Any server request to get sensitive info (or for any admin actions) for the logged in user should look something like getMyCreditCard(), setMyCreditCard(), etc (note that no unique ids are passed in).

Is there a possibility for a potential user to manually provide this profile ID and load a profile that way? My concern is that somebody w/ bad intentions could, if they knew the format of the profile ID, load an arbitrary amount of information from my DB without providing credentials.
Stefan H is correct that you can solve this via session management if your session keys are unguessable and unfixable.
Another way to solve it is to use crypto-primitives to prevent tampering with the ID.
For example, you can store a private key on your server and use it to sign the profile ID. On subsequent requests, your server can trust the profile ID if it passes the signature check.
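A sketch of the tamper-proofing idea. Note that it swaps in an HMAC with a server-side secret instead of an asymmetric signature; the effect for this use case is the same as long as only the server knows the secret. The token format "id.mac" and the function names are assumptions.

```javascript
const crypto = require('crypto');

// Sketch: append an HMAC-SHA256 tag to the profile ID.
function sign(profileId, secret) {
  const mac = crypto.createHmac('sha256', secret).update(String(profileId)).digest('hex');
  return profileId + '.' + mac;
}

// Returns the profile ID if the tag is valid, otherwise null.
function verify(token, secret) {
  const dot = token.lastIndexOf('.');
  if (dot < 0) return null;
  const profileId = token.slice(0, dot);
  const given = Buffer.from(token.slice(dot + 1), 'hex');
  const expected = crypto.createHmac('sha256', secret).update(profileId).digest();
  if (given.length !== expected.length) return null;
  // Constant-time comparison to avoid leaking how many bytes matched.
  return crypto.timingSafeEqual(given, expected) ? profileId : null;
}
```

A signed ID on its own never expires, so in practice it would still be combined with an expiry time or a session.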

Rule 1 - Avoid cooking up your own security solution and use existing tested approaches.
Rule 2 - If your server side is Java then you should be thinking along the lines of jsessionid. Spring Security will give you a good starting point to manage session ids with additional security features. There will be similar existing frameworks for PHP too (I did not see server-side language tags in the question).
Rule 3 - With GWT you come across JavaScript-based security issues. The Google team documents these and suggests XSRF and XSS prevention steps. Reference - https://developers.google.com/web-toolkit/articles/security_for_gwt_applications

CQRS Event Sourcing: Validate UserName uniqueness

Let's take a simple "Account Registration" example, here is the flow:
User visit the website
Click the "Register" button and fill out the form, click the "Save" button
MVC Controller: Validate UserName uniqueness by reading from ReadModel
RegisterCommand: Validate UserName uniqueness again (here is the question)
Of course, we can validate UserName uniqueness by reading from ReadModel in the MVC controller to improve performance and user experience. However, we still need to validate the uniqueness again in RegisterCommand, and obviously, we should NOT access ReadModel in Commands.
If we do not use Event Sourcing, we can query the domain model, so that's not a problem. But if we're using Event Sourcing, we are not able to query the domain model, so how can we validate UserName uniqueness in RegisterCommand?
Notice: User class has an Id property, and UserName is not the key property of the User class. We can only get the domain object by Id when using event sourcing.
BTW: In the requirement, if the entered UserName is already taken, the website should show the error message "Sorry, the user name XXX is not available" to the visitor. It's not acceptable to show a message, that says, "We are creating your account, please wait, we will send the registration result to you via Email later", to the visitor.
Any ideas? Many thanks!
[UPDATE]
A more complex example:
Requirement:
When placing an order, the system should check the client's ordering history; if he is a valuable client (if the client placed at least 10 orders per month in the last year, he is valuable), we take 10% off the order.
Implementation:
We create PlaceOrderCommand, and in the command, we need to query the ordering history to see if the client is valuable. But how can we do that? We shouldn't access ReadModel in command! As Mikael said, we can use compensating commands in the account registration example, but if we also use that in this ordering example, it would be too complex, and the code might be too difficult to maintain.

If you validate the username using the read model before you send the command, we are talking about a race condition window of a couple of hundred milliseconds where a real race condition can happen, which in my system is not handled. It is just too unlikely to happen compared to the cost of dealing with it.
However, if you feel you must handle it for some reason or if you just feel you want to know how to master such a case, here is one way:
You shouldn't access the read model from the command handler nor the domain when using event sourcing. However, what you could do is to use a domain service that would listen to the UserRegistered event in which you access the read model again and check whether the username still isn't a duplicate. Of course you need to use the UserGuid here as well as your read model might have been updated with the user you just created. If there is a duplicate found, you have the chance of sending compensating commands such as changing the username and notifying the user that the username was taken.
That is one approach to the problem.
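A sketch of that domain service, assuming the read model can be queried as a list of { userGuid, username } rows (a hypothetical shape). The command name ChangeUsernameCommand is only an example of a compensating command.

```javascript
// Sketch: react to a UserRegistered event and return a compensating
// command if another user already holds the same username.
function onUserRegistered(event, readModel) {
  const duplicate = readModel.some(function (row) {
    // Ignore the row that belongs to the user who was just registered.
    return row.username === event.username && row.userGuid !== event.userGuid;
  });
  if (!duplicate) return null;
  return {
    type: 'ChangeUsernameCommand',
    userGuid: event.userGuid,
    reason: 'duplicate username'
  };
}
```

As written, this would flag both registrations if both are already in the read model; a real service needs a rule for which one keeps the name.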
As you probably can see, it is not possible to do this in a synchronous request-response manner. To solve that, we are using SignalR to update the UI whenever there is something we want to push to the client (if they are still connected, that is). What we do is that we let the web client subscribe to events that contain information that is useful for the client to see immediately.
Update
For the more complex case:
I would say the order placement is less complex, since you can use the read model to find out if the client is valuable before you send the command. Actually, you could query that when you load the order form since you probably want to show the client that they'll get the 10% off before they place the order. Just add a discount to the PlaceOrderCommand and perhaps a reason for the discount, so that you can track why you are cutting profits.
But then again, if you really need to calculate the discount after the order was placed for some reason, again use a domain service that would listen to OrderPlacedEvent, and the "compensating" command in this case would probably be a DiscountOrderCommand or something. That command would affect the Order Aggregate root and the information could be propagated to your read models.
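For the order example, the "valuable client" rule from the question could be evaluated against the read model, either before the command is sent or in the domain service. A minimal sketch, assuming the order history is available as an array of Date objects and reading "at least 10 orders per month in the last year" as "each of the 12 full calendar months before the current one has at least 10 orders". Both function names are hypothetical.

```javascript
// Sketch: check the "valuable client" rule on a list of order dates.
function isValuableClient(orderDates, now) {
  const counts = new Map(); // "year-month" -> number of orders
  for (const d of orderDates) {
    const key = d.getUTCFullYear() + '-' + d.getUTCMonth();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Walk back over the 12 full calendar months before the current one.
  for (let i = 1; i <= 12; i++) {
    const m = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - i, 1));
    const key = m.getUTCFullYear() + '-' + m.getUTCMonth();
    if ((counts.get(key) || 0) < 10) return false;
  }
  return true;
}

// Discount rate to attach to the order: 10% for valuable clients.
function discountFor(orderDates, now) {
  return isValuableClient(orderDates, now) ? 0.10 : 0;
}
```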
For the duplicate username case:
You could send a ChangeUsernameCommand as the compensating command from the domain service. Or even something more specific, that would describe the reason why the username changed which also could result in the creation of an event that the web client could subscribe to so that you can let the user see that the username was a duplicate.
In the domain service context I would say that you also have the possibility to use other means to notify the user, such as sending an email, which could be useful since you cannot know if the user is still connected. Maybe that notification functionality could be initiated by the very same event that the web client is subscribing to.
When it comes to SignalR, I use a SignalR Hub that the users connect to when they load a certain form. I use the SignalR Group functionality, which allows me to create a group that I name after the value of the Guid I send in the command. This could be the userGuid in your case. Then I have event handlers that subscribe to events that could be useful for the client, and when an event arrives I can invoke a javascript function on all clients in the SignalR Group (which in your case would be only the one client creating the duplicate username). I know it sounds complex, but it really isn't. I had it all set up in an afternoon. There are great docs and examples on the SignalR Github page.

I think you are yet to have the mindset shift to eventual consistency and the nature of event sourcing. I had the same problem. Specifically I refused to accept that you should trust commands from the client that, using your example, say "Place this order with 10% discount" without the domain validating that the discount should go ahead. One thing that really hit home for me was something that Udi himself said to me (check the comments of the accepted answer).
Basically I came to realise that there is no reason not to trust the client; everything on the read side has been produced from the domain model, so there is no reason not to accept the commands. Whatever in the read side that says the customer qualifies for discount has been put there by the domain.
BTW: In the requirement, if the entered UserName is already taken, the website should show error message "Sorry, the user name XXX is not available" to the visitor. It's not acceptable to show a message, say, "We are creating your account, please wait, we will send the registration result to you via Email later", to the visitor.
If you are going to adopt event sourcing & eventual consistency, you will need to accept that sometimes it will not be possible to show error messages instantly after submitting a command. With the unique username example the chances of this happening are so slim (given that you check the read side before sending the command) that it's not worth worrying about too much, but a subsequent notification would need to be sent for this scenario, or perhaps you ask them for a different username the next time they log on. The great thing about these scenarios is that they get you thinking about business value & what's really important.
UPDATE : Oct 2015
Just wanted to add, that in actual fact, where public facing websites are concerned - indicating that an email is already taken is actually against security best practices. Instead, the registration should appear to have gone through successfully informing the user that a verification email has been sent, but in the case where the username exists, the email should inform them of this and prompt them to login or reset their password. Although this only works when using email addresses as the username, which I think is advisable for this reason.

There is nothing wrong with creating some immediately consistent read models (e.g. not over a distributed network) that get updated in the same transaction as the command.
Having read models be eventually consistent over a distributed network helps support scaling of the read model for read-heavy systems. But there's nothing to say you can't have a domain-specific read model that's immediately consistent.
The immediately consistent read model is only ever used to check data before issuing a command; you should never use it for directly displaying read data to a user (i.e. from a GET web request or similar). Use eventually consistent, scalable read models for that.

About uniqueness, I implemented the following:
A first command like "StartUserRegistration". The UserAggregate would be created no matter whether the user is unique or not, but with a status of RegistrationRequested.
On "UserRegistrationStarted" an asynchronous message would be sent to a stateless service "UsernamesRegistry". It would be something like "RegisterName".
The service would try to update (no queries, "tell don't ask") a table which would include a unique constraint.
If successful, the service would reply with another message (asynchronously), with a sort of authorization "UsernameRegistration", stating that the username was successfully registered. You can include some requestId to keep track in case of concurrent competing requests (unlikely).
The issuer of the above message now has an authorization that the name was registered by itself, so it can now safely mark the UserRegistration aggregate as successful. Otherwise, mark it as discarded.
Wrapping up:
This approach involves no queries.
User registration would be always created with no validation.
Process for confirmation would involve two asynchronous messages and one db insertion. The table is not part of a read model, but of a service.
Finally, one asynchronous command to confirm that User is valid.
At this point, a denormaliser could react to a UserRegistrationConfirmed event and create a read model for the user.
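The registry step above can be sketched as follows. An in-memory Map stands in for the table with the unique constraint, the asynchronous messaging is omitted, and the reply type UsernameRejected and the applyReply helper are hypothetical names.

```javascript
// Sketch: stand-in for the UsernamesRegistry service.
class UsernamesRegistry {
  constructor() {
    this.owners = new Map(); // username -> requestId that claimed it
  }

  // "Tell, don't ask": attempt the insert and report the outcome.
  registerName(username, requestId) {
    const owner = this.owners.get(username);
    if (owner !== undefined && owner !== requestId) {
      return { type: 'UsernameRejected', username, requestId };
    }
    this.owners.set(username, requestId); // repeated delivery of the same request is harmless
    return { type: 'UsernameRegistration', username, requestId };
  }
}

// Apply the registry's reply to the pending registration.
function applyReply(registration, reply) {
  const status = reply.type === 'UsernameRegistration' ? 'Confirmed' : 'Discarded';
  return Object.assign({}, registration, { status });
}
```

With a real database, the check-and-set in registerName would be a single INSERT that fails on the unique constraint, so two concurrent requests cannot both succeed.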

Like many others, when implementing an event-sourced system we encountered the uniqueness problem.
At first I was a supporter of letting the client access the query side before sending a command in order to find out if a username is unique or not. But then I came to see that having a back-end that has zero validation on uniqueness is a bad idea. Why enforce anything at all when it's possible to post a command that would corrupt the system? A back-end should validate all its input, else you're open to inconsistent data.
What we did was create an index table at the command side. For example, in the simple case of a username that needs to be unique, just create a user_name_index table containing the field(s) that need to be unique. Now the command side is able to query a username's uniqueness. After the command has been executed it's safe to store the new username in the index.
Something like that could also work for the Order discount problem.
The benefits are that your command back-end properly validates all input so no inconsistent data could be stored.
A downside might be that you need an extra query for each uniqueness constraint and you are adding extra complexity.

I think for such cases, we can use a mechanism like "advisory lock with expiration".
Sample execution:
Check whether the username exists in the eventually consistent read model.
If it does not exist, try to push the username as the key, with some expiration, into a key-value store or cache such as Redis or Couchbase.
If successful, raise userRegisteredEvent.
If the username exists in either the read model or the cache storage, inform the visitor that the username is taken.
You can even use an SQL database: insert the username as the primary key of some lock table, and then a scheduled job can handle expirations.
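The steps above can be sketched with an in-memory lock table. The class and function names are hypothetical, and the Map stands in for the key-value store; with Redis, tryAcquire would correspond to a single SET with the NX and PX options.

```javascript
// Sketch: advisory locks that expire after ttlMs.
class ExpiringLocks {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.locks = new Map(); // key -> expiry timestamp
  }

  // Returns true if the caller obtained the lock on `key`.
  tryAcquire(key, now = Date.now()) {
    const expiresAt = this.locks.get(key);
    if (expiresAt !== undefined && expiresAt > now) return false;
    this.locks.set(key, now + this.ttlMs);
    return true;
  }
}

// readModelNames: Set of usernames already present in the read model.
function register(username, readModelNames, locks, now) {
  if (readModelNames.has(username)) return { ok: false, reason: 'taken' };
  if (!locks.tryAcquire(username, now)) return { ok: false, reason: 'taken' };
  return { ok: true, event: { type: 'userRegisteredEvent', username } };
}
```

The expiration has to be longer than the time the read model needs to catch up, otherwise the lock can lapse before the name shows up there.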

Have you considered using a "working" cache as sort of an RSVP? It's hard to explain because it works in a bit of a cycle, but basically, when a new username is "claimed" (that is, the command was issued to create it), you place the username in the cache with a short expiration (long enough to account for another request getting through the queue and denormalized into the read model). If it's one service instance, then in memory would probably work, otherwise centralize it with Redis or something.
Then while the next user is filling out the form (assuming there's a front end), you asynchronously check the read model for availability of the username and alert the user if it's already taken. When the command is submitted, you check the cache (not the read model) in order to validate the request before accepting the command (before returning 202); if the name is in the cache, don't accept the command, if it's not then you add it to the cache; if adding it fails (duplicate key because some other process beat you to it), then assume the name is taken -- then respond to the client appropriately. Between the two things, I don't think there'll be much opportunity for a collision.
If there's no front end, then you can skip the async lookup or at least have your API provide the endpoint to look it up. You really shouldn't be allowing the client to speak directly to the command model anyway, and placing an API in front of it would allow you to have the API act as a mediator between the command and read hosts.

It seems to me that perhaps the aggregate is wrong here.
In general terms, if you need to guarantee that value Z belonging to Y is unique within set X, then use X as the aggregate. X, after all, is where the invariant really exists (only one Z can be in X).
In other words, your invariant is that a username may only appear once within the scope of all of your application's users (or it could be a different scope, such as within an Organization, etc.). If you have an aggregate "ApplicationUsers" and send the "RegisterUser" command to that, then you should be able to have what you need in order to ensure that the command is valid prior to storing the "UserRegistered" event. (And, of course, you can then use that event to create the projections you need in order to do things such as authenticate the user without having to load the entire "ApplicationUsers" aggregate.)
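A sketch of such an aggregate, rebuilt from its own events so that no read model is needed to check the invariant. The event shape and method names are assumptions.

```javascript
// Sketch: the set of all users as one event-sourced aggregate.
class ApplicationUsers {
  constructor() {
    this.usernames = new Set();
  }

  // Rehydrate the aggregate from its event stream.
  static fromEvents(events) {
    const agg = new ApplicationUsers();
    events.forEach(function (e) { agg.apply(e); });
    return agg;
  }

  apply(event) {
    if (event.type === 'UserRegistered') this.usernames.add(event.username);
  }

  // Handles the RegisterUser command: returns the new event or throws.
  registerUser(userId, username) {
    if (this.usernames.has(username)) {
      throw new Error('Sorry, the user name ' + username + ' is not available');
    }
    const event = { type: 'UserRegistered', userId, username };
    this.apply(event);
    return event;
  }
}
```

The trade-off is that every registration goes through one aggregate, so its event stream grows with the number of users and registrations are serialized.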
