Client Server Data Exchange Persistence - Smells - web

Suppose I have a client that sends some RunLogicCommand with input to a server. The server responds with some output which is a report for the user to verify. At this point, the server has not persisted anything. The client then sends back the entire report in a separate SaveCommand which will then persist the report data.
To me, certain parts of this exchange seem unnecessary. That is, once the user has verified the report, it seems unnecessary for them to send the entire report back to the server for persistence. Perhaps there is a chance some sensitive data could exposed here as well?
What is the typical approach in this case?
I can see two options:
The user just sends the RunLogicCommand with Input AGAIN with some flag specifying it should be persisted. I don't really like this option since the logic could be complex and take some time to compute.
cache the report on the server (or different service or even db), then just have the client send back the SaveCommand with the ID of the report to save.
Are there any problems with either of these approaches? Is there a better, more typical approach?
Thanks!

There is no single best solution here:
The cons for the approach you mentioned firsts are:
Increased network traffic,potentially increasing costs and giving slower response times
Can you be sure that the document you sent is the same one that has been received. You can but it would require extra work.
As you mentioned, there is an increased risk that sensitive data is exposed. However, you are sending it to the client.
The cons for the first of your two options are:
Running the report twice would increase the load on the server, giving an extra cost due to the need for more processing capacity.
If the underlying data has changed between the two requests. Then the report that was verified by the user and the report stored in the database may not be the same.
I would use a variation of your second option:
Store the report in the database as soon as it has been generated, with status "waiting for user verification"
When the user verifies the report, update the status as verified.
To avoid having many unverified reports in the database, you could have a batch job that checks for and deletes all unverified reports that are older than x days.

Related

Rest API real time Tricky Question- Need Answer

I was recently interviewed by a MNC technical panel and they asked me different questions related to RestAPI , i was able to answer all but below 2 questions though i answered but not sure if those are correct answers. Can somebody answer my queries with real time examples
1) How can i secure my Rest API when somebody send request from Postman.The user provides all the correct information in the header like session id, Token etc.
My answer was: The users token sent in the header of the request should be associated with the successfully authenticated user info then only the user will be granted access if the Request either comes from Postman or application calls these API.(The panel said no to my answer)
2) How can i handle concurrency in Rest API Means if multiple users are trying to access the API at the same given time (For e.g multiple post request are coming to update data in a table) how will you make sure one request is served at one time and accordingly the values are updated as requested by different user request.
2) My answer was: In Entitiy framework we have a class called DbUpdateConcurrencyException, This class takes of handling concurrency and serves one request is served at a time.
I am not sure about my both the above answers and i did not find any specific answer on Googling also.
Your expert help is appreciated.
Thanks
1) It is not possible, requests from Postman or any other client or proxy (Burp, ZAP, etc) are indistinguishable from browser requests, if the user has appropriate credentials (like for example can observe and copy normal requests). It is not possible to authenticate the client application, only the client user.
2) It would be really bad if a web application could only serve one client at a time. Think of large traffic like Facebook. :) In many (maybe most?) stacks, each request gets its own thread (or similar) to run, and that finishes when the request-response ends. These threads are not supposed to directly communicate with each other while running. Data consistency is a requirement of the persistence technology, ie. if you are using a database for example, it must guarantee that database queries are run one after the other. Note that if an application runs multiple queries, database transactions or locks need to be used on the database level to maintain consistency. But this is not at all about client requests, it's about how you use your persistence technology to achieve consistent data. With traditional RDBMS it's mostly easy, with other persistence technologies (like for example using plaintext files for storage) it's much harder, because file operations typically don't support a facility similar to transactions (but they do support locks, which you have to manage manually).

How to manage GUIDs offline

Given that clients can tamper with GUIDs if they are generated client-side, wondering how to mitigate this problem if you allow working offline.
Say you have a Todo list application and are working offline. From what I'm thinking, as you create todos, the client is creating GUIDs for the todos, as well as any attachments or associated records. Then say you go back online and it syncs. The GUIDs created on the client could have been tampered with, so something possibly needs to happen during a merge. Maybe all new GUIDs are created server-side, and sent back to the client to overwrite the client-generated ones. Not sure.
Wondering what best-practice is here.
I think yes, ids could be reassigned when sent to the server. One way this could be done is have a client-side id and a server-side id, the latter only assigned if it's saved. The client-side id can then also be removed from the design, but then upon a succesful save all references must be updated.
And then the problem is the inevitable inconsistency, because what happens if the server already received the update, assigned a server-side id, but the confirmation response never made it back to the client. Upon the next download, the client will see a new item on the server which it cannot associate with any client-side item, unless there is some kind of a heuristic to identify duplicates (eg. if all fields are the same in a client item without a server-side id, it is most probably the same).
I think this is less of a security question though, if the format of the id is validated (for example it must be a guid, ie. numbers, letters and dashes), it doesn't really matter what exactly the client sends. So from a security point of view, this is almost purely an input validation question, which of course must be in place, errors must be thrown on already existing ids and so on. Then it touches on access control as well, if multiple users are using the app, but that's a different topic, any access must be authorized anyway, and access control decisions must not be made solely on the id. That is, it's not a good access control model if you can access anything you know the id of.

Azure Change Feed Support and multiple local clients

We have a scenario where multiple clients would like to get updates from Document Db inserts, but they are not available online all the time.
Example: Suppose there are three clients registered with the system, but only one is online at present time. When the online client inserts/updates a document, we want the offline client(s) on wakes up to go look at change feed and update itself independently.
Now is there a way for each client to maintain it's own feed to the same partition (when they were last synced) and get the changes when the come online based on last sync?
When using change feed, you use continuation token per partition. Change feed continuation tokens do not expire, thus you can continue from any point. Each client can keep its own continuation token and read changes as needed/wakes up, this essentially means that each client can keep its own feed for each partition.

CQRS Event Sourcing: Validate UserName uniqueness

Let's take a simple "Account Registration" example, here is the flow:
User visit the website
Click the "Register" button and fill out the form, click the "Save" button
MVC Controller: Validate UserName uniqueness by reading from ReadModel
RegisterCommand: Validate UserName uniqueness again (here is the question)
Of course, we can validate UserName uniqueness by reading from ReadModel in the MVC controller to improve performance and user experience. However, we still need to validate the uniqueness again in RegisterCommand, and obviously, we should NOT access ReadModel in Commands.
If we do not use Event Sourcing, we can query the domain model, so that's not a problem. But if we're using Event Sourcing, we are not able to query the domain model, so how can we validate UserName uniqueness in RegisterCommand?
Notice: User class has an Id property, and UserName is not the key property of the User class. We can only get the domain object by Id when using event sourcing.
BTW: In the requirement, if the entered UserName is already taken, the website should show the error message "Sorry, the user name XXX is not available" to the visitor. It's not acceptable to show a message, that says, "We are creating your account, please wait, we will send the registration result to you via Email later", to the visitor.
Any ideas? Many thanks!
[UPDATE]
A more complex example:
Requirement:
When placing an order, the system should check the client's ordering history, if he is a valuable client (if the client placed at least 10 orders per month in the last year, he is valuable), we make 10% off to the order.
Implementation:
We create PlaceOrderCommand, and in the command, we need to query the ordering history to see if the client is valuable. But how can we do that? We shouldn't access ReadModel in command! As Mikael said, we can use compensating commands in the account registration example, but if we also use that in this ordering example, it would be too complex, and the code might be too difficult to maintain.
If you validate the username using the read model before you send the command, we are talking about a race condition window of a couple of hundred milliseconds where a real race condition can happen, which in my system is not handled. It is just too unlikely to happen compared to the cost of dealing with it.
However, if you feel you must handle it for some reason or if you just feel you want to know how to master such a case, here is one way:
You shouldn't access the read model from the command handler nor the domain when using event sourcing. However, what you could do is to use a domain service that would listen to the UserRegistered event in which you access the read model again and check whether the username still isn't a duplicate. Of course you need to use the UserGuid here as well as your read model might have been updated with the user you just created. If there is a duplicate found, you have the chance of sending compensating commands such as changing the username and notifying the user that the username was taken.
That is one approach to the problem.
As you probably can see, it is not possible to do this in a synchronous request-response manner. To solve that, we are using SignalR to update the UI whenever there is something we want to push to the client (if they are still connected, that is). What we do is that we let the web client subscribe to events that contain information that is useful for the client to see immediately.
Update
For the more complex case:
I would say the order placement is less complex, since you can use the read model to find out if the client is valuable before you send the command. Actually, you could query that when you load the order form since you probably want to show the client that they'll get the 10% off before they place the order. Just add a discount to the PlaceOrderCommand and perhaps a reason for the discount, so that you can track why you are cutting profits.
But then again, if you really need to calculate the discount after the order was places for some reason, again use a domain service that would listen to OrderPlacedEvent and the "compensating" command in this case would probably be a DiscountOrderCommand or something. That command would affect the Order Aggregate root and the information could be propagated to your read models.
For the duplicate username case:
You could send a ChangeUsernameCommand as the compensating command from the domain service. Or even something more specific, that would describe the reason why the username changed which also could result in the creation of an event that the web client could subscribe to so that you can let the user see that the username was a duplicate.
In the domain service context I would say that you also have the possibility to use other means to notify the user, such like sending an email which could be useful since you cannot know if the user is still connected. Maybe that notification functionality could be initiated by the very same event that the web client is subscribing to.
When it comes to SignalR, I use a SignalR Hub that the users connects to when they load a certain form. I use the SignalR Group functionality which allows me to create a group which I name the value of the Guid I send in the command. This could be the userGuid in your case. Then I have Eventhandler that subscribe to events that could be useful for the client and when an event arrives I can invoke a javascript function on all clients in the SignalR Group (which in this case would be only the one client creating the duplicate username in your case). I know it sounds complex, but it really isn't. I had it all set up in an afternoon. There are great docs and examples on the SignalR Github page.
I think you are yet to have the mindset shift to eventual consistency and the nature of event sourcing. I had the same problem. Specifically I refused to accept that you should trust commands from the client that, using your example, say "Place this order with 10% discount" without the domain validating that the discount should go ahead. One thing that really hit home for me was something that Udi himself said to me (check the comments of the accepted answer).
Basically I came to realise that there is no reason not to trust the client; everything on the read side has been produced from the domain model, so there is no reason not to accept the commands. Whatever in the read side that says the customer qualifies for discount has been put there by the domain.
BTW: In the requirement, if the entered UserName is already taken, the website should show error message "Sorry, the user name XXX is not available" to the visitor. It's not acceptable to show a message, say, "We are creating your account, please wait, we will send the registration result to you via Email later", to the visitor.
If you are going to adopt event sourcing & eventual consistency, you will need to accept that sometimes it will not be possible to show error messages instantly after submitting a command. With the unique username example the chances of this happening are so slim (given that you check the read side before sending the command) its not worth worrying about too much, but a subsequent notification would need to be sent for this scenario, or perhaps ask them for a different username the next time they log on. The great thing about these scenarios is that it gets you thinking about business value & what's really important.
UPDATE : Oct 2015
Just wanted to add, that in actual fact, where public facing websites are concerned - indicating that an email is already taken is actually against security best practices. Instead, the registration should appear to have gone through successfully informing the user that a verification email has been sent, but in the case where the username exists, the email should inform them of this and prompt them to login or reset their password. Although this only works when using email addresses as the username, which I think is advisable for this reason.
There is nothing wrong with creating some immediately consistent read models (e.g. not over a distributed network) that get updated in the same transaction as the command.
Having read models be eventually consistent over a distributed network helps support scaling of the read model for heavy reading systems. But there's nothing to say you can't have a domain specific read model thats immediately consistent.
The immediately consistent read model is only ever used to check data before issuing a command, you should never use it for directly displaying read data to a user (i.e. from a GET web request or similar). Use eventually consistent, scaleable read models for that.
About uniqueness, I implemented the following:
A first command like "StartUserRegistration". UserAggregate would be created no matter if user is unique or not, but with a status of RegistrationRequested.
On "UserRegistrationStarted" an asynchronous message would be sent to a stateless service "UsernamesRegistry". would be something like "RegisterName".
Service would try to update (no queries, "tell don't ask") table which would include a unique constraint.
If successful, service would reply with another message (asynchronously), with a sort of authorization "UsernameRegistration", stating that username was successfully registered. You can include some requestId to keep track in case of concurrent competence (unlikely).
The issuer of the above message has now an authorization that the name was registered by itself so now can safely mark the UserRegistration aggregate as successful. Otherwise, mark as discarded.
Wrapping up:
This approach involves no queries.
User registration would be always created with no validation.
Process for confirmation would involve two asynchronous messages and one db insertion. The table is not part of a read model, but of a service.
Finally, one asynchronous command to confirm that User is valid.
At this point, a denormaliser could react to a UserRegistrationConfirmed event and create a read model for the user.
Like many others when implementing a event sourced based system we encountered the uniqueness problem.
At first I was a supporter of letting the client access the query side before sending a command in order to find out if a username is unique or not. But then I came to see that having a back-end that has zero validation on uniqueness is a bad idea. Why enforce anything at all when it's possible to post a command that would corrupt the system ? A back-end should validate all it's input else you're open for inconsistent data.
What we did was create an index table at the command side. For example, in the simple case of a username that needs to be unique, just create a user_name_index table containing the field(s) that need to be unique. Now the command side is able to query a username's uniqueness. After the command has been executed it's safe to store the new username in the index.
Something like that could also work for the Order discount problem.
The benefits are that your command back-end properly validates all input so no inconsistent data could be stored.
A downside might be that you need an extra query for each uniqueness constraint and you are enforcing extra complexity.
I think for such cases, we can use a mechanism like "advisory lock with expiration".
Sample execution:
Check username exists or not in eventually consistent read model
If not exists; by using a redis-couchbase like keyvalue storage or cache; try to push the username as key field with some expiration.
If successful; then raise userRegisteredEvent.
If either username exists in read model or cache storage, inform visitor that username has taken.
Even you can use an sql database; insert username as a primary key of some lock table; and then a scheduled job can handle expirations.
Have you considered using a "working" cache as sort of an RSVP? It's hard to explain because it works in a bit of a cycle, but basically, when a new username is "claimed" (that is, the command was issued to create it), you place the username in the cache with a short expiration (long enough to account for another request getting through the queue and denormalized into the read model). If it's one service instance, then in memory would probably work, otherwise centralize it with Redis or something.
Then while the next user is filling out the form (assuming there's a front end), you asynchronously check the read model for availability of the username and alert the user if it's already taken. When the command is submitted, you check the cache (not the read model) in order to validate the request before accepting the command (before returning 202); if the name is in the cache, don't accept the command, if it's not then you add it to the cache; if adding it fails (duplicate key because some other process beat you to it), then assume the name is taken -- then respond to the client appropriately. Between the two things, I don't think there'll be much opportunity for a collision.
If there's no front end, then you can skip the async look up or at least have your API provide the endpoint to look it up. You really shouldn't be allowing the client to speak directly to the command model anyway, and placing an API in front of it would allow you to have the API to act as a mediator between the command and read hosts.
It seems to me that perhaps the aggregate is wrong here.
In general terms, if you need to guarantee that value Z belonging to Y is unique within set X, then use X as the aggregate. X, after all, is where the invariant really exists (only one Z can be in X).
In other words, your invariant is that a username may only appear once within the scope of all of your application's users (or could be a different scope, such as within an Organization, etc.) If you have an aggregate "ApplicationUsers" and send the "RegisterUser" command to that, then you should be able to have what you need in order to ensure that the command is valid prior to storing the "UserRegistered" event. (And, of course, you can then use that event to create the projections you need in order to do things such as authenticate the user without having to load the entire "ApplicationUsers" aggregate.

Best way to limit (and record) login attempts

Obviously some sort of mechanism for limiting login attempts is a security requisite. While I like the concept of an exponentially increasing time between attempts, what I'm not sure of storing the information. I'm also interested in alternative solutions, preferrably not including captchas.
I'm guessing a cookie wouldn't work due to blocking cookies or clearing them automatically, but would sessions work? Or does it have to be stored in a database? Being unaware of what methods can/are being used so I simply don't know what's practical.
Use some columns in your users table 'failed_login_attempts' and 'failed_login_time'. The first one increments per failed login, and resets on successful login. The second one allows you to compare the current time with the last failed time.
Your code can use this data in the db to determine how long it waits to lock out users, time between allowed logins etc
Assuming google has done the necessary usability testing (not an unfair assumption) and decided to use captchas , I'd suggest going along with them.
Increasing timeouts is frustrating when I'm a genuine user and have forgotten my password (with so many websites and their associated passwords that happens a lot , especially to me)
Storing attempts in the database is the best solution IMHO since it gives you the auditing records of the security breach attempts. Depending on your application this may or may not be a legal requirement.
By recording all bad attempts you can also gather higher level information, such as if the requests are coming from one IP address (i.e. someone / thing is attempting a brute force attack) so you can block the IP address. This can be VERY usefull information.
Once you have determined a threshold, why not force them to request the email to be sent to their email address (i.e. similar to 'I have forgotten my password'), or you can go for the CAPCHA approach.
Answers in this post prioritize database centered solutions because they provide a structure of records that make auditing and lockout logic convenient.
While the answers here address guessing attacks on individual users, a major concern with this approach is that it leaves the system open to Denial of Service attacks. Any and every request from the world should not trigger database work.
An alternative (or additional) layer of security should be implemented earlier in the req/ res cycle to protect the application and database from performing lock out operations that can be expensive and are unnecessary.
Express-Brute is an excellent example that utilizes Redis caching to filter out malicious requests while allowing honest ones.
You know which userid is being hit, keep a flag and when it reaches a threshold value simply stop accepting anything for that user. But that means you store an extra data value for every user.
I like the concept of an exponentially increasing time between attempts, [...]
Instead of using exponentially increasing time, you could actually have a randomized lag between successive attempts.
Maybe if you explain what technology you are using people here will be able to help with more specific examples.
Lock out Policy is all well and good but there is a balance.
One consideration is to think about the consruction of usernames - guessable? Can they be enumerated at all?
I was on an External App Pen Test for a dotcom with an Employee Portal that served Outlook Web Access /Intranet Services, certain Apps. It was easy to enumerate users (the Exec /Managament Team on the web site itself, and through the likes of Google, Facebook, LinkedIn etc). Once you got the format of the username logon (firstname then surname entered as a single string) I had the capability to shut 100's of users out due to their 3 strikes and out policy.
Store the information server-side. This would allow you to also defend against distributed attacks (coming from multiple machines).
You may like to say block the login for some time say for example, 10 minutes after 3 failure attempts for example. Exponentially increasing time sounds good to me. And yes, store the information at the server side session or database. Database is better. No cookies business as it is easy to manipulate by the user.
You may also want to map such attempts against the client IP adrress as it is quite possible that valid user might get a blocked message while someone else is trying to guess valid user's password with failure attempts.

Resources