Relialibility of user identity in actions-on-google for Dialogflow authentification

Relialibility of user identity in actions-on-google for Dialogflow authentification - security

I am currently developing a strategy for Dialogflow on https://passportjs.org.
From what I've learnt, Dialogflow doesn't authenticate users. So I'm thinking about making a strategy (for passportjs) that identify users from every plateform differently (analyse the originalRequest differently for each plateform).
For example, the Telegram originalRequest has this field:
originalRequest.data.message.from.id
The Telegram says this field is a:
"Unique identifier for this user or bot"
So I think it is safe to use it for authentication and identify every intent of my users fulfilled by my webhook.
I was wondering about the actions-on-google authentication and I found the field originalRequest.data.user.userId.
The documentation says:
"Users can reset this identifier, so don't store important user data keyed off this identifier, because once it's reset, that information is no longer accessible by the user."
So the only reason to not trust the userId is because it can be reset? At the end of the documentation it says:
User ID lifetime - User IDs are reset automatically after 30 days of inactivity or if users unlink their accounts on the device.
And:
"If a registered user's voice isn't recognized by the device or no registered voice exists, then a different ID is used that is unique for just that conversation."
How to differentiate users from one other? Can some IDs be recycled?

The best way to differentiate users from each other is to use the userId field, as you've determined. On the AoG platform, the userId is meant to be used somewhat like a web cookie can be used - if you see it again, you are assured that this is the same user that used it last time. But if you see a new one, you have to assume that you've never seen this user before, even if it means they deleted the cookie.
To be clear - most of the time, the UserId will remain the same and you can expect returning users to have the same ID. This won't be true in only three cases:
They have reset the ID for this Action. So they have deliberately chosen to start over.
They didn't use the Action for 30 days, in which case it makes sense to treat them as a brand new user anyway in most cases.
They were not recognized as a normal user of this device, so they are treated anonymously. (This is the equivalent of the clunky "Do not remember me on this machine" setting you see on websites, which forces a session cookie rather than a persistent cookie.)
The phrasing is poor in the documentation - I think it is meant to remind developers that the user is ultimately in charge of their privacy. And Google both forces you to do the same and adopt policies that do so.
IDs will not be recycled. In fact, they won't even be re-used between different Actions, even for the same Assistant account.
Summary: If you see the same UserId, you can trust it is the same user you saw before. If you see a new one, assume they are a new user.
If you want a more robust way to identify users, you might consider using Account Linking which puts you in control of the identifying token. But that has significant additional overhead.
Be careful when using other authentication methods - Google limits how you're allowed to use them as part of an Action, and expressly forbids them in some cases. See the General Policies for details.

Related

How do services/platforms uniquely identify users?

If you log into a platform (Twitch, Blizzard, Steam, Most Crypto exchanges, Most Banks) from a new device you'll typically get an email stating so.
As far as my knowledge goes, the only information you can get on a request is
IP address
Device Operating system & version
Browser type & version
Are these platforms basing their "unique" users off of this information alone and/or am is there more information that can be gathered?

From a security perspective the largest thing is your identity or how you authenticate. That's king. The email stating "hey this is a new device" I've seen handled differently from site to site. Most commonly it's actually browser cache and I see banks specifically use browser cache to store these kinds of tokens. Otherwise every time your cellphone connected to a new cell tower you'd likely be flagged as different. They're not necessarily the same as an authentication token, rather it just says hey I've authenticated as this user to this site before. Since it's generated by the service provider, the service provider knows to trust it, and it's nearly impossible to hack (assuming it's implemented correctly).
From my own experience the operating systems and browser types, that's more record keeping than actionable insights, however you could build a security system that takes into account an IP address from very different geo-locations. I.e. why is this guy from the US logging in from China. They just logged in from California 3 hours ago, this is impossible. I don't believe most sites really go to that extent though. I do see MFA providers saying "hey there's a login from china, do you want to approve?". That workflow makes a lot more sense.
The last part of your question is tricky, regarding "unique users." Most calculate that based off the number of sessions opened (tabs), or in the case of Twitch (since you mentioned them specifically), the number of tabs that are streaming that video in. These open platforms where anyone without an account can stream the content obviously treat this differently than say Netflix that makes you authenticate and each account has a limited number of sessions that can be open.

AFAIK, most of the systems like this stores a cookie in your browser when you log (not the session cookie, just a random ID) that is also assiciated to your account in the provider database, so when you came back, you log in, and they check whether you have that cookie set and in case if the ID matches
They you can probably do some more advance stuff with that ID, like base that value from the browser, OS, expire date and so on

Spotify app - allowed to save user settings by username?

I have a Spotify app and want to persist basic settings per user between sessions. I see the User object has a username field, so it would be easy to do this using my own backend. My question is, is this allowed, without requiring the user to log in, agree to some TOS, etc? Every app I see that persists any data requires me to log in with Facebook.

Usernames are typically obfuscated out in the Spotify API, so they're not the best thing to use. However, the anonymous ID for the user is the same for a given user/app ID combo across multiple machines, so you could use that instead. This sort of thing is what we designed the anonymous ID for, so you're good to go on the ToS front.

I can't find anything that restricts you from load/storing data from your own servers and I've seen 'you'd have to use your own server' suggested in a number of questions.
Not sure why other apps would involve FB - probably to get more info from the user or promote their product.
You should use the User's URI instead of their username though. I would expect it be more stable than the username and less likely to be little Bobby Tables.

Is this safe for client side code?

I'm writing a GWT application where users login and interact with their profile. I understand that each form entry needs to be validated on the server, however, I am unsure about potential security issues once the user has logged in.
Let me explain. My application (the relevant parts) works as follows:
1 - user enters email/pass
2 - this info is sent back to the server, a DB is queried, passwords are checked (which are salted and hashed)
3. if the passwords match the profile associated w/ the email, this is considered success
Now I am unsure whether or not it is safe to pass the profile ID back to the client, which would then be used to query the DB for information relevant to the user to be displayed on the profile page.
Is there a possibility for a potential user to manually provide this profile ID and load a profile that way? My concern is that somebody w/ bad intentions could, if they knew the format of the profile ID, load an arbitrary amount of information from my DB without providing credentials.
-Nick

What you are dealing with here is a session management issue. Ideally, you want a way to keep track of logged in users (using random values as the session key), know how long they have been idle, be able to extend sessions as the user is using the site, and expire sessions.
Simply passing the profile ID to the client, and relying on it to send it back for each request is not sufficient - you are correct with your concern.
You want to keep a list of sessions with expiration times in a database. Every time an action is executed that needs user permissions (which should be pretty much everything), check to see if the session is still valid, if it is, extend it by however long you want. If it is expired, kill the session completely and log the user out.
You can store your session keys in a cookie (you have to trust the client at some point), but make sure they are non-deterministic and have a very large keyspace so it cannot be brute forced to get a valid session.

Since you're logging a user in, you must be using a backend that supports sessions (PHP, .Net, JAVA, etc), as Stefan H. said. That means that you shouldn't keep any ids on your client side, since a simple id substitution might grant me full access to another user's account (depending on what functionality you expose on your client, of course).
Any server request to get sensitive info (or for any admin actions) for the logged in user should look something like getMyCreditCard(), setMyCreditCard(), etc (note that no unique ids are passed in).

Is there a possibility for a potential user to manually provide this profile ID and load a profile that way? My concern is that somebody w/ bad intentions could, if they knew the format of the profile ID, load an arbitrary amount of information from my DB without providing credentials.
Stefan H is correct that you can solve this via session management if your session keys are unguessable and unfixable.
Another way to solve it is to use crypto-primitives to prevent tampering with the ID.
For example, you can store a private key on your server and use it to sign the profile ID. On subsequent requests, your server can trust the profile ID if it passes the signature check.

Rule 1 - Avoid cooking up your own security solution and use existing tested approaches.
Rule 2 - If your server side is java then you should be thinking along the lines of jsessionid. Spring Security will give you a good starting point to manage session ids with additional security features. There will be similar existing frameworks across php too (i did not see server side language tags in the question).
Rule 3 - With GWT you come across javascript based security issues with Google Team documents and suggests XSRF and XSS security prevention steps. Reference - https://developers.google.com/web-toolkit/articles/security_for_gwt_applications

CQRS Event Sourcing: Validate UserName uniqueness

Let's take a simple "Account Registration" example, here is the flow:
User visit the website
Click the "Register" button and fill out the form, click the "Save" button
MVC Controller: Validate UserName uniqueness by reading from ReadModel
RegisterCommand: Validate UserName uniqueness again (here is the question)
Of course, we can validate UserName uniqueness by reading from ReadModel in the MVC controller to improve performance and user experience. However, we still need to validate the uniqueness again in RegisterCommand, and obviously, we should NOT access ReadModel in Commands.
If we do not use Event Sourcing, we can query the domain model, so that's not a problem. But if we're using Event Sourcing, we are not able to query the domain model, so how can we validate UserName uniqueness in RegisterCommand?
Notice: User class has an Id property, and UserName is not the key property of the User class. We can only get the domain object by Id when using event sourcing.
BTW: In the requirement, if the entered UserName is already taken, the website should show the error message "Sorry, the user name XXX is not available" to the visitor. It's not acceptable to show a message, that says, "We are creating your account, please wait, we will send the registration result to you via Email later", to the visitor.
Any ideas? Many thanks!
[UPDATE]
A more complex example:
Requirement:
When placing an order, the system should check the client's ordering history, if he is a valuable client (if the client placed at least 10 orders per month in the last year, he is valuable), we make 10% off to the order.
Implementation:
We create PlaceOrderCommand, and in the command, we need to query the ordering history to see if the client is valuable. But how can we do that? We shouldn't access ReadModel in command! As Mikael said, we can use compensating commands in the account registration example, but if we also use that in this ordering example, it would be too complex, and the code might be too difficult to maintain.

If you validate the username using the read model before you send the command, we are talking about a race condition window of a couple of hundred milliseconds where a real race condition can happen, which in my system is not handled. It is just too unlikely to happen compared to the cost of dealing with it.
However, if you feel you must handle it for some reason or if you just feel you want to know how to master such a case, here is one way:
You shouldn't access the read model from the command handler nor the domain when using event sourcing. However, what you could do is to use a domain service that would listen to the UserRegistered event in which you access the read model again and check whether the username still isn't a duplicate. Of course you need to use the UserGuid here as well as your read model might have been updated with the user you just created. If there is a duplicate found, you have the chance of sending compensating commands such as changing the username and notifying the user that the username was taken.
That is one approach to the problem.
As you probably can see, it is not possible to do this in a synchronous request-response manner. To solve that, we are using SignalR to update the UI whenever there is something we want to push to the client (if they are still connected, that is). What we do is that we let the web client subscribe to events that contain information that is useful for the client to see immediately.
Update
For the more complex case:
I would say the order placement is less complex, since you can use the read model to find out if the client is valuable before you send the command. Actually, you could query that when you load the order form since you probably want to show the client that they'll get the 10% off before they place the order. Just add a discount to the PlaceOrderCommand and perhaps a reason for the discount, so that you can track why you are cutting profits.
But then again, if you really need to calculate the discount after the order was places for some reason, again use a domain service that would listen to OrderPlacedEvent and the "compensating" command in this case would probably be a DiscountOrderCommand or something. That command would affect the Order Aggregate root and the information could be propagated to your read models.
For the duplicate username case:
You could send a ChangeUsernameCommand as the compensating command from the domain service. Or even something more specific, that would describe the reason why the username changed which also could result in the creation of an event that the web client could subscribe to so that you can let the user see that the username was a duplicate.
In the domain service context I would say that you also have the possibility to use other means to notify the user, such like sending an email which could be useful since you cannot know if the user is still connected. Maybe that notification functionality could be initiated by the very same event that the web client is subscribing to.
When it comes to SignalR, I use a SignalR Hub that the users connects to when they load a certain form. I use the SignalR Group functionality which allows me to create a group which I name the value of the Guid I send in the command. This could be the userGuid in your case. Then I have Eventhandler that subscribe to events that could be useful for the client and when an event arrives I can invoke a javascript function on all clients in the SignalR Group (which in this case would be only the one client creating the duplicate username in your case). I know it sounds complex, but it really isn't. I had it all set up in an afternoon. There are great docs and examples on the SignalR Github page.

I think you are yet to have the mindset shift to eventual consistency and the nature of event sourcing. I had the same problem. Specifically I refused to accept that you should trust commands from the client that, using your example, say "Place this order with 10% discount" without the domain validating that the discount should go ahead. One thing that really hit home for me was something that Udi himself said to me (check the comments of the accepted answer).
Basically I came to realise that there is no reason not to trust the client; everything on the read side has been produced from the domain model, so there is no reason not to accept the commands. Whatever in the read side that says the customer qualifies for discount has been put there by the domain.
BTW: In the requirement, if the entered UserName is already taken, the website should show error message "Sorry, the user name XXX is not available" to the visitor. It's not acceptable to show a message, say, "We are creating your account, please wait, we will send the registration result to you via Email later", to the visitor.
If you are going to adopt event sourcing & eventual consistency, you will need to accept that sometimes it will not be possible to show error messages instantly after submitting a command. With the unique username example the chances of this happening are so slim (given that you check the read side before sending the command) its not worth worrying about too much, but a subsequent notification would need to be sent for this scenario, or perhaps ask them for a different username the next time they log on. The great thing about these scenarios is that it gets you thinking about business value & what's really important.
UPDATE : Oct 2015
Just wanted to add, that in actual fact, where public facing websites are concerned - indicating that an email is already taken is actually against security best practices. Instead, the registration should appear to have gone through successfully informing the user that a verification email has been sent, but in the case where the username exists, the email should inform them of this and prompt them to login or reset their password. Although this only works when using email addresses as the username, which I think is advisable for this reason.

There is nothing wrong with creating some immediately consistent read models (e.g. not over a distributed network) that get updated in the same transaction as the command.
Having read models be eventually consistent over a distributed network helps support scaling of the read model for heavy reading systems. But there's nothing to say you can't have a domain specific read model thats immediately consistent.
The immediately consistent read model is only ever used to check data before issuing a command, you should never use it for directly displaying read data to a user (i.e. from a GET web request or similar). Use eventually consistent, scaleable read models for that.

About uniqueness, I implemented the following:
A first command like "StartUserRegistration". UserAggregate would be created no matter if user is unique or not, but with a status of RegistrationRequested.
On "UserRegistrationStarted" an asynchronous message would be sent to a stateless service "UsernamesRegistry". would be something like "RegisterName".
Service would try to update (no queries, "tell don't ask") table which would include a unique constraint.
If successful, service would reply with another message (asynchronously), with a sort of authorization "UsernameRegistration", stating that username was successfully registered. You can include some requestId to keep track in case of concurrent competence (unlikely).
The issuer of the above message has now an authorization that the name was registered by itself so now can safely mark the UserRegistration aggregate as successful. Otherwise, mark as discarded.
Wrapping up:
This approach involves no queries.
User registration would be always created with no validation.
Process for confirmation would involve two asynchronous messages and one db insertion. The table is not part of a read model, but of a service.
Finally, one asynchronous command to confirm that User is valid.
At this point, a denormaliser could react to a UserRegistrationConfirmed event and create a read model for the user.

Like many others when implementing a event sourced based system we encountered the uniqueness problem.
At first I was a supporter of letting the client access the query side before sending a command in order to find out if a username is unique or not. But then I came to see that having a back-end that has zero validation on uniqueness is a bad idea. Why enforce anything at all when it's possible to post a command that would corrupt the system ? A back-end should validate all it's input else you're open for inconsistent data.
What we did was create an index table at the command side. For example, in the simple case of a username that needs to be unique, just create a user_name_index table containing the field(s) that need to be unique. Now the command side is able to query a username's uniqueness. After the command has been executed it's safe to store the new username in the index.
Something like that could also work for the Order discount problem.
The benefits are that your command back-end properly validates all input so no inconsistent data could be stored.
A downside might be that you need an extra query for each uniqueness constraint and you are enforcing extra complexity.

I think for such cases, we can use a mechanism like "advisory lock with expiration".
Sample execution:
Check username exists or not in eventually consistent read model
If not exists; by using a redis-couchbase like keyvalue storage or cache; try to push the username as key field with some expiration.
If successful; then raise userRegisteredEvent.
If either username exists in read model or cache storage, inform visitor that username has taken.
Even you can use an sql database; insert username as a primary key of some lock table; and then a scheduled job can handle expirations.

Have you considered using a "working" cache as sort of an RSVP? It's hard to explain because it works in a bit of a cycle, but basically, when a new username is "claimed" (that is, the command was issued to create it), you place the username in the cache with a short expiration (long enough to account for another request getting through the queue and denormalized into the read model). If it's one service instance, then in memory would probably work, otherwise centralize it with Redis or something.
Then while the next user is filling out the form (assuming there's a front end), you asynchronously check the read model for availability of the username and alert the user if it's already taken. When the command is submitted, you check the cache (not the read model) in order to validate the request before accepting the command (before returning 202); if the name is in the cache, don't accept the command, if it's not then you add it to the cache; if adding it fails (duplicate key because some other process beat you to it), then assume the name is taken -- then respond to the client appropriately. Between the two things, I don't think there'll be much opportunity for a collision.
If there's no front end, then you can skip the async look up or at least have your API provide the endpoint to look it up. You really shouldn't be allowing the client to speak directly to the command model anyway, and placing an API in front of it would allow you to have the API to act as a mediator between the command and read hosts.

It seems to me that perhaps the aggregate is wrong here.
In general terms, if you need to guarantee that value Z belonging to Y is unique within set X, then use X as the aggregate. X, after all, is where the invariant really exists (only one Z can be in X).
In other words, your invariant is that a username may only appear once within the scope of all of your application's users (or could be a different scope, such as within an Organization, etc.) If you have an aggregate "ApplicationUsers" and send the "RegisterUser" command to that, then you should be able to have what you need in order to ensure that the command is valid prior to storing the "UserRegistered" event. (And, of course, you can then use that event to create the projections you need in order to do things such as authenticate the user without having to load the entire "ApplicationUsers" aggregate.

What are best practices for activation/registration/password-reset links in emails with nonce

Applications send out emails to verify user accounts or reset a password. I believe the following is the way it should be and I am asking for references and implementations.
If an application has to send out a link in an email to verify the user's address, according to my view, the link and the application's processing of the link should have the following characteristics:
The link contains a nonce in the request URI (http://host/path?nonce).
On following the link (GET), the user is presented a form, optionally with the nonce.
User confirms the input (POST).
The server receives the request and
checks input parameters,
performs the change,
and invalidates the nonce.
This should be correct per HTTP RFC on Safe and Idempotent Methods.
The problem is that this process involves one additional page or user action (item 3), which is considered superfluous (if not useless) by a lot of people. I had problems presenting this approach to peers and customers, so I am asking for input on this from a broader technical group. The only argument I had against skipping the POST step was a possible pre-loading of the link from the browser.
Are there references on this subject that might better explain the idea and convince even a non-technical person (best practices from journals, blogs, ...)?
Are there reference sites (preferably popular and with many users) that implement this approach?
If not, are there documented reasons or equivalent alternatives?
Thank you,
Kariem
Details spared
I have kept the main part short, but to reduce too much discussion around the details which I had intentionally left out, I will add a few assumptions:
The content of the email is not part of this discussion. The user knows that she has to click the link to perform the action. If the user does not react, nothing will happen, which is also known.
We do not have to indicate why we are mailing the user, nor the communication policy. We assume that the user expects to receive the email.
The nonce has an expiration timestamp and is directly associated with the recipients email address to reduce duplicates.
Notes
With OpenID and the like, normal web applications are relieved from implementing standard user account management (password, email ...), but still some customers want 'their own users'
Strangely enough I haven't found a satisfying question nor answer here yet. What I have found so far:
Answer by Don in HTTP POST with URL query parameters — good idea or not?
Question from Thomas -- When do you use POST and when do you use GET?

This question is very similar to Implementing secure, unique “single-use” activation URLs in ASP.NET (C#).
My answer there is close to your scheme, with a few issues pointed out - such as short period of validity, handling double signups, etc.
Your use of a cryptographic nonce is also important, that many tend to skip over - e.g. "lets just use a GUID"...
One new point that you do raise, and this is important here, is wrt the idempotency of GET.
Whilst I agree with your general intent, its clear that idempotency is in direct contradiction to one-time links, which is a necessity in some situations such as this.
I would have liked to posit that this doesn't really violate the idempotentness of the GET, but unfortunately it does... On the other hand, the RFC says GET SHOULD be idempotent, its not a MUST. So I would say forgo it in this case, and stick to the one-time auto-invalidated links.
If you really want to aim for strict RFC compliance, and not get into non-idempotent(?) GETs, you can have the GET page auto-submit the POST - kind of a loophole around that bit of the RFC, but legit, and you dont require the user to double-optin, and you're not bugging him...
You dont really have to worry about preloading (are you talkng about CSRF, or browser-optimizers?)... CSRF is useless because of the nonce, and optimizers usually wont process javascript (used to auto-submit) on the preloaded page.

About password reset:
The practice of doing this by sending an email to the user's registered email address is, while very common in practice, not good security. Doing this fully outsources your application security to the user's email provider. It does not matter how long passwords you require and whatever clever password hashing you use. I will be able to get into your site by reading the email sent out to the user, given that I have access to the email account or am able to read the unencrypted email anywhere on its way to the user (think: evil sysadmins).
This might or might not be important depending on the security requirements of the site in question, but I, as a user of the site, would at least want to be able to disable such a password reset function since I consider it unsafe.
I found this white paper that discusses the topic.
The short version of how to do it in a secure way:
Require hard facts about the account
username.
email address.
10 digit account number or other information
like social security number.
Require that the user answers at least three predefined questions (predefined by you,
don't let the user create his own questions) that can not be trivial. Like "What's
your favorite vacation spot", not "What's your favorite color".
Optionally: Send a confirmation code to a predefined email address or cell number (SMS) that the user has to input.
Allow the user to input a new password.

I generally agree with you with some modification suggested below.
User registers at your site providing an email.
Verification email is sent to the users account with two links:
a) One link with the GUID to verify the registration b) One link with the GUID to reject the verification
When they visit the verification url from their email they are automatically verified and the verification guid is marked as such in your system.
When they visit the rejection url from their email they are automatically removed from the queue of possible verifications but more importantly you can tell the user that you are sorry for the email registration and give them further options such as removing their email from your system. This will stop any custom service type complaints about someone entering my email in your system...blah blah blah.
Yes, you should assume that when they click the verification link that they are verified. Making them click a second button in a page is a bit much and only needed for double opt in style registration where you plan to spam the person that registered. Standard registration/verification schemes don't usually require this.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string