I followed this tutorial on creating token-based authentication with Node and Redis: http://www.kdelemme.com/2014/08/16/token-based-authentication-with-nodejs-redis/
I got it all working, but I have one problem.
The way I store the token is:
KEY = TOKEN
VALUE = UserData (Username, email, etc.)
To prevent logins from multiple devices, I would like to invalidate the existing token and generate a new one. During login, I would like to check whether the user already has a token. However, that means finding a key by its value (I need to find the TOKEN by email), and when I looked through the Redis documentation I couldn't find anything about finding a key by value.
Thank you very much :)
You basically have to choose one of two approaches: a full scan of the database or an index. A full scan, as proposed in another answer to this question, will be quite inefficient - you'll go over your entire keyspace (or at least all the tokens) and will need to fetch each one until you find a match to the email.
An index will allow you to get an answer to your query much faster, at the expense of some RAM and administrative overhead. While Redis doesn't provide indexing capabilities out of the box, you can easily devise them using regular Redis data structures and operations. For example, the straightforward way to accomplish what you want would be to store, for each token, another key whose name is the email and whose value is the token. This will let you get the token by email with a single GET operation.
Note that this indexing approach will effectively double the number of token-related keys, so to optimize your RAM consumption you may want to consider other types of indexing structures (e.g. using a Hash to group email-token pairs into buckets).
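The email-to-token index described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `FakeRedis` is a hypothetical in-memory stand-in so the example runs without a server, and the `token:` and `email:` key prefixes are assumptions added to keep the two kinds of keys apart. With a real client, the same three operations would be its GET, SET and DEL calls.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory stand-in for a Redis client (GET/SET/DEL only),
// so the sketch runs without a server.
class FakeRedis {
  constructor() { this.data = new Map(); }
  async get(key) { return this.data.has(key) ? this.data.get(key) : null; }
  async set(key, value) { this.data.set(key, value); return 'OK'; }
  async del(key) { return this.data.delete(key) ? 1 : 0; }
}

// Issue a new token for a user, invalidating any token issued earlier.
async function login(store, user) {
  const indexKey = `email:${user.email}`;      // index entry: email -> token
  const oldToken = await store.get(indexKey);  // a single GET, no scan
  if (oldToken !== null) {
    await store.del(`token:${oldToken}`);      // invalidate the previous token
  }
  const token = crypto.randomBytes(32).toString('hex');
  await store.set(`token:${token}`, JSON.stringify(user)); // token -> user data
  await store.set(indexKey, token);                         // keep the index in sync
  return token;
}
```

The steps are not atomic as written; against a real server you would probably want to group the writes in a transaction so the index and the token keys cannot drift apart.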
You would have to do a SCAN of some kind and iterate through the keys, searching each value. The redis module supports these commands, but if you need/want a streaming interface for SCAN, there are at least a couple of modules to do that: redis-scanstreams and redisscan (which technically uses a callback approach, so not a real stream implementation).
Related
I am planning to create a small application that users can login to use. To provide API access, I plan to provide them with access tokens just like a lot of services provide.
When users login with email/password, I search for email address and match the password hash against that database entry. How can I create an access token though, something like GitHub:
ghp_36-characters
I have seen that GitHub's tokens always start with ghp_ followed by 36 characters. The length may vary slightly at times, but that's their format. How do they get a deterministic length and a specific format?
Possibilities I considered:
Encrypt the user ID with a strong encryption algorithm and a strong encryption password
Generate a random string
Use JWTs
Problems with each of those approaches:
The length of the string would depend upon the length of the user ID followed by encryption padding and other pieces of the equation. Also, I don't think it's possible to make it follow a specific format.
I can generate a random string, and store its hash like passwords, but then how can I find the entry in database to authenticate it against? What I mean is, I won't have a known value to compare the unknown value against in the database.
JWTs would heavily depend on the length of the string to encode and are extremely long even for really small data. Again, they have their own structure which won't be possible to escape.
Following a specific format is not a requirement for my application, so I can use any of these options successfully (possibly not the second one), but I'm curious to know how all services achieve this.
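For the random-string option, one possible sketch is shown here. It is not a description of how GitHub or any other service actually builds its tokens: the `tok_` prefix, the 62-character alphabet and the function names are all assumptions for illustration. The length is fixed because the token is built character by character, and the lookup problem is handled by storing a deterministic SHA-256 digest of the token, which can be recomputed from the presented token and searched for directly. An unsalted digest is only reasonable here because the token itself is long and random, unlike a password.

```javascript
const crypto = require('crypto');

// 62 characters: digits, upper case, lower case.
const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Build a token with a fixed prefix and a fixed number of random characters.
// The prefix and length defaults are illustrative assumptions.
function createToken(prefix = 'tok_', length = 36) {
  let body = '';
  for (let i = 0; i < length; i++) {
    // crypto.randomInt draws an unbiased index from a CSPRNG
    body += ALPHABET[crypto.randomInt(ALPHABET.length)];
  }
  return prefix + body;
}

// Deterministic digest: the same token always yields the same value,
// so the digest can be stored in an indexed column and looked up directly.
function tokenDigest(token) {
  return crypto.createHash('sha256').update(token).digest('hex');
}
```

On creation you would show `createToken()` to the user once and store only `tokenDigest(token)`; on each request you would hash the presented token and search for that digest.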
I've seen that companies sometimes send customized links that give access to a resource without logging in.
For example, a company sent me an email with a link to my invoices:
www.financial.service.com/<SOME_HASHED_VALUE>
and there is no authorization behind this endpoint; they rely only on the fact that I am the only person who knows this hash value. I have a very similar case, but I have some concerns:
Firstly, is it a good approach?
Secondly, how should I make this hash? sha512 on some random data?
This can be a completely valid approach, and is its own type of authentication. If constructed correctly, it proves that you have access to that email (it doesn't prove anything else, but it does prove that much).
These values often aren't hashes. They're often random, and that's their power. If they are hashes, they need to be constructed such that their output is "effectively random," so usually you might as well just make them random in the first place. For this discussion, I'll call it a "token."
The point of a token is that it's unpredictable, and extremely sparse within its search space. By unpredictable, I mean that even if I know exactly who the token is for, it should be effectively impossible (i.e. within practical time constraints) to construct a legitimate token for that user. So, for instance, if this were the hash of the username and a timestamp (even a millisecond timestamp), that would be a terrible token. I could guess those very quickly. So random is best.
By "sparse" I mean that out of all the possible tokens (i.e. strings of the correct length and format), a vanishingly small number of them should be valid tokens, and those valid tokens should be scattered across the search space randomly. For example, if the tokens were sequential, that would be terrible. If I had a token, I could find other tokens by simply increasing or decreasing the value by one.
So a good token looks like this:
Select a random, long string
Store it in your database, along with metadata about what it means, and a timestamp
When a user shows up with it, read the data from the database
After some period of time, expire the token by deleting it from the database (optional, but preferred)
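The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Map` stands in for a database table, the one-day lifetime is an arbitrary choice, and `issueToken` and `redeemToken` are hypothetical names. Expiry is done lazily on lookup here, where a real system might also run a periodic cleanup.

```javascript
const crypto = require('crypto');

const TTL_MS = 24 * 60 * 60 * 1000; // assumed lifetime: one day
const tokens = new Map();            // stand-in for a database table

// Steps 1 and 2: pick a long random string and store it with its metadata
// and a timestamp.
function issueToken(meta, now = Date.now()) {
  const token = crypto.randomBytes(32).toString('hex'); // 256 random bits
  tokens.set(token, { meta, createdAt: now });
  return token;
}

// Steps 3 and 4: look the token up, and delete it once it is too old.
function redeemToken(token, now = Date.now()) {
  const row = tokens.get(token);
  if (!row) return null;              // unknown token
  if (now - row.createdAt > TTL_MS) {
    tokens.delete(token);             // expired: remove it
    return null;
  }
  return row.meta;
}
```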
Another way to implement this kind of scheme is to encode the encrypted metadata (i.e. the userid, what page this goes to, a timestamp, etc). Then you don't need to store anything in a database, because it's right there in the URL. I don't usually like this approach because it requires a very high-value crypto key that you then have to protect on production servers, and that can be used to connect as anyone. Even if I have good ways to protect such a key (generally an attached HSM), I don't like such a key even existing. So generally I prefer a database. But for some applications, encrypting the data is better. Storing the metadata in the URL also significantly restricts how much metadata you can store, so again, tokens are nicer.
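As a rough sketch of the encrypted-metadata variant, the example below uses AES-256-GCM from Node's crypto module, so the token is both encrypted and tamper-evident. The choice of cipher, the byte layout (IV, then tag, then ciphertext) and the names `sealToken` and `openToken` are assumptions made for illustration; the key is generated in-process only so the example runs, and it is exactly the high-value secret the paragraph above warns about.

```javascript
const crypto = require('crypto');

// Assumption: in production this key would come from protected storage,
// not be generated at startup.
const KEY = crypto.randomBytes(32);

// Encrypt the metadata and pack IV (12 bytes) + tag (16 bytes) + ciphertext
// into one URL-safe string.
function sealToken(meta) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', KEY, iv);
  const body = Buffer.concat([
    cipher.update(JSON.stringify(meta), 'utf8'),
    cipher.final(),
  ]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, body]).toString('base64url');
}

// Decrypt and verify; returns null for malformed or tampered tokens.
function openToken(token) {
  try {
    const raw = Buffer.from(token, 'base64url');
    const decipher = crypto.createDecipheriv('aes-256-gcm', KEY, raw.subarray(0, 12));
    decipher.setAuthTag(raw.subarray(12, 28));
    const plain = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]);
    return JSON.parse(plain.toString('utf8'));
  } catch (err) {
    return null; // authentication failed or the input was not a valid token
  }
}
```

A timestamp inside the metadata would still need to be checked by the caller to expire such links, since nothing is stored server-side.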
and there is no authorization behind this endpoint; they rely only on the fact that I am the only person who knows this hash value.
Usually there is authorization before accessing the endpoint (you have authenticated before receiving the invoices). I see it as a common way to share resources with external parties. We use a similar approach with expirable AWS S3 URLs.
firstly is it good approach ?
It depends on your use case. It works well for sharing some internal resources when you keep the option to control access (revoking access, time-based access, one-time access, ...).
secondly how should I make this hash? sha512 on some random data?
As long as SOME_HASHED_VALUE is not guessable and has a negligible collision probability (salted hash, long random unique value, ...), it should be OK.
Using nodejs and crypto, right now, when a user logs in, I generate a random auth token:
var token = crypto.randomBytes(16).toString('hex');
I know it's unlikely, but there is a tiny chance for two tokens to be of the same value.
This means a user could, in theory, authenticate on another account.
Now, I see two obvious ways to get past this:
When I generate the token, query the user database and see if a token with the same value already exists. If it does, just generate another one. As you can see, this is not perfect since I am adding queries to the database.
Since every user has a unique username in my database, I could generate a random token using the username as a secret generator key. This way, there is no way of two tokens having the same value. Can crypto do that? Is it secure?
How would you do it?
It's too unlikely to worry about it happening by chance. I would not sacrifice performance to lock and check the database for it.
Consider this excerpt from Pro Git about the chance of random collisions between 20-byte SHA-1 sums:
Here’s an example to give you an idea of what it would take to get a
SHA-1 collision [by chance]. If all 6.5 billion humans on Earth were programming,
and every second, each one was producing code that was the equivalent
of the entire Linux kernel history (1 million Git objects) and pushing
it into one enormous Git repository, it would take 5 years until that
repository contained enough objects to have a 50% probability of a
single SHA-1 object collision. A higher probability exists [for average projects] that every
member of your programming team will be attacked and killed by wolves
in unrelated incidents on the same night.
(SHA-1 collisions can be directly constructed now, so the quote is now less applicable to SHA-1, but it's still valid when considering collisions of random values.)
If you are still worried about that probability, then you can easily use more random bytes instead of 16.
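To put a number on "too unlikely", the usual birthday-bound approximation can be evaluated directly. The function below is a small sketch (the name `collisionProbability` is made up here) that estimates the chance of at least one collision among n uniformly random tokens of a given bit length, using p ≈ 1 - exp(-n² / 2^(bits+1)).

```javascript
// Approximate probability that at least two of n uniformly random
// tokens of `bits` bits are equal (birthday bound).
function collisionProbability(n, bits) {
  const exponent = (n * n) / Math.pow(2, bits + 1);
  return -Math.expm1(-exponent); // stays accurate when the result is tiny
}

// 16 random bytes = 128 bits, as in the question.
console.log(collisionProbability(1e9, 128)); // a billion tokens
console.log(collisionProbability(1e9, 256)); // same count with 32 random bytes
```

For a billion 128-bit tokens the estimate is on the order of 10^-21, and doubling the number of random bytes makes it smaller by many more orders of magnitude.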
But regarding your second idea: if you hashed the random ID with the username, then that hash could collide, just like the random ID could. You haven't solved anything.
You should always add a UNIQUE constraint to your database column. This will create an implicit index to improve searches on this column, and it will make sure that no two records ever have the same value. So, in the worst-case scenario, you will get a database exception and not a security violation.
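Treating that database exception as a signal to retry can be sketched like this. Everything here is a stand-in: `TokenTable` and `UniqueViolation` are hypothetical and only imitate a table with a UNIQUE constraint on the token column; a real driver reports the violation with its own error type or code, which you would check instead.

```javascript
const crypto = require('crypto');

// Hypothetical stand-in for a table with a UNIQUE constraint on `token`.
class UniqueViolation extends Error {}
class TokenTable {
  constructor() { this.rows = new Map(); }
  insert(token, userId) {
    if (this.rows.has(token)) throw new UniqueViolation('duplicate token');
    this.rows.set(token, userId);
  }
}

// Insert a fresh token; on the (extremely rare) duplicate, generate another.
// No lookup query is made before the insert.
function insertToken(
  table,
  userId,
  generate = () => crypto.randomBytes(16).toString('hex'),
  maxAttempts = 3
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const token = generate();
    try {
      table.insert(token, userId);
      return token;
    } catch (err) {
      if (!(err instanceof UniqueViolation)) throw err; // unrelated failure
    }
  }
  throw new Error('could not generate a unique token');
}
```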
Also, depending on how frequently unique tokens need to be created, I think it's perfectly fine in most cases to use database lookups during generation. If your column is, again, properly indexed, it will be a pretty fast query. Most databases scale horizontally very well, so this remains an option even if you are building the next Facebook. Furthermore, you will probably need to run a query to check for e-mail uniqueness anyway.
Finally, if you are really concerned about performance, you could always pre-generate a million unique tokens and store them in a separate database table for quick use. Just set up a routine to periodically check its usage and insert more records as needed. However, as @MacroMan stated in the comments, this could have security implications if someone gets access to the list of pre-generated tokens, so this practice should be avoided.
PostgreSQL UNIQUE CONSTRAINT
MySQL: Unique Constraints
I am storing user information from nodejs app in the form of
SET user_<userid> {id:"asdad", .....}
I have sets of users organized by updates and such. Sometimes I need to retrieve a large number of users and send them to a client (let's say 100 users).
Currently I use MGET key1, key2, .... Once I get them back, I parse the JSON and return the result.
Would it be better for me to store the users in hashes? To retrieve multiple users I could use MULTI together with HMGET, so I would issue 100 HMGETs and then get back the user data.
Big advantage I see for the HMGET is that if I need only some of my user fields I can retrieve partial objects instead of full objects.
It would definitely be better to use a hash to store your users, but not by splitting their values up to different hash fields. Use a hash like a key value store of its own.
HMSET users <userid> {"your":"json"}
This is way more memory efficient than using a top level key for every user, as Redis does not have to save metadata (like expire) for the keys. You still have support for HMGET/SET and some others.
Details can be found at:
http://redis.io/topics/memory-optimization
Concerning your MGET vs. HMGET question, I could not find benchmarks, so you will have to benchmark the performance yourself. Nevertheless, using MULTI will get you into trouble when you have to shard your data across two Redis instances. MULTI commands currently can't span multiple instances.
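With all users kept as JSON strings in one hash, a single HMGET can name many fields, so fetching 100 users does not need MULTI at all. The sketch below assumes a client object exposing a promise-returning `hmGet(key, fields)` method; that method name is an assumption and differs between client libraries, and `stubClient` is a hypothetical stand-in so the example runs without a server.

```javascript
// Hypothetical stub standing in for a real client; it ignores `key`
// and serves fields from one in-memory hash.
const stubClient = {
  hash: new Map([
    ['1', '{"id":"1","name":"Ann"}'],
    ['2', '{"id":"2","name":"Bob"}'],
  ]),
  async hmGet(key, fields) {
    return fields.map((f) => this.hash.get(f) ?? null); // null for missing fields
  },
};

// Fetch several users from the `users` hash with one HMGET-style call
// and parse the JSON values, skipping ids that do not exist.
async function getUsers(client, ids) {
  const values = await client.hmGet('users', ids);
  return values.filter((v) => v !== null).map((v) => JSON.parse(v));
}
```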
Would it be better for me to store the users in hashes?
Storing users in hashes gives you more flexibility: you can retrieve only specific fields, as you mentioned, and there is no need to parse strings into objects. I asked a similar question before.
Is it recommended to create a column (unique key) that is a hash?
When people view my URL, it is currently like this:
url.com/?id=2134
But, people can look over this and data-mine all the content, right?
Is it RECOMMENDED to go 1 extra step to make this through hash?
url.com?id=3fjsdFNHDNSL
Thanks!
The first and most important step is to use some form of role-based security to ensure that no user can see data they aren't supposed to see. So, for example, if a user should only see their own information, then you should check that the id belongs to the logged-in user before you display it.
As a second level of protection, it's not a bad idea to have a unique key that doesn't let you predict other keys (a hash, as you suggest, or a UUID). However, that still means that, for example, a malicious user who obtained someone else's URL (e.g. by sniffing a network, by reading a log file, by viewing the history in someone's browser) could see that user's information. You need authentication and authorization, not simply obfuscating the IDs.
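The ownership check described above can be as small as the following sketch. The record shape, the `ownerId` field and the `getInvoice` name are illustrative assumptions; the point is only that the id from the URL is never trusted without comparing its owner to the logged-in user.

```javascript
// Hypothetical records keyed by id, each with an owner.
const invoices = new Map([
  [2134, { ownerId: 'alice', total: 120 }],
  [2135, { ownerId: 'bob', total: 75 }],
]);

// Return the record only if it belongs to the logged-in user.
function getInvoice(sessionUserId, id) {
  const invoice = invoices.get(id);
  // Same answer for "missing" and "not yours", so ids cannot be probed.
  if (!invoice || invoice.ownerId !== sessionUserId) return null;
  return invoice;
}
```

With this in place, changing `id=2134` to `id=2135` in the URL returns nothing for a user who does not own that record, whether or not the ids are predictable.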
It sort of depends on your situation, but off hand I think if you think you need to hash you need to hash. If someone could data mine by, say, iterating through:
...
url.com?id=2134
url.com?id=2135
url.com?id=2136
...
Then using a hash for the id is necessary to avoid this, since it will be much harder to figure out the next one. Keep in mind, though, that you don't want to make the hash too obvious, so that a determined attacker would easily figure it out, e.g. just taking the MD5 of 2134 or whatever number you had.
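One way to avoid an "obvious" hash such as the MD5 of the number is to key the hash with a server-side secret, for example with an HMAC. This is a sketch of that idea, not the answer's own method: `SECRET`, `publicId` and the 22-character truncation are assumptions, and the resulting value would be stored in a unique column so rows can be looked up by it.

```javascript
const crypto = require('crypto');

// Assumption: a server-side secret, loaded from configuration in practice
// so that ids stay stable across restarts.
const SECRET = crypto.randomBytes(32);

// Derive a URL-safe public identifier from a numeric id.
// Without SECRET, the value for 2135 cannot be derived from the one for 2134.
function publicId(id) {
  return crypto
    .createHmac('sha256', SECRET)
    .update(String(id))
    .digest('base64url')
    .slice(0, 22); // 22 base64url characters, about 132 bits
}
```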
Well, the problem here is that an actual Hash is technically one way. So if you hash the data you won't be able to recover it on the receiving side. Without knowing what technology you are using to create your web page it's hard to make any concrete suggestions, but if you must have sensitive information in your query string then I would recommend that you at least use a symmetric encryption algorithm on it to keep people from simply reading off the values and reverse engineering things.
Of course if you have the option - it's probably better to not have that information in the query string at all.