I have a simple database of content. Should I hash the "id" so that people can't loop over it in the URL?

Is it recommended to create a column (unique key) that is a hash?
When people view my URL, it currently looks like this:
url.com/?id=2134
But people can loop over this and data-mine all the content, right?
Is it RECOMMENDED to go one extra step and use a hash instead?
url.com?id=3fjsdFNHDNSL
Thanks!

The first and most important step is to use some form of role-based security to ensure that no user can see data they aren't supposed to see. So, for example, if a user should only see their own information, then you should check that the id belongs to the logged-in user before you display it.
As a second level of protection, it's not a bad idea to have a unique key that doesn't let you predict other keys (a hash, as you suggest, or a UUID). However, that still means that, for example, a malicious user who obtained someone else's URL (e.g. by sniffing a network, by reading a log file, by viewing the history in someone's browser) could see that user's information. You need authentication and authorization, not simply obfuscating the IDs.
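A minimal Python sketch of both layers. It assumes an in-memory dict as a stand-in for the database, and RECORDS, create_record, view_record and Forbidden are hypothetical names. The random UUID key is the second layer. The ownership check in view_record is what actually protects the data, because it still rejects a leaked or sniffed key when anyone other than the owner presents it.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class Record:
    owner_id: int
    content: str


class Forbidden(Exception):
    """Raised when the record is missing or belongs to someone else."""


# Hypothetical stand-in for the content table, keyed by an unguessable UUID
RECORDS: Dict[str, Record] = {}


def create_record(owner_id: int, content: str) -> str:
    key = str(uuid.uuid4())  # random, so it can't be predicted from other keys
    RECORDS[key] = Record(owner_id, content)
    return key


def view_record(record_key: str, logged_in_user_id: int) -> str:
    record = RECORDS.get(record_key)
    # The authorization check is the real protection: a leaked key is still
    # rejected for any user other than the owner.
    if record is None or record.owner_id != logged_in_user_id:
        raise Forbidden("record not found")
    return record.content
```

Returning the same error for "doesn't exist" and "not yours" is a deliberate choice. It avoids telling an attacker which keys are valid.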

It sort of depends on your situation, but offhand I'd say that if you think you need to hash, you do need to hash. If someone could data-mine by, say, iterating through:
...
url.com?id=2134
url.com?id=2135
url.com?id=2136
...
Then using a hash for the id is necessary to avoid this, since it will be much harder to figure out the next one. Keep in mind, though, that you don't want the hash to be so obvious that a determined attacker could easily figure it out, e.g. by just taking the MD5 of 2134 or whatever number you had.
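Here is a small Python sketch that makes the "too obvious" case concrete. The helpers naive_public_id and recover_id are hypothetical. An MD5 of the raw integer ID can be reversed by hashing candidate IDs until one matches. After that, the attacker can compute the hash of the next ID just as easily. If you hash at all, the input needs a secret or random component the attacker can't reproduce.

```python
import hashlib
from typing import Optional


def naive_public_id(record_id: int) -> str:
    # The "too obvious" scheme: MD5 of the plain integer id, no secret involved
    return hashlib.md5(str(record_id).encode()).hexdigest()


def recover_id(public_id: str, max_id: int = 1_000_000) -> Optional[int]:
    # An attacker who guesses the scheme just hashes every candidate id
    for candidate in range(max_id + 1):
        if naive_public_id(candidate) == public_id:
            return candidate
    return None


leaked = naive_public_id(2134)               # what appears in url.com?id=...
original = recover_id(leaked)                # found after 2,135 hash computations
next_url_id = naive_public_id(original + 1)  # the "hidden" id of the next record
```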

Well, the problem here is that an actual hash is technically one-way. So if you hash the data, you won't be able to recover it on the receiving side. Without knowing what technology you are using to create your web page, it's hard to make any concrete suggestions, but if you must have sensitive information in your query string, I would recommend that you at least use a symmetric encryption algorithm on it to keep people from simply reading off the values and reverse engineering things.
Of course if you have the option - it's probably better to not have that information in the query string at all.

Related

Unique hash as authorization for endpoint

I've seen that companies sometimes send customized links that give access to some resource without logging in.
For example, a company sends me an email with a link to my invoices:
www.financial.service.com/<SOME_HASHED_VALUE>
and there is no authorization behind this endpoint; they rely only on the fact that I am the only person who knows this hash value. I have a very similar case, but I have some concerns:
Firstly, is this a good approach?
Secondly, how should I make this hash? sha512 on some random data?

This can be a completely valid approach, and is its own type of authentication. If constructed correctly, it proves that you have access to that email (it doesn't prove anything else, but it does prove that much).
These values often aren't hashes. They're often random, and that's their power. If they are hashes, they need to be constructed such that their output is "effectively random," so usually you might as well just make them random in the first place. For this discussion, I'll call it a "token."
The point of a token is that it's unpredictable and extremely sparse within its search space. By unpredictable, I mean that even if I know exactly who the token is for, it should be effectively impossible (i.e. not feasible within practical time constraints) to construct a legitimate token for that user. So, for instance, if the token were the hash of the username and a timestamp (even a millisecond timestamp), that would be a terrible token; I could guess those very quickly. So random is best.
By "sparse" I mean that out of all the possible tokens (i.e. strings of the correct length and format), a vanishingly small number of them should be valid tokens, and those valid tokens should be scattered across the search space randomly. For example, if the tokens were sequential, that would be terrible. If I had a token, I could find other tokens by simply increasing or decreasing the value by one.
So a good token scheme looks like this:
1. Select a random, long string.
2. Store it in your database, along with metadata about what it means and a timestamp.
3. When a user shows up with it, read the data from the database.
4. After some period of time, expire the token by deleting it from the database (optional, but preferred).
Another way to implement this kind of scheme is to encode the encrypted metadata (i.e. the userid, what page this goes to, a timestamp, etc). Then you don't need to store anything in a database, because it's right there in the URL. I don't usually like this approach because it requires a very high-value crypto key that you then have to protect on production servers, and that can be used to connect as anyone. Even if I have good ways to protect such a key (generally an attached HSM), I don't like such a key even existing. So generally I prefer a database. But for some applications, encrypting the data is better. Storing the metadata in the URL also significantly restricts how much metadata you can store, so again, tokens are nicer.
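For comparison, here is a rough sketch of the stateless variant. Python's standard library has no symmetric cipher, so this swaps in an HMAC-signed payload rather than an encrypted one. The metadata is readable by anyone who decodes the URL, but it cannot be altered or forged without SECRET_KEY. The key, the token layout (base64 payload, a dot, base64 signature) and the function names are all assumptions for illustration. SECRET_KEY is exactly the kind of high-value key the answer warns about: whoever holds it can mint a link for any user.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: in reality this is loaded from secure storage (e.g. an HSM), never hard-coded
SECRET_KEY = b"replace-with-a-long-random-secret"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(body: str) -> str:
    return _b64(hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).digest())


def make_link_token(metadata: dict, ttl: int = 3600, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    payload = dict(metadata, exp=int(now) + ttl)  # expiry travels inside the URL
    body = _b64(json.dumps(payload, sort_keys=True).encode())
    return f"{body}.{_sign(body)}"


def read_link_token(token: str, now: Optional[float] = None) -> Optional[dict]:
    try:
        body, sig = token.split(".")
    except ValueError:
        return None  # wrong shape
    # Constant-time comparison; bytes so non-ASCII input can't raise TypeError
    if not hmac.compare_digest(sig.encode(), _sign(body).encode()):
        return None  # tampered with, or signed with a different key
    payload = json.loads(_unb64(body))
    now = time.time() if now is None else now
    return None if now > payload["exp"] else payload
```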

and there is no authorization behind this endpoint; they rely only on the fact that I am the only person who knows this hash value.
Usually there is authorization before the link is handed out (you authenticated before receiving the invoices). I see it as a common way to share resources with external parties. We use a similar approach with expirable AWS S3 URLs.
Firstly, is this a good approach?
It depends on your use case. It works well for sharing internal resources when you want to keep control over access (revoking access, time-based access, one-time access, ...).
Secondly, how should I make this hash? sha512 on some random data?
As long as SOME_HASHED_VALUE is not guessable and has a negligible collision probability (salted hash, long random unique value, ...), it should be OK.
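On the "sha512 on some random data?" question: hashing random bytes is fine, but the hash contributes no extra unpredictability, because all of it comes from the random input. A small Python comparison (both function names are hypothetical):

```python
import hashlib
import os
import secrets


def token_via_sha512() -> str:
    # The approach from the question: SHA-512 over 32 bytes of OS randomness
    return hashlib.sha512(os.urandom(32)).hexdigest()


def token_direct() -> str:
    # Same 256 bits of randomness, just hex-encoded without the extra hash step
    return secrets.token_hex(32)
```

Both carry 256 bits of randomness. The direct version is simply shorter (64 hex characters instead of 128). What you must avoid is hashing predictable input such as an invoice number.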

Store user IP, but make it non traceable

I am working on a project where users (in a given and relatively short time period) answer statements, and I would like to store the entries anonymously.
After the collection period is over, I would like to be able to run statistics on the answers. But it is very important that the users' answers cannot be traced back to a specific user/IP.
The reason I would still like to store the IP, despite wanting the users to be anonymous, is that I would like to exclude entries where a user (with malicious intent or by accident) takes the same test multiple times in a short span.
I have ruled out using encryption, as, to my limited knowledge, it is not possible to compare a large set of encrypted strings like that.
My current self-proposed method is to store: the user agent, a session identifier and a hashed IP address.
Regarding the hashing method, I am thinking of using sha512, where the IP is prepended with a 16-character salt (the same salt for all entries).
I know that when hashing simple and common strings, sha512 and other hashing methods can be broken with tools like http://md5decrypt.net/en/Sha512/ and good old brute forcing.
My idea for guaranteeing user anonymity is that, after the collection period is over, I will delete the salt, making it (to my knowledge) near impossible to brute force the hash, even if a malicious party got hold of my source code.
I know it seems like a low-tech solution, and that part of the security depends on my own action of actually deleting the salt, which I could in theory forget to do or change my mind about. But it is the only solution I could come up with.
Thanks in advance

Don't hash the IPs, HMAC them. That's conceptually the same as what you want to do, but cryptographically robust.
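Here is a minimal Python sketch of the HMAC approach applied to the plan in the question. It assumes the key lives only in the running process for the duration of the collection period, and IpPseudonymizer is a hypothetical helper. Equal IPs map to equal pseudonyms, so repeat submissions can still be grouped. Without the key, nobody can test candidate IPs against the stored values. That matters because the whole IPv4 space is only about 4.3 billion addresses and is otherwise easy to brute force.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class IpPseudonymizer:
    """Hypothetical helper: keyed pseudonyms for IPs during one collection period."""

    def __init__(self) -> None:
        # Random 256-bit key, kept only in memory (assumption: never logged or saved)
        self._key: Optional[bytes] = secrets.token_bytes(32)

    def pseudonym(self, ip: str) -> str:
        if self._key is None:
            raise RuntimeError("collection period is over; the key has been destroyed")
        return hmac.new(self._key, ip.encode(), hashlib.sha256).hexdigest()

    def destroy_key(self) -> None:
        # Stand-in for "delete the salt": afterwards nobody can recompute pseudonyms
        self._key = None
```

Destroying the key plays the role of deleting the salt in the original plan. The caveat from the question still applies (you have to actually do it). Also, setting the attribute to None in Python does not guarantee that the bytes are wiped from memory.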
https://en.wikipedia.org/wiki/Hash-based_message_authentication_code

Visible User ID in Address Bar

Currently, to pass a user ID to the server on certain views, I use the raw user ID.
http://example.com/page/12345 (12345 being the user's ID)
Although there is no real security risk in my specific application from exposing this data, I can't help feeling a little dirty about it. What is the proper solution? Should I somehow be disguising the data?
Maybe a better way to phrase my question is to ask what the standard approach is. Is it common for applications to show user IDs in plain view if it's not a security risk? If it is a security risk, how is it handled? I'm just looking for a pointer in the right direction here.

There's nothing inherently wrong with that. Lots of sites do it. For instance, Stack Overflow users can be enumerated using URLs of the form:
http://stackoverflow.com/users/123456
Using a normalized form of the user's name in the URL, either in conjunction with the ID or as an alternative to it, may be a nicer solution, though, e.g:
http://example.com/user/yourusername
http://example.com/user/12345/yourusername
If you go with the former, you'll need to ensure that the normalized username is set up as a unique key in your user database.
If you go with the latter, you've got a choice: if the normalized username in the database doesn't match the one in the URL, you can either redirect to the correct URL (like Stack Overflow does), or return a 404 error.
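Here is one way to implement the /user/12345/yourusername variant with the redirect behaviour, sketched in Python. The normalization rules, the USERS dict and resolve_user_url are assumptions for illustration. A real app would plug this into its web framework's routing and return a real redirect response.

```python
import re
import unicodedata
from typing import Dict, Optional, Tuple

# Hypothetical user table: id -> display name
USERS: Dict[int, str] = {12345: "Your UserName"}


def normalize_username(name: str) -> str:
    # One plausible normalization: strip accents, lowercase, runs of other chars -> "-"
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def resolve_user_url(user_id: int, slug: Optional[str]) -> Tuple[int, str]:
    """Return (HTTP status, body or redirect target) for /user/<id>/<slug>."""
    name = USERS.get(user_id)
    if name is None:
        return 404, "Not Found"
    canonical = normalize_username(name)
    if slug != canonical:
        # Redirect to the canonical URL (the Stack Overflow behaviour) instead of a 404
        return 301, f"/user/{user_id}/{canonical}"
    return 200, f"Profile of {name}"
```

The ID is authoritative and the slug is only cosmetic, so renaming a user never breaks old links; they just redirect.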

In addition to duskwuff's great suggestion to use the username instead of the ID itself, you could use UUIDs instead of integers. They are 128 bits long, so they are infeasible to enumerate, and they also avoid disclosing exactly how many users you have. As an added benefit, your site is future-proofed against user ID limits if it becomes massively popular.
For example, with integer IDs, an attacker could find out the largest user_id on day one, then come back a week or a month later and find out what the largest user_id is now. They can keep doing this to monitor your site's rate of growth. Perhaps not a biggie for your example, but many organisations consider this sort of information commercially sensitive. UUIDs also help against social engineering, e.g. they make it significantly harder for an attacker to email you asking to reset their password "because I've changed email providers and I've forgotten my old password, but I remember my user ID!". Give an attacker an inch and they'll take a mile.
I prefer to use Version/Type 4 (random) UUIDs; however, you could also use Version/Type 5 (SHA-1-based) UUIDs to derive a UUID from the integer value (i.e. a name-based UUID computed from 12345), which is useful if you want to migrate existing data and need to update a bunch of foreign key values. Most major languages support UUIDs natively or through popular libraries (e.g. for C and C++), although some database software might require some tweaking. I've used them with Postgres and MySQL myself, and both were easy transitions.
The downside is that UUIDs are significantly longer and not memorable, but it doesn't sound like you need users to be able to type in the URLs manually. You also need to check whether the UUID already exists when creating a user, and if it does, just keep generating until an unused UUID is found. In practice, given the size of the numbers, with Version 4 random UUIDs you have a better chance of winning the lottery than of hitting a collision, so it's not something that will impact performance.
Example URL: http://example.com/page/4586A0F1-2BAD-445F-BFC6-D5667B5A93A9
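Here is a short Python sketch of both UUID flavours using the standard uuid module. LEGACY_USER_NAMESPACE and the function names are hypothetical. One caution about the Version 5 migration path: if the namespace UUID is public, anyone can compute the UUIDs for 1, 2, 3, ... and enumeration is back. Treat the namespace as a secret.

```python
import uuid
from typing import Set

# Hypothetical namespace for the migration; keep it private (see above)
LEGACY_USER_NAMESPACE = uuid.UUID("9b2f5c1e-4a7d-4e6b-8c3f-2d1a0b9e8f71")


def allocate_user_id(existing: Set[uuid.UUID]) -> uuid.UUID:
    # Version 4: random; regenerate on the (astronomically unlikely) collision
    while True:
        candidate = uuid.uuid4()
        if candidate not in existing:
            return candidate


def migrated_user_id(legacy_id: int) -> uuid.UUID:
    # Version 5: SHA-1 of namespace + name, so the same integer always maps to the same UUID
    return uuid.uuid5(LEGACY_USER_NAMESPACE, str(legacy_id))
```

str() of a UUID gives lowercase hex, whereas the example URL above uses uppercase. Both spellings denote the same UUID, so compare parsed values rather than raw strings.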

Do I need to use hashes instead of user IDs for an invite system?

I'm implementing a referral system, and I'm wondering whether I NEED to use a hash as the referrer_id, or whether I can just use the user's ID. I don't want to dogmatically use hashes for EVERYTHING, so can you give me some examples of potential pitfalls?

It depends on what is important to you. If you need to know for sure that the referrer_id originated from the user who provided the referral, then you will need some sort of hash combined with a token that cannot be guessed. If all you do is hash the user ID, you aren't providing any real security, because an attacker could simply guess a user ID and hash it to make your server happy.
It is important to know that hashing does not produce any unpredictability: it always produces the same output for the same input. Hashing is valuable because a small change in input produces a large change in output, which makes tampering detectable as long as the attacker cannot recompute the hash (e.g. because it is keyed with a server-side secret, as in an HMAC), and because it normalizes the size of the data. But it should never be confused with encryption.
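To illustrate the difference in Python: anyone can recompute a plain hash of the user ID, whereas only the server can produce an HMAC tag under a server-side key. REFERRAL_KEY, the "<id>-<tag>" format and the truncation to 16 hex characters (64 bits) are assumptions for this sketch, not a standard.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Assumption: a server-side secret, generated once and never sent to clients
REFERRAL_KEY = secrets.token_bytes(32)


def plain_hash_referrer(user_id: int) -> str:
    # Anyone can compute this for any user id, so it proves nothing about the sender
    return hashlib.sha256(str(user_id).encode()).hexdigest()


def _tag(user_id: int) -> str:
    mac = hmac.new(REFERRAL_KEY, str(user_id).encode(), hashlib.sha256)
    return mac.hexdigest()[:16]  # truncated to 64 bits to keep links short


def signed_referrer(user_id: int) -> str:
    return f"{user_id}-{_tag(user_id)}"


def verify_referrer(code: str) -> Optional[int]:
    # Returns the referring user id only if the tag was produced with REFERRAL_KEY
    id_text, _, tag = code.partition("-")
    if not (id_text.isascii() and id_text.isdigit()):
        return None
    user_id = int(id_text)
    return user_id if hmac.compare_digest(tag.encode(), _tag(user_id).encode()) else None
```

This covers integrity (the code wasn't forged), but it does not hide the user ID. If confidentiality matters too, a random per-user referral token stored in the database is the simpler option.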
How much do you care about Confidentiality, Integrity, and Availability of the referrer_id? The answers to those questions will determine how much effort you need to exert to protect the referrer_id.

Protecting application url

How do I protect the URLs generated by my application?
Example:
http://www.mydomain.com/jsp/get_article.jsp?id=1
How do I make these URLs unreadable to human beings?

What you can do is use a hash, such as http://www.mydomain.com/jsp/get_article.jsp?hash=[base32 MD5 hash value] or similar. Then you keep a hash -> article table on the server (as hashes are unique enough, you don't have to worry about clashes between the hashes of different articles). Of course, the client still needs the hash, so you either have to calculate it there, or simply give it within the page.
The hash would be the hash over the article itself, so it will be unique for the article at all times, and cannot be guessed without knowing the actual article. Titles are too easy to guess.
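Here is the lookup-table idea sketched in Python. The original context is JSP, so treat this as an outline of the server-side logic rather than drop-in code. article_key, publish and lookup are hypothetical names, and the in-memory dict stands in for the hash -> article table. Because the key is derived from the article text, editing an article changes its key, and old links stop working unless you keep the old entry.

```python
import base64
import hashlib
from typing import Dict, Optional

# Hypothetical server-side table: hash key -> article id
ARTICLE_BY_KEY: Dict[str, int] = {}


def article_key(article_text: str) -> str:
    # Base32-encoded MD5 of the article body: 16 bytes -> 26 characters once padding is stripped
    digest = hashlib.md5(article_text.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def publish(article_id: int, article_text: str) -> str:
    key = article_key(article_text)
    ARTICLE_BY_KEY[key] = article_id
    return f"http://www.mydomain.com/jsp/get_article.jsp?hash={key}"


def lookup(key: str) -> Optional[int]:
    return ARTICLE_BY_KEY.get(key)
```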
However you look at it, you will not get perfect security from this, but you can stop people from guessing the URL without first requesting the page that links to it. In other words, it's a lot of work without much gain. But as you are trying to achieve a DRM scheme, it's probably the best you can get...

Not sure what you mean by "unreadable". I think the short answer is: It can't be done. The URL has to be visible to the browser or how will it request the resource? Your question sounds a little like saying, "How can I let people call me without telling them my telephone number?"
You could, I suppose, encrypt your URLs. But why?
If there is information in the URL that you don't want the user to see, then ... don't put it in the URL. For example, if you had a customer help system that directs users who give impossible answers to the "moron" section of your system, I wouldn't make the URL "http://example.com/help/moron.jsp?screen=17". Call it something nondescript. More seriously, you certainly should not make the customer's password or other confidential information part of the URL. Keep this sort of thing in data on the server side that is accessed via "safe" data, like a user ID.
