Pin Generation - security

I am looking to develop a system in which i need to assign every user a unique pin code for security. The user will only enter this pin code as a means of identifying himself. Thus i dont want the user to be able to guess another users pincode. Assuming the max users i will have is 100000, how long should this pin code be?
e.g. 1234 4532 3423
Should i generate this code via some sort of algorithm? Or should i randomly generate it?
Basically I dont want people to be able to guess other peoples pincode and it should support enough number of users.
Am sorry if my question sounds a bit confusing but would gladly clarify any doubts.
thank you very much.
UPDATE
After reading all the posts below, I would like to add some more detail.
What i am trying to achieve is something very similar to a scratch card.
A user is given a card, which he/she must scratch to find the pin code.
Now using this pin code the user must be able to access my system.
I cannot add extra security (e.g. username and password), as then it will deter the user from using the scratch card. I want to make it as difficult as possible to guess the pincode within the limitations.
thankyou all for your amazing replies again.

4 random digits should be plenty if you append it to unique known userid (could still be number) [as recommended by starblue]
Pseudo random number generator should also be fine. You can store these in the DB using reversable encryption (AES) or one-way hashing
The main concern you have is how many times a person can incorrectly input the pin before they are locked out. This should be low, say around three...This will stop people guessing other peoples numbers.
Any longer than 6 digits and people will be forgetting them, or worse, writing them on a post-it note on their monitor.
Assuming an account locks with 3 incorrect attempts, then having a 4 digit pin plus a user ID component UserId (999999) + Pin (1234) gives you a 3/10000 chance of someone guessing. Is this acceptable? If not make the pin length 5 and get 3/100000

May I suggest an alternative approach? Take a look at Perfect Paper Passwords, and the derivatives it prompted .
You could use this "as is" to generate one-time PINs, or simply to generate a single PIN per user.
Bear in mind, too, that duplicate PINs are not of themselves an issue: any attack would then simply have to try multiple user-ids.
(Mileage warning: I am definitely not a security expert.)
Here's a second answer: from re-reading, I assume you don't want a user-id as such - you're just validating a set of issued scratch cards. I also assume you don't want to use alphabetic PINs.
You need to choose a PIN length such that the probability of guessing a valid PIN is less than 1/(The number of attempts you can protect against). So, for example, if you have 1 million valid PINs, and you want to protect against 10000 guesses, you'll need a 10-digit PIN.
If you use John Graham-Cumming's version of the Perfect Paper Passwords system, you can:
Configure this for (say) 10-digit decimal pins
Choose a secret IV/key phrase
Generate (say) the first million passwords(/PINs)
I suspect this is a generic procedure that could, for example, be used to generate 25-alphanumeric product ids, too.
Sorry for doing it by successive approximation; I hope that comes a bit nearer to what you're looking for.

If we assume 100,000 users maximum then they can have unique PINs with 0-99,999 ie. 5 digits.
However, this would make it easier to guess the PINs with the maximum number of users.
If you can restrict the number of attempts on the PIN then you can have a shorter PIN.
eg. maximum of 10 failed attempts per IP per day.
It also depends on the value of what you are protecting and how catastrophic it would be if the odd one did get out.
I'd go for 9 digits if you want to keep it short or 12 digits if you want a bit more security from automated guessing.
To generate the PINs, I would take a high resolution version of the time along with some salt and maybe a pseudo-random number, generate a hash and use the first 9 or 12 digits. Make sure there is a reasonable and random delay between new PIN generations so don't generate them in a loop, and if possible make them user initiated.
eg. Left(Sha1(DateTime + Salt + PseudoRandom),9)

Lots of great answers so far: simple, effective, and elegant!
I'm guessing the application is somewhat lottery-like, in that each user gets a scratch card and uses it to ask your application if "he's already won!" So, from that perspective, a few new issues come to mind:
War-dialing, or its Internet equivalent: Can a rogue user hit your app repeatedly, say guessing every 10-digit number in succession? If that's a possibility, consider limiting the number of attempts from a particular location. An effective way might be simply to refuse to answer more than, say, one attempt every 5 seconds from the same IP address. This makes machine-driven attacks inefficient and avoids the lockout problem.
Lockout problem: If you lock an account permanently after any number of failed attempts, you're prone to denial of service attacks. The attacker above could effectively lock out every user unless you reactivate the accounts after a period of time. But this is a problem only if your PINs consist of an obvious concatenation of User ID + Key, because an attacker could try every key for a given User ID. That technique also reduces your key space drastically because only a few of the PIN digits are truly random. On the other hand, if the PIN is simply a sequence of random digits, lockout need only be applied to the source IP address. (If an attempt fails, no valid account is affected, so what would you "lock"?)
Data storage: if you really are building some sort of lottery-like system you only need to store the winning PINs! When a user enters a PIN, you can search a relatively small list of PINs/prizes (or your equivalent). You can treat "losing" and invalid PINs identically with a "Sorry, better luck next time" message or a "default" prize if the economics are right.
Good luck!

The question should be, "how many guesses are necessary on average to find a valid PIN code, compared with how many guesses attackers are making?"
If you generate 100 000 5-digit codes, then obviously it takes 1 guess. This is unlikely to be good enough.
If you generate 100 000 n-digit codes, then it takes (n-5)^10 guesses. To work out whether this is good enough, you need to consider how your system responds to a wrong guess.
If an attacker (or, all attackers combined) can make 1000 guesses per second, then clearly n has to be pretty large to stop a determined attacker. If you permanently lock out their IP address after 3 incorrect guesses, then since a given attacker is unlikely to have access to more than, say, 1000 IP addresses, n=9 would be sufficient to thwart almost all attackers. Obviously if you will face distributed attacks, or attacks from a botnet, then 1000 IP addresses per attacker is no longer a safe assumption.
If in future you need to issue further codes (more than 100 000), then obviously you make it easier to guess a valid code. So it's probably worth spending some time now making sure of your future scaling needs before fixing on a size.
Given your scratch-card use case, if users are going to use the system for a long time, I would recommend allowing them (or forcing them) to "upgrade" their PIN code to a username and password of their choice after the first use of the system. Then you gain the usual advantages of username/password, without discarding the ease of first use of just typing the number off the card.
As for how to generate the number - presumably each one you generate you'll store, in which case I'd say generate them randomly and discard duplicates. If you generate them using any kind of algorithm, and someone figures out the algorithm, then they can figure out valid PIN codes. If you select an algorithm such that it's not possible for someone to figure out the algorithm, then that almost is a pseudo-random number generator (the other property of PRNGs being that they're evenly distributed, which helps here too since it makes it harder to guess codes), in which case you might as well just generate them randomly.

If you use random number generator algorithms, so you never have PIN like "00038384882" ,
starts with 0 (zeros), because integer numbers never begins with "0". your PIN must be started with 1-9 numbers except 0.
I have seen many PIN numbers include and begins many zeros, so you eliminate first million of numbers. Permutation need for calculations for how many numbers eliminated.
I think you need put 0-9 numbers in a hash, and get by randomly from hash, and make your string PIN number.

If you want to generate scratch-card type pin codes, then you must use large numbers, about 13 digits long; and also, they must be similar to credit card numbers, having a checksum or verification digit embedded in the number itself. You must have an algorithm to generate a pin based on some initial data, which can be a sequence of numbers. The resulting pin must be unique for each number in the sequence, so that if you generate 100,000 pin codes they must all be different.
This way you will be able to validate a number not only by checking it against a database but you can verify it first.
I once wrote something for that purpose, I can't give you the code but the general idea is this:
Prepare a space of 12 digits
Format the number as five digits (00000 to 99999) and spread it along the space in a certain way. For example, the number 12345 can be spread as __3_5_2_4__1. You can vary the way you spread the number depending on whether it's an even or odd number, or a multiple of 3, etc.
Based on the value of certain digits, generate more digits (for example if the third digit is even, then create an odd number and put it in the first open space, otherwise create an even number and put it in the second open space, e.g. _83_5_2_4__1
Once you have generated 6 digits, you will have only one open space. You should always leave the same open space (for example the next-to-last space). You will place the verification digit in that place.
To generate the verification digit you must perform some arithmetic operations on the number you have generated, for example adding all the digits in the odd positions and multiplying them by some other number, then subtracting all the digits in the even positions, and finally adding all the digits together (you must vary the algorithm a little based on the value of certain digits). In the end you have a verification digit which you include in the generated pin code.
So now you can validate your generated pin codes. For a given pin code, you generate the verification digit and check it against the one included in the pin. If it's OK then you can extract the original number by performing the reverse operations.
It doesn't sound so good because it looks like security through obscurity but it's the only way you can use this. It's not impossible for someone to guess a pin code but being a 12-digit code with a verification digit, it will be very hard since you have to try 1,000,000,000,000 combinations and you just have 100,000 valid pin codes, so for every valid pin code there are 10,000,000 invalid ones.
I should mention that this is useful for disposable pin codes; a person uses one of these codes only once, for example to charge a prepaid phone. It's not a good idea to use these pins as authentication tokens, especially if it's the only way to authenticate someone (you should never EVER authenticate someone only through a single piece of data; the very minimum is username+password)

It seems you want to use the pin code as the sole means of identification for users.
A workable solution would be to use the first five digits to identify the user,
and append four digits as a PIN code.
If you don't want to store PINs they can be computed by applying a cryptographically secure hash (SHA1 or better)
to the user number plus a system-wide secret code.

Should i generate this code via some
sort of algorithm?
No. It will be predictable.
Or should i randomly generate it?
Yes. Use a cryptographic random generator, or let the user pick their own PIN.
In theory 4 digits will be plenty as ATM card issuers manage to support a very large community with just that (and obviously, they can't be and do not need to be unique). However in that case you should limit the number of attempts at entering the PIN and lock them out after that many attempts as the banks do. And you should also get the user to supply a user ID (in the ATM case, that's effectively on the card).
If you don't want to limit them in that way, it may be best to ditch the PIN idea and use a standard password (which is essentially what your PIN is, just with a very short length and limited character set). If you absolutely must restrict it to numerics (because you have a PIN pad or something) then consider making 4 a (configurable) minimum length rather than the fixed length.
You shouldn't store the PIN in clear anywhere (e.g. salt and hash it like a password), however given the short length and limited char set it is always going to be vulnerable to a brute force search, given an easy way to verify it.
There are various other schemes that can be used as well, if you can tell us more about your requirements (is this a web app? embedded system? etc).

There's a difference between guessing the PIN of a target user, and that of any valid user. From your use case, it seems that the PIN is used to gain access to certain resource, and it is that resource that attackers may be after, not particular identities of users. If that's indeed the case, you will need to make valid PIN numbers sufficiently sparse among all possible numbers of the same number digits.
As mentioned in some answers, you need to make your PIN sufficiently random, regardless if you want to generate it from an algorithm. The randomness is usually measured by the entropy of the PIN.
Now, let's say your PIN is of entropy N, and there are 2^M users in your system (M < N), the probability that a random guess will yield a valid PIN is 2^{M-N}. (Sorry for the latex notations, I hope it's intuitive enough). Then from there you can determine if that probability is low enough given N and M, or compute the required N from the desired probability and M.
There are various ways to generate the PINs so that you won't have to remember every PIN you generated. But you will need a very long PIN to make it secure. This is probably not what you want.

I've done this before with PHP and a MySQL database. I had a permutations function that would first ensure that the number of required codes - $n, at length $l, with the number of characters, $c - was able to be created before starting the generation process.
Then, I'd store each new code to the database and let it tell me via UNIQUE KEY errors, that there was a collision (duplicate). Then keep going until I had made $n number of successfully created codes. You could of course do this in memory, but I wanted to keep the codes for use in a MS Word mail merge. So... then I exported them as a CSV file.

Related

Share Link Generation Security

I have always wondered how websites generates "share with others" links.
Some websites allow you to share a piece of data through a link in order to let people you sent the link to to be able to see the data or even edit it.
For example Google Drive, OneDrive, etc... They give you a (pretty short) link, but what guaranties me that it's not possible for someone to find this link "by luck" and access my data?
Like what if an attacker was trying all the possibilities of links: https://link.share.me/xxxxxxx till he finds some working ones?
Is there a certain length which almost guaranties that no one will find one link this way ? For example if a site generated 1000 links, if the endpoints are composed of 10 times a [A-Za-z0-9] like character (~8e17 possibilities), we just assume that it is secure enough ? If yes, at what probability or ratio between links and possibilities do we consider this kind of system as secure?
Is there a certain cryptographic or mathematic way of generating those links which assure us that a link cannot be found by anyone?
Thank you very much.
Probably the most important thing (besides entropy, which we will come back to in a second) is where you get random from. For this purpose you should use a cryptographic pseudo-random number generator (crypto prng). (As a sidenote, you could also use real random, but a real random source is very hard to come by, if you generate many links, you will likely run out of available random bits, so a crypto prng is probably good enough for your purpose, few applications do actually need real random numbers). Most languages and/or frameworks have a facility for this, in Ruby it is SecureRandom, in Java it's java.security.SecureRandom for example, in python it could be os.urandom and so on.
Ok so how long should it be. It somewhat depends on your other non-security requirements as well, for example sometimes these need to be easy to say over the phone, easy to type or something similar. Apart from these, what you should consider is entropy. Your idea of counting the number of all possible codes is a great start, let's just say that the entropy in the code is log2 (base 2 logarithm) of that number. So for a case sensitive, alphanumeric code that is 10 characters long, the entropy is log2((26+26+10)^10) = 59.5 bits. You can compute the entropy for any other length and character set the same way.
That might well be enough, what you should consider is your attacker. Will they be able to perform online attacks only (a lot slower), or offline too (can be very-very fast, especially with specialized hardware)? Also what is the impact if they find one, is it like financial data, or just a random funny picture, or the personal data of somebody, for which you are legally responsible in multiple jurisdictions (see GDPR in EU, or the California privacy laws)?
In general, you could say that 64 bits of entropy is probably good enough for many purposes, and 128 bits is a lot (except maybe for cryptographic keys and very high security applications). As the 59 bits above is.. well, almost 64, for lower security apps that could for example be a reasonable tradeoff for better usability.
So in short, there is no definitive answer, it depends on how you want to model this, and what security requirements you want to meet.
Two more things to consider are the validity of these codes, and how many will be issued (how dense will the space be).
I think the usual variables here are the character set for the code, and its length. Validity is more like a business requirement, and the density of codes will depend on your usage and also the length (which defines the size of your code space).
As an example, let's say you have 64 bits of entropy, you issued 10 million codes already, and your attacker can only perform online attacks by sending a request to your server, at a rate of say 100/second. These are likely huge overstatements towards the secure side.
That would mean there is a 0.17% chance somebody could find a valid code in a year. But would your attacker put so much effort into finding one single (random) valid code? Whether that's acceptable for you only depends on your specific case, only you can tell. If not, you can increase the length of the code for example.
I do not use OneDrive, but I can say from Google Drive that:
The links are not that short. I have just counted one and it's length is 32.
More than security, they probably made large links to do not run out of combinations as thousands of Drive files are shared each day. For security, Drive allows you to choose the users that can access to it. If you select "Everyone" then you should be sure that you don't have problem that anyone sees the content of the link. Even if the link cannot be found "by chance" there still exists the probability that someone else obtains the link from your friend and then shares it or that they are catched in proxies. Long links should be just complementary to other security measures.
Answering your questions:
Links of any length can be found, but longer links will require more time to be found. If you use all alphanumeric characters probably 30 is enough, but as I said they should not be the unique security in your system.
Just make them random, long and let the characters to be in a wide range.

Is a brute force attack a viable option in this event ticketing scheme

I plan on creating an ticket "pass" platform. Basically, imagine you come to a specific city, you buy a "pass" for several days (for which you get things like free entrance to museums and other attractions).
Now, the main question that bothered me for several days is: How will museum staff VALIDATE if the pass is valid? I see platforms like EventBrite etc. using barcodes/QR codes, but that is not quite a viable solution because we'll need to get a good camera phone for every museum to scan the code and that's over-budget. So I was thinking of something like a simple 6-letter code, for eg: GHY-AGF. There are 26^6 = 308 million combinations, which is a tough nut to crack.
I've asked a question on the StackExchange security site about this, and the main concern was the brute forcing. However, I imagine someone doing this kind of attack if: they had access of doing pass lookup. The only people that will be able to do this are:
1) The museum staff (for which there will be a secure user/pass app, and rate limits of no more than 1000 look-ups per day)
b) Actual customers to check the validity of their pass, and this will be protected with Google ReCaptcha v3, which doesn't sacrifice user experience like with v1. Also rate limits and IP bans will be applied
Is a brute force STILL a viable attack if I implement these 2 measures in place? Also, is there something else I'm missing in terms of security, when using this approach?
By the way, Using a max. 6-character-long string as a unique "pass" has many advantages portable-wise, for eg. you could print "blank" passes, where the user will be give instructions on how to obtain it. After they pay, they'll be given a code like: GAS-GFS, which they can easily write with a pen on the pass. This is not possible with a QR/barcode. Also, the staff can check the validity in less than 10 seconds, by typing it in a web-app, or sending an SMS to check if it's valid. If you're aware of any other portable system like this, that may be more secure, let me know.
Brute forcing is a function of sparseness. How many codes at any given time are valid out of how large a space? For example, if out of your 308M possibilities, 10M are valid (for a given museum), then I only need ~30 guesses to hit a collision. If only 1000 are valid, then I need more like 300k guesses. If those values are valid indefinitely, I should expect to hit one in less than a year at 1000/day. It depends on how much they're worth to figure out if that's something anyone would do.
This whole question is around orders of magnitude. You want as many as you can get away with. 7 characters would be better than 6 (exactly 26x better). 8 would be better than that. It depends on how devoted your attackers are and how big the window is.
But that's how you think about the problem to choose your space.
What's much more important is making sure that codes can't be reused, and are limited to a single venue. In all problems like this, reconciliation (i.e. keeping track of what's been issued and what's been used) is more important than brute-force protection. Posting a number online and having everyone use it is dramatically simpler than making millions of guesses.

Advice for securing data with a "light" auth on a public webpage

Requirements
Display semi-secure data on a public webpage (not behind an auth wall) but provide some semblance of security to that data. The user of this system will receive a piece of paper in the mail that has both a 5-8 digit check number and a 10-digit alphanumeric verification code.
Proposal
First, a peremptory declaration, I know the data will never been 100% secure without some sort of formal authentication.
I am proposing to have the user enter 3 pieces of information, a numeric 5-8 digit check number, an EIN number, which I believe is 8-10 numeric digits, then a verification code, which we are proposing to be 10 digits alphanumeric.
We would put a server side control on the page to prevent repeated submissions from a single IP address in a XX minute timespan, or force a captcha, I haven't decided yet.
I have two questions...
Is the 10-digit alphanumeric code long enough, and I realize this is a relative answer. I'm looking for a "Is this good enough for not-terribly-secure data?" gut check.
Is there a better alternative? We can't expect PGP keys or anything of the sort. All the user of the system gets is a piece of paper in the mail.
A 10 digit alphanumeric code gives you 52 bits of entropy (possible characters is 26 letters + 10 digits, assuming single case), assuming it is generated by a CSPRNG:
log2(36^10) = 52
As a rough guide, 59 bits would take a determined attacker 457.50 hours (just over 19 days). However, make this 16 characters and you will have the 80 bits NIST recommend for strong passwords (109,527.95 years to crack). Note these times are maximum, for average simply divide by two.
Sending a password using an out of bound mechanism such as the mail is a good security measure. Make sure the password is stored securely on your system using one way bcrypt hashes.

How do I hash an integer into a very small string?

I need a function that, given a salt integer and a value integer will return a small hash string. Calling the function with 1 and 56 might return "1AF3". Calling it with 2 and 56 might return "C2FA".
Background info:
I have a web app (written in C# if that matters) that stores employee Id values as integers. Users need to be able to see a consistent representation of that Id, but no user should see the actual Id, or the same representation of that Id as seen by another user.
For example, suppose there is an Employee with the Id of 56.
When User 1 logs in, wherever he sees that employee, he sees "1AF3" or something. He might see this employee on different pages in the app, and its Id should always be 1AF3 so he knows it's the same guy.
When User 2 logs in, should he encounter that same employee, he would always see "C2FA", or something. Same goes for User 2: wherever he is in the system, he would see that one employee represented by that same string.
Should User 2 look over the shoulder of User 1 while User 1 is logged in, User 2 should not be able to recognize any of his employees on User 1's screen, because this hash should be irreversible.
Does this make sense?
One additional requirement is that since the users will be discussing these employees in email, on the phone, and in faxes, the hash would need to be of a minimum size and not contain non-alphanumeric characters. 10 characters or fewer would be ideal.
Maybe there is a way to "collapse" a SHA-256 result into fewer characters since the whole alphabet could be used? I have no idea.
Update: Another walk-through
Thanks everyone for giving this a shot but it seems like I am doing a bad job explaining it or something.
Let's pretend you and me are both users of this system. You're Fred and I'm Chris. Your UserId is 2 and my UserId is 1. Let's also assume there are 5 Employees in the system. Employees are not users. You can think of them as products, or whatever you want. I'm just talking about 5 generic entities that you, Fred, and I, Chris, each deal with.
Fred, every time you log in, you need to be able to uniquely identify each employee. Every time I, Chris, log in, I also need to work with employees and I too will need to be able to identify them uniquely. But should I ever look over your shoulder while you are managing employees, I should not be able to figure out which ones you are managing.
So, while in the database, the employee IDs are 1, 2, 3, 4, and 5. You and I do not see them that way in our interface. I might see A, B, C, D, and E, and you might see F, G, H, I, and J. So while E and J both represent the same employee, I can't look at your screen while you are working with your Employee "J" and know that you are working with Employee 5, because for me, that employee is called Employee "E" for me.
So, Fred and Chris can each work with the same set of employees, but if they were to see each other's work, or discussion in an email, they would not be able to know what employees the other guy was talking about.
I was thinking I could achieve this "real-time user-dependent EmployeeID" by taking the real employee ID and hashing it using the user ID as the salt.
Since Fred and Chris each need to discuss employees over email and the telephone with their clients and customers, I'd like the IDs that they use in these discussions to be as simple as we can get them.
Conceptually, here is what you want:
You have a set of employee IDs which you can represent as element in a given space S. You also have some users, and you want each user to see a permutation of space S, which is specific to the user, and such that the details of that permutation cannot be guessed by any other user.
This calls for symmetric encryption. Namely, each employee ID is a numerical value (e.g. a 32-bit integer), and a user 'A' sees employee x as Ek(x), there k is a secret key which is specific to 'A' and that 'B' cannot guess. So you need two things:
a block cipher which can work with short values (e.g. 32-bit words);
a method which turns user ID into the user-specific key.
For the block cipher, the trouble is that short blocks are a security issue for the normal usage of a block cipher (i.e. to encrypt long messages). So all published, secure block ciphers use large blocks (64 bits or more). 64 bits can be represented over 11 characters by using uppercase and lowercase letters, and digits (6211 is somewhat greater than 264). If that's good enough for you, then use 3DES. If you want something smaller, you will have to design your own cipher, something which is not recommended at all. You may want to try KeeLoq: see this paper for pointers (KeeLoq is cryptographically "broken" but not too much, given your context). There is a generic method for building block ciphers with arbitrary block sizes, given a seekable stream cipher, but this is mostly theoretical (implementation requires waddling through high-precision floating point values, which can be done but is very slow).
For the user-specific key: you want something that the Web application can compute, but not users. This means that the Web application knows a secret key K; then, the user-specific encryption key can be the result of HMAC (with a good hash function, such as SHA-256) applied over the user ID, and using key K. You then truncate the HMAC output to the length you need for the user-specific key (for instance, 3DES needs a 24-byte key).
C# has TripleDES and HMAC/SHA-256 implementations (in System.Security.Cryptography namespace).
(There is no generally accepted secure standard for a block cipher with 32-bit blocks. This is still an open research area.)
There might be problems with this approach but you could do it like this:
Make an array holding all your symbols (say a 25 element array)
Hash your string using whatever hash function
Pick a number of octets out of the resulting hash (4 octets if you want 4 symbols in our resulting string) from predefined positions
For each octet compute index = octet % array_size. The index gives the position for each of your symbols
Again, I have almost zero experience with cryptography, hash functions and the like so you may want to take this with a grain of salt.
There are many ways to "de-anonymize" information. It would help if you could be more specific about the context and what "assets" you are really trying to protect here, against who. See our faq.
E.g., might one user know the number of another user? They could probably find it out quickly if they discovered thru other means the correspondence between 1AF3 and C2FA.
But specifically for your narrower question, a good hash will already be well-mixed, so I'd think you could just truncate, e.g., a SHA-256 hash value. But Thomas will probably know the definitive answer there.
Here are my thoughts getting to the point of it (I figure if you talked out your question, I'll talk out my answer. I'm guessing you'll find that helpful):
All hail Thomas, because he has clearly established his dominance.
0-9, A-F is a representation of the data. You can make it A-Z, 0-9, exclude some uncommon letters, and represent six bits per character.
You can basically say that all hashes have collisions. If you approach saturation, you'll end up with two people who have the same hash. Hashes are also one-way. You would need a mapping that allows reversal. If you have a reverse mapping, why not fill it with random strings which don't collide?
You are obfuscating a limited set of data. With a large and secret salt, you can prevent reversal. That said, you're trading one ID for another. The ID is still unique and constant, so I wonder how this enhances security.
I have some clients where if I were to see something like this, I'd put money that the employee ID was a SSN. I hope you're not doing that.
Employee ID and Employee Alternate ID are what you are coming up with. Since they have to be reversible to you but not the public, you need to store that in a two way pairing and keep it secret. Since there's risk of collision with a hash and you have to have a reverse map anyway, the alternate id might as well be a random string. An ID should be arbitrary anyway, and I would really like to know the perceived security benefit of your approach with two ids for one employee; it makes me think of Mission Impossible and the NOC list.
Just an idea for an approach based on the extra information you have added. The security on this idea is very very light and i'm would not recommend it if you think people are going to attempt to crack it, but it's worth throwing in the pot.
You could create a personal hash by bit-shifting the employee Id based on your own employee Id. Then by adding whatever extra obfuscation code you need to the resulting number, such as converting it to hex. E.g.
string hashedEmployeeId = (employeeIdToHash << myEmployeeId).ToString("X");
This will generate hashed employee Ids based on your own Id, but you may run into problems when the employee Ids get large (especially your own!)
Just to reiterate, this on it's own isn't really very secure but it might help you on your way.
Using 4 characters you would have a total of: 36^4 = 1679616.
You could permute all possibilities of employes togheter.
If you calculate de square root you get 1296.
You could then generate an ordered table with all the possibilities in the first column and then randomly distribute ids from 1 to 1296 in to oder columns. You would get something like this:
key a b
AAAA 386 67
AAAB 86 945
...
With this solution you would have a lookup table scalable up to 1296 employes. However if you consider adding an extra character to your key you would get a lot more possibilities (36^5)^0.5=7776.
With this solution gessing a key would give you one chance on 1296 or 7776 to see information about an employe.
May be performance would be an issue, but I tink you can manage this using a cache or may be even keeping all the data loaded in memory and use a kind of tree map to find corresponding key for two given ids.

Password complexity strategies - any evidence for them?

On more than one occasion I've been asked to implement rules for password selection for software I'm developing. Typical suggestions include things like:
Passwords must be at least N characters long;
Passwords must include lowercase, uppercase and numbers;
No reuse of the last M passwords (or passwords used within P days).
And so on.
Something has always bugged me about putting any restrictions on passwords though - by restricting the available passwords, you reduce the size of the space of all allowable passwords. Doesn't this make passwords easier to guess?
Equally, by making users create complex, frequently-changing passwords, the temptation to write them down increases, also reducing security.
Is there any quantitative evidence that password restriction rules make systems more secure?
If there is, what are the 'most secure' password restriction strategies to use?
Edit Ólafur Waage has kindly pointed out a Coding Horror article on dictionary attacks which has a lot of useful analysis in it, but it strikes me that dictionary attacks can be massively reduced (as Jeff suggests) by simply adding a delay following a failed authentication attempt.
With this in mind, what evidence is there that forced-complex passwords are more secure?
Something has always bugged me about
putting any restrictions on passwords
though - by restricting the available
passwords, you reduce the size of the
space of all allowable passwords.
Doesn't this make passwords easier to
guess?
In theory, yes. In practice, the "weak" passwords you disallow represent a tiny subset of all possible passwords that is disproportionately often chosen when there are no restrictions, and which attackers know to attack first.
Equally, by making users create
complex, frequently-changing
passwords, the temptation to write
them down increases, also reducing
security.
Correct. Forcing users to change passwords every month is a very, very bad idea, except perhaps in extreme high-security environments where everyone really understands the need for security.
Those kind of rules definitely help because it stops stupid users from using passwords like "mypassword", which unfortunately happens quite often.
So actually, you are forcing the users into an extremely large set of potential passwords. It doesn't matter that you are excluding the set of all passwords with only lowercase letters, because the remaining set is still orders of magnitude larger.
BUT my big pet peeves are password restrictions I've encountered on major sites, like
No special characters
Maximum length
Why would anyone do this? W.H.Y.????
A nice read up on this is Jeff's article on Dictionary Attacks.
Never prevent the user from doing what they really want, unless there is a technical limitation from doing so.
You may nag the hell out of the user for doing stupid things like using a dictionary word or a 3-character password, or only using numbers, but see #1 above.
There is no good technical reason to require only alphanumerics, or at least one capital letter, or at least one number; see #1 above.
I forget which website had this advice regarding passwords: "Pick a password that is very easy for you to remember, but very hard for someone else to guess." But then they proceeded to require at least one capital letter and one number.
The problem with passwords is that they are so ubiquitous that it is essentially impossible for any person without a photographic memory to actually remember them without writing them down, and therefore leaving a serious security hole should someone gain access to this list of written-down passwords.
The only way I am able to manage this for myself is to split most of my passwords -- and I just checked my list, I'm up to 130 so far! -- into two parts, one which is the same in all cases, and the other which is unique but simple. (I break this rule for sites requiring high-security like bank accounts.)
By requiring "complexity" as defined as multiple types of characters all present, is that it forces people into a disparate set of conventions for different sites, which makes it harder to remember the password in question.
The only reason I will acknowledge for sites limiting the set of allowable password characters, is that it needs to be typeable on a keyboard. If you have to assume the account needs to be accessed from multiple countries, then keyboards may not always support the same characters on the user's home keyboard.
One of these days I'll have to make a blog posting on the subject. :(
My old limit theorem:
As the security of the password approaches adequate, the probability that it will be on a sticky note attached to the computer or monitor approaches one.
One also might point out the recent fiasco over at twitter where one of their admin's password turned out to be "happiness", which fell to a dictionary attack.
For questions like this, I ask myself what Bruce Schneier would do - the linked article is about how to choose passwords which are hard to guess with typical attacks.
Also note that if you add a delay after a failed attempt, you might also want to add a delay after a successful attempt, otherwise the delay is simply a signal that the attack has failed an other attempt should be launched.
Whilst this does not directly answer your question, I personally find the most aggrevating rule I have encountered one whereby you could not reuse any password previously used. After working at the same place for a number of years, and having to change your password every 2/3 months, the ability to use a password I chose over a year ago would not seem to be particularly unsafe or unsecure. If I have used "safe" passwords in the past (Alphanumeric with changes in case), surely reusing them after a perios of say a year or 2 (depending on how regularly you have to change your password) would seem to be acceptable to me. It also means I am less likely to use "easier" passwords, which might happen if I can't think of anything easy to remember and difficult to guess!
First let me say that details such as minimum length, case sensitivity and required special characters should depend on who has access and what the password allows them to do. If it's a code to launch a nuclear missile, it should be more strict than a password to log in to play your paid online edition of Angry Birds.
But I've got a SPECIFIC beef with case sensitivity.
For starters, users hate it. The human brain thinks "A=a". Of course, developers brains' aren't usually typical. ;-) But developers are also inconvenienced by case sensitivity.
Second, the CapsLock key is too easy to hit by mistake. It's right between Tab and Shift keys, but it SHOULD be up above the Esc key. Its location was established long ago in the days of typewriters, which had no alternate font available. In those days it was useful to have it there.
All passwords have risk... You're balancing risk with ease-of-use, and yes, usability matters.
MY ARGUMENT:
Yes, case sensitivity is more secure for a given password length. But unless someone is making me do otherwise, I opt for a longer minimum password length. Even if we assume only letters and digits are allowed, each added character multiplies number of the possible passwords by 36.
Someone who's less lazy than me with math could tell you the difference in number of combinations between, say a minimum 8-character case-sensitive password, and a 12-character case-insensitive password. I think most users would prefer the latter.
Also, not all apps expose usernames to others, so there are potentially two fields the hacker may have to find.
I also prefer to allow spaces in passwords as long as the majority of the password isn't spaces.
In the project I'm developing now, my management screen allows the administrator to change password requirements, which apply to all future passwords. He can also force all users to update passwords (to new requirements) at any time after next logon. I do this because I feel my stuff doesn't need case-sensitivity, but the administrator (who probably paid me for the software) may disagree so I let that person decide.
The PIN for my bank card is only four digits. Since it's only numbers it's not case sensitive. And heck, it's my MONEY! If you consider nothing else, this sounds pretty insecure, were it not for the fact that the hacker has to steal my card to get my money. (And have his photo taken.)
One other beef: Developers who come onto StackOverflow and regurgitate hard-and-fast rules that they read in an article somewhere. "Never hard code anything." (As if that's possible.) "All queries must be parameterized" (not if the the user doesn't contribute to the query.) etc.
Please excuse the rant. ;-) I promise I respect disagreement.
Personally for this paticular problem I tend to give passwords a 'score' based on characteristics of the entered text, and refuse passwords that don't meet the score.
For example:
Contains Lower Case Letter +1
Contains different Lower Case Letter +1
Contains Upper Case Letter +1
Contains different Upper Case Letter +1
Contains Non-Alphanumeric character: +1
Contains different Non-Alphanumeric character: +1
Contains Number: +1
Contains Non Consecutive or repeated Second Number: +1
Length less than 8: -10
Length Greater than 12: +1
Contains Dictionary word: -4
Then only allowing passwords with a score greater than 4, (and providing the user feedback as they create their password via javascript)

Resources