How are software license keys generated? - security

License keys are the defacto-standard as an anti-piracy measure. To be honest, this strikes me as (in)Security Through Obscurity, although I really have no idea how license keys are generated. What is a good (secure) example of license key generation? What cryptographic primitive (if any) are they using? Is it a message digest? If so, what data would they be hashing? What methods do developers employ to make it difficult for crackers to build their own key generators? How are key generators made?

For old-school CD keys, it was just a matter of making up an algorithm for which CD keys (which could be any string) are easy to generate and easy to verify, but the ratio of valid-CD-keys to invalid-CD-keys is so small that randomly guessing CD keys is unlikely to get you a valid one.
INCORRECT WAY TO DO IT:
Starcraft and Half-life both used the same checksum, where the 13th digit verified the first 12. Thus, you could enter anything for the first 12 digits, and guess the 13th (there's only 10 possibilities), leading to the infamous 1234-56789-1234
The algorithm for verifying is public, and looks something like this:
x = 3;
for(int i = 0; i < 12; i++)
{
x += (2 * x) ^ digit[i];
}
lastDigit = x % 10;
CORRECT WAY TO DO IT
Windows XP takes quite a bit of information, encrypts it, and puts the letter/number encoding on a sticker. This allowed MS to both verify your key and obtain the product-type (Home, Professional, etc.) at the same time. Additionally, it requires online activation.
The full algorithm is rather complex, but outlined nicely in this (completely legal!) paper, published in Germany.
Of course, no matter what you do, unless you are offering an online service (like World of Warcraft), any type of copy protection is just a stall: unfortunately, if it's any game worth value, someone will break (or at least circumvent) the CD-key algorithm, and all other copyright protections.
REAL CORRECT WAY TO DO IT:
For online-services, life is a bit simpler, since even with the binary file you need to authenticate with their servers to make any use of it (eg. have a WoW account). The CD-key algorithm for World of Warcraft - used, for instance, when buying playtime cards - probably looks something like this:
Generate a very large cryptographically-secure random number.
Store it in our database and print it on the card.
Then, when someone enters a playtime-card number, check if it's in the database, and if it is, associate that number with the current user so it can never be used again.
For online services, there is no reason not to use the above scheme; using anything else can lead to problems.

When I originally wrote this answer it was under an assumption that the question was regarding 'offline' validation of licence keys. Most of the other answers address online verification, which is significantly easier to handle (most of the logic can be done server side).
With offline verification the most difficult thing is ensuring that you can generate a huge number of unique licence keys, and still maintain a strong algorithm that isnt easily compromised (such as a simple check digit)
I'm not very well versed in mathematics, but it struck me that one way to do this is to use a mathematical function that plots a graph
The plotted line can have (if you use a fine enough frequency) thousands of unique points, so you can generate keys by picking random points on that graph and encoding the values in some way
As an example, we'll plot this graph, pick four points and encode into a string as "0,-500;100,-300;200,-100;100,600"
We'll encrypt the string with a known and fixed key (horribly weak, but it serves a purpose), then convert the resulting bytes through Base32 to generate the final key
The application can then reverse this process (base32 to real number, decrypt, decode the points) and then check each of those points is on our secret graph.
Its a fairly small amount of code which would allow for a huge number of unique and valid keys to be generated
It is however very much security by obscurity. Anyone taking the time to disassemble the code would be able to find the graphing function and encryption keys, then mock up a key generator, but its probably quite useful for slowing down casual piracy.

Check tis article on Partial Key Verification which covers the following requirements:
License keys must be easy enough to type in.
We must be able to blacklist (revoke) a license key in the case of chargebacks or purchases with stolen credit cards.
No “phoning home” to test keys. Although this practice is becoming more and more prevalent, I still do not appreciate it as a user, so will not ask my users to put up with it.
It should not be possible for a cracker to disassemble our released application and produce a working “keygen” from it. This means that our application will not fully test a key for verification. Only some of the key is to be tested. Further, each release of the application should test a different portion of the key, so that a phony key based on an earlier release will not work on a later release of our software.
Important: it should not be possible for a legitimate user to accidentally type in an invalid key that will appear to work but fail on a future version due to a typographical error.

I've not got any experience with what people actually do to generate CD keys, but (assuming you're not wanting to go down the road of online activation) here are a few ways one could make a key:
Require that the number be divisible by (say) 17. Trivial to guess, if you have access to many keys, but the majority of potential strings will be invalid. Similar would be requiring that the checksum of the key match a known value.
Require that the first half of the key, when concatenated with a known value, hashes down to the second half of the key. Better, but the program still contains all the information needed to generate keys as well as to validate them.
Generate keys by encrypting (with a private key) a known value + nonce. This can be verified by decrypting using the corresponding public key and verifying the known value. The program now has enough information to verify the key without being able to generate keys.
These are still all open to attack: the program is still there and can be patched to bypass the check. Cleverer might be to encrypt part of the program using the known value from my third method, rather than storing the value in the program. That way you'd have to find a copy of the key before you could decrypt the program, but it's still vulnerable to being copied once decrypted and to having one person take their legit copy and use it to enable everyone else to access the software.

CD-Keys aren't much of a security for any non-networked stuff, so technically they don't need to be securely generated. If you're on .net, you can almost go with Guid.NewGuid().
Their main use nowadays is for the Multiplayer component, where a server can verify the CD Key. For that, it's unimportant how securely it was generated as it boils down to "Lookup whatever is passed in and check if someone else is already using it".
That being said, you may want to use an algorhithm to achieve two goals:
Have a checksum of some sort. That allows your Installer to display "Key doesn't seem valid" message, solely to detect typos (Adding such a check in the installer actually means that writing a Key Generator is trivial as the hacker has all the code he needs. Not having the check and solely relying on server-side validation disables that check, at the risk of annoying your legal customers who don't understand why the server doesn't accept their CD Key as they aren't aware of the typo)
Work with a limited subset of characters. Trying to type in a CD Key and guessing "Is this an 8 or a B? a 1 or an I? a Q or an O or a 0?" - by using a subset of non-ambigous chars/digits you eliminate that confusion.
That being said, you still want a large distribution and some randomness to avoid a pirate simply guessing a valid key (that's valid in your database but still in a box on a store shelf) and screwing over a legitimate customer who happens to buy that box.

The key system must have several properties:
very few keys must be valid
valid keys must not be derivable even given everything the user has.
a valid key on one system is not a valid key on another.
others
One solution that should give you these would be to use a public key signing scheme. Start with a "system hash" (say grab the macs on any NICs, sorted, and the CPU-ID info, plus some other stuff, concatenate it all together and take an MD5 of the result (you really don't want to be handling personally identifiable information if you don't have to)) append the CD's serial number and refuse to boot unless some registry key (or some datafile) has a valid signature for the blob. The user activates the program by shipping the blob to you and you ship back the signature.
Potential issues include that you are offering to sign practically anything so you need to assume someone will run a chosen plain text and/or chosen ciphertext attacks. That can be mitigated by checking the serial number provided and refusing to handle request from invalid ones as well as refusing to handle more than a given number of queries from a given s/n in an interval (say 2 per year)
I should point out a few things: First, a skilled and determined attacker will be able to bypass any and all security in the parts that they have unrestricted access to (i.e. everything on the CD), the best you can do on that account is make it harder to get illegitimate access than it is to get legitimate access. Second, I'm no expert so there could be serious flaws in this proposed scheme.

If you aren't particularly concerned with the length of the key, a pretty tried and true method is the use of public and private key encryption.
Essentially have some kind of nonce and a fixed signature.
For example:
0001-123456789
Where 0001 is your nonce and 123456789 is your fixed signature.
Then encrypt this using your private key to get your CD key which is something like:
ABCDEF9876543210
Then distribute the public key with your application. The public key can be used to decrypt the CD key "ABCDEF9876543210", which you then verify the fixed signature portion of.
This then prevents someone from guessing what the CD key is for the nonce 0002 because they don't have the private key.
The only major down side is that your CD keys will be quite long when using private / public keys 1024-bit in size. You also need to choose a nonce long enough so you aren't encrypting a trivial amount of information.
The up side is that this method will work without "activation" and you can use things like an email address or licensee name as the nonce.

I realize that this answer is about 10 years late to the party.
A good software license key/serial number generator consists of more than just a string of random characters or a value from some curve generator. Using a limited alphanumeric alphabet, data can be embedded into a short string (e.g. XXXX-XXXX-XXXX-XXXX) that includes all kinds of useful information such as:
Date created or the date the license expires
Product ID, product classification, major and minor version numbers
Custom bits like a hardware hash
Per-user hash checksum bits (e.g. the user enters their email address along with the license key and both pieces of information are used to calculate/verify the hash).
The license key data is then encrypted and then encoded using the limited alphanumeric alphabet. For online validation, the license server holds the secrets for decrypting the information. For offline validation, the decryption secret(s) are included with the software itself along with the decryption/validation code. Obviously, offline validation means the software isn't secure against someone making a keygen.
Probably the hardest part about creating a license key is figuring out how to cram as much data as possible into as few bytes as possible. Remember that users will be entering in their license keys by hand, so every bit counts and users don't want to type extremely long, complex strings in. 16 to 25 character license keys are the most common and balance how much data can be placed into a key vs. user tolerance for entering the key to unlock the software. Slicing up bytes into chunks of bits allows for more information to be included but does increase code complexity of both the generator and validator.
Encryption is a complex topic. In general, standard encryption algorithms like AES have block sizes that don't align with the goal of keeping license key lengths short. Therefore, most developers making their own license keys end up writing their own encryption algorithms (an activity which is frequently discouraged) or don't encrypt keys at all, which guarantees that someone will write a keygen. Suffice it to say that good encryption is hard to do right and a decent understanding of how Feistel networks and existing ciphers work are prerequisites.
Verifying a key is a matter of decoding and decrypting the string, verifying the hash/checksum, checking the product ID and major and minor version numbers in the data, verifying that the license hasn't expired, and doing whatever other checks need to be performed.
Writing a keygen is a matter of knowing what a license key consists of and then producing the same output that the original key generator produces. If the algorithm for license key verification is included in and used by the software, then it is just a matter of creating software that does the reverse of the verification process.
To see what the entire process looks like, here is a blog post I recently wrote that goes over choosing the license key length, the data layout, the encryption algorithm, and the final encoding scheme:
https://cubicspot.blogspot.com/2020/03/adventuring-deeply-into-software-serial.html
A practical, real-world implementation of the key generator and key verifier from the blog post can be seen here:
https://github.com/cubiclesoft/php-misc/blob/master/support/serial_number.php
Documentation for the above class:
https://github.com/cubiclesoft/php-misc/blob/master/docs/serial_number.md
A production-ready open source license server that generates and manages license keys using the above serial number code can be found here:
https://github.com/cubiclesoft/php-license-server
The above license server supports both online and offline validation modes. A software product might start its existence with online only validation. When the software product is ready to retire and no longer supported, it can easily move to offline validation where all existing keys continue to work once the user upgrades to the very last version of the software that switches over to offline validation.
A live demo of how the above license server can be integrated into a website to sell software licenses plus an installable demo application can be found here (both the website and demo app are open source too):
https://license-server-demo.cubiclesoft.com/
Full disclosure: I'm the author of both the license server and the demo site software.

There are also DRM behaviors that incorporate multiple steps to the process. One of the most well known examples is one of Adobe's methods for verifying an installation of their Creative Suite. The traditional CD Key method discussed here is used, then Adobe's support line is called. The CD key is given to the Adobe representative and they give back an activation number to be used by the user.
However, despite being broken up into steps, this falls prey to the same methods of cracking used for the normal process. The process used to create an activation key that is checked against the original CD key was quickly discovered, and generators that incorporate both of the keys were made.
However, this method still exists as a way for users with no internet connection to verify the product. Going forward, it's easy to see how these methods would be eliminated as internet access becomes ubiquitous.

All of the CD only copy protection algorithms inconvience honest users while providing no protection against piracy whatsoever.
The "pirate" only need to have access to one legitimate cd and its access code, he can then make n copies and distribute them.
It does not matter how cryptographically secure you make the code, you need to supply this with the CD in plain text or an legitimate user cannot activite the software.
Most secure schemes involve either the user providing the software supplier with some details of the machine which will run the software (cpu serial numbers, mac addresses, Ip address etc.), or, require online access to register the software on the suppliers website and in return receive an activitation token. The first option requires a lot of manual administration and is only worth it for very high value software, the, second option can be spoofed and is absolutly infuriating if you have limited network access or you are stuck behind a firewall.
On the whole its much easier to establish a trust relationship with your customers!

You can use and implement Secure Licensing API from very easily in your Software Projects using it,(you need to download the desktop application for creating secure license from https://www.systemsoulsoftwares.com/)
Creates unique UID for client software based on System Hardware(CPU,Motherboard,Hard-drive)
(UID acts as Private Key for that unique system)
Allows to send Encrypted license string very easily to client system, It verifies license string and works on only that particular system
This method allows software developers or company to store more information about software/developer/distributor services/features/client
It gives control for locking and unlocked the client software features, saving time of developers for making more version for same software with changing features
It take cares about trial version too for any number of days
It secures the License timeline by Checking DateTime online during registration
It unlocks all hardware information to developers
It has all pre-build and custom function that developer can access at every process of licensing for making more complex secure code

Related

Using asymmetric encryption to secure passwords

Due to our customer's demands, user passwords must be kept in some "readable" form in order to allow accounts to be converted at a later date. Unfortunately, just saving hash values and comparing them on authentication is not an option here. Storing plain passwords in the database is not an option either of course, but using an encryption scheme like AES might be one. But in that case, the key to decrypt passwords would have to be stored on the system handling authentication and I'm not quite comfortable with that.
Hoping to get "best of both worlds", my implementation is now using RSA asymmetric encryption to secure the passwords. Passwords are salted and encrypted using the public key. I disabled any additional, internal salting or padding mechanisms. The encrypted password will be the same every time, just like a MD5 or SHA1 hashed password would be. This way, the authentication system needs the public key, only. The private key is not required.
The private key is printed out, sealed and stored offline in the company's safe right after it is created. But when the accounts need to be converted later, it will allow access to the passwords.
Before we deploy this solution, I'd like to hear your opinion on this scheme. Any flaws in design? Any serious drawbacks compared to the symmetric encryption? Anything else we are missing?
Thank you very much in advance!
--
Update:
In response to Jack's arguments below, I'd like to add the relevant implementation details for our RSA-based "hashing" function:
Security.addProvider(new org.bouncycastle.jce.provider.BouncyCastleProvider());
Cipher rsa = Cipher.getInstance("RSA/None/NoPadding");
rsa.init(Cipher.ENCRYPT_MODE, publicKey);
byte[] cryptRaw = rsa.doFinal(saltedPassword.getBytes());
Having quickly skimmed over the paper mentioned by Jack, I think I somewhat understand the importance of preprocessing such as OAEP. Would it be alright to extend my original question and ask if there is a way to apply the needed preprocessing and still have the function return the same output every time for each input, just as a regular hashing function would? I would accept an answer to that "bonus question" here. (Or should I make that a seperate question on SOF?)
--
Update 2:
I'm having a hard time accepting one of the present answers because I feel that none really does answer my question. But I no longer expect any more answers to come, so I'll accept the one that I feel is most constructive.
I'm adding this as another answer because instead of answering the question asked (as I did in the first response) this is a workaround / alternative suggestion.
Simply put:
Use hashes BUT, whenever a user changes their password, also use your public key as follows:
Generate a random symmetric key and use it to encrypt the timestamp, user identifier, and new password.
The timestamp is to ensure you don't mess up later when trying to find the current / most up-to-date password.
Username so that you know which account you're dealing with.
Password because it is a requirement.
Store the encrypted text.
Encrypt the symmetric key using your public key.
Store the public key encrypted symmetric key with the encrypted text.
Destroy the in-memory plaintext symmetric key, leaving only the public key encrypted key.
When you need to 'convert' the accounts using the current password, you use the private key and go through the password change records. For each one:
Using the private key, decrypt the symmetric key.
Using the symmetric key, decrypt the record.
If you have a record for this user already, compare timestamps, and keep the password that is most recent (discarding the older).
Lather, rinse, repeat.
(Frankly I'm probably overdoing things by encrypting the timestamp and not leaving it plaintext, but I'm paranoid and I have a thing for timestamps. Don't get me started.)
Since you only use the public key when changing passwords, speed isn't critical. Also, you don't have to keep the records / files / data where the plaintext password is encrypted on the server the user uses for authentication. This data can be archived or otherwise moved off regularly, as they aren't required for normal operations (that's what the hash is for).
There is not enough information in the question to give any reasonable answer. Anyway since you disable padding there is a good chance that one of the attacks described in the paper
"Why Textbook ElGamal and RSA Encryption are Insecure" by
D. Boneh, A. Joux, and P. Nguyen is applicable.
That is just a wild guess of course. Your proposal could be susceptible to a number of other attacks.
In terms of answering your specific question, my main concern would have been management of the private key but given it's well and truly not accessible via any computer system breach, you're pretty well covered on that front.
I'd still question the logic of not using hashes though - this sounds like a classic YAGNI. A hashing process is deterministic so even if you decided to migrate systems in the future, so long as you can still use the same algorithm, you'll get the same result. Personally, I'd pick a strong hash algorithm, use a cryptographically strong, unique salt on each account and be done with it.
It seems safe enough in terms of what is online but have you given full consideration to the offline storage. How easy will it be for people within your company to get access to the private key? How would you know if someone within your company had accessed the private key? How easy would it be for the private key to be destroyed (e.g. is the safe fireproof/waterproof, will the printed key become illegible over time etc).
You need to look at things such as split knowledge, dual control, tamper evident envelopes etc. As a minimum I think you need to print out two strings of data which when or'd together create the private key and then have one in your office and one in your customers office,
One serious drawback I've not seen mentioned is the speed.
Symmetric encryption is generally much much faster than asymmetric. That's normally fine because most people account for that in their designs (SSL, for example, only uses asymmetric encryption to share the symmetric key and checking certificates). You're going to be doing asymmetric (slow) for every login, instead of cryptographic hashing (quite fast) or symmetric encryption (pretty snappy). I don't know that it will impact performance, but it could.
As a point of comparison: on my machine an AES symmetric stream cipher encryption (aes-128 cbc) yields up to 188255kB/s. That's a lot of passwords. On the same machine, the peak performance for signatures per second (probably the closest approximation to your intended operation) using DSA with a 512 bit key (no longer used to sign SSL keys) is 8916.2 operations per second. That difference is (roughly) a factor of a thousand assuming the signatures were using MD5 sized checksums. Three orders of magnitude.
This direct comparison is probably not applicable directly to your situation, but my intention was to give you an idea of the comparative algorithmic complexity.
If you have cryptographic algorithms you would prefer to use or compare and you'd like to benchmark them on your system, I suggest the 'openssl speed' command for systems that have openssl builds.
You can also probably mitigate this concern with dedicated hardware designed to accelerate public key cryptographic operations.

Authenticating addons and files

If you ever played the original startcraft and selected an official map made by Blizzard you would notice a little "Blizz" icon next to the map to let you know that it was official and not made by third-party.
I wish to implement a similar system in my application whereby addons and files can be authenticated to let the user know whether or not they came from me or somebody else.
I know very little about security and would appreciate any help in this matter.
Public key cryptography. The client application has a copy of the official author's public signing key, and verifies a signature applied to the addon/file made with the author's private key.
Mac's answer is absolutely correct. To be more specific, the process generally would go as follows:
Signing:
A hash (e.g. SHA-1) is created from the content.
The hash is then signed using the private key, resulting in a signature (e.g. using DSA or RSA algorithm).
The signature is included with the content. If the content changes, the signature will become invalid.
Verification:
Client calculates a hash from the content.
The signature is decrypted with the known public key of the author (i.e. you).
If the hash inside the signature matches the calculated hash from #1 then it's OK. Otherwise the content was modified / isn't really from the author (i.e. you).
Some considerations:
You need to use a key size large enough to deter brute forcing.
These algorithms require a source of random data - both when generating keys and when signing, for example. If this is violated, then often the private key can be trivially revealed.
"Encrypting the hash with the public key" is a simplistic explanation. For example, with RSA - the hash needs to be padded with other data. On verification, the padding has to be checked as well. Otherwise, the signatures may be able to be forged easily. (See: OpenSSL debacle from a few years ago).
The standard advice is to use a proven, off-the-shelf cryptography library written by those with more experience. You need to be able to feed the library the key pair, and data to be signed/verify and say "sign/verify this" with minimal code involved. If you're worrying about the details of padding, or anything like that - you're probably doing it wrong. See: System.Security.Cryptography namespace in Microsoft .NET (very easy to use), or Microsoft CryptoAPI if you're doing standard Windows C/C++ programming. Other cross-platform libs exist too: pick something that works well on your platform.

Need a secure way to publicly display hash values

I am building a windows application to store backups of sensitive files. The purpose of my application is to store a copy of a file with its hash. The program or user will then display the hash publicly in case the user needs to prove they had the backup of the sensitive file at a certain time.
Motivation:
Some situations where this might be useful are:
Someone has a job at a company where they think they might be accused of doing something illegal. If they were accused of changing some data over time, it would be convenient to have copies of sensitive files related to their case over a period of time.
A politician might take notes about things they did each day, many of them about classified or sensitive subjects, and then want to be able to disclose her files at a later date if they are accused of something (for instance, if the CIA said they were briefed on torture…). Not absolute proof, but it would be hard to create fake backup files for every potential scenario, especially several years into the future.
Just to be clear, this application is mostly just an excuse for me to practice my coding skills. I don’t recommend using any type of cryptographic software that hasn’t been scrutinized by several professionals.
Possible Solutions:
For my application, I need to find a good place to publicly store the hash values. Here are my ideas so far:
Send the hash values to a group of people through email. (disadvantage: could annoy people, but would create a traceable record)
Publish the hash values on a public blog (disadvantage: if I ever got in serious legal trouble someone with resources could try to attack the free service I used and erase my data)
Publish the hash values using some online security service that stores documents but does not allow you to delete them. (I am not sure something like this exists.)
What is the most secure and convenient way to publicly display my hash values?
Hash your set of hashes so that you have only one hash to record. Then publish this hash in the classifieds of a widely archived newspaper.
Truly secure? Print out the hashes on a piece of paper along with a legal text to the effect of, "On this day XX/XX/XXXX I affirm these hashes to be accurately identifying these files with these dates." (not a lawyer, get one to verify this), then have it notarized. Then, save that piece of paper in a secure location.

Is it a security risk to use parts of GUID as a random passwords?

When users create an account in my web application, I generate a GUID and use the first 8 characters as their password which is then sent via email.
Is there a security risk I am overlooking in using GUIDs as passwords? I've taken a look at the questionAre GUIDs good passwords?, but that question pertains to personal passwords not random/generated passwords. Ideally, users will login and change their password if they want to.
Using GUIDs as passwords is a very bad idea. GUIDs are generated in a very predictable and well defined manner. Or in other words given enough information it would allow an attacker to predict the passwords of other users.
Predictable and well defined is the exact opposite of what you want in a password generator.
Yes, unless you know exactly how the GUID is built. For example, some GUIDs bundle the MAC address of the host in to the GUID. If you happen to use those bits, then that compromises a large amount of the bit space for the "random" password.
Simply put, GUIDs may be unique, but they are not necessarily random.
"Cryptanalysis of the WinAPI GUID generator shows that, since the sequence of V4 GUIDs is pseudo-random; given full knowledge of the internal state, it is possible to predict previous and subsequent values." http://en.wikipedia.org/wiki/Globally_unique_identifier
I wouldn't use it. It's not that hard to use a random number generator, after all, which are designed to be as random as possible, rather than attempting to guarantee global uniqueness.
This article says don't use it.
GUIDs come in a number of flavors; some have parts that are predictable.
On the other hand, it is very, very easy to generate random numbers.
Why use a questionable technique when a secure alternative is readily available?
Using part of the GUID, or even the whole thing, is a very bad idea. Even if most of it happens to be random, there's no guarantee that any particular portion will be.
I'm not sure there'd be much trouble using a hash of a GUID, or better yet a hash that combined a GUID with some other source of randomness (e.g. one might hash the time when the program starts, and then generate a passcode by returning part of a hash of the previous hash and a new GUID). If there's any randomness at all in GUID generation, the entropy of the hash should increase with each iteration. Note that the passcode should not reveal the entire hash value; some of that should be kept as secret internal state.

Is MD5 less secure than SHA et. al. in a practical sense?

I've seen a few questions and answers on SO suggesting that MD5 is less secure than something like SHA.
My question is, Is this worth worrying about in my situation?
Here's an example of how I'm using it:
On the client side, I'm providing a "secure" checksum for a message by appending the current time and a password and then hashing it using MD5. So: MD5(message+time+password).
On the server side, I'm checking this hash against the message that's sent using my knowledge of the time it was sent and the client's password.
In this example, am I really better off using SHA instead of MD5?
In what circumstances would the choice of hashing function really matter in a practical sense?
Edit:
Just to clarify - in my example, is there any benefit moving to an SHA algorithm?
In other words, is it feasible in this example for someone to send a message and a correct hash without knowing the shared password?
More Edits:
Apologies for repeated editing - I wasn't being clear with what I was asking.
Yes, it is worth worrying about in practice. MD5 is so badly broken that researchers have been able to forge fake certificates that matched a real certificate signed by a certificate authority. This meant that they were able to create their own fake certificate authority, and thus could impersonate any bank or business they felt like with browsers completely trusting them.
Now, this took them a lot of time and effort using a cluster of PlayStation 3s, and several weeks to find an appropriate collision. But once broken, a hash algorithm only gets worse, never better. If you care at all about security, it would be better to choose an unbroken hash algorithm, such as one of the SHA-2 family (SHA-1 has also been weakened, though not broken as badly as MD5 is).
edit: The technique used in the link that I provided you involved being able to choose two arbitrary message prefixes and a common suffix, from which it could generate for each prefix a block of data that could be inserted between that prefix and the common suffix, to produce a message with the same MD5 sum as the message constructed from the other prefix. I cannot think of a way in which this particular vulnerability could be exploited in the situation you describe, and in general, using a secure has for message authentication is more resistant to attack than using it for digital signatures, but I can think of a few vulnerabilities you need to watch out for, which are mostly independent of the hash you choose.
As described, your algorithm involves storing the password in plain text on the server. This means that you are vulnerable to any information disclosure attacks that may be able to discover passwords on the server. You may think that if an attacker can access your database then the game is up, but your users would probably prefer if even if your server is compromised, that their passwords not be. Because of the proliferation of passwords online, many users use the same or similar passwords across services. Furthermore, information disclosure attacks may be possible even in cases when code execution or privilege escalation attacks are not.
You can mitigate this attack by storing the password on your server hashed with a random salt; you store the pair <salt,hash(password+salt)> on the server, and send the salt to the client so that it can compute hash(password+salt) to use in place of the password in the protocol you mention. This does not protect you from the next attack, however.
If an attacker can sniff a message sent from the client, he can do an offline dictionary attack against the client's password. Most users have passwords with fairly low entropy, and a good dictionary of a few hundred thousand existing passwords plus some time randomly permuting them could make finding a password given the information an attacker has from sniffing a message pretty easy.
The technique you propose does not authenticate the server. I don't know if this is a web app that you are talking about, but if it is, then someone who can perform a DNS hijack attack, or DHCP hijacking on an unsecure wireless network, or anything of the sort, can just do a man-in-the-middle attack in which they collect passwords in clear text from your clients.
While the current attack against MD5 may not work against the protocol you describe, MD5 has been severely compromised, and a hash will only ever get weaker, never stronger. Do you want to bet that you will find out about new attacks that could be used against you and will have time to upgrade hash algorithms before your attackers have a chance to exploit it? It would probably be easier to start with something that is currently stronger than MD5, to reduce your chances of having to deal with MD5 being broken further.
Now, if you're just doing this to make sure no one forges a message from another user on a forum or something, then sure, it's unlikely that anyone will put the time and effort in to break the protocol that you described. If someone really wanted to impersonate someone else, they could probably just create a new user name that has a 0 in place of a O or something even more similar using Unicode, and not even bother with trying to forge message and break hash algorithms.
If this is being used for something where the security really matters, then don't invent your own authentication system. Just use TLS/SSL. One of the fundamental rules of cryptography is not to invent your own. And then even for the case of the forum where it probably doesn't matter all that much, won't it be easier to just use something that's proven off the shelf than rolling your own?
In this particular case, I don't think that the weakest link your application is using md5 rather than sha. The manner in which md5 is "broken" is that given that md5(K) = V, it is possible to generate K' such that md5(K') = V, because the output-space is limited (not because there are any tricks to reduce the search space). However, K' is not necessarily K. This means that if you know md5(M+T+P) = V, you can generate P' such that md5(M+T+P') = V, this giving a valid entry. However, in this case the message still remains the same, and P hasn't been compromised. If the attacker tries to forge message M', with a T' timestamp, then it is highly unlikely that md5(M'+T'+P') = md5(M'+T'+P) unless P' = P. In which case, they would have brute-forced the password. If they have brute-forced the password, then that means that it doesn't matter if you used sha or md5, since checking if md5(M+T+P) = V is equivalent to checking if sha(M+T+P) = V. (except that sha might take constant time longer to calculate, that doesn't affect the complexity of the brute-force on P).
However, given the choice, you really ought to just go ahead and use sha. There is no sense in not using it, unless there is a serious drawback to using it.
A second thing is you probably shouldn't store the user's password in your database in plain-text. What you should store is a hash of the password, and then use that. In your example, the hash would be of: md5(message + time + md5(password)), and you could safely store md5(password) in your database. However, an attacker stealing your database (through something like SQL injection) would still be able to forge messages. I don't see any way around this.
Brian's answer covers the issues, but I do think it needs to be explained a little less verbosely
You are using the wrong crypto algorithm here
MD5 is wrong here, Sha1 is wrong to use here Sha2xx is wrong to use and Skein is wrong to use.
What you should be using is something like RSA.
Let me explain:
Your secure hash is effectively sending the password out for the world to see.
You mention that your hash is "time + payload + password", if a third party gets a copy of your payload and knows the time. It can find the password (using a brute force or dictionary attack). So, its almost as if you are sending the password in clear text.
Instead of this you should look at a public key cryptography have your server send out public keys to your agents and have the agents encrypt the data with the public key.
No man in the middle will be able to tell whats in the messages, and no one will be able to forge the messages.
On a side note, MD5 is plenty strong most of the time.
It depends on how valuable the contents of the messages are. The SHA family is demonstrably more secure than MD5 (where "more secure" means "harder to fake"), but if your messages are twitter updates, then you probably don't care.
If those messages are the IPC layer of a distributed system that handles financial transactions, then maybe you care more.
Update: I should add, also, that the two digest algorithms are essentially interchangeable in many ways, so how much more trouble would it really be to use the more secure one?
Update 2: this is a much more thorough answer: http://www.schneier.com/essay-074.html
Yes, someone can send a message and a correct hash without knowing the shared password. They just need to find a string that hashes to the same value.
How common is that? In 2007, a group from the Netherlands announced that they had predicted the winner of the 2008 U.S. Presidential election in a file with the MD5 hash value 3D515DEAD7AA16560ABA3E9DF05CBC80. They then created twelve files, all identical except for the candidate's name and an arbitrary number of spaces following, that hashed to that value. The MD5 hash value is worthless as a checksum, because too many different files give the same result.
This is the same scenario as yours, if I'm reading you right. Just replace "candidate's name" with "secret password". If you really want to be secure, you should probably use a different hash function.
if you are going to generate a hash-mac don't invent your scheme. use HMAC. there are issues with doing HASH(secret-key || message) and HASH(message || secret-key). if you are using a password as a key you should also be using a key derivation function. have a look at pbkdf2.
Yes, it is worth to worry about which hash to use in this case. Let's look at the attack model first. An attacker might not only try to generate values md5(M+T+P), but might also try to find the password P. In particular, if the attacker can collect tupels of values Mi, Ti, and the corresponding md5(Mi, Ti, P) then he/she might try to find P. This problem hasn't been studied as extensively for hash functions as finding collisions. My approach to this problem would be to try the same types of attacks that are used against block ciphers: e.g. differential attacks. And since MD5 already highly susceptible to differential attacks, I can certainly imagine that such an attack could be successful here.
Hence I do recommend that you use a stronger hash function than MD5 here. I also recommend that you use HMAC instead of just md5(M+T+P), because HMAC has been designed for the situation that you describe and has accordingly been analyzed.
There is nothing insecure about using MD5 in this manner. MD5 was only broken in the sense that, there are algorithms that, given a bunch of data A additional data B can be generated to create a desired hash. Meaning, if someone knows the hash of a password, they could produce a string that will result with that hash. Though, these generated strings are usually very long so if you limit passwords to 20 or 30 characters you're still probably safe.
The main reason to use SHA1 over MD5 is that MD5 functions are being phased out. For example the Silverlight .Net library does not include the MD5 cryptography provider.
MD5 provide more collision than SHA which mean someone can actually get same hash from different word (but it's rarely).
SHA family has been known for it's reliability, SHA1 has been standard on daily use, while SHA256/SHA512 was a standard for government and bank appliances.
For your personal website or forum, i suggest you to consider SHA1, and if you create a more serious like commerce, i suggest you to use SHA256/SHA512 (SHA2 family)
You can check wikipedia article about MD5 & SHA
Both MD5 amd SHA-1 have cryptographic weaknesses. MD4 and SHA-0 are also compromised.
You can probably safely use MD6, Whirlpool, and RIPEMD-160.
See the following powerpoint from Princeton University, scroll down to the last page.
http://gcu.googlecode.com/files/11Hashing.pdf
I'm not going to comment on the MD5/SHA1/etc. issue, so perhaps you'll consider this answer moot, but something that amuses me very slightly is whenever the use of MD5 et al. for hashing passwords in databases comes up.
If someone's poking around in your database, then they might very well want to look at your password hashes, but it's just as likely they're going to want to steal personal information or any other data you may have lying around in other tables. Frankly, in that situation, you've got bigger fish to fry.
I'm not saying ignore the issue, and like I said, this doesn't really have much bearing on whether or not you should use MD5, SHA1 or whatever to hash your passwords, but I do get tickled slightly pink every time I read someone getting a bit too upset about plain text passwords in a database.

Resources