Signature/Hash Choice for File Integrity Verification - security

For a file repository, I need to select a hashing algorithm that will reasonably ensure the integrity of files.
I need an algorithm that anyone (with a bit of effort) would be able to easily use to verify the integrity given the hash. In short, the file may be transferred to the user, along with a hash, and they must be able to verify that the hash comes from the file.
My first choice would be MD5 because there seem to be widely available utilities to verify MD5 hashes, but I'm concerned about the MD5 algorithm being cryptographically broken (ref Wikipedia/US-CERT: http://en.wikipedia.org/wiki/MD5).
My second choice would be a SHA-2 algorithm, but I'm concerned about the availability of utilities that could easily verify the hash. Most examples I've found show program code to evaluate a hash, but I've found few, if any, pre-built utilities (asking users to build their own utility is beyond the 'easily' scope).
What other options are available for generating and evaluating a file hash, or are these two the options that are best?

Provide both/multiple, and let the user decide which they verify against. Or, if they are really cautious, they can verify against both/all.
I have seen download sites use this approach. One site recommended the most secure hash but offered others, like MD5, as a fallback. It also provided links to tools. I can't remember the specific site, I'm afraid.

Since you've been able to find a few file-checkers, why not link to them as a recommendation? That way your users have at least one tool they can use. They don't need several dozen different filechecking utilities, they need just one which works for the algo you chose to use.
Tools you could link to:
Windows: http://securityxploded.com/download-hash-verifier.php
Mac OS X: http://www.macupdate.com/app/mac/31781/checksum

sha256sum, a program that is part of the coreutils package on Linux, will generate checksums for the listed files. The format of the checksum output is the same as that of the md5sum program (but using SHA-256 hashing instead of MD5, of course), which has been widely used for years. You didn't list any target platforms, but a quick search shows there are Windows ports of the command-line program.
If you need to generate large numbers of checksums you can use md5deep, which includes support for other hashes as well, including SHA-256.
http://md5deep.sourceforge.net/
I haven't tried this but from the screenshots it looks pretty neat integrating into OSX and Windows Explorer: http://implbits.com/HashTab.aspx
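For users who have Python but none of the tools above, the check itself is small. The following is only a minimal sketch, assuming the published checksum uses the "HEX  filename" layout that md5sum and sha256sum print; the function names are made up for this illustration.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum_line(line, path):
    """Check a 'HEX  filename' line (assumed sha256sum layout) against the file at path."""
    expected = line.split()[0].lower()
    return file_sha256(path) == expected
```

Calling verify_checksum_line with the line copied from the download page and the path of the downloaded file returns True only when the digests match.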

Related

Do I need MD5 as a companion to SHA-1?

Do I need both MD5 and SHA-1 values to be sure the downloaded file is
a) Untouched by hackers. For example, when I need to download some app's .iso via torrents
and
b) Not corrupted during technical issues? For example, some unstable network connection during download.
Or will the SHA-1 value alone be enough for both checks?
Also, is SHA-1 (without MD5) enough to be sure that a file downloaded years ago and stored somewhere on my HDD hasn't degraded?
From a security perspective, MD5 is utterly broken.
SHA-1 is considered suspicious and is avoided for most uses if at all possible. For new projects: don't use it at all.
SHA-2 (a family that includes SHA-256, SHA-512, etc.) is still widely used for fast hashes.
SHA-3 has been the future since Keccak was selected in 2012, and nothing is stopping you from using it already. I see little reason not to use it for new projects.
What's the problem with older ones:
Their resistance to collisions is below par: an attacker can create two different contents that have the same hash. The two are constructed at the same time. This problem exists for MD5 and SHA-1, and it's BAD, but it requires the attacker to create both versions (after which they can switch one for the other at any time, undetected).
Their resistance to length extension attacks is weak. MD5, SHA-1 and the untruncated SHA-2 variants (SHA-256, SHA-512) all suffer from it because of how they are constructed; SHA-3 does not.
When is it not a problem: to check that your disk has not produced an error, any hash will do. Even a simple CRC32 will work wonders (and I'd recommend the simpler CRC check). A RAID array is another option, as it can fix errors, not just detect them.
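For the accidental-corruption case, a CRC32 check could look roughly like this. It is a sketch using Python's standard zlib module; the function name is made up for this illustration, and it offers no protection against deliberate tampering.

```python
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as 8 lowercase hex digits."""
    crc = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            # Passing the running value continues the checksum across chunks.
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")
```

Store the value next to the file when it is first written, then recompute and compare it later to detect silent corruption.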
Use both?
Well, finding a collision on one hash where the same pair of plaintexts also collides on another hash is probably more difficult. This approach has been used in the past; the original PGP did something like it. If I'm not mistaken, it had a number of things it calculated, one of them simply the length (which would prevent the extension attack above).
So yes, it likely adds something, but the way MD5, SHA-1 and SHA-2 work internally is quite similar, and that's the worrisome part: they are too much alike to be sure just how much it adds against a highly sophisticated attacker (think the level of the NSA and their counterparts).
So why not use one of the more modern versions of SHA-2, or even better, SHA-3? They have no known practical weaknesses and have been peer-reviewed heavily. As such, for any commercial-level use, they should be more than enough.
Refs:
https://en.wikipedia.org/wiki/Length_extension_attack
https://en.wikipedia.org/wiki/Collision_attack
https://stackoverflow.com/questions/tagged/sha-3
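If you do decide to check several hashes, as in the "use both" idea above, the file only needs to be read once. This is a sketch, assuming the algorithm names are ones that Python's hashlib accepts on your build; the function names are hypothetical.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha256", "sha3_256"), chunk_size=1 << 20):
    """Compute several hex digests of one file in a single pass."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            # Feed every chunk to every hash object.
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_all(path, expected):
    """Return True only if every expected {algorithm: hex digest} pair matches."""
    actual = file_digests(path, tuple(expected))
    return all(actual[name] == value.lower() for name, value in expected.items())
```

An attacker would then need a single pair of files that collides under every listed algorithm at once.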

How to Generate a per-Host UUID in a Shell Script?

I am writing a shell script that will be deployed to multiple machines and connect to a central server. When connecting, the script should identify the machine it is running on. (This is used to implement some rudimentary locking but that's not important for the question.)
I know that I could use the host-name, as reported by hostname -f, to identify the machine. But many personal devices have far from unique host-names, such as my-laptop or workstation so I'm wary of using this.
I might be able to add entropy by hashing the host-name together with some other host-specific information that will not change.
echo $(hostname -f) $(uname -snmpio) | md5sum
But the added entropy is very low. I'm having a hard time thinking of other system properties that can be hashed and are guaranteed not to change. (For example, I don't want to add any properties of the file system or other system configuration [1] because it might legitimately change at any time.)
Finally, I thought about generating a random string the first time the script is run and storing it in some configuration file. This would be extremely likely to be unique and guaranteed not to change. But if possible, I'd prefer not having to manage persistent state.
Ideally, there would exist a utility to obtain a deterministic non-volatile UUID for the local system (like blkid for block-devices). It is not required that the UUIDs be hard to forge. This is not an authentication mechanism and I'm trusting all parties that run the script.
Are there any superior options I have overlooked?
[1] Technically, the host-name is a system configuration, too. But if we change it, we expect the system to no longer be identified as the one it was before.
How to Generate Version 4 UUIDs
The easiest way to generate a unique identifier is to use a UUID. The most common type of UUID for this purpose is UUID v4, which is generally the correct choice unless you have some specific circumstances (e.g. namespacing requirements or poor sources of entropy) that would lead you to using one of the other UUID types.
You can use uuidgen on Linux, which can be found in the "uuid-runtime" package on Debian-based systems. The uuidgen tool is also installed by default on OS X, should you need it. On Linux, the tool relies on libuuid and /dev/urandom to generate Version 4 UUIDs. If a high-quality random number generator isn't available, uuidgen will fall back on Version 1 time-based UUIDs.
UUIDGEN(1) says:
There are two types of UUIDs which uuidgen can generate: time-based UUIDs and random-based UUIDs. By default uuidgen will generate a random-based UUID if a high-quality random number generator is present. Otherwise, it will choose a time-based UUID. It is possible to force the generation of one of these two UUID types by using the -r or -t options.
As a general rule, if you're able to do so you should definitely stick with Version 4 as Version 1 has known security risks and limitations to its uniqueness properties. However, specific use cases may vary.
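Because a Version 4 UUID is random, it only identifies a host if it is generated once and then stored, which is the persistent state the question hoped to avoid. A minimal sketch of that trade-off follows; the function name and the idea of passing a storage path are assumptions for illustration only.

```python
import os
import uuid


def load_or_create_host_id(path):
    """Return the UUID stored at path, creating a random version 4 UUID on first use."""
    try:
        with open(path, "r", encoding="ascii") as fh:
            return uuid.UUID(fh.read().strip())
    except FileNotFoundError:
        host_id = uuid.uuid4()
        # Create the parent directory if the caller gave a nested path.
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="ascii") as fh:
            fh.write(f"{host_id}\n")
        return host_id
```

The first call writes the file; every later call on the same machine returns the same value as long as the file survives.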
In the interest of providing options, you could use the fingerprint of the local host's SSH host key (RSA, for example). Your machines most likely have all the necessary components already configured.
hostkey=$(ssh-keygen -l -f /path/to/host_key.pub)
The output will contain spaces and whatnot, but you could parse those out if it is a problem. The host key is usually in /etc/ssh/ssh_host_* but the location may depend on the Linux distribution.
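If you want a value shaped like a UUID rather than a fingerprint string, one option is to derive a name-based (Version 5) UUID from the public host key. This is a sketch, not what ssh-keygen does: the namespace below is invented for the example, and it assumes the usual one-line "type base64-blob comment" layout of an OpenSSH public key file.

```python
import uuid

# Hypothetical namespace for this illustration; any fixed UUID would do.
HOST_ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "host-id.example.invalid")


def host_uuid_from_pubkey(pubkey_path):
    """Derive a deterministic (version 5) UUID from an OpenSSH public key file."""
    with open(pubkey_path, "r", encoding="ascii") as fh:
        fields = fh.read().split()
    # Assumed layout: "<type> <base64 blob> [comment]". The comment is ignored
    # so that editing it does not change the resulting identifier.
    key_type, key_blob = fields[0], fields[1]
    return uuid.uuid5(HOST_ID_NAMESPACE, f"{key_type} {key_blob}")
```

The result stays the same for as long as the host key is not regenerated, and no extra state has to be stored.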

new encryption algorithm for ssh

I have been asked to add a new algorithm to ssh so that data is encrypted with the new algorithm. Any idea how to add a new algorithm to ssh?
thanks
It is possible to add a new algorithm to SSH communication, and this is done from time to time (e.g. AES was added later). But the catch is that you need to modify both client and server so that they both support this algorithm, otherwise it makes no sense.
I assume that you were asked to add some custom algorithm, either home-made or non-standard. So the first thing I'd like to do is warn you that the added algorithm can be weak. You need to do at least a basic search for information about this algorithm, because if it's broken, you will be doing completely useless and even dangerous work.
As for the software modification itself: it's a rare job, so most likely you won't find anybody with this experience. However, the code that handles the various algorithms is typical, and adding a new algorithm is trivial: you add one source file with the algorithm implementation and then modify a bunch of places by adding one more case to a switch statement.
In my career I've worked on a private fork of ssh that was sold as closed-source commercial software. Even they in all their crazy stupidity (private fork? who in their right mind uses non-Open Source encryption software? I thought our customers were completely off their rockers.) didn't add a new encryption algorithm.
It can be done though. Adding the hooks to the ssh protocol to support it isn't hard. The protocol is designed to be extensible in that way. At the beginning the client and server exchange lists of encryption algorithms they're willing to use.
This means, of course, that only a modified client and a modified server will talk to each other using the new algorithm.
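The list exchange can be illustrated with a toy model. In the SSH transport protocol the cipher chosen is the first entry in the client's list that the server also offers; the sketch below models only that rule, and the name newcipher@example.com is a made-up placeholder for a private algorithm.

```python
def negotiate(client_algorithms, server_algorithms):
    """Return the first client-preferred algorithm the server also supports, or None."""
    server_set = set(server_algorithms)
    for name in client_algorithms:
        if name in server_set:
            return name
    return None


# Hypothetical lists: only the modified peers know the new cipher.
MODIFIED = ["newcipher@example.com", "aes256-ctr"]
STOCK = ["aes256-ctr", "aes128-ctr"]
```

With these lists, negotiate(MODIFIED, MODIFIED) picks newcipher@example.com, while negotiate(MODIFIED, STOCK) falls back to aes256-ctr, so the new algorithm is used only when both ends have been modified.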
The real difficulty is OpenSSL. ssh does not use TLS/SSL, but it does use the OpenSSL encryption library. You would have to add the new algorithm to that library, and that library is a terrible beast.
Though, I suppose you could add the algorithm without adding it to OpenSSL. That might be tricky though since I think openssh may rely heavily on the way the OpenSSL APIs work. And part of how they work allows you to pass around a constant representing which algorithm you want to use and then a standard set of calls for encryption and decryption that use the constant to decide on the algorithm.
Again though, if I recall correctly, OpenSSL has an API specifically for adding new algorithms to its suite. So that may not be so hard. You will have to make sure this happens when the OpenSSL library is being initialized.
Anyway, this is a fairly vague answer, but maybe it will point you in the right direction. You should make whoever is doing this pay enormous sums of money. Stupidity that requires this level of knowledge to pull off should never come cheaply.

Zip file with passwd security?

We have a client-server app which saves user-related data into a zip file and sets the password on the zip file programmatically. Just wondering if this could be considered secure.
Thanks
N
The "classic" encryption for Zip files is considered to be weak. It is breakable, quickly, by known methods. See: "A Known Plaintext Attack on the PKZIP Stream Cipher" for the original paper, by Biham and Kocher, from 1994. Yes, 16 years ago.
More recently there have been other exploits described. For example, the paper Yet another Plaintext Attack on ZIP's Encryption Scheme (WinZIP) says that a classic-zip encrypted file with 3 entries, and created by WinZip, can be cracked in 2 hours on a "pentium". This was based on an exploit of a weakness in the random number generator of the then-current WinZip v9.0 tool. I'm sure it would go much faster now, on current processors, but at the same time, I'm pretty sure WinZip, now at v12.0, has fixed this problem in their random number generator. Nevertheless, even without the specific-to-WinZip-v9 exploit, classic ZIP encryption remains weak.
This weak zip encryption that has been cracked is also known as "ZIP 2.0 encryption" or "PKZIP encryption".
Many modern ZIP toolkits also support AES encryption of ZIP entries. This is considered to be strong encryption, and is quite secure (** See note). WinZip, XCeed, and DotNetZip are three such tools that support reading and writing zip files with this encryption level. Among the three, DotNetZip is the only free option.
You didn't mention the library you use to programmatically produce the zip file. If you use DotNetZip, producing an AES-encrypted ZIP file in C# is as easy as this:
using (var zip = new ZipFile())
{
    zip.AddFile("MySensitiveFile.doc");
    zip.Encryption = EncryptionAlgorithm.WinZipAes128;
    zip.Password = "Very.Secret!";
    zip.Save("MyEncryptedArchive.zip");
}
** note: Tadayoshi (Yoshi) Kohno has published a paper entitled Attacking and Repairing the WinZip Encryption Scheme, describing exploits of WinZip's AES encryption to argue that WinZip's AES encryption is not secure. However, the exploits he describes rely on social engineering or previous compromises or both. For example, the primary exploit described in the paper involves an attacker intercepting the encrypted zip file, modifying it, sending the modified copy to its intended recipient, getting the recipient to attempt to decrypt it and then send the result of that decryption back to the attacker, who can then decrypt the original file. This so-called "exploit" involves numerous leaps of faith, piled on the previous compromise of intercepted communication in both directions. No one has described any structural exploits of WinZip AES on par with the exploits of ZIP classic encryption.
Use 7-Zip, which has better password security, and also tick the 'encrypt filenames' option.
Secure to what level? There are programs out there that can crack the password encryption on a zip file very quickly so if it has to withstand any sort of effort, then no.
If it's just a matter of ensuring that someone with a password can open it and to keep away casual prying eyes, then maybe.
If you want to have some halfway reasonable security, I'd look into zipping up the data and then running it through proper encryption software like gpg.
You should ask yourself a couple of questions.
Where are you storing the zip files?
Which permissions are associated to the zip file?
Is the password a strong password?
Usually, it's a good habit to store user data into a folder that is out of the webroot, not directly accessible.
Password generators are also available and they should be used.

How can I write a program that can detect by itself that it has been changed?

I need to write a small program that can detect that it has been changed. Please give me a suggestion!
Thank you.
The short answer is to create a hash or key of the program and have the program encrypt and store that key within itself. From time to time the program would make a checksum of itself and compare it against that hash/key. If there is a difference then handle it accordingly.
There are lots and lots of ways to go about this. There are lots of very smart engineers out there that know how to work around it if that is what you are trying to avoid.
The simplest way would be to use a hash function to generate a short code which is a digest of the whole program and then check this.
It would be fairly easy to debug the code and replace the hash value to subvert this.
A better way would be to generate a digital signature using your private key and with the public key in the program to check it.
This would then require changing the public key and the hash as well as understanding the program, or changing the program code itself to subvert the check.
All you can do in the case described so far is make it more difficult to subvert but it will be possible with a certain amount of effort. I'd suggest looking into cryptographic techniques and copy protection for more information to suit your specific case.
Do you mean that program 'foo' should be able to tell if some part of it was modified prior to / during run time? That's not the responsibility of the program; it's the responsibility of the security hooks in the target OS.
For instance, if the installed and trusted 'foo' has signature "xyz1234", the kernel should refuse to run a modified (or completely new) 'foo'. The same goes for 'foo' while it's currently running in memory. Look up 'Trusted Path Of Execution', aka TPE, to start.
A better question to ask would be how to sign your released version of 'foo', which depends upon your target platform.
try searching for "code signing"
The easiest way would be for the program to detect its own md5 and store that in a separate file, but this isn't totally secure. An MD5 + CRC might work slightly better.
Or as others here have suggested, a sha1, sha2 or sha3 which are much more secure than md5 currently.
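The "hash in a separate file" idea could be sketched as follows, using SHA-256 rather than MD5. The function names and the digest-file layout are assumptions for illustration, and as noted elsewhere in this thread, anyone who can modify the program can also replace the stored digest.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at path."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def is_unmodified(program_path, digest_path):
    """Compare the program's current SHA-256 with the hex digest stored in digest_path."""
    with open(digest_path, "r", encoding="ascii") as fh:
        # Assumed layout: the hex digest is the first whitespace-separated field.
        expected = fh.read().split()[0].lower()
    return sha256_of(program_path) == expected
```

A script could call is_unmodified(sys.argv[0], sys.argv[0] + ".sha256") at start-up and refuse to continue when it returns False; the ".sha256" suffix is just a naming choice for this example.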
I'd ask an external tool to do the check. This problem reminds me of the challenge to write a program that prints itself. In Bash you could do something like this:
#!/bin/bash
cat "$0"
which really asks for an external tool to do the job. It's kind of solving the problem by getting away from solving the problem...
The best option is going to be code signing, either using a tool supplied by your local friendly OS (for example, if you're targeting Windows, you probably want to take a look at Authenticode, where the operating system handles the tampering), or by rolling your own option, storing MD5 hashes and comparing.
It is important to remember that bets are off if someone injects a thread into your process (to potentially kill your ongoing checks, etc.), or if they tamper with your compiled application to bypass said checks.
An alternative way which wasn't mentioned is to use a binary packer such as UPX.
If the binary gets changed on the disk then the unpacking code is likely to fail.
This however doesn't protect you if someone changes the binary while it is in memory.

Resources