Haskell digest libraries

I need to do some user authentication that involves storing password digests.
I chose SHA-256, but MD5 would do the trick just fine, as it is just a learning project and security is not a big deal.
My question is about the hundreds of different crypto and hashing libraries, and how to keep my sanity while choosing the right one.
I've been through Hackage: some libraries are fast but not "pure", some are "pure" but not fast, and so on with other advantages and disadvantages.
What would you guys use for SHA-256 password hashing?
For instance, I found Crypto.Hash.SHA256 and Data.Digest.Pure.SHA.
Which one is preferable, and what is the difference, if any?
Thank you.

Data.Digest.Pure.SHA, from Adam's SHA package, is written entirely in Haskell (hence the 'Pure' in the module name) but is not so fast, IIRC. Crypto.Hash.SHA256 is from Vincent's cryptohash package and is a binding to a fast C implementation.
There is nothing wrong with the C binding from cryptohash: it isn't impure and it doesn't break referential transparency. It's just that I picked a bad module name when building pureMD5, and that set a precedent.

Related

Do I need MD5 as a companion to SHA-1?

Do I need both MD5 and SHA-1 values to be sure the downloaded file is
a) Untouched by hackers. For example, when I need to download some app's .iso via torrents
and
b) Not corrupted during technical issues? For example, some unstable network connection during download.
Or will the SHA-1 value probably be enough for both checks?
Also, is SHA-1 (without MD5) enough to be sure that a file downloaded years ago and stored somewhere on my HDD hasn't degraded?
From a security perspective MD5 is utterly broken.
SHA-1 is considered suspicious and is avoided for most uses if at all possible. For new projects: don't use it at all.
SHA-2 (the family that includes SHA-256, SHA-512, etc.) is still widely used for fast hashes.
SHA-3 has been the future since 2012, and nothing is stopping you from using it already. I see little reason not to use it for new projects.
What's the problem with the older ones:
Their resistance to finding collisions is below par. A collision attack means an attacker creates two contents that have the same hash, constructed at the same time. This problem exists for MD5 and SHA-1, and it's BAD, but it requires the attacker to create both versions (and then they can switch them at any time, undetected).
Their resistance to length extension attacks is relatively weak. This is especially true for MD5, but SHA-1 and even SHA-2 suffer from it to some degree.
When is it not a problem: to check that your disk has not produced an error, any hash will do. Even a simple CRC32 will work wonders (and I'd recommend the simpler CRC check). A RAID array is another option, as it can fix errors, not just detect them.
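For the accidental-corruption case, a minimal sketch of a CRC32 check could look like the following. It is written in Python purely as an assumption (the question names no language), and `crc32_of_file` and the chunk size are illustrative names, not part of any tool mentioned here.

```python
import zlib

def crc32_of_file(path, chunk_size=1 << 20):
    """Return the CRC32 of a file, read in chunks so large files need little memory."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # zlib.crc32 accepts the running value, so the checksum can be streamed.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```

Store the value next to the file and recompute it later; a mismatch means the bytes changed. This only catches accidents, since anyone who wants to can craft a file with a chosen CRC32.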
Use both?
Well, finding a collision on one hash and having that same pair of plaintexts also produce a collision on another hash is probably more difficult. This approach has been used in the past; the original PGP did something like it. If I'm not mistaken, it calculated a number of things, one of them simply the length (which would prevent the extension attack above).
So yes, it likely adds something, but the way MD5, SHA-1 and SHA-2 work internally is quite similar, and that's the worrisome part: they are too much alike to be sure just how much it adds against a highly sophisticated attacker (think the level of the NSA and their counterparts).
So why not use one of the more modern versions of SHA-2 or, even better, SHA-3? They have no known weaknesses and have been heavily peer-reviewed. As such, for any commercial-level use they should be more than enough.
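As a rough sketch of what that looks like in practice, the Python below (language chosen as an assumption) hashes a downloaded file once and produces both a SHA-256 and a SHA3-256 digest, then compares one of them with the value the distributor published. The function names `file_digests` and `matches_published` are made up for this example.

```python
import hashlib
import hmac

def file_digests(path, algorithms=("sha256", "sha3_256"), chunk_size=1 << 20):
    """Hash a file in one pass and return {algorithm: hex digest}."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}

def matches_published(path, algorithm, published_hex):
    """Compare a file's digest with a hex value published by the distributor."""
    actual = file_digests(path, (algorithm,))[algorithm]
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

The check is only as trustworthy as the place the published digest came from: if the attacker can replace the file and the digest next to it, the comparison proves nothing.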
Refs:
https://en.wikipedia.org/wiki/Length_extension_attack
https://en.wikipedia.org/wiki/Collision_attack
https://stackoverflow.com/questions/tagged/sha-3

Javacard big numbers and modular arithmetic

I am new to the Javacard ecosystem and I was wondering what the consensus is regarding (modular) computations with big numbers in Javacard.
More specifically, I am looking for a library that supports modular exponentiation and, in general, modular arithmetic operations on big numbers.
I am aware of BigNumber and ds.ov2.bignat. However, the first one does not provide methods for modular arithmetic.
ds.ov2.bignat seems to be more relevant, but I wasn't sure whether it is common practice to use bignat or whether there is another, more popular library.
Thanks!
The consensus is kind of not to perform modular exponentiation. bignat seems to rely on RSA ops for modular arithmetic. Nowadays this should probably be replaced by DH calculations.
But in general, JC is not really the platform to create your own cryptography. Some platforms have vendor-specific extensions for users to implement their own cryptography.
Smart cards, however, rely on many protections against side-channel attacks. You'd need a very good understanding of cryptography to implement anything for use "in the field".
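To make the side-channel concern concrete, here is a minimal sketch of textbook square-and-multiply modular exponentiation. It uses plain Python integers, not Java Card code, and `modexp_square_multiply` is a name invented for the illustration.

```python
def modexp_square_multiply(base, exponent, modulus):
    """Left-to-right square-and-multiply: computes base**exponent % modulus."""
    if modulus <= 0 or exponent < 0:
        raise ValueError("modulus must be positive and exponent non-negative")
    result = 1 % modulus
    base %= modulus
    for bit in bin(exponent)[2:]:  # most significant bit first
        result = (result * result) % modulus
        if bit == "1":
            # This multiply only runs for the 1-bits of the exponent. On a card,
            # that data-dependent work can show up in timing and power traces.
            result = (result * base) % modulus
    return result
```

The arithmetic is correct, but the amount of work depends on the bits of the exponent, which is usually the secret. That is the kind of leak the card's built-in crypto is designed to hide and a hand-written big-number routine is not.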
Responding with an update, as the landscape has changed since the last response:
Indeed, at the time there was no library and the previous responses were correct.
This lack of BigNumbers (and other basic functionality) was very annoying, so we actually built the library ourselves.
It provides lots of things that I needed but couldn't find, including BigNumbers. For people who come across this question in the future, you can download it here and see if it helps you: opencryptojc.org

Building A Deduplication Application For OS X, What/How Should I Use As The Hash For Files

I am about to embark on a programming journey, which undoubtedly will end in failure and/or throwing my mouse through my Mac, but it's an interesting problem.
I want to build an app which scans starting at some base directory and recursively loops down through each file, and if it finds an exact duplicate file, it deletes it and makes a symbolic link in its place. Basically poor man's deduplication. This actually solves a real problem for me, since I have a bunch of duplicate files on my Mac, and I need to free up disk space.
From what I have read, this is the strategy:
Loop through recursively, and generate a hash for each file. The hash needs to be as close to unique as possible. This is the first problem. What hash should I use? How do I run the entire binary contents of each file through this magical hash?
Store each file's hash and full path in a key/value store. I'm thinking redis is an excellent fit because of its speed.
Iterate through the key/value store, find duplicate hashes, delete the duplicate file, create the symbolic link, and flag the row in the key/value store as a copy.
My questions therefore are:
What hashing algorithm should I use for each file? How is this done?
I'm thinking about using node.js because node generally is fast at i/o types of things. The problem is that node sucks at CPU intensive stuff, so the hashing will probably be the bottleneck.
What other gotchas am I missing here?
What hashing algorithm should I use for each file? How is this done?
Use SHA1. Git uses SHA1 to generate unique hashes for files, and an accidental collision is almost impossible. (A deliberately constructed SHA1 collision was published in 2017, so prefer SHA-256 if the files may come from an untrusted source.)
I'm thinking about using node.js because node generally is fast at i/o types of things. The problem is that node sucks at CPU intensive stuff, so the hashing will probably be the bottleneck.
Your application will have 2 kinds of operation:
Reading file (IO bound).
Calculating hash (CPU bound).
My suggestion is: don't calculate the hash in a scripting language (Ruby or JavaScript) unless it has a native hashing library. You can just invoke another executable such as sha1sum. It's written in C and should be blazing fast.
I don't think you need NodeJS. NodeJS is fast at event-driven IO, but it cannot boost your I/O speed. I don't think you need to implement event-driven IO here.
What other gotchas am I missing here?
My suggestion: just implement it in a language you are familiar with. Don't over-engineer too early. Optimize only when you really hit a performance issue.
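For what it's worth, the strategy from the question fits in a few lines. Below is a minimal sketch in Python (an assumption, pick whatever you know best) where an in-memory dict stands in for the key/value store. The names `hash_file`, `find_duplicates` and `replace_with_symlink` are invented for this example.

```python
import hashlib
import os
from collections import defaultdict

def hash_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-1 and return the hex digest."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(base_dir):
    """Return {digest: [paths]} for every digest shared by two or more files."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(base_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so links created by an earlier run are not hashed again.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_hash[hash_file(path)].append(path)
    return {d: sorted(p) for d, p in by_hash.items() if len(p) > 1}

def replace_with_symlink(original, duplicate):
    """Delete `duplicate` and leave a symbolic link to `original` in its place."""
    os.remove(duplicate)
    os.symlink(os.path.abspath(original), duplicate)
```

Run `find_duplicates` first and look at the report before calling `replace_with_symlink` on anything. Comparing the two files byte for byte before deleting one is a cheap extra safety net.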
A little late but I used miaout's advice and came up with this...
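var execFile = require('child_process').execFile;
// `file`, `fileInfo` and `next` come from the surrounding directory walk.
// execFile passes the path as an argument, so odd characters in file names
// cannot break out of shell quoting.
execFile('openssl', ['sha1', file], { maxBuffer: (200*10240) }, function(p_err, p_stdout, p_stderr) {
    // openssl prints "SHA1(path)= <hex digest>"; capture the digest at the end.
    var myregexp = /=\s?(\w+)\s*$/;
    var match = myregexp.exec(p_stdout);
    fileInfo.hash = "Fake hash"; // placeholder kept when hashing fails
    if (!p_err && match != null) {
        fileInfo.hash = match[1];
    }
    next();
});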
You could use sha1sum but like every other great piece of software it will require something like homebrew to install. Of course you could also compile it yourself if you have the environment for it.

new encryption algorithm for ssh

I have been asked to add a new algorithm to ssh so that data is encrypted with the new algorithm. Any idea how to add a new algorithm to ssh?
thanks
It is possible to add a new algorithm to SSH communication, and this is done from time to time (e.g. AES was added later). The catch is that you need to modify both the client and the server so that they both support the algorithm; otherwise it makes no sense.
I assume that you were asked to add some custom algorithm, either home-made or non-standard. So the first thing I'd like to do is warn you that the added algorithm can be weak. You need to perform at least a basic search for information about this algorithm: if it's broken, your work will be completely useless and even dangerous.
As for the software modification itself: it's a rare job, so most likely you won't find anybody with this experience. However, the code that handles the various algorithms is typical, and adding a new algorithm is trivial: you add one source file with the algorithm implementation and then modify a bunch of places by adding one more case to a switch statement.
In my career I've worked on a private fork of ssh that was sold as closed-source commercial software. Even they in all their crazy stupidity (private fork? who in their right mind uses non-Open Source encryption software? I thought our customers were completely off their rockers.) didn't add a new encryption algorithm.
It can be done though. Adding the hooks to the ssh protocol to support it isn't hard. The protocol is designed to be extensible in that way. At the beginning the client and server exchange lists of encryption algorithms they're willing to use.
This means, of course, that only a modified client and a modified server will talk to each other using the new algorithm.
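The negotiation itself is simple. As I understand the SSH transport spec (RFC 4253), the cipher chosen is the first one on the client's list that also appears on the server's list. Here is a minimal Python sketch of that rule; the cipher name "mycipher@example.com" is made up, and `negotiate_cipher` is not a function from any SSH implementation.

```python
def negotiate_cipher(client_list, server_list):
    """Pick the first cipher in the client's list that the server also offers.

    Returns None when the lists share nothing, in which case the connection
    cannot proceed.
    """
    server_set = set(server_list)
    for name in client_list:
        if name in server_set:
            return name
    return None

# "mycipher@example.com" is a made-up name for the custom algorithm.
modified = ["mycipher@example.com", "aes256-ctr"]
stock = ["aes256-ctr", "aes128-ctr"]

print(negotiate_cipher(modified, modified))  # both sides patched
print(negotiate_cipher(modified, stock))     # stock server: falls back
```

A patched client talking to a stock server quietly ends up on a standard cipher, and a client that offers only the custom cipher gets no connection at all.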
The real difficulty is OpenSSL. ssh does not use TLS/SSL, but it does use the OpenSSL encryption library. You would have to add the new algorithm to that library, and that library is a terrible beast.
Though, I suppose you could add the algorithm without adding it to OpenSSL. That might be tricky, since I think OpenSSH may rely heavily on the way the OpenSSL APIs work: you pass around a constant representing which algorithm you want to use, and a standard set of encryption and decryption calls uses that constant to decide on the algorithm.
Again though, if I recall correctly, OpenSSL has an API specifically for adding new algorithms to its suite. So that may not be so hard. You will have to make sure this happens when the OpenSSL library is being initialized.
Anyway, this is a fairly vague answer, but maybe it will point you in the right direction. You should make whoever is doing this pay enormous sums of money. Stupidity that requires this level of knowledge to pull off should never come cheaply.

Pitfalls of cryptographic code

I'm modifying existing security code. The specifications are pretty clear, there is example code, but I'm no cryptographic expert. In fact, the example code has a disclaimer saying, in effect, "Don't use this code verbatim."
While auditing the code I'm to modify (which is supposedly feature complete) I ran across this little gem which is used in generating the challenge:
static uint16 randomSeed;
...
uint16 GetRandomValue(void)
{
    return randomSeed++; /* This is not a good example of very random generation :o) */
}
Of course, the first thing I immediately did was pass it around the office so we could all get a laugh.
The programmer who produced this code knew it wasn't a good algorithm (as indicated by the comment), but I don't think they understood the security implications. They didn't even bother to call it in the main loop so it would at least turn into a free running counter - still not ideal, but worlds beyond this.
However, I know that the code I produce is going to similarly cause a real security guru to chuckle or quake.
What are the most common security problems, specific to cryptography, that I need to understand?
What are some good resources that will give me suitable knowledge about what I should know beyond common mistakes?
-Adam
Don't try to roll your own - use a standard library if at all possible. Subtle changes to security code can have a huge impact that isn't easy to spot but can open security holes. For example, two modified lines in one library opened a hole that wasn't readily apparent for quite some time.
Applied Cryptography is an excellent book to help you understand crypto and code. It goes over a lot of fundamentals, like how block ciphers work, and why choosing a poor cipher mode will make your code useless even if you're using a perfectly implemented version of AES.
Some things to watch out for:
Poor Sources of Randomness
Trying to design your own algorithm or protocol - don't do it, ever.
Not getting it code reviewed. Preferably by publishing it online.
Not using a well established library and trying to write it yourself.
Crypto as a panacea - encrypting data does not magically make it safe
Key Management. These days it's often easier to steal the key with a side-channel attack than to attack the crypto.
Your question shows one of the more common ones: poor sources of randomness. It doesn't matter if you use a 256-bit key if the bits aren't random enough.
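To show the difference, here is a small Python sketch (language chosen as an assumption; the code in the question is C). `CounterChallenge` mimics the counter from the question and `random_challenge` draws from the operating system's CSPRNG through the standard `secrets` module. Both names are invented for this example.

```python
import secrets

class CounterChallenge:
    """Mimics the counter-style generator from the question (do not use)."""

    def __init__(self, seed=0):
        self._value = seed

    def next(self):
        value = self._value
        self._value = (self._value + 1) & 0xFFFF  # wraps like a uint16
        return value

def random_challenge(num_bytes=16):
    """Return a challenge drawn from the operating system's CSPRNG."""
    return secrets.token_bytes(num_bytes)
```

Anyone who sees one value from `CounterChallenge` knows every later one, which is exactly what an attacker needs to prepare a response ahead of time.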
Number 2 is probably assuming that you can design a system better than the experts. This is an area where a quality implementation of a standard is almost certainly going to be better than innovation. Remember, it took 3 major versions before SSL was really secure. We think.
IMHO, there are four levels of attacks you should be aware of:
social engineering attacks. You should train your users not to do stupid things and write your software such that it is difficult for users to do stupid things. I don't know of any good reference about this stuff.
don't execute arbitrary code (buffer overflows, xss exploits, sql injection are all grouped here). The minimal thing to do in order to learn about this is to read Writing Secure Code from someone at MS and watch the How to Break Web Software google tech talk. This should also teach you a bit about defense in depth.
logical attacks. If your code is manipulating plain-text, certificates, signatures, cipher-texts, public keys or any other cryptographic objects, you should be aware that handling them in bad ways can lead to bad things. Minimal things you should be aware of include offline and online dictionary attacks, replay attacks, man-in-the-middle attacks. The starting point to learning about this and generally a very good reference for you is http://www.soe.ucsc.edu/~abadi/Papers/gep-ieee.ps
cryptographic attacks. Cryptographic vulnerabilities include:
stuff you can avoid: bad random number generation, usage of a broken hash function, broken implementation of security primitive (e.g. engineer forgets a -1 somewhere in the code, which renders the encryption function reversible)
stuff you cannot avoid except by being as up-to-date as possible: new attack against a hash function or an encryption function (see e.g. recent MD5 talk), new attack technique (see e.g. recent attacks against protocols that send encrypted voice over the network)
A good reference in general should be Applied Cryptography.
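Since the question is about generating a challenge, here is a minimal sketch of a challenge-response exchange that resists the replay attack mentioned above. It is Python using only the standard library, it assumes both sides already share a secret key, and the `Verifier` class and `respond` function are invented for the example rather than taken from any specification.

```python
import hashlib
import hmac
import secrets

class Verifier:
    """Issues one-time challenges and checks HMAC-SHA256 responses."""

    def __init__(self, shared_key):
        self._key = shared_key
        self._outstanding = set()

    def new_challenge(self):
        challenge = secrets.token_bytes(16)
        self._outstanding.add(challenge)
        return challenge

    def verify(self, challenge, response):
        # Each challenge is accepted once, so a recorded response cannot be replayed.
        if challenge not in self._outstanding:
            return False
        self._outstanding.discard(challenge)
        expected = hmac.new(self._key, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)

def respond(shared_key, challenge):
    """Client side: prove knowledge of the key without sending it."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()
```

The replay protection rests entirely on the challenge being unpredictable and never reused, which is why a counter-based challenge generator undermines the whole scheme.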
Also, it is very worrying to me that stuff that goes on a mobile device, which is probably locked and difficult to update, is written by someone who is asking about security on stackoverflow. I believe yours would be one of the few cases where you need an external (good) consultant to help you get the details right. Even if you hire a security consultant, which I recommend you do, please also read the above (minimalistic) references.
What are the most common security problems, specific to cryptography, that I need to understand?
Easy - you(1) are not smart enough to come up with your own algorithm.
(1) And by you, I mean you, me and everyone else reading this site...except for possibly Alan Kay and Jon Skeet.
I'm not a crypto guy either, but S-boxes can be troublesome when messed with (and they do make a difference). You also need a real source of entropy, not just a PRNG (no matter how random it looks). A PRNG seeded from something predictable is useless here. Next, you should ensure the entropy source isn't deterministic and that it can't be tampered with.
My humble advice is: stick with known crypto algorithms, unless you're an expert and understand the risks. You could be better off using some tested, publicly-available open source / public domain code.

Resources