How to produce a "securely random" string token in Haskell? - security

I want to produce string tokens to implement the password reset functionality of a web app. They can be v4 UUIDs but it is not required.
I want each token to be "securely random" in the sense of this SO question. The system's entropy pool should be sampled for each generated token.
I found the uuid package, that generates v4 UUIDs. The docs mention that
we use the System.Random StdGen as our random source.
but it is unclear to me if this is enough.
Is there other library that I should use instead?

The requirement that the system entropy pool should be sampled on each generated token sounds suspicious. Entropy is usually a scarce resource.
Let's agree that /dev/random is definitely out. Other processes may need it, it could lead to potential denial of service etc. First and foremost though, /dev/random blocks on read when it deems the entropy pool insufficient so it's a definite no. /dev/urandom on the other hand should be perfectly fine. Some say that it has issues in some odd cases (for example right after boot on diskless machines) but let's no go there.
Note that on most systems /dev/[u]random both use the same algorithm. The difference is that /dev/urandom never blocks but also not necessarily uses any new entropy on each read. So if using /dev/urandom counts as "sampling the system entropy pool" then whatever solution you have that properly uses it is probably fine.
However, it comes at a cost of impurity. Reading from /dev/urandom forces us into IO. Being haskellers we have to at least consider alternatives and we are in luck if only we drop the entropy-sample requirement which seems weird to begin with.
Instead we can use a crypto-secure deterministic RNG and only seed it from /dev/random. ChaChaDRG from cryptonite which is an implementation of DJB's ChaCha would be a good choice. Since it is deterministic we only need IO to get the initial seed. Everything after that is pure.
Cryptonite offers getRandomBytes so you can tweak the length of the token to suit your needs.

Related

What does secrets module do to make perfect random sequences in Python

Now I have a decent knowledge of math, and I know it's possible to create pseudo-random sequences using a specific mathematics algorithm. I also know that in Python, there is a secrets module that apparently can produce random numbers. I tried tweaking around with it a little, but I still don't understand how it's supposed to work. Lets say this piece of code:
import secrets
secret_num = secrets.choice([1, 2, 3])
print(secret_num)
It's supposed to output a perfectly random number. But how is that possible using computers?
The documentation for the secrets module says it produces "cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets". The documentation doesn't specify how it does so exactly.
However, a usual requirement for "cryptographically strong random numbers" is that they should be hard to guess by outside attackers. To this end, the secrets module may rely on the random number generator provided by the operating system (as secrets.SystemRandom does, for example), and how that generator works depends in turn on the operating system. But in general, a random number generator designed for information security ultimately relies on gathering hard-to-guess bits from nondeterministic sources, as further explained in the following question:
How to get truly random data, not random data fed into a PRNG seed like CSRNG's do?

Adler-32 in Multilevel Security Environment Safe Practices

I'm implementing a multilevel security environment on several several web servers running Debian. I've done quite a bit of reading on fast hash checking algorithms to compliment the other security components.
It seems Adler-32 is quite fast and compact (which I'm quite fond of), although I understand it can be 'easily' forged. This aspect of it makes me a bit nervous, so is there some way to safeguard against it being forged somehow?
No. CRCs can also be easily forged. If you are worried about forging (and make sure that you understand why you are worried about forging), then you need to use a cryptographically secure hash. E.g. SHA-2.
No. You cannot condition Adler-32 to be secure, for a simple reason which will affect even a perfect non-reversible hash function you might hope to find.
No 32-bit checksum or digest can be meaningfully resistant to attack because there are only four billion possible outcomes in a modified message. This means that to brute-force a collision takes a comparatively trivial amount of time.
To put this in real-world terms, hashcash is giving me an estimate of 874 seconds to brute-force 32 bits of of a SHA1 digest. Any checksum which you choose for speed is going to get proportionally easier.
That's even before you start to consider potential weaknesses in the algorithm which might yield a more efficient approach than brute force, and the use of GPU computing to accelerate the attack.

Is /dev/random considered truly random?

For instance, could it be used to generate a one-time pad key?
Also, what are its sources and how could it be used to generate a random number between x and y?
Strictly speaking, /dev/random is not really completely random. /dev/random feeds on hardware sources which are assumed to be unpredictible in some way; then it mixes such data using functions (hash functions, mostly) which are also assumed to be one-way. So the "true randomness" of /dev/random is thus relative to the inherent security of the mixing functions, security which is no more guaranteed than that of any other cryptographic primitive, in particular the PRNG hidden in /dev/urandom.
The difference between /dev/random and /dev/urandom is that the former will try to maintain an estimate (which means "a wild guess") of how much entropy it has gathered, and will refuse to output more bits than that. On the other hand, /dev/urandom will happily produce megabytes of data from the entropy it has.
The security difference between the two approaches is meaningless unless you assume that "classical" cryptographic algorithms can be broken, and you use one of the very few information-theoretic algorithms (e.g. OTP or Shamir's secret sharing); and, even then, /dev/random may be considered as more secure than /dev/urandom only if the mixing functions are still considered to be one-way, which is not compatible with the idea that a classical cryptographic algorithm can be broken. So, in practice and even in theory, no difference whatsoever. You can use the output of /dev/urandom for an OTP and it will not be broken because of any structure internal to /dev/urandom -- actual management of the obtained stream will be the weak point (especially long-time storage). On the other hand, /dev/random has very real practical issues, namely that it can block at untimely instants. It is really irksome when an automated OS install blocks (for hours !) because SSH server key generation insists on using /dev/random and needlessly stalls for entropy.
There are many applications which read /dev/random as a kind of ritual, as if it was "better" than /dev/urandom, probably on a karmic level. This is plain wrong, especially when the alea is to be used with classical cryptographic algorithms (e.g. to generate a SSH server public key). Do not do that. Instead, use /dev/urandom and you will live longer and happier. Even for one-time pad.
(Just for completeness, there is a quirk with /dev/urandom as implemented on Linux: it will never block, even if it has not gathered any entropy at all since previous boot. Distributions avoid this problem by creating a "random seed" at installation time, with /dev/random, and using that seed at each boot to initialize the PRNG used by /dev/urandom; a new random seed is regenerated immediately, for next boot. This ensures that /dev/urandom always works over a sufficiently big internal seed. The FreeBSD implementation of /dev/urandom will block until a given entropy threshold is reached, which is safer.)
The only thing in this universe that can be considered truly is one based on quantum effects. Common example is radioactive decay. For certain atoms you can be sure only about half-life, but you can't be sure which nucleus will break up next.
About /dev/random - it depends on implementation. In Linux it uses as entropy sources:
The Linux kernel generates entropy
from keyboard timings, mouse
movements, and IDE timings and makes
the random character data available to
other operating system processes
through the special files /dev/random
and /dev/urandom.
Wiki
It means that it is better than algorithmic random generators, but it is not perfect as well. The entropy may not be distributed randomly and can be biased.
This was philosophy. Practice is that on Linux /dev/random is random enough for vast majority of tasks.
There are implementations of random generators that have more entropy sources, including noise on audio inputs, CPU temperature sensors etc. Anyway they are not true.
There is interesting site where you can get Genuine random numbers, generated by radioactive decay.
/dev/random will block if there's not enough random data in the entropy pool whereas /dev/urandom will not. Instead, /dev/urandom will fall back to a PRNG (kernel docs). From the same docs:
The random number generator [entropy pool] gathers environmental noise from device drivers and other sources into an entropy pool.
So /dev/random is not algorithmic, like a PRNG, but it may not be "truly random" either. Mouse movements and keystroke timings tend to follow patterns and can be used for exploits but you'll have to weigh the risk against your use case.
To get a random number between x and y using /dev/random, assuming you're happy with a 32-bit integer, you could have a look at the way the Java java.util.Random class does it (nextInt()), substituting in appropriate code to read from /dev/random for the nextBytes() method.

Why do you use a random number generator/extractor?

I am dealing with some computer security issues at the school at the moment and I am interested in general programming public preferences, customs, ideas etc. If you have to use a random number generator or extractor, which one do you choose? Why do you choose it? The mathematical properties, already implemented as a package or for what reason? Do you write your own or use some package?
If computational time is no object, then you can't go wrong with Blum Blum Shub (http://en.wikipedia.org/wiki/Blum_blum_shub). Informally speaking, it's at least as secure (hard to predict) as integer factorization.
dev/random, or equivalent on your platform.
It returns bits from an entropy pool fed by device drivers. No need to worry about mathematical properties.
If you're after a cryptographically secure PRNG, then repeated application of a secure hash to a large seed array is generally the way to go. Don't invent your own algorithm, though, go for a version of Fortuna or something else reasonably well reviewed.
The keys for encryption of phone calls between presidents of the USA and USSR were said to be generated from cosmic rays. We checked it in the physics lab at out univercity -- their energies yield true Gaussian distribution. ;-) So for the best encryption you should use these, because such random sequence can not be replayed. Unless, of course, your adversary covertly builds a particle accelerator near your random number generator.
Ah... about computers... Well, acquire a stream that comes from something physical, not computed. /dev/random is an easiest solution, but your hand-made Geiger-counter attached to USB would give the best randomness ever.
For a little school project, I'd use whatever the OS provides for random number generation.
For a serious security application (eg: COMSEC-level encryption), I use a hardware random number generator. Pure algorithms with no hardware access by definition don't produce random numbers.
HotBits.

Self validating binaries?

My question is pretty straightforward: You are an executable file that outputs "Access granted" or "Access denied" and evil persons try to understand your algorithm or patch your innards in order to make you say "Access granted" all the time.
After this introduction, you might be heavily wondering what I am doing. Is he going to crack Diablo3 once it is out? I can pacify your worries, I am not one of those crackers. My goal are crackmes.
Crackmes can be found on - for example - www.crackmes.de. A Crackme is a little executable that (most of the time) contains a little algorithm to verify a serial and output "Access granted" or "Access denied" depending on the serial. The goal is to make this executable output "Access granted" all the time. The methods you are allowed to use might be restricted by the author - no patching, no disassembling - or involve anything you can do with a binary, objdump and a hex editor. Cracking crackmes is one part of the fun, definately, however, as a programmer, I am wondering how you can create crackmes that are difficult.
Basically, I think the crackme consists of two major parts: a certain serial verification and the surrounding code.
Making the serial verification hard to track just using assembly is very possible, for example, I have the idea to take the serial as an input for a simulated microprocessor that must end up in a certain state in order to get the serial accepted. On the other hand, one might grow cheap and learn more about cryptographically strong ways to secure this part. Thus, making this hard enough to make the attacker try to patch the executable should not be tha
t hard.
However, the more difficult part is securing the binary. Let us assume a perfectly secure serial verification that cannot be reversed somehow (of course I know it can be reversed, in doubt, you rip parts out of the binary you try to crack and throw random serials at it until it accepts). How can we prevent an attacker from just overriding jumps in the binary in order to make our binary accept anything?
I have been searching on this topic a bit, but most results on binary security, self verifying binaries and such things end up in articles that try to prevent attacks on an operating system using compromised binaries. by signing certain binaries and validate those signatures with the kernel.
My thoughts currently consist of:
checking explicit locations in the binary to be jumps.
checksumming parts of the binary and compare checksums computed at runtime with those.
have positive and negative runtime-checks for your functions in the code. With side-effects on the serial verification. :)
Are you able to think of more ways to annoy a possible attacker longer? (of course, you cannot keep him away forever, somewhen, all checks will be broken, unless you managed to break a checksum-generator by being able to embed the correct checksum for a program in the program itself, hehe)
You're getting into "Anti-reversing techniques". And it's an art basically. Worse is that even if you stomp newbies, there are "anti-anti reversing plugins" for olly and IDA Pro that they can download and bypass much of your countermeasures.
Counter measures include debugger detection by trap Debugger APIs, or detecting 'single stepping'. You can insert code that after detecting a debugger breakin, continues to function, but starts acting up at random times much later in the program. It's really a cat and mouse game and the crackers have a significant upper hand.
Check out...
http://www.openrce.org/reference_library/anti_reversing - Some of what's out there.
http://www.amazon.com/Reversing-Secrets-Engineering-Eldad-Eilam/dp/0764574817/ - This book has a really good anti-reversing info and steps through the techniques. Great place to start if you're getting int reversing in general.
I believe these things are generally more trouble than they're worth.
You spend a lot of effort writing code to protect your binary. The bad guys spend less effort cracking it (they're generally more experienced than you) and then release the crack so everyone can bypass your protection. The only people you'll annoy are those honest ones who are inconvenienced by your protection.
Just view piracy as a cost of business - the incremental cost of pirated software is zero if you ensure all support is done only for paying customers.
There's TPM technology: tpm on wikipedia
It allows you to store the cryptographic check sums of a binary on special chip, which could act as one-way verification.
Note: TPM has sort of a bad rap because it could be used for DRM. But to experts in the field, that's sort of unfair, and there's even an open-TPM group allowing linux users control exactly how their TPM chip is used.
One of the strongest solutions to this problem is Trusted Computing. Basically you would encrypt the application and transmit the decryption key to a special chip (the Trusted Platform Module), The chip would only decrypt the application once it has verified that the computer is in a "trusted" state: no memory viewers/editors, no debuggers etc. Basically, you would need special hardware to just be able to view the decrypted program code.
So, you want to write a program that accepts a key at the beginning and stores it in memory, subsequently retrieving it from disc. If it's the correct key, the software works. If it's the wrong key, the software crashes. The goal is that it's hard for pirates to generate a working key, and it's hard to patch the program to work with an unlicensed key.
This can actually be achieved without special hardware. Consider our genetic code. It works based on the physics of this universe. We try to hack it, create drugs, etc., and we fail miserably, usually creating tons of undesirable side-effects, because we haven't yet fully reverse engineered the complex "world" in which the genetic "code" evolved to operate. Basically, if you're running everything on an common processor (a common "world"), which everyone has access to, then it's virtually impossible to write such a secure code, as demonstrated by current software being so easily cracked.
To achieve security in software, you essentially would have to write your own sufficiently complex platform, which others would have to completely and thoroughly reverse engineer in order to modify the behavior of your code without unpredictable side effects. Once your platform is reverse engineered, however, you'd be back to square one.
The catch is, your platform is probably going to run on common hardware, which makes your platform easier to reverse engineer, which in turn makes your code a bit easier to reverse engineer. Of course, that may just mean the bar is raised a bit for the level of complexity required of your platform to be sufficiently difficult to reverse engineer.
What would a sufficiently complex software platform look like? For example, perhaps after every 6 addition operations, the 7th addition returns the result multiplied by PI divided by the square root of the log of the modulus 5 of the difference of the total number of subtract and multiply operations performed since system initialization. The platform would have to keep track of those numbers independently, as would the code itself, in order to decode correct results. So, your code would be written based on knowledge of the complex underlying behavior of a platform you engineered. Yes, it would eat processor cycles, but someone would have to reverse engineer that little surprise behavior and re-engineer it into any new code to have it behave properly. Furthermore, your own code would be difficult to change once written, because it would collapse into irreducible complexity, with each line depending on everything that happened prior. Of course, there would be much more complexity in a sufficiently secure platform, but the point is that someone would have reverse engineer your platform before they could reverse engineer and modify your code, without debilitating side-effects.
Great article on copy protection and protecting the protection Keeping the Pirates at Bay:
Implementing Crack Protection for Spyro: Year of the Dragon
The most interesting idea mentioned in there that hasn't yet been mentioned is cascading failures - you have checksums that modify a single byte that causes another checksum to fail. Eventually one of the checksums causes the system to crash or do something strange. This makes attempts to pirate your program seem unstable and makes the cause occur a long way from the crash.

Resources