What (else) is wrong with using time as a seed for random number generation? - security

I understand that time is an insecure seed for random number generation because it effectively reduces the size of the seed space.
But say I don't care about security. For example, say I'm doing a Monte Carlo simulation for a card game. I DO, however, care about getting as close to true randomness as possible. Will time as a seed affect the randomness of my output? I would think the choice of PRNG matters more than the seed in this case.

For security purposes you obviously need a high entropy seed. And time alone cannot provide that.
For simulation purposes the quality of the seed doesn't matter much, as long as it's unique. As you noted the quality of the PRNG is more important here.
Even a PRNG in a game may need to be secure. For example, in multiplayer games a player might find out the internal state of the PRNG and use that to predict future random events, guess the opponent's cards, get better loot, and so on.
One common pitfall when using time to seed a PRNG is that the time doesn't change very often. For example, on Windows most time-related functions only change their return value every few milliseconds, so all PRNGs created within that interval will return the same sequence.
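To make that concrete, here is a minimal Python sketch of the pitfall (illustrative only): several generators seeded from a coarse clock in quick succession receive the same seed and therefore produce identical streams.

    import random
    import time

    # int(time.time()) has one-second resolution, so several generators
    # created "at the same time" will almost certainly get the same seed.
    gens = [random.Random(int(time.time())) for _ in range(3)]

    for g in gens:
        print([g.randint(1, 52) for _ in range(5)])
    # Typical output: three identical lines, because the seeds were equal.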

Just for the sake of completeness, this paper by Matsumoto et al. nicely illustrates how important the initialization scheme (i.e. the way of choosing your seed(s)) is for simulation. It turns out that a bad initialization scheme may strongly bias the results, even though the RNG algorithm itself is rather good in principle.

If you are just running a single instance of your program, then there should not be too many problems.
However, I have seen people start multiple programs at the same time, with each program seeding from the time. In that case all the programs get the same sequence of random numbers. In particular, I have seen people seed an Apache process on each call to use a random number as a session ID, only to find that different people hitting the web server at the same time got exactly the same IDs.
Hence, if you expect to run multiple simultaneous instances of the program, using time alone is a very bad idea.

Suppose your program runs very fast and asks for the system time many times in quick succession to use as a seed. You could get the same time back on each call, so it would end up generating the same random numbers. So even in a simulation, a low-entropy seed can be a problem.
Considering that it's not that hard to find other sources of entropy on your system, and that your operating system can even provide you with some almost-random numbers, you can use them to increase the entropy of your time-based seed.
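As a hedged sketch of that idea in Python (the exact mixing recipe here is illustrative, not a standard): combine the time with the process ID and some OS-provided randomness, so that simultaneous processes and rapid restarts still get distinct seeds. Note that calling random.seed() with no argument already does something similar where OS randomness is available.

    import os
    import random
    import time

    # Mix several sources into one seed. os.urandom() asks the OS for
    # entropy; the timestamp and PID are folded in for illustration.
    seed_material = (
        os.urandom(16)
        + time.time_ns().to_bytes(8, "little")
        + os.getpid().to_bytes(4, "little")
    )
    rng = random.Random(int.from_bytes(seed_material, "little"))

    print(rng.random())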

Related

Is there a chance of reading 16-bytes /dev/urandom data twice, and getting same result?

Working with Linux 3.2, I would like to implement a UID algorithm using /dev/urandom.
There may be a chance of reading 16 random bytes twice, and getting the same result. But is the chance small enough to be negligible?
/dev/urandom is supposed to be a random device that should look uniformly random, and in a uniformly random sequence you would expect to find repeated patterns. However, since there are 2^128 possible 16-byte sequences, this should happen with probability 2^-128, which is vanishingly small.
That said, /dev/urandom is not known to be cryptographically safe and there may be attacks that aren't in the open literature to force the behavior to degenerate (perhaps some government agency knows how to do this, for example). From the man pages:
A read from the /dev/urandom device will not block waiting for more
entropy. As a result, if there is not sufficient entropy in the
entropy pool, the returned values are theoretically vulnerable to a
cryptographic attack on the algorithms used by the driver. Knowledge
of how to do this is not available in the current unclassified
literature, but it is theoretically possible that such an attack may
exist. If this is a concern in your application, use /dev/random
instead.
(My emphasis) Therefore, I wouldn't rely on this if you are trying to go for cryptographic security.
In short, if you just need random values, this is probably fine. If you want to go for cryptographic security, I would not recommend doing this.
Hope this helps!
You have a 1/2^128 chance of reading the same data on any two given reads, so yes, the probability is negligible. It's roughly the same as the probability of breaking the AES-128 encryption scheme by guessing the key.
Assuming the values are perfectly random, the Birthday Paradox puts the relevant scale at about 2^64 (the square root of the number of possible values). That is, after about 2^64 UIDs have been generated, the probability of finding a colliding pair becomes greater than 50%.
For most applications that should be fine.
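For completeness, a small Python sketch (names are illustrative) that draws 16-byte UIDs from the OS random source and estimates the collision probability with the usual birthday-bound approximation p ≈ 1 - exp(-n(n-1)/2^129):

    import math
    import os

    def new_uid():
        # 16 random bytes from the OS CSPRNG (backed by /dev/urandom on Linux)
        return os.urandom(16).hex()

    def collision_probability(n, bits=128):
        # Birthday-bound approximation: p ~= 1 - exp(-n*(n-1) / 2^(bits+1))
        return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))

    print(new_uid())
    print(collision_probability(10 ** 9))   # ~1.5e-21 for a billion UIDs
    print(collision_probability(2 ** 64))   # ~0.39 around the 2^64 mark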

Quickest and easiest algorithm for comparing the frequency content of two sounds

I want to take two sounds that contain a dominant frequency and say 'this one is higher than this one'. I could do FFT, find the frequency with the greatest amplitude of each and compare them. I'm wondering if, as I have a specific task, there may be a simpler algorithm.
The sounds are quite dirty with many frequencies, but contain a clear dominant pitch. They aren't perfectly produced sine waves.
Given that the sounds are quite dirty, I would suggest starting to develop the algorithm with the output of an FFT as it'll be much simpler to diagnose any problems. Then when you're happy that it's working you can think about optimising/simplifying.
As a rule of thumb when developing this kind of numeric algorithm, I always try to operate first in the most relevant domain (in this case you're interested in frequencies, so analyse in frequency space) at the start, and once everything is behaving itself consider shortcuts/optimisations. That way you can test the latter solution against the best-performing former.
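If you do go the FFT-peak route, a minimal sketch in Python with NumPy might look like this (the function names and the Hann window choice are mine, not from the question):

    import numpy as np

    def dominant_frequency(signal, sample_rate):
        # Window the signal, take the magnitude spectrum, and return the
        # frequency of the largest non-DC peak.
        windowed = signal * np.hanning(len(signal))
        spectrum = np.abs(np.fft.rfft(windowed))
        spectrum[0] = 0.0                      # ignore the DC component
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
        return freqs[np.argmax(spectrum)]

    def which_is_higher(a, b, sample_rate):
        fa = dominant_frequency(a, sample_rate)
        fb = dominant_frequency(b, sample_rate)
        return ("first" if fa > fb else "second"), fa, fb

    # Quick check with two noisy test tones (440 Hz vs 330 Hz)
    sr = 44100
    t = np.arange(sr) / sr
    a = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(sr)
    b = np.sin(2 * np.pi * 330 * t) + 0.3 * np.random.randn(sr)
    print(which_is_higher(a, b, sr))           # expected: ('first', ~440, ~330)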
In the general case, decent pitch detection/estimation generally requires a more sophisticated algorithm than looking at FFT peaks, not a simpler algorithm.
There are a variety of pitch detection methods, ranging in sophistication from counting zero-crossings (which obviously won't work in your case) to extremely complex algorithms.
While the frequency-domain methods seem most appropriate, it's not as simple as "taking the FFT". If your data is very noisy, you may have spurious peaks that are higher than what you would consider to be the dominant frequency. One solution is to use overlapping windowed segments of your signal, do STFTs, and average the results. But this raises more questions: how big should the windows be? That depends on how far apart you expect the dominant peaks to be, how long your recordings are, and so on. (Note: FFT methods can resolve to better than one bin by taking phase information into account. In that case you would have to do something more complex than averaging all your FFT windows together.)
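The overlap-and-average idea described above is essentially Welch's method; SciPy's implementation can serve as a sketch (the 50 ms window here is an assumption you would tune to your signals):

    import numpy as np
    from scipy.signal import welch

    def dominant_frequency_welch(signal, sample_rate, window_seconds=0.05):
        # Welch's method: overlapping windowed segments, a periodogram of
        # each, averaged together. Averaging suppresses spurious noise
        # peaks at the cost of frequency resolution (~1/window_seconds Hz).
        nperseg = int(window_seconds * sample_rate)
        freqs, power = welch(signal, fs=sample_rate, nperseg=nperseg)
        power[0] = 0.0                         # ignore DC
        return freqs[np.argmax(power)]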
Another approach would be a time-domain method, such as YIN:
http://recherche.ircam.fr/equipes/pcm/cheveign/pss/2002_JASA_YIN.pdf
Wikipedia discusses some more methods:
http://en.wikipedia.org/wiki/Pitch_detection_algorithm
You can also explore some more methods in chapter 9 of this book:
http://www.amazon.com/DAFX-Digital-Udo-ouml-lzer/dp/0471490784
You can get MATLAB source code for YIN from chapter 9 of that book here:
http://www2.hsu-hh.de/ant/dafx2002/DAFX_Book_Page_2nd_edition/matlab.html

The best choice for random number generator

There are so many random number generators out there. Some standard ones are questionably slow. Some claim high quality and speed. Some claim even higher quality. Some claim to be even faster and of better quality. Some claim speed but not quality.
One fact I know is that mwc-random is being used by the Criterion benchmarking library which speaks for itself and the claims are very promising.
Since every generator has at least two qualities, its speed and the quality of the numbers it generates, I'll split the question of choosing the best generator into three categories:
The fastest
The one generating the most random numbers
The one offering the best combination of both of these qualities
So which is which and why?
I can only speak about mwc-random.
It is fast: ~15 ns per Word32 on a Phenom II. If you want to measure how fast it is on your computer, it comes with a benchmark suite. It is still possible to trade period for speed: a xorshift RNG should be faster, but it has a much shorter period, 2^32 or 2^64 instead of 2^8222.
Randomness: mwc-random uses the MWC256 algorithm (also known as MWC8222), which is not cryptographically secure but fares well in randomness tests. In particular, mwc-random passes the dieharder randomness tests.

How large is the average delay from key-presses

I am currently helping someone with a reaction-time experiment in which reaction times on the keyboard are measured. For this experiment it might be important to know how much error could be introduced by the delay between the key press and its processing in the software.
Here are some factors I have already found using Google:
The USB bus is polled at 125 Hz at minimum and 1000 Hz at maximum (depending on settings, see this link).
There might be some additional keyboard buffers in Windows that might delay the keypresses further, but I do not know about the logic behind those.
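A quick back-of-the-envelope calculation for the first factor (a minimal sketch; it assumes a key press arrives at a uniformly random point within a polling interval):

    # Delay added by USB polling alone.
    for rate_hz in (125, 500, 1000):
        interval_ms = 1000.0 / rate_hz
        print(f"{rate_hz:4d} Hz polling: worst case {interval_ms:.1f} ms, "
              f"average {interval_ms / 2:.1f} ms")
    # 125 Hz  -> worst case 8.0 ms, average 4.0 ms
    # 1000 Hz -> worst case 1.0 ms, average 0.5 ms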
Unfortunately it is not possible to control the low-level logic of the experiment. The experiment is written in E-Prime, a software package that is often used for this kind of experiment. However, the company that offers E-Prime also sells additional hardware that they advertise for precise reaction timing. Hence they seem to be aware of this effect (but do not say how large it is).
Unfortunately it is necessary to use a standard keyboard, so I need to find ways to reduce the latency.
Any latency from key presses can be attributed to the debounce routine (I usually use 30 ms to be safe) and not to the processing algorithms themselves (unless you are only evaluating the first press).
If you are running an experiment where millisecond timing is important you may want to use http://www.blackboxtoolkit.com/ to find sources of error.
Your needs also depend on the nature of your study. I've run RT experiments in E-Prime with a keyboard. Since any error should be consistent on average across participants, it is not a big problem for some designs. If you need to sync the data with something else (like eye tracking or EEG), though, or want to draw conclusions about RT where the specific magnitude is important, then E-Prime's serial response box (or another brand, though I have had compatibility issues in the past between other brands' boxes and E-Prime) is a must.

How to disable a Linux entropy pool source

How do I disable entropy sources?
Here's a little background on what I'm trying to do. I'm building a little RNG device that talks to my PC via USB. I want it to be the only source of entropy used. I'll use rngd to add my device as a source of entropy.
Quick answer is "you don't".
Don't ever remove sources of entropy. The designers of the random number generator rigged it so any new random bits just get mixed in with the current state.
Having multiple sources of entropy never weakens the random number generator's output; it only strengthens it.
The only reason I can think to remove a source of entropy is that it sucks CPU or wall-clock time that you cannot afford. I find this highly unlikely but if this is the case, then your only option is kernel hacking. As far as hacking the kernel goes, this should be fairly simple. Just comment out all the calls to the add_*_randomness() functions throughout the kernel source code (the functions themselves are found in drivers/char/random.c). You could just comment out the contents of the functions but you are trying to save time in this case and the minuscule time the extra function call takes could be too much.
One solution is to run a separate Linux instance in a virtual machine.
Additional note, too big for a comment:
Depending on its settings, rngd can dominate the kernel's entropy pool by feeding it so much data, so often, that other sources of entropy are mostly ignored or lost. Do not do that unless you ultimately trust rngd's source of random data.
http://man.he.net/man8/rngd
I suspect you might want a fast random generator.
Edit: I should have read the question better.
Anyway, frandom comes with a complete tarball for the kernel module, so you might be able to learn from it how to build your own module around your USB device. Perhaps you can even have it replace or displace /dev/urandom so arbitrary applications would work with it instead of /dev/urandom (of course, given enough permissions, you could just rename the device nodes and 'fool' most applications).
You could look at http://billauer.co.il/frandom.html, which implements that.
Isn't /dev/urandom enough?
Discussions about the necessity of a faster kernel random number generator have come and gone since 1996 (that I know of). My opinion is that /dev/frandom is as necessary as /dev/zero, which merely creates a stream of zeroes. The common opposing opinion usually says: do it in user space.
What's the difference between /dev/frandom and /dev/erandom?
In the beginning I wrote /dev/frandom. Then it turned out that one of the advantages of this suite is that it saves kernel entropy. But /dev/frandom consumes 256 bytes of kernel random data (which may, in turn, eat some entropy) every time a device file is opened, in order to seed the random generator. So I made /dev/erandom, which uses an internal random generator for seeding. The "F" in frandom stands for "fast", and "E" for "economic": /dev/erandom uses no kernel entropy at all.
How fast is it?
That depends on your computer and kernel version. Tests consistently show it to be 10-50 times faster than /dev/urandom.
Will it work on my kernel?
It most probably will, as long as it's newer than 2.6.
Is it stable?
Since releasing the initial version in fall 2003, at least 100 people have tried it (probably many more) on i686 and x86_64 systems alike. Successful test reports have arrived, and zero complaints. So yes, it's very stable. As for randomness, there haven't been any complaints either.
How is random data generated?
frandom is based on the RC4 encryption algorithm, which is considered secure, and is used by several applications, including SSL. Let's start with how RC4 works: it takes a key and generates a stream of pseudo-random bytes. The actual encryption is an XOR operation between this stream of bytes and the cleartext data stream.
Now to frandom: Every time /dev/frandom is opened, a distinct pseudo-random stream is initialized by using a 2048-bit key, which is picked by doing something equivalent to reading the key from /dev/urandom. The pseudo-random stream is what you read from /dev/frandom.
frandom is merely RC4 with a random key, just without the XOR in the end.
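To make that description concrete, here is a short Python sketch of RC4's key schedule and keystream generator, seeded from the OS in the spirit of how frandom seeds from kernel random data. This is an illustration of the idea only, not the kernel module's code.

    import os

    def rc4_keystream(key, nbytes):
        # Key-scheduling algorithm (KSA)
        s = list(range(256))
        j = 0
        for i in range(256):
            j = (j + s[i] + key[i % len(key)]) % 256
            s[i], s[j] = s[j], s[i]
        # Pseudo-random generation algorithm (PRGA): the raw keystream,
        # with no plaintext XORed in (exactly the "frandom" output).
        out = bytearray()
        i = j = 0
        for _ in range(nbytes):
            i = (i + 1) % 256
            j = (j + s[i]) % 256
            s[i], s[j] = s[j], s[i]
            out.append(s[(s[i] + s[j]) % 256])
        return bytes(out)

    key = os.urandom(256)          # a 2048-bit key, as frandom uses
    print(rc4_keystream(key, 16).hex())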
Does frandom generate good random numbers?
Due to its origins, the random numbers can't be too bad. If they were, RC4 wouldn't be worth anything.
As for testing: Data directly "copied" from /dev/frandom was tested with the "Diehard" battery of tests, developed by George Marsaglia. All tests passed, which is considered to be a good indication.
Can frandom be used to create one-time pads (cryptology)?
frandom was never intended for crypto purposes, nor was any special thought given to attacks. But there is very little room for attacking the module, and since the module is based upon RC4, we have the following fact: Using data from /dev/frandom as a one-time pad is equivalent to using the RC4 algorithm with a 2048-bit key, read from /dev/urandom.
Bottom line: It's probably OK to use frandom for crypto purposes. But don't. It wasn't the intention.
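If you want to check the 10-50x speed claim above on your own machine, here is a rough measurement sketch (it assumes the frandom module is loaded so that /dev/frandom exists; otherwise only the /dev/urandom line will work):

    import time

    def read_throughput(path, nbytes=64 * 1024 * 1024):
        # Time a bulk read from a character device and report MiB/s.
        with open(path, "rb", buffering=0) as f:
            start = time.perf_counter()
            remaining = nbytes
            while remaining > 0:
                remaining -= len(f.read(min(remaining, 1 << 20)))
            elapsed = time.perf_counter() - start
        return nbytes / (1024 * 1024) / elapsed

    print("/dev/urandom:", read_throughput("/dev/urandom"), "MiB/s")
    print("/dev/frandom:", read_throughput("/dev/frandom"), "MiB/s")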

Resources