How to disable a Linux entropy pool source - linux

How do I disable entropy sources?
Here's a little background on what I'm trying to do. I'm building a little RNG device that talks to my PC via USB. I want it to be the only source of entropy used. I'll use rngd to add my device as a source of entropy.

Quick answer is "you don't".
Don't ever remove sources of entropy. The designers of the random number generator arranged it so that any new random bits simply get mixed into the current state.
Having multiple sources of entropy never weakens the random number generator's output; it only strengthens it.
The only reason I can think of to remove a source of entropy is that it costs CPU or wall-clock time you cannot afford. I find this highly unlikely, but if it is the case, then your only option is kernel hacking. As far as kernel hacking goes, this should be fairly simple: just comment out the calls to the add_*_randomness() functions throughout the kernel source code (the functions themselves live in drivers/char/random.c). You could instead comment out the bodies of those functions, but since the goal here is to save time, even the minuscule overhead of the extra function calls might be too much.

One solution is to run a separate Linux instance in a virtual machine.

Additional note, too big for a comment:
Depending on its settings, rngd can dominate the kernel's entropy pool by feeding it so much data, so often, that other sources of entropy are mostly ignored or lost. Do not do that unless you completely trust rngd's source of random data.
http://man.he.net/man8/rngd
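For context, what rngd does under the hood is read bytes from your device and hand them to the kernel with the RNDADDENTROPY ioctl, which both mixes the bytes into the pool and credits entropy. Here is a minimal sketch of that path; the device path /dev/my_trng is a placeholder for your USB device's node, it must run as root, and error handling is trimmed:

```
/* Minimal sketch of what rngd does under the hood: read bytes from a
 * hardware RNG and credit them to the kernel pool via RNDADDENTROPY.
 * /dev/my_trng is a placeholder for your USB device's character node.
 * Requires root (CAP_SYS_ADMIN); error handling is kept to the basics. */
#include <fcntl.h>
#include <linux/random.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    unsigned char buf[64];
    int trng = open("/dev/my_trng", O_RDONLY);    /* your RNG device */
    int pool = open("/dev/random", O_WRONLY);     /* kernel entropy pool */
    if (trng < 0 || pool < 0) { perror("open"); return 1; }

    ssize_t n = read(trng, buf, sizeof buf);      /* raw entropy bytes */
    if (n <= 0) { perror("read"); return 1; }

    struct rand_pool_info *info = malloc(sizeof *info + n);
    info->entropy_count = n * 8;                  /* claimed entropy, in bits */
    info->buf_size = n;
    memcpy(info->buf, buf, n);

    if (ioctl(pool, RNDADDENTROPY, info) < 0)     /* mix in and credit entropy */
        perror("RNDADDENTROPY");

    free(info);
    close(trng);
    close(pool);
    return 0;
}
```

Only claim as many bits of entropy as you actually trust your device to deliver; the kernel takes entropy_count at face value.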

I suspect you might want a fast random generator. You could look at http://billauer.co.il/frandom.html, which implements exactly that.
Edit: I should have read the question better. Anyway, frandom comes with a complete tarball for the kernel module, so you might be able to learn from it how to build your own module around your USB device. Perhaps you could even have it replace /dev/urandom so that arbitrary applications would work with it instead of /dev/urandom (of course, given enough permissions, you could just rename the device nodes and 'fool' most applications).
Isn't /dev/urandom enough?
Discussions about the necessity of a faster kernel random number generator have come and gone since 1996 (that I know of). My opinion is that /dev/frandom is as necessary as /dev/zero, which merely creates a stream of zeroes. The common opposing opinion usually says: do it in user space.
What's the difference between /dev/frandom and /dev/erandom?
In the beginning I wrote /dev/frandom. Then it turned out that one of the advantages of this suite is that it saves kernel entropy. But /dev/frandom consumes 256 bytes of kernel random data (which may, in turn, eat some entropy) every time a device file is opened, in order to seed the random generator. So I made /dev/erandom, which uses an internal random generator for seeding. The "F" in frandom stands for "fast", and "E" for "economic": /dev/erandom uses no kernel entropy at all.
How fast is it?
Depends on your computer and kernel version. Tests consistently show it to be 10-50 times faster than /dev/urandom.
Will it work on my kernel?
It most probably will, if it's > 2.6
Is it stable?
Since releasing the initial version in fall 2003, at least 100 people have tried it (probably many more) on i686 and x86_64 systems alike. Successful test reports have arrived, and zero complaints. So yes, it's very stable. As for randomness, there haven't been any complaints either.
How is random data generated?
frandom is based on the RC4 encryption algorithm, which is considered secure, and is used by several applications, including SSL. Let's start with how RC4 works: It takes a key, and generates a stream of pseudo-random bytes. The actual encryption is a XOR operation between this stream of bytes and the cleartext data stream.
Now to frandom: Every time /dev/frandom is opened, a distinct pseudo-random stream is initialized by using a 2048-bit key, which is picked by doing something equivalent to reading the key from /dev/urandom. The pseudo-random stream is what you read from /dev/frandom.
frandom is merely RC4 with a random key, just without the XOR in the end.
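To make that description concrete, here is a small user-space sketch of RC4 keystream generation, the core that frandom builds on. This is not frandom's actual kernel code, and the hard-coded key is a stand-in for the random key frandom reads from the kernel pool:

```
/* User-space sketch of RC4 keystream generation (KSA + PRGA).
 * In frandom the key would be random bytes from the kernel pool;
 * here a fixed placeholder key is used so the sketch is self-contained. */
#include <stdio.h>

static unsigned char S[256];
static int si, sj;                                /* PRGA state indices */

static void rc4_init(const unsigned char *key, int keylen)
{
    for (int k = 0; k < 256; k++)
        S[k] = (unsigned char)k;
    for (int k = 0, j = 0; k < 256; k++) {        /* key-scheduling (KSA) */
        j = (j + S[k] + key[k % keylen]) & 0xff;
        unsigned char t = S[k]; S[k] = S[j]; S[j] = t;
    }
    si = sj = 0;
}

static unsigned char rc4_byte(void)               /* one keystream byte (PRGA) */
{
    si = (si + 1) & 0xff;
    sj = (sj + S[si]) & 0xff;
    unsigned char t = S[si]; S[si] = S[sj]; S[sj] = t;
    return S[(S[si] + S[sj]) & 0xff];
}

int main(void)
{
    unsigned char key[16] = "placeholder key";    /* stand-in for a random key */
    rc4_init(key, sizeof key);
    for (int n = 0; n < 16; n++)
        printf("%02x", rc4_byte());               /* the pseudo-random stream */
    putchar('\n');
    return 0;
}
```

Reading /dev/frandom gives you exactly this kind of keystream, never XORed with any plaintext.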
Does frandom generate good random numbers?
Due to its origins, the random numbers can't be too bad. If they were, RC4 wouldn't be worth anything.
As for testing: Data directly "copied" from /dev/frandom was tested with the "Diehard" battery of tests, developed by George Marsaglia. All tests passed, which is considered to be a good indication.
Can frandom be used to create one-time pads (cryptology)?
frandom was never intended for crypto purposes, nor was any special thought given to attacks. But there is very little room for attacking the module, and since the module is based upon RC4, we have the following fact: Using data from /dev/frandom as a one-time pad is equivalent to using the RC4 algorithm with a 2048-bit key, read from /dev/urandom.
Bottom line: It's probably OK to use frandom for crypto purposes. But don't. It wasn't the intention.

Related

Why does my linux entropy have such a low upper limit?

I noticed I was getting poor performance when running cryptographic operations.
I ran cat /proc/sys/kernel/random/entropy_avail.
After some testing using watch and my own observations I can see that my entropy levels never surpass 200!
Even when I generate entropy using mouse movements etc. (when my computer is completely idle) it briefly surpasses 200 then suddenly dips back down below it for no reason.
Why is this and how do I fix it?
Perhaps the entropy-accumulating system has only about 200 bits of state, and simply cannot get more "unknown" than that. The people most concerned about having enough entropy tend to be cryptologists, and 200 bits of entropy is plenty for most (maybe all?) cryptographic applications.
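If you want to see what the kernel itself reports, you can read poolsize next to entropy_avail. A minimal check in C of the same /proc files the question reads with cat (on 2.6 and later kernels both values should be in bits):

```
/* Quick check of the kernel's pool size versus the entropy currently
 * available, via /proc/sys/kernel/random. */
#include <stdio.h>

static long read_long(const char *path)
{
    long v = -1;
    FILE *f = fopen(path, "r");
    if (f) {
        if (fscanf(f, "%ld", &v) != 1)
            v = -1;
        fclose(f);
    }
    return v;
}

int main(void)
{
    long avail = read_long("/proc/sys/kernel/random/entropy_avail");
    long size  = read_long("/proc/sys/kernel/random/poolsize");
    printf("entropy_avail = %ld bits (pool size %ld bits)\n", avail, size);
    return 0;
}
```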
You can substantially improve the available entropy with haveged. It may already be included in your distribution. CentOS/Red Hat users can install it from the EPEL repository.
Haveged was created to remedy low-entropy conditions in the Linux
random device that can occur under some workloads, especially on
headless servers.
Don't worry about it. 200 bits of entropy is more than enough.
Here's a quote from RFC 4086 (Randomness Requirements for Security):
3.1. Volume Required
How much unpredictability is needed? Is it possible to quantify the
requirement in terms of, say, number of random bits per second?
The answer is that not very much is needed. For AES, the key can be
128 bits, and, as we show in an example in Section 8, even the
highest security system is unlikely to require strong keying material
of much over 200 bits.

Is there a chance of reading 16 bytes of /dev/urandom data twice and getting the same result?

Working with Linux 3.2, I would like to implement a UID algorithm using /dev/urandom.
There may be a chance of reading 16 random bytes twice, and getting the same result. But is the chance small enough to be negligible?
/dev/urandom is supposed to be a random device that should look uniformly random, and in a uniformly random sequence you would expect to find repeated patterns. However, since there are 2^128 possible 16-byte sequences, this should happen with probability 2^-128, which is vanishingly small.
That said, /dev/urandom is not known to be cryptographically safe and there may be attacks that aren't in the open literature to force the behavior to degenerate (perhaps some government agency knows how to do this, for example). From the man pages:
A read from the /dev/urandom device will not block waiting for more
entropy. As a result, if there is not sufficient entropy in the
entropy pool, the returned values are theoretically vulnerable to a
cryptographic attack on the algorithms used by the driver. Knowledge
of how to do this is not available in the current unclassified
literature, but it is theoretically possible that such an attack may
exist. If this is a concern in your application, use /dev/random
instead.
(My emphasis) Therefore, I wouldn't rely on this if you are trying to go for cryptographic security.
In short, if you just need random values, this is probably fine. If you want to go for cryptographic security, I would not recommend doing this.
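For what it's worth, pulling the 16 bytes looks something like this minimal sketch; it loops because read() may, in principle, return fewer bytes than requested:

```
/* Minimal sketch of reading 16 random bytes from /dev/urandom for a UID. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    unsigned char uid[16];
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    ssize_t got = 0;
    while (got < (ssize_t)sizeof uid) {           /* handle short reads */
        ssize_t n = read(fd, uid + got, sizeof uid - got);
        if (n <= 0) { perror("read"); close(fd); return 1; }
        got += n;
    }
    close(fd);

    for (int k = 0; k < 16; k++)
        printf("%02x", uid[k]);                   /* print the UID as hex */
    putchar('\n');
    return 0;
}
```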
Hope this helps!
You have a 1 in 2^128 chance of reading the same data, so yes, the probability is negligible. It is roughly the same as the probability of breaking the AES-128 encryption scheme.
Assuming the values are perfectly random, due to the birthday paradox the probability of a collision is approximately 2^-64 (the square root of the probability of getting any particular value). That is, at about 2^64 UIDs, the probability of finding a pair becomes greater than 50%.
For most applications that should be fine.
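As a rough worked version of that birthday bound (a standard approximation, not part of the original answer), in LaTeX notation:

```
P(\text{collision among } n \text{ UIDs}) \approx 1 - e^{-n(n-1)/(2\cdot 2^{128})} \approx \frac{n^{2}}{2^{129}},
\qquad
P = \tfrac{1}{2} \;\Rightarrow\; n \approx \sqrt{2\ln 2}\cdot 2^{64} \approx 1.18\cdot 2^{64}.
```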

What (else) is wrong with using time as a seed for random number generation?

I understand that time is an insecure seed for random number generation because it effectively reduces the size of the seed space.
But say I don't care about security. For example, say I'm doing a Monte Carlo simulation for a card game. I DO, however, care about getting as close to true randomness as possible. Will time as a seed affect the randomness of my output? I would think the choice of PRNG matters more than the seed in this case.
For security purposes you obviously need a high entropy seed. And time alone cannot provide that.
For simulation purposes the quality of the seed doesn't matter much, as long as it's unique. As you noted the quality of the PRNG is more important here.
Even a PRNG in a game may need to be secure. For example, in multiplayer games a player might work out the internal state of the PRNG and use it to predict future random events, guess their opponents' cards, get better loot, and so on.
One common pitfall when using time to seed a PRNG is that the time doesn't change very often. For example, on Windows most time-related functions only change their return value every few milliseconds, so all PRNGs created within that interval will return the same sequence.
Just for the sake of completeness: this paper by Matsumoto et al. nicely illustrates how important the initialization scheme (i.e. the way you choose your seed(s)) is for simulation. It turns out that a bad initialization scheme may strongly bias the results, even though the RNG algorithm itself is rather good in principle.
If you are just running a single instance of your program, then there should not be too many problems.
However, I have seen people start multiple programs at the same time, with each program seeding from the time. In that case all the programs get the same sequence of random numbers. In particular, I have seen people seed an Apache process on each request in order to use a random number as a session ID, only to find that different people hitting the web server at the same time got exactly the same IDs.
Hence, if you expect to run multiple simultaneous instances of the program, using time is a very bad idea.
Suppose your program runs very fast and asks for the system time to use as a seed many times in a row, with very little time in between. You could get the same time back on each call, so it would end up generating the same random numbers. So even in a simulation, low entropy can be a problem.
Considering that it's not that hard to find other sources of entropy on your system, and that even your operating system can provide you with some almost-random numbers, you could use them to increase the entropy of your time-based seed.
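As a concrete illustration of both the pitfall and the fix suggested above, here is a minimal sketch; the mixing is ad hoc and only meant for simulations, not for security:

```
/* Sketch of the pitfall and one common fix: seeding with time() alone
 * gives every process started in the same second the same sequence,
 * so mix in the PID and a few bytes from the OS as well. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static unsigned int make_seed(void)
{
    unsigned int seed = (unsigned int)time(NULL); /* coarse: same within 1 s */
    seed ^= (unsigned int)getpid() << 16;         /* differs per process */

    unsigned int os_bits = 0;
    int fd = open("/dev/urandom", O_RDONLY);      /* OS-provided randomness */
    if (fd >= 0) {
        if (read(fd, &os_bits, sizeof os_bits) == sizeof os_bits)
            seed ^= os_bits;
        close(fd);
    }
    return seed;
}

int main(void)
{
    srand(make_seed());
    printf("pid %d: first draw = %d\n", getpid(), rand());
    return 0;
}
```

Two copies of this started in the same second will still produce different sequences, because the PID and the /dev/urandom bytes differ.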

Is 1024-bit RSA secure?

Is 1024-bit RSA secure, or is it crackable now? Is it safe for my program to use 1024-bit RSA? I read at http://pcworld.about.com/od/privacysecurity1/Researcher-RSA-1024-bit-encry.htm that 1024-bit encryption is insecure, but I find 2048-bit slower, and I also see that various HTTPS sites (even PayPal) use 1024-bit encryption. Is 1024-bit encryption secure enough?
Last time I checked, NIST recommends 2048-bit RSA and predicts that it will remain secure until 2030. Page 67 of this PDF has the table.
Edit: They actually predict 1024-bit is OK until 2010, then 2048-bit until 2030, then 3072-bit after that. And it's NIST, not the NSA. Been too long since I did my thesis, LOL.
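If you do move to 2048-bit keys, generating one programmatically is straightforward. Here is a minimal sketch using the OpenSSL 1.x C API (deprecated in OpenSSL 3.0; in practice the openssl command-line tool does the same job):

```
/* Minimal sketch: generate a 2048-bit RSA key with the OpenSSL 1.x API
 * and dump it as PEM. Link with -lcrypto. Error handling is trimmed. */
#include <openssl/bn.h>
#include <openssl/pem.h>
#include <openssl/rsa.h>
#include <stdio.h>

int main(void)
{
    RSA *rsa = RSA_new();
    BIGNUM *e = BN_new();
    BN_set_word(e, RSA_F4);                       /* public exponent 65537 */

    if (RSA_generate_key_ex(rsa, 2048, e, NULL) != 1) {
        fprintf(stderr, "key generation failed\n");
        return 1;
    }
    PEM_write_RSAPrivateKey(stdout, rsa, NULL, NULL, 0, NULL, NULL);

    BN_free(e);
    RSA_free(rsa);
    return 0;
}
```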
What are you trying to protect? If you are encrypting something that is not terribly vital, then 1024 bits may be fine, but if you are protecting something very vital, such as someone's medical or financial information, then 4096 bits would be better.
The size of the key really depends on what you are protecting and how long you expect the encryption to hold. If the information is only valid for 10 minutes, then 1024 bits works fine; for 10 years of protection it doesn't.
So, what are you protecting?
There is no easy answer to the question "is size n secure?" because it depends on the resources of the expected attacker. This has two parts:
The resources that the attacker is willing to invest depend heavily on the situation: defeating your grandmother, a bored computer-science student, or the full secret service of some big, rich country does not involve the same attack power. It also depends on the perceived value of the protected data.
When designing the system, you want some margin of security, which means that you will make some prophecies on how computing power will evolve in the future, and this raises the difficult question of the notion of cost.
So there are several estimates which have been proposed by various researchers and government institutes. This site offers a survey of such methods, with online calculators so that you may play a bit with some of the input parameters.
The short answer is that if you want only short-term security (i.e. security is not relevant beyond, say, the year 2015) and 1024 bits are not enough for you, then your enemies must be very powerful indeed. Scarily so. To the point that you should have other, more urgent trouble on your hands.
It is necessary to define the meaning of secure to get a useful answer.
Is your house secure? Mostly we make it "good enough." For example, making it harder to break in than the neighbors is often adequate. That way the thieves spend time trying to break into next door rather than your place.
It might be secure if it requires X hours to break in and the valuable content is worth Y. Converting time to money is tricky, but if it takes a cracker 100 hours of his time to break in, and the contents of your information is worth, say $100, then your data is probably secure enough.
Nothing is going to be totally secure forever. If you're that worried about it, just use 2048-bit and sacrifice speed for better security.
Besides, as the article states:
But determining the prime numbers that make up a huge integer is nearly impossible without lots of computers and lots of time.
It all depends on whether or not you think people will actually try that hard to get at whatever information you're trying to protect.
Found a recent paper addressing exactly this question:
On the Security of 1024-bit RSA and
160-bit Elliptic Curve Cryptography
version 2.1, September 1, 2009
http://eprint.iacr.org/2009/389.pdf
It is said that 1024-bit numbers currently cannot be factored, but 1024-bit RSA (a modulus of about 310 decimal digits) is not considered secure enough. It is advisable to use RSA with 2048 bits or more if one needs long-term security. There are many well-funded research organizations working in this area, and there is a chance they do not share everything they find. So I think we can say it is not secure at all. If I had to encrypt important data one day, I would prefer 2048 bits or more, considering long-term security and unknown developments in the field.

How does a 7- or 35-pass erase work? Why would one use these methods?

How and why do 7- and 35-pass erases work?
Shouldn't a simple rewrite with all zeroes be enough?
A single pass with zeros doesn't completely erase magnetic artifacts from a disk. It's still possible to recover the data from the drive. A 7-pass erasure using random data will do a pretty complete job of preventing reconstruction of the data on the drive.
Wikipedia has a number of different articles relating to this topic.
http://en.wikipedia.org/wiki/Data_remanence
http://en.wikipedia.org/wiki/Computer_forensics
http://en.wikipedia.org/wiki/Data_erasure
I'd never heard of the 35-pass erase: http://en.wikipedia.org/wiki/Gutmann_method
The Gutmann method is an algorithm for
securely erasing the contents of
computer hard drives, such as files.
Devised by Peter Gutmann and Colin
Plumb, it does so by writing a series
of 35 patterns over the region to be
erased. The selection of patterns
assumes that the user doesn't know the
encoding mechanism used by the drive,
and so includes patterns designed
specifically for three different types
of drives. A user who knows which type
of encoding the drive uses can choose
only those patterns intended for their
drive. A drive with a different
encoding mechanism would need
different patterns. Most of the
patterns in the Gutmann method were
designed for older MFM/RLL encoded
disks. Relatively modern drives no
longer use the older encoding
techniques, making many of the
patterns specified by Gutmann
superfluous.[1]
Also interesting:
One standard way to recover data that
has been overwritten on a hard drive
is to capture the analog signal which
is read by the drive head prior to
being decoded. This analog signal will
be close to an ideal digital signal,
but the differences are what is
important. By calculating the ideal
digital signal and then subtracting it
from the actual analog signal it is
possible to ignore that last
information written, amplify the
remaining signal and see what was
written before.
As mentioned before, magnetic artifacts are present from the previous data on the platter.
In a recent issue of MaximumPC they put this to the test. They took a drive, ran it through a pass of all zeros, and hired a data recovery firm to try and recover what they could. Answer: Not one bit was recovered. Their analysis was that unless you expect the NSA to try, a zero pass is probably enough.
Personally, I'd run an alternating pattern or two across it.
One random pass is enough for plausible deniability, as the lost data would have to be mostly "reconstructed" with a margin of error that grows with the length of the data being recovered, as well as with whether or not the data is contiguous (in most cases it's not).
For the insanely paranoid, three passes is good: 0xAA (10101010), 0x55 (01010101), and then random. The first two will grey out residual bits, and the last random pass will obliterate any "residual residual" bits.
Never do passes with only zeros. Under magnetic microscopy the data is still there, it's just "faded".
Never trust "single file shredding", especially on solid-state media like flash drives. If you need to "shred" a file, "delete" it and fill your drive with random data files until it runs out of space. Then, next time, think twice about housing shred-worthy data on the same medium as "low-clearance" stuff.
The Gutmann method is based on tin-foil-hat speculation; it does various things to get drives to degauss themselves, which is admirable in an artistic sense, but pragmatically it's overkill. No private organisation to date has successfully recovered data from even a single random pass. As for big brother, if the DoD considers it gone then you know it's gone; the military-industrial complex gets all the big bucks to try to do exactly what Gutmann claims is possible, and believe you me, if they had the tech to do so it would already have been leaked to the private sector, since they're all in bed with each other. However, if you want to use Gutmann in spite of this, check out the secure-delete package for Linux.
7-pass and 35-pass erases would take forever to finish. HIPAA only requires a DoD 3-pass overwrite, and I am not certain why DoD even has a 7-pass overwrite, as it seems they simply shred the disks before disposing of machines anyway. Theoretically, you could recover data off the outer edges of each track (using a scanning electron microscope or a microscopic magnetic probe), but in practice you would need the resources of a disk drive maker or one of the three-letter government organizations to do this.
The reason to perform multipass writes is to take advantage of the slight errors in head positioning to overwrite the edges of the track as well, making recovery far less likely.
Most drive recovery companies can't recover a drive that has had its data overwritten even once. They are typically taking advantage of the fact that Windows doesn't zero out the data blocks, it just changes the directory to mark the space free. They simply 'undelete' the file and make it visible again.
If you don't believe me, call them up and ask them if they can recover a disk that has been dd'ed over... they will typically tell you no, and if they do agree to try, it will be serious $$$ to get it back...
A DoD 3-pass followed by a zero overwrite should be more than sufficient for most (i.e. non-TOP SECRET) folks.
DBAN (and its commercially supported descendant, EBAN) do this all cleanly... I would recommend these.
See: Secure Deletion of Data from Magnetic and Solid-State Memory
Advanced recovery tools can recover single-pass-deleted files easily. And they are expensive too (e.g. http://accessdata.com/).
A visual GUI for Gutmann passes from http://sourceforge.net/projects/gutmannmethod/ shows it has 8 semi-random passes. I have never seen proof that files deleted by the Gutmann method have been recovered.
Overkill, maybe, but still far better than Windows' soft delete.
Regarding the second part of the question, some of the answers here actually contradict real research on that exact topic. According to the "Number of overwrites needed" section of the Data erasure article on Wikipedia, on modern drives, erasing with more than one pass is redundant:
"ATA disk drives manufactured after 2001 (over 15 GB) clearing by
overwriting the media once is adequate to protect the media from both
keyboard and laboratory attack." (citation)
Also, Infosec did a nice article entitled "The Urban Legend of Multipass Hard Disk Overwrite" on this entire subject, discussing the old US government erasure standards, among others, and how the multi-pass myth established itself in the industry:
"Fortunately, several security researchers presented a paper [WRIG08]
at the Fourth International Conference on Information Systems Security
(ICISS 2008) that declares the “great wiping controversy” about how
many passes of overwriting with various data values to be settled:
their research demonstrates that a single overwrite using an arbitrary
data value will render the original data irretrievable even if MFM and
STM techniques are employed.
The researchers found that the probability of recovering a single bit
from a previously used HDD was only slightly better than a coin toss,
and that the probability of recovering more bits decreases
exponentially so that it quickly becomes close to zero.
Therefore, a single pass overwrite with any arbitrary value (randomly
chosen or not) is sufficient to render the original HDD data
effectively irretrievable."
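In code, the single overwrite that the quoted research calls sufficient is nothing exotic. A minimal sketch follows; the device path /dev/sdX is a placeholder, running this irreversibly destroys the device's contents, and a real tool would also verify the write and handle hidden areas such as HPA/DCO:

```
/* Minimal sketch of a single-pass overwrite: stream one fixed pattern
 * over the whole block device, then flush. /dev/sdX is a placeholder;
 * this irreversibly destroys its contents. Run as root. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    static unsigned char block[1 << 20];          /* 1 MiB of the wipe pattern */
    memset(block, 0x00, sizeof block);            /* any arbitrary value works */

    int fd = open("/dev/sdX", O_WRONLY);          /* placeholder device node */
    if (fd < 0) { perror("open"); return 1; }

    for (;;) {
        ssize_t n = write(fd, block, sizeof block);
        if (n < 0) { perror("write"); break; }    /* ENOSPC marks end of device */
        if (n == 0) break;                        /* some kernels return 0 at end */
    }
    fsync(fd);                                    /* push everything to the disk */
    close(fd);
    return 0;
}
```

This is essentially what dd if=/dev/zero does; a verification pass afterwards (reading the device back) guards against the hardware malfunctions mentioned below.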
There's a lot of misinformation around this, though most of the answers I see on this page are correct. I've worked in the data recovery industry for 25 years and have addressed this exact question an enormous number of times.
The "residual magnetism" hypothesis never worked in real life. And back then, tolerances were millions of times looser.
If you still doubt this, remember that a rotational hard drive uses the same storage principle as an audio tape - moving magnetic substrate storage - and the audio tape that was recorded over a single time in the Watergate case has still not been recovered.
A single zero-pass wipe renders all the data on an HDD unrecoverable unless some malfunction or mistake causes the overwrite to be incomplete. This was true even back in the days when Peter Gutmann released his paper (which was like a tsunami in the erasure industry). Gutmann's paper was pure hypothesis; it never panned out in reality. Even in the days of MFM/RLL drives, nobody could recover from a single-pass overwrite. It should be noted that Gutmann patented the algorithm that his paper said would be required to ensure complete erasure. Presumably, every time erasure was sold with his algorithm, he got paid. I am not saying there was intentional deception on his part, just pointing out that his algorithm, though there was never any evidence it erased better than a single overwrite, was patented and sold.
Please note that SSDs are different. SSDs can (and often do) use a pool of sectors that are rotated in and out of use, so if data is written to an SSD and then "deleted" and the drive rotates the sectors on which the deleted file is on out of the pool, an erasure might not be able to reach those sectors because the firmware in the SSD has control that software can't override. One way around this is to continuously overwrite until all sectors have been rotated into use.
The reason multiple passes exist is because hardware can malfunction. If the drive somehow malfunctions during one pass, it's possible that not all sectors will be erased - however, most good erasure software offers a full verification, which basically reads every bit on the drive to make sure the erasure didn't malfunction. With that, multi-pass overwrites are overkill.
And sometimes, data is so sensitive, it makes sense to go overboard in making sure it's destroyed. For example, I heard about a drive that was erased by the military with a 7-pass zero-fill, then the drive was run over by a tank, and then the remains were buried in a secret location in a highly secured area. Practically, the recoverability is about the same as a single-pass overwrite, but if lives could be lost as a result of the data falling into the wrong hands, then why not go for the overkill?
