Genuine Random Numbers with Multithreading

A program can only produce pseudo-random numbers, because it is always deterministic. But with multithreading you get non-determinism, because of all the effects of scheduling, caches, swaps, etc.
Could you use this effect to produce real random numbers, since it depends not only on deterministic code, but also on physical phenomena such as latency?

This is used all the time. If you're reading from /dev/random on many Unix-like systems, you're probably getting "environmental" effects mixed into the entropy. But don't over-read this as "real random numbers." The effects you're describing still only vary over a limited range, and in some cases may vary over such a limited range that they are very close to deterministic (and "close to deterministic" can be close enough to exploit if you've built your security around your RNG).
The classic version of this problem is a router booting up and seeding its RNG with latency information from the network. In a very noisy network, this may be pretty random. In a fairly quiet network, this may be very predictable. This is a very real-world problem and difficult to solve in embedded systems.
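To make that concrete, here is a minimal Python sketch of the idea from the question: collect scheduling jitter from a few threads and hash it into a pool. Everything here (names, parameters) is illustrative; hashing hides structure but cannot add entropy that the timings never contained, so this is not a substitute for the OS RNG.
import hashlib
import threading
import time

def jitter_samples(n_threads=4, rounds=200):
    # Each thread records how long a supposedly zero-length sleep actually
    # took. The variation comes from the scheduler, caches, other load, etc.,
    # but only over a fairly small range.
    samples = []
    lock = threading.Lock()

    def worker():
        for _ in range(rounds):
            start = time.perf_counter_ns()
            time.sleep(0)  # yield to the scheduler
            delta = time.perf_counter_ns() - start
            with lock:
                samples.append(delta)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return samples

def pool_from_jitter():
    # Hash the samples into a 256-bit value.
    data = b"".join(d.to_bytes(8, "little") for d in jitter_samples())
    return hashlib.sha256(data).hexdigest()

if __name__ == "__main__":
    print(pool_from_jitter())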
A lesson from this is to avoid inventing your own RNGs, particularly if you are building security systems on top of them. Research the RNGs your system provides, and use them (or research cryptographic random number generation before embarking on a new solution).
If this subject interests you, random.org has some good introductory materials, along with an implementation based on atmospheric data. (Whether even this is "true random" or just "deterministic based on a state we don't know" is an argument for physicists, but it's about as close as we've got.)

Related

Use SimPy to simulate Chord distributed system

I am doing some research on several distributed systems such as Chord, and I would like to be able to write algorithms and run simulations of the distributed system with just my desktop.
In the simulation, I need to be able to have each node execute independently and communicate with each other, while manually inducing elements such as lag, packet loss, random crashes etc. And then collect data to estimate the performance of the system.
After some searching, I find SimPy to be a good candidate for my purpose.
Would SimPy be a suitable library for this task?
If yes, what are some suggestions/caveats for implementing such a system?
I would say yes.
I used SimPy (version 2) for simulating arbitrary communication networks as part of my doctorate. You can see the code here:
https://github.com/IncidentNormal/CommNetSim
It is, however, a bit dense and not very well documented. Also it should really be translated to SimPy version 3, as 2 is no longer supported (and 3 fixes a bunch of limitations I found with 2).
Some concepts/ideas I found to be useful:
Work out what you want out of the simulation before you start implementing it; communication network simulations are incredibly sensitive to small design changes, as you are effectively trying to monitor/measure emergent behaviours from the system.
It's easy to start over-engineering the simulation; using native SimPy objects is almost always sufficient once you strip away the noise from your design.
Use Stores to simulate media for transferring packets/payloads. There is an example like this for simulating latency in the SimPy docs: https://simpy.readthedocs.io/en/latest/examples/latency.html (and see the sketch after this list).
Events are tricky: they can only fire once per simulation step, so this is often a source of bugs, because behaviour is effectively lost if multiple things trigger the same event in a step. For robustness, try not to use them to represent behaviour in communication networks (you rarely need something that low-level); as mentioned above, use Stores instead, as these act like queues by design.
Pay close attention to the probability distributions you use to generate randomness. Expovariate (exponential) distributions are usually closer to natural systems than uniform distributions, but sanity-check every distribution you use. Network traffic arrivals usually follow a Poisson process, for example, and data volume often follows a power-law (Pareto) distribution.
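For illustration, here is a minimal SimPy 3 sketch along those lines: a lossy, delayed link built on a Store, with expovariate inter-arrival times. The class and parameter names are my own, and it follows the spirit of the latency example linked above rather than any particular Chord implementation.
import random
import simpy

class Link:
    # A one-way channel with fixed latency and random packet loss,
    # built on a simpy.Store (which acts like a queue by design).
    def __init__(self, env, delay=0.5, loss=0.1):
        self.env, self.delay, self.loss = env, delay, loss
        self.store = simpy.Store(env)

    def _deliver(self, pkt):
        yield self.env.timeout(self.delay)   # propagation latency
        self.store.put(pkt)

    def send(self, pkt):
        if random.random() > self.loss:      # drop some packets
            self.env.process(self._deliver(pkt))

    def recv(self):
        return self.store.get()

def sender(env, link):
    i = 0
    while True:
        yield env.timeout(random.expovariate(1.0))  # Poisson arrivals
        link.send((i, env.now))
        i += 1

def receiver(env, link, log):
    while True:
        _, sent_at = yield link.recv()
        log.append(env.now - sent_at)        # one-way delay sample

env = simpy.Environment()
link = Link(env)
delays = []
env.process(sender(env, link))
env.process(receiver(env, link, delays))
env.run(until=100)
print(len(delays), "packets delivered; mean one-way delay:",
      sum(delays) / max(len(delays), 1))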

What's the overhead of the different forms of parallelism in Julia v0.5?

As the title states, what is the overhead of the different forms of parallelism, at least in the current implementation of Julia (v0.5, in case the implementation changes drastically in the future)? I am looking for some "practical measures", some general heuristics or ballparks to keep in my head for when it can be useful. For example, it's pretty obvious that using multiprocessing in a loop like:
addprocs(4)
@parallel (+) for i=1:4
rand()
end
doesn't give you performance gains, because each process is only taking one random number. But is there a general heuristic for knowing when it will be worthwhile? Also, what about a heuristic for threading? It surely has lower overhead than multiprocessing, but, for example, with 4 threads, for what N is it a good idea to multithread:
A = rand(4)
Base.Threads.@threads (+) for i = 1:N
A[i%4+1]
end
(I know there isn't a threaded reduction right now, but let's act like there is, or edit with a better example). Sure, I can benchmark every example, but some good rules to keep in mind would go a long way.
In more concrete terms: what are some good rules of thumb?
How many numbers do you need to be adding/multiplying before threading gives performance enhancements, or before multiprocessing gives performance enhancements?
How much does this depend on Julia's current implementation?
How much does it depend on the number of threads/processes?
How much does this depend on the architecture? Are there good rules for knowing when the threshold should be higher/lower on a particular system?
What kinds of applications violate these heuristics?
Again, I'm not looking for hard rules, just general guidelines to guide development.
A few caveats: 1. I'm speaking from experience with version 0.4.6 (and prior) and haven't played with 0.5 yet (but, as I hope my answer below demonstrates, I don't think this is essential vis-a-vis the response I give). 2. This isn't a fully comprehensive answer.
Nevertheless, from my experience, the overhead for multiple processes itself is very small provided that you aren't dealing with data movement issues. In other words, in my experience, any time that you ever find yourself in a situation of wishing something were faster than a single process on your CPU can manage, you're well past the point where parallelism will be beneficial. For instance, in the sum of random numbers example that you gave, I found through testing just now that the break-even point was somewhere around 10,000 random numbers. Anything more and parallelism was the clear winner. Generating 10,000 random numbers is trivial for modern computers, taking a tiny fraction of a second, and is well below the threshold where I'd start getting frustrated by the slowness of my scripts and want parallelism to speed them up.
Thus, I at least am of the opinion, that although there are probably even more wonderful things that the Julia developers could do to cut down on the overhead even more, at this point, anything pertinent to Julia isn't going to be so much of your limiting factor, at least in terms of the computation aspects of parallelism. I think that there are still improvements to be made in terms of enhancing both the ease and the efficiency of parallel data movement (I like the package that you've started on that topic as a good step. You and I would probably both agree there's still a ways more to go). But, the big limiting factors will be:
How much data do you need to be moving around between processes?
How much read/write to your memory do you need to be doing during your computations? (e.g. flops per read/write)
Aspect 1. might at times lean against using parallelism. Aspect 2. is more likely just to mean that you won't get so much benefit from it. And, at least as I interpret "overhead," neither of these really fall so directly into that specific consideration. And, both of these are, I believe, going to be far more heavily determined by your system hardware than by Julia.

Do limitations of CPU speed and memory prevent us from creating AI systems?

Many technology optimists say that in 15 years the speed of computers will be comparable with the speed of the human brain. This is why they believe that computers will achieve the same level of intelligence as humans.
If Moore's law holds, then every 18 months we should expect a doubling of CPU speed. 15 years is 180 months, so we will see that doubling 10 times, which means that in 15 years computers will be 1024 times faster than they are now.
But is speed the root of the problem? If it were, we would be able to build an AI system NOW; it would just be 1024 times slower than in 15 years, which means that to answer a question it would need 1024 seconds (about 17 minutes) instead of an acceptable 1 second. But do we have a strong (but slow) AI system now? I think not. Even if now (2015) we give a system 1 hour instead of 17 minutes, or 1 day, or 1 month, or even 1 year, it is still unable to answer complex questions formulated in natural language. So it is not speed that causes the problem.
It means that in 15 years our artificial intelligence will not be 1024 times faster than it is now (because we have no artificial intelligence); instead, our "stupidity" will be 1024 times faster than it is now.
We need both faster hardware and better algorithms. Of course speed alone is not enough as you pointed out.
We need self-modifying, meta-learning algorithms capable of creating hypotheses and performing experiments to verify them (like humans do). Systems that are learning to learn and self-improving. Algorithms that can prove that a given self-modification is optimal in a certain sense and will lead to even better self-modifications in the future. Systems that can reflect on and inspect their own software (can you call it consciousness?). Such research is being done and may create superhuman intelligence in the future, or even a technological singularity, as some believe.
There is one problem with this approach, though. People doing this research usually assume that consciousness is computable, that it is all about intelligence. They don't take into account experiences like pleasure and pain, which (in my opinion) have nothing to do with computation or intellect. You can understand pain only through experience, not intellectual speculation. Setting a variable pleasure to 5, or behaving as if one feels pleasure, is very different from experiencing pleasure. Some people say that feelings originate in the brain, so it is enough to understand the brain. Not necessarily. A child can ask: "How did they put small people inside the TV box?" Of course the TV is just a receiver and there are no small people inside. The brain might be a receiver too. Do we need higher knowledge for feelings and other experiences?
The question has to be answered in the context of computation and complexity.
Every algorithm has its own complexity and running time (see Big O notation). There are non-computable problems, such as the halting problem, for which it has been proven that no algorithm exists, independent of the hardware.
Computable problems are described by the number of steps an algorithm needs with respect to the size of the input. As the input size increases, the execution time of the algorithm also increases. These algorithms can be roughly categorized into two groups: exponential-time and non-exponential-time algorithms. The running time of exponential-time algorithms grows drastically with the input size and quickly becomes intractable.
The execution time of these problems can be improved with better hardware; however, the complexity will always be the same. No matter what CPU you use, the computation will always require the same number of steps. In other words, hardware helps you get an answer in less time, but the hardness of the problem remains the same. Thus, hardware limitations are not what prevents us from creating an AI system. For instance, you can use parallel programming (e.g. a GPU) to improve the execution time drastically, but the algorithm is still the same as on a normal CPU.
I would say no. As you showed, speed is not the only factor in intelligence. I for one would think language is; yes, language. Language is the primary skill we learn as humans, so why not for computers? Language gives an understanding that can be shared across the globe, provided you know that language. Humans use nonverbal and verbal language to communicate. But I honestly think it really works something like this:
Humans go through experiences. These experiences have a bigger impact on our lives the closer we are to our birth date, or the more emotional they are. For example, the first time we are told no means a lot more to us as an infant than as a 70-year-old adult. These get stored as either long-term or short-term memory and correlated to that event later on in life for reference. We mainly store events to learn from them, to prevent negative experiences or promote positive ones.
Think of it as a tag cloud. The more often you do task A, the bigger the cloud is in memory. We then store crucial details such as type of emotion, location, smells etc. Now when we reference them again from memory we pick out those details and create a logical sentence:
Touching that stove hurt me when I was at grandma's house.
All of the key words in that sentence would have to be stored to have a complete memory.
Now inside of this sentence we have learned a lot more things than just being hurt by the stove at grandma's house. We have learned that stoves can be hot and dangerous, and that grandma allows one in her house. We also learned how long it takes to heal from such an event, emotionally and physically, to gauge how important the event is. And so much more. So we also store this sub-event information inside of other knowledge bubbles. And these bubbles continue to grow exponentially.
Now when asked: Are stoves dangerous?
You can identify the words in the sentence:
are, stoves, dangerous, question
and reference the definition of dangerous as: hurt, bad
and then provide more evidence that this is true, such as personal experience to result in:
Yes, stoves are dangerous because I was hurt at grandma's house by one.
So intelligence seems to be a mix of events, correlation, and data retrieval used to solve some problem. I'm sure there's a lot more to it than that, but this is just my understanding of intelligence.

Is /dev/random considered truly random?

For instance, could it be used to generate a one-time pad key?
Also, what are its sources and how could it be used to generate a random number between x and y?
Strictly speaking, /dev/random is not really completely random. /dev/random feeds on hardware sources which are assumed to be unpredictable in some way; then it mixes such data using functions (hash functions, mostly) which are also assumed to be one-way. So the "true randomness" of /dev/random is relative to the inherent security of the mixing functions, security which is no more guaranteed than that of any other cryptographic primitive, in particular the PRNG hidden in /dev/urandom.
The difference between /dev/random and /dev/urandom is that the former will try to maintain an estimate (which means "a wild guess") of how much entropy it has gathered, and will refuse to output more bits than that. On the other hand, /dev/urandom will happily produce megabytes of data from the entropy it has.
The security difference between the two approaches is meaningless unless you assume that "classical" cryptographic algorithms can be broken, and you use one of the very few information-theoretic algorithms (e.g. OTP or Shamir's secret sharing); and, even then, /dev/random may be considered as more secure than /dev/urandom only if the mixing functions are still considered to be one-way, which is not compatible with the idea that a classical cryptographic algorithm can be broken. So, in practice and even in theory, no difference whatsoever. You can use the output of /dev/urandom for an OTP and it will not be broken because of any structure internal to /dev/urandom -- actual management of the obtained stream will be the weak point (especially long-term storage). On the other hand, /dev/random has very real practical issues, namely that it can block at untimely instants. It is really irksome when an automated OS install blocks (for hours!) because SSH server key generation insists on using /dev/random and needlessly stalls for entropy.
There are many applications which read /dev/random as a kind of ritual, as if it were "better" than /dev/urandom, probably on a karmic level. This is plain wrong, especially when the alea is to be used with classical cryptographic algorithms (e.g. to generate an SSH server public key). Do not do that. Instead, use /dev/urandom and you will live longer and happier. Even for one-time pads.
(Just for completeness, there is a quirk with /dev/urandom as implemented on Linux: it will never block, even if it has not gathered any entropy at all since previous boot. Distributions avoid this problem by creating a "random seed" at installation time, with /dev/random, and using that seed at each boot to initialize the PRNG used by /dev/urandom; a new random seed is regenerated immediately, for next boot. This ensures that /dev/urandom always works over a sufficiently big internal seed. The FreeBSD implementation of /dev/urandom will block until a given entropy threshold is reached, which is safer.)
The only thing in this universe that can be considered truly random is something based on quantum effects. A common example is radioactive decay. For certain atoms you can be sure only about the half-life, but you cannot be sure which nucleus will break up next.
About /dev/random: it depends on the implementation. On Linux it uses the following as entropy sources:
"The Linux kernel generates entropy from keyboard timings, mouse movements, and IDE timings and makes the random character data available to other operating system processes through the special files /dev/random and /dev/urandom." (Wikipedia)
This means it is better than purely algorithmic random generators, but it is not perfect either. The entropy may not be distributed uniformly and can be biased.
That was the philosophy. In practice, on Linux /dev/random is random enough for the vast majority of tasks.
There are implementations of random generators that have more entropy sources, including noise on audio inputs, CPU temperature sensors, etc. Even so, they are not truly random.
There is an interesting site where you can get genuine random numbers generated by radioactive decay.
/dev/random will block if there's not enough random data in the entropy pool whereas /dev/urandom will not. Instead, /dev/urandom will fall back to a PRNG (kernel docs). From the same docs:
The random number generator [entropy pool] gathers environmental noise from device drivers and other sources into an entropy pool.
So /dev/random is not algorithmic, like a PRNG, but it may not be "truly random" either. Mouse movements and keystroke timings tend to follow patterns and can be used for exploits but you'll have to weigh the risk against your use case.
To get a random number between x and y using /dev/random, assuming you're happy with a 32-bit integer, you could have a look at the way the Java java.util.Random class does it (nextInt()), substituting in appropriate code to read from /dev/random for the nextBytes() method.
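As a rough sketch of that approach in Python (the function name is mine; it reads from /dev/urandom by default, per the advice above, and uses rejection sampling to avoid the modulo bias you would get from a plain remainder):
def rand_between(x, y, device="/dev/urandom"):
    # Uniform integer in [x, y] from raw device bytes.
    n = y - x + 1
    nbytes = (n.bit_length() + 7) // 8
    # Largest multiple of n representable in nbytes; reject values above it
    # so that 'value % n' stays uniform.
    limit = (256 ** nbytes // n) * n
    with open(device, "rb") as f:
        while True:
            value = int.from_bytes(f.read(nbytes), "big")
            if value < limit:
                return x + value % n

print(rand_between(10, 20))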

Self-validating binaries?

My question is pretty straightforward: You are an executable file that outputs "Access granted" or "Access denied" and evil persons try to understand your algorithm or patch your innards in order to make you say "Access granted" all the time.
After this introduction, you might be heavily wondering what I am doing. Is he going to crack Diablo 3 once it is out? I can pacify your worries, I am not one of those crackers. My goal is crackmes.
Crackmes can be found on - for example - www.crackmes.de. A crackme is a little executable that (most of the time) contains a little algorithm to verify a serial and output "Access granted" or "Access denied" depending on the serial. The goal is to make this executable output "Access granted" all the time. The methods you are allowed to use might be restricted by the author - no patching, no disassembling - or involve anything you can do with a binary, objdump and a hex editor. Cracking crackmes is one part of the fun, definitely; however, as a programmer, I am wondering how you can create crackmes that are difficult.
Basically, I think the crackme consists of two major parts: a certain serial verification and the surrounding code.
Making the serial verification hard to track just using assembly is very possible; for example, I have the idea of taking the serial as an input for a simulated microprocessor that must end up in a certain state in order to get the serial accepted. On the other hand, one might grow cheap and learn more about cryptographically strong ways to secure this part. Thus, making this hard enough that the attacker would rather try to patch the executable should not be that hard.
However, the more difficult part is securing the binary. Let us assume a perfectly secure serial verification that cannot be reversed somehow (of course I know it can be reversed; if in doubt, you rip parts out of the binary you are trying to crack and throw random serials at it until it accepts one). How can we prevent an attacker from simply overwriting jumps in the binary in order to make our binary accept anything?
I have been searching on this topic a bit, but most results on binary security, self-verifying binaries and such things end up in articles about preventing attacks on an operating system via compromised binaries, e.g. by signing certain binaries and validating those signatures in the kernel.
My thoughts currently consist of:
checking that explicit locations in the binary still contain the expected jumps.
checksumming parts of the binary and comparing checksums computed at runtime with stored values (a minimal sketch of this follows below).
having positive and negative runtime checks for your functions in the code, with side effects on the serial verification. :)
Are you able to think of more ways to annoy a possible attacker for longer? (Of course, you cannot keep him away forever; at some point, all checks will be broken, unless you managed to break a checksum generator by being able to embed the correct checksum for a program in the program itself, hehe.)
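As a toy illustration of the checksumming idea (Python standing in for what a real crackme would do in native code over its own .text section; the function names and the placeholder digest are mine): hash a region of your own file at runtime and compare it to a value recorded at build time.
import hashlib
import sys

# Value you would record after building/packaging the program.
# (Hypothetical placeholder; recompute it for the real artifact.)
EXPECTED = "put-the-real-digest-here"

def region_digest(path, start, length):
    # SHA-256 over a byte range of the file on disk.
    with open(path, "rb") as f:
        f.seek(start)
        return hashlib.sha256(f.read(length)).hexdigest()

def self_check():
    # A real crackme would hash its code section in memory, not the file,
    # and would scatter many such checks over different ranges.
    return region_digest(sys.argv[0], 0, 4096) == EXPECTED

if __name__ == "__main__":
    print("Access granted" if self_check() else "Access denied")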
You're getting into "anti-reversing techniques", and it's basically an art. Worse, even if you stomp newbies, there are "anti-anti-reversing" plugins for OllyDbg and IDA Pro that they can download to bypass many of your countermeasures.
Countermeasures include debugger detection via debugger-trap APIs, or detecting "single stepping". You can insert code that, after detecting a debugger break-in, continues to function but starts acting up at random times much later in the program. It's really a cat-and-mouse game, and the crackers have a significant upper hand.
Check out...
http://www.openrce.org/reference_library/anti_reversing - Some of what's out there.
http://www.amazon.com/Reversing-Secrets-Engineering-Eldad-Eilam/dp/0764574817/ - This book has really good anti-reversing info and steps through the techniques. A great place to start if you're getting into reversing in general.
I believe these things are generally more trouble than they're worth.
You spend a lot of effort writing code to protect your binary. The bad guys spend less effort cracking it (they're generally more experienced than you) and then release the crack so everyone can bypass your protection. The only people you'll annoy are those honest ones who are inconvenienced by your protection.
Just view piracy as a cost of business - the incremental cost of pirated software is zero if you ensure all support is done only for paying customers.
There's TPM technology: TPM on Wikipedia
It allows you to store the cryptographic checksums of a binary on a special chip, which could act as one-way verification.
Note: TPM has sort of a bad rap because it can be used for DRM. But to experts in the field that's somewhat unfair, and there's even an open-TPM group allowing Linux users to control exactly how their TPM chip is used.
One of the strongest solutions to this problem is Trusted Computing. Basically you would encrypt the application and transmit the decryption key to a special chip (the Trusted Platform Module). The chip would only decrypt the application once it has verified that the computer is in a "trusted" state: no memory viewers/editors, no debuggers, etc. Basically, you would need special hardware just to be able to view the decrypted program code.
So, you want to write a program that accepts a key at the beginning and stores it in memory, subsequently retrieving it from disk. If it's the correct key, the software works. If it's the wrong key, the software crashes. The goal is to make it hard for pirates to generate a working key, and hard to patch the program to work with an unlicensed key.
This can actually be achieved without special hardware. Consider our genetic code. It works based on the physics of this universe. We try to hack it, create drugs, etc., and we fail miserably, usually creating tons of undesirable side effects, because we haven't yet fully reverse engineered the complex "world" in which the genetic "code" evolved to operate. Basically, if you're running everything on a common processor (a common "world") which everyone has access to, then it's virtually impossible to write such secure code, as demonstrated by how easily current software is cracked.
To achieve security in software, you essentially would have to write your own sufficiently complex platform, which others would have to completely and thoroughly reverse engineer in order to modify the behavior of your code without unpredictable side effects. Once your platform is reverse engineered, however, you'd be back to square one.
The catch is, your platform is probably going to run on common hardware, which makes your platform easier to reverse engineer, which in turn makes your code a bit easier to reverse engineer. Of course, that may just mean the bar is raised a bit for the level of complexity required of your platform to be sufficiently difficult to reverse engineer.
What would a sufficiently complex software platform look like? For example, perhaps after every 6 addition operations, the 7th addition returns the result multiplied by pi divided by the square root of the log of the modulus 5 of the difference of the total number of subtract and multiply operations performed since system initialization. The platform would have to keep track of those numbers independently, as would the code itself, in order to decode correct results. So, your code would be written based on knowledge of the complex underlying behavior of a platform you engineered. Yes, it would eat processor cycles, but someone would have to reverse engineer that little surprise behavior and re-engineer it into any new code to have it behave properly. Furthermore, your own code would be difficult to change once written, because it would collapse into irreducible complexity, with each line depending on everything that happened prior. Of course, there would be much more complexity in a sufficiently secure platform, but the point is that someone would have to reverse engineer your platform before they could reverse engineer and modify your code without debilitating side effects.
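A stripped-down toy of that idea (entirely illustrative; the class, the counters, and the simplified perturbation formula are mine, not the formula described above): a tiny "platform" whose add() misbehaves in a state-dependent way, so any code written against it has to carry the same hidden bookkeeping.
import math

class QuirkyALU:
    # Every 7th addition is perturbed by a value derived from the counts
    # of prior subtractions and multiplications.
    def __init__(self):
        self.adds = self.subs = self.muls = 0

    def add(self, a, b):
        self.adds += 1
        result = a + b
        if self.adds % 7 == 0:
            # State-dependent "surprise" that correct client code
            # must know about and compensate for.
            result *= math.pi / math.sqrt(1 + abs(self.subs - self.muls))
        return result

    def sub(self, a, b):
        self.subs += 1
        return a - b

    def mul(self, a, b):
        self.muls += 1
        return a * b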
Great article on copy protection and protecting the protection: Keeping the Pirates at Bay: Implementing Crack Protection for Spyro: Year of the Dragon.
The most interesting idea mentioned in there that hasn't yet been mentioned here is cascading failures: you have checksums that modify a single byte, which causes another checksum to fail. Eventually one of the checksums causes the system to crash or do something strange. This makes a cracked copy of your program seem unstable and places the cause a long way from the crash.
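A sketch of the cascading idea (illustrative only, not taken from the article; the region contents and names are mine): a failed check reports nothing, it silently corrupts a byte that a later check or computation depends on, so the eventual misbehaviour surfaces far from the patched code.
import hashlib

# Program "regions"; in a real binary these would be code sections.
regions = [bytearray(b"code-segment-%d" % i) for i in range(3)]
# Digests recorded at build time, before any tampering.
expected = [hashlib.sha256(r).hexdigest() for r in regions]

def check(i):
    # Verify region i; on mismatch, silently flip a byte in region i+1
    # instead of failing. The damage only surfaces when that region is
    # checked or used much later, far from the original patch.
    if hashlib.sha256(regions[i]).hexdigest() != expected[i]:
        regions[(i + 1) % len(regions)][0] ^= 0xFF

# Simulate a cracker patching region 0:
regions[0][0] ^= 0xFF
check(0)   # corrupts region 1 as a side effect
check(1)   # now region 1 fails too and corrupts region 2, and so on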

Resources