Write the longest possible loop - programming-languages

Recently I was asked this question in a technical discussion. What is the longest possible loop that can be written in computational science considering the machine/architecture on which it is to run? This loop has to be as long as possible and yet not an infinite loop and should not end-up crashing the program (Recursion etc...)
I honestly did not know how to attack this problem, so I asked him if is it practically possible. He said using some computer science concepts, you can arrive at a hypothetical number which may not be practical but nevertheless it will still not be infinite.
Anyone here; knows how to analyse / attack this problem.
P.S. Choosing some highest limit for a type that can store the highest numerical value is apparently not an answer.
Thanks in advance,

You are getting into the field of turing machines.
Simply put (lets stay in the deterministic fields...) your computer/machine can be in a finite number of states that are passed during the algorithm. every state is unique and only occours once, otherwise you would by defnition have an endless loop. like "goto".
we can remove that limitation, but it would not make much sense because then a trivial algorithm can be found that always has one more loop runs than every other possible algorithm.
so it depends on the machines possible states which you could naively translate by "its ram".
so the question now is: whats the longest possible loop on a machine that can be in X several states? and wikipedia gives the answer

Please read up on the Busy Beaver problem.
http://en.wikipedia.org/wiki/Busy_beaver

The largest possible finite value? As a mathematician, I find that ridiculous. Perhaps the problem could be explained better.

If we're talking about language limitation, well, some languages define arbitrary-precision integers (like Python and Common Lisp), so you could count up to any number you liked as far as the language goes. You could easily set it too large for any actual machine, but that's not a language limitation.
As far as counting on actual machines go, it's a matter of the number of possible states. For each bit you can allocate as a counter (and it doesn't have to be as one data element, since it's real easy to make an arbitrary-length counter), that's two states, so if you had 8G of memory available you could count to something like 2^8G with it. You could of course use the file system for more counter space.
Or, assuming you're not using physically reversible computation, you could check the minimum energy necessary to flip a bit, divide the amount of energy available (like the total expected solar output or whatever), and get a limit.
There is a limit for Turing machines of specified complexity. It goes up pretty fast.
There's too many possible answers to provide one.

If "long" means time, the following finite loop should be a good bet:
for(unsigned long long i = 0; i < ULONG_LONG_MAX; ++i) sleep(UINT_MAX);
Okay this answer is not really serious, but actually my opinion is that the question is totally useless to ask in a job interview. Why? According to S.Lott's answer, this could be about the Busy Beaver problem which is almost 50 years old and is totally unknown because nobody could make use of it in a real job.

Related

What's the overhead of the different forms of parallelism in Julia v0.5?

As the title states, what is the overhead of the different forms of parallelism, at least in the current implementation of Julia (v0.5, in case the implementation changes drastically in the future)? I am looking for some "practical measures", some general heuristics or ballparks to keep in my head for when it can be useful. For example, it's pretty obvious that multiprocessing won't give you gains in a loop like:
addprocs(4)
#parallel (+) for i=1:4
rand()
end
doesn't give you performance gains because each process is only taking one random number, but is there general heuristic for knowing when it will be worthwhile? Also, what about a heuristic for threading. It's surely a lower overhead than multiprocessing, but for example, with 4 threads, for what N is it a good idea to multithread:
A = rand(4)
Base.#threads (+) for i = 1:N
A[i%4+1]
end
(I know there isn't a threaded reduction right now, but let's act like there is, or edit with a better example). Sure, I can benchmark every example, but some good rules to keep in mind would go a long way.
In more concrete terms: what are some good rules of thumb?
How many numbers do you need to be adding/multiplying before threading gives performance enhancements, or before multiprocessing gives performance enhancements?
How much does the depend on Julia's current implementation?
How much does it depend on the number of threads/processes?
How much does the depend on the architecture? Are there good rules for knowing when the threshold should be higher/lower on a particular system?
What kinds of applications violate these heuristics?
Again, I'm not looking for hard rules, just general guidelines to guide development.
A few caveats: 1. I'm speaking from experience with version 0.4.6, (and prior), haven't played with 0.5 yet (but, as I hope my answer below demonstrates, I don't think this is essential vis-a-vis the response I give). 2. this isn't a fully comprehensive answer.
Nevertheless, from my experience, the overhead for multiple processes itself is very small provided that you aren't dealing with data movement issues. In other words, in my experience, any time that you ever find yourself in a situation of wishing something were faster than a single process on your CPU can manage, you're well past the point where parallelism will be beneficial. For instance, in the sum of random numbers example that you gave, I found through testing just now that the break-even point was somewhere around 10,000 random numbers. Anything more and parallelism was the clear winner. Generating 10,000 random number is trivial for modern computers, taking a tiny fraction of a second, and is well below the threshold where I'd start getting frustrated by the slowness of my scripts and want parallelism to speed them up.
Thus, I at least am of the opinion, that although there are probably even more wonderful things that the Julia developers could do to cut down on the overhead even more, at this point, anything pertinent to Julia isn't going to be so much of your limiting factor, at least in terms of the computation aspects of parallelism. I think that there are still improvements to be made in terms of enhancing both the ease and the efficiency of parallel data movement (I like the package that you've started on that topic as a good step. You and I would probably both agree there's still a ways more to go). But, the big limiting factors will be:
How much data do you need to be moving around between processes?
How much read/write to your memory do you need to be doing during your computations? (e.g. flops per read/write)
Aspect 1. might at times lean against using parallelism. Aspect 2. is more likely just to mean that you won't get so much benefit from it. And, at least as I interpret "overhead," neither of these really fall so directly into that specific consideration. And, both of these are, I believe, going to be far more heavily determined by your system hardware than by Julia.

Are limitations of CPU speed and memory prevent us from creating AI systems?

Many technology optimists say that in 15 years the speed of computers will be comparable with the speed of the human brain. This is why they believe that computers will achieve the same level of intelligence as humans.
If Moore's law holds, then every 18 months we should expect doubling of CPU speed. 15 years is 180 months. So, we will have the doubling 10 times. Which means that in 15 years computer will be 1024 times faster than they are now.
But is the speed the reason of the problem? If it is so, we would be able to build an AI system NOW, it would just 1024 times slower than in 15 years. Which means that to answer a question it will need 1024 second (17 minutes) instead of acceptable 1 second. But do we have now strong (but slow) AI system? I think no. Even if now (2015) we give to a system 1 hour instead of 17 minutes, or 1 day, or 1 month or even 1 year, it still will be unable to answer complex questions formulated in natural language. So, it is not the speed that causes problems.
It means that in 15 years our intelligence will not be 1024 faster than now (because we have no intelligence). Instead our "stupidity" will be 1024 times faster than now.
We need both faster hardware and better algorithms. Of course speed alone is not enough as you pointed out.
We need self-modifying meta-learning algorithms capable of creating hypotheses and performing experiments to verify them (like humans do). Systems that are learning to learn and self-improving. Algorithms that can prove that given self-modification is optimal in certain sense and will lead to even better self-modifications in the future. Systems that can reflect on and inspect their own software (can you call it consciousness ?). Such research is being done and may create superhuman intelligence in the future or even technological singularity as some believe.
There is one problem with this approach, though. People doing this research usually assume that consciousness is computable. That it is all about intelligence. They don't take into account experiences like pleasure and pain which have nothing to do (in my opinion) with computation nor intellect. You can understand pain through experience only (not intellectual speculation). Setting variable pleasure to 5 or behaving like one feels pleasure is very different from experiencing pleasure. Some people say that feelings originate in brain so it is enough to understand brain. Not necessarily. Child can ask: "How did they put small people inside TV box ?". Of course TV is just a receiver and there are no small people inside. Brain might be receiver too. Do we need higher knowledge for feelings and other experiences ?
The answer has to be answered in the context of computation and complexity.
Every algorithm has its own complexity and running time (See Big O notation). There are problems which are non-computable problems such as the halting problem. These problems are proven that an algorithm does not exists independent of the hardware.
Computable algorithms are described in the number of steps required with respect to the input to solve an algorithm. As the number of input increases, the execution time of the algorithm also increases. However, these algorithms can be categorized into two: exponential time algorithms and non-exponential time algorithms. Exponential time algorithms increases drastically with the number of input and becomes intractable.
These executing time of these problems can be improved with better hardware however the complexity will always be the same. This means that no matter what the CPU uses, the execution time will always require the same number of steps. This means that the hardware is important to provide an answer in less time but the hardness of the problem will always remain the same. Thus, the limitation of the hardware is not preventing us from creating an AI system. For instance, you can use parallel programming (ex: GPU) to improve the execution time of the algorithm drastically but the algorithm is still the same as a normal CPU algorithm.
I would say no. As you showed, speed is not the only factor of intelligence. I for one would think Language is, yes language. Language is the primary skill we learn as humans, so why not for computers? Language gives an understanding that can be understood across the globe, given you know that language. Humans use nonverbal and verbal language to communicate. But I honestly think it really works something like this:
Humans go through experiences. These experiences have a bigger impact on our lives the closer we are to our birth date, or the more emotional they are. For example, the first time we are told no means ALOT more to us as an infant than as a 70 year old adult. These get stored as either long term or short term memory and correlated to that event later on in life for reference. We mainly store events to learn from them to prevent negative experience or promote positive experiences.
Think of it as a tag cloud. The more often you do task A, the bigger the cloud is in memory. We then store crucial details such as type of emotion, location, smells etc. Now when we reference them again from memory we pick out those details and create a logical sentence:
Touching that stove hurt me when I was at grandma's house.
All of the bolded words would have to be stored to have a complete memory.
Now inside of this sentence we have learned a lot more things than just being hurt from the stove at grandma's house. We have learned that stove's can be hot, dangerous, and grandma allows it to be in her house. We also learned how long it takes to heal from such an event, emotionally and physically to gauge how important the event is. And so much more. So we also store this sub-event information inside of other knowledge bubbles. And these bubbles continue to grow exponentially.
Now when asked: Are stoves dangerous?
You can identify the words in the sentence:
are, stoves, dangerous, question
and reference the definition of dangerous as: hurt, bad
and then provide more evidence that this is true, such as personal experience to result in:
Yes, stoves are dangerous because I was hurt at grandmas house by one.
So intelligence seems to be a mix of events, correlation and data retrieval to solve some solution. I'm sure there's a lot more to it than that but this is just my understanding of intelligence.

Software to Tune/Calibrate Properties for Heuristic Algorithms

Today I read that there is a software called WinCalibra (scroll a bit down) which can take a text file with properties as input.
This program can then optimize the input properties based on the output values of your algorithm. See this paper or the user documentation for more information (see link above; sadly doc is a zipped exe).
Do you know other software which can do the same which runs under Linux? (preferable Open Source)
EDIT: Since I need this for a java application: should I invest my research in java libraries like gaul or watchmaker? The problem is that I don't want to roll out my own solution nor I have time to do so. Do you have pointers to an out-of-the-box applications like Calibra? (internet searches weren't successfull; I only found libraries)
I decided to give away the bounty (otherwise no one would have a benefit) although I didn't found a satisfactory solution :-( (out-of-the-box application)
Some kind of (Metropolis algorithm-like) probability selected random walk is a possibility in this instance. Perhaps with simulated annealing to improve the final selection. Though the timing parameters you've supplied are not optimal for getting a really great result this way.
It works like this:
You start at some point. Use your existing data to pick one that look promising (like the highest value you've got). Set o to the output value at this point.
You propose a randomly selected step in the input space, assign the output value there to n.
Accept the step (that is update the working position) if 1) n>o or 2) the new value is lower, but a random number on [0,1) is less than f(n/o) for some monotonically increasing f() with range and domain on [0,1).
Repeat steps 2 and 3 as long as you can afford, collecting statistics at each step.
Finally compute the result. In your case an average of all points is probably sufficient.
Important frill: This approach has trouble if the space has many local maxima with deep dips between them unless the step size is big enough to get past the dips; but big steps makes the whole thing slow to converge. To fix this you do two things:
Do simulated annealing (start with a large step size and gradually reduce it, thus allowing the walker to move between local maxima early on, but trapping it in one region later to accumulate precision results.
Use several (many if you can afford it) independent walkers so that they can get trapped in different local maxima. The more you use, and the bigger the difference in output values, the more likely you are to get the best maxima.
This is not necessary if you know that you only have one, big, broad, nicely behaved local extreme.
Finally, the selection of f(). You can just use f(x) = x, but you'll get optimal convergence if you use f(x) = exp(-(1/x)).
Again, you don't have enough time for a great many steps (though if you have multiple computers, you can run separate instances to get the multiple walkers effect, which will help), so you might be better off with some kind of deterministic approach. But that is not a subject I know enough about to offer any advice.
There are a lot of genetic algorithm based software that can do exactly that. Wrote a PHD about it a decade or two ago.
A google for Genetic Algorithms Linux shows a load of starting points.
Intrigued by the question, I did a bit of poking around, trying to get a better understanding of the nature of CALIBRA, its standing in academic circles and the existence of similar software of projects, in the Open Source and Linux world.
Please be kind (and, please, edit directly, or suggest editing) for the likely instances where my assertions are incomplete, inexact and even flat-out incorrect. While working in related fields, I'm by no mean an Operational Research (OR) authority!
[Algorithm] Parameter tuning problem is a relatively well defined problem, typically framed as one of a solution search problem whereby, the combination of all possible parameter values constitute a solution space and the parameter tuning logic's aim is to "navigate" [portions of] this space in search of an optimal (or locally optimal) set of parameters.
The optimality of a given solution is measured in various ways and such metrics help direct the search. In the case of the Parameter Tuning problem, the validity of a given solution is measured, directly or through a function, from the output of the algorithm [i.e. the algorithm being tuned not the algorithm of the tuning logic!].
Framed as a search problem, the discipline of Algorithm Parameter Tuning doesn't differ significantly from other other Solution Search problems where the solution space is defined by something else than the parameters to a given algorithm. But because it works on algorithms which are in themselves solutions of sorts, this discipline is sometimes referred as Metaheuristics or Metasearch. (A metaheuristics approach can be applied to various algorihms)
Certainly there are many specific features of the parameter tuning problem as compared to the other optimization applications but with regard to the solution searching per-se, the approaches and problems are generally the same.
Indeed, while well defined, the search problem is generally still broadly unsolved, and is the object of active research in very many different directions, for many different domains. Various approaches offer mixed success depending on the specific conditions and requirements of the domain, and this vibrant and diverse mix of academic research and practical applications is a common trait to Metaheuristics and to Optimization at large.
So... back to CALIBRA...
From its own authors' admission, Calibra has several limitations
Limit of 5 parameters, maximum
Requirement of a range of values for [some of ?] the parameters
Works better when the parameters are relatively independent (but... wait, when that is the case, isn't the whole search problem much easier ;-) )
CALIBRA is based on a combination of approaches, which are repeated in a sequence. A mix of guided search and local optimization.
The paper where CALIBRA was presented is dated 2006. Since then, there's been relatively few references to this paper and to CALIBRA at large. Its two authors have since published several other papers in various disciplines related to Operational Research (OR).
This may be indicative that CALIBRA hasn't been perceived as a breakthrough.
State of the art in that area ("parameter tuning", "algorithm configuration") is the SPOT package in R. You can connect external fitness functions using a language of your choice. It is really powerful.
I am working on adapters for e.g. C++ and Java that simplify the experimental setup, which requires some getting used to in SPOT. The project goes under name InPUT, and a first version of the tuning part will be up soon.

What's the opposite of "embarrassingly parallel"?

According to Wikipedia, an "embarrassingly parallel" problem is one for which little or no effort is required to separate the problem into a number of parallel tasks. Raytracing is often cited as an example because each ray can, in principle, be processed in parallel.
Obviously, some problems are much harder to parallelize. Some may even be impossible. I'm wondering what terms are used and what the standard examples are for these harder cases.
Can I propose "Annoyingly Sequential" as a possible name?
Inherently sequential.
Example: The number of women will not reduce the length of pregnancy.
There's more than one opposite of an "embarrassingly parallel" problem.
Perfectly sequential
One opposite is a non-parallelizable problem, that is, a problem for which no speedup may be achieved by utilizing more than one processor. Several suggestions were already posted, but I'd propose yet another name: a perfectly sequential problem.
Examples: I/O-bound problems, "calculate f1000000(x0)" type of problems, calculating certain cryptographic hash functions.
Communication-intensive
Another opposite is a parallelizable problem which requires a lot of parallel communication (a communication-intensive problem). An implementation of such a problem will scale properly only on a supercomputer with high-bandwidth, low-latency interconnect. Contrast this with embarrassingly parallel problems, implementations of which run efficiently even on systems with very poor interconnect (e.g. farms).
Notable example of a communication-intensive problem: solving A x = b where A is a large, dense matrix. As a matter of fact, an implementation of the problem is used to compile the TOP500 ranking. It's a good benchmark, as it emphasizes both the computational power of individual CPUs and the quality of interconnect (due to intensity of communication).
In more practical terms, any mathematical model which solves a system of partial differential equations on a regular grid using discrete time stepping (think: weather forecasting, in silico crash tests), is parallelizable by domain decomposition. That means, each CPU takes care of a part of the grid, and at the end of each time step the CPUs exchange their results on region boundaries with "neighbour" CPUs. These exchanges render this class of problems communication-intensive.
Im having a hard time to not post this... cause I know it don't add anything to the discussion.. but for all southpark fans out there
"Super serial!"
"Stubbornly serial"?
The opposite of embarassingly parallel is Amdahl's Law, which says that some tasks cannot be parallel, and that the minimum time a perfectly parallel task will require is dictated by the purely sequential portion of that task.
"standard examples" of sequential processes:
making a baby: “Crash programs fail because they are based on theory that, with nine women pregnant, you can get a baby a month.” -- attributed to Werner von Braun
calculating pi, e, sqrt(2), and other irrational numbers to millions of digits: most algorithms sequential
navigation: to get from point A to point Z, you must first go through some intermediate points B, C, D, etc.
Newton's method: you need each approximation in order to calculate the next, better approximation
challenge-response authentication
key strengthening
hash chain
Hashcash
P-complete (but that's not known for sure yet).
I use "Humiliatingly Sequential"
Paul
If ever one should speculate what it would be like to have natural, incorrigibly sequential problems, try
blissfully sequential
to counter 'embarrassingly parallel'.
"Gladdengly Sequential"
It all has to do with data dependencies. Embarrassingly parallel problems are ones for which the solution is made up of many independent parts. Problems with the opposite of this nature would be ones that have massive data dependencies, where there is little to nothing that can be done in parallel. Degeneratively dependent?
The term I've heard most often is "tightly-coupled", in that each process must interact and communicate often in order to share intermediate data. Basically, each process depends on others to complete their computation.
For example, matrix processing often involves sharing boundary values at the edges of each array partition.
This is in contrast to embarassingly parallel (or loosely-coupled) problems where each part of the problem is completely self-contained, and no (or very little) IPC is needed. Think master/worker parallelism.
Boastfully sequential.
I've always preferred 'sadly sequential' ala the partition step in quicksort.
"Completely serial?"
It shouldn't really surprise you that scientists think more about what can be done than what cannot be done. Especially in this case, where the alternative to parallelizing is doing everything as one normally would.
Completely non-parallelizable?
Pessimally parallelizable?
The opposite is "disconcertingly serial".
taking into acount that parallelism is the act of doing many jobs in the same time step t. the opposite could be time-sequential problems
An example inherently sequential problem.
This is common in CAD packages and some kinds of engineering analysis.
Tree traversal with data dependencies between nodes.
Imagine traversing a graph and adding up weights of nodes.
You just can't parallelise it.
CAD software represents parts as a tree, and to render to object you have to traverse the tree.
For this reason, cad workstations use less cores and faster, rather than many cores.
Thanks for reading.
You could of course, however I think that both 'names' are a non-issue.
From a functional programming perspective you could say that the 'annoyingly sequential' part is the smallest more or less independent part of an algorithm.
While the 'embarrassingly parallel' if not indeed taking into a parallel approach is bad coding practice.
Thus I don't see a point in given these things a name if good coding practice is always to brake up your solution into independent pieces, even if you at that moment don't take advantage of parallelism.

What would programming languages look like if every computable thing could be done in 1 second?

Inspired by this question
Suppose we had a magical Turing Machine with infinite memory, and unlimited CPU power.
Use your imagination as to how this might be possible, e.g. it uses some sort of hyperspace continuum to automatically parallelize anything as much as is desired, so that it could calculate the answer to any computable question, no matter what it's time complexity is and number of actual "logical steps", in one second.
However, it can only answer computable questions in one second... so I'm not positing an "impossible" machine (at least I don't think so)... For example, this machine still wouldn't be able to solve the halting problem.
What would the programming language for such a machine look like? All programming languages I know about currently have to make some concessions to "algorithmic complexity"... with that constraint removed though, I would expect that all we would care about would be the "expressiveness" of the programming language. i.e. its ability to concisely express "computable questions"...
Anyway, in the interests of a hopefully interesting discussion, opening it up as community wiki...
SendMessage travelingSalesman "Just buy a ticket to the same city twice already. You'll spend much more money trying to solve this than you'll save by visiting Austin twice."
SendMessage travelingSalesman "Wait, they built what kind of computer? Nevermind."
This is not really logical. If a thing takes O(1) time, then doing n times will take O(n) time, even on a quantum computer. It is impossible that "everything" takes O(1) time.
For example: Grover's algorithm, the one mentioned in the accepted answer to the question you linked to, takes O(n^1/2) time to find an element in a database of n items. And thats not O(1).
The amount of memory or the speed of the memory or the speed of the processor doesn't define the time and space complexity of an algorithm. Basic mathematics do that. Asking what would programming languages look like if everything could be computed in O(1) is like asking how would our calculators look like if pi was 3 and the results of all square roots are integers. It's really impossible and if it isn't, it's not likely to be very useful.
Now, asking ourself what we would do with infinite process power and infinite memory could be a useful exercise. We'll still have to deal with complexity of algorithms but we'd probably work somehow differently. For that I recommend The Hundred-Year Language.
Note that even if the halting problem is not computable, "does this halt within N steps on all possible inputs of size smaller than M" is!
As such any programming language would become purely specification. All you need to do is accurately specify the pre and post conditions of a function and the compiler could implement the fastest possible code which implements your spec.
Also, this would trigger a singularity very quickly. Constructing an AI would be a lot easier if you could do near infinite computation -- and once you had one, of any efficiency, it could ask the computable question "How would I improve my program if I spent a billion years thinking about it?"...
It could possibly be a haskell-ish language. Honestly it's a dream to code in. You program the "laws" of your types, classes, and functions and then let them loose. It's incredibly fun, powerful, and you can write some very succinct and elegant code. It's like an art.
Maybe it would look more like pseudo-code than "real" code. After all, you don't have to worry about any implementation details any more because whichever way you go, it'll be sufficiently fast enough.
Scalability would not be an issue any longer. We'd have AIs way smarter than us.
We wouldn't need to program any longer and instead the AI would figure out our intentions before we realize them ourselves.
SQL is such a language - you ask for some piece of data and you get it. If you didn't have to worry about minute implementation details of the db this might even be fun to program in.
Your underestimate the O(1). It means that there exists a constant C>0 such that time to compute a problem is limited to this C.
What you ignore is that the actual value of C can be large and it can (and mostly is) different for different algorithms. You may have two algorithms (or computers - doesn't matter) both with O(1) but in one this C may be billion times bigger that in another - then the latter will be much slower and perhaps very slow in terms of time.
If it will all be done in one second, then most languages will eventually look like this, I call it DWIM theory (Do what I mean theory):
Just do what I said (without any bugs this time)
Because if we ever develop a machine that can compute everything in one second, then we will probably have mind control at that stage, and at the very least artificial intelligence.
I don't know what new languages would come up (I'm a physicist, not a computer scientist) but I'd still write my programs for it in Python.

Resources