Do limitations of CPU speed and memory prevent us from creating AI systems? - nlp

Many technology optimists say that in 15 years the speed of computers will be comparable to the speed of the human brain. Because of this, they believe that computers will achieve the same level of intelligence as humans.
If Moore's law holds, we should expect CPU speed to double every 18 months. 15 years is 180 months, so the speed will double 10 times, which means that in 15 years computers will be 1024 times faster than they are now.
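A throwaway Python check of that arithmetic, just restating the numbers above:
months = 15 * 12            # 15 years
doublings = months // 18    # one doubling every 18 months under Moore's law
print(doublings, 2 ** doublings)   # 10 doublings -> a factor of 1024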
But is speed really the root of the problem? If it were, we would be able to build an AI system NOW; it would just be 1024 times slower than in 15 years. That means that to answer a question it would need 1024 seconds (about 17 minutes) instead of an acceptable 1 second. But do we have a strong (but slow) AI system now? I think not. Even if today (2015) we give a system 1 hour instead of 17 minutes, or 1 day, or 1 month, or even 1 year, it still will be unable to answer complex questions formulated in natural language. So it is not speed that causes the problem.
It means that in 15 years our artificial intelligence will not be 1024 times faster than now (because we have no artificial intelligence). Instead, our artificial "stupidity" will be 1024 times faster than now.

We need both faster hardware and better algorithms. Of course, speed alone is not enough, as you pointed out.
We need self-modifying meta-learning algorithms capable of creating hypotheses and performing experiments to verify them (like humans do). Systems that are learning to learn and self-improving. Algorithms that can prove that a given self-modification is optimal in a certain sense and will lead to even better self-modifications in the future. Systems that can reflect on and inspect their own software (can you call it consciousness?). Such research is being done and may create superhuman intelligence in the future, or even a technological singularity, as some believe.
There is one problem with this approach, though. People doing this research usually assume that consciousness is computable. That it is all about intelligence. They don't take into account experiences like pleasure and pain, which have nothing to do (in my opinion) with computation or intellect. You can understand pain through experience only (not intellectual speculation). Setting a variable pleasure to 5, or behaving as if one feels pleasure, is very different from experiencing pleasure. Some people say that feelings originate in the brain, so it is enough to understand the brain. Not necessarily. A child can ask: "How did they put small people inside the TV box?" Of course the TV is just a receiver and there are no small people inside. The brain might be a receiver too. Do we need higher knowledge for feelings and other experiences?

The question has to be answered in the context of computation and complexity.
Every algorithm has its own complexity and running time (see Big O notation). There are also non-computable problems, such as the halting problem: for these it has been proven that no algorithm exists, independent of the hardware.
Computable problems are described by the number of steps an algorithm requires relative to the size of its input. As the input size increases, the execution time of the algorithm also increases. These algorithms can be categorized into two groups: exponential-time and non-exponential-time algorithms. The running time of an exponential-time algorithm increases drastically with the input size and quickly becomes intractable.
The execution time of these problems can be improved with better hardware, but the complexity will always stay the same. This means that no matter what CPU is used, the algorithm will always require the same number of steps. Hardware is important for providing an answer in less time, but the hardness of the problem remains the same. Thus, the limitation of the hardware is not what prevents us from creating an AI system. For instance, you can use parallel programming (e.g. a GPU) to improve the execution time drastically, but the algorithm is still the same as the normal CPU algorithm.
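To make that concrete, here is a small, self-contained Python illustration (the step cost and time budget are made-up assumptions): a hypothetical 1024x faster machine lets a polynomial-time algorithm handle a much larger input, but buys an exponential-time algorithm only about 10 more input items.
import math

STEP_NS = 1e-9      # assumed cost of one elementary step (1 ns)
BUDGET_S = 3600.0   # assumed time budget: one hour
SPEEDUP = 1024      # the hypothetical "15 years of Moore's law" factor

for speed, label in [(1, "today"), (SPEEDUP, "1024x faster hardware")]:
    steps = speed * BUDGET_S / STEP_NS            # steps affordable within the budget
    n_poly = math.isqrt(int(steps))               # largest n with n^2 <= steps
    n_expo = int(math.log2(steps))                # largest n with 2^n <= steps
    print(f"{label}: O(n^2) reaches n ~ {n_poly:,}, O(2^n) reaches n ~ {n_expo}")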

I would say no. As you showed, speed is not the only factor of intelligence. I for one would say that factor is language, yes, language. Language is the primary skill we learn as humans, so why not for computers? Language gives an understanding that can be shared across the globe, given you know that language. Humans use nonverbal and verbal language to communicate. But I honestly think it really works something like this:
Humans go through experiences. These experiences have a bigger impact on our lives the closer we are to our birth date, or the more emotional they are. For example, the first time we are told no means a lot more to us as an infant than as a 70-year-old adult. These get stored as either long-term or short-term memory and correlated to that event later on in life for reference. We mainly store events to learn from them, to prevent negative experiences or promote positive ones.
Think of it as a tag cloud. The more often you do task A, the bigger the cloud is in memory. We then store crucial details such as type of emotion, location, smells etc. Now when we reference them again from memory we pick out those details and create a logical sentence:
Touching that stove hurt me when I was at grandma's house.
All of the key words in that sentence would have to be stored to have a complete memory.
Now inside of this sentence we have learned a lot more than just being hurt by the stove at grandma's house. We have learned that stoves can be hot, dangerous, and that grandma allows one to be in her house. We also learned how long it takes to heal from such an event, emotionally and physically, to gauge how important the event is. And so much more. So we also store this sub-event information inside of other knowledge bubbles. And these bubbles continue to grow exponentially.
Now when asked: Are stoves dangerous?
You can identify the words in the sentence:
are, stoves, dangerous, question
and reference the definition of dangerous as: hurt, bad
and then provide more evidence that this is true, such as personal experience to result in:
Yes, stoves are dangerous because I was hurt at grandma's house by one.
So intelligence seems to be a mix of events, correlation and data retrieval to solve some solution. I'm sure there's a lot more to it than that but this is just my understanding of intelligence.
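For illustration only, here is a toy Python sketch of the "tag cloud" retrieval idea described above; all names, tags, and the "dangerous"/pain correlation are made-up assumptions, not a model of how memory actually works:
from dataclasses import dataclass, field

@dataclass
class Memory:
    event: str
    tags: set = field(default_factory=set)
    weight: int = 1                      # grows the more often the experience recurs

memories = [
    Memory("touching the stove hurt me at grandma's house",
           {"stove", "hurt", "hot", "grandma", "house", "pain"}),
    Memory("grandma's soup smelled wonderful",
           {"grandma", "house", "smell", "pleasant"}),
]

def answer(question: str) -> str:
    words = set(question.lower().rstrip("?").split())
    if "dangerous" in words:             # correlate "dangerous" with stored pain tags
        words |= {"hurt", "pain"}
    best = max(memories, key=lambda m: len(words & m.tags) * m.weight)
    if words & best.tags:
        return f"Yes, because {best.event}."
    return "I don't know."

print(answer("Are stoves dangerous?"))   # -> Yes, because touching the stove hurt me ...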


What's the overhead of the different forms of parallelism in Julia v0.5?

As the title states, what is the overhead of the different forms of parallelism, at least in the current implementation of Julia (v0.5, in case the implementation changes drastically in the future)? I am looking for some "practical measures", some general heuristics or ballparks to keep in my head for when it can be useful. For example, it's pretty obvious that multiprocessing won't give you gains in a loop like:
addprocs(4)
@parallel (+) for i=1:4
rand()
end
because each process is only taking one random number. But is there a general heuristic for knowing when it will be worthwhile? Also, what about a heuristic for threading? It's surely lower overhead than multiprocessing, but for example, with 4 threads, for what N is it a good idea to multithread:
A = rand(4)
Base.@threads (+) for i = 1:N
A[i%4+1]
end
(I know there isn't a threaded reduction right now, but let's act like there is, or edit with a better example). Sure, I can benchmark every example, but some good rules to keep in mind would go a long way.
In more concrete terms: what are some good rules of thumb?
How many numbers do you need to be adding/multiplying before threading gives performance enhancements, or before multiprocessing gives performance enhancements?
How much does this depend on Julia's current implementation?
How much does it depend on the number of threads/processes?
How much does it depend on the architecture? Are there good rules for knowing when the threshold should be higher/lower on a particular system?
What kinds of applications violate these heuristics?
Again, I'm not looking for hard rules, just general guidelines to guide development.
A few caveats: 1. I'm speaking from experience with version 0.4.6 (and prior); I haven't played with 0.5 yet (but, as I hope my answer below demonstrates, I don't think this is essential vis-a-vis the response I give). 2. This isn't a fully comprehensive answer.
Nevertheless, from my experience, the overhead for multiple processes itself is very small provided that you aren't dealing with data movement issues. In other words, in my experience, any time that you find yourself wishing something were faster than a single process on your CPU can manage, you're well past the point where parallelism will be beneficial. For instance, in the sum of random numbers example that you gave, I found through testing just now that the break-even point was somewhere around 10,000 random numbers. Anything more and parallelism was the clear winner. Generating 10,000 random numbers is trivial for modern computers, taking a tiny fraction of a second, and is well below the threshold where I'd start getting frustrated by the slowness of my scripts and want parallelism to speed them up.
Thus, I at least am of the opinion that although there are probably even more wonderful things that the Julia developers could do to cut down on the overhead even more, at this point anything pertinent to Julia isn't going to be so much of your limiting factor, at least in terms of the computation aspects of parallelism. I think that there are still improvements to be made in terms of enhancing both the ease and the efficiency of parallel data movement (I like the package that you've started on that topic as a good step. You and I would probably both agree there's still a ways more to go). But the big limiting factors will be:
How much data do you need to be moving around between processes?
How much read/write to your memory do you need to be doing during your computations? (e.g. flops per read/write)
Aspect 1. might at times lean against using parallelism. Aspect 2. is more likely just to mean that you won't get so much benefit from it. And, at least as I interpret "overhead," neither of these really fall so directly into that specific consideration. And, both of these are, I believe, going to be far more heavily determined by your system hardware than by Julia.
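To put rough numbers on that intuition, here is a toy cost model in Python (not Julia-specific, and every constant is a made-up assumption): parallel time is modelled as per-worker overhead plus the work split across P workers plus data movement, and the break-even n is where that beats the serial time.
def serial_time(n, t_item):
    return n * t_item

def parallel_time(n, t_item, p, overhead, bytes_per_item, bandwidth):
    move = n * bytes_per_item / bandwidth        # cost of shipping data to workers
    return overhead * p + n * t_item / p + move

def break_even(t_item, p, overhead, bytes_per_item=0.0, bandwidth=1.0):
    """Smallest n (to within a factor of 2) where the parallel version wins."""
    n = 1
    while parallel_time(n, t_item, p, overhead, bytes_per_item, bandwidth) >= serial_time(n, t_item):
        n *= 2
        if n > 10 ** 12:
            return None                          # under this model it never pays off
    return n

# e.g. 4 workers, 1 us of work per item, 100 us of per-worker overhead, no data movement
print(break_even(t_item=1e-6, p=4, overhead=1e-4))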

Write the longest possible loop

Recently I was asked this question in a technical discussion. What is the longest possible loop that can be written in computational science, considering the machine/architecture on which it is to run? This loop has to be as long as possible, yet not an infinite loop, and should not end up crashing the program (through recursion etc.).
I honestly did not know how to attack this problem, so I asked him if it is practically possible. He said that using some computer science concepts, you can arrive at a hypothetical number which may not be practical, but it will nevertheless still not be infinite.
Does anyone here know how to analyse/attack this problem?
P.S. Choosing some highest limit for a type that can store the highest numerical value is apparently not an answer.
Thanks in advance,
You are getting into the field of Turing machines.
Simply put (let's stay in the deterministic fields...), your computer/machine can be in a finite number of states that are passed through during the algorithm. Every state is unique and only occurs once; otherwise you would, by definition, have an endless loop, like a "goto".
We could remove that limitation, but it would not make much sense, because then a trivial algorithm can always be found that has one more loop run than every other possible algorithm.
So it depends on the machine's possible states, which you could naively translate as "its RAM".
So the question now is: what is the longest possible loop on a machine that can be in X distinct states? Wikipedia gives the answer:
Please read up on the Busy Beaver problem.
http://en.wikipedia.org/wiki/Busy_beaver
The largest possible finite value? As a mathematician, I find that ridiculous. Perhaps the problem could be explained better.
If we're talking about language limitation, well, some languages define arbitrary-precision integers (like Python and Common Lisp), so you could count up to any number you liked as far as the language goes. You could easily set it too large for any actual machine, but that's not a language limitation.
As far as counting on actual machines goes, it's a matter of the number of possible states. For each bit you can allocate as a counter (and it doesn't have to be one data element, since it's really easy to make an arbitrary-length counter), that's two states, so if you had 8G of memory available you could count to something like 2^(8G) with it. You could of course use the file system for more counter space.
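A quick sketch of such an arbitrary-length counter in Python (sizes rounded up to whole bytes): a loop driven by n bits of memory steps through 2^n distinct states before it must repeat one.
def count_states(n_bits):
    counter = bytearray((n_bits + 7) // 8)       # our "RAM", all zeros
    steps = 0
    while True:
        steps += 1
        for i in range(len(counter)):            # increment with carry propagation
            counter[i] = (counter[i] + 1) & 0xFF
            if counter[i] != 0:
                break                            # no carry left, stop
        else:
            return steps                         # wrapped back to all zeros

print(count_states(16))   # 65536 == 2**16; 8 GB of RAM would give 2**(2**36) states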
Or, assuming you're not using physically reversible computation, you could take the minimum energy necessary to flip a bit, divide the amount of energy available (like the total expected solar output or whatever) by it, and get a limit.
There is a limit for Turing machines of specified complexity. It goes up pretty fast.
There's too many possible answers to provide one.
If "long" means time, the following finite loop should be a good bet:
for (unsigned long long i = 0; i < ULLONG_MAX; ++i) sleep(UINT_MAX);
Okay this answer is not really serious, but actually my opinion is that the question is totally useless to ask in a job interview. Why? According to S.Lott's answer, this could be about the Busy Beaver problem which is almost 50 years old and is totally unknown because nobody could make use of it in a real job.

Cobol: science and fiction [closed]

There are a few threads about the relevance of the Cobol programming language on this forum, e.g. this thread links to a collection of them. What I am interested in here is a frequently repeated claim based on a study by Gartner from 1997: that there were around 200 billion lines of code in active use at that time!
I would like to ask some questions to verify or falsify a couple of related points. My goal is to understand if this statement has any truth to it or if it is totally unrealistic.
I apologize in advance for being a little verbose in presenting my line of thought and my own opinion on the things I am not sure about, but I think it might help to put things in context and thus highlight any wrong assumptions and conclusions I have made.
Sometimes, the "200 billion lines" number is accompanied by the added claim that this corresponded to 80% of all programming code in any language in active use. Other times, the 80% merely refer to so-called "business code" (or some other vague phrase hinting that the reader is not to count mainstream software, embedded systems or anything else where Cobol is practically non-existent). In the following I assume that the code does not include double-counting of multiple installations of the same software (since that is cheating!).
In particular, around the time of the y2k problem, it was noted that a lot of Cobol code was already 20 to 30 years old. That would mean it was written in the late '60s and '70s. At that time, the market leader was IBM with the IBM/370 mainframe. IBM has put up a historical announcement on its website quoting prices, configuration and availability. According to the sheet, prices were about one million dollars for machines with up to half a megabyte of memory.
Question 1: How many mainframes have actually been sold?
I have not found any numbers for those times; the latest numbers are for the year 2000, again by Gartner. :^(
I would guess that the actual number is in the hundreds or the low thousands per year; if the market size was 50 billion in 2000 and the market has grown exponentially like any other technology, it might have been merely a few billion back in 1970. Since the IBM/370 was sold for twenty years, twenty times a few thousand results in a couple of tens of thousands of machines (and that is pretty optimistic)!
Question 2: How large were the programs in lines of code?
I don't know how many bytes of machine code result from one line of source code on that architecture. But since the IBM/370 was a 32-bit machine, any address access must have used 4 bytes plus instruction (2, maybe 3 bytes for that?). If you count in operating system and data for the program, how many lines of code would have fit into the main memory of half a megabyte?
Question 3: Was there no standard software?
Did every single machine sold run a unique hand-coded system without any standard software? Seriously, even if every machine was programmed from scratch without any reuse of legacy code (wait ... didn't that violate one of the claims we started from to begin with???) we might have O(50,000 l.o.c./machine) * O(20,000 machines) = O(1,000,000,000 l.o.c.).
That is still far, far, far away from 200 billion! Am I missing something obvious here?
Question 4: How many programmers did we need to write 200 billion lines of code?
I am really not sure about this one, but if we take an average of 10 l.o.c. per day, we would need 55 million man-years to achieve this! In the time-frame of 20 to 30 years this would mean that there must have existed two to three million programmers constantly writing, testing, debugging and documenting code. That would be about as many programmers as we have in China today, wouldn't it?
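A quick Python check of that arithmetic, using the same assumed figures (10 l.o.c. per day, working every day):
total_loc = 200e9                                 # the claimed 200 billion lines
loc_per_day = 10
man_years = total_loc / (loc_per_day * 365)
print(f"{man_years / 1e6:.0f} million man-years")                     # ~55 million
for span in (20, 30):
    print(f"over {span} years: {man_years / span / 1e6:.1f} million programmers")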
EDIT: Several people have brought up automatic templating systems/code generators or the like. Could somebody elaborate on this? I have two issues with that: a) I need to tell the system what it is supposed to do for me; for that I need to communicate with the computer and the computer will output the code. This is exactly what a compiler of a programming language does. So essentially I am using a different high-level programming language to generate my Cobol code. Shouldn't I work with that other high-level language instead of Cobol? Why the middle-man? b) In the 70s and 80s the most precious commodity was memory. So if I have a programming language output something, it had better be concise. Using my hypothetical meta-language, would I really generate verbose and repetitive Cobol code with it rather than bytecode/p-code like other compilers of that time did? END OF EDIT
Question 5: What about the competition?
So far, I have come up with two things here:
1) IBM had their own programming language, PL/I. Above I have assumed that the majority of code has been written exclusively using Cobol. However, all other things being equal I wonder if IBM marketing had really pushed their own development off the market in favor of Cobol on their machines. Was there really no relevant code base of PL/I?
2) Sometimes (also on this board in the thread quoted above) I come across the claim that the "200 billion lines of code" are simply invisible to anybody outside of "governments, banks ..." (and whatnot). Actually, the DoD had funded their own language in order to increase cost effectiveness and reduce the proliferation of programming languages. This led to their use of Ada. Would they really worry about having so many different programming languages if they had predominantly used Cobol? If there was any language running on "government and military" systems outside the perception of mainstream computing, wouldn't that language be Ada?
I hope someone can point out any flaws in my assumptions and/or conclusions and shed some light on whether the above claim has any truth to it or not.
On the surface, the numbers Gartner produces are akin to answering the question: How many angels can dance on the head of a pin?
Unless you obtain a full copy of their report (costing big bucks) you will never know how they came up
with or justified the 200 billion lines of COBOL claim. Having said that, Gartner is a well
respected information technology research and advisory
firm so I would think they would not have made such a claim without justification or
explanation.
It is amazing how this study has been quoted over the years. A Google search for "200 billion lines of COBOL"
got me about 19,500 hits. I sampled a bunch of them and most attribute the number directly to the 1997 Gartner report.
Clearly, this figure has captured the attention of many.
The method that you have taken to "debunk" the claim has a few problems:
1) How many mainframes have been sold? This is a big question in and of itself, probably just as difficult
as answering the 200 billion lines of code question. But more importantly, I don't see how determining the number of
mainframes could be used in constraining the number of lines of code running on them.
2) How large were the programs in lines of code? COBOL programs tend to be large. A modest program can
run to a few thousand lines, a big one into tens of thousands. One of the jokes COBOL programmers
often make is that only one COBOL program has ever been written, the rest are just modified
copies of it. As with many jokes there is a grain of truth in it. Most shops have a large program inventory
and a lot of those programs were built by cutting and pasting from each other. This really "fluffs" up the
size of your code base.
Your assumption that a program must fit into physical memory in order to run is wrong. The size problem
was solved in several different ways (e.g. program overlays, virtual memory etc.). It was not unusual in the
60's and 70's to run large programs on machines with tiny physical memories.
3) Was there no standard software? A lot of COBOL is written
from scratch or from templates. A number of financial packages were developed by software houses in the 70's and 80's.
Most of these
were distributed as source code libraries. The customer then copied and modified the source to
fit their particular business
requirement. More "fluffing" of the code base - particularly given that large segments of that code
were logically unexecutable once the package had been "configured".
4) How many programmers did we need to write 200 billion lines of code? Not as many as you might think!
Given that COBOL tends to be verbose and highly replicated, a programmer can have huge "productivity".
Program generating systems were in vogue during the 70's and early 80's.
I once worked with a product (now defunct fortunately) that let me write "business logic" and it
generated all of the "boiler plate" code around it - producing a fully functional COBOL program. The code
it generated was, to be polite, verbose in the extreme. I could produce a 15K line COBOL program from
about 200 lines of input! We are talking serious fluff here!
5) What about the competition? COBOL has never really had much serious competition in the
financial and insurance sectors. PL/1 was a major IBM initiative to produce the one programming language
that met every possible computing need. Like all such initiatives it was too ambitious and
has pretty much collapsed inward. I believe IBM still uses and supports it today. During the 70's
several other languages were predicted to replace COBOL - ALGOL, ADA and C come to mind, but
none have. Today I hear the same said of Java and .Net. I think the major reason COBOL is still with us is that it
does what it is supposed to do very well and
the huge multi-billion-line legacy code base makes moving to a "better" language both expensive and
risky from a business point of view.
Do I believe the 200 billion lines of code story? I find the number high but not impossibly high given
the number of ways COBOL code tends to "fluff" itself up over time.
I also think that getting too involved in analyzing these numbers quickly degrades into a
"counting angels" exercise - something people can get really wound up over but has no
significance in the real world.
EDIT
Let me address a few of the comments left below...
Never seen a COBOL program used by an investment bank. Quite possible. Depends
which application areas you were working in. Investment banks tend to have
large computing infrastructures and employ a wide range of technologies. The shop
I have been working in
for the past 20 years (although not an investment bank) is one of the largest in
the country and it has a significant
COBOL investment. It also has significant Java, C and C++ investments as
well as pockets of just about every other technology
known to man. I have also met some fairly senior applications developers here that
were completely unaware that COBOL was still in use. I did a
rough line count through our source control system and found around 70 million lines of
production COBOL. Quite a few people that have worked here for years are completely oblivious to it!
I am also aware that COBOL is rapidly declining as a language of choice, but the fact
is, there is still a lot of it around today. In 1997, the period to which this question
relates, I believe COBOL would have been the dominant language in terms of LOC. The
point of the question is: Could there have been 200 billion lines of it in 1997?
Counting mainframes. Even if one were able to obtain the number of mainframes it would
be difficult to assess the "compute" power they represented. Mainframes, like most
other computers, come in a wide range of configurations and processing capacity.
If we could say there were exactly "X" mainframes in use in 1997, you still need to
estimate the processing capacity they represented, then you need to figure out what
percentage of the work load was due to running COBOL programs and a bunch of other
confounding factors. I just don't see how this line of reasoning would ever
yield an answer with an acceptable margin of error.
Multi-counting lines of code. That was exactly my point when
referring to the COBOL "fluff" factor. Counting lines of COBOL can be a very misleading statistic
simply because a significant amount of it was never written by programmers in the
first place. Or if it was, quite a bit of it was done using the cut-paste-tinker
"design pattern".
Your observation that memory was a valuable commodity in 1997 and prior is true. One would think that
this would have lead to using the most efficient coding techniques and languages
available to maximize its use. However, there are other factors: The opportunity cost of having an application
backlog was often perceived to outweigh the cost of bringing in more memory/cpu to deal with less than
optimal code (which could be cranked out quite a bit faster). This thinking was further reinforced by the
observation that Moore's Law leads to ever
declining hardware costs whereas software development costs remain constant. The "logical" conclusion
was to crank out less than optimal code, suffer for a while, then reap the benefits
in the years to come (IMO, a lesson in bad planning and greed, still common today).
The push to deliver applications during the 70's through 90's led to the rise of a host of
code generators (today I see frameworks of various flavours fulfilling this role).
Many of these code generators emitted tons of COBOL code. Why emit COBOL code? Why not emit
assembler or p-code or something much more efficient? I believe the answer is
one of risk mitigation. Most code generators are proprietary pieces of software owned by some
third party who may or may not be in business or supporting their product 10 years from now.
It is a very hard sell if you can't provide an iron-clad guarantee that the generated application can be
supported into the distant future. The solution is to have the "generator" produce something
familiar - COBOL for example! Even if the "generator" dies, the resulting application can
be maintained by your existing staff of COBOL programmers. Problem solved ;) (today we see
open source used as a risk mitigation argument).
Returning to the LOC question. Counting lines of COBOL code is, in my opinion, open to
a wide margin of error or at least misinterpretation. Here are a few statistics from an application
I work on (quoted approximately). This
application was built with and is maintained using Basset Frame Technology (frame-work) so
we don't actually write COBOL but we generate COBOL from it.
Lines of COBOL: 7,000,000
Non-Procedure Division: 3,000,000
Procedure Division: 3,500,000
Comment/blank: 500,000
Non-expanded COPY directives: 40,000
COBOL verbs: 2,000,000
Programmer-written Procedure Division: 900,000
Application frame generated: 270,000
Corporate infrastructure frame generated: 2,330,000
As you can see, almost half of our COBOL programs are non-procedure Division code (data declaration
and the like). The ratio of LOC to Verbs (statement count) is about 7:2. Using our framework
leverages code production by about a factor of 7:1.
So what should you make of all this? I really don't know - except that there is a lot of room to
fluff up the COBOL line counts.
I have worked with other COBOL program generators in the past. Some of these had absolutely
stupid inflation factors (e.g. the 200 lines to 15K line fluffing mentioned earlier). Given all these
inflationary factors and the counting methodology used by Gartner, it may very well have
been possible to "fluff" up to 200 billion lines of COBOL in 1997 - but the question
remains: Of what real use is this number? What could it really mean? I have no idea. Now
let's get back to the counting angels problem!
I would never defend those clowns at Gartner, but still:
Your ideas about IBM/370s are wrong. The 370 was an architecture, not a specific machine - it was implemented on everything from water-cooled monsters to small, minicomputer-sized machines (the same size as a VAX). The number sold was thus probably far larger, by orders of magnitude, than you suspect.
Also, COBOL was heavily used on DEC's VAX lineup, and before that on the DEC-10 and DEC-20 lines. In the UK it was used on all ICL mainframes. Lots of other platforms also supported it.
[Usual disclaimer - I work for a COBOL vendor]
It's an interesting question and it's always difficult to get a provable number. On the number of COBOL programmers estimates - the 2 - 3 million number may not be orders of magnitude in error. I think there have been estimates of 1 or 2 million in the past. And each one of those can generate a lot of code in a 20 year career. In India tens of thousands of new COBOL programmers are added to the pool every year (perhaps every month!).
I think the automatically generated code is probably bigger than might be thought. For example PACBASE is a very popular application in the banking industry. A very large global bank I know of uses it extensively and they generate all their code into COBOL and estimate this generated code is 95% of their total code base with the other 5% being hand coded/maintained. I don't think this is uncommon. The maintenance and development of those systems is typically done at the model-level not the generated code as you say.
There is a group of applications that were missing from the original question - COBOL isn't only a mainframe language. The early years of Micro Focus were almost entirely spent in the OEM marketplace - we used to have something like 200 different OEMs (lots of long-gone names like DEC, Stratus, Bull, ...). Every OEM had to have a COBOL compiler on their box alongside C and Assembler. A lot of big applications were built at that time and are still going strong - think about the largest HR ERP systems, the largest mobile phone billing systems etc. My point is that there is a lot of COBOL code that was never on an IBM mainframe and is often overlooked.
And finally, the size of the code base may be larger in COBOL shops than the "average". That's not just because COBOL is verbose (or was - that's not been the case for a long time) but the systems are just bigger - they're in large organizations, doing a large number of disparate tasks. It's very common for sites to have 10's of millions of LoC.
I don't have figures, but my first "real" job was on IBM 370s.
First: Number sold. In 1974, a large railway ran on three 370s. These were big 370s, though, and you could get a small one for a whole lot less. (For perspective, at that time whether to get another megabyte was a decision on the VP level.)
Second: COBOL code is what you might call "fluffy", since a typical sentence (COBOLese for line) might be "ADD 1 TO MAIN-ACCOUNT OF CUSTOMER." There would be relatively few machine instructions per line. (This is especially true on the IBM 360 and onwards, which had machine instructions designed around executing COBOL.) BTW, addresses were 16 bits, four to designate a register (using the bottom 24 bits as a base address) and 12 as an offset. This meant that something under 64K could be addressed at a time, since not all of the 16 registers could be used as base registers for various reasons. Don't underestimate the ability of people to fit programs into small memory.
Also, don't underestimate the number of programs. The program library would be on disk and tape, and was essentially only limited by cost and volume. (Earlier on, they'd be on punch cards, which had serious problems as data and program storage.)
Third: Yes, most software was hand-written for the business at that time. Machines were far more expensive then, and people were cheaper. Programs were written to automate the existing business processes, and the idea that you could get outside software and change your business practices was almost heresy. This changed over time, of course.
Fourth: Programmers could go much faster than today, in lines of code per person-year, since these were largely big dumb programs for big dumb problems. In my experience, the DATA DIVISION was a large part of each COBOL program, and that would frequently take large descriptions of file layouts and repeat them in each program in the series.
I have limited experience with program generators, but it was very common at the time to use them to generate an application and then modify the output. This was partly just bad practice, and partly because a program generator couldn't do everything needed.
Fifth: PL/I was not heavily used, despite IBM's efforts. It ran into early bad press, although as far as I know the only real major problem that couldn't be fixed was figuring out the precision system. The Defense Department used Ada and COBOL for entirely different things. You are omitting assembly language as a competitor, and lots of small shops used BAL (also called ASM) instead of COBOL. After all, programmers were cheap, compilers were expensive, and there were a whole lot of things COBOL couldn't do. It was actually a very nice assembly language, and I liked it a lot.
Well, you're asking in the wrong place here. This forum is dominated by .net programmers, with a significant java minority and such an age build-up that only a very small minority has any cobol experience.
The CASE tool market consisted in large part of cobol code generators. Most tools were write-only, not two-way. That ensures there are a lot of lines of code. This market was somewhat newer than the 70s, so the volume of cobol code grew fast in the 80s and 90s.
A lot of cobol development is done by people with no significant internet access and therefore no visibility. There is no need for it. Cobol programmers are used to having in-house programming courses and paper documentation (lots of it).
[edit] Generating cobol source made a lot of sense. Cobol is very verbose and low level. The various cobol implementations are all slightly different, so configuring the code generator eliminates a lot of potential errors.
With regards to #4: how much of that could have been machine-generated code? I don't know if template-based code was used a lot with Cobol, but I see a lot of it used now for all sorts of things. If my application has thousands of LOC that were machine generated, that doesn't mean much. The last code-generating script I wrote took 20 minutes to write, 10 minutes to format the input, 2 minutes to run, then an hour to execute a suite of automatic tests to verify it actually worked, but the code it generated would have taken several days to do by hand (or the time between the morning meeting and lunch, doin' it my way ;) ). Ok I admit it's not always that easy and there is often manual tweaking involved, but I still don't think the LOC metric has much meaning if code-generators are in heavy use.
Maybe that's how they generated so much code in so little time.

Software to Tune/Calibrate Properties for Heuristic Algorithms

Today I read that there is a piece of software called WinCalibra (scroll a bit down) which can take a text file with properties as input.
This program can then optimize the input properties based on the output values of your algorithm. See this paper or the user documentation for more information (see link above; sadly doc is a zipped exe).
Do you know of other software which can do the same and which runs under Linux? (preferably Open Source)
EDIT: Since I need this for a Java application: should I invest my research in Java libraries like gaul or watchmaker? The problem is that I don't want to roll out my own solution, nor do I have time to do so. Do you have pointers to out-of-the-box applications like Calibra? (internet searches weren't successful; I only found libraries)
I decided to give away the bounty (otherwise no one would have a benefit) although I didn't find a satisfactory solution :-( (out-of-the-box application)
Some kind of probability-selected random walk (Metropolis-algorithm-like) is a possibility in this instance. Perhaps with simulated annealing to improve the final selection. Though the timing parameters you've supplied are not optimal for getting a really great result this way.
It works like this:
You start at some point. Use your existing data to pick one that looks promising (like the highest value you've got). Set o to the output value at this point.
You propose a randomly selected step in the input space, assign the output value there to n.
Accept the step (that is update the working position) if 1) n>o or 2) the new value is lower, but a random number on [0,1) is less than f(n/o) for some monotonically increasing f() with range and domain on [0,1).
Repeat steps 2 and 3 as long as you can afford, collecting statistics at each step.
Finally compute the result. In your case an average of all points is probably sufficient.
Important frill: This approach has trouble if the space has many local maxima with deep dips between them, unless the step size is big enough to get past the dips; but big steps make the whole thing slow to converge. To fix this you do two things:
Do simulated annealing (start with a large step size and gradually reduce it, thus allowing the walker to move between local maxima early on, but trapping it in one region later to accumulate precise results).
Use several (many if you can afford it) independent walkers so that they can get trapped in different local maxima. The more you use, and the bigger the difference in output values, the more likely you are to get the best maxima.
This is not necessary if you know that you only have one, big, broad, nicely behaved local extreme.
Finally, the selection of f(). You can just use f(x) = x, but you'll get optimal convergence if you use f(x) = exp(-(1/x)).
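Putting those steps together, a minimal sketch in Python (all names, step sizes, and the annealing schedule are illustrative assumptions; evaluate() stands in for running the algorithm being tuned):
import math
import random

def metropolis_walk(evaluate, start, n_steps=1000, step0=1.0, step_decay=0.995):
    f = lambda x: math.exp(-1.0 / x)             # the acceptance function suggested above
    current = list(start)
    o = evaluate(current)
    best, best_val = list(current), o
    step, history = step0, []
    for _ in range(n_steps):
        # 2) propose a randomly selected step in the input space
        candidate = [x + random.uniform(-step, step) for x in current]
        n = evaluate(candidate)
        # 3) accept if better, or (if worse) with probability f(n/o)
        if n > o or (o > 0 and n > 0 and random.random() < f(n / o)):
            current, o = candidate, n
        if o > best_val:
            best, best_val = list(current), o
        history.append((list(current), o))       # collect statistics at each step
        step *= step_decay                       # simulated annealing: shrink the step size
    return best, best_val, history

# toy usage: "tune" one parameter of a made-up objective whose maximum is at 3.0
best, val, _ = metropolis_walk(lambda p: -(p[0] - 3.0) ** 2 + 10.0, start=[0.0])
print(best, val)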
Again, you don't have enough time for a great many steps (though if you have multiple computers, you can run separate instances to get the multiple walkers effect, which will help), so you might be better off with some kind of deterministic approach. But that is not a subject I know enough about to offer any advice.
There is a lot of genetic-algorithm-based software that can do exactly that. I wrote a PhD about it a decade or two ago.
A Google search for Genetic Algorithms Linux shows a load of starting points.
Intrigued by the question, I did a bit of poking around, trying to get a better understanding of the nature of CALIBRA, its standing in academic circles, and the existence of similar software or projects in the Open Source and Linux world.
Please be kind (and, please, edit directly or suggest editing) for the likely instances where my assertions are incomplete, inexact or even flat-out incorrect. While I work in related fields, I'm by no means an Operational Research (OR) authority!
The [algorithm] parameter tuning problem is a relatively well-defined problem, typically framed as a solution-search problem whereby the combination of all possible parameter values constitutes a solution space, and the parameter-tuning logic's aim is to "navigate" [portions of] this space in search of an optimal (or locally optimal) set of parameters.
The optimality of a given solution is measured in various ways and such metrics help direct the search. In the case of the Parameter Tuning problem, the validity of a given solution is measured, directly or through a function, from the output of the algorithm [i.e. the algorithm being tuned not the algorithm of the tuning logic!].
Framed as a search problem, the discipline of Algorithm Parameter Tuning doesn't differ significantly from other solution-search problems where the solution space is defined by something other than the parameters to a given algorithm. But because it works on algorithms which are in themselves solutions of sorts, this discipline is sometimes referred to as Metaheuristics or Metasearch. (A metaheuristics approach can be applied to various algorithms.)
Certainly there are many specific features of the parameter tuning problem as compared to the other optimization applications but with regard to the solution searching per-se, the approaches and problems are generally the same.
Indeed, while well defined, the search problem is generally still broadly unsolved, and is the object of active research in very many different directions, for many different domains. Various approaches offer mixed success depending on the specific conditions and requirements of the domain, and this vibrant and diverse mix of academic research and practical applications is a common trait to Metaheuristics and to Optimization at large.
So... back to CALIBRA...
By its own authors' admission, Calibra has several limitations:
Limit of 5 parameters, maximum
Requirement of a range of values for [some of ?] the parameters
Works better when the parameters are relatively independent (but... wait, when that is the case, isn't the whole search problem much easier ;-) )
CALIBRA is based on a combination of approaches, which are repeated in a sequence. A mix of guided search and local optimization.
The paper where CALIBRA was presented is dated 2006. Since then, there have been relatively few references to this paper and to CALIBRA at large. Its two authors have since published several other papers in various disciplines related to Operational Research (OR).
This may be indicative that CALIBRA hasn't been perceived as a breakthrough.
State of the art in that area ("parameter tuning", "algorithm configuration") is the SPOT package in R. You can connect external fitness functions using a language of your choice. It is really powerful.
I am working on adapters for e.g. C++ and Java that simplify the experimental setup, which requires some getting used to in SPOT. The project goes under the name InPUT, and a first version of the tuning part will be up soon.

What would programming languages look like if every computable thing could be done in 1 second?

Inspired by this question
Suppose we had a magical Turing Machine with infinite memory, and unlimited CPU power.
Use your imagination as to how this might be possible, e.g. it uses some sort of hyperspace continuum to automatically parallelize anything as much as is desired, so that it could calculate the answer to any computable question, no matter what its time complexity or number of actual "logical steps", in one second.
However, it can only answer computable questions in one second... so I'm not positing an "impossible" machine (at least I don't think so)... For example, this machine still wouldn't be able to solve the halting problem.
What would the programming language for such a machine look like? All programming languages I know about currently have to make some concessions to "algorithmic complexity"... with that constraint removed though, I would expect that all we would care about would be the "expressiveness" of the programming language. i.e. its ability to concisely express "computable questions"...
Anyway, in the interests of a hopefully interesting discussion, opening it up as community wiki...
SendMessage travelingSalesman "Just buy a ticket to the same city twice already. You'll spend much more money trying to solve this than you'll save by visiting Austin twice."
SendMessage travelingSalesman "Wait, they built what kind of computer? Nevermind."
This is not really logical. If a thing takes O(1) time, then doing it n times will take O(n) time, even on a quantum computer. It is impossible that "everything" takes O(1) time.
For example: Grover's algorithm, the one mentioned in the accepted answer to the question you linked to, takes O(n^1/2) time to find an element in a database of n items. And that's not O(1).
The amount of memory, the speed of the memory, or the speed of the processor doesn't define the time and space complexity of an algorithm. Basic mathematics does that. Asking what programming languages would look like if everything could be computed in O(1) is like asking what our calculators would look like if pi were 3 and the results of all square roots were integers. It's really impossible, and if it isn't, it's not likely to be very useful.
Now, asking ourself what we would do with infinite process power and infinite memory could be a useful exercise. We'll still have to deal with complexity of algorithms but we'd probably work somehow differently. For that I recommend The Hundred-Year Language.
Note that even if the halting problem is not computable, "does this halt within N steps on all possible inputs of size smaller than M" is!
As such any programming language would become purely specification. All you need to do is accurately specify the pre and post conditions of a function and the compiler could implement the fastest possible code which implements your spec.
Also, this would trigger a singularity very quickly. Constructing an AI would be a lot easier if you could do near infinite computation -- and once you had one, of any efficiency, it could ask the computable question "How would I improve my program if I spent a billion years thinking about it?"...
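To make the bounded-halting observation above concrete, here is a toy brute-force checker in Python (the Turing-machine encoding is invented for the example): it simulates a machine for at most N steps on every binary input shorter than M and reports whether every run halted.
from itertools import product

def run(tm, tape, max_steps):
    """tm maps (state, symbol) -> (written symbol, 'L'/'R', next state); state 'H' halts."""
    cells = dict(enumerate(tape))
    state, pos = "A", 0
    for _ in range(max_steps):
        if state == "H":
            return True
        sym = cells.get(pos, "0")                # blank cells read as "0"
        written, move, state = tm[(state, sym)]
        cells[pos] = written
        pos += 1 if move == "R" else -1
    return state == "H"

def halts_on_all_small_inputs(tm, n_steps, max_len):
    return all(run(tm, "".join(bits), n_steps)
               for length in range(max_len)
               for bits in product("01", repeat=length))

# example machine: scan right over 1s, halt on the first 0 (or blank)
tm = {("A", "1"): ("1", "R", "A"), ("A", "0"): ("0", "R", "H")}
print(halts_on_all_small_inputs(tm, n_steps=100, max_len=8))   # True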
It could possibly be a haskell-ish language. Honestly it's a dream to code in. You program the "laws" of your types, classes, and functions and then let them loose. It's incredibly fun, powerful, and you can write some very succinct and elegant code. It's like an art.
Maybe it would look more like pseudo-code than "real" code. After all, you don't have to worry about any implementation details any more because whichever way you go, it'll be sufficiently fast enough.
Scalability would not be an issue any longer. We'd have AIs way smarter than us.
We wouldn't need to program any longer and instead the AI would figure out our intentions before we realize them ourselves.
SQL is such a language - you ask for some piece of data and you get it. If you didn't have to worry about minute implementation details of the db this might even be fun to program in.
You underestimate the O(1). It means that there exists a constant C>0 such that the time to compute a problem is bounded by this C.
What you ignore is that the actual value of C can be large, and it can (and mostly is) different for different algorithms. You may have two algorithms (or computers - doesn't matter), both O(1), but in one this C may be a billion times bigger than in the other - then that one will be much slower, and perhaps very slow in terms of wall-clock time.
If it will all be done in one second, then most languages will eventually look like this, I call it DWIM theory (Do what I mean theory):
Just do what I said (without any bugs this time)
Because if we ever develop a machine that can compute everything in one second, then we will probably have mind control at that stage, and at the very least artificial intelligence.
I don't know what new languages would come up (I'm a physicist, not a computer scientist) but I'd still write my programs for it in Python.
