why did the y2k bug exist? - y2k

The explanation I got for this was that it was common practice for programmers to simply represent the year with 2 digits. But why would anyone do that? If anything I imagine it would take more effort to make a program roll back to 1900 instead of going on to 2000.

Premium of storage space + lack of foresight = Y2K bug.
Saving bytes was very important in many older systems. Plus, a common fallacy in software development is "no one's going to be using this in X years". So why not save those bytes now? 10/20/30 years down the line this will SURELY be scrapped for a completely new system.
To quote Lex Luthor -- "Wrong."

Why? Their question was likely, "Why not?" If it saved a few bits in a world where memory usage was significantly more limited, then they figured they may as well save that space.
Obviously, the "why not" was because "your software might actually be in use for significant amounts of time." Some programmers had the foresight to plan ahead, but not all.

The story goes that this was back in the day when 1 kilobyte of RAM cost more than $1000. Omitting the extra digits meant really money savings.

Related

Are limitations of CPU speed and memory prevent us from creating AI systems?

Many technology optimists say that in 15 years the speed of computers will be comparable with the speed of the human brain. This is why they believe that computers will achieve the same level of intelligence as humans.
If Moore's law holds, then every 18 months we should expect doubling of CPU speed. 15 years is 180 months. So, we will have the doubling 10 times. Which means that in 15 years computer will be 1024 times faster than they are now.
But is the speed the reason of the problem? If it is so, we would be able to build an AI system NOW, it would just 1024 times slower than in 15 years. Which means that to answer a question it will need 1024 second (17 minutes) instead of acceptable 1 second. But do we have now strong (but slow) AI system? I think no. Even if now (2015) we give to a system 1 hour instead of 17 minutes, or 1 day, or 1 month or even 1 year, it still will be unable to answer complex questions formulated in natural language. So, it is not the speed that causes problems.
It means that in 15 years our intelligence will not be 1024 faster than now (because we have no intelligence). Instead our "stupidity" will be 1024 times faster than now.
We need both faster hardware and better algorithms. Of course speed alone is not enough as you pointed out.
We need self-modifying meta-learning algorithms capable of creating hypotheses and performing experiments to verify them (like humans do). Systems that are learning to learn and self-improving. Algorithms that can prove that given self-modification is optimal in certain sense and will lead to even better self-modifications in the future. Systems that can reflect on and inspect their own software (can you call it consciousness ?). Such research is being done and may create superhuman intelligence in the future or even technological singularity as some believe.
There is one problem with this approach, though. People doing this research usually assume that consciousness is computable. That it is all about intelligence. They don't take into account experiences like pleasure and pain which have nothing to do (in my opinion) with computation nor intellect. You can understand pain through experience only (not intellectual speculation). Setting variable pleasure to 5 or behaving like one feels pleasure is very different from experiencing pleasure. Some people say that feelings originate in brain so it is enough to understand brain. Not necessarily. Child can ask: "How did they put small people inside TV box ?". Of course TV is just a receiver and there are no small people inside. Brain might be receiver too. Do we need higher knowledge for feelings and other experiences ?
The answer has to be answered in the context of computation and complexity.
Every algorithm has its own complexity and running time (See Big O notation). There are problems which are non-computable problems such as the halting problem. These problems are proven that an algorithm does not exists independent of the hardware.
Computable algorithms are described in the number of steps required with respect to the input to solve an algorithm. As the number of input increases, the execution time of the algorithm also increases. However, these algorithms can be categorized into two: exponential time algorithms and non-exponential time algorithms. Exponential time algorithms increases drastically with the number of input and becomes intractable.
These executing time of these problems can be improved with better hardware however the complexity will always be the same. This means that no matter what the CPU uses, the execution time will always require the same number of steps. This means that the hardware is important to provide an answer in less time but the hardness of the problem will always remain the same. Thus, the limitation of the hardware is not preventing us from creating an AI system. For instance, you can use parallel programming (ex: GPU) to improve the execution time of the algorithm drastically but the algorithm is still the same as a normal CPU algorithm.
I would say no. As you showed, speed is not the only factor of intelligence. I for one would think Language is, yes language. Language is the primary skill we learn as humans, so why not for computers? Language gives an understanding that can be understood across the globe, given you know that language. Humans use nonverbal and verbal language to communicate. But I honestly think it really works something like this:
Humans go through experiences. These experiences have a bigger impact on our lives the closer we are to our birth date, or the more emotional they are. For example, the first time we are told no means ALOT more to us as an infant than as a 70 year old adult. These get stored as either long term or short term memory and correlated to that event later on in life for reference. We mainly store events to learn from them to prevent negative experience or promote positive experiences.
Think of it as a tag cloud. The more often you do task A, the bigger the cloud is in memory. We then store crucial details such as type of emotion, location, smells etc. Now when we reference them again from memory we pick out those details and create a logical sentence:
Touching that stove hurt me when I was at grandma's house.
All of the bolded words would have to be stored to have a complete memory.
Now inside of this sentence we have learned a lot more things than just being hurt from the stove at grandma's house. We have learned that stove's can be hot, dangerous, and grandma allows it to be in her house. We also learned how long it takes to heal from such an event, emotionally and physically to gauge how important the event is. And so much more. So we also store this sub-event information inside of other knowledge bubbles. And these bubbles continue to grow exponentially.
Now when asked: Are stoves dangerous?
You can identify the words in the sentence:
are, stoves, dangerous, question
and reference the definition of dangerous as: hurt, bad
and then provide more evidence that this is true, such as personal experience to result in:
Yes, stoves are dangerous because I was hurt at grandmas house by one.
So intelligence seems to be a mix of events, correlation and data retrieval to solve some solution. I'm sure there's a lot more to it than that but this is just my understanding of intelligence.

Why don't hardware failures show up at the programming language level? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 9 years ago.
Improve this question
I am wondering if anyone can give my a good answer, or at least point me in the direction of a good reference to the following question:
How come I have never heard of a computer breaking in a very fundamental way? How come when I declare x to be a double it stays as a double? How come there is never a short circuit that robs it of some bytes and makes it an integer? Why do we have faith that when we initialize x to 10, there will never be a power surge that will cause it to become 11, or something similar?
I think I need a better understanding of memory.
Thanks, and please don't bash me over the head for such a simple/abstract question.
AFAIK, the biggest such problem is cosmic background radiation. If a gamma ray hits your memory chip, it can randomly flip a memory bit. This does happen even in your computer every now and then. It usually doesn't cause any problem, since it is unlikely that it is the very bit in your Excel input field, for example, and magnetic drives are protected against such accidents. However, it is a problem for long, large calculations. That's what ECC memory is invented for. You can also find more info about this phenomenon here:
http://en.wikipedia.org/wiki/ECC_memory
"The actual error rate found was several orders of magnitude higher than previous small-scale or laboratory studies, with 25,000 to 70,000 errors per billion device hours per megabit (about 2.5–7 × 10−11 error/bit·h)(i.e. about 5 single bit errors in 8 Gigabytes of RAM per hour using the top-end error rate), and more than 8% of DIMM memory modules affected by errors per year."
How come I have never heard of a computer breaking in a very fundamental way?
Hardware is fantastically complicated and there are a huge number of engineers whose job it is to make sure that the hardware works as intended. Whenever Intel, AMD, etc. release chips, they've extensively tested the design and run all sorts of diagnostics before it leaves the plant. They have an economic incentive to do this: if there's a mistake somewhere, it can be extremely costly. Look at the Intel FDIV bug for an example.
How come when I declare x to be a double it stays as a double? How come there is never a short circuit that robs it of some bytes and makes it an integer?
Part of this has to do with how the assembly works. Typically, compiled application binaries don't have any type information in them. Instead, they just issue commands like "take the four bytes at position 0x243598F0 and load them into a register." For a variable's type to mutate somehow, a huge amount of application code would have to change. If there was an error that underallocated the space for the variable, it would mess up the stack layout and probably cause a pretty quick program crash, so chances are the result would be "it crashed" rather than "the type got mutated," especially since at a binary level the operations on doubles and integral types are so different.
Why do we have faith that when we initialize x to 10, there will never be a power surge that will cause it to become 11, or something similar?
There might be! It's extremely rare, though, because the hardware people do such a good job designing everything. One of the nifty things about being a software engineer is that you sit on top of the food chain:
Software engineers write software that runs in an operating system,
which was written by systems programmers and talks to the hardware,
which was designed by electrical engineers and is built out of hardware gates,
which were fabricated and designed by materials engineers,
who got their materials due to the efforts of mining engineers,
etc.
Lots of engineers make a good living at each link in the chain, which is why everything is so well-tested. Errors do occur, and they do take down real computer systems, but it's relatively rare unless you have thousands or millions of computers running.
Hope this helps!
Answer 1, most of us rarely or never work on a large enough system or one where this type of consideration is needed. In large databases or in file systems, they have error detection to notice what you describe present. In physically or data large systems, errors when writing or storing data happen (ex,. packets get lost or corrupted mid-journey, gamma rays hit). Sectors go bad in your hard drive all the time. We have hashing, parity checking, and a host of other methods to notify us when funny issues happen.
Answer 2: Our axioms & models. The models we tend to use, the Imperative or Functional model, doesn't have the 'gamma ray from the sun changed a bit' present as a consideration. Just like how an environmental scientist may abstract away quarks when studying environmental changes, we abstract away hardware.
Edit #X:
This is a great question. I actually heard this recently, by accident. In Physics, their models are wrong. Dead wrong. And they know they are wrong but they use them anyway. When I said this to justify my distain for the subject, the CS technician at my school verbally back-handed me. He basically said what you said, how do we know 'int x=10' isn't 11 randomly later?

justify my love of gvim [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 12 years ago.
I have been using gvim at work for a year or so, just at the point where I'm loving it, getting the hang of it and trying to j,k all over Microsoft Outlook. Then my computer died. Now, originally I had installed gvim myself, which at the time was a "no-no" and is now is really a bad idea (what with all the people introducing viruses to the network and whatnot).
We have a software review board to which I was sent when I wanted gvim "legally" installed. I was told that the standard text editor is UltraEdit and they don't want to support more than one. If I want to use gvim I need to talk management into making it the standard.
I'm kind of at a loss. Obviously, I can tout the cost savings, but I was having a hard time explaining what my fuss was about. If it were another programmer, I'd just force them to use it and they'd figure it out for themselves. But management folk aren't much interested in not being able to figure out you need to "i" before you can type, er, insert.
I told my manager it was like having a rowboat instead of swimming everywhere. And sometimes you're motorboating in that thing, but I'm looking for concise, compelling arguments which aren't based on bad analogies. There are a number of similar-ish questions, but I fear they trend too technical. Any ideas?
And after all your awesome advice wins the day for me, how do I ease former UltraEdit users into becoming gvimmers?
Update:
Thanks for the answers! I accepted one but took from many (don't know if that matters as question is now closed). Even though it was apparently too open-ended it is helping me plead my case with the powers-that-be.
Seems simple enough. Tell them that you are far more proficient with Vim and that you know next to nothing about UltraEdit. Whether this is true is irrelevant - provisioning requests for software aren't delivered under oath :-)
This has two effects:
you won't need the IT staff to support you since you such a guru.
you won't need weeks of ramp-up time trying to figure out how UltraEdit works.
Managers understand cost/benefit analyses. The cost of letting you use Vim is zero. The cost of making you use UltraEdit is considerably more.
Likewise Vim's benefits are high since you're immediately productive.
The company where I work actually has two classes of software that they let us use. The first is the stuff they support. The second is stuff that you need to get yourself (off the company distribution site, not from outside, they're still paranoid about malware and rightly so) and, if you have trouble with it, don't call them.
But don't make the mistake of trying to evangelise Vim. You want to be given a choice, not try to convince everyone else to have their choice taken away.
gvim is a portable app. So don't install it but have it anyway.
The argument I would use is that individual developers are more productive in different environments and this one doesn't even cost them anything. And, on that note, while I'm a gvim lover myself, I think forcing others onto it is guaranteed to only make them hate it.
To be honest, I don't know what UltraEdit provides that Notepad++ doesn't - which suggests a waste of money.
But, their response seems like a canned "we don't want to do our job so go away". If I were in your position I would present the use cases that I used with vi and DEMAND that they show me how to do the same thing in UltraEdit because they "support" that product. And trust me, I would make sure I make multiple tickets in the ticketing system just to piss them off. And at any point if they say "I don't know", contact their supervisor and ask them why you can't have gvim installed when the techs don't even know about the "supported" software.
If they refuse to help you or take their time, contact their supervisor and tell them they are impairing your ability to do your job.
Eventually someone will listen to you and cave :).
gvim is indeed a thing of great power. Grown men have been known to weep at the mere thought of its beauty. The productivity increases provided by this tool are immense if you know them by heart, and switching back to a conventional editor can make you feel as if you are typing with only your thumbs.
Given this, I would suggest you take some sort of productivity measurement, if you can. For similar straightforward development tasks, measure the lines of code you output in n hours with gvim, and then with UltraEdit. Include tasks such as refactoring into these measurements. Then, take these numbers to management and say, "Would you have me perform at 1/x the speed that I could be performing? Remember, this is dollars and cents we are talking about!"
Also assure these naive creatures that gvim is not a virus and will not take down the network in flames. It is, in fact, a text editor.
Implore them to amend the standards to allow for the application of a little logic. A little logic can go a long, long way.
Good luck to you, roger. As a fellow gvim enthusiast, I salute you.
This question is a better fit for programmers.stackexchange.com. But anyway. I think this whole "everybody at work must use one editor only" is absurd. Whatever happened to "different strokes for different folks", especially for creative types like programmers?
If your work doesn't see programmers as creative types, then you have a bigger problem. Time to visit careers.stackoverflow.com. ;-)
As a personal aside, I type with Dvorak. I don't necessarily want to convert all my workmates to Dvorak, but, I would find a different job if work made me use qwerty. There is simply no way I would agree to retrain myself on qwerty given that I type at 100 to 120 wpm on Dvorak, and no amount of qwerty training will get me to that speed.
Under these circumstances, I would consider going rogue.
I'm afraid you've presented a no win situation that I've faced many times in my programming career - a draconian policy inflicted on productive employees by middle management. A vain effort to homogenize the environment and work force beyond any level that can be considered reasonable.
Ponder the consequences of going rogue, by installing vim on your box anyways, and see if they are worth the benefit to you. If you decide that it is worth it, just do it. It's not like you are doing something illegal, after all. If the consequences are dire, I'm afraid you will have to cave in and start using UltraEdit. It's not the end of the world (it could have been notepad), but as an avid vim user myself, I feel your pain.
Update: I see people are voting me down, but this is the real world and the real world isn't perfect(ly theoretical in nature). Sometimes sacrifices have to be made, but in the end it's still your decision and only you have enough information to weigh the consequences. All we can do is present you with options, some more extreme than others...
Programmers are a very expensive resource, and you are losing productivity by using UltraEdit. Just do a little math:
Suppose you spend 60 minutes a day for a month dealing with UltraEdit instead of programming. Then, maybe after month of adjustment, it only takes you an extra 30 minutes a day to use UltraEdit. Add those minutes together, and you get nearly 20 days per year! This means it costs your company nearly a month of your time every year to use UltraEdit.
Now find a few colleagues who have similar opinions. If four or five of you get together, the amount of lost time gets really big really fast.
Just flip the numbers around, and tell your manager that you know a great way to A) save the company a bunch of money or B) greatly improve programmer productivity.
Whether that argument will work depends on your company (and your position in the company).
The people who craft IT policies should understand that a programmer's computer needs are quite different from those of the average business user.

Cobol: science and fiction [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 12 years ago.
There are a few threads about the relevance of the Cobol programming language on this forum, e.g. this thread links to a collection of them. What I am interested in here is a frequently repeated claim based on a study by Gartner from 1997: that there were around 200 billion lines of code in active use at that time!
I would like to ask some questions to verify or falsify a couple of related points. My goal is to understand if this statement has any truth to it or if it is totally unrealistic.
I apologize in advance for being a little verbose in presenting my line of thought and my own opinion on the things I am not sure about, but I think it might help to put things in context and thus highlight any wrong assumptions and conclusions I have made.
Sometimes, the "200 billion lines" number is accompanied by the added claim that this corresponded to 80% of all programming code in any language in active use. Other times, the 80% merely refer to so-called "business code" (or some other vague phrase hinting that the reader is not to count mainstream software, embedded systems or anything else where Cobol is practically non-existent). In the following I assume that the code does not include double-counting of multiple installations of the same software (since that is cheating!).
In particular in the time prior to the y2k problem, it has been noted that a lot of Cobol code is already 20 to 30 years old. That would mean it was written in the late 60ies and 70ies. At that time, the market leader was IBM with the IBM/370 mainframe. IBM has put up a historical announcement on his website quoting prices, configuration and availability. According to the sheet, prices are about one million dollars for machines with up to half a megabyte of memory.
Question 1: How many mainframes have actually been sold?
I have not found any numbers for those times; the latest numbers are for the year 2000, again by Gartner. :^(
I would guess that the actual number is in the hundreds or the low thousands; if the market size was 50 billion in 2000 and the market has grown exponentially like any other technology, it might have been merely a few billions back in 1970. Since the IBM/370 was sold for twenty years, twenty times a few thousand will result in a couple of ten-thousands of machines (and that is pretty optimistic)!
Question 2: How large were the programs in lines of code?
I don't know how many bytes of machine code result from one line of source code on that architecture. But since the IBM/370 was a 32-bit machine, any address access must have used 4 bytes plus instruction (2, maybe 3 bytes for that?). If you count in operating system and data for the program, how many lines of code would have fit into the main memory of half a megabyte?
Question 3: Was there no standard software?
Did every single machine sold run a unique hand-coded system without any standard software? Seriously, even if every machine was programmed from scratch without any reuse of legacy code (wait ... didn't that violate one of the claims we started from to begin with???) we might have O(50,000 l.o.c./machine) * O(20,000 machines) = O(1,000,000,000 l.o.c.).
That is still far, far, far away from 200 billion! Am I missing something obvious here?
Question 4: How many programmers did we need to write 200 billion lines of code?
I am really not sure about this one, but if we take an average of 10 l.o.c. per day, we would need 55 million man-years to achieve this! In the time-frame of 20 to 30 years this would mean that there must have existed two to three million programmers constantly writing, testing, debugging and documenting code. That would be about as many programmers as we have in China today, wouldn't it?
EDIT: Several people have brought up automatic templating systems/code generators or so. Could somebody elaborate on this? I have two issues with that: a) I need to tell the system what it is supposed to do for me; for that I need to communicate with the computer and the computer will output the code. This is exactly what a compiler of a programming language does. So essentially I am using a different high-level programming language to generate my Cobol code. Shouldn't I work with that other high-level language instead of Cobol? Why the middle-man? b) In the 70s and 80s the most precious commodity was memory. So if I have a programming language output something, it should better be concise. Using my hypothetical meta-language would I really generate verbose and repetitive Cobol code with it rather than bytecode/p-code like other compilers of that time did? END OF EDIT
Question 5: What about the competition?
So far, I have come up with two things here:
1) IBM had their own programming language, PL/I. Above I have assumed that the majority of code has been written exclusively using Cobol. However, all other things being equal I wonder if IBM marketing had really pushed their own development off the market in favor of Cobol on their machines. Was there really no relevant code base of PL/I?
2) Sometimes (also on this board in the thread quoted above) I come across the claim that the "200 billion lines of code" are simply invisible to anybody outside of "governments, banks ..." (and whatnot). Actually, the DoD had funded their own language in order to increase cost effectiveness and reduce the proliferation of programming language. This lead to their use of Ada. Would they really worry about having so many different programming languages if they had predominantly used Cobol? If there was any language running on "government and military" systems outside the perception of mainstream computing, wouldn't that language be Ada?
I hope someone can point out any flaws in my assumptions and/or conclusions and shed some light on whether the above claim has any truth to it or not.
On the surface, the numbers Gartner produces are akin to answering the
question: How many angels can dance on the head of a pin?.
Unless you obtain a full copy of their report (costing big bucks) you will never know how they came up
with or justified the 200 billion lines of COBOL claim. Having said that, Gartner is a well
respected information technology research and advisory
firm so I would think they would not have made such a claim without justification or
explanation.
It is amazing how this study has been quoted over the years. A Google search for "200 billion lines of COBOL"
got me about 19,500 hits. I sampled a bunch of them and most attribute the number directly to the 1997 the Gartner report.
Clearly, this figure has captured the attention of many.
The method that you have taken to "debunk" the claim has a few problems:
1) How many mainframes have been sold This is a big question in and of itself, probably just as difficult
as answering the 200 billion lines of code question. But more importantly, I don't see how determining the number of
mainframes could be used in constraing the number of lines of code running on them.
2) How large were the programs in lines of code? COBOL programs tend to be large. A modest program can
run to a few thousand lines, a big one into tens of thousands. One of the jokes COBOL programmers
often make is that only one COBOL program has ever been written, the rest are just modified
copies of it. As with many jokes there is a grain of truth in it. Most shops have a large program inventory
and a lot of those programs were built by cutting and pasting from each other. This really "fluffs" up the
size of your code base.
Your assumption that a program must fit into physical memory in order to run is wrong. The size problem
was solved in several different ways (e.g. program overlays, virtual memory etc.). It was not unusual in the
60's and 70's to run large programs on machines with tiny physical memories.
3) Was there no standard software? A lot of COBOL is written
from scratch or from templates. A number of financial packages were developed by software houses the 70's and 80's.
Most of these
were distributed as source code libraries. The customer then copied and modified the source to
fit their particular business
requirement. More "fluffing" of the code base - particularly given that large segments of that code
was logically unexecutable once the package had been "configured".
4) How many programmers did we need to write 200 billion lines of code Not as many as you might think!
Given that COBOL tends to be verbose and highly replicated, a programmer can have huge "productivity".
Program generating systems were in vogue during the 70's and early 80's.
I once worked with a product (now defunct fortunately) that let me write "business logic" and it
generated all of the "boiler plate" code around it - producing a fully functional COBOL program. The code
it generated was, to be polite, verbose in the extreme. I could produce a 15K line COBOL program from
about 200 lines of input! We are taking serious fluff here!
5) What about the competition? COBOL has never really had much serious competition in the
financial and insurance sectors. PL/1 was a major IBM initiative to produce the one programming language
that met every possible computing need. Like all such initiatives it was too ambitious and
has pretty much collapsed inward. I believe IBM still uses and supports it today. During the 70's
several other languages were predicted to replace COBOL - ALGOL, ADA and C come to mind, but
none have. Today I hear the same said of Java and .Net. I think the major reason COBOL is still with us is that it
does what it is supposed to do very well and
the huge multi billion lines of code legacy make moving to a "better" language both expensive and
risky from a business point of view.
Do I believe the 200 billion lines of code story? I find the number high but not impossibly high given
the number of ways COBOL code tends to "fluff" itself up over time.
I also think that getting too involved in analyzing these numbers quickly degrades into a
"counting angels" exercise - something people can get really wound up over but has no
significance in the real world.
EDIT
Let me address a few of the comments left below...
Never seen a COBOL program used by an investment bank. Quite possible. Depends
which application areas you were working in. Investment banks tend to have
large computing infrastructures and employ a wide range of technologies. The shop
I have been working in
for the past 20 years (although not an investment bank) is one of the largest in
the country and it has a significant
COBOL investment. It also has significant Java, C and C++ investments as
well as pockets of just about every other technology
known to man. I have also met some fairly senior applications developers here that
were completely unaware that COBOL was still in use. I did a
rough line count through our source control system and found around 70 million lines of
production COBOL. Quite a few people that have worked here for years are completely oblivious to it!
I am also aware that COBOL is rapidly declining as a language of choice, but the fact
is, there is still a lot of it around today. In 1997, the period to which this question
relates, I believe COBOL would have been the dominant language in terms of LOC. The
point of the question is: Could there have been 200 billion lines of it in 1997?
Counting mainframes. Even if one were able to obtain the number of mainframes it would
be difficult to assess the "compute" power they represented. Mainframes, like most
other computers, come in a wide range of configurations and processing capacity.
If we could say there were exactly "X" mainframes in use in 1997, you still need to
estimate the processing capacity they represented, then you need to figure out what
percentage of the work load was due to running COBOL programs and a bunch of other
confounding factors. I just don't see how this line of reasoning would ever
yield an answer with an acceptable margin of error.
Multi-counting lines of code. That was exactly my point when
referring to the COBOL "fluff" factor. Counting lines of COBOL can be a very misleading statistic
simply because a significant amount of it was never written by programmers in the
first place. Or if it was, quite a bit of it was done using the cut-paste-tinker
"design pattern".
Your observation that memory was a valuable commodity in 1997 and prior is true. One would think that
this would have lead to using the most efficient coding techniques and languages
available to maximize its use. However, there are other factors: The opportunity cost of having an application
backlog was often perceived to outweigh the cost of bringing in more memory/cpu to deal with less than
optimal code (which could be cranked out quite a bit faster). This thinking was further reinforced by the
observation that Moore's Law leads to ever
declining hardware costs whereas software development costs remain constant. The "logical" conclusion
was to crank out less than optimal code, suffer for a while, then reap the benefits
in the years to come (IMO, a lesson in bad planning and greed, still common today).
The push to deliver applications during the 70's through 90's led to the rise of a host of
code generators (today I see frameworks of various flavours fulfilling this role).
Many of these code generators emitted tons of COBOL code. Why emit COBOL code? Why not emit
assembler or p-code or something much more efficient? I believe the answer is
one of risk mitigation. Most code generators are proprietary pieces of software owned by some
third party who may or may not be in business or supporting their product 10 years from now.
It is a very hard sell if you can't provide an iron-clad guarantee that the generated application can be
supported into the distant future. The solution is to have the "generator" produce something
familiar - COBOL for example! Even if the "generator" dies, the resulting application can
be maintained by your existing staff of COBOL programmers. Problem solved ;) (today we see
open source used as a risk mitigation argument).
Returning to the LOC question. Counting lines of COBOL code is, in my opinion, open to
a wide margin of error or at least misinterpretation. Here are a few statistics from an application
I work on (quoted approximately). This
application was built with and is maintained using Basset Frame Technology (frame-work) so
we don't actually write COBOL but we generate COBOL from it.
Lines of COBOL: 7,000,000
Non-Procedure Division: 3,000,000
Procedure Division: 3,500,000
Comment/blank : 500,000
Non-expanded COPY directives: 40,000
COBOL verbs: 2,000,000
Programmer written procedure Division: 900,000
Application frame generated: 270,000
Corporate infrastructure frame generated: 2,330,000
As you can see, almost half of our COBOL programs are non-procedure Division code (data declaration
and the like). The ratio of LOC to Verbs (statement count) is about 7:2. Using our framework
leverages code production by about a factor of 7:1.
So what should you make of all this? I really don't know - except that there is a lot of room to
fluff up the COBOL line counts.
I have worked with other COBOL program generators in the past. Some of these had absolutely
stupid inflation factors (e.g. the 200 lines to 15K line fluffing mentioned earlier). Given all these
inflationary factors and the counting methodology used in by Gartner, it may very well have
been possible to "fluff" up to 200 billion lines of COBOL in 1997 - but the question
remains: Of what real use is this number? What could it really mean? I have no idea. Now
lets get back to the counting angels problem!
I would never defend those clowns at Gartner, but still:
Your ideas about IBM/370s are wrong. The 370 was an architecture, not a specific machine - it was implemented on everything from water cooled monsters to small, mini-computer sized machine (same size as a VAX). The number sold was thus probably far larger, by orders of magnitude, than you suspect.
Also, COBOL was heavily used on DEC's VAX lineup, and before that on the DEC-10 and DEC-20 lines. In the UK it was used on all ICL mainframes. Lots of other platforms also supported it.
[Usual disclaimer - I work for a COBOL vendor]
It's an interesting question and it's always difficult to get a provable number. On the number of COBOL programmers estimates - the 2 - 3 million number may not be orders of magnitude in error. I think there have been estimates of 1 or 2 million in the past. And each one of those can generate a lot of code in a 20 year career. In India tens of thousands of new COBOL programmers are added to the pool every year (perhaps every month!).
I think the automatically generated code is probably bigger than might be thought. For example PACBASE is a very popular application in the banking industry. A very large global bank I know of uses it extensively and they generate all their code into COBOL and estimate this generated code is 95% of their total code base with the other 5% being hand coded/maintained. I don't think this is uncommon. The maintenance and development of those systems is typically done at the model-level not the generated code as you say.
There is a group of applications that were missing from the original question - COBOL isn't only a mainframe language. The early years of Micro Focus were almost entirely spent in the OEM marketplace - we used to have something like 200 different OEMs (lots of long-gone names like DEC, Stratus, Bull, ...). Every OEM had to have a COBOL compiler on their box alongside C and Assembler. A lot of big applications were built at that time and are still going strong - think about the largest HR ERP systems, the largest mobile phone billing systems etc. My point is that there is a lot of COBOL code that was never on an IBM mainframe and is often overlooked.
And finally, the size of the code base may be larger in COBOL shops than the "average". That's not just because COBOL is verbose (or was - that's not been the case for a long time) but the systems are just bigger - they're in large organizations, doing a large number of disparate tasks. It's very common for sites to have 10's of millions of LoC.
I don't have figures, but my first "real" job was on IBM 370s.
First: Number sold. In 1974, a large railway ran on three 370s. These were big 370s, though, and you could get a small one for a whole lot less. (For perspective, at that time whether to get another megabyte was a decision on the VP level.)
Second: COBOL code is what you might call "fluffy", since a typical sentence (COBOLese for line) might be "ADD 1 TO MAIN-ACCOUNT OF CUSTOMER." There would be relatively few machine instructions per line. (This is especially true on the IBM 360 and onwards, which had machine instructions designed around executing COBOL.) BTW, addresses were 16 bits, four to designate a register (using the bottom 24 bits as a base address) and 12 as an offset. This meant that something under 64K could be addressed at a time, since not all of the 16 registers could be used as base registers for various reasons. Don't underestimate the ability of people to fit programs into small memory.
Also, don't underestimate the number of programs. The program library would be on disk and tape, and was essentially only limited by cost and volume. (Earlier on, they'd be on punch cards, which had serious problems as data and program storage.)
Third: Yes, most software was hand-written for the business at that time. Machines were far more expensive then, and people were cheaper. Programs were written to automate the existing business processes, and the idea that you could get outside software and change your business practices was almost heresy. This changed over time, of course.
Fourth: Programmers could go much faster than today, in lines of code per person-year, since these were largely big dumb programs for big dumb problems. In my experience, the DATA DIVISION was a large part of each COBOL program, and that would frequently take large descriptions of file layouts and repeat them in each program in the series.
I have limited experience with program generators, but it was very common at the time to use it to generate an application and then modify the output. This was partly just bad practice, and partly because a program generator couldn't do everything needed.
Fifth: PL/I was not heavily used, despite IBM's efforts. It ran into early bad press, although as far as I know the only real major problem that couldn't be fixed was figuring out the precision system. The Defense Department used Ada and COBOL for entirely different things. You are omitting assembly language as a competitor, and lots of small shops used BAL (also called ASM) instead of COBOL. After all, programmers were cheap, compilers were expensive, and there were a whole lot of things COBOL couldn't do. It was actually a very nice assembly language, and I liked it a lot.
Well, you're asking in the wrong place here. This forum is dominated by .net programmers, with a significant java minority and such a age build-up that only a very small minority has any cobol experience.
The CASE tool market consisted for a large part of cobol code generators. Most tools were write-only, not two-way. That ensures there are a lot of lines of code. This market was somewhat newer than the 70s, so the volume of cobol code grew fast in the 80s and 90s.
A lot of cobol development is done by people having no significant internet access and therefore visibility. There is no need for it. Cobol prorammers are used to having in-house programming courses and paper documentation (lots of it).
[edit] Generating cobol source made a lot of sense. Cobol is very verbose and low level. The various cobol implementations are all slightly different, so configuring the code generator eliminates a lot of potential errors.
With regards to #4: how much of that could have been machine-generated code? I don't know if template-based code was used a lot with Cobol, but I see a lot of it used now for all sorts of things. If my application has thousands of LOC that were machine generated, that doesn't mean much. The last code-generating script I wrote took 20 minutes to write, 10 minutes to format the input, 2 minutes to run, then an hour to execute a suite of automatic tests to verify it actually worked, but the code it generated would have taken several days to do by hand (or the time between the morning meeting and lunch, doin' it my way ;) ). Ok I admit it's not always that easy and there is often manual tweaking involved, but I still don't think the LOC metric has much meaning if code-generators are in heavy use.
Maybe that's how they generated so much code in so little time.

What would programming languages look like if every computable thing could be done in 1 second?

Inspired by this question
Suppose we had a magical Turing Machine with infinite memory, and unlimited CPU power.
Use your imagination as to how this might be possible, e.g. it uses some sort of hyperspace continuum to automatically parallelize anything as much as is desired, so that it could calculate the answer to any computable question, no matter what it's time complexity is and number of actual "logical steps", in one second.
However, it can only answer computable questions in one second... so I'm not positing an "impossible" machine (at least I don't think so)... For example, this machine still wouldn't be able to solve the halting problem.
What would the programming language for such a machine look like? All programming languages I know about currently have to make some concessions to "algorithmic complexity"... with that constraint removed though, I would expect that all we would care about would be the "expressiveness" of the programming language. i.e. its ability to concisely express "computable questions"...
Anyway, in the interests of a hopefully interesting discussion, opening it up as community wiki...
SendMessage travelingSalesman "Just buy a ticket to the same city twice already. You'll spend much more money trying to solve this than you'll save by visiting Austin twice."
SendMessage travelingSalesman "Wait, they built what kind of computer? Nevermind."
This is not really logical. If a thing takes O(1) time, then doing n times will take O(n) time, even on a quantum computer. It is impossible that "everything" takes O(1) time.
For example: Grover's algorithm, the one mentioned in the accepted answer to the question you linked to, takes O(n^1/2) time to find an element in a database of n items. And thats not O(1).
The amount of memory or the speed of the memory or the speed of the processor doesn't define the time and space complexity of an algorithm. Basic mathematics do that. Asking what would programming languages look like if everything could be computed in O(1) is like asking how would our calculators look like if pi was 3 and the results of all square roots are integers. It's really impossible and if it isn't, it's not likely to be very useful.
Now, asking ourself what we would do with infinite process power and infinite memory could be a useful exercise. We'll still have to deal with complexity of algorithms but we'd probably work somehow differently. For that I recommend The Hundred-Year Language.
Note that even if the halting problem is not computable, "does this halt within N steps on all possible inputs of size smaller than M" is!
As such any programming language would become purely specification. All you need to do is accurately specify the pre and post conditions of a function and the compiler could implement the fastest possible code which implements your spec.
Also, this would trigger a singularity very quickly. Constructing an AI would be a lot easier if you could do near infinite computation -- and once you had one, of any efficiency, it could ask the computable question "How would I improve my program if I spent a billion years thinking about it?"...
It could possibly be a haskell-ish language. Honestly it's a dream to code in. You program the "laws" of your types, classes, and functions and then let them loose. It's incredibly fun, powerful, and you can write some very succinct and elegant code. It's like an art.
Maybe it would look more like pseudo-code than "real" code. After all, you don't have to worry about any implementation details any more because whichever way you go, it'll be sufficiently fast enough.
Scalability would not be an issue any longer. We'd have AIs way smarter than us.
We wouldn't need to program any longer and instead the AI would figure out our intentions before we realize them ourselves.
SQL is such a language - you ask for some piece of data and you get it. If you didn't have to worry about minute implementation details of the db this might even be fun to program in.
Your underestimate the O(1). It means that there exists a constant C>0 such that time to compute a problem is limited to this C.
What you ignore is that the actual value of C can be large and it can (and mostly is) different for different algorithms. You may have two algorithms (or computers - doesn't matter) both with O(1) but in one this C may be billion times bigger that in another - then the latter will be much slower and perhaps very slow in terms of time.
If it will all be done in one second, then most languages will eventually look like this, I call it DWIM theory (Do what I mean theory):
Just do what I said (without any bugs this time)
Because if we ever develop a machine that can compute everything in one second, then we will probably have mind control at that stage, and at the very least artificial intelligence.
I don't know what new languages would come up (I'm a physicist, not a computer scientist) but I'd still write my programs for it in Python.

Resources