What are nontrivial defects and how to overcome them? - bug-tracking

It is said in the [Software Defect Reduction Top 10 List] that 'about 40 to 50 percent of user programs contain nontrivial defects'.
What are some nontrivial defects and how to overcome them?

I would interpret "non-trivial" as "has a real impact on the user".
For instance, if a menu item has a typo in it, that would be a trivial defect. If your spreadsheet application crashed when it tried to save any sheet with the number "999" in it, that would be non-trivial.
I'd be hugely surprised if the number was really as low as 40-50%. In my experience pretty much every significant application has non-trivial defects, even if they're rarely encountered. (If I'm the only user in the world who uses the number 999 in a spreadsheet, the bug is still hugely important to me, so I don't think it can be classed as trivial.)
As for "overcoming" defects - the normal barrage of unit tests, continuous build, automated integration tests, manual testing, making sure you have a really good user feedback system, and management who are willing to put resources into fixing bugs as well as creating new features.

Subjective, but:
Non-trivial: defects that stop users doing their job, or that impact their productivity to a significant degree
Trivial: defects that just annoy users
Obviously there is a big grey area here, because what's annoying and trivial for one product might be annoying but non-trivial for another.

First, it is worth noting that most single defects are trivial: tests aim at discovering them.
So non-trivial defects are generally a combination of two or more single defects, each of which is harmless alone (the test inputs didn't trigger them).
A second level of non-triviality appears when time is part of the input/output space: specific dates or durations.
Then you can add discrepancies between assumptions and reality: compiler, target platform, inputs, ...
Shake all of that and may the force be with you...
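A small illustrative sketch (Python, invented billing functions) of that pattern: two individually harmless shortcuts that only misbehave in combination, with time (a leap-year date) as part of the input space:

```python
from datetime import date

# Two individually "harmless" shortcuts; each passes its own narrow test.

def days_in_february(year: int) -> int:
    return 28                     # shortcut #1: ignores leap years

def billing_period_end(start: date) -> date:
    # shortcut #2: fixed month-length table (truncated here for brevity)
    lengths = {1: 31, 2: days_in_february(start.year), 3: 31}
    return date(start.year, start.month, lengths[start.month])

# Separately each shortcut looks fine on ordinary inputs; combined with a
# time-dependent input they silently drop 29 February in a leap year:
print(billing_period_end(date(2024, 2, 1)))   # 2024-02-28, but 2024 has a Feb 29
```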

Try to understand the other side first: Trivial defects. A trivial defect is either harmless or easy to fix (typo in a text in the UI, wrong color for a button, labels are not aligned perfectly).
Non-trivial defects are everything else: Performance problems, handling of the application, data corruption, etc. They are sometimes hard to find and often hard to fix.


Do limitations of CPU speed and memory prevent us from creating AI systems?

Many technology optimists say that in 15 years the speed of computers will be comparable with the speed of the human brain. This is why they believe that computers will achieve the same level of intelligence as humans.
If Moore's law holds, then every 18 months we should expect a doubling of CPU speed. 15 years is 180 months, so we will see 10 doublings. Which means that in 15 years computers will be 1024 times faster than they are now.
But is speed really the source of the problem? If it were, we would be able to build an AI system NOW; it would just be 1024 times slower than in 15 years. That means that to answer a question it would need 1024 seconds (about 17 minutes) instead of an acceptable 1 second. But do we have a strong (but slow) AI system now? I think not. Even if now (2015) we give a system 1 hour instead of 17 minutes, or 1 day, or 1 month, or even 1 year, it will still be unable to answer complex questions formulated in natural language. So it is not speed that causes the problem.
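The arithmetic, spelled out as a trivial sketch:

```python
months = 15 * 12            # 15 years
doublings = months // 18    # one doubling every 18 months
speedup = 2 ** doublings
print(doublings, speedup)                        # 10 doublings -> 1024x
print(f"1 second in 15 years ~= {speedup} seconds now "
      f"(~{speedup // 60} minutes)")             # about 17 minutes
```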
It means that in 15 years our artificial intelligence will not be 1024 times faster than now (because we have none). Instead, our artificial "stupidity" will be 1024 times faster than now.
We need both faster hardware and better algorithms. Of course speed alone is not enough as you pointed out.
We need self-modifying meta-learning algorithms capable of creating hypotheses and performing experiments to verify them (like humans do). Systems that are learning to learn and self-improving. Algorithms that can prove that a given self-modification is optimal in a certain sense and will lead to even better self-modifications in the future. Systems that can reflect on and inspect their own software (can you call it consciousness?). Such research is being done and may create superhuman intelligence in the future, or even the technological singularity, as some believe.
There is one problem with this approach, though. People doing this research usually assume that consciousness is computable. That it is all about intelligence. They don't take into account experiences like pleasure and pain, which have nothing to do (in my opinion) with computation or intellect. You can understand pain through experience only (not intellectual speculation). Setting a variable pleasure to 5, or behaving like one feels pleasure, is very different from experiencing pleasure. Some people say that feelings originate in the brain, so it is enough to understand the brain. Not necessarily. A child can ask: "How did they put small people inside the TV box?" Of course the TV is just a receiver and there are no small people inside. The brain might be a receiver too. Do we need higher knowledge for feelings and other experiences?
The question has to be answered in the context of computation and complexity.
Every algorithm has its own complexity and running time (see Big O notation). Some problems are non-computable, such as the halting problem; for these it is proven that no algorithm exists, independent of the hardware.
Computable problems are described by the number of steps an algorithm requires with respect to the size of the input. As the input grows, the execution time of the algorithm grows with it. Broadly, these algorithms fall into two categories: exponential-time and non-exponential-time. The running time of exponential-time algorithms increases drastically with the input size and quickly becomes intractable.
The execution time can be improved with better hardware, but the complexity will always be the same. This means that no matter what CPU you use, the algorithm still requires the same number of steps. Hardware matters for providing an answer in less time, but the hardness of the problem remains the same. Thus, the limitation of the hardware is not what prevents us from creating an AI system. For instance, you can use parallel programming (e.g. a GPU) to improve the execution time drastically, but the algorithm is still the same as the normal CPU algorithm.
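A rough sketch of that point (Python, made-up step budgets): a machine 1024 times faster buys a lot for a polynomial-time algorithm, but only a handful of extra input elements for an exponential-time one:

```python
# How much input does a fixed "step budget" buy for a polynomial-time versus
# an exponential-time algorithm, and how little a 1024x faster machine changes that?
def largest_n(steps, budget):
    n = 0
    while steps(n + 1) <= budget:
        n += 1
    return n

budget = 10 ** 9                 # steps affordable on today's machine (illustrative)
faster = budget * 1024           # steps affordable on the 1024x machine

for name, steps in [("n^2 (polynomial)", lambda n: n ** 2),
                    ("2^n (exponential)", lambda n: 2 ** n)]:
    print(name, largest_n(steps, budget), "->", largest_n(steps, faster))
# n^2 (polynomial)  31622 -> 1011928   (about 32x more input)
# 2^n (exponential) 29 -> 39           (only 10 more input elements)
```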
I would say no. As you showed, speed is not the only factor in intelligence. I for one would argue that language is, yes, language. Language is the primary skill we learn as humans, so why not for computers? Language gives an understanding that can be shared across the globe, given that you know that language. Humans use nonverbal and verbal language to communicate. But I honestly think it really works something like this:
Humans go through experiences. These experiences have a bigger impact on our lives the closer we are to our birth date, or the more emotional they are. For example, the first time we are told no means a lot more to us as an infant than as a 70-year-old adult. These get stored as either long-term or short-term memory and correlated to that event later on in life for reference. We mainly store events to learn from them, to prevent negative experiences or promote positive experiences.
Think of it as a tag cloud. The more often you do task A, the bigger the cloud is in memory. We then store crucial details such as type of emotion, location, smells etc. Now when we reference them again from memory we pick out those details and create a logical sentence:
Touching that stove hurt me when I was at grandma's house.
All of the key words in that sentence (touching, stove, hurt, grandma's house) would have to be stored to have a complete memory.
Now inside of this sentence we have learned a lot more than just being hurt by the stove at grandma's house. We have learned that stoves can be hot and dangerous, and that grandma allows one in her house. We also learned how long it takes to heal from such an event, emotionally and physically, to gauge how important the event is. And so much more. So we also store this sub-event information inside of other knowledge bubbles. And these bubbles continue to grow exponentially.
Now when asked: Are stoves dangerous?
You can identify the words in the sentence:
are, stoves, dangerous, question
and reference the definition of dangerous as: hurt, bad
and then provide more evidence that this is true, such as personal experience to result in:
Yes, stoves are dangerous because I was hurt at grandma's house by one.
So intelligence seems to be a mix of events, correlation and data retrieval to solve some solution. I'm sure there's a lot more to it than that but this is just my understanding of intelligence.
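Purely as a toy illustration of that idea (Python, everything invented): store tagged experiences, then retrieve matching ones to answer the stove question.

```python
# Toy illustration of the "tagged experiences" idea: store events with details,
# then answer a question by retrieving the memories whose tags match.
memories = [
    {"event": "touched stove", "feeling": "hurt", "place": "grandma's house",
     "tags": {"stove", "hurt", "dangerous", "grandma"}},
    {"event": "ate cookies", "feeling": "pleasure", "place": "grandma's house",
     "tags": {"cookies", "grandma", "good"}},
]

# "dangerous" is defined in terms of other tags, as in the answer above.
definitions = {"dangerous": {"hurt", "bad"}}

def answer(question_tags):
    # Expand the question words using definitions, then look for supporting memories.
    expanded = set(question_tags)
    for word in question_tags:
        expanded |= definitions.get(word, set())
    return [m for m in memories if expanded & m["tags"]]

for m in answer({"stove", "dangerous"}):
    print(f"Yes: {m['event']} -> {m['feeling']} at {m['place']}")
# Yes: touched stove -> hurt at grandma's house
```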

Functional approaches to designing the discrete side of hybrid systems

I'm working on developing controllers for hybrid systems in Haskell.
FRP libraries (right now I'm using netwire, but there are several good ones and a lot of interesting research on future ones) provide a great solution for the continuous-time side of the problem. Augmenting them with signal names, dimensions, preferred units, and so forth gets you a system that has modularity, is self-describing, and has a straightforward path to confidence in correctness.
I'm looking for information, folklore, or papers that provide similar properties for the discrete-time side. In some sense the problem is much easier: state machines are well-studied and simple. In other senses it's more difficult; I'll briefly explain how.
Correctness is obviously the most important thing, and thankfully it's also straightforward.
Self-description is more of a problem. You'd like the controller not just to be in the correct state, but to be capable of telling you what state it's in. Also how it got there. And where it might go next. So you can tack names on to everything, and it works, but it conflicts somewhat with modularity. You'd also like to be able to build complex discrete-time behaviors from simpler ones. But when you ask the system what state it's in, generally the high-level answer is more interesting than (or at least as interesting as) the low-level answer. How do you get this cleanly? I've tried a few naive approaches and have wrapped myself in spaghetti a few different ways, but it seems like there must be elegant solutions?
Another problem I've had with self-description is that I'd like to have a list of self-describing conditions (generally comparisons: has it been 10 seconds? am I within 3 feet of the next waypoint? has the battery power fallen below 15%? etc) that are being monitored which might trigger the next state transition. There are tricky questions of what even are the desirable semantics here, since it seems like some of these events are better handled "from the bottom up" (e.g. expected termination conditions of whatever low level step you are performing) and some "from the top down" (e.g. equipment failure detection, geofencing, ...). This can lead to spaghetti of its own even if you relax the goal of self-description.
In addition to diagnostics, accurate self-description information here could also be very useful for abstract interpretation, projecting the state of the system into the future by guessing which events are likely to occur when. Many of the event conditions lend themselves to fairly simple guesses (e.g. using velocity made good, fuel consumption rate, timers). Others are more complicated but might still be worth the effort to develop projections for in some applications (e.g. expected orders from operators, weather forecasts, projected tracks for moving objects of interest). It would be nice to find a design that annotates conditions not only with names, but also with functions for this sort of thing.
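To show the shape of what I mean, here is a rough, language-agnostic sketch (written in Python rather than Haskell, all names invented) of a condition bundled with a name and an optional projection function:

```python
# Sketch only: a transition condition that carries its own description and an
# optional estimate of when it will fire. All names here are invented.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Condition:
    name: str                                      # self-description
    check: Callable[[dict], bool]                  # is it true in this state?
    eta: Optional[Callable[[dict], float]] = None  # projected seconds until true

def describe_pending(conditions, state):
    """What are we waiting for, and roughly when do we expect it?"""
    for c in conditions:
        if not c.check(state):
            guess = f"~{c.eta(state):.0f}s" if c.eta else "unknown"
            print(f"waiting on '{c.name}' (expected in {guess})")

waypoint = Condition(
    name="within 3 ft of next waypoint",
    check=lambda s: s["distance_ft"] <= 3,
    eta=lambda s: s["distance_ft"] / max(s["speed_ft_s"], 1e-6),
)
battery = Condition(
    name="battery below 15%",
    check=lambda s: s["battery_pct"] < 15,
)

describe_pending([waypoint, battery],
                 {"distance_ft": 120, "speed_ft_s": 4, "battery_pct": 60})
# waiting on 'within 3 ft of next waypoint' (expected in ~30s)
# waiting on 'battery below 15%' (expected in unknown)
```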
Does anyone have experience with this that they are willing to share?
Okay, so I would say the "real" answer to your question is that some of the things you are asking for are open areas of research; in particular, I think some of the self-describing features you desire may necessitate some degree of "spaghetti" simply because the problem you are trying to solve is inherently complicated.
That being said, your focus on modularity is exactly the right approach. I would say, take a look at Keymaera, as I believe it has the features you are looking for despite being in Java. I would also recommend looking at the publications page on the Keymaera website, as this should provide you with valuable insight into the problem in general.
If you do not like Keymaera's approach, you can also look into using Timed Automata, which is another modeling direction that should be sufficient for your problem description.

Cobol: science and fiction [closed]

Closed 12 years ago.
There are a few threads about the relevance of the Cobol programming language on this forum, e.g. this thread links to a collection of them. What I am interested in here is a frequently repeated claim based on a study by Gartner from 1997: that there were around 200 billion lines of Cobol code in active use at that time!
I would like to ask some questions to verify or falsify a couple of related points. My goal is to understand if this statement has any truth to it or if it is totally unrealistic.
I apologize in advance for being a little verbose in presenting my line of thought and my own opinion on the things I am not sure about, but I think it might help to put things in context and thus highlight any wrong assumptions and conclusions I have made.
Sometimes, the "200 billion lines" number is accompanied by the added claim that this corresponded to 80% of all programming code in any language in active use. Other times, the 80% merely refer to so-called "business code" (or some other vague phrase hinting that the reader is not to count mainstream software, embedded systems or anything else where Cobol is practically non-existent). In the following I assume that the code does not include double-counting of multiple installations of the same software (since that is cheating!).
In particular in the time prior to the y2k problem, it was noted that a lot of Cobol code was already 20 to 30 years old. That would mean it was written in the late '60s and '70s. At that time, the market leader was IBM with the IBM/370 mainframe. IBM has put up a historical announcement on its website quoting prices, configuration and availability. According to the sheet, prices were about one million dollars for machines with up to half a megabyte of memory.
Question 1: How many mainframes have actually been sold?
I have not found any numbers for those times; the latest numbers are for the year 2000, again by Gartner. :^(
I would guess that the actual number is in the hundreds or the low thousands; if the market size was 50 billion in 2000 and the market has grown exponentially like any other technology, it might have been merely a few billion back in 1970. Since the IBM/370 was sold for twenty years, twenty times a few thousand results in a few tens of thousands of machines (and that is pretty optimistic)!
Question 2: How large were the programs in lines of code?
I don't know how many bytes of machine code result from one line of source code on that architecture. But since the IBM/370 was a 32-bit machine, any address access must have used 4 bytes plus instruction (2, maybe 3 bytes for that?). If you count in operating system and data for the program, how many lines of code would have fit into the main memory of half a megabyte?
Question 3: Was there no standard software?
Did every single machine sold run a unique hand-coded system without any standard software? Seriously, even if every machine was programmed from scratch without any reuse of legacy code (wait ... didn't that violate one of the claims we started from to begin with???) we might have O(50,000 l.o.c./machine) * O(20,000 machines) = O(1,000,000,000 l.o.c.).
That is still far, far, far away from 200 billion! Am I missing something obvious here?
Question 4: How many programmers did we need to write 200 billion lines of code?
I am really not sure about this one, but if we take an average of 10 l.o.c. per day, we would need 55 million man-years to achieve this! In the time-frame of 20 to 30 years this would mean that there must have existed two to three million programmers constantly writing, testing, debugging and documenting code. That would be about as many programmers as we have in China today, wouldn't it?
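For reference, the back-of-envelope arithmetic behind Questions 3 and 4 (my own rough numbers, nothing more):

```python
# Question 3: my optimistic upper bound on hand-written code
loc_per_machine = 50_000
machines = 20_000
print(f"{loc_per_machine * machines:,} lines")             # 1,000,000,000

# Question 4: programmer-years needed for 200 billion lines at 10 l.o.c. per day
target_loc = 200_000_000_000
man_years = target_loc / 10 / 365
print(f"{man_years:,.0f} man-years")                       # ~55,000,000
print(f"{man_years / 25:,.0f} programmers over 25 years")  # ~2.2 million
```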
EDIT: Several people have brought up automatic templating systems/code generators or the like. Could somebody elaborate on this? I have two issues with that: a) I need to tell the system what it is supposed to do for me; for that I need to communicate with the computer and the computer will output the code. This is exactly what a compiler of a programming language does. So essentially I am using a different high-level programming language to generate my Cobol code. Shouldn't I work with that other high-level language instead of Cobol? Why the middle-man? b) In the 70s and 80s the most precious commodity was memory. So if I have a programming language output something, it had better be concise. Using my hypothetical meta-language, would I really generate verbose and repetitive Cobol code rather than bytecode/p-code, like other compilers of that time did? END OF EDIT
Question 5: What about the competition?
So far, I have come up with two things here:
1) IBM had their own programming language, PL/I. Above I have assumed that the majority of code has been written exclusively using Cobol. However, all other things being equal I wonder if IBM marketing had really pushed their own development off the market in favor of Cobol on their machines. Was there really no relevant code base of PL/I?
2) Sometimes (also on this board in the thread quoted above) I come across the claim that the "200 billion lines of code" are simply invisible to anybody outside of "governments, banks ..." (and whatnot). Actually, the DoD had funded their own language in order to increase cost effectiveness and reduce the proliferation of programming languages. This led to their use of Ada. Would they really worry about having so many different programming languages if they had predominantly used Cobol? If there was any language running on "government and military" systems outside the perception of mainstream computing, wouldn't that language be Ada?
I hope someone can point out any flaws in my assumptions and/or conclusions and shed some light on whether the above claim has any truth to it or not.
On the surface, the numbers Gartner produces are akin to answering the
question: How many angels can dance on the head of a pin?
Unless you obtain a full copy of their report (costing big bucks) you will never know how they came up
with or justified the 200 billion lines of COBOL claim. Having said that, Gartner is a well
respected information technology research and advisory
firm so I would think they would not have made such a claim without justification or
explanation.
It is amazing how this study has been quoted over the years. A Google search for "200 billion lines of COBOL"
got me about 19,500 hits. I sampled a bunch of them and most attribute the number directly to the 1997 Gartner report.
Clearly, this figure has captured the attention of many.
The method that you have taken to "debunk" the claim has a few problems:
1) How many mainframes have been sold? This is a big question in and of itself, probably just as difficult
as answering the 200 billion lines of code question. But more importantly, I don't see how determining the number of
mainframes could be used to constrain the number of lines of code running on them.
2) How large were the programs in lines of code? COBOL programs tend to be large. A modest program can
run to a few thousand lines, a big one into tens of thousands. One of the jokes COBOL programmers
often make is that only one COBOL program has ever been written, the rest are just modified
copies of it. As with many jokes there is a grain of truth in it. Most shops have a large program inventory
and a lot of those programs were built by cutting and pasting from each other. This really "fluffs" up the
size of your code base.
Your assumption that a program must fit into physical memory in order to run is wrong. The size problem
was solved in several different ways (e.g. program overlays, virtual memory etc.). It was not unusual in the
60's and 70's to run large programs on machines with tiny physical memories.
3) Was there no standard software? A lot of COBOL is written
from scratch or from templates. A number of financial packages were developed by software houses in the 70's and 80's.
Most of these
were distributed as source code libraries. The customer then copied and modified the source to
fit their particular business
requirement. More "fluffing" of the code base - particularly given that large segments of that code
was logically unexecutable once the package had been "configured".
4) How many programmers did we need to write 200 billion lines of code? Not as many as you might think!
Given that COBOL tends to be verbose and highly replicated, a programmer can have huge "productivity".
Program generating systems were in vogue during the 70's and early 80's.
I once worked with a product (now defunct, fortunately) that let me write "business logic" and it
generated all of the "boiler plate" code around it - producing a fully functional COBOL program. The code
it generated was, to be polite, verbose in the extreme. I could produce a 15K line COBOL program from
about 200 lines of input! We are talking serious fluff here!
5) What about the competition? COBOL has never really had much serious competition in the
financial and insurance sectors. PL/1 was a major IBM initiative to produce the one programming language
that met every possible computing need. Like all such initiatives it was too ambitious and
has pretty much collapsed inward. I believe IBM still uses and supports it today. During the 70's
several other languages were predicted to replace COBOL - ALGOL, ADA and C come to mind, but
none have. Today I hear the same said of Java and .Net. I think the major reason COBOL is still with us is that it
does what it is supposed to do very well and
the huge multi-billion-line legacy code base makes moving to a "better" language both expensive and
risky from a business point of view.
Do I believe the 200 billion lines of code story? I find the number high but not impossibly high given
the number of ways COBOL code tends to "fluff" itself up over time.
I also think that getting too involved in analyzing these numbers quickly degrades into a
"counting angels" exercise - something people can get really wound up over but has no
significance in the real world.
EDIT
Let me address a few of the comments left below...
Never seen a COBOL program used by an investment bank. Quite possible. Depends
which application areas you were working in. Investment banks tend to have
large computing infrastructures and employ a wide range of technologies. The shop
I have been working in
for the past 20 years (although not an investment bank) is one of the largest in
the country and it has a significant
COBOL investment. It also has significant Java, C and C++ investments as
well as pockets of just about every other technology
known to man. I have also met some fairly senior applications developers here that
were completely unaware that COBOL was still in use. I did a
rough line count through our source control system and found around 70 million lines of
production COBOL. Quite a few people that have worked here for years are completely oblivious to it!
I am also aware that COBOL is rapidly declining as a language of choice, but the fact
is, there is still a lot of it around today. In 1997, the period to which this question
relates, I believe COBOL would have been the dominant language in terms of LOC. The
point of the question is: Could there have been 200 billion lines of it in 1997?
Counting mainframes. Even if one were able to obtain the number of mainframes it would
be difficult to assess the "compute" power they represented. Mainframes, like most
other computers, come in a wide range of configurations and processing capacity.
If we could say there were exactly "X" mainframes in use in 1997, you still need to
estimate the processing capacity they represented, then you need to figure out what
percentage of the work load was due to running COBOL programs and a bunch of other
confounding factors. I just don't see how this line of reasoning would ever
yield an answer with an acceptable margin of error.
Multi-counting lines of code. That was exactly my point when
referring to the COBOL "fluff" factor. Counting lines of COBOL can be a very misleading statistic
simply because a significant amount of it was never written by programmers in the
first place. Or if it was, quite a bit of it was done using the cut-paste-tinker
"design pattern".
Your observation that memory was a valuable commodity in 1997 and prior is true. One would think that
this would have led to using the most efficient coding techniques and languages
available to maximize its use. However, there are other factors: The opportunity cost of having an application
backlog was often perceived to outweigh the cost of bringing in more memory/cpu to deal with less than
optimal code (which could be cranked out quite a bit faster). This thinking was further reinforced by the
observation that Moore's Law leads to ever
declining hardware costs whereas software development costs remain constant. The "logical" conclusion
was to crank out less than optimal code, suffer for a while, then reap the benefits
in the years to come (IMO, a lesson in bad planning and greed, still common today).
The push to deliver applications during the 70's through 90's led to the rise of a host of
code generators (today I see frameworks of various flavours fulfilling this role).
Many of these code generators emitted tons of COBOL code. Why emit COBOL code? Why not emit
assembler or p-code or something much more efficient? I believe the answer is
one of risk mitigation. Most code generators are proprietary pieces of software owned by some
third party who may or may not be in business or supporting their product 10 years from now.
It is a very hard sell if you can't provide an iron-clad guarantee that the generated application can be
supported into the distant future. The solution is to have the "generator" produce something
familiar - COBOL for example! Even if the "generator" dies, the resulting application can
be maintained by your existing staff of COBOL programmers. Problem solved ;) (today we see
open source used as a risk mitigation argument).
Returning to the LOC question. Counting lines of COBOL code is, in my opinion, open to
a wide margin of error or at least misinterpretation. Here are a few statistics from an application
I work on (quoted approximately). This
application was built with and is maintained using Basset Frame Technology (frame-work) so
we don't actually write COBOL but we generate COBOL from it.
Lines of COBOL: 7,000,000
Non-Procedure Division: 3,000,000
Procedure Division: 3,500,000
Comment/blank : 500,000
Non-expanded COPY directives: 40,000
COBOL verbs: 2,000,000
Programmer written procedure Division: 900,000
Application frame generated: 270,000
Corporate infrastructure frame generated: 2,330,000
As you can see, almost half of our COBOL programs are non-procedure Division code (data declaration
and the like). The ratio of LOC to Verbs (statement count) is about 7:2. Using our framework
leverages code production by about a factor of 7:1.
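Spelling out how I read those ratios (my interpretation of the numbers above, so treat it only as a rough check):

```python
total_loc = 7_000_000
verbs = 2_000_000
hand_written = 900_000           # programmer-written Procedure Division

print(total_loc / verbs)         # 3.5  -> the "about 7:2" LOC-to-verb ratio
print(total_loc / hand_written)  # ~7.8 -> the "about 7:1" leverage factor
```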
So what should you make of all this? I really don't know - except that there is a lot of room to
fluff up the COBOL line counts.
I have worked with other COBOL program generators in the past. Some of these had absolutely
stupid inflation factors (e.g. the 200 lines to 15K line fluffing mentioned earlier). Given all these
inflationary factors and the counting methodology used by Gartner, it may very well have
been possible to "fluff" up to 200 billion lines of COBOL in 1997 - but the question
remains: Of what real use is this number? What could it really mean? I have no idea. Now
let's get back to the counting angels problem!
I would never defend those clowns at Gartner, but still:
Your ideas about IBM/370s are wrong. The 370 was an architecture, not a specific machine - it was implemented on everything from water-cooled monsters to small, mini-computer sized machines (the same size as a VAX). The number sold was thus probably far larger, by orders of magnitude, than you suspect.
Also, COBOL was heavily used on DEC's VAX lineup, and before that on the DEC-10 and DEC-20 lines. In the UK it was used on all ICL mainframes. Lots of other platforms also supported it.
[Usual disclaimer - I work for a COBOL vendor]
It's an interesting question and it's always difficult to get a provable number. On the number of COBOL programmers estimates - the 2 - 3 million number may not be orders of magnitude in error. I think there have been estimates of 1 or 2 million in the past. And each one of those can generate a lot of code in a 20 year career. In India tens of thousands of new COBOL programmers are added to the pool every year (perhaps every month!).
I think the automatically generated code is probably bigger than might be thought. For example, PACBASE is a very popular application in the banking industry. A very large global bank I know of uses it extensively; they generate all their code into COBOL and estimate this generated code is 95% of their total code base, with the other 5% being hand coded/maintained. I don't think this is uncommon. The maintenance and development of those systems is typically done at the model level, not in the generated code, as you say.
There is a group of applications that were missing from the original question - COBOL isn't only a mainframe language. The early years of Micro Focus were almost entirely spent in the OEM marketplace - we used to have something like 200 different OEMs (lots of long-gone names like DEC, Stratus, Bull, ...). Every OEM had to have a COBOL compiler on their box alongside C and Assembler. A lot of big applications were built at that time and are still going strong - think about the largest HR ERP systems, the largest mobile phone billing systems etc. My point is that there is a lot of COBOL code that was never on an IBM mainframe and is often overlooked.
And finally, the size of the code base may be larger in COBOL shops than the "average". That's not just because COBOL is verbose (or was - that hasn't been the case for a long time) but because the systems are just bigger - they're in large organizations, doing a large number of disparate tasks. It's very common for sites to have tens of millions of LoC.
I don't have figures, but my first "real" job was on IBM 370s.
First: Number sold. In 1974, a large railway ran on three 370s. These were big 370s, though, and you could get a small one for a whole lot less. (For perspective, at that time whether to get another megabyte was a decision on the VP level.)
Second: COBOL code is what you might call "fluffy", since a typical sentence (COBOLese for line) might be "ADD 1 TO MAIN-ACCOUNT OF CUSTOMER." There would be relatively few machine instructions per line. (This is especially true on the IBM 360 and onwards, which had machine instructions designed around executing COBOL.) BTW, addresses were 16 bits, four to designate a register (using the bottom 24 bits as a base address) and 12 as an offset. This meant that something under 64K could be addressed at a time, since not all of the 16 registers could be used as base registers for various reasons. Don't underestimate the ability of people to fit programs into small memory.
Also, don't underestimate the number of programs. The program library would be on disk and tape, and was essentially only limited by cost and volume. (Earlier on, they'd be on punch cards, which had serious problems as data and program storage.)
Third: Yes, most software was hand-written for the business at that time. Machines were far more expensive then, and people were cheaper. Programs were written to automate the existing business processes, and the idea that you could get outside software and change your business practices was almost heresy. This changed over time, of course.
Fourth: Programmers could go much faster than today, in lines of code per person-year, since these were largely big dumb programs for big dumb problems. In my experience, the DATA DIVISION was a large part of each COBOL program, and that would frequently take large descriptions of file layouts and repeat them in each program in the series.
I have limited experience with program generators, but it was very common at the time to use one to generate an application and then modify the output. This was partly just bad practice, and partly because a program generator couldn't do everything needed.
Fifth: PL/I was not heavily used, despite IBM's efforts. It ran into early bad press, although as far as I know the only real major problem that couldn't be fixed was figuring out the precision system. The Defense Department used Ada and COBOL for entirely different things. You are omitting assembly language as a competitor, and lots of small shops used BAL (also called ASM) instead of COBOL. After all, programmers were cheap, compilers were expensive, and there were a whole lot of things COBOL couldn't do. It was actually a very nice assembly language, and I liked it a lot.
Well, you're asking in the wrong place here. This forum is dominated by .NET programmers, with a significant Java minority and such an age build-up that only a very small minority has any COBOL experience.
The CASE tool market consisted in large part of COBOL code generators. Most tools were write-only, not two-way. That ensures there are a lot of lines of code. This market was somewhat newer than the 70s, so the volume of COBOL code grew fast in the 80s and 90s.
A lot of COBOL development is done by people having no significant internet access and therefore no visibility. There is no need for it. COBOL programmers are used to having in-house programming courses and paper documentation (lots of it).
[edit] Generating COBOL source made a lot of sense. COBOL is very verbose and low level. The various COBOL implementations are all slightly different, so configuring the code generator eliminates a lot of potential errors.
With regards to #4: how much of that could have been machine-generated code? I don't know if template-based code was used a lot with Cobol, but I see a lot of it used now for all sorts of things. If my application has thousands of LOC that were machine generated, that doesn't mean much. The last code-generating script I wrote took 20 minutes to write, 10 minutes to format the input, 2 minutes to run, then an hour to execute a suite of automatic tests to verify it actually worked, but the code it generated would have taken several days to do by hand (or the time between the morning meeting and lunch, doin' it my way ;) ). Ok I admit it's not always that easy and there is often manual tweaking involved, but I still don't think the LOC metric has much meaning if code-generators are in heavy use.
Maybe that's how they generated so much code in so little time.
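For illustration only, a minimal sketch of what I mean by a template-based generator (Python, invented field list): a few field definitions fan out into dozens of lines of boilerplate, which is exactly why generated LOC is a poor measure of effort.

```python
# A toy template-based generator: a short spec fans out into many repetitive
# lines of output, inflating any raw LOC count. All names invented.
FIELDS = [                         # the "input": a handful of field definitions
    ("customer_id", "int"),
    ("name", "str"),
    ("balance", "float"),
]

METHOD_TEMPLATE = """
    def get_{field}(self):
        # Return the {field} field ({ftype}).
        return self._{field}

    def set_{field}(self, value):
        # Coerce and assign {field}.
        self._{field} = {ftype}(value)
"""

def generate(class_name):
    body = "".join(METHOD_TEMPLATE.format(field=f, ftype=t) for f, t in FIELDS)
    return f"class {class_name}:{body}"

code = generate("CustomerRecord")
print(code)
print(f"# {len(FIELDS)} field definitions expanded to "
      f"{len(code.splitlines())} generated lines")
```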

Predictive vs Reactive software design [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I know that for me, I first got started following the waterfall method of project management, and along with that I took the predictive approach to software design. By this I mean we had huge packets of documentation: UML, database schemas, data dictionaries, workflows, activity diagrams, etc.
Having worked in software for over 10 years now, I find it to be much more realistic to approach software design from a reactive approach. I frequently follow a scrum approach to project management, and with that very little heavy documentation is ever generated. We have very little workflow specification (though workflows still have their use). This is a much more dynamic approach to software creation. Of course, along with it comes frequent refactoring as time goes on, as we discover new features over time that, had we planned for them up front, would have changed things dramatically.
The big difference for us is that the first approach takes longer, seems to fail more frequently in a software construction world, and isn't nearly as flexible. The second approach provides more flexibility, makes us aware of failure faster (so we can course correct faster), and provides some form of functionality at the end of every iteration.
Knowing both sides from experience, I still find many people that LOVE the waterfall approach over the agile approach for software development. I don't get it.
question: Why would someone use waterfall over some form of agile with all of the research backing agile? What are strong arguments for using waterfall over agile?
When I started programming (with COBOL no less), waterfall was the "new" approach. Today, I'd tell you that I use a waterfallish agile methodology. For larger systems, I find a waterfall-type start works best. Not creating huge documents (that's a waste of time IMO), but rather taking some steps like creating a UI prototype and/or use cases to get our heads around the business problem at hand. Once we are comfortable we have the problem scoped and we have a solid understanding, we move into an agile development mode.
To answer your question though, I think the big reason waterfall sticks around is many people don't like change. It's scary to change and moving from waterfall to agile is a big change.
I think that part of the reason why people still often cling to waterfall is that it gives the illusion of control. In a waterfall, you can do enough up front work to put together a beautiful schedule that nicely addresses every contingency that you can think of, and then give a detailed roadmap for the future to anyone on the business side who asks when feature X will be available.
The problem is that you can almost never follow that plan to the letter, and you are almost always late/dropping features. However, from up front it looks very controlled and manageable.
I'm a big Agile fan, but what I've always struggled with is the long range roadmap/forecasting that is often asked for by the sales and marketing folks. I think that the waterfall's illusion of certainty is very comforting to managers and business folks.
My boss tells me to.
I suspect many people have no choice and old bosses don't learn new tricks.
Not taking sides, but pretty much any research would be unscientific at best.
You say (emphasis is mine)
question: Why would someone use waterfall over some form of agile with all of the research backing agile? What are strong arguments for using waterfall over agile?
but don't link to any studies.
It's one of those things that are known to be extremely difficult to actually test. You can't have two identical teams work on the same project at the same time, because there's no such thing as two identical teams. You can't have the same team complete the same task twice in a row using two different methodologies without the first pass tainting the second. I've never heard of anyone designing an experimental (or even statistical) study that can convincingly argue for any software development methodology. I'd be interested to see one though, if you have a link.
Short of real evidence, it boils down to personal preference. What are the strong arguments for chocolate over vanilla?
I'll play devil's advocate and state that agile is flawed in nearly as many ways as the waterfall method is. I'm not one of those who love the waterfall method, but I don't love agile either.
My experience with agile hasn't been very positive. To be fair, I used it in a corporate environment, which paid lip service to "agile" while still expecting our manager to produce long term milestones and deliverables upfront.
However, I found that agile (scrum in particular) methodologies often disguise major problems with design. While waterfall gives managers the illusion of control, agile seems to do the same for development teams. I've seen teams where bringing up any issue that isn't in the current sprint/iteration is frowned upon, with the expectation that it'll be handled "in time". It only takes a few major design decisions being ignored for the project to go belly up in the future, while current iterations go smoothly and the project looks to be on track.
You can argue that the team's at fault for not understanding the spirit of agile, but I'd like to see better methodologies that incorporate the best parts of agile.
One of the premises of (at least) XP is that change is cheap. The waterfall model was built on the principles that change, any change, is costly. The assumption in the waterfall model is that once software has been written, changing it is more expensive than investing the time up front to come to a "complete" understanding of the problem.
Experience seems to indicate that it is very hard to come to a complete understanding of the problem and that if some precautions are taken (e.g. Unit Testing) change can become a lot cheaper. Therefore if you encounter a problem where some of the agile premises don't hold true other approaches might become feasible again. In between Waterfall and Agile there is at least Spiral development which is - sort of - what we practice.
You need to be predictive enough to deliver the goods. You need to be reactive enough to deal with the issues.
I was once stuck with six months to complete a project estimated to take a year, and which based on past experience would take two. So I spent three months researching methodologies. We finished on time (in the remaining three months), using the appropriate parts of a waterfall process.
A few points that made the methodology work:
- Create and use standards; update them when needed.
- Build libraries: Do it once, do it well, fix it without breaking existing code.
- Do just enough documentation.
- Version control everything you can.
- Break things down; a method should either manage work or do work.
- Increase cohesion, decrease coupling, reuse.
- Buy or build the tools you need.
- Track your issues and progress.
Another project I was briefly involved in was a six-month project. I didn't get involved until a year and a half after it started. The development lead had been hired at an extreme markup as he was leaving a career with a pension plan. At the start of the project he asked the project manager, "Do you want me to do it right or be reactive?" Can you guess the answer? The week I was involved, the same feature was implemented on Monday, Wednesday, and Friday. Guess what happened Tuesday and Thursday?
The strength of Agile is its emphasis on just enough, just in time. The strength of the waterfall methodology is that it covers all the things you need to think about. I've yet to work on a project that did or should have done all the steps. I have worked on many projects that did steps which should have been done on a corporate basis.
The title says it all. (Actually: proactive vs reactive). Why choose the reactive way and give up control if you don't have to? Waterfall is not the only alternative; you can have any kind of development process that you refine as you like. Control is the key.
It's a spectrum, by the way, with waterfall on one end and totally reactive, zero-documentation methods on the other. If you work in the consulting industry for powerful (and usually indecisive) clients, you have to resort to reactive methods. If you develop shrinkwrap software you can plan ahead and manage knowledge. Some projects also require tons of specifications and rules, where the code-and-fix approach just doesn't cut it. For me software engineering is primarily about knowledge management and design; coding comes second.
P.S. There is no such thing as agile and fixed price. Not in the classical way they usually sell the method. See http://martinfowler.com/bliki/FixedPrice.html
If you know the exact requirements and they never change, if you know how long each step will take, and if you know all the resources will be available when needed, then you can do waterfall and it will work. But indeed such projects are quite rare, and I think I will never be part of one.
When designing systems to be used by end users, agile often works well because the requirements are likely to be incorrect and a large part of the process is getting feedback from users in the form of a usable version.
However, when creating software that interfaces with other software, the requirements can often be worked out very clearly. In this case it is often more productive to ensure that you have a very clear and accurate specification and unit tests. In this model you can also produce fairly good work estimates, and using the agile model would add a great deal more cost.
If you have a team of a few dozen people that have over the course of a decade, refined the waterfall strategy to the point that it works well for them, who are you to come in and say, "You're doing it wrong..."? Really, if it is working for them, why change things? Yes this is merely flipping the question around but I think it may be a valid point.
In my team we've found that with maintenance projects (which is the bulk of what we do) where we're tweaking or replacing like with like there isn't always as much need for user input beyond perhaps some UI prototypes.
In that case, particularly given that there are commercial deals involved the waterfall approach at a macro level can fit well. Even then we still like incremental / agile approaches at the implementation level.
It is worth noting that most of our clients are large lumbering organisations in love with their paperwork, so that adds even more impetus for us to at least appear traditional to them.
The documentation generated during the waterfall process allows for a lot of CYA. You can point fingers when a project goes off the rails. Very few executives are going to be OK with "oh well, I guess that project got away from us! Well, at least we found out early, no harm no foul!"
Also, design docs can automatically generate test plans, which is useful for QA.
It's pretty common when bidding for a contract that one of the iron-clad conditions is that you follow their "process" which on inspection is waterfall.

When must you use poor design to finish a project?

There are many different bad practices, such as memory leaks, that are easy to slip into a program by accident. Sometimes they might even be what jury-rigs your program together.
I'm working on a project right now and it works if I deliberately put a memory leak in my code. If I take the leak out, the code crashes. Obviously this is bad, and needs to be (and will be) fixed soon.
My question is, when do you decide to deliver code in this state, if it's not possible to release code without these poor practices, in time?
If the problem's impact on actual usage of the system can be reasonably expected to be none or negligible, and the delivery date cannot be pushed back, and it can be fixed within a scope of time before the problem's impact becomes more than negligible, ship it.
Obviously this is not ideal or even recommended, but you're clearly pushed into a corner at this point. Sometimes there are no good choices, but pragmatism must win over formal correctness. If an application has a memory leak, but we can reasonably expect that the app will be recycled or machine restarted or whatever before the leak becomes a real problem, that can sometimes be better than delivering late. It depends on the conditions of the agreement and the customer.
It's always better to at least try to push back the delivery date, but I am assuming you've already tried that and it's not an option here.
It is typical once an application has been shipped to ignore technical debt and move on. It's the responsibility of the developers to clearly communicate to the stakeholders the importance of paying off some of that debt, especially in a case like this.
However, given that it seems the customer cares more about a delivery date than correctness, it's less likely anyone will be convinced to pay off any debt once you go live. This is a bad situation to be in. Only the person with all the facts can make the right choice.
"My question is, when do you decide to deliver code in this state, if it's not possible to release code without these poor practices, in time?"
Never.
What you do instead: prioritize and focus.
If what you're working on is really high-priority, and you've mis-designed it, something low-priority has to be sacrificed. Often, some feature(s) must be delayed to give you time to focus on the high-priority feature that doesn't work.
If what you're working on is really low-priority, you have to ask why you're not working on something higher-priority. And you still have to focus and prioritize. Sometimes things which are very low priority must be sacrificed.
When you can't do "everything" you have to pick things you can do that will be reasonably bug-free.
You might be interested in the concept of technical debt.
You only have three knobs you can turn when shipping software, assuming a fixed number of developers: features, quality and ship date, and turning one up means the others get turned down.
One of the most difficult things to do in software development is to build your product with the knobs set just right. For example, the Duke Nukem Forever guys have turned the features and quality knobs up to eleven and thrown the ship-date knob out the window. Microsoft often seems to glue the ship-date knob in place and turn down the feature knob as needed, then unglue the ship-date knob, turn it up a bit, glue it back down and continue twiddling the other two. And there seems to be an endless number of products out there that ship all the time but never put in the features they need to be successful.
In the end, you don't get paid if you don't deliver. Having poor quality hurts you terribly in the long run; reputations are hard to rebuild. It has almost always been that the right thing to do is to cut features if you have too many bugs. Always.
However, bug triage is just as important as feature development. What kind of leak are we talking about here? Are you leaking a byte? A small object? One thousand objects? Entire DLLs? There are scenarios where it's probably better to leak a little than to fail to deliver the product.
And what do you mean by leak? Does your application have a well defined life cycle? Where you allocate something once at startup and then never free it until the process dies? Well how long does your process run? Do you expect to run multiple copies of your process?
Obviously you never want to leak, and you should work to develop best practices that minimize leaking, but in the end you have to make a judgment call. Maybe you can just explain the bug to your customers, explain the impact, and they'll buy it anyway. Maybe you can patch it a week later. Maybe you really do need to fix it. But we'd need to know more about it to give good advice.
I will say I have shipped known leaks in the past. I won't say what product or company, but I had a bug where DLL dependencies and insane lifetime management made it next to impossible to correctly free our references to a certain DLL once it was loaded, so in the end we just leaked it. And I still think it was the right thing to do. Other times I've seen things deliberately leaked to keep third party code that was written incorrectly from crashing (though that is a completely separate debate).
But in the end I believe such instances are rare, and once you have identified the source of a memory leak, it shouldn't take much more than a day to fix it. It is rare indeed that I would ship with a memory leak that was known and for which a fix was known. It would have to be something that required a major re-architecture involving changing a threading model, or refactoring huge swaths of code, and it would literally have to be a day or two before the product was to ship. At that point I might just leak the memory and promise a patch in a week, after proper testing could be done on the re-architected code.
I would be very uncomfortable releasing with such a known bug. It is likely to occur in another way.
You have not specified your environment or language, but I suggest you look at using a memory checking tool such as:
Purify (trial available)
BoundsChecker
Valgrind
or even a free one, Visual Leak Detector
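And if you happen to be on a managed runtime where the classic leak detectors don't apply, the same idea works with snapshot comparison; for example, a rough sketch using Python's standard-library tracemalloc:

```python
# Rough sketch: use tracemalloc (Python standard library) to spot allocations
# that keep growing between snapshots, the managed-heap cousin of a leak.
import tracemalloc

_cache = []                         # deliberately unbounded, to simulate the leak

def handle_request(payload):
    _cache.append(payload * 1000)   # keeps a reference forever

tracemalloc.start()
before = tracemalloc.take_snapshot()

for _ in range(10_000):
    handle_request("x")

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)                     # the _cache.append line should dominate the growth
```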
Perhaps, when you are not going to be around to maintain the code later, you don't care about your client/employer and none of the ramifications of your code could possibly affect you.
In other words, in your professional coding life, it's never a really good idea.
If you are working for someone that is less concerned about code quality than you are and simply wants you to finish the code at all costs, then I can see how you'd be in a difficult situation. Finishing faster but poorer will earn you some immediate reward. You should remember though that even if failing to meet your employer/client's expectation for a milestone bites you only once, your poor code may continue to bite you into the future, not only through the difficulties in maintaining it but also through the negative impression others may form of you down the track.
If it's a single (or limited) memory leak, and it doesn't grow, and say it only causes a crash when shutting down (the most common case of stuff like this), then it depends. If it's client/desktop software and the users are going to crash every time on their way out, I'd make it an ultra-high priority. If it's a server, and the only one running the server is you, and everything else works fine, I'd say it's all right to enter beta. But if the leaks grow, or can cause crashes at "random" times, they need to be fixed asap.
To get past an internal milestone, it's arguable, although still nothing to be taken lightly.
To release, never. It always comes back and bites you. If your software is in such a bad space that a piece of poor design will get it over the line, you've got much bigger problems looming round the corner.
Never, unless you don't care about the poor developer who is going to be maintaining your work afterwards.
Ultimately, a decision like this should be made by the customer or the project manager. Individual developers should not be making these kinds of decisions alone, or keeping this information to themselves.
Tell them what the problem is, and what the consequences will be for not fixing it. If they want you to ship it broken on time, that's their call.
If you don't want to work for people who accept shoddy products, that's your call, but it's a mistake to think that developers have some sort of professional responsibility to ignore their clients' and bosses' quality/cost/time priorities.
If somebody may actually die if you ship bad software, then don't do it, but if the worst-case scenario is that somebody is going to have to reboot a couple times per day, then do what you're told or find another job.
