Any good strategies for dealing with 'not reproducible' bugs? [closed] - bug-tracking

Very often you will get or submit bug reports for defects that are 'not reproducible'. They may be reproducible on your computer or software project, but not on a vendor's system. Or the user supplies steps to reproduce, but you can't see the defect locally. There are many variations on this scenario, of course, so to simplify, what I'm trying to learn is:
What is your company's policy towards 'not reproducible' bugs? Shelve them, close them, ignore them? I occasionally see intermittent, non-reproducible bugs in third-party frameworks, and these are pretty much always closed instantly by the vendor... but they are real bugs.
Have you found any techniques that help in fixing these types of bugs? Usually what I do is get a system info report from the user along with steps to reproduce, then search on keywords and try to spot any sort of pattern.

Verify the steps used to produce the error
Oftentimes the people reporting the error, or the people reproducing the error, will do something wrong and not end up in the same state, even if they think they are. Try to walk it through with the reporting party. I've had a user INSIST that the admin privileges were not appearing correctly. I tried reproducing the error and was unable to. When we walked it through together, it turned out he was logging in as a regular user in that case.
Verify the system/environment used to produce the error
I've found many 'irreproducible' bugs and only later discovered that they ARE reproducible on Mac OS X (10.4) running version X of Safari. And this doesn't apply only to browsers and rendering; it can apply to anything: the other applications that are currently being run, whether the user is on RDP or local, admin or regular user, etc. Make certain you get your environment as close to theirs as possible before calling it irreproducible.
Gather Screenshots and Logs
Once you have verified that the user is doing everything correctly and still getting a bug, and that you're doing exactly what they do, and you are NOT getting the bug, then it's time to see what you can actually do about it. Screenshots and logs are critical. You want to know exactly what it looks like, and exactly what was going on at the time.
It is possible that the logs contain information about conditions that you can recreate on your system, and once you can reproduce the exact scenario, you might be able to coax the error out of hiding.
Screenshots also help with this, because you might discover that "X piece has loaded correctly, but it shouldn't have because it is dependent on Y", and that might give you a hint. Even if the user can describe what they were doing, a screenshot can help even more.
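Once you have a log in hand, even a crude keyword scan can surface patterns across reports. Here is a minimal sketch in Python; the plain-text format and the ERROR/WARN-style markers are assumptions about what your logs look like, so adjust them to your own conventions:

    import re
    from collections import Counter
    from pathlib import Path

    # Keywords are an assumption about the log format; adjust to your own markers.
    ERROR_PATTERN = re.compile(r"\b(ERROR|EXCEPTION|FATAL|WARN)\b", re.IGNORECASE)

    def summarize_log(path: str, context: int = 3) -> None:
        """Print each suspicious line with surrounding context, then a frequency summary."""
        lines = Path(path).read_text(errors="replace").splitlines()
        counts = Counter()
        for i, line in enumerate(lines):
            if ERROR_PATTERN.search(line):
                counts[line.strip()[:80]] += 1
                start = max(0, i - context)
                print("\n".join(lines[start:i + context + 1]))
                print("-" * 40)
        print("Most frequent suspicious lines:")
        for message, count in counts.most_common(5):
            print(f"{count:5d}  {message}")

    # Usage: summarize_log("user-supplied.log")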
Gather step-by-step description from the user
It's very common to blame the users and not trust anything that they say (because they call a 'usercontrol' a 'thingy'), but even though they might not know the names of what they're seeing, they will still be able to describe some of the behaviour they are seeing. This includes minor errors that may have occurred a few minutes BEFORE the real error occurred, or possibly slowness in certain things that are usually fast. All these things can be clues to help you narrow down which aspect is causing the error on their machine and not yours.
Try Alternate Approaches to produce the error
If all else fails, try looking at the section of code that is causing problems, and possibly refactor or use a workaround. If it is possible for you to create a scenario where you start with half the information already there (hopefully in UAT), ask the user to try that approach and see if the error still occurs. Do your best to create alternate but similar approaches that put the error in a different light so that you can examine it better.

Short answer: Conduct a detailed code review on the suspected faulty code, with the aim of fixing any theoretical bugs, and adding code to monitor and log any future faults.
Long answer:
To give a real-world example from the embedded systems world: we make industrial equipment, containing custom electronics, and embedded software running on it.
A customer reported that a number of devices on a single site were experiencing the same fault at random intervals. Their symptoms were the same in each case, but they couldn't identify an obvious cause.
Obviously our first step was to try and reproduce the fault in the same device in our lab, but we were unable to do this.
So, instead, we circulated the suspected faulty code within the department, to try and get as many ideas and suggestions as possible. We then held a number of code review meetings to discuss these ideas, and determine a theory which: (a) explained the most likely cause of the faults observed in the field; (b) explained why we were unable to reproduce it; and (c) led to improvements we could make to the code to prevent the fault happening in the future.
In addition to the (theoretical) bug fixes, we also added monitoring and logging code, so if the fault were to occur again, we could extract useful data from the device in question.
To the best of my knowledge, this improved software was subsequently deployed on site, and appears to have been successful.

resolved "sterile" and "spooky"
We have two closed bug categories for this situation.
sterile - cannot reproduce.
spooky - it's acknowledged there is a problem, but it just appears intermittently, isn't quite understandable, and gives everyone a faint case of the creeps.

Error-reporting, log files, and stern demands to "Contact me immediately if this happens again."

If it happens in one context, and not in another, we try to enumerate the difference between both, and eliminate them.
Sometimes this works (e.g. other hardware, dual core vs. hyperthreading, laptop-disk vs. workstation disk, ...).
Sometimes it doesn't. If it's possible, we may start remote debugging. If that doesn't help, we may try to get our hands on the customer's system.
But of course, we don't write too many bugs in the first place :)

Well, you try your best to reproduce it, and if you can't, you take a long think and consider how such a problem might arise. If you still have no idea, then there's not much you can do about it.

Some of the new features in Visual Studio 2010 will help. See:
Historical Debugger and Test Impact Analysis in Visual Studio Team System 2010
Better Software Quality with Visual Studio Team System 2010
Manual Testing with Visual Studio Team System 2010

Sometimes the bug is not reproducible even in a pre-production environment that is the exact duplicate of the production environment. Concurrency issues are notorious for this.
Random Failures Are Often Concurrency Issues
Link: https://pragprog.com/tips/
The reason can be simply because of the Heisenberg effect, i.e. observation changes behaviour. Another reason can be because the chances are very small of hitting the combination of events that triggers the bug.
Sometimes you are lucky and you have audit logs that you can play back, greatly increasing the chances of recreating the issue. You can also stress the environment with high volumes of transactions. This effectively compresses time: if the bug occurs, say, once a week, you may be able to reliably reproduce it in one day by stressing the system to 7X the production load.
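As a rough sketch of that time-compression idea in Python: hammer a deliberately unsynchronized operation from many threads and fail as soon as its invariant breaks. Account.withdraw here is a stand-in for whatever suspect production code you actually have:

    import threading

    class Account:
        """Deliberately unsynchronized shared state, standing in for the suspect code."""
        def __init__(self, balance: int) -> None:
            self.balance = balance

        def withdraw(self, amount: int) -> None:
            current = self.balance            # read
            self.balance = current - amount   # write: classic lost-update window

    def stress(iterations: int = 100_000, workers: int = 8) -> None:
        account = Account(balance=iterations * workers)

        def worker() -> None:
            for _ in range(iterations):
                account.withdraw(1)

        threads = [threading.Thread(target=worker) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # If withdraw() were thread-safe the balance would be exactly zero.
        assert account.balance == 0, f"lost updates detected: balance={account.balance}"

    if __name__ == "__main__":
        stress()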
The last resort is whitebox testing where you go through the code line by line writing unit tests as you go.

I add logging to the exception handling code throughout the program. You need a method to collect the logs (users can email it, etc.)
Preemptive checks for code versions and sane environments are a good thing too. With the ease of software updates these days the code and environment the user is running has almost certainly not been tested. It didn't exist when you released your code.
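A minimal sketch of both ideas in Python: route unhandled exceptions into a log file the user can send you, and record the build and environment up front. The file name and version constant are placeholders:

    import logging
    import platform
    import sys

    APP_VERSION = "1.4.2"             # placeholder; record whatever identifies your build
    LOG_FILE = "app-diagnostics.log"  # placeholder path the user can attach to a report

    logging.basicConfig(filename=LOG_FILE, level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")

    def log_startup_environment() -> None:
        # Preemptive sanity data: the first thing you want when a report arrives
        # from an environment you never tested.
        logging.info("version=%s python=%s platform=%s",
                     APP_VERSION, sys.version.split()[0], platform.platform())

    def log_unhandled(exc_type, exc_value, exc_tb) -> None:
        logging.critical("unhandled exception", exc_info=(exc_type, exc_value, exc_tb))
        sys.__excepthook__(exc_type, exc_value, exc_tb)  # keep the default behaviour too

    sys.excepthook = log_unhandled
    log_startup_environment()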

With a web project I'm developing at the moment, I'm doing something very similar to your technique. I'm building a page that I can direct users to in order to collect information such as their browser version and operating system. I'll also be collecting the app's registry info so I can have a look at what they've been doing.
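A sketch of such a collection page, assuming a Python/Flask backend (the answer doesn't name a stack, so Flask and the route are purely illustrative):

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/diagnostics")
    def diagnostics():
        # Point users here when they report a problem; log what their browser sends.
        info = {
            "user_agent": request.headers.get("User-Agent", "unknown"),
            "accept_language": request.headers.get("Accept-Language", "unknown"),
            "remote_addr": request.remote_addr,
        }
        app.logger.info("diagnostics report: %s", info)
        return jsonify(info)

    if __name__ == "__main__":
        app.run()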
This is a very real problem. I can only speak for web development, but I find users are rarely able to give me the basic information I would need to look into the issue. I suspect it's entirely possible to do something similar with other kinds of development. My plan is to keep working on this system to make it more and more useful.
But my policy is never to close a bug simply because I can't reproduce it, no matter how annoying it may be. And then there are the cases where it's not a bug, but the user has simply gotten confused. Which is a different type of bug, I guess, but just as important.

You talk about problems that are reproducible but only on some systems. These are easy to handle:
First step: By using some sort of remote software, you let the customer tell you what to do to reproduce the problem on the system that has it. If this fails, then close it.
Second step: Try to reproduce the problem on another system. If this fails, make an exact copy of the customer's system.
Third step: If it still fails, you have no option other than to try to debug it on the customer's system.
Once you can reproduce it, you can fix it. Doesn't matter on what system.
The tricky ones are the truly non-reproducible issues, that is, things that happen only intermittently. For those I'll have to chime in with the reports, logs, and stern demands attitude. :)

It is important to categorize such bugs (rarely reproducible) and act on them differently than bugs that are frequently reproducible based on specific user actions.
Clear issue description along with steps to reproduce and observed behavior: Unambiguous reporting helps the entire team understand the issue and eliminates incorrect conclusions. For example, a user reporting a blank screen is different from an HMI freeze on a user action. The sequence of steps and the approximate timing of user actions are also important. Did the user select the option immediately after the screen transition, or wait for a few minutes? An interesting timing-related bug is the car that was allergic to vanilla ice cream, which baffled automotive engineers.
System config and startup parameters: Sometimes even the hardware configuration and application software version (including driver and firmware versions) play a part. A mismatch of version or configuration can result in issues that are difficult to reproduce on other setups, so these are essential details to capture. Most bug reporting tools make these details mandatory parameters when logging an issue.
Extensive Logging: This depends on the logging facilities available in the project. While working with embedded Linux systems, we provide not only general diagnostic logs but also system-level logs such as dmesg or top output (a sketch of such a capture appears at the end of this answer). The faulty part may not be the code flow at all but abnormal memory or CPU usage. Identify the type of issue and capture the relevant logs for investigation.
Code Reviews and Walk-throughs: Dev teams cannot wait forever to reproduce these issues at their end before taking action. The bug report and available logs should be investigated and various possibilities identified on that basis from the design and code. If required, they should prepare a hotfix addressing the likely root causes and circulate it among the teams, including the tester who identified the issue, to see whether the bug is still reproducible with it.
Don't close these issues based on the observations of a single tester or team after a fix is identified and checked in: Perhaps the most important part is the approach followed to close these issues. Once the fix has been checked in, all testing/validation teams at different locations should be informed so they can run intensive tests and identify any regressions. Only when all of them (practically, most of them) report it as non-reproducible should a closure assessment be made by senior management.
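Picking up the Extensive Logging point above, here is a small sketch of bundling system-level output alongside the application logs when an issue is hit; it assumes a Linux host where dmesg and top are available:

    import subprocess
    from pathlib import Path

    def capture_system_snapshot(out_dir: str = "diagnostics") -> None:
        """Dump kernel and process state next to the application logs (Linux assumed)."""
        Path(out_dir).mkdir(exist_ok=True)
        commands = {
            "dmesg.txt": ["dmesg"],
            "top.txt": ["top", "-b", "-n", "1"],      # one batch-mode iteration
            "meminfo.txt": ["cat", "/proc/meminfo"],
        }
        for name, cmd in commands.items():
            try:
                result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
                Path(out_dir, name).write_text(result.stdout or result.stderr)
            except (OSError, subprocess.TimeoutExpired) as exc:
                Path(out_dir, name).write_text(f"failed to run {cmd}: {exc}")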

If it is not reproducible, get logs and screenshots of the exact steps to reproduce.

There's a nice new feature in Windows 7 that allows the user to record what they're doing and then send a report - it comes through as a doc with screenshots of every stage. Hopefully it'll help in the cases where the user is interacting with the application in an order that the developer wouldn't think of. I've seen plenty of bugs where it's just a case of the developer's logical way of using the app not fitting with how end users actually do it... resulting in lots of subtle errors.

Logging is your friend!
Generally what happens when we discover a bug that we can't reproduce is we either ask the customer to turn on more logging (if it's available), or we release a version with extra logging added around the area we are interested in. Generally speaking, the logging we have is excellent and can be very verbose, so releasing versions with extra logging doesn't happen often.
You should also consider the use of memory dumps (which IMO also falls under the umbrella of logging). Producing a minidump is so quick that it can usually be done on production servers, even under load (as long as the number of dumps being produced is low).
The way I see it: being able to reproduce a problem is nice because it gives you an environment where you can debug, experiment and play around more freely, but reproducing a bug is by no means essential to debugging it! If the bug is only happening on someone else's system, then you still need to diagnose and debug the problem in the same way; it's just that this time you need to be cleverer about how you do it.

The accepted answer is the best general approach. At a high level, it's worth weighing the importance of fixing the bug against what you could add as a feature or enhancement that would benefit the user. Could a 'non-reproducible' bug take two days to fix? Could a feature be added in that time that gives users more benefit than that bug fix? Maybe the users would prefer the feature. I've been fixated at times as a developer on imperfections I can see, and then when users are asked for feedback, none of them mention the bug(s) I can see, but the software is missing a feature that they really want!
Sometimes, simple persistence in attempting to reproduce the bug whilst debugging can be the most effective approach. For this strategy to work, the bug needs to be 'intermittent' rather than completely 'non-reproducible'. If you can repeat a bug even one time in ten, and you have ideas about the most likely place it's occurring, you can place breakpoints at those points and then doggedly attempt to repeat the bug and see exactly what's going on. I've found this to be more effective than logging in one or two cases (although logging would be my first go-to in general).
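A tiny harness sketch for that strategy in Python: run the flaky operation in a loop until it fails, then drop into the debugger with the failing state still in hand. flaky_operation is a placeholder for whatever code path you suspect:

    import pdb
    import random
    import traceback

    def flaky_operation() -> None:
        # Placeholder for the suspect code path; fails roughly one time in ten.
        if random.random() < 0.1:
            raise RuntimeError("intermittent failure reproduced")

    def repeat_until_failure(max_attempts: int = 1000) -> None:
        for attempt in range(1, max_attempts + 1):
            try:
                flaky_operation()
            except Exception:
                print(f"failure reproduced on attempt {attempt}")
                traceback.print_exc()
                pdb.post_mortem()   # inspect locals at the point of failure
                return
        print(f"no failure in {max_attempts} attempts")

    if __name__ == "__main__":
        repeat_until_failure()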

Related

Is PetraVM Jinx Beta 1 good?

PetraVM recently came out with a Beta release of their Jinx product. Has anyone checked it out yet? Any feedback?
By good, I mean:
1) easy to use
2) intuitive
3) useful
4) doesn't take a lot of code to integrate
... those kinds of things.
Thanks guys!
After literally stumbling across Jinx while poking around on Google, I have been on the beta and pre-beta tests with a fair amount of usage already under my belt. As with any beta related comments please understand that things may change or already have changed, so do keep this in mind and take the following with a grain of salt.
So, going through the list of questions one by one:
1) Install and go. Jinx adds a control panel to Visual Studio which you can mostly ignore as the defaults are typically good for most cases. Otherwise you just work normally and forget about it. Jinx does not instrument your code, require any additional libraries linked in or the numerous other things some tools require.
2) The question of "intuitive" is really up to the user. If you understand threaded code and the sorts of bugs possible, Jinx just makes those bugs happen much more frequently, which by itself is a huge benefit to people doing threaded code. While Jinx attempts to stop the code in a state that makes the problem as obvious as possible, "obvious" and "intuitive" are really up to the skill of the programmer.
3) Useful? Anyone who has done threaded code before knows that a race condition can happen regularly or once every month based on cosmic ray counts, and that randomness makes debugging threaded code very difficult. With Jinx, even the most minor race condition can usually be reproduced consistently on the first run. This works even for lockless code that other static analysis or instrumenting tools would generally miss.
This sort of quick reproduction of problems is amazingly useful. Jinx has helped me track down a "one instruction in the wrong place" sort of bug that would actually hit once a week at most. Jinx forced the crash to happen almost immediately and allowed me to focus on the actual cause of the bug instead of being completely in the dark as to the real source.
4) Integration with Jinx is a breeze. If you don't mind your machine becoming a bit slow, you can tell Jinx to watch the entire machine. It slows the machine down as it is actually watching everything on the machine, including the OS. While interesting and useful if your software consists of multiple processes on the same machine, this is not suggested as it can become painful to work with the machine.
Instead of using the global system, adding an include and two lines of code does the primary work needed of registering the process with Jinx such that Jinx can watch just the registered items. You can help Jinx by using the Jinx specific asserts and registering regions of code that are more important. In the case of the crash mentioned above though, I didn't have to do any of that and Jinx found the problem without the additional integration work. In any case, the integration is extremely simple.
After using Jinx for the last couple of months, I have to say that overall it has been a great pleasure. I won't write new threaded code without Jinx running in the background anymore, simply because it does its intended job of forcing obscure threading issues to become immediate asserts/crashes. As mentioned, things that you could go weeks without seeing become problems almost immediately, which is a wonderful thing to have during initial test and implementation.
KRB
BTW, PetraVM has changed its name to Corensic and you can find Jinx Beta 2 over at www.corensic.com.
--Prashant, the marketing guy at Corensic

Bugs versus enhancement versus new feature [closed]

When planning and prioritizing what is to be included in a release, do you distinguish between bugs, feature enhancements and new features?
For example, do bugs always take priority - do you fix all known bugs before working on new features? Do you use a formal system for comparing the cost vs. value of each change in your backlog? And if so, do you compare bugs and features using the same formula? Is this different for commercial software vs. open source vs. in-house corporate software?
EDIT: Some great responses - thanks. While I had a preconceived opinion that you need to treat bugs, features, and enhancements all the same, and simply select the work based on the cost/benefit of each, I think the reality is that this depends on your situation.
This choice is called triage, a term from emergency departments in hospitals where they have to decide who gets treated (and sometimes, unfortunately, who lives and dies).
As with all business decisions, it's a cost/benefit problem. What is the benefit of fixing a bug or adding a feature? What will it cost (including the opportunity cost of not doing something else)?
Pick those that have the most benefit for least cost. What you're aiming for is the maximum bang-per-buck. Resources are limited, desires are not, the perennial problem of capitalism :-)
There's no point fixing a bug experienced by only one customer who's never going to throw more repeat business your way if it means a feature that will sell hundreds of copies is dropped in the meantime.
For what it's worth, our company has a database of requested changes where customers can basically vote for what they want to see in upcoming versions of our products. The actual creation of these requested changes in that database is limited to the sales force since we don't want all sorts of requests showing up without being evaluated and discussed at least a little bit with the customers.
In addition, we regularly approach our biggest customers (in terms of revenue generated) with the list to figure out what features should be added (they are free to suggest their own desires as well, which also get entered into the database - obviously voting power depends a bit on revenue).
This is quite separate from our bug system, although quite often bugs are raised which are actually new feature requests, and they're shipped across to the new features database. It's possible that this may even happen for real bugs that are considered low-impact or have a suitable workaround in place.
We ask our users.
We have a niche product, and a small user base.
Seriously, the users group are paying maintenance, or thinking about buying.
So we ask them what they would like.
They suggest fixes, ask for new features.
We tell them about the development roadmap: the things we want to do to the product due to times changing, design ideas, changes to regulations.
And if every customer says "we really need feature X" : then it'll come next.
If they say "you guys need to fix the bug where I click there an it doesn't do blah:" then that bug gets fixed.
Commercial software: with the customers voting for changes.
Of course, we take their choices on advisement: the company has other things that it is thinking about.
We always look at the cost of fixing the bug versus the problems caused by it. Sometimes, it just isn't worth it to have every single bug properly triaged, root caused, then fixed.
Plenty of times a particular enhancement or new feature is being funded or at least strongly recommended to occur by a large/good customer, so that also affects matters.
I like to think that bug fixes should always come before enhancements and new features, in all cases. Even if the particular bug isn't bothering you too much as the developer, someone somewhere is having their day ruined when your little error pops up.
distinguish, yes.
bugs take priority, yes.
all critical / normal priority and above bugs first, yes.
yes, the 80/20 rule.
no, bugs and features have to be treated differently because they are weighted differently.
yes, all commercial, open-source, and in house applications have their own way to do things.
As an example, FogBugz uses Evidence Based Scheduling and is the only management/tracker that I know of that uses that formula.
Hope that helps!
You have to look at it from the standpoint of what the bug is. A show-stopper bug is always number one priority. If people can't log in or critical data can't be entered or adjusted, etc. then that must take precedence over pretty much everything.
Bugs of lower significance can be worked in as need be. We may delay fixing a bug because we know we are working on that section for an enhancement next week; then the bug fix will go with the enhancement. We may delay fixing a bug if it is minor and a planned enhancement will replace the code entirely shortly. A major enhancement might take precedence over fixing some typos on the interface. A client may tell us that this other project is more critical and to do it before fixing the bug (our software is highly customized by client). It all depends on the effect of the bug and existing plans and corporate politics once you are past the show stopper. A bug that is bothering a major client may take higher precedence even if it seems minor to the developer. If the CEO wants it fixed now, it doesn't matter how unimportant it seems compared to the rest of the workload; it gets fixed now.
Point 5 of The Joel Test: 12 Steps to Better Code makes a compelling argument (in my opinion) that it's a good idea to fix bugs before writing new code.
For bugs, it's pretty simple: If you're going to fix it, fix it before you write any more code. Why? The more code you add, the harder the existing bug will become to find.
If you're okay with the idea of the bug never being fixed, by all means put it off and add features.
Bugs? We have no bugs. They're undocumented features.
For us the choice is always based on business decisions, and as a developer I have no input beyond offering my opinion on what should be top priority. More often than not, features win over bugs because adding features "appears" to the business area like progress is being made, and bugs that I could have fixed a year ago continue to exist because the business side only wants to see "progress". Triage is great if you're allowed, but all too often in the corporate environment, it's about visible results, not functionality.
One thing not mentioned so far is the severity of the bug. If the bug has high severity (a crash, failing a duration test - it surely depends on what kind of application you have), you should definitely fix it first before adding a new feature.

When must you use poor design to finish a project?

There are many different bad practices, such as memory leaks, that are easy to slip into a program by accident. Sometimes, they might even be able to jury-rig your program together.
I'm working on a project right now and it works if I deliberately put a memory leak in my code. If I take the leak out, the code crashes. Obviously this is bad, and needs to be (and will be) fixed soon.
My question is, when do you decide to deliver code in this state, if it's not possible to release code without these poor practices, in time?
If the problem's impact on actual usage of the system can be reasonably expected to be none or negligible, and the delivery date cannot be pushed back, and it can be fixed within a scope of time before the problem's impact becomes more than negligible, ship it.
Obviously this is not ideal or even recommended, but you're clearly pushed into a corner at this point. Sometimes there are no good choices, but pragmatism must win over formal correctness. If an application has a memory leak, but we can reasonably expect that the app will be recycled or machine restarted or whatever before the leak becomes a real problem, that can sometimes be better than delivering late. It depends on the conditions of the agreement and the customer.
It's always better to at least try to push back the delivery date, but I am assuming you've already tried that and it's not an option here.
It is typical once an application has been shipped to ignore technical debt and move on. It's the responsibility of the developers to clearly communicate to the stakeholders the importance of paying off some of that debt, especially in a case like this.
However, given that it seems the customer cares more about a delivery date than correctness, it's less likely anyone will be convinced to pay off any debt once you go live. This is a bad situation to be in. Only the person with all the facts can make the right choice.
"My question is, when do you decide to deliver code in this state, if it's not possible to release code without these poor practices, in time?"
Never.
What you do instead: prioritize and focus.
If what you're working on is really high-priority, and you've mis-designed it, something low-priority has to be sacrificed. Often, some feature(s) must be delayed to give you time to focus on the high priority feature that doesn't work.
If what you're working on is really low-priority, you have to ask why you're not working on something higher-priority. And you still have to focus and prioritize. Sometimes things which are very low priority must be sacrificed.
When you can't do "everything" you have to pick things you can do that will be reasonably bug-free.
You might be interested in the concept of technical debt.
You only have three knobs you can turn when shipping software, assuming a fixed number of developers: features, quality and ship date, and turning one up means the others get turned down.
One of the most difficult things to do in software development is to build your product with the knobs set just right. For example, the Duke Nukem Forever guys have turned the features and quality knob up to eleven and thrown the ship date knob out the window. Microsoft often seems to glue the ship date knob in place and turn down the feature knob as needed, then unglue the ship date knob, turn it up a bit, glue it back down and continue twiddling the other two. And there seems to be an endless number of products out there that ship all the time but never put in the features they need to be successful.
In the end, you don't get paid if you don't deliver. Having poor quality hurts you terribly in the long run; reputations are hard to rebuild. It has almost always been that the right thing to do is to cut features if you have too many bugs. Always.
However, bug triage is just as important as feature development. What kind of leak are we talking about here? Are you leaking a byte? A small object? One thousand objects? Entire DLLs? There are scenarios where it's probably better to leak a little than to fail to deliver the product.
And what do you mean by leak? Does your application have a well defined life cycle? Where you allocate something once at startup and then never free it until the process dies? Well how long does your process run? Do you expect to run multiple copies of your process?
Obviously you never want to leak, and you should work to develop best practices that minimize leaking, but in the end you have to make a judgment call. Maybe you can just explain the bug to your customers, explain the impact, and they'll buy it anyway. Maybe you can patch it a week later. Maybe you really do need to fix it. But we'd need to know more about it to give good advice.
I will say I have shipped known leaks in the past. I won't say what product or company, but I had a bug where DLL dependencies and insane lifetime management made it next to impossible to correctly free our references to a certain DLL once it was loaded, so in the end we just leaked it. And I still think it was the right thing to do. Other times I've seen things deliberately leaked to keep third party code that was written incorrectly from crashing (though that is a completely separate debate).
But in the end I believe such instances are rare, and once you have identified the source of a memory leak, it shouldn't take much more than a day to fix it. It is rare indeed that I would ship with a memory leak that was known and for which a fix was known. It would have to be something that required a major re-architecture involving changing a threading model, or refactoring huge swaths of code, and it would literally have to be a day or two before the product was to ship. At that point I might just leak the memory and promise a patch in a week, after proper testing could be done on the re-architecture.
I would be very uncomfortable releasing with such a known bug. It is likely to occur in another way.
You have not specified your environment or language, but I suggest you look at using a memory checking tool such as:
Purify (trial available)
BoundsChecker
Valgrind
or even a free one, Visual Leak Detector
Perhaps, when you are not going to be around to maintain the code later, you don't care about your client/employer and none of the ramifications of your code could possibly affect you.
In other words, in your professional coding life, it's never a really good idea.
If you are working for someone that is less concerned about code quality than you are and simply wants you to finish the code at all costs, then I can see how you'd be in a difficult situation. Finishing faster but poorer will earn you some immediate reward. You should remember though that even if failing to meet your employer/client's expectation for a milestone bites you only once, your poor code may continue to bite you into the future, not only through the difficulties in maintaining it but also through the negative impression others may form of you down the track.
If it's a single (or limited) memory leak, and it doesn't grow, and say it only causes a crash when shutting down (the most common case of stuff like this), then it depends. If it's client/desktop software and the users are going to crash every time on their way out, I'd make it an ultra-high priority. If it's a server, and the only one running the server is you, and everything else works fine, I'd say it's all right to enter beta. But if the leaks grow, or can cause crashes at "random" times, they need to be fixed ASAP.
To get past an internal milestone, it's arguable, although still nothing to be taken lightly.
To release, never. It always comes back and bites you. If your software is in such a bad space that a piece of poor design will get it over the line, you've got much bigger problems looming round the corner
Never, unless you don't care about the poor developer who is going to be maintaining your work afterwards.
Ultimately, a decision like this should be made by the customer or the project manager. Individual developers should not be making these kinds of decisions alone, or keeping this information to themselves.
Tell them what the problem is, and what the consequences will be for not fixing it. If they want you to ship it broken on time, that's their call.
If you don't want to work for people who accept shoddy products, that's your call, but it's a mistake to think that developers have some sort of professional responsibility to ignore their clients' and bosses' quality/cost/time priorities.
If somebody may actually die if you ship bad software, then don't do it, but if the worst-case scenario is that somebody is going to have to reboot a couple times per day, then do what you're told or find another job.

Reasons not to build your own bug tracking system [closed]

Several times now I've been faced with plans from a team that wants to build their own bug tracking system - Not as a product, but as an internal tool.
The arguments I've heard in favour are usually along the lines of:
Wanting to 'eat our own dog food' in terms of some internally built web framework
Needing some highly specialised report, or the ability to tweak some feature in some allegedly unique way
Believing that it isn't difficult to build a bug tracking system
What arguments might you use to support buying an existing bug tracking system? In particular, what features sound easy but turn out hard to implement, or are difficult and important but often overlooked?
First, look at these Ohloh metrics:
Trac: 44 KLoC, 10 Person Years, $577,003
Bugzilla: 54 KLoC, 13 Person Years, $714,437
Redmine: 171 KLoC, 44 Person Years, $2,400,723
Mantis: 182 KLoC, 47 Person Years, $2,562,978
What do we learn from these numbers? We learn that building Yet Another Bug Tracker is a great way to waste resources!
So here are my reasons to build your own internal bug tracking system:
You need to neutralize all the bozocoders for a decade or two.
You need to flush some money to avoid budget reduction next year.
Otherwise don't.
I would want to turn the question around. WHY on earth would you want to build your own?
If you need some extra fields, go with an existing package that can be modified.
Special report? Tap into the database and make it.
Believing that it isn't difficult? Try then. Spec it up, and see the list of features and hours grow. Then after the list is complete, try to find an existing package that can be modified before you implement your own.
In short, don't reinvent the wheel when another one just needs some tweaking to fit.
Programmers like to build their own ticket system because, having seen and used dozens of them, they know everything about it. That way they can stay in the comfort zone.
It's like checking out a new restaurant: it might be rewarding, but it carries a risk. Better to order pizza again.
There's also a great fact of decision making buried in there: there are always two reasons to do something: a good one and the real one. We make a decision ("Build our own"), then justify it ("We need full control"). Most people aren't even aware of their true motivation.
To change their minds, you have to attack the real reason, not the justification.
Not Invented Here syndrome!
Build your own bug tracker? Why not build your own mail client, project management tool, etc.
As Omer van Kloeten says elsewhere, pay now or pay later.
There is a third option, neither buy nor build. There are piles of good free ones out there.
For example:
Bugzilla
Trac
Rolling your own bug tracker for any use other than learning is not a good use of time.
Other links:
Three free bug-tracking tools
Comparison of issue tracking systems
I would just say it's a matter of money - buying a finished product you know is good for you (and sometimes not even buying if it's free) is better than having to go and develop one on your own. It's a simple game of pay now vs. pay later.
First, against the arguments in favor of building your own:
Wanting to 'eat our own dog food' in terms of some internally built web framework
That of course raises the question why build your own web framework. Just like there are many worthy free bug trackers out there, there are many worthy frameworks too. I wonder whether your developers have their priorities straight? Who's doing the work that makes your company actual money?
OK, if they must build a framework, let it evolve organically from the process of building the actual software your business uses to make money.
Needing some highly specialised report, or the ability to tweak some feature in some allegedly unique way
As others have said, grab one of the many fine open source trackers and tweak it.
Believing that it isn't difficult to build a bug tracking system
Well, I wrote the first version of my BugTracker.NET in just a couple of weeks, starting with no prior C# knowledge. But now, 6 years and a couple thousand hours later, there's still a big list of undone feature requests, so it all depends on what you want a bug tracking system to do. How much email integration, source control integration, permissions, workflow, time tracking, schedule estimation, etc. A bug tracker can be a major, major application.
What arguments might you use to support buying an existing bug tracking system?
Don't need to buy. Too many good open source ones: Trac, Mantis Bug Tracker, my own BugTracker.NET, to name a few.
In particular, what features sound easy but turn out hard to implement, or are difficult and important but often overlooked?
If you are creating it just for yourselves, then you can take a lot of shortcuts, because you can hard-wire things. If you are building it for lots of different users, in lots of different scenarios, then it's the support for configurability that is hard. Configurable workflow, custom fields, and permissions.
I think two features that a good bug tracker must have, that both FogBugz and BugTracker.NET have, are 1) integration of both incoming and outgoing email, so that the entire conversation about a bug lives with the bug and not in a separate email thread, and 2) a utility for turning a screenshot into a bug post with just a couple of clicks.
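As a rough illustration of how little code such a utility needs, here is a sketch in Python; Pillow, requests, the endpoint URL and its form fields are all assumptions rather than any particular tracker's real API:

    import io

    import requests
    from PIL import ImageGrab

    TRACKER_URL = "https://tracker.example.com/api/bugs"  # placeholder endpoint

    def screenshot_to_bug(title: str, description: str = "") -> None:
        image = ImageGrab.grab()              # capture the current screen
        buffer = io.BytesIO()
        image.save(buffer, format="PNG")
        buffer.seek(0)
        requests.post(
            TRACKER_URL,
            data={"title": title, "description": description},
            files={"attachment": ("screenshot.png", buffer, "image/png")},
            timeout=30,
        )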
The most basic argument for me would be the time loss. I doubt it could be completed in less than a month or two. Why spend the time when there are soooo many good bug tracking systems available? Give me an example of a feature that you have to tweak and is not readily available.
I think a good bug tracking system has to reflect your development process. A very custom development process is inherently bad for a company/team. Most agile practices favor Scrum or these kinds of things, and most bug tracking systems are in line with such suggestions and methods. Don't get too bureaucratic about this.
A bug tracking system can be a great project to start junior developers on. It's a fairly simple system that you can use to train them in your coding conventions and so forth. Getting junior developers to build such a system is relatively cheap and they can make their mistakes on something a customer will not see.
If it's junk you can just throw it away, but if it is used you can give them a feeling that their work is already important to the company. You can't put a price on a junior developer being able to experience the full life cycle, and all the opportunities for knowledge transfer that such a project will bring.
We have done this here. We wrote our first one over 10 years ago. We then upgraded it to use web services, more as a way to learn the technology. The main reason we did this originally was that we wanted a bug tracking system that also produced version history reports and a few other features that we could not find in commercial products.
We are now looking at bug tracking systems again and are seriously considering migrating to Mantis and using Mantis Connect to add additional custom features of our own. The amount of effort in rolling our own system is just too great.
I guess we should also be looking at FogBugz :-)
Most importantly, where will you submit the bugs for your bug tracker before it's finished?
But seriously. The tools already exist, there's no need to reinvent the wheel. Modifying tracking tools to add certain specific features is one thing (I've modified Trac before)... rewriting one is just silly.
The most important thing you can point out is that if all they want to do is add a couple of specialized reports, it doesn't require a ground-up solution. And besides, the LAST place "your homebrew solution" matters is for internal tools. Who cares what you're using internally if it's getting the job done as you need it?
As a programmer working on an already critical (or at least important) task, you should not let yourself be diverted into developing something that is already available on the market (open source or commercial).
You will now try to create a bug tracking system to keep track of the bug tracking system that you use to track bugs in your core development.
First:
1. Choose the platform your bug system would run on (Java, PHP, Windows, Linux etc.)
2. Try finding open source tools that are available (by open source, I mean both commercial and free tools) on the platform you chose
3. Spend minimum time to try to customize to your need. If possible, don't waste time in customising at all
For an enterprise development team, we started using JIRA. We wanted some extra reports, SSO login, etc. JIRA was capable of it, and we could extend it using the already available plugin. Since the code was provided as part of the paid support, we only spent minimal time writing the custom login plugin.
Building on what other people have said: rather than just downloading a free/open source one, how about downloading it and then modifying it entirely for your own needs? I know I've been required to do that in the past. I took an installation of Bugzilla and then modified it to support regression testing and test reporting (this was many years ago).
Don't reinvent the wheel unless you're convinced you can build a rounder wheel.
I'd say one of the biggest stumbling blocks would be agonising over the data model / workflow. I predict this will take a long time and involve many arguments about what should happen to a bug under certain circumstances, what really constitutes a bug, etc. Rather than spend months arguing to-and-fro, if you were to just roll out a pre-built system, most people will learn how to use it and make the best of it, no matter what decisions are already fixed. Choose something open-source, and you can always tweak it later if need be - that will be much quicker than rolling your own from scratch.
At this point, without a large new direction in bug tracking/ticketing, it would simply be re-inventing the wheel. Which seems to be what everyone else thinks, generally.
Your discussions will start with what constitutes a bug, evolve into what workflow to apply, and end up with a massive argument about how to manage software engineering projects. Do you really want that? :-) Nah, thought not - go and buy one!
Most developers think that they have some unique powers that no one else has and therefore they can create a system that is unique in some way.
99% of them are wrong.
What are the chances that your company has employees in the 1%?
I have been on both sides of this debate so let me be a little two faced here.
When I was younger, I pushed to build our own bug tracking system. I just highlighted all of the things that the off the shelf stuff couldn't do, and I got management to go for it. Who did they pick to lead the team? Me! It was going to be my first chance to be a team lead and have a voice in everything from design to tools to personnel. I was thrilled. So my recommendation would be to check to the motivations of the people pushing this project.
Now that I'm older and faced with the same question again, I just decided to go with FogBugz. It does 99% of what we need and the costs are basically 0. Plus, Joel will send you personal emails making you feel special. And in the end, isn't that the problem, your developers think this will make them special?
Every software developer wants to build their own bug tracking system. It's because we can obviously improve on what's already out there since we are domain experts.
It's almost certainly not worth the cost (in terms of developer hours). Just buy JIRA.
If you need extra reports for your bug tracking system, you can add these, even if you have to do it by accessing the underlying database directly.
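A sketch of that approach using sqlite3 and an invented schema; real trackers have their own schemas, so the table and column names here are purely illustrative:

    import sqlite3

    # Invented schema for illustration; consult your tracker's actual tables.
    REPORT_SQL = """
    SELECT assignee,
           COUNT(*) AS open_bugs,
           SUM(CASE WHEN severity = 'critical' THEN 1 ELSE 0 END) AS critical_bugs
    FROM issues
    WHERE status != 'closed'
    GROUP BY assignee
    ORDER BY critical_bugs DESC, open_bugs DESC;
    """

    def open_bugs_by_assignee(db_path: str = "tracker.db"):
        with sqlite3.connect(db_path) as conn:
            return conn.execute(REPORT_SQL).fetchall()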
The question is what is your company paying you to do? Is it to write software that only you will use? Obviously not. So the only way you can justify the time and expense to build a bug tracking system is if it costs less than the costs associated with using even a free bug tracking system.
There may well be cases where this makes sense. Do you need to integrate with an existing system (time tracking, estimation, requirements, QA, automated testing)? Do you have some unique requirements in your organization, related to, say, SOX compliance, that require specific data elements that would be difficult to capture?
Are you in an extremely bureaucratic environment that leads to significant "down-time" between projects?
If the answer is yes to these types of questions, then by all means the "buy vs. build" argument would say build.
If "Needing some highly specialised report, or the ability to tweak some feature in some allegedly unique way", the best and cheapest way to do that is to talk to the developers of existing bug tracking systems. Pay them to put that feature in their application, make it available to the world. Instead of reinventing the wheel, just pay the wheel manufacturers to put in spokes shaped like springs.
Otherwise, if trying to showcase a framework, it's all good. Just make sure to put in the relevant disclaimers.
To the people who believe bug tracking system are not difficult to build, follow the waterfall SDLC strictly. Get all the requirements down up front. That will surely help them understand the complexity. These are typically the same people who say that a search engine isn't that difficult to build. Just a text box, a "search" button and a "i'm feeling lucky" button, and the "i'm feeling lucky" button can be done in phase 2.
Use some open source software as is.
For sure there are bugs, and you will need what is not yet there or is pending a bug fix. It happens all of the time. :)
If you extend/customize an open source version then you must maintain it. Now the application that is supposed to help you test your money-making applications will become a burden to support.
I think the reason people write their own bug tracking systems (in my experience) are,
They don't want to pay for a system they see as being relatively easy to build.
Programmer ego
General dissatisfaction with the experience and solution delivered by existing systems.
They sell it as a product :)
To me, the biggest reason why most bug trackers failed was that they did not deliver an optimum user experience and it can be very painful working with a system that you use a LOT, when it is not optimised for usability.
I think the other reason is the same as why almost every one of us (programmers) has built their own custom CMS or CMS framework at some point (guilty as charged). Just because you can!
I agree with all the reasons NOT to. We tried for some time to use what's out there, and wound up writing our own anyway. Why? Mainly because most of them are too cumbersome to engage anyone but the technical people. We even tried Basecamp (which, of course, isn't designed for this and failed in that regard).
We also came up with some unique functionality that worked great with our clients: a "report a bug" button that we scripted into code with one line of javascript. It allows our clients to open a small window, jot info in quickly and submit to the database.
But, it certainly took many hours to code; became a BIG pet project; lots of weekend time.
If you want to check it out: http://www.archerfishonline.com
Would love some feedback.
We've done this... a few times. The only reason we built our own is because it was five years ago and there weren't very many good alternatives. But now there are tons of alternatives. The main thing we learned in building our own tool is that you will spend a lot of time working on it, and that is time you could be billing for. It makes a lot more sense, as a small business, to pay the monthly fee, which you can easily recoup with one or two billable hours, than to spend all that time rolling your own. Sure, you'll have to make some concessions, but you'll be far better off in the long run.
As for us, we decided to make our application available for other developers. Check it out at http://www.myintervals.com
Because Trac exists.
And because you'll have to train new staff on your bespoke software when they'll likely have experience in other systems which you can build on rather than throw away.
Because it's not billable time or even very useful unless you are going to sell it.
There are perfectly good bug tracking systems available, for example, FogBugz.
I worked in a startup for several years where we started with GNATS, an open source tool, and essentially built our own elaborate bug tracking system on top of it. The argument was that we would avoid spending a lot of money on a commercial system, and we would get a bug tracking system exactly fitted to our needs.
Of course, it turned out to be much harder than expected and was a big distraction for the developers - who also had to maintain the bug tracking system in addition to our code. This was one of the contributing factors to the demise of our company.
Don't write your own software just so you can "eat your own dog food". You're just creating more work, when you could probably purchase software that does the same thing (and better) for less time and money spent.
Tell them, that's great, the company could do with saving some money for a while and will be happy to contribute the development tools whilst you work on this unpaid sabbatical. Anyone who wishes to take their annual leave instead to work on the project is free to do so.

Which is better: shipping a buggy feature or not shipping the feature at all?

This is a bit of a philosophical question. I am adding a small feature to my software which I assume will be used by most users, but only maybe 10% of the times they use the software. In other words, the software has been fine without it for 3 months, but 4 or 5 users have asked for it, and I agree that it should be there.
The problem is that, due to limitations of the platform I'm working with (and possibly limitations of my brain), "the best I can do" still has some non-critical but noticeable bugs - let's say the feature as coded is usable but "a bit wonky" in some cases.
What to do? Is a feature that's 90% there really "better than nothing"? I know I'll get some bug reports which I won't be able to fix: what do I tell customers about those? Should I live with unanswered feature requests or unanswered bug reports?
Make sure people know that you know that there are problems. That there are bugs. And give them an easy way to provide feedback.
What about having a "closed beta" with the "4 or 5 users" who suggested the feature in the first place?
There will always be unanswered feature requests and bug reports. Ship it, but include a readme with "known issues" and workarounds when possible.
You need to think of this from your user's perspective - which will cause less frustration? Buggy code is usually more frustrating than missing features.
Perfectionists may answer "don't do it".
Business people may answer "do it".
I guess where the balance lies is up to you. I would lean towards putting the feature in there if the bugs are non-critical. Most users don't see your software the same way you do. You're a craftsman/artist, which means you're more critical than regular people.
Is there any way that you can get a beta version to the 4-5 people who requested the feature? Then, once you get their feedback, it may be clear which decision to make.
Precisely document the wonkiness and ship it.
Make sure a user is likely to see and understand your documentation of the wonkiness.
You could even discuss the decision with users who have requested the feature: do some market research.
Just because you can't fix it now, doesn't mean you won't be able to in the future. Things change.
Label what you have now as a 'beta version' and send it out to those people who have asked for it. Get their feedback on how well it works, fix whatever they complain about, and you should then be ready to roll it out to larger groups of users.
Ship early, ship often, constant refactoring.
What I mean is, don't let it stop you from shipping, but don't give up on fixing the problems either.
An inability to resolve wonkiness is a sign of problems in your code base. Spend more time refactoring than adding features.
I guess it depends on your standards. For me, buggy code is not production-ready and so shouldn't be shipped. Could you have a beta version with a known issues list so users know what to expect under certain conditions? They get the benefit of using the new features but also know that it's not perfect (use at their own risk). This may keep those 4 or 5 customers that requested the feature happy for a while, which gives you more time to fix the bugs (if possible) and release to production later for the masses.
Just some thoughts depending on your situation.
Depends. On the bugs, their severity and how much effort you think it will take to fix them. On the deadline and how much you think you can stretch it. On the rest of the code and how much the client can do with it.
I would not expect coders to deliver known problems into test let alone to release to a customer.
Mind you, I believe in zero tolerance of bugs. Interestingly I find that it is usually developers/ testers who are keenest to remove all bugs - it is often the project manager and/ or customer who are willing to accept bugs.
If you must release the code, then document every feature/ bug that you are aware of, and commit to fixing each one.
Why don't you post more information about the limitations of the platform you are working on, and perhaps some of the clever folk here can help get your bug list down.
If the demand is for a feature NOW, rather than a feature that works, you may have to ship.
In this situation though:
Make sure you document the bug(s) and consequences (both to the user and other developers).
Be sure to add the bug(s) to your bug tracking database.
If you write unit tests (I hope so), make sure that tests are written which highlight the bugs before you ship. This will mean that when you come to fix the bugs in the future, you know where and what they are, without having to remember.
Schedule the work to fix the bugs ASAP. You do fix bugs before writing new code, don't you?
If bugs can cause death or can lose users' files then don't ship it.
If bugs can cause the application to crash itself then ship it with a warning (a readme or whatever). If crashes might cause the application to corrupt the users' files that they were in the middle of editing with this exact application, then display a warning each time they start up the application, and remind them to backup their files first.
If bugs can cause BSODs then be very careful about who you ship it to.
If it doesn't break anything else, why not ship it? It sounds like you have a good relationship with your customers, so those who want the feature will be happy to get it even if it's not all the way there, and those who don't want it won't care. Plus you'll get lots of feedback to improve it in the next release!
The important question you need to answer is whether your feature will solve a real business need given the design you've come up with. Then it's only a matter of making the implementation match the design - making the "bugs" non-bugs by defining them as outside the intended behaviour of the feature (which should be covered by the design).
This boils down to a very real choice of paths: is a bug something that doesn't work properly, that wasn't part of the intended behaviour and design? Or is it a bug only if it doesn't work in accordance with the intended behaviour?
I am a firm believer in the latter; bugs are the things that do not work the way they were intended to work. The implementation should capture the design, that should capture the business need. If the implementation is used to address a different business need that wasn't covered by the design, it is the design that is at fault, not the implementation; thus it is not a bug.
The former attitude is by far the most common amongst programmers in my experience. It is also the way the user views software issues. From a software development perspective, however, it is not a good idea to adopt this view, because it leads you to fix bugs that are not bugs, but design flaws, instead of redesigning the solution to the business need.
Coming from someone who has to install buggy software for their users - don't ship it with that feature enabled.
It doesn't matter if you document it: the end users will forget about that bug the first time they hit it, and it will become critical because it stops them doing their job.
