When must you use poor design to finish a project? - deadlines

There are many different bad practices, such as memory leaks, that are easy to slip into a program on accident. Sometimes, they might even be able to jury-rig your program together.
I'm working on a project right now and it works if I deliberately put a memory leak in my code. If I take the leak out, the code crashes. Obviously this is bad, and needs to be (and will be) fixed soon.
My question is, when do you decide to deliver code in this state, if it's not possible to release code without these poor practices, in time?

If the problem's impact on actual usage of the system can be reasonably expected to be none or negligible, and the delivery date cannot be pushed back, and it can be fixed within a scope of time before the problem's impact becomes more than negligible, ship it.
Obviously this is not ideal or even recommended, but you're clearly pushed into a corner at this point. Sometimes there are no good choices, but pragmatism must win over formal correctness. If an application has a memory leak, but we can reasonably expect that the app will be recycled or machine restarted or whatever before the leak becomes a real problem, that can sometimes be better than delivering late. It depends on the conditions of the agreement and the customer.
It's always better to at least try to push back the delivery date, but I am assuming you've already tried that and it's not an option here.
It is typical once an application has been shipped to ignore technical debt and move on. It's the responsibility of the developers to clearly communicate to the stakeholders the importance of paying off some of that debt, especially in a case like this.
However, given that it seems the customer cares more about a delivery date than correctness, it's less likely anyone will be convinced to pay off any debt once you go live. This is a bad situation to be in. Only the person with all the facts can make the right choice.

"My question is, when do you decide to deliver code in this state, if it's not possible to release code without these poor practices, in time?"
Never.
What you do instead: prioritize and focus.
If what you're working on is really high-priority, and you've mis-designed it, something low- priority has to be sacrificed. Often, some feature(s) must be delayed to give you time to focus on the high priority feature that doesn't work.
If what you're working on is really low-priority, you have to ask why you're not working on something higher-priority. And you still have to focus and prioritize. Sometimes things which are very low priority must be sacrificed.
When you can't do "everything" you have to pick things you can do that will be reasonably bug-free.

You might be interested in the concept of technical debt.

You only have three knobs you can turn when shipping software, assuming a fixed number of developers: features, quality and ship date, and turning one up means the others get turned down.
One of the most difficult things to do in software development is to build your product with the knobs set just right. For example, the Duke Nukem Forever guys have turned the features and quality knob up to eleven and thrown the ship date knob out the window. Microsoft often seems to glue the ship date knob in place and turn down the feature knob as needed, then unglue the ship date knob, turn it up a bit, glue it back down and continue twiddling the other two. And there are seem to be an endless amount of products out there that ship all the time but never put in the features they need to be successful.
In the end, you don't get paid if you don't deliver. Having poor quality hurts you terribly in the long run; reputations are hard to rebuild. It has almost always been that the right thing to do is to cut features if you have too many bugs. Always.
However, bug triage is just as important as feature development. What kind of leak are we talking about here? Are you leaking a byte? A small object? One thousand objects? Entire DLLs? There are scenarios where its probably better to leak a little than to fail to deliver the product.
And what do you mean by leak? Does your application have a well defined life cycle? Where you allocate something once at startup and then never free it until the process dies? Well how long does your process run? Do you expect to run multiple copies of your process?
Obviously you never want to leak, and you should work to develop best practices that minimize leaking, but in the end you have to make a judgment call. Maybe you can just explain the bug to your customers, explain the impact, and they'll buy it anyway. Maybe you can patch it a week later. Maybe you really do need to fix it. But we'd need to know more about it to give good advice.
I will say I have shipped known leaks in the past. I won't say what product or company, but I had a bug where DLL dependencies and insane lifetime management made it next to impossible to correctly free our references to a certain DLL once it was loaded, so in the end we just leaked it. And I still think it was the right thing to do. Other times I've seen things deliberately leaked to keep third party code that was written incorrectly from crashing (though that is a completely separate debate).
But in the end I believe such instances are rare and once you have identified the source of a memory leak, it shouldn't take much more than a day to fix it. It is rare indeed that I would ship with a memory leak that was known and a fix was known. It would have to be something that required a major re-architect involving changing a threading model, or refactoring huge swaths of code, and it would literally have to be a day or two before the product was to ship. At that point I might just leak the memory and promise a patch in a weak after proper testing could be done on the re-architect.

I would be very uncomfortable releasing with such a known bug. It is likely to occur in another way.
You have not specified your environment or language, but I suggest you look at using a memory checking tool such as:
Purify (trial available)
BoundsChecker
Valgrind
or even a free one, Visual Leak Detector

Perhaps, when you are not going to be around to maintain the code later, you don't care about your client/employer and none of the ramifications of your code could possibly affect you.
In other words, in your professional coding life, it's never a really good idea.
If you are working for someone that is less concerned about code quality than you are and simply wants you to finish the code at all costs, then I can see how you'd be in a difficult situation. Finishing faster but poorer will earn you some immediate reward. You should remember though that even if failing to meet your employer/client's expectation for a milestone bites you only once, your poor code may continue to bite you into the future, not only through the difficulties in maintaining it but also through the negative impression others may form of you down the track.

If its a single (or limited) memory leak, and it doesn't grow, and say it only causes a crash when shutting down (the most common case of stuff like this), then it depends. If its a client/desktop software and the users are going to crash every time on their way out, I'd make it an ultra high priority. If its server, and the only one running the server is you, and everything else works fine, I'd say its alright to enter beta. But if the leaks grow, or can cause crashes at "random" times they need to be fixed asap.

To get past an internal milestone, it's arguable, although still nothing to be taken likely.
To release, never. It always comes back and bites you. If your software is in such a bad space that a piece of poor design will get it over the line, you've got much bigger problems looming round the corner

Never, unless you don't care about the poor developer who is going to be maintaining your work afterwards.

Ultimately, a decision like this should be made by the customer or the project manager. Individual developers should not be making these kinds of decisions alone, or keeping this information to themselves.
Tell them what the problem is, and what the consequences will be for not fixing it. If they want you to ship it broken on time, that's their call.
If you don't want to work for people who accept shoddy products, that's your call, but it's a mistake to think that developers have some sort of professional responsibility to ignore their clients' and bosses' quality/cost/time priorities.
If somebody may actually die if you ship bad software, then don't do it, but if the worst-case scenario is that somebody is going to have to reboot a couple times per day, then do what you're told or find another job.

Related

Is PetraVM Jinx Beta 1 good?

PetraVM recently came out with a Beta release of their Jinx product. Has anyone checked it out yet? Any feedback?
By good, I mean:
1) easy to use
2) intuitive
3) useful
4) doesn't take a lot of code to integrate
... those kinds of things.
Thanks guys!
After literally stumbling across Jinx while poking around on Google, I have been on the beta and pre-beta tests with a fair amount of usage already under my belt. As with any beta related comments please understand that things may change or already have changed, so do keep this in mind and take the following with a grain of salt.
So, going through the list of questions one by one:
1) Install and go. Jinx adds a control panel to Visual Studio which you can mostly ignore as the defaults are typically good for most cases. Otherwise you just work normally and forget about it. Jinx does not instrument your code, require any additional libraries linked in or the numerous other things some tools require.
2) The question of "intuitive" is really up to the user. If you understand threaded code and the sorts of bugs possible, Jinx just makes those bugs happen much more frequently, which by itself is a huge benefit to people doing threaded code. While Jinx attempts to stop the code in a state that makes the problem as obvious as possible, "obvious" and "intuitive" are really up to the skill of the programmer.
3) Useful? Anyone who has done threaded code before knows that a race condition can happen regularly or once every month based on cosmic ray counts, that randomness makes debugging threaded code very difficult. With Jinx, even the most minor race condition can be reproduced usually on the first run consistently. This works even for lockless code that other static analysis or instrumenting tools would generally miss.
This sort of quick reproduction of problems is amazingly useful. Jinx has helped me track down a "one instruction in the wrong place" sort of bug that would actually hit once a week at most. Jinx forced the crash to happen almost immediately and allowed me to focus on the actual cause of the bug instead being completely in the dark as to the real source.
4) Integration with Jinx is a breeze. If you don't mind your machine becoming a bit slow, you can tell Jinx to watch the entire machine. It slows the machine down as it is actually watching everything on the machine, including the OS. While interresting and useful if your software consists of multiple processes on the same machine, this is not suggested as it can become painful to work with the machine.
Instead of using the global system, adding an include and two lines of code does the primary work needed of registering the process with Jinx such that Jinx can watch just the registered items. You can help Jinx by using the Jinx specific asserts and registering regions of code that are more important. In the case of the crash mentioned above though, I didn't have to do any of that and Jinx found the problem without the additional integration work. In any case, the integration is extremely simple.
After using Jinx for the last couple months, I have to say that overall it has been a great pleasure. I won't write new threaded code without Jinx running in the background anymore simply because it does its intended job of forcing obscure threading issues to be immediate assert/crashes. As mentioned, things that you could go weeks without seeing become problems almost immediately, this is a wonderful thing to have during initial test and implementation.
KRB
BTW, PetraVM has changed its name to Corensic and you can find Jinx Beta 2 over at www.corensic.com.
--Prashant, the marketing guy at Corensic

Bugs versus enhancement versus new feature [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
When planning and prioritizing what is to be included in a release, do you distinguish between bugs, feature enhancements and new features?
For example, do bugs always take priority - do you fix all known bugs before working on new features? Do you use a formal system for comparing the cost vs. value of each change in your backlog? And if so, do you compare bugs and features using the same formula? Is this different for commercial software vs. open source vs. in-house corporate software?
EDIT: Some great responses - thanks. While I had a preconceived opinion that you need to treat bugs, features, enhancements all the same, and simply select the work based on the cost/benfit of each, I think the reality is that this depends on your situation.
This choice is called triage, a term from emergency departments in hospitals where they have to decide who gets treated (and sometimes, unfortunately, who lives and dies).
As with all business decisions, it's a cost/benefit problem. What is the benefit of fixing a bug or adding a feature? What will it cost (including the opportunity cost of not doing something else)?
Pick those that have the most benefit for least cost. What you're aiming for is the maximum bang-per-buck. Resources are limited, desires are not, the perennial problem of capitalism :-)
There's no point fixing a bug experienced by only one customer who's never going to throw more repeat business your way if it means a feature that will sell hundreds of copies is dropped in the meantime.
For what it's worth, our company has a database of requested changes where customers can basically vote for what they want to see in upcoming versions of our products. The actual creation of these requested changes in that database is limited to the sales force since we don't want all sorts of requests showing up without being evaluated and discussed at least a little bit with the customers.
In addition, we regularly approach our biggest customers (in terms of revenue generated) with the list to figure out what features should be added (they are free to suggest their own desires as well, which also get entered into the database - obviously voting power depends a bit on revenue).
This is quite separate from our bug system although quite often bugs are raised which are actually new feature requests, and they're shipped across to the new features database. It's possible that this may even happen for real bugs that are considered low-impact or have a suitable workarounds in place.
We ask our users.
We have a niche product, and a small user base.
Seriously, the users group are paying maintenance, or thinking about buying.
So we ask them what they would like.
They suggest fixes, ask for new features.
We tell them about the development roadmap: because we have things we want to do to the product ,
due to times changing, design ideas. Changes to regulations.
And if every customer says "we really need feature X" : then it'll come next.
If they say "you guys need to fix the bug where I click there an it doesn't do blah:" then that bug gets fixed.
Commercial software: with the customers voting for changes.
Of course, we take their choices on advisement: the company have other things that are thinking about.
We always look at the cost of fixing the bug versus the problems caused by it. Sometimes, it just isn't worth it to have every single bug properly triaged, root caused, then fixed.
Plenty of times a particular enhancement or new feature is being funded or at least strongly recommended to occur by a large/good customer, so that also affects matters.
I like to think that bug fixes should always come before enhancements and new features, in all cases. Even if the particular bug isn't bothering you too much as the developer, someone somewhere is having their day ruined when your little error pops up.
distinguish, yes.
bugs take priority, yes.
all critical / normal priority and above bugs first, yes.
yes, the 80/20 rule.
no, bugs and features have to be treated differently because they are weighted differently.
yes, all commercial, open-source, and in house applications have their own way to do things.
As an example, FogBugz uses Evidence Based Scheduling and is the only management/tracker that i know of that uses that formula.
Hope that helps!
You have to look at it from the standpoint of what the bug is. A show-stopper bug is always number one priority. If people can't log in or critical data can't be entered or adjusted, etc. then that must take precedence over pretty much everything.
Bugs of lower significance can be worked in as need be. We may delay fixing a bug becasue we know we are working on that section for an enhancement next week. Then the bug fix will go with the enhancement. We may delay fixing a bug if it is minor and a planned enhancement will replace the code entirely shortly. A major enhancement might take precendence over fixing some typos on the interface. A client may tell us that this other project is more critical and to do it before fixing the bug (our software is highly customized by client). It all depends on the affect of the bug and existing plans and corporate politics once you are past the show stopper. A bug that is bothering a major client may take higher precedence even if it seems minor to the developer. If the CEO wants it fixed now, doesn't matter how unimportant it seems compared to the rest of the workload, it gets fixed now.
Point 5 of The Joel Test: 12 Steps to Better Code makes a compelling argument (in my opinion) that it's a good idea to fix bugs before writing new code.
For bugs, it's pretty simple: If you're going to fix it, fix it before you write any more code. Why? The more code you add, the harder the existing bug will become to find.
If you're okay with the idea of the bug never being fixed, by all means triage it over and add features.
Bugs? We have no bugs. They're undocumented features.
For us the choice is always based on business decisions and as a developer I have no input beyond offering my opinion on what should be top priority. More often than not, features win over bugs because adding features "appears" to the business area like progress is being made and bugs that I could have fixed a year ago continue to exist because the business side only wants to see "progress". Triage is great if your allowed, but all too often in the corporate environment, it's about visible results, not functionality.
One thing did not mention so far the severity of the bug. If the bug has high severity (like crash , can not pass duration test, and it surely depends on what kind of application you have) ,you should definitly fix it first before adding new feature.

does a disaster proof language exist?

When creating system services which must have a high reliability, I often end up writing the a lot of 'failsafe' mechanisms in case of things like: communications which are gone (for instance communication with the DB), what would happen if the power is lost and the service restarts.... how to pick up the pieces and continue in a correct way (and remembering that while picking up the pieces the power could go out again...), etc etc
I can imagine for not too complex systems, a language which would cater for this would be very practical. So a language which would remember it's state at any given moment, no matter if the power gets cut off, and continues where it left off.
Does this exist yet? If so, where can I find it? If not, why can't this be realized? It would seem to me very handy for critical systems.
p.s. In case the DB connection is lost, it would signal that a problem arose, and manual intervention is needed. The moment he connection is restored, it would continue where it left off.
EDIT:
Since the discussion seems to have died off let me add a few points(while waiting before I can add a bounty to the question)
The Erlang response seems to be top rated right now. I'm aware of Erlang and have read the pragmatic book by Armstrong (the principal creator). It's all very nice (although functional languages make my head spin with all the recursion), but the 'fault tolerant' bit doesn't come automatically. Far from it. Erlang offers a lot of supervisors en other methodologies to supervise a process, and restart it if necessary. However, to properly make something which works with these structures, you need to be quite the erlang guru, and need to make your software fit all these frameworks. Also, if the power drops, the programmer too has to pick up the pieces and try to recover the next time the program restarts
What I'm searching is something far simpler:
Imagine a language (as simple as PHP for instance), where you can do things like do DB queries, act on it, perform file manipulations, perform folder manipulations, etc.
It's main feature however should be: If the power dies, and the thing restarts it takes of where it left off (So it not only remembers where it was, it will remember the variable states as well). Also, if it stopped in the middle of a filecopy, it will also properly resume. etc etc.
Last but not least, if the DB connection drops and can't be restored, the language just halts, and signals (syslog perhaps) for human intervention, and then carries on where it left off.
A language like this would make a lot of services programming a lot easier.
EDIT:
It seems (judging by all the comments and answers) that such a system doesn't exist. And probably will not in the near foreseeable future due to it being (near?) impossible to get right.
Too bad.... again I'm not looking for this language (or framework) to get me to the moon, or use it to monitor someones heartrate. But for small periodic services/tasks which always end up having loads of code handling bordercases (powerfailure somewhere in the middle, connections dropping and not coming back up),...where a pause here,...fix the issues,....and continue where you left off approach would work well.
(or a checkpoint approach as one of the commenters pointed out (like in a videogame). Set a checkpoint.... and if the program dies, restart here the next time.)
Bounty awarded:
At the last possible minute when everyone was coming to the conclusion it can't be done, Stephen C comes with napier88 which seems to have the attributes I was looking for.
Although it is an experimental language, it does prove it can be done and it is a something which is worth investigating more.
I'll be looking at creating my own framework (with persistent state and snapshots perhaps) to add the features I'm looking for in .Net or another VM.
Everyone thanks for the input and the great insights.
Erlang was designed for use in Telecommunication systems, where high-rel is fundamental. I think they have standard methodology for building sets of communicating processes in which failures can be gracefully handled.
ERLANG is a concurrent functional language, well suited for distributed, highly concurrent and fault-tolerant software. An important part of Erlang is its support for failure recovery. Fault tolerance is provided by organising the processes of an ERLANG application into tree structures. In these structures, parent processes monitor failures of their children and are responsible for their restart.
Software Transactional Memory (STM) combined with nonvolatile RAM would probably satisfy the OP's revised question.
STM is a technique for implementating "transactions", e.g., sets of actions that are done effectively as an atomic operation, or not at all. Normally the purpose of STM is to enable highly parallel programs to interact over shared resources in a way which is easier to understand than traditional lock-that-resource programming, and has arguably lower overhead by virtue of having a highly optimistic lock-free style of programming.
The fundamental idea is simple: all reads and writes inside a "transaction" block are recorded (somehow!); if any two threads conflict on the these sets (read-write or write-write conflicts) at the end of either of their transactions, one is chosen as the winner and proceeds, and the other is forced to roll back his state to the beginning of the transaction and re-execute.
If one insisted that all computations were transactions, and the state at the beginning(/end) of each transaction was stored in nonvolatile RAM (NVRAM), a power fail could be treated as a transaction failure resulting in a "rollback". Computations would proceed only from transacted states in a reliable way. NVRAM these days can be implemented with Flash memory or with battery backup. One might need a LOT of NVRAM, as programs have a lot of state (see minicomputer story at end). Alternatively, committed state changes could be written to log files that were written to disk; this is the standard method used by most databases and by reliable filesystems.
The current question with STM is, how expensive is it to keep track of the potential transaction conflicts? If implementing STM slows the machine down by an appreciable amount, people will live with existing slightly unreliable schemes rather than give up that performance. So far the story isn't good, but then the research is early.
People haven't generally designed languages for STM; for research purposes, they've mostly
enhanced Java with STM (see Communications of ACM article in June? of this year). I hear MS has an experimental version of C#. Intel has an experimental version for C and C++.
THe wikipedia page has a long list. And the functional programming guys
are, as usual, claiming that the side-effect free property of functional programs makes STM relatively trivial to implement in functional languages.
If I recall correctly, back in the 70s there was considerable early work in distributed operating systems, in which processes (code+state) could travel trivally from machine to machine. I believe several such systems explicitly allowed node failure, and could restart a process in a failed node from save state in another node. Early key work was on the
Distributed Computing System by Dave Farber. Because designing languages back in the 70s was popular, I recall DCS had it had its own programming language but I don't remember the name. If DCS didn't allow node failure and restart, I'm fairly sure the follow on research systems did.
EDIT: A 1996 system which appears on first glance to have the properties you desire is
documented here.
Its concept of atomic transactions is consistent with the ideas behind STM.
(Goes to prove there isn't a lot new under the sun).
A side note: Back in in 70s, Core Memory was still king. Core, being magnetic, was nonvolatile across power fails, and many minicomputers (and I'm sure the mainframes) had power fail interrupts that notified the software some milliseconds ahead of loss of power. Using that, one could easily store the register state of the machine and shut it down completely. When power was restored, control would return to a state-restoring point, and the software could proceed. Many programs could thus survive power blinks and reliably restart. I personally built a time-sharing system on a Data General Nova minicomputer; you could actually have it running 16 teletypes full blast, take a power hit, and come back up and restart all the teletypes as if nothing happened. The change from cacophony to silence and back was stunning, I know, I had to repeat it many times to debug the power-failure management code, and it of course made great demo (yank the plug, deathly silence, plug back in...). The name of the language that did this, was of course Assembler :-}
From what I know¹, Ada is often used in safety critical (failsafe) systems.
Ada was originally targeted at
embedded and real-time systems.
Notable features of Ada include:
strong typing, modularity mechanisms
(packages), run-time checking,
parallel processing (tasks), exception
handling, and generics. Ada 95 added
support for object-oriented
programming, including dynamic
dispatch.
Ada supports run-time checks in order
to protect against access to
unallocated memory, buffer overflow
errors, off-by-one errors, array
access errors, and other detectable
bugs. These checks can be disabled in
the interest of runtime efficiency,
but can often be compiled efficiently.
It also includes facilities to help
program verification.
For these
reasons, Ada is widely used in
critical systems, where any anomaly
might lead to very serious
consequences, i.e., accidental death
or injury. Examples of systems where
Ada is used include avionics, weapon
systems (including thermonuclear
weapons), and spacecraft.
N-Version programming may also give you some helpful background reading.
¹That's basically one acquaintance who writes embedded safety critical software
I doubt that the language features you are describing are possible to achieve.
And the reason for that is that it would be very hard to define common and general failure modes and how to recover from them. Think for a second about your sample application - some website with some logic and database access. And lets say we have a language that can detect power shutdown and subsequent restart, and somehow recover from it. The problem is that it is impossible to know for the language how to recover.
Let's say your app is an online blog application. In that case it might be enough to just continue from the point we failed and all be ok. However consider similar scenario for an online bank. Suddenly it's no longer smart to just continue from the same point. For example if I was trying to withdraw some money from my account, and the computer died right after the checks but before it performed the withdrawal, and it then goes back one week later it will give me the money even though my account is in the negative now.
In other words, there is no single correct recovery strategy, so this is not something that can be implemented into the language. What language can do is to tell you when something bad happens - but most languages already support that with exception handling mechanisms. The rest is up to application designers to think about.
There are a lot of technologies that allow designing fault tolerant applications. Database transactions, durable message queues, clustering, hardware hot swapping and so on and on. But it all depends on concrete requirements and how much the end user is willing to pay for it all.
There is an experimental language called Napier88 that (in theory) has some attributes of being disaster-proof. The language supports Orthogonal Persistence, and in some implementations this extends (extended) to include the state of the entire computation. Specifically, when the Napier88 runtime system check-pointed a running application to the persistent store, the current thread state would be included in the checkpoint. If the application then crashed and you restarted it in the right way, you could resume the computation from the checkpoint.
Unfortunately, there are a number of hard issues that need to be addressed before this kind of technology is ready for mainstream use. These include figuring out how to support multi-threading in the context of orthogonal persistence, figuring out how to allow multiple processes share a persistent store, and scalable garbage collection of persistent stores.
And there is the problem of doing Orthogonal Persistence in a mainstream language. There have been attempts to do OP in Java, including one that was done by people associated with Sun (the Pjama project), but there is nothing active at the moment. The JDO / Hibernate approaches are more favoured these days.
I should point out that Orthogonal Persistence isn't really disaster-proof in the large sense. For instance, it cannot deal with:
reestablishment of connections, etc with "outside" systems after a restart,
application bugs that cause corruption of persisted data, or
loss of data due to something bringing down the system between checkpoints.
For those, I don't believe there are general solutions that would be practical.
The majority of such efforts - termed 'fault tolerance' - are around the hardware, not the software.
The extreme example of this is Tandem, whose 'nonstop' machines have complete redundancy.
Implementing fault tolerance at a hardware level is attractive because a software stack is typically made from components sourced from different providers - your high availability software application might be installed along side some decidedly shaky other applications and services on top of an operating system that is flaky and using hardware device drivers that are decidedly fragile..
But at a language level, almost all languages offer the facilities for proper error checking. However, even with RAII, exceptions, constraints and transactions, these code-paths are rarely tested correctly and rarely tested together in multiple-failure scenerios, and its usually in the error handling code that the bugs hide. So its more about programmer understanding, discipline and trade-offs than about the languages themselves.
Which brings us back to the fault tolerance at the hardware level. If you can avoid your database link failing, you can avoid exercising the dodgy error handling code in the applications.
No, a disaster-proof language does not exist.
Edit:
Disaster-proof implies perfection. It brings to mind images of a process which applies some intelligence to resolve unknown, unspecified and unexpected conditions in a logical manner. There is no manner by which a programming language can do this. If you, as the programmer, can not figure out how your program is going to fail and how to recover from it then your program isn't going to be able to do so either.
Disaster from an IT perspective can arise in so many fashions that no one process can resolve all of those different issues. The idea that you could design a language to address all of the ways in which something could go wrong is just incorrect. Due to the abstraction from the hardware many problems don't even make much sense to address with a programming language; yet they are still 'disasters'.
Of course, once you start limiting the scope of the problem; then we can begin talking about developing a solution to it. So, when we stop talking about being disaster-proof and start speaking about recovering from unexpected power surges it becomes much easier to develop a programming language to address that concern even when, perhaps, it doesn't make much sense to handle that issue at such a high level of the stack. However, I will venture a prediction that once you scope this down to realistic implementations it becomes uninteresting as a language since it has become so specific. i.e. Use my scripting language to run batch processes overnight that will recover from unexpected power surges and lost network connections (with some human assistance); this is not a compelling business case to my mind.
Please don't misunderstand me. There are some excellent suggestions within this thread but to my mind they do not rise to anything even remotely approaching disaster-proof.
Consider a system built from non-volatile memory. The program state is persisted at all times, and should the processor stop for any length of time, it will resume at the point it left when it restarts. Therefore, your program is 'disaster proof' to the extent that it can survive a power failure.
This is entirely possible, as other posts have outlined when talking about Software Transactional Memory, and 'fault tolerance' etc. Curious nobody mentioned 'memristors', as they would offer a future architecture with these properties and one that is perhaps not completely von Neumann architecture too.
Now imagine a system built from two such discrete systems - for a straightforward illustration, one is a database server and the other an application server for an online banking website.
Should one pause, what does the other do? How does it handle the sudden unavailability of it's co-worker?
It could be handled at the language level, but that would mean lots of error handling and such, and that's tricky code to get right. That's pretty much no better than where we are today, where machines are not check-pointed but the languages try and detect problems and ask the programmer to deal with them.
It could pause too - at the hardware level they could be tied together, such that from a power perspective they are one system. But that's hardly a good idea; better availability would come from a fault-tolerant architecture with backup systems and such.
Or we could use persistant message queues between the two machines. However, at some point these messages get processed, and they could at that point be too old! Only application logic can really work what to do in that circumstances, and there we are back to languages delegating to the programmer again.
So it seems that the disaster-proofing is better in the current form - uninterrupted power supplies, hot backup servers ready to go, multiple network routes between hosts, etc. And then we only have to hope that our software is bug-free!
Precise answer:
Ada and SPARK were designed for maximum fault-tolerance and to move all bugs possible to compile-time rather than runtime. Ada was designed by the US Dept of Defense for military and aviation systems, running on embedded devices in such things as airplanes. Spark is its descendant. There's another language used in the early US space program, HAL/S geared to handling HARDWARE failure and memory corruption due to cosmic rays.
Practical answer:
I've never met anyone who can code Ada/Spark. For most users the best answer is SQL variants on a DBMS with automatic failover and clustering of servers. Integrity checks guarantee safety. Something like T-SQL or PL/SQL has full transactional security, is Turing-complete, and is pretty tolerant of problems.
Reason there isn't a better answer:
For performance reasons, you can't provide durability for every program operation. If you did, the processing would slow to the speed of your fastest nonvolative storage. At best, your performance will drop by a thousand or million fold, because of how much slower ANYTHING is than CPU caches or RAM.
It would be the equivalent of going from a Core 2 Duo CPU to the ancient 8086 CPU -- at most you could do a couple hundred operations per second. Except, this would be even SLOWER.
In cases where frequent power cycling or hardware failures exist, you use something like a DBMS, which guarantees ACID for every important operation. Or, you use hardware that has fast, nonvolatile storage (flash, for example) -- this is still much slower, but if the processing is simple, this is OK.
At best your language gives you good compile-time safety checks for bugs, and will throw exceptions rather than crashing. Exception handling is a feature of half the languages in use now.
There are several commercially avaible frameworks Veritas, Sun's HA , IBMs HACMP etc. etc.
which will automatically monitor processes and start them on another server in event of failure.
There is also expensive hardware like HPs Tandem Nonstop range which can survive internal hardware failures.
However sofware is built by peoples and peoples love to get it wrong. Consider the cautionary tale of the IEFBR14 program shipped with IBMs MVS. It basically a NOP dummy program which allows the declarative bits of JCL to happen without really running a program. This is the entire original source code:-
IEFBR14 START
BR 14 Return addr in R14 -- branch at it
END
Nothing code be simpler? During its long life this program has actually acummulated a bug bug report and is now on version 4.
Thats 1 bug to three lines of code, the current version is four times the size of the original.
Errors will always creep in, just make sure you can recover from them.
This question forced me to post this text
(Its quoted from HGTTG from Douglas Adams:)
Click, hum.
The huge grey Grebulon reconnaissance ship moved silently through the black void. It was travelling at fabulous, breathtaking speed, yet appeared, against the glimmering background of a billion distant stars to be moving not at all. It was just one dark speck frozen against an infinite granularity of brilliant night.
On board the ship, everything was as it had been for millennia, deeply dark and Silent.
Click, hum.
At least, almost everything.
Click, click, hum.
Click, hum, click, hum, click, hum.
Click, click, click, click, click, hum.
Hmmm.
A low level supervising program woke up a slightly higher level supervising program deep in the ship's semi-somnolent cyberbrain and reported to it that whenever it went click all it got was a hum.
The higher level supervising program asked it what it was supposed to get, and the low level supervising program said that it couldn't remember exactly, but thought it was probably more of a sort of distant satisfied sigh, wasn't it? It didn't know what this hum was. Click, hum, click, hum. That was all it was getting.
The higher level supervising program considered this and didn't like it. It asked the low level supervising program what exactly it was supervising and the low level supervising program said it couldn't remember that either, just that it was something that was meant to go click, sigh every ten years or so, which usually happened without fail. It had tried to consult its error look-up table but couldn't find it, which was why it had alerted the higher level supervising program to the problem .
The higher level supervising program went to consult one of its own look-up tables to find out what the low level supervising program was meant to be supervising.
It couldn't find the look-up table .
Odd.
It looked again. All it got was an error message. It tried to look up the error message in its error message look-up table and couldn't find that either. It allowed a couple of nanoseconds to go by while it went through all this again. Then it woke up its sector function supervisor.
The sector function supervisor hit immediate problems. It called its supervising agent which hit problems too. Within a few millionths of a second virtual circuits that had lain dormant, some for years, some for centuries, were flaring into life throughout the ship. Something, somewhere, had gone terribly wrong, but none of the supervising programs could tell what it was. At every level, vital instructions were missing, and the instructions about what to do in the event of discovering that vital instructions were missing, were also missing.
Small modules of software — agents — surged through the logical pathways, grouping, consulting, re-grouping. They quickly established that the ship's memory, all the way back to its central mission module, was in tatters. No amount of interrogation could determine what it was that had happened. Even the central mission module itself seemed to be damaged.
This made the whole problem very simple to deal with. Replace the central mission module. There was another one, a backup, an exact duplicate of the original. It had to be physically replaced because, for safety reasons, there was no link whatsoever between the original and its backup. Once the central mission module was replaced it could itself supervise the reconstruction of the rest of the system in every detail, and all would be well.
Robots were instructed to bring the backup central mission module from the shielded strong room, where they guarded it, to the ship's logic chamber for installation.
This involved the lengthy exchange of emergency codes and protocols as the robots interrogated the agents as to the authenticity of the instructions. At last the robots were satisfied that all procedures were correct. They unpacked the backup central mission module from its storage housing, carried it out of the storage chamber, fell out of the ship and went spinning off into the void.
This provided the first major clue as to what it was that was wrong.
Further investigation quickly established what it was that had happened. A meteorite had knocked a large hole in the ship. The ship had not previously detected this because the meteorite had neatly knocked out that part of the ship's processing equipment which was supposed to detect if the ship had been hit by a meteorite.
The first thing to do was to try to seal up the hole. This turned out to be impossible, because the ship's sensors couldn't see that there was a hole, and the supervisors which should have said that the sensors weren't working properly weren't working properly and kept saying that the sensors were fine. The ship could only deduce the existence of the hole from the fact that the robots had clearly fallen out of it, taking its spare brain, which would have enabled it to see the hole, with them.
The ship tried to think intelligently about this, failed, and then blanked out completely for a bit. It didn't realise it had blanked out, of course, because it had blanked out. It was merely surprised to see the stars jump. After the third time the stars jumped the ship finally realised that it must be blanking out, and that it was time to take some serious decisions.
It relaxed.
Then it realised it hadn't actually taken the serious decisions yet and panicked. It blanked out again for a bit. When it awoke again it sealed all the bulkheads around where it knew the unseen hole must be.
It clearly hadn't got to its destination yet, it thought, fitfully, but since it no longer had the faintest idea where its destination was or how to reach it, there seemed to be little point in continuing. It consulted what tiny scraps of instructions it could reconstruct from the tatters of its central mission module.
"Your !!!!! !!!!! !!!!! year mission is to !!!!! !!!!! !!!!! !!!!!, !!!!! !!!!! !!!!! !!!!!, land !!!!! !!!!! !!!!! a safe distance !!!!! !!!!! ..... ..... ..... .... , land ..... ..... ..... monitor it. !!!!! !!!!! !!!!!..."
All of the rest was complete garbage.
Before it blanked out for good the ship would have to pass on those instructions, such as they were, to its more primitive subsidiary systems.
It must also revive all of its crew.
There was another problem. While the crew was in hibernation, the minds of all of its members, their memories, their identities and their understanding of what they had come to do, had all been transferred into the ship's central mission module for safe keeping. The crew would not have the faintest idea of who they were or what they were doing there. Oh well.
Just before it blanked out for the final time, the ship realised that its engines were beginning to give out too.
The ship and its revived and confused crew coasted on under the control of its subsidiary automatic systems, which simply looked to land wherever they could find to land and monitor whatever they could find to monitor.
Try taking an existing open source interpreted language and see if you could adapt its implementation to include some of these features. Python's default C implementation embeds an internal lock (called the GIL, Global Interpreter Lock) that is used to "handle" concurrency among Python threads by taking turns every 'n' VM instructions. Perhaps you could hook into this same mechanism to checkpoint the code state.
For a program to continue where it left off if the machine loses power, not only would it need to save state to somewhere, the OS would also have to "know" to resume it.
I suppose implementing a "hibernate" feature in a language could be done, but having that happen constantly in the background so it's ready in the event anything bad happens sounds like the OS' job, in my opinion.
It's main feature however should be: If the power dies, and the thing restarts it takes of where it left off (So it not only remembers where it was, it will remember the variable states as well). Also, if it stopped in the middle of a filecopy, it will also properly resume. etc etc.
... ...
I've looked at erlang in the past. However nice it's fault tolerant features it has... It doesn't survive a powercut. When the code restarts you'll have to pick up the pieces
If such a technology existed, I'd be VERY interested in reading about it. That said, The Erlang solution would be having multiple nodes--ideally in different locations--so that if one location went down, the other nodes could pick up the slack. If all of your nodes were in the same location and on the same power source (not a very good idea for distributed systems), then you'd be out of luck as you mentioned in a comment follow-up.
The Microsoft Robotics Group has introduced a set of libraries that appear to be applicable to your question.
What is Concurrency and Coordination
Runtime (CCR)?
Concurrency and Coordination Runtime
(CCR) provides a highly concurrent
programming model based on
message-passing with powerful
orchestration primitives enabling
coordination of data and work without
the use of manual threading, locks,
semaphores, etc. CCR addresses the
need of multi-core and concurrent
applications by providing a
programming model that facilitates
managing asynchronous operations,
dealing with concurrency, exploiting
parallel hardware and handling partial
failure.
What is Decentralized Software
Services (DSS)?
Decentralized Software Services (DSS)
provides a lightweight, state-oriented
service model that combines
representational state transfer (REST)
with a formalized composition and
event notification architecture
enabling a system-level approach to
building applications. In DSS,
services are exposed as resources
which are accessible both
programmatically and for UI
manipulation. By integrating service
composition, structured state
manipulation, and event notification
with data isolation, DSS provides a
uniform model for writing highly
observable, loosely coupled
applications running on a single node
or across the network.
Most of the answers given are general purpose languages. You may want to look into more specialized languages that are used in embedded devices. The robot is a good example to think about. What would you want and/or expect a robot to do when it recovered from a power failure?
In the embedded world, this can be implemented through a watchdog interrupt and a battery-backed RAM. I've written such myself.
Depending upon your definition of a disaster, it can range from 'difficult' to 'practicly impossible' to delegate this responsibility to the language.
Other examples given include persisting the current state of the application to NVRAM after each statement is executed. This only works so long as the computer doesn't get destroyed.
How would a language level feature know to restart the application on a new host?
And in the situation of restoring the application to a host - what if significant time had passed and assumptions/checks made previously were now invalid?
T-SQL, PL/SQL and other transactional languages are probably as close as you'll get to 'disaster proof' - they either succeed (and the data is saved), or they don't. Excluding disabling transactional isolation, it's difficult (but probably not impossible if you really try hard) to get into 'unknown' states.
You can use techniques like SQL Mirroring to ensure that writes are saved in atleast two locations concurrently before a transaction is committed.
You still need to ensure you save your state every time it's safe (commit).
If I understand your question correctly, I think that you are asking whether it's possible to guarantee that a particular algorithm (that is, a program plus any recovery options provided by the environment) will complete (after any arbitrary number of recoveries/restarts).
If this is correct, then I would refer you to the halting problem:
Given a description of a program and a finite input, decide whether the program finishes running or will run forever, given that input.
I think that classifying your question as an instance of the halting problem is fair considering that you would ideally like the language to be "disaster proof" -- that is, imparting a "perfectness" to any flawed program or chaotic environment.
This classification reduces any combination of environment, language, and program down to "program and a finite input".
If you agree with me, then you'll be disappointed to read that the halting problem is undecidable. Therefore, no "disaster proof" language or compiler or environment could be proven to be so.
However, it is entirely reasonable to design a language that provides recovery options for various common problems.
In the case of power failure.. sounds like to me: "When your only tool is a hammer, every problem looks like a nail"
You don't solve power failure problems within a program. You solve this problem with backup power supplies, batteries, etc.
If the mode of failure is limited to hardware failure, VMware Fault Tolerance claims similar thing that you want. It runs a pair of virtual machines across multiple clusters, and using what they call vLockstep, the primary vm sends all states to the secondary vm real-time, so in case of primary failure, the execution transparently flips to the secondary.
My guess is that this wouldn't help communication failure, which is more common than hardware failure. For serious high availability, you should consider distributed systems like Birman's process group approach (paper in pdf format, or book Reliable Distributed Systems: Technologies, Web Services, and Applications ).
The closest approximation appears to be SQL. It's not really a language issue though; it's mostly a VM issue. I could imagine a Java VM with these properties; implementing it would be another matter.
A quick&dirty approximation is achieved by application checkpointing. You lose the "die at any moment" property, but it's pretty close.
I think its a fundemental mistake for recovery not to be a salient design issue. Punting responsibility exclusivly to the environment leads to a generally brittle solution intolerant of internal faults.
If it were me I would invest in reliable hardware AND design the software in a way that it was able to recover automatically from any possible condition. Per your example database session maintenance should be handled automatically by a sufficiently high level API. If you have to manually reconnect you are likely using the wrong API.
As others have pointed out procedure languages embedded in modern RDBMS systems are the best you are going to get without use of an exotic language.
VMs in general are designed for this sort of thing. You could use a VM vendors (vmware..et al) API to control periodic checkpointing within your application as appropriate.
VMWare in particular has a replay feature (Enhanced Execution Record) which records EVERYTHING and allows point in time playback. Obviously there is a massive performance hit with this approach but it would meet the requirements. I would just make sure your disk drives have a battery backed write cache.
You would most likely be able to find similiar solutions for java bytecode run inside a java virtual machine. Google fault tolerant JVM and virtual machine checkpointing.
If you do want the program information saved, where would you save it?
It would need to be saved e.g. to disk. But this wouldn't help you if the disk failed, so already it's not disaster-proof.
You are only going to get a certain level of granularity in your saved state. If you want something like tihs, then probably the best approach is to define your granularity level, in terms of what constitutes an atomic operation and save state to the database before each atomic operation. Then, you can restore to the point of that level atomic operation.
I don't know of any language that would do this automatically, sincethe cost of saving state to secondary storage is extremely high. Therefore, there is a tradeoff between level of granularity and efficiency, which would be hard to define in an arbitrary application.
First, implement a fault tolerant application. One where, where, if you have 8 features and 5 failure modes, you have done the analysis and test to demonstrate that all 40 combinations work as intended (and as desired by the specific customer: no two will likely agree).
second, add a scripting language on top of the supported set of fault-tolerant features. It needs to be as near to stateless as possible, so almost certainly something non-Turing-complete.
finally, work out how to handle restoration and repair of scripting language state adapted to each failure mode.
And yes, this is pretty much rocket science.
Windows Workflow Foundation may solve your problem. It's .Net based and is designed graphically as a workflow with states and actions.
It allows for persistence to the database (either automatically or when prompted). You could do this between states/actions. This Serialises the entire instance of your workflow into the database. It will be rehydrated and execution will continue when any of a number of conditions is met (certain time, rehydrated programatically, event fires, etc...)
When a WWF host starts, it checks the persistence DB and rehydrates any workflows stored there. It then continues to execute from the point of persistence.
Even if you don't want to use the workflow aspects, you can probably still just use the persistence service.
As long as your steps were atomic this should be sufficient - especially since I'm guessing you have a UPS so could monitor for UPS events and force persistence if a power issue is detected.
If I were going about solving your problem, I would write a daemon (probably in C) that did all database interaction in transactions so you won't get any bad data inserted if it gets interrupted. Then have the system start this daemon at startup.
Obviously developing web stuff in C is quite slower than doing it in a scripting language, but it will perform better and be more stable (if you write good code of course :).
Realistically, I'd write it in Ruby (or PHP or whatever) and have something like Delayed Job (or cron or whatever scheduler) run it every so often because I wouldn't need stuff updating ever clock cycle.
Hope that makes sense.
To my mind, the concept of failure recover is, most of the time, a business problem, not a hardware or language problem.
Take an example : you have one UI Tier and one subsystem.
The subsystem is not very reliable but the client on the UI tier should percieve it as if it was.
Now, imagine that somehow your sub system crash, do you really think that the language you imagine, can think for you how to handle the UI Tier depending on this sub system ?
Your user should be explicitly aware that the subsystem is not reliable, if you use messaging to provide high reliability, the client MUST know that (if he isn't aware, the UI can just freeze waiting a response which can eventually come 2 weeks later). If he should be aware of this, this means that any abstrations to hide it will eventually leak.
By client, I mean end user. And the UI should reflect this unreliability and not hide it, a computer cannot think for you in that case.
"So a language which would remember it's state at any given moment, no matter if the power gets cut off, and continues where it left off."
"continues where it left off" is often not the correct recovery strategy. No language or environment in the world is going to attempt to guess how to recover from a particular fault automatically. The best it can do is provide you with tools to write your own recovery strategy in a way that doesn't interfere with your business logic, e.g.
Exception handling (to fail fast and still ensure consistency of state)
Transactions (to roll back incompleted changes)
Workflows (to define recovery routines that are called automatically)
Logging (for tracking down the cause of a fault)
AOP/dependency injection (to avoid having to manually insert code to do all the above)
These are very generic tools and are available in lots of languages and environments.

How do you protect code from leaking outside? [duplicate]

This question already has answers here:
How do you protect your software from illegal distribution? [closed]
(22 answers)
Closed 5 years ago.
Besides open-sourcing your project and legislation, are there ways to prevent, or at least minimize the damages of code leaking outside your company/group?
We obviously can't block Internet access (to prevent emailing the code) because programmer's need their references. We also can't block peripheral devices (USB, Firewire, etc.)
The code matters most when it has some proprietary algorithms and in-house developed knowledge (as opposed to regular routine code to draw GUIs, connect to databases, etc.), but some applications (like accounting software and CRMs) are just that: complex collections of routine code that are simple to develop in principle, but will take years to write from scratch. This is where leaked code will come in handy to competitors.
As far as I see it, preventing leakage relies almost entirely on human process. What do you think? What precautions and measures are you taking? And has code leakage affected you before?
You can't stop it getting out. So two solutions - stop people wanting to hurt you, and have legal precautions. To stop people hating you treat them right (saying more is probably off topic for stack overflow).
I'm not a lawyer, but to give yourself legal protection, if you believe in it, patent the ideas, put a copyright notice in the code, and make sure the contracts for your programmers specify carefully intellectual property rights.
But at the end of the day, the answer is run quicker than the competition.
Unless you're working with something highly classified and given that you can't block email and USB devices I guess you aren't there's really not to much damage to be had even if the source code leaks. The thing is, what is the code, or parts of it worth without the knowledge of how it works and the organization around it.
In general the value of "source" is much less than is commonly touted, basicly the source without the people or the organization isn't worth the storage it occupies for a competitor.
Also, you're missing the most likely attack vector, and it's also the one you can't stop no matter what. If someone really really want's to know how you made your magic then they'll try to hire your developers away, and since you can't stop them from having information inside their skull and even if they turn in all their possesions ther knowledge and domain expertise is leaving with them. Basicly employee retention and trust is the only way. Sorry.
I don't know how much actual help this is going to be, but:
Don't p*ss your programmers off. Don't get them in a position where they want to give the source to a competitor. Most places undervalue their developers. Given where you are (SO), I guess you are less likely to. Nothing got to me more than seeing the sales folks out for games of golf - paid, and paid for, by the company - while we had to fight to get pizza once a month.
Really, if your direct competitors got your code today, what would it do? Is your product or vertical market that stagnant that you wouldn't release newer, better versions before they could react? Is there no room for innovation? Most companies overvalue their "proprietary algorithms and in-house developed knowledge". Sure, it may cut some time off, but it's only about 10% of the problem.
If you got all the source for all your competitors products, how much actual use would it be? I'd guess it would set you back months. Not forward. Back.
If you had a clean system, and little external/internal knowledge, how long would it take you to get your own product into a buildable state? How long would it take to drill down into the code and workout what is going on? How much time and money would you waste trying to work something out, rather than spending time and money on how to make your product work better?
I've actually been in the position of having all the source - 1million lines+ of code - to a competitor's product. We did nothing with it - aside from a bit of a poke-around and then delete it, which was more than I was comfortable with - but I would expect that we'd have chewed up months of time just to get to where they were then.
So we nuked it, slapped the id10t who got it (yes, a developer/PM who came over from the other company), and thought about how to make our product kick so much butt that it didn't matter what they did. Much better use of time. Worked well, too. We had differentiators, not just re-hashing the same features in the same way they did them.
Sorry, but there is no way you can stop people getting stuff out, and still be able to actually work. You can stop them wanting to do it, or make it so there is no value to them having it.
We were worried about people decompiling our code too. We stopped worrying when we realised that WE had enough trouble working out what was going on inside 500K+ lines of C#, C++ and HTML code talking to MAPI/Exchange. If someone can decompile it and work it out, then we want to hire them......
BTW, for clarity, and given who I now work for, I should point out this is not my current employer. This was quite a while ago.
The code does not leak out on itself. It takes people to take it. There are obviously some security measures you might use like traffic analysis and lock-down on the repositories so only authorized developers can connect to it.
But by the end of the day your best option is to make sure that no one WANTS to steal from you. Your team has to be happy, they have to be proud to work for your they have to be loyal to the company and to each other. If you have such team it's a simple question of explaining to everyone that the code has to be protected from outsiders. It will not stop a dedicated mole but will prevent accidents.
P.S. And yes, proper clauses in the contracts would not harm as well, at least they will make sure that the developers are AWARE that taking code outside is morally wrong.
Follow these guidelines and it shouldn't matter if the contents of your entire source code repository is posted all over stackoverflow:
http://geocities.com/mdetting/unmaintainable.html
Oh, and show your developers that you don't trust them by blocking access to parts of the source code, scanning outgoing/incoming email etc. That is a surefire way to make them want to stay around... ...nothing improves morale like a bit of mistrust in the workplace.
Another cool way is to tell one half that they are "team a" and name the other half as the untrustworthy "team b". Then reverse it and say the same thing to the "team b" members. Encourage them to keep an eye on the "bad guys" in the other team and to report any signs of illoyalty to you. Sprinkle a few "conflict inducers" (e.g. tell "Joe": 'do you know what Ed says about you behind your back?') etc. Works wonders if you set up the developers against each other and create a few [invented-by-you] conflicts here and there...
(Eh, and no, I don't actually recommend any of the above. Just kidding. But I have seen people use all of the tactics above. And it didn't work.)
Okay, I am going to be a little practical here.
Being nice to everybody and hoping they won't hurt you doesn't work.
Every programmer knows from the day he joins a company that he'll not stay there forever. He will change when he's learned enough to get a better opportunity.
The programmers who write the code believe that they have the ownership to it even if they wrote it on the time they rented out to somebody else. So many of them will usually try to get their hands on the source-code even if they don't intend to hurt anybody.
Once they leave the company and they've carried the source code with them and lost contact with their colleagues, the conscience settles down and goes on a vacation and after a while bits and pieces from the code start showing up everywhere.
That's what I KNOW happens cause I've witnessed it happen to my company.
So what does one do?
Sign a NDA which specifically mentions that they programmer WILL not take copies.
Distribute your product between programmers, and if possible get modules coded individually and integrated by a chief whose responsibility is that all programmers do nt get all the code.
At the time of termination get a written undertaking from the coders that they do not possess any IP of the company and they understand the penalties of violation.
If somebody violates your IP, sue the man! No exceptions. It'll work as an example for the present team.
Do I sound extreme?
I remember this happening to Valve when they were developing HL-2. Interesting link here: http://www.shacknews.com/onearticle.x/28619
Most of the answers are based on Moral and ethical values. I wonder if Google, Facebook etc. just rely on their employees good will. Give me a break, that's totally utopian. Don't be a fool. Be realistic.
YES, it is possible to prevent code leaking:
Using a virtual server hosting virtual machines, programmers can only access locally to these virtual machines (intranet) via Remote Desktop. Repository is managed locally. private keys are required to access the repository. Copy/paste from virtual machine to client is disabled. only copy/paste from client to virtual is allowed.
Companies like facebook do that.
The only way to still code is by taking pictures to the actual code, which is totally not practical and feasible at all, and since there are surveillance cameras everywhere, you will have to go to the bathroom to take those pictures.
I've worked somewhere where there was a real culture of secrecy about this sort of thing (historically there had been a number of times when the company was small where "customers" had, shall we say, abused their access to our product).
While at the top the management were very protective, I see it slightly differently. I think our code, while not entirely irrelevant, isn't as key as you'd expect it to be in a software company.
The reason that we are successful is:
1) The code is essentially the solution to a bunch of problems. If you get our code you get those solutions but we still have the smart people who solved those problems. They understand those problems better than you do and are better able to solve the next set of problems better than you are.
2) Because they really understand the problems (and the solutions) we can do things faster than our competitors which translates to cheaper (or more profitable).
3) Also because of those people and the attitude within the company we've delivered well to our clients and provided good support.
4) And because of that we have a good reputation and reference-able customers.
A small number of companies have code which is genuinely worth keeping secret - proprietary algorithms and that sort of thing - but for a vast majority of us our products are very easily replicable by smart people.
What I'm saying is do the basics - write it into people's contracts that they can't take it, keep it secure and so on - but don't obsess over it. Unless you're in a very specific market it's unlikely to be what's really going to make your business succeed or fail.
The best step starts from reruting guys with strong ethical behaviour.
Various other steps can be taken like all communication being scanned. There are places where email and all information going out is scanned. The desktop/laptop does not have hard-disk or the access is restricted and all work is on network folders, even when working from home, one has to get connected to internet. The offline work gets synchronized. The USB and drives are disconnected.
The other policies are to provide access only on need basis.
These will only slow down and hinder to some extent, but is one is very determined then he would find ways to get around this.
The other way is if the code is really very important, then have the idea copywrite protected legaly.
To be honest it's almost impossible. If I wanted to suggest what a company that would shortly appear on the Daily WTF would do:
Disconnect the "work computer" from the internet, bt because they need internet access for reference buy everyone a wbbook.
Stuff the developers USB slots with epoxy and require that they load/unload everything from a centralised server, which scans all the data that goes through it for code like syntax.
Or you could just trust your employees and make them sign an NDA...
I personally never tested on any real case, but I would suggest using code fragmentation:
basically you split your project in a number of libraries, define interfaces and unit tests for each of them, then you separate SVN repositories so that each group have access to a limited part of your precious source code.
This is also a good practice no matter what and should help if you are outsourcing abroad.
The previous answers all seem to center on building trust and employing ethical people.
Another possibility might be to create your own domain specific language and tools. That will make any leaked code harder to use. It might still be possible to steal useful ideas from it, but it would not be possible to simply compile a competing product unless the whole toolchain is leaked.
Trust your developers. People tend to live up or down to expectations. Treat them well, and remember that loyalty goes both ways. After all, if you can't cut off thumb drives, you can't stop anybody from leaking code, no matter how much you don't trust them.
That being said, find yourself a lawyer with trade secret expertise, probably expertise in other parts of IP law, and ask how to legally safeguard stuff. You do want to make sure that, if a competitor gets your stuff, it's not legal for the competitor to benefit from it.

Which is better: shipping a buggy feature or not shipping the feature at all?

this is a bit of a philosophical question. I am adding a small feature to my software which I assume will be used by most users but only maybe 10% of the times they use the software. In other words, the software has been fine without it for 3 months, but 4 or 5 users have asked for it, and I agree that it should be there.
The problem is that, due to limitations of the platform I'm working with (and possibly limitations of my brain), "the best I can do" still has some non-critical but noticeable bugs - let's say the feature as coded is usable but "a bit wonky" in some cases.
What to do? Is a feature that's 90% there really "better than nothing"? I know I'll get some bug reports which I won't be able to fix: what do I tell customers about those? Should I live with unanswered feature requests or unanswered bug reports?
Make sure people know, that you know, that there are problems. That there are bugs. And give them an easy way to proide feedback.
What about having a "closed beta" with the "4 or 5 users" who suggested the feature in the first place?
There will always be unanswered feature requests and bug reports. Ship it, but include a readme with "known issues" and workarounds when possible.
You need to think of this from your user's perspective - which will cause less frustration? Buggy code is usually more frustrating than missing features.
Perfectionists may answer "don't do it".
Business people may answer "do it".
I guess where the balance is is up to you. I would be swaying towards putting the feature in there if the bugs are non-critical. Most users don't see your software the same way you do. You're a craftsman/artist, which means your more critical than regular people.
Is there any way that you can get a beta version to the 4-5 people who requested the feature? Then, once you get their feedback, it may be clear which decision to make.
Precisely document the wonkiness and ship it.
Make sure a user is likely to see and understand your documentation of the wonkiness.
You could even discuss the decision with users who have requested the feature: do some market research.
Just because you can't fix it now, doesn't mean you won't be able to in the future. Things change.
Label what you have now as a 'beta version' and send it out to those people who have asked for it. Get their feedback on how well it works, fix whatever they complain about, and you should then be ready to roll it out to larger groups of users.
Ship early, ship often, constant refactoring.
What I mean is, don't let it stop you from shipping, but don't give up on fixing the problems either.
An inability to resolve wonkiness is a sign of problems in your code base. Spend more time refactoring than adding features.
I guess it depends on your standards. For me, buggy code is not production ready and so shouldn't be shipped. Could you have a beta version with a known issues list so users know what to expect under certain conditions? They get the benefit of using the new features but also know that it's not perfect (use that their own risk). This may keep those 4 or 5 customers that requested the feature happy for a while which gives you more time to fix the bugs (if possible) and release to production later for the masses.
Just some thoughts depending on your situation.
Depends. On the bugs, their severity and how much effort you think it will take to fix them. On the deadline and how much you think you can stretch it. On the rest of the code and how much the client can do with it.
I would not expect coders to deliver known problems into test let alone to release to a customer.
Mind you, I believe in zero tolerance of bugs. Interestingly I find that it is usually developers/ testers who are keenest to remove all bugs - it is often the project manager and/ or customer who are willing to accept bugs.
If you must release the code, then document every feature/ bug that you are aware of, and commit to fixing each one.
Why don't you post more information about the limitations of the platform you are working on, and perhaps some of the clever folk here can help get your bug list down.
If the demand is for a feature NOW, rather than a feature that works. You may have to ship.
In this situation though:
Make sure you document the bug(s)
and consequences (both to the user
and other developers).
Be sure to add the bug(s) to your
bug tracking database.
If you write unit tests (I hope so),
make sure that tests are written
which highlight the bugs, before you
ship. This will mean that when you
come to fix the bugs in the future,
you know where and what they are,
without having to remember.
Schedule the work to fix the bugs
ASAP. You do fix bugs before
writing new code, don't you?
If bugs can cause death or can lose users' files then don't ship it.
If bugs can cause the application to crash itself then ship it with a warning (a readme or whatever). If crashes might cause the application to corrupt the users' files that they were in the middle of editing with this exact application, then display a warning each time they start up the application, and remind them to backup their files first.
If bugs can cause BSODs then be very careful about who you ship it to.
If it doesn't break anything else, why not ship it? It sounds like you have a good relationship with your customers, so those who want the feature will be happy to get it even if it's not all the way there, and those who don't want it won't care. Plus you'll get lots of feedback to improve it in the next release!
The important question you need to answer is if your feature will solve a real business need given the design you've come up with. Then it's only a matter of making the implementation match the design - making the "bugs" being non-bugs by defining them as not part of the intended behaviour of the feature (which should be covered by the design).
This boils down to a very real choice of paths: is a bug something that doesn't work properly, that wasn't part of the intended behaviour and design? Or is it a bug only if if doesn't work in accordance to the intended behaviour?
I am a firm believer in the latter; bugs are the things that do not work the way they were intended to work. The implementation should capture the design, that should capture the business need. If the implementation is used to address a different business need that wasn't covered by the design, it is the design that is at fault, not the implementation; thus it is not a bug.
The former attitude is by far the most common amongst programmers in my experience. It is also the way the user views software issues. From a software development perspective, however, it is not a good idea to adopt this view, because it leads you to fix bugs that are not bugs, but design flaws, instead of redesigning the solution to the business need.
Coming from someone who has to install buggy software for their users - don't ship it with that feature enabled.
It doesn't matter if you document it, the end users will forget about that bug the first time they hit it, and that bug will become critical to them not being able to do their job.

Resources