Defensive programming [closed] - security

When writing code, do you consciously program defensively to ensure high program quality and to avoid the possibility of your code being exploited maliciously, e.g. through buffer overflow exploits or code injection?
What's the "minimum" level of quality you'll always apply to your code?

In my line of work, our code has to be top quality.
So, we focus on two main things:
Testing
Code reviews
Those bring home the money.

Similar to abyx, on the team I am on developers always use unit testing and code reviews. In addition to that, I also aim to make sure that I don't incorporate code that people may use in unanticipated ways - I tend to write only the basic set of methods required for the object at hand to function as it has been spec'd out. I've found that incorporating methods that may never be used, but that provide functionality, can unintentionally introduce a "backdoor" or an unintended/unanticipated use into the system.
It's much easier to go back later and introduce the methods, attributes, and properties that are asked for than to anticipate something that may never come.

I'd recommend being defensive about data that enters a "component" or framework. Within the "component" or framework, one can assume that the data is correct.
Think of it like this: it is up to the caller to supply correct parameters; otherwise ALL functions and methods would have to check every incoming parameter. If the check is done only at the boundary, it is needed only once. So a parameter can be assumed "correct" and passed through to lower levels.
Always check data from external sources, users, etc.
A "component" or framework should always check incoming calls.
If there is a bug and a wrong value is used in a call, what is really the right thing to do? You only have an indication that the "data" the program is working on is wrong. Some people like ASSERTs, but others want advanced error reporting and possibly error recovery. In any case, the data has been found to be faulty, and in few cases is it good to continue working on it (though at least it's good if servers don't die).
An image sent from a satellite might be a case for trying advanced error recovery; for an image downloaded from the internet, just put up an error icon.
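To illustrate the boundary rule above, here's a minimal Java sketch (the class and method names are hypothetical): the public entry point validates everything, while the internal helper trusts its caller and only asserts.
public final class ImageDecoder {

    // External boundary: the data may come from users, the network, etc.,
    // so it must be checked here, once.
    public static int[] decode(byte[] data) {
        if (data == null || data.length < 4) {
            throw new IllegalArgumentException("image data missing or truncated");
        }
        return decodePixels(data);
    }

    // Internal level: the boundary has already validated the input,
    // so an assert documenting the contract is enough.
    private static int[] decodePixels(byte[] data) {
        assert data.length >= 4 : "caller must validate before decoding";
        return new int[] { data.length };  // stub standing in for real decoding
    }
}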

I recommend people write code that is fascist in the development environment and benevolent in production.
During development you want to catch bad data/logic/code as early as possible to prevent problems either going unnoticed or resulting in later problems where the root cause is hard to track.
In production handle problems as gracefully as possible. If something really is a non-recoverable error then handle it and present that information to the user.
As an example here's our code to Normalize a vector. If you feed it bad data in development it will scream, in production it returns a safety value.
inline const Vector3 Normalize( Vector3arg vec )
{
    const float len = Length(vec);
    // Development builds scream: the assert fires on zero-length input.
    ASSERTMSG(len > 0.0f, "Invalid Normalization");
    // Production builds return a safety value instead of dividing by zero.
    return len == 0.0f ? vec : vec / len;
}

I always work to prevent things like injection attacks. However, when you work on an internal intranet site, most of the security features feel like wasted effort. I still do them, maybe just not as well.

Well, there is a certain set of best practices for security. At a minimum, for database applications, you need to watch out for SQL Injection.
Other things like hashing passwords, encrypting connection strings, etc. are also standard.
From here on, it depends on the actual application.
Luckily, if you are working with frameworks such as .Net, a lot of security protection comes built-in.
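As a concrete illustration of the SQL injection point above, here's a minimal sketch in Java using JDBC's standard PreparedStatement (the table and column names are hypothetical; in .NET the equivalent is a parameterized command):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class UserDao {
    // Bind user input as a parameter instead of concatenating it into SQL.
    ResultSet findUser(Connection conn, String userName) throws SQLException {
        PreparedStatement stmt =
            conn.prepareStatement("SELECT id, name FROM users WHERE name = ?");
        stmt.setString(1, userName);  // the driver handles quoting/escaping
        return stmt.executeQuery();
    }
}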

I would say you always have to program defensively, even for internal apps, simply because users could, through sheer luck, enter something that breaks your app. Granted, you probably don't have to worry about them trying to cheat you out of money, but still: always program defensively and assume the app will fail.

Using Test Driven Development certainly helps. You write a single component at a time and then enumerate all of the potential cases for inputs (via tests) before writing the code. This ensures that you've covered all bases and haven't written any cool code that no-one will use but might break.
Although I don't do anything formal I generally spend some time looking at each class and ensuring that:
if they are in a valid state, they stay in a valid state
there is no way to construct them in an invalid state
under exceptional circumstances they will fail as gracefully as possible (frequently this is a cleanup and throw)
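A minimal Java sketch of that checklist (the class is hypothetical): construction validates, so an instance can never exist in an invalid state, and every operation produces another validated instance.
final class Temperature {
    private final double kelvin;

    Temperature(double kelvin) {
        // There is no way to construct an invalid instance.
        if (kelvin < 0.0) {
            throw new IllegalArgumentException("below absolute zero: " + kelvin);
        }
        this.kelvin = kelvin;
    }

    // A valid instance stays valid: the result goes through the same check.
    Temperature plus(double deltaKelvin) {
        return new Temperature(kelvin + deltaKelvin);
    }
}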

It depends.
If I am genuinely hacking something up for my own use, then I will write the best code that I don't have to think about: let the compiler be my friend for warnings etc., but don't automatically create types for the hell of it.
The more likely the code is to be used, even occasionally, the more I ramp up the level of checks:
minimal magic numbers
better variable names
fully checked & defined array/string lengths
programming by contract assertions
null value checks
exceptions (depending upon context of the code)
basic explanatory comments
accessible usage documentation (if perl etc.)
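A small Java sketch of the "programming by contract" and null/range-check items above (all names are hypothetical): the explicit checks guard the preconditions, and the assert documents the postcondition so development builds scream if the logic is broken.
final class Account {
    static long withdraw(long balanceCents, long amountCents) {
        // Preconditions: reject bad input at the contract boundary.
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        if (amountCents > balanceCents) {
            throw new IllegalStateException("insufficient funds");
        }
        long result = balanceCents - amountCents;
        // Postcondition, checked in development builds (run with -ea).
        assert result >= 0 : "balance never goes negative";
        return result;
    }
}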

I'll take a different definition of defensive programming: the one advocated by Josh Bloch in Effective Java. In the book, he talks about how to handle mutable objects that callers pass to your code (e.g., in setters), and mutable objects that you pass to callers (e.g., in getters).
For setters, make sure to clone any mutable objects, and store the clone. This way, callers cannot change the passed-in object after the fact to break your program's invariants.
For getters, either return an immutable view of your internal data, if the interface allows it; or else return a clone of the internal data.
When calling user-supplied callbacks with internal data, send in an immutable view or clone, as appropriate, unless you intend the callback to alter the data, in which case you have to validate it after the fact.
The take-home message is to make sure no outside code can hold an alias to any mutable objects that you use internally, so that you can maintain your invariants.
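A minimal Java sketch of that advice (the class is hypothetical): the setter copies the caller's mutable list, and the getter hands out an immutable view, so no outside code ever holds an alias to the internal state.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Roster {
    private final List<String> names = new ArrayList<>();

    // Setter: store a copy, so later changes to 'incoming' can't
    // break our invariants.
    void setNames(List<String> incoming) {
        names.clear();
        names.addAll(incoming);
    }

    // Getter: expose an immutable view of the internal data.
    List<String> getNames() {
        return Collections.unmodifiableList(names);
    }
}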

I am very much of the opinion that correct programming will protect against these risks: things like avoiding deprecated functions, which (in the Microsoft C++ libraries at least) are commonly deprecated because of security vulnerabilities, and validating everything that crosses an external boundary.
Functions that are only called from your code should not require excessive parameter validation because you control the caller, that is, no external boundary is crossed. Functions called by other people's code should assume that the incoming parameters will be invalid and/or malicious at some point.
My approach to dealing with exposed functions is to simply crash out, with a helpful message if possible. If the caller can't get the parameters right then the problem is in their code and they should fix it, not you. (Obviously you have provided documentation for your function, since it is exposed.)
Code injection is only an issue if your application is able to elevate the current user. If a process can inject code into your application then it could easily write the code to memory and execute it anyway. Without being able to gain full access to the system code injection attacks are pointless. (This is why applications used by administrators should not be writeable by lesser users.)

In my experience, positively employing defensive programming does not necessarily mean that you end up improving the quality of your code. Don't get me wrong, you need to defensively program to catch the kinds of problems that users will come across - users don't like it when your program crashes on them - but this is unlikely to make the code any easier to maintain, test, etc.
Several years ago, we made it policy to use assertions at all levels of our software and this - along with unit testing, code reviews, etc. plus our existing application test suites - had a significant, positive effect on the quality of our code.

Java, Signed JARs and JAAS.
Java to prevent buffer overflow and pointer/stack whacking exploits.
Don't use JNI (the Java Native Interface); it exposes you to the vulnerabilities of DLLs/shared libraries.
Signed JARs to stop class loading being a security problem.
JAAS can let your application not trust anyone, even itself.
J2EE has (admittedly limited) built-in support for Role based security.
There is some overhead for some of this but the security holes go away.

Simple answer: It depends.
Too much defensive coding can cause major performance issues.


How to Protect an Exe File from Decompilation

What are the methods for protecting an exe file from reverse engineering? Many packers are available to pack an exe file. Such an approach is mentioned at http://c-madeeasy.blogspot.com/2011/07/protecting-your-c-programexe-files-from.html
Is this method efficient?
The only good way to prevent a program from being reverse-engineered ("understood") is to revise its structure to essentially force the opponent into understanding Turing Machines. Essentially what you do is:
take some problem which is generally proven to be computationally difficult
synthesize a version of that problem whose outcome you know; this is generally pretty easy compared to solving an arbitrary version
make the correct program execution dependent on the correct answer
make the program compute nonsense if the answer is not correct
Now an opponent staring at your code has to figure out what the "correct" computation is, by solving algorithmically hard problems. There are tons of NP-hard problems that nobody has solved efficiently in 40 years of literature; it's a pretty good bet that if your program depends on one of these, J. Random Reverse-Engineer won't suddenly be able to solve them.
One generally does this by transforming the original program to obscure its control flow, and/or its dataflow. Some techniques scramble the control flow by converting some control flow into essentially data flow ("jump indirect through this pointer array"), and then implementing data flow algorithms that require precise points-to analysis, which is both provably hard and has proven difficult in practice.
Here's a paper that describes a variety of techniques rather shallowly, but it's an easy read:
http://www.cs.sjsu.edu/faculty/stamp/students/kundu_deepti.pdf
Here's another that focuses on how to ensure that the obfuscating transformations lead to results that are guaranteed to be computationally hard:
http://www.springerlink.com/content/41135jkqxv9l3xme/
Here's one that surveys a wide variety of control flow transformation methods, including those that provide levels of guarantees about security:
http://www.springerlink.com/content/g157gxr14m149l13/
This paper obfuscates control flows in binary programs with low overhead:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.3773&rank=2
Now, one could go through a lot of trouble to prevent a program from being decompiled. But if the decompiled one was impossible to understand, you simply might not bother; that's the approach I'd take.
If you insist on preventing decompilation, you can attack that by considering what decompilation is intended to accomplish. Decompilation essentially proposes that you can convert each byte of the target program into some piece of code. One way to make that fail is to ensure that the application can apparently use each byte as both computer instructions and as data, even if it does not actually do so, and that the decision to do so is obfuscated by the above kinds of methods. One variation on this is to have lots of conditional branches in the code that are in fact unconditional (using control flow obfuscation methods); the other side of the branch falls into nonsense code that looks valid but branches to crazy places in the existing code. Another variant on this idea is to implement your program as an obfuscated interpreter, and implement the actual functionality as a set of interpreted data.
A fun way to make this fail is to generate code at run time and execute it on the fly; most conventional languages such as C have pretty much no way to represent this.
A program built like this would be difficult to decompile, let alone understand after the fact.
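As a toy illustration of the "conditional branches that are in fact unconditional" idea, here's a hedged Java sketch of an opaque predicate; real obfuscators work on binaries and use far more devious constructions, and all names here are hypothetical.
final class Opaque {
    // x * x + x = x * (x + 1) is always even, even under integer
    // overflow, so the else branch can never execute; proving that
    // statically is extra work the reverse-engineer must now do.
    static int obfuscatedAdd(int a, int b, int x) {
        if ((x * x + x) % 2 == 0) {
            return a + b;               // the real computation
        } else {
            return (a ^ (b << 3)) - 7;  // plausible-looking nonsense
        }
    }
}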
Tools that are claimed to do a good job at protecting binary code are listed at:
https://security.stackexchange.com/questions/1069/any-comprehensive-solutions-for-binary-code-protection-and-anti-reverse-engineeri
Packing, compressing and any other methods of binary protection will only ever serve to hinder or slow the reversal of your code; they have never been and never will be 100% secure solutions (though the marketing of some would have you believe that). You basically need to evaluate what sort of hacker you are up against. If they are script kiddies, then any packer that requires real effort and skill (i.e., those that lack unpacking scripts/programs/tutorials) will deter them. If you're facing people with skills and resources, then you can forget about keeping your code safe (as many of the comments say: if the OS can read it to execute it, so can you, it'll just take a while longer). If your concern is not so much your IP but rather the security of something your program does, then you might be better served by redesigning it in a manner where it cannot be attacked even with the original source (Chrome takes this approach).
Decompilation is always possible. The statement
This threat can be eliminated to extend by packing/compressing the
executable(.exe).
on your linked site is a plain lie.
Currently many solutions can be used to protect your application from being decompiled, such as compression, obfuscation, code snippets, etc.
You can look for a company to help you achieve this, such as Nalpeiron. The website is: https://www.nalpeiron.com/
They cover many platforms: Windows, Linux, ARM-Linux, Android.
Virbox can also be taken into consideration. The website is: https://lm-global.virbox.com/index.html
I recommend them because they have more options to protect your source code, such as import table protection and memory checks.

How to make an Agile methodology work when developing long-term complex systems? [closed]

My team is building a product that has a lot of components that rely on each other. For example, whenever we add a new type of data to the system, we also have to add logging code to track the changes that use that data type. Or, when we add a new UI screen, we have to make sure that its strings are externalized so they can be translated. These things slow down almost every task we do, and sometimes one of the steps gets forgotten.
The traditional way to handle this problem is to add required checklists and documentation and things like that. How do Agile methodologies handle it?
The design you describe sounds like it might be a little too tightly-coupled. A renewed focus on enterprise patterns (such as Inversion of Control, programming to interfaces, etc) could help a lot.
If you are doing pair programming, you should be checking each other's work, making sure all of the i's are dotted and the t's are crossed.
If you are doing Test Driven Development, your tests should not be passing until all requirements for that particular portion of the development effort are satisfied.
If you are developing a large, complex system, you need experienced developers who understand the design and development process. You may also need a hands-on (read:coding) architect who can oversee the whole process.
Oh, and checklists (despite their traditional nature) are good too.
I'd suggest reading Alistair Cockburn's "Agile Software Development: The Cooperative Game" - he takes quite an intelligent approach to Agile that's largely "do what gets the job done". That might help you work out how to get some kind of checklist / documentation into what you're doing without making everything horribly top-heavy.
Could some of your problems be solved by better tests? When you talked about not doing things that need to be done, my first thought was "why hasn't a test failed?" Maybe you need to look at tools for testing user interfaces? (edit: or even some small script on commit that greps code for whatever indicates the need for translation strings and checks against the files with the translations in?)
Also, can you change your design so that it's both less coupled and "forces" you to do the right thing. Perhaps making those data types implement a logging interface that the logger delegates to, or similar...?
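A sketch of that "make the design force the right thing" idea in Java (all names are hypothetical): if every new data type must implement an audit interface, forgetting the logging step becomes a compile error instead of a forgotten checklist item.
interface Auditable {
    // Every new data type must say how its changes are logged.
    String auditDescription();
}

final class ChangeLogger {
    // The logger delegates to the data type, so it never needs to
    // know about new types explicitly.
    void logChange(Auditable item) {
        System.out.println("changed: " + item.auditDescription());
    }
}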
Depending on your IDE there are various tools to help identify strings that need to be externalized, but if you are in a habit of just not putting in static strings this can be avoided.
If you need to add logging I would suggest AOP, as, at some point you will want to remove the logging code and you risk breaking the application.
But, a long-term, complex system is ideal for agile development, as, while you are developing, the needs of the client/customer may change, and you can adapt to it.
You need to ensure that the customer has feedback on a regular basis (ideally daily, and in a perfect system the customer has a rep sitting by for questions).
When I have many steps that must be done, esp for something like datatypes, I will resort to using a spreadsheet, so, you add a datatype, you add a row to the spreadsheet. Then you can track everything that needs to be done before that datatype is completely added to the application.
Have cross-component teams. That way when you add some functionality, the member(s) from the logging component will update their part and the translator(s) will update the strings.
I think it is important to understand that an Agile methodology is only a process framework, not a process in itself. For example, it says to follow test driven development and do pair programming, but it does not say how to write the tests, or suggest a review checklist, or suggest a coding guideline, or say what documents to write. Those parts are entirely up to you or your organization to define and follow.
When planning a feature, you can add an engineering task called "review" and allot time for it. You could do the review task in whatever way works best for you and your organization. If pair programming doesn't work as well as a formal inspection for you, you should do formal inspections.
Do what needs to be done, but not more:
better definition of done (the definition of done is a kind of checklist to me),
better testing,
"just enough" documentation (Agile != no documentation),
etc.
Well, where I work we have QA that can catch some of the bugs if there are missing requirements or something slips through. I'm not saying the development team is intentionally putting in bugs, but as a codebase grows, it becomes harder and harder to remain nimble and thorough in checking everything.
Wikis can be useful for capturing methods used to try to get the clearest requirements. By clear I mean that the story card isn't likely to list all the requirements and that there may be discussions with an end-user or business analyst to get their understanding of what is desired. Part of our sign-offs involve getting an end-user to see the functionality and approve it before moving from the development environment.
Once every handful of sprints, we may have most of a sprint devoted to bug fixing/refactoring so that things can be cleaned up that would otherwise not get done as they aren't likely to be important. This can be cosmetic bugs or broken windows that while they have little business value initially can be useful in the long run.

How to implement code in a manner that lessens the possibility of complete re-works [closed]

I had a piece of work thrown out due to a single minor spec change that turned out not to have been spec'ed correctly. If it had been done right at the start of the project then most of that work would never have been needed in the first place.
What are some good tips/design principles that keep these things from happening?
Or tips to lessen the amount of reworking of code that is needed in order to implement feature requests or design changes mid-implementation?
Modularize. Make small blocks of code that do their job well. However, that's only the beginning. It's usually a large combination of factors that contribute to code so bad it needs a complete rework: everything from highly unstable requirements, to poor design, to lack of code ownership - the list goes on and on.
Adding on to what others have brought up: COMMUNICATION.
Communication between you and the customer, you and management, you and the other developers, you and your QA department - communication between everyone is key. Make sure management understands reasonable timeframes and make sure both you and the customer understand exactly what it is that you're building.
Take the time to keep communication open with the customer that you're building the product for. Make milestones and set up a time to display the project to the customer at each milestone. Even if the customer is completely disappointed with a milestone when you show it, you can scratch what you have and start over from the last milestone. This also requires that your work be built in blocks that work independently of one another, as csunwold stated.
Points...
Keep open communication
Be open and honest with progress of product
Be willing to change daily as the needs of the customer's business and the specifications for the product change.
Software requirements change, and there's not much one can do about that except for more frequent interaction with clients.
One can, however, build code that is more robust in face of change. It won't save you from throwing out code that meets a requirement that nobody needs anymore, but it can reduce the impact of such changes.
For example, whenever this applies, use interfaces rather than classes (or the equivalent in your language), and avoid adding operations to the interface unless you are absolutely sure you need them. By building your programs that way you are less likely to rely on knowledge of a specific implementation, and you're less likely to implement things that you would not need.
Another advantage of this approach is that you can easily swap one implementation for another. For example, it sometimes pays off to write the dumbest (in efficiency) but the fastest to write and test implementation for your prototype, and only replace it with something smarter in the end when the prototype is the basis of the product and the performance actually matters. I find that this is a very effective way to avoid premature optimizations, and thus throwing away stuff.
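A small Java sketch of that approach (all names are hypothetical): callers depend only on the interface, so the dumbest-but-fastest-to-write prototype implementation can later be swapped for a smarter one without reworking any calling code.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface Store {
    void put(String key, String value);
    String get(String key);
}

// Dumb (in efficiency) but fast to write and test: fine for a prototype.
class ListStore implements Store {
    private final List<String[]> entries = new ArrayList<>();
    public void put(String key, String value) { entries.add(new String[] { key, value }); }
    public String get(String key) {
        for (String[] e : entries) if (e[0].equals(key)) return e[1];
        return null;
    }
}

// Smarter implementation, dropped in later when performance actually matters.
class MapStore implements Store {
    private final Map<String, String> map = new HashMap<>();
    public void put(String key, String value) { map.put(key, value); }
    public String get(String key) { return map.get(key); }
}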
modularity is the answer, as has been said. but it can be a hard answer to use in practice.
i suggest focussing on:
small libraries which do predefined things well
minimal dependencies between modules
writing interfaces first is a good way to achieve both of these (with interfaces used for the dependencies). writing tests next, against the interfaces, before the code is written, often highlights design choices which are un-modular.
i don't know whether your app is UI-intensive; that can make it more difficult to be modular. it's still usually worth the effort, but if not then assume that it will be thrown away before long and follow the iceberg principle, that 90% of the work is not tied to the UI and so easier to keep modular.
finally, i recommend "the pragmatic programmer" by andrew hunt and dave thomas as full of tips. my personal favourite is DRY -- "don't repeat yourself" -- any code which says the same thing twice smells.
iterate small
iterate often
test between iterations
get a simple working product out asap so the client can give input.
Basically assume stuff WILL get thrown out, so code appropriately, and don't get far enough into something that having it be thrown out costs a lot of time.
G'day,
Looking through the other answers here I notice that everyone is mentioning what to do for your next project.
One thing that seems to be missing though is having a wash-up to find out why the spec was out of sync with the actual requirements needed by the customer.
I'm just worried that if you don't do this, then no matter what approach you take to implementing your next project, if you've still got that mismatch between actual requirements and the spec for your next project, you'll once again be in the same situation.
It might be something as simple as bad communication or maybe customer requirement creep.
But at least if you know the cause, you can try to help minimise the chances of it happening again.
Not knocking what other answers are saying and there's some great stuff there, but please learn from what happened so that you're not condemned to repeat it.
HTH
cheers,
Sometimes a rewrite is the best solution!
If you are writing software for a camera, you could assume that the next version will also do video, or stereo video, or 3D laser scanning, and include hooks for all this functionality; or you could write such a versatile, extensible astronaut architecture that it could cope with the next camera including jet engines - but it will cost so much in money, resources and performance that you might have been better off not doing it.
A complete rewrite for new functionality in a new role isn't always a bad idea.
Like csunwold said, modularizing your code is very important. Write it so that if one piece falls prone to errors, it doesn't muck up the rest of the system. This way, you can debug a single buggy section while being able to safely rely on the rest.
Beyond this, documentation is key. If your code is neatly and clearly annotated, reworking it in the future will be infinitely easier for you or whoever happens to be debugging.
Using source control can be helpful too. If you find a piece of code doesn't work properly, there's always the opportunity to revert back to a past robust iteration.
Although it doesn't directly apply to your example, when writing code I try to keep an eye out for ways in which I can see the software evolving in the future.
Basically I try to anticipate where the software will go, but critically, I resist the temptation to implement any of the things I can imagine happening. All I am after is trying to make the APIs and interfaces support possible futures without implementing those features yet, in the hope that these 'possible scenarios' help me come up with a better and more future-proof interface.
Doesn't always work, of course.

Achieving Thread-Safety

Question: How can I make sure my application is thread-safe? Are there any common practices, testing methods, things to avoid, things to look for?
Background: I'm currently developing a server application that performs a number of background tasks in different threads and communicates with clients using Indy (using another bunch of automatically generated threads for the communication). Since the application should be highly available, a program crash is a very bad thing and I want to make sure that the application is thread-safe. No matter what, from time to time I discover a piece of code that throws an exception that never occurred before, and in most cases I realize that it is some kind of synchronization bug, where I forgot to synchronize my objects properly. Hence my question concerning best practices, testing of thread-safety and things like that.
mghie: Thanks for the answer! I should perhaps be a little bit more precise. Just to be clear, I know about the principles of multithreading, I use synchronization (monitors) throughout my program, and I know how to differentiate threading problems from other implementation problems. But nevertheless, I keep forgetting to add proper synchronization from time to time. Just to give an example, I used the RTL sort function in my code. It looked something like
FKeyList.Sort (CompareKeysFunc);
Turns out that I had to synchronize FKeyList while sorting. It just didn't come to mind when initially writing that simple line of code. It's these things I want to talk about. What are the places where one easily forgets to add synchronization code? How do YOU make sure that you added sync code in all important places?
You can't really test for thread-safeness. All you can do is show that your code isn't thread-safe, but if you know how to do that you already know what to do in your program to fix that particular bug. It's the bugs you don't know that are the problem, and how would you write tests for those? Apart from that threading problems are much harder to find than other problems, as the act of debugging can already alter the behaviour of the program. Things will differ from one program run to the next, from one machine to the other. Number of CPUs and CPU cores, number and kind of programs running in parallel, exact order and timing of stuff happening in the program - all of this and much more will have influence on the program behaviour. [I actually wanted to add the phase of the moon and stuff like that to this list, but you get my meaning.]
My advice is to stop seeing this as an implementation problem, and start to look at this as a program design problem. You need to learn and read all that you can find about multi-threading, whether it is written for Delphi or not. In the end you need to understand the underlying principles and apply them properly in your programming. Primitives like critical sections, mutexes, conditions and threads are something the OS provides, and most languages only wrap them in their libraries (this ignores things like green threads as provided by for example Erlang, but it's a good point of view to start out from).
I'd say start with the Wikipedia article on threads and work your way through the linked articles. I have started with the book "Win32 Multithreaded Programming" by Aaron Cohen and Mike Woodring - it is out of print, but maybe you can find something similar.
Edit: Let me briefly follow up on your edited question. All access to data that is not read-only needs to be properly synchronized to be thread-safe, and sorting a list is not a read-only operation. So obviously one would need to add synchronization around all accesses to the list.
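A rough Java analog of that point (the question's example is Delphi; the names here are hypothetical): every access to the shared list, including the sort, must go through the same lock.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class KeyList {
    private final Object lock = new Object();
    private final List<String> keys = new ArrayList<>();

    void add(String key) {
        synchronized (lock) { keys.add(key); }
    }

    // Sorting mutates the list, so it needs the lock just as writes do.
    void sort() {
        synchronized (lock) { Collections.sort(keys); }
    }
}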
But with more and more cores in a system constant locking will limit the amount of work that can be done, so it is a good idea to look for a different way to design your program. One idea is to introduce as much read-only data as possible into your program - locking is no longer necessary, as all access is read-only.
I have found interfaces to be a very valuable aid in designing multi-threaded programs. Interfaces can be implemented to have only methods for read-only access to the internal data, and if you stick to them you can be quite sure that a lot of the potential programming errors do not occur. You can freely share them between threads, and the thread-safe reference counting will make sure that the implementing objects are properly freed when the last reference to them goes out of scope or is assigned another value.
What you do is create objects that descend from TInterfacedObject. They implement one or more interfaces which all provide only read-only access to the internals of the object, but they can also provide public methods that mutate the object state. When you create the object you keep both a variable of the object type and a interface pointer variable. That way lifetime management is easy, because the object will be deleted automatically when an exception occurs. You use the variable pointing to the object to call all methods necessary to properly set up the object. This mutates the internal state, but since this happens only in the active thread there is no potential for conflict. Once the object is properly set up you return the interface pointer to the calling code, and since there is no way to access the object afterwards except by going through the interface pointer you can be sure that only read-only access can be performed. By using this technique you can completely remove the locking inside of the object.
What if you need to change the state of the object? You don't, you create a new one by copying the data from the interface, and mutate the internal state of the new objects afterwards. Finally you return the reference pointer to the new object.
By using this you will only need locking where you get or set such interfaces. It can even be done without locking, by using the atomic interchange functions. See this blog post by Primoz Gabrijelcic for a similar use case where an interface pointer is set.
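To make the pattern above concrete, here's a rough Java analog, hedged: the original is Delphi's TInterfacedObject with thread-safe reference counting, whereas in Java the garbage collector covers the lifetime-management part, real code still needs safe publication when handing the reference between threads, and all names are hypothetical.
interface ReadOnlyConfig {
    String host();
    int port();
}

final class Config implements ReadOnlyConfig {
    private String host;
    private int port;

    // Mutators, used only by the constructing thread during set-up.
    void setHost(String host) { this.host = host; }
    void setPort(int port)    { this.port = port; }

    public String host() { return host; }
    public int    port() { return port; }

    static ReadOnlyConfig build() {
        Config c = new Config();   // concrete reference: mutation allowed
        c.setHost("localhost");
        c.setPort(8080);
        return c;                  // callers get only the read-only view
    }
}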
Simple: don't use shared data. Every time you access shared data you risk running into a problem (if you forget to synchronize access). Even worse, each time you access shared data you risk blocking other threads, which will hurt your parallelization.
I know this advice is not always applicable. Still, it doesn't hurt if you try to follow it as much as possible.
EDIT: Longer response to Smasher's comment. Would not fit in a comment :(
You are totally correct. That's why I like to keep a shadow copy of the main data in a read-only thread. I add a version number to the structure (one 4-aligned DWORD) and increment this version in the (lock-protected) data writer. The data reader compares the global and private versions (which can be done without locking) and only if they differ does it lock the structure, duplicate it into local storage, update the local version and unlock. Then it accesses the local copy of the structure. This works great if reading is the primary way to access the structure.
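A rough Java sketch of that versioning scheme (the original is Delphi; all names are hypothetical): the writer bumps a volatile version under the lock, and each reader compares it lock-free against its private version, taking the lock to re-copy only when something changed.
import java.util.ArrayList;
import java.util.List;

class VersionedTable {
    private final Object lock = new Object();
    private final List<String> master = new ArrayList<>();
    private volatile int version = 0;

    void write(String row) {
        synchronized (lock) {
            master.add(row);
            version++;                        // publish the change
        }
    }

    // One Reader per reading thread: it keeps a private copy and version.
    final class Reader {
        private List<String> localCopy = new ArrayList<>();
        private int localVersion = -1;

        List<String> snapshot() {
            if (localVersion != version) {    // cheap, lock-free comparison
                synchronized (lock) {         // lock only when data changed
                    localCopy = new ArrayList<>(master);
                    localVersion = version;
                }
            }
            return localCopy;
        }
    }
}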
I'll second mghie's advice: thread safety is designed in. Read about it anywhere you can.
For a really low level look at how it is implemented, look for a book on the internals of a real time operating system kernel. A good example is MicroC/OS-II: The Real Time Kernel by Jean J. Labrosse, which contains the complete annotated source code to a working kernel along with discussions of why things are done the way they are.
Edit: In light of the improved question focusing on using a RTL function...
Any object that can be seen by more than one thread is a potential synchronization issue. A thread-safe object would follow a consistent pattern in every method's implementation of locking "enough" of the object's state for the duration of the method, or perhaps, narrowed to just "long enough". It is certainly the case that any read-modify-write sequence to any part of an object's state must be done atomically with respect to other threads.
The art lies in figuring out how to get useful work done without either deadlocking or creating an execution bottleneck.
As for finding such problems, testing won't be any guarantee. A problem that shows up in testing can be fixed. But it is extremely difficult to write either unit tests or regression tests for thread safety... so faced with a body of existing code your likely recourse is constant code review until the practice of thread safety becomes second nature.
As folks have mentioned and I think you know, being certain, in general, that your code is thread safe is impossible (I believe provably impossible but I would have to track down the theorem). Naturally, you want to make things easier than that.
What I try to do is:
Use a known pattern of multithreaded design: A thread pool, the actor model paradigm, the command pattern or some such approach. This way, the syncronization process happens in the same way, in a uniform way, throughout the application.
Limit and concentrate the points of synchronization. Write your code so you need synchronization in as few places as possible, and keep the synchronization code in one or a few places in the code.
Write the synchronization code so that the logical relation between the values is clear both on entering and on exiting the guard. I use lots of asserts for this (your environment may limit this).
Don't ever access shared variables without guards/synchronization. Be very clear what your shared data is. (I've heard there are paradigms for guardless multithreaded programming but that would require even more research).
Write your code as cleanly, clearly and DRY-ly as possible.
My simple answer, combined with those answers, is:
Create your application/program in a thread-safe manner
Avoid using public static variables everywhere
It usually becomes a habit fairly easily, but it needs some time to get used to:
Program your logic (not the UI) in a functional programming language such as F#, or even Scheme or Haskell. Functional programming promotes thread-safe practice, and it also pushes you to always code towards purity.
If you use F#, there's also a clear distinction between mutable and immutable objects and variables.
Since methods (or simply functions) are first-class citizens in F# and Haskell, the code you write will also be more disciplined, with less mutable state.
Also, using the lazy evaluation style usually found in these functional languages, you can be sure that your program is safe from side effects, and you'll realize that if your code needs effects, you have to define them explicitly. If side effects are taken into consideration, then your code will be ready to take advantage of composability between components and of multicore programming.

Have we given up on the idea of code reuse? [closed]

A couple of years ago the media was rife with all sorts of articles on how the idea of code reuse was a simple way to improve productivity and code quality.
From the blogs and sites I check on a regular basis it seems as though the idea of "code reuse" has gone out of fashion. Perhaps the 'code reuse' advocates have all joined the SOA crowd instead? :-)
Interestingly enough, when you search for 'code reuse' in Google the second result is titled: "Internal Code Reuse Considered Dangerous"!
To me the idea of code reuse is just common sense; after all, look at the success of the Apache Commons project!
What I want to know is:
Do you or your company try and reuse code?
If so how and at what level, i.e. low level api, components or
shared business logic? How do you or your company reuse code?
Does it work?
Discuss?
I am fully aware that there are many open source libs available and that anyone who has used .NET or Java has reused code in some form. That is common sense!
I was referring more to code reuse within an organization, rather than across a community via a shared lib etc.
I originally asked;
Do you or your company try and reuse code?
If so how and at what level, i.e. low level api, components or shared business logic? How do you or your company reuse code?
From where I sit I see very few examples of companies trying to reuse code internally.
If you have a piece of code which could potentially be shared across a medium size organization how would you go about informing other members of the company that this lib/api/etc existed and could be of benefit?
The title of the article you are referring to is misleading, and the article is actually a very good read. Code reuse is very beneficial, but there are downsides to everything. Basically, if I remember correctly, the gist of the article is that you are sealing the code in a black box and not revisiting it, so as the original developers leave you lose the knowledge. While I see the point, I don't necessarily agree with it - at least not to a "sky is falling" degree.
We actually group code reuse into more than just reusable classes, we look at the entire enterprise. Things that are more like framework enhancement or address cross-cutting concerns are put into a development framework that all of our applications use (think things like pre- and post-validation, logging, etc.). We also have business logic that is applicable to more than one application, so those sort of things get moved to a BAL core that is accessible anywhere.
I think that the important thing is not to promote things for reuse if they are not going to really be reused. They should be well documented, so that new developers can have a resource to help them come up to speed, as well. Chances are, if the knowledge isn't shared, the code will eventually be reinvented somewhere else and will lead to duplication if you are not rigorous in documentation and knowledge sharing.
We reuse code - in fact, our developers specifically write code that can be reused in other projects. This has paid off quite nicely - we're able to start new projects quickly, and we iteratively harden our core libraries.
But one can't just write code and expect it to be re-used; code reuse requires communication among team members and other users so people know what code is available, and how to use it.
The following things are needed for code reuse to work effectively:
The code or library itself
Demand for the code across multiple projects or efforts
Communication of the code's features/capabilities
Instructions on how to use the code
A commitment to maintaining and improving the code over time
Code reuse is essential. I find that it also forces me to generalize as much as possible, also making code more adaptable to varying situations. Ideally, almost every lower level library you write should be able to adapt to a new set of requirements for a different application.
I think code reuse is being done through open source projects for the most part. Anything that can be reused or extended is being done via libraries. Java has an amazing number of open source libraries available for doing a large number of things. Compare that to C++, and how early on everything would have to be implemented from scratch using MFC or the Win32 API.
We reuse code.
On a small scale we try to avoid code duplication as much as possible, and we have a complete library with a lot of frequently used code.
Normally code is developed for one application, and if it is generic enough, it is promoted to the library. This works excellently.
The idea of code reuse is no longer a novel idea, hence the apparent lack of interest. But it is still very much a good idea. The entire .NET Framework and the Java API are good examples of code reuse in action.
We have grown accustomed to developing OO libraries of code for our projects and reusing them in other projects. It's a part of the natural life cycle of an idea: it is hotly debated for a while, and then everyone accepts it and there is no reason for further discussion.
Of course we reuse code.
There is a near-infinite number of packages, libraries and shared objects available for all languages, with whole communities of developers behind them, supporting and updating.
I think the lack of "media attention" is due to the fact that everyone is doing it, so it's no longer worth writing about. I don't hear as many people raising awareness of Object-Oriented Programming and Unit Testing as I used to either. Everyone is already aware of these concepts (whether they use them or not).
Level of media attention to an issue has little to do with its importance, whether we're talking software development or politics! It's important to avoid wasting development effort by reinventing (or re-maintaining!) the wheel, but this is so well-known by now that an editor probably isn't going to get excited by another article on the subject.
Rather than looking at the number of current articles and blog posts as a measure of importance (or urgency) look at the concepts and buzz-phrases that have become classics or entered the jargon (another form of reuse!) For example, Google for uses of the DRY acronym for good discussion on the many forms of redundancy that can be eliminated in software and development processes.
There's also a role for mature judgment regarding costs of reuse vs. where the benefits are achieved. Some writers advocate waiting to worry about reuse until a second or third use actually emerges, rather than spending effort to generalize bit of code the first time it is written.
My personal view, based on the practise in my company:
Do you or your company try and reuse code?
Obviously, if we have another piece of code that already fits our needs we will reuse it. We don't go out of our way to use square pegs in round holes though.
If so how and at what level, i.e. low level api, components or shared business logic? How do you or your company reuse code?
At every level. It is written into our coding standards that developers should always assume their code will be reused - even if in reality that is highly unlikely. See below
If your OO model is good, your API probably reflects your business domain, so reusable classes probably equates to reusable business logic without additional effort.
For actual reuse, one key point is knowing what code is already available. We resolve this by having everything documented in a central location. We just need a little discipline to ensure that the documentation is up-to-date and searchable in a meaningful way.
Does it work?
Yes, but not because of the potential or actual reuse! In reality, beyond a few core libraries and UI components, there isn't a large amount of reuse.
In my personal opinion, the real value is in making the code reusable. In doing so, aside from a hopefully cleaner API, the code will (a) be documented sufficiently for another developer to use it without trawling the source code, and (b) it will also be replaceable. These points are a great benefit to on-going software maintenance.
Do you or your company try and reuse code? If so how and at what level, i.e. low level api, components or shared business logic? How do you or your company reuse code?
I used to work in a codebase with uber code reuse, but it was difficult to maintain because the reused code was unstable. It was prone to design changes and deprecation in ways that cascaded to everything using it. Before that I worked in a codebase with no code reuse where the seniors actually encouraged copying and pasting as a way to reuse even application-specific code, so I got to see the two extremes, and I have to say that one isn't necessarily much better than the other when taken to the extreme.
And I used to be an uber bottom-up kind of programmer. You ask me to build something specific and I end up building generalized tools. Then using those tools, I build more complex generalized tools, then start building DIP abstractions to express the design requirements for the lower-level tools, then I build even more complex tools and repeat, and at some point I start writing code that actually does what you want me to do. And as counter-productive as that sounded, I was pretty fast at it and could ship complex products in ways that really surprised people.
Problem was the maintenance over the months, years! After I built layers and layers of these generalized libraries and reused the hell out of them, each one wanted to serve a much greater purpose than what you asked me to do. Each layer wanted to solve the world's hunger needs. So each one was very ambitious: a math library that wants to be amazing and solve the world's hunger needs. Then something built on top of the math library like a geometry library that wants to be amazing and solve the world's hunger needs. You know something's wrong when you're trying to ship a product but your mind is mulling over how well your uber-generalized geometry library works for rendering and modeling when you're supposed to be working on animation because the animation code you're working on needs a few new geometry functions.
Balancing Everyone's Needs
I found in designing these uber-generalized libraries that I had to become obsessed with the needs of every single team member, and I had to learn how raytracing worked, how fluids dynamics worked, how the mesh engine worked, how inverse kinematics worked, how character animation worked, etc. etc. etc. I had to learn how to do pretty much everyone's job on the team because I was balancing all of their specific needs in the design of these uber generalized libraries I left behind while walking a tightrope balancing act of design compromises from all the code reuse (trying to make things better for Bob working on raytracing who is using one of the libraries but without hurting John too much who is working on physics who is also using it but without complicating the design of the library too much to make them both happy).
It got to a point where I was trying to parametrize bounding boxes with policy classes so that they could be stored either as center and half-size as one person wanted or min/max extents as someone else wanted, and the implementation was getting convoluted really fast trying to frantically keep up with everyone's needs.
Design By Committee
And because each layer was trying to serve such a wide range of needs (much wider than we actually needed), they found many reasons to require design changes, sometimes by committee-requested designs (which are usually kind of gross). And then those design changes would cascade upwards and affect all the higher-level code using it, and maintenance of such code started to become a real PITA.
I think you can potentially share more code in a like-minded team. Ours wasn't like-minded at all. These are not real names but I'd have Bill here who is a high-level GUI programmer and scripter who creates nice user-end designs but questionable code with lots of hacks, but it tends to be okay for that type of code. I got Bob here who is an old timer who has been programming since the punch card era who likes to write 10,000 line functions with gotos in them and still doesn't get the point of object-oriented programming. I got Joe here who is like a mathematical wizard but writes code no one else can understand and always make suggestions which are mathematically aligned but not necessarily so efficient from a computational standpoint. Then I got Mike here who is in outer space who wants us to port the software to iPhones and thinks we should all follow Apple's conventions and engineering standards.
Trying to satisfy everyone's needs here while coming up with a decent design was, probably in retrospect, impossible. And in everyone trying to share each other's code, I think we became counter-productive. Each person was competent in an area but trying to come up with designs and standards which everyone is happy with just lead to all kinds of instability and slowed everyone down.
Trade-Offs
So these days I've found the balance is to avoid code reuse for the lowest-level things. I use a top-down approach from the mid-level, perhaps (something not too far divorced from what you asked me to do), and build some independent library there which I can still do in a short amount of time, but the library doesn't intend to produce mini-libs that try to solve the world's hunger needs. Usually such libraries are a little more narrow in purpose than the lower-level ones (ex: a physics library as opposed to a generalized geometry-intersection library).
YMMV, but if there's anything I've learned over the years in the hardest ways possible, it's that there might be a balancing act and a point where we might want to deliberately avoid code reuse in a team setting at some granular level, abandoning some generality for the lowest-level code in favor of decoupling, having malleable code we can better shape to serve more specific rather than generalized needs, and so forth -- maybe even just letting everyone have a little more freedom to do things their own way. But of course all of this is with the aim of still producing a very reusable, generalized library, but the difference is that the library might not decompose into the teeniest generalized libraries, because I found that crossing a certain threshold and trying to make too many teeny, generalized libraries starts to actually become an extremely counter-productive endeavor in the long term -- not in the short term, but in the long run and broad scheme of things.
If you have a piece of code which could potentially be shared across a medium size organization how would you go about informing other members of the company that this lib/api/etc existed and could be of benefit?
I actually am more reluctant these days and find it more forgivable if colleagues do some redundant work because I would want to make sure that code does something fairly useful and non-trivial and is also really well-tested and designed before I try to share it with people and accumulate a bunch of dependencies to it. The design should have very, very few reasons to require any changes from that point onwards if I share it with the rest of the team.
Otherwise it could cause more grief than it actually saves.
I used to be so intolerant of redundancy (in code or efforts) because it appeared to translate to a product that was very buggy and explosive in memory use. But I zoomed in too much on redundancy as the key problem, when really the real problem was poor quality, hastily-written code, and a lack of solid testing. Well-tested, reliable, efficient code wouldn't suffer that problem to nearly as great of a degree even if some people duplicate, say, some math functions here and there.
One of the common sense things to look at and remember that I didn't at the time is how we don't mind some redundancy when we use a very solid third party library. Chances are that you guys use a third party library or two that has some redundant work with what your team is doing. But we don't mind in those cases because the third party library is great and well-tested. I recommend applying that same mindset to your own internal code. The goal should be to create something awesome and well-tested, not to fuss over a little bit of redundancy here and there as I mistakenly did long ago.
So these days I've shifted my intolerance towards a lack of testing instead. Instead of getting upset over redundant efforts, I find it much more productive to get upset over other people's lack of unit and integration testing! :-D
While I think code reuse is valuable, I can see where this sentiment is rooted. I've worked on a lot of projects where much extra care was taken to create re-usable code that was then never reused. Of course reuse is much preferable to duplicate code, but I have seen a lot of very extensive object models created with the goal of using the objects across the enterprise in multiple projects (kind of the way the same service in SOA can be used in different apps) but have never seen the objects actually used more than once. Maybe I just haven't been part of organizations taking good advantage of the principle of reuse.
The two software projects I've worked on have both been long term development. One is about 10 years old, the other has been around for over 30 years, rewritten in a couple versions of Fortran along the way. Both make extensive reuse of code, but both rely very little on external tools or code libraries. DRY is a big mantra on the newer project, which is in C++ and lends itself more easily to doing that in practice.
Maybe the better question is: when do we NOT reuse code these days? We are either building using someone else's observed "best practices" or prediscovered "design patterns", or just actually building on legacy code, libraries, or copying.
It seems the degree to which code A is reused to make code B is often based around how much the ideas in code A taken to code B are abstracted into design patterns/idioms/books/fleeting thoughts/actual code/libraries. The hard part is in applying all those good ideas to your actual code.
Non-technical types get overzealous about the reuse thing. They don't understand why everything can't be copy-pasted. They don't understand why the greemelfarm needs a special adapter to communicate to the new system the same information that it used to send to the old system, and that, unfortunately, we can't change either, due to a bazillion other reasons.
I think techies have been reusing from day 1, in the same way musicians have been reusing from day 1. It's an ongoing organic evolution and synthesis that will keep going.
Code reuse is an extremely important issue - where code is not reused, projects take longer and are harder for new team members to get into.
However, writing reusable code takes longer.
Personally, I try to write all my code in a reusable way; this takes longer, but it means that most of my code has become official infrastructure in my organization, and new projects based on these infrastructures take significantly less time.
The danger in reusing code is that if the reused code is not written as an infrastructure - in a general and encapsulated manner, with as few assumptions as possible and as much documentation and unit testing as possible - the code can end up doing unexpected things.
Also, if bugs are found and fixed, or features added, these changes are rarely returned to the source code, resulting in different versions of the reused code, that no one knows of or understands.
The solution is:
1. To design and write the code with not only one project in mind, but to think of future requirements and try to make the design flexible enough to cover them with minimal code change.
2. To enclose the code within libraries that are to be used as-is and not modified within using projects.
3. To allow users to view and modify the code of the library within its own solution (not within the using project's solution).
4. To design future projects to be based on the existing infrastructures, making changes to the infrastructures as necessary.
5. To charge maintaining the infrastructure to all projects, thus keeping the infrastructure funded.
Maven has solved code reuse. I'm completely serious.
