How safe is Safe Haskell?

I was thinking about Safe Haskell and I wonder how much I can trust it?
Some fictional scenarios:
I am a little hacker writing a programmable game (think Robocode) where I allow others to program their own entities to compete against each other. Most of the time users will run some untrusted programs on private machines. Untrusted code would probably be inspected before running it.
I am the programmer of an application that is used by several clients. I provide an API so they can extend the functionality and encourage my users to share their plugins. The user community is small and most of the time there is mutual trust, but occasionally someone is working on a top-secret client project and any data leaks would prove disastrous.
I am ... Google (or Facebook, Yahoo, etc.) and want to allow my clients to script their email accounts. Scripts are uploaded and run on my servers. Any access violations would be fatal.
Given these scenarios:
would Safe Haskell be appropriate to ensure sandboxing and access restriction?
Should someone in the given situations be able to trust the promises made?

As a rule of thumb, I'd say Safe Haskell tries to get roughly where the safe subset of C# is. For your scenarios:
You can use Safe Haskell, because you are inspecting the code anyway.
You cannot really use Safe Haskell alone to avoid disastrous data leaks.
I wouldn't recommend Google or Yahoo rely on Safe Haskell alone to run untrusted code. For one thing, it doesn't manage excessive resource consumption (CPU, memory, disk) or bottoms (undefined or while true). Use an OS sandbox for that.
A note on undefined: operationally, it stops the function from returning a value by throwing an exception, as does the error function. Denotationally, it's considered to be the 'bottom' value. Now, even if Safe Haskell disallowed undefined and error, a function could still fail to return a value, simply by looping endlessly; and an endless loop is bottom too. So Safe Haskell guarantees type and memory safety but doesn't try to guarantee that functions terminate. Safe Haskell is, of course, Turing complete, so it's not possible in general to prove termination. Furthermore, running out of memory throws an exception, so functions may terminate with that. Finally, pattern-match failures throw exceptions. So Safe Haskell cannot eliminate bottoms in general and may as well allow explicit undefined and error.
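To make that concrete, here is a minimal sketch (the module and function names are made up for illustration): all three definitions compile under the Safe pragma, and all three are bottoms of one kind or another.

{-# LANGUAGE Safe #-}
module Bottoms where

-- Explicit bottom: error (and undefined) are still available to Safe code.
boom :: Int
boom = error "still allowed under Safe"

-- Non-termination: type checks, never returns.
spin :: Int
spin = spin

-- Partial pattern match: throws a pattern-match exception on [].
unsafeHead :: [a] -> a
unsafeHead (x:_) = x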

To my knowledge, Safe Haskell is not airtight. Someone can use unsafePerformIO in a package and then manually override the "unsafe" classification by marking the module Trustworthy, a claim GHC does not verify. That escape hatch has to exist: if it didn't, every package with dependencies on C programs or system libraries could never be marked safe. (Think about libgmp.so, which almost everyone's Haskell base packages link against. For the base packages to be usable from safe code, those modules must be explicitly marked trustworthy even though they rely on unsafePerformIO.)
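For illustration, a hedged sketch of that escape hatch (module name and file path invented): with {-# LANGUAGE Safe #-} the import of System.IO.Unsafe is rejected at compile time, but the author can switch the pragma to Trustworthy and GHC accepts the module without checking the claim; it is then up to the client (and the -fpackage-trust machinery) to decide whether to believe it.

{-# LANGUAGE Trustworthy #-}   -- with Safe instead, the import below is a compile error
module ClaimsToBeSafe (secret) where

import System.IO.Unsafe (unsafePerformIO)

-- Looks like an ordinary pure String to any Safe client,
-- but performs file IO the first time it is forced.
secret :: String
secret = unsafePerformIO (readFile "/etc/hostname")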

Related

Can you disallow a Common Lisp script called from Common Lisp to call specific functions?

Common Lisp allows code to be executed and compiled at runtime. But I thought that for some (scripting-like) purposes it would be good if one could disallow a user script from calling certain functions (especially for application extensions). One could still ask the user whether to allow an extension to access files etc. I'm thinking of something like the Android permission system for Common Lisp. Is this possible without rewriting the evaluation code?
The problem I see is that in Common Lisp you would probably want a script to be able to use reader macros and normal macros and, for the latter, operators like intern, but those would allow you to get hold of arbitrary symbols (by string manipulation & interning), so simply scanning the code before evaluation won't suffice to ensure that specific functions aren't called.
So, is there something like a lock for functions? I thought of using fmakunbound / makunbound (and keeping the values in a local variable), but would that be possible in a multi-threaded environment?
Thanks in advance.
This is not part of the Common Lisp specification and there is no Common Lisp implementation that is extended to make this kind of restriction easy.
It seems to me like it would be easier to use operating system restrictions (e.g. rlimit, capabilities, etc) to enforce what you want on the Common Lisp process.
This is not an unusual desire, i.e. to run untrusted 3rd party code in a sandbox.
You can hand-craft a sandbox by creating a custom parser and interpreter for your scripting language. It is pedantic, but true, that any program with an API is providing such a service. API designers and implementors need to worry about the vile users.
You can still call eval or the compiler to run your sandbox scripts. It just means you need to assure that your reader, parser and language decline to provide access to any risky functionality.
You can use a lisp package to create a good sandbox. You can still use s-expressions for your scripting language's syntax, but you must cripple the standard reader so the user can't escape package-sandbox. You can still use the evaluator and the compiler, but you need to be sure the package you have boxed the user into contains no functionality that he can use to do inappropriate things.
Successful sandbox design and construction is easier when you start with an empty sandbox and slowly add functionality. Common Lisp is a big language, and that creates a huge surface for attackers to poke at. So if you create a sandbox out of a package, it's best to start with an empty package and add functions one at a time, thinking through what risks each one creates. The same approach is good when creating your crippled reader: don't start with the full reader and throw things away, start with a useless reader and add things. Sadly, taking that advice creates a pretty significant cost to getting started. But if you look around, I suspect you can find an existing safe reader.
Xach's suggestion is another way to go and in many cases more straightforward.

For reliable code, NModel, Spec Explorer, F# or other?

I've got a business app in C#, with unit tests. Can I increase the reliability and cut down on my testing time and expense by using NModel or Spec Explorer? Alternately, if I were to rewrite it in F# (or even Haskell), what kinds (if any) of reliability increase might I see?
Code Contracts? ASML?
I realize this is subjective, and possibly argumentative, so please back up your answers with data, if possible. :) Or maybe a worked example, such as Eric Evans' Cargo Shipping System?
If we consider
unit tests to be specific and strong theorems, checked quasi-statically on particular “interesting instances”,
types to be general but weak theorems (usually checked statically), and
contracts to be general and strong theorems, checked dynamically for the particular instances that occur during regular program operation
(from B. Pierce's Types Considered Harmful),
where do these other tools fit?
We could pose the analogous question for Java, using Java PathFinder, Scala, etc.
Reliability is a function of several variables, including the general architecture of the software, the capability of the programmers, the quality of the requirements and the maturity of your configuration management and general QA processes. All these will affect the reliability of a rewrite.
Having said that, language certainly has a significant impact. All other things being equal:
Defects are roughly proportional to SLOC count. Languages that are terser see fewer coding errors. Haskell seems to require about 10% of the SLOC required by C++, Erlang about 14%, Java around 50%. I guess C# probably fits alongside Java on this scale.
Type systems are not borne equal. Languages with type inference (e.g. Haskell and to a lesser extent O'Caml) will have fewer defects. Haskell in particular will allow you to encode invariants in the type system so that a program will only compile if they can be proven true. Doing so requires extra work, so consider the trade-off on a case-by-case basis.
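As a small illustration of encoding an invariant in the types (a hand-rolled version of the non-empty list idea; base ships a similar Data.List.NonEmpty), a head function over such a type is total, so the "list must not be empty" check moves from runtime to compile time.

data NonEmpty a = a :| [a]

-- Total: there is no empty case to worry about,
-- because an empty NonEmpty value cannot be constructed.
safeHead :: NonEmpty a -> a
safeHead (x :| _) = x

example :: Int
example = safeHead (1 :| [2, 3])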
Managing state is a source of many defects. Functional languages, and especially pure functional languages, avoid this problem.
QuickCheck and its relatives allow you to write unit and system tests that verify general properties rather than individual test cases. This can greatly reduce the work required to test the code, especially if you are aiming for high test coverage metrics. A set of QuickCheck properties resembles a formal specification, and this concept fits nicely with Test Driven Development (write your tests first, and when the code passes them you are done).
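A small sketch of what such properties look like (assuming the QuickCheck package; the property names are illustrative): each property is a general statement, and the library generates the particular test cases.

import Data.List (sort)
import Test.QuickCheck (quickCheck)

-- Reversing twice gives back the original list.
prop_reverseTwice :: [Int] -> Bool
prop_reverseTwice xs = reverse (reverse xs) == xs

-- Sorting is idempotent.
prop_sortIdempotent :: [Int] -> Bool
prop_sortIdempotent xs = sort (sort xs) == sort xs

main :: IO ()
main = do
  quickCheck prop_reverseTwice    -- checks 100 random lists by default
  quickCheck prop_sortIdempotent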
Put all of these things together and you should have a powerful toolkit for driving quality through the development lifecycle. Unfortunately I'm not aware of any robust studies that actually prove this. All the factors I listed at the start would confound any real study, and you would need a lot of data before an unambiguous pattern showed itself.
Some comments on the quote, in the context of C#, which is my "first" language:
Unit tests to be specific and strong theorems,
Yes, but they might not give you first-order logic checks, like "for all x there exists a y where f(y)"; more like "there exists a y, here it is (!), f(y)", aka setup, act, assert. ;)*
checked quasi-statically on particular “interesting instances” and Types to be general but weak theorems (usually checked statically),
Types are not necessarily that weak**.
and contracts to be general and strong theorems, checked dynamically for particular instances that occur during regular program operation. (from B. Pierce's Types Considered Harmful)
Unit Testing
I think Pex + Moles is getting closer to the first-order-logic type of checking, as it generates the edge cases and uses the C9 solver to work on integer constraint solving. I would really like to see more Moles tutorials (Moles is for replacing implementations), specifically together with some sort of inversion-of-control container that can leverage what stub and real implementations of abstract classes and interfaces already exist.
Weak Types
In C# they are fairly weak, sure: generic typing/types allows you to add protocol semantics for one operation -- i.e. constraining type parameters to interfaces, which are in some sense protocols that implementing classes agree to. However, the static typing of the protocol covers just that one operation.
Example: Reactive Extensions API
Let's take Reactive Extensions as a discussion topic.
The contract through which the observable pushes values to its consumer:
interface IObserver<in T> : IDisposable {
    void OnNext(T value);
    void OnCompleted();
    void OnError(System.Exception error);
}
There is more to the protocol than this interface shows: methods called on an IObserver<in T> instance must follow this protocol:
Ordering:
OnNext{0,n} (OnCompleted | OnError){0, 1}
Furthermore, on another axis, the time dimension:
Time:
for all t : (method -> time). t(OnNext) < t(OnCompleted)
for all t : (method -> time). t(OnNext) < t(OnError)
i.e. no invocation of OnNext may happen after one of OnCompleted or OnError.
Furthermore, the axis of parallelism:
Parallelism:
no invocation to OnNext may be done in parallel
i.e. there's a scheduling constraint that needs to be followed from implementers of IObservable. No IObservable may push from multiple threads at the same time, without first synchronizing the invocation around a context.
How do you test that this contract holds, in an easy way? With C#, I don't know.
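In a language with algebraic data types you can make the ordering part of the protocol hold by construction rather than by testing. A hedged Haskell sketch (not the Rx API, just an analogy): a stream is zero or more OnNext values followed by exactly one terminal event, so "OnNext after OnCompleted" is simply not expressible.

data Terminal e = Completed | Error e

data Stream e a = OnNext a (Stream e a) | Done (Terminal e)

-- A consumer folds over the stream; the type guarantees it sees
-- events in a legal order and exactly one terminal event.
foldStream :: (a -> b -> b) -> (Terminal e -> b) -> Stream e a -> b
foldStream next done (OnNext x rest) = next x (foldStream next done rest)
foldStream _    done (Done t)        = done t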
Consumer of API
From the consuming side of the application, there might be interactions between different contexts, such as Dispatcher, Background/other threads, and preferably we'd like to give guarantees that we don't end up in a deadlock.
Further, there is the requirement to handle deterministic disposal of the observables. It might not always be clear when an extension method's returned IObservable instance takes care of the method's argument IObservable instances and disposes of those, so there's a requirement to know about the inner workings of the black box (alternatively you can let the references go in a "reasonable way" and the GC will take them at some point).
<<< Without Reactive Extensions, it's not necessarily easier:
There is the task pool, on top of which the TPL is implemented. In the task pool we have a work-stealing queue of delegates to invoke on the worker threads.
Using the APM/begin/end or the async pattern (which queues to the task pool) could leave us open to callback-ordering bugs if we are mutating state. Also, the protocol of begin-invocations and their callbacks might be too convoluted and hence impossible to follow. I read a post-mortem the other day about a Silverlight project having problems seeing the business-logic forest for all the callback trees. Then there's the possibility of implementing the poor man's async monad: an IEnumerable with an async 'manager' iterating through it and calling MoveNext() every time a yielded IAsyncResult completes.
...and don't get me started on the nuuuumerous hidden protocols in IAsyncResult.
Another problem, without using Reactive extensions is the turtles problem - once you decide that you want an IO-blocking operation to be async, there need to be turtles all the way down to the p/invoke call that places the associated Win32-thread on an IO-completion port! If you have three layers and then some logic as well inside of your topmost layer, you need to make all three layers implement the APM pattern; and fulfil the numerous contract obligations of IAsyncResult (or leave it partially broken) -- and there's no default public AsyncResult implementation in the base class library.
>>>
Working with exceptions from the interface
Even with the above memory-management + parallelism + contract + protocol items covered, there are still exceptions to be handled (not just received and forgotten about) in a good, reliable application. I want to give an example.
Context
Let's say that we find ourselves catching an exception from the contract/interface (not necessarily from Reactive Extensions' IObservable implementations here, which have monadic exception handling rather than stack-frame-based handling).
Hopefully the programmer was diligent and documented the possible exceptions, but there might be exception possibilities all the way down. If everything is correctly defined with code contracts, at least we can be sure we are capable of catching a few of the exceptions, but many different causes may be lumped together inside one exception type, and once an exception is thrown, how do we ensure that only the smallest possible unit of work has to be rectified?
Aim
Say that we are pushing some data records from a message-bus consumer in our application, and receiving them on a background thread which decides what to do with them.
Example
A real-life example here could be Spotify, which I'm using every day.
My $100 router/access point throws in the towel at random times. I guess it has a cache-bug or some sort of stack overflow bug, as it happens every time I push more than 2 MB/s LAN/WAN data through it.
I have two NICs up: the WiFi and the Ethernet card. The Ethernet connection goes down. The sockets of Spotify's event-handler loop return an invalid code (I think it's C or C++) or throw exceptions. Spotify has to handle it, but it doesn't know what my network topology looks like (and there is no code to try all routes/update the routing table and hence the interface to be used); I still have a route to the internet, just not on the same interface. Spotify crashes.
A thesis
Exceptions are simply not semantic enough. I believe one can look at exceptions from the perspective of the Error monad in Haskell. We either continue or break: unwinding the stack, executing the catches, executing the finallys, and praying we don't end up with race conditions against either other exception handlers or the GC, or with async exceptions for outstanding IO-completion ports.
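To make the "continue or break" idea concrete, a hedged sketch in Haskell using Either as a poor man's Error monad (NetError and the interface names are invented for illustration): the failure mode is an ordinary value, so the caller is forced by the type to decide what compensation to run.

data NetError = RouteDown String | Timeout Int

openSocket :: String -> Either NetError String
openSocket "eth0" = Left (RouteDown "eth0")   -- pretend this route is dead
openSocket iface  = Right ("socket on " ++ iface)

-- The caller must handle the Left case; here it falls back to another
-- interface instead of unwinding the stack and crashing.
connect :: Either NetError String
connect = case openSocket "eth0" of
  Left (RouteDown _) -> openSocket "wlan0"
  other              -> other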
But when one of my interfaces' connections/routes goes down, Spotify crashes, or rather freezes.
Now we have SEH (Structured Exception Handling), but I think we will have SEH2 in the future, where each source of exception gives, along with the actual exception, a discriminated union (i.e. it should be statically typed to the linked library/assembly) of possible compensating actions -- in this example, I could imagine Windows' network API telling the application to execute a compensating action to open the same socket on another interface, or to handle it on its own (like now), or to retry the socket with some kernel-managed retry policy. Each of these options is part of a discriminated union type, so the implementer must use one of them.
I think that, when we have SEH2, it won't be called exceptions anymore.
^^
Anyway, I have digressed too much already.
Instead of reading my thoughts, listen to some of Erik Meijer's -- this is a very good round-table discussion between him and Joe Duffy. They discuss handling side-effects of calls. Or have a look at this search listing.
I'm finding myself in a position, today, as a consultant, of maintaining a system where stronger static semantics could be good, and I'm looking at tools which can give me the speed of programming + the correctness verification on a level which is accurate and precise. I haven't found it yet.
I simply think we are another 20 years, if not more, away from developer-oriented reliable computing. There are just too many languages, frameworks, marketing BS and concepts in the air right now for the ordinary developer to stay on top of things.
Why is this under the heading of "weak types"?
Because I find that the type system will be part of the solution; types need not be weak! Terse code and strong type systems (think Haskell) help programmers build reliable software.

Future Protections in Managed Languages and Runtimes

In the future, will managed runtimes provide additional protections against subtle data corruption issues?
Managed runtimes such as Java and the .NET CLR reduce or eliminate the possibility of many memory corruption bugs common in native languages like C and C++. Nonetheless, they are surprisingly not immune from all memory corruption problems. One intuitively expects that a method that validates its input, has no bugs, and robustly handles exceptions will always transform its object from one valid state to another, but this is not the case. (It is more accurate to say that it is not the case under prevailing programming conventions--object implementors need to go out of their way to avoid the problems I describe.)
Consider the following scenarios:
Threading. The caller might share the object with other threads and make concurrent calls on it. If the object does not implement locking, the fields might be corrupted. (Perhaps--unless notified that the object is thread-safe--runtimes should use an interlock on every method call to throw an exception if any method on the same object is executing concurrently on another thread. This would be a protection feature and, just like other well-accepted safety features of managed runtimes, it has some cost.)
Re-entrancy. The method makes a callout to an arbitrary function (such as an event handler) that ultimately calls methods on the object that are not designed to be called at that point. This is even trickier than thread safety and many class libraries do not get this right. (Worse yet, class libraries are known to poorly document what re-entrancy is allowed.)
For all of these cases, it can be argued that thorough documentation is a solution. However, documentation also can prescribe how to allocate and deallocate memory in unmanaged languages. We know from experience (e.g., with memory allocation) that the difference between documentation and language/runtime enforcement is night and day.
What can we expect from languages and runtimes in the future to protect us from these problems and other subtle problems like them?
I think languages and runtimes will keep moving forward, keep abstracting away issues from the developer, and keep making our lives easier and more productive.
Take your example - threading. There are some great new features on the horizon in the .NET world to simplify the threading model we use daily. STM.NET may eventually make shared state much, much safer to handle, for example. The parallel extensions in .NET 4 make life very easy for threading compared to current technologies.
I think that transactional memory is promising for addressing some of these issues. I'm not sure if this answers your question in some way but this is an interesting topic in any event:
http://en.wikipedia.org/wiki/Software_transactional_memory
There was an episode of Software Engineering Radio on the topic a year or so ago maybe.
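To give a feel for the programming model, here is a minimal sketch using Haskell's stm package (a widely available relative of the STM.NET idea): the transfer either commits both updates atomically or retries, with no explicit locks.

import Control.Concurrent.STM

transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  check (balance >= amount)        -- block/retry until funds are available
  writeTVar from (balance - amount)
  modifyTVar' to (+ amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 40)
  readTVarIO a >>= print           -- 60
  readTVarIO b >>= print           -- 40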
First of all, "managed" is a bit of a misnomer: languages like OCaml, Haskell, and SML achieve such protections and safety while being fully compiled. All relevant "management" occurs at compile time through static analysis, which aids optimization and speed.
Anyway, to answer your question: if you look at languages like Erlang and Haskell, state is isolated and immutable by default. With this kind of system, threading and re-entrancy are safe by default, and because you have to go out of your way to break these rules, it is obvious where unsafe code can arise.
By starting with safe defaults but leaving room for advanced unsafe usage, you get the best of both worlds. It seems reasonable that future systems that are safe by your definition may follow some of these practices as well.
What can we expect in the future?
Nothing. Thread-state and re-entrancy are not problems I see tools/runtimes solving. Instead I think in the future people will move to styles that avoid programming with mutable state to bypass these issues. Languages and libraries can help make these styles of programming more attractive, but the tools are not the solution - changing the way we write code is the solution.

Achieving Thread-Safety

Question: How can I make sure my application is thread-safe? Are there any common practices, testing methods, things to avoid, things to look for?
Background: I'm currently developing a server application that performs a number of background tasks in different threads and communicates with clients using Indy (using another bunch of automatically generated threads for the communication). Since the application should be highly available, a program crash is a very bad thing and I want to make sure that the application is thread-safe. No matter what, from time to time I discover a piece of code that throws an exception that never occurred before, and in most cases I realize that it is some kind of synchronization bug, where I forgot to synchronize my objects properly. Hence my question concerning best practices, testing of thread-safety and things like that.
mghie: Thanks for the answer! I should perhaps be a little bit more precise. Just to be clear, I know about the principles of multithreading, I use synchronization (monitors) throughout my program and I know how to differentiate threading problems from other implementation problems. But nevertheless, I keep forgetting to add proper synchronization from time to time. Just to give an example, I used the RTL sort function in my code. Looked something like
FKeyList.Sort (CompareKeysFunc);
Turns out that I had to synchronize FKeyList while sorting. It just didn't come to my mind when initially writing that simple line of code. It's these things I want to talk about. What are the places where one easily forgets to add synchronization code? How do YOU make sure that you have added sync code in all important places?
You can't really test for thread-safeness. All you can do is show that your code isn't thread-safe, but if you know how to do that you already know what to do in your program to fix that particular bug. It's the bugs you don't know that are the problem, and how would you write tests for those? Apart from that threading problems are much harder to find than other problems, as the act of debugging can already alter the behaviour of the program. Things will differ from one program run to the next, from one machine to the other. Number of CPUs and CPU cores, number and kind of programs running in parallel, exact order and timing of stuff happening in the program - all of this and much more will have influence on the program behaviour. [I actually wanted to add the phase of the moon and stuff like that to this list, but you get my meaning.]
My advice is to stop seeing this as an implementation problem, and start to look at this as a program design problem. You need to learn and read all that you can find about multi-threading, whether it is written for Delphi or not. In the end you need to understand the underlying principles and apply them properly in your programming. Primitives like critical sections, mutexes, conditions and threads are something the OS provides, and most languages only wrap them in their libraries (this ignores things like green threads as provided by for example Erlang, but it's a good point of view to start out from).
I'd say start with the Wikipedia article on threads and work your way through the linked articles. I have started with the book "Win32 Multithreaded Programming" by Aaron Cohen and Mike Woodring - it is out of print, but maybe you can find something similar.
Edit: Let me briefly follow up on your edited question. All access to data that is not read-only needs to be properly synchronized to be thread-safe, and sorting a list is not a read-only operation. So obviously one would need to add synchronization around all accesses to the list.
But with more and more cores in a system constant locking will limit the amount of work that can be done, so it is a good idea to look for a different way to design your program. One idea is to introduce as much read-only data as possible into your program - locking is no longer necessary, as all access is read-only.
I have found interfaces to be a very valuable aid in designing multi-threaded programs. Interfaces can be implemented to have only methods for read-only access to the internal data, and if you stick to them you can be quite sure that a lot of the potential programming errors do not occur. You can freely share them between threads, and the thread-safe reference counting will make sure that the implementing objects are properly freed when the last reference to them goes out of scope or is assigned another value.
What you do is create objects that descend from TInterfacedObject. They implement one or more interfaces which all provide only read-only access to the internals of the object, but they can also provide public methods that mutate the object state. When you create the object you keep both a variable of the object type and an interface pointer variable. That way lifetime management is easy, because the object will be deleted automatically when an exception occurs. You use the variable pointing to the object to call all methods necessary to properly set up the object. This mutates the internal state, but since this happens only in the active thread there is no potential for conflict. Once the object is properly set up you return the interface pointer to the calling code, and since there is no way to access the object afterwards except by going through the interface pointer you can be sure that only read-only access can be performed. By using this technique you can completely remove the locking inside of the object.
What if you need to change the state of the object? You don't; you create a new one by copying the data from the interface, and mutate the internal state of the new object afterwards. Finally you return the interface pointer to the new object.
By using this you will only need locking where you get or set such interfaces. It can even be done without locking, by using the atomic interchange functions. See this blog post by Primoz Gabrijelcic for a similar use case where an interface pointer is set.
Simple: don't use shared data. Every time you access shared data you risk running into a problem (if you forget to synchronize access). Even worse, each time you access shared data you risk blocking other threads, which will hurt your parallelization.
I know this advice is not always applicable. Still, it doesn't hurt if you try to follow it as much as possible.
EDIT: Longer response to Smasher's comment. Would not fit in a comment :(
You are totally correct. That's why I like to keep a shadow copy of the main data in a read-only thread. I add a version to the structure (one 4-aligned DWORD) and increment this version in the (lock-protected) data writer. The data reader compares the global and private versions (which can be done without locking) and only if they differ does it lock the structure, duplicate it into local storage, update the local version and unlock. Then it accesses the local copy of the structure. Works great if reading is the primary way to access the structure.
I'll second mghie's advice: thread safety is designed in. Read about it anywhere you can.
For a really low level look at how it is implemented, look for a book on the internals of a real time operating system kernel. A good example is MicroC/OS-II: The Real Time Kernel by Jean J. Labrosse, which contains the complete annotated source code to a working kernel along with discussions of why things are done the way they are.
Edit: In light of the improved question focusing on using a RTL function...
Any object that can be seen by more than one thread is a potential synchronization issue. A thread-safe object would follow a consistent pattern in every method's implementation of locking "enough" of the object's state for the duration of the method, or perhaps, narrowed to just "long enough". It is certainly the case that any read-modify-write sequence to any part of an object's state must be done atomically with respect to other threads.
The art lies in figuring out how to get useful work done without either deadlocking or creating an execution bottleneck.
As for finding such problems, testing won't be any guarantee. A problem that shows up in testing can be fixed. But it is extremely difficult to write either unit tests or regression tests for thread safety... so faced with a body of existing code your likely recourse is constant code review until the practice of thread safety becomes second nature.
As folks have mentioned and I think you know, being certain, in general, that your code is thread safe is impossible (I believe provably impossible but I would have to track down the theorem). Naturally, you want to make things easier than that.
What I try to do is:
Use a known pattern of multithreaded design: a thread pool, the actor-model paradigm, the command pattern or some such approach (see the sketch after this list). This way, the synchronization happens in the same way, in a uniform way, throughout the application.
Limit and concentrate the points of synchronization. Write your code so you need synchronization in as few places as possible, and keep the synchronization code in one or a few places in the code.
Write the synchronization code so that the logical relation between the values is clear both on entering and on exiting the guard. I use lots of asserts for this (your environment may limit this).
Don't ever access shared variables without guards/synchronization. Be very clear about what your shared data is. (I've heard there are paradigms for guardless multithreaded programming, but that would require even more research.)
Write your code as cleanly, clearly and DRY-ly as possible.
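As a sketch of the first point (a uniform, known pattern with one synchronization point), here is a tiny Haskell worker pool where the only shared thing is a channel; the names and sizes are arbitrary and purely illustrative.

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forever, replicateM_)

main :: IO ()
main = do
  jobs <- newChan :: IO (Chan (IO ()))
  -- four workers; the channel is the single point of synchronization
  replicateM_ 4 $ forkIO $ forever $ do
    job <- readChan jobs
    job
  mapM_ (writeChan jobs . print) [1 .. 10 :: Int]
  threadDelay 100000   -- crude wait so the demo workers can drain the queue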
My simple answer, combined with those answers, is:
Create your application/program in a thread-safe manner
Avoid using public static variables in all places
These usually become habits/practices easily, but it takes some time to get used to them:
Program your logic (not the UI) in a functional programming language such as F#, or even Scheme or Haskell. Functional programming promotes thread-safe practice, and it also pushes us to always code towards purity.
If you use F#, there's also a clear distinction between using mutable and immutable objects and variables.
Since methods (or simply functions) are first-class citizens in F# and Haskell, the code you write will also be more disciplined, with less mutable state.
Also, using the lazy evaluation style usually found in these functional languages, you can be sure that your program is safe from side effects, and you'll also realize that if your code needs effects, you have to clearly define them. If side effects are taken into consideration, then your code will be ready to take advantage of composability between the components of your code and of multicore programming.
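A tiny Haskell sketch of what "clearly defining the effects" means in practice: the pure function's type rules out side effects, while the IO in the second type advertises them to every caller.

-- pure: the type guarantees no side effects can happen here
total :: [Int] -> Int
total = sum

-- effectful: the IO in the type makes the side effect explicit
report :: [Int] -> IO ()
report xs = putStrLn ("total = " ++ show (total xs))

main :: IO ()
main = report [1, 2, 3]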

Defensive programming [closed]

When writing code, do you consciously program defensively to ensure high program quality and to avoid the possibility of your code being exploited maliciously, e.g. through buffer overflow exploits or code injection?
What's the "minimum" level of quality you'll always apply to your code ?
In my line of work, our code has to be top quality.
So, we focus on two main things:
Testing
Code reviews
Those bring home the money.
Similar to abyx, in the team I am on developers always use unit testing and code reviews. In addition to that, I also aim to make sure that I don't incorporate code that people may use - I tend to write code only for the basic set of methods required for the object at hand to function as has been spec'd out. I've found that incorporating methods that may never be used, but provide functionality can unintentionally introduce a "backdoor" or unintended/unanticipated use into the system.
It's much easier to go back later and introduce methods, attributes, and properties that are asked for, versus anticipating something that may never come.
I'd recommend being defensive for data that enters a "component" or framework. Within a "component" or framework one should be able to assume that the data is "correct".
Think of it like this: it is up to the caller to supply correct parameters; otherwise ALL functions and methods would have to check every incoming parameter. But if the check is only done at the caller, the check is only needed once. So a parameter should be "correct" and can thus be passed through to lower levels.
Always check data from external sources, users, etc.
A "component" or framework should always check incoming calls.
If there is a bug and a wrong value is used in a call, what is really the right thing to do? One only has an indication that the "data" the program is working on is wrong. Some like ASSERTs, but others want to use advanced error reporting and possibly error recovery. In any case the data has been found to be faulty, and only in few cases is it good to continue working on it. (Note: it's good if servers don't die, at least.)
An image sent from a satellite might be a case to try advanced error recovery on... an image downloaded from the internet might be a case to just put up an error icon for...
I recommend people write code that is fascist in the development environment and benevolent in production.
During development you want to catch bad data/logic/code as early as possible to prevent problems either going unnoticed or resulting in later problems where the root cause is hard to track.
In production handle problems as gracefully as possible. If something really is a non-recoverable error then handle it and present that information to the user.
As an example here's our code to Normalize a vector. If you feed it bad data in development it will scream, in production it returns a safety value.
inline const Vector3 Normalize( Vector3arg vec )
{
    const float len = Length(vec);
    ASSERTMSG(len > 0.0f, "Invalid Normalization");
    return len == 0.0f ? vec : vec / len;
}
I always work to prevent things like injection attacks. However, when you work on an internal intranet site, most of the security features feel like wasted effort. I still do them, maybe just not as well.
Well, there is a certain set of best practices for security. At a minimum, for database applications, you need to watch out for SQL Injection.
Other stuff like hashing passwords, encrypting connection strings, etc. are also a standard.
From here on, it depends on the actual application.
Luckily, if you are working with frameworks such as .Net, a lot of security protection comes built-in.
You always have to program defensively, I would say, even for internal apps, simply because users could just through sheer luck write something that breaks your app. Granted, you probably don't have to worry about people trying to cheat you out of money, but still. Always program defensively and assume the app will fail.
Using Test Driven Development certainly helps. You write a single component at a time and then enumerate all of the potential cases for inputs (via tests) before writing the code. This ensures that you've covered all bases and haven't written any cool code that no-one will use but might break.
Although I don't do anything formal I generally spend some time looking at each class and ensuring that:
if they are in a valid state, they stay in a valid state
there is no way to construct them in an invalid state
under exceptional circumstances they will fail as gracefully as possible (frequently this is a cleanup and throw)
It depends.
If I am genuinely hacking something up for my own use then I will write the best code that I don't have to think about. Let the compiler be my friend for warnings etc. but I won't automatically create types for the hell of it.
The more likely the code is to be used, even occasionally, the more I ramp up the level of checks.
minimal magic numbers
better variable names
fully checked & defined array/string lengths
programming by contract assertions
null value checks
exceptions (depending upon context of the code)
basic explanatory comments
accessible usage documentation (if perl etc.)
I'll take a different definition of defensive programming, as the one that's advocated by Effective Java by Josh Bloch. In the book, he talks about how to handle mutable objects that callers pass to your code (e.g., in setters), and mutable objects that you pass to callers (e.g., in getters).
For setters, make sure to clone any mutable objects, and store the clone. This way, callers cannot change the passed-in object after the fact to break your program's invariants.
For getters, either return an immutable view of your internal data, if the interface allows it; or else return a clone of the internal data.
When calling user-supplied callbacks with internal data, send in an immutable view or clone, as appropriate, unless you intend the callback to alter the data, in which case you have to validate it after the fact.
The take-home message is to make sure no outside code can hold an alias to any mutable objects that you use internally, so that you can maintain your invariants.
I am very much of the opinion that correct programming will protect against these risks. Things like avoiding deprecated functions, which (in the Microsoft C++ libraries at least) are commonly deprecated because of security vulnerabilities, and validating everything that crosses an external boundary.
Functions that are only called from your code should not require excessive parameter validation because you control the caller, that is, no external boundary is crossed. Functions called by other people's code should assume that the incoming parameters will be invalid and/or malicious at some point.
My approach to dealing with exposed functions is to simply crash out, with a helpful message if possible. If the caller can't get the parameters right then the problem is in their code and they should fix it, not you. (Obviously you have provided documentation for your function, since it is exposed.)
Code injection is only an issue if your application is able to elevate the current user. If a process can inject code into your application then it could easily write the code to memory and execute it anyway. Without being able to gain full access to the system code injection attacks are pointless. (This is why applications used by administrators should not be writeable by lesser users.)
In my experience, positively employing defensive programming does not necessarily mean that you end up improving the quality of your code. Don't get me wrong, you need to defensively program to catch the kinds of problems that users will come across - users don't like it when your program crashes on them - but this is unlikely to make the code any easier to maintain, test, etc.
Several years ago, we made it policy to use assertions at all levels of our software and this - along with unit testing, code reviews, etc. plus our existing application test suites - had a significant, positive effect on the quality of our code.
Java, Signed JARs and JAAS.
Java to prevent buffer overflow and pointer/stack whacking exploits.
Don't use JNI (Java Native Interface); it exposes you to DLLs/shared libraries.
Signed JARs to stop class loading from being a security problem.
JAAS can let your application not trust anyone, even itself.
J2EE has (admittedly limited) built-in support for Role based security.
There is some overhead for some of this but the security holes go away.
Simple answer: It depends.
Too much defensive coding can cause major performance issues.
