Handling invalid states in Haskell

I'm trying to get a better feel for how to handle error states in Haskell, since there seem to be a lot of ways to do it. Ideally, my data structures would make any invalid inputs unrepresentable, but despite considerable effort to the contrary, I still occasionally end up working with data where the type system can allow invalid states. As an example, let's consider that my program input is the training results for a neural network. In order for math to work, each matrix needs to have the correct bounds, and that's not (really) representable by the type system. If data is invalid, there's really nothing the application can do but halt any further processing and notify someone of the problem (so it's not recoverable). What's the best way to handle this in Haskell? It seems like I could:
1) Use error or other partial functions when processing my data. My understanding is this should only be used to represent a bug in the code. So it would have to be coupled with some sort of validation at the point that I load the data, and any point "after" that check I just assume that the data is in a valid format. This feels imperative to me, and doesn't seem to fit very well with lazy, declarative code.
2) Throw an exception when processing the data using Control.Exception.throw, and then catch it at the top level where I can alert someone. Contrary to error, I believe this doesn't indicate a bug in the program, so perhaps there wouldn't be verification when I load the data beyond what can be represented through the type system? The presence or absence of an exception when processing the data would define the verification.
3) Lift any data processing that could fail into the IO monad and use Control.Exception.throwIO.
4) Lift any data processing that could fail into the IO monad and use fail (I've read that using fail is frowned on by the community?)
5) Return an Either or something similar, and let that bubble up through all your logic. I've definitely had some cases where composing Eithers becomes (to me) exceedingly impractical.
6) Use Control.Monad.Exception, which I only marginally understand, but which seems to involve lifting any data processing that could fail into some exceptional monad that I think is supposed to be more easily composable than Either?
and I'm not even sure that's all the options. Is there an approach to this problem that's generally accepted by the community, or is this really an opinionated topic?
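To make options 3 and 5 concrete, here is a minimal sketch that validates once, purely, with Either, and converts a failure into an exception with throwIO at the IO boundary. The Matrix and ShapeError types and the function names are hypothetical, invented for illustration; the sketch only checks that adjacent matrices can be multiplied and assumes each matrix is rectangular.

```haskell
import Control.Exception (Exception, throwIO)

-- Hypothetical types for illustration; not taken from any library.
newtype Matrix = Matrix [[Double]] deriving Show

data ShapeError = ShapeError
  { layerIndex   :: Int  -- position of the first matrix of the bad pair
  , expectedRows :: Int
  , actualRows   :: Int
  } deriving (Show, Eq)

instance Exception ShapeError

rows, cols :: Matrix -> Int
rows (Matrix xs)      = length xs
cols (Matrix [])      = 0
cols (Matrix (r : _)) = length r

-- Pure validation: each matrix must have as many rows as its
-- predecessor has columns, so that the products are defined.
validateLayers :: [Matrix] -> Either ShapeError [Matrix]
validateLayers ms = ms <$ mapM_ check (zip3 [1 ..] ms (drop 1 ms))
  where
    check (i, prev, next)
      | cols prev == rows next = Right ()
      | otherwise              = Left (ShapeError i (cols prev) (rows next))

-- At the IO boundary, turn the pure failure into an exception once;
-- a top-level handler can then report it and halt.
loadOrThrow :: [Matrix] -> IO [Matrix]
loadOrThrow = either throwIO pure . validateLayers
```

With this split, the processing code after loadOrThrow can stay pure, and only the loader needs to know how failures are reported.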


How does lazy-evaluation allow for greater modularization?

In his article "Why Functional Programming Matters," John Hughes argues that "Lazy evaluation is perhaps the most powerful tool for modularization in the functional programmer's repertoire." To do so, he provides an example like this:
Suppose you have two functions, "infiniteLoop" and "terminationCondition." You can do the following:
terminationCondition(infiniteLoop input)
Lazy evaluation, in Hughes' words "allows termination conditions to be separated from loop bodies." This is definitely true, since "terminationCondition" using lazy evaluation here means this condition can be defined outside the loop -- infiniteLoop will stop executing when terminationCondition stops asking for data.
But couldn't higher-order functions achieve the same thing as follows?
infiniteLoop(input, terminationCondition)
How does lazy evaluation provide modularization here that's not provided by higher-order functions?
Yes, you could use a passed-in termination check, but for that to work the author of infiniteLoop would have had to foresee the possibility of wanting to terminate the loop with that sort of condition, and hardwire a call to the termination condition into their function.
And even if the specific condition can be passed in as a function, the "shape" of it is predetermined by the author of infiniteLoop. What if they give me a termination condition "slot" that is called on each element, but I need access to the last several elements to check some sort of convergence condition? Maybe for a simple sequence generator you could come up with "the most general possible" termination condition type, but it's not obvious how to do so and remain efficient and easy to use. Do I repeatedly pass the entire sequence so far into the termination condition, in case that's what it's checking? Do I force my callers to wrap their simple termination conditions up in a more complicated package so they fit the most general condition type?
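A small sketch of the convergence case, in the spirit of the square-root example from Hughes' paper: the producer is an infinite list of approximations, and the consumer decides when to stop by comparing consecutive elements. The names approximations and within are chosen here for illustration, and the producer assumes a positive input.

```haskell
-- Producer: an infinite list of Newton-Raphson approximations to sqrt n.
-- It knows nothing about when or how the consumer will stop (assumes n > 0).
approximations :: Double -> [Double]
approximations n = iterate (\x -> (x + n / x) / 2) n

-- Consumer-side termination: stop as soon as two consecutive values
-- differ by at most eps. It needs to see two elements at a time, which a
-- per-element callback "slot" would not allow.
within :: Double -> [Double] -> Maybe Double
within eps (a : b : rest)
  | abs (a - b) <= eps = Just b
  | otherwise          = within eps (b : rest)
within _ [a] = Just a
within _ []  = Nothing
```

For example, within 1e-9 (approximations 2) consumes only as many approximations as it needs. A different stopping rule (relative error, a fixed number of steps via take) can be swapped in without touching approximations.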
The callers certainly have to know exactly how the termination condition is called in order to supply a correct condition. That could be quite a bit of dependence on this specific implementation. If they switch to a different implementation of infiniteLoop written by another third party, how likely is it that exactly the same design for the termination condition would be used? With a lazy infiniteLoop, I can drop in any implementation that is supposed to produce the same sequence.
And what if infiniteLoop isn't a simple sequence generator, but actually generates a more complex infinite data structure, like a tree? If all the branches of the tree are independently recursively generated (think of a move tree for a game like chess) it could make sense to cut different branches at different depths, based on all sorts of conditions on the information generated thus far.
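The tree case can be sketched the same way. Below, grow builds a conceptually infinite tree from a successor function, and two independent consumers cut it differently: one by depth, one by a predicate on the node. All names are illustrative assumptions, not from any library.

```haskell
data Tree a = Node a [Tree a]

-- Producer: unfold a (possibly infinite) tree from a successor function,
-- e.g. "all legal moves from this position".
grow :: (a -> [a]) -> a -> Tree a
grow next s = Node s (map (grow next) (next s))

-- Consumer 1: cut every branch at a fixed depth.
prune :: Int -> Tree a -> Tree a
prune d (Node a kids)
  | d <= 0    = Node a []
  | otherwise = Node a (map (prune (d - 1)) kids)

-- Consumer 2: cut each branch as soon as its node satisfies a predicate,
-- so different branches may end at different depths.
pruneWhen :: (a -> Bool) -> Tree a -> Tree a
pruneWhen stop (Node a kids)
  | stop a    = Node a []
  | otherwise = Node a (map (pruneWhen stop) kids)

-- Count the nodes of a finite tree.
size :: Tree a -> Int
size (Node _ kids) = 1 + sum (map size kids)
```

Neither pruning function was anticipated by grow, and only the part of the tree that survives pruning is ever constructed.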
If the original author didn't prepare (either specifically for my use case or for a sufficiently general class of use cases), I'm out of luck. The author of the lazy infiniteLoop can just write it the natural way, and let each individual caller lazily explore what they want; neither has to know much about the other at all.
Furthermore, what if the decision to stop lazily exploring the infinite output is actually interleaved with (and dependent on) the computation the caller is doing with that output? Think of the chess move tree again; how far I want to explore one branch of the tree could easily depend on my evaluation of the best option I've found in other branches of the tree. So either I do my traversal and calculation twice (once in the termination condition to return a flag telling infiniteLoop to stop, and then once again with the finite output so I can actually have my result), or the author of infiniteLoop had to prepare for not just a termination condition, but a complicated function that also gets to return output (so that I can push my entire computation inside the "termination condition").
Taken to extremes, I could explore the output and calculate some results, display them to a user and get input, and then continue exploring the data structure (without recalling infiniteLoop based on the user's input). The original author of the lazy infiniteLoop need have no idea that I would ever think of doing such a thing, and it will still work. If we've got purity enforced by the type system, then that would be impossible with the passed-in termination condition approach unless the whole infiniteLoop was allowed to have side effects if the termination condition needs to (say by giving the whole thing a monadic interface).
In short, getting the same flexibility you'd get with lazy evaluation from a strict infiniteLoop that takes higher-order functions to control it can require a large amount of extra complexity for both the author of infiniteLoop and its caller (unless a variety of simpler wrappers are exposed, and one of them matches the caller's use case). Lazy evaluation can allow producers and consumers to be almost completely decoupled, while still giving the consumer the ability to control how much output the producer generates. Everything you can do that way you can do with extra function arguments, as you say, but it requires the producer and consumer to essentially agree on a protocol for how the control functions work; and that protocol is almost always either specialised to the use case at hand (tying the consumer and producer together) or so complicated in order to be fully general that the producer and consumer end up tied to that protocol, which is unlikely to be recreated elsewhere, and so they're still tied together.

Do we care about the 'past' in FRP?

When toying around with implementing FRP, one thing I've found confusing is what to do with the past. Basically, my understanding was that I would be able to do this with a Behaviour at any point:
beh.at(x) // where time x < now
This seems like it could be problematic performance wise in a case such as this:
val beh = Stepper(0, event) // stepwise behaviour
Here we can see that to evaluate the Behaviour in the past we need to keep all the Events and we will end up performing (at worst) linear scans each time we sample.
Do we want this ability to be available or should Behaviours only be allowed to be evaluated at a time >= now? Do we even want to expose the at function to the programmer?
While a behaviour is considered to be a function of time, reliance on an arbitrary amount of past data in FRP is a Bad Thing, and is referred to as a time leak. That is, transformations on behaviours should generally be streaming/reactive in that they do not rely on more than a bounded amount of the past (and should accumulate this knowledge of the history explicitly).
So, no, at is not desirable in a real FRP system: it should not be possible to look at either the past or the future. (The latter is, of course, impossible, if the state of the future depends on anything external to the FRP system.)
Of course, this leads to the problem that only being able to look at the exact present severely restricts what you can do when writing a function to transform behaviours: Behaviour a -> Behaviour b becomes the same as a -> b, which makes many things we'd like to do impossible. But this is more an issue of finding a semantics, one of FRP's persistent problems, than anything else; as long as the primitive transformations on behaviours you provide are powerful enough without causing time leaks, everything should be fine. For more information on this, see Garbage collecting the semantics of FRP.
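One way to picture "primitive transformations that do not leak time" is a discrete-time sketch in which every transformation carries only the state it explicitly needs. The SF type and the stepper and runSF functions below are assumptions made for illustration (written in Haskell rather than the question's Scala); they are not the API of any FRP library, and whether a stepper shows the new value at the instant of the event is a semantic choice made arbitrarily here.

```haskell
import Data.Maybe (fromMaybe)

-- A discrete-time signal function: given the current input sample it
-- returns the current output and the function to use at the next tick.
-- Any memory of the past must live inside that returned continuation.
newtype SF a b = SF { step :: a -> (b, SF a b) }

-- A stepwise behaviour: remembers only the most recent event value,
-- never the list of all past events.
stepper :: b -> SF (Maybe b) b
stepper current = SF $ \occurrence ->
  let new = fromMaybe current occurrence
  in (new, stepper new)

-- Drive a signal function with a finite list of samples (for testing).
runSF :: SF a b -> [a] -> [b]
runSF _ []        = []
runSF sf (x : xs) = let (y, sf') = step sf x in y : runSF sf' xs
```

For instance, runSF (stepper 0) [Nothing, Just 3, Nothing] yields [0, 3, 3], and at no point is there anything like at(x) for an earlier x to call.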

Is there an object-identity-based, thread-safe memoization library somewhere?

I know that memoization seems to be a perennial topic here on the haskell tag on stack overflow, but I think this question has not been asked before.
I'm aware of several different 'off the shelf' memoization libraries for Haskell:
The memo-combinators and memotrie packages, which make use of a beautiful trick involving lazy infinite data structures to achieve memoization in a purely functional way. (As I understand it, the former is slightly more flexible, while the latter is easier to use in simple cases: see this SO answer for discussion.)
The uglymemo package, which uses unsafePerformIO internally but still presents a referentially transparent interface. The use of unsafePerformIO internally results in better performance than the previous two packages. (Off the shelf, its implementation uses comparison-based search data structures, rather than perhaps-slightly-more-efficient hash functions; but I think that if you find and replace Cmp for Hashable and Data.Map for Data.HashMap and add the appropriate imports, you get a hash-based version.)
However, I'm not aware of any library that looks answers up based on object identity rather than object value. This can be important, because sometimes the kinds of object which are being used as keys to your memo table (that is, as input to the function being memoized) can be large---so large that fully examining the object to determine whether you've seen it before is itself a slow operation. Slow, and also unnecessary, if you will be applying the memoized function again and again to an object which is stored at a given 'location in memory' 1. (This might happen, for example, if we're memoizing a function which is being called recursively over some large data structure with a lot of structural sharing.) If we've already computed our memoized function on that exact object before, we can already know the answer, even without looking at the object itself!
Implementing such a memoization library involves several subtle issues and doing it properly requires several special pieces of support from the language. Luckily, GHC provides all the special features that we need, and there is a paper by Peyton-Jones, Marlow and Elliott which basically worries about most of these issues for you, explaining how to build a solid implementation. They don't provide all details, but they get close.
The one detail which I can see which one probably ought to worry about, but which they don't worry about, is thread safety---their code is apparently not threadsafe at all.
My question is: does anyone know of a packaged library which does the kind of memoization discussed in the Peyton-Jones, Marlow and Elliott paper, filling in all the details (and preferably filling in proper thread-safety as well)?
Failing that, I guess I will have to code it up myself: does anyone have any ideas of other subtleties (beyond thread safety and the ones discussed in the paper) which the implementer of such a library would do well to bear in mind?
Following @luqui's suggestion below, here's a little more data on the exact problem I face. Let's suppose there's a type:
data Node = Node [Node] [Annotation]
This type can be used to represent a simple kind of rooted DAG in memory, where Nodes are DAG Nodes, the root is just a distinguished Node, and each node is annotated with some Annotations whose internal structure, I think, need not concern us (but if it matters, just ask and I'll be more specific.) If used in this way, note that there may well be significant structural sharing between Nodes in memory---there may be exponentially more paths which lead from the root to a node than there are nodes themselves. I am given a data structure of this form, from an external library with which I must interface; I cannot change the data type.
I have a function
myTransform :: Node -> Node
the details of which need not concern us (or at least I think so; but again I can be more specific if needed). It maps nodes to nodes, examining the annotations of the node it is given, and the annotations of its immediate children, to come up with a new Node with the same children but possibly different annotations. I wish to write a function
recursiveTransform :: Node -> Node
whose output 'looks the same' as the data structure you would get by doing:
recursiveTransform (Node originalChildren annotations) =
  myTransform (Node recursivelyTransformedChildren annotations)
  where
    recursivelyTransformedChildren = map recursiveTransform originalChildren
except that it uses structural sharing in the obvious way so that it doesn't return an exponential data structure, but rather one on the order of the same size as its input.
I appreciate that this would all be easier if say, the Nodes were numbered before I got them, or I could otherwise change the definition of a Node. I can't (easily) do either of these things.
I am also interested in the general question of the existence of a library implementing the functionality I mention quite independently of the particular concrete problem I face right now: I feel like I've had to work around this kind of issue on a few occasions, and it would be nice to slay the dragon once and for all. The fact that SPJ et al felt that it was worth adding not one but three features to GHC to support the existence of libraries of this form suggests that the feature is genuinely useful and can't be worked around in all cases. (BUT I'd still also be very interested in hearing about workarounds which will help in this particular case too: the long term problem is not as urgent as the problem I face right now :-) )
1 Technically, I don't quite mean location in memory, since the garbage collector sometimes moves objects around a bit---what I really mean is 'object identity'. But we can think of this as being roughly the same as our intuitive idea of location in memory.
If you only want to memoize based on object identity, and not equality, you can just use the existing laziness mechanisms built into the language.
For example, if you have a data structure like this
data Foo = Foo { ... }
expensive :: Foo -> Bar
then you can just add the value to be memoized as an extra field and let the laziness take care of the rest for you.
data Foo = Foo { ..., memo :: Bar }
To make it easier to use, add a smart constructor to tie the knot.
makeFoo ... = let foo = Foo { ..., memo = expensive foo } in foo
Though this is somewhat less elegant than using a library, and requires modification of the data type to really be useful, it's a very simple technique and all thread-safety issues are already taken care of for you.
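Filled in with concrete fields, the technique looks roughly like this. The payload field and the body of expensive are stand-ins invented for the example; the important constraint is that expensive must not inspect the memo field itself, or the knot would loop.

```haskell
data Foo = Foo
  { payload :: [Int]  -- stand-in for the real fields
  , memo    :: Int    -- lazily computed, cached result of 'expensive'
  }

-- Stand-in for a costly computation. It reads only 'payload'.
expensive :: Foo -> Int
expensive = sum . payload

-- Smart constructor that ties the knot: 'memo' is a thunk referring to
-- the very value being built, and is evaluated at most once per Foo.
makeFoo :: [Int] -> Foo
makeFoo xs = let foo = Foo { payload = xs, memo = expensive foo } in foo
```

Callers then use memo foo instead of expensive foo; every reference to the same Foo shares the one thunk.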
It seems that stable-memo would be just what you needed (although I'm not sure if it can handle multiple threads):
Whereas most memo combinators memoize based on equality, stable-memo does it based on whether the exact same argument has been passed to the function before (that is, is the same argument in memory).
stable-memo only evaluates keys to WHNF.
This can be more suitable for recursive functions over graphs with cycles.
stable-memo doesn't retain the keys it has seen so far, which allows them to be garbage collected if they will no longer be used. Finalizers are put in place to remove the corresponding entries from the memo table if this happens.
Data.StableMemo.Weak provides an alternative set of combinators that also avoid retaining the results of the function, only reusing results if they have not yet been garbage collected.
There is no type class constraint on the function's argument.
stable-memo will not work for arguments which happen to have the same value but are not the same heap object. This rules out many candidates for memoization, such as the most common example, the naive Fibonacci implementation whose domain is machine Ints; it can still be made to work for some domains, though, such as the lazy naturals.
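To show what "the same heap object" means in practice, here is a deliberately simplified identity-keyed memo table built on System.Mem.StableName from base and an MVar. This is a sketch under stated assumptions, not stable-memo's implementation: Table, newTable, and memoIO are hypothetical names, the table retains its results forever (no weak pointers or finalizers), and two threads racing on the same key may both run the function, although the table itself stays consistent.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar_, readMVar)
import Control.Exception (evaluate)
import qualified Data.IntMap.Strict as IM
import System.Mem.StableName (StableName, makeStableName, hashStableName)

-- Buckets keyed by the stable name's hash; collisions are resolved by
-- comparing the stable names themselves, never the keys' contents.
type Table k v = MVar (IM.IntMap [(StableName k, v)])

newTable :: IO (Table k v)
newTable = newMVar IM.empty

memoIO :: Table k v -> (k -> IO v) -> k -> IO v
memoIO table f key = do
  key' <- evaluate key            -- force to WHNF so the name is stable
  name <- makeStableName key'
  cache <- readMVar table
  case IM.lookup (hashStableName name) cache >>= lookup name of
    Just v  -> pure v             -- this exact object was seen before
    Nothing -> do
      v <- f key'                 -- computed outside the lock
      modifyMVar_ table
        (pure . IM.insertWith (++) (hashStableName name) [(name, v)])
      pure v
```

Two structurally equal keys that are distinct heap objects may each trigger a computation, which is exactly the trade-off described above.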
Ekmett just uploaded a library that handles this and more (produced at HacPhi): http://hackage.haskell.org/package/intern. He assures me that it is thread safe.
Edit: Actually, strictly speaking I realize this does something rather different. But I think you can use it for your purposes. It's really more of a stringtable-atom type interning library that works over arbitrary data structures (including recursive ones). It uses WeakPtrs internally to maintain the table. However, it uses Ints to index the values to avoid structural equality checks, which means packing them into the data type, when what you want are apparently actually StableNames. So I realize this answers a related question, but requires modifying your data type, which you want to avoid...

What does it mean for something to "compose well"?

Many times, I've come across statements of the form
X does/doesn't compose well.
I can remember a few instances that I've read recently:
Macros don't compose well (context: clojure)
Locks don't compose well (context: clojure)
Imperative programming doesn't compose well... etc.
I want to understand the implications of composability in terms of designing/reading/writing code. Examples would be nice.
"Composing" functions basically just means sticking two or more functions together to make a big function that combines their functionality in a useful way. Essentially, you define a sequence of functions and pipe the results of each one into the next, finally giving the result of the whole process. Clojure provides the comp function to do this for you; you could do it by hand too.
Functions that you can chain with other functions in creative ways are more useful in general than functions that you can only call in certain conditions. For example, if we didn't have the last function and only had the traditional Lisp list functions, we could easily define last as (def last (comp first reverse)). Look at that — we didn't even need to defn or mention any arguments, because we're just piping the result of one function into another. This would not work if, for example, reverse took the imperative route of modifying the sequence in-place. Macros are problematic as well because you can't pass them to functions like comp or apply.
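The same idea carries over to other languages with first-class functions. As a rough Haskell counterpart of (comp first reverse), with names chosen here for illustration and a Maybe result so the empty list is handled:

```haskell
import Data.Char (toUpper)
import Data.Maybe (listToMaybe)

-- Point-free composition: reverse the list, then take its first element.
-- No argument is mentioned; the output of one function feeds the next.
lastOf :: [a] -> Maybe a
lastOf = listToMaybe . reverse

-- A longer pipeline built the same way, read right to left:
-- split into words, reverse their order, rejoin, upper-case.
shout :: String -> String
shout = map toUpper . unwords . reverse . words
```

Both definitions work only because each stage returns a fresh value instead of modifying its input in place.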
Composition in programming means assembling bigger pieces out of smaller ones.
Composition of unary functions creates a more complicated unary function by chaining simpler ones.
Composition of control flow constructs places control flow constructs inside other control flow constructs.
Composition of data structures combines multiple simpler data structures into a more complicated one.
Ideally, a composed unit works like a basic unit and you as a programmer do not need to be aware of the difference. If things fall short of the ideal, if something doesn't compose well, your composed program may not have the (intended) combined behavior of its individual pieces.
Suppose I have some simple C code.
void run_with_resource(void) {
  Resource *r = create_resource();
  do_some_work();
  destroy_resource(r);
}
C facilitates compositional reasoning about control flow at the level of functions. I don't have to care about what actually happens inside do_some_work(); I know just by looking at this small function that every time a resource is created on line 2 with create_resource(), it will eventually be destroyed on line 4 by destroy_resource().
Well, not quite. What if create_resource() acquires a lock and destroy_resource() frees it? Then I have to worry about whether do_some_work acquires the same lock, which would prevent the function from finishing. What if do_some_work() calls longjmp(), and skips the end of my function entirely? Until I know what goes on in do_some_work(), I won't be able to predict the control flow of my function. We no longer have compositionality: we can no longer decompose the program into parts, reason about the parts independently, and carry our conclusions back to the whole program. This makes designing and debugging much harder and it's why people care whether something composes well.
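For comparison, some languages provide a construct that restores part of this guarantee. The sketch below uses bracket from Haskell's Control.Exception, which runs the release action even when the body throws; it addresses the non-local-exit problem only, not the deadlock one. The "resource" is just a counter of open handles, and runWithResource and demo are hypothetical names.

```haskell
import Control.Exception (ErrorCall (..), bracket, throwIO, try)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical resource: a counter of currently open handles.
runWithResource :: IORef Int -> IO a -> IO a
runWithResource open work =
  bracket
    (modifyIORef' open (+ 1))               -- plays create_resource
    (\_ -> modifyIORef' open (subtract 1))  -- plays destroy_resource
    (\_ -> work)                            -- plays do_some_work

-- The body fails, yet the handle count is back to zero afterwards.
demo :: IO Int
demo = do
  open <- newIORef 0
  _ <- try (runWithResource open (throwIO (ErrorCall "boom")))
         :: IO (Either ErrorCall ())
  readIORef open
```

Here the caller can again reason about acquisition and release without reading the body, at least as far as exceptions are concerned.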
"Bang for the Buck" - composing well implies a high ratio of expressiveness per rule-of-composition. Each macro introduces its own rules of composition. Each custom data structure does the same. Functions, especially those using general data structures have far fewer rules.
Assignment and other side effects, especially wrt concurrency have even more rules.
Think about when you write functions or methods. You create a group of functionality to do a specific task. When working in an Object Oriented language you cluster your behavior around the actions you think a distinct entity in the system will perform. Functional programs break away from this by encouraging authors to group functionality according to an abstraction. For example, the Clojure Ring library comprises a group of abstractions that cover routing in web applications.
Ring is composable in that functions that describe paths in the system (routes) can be grouped into higher-order functions (middleware). In fact, Clojure is so dynamic that it is possible (and you are encouraged) to come up with patterns of routes that can be dynamically created at runtime. This is the essence of composability: instead of coming up with patterns that solve a certain problem, you focus on patterns that generate solutions to a certain class of problem. Builders and code generators are just two of the common patterns used in functional programming. Functional programming is the art of patterns that generate other patterns (and so on and so on).
The idea is to solve a problem at its most basic level then come up with patterns or groups of the lowest level functions that solve the problem. Once you start to see patterns in the lowest level you've discovered composition. As folks discover second order patterns in groups of functions they may start to see a third level. And so on...
Composition (in the context you describe at a functional level) is typically the ability to feed one function into another cleanly and without intermediate processing. Such an example of composition is in std::cout in C++:
cout << each << item << links << on;
That is a simple example of composition which doesn't really "look" like composition.
Another example with a form more visibly compositional:
Wikipedia Link
Composition is useful for readability and compactness; however, chaining large collections of interlocking functions which can potentially return error codes or junk data can be hazardous (this is why it is best to minimize error code or null return values).
Provided your functions use exceptions, or alternatively return null objects you can minimize the requirement for branching (if) on errors and maximize the compositional potential of your code at no extra risk.
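In a language with an option type, the "no branching on errors" idea can be sketched with Kleisli composition. This is a different mechanism from exceptions or null objects, though it serves the same goal: each step may fail, and the composed pipeline short-circuits on the first failure without any explicit if. The function names are illustrative.

```haskell
import Control.Monad ((>=>))
import Text.Read (readMaybe)

parseInt :: String -> Maybe Int
parseInt = readMaybe

positive :: Int -> Maybe Int
positive n
  | n > 0     = Just n
  | otherwise = Nothing

reciprocal :: Int -> Maybe Double
reciprocal 0 = Nothing
reciprocal n = Just (1 / fromIntegral n)

-- Three fallible steps chained left to right; a Nothing from any step
-- becomes the result of the whole pipeline.
pipeline :: String -> Maybe Double
pipeline = parseInt >=> positive >=> reciprocal
```

The failure handling lives in (>=>) once, rather than being repeated between every pair of calls.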
Object composition (vs inheritance) is a separate issue (and not what you are asking, but it shares the name). It is one of containment to derive object hierarchy as opposed to direct inheritance.
Within the context of clojure, this comment addresses certain aspects of composability. In general, it seems to emerge when units of the system do one thing well, do not require other units to understand its internals, eschew side-effects, and accept and return the system's pervasive data structures. All of the above can be seen in M2tM's C++ example.
composability, applied to functions, means that the functions are smaller and well-defined, thus easy to integrate into other functions (i have seen this idea in the book "the joy of clojure")
the concept can apply to other things that are supposed to be composed into something else.
the purpose of composability is reuse. for example, a well-built (composable) function is easier to reuse
macros aren't that well-composable because you can't pass them as parameters
locks are crap because you can't really give them names (define them well) or reuse them. you just do them in place
imperative languages aren't that composable because (some of them, at least) don't have closures. if you want functionality passed as parameter, you're screwed. you have to build an object and pass that; disclaimer here: this last idea i'm not entirely convinced is true, therefore research more before taking it for granted
another idea on imperative languages is that they don't compose well because they imply state (from wikipedia knowledgebase :) "Imperative programming - describes computation in terms of statements that change a program state").
state does not compose well because although you have given a specific "something" in input, that "something" generates an output according to its state. different internal state, different behaviour. and thus you can say good-bye to what you were expecting to happen.
with state, you depend too much on knowing what the current state of an object is... if you want to predict its behavior. more stuff to keep in the back of your mind, less composable (remember well-defined ? or "small and simple", as in "easy to use" ?)
ps: thinking of learning clojure, huh ? investigating... ? good for you ! :P

Programming style question on how to code functions

So, I was just coding a bit today, and I realized that I don't have much consistency when it comes to a coding style when programming functions. One of my main concerns is whether or not it's proper to code it so that you check that the input of the user is valid OUTSIDE of the function, or just throw the values passed by the user into the function and check if the values are valid in there. Let me sketch an example:
I have a function that lists hosts based on an environment, and I want to be able to split the environment into chunks of hosts. So an example of the usage is this:
listhosts -e testenv -s 2 1
This will get all the hosts from the "testenv", split them up into two parts, and display part one.
In my code, I have a function that takes a list and returns a list of lists based on your parameters for splitting. BUT, before I pass it a list, I first verify the parameters in my MAIN during the getops process, so in the main I check to make sure there are no negatives passed by the user, I make sure the user didn't request to split into, say, 4 parts, but asking to display part 5 (which would not be valid), etc.
tl;dr: Would you check the validity of a user's input in the flow of your MAIN class, or would you do a check in your function itself, and either return a valid response in the case of valid input, or return NULL in the case of invalid input?
Obviously both methods work, I'm just interested to hear from experts as to which approach is better :) Thanks for any comments and suggestions you guys have! FYI, my example is coded in Python, but I'm still more interested in a general programming answer as opposed to a language-specific one!
Good question! My main advice is that you approach the problem systematically. If you are designing a function f, here is how I think about its specification:
What are the absolute requirements that a caller of f must meet? Those requirements are f's precondition.
What does f do for its caller? When f returns, what is the return value and what is the state of the machine? Under what circumstances does f throw an exception, and what exception is thrown? The answers to all these questions constitute f's postcondition.
The precondition and postcondition together constitute f's contract with callers.
Only a caller meeting the precondition gets to rely on the postcondition.
Finally, bearing directly on your question, what happens if f's caller doesn't meet the precondition? You have two choices:
You guarantee to halt the program, one hopes with an informative message. This is a checked run-time error.
Anything goes. Maybe there's a segfault, maybe memory is corrupted, maybe f silently returns a wrong answer. This is an unchecked run-time error.
Notice some items not on this list: raising an exception or returning an error code. If these behaviors are to be relied upon, they become part of f's contract.
Now I can rephrase your question:
What should a function do when its caller violates its contract?
In most kinds of applications, the function should halt the program with a checked run-time error. If the program is part of an application that needs to be reliable, either the application should provide an external mechanism for restarting an application that halts with a checked run-time error (common in Erlang code), or if restarting is difficult, all functions' contracts should be made very permissive so that "bad input" still meets the contract but promises always to raise an exception.
In every program, unchecked run-time errors should be rare. An unchecked run-time error is typically justified only on performance grounds, and even then only when code is performance-critical. Another source of unchecked run-time errors is programming in unsafe languages; for example, in C, there's no way to check whether memory pointed to has actually been initialized.
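As a minimal sketch of a checked run-time error, here is a list-splitting function with a restrictive precondition. The function name and the splitting policy (earlier parts get the extra elements) are assumptions for illustration; the point is only that a violated precondition halts with a message naming the violation rather than returning a wrong answer.

```haskell
-- | Split a list into exactly n consecutive parts.
-- Precondition: n >= 1. A caller that violates it gets a checked
-- run-time error, not a silently wrong result.
splitInto :: Int -> [a] -> [[a]]
splitInto n xs
  | n < 1     = error ("splitInto: precondition violated, n = " ++ show n)
  | otherwise = go n xs
  where
    go 1 ys = [ys]
    go k ys = h : go (k - 1) t
      where
        size   = (length ys + k - 1) `div` k  -- ceiling of length / k
        (h, t) = splitAt size ys
```

Because the contract says nothing about what happens for n < 1, callers may not rely on catching that error; they are expected to meet the precondition.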
Another aspect of your question is
What kinds of contracts make the best designs?
The answer to this question varies more depending on the problem domain.
Because none of the work I do has to be high-availability or safety-critical, I use restrictive contracts and lots of checked run-time errors (typically assertion failures). When you are designing the interfaces and contracts of a big system, it is much easier if you keep the contracts simple, you keep the preconditions restrictive (tight), and you rely on checked run-time errors when arguments are "bad".
I have a function that takes a list and returns a list of lists based on your parameters for splitting. BUT, before I pass it a list, I first verify the parameters in my MAIN during the getops process, so in the main I check to make sure there are no negatives passed by the user, I make sure the user didn't request to split into, say, 4 parts, but asking to display part 5.
I think this is exactly the right way to solve this particular problem:
Your contract with the user is that the user can say anything, and if the user utters a nonsensical request, your program won't fall over; it will issue a sensible error message and then continue.
Your internal contract with your request-processing function is that you will pass it only sensible requests.
You therefore have a third function, outside the second, whose job it is to distinguish sense from nonsense and act accordingly—your request-processing function gets "sense", the user is told about "nonsense", and all contracts are met.
One of my main concerns is whether or not it's proper to code it so that you check that the input of the user is valid OUTSIDE of the function.
Yes. Almost always this is the best design. In fact, there's probably a design pattern somewhere with a fancy name. But if not, experienced programmers have seen this over and over again. One of two things happens:
parse / validate / reject with error message
parse / validate / process
This kind of design has one data type (request) and four functions. Since I'm writing tons of Haskell code this week, I'll give an example in Haskell:
data Request -- type of a request
parse :: UserInput -> Request -- has a somewhat permissive precondition
validate :: Request -> Maybe ErrorMessage -- has a very permissive precondition
process :: Request -> Result -- has a very restrictive precondition
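A minimal runnable sketch of these signatures, under some loud assumptions: the concrete types, the "deal a list into n parts and show part k" task, the fixed `sampleData`, and the driver `handle` are all hypothetical choices of mine, picked only to make the contracts visible.

```haskell
-- Hypothetical concrete types; the signatures above leave them abstract.
type UserInput    = (String, String)  -- raw text: number of parts, part to show
type ErrorMessage = String
type Result       = [Int]

data Request = Request Int Int        -- parts requested, part to display

-- Somewhat permissive precondition: both strings must be integer literals
-- (read halts with a run-time error otherwise).
parse :: UserInput -> Request
parse (n, k) = Request (read n) (read k)

-- Very permissive precondition: accepts any Request and reports nonsense.
validate :: Request -> Maybe ErrorMessage
validate (Request n k)
  | n < 1          = Just "number of parts must be positive"
  | k < 1 || k > n = Just ("part must be between 1 and " ++ show n)
  | otherwise      = Nothing

-- Very restrictive precondition: validate must have returned Nothing.
process :: Request -> Result
process (Request n k) =
  [x | (i, x) <- zip [0 ..] sampleData, i `mod` n == k - 1]

-- Hypothetical data to split; a real program would take it as input.
sampleData :: [Int]
sampleData = [1 .. 12]

-- Hypothetical driver: sense goes to process, nonsense back to the user.
handle :: UserInput -> Either ErrorMessage Result
handle input =
  let request = parse input
  in case validate request of
       Just message -> Left message
       Nothing      -> Right (process request)
```

With these definitions, `handle ("4", "2")` yields `Right [2,6,10]` and `handle ("4", "5")` yields `Left "part must be between 1 and 4"`.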
Of course there are many other ways to do it. Failures could be detected at the parsing stage as well as the validation stage. "Valid request" could actually be represented by a different type than "unvalidated request". And so on.
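Here is one hedged sketch of those two variations together: parsing can fail, and a validated request gets its own type, so `process` cannot be handed an unchecked one. The types, the `ValidRequest` wrapper, `sampleData`, and the `run` driver are illustrative assumptions, and the signatures deliberately differ from the ones listed earlier.

```haskell
import Text.Read (readMaybe)

type UserInput    = (String, String)  -- raw text: number of parts, part to show
type ErrorMessage = String
type Result       = [Int]

data Request = Request Int Int        -- parsed but not yet validated

-- In a real module the ValidRequest constructor would not be exported,
-- so validate would be the only way to obtain one.
newtype ValidRequest = ValidRequest Request

-- Parsing detects malformed input instead of assuming it away.
parse :: UserInput -> Either ErrorMessage Request
parse (n, k) =
  case (readMaybe n, readMaybe k) of
    (Just n', Just k') -> Right (Request n' k')
    _                  -> Left "expected two integers"

validate :: Request -> Either ErrorMessage ValidRequest
validate r@(Request n k)
  | n < 1          = Left "number of parts must be positive"
  | k < 1 || k > n = Left ("part must be between 1 and " ++ show n)
  | otherwise      = Right (ValidRequest r)

-- The restrictive precondition is now carried by the argument type.
process :: ValidRequest -> Result
process (ValidRequest (Request n k)) =
  [x | (i, x) <- zip [0 ..] sampleData, i `mod` n == k - 1]

-- Hypothetical data to split.
sampleData :: [Int]
sampleData = [1 .. 12]

-- The Either monad stops at the first failure.
run :: UserInput -> Either ErrorMessage Result
run input = process <$> (parse input >>= validate)
```

The trade-off is an extra type and a little wrapping, in exchange for the compiler enforcing that validation happened before processing.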
I'd do the check inside the function itself to make sure that the parameters I was expecting were indeed what I got.
Call it "defensive programming" or "programming by contract" or "assert checking parameters" or "encapsulation", but the idea is that the function should be responsible for checking its own pre- and post-conditions and making sure that no invariants are violated.
If you do it outside the function, you leave yourself open to the possibility that a client won't perform the checks. A method should not rely on others knowing how to use it properly.
If the contract fails you either throw an exception, if your language supports them, or return an error code of some kind.
Checking within the function adds complexity, so my personal policy is to do sanity checking as far up the stack as possible, and catch exceptions as they arise. I also make sure that my functions are documented so that other programmers know what the function expects of them. They may not always follow such expectations, but to be blunt, it is not my job to make their programs work.
It often makes sense to check the input in both places.
In the function you should validate the inputs and throw an exception if they are incorrect. This prevents invalid inputs causing the function to get halfway through and then throw an unexpected exception like "array index out of bounds" or similar. This will make debugging errors much simpler.
However, exceptions shouldn't be used for flow control, and you wouldn't want to show the raw exception straight to the user, so I would also add logic in the user interface to make sure I never call the function with invalid inputs. In your case this would be displaying a message on the console, but in other cases it might be showing a validation error in a GUI, possibly as you are typing.
"Code Complete" suggests an isolation strategy where one could draw a line between classes that validate all input and classes that treat their input as already validated. Anything allowed to pass the validation line is considered safe and can be passed to functions that don't do validation (they use asserts instead, so that errors in the external validation code can manifest themselves).
How to handle errors depends on the programming language; however, when writing a command-line application, the program really should validate that its input is reasonable. If the input is not reasonable, the appropriate behavior is to print a "Usage" message with an explanation of the requirements and to exit with a non-zero status code so that other programs know it failed (by testing the exit code).
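In Haskell that might look roughly like the sketch below. The argument names `PARTS` and `PART`, the functions `parseArgs` and `cliMain`, and the choice of exit status 2 are my own assumptions; the only real requirement is that the status be non-zero.

```haskell
import System.Environment (getArgs, getProgName)
import System.Exit (ExitCode (ExitFailure), exitWith)
import System.IO (hPutStrLn, stderr)
import Text.Read (readMaybe)

-- Pure validation of the raw arguments, kept separate so it is easy to test.
parseArgs :: [String] -> Either String (Int, Int)
parseArgs [n, k] =
  case (readMaybe n, readMaybe k) of
    (Just n', Just k')
      | n' < 1            -> Left "PARTS must be a positive integer"
      | k' < 1 || k' > n' -> Left "PART must be between 1 and PARTS"
      | otherwise         -> Right (n', k')
    _ -> Left "PARTS and PART must be integers"
parseArgs _ = Left "expected exactly two arguments"

-- Intended to be used as the program's entry point (main = cliMain).
cliMain :: IO ()
cliMain = do
  args <- getArgs
  case parseArgs args of
    Right (n, k) -> putStrLn ("showing part " ++ show k ++ " of " ++ show n)
    Left problem -> do
      prog <- getProgName
      -- Diagnostics go to stderr so they don't pollute piped output.
      hPutStrLn stderr (prog ++ ": " ++ problem)
      hPutStrLn stderr ("Usage: " ++ prog ++ " PARTS PART")
      exitWith (ExitFailure 2)
```

Keeping `parseArgs` pure means the validation rules can be unit tested without spawning the program.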
Silent failure is the worst kind of failure, and that is what happens if you simply return incorrect results when given invalid arguments. If the failure is ever caught, it will most likely be discovered very far away from the true point of failure (passing the invalid argument). Therefore, it is best, IMHO, to throw an exception (or, where that is not possible, to return an error status code) when an argument is invalid, since it flags the error as soon as it occurs, making it much easier to identify and correct the true cause of failure.
I should also add that it is very important to be consistent in how you handle invalid inputs: either check and throw an exception on invalid input in all functions, or do so in none of them. If users of your interface discover that some functions throw on invalid input, they will begin to rely on this behavior and will be incredibly surprised when other functions simply return invalid results rather than complaining.
