So, I was just coding a bit today, and I realized that I don't have much consistency when it comes to coding style when writing functions. One of my main concerns is whether it's proper to check that the user's input is valid OUTSIDE of the function, or to just pass the user's values into the function and check their validity in there. Let me sketch an example:
I have a function that lists hosts based on an environment, and I want to be able to split the environment into chunks of hosts. So an example of the usage is this:
listhosts -e testenv -s 2 1
This gets all the hosts from "testenv", splits them into two parts, and displays part one.
In my code, I have a function that takes a list and returns a list of lists based on your splitting parameters. BUT, before I pass it a list, I first verify the parameters in my MAIN during the getopt process: I check that the user passed no negatives, and I make sure the user didn't ask to split into, say, 4 parts and then display part 5 (which would not be valid), etc.
tl;dr: Would you check the validity of a user's input in the flow of your MAIN class, or would you do the check in your function itself, and either return a valid response in the case of valid input, or return NULL in the case of invalid input?
Obviously both methods work, I'm just interested to hear from experts as to which approach is better :) Thanks for any comments and suggestions you guys have! FYI, my example is coded in Python, but I'm still more interested in a general programming answer as opposed to a language-specific one!
Good question! My main advice is that you approach the problem systematically. If you are designing a function f, here is how I think about its specification:
What are the absolute requirements that a caller of f must meet? Those requirements are f's precondition.
What does f do for its caller? When f returns, what is the return value and what is the state of the machine? Under what circumstances does f throw an exception, and what exception is thrown? The answers to all these questions constitute f's postcondition.
The precondition and postcondition together constitute f's contract with callers.
Only a caller meeting the precondition gets to rely on the postcondition.
Finally, bearing directly on your question, what happens if f's caller doesn't meet the precondition? You have two choices:
You guarantee to halt the program, one hopes with an informative message. This is a checked run-time error.
Anything goes. Maybe there's a segfault, maybe memory is corrupted, maybe f silently returns a wrong answer. This is an unchecked run-time error.
Notice some items not on this list: raising an exception or returning an error code. If these behaviors are to be relied upon, they become part of f's contract.
Now I can rephrase your question:
What should a function do when its caller violates its contract?
In most kinds of applications, the function should halt the program with a checked run-time error. If the program is part of an application that needs to be reliable, either the application should provide an external mechanism for restarting an application that halts with a checked run-time error (common in Erlang code), or if restarting is difficult, all functions' contracts should be made very permissive so that "bad input" still meets the contract but promises always to raise an exception.
In every program, unchecked run-time errors should be rare. An unchecked run-time error is typically justified only on performance grounds, and even then only when code is performance-critical. Another source of unchecked run-time errors is programming in unsafe languages; for example, in C, there's no way to check whether memory pointed to has actually been initialized.
Another aspect of your question is
What kinds of contracts make the best designs?
The answer to this question varies more depending on the problem domain.
Because none of the work I do has to be high-availability or safety-critical, I use restrictive contracts and lots of checked run-time errors (typically assertion failures). When you are designing the interfaces and contracts of a big system, it is much easier if you keep the contracts simple, you keep the preconditions restrictive (tight), and you rely on checked run-time errors when arguments are "bad".
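To make the idea of a restrictive contract backed by checked run-time errors concrete, here is a minimal Python sketch of a splitting function like the one in the question. The name split_into_parts and the chunking rule (contiguous, near-equal chunks) are assumptions for illustration, not the asker's actual code.

```python
def split_into_parts(hosts, parts):
    """Split hosts into `parts` contiguous chunks of near-equal size.

    Precondition (restrictive): parts is an int with 1 <= parts <= len(hosts).
    Violating it is a caller bug and triggers a checked run-time error.
    """
    assert isinstance(parts, int), "parts must be an int"
    assert 1 <= parts <= len(hosts), "need 1 <= parts <= len(hosts)"

    size, extra = divmod(len(hosts), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one additional host.
        end = start + size + (1 if i < extra else 0)
        chunks.append(hosts[start:end])
        start = end

    # Postcondition: no host lost or duplicated.
    assert sum(len(c) for c in chunks) == len(hosts)
    return chunks
```

One caveat with this choice: Python removes assert statements when run with -O, so in that mode these checks silently become unchecked. Raising an exception explicitly avoids that, at the cost of a slightly noisier function.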
I have a function that takes a list and returns a list of lists based on your splitting parameters. BUT, before I pass it a list, I first verify the parameters in my MAIN during the getopt process: I check that the user passed no negatives, and I make sure the user didn't ask to split into, say, 4 parts and then display part 5.
I think this is exactly the right way to solve this particular problem:
Your contract with the user is that the user can say anything, and if the user utters a nonsensical request, your program won't fall over; it will issue a sensible error message and then continue.
Your internal contract with your request-processing function is that you will pass it only sensible requests.
You therefore have a third function, outside the second, whose job it is to distinguish sense from nonsense and act accordingly: your request-processing function gets "sense", the user is told about "nonsense", and all contracts are met.
One of my main concerns is whether it's proper to check that the user's input is valid OUTSIDE of the function.
Yes. Almost always this is the best design. In fact, there's probably a design pattern somewhere with a fancy name. But if not, experienced programmers have seen this over and over again. One of two things happens:
parse / validate / reject with error message
parse / validate / process
This kind of design has one data type (request) and four functions. Since I'm writing tons of Haskell code this week, I'll give an example in Haskell:
data Request -- type of a request
parse :: UserInput -> Request -- has a somewhat permissive precondition
validate :: Request -> Maybe ErrorMessage -- has a very permissive precondition
process :: Request -> Result -- has a very restrictive precondition
Of course there are many other ways to do it. Failures could be detected at the parsing stage as well as the validation stage. "Valid request" could actually be represented by a different type than "unvalidated request". And so on.
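Since the question's own code is in Python, here is a rough Python rendering of the same parse / validate / process split, applied to the listhosts example. The Request fields, the error wording, and the round-robin split inside process are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    env: str
    parts: int  # how many chunks to split the host list into
    show: int   # which chunk to display, counting from 1


def parse(argv):
    """Somewhat permissive: expects the shape ['-e', ENV, '-s', PARTS, SHOW]."""
    if len(argv) != 5 or argv[0] != "-e" or argv[2] != "-s":
        raise ValueError("usage: listhosts -e ENV -s PARTS SHOW")
    return Request(env=argv[1], parts=int(argv[3]), show=int(argv[4]))


def validate(req) -> Optional[str]:
    """Very permissive: accepts any Request; returns an error message or None."""
    if req.parts < 1:
        return "number of parts must be at least 1"
    if not 1 <= req.show <= req.parts:
        return f"part {req.show} does not exist when splitting into {req.parts}"
    return None


def process(req, hosts):
    """Very restrictive: call only with a Request that validate() accepted."""
    assert validate(req) is None
    # Round-robin split; an arbitrary choice for this sketch.
    return hosts[req.show - 1::req.parts]
```

With this layout the caller decides what "reject" means (print a message, exit, re-prompt), and process never has to know about users at all.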
I'd do the check inside the function itself to make sure that the parameters I was expecting were indeed what I got.
Call it "defensive programming" or "programming by contract" or "assert checking parameters" or "encapsulation", but the idea is that the function should be responsible for checking its own pre- and post-conditions and making sure that no invariants are violated.
If you do it outside the function, you leave yourself open to the possibility that a client won't perform the checks. A method should not rely on others knowing how to use it properly.
If the contract fails you either throw an exception, if your language supports them, or return an error code of some kind.
Checking within the function adds complexity, so my personal policy is to do sanity checking as far up the stack as possible, and catch exceptions as they arise. I also make sure that my functions are documented so that other programmers know what the function expects of them. They may not always follow such expectations, but to be blunt, it is not my job to make their programs work.
It often makes sense to check the input in both places.
In the function you should validate the inputs and throw an exception if they are incorrect. This prevents invalid inputs causing the function to get halfway through and then throw an unexpected exception like "array index out of bounds" or similar. This will make debugging errors much simpler.
However, throwing exceptions shouldn't be used as flow control, and you wouldn't want to throw the raw exception straight to the user, so I would also add logic in the user interface to make sure I never call the function with invalid inputs. In your case this would be displaying a message on the console, but in other cases it might be showing a validation error in a GUI, possibly as you are typing.
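Checking in both places might look roughly like the following Python sketch. The function names nth_part and show_part are hypothetical; the point is only that the inner function refuses bad input loudly while the user-facing layer never lets bad input reach it.

```python
def nth_part(chunks, n):
    """Return the n-th chunk, counting from 1."""
    # Checked up front: without this, n == 0 would quietly return the last
    # chunk (chunks[-1]) and a large n would raise a bare IndexError.
    if not 1 <= n <= len(chunks):
        raise ValueError(f"part {n} out of range 1..{len(chunks)}")
    return chunks[n - 1]


def show_part(chunks, n):
    """User-facing layer: validate first, so the exception is not flow control."""
    if not 1 <= n <= len(chunks):
        return f"error: choose a part between 1 and {len(chunks)}"
    return "\n".join(nth_part(chunks, n))
```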
"Code Complete" suggests an isolation strategy where one could draw a line between classes that validate all input and classes that treat their input as already validated. Anything allowed to pass the validation line is considered safe and can be passed to functions that don't do validation (they use asserts instead, so that errors in the external validation code can manifest themselves).
How to handle errors depends on the programming language; however, when writing a commandline application, the commandline really should validate that the input is reasonable. If the input is not reasonable, the appropriate behavior is to print a "Usage" message with an explanation of the requirements as well as to exit with a non-zero status code so that other programs know it failed (by testing the exit code).
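In Python, the standard argparse module already provides this behavior: parser.error() prints the usage line and a message to stderr and exits with status 2. The sketch below assumes the option layout from the question's example (listhosts -e ENV -s PARTS SHOW); the dest and metavar names are made up for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="listhosts")
    parser.add_argument("-e", dest="env", required=True, help="environment name")
    parser.add_argument("-s", dest="split", nargs=2, type=int,
                        metavar=("PARTS", "SHOW"),
                        help="split into PARTS chunks and show chunk SHOW")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.split is not None:
        parts, show = args.split
        if parts < 1 or not 1 <= show <= parts:
            # Prints usage plus this message to stderr, then exits with status 2.
            parser.error(f"cannot show part {show} of {parts}")
    return args
```

A script would call parse_args(sys.argv[1:]) at the top of main and hand only the validated values to the rest of the program.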
Silent failure is the worst kind of failure, and that is what happens if you simply return incorrect results when given invalid arguments. If the failure is ever caught, then it will most likely be discovered very far away from the true point of failure (passing the invalid argument). Therefore, it is best, IMHO, to throw an exception (or, where not possible, to return an error status code) when an argument is invalid, since it flags the error as soon as it occurs, making it much easier to identify and correct the true cause of failure.
I should also add that it is very important to be consistent in how you handle invalid inputs; you should either check and throw an exception on invalid input for all functions or do that for none of them, since if users of your interface discover that some functions throw on invalid input, they will begin to rely on this behavior and will be incredibly surprised when other functions simply return invalid results rather than complaining.
Related
I have a Python 3 class representing a finite state machine, with functions for actions to transition from state to state.
What type of error should I raise if a user calls actor.take_train() while actor.state is BED, or if a user calls actor.sleep() while actor.state is WORK? That last case is probably also ill-advised at most workplaces, but you do you.
The function call is valid sometimes, but invalid at others, and I'm unaware of whether there is a defined error appropriate to raise in this case.
Well if you're specifically looking for an exception, probably a custom one derived from RuntimeError (ValueError would be the closest of the standard exceptions but it's not quite a match).
But whether to even throw an exception is a consideration of the specific use case for the state machine, and what's more useful / convenient e.g. you might want to just log the invalid signal and do nothing, or return a placeholder of some sort.
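A custom exception derived from RuntimeError could be sketched as below. The original state diagram is not available here, so the HOME state, the get_up action, and the transition table are assumptions; only the BED and WORK states and the take_train and sleep actions come from the question.

```python
from enum import Enum


class State(Enum):
    BED = "bed"
    HOME = "home"  # hypothetical intermediate state
    WORK = "work"


class InvalidTransitionError(RuntimeError):
    """Raised when an action is not allowed in the current state."""


# (action, current state) -> next state; anything not listed is invalid.
TRANSITIONS = {
    ("get_up", State.BED): State.HOME,
    ("take_train", State.HOME): State.WORK,
    ("take_train", State.WORK): State.HOME,
    ("sleep", State.HOME): State.BED,
}


class Actor:
    def __init__(self):
        self.state = State.BED

    def _apply(self, action):
        try:
            self.state = TRANSITIONS[(action, self.state)]
        except KeyError:
            raise InvalidTransitionError(
                f"cannot {action} while in state {self.state.name}") from None

    def get_up(self):
        self._apply("get_up")

    def take_train(self):
        self._apply("take_train")

    def sleep(self):
        self._apply("sleep")
```

Callers that would rather log and ignore invalid signals can catch InvalidTransitionError specifically without swallowing unrelated RuntimeErrors.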
I'm trying to get a better feel for how to handle error states in Haskell, since there seem to be a lot of ways to do it. Ideally, my data structures would make any invalid inputs unrepresentable, but despite considerable effort to the contrary, I still occasionally end up working with data where the type system can allow invalid states. As an example, let's consider that my program input is the training results for a neural network. In order for math to work, each matrix needs to have the correct bounds, and that's not (really) representable by the type system. If data is invalid, there's really nothing the application can do but halt any further processing and notify someone of the problem (so it's not recoverable). What's the best way to handle this in Haskell? It seems like I could:
1) Use error or other partial functions when processing my data. My understanding is this should only be used to represent a bug in the code. So it would have to be coupled with some sort of validation at the point that I load the data, and any point "after" that check I just assume that the data is in a valid format. This feels imperative to me, and doesn't seem to fit very well with lazy, declarative code.
2) Throw an exception when processing the data using Control.Exception.throw, and then catch it at the top level where I can alert someone. Contrary to error, I believe this doesn't indicate a bug in the program, so perhaps there wouldn't be verification when I load the data beyond what can be represented through the type system? The presence or absence of an exception when processing the data would define the verification.
3) Lift any data processing that could fail into the IO monad and use Control.Exception.throwIO.
4) Lift any data processing that could fail into the IO monad and use fail (I've read that using fail is frowned on by the community?)
5) Return an Either or something similar, and let that bubble up through all your logic. I've definitely had some cases where composing Eithers becomes (to me) exceedingly impractical.
6) Use Control.Monad.Exception, which I only marginally understand, but which seems to involve lifting any data processing that could fail into some exceptional monad that I think is supposed to be more easily composable than Either?
and I'm not even sure that's all the options. Is there an approach to this problem that's generally accepted by the community, or is this really an opinionated topic?
I am trying to implement a utility library in Node.js that I can use in different projects. I am stuck on how to handle errors properly. For example, suppose I have a function
function dateCompare(date1,operator,date2) // it can take either a date object or valid date string.
Now suppose an invalid input is supplied-
1. I can return the error in the result, as in asynchronous logic: {error:true,result:""}, but this prevents me from using my function as if (dateCompare(date1,'eq',date2) || dateCompare(date3,'l3',date4))
2. If I throw a custom exception here, then I am afraid that, since node is single threaded, creating the error context is very expensive.
How can we handle this so that it is both easy to use and not very expensive? Under what circumstances is throwing an exception more appropriate even if it is expensive? Some practical use cases would be very helpful.
There's no "right" answer for questions like this. There are various different philosophies and you have to decide which one makes the most sense for you or for your context.
Here's my general scheme:
If you detect a serious programming mistake such as a required argument to a function is missing or is the wrong type, then I prefer to just throw an exception and spell out in the exception msg exactly what is wrong. This should get seen by the developer the first time this code is run and they should then know they need to correct their code immediately. The general idea here is that you want the developer to see their error immediately and throwing an exception is usually the fastest way to do so and you can put a useful message in the exception.
If there are expected error return values such as "user name already taken" or "user name contains invalid characters" that are not programming mistakes, but are just an indication of why a given operation (perhaps containing user data) did not complete, then I would craft return values from the function that communicate this info to the caller.
If your function needs to return either a result or an error, then you have to decide on a case by case basis if it is easy to come up with a range of error values that are easily detectable as separate from the successful return values. For example, Array.prototype.indexOf() returns a negative value to indicate the value was not found or zero or a positive number to indicate it is returning an index. These ranges are completely independent so they are easy to code a test to distinguish them.
Another reason to throw an exception is that your code is likely to be used in a circumstance where it's simpler to let the exception propagate up several calling levels or block levels rather than manually writing code to propagate errors. This is a double-edged sword. While sometimes it's very useful to let the exception propagate, sometimes you actually need to know about and deal with the exception at each level anyway to properly clean up in an error condition (release resources, etc.), so you can't let it go up that many levels automatically anyway.
If such a distinction between error values and successful return values is not simple to make, either for you, the coder of the function, or for the developer who will call it, then sometimes it makes sense to return an object that has more than one property, one of which is an error property, another of which is a value.
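The "object with an error property and a value property" idea can be sketched as follows. The thread is about Node.js, but the examples added to this page use Python for consistency; the Outcome class and the register function are hypothetical, with the error messages taken from the answer above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    value: Optional[str] = None
    error: Optional[str] = None

    @property
    def ok(self):
        return self.error is None


def register(username, taken):
    """Expected failures come back as data rather than as exceptions."""
    if not username.isalnum():
        return Outcome(error="user name contains invalid characters")
    if username in taken:
        return Outcome(error="user name already taken")
    return Outcome(value=username)
```

The caller then branches on outcome.ok, which keeps expected failures out of the exception path entirely.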
In your specific case of:
function dateCompare(date1,operator,date2)
and
if (dateCompare(date1,'eq',date2) || dateCompare(date3,'l3',date4))
It sure would be convenient if the function just returned a boolean and threw an exception if the date values or operator are invalid. Whether this is a good design decision depends a bit on how this is going to be used. If you're in a tight loop, running this on lots of values, many of which will be badly formatted and would throw such an exception, and performance is important in this case, then it may be better to return the above-described object and change how you write the calling code.
But, if a format failure is not a regular expected case or you're only doing it once or the performance difference of an exception vs. a return value wouldn't even be noticed (which is usually the case), then throw the exception - it's a clean way to handle invalid input without polluting the expected use case of the function.
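The "return a boolean, throw on invalid input" shape looks like this when transliterated to Python (again, Python rather than JavaScript only to keep the added examples on this page in one language). The operator names and the ISO date-string format are assumptions.

```python
import operator
from datetime import date

# Hypothetical operator names; the question only shows 'eq'.
_OPS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}


def date_compare(d1, op, d2):
    """Return a plain bool; raise ValueError on a bad operator or date string."""
    if op not in _OPS:
        raise ValueError(f"unknown operator {op!r}")
    # Accept either date objects or ISO strings such as '2024-01-31';
    # date.fromisoformat raises ValueError for malformed strings.
    left, right = (d if isinstance(d, date) else date.fromisoformat(d)
                   for d in (d1, d2))
    return _OPS[op](left, right)
```

Because the return value is always a bool, the function composes directly in conditions such as date_compare(a, "eq", b) or date_compare(c, "lt", d).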
How can we handle it so that it is easy to use as well as not very expensive?
It's not expensive to throw an exception upon bad input if that isn't the normally expected case. Plus, unless this code is in some kind of tight loop and called many times, it's unlikely you would even notice the difference between a return value and a thrown/caught exception. So, I'd suggest you code to make the expected cases simpler to code for and use exceptions for the unexpected conditions. Then, your expected code path doesn't go the exception route. In other words, exceptions actually are "exceptions" to normal.
Under what circumstances throwing exceptions will be more appropriate even if it is too expensive?
See the description above.
I'm new to programming and I have a conceptual question.
That is, can "exception" be perfectly replaced by "if.. else" ?
I know "exception" is for handling exceptional conditions that might cause an error or a crash.
But we also use "if.. else" to ensure the correctness of value of variables, don't we?
Or "exception" can really be replaced by "if.. else", but using "exception" has other benefits(like convenience?)
Thank you, and sorry for my poor English.
The biggest difference between exceptions and "if..else" is that exceptions pass up the call stack: an exception raised in one function can be caught in a caller any number of frames up the stack. An "if" statement doesn't transfer control in this way: the condition has to be handled in the function that detected it, or handed back explicitly through return values at every level.
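A small Python sketch of that propagation, with made-up function names: the problem is detected in read_setting, ignored by load_port, and handled two frames up in main.

```python
def read_setting(settings, key):
    # Detects the problem but does not decide what to do about it.
    if key not in settings:
        raise KeyError(key)
    return settings[key]


def load_port(settings):
    # No error handling here: an exception simply passes through this frame.
    return int(read_setting(settings, "port"))


def main(settings):
    try:
        return load_port(settings)
    except KeyError as exc:
        # Handled two frames above the point of detection.
        return f"missing setting: {exc.args[0]}"
```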
Most of your questions relate to Python, so here is an answer based on that fact.
In Python, it is idiomatic (or "pythonic") to use try-except blocks. We call this "EAFP": Easier to ask for forgiveness than permission.
In C, where there were no exceptions, it was usual to "LBYL": Look before you leap, resulting in lots of if (...) statements.
So, while you can LBYL, you should follow the idioms of the language in which you are programming: using exceptions for handling exceptional cases and if-statements for conditionals.
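The two styles side by side, in a minimal sketch (the config dictionary and the default of 8080 are arbitrary):

```python
def port_lbyl(config):
    # LBYL: look before you leap, test first and then act.
    if "port" in config:
        return config["port"]
    return 8080


def port_eafp(config):
    # EAFP: just try it, and handle the failure if it happens.
    try:
        return config["port"]
    except KeyError:
        return 8080
```

For this particular case, config.get("port", 8080) does the same job in one call; the longer forms are shown only to contrast the two idioms.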
Technically, the answer is yes, exceptions can be perfectly replaced by if-else. Many languages, C for example, have no native notion of exceptions that can be thrown and caught.
The primary advantage of exceptions is code readability and maintainability. They serve a different purpose than if-else. Exceptions are for exceptional conditions, while if-else is for program flow.
See this excellent article explaining the difference.
That's a lot of branch conditions to manage. In theory, exceptions aren't necessary for perfect code, but perfect code does not exist in real life. Exceptions are a well-established mechanism for dealing with problems in a controlled manner.
The old way for handling an error from a function looks something like this:
int result = function_returns_error_code();
if (result != GOOD)
{
/* handle problem */
}
else
{
/* keep going */
}
The problem with this solution (and others like it that use if-else) is that if there is a real problem and the programmer does not properly handle it with an if...else (say the function returns an error code indicating major problems, but the programmer forgets to check it), the problem is silently ignored. An exception, by contrast, travels further and further up the call stack until it is either handled or the program quits.
Further, it is tedious to check for error codes in functions, or pass a parameter into which to put an error code. It is simpler, cleaner, and better to use exceptions, for maintainability and abstraction.
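The difference is easy to see when the caller forgets to check. In this Python sketch (function names and the GOOD/BAD codes are illustrative), the first call fails without a trace, while the second cannot be ignored by accident.

```python
GOOD, BAD = 0, 1


def save_with_code(record, store):
    """Error-code style: failure is reported through the return value."""
    if "id" not in record:
        return BAD
    store[record["id"]] = record
    return GOOD


def save_or_raise(record, store):
    """Exception style: failure interrupts the caller unless it is handled."""
    if "id" not in record:
        raise ValueError("record has no id")
    store[record["id"]] = record


store = {}
save_with_code({"name": "x"}, store)  # returns BAD, but nobody looks at it
# save_or_raise({"name": "x"}, store) would raise ValueError right here
```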
In most high-level languages, working with exceptions is often more efficient than if-else because you avoid validating the same thing more than once, e.g.:
if value is not 0 then print 10 / value
In most interpreters 10 / value will internally test whether value is a valid divider before using it so you've actually tested for the same problem twice. In some cases the exception may come all the way up from hardware so no software validation is happening at all.
On the other hand:
try print 10 / value ... catch exception
Will only test whether value is valid once. Furthermore there's a good chance the test will be better optimised than your own code and more capable of handling truly unexpected conditions (like out of memory errors).
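In Python the two variants look like this. The sketch only shows the shape of the code; whether the try version is actually faster depends on the interpreter and on how often the exception fires, and nothing here measures it.

```python
def ratio_checked(value):
    # Tests value here; the division then checks the divisor again internally.
    if value != 0:
        return 10 / value
    return None


def ratio_try(value):
    # Lets the division itself report the problem.
    try:
        return 10 / value
    except ZeroDivisionError:
        return None
```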
If you look at the call stack of a program and treat each return pointer as a token, what kind of automaton is needed to build a recognizer for the valid states of the program?
As a corollary, what kind of automaton is needed to build a recognizer for a specific bug state?
(Note: I'm only looking at the info that could be had from this function.)
My thought is that if these form regular languages then some interesting tools could be built around that. E.g. given a set of crash/failure dumps, automatically group them and generate a recognizer to identify new instances of known bugs.
Note: I'm not suggesting this as a diagnostic tool but as a data management tool for turning a pile of crash reports into something more useful.
"These 54 crashes seem related, as do those 42."
"These new crashes seem unrelated to anything before date X."
etc.
It would seem that I've not been clear about what I'm thinking of accomplishing, so here's an example:
Say you have a program that has three bugs in it.
Two bugs that cause invalid args to be passed to a single function tripping the same sanity check.
A function that if given a (valid) corner case goes into an infinite recursion.
Also assume that when the program crashes (failed assert, uncaught exception, seg-V, stack overflow, etc.) it grabs a stack trace, extracts the call sites on it and ships them to a QA reporting server. (I'm assuming that only that information is extracted because 1, it's easy to get with a one time per project cost and 2, it has a simple, definite meaning that can be used without any special knowledge about the program)
What I'm proposing would be a tool that would attempt to classify incoming reports as connected to one of the known bugs (or as a new bug).
The simplest thing would be to assume that one failure site is one bug, but in the first example, two bugs get detected in the same place. The next easiest thing would be to require the entire stack to match, but again, this doesn't work in cases like the second example where you have multiple pieces of (valid) code that can trip the same bug.
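To make the two naive classifiers concrete, here is a small Python sketch. The report format (a list of call-site names, outermost first) and the sample data are invented for illustration; real reports would carry return addresses or symbolized frames.

```python
from collections import defaultdict


def group_reports(reports, key):
    """Group crash reports (report id -> list of call sites) by key(stack)."""
    groups = defaultdict(list)
    for report_id, stack in reports.items():
        groups[key(stack)].append(report_id)
    return dict(groups)


def failure_site(stack):
    # Classifier 1: one failure site is one bug.
    return stack[-1]


def whole_stack(stack):
    # Classifier 2: the entire stack must match.
    return tuple(stack)


reports = {
    "r1": ["main", "load", "check_args"],          # bug A trips the sanity check
    "r2": ["main", "save", "check_args"],          # bug B trips the same check
    "r3": ["main", "walk", "walk", "walk"],        # runaway recursion
    "r4": ["main", "walk", "walk", "walk", "walk"],  # same bug, different depth
}

by_site = group_reports(reports, failure_site)
by_stack = group_reports(reports, whole_stack)
```

Here by_site merges r1 and r2 even though they are different bugs, while by_stack separates r3 and r4 even though they are the same one, which is exactly the gap the proposed recognizer would need to close.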
The return pointer on the stack is just a pointer to memory. In theory if you look at the call stack of a program that just makes one function call, the return pointer (for that one function) can have different value for every execution of the program. How would you analyze that?
In theory you could read through a core dump using a map file. But doing so is extremely platform and compiler specific. You would not be able to create a general tool for doing this with any program. Read your compiler's documentation to see if it includes any tools for doing postmortem analysis.
If your program is decorated with assert statements, then each assert statement defines a valid state. The program statements between the assertions define the valid state changes.
A program that crashes has violated enough assertions that something broke.
A program that's incorrect but "flaky" has violated at least one assertion but hasn't failed.
It's not at all clear what you're looking for. The valid states are -- sometimes -- hard to define but -- usually -- easy to represent as simple assert statements.
Since a crashed program has violated one or more assertions, a program with explicit, executable assertions doesn't need any crash debugging. It will simply fail an assert statement and die visibly.
If you don't want to put in assert statements then it's essentially impossible to know what state should have been true and which (never-actually-stated) assertion was violated.
Unwinding the call stack to work out the position and the nesting is trivial. But it's not clear what that shows. It tells you what broke, but not what other things led to the breakage. That would require guessing what assertions were supposed to have been true, which requires deep knowledge of the design.
Edit.
"seem related" and "seem unrelated" are undefinable without recourse to the actual design of the actual application and the actual assertions that should be true in each stack frame.
If you don't know the assertions that should be true, all you have is a random puddle of variables. What can you claim about "related" given a random pile of values?
Crash 1: a = 2, b = 3, c = 4
Crash 2: a = 3, b = 4, c = 5
Related? Unrelated? How can you classify these without knowing everything about the code? If you know everything about the code, you can formulate standard assert-statement conditions that should have been true. And then you know what the actual crash is.