Why are new programming languages shifting types to the other side?

If you look at Rust, Go, Swift, TypeScript and a few others, and compare them to C/C++, the first thing I noticed is that the types have moved position.
int one = 1;
In comparison to:
let one: int = 1;
My question: Why?
To me, personally, it is weird reading type specifiers that far into the line, since I am very used to them being on the left. So I am interested in why the type specifiers are being moved, and why this is the case not just in one language but in many of the modern/new languages on the table.

To me, personally, it is weird reading type specifiers that far into the line, since I am very used to them being on the left
And English is the best language because it is the only language where the words are spoken in the same order I think them. One wonders why anyone speaks French at all, with the words all in the wrong order!
So I am interested in why the type specifiers are being moved, and why this is the case not just in one language but in many of the modern/new languages on the table.
I note that you ignore the existence of the many older languages which use this pattern. Visual Basic (mid 1990s) immediately comes to mind.
Function F(x As String) As Object
Pascal, 1970s:
var
  Set1 : set of 1..10;
Simply-typed lambda calculus, a programming language invented before computers, in the 1940s:
λx:S.λy:T.x : S-->T-->S
The whole ML family. I could go on. There are plenty of very old languages that use the types on the right convention.
But we can get far older than the 1940s. When you say in mathematics f : Q --> R, you are putting the name of the function on the left and the type -- a map from Q to R -- on the right. When you say x∈R to indicate that x is a real, you're putting the type on the right. "Type on the right" predates type on the left in C by literally centuries. This is not anything new!
In fact the "types on the left" syntax is the weird one! It just seems natural to you because you happen to have used a language that uses this convention in your formative years.
The types on the right syntax is much superior, for numerous reasons. Just a few:
var x : int = 1;
function y(z : int) : string { ... }
emphasizes that x is a variable and y is a function. If the type comes to the left and you see int y then you don't know whether it is a function or a variable until later. This makes programs harder for humans to read, which is bad enough. As a compiler developer, let me tell you it is quite inconvenient that the type comes on the left in C#. (I could point out numerous inconsistencies in how C# syntax deals with the positions of types.)
Another reason: In the "type on the right" syntax you can make types optional. If you have
var x : int = 1;
then you can easily say "well, we can infer the int, and so eliminate it"
var x = 1;
but if the int is on the left, then what do you do?
Inverting this: you mention TypeScript. TypeScript is a gradually-typed JavaScript. The convention in JavaScript is already
var x = 1;
function f(y) { }
Given that, plainly it is easier to modify both existing code, and the language as a whole, to introduce optional type elements on the right than it would be to make the "var" and "function" keywords replaced by a type.
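To make that concrete, here is a small sketch in ordinary TypeScript (the names are made up):

// Plain JavaScript, valid as-is in TypeScript: the types are inferred.
var x = 1;
function f(y) { return y; }
// (Under default compiler settings; `noImplicitAny` would ask for an annotation on y.)

// The same code with optional annotations added on the right.
// The shape of the declarations did not have to change.
var x2: number = 1;
function f2(y: number): number { return y; }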
Consider also the positioning. When you say:
int x = 1;
then the two things that must be consistent -- the type and the initializer -- are as far apart as they possibly can be. With var x : int = 1; they are side by side. And in
int f() {
...
...
return 123;
}
what have we got? The return is logically as far to the right as possible, so why does the function declaration move the type of the return as far to the left as possible? With the type on the right syntax we have this nice flow:
function f(x : string) : int
{ ... ... ... return 123; }
What happens in a function call? The flow of the declaration is now the same as the flow of control: the things on the left -- initialization of formal parameters -- happens first, and the things on the right -- production of a return value -- happen last.
I could go on at some additional length pointing out how the C style gets it completely backwards, but it is late. Summing up: first, type on the right is superior in almost every possible way, and second, it is very, very old. New languages which use this convention are the ones that are being consistent with traditional practice.

If you do a web search, it is not hard to find the developers of newer languages answering this question in their own words. For example, the Go developers have a FAQ entry on this, as well as an entire article on their language blog. Many programmers are so used to C-like languages that any alternative seems weird, so this question tends to come up a lot...
However, you could argue that the C type declaration syntax itself is odd at best. The pattern-like features for pointers and function types become awkward and unintuitive very quickly, and were never developed as part of, or into, any kind of more general pattern-matching facility. For the sake of familiarity, they were adopted to a greater or lesser degree by many successive C-like languages, but the feature itself sticks out as more of a failed experiment that we have to live with for the sake of backwards compatibility.
One advantage of extricating yourself from C type syntax is that it makes it easier to use types in more places than just declarations. If you can place types conveniently wherever they make sense, you can use your types as annotations, as described in the Swift documentation.
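As a rough illustration of this (a TypeScript sketch rather than Swift; the names are invented), the same name : Type form reads uniformly wherever a type can appear:

// The same `name: Type` shape works for variables, parameters,
// return types, object members, and generics alike.
const limit: number = 100;

function clamp(value: number, max: number): number {
    return value > max ? max : value;
}

interface Point {
    x: number;
    y: number;
}

const origin: Point = { x: 0, y: 0 };
const path: Array<Point> = [origin];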

Related

Is there a language where types can take content of fields into account?

I had this crazy idea and was wondering if such a thing exists:
Usually, in a strongly typed language, types are mainly concerned with memory layout, or membership of an abstract 'class'. So class Foo {int a;} and class Bar {int a; int b;} are distinct, but so is class Baz {int a; int b;} (although it has the same layout, it's a different type). So far, so good.
I was wondering if there is a language that allows one to specify more fine-grained constraints on what makes a type. For example, I'd like to have:
class Person {
    //...
    int height;
}

class RollercoasterSafe : Person (where .height > 140) {}

void ride(RollercoasterSafe p) { //... }
and the compiler would make sure that it's impossible to have p.height < 140 in ride. This is just a stupid example, but I'm sure there are use cases where this could really help. Is there such a thing?
It depends on whether the predicate is checked statically or dynamically. In either case the answer is yes, but the resulting systems look different.
On the static end: PL researchers have proposed the notion of a refinement type, which consists of a base type together with a predicate: http://en.wikipedia.org/wiki/Program_refinement. I believe the idea of refinement types is that the predicates are checked at compile time, which means that you have to restrict the language of predicates to something tractable.
It's also possible to express constraints using dependent types, which are types parameterized by run-time values (as opposed to polymorphic types, which are parameterized by other types).
There are other tricks that you can play with powerful type systems like Haskell's, but IIUC you would have to change height from int to something whose structure the type checker could reason about.
On the dynamic end: SQL has something called domains, as in CREATE DOMAIN: http://developer.postgresql.org/pgdocs/postgres/sql-createdomain.html (see the bottom of the page for a simple example), which again consist of a base type and a constraint. The domain's constraint is checked dynamically whenever a value of that domain is created. In general, you can solve the problem by creating a new abstract data type and performing the check whenever you create a new value of the abstract type. If your language allows you to define automatic coercions from and to your new type, then you can use them to essentially implement SQL-like domains; if not, you just live with plain old abstract data types instead.
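A minimal sketch of that abstract-data-type approach, using TypeScript's common "branded type" idiom (the names RollercoasterSafe and toRollercoasterSafe are invented for this example):

interface Person {
    height: number;
}

// Structurally a Person, but the brand can only be attached by the
// smart constructor below, which performs the dynamic check once,
// much like a SQL domain constraint.
type RollercoasterSafe = Person & { readonly __brand: "RollercoasterSafe" };

function toRollercoasterSafe(p: Person): RollercoasterSafe {
    if (p.height <= 140) {
        throw new Error("too short to ride");
    }
    return p as RollercoasterSafe;
}

// Every value of this type has already passed the check.
function ride(p: RollercoasterSafe): void { /* ... */ }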
Then there are contracts, which are not thought of as types per se but can be used in some of the same ways, such as constraining the arguments and results of functions/methods. Simple contracts include predicates (eg, "accepts a Person object with height > 140"), but contracts can also be higher-order (eg, "accepts a Person object whose makeSmallTalk() method never returns null"). Higher-order contracts cannot be checked immediately, so they generally involve creating some kind of proxy. Contract checking does not create a new type of value or tag existing values, so the dynamic check will be repeated every time the contracted function is called. Consequently, programmers often put contracts along module boundaries to minimize redundant checks.
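For a feel of what a simple first-order contract can look like, here is a hedged TypeScript sketch (requires is an invented helper, not a real library API). Note how the predicate is re-checked on every call, which is exactly why contracts tend to live at module boundaries:

// Wrap a function with a precondition checked on each invocation.
function requires<A extends unknown[], R>(
    pre: (...args: A) => boolean,
    message: string,
    fn: (...args: A) => R
): (...args: A) => R {
    return (...args: A): R => {
        if (!pre(...args)) {
            throw new Error("Contract violated: " + message);
        }
        return fn(...args);
    };
}

const ride = requires(
    (p: { height: number }) => p.height > 140,
    "rider must be taller than 140",
    (p: { height: number }) => { /* ... */ }
);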
An example of a language with such capabilities is Spec#. From the tutorial documentation available on the project site:
Consider the method ISqrt in Fig. 1, which computes the integer square root of a given integer x. It is possible to implement the method only if x is non-negative, so

int ISqrt(int x)
    requires 0 <= x;
    ensures result*result <= x && x < (result+1)*(result+1);
{
    int r = 0;
    while ((r+1)*(r+1) <= x)
        invariant r*r <= x;
    {
        r++;
    }
    return r;
}
In your case, you could probably do something like (note that I haven't tried this, I'm just reading docs):
void ride(Person p)
    requires p.height > 140;
{
    //...
}
There may be a way to roll up that requires clause into a type declaration such as RollercoasterSafe that you have suggested.
Your idea sounds somewhat like C++0x's concepts, though not identical. However, concepts have been removed from the C++0x standard.
I don't know any language that supports that kind of thing, but I don't find it really necessary.
I'm pretty convinced that simply applying validation in the setters of the properties may give you all the necessary restrictions.
In your RollercoasterSafe class example, you may throw an exception when the height property is set to a value less than 140. It's runtime checking, but polymorphism can make compile-time checking impossible.
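A sketch of that setter-validation approach (TypeScript here, with invented names):

class RollercoasterSafePerson {
    private _height: number;

    constructor(height: number) {
        this._height = RollercoasterSafePerson.check(height);
    }

    get height(): number {
        return this._height;
    }

    // The invariant is enforced at the single point where the field changes.
    set height(value: number) {
        this._height = RollercoasterSafePerson.check(value);
    }

    private static check(value: number): number {
        if (value < 140) {
            throw new RangeError("height must be at least 140");
        }
        return value;
    }
}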

Feedback, resources, and information for a declarative programming language

I've been thinking of some concepts underlying a new language. It was kind of a toy at first, but now I'm wondering whether it could really mean something. I'm posting this question to Stack Overflow to see if it's been done before, and if I can get any feedback, ideas, or other information.
I began thinking about this mainly after reading Jonathan Edwards's presentation on declarative programming. I then mixed it with some of my older ideas and with what I've seen in modern languages.
The main idea behind declarative programming is "what" vs. "how." However, I've heard that so many times that it has started to work like the word "interesting": it doesn't actually tell you anything, which is frustrating.
In Jonathan Edwards's version, though, he began by emphasizing lazy evaluation. This has some interesting consequences, namely functional reactive programming (FRP). Here is an example of FRP with animation (using a syntax I made up):
x as time * 2 // time is some value representing the current time
y as x + (2 * 500)
new Point(x, y)
So here the values just automatically change if the inputs change. In one of my favorite languages, D, there is a distinction between "pure" and "impure" functions. A pure function has no connection to the outside world and only uses other pure functions; otherwise it is impure. The point is that you can always trust a pure function to return the same value for given arguments.
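For instance (a TypeScript sketch of the same distinction, since D isn't needed to see it):

// Pure: same argument, same result, no contact with the outside world.
function square(n: number): number {
    return n * n;
}

// Impure: depends on the clock, so two calls with the same argument
// can return different values.
function stamp(label: string): string {
    return label + "@" + Date.now();
}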
I suppose a similar transitive principle applies here. Our impurity is time. Everything touched by time (x, hence y, hence new Point(x, y)) is impure. However, notice that (2 * 500) is pure. So this tells the compiler where its limits are. I think of it like simplifying a math expression with variables:
(x ^ 2) + 3x + 5
(4 ^ 2) + 3x + 5 = 16 + 3x + 5 = 21 + 3x = 3(7 + x)
By telling the compiler what's pure and what's not, we can simplify our programs a lot. Another point is eager or mutable data. Jonathan Edwards recognized the input as being mutable and eager, but the output as functional and lazy. Basically, given new input, the program defines an atomic state change, and then the output is simply a function of the current state. If you want to see why this can be important, see the presentation. Input is impure. Lazy evaluation helps define the atomic state change. Let's take a look at how a program would be written procedurally:
void main ()
{
    String input = "";
    writeln("Hello, world!");
    writeln("What's your name? ");
    input = readln();
    writeln("Hello, %s!", input);
    writeln("What's your friends name? ");
    input = readln();
    writeln("Hello to you too, %s!", input);
}
Now let's look at how an "atomic state change" might represent it. Here the bind keyword says that the code under it is executed when begin changes, and the mutable keyword says that input is not lazy, but eager.
program:
    mutable step := 0
    bind begin:
        writeln("Hello, world!")
        writeln("What's your name? ")
        ++step
    bind readln() as input when step = 1:
        writeln("Hello, %s!", input)
        writeln("What's your friends name? ")
        ++step
    bind readln() as input when step = 2:
        writeln("Hello to you too, %s!", input)
Now here we see something that could be made easier, and more readable, for the programmer. First of all is the ugly step variable and how we must increment and test it each time. Here's an example of what a new and improved version might look like:
program:
    bind begin:
        writeln("Hello, world!")
        writeln("What's your name? ")
    bind readln() as input:
        writeln("Hello, %s!", input)
        writeln("What's your friends name? ")
        yield // This just means the program jumps to here instead of at the beginning
        writeln("Hello to you too, %s!", input)
        halt
That's better. Not perfect, though. But if I knew the perfect answer, I wouldn't be here, right?
Here's a better example, using a game engine:
class VideoManager:
    bind begin: // Basically a static constructor, will only be called once and at the beginning
        // Some video set up stuff
    bind end: // Basically a static destructor
        // Some video shut down stuff

class Input:
    quitEvent as handle // A handle is an empty value, but can be updated so code that's bound to it changes.
    keyboardEvent as handle(KeyboardEvent) // This handle does return a value though
    mouseEvent as handle(MouseEvent)
    // Some other code manages actually updating the handles.

class Sprite:
    mutable x := 0
    mutable y := 0
    bind this.videoManager.updateFrame:
        // Draw this sprite

class FieldState:
    input as new Input
    player as new Sprite

    bind input.quitEvent:
        halt

    bind input.keyboardEvent as e:
        if e.type = LEFT:
            this.player.x -= 2
        else if e.type = RIGHT:
            this.player.x += 2
        else if e.type = UP:
            this.player.y -= 2
        else if e.type = DOWN:
            this.player.y += 2
I like that this doesn't require callbacks, events, or even loops, and threads are obvious. It's easier to tell what's going on, and it's not just the Python-like syntax. I think it's like when language developers realized that there were only a few things people were using labels and gotos for: conditional branches and loops. So they built if-then-else, while, and for into languages; labels and gotos became deprecated, and the compilers as well as the people could tell what was going on. Most of what we use comes from that process.
Going back to threads, the nice thing about this is that threads become way more flexible. The compiler is free to do what it wants because we've come closer to saying what we want, not how we want it done. So the compiler can take advantage of multi-core and distributed processors, yet still compensate for platforms without good threading support.
There's one last thing I'd like to mention. And that is my view on templates. This was kind of a conceptual egg that began developing as I started programming (about 2 years ago, actually), then later began to crack open. Basically it was the principle of abstraction, but it stretched farther than classes and objects.
It had to do with how I perceived a function. For example:
int add (int a, int b)
{
    return a + b;
}
Okay, add returned an int, but what was it? It kind of felt like an int waiting to happen. Like a puzzle without a few pieces. There were limited possibilities, and only certain pieces fit, but when you were done you had a finished product that you could then use somewhere else. This is, like I said, the principle of abstraction. Here are some examples of what I think are abstraction + missing pieces -> concrete relationships:
function + arguments -> value
abstract class + methods -> class
class + instance values -> object
template + arguments -> function or class
program + input + state -> output
They're all closely related. It seems this could be taken advantage of. But how? Again, that's why this is a question. But lazy evaluation is interesting here, since you can pass something with its pieces still missing to something else. To the compiler it's mostly a matter of dereferencing names down to the impurities. Like my example from above:
(x ^ 2) + 3x + 5
(4 ^ 2) + 3x + 5 = 16 + 3x + 5 = 21 + 3x = 3(7 + x)
The more pieces you give the compiler, the more it can finish it and reduce the program to its essential core. And the add function above would be resolved at compile time, automatically, since it relies on no outside resources. Even lots of classes and objects could be resolved, and huge portions of programs, depending on how smart the compiler is.
That's all for now. If you've seen examples of these things already done, I'd like to see. And if you have any ideas, innovation, resources, or feedback, I'd appreciate that too.
You'd definitely want to take a look at the Haskell programming language.
Haskell is extremely declarative, lazy-evaluation is built-in and even functional reactive programming libraries exist. But most notably, Haskell is purely functional, i.e. everything, really everything, is pure.
So the question is how does Haskell deal with the necessary impurities that arise through any IO.
The answer fits quite well with the thoughts you presented. Haskell uses a mathematical construct called a monad, which basically represents a computation producing some value, together with a function bind (the infix operator >>=) that sequences such computations.
So let's take some IO example: read a line and greet the user by name. Even IO is pure, so you cannot simply run something. Instead, you build up a greater IO computation:
do
    putStr "Enter your name: "
    name <- getLine
    putStrLn ("Hello " ++ name)
Looks quite imperative, but under the hood, it's just syntax for
(putStr "Enter your name: ") >>
(getLine >>= \name ->
putStrLn ("Hello " ++ name))
Now you can define this bind/>>= for arbitrary kinds of computations in any way you like. So in fact everything you talked about can be implemented in this way - even FRP.
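To see how bind can be defined for your own kind of computation outside Haskell, here is a small TypeScript sketch of a Maybe-style computation (the names are invented for this example):

// A computation that may fail to produce a value.
type Maybe<T> = { kind: "just"; value: T } | { kind: "nothing" };

const just = <T>(value: T): Maybe<T> => ({ kind: "just", value });
const nothing = <T>(): Maybe<T> => ({ kind: "nothing" });

// bind (Haskell's >>=): sequence two computations, short-circuiting
// as soon as one of them produces no value.
function bind<A, B>(m: Maybe<A>, f: (a: A) => Maybe<B>): Maybe<B> {
    return m.kind === "just" ? f(m.value) : nothing<B>();
}

// Usage: parse two numbers and add them; any failure propagates.
const parse = (s: string): Maybe<number> => {
    const n = Number(s);
    return Number.isNaN(n) ? nothing<number>() : just(n);
};

const sum = bind(parse("2"), a => bind(parse("40"), b => just(a + b)));
// sum is { kind: "just", value: 42 }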
Just try searching for monads or Haskell here on Stack Overflow; there have been many questions on this topic.
And after all, everything is still type-checked, so correctness can be enforced by the compiler.

Why do a lot of programming languages put the type *after* the variable name?

I just came across this question in the Go FAQ, and it reminded me of something that's been bugging me for a while. Unfortunately, I don't really see what the answer is getting at.
It seems like almost every non C-like language puts the type after the variable name, like so:
var : int
Just out of sheer curiosity, why is this? Are there advantages to choosing one or the other?
There is a parsing issue, as Keith Randall says, but it isn't what he describes. The "not knowing whether it is a declaration or an expression" simply doesn't matter - you don't care whether it's an expression or a declaration until you've parsed the whole thing anyway, at which point the ambiguity is resolved.
Using a context-free parser, it doesn't matter in the slightest whether the type comes before or after the variable name. What matters is that you don't need to look up user-defined type names to understand the type specification - you don't need to have understood everything that came before in order to understand the current token.
Pascal syntax is context-free - if not completely, at least WRT this issue. The fact that the variable name comes first is less important than details such as the colon separator and the syntax of type descriptions.
C syntax is context-sensitive. In order for the parser to determine where a type description ends and which token is the variable name, it needs to have already interpreted everything that came before so that it can determine whether a given identifier token is the variable name or just another token contributing to the type description.
Because C syntax is context-sensitive, it is very difficult (if not impossible) to parse using traditional parser-generator tools such as yacc/bison, whereas Pascal syntax is easy to parse using the same tools. That said, there are parser generators now that can cope with C and even C++ syntax. Although it's not properly documented or in a 1.? release etc., my personal favorite is Kelbt, which uses backtracking LR and supports semantic "undo" - basically undoing additions to the symbol table when speculative parses turn out to be wrong.
In practice, C and C++ parsers are usually hand-written, mixing recursive descent and precedence parsing. I assume the same applies to Java and C#.
Incidentally, similar issues with context sensitivity in C++ parsing have created a lot of nasties. The "Alternative Function Syntax" for C++0x is working around a similar issue by moving a type specification to the end and placing it after a separator - very much like the Pascal colon for function return types. It doesn't get rid of the context sensitivity, but adopting that Pascal-like convention does make it a bit more manageable.
The "most other" languages you speak of are those that are more declarative. They aim to let you program more along the lines you think in (assuming you aren't boxed into imperative thinking).
Type-last reads as "create a variable called NAME of type TYPE".
This is of course the opposite of saying "create a TYPE called NAME", but when you think about it, what the value is for is more important than the type; the type is merely a programmatic constraint on the data.
If the name of the variable starts at column 0, it's easier to find the name of the variable.
Compare
QHash<QString, QPair<int, QString> > hash;
and
hash : QHash<QString, QPair<int, QString> >;
Now imagine how much more readable your typical C++ header could be.
In formal language theory and type theory, it's almost always written as var: type. For instance, in the typed lambda calculus you'll see proofs containing statements such as:
x : A    y : B
--------------
\x.y : A -> B
I don't think it really matters, but I think there are two justifications: one is that "x : A" is read "x is of type A", the other is that a type is like a set (e.g. int is the set of integers), and the notation is related to "x ∈ A".
Some of this stuff pre-dates the modern languages you're thinking of.
An increasing trend is to not state the type at all, or to state it optionally. This could be a dynamically typed language where there really is no type on the variable, or it could be a statically typed language which infers the type from the context.
If the type is sometimes given and sometimes inferred, then it's easier to read if the optional bit comes afterwards.
There are also trends related to whether a language regards itself as coming from the C school or the functional school or whatever, but these are a waste of time. The languages which improve on their predecessors and are worth learning are the ones that are willing to accept input from all different schools based on merit, not be picky about a feature's heritage.
"Those who cannot remember the past are condemned to repeat it."
Putting the type before the variable started innocuously enough with Fortran and Algol, but it got really ugly in C, where some type modifiers are applied before the variable, others after. That's why in C you have such beauties as
int (*p)[10];
or
void (*signal(int x, void (*f)(int)))(int)
together with a utility (cdecl) whose purpose is to decrypt such gibberish.
In Pascal, the type comes after the variable, so the first example becomes
p: pointer to array[10] of int
Contrast with
q: array[10] of pointer to int
which, in C, is
int *q[10]
In C, you need parentheses to distinguish this from int (*p)[10]. Parentheses are not required in Pascal, where only the order matters.
The signal function would be
signal: function(x: int, f: function(int) to void) to (function(int) to void)
Still a mouthful, but at least within the realm of human comprehension.
In fairness, the problem isn't that C put the types before the name, but that it perversely insists on putting bits and pieces before, and others after, the name.
But if you try to put everything before the name, the order is still unintuitive:
int [10] a // an int, ahem, ten of them, called a
int [10]* a // an int, no wait, ten, actually a pointer thereto, called a
So, the answer is: A sensibly designed programming language puts the variables before the types because the result is more readable for humans.
I'm not sure, but I think it's got to do with the "name vs. noun" concept.
Essentially, if you put the type first (such as "int varname"), you're declaring an "integer named 'varname'"; that is, you're giving an instance of a type a name. However, if you put the name first, and then the type (such as "varname : int"), you're saying "this is 'varname'; it's an integer". In the first case, you're giving an instance of something a name; in the second, you're defining a noun and stating that it's an instance of something.
It's a bit like if you were defining a table as a piece of furniture; saying "this is furniture and I call it 'table'" (type first) is different from saying "a table is a kind of furniture" (type last).
It's just how the language was designed. Visual Basic has always been this way.
Most (if not all) curly brace languages put the type first. This is more intuitive to me, as the same position also specifies the return type of a method. So the inputs go into the parenthesis, and the output goes out the back of the method name.
I always thought the way C does it was slightly peculiar: instead of constructing types, the user has to declare them implicitly. It's not just before/after the variable name; in general, you may need to embed the variable name among the type attributes (or, in some usage, to embed an empty space where the name would be if you were actually declaring one).
As a weak form of pattern-matching, it is intelligible to some extent, but it doesn't seem to provide any particular advantages either. And trying to write (or read) a function pointer type can easily take you beyond the point of ready intelligibility. So overall this aspect of C is a disadvantage, and I'm happy to see that Go has left it behind.
Putting the type first helps in parsing. For instance, in C, if you declared variables like
x int;
When you have parsed just the x, you don't yet know whether you are in a declaration or an expression. In contrast, with
int x;
When you parse the int, you know you're in a declaration (types always start a declaration of some sort).
Given progress in parsing languages, this slight help isn't terribly useful nowadays.
Fortran puts the type first:
REAL*4 I,J,K
INTEGER*4 A,B,C
And yes, there's a (very feeble) joke there for those familiar with Fortran.
There is room to argue that this is easier than C, which puts the type information around the name when the type is complex enough (pointers to functions, for example).
What about dynamically (cheers #wcoenen) typed languages? You just use the variable.

In Functional Programming, is it considered a bad practice to have incomplete pattern matchings

Is it generally considered a bad practice to use non-exhaustive pattern matchings in functional languages like Haskell or F#, meaning that the specified cases don't cover all possible inputs?
In particular, should I allow code to fail with a MatchFailureException etc. or should I always cover all cases and explicitly throw an error if necessary?
Example:
let head (x::xs) = x
Or
let head list =
    match list with
    | x::xs -> x
    | _ -> failwith "Applying head to an empty list"
F# (unlike Haskell) gives a warning for the first code, since the []-case is not covered, but can I ignore it without breaking functional style conventions for the sake of succinctness? A MatchFailure does state the problem quite well after all ...
If you complete your pattern-matchings with a constructor [] and not the catch-all _, the compiler will have a chance to tell you to look again at the function with a warning the day someone adds a third constructor to lists.
My colleagues and I, working on a large OCaml project (200,000+ lines), force ourselves to avoid partial pattern-matching warnings (even if that means writing | ... -> assert false from time to time) and to avoid so-called "fragile pattern-matchings" (pattern matchings written in such a way that the addition of a constructor may not be detected) too. We consider that the maintainability benefits.
Explicit is better than implicit (borrowed from the Zen of Python ;))
It's exactly the same as in a C switch over an enum... It's better to write all the cases (with a fall through) rather than just putting a default, because the compiler will tell you if you add new elements to the enumeration and you forgot to handle them.
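The same effect can be had in TypeScript with a discriminated union and a never check (a sketch, not F# or C): when a new case is added to the union, the compiler flags every match that doesn't handle it.

type Shape =
    | { kind: "circle"; radius: number }
    | { kind: "square"; side: number };

function area(s: Shape): number {
    switch (s.kind) {
        case "circle":
            return Math.PI * s.radius * s.radius;
        case "square":
            return s.side * s.side;
        default: {
            // If a third variant is added to Shape, `s` stops being
            // `never` here and this assignment becomes a compile error.
            const unreachable: never = s;
            return unreachable;
        }
    }
}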
I think that it depends quite a bit on the context. Are you trying to write robust, easy to debug code, or are you trying to write something simple and succinct?
If I were working on a long term project with multiple developers, I'd put in the assert to give a more useful error message. I also agree with Pascal's comment that not using a wildcard would be ideal from a software engineering perspective.
If I were working on a smaller scale project on which I was the only developer, I wouldn't think twice about using an incomplete match. If necessary, you can always check the compiler warnings.
I think it also depends a bit on the types you're matching against. Realistically, no extra union cases will be added to the list type, so you don't need to worry about fragile matching. On the other hand, in code that you control and are actively working on, there may well be types which are in flux and have additional union cases added, which means that protecting against fragile matching may be worth it.
This is a special case of a more general question, which is "should you ever create partial functions". Incomplete pattern matches are only one example of partial functions.
As a rule, total functions are preferable. When you find yourself looking at a function that just has to be partial, ask yourself if you can solve the problem in the type system first. Sometimes that is more trouble than it's worth (e.g. creating a whole type of lists with known lengths just to avoid the "head []" problem). So it's a trade-off.
Or maybe you're just asking whether it's good practice in partial functions to say things like
head [] = error "head: empty list"
In which case the answer is YES!
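To illustrate the trade-off in TypeScript terms (a sketch, not F#): a total head that pushes the empty case into the return type, next to a partial one that at least fails with a descriptive message.

// Total: callers are forced by the type to handle the empty case.
function headTotal<T>(xs: T[]): T | undefined {
    return xs.length > 0 ? xs[0] : undefined;
}

// Partial: simpler to call, but fails at run time on the empty list.
function headPartial<T>(xs: T[]): T {
    if (xs.length === 0) {
        throw new Error("head: empty list");
    }
    return xs[0];
}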
The Haskell prelude (standard functions) contains many partial functions, e.g. head and tail only work on non-empty lists, but don't ask me why.
This question has two aspects.
For the user of the API, failwith... simply throws a System.Exception, which is unspecific (and is therefore sometimes considered a bad practice in itself). On the other hand, the implicitly thrown MatchFailureException can be specifically caught using a type test pattern, and is therefore preferable.
For the reviewer of the implementation code, failwith... clearly documents that the implementer has at least given some thought to the possible cases, and is therefore preferable.
As the two aspects contradict each other, the right answer depends on the circumstances (see also kvb's answer). A solution which is 100% "correct" from any point of view would have to
deal with every case explicitly,
throw a specific exception where necessary, and
clearly document the exception
Example:
/// <summary>Gets the first element of the list.</summary>
/// <exception cref="ArgumentException">The list is empty.</exception>
let head list =
    match list with
    | [] -> invalidArg "list" "The list is empty."
    | x::xs -> x

Why is the 'if' statement considered evil?

I just came from the Simple Design and Testing Conference. In one of the sessions we were talking about evil keywords in programming languages. Corey Haines, who proposed the subject, was convinced that the if statement is absolutely evil. His alternative was to create functions with predicates. Can you please explain to me why if is evil?
I understand that you can write very ugly code abusing if. But I don't believe that it's that bad.
The if statement is rarely considered as "evil" as goto or mutable global variables -- and even the latter are actually not universally and absolutely evil. I would suggest taking the claim as a bit hyperbolic.
It also largely depends on your programming language and environment. In languages which support pattern matching, you will have great tools for replacing if at your disposal. But if you're programming a low-level microcontroller in C, replacing ifs with function pointers will be a step in the wrong direction. So, I will mostly consider replacing ifs in OOP programming, because in functional languages, if is not idiomatic anyway, while in purely procedural languages you don't have many other options to begin with.
Nevertheless, conditional clauses sometimes result in code which is harder to manage. This does not only include the if statement, but even more commonly the switch statement, which usually includes more branches than a corresponding if would.
There are cases where it's perfectly reasonable to use an if
When you are writing utility methods, extensions or specific library functions, it's likely that you won't be able to avoid ifs (and you shouldn't). There isn't a better way to code this little function, nor make it more self-documented than it is:
// this is a good "if" use-case
int Min(int a, int b)
{
    if (a < b)
        return a;
    else
        return b;
}

// or, if you prefer the ternary operator
int Min(int a, int b)
{
    return (a < b) ? a : b;
}
Branching over a "type code" is a code smell
On the other hand, if you encounter code which tests for some sort of a type code, or tests if a variable is of a certain type, then this is most likely a good candidate for refactoring, namely replacing the conditional with polymorphism.
The reason for this is that by allowing your callers to branch on a certain type code, you are creating the possibility of ending up with numerous checks scattered all over your code, making extension and maintenance much more complex. Polymorphism, on the other hand, allows you to bring this branching decision as close to the root of your program as possible.
Consider:
// this is called branching on a "type code",
// and screams for refactoring
void RunVehicle(Vehicle vehicle)
{
    // how the hell do I even test this?
    if (vehicle.Type == CAR)
        Drive(vehicle);
    else if (vehicle.Type == PLANE)
        Fly(vehicle);
    else
        Sail(vehicle);
}
By placing common but type-specific (i.e. class-specific) functionality into separate classes and exposing it through a virtual method (or an interface), you allow the internal parts of your program to delegate this decision to someone higher in the call hierarchy (potentially at a single place in code), allowing much easier testing (mocking), extensibility and maintenance:
// adding a new vehicle is gonna be a piece of cake
interface IVehicle
{
    void Run();
}

// your method now doesn't care about which vehicle
// it got as a parameter
void RunVehicle(IVehicle vehicle)
{
    vehicle.Run();
}
And you can now easily test if your RunVehicle method works as it should:
// you can now create test (mock) implementations
// since you're passing it as an interface
var mock = new Mock<IVehicle>();

// run the client method
something.RunVehicle(mock.Object);

// check if Run() was invoked
mock.Verify(m => m.Run(), Times.Once());
Patterns which only differ in their if conditions can be reused
Regarding the argument about replacing if with a "predicate" in your question, Haines probably wanted to say that sometimes similar patterns exist across your code which differ only in their conditional expressions. Conditional expressions do emerge in conjunction with ifs, but the whole idea is to extract a repeating pattern into a separate method, leaving the expression as a parameter. This is what LINQ already does, usually resulting in cleaner code compared to an alternative foreach:
Consider these two very similar methods:
// average male age
public double AverageMaleAge(List<Person> people)
{
    double sum = 0.0;
    int count = 0;
    foreach (var person in people)
    {
        if (person.Gender == Gender.Male)
        {
            sum += person.Age;
            count++;
        }
    }
    return sum / count; // not checking for zero div. for simplicity
}

// average female age
public double AverageFemaleAge(List<Person> people)
{
    double sum = 0.0;
    int count = 0;
    foreach (var person in people)
    {
        if (person.Gender == Gender.Female) // <-- only the expression
        {                                   //     is different
            sum += person.Age;
            count++;
        }
    }
    return sum / count;
}
This indicates that you can extract the condition into a predicate, leaving you with a single method for these two cases (and many other future cases):
// average age for all people matched by the predicate
public double AverageAge(List<Person> people, Predicate<Person> match)
{
    double sum = 0.0;
    int count = 0;
    foreach (var person in people)
    {
        if (match(person)) // <-- the decision to match
        {                  //     is now delegated to callers
            sum += person.Age;
            count++;
        }
    }
    return sum / count;
}

var males = AverageAge(people, p => p.Gender == Gender.Male);
var females = AverageAge(people, p => p.Gender == Gender.Female);
And since LINQ already has a bunch of handy extension methods like this, you actually don't even need to write your own methods:
// replace everything we've written above with these two lines
var males = list.Where(p => p.Gender == Gender.Male).Average(p => p.Age);
var females = list.Where(p => p.Gender == Gender.Female).Average(p => p.Age);
In this last LINQ version the if statement has "disappeared" completely, although:
to be honest the problem wasn't in the if by itself, but in the entire code pattern (simply because it was duplicated), and
the if still actually exists, but it's written inside the LINQ Where extension method, which has been tested and closed for modification. Having less of your own code is always a good thing: less things to test, less things to go wrong, and the code is simpler to follow, analyze and maintain.
Huge runs of nested if/else statements
When you see a function spanning 1000 lines and having dozens of nested if blocks, there is an enormous chance it can be rewritten to
use a better data structure and organize the input data in a more appropriate manner (e.g. a hashtable, which will map one input value to another in a single call; see the sketch after this list),
use a formula, a loop, or sometimes just an existing function which performs the same logic in 10 lines or less (e.g. this notorious example comes to my mind, but the general idea applies to other cases),
use guard clauses to prevent nesting (guard clauses give more confidence into the state of variables throughout the function, because they get rid of exceptional cases as soon as possible),
at least replace with a switch statement where appropriate.
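As a sketch of the first item (TypeScript; the table contents are invented), a chain of ifs mapping one value to another collapses into a data structure and a single lookup:

// Instead of:
//   if (code === "US") return "USD";
//   else if (code === "DE") return "EUR";
//   else if (code === "JP") return "JPY"; ...
const currencyByCountry: Record<string, string> = {
    US: "USD",
    DE: "EUR",
    JP: "JPY",
};

function currencyFor(code: string): string | undefined {
    return currencyByCountry[code];
}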
Refactor when you feel it's a code smell, but don't over-engineer
Having said all this, you should not spend sleepless nights over having a couple of conditionals here and there. While these answers can provide some general rules of thumb, the best way to learn to detect constructs which need refactoring is through experience. Over time, some patterns emerge that result in modifying the same clauses over and over again.
There is another sense in which if can be evil: when it comes instead of polymorphism.
E.g.
if (animal.isFrog()) croak(animal)
else if (animal.isDog()) bark(animal)
else if (animal.isLion()) roar(animal)
instead of
animal.emitSound()
But basically if is a perfectly acceptable tool for what it does. It can be abused and misused of course, but it is nowhere near the status of goto.
A good quote from Code Complete:
Code as if whoever maintains your program is a violent psychopath who
knows where you live.
— Anonymous
IOW, keep it simple. If the readability of your application will be enhanced by using a predicate in a particular area, use it. Otherwise, use the 'if' and move on.
I think it depends on what you're doing to be honest.
If you have a simple if..else statement, why use a predicate?
If you can, use a switch for larger if replacements, and if you have the option to use a predicate for large operations, use it where it makes sense; otherwise your code will be a nightmare to maintain.
This guy seems to have been a bit pedantic for my liking. Replacing all ifs with predicates is just crazy talk.
There is the Anti-If campaign which started earlier in the year. The main premise is that many nested if statements can often be replaced with polymorphism.
I would be interested to see an example of using the Predicate instead. Is this more along the lines of functional programming?
Just like in the bible verse about money, if statements are not evil -- the LOVE of if statements is evil. A program without if statements is a ridiculous idea, and using them as necessary is essential. But a program that has 100 if-else if blocks in a row (which, sadly, I have seen) is definitely evil.
I have to say that I recently have begun to view if statements as a code smell: especially when you find yourself repeating the same condition several times. But there's something you need to understand about code smells: they don't necessarily mean that the code is bad. They just mean that there's a good chance the code is bad.
For instance, comments are listed as a code smell by Martin Fowler, but I wouldn't take anyone seriously who says "comments are evil; don't use them".
Generally though, I prefer to use polymorphism instead of if statements where possible. That just makes for so much less room for error. I tend to find that a lot of the time, using conditionals leads to a lot of tramp arguments as well (because you have to pass the data needed to form the conditional on to the appropriate method).
if is not evil. (I also hold that assigning morality to code-writing practices is asinine...)
Mr. Haines is being silly and should be laughed at.
I'll agree with you; he was wrong. You can go too far with things like that, too clever for your own good.
Code created with predicates instead of ifs would be horrendous to maintain and test.
Predicates come from logical/declarative programming languages, like PROLOG. For certain classes of problems, like constraint solving, they are arguably superior to a lot of drawn out step-by-step if-this-do-that-then-do-this crap. Problems that would be long and complex to solve in imperative languages can be done in just a few lines in PROLOG.
There's also the issue of scalable programming (due to the move towards multicore, the web, etc.). If statements and imperative programming in general tend to be step-by-step, and not scalable. Logical declarations and lambda calculus, though, describe how a problem can be solved and what pieces it can be broken down into. As a result, the interpreter/processor executing that code can efficiently break the code into pieces and distribute it across multiple CPUs/cores/threads/servers.
Definitely not useful everywhere; I'd hate to try writing a device driver with predicates instead of if statements. But yes, I think the main point is probably sound, and worth at least getting familiar with, if not using all the time.
The only problem with predicates (in terms of replacing if statements) is that you still need to test them:
void Test(Predicate<int> pr, int num)
{
    if (pr(num))
    { /* do something */ }
    else
    { /* do something else */ }
}
You could of course use the ternary operator (?:), but that's just an if statement in disguise...
Perhaps with quantum computing it will be a sensible strategy to not use IF statements but to let each leg of the computation proceed and only have the function 'collapse' at termination to a useful result.
Sometimes it's necessary to take an extreme position to make your point. I'm sure this person uses if -- but every time you use an if, it's worth having a little think about whether a different pattern would make the code clearer.
Preferring polymorphism to if is at the core of this. Rather than:
if (animaltype == bird) {
    squawk();
} else if (animaltype == dog) {
    bark();
}
... use:
animal.makeSound();
But that supposes that you've got an Animal class/interface -- so really what the if is telling you, is that you need to create that interface.
So in the real world, what sort of ifs do we see that lead us to a polymorphism solution?
if (logging) {
    log.write("Did something");
}
That's really irritating to see throughout your code. How about, instead, having two (or more) implementations of Logger?
this.logger = new NullLogger(); // logger.log() does nothing
this.logger = new StdOutLogger(); // logger.log() writes to stdout
That leads us to the Strategy Pattern.
Instead of:
if (user.getCreditRisk() > 50) {
    decision = thoroughCreditCheck();
} else if (user.getCreditRisk() > 20) {
    decision = mediumCreditCheck();
} else {
    decision = cursoryCreditCheck();
}
... you could have ...
decision = getCreditCheckStrategy(user.getCreditRisk()).decide();
Of course getCreditCheckStrategy() might contain an if -- and that might well be appropriate. You've pushed it into a neat place where it belongs.
It probably comes down to a desire to keep cyclomatic complexity down and to reduce the number of branch points in a function. If a function is simple to decompose into a number of smaller functions, each of which can be tested, you can reduce the complexity and make the code more easily testable.
IMO:
I suspect he was trying to provoke a debate and make people think about the misuse of 'if'. No one would seriously suggest such a fundamental construction of programming syntax was to be completely avoided would they?
Good that in ruby we have unless ;)
But seriously: if is probably the next goto. Even though most people think it is evil in some cases, it simplifies and speeds things up (and in some cases, such as low-level, highly optimized code, it's a must).
I think If statements are evil, but If expressions are not. What I mean by an if expression in this case can be something like the C# ternary operator (condition ? trueExpression : falseExpression). This is not evil because it is a pure function (in a mathematical sense). It evaluates to a new value, but it has no effects on anything else. Because of this, it works in a substitution model.
Imperative If statements are evil because they force you to create side-effects when you don't need to. For an If statement to be meaningful, you have to produce different "effects" depending on the condition expression. These effects can be things like IO, graphic rendering or database transactions, which change things outside of the program. Or, it could be assignment statements that mutate the state of the existing variables. It is usually better to minimize these effects and separate them from the actual logic. But, because of the If statements, we can freely add these "conditionally executed effects" everywhere in the code. I think that's bad.
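A small TypeScript illustration of the distinction:

// Expression form: evaluates to a value; no side effects needed.
const label = (n: number): string => (n >= 0 ? "non-negative" : "negative");

// Statement form: to mean anything, each branch must *do* something,
// e.g. mutate state or perform IO.
let message = "";
if (Math.random() > 0.5) {
    message = "heads";
} else {
    message = "tails";
}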
If is not evil! Consider ...
int sum(int a, int b) {
    return a + b;
}
Boring, eh? Now with an added if ...
int sum(int a, int b) {
    if (a == 0 && b == 0) {
        return 0;
    }
    return a + b;
}
... your code creation productivity (measured in LOC) is doubled.
Also, code readability has improved a lot, for now you can see in the blink of an eye what the result is when both arguments are zero. You couldn't do that with the code above, could you?
Moreover, you've supported the test team, for now they can push their code coverage tools closer to the limits.
Furthermore, the code is now better prepared for future enhancements. Let's say, for example, that the sum should be zero if one of the arguments is zero (don't laugh and don't blame me; silly customer requirements, you know, and the customer is always right).
Because the if was there in the first place, only a slight code change is needed.
int sum(int a, int b) {
    if (a == 0 || b == 0) {
        return 0;
    }
    return a + b;
}
How much more code change would have been needed if you hadn't invented the if right from the start?
Thankfulness will be yours on all sides.
Conclusion: There's never enough if's.
There you go.
