Groovy for loop execution time - groovy

O Groovy Gurus,
This code snippet runs in around 1 second
for (int i in (1..10000000)) {
j = i;
}
while this one takes almost 9 second
for (int i = 1; i < 10000000; i++) {
j = i;
}
Why is it so?

Ok. Here is my take on why?
If you convert both scripts to bytecode, you will notice that
ForInLoop uses Range. Iterator is used to advance during each loop. Comparison (<) is made directly to int (or Integer) to determine whether the exit condition has been met or not
ForLoop uses traditional increment, check condition, and perform action. For checking condition i < 10000000 it uses Groovy's ScriptBytecodeAdapter.compareLessThan. If you dig deep into that method's code, you will find both sides of comparison is taken in as Object and there are so many things going on, casting, comparing them as object, etc.
ScriptBytecodeAdapter.compareLessThan --> ScriptBytecodeAdapter.compareTo --> DefaultTypeTransformation.compareTo
There are other classes in typehandling package which implements compareTo method specifically for math data types, not sure why they are not being used, (if they are not being used)
I am suspecting that is the reason second loop is taking longer.
Again, please correct me if I am wrong or missing something...

In your testing, be sure to "warm" the JVM up before taking the measure, otherwise you may wind up triggering various startup actions in the platform (class loading, JIT compilation). Run your tests many times in a row too. Also, if you did the second test while a garbage collect was going on, that might have an impact. Try running each of your tests 100 times and print out the times after each test, and see what that tells you.

If you can eliminate potential artifacts from startup time as Jim suggests, then I'd hazard a guess that the Java-style for loop in Groovy is not so well implemented as the original Groovy-style for loop. It was only added as of v1.5 after user requests, so perhaps its implementation was a bit of an afterthought.
Have you taken a look at the bytecode generated for your two examples to see if there are any differences? There was a discussion about Groovy performance here in which one of the comments (from one 'johnchase') says this:
I wonder if the difference you saw related to how Groovy uses numbers (primitives) - since it wraps all primitives in their equivalent Java wrapper classes (int -> Integer), I’d imagine that would slow things down quite a bit. I’d be interested in seeing the performance of Java code that loops 10,000,000 using the wrapper classes instead of ints.
So perhaps the original Groovy for loop does not suffer from this? Just speculation on my part really though.

Related

How would I know if Python creates a new sublist in memory for: `for item in nums[1:]`

I'm not asking for an answer to the question, but rather how I, on my own, could have gotten the answer.
Original Question:
Does the following code cause Python to make a new list of size (len(nums) - 1) in memory that then gets iterated over?
for item in nums[1:]:
# do stuff with item
Original Answer
A similarish question is asked here and there is a subcomment by Srinivas Reddy Thatiparthy saying that a new sublist is created.
But, there is no detail given about how he arrived at this answer, which I think makes it very different from what I'm looking for.
Question:
How could I have figured out on my own what the answer to my question is?
I've had similar questions before. For instance, I learned that if I do my_function(nums[1:]), I don't pass in a "slice" but rather a completely new, different sublist! I found this out by just testing whether the original list passed into my_function was modified post-function (it wasn't).
But I don't see an immediate way to figure out if Python is making a new sublist for the for loop example. Please help me to know how to do this.
side note
By the way, this is the current solution I'm using from the original stackoverflow post solutions:
for indx, item in enumerate(nums):
if indx == 0:
continue
# do stuff w items
In general, the easy way to learn if you have a new chunk of data or just a new reference to an existing chunk of data is to modify the data through one reference, and then see if it is also modified through the other. (It sounds like that's "the hard way" you did, but I would recommend it as a general technique.) Some psuedocode would look like:
function areSameRef(thing1, thing2){
thing1.modify()
return thing1.equals(thing2) //make sure this is not just a referential equality check
}
It is very rare that this will fail, and essentially requires behind-the-scenes optimizations where data isn't cloned immediately but only when modified. In this case the fact that the underlying data is the same is being hidden from you, and in most cases, you should just trust that whoever did the hiding knows what they're doing. Exceptions are if they did it wrong, or if you have some complex performance issues. For that you may need to turn to more language-specific debugging or profiling tools. (See below for more)
Do also be careful about cases where part of the data may be shared - for instance, look up cons lists and tail sharing. In those cases if you do something like:
function foo(list1, list2){
list1.append(someElement)
return list1.length == list2.length
}
will return false - the element is only added to the first list, but something like
function bar(list1, list2){
list1.set(someIndex, someElement)
return list1.get(someIndex)==list2.get(someIndex)
}
will return true (though in practice, lists created that way usually don't have an interface that allows mutability.)
I don't see a question in part 2, but yes, your conclusion looks valid to me.
EDIT: More on actual memory usage
As you pointed out, there are situations where that sort of test won't work because you don't actually have two references, as in the for i in [nums 1:] case. In that case I would say turn to a profiler, but you couldn't really trust the results.
The reason for that comes down to how compilers/interpreters work, and the contract they fulfill in the language specification. The general rule is that the interpreter is allowed to re-arrange and modify the execution of your code in any way that does not change the results, but may change the memory or time performance. So, if the state of your code and all the I/O are the same, it should not be possible for foo(5) to return 6 in one interpreter implementation/execution and 7 in another, but it is valid for them to take very different amounts of time and memory.
This matters because a lot of what interpreters and compilers do is behind-the-scenes optimizations; they will try to make your code run as fast as possible and with as small a memory footprint as possible, so long as the results are the same. However, it can only do so when it can prove that the changes will not modify the results.
This means that if you write a simple test case, the interpreter may optimize it behind the scenes to minimize the memory usage and give you one result - "no new list is created." But, if you try to trust that result in real code, the real code may be too complex for the compiler to tell if the optimization is safe, and it may fail. It can also depend upon the specific interpreter version, environmental variables or available hardware resources.
Here's an example:
def foo(x : int):
l = range(9999)
return 5
def bar(x:int):
l = range(9999)
if (x + 1 != (x*2+2)/2):
return l[x]
else:
return 5
I can't promise this for any particular language, but I would usually expect foo and bar to have much different memory usages. In foo, any moderately-well-created interpreter should be able to tell that l is never referenced before it goes out of scope, and thus can freely skip actually allocating any memory at all as a safe operation. In bar (unless I failed at arithmetic), l will never be used either - but knowing that requires some reasoning about the condition of the if statement. It takes a much smarter interpreter to recognize that, so even though these two code snippets might look the same logically, they can have very different behind-the-scenes performances.
EDIT: As has been pointed out to my, Python specifically may not be able to optimize either of these, given the dynamic nature of the language; the range function and the list type may both have been re-assigned or altered from elsewhere in the code. Without specific expertise in the python optimization world I can't say what they do or don't do. Anyway I'm leaving this here for edification on the general concept of optimizations, but take my error as a case lesson in "reasoning about optimization is hard".
All of that being said: FWIW, I strongly suspect that the python interpreter is smart enough to recognize that for i in nums[1:] should not actually allocate new memory, but just iterate over a slice. That looks to my eyes to be a relatively simple, safe and valuable transformation on a very common use case, so I would expect the (highly optimized) python interpreter to handle it.
EDIT2: As a final (opinionated) note, I'm less confident about that in Python than I am in almost any other language, because Python syntax is so flexible and allows so many strange things. This makes it much more difficult for the python interpreter (or a human, for that matter) to say anything with confidence, because the space of "legal python code" is so large. This is a big part of why I prefer much stricter languages like Rust, which force the programmer to color inside the lines but result in much more predictable behaviors.
EDIT3: As a post-final note, usually for things like this it's best to trust that the execution environment is handling these sorts of low-level optimizations. Nine times out of ten, don't try to solve this kind of performance problem until something actually breaks.
As for knowing how list slice works, from the language reference Sequence Types — list, tuple, range, we know that
s[i:j] - The slice of s from i to j is defined as the sequence of
items with index k such that i <= k < j.
So, the slice creates a new sequence but we don't know whether that sequence is a list or whether there is some clever way that the same list object somehow represents both of these sequences. That's not too surprising with the python language spec where lists are described as part of the general discussion of sequences and the spec never really tries to cover all of the details for object implementation.
That's because in the end, something like nums[1:] is really just syntactic sugar for nums.__getitem__(slice(1, None)), meaning that lists get to decide for themselves what slicing means. And you need to go to the source for the implementation. See the list_subscript function in listobject.c.
But we can experiment. Looking at the doucmentation for The for statement,
for_stmt ::= "for" target_list "in" starred_list ":" suite
["else" ":" suite]
The starred_list expression is evaluated once; it should yield an iterable object.
So, nums[1:] is an expression that must yield an iterable object and we can assign that object to an intermediate variable.
nums = [1 ,2, 3]
tmp = nums[1:]
for item in tmp:
pass
tmp[0] = "new stuff"
assert id(nums) != id(tmp), "List slice creates a new object"
assert type(tmp) == type(nums), "List slice creates a new list"
assert 999 not in nums, "List slice doesn't affect original"
Run that, and if neither assertion error is raised, you know that a new list was created.
Other sequence-like objects may work radically different. In a numpy array, for instance, two array objects may indeed reference the same memory. In this example, that final assert will be raised because the slice is another view into the same array. Yes, this can keep you up all night.
import numpy as np
nums = np.array([1,2,3])
tmp = nums[1:]
for item in tmp:
pass
tmp[0] = 999
assert id(nums) != id(tmp), "array slice creates a new object"
assert type(tmp) == type(nums), "array slice creates a new list"
assert 999 not in nums, "array slice doesn't affect original"
You can use the new Walrus operator := to capture the temporary object created by Python for the slice. A little investigation demonstrates that they aren't the same object.
import sys
print(sys.version)
a = list(range(1000))
for i in (b := a[1:]):
b[0] = 906
print(b is a)
print(a[:10])
print(b[:10])
print(sys.getsizeof(a))
print(sys.getsizeof(b))
Generates the following output:
3.11.0 (main, Nov 4 2022, 00:14:47) [GCC 7.5.0]
False
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[906, 2, 3, 4, 5, 6, 7, 8, 9, 10]
8056
8048
See for yourself on the Godbolt Compiler Explorer where you can also see the compiler generated code.

Why would more array accesses perform better?

I'm taking a course on coursera that uses minizinc. In one of the assignments, I was spinning my wheels forever because my model was not performing well enough on a hidden test case. I finally solved it by changing the following types of accesses in my model
from
constraint sum(neg1,neg2 in party where neg1 < neg2)(joint[neg1,neg2]) >= m;
to
constraint sum(i,j in 1..u where i < j)(joint[party[i],party[j]]) >= m;
I dont know what I'm missing, but why would these two perform any differently from eachother? It seems like they should perform similarly with the former being maybe slightly faster, but the performance difference was dramatic. I'm guessing there is some sort of optimization that the former misses out on? Or, am I really missing something and do those lines actually result in different behavior? My intention is to sum the strength of every element in raid.
Misc. Details:
party is an array of enum vars
party's index set is 1..real_u
every element in party should be unique except for a dummy variable.
solver was Gecode
verification of my model was done on a coursera server so I don't know what optimization level their compiler used.
edit: Since minizinc(mz) is a declarative language, I'm realizing that "array accesses" in mz don't necessarily have a direct corollary in an imperative language. However, to me, these two lines mean the same thing semantically. So I guess my question is more "Why are the above lines different semantically in mz?"
edit2: I had to change the example in question, I was toting the line of violating coursera's honor code.
The difference stems from the way in which the where-clause "a < b" is evaluated. When "a" and "b" are parameters, then the compiler can already exclude the irrelevant parts of the sum during compilation. If "a" or "b" is a variable, then this can usually not be decided during compile time and the solver will receive a more complex constraint.
In this case the solver would have gotten a sum over "array[int] of var opt int", meaning that some variables in an array might not actually be present. For most solvers this is rewritten to a sum where every variable is multiplied by a boolean variable, which is true iff the variable is present. You can understand how this is less efficient than an normal sum without multiplications.

Using a Capture like a Slurpy

I've been reading about Captures and this paragraph intrigued me:
Inside a Signature, a Capture may be created by prefixing a sigilless
parameter with a vertical bar |. This packs the remainder of the
argument list into that parameter.
This sounds a lot like a **# (non flattening) slurpy, so this concocted this test code:
my $limit=1_000_000;
my #a=1 xx 10000;
my #b=-1 xx 10000;
sub test1(|c){
1;
};
sub test2(**#c){
1;
};
{
for ^$limit {
test1(#b,#a);
}
say now - ENTER now;
}
{
for ^$limit {
test2(#b,#a);
}
say now - ENTER now;
}
A sample run gives durations of each test block:
0.82560328
2.6650674
The Capture certainly seems to have the performance advantage. Is there a down side to using a Capture as a slurpy in this fashion? Have I over simplified the comparison?
A Capture has two slots, holding a VM-level array (positional arguments) and hash (named arguments). It is quite cheaply constructed, and - since |c style arguments are quite common in various bits of the internals - has been well optimized. Since a capture parameter slurps up both positional and named arguments, any named arguments will be silently ignored. That's probably not much of an issue for methods, where unrequired named arguments will be silently placed into %_ anyway, but might be a consideration if using this construct on a sub, since it's not a pure optimization: it changes behavior.
The **#c case allocates an Array, and then allocates a Scalar container for each of the passed values, placing them into the Scalar containers and those Scalar containers into the Array. That's a reasonable amount of extra work.
There's another case not considered here, which is this one:
sub test3(**#c is raw){
1;
}
That places a List in #c, and sets its elements to refer directly to the things that were passed. This is a bit cheaper than the case without is raw. In theory, it could probably perform as well as - if not better than - a capture parameter like |c; it probably just needs somebody working on the compiler to have a dig into why it doesn't yet.
In summary, if not caring about enforcing itemization and/or having a mutable Array of incoming arguments, then adding is raw is probably a better optimization bet than picking a capture parameter: the argument processing semantics are closer, it's already a bit faster, will allow for more natural code, and has future potential to be every bit as fast as, if not faster, than |c.

partial functions vs input verification

I really love using total functions. That said, sometimes I'm not sure what the best approach is for guaranteeing that. Lets say that I'm writing a function similar to chunksOf from the split package, where I want to split up a list into sublists of a given size. Now I'd really rather say that the input for sublist size needs to be a positive int (so excluding 0). As I see it I have several options:
1) all-out: make a newtype for PositiveInt, hide the constructor, and only expose safe functions for creating a PositiveInt (perhaps returning a Maybe or some union of Positive | Negative | Zero or what have you). This seems like it could be a huge hassle.
2) what the split package does: just return an infinite list of size-0 sublists if the size <= 0. This seems like you risk bugs not getting caught, and worse: those bugs just infinitely hanging your program with no indication of what went wrong.
3) what most other languages do: error when the input is <= 0. I really prefer total functions though...
4) return an Either or Maybe to cover the case that the input might have been <= 0. Similar to #1, it seems like using this could just be a hassle.
This seems similar to this post, but this has more to do with error conditions than just being as precise about types as possible. I'm looking for thoughts on how to decide what the best approach for a case like this is. I'm probably most inclined towards doing #1, and just dealing with the added overhead, but I'm concerned that I'll be kicking myself down the road. Is this a decision that needs to be made on a case-by-case basis, or is there a general strategy that consistently works best?

Why is the 'if' statement considered evil?

I just came from Simple Design and Testing Conference. In one of the session we were talking about evil keywords in programming languages. Corey Haines, who proposed the subject, was convinced that if statement is absolute evil. His alternative was to create functions with predicates. Can you please explain to me why if is evil.
I understand that you can write very ugly code abusing if. But I don't believe that it's that bad.
The if statement is rarely considered as "evil" as goto or mutable global variables -- and even the latter are actually not universally and absolutely evil. I would suggest taking the claim as a bit hyperbolic.
It also largely depends on your programming language and environment. In languages which support pattern matching, you will have great tools for replacing if at your disposal. But if you're programming a low-level microcontroller in C, replacing ifs with function pointers will be a step in the wrong direction. So, I will mostly consider replacing ifs in OOP programming, because in functional languages, if is not idiomatic anyway, while in purely procedural languages you don't have many other options to begin with.
Nevertheless, conditional clauses sometimes result in code which is harder to manage. This does not only include the if statement, but even more commonly the switch statement, which usually includes more branches than a corresponding if would.
There are cases where it's perfectly reasonable to use an if
When you are writing utility methods, extensions or specific library functions, it's likely that you won't be able to avoid ifs (and you shouldn't). There isn't a better way to code this little function, nor make it more self-documented than it is:
// this is a good "if" use-case
int Min(int a, int b)
{
if (a < b)
return a;
else
return b;
}
// or, if you prefer the ternary operator
int Min(int a, int b)
{
return (a < b) ? a : b;
}
Branching over a "type code" is a code smell
On the other hand, if you encounter code which tests for some sort of a type code, or tests if a variable is of a certain type, then this is most likely a good candidate for refactoring, namely replacing the conditional with polymorphism.
The reason for this is that by allowing your callers to branch on a certain type code, you are creating a possibility to end up with numerous checks scattered all over your code, making extensions and maintenance much more complex. Polymorphism on the other hand allows you to bring this branching decision as closer to the root of your program as possible.
Consider:
// this is called branching on a "type code",
// and screams for refactoring
void RunVehicle(Vehicle vehicle)
{
// how the hell do I even test this?
if (vehicle.Type == CAR)
Drive(vehicle);
else if (vehicle.Type == PLANE)
Fly(vehicle);
else
Sail(vehicle);
}
By placing common but type-specific (i.e. class-specific) functionality into separate classes and exposing it through a virtual method (or an interface), you allow the internal parts of your program to delegate this decision to someone higher in the call hierarchy (potentially at a single place in code), allowing much easier testing (mocking), extensibility and maintenance:
// adding a new vehicle is gonna be a piece of cake
interface IVehicle
{
void Run();
}
// your method now doesn't care about which vehicle
// it got as a parameter
void RunVehicle(IVehicle vehicle)
{
vehicle.Run();
}
And you can now easily test if your RunVehicle method works as it should:
// you can now create test (mock) implementations
// since you're passing it as an interface
var mock = new Mock<IVehicle>();
// run the client method
something.RunVehicle(mock.Object);
// check if Run() was invoked
mock.Verify(m => m.Run(), Times.Once());
Patterns which only differ in their if conditions can be reused
Regarding the argument about replacing if with a "predicate" in your question, Haines probably wanted to mention that sometimes similar patterns exist over your code, which differ only in their conditional expressions. Conditional expressions do emerge in conjunction with ifs, but the whole idea is to extract a repeating pattern into a separate method, leaving the expression as a parameter. This is what LINQ already does, usually resulting in cleaner code compared to an alternative foreach:
Consider these two very similar methods:
// average male age
public double AverageMaleAge(List<Person> people)
{
double sum = 0.0;
int count = 0;
foreach (var person in people)
{
if (person.Gender == Gender.Male)
{
sum += person.Age;
count++;
}
}
return sum / count; // not checking for zero div. for simplicity
}
// average female age
public double AverageFemaleAge(List<Person> people)
{
double sum = 0.0;
int count = 0;
foreach (var person in people)
{
if (person.Gender == Gender.Female) // <-- only the expression
{ // is different
sum += person.Age;
count++;
}
}
return sum / count;
}
This indicates that you can extract the condition into a predicate, leaving you with a single method for these two cases (and many other future cases):
// average age for all people matched by the predicate
public double AverageAge(List<Person> people, Predicate<Person> match)
{
double sum = 0.0;
int count = 0;
foreach (var person in people)
{
if (match(person)) // <-- the decision to match
{ // is now delegated to callers
sum += person.Age;
count++;
}
}
return sum / count;
}
var males = AverageAge(people, p => p.Gender == Gender.Male);
var females = AverageAge(people, p => p.Gender == Gender.Female);
And since LINQ already has a bunch of handy extension methods like this, you actually don't even need to write your own methods:
// replace everything we've written above with these two lines
var males = list.Where(p => p.Gender == Gender.Male).Average(p => p.Age);
var females = list.Where(p => p.Gender == Gender.Female).Average(p => p.Age);
In this last LINQ version the if statement has "disappeared" completely, although:
to be honest the problem wasn't in the if by itself, but in the entire code pattern (simply because it was duplicated), and
the if still actually exists, but it's written inside the LINQ Where extension method, which has been tested and closed for modification. Having less of your own code is always a good thing: less things to test, less things to go wrong, and the code is simpler to follow, analyze and maintain.
Huge runs of nested if/else statements
When you see a function spanning 1000 lines and having dozens of nested if blocks, there is an enormous chance it can be rewritten to
use a better data structure and organize the input data in a more appropriate manner (e.g. a hashtable, which will map one input value to another in a single call),
use a formula, a loop, or sometimes just an existing function which performs the same logic in 10 lines or less (e.g. this notorious example comes to my mind, but the general idea applies to other cases),
use guard clauses to prevent nesting (guard clauses give more confidence into the state of variables throughout the function, because they get rid of exceptional cases as soon as possible),
at least replace with a switch statement where appropriate.
Refactor when you feel it's a code smell, but don't over-engineer
Having said all this, you should not spend sleepless nights over having a couple of conditionals now and there. While these answers can provide some general rules of thumb, the best way to be able to detect constructs which need refactoring is through experience. Over time, some patterns emerge that result in modifying the same clauses over and over again.
There is another sense in which if can be evil: when it comes instead of polymorphism.
E.g.
if (animal.isFrog()) croak(animal)
else if (animal.isDog()) bark(animal)
else if (animal.isLion()) roar(animal)
instead of
animal.emitSound()
But basically if is a perfectly acceptable tool for what it does. It can be abused and misused of course, but it is nowhere near the status of goto.
A good quote from Code Complete:
Code as if whoever maintains your program is a violent psychopath who
knows where you live.
— Anonymous
IOW, keep it simple. If the readability of your application will be enhanced by using a predicate in a particular area, use it. Otherwise, use the 'if' and move on.
I think it depends on what you're doing to be honest.
If you have a simple if..else statement, why use a predicate?
If you can, use a switch for larger if replacements, and then if the option to use a predicate for large operations (where it makes sense, otherwise your code will be a nightmare to maintain), use it.
This guy seems to have been a bit pedantic for my liking. Replacing all if's with Predicates is just crazy talk.
There is the Anti-If campaign which started earlier in the year. The main premise being that many nested if statements often can often be replaced with polymorphism.
I would be interested to see an example of using the Predicate instead. Is this more along the lines of functional programming?
Just like in the bible verse about money, if statements are not evil -- the LOVE of if statements is evil. A program without if statements is a ridiculous idea, and using them as necessary is essential. But a program that has 100 if-else if blocks in a row (which, sadly, I have seen) is definitely evil.
I have to say that I recently have begun to view if statements as a code smell: especially when you find yourself repeating the same condition several times. But there's something you need to understand about code smells: they don't necessarily mean that the code is bad. They just mean that there's a good chance the code is bad.
For instance, comments are listed as a code smell by Martin Fowler, but I wouldn't take anyone seriously who says "comments are evil; don't use them".
Generally though, I prefer to use polymorphism instead of if statements where possible. That just makes for so much less room for error. I tend to find that a lot of the time, using conditionals leads to a lot of tramp arguments as well (because you have to pass the data needed to form the conditional on to the appropriate method).
if is not evil(I also hold that assigning morality to code-writing practices is asinine...).
Mr. Haines is being silly and should be laughed at.
I'll agree with you; he was wrong. You can go too far with things like that, too clever for your own good.
Code created with predicates instead of ifs would be horrendous to maintain and test.
Predicates come from logical/declarative programming languages, like PROLOG. For certain classes of problems, like constraint solving, they are arguably superior to a lot of drawn out step-by-step if-this-do-that-then-do-this crap. Problems that would be long and complex to solve in imperative languages can be done in just a few lines in PROLOG.
There's also the issue of scalable programming (due to the move towards multicore, the web, etc.). If statements and imperative programming in general tend to be in step-by-step order, and not scaleable. Logical declarations and lambda calculus though, describe how a problem can be solved, and what pieces it can be broken down into. As a result, the interpreter/processor executing that code can efficiently break the code into pieces, and distribute it across multiple CPUs/cores/threads/servers.
Definitely not useful everywhere; I'd hate to try writing a device driver with predicates instead of if statements. But yes, I think the main point is probably sound, and worth at least getting familiar with, if not using all the time.
The only problem with a predicates (in terms of replacing if statements) is that you still need to test them:
function void Test(Predicate<int> pr, int num)
{
if (pr(num))
{ /* do something */ }
else
{ /* do something else */ }
}
You could of course use the terniary operator (?:), but that's just an if statement in disguise...
Perhaps with quantum computing it will be a sensible strategy to not use IF statements but to let each leg of the computation proceed and only have the function 'collapse' at termination to a useful result.
Sometimes it's necessary to take an extreme position to make your point. I'm sure this person uses if -- but every time you use an if, it's worth having a little think about whether a different pattern would make the code clearer.
Preferring polymorphism to if is at the core of this. Rather than:
if(animaltype = bird) {
squawk();
} else if(animaltype = dog) {
bark();
}
... use:
animal.makeSound();
But that supposes that you've got an Animal class/interface -- so really what the if is telling you, is that you need to create that interface.
So in the real world, what sort of ifs do we see that lead us to a polymorphism solution?
if(logging) {
log.write("Did something");
}
That's really irritating to see throughout your code. How about, instead, having two (or more) implementations of Logger?
this.logger = new NullLogger(); // logger.log() does nothing
this.logger = new StdOutLogger(); // logger.log() writes to stdout
That leads us to the Strategy Pattern.
Instead of:
if(user.getCreditRisk() > 50) {
decision = thoroughCreditCheck();
} else if(user.getCreditRisk() > 20) {
decision = mediumCreditCheck();
} else {
decision = cursoryCreditCheck();
}
... you could have ...
decision = getCreditCheckStrategy(user.getCreditRisk()).decide();
Of course getCreditCheckStrategy() might contain an if -- and that might well be appropriate. You've pushed it into a neat place where it belongs.
It probably comes down to a desire to keep code cyclomatic complexity down, and to reduce the number of branch points in a function. If a function is simple to decompose into a number of smaller functions, each of which can be tested, you can reduce the complexity and make code more easily testable.
IMO:
I suspect he was trying to provoke a debate and make people think about the misuse of 'if'. No one would seriously suggest such a fundamental construction of programming syntax was to be completely avoided would they?
Good that in ruby we have unless ;)
But seriously probably if is the next goto, that even if most of the people think it is evil in some cases is simplifying/speeding up the things (and in some cases like low level highly optimized code it's a must).
I think If statements are evil, but If expressions are not. What I mean by an if expression in this case can be something like the C# ternary operator (condition ? trueExpression : falseExpression). This is not evil because it is a pure function (in a mathematical sense). It evaluates to a new value, but it has no effects on anything else. Because of this, it works in a substitution model.
Imperative If statements are evil because they force you to create side-effects when you don't need to. For an If statement to be meaningful, you have to produce different "effects" depending on the condition expression. These effects can be things like IO, graphic rendering or database transactions, which change things outside of the program. Or, it could be assignment statements that mutate the state of the existing variables. It is usually better to minimize these effects and separate them from the actual logic. But, because of the If statements, we can freely add these "conditionally executed effects" everywhere in the code. I think that's bad.
If is not evil! Consider ...
int sum(int a, int b) {
return a + b;
}
Boring, eh? Now with an added if ...
int sum(int a, int b) {
if (a == 0 && b == 0) {
return 0;
}
return a + b;
}
... your code creation productivity (measured in LOC) is doubled.
Also code readability has improved much, for now you can see in the blink of an eye what the result is when both argument are zero. You couldn't do that in the code above, could you?
Moreover you supported the testteam for they now can push their code coverage test tools use up more to the limits.
Furthermore the code now is better prepared for future enhancements. Let's guess, for example, the sum should be zero if one of the arguments is zero (don't laugh and don't blame me, silly customer requirements, you know, and the customer is always right).
Because of the if in the first place only a slight code change is needed.
int sum(int a, int b) {
if (a == 0 || b == 0) {
return 0;
}
return a + b;
}
How much more code change would have been needed if you hadn't invented the if right from the start.
Thankfulness will be yours on all sides.
Conclusion: There's never enough if's.
There you go. To.

Resources