Why do people disassemble .NET (CLR) binaries? - reflector

I'm somewhat new to .NET but not new to programming, and I'm somewhat puzzled at the trend and excitement about disassembling compiled .NET code. It seems pointless.
The high-level ease of use of .NET is the reason I use it. I've written C and real (hardware processor) assembly in environments with limited resources. That was the reason to spend the effort on so many meticulous details, for efficiency. Up in .NET land, it kind of defeats the purpose of having a high-level object-oriented language if you waste time diving down into the most cryptic details of the implementation. In the course of working with .NET, I have debugged the usual performance issues and odd race conditions, and I've done it all by reading my own source code, never once having any thought as to what intermediate language the compiler is generating. For example, it's pretty obvious that a for(;;) loop is going to be faster than a foreach() on an array, considering that foreach() is going to use an enumeration object with a method call to advance to the next item instead of a simple increment of a variable, and this is easy to prove with a tight loop run a few million times (no disassembly required).
What really makes disassembling IL silly is the fact that it's not real machine code. It's virtual machine code. I've heard some people actually like to move instructions around to optimize it. Are you kidding me? Just-in-time compiled virtual machine code can't even do a simple tight for(;;) loop at the speed of natively compiled code. If you want to squeeze every last cycle out of your processor, then use C/C++ and spend time learning real assembly. That way the time you spend understanding lots of low-level details will actually be worthwhile.
So, other than having too much time on their hands, why do people disassemble .NET (CLR) binaries?

Understanding what compilers for various high-level languages are actually doing with your sources is an important skill to acquire as you move towards mastery of a certain environment, just like, say, understanding how DB engines will plan to execute various kinds of SQL queries you can toss at them. To use in a masterful way a certain level of abstraction, familiarity with (at least) the level below it is quite a good thing to acquire; see e.g. some notes on my talk on the subject of abstraction and the slides for that talk, as well as Joel Spolsky's "law of leaky abstractions" that I refer to in the talk.

I've used it when the source code has been lost or what's in version control in a particular tagged release doesn't appear to correspond to the shipped binary.

After just completing a 4-day course in secure software development, I would say that many people decompile binaries to find vulnerabilities in them. Knowing the source of a client application could help in planning an attack on a server.
Of course, for little utilities and such, there wouldn't be any such issues.
If I remember correctly, there is an app out there that obfuscates your .NET binaries. I believe it is called Dotfuscator.

To understand how to use a poorly documented interface.
(sadly it's much too frequent in .net based tools such as BizTalk or WCF to only have generic generated documentation, so disassembling to C# is sometimes necessary to see what a method is doing, in which context to use it)

Each .NET language implements its own subset of CLR functionality. Knowing that the CLR is capable of things that the language you're currently using isn't can let you make an informed decision on whether to change languages or emit IL or find another way.
Your assumption that the only reason people do things like this is because they have too much time is insulting and uneducated.

To locate library bugs and figure out how to work around them.
For example: without reflection you cannot remote an exception and rethrow it without slaughtering its backtrace. However the framework can do it.

From your question it looks like you do not know that Reflector disassembles CLR assemblies back to C# or VB so you pretty much see original code, not IL!

Actually, a foreach over an int[] gets compiled into a for statement. If we cast it to an enumerable, you are right, it uses an enumerator, and in the results below that makes it roughly nine times slower than the plain foreach. To prove this, we use benchmarking coupled with the decompiler for added understanding...
So I think by asking this question, you really answered it yourself.
If this benchmark differs from yours, please let me know how. I tried it with object arrays, nulls, etc, etc...
code:
static void Main(string[] args)
{
int[] ints = Enumerable.Repeat(1, 50000000).ToArray();
while (true)
{
DateTime now = DateTime.Now;
for (int i = 0; i < ints.Length; i++)
{
//nothing really
}
Console.WriteLine("for loop: " + (DateTime.Now - now));
now = DateTime.Now;
for (int i = 0; i < ints.Length; i++)
{
int nothing = ints[i];
}
Console.WriteLine("for loop with assignment: " + (DateTime.Now - now));
now = DateTime.Now;
foreach (int i in ints)
{
//nothing really
}
Console.WriteLine("foreach: " + (DateTime.Now - now));
now = DateTime.Now;
foreach (int i in (IEnumerable<int>)ints)
{
//nothing really
}
Console.WriteLine("foreach casted to IEnumerable<int>: " + (DateTime.Now - now));
}
}
results:
for loop: 00:00:00.0273438
for loop with assignment: 00:00:00.0712890
foreach: 00:00:00.0693359
foreach casted to IEnumerable<int>: 00:00:00.6103516
for loop: 00:00:00.0273437
for loop with assignment: 00:00:00.0683594
foreach: 00:00:00.0703125
foreach casted to IEnumerable<int>: 00:00:00.6250000
for loop: 00:00:00.0273437
for loop with assignment: 00:00:00.0683594
foreach: 00:00:00.0683593
foreach casted to IEnumerable<int>: 00:00:00.6035157
for loop: 00:00:00.0283203
for loop with assignment: 00:00:00.0771484
foreach: 00:00:00.0771484
foreach casted to IEnumerable<int>: 00:00:00.6005859
for loop: 00:00:00.0273438
for loop with assignment: 00:00:00.0722656
foreach: 00:00:00.0712891
foreach casted to IEnumerable<int>: 00:00:00.6210938
decompiled (note that the empty foreach had to add a variable assignment... something our empty for loop didn't but obviously needed):
private static void Main(string[] args)
{
int[] ints = Enumerable.Repeat<int>(1, 0x2faf080).ToArray<int>();
while (true)
{
DateTime now = DateTime.Now;
for (int i = 0; i < ints.Length; i++)
{
}
Console.WriteLine("for loop: " + ((TimeSpan) (DateTime.Now - now)));
now = DateTime.Now;
for (int i = 0; i < ints.Length; i++)
{
int num1 = ints[i];
}
Console.WriteLine("for loop with assignment: " + ((TimeSpan) (DateTime.Now - now)));
now = DateTime.Now;
int[] CS$6$0000 = ints;
for (int CS$7$0001 = 0; CS$7$0001 < CS$6$0000.Length; CS$7$0001++)
{
int num2 = CS$6$0000[CS$7$0001];
}
Console.WriteLine("foreach: " + ((TimeSpan) (DateTime.Now - now)));
now = DateTime.Now;
using (IEnumerator<int> CS$5$0002 = ((IEnumerable<int>) ints).GetEnumerator())
{
while (CS$5$0002.MoveNext())
{
int current = CS$5$0002.Current;
}
}
Console.WriteLine("foreach casted to IEnumerable<int>: " + ((TimeSpan) (DateTime.Now - now)));
}
}

To learn.
Articles are nice, but they do not present production code. Without .NET Reflector, it would have taken me a couple of weeks to figure out how Microsoft implemented events in the FileSystemWatcher component. Instead, it only took a few hours and I was able to finish my FileSystemSearcher component.

I myself often wonder this... :)
Sometimes there is a need to understand how a specific library method works or why exactly it works this way. There may be a situation when the documentation on this function is vague or there is some odd behavior that needs investigation. In this case some people disassemble libraries to see what calls are made inside certain methods.
As for optimization, I have never heard of this. I think it is ultimately pointless to try to optimize IL, since it will then be fed to a JIT compiler which generates the real machine code with pretty good efficiency, and your "optimizations" could get lost anyway.

To understand how the underlying system is implemented, understand what's the equivalent of a high level code in IL, circumvent licensing...

I have used it in the following cases, and more:
Had trouble with an internal assembly for which I did not have the source code.
Needed to figure out how a particular third-party controls library looks for a run-time license.
Needed to find out how the .Net license compiler works. (Just placed lc.exe inside Reflector)
Used it to make sure I had the correct build of certain libraries.

Something that folks haven't mentioned is that reflector comes in super useful if you use a compile time weaving AOP framework like PostSharp.

Related

How to hide literals in code

What are the main existing approaches to hide the value of literals in code, so that they are not easily traced with just a hex dumper or a decompiler?
For example, instead of coding this:
static final int MY_VALUE = 100;
We could have:
static final int MY_VALUE = myFunction1();

// Both helpers are static so that the static initializer above can call them.
private static int myFunction1() {
    int i = 23;
    i += 8 << 4;
    for (int j = 0; j < 3; j++) {
        i -= (j << 1);
    }
    return myFunction2(i);
}

private static int myFunction2(int i) {
    return i + 19;
}
That was just an example of what we're trying to do. (Yes, I know, the compiler may optimize it and precalculate the constant).
Disclaimer: I know this will not provide any additional security at all, but it makes the code more obscure (or interesting) to reverse-engineer. The purpose of this is just to force the attacker to debug the program, and waste time on it. Keep in mind that we're doing it just for fun.
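A quick way to sanity-check a scheme like this is to transcribe the arithmetic and evaluate it. The sketch below is a Python transcription of the two functions from the question (the snake_case names are only adaptations for illustration). Evaluated as written, the chain yields 164 rather than the intended 100, so the constants would need adjusting before the obfuscated form could stand in for the original literal.

```python
def my_function2(i):
    # Final offset applied to the intermediate value.
    return i + 19


def my_function1():
    i = 23
    i += 8 << 4              # 8 << 4 == 128, so i becomes 151
    for j in range(3):       # j = 0, 1, 2
        i -= j << 1          # subtracts 0, 2 and 4 in turn, leaving 145
    return my_function2(i)   # 145 + 19


MY_VALUE = my_function1()
print(MY_VALUE)
```

Keeping a check like this next to the obfuscated code makes it harder for the hidden value to drift silently when someone edits one of the steps.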
Since you're trying to hide text, which will be visible in the simple dump of the program, you can use some kind of simple encryption to obfuscate your program and hide that text from prying eyes.
Detailed instructions:
Visit ROT47.com and encode your text online. You can also use this web site for a more generic ROTn encoding.
Replace contents of your string constants with the encoded text.
Use the decoder in your code to transform the text back into its original form when you need it. The ROT13 Wikipedia article contains some notes about implementation, and here is a JavaScript implementation of ROTn on StackOverflow. It is trivial to adapt it to whatever language you're using.
Why use ROT47 which is notoriously weak encryption?
In the end, your code will look something like this:
decryptedData = decryptStr(MY_ENCRYPTED_CONSTANT)
useDecrypted(decryptedData)
No matter how strong your cypher, anybody equipped with a debugger can set a breakpoint on useDecrypted() and recover the plaintext. So, the strength of the cypher does not matter. However, using something like ROT47 has several distinct advantages:
You can encode your text online, no need to write a specialized program to encode your text.
Decryption is very easy to implement, so you don't waste your time on something that does not add any value to your customers.
Anybody reading your code (your coworker or yourself after 5 years) will know immediately that this is not real security, but security by obscurity.
Your text will still appear as gibberish to anyone just prying inside your compiled program, so mission accomplished.
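The decoding step described in this answer is small enough to sketch in full. The following Python version is an illustration only: the names `rot47` and `MY_ENCODED_CONSTANT` are made up here and do not come from any library. It rotates the 94 printable ASCII characters from `!` to `~` by 47 positions and leaves everything else untouched, so applying it twice returns the original text.

```python
def rot47(text):
    """Rotate printable ASCII ('!' through '~') by 47 places; other characters pass through."""
    out = []
    for ch in text:
        code = ord(ch)
        if 33 <= code <= 126:
            out.append(chr(33 + (code - 33 + 47) % 94))
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical constant: "Hello, World!" after one pass through rot47.
MY_ENCODED_CONSTANT = "w6==@[ (@C=5P"

print(rot47(MY_ENCODED_CONSTANT))
```

Because the same function both encodes and decodes, the string constants can be prepared with it instead of an online tool.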
Run some game of life variant for a large number of iterations, and then make control flow decisions based on the final state vector.
If your program is meant to actually do something useful, you could have your desired branches planned ahead of time and choose bits of the state vector to suit ("I want a true here, bit 17 is on, so make that the condition..")
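A minimal Python sketch of this idea follows. Everything in it is an assumption made for illustration: the helper names, the seed pattern, and the probed cell. The seed is a three-cell "blinker", chosen only because its behaviour is easy to check by hand (it flips between a horizontal and a vertical bar on every step); a real use would start from a less predictable seed and pick the probed cell after observing the final state.

```python
from collections import Counter

# The eight neighbour offsets around a cell.
NEIGHBOUR_OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def life_step(live):
    """Advance a set of live (x, y) cells by one Game of Life generation."""
    counts = Counter((x + dx, y + dy) for (x, y) in live for (dx, dy) in NEIGHBOUR_OFFSETS)
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


def run_life(live, iterations):
    state = set(live)
    for _ in range(iterations):
        state = life_step(state)
    return state


def opaque_flag(seed_cells, iterations, probe):
    """True if the probed cell is alive after the given number of generations."""
    return probe in run_life(seed_cells, iterations)


BLINKER = {(0, 1), (1, 1), (2, 1)}  # horizontal bar; vertical after any odd number of steps

if opaque_flag(BLINKER, 1001, (1, 0)):
    print("taking the branch that was planned ahead of time")
```

The condition is a constant in disguise: the author knows its value in advance, but someone reading the compiled program has to run or reason about the simulation to find out.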
You could also use some part of the compiled code as data, then modify it a little. This would be hard to do in a program executed by a virtual machine, but is doable in languages like assembly or C.

whether a language needs preIncrement (++x) and postIncrement (x++)

I have never seen a use case for pre-increment and post-increment in actual code. The place I see them most often is in puzzles.
My opinion is that they introduce more confusion than they are worth.
Is there any real use case for them?
Can't the same thing be done using +=? For example, instead of
y = x++
one could write
y = x
x += 1
It's just a shorter way of writing the same thing and it's only confusing to those who don't deeply understand C (a). The same argument could be made for replacing:
for (i = 0; i < 10; i++)
printf ("%d\n", i);
with:
i = 0;
while (i < 10) {
printf ("%d\n", i);
i = i + 1;
}
since any for can also be done with while, or:
i = 0;
loop: if (i < 10) {
printf ("%d\n", i);
i = i + 1;
goto loop;
}
since any loop construct can be built out of conditions and goto. But (I'm hoping) you wouldn't do that, would you?
(a) I sometimes like to explain this to my students as simple statements and side effects, something that allows C code to be more succinct with usually no or minimal loss in readability.
For the statement:
y = x++;
the statement is assigning x to y with the side effect that x is incremented afterwards. ++x is the same, it's just that the side effect happens beforehand.
Similarly, the side effect of an assignment is that it evaluates as the value assigned, meaning you can do things like:
while ((c = getchar()) != EOF) count++;
which also makes things like:
42;
perfectly valid, but useless, C statements.
The pre- and post-increment operators make much more sense if you consider them in the light of history and when they were conceived.
Back in the days when C was basically a high-level assembler for PDP-11 machines</flamebait>, and long before we had the nice optimizing compilers we have now, there were common idioms used that the post-increment operators were perfect for. Things like this:
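char* strcpy(char* dest, const char* src)
{
    /* highly simplified version */
    char* start = dest;   /* dest is advanced below, so remember where it began */
    while ((*dest++ = *src++))
        ;                 /* copies up to and including the terminating '\0' */
    return start;
}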
The code in question generated PDP-11 (or other) machine language code that made heavy use of the underlying addressing modes (like relative direct and relative indirect) that incorporated exactly these kinds of pre- and post-increment and decrement operations.
So to answer your question: do languages "need" these nowadays? No, of course not. It's provable that you need very little in terms of instructions to compute things. The question is more interesting if you ask "are these features desirable?" To that I'd answer a qualified "yes".
Using your examples:
y = x;
x += 1;
vs.
y = x++;
I can see two advantages right off the top of my head.
The code is more succinct. Everything I need to know to understand what you're doing is in one place (as long as I know the language, naturally!) instead of spread out. "Spreading out" across two lines seems like a picky thing but if you're doing thousands of them it can make a big difference in the end.
It is far more likely that the code generated even by a crappy compiler will be atomic in the second case. In the first case it very likely will not be unless you have a nice compiler. (Not all platforms have good, strong optimizing compilers.)
Also, I find it very telling that you're talking about += when that itself is an "unneeded" way of saying x = x + 1;.... After all there is no use case scenario I can think of for += that couldn't be served fine by _ = _ + _ instead.
You're accidentally raising a much larger issue here, and it's one that will make itself more and more known to you as the years (decades) go by.
Languages often make the mistake of supplying "abilities" when they shouldn't. IMO, ++ should be a stand-alone statement only, and absolutely not an expression operator.
Try to keep the following close to heart: The goal is not to create code for the competent engineer to read. The goal is to create code for the competent engineer to read when he is exhausted at 3am and hopped up on caffeine.
If an engineer says to you "All code constructs can get you into trouble. You just have to know what you're doing.", then walk away laughing, because he's just exposed himself as part of the problem.
In other words, please don't ever code anything like this:
a[aIndex++] = b[++bIndex];
You can find an interesting conversation about this kind of thing here:
Why avoid increment ("++") and decrement ("--") operators in JavaScript?

Groovy for loop execution time

O Groovy Gurus,
This code snippet runs in around 1 second
for (int i in (1..10000000)) {
j = i;
}
while this one takes almost 9 seconds
for (int i = 1; i < 10000000; i++) {
j = i;
}
Why is it so?
OK, here is my take on why.
If you convert both scripts to bytecode, you will notice that
ForInLoop uses Range. Iterator is used to advance during each loop. Comparison (<) is made directly to int (or Integer) to determine whether the exit condition has been met or not
ForLoop uses the traditional increment, check condition, and perform action. For checking the condition i < 10000000 it uses Groovy's ScriptBytecodeAdapter.compareLessThan. If you dig deep into that method's code, you will find both sides of the comparison are taken in as Object and there are many things going on: casting, comparing them as objects, etc.
ScriptBytecodeAdapter.compareLessThan --> ScriptBytecodeAdapter.compareTo --> DefaultTypeTransformation.compareTo
There are other classes in the typehandling package which implement compareTo specifically for math data types; I am not sure why they are not being used (if indeed they are not).
I suspect that is the reason the second loop takes longer.
Again, please correct me if I am wrong or missing something...
In your testing, be sure to "warm" the JVM up before taking the measure, otherwise you may wind up triggering various startup actions in the platform (class loading, JIT compilation). Run your tests many times in a row too. Also, if you did the second test while a garbage collect was going on, that might have an impact. Try running each of your tests 100 times and print out the times after each test, and see what that tells you.
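The measurement discipline described here is language-independent, so it can be sketched outside Groovy. The Python harness below is only an illustration under stated assumptions: `measure` and `busy_loop` are hypothetical names, and CPython has no JIT, so the warm-up runs merely stand in for the class loading and JIT compilation mentioned above. On the JVM the same structure would apply, with `System.nanoTime()` as the clock.

```python
import time


def measure(fn, warmup=5, repeats=100):
    """Call fn `warmup` times untimed, then return `repeats` wall-clock timings in seconds."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def busy_loop():
    # Stand-in workload; replace with the code under test.
    total = 0
    for i in range(10_000):
        total += i
    return total


samples = measure(busy_loop)
ordered = sorted(samples)
print("min %.6fs  median %.6fs  max %.6fs" % (ordered[0], ordered[len(ordered) // 2], ordered[-1]))
```

Looking at the minimum and median rather than a single run makes one-off disturbances, such as a garbage collection landing in the middle of one test, much easier to spot.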
If you can eliminate potential artifacts from startup time as Jim suggests, then I'd hazard a guess that the Java-style for loop in Groovy is not so well implemented as the original Groovy-style for loop. It was only added as of v1.5 after user requests, so perhaps its implementation was a bit of an afterthought.
Have you taken a look at the bytecode generated for your two examples to see if there are any differences? There was a discussion about Groovy performance here in which one of the comments (from one 'johnchase') says this:
I wonder if the difference you saw related to how Groovy uses numbers (primitives) - since it wraps all primitives in their equivalent Java wrapper classes (int -> Integer), I’d imagine that would slow things down quite a bit. I’d be interested in seeing the performance of Java code that loops 10,000,000 using the wrapper classes instead of ints.
So perhaps the original Groovy for loop does not suffer from this? Just speculation on my part really though.

Looking for good server-side language that will allow players to upload code that can be executed

I had an idea of a program I want to write, but which language would be best is my problem.
Suppose I have a car racing game and I want to allow users to submit code for new interactive 3D race tracks (think of tracks such as those found in the Speed Racer movie), for vehicles, and for the AI of their autonomous vehicles, so that each car can determine how to handle hazards.
So I need a language that will run fast as part of a world map that the server keeps of all the available races and their various states.
I am curious if this would be a good reason to look at creating a DSL in Scala, for example?
I don't want to have to restart an application to load new dlls or jar files so many compiled languages would be a problem.
I am open to Linux or Windows, and for the languages, most scripting languages, F#, Scala, Erlang or most OOP I can program in.
The user will be able to monitor how their vehicle is doing, and if they have more than one AI uploaded for that car, when it gets to certain obstacles they should be able to swap one AI program for another on demand.
Update: So far the solutions are javascript, using V8, and Lua.
I am curious if this may be a good use for a DSL, actually 3 separate ones. 1 for creating a racetrack, another for controlling a racecar and the third for creating new cars.
If so, would Haskell, F# or Scala be good choices for this?
Update: Would it make sense to have different parts end up in different languages? For example, if Erlang was used for the controlling of the car and Lua for the car itself, and also for the animated racetrack?
Your situation sounds like a good candidate for Lua.
You need sandboxing: This is easy to do in Lua. You simply initialize the users' environment by overwriting or deleting the os.execute command, for instance, and there is no way for the user to access that function anymore.
You want fast: Check out some of the Lua benchmarks against other languages.
Presumably you need to interoperate with another language. Lua is very easy (IMO) to embed in C or C++, at least. I haven't used LuaInterface, but that's the C# binding.
Lua has first-class functions, so it should be easy to swap functions on the fly.
Lua supports OOP to some extent with metatables.
Lua's primary data structure is the table (associative array) which is well-suited to sparse data structures like integrating with a world map.
Lua has a very regular syntax. There are no funny tricks with semicolons or indentation, so that's one less thing for your users to learn when they are picking up your language -- not to mention, using a well-documented language takes away some of the work you have to do in terms of documenting it yourself.
Also, as @elviejo points out in a comment, Lua is already used as a scripting language in many games. If nothing else, there's certainly some precedent for using Lua in the way you've described. And, as @gmonc mentions, there is a chance that your users have already used Lua in another game.
As far as how to integrate with Lua: generally, your users should simply need to upload a Lua script file. To grossly oversimplify, you might provide the users with available functions such as TurnLeft, TurnRight, Go, and Stop. Then, the users would upload a script like
Actions = {} -- empty table, but you might want to provide default functions
function Actions.Cone()
TurnLeft()
end
function Actions.Wall()
Stop()
TurnRight()
TurnRight()
Go()
end
Then server-side, you might start them off with a Go(). Then, when their car reaches a cone, you call their Actions.Cone() function; a wall leads to the Actions.Wall() function, etc. At this point, you've (hopefully) already sandboxed the Lua environment, so you can simply execute their script without even much regard for error checking -- if their script results in an error, no reason you can't pass the error on directly to the user. And if there aren't any errors, the lua_State in your server's code should contain the final state of their car.
Better example
Here's a standalone C file that takes a Lua script from stdin and runs it like I explained above. The game is that you'll encounter Ground, a Fence, or a Branch, and you have to respectively Run, Jump, or Duck to pass. You input a Lua script via stdin to decide how to react. The source is a little long, but hopefully it's easy to understand (besides the Lua API which takes a while to get used to). This is my original creation over the past 30 minutes, hope it helps:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "lua.h"
#include "lauxlib.h"
#include "lualib.h"
#define FAIL 0
#define SUCCESS 1
/* Possible states for the player */
enum STATE {
RUNNING,
JUMPING,
DUCKING
};
/* Possible obstacles */
enum OBSTACLE {
GROUND,
FENCE,
BRANCH
};
/* Using global vars here for brevity */
enum STATE playerstate = RUNNING;
enum OBSTACLE currentobstacle = GROUND;
/* Functions to be bound to Lua */
int Duck(lua_State *L)
{
playerstate = DUCKING;
return 0; /* no return values to Lua */
}
int Run(lua_State *L)
{
playerstate = RUNNING;
return 0;
}
int Jump(lua_State *L)
{
playerstate = JUMPING;
return 0;
}
/* Check if player can pass obstacle, offer feedback */
int CanPassObstacle()
{
if ( (playerstate == RUNNING && currentobstacle == GROUND) )
{
printf("Successful run!\n");
return SUCCESS;
}
if (playerstate == JUMPING && currentobstacle == FENCE)
{
printf("Successful jump!\n");
return SUCCESS;
}
if (playerstate == DUCKING && currentobstacle == BRANCH)
{
printf("Successful duck!\n");
return SUCCESS;
}
printf("Wrong move!\n");
return FAIL;
}
/* Pick a random obstacle */
enum OBSTACLE GetNewObstacle()
{
int i = rand() % 3;
if (i == 0) { return GROUND; }
if (i == 1) { return FENCE; }
else { return BRANCH; }
}
/* Execute appropriate function defined in Lua for the next obstacle */
int HandleObstacle(lua_State *L)
{
/* Get the table named Actions */
lua_getglobal(L, "Actions");
if (!lua_istable(L, -1)) {return FAIL;}
currentobstacle = GetNewObstacle();
/* Decide which user function to call */
if (currentobstacle == GROUND)
{
lua_getfield(L, -1, "Ground");
}
else if (currentobstacle == FENCE)
{
lua_getfield(L, -1, "Fence");
}
else if (currentobstacle == BRANCH)
{
lua_getfield(L, -1, "Branch");
}
if (lua_isfunction(L, -1))
{
lua_call(L, 0, 0); /* 0 args, 0 results */
return CanPassObstacle();
}
return FAIL;
}
int main()
{
int i, res;
srand(time(NULL));
lua_State *L = lua_open();
/* Bind the C functions to Lua functions */
lua_pushcfunction(L, &Duck);
lua_setglobal(L, "Duck");
lua_pushcfunction(L, &Run);
lua_setglobal(L, "Run");
lua_pushcfunction(L, &Jump);
lua_setglobal(L, "Jump");
/* execute script from stdin */
res = luaL_dofile(L, NULL);
if (res)
{
printf("Lua script error: %s\n", lua_tostring(L, -1));
return 1;
}
for (i = 0 ; i < 5 ; i++)
{
if (HandleObstacle(L) == FAIL)
{
printf("You failed!\n");
return 0;
}
}
printf("You passed!\n");
return 0;
}
Build the above on GCC with gcc runner.c -o runner -llua5.1 -I/usr/include/lua5.1.
And pretty much the only Lua script that will pass successfully every time is:
Actions = {}
function Actions.Ground() Run() end
function Actions.Fence() Jump() end
function Actions.Branch() Duck() end
which could also be written as
Actions = {}
Actions.Ground = Run
Actions.Fence = Jump
Actions.Branch = Duck
With the good script, you'll see output like:
Successful duck!
Successful run!
Successful jump!
Successful jump!
Successful duck!
You passed!
If the user tries something malicious, the program will simply provide an error:
$ echo "Actions = {} function Actions.Ground() os.execute('rm -rf /') end" | ./runner
PANIC: unprotected error in call to Lua API (stdin:1: attempt to index global 'os' (a nil value))
With an incorrect move script, the user will see that he performed the wrong move:
$ echo "Actions = {} Actions.Ground = Jump; Actions.Fence = Duck; Actions.Branch = Run" | ./runner
Wrong move!
You failed!
Why not JavaScript or EcmaScript? Google's V8 is a really nice sandboxed way to do this. I remember it being really really easy. Of course, you will have to write some bindings for it.
I would recommend Dot Net for several reasons:
Players can choose which language they implement their solutions in: C#, IronPython, VB.NET, Boo, etc. but your runtime wouldn't care - it is just dynamically loading dot net assemblies into its sandbox. But this gives your players a choice of their own favorite language. This encourages players to enjoy the experience, rather than some players deciding not to participate because they simply don't like the single language that you chose. Your overall framework would probably be in C#, but players' code could be in any Dot Net language.
Sandboxing and dynamically loading are very mature in Dot Net. You could load the players' assemblies into your own sandboxed AppDomains that are running with Partial Trust. You would not have to restart the container process to load and unload these player AppDomains.
Players are encouraged to "play" this game because the language (whichever Dot Net language they choose) is not only useful for game scripting, but can lead to a real career in the industry. Doing a job search for "C#" gives a lot more hits than for "Lua" or "Haskell", for example. Therefore, the game is not only fun, but especially for younger players, is actually helping them to learn genuinely useful, marketable skills that can earn them money later. That is big encouragement to participate in this game.
Execution speed is blazing. Unlike some alternatives like Lua, this is compiled code that is well known for excellent performance, even in real-time games. (See Unity3d, for example).
Players can use MonoDevelop on Mac and Linux or they can use Visual Studio Express for free from Microsoft, or they can use good ol' notepad and the command line. The difference from the alternatives here is that mature, modern IDE's are available if players should choose to use them.
A DSL doesn't seem like a good idea for the AI parts of the problem simply because implementing AI for the vehicles is going to require a lot of creative problem solving on the part of the players. With a DSL, you are locking them into only the way that you defined the problem when you thought about it. Smart players with a complete platform like Dot Net (or some of the other choices mentioned) might have radically new and innovative solutions to some of the AI problems that you never foresaw. With the right tools, these players could implement crazy learning programs or small neural networks or who knows what in order to implement their AI. But if you lock them into a simplified DSL, there might not be much variety in different players' AI implementations (because their set of available expressions of ideas is so much smaller).
For the other parts of the problem such as defining the tracks, a DSL might be fine. Again, though, I would lean toward one of the simpler Dot Net languages like Boo simply so that you can have a unified tech stack for your entire project.
I have done this in an MMO before: the NPC response scripts were written in Python inside a C++ framework, so any NPC-related action would trigger the framework to run a Python script (through the C-Python interface of course, not a shell call such as "python /script/run.py"). The scripts are replaceable at runtime, though the player or game admin needs to issue a command to refresh them; in any case the game server program does not need to be restarted.
Actually, I no longer remember whether issuing a refresh command was required to pick up a new script at runtime (this was two years ago), but I think the approach would suit you.
Consider Erlang:
You need sandboxing / DSL: you can write "parser generators" to scrub access to critical/vulnerable system calls. The stock compiler can be "easily" enhanced with this functionality.
You need fine-grained scheduling : you have some control over this also provided you run each "user" in separate emulators. Maybe you can do better but I'd have to dig more. Remember the scheduling is O(1) :-)
You need resource partitioning between your "players" (I guess, if I understood correctly): Erlang has no shared state, so this helps from the outset. You can easily craft some supervisors that watch resource consumption of the players etc. See also link on above point (lots of knobs to control the emulator).
You need code hot-swapping: Erlang was designed for this from the outset
You need scaling: Erlang scales with SMP nicely and since it is based on message passing with seamless inter-machine communication, you can scale horizontally
You can optimize the critical paths using C drivers
Integrated "supervisor" functionality for restarting gracefully "users"
Ulf Wiger on Concurrency

Why is the 'if' statement considered evil?

I just came from the Simple Design and Testing Conference. In one of the sessions we were talking about evil keywords in programming languages. Corey Haines, who proposed the subject, was convinced that the if statement is absolutely evil. His alternative was to create functions with predicates. Can you please explain to me why if is evil?
I understand that you can write very ugly code abusing if. But I don't believe that it's that bad.
The if statement is rarely considered as "evil" as goto or mutable global variables -- and even the latter are actually not universally and absolutely evil. I would suggest taking the claim as a bit hyperbolic.
It also largely depends on your programming language and environment. In languages which support pattern matching, you will have great tools for replacing if at your disposal. But if you're programming a low-level microcontroller in C, replacing ifs with function pointers will be a step in the wrong direction. So, I will mostly consider replacing ifs in OOP programming, because in functional languages, if is not idiomatic anyway, while in purely procedural languages you don't have many other options to begin with.
Nevertheless, conditional clauses sometimes result in code which is harder to manage. This does not only include the if statement, but even more commonly the switch statement, which usually includes more branches than a corresponding if would.
There are cases where it's perfectly reasonable to use an if
When you are writing utility methods, extensions or specific library functions, it's likely that you won't be able to avoid ifs (and you shouldn't). There isn't a better way to code this little function, nor make it more self-documented than it is:
// this is a good "if" use-case
int Min(int a, int b)
{
    if (a < b)
        return a;
    else
        return b;
}
// or, if you prefer the ternary operator
int Min(int a, int b)
{
    return (a < b) ? a : b;
}
Branching over a "type code" is a code smell
On the other hand, if you encounter code which tests for some sort of a type code, or tests if a variable is of a certain type, then this is most likely a good candidate for refactoring, namely replacing the conditional with polymorphism.
The reason for this is that by allowing your callers to branch on a certain type code, you make it possible to end up with numerous checks scattered all over your code, making extensions and maintenance much more complex. Polymorphism, on the other hand, allows you to bring this branching decision as close to the root of your program as possible.
Consider:
// this is called branching on a "type code",
// and screams for refactoring
void RunVehicle(Vehicle vehicle)
{
    // how the hell do I even test this?
    if (vehicle.Type == CAR)
        Drive(vehicle);
    else if (vehicle.Type == PLANE)
        Fly(vehicle);
    else
        Sail(vehicle);
}
By placing common but type-specific (i.e. class-specific) functionality into separate classes and exposing it through a virtual method (or an interface), you allow the internal parts of your program to delegate this decision to someone higher in the call hierarchy (potentially at a single place in code), allowing much easier testing (mocking), extensibility and maintenance:
// adding a new vehicle is gonna be a piece of cake
interface IVehicle
{
    void Run();
}
// your method now doesn't care about which vehicle
// it got as a parameter
void RunVehicle(IVehicle vehicle)
{
    vehicle.Run();
}
And you can now easily test if your RunVehicle method works as it should:
// you can now create test (mock) implementations
// since you're passing it as an interface
var mock = new Mock<IVehicle>();
// run the client method
something.RunVehicle(mock.Object);
// check if Run() was invoked
mock.Verify(m => m.Run(), Times.Once());
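The concrete side of this refactoring could look roughly like the sketch below. It is written in Java rather than C# (the shape is the same in both), and the names Car, Plane, Boat, runAll and the returned strings are assumptions made up for illustration; only the single run method on an interface comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;

final class VehicleSketch {
    // Counterpart of IVehicle: one method, implemented per vehicle type.
    interface Vehicle {
        String run();
    }

    static final class Car implements Vehicle {
        public String run() { return "driving"; }
    }

    static final class Plane implements Vehicle {
        public String run() { return "flying"; }
    }

    static final class Boat implements Vehicle {
        public String run() { return "sailing"; }
    }

    // No branching on a type code: each vehicle knows how to run itself.
    static List<String> runAll(List<Vehicle> vehicles) {
        List<String> results = new ArrayList<>();
        for (Vehicle vehicle : vehicles) {
            results.add(vehicle.run());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runAll(List.of(new Car(), new Plane(), new Boat())));
    }
}
```

Adding a new kind of vehicle means adding one class (or a lambda, since the interface has a single method); runAll stays untouched.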
Patterns which only differ in their if conditions can be reused
Regarding the argument about replacing if with a "predicate" in your question, Haines probably wanted to mention that sometimes similar patterns exist over your code, which differ only in their conditional expressions. Conditional expressions do emerge in conjunction with ifs, but the whole idea is to extract a repeating pattern into a separate method, leaving the expression as a parameter. This is what LINQ already does, usually resulting in cleaner code compared to an alternative foreach:
Consider these two very similar methods:
// average male age
public double AverageMaleAge(List<Person> people)
{
    double sum = 0.0;
    int count = 0;
    foreach (var person in people)
    {
        if (person.Gender == Gender.Male)
        {
            sum += person.Age;
            count++;
        }
    }
    return sum / count; // not checking for zero div. for simplicity
}
// average female age
public double AverageFemaleAge(List<Person> people)
{
    double sum = 0.0;
    int count = 0;
    foreach (var person in people)
    {
        if (person.Gender == Gender.Female) // <-- only the expression
        {                                   //     is different
            sum += person.Age;
            count++;
        }
    }
    return sum / count;
}
This indicates that you can extract the condition into a predicate, leaving you with a single method for these two cases (and many other future cases):
// average age for all people matched by the predicate
public double AverageAge(List<Person> people, Predicate<Person> match)
{
    double sum = 0.0;
    int count = 0;
    foreach (var person in people)
    {
        if (match(person)) // <-- the decision to match
        {                  //     is now delegated to callers
            sum += person.Age;
            count++;
        }
    }
    return sum / count;
}
var males = AverageAge(people, p => p.Gender == Gender.Male);
var females = AverageAge(people, p => p.Gender == Gender.Female);
And since LINQ already has a bunch of handy extension methods like this, you actually don't even need to write your own methods:
// replace everything we've written above with these two lines
var males = people.Where(p => p.Gender == Gender.Male).Average(p => p.Age);
var females = people.Where(p => p.Gender == Gender.Female).Average(p => p.Age);
In this last LINQ version the if statement has "disappeared" completely, although:
to be honest the problem wasn't in the if by itself, but in the entire code pattern (simply because it was duplicated), and
the if still actually exists, but it's written inside the LINQ Where extension method, which has been tested and closed for modification. Having less of your own code is always a good thing: fewer things to test, fewer things to go wrong, and the code is simpler to follow, analyze and maintain.
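For comparison, the same "condition as a parameter" idea can be sketched with Java streams. This is only an analogue of the LINQ version, not a translation of any library mentioned above; the Person and Gender types are hypothetical stand-ins. One side effect of this shape is that an empty match comes back as an empty OptionalDouble rather than a division by zero.

```java
import java.util.List;
import java.util.OptionalDouble;
import java.util.function.Predicate;

final class AverageAgeSketch {
    enum Gender { MALE, FEMALE }

    // Hypothetical stand-in for the Person type used in the answer.
    static final class Person {
        final Gender gender;
        final int age;

        Person(Gender gender, int age) {
            this.gender = gender;
            this.age = age;
        }
    }

    // The filtering condition is supplied by the caller; the loop and the
    // "if" live inside the stream pipeline.
    static OptionalDouble averageAge(List<Person> people, Predicate<Person> match) {
        return people.stream()
                .filter(match)
                .mapToInt(p -> p.age)
                .average(); // empty when nothing matched
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person(Gender.MALE, 30),
                new Person(Gender.MALE, 40),
                new Person(Gender.FEMALE, 25));
        System.out.println(averageAge(people, p -> p.gender == Gender.MALE));
        System.out.println(averageAge(people, p -> p.gender == Gender.FEMALE));
    }
}
```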
Huge runs of nested if/else statements
When you see a function spanning 1000 lines and having dozens of nested if blocks, there is an enormous chance it can be rewritten to
use a better data structure and organize the input data in a more appropriate manner (e.g. a hashtable, which will map one input value to another in a single call),
use a formula, a loop, or sometimes just an existing function which performs the same logic in 10 lines or less (e.g. this notorious example comes to my mind, but the general idea applies to other cases),
use guard clauses to prevent nesting (guard clauses give more confidence in the state of variables throughout the function, because they get rid of exceptional cases as soon as possible),
at least replace with a switch statement where appropriate.
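Two of the options listed above, the lookup table and the guard clauses, can be sketched as follows in Java. The status codes, the discount rule and all names here are invented purely for illustration.

```java
import java.util.Map;

final class NestedIfSketch {
    // A lookup table replaces a chain of if/else-if comparisons on one value.
    private static final Map<Integer, String> STATUS_TEXT = Map.of(
            200, "OK",
            404, "Not Found",
            500, "Internal Server Error");

    static String describe(int status) {
        return STATUS_TEXT.getOrDefault(status, "Unknown");
    }

    // Guard clauses handle the exceptional cases first, so the main logic
    // is not buried inside several levels of nested if blocks.
    static double totalPrice(double unitPrice, int quantity) {
        if (unitPrice < 0) {
            throw new IllegalArgumentException("unitPrice must not be negative");
        }
        if (quantity <= 0) {
            return 0.0;
        }
        if (quantity < 10) {
            return unitPrice * quantity;
        }
        return unitPrice * quantity * 0.9; // arbitrary bulk discount for the example
    }

    public static void main(String[] args) {
        System.out.println(describe(404));
        System.out.println(totalPrice(2.0, 3));
    }
}
```

After the guards, every remaining line can assume a non-negative price and a quantity of at least 10, which is the "confidence in the state of variables" mentioned above.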
Refactor when you feel it's a code smell, but don't over-engineer
Having said all this, you should not spend sleepless nights over having a couple of conditionals here and there. While these answers can provide some general rules of thumb, the best way to learn to detect constructs which need refactoring is through experience. Over time, some patterns emerge that result in modifying the same clauses over and over again.
There is another sense in which if can be evil: when it is used instead of polymorphism.
E.g.
if (animal.isFrog()) croak(animal)
else if (animal.isDog()) bark(animal)
else if (animal.isLion()) roar(animal)
instead of
animal.emitSound()
But basically if is a perfectly acceptable tool for what it does. It can be abused and misused of course, but it is nowhere near the status of goto.
A good quote from Code Complete:
Code as if whoever maintains your program is a violent psychopath who
knows where you live.
— Anonymous
IOW, keep it simple. If the readability of your application will be enhanced by using a predicate in a particular area, use it. Otherwise, use the 'if' and move on.
I think it depends on what you're doing to be honest.
If you have a simple if..else statement, why use a predicate?
If you can, use a switch for larger if replacements, and then, if you have the option to use a predicate for large operations (where it makes sense, otherwise your code will be a nightmare to maintain), use it.
This guy seems to have been a bit pedantic for my liking. Replacing all if's with Predicates is just crazy talk.
There is the Anti-If campaign, which started earlier in the year. The main premise is that many nested if statements can often be replaced with polymorphism.
I would be interested to see an example of using the Predicate instead. Is this more along the lines of functional programming?
Just like in the bible verse about money, if statements are not evil -- the LOVE of if statements is evil. A program without if statements is a ridiculous idea, and using them as necessary is essential. But a program that has 100 if-else if blocks in a row (which, sadly, I have seen) is definitely evil.
I have to say that I recently have begun to view if statements as a code smell: especially when you find yourself repeating the same condition several times. But there's something you need to understand about code smells: they don't necessarily mean that the code is bad. They just mean that there's a good chance the code is bad.
For instance, comments are listed as a code smell by Martin Fowler, but I wouldn't take anyone seriously who says "comments are evil; don't use them".
Generally though, I prefer to use polymorphism instead of if statements where possible. That just makes for so much less room for error. I tend to find that a lot of the time, using conditionals leads to a lot of tramp arguments as well (because you have to pass the data needed to form the conditional on to the appropriate method).
if is not evil (I also hold that assigning morality to code-writing practices is asinine...).
Mr. Haines is being silly and should be laughed at.
I'll agree with you; he was wrong. You can go too far with things like that, too clever for your own good.
Code created with predicates instead of ifs would be horrendous to maintain and test.
Predicates come from logical/declarative programming languages, like PROLOG. For certain classes of problems, like constraint solving, they are arguably superior to a lot of drawn out step-by-step if-this-do-that-then-do-this crap. Problems that would be long and complex to solve in imperative languages can be done in just a few lines in PROLOG.
There's also the issue of scalable programming (due to the move towards multicore, the web, etc.). If statements and imperative programming in general tend to be in step-by-step order, and not scalable. Logical declarations and lambda calculus, though, describe how a problem can be solved and what pieces it can be broken down into. As a result, the interpreter/processor executing that code can efficiently break the code into pieces and distribute it across multiple CPUs/cores/threads/servers.
Definitely not useful everywhere; I'd hate to try writing a device driver with predicates instead of if statements. But yes, I think the main point is probably sound, and worth at least getting familiar with, if not using all the time.
The only problem with predicates (in terms of replacing if statements) is that you still need to test them:
void Test(Predicate<int> pr, int num)
{
    if (pr(num))
    { /* do something */ }
    else
    { /* do something else */ }
}
You could of course use the ternary operator (?:), but that's just an if statement in disguise...
Perhaps with quantum computing it will be a sensible strategy to not use IF statements but to let each leg of the computation proceed and only have the function 'collapse' at termination to a useful result.
Sometimes it's necessary to take an extreme position to make your point. I'm sure this person uses if -- but every time you use an if, it's worth having a little think about whether a different pattern would make the code clearer.
Preferring polymorphism to if is at the core of this. Rather than:
if(animaltype == bird) {
    squawk();
} else if(animaltype == dog) {
    bark();
}
... use:
animal.makeSound();
But that supposes that you've got an Animal class/interface -- so really what the if is telling you, is that you need to create that interface.
So in the real world, what sort of ifs do we see that lead us to a polymorphism solution?
if(logging) {
    log.write("Did something");
}
That's really irritating to see throughout your code. How about, instead, having two (or more) implementations of Logger?
this.logger = new NullLogger(); // logger.log() does nothing
this.logger = new StdOutLogger(); // logger.log() writes to stdout
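A minimal sketch of how those two loggers might fit together, assuming a Logger interface with a single log method. The Worker and RecordingLogger classes are hypothetical additions, the latter included only to show how the call becomes easy to test.

```java
import java.util.ArrayList;
import java.util.List;

final class LoggerSketch {
    interface Logger {
        void log(String message);
    }

    // Null Object: satisfies the interface and deliberately does nothing.
    static final class NullLogger implements Logger {
        public void log(String message) { }
    }

    static final class StdOutLogger implements Logger {
        public void log(String message) { System.out.println(message); }
    }

    // Test double that remembers what was logged.
    static final class RecordingLogger implements Logger {
        final List<String> messages = new ArrayList<>();
        public void log(String message) { messages.add(message); }
    }

    static final class Worker {
        private final Logger logger;

        Worker(Logger logger) { this.logger = logger; }

        int doSomething(int x) {
            logger.log("Did something"); // no "if (logging)" check needed
            return x * 2;
        }
    }

    public static void main(String[] args) {
        new Worker(new StdOutLogger()).doSomething(21); // writes to stdout
        new Worker(new NullLogger()).doSomething(21);   // stays silent
    }
}
```

The decision about whether to log is made once, where the Worker is constructed, instead of at every call site.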
That leads us to the Strategy Pattern.
Instead of:
if(user.getCreditRisk() > 50) {
    decision = thoroughCreditCheck();
} else if(user.getCreditRisk() > 20) {
    decision = mediumCreditCheck();
} else {
    decision = cursoryCreditCheck();
}
... you could have ...
decision = getCreditCheckStrategy(user.getCreditRisk()).decide();
Of course getCreditCheckStrategy() might contain an if -- and that might well be appropriate. You've pushed it into a neat place where it belongs.
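One possible shape for getCreditCheckStrategy(), as a sketch only: the thresholds 50 and 20 are taken from the snippet above, while the CreditCheckStrategy interface and the string results are placeholders standing in for whatever the real checks would return.

```java
final class CreditCheckSketch {
    // Hypothetical strategy interface; a real decide() would run the check.
    interface CreditCheckStrategy {
        String decide();
    }

    // The branching still exists, but it lives in exactly one place.
    static CreditCheckStrategy getCreditCheckStrategy(int creditRisk) {
        if (creditRisk > 50) {
            return () -> "thorough";
        }
        if (creditRisk > 20) {
            return () -> "medium";
        }
        return () -> "cursory";
    }

    public static void main(String[] args) {
        for (int risk : new int[] {70, 35, 5}) {
            System.out.println(risk + " -> " + getCreditCheckStrategy(risk).decide());
        }
    }
}
```

Callers only see the strategy object, so a new risk band changes this one factory method rather than every place a decision is made.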
It probably comes down to a desire to keep code cyclomatic complexity down, and to reduce the number of branch points in a function. If a function is simple to decompose into a number of smaller functions, each of which can be tested, you can reduce the complexity and make code more easily testable.
IMO:
I suspect he was trying to provoke a debate and make people think about the misuse of 'if'. No one would seriously suggest that such a fundamental construct of programming should be avoided completely, would they?
Good that in ruby we have unless ;)
But seriously, if is probably the next goto: even if most people consider it evil, in some cases it simplifies or speeds things up (and in some cases, such as low-level, highly optimized code, it's a must).
I think If statements are evil, but If expressions are not. What I mean by an if expression in this case can be something like the C# ternary operator (condition ? trueExpression : falseExpression). This is not evil because it is a pure function (in a mathematical sense). It evaluates to a new value, but it has no effects on anything else. Because of this, it works in a substitution model.
Imperative If statements are evil because they force you to create side-effects when you don't need to. For an If statement to be meaningful, you have to produce different "effects" depending on the condition expression. These effects can be things like IO, graphic rendering or database transactions, which change things outside of the program. Or, it could be assignment statements that mutate the state of the existing variables. It is usually better to minimize these effects and separate them from the actual logic. But, because of the If statements, we can freely add these "conditionally executed effects" everywhere in the code. I think that's bad.
If is not evil! Consider ...
int sum(int a, int b) {
    return a + b;
}
Boring, eh? Now with an added if ...
int sum(int a, int b) {
    if (a == 0 && b == 0) {
        return 0;
    }
    return a + b;
}
... your code creation productivity (measured in LOC) is doubled.
Also, code readability has improved a lot, for now you can see in the blink of an eye what the result is when both arguments are zero. You couldn't do that in the code above, could you?
Moreover, you have supported the test team, for they can now push their code coverage tools closer to their limits.
Furthermore the code now is better prepared for future enhancements. Let's guess, for example, the sum should be zero if one of the arguments is zero (don't laugh and don't blame me, silly customer requirements, you know, and the customer is always right).
Because of the if in the first place only a slight code change is needed.
int sum(int a, int b) {
    if (a == 0 || b == 0) {
        return 0;
    }
    return a + b;
}
How much more code change would have been needed if you hadn't invented the if right from the start?
Thankfulness will be yours on all sides.
Conclusion: There's never enough if's.
There you go. To.

Resources