How to hide literals in code

What are the main existing approaches to hiding the value of literals in code, so that they cannot easily be traced with just a hex dumper or a decompiler?
For example, instead of coding this:
static final int MY_VALUE = 100;
We could have:
static final int MY_VALUE = myFunction1();

private static int myFunction1() {
    int i = 23;
    i += 8 << 3;
    for (int j = 0; j < 3; j++) {
        i -= (j << 1);
    }
    return myFunction2(i); // 87 - (0 + 2 + 4) = 81
}

private static int myFunction2(int i) {
    return i + 19; // 81 + 19 = 100
}
That was just an example of what we're trying to do. (Yes, I know, the compiler may optimize it and precalculate the constant).
Disclaimer: I know this will not provide any additional security at all, but it makes the code more obscure (or more interesting) to reverse-engineer. The purpose is just to force the attacker to debug the program and waste time on it. Keep in mind that we're doing it just for fun.

Since you're trying to hide text, which will be visible in the simple dump of the program, you can use some kind of simple encryption to obfuscate your program and hide that text from prying eyes.
Detailed instructions:
Visit ROT47.com and encode your text online. You can also use this web site for a more generic ROTn encoding.
Replace contents of your string constants with the encoded text.
Use the decoder in your code to transform the text back into its original form when you need it. The ROT13 Wikipedia article contains some notes about implementation, and there is a JavaScript implementation of ROTn on StackOverflow. It is trivial to adapt it to whatever language you're using.
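For illustration, here is a minimal ROT47 decoder, sketched in C (ROT47 is its own inverse, so the same routine encodes and decodes; the encoded string below is just "Hello, World!" run through ROT47):

#include <stdio.h>

/* Rotate every printable ASCII character ('!' = 33 through '~' = 126)
   by 47 positions; applying the function twice restores the input. */
static void rot47(char *s)
{
    for (; *s; s++) {
        if (*s >= 33 && *s <= 126)
            *s = (char)(33 + ((*s - 33 + 47) % 94));
    }
}

int main(void)
{
    char secret[] = "w6==@[ (@C=5P"; /* "Hello, World!" after ROT47 */
    rot47(secret);
    printf("%s\n", secret);          /* prints: Hello, World! */
    return 0;
}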
Why use ROT47, which is notoriously weak encryption?
In the end, your code will look something like this:
decryptedData = decryptStr(MY_ENCRYPTED_CONSTANT)
useDecrypted(decryptedData)
No matter how strong your cipher, anybody equipped with a debugger can set a breakpoint on useDecrypted() and recover the plaintext. So the strength of the cipher does not matter. However, using something like ROT47 has two distinct advantages:
You can encode your text online, no need to write a specialized program to encode your text.
Decryption is very easy to implement, so you don't waste your time on something that does not add any value to your customers.
Anybody reading your code (your coworker, or yourself after 5 years) will know immediately that this is not real security, but security by obscurity.
Your text will still appear as gibberish to anyone just prying inside your compiled program, so mission accomplished.

Run some game of life variant for a large number of iterations, and then make control flow decisions based on the final state vector.
If your program is meant to actually do something useful, you could have your desired branches planned ahead of time and choose bits of the state vector to suit ("I want a true here; bit 17 is on, so make that the condition...").
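Here is a minimal sketch of the idea in C, using a 64-cell rule-30 automaton as the "life variant" (the seed, iteration count, and bit index are all hypothetical; you would run this once during development and pick a bit whose final value matches the branch you want):

#include <stdint.h>
#include <stdio.h>

/* Evolve a 64-cell one-dimensional cellular automaton (rule 30,
   wrapping at the edges) for the given number of steps. */
static uint64_t evolve(uint64_t state, int steps)
{
    for (int i = 0; i < steps; i++) {
        uint64_t left  = (state << 1) | (state >> 63);
        uint64_t right = (state >> 1) | (state << 63);
        state = left ^ (state | right); /* rule 30 update */
    }
    return state;
}

int main(void)
{
    uint64_t final_state = evolve(0x123456789ABCDEF0ULL, 1000);
    /* The bit index was chosen offline so that this branch is the one
       we actually wanted to take. */
    if ((final_state >> 17) & 1)
        puts("planned 'true' branch");
    else
        puts("planned 'false' branch");
    return 0;
}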

You could also use some part of the compiled code as data, then modify it a little. This would be hard to do in a program executed by a virtual machine, but it is doable in languages like assembly or C.
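A rough sketch in C of the code-as-data half of this (the byte count and the mask idea are hypothetical, and casting a function pointer to a data pointer is not strictly portable C, though it works on mainstream platforms):

#include <stdio.h>

static int helper(int x) { return x * 2 + 1; } /* any existing function */

int main(void)
{
    /* Fold the first 8 machine-code bytes of helper() into a key. */
    const unsigned char *code = (const unsigned char *)helper;
    unsigned key = 0;
    for (int i = 0; i < 8; i++)
        key = (key << 5) ^ code[i];
    /* At build time you would compute MASK = key ^ MY_VALUE once, ship
       only MASK, and reconstruct MY_VALUE = key ^ MASK at run time. */
    printf("derived key: %u\n", key);
    return 0;
}

Note that any change to the compiler, optimization flags, or the function itself changes the derived key, which is either a bug or a cheap tamper check, depending on your point of view.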

Bad coding habit wanted

Hey, I'm currently trying to emulate an old gaming console. I've run into a little problem that I do not know how to fix.
More specifically, I'm writing an emulator for a GBA. But to make it a little more challenging, I have decided to emulate the games as external devices with memory, possibly extra CPUs, and such, just like they are in the real world. To do this I need to create a self-contained executable file, with some fixed-size spare memory allocated to hold data like saved games.
Is this possible? I know this is uncommon and a bad habit. I don't care; it is like people saying DON'T USE GOTO in C. The instruction is there to be used, and I use it to good effect with no headaches and no weird behaviour. It just takes a moment of planning, and maybe a course in compiler implementation, to know how.
The language is not important. C will do, but if .NET supported it I'd prefer that.
You can take the approach used by some self-extracting archives: append a marker GUID and a data block after the compiled executable:
(executable file) + (marker GUID) + (data area)
In Windows, for example, using a command box:
type "c:\vsprojects\my-gba\release\my-gba.exe" "c:\data-files\guid.txt" "c:\data-files\save-data-area.txt" > "c:\data-files\gba-combo.exe"
In Linux you'd cat the 3 pieces together into a new file to form your executable-with-data.
To find the start of the data area, you search for the GUID in the EXE and then seek past it.
There is a very slight chance the GUID could occur naturally in your code, but the odds become vanishingly small the longer the marker gets. You can use an online GUID generator and combine multiple GUIDs in the marker text file, for example:
731056dd-1dd7-4f46-b90b-2ad623198404 a2d0cd76-f8fc-4bcf-8a15-80ca4d9b205f
You do need to make sure you don't write your code in a way that embeds the GUID as a string constant, which would appear in the binary before the appended marker.
BAD:
const char* foo = "731056dd-1dd7-4f46-b90b-2ad623198404 a2d0cd76-f8fc-4bcf-8a15-80ca4d9b205f";
/* code that seeks for foo */
OK:
const char* arr[] = { "731056dd-1dd7-4f46-b90b-2ad623198404", "BREAK-MATCH", "a2d0cd76-f8fc-4bcf-8a15-80ca4d9b205f" };
/* code builds foo from arr[0], " ", arr[2], then seeks for foo */
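Here is a sketch in C of the read-back side (the file name is illustrative and error handling is minimal): load the combined file, scan for the marker, and treat everything past it as the save-data area.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the offset just past the marker, or -1 if it is not found. */
static long find_data_start(const unsigned char *buf, long len,
                            const char *marker)
{
    long mlen = (long)strlen(marker);
    for (long i = 0; i + mlen <= len; i++)
        if (memcmp(buf + i, marker, mlen) == 0)
            return i + mlen;
    return -1;
}

int main(void)
{
    FILE *f = fopen("gba-combo.exe", "rb");
    if (!f) return 1;
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    rewind(f);

    unsigned char *buf = malloc((size_t)len);
    if (!buf || fread(buf, 1, (size_t)len, f) != (size_t)len) return 1;
    fclose(f);

    long start = find_data_start(buf, len,
                                 "731056dd-1dd7-4f46-b90b-2ad623198404");
    if (start >= 0)
        printf("save data begins at offset %ld\n", start);

    free(buf);
    return 0;
}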

Efficient conversion of int to string

I've seen several questions/answers here that suggest the best way to get a string representation of an integer in Objective-C is to use [NSString stringWithFormat:@"%d", x]. I'm afraid the C/C++ programmer in me is having a hard time believing I want to bring all the formatting code into play for such a simple task. That is, I assume stringWithFormat needs to parse through the format string looking for all the different type specifiers, field widths, and options that I could possibly use; then it has to interpret that variable-length list of parameters and use the format specifier to coerce x to the appropriate type; then go through a lengthy process of conversion, accounting for signed/unsigned values and negation along the way.
Needless to say in C/C++ I could simply use itoa(x) which does exactly one thing and does it extremely efficiently.
I'm not interested in arguing the relative merits of one language over another, but rather just asking the question: is the incredibly powerful [NSString stringWithFormat:@"%d", x] really the most efficient way to do this very, very simple task in Objective-C? It seems like I'm cracking a peanut with a sledgehammer.
You could use itoa() followed by any of +[NSString stringWithUTF8String:], -[NSString initWithBytes:length:encoding:], or +[NSString stringWithCString:encoding:] if it makes you feel better, but I wouldn't worry about it unless you're sure this is a performance problem.
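If you do go that route, note that itoa() is not part of standard C, but a hand-rolled equivalent is only a few lines. A minimal sketch (the function name is made up; the output buffer must hold at least 12 bytes for a 32-bit int):

/* Write the decimal representation of value into out. Digits are
   generated in reverse and copied back in order; INT_MIN is handled
   via unsigned arithmetic. */
static void int_to_str(int value, char *out)
{
    char tmp[12];
    int i = 0;
    unsigned v = (value < 0) ? -(unsigned)value : (unsigned)value;
    do {
        tmp[i++] = (char)('0' + v % 10);
        v /= 10;
    } while (v > 0);
    if (value < 0)
        tmp[i++] = '-';
    while (i > 0)
        *out++ = tmp[--i];
    *out = '\0';
}

The resulting buffer can then be handed to +[NSString stringWithUTF8String:].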
You could also use the description method: it boxes the int as an NSNumber and converts it to an NSString.
int intVariable = 1;
NSString* stringRepresentation = [@(intVariable) description];

Obfuscated C# Code - What is the balance between concision and clarity?

Years ago there used to be a contest to see who could produce the most obfuscated C code, and some of the results were dramatically unreadable. C was like that. You could really screw things up with the preprocessor in particular.
However, many of the newer features of C# offer an amazing opportunity to obfuscate code, too. I was wondering if anyone had an opinion on finding the right balance between concision and clarity in code. Let me offer one example for discussion: the task of filling items into a ListView. (Yes, I know you can do it with data binding, but go with me here.)
The control has two columns, to be filled from an array of:
struct Person
{
    public string name;
    public string address;
};
One clear and simple way is this:
private void Fill(Person[] people)
{
    foreach (Person person in people)
    {
        string[] columns = new string[2];
        columns[0] = person.name;
        columns[1] = person.address;
        ListViewItem item = new ListViewItem(columns);
        listView1.Items.Add(item);
    }
}
Clear and simple to understand.
I could also write it like this:
private void Fill(Person[] people)
{
    foreach (Person person in people)
    {
        string[] columns = new string[] { person.name, person.address };
        ListViewItem item = new ListViewItem(columns);
        listView1.Items.Add(item);
    }
}
or even:
private void Fill(Person[] people)
{
    foreach (var person in people) // note the implicit typing here
    {
        listView1.Items.Add(new ListViewItem(
            new string[] { person.name, person.address }));
    }
}
Finally, I could also write it like this:
private void Fill(Person[] people)
{
    Array.ForEach(people, person =>
        listView1.Items.Add(new ListViewItem(
            new string[] { person.name, person.address })));
}
Each uses various new features of the language to a greater or lesser extent. How do you find the balance between concision and clarity? Should we have an annual Obfuscated C# contest?
You know what's hard? Writing code that others can read and maintain. Any idiot can write code that compiles and is impossible to maintain.
Always favor maintainability: that's how you find the balance.
Edit:
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
Martin Fowler, Refactoring: Improving the Design of Existing Code
Thanks to roygbiv for finding the above quote. Apologies to Fowler for murdering his quote; I knew I'd read it before, I just couldn't remember where.
Stuffing everything into one line doesn't make it "obfuscated" -- it just makes you scroll a lot unnecessarily. It would still be trivial for anyone who knows C# to understand any of the examples you presented, and if you used linebreaks, none would really be much better or worse than the others.
Code for maximum readability, but:
Remember that superfluous verbosity and syntactic noise hurt readability. More conciseness can coincide with improved readability if the more concise notation allows you to express your intent more directly. For example, compare real lambda functions to simulating them with single-method interfaces.
Assume that other people who read your code are decent programmers and know the language you're working in. Don't assume a language lawyer level of knowledge, but assume a good working knowledge. Don't code to the lowest common denominator because, while it may make your code more maintainable by code monkeys, it will annoy both you and maintenance programmers who actually know what they're doing.
More specifically, example 1 has way too much syntactic noise for something so simple. Example 4 is very difficult for a human to parse. I'd say 2 and 3 are both pretty good, though in the case of example 3 I'd reformat it a little, just to make it easier for a human to parse all the function call nesting:
private void Fill(Person[] people)
{
    foreach (var person in people)
    {
        listView1.Items.Add(
            new ListViewItem(
                new string[] { person.name, person.address }
            )
        );
    }
}
Now you have the best of both worlds: It can be easily parsed by humans and doesn't have any superfluous, unnecessarily verbose temporary variables.
Edit: I also think that using implicit variable typing, i.e. var, is fine most of the time. People write perfectly readable code in dynamic languages where implicit typing is the only kind of typing, and most of the time the formal types of your variables are a low-level detail that has little to do with the high-level intent of your code.
At least with respect to the example that you provide here, I don't really think obfuscation increases as you proceed until you get to the last one. Even there, the only reason for any ambiguity is the presence of the lambda, and that just takes some getting used to. So a newbie might have trouble with the last, but shouldn't find the others unreadable in the way that the old wild C competition entries were unreadable.
The difference is that these C# examples are all at the same level of abstraction - the more concise examples just remove "fluff." In C, you have the opportunity for ambiguity due to A) arbitrarily renamed/aliased constructs and B) several levels of memory access bundled into one statement.
On the whole, then, you can write obscure code in ANY language, but I don't think that C# is prone to it like C is; indeed, I think it is a clearer language than many, even when using some of the more advanced constructs.
The C# and VB.NET languages were designed more for clarity because they operate at a higher level than C. C is programming close to the metal, so to speak. It's not possible, by design, to write obfuscated C# like that of C.
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
Martin Fowler, Refactoring: Improving the Design of Existing Code
I do not find the example obfuscated.
First, the intent is so painfully clear that a newbie is more likely to learn what a lambda does than to not understand the code. Which is the perfect place to use the more "advanced" techniques: where even someone who doesn't have a clue what they do understands what they "should" do.
Second, all of the above are not only not obfuscated, they are perfectly idiomatic C#. The last one arguably less so, because of the not-so-widely-used Array.ForEach, where most people (that I have worked with) would use LINQ.

Why is the 'if' statement considered evil?

I just came from the Simple Design and Testing Conference. In one of the sessions we were talking about evil keywords in programming languages. Corey Haines, who proposed the subject, was convinced that the if statement is absolutely evil. His alternative was to create functions with predicates. Can you please explain to me why if is evil?
I understand that you can write very ugly code abusing if. But I don't believe that it's that bad.
The if statement is rarely considered as "evil" as goto or mutable global variables -- and even those are actually not universally and absolutely evil. I would suggest taking the claim as a bit hyperbolic.
It also largely depends on your programming language and environment. In languages which support pattern matching, you will have great tools for replacing if at your disposal. But if you're programming a low-level microcontroller in C, replacing ifs with function pointers will be a step in the wrong direction. So, I will mostly consider replacing ifs in OOP programming, because in functional languages, if is not idiomatic anyway, while in purely procedural languages you don't have many other options to begin with.
Nevertheless, conditional clauses sometimes result in code which is harder to manage. This does not only include the if statement, but even more commonly the switch statement, which usually includes more branches than a corresponding if would.
There are cases where it's perfectly reasonable to use an if
When you are writing utility methods, extensions or specific library functions, it's likely that you won't be able to avoid ifs (and you shouldn't try). There isn't a better way to code this little function, nor to make it more self-documenting than it is:
// this is a good "if" use-case
int Min(int a, int b)
{
    if (a < b)
        return a;
    else
        return b;
}

// or, if you prefer the ternary operator
int Min(int a, int b)
{
    return (a < b) ? a : b;
}
Branching over a "type code" is a code smell
On the other hand, if you encounter code which tests for some sort of a type code, or tests if a variable is of a certain type, then this is most likely a good candidate for refactoring, namely replacing the conditional with polymorphism.
The reason for this is that by allowing your callers to branch on a certain type code, you create the possibility of ending up with numerous checks scattered all over your code, making extension and maintenance much more complex. Polymorphism, on the other hand, allows you to bring this branching decision as close to the root of your program as possible.
Consider:
// this is called branching on a "type code",
// and screams for refactoring
void RunVehicle(Vehicle vehicle)
{
    // how the hell do I even test this?
    if (vehicle.Type == CAR)
        Drive(vehicle);
    else if (vehicle.Type == PLANE)
        Fly(vehicle);
    else
        Sail(vehicle);
}
By placing common but type-specific (i.e. class-specific) functionality into separate classes and exposing it through a virtual method (or an interface), you allow the internal parts of your program to delegate this decision to someone higher in the call hierarchy (potentially at a single place in code), allowing much easier testing (mocking), extensibility and maintenance:
// adding a new vehicle is gonna be a piece of cake
interface IVehicle
{
    void Run();
}

// your method now doesn't care about which vehicle
// it got as a parameter
void RunVehicle(IVehicle vehicle)
{
    vehicle.Run();
}
And you can now easily test if your RunVehicle method works as it should:
// you can now create test (mock) implementations
// since you're passing it as an interface
var mock = new Mock<IVehicle>();
// run the client method
something.RunVehicle(mock.Object);
// check if Run() was invoked
mock.Verify(m => m.Run(), Times.Once());
Patterns which only differ in their if conditions can be reused
Regarding the argument about replacing if with a "predicate" in your question, Haines probably meant that sometimes similar patterns exist across your code, differing only in their conditional expressions. Conditional expressions do emerge in conjunction with ifs, but the whole idea is to extract a repeating pattern into a separate method, leaving the expression as a parameter. This is what LINQ already does, usually resulting in cleaner code compared to the alternative foreach:
Consider these two very similar methods:
// average male age
public double AverageMaleAge(List<Person> people)
{
    double sum = 0.0;
    int count = 0;
    foreach (var person in people)
    {
        if (person.Gender == Gender.Male)
        {
            sum += person.Age;
            count++;
        }
    }
    return sum / count; // not checking for zero div. for simplicity
}
// average female age
public double AverageFemaleAge(List<Person> people)
{
    double sum = 0.0;
    int count = 0;
    foreach (var person in people)
    {
        if (person.Gender == Gender.Female) // <-- only the expression
        {                                   //     is different
            sum += person.Age;
            count++;
        }
    }
    return sum / count;
}
This indicates that you can extract the condition into a predicate, leaving you with a single method for these two cases (and many other future cases):
// average age for all people matched by the predicate
public double AverageAge(List<Person> people, Predicate<Person> match)
{
    double sum = 0.0;
    int count = 0;
    foreach (var person in people)
    {
        if (match(person)) // <-- the decision to match
        {                  //     is now delegated to callers
            sum += person.Age;
            count++;
        }
    }
    return sum / count;
}
var males = AverageAge(people, p => p.Gender == Gender.Male);
var females = AverageAge(people, p => p.Gender == Gender.Female);
And since LINQ already has a bunch of handy extension methods like this, you actually don't even need to write your own methods:
// replace everything we've written above with these two lines
var males = people.Where(p => p.Gender == Gender.Male).Average(p => p.Age);
var females = people.Where(p => p.Gender == Gender.Female).Average(p => p.Age);
In this last LINQ version the if statement has "disappeared" completely, although:
to be honest the problem wasn't in the if by itself, but in the entire code pattern (simply because it was duplicated), and
the if still actually exists, but it's written inside the LINQ Where extension method, which has been tested and is closed for modification. Having less code of your own is always a good thing: fewer things to test, fewer things to go wrong, and the code is simpler to follow, analyze and maintain.
Huge runs of nested if/else statements
When you see a function spanning 1000 lines and having dozens of nested if blocks, there is an enormous chance it can be rewritten to:
use a better data structure and organize the input data in a more appropriate manner (e.g. a hashtable, which will map one input value to another in a single call; see the sketch after this list),
use a formula, a loop, or sometimes just an existing function which performs the same logic in 10 lines or less (e.g. this notorious example comes to my mind, but the general idea applies to other cases),
use guard clauses to prevent nesting (guard clauses give more confidence into the state of variables throughout the function, because they get rid of exceptional cases as soon as possible),
at least replace with a switch statement where appropriate.
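As a sketch of the first point, in C (the digit-to-name mapping is made up): a chain of ifs collapses into a single table lookup.

/* Instead of: if (d == 0) return "zero"; else if (d == 1) return "one"; ... */
static const char *digit_names[] = {
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine"
};

static const char *digit_name(int d)
{
    return (d >= 0 && d <= 9) ? digit_names[d] : "unknown";
}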
Refactor when you feel it's a code smell, but don't over-engineer
Having said all this, you should not spend sleepless nights over having a couple of conditionals here and there. While these answers can provide some general rules of thumb, the best way to become able to detect constructs which need refactoring is through experience. Over time, some patterns emerge that result in modifying the same clauses over and over again.
There is another sense in which if can be evil: when it comes instead of polymorphism.
E.g.
if (animal.isFrog()) croak(animal);
else if (animal.isDog()) bark(animal);
else if (animal.isLion()) roar(animal);
instead of
animal.emitSound()
But basically if is a perfectly acceptable tool for what it does. It can be abused and misused of course, but it is nowhere near the status of goto.
A good quote from Code Complete:
Code as if whoever maintains your program is a violent psychopath who knows where you live.
— Anonymous
IOW, keep it simple. If the readability of your application will be enhanced by using a predicate in a particular area, use it. Otherwise, use the 'if' and move on.
I think it depends on what you're doing to be honest.
If you have a simple if..else statement, why use a predicate?
If you can, use a switch for larger if replacements; and if you have the option to use a predicate for larger operations (where it makes sense; otherwise your code will be a nightmare to maintain), use it.
This guy seems to have been a bit pedantic for my liking. Replacing all ifs with predicates is just crazy talk.
There is the Anti-If campaign, which started earlier in the year. Its main premise is that many nested if statements can often be replaced with polymorphism.
I would be interested to see an example of using the Predicate instead. Is this more along the lines of functional programming?
Just like in the bible verse about money, if statements are not evil -- the LOVE of if statements is evil. A program without if statements is a ridiculous idea, and using them as necessary is essential. But a program that has 100 if-else if blocks in a row (which, sadly, I have seen) is definitely evil.
I have to say that I recently have begun to view if statements as a code smell: especially when you find yourself repeating the same condition several times. But there's something you need to understand about code smells: they don't necessarily mean that the code is bad. They just mean that there's a good chance the code is bad.
For instance, comments are listed as a code smell by Martin Fowler, but I wouldn't take anyone seriously who says "comments are evil; don't use them".
Generally though, I prefer to use polymorphism instead of if statements where possible. That just makes for so much less room for error. I tend to find that a lot of the time, using conditionals leads to a lot of tramp arguments as well (because you have to pass the data needed to form the conditional on to the appropriate method).
if is not evil (I also hold that assigning morality to code-writing practices is asinine...).
Mr. Haines is being silly and should be laughed at.
I'll agree with you; he was wrong. You can go too far with things like that, too clever for your own good.
Code created with predicates instead of ifs would be horrendous to maintain and test.
Predicates come from logical/declarative programming languages, like PROLOG. For certain classes of problems, like constraint solving, they are arguably superior to a lot of drawn out step-by-step if-this-do-that-then-do-this crap. Problems that would be long and complex to solve in imperative languages can be done in just a few lines in PROLOG.
There's also the issue of scalable programming (due to the move towards multicore, the web, etc.). If statements and imperative programming in general tend to be in step-by-step order, and not scalable. Logical declarations and the lambda calculus, though, describe how a problem can be solved and what pieces it can be broken down into. As a result, the interpreter/processor executing that code can efficiently break the code into pieces and distribute it across multiple CPUs/cores/threads/servers.
Definitely not useful everywhere; I'd hate to try writing a device driver with predicates instead of if statements. But yes, I think the main point is probably sound, and worth at least getting familiar with, if not using all the time.
The only problem with predicates (in terms of replacing if statements) is that you still need to test them:
void Test(Predicate<int> pr, int num)
{
    if (pr(num))
    { /* do something */ }
    else
    { /* do something else */ }
}
You could of course use the ternary operator (?:), but that's just an if statement in disguise...
Perhaps with quantum computing it will be a sensible strategy to not use IF statements but to let each leg of the computation proceed and only have the function 'collapse' at termination to a useful result.
Sometimes it's necessary to take an extreme position to make your point. I'm sure this person uses if -- but every time you use an if, it's worth having a little think about whether a different pattern would make the code clearer.
Preferring polymorphism to if is at the core of this. Rather than:
if (animaltype == bird) {
    squawk();
} else if (animaltype == dog) {
    bark();
}
... use:
animal.makeSound();
But that supposes that you've got an Animal class/interface -- so really what the if is telling you, is that you need to create that interface.
So in the real world, what sort of ifs do we see that lead us to a polymorphism solution?
if (logging) {
    log.write("Did something");
}
That's really irritating to see throughout your code. How about, instead, having two (or more) implementations of Logger?
this.logger = new NullLogger(); // logger.log() does nothing
this.logger = new StdOutLogger(); // logger.log() writes to stdout
That leads us to the Strategy Pattern.
Instead of:
if (user.getCreditRisk() > 50) {
    decision = thoroughCreditCheck();
} else if (user.getCreditRisk() > 20) {
    decision = mediumCreditCheck();
} else {
    decision = cursoryCreditCheck();
}
... you could have ...
decision = getCreditCheckStrategy(user.getCreditRisk()).decide();
Of course getCreditCheckStrategy() might contain an if -- and that might well be appropriate. You've pushed it into a neat place where it belongs.
It probably comes down to a desire to keep code cyclomatic complexity down, and to reduce the number of branch points in a function. If a function is simple to decompose into a number of smaller functions, each of which can be tested, you can reduce the complexity and make code more easily testable.
IMO:
I suspect he was trying to provoke a debate and make people think about the misuse of if. No one would seriously suggest that such a fundamental construct of programming syntax should be completely avoided, would they?
Good that in ruby we have unless ;)
But seriously: if is probably the next goto, in that even though most people think it is evil in some cases, it simplifies or speeds things up (and in some cases, such as low-level, highly optimized code, it's a must).
I think If statements are evil, but If expressions are not. What I mean by an if expression in this case can be something like the C# ternary operator (condition ? trueExpression : falseExpression). This is not evil because it is a pure function (in a mathematical sense). It evaluates to a new value, but it has no effects on anything else. Because of this, it works in a substitution model.
Imperative If statements are evil because they force you to create side-effects when you don't need to. For an If statement to be meaningful, you have to produce different "effects" depending on the condition expression. These effects can be things like IO, graphic rendering or database transactions, which change things outside of the program. Or, it could be assignment statements that mutate the state of the existing variables. It is usually better to minimize these effects and separate them from the actual logic. But, because of the If statements, we can freely add these "conditionally executed effects" everywhere in the code. I think that's bad.
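To make the distinction concrete, here is a small C sketch; the expression form evaluates directly to a value, while the statement form has to communicate through a side effect (the assignment to max2):

int a = 3, b = 7;

/* if-expression (ternary): a pure expression that yields a value */
int max1 = (a > b) ? a : b;

/* if-statement: each branch must produce an effect, here a mutation */
int max2;
if (a > b)
    max2 = a;
else
    max2 = b;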
If is not evil! Consider ...
int sum(int a, int b) {
    return a + b;
}
Boring, eh? Now with an added if ...
int sum(int a, int b) {
    if (a == 0 && b == 0) {
        return 0;
    }
    return a + b;
}
... your code creation productivity (measured in LOC) is doubled.
Also, code readability has improved much, for now you can see in the blink of an eye what the result is when both arguments are zero. You couldn't do that in the code above, could you?
Moreover, you have supported the test team, for they can now push their code-coverage tools closer to the limits.
Furthermore, the code is now better prepared for future enhancements. Let's say, for example, that the sum should be zero if one of the arguments is zero (don't laugh and don't blame me; silly customer requirements, you know, and the customer is always right).
Because the if was there in the first place, only a slight code change is needed:
int sum(int a, int b) {
    if (a == 0 || b == 0) {
        return 0;
    }
    return a + b;
}
How much more code change would have been needed if you hadn't invented the if right from the start?
Thankfulness will be yours on all sides.
Conclusion: There's never enough if's.
There you go.

What are some advantages of duck-typing vs. static typing?

I'm researching and experimenting more with Groovy, and I'm trying to wrap my mind around the pros and cons of implementing things in Groovy that I can't/don't do in Java. Dynamic programming is still just a concept to me, since I've been deeply steeped in statically and strongly typed languages.
Groovy gives me the ability to duck-type, but I can't really see the value. How is duck-typing more productive than static typing? What kind of things can I do in my code practice to help me grasp the benefits of it?
I ask this question with Groovy in mind but I understand it isn't necessarily a Groovy question so I welcome answers from every code camp.
A lot of the comments for duck typing don't really substantiate the claims. Not "having to worry" about a type is not sustainable for maintenance or for making an application extendable. I've had a really good opportunity to see Grails in action over my last contract, and it's quite funny to watch, really. Everyone is happy about the gains in being able to "create-app" and get going -- sadly it all catches up to you on the back end.
Groovy seems the same way to me. Sure, you can write very succinct code, and definitely there is some nice sugar in how we get to work with properties, collections, etc... But the cost of not knowing what the heck is being passed back and forth just gets worse and worse. At some point you're scratching your head wondering why the project has become 80% testing and 20% work. The lesson here is that "smaller" does not make for "more readable" code. Sorry folks, it's simple logic: the more you have to know intuitively, the more complex the process of understanding that code becomes. It's why GUIs have backed off from becoming overly iconic over the years: sure it looks pretty, but WTH is going on is not always obvious.
People on that project seemed to have trouble "nailing down" the lessons learned, but when you have methods returning either a single element of type T, an array of T, an ErrorResult, or a null ... it becomes rather apparent.
One thing working with Groovy has done for me, however: awesome billable hours, woot!
Duck typing cripples most modern IDEs' static checking, which can point out errors as you type. Some consider this an advantage. I want the IDE/compiler to tell me I've made a stupid programmer trick as soon as possible.
My most recent favorite argument against duck typing comes from a Grails project DTO:
class SimpleResults {
    def results
    def total
    def categories
}
where results turns out to be something like Map<String, List<ComplexType>>, which can be discovered only by following a trail of method calls through different classes until you find where it was created. For the terminally curious, total is the sum of the sizes of the List<ComplexType>s, and categories is the size of the Map.
It may have been clear to the original developer, but the poor maintenance guy (ME) lost a lot of hair tracking this one down.
It's a little bit difficult to see the value of duck typing until you've used it for a little while. Once you get used to it, you'll realize how much of a load off your mind it is to not have to deal with interfaces or having to worry about exactly what type something is.
Next, which is better: EMACS or vi? This is one of the running religious wars.
Think of it this way: any program that is correct will be correct if the language is statically typed. What static typing does is let the compiler have enough information to detect type mismatches at compile time instead of at run time. This can be an annoyance if you're doing incremental sorts of programming, although (I maintain) if you're thinking clearly about your program it doesn't much matter; on the other hand, if you're building a really big program, like an operating system or a telephone switch, with dozens or hundreds or thousands of people working on it, or with really high reliability requirements, then having the compiler be able to detect a large class of problems for you, without needing a test case to exercise just the right code path, is a big win.
It's not as if dynamic typing is a new and different thing: C, for example, is effectively dynamically typed, since I can always cast a foo* to a bar*. It just means it's then my responsibility as a C programmer never to use code that is appropriate for a bar* when the address is really pointing to a foo*. But as a result of the issues with large programs, C grew tools like lint(1), strengthened its type system with typedef, and eventually developed a strongly typed variant in C++. (And, of course, C++ in turn developed ways around the strong typing, with all the varieties of casts and generics/templates and with RTTI.)
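For instance (a minimal sketch), C happily compiles the following, and it then becomes the programmer's job never to dereference the result:

struct foo { int x; };
struct bar { double y; };

struct foo f = { 42 };
/* accepted without complaint; reading b->y would reinterpret (and
   overrun) f's bytes, which is undefined behavior */
struct bar *b = (struct bar *)&f;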
One other thing, though --- don't confuse "agile programming" with "dynamic languages". Agile programming is about the way people work together on a project: can the project adapt to changing requirements to meet the customers' needs while maintaining a humane environment for the programmers? It can be done with dynamically typed languages, and often is, because they can be more productive (e.g., Ruby, Smalltalk), but it can be done, and has been done successfully, in C and even assembler. In fact, Rally Development even uses agile methods (SCRUM in particular) to do marketing and documentation.
There is nothing wrong with static typing if you are using Haskell, which has an incredible static type system. However, if you are using languages like Java and C++ that have terribly crippling type systems, duck typing is definitely an improvement.
Imagine trying to use something as simple as "map" in Java (and no, I don't mean the data structure). Even generics are rather poorly supported.
With TDD + 100% code coverage + IDE tools to constantly run my tests, I do not feel a need for static typing any more. With no strong types, my unit testing has become so easy (simply use Maps for creating mock objects). Especially when you are using generics, you can see the difference:
//Static typing
Map<String,List<Class1<Class2>>> someMap = [:] as HashMap<String,List<Class1<Class2>>>
vs
//Dynamic typing
def someMap = [:]
IMHO, the advantage of duck typing becomes magnified when you adhere to some conventions, such as naming your variables and methods in a consistent way. Taking the example from Ken G, I think it would read best:
class SimpleResults {
    def mapOfListResults
    def total
    def categories
}
Let's say you define a contract on some operation named 'calculateRating(A,B)' where A and B adhere to another contract. In pseudocode, it would read:
Long calculateRating(A someObj, B otherObj) {
    // some fake algorithm here:
    if (someObj.doStuff('foo') > otherObj.doStuff('bar'))
        return someObj.calcRating();
    else
        return otherObj.calcRating();
}
If you want to implement this in Java, both A and B must implement some kind of interface that reads something like this:
public interface MyService {
    public int doStuff(String input);
}
Besides, if you want to generalize your contract for calculating ratings (let's say you have another algorithm for rating calculations), you also have to create an interface:
public long calculateRating(MyService A, MyService B);
With duck typing, you can ditch your interfaces and just rely on the fact that, at runtime, both A and B will respond correctly to your doStuff() calls. There is no need for a specific contract definition. This can work for you, but it can also work against you.
The downside is that you have to be extra careful in order to guarantee that your code does not break when some other person changes it (i.e., the other person must be aware of the implicit contract on the method name and arguments).
Note that this is especially aggravated in Java, where the syntax is not as terse as it could be (compared to Scala, for example). A counter-example of this is the Lift framework, where they say that the SLOC count of the framework is similar to Rails, but the test code has fewer lines, because they don't need to implement type checks within the tests.
Here's one scenario where duck typing saves work.
Here's a very trivial class
class BookFinder {
    def searchEngine

    def findBookByTitle(String title) {
        return searchEngine.find( [ "Title" : title ] )
    }
}
Now for the unit test:
void bookFinderTest() {
    // with Expando we can 'fake' any object at runtime.
    // alternatively you could write a MockSearchEngine class.
    def mockSearchEngine = new Expando()
    mockSearchEngine.find = {
        return new Book("Heart of Darkness", "Joseph Conrad")
    }

    def bf = new BookFinder()
    bf.searchEngine = mockSearchEngine

    def book = bf.findBookByTitle("Heart of Darkness")
    assert(book.author == "Joseph Conrad")
}
We were able to substitute an Expando for the SearchEngine, because of the absence of static type checking. With static type checking we would have had to ensure that SearchEngine was an interface, or at least an abstract class, and create a full mock implementation of it. That's labour intensive, or you can use a sophisticated single-purpose mocking framework. But duck typing is general-purpose, and has helped us.
Because of duck typing, our unit test can provide any old object in place of the dependency, just as long as it implements the methods that get called.
To emphasise - you can do this in a statically typed language, with careful use of interfaces and class hierarchies. But with duck typing you can do it with less thinking and fewer keystrokes.
That's an advantage of duck typing. It doesn't mean that dynamic typing is the right paradigm to use in all situations. In my Groovy projects, I like to switch back to Java in circumstances where I feel that compiler warnings about types are going to help me.
To me, they aren't horribly different if you see dynamically typed languages as simply a form of static typing where everything inherits from a sufficiently abstract base class.
Problems arise when, as many have pointed out, you start getting strange with this. Someone pointed out a function that returns a single object, a collection, or a null. Have the function return a specific type, not multiple. Use multiple functions for single vs collection.
What it boils down to is that anyone can write bad code. Static typing is a great safety device, but sometimes the helmet gets in the way when you want to feel the wind in your hair.
It's not that duck typing is more productive than static typing so much as it is simply different. With static typing, you always have to worry that your data is the correct type, and in Java this shows up through casting to the right type. With duck typing, the type doesn't matter as long as it has the right method, so it really just eliminates a lot of the hassle of casting and conversion between types.
