Should Programmers Use Decompilers? - decompiling

Lately I've been listening to Jeff Atwood and Joel Spolsky's radio show, and they have been talking about dogfooding (the process of reusing your own code; see Jeff Atwood's blog post). So my question is: should programmers use decompilers to see how another programmer's code is implemented and works, to make sure it won't break their own code? Or should you just trust that programmer's code and adapt to it, because using decompilers goes against everything we as programmers have ever learned about hiding data (well, OO programmers at least)?
Note: I wasn't sure which tags this would go under so feel free to retag it.
Edit: Just to clarify I was asking about decompilers as a last resort, say you can't get the source code for some reason. Sorry, I should have supplied this in the original question.

Yes, it can be useful to use the output of a decompiler, but not for what you suggest. The output of a compiler doesn't ever look much like what a human would write (except when it does). It can't tell you why the code does what it does, or what a particular variable should mean. It's unlikely to be worth the trouble to do this unless you already have the source.
If you do have the source, then there are lots of good reasons to use a decompiler in your development process.
Most often, the reason for using the output of a decompiler is to better optimize code. Sometimes, at high optimization settings, a compiler will just get it wrong. In some cases this is almost impossible to sort out without comparing the compiler's output at different optimization levels.
Other times, when trying to squeeze the most performance out of a very hot code path, a developer can try arranging their code in a few different ways and compare the compiled results. As a last resort, this may be the simplest way to start when implementing a code block in assembly language, by duplicating the compiler's output.

Dogfooding is the process of using the code that you write, not necessarily re-using code.
However, code re-use typically means you have the source, hence 'code re-use'; otherwise it's just using a library supplied by someone else.
Decompiling is hard to get right, and the output is typically very hard to follow.

You should use a decompiler if it is the tool that's required to get the job done. However, I don't think it's the proper use of a decompiler to get an idea of how well the code which is being decompiled was written. Depending on the language you use, the decompiled code can be very different from the code which was actually written. If you want to see some real code, look at open source code. If you want to see the code of some particular product, it's probably better to try to get access to the actual code through some legal means.

I'm not sure exactly what you are asking, what you expect "decompilers" to show you, or what this has to do with Atwood and Spolsky. If you're programming to public interfaces, why would you need to see the original source of the third-party code to see if it will "break" your code? You could more effectively build tests to determine this. Also, what the "decompiler" will tell you largely depends on the language/platform the software was written in, whether it is Java, .NET, C and so forth. It's not the same as having the original source to read, even in the case of .NET assemblies. Anyway, if you are worried about third-party code not working for you, then you should really be running typical unit tests against the code rather than trying to "decompile" it. As for whether you "should": if you mean "should" in some sense other than what would be the best use of your time, then I'm not sure what you mean.
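To make the "build tests instead" suggestion concrete, here is a minimal sketch of tests that pin down the library behaviour your own code depends on. It uses Python's standard `json` module purely as a stand-in for a third-party library; the class name `ThirdPartyContractTest` and the two behaviours checked are assumptions chosen for illustration.

```python
import json
import unittest


class ThirdPartyContractTest(unittest.TestCase):
    """Pins down the library behaviour our own code relies on.

    `json` stands in for any third-party library (an assumption for this sketch).
    """

    def test_round_trip_preserves_nested_data(self):
        payload = {"id": 7, "tags": ["a", "b"], "meta": {"ok": True}}
        self.assertEqual(json.loads(json.dumps(payload)), payload)

    def test_sorted_keys_give_stable_output(self):
        # Our (hypothetical) code compares serialized strings, so key order matters.
        text = json.dumps({"b": 1, "a": 2}, sort_keys=True)
        self.assertEqual(text, '{"a": 2, "b": 1}')
```

Running such a file with `python -m unittest` after every upgrade of the library tells you whether the behaviour you depend on still holds, without reading a line of decompiled output.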

Should Programmers Use Decompilers?
Use the right tool for the right job. Decompilers don't often produce results that are easy to understand, but sometimes they are what's needed.
should programmers use decompilers to
see how that programmers code is
implemented and works, to make sure it
won't break your code.
No, not unless you find a problem and need support. In general you don't use it if you don't trust it, and if you have to use it even when you don't trust it, you develop tests to prove the functionality and verify that later upgrades still work as expected.
Don't use functionality you don't test, unless you have very good support or a relationship of trust.
-Adam

Or should you just trust that programmers code and adapt to it because using decompilers go against everything we as programmers have ever learn about hiding data (well OO programmers at least)?
This is not true at all. You would use a decompiler not because you want to get around any sort of abstraction or encapsulation, or to defeat OO principles, but because you want to better understand why the code is behaving the way it is.
Sometimes you need to use a decompiler (or in the Java world, a bytecode viewer) when you are troubleshooting an annoying bug with a 3rd party library where an exception is thrown with no useful error message, no logging, etc.
Use of a decompiler has nothing to do with OO principles.

The short answer to this... Program to a public and documented specification, not to an implementation. Relying on implementation specifics and side-effects will burn you.
Decompilation is not a tool to help you program correctly, though it might, in a pinch, assist you in understanding a problem with someone else's code for which you don't have source.
Also, beware of the possible legal risk of decompiling; many software companies have no-decompile clauses which could expose you and your employer to legal consequences.

Related

How should dangerous code snippets be published?

When discussing (asking/answering questions about, writing blog posts about, etc.) some programming matters, it may be desirable to give source code examples of what you're talking about; but in some cases these snippets may be dangerous, not because they are directly harmful but because they seem to work at first but only set up for problems later. Two examples would be when discussing concurrency issues, where the code works most of the time but rarely and non-deterministically fails, and when discussing security issues, where the code seems to work but can in fact be exploited; and there could be other examples.
It is necessary to be able to discuss such issues, to foster awareness of them at least. However, I am always worried that someone will come from a search engine, barely read the post, copy and paste the snippet and use it for something; more subtly, someone may read the post, try out the code in a test project and confirm it can indeed be exploited (as he is encouraged to do), then some time later reuse the dangerous code, as he has forgotten the code is dangerous and there is no longer a blog post explaining why the code is dangerous around the snippet.
So I am wondering how to mark such code so that no part of it somehow makes it to production (or if it ever does, then the responsible party could not plausibly deny awareness).
One way I came up with is to put:
an #error (or similar) directive inside each of the functions, as well as
repeated comments warning of the dangerousness of the code (since someone who will try out the code in a test project to confirm the issue will have removed the #error directive).
But since these comments would only clutter up the snippet when reading on the web, I make them the same color as the background (or at least I am trying to; see how I put it in action here, I incidentally have a question on doctype.com asking how to best do this).
If that seems completely overkill, remember that concurrency (and security) issues are very dangerous so I want to do all I can (within reason) to prevent my snippets from causing issues in real software; I am sometimes comparing this to fissile material handling.
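For languages without a preprocessor, the same two-layer idea (a hard stop plus repeated warnings) can be approximated at run time. The following is only a sketch in Python, not the C-style `#error` described above; the names `DangerousSnippetError`, `require_demo_opt_in`, `unsafe_counter_increment`, and the environment variable are all hypothetical.

```python
import os


class DangerousSnippetError(RuntimeError):
    """Raised when demonstration-only code runs without an explicit opt-in."""


def require_demo_opt_in(env_var="I_KNOW_THIS_SNIPPET_IS_UNSAFE"):
    # WARNING: this guard protects deliberately flawed demo code.
    if os.environ.get(env_var) != "1":
        raise DangerousSnippetError(
            "This snippet is intentionally broken; see the article it came from."
        )


def unsafe_counter_increment(counter):
    require_demo_opt_in()
    # WARNING: DANGEROUS DEMO CODE. Non-atomic read-modify-write:
    # two threads can read the same value and lose an update.
    current = counter["value"]
    # WARNING: DANGEROUS DEMO CODE. Do not copy into real software.
    counter["value"] = current + 1
    return counter["value"]
```

The guard plays the role of the `#error` line, and the repeated comments remain even if someone deletes the guard to try the code out.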
(I honestly don't know whether it would be best suited for programmers.stackexchange.com or here, so I'm asking here first; feel free to move to programmers.stackexchange.com if it turns out it would be better there.)
You make a very good point and I think that you handle it pretty well right now.
However, the #error lines show up in the blog post for me, they are not white.
I think that you shouldn't worry so much about it being picked up by a feed or something like that. If the code is pulled away from the warning message on your blog, it's more important to have the #error lines visible.
But overall, I like your system. It might be a good idea for us, as programmers, to set some standard for this, though.
I would however add a link to the original post explaining why it is bad, too. That is way more important than just saying it is.
So to summarize: good idea, we should think of a standard. Make sure to include a link to a why.
Personally, yes, I think it's overkill.
I don't think you need to concern yourself with someone who extracts and uses the code without reading the context in which it's given. Such a programmer will likely be making so many other mistakes as to render using your code largely irrelevant.
In short they will have and be creating bigger problems.

Code obfuscation usage in various languages

I recently learned about code obfuscation. It's a nice thing to do when you have spare time, but I have a different question: why do it?
First, there are languages in which I am sure it's a great thing: interpreted ones, like PHP, JavaScript and many more. There it seems like a good and more secure thing.
Second, there are languages where this seems to me to have no real effect: all the languages compiled to native code. Take C for example. When compiled, all the variable names, function names, and most obfuscation techniques go away. If anything makes it into the native code, it would be things like recursion instead of for loops and so on, but the disassembled code will have disassembler-generated identifiers instead of names anyway, right?
And the last category is languages I am not quite sure about, and that's the main reason I ask. These languages would be Java, C# (.NET), and lastly Silverlight as used in WP7. I ask because I read an article stating that, for WP7 apps, code obfuscation helps protect code from being hacked. But I always thought of bytecode as being very similar to standard assembler code, and therefore again not carrying any information about the real pre-compilation variable names, function names, etc. So, where is the truth?
Do it if you want, but don't expect any determined person to be scared away by it. De-obfuscators exist, and people can read obfuscated code as well (just as there are people who can read optimized assembly and reconstruct the original C code). Code obfuscation might deter a person who is just curious, but not those who are serious about stealing your code. All it gives you is a false sense of security and no real one. Schneier aptly names this "security theater".
Yes, many modern languages that retain more information about the source can be obfuscated better than those that are compiled right to machine code. For the latter the compiler already does quite a good job with optimization. Your notion of bytecode being akin to traditional assembler is slightly wrong here, though. .NET bytecode especially retains enough metadata to reconstruct the original source almost exactly (see Reflector). What isn't retained in the assembly itself are the names of local variables, which live only in the debug symbols. But method and class names are still needed and retained, and parameter names are kept in the metadata as well.
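As a loose analogy only (Python is not one of the platforms asked about, and its bytecode format differs from .NET's), the following sketch shows how much naming information a bytecode-compiled function can carry around. The function `total_price` is a made-up example.

```python
def total_price(unit_price, quantity):
    subtotal = unit_price * quantity
    return subtotal


# The compiled code object still carries the function name and,
# in Python's case, the argument and local variable names too.
code = total_price.__code__
print(code.co_name)      # total_price
print(code.co_varnames)  # ('unit_price', 'quantity', 'subtotal')
```

Native machine code produced by a C compiler with symbols stripped has no equivalent of these tables, which is why bytecode platforms are the ones where renaming obfuscators have something to remove.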
Another issue you should be aware of: If you give your customers an obfuscated executable and your program crashes, make sure you have a way of getting the real stacktrace back instead of the obfuscated one. Saying "Sorry, I cannot determine the root cause of why my program killed hours of your work since I chose to obfuscate it" isn't going to cut it, I guess :-)
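One way to get the real stack trace back is to keep the renaming table the obfuscator produced and apply it in reverse to incoming crash reports. The sketch below assumes such a table is available as a plain dictionary; the function name `deobfuscate_trace`, the trace format, and the sample names are hypothetical, and real obfuscators ship their own mapping formats and tools.

```python
import re


def deobfuscate_trace(trace, name_map):
    """Replace obfuscated identifiers in a stack trace using a saved mapping."""
    if not name_map:
        return trace
    # Try longer obfuscated names first so "a.bc" is not shadowed by "a.b".
    keys = sorted(name_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: name_map[match.group(1)], trace)


# Hypothetical mapping saved at build time, and a trace from a customer.
mapping = {"a.b": "Invoice.calculate_total", "a.c": "Invoice.save"}
obfuscated = "at a.b(line 12)\nat a.c(line 40)"
print(deobfuscate_trace(obfuscated, mapping))
```

The important part is procedural rather than technical: archive the mapping for every released build, or the table needed for this step will not exist when a crash report arrives.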
Obfuscation is a common technique for mobile applications where you have hardware restrictions. Obfuscated code tends to have shorter identifiers and therefore smaller binaries.

How to protect your software code? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
How do you protect your software from illegal distribution?
Best practice to prevent software copy
Hypothetical situation:
Let's say I have built a software product from scratch and it does wonderful things. The only problem is that, once someone takes a look at the code, they will understand it very easily and they can easily build it themselves.
Now, the thing is that I built the code 100% from scratch, and it uses a mixture of API calls.
Nobody else is involved in the development of the code.
If I want to sell this product, what is the guarantee that someone much smarter than me will not reverse engineer the whole thing and come up with a better product?
Right now I am thinking of fragmenting the whole code. Adding lots of redundant code and tonnes of comments.
Is there any software which encrypts the software code, that will make debugging, troubleshooting, and understanding how the code works virtually impossible? and yet runs as usual? so that the developer can have peace of mind?
Very few things in a program are truly novel. Almost everything that you are likely to put into your code, someone else could invent on their own. Generally more easily than they could learn it by reading your code. Reading code is harder than writing it, and most programmers don't really like doing it anyway.
So it's much more likely that they will look at your app and think "I could do that" than "That's cool, I'm gonna read that code and then copy it!". Even if they understand it, you will still own the copyright, and you still get to market first.
I recommend that you just forget about it.
once someone takes a look at the
code, they will understand it very
easily and they can easily build it up
themselves.
So don't give anybody the source code.
If I want to sell this product, what
is the guarantee that someone much
smarter than me will reverse engineer
the whole thing and come up with
better product?
(a) So start selling it now and capture the market. Reverse engineering takes time, during which you are capturing market and 'mind-share'. (b) Put a provision in your licence agreement that prohibits reverse-engineering. (c) Make sure everybody who gets the product signs the agreement.
Right now I am thinking of fragmenting
the whole code. Adding lots of
redundant code and tonnes of comments.
That only has a point if you're going to distribute the source code. In which case nobody even needs to reverse-engineer. They have your source code. Don't give it to them.
Is there any software ...
There's lots of software that purports to do this job. However it is a technical solution to a business problem. All software can be reverse-engineered, because at some point or other it all has to be decrypted and de-obfuscated to the point where the CPU will understand it. At that point it is essentially plaintext. So no technical solution is formally speaking possible (short of something like code that executes in a tamper-proof HSM).
I will add that there is another business mechanism you can use to defend against business loss, which is what this is all about: price. Make the price so high that the licensees will value their copy and not permit it to be inspected, or make it so low that reverse-engineering is cost-infeasible; or make it free and make your money on the support contract.
Once you actually have the knowledge and experience to write such a codebase, it will be clear to you that obfuscation is meant to deter casual IP infringement.
Someone who wants to know your code is going to know your code.
If it becomes an issue of monetary loss, the courts are your protection.
That's how it works.
Someone will always be able to understand and work out your code. Heck, even if they had no way of getting to the code, just using the system is enough for someone to be able to replicate the process.
Example: I take a jug of water and pour it into a cup while my back is turned to another person. This other person knows that water and gravity are awesome at making things fall into other containers, so they can work out a process of lifting a jug to let gravity (the API call) work in their favour. They mightn't know exactly what angle I used in my forearm or any super-sneaky cup-holding techniques I used, but they can replicate the same process and improve on it over time.
tl;dr: You can't protect code.
The thing to do is invent even more wonderful things while the competition is reverse-engineering your current stuff. It's called competing through innovation.
I am not a lawyer
If you are really worried about it, to the point that you are willing to invest money in it, don't protect your code (beyond something reasonable like obfuscation or encryption) but rather patent your idea and your art. Then if someone does take it, reverse engineer it and make a better process based on yours, you have legal grounds to get your money.
There are tons of things you will have to do, including proving they took your idea (which isn't easy), but if this is the solution to world hunger and all of humanity's problems, it's the thing to do.
Now for the downside, I will guess, and probably be 90% right that your method is:
Not patentable, for various reasons (I was amazed at the number of already patented ideas, and how difficult it was to identify original art)
Not new, or unique (i.e. there is already established art for it)
Not worth patenting because the expense far outweighs the benefits
An IP lawyer can tell you for sure, and the expense of a consult is not that much. Overall it will be cheaper to consult with them than to invest a lot of time in hiding code.
Good luck.
Don't even bother. If your code really "does wonderful things", be assured that it'll get hacked, even if just out of curiosity.
There is no 100% way to protect your code from reverse engineering. What language are we talking about? If this is C/C++ then it is pretty hard to reverse engineer, and you could additionally strip out the debugging information etc. But if this is, for example, Java, then even if you obfuscate the code there are some pretty cool tools (like JAD) that will reveal much of your work anyway.
Despite all of this I think you should try to change your attitude. Big companies pay a lot of money for simple solutions, and it seems that nowadays service is the most important thing, not the software (hence the success of companies based on open-source software). So, if you have great software, don't be scared that someone might steal it; rather, think about how to sell it well.
Is there any software which encrypts the software code, that will make debugging, troubleshooting, and understanding how the code works virtually impossible? and yet runs as usual? so that the developer can have peace of mind?
This is the totally wrong mindset IMO. What happens if you get hit by a bus? Your company goes bankrupt? All your data gets destroyed in a fire? For every single one of your customers, the value of their investment in your software will drop, and eventually reach zero, because the software can't be developed, or troubleshot, any further without you. I have seen so much money wasted that way, I think it's a horrible business model.
I earn my bread with making software myself so I know the hardships of making a living with it. Still, obfuscation can't be the way to go nowadays. Impose strict license agreements on your customers, scare the hell out of them so they don't even think about redistributing the software, but leave it open.
This is futile. There is always someone smarter than you and therefore they will be able to reverse engineer your obfuscation.
Usually someone smart enough to hack your code and use it in a meaningful way is smart enough to do it on their own, and probably thinks they can do it better than you did, so they won't bother stealing your stuff.
Don't worry about the people who can hack your code but not make meaningful use of it. If you've done a good job, this can only reinforce the quality of the job you've done (think of all the crappy touchscreen phone imitators).
They are going to reverse-engineer your code. Nothing can stop them. The only thing you can do is make it harder. This ranges from obfuscating code that is inherently "open", such as PHP and JavaScript, all the way down to littering your code with a crap load of self-modification.
In a lot of ways, I think, the thing that makes a piece of software valuable is not the crazy technological advancement that it provides, but rather the things that we might think of as being tertiary to the piece of software itself. Like the fact that you'll be there to support it. Or that it's provided as a web service and you'll be there to make sure the server is running. Or that it's a community, and you'll be there to moderate and build the community.
While you may actually be selling code, the value that your code has isn't intrinsic to the code itself, but rather derives from the features and ecosystem that surround your code.

Resources for learning a new language quickly?

The title may seem slightly self-contradictory, and I accept that you can't really learn a language quickly. However, an experienced programmer who already has knowledge of a few languages and different styles (functional, OO, imperative etc.) often wants to get started quickly. I've seen a few websites doing effective "translations" in the form of "just show me syntax equivalence". I can't remember the sites now, but for related languages (e.g. Perl/PHP) it's quite common.
Is there a better resource that covers more languages? Is there a resource that covers idioms as well as syntax? I think this would be incredibly useful for doing small amounts of work on existing code bases where you are not familiar with the language. Looking at the existing code, as we know, is not always a good indicator of quality. Likewise, for "learn by doing" weekend project I always have the urge to write reasonably idiomatic, clean code from the start. Such a resource could also link to known good example projects of varying sizes for those that prefer to learn by reading. Reading a well-written medium sized code base can also be much more practical when access to development environments might be limited.
I think it's possible to find tutorials and summaries for individual languages that provide some of this functionality in disparate web locations but I'm hoping there is a good, centralised, comparative place that the busy programmer can turn to.
You generally have two main things to overcome:
Syntax
Reference
Syntax you can pick up fairly quickly with a language tutorial and a stack of sample code.
Reference (library/API calls) you need to find a proper guide to; perhaps the language reference, or perhaps google...
With those two in place, following a walkthrough (to get you used to using the development environment) will have you pretty much ready - you'll be able to look up what you want to say (reference), and know how to say it (syntax).
This, of course, applies principally to procedural/oop languages; languages that require a paradigm switch (ML/Haskell) you should go to lectures for ;)
(and for the weirder moments, there's SO!)
In the past my favourite approach was "learning by doing". For example, I knew a little C++ and a lot of C#/.NET, but I had to write an FTP tool in Python.
So I sat down for an hour or so and went through the syntax differences with a tutorial, then I built the form itself and looked at the generated code. Then I found an open-source Python FTP client and took pieces of code from it (not copy and paste: type it yourself to see, feel and remember the code!).
After a few hours I got it.
So: the mix is best. A book, a piece of good code, the willingness to learn, and a free night with lots of coffee.
At the risk of sounding cheesy, I would start with the language's website tutorial and/or FAQ, followed by asking more specific questions here. SO is my centralized location for programming knowledge.
I remember when I learned Perl. I was asked to modify some Perl code at work and I'd never seen the language before. I had experience with several other languages, however, so it wasn't hard to figure out the syntax with the online Perl docs in one window and the code in another, side-by-side. I don't know that solely reading existing code is necessarily the best way to learn. In my case, I didn't know Perl but I could tell that the person who originally wrote the code didn't know Perl either. I'm not sure I could've distinguished between good Perl and really confusing Perl. It would've been nice to be able to ask questions here at the time.
Language isn't important. What is important is learning your ways around designing algorithms and the proper application of design patterns. Focus on the technique, not the language that implements a certain technique. Once you understand the proper development techniques, any programming language will just become real easy, no matter how obscure they are...
When you put a focus on a language, you're restricting your own knowledge.
http://devcheatsheet.com/ seems to be a step in the right direction: it aggregates cheat sheets/quick references and they are (somewhat) manually reviewed. It's also wide-ranging. It still comes up short a bit in terms of "idiomatic" quick reference: for example, the page on Ruby doesn't mention yield.
Rosetta Code appears to be an excellent resource that includes hints on coding idiomatically and moves from simple (like for-loops) to things like drawing. I haven't checked out how comprehensive it is, but there are a large number of languages and tasks listed. The drawbacks re: original question are:
Some of the linking is not accurate (navigating Python->ForLoop will take you to the top of the ForLoop page, not the Python section). It's a wiki, so this can be improved.
Ideally you could "slice" the wiki however you chose, to see e.g. the top 20 tasks for two languages side-by-side.
http://hyperpolyglot.org/ seems to be an almost perfect match for what I was looking for. The quality is not always there, or idiom can be lacking, but it has the same intention and is pretty comprehensive.

Writing easily modified code

What are some ways in which I can write code that is easily modified?
The one I have learned from experience is that I almost always need to write one to throw away. That way I have developed a sense of the domain knowledge and program structure required before coding the actual application.
The general guidelines are, of course:
High cohesion, low coupling
Don't repeat yourself
Recognize design patterns and implement them
Don't recognize design patterns where they do not exist or are not necessary
Use a coding standard, stick to it
Comment everything that should be commented; when in doubt: comment
Use unit tests
Write comments and tests before implementation, that way you know exactly what you want to do
And when it goes wrong : refactor, refactor, refactor. With good tests you can be sure nothing breaks
And oh yeah:
read this : http://www.pragprog.com/the-pragmatic-programmer
Everything (i think) above and more is in it
I think your emphasis on modifiability is right: it is more important than readability. It is not hard to make something easy to read, but the real test of how well it is understood comes when someone else (or you) has to modify it in response to changing requirements.
What I try to do is assume that modifications will be necessary, and if it is not really clear how to do them, leave explicit directions in the code for how to do them.
I assume that I may have to do some educating of the reader of the code to get him or her to know how to modify the code properly. This requires energy on my part, and it requires energy on the part of the person reading the code.
So while I admire the idea of literate programming, that can be easily read and understood, sometimes it is more like math, where the only way to do it is for the reader to buckle down, pay close attention, re-read it a few times, and make sure they understand.
Readability helps a lot: if you do something non-obvious, or you are taking a shortcut, comment. Comments are places where you can go back and refactor if you have time later. Use sensible names for everything; it makes it easier to understand what is going on.
Continuous revision will let you move from that first draft to a better one without throwing away (too much) work. Any time you rewrite from scratch you may lose lessons learned. As you code, use refactoring tools to eliminate code representing areas of exploration that are no longer needed, and to make obvious things that were obscure. The first one reduces the amount that you need to maintain; the second reduces the effort per square foot. (Sqft makes about as much sense as lines of code, really.)
Modularize appropriately and enforce encapsulation and separation of logic between your modules. You don't want too many dependencies on any one part of the code or that part becomes inherently harder to understand.
Consider using tried and true methods over cutting edge ones. You give up some functionality for predictability.
Finally, if this is code that people will be using before and after modification, you need(ed) to have an appropriate API insulating your code from theirs. Having a strong API lets you change things behind the scenes without needing to alert all your consumers. I think there's a decent article on Coding Horror about this.
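As a small illustration of an API insulating callers from internals, here is a sketch in which the public surface is two methods and everything else is free to change. The class `TemperatureLog` and its methods are hypothetical names invented for this example.

```python
class TemperatureLog:
    """Public API: add() and average(). Names starting with _ are internal."""

    def __init__(self):
        # Internal detail: a running total and count. An earlier version could
        # have stored every reading in a list; callers would not notice the switch.
        self._total = 0.0
        self._count = 0

    def add(self, reading):
        self._total += reading
        self._count += 1

    def average(self):
        if self._count == 0:
            raise ValueError("no readings recorded")
        return self._total / self._count


log = TemperatureLog()
log.add(20.0)
log.add(22.0)
print(log.average())  # 21.0
```

Because consumers only call `add()` and `average()`, the storage strategy behind them can be rewritten without alerting anyone.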
Hang Your Code Out to D.R.Y.
I learned this early when assigned the task of changing the appearance of a web-interface. The code was in C, which I hated, and was compiled to a CGI executable. And, worse, it was built on a library that was abandoned—no updates, no support, and too many man-hours put into its use to change it. On top of the framework was a disorderly web of code, consisting of various form and element builders, custom string implementations, and various other arcane things (for a non-C programmer to commit suicide with).
For each change I made there were several, sometimes many, exceptions to the output HTML. Each one of these exceptions required a small change or improvement in the form builder. Because the language has no inheritance, only functions and structs, and because nobody had put in the hours to improve the builder, the team had instead written these exceptions again and again.
In my inexperience I was forced to change the output of each exception, rather than consolidate the changes in an improved form builder. But, trawling through 15,000 lines of code for several hours after ineffective changes would induce code-burn, and a fogginess that took a night's sleep to cure.
Always run your code through the DRY-er.
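The consolidation described above, one improved form builder instead of many hand-written exceptions, might look roughly like this. It is a sketch in Python rather than the C of the original project, and the function names, parameters, and HTML layout are all assumptions.

```python
from html import escape


def render_field(label, name, field_type="text", css_class="field"):
    """Single place that decides how every form field is rendered."""
    return (
        f'<label class="{escape(css_class)}">{escape(label)} '
        f'<input type="{escape(field_type)}" name="{escape(name)}"></label>'
    )


def render_form(fields):
    # Each field is described by data; special cases become parameters
    # instead of copies of the markup.
    return "\n".join(render_field(**field) for field in fields)


print(render_form([
    {"label": "Name", "name": "name"},
    {"label": "Password", "name": "pw", "field_type": "password"},
]))
```

A change to the look of every field is then one edit in `render_field`, not a search through thousands of lines.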
The easiest way to end up with modifiable code is NOT to write code first. If you are unsure, write pseudocode, not just for the algorithm but also for how your code should be structured.
Designing while writing code never works...for me :-)
Here is my current experience: I'm working (in Java) with a kind of database schema that might change often (fields added/removed, data types modified). My strategy is to parse this schema and generate the code with Apache Velocity. The generated BaseClass is never modified by the programmer. Instead, a MyClass that extends BaseClass is created, and the logical components of this class (e.g. toString()!) are implemented using the getters and setters of the superclass.
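The generate-then-extend pattern can be sketched in a few lines. This version uses Python's `string.Template` in place of Apache Velocity and plain attributes in place of Java getters and setters; the schema, the `Person` example, and the function `generate_base_class` are hypothetical.

```python
from string import Template

# Hypothetical schema: in practice this would be parsed from the database.
SCHEMA = {"Person": ["name", "email"]}

CLASS_TEMPLATE = Template('''class Base$cls:
    def __init__(self, $args):
$assignments
''')


def generate_base_class(cls, fields):
    """Return source code for a generated base class; never edited by hand."""
    args = ", ".join(f"{field}=None" for field in fields)
    assignments = "\n".join(f"        self.{field} = {field}" for field in fields)
    return CLASS_TEMPLATE.substitute(cls=cls, args=args, assignments=assignments)


# A real build would write this source to a file; exec keeps the sketch short.
namespace = {}
exec(generate_base_class("Person", SCHEMA["Person"]), namespace)
BasePerson = namespace["BasePerson"]


class Person(BasePerson):
    """Hand-written logic lives here and survives regeneration."""

    def __str__(self):
        return f"{self.name} <{self.email}>"


print(Person("Ada", "ada@example.com"))  # Ada <ada@example.com>
```

When the schema changes, only the generated base class is rebuilt; the hand-written subclass keeps working as long as the fields it uses still exist.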

Resources