I recently learned about code obfuscation. Its nice thing to do, when you have spare time, but I have different question. Why to do it?
First, there are languages in which I am sure its great thing - interpreted ones, like php, JavaScript and much more. There it seems like a good and more secure thing.
Second, there are languages where this seems to have no real effect for me - all the native code compiled languages. Take C for example. when compiled, all the variable names, function names, most of obfuscation techniques go away. If some can make it into native code, it would be things like recursion instead of for cycles and so, but disassembled code will anyway have instead of names some disassembler-generated identifiers, right?
And last category are languages I am not quite sure about. And that's the main reason I ask. These languages would be Java, C# (.NET),and the last Silverlight used in WP7. I ask because I read some article that state that on WP7 apps, code obfuscation helps preventing code from hacking. But I always thought of byte-code as being very similar to standard assembler codes, therefore again not having any information about real pre-compilation variable names, function names, etc. So, where is the truth?
Do it if you want, but don't expect any determined person to be scared away by it. There exist de-obfuscators, people can read obfuscated code as well (just as there are people who can read optimized assembly and reconstruct the original C code). Code obfuscation just gives you a false sense of security and might deter a person who is just curious (instead of deterring those who are serious about stealing your code). All it gives you is a false sense of security but no real one. Schneier aptly names this "security theater".
Yes, many modern languages that retain more information about the source can be obfuscated better than those that are compiled right to machine code. For the latter the compiler already does quite a good job with optimization. Your notion of bytecode being akin to traditional assembler is slightly wrong here, though. Especially .NET bytecode retains enough metadata to reconstruct the original source almost exactly (see Reflector). What isn't retained there are the names of local variables and arguments to methods. But you still need and retain the method and class names.
Another issue you should be aware of: If you give your customers an obfuscated executable and your program crashes, make sure you have a way of getting the real stacktrace back instead of the obfuscated one. Saying "Sorry, I cannot determine the root cause of why my program killed hours of your work since I chose to obfuscate it" isn't going to cut it, I guess :-)
Obfuscation is a common technique for mobile applications where you have hardware restrictions. Obfuscated code tends to have shorter identifiers and therefore smaller binaries.
Related
How can I package my Java application into an executable jar that cannot be decompiled (for example , by Jadclipse)?
You can't. If the JRE can run it, an application can de-compile it.
The best you can hope for is to make it very hard to read (replace all symbols with combinations of 'l' and '1' and 'O' and '0', put in lots of useless code and so on). You'd be surprised how unreadable you can make code, even with a relatively dumb translation tool.
This is called obfuscation and, while not perfect, it's sometimes adequate.
Remember, you can't stop the determined hacker any more than the determined burglar. What you're trying to do is make things very hard for the casual attacker. When presented with the symbols O001l1ll10O, O001llll10O, OO01l1ll10O, O0Ol11ll10O and O001l1ll1OO, and code that doesn't seem to do anything useful, most people will just give up.
First you can't avoid people reverse engineering your code. The JVM bytecode has to be plain to be executed and there are several programs to reverse engineer it (same applies to .NET CLR). You can only make it more and more difficult to raise the barrier (i.e. cost) to see and understand your code.
Usual way is to obfuscate the source with some tool. Classes, methods and fields are renamed throughout the codebase, even with invalid identifiers if you choose to, making the code next to impossible to comprehend. I had good results with JODE in the past. After obfuscating use a decompiler to see what your code looks like...
Next to obfuscation you can encrypt your class files (all but a small starter class) with some method and use a custom class loader to decrypt them. Unfortunately the class loader class can't be encrypted itself, so people might figure out the decryption algorithm by reading the decompiled code of your class loader. But the window to attack your code got smaller. Again this does not prevent people from seeing your code, just makes it harder for the casual attacker.
You could also try to convert the Java application to some windows EXE which would hide the clue that it's Java at all (to some degree) or really compile into machine code, depending on your need of JVM features. (I did not try this.)
GCJ is a free tool that can compile to either bytecode or native code. Keeping in mind, that does sort of defeat the purpose of Java.
A little late I know, but the answer is no.
Even if you write in C and compile to native code, there are dissasemblers / debuggers which will allow people to step through your code. Granted - debugging optimized code without symbolic information is a pain - but it can be done, I've had to do it on occasion.
There are steps that you can take to make this harder - e.g. on windows you can call the IsDebuggerPresent API in a loop to see if somebody is debugging your process, and if yes and it is a release build - terminate the process. Of course a sufficiently determined attacker could intercept your call to IsDebuggerPresent and always return false.
There are a whole variety of techniques that have cropped up - people who want to protect something and people who are out to crack it wide open, it is a veritable arms race! Once you go down this path - you will have to constantly keep updating/upgrading your defenses, there is no stopping.
This not my practical solution but , here i think good collection or resource and tutorials for making it happen to highest level of satisfaction.
A suggestion from this website (oracle community)
(clean way), Obfuscate your code, there are many open source and free
obfuscator tools, here is a simple list of them : [Open source
obfuscators list] .
These tools make your code unreadable( though still you can decompile
it) by changing names. this is the most common way to protect your
code.
2.(Not so clean way) If you have a specific target platform (like windows) or you can have different versions for different platforms,
you can write a sophisticated part of your algorithms in a low level
language like C (which is very hard to decompile and understand) and
use it as a native library in you java application. it is not clean,
because many of us use java for it's cross-platform abilities, and
this method fades that ability.
and this one below a step by step follow :
ProtectYourJavaCode
Enjoy!
Keep your solutions added we need this more.
EDIT 3 Quite some new development have happened since I asked this question. Basically I wasn't "seeing things" and webapps written in Clojure have been found to be vulnerable, which prompted changes in Clojure 1.5 and very heated discussion on the Clojure Google groups.
Here's a quote from someone on Hacker News about the changes in Clojure 1.5:
Another slightly interesting thing is the sudden enhancement to
read-eval and EDN[2]. That's mainly because of the rough weather
Ruby/Rubygems was in with the YAML-exploits, which caused a heated
discussion on how the Clojure reader should act by default.
Holes have been found and it's too late to really fix Clojure, so read-eval shall still ship by default set to true (because otherwise it would break too many things). And anyone parsing inputs in Clojure should not use the default read functions but the EDN ones.
So I certainly wasn't seeing things and it didn't take long (not even 18 months) for people to find ways to attack common Clojure webapp stacks.
EDIT 2 I didn't know it but my question is a dupe of the following question (which has been described as a 'killer question'): Lisp data security/validation
If anyone's interested in the answer(s) to this question, I'd suggest they open the above question and read the answers made there by Lisp gurus instead of the ones of the type "nothing to see here, move along, it's just like PHP or JavaScript".
EDIT: I'd like to know if, somehow, because it is Lisp, it would be "easier" for an attacker to transform "data" (i.e. "crafted user input with a malicious intent") into "code". For example, do I need to escape/replace all the parentheses in the user input before starting to "evaluate" / parse or whatever the data?
Original question
I'm still reading about Lisp and suddenly I was wondering, with this entire "code is data" / "data is code" thing, do Lisp need to perform input sanitizing in order to prevent attacks?
I was thinking specifically of webapps, say when a user does some HTTP POST.
What if the data he's sending contains things like:
This is some malicious (eval '(nasty-stuff (...)) or whatever.
(I'm no Lisp programmer, it's just an example of what I've got in mind, it's not meant to be actually mean code)
Is there anything special to keep in mind due to how Lisp works? For example if some dark-side hacker would know that some webserver is running on Clojure, can he exploit that fact and then inject "code between parentheses" that would then be evaluated on the webserver?
Is this a concern at all when receiving/parsing user data (and hence potentially crafted data) from Lisp?
I have written some webapps in Lisp (i.e. Common Lisp) and here are the things I've kept in mind:
if you use read, you should always set *read-eval* to nil for any untrusted data
if you are dealing with code generation - for example, HTML, JS, CSS or SQL generation - which is very common in Lisp-land, you shouldn't forget to use the sanitizing facilities provided by the corresponding libraries (not use raw input strings)
Basically, that's all. Moreover, since it's Lisp, it usually makes your system less prone to attack, because:
there are no standard attacks (as Lisp's use is relatively rare)
the system is rather secure in terms of defaults - this isn't unique, but many web-oriented languages (like PHP in the first place) suffer from insecurity by default, although it is mitigated by modern frameworks
You should always assume that injection attacks are possible until proven otherwise. Without knowing more about your specific Lisp environment and what you are comparing it with, it is impossible to answer whether you need "special" sanitization.
We know that machine code attacks are possible.
We know that SQL injection is possible.
We should assume that it is possible to hijack any turing-complete system, whether it is hardware or software.
Note that the soft barrier between "code" and "data" is not unique to Lisp. perl, once the workhorse of the web world has eval. So does PHP. It looks like Java bytecode injection may be possible as well.
It really does boil down to: don't use READ and don't use EVAL. You need to know exactly what you are sending to either or both of those functions, as well as the contexts within which they are executed. If you do not call either of these, then you're fine.
What are the methods for protecting an Exe file from Reverse Engineering.Many Packers are available to pack an exe file.Such an approach is mentioned in http://c-madeeasy.blogspot.com/2011/07/protecting-your-c-programexe-files-from.html
Is this method efficient?
The only good way to prevent a program from being reverse-engineered ("understood") is to revise its structure to essentially force the opponent into understanding Turing Machines. Essentially what you do is:
take some problem which generally proven to be computationally difficult
synthesize a version of that whose outcome you know; this is generally pretty easy compared to solving a version
make the correct program execution dependent on the correct answer
make the program compute nonsense if the answer is not correct
Now an opponent staring at your code has to figure what the "correct" computation is, by solving algorithmically hard problems. There's tons of NP-hard problems that nobody has solved efficiently in the literature in 40 years; its a pretty good bet if your program depends on one of these, that J. Random Reverse-Engineer won't suddenly be able to solve them.
One generally does this by transforming the original program to obscure its control flow, and/or its dataflow. Some techniques scramble the control flow by converting some control flow into essentially data flow ("jump indirect through this pointer array"), and then implementing data flow algorithms that require precise points-to analysis, which is both provably hard and has proven difficult in practice.
Here's a paper that describes a variety of techniques rather shallowly but its an easy read:
http://www.cs.sjsu.edu/faculty/stamp/students/kundu_deepti.pdf
Here's another that focuses on how to ensure that the obfuscating transformations lead to results that are gauranteed to be computationally hard:
http://www.springerlink.com/content/41135jkqxv9l3xme/
Here's one that surveys a wide variety of control flow transformation methods,
including those that provide levels of gaurantees about security:
http://www.springerlink.com/content/g157gxr14m149l13/
This paper obfuscates control flows in binary programs with low overhead:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.3773&rank=2
Now, one could go through a lot of trouble to prevent a program from being decompiled. But if the decompiled one was impossible to understand, you simply might not bother; that's the approach I'd take.
If you insist on preventing decompilation, you can attack that by considering what decompilation is intended to accomplish. Decompilation essentially proposes that you can convert each byte of the target program into some piece of code. One way to make that fail, is to ensure that the application can apparently use each byte
as both computer instructions, and as data, even if if does not actually do so, and that the decision to do so is obfuscated by the above kinds of methods. One variation on this is to have lots of conditional branches in the code that are in fact unconditional (using control flow obfuscation methods); the other side of the branch falls into nonsense code that looks valid but branches to crazy places in the existing code. Another variant on this idea is to implement your program as an obfuscated interpreter, and implement the actual functionality as a set of interpreted data.
A fun way to make this fail is to generate code at run time and execute it on the fly; most conventional languages such as C have pretty much no way to represent this.
A program built like this would be difficult to decompile, let alone understand after the fact.
Tools that are claimed to a good job at protecting binary code are listed at:
https://security.stackexchange.com/questions/1069/any-comprehensive-solutions-for-binary-code-protection-and-anti-reverse-engineeri
Packing, compressing and any other methods of binary protection will only every serve to hinder or slow reversal of your code, they have never been and never will be 100% secure solutions (though the marketing of some would have you believe that). You basically need to evaluate what sort of level of hacker you are up against, if they are script kids, then any packer that require real effort and skill (ie:those that lack unpacking scripts/programs/tutorials) will deter them. If your facing people with skills and resources, then you can forget about keeping your code safe (as many of the comments say: if the OS can read it to execute it, so can you, it'll just take a while longer). If your concern is not so much your IP but rather the security of something your program does, then you might be better served in redesigning in a manner where it cannot be attack even with the original source (chrome takes this approach).
Decompilation is always possible. The statement
This threat can be eliminated to extend by packing/compressing the
executable(.exe).
on your linked site is a plain lie.
Currently many solutions can be used to protect your application from being anti-compiled. Such as compressing, Obfuscation, Code snippet, etc.
You can looking for a company to help you achieve this.
Such as Nelpeiron, the website is:https://www.nalpeiron.com/
Which can cover many platforms, Windows, Linux, ARM-Linux, Android.
What is more Virbox is also can be taken into consideration:
The website is: https://lm-global.virbox.com/index.html
I recommend is because they have more options to protect your source code, such as import table protection, memory check.
I recently came across the term Polymorphic Code, and was wondering if anyone could suggest a legitimate (i.e. in legal and business appropriate software) reason to use it in a computer program? Links to real world examples would be appreciated!
Before someone answers, telling us all about the benefits of polymorphism in object oriented programming, please read the following definition for polymorphic code (taken from Wikipedia):
"Polymorphic code is code that uses a polymorphic engine to mutate while keeping the original algorithm intact. That is, the code changes itself each time it runs, but the function of the code in whole will not change at all."
Thanks, MagicAndi.
Update
Summary of answers so far:
Runtime optimization of the original code
Assigning a "DNA fingerprint" to each individual copy of an application
Obfuscate a program to prevent reverse-engineering
I was also introduced to the term 'metamorphic code'.
Runtime optimization of the original code, based on actual performance statistics gathered when running the application in its real environment and real inputs.
Digitally watermarking music is something often done to determine who was responsible for leaking a track, for example. It makes each copy of the music unique so that copies can be traced back to the original owner, but doesn't affect the audible qualities of the track.
Something similar could be done for compiled software by running each individual copy through a polymorphic engine before distributing it. Then if a cracked version of this software is released onto the Internet, the developer might be able to tell who cracked it by looking for specific variations produced the polymorphic engine (a sort of DNA test). As far as I know, this technique has never been used in practice.
It's not exactly what you were looking for I guess, since the polymorphic engine is not distributed with the code, but I think it's the closest to a legitimate business use you will find for this kind of technique.
Polymorphic code is a nice thing, but metamorphic is even nicer. To the legitimate uses: well, I can't think of anything other than anti-cracking and copy protection. Look at vx.org.ua if you wan't real world uses (not that legitimate though)
As Sami notes, on-the-fly optimisation is an excellent application of polymorphic code. A great example of this is the Fastest Fourier Transform in the West. It has a number of solvers at its disposal, which it combines with self-profiling to adjust the code path and solver parameters on subsequent executions. The result is the program optimises itself for your computing environment, getting faster with subsequent runs!
A related idea that may possibly be of interest is computational steering. This is the practice of altering the execution path of large simulations as the run proceeds, to focus on areas of interest to the researcher. The overall purpose of the simulation is not changed, but the feedback cycle acts to optimise the calculation. In this case the executable code is not being explicitly rewritten, but the effect from a user perspective is similar.
Polymorph code can be used to obfuscate weak or proprietary algorithms - that may use encryption e. g.. There're many "legitimate" uses for that. The term legitimate these days is kind of narrow-minded when it comes to IT. The core-paradigms of IT contain security. Whether you use polymorph shellcode in exploits or detect such code with an AV scanner. You have to know about it.
Obfuscate a program i.e. prevent reverse-engineering: goal being to protect IP (Intellectual Property).
Hear lately I've been listening to Jeff Atwood and Joel Spolsky's radio show and they have been talking about dogfooding (the process of reusing your own code, see Jeff Atwood's blog post). So my question is should programmers use decompilers to see how that programmers code is implemented and works, to make sure it won't break your code. Or should you just trust that programmers code and adapt to it because using decompilers go against everything we as programmers have ever learn about hiding data (well OO programmers at least)?
Note: I wasn't sure which tags this would go under so feel free to retag it.
Edit: Just to clarify I was asking about decompilers as a last resort, say you can't get the source code for some reason. Sorry, I should have supplied this in the original question.
Yes, It can be useful to use the output of a decompiler, but not for what you suggest. The output of a compiler doesn't ever look much like what a human would write (except when it does.) It can't tell you why the code does what it does, or what a particular variable should mean. It's unlikely to be worth the trouble to do this unless you already have the source.
If you do have the source, then there are lots of good reasons to use a decompiler in your development process.
Most often, the reasons for using the output of a decompiler is to better optimize code. Sometimes, with high optimization settings, a compiler will just get it wrong. This can be almost impossible to sort out in some cases without comparing the output of the compiler at different levels of optimization.
Other times, when trying to squeeze the most performance out of a very hot code path, a developer can try arranging their code in a few different ways and compare the compiled results. As a last resort, this may be the simplest way to start when implementing a code block in assembly language, by duplicating the compiler's output.
Dogfooding is the process of using the code that you write, not necessarily re-using code.
However, code re-use typically means you have the source, hence 'code-reuse' otherwise its just using a library supplied by someone else.
Decompiling is hard to get right, and the output is typically very hard to follow.
You should use a decompiler if it is the tool that's required to get the job done. However, I don't think it's the proper use of a decompiler to get an idea of how well the code which is being decompiled was written. Depending on the language you use, the decompiled code can be very different from the code which was actually written. If you want to see some real code, look at open source code. If you want to see the code of some particular product, it's probably better to try to get access to the actual code through some legal means.
I'm not sure what exactly it is you are asking, what you expect "decompilers" to show you, or what this has to do with Atwood and Spolsky, or what the question is exactly. If you're programming to public interfaces then why would you need to see the original source of the the third party code to see if it will "break" your code? You could more effectively build tests to in order to determine this. As well, what the "decompiler" will tell you largely depends on the language/platform the software was written in, whether it is Java, .NET, C and so forth. It's not the same as having the original source to read, even in the case of .NET assemblies. Anyway, if you are worried about third party code not working for you then you should really be doing typical kinds of unit tests against the code rather than trying to "decompile" it. As far as whether you "should," if you mean whether you "should" in some other way other than what would be the best use of your time then I'm not sure what you mean.
Should Programmers Use Decompilers?
Use the right tool for the right job. Decompilers don't often produce results that are easy to understand, but sometimes they are what's needed.
should programmers use decompilers to
see how that programmers code is
implemented and works, to make sure it
won't break your code.
No, not unless you find a problem and need support. In general you don't use it if you don't trust it, and if you have to use it you even when you don't trust it you develop tests to prove the functionality and verify that later upgrades still work as expected.
Don't use functionality you don't test, unless you have very good support or a relationship of trust.
-Adam
Or should you just trust that programmers code and adapt to it because using decompilers go against everything we as programmers have ever learn about hiding data (well OO programmers at least)?
This is not true at all. You would use a decompiler not because you want to get around any sort of abstraction, encapsulation, or defeat OO principles, but because you want to understand why the code is behaving the way it is better.
Sometimes you need to use a decompiler (or in the Java world, a bytecode viewer) when you are troubleshooting an annoying bug with a 3rd party library where an exception is thrown with no useful error message, no logging, etc.
Use of a decompiler has nothing to do with OO principles.
The short answer to this... Program to a public and documented specification, not to an implementation. Relying on implementation specifics and side-effects will burn you.
Decompilation is not a tool to help you program correctly, though it might, in a pinch, assist you in understanding a problem with someone else's code for which you don't have source.
Also, beware of the possible legal risk of decompiling; many software companies have no-decompile clauses which could expose you and your employer to legal consequences.