Canonical method of modifying (add/delete code) a binary executable programmatically? - security

I've always had a great interest in computer security and after reading:
Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection
-I'd like to implement some of these algorithms, which assume that you are able to modify an executable at the binary level.
Using a HEX editor, I have accomplished to insert a simple checksum algorithm for tamper-proofing protection of a code region. However, this technique is not feasible in practice, so I'm looking for ways of automating this.
Are there any well-known frameworks or techniques that takes an executable as input and allow the programmer to work with the code in a programatically way? If not, what are my options?
By programatically, I mean, parse it into, e.g., a tree-like structure that can be written back out as a new (modified) executable.
Thank you for your time and interest.

Related

How to add security to Spring boot jar file? [duplicate]

How can I package my Java application into an executable jar that cannot be decompiled (for example , by Jadclipse)?
You can't. If the JRE can run it, an application can de-compile it.
The best you can hope for is to make it very hard to read (replace all symbols with combinations of 'l' and '1' and 'O' and '0', put in lots of useless code and so on). You'd be surprised how unreadable you can make code, even with a relatively dumb translation tool.
This is called obfuscation and, while not perfect, it's sometimes adequate.
Remember, you can't stop the determined hacker any more than the determined burglar. What you're trying to do is make things very hard for the casual attacker. When presented with the symbols O001l1ll10O, O001llll10O, OO01l1ll10O, O0Ol11ll10O and O001l1ll1OO, and code that doesn't seem to do anything useful, most people will just give up.
First you can't avoid people reverse engineering your code. The JVM bytecode has to be plain to be executed and there are several programs to reverse engineer it (same applies to .NET CLR). You can only make it more and more difficult to raise the barrier (i.e. cost) to see and understand your code.
Usual way is to obfuscate the source with some tool. Classes, methods and fields are renamed throughout the codebase, even with invalid identifiers if you choose to, making the code next to impossible to comprehend. I had good results with JODE in the past. After obfuscating use a decompiler to see what your code looks like...
Next to obfuscation you can encrypt your class files (all but a small starter class) with some method and use a custom class loader to decrypt them. Unfortunately the class loader class can't be encrypted itself, so people might figure out the decryption algorithm by reading the decompiled code of your class loader. But the window to attack your code got smaller. Again this does not prevent people from seeing your code, just makes it harder for the casual attacker.
You could also try to convert the Java application to some windows EXE which would hide the clue that it's Java at all (to some degree) or really compile into machine code, depending on your need of JVM features. (I did not try this.)
GCJ is a free tool that can compile to either bytecode or native code. Keeping in mind, that does sort of defeat the purpose of Java.
A little late I know, but the answer is no.
Even if you write in C and compile to native code, there are dissasemblers / debuggers which will allow people to step through your code. Granted - debugging optimized code without symbolic information is a pain - but it can be done, I've had to do it on occasion.
There are steps that you can take to make this harder - e.g. on windows you can call the IsDebuggerPresent API in a loop to see if somebody is debugging your process, and if yes and it is a release build - terminate the process. Of course a sufficiently determined attacker could intercept your call to IsDebuggerPresent and always return false.
There are a whole variety of techniques that have cropped up - people who want to protect something and people who are out to crack it wide open, it is a veritable arms race! Once you go down this path - you will have to constantly keep updating/upgrading your defenses, there is no stopping.
This not my practical solution but , here i think good collection or resource and tutorials for making it happen to highest level of satisfaction.
A suggestion from this website (oracle community)
(clean way), Obfuscate your code, there are many open source and free
obfuscator tools, here is a simple list of them : [Open source
obfuscators list] .
These tools make your code unreadable( though still you can decompile
it) by changing names. this is the most common way to protect your
code.
2.(Not so clean way) If you have a specific target platform (like windows) or you can have different versions for different platforms,
you can write a sophisticated part of your algorithms in a low level
language like C (which is very hard to decompile and understand) and
use it as a native library in you java application. it is not clean,
because many of us use java for it's cross-platform abilities, and
this method fades that ability.
and this one below a step by step follow :
ProtectYourJavaCode
Enjoy!
Keep your solutions added we need this more.

Can you disallow a common lisp script called from common lisp to call specific functions?

Common Lisp allows to execute/compile code at runtime. But I thought for some (scripting-like) purposes it would be good if one could disallow a user-script to call some functions (especially for application extensions). One could still ask the user if he will allow an extension to access files/... I'm thinking of something like the Android permission system for Common Lisp. Is this possible without rewriting the evaluation code?
The problem I see is, that in Common Lisp you would probably want a script to be able to use reader macros and normal macros and for the latter operators like intern, but those would allow you to get arbitrary symbols (by string manipulation & interning), so simply scanning the code before evaluation won't suffice to ensure that specific functions aren't called.
So, is there something like a lock for functions? I thought of using fmakunbound / makunbound (and keeping the values in a local variable), but would that be possible in a multi-threaded environment?
Thanks in advance.
This is not part of the Common Lisp specification and there is no Common Lisp implementation that is extended to make this kind of restriction easy.
It seems to me like it would be easier to use operating system restrictions (e.g. rlimit, capabilities, etc) to enforce what you want on the Common Lisp process.
This is not an unusual desire, i.e. to run untrusted 3rd party code in a sandbox.
You can hand craft a sandbox by creating a custom parser and interpreter for your scripting language. It is pedantic, but true, than any program with an API is providing such a service. API designers and implementors needs to worry about the vile users.
You can still call eval or the compiler to run your sandbox scripts. It just means you need to assure that your reader, parser and language decline to provide access to any risky functionality.
You can use a lisp package to create a good sandbox. You can still use s-expressions for your scripting language's syntax, but you must cripple the standard reader so the user can't escape package-sandbox. You can still use the evaluator and the compiler, but you need to be sure the package you have boxed the user into contains no functionality that he can use to do inappropriate things.
Successful sandbox design and construction is easier when you start with an empty sandbox and slowly add functionality. Common Lisp is a big language and that creates a huge surface for attacker to poke at. So if you create a sandbox out of a package it's best to start with an empty package and add functions one at a time. Thinking thru what risks they create. The same approach is good when creating your crippled reader. Don't start with the full reader and throw things away, start with a useless reader and add things. Sadly taking that advice creates a pretty significant cost to getting started. But, if you look around I suspect you can find an existing safe reader.
Xach's suggestion is another way to go and in many case more straight forward.

How to Protect an Exe File from Decompilation

What are the methods for protecting an Exe file from Reverse Engineering.Many Packers are available to pack an exe file.Such an approach is mentioned in http://c-madeeasy.blogspot.com/2011/07/protecting-your-c-programexe-files-from.html
Is this method efficient?
The only good way to prevent a program from being reverse-engineered ("understood") is to revise its structure to essentially force the opponent into understanding Turing Machines. Essentially what you do is:
take some problem which generally proven to be computationally difficult
synthesize a version of that whose outcome you know; this is generally pretty easy compared to solving a version
make the correct program execution dependent on the correct answer
make the program compute nonsense if the answer is not correct
Now an opponent staring at your code has to figure what the "correct" computation is, by solving algorithmically hard problems. There's tons of NP-hard problems that nobody has solved efficiently in the literature in 40 years; its a pretty good bet if your program depends on one of these, that J. Random Reverse-Engineer won't suddenly be able to solve them.
One generally does this by transforming the original program to obscure its control flow, and/or its dataflow. Some techniques scramble the control flow by converting some control flow into essentially data flow ("jump indirect through this pointer array"), and then implementing data flow algorithms that require precise points-to analysis, which is both provably hard and has proven difficult in practice.
Here's a paper that describes a variety of techniques rather shallowly but its an easy read:
http://www.cs.sjsu.edu/faculty/stamp/students/kundu_deepti.pdf
Here's another that focuses on how to ensure that the obfuscating transformations lead to results that are gauranteed to be computationally hard:
http://www.springerlink.com/content/41135jkqxv9l3xme/
Here's one that surveys a wide variety of control flow transformation methods,
including those that provide levels of gaurantees about security:
http://www.springerlink.com/content/g157gxr14m149l13/
This paper obfuscates control flows in binary programs with low overhead:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.3773&rank=2
Now, one could go through a lot of trouble to prevent a program from being decompiled. But if the decompiled one was impossible to understand, you simply might not bother; that's the approach I'd take.
If you insist on preventing decompilation, you can attack that by considering what decompilation is intended to accomplish. Decompilation essentially proposes that you can convert each byte of the target program into some piece of code. One way to make that fail, is to ensure that the application can apparently use each byte
as both computer instructions, and as data, even if if does not actually do so, and that the decision to do so is obfuscated by the above kinds of methods. One variation on this is to have lots of conditional branches in the code that are in fact unconditional (using control flow obfuscation methods); the other side of the branch falls into nonsense code that looks valid but branches to crazy places in the existing code. Another variant on this idea is to implement your program as an obfuscated interpreter, and implement the actual functionality as a set of interpreted data.
A fun way to make this fail is to generate code at run time and execute it on the fly; most conventional languages such as C have pretty much no way to represent this.
A program built like this would be difficult to decompile, let alone understand after the fact.
Tools that are claimed to a good job at protecting binary code are listed at:
https://security.stackexchange.com/questions/1069/any-comprehensive-solutions-for-binary-code-protection-and-anti-reverse-engineeri
Packing, compressing and any other methods of binary protection will only every serve to hinder or slow reversal of your code, they have never been and never will be 100% secure solutions (though the marketing of some would have you believe that). You basically need to evaluate what sort of level of hacker you are up against, if they are script kids, then any packer that require real effort and skill (ie:those that lack unpacking scripts/programs/tutorials) will deter them. If your facing people with skills and resources, then you can forget about keeping your code safe (as many of the comments say: if the OS can read it to execute it, so can you, it'll just take a while longer). If your concern is not so much your IP but rather the security of something your program does, then you might be better served in redesigning in a manner where it cannot be attack even with the original source (chrome takes this approach).
Decompilation is always possible. The statement
This threat can be eliminated to extend by packing/compressing the
executable(.exe).
on your linked site is a plain lie.
Currently many solutions can be used to protect your application from being anti-compiled. Such as compressing, Obfuscation, Code snippet, etc.
You can looking for a company to help you achieve this.
Such as Nelpeiron, the website is:https://www.nalpeiron.com/
Which can cover many platforms, Windows, Linux, ARM-Linux, Android.
What is more Virbox is also can be taken into consideration:
The website is: https://lm-global.virbox.com/index.html
I recommend is because they have more options to protect your source code, such as import table protection, memory check.

Protecting shared library

Is there any way to protect a shared library (.so file) against reverse engineering ?
Is there any free tool to do that ?
The obvious first step is to strip the library of all symbols, except the ones required for the published API you provide.
The "standard" disassembly-prevention techniques (such as jumping into the middle of instruction, or encrypting some parts of code and decrypting them on-demand) apply to shared libraries.
Other techniques, e.g. detecting that you are running under debugger and failing, do not really apply (unless you want to drive your end-users completely insane).
Assuming you want your end-users to be able to debug the applications they are developing using your library, obfuscation is a mostly lost cause. Your efforts are really much better spent providing features and support.
Reverse engineering protection comes in many forms, here are just a few:
Detecting reversing environments, such as being run in a debugger or a virtual machine, and aborting. This prevents an analyst from figuring out what is going on. Usually used by malware. A common trick is to run undocumented instructions that behave differently in VMWare than on a real CPU.
Formatting the binary so that it is malformed, e.g. missing ELF sections. You're trying to prevent normal analysis tools from being able to open the file. On Linux, this means doing something that libbfd doesn't understand (but other libraries like capstone may still work).
Randomizing the binary's symbols and code blocks so that they don't look like what a compiler would produce. This makes decompiling (guessing at the original source code) more difficult. Some commercial programs (like games) are deployed with this kind of protection.
Polymorphic code that changes itself on the fly (e.g. decompresses into memory when loaded). The most sophisticated ones are designed for use by malware and are called packers (avoid these unless you want to get flagged by anti-malware tools). There are also academic ones like UPX http://upx.sourceforge.net/ (which provides a tool to undo the UPX'ing).

Using Polymorphic Code for Legitimate Purposes?

I recently came across the term Polymorphic Code, and was wondering if anyone could suggest a legitimate (i.e. in legal and business appropriate software) reason to use it in a computer program? Links to real world examples would be appreciated!
Before someone answers, telling us all about the benefits of polymorphism in object oriented programming, please read the following definition for polymorphic code (taken from Wikipedia):
"Polymorphic code is code that uses a polymorphic engine to mutate while keeping the original algorithm intact. That is, the code changes itself each time it runs, but the function of the code in whole will not change at all."
Thanks, MagicAndi.
Update
Summary of answers so far:
Runtime optimization of the original code
Assigning a "DNA fingerprint" to each individual copy of an application
Obfuscate a program to prevent reverse-engineering
I was also introduced to the term 'metamorphic code'.
Runtime optimization of the original code, based on actual performance statistics gathered when running the application in its real environment and real inputs.
Digitally watermarking music is something often done to determine who was responsible for leaking a track, for example. It makes each copy of the music unique so that copies can be traced back to the original owner, but doesn't affect the audible qualities of the track.
Something similar could be done for compiled software by running each individual copy through a polymorphic engine before distributing it. Then if a cracked version of this software is released onto the Internet, the developer might be able to tell who cracked it by looking for specific variations produced the polymorphic engine (a sort of DNA test). As far as I know, this technique has never been used in practice.
It's not exactly what you were looking for I guess, since the polymorphic engine is not distributed with the code, but I think it's the closest to a legitimate business use you will find for this kind of technique.
Polymorphic code is a nice thing, but metamorphic is even nicer. To the legitimate uses: well, I can't think of anything other than anti-cracking and copy protection. Look at vx.org.ua if you wan't real world uses (not that legitimate though)
As Sami notes, on-the-fly optimisation is an excellent application of polymorphic code. A great example of this is the Fastest Fourier Transform in the West. It has a number of solvers at its disposal, which it combines with self-profiling to adjust the code path and solver parameters on subsequent executions. The result is the program optimises itself for your computing environment, getting faster with subsequent runs!
A related idea that may possibly be of interest is computational steering. This is the practice of altering the execution path of large simulations as the run proceeds, to focus on areas of interest to the researcher. The overall purpose of the simulation is not changed, but the feedback cycle acts to optimise the calculation. In this case the executable code is not being explicitly rewritten, but the effect from a user perspective is similar.
Polymorph code can be used to obfuscate weak or proprietary algorithms - that may use encryption e. g.. There're many "legitimate" uses for that. The term legitimate these days is kind of narrow-minded when it comes to IT. The core-paradigms of IT contain security. Whether you use polymorph shellcode in exploits or detect such code with an AV scanner. You have to know about it.
Obfuscate a program i.e. prevent reverse-engineering: goal being to protect IP (Intellectual Property).

Resources