Linux disassembler

How can I write a simple disassembler for Linux from scratch?
Are there any libraries I can use? I need something that "just works".

Instead of writing one, try objdump.
Based on your comment, and your desire to implement from scratch, I take it this is a school project. You could get the source for objdump and see what libraries and techniques it uses.
The BFD library might be of use.
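If calling the existing tool is acceptable, a minimal sketch of driving objdump from a script could look like the following. It assumes objdump from binutils is installed and on the PATH; the helper names `objdump_command` and `disassemble` are made up for this illustration.

```python
import shutil
import subprocess


def objdump_command(path, tool="objdump"):
    """Build the argument list that asks objdump to disassemble `path`."""
    # -d disassembles the sections that are expected to contain code.
    return [tool, "-d", path]


def disassemble(path):
    """Return objdump's text listing for `path`, or None if objdump is missing."""
    if shutil.which("objdump") is None:
        return None
    result = subprocess.run(
        objdump_command(path),
        capture_output=True,
        text=True,
        check=True,  # raise if objdump rejects the file
    )
    return result.stdout
```

Calling `disassemble` on any ELF binary returns the listing as one string, which can then be searched or split into lines.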

You have to understand the ELF file format first. Then you can start processing the code sections according to the opcodes of your architecture.
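As a rough sketch of those two steps, the code below reads a few fields of the ELF header and then decodes a handful of one-byte x86-64 opcodes with a lookup table. The function names and the tiny opcode table are illustrative assumptions. A real disassembler would also have to walk the section headers to find the code and handle variable-length instructions, prefixes, and operands, none of which is attempted here.

```python
import struct

ELF_MAGIC = b"\x7fELF"
# A few e_machine values from the ELF specification.
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def parse_elf_header(data):
    """Return selected fields of the ELF header found at the start of `data`."""
    if data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    # Fields after the 16-byte e_ident:
    # e_type, e_machine, e_version, e_entry, e_phoff, e_shoff
    fmt = endian + ("HHIQQQ" if is_64 else "HHIIII")
    e_type, e_machine, _version, e_entry, _phoff, e_shoff = struct.unpack_from(
        fmt, data, 16
    )
    return {
        "bits": 64 if is_64 else 32,
        "type": e_type,
        "machine": MACHINES.get(e_machine, "unknown (%d)" % e_machine),
        "entry": e_entry,
        "section_header_offset": e_shoff,
    }


# Illustrative subset only: a few one-byte x86-64 instructions.
ONE_BYTE_OPCODES = {0x90: "nop", 0xC3: "ret", 0xC9: "leave", 0xCC: "int3", 0xF4: "hlt"}
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode(code):
    """Decode bytes one at a time; anything unknown is emitted as raw data."""
    listing = []
    for offset, byte in enumerate(code):
        if byte in ONE_BYTE_OPCODES:
            text = ONE_BYTE_OPCODES[byte]
        elif 0x50 <= byte <= 0x57:          # push r64, register in low 3 bits
            text = "push " + REGS64[byte - 0x50]
        elif 0x58 <= byte <= 0x5F:          # pop r64, register in low 3 bits
            text = "pop " + REGS64[byte - 0x58]
        else:
            text = "db 0x%02x" % byte       # not in the table: show as data
        listing.append((offset, text))
    return listing
```

For example, `decode(bytes([0x55, 0x90, 0x5D, 0xC3]))` walks a trivial function body made of a push, a nop, a pop, and a return.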

You can use libbfd and libopcodes, which are libraries distributed as part of binutils.
http://www.gnu.org/software/binutils/
As an example of the power of these libraries, check out the Online Disassembler (ODA).
http://www.onlinedisassembler.com
ODA supports a myriad of architectures and provides a basic feature set. You can enter binary data in the Live View and watch the disassembly appear as you type, or you can upload a file to disassemble. A nice feature of this site is that you can share the link to the disassembly with others.

You can take a look at the code of ERESI.
The ERESI Reverse Engineering Software Interface is a multi-architecture binary analysis framework with a tailored domain specific language for reverse engineering and program manipulation.

Related

Is win32com library available on Linux?

I want to use the win32com.client module on Linux.
Is there any problem with using it?
Is win32com library available on Linux?
Certainly not.
win32com looks like a Windows-specific library, tied to the WinAPI.
Linux has a different operating system API (Linux and Windows are different OSes), mostly following the POSIX standards. For example, both Linux and Windows have files, directories, processes, executables, dynamic loading, users, etc., but the details vary significantly (and you need to understand them, since "the devil is in the details").
To learn the Linux operating system API, read a good Linux programming book. The ALP book is freely downloadable, even if it is a bit old (most of its content still applies), and you could get (e.g. buy) newer books.
For more, read the man pages (which are the canonical documentation on Unix), in particular syscalls(2) and the many other pages referenced from it. You'll also need to look into section 3 of the man pages, since it lists many functions usable on Linux.
Read also Operating Systems: Three Easy Pieces
You might find (but this is programming language specific) some framework libraries that try to provide common abstractions over several OSes. Look (for C++) into Qt, POCO, Boost, etc.
Finally, don't forget that Linux is made of free software. Sometimes it is useful to download it and study its source code. On some occasions, that is a good approach to leaky abstractions.
PS. Budget several weeks of your time to read documentation and books, and perhaps study the source code of some free software similar to your goals.

Replacement for Linux Standard Base?

When distributing source code is not an option, the Linux Standard Base provides a mechanism for achieving binary compatibility with many Linux distros.
http://www.linuxfoundation.org/collaborate/workgroups/lsb
However, it appears that it has been somewhat abandoned as an approach:
https://github.com/LinuxStandardBase/lsb/blob/master/README.md
I'd be interested in thoughts on the best path forward for maintaining binary compatibility. The document referenced above basically says "Linux is evolving so rapidly that it's not practical to have a standard base". I can appreciate that, but what are the other options? It's difficult to google this topic because you mostly get references about the LSB.

Parsing Linux source into abstract syntax tree

I'd like to perform source code analysis of the Linux kernel, but to do that, I first need to parse it. What are my options? I'd prefer an AST usable from Python, but any other language is OK too.
Apparently CIL is able to parse the whole kernel, but it's not clear from the website how to do that.
I'd recommend starting with the sparse static analysis tool. Because sparse was designed specifically to assist the kernel developers in performing static analysis on the kernel, you can have some level of assurance that it really ought to parse the combination of C99 and GNU extensions that are used in the kernel sources. The code I've examined looked clean and straightforward, but I never tried to extend it in any fashion. The Documentation/sparse.txt file has a very short synopsis of using sparse on the kernel sources, if you want a very high-level overview.
Another option is GCC MELT, a tool designed to make it easier to build plugins for the gcc compiler. Using it would require knowing enough gcc internals to find your way around, but MELT does look far easier than coding a similar plugin directly in C.
You can check the page Parsing Kernel for a comparison of tools. The winner seems to be KDevelop.
Regards,
Do you really need an AST? Or would a lower-level intermediate representation be enough? For both options, you can use Clang, and either analyse its AST (sadly, with C++ only) or the LLVM IR.
CIL is also an option, but you'd need to write your analysis tool in OCaml. cilly is its drop-in replacement for gcc, but it might need some hacking for using it with such a non-trivial build sequence as the Linux kernel. Just using --merge won't be sufficient.

Protecting shared library

Is there any way to protect a shared library (.so file) against reverse engineering?
Is there any free tool to do that?
The obvious first step is to strip the library of all symbols, except the ones required for the published API you provide.
The "standard" disassembly-prevention techniques (such as jumping into the middle of an instruction, or encrypting some parts of the code and decrypting them on demand) apply to shared libraries.
Other techniques, e.g. detecting that you are running under a debugger and failing, do not really apply (unless you want to drive your end-users completely insane).
Assuming you want your end-users to be able to debug the applications they are developing using your library, obfuscation is a mostly lost cause. Your efforts are really much better spent providing features and support.
Reverse engineering protection comes in many forms. Here are just a few:
Detecting reversing environments, such as being run in a debugger or a virtual machine, and aborting. This prevents an analyst from figuring out what is going on. Usually used by malware. A common trick is to run undocumented instructions that behave differently in VMware than on a real CPU.
Formatting the binary so that it is malformed, e.g. missing ELF sections. You're trying to prevent normal analysis tools from being able to open the file. On Linux, this means doing something that libbfd doesn't understand (but other libraries like capstone may still work).
Randomizing the binary's symbols and code blocks so that they don't look like what a compiler would produce. This makes decompiling (guessing at the original source code) more difficult. Some commercial programs (like games) are deployed with this kind of protection.
Polymorphic code that changes itself on the fly (e.g. decompresses into memory when loaded). The most sophisticated ones are designed for use by malware and are called packers (avoid these unless you want to get flagged by anti-malware tools). There are also academic ones like UPX http://upx.sourceforge.net/ (which provides a tool to undo the UPX'ing).
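To make the first item in the list above concrete, here is a minimal sketch of one common Linux check: reading the TracerPid field from /proc/self/status, which is non-zero while another process (such as a debugger) is tracing the current one. The function names are made up for this illustration, and a check like this is easy for an analyst to bypass, so treat it as a demonstration of the idea rather than real protection.

```python
def tracer_pid(status_text):
    """Return the TracerPid value from the text of /proc/<pid>/status (0 if absent)."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1].strip())
    return 0


def is_being_traced(status_path="/proc/self/status"):
    """Return True if the kernel reports a tracer attached to this process."""
    try:
        with open(status_path) as status_file:
            return tracer_pid(status_file.read()) != 0
    except OSError:
        # No procfs available (e.g. not Linux): we cannot tell, so assume not traced.
        return False


if __name__ == "__main__":
    print("traced" if is_being_traced() else "not traced")
```

Running the script normally and then under a tracer such as strace or gdb should show the two different answers on a typical Linux system.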

Binary Analysis Research Tools

Can someone provide me with a list of leading binary research tools for Windows OS and Windows applications? I found BinScope from Microsoft itself, but was wondering if there are any other, better tools around?
Thanks,
Omer
If you only have the binary, your access is limited. If you want to peer into its inner workings, your best bet is a disassembler like IDA Pro and an assembler-level debugger like OllyDbg.
Tom Reps, a professor at the University of Wisconsin and founder of GrammaTech, gave an impressive talk on this at Stanford last summer. GrammaTech is working on binary analysis (http://www.grammatech.com/research/contracts/HSARPA/HSARPA-2005-MCSB/), but I don't know whether it's available in their static analysis product yet.
Disclaimer: One of their VPs bought me lunch and got me to try a demo of their source code analysis tool while I was at Palm (before the binary analysis talk), but I think the results are confidential.
BAP is a toolkit for performing binary analysis on x86 programs. It lifts binary code to an easily understandable and analyzable language similar to compiler intermediate languages. It's not a point and click solution (i.e., programming is required to use it effectively), but it can be useful for people who want to write new program analyses on binary code without redefining the semantics of x86.
