I am trying to extract the .text section, i.e., the code, from a PE file (a DLL). Is there a simple Linux tool, or a Python or Ruby library, that lets me do this easily?
Solved it myself. I used the pefile Python module to find the .text section and read its PointerToRawData and VirtualSize fields, which give the section's offset in the file and its size. Then I used dd to extract the .text section to a separate file.
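import pefile
pe = pefile.PE('filepath')
for section in pe.sections:
    # Section names are 8-byte, NUL-padded byte strings, e.g. b'.text\x00\x00\x00'
    if section.Name.rstrip(b'\x00') == b'.text':
        # Print decimal values: dd reads "0x..." as a multiplication (0 times ...), not as hex
        print("%d %d" % (section.PointerToRawData, section.Misc_VirtualSize))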
Then dd:
dd if=<lib> of=<lib.text> bs=1 skip=$PointerToRawData count=$VirtualSize
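If you would rather avoid both the pefile dependency and the separate dd step, the section table can also be read with only the standard library. This is a minimal sketch assuming a well-formed PE file; pe_sections and extract_section are hypothetical helper names, not part of pefile or any other library.

```python
import struct

def pe_sections(data):
    """Yield (name, virtual_size, size_of_raw_data, pointer_to_raw_data) for each section."""
    # The DOS header stores the offset of the "PE\0\0" signature at 0x3C (e_lfanew)
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    coff = e_lfanew + 4  # the 20-byte COFF file header follows the signature
    (number_of_sections,) = struct.unpack_from("<H", data, coff + 2)
    (size_of_optional_header,) = struct.unpack_from("<H", data, coff + 16)
    table = coff + 20 + size_of_optional_header  # section table follows the optional header
    for i in range(number_of_sections):
        # Each 40-byte section header starts with Name[8], VirtualSize,
        # VirtualAddress, SizeOfRawData, PointerToRawData
        name, vsize, _vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i)
        yield name.rstrip(b"\x00").decode("ascii", "replace"), vsize, raw_size, raw_ptr

def extract_section(data, wanted=".text"):
    """Return the raw file bytes of the named section (what the dd command copies)."""
    for name, vsize, raw_size, raw_ptr in pe_sections(data):
        if name == wanted:
            # Only SizeOfRawData bytes are stored in the file; a VirtualSize of 0
            # is assumed here to mean "use the raw size"
            size = min(vsize, raw_size) if vsize else raw_size
            return data[raw_ptr:raw_ptr + size]
    raise KeyError(wanted)
```

For example, extract_section(open("lib.dll", "rb").read()) should return the same bytes the dd command writes, as long as VirtualSize does not exceed SizeOfRawData.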
I'm reading https://0xax.gitbooks.io/linux-insides/content/Booting/linux-bootstrap-1.html and it shows some assembly like this:
.section ".reset", "ax", %progbits
.code16
.globl _start
_start:
.byte 0xe9
.int _start16bit - ( . + 2 )
...
There's a line where he assembles and runs it like this:
nasm -f bin boot.nasm && qemu-system-x86_64 boot
So I thought it was NASM assembly for Linux. I went and found https://asmtutor.com/# which says it uses NASM assembly for Linux. However, it's not the same thing. To name a few differences: the Linux kernel uses .section instead of SECTION and .globl instead of global, and I don't recognize what .byte, .int, etc. do.
So which assembly does Linux use, and where can I learn it?
The Linux kernel uses GAS (the GNU Assembler), which is part of GNU Binutils rather than GCC itself; GCC invokes it to assemble the code it generates. You can find reference documentation on it here.
You can find a pretty thorough introduction to GAS here provided that you already have a basic understanding of assembly in general.
As for .byte and .int: .byte places one or more 1-byte values that follow it into memory at the current assembler address, and .int does the same for integers, which are 32 bits (4 bytes) on x86.
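To make the byte layout concrete, here is a small Python model of what those two directives emit for the _start snippet in the question. It assumes x86 (little-endian, 4-byte .int) and uses made-up addresses for _start and _start16bit; emit_byte and emit_int are hypothetical helpers that only imitate GAS, they are not part of it.

```python
import struct

def emit_byte(*values):
    """Model GAS .byte: one byte per value, truncated to 8 bits."""
    return bytes(v & 0xFF for v in values)

def emit_int(*values):
    """Model GAS .int on x86: one 4-byte little-endian integer per value."""
    return b"".join(struct.pack("<I", v & 0xFFFFFFFF) for v in values)

# Hypothetical addresses, chosen only for illustration
start = 0xFFF0        # address of _start
start16bit = 0xF000   # address of _start16bit

# After ".byte 0xe9" has been emitted, the location counter "." is start + 1
here = start + 1
disp = start16bit - (here + 2)   # the expression in ".int _start16bit - ( . + 2 )"

code = emit_byte(0xE9) + emit_int(disp)
print(code.hex(" "))  # prints: e9 0d f0 ff ff
```

The negative displacement is stored in two's complement, lowest byte first. (With .code16, 0xe9 is a near jmp whose 2-byte displacement is relative to the end of the 3-byte instruction, hence the + 2.)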
I'm trying to disassemble an app written in assembly. I'm on Linux, x64:
$ objdump -d my_app
my_app: file format elf64-x86-64
That's all I get. What's wrong with it? It's not a simple hello world of a few lines; it's around 200 lines of code.
The same happens with gdb:
$ gdb -q my_app
Reading symbols from my_app...(no debugging symbols found)...done.
(gdb)
And
$ radare2 my_app
Warning: Cannot initialize section headers
Warning: Cannot initialize strings table
Warning: Cannot initialize dynamic strings
Warning: Cannot initialize dynamic section
-- Calculate checksums for the current block with the commands starting with '#' (#md5, #crc32, #all, ..)
Update:
$ objdump -D my_app
my_app: file format elf64-x86-64
Compiling:
$ fasm my_app.asm
# => my_app
Update 2:
; simplified
format ELF64 executable 3
include "import64.inc"
interpreter "/lib64/ld-linux-x86-64.so.2"
needed "libc.so.6"
import printf, close
segment readable
A equ 123
B equ 222
C equ 333
segment readable writeable
struc s1 a, b, c {
.a1 dw a
.b1 dw b
.c dd c
}
msg:
.m1 db "aaa", 0
.m2 db "bbb", 0
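.m3 db "ccc", 0
fname db "something.txt", 0 ; NUL-terminated path for the open syscall
segment readable executable
entry $
mov rax, 2 ; sys_open
mov rdi, fname ; pointer to the path (a string literal can't be used as an immediate)
mov rsi, 0 ; O_RDONLY
syscall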
; .............
; omitted
Asking fasm to produce an ELF executable directly, without a linker, creates only segments and no sections in the output. This confuses some tools. In particular, objdump -d is documented to operate on sections. Note that gdb can still debug and disassemble the program if you give it an address to start from, e.g. the entry point.
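To confirm this, the sketch below reads only the ELF header and program headers with Python's struct module. It assumes a little-endian ELF64 file such as the fasm output above; elf64_summary is a hypothetical helper. For a segments-only binary you would expect a section count (e_shnum) of 0 alongside one or more PT_LOAD segments, and the entry point it returns is an address you can give to gdb.

```python
import struct

PT_LOAD = 1  # program header type for a loadable segment

def elf64_summary(data):
    """Return (entry, section_count, [(vaddr, filesz, flags), ...]) for PT_LOAD segments."""
    if data[:4] != b"\x7fELF" or data[4] != 2:   # EI_CLASS 2 means ELFCLASS64
        raise ValueError("not an ELF64 file")
    # Header fields that follow the 16-byte e_ident (little-endian assumed)
    (e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
     e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = struct.unpack_from("<HHIQQQIHHHHHH", data, 16)
    loads = []
    for i in range(e_phnum):
        # Each ELF64 program header is 56 bytes
        (p_type, p_flags, p_offset, p_vaddr, p_paddr,
         p_filesz, p_memsz, p_align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_filesz, p_flags))
    return e_entry, e_shnum, loads
```

With entry, count, loads = elf64_summary(open("my_app", "rb").read()), you can then run something like x/20i followed by that entry address inside gdb to disassemble from the start of the program.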
I'm trying to make a script or program that takes given bytes (in hexadecimal) and converts them into x86 instructions (for example, c3 -> retq).
I've tried doing it by calling gcc -c on an assembly file just containing
retq
retq
and then using a script to overwrite the bytes where the object file contains "c3 c3", then running objdump -d to see what it says now. But this seems to corrupt the file unless the instruction I insert is exactly the same size as the original instruction bytes.
I'm running this on a Raspberry Pi with Raspbian (a Linux-based operating system) over SSH in a Bash terminal. I'm using Bash shell scripts and Python, as well as the tools listed here, and gdb.
To disassemble a flat binary file: objdump -D -b binary -m i386 foo.bin (use -m i386:x86-64 for 64-bit code). Or create an object file from assembly source using .byte directives: e.g. put .byte 0xc3 into foo.s, then run gcc -c foo.s and objdump -d foo.o.
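If you want to drive this from Python, here is a minimal sketch. It assumes an objdump on the PATH that was built with x86 support (a native ARM build on the Pi may reject -m i386, in which case a multi-architecture binutils build would be needed). The function names are hypothetical; disassemble writes the bytes to a temporary flat file and runs the objdump -D -b binary command from the answer, while to_gas_source produces the .byte line for the gcc -c route.

```python
import subprocess
import tempfile

def hex_to_bytes(hex_string):
    """Convert text such as "c3 c3" or "c3c3" into raw bytes."""
    return bytes.fromhex(hex_string)

def to_gas_source(data):
    """Return one GAS .byte line for the bytes (for the gcc -c / objdump -d route)."""
    return ".byte " + ", ".join("0x%02x" % b for b in data) + "\n"

def disassemble(data, arch="i386:x86-64"):
    """Write the bytes to a temporary flat file and disassemble it with objdump."""
    with tempfile.NamedTemporaryFile(suffix=".bin") as f:
        f.write(data)
        f.flush()  # make sure objdump sees the bytes on disk
        result = subprocess.run(
            ["objdump", "-D", "-b", "binary", "-m", arch, f.name],
            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, print(disassemble(hex_to_bytes("c3 c3"))). Because the input is a flat file rather than a patched object file, there is no surrounding structure to break, so any number of bytes can be passed regardless of instruction sizes.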
Say I have the following Assembly code:
.section .text
.globl _start
_start:
If I create an executable file using the following commands:
as 1.s -o 1.o
ld 1.o -o 1
Will the GNU Assembler add its own entry point to my executable which calls _start or will _start be the actual entry point?
See this question for more details.
The file crt0.o (or crt1.o, or whatever it is called on your system) that contains the startup code mentioned in the other question is itself written in assembly.
So the assembler adds nothing of its own: the linker (ld) searches all object files (which are, in fact, all produced by as) for a symbol named _start, and that symbol becomes the entry point (unless you choose a different one with ld -e).
You are of course free to link crt0.o into your assembly program when using ld. In that case, however, you MUST NOT name your symbol _start; name it main in your assembly file instead:
.globl main
.text
main:
...
Otherwise ld will print an error message, because it finds two symbols named _start and does not know which one is the entry point.
You can check it this way:
objdump -x 1 # n.b. 1 is the name of your program
This will print, among other things:
start address 0x000000...
Take the address it gives you and search for it elsewhere in the output. I think you will find that it matches the start of the .text section as well as the _start symbol. If so, then _start is indeed the ELF entry point.
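The same check can be automated. This sketch assumes a little-endian ELF file and that nm is installed; it compares the entry point stored in the ELF header (the start address objdump -x prints) with the address nm reports for _start. The helper names are hypothetical.

```python
import struct
import subprocess

def elf_entry(data):
    """Return e_entry from a little-endian ELF header (32- or 64-bit)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_entry is at offset 24 in both classes: 8 bytes for ELFCLASS64 (2), 4 bytes otherwise
    fmt = "<Q" if data[4] == 2 else "<I"
    return struct.unpack_from(fmt, data, 24)[0]

def symbol_address(nm_output, name):
    """Find a defined symbol in nm output lines such as '0000000000401000 T _start'."""
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined symbols ('U name') have only two fields and are skipped
        if len(parts) == 3 and parts[2] == name:
            return int(parts[0], 16)
    return None

def entry_is_symbol(path, name="_start"):
    """True if the ELF entry point is exactly the address of the given symbol."""
    with open(path, "rb") as f:
        entry = elf_entry(f.read(64))
    nm = subprocess.run(["nm", path], capture_output=True, text=True, check=True)
    return symbol_address(nm.stdout, name) == entry
```

For the program built with as and ld above, entry_is_symbol("1") is expected to return True.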
What's the best tool for converting PE binaries to ELF binaries?
Following is a brief motivation for this question:
Suppose I have a simple C program.
I compiled it using gcc for Linux (this gives an ELF binary) and using i586-mingw32msvc-gcc for Windows (this gives a PE binary).
I want to analyze these two binaries for similarities using BitBlaze's static analysis tool, Vine (http://bitblaze.cs.berkeley.edu/vine.html).
Vine doesn't have good support for PE binaries, so I want to convert PE->ELF and then carry on with my comparison/analysis.
Since all the analysis has to run on Linux, I would prefer a utility/tool that runs on Linux.
It is possible to rebuild an EXE as an ELF binary, but the resulting binary will segfault very soon after loading, because the Windows operating system it expects isn't there.
Here's one method of doing it.
Summary
Dump the section headers of the EXE file.
Extract the raw section data from the EXE.
Encapsulate the raw section data in GNU linker script snippets.
Write a linker script to build an ELF binary, including those scripts from the previous step.
Run ld with the linker script to produce the ELF file.
Run the new program, and watch it segfault as it's not running on Windows (and it tries to call functions in the Import Address Table, which doesn't exist).
Detailed Example
Dump the section headers of the EXE file. I'm using objdump from the mingw cross compiler package to do this.
$ i686-pc-mingw32-objdump -h trek.exe
trek.exe: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 AUTO 00172600 00401000 00401000 00000400 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .idata 00001400 00574000 00574000 00172a00 2**2
CONTENTS, ALLOC, LOAD, DATA
2 DGROUP 0002b600 00576000 00576000 00173e00 2**2
CONTENTS, ALLOC, LOAD, DATA
3 .bss 000e7800 005a2000 005a2000 00000000 2**2
ALLOC
4 .reloc 00013000 0068a000 0068a000 0019f400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .rsrc 00000a00 0069d000 0069d000 001b2400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
Use dd (or a hex editor) to extract the raw section data from the EXE. Here, I'm just going to copy the code and data sections (named AUTO and DGROUP in this example). You may want to copy additional sections though.
$ dd bs=512 skip=2 count=2963 if=trek.exe of=code.bin
$ dd bs=512 skip=2975 count=347 if=trek.exe of=data.bin
Note that I've converted the file offsets and section sizes from hex to decimal to use as skip and count, and I'm using a block size of 512 bytes in dd to speed up the process (example: 0x0400 = 1024 bytes = 2 blocks of 512 bytes).
Encapsulate the raw section data in GNU ld linker script snippets (using the BYTE directive). These will be used to populate the sections.
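The hex-to-block conversion can be checked with a few lines of Python. dd_blocks is a hypothetical helper; it refuses values that are not whole 512-byte blocks, since the bs=512 trick only works when the offsets and sizes are multiples of the block size (as they all are in this listing).

```python
def dd_blocks(file_offset, size, bs=512):
    """Convert a section's file offset and size (from objdump -h) into dd skip/count."""
    if file_offset % bs or size % bs:
        raise ValueError("offset and size must be multiples of the block size")
    return file_offset // bs, size // bs

# Values taken from the objdump -h listing above
print(dd_blocks(0x00000400, 0x00172600))  # AUTO:   (2, 2963)
print(dd_blocks(0x00173E00, 0x0002B600))  # DGROUP: (2975, 347)
```

If a section's values are not block-aligned, falling back to bs=1 (as in the first answer's dd command) avoids the problem at the cost of speed.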
cat code.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >code.ld
cat data.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >data.ld
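The hexdump format string is a bit cryptic; this Python equivalent is only a sketch of the same transformation (one BYTE(0xNN) statement per input byte, uppercase hex) and uses hypothetical function names.

```python
def to_ld_bytes(data):
    """Return GNU ld BYTE() statements, one per byte, like the hexdump command."""
    return "".join("BYTE(0x%02X)\n" % b for b in data)

def convert(bin_path, ld_path):
    """Read a raw section dump and write the matching linker script snippet."""
    with open(bin_path, "rb") as src, open(ld_path, "w") as dst:
        dst.write(to_ld_bytes(src.read()))
```

For example, convert("code.bin", "code.ld") and convert("data.bin", "data.ld") should produce the same files as the two hexdump commands.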
Write a linker script to build an ELF binary, including those scripts from the previous step. Note I've also set aside space for the uninitialized data (.bss) section.
start = 0x516DE8;
ENTRY(start)
OUTPUT_FORMAT("elf32-i386")
SECTIONS {
.text 0x401000 :
{
INCLUDE "code.ld";
}
.data 0x576000 :
{
INCLUDE "data.ld";
}
.bss 0x5A2000 :
{
. = . + 0x0E7800;
}
}
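The script doesn't say where start = 0x516DE8 comes from; presumably it is the EXE's entry point as a virtual address, i.e. ImageBase plus AddressOfEntryPoint from the PE optional header. A minimal sketch for reading those two fields, assuming a PE32 or PE32+ image (pe_entry_va is a hypothetical helper):

```python
import struct

def pe_entry_va(data):
    """Return ImageBase + AddressOfEntryPoint for a PE32 or PE32+ image."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    opt = e_lfanew + 4 + 20                 # optional header follows the COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x10B:                      # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:                    # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError("unknown optional header magic 0x%x" % magic)
    return image_base + entry_rva
```

The first line of the linker script could then be generated with "start = 0x%X;" % pe_entry_va(open("trek.exe", "rb").read()).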
Run GNU ld with the linker script to produce the ELF file. Note that I have to use the elf_i386 emulation mode because I'm on 64-bit Linux; otherwise a 64-bit ELF would be produced. (Passing the script with -T, as the warning suggests, would avoid the warning.)
$ ld -o elf_trek -m elf_i386 elf_trek.ld
ld: warning: elf_trek.ld contains output sections; did you forget -T?
$ file elf_trek
elf_trek: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
statically linked, not stripped
Run the new program, and watch it segfault as it's not running on Windows.
$ gdb elf_trek
(gdb) run
Starting program: /home/quasar/src/games/botf/elf_trek
Program received signal SIGSEGV, Segmentation fault.
0x0051d8e6 in ?? ()
(gdb) bt
#0 0x0051d8e6 in ?? ()
#1 0x00000000 in ?? ()
(gdb) x/i $eip
=> 0x51d8e6: sub (%edx),%eax
(gdb) quit
IDA Pro output for that location:
0051D8DB ; size_t stackavail(void)
0051D8DB proc stackavail near
0051D8DB push edx
0051D8DC call [ds:off_5A0588]
0051D8E2 mov edx, eax
0051D8E4 mov eax, esp
0051D8E6 sub eax, [edx]
0051D8E8 pop edx
0051D8E9 retn
0051D8E9 endp stackavail
For porting binaries to Linux, this is kind of pointless, given the Wine project.
For situations like the OP's, it may be appropriate.
I've found a simpler way to do this. Use the strip command.
Example
strip -O elf32-i386 -o myprogram.elf myprogram.exe
The -O elf32-i386 has it write out the file in that format.
To see supported formats run
strip --info
I am using the strip command from mxe, which on my system is actually named /opt/mxe/usr/bin/i686-w64-mingw32.static-strip.
I don't know whether this totally fits your needs, but is it an option for you to cross-compile with your MinGW version of gcc?
That is, would it suit your needs to have i586-mingw32msvc-gcc compile directly to ELF binaries (instead of the PE files you're currently getting)? A description of how to do things in the other direction can be found here. I imagine it will be a little hacky, but it should be entirely possible to adapt it to your direction (I must admit I haven't tried it).