How can I tag ELF libs with build IDs?
I downloaded a precompiled library that has a sha1 sum in it:
user#localhost ~/tmp $ file foo.so.0
foo.so.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x7e3374eb34cafb69d3dca8b126f4aa33d44bb465, stripped
user#localhost ~/tmp $ ldd foo.so.0
linux-vdso.so.1 (0x00007fff955b1000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f436d3c9000)
libc.so.6 => /lib64/libc.so.6 (0x00007f436d022000)
/lib64/ld-linux-x86-64.so.2 (0x0000003000000000)
From http://fedoraproject.org/wiki/RolandMcGrath/BuildID
ld: new option --build-id:
This adds an option to ld to synthesize a .note.gnu.build-id section with type SHT_NOTE and flags SHF_ALLOC (read-only data), that contains an ELF note header and the build ID bits. This then goes into the link as if it were part of the first object file (so it may be placed or merged by the linker script). The build ID bits are determined as the very last thing ld does before writing out the linked file. You can give --build-id=style chose md5, uuid (128 random bits), or 0xabcdef (your chosen bytes in hex). Just --build-id defaults to md5, which computes an 128-bit MD5 signature based all the ELF header bits and section contents in the file--i.e., an ID that is unique among the set of meaningful contents for ELF files and identical when the output file would otherwise have been identical.
The Linux binutils-2.17.50.0.17 release includes this, in f8test1.
Related
I tried to remove symbol using:
objcopy -v -I elf32-little -N <function> main_lib.so new_lib.so
On which it says:
copy from `main_lib.so' [elf32-little] to `new_lib.so' [elf32-little]
There is no change between the two files, and the function is not removed.
I used to following commands to get the list of functions and format:
readelf -Ws main_lib.so > main_functions.txt
file main_lib.so
File command says:
main_lib.so: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV),
dynamically linked (uses shared libs), BuildID[md5/uuid]=..., stripped
[root#wdctc1281 bin]# ldd node
linux-vdso.so.1 => (0x00007fffd33f2000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f70f7855000)
librt.so.1 => /lib64/librt.so.1 (0x00007f70f764d000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f70f7345000)
libm.so.6 => /lib64/libm.so.6 (0x00007f70f7043000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f70f6e2d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f70f6c10000)
libc.so.6 => /lib64/libc.so.6 (0x00007f70f684f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f70f7a61000)
What does the first line and last line mean? They don't look like the normal
xxxx.so => /lib64/xxxxx.so (0xxxxxxxxxxxxxxxxxxxx)
format.
the first line is the VDSO. this is described in depth in the vdso(7) manpage. basically it's a shared library that's embedded in your kernel and automatically loaded whenever a new process is exec-ed. that's why there's no filesystem path on the right side -- there is none! the file only exists in the kernel memory (well, not 100% precise, but see the man page for more info).
the last line is the ELF interpreter. this is described in depth in the ld.so(8) manpage. the reason it has a full path is because your program has the full path hardcoded in it. the reason it doesn't have an entry on the right side is that it's not linked against (directly) and thus no search was performed. you can check this by running:
$ readelf -l node | grep interpreter
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
$ scanelf -i node
ET_EXEC /lib64/ld-linux-x86-64.so.2 node
all the other lines are libraries you've linked against. you can see those by looking at DT_NEEDED tags when you run readelf -d on the file. since those files lack full paths, the ld.so needs to perform a dynamic path search for it. that's actually what the lines are telling you: "libdl.so.2 is needed, so when i searched for it, i found it at /lib64/libdl.so.2 (and was loaded into memory at address 0x00007f70f7855000)"
ldd filename shows you the program shared libraries used by the file.
libc.so.6, for example, is libc shared object version 6, which sits in /lib64 and its memory location is 0x00007f70f684f000.
The last line talks about ld-linux-x86-64.so version 2 under /lib64. This fellow will find and load shared libraries node needs. It will prep those libraries and run them. So, speaking in very crude terms, ld-linux-x86-64 is the runner. libc.so.6 and others are loaded and ldd shows the location of those shared libraries and memory locations. That is my understanding.
I have a linux app I'm building from source. When I run ldd against the binary, I understand most of the libraries...but not all.
Is there a way to add a flag to ld or gcc/g++ or anything I can do to determine why the linker chose to link against specific libraries?
Edit:
To explore the route #shloim set up, I tried the following:
> nm -u /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
nm: /lib/x86_64-linux-gnu/libcrypto.so.1.0.0: no symbols
> file /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
/lib/x86_64-linux-gnu/libcrypto.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=230ebe6145b6681d0cb7e4c9021f0d899c02e0c4, stripped
Is there an obvious reason why nm would not work on libcrypto?
This should show you all symbols used in the so file that are undefined within the so:
nm -u <your_so_file>
You can then compare it with
nm --defined-only <3rd_party_so_file>
And try to figure out the common symbols
Is there an obvious reason why nm would not work on libcrypto?
Generally nm is to list the symbols of object files. Here, nm is used for share object file. So try like this nm -D libcrypto.so.
readelf or objdump can also be used to check the symbols present in shared objects.
readelf -Ws will show all the symbols
I compile following C program on lubuntu 12.10 with anjuta
int main()
{
return 0;
}
the file name is foobar
then I open up terminal and write command
ndisasm foobar -b 32 1>asm.txt
(disassemble foobar with 32 bit instruction option and save disassembled result to asm.txt)
I open up asm.txt
there are many 0x0000 and miss-understandable code.
the instruction jg 0x47(0x7F45) on 0x00000000 and dec esp(0x4C) on 0x00000002
seems ELF file format signature.
(because the hex code 0x454c46 is 'ELF' in ascii)
the Linux might load this code to memory and don't jump to 0x00000000 because there is no executable code.
I have questions here.
how do I know the address of starting address?
which code is OK to ignore?(maybe many 0x0000 would be OK to ignore but what else?)
Even for the simplest program like yours, gcc is linking some libraries and some object files (notably crt0.o which calls your main and contains _start, the ELF starting point). And your binary is probably dynamically linked to some libc.so.6 so needs the dynamic linker (use ldd foobar to find out). Use gcc -v to understand what gcc is doing. And objdump has a lot of interesting flags or options.
You may also want to read the Assembly Howto, the X86 calling conventions, this question, the X86-64 ABI, these notes on X86-64 programming, etc
What's the best tool for converting PE binaries to ELF binaries?
Following is a brief motivation for this question:
Suppose I have a simple C program.
I compiled it using gcc for linux(this gives ELF), and using 'i586-mingw32msvc-gcc' for Windows(this gives a PE binary).
I want to analyze these two binaries for similarities, using Bitblaze's static analysis tool - vine(http://bitblaze.cs.berkeley.edu/vine.html)
Now vine doesn't have a good support for PE binaries, so I wanted to convert PE->ELF, and then carry on with my comparison/analysis.
Since all the analysis has to run on Linux, I would prefer a utility/tool that runs on Linux.
Thanks
It is possible to rebuild an EXE as an ELF binary, but the resulting binary will segfault very soon after loading, due to the missing operating system.
Here's one method of doing it.
Summary
Dump the section headers of the EXE file.
Extract the raw section data from the EXE.
Encapsulate the raw section data in GNU linker script snippets.
Write a linker script to build an ELF binary, including those scripts from the previous step.
Run ld with the linker script to produce the ELF file.
Run the new program, and watch it segfault as it's not running on Windows (and it tries to call functions in the Import Address Table, which doesn't exist).
Detailed Example
Dump the section headers of the EXE file. I'm using objdump from the mingw cross compiler package to do this.
$ i686-pc-mingw32-objdump -h trek.exe
trek.exe: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 AUTO 00172600 00401000 00401000 00000400 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .idata 00001400 00574000 00574000 00172a00 2**2
CONTENTS, ALLOC, LOAD, DATA
2 DGROUP 0002b600 00576000 00576000 00173e00 2**2
CONTENTS, ALLOC, LOAD, DATA
3 .bss 000e7800 005a2000 005a2000 00000000 2**2
ALLOC
4 .reloc 00013000 0068a000 0068a000 0019f400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .rsrc 00000a00 0069d000 0069d000 001b2400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
Use dd (or a hex editor) to extract the raw section data from the EXE. Here, I'm just going to copy the code and data sections (named AUTO and DGROUP in this example). You may want to copy additional sections though.
$ dd bs=512 skip=2 count=2963 if=trek.exe of=code.bin
$ dd bs=512 skip=2975 count=347 if=trek.exe of=data.bin
Note, I've converted the file offsets and section sizes from hex to decimal to use as skip and count, but I'm using a block size of 512 bytes in dd to speed up the process (example: 0x0400 = 1024 bytes = 2 blocks # 512 bytes).
Encapsulate the raw section data in GNU ld linker scripts snippets (using the BYTE directive). This will be used to populate the sections.
cat code.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >code.ld
cat data.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >data.ld
Write a linker script to build an ELF binary, including those scripts from the previous step. Note I've also set aside space for the uninitialized data (.bss) section.
start = 0x516DE8;
ENTRY(start)
OUTPUT_FORMAT("elf32-i386")
SECTIONS {
.text 0x401000 :
{
INCLUDE "code.ld";
}
.data 0x576000 :
{
INCLUDE "data.ld";
}
.bss 0x5A2000 :
{
. = . + 0x0E7800;
}
}
Run the linker script with GNU ld to produce the ELF file. Note I have to use an emulation mode elf_i386 since I'm using 64-bit Linux, otherwise a 64-bit ELF would be produced.
$ ld -o elf_trek -m elf_i386 elf_trek.ld
ld: warning: elf_trek.ld contains output sections; did you forget -T?
$ file elf_trek
elf_trek: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
statically linked, not stripped
Run the new program, and watch it segfault as it's not running on Windows.
$ gdb elf_trek
(gdb) run
Starting program: /home/quasar/src/games/botf/elf_trek
Program received signal SIGSEGV, Segmentation fault.
0x0051d8e6 in ?? ()
(gdb) bt
\#0 0x0051d8e6 in ?? ()
\#1 0x00000000 in ?? ()
(gdb) x/i $eip
=> 0x51d8e6: sub (%edx),%eax
(gdb) quit
IDA Pro output for that location:
0051D8DB ; size_t stackavail(void)
0051D8DB proc stackavail near
0051D8DB push edx
0051D8DC call [ds:off_5A0588]
0051D8E2 mov edx, eax
0051D8E4 mov eax, esp
0051D8E6 sub eax, [edx]
0051D8E8 pop edx
0051D8E9 retn
0051D8E9 endp stackavail
For porting binaries to Linux, this is kind of pointless, given the Wine project.
For situations like the OP's, it may be appropriate.
I've found a simpler way to do this. Use the strip command.
Example
strip -O elf32-i386 -o myprogram.elf myprogram.exe
The -O elf32-i386 has it write out the file in that format.
To see supported formats run
strip --info
I am using the strip command from mxe, which on my system is actually named /opt/mxe/usr/bin/i686-w64-mingw32.static-strip.
I don't know whether this totally fits your needs, but is it an option for you to cross-compile with your MinGW version of gcc?
I mean do say: does it suit your needs to have i586-mingw32msvc-gcc compile direct to ELF format binaries (instead of the PEs you're currently getting). A description of how to do things in the other direction can be found here - I imagine it will be a little hacky but entirely possible to make this work for you in the other direction (I must admit I haven't tried it).