objdump and udis86 produce different output when disassembling /proc/kcore - linux

I need to disassemble the /proc/kcore file in Linux, and I need to obtain the virtual addresses of some special instructions so that I can later place kprobes on them. According to this document, /proc/kcore is an image of physical memory, but in this question someone answered that it is the kernel's virtual memory (exactly what I am looking for).
When I use the objdump tool to disassemble it, it starts at an address like f7c0b000, but udis86 starts at 0x0 (and with a totally different instruction). When I try to grep for a specific instruction, let's say mov 0xf7c1d60c,%edx, I get:
objdump
f7c0b022 mov 0xf7c1d60c,%edx
udis86
290ec02a mov 0xf7c1d60c,%edx
It looks like the offset between udis86 and objdump is always 0xbffff000. Why such a strange offset? How can I obtain the virtual address of a specific instruction? Somewhere I've read that the kernel is statically mapped at virtual address 0xc0000000 + 0x100000. If /proc/kcore is really a physical image, is it correct to simply add 0x100000 to the addresses returned by objdump to get the virtual address?

objdump understands ELF format files (such as /proc/kcore). It is able to extract the executable sections of the file while ignoring non-executable content (such as .note sections).
You can see the structure of an ELF executable using the -h flag, for example:
# objdump -h /proc/kcore
/proc/kcore: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 note0 00001944 0000000000000000 0000000000000000 000002a8 2**0
CONTENTS, READONLY
1 .reg/0 000000d8 0000000000000000 0000000000000000 0000032c 2**2
CONTENTS
2 .reg 000000d8 0000000000000000 0000000000000000 0000032c 2**2
CONTENTS
3 load1 00800000 ffffffffff600000 0000000000000000 7fffff602000 2**12
CONTENTS, ALLOC, LOAD, CODE
(...)
It looks like the udcli tool from udis86 simply starts disassembling from the beginning of the file (ELF headers and all), which means your output will start with a bunch of irrelevant content, and it's up to you to figure out where the code actually starts.
UPDATE
Here's the verification. We use this answer to extract the first load section from /proc/kcore, like this:
# dd if=/proc/kcore of=mysection bs=1 skip=$[0x7fffff602000] count=$[0x00800000]
And now if we view that with udcli:
# udcli mysection
0000000000000000 48 dec eax
0000000000000001 c7c060000000 mov eax, 0x60
0000000000000007 0f05 syscall
0000000000000009 c3 ret
000000000000000a cc int3
000000000000000b cc int3
We see that it looks almost identical to the output of objdump -d /proc/kcore (the stray dec eax in the first udcli line appears because udcli defaults to 32-bit mode, where the 48 REX prefix decodes as dec eax; pass -64 to udcli to match):
# objdump -d /proc/kcore
/proc/kcore: file format elf64-x86-64
Disassembly of section load1:
ffffffffff600000 <load1>:
ffffffffff600000: 48 c7 c0 60 00 00 00 mov $0x60,%rax
ffffffffff600007: 0f 05 syscall
ffffffffff600009: c3 retq
ffffffffff60000a: cc int3
ffffffffff60000b: cc int3
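To get back to the original question: with the section table from objdump -h, you can translate a file offset into a virtual address as VMA + (file_offset - section_file_offset). A minimal shell sketch using the load1 numbers from the listing above (the variable names are mine; your offsets will differ):
# load1 from "objdump -h /proc/kcore": VMA=0xffffffffff600000, file off=0x7fffff602000
$ vma=0xffffffffff600000
$ fileoff=0x7fffff602000
$ insn=0x7fffff602007                  # file offset of the instruction of interest
$ printf 'virtual address: %#x\n' $(( vma + (insn - fileoff) ))
virtual address: 0xffffffffff600007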

Related

"Whirlwind Tutorial on Teensy ELF Executables" -- why is the output of ld 10X bigger, 20 years later? [duplicate]

For a university course, I like to compare the code sizes of functionally similar programs written in assembly versus written in C and compiled using gcc/clang. In the process of re-evaluating how to further shrink the size of some executables, I couldn't trust my eyes when the very same assembly code I assembled/linked 2 years ago has now grown >10x in size after building it again (which is true for multiple programs, not only helloworld):
$ make
as --32 -o helloworld-asm-2020.o helloworld-asm-2020.s
ld -melf_i386 -o helloworld-asm-2020 helloworld-asm-2020.o
$ ls -l
-rwxr-xr-x 1 xxx users 708 Jul 18 2018 helloworld-asm-2018*
-rwxr-xr-x 1 xxx users 8704 Nov 25 15:00 helloworld-asm-2020*
-rwxr-xr-x 1 xxx users 4724 Nov 25 15:00 helloworld-asm-2020-n*
-rwxr-xr-x 1 xxx users 4228 Nov 25 15:00 helloworld-asm-2020-n-sstripped*
-rwxr-xr-x 1 xxx users 604 Nov 25 15:00 helloworld-asm-2020.o*
-rw-r--r-- 1 xxx users 498 Nov 25 14:44 helloworld-asm-2020.s
The assembly code is:
.code32
.section .data
msg: .ascii "Hello, world!\n"
len = . - msg
.section .text
.globl _start
_start:
movl $len, %edx # EDX = message length
movl $msg, %ecx # ECX = address of message
movl $1, %ebx # EBX = file descriptor (1 = stdout)
movl $4, %eax # EAX = syscall number (4 = write)
int $0x80 # call kernel by interrupt
# and exit
movl $0, %ebx # return code is zero
movl $1, %eax # exit syscall number (1 = exit)
int $0x80 # call kernel again
The same hello world program, compiled using GNU as and GNU ld (always as 32-bit assembly), was 708 bytes then and has grown to 8.5K now. Even when telling the linker to turn off page alignment (ld -n), it is still almost 4.2K. Stripping/sstripping doesn't pay off either.
readelf tells me that the start of the section headers is much later in the file (byte 468 vs 8464), but I have no idea why. It's running on the same Arch system as in 2018, the Makefile is the same, and I'm not linking against any libraries (especially not libc). I guess something about ld has changed, since the object file is still quite small, but what and why?
Disclaimer: I'm building 32-bit executables on an x86-64 machine.
Edit: I'm using GNU binutils (as & ld) version 2.35.1. Here is a base64-encoded archive which includes the source and both executables (the small old one and the large new one):
cat << EOF | base64 -d | tar xj
QlpoOTFBWSZTWVaGrEQABBp////xebj/7//Xf+a8RP/v3/rAAEVARARAeEADBAAAoCAI0AQ+NAam
ytMpCGmpDVPU0aNpGmh6Rpo9QAAeoBoADQaNAADQ09IAACSSGUwaJpTNQGE9QZGhoADQPUAA0AAA
AA0aA4AAAABoAAAAA0GgAAAAZAGgAHAAAAANAAAAAGg0AAAADIA0AASJCBIyE8hHpqPVPUPU/VAa
fqn6o0ep6BB6TQaNGj0j1ABobU00yeU9JYiuVVZKYE+dKNa3wls6x81yBpGAN71NoylDUvNryWiW
E4ER8XkfpaJcPb6ND12ULEqkQX3eaBHP70Apa5uFhWNDy+U3Ekj+OLx5MtDHxQHQLfMcgCHrGayE
Dc76F4ZC4rcRkvTW4S2EbJAsbBGbQxSbx5o48zkyk5iPBBhJowtCSwDBsQBc0koYRSO6SgJNL0Bg
EmCoxCDAs5QkEmTGmQUgqZNIoxsmwDmDQe0NIDI0KjQ64leOr1fVk6AaVhjOAJjLrEYkYy4cDbyS
iXSuILWohNh+PA9Izk0YUM4TQQGEYNgn4oEjGmAByO+kzmDIxEC3Txni6E1WdswBJLKYiANdiQ2K
00jU/zpMzuIhjTbgiBqE24dZWBcNBBAAioiEhCQEIfAR8Vir4zNQZFgvKZa67Jckh6EHZWAWuf6Q
kGy1lOtA2h9fsyD/uPPI2kjvoYL+w54IUKBEEYFBIWRNCNpuyY86v3pNiHEB7XyCX5wDjZUSF2tO
w0PVlY2FQNcLQcbZjmMhZdlCGkVHojuICHMMMB5kQQSZRwNJkYTKz6stT/MTWmozDCcj+UjtB9Cf
CUqAqqRlgJdREtMtSO4S4GpJE2I/P8vuO9ckqCM2+iSJCLRWx2Gi8VSR8BIkVX6stqIDmtG8xSVU
kk7BnC5caZXTIynyI0doXiFY1+/Csw2RUQJroC0lCNiIqVVUkTqTRMYqKNVGtCJ5yfo7e3ZpgECk
PYUEihPU0QVgfQ76JA8Eb16KCbSzP3WYiVApqmfDhUk0aVc+jyBJH13uKztUuva8F4YdbpmzomjG
kSJmP+vCFdKkHU384LdRoO0LdN7VJlywJ2xJdM+TMQ0KhMaicvRqfC5pHSu+gVDVjfiss+S00ikI
DeMgatVKKtcjsVDX09XU3SzowLWXXunnFZp/fP3eN9Rj1ubiLc0utMl3CUUkcYsmwbKKrWhaZiLO
u67kMSsW20jVBcZ5tZUKgdRtu0UleWOs1HK2QdMpyKMxTRHWhhHwMnVEsWIUEjIfFEbWhRTRMJXn
oIBSEa2Q0llTBfJV0LEYEQTBTFsDKIxhgqNwZB2dovl/kiW4TLp6aGXxmoIpVeWTEXqg1PnyKwux
caORGyBhTEPV2G7/O3y+KeAL9mUM4Zjl1DsDKyTZy8vgn31EDY08rY+64Z/LO5tcRJHttMYsz0Fh
CRN8LTYJL/I/4u5IpwoSCtDViIA=
EOF
Update:
When using ld.gold instead of ld.bfd (to which /usr/bin/ld is symlinked by default), the executable size becomes as small as expected:
$ cat Makefile
TARGET=helloworld
all:
    as --32 -o ${TARGET}-asm.o ${TARGET}-asm.s
    ld.bfd -melf_i386 -o ${TARGET}-asm-bfd ${TARGET}-asm.o
    ld.gold -melf_i386 -o ${TARGET}-asm-gold ${TARGET}-asm.o
    rm ${TARGET}-asm.o
$ make -q
$ ls -l
total 68
-rw-r--r-- 1 eso eso 200 Dec 1 13:57 Makefile
-rwxrwxr-x 1 eso eso 8700 Dec 1 13:57 helloworld-asm-bfd
-rwxrwxr-x 1 eso eso 732 Dec 1 13:57 helloworld-asm-gold
-rw-r--r-- 1 eso eso 498 Dec 1 13:44 helloworld-asm.s
Maybe I just used gold previously without being aware.
It's not 10x in general, it's page-alignment of a couple sections as Jester says, per changes to ld's default linker script for security reasons:
First change: making sure data from .data isn't present in any mapping of .text, so none of that static data is available for ROP / Spectre gadgets in an executable page. (In older ld, that meant the program headers mapped the same disk block twice, also into a RW-without-exec segment for the actual .data section. The executable mapping was still read-only.)
More recent change: Separate .rodata from .text into separate segments, again so static data isn't mapped into an executable page. Previously, const char code[]= {...} could be cast to a function pointer and called, without needing mprotect or gcc -z execstack or other tricks, if you wanted to test shellcode that way. (A separate Linux kernel change made -z execstack only apply to the actual stack, not READ_IMPLIES_EXEC.)
See Why an ELF executable could have 4 LOAD segments? for this history, including the strange fact that .rodata is in a separate segment from the read-only mapping for access to the ELF metadata.
That extra space is just 00 padding and will compress well in a .tar.gz or whatever.
So it has a worst-case upper bound of about 2x 4k extra pages of padding, and tiny executables are close to that worst case.
gcc -Wl,--nmagic will turn off page alignment of sections if you want that for some reason (see the ld(1) man page). I don't know why that doesn't pack everything down to the old size. Perhaps checking the default linker script would shed some light, but it's pretty long; run ld --verbose to see it.
stripping won't help for padding that's part of a section; I think it can only remove whole sections.
ld -z noseparate-code uses the old layout, only 2 total segments to cover the .text and .rodata sections, and the .data and .bss sections. (And the ELF metadata that dynamic linking wants access to.)
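If you want to see the effect, here's a quick sketch (reusing the helloworld-asm-2020.o object from the question; the output names are mine, and exact sizes depend on your binutils version):
$ ld -melf_i386 -z noseparate-code -o hw-onepage helloworld-asm-2020.o   # old layout
$ ld -melf_i386 -n -o hw-nmagic helloworld-asm-2020.o                    # no page alignment
$ ls -l hw-onepage hw-nmagic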
Related:
Linking with gcc instead of ld
This question is about ld, but note that gcc -nostdlib also used to default to making a static executable. Modern Linux distros configure GCC with -pie as the default, though, and GCC won't make a static-pie by default even if there aren't any shared libraries being linked, unlike in -no-pie mode, where it will simply make a static executable in that case. (A static-pie still needs startup code to apply relocations for any absolute addresses.)
So the equivalent of ld directly is gcc -nostdlib -static (which implies -no-pie). Or gcc -nostdlib -no-pie should let it default to -static when there are no shared libs being linked. You can combine this with -Wl,--nmagic and/or -Wl,-z -Wl,noseparate-code.
Also:
A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux - eventually making a 45 byte executable, with the machine code for an _exit syscall stuffed into the ELF program header itself.
FASM can make quite small executables, using its mode where it outputs a static executable (not object file) directly with no ELF section metadata, just program headers. (It's a pain to debug with GDB or disassemble with objdump; most tools assume there will be section headers, even though they're not needed to run static executables.)
What is a reasonable minimum number of assembly instructions for a small C program including setup?
What's the difference between "statically linked" and "not a dynamic executable" from Linux ldd? (static vs. static-pie vs. (dynamic) PIE that happens to have no shared libraries.)

Analyzing segmentation fault without core file

Suppose my binaries are running at a customer site where I cannot enable core dump generation using ulimit -c. How do engineers debug segmentation faults in such real-world scenarios? Is there any other method of debugging or identifying crashes when no core dumps are generated?
In the past, I had to deal with this kind of restriction on several occasions. A segmentation fault or, more generally, abnormal process termination had to be investigated with the caveat that a core dump was not available.
For Linux, our platform of choice for this walkthrough, a few reasons come to mind:
Core dump generation is disabled altogether (using limits.conf or ulimit)
The target directory (current working directory or a directory in /proc/sys/kernel/core_pattern) does not exist or is inaccessible due to filesystem permissions or SELinux
The target filesystem has insufficient diskspace resulting in a partial dump
For all of those, the net result is the same: there's no (valid) core dump to use for analysis. Fortunately, a workaround for post-mortem debugging exists that has the potential to save the day, but given its inherent limitations, your mileage may vary from case to case.
Identifying the Faulting Instruction
The following sample contains a classic use-after-free memory error:
#include <iostream>

struct Test
{
    const std::string &m_value;

    Test(const std::string &value):
        m_value(value)
    {
    }

    void print()
    {
        std::cout << m_value << std::endl;
    }
};

int main()
{
    std::string *value = new std::string("this is a test");
    Test test(*value);
    delete value;
    test.print();
    return 0;
}
After delete value, the std::string reference Test::m_value points to inaccessible memory. Therefore, running it results in a segmentation fault:
$ ./a.out
Segmentation fault
When a process terminates due to an access violation, the Linux kernel creates a log entry accessible via dmesg and, depending on the system's configuration, the syslog (usually /var/log/messages). The example (compiled with -O0) creates the following entry:
$ dmesg | grep segfault
[80440.957955] a.out[7098]: segfault at ffffffffffffffe8 ip 00007f9f2c2b56a3 sp 00007ffc3e75bc48 error 5 in libstdc++.so.6.0.19[7f9f2c220000+e9000]
The corresponding Linux kernel source from arch/x86/mm/fault.c:
printk("%s%s[%d]: segfault at %lx ip %px sp %px error %lx",
loglvl, tsk->comm, task_pid_nr(tsk), address,
(void *)regs->ip, (void *)regs->sp, error_code);
The error value (error_code) reveals what the trigger was. It's a CPU-specific bit field (x86). In our case, the value 5 (101 in binary) indicates that the page containing the faulting address 0xffffffffffffffe8 was mapped but inaccessible due to page protection, and that a read was attempted.
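The low bits decode directly: bit 0 set means the fault was a protection violation on a present page (clear: page not present), bit 1 set means a write access (clear: a read), bit 2 set means a user-mode access. A quick shell check for the entry above:
$ err=5   # "error 5" from the dmesg line
$ echo "present=$((err & 1)) write=$(( (err >> 1) & 1 )) user=$(( (err >> 2) & 1 ))"
present=1 write=0 user=1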
The log message identifies the module that executed the faulting instruction: libstdc++.so.6.0.19. The sample was compiled without optimization, so the call to std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) was not inlined:
400bef: e8 4c fd ff ff callq 400940 <_ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E@plt>
The STL performs the read access. Knowing these basics, how can we identify where exactly the segmentation fault occurred? The log entry features two essential addresses we need for doing so:
ip 00007f9f2c2b56a3 [...] error 5 in
^^^^^^^^^^^^^^^^
libstdc++.so.6.0.19[7f9f2c220000+e9000]
^^^^^^^^^^^^
The first is the instruction pointer (rip) at the time of the access violation, the second is the address the .text section of the library is mapped to. By subtracting the .text base address from rip, we get the relative address of the instruction in the library and can disassemble the implementation using objdump (you can simply search for the offset):
0x7f9f2c2b56a3-0x7f9f2c220000=0x956a3
$ objdump --demangle -d /usr/lib64/libstdc++.so.6
[...]
00000000000956a0 <std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)@@GLIBCXX_3.4>:
956a0: 48 8b 36 mov (%rsi),%rsi
956a3: 48 8b 56 e8 mov -0x18(%rsi),%rdx
^^^^^
956a7: e9 24 4e fc ff jmpq 5a4d0 <std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@plt>
956ac: 0f 1f 40 00 nopl 0x0(%rax)
[...]
Is that the correct instruction? We can consult GDB to confirm our analysis:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b686a3 in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /lib64/libstdc++.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-323.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 libstdc++-4.8.5-44.el7.x86_64
(gdb) disass
Dump of assembler code for function _ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E:
0x00007ffff7b686a0 <+0>: mov (%rsi),%rsi
=> 0x00007ffff7b686a3 <+3>: mov -0x18(%rsi),%rdx
0x00007ffff7b686a7 <+7>: jmpq 0x7ffff7b2d4d0 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@plt>
End of assembler dump.
GDB shows the very same instruction. We can also use a debugging session to verify the read address:
(gdb) print /x $rsi-0x18
$2 = 0xffffffffffffffe8
This value matches the read address in the log entry.
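As a shortcut, addr2line from binutils can do the same relative-address-to-symbol mapping for you (with debug info present it also prints the source file and line; on a stripped library you only get the function name):
$ addr2line -f -C -e /usr/lib64/libstdc++.so.6 0x956a3   # -f: function name, -C: demangle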
Identifying the Callers
So, despite the absence of a core dump, the kernel output enables us to identify the exact location of the segmentation fault. In many scenarios, though, that is far from being enough. For one thing, we're missing the list of calls that got us to that point - the call stack or stack trace.
Without a dump in the backpack, you have two options to get hold of the callers: you can start your process using catchsegv (a glibc utility) or you can implement your own signal handler.
catchsegv serves as a wrapper, generates the stack trace, and also dumps register values and the memory map:
$ catchsegv ./a.out
*** Segmentation fault
Register dump:
RAX: 0000000002158040 RBX: 0000000002158040 RCX: 0000000002158000
[...]
Backtrace:
/lib64/libstdc++.so.6(_ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E+0x3)[0x7f1794fd36a3]
??:?(_ZN4Test5printEv)[0x400bf4]
??:?(main)[0x400b2d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f179467a555]
??:?(_start)[0x4009e9]
Memory map:
00400000-00401000 r-xp 00000000 08:02 50331747 /home/user/a.out
[...]
7f1794f3e000-7f1795027000 r-xp 00000000 08:02 33600977 /usr/lib64/libstdc++.so.6.0.19
7f1795027000-7f1795227000 ---p 000e9000 08:02 33600977 /usr/lib64/libstdc++.so.6.0.19
7f1795227000-7f179522f000 r--p 000e9000 08:02 33600977 /usr/lib64/libstdc++.so.6.0.19
7f179522f000-7f1795231000 rw-p 000f1000 08:02 33600977 /usr/lib64/libstdc++.so.6.0.19
[...]
How does catchsegv work? It essentially injects a signal handler using LD_PRELOAD and the library libSegFault.so. If your application already installs its own signal handler for SIGSEGV and you intend to take advantage of libSegFault.so, your handler needs to forward the signal to the original handler (as returned by sigaction(SIGSEGV, NULL)).
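You can also activate libSegFault.so by hand, without the wrapper script (the library path varies by distribution, and libSegFault was removed from glibc in 2.35, so take this as a sketch):
$ LD_PRELOAD=/lib64/libSegFault.so SEGFAULT_SIGNALS=segv ./a.out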
The second option is to implement the stack trace functionality yourself using a custom signal handler and backtrace(). This allows you to customize the output location and the output itself.
Based on that information, we can essentially do the same as we did before (0x7f1794fd36a3-0x7f1794f3e000=0x956a3). This time around, we can go back to the callers to dig deeper. The second frame is represented by the following line:
??:?(_ZN4Test5printEv)[0x400bf4]
0x400bf4 is the address the callee returns to after the call inside Test::print(); it's located in the executable. We can visualize the call site as follows:
$ objdump --demangle -d ./a.out
[...]
400bea: bf a0 20 60 00 mov $0x6020a0,%edi
400bef: e8 4c fd ff ff callq 400940 <std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)@plt>
400bf4: be 70 09 40 00 mov $0x400970,%esi
^^^^^^
400bf9: 48 89 c7 mov %rax,%rdi
400bfc: e8 5f fd ff ff callq 400960 <std::ostream::operator<<(std::ostream& (*)(std::ostream&))@plt>
[...]
Note that the output of objdump matches the address in this instance because we run it against the executable, which has a default base address of 0x400000 on x86_64 - objdump takes that into account. With address space layout randomization (ASLR) enabled (compiled with -fpie, linked with -pie), the base address has to be taken into account as outlined before.
Going back further involves the same steps:
??:?(main)[0x400b2d]
$ objdump --demangle -d ./a.out
[...]
400b1c: e8 af fd ff ff callq 4008d0 <operator delete(void*)@plt>
400b21: 48 8d 45 d0 lea -0x30(%rbp),%rax
400b25: 48 89 c7 mov %rax,%rdi
400b28: e8 a7 00 00 00 callq 400bd4 <Test::print()>
400b2d: b8 00 00 00 00 mov $0x0,%eax
^^^^^^
400b32: eb 2a jmp 400b5e <main+0xb1>
[...]
Until now, we've been manually translating the absolute address to a relative address. Instead, the base address of the module can be passed to objdump via --adjust-vma=<base-address>. That way, the value of rip or a caller's address can be used directly.
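For example, a sketch using the libstdc++ base address from the memory map above; the addresses objdump prints then line up with the absolute addresses from the log and the backtrace:
$ objdump --demangle -d --adjust-vma=0x7f1794f3e000 /usr/lib64/libstdc++.so.6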
Adding Debug Symbols
We've come a long way without a dump. For debugging to be effective, another critical puzzle piece is absent, however: debug symbols. Without them, it can be difficult to map the assembly to the corresponding source code. Compiling the sample with -O3 and without debug information illustrates the problem:
[98161.650474] a.out[13185]: segfault at ffffffffffffffe8 ip 0000000000400a4b sp 00007ffc9e738270 error 5 in a.out[400000+1000]
As a consequence of inlining, the log entry now points to our executable as the trigger. Using objdump gets us to the following:
400a3e: e8 dd fe ff ff callq 400920 <operator delete(void*)@plt>
400a43: 48 8b 33 mov (%rbx),%rsi
400a46: bf a0 20 60 00 mov $0x6020a0,%edi
400a4b: 48 8b 56 e8 mov -0x18(%rsi),%rdx
^^^^^^
400a4f: e8 4c ff ff ff callq 4009a0 <std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@plt>
400a54: 48 89 c5 mov %rax,%rbp
400a57: 48 8b 00 mov (%rax),%rax
Part of the stream implementation was inlined, making it harder to identify the associated source code. Without symbols, you have to rely on exported symbols, calls (like operator delete(void*)), and the surrounding instructions (mov $0x6020a0 loads the address of std::cout: 00000000006020a0 <std::cout@@GLIBCXX_3.4>) for orientation.
With debug symbols (-g), more context is available by calling objdump with --source:
400a43: 48 8b 33 mov (%rbx),%rsi
operator<<(basic_ostream<_CharT, _Traits>& __os,
const basic_string<_CharT, _Traits, _Alloc>& __str)
{
// _GLIBCXX_RESOLVE_LIB_DEFECTS
// 586. string inserter not a formatted function
return __ostream_insert(__os, __str.data(), __str.size());
400a46: bf a0 20 60 00 mov $0x6020a0,%edi
400a4b: 48 8b 56 e8 mov -0x18(%rsi),%rdx
^^^^^^
400a4f: e8 4c ff ff ff callq 4009a0 <std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@plt>
400a54: 48 89 c5 mov %rax,%rbp
That worked as expected. In the real world, debug symbols are not embedded in the binaries; they are managed in separate debuginfo packages. In those circumstances, objdump does not use the debug symbols even if they are installed. To address this limitation, the symbols have to be re-added to the affected binary. The following procedure creates detached symbols and re-adds them using eu-unstrip from elfutils, to the benefit of objdump:
# compile with debug info
g++ segv.cxx -O3 -g
# create detached debug info
objcopy --only-keep-debug a.out a.out.debug
# remove debug info from executable
strip -g a.out
# re-add debug info to executable
eu-unstrip ./a.out ./a.out.debug -o ./a.out-debuginfo
# objdump with executable containing debug info
objdump --demangle -d ./a.out-debuginfo --source
Using GDB instead of objdump
Thus far, we've been using objdump because it's usually available, even on production systems. Can we just use GDB instead? Yes, by executing gdb with the module of interest. I use 0x400a4b as in the previous objdump invocation:
$ gdb ./a.out
[...]
(gdb) disass 0x400a4b
Dump of assembler code for function main():
[...]
0x0000000000400a43 <+67>: mov (%rbx),%rsi
0x0000000000400a46 <+70>: mov $0x6020a0,%edi
0x0000000000400a4b <+75>: mov -0x18(%rsi),%rdx
0x0000000000400a4f <+79>: callq 0x4009a0 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@plt>
0x0000000000400a54 <+84>: mov %rax,%rbp
In contrast to objdump, GDB can deal with external symbol information without a hitch. disass /m corresponds to objdump --source:
(gdb) disass /m 0x400a4b
Dump of assembler code for function main():
[...]
21 Test test(*value);
22 delete value;
0x0000000000400a25 <+37>: test %rbx,%rbx
0x0000000000400a28 <+40>: je 0x400a43 <main()+67>
0x0000000000400a3b <+59>: mov %rbx,%rdi
0x0000000000400a3e <+62>: callq 0x400920 <_ZdlPv@plt>
23 test.print();
24 return 0;
25 }
0x0000000000400a88 <+136>: add $0x18,%rsp
[...]
End of assembler dump.
In the case of an optimized binary, GDB might skip instructions in this mode if the source code cannot be mapped unambiguously; our instruction at 0x400a4b is not listed. objdump never skips instructions and might skip the source context instead, an approach that I prefer for debugging at this level. This does not mean that GDB is not useful for this task; it's just something to be aware of.
Final Thoughts
Termination reason, registers, memory map, and stack trace. It's all there without even a trace of a core dump. While definitely useful (I fixed quite a few crashes that way), you have to keep in mind that you're still missing valuable information by going that route, most notably the stack and heap as well as per-thread data (thread metadata, registers, stack).
So, whatever the scenario may be, you should seriously consider enabling core dump generation and ensure that dumps can be generated successfully if push comes to shove. Debugging is complex enough in itself; debugging without information you could technically have needlessly increases complexity and turnaround time and, more importantly, significantly lowers the probability that the root cause can be found and addressed in a timely manner.

What is segment 00 in my Linux executable program (64-bit)?

Here is a very simple assembly program that just returns 12 when executed.
$ cat a.asm
global _start
section .text
_start: mov rax, 60    ; system call for exit
        mov rdi, 12    ; exit code 12
        syscall
It can be built and executed correctly:
$ nasm -f elf64 a.asm && ld a.o && ./a.out || echo $?
12
But the size of a.out is big; it is more than 4K:
$ wc -c a.out
4664 a.out
I try to understand it by reading elf content:
$ readelf -l a.out
Elf file type is EXEC (Executable file)
Entry point 0x401000
There are 2 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000b0 0x00000000000000b0 R 0x1000
LOAD 0x0000000000001000 0x0000000000401000 0x0000000000401000
0x000000000000000c 0x000000000000000c R E 0x1000
Section to Segment mapping:
Segment Sections...
00
01 .text
It is strange: segment 00 is aligned to 0x1000, which I think means the segment will occupy at least 4096 bytes.
My question is what is this segment 00?
(nasm version 2.14.02, ld version 2.34, os is Ubuntu 20.04.1)
Since it starts at file offset zero with a file size of 0xb0 (the 64-byte ELF header plus two 56-byte program headers), it is a segment covering the ELF header and the program header table; mapping them alongside the code makes loading the ELF more efficient.
The .text segment is, in fact, already aligned in the file the same way it must be in memory.
You can force ld not to align sections both in memory and in the file with -n. You can also strip the symbols with -s.
This will reduce the size to about 352 bytes.
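For example (a sketch; the exact size will vary slightly with your nasm/ld versions):
$ nasm -f elf64 a.asm && ld -n -s -o a.out a.o
$ wc -c a.out   # about 352 bytes now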
Now the ELF contains:
The ELF header (Needed)
The program header table (Needed)
The code (Needed)
The string table (Possibly unneeded)
The section table (Possibly unneeded)
The string table can be removed, but apparently strip can't do that.
I've removed the .shstrtab section data and all the section headers manually to shrink the size down to 144 bytes.
Consider that 64 bytes come from the ELF header, 56 from the single program header, and 12 from your code, for a total of 132 bytes.
The remaining bytes are padding: 4 bytes at the end of the code section (easy to remove), and the rest around the program header (which requires a bit of patching).

Is it safe for ld to interpret executables linked by gold?

Take a simple hello world program and compile it as follows:
> g++ --version
g++ 6.3.0
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> g++ -fuse-ld=gold test.cpp -o test
Inspecting the binary produced:
> readelf -l ./test
Elf file type is EXEC (Executable file)
Entry point 0x400750
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x0000000000000ac8 0x0000000000000ac8 R E 1000
LOAD 0x0000000000000dc0 0x0000000000401dc0 0x0000000000401dc0
0x0000000000000288 0x00000000000003d0 RW 1000
DYNAMIC 0x0000000000000de0 0x0000000000401de0 0x0000000000401de0
0x0000000000000200 0x0000000000000200 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x0000000000000a8c 0x0000000000400a8c 0x0000000000400a8c
0x000000000000003c 0x000000000000003c R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0
GNU_RELRO 0x0000000000000dc0 0x0000000000401dc0 0x0000000000401dc0
0x0000000000000240 0x0000000000000240 RW 8
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .dynsym .dynstr .gnu.hash .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame .eh_frame_hdr
03 .jcr .fini_array .init_array .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .jcr .fini_array .init_array .dynamic .got
Notice that the interpreter used is ld. Whilst the program happens to work, I've not been able to find any information on whether this is safe. For all I know, gold interprets the ELF specification in a subtly different and incompatible way that requires a different interpreter.
I've done my best to research this but have been unable to find anything that answers my question. The closest I've found is that gold struggles to link the Linux kernel (or struggled, since time has passed and it may have been fixed).
You're falling into the naming trap: gold is a link editor, while ld.so is a dynamic loader. Both are called linkers, although they run at different times (the latter is often also referred to as the runtime linker).
Their scope and usage are very different: the former generates the final executable you'll eventually run, while the latter takes the generated file, finds its dependencies, and resolves (links) the undefined symbols among them.
Indeed, gold and ld (more precisely, bfd ld), the link editors, are provided by binutils (or by alternative toolchain packages such as clang and so on), while ld.so is provided by the C library package, usually glibc on Linux distributions, but alternatively uclibc or musl.
Combining this with Martin Rosenau's comment...
Looking at the contents of /usr/bin/gold, you can see that the string /lib64/ld-linux-x86-64.so.2 is stored inside the gold executable itself. This means that the gold linker itself "decides" to use that runtime interpreter. For this reason, I doubt that there are incompatibilities.
... ld.so should be compatible with the gold linker.

Get machine code of a process by PID without attaching a debugger

I want to get the machine code of a running process by its PID in order to analyse malicious instructions, using heuristic methods of data analysis.
All I need is the list of current machine instructions and the values of the registers (EIP, EAX, EBX, ...).
I could use gdb to reach this goal (gdb output), but that comes with several problems:
I don't know how to interact with gdb from my application;
malicious code can use debugger-detection techniques like these: http://www.ouah.org/linux-anti-debugging.txt and https://www.youtube.com/watch?v=UTVp4jpJoyc&list=LLw7XNcx80oj63tRYAg7hrsA (for Windows);
getting the info from console output makes my application slower.
Is there any way to get this information by PID on Linux? Or maybe on Windows?
You may have a look at gcore:
$ gcore
usage: gcore [-o filename] pid
So you can dump a process's core using its PID:
$ gcore 792
warning: Could not load vsyscall page because no executable was specified
0x00007f5f73998410 in ?? ()
Saved corefile core.792
and then open it in gdb:
$ gdb -c core.792
GNU gdb (GDB) Fedora 8.0.1-30.fc26
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
[...]
[New LWP 792]
Missing separate debuginfo for the main executable file
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/09/b9d38bb6291b6282de4a2692e45448828d50da
Core was generated by `./a.out'.
#0 0x00007f5f73998410 in ?? ()
(gdb) info registers
rax 0xfffffffffffffe00 -512
rbx 0x0 0
rcx 0x7f5f73998410 140047938061328
rdx 0x1 1
rsi 0x7ffd30683d73 140725415591283
rdi 0x3 3
rbp 0x7ffd30683d90 0x7ffd30683d90
rsp 0x7ffd30683d68 0x7ffd30683d68
r8 0x1d 29
r9 0x0 0
r10 0x3 3
r11 0x246 582
r12 0x4006d0 4196048
r13 0x7ffd30683e70 140725415591536
r14 0x0 0
r15 0x0 0
rip 0x7f5f73998410 0x7f5f73998410
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
or even using the binary image from /proc to get some symbols:
gdb -c core.792 /proc/792/exe
You may know that you can pass scripts to gdb; this can spare you from having to interact with it manually from your application (see man gdb for more details).
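For example, a non-interactive sketch (using PID 792 from above; note that gdb still attaches via ptrace, so debugger-detection tricks can notice it):
$ gdb --batch -p 792 -ex 'info registers' -ex 'x/20i $pc'   # dump registers, then 20 instructions at the PC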
If you don't want to use gdb at all, you may try using ptrace() directly, but that is certainly more work.
As for the anti-debugging techniques, well... they work, and as far as I know there is no easy way to handle them generically; each one may have to be worked around manually (patching the binary, disassembling from unaligned addresses by setting the start address in objdump, etc.).
I'm not an expert in this domain; I hope this helps a bit.
