How does the ELF loader know the address of stderr and stdout?

I am disassembling a very simple ELF program (Linux x86).
In IDA Pro I see stdout and stderr in the .bss section,
but I haven't found anything that sets the values of stdout or stderr.
How does it work?
Can stdout and stderr be null?

So you mean stdout and stderr should always be at the same memory address in .bss?
The offset from start of .bss to stdout and stderr is determined at static link time.
The address of the start of .bss is subject to ASLR (address space layout randomization). Thus, for a given binary, the address of stdout may change from run to run.
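The arithmetic behind that statement can be sketched as follows. This is only an illustration: the load bases and the link-time values of stdout and stderr are made-up numbers, not taken from any real binary.

```python
# Hypothetical link-time values, as they might appear in `nm` output.
STDOUT_LINK_VALUE = 0x4040
STDERR_LINK_VALUE = 0x4060


def runtime_address(load_base: int, link_time_value: int) -> int:
    """Address of a symbol once the image has been placed at load_base."""
    return load_base + link_time_value


if __name__ == "__main__":
    # Two made-up load bases standing in for two runs with ASLR enabled.
    for base in (0x555555554000, 0x55E0A1C00000):
        out = runtime_address(base, STDOUT_LINK_VALUE)
        err = runtime_address(base, STDERR_LINK_VALUE)
        # The absolute addresses differ per run; the distance between them does not.
        print(hex(out), hex(err), hex(err - out))
```

The distance between the two symbols stays the same in every run because it is fixed at static link time; only the base moves.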
How does IDA Pro know that this item in .bss is stdout or stderr?
The only way it can know is via the symbol table. You should see it in output from:
readelf -Ws ./a.out | egrep 'stdout|stderr'
nm ./a.out | egrep 'stdout|stderr'
nm -D ./a.out | egrep 'stdout|stderr'
Update:
But what happens if the symbol table is stripped?
There are two cases to consider: fully-static link, and dynamic link.
In the fully-static case, all references to stderr can be completely removed, and IDA pro will not know where stderr is.
In the dynamically-linked case, there are two symbol tables: the "regular" one (displayed by nm) and the dynamic one (displayed by nm -D). Strip will remove only the regular symbol table (because removing dynamic symbol table makes no sense -- the executable will not run without it). IDA pro can then use the dynamic symbol table entry for stderr to tell where that symbol is.
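To check which of the two tables a given binary still carries, the section header types can be inspected directly. The following is a minimal sketch, assuming a little-endian ELF file with intact section headers; the function name symbol_tables is made up for this example, and the returned names are the conventional section names for those types rather than names read from the file.

```python
import struct
import sys

SHT_SYMTAB = 2   # regular symbol table (.symtab), what plain `nm` reads
SHT_DYNSYM = 11  # dynamic symbol table (.dynsym), what `nm -D` reads


def symbol_tables(data: bytes) -> set:
    """Return which symbol-table section types a little-endian ELF image has."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[5] != 1:
        raise ValueError("only little-endian ELF is handled in this sketch")
    if data[4] == 2:  # EI_CLASS == ELFCLASS64
        (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
        e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x3A)
    else:  # ELFCLASS32
        (e_shoff,) = struct.unpack_from("<I", data, 0x20)
        e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x2E)
    found = set()
    for i in range(e_shnum):
        # sh_type is the second 32-bit field of a section header in both classes.
        (sh_type,) = struct.unpack_from("<I", data, e_shoff + i * e_shentsize + 4)
        if sh_type == SHT_SYMTAB:
            found.add(".symtab")
        elif sh_type == SHT_DYNSYM:
            found.add(".dynsym")
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(sorted(symbol_tables(f.read())))
```

On a stripped, dynamically linked executable one would expect only .dynsym to be reported, which matches the behaviour described above.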

Related

How to debug assembly?

I have the following assembly file test which I want to debug,
How can I do that?
Note I am working with x86-64 and att syntax, plus I don't have access to c code.
I want to stop after each line and be able to see the registers in a table (I remember there is such an option).
I tried:
gdb test
r
but I get:
Starting program:
No executable file specified.
Use the "file" or "exec-file" command.

After running GDB on the executable (see footnote 1):
Use start or starti to set a breakpoint in main or _start respectively and run the program.
Or set breakpoints yourself with b 12 to set a breakpoint on source line 12 (if you built with enough debug info for this to work), or b *0x00401007 to set a breakpoint on an address you copy/pasted from disas output.
layout asm / layout reg puts GDB into text-UI mode with "windows" in your terminal for disassembly and registers. (This can be a bit flaky, you sometimes need control-L to redraw the screen, and sometimes GDB crashes when your process exits although I'm not sure if that's specifically from TUI.)
Otherwise without TUI mode, info reg and disas can be useful.
See the bottom of https://stackoverflow.com/tags/x86/info for more asm debugging tips.
Especially strace ./test is highly useful to see the system calls your program makes, decoded into C style. In toy programs you're playing with for your own experimentation, this basically works as an alternative to checking for error return values.
Footnote 1: You're not doing that part correctly:
No executable file specified.
That means no file called test existed in the directory where you ran gdb test.
You have to assemble + link test.S into an executable called test before you can run GDB on that file. If ls -l test shows it, then gdb test can debug it. (And ./test can run it.)
Often gcc -no-pie foo.S is a good choice to make debugging easier: addresses will be fixed at link time, so objdump -drwC -Mintel test output will match the addresses you see at run-time. And the addresses will be numerically smaller, so it's easier to visually spot a code (.text) address vs. .rodata (modern ld puts it in a separate page so it can avoid exec permission) vs. .data / .bss.
Stack addresses are still easy to distinguish from code either way: 0x555... or 0x0000...XXXXXX is in the executable, 0x7fffff... is in the stack, and other addresses from mmap are randomized. (But libc also gets mapped to a high address near the stack, with or without PIE.)
(Or if you're writing _start instead of main, gcc -nostdlib -static foo.S implies -no-pie)
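The rule of thumb for telling address ranges apart can be written down as a small helper. This is a rough sketch only: the numeric thresholds are assumptions chosen to match the typical x86-64 Linux layout described above, not values guaranteed by the kernel, and guess_region is a made-up name.

```python
def guess_region(addr: int) -> str:
    """Heuristically classify a user-space address on x86-64 Linux."""
    if addr < 0x1_0000_0000:
        # Small, fixed link-time addresses such as 0x401007.
        return "non-PIE executable"
    if 0x5500_0000_0000 <= addr < 0x5700_0000_0000:
        # Typical randomized base range for a PIE executable (0x555...).
        return "PIE executable"
    if 0x7F00_0000_0000 <= addr < 0x8000_0000_0000:
        # The stack and mmap'ed regions (libc and friends) both live up here.
        return "stack or mmap (e.g. libc)"
    return "unknown"


if __name__ == "__main__":
    for a in (0x401007, 0x555555555129, 0x7FFD5E3A1B20):
        print(hex(a), "->", guess_region(a))
```

The stack and libc are deliberately put in the same bucket, since both sit at high addresses and cannot be separated by magnitude alone.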

Symbol available in gdb but missing in objdump/nm output [duplicate]

On Fedora 31, I see that gdb knows the symbols of some system binaries, e.g. main of /usr/bin/true:
$ gdb -ex 'set height 0' -ex 'disas main' -ex q /bin/true
Reading symbols from /bin/true...
Reading symbols from .gnu_debugdata for /usr/bin/true...
(No debugging symbols found in .gnu_debugdata for /usr/bin/true)
Dump of assembler code for function main:
0x0000000000002550 <+0>: endbr64
0x0000000000002554 <+4>: cmp edi,0x2
[..]
But main doesn't show up in objdump -d or nm output:
$ objdump -d /usr/bin/true | grep '\<main\>'
$ nm /usr/bin/true | grep main
nm: /usr/bin/true: no symbols
$ nm -D /usr/bin/true | grep '\<main\>'
$
How come? Is gdb able to read the main symbol from some additional symbol table?
When I compile my own binaries with gcc, nm/objdump show the main symbol as expected. Also, when I strip such a binary, gdb can't find the main symbol either.
I assume that rpmbuild calls gcc/strip with some special flags that cause the above behavior. What are those?

Is gdb able to read the main symbol from some additional symbol table?
Yes: the one contained in the .gnu_debugdata section. More info here.
gdb also prints: No debugging symbols found in .gnu_debugdata for /usr/bin/true
GDB says: there are no debugging symbols (i.e. ones with file/line info, variable info, etc.). It doesn't say "there are no symbols" (i.e. things you would see in nm output). In fact, symbols are the raison d'être of .gnu_debugdata in the first place.

How to find the PHDR of dynamically linked/loaded libraries from a kernel module?

I need to access the program header tables (or, alternatively, the section headers) of a process from a Linux kernel module in order to find the addresses of the .eh_frame and .eh_frame_hdr sections. In userspace I would use dl_iterate_phdr(), but I need a kernel-space solution. Ideally, it would not need to go through the ELF files.
The auxiliary vector has the AT_PHDR field, but it does not help to find the PHDRs of dynamically linked/loaded libraries.
My other idea was to iterate on the vm_areas to find the PHDR address from every file that has an executable mmap in the task's memory. The problem with this solution is that the elf file can be changed or deleted after load.
Is there a way to do this that relies only on memory and not on the elf file?

It looks like the ELF header (which holds the file offset of the phdr table, often the same as its offset in memory) is always at the beginning of executable mmaps. This does not seem fully reliable, as I could not find any documentation guaranteeing that the Ehdr appears there, but it is present in practice. This is probably because the Ehdr must be at the beginning of an ELF file, and page size and alignment make the executable segment start at file offset 0x0.
We can verify that executable mappings start at offset 0x0 for all running processes and loaded shared objects with this bash line:
sudo cat /proc/*/maps | awk '{ print $2 " " $3 " " $6;}' | egrep '^..x.' | grep -vE '.... 0{8}'
It outputs all the executable mappings that do not start at offset 0x0, so no output means that the Ehdrs are at the beginning of executable vm_areas.
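The same check can be sketched in Python, working on the text of a maps file. This is an approximation of the one-liner rather than an exact port: it only looks at file-backed mappings (paths starting with "/"), and the function name is made up for this example.

```python
import glob


def exec_mappings_with_nonzero_offset(maps_text: str) -> list:
    """Return (path, offset) for file-backed executable mappings not at offset 0."""
    hits = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, pathname.
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue
        perms, offset, path = fields[1], int(fields[2], 16), fields[5]
        if perms[2] == "x" and offset != 0:
            hits.append((path, offset))
    return hits


if __name__ == "__main__":
    for maps_file in glob.glob("/proc/[0-9]*/maps"):
        try:
            with open(maps_file) as f:
                text = f.read()
        except OSError:
            continue  # process exited or permission denied
        for path, offset in exec_mappings_with_nonzero_offset(text):
            print(maps_file, path, hex(offset))
```

As with the shell version, no output means every file-backed executable mapping that could be read starts at file offset 0.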

How to make gdb print symbols in shared libraries loaded with dlopen?

I want to debug a process running on Linux 2.6 using GDB. attach PID (where PID is the process ID), print main, print sin, print gzopen and print dlopen work (i.e. they find the respective symbols). But print myfoo doesn't work, where myfoo is a function loaded by the process from an .so file using dlopen. Here is what I get:
(gdb) print main
$3 = {int (int, char **)} 0x805ba90 <main>
(gdb) print sin
$4 = {<text variable, no debug info>} 0xb7701230 <sin>
(gdb) print gzopen
$5 = {<text variable, no debug info>} 0xb720df50 <gzopen>
(gdb) print dlopen
$6 = {<text variable, no debug info>} 0xb77248e0 <__dlopen_nocheck>
(gdb) print myfoo
No symbol "myfoo" in current context.
How do I get GDB to find myfoo?
The function myfoo does indeed exist, because in the program I managed to get its address using dlsym (after dlopen), and I managed to call it. Only after that I attached GDB to the process.
It turned out that there was a mydir/mylib.so: No such file or directory error message printed by the attach $PID command of GDB. Apparently GDB was started in the wrong directory. Doing the proper cd before starting GDB fixed the problem, and print myfoo started working.
I'd like to automate this: I want GDB to figure out where my .so files (loaded with dlopen) are. An approximation I can think of is examining /proc/$PID/maps (on Linux), finding possible directories, and adding all of them to the GDB library search path before starting GDB. Extending LD_LIBRARY_PATH and doing a set solib-search-path /tmp/parent didn't work (ls -l /tmp/parent/mydir/myfoo.so does work); GDB still reported the No such file or directory error. How do I tell GDB where to look for mydir/myfoo.so?
My other question is how do I get the list of possible directories? On Linux, /proc/$PID/maps contains them -- but what about other operating systems like FreeBSD and Mac OS X?
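For the Linux case, collecting candidate directories from /proc/$PID/maps could look roughly like this. It is a sketch only: the function name and the rule for recognising shared objects (a file name ending in .so or containing ".so.") are assumptions, and it does not address whether GDB then accepts those directories.

```python
import posixpath
import sys


def shared_object_dirs(maps_text: str) -> list:
    """Return the sorted, de-duplicated directories of mapped shared objects."""
    dirs = set()
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, pathname.
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue
        name = posixpath.basename(fields[5])
        if name.endswith(".so") or ".so." in name:
            dirs.add(posixpath.dirname(fields[5]))
    return sorted(dirs)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py PID
    with open("/proc/%s/maps" % sys.argv[1]) as f:
        print(":".join(shared_object_dirs(f.read())))
```

The colon-joined output has the shape of a Unix search path, so it could be tried as a value for a library search path setting.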

The "info target" command in GDB shows a list of all sections in all loaded shared objects (including dlopen()ed libraries). At least this works on Linux -- I don't know how it behaves on other operating systems.

I maintain a program that loads a shared library via dlopen() and have successfully accessed symbols in the shared library using GDB. This will only work, however, if the shared library has a symbol table.

It looks like there is no easy way to automate finding .so files in GDB.

How to convert PE(Portable Executable) format to ELF in linux

What's the best tool for converting PE binaries to ELF binaries?
Following is a brief motivation for this question:
Suppose I have a simple C program.
I compiled it using gcc for Linux (this gives ELF), and using 'i586-mingw32msvc-gcc' for Windows (this gives a PE binary).
I want to analyze these two binaries for similarities, using BitBlaze's static analysis tool, Vine (http://bitblaze.cs.berkeley.edu/vine.html).
Vine doesn't have good support for PE binaries, so I wanted to convert PE->ELF, and then carry on with my comparison/analysis.
Since all the analysis has to run on Linux, I would prefer a utility/tool that runs on Linux.
Thanks

It is possible to rebuild an EXE as an ELF binary, but the resulting binary will segfault very soon after loading, due to the missing operating system.
Here's one method of doing it.
Summary
Dump the section headers of the EXE file.
Extract the raw section data from the EXE.
Encapsulate the raw section data in GNU linker script snippets.
Write a linker script to build an ELF binary, including those scripts from the previous step.
Run ld with the linker script to produce the ELF file.
Run the new program, and watch it segfault as it's not running on Windows (and it tries to call functions in the Import Address Table, which doesn't exist).
Detailed Example
Dump the section headers of the EXE file. I'm using objdump from the mingw cross compiler package to do this.
$ i686-pc-mingw32-objdump -h trek.exe
trek.exe:     file format pei-i386
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 AUTO          00172600  00401000  00401000  00000400  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .idata        00001400  00574000  00574000  00172a00  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  2 DGROUP        0002b600  00576000  00576000  00173e00  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          000e7800  005a2000  005a2000  00000000  2**2
                  ALLOC
  4 .reloc        00013000  0068a000  0068a000  0019f400  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .rsrc         00000a00  0069d000  0069d000  001b2400  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
Use dd (or a hex editor) to extract the raw section data from the EXE. Here, I'm just going to copy the code and data sections (named AUTO and DGROUP in this example). You may want to copy additional sections though.
$ dd bs=512 skip=2 count=2963 if=trek.exe of=code.bin
$ dd bs=512 skip=2975 count=347 if=trek.exe of=data.bin
Note, I've converted the file offsets and section sizes from hex to decimal to use as skip and count, but I'm using a block size of 512 bytes in dd to speed up the process (example: 0x0400 = 1024 bytes = 2 blocks of 512 bytes).
Encapsulate the raw section data in GNU ld linker script snippets (using the BYTE directive). This will be used to populate the sections.
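The hex-to-blocks conversion for dd can be checked with a few lines of Python. This is a small sketch using the AUTO and DGROUP values from the objdump listing above; the helper name dd_blocks is made up.

```python
def dd_blocks(file_offset: int, size: int, block_size: int = 512) -> tuple:
    """Convert a section's file offset and size into dd skip/count values."""
    if file_offset % block_size:
        raise ValueError("file offset is not a multiple of the block size")
    skip = file_offset // block_size
    count = -(-size // block_size)  # round up so a partial last block is kept
    return skip, count


if __name__ == "__main__":
    sections = {
        "code.bin": (0x00000400, 0x00172600),  # AUTO: File off, Size
        "data.bin": (0x00173E00, 0x0002B600),  # DGROUP: File off, Size
    }
    for out_name, (off, size) in sections.items():
        skip, count = dd_blocks(off, size)
        print(f"dd bs=512 skip={skip} count={count} if=trek.exe of={out_name}")
```

For these two sections the results agree with the skip and count values used in the dd commands above.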
cat code.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >code.ld
cat data.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >data.ld
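For readers without hexdump at hand, the same transformation can be sketched in Python. It emits one BYTE(0xNN) directive per input byte, which is what the hexdump format string above produces; the function name and the command-line handling are assumptions of this example.

```python
import sys


def to_byte_directives(data: bytes) -> str:
    """Render raw bytes as GNU ld BYTE() directives, one per line."""
    return "".join(f"BYTE(0x{b:02X})\n" for b in data)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python this_script.py code.bin code.ld
    with open(sys.argv[1], "rb") as src, open(sys.argv[2], "w") as dst:
        dst.write(to_byte_directives(src.read()))
```

Building the whole string in memory is fine for sections of this size; a much larger input would be better written out in chunks.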
Write a linker script to build an ELF binary, including those scripts from the previous step. Note I've also set aside space for the uninitialized data (.bss) section.
start = 0x516DE8;
ENTRY(start)
OUTPUT_FORMAT("elf32-i386")
SECTIONS {
    .text 0x401000 :
    {
        INCLUDE "code.ld";
    }
    .data 0x576000 :
    {
        INCLUDE "data.ld";
    }
    .bss 0x5A2000 :
    {
        . = . + 0x0E7800;
    }
}
Run the linker script with GNU ld to produce the ELF file. Note I have to use an emulation mode elf_i386 since I'm using 64-bit Linux, otherwise a 64-bit ELF would be produced.
$ ld -o elf_trek -m elf_i386 elf_trek.ld
ld: warning: elf_trek.ld contains output sections; did you forget -T?
$ file elf_trek
elf_trek: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
statically linked, not stripped
Run the new program, and watch it segfault as it's not running on Windows.
$ gdb elf_trek
(gdb) run
Starting program: /home/quasar/src/games/botf/elf_trek
Program received signal SIGSEGV, Segmentation fault.
0x0051d8e6 in ?? ()
(gdb) bt
#0 0x0051d8e6 in ?? ()
#1 0x00000000 in ?? ()
(gdb) x/i $eip
=> 0x51d8e6: sub (%edx),%eax
(gdb) quit
IDA Pro output for that location:
0051D8DB ; size_t stackavail(void)
0051D8DB proc stackavail near
0051D8DB push edx
0051D8DC call [ds:off_5A0588]
0051D8E2 mov edx, eax
0051D8E4 mov eax, esp
0051D8E6 sub eax, [edx]
0051D8E8 pop edx
0051D8E9 retn
0051D8E9 endp stackavail
For porting binaries to Linux, this is kind of pointless, given the Wine project.
For situations like the OP's, it may be appropriate.

I've found a simpler way to do this. Use the strip command.
Example
strip -O elf32-i386 -o myprogram.elf myprogram.exe
The -O elf32-i386 has it write out the file in that format.
To see supported formats run
strip --info
I am using the strip command from mxe, which on my system is actually named /opt/mxe/usr/bin/i686-w64-mingw32.static-strip.

I don't know whether this totally fits your needs, but is it an option for you to cross-compile with your MinGW version of gcc?
I mean to say: does it suit your needs to have i586-mingw32msvc-gcc compile directly to ELF-format binaries (instead of the PEs you're currently getting)? A description of how to do things in the other direction can be found here. I imagine it would be a little hacky but entirely possible to make this work for you in the other direction (I must admit I haven't tried it).
