What does @plt mean here? - linux

0x00000000004004b6 <main+30>: callq 0x400398 <printf@plt>
Does anyone know?
UPDATE
Why do two disas printf commands give me different results?
(gdb) disas printf
Dump of assembler code for function printf@plt:
0x0000000000400398 <printf@plt+0>: jmpq *0x2004c2(%rip) # 0x600860 <_GLOBAL_OFFSET_TABLE_+24>
0x000000000040039e <printf@plt+6>: pushq $0x0
0x00000000004003a3 <printf@plt+11>: jmpq 0x400388
(gdb) disas printf
Dump of assembler code for function printf:
0x00000037aa44d360 <printf+0>: sub $0xd8,%rsp
0x00000037aa44d367 <printf+7>: mov %rdx,0x30(%rsp)
0x00000037aa44d36c <printf+12>: movzbl %al,%edx
0x00000037aa44d36f <printf+15>: mov %rsi,0x28(%rsp)
0x00000037aa44d374 <printf+20>: lea 0x0(,%rdx,4),%rax
0x00000037aa44d37c <printf+28>: lea 0x3f(%rip),%rdx # 0x37aa44d3c2 <printf+98>

It's a way to get code fix-ups (adjusting addresses based on where code sits in virtual memory, which may differ between processes) without having to maintain a separate copy of the code for each process. The PLT, or procedure linkage table, is one of the structures that make dynamic loading and linking easier to use (another is the GOT, or global offset table).
Refer to the following diagram, which shows both your calling code and the library code (that you call) mapped to different virtual addresses in two different processes, A and B. There is only one copy of each piece of code in real memory, with the different virtual addresses within each process mapping to that real address:
Process A
    Addresses (virtual):
       0x1234                       0x8888
       +-------------+ +---------+  +---------+
       |             | | Private |  |         |
       |             | | PLT/GOT |  |         |
       |   Shared    | +---------+  | Shared  |
   ===== application =============== library =====
       |    code     | +---------+  |  code   |
       |             | | Private |  |         |
       |             | | PLT/GOT |  |         |
       +-------------+ +---------+  +---------+
       0x2020                       0x6666
Process B
When the shared library is brought in to the address space, entries are constructed in the process-specific (private) PLT and/or GOT which will, on first use, perform some fix-up to make things faster. Subsequent usage will then bypass the fix-up as it will no longer be needed.
The process goes something like this.
printf@plt is actually a small stub which (eventually) calls the real printf function, modifying things on the way to make subsequent calls faster.
The real printf function is mapped into an arbitrary location in a given process (virtual address space), as is the code that is trying to call it.
So, in order to allow proper code sharing of calling code (left side above) and called code (right side), you cannot apply any fix-ups to the calling code directly since that will "damage" how it works in the other processes (that wouldn't matter if it mapped to the same location in every process but that's a bit of a restriction, especially if something else had already been mapped there).
So the PLT is a smaller process-specific area at a reliably-calculated-at-runtime address that isn't shared between processes, so any given process is free to change it however it wants to, without adverse effects on other processes.
Let's follow the process through in a bit more detail. The diagram above doesn't show the address of the PLT/GOT since it can be found using a location relative to the current program counter. This is evidenced by your PC-relative lookup:
<printf@plt+0>: jmpq *0x2004c2(%rip) ; 0x600860 <_GOT_+24>
By using position independent code in the called library, along with the PLT/GOT, the first call to the function printf@plt (so in the PLT) is a multi-stage operation, in which the following actions take place:
It calls the GOT version (via a pointer) which initially points back to some set-up code in the PLT.
That set-up code loads the relevant shared library if not yet done, then modifies the GOT pointer so that subsequent calls go directly to the real printf (at the process-specific virtual address) rather than the PLT set-up code.
It then calls the loaded printf code at that address.
On subsequent calls, because the GOT pointer has been modified, the multi-stage approach is simplified:
It calls the GOT version (via pointer), which now points to the real printf.
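The two cases above can be mimicked in plain C, with a function pointer standing in for the GOT slot. This is only a sketch of the idea, not how the dynamic linker is implemented: the names got_add, resolve_add, add_plt and real_add are made up for illustration, and real_add stands in for the real printf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout: a simulation of the PLT/GOT idea,
   not the dynamic linker's actual mechanism. */
typedef int (*add_fn)(int, int);

static int resolver_runs = 0;           /* how often the slow path ran */

static int real_add(int a, int b) {     /* stands in for the real printf */
    return a + b;
}

static int resolve_add(int a, int b);   /* stands in for the PLT set-up code */

/* One "GOT slot": private and writable, initially pointing at the resolver. */
static add_fn got_add = resolve_add;

static int resolve_add(int a, int b) {
    resolver_runs++;
    got_add = real_add;                 /* the fix-up: patch the slot once */
    return real_add(a, b);              /* then complete the original call */
}

/* The "PLT stub": this code is never modified, it only jumps via the slot. */
int add_plt(int a, int b) {
    assert(got_add != NULL);
    return got_add(a, b);
}

int resolver_run_count(void) {
    return resolver_runs;
}
```

Calling add_plt(2, 3) twice runs resolve_add only on the first call; after that, resolver_run_count() stays at 1 because the slot already points at real_add.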
A good article can be found here, detailing how glibc is loaded at run time.

Not sure, but what you have seen probably makes sense. The first time you run the disas command, printf has not yet been called, so it is not resolved. Once your program calls printf for the first time, the GOT is updated, printf is resolved, and the GOT points to the real function. Thus, the next disas command shows the real printf assembly.

Related

Cheating with the gp register on RISC-V - what could go wrong?

I absolutely have to pass an initialisation value to a dynamic library/module (everything is written in assembly) for some RISC-V code. The only way I seem to be able to do this is to use the gp register - and the code I am using runs and there are no crashes (yet). It is used to pass the value of a stack where a couple of initialisation values are stored.
PUSH gp
mv gp, s10
call dlopen
POP gp
(PUSH and POP are my main stack macros, s10 points to the stack I am using to store values for initialisation). Everything runs on top of GNU libc/libdl.
I restore the value of gp as quickly as I can: everything says never change the value of this register - so what could go wrong, or if it works, can I just relax about it?
The answer was to write some library code that allows reading and writing a memory location holding the value. The writer (main executable) writes to that location, and the reader (the library needing the value) reads it as required.
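A minimal sketch of that accessor approach, written in C rather than RISC-V assembly for brevity. The names set_init_stack and get_init_stack are hypothetical, and the sketch assumes the accessors live in a small shared library that both the executable and the loaded module link against.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared slot holding the pointer that was previously
   smuggled through gp. */
static void *init_stack = NULL;

/* Called by the main executable before it calls dlopen. */
void set_init_stack(void *stack) {
    assert(stack != NULL);
    init_stack = stack;
}

/* Called by the loaded module's initialisation code. */
void *get_init_stack(void) {
    return init_stack;
}
```

With this in place gp never has to be touched: the executable calls set_init_stack before dlopen, and the module calls get_init_stack when it initialises.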

How to find the address of a not imported libc function when ASLR is on?

I have a 32bit elf program that I have to exploit remotely (for academic purposes).
The final goal is to spawn a shell. I have a stack that I can fill with any data I want, and I can abuse one of the printf format strings. The only problem is that system/execv/execvp are not imported. The .got.plt segment is full of not-very-useful functions, and I want to replace atoi with system because their signatures are similar and the flow of the code indicates that it is the right function to replace. For the following attempts I used IDA remote debug, so bad stack alignment and an improper format string are out of the question. I wanted to make sure it is doable, and apparently for me it isn't yet.
At first I tried to replace atoi@.got.plt with the unrandomized address of system. Got SIGSEGV.
Alright, it's probably because of ASLR, so let's try something else. I loaded up gdb and looked up system@0xb7deeda0 and atoi@0xb7de1250. Then I calculated the diff, which is 0xDB50. So the next time, when I changed the address of atoi to system in the .got.plt segment, I just added diff to that value to get the address of system. Got SIGSEGV again.
My logic:
0xb7deeda0 <__libc_system>
0xb7de1250 <atoi>
diff = 0xb7deeda0 - 0xb7de1250
system@.got.plt = atoi@.got.plt + diff
example: 0x08048726 + DB50 = 0x08056276
Can anyone tell me what I did wrong and how I can jump to a "valid system()" with the help of leaking a function address from .got.plt?
Answering my own question: measuring the distance between functions in your l̲o̲c̲a̲l̲ libc does not guarantee that the r̲e̲m̲o̲t̲e̲ libc has the same layout.
You have to find the remote libc version somehow; then you can get the address difference like so:
readelf -s /lib32/libc-2.19.so | grep printf
Possible ways to find the libc version if you know two addresses:
Libc binary collection
libcdb.com
pwnlib
... or you have access to the shell on the remote machine and can peek into the library with readelf yourself
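Once the matching libc is known, the arithmetic is a rebase: subtract the leaked symbol's offset from the address value leaked out of the GOT slot to get the libc base, then add the target symbol's offset. The sketch below assumes a 32-bit target; rebase_symbol is a hypothetical helper, and the offsets used in the note after it are invented so that they reproduce the addresses from the question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given the leaked run-time address of one libc symbol
   and the offsets of both symbols inside the SAME libc build (for example
   from readelf -s), compute the run-time address of the other symbol. */
uint32_t rebase_symbol(uint32_t leaked_addr, uint32_t leaked_off,
                       uint32_t target_off) {
    /* Sanity check: a leaked address below its own offset means the
       offsets cannot belong to the libc that is actually loaded. */
    assert(leaked_addr >= leaked_off);
    uint32_t libc_base = leaked_addr - leaked_off;
    return libc_base + target_off;
}
```

For example, with made-up offsets 0x31250 for atoi and 0x3eda0 for system (difference 0xDB50, as in the question), rebase_symbol(0xb7de1250, 0x31250, 0x3eda0) yields 0xb7deeda0. The result is only valid on the remote side if the offsets come from the remote libc build.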

Execution flow of dynamic library code stub

There are a few questions on StackOverflow with similar titles, but this question addresses a different issue.
I have created a simple fprintf program which prints a certain value to a file. I wanted to understand the execution flow of the program with respect to the library code. My analysis is that each dynamic library function is called through a stub stored in the plt section of the object file. A simple objdump of the object file revealed three main sections: init, plt and text. All the stubs in the plt section consist of 3 or 4 similar lines. Example for fprintf:
00000000004004d0 <fprintf@plt>:
4004d0: jmpq *0x200b42(%rip) # 601018 <_GLOBAL_OFFSET_TABLE_+0x30>
4004d6: pushq $0x3
4004db: jmpq 400490 <_init+0x18>
I used the pintool to trace the execution of the program. Apparently, the first time fprintf is called, the execution flow is 4004d0,4004d6,4004db. This means that the first instruction i.e. jump goes to the next instruction at the first time and then the next time, the same jump instruction leads to the library code(I could identify this from the ip of the next instruction).
My query is where is the _GLOBAL_OFFSET_TABLE_ information maintained in the object file ? I used readelf -a command to read the object file contents but could not find the interested instruction pointers in that object file.
Is the plt section and the stub method the only way we can access the shared library code ?
I used the pintool to trace the execution of the program. Apparently, the first time fprintf is called, the execution flow is 4004d0,4004d6,4004db. This means that the first instruction i.e. jump goes to the next instruction at the first time and then the next time, the same jump instruction leads to the library code(I could identify this from the ip of the next instruction).
Congratulations, you have observed lazy symbol relocation in action. Read more about it here (especially "The lazy binding optimization" section).
My query is where is the _GLOBAL_OFFSET_TABLE_ information maintained in the object file ?
It's not (as reading the referenced blog post will show).

Function graph (timestamped entry and exit) for both user, library and kernel space in Linux?

I'm writing this more or less in frustration - but who knows, maybe there's a way for this too...
I would like to analyze what happens with a function from ALSA, say snd_pcm_readi; for that purpose, let's say I have prepared a small testprogram.c, where I have this:
void doCapture() {
ret = snd_pcm_readi(handle, buffer, period_size);
}
The problem with this function is that it eventually (should) hook into snd_pcm_readi in the shared system library /usr/lib/libasound.so; from there, I believe via ioctl, it would somehow communicate to snd_pcm_read in the kernel module /lib/modules/$(uname -r)/kernel/sound/core/snd-pcm.ko -- and that should ultimately talk to whatever .ko kernel module which is a driver for a particular soundcard.
Now, with the organization like above, I can do something like:
valgrind --tool=callgrind --toggle-collect=doCapture ./testprogram
... and then kcachegrind callgrind.out.12406 does indeed reveal a relationship between snd_pcm_readi, libasound.so and an ioctl (I cannot get the same information to show with callgrind_annotate) - so that somewhat covers userspace; but that is as far as it goes. Furthermore, it produces a call graph, that is to say general caller/callee relationships between functions (possibly by a count of samples/ticks each function has spent working as scheduled).
However, what I would like to get instead, is something like the output of the Linux ftrace tracer called function_graph, which provides a timestamped entry and exit of traced kernel functions... example from ftrace: add documentation for function graph tracer [LWN.net]:
$ cat /sys/kernel/debug/tracing/trace
# tracer: function_graph
#
# TIME CPU DURATION FUNCTION CALLS
# | | | | | | | |
2105.963678 | 0) | mutex_unlock() {
2105.963682 | 0) 5.715 us | __mutex_unlock_slowpath();
2105.963693 | 0) + 14.700 us | }
2105.963698 | 0) | dnotify_parent() {
(NB: newer ftrace documentation seems to not show a timestamp at first for the function_graph, only duration - but I think it's still possible to modify that)
With ftrace, one can filter so one can only trace functions in a given kernel module - so in my case, I could add the functions of snd-pcm.ko and whatever .ko module is the soundcard driver, and I'd have whatever I find interesting in kernel-space covered. But then, I lose the link to the user-space program (unless I explicitly printf to /sys/kernel/debug/tracing/trace_marker, or do a trace_printk from user-space .c files)
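The trace_marker route mentioned above could look roughly like the following from a user-space C program. This is a sketch under assumptions: write_trace_marker is a hypothetical helper, the path is passed in by the caller (typically /sys/kernel/debug/tracing/trace_marker), and debugfs has to be mounted and writable by the calling user.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write one message to the ftrace marker file so that
   it shows up, timestamped, in the kernel trace next to the traced kernel
   functions. Returns 0 on success and -1 on any failure. */
int write_trace_marker(const char *marker_path, const char *msg) {
    assert(marker_path != NULL && msg != NULL);

    int fd = open(marker_path, O_WRONLY);
    if (fd < 0)
        return -1;              /* e.g. debugfs not mounted, no permission */

    size_t len = strlen(msg);
    ssize_t written = write(fd, msg, len);
    close(fd);

    return (written == (ssize_t)len) ? 0 : -1;
}
```

Calling write_trace_marker("/sys/kernel/debug/tracing/trace_marker", "doCapture: enter\n") just before snd_pcm_readi would give one user-space anchor point in the kernel trace. In a hot path it would be cheaper to keep the descriptor open than to reopen it on every call.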
Ultimately, what I'd like, is to have the possibility to specify an executable, possibly also library files and kernel modules - and obtain a timestamped function graph (with indented/nested entry and exit per function) like ftrace provides. Are there any alternatives for something like this? (Note I can live without the function exits - but I'd really like to have timestamped function entries)
As a PS: it seems I actually found something that fits the description, which is the fulltrace application/script:
fulltrace [andreoli@Github]
fulltrace traces the execution of an ELF program, providing as output a full trace of its userspace, library and kernel function calls. ...
(prerequisites) the following kernel configuration options and their dependencies must be set as enabled (=y): FTRACE, TRACING_SUPPORT, UPROBES, UPROBE_EVENT, FUNCTION_GRAPH_TRACER.
Sounds perfect - but the problem is, I'm on Ubuntu 11.04, and while this 2.6.38 kernel luckily has CONFIG_FTRACE=y enabled -- its /boot/config-`uname -r` doesn't even mention UPROBES :/ And since I'd like to avoid doing kernel hacking, unfortunately I cannot use this script...
(Btw, if UPROBES were available, (as far as I understand) one sets a trace probe on a symbol address (as obtained from say objdump -d), and output goes again to /sys/kernel/debug/tracing/trace - so some custom solution would have been possible using UPROBES, even without the fulltrace script)
So, to narrow down my question a bit - is there a solution, that would allow simultaneous user-space (incl. shared libraries) and kernel-space "function graph" tracing, but where UPROBES are not available in the kernel?

What is the purpose of this code segment from glibc

I am trying to understand what the following code segment from tls.h in glibc is doing and why:
/* Macros to load from and store into segment registers. */
# define TLS_GET_FS() \
({ int __seg; __asm ("movl %%fs, %0" : "=q" (__seg)); __seg; })
I think I understand the basic operation: it is moving the value stored in the fs register to __seg. However, I have some questions:
My understanding is the fs is only 16-bits. Is this correct? What happens when the value gets moved to a quadword memory location? Does this mean the upper bits get set to 0?
More importantly I think that the scope of the variable __seg that gets declared at the start of the segment is limited to this segment. So how is __seg useful? I'm sure that the authors of glibc have a good reason for doing this but I can't figure out what it is from looking at the source code.
I tried generating assembly for this code and I got the following:
#APP
# 13 "fs-test.cpp" 1
movl %fs, %eax
# 0 "" 2
#NO_APP
So in my case it looks like eax was used for __seg. But I don't know if that is always what happens or if it was just what happened in the small test file that I compiled. If it is always going to use eax why wouldn't the assembly be written that way? If the compiler might pick other registers then how will the programmer know which one to access since __seg goes out of scope at the end of the macro? Finally I did not see this macro used anywhere when I grepped for it in the glibc source code, so that further adds to my confusion about what its purpose is. Any explanation about what the code is doing and why is appreciated.
My understanding is the fs is only 16-bits. Is this correct? What happens when the value gets moved to a quadword memory location? Does this mean the upper bits get set to 0?
Yes.
the variable __seg that gets declared at the start of the segment is limited to this segment. So how is __seg useful?
You have to read about the GCC statement-expression extension. The value of a statement expression is the value of the last expression in it. The __seg; at the end is what supplies that value; the macro would be useless unless its result is assigned to something, like this:
int foo = TLS_GET_FS();
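To see the same mechanism without any segment registers, here is a small sketch using a made-up macro, CLAMP_TO_BYTE, which is not part of glibc. It relies on the same GCC extension (Clang accepts it too): the temporary declared inside the braces goes out of scope, but its value becomes the value of the whole expression.

```c
#include <assert.h>

/* GCC/Clang extension: ({ ... }) is an expression whose value is that of
   the last expression statement inside the braces. */
#define CLAMP_TO_BYTE(x) \
    ({ int v_ = (x); if (v_ < 0) v_ = 0; if (v_ > 255) v_ = 255; v_; })

int clamp_demo(int x) {
    /* v_ is not visible here, but its final value is what gets assigned. */
    int result = CLAMP_TO_BYTE(x);
    assert(result >= 0 && result <= 255);
    return result;
}
```

This also covers the register question: with an output constraint like "=q" the compiler picks the register, and the programmer only ever sees the resulting C value, never the register itself.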
Finally I did not see this macro used anywhere when I grepped for it in the glibc source code
The TLS_{GET,SET}_FS macros do not in fact appear to be used. They were probably used in some earlier version, then accidentally left behind when the code referencing them was removed.

Resources