Why does strace believe this memory is uninitialized when attaching to a process?

I have an extremely simple program that does nothing more than call recvfrom() in a loop. According to its manpage, one of the arguments is a pointer to the length of the address. That length is initialized in the .data section to the integer value 16. I noticed some strange behavior when I attach strace to the already-running process, which does not occur when I start the process under strace. Scroll to the end of the lines:
# strace -x -s 10 -e trace=recvfrom ./test
recvfrom(3, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 32, 0, {sa_family=AF_INET, sin_port=htons(42134), sin_addr=inet_addr("127.0.0.1")}, [16]) = 32
recvfrom(3, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 32, 0, {sa_family=AF_INET, sin_port=htons(49442), sin_addr=inet_addr("127.0.0.1")}, [16]) = 32
recvfrom(3, ^Cstrace: Process 18909 detached
<detached ...>
# ./test &
# strace -x -s 10 -e trace=recvfrom -p $!
strace: Process 18916 attached
recvfrom(3, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 32, 0, {sa_family=AF_INET, sin_port=htons(50906), sin_addr=inet_addr("127.0.0.1")}, [1999040176->16]) = 32
recvfrom(3, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 32, 0, {sa_family=AF_INET, sin_port=htons(52956), sin_addr=inet_addr("127.0.0.1")}, [16]) = 32
recvfrom(3, ^Cstrace: Process 18916 detached
<detached ...>
When I trace it directly, the address length argument shows as [16], which makes sense. After all, the argument is a pointer to an int with the value 16. However, when I attach to the process and trace it, the very first call shows it as if it were uninitialized, e.g. [1999040176->16]. This happens for the first syscall every time I attach, but all subsequent calls show it correctly as [16]. If I detach from the process and re-attach, the first call again shows it as uninitialized memory.
To be brief:
When I run it under strace, the last argument shows [16] for every recvfrom().
When I attach to it when it is already running, the last argument shows things like [1999040176->16] in the first call to recvfrom(), and [16] in all subsequent ones.
If I detach from it and attach again, the first call to recvfrom() again displays this odd behavior, and all subsequent calls display the expected [16].
The program itself is correct. Here is the program (written in MIPS assembly):
.section .text
.global __start
__start:
# socket
li $v0,4183
li $a0,2
li $a1,1
li $a2,0
syscall
sw $v0,sockfd
# bind
li $v0,4169
lw $a0,sockfd
la $a1,sockaddr_b
li $a2,16
syscall
loop:
# recvfrom
li $v0,4176
lw $a0,sockfd
la $a1,buffer
li $a2,32
li $a3,0
la $t0,sockaddr_a
sw $t0,16($sp)
la $t0,addrlen
sw $t0,20($sp)
syscall
j loop
.section .bss
sockaddr_a: .space 16
buffer: .space 32
sockfd: .space 4
.section .data
addrlen: .int 16
.section .rodata
sockaddr_b: .hword 2,1234,0,0
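For readers who do not read MIPS assembly, here is a rough C sketch of what the recvfrom() part of the loop does. It is not a translation of the program above: the names recv_once and loopback_addrlen are made up for illustration, and the loopback helper exists only so the sketch can be exercised without a second process. It assumes a datagram (UDP) socket, which is what socket type 1 means on MIPS as far as I know. The point it shows is that the last argument is a value-result parameter: the caller passes in the buffer size (16 for AF_INET) and the kernel writes back the size of the address it stored.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Receive one datagram and return the address length the kernel wrote back,
   or -1 on error. addrlen is a value-result argument, as in the assembly. */
int recv_once(int fd, void *buf, size_t len)
{
    struct sockaddr_in peer;
    socklen_t addrlen = sizeof peer;   /* in: size of the address buffer */

    if (recvfrom(fd, buf, len, 0, (struct sockaddr *)&peer, &addrlen) < 0)
        return -1;
    return (int)addrlen;               /* out: size of the address stored */
}

/* Hypothetical helper for trying this out: bind a UDP socket on loopback,
   send one datagram to itself, and receive it with recv_once(). */
int loopback_addrlen(void)
{
    char buf[32] = {0};
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int result = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &alen) == 0 &&
        sendto(fd, buf, sizeof buf, 0, (struct sockaddr *)&addr,
               sizeof addr) == (ssize_t)sizeof buf)
        result = recv_once(fd, buf, sizeof buf);
    close(fd);
    return result;
}
```

Calling loopback_addrlen() from main() and printing the result should give the same length that strace shows in brackets for a call traced from the start.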


creating Linux i386 a.out executable shorter than 4097 bytes

I'm trying to create a Linux i386 a.out executable shorter than 4097 bytes, but all my efforts have failed so far.
I'm compiling it with:
$ nasm -O0 -f bin -o prog prog.nasm && chmod +x prog
I'm testing it in a Ubuntu 10.04 i386 VM running Linux 2.6.32 with:
$ sudo modprobe binfmt_aout
$ sudo sysctl vm.mmap_min_addr=4096
$ ./prog; echo $?
Hello, World!
0
This is the source code of the 4097-byte executable which works:
; prog.nasm
bits 32
cpu 386
org 0x1000 ; Linux i386 a.out QMAGIC file format has this.
SECTION_text:
a_out_header:
dw 0xcc ; magic=QMAGIC; Demand-paged executable with the header in the text. The first page (0x1000 bytes) is unmapped to help trap NULL pointer references.
db 0x64 ; type=M_386
db 0 ; flags=0
dd SECTION_data - SECTION_text ; a_text=0x1000 (byte size of .text; mapped as r-x)
dd SECTION_end - SECTION_data ; a_data=0x1000 (byte size of .data; mapped as rwx, not just rw-)
dd 0 ; a_bss=0 (byte size of .bss)
dd 0 ; a_syms=0 (byte size of symbol table data)
dd _start ; a_entry=0x1020 (in-memory address of _start == file offset of _start + 0x1000)
dd 0 ; a_trsize=0 (byte size of relocation info for .text)
dd 0 ; a_drsize=0 (byte size of relocation info for .data)
_start: mov eax, 4 ; __NR_write
mov ebx, 1 ; argument: STDOUT_FILENO
mov ecx, msg ; argument: address of string to output
mov edx, msg_end - msg ; argument: number of bytes
int 0x80 ; syscall
mov eax, 1 ; __NR_exit
xor ebx, ebx ; argument: EXIT_SUCCESS == 0.
int 0x80 ; syscall
msg: db 'Hello, World!', 10
msg_end:
times ($$ - $) & 0xfff db 0 ; padding to multiple of 0x1000 ; !! is this needed?
SECTION_data: db 0
; times ($$ - $) & 0xfff db 0 ; padding to multiple of 0x1000 ; !! is this needed?
SECTION_end:
How can I make the executable file smaller? (Clarification: I still want a Linux i386 a.out executable. I know that it's possible to create a smaller Linux i386 ELF executable.) There are several thousand bytes of padding at the end of the file, which seem to be required.
So far I've discovered the following rules:
If a_text or a_data is 0, Linux doesn't run the program. (See relevant Linux source block 1 and 2.)
If a_text is not a multiple of 0x1000 (4096), Linux doesn't run the program. (See relevant Linux source block 1 and 2.)
If the file is shorter than a_text + a_data bytes, Linux doesn't run the program. (See relevant Linux source code location.)
Thus file_size >= a_text + a_data >= 0x1000 + 1 == 4097 bytes.
The combinations nasm -f aout + ld -s -m i386linux, nasm -f elf + ld -s -m i386linux, and as -32 + ld -s -m i386linux each produce an executable of 4100 bytes, which doesn't even work (because its a_data is 0). Adding a single byte to section .data makes the executable file 8196 bytes long, and then it works. Thus this path doesn't lead to fewer than 4097 bytes.
Did I miss something?
TL;DR It doesn't work.
It is impossible to make a Linux i386 a.out QMAGIC executable shorter than 4097 bytes work on Linux 2.6.32, based on evidence in the Linux kernel source code of the binfmt_aout module.
Details:
If a_text is 0, Linux doesn't run the program. (Evidence for this check: a_text is passed as the length argument to mmap(2) here.)
If a_data is 0, Linux doesn't run the program. (Evidence for this check: a_data is passed as the length argument to mmap(2) here.)
If a_text is not a multiple of 0x1000 (4096), Linux doesn't run the program. (Evidence for this check: fd_offset + ex.a_text is passed as the offset argument to mmap(2) here. For QMAGIC, fd_offset is 0.)
If the file is shorter than a_text + a_data bytes, Linux doesn't run the program. (Evidence for this check: the file size is compared to a_text + a_data + a_syms + ... here.)
Thus file_size >= a_text + a_data >= 0x1000 + 1 == 4097 bytes.
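To make the arithmetic concrete, here is a small C sketch that encodes only the four rules listed above. It is not the kernel's code from binfmt_aout; the function names and the AOUT_PAGE_SIZE constant are made up for illustration, and it assumes a_syms and the relocation sizes are 0, as in the header shown in the question.

```c
#include <stdbool.h>
#include <stdint.h>

#define AOUT_PAGE_SIZE 0x1000u  /* assumed i386 page size, as in the rules above */

/* Returns true if a QMAGIC header passes the four checks listed above. */
bool qmagic_passes_checks(uint32_t a_text, uint32_t a_data, uint64_t file_size)
{
    if (a_text == 0 || a_data == 0)
        return false;                       /* rules 1 and 2 */
    if (a_text % AOUT_PAGE_SIZE != 0)
        return false;                       /* rule 3 */
    if (file_size < (uint64_t)a_text + a_data)
        return false;                       /* rule 4, with a_syms etc. == 0 */
    return true;
}

/* Smallest file size that can satisfy all four rules:
   one page of text plus one byte of data. */
uint64_t qmagic_min_file_size(void)
{
    return (uint64_t)AOUT_PAGE_SIZE + 1;
}
```

With these rules, a_text = 0x1000 and a_data = 1 is the smallest combination that passes, which gives the 4097-byte lower bound.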
I've also tried OMAGIC, ZMAGIC and NMAGIC, but none of them worked. Details:
For OMAGIC, read(2) is used instead of mmap(2) within here, thus it can work. However, Linux tries to load the code to virtual memory address 0 (N_TXTADDR is 0), and this causes SIGKILL (if non-root and vm.mmap_min_addr is larger than 0) or SIGILL (otherwise), thus it doesn't work. Maybe the reason for SIGILL is that the page allocated by set_brk is not executable (but that should be indicated by SIGSEGV), this could be investigated further.
For ZMAGIC and NMAGIC, read(2) is used instead of mmap(2) here if fd_offset is not a multiple of the page size (0x1000). fd_offset is 32 for NMAGIC and 1024 for ZMAGIC, so that condition is met. However, it doesn't work for the same reason as OMAGIC (the code is loaded to virtual memory address 0).
I wonder if it's possible to run OMAGIC, ZMAGIC or NMAGIC executables at all on Linux 2.6.32 or later.

How does gdb start an assembly compiled program and step one line at a time?

Valgrind says the following on their documentation page
Your program is then run on a synthetic CPU provided by the Valgrind core
However GDB doesn't seem to do that. It seems to launch a separate process which executes independently. There's also no c library from what I can tell. Here's what I did
Compile using clang or gcc: gcc -g tiny.s -nostdlib (-g seems to be required)
gdb ./a.out
Write starti
Press s a bunch of times
You'll see it'll print out "Test1\n" without printing test2. You can also kill the process without terminating gdb. GDB will say "Program received signal SIGTERM, Terminated." and won't ever write Test2
How does gdb start the process and have it execute only one line at a time?
.text
.intel_syntax noprefix
.globl _start
.p2align 4, 0x90
.type _start,#function
_start:
lea rsi, [rip + .s1]
mov edi, 1
mov edx, 6
mov eax, 1
syscall
lea rsi, [rip + .s2]
mov edi, 1
mov edx, 6
mov eax, 1
syscall
mov eax, 60
xor edi, edi
syscall
.s1:
.ascii "Test1\n"
.s2:
.ascii "Test2\n"
starti implementation
As usual for a process that wants to start another process, it does a fork/exec, like a shell does. But in the new process, GDB doesn't just make an execve system call right away.
Instead, it first calls ptrace(PTRACE_TRACEME), which tells the kernel that this process is to be traced by its parent. So GDB (the parent) is already attached before the child process makes an execve() system call to start executing the specified executable file.
Also note in the execve(2) man page:
If the current program is being ptraced, a SIGTRAP signal is sent
to it after a successful execve().
So that's how the kernel debugging API supports stopping before the first user-space instruction is executed in a newly-execed process. i.e. exactly what starti wants. This doesn't depend on setting a breakpoint; that can't happen until after execve anyway, and with ASLR the correct address isn't even known until after execve picks a base address. (GDB by default disables ASLR, but it still works if you tell it not to disable ASLR.)
This is also what GDB uses if you set breakpoints before run, manually, or by using start to set a one-time breakpoint on main. Before the starti command existed, a hack to emulate that functionality was to set an invalid breakpoint before run, so GDB would stop on that error, giving you control at that point.
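The fork / PTRACE_TRACEME / execve sequence described above can be sketched in C roughly as follows. This is not GDB's code, just a minimal illustration of the kernel API; the function name first_stop_signal is made up, and the sketch kills the child right after observing the first stop instead of debugging it.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start `path` as a traced child and return the signal it first stops with
   (SIGTRAP is expected after a successful execve), or -1 on any failure. */
int first_stop_signal(const char *path)
{
    int status;
    int sig;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: ask to be traced by the parent, then exec the target. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execl(path, path, (char *)NULL);
        _exit(127);                 /* only reached if execl failed */
    }
    /* Parent: the child stops before its first user-space instruction. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;                  /* child exited, e.g. ptrace was refused */
    sig = WSTOPSIG(status);
    kill(pid, SIGKILL);             /* this sketch does not go on to debug it */
    waitpid(pid, &status, 0);
    return sig;
}
```

A real debugger would, at the point where this sketch sends SIGKILL, read registers, insert breakpoints, and resume the child with PTRACE_CONT or PTRACE_SINGLESTEP.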
If you strace -f -o gdb.trace gdb ./foo or something, you'll see some of what GDB does. (Nested tracing apparently doesn't work, so running GDB under strace means GDB's ptrace system call fails, but we can see what it does leading up to that.)
...
231566 execve("/usr/bin/gdb", ["gdb", "./foo"], 0x7ffca2416e18 /* 57 vars */) = 0
# the initial GDB process is PID 231566.
... whole bunch of stuff
231566 write(1, "Starting program: /tmp/foo \n", 28) = 28
231566 personality(0xffffffff) = 0 (PER_LINUX)
231566 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX)
231566 personality(0xffffffff) = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)
231566 vfork( <unfinished ...>
# 231584 is the new PID created by vfork that would go on to execve the new program
231584 openat(AT_FDCWD, "/proc/self/fd", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 13
231584 newfstatat(13, "", {st_mode=S_IFDIR|0500, st_size=0, ...}, AT_EMPTY_PATH) = 0
231584 getdents64(13, 0x558403e20360 /* 16 entries */, 32768) = 384
231584 close(3) = 0
... all these FDs
231584 close(12) = 0
231584 getdents64(13, 0x558403e20360 /* 0 entries */, 32768) = 0
231584 close(13) = 0
231584 getpid() = 231584
231584 getpid() = 231584
231584 setpgid(231584, 231584) = 0
231584 ptrace(PTRACE_TRACEME) = -1 EPERM (Operation not permitted)
231584 write(2, "warning: ", 9) = 9
231584 write(2, "Could not trace the inferior pro"..., 37) = 37
231584 write(2, "\n", 1) = 1
231584 write(2, "warning: ", 9) = 9
231584 write(2, "ptrace", 6) = 6
231584 write(2, ": ", 2) = 2
231584 write(2, "Operation not permitted", 23) = 23
231584 write(2, "\n", 1) = 1
# gotta love unbuffered stderr
231584 exit_group(127) = ?
231566 <... vfork resumed>) = 231584 # in the parent
231584 +++ exited with 127 +++
# then the parent is running again
231566 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=231584, si_uid=1000, si_status=127, si_utime=0, si_stime=0} ---
231566 rt_sigreturn({mask=[]}) = 231584
... then I typed "quit" and hit return
There were some earlier clone system calls to create more threads in the main GDB process, but those didn't exit until after the vforked PID that attempted ptrace(PTRACE_TRACEME). They were all just threads since they used clone with CLONE_VM. There was one earlier vfork / execve of /usr/bin/iconv.
Annoyingly, modern Linux has moved to PIDs wider than 16-bit so the numbers get inconveniently large for human minds.
step implementation:
Unlike stepi which would use PTRACE_SINGLESTEP on ISAs that support it (e.g. x86 where the kernel can use the TF trap flag, but interestingly not ARM), step is based on source-level line number <-> address debug info. That's usually pointless for asm, unless you want to step past macro expansions or something.
But for step, GDB will use ptrace(PTRACE_POKETEXT) to write an int3 debug-break opcode over the first byte of an instruction, then ptrace(PTRACE_CONT) to let execution run in the child process until it hits a breakpoint or other signal. (Then put back the original opcode byte when this instruction needs to execute). The place at which it puts that breakpoint is something it finds by looking for the next address of a line-number in the DWARF or STABS debug info (metadata) in the executable. That's why only stepi (aka si) works when you don't have debug info.
Or possibly it would use PTRACE_SINGLESTEP one or two times as an optimization if it saw it was close.
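As a rough illustration of the breakpoint mechanism just described, here is a C sketch of inserting and removing an int3 with ptrace. It is not taken from GDB; the names with_int3, set_breakpoint and clear_breakpoint are made up, and it assumes x86 Linux (little-endian, so the first byte of the instruction is the low byte of the word that PTRACE_PEEKTEXT returns).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Replace the lowest-addressed byte of a little-endian word with int3 (0xCC). */
long with_int3(long word)
{
    return (long)(((unsigned long)word & ~0xFFUL) | 0xCCUL);
}

/* Overwrite the first byte at addr in the stopped tracee with int3.
   Stores the original word in *saved; returns 0 on success, -1 on error. */
int set_breakpoint(pid_t pid, uintptr_t addr, long *saved)
{
    long word;

    errno = 0;                       /* -1 is also a valid peeked value */
    word = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
    if (word == -1 && errno != 0)
        return -1;
    *saved = word;
    if (ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)with_int3(word)) == -1)
        return -1;
    return 0;
}

/* Put the original word back so the real instruction can execute. */
int clear_breakpoint(pid_t pid, uintptr_t addr, long saved)
{
    return ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)saved) == -1 ? -1 : 0;
}
```

After set_breakpoint(), the tracer would resume the child with ptrace(PTRACE_CONT, ...) and wait for it to stop with SIGTRAP; on x86 it then also has to move the instruction pointer back by one byte before re-executing the restored instruction.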
(I normally only use si or ni for debugging asm, not s or n. layout reg is also nice, when GDB doesn't crash. See the bottom of the x86 tag wiki for more GDB asm debugging tips.)
If you meant to ask how the x86 ISA supports debugging, rather than the Linux kernel API which exposes those features via a target-independent API, see the related Q&As:
How is PTRACE_SINGLESTEP implemented?
Why Single Stepping Instruction on X86?
How to tell length of an x86-64 instruction opcode using CPU itself?
Also How does a debugger work? has some Windowsy answers.

ARM Assembly Branch Segmentation Fault

I'm new to assembly and I'm currently getting a segmentation fault when executing the following:
.global _start @ Provide program starting address to linker
_start: mov R0,#0 @ A value of 1 indicates "True"
bl v_bool @ Call subroutine to display "True" or "False"
mov R0,#0 @ Exit Status code of 0 for "normal completion"
mov R7,#1 @ Service command 1 terminates this program
svc 0 @ Issue Linux command to terminate program
@ Subroutine v_bool will display "True" or "False" on the monitor
@ R0: contains 0 implies false; non-zero implies true
@ LR: Contains the return address
@ Registers R0 through R7 will be used by v_bool and not saved
v_bool: cmp R0,#0 @ Set condition flags for True or False
beq setf
bne sett
mov R2,#6 @ Number of characters to be displayed at a time.
mov R0,#1 @ Code for stdout (standard output, monitor)
mov R7,#4 @ Linux service command code to write.
svc 0 @ Call Linux command
bx LR @ Return to the calling program
sett: ldr R1,=T_msg
setf: ldr R1,=F_msg
.data
T_msg: .ascii "True " @ ASCII string to display if true
F_msg: .ascii "False " @ ASCII string to display if false
.end
I've used the debugger to find that the causes of the segmentation fault are the two branches sett and setf, and I understand that this is caused by the program trying to write to an illegal memory location.
However, I do not understand why these branches are not able to write to R1, or what I should do to fix this. Any help is greatly appreciated.
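The issue is not the ldr instructions themselves. The problem is that after executing the instruction at, for instance, setf, execution simply continues into whatever follows it in memory, which is not valid code. You need to make sure that execution after setf and sett goes back to the code of v_bool, for example by branching to a label placed just before mov R2,#6. Note also that, as written, sett falls through into setf, so R1 would end up pointing at F_msg in both cases.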

Assembler messages: Error: junk when running as on Linux

I am currently studying the material here where the author is creating an OS in Windows using mingw. I am trying to follow along and I'm using Ubuntu, yet when I get to a particular stage, namely assembling the object file I receive an error.
The command I am using is:
as -o boot.o boot.s
and here is my error:
as -o boot.o boot.s
boot.s: Assembler messages:
boot.s:22: Error: junk `iResSect' after expression
boot.s:24: Error: invalid character ',' in mnemonic
Makefile:10: recipe for target 'boot.o' failed
make: *** [boot.o] Error 1
Here are some of the files:
boot.s:
.code16
.intel_syntax noprefix
.text
.org 0x0
LOAD_SEGMENT = 0x1000 # Load the 2nd Stage to Here
FAT_SEGMENT = 0x0ee0 # Load FAT to here
.global main
main:
jmp short start
nop
.include "bootsector.s"
.include "macros.s"
start:
mInitSegments
mResetDiskSystem
mWriteString loadmsg
mFindFile filename, LOAD_SEGMENT
mReadFAT FAT_SEGMENT
mReadFile LOAD_SEGMENT, FAT_SEGMENT
mStartSecondStage
#
# Booting has failed because of a disk error
# Inform the user and reboot.
#
bootFailure:
mWriteString diskerror
mReboot
.include "functions.s"
# DATA
filename: .asciz "2NDSTAGEBIN"
rebootmsg: .asciz "Press any key to reboot.\r\n"
diskerror: .asciz "Disk error. "
loadmsg: .asciz "Loading SamOS...\r\n"
root_strt: .byte 0,0 # Holds offset of Root Dir on disk
root_scts: .byte 0,0 # Hold No. Sectors in Root Dir
file_strt: .byte 0,0 # Hold offset of bootloaded on disk
.fill (510-(.-main)), 1, 0
BootMagic: .int 0xAA55
bootsector.s:
bootsector:
iOEM: .ascii "DevOS " # OEM String
iSectSize: .word 0x200 # Bytes per Sector
iClustSize: .byte 1 # Sectors per Cluster
iResSect: .word 1 # No. Reserved Sectors
iFatCnt: .byte 2 # No. FAT Copies
iRootSize: .word 224 # Size of Root Dir.
iTotalSect: .word 2880 # Total no. of sectors (<32mb)
iMedia: .byte 0xF0 # Media Descriptor
iFatSize: .word 9 # Size of each FAT
iTrackSect: .word 9 # Sectors per Track
iHeadCnt: .word 2 # No. Read/Write Heads
iHiddenSect: .int 0 # No. Hidden Sectors
iSect32: .int 0 # No. Sectors if > 32mb
iBootDrive: .byte 0 # Boot Sectors comes from here
iReserved: .byte 0 # No. Reserved Sectors
iBootSign: .byte 0x29 # Extended boot sect. signature
iVolID: .ascii "seri" # Disk Serial
acVolLabel: .ascii "MYVOLUME " # Placeholder
acFSType: .ascii "FAT16 "
Any suggestions as to why this is happening?

Why is mmap done during printfs calls?

Why does printf() do a sys_mmap() and then copy the contents of the string in chunks (of 1024) to the new address space for sys_write()?
Strace of simple static "hello" program is shown below.
> gcc -o hello -static hello.c
> strace ./hello
execve("./hello", ["./hello"], [/* 71 vars */]) = 0
uname({sys="Linux", node="Kumar", ...}) = 0
brk(0) = 0x1ce8000
brk(0x1ce91c0) = 0x1ce91c0
arch_prctl(ARCH_SET_FS, 0x1ce8880) = 0
readlink("/proc/self/exe", "/home/admin/hello", 4096) = 18
brk(0x1d0a1c0) = 0x1d0a1c0
brk(0x1d0b000) = 0x1d0b000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 28), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7feda2130000
write(1, "Hello", 5Hello) = 5
exit_group(0) = ?
+++ exited with 0 +++
Objdump of rodata
> objdump -s --start-address=0x4935a0 ./hello | head -5
./hello: file format elf64-x86-64
Contents of section .rodata:
4935a0 01000200 48656c6c 6f006c69 62632d73 ....Hello.libc-s
If we hook the address of the sys_write() system call at kernel level, we see that the address passed to it is in the mmap-ed address region. Isn't this just a waste of new address space, given that the string already exists in the .rodata section in the first loadable segment of the binary? Has it got something to do with the lack of write permission, etc.? Then why not make the compiler put the string in the .data section (which is writable as well) in the first place?
UPDATE:
Mmap-ed address is indeed for sys_write() which can be verified in an easier way when we make the string bigger (say ~1500 chars). GDB will confirm the data address being printed [Note the second breakpoint]
(gdb) c
Continuing.
Hello World hhhhhhhhhhalhfafeuirafheuhrgiegieguehguergjkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqwwwwwwwwwwwwwwwwwwwwww pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuiiiiiiiiiiiiiiiiiiiiiiiiiiiiiwqiuwqiuwiquwiqhchasnvjnavjanvjdanvjdanvjdanjfanvjaddijuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuquweuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuunnnnnnnnnnnnnnnnnnnnnnnnnnnzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz,,,,,,,,,,,,,,,,,,,,,,
Breakpoint 1, _IO_new_file_write (f=0x6b8300 <_IO_2_1_stdout_>, data=0x7ffff7ffc000, n=706) at fileops.c:1257
1257 {
Have you tried using a debugger?
$ gdb /tmp/hello
...
(gdb) b __mmap
Breakpoint 1 at 0x4152e0
(gdb) r
Starting program: /tmp/hello
Breakpoint 1, 0x00000000004152e0 in mmap64 ()
(gdb) bt
#0 0x00000000004152e0 in mmap64 ()
#1 0x000000000045d73c in _IO_file_doallocate ()
#2 0x0000000000401fec in _IO_doallocbuf ()
#3 0x000000000042ca10 in _IO_new_file_overflow ()
#4 0x000000000042be9d in _IO_new_file_xsputn ()
#5 0x000000000040111d in puts ()
#6 0x00000000004004de in main () at hello.c:4
(gdb) c
Continuing.
Hello, w
[Inferior 1 (process 4294) exited with code 011]
So it allocates memory for the buffered input/output that FILE* uses. Note that calling printf with only a constant string results in a puts call, because GCC is smart enough to make that substitution. And puts(string) is effectively an fputs(string, stdout), where stdout is a FILE*.
Using raw write, however, doesn't incur such behaviour:
#include <unistd.h>
int main(void) {
/* sizeof counts the terminating NUL, so subtract 1 to write only the text */
write(1, "Hello, w\n", sizeof("Hello, w\n") - 1);
return 0;
}
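If you want to keep stdio but avoid the buffer allocation seen in the backtrace, one option is to hand stdout a buffer yourself with setvbuf() before any output happens. This is a minimal sketch, not something from the original answer: the name use_static_stdout_buffer is made up, and the claim that glibc then skips its own allocation is my expectation, which you can check with strace or the same __mmap breakpoint.

```c
#include <stdio.h>

/* Statically allocated buffer for stdout; lives in .bss, so no mmap is needed. */
static char stdout_buf[BUFSIZ];

/* Must be called before the first output on stdout.
   Returns 0 on success, nonzero on failure (as setvbuf does). */
int use_static_stdout_buffer(void)
{
    return setvbuf(stdout, stdout_buf, _IOFBF, sizeof stdout_buf);
}
```

Call it as the first statement in main(), then use puts() or printf() as usual; the output is still buffered, only the storage for the buffer changes.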
