nasm system calls Linux - linux

I have got a question about linux x86 system calls in assembly.
When I am creating a new assembly program with nasm on linux, I'd like to know which system calls I have to use for doing a specific task (for example reading a file, writing output, or simple exiting...). I know some syscall because I've read them on some examples taken around internet (such as eax=0, ebx=1 int 0x80 exit with return value of 1), but nothing more... How could I know if there are other arguments for exit syscall? Or for another syscall? I'm looking for a docs that explain which syscalls have which arguments to pass in which registers.
I've read the man page about exit function etc. but it didn't explain to me what I'm asking.
Hope I was clear enough,
Thank you!

The x86 wiki (which I just updated again :) has links to the system call ABI (what the numbers are for every call, where to put the params, what instruction to run, and which registers will clobbered on return). This is not documented in the man page because it's architecture-specific. Same for binary constants: they don't have to be the same on every architecture.
grep -r O_APPEND /usr/include for your target architecture to recursively search the .h files.
Even better is to set things up so you can use the symbolic constants in your asm source, for readability and to avoid the risk of errors.
The gcc actually does use the C Preprocessor when processing .S files, but including most C header files will also get you some C prototypes.
Or convert the #defines to NASM macros with sed or something. Maybe feed some #include<> lines to the C preprocessor and have it print out just the macro definitions.
printf '#include <%s>\n' unistd.h sys/stat.h |
gcc -dD -E - |
sed -ne 's/^#define \([A-Za-z_0-9]*\) \(.\)/\1\tequ \2/p'
That turns every non-empty #define into a NASM symbol equ value. The resulting file has many lines of error: expression syntax error when I tried to run NASM on it, but manually selecting some valid lines from that may work.
Some constants are defined in multiple steps, e.g. #define S_IRGRP (S_IRUSR >> 3). This might or might not work when converted to NASM equ symbol definitions.
Also note that in C 0666, is an octal constant. In NASM, you need either 0o666 or 666o; a leading 0 is not special. Otherwise, NASM syntax for hex and decimal constants is compatible with C.

Perhaps you are looking for something like linux/syscalls.h[1], which you have on your system if you've installed the Linux source code via apt-get or whatever your distro uses.
[1] http://lxr.free-electrons.com/source/include/linux/syscalls.h#L326

Related

How would I lookup Linux syscalls by name/number in Go?

I'd like to lookup Linux syscalls for amd64 and i386 by name/number in Go, and was wondering if there's a built-in mapping available somewhere within the Go standard library, or a third-party module.
I can see here that the Go developers have hardcoded Linux syscall numbers into the syscall module:
i386: https://golang.org/src/syscall/zsysnum_linux_386.go
amd64: https://golang.org/src/syscall/zsysnum_linux_amd64.go
It looks like they've generated each of these files using GCC: https://golang.org/src/syscall/mksysnum_linux.pl
Example syscalls (amd64):
// mksysnum_linux.pl /usr/include/asm/unistd_32.h
// Code generated by the command above; DO NOT EDIT.
// +build 386,linux
package syscall
const (
SYS_RESTART_SYSCALL = 0
SYS_EXIT = 1
SYS_FORK = 2
SYS_READ = 3
SYS_WRITE = 4
SYS_OPEN = 5
SYS_CLOSE = 6
...
Would my best bet be to hard-code this mapping within my code, or is there a maintained mapping available somewhere?
I'm not looking for the mapping between syscall names/numbers on a particular Linux system, I'm looking for a (likely) mapping between syscall names/numbers on any (modern) Linux system on amd64/i386.
I understand that syscall numbers may change, but this is intended as a best-effort approach.
The mapping is in the kernel source, for example one architecture's mapping is /usr/include/asm/unistd_32.h
You should read that file side-by-side with the perl script that parses it, (the script is only a page long, and matches a very small number of #define patterns in the header file... some of the patterns will match many times in a row, finding the whole list of syscalls by name and number)
Also refer to this question (cross-site dupe):
Where do you find the syscall table for Linux?

Minimal assembly program ARM

I am learning assembly, but I'm having trouble understanding how a program is executed by a CPU when using gnu/linux on arm. I will elaborate.
Problem:
I want my program to return 5 as it's exit status.
Assembly for this is:
.text
.align 2
.global main
.type main, %function
main:
mov w0, 5 //move 5 to register w0
ret //return
I then assemble it with:
as prog.s -o prog.o
Everything ok up to here. I understand I then have to link my object file to the C library in order to add additional code that will make my program run. I then link with(paths are omitted for clarity):
ld crti.o crtn.o crt1.o libc.so prog.o ld-linux-aarch64.so.1 -o prog
After this, things work as expected:
./prog; echo $?
5
My problem is that I can't figure out what the C standard library is actually doing here. I more or less understand crti/n/1 are adding entry code to my program (eg the .init and .start sections), but no clue what's libc purpose.
I am interested at what would be a minimal assembly implementation of "returning 5 as exit status"
Most resources on the web focus on the instructions and program flow once you are in main. I am really interested at what are all the steps that go on once I execute with ./. I am now going through computer architecture textbooks, but I hope I can get a little help here.
The C language starts at main() but for C to work you in general need at least a minimal bootstrap. For example before main can be called from the C bootstrap
1) stack/stackpointer
2) .data initialized
3) .bss initalized
4) argc/argv prepared
And then there is C library which there are many/countless C libraries and each have their own designs and requirements to be satisfied before main is called. The C library makes system calls into the system so this starts to become system (operating system, Linux, Windows, etc) dependent, depending on the design of the C library that may be a thin shim or heavily integrated or somewhere in between.
Likewise for example assuming that the operating system is taking the "binary" (binary formats supported by the operating system and rules for that format are defined by the operating system and the toolchain must conform likewise the C library (even though you see the same brand name sometimes assume toolchain and C library are separate entities, one designed to work with the other)) from a non volatile media like a hard drive or ssd and copying the relevant parts into memory (some percentage of the popular, supported, binary file formats, are there for debug or file format and not actually code or data that is used for execution).
So this leaves a system level design option of does the binary file format indicate .data, .bss, .text, etc (note that .data, .bss, .text are not standards just convention most people know what that means even if a particular toolchain did not choose to use those names for sections or even the term sections).
If so the operating systems loader that takes the program and loads it into memory can choose to put .data in the right place and zero .bss for you so that the bootstrap does not have to. In a bare-metal situation the bootstrap would normally handle the read/write items because it is not loaded from media by some other software it is often simply in the address space of the processor on a rom of some flavor.
Likewise argv/argc could be handled by the operating systems tool that loads the binary as it had to parse out the location of the binary from the command line assuming the operating system has/uses a command line interface. But it could as easily simply pass the command line to the bootstrap and the bootstrap has to do it, these are system level design choices that have little to do with C but everything to do with what happens before main is called.
The memory space rules are defined by the operating system and between the operating system and the C library which often contains the bootstrap due to its intimate nature but I guess the C library and bootstrap could be separate. So linking plays a role as well, does this operating system support protection is it just read/write memory and you just need to spam it in there or are there separate regions for read/only (.text, .rodata, etc) and read/write (.data, .bss, etc). Someone needs to handle that, linker script and bootstrap often have a very intimate relationship and the linker script solution is specific to a toolchain not assumed to be portable, why would it, so while there are other solutions the common solution is that there is a C library with a bootstrap and linker solution that are heavily tied to the operating system and target processor.
And then you can talk about what happens after main(). I am happy to see you are using ARM not x86 to learn first, although aarch64 is a nightmare for a first one, not the instruction set just the execution levels and all the protections, you can go a long long way with this approach but there are some things and some instructions you cannot touch without going bare metal. (assuming you are using a pi there is a very good bare-metal forum with a lot of good resources).
The gnu tools are such that binutils and gcc are separate but intimately related projects. gcc knows where things are relative to itself so assuming you combined gcc with binutils and glibc (or you just use the toolchain you found), gcc knows relative to where it executed to find these other items and what items to pass when it calls the linker (gcc is to some extent just a shell that calls a preprocessor a compiler the assembler then linker if not instructed not to do these things). But the gnu binutils linker does not. While as distasteful as it feels to use, it is easier just to
gcc test.o -o test
rather than figure out for that machine that day what all you need on the ld command line and what paths and depending on design the order on the command line of the arguments.
Note you can probably get away with this as a minimum
.global main
.type main, %function
main:
mov w0, 5 //move 5 to register w0
ret //return
or see what gcc generates
unsigned int fun ( void )
{
return 5;
}
.arch armv8-a
.file "so.c"
.text
.align 2
.p2align 4,,11
.global fun
.type fun, %function
fun:
mov w0, 5
ret
.size fun, .-fun
.ident "GCC: (GNU) 10.2.0"
I am used to seeing more fluff in there:
.arch armv5t
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global fun
.syntax unified
.arm
.type fun, %function
fun:
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
mov r0, #5
bx lr
.size fun, .-fun
.ident "GCC: (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
.section .note.GNU-stack,"",%progbits
Either way, you can look up each of the assembly language items and decide if you really need them or not, depends in part on if you feel the need to use a debugger or binutils tools to tear apart the binary (do you really need to know the size of fun for example in order to learn assembly language?)
If you wish to control all of the code and not link with a C library you are more than welcome to you need to know the memory space rules for the operating system and create a linker script (the default one may in part be tied to the C library and is no doubt overly complicated and not something you would want to use as a starting point). In this case being two instructions in main you simply need the one address space valid for the binary, however the operating system enters (ideally using the ENTRY(label), which could be main if you want but often is not _start is often found in linker scripts but is not a rule either, you choose. And as pointed out in comments you would need to make the system call to exit the program. System calls are specific to the operating system and possibly version and not specific to a target (ARM), so you would need to use the right one in the right way, very doable, your whole project linker script and assembly language could be maybe a couple dozen lines of code total. We are not here to google those for you so you would be on your own for that.
Part of your problem here is you are searching for compiler solutions when the compiler in general has absolutely nothing to do with any of this. A compiler takes one language turns it into another language. An assembler same deal but one is simple and the other usually machine code, bits. (some compilers output bits not text as well). It is equivalent to looking up the users manual for a table saw to figure out how to build a house. The table saw is just a tool, one of the tools you need, but just a generic tool. The compiler, specific gnu's gcc, is generic it does not even know what main() is. Gnu follows the Unix way so it has a separate binutils and C library, separate developments, and you do not have to combine them if you do not want to, you can use them separately. And then there is the operating system so half your question is buried in operating system details, the other half in a particular C library or other solution to connect main() to the operating system.
Being open source you can go look at the bootstrap for glibc and others and see what they do. Understanding this type of open source project the code is nearly unreadable, much easier to disassemble sometimes, YMMV.
You can search for the Linux system calls for arm aarch64 and find the one for exit, you can likely see that the open source C libraries or bootstrap solutions you find that are buried under what you are using today, will call exit but if not then there is some other call they need to make to return back to the operating system. It is unlikely it is a simple ret with a register that holds a return value, but technically that is how someone could choose to do it for their operating system.
I think you will find for Linux on arm that Linux is going to parse the command line and pass argc/argv in registers, so you can simply use them. And likely going to prep .data and .bss so long as you build the binary correctly (you link it correctly).
Here's a bare minimum example.
Run it with:
gcc -c thisfile.S && ld thisfile.o && ./a.out
Source code:
#include <sys/syscall.h>
.global _start
_start:
movq $SYS_write, %rax
movq $1, %rdi
movq $st, %rsi
movq $(ed - st), %rdx
syscall
movq $SYS_exit, %rax
movq $1, %rdi
syscall
st:
.ascii "\033[01;31mHello, OS World\033[0m\n"
ed:

How to find the address of a not imported libc function when ASLR is on?

I have a 32bit elf program that I have to exploit remotely (for academic purposes).
The final goal is to spawn a shell. I have a stack that I can fill with any data I want and I can abuse one of the printf format strings. The only problem is that system/execv/execvp is not imported. The .got.plt segment is full of not-very-useful functions and I want to replace atoi with system because of how similar their signature is and the flow of the code indicates that that is the right function to replace. For the following attempts, I used IDA remote debug, so bad stack alignment and not proper format string is out of question. I wanted to make sure it is doable and apparently for me it isn't yet.
At first I tried to replace atoi#.got.plt with the unrandomized address of system. Got SIGSEGV.
Alright, it's probably because of ASLR, so let's try something else. I loaded up gdb and looked up system#0xb7deeda0 and atoi#0xb7de1250. Then I calculated the diff, which is 0xDB50. So the next time when I changed the address of atoi to system in the .got.plt segment, I actually just added diff to that value to get the address of system. Got SIGSEGV again.
My logic:
0xb7deeda0 <__libc_system>
0xb7de1250 <atoi>
diff = 0xb7deeda0 - 0xb7de1250
system#.got.plt = atoi#.got.plt + diff
example: 0x08048726 + DB50 = 0x08056276
Can anyone tell me what I did wrong and how can I jump to a "valid system()" with the help of leaking a function address from .got.plt?
Answering to my own question. Measuring the distance between functions in your
l̲o̲c̲a̲l̲ libc does not guarantee that the r̲e̲m̲o̲t̲e̲ libc will have the same alignment.
You have to find the libc version somehow, then you can get the address difference like so:
readelf -s /lib32/libc-2.19.so | grep printf
Possible ways to find the libc version if you know two addresses:
Libc binary collection
libcdb.com
pwnlib
... or you have access to the shell on the remote machine and can peek into the library with readelf yourself

How to seach for specific instructions in an ELF binary

How do i search of a specific ASM instruction in an ELF executable?
eg. I want to check if the sequence mov $0,%eax in my executable. Are there any tools for this? Is there any tool which lets me search for the similar instructions which varies only by the register used?
eg: It should match both mov $0,%eax as well as move $0,%ecx.
I have resorted to two methods for the requirement.
The easiest was objdump. Convert the binary to asm format using objdump -S myexe > myexe.s and use grep for the most basic type of search.
When my search requirements got advanced, for eg. find jmp instructions with pop instructions just before the jmp, i moved on to using ruby regexes instead of grep to keep track of jmp's and backtrack for pops.
But at times i had to scan through really large binaries ( 100 MB ) and the objdump ruby method was slow and getting killed by the kernel ( my script was very basic. bad scripting could be part of the reason. but it was still very slow )
I went through the code of llvm-objdump and tinkered around to keep track of when a particular opcode was occuring, when particular operands were occuring etc. Trust me, the existing code was very easy to understand and make modifications to. This modified objdump was tons faster and got the job done. I made the modofications to DisassembleObject where there is a
for (Index = Start; Index < End; Index += Size)
main loop which goes through each instruction in the executable. Inst is the name of the current instruction which you can do checks on and introduce your own code.

gdb break when program opens specific file

Back story: While running a program under strace I notice that '/dev/urandom' is being open'ed. I would like to know where this call is coming from (it is not part of the program itself, it is part of the system).
So, using gdb, I am trying to break (using catch syscall open) program execution when the open call is issued, so I can see a backtrace. The problem is that open is being called alot, like several hundred times so I can't narrow down the specific call that is opening /dev/urandom. How should I go about narrowing down the specific call? Is there a way to filter by arguments, and if so how do I do it for a syscall?
Any advice would be helpful -- maybe I am going about this all wrong.
GDB is a pretty powerful tool, but has a bit of a learning curve.
Basically, you want to set up a conditional breakpoint.
First use the -i flag to strace or objdump -d to find the address of the open function or more realistically something in the chain of getting there, such as in the plt.
set a breakpoint at that address (if you have debug symbols, you can use those instead, omitting the *, but I'm assuming you don't - though you may well have them for library functions if nothing else.
break * 0x080482c8
Next you need to make it conditional
(Ideally you could compare a string argument to a desired string. I wasn't getting this to work within the first few minutes of trying)
Let's hope we can assume the string is a constant somewhere in the program or one of the libraries it loads. You could look in /proc/pid/maps to get an idea of what is loaded and where, then use grep to verify the string is actually in a file, objdump -s to find it's address, and gdb to verify that you've actually found it in memory by combining the high part of the address from maps with the low part from the file. (EDIT: it's probably easier to use ldd on the executable than look in /proc/pid/maps)
Next you will need to know something about the abi of the platform you are working on, specifically how arguments are passed. I've been working on arm's lately, and that's very nice as the first few arguments just go in registers r0, r1, r2... etc. x86 is a bit less convenient - it seems they go on the stack, ie, *($esp+4), *($esp+8), *($esp+12).
So let's assume we are on an x86, and we want to check that the first argument in esp+4 equals the address we found for the constant we are trying to catch it passing. Only, esp+4 is a pointer to a char pointer. So we need to dereference it for comparison.
cond 1 *(char **)($esp+4)==0x8048514
Then you can type run and hope for the best
If you catch your breakpoint condition, and looking around with info registers and the x command to examine memory seems right, then you can use the return command to percolate back up the call stack until you find something you recognize.
(Adapted from a question edit)
Following Chris's answer, here is the process that eventually got me what I was looking for:
(I am trying to find what functions are calling the open syscall on "/dev/urandom")
use ldd on executable to find loaded libraries
grep through each lib (shell command) looking for 'urandom'
open library file in hex editor and find address of string
find out how parameters are passed in syscalls (for open, file is first parameter. on x86_64 it is passed in rdi -- your mileage may vary
now we can set the conditional breakpoint: break open if $rdi == _addr_
run program and wait for break to hit
run bt to see backtrace
After all this I find that glib's g_random_int() and g_rand_new() use urandom. Gtk+ and ORBit were calling these functions -- if anybody was curious.
Like Andre Puel said:
break open if strcmp($rdi,"/dev/urandom") == 0
Might do the job.

Resources