In Linux assembly, we can write a string to standard output with the write system call. But this system call needs the string length, and a command-line argument doesn't have the same length on every execution.
I know that we can calculate the length of the argument by scanning it for the null byte. However, I am looking for a simpler way to print an argument (or any string of unknown length) in Linux assembly.
So can anyone tell me the simplest way to print a string of unknown length in Linux assembly?
There are no Linux system calls that write an implicit-length string (C-style null-terminated) to a file descriptor. So you have to just work out the length yourself before making a system call.
Linux is portable across many architectures, so I'll express the answer in portable assembly language, aka C:
#include <string.h>   // strlen
#include <unistd.h>   // write

int write_implicit_length_string(const char *str) {
    size_t size = strlen(str);
    return write(1, str, size); // stdout is always fd 1
}
If you want to see the asm, compile it with gcc, although that will just show you a function call to strlen: gcc -O3 doesn't inline code for strlen on x86.
As far as asm implementations of strlen, for x86-64 your best bet is an SSE2 loop that uses pcmpeqb / pmovmskb / test / jnz to find the first zero byte. Obviously every ISA will have its own way of doing it, but the important point is that there's no way to have the kernel do it for you.
There are C standard library functions that print strings to stdio FILE * (e.g. fputs) but not to unix file descriptors (libc just has wrappers for system calls).
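As a minimal sketch of "work out the length yourself, then make the system call", the helper below scans for the terminating null byte by hand and then loops on write(), since write() may accept fewer bytes than requested. The name write_cstr is hypothetical and not part of libc.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write a NUL-terminated string to fd.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_cstr(int fd, const char *str)
{
    size_t len = 0;
    while (str[len] != '\0')      /* scan for the terminating NUL byte */
        len++;

    size_t done = 0;
    while (done < len) {          /* write() may write only part of the data */
        ssize_t n = write(fd, str + done, len - done);
        if (n <= 0)
            return -1;            /* treat an error or a zero-byte write as failure */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Calling write_cstr(1, argv[1]) would print the first command-line argument to stdout; an asm version follows the same two steps (length loop, then the write system call).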
When I research the return values of Linux system calls, I find tables that describe the calls and what I need to put in the different registers to make them work. However, I can't find any documentation that states what the return value of a system call actually means. All I find, in various places, is that it will be in the EAX register.
TutorialsPoint:
The result is usually returned in the EAX register.
Assembly Language Step-By-Step: Programming with Linux book by Jeff Duntemann states many times in his programs:
Look at sys_read's return value in EAX
Copy sys_read return value for safe keeping
None of the websites I have found explain this return value. Is there any Internet source? Or can someone explain these values to me?
See also this excellent LWN article about system calls which assumes C knowledge.
Also: The Definitive Guide to Linux System Calls (on x86), and related: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
C is the language of Unix systems programming, so all the documentation is in terms of C. And then there's documentation for the minor differences between the C interface and the asm on any given platform, usually in the Notes section of man pages.
sys_read means the raw system call (as opposed to the libc wrapper function). The kernel implementation of the read system call is a kernel function called sys_read(). You can't call it with a call instruction, because it's in the kernel, not a library. But people still talk about "calling sys_read" to distinguish it from the libc function call. However, it's ok to say read even when you mean the raw system call (especially when the libc wrapper doesn't do anything special), like I do in this answer.
Also note that syscall.h defines constants like SYS_read with the actual system call number, or asm/unistd.h for the Linux __NR_read names for the same constants. (The value you put in EAX before an int 0x80 or syscall instruction).
Linux system call return values (in EAX/RAX on x86) are either "normal" success, or a -errno code for error. e.g. -EFAULT if you pass an invalid pointer. This behaviour is documented in the syscalls(2) man page.
-1 to -4095 means error, anything else means success. See AOSP non-obvious syscall() implementation for more details on this -4095UL .. -1UL range, which is portable across architectures on Linux, and applies to every system call. (In the future, a different architecture could use a different value for MAX_ERRNO, but the value for existing arches like x86-64 is guaranteed to stay the same as part of Linus's don't-break-userspace policy of keeping kernel ABIs stable.)
For example, glibc's generic syscall(2) wrapper function uses this sequence: cmp rax, -4095 / jae SYSCALL_ERROR_LABEL, which is guaranteed to be future-proof for all Linux system calls.
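The same range check can be sketched in C. This is only an illustration of the unsigned comparison described above; the function names are hypothetical, and MAX_ERRNO is defined locally here to mirror the kernel's constant rather than taken from a userspace header.

```c
#include <stdbool.h>

/* Local definition for illustration; mirrors the kernel's MAX_ERRNO. */
#define MAX_ERRNO 4095UL

/* True if a raw system call return value is in the -4095..-1 error range.
 * One unsigned compare covers the whole range, like cmp / jae in asm. */
bool is_syscall_error(long ret)
{
    return (unsigned long)ret >= -MAX_ERRNO;
}

/* For a value in the error range, recover the positive errno code. */
int syscall_errno(long ret)
{
    return (int)-ret;
}
```

Anything outside that range, including large values that look negative when treated as signed (such as high mmap addresses), counts as success.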
You can use that wrapper function to make any system call, like syscall( __NR_mmap, ... ). (Or use an inline-asm wrapper header like https://github.com/linux-on-ibm-z/linux-syscall-support/blob/master/linux_syscall_support.h that has safe inline-asm for multiple ISAs, avoiding problems like missing "memory" clobbers that some other inline-asm wrappers have.)
Interesting cases include getpriority where the kernel ABI maps the -20..19 return-value range to 1..40, and libc decodes it. More details in a related answer about decoding syscall error return values.
For mmap, if you wanted you could also detect error just by checking that the return value isn't page-aligned (e.g. any non-zero bits in the low 12, for a 4k page size), if that would be more efficient than checking p > -4096ULL.
To find the actual numeric values of constants for a specific platform, you need to find the C header file where they're #defined. See my answer on a question about that for details. e.g. in asm-generic/errno-base.h / asm-generic/errno.h.
The meanings of return values for each sys call are documented in the section 2 man pages, like read(2). (sys_read is the raw system call that the glibc read() function is a very thin wrapper for.) Most man pages have a whole section for the return value. e.g.
RETURN VALUE
On success, the number of bytes read is returned (zero indicates
end of file), and the file position is advanced by this number. It
is not an error if this number is smaller than the number of bytes
requested; this may happen for example because fewer bytes are
actually available right now (maybe because we were close to end-of-
file, or because we are reading from a pipe, or from a terminal), or
because read() was interrupted by a signal. See also NOTES.
On error, -1 is returned, and errno is set appropriately. In this
case, it is left unspecified whether the file position (if any)
changes.
Note that the last paragraph describes the glibc wrapper, not the raw system call: when the raw return value is in the error range (-4095 to -1), the wrapper sets errno to the negated value and returns -1. So if the raw system call returned -EFAULT, you get errno=EFAULT and a return value of -1.
And there's a whole section listing all the possible error codes that read() is allowed to return, and what they mean specifically for read(). (POSIX standardizes most of this behaviour.)
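To make the wrapper behaviour concrete, here is a rough sketch of the decoding step, plus a call that is expected to fail. Both function names are hypothetical; libc_style_result only imitates what a libc wrapper does with a raw return value and is not glibc's actual code.

```c
#include <errno.h>
#include <unistd.h>

/* Imitation of the libc convention: turn a raw -errno return value
 * into "return -1 and set errno"; pass success values through. */
long libc_style_result(long raw)
{
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}

/* Call the real read() wrapper on an invalid descriptor and report
 * the errno it sets (0 if the call unexpectedly succeeds). */
int errno_from_bad_read(void)
{
    char byte;
    errno = 0;
    if (read(-1, &byte, 1) == -1)
        return errno;
    return 0;
}
```

In asm you would see the negative value in EAX/RAX directly; in C you only ever see the -1 and the errno variable.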
I'm trying to understand what the Linux syscall() function expects to get. I'm looking at the man page for syscall and I can't figure out how many parameters it takes and what they represent. In the source code:
extern long int syscall (long int __sysno, ...) __THROW;
Does it mean that it can handle an unlimited number of parameters? If not, what does each parameter represent?
The second arg ... indicates a variadic function, one that accepts a variable number of args; common examples are printf() and co. A variadic function cannot discover the number or types of its args on its own, so for syscall() the correct arg count and types are determined by the system call selected by __sysno, which should be a manifest constant like SYS_exit found in a system header.
Although C itself puts no fixed limit on the number of variadic args, Linux system calls take at most six, and there are performance considerations and arch differences; in short, fewer is often better.
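As a small illustration of how the arg count depends on the system call number, the sketch below calls syscall() once with no extra args and once with three. The wrapper names raw_getpid and raw_write are made up for this example.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* SYS_getpid takes no arguments after the system call number. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* SYS_write takes three: file descriptor, buffer pointer, byte count.
 * On error the syscall() wrapper returns -1 and sets errno. */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, (long)fd, buf, len);
}
```

Nothing checks that the args match the chosen system call, so passing too few or the wrong types compiles fine and misbehaves at run time.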
Note that variadic functions can be quite versatile. As one example: create your own (error_message + exit) variadic routine that combines an error status as the first arg followed by printf args; see man stdarg and services like vdprintf() and vfprintf().
Dual benefits include more concise source and a smaller .text segment.
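A rough sketch of such a routine follows. It is a variation on the idea above rather than the exact routine described: it formats into a caller-supplied buffer with vsnprintf() and returns the status instead of printing and exiting, which keeps it easy to test. The name format_error and the "error N: " prefix are invented for the example.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical variadic helper: status first, then printf-style args.
 * Writes "error <status>: <message>" into out and returns status. */
int format_error(char *out, size_t cap, int status, const char *fmt, ...)
{
    va_list ap;
    int used = snprintf(out, cap, "error %d: ", status);

    if (used < 0 || (size_t)used >= cap)
        return status;            /* no room left for the message itself */

    va_start(ap, fmt);            /* fmt is the last named parameter */
    vsnprintf(out + used, cap - (size_t)used, fmt, ap);
    va_end(ap);
    return status;
}
```

A die-style wrapper could send the same text to stderr with vfprintf() and then pass the status to exit().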
How should I format my input for the return to libc attack in the following code:
void example_function(int x, const char *name)
{
    void (*foo)(int, const char *) = http_serve_none;
    char buf[1024];
    sprintf(buf, name);
    foo(x, buf);
}
Assume the stack is non-executable. I want to do a return-to-libc attack by changing the foo function pointer to system in libc, not by changing the return address of example_function. What I've done so far is use the conventional layout for the input:
padding + address of system ( at foo function's address ) + address of
exit + ptr to string ( string = "/bin/sh" )
However, this is not working. I don't know how to lay out the argument for the call to system in the input string. I searched a lot on the internet, but everywhere I looked, system() was called by overwriting the return address only.
Extra Assumption:
There are no zero bytes in the address of system. The machine is 32-bit and sprintf is working properly, i.e. storing name into the buffer buf[1024].
I solved it finally... I just passed pointer to string "/bin/sh" at address of x and it worked for me
First, make sure your sprintf is actually sprintf and not __sprintf_chk, which is going to try to thwart your overflow. You can tell it's the fortify version because you'll get a SIGABRT rather than something else (like SIGSEGV). You can turn off fortify globally, but you can also just avoid the macro like:
(sprintf)(buf, name);
Your next problem is zeros. If the address of system has any zero bytes in it, then the sprintf is going to stop copying at that point. This is both "well, there are an awful lot of 64-bit addresses with zeros in them," and intentional. It's called ASCII armoring, and on some platforms important libc functions are made sure to have zeros in them to stop exactly this attack. (See Writing a return-to-libc attack, but libc is loaded at 0x00 in memory for a little more on that.)
You may want to explore this attack using a function in your program (i.e. not libc) that you've ensured has no zeros in it. That's a bit easier to do if you build for 32-bit. And of course, you probably want to turn off fortify and stack protection and all the other things that the compiler is doing to stop this :D (at least until you have a basic version working; then you can turn those on one at a time).
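The zero-byte condition is easy to check mechanically. The sketch below, with a hypothetical function name, tests whether a 32-bit address contains a zero byte, which is exactly the case where a string copy such as sprintf would stop early.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if any of the four bytes of a 32-bit address is zero, meaning a
 * C string copy would stop before copying the whole address. */
bool has_zero_byte(uint32_t addr)
{
    for (int i = 0; i < 4; i++) {
        if (((addr >> (8 * i)) & 0xffu) == 0)
            return true;
    }
    return false;
}
```

An address like 0x00123456 fails this check because of its top byte, which is the effect ASCII armoring relies on.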
How can I see the implementation of the execve function (under x86_64 Linux)? Is it in the unistd library?
I want this because I want to know how to run an external program from assembler without calling the execve library function.
I know that there is a syscall named execve, but I don't know how to use it.
How can I put a variable of type char * and one of type char * [] into registers?
The implementation of the execve() function in userspace looks something like:
int execve(const char *filename, char * const argv[], char * const envp[]) {
    return syscall(SYS_execve, filename, argv, envp);
}
All of the actual "work" is done in the kernel. There's nothing particularly interesting happening in libc, besides perhaps some threading cleanup.
Just take a look at the kernel sources (more specifically: arch/YOUR-ARCH/kernel/head*.S) for the system call convention on your architecture (registers and/or stack for the syscall number and the parameters).
On ARM, for example, you would load __NR_execve into r7, load the arguments into r0, r1, r2 and then use swi 0. You might be interested in this explanation of ARM EABI syscalls for more details.
There is no real straightforward implementation of system calls in the source code to glibc - this is generated at build time from various files defining the system call numbers.
The relevant information can be found in sysdep.h if you understand it, except for the actual system call numbers (you want __NR_execve with, IIRC, #include <asm/unistd.h> - I can't recall offhand what it is on x86_64).
The system call number goes in %rax, and the arguments go in %rdi %rsi %rdx. All this information (including stack alignment and something about register usage by the kernel) is commented in sysdep.h.
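The char * and char *[] arguments are just pointers, so each one fits in a single register. As a sketch in C rather than asm, the hypothetical function below builds argv and envp arrays and passes the three pointers to the execve system call through syscall(). It assumes /bin/sh exists and runs the call in a forked child so the caller survives.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "/bin/sh -c script" in a child process using the raw execve
 * system call; return the child's exit code, or -1 on failure. */
int run_shell_exit_code(const char *script)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Both arrays are NULL-terminated lists of string pointers. */
        char *const argv[] = { "sh", "-c", (char *)script, NULL };
        char *const envp[] = { NULL };
        syscall(SYS_execve, "/bin/sh", argv, envp);
        _exit(127);               /* reached only if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In x86-64 asm the same three pointers would go in %rdi, %rsi and %rdx, with the execve number in %rax.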
I need some reference but a good one, possibly with some nice examples. I need it because I am starting to write code in assembly using the NASM assembler. I have this reference:
http://bluemaster.iu.hio.no/edu/dark/lin-asm/syscalls.html
which is quite nice and useful, but it has a lot of limitations because it doesn't explain the fields in the other registers. For example, if I am using the write syscall, I know I should put 1 in the EAX register, and ECX is probably a pointer to the string, but what about EBX and EDX? I would like that to be explained too: that EBX determines the input (0 for stdin, 1 for something else, etc.), that EDX is the length of the string to be entered, and so on. I hope you understand what I want; I couldn't find any such materials, which is why I am writing here.
Thanks in advance.
The standard programming language in Linux is C. Because of that, the best descriptions of the system calls will show them as C functions to be called. Given their description as a C function and a knowledge of how to map them to the actual system call in assembly, you will be able to use any system call you want easily.
First, you need a reference for all the system calls as they would appear to a C programmer. The best one I know of is the Linux man-pages project, in particular the system calls section.
Let's take the write system call as an example, since it is the one in your question. As you can see, the first parameter is a signed integer, which is usually a file descriptor returned by the open syscall. These file descriptors could also have been inherited from your parent process, as usually happens for the first three file descriptors (0=stdin, 1=stdout, 2=stderr). The second parameter is a pointer to a buffer, and the third parameter is the buffer's size (as an unsigned integer). Finally, the function returns a signed integer, which is the number of bytes written, or a negative number for an error.
Now, how to map this to the actual system call? There are many ways to do a system call on 32-bit x86 (which is probably what you are using, based on your register names); be careful that it is completely different on 64-bit x86 (be sure you are assembling in 32-bit mode and linking a 32-bit executable; see this question for an example of how things can go wrong otherwise). The oldest, simplest and slowest of them in the 32-bit x86 is the int $0x80 method.
For the int $0x80 method, you put the system call number in %eax, and the parameters in %ebx, %ecx, %edx, %esi, %edi, and %ebp, in that order. Then you execute int $0x80, and the return value from the system call is in %eax. Note that this return value is different from what the reference says; the reference shows how the C library will return it, but the system call returns -errno on error (for instance -EINVAL). The C library will move this to errno and return -1 in that case. See syscalls(2) and intro(2) for more detail.
So, in the write example, you would put the write system call number in %eax, the first parameter (file descriptor number) in %ebx, the second parameter (pointer to the string) in %ecx, and the third parameter (length of the string) in %edx. The system call will return in %eax either the number of bytes written, or the error number negated (if the return value is between -1 and -4095, it is a negated error number).
Finally, how do you find the system call numbers? They can be found at /usr/include/linux/unistd.h. On my system, this just includes /usr/include/asm/unistd.h, which finally includes /usr/include/asm/unistd_32.h, so the numbers are there (for write, you can see __NR_write is 4). The same goes for the error numbers, which come from /usr/include/linux/errno.h (on my system, after chasing the inclusion chain I find the first ones at /usr/include/asm-generic/errno-base.h and the rest at /usr/include/asm-generic/errno.h). For the system calls which use other constants or structures, their documentation tells which headers you should look at to find the corresponding definitions.
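If chasing the include chain by hand is tedious, you can also let the compiler resolve the constants for the platform you are building for. A minimal sketch, with hypothetical function names, that returns the values the headers define:

```c
#include <errno.h>
#include <sys/syscall.h>

/* The write system call number for the target this is compiled for
 * (SYS_write is defined in terms of __NR_write). */
long write_syscall_number(void)
{
    return SYS_write;
}

/* The numeric value behind EINVAL, from asm-generic/errno-base.h. */
int einval_number(void)
{
    return EINVAL;
}
```

Printing these from a small program built with the same flags as your asm project (for example -m32) shows the numbers to use; the system call numbers differ between 32-bit and 64-bit x86.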
Now, as I said, int $0x80 is the oldest and slowest method. Newer processors have special system call instructions which are faster. To use them, the kernel makes available a virtual dynamic shared object (the vDSO; it is like a shared library, but in memory only) with a function you can call to do a system call using the best method available for your hardware. It also makes available special functions to get the current time without even having to do a system call, and a few other things. Of course, it is a bit harder to use if you are not using a dynamic linker.
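For programs that do link against glibc, one way to locate the vDSO is the auxiliary vector entry AT_SYSINFO_EHDR, which holds the address of the vDSO's ELF header. This sketch only finds the mapping; parsing its symbol table is left out, and the function name is hypothetical.

```c
#include <sys/auxv.h>

/* Address of the vDSO's ELF header, or 0 if the kernel did not
 * provide one in the auxiliary vector. */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}
```

A non-zero result points at an in-memory ELF image, so it starts with the usual ELF magic bytes.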
There is also another older method, the vsyscall, which is similar to the vDSO but uses a single page at a fixed address. This method is deprecated, will result in warnings on the system log if you are using recent kernels, can be disabled on boot on even more recent kernels, and might be removed in the future. Do not use it.
If you download that web page (like it suggests in the second paragraph) and download the kernel sources, you can click the links in the "Source" column, and go directly to the source file that implements the system calls. You can read their C signatures to see what each parameter is used for.
If you're just looking for a quick reference, each of those system calls has a C library interface with the same name minus the sys_. So, for example, you could check out man 2 lseek to get the information about the parameters for sys_lseek:
off_t lseek(int fd, off_t offset, int whence);
where, as you can see, the parameters match the ones from your HTML table:
%ebx           %ecx     %edx
unsigned int   off_t    unsigned int