calling printf from assembly language on 64bit and 32bit architecture using nasm - linux

I want to call printf function from assembly language in linux.
i want to know the method for for 64 bit and 32 bit assembly language programs.
1) please tell me for two cases if i want to pass a 32 bit arguement and 64 bit arguement in printf with a string. how should i do it?
2) for x86 32 bit architecture if i want to do the same thing as in point 1.
please tell me the code. and let me know do i need to adjust the stack for both cases and do i just need to pass the arguements in registers?
Thanks alot

There are 2 ways to print a string with assembly language in Linux.
1) Use syscall for x64, or int 0x80 for x86. It's not printf, it's kernel routines. You can find more here (x86) and here (x64).
2) Use printf from glibc. I assume you are familiar with the structure of NASM program, so here is a nice x86 example from acm.mipt.ru:
global main
;Declare used libc functions
extern exit
extern puts
extern scanf
extern printf
section .text
main:
;Arguments are passed in reversed order via stack (for x86)
;For x64 first six arguments are passed in straight order
; via RDI, RSI, RDX, RCX, R8, R9 and other are passed via stack
;The result comes back in EAX/RAX
push dword msg
call puts
;After passing arguments via stack, you have to clear it to
; prevent segfault with add esp, 4 * (number of arguments)
add esp, 4
push dword a
push dword b
push dword msg1
call scanf
add esp, 12
;For x64 this scanf call will look like:
; mov rdi, msg1
; mov rsi, b
; mov rdx, a
; call scanf
mov eax, dword [a]
add eax, dword [b]
push eax
push dword msg2
call printf
add esp, 8
push dword 0
call exit
add esp, 4
ret
section .data
msg : db "An example of interfacing with GLIBC.",0xA,0
msg1 : db "%d%d",0
msg2 : db "%d", 0xA, 0
section .bss
a resd 1
b resd 1
You can assembly it with nasm -f elf32 -o foo.o foo.asm and link with gcc -m32 -o foo foo.o for x86. For x64 just replace elf32 with elf64 and -m32 with -m64. Note than you need gcc-multilib to build x86 programs on x64 system using gcc.

Related

Segmentation fault (core dumped) when I run my assembly code [duplicate]

I've been looking at a tutorial for assembly, and I'm trying to get a hello world program to run. I am using Bash on Ubuntu on Windows.
Here is the assembly:
section .text
global _start ;must be declared for linker (ld)
_start: ;tells linker entry point
mov edx,len ;message length
mov ecx,msg ;message to write
mov ebx,1 ;file descriptor (stdout)
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
msg db 'Hello, world!', 0xa ;string to be printed
len equ $ - msg ;length of the string
I am using these commands to create the executable:
nasm -f elf64 hello.asm -o hello.o
ld -o hello hello.o -m elf_x86_64
And I run it using:
./hello
The program then seems to run without a segmentation fault or error, but it produces no output.
I can't figure out why the code won't produce an output, but I wonder if using Bash on Ubuntu on Windows has anything to do with it? Why doesn't it produce output and how can I fix it?
Related: WSL2 does allow 32-bit user-space programs, WSL1 doesn't. See Does WSL 2 really support 32 bit program? re: making sure you're actually using WSL2. The rest of this answer was written before WLS2 existed.
The issue is with Ubuntu for Windows (Windows Subsystem for Linux version 1). It only supports the 64-bit syscall interface and not the 32-bit x86 int 0x80 system call mechanism.
Besides not being able to use int 0x80 (32-bit compatibility) in 64-bit binaries, Ubuntu on Windows (WSL1) doesn't support running 32-bit executables either. (Same as if you'd built a real Linux kernel without CONFIG_IA32_EMULATION, like some Gentoo users do.)
You need to convert from using int 0x80 to syscall. It's not difficult. A different set of registers are used for a syscall and the system call numbers are different from their 32-bit counterparts. Ryan Chapman's blog has information on the syscall interface, the system calls, and their parameters. Sys_write and Sys_exit are defined this way:
%rax System call %rdi %rsi %rdx %r10 %r8 %r9
----------------------------------------------------------------------------------
0 sys_read unsigned int fd char *buf size_t count
1 sys_write unsigned int fd const char *buf size_t count
60 sys_exit int error_code
Using syscall also clobbers RCX and the R11 registers. They are considered volatile. Don't rely on them being the same value after the syscall.
Your code could be modified to be:
section .text
global _start ;must be declared for linker (ld)
_start: ;tells linker entry point
mov edx,len ;message length
mov rsi,msg ;message to write
mov edi,1 ;file descriptor (stdout)
mov eax,edi ;system call number (sys_write)
syscall ;call kernel
xor edi, edi ;Return value = 0
mov eax,60 ;system call number (sys_exit)
syscall ;call kernel
section .data
msg db 'Hello, world!', 0xa ;string to be printed
len equ $ - msg ;length of the string
Note: in 64-bit code if the destination register of an instruction is 32-bit (like EAX, EBX, EDI, ESI etc) the processor zero extends the result into the upper 32-bits of the 64-bit register. mov edi,1 has the same effect as mov rdi,1.
This answer isn't a primer on writing 64-bit code, only about using the syscall interface. If you are interested in the nuances of writing code that calls the C library, and conforms to the 64-bit System V ABI there are reasonable tutorials to get you started like Ray Toal's NASM tutorial. He discusses stack alignment, the red zone, register usage, and a basic overview of the 64-bit System V calling convention.
As already pointed out in comments by Ross Ridge, don't use 32-bit calling of kernel functions when you compile 64bit.
Either compile for 32bit or "translate" the code into 64 bit syscalls.
Here is what that could look like:
section .text
global _start ;must be declared for linker (ld)
_start: ;tells linker entry point
mov rdx,len ;message length
mov rsi,msg ;message to write
mov rdi,1 ;file descriptor (stdout)
mov rax,1 ;system call number (sys_write)
syscall ;call kernel
mov rax,60 ;system call number (sys_exit)
mov rdi,0 ;add this to output error code 0(to indicate program terminated without errors)
syscall ;call kernel
section .data
msg db 'Hello, world!', 0xa ;string to be printed
len equ $ - msg ;length of the string

Segmentation fault in assembly code using macros [duplicate]

I've been looking at a tutorial for assembly, and I'm trying to get a hello world program to run. I am using Bash on Ubuntu on Windows.
Here is the assembly:
section .text
global _start ;must be declared for linker (ld)
_start: ;tells linker entry point
mov edx,len ;message length
mov ecx,msg ;message to write
mov ebx,1 ;file descriptor (stdout)
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
msg db 'Hello, world!', 0xa ;string to be printed
len equ $ - msg ;length of the string
I am using these commands to create the executable:
nasm -f elf64 hello.asm -o hello.o
ld -o hello hello.o -m elf_x86_64
And I run it using:
./hello
The program then seems to run without a segmentation fault or error, but it produces no output.
I can't figure out why the code won't produce an output, but I wonder if using Bash on Ubuntu on Windows has anything to do with it? Why doesn't it produce output and how can I fix it?
Related: WSL2 does allow 32-bit user-space programs, WSL1 doesn't. See Does WSL 2 really support 32 bit program? re: making sure you're actually using WSL2. The rest of this answer was written before WLS2 existed.
The issue is with Ubuntu for Windows (Windows Subsystem for Linux version 1). It only supports the 64-bit syscall interface and not the 32-bit x86 int 0x80 system call mechanism.
Besides not being able to use int 0x80 (32-bit compatibility) in 64-bit binaries, Ubuntu on Windows (WSL1) doesn't support running 32-bit executables either. (Same as if you'd built a real Linux kernel without CONFIG_IA32_EMULATION, like some Gentoo users do.)
You need to convert from using int 0x80 to syscall. It's not difficult. A different set of registers are used for a syscall and the system call numbers are different from their 32-bit counterparts. Ryan Chapman's blog has information on the syscall interface, the system calls, and their parameters. Sys_write and Sys_exit are defined this way:
%rax System call %rdi %rsi %rdx %r10 %r8 %r9
----------------------------------------------------------------------------------
0 sys_read unsigned int fd char *buf size_t count
1 sys_write unsigned int fd const char *buf size_t count
60 sys_exit int error_code
Using syscall also clobbers RCX and the R11 registers. They are considered volatile. Don't rely on them being the same value after the syscall.
Your code could be modified to be:
section .text
global _start ;must be declared for linker (ld)
_start: ;tells linker entry point
mov edx,len ;message length
mov rsi,msg ;message to write
mov edi,1 ;file descriptor (stdout)
mov eax,edi ;system call number (sys_write)
syscall ;call kernel
xor edi, edi ;Return value = 0
mov eax,60 ;system call number (sys_exit)
syscall ;call kernel
section .data
msg db 'Hello, world!', 0xa ;string to be printed
len equ $ - msg ;length of the string
Note: in 64-bit code if the destination register of an instruction is 32-bit (like EAX, EBX, EDI, ESI etc) the processor zero extends the result into the upper 32-bits of the 64-bit register. mov edi,1 has the same effect as mov rdi,1.
This answer isn't a primer on writing 64-bit code, only about using the syscall interface. If you are interested in the nuances of writing code that calls the C library, and conforms to the 64-bit System V ABI there are reasonable tutorials to get you started like Ray Toal's NASM tutorial. He discusses stack alignment, the red zone, register usage, and a basic overview of the 64-bit System V calling convention.
As already pointed out in comments by Ross Ridge, don't use 32-bit calling of kernel functions when you compile 64bit.
Either compile for 32bit or "translate" the code into 64 bit syscalls.
Here is what that could look like:
section .text
global _start ;must be declared for linker (ld)
_start: ;tells linker entry point
mov rdx,len ;message length
mov rsi,msg ;message to write
mov rdi,1 ;file descriptor (stdout)
mov rax,1 ;system call number (sys_write)
syscall ;call kernel
mov rax,60 ;system call number (sys_exit)
mov rdi,0 ;add this to output error code 0(to indicate program terminated without errors)
syscall ;call kernel
section .data
msg db 'Hello, world!', 0xa ;string to be printed
len equ $ - msg ;length of the string

Linux Assembly x86_64 create a file using command line parameters

I'm trying to teach myself assembly. I've found a good website; however, everything is written for x86 and I use a 64-bit machine.
I know what the problem is, but I don't know how to fix it. If I run the program with strace, then here is the results:
execve("./file", ["./file", "hello"], [/* 94 vars */]) = 0
creat(NULL, 0) = -1 EINVAL (Invalid argument)
write(0, NULL, 0 <unfinished ...>
+++ exited with 234 +++
So, I know that when I call creat, that the file name "hello" is not being passed and as a result I don't have a file descriptor.
Here is the code in question:
section .text
global _start
_start:
pop rbx ; argc
pop rbx ; prog name
pop rbx ; the file name
mov eax,85 ; syscall number for creat()
mov ecx,00644Q ; rw,r,r
int 80h ; call the kernel
I know that I can use the syscall command; however, I want to use interrupt.
Any ideas or suggestions would be helpful. Also, I'm using nasm an assembler.
You attempted to use the 32 bit mechanism. If you have a 32 bit tutorial, you can of course create 32 bit programs and those will work as-is in compatibility mode.
If you want to write 64 bit code however, you will need to use the 64 bit conventions and interfaces. Here, that means the syscall instruction with the appropriate registers:
global _start
_start:
mov eax,85 ; syscall number for creat()
mov rdi,[rsp+16] ; argv[1], the file name
mov esi,00644Q ; rw,r,r
syscall ; call the kernel
xor edi, edi ; exit code 0
mov eax, 60 ; syscall number for exit()
syscall
See also the x86-64 sysv abi on wikipedia or the abi pdf for more details.

How to avoid stdin input that does not fit in buffer be sent to the shell in Linux 64-bit Intel (x86-64) assembly

Edit: Title changed, as #Gunner pointed out that this is not a buffer overflow.
In reading user input from stdin with NR_read in Linux 64-bit Intel assembly, I wonder how can I avoid that the input that does not fit in the input buffer being sent to Linux shell eg. bash? For example in this example program I have defined an input buffer of 255 bytes (the size of the buffer can be whatever >= 1). The rest of an input longer than 255 bytes is sent to bash (if running from bash) and and this is obviously a serious vulnerability. How should input be read in Linux 64-bit assembly to avoid this vulnerability?
Here's my code:
[bits 64]
section .text
global _start
; can be compiled eg. with nasm or yasm.
; nasm:
; nasm -f elf64 read_stdin_64.asm; ld read_stdin_64.o -o read_stdin_64
; yasm:
; yasm -f elf64 -m amd64 read_stdin_64.asm -o read_stdin_64.o; ld read_stdin_64.o -o read_stdin_64
NR_read equ 0
NR_exit equ 60
STDIN equ 1
; input:
; rax number of syscall
; rdi parameter 1
; rsi parameter 2
; rdx parameter 3
; r10 parameter 4
; r8 parameter 5
; r9 parameter 6
;
; output:
; rax syscall's output
#do_syscall:
push rcx
push r11
syscall ; 64-bit syscall, overwrites rcx and r11
pop r11 ; syscall's return value in rax
pop rcx
ret
#read_stdin:
push rdi
push rsi
push rdx
mov rdi,STDIN ; file handle to read. STDIN = 1.
lea rsi,[input_buffer]
mov rdx,input_buffer_length ; length of string
mov rax,NR_read ; number of syscall (0)
call #do_syscall
sub rax,1 ; get the number of writable characters.
pop rdx
pop rsi
pop rdi
ret
_start: ; linker entry point
call #read_stdin
#end_program:
xor rdi,rdi
mov rax,NR_exit ; number of syscall (60)
syscall
section .data
input_buffer times 255 db 0
input_buffer_length equ $-input_buffer
It is not a buffer overflow as others have stated. I wrote a tutorial on reading from the terminal in Linux which also shows how to deal with this issue. It uses 32-bit Int 0x80, but you can easily change it to fit your needs.
http://www.dreamincode.net/forums/topic/286248-nasm-linux-terminal-inputoutput-wint-80h/
The read syscall already has that protection built in. One other thing though: You shouldn't be explicitly using syscall. What if your code is taken to an x86-64 machine (which uses sysenter)? You should be using Linux's VDSO (virtual dynamic shared object), which contains code to do syscalls on all architectures, regardless as to wheather they support syscall, sysenter, or only int.
You could read the input until newline character is found.

assembly subroutines get called twice without even being called from main

I'm trying to define some subroutines that have calls to printf in them.
A very trivial example is as follows:
extern printf
LINUX equ 80H
EXIT equ 60
section .data
intfmt: db "%ld", 10, 0
segment .text
global main
main:
call os_return ; return to operating system
os_return:
mov rax, EXIT ; Linux system call 60 i.e. exit ()
mov rdi, 0 ; Error code 0 i.e. no errors
int LINUX ; Interrupt Linux kernel
test:
push rdi
push rsi
mov rsi, 10
mov rdi, intfmt
xor rax, rax
call printf
pop rdi
pop rsi
ret
Here test just has a call to printf that outputs the number 10 to the screen. I would not expect this to get called as I have no call to it.
However when compiling and running:
nasm -f elf64 test.asm
gcc -m64 -o test test.o
I get the output:
10
10
I'm totally baffled and wondered if someone could explain why this is happening?
int 80H invokes the 32-bit system call interface, which a) uses the 32-bit system call numbers and b) is intended for use by 32-bit code, not 64-bit code. Your code is actually performing a umask system call with random parameters.
For a 64-bit system call, use the syscall instruction instead:
...
os_return:
mov rax, EXIT ; Linux system call 60 i.e. exit ()
mov rdi, 0 ; Error code 0 i.e. no errors
syscall ; Interrupt Linux kernel
...
I would say that your call to exit is failing, so when it returns, it falls through to the test function, that prints the first 10.
Then when you return with ret you go back to the instruction just after the call os_return, that is, well os_return. The call to exit fails again and falls through to the test function again. But this time the ret returns from the main function and the program ends.
About why is the exit call failing, I cannot tell as I don't have a 64-bit system available. But you could disassemble the exit function from libc and see how it is done there. My guess is that the int LINUX interface is 32-bit only, as it exists only for historic compatibility, and 64-bit linux in not so old.

Resources