I'm trying to write byte 0xff to the parallel port at 0x378. It compiles and links without issue, but segfaults at the OUTSB instruction.
section .text
global _start
_err_exit:
mov eax, 1
mov ebx, 1
int 80h
_start:
mov eax, 101 ; ioperm
mov ebx, 0x378 ; Parallel port addr
mov ecx, 2 ; number of bytes to 'unlock'
mov edx, 1 ; enable
int 80h
mov esi, 0xff
mov dx, 0x378
outsb
mov eax, 1 ; exit
mov ebx, 0
int 80h
If I step through it with GDB and check the registers just before the OUTSB instruction, there doesn't seem to be anything listed for the DX register. Or is dx == edx in 32-bit?
(gdb) info registers
eax 0x0 0
ecx 0x2 2
edx 0x378 888
ebx 0x378 888
esp 0xffffd810 0xffffd810
ebp 0x0 0x0
esi 0xff 255
edi 0x0 0
eip 0x8048090 0x8048090 <_start+36>
eflags 0x246 [ PF ZF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x0 0
What am I doing wrong here?
(info on the OUTS instructions: http://siyobik.info/main/reference/instruction/OUTS%2FOUTSB%2FOUTSW%2FOUTSD)
EDIT:
The C version of the program works:
int main(int argc, char *argv[])
{
int addr = 0x378;
int result = ioperm(addr,5,1);
outb(0xff, addr);
}
There are a couple of issues with that code. Firstly, OUTSB is an I/O-sensitive instruction: executed from user mode it raises a General Protection Fault (which Linux reports as a Segmentation fault) unless the process has been granted access to the port, either through the I/O privilege level or through the I/O permission bitmap. A successful ioperm syscall sets up that bitmap for the requested ports, and it needs root privileges. Your GDB output shows eax == 0 after the syscall, so ioperm succeeded and port access is not what faults here.
Secondly, OUTSB writes a byte from the memory location specified by DS:ESI to the I/O port in DX. In this case, you're telling the processor to write data to the port from location 0xff, to which the process surely doesn't have access. You can simplify that by changing the code to use the OUT instruction, since OUTSB is rather meant to be used with the REP prefix. Try this:
mov al, 0xff
mov dx, 0x378
out dx, al
This outputs the byte in al to the I/O port held in dx. The immediate form (out imm8, al) can only encode ports 0 to 0xff, so port 0x378 has to go through dx.
Let me know how that turned out.
I've created a string and turned it into an array. I loop through each index and move the character into the al register so it can be printed to the VGA display. The problem is that it prints the right number of characters, but the characters are gibberish. Can you please help me figure out what is wrong with the code? It would be highly appreciated.
org 0
bits 16
section .text
global _start
_start:
mov si, msg
loop:
inc si
mov ah, 0x0e
mov al, [si]
or al, al
jz end
mov bh, 0x00
int 0x10
jmp loop
end:
jmp .done
.done:
jmp $
msg db 'Hello, world!',0xa
len equ $ - msg
TIMES 510 - ($ - $$) db 0
DW 0xAA55
bootloader code
ORG 0x7c00
BITS 16
boot:
mov ah, 0x02
mov al, 0x01
mov ch, 0x00
mov cl, 0x02
mov dh, 0x00
mov dl, 0x00
mov bx, 0x1000
mov es, bx
int 0x13
jmp 0x1000:0x00
times 510 - ($ - $$) db 0
dw 0xAA55
The bootloader
Before tackling the kernel code, let's look at the bootloader that brings the kernel in memory.
You have written a very minimalistic version of a bootloader, one that omits much of the usual stuff like setting up segment registers, but thanks to its reduced nature that's not really a problem.
What could be a problem is that you wrote mov dl, 0x00, hardcoding a zero to select the first floppy as your bootdisk. No problem if this is indeed the case, but it would be much better to just use whatever value the BIOS preloaded the DL register with. That's the ID for the disk that holds your bootloader and kernel.
What is a problem is that you load the kernel to the segmented address 0x1000:0x1000 and then later jump to the segmented address 0x1000:0x0000 which is 4096 bytes short of the kernel. You got lucky that the kernel code did run in the end, thanks to the memory between these two addresses most probably being filled with zero-bytes that (two by two) translate into the instruction add [bx+si], al. Because you omitted setting up the DS segment register, we don't know what unlucky byte got overwritten so many times. Let's hope it was not an important byte...
mov bx, 0x1000
mov es, bx
xor bx, bx ; <== You forgot to write this instruction!
int 0x13
jmp 0x1000:0x0000
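To make the 4096-byte gap concrete, here is a small C sketch of the real-mode address arithmetic (linear = segment * 16 + offset). The function names are mine and purely illustrative, and the sketch ignores the wrap-around at 1 MiB.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset.
   Wrap-around at 1 MiB is ignored in this sketch. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Distance in bytes between where the sector was loaded (ES:BX)
   and where execution jumps to (CS:IP). */
uint32_t load_jump_gap(uint16_t es, uint16_t bx, uint16_t cs, uint16_t ip)
{
    uint32_t loaded = linear_address(es, bx);
    uint32_t entry = linear_address(cs, ip);
    assert(loaded >= entry); /* holds for the values discussed here */
    return loaded - entry;
}
```

With the original register values, load_jump_gap(0x1000, 0x1000, 0x1000, 0x0000) gives 4096, and it gives 0 once BX is cleared before the int 0x13 call.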
What is a problem is that you ignore the possibility of encountering troubles when loading a sector from the disk. At the very least you should inspect the carry flag that the BIOS.ReadSector function 02h reports and if the flag is set you could abort cleanly. A more sophisticated approach would also retry a limited number of times, say 3 times.
ORG 0x7C00
BITS 16
; IN (dl)
mov dh, 0x00 ; DL is bootdrive
mov cx, 0x0002
mov bx, 0x1000
mov es, bx
xor bx, bx
mov ax, 0x0201 ; BIOS.ReadSector
int 0x13 ; -> AH CF
jc ERR
jmp 0x1000:0x0000
ERR:
cli
hlt
jmp ERR
times 510 - ($ - $$) db 0
dw 0xAA55
The kernel
After the jmp 0x1000:0x0000 instruction has brought you to the first instruction of your kernel, the CS code segment register holds the value 0x1000. None of the other segment registers changed, and since you did not set up any of them in the bootloader, we still don't know what any of them contain. However, in order to retrieve the bytes from the message at msg with the mov al, [si] instruction, we need a correct value for the DS data segment register. In accordance with the ORG 0 directive, the correct value is the one we already have in CS. Just two 1-byte instructions are needed: push cs, then pop ds.
There's more to be said about the kernel code:
The printing loop uses a pre-increment on the pointer in the SI register. Because of this the first character of the string will not get displayed. You could compensate for this via mov si, msg - 1.
The printing loop processes a zero-terminated string. You don't need to prepare that len equate. What you do need is an explicit zero byte that terminates the string. You should not rely on that large number of zero bytes that times produced. In some future version of the code there might be no zero byte at all!
You (think you) have included a newline (0xa) in the string. For the BIOS.Teletype function 0Eh, this is merely a linefeed that moves down on the screen. To obtain a newline, you need to include both carriage return (13) and linefeed (10).
There's no reason for your kernel code to have the bootsector signature bytes at offset 510. Depending on how you get this code to the disk, it might be necessary to pad the code up to (a multiple of) 512, so keep times 512 - ($ - $$) db 0.
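The pre-increment point is easy to check away from the BIOS. The following C sketch models the two loop shapes; the function names and the out buffer are hypothetical stand-ins for SI, AL and the teletype call, and msg is assumed to be a non-empty, zero-terminated string.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the original loop: the pointer is incremented BEFORE the load,
   so msg[0] is never emitted. msg must be non-empty.
   Returns the number of characters written to out. */
size_t emit_pre_increment(const char *msg, char *out)
{
    assert(msg != NULL && out != NULL && msg[0] != '\0');
    const char *si = msg;
    size_t n = 0;
    for (;;) {
        si++;                 /* inc si */
        char al = *si;        /* mov al, [si] */
        if (al == 0) break;   /* or al, al / jz end */
        out[n++] = al;        /* int 0x10 would print AL here */
    }
    out[n] = '\0';
    return n;
}

/* Mirrors the corrected loop: load first, then increment. */
size_t emit_post_increment(const char *msg, char *out)
{
    assert(msg != NULL && out != NULL);
    const char *si = msg;
    size_t n = 0;
    for (;;) {
        char al = *si;        /* mov al, [si] */
        si++;                 /* inc si */
        if (al == 0) break;   /* test al, al / jnz PrintLoop */
        out[n++] = al;
    }
    out[n] = '\0';
    return n;
}
```

For the input "Hello", the first function writes "ello" and the second writes "Hello".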
The kernel:
ORG 0
BITS 16
section .text
global _start
_start:
push cs
pop ds
mov si, msg
mov bx, 0x0007 ; DisplayPage=0, GraphicsColor=7 (White)
jmp BeginLoop
PrintLoop:
mov ah, 0x0E ; BIOS.Teletype
int 0x10
BeginLoop:
mov al, [si]
inc si
test al, al
jnz PrintLoop
cli
hlt
jmp $-2
msg db 'Hello, world!', 13, 10, 0
TIMES 512 - ($ - $$) db 0
Background: I am a beginner trying to understand how to golf assembly, in particular to solve an online challenge.
EDIT: clarification: I want to print the value at the memory address of RDX. So “SUPER SECRET!”
Create some shellcode that can output the value of register RDX in <= 11 bytes. Null bytes are not allowed.
The program is compiled with the c standard library, so I have access to the puts / printf statement. It’s running on x86 amd64.
$rax : 0x0000000000010000 → 0x0000000ac343db31
$rdx : 0x0000555555559480 → "SUPER SECRET!"
gef➤ info address puts
Symbol "puts" is at 0x7ffff7e3c5a0 in a file compiled without debugging.
gef➤ info address printf
Symbol "printf" is at 0x7ffff7e19e10 in a file compiled without debugging.
Here is my attempt (intel syntax)
xor ebx, ebx ; zero the ebx register
inc ebx ; set the ebx register to 1 (STDOUT)
xchg ecx, edx ; set the ECX register to RDX
mov edx, 0xff ; set the length to 255
mov eax, 0x4 ; set the syscall to print
int 0x80 ; interrupt
My attempt is 17 bytes and includes null bytes, which aren't allowed. What other ways can I lower the byte count? Is there a way to call puts / printf while still saving bytes?
FULL DETAILS:
I am not quite sure what is useful information and what isn't.
File details:
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=5810a6deb6546900ba259a5fef69e1415501b0e6, not stripped
Source code:
void main() {
char* flag = get_flag(); // I don't get access to the function details
char* shellcode = (char*) mmap((void*) 0x1337,12, 0, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
mprotect(shellcode, 12, PROT_READ | PROT_WRITE | PROT_EXEC);
fgets(shellcode, 12, stdin);
((void (*)(char*))shellcode)(flag);
}
Disassembly of main:
gef➤ disass main
Dump of assembler code for function main:
0x00005555555551de <+0>: push rbp
0x00005555555551df <+1>: mov rbp,rsp
=> 0x00005555555551e2 <+4>: sub rsp,0x10
0x00005555555551e6 <+8>: mov eax,0x0
0x00005555555551eb <+13>: call 0x555555555185 <get_flag>
0x00005555555551f0 <+18>: mov QWORD PTR [rbp-0x8],rax
0x00005555555551f4 <+22>: mov r9d,0x0
0x00005555555551fa <+28>: mov r8d,0xffffffff
0x0000555555555200 <+34>: mov ecx,0x22
0x0000555555555205 <+39>: mov edx,0x0
0x000055555555520a <+44>: mov esi,0xc
0x000055555555520f <+49>: mov edi,0x1337
0x0000555555555214 <+54>: call 0x555555555030 <mmap@plt>
0x0000555555555219 <+59>: mov QWORD PTR [rbp-0x10],rax
0x000055555555521d <+63>: mov rax,QWORD PTR [rbp-0x10]
0x0000555555555221 <+67>: mov edx,0x7
0x0000555555555226 <+72>: mov esi,0xc
0x000055555555522b <+77>: mov rdi,rax
0x000055555555522e <+80>: call 0x555555555060 <mprotect@plt>
0x0000555555555233 <+85>: mov rdx,QWORD PTR [rip+0x2e26] # 0x555555558060 <stdin@@GLIBC_2.2.5>
0x000055555555523a <+92>: mov rax,QWORD PTR [rbp-0x10]
0x000055555555523e <+96>: mov esi,0xc
0x0000555555555243 <+101>: mov rdi,rax
0x0000555555555246 <+104>: call 0x555555555040 <fgets@plt>
0x000055555555524b <+109>: mov rax,QWORD PTR [rbp-0x10]
0x000055555555524f <+113>: mov rdx,QWORD PTR [rbp-0x8]
0x0000555555555253 <+117>: mov rdi,rdx
0x0000555555555256 <+120>: call rax
0x0000555555555258 <+122>: nop
0x0000555555555259 <+123>: leave
0x000055555555525a <+124>: ret
Register state right before shellcode is executed:
$rax : 0x0000000000010000 → "EXPLOIT\n"
$rbx : 0x0000555555555260 → <__libc_csu_init+0> push r15
$rcx : 0x000055555555a4e8 → 0x0000000000000000
$rdx : 0x0000555555559480 → "SUPER SECRET!"
$rsp : 0x00007fffffffd940 → 0x0000000000010000 → "EXPLOIT\n"
$rbp : 0x00007fffffffd950 → 0x0000000000000000
$rsi : 0x4f4c5058
$rdi : 0x00007ffff7fa34d0 → 0x0000000000000000
$rip : 0x0000555555555253 → <main+117> mov rdi, rdx
$r8 : 0x0000000000010000 → "EXPLOIT\n"
$r9 : 0x7c
$r10 : 0x000055555555448f → "mprotect"
$r11 : 0x246
$r12 : 0x00005555555550a0 → <_start+0> xor ebp, ebp
$r13 : 0x00007fffffffda40 → 0x0000000000000001
$r14 : 0x0
$r15 : 0x0
(This register state is a snapshot at the assembly line below)
●→ 0x555555555253 <main+117> mov rdi, rdx
0x555555555256 <main+120> call rax
Since I already spilled the beans and "spoiled" the answer to the online challenge in comments, I might as well write it up. 2 key tricks:
Create 0x7ffff7e3c5a0 (&puts) in a register with lea reg, [reg + disp32], using the known value of RDI which is within the +-2^31 range of a disp32. (Or use RBP as a starting point, but not RSP: that would need a SIB byte in the addressing mode).
This is a generalization of the code-golf trick of using lea edi, [rax+1] to create small constants from other small constants (especially 0) in 3 bytes, with code that runs less slowly than push imm8 / pop reg.
The disp32 is large enough to not have any zero bytes; you have a couple registers to choose from in case one had been too close.
Copy a 64-bit register in 2 bytes with push reg / pop reg, instead of 3-byte mov rdi, rdx (REX + opcode + modrm). No savings if either push needs a REX prefix (for R8..R15), and actually costs bytes if both are "non-legacy" registers.
See other answers on Tips for golfing in x86/x64 machine code on codegolf.SE for more.
bits 64
lea rsi, [rdi - 0x166f30]
;; add rbp, imm32 ; alternative, but that would mess up a call-preserved register so we might crash on return.
push rdx
pop rdi ; copy RDX to first arg, x86-64 SysV calling convention
jmp rsi ; tailcall puts
This is exactly 11 bytes, and I don't see a way for it to be smaller. add r64, imm32 is also 7 bytes, same as LEA. (Or 6 bytes if the register is RAX, but even the xchg rax, rdi short form would cost 2 bytes to get it there, and the RAX value is still the fgets return value, which is the small mmap buffer address.)
The puts function pointer doesn't fit in 32 bits, so we need a REX prefix on any instruction that puts it into a register. Otherwise we could just mov reg, imm32 (5 bytes) with the absolute address, not deriving it from another register.
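The displacement itself is easy to double-check offline. This is a minimal C sketch, using the RDI and puts addresses from the question's GDB session; the helper names are made up for illustration, and the byte check also rejects 0x0A because fgets stops at a newline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed distance from a known register value to the target address. */
int64_t displacement(uint64_t base, uint64_t target)
{
    return (int64_t)(target - base);
}

/* LEA with a disp32 can only reach targets within the int32 range. */
bool fits_disp32(int64_t d)
{
    return d >= INT32_MIN && d <= INT32_MAX;
}

int32_t to_disp32(int64_t d)
{
    assert(fits_disp32(d));
    return (int32_t)d;
}

/* True if none of the four little-endian bytes is 0x00 or 0x0A. */
bool disp32_bytes_allowed(int32_t d)
{
    uint32_t u = (uint32_t)d;
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(u >> (8 * i));
        if (b == 0x00 || b == 0x0A)
            return false;
    }
    return true;
}
```

displacement(0x7ffff7fa34d0, 0x7ffff7e3c5a0) is -0x166f30, which encodes as the bytes D0 90 E9 FF and so passes the check.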
$ nasm -fbin -o exploit.bin -l /dev/stdout exploit.asm
1 bits 64
2 00000000 488DB7D090E9FF lea rsi, [rdi - 0x166f30]
3 ;; add rbp, imm32 ; we can avoid messing up any call-preserved registers
4 00000007 52 push rdx
5 00000008 5F pop rdi ; copy to first arg
6 00000009 FFE6 jmp rsi ; tailcall
$ ll exploit.bin
-rw-r--r-- 1 peter peter 11 Apr 24 04:09 exploit.bin
$ ./a.out < exploit.bin # would work if the addresses in my build matched yours
My build of your incomplete .c uses different addresses on my machine, but it does reach this code (at address 0x10000, mmap_min_addr which mmap picks after the amusing choice of 0x1337 as a hint address, which isn't even page aligned but doesn't result in EINVAL on current Linux.)
Since we only tailcall puts with correct stack alignment and don't modify any call-preserved registers, this should successfully return to main.
Note that 0 bytes (ASCII NUL, not NULL) would actually work in shellcode for this test program, if not for the requirement that forbids it.
The input is read using fgets (apparently to simulate a gets() overflow).
fgets actually can read a 0 aka '\0'; the only critical character is 0xa aka '\n' newline. See Is it possible to read null characters correctly using fgets or gets_s?
Often buffer overflows exploit a strcpy or something else that stops on a 0 byte, but fgets only stops on EOF or newline. (Or the buffer size, a feature gets is missing, hence its deprecation and removal from even the ISO C standard library! It's literally impossible to use safely unless you control the input data). So yes, it's totally normal to forbid zero bytes.
BTW, your int 0x80 attempt is not viable: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - you can't use the 32-bit ABI to pass 64-bit pointers to write, and the string you want to output is not in the low 32 bits of virtual address space.
Of course, with the 64-bit syscall ABI, you're fine if you can hardcode the length.
push rdx
pop rsi
shr eax, 16 ; fun 3-byte way to turn 0x10000 into 1 (__NR_write in the 64-bit ABI), instead of just push 1 / pop
mov edi, eax ; STDOUT_FD = __NR_write
lea edx, [rax + 13 - 1] ; 3 bytes. RDX = 13 = string length
; or mov dl, 0xff ; 2 bytes leaving garbage in rest of RDX
syscall
But this is 12 bytes, as well as hard-coding the length of the string (which was supposed to be part of the secret?).
mov dl, 0xff could make sure the length was at least 255, and actually much more in this case, if you don't mind getting reams of garbage after the string you want, until write hits an unmapped page and returns early. That would save a byte, making this 11.
(Fun fact, Linux write does not return an error when it's successfully written some bytes; instead it returns how many it did write. If you try again with buf + write_len, you would get a -EFAULT return value for passing a bad pointer to write.)
I recently wrote keyboard input code in Linux assembly (x86_64). I got this code and tried it, but it doesn't work the way I expected: it should wait for a keystroke, but instead it receives input automatically from nowhere. I suspect the termios flags. The code is below:
;Get current settings
Mov EAX, 16 ; SYS_ioctl
Mov EDI, 0 ; STDIN_FILENO
Mov ESI, 0x5401 ; TCGETS
Mov RDX, termios
Int 80h
And dword [c_cflag], 0xFD ; Clear ICANON to disable canonical mode
; Write termios structure back
Mov EAX, 16 ; SYS_ioctl
Mov EDI, 0 ; STDIN_FILENO
Mov ESI, 0x5402 ; TCSETS
Mov RDX, termios
Int 80h
Mov EAX,0 ;sys_read kernel call
Mov EBX,0 ;stdin (standard input)
Mov ECX,Nada ;Offset of the buffer for the bytes to be read
Mov EDX,1 ;Number of bytes to read
Int 80h ;Call Kernel
The termios struct:
SECTION .bss ;declarations for uninitialized variables
Enter: Resb 1 ;Reserve 1 byte for Enter
Nada: Resb 1
termios:
c_iflag Resd 1 ; input mode flags
c_oflag Resd 1 ; output mode flags
c_cflag Resd 1 ; control mode flags
c_lflag Resd 1 ; local mode flags
c_line Resb 1 ; line discipline
c_cc Resb 64 ; control characters
The output:
nasm -f elf64 -g -F stabs key.asm
ld -o KeyPress key.o
./KeyPress
Untuk memulai tekan tombol enter:
Tekan tombol untuk memainkan satu not: (1,2,3,4,5,6,7,8)
//this part is where the error occurs: I have to check whether the user entered a valid value;
if not, it jumps to the error label and prints the message below//
Error note not found please contact the app developer !!
reference : Linux Getch(), My Github Repo
PS: The newest code is already pushed to my repository. I use Ubuntu 20.04 on an Intel i7 (64-bit). Thanks for the help.
... i got this code and i try it eventually the code are doesn't work like what i expected ...
Mov ESI, 0x5401
Mov RDX, termios
Int 80h
This won't work:
Int 80h is the 32-bit system call used in 32-bit programs. The first three arguments are passed in EBX, ECX and EDX, and definitely not in ESI.
And the values of EAX required for Int 80h differ from the method used in 64-bit programs: read() would be EAX=3, not EAX=0.
Int 80h seems to work in 64-bit programs, too; however, passing 64-bit values won't work, so you cannot use Int 80h for system calls that take addresses (in the example: the address of termios) as argument.
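The pointer problem in the last point comes down to truncation: the 32-bit entry point only looks at the low 32 bits of each register. A rough C sketch of that effect follows; the function names and both example addresses are hypothetical, chosen only to show one address that survives and one that does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the 32-bit int 80h ABI sees of a 64-bit register:
   only the low 32 bits. */
uint32_t seen_by_int80(uint64_t reg)
{
    return (uint32_t)reg;
}

/* An address is only passed intact if its upper 32 bits are zero. */
bool survives_int80(uint64_t addr)
{
    bool ok = (seen_by_int80(addr) == addr);
    assert(ok == ((addr >> 32) == 0));
    return ok;
}
```

An address such as 0x00007ffd12345678 (typical of the stack in a 64-bit process) would arrive as 0x12345678, while a low address such as 0x00404000 would be passed intact. So whether int 80h appears to work depends on where the buffer happens to live, which is a good reason not to rely on it in 64-bit code.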
Either you assemble and link your code as 32-bit program, use int 80h, pass the arguments in EBX, ECX and EDX and use the values in EAX required for 32-bit programs (for example: EAX=3 for read()):
mov eax, 54 ; sys_ioctl when using "int 80h"
mov ebx, 0 ; stdin
mov ecx, 0x5402 ; TCSETS
mov edx, termios
int 80h
Or you build a 64-bit program and use the syscall instruction to call system calls (see this question):
mov eax, 0 ; sys_read when using "syscall"
; note that this instruction will actually set RAX to 0
mov edi, 0 ; set RDI to stdin (implicitly sets rdi)
mov rsi, Nada ; Address of the buffer (see below)
; we explicitly have to use "rsi" here!
mov edx, 1 ; number of bytes
syscall
mov ecx, Nada
I don't use "nasm" but another assembler, so the syntax difference is worth spelling out. In "nasm", a symbol without brackets stands for its address, so the instruction above does this:
Write the address of Nada to the ecx register.
That is what you want here. Reading the value stored in the RAM at the address Nada would be written as mov ecx, [Nada].
In "masm" it is the other way round: there the address would be written as mov ecx, offset Nada.
So the corresponding line in my example above, mov rsi, Nada, is already correct for "nasm".
And dword [c_cflag], 0xFD ; Clear ICANON to disable canonical mode
This line contains two errors:
ICANON is located in C_LFLAG, not in C_CFLAG.
And this instruction would be identical to the C/C++ instruction: c_cflag &= ~0xFFFFFF02, but you want to do: c_cflag &= ~2.
To clear bit 1 only, you have two possibilities:
And byte [c_lflag], 0xFD
; OR:
And dword [c_lflag], 0xFFFFFFFD
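The effect of the three masks can be checked with ordinary integer arithmetic. Below is a small C sketch; the function names are mine, and 0x00008A3B is only an example flag value, not something read from a real terminal.

```c
#include <assert.h>
#include <stdint.h>

/* and dword [flags], imm
   The immediate 0xFD is the 32-bit value 0x000000FD. */
uint32_t and_dword(uint32_t flags, uint32_t imm)
{
    return flags & imm;
}

/* and byte [flags], imm
   Only the lowest byte of the little-endian dword changes. */
uint32_t and_low_byte(uint32_t flags, uint8_t imm)
{
    return (flags & 0xFFFFFF00u) | (flags & imm);
}

/* The intended operation: clear bit 1 and nothing else. */
uint32_t clear_bit1(uint32_t flags)
{
    uint32_t result = flags & ~2u;
    assert((result & 2u) == 0);
    return result;
}
```

For the example value 0x00008A3B, and_dword with 0xFD gives 0x00000039 (the upper 24 bits are lost), while and_low_byte with 0xFD, and_dword with 0xFFFFFFFD, and clear_bit1 all give 0x00008A39.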
I want Linux to take just 1 keystroke from the keyboard using sys_read, but sys_read waits until I press Enter. How can I read 1 keystroke? This is my code:
Mov EAX,3
Mov EBX,0
Mov ECX,Nada
Mov EDX,1
Int 80h
Cmp ECX,49
Je Do_C
Jmp Error
I already tried using a BIOS interrupt, but it failed (Segmentation fault). I want to capture the digits 1 to 8 from the keyboard.
Syscalls in 64-bit linux
The tables from man syscall provide a good overview here:
arch/ABI    instruction    syscall #    retval    Notes
──────────────────────────────────────────────────────────────────
i386        int $0x80      eax          eax
x86_64      syscall        rax          rax       See below

arch/ABI    arg1    arg2    arg3    arg4    arg5    arg6    arg7    Notes
──────────────────────────────────────────────────────────────────
i386        ebx     ecx     edx     esi     edi     ebp     -
x86_64      rdi     rsi     rdx     r10     r8      r9      -
I have omitted the lines that are not relevant here. In 32-bit mode, the parameters were transferred in ebx, ecx, etc and the syscall number is in eax. In 64-bit mode it is a little different: All registers are now 64-bit wide and therefore have a different name. The syscall number is still in eax, which now becomes rax. But the parameters are now passed in rdi, rsi, etc. In addition, the instruction syscall is used here instead of int 0x80 to trigger a syscall.
The order of the parameters can also be read in the man pages, here man 2 ioctl and man 2 read:
int ioctl(int fd, unsigned long request, ...);
ssize_t read(int fd, void *buf, size_t count);
So here the value of int fd is in rdi, the second parameter in rsi etc.
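The same argument order can be seen from C through the libc syscall() wrapper, which takes the syscall number followed by the arguments in the order they go into the registers. This is only a sketch for Linux; raw_read is a name I made up, and it assumes the platform defines SYS_read.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* read(fd, buf, count) issued through the generic syscall() wrapper.
   On x86-64 the wrapper places SYS_read in rax, fd in rdi,
   buf in rsi and count in rdx before executing syscall. */
long raw_read(int fd, void *buf, size_t count)
{
    assert(buf != NULL);
    return syscall(SYS_read, fd, buf, count);
}
```

Calling raw_read(0, &c, 1) corresponds to the assembly sequence that loads 0 into edi, the buffer address into rsi and 1 into edx. The return value is the number of bytes read, or -1 with errno set, because the libc wrapper converts the kernel's negative error codes.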
How to get rid of waiting for a newline
Firstly create a termios structure in memory (in .bss section):
termios:
c_iflag resd 1 ; input mode flags
c_oflag resd 1 ; output mode flags
c_cflag resd 1 ; control mode flags
c_lflag resd 1 ; local mode flags
c_line resb 1 ; line discipline
c_cc resb 19 ; control characters
Then get the current terminal settings and disable canonical mode:
; Get current settings
mov eax, 16 ; syscall number: SYS_ioctl
mov edi, 0 ; fd: STDIN_FILENO
mov esi, 0x5401 ; request: TCGETS
mov rdx, termios ; request data
syscall
; Modify flags
and byte [c_lflag], 0FDh ; Clear ICANON to disable canonical mode
; Write termios structure back
mov eax, 16 ; syscall number: SYS_ioctl
mov edi, 0 ; fd: STDIN_FILENO
mov esi, 0x5402 ; request: TCSETS
mov rdx, termios ; request data
syscall
Now you can use sys_read to read in the keystroke:
mov eax, 0 ; syscall number: SYS_read
mov edi, 0 ; int fd: STDIN_FILENO
mov rsi, buf ; void* buf
mov rdx, len ; size_t count
syscall
Afterwards check the return value in rax: It contains the number of characters read.
(Or a -errno code on error, e.g. if you closed stdin by running ./a.out <&- in bash. Use strace to print a decoded trace of the system calls your program makes, so you don't need to actually write error handling in toy experiments.)
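For comparison, here is roughly the same sequence in C using the libc wrappers tcgetattr and tcsetattr instead of raw ioctl calls. Treat it as a sketch under some assumptions: the helper names are hypothetical, setting VMIN and VTIME is an addition that the assembly above does not make, and the libc struct termios is laid out differently from the kernel structure used by the ioctl.

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/* Clears ICANON and asks read() to return as soon as 1 byte is available. */
void make_noncanonical(struct termios *t)
{
    assert(t != NULL);
    t->c_lflag &= ~(tcflag_t)ICANON;
    t->c_cc[VMIN] = 1;   /* assumption: block until at least one byte */
    t->c_cc[VTIME] = 0;  /* assumption: no inter-byte timeout */
}

/* Reads one keystroke from fd. Returns the byte (0..255), or -1 on error,
   for example when fd is not a terminal. Restores the old settings. */
int read_key(int fd)
{
    struct termios saved, raw;
    if (tcgetattr(fd, &saved) != 0)
        return -1;
    raw = saved;
    make_noncanonical(&raw);
    if (tcsetattr(fd, TCSANOW, &raw) != 0)
        return -1;
    unsigned char c;
    ssize_t n = read(fd, &c, 1);
    tcsetattr(fd, TCSANOW, &saved);  /* put the terminal back as it was */
    return n == 1 ? c : -1;
}
```

Restoring the saved settings matters in practice: without it the shell is left in non-canonical mode after the program exits.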
References:
What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
Why does the sys_read system call end when it detects a new line?
How do i read single character input from keyboard using nasm (assembly) under ubuntu?
Using the raw keyboard mode under Linux (external site with example in 32-bit assembly)
I was experimenting and have the following assembly code, which works very well, except that I get a "Segmentation fault (core dumped)" message right before my program ends:
GLOBAL _start
%define ___STDIN 0
%define ___STDOUT 1
%define ___SYSCALL_WRITE 0x04
segment .data
segment .rodata
L1 db "hello World", 10, 0
segment .bss
segment .text
_start:
mov eax, ___SYSCALL_WRITE
mov ebx, ___STDOUT
mov ecx, L1
mov edx, 13
int 0x80
It doesn't matter whether or not I have ret at the end; I still get the message.
What's the problem?
I'm using x86 and nasm.
You can't ret from _start; it isn't a function and there's no return address on the stack. The stack pointer points at argc on process entry.
As n.m. said in the comments, the issue is that you aren't exiting the program, so execution runs off into garbage code and you get a segfault.
What you need is:
;; Linux 32-bit x86
%define ___SYSCALL_EXIT 1
; ... at the end of _start:
mov eax, ___SYSCALL_EXIT
mov ebx, 0
int 0x80
The above is 32-bit code. In 64-bit code you want mov eax, 231 (exit_group) / syscall, with the exit status in EDI. For example:
;; Linux x86-64
xor edi, edi ; or mov edi, eax if you have a ret val in EAX
mov eax, 231 ; __NR_exit_group
syscall