Basic input with x64 assembly code


I am writing a tutorial on basic input and output in assembly. I am using a Linux distribution (Ubuntu) that is 64 bit. For the first part of my tutorial I spoke about basic output and created a simple program like this:
global _start
section .text
_start:
mov rax,1
mov rdi,1
mov rsi,message
mov rdx,13
syscall
mov rax,60
xor rdi,rdi
syscall
section .data
message: db "Hello, World", 10
That works great. The system prints the string and exits cleanly. For the next part of my tutorial, I simply want to read one character in from the keyboard. From my understanding of this web site we change the rdi register to be 0 for a sys_read call.
I first subtract 8 from the current rsp and then load that address into the rsi register. (That is where I want to store the char.) When I compile and run my program it appears to work... but the terminal seems to mimic the input I type in again.
Here is the program:
global _start
section .text
_start:
sub rsp,8 ; allocate space on the stack to read
mov rdi,0 ; set rdi to 0 to indicate a system read
mov rsi,[rsp-8]
mov rdx,1
syscall
mov rax,1
mov rdi,1
mov rsi,message
mov rdx,13
syscall
mov rax,60
xor rdi,rdi
syscall
section .data
message: db "Hello, World", 10
and this is what happens in my terminal...
matthew@matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$ nasm -felf64 rps.asm && ld rps.o && ./a.out
5
Hello, World
matthew@matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$ 5
5: command not found
matthew@matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$
The input 5 is repeated back to the terminal after the program has exited. What is the proper way to read in a single char using NASM and Linux x64?

At the start of your code you have to set the syscall number in RAX to 0 for sys_read (as mentioned briefly in the other answer).
So check a Linux x64 SYS_CALL list for the appropriate parameters and try
_start:
mov rax, 0 ; set SYS_READ as SYS_CALL value
sub rsp, 8 ; allocate 8-byte space on the stack as read buffer
mov rdi, 0 ; set rdi to 0 to indicate a STDIN file descriptor
lea rsi, [rsp] ; set const char *buf to the 8-byte space on stack
mov rdx, 1 ; set size_t count to 1 for one char
syscall

it appears to work... but the terminal seems to mimic the input I type in again.
No, the 5 + newline that bash reads is the one you typed. Your program waited for input but didn't actually read the input, leaving it in the kernel's terminal input buffer for bash to read after your program exited. (And bash does its own echoing of terminal input because it puts the terminal in no-echo mode before reading; the normal mechanism for characters to appear on the command line as you type is for bash to print what it reads.)
How did your program manage to wait for input without reading any? mov rsi, [rsp-8] loads 8 bytes from that address. You should have used lea to set rsi to point to that location instead of loading what was in that buffer. So read fails with -EFAULT instead of reading anything, but interestingly it doesn't check this until after waiting for there to be some terminal input.
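Any input a read doesn't consume stays queued for the next process that reads the same input. Here is a rough model. It assumes a pipe can stand in for the terminal's input buffer, which means no echo and no line editing. Reading only 1 byte of "5\n" leaves the newline behind, which is the situation the 2-byte read later in this answer addresses. A read that consumes nothing, like the failed one here, leaves all of it.

```python
import os

# A pipe stands in for the terminal's input buffer (no echo, no line discipline).
read_end, write_end = os.pipe()
os.write(write_end, b"5\n")           # what the user typed
os.close(write_end)

first = os.read(read_end, 1)          # like read(0, buf, 1): consumes only "5"
leftover = os.read(read_end, 100)     # the next reader (bash, in the real case) gets the rest
os.close(read_end)

print(first, leftover)                # b'5' b'\n'
```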
I used strace ./foo to trace system calls made by your program:
execve("./foo", ["./foo"], 0x7ffe90b8e850 /* 51 vars */) = 0
read(0, 5
NULL, 1) = -1 EFAULT (Bad address)
write(1, "Hello, World\n", 13Hello, World
) = 13
exit(0) = ?
+++ exited with 0 +++
Normal terminal input/output is mixed with the strace output; I could have used -o foo.trace or whatever. The cleaned-up version of the read system call trace (without the 5\n mixed in) is:
read(0, NULL, 1) = -1 EFAULT (Bad address)
So (as expected for _start in a static executable under Linux), the memory below RSP was zeroed. But anything that isn't a pointer to writeable memory would have produced the same result.
zx485's answer is correct but inefficient (large code-size and an extra instruction). You don't need to worry about efficiency right away, but it's one of the main reasons for doing anything with asm and there's interesting stuff to say about this case.
You don't need to modify RSP; you can use the red-zone (memory below RSP) because you don't need to make any function calls. This is what you were trying to do with rsp-8, I think. (Or else you didn't realize that it was only safe because of special circumstances...)
The read system call's signature is
ssize_t read(int fd, void *buf, size_t count);
fd is an int arg, so the kernel only looks at EDI, not the full RDI. You don't need to write the full RDI, just the regular 32-bit EDI. (32-bit operand-size is usually the most efficient choice on x86-64.)
But for zero or positive integers, just setting edi also sets rdi anyway. (Anything you write to edi is zero-extended into the full rdi) And of course zeroing a register is best done with xor same,same; this is probably the best-known x86 peephole optimization trick.
As the OP later commented, reading only 1 byte will leave the newline unread when the input is 5\n, and that would make bash read it and print an extra prompt. We can bump up the size of the read and the space for the buffer to 2 bytes. (There'd be no downside to using lea rsi, [rsp-8] and leaving a gap; I'm using lea rsi, [rsp-2] to pack the buffer right below argc on the stack, or below the return address if this was a function instead of a process entry point. Mostly to show exactly how much space is needed.)
; One read of up to 2 characters
; giving the user room to type a digit + newline
_start:
;mov eax, 0 ; set SYS_READ as SYS_CALL value
xor eax, eax ; rax = __NR_read = 0 from unistd_64.h
lea rsi, [rsp-2] ; rsi = buf = rsp-2
xor edi, edi ; edi = fd = 0 (stdin)
mov edx, 2 ; rdx = count = 2 char
syscall ; sys_read(0, rsp-2, 2)
; total = 16 bytes
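This assembles like so. The listing was made from the earlier 1-byte version, so it shows [rsp-0x1] and mov edx,0x1. The 2-byte version encodes to instructions of the same lengths, just with 0xfe as the disp8 and 2 as the imm32: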
+ yasm -felf64 -Worphan-labels -gdwarf2 foo.asm
+ ld -o foo foo.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080
$ objdump -drwC -Mintel
0000000000400080 <_start>:
400080: 31 c0 xor eax,eax
400082: 48 8d 74 24 ff lea rsi,[rsp-0x1]
400087: 31 ff xor edi,edi
400089: ba 01 00 00 00 mov edx,0x1
40008e: 0f 05 syscall
; next address = ...90
; I left out the rest of the program so you can't actually *run* foo
; but I used a script that assembles + links, and disassembles the result
; The linking step is irrelevant for just looking at the code here.
By comparison, zx485's answer assembles to 31 bytes. Code size is not the most important thing, but when all else is equal, smaller is better for L1i cache density, and sometimes decode efficiency. (And my version has fewer instructions, too.)
0000000000400080 <_start>:
400080: 48 c7 c0 00 00 00 00 mov rax,0x0
400087: 48 83 ec 08 sub rsp,0x8
40008b: 48 c7 c7 00 00 00 00 mov rdi,0x0
400092: 48 8d 34 24 lea rsi,[rsp]
400096: 48 c7 c2 01 00 00 00 mov rdx,0x1
40009d: 0f 05 syscall
; total = 31 bytes
Note how those mov reg,constant instructions use the 7-byte mov r64, sign_extended_imm32 encoding. (NASM optimizes those to 5-byte mov r32, imm32 for a total of 25 bytes, but it can't optimize mov to xor because xor affects flags; you have to do that optimization yourself.)
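The size comparisons are easy to re-check by summing instruction lengths. The first two lists below are copied from the objdump listings above; the golfed one is the 1-byte-read version. The third list is my own hand-encoding of the 5-byte mov r32, imm32 forms (opcode B8+r, imm32) that NASM would substitute. It is an assumption, not captured assembler output.

```python
# Machine-code bytes copied from the objdump listings above.
golfed = ["31 c0", "48 8d 74 24 ff", "31 ff", "ba 01 00 00 00", "0f 05"]
zx485_yasm = ["48 c7 c0 00 00 00 00", "48 83 ec 08", "48 c7 c7 00 00 00 00",
              "48 8d 34 24", "48 c7 c2 01 00 00 00", "0f 05"]

# Hand-encoded (not assembler output): each mov r64, imm32 replaced by mov r32, imm32.
zx485_nasm = ["b8 00 00 00 00", "48 83 ec 08", "bf 00 00 00 00",
              "48 8d 34 24", "ba 01 00 00 00", "0f 05"]

def code_size(listing):
    """Total length in bytes of a list of hex-encoded instructions."""
    return sum(len(bytes.fromhex(insn)) for insn in listing)

print(code_size(golfed), code_size(zx485_yasm), code_size(zx485_nasm))  # 16 31 25
```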
Also, if you are going to modify RSP to reserve space, you only need mov rsi, rsp not lea. Only use lea reg1, [rsp] (with no displacement) if you're padding your code with longer instructions instead of using a NOP for alignment. For source registers other than rsp or rbp, lea won't be longer but it is still slower than mov. (But by all means use lea to copy-and-add. I'm just saying it's pointless when you can replace it with a mov.)
You could save even more space by using lea edx, [rax+2] instead of mov edx, 2 (or [rax+1] for the 1-byte read) at essentially no performance cost, but that's not something compilers normally do. (Although perhaps they should.)

You need to set eax to the system call number for read.


Why is the RDI register missing in this "Hello world" assembly program?

I found this "Hello" (shellcode) assembly program:
SECTION .data
SECTION .text
global main
main:
mov rax, 1
mov rsi, 0x6f6c6c6548 ; "Hello" is stored in reverse order "olleH"
push rsi
mov rsi, rsp
mov rdx, 5
syscall
mov rax, 60
syscall
And I found that mov rdi, 1 is missing. In other "hello world" programs that instruction appears so I would like to understand why this happens.

I was going to say it's an intentional trick or hack to save code bytes, using argc as the file descriptor. (1 if you run it from the shell without extra command line args). main(int argc, char**argv) gets its args in EDI and RSI respectively, in the x86-64 SysV calling convention used on Linux.
But given the other choices, like mov rax, 1 instead of mov eax, edi, it's probably just a bug that got overlooked because the code happened to work.
It would not work in real shellcode for a code-injection attack, where execution would probably reach this code with garbage other than 0, 1, or 2 in EDI. The shellcode test program on the tutorial you linked calls a const char[] of machine code as the only thing in main, which will normally compile to asm that doesn't touch RDI.
This code wouldn't work for code-injection attacks based on strcpy or other C-string overflows either, since the machine code contains 00 bytes as part of mov eax, 1, mov edx, 5, and the end of that character string.
Also, modern linkers don't link .rodata into an executable segment, and -zexecstack only affects the actual stack, not all readable memory. So that shellcode test won't work, although I expect it did when written. See How to get c code to execute hex machine code? for working ways, like using a local array and compiling with -zexecstack.
That tutorial is overall not great, probably something this guy wrote while learning. (But not as bad as I expected based on this bug and the use of Kali; it's at least decently written, just missing some tricks.)
Since you're using NASM, you don't need to manually waste time looking up ASCII codes and getting the byte order correct. Unlike some assemblers, mov rsi, "Hello" / push rsi results in those bytes being in memory in source order.
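Here is a quick Python check of the byte-order claim. NASM's character-constant immediates are little-endian, so the first character lands in the low byte. That is why the hand-written constant 0x6f6c6c6548 looks reversed. The check also shows why a short string is risky in shellcode: NASM pads it with zero bytes. An 8-byte string padded with spaces has none.

```python
# Little-endian: the first character becomes the low byte of the immediate.
hello_imm = int.from_bytes(b"Hello", "little")
print(hex(hello_imm))                        # 0x6f6c6c6548, the constant in the original code

# Storing the 64-bit value (as push does) puts the bytes back in source order,
# but a 5-character string gets padded with three zero bytes.
print(hello_imm.to_bytes(8, "little"))       # b'Hello\x00\x00\x00'

# Padding with spaces instead keeps the whole qword free of NUL bytes.
padded = int.from_bytes(b"Hello\n  ", "little")
print(0 in padded.to_bytes(8, "little"))     # False
```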
You also don't need an empty .data section, especially when making shellcode which is just a self-contained snippet of machine code which can't reference anything outside itself.
Writing a 32-bit register implicitly zero-extends to 64-bit. NASM optimizes mov rax,1 into mov eax,1 for you (as you can see in the objdump -d AT&T disassembly; use objdump -drwC -Mintel for Intel-syntax disassembly similar to NASM.)
The following should work:
global main
main:
mov rax, `Hello\n  ` ; non-zero padding (two spaces) to fill 8 bytes
push rax
mov rsi, rsp
push 1 ; push imm8
pop rax ; __NR_write
mov edi, eax ; STDOUT_FD is also 1
lea edx, [rax-1 + 6] ; EDX = 6; using 3 bytes with no zeros
syscall
mov al, 60 ; assuming write success, RAX = 6, zero outside the low byte
;lea eax, [rdi-1 + 60] ; the safe way that works even with ./hello >&- to return -EBADF
syscall
This is fewer bytes of machine code than the original, and avoids \x00 bytes which strcpy would stop on. I changed the string to end with a newline, using NASM backticks to support C-style escape sequences like \n as 0x0a byte.
Running normally (I linked it into a static executable without CRT, despite it being called main instead of _start. ld foo.o -o foo):
$ strace ./foo > /dev/null
execve("./foo", ["./foo"], 0x7ffecdc70a20 /* 54 vars */) = 0
write(1, "Hello\n", 6) = 6
exit(1) = ?
Running with stdout closed to break the mov al, 60 __NR_exit hack:
$ strace ./foo >&-
execve("./foo", ["./foo"], 0x7ffe3d24a240 /* 54 vars */) = 0
write(1, "Hello\n", 6) = -1 EBADF (Bad file descriptor)
syscall_0xffffffffffffff3c(0x1, 0x7ffd0b37a988, 0x6, 0, 0, 0) = -1 ENOSYS (Function not implemented)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffffffffffffffda} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
To still exit cleanly, use lea eax, [rdi-1 + 60] (3 bytes) instead of mov al, 60 (2 bytes) to set RAX according to the unmodified EDI, instead of depending on the upper bytes of RAX being zero which they aren't after an error return.
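The mov al, 60 hazard comes down to plain 64-bit arithmetic. This sketch models only the register writes, using Python integers masked to 64 bits and EBADF = 9, the Linux errno value. It reproduces the 0xffffffffffffff3c call number from the strace output above.

```python
MASK64 = (1 << 64) - 1
EBADF = 9                      # Linux errno value for "Bad file descriptor"

def mov_al(rax, imm8):
    """Model of `mov al, imm8`: replace the low byte, keep bits 8..63."""
    return (rax & ~0xFF & MASK64) | imm8

def lea_eax_rdi(rdi, disp):
    """Model of `lea eax, [rdi + disp]`: 32-bit result, zero-extended into RAX."""
    return (rdi + disp) & 0xFFFFFFFF

ok = 6                         # write() returned 6 bytes written
err = -EBADF & MASK64          # write() returned -EBADF
print(hex(mov_al(ok, 60)))     # 0x3c: __NR_exit, as intended
print(hex(mov_al(err, 60)))    # 0xffffffffffffff3c: the bogus call number strace showed
print(lea_eax_rdi(1, -1 + 60)) # 60 either way, since EDI still holds 1
```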
See also https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code

Compact shellcode to print a 0-terminated string pointed-to by a register, given puts or printf at known absolute addresses?

Background: I am a beginner trying to understand how to golf assembly, in particular to solve an online challenge.
EDIT: clarification: I want to print the value at the memory address of RDX. So “SUPER SECRET!”
Create some shellcode that can output the value of register RDX in <= 11 bytes. Null bytes are not allowed.
The program is compiled with the c standard library, so I have access to the puts / printf statement. It’s running on x86 amd64.
$rax : 0x0000000000010000 → 0x0000000ac343db31
$rdx : 0x0000555555559480 → "SUPER SECRET!"
gef➤ info address puts
Symbol "puts" is at 0x7ffff7e3c5a0 in a file compiled without debugging.
gef➤ info address printf
Symbol "printf" is at 0x7ffff7e19e10 in a file compiled without debugging.
Here is my attempt (intel syntax)
xor ebx, ebx ; zero the ebx register
inc ebx ; set the ebx register to 1 (STDOUT)
xchg ecx, edx ; set the ECX register to RDX
mov edx, 0xff ; set the length to 255
mov eax, 0x4 ; set the syscall to print
int 0x80 ; interrupt
My attempt is 17 bytes and includes null bytes, which aren't allowed. What other ways can I lower the byte count? Is there a way to call puts / printf while still saving bytes?
FULL DETAILS:
I am not quite sure what is useful information and what isn't.
File details:
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=5810a6deb6546900ba259a5fef69e1415501b0e6, not stripped
Source code:
void main() {
char* flag = get_flag(); // I don't get access to the function details
char* shellcode = (char*) mmap((void*) 0x1337,12, 0, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
mprotect(shellcode, 12, PROT_READ | PROT_WRITE | PROT_EXEC);
fgets(shellcode, 12, stdin);
((void (*)(char*))shellcode)(flag);
}
Disassembly of main:
gef➤ disass main
Dump of assembler code for function main:
0x00005555555551de <+0>: push rbp
0x00005555555551df <+1>: mov rbp,rsp
=> 0x00005555555551e2 <+4>: sub rsp,0x10
0x00005555555551e6 <+8>: mov eax,0x0
0x00005555555551eb <+13>: call 0x555555555185 <get_flag>
0x00005555555551f0 <+18>: mov QWORD PTR [rbp-0x8],rax
0x00005555555551f4 <+22>: mov r9d,0x0
0x00005555555551fa <+28>: mov r8d,0xffffffff
0x0000555555555200 <+34>: mov ecx,0x22
0x0000555555555205 <+39>: mov edx,0x0
0x000055555555520a <+44>: mov esi,0xc
0x000055555555520f <+49>: mov edi,0x1337
0x0000555555555214 <+54>: call 0x555555555030 <mmap@plt>
0x0000555555555219 <+59>: mov QWORD PTR [rbp-0x10],rax
0x000055555555521d <+63>: mov rax,QWORD PTR [rbp-0x10]
0x0000555555555221 <+67>: mov edx,0x7
0x0000555555555226 <+72>: mov esi,0xc
0x000055555555522b <+77>: mov rdi,rax
0x000055555555522e <+80>: call 0x555555555060 <mprotect@plt>
0x0000555555555233 <+85>: mov rdx,QWORD PTR [rip+0x2e26] # 0x555555558060 <stdin@@GLIBC_2.2.5>
0x000055555555523a <+92>: mov rax,QWORD PTR [rbp-0x10]
0x000055555555523e <+96>: mov esi,0xc
0x0000555555555243 <+101>: mov rdi,rax
0x0000555555555246 <+104>: call 0x555555555040 <fgets@plt>
0x000055555555524b <+109>: mov rax,QWORD PTR [rbp-0x10]
0x000055555555524f <+113>: mov rdx,QWORD PTR [rbp-0x8]
0x0000555555555253 <+117>: mov rdi,rdx
0x0000555555555256 <+120>: call rax
0x0000555555555258 <+122>: nop
0x0000555555555259 <+123>: leave
0x000055555555525a <+124>: ret
Register state right before shellcode is executed:
$rax : 0x0000000000010000 → "EXPLOIT\n"
$rbx : 0x0000555555555260 → <__libc_csu_init+0> push r15
$rcx : 0x000055555555a4e8 → 0x0000000000000000
$rdx : 0x0000555555559480 → "SUPER SECRET!"
$rsp : 0x00007fffffffd940 → 0x0000000000010000 → "EXPLOIT\n"
$rbp : 0x00007fffffffd950 → 0x0000000000000000
$rsi : 0x4f4c5058
$rdi : 0x00007ffff7fa34d0 → 0x0000000000000000
$rip : 0x0000555555555253 → <main+117> mov rdi, rdx
$r8 : 0x0000000000010000 → "EXPLOIT\n"
$r9 : 0x7c
$r10 : 0x000055555555448f → "mprotect"
$r11 : 0x246
$r12 : 0x00005555555550a0 → <_start+0> xor ebp, ebp
$r13 : 0x00007fffffffda40 → 0x0000000000000001
$r14 : 0x0
$r15 : 0x0
(This register state is a snapshot at the assembly line below)
●→ 0x555555555253 <main+117> mov rdi, rdx
0x555555555256 <main+120> call rax

Since I already spilled the beans and "spoiled" the answer to the online challenge in comments, I might as well write it up. 2 key tricks:
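Create 0x7ffff7e3c5a0 (&puts) in a register with lea reg, [reg + disp32], using the known value of RDI, which is within the +-2^31 range of a disp32. (Or use RBP as a starting point, but not RSP: that would need a SIB byte in the addressing mode.) Caution: the register dump was taken at main+117, before mov rdi, rdx executed. So by the time call rax reaches the shellcode, RDI holds the flag pointer (same as RDX), not 0x7ffff7fa34d0. Starting from RBP (0x7fffffffd950 in the dump, also within disp32 range of &puts) avoids depending on that stale value.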
This generalizes the code-golf trick of using lea edi, [rax+1] to create small constants from other small constants (especially 0) in 3 bytes, with code that runs less slowly than push imm8 / pop reg.
The disp32 is large enough to not have any zero bytes; you have a couple of registers to choose from in case one had been too close.
Copy a 64-bit register in 2 bytes with push reg / pop reg, instead of 3-byte mov rdi, rdx (REX + opcode + modrm). No savings if either push needs a REX prefix (for R8..R15), and actually costs bytes if both are "non-legacy" registers.
See other answers on Tips for golfing in x86/x64 machine code on codegolf.SE for more.
bits 64
lea rsi, [rdi - 0x166f30]
;; add rbp, imm32 ; alternative, but that would mess up a call-preserved register so we might crash on return.
push rdx
pop rdi ; copy RDX to first arg, x86-64 SysV calling convention
jmp rsi ; tailcall puts
This is exactly 11 bytes, and I don't see a way for it to be smaller. add r64, imm32 is also 7 bytes, same as LEA. (Or 6 bytes if the register is RAX, but even the xchg rax, rdi short form would cost 2 bytes to get it there, and the RAX value is still the fgets return value, which is the small mmap buffer address.)
The puts function pointer doesn't fit in 32 bits, so we need a REX prefix on any instruction that puts it into a register. Otherwise we could just mov reg, imm32 (5 bytes) with the absolute address, not deriving it from another register.
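The displacement in the lea can be derived from the addresses in the question. This sketch takes &puts, RDI and RBP from the gdb output. The dump was taken before mov rdi, rdx ran, so RBP is included as a second starting point. It uses struct's signed 32-bit packing to check that each offset fits in a disp32 and to show its little-endian bytes.

```python
import struct

PUTS = 0x7ffff7e3c5a0      # from `info address puts`
RDI_DUMP = 0x7ffff7fa34d0  # RDI in the register dump (before main+117 executed)
RBP_DUMP = 0x7fffffffd950  # RBP in the same dump

def disp32_bytes(base, target):
    """Encode target - base as a little-endian signed disp32.
    struct.pack raises struct.error if the offset doesn't fit in 32 bits."""
    return struct.pack("<i", target - base)

print(hex(PUTS - RDI_DUMP))                   # -0x166f30
print(disp32_bytes(RDI_DUMP, PUTS).hex())     # d090e9ff, as in the NASM listing
print(disp32_bytes(RBP_DUMP, PUTS).hex())     # 50ece3f7, also free of 00 bytes
print(PUTS > 0x7fffffff)                      # True: too big for a 5-byte mov r32, imm32
```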
$ nasm -fbin -o exploit.bin -l /dev/stdout exploit.asm
1 bits 64
2 00000000 488DB7D090E9FF lea rsi, [rdi - 0x166f30]
3 ;; add rbp, imm32 ; we can avoid messing up any call-preserved registers
4 00000007 52 push rdx
5 00000008 5F pop rdi ; copy to first arg
6 00000009 FFE6 jmp rsi ; tailcall
$ ll exploit.bin
-rw-r--r-- 1 peter peter 11 Apr 24 04:09 exploit.bin
$ ./a.out < exploit.bin # would work if the addresses in my build matched yours
My build of your incomplete .c uses different addresses on my machine, but it does reach this code (at address 0x10000, mmap_min_addr, which mmap picks after the amusing choice of 0x1337 as a hint address; the hint isn't even page-aligned, but that doesn't result in EINVAL on current Linux.)
Since we only tailcall puts with correct stack alignment and don't modify any call-preserved registers, this should successfully return to main.
Note that 0 bytes (ASCII NUL, not NULL) would actually work in shellcode for this test program, if not for the requirement that forbids it.
The input is read using fgets (apparently to simulate a gets() overflow).
fgets actually can read a 0 aka '\0'; the only critical character is 0xa aka '\n' newline. See Is it possible to read null characters correctly using fgets or gets_s?
Often buffer overflows exploit a strcpy or something else that stops on a 0 byte, but fgets only stops on EOF or newline. (Or the buffer size, a feature gets is missing, hence its deprecation and removal from even the ISO C standard library! It's literally impossible to use safely unless you control the input data). So yes, it's totally normal to forbid zero bytes.
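Here is a small checker for the input constraints. The bytes are copied from the NASM listing above. fgets_safe is a hypothetical helper that encodes the rule described here: fgets stops at a newline and stores at most size - 1 bytes before the terminator. It is not a library API.

```python
# Machine code copied from the NASM listing of the 11-byte exploit.
shellcode = bytes.fromhex("488DB7D090E9FF" "52" "5F" "FFE6")

def fgets_safe(code, bufsize):
    """Hypothetical helper: True if one fgets(buf, bufsize, stdin) call would
    read all of `code`, i.e. it has no newline and fits in bufsize - 1 bytes."""
    return b"\n" not in code and len(code) <= bufsize - 1

print(len(shellcode), 0 in shellcode, fgets_safe(shellcode, 12))  # 11 False True
```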
BTW, your int 0x80 attempt is not viable: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - you can't use the 32-bit ABI to pass 64-bit pointers to write, and the string you want to output is not in the low 32 bits of virtual address space.
Of course, with the 64-bit syscall ABI, you're fine if you can hardcode the length.
push rdx
pop rsi
shr eax, 16 ; fun 3-byte way to turn 0x10000 into 1 (__NR_write in 64-bit code), instead of just push 1 / pop rax
mov edi, eax ; STDOUT_FD = __NR_write
lea edx, [rax + 13 - 1] ; 3 bytes. RDX = 13 = string length
; or mov dl, 0xff ; 2 bytes leaving garbage in rest of RDX
syscall
But this is 12 bytes, as well as hard-coding the length of the string (which was supposed to be part of the secret?).
mov dl, 0xff could make sure the length was at least 255, and actually much more in this case, if you don't mind getting reams of garbage after the string you want, until write hits an unmapped page and returns early. That would save a byte, making this 11.
(Fun fact, Linux write does not return an error when it's successfully written some bytes; instead it returns how many it did write. If you try again with buf + write_len, you would get a -EFAULT return value for passing a bad pointer to write.)

Print newline with as little code as possible with NASM

I'm learning a bit of assembly for fun and I am probably too green to know the right terminology and find the answer myself.
I want to print a newline at the end of my program.
Below works fine.
section .data
newline db 10
section .text
_end:
mov rax, 1
mov rdi, 1
mov rsi, newline
mov rdx, 1
syscall
mov rax, 60
mov rdi, 0
syscall
But I'm hoping to achieve the same result without defining the newline in .data. Is it possible to call sys_write directly with the byte you want, or must it always be done with a reference to some predefined data (which I assume is what mov rsi, newline is doing)?
In short, why can't I replace mov rsi, newline by mov rsi, 10?

You always need the data in memory to copy it to a file-descriptor. There is no system-call equivalent of C stdio fputc that takes data by value instead of by pointer.
mov rsi, newline puts a pointer into a register (with a huge mov r64, imm64 instruction). sys_write doesn't special-case size=1 and treat its void *buf arg as a char value if it's not a valid pointer.
There aren't any other system calls that would do the trick. pwrite and writev are both more complicated (taking a file offset as well as a pointer, or taking an array of pointer+length to gather the data in kernel space).
There is a lot you can do to optimize this for code-size, though. See https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code
First, putting the newline character in static storage means you need to generate a static address in a register. Your options here are:
5-byte mov esi, imm32 (only in Linux non-PIE executables, so static addresses are link-time constants and are known to be in the low 2GiB of virtual address space and thus work as 32-bit zero-extended or sign-extended)
7-byte lea rsi, [rel newline]: works everywhere, and is the only good option if you can't use the 5-byte mov-immediate.
10-byte mov rsi, imm64. This works even in PIE executables (e.g. if you link with gcc -nostdlib without -static, on a distro where PIE is the default.) But only via a runtime relocation fixup, and the code-size is terrible. Compilers never use this because it's not faster than LEA.
But like I said, we can avoid static addressing entirely: Use push to put immediate data on the stack. This works even if we need zero-terminated strings, because push imm8 and push imm32 both sign-extend the immediate to 64-bit. Since ASCII uses the low half of the 0..255 range, this is equivalent to zero-extension.
Then we just need to copy RSP to RSI, because push leaves RSP pointing to the data that was pushed. mov rsi, rsp would be 3 bytes because it needs a REX prefix. If you were targeting 32-bit code or the x32 ABI (32-bit pointers in long mode) you could use 2-byte mov esi, esp. But Linux puts the stack pointer at the top of user virtual address space, so on x86-64 that's 0x007ff..., right at the top of the low canonical range. So truncating a pointer to stack memory to 32 bits isn't an option; we'd get -EFAULT.
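This answer relies on two facts, checked here in Python. First, push with an imm8 stores a sign-extended qword, so push 10 leaves "\n" followed by seven zero bytes, a ready-made terminator. Second, a stack address does not survive truncation to 32 bits. The stack address below is the RSI value from one of the strace logs earlier on this page, standing in for any RSP value.

```python
import struct

# push 10 (imm8) sign-extends to a full qword; for ASCII (< 0x80) that equals zero-extension.
pushed = struct.pack("<q", 10)
print(pushed)                  # b'\n\x00\x00\x00\x00\x00\x00\x00'

# A 64-bit stack address loses its high bits if squeezed into a 32-bit register.
rsp = 0x7ffd0b37a988
print(hex(rsp & 0xFFFFFFFF))   # 0xb37a988: a different address, not the stack
```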
But we can copy a 64-bit register with 1-byte push + 1-byte pop. (Assuming neither register needs a REX prefix to access.)
default rel ; We don't use any explicit addressing modes, but no reason to leave this out.
_start:
push 10 ; \n
push rsp
pop rsi ; 2 bytes total vs. 3 for mov rsi,rsp
push 1 ; __NR_write call number
pop rax ; 3 bytes, vs. 5 for mov eax, 1
mov edx, eax ; length = call number by coincidence
mov edi, eax ; fd = length = call number also coincidence
syscall ; write(1, "\n", 1)
mov al, 60 ; assuming write didn't return -errno, replace the low byte and keep the high zeros
;xor edi, edi ; leave rdi = 1 from write
syscall ; _exit(1)
.size: db $ - _start
xor-zeroing is the most well-known x86 peephole optimization: it saves 3 bytes of code size, and is actually more efficient than mov edi, 0. But you only asked for the smallest code to print a newline, without specifying that it had to exit with status = 0. So we can save 2 bytes by leaving that out.
Since we're just making an _exit system call, we don't need to clean up the stack from the 10 we pushed.
BTW, this will crash if the write returns an error. (e.g. redirected to /dev/full, or closed with ./newline >&-, or whatever other condition.) That would leave RAX=-something, so mov al, 60 would give us RAX=0xffff...3c. Then we'd get -ENOSYS from the invalid call number, and fall off the end of _start and decode whatever is next as instructions. (Probably zero bytes which decode with [rax] as an addressing mode. Then we'd fault with a SIGSEGV.)
objdump -d -Mintel disassembly of that code, after building with nasm -felf64 and linking with ld
0000000000401000 <_start>:
401000: 6a 0a push 0xa
401002: 54 push rsp
401003: 5e pop rsi
401004: 6a 01 push 0x1
401006: 58 pop rax
401007: 89 c2 mov edx,eax
401009: 89 c7 mov edi,eax
40100b: 0f 05 syscall
40100d: b0 3c mov al,0x3c
40100f: 0f 05 syscall
0000000000401011 <_start.size>:
401011: 11 .byte 0x11
So the total code-size is 0x11 = 17 bytes. vs. your version with 39 bytes of code + 1 byte of static data. Your first 3 mov instructions alone are 5, 5, and 10 bytes long. (Or 7 bytes long for mov rax,1 if you use YASM which doesn't optimize it to mov eax,1).
Running it:
$ strace ./newline
execve("./newline", ["./newline"], 0x7ffd4e98d3f0 /* 54 vars */) = 0
write(1, "\n", 1
) = 1
exit(1) = ?
+++ exited with 1 +++
If this was part of a larger program:
If you already have a pointer to some nearby static data in a register, you could do something like a 4-byte lea rsi, [rdx + newline-foo] (REX.W + opcode + modrm + disp8), assuming the newline-foo offset fits in a sign-extended disp8 and that RDX holds the address of foo.
Then you can have newline: db 10 in static storage after all. (Put it in .rodata or .data, depending on which section you already had a pointer to.)

It expects the address of the string in the rsi register, not a character or string.
mov rsi, newline loads the address of newline into rsi.

Segmentation fault movsb nasm in 64 bits linux

I'm new in asm, and trying to use some opcodes for getting my hands on it.
I'm working on Linux, 64 bits, and I always get a segmentation fault when using movsb. I compile with nasm:
nasm -f elf64 test.asm
Here is the code
DEFAULT ABS
segment data
data:
texte: db 'Hello, World !!', 10, 13
len: equ $-texte
texteBis: db 'Hello, World !.', 10, 13
segment code
global main
main:
;The problem is here
mov rsi, texteBis
mov rdi, texte
mov cx, len
rep movsb
mov dx, len
mov rcx, texte
mov bx, 1
mov ax, 4
int 0x80
mov bx,0 ; exit code, 0=normal
mov ax,1 ; exit command to kernel
int 0x80 ; interrupt 80 hex, call kernel
Other question, with string (or other large db instance), should I use
mov rsi, texte
or
mov rsi, [texte]
I didn't understand which one gives the value and which one the address.

Do you also link?
ld -e main test.o -o test
Anyway, texte (the rep movsb destination that rdi points to) is static data in the data segment. That page is read-only and protected against writing/execution.
You should allocate a buffer (either on the stack or on the heap if you are allowed to use a runtime library).

Your problem is that you are writing to write-protected memory, i.e. the DATA section. Once your program gets loaded into the memory, the DATA section is actually on a read-only page. You have to use stack memory (or dynamically allocated memory) and use that as the destination of your string copy.
Example:
sub rsp, len ; move stack pointer down 'len' bytes
mov rsi, texteBis
mov rdi, rsp ; use address of stack pointer as dest.
xor rcx,rcx ; rcx = 0 (rep movsb uses all of RCX as the count)
mov cx, len
rep movsb
That should fix your problem. As in C, it is important to allocate enough space or you will overwrite data on the stack.
Assigning values to registers
Another thing that I noticed is that you often write to sub-parts of registers, e.g.
mov dx, len
This is dangerous since the other parts of the register are not overwritten: only the lowest 16 bits are written. Say rdx, a 64-bit register, was set to 0xffffffffffffffff. Then after your move rdx would be 0xffffffffffff0011. The code you're calling probably reads the whole register and would therefore see a length of 0xffffffffffff0011 bytes. Not what you want. Solution:
xor rdx,rdx
mov dx, len
or
mov rdx, len
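Here is a minimal model of the partial-register behaviour described above, treating Python integers as 64-bit registers. A 16-bit write merges into the old value. A 32-bit write such as mov edx, imm32 zero-extends and clears the upper half. mov rdx, len also sets the whole register.

```python
MASK64 = (1 << 64) - 1

def write16(reg, value):
    """Model of `mov dx, imm16`: only bits 0..15 change, bits 16..63 are kept."""
    return (reg & ~0xFFFF & MASK64) | (value & 0xFFFF)

def write32(reg, value):
    """Model of `mov edx, imm32`: the result is zero-extended into all 64 bits,
    so the old value of reg doesn't matter."""
    return value & 0xFFFFFFFF

rdx = 0xFFFFFFFFFFFFFFFF
length = 0x11                        # len = 17 bytes in the question
print(hex(write16(rdx, length)))     # 0xffffffffffff0011: stale upper bits survive
print(hex(write16(0, length)))       # 0x11 after xor rdx, rdx
print(hex(write32(rdx, length)))     # 0x11 with a 32-bit (or 64-bit) mov
```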
Tools that might help you later
Note, gdb will help you find where your error is happening and will also give you additional information (such as register values and stack values). Excerpt:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004005bb in main ()
(gdb) disassemble
Dump of assembler code for function main:
0x00000000004005a6: sub $0x13,%rsp
0x00000000004005aa: mov -0x1c(%rip),%rsi # 0x400595
0x00000000004005b1: mov %rsp,%rdi
0x00000000004005b4: xor %cx,%cx
0x00000000004005b7: mov $0x11,%cx
=> 0x00000000004005bb: rep movsb %ds:(%rsi),%es:(%rdi)
0x00000000004005bd: mov $0x11,%dx
0x00000000004005c1: movabs $0x400584,%rcx
0x00000000004005cb: mov $0x1,%bx
0x00000000004005cf: mov $0x4,%ax
0x00000000004005d3: int $0x80
0x00000000004005d5: mov $0x0,%bx
0x00000000004005d9: mov $0x1,%ax
0x00000000004005dd: int $0x80
End of assembler dump.
(gdb) info registers rsi
rsi 0x57202c6f6c6c6548 6278066737626506568
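The gdb session also answers the texte vs. [texte] question. The faulting code did a RIP-relative load (mov -0x1c(%rip),%rsi, from 0x400595, which is texte + len, i.e. texteBis). So RSI holds the string's first 8 bytes, not its address. Decoding the register in Python shows this. A check of x86-64's canonical-address rule (bits 47..63 must all be equal) shows why rep movsb then faulted.

```python
# RSI as printed by `info registers rsi` in the gdb session above.
rsi = 0x57202c6f6c6c6548
print(rsi.to_bytes(8, "little"))     # b'Hello, W': string contents, not a pointer

def is_canonical(addr):
    """x86-64 with 48-bit virtual addresses: bits 47..63 must be all 0 or all 1."""
    top = addr >> 47
    return top == 0 or top == (1 << 17) - 1

print(is_canonical(rsi))             # False, so using it as a pointer faults
```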
NASM can emit DWARF debug info (nasm -f elf64 -g -F dwarf), but it is often handy to stop at a specific point anyway; you can use the int3 instruction to raise a SIGTRAP at a certain point in the code:
mov eax, 10
int3 ; debugger will catch signal here
Hope that helps getting you started in assembly.

You don't need to use dynamic memory. Your data segment (or section) is read-only because it is not a standard section and you are not defining its attributes, so NASM assigns it the default attributes, which make it a read-only data section.
Using objdump -h with your code outputs the following:
0 data 00000022 0000000000000000 0000000000000000 00000200 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 code 0000003c 0000000000000000 0000000000000000 00000230 2**0
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
If you change the names of your segments to .data and .text, the program runs perfectly and objdump outputs:
0 .data 00000022 0000000000000000 0000000000000000 00000200 2**2
CONTENTS, ALLOC, LOAD, DATA
1 .text 0000003c 0000000000000000 0000000000000000 00000230 2**4
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
Which are the correct attributes for what you intend with your sections.
To get more info on what attributes means I recommend this page:
https://www.tortall.net/projects/yasm/manual/html/objfmt-elf-section.html

x86 assembly, little endianness not being followed(or is it?) (Linux)

I am new to assembly language programming and I wrote a small program to print the integer using sys_write system call. Here's my code :
section .data
N: dw 216
chr: dw 0,0,0,0x0a
section .bss
section .text
global _start
_start:
xor ax, ax
mov ax, word [N]
mov cx, 10
mov ebx,4
shift_while: div cx
add dx, 0x0030
mov word [chr+ebx],dx
sub ebx, 2
xor dx, dx
cmp ax, 0
jne shift_while
call printchar
exit: mov eax, 1
mov ebx, 0
int 80h
printchar: pushad
mov eax, 4
mov ebx, 1
mov ecx, chr
mov edx, 8
int 80h
popad
ret
I have hard coded 216, the number to be printed, and I am getting the correct output. However, what I am bemused by is the "mov word [chr+ebx],dx" instruction. dx contains 0x0036 in the first iteration (216 mod 10 = 6), so at the address [chr+ebx] this value should be stored as
36 00 (hex). But when I examined chr memory using gdb, it showed:
(gdb) x /5hx 0x80490d2
0x80490d2 <chr>: 0x0032 0x0031 0x0036 0x000a
What I expected was 0x3200 0x3100 0x3600 0x0a00, and I thought I'd have to do further memory manipulation to get the right result.
Am I going wrong somewhere with this? Are there things I can't seem to see? I'd really appreciate a little help here. This is my first post on Stack Overflow.
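Tracing the conversion loop by hand shows that the digits are stored right to left, starting with '6' at chr+4. The Python sketch below models the loop and assumes a number below 1000 so the stores stay inside chr. The last word keeps its 0x0a throughout.

```python
def store_digits(n):
    """Mimic the asm loop for 0 <= n < 1000: divide by 10 and store '0' + remainder
    as a 16-bit word, starting at chr+4 and moving down 2 bytes each iteration."""
    if not 0 <= n < 1000:
        raise ValueError("chr only has room for 3 digits")
    buf = [0, 0, 0, 0x0A]        # chr: dw 0,0,0,0x0a as a list of 16-bit words
    index = 2                    # ebx = 4 bytes = word index 2
    stores = []
    while True:
        n, rem = divmod(n, 10)   # div cx: quotient -> ax, remainder -> dx
        buf[index] = 0x30 + rem  # add dx, 0x30 / mov word [chr+ebx], dx
        stores.append(hex(buf[index]))
        index -= 1               # sub ebx, 2
        if n == 0:               # cmp ax, 0 / jne shift_while
            break
    return buf, stores

buf, stores = store_digits(216)
print(stores)                    # ['0x36', '0x31', '0x32']
print([hex(w) for w in buf])     # ['0x32', '0x31', '0x36', '0xa']
```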

It's just a representation thing - what you have in memory from a byte-wise perspective is
32 00 31 00 36 00 0a 00
but when you view this as 16 bit values it's
0032 0031 0036 000a
Similarly, if you viewed it as 32 bit values it would be:
00310032 000a0036
Such is the weirdness of little endianness. ;-)
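The same point can be made in Python. Pack the four words little-endian, then reinterpret the identical bytes at different widths, the way gdb's b, h and w formats do.

```python
import struct

# The four 16-bit words the program stored: '2', '1', '6', newline.
mem = struct.pack("<4H", 0x0032, 0x0031, 0x0036, 0x000A)
print(mem.hex(" "))                                  # 32 00 31 00 36 00 0a 00
print([hex(w) for w in struct.unpack("<4H", mem)])   # ['0x32', '0x31', '0x36', '0xa']
print([hex(d) for d in struct.unpack("<2I", mem)])   # ['0x310032', '0xa0036']
```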

gdb is helping you out here.
You asked for the h (halfword) format, on a little-endian platform, so it is decoding the memory as 16-bit little-endian values for you.
If you use the b format instead, you'll see something more like you expected.
