error: invalid combination of opcode and operands - nasm

I am new to NASM. I am getting the error:
invalid combination of opcode and operands
on the first line below
mov si,bl ;si contains address of number string
mov cx,7 ;once for each line
jmp print_num ;print the number
loop line_loop ;decrement cx, repeat if cx<>0
int 20h

si is a 16-bit register while bl is an 8-bit register. A mov instruction requires both operands to be the same size.
As a solution to your problem, use bx instead of bl:
mov si,bx
The 8086 uses 16-bit addresses, so a pointer has to come from a 16-bit register such as bx; an 8-bit register cannot hold one.
By the way, you can use esi and ebx when programming 32 bit applications.

Related

Compact shellcode to print a 0-terminated string pointed-to by a register, given puts or printf at known absolute addresses?

Background: I am a beginner trying to understand how to golf assembly, in particular to solve an online challenge.
EDIT: clarification: I want to print the string at the memory address held in RDX, i.e. “SUPER SECRET!”
Create some shellcode that can output the value of register RDX in <= 11 bytes. Null bytes are not allowed.
The program is compiled with the c standard library, so I have access to the puts / printf statement. It’s running on x86 amd64.
$rax : 0x0000000000010000 → 0x0000000ac343db31
$rdx : 0x0000555555559480 → "SUPER SECRET!"
gef➤ info address puts
Symbol "puts" is at 0x7ffff7e3c5a0 in a file compiled without debugging.
gef➤ info address printf
Symbol "printf" is at 0x7ffff7e19e10 in a file compiled without debugging.
Here is my attempt (intel syntax)
xor ebx, ebx ; zero the ebx register
inc ebx ; set the ebx register to 1 (STDOUT)
xchg ecx, edx ; set the ECX register to RDX
mov edx, 0xff ; set the length to 255
mov eax, 0x4 ; set the syscall to print
int 0x80 ; interrupt
hexdump of my code
My attempt is 17 bytes and includes null bytes, which aren't allowed. What other ways can I lower the byte count? Is there a way to call puts / printf while still saving bytes?
FULL DETAILS:
I am not quite sure what is useful information and what isn't.
File details:
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=5810a6deb6546900ba259a5fef69e1415501b0e6, not stripped
Source code:
void main() {
char* flag = get_flag(); // I don't get access to the function details
char* shellcode = (char*) mmap((void*) 0x1337,12, 0, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
mprotect(shellcode, 12, PROT_READ | PROT_WRITE | PROT_EXEC);
fgets(shellcode, 12, stdin);
((void (*)(char*))shellcode)(flag);
}
Disassembly of main:
gef➤ disass main
Dump of assembler code for function main:
0x00005555555551de <+0>: push rbp
0x00005555555551df <+1>: mov rbp,rsp
=> 0x00005555555551e2 <+4>: sub rsp,0x10
0x00005555555551e6 <+8>: mov eax,0x0
0x00005555555551eb <+13>: call 0x555555555185 <get_flag>
0x00005555555551f0 <+18>: mov QWORD PTR [rbp-0x8],rax
0x00005555555551f4 <+22>: mov r9d,0x0
0x00005555555551fa <+28>: mov r8d,0xffffffff
0x0000555555555200 <+34>: mov ecx,0x22
0x0000555555555205 <+39>: mov edx,0x0
0x000055555555520a <+44>: mov esi,0xc
0x000055555555520f <+49>: mov edi,0x1337
0x0000555555555214 <+54>: call 0x555555555030 <mmap@plt>
0x0000555555555219 <+59>: mov QWORD PTR [rbp-0x10],rax
0x000055555555521d <+63>: mov rax,QWORD PTR [rbp-0x10]
0x0000555555555221 <+67>: mov edx,0x7
0x0000555555555226 <+72>: mov esi,0xc
0x000055555555522b <+77>: mov rdi,rax
0x000055555555522e <+80>: call 0x555555555060 <mprotect@plt>
0x0000555555555233 <+85>: mov rdx,QWORD PTR [rip+0x2e26] # 0x555555558060 <stdin@@GLIBC_2.2.5>
0x000055555555523a <+92>: mov rax,QWORD PTR [rbp-0x10]
0x000055555555523e <+96>: mov esi,0xc
0x0000555555555243 <+101>: mov rdi,rax
0x0000555555555246 <+104>: call 0x555555555040 <fgets@plt>
0x000055555555524b <+109>: mov rax,QWORD PTR [rbp-0x10]
0x000055555555524f <+113>: mov rdx,QWORD PTR [rbp-0x8]
0x0000555555555253 <+117>: mov rdi,rdx
0x0000555555555256 <+120>: call rax
0x0000555555555258 <+122>: nop
0x0000555555555259 <+123>: leave
0x000055555555525a <+124>: ret
Register state right before shellcode is executed:
$rax : 0x0000000000010000 → "EXPLOIT\n"
$rbx : 0x0000555555555260 → <__libc_csu_init+0> push r15
$rcx : 0x000055555555a4e8 → 0x0000000000000000
$rdx : 0x0000555555559480 → "SUPER SECRET!"
$rsp : 0x00007fffffffd940 → 0x0000000000010000 → "EXPLOIT\n"
$rbp : 0x00007fffffffd950 → 0x0000000000000000
$rsi : 0x4f4c5058
$rdi : 0x00007ffff7fa34d0 → 0x0000000000000000
$rip : 0x0000555555555253 → <main+117> mov rdi, rdx
$r8 : 0x0000000000010000 → "EXPLOIT\n"
$r9 : 0x7c
$r10 : 0x000055555555448f → "mprotect"
$r11 : 0x246
$r12 : 0x00005555555550a0 → <_start+0> xor ebp, ebp
$r13 : 0x00007fffffffda40 → 0x0000000000000001
$r14 : 0x0
$r15 : 0x0
(This register state is a snapshot at the assembly line below)
●→ 0x555555555253 <main+117> mov rdi, rdx
0x555555555256 <main+120> call rax
Since I already spilled the beans and "spoiled" the answer to the online challenge in comments, I might as well write it up. 2 key tricks:
Create 0x7ffff7e3c5a0 (&puts) in a register with lea reg, [reg + disp32], using the known value of RDI which is within the +-2^31 range of a disp32. (Or use RBP as a starting point, but not RSP: that would need a SIB byte in the addressing mode).
This is a generalization of the code-golf trick of using lea edi, [rax+1] to create small constants from other small constants (especially 0) in 3 bytes, with code that runs less slowly than push imm8 / pop reg.
The disp32 is large enough to not have any zero bytes; you have a couple registers to choose from in case one had been too close.
Copy a 64-bit register in 2 bytes with push reg / pop reg, instead of 3-byte mov rdi, rdx (REX + opcode + modrm). No savings if either push needs a REX prefix (for R8..R15), and actually costs bytes if both are "non-legacy" registers.
See other answers on Tips for golfing in x86/x64 machine code on codegolf.SE for more.
bits 64
lea rsi, [rdi - 0x166f30]
;; add rbp, imm32 ; alternative, but that would mess up a call-preserved register so we might crash on return.
push rdx
pop rdi ; copy RDX to first arg, x86-64 SysV calling convention
jmp rsi ; tailcall puts
This is exactly 11 bytes, and I don't see a way for it to be smaller. add r64, imm32 is also 7 bytes, same as LEA. (Or 6 bytes if the register is RAX, but even the xchg rax, rdi short form would cost 2 bytes to get it there, and the RAX value is still the fgets return value, which is the small mmap buffer address.)
The puts function pointer doesn't fit in 32 bits, so we need a REX prefix on any instruction that puts it into a register. Otherwise we could just mov reg, imm32 (5 bytes) with the absolute address, not deriving it from another register.
$ nasm -fbin -o exploit.bin -l /dev/stdout exploit.asm
1 bits 64
2 00000000 488DB7D090E9FF lea rsi, [rdi - 0x166f30]
3 ;; add rbp, imm32 ; we can avoid messing up any call-preserved registers
4 00000007 52 push rdx
5 00000008 5F pop rdi ; copy to first arg
6 00000009 FFE6 jmp rsi ; tailcall
$ ll exploit.bin
-rw-r--r-- 1 peter peter 11 Apr 24 04:09 exploit.bin
$ ./a.out < exploit.bin # would work if the addresses in my build matched yours
My build of your incomplete .c uses different addresses on my machine, but it does reach this code (at address 0x10000, mmap_min_addr, which mmap picks after the amusing choice of 0x1337 as a hint address, which isn't even page-aligned but doesn't result in EINVAL on current Linux).
Since we only tailcall puts with correct stack alignment and don't modify any call-preserved registers, this should successfully return to main.
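As a quick sanity check of the arithmetic, here is a small Python sketch (not part of the exploit) that recomputes the disp32 from the two addresses quoted in the question and rebuilds the 11 bytes shown in the NASM listing. It assumes RDI still holds the value from the register dump when the shellcode starts; that dump is taken just before main's own mov rdi, rdx, so this is worth confirming in a debugger. The names PUTS_ADDR, RDI_VALUE and build_shellcode are made up for illustration.

```python
import struct

PUTS_ADDR = 0x7FFFF7E3C5A0  # `info address puts` in the question
RDI_VALUE = 0x7FFFF7FA34D0  # RDI in the register dump (assumed unchanged at entry)

def build_shellcode(target: int, base: int) -> bytes:
    """Encode: lea rsi, [rdi + disp32]; push rdx; pop rdi; jmp rsi."""
    disp = target - base
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is outside disp32 range of the base register")
    # REX.W (48), opcode (8D), ModRM (B7), then the little-endian disp32
    lea = bytes([0x48, 0x8D, 0xB7]) + struct.pack("<i", disp)
    return lea + bytes([0x52, 0x5F, 0xFF, 0xE6])  # push rdx; pop rdi; jmp rsi

code = build_shellcode(PUTS_ADDR, RDI_VALUE)
print(hex(PUTS_ADDR - RDI_VALUE))  # -0x166f30
print(code.hex(), len(code))

# fgets stops at a newline, and the challenge forbids zero bytes
assert 0x00 not in code and 0x0A not in code
```

If a different base register or libc build gave a displacement containing a 00 or 0a byte, the final assert would fail, which is the case where you would try another register as the starting point.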
Note that 0 bytes (ASCII NUL, not NULL) would actually work in shellcode for this test program, if not for the requirement that forbids it.
The input is read using fgets (apparently to simulate a gets() overflow).
fgets actually can read a 0 aka '\0'; the only critical character is 0xa aka '\n' newline. See Is it possible to read null characters correctly using fgets or gets_s?
Often buffer overflows exploit a strcpy or something else that stops on a 0 byte, but fgets only stops on EOF or newline. (Or the buffer size, a feature gets is missing, hence its deprecation and removal from even the ISO C standard library! It's literally impossible to use safely unless you control the input data). So yes, it's totally normal to forbid zero bytes.
BTW, your int 0x80 attempt is not viable: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - you can't use the 32-bit ABI to pass 64-bit pointers to write, and the string you want to output is not in the low 32 bits of virtual address space.
Of course, with the 64-bit syscall ABI, you're fine if you can hardcode the length.
push rdx
pop rsi
shr eax, 16 ; fun 3-byte way to turn 0x10000 into 1 (__NR_write, 64-bit ABI), instead of just push 1 / pop rax
mov edi, eax ; STDOUT_FD = __NR_write
lea edx, [rax + 13 - 1] ; 3 bytes. RDX = 13 = string length
; or mov dl, 0xff ; 2 bytes leaving garbage in rest of RDX
syscall
But this is 12 bytes, as well as hard-coding the length of the string (which was supposed to be part of the secret?).
mov dl, 0xff could make sure the length was at least 255, and actually much more in this case, if you don't mind getting reams of garbage after the string you want, until write hits an unmapped page and returns early. That would save a byte, making this 11.
(Fun fact, Linux write does not return an error when it's successfully written some bytes; instead it returns how many it did write. If you try again with buf + write_len, you would get a -EFAULT return value for passing a bad pointer to write.)

Print newline with as little code as possible with NASM

I'm learning a bit of assembly for fun and I am probably too green to know the right terminology and find the answer myself.
I want to print a newline at the end of my program.
Below works fine.
section .data
newline db 10
section .text
_end:
mov rax, 1
mov rdi, 1
mov rsi, newline
mov rdx, 1
syscall
mov rax, 60
mov rdi, 0
syscall
But I'm hoping to achieve the same result without defining the newline in .data. Is it possible to call sys_write directly with the byte you want, or must it always be done with a reference to some predefined data (which I assume is what mov rsi, newline is doing)?
In short, why can't I replace mov rsi, newline by mov rsi, 10?
You always need the data in memory to copy it to a file-descriptor. There is no system-call equivalent of C stdio fputc that takes data by value instead of by pointer.
mov rsi, newline puts a pointer into a register (with a huge mov r64, imm64 instruction). sys_write doesn't special-case size=1 and treat its void *buf arg as a char value if it's not a valid pointer.
There aren't any other system calls that would do the trick. pwrite and writev are both more complicated (taking a file offset as well as a pointer, or taking an array of pointer+length to gather the data in kernel space).
There is a lot you can do to optimize this for code-size, though. See https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code
First, putting the newline character in static storage means you need to generate a static address in a register. Your options here are:
5-byte mov esi, imm32 (only in Linux non-PIE executables, where static addresses are link-time constants known to be in the low 2GiB of virtual address space, and thus work as 32-bit zero-extended or sign-extended values)
7-byte lea rsi, [rel newline] Works everywhere, the only good option if you can't use the 5-byte mov-immediate.
10-byte mov rsi, imm64. This works even in PIE executables (e.g. if you link with gcc -nostdlib without -static, on a distro where PIE is the default.) But only via a runtime relocation fixup, and the code-size is terrible. Compilers never use this because it's not faster than LEA.
But like I said, we can avoid static addressing entirely: Use push to put immediate data on the stack. This works even if we need zero-terminated strings, because push imm8 and push imm32 both sign-extend the immediate to 64-bit. Since ASCII uses the low half of the 0..255 range, this is equivalent to zero-extension.
Then we just need to copy RSP to RSI, because push leaves RSP pointing to the data that was pushed. mov rsi, rsp would be 3 bytes because it needs a REX prefix. If you were targeting 32-bit code or the x32 ABI (32-bit pointers in long mode) you could use 2-byte mov esi, esp. But Linux puts the stack pointer at the top of user virtual address space, so on x86-64 that's 0x007ff..., right at the top of the low canonical range. So truncating a pointer to stack memory to 32 bits isn't an option; we'd get -EFAULT.
But we can copy a 64-bit register with 1-byte push + 1-byte pop. (Assuming neither register needs a REX prefix to access.)
default rel ; We don't use any explicit addressing modes, but no reason to leave this out.
_start:
push 10 ; \n
push rsp
pop rsi ; 2 bytes total vs. 3 for mov rsi,rsp
push 1 ; __NR_write call number
pop rax ; 3 bytes total, vs. 5 for mov eax, 1
mov edx, eax ; length = call number by coincidence
mov edi, eax ; fd = length = call number also coincidence
syscall ; write(1, "\n", 1)
mov al, 60 ; assuming write didn't return -errno, replace the low byte and keep the high zeros
;xor edi, edi ; leave rdi = 1 from write
syscall ; _exit(1)
.size: db $ - _start
xor-zeroing is the most well-known x86 peephole optimization: it saves 3 bytes of code size, and is actually more efficient than mov edi, 0. But you only asked for the smallest code to print a newline, without specifying that it had to exit with status = 0. So we can save 2 bytes by leaving that out.
Since we're just making an _exit system call, we don't need to clean up the stack from the 10 we pushed.
BTW, this will crash if the write returns an error. (e.g. redirected to /dev/full, or closed with ./newline >&-, or whatever other condition.) That would leave RAX=-something, so mov al, 60 would give us RAX=0xffff...3c. Then we'd get -ENOSYS from the invalid call number, and fall off the end of _start and decode whatever is next as instructions. (Probably zero bytes which decode with [rax] as an addressing mode. Then we'd fault with a SIGSEGV.)
objdump -d -Mintel disassembly of that code, after building with nasm -felf64 and linking with ld
0000000000401000 <_start>:
401000: 6a 0a push 0xa
401002: 54 push rsp
401003: 5e pop rsi
401004: 6a 01 push 0x1
401006: 58 pop rax
401007: 89 c2 mov edx,eax
401009: 89 c7 mov edi,eax
40100b: 0f 05 syscall
40100d: b0 3c mov al,0x3c
40100f: 0f 05 syscall
0000000000401011 <_start.size>:
401011: 11 .byte 0x11
So the total code size is 0x11 = 17 bytes, vs. your version with 39 bytes of code + 1 byte of static data. Your first 3 mov instructions alone are 5, 5, and 10 bytes long. (Or 7 bytes long for mov rax,1 if you use YASM, which doesn't optimize it to mov eax,1.)
Running it:
$ strace ./newline
execve("./newline", ["./newline"], 0x7ffd4e98d3f0 /* 54 vars */) = 0
write(1, "\n", 1
) = 1
exit(1) = ?
+++ exited with 1 +++
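To see why push 10 leaves a usable zero-terminated "\n" on the stack, the sign-extension can be modelled in a few lines of Python. This is only a sketch of the byte layout at [rsp]; pushed_qword is a hypothetical helper name, not something NASM or Linux provides.

```python
import struct

def pushed_qword(imm8: int) -> bytes:
    """Bytes that `push imm8` stores at [rsp] in 64-bit mode, lowest address first."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    # The CPU sign-extends the 8-bit immediate to 64 bits before storing it.
    signed = imm8 - 0x100 if imm8 & 0x80 else imm8
    return struct.pack("<q", signed)

print(pushed_qword(10))    # b'\n\x00\x00\x00\x00\x00\x00\x00'
print(pushed_qword(0x80))  # b'\x80\xff\xff\xff\xff\xff\xff\xff'
```

For ASCII values (below 0x80) the upper seven bytes are zero, so RSP points at the character followed by a terminator. For bytes 0x80 and above the padding is 0xff instead, which is why the trick is limited to the ASCII range.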
If this was part of a larger program:
If you already have a pointer to some nearby static data in a register, you could do something like a 4-byte lea rsi, [rdx + newline-foo] (REX.W + opcode + modrm + disp8), assuming the newline-foo offset fits in a sign-extended disp8 and that RDX holds the address of foo.
Then you can have newline: db 10 in static storage after all. (Put it in .rodata or .data, depending on which section you already had a pointer to.)
sys_write expects the address of the data in the rsi register, not the character or string itself.
mov rsi, newline loads the address of newline into rsi.

Endianness followed in NASM 64 bit programming

I have 2 situations :
I.
arr dq 1234567887654321H
mov rsi,arr
mov rbx,[rsi]
We know that rsi points to a single byte location in memory and that x86 is little-endian. Does rsi point to 21H, so that only this 21 gets into rbx, or does the complete value in arr get transferred to rbx?
II.
tempbuff resb 16
arr resb 1234567887654321H
mov rbx,qword[arr]
mov rsi,tempbuff
mov [rsi],rbx
Above statements are taken from different sections and combined here so as to focus on important details.
Now, from the above statements, rbx stores the entire contents of arr.
rsi points to the first memory location of tempbuff. Does mov [rsi],rbx then store the entire contents of rbx to tempbuff, or does it simply store the lowest byte of rbx (here 21) into the 1-byte location pointed to by rsi?
For case 1:
mov rbx,[rsi]
NASM infers the size of the memory operand [rsi] on the right-hand side from the destination on the left-hand side, which is a 64-bit register. So rsi holds a 64-bit address, and the 64-bit value at [rsi] is loaded into the 64-bit register rbx.
This could be explicitly stated in Nasm as
mov rbx,qword [rsi]
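So the statement in the question
"rsi always points to 1 byte of location in memory"
is only half the story: rsi holds the address of the lowest byte, and the operand size of the instruction decides how many bytes are accessed starting there. To load just that one byte you have to ask for it explicitly (mov rbx,byte [rsi] does not assemble, because the operand sizes differ):
movzx rbx,byte [rsi]
Here the byte at the address in rsi becomes the least significant byte of rbx, and the upper 56 bits are zeroed.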
For case 2:
mov [rsi],rbx
The whole 64-bit rbx value is stored to the 8 bytes of memory starting at the address in rsi.
From the chat discussion I conclude that the OP's real doubt was about how little-endianness affects registers and memory.
exampleQuadWord dq 0102030405060708H ; bytes in memory, lowest address first: 08 07 06 05 04 03 02 01
exampleQWordWithByte dq 1 ; bytes in memory, lowest address first: 01 00 00 00 00 00 00 00
In a simplistic example
mov rbx,2
mov rsi, exampleQWordWithByte
mov [rsi],rbx
; exampleQWordWithByte in memory is now 0200000000000000
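The same byte layout can be reproduced outside the assembler. The following Python sketch uses struct with the "<Q" format (little-endian, unsigned 64-bit) to mimic what an 8-byte store and load do; store_qword and load_qword are illustrative names, not real instructions or APIs.

```python
import struct

def store_qword(value: int) -> bytes:
    """Bytes written by `mov [rsi], rbx`, lowest address first."""
    return struct.pack("<Q", value)

def load_qword(mem: bytes) -> int:
    """Value loaded by `mov rbx, [rsi]` from 8 bytes of memory."""
    return struct.unpack("<Q", mem[:8])[0]

mem = store_qword(0x1234567887654321)
print(mem.hex(" "))          # 21 43 65 87 78 56 34 12
print(hex(load_qword(mem)))  # 0x1234567887654321
print(hex(mem[0]))           # 0x21, the byte rsi points at
```

The register always sees the full value; only the order of the bytes in memory is reversed, and the lowest-addressed byte is the least significant one.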

Assembly Language nasm error

I have written the following assembly code as prescribed by my text book in the intel 64 bit syntax
Section .text
global _short
_start:
jmp short Gotocall
shellcode:
pop rcx
xor eax,eax
mov byte [rcx+8], al
lea rdi, [rax]
mov long [rcx+8], rdi
mov long [rcx+12], eax
mov byte al, 0x3b
mov rsi, rax
lea rdi, [esi+8]
lea edx, [esi+12]
int 0x80
Gotocall:
call shellcode
db '/bin/shJAAAAKKKK'
but I get a NASM error on line 10 like this:
asmshell.asm:10: error: mismatch in operand sizes
Can anybody tell me what mistake there is in my code?
And can anybody please point me to some good references for the 64-bit Intel assembly instructions?
If you mean the error is on line 10
mov long [rcx+8], rdi
I was about to ask you what size long qualifier is, but the next line
mov long [rcx+12], eax
shows that you are moving two different sizes of register to the same size destination. In the first case the 64-bit register rdi, in the second case the 32-bit register eax, and long cannot satisfy them both.
Why not just drop the long since by specifying the register, the assembler knows the size of the destination? But sadly, you have only allowed 4 bytes memory to store a 64-bit register, given away by the [rcx+8] followed by [rcx+12].
Perhaps you intended
mov long [rcx+8], edi
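The overlap between the two stores can be made concrete with a short Python sketch. It treats the 16 data bytes after the call as a buffer and applies an 8-byte store at offset 8 followed by a 4-byte store at offset 12, which is what the two instructions would do if the first one really wrote all of rdi. The value 0x1122334455667788 is a made-up stand-in for rdi.

```python
import struct

buf = bytearray(b"/bin/shJAAAAKKKK")  # the 16 data bytes after the call
ptr = 0x1122334455667788              # hypothetical 64-bit value standing in for rdi

struct.pack_into("<Q", buf, 8, ptr)   # 8-byte store at [rcx+8] covers offsets 8..15
struct.pack_into("<I", buf, 12, 0)    # 4-byte store at [rcx+12] overwrites offsets 12..15

stored = struct.unpack_from("<Q", buf, 8)[0]
print(hex(stored))                    # 0x55667788: the upper half is gone
```

Only the low 32 bits survive, so either the slots have to be 8 bytes apart or the value stored at [rcx+8] has to be 32 bits wide, as the suggested edi version makes it.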

Can I multiply a register's value by an immediate number to add the result to another register?

Learning Assembly with NASM, Ubuntu, 32 bits.
My array in .data:
ary db 1,2,2,4,5 ; Five elements of one byte each
And some number:
tmp db 2 ; Holds the number 2
Let's say I want to print the element at index 4 in the array (so it would be 5).
I know I could do this:
mov EAX,4
mov EBX,0
mov ECX,ary ; Put the array's address in ECX
add ECX,4 ; Move address four bytes to the right
add byte [ECX],'0' ; The value at this address to ASCII
mov EDX,1
int 0x80
However, for whatever reasons, I decided that instead of writing the constant number 4, I want to do it by multiplying my variable (which is 2) by 2.
This is the updated code:
mov EAX,[tmp] ; Put the number 2 in EAX
mov ECX,ary ; Put the array's address in ECX
add ECX,EAX * 2 ; Move (2 * 2) = 4 bytes to the right
add byte [ECX],'0' ; Decimal to ASCII
mov EAX,4
mov EBX,0
mov EDX,1
int 0x80
This doesn't work at add ECX,EAX * 2:
invalid operand type
But why? Doesn't EAX evaluate to 2, making it equivalent to
add ECX,2 * 2
Curiously, these do work:
add ECX,EAX * 1 ; Moves by 2
add ECX,EAX * 0 ; Moves by 0
The above suggests to me that the answer is no, and that multiplying by 1 or 0 only works because the assembler doesn't actually need to do any multiplication to know the answer in the first place.
Does this mean that to achieve what I want, I do have to use the mul instruction?
You CAN do multiplication and adding in one instruction if you use lea:
lea ECX,[ECX+EAX*2]
In x86, although lea supports multiplication by a constant, the add instruction doesn't support an operand that multiplies a register by a constant. It supports additive offsets, but not multiplication. I assume, as you noted, that the assembler is being somewhat forgiving in this case in accepting add ECX,EAX*0 and add ECX,EAX*1 as equivalent to add ECX,0 and add ECX,EAX, respectively.
You would instead need to do something like this:
mov ECX,ary ; Put the array's address in ECX
movzx EAX,byte [tmp] ; Put the number 2 in EAX (tmp is a single byte, so zero-extend it)
shl EAX,1 ; EAX *= 2 (mul takes no immediate operand, so shift instead)
add ECX,EAX ; Move (2 * 2) = 4 bytes to the right
add byte [ECX],'0' ; Decimal to ASCII
mov EAX,4
mov EBX,0
mov EDX,1
int 0x80
The LEA instruction can perform two additions and one limited multiplication at once. The common syntax is:
lea reg, [offset+reg+const*reg]
Here, reg is a register, offset is a constant, and const is one of 1, 2, 4 or 8.
This makes the instruction a powerful way to compute fairly complex expressions in one step:
The equation from the question:
add ECX,EAX * 2
can be computed this way:
lea ecx, [ecx+2*eax]
There are many other uses:
lea eax, [ebx+2*ebx] ; eax = 3*ebx
lea eax, [eax+4*eax] ; eax = 5*eax
lea eax, [ecx+8*ecx] ; eax = 9*ecx
lea eax, [1234+ebx+8*ecx]
Note, that FASM allows shorter syntax for the above examples:
lea eax, [3*ebx]
lea eax, [5*eax]
lea eax, [9*ecx]
An additional advantage of the lea instruction is that it does not affect the flags. It is also fast on x86 CPUs, although on some microarchitectures the three-component form (base + index + displacement) is slower than the simpler forms.
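The address arithmetic that lea performs can be modelled in a few lines of Python, which makes it easy to check the examples by hand. This is a sketch of the calculation only (32-bit wraparound, scale restricted to 1, 2, 4 or 8); lea32 is a hypothetical helper name.

```python
def lea32(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Model of `lea r32, [base + index*scale + disp]`: arithmetic only, no memory access."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF

ary = [1, 2, 2, 4, 5]
tmp = 2
offset = lea32(base=0, index=tmp, scale=2)  # like lea ecx, [ecx+eax*2] with ecx = 0
print(offset, ary[offset])                  # 4 5
print(lea32(base=7, index=7, scale=2))      # 21, i.e. 3*7 as in lea eax, [ebx+2*ebx]
```

A scale of 3, 5 or 9 is rejected here for the same reason the hardware cannot encode it directly: those multiples are built from base + index*scale with the same register in both roles.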

Resources