Disassembling and Reassembling, how to properly pipeline this in the terminal? - linux

I'm using the eicar.com file and playing around with reverse engineering tools. I'd like to be able to disassemble and reassemble this file. I get close but there are still a few problems that I cannot figure out.
This is the original eicar.com ascii file.
X5O!P%#AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
Using udcli udcli -noff -nohex eicar.com > stage1.asm I end up with this x86 assembly
pop eax
xor eax, 0x2550214f
inc eax
inc ecx
push eax
pop ebx
xor al, 0x5c
push eax
pop edx
pop eax
xor eax, 0x5e502834
sub [edi], esi
inc ebx
inc ebx
sub [edi], esi
jge 0x40
inc ebp
dec ecx
inc ebx
inc ecx
push edx
sub eax, 0x4e415453
inc esp
inc ecx
push edx
inc esp
sub eax, 0x49544e41
push esi
dec ecx
push edx
push ebp
push ebx
sub eax, 0x54534554
sub eax, 0x454c4946
and [eax+ecx*2], esp
sub ecx, [eax+0x2a]
Finally, putting it back together with nasm using this command, nasm stage1.asm -o stage2 I end up with...
fXf5O!P%f#fAfPf[4\fPfZfXf54(P^fg)7fCfCfg)7^O<8d>^R^#fEfIfCfAfRf- STANfDfAfRfDf-ANTIfVfIfRfUfSf-TESTf-FILEfg!$Hfg+H*
In this case I'm starting with an ASCII file and end up with a bin file that holds a lot of extra garbage.
What am I missing here? How do I end up with the original ASCII string and have the proper file type?
EDIT:
Per #Ross Ridge's suggestion, he noted that I was disassembling a 16-bit file as a 32-bit one, this has successfully cleaned up the string but he file type however is still incorrectly output as binary.
First fix: udcli -16 -noff -nohex eicar.com > stage1.asm to obtain proper output string.
Results in X5O!P%#AP[4\PZX54(P^)7CC)7^O<8d>"^#EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
Still a little garbage data not present in the original but very close.

In general you can't reassemble the output of a dissembler back into the exact the same binary file as the original. There is often more than one way to assemble a given assembly instruction into machine code. As far your ultimate goal of understanding the code you're trying to do this with it's also not very helpful. Even if you do get something that you can assemble back into the original code, it's extremely unlikely you'll get something you can modify and assemble into code that works.
To illustrate this I've provided my own "disassembly" of the eicar.com file, one that allows it to be modified to a limited extent. You can modify the string it prints, so long as the message isn't too long and does't contain any dollar sign $ characters. You should be able to modify the string while still keeping the output consisting of only of printable ASCII characters, assuming you only put printable ASCII characters in the string.
BITS 16
ORG 0x100
ascii_shift EQU 0x097b
start:
pop ax
xor ax, 0x2000 | (skip - start + 0x100) | 0x000f
push ax
and ax, 0x4000 | (skip - start + 0x100)
push ax
pop bx
xor al, (msg - start) ^ (skip - start)
push ax
pop dx
pop ax
xor ax, (0x2000 | (skip - start + 0x100) | 0x000f) ^ ascii_shift
push ax
pop si
sub [bx], si
inc bx
inc bx
sub [bx], si
jnl skip
msg:
DB 'EICAR-STANDARD-ANTIVIRUS-TEST-FILE!'
DB '$'
%if ($ - msg) < 0x21
TIMES 0x21 - ($ - msg) DB '$'
%endif
skip:
DW 0x21cd + ascii_shift
DW 0x20cd + ascii_shift
%if skip - msg > 0x7e
%error 'msg too long'
%endif
I won't explain how the code works, but I'll give you one hint: MS-DOS pushes a 16-bit 0 value on the stack at the start execution of a .COM format executable.

The problem is that the disassembler makes no difference between the code and the data.
Notice this:
sub eax, 0x54534554 ; 'TEST'
sub eax, 0x454c4946 ; 'FILE'
(and all the sub eax statements)
this is not really code (it makes no sense substracting both values without using them in-between), this is a part of the message (there's TEST in the first instruction, then FILE)
So when you're reassembling it, optimizations can occur which break your data (sub could be interpreted in different ways). You have to identify the data sections so they're not treated as code by your assembler.
Another way to go is to turn off all assembling optimizations.

Related

Reversing a string using stack in x86 NASM

I'm trying to write a function in x86 NASM assembly which reverses order of characters in a string passed as argument. I tried implementing it using stack but ended up getting error message
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
Code below:
section .text
global reverse
reverse:
push ebp ; epilogue
mov ebp, esp
mov eax, [ebp+8]
xor ecx, ecx ; ecx = 0
push ebx ; saved register
push_eax:
mov edx, [eax] ; edx = char at eax
test edx, edx
jz inc_eax ; if edx == 0, move eax pointer back and pop eax
push edx
inc eax
inc ecx ; counter + 1
jmp push_eax
inc_eax:
sub eax, ecx ; move eax back to beginning of string
mov ebx, ecx ; to move eax back at the end of function
pop_eax:
test ecx, ecx ; loop counter == 0
jz end
pop edx
mov [eax], edx ; char at eax = edx
inc eax ; eax++
dec ecx ; ecx--
jmp pop_eax
end:
sub eax, ebx
pop ebx ; saved register
mov esp, ebp
pop ebp
ret
C declaration:
extern char* reverse(char*);
I've read somewhere that you get this error when trying to for instance write something in an array that is longer than allocated but i don't see how would that function do it? Also when instead of using ebx at the end I manually move the pointer in eax back (string in C of length 9 -> sub eax, 9) I get the reversed string at the output followed by 2nd, 3rd and 4th char. (No matter the length of the string I declare in C). So for instanceinput: "123456789"
output: "987654321234" but that only happens when I move eax manually, using ebx like in the code above outputs some trash.
Peter's answer is the answer you are looking for. However, may I comment on the technique? Must you use the stack? Do you already know the length of the string, or must you calculate/find that yourself?
For example, if you already know the length of the string, can you place a pointer at the first and another at the end and simply exchange the characters, moving each pointer toward the center until they meet? This has the advantage of not assuming there is enough room on the stack for the string. In fact, you don't even touch the stack except for the prologue and epilogue. (Please note you comment that the epilogue is at the top, when it is an 'ending' term.)
If you do not know the length of the string, to use the above technique, you must find the null char first. By doing this, you have touched each character in the string already, before you even start. Advantage, it is now loaded in to the cache. Disadvantage, you must touch each character again, in essence, reading the string twice. However, since you are using assembly, a repeated scasb instruction is fairly fast and has the added advantage of auto-magically placing a pointer near the end of the string for you.
I am not expecting an answer by asking these questions. I am simply suggesting a different technique based on certain criteria of the task. When I read the question, the following instantly came to mind:
p[i] <-> p[n-1]
i++, n--
loop until n <= i
Please note that you will want to check that 'n' is actually greater than 'i' before you make the first move. i.e.: it isn't a zero length string.
If this is a string of 1-byte characters, you want movzx edx, byte [eax] byte loads and mov [eax], dl byte stores.
You're doing 4-byte stores, which potentially steps on bytes past the end of the array. You also probably overread until you find a whole dword on the stack that's all zero. test edx, edx is fine if you correctly zero-extended a byte into EDX, but loading a whole word probably resulted in overread.
Use a debugger to see what you're doing to memory around the input arg.
(i.e. make sure you aren't writing past the end of the array, which is probably what happened here, stepping on the buffer-overflow detection cookie.)

How do i reverse a string on emu8086 assembly language [duplicate]

I have to do a simple calculator in assembly using EMU8086, but every time I try to launch it EMU8086 gives this error:
INT 21h, AH=09h -
address: 170B5
byte 24h not found after 2000 bytes.
; correct example of INT 21h/9h:
mov dx, offset msg
mov ah, 9
int 21h
ret
msg db "Hello$"
I checked the other stuff, but there were no mistakes:
data segment
choice db ?
snum1 db 4 dup(?)
snum2 db 4 dup(?)
sres db 4 dup(?)
num1 db ?
num2 db ?
res db ?
;;menu1 db "Chose a function to procced", 10, 13, "Add [+]", 10, 13, "Sub [-]", 10, 13
;;menu2 db "Mul [*]", 10, 13, "Div [/]", 10, 13, "Mod [%]", 10, 13, "Pow [^]", 10, 13, "Exit [x]$"
messStr db "Enter Your Choice:",10,13,"",10,13,"Add --> +",10,13,"Sub --> -",10,13,"Mul --> *",10,13,"Div --> /",10,13,"Mod --> %",10,13,"Pow --> ^",10,13,"Exit --> X",10,13,"$"
msg1 db "Enter first number$"
msg2 db "Enter second number$"
msg3 db "Press any key to procced$"
msg4 db "The result is $"
ends
stack segment
dw 128 dup(0)
ends
code segment
assume cs:code, ds:data, ss:stack
newline proc ;; new line
push ax
push dx
mov ah, 2
mov DL, 10
int 21h
mov ah, 2
mov DL, 13
int 21h
pop dx
pop ax
ret
endp
printstr proc ;; print string
push BP
mov BP, SP
push dx
push ax
mov dx, [BP+4]
mov ah, 9
int 21h
pop ax
pop dx
pop BP
ret 2
endp
inputstr proc ;; collect input
push BP
mov BP, SP
push bx
push ax
mov bx, [BP+4]
k1:
mov ah, 1
int 21h
cmp al, 13
je sofk
mov [bx], al
inc bx
jmp k1
sofk:
mov byte ptr [bx], '$'
pop ax
pop bx
pop BP
ret 2
endp
getNums proc ;; get the numbers
call newline
push offset msg1
call printstr
call newline
push offset snum1
call inputstr
call newline
push offset msg2
call printstr
call newline
push offset snum2
call inputstr
ret
endp
start:
mov ax, data
mov ds, ax
mov ax, stack
mov ss, ax
;; print the main menu
call newline
push offset msg4
call printstr
;; collect the input
call newline
mov bx, offset choice
mov ah, 1
int 21h
mov [bx], al
;; check it
mov al, choice
cmp al, '+'
jne cexit
call getNums
jmp cont
cexit:
cmp al, 'x'
je cend
cont:
;; pause before going to the main menu
call newline
push offset msg3
call printstr
mov bx, offset choice
mov ah, 1
int 21h
call newline
call newline
call newline
jmp start
cend:
mov ax, 4c00h
int 21h
ends
end start
I cut most of the code segment because it wasn't important here.
After experimenting with the code I found that the problem was related to the lengths of the messages in the data segment. menu1 & menu2 were too long and any message after them can't be printed (msg1 & msg2 are printed, but nothing after them). I checked if I should merge menu1 & menu2, but it didn't help out. Please help me find out what is wrong with it.
The error message means you use int 21h / AH=09h on a string that didn't end with a $ (ASCII 24h). The system-call handler checked 2000 bytes without finding one.
Often, that means your code or data is buggy, e.g. in a fixed string you forgot a $ at the end, or if copying bytes into a buffer then you maybe overwrote or never stored a '$' in the first place.
But in this case, it appears that EMU8086 has a bug assembling push offset msg4. (In a way that truncates the 00B5h 16-bit address to 8-bit, and sign-extends back to 16, creating a wrong pointer that points past where any $ characters are in your data.)
Based on the error message below I know you are using EMU8086 as your development environment.
INT 21h, AH=09h -
address: 170B5
byte 24h not found after 2000 bytes.
; correct example of INT 21h/9h:
mov dx, offset msg
mov ah, 9
int 21h
ret
msg db "Hello$"
I'm no expert on EMU8086 by any stretch of the imagination. I do know why your offsets don't work. I can't tell you if there is a proper way to resolve this, or if it's an EMU8086 bug. Someone with a better background on this emulator would know.
You have created a data segment with some variables. It seems okay to me (but I may be missing something). I decided to load up EMU8086 to actually try this code. It assembled without error. Using the debugger I single stepped to the push offset msg1 line near the beginning of the program. I knew right away from the instruction encoding what was going on. This is the decoded instruction I saw:
It shows the instruction was encoded as push 0b5h where 0b5h is the offset. The trouble is that it is encoded as a push imm8 . The two highlighted bytes on the left hand pane show it was encoded with these bytes:
6A B5
If you review an instruction set reference you'll find the encodings for PUSH instruction encoded with 6A is listed as:
Opcode* Instruction Op/En 64-Bit Mode Compat/Leg Mode Description
6A ib PUSH imm8 I Valid Valid Push imm8.
You may say that B5 fits within a byte (imm8) so what is the problem? The smallest value that can be pushed onto the stack with push in 16-bit mode is a 16-bit word. Since a byte is smaller than a word, the processor takes the byte and sign extends it to make a 16-bit value. The instruction set reference actually says this:
If the source operand is an immediate of size less than the operand size, a sign-extended value is pushed on the stack
B5 is binary 10110101 . The sign bit is the left most bit. Since it is 1 the upper 8 bits placed onto the stack will be 11111111b (FF). If the sign bit is 0 then then 00000000b is placed in the upper 8 bits. The emulator didn't place 00B5 onto the stack, it placed FFB5. That is incorrect! This can be confirmed if I step through the push 0b5h instruction and review the stack. This is what I saw:
Observe that the value placed on the stack is FFB5. I could not find an appropriate syntax (even using the word modifier) to force EMU8086 to encode this as push imm16. A push imm16 would be able to encode the entire word as push 00b5 which would work.
Two things you can do. You can place 256 bytes of dummy data in your data segment like this:
data segment
db 256 dup(?)
choice db ?
... rest of data
Why does this work? Every variable defined after the dummy data will be an offset that can't be represented in a single byte. Because of this EMU8086 is forced to encode push offset msg1 as a word push.
The cleaner solution is to use the LEA instruction. This is the load effective address instruction. It takes a memory operand and computes the address (in this case the offset relative to the data segment). You can replace all your code that uses offset with something like:
lea ax, [msg1]
push ax
AX can be any of the general purpose 16-bit registers. Once in a register, push the 16-bit register onto the stack.
Someone may have a better solution for this, or know a way to resolve this. If so please feel free to comment.
Given the information above, you may ask why did it seem to work when you moved the data around? The reason is that the way you reorganized all the strings (placing the long one last) caused all the variables to start with offsets that were less than < 128. Because of this the PUSH of an 8-bit immediate offset sign extended a 0 in the top bits when placed on the stack. The offsets would be correct. Once the offsets are >= 128 (and < 256) the sign bit is 1 and the value placed on the stack sign will have an upper 8 bits of 1 rather than 0.
There are other bugs in your program, I'm concentrating on the issue directly related to the error you are receiving.
I reviewed your code and concentrated on the following sequence of instructions:
mov bx, offset choice ; here you set BX to the address of 'choice'
mov ah, 1
int 21h ; here you 'READ CHARACTER FROM STANDARD INPUT, WITH ECHO'
mov [bx], al ; because INT 21h does preserve BX, you are writing back the result of the interrupt call (AL) back to the memory location at BX, which is named 'choice'
;; check it
mov al, choice ; HERE you are moving a BYTE variable named 'choice' to AL, overwriting the result of the last INT 21h call
cmp al, '+' ; ... and compare this variable to the ASCII value of '+'
jne cexit ; if this variable is unequal to '+' you jump to 'cexit'
call getNums ; otherwise you try to get another number from the input/STANDARD CONSOLE
So your sequence
mov bx, offset choice ; here you set BX to the address of 'choice'
...
mov [bx], al ; because INT 21h does preserve BX, you ...
...
mov al, choice
essentially means, that you are setting BX to the address of 'choice', then setting 'choice'([BX]) to AL and copying it back to AL.
This is redundant.
After that, you compare that char to '+' and...
if that char equals to '+', you get the next char with call getNums and then continue with cont:.
if that char does not equal to '+', you compare it to 'x', the exit-char. If it's not 'x', you fall through to cont:
No error here.
So your problem with menu1 and menu2 may stem from some escape characters included in your strings like %,/,\. For example, % is a MACRO character in some assemblers which may create problems.
simple solution is that your strings should always end in '$'
change DUP(?) to DUP('$') and all other strings end with ,'$'

Printing a string in assembly using no predefined function

I have to define a function in assembly that will allow me to loop through a string of declared bytes and print them using a BIOS interrupt. I'm in 16 bit real mode. This is an exercise on writing a little bootloader from a textbook, but it seems that it was only a draft and it's missing some stuff.
I have been given the following code:
org 0x7c00
mov bx, HELLO_MSG
call print_string
mov bx, GOODBYE_MSG
call print_string
jmp $ ;hang so we can see the message
%include "print_string.asm"
HELLO_MSG:
db 'Hello, World!', 0
GOODBYE_MSG:
db 'Goodbye!', 0
times 510 - ($ - $$) db 0
dw 0xaa55
My print_string.asm looks like this:
print_string:
pusha
mov ah, 0x0e
loop:
mov al, bl
cmp al, 0
je return
int 0x10
inc bx
jmp loop
return:
popa
ret
I have some idea of what I'm doing, but the book doesn't explain how to iterate through something. I know how to do it in C but this is my first time using assembly for something other than debugging C code. What happens when I boot through the emulator is that it prints out a couple of lines of gibberish and eventually hangs there for me to see my failure in all it's glory. Hahaha.
Well, it looks like it loads the address of the string into the BX register before calling the function.
The actual function looks like it is trying to loop through the string, using BX as a pointer and incrementing it (inc bx) until it hits the ASCII NUL at the end of the string (cmp al, 0; je return)...
...but something is wrong. The "mov al, bl" instruction does not look correct, because that would move the low 8 bits of the address into al to be compared for an ASCII NUL, which does not make a lot of sense. I think it should be something more like "mov al, [bx]"; i.e. move the byte referenced by the BX address into the AL register -- although it has been a long time since I've worked with assembly so I might not have the syntax correct.
Because of that bug, the 10h interrupt would also be printing random characters based on the address of the string rather than the contents of the string. That would explain the gibberish you're seeing.
I think the issue is you cannot count on the int preserving any of your registers, so you need to protect them. Plus, what Steven pointed out regarding loading of your string address:
; Print the string whose address is in `bx`, segment `ds`
; String is zero terminated
;
print_string:
pusha
loop:
mov al, [bx] ; load what `bx` points to
cmp al, 0
je return
push bx ; save bx
mov ah, 0x0e ; load this every time through the loop
; you don't know if `int` preserves it
int 0x10
pop bx ; restore bx
inc bx
jmp loop
return:
popa
ret

Copy string from BSS variable to BSS variable in Assembly

Let's suppose I have to string stored in variables created in the .BSS section.
var1 resw 5 ; this is "abcde" (UNICODE)
var2 resw 5 ; here I will copy the first one
How would I do this with NASM?
I tried something like this:
mov ebx, var2 ; Here we will copy the string
mov dx, 5 ; Length of the string
mov esi, dword var1 ; The variable to be copied
.Copy:
lodsw
mov [ebx], word ax ; Copy the character into the address from EBX
inc ebx ; Increment the EBX register for the next character to copy
dec dx ; Decrement DX
cmp dx, 0 ; If DX is 0 we reached the end
jg .Copy ; Otherwise copy the next one
So, first problem is that the string is not copied as UNICODE but as ASCII and I don't know why. Secondly, I know there might be some not recommended use of some registers. And lastly, I wonder if there is some quicker way of doing this (maybe there are instructions specially created for this kind of operations with strings). I'm talking about 8086 processors.
inc ebx ; Increment the EBX register for the next character to copy
A word is 2 bytes, but you're only stepping ebx 1 byte ahead. Replace inc ebx with add ebx,2.
Michael already answered about the obvious problem of the demonstrated code.
But there is also another layer of understanding. It is not important how you will copy the string from one buffer to another - by bytes, words or double words. It will always create exact copy of the string.
So, how to copy the string is a matter of optimization. Using rep movsd is the fastest known way.
Here is one example:
; ecx contains the length of the string in bytes
; esi - the address of the source, aligned on dword
; edi - the address of the destination aligned on dword
push ecx
shr ecx, 2
rep movsd
pop ecx
and ecx, 3
rep movsb

How would I find the length of a string using NASM?

I'm trying to make a program using NASM that takes input from command line arguments. Since string length is not provided, I'm trying to make a function to compute my own. Here is my attempt, which takes a pointer to a string in the ebx register, and returns the length of the string in ecx:
len:
push ebx
mov ecx,0
dec ebx
count:
inc ecx
inc ebx
cmp ebx,0
jnz count
dec ecx
pop ebx
ret
My method is to go through the string, character by character, and check if it's null. If it's not, I increment ecx and go to the next character. I believe the problem is that cmp ebx,0 is incorrect for what I'm trying to do. How would I properly go about checking whether the character is null? Also, are there other things that I could be doing better?
You are comparing the value in ebx with 0 which is not what you want. The value in ebx is the address of a character in memory so it should be dereferenced like this:
cmp byte[ebx], 0
Also, the last push ebx should be pop ebx.
Here is how I do it in a 64-bit Linux executable that checks argv[1]. The kernel starts a new process with argc and argv[] on the stack, as documented in the x86-64 System V ABI.
_start:
pop rsi ; number of arguments (argc)
pop rsi ; argv[0] the command itself (or program name)
pop rsi ; rsi = argv[1], a pointer to a string
mov ecx, 0 ; counter
.repeat:
lodsb ; byte in AL
test al,al ; check if zero
jz .done ; if zero then we're done
inc ecx ; increment counter
jmp .repeat ; repeat until zero
.done:
; string is unchanged, ecx contains the length of the string
; unused, we look at command line args instead
section .rodata
asciiz: db "This is a string with 36 characters.", 0
This is slow and inefficient, but easy to understand.
For efficiency, you'd want
only 1 branch in the loop (Why are loops always compiled into "do...while" style (tail jump)?)
avoid a false dependency by loading with movzx instead of merging into the previous RAX value (Why doesn't GCC use partial registers?).
subtract pointers after the loop instead of incrementing a counter inside.
And of course SSE2 is always available in x86-64, so we should use that to check in chunks of 16 bytes (after reaching an alignment boundary). See optimized hand-written strlen implementations like in glibc. (https://code.woboq.org/userspace/glibc/sysdeps/x86_64/strlen.S.html).
Here how I would have coded it
len:
push ebx
mov eax, ebx
lp:
cmp byte [eax], 0
jz lpend
inc eax
jmp lp
lpend:
sub eax, ebx
pop ebx
ret
(The result is in eax). Likely there are better ways.

Resources