Finding null pointer after environment variables - linux

I'm reading a book(Assembly Language Step by Step, Programming with Linux by Jeff Duntemann) and I'm trying to change this program that show's arguments to instead show the environment variables. I'm trying to only use what was taught thus far(no C) and I've gotten the program to print environment variables but only after I counted how many I had and used an immediate, obviously not satisfying. Here's what I have:
global _start ; Linker needs this to find the entry point!
_start:
nop ; This no-op keeps gdb happy...
mov ebp,esp ; Save the initial stack pointer in EBP
; Copy the command line argument count from the stack and validate it:
cmp dword [ebp],MAXARGS ; See if the arg count exceeds MAXARGS
ja Error ; If so, exit with an error message
; Here we calculate argument lengths and store lengths in table ArgLens:
xor eax,eax ; Searching for 0, so clear AL to 0
xor ebx,ebx ; Stack address offset starts at 0
ScanOne:
mov ecx,0000ffffh ; Limit search to 65535 bytes max
mov edi,dword [ebp+16+ebx*4] ; Put address of string to search in EDI
mov edx,edi ; Copy starting address into EDX
cld ; Set search direction to up-memory
repne scasb ; Search for null (0 char) in string at edi
jnz Error ; REPNE SCASB ended without finding AL
mov byte [edi-1],10 ; Store an EOL where the null used to be
sub edi,edx ; Subtract position of 0 from start address
mov dword [ArgLens+ebx*4],edi ; Put length of arg into table
inc ebx ; Add 1 to argument counter
cmp ebx,44; See if arg counter exceeds argument count
jb ScanOne ; If not, loop back and do another one
; Display all arguments to stdout:
xor esi,esi ; Start (for table addressing reasons) at 0
Showem:
mov ecx,[ebp+16+esi*4] ; Pass offset of the message
mov eax,4 ; Specify sys_write call
mov ebx,1 ; Specify File Descriptor 1: Standard Output
mov edx,[ArgLens+esi*4] ; Pass the length of the message
int 80H ; Make kernel call
inc esi ; Increment the argument counter
cmp esi,44 ; See if we've displayed all the arguments
jb Showem ; If not, loop back and do another
jmp Exit ; We're done! Let's pack it in!
I moved the displacement up past the first null pointer to the first environment variable([ebp+4+ebx*4] > [ebp+16+ebx*4]) in both ScanOne and Showem. When I compare to the number of environment variables I have(44) it will print them just fine without a segfault, comparing to 45 only gives me a segfault.
I've tried using the pointers to compare to zero(in search of null pointer): cmp dword [ebp+16+ebx*4],0h but that just returns a segfault. I'm sure that the null pointer comes after the last environment variable in the stack but it's like it won't do anything up to and beyond that.
Where am I going wrong?

What if your program has 2, 3, or 0 args, would your code still work? Each section is separated by a NULL pointer (4 bytes of 0) You could just get the count of parameters and use that as your array index and skip over the args until you get to the NULL bytes. Now you have your Environment Block:
extern printf, exit
section .data
fmtstr db "%s", 10, 0
fmtint db "%d", 10, 0
global main
section .text
main:
push ebp
mov ebp, esp
mov ebx, [ebp + 4]
.SkipArgs:
mov edi, dword [ebp + 4 * ebx]
inc ebx
test edi, edi
jnz .SkipArgs
.ShowEnvBlock:
mov edi, dword [ebp + 4 * ebx]
test edi, edi
jz .NoMore
push edi
push fmtstr
call printf
add esp, 4 * 2
inc ebx
jmp .ShowEnvBlock
.NoMore:
push 0
call exit
Yes I use printf here, but you just swap that with your system call.

Want to go ahead and apologize, this always happens to me(fix it myself after asking question on stackoverflow). I think when I tried comparing pointer to 0h I typed something wrong. Here's what I did:
inc ebx
cmp dword [ebp+16+ebx*4],0h
jnz ScanOne
and
inc esi
cmp dword [ebp+16+esi*4],0h
jnz Showem
This worked.

Related

FInding length of String in NASM

I'm trying to check the length of a sting given in the program argument as part of a larger program. I need to place the value of the string length in a variable called n, which I've left uninitialized in BSS. I've tried a couple different methods of doing this, including the one Im trying right now which was given in another answer here. However when I try to print or use the result of this little algorithm, it always returns 7. I believe I'm getting the address of what I want and not the result but I cant figure what to change. Heres my code:
%include "asm_io.inc"
SECTION .data
fmt1: db "%d",10,0
argErrMsg: db "Needs 2 args",10,0
argOkMsg : db "Arguments ok",10,0
doneMsg: db "Program finished, now exiting",10,0
strErrMsg: db "String must be between 1 and 20 charaters",10,0
strOkMsg: db "String length ok",10,0
SECTION .bss
X: resd 20
i: resd 1
n: resd 1
k: resd 1
SECTION .text
global asm_main
extern printf
extern strlen
asm_main:
enter 0,0
pusha
CHECK_ARGS:
cmp dword [ebp+8],2
jne ERROR_ARGS
je OK_ARGS
ERROR_ARGS:
push dword argErrMsg
call printf
add esp,8
jmp EXIT
OK_ARGS:
push dword argOkMsg
call printf
add esp,8
jmp CHECK_STRING
CHECK_STRING:
mov eax, dword[ebp+16]
push eax ;This is the code I tried using from another answer
mov ecx,0
dec eax
count:
inc ecx
inc eax
cmp byte[eax],0
jnz count
dec ecx
ret
pop eax
mov [n],ecx ;Tried prining it here to see the result
push dword [n]
push dword fmt1
call printf
add esp,8
cmp byte [n],1
jl ERROR_STRING
cmp byte [n],20
jg ERROR_STRING
jmp OK_STRING ;The program always gets to here since [n] = 7?
ERROR_STRING:
push dword strErrMsg
call printf
add esp,8
jmp EXIT
OK_STRING:
push dword strOkMsg
call printf
add esp,8
jmp EXIT
EXIT:
push dword doneMsg
call printf
popa
leave
ret
To get the length of argv[1]:
mov edi,[ebp+12] ;edi = argv = &argv[0]
mov edi,[edi+4] ;edi = argv[1]
mov ecx,0ffffh ;scan for 0
xor eax,eax
repne scasb
sub ecx,0ffffh ;ecx = -(length + 1)
not ecx ;ecx = length
main should return a 0 in eax:
popa
xor eax,eax
leave
ret

Insertion sort not working, 32bit assembly

I'm trying to implement insertion sort in 32bit assembly in linux using NASM and I get a segmentation fault mid-run (not to mention that for some reason 'printf' prints random garbage values, I'm not totally sure why), Here is the
code:
section .rodata
MSG: DB "welcome to sortMe, please sort me",10,0
S1: DB "%d",10,0 ; 10 = '\n' , 0 = '\0'
section .data
array DD 5,1,7,3,4,9,12,8,10,2,6,11 ; unsorted array
len DB 12
section .text
align 16
global main
extern printf
main:
push MSG ; print welcome message
call printf
add esp,4 ; clean the stack
call printArray ;print the unsorted array
;parameters
;push len
;push array
mov eax, len
mov ebx, array
push eax
push ebx
call myInsertionSort
call printArray ; print the sorted one
mov eax, 1 ;exit system call
int 0x80
printArray:
push ebp ;save old frame pointer
mov ebp,esp ;create new frame on stack
pushad ;save registers
mov eax,0
mov ebx,0
mov edi,0
mov esi,0 ;array index
mov bl, byte [len]
add edi,ebx ; edi = array size
print_loop:
cmp esi,edi
je print_end
push dword [array+esi*4]
push S1
call printf
add esp, 8 ;clean the stack
inc esi
jmp print_loop
print_end:
popa ;restore registers
mov esp,ebp ;clean the stack frame
pop ebp ;return to old stack frame
ret
myInsertionSort:
push ebp
mov ebp, esp
push ebx
push esi
push edi
mov ecx, [ebp+12]
movzx ecx, byte [ecx] ;put len in ecx, our loop variable
mov eax, 0
mov ebx, 0
mov esi, [ebp+8] ; the array
loop loop_1
loop_1:
cmp ecx, 0 ; if we're done
je done_1 ; then done with loop
mov edx, ecx
push ecx ; we save len, because loop command decrements ecx
sub edx, ecx
mov ecx, [esi+4*edx] ;;;;;; ecx now array[i] ? how do I access array[i] in a similar manner?
mov ebx, eax
shr ebx, 2 ; number of times for inner loop
loop_2:
cmp ebx, 0 ; we don't use loop to not affect ecx so we use ebx and compare it manually with 0
jl done_2
cmp [esi+ebx], ecx ;we see if array[ebx] os ecx so we can exit the loop
jle done_2
lea edx, [esi+ebx]
push dword [edx] ; pushing our array[ebx]
add edx, 4
pop dword [edx] ; popping the last one
dec ebx ; decrementing the loop iterator
jmp loop_2 ; looping again
done_2:
mov [esi+ebx+1], ecx
inc eax ; incrementing iterator
pop ecx ; len of array to compare now to eax and see if we're done
jmp loop_1
done_1:
pop edi
pop esi
pop ebx
pop ebp ; we pop them in opposite to how we pushed
ret
About the printf thing, I'm positive that I should push the parameters the opposite way (first S1 and then the integer so it'd be from left to right as we'd call it in C), and if I do switch them, nothing is printed at all while I'm getting a segmentation fault. I don't know what to do, it prints these as output:
welcome to sortMe, please sort me
5
16777216
65536
256
1
117440512
458752
1792
7
50331648
196608
768
mov ecx, [ebp+12] ;put len in ecx, our loop variab
This only moves the address of LEN into ECX not its value! You need to add movzx ecx, byte [ecx]
You also need to define LEN=48
loop loop_1
What's this bizare use of LOOP doing here?
You are mixing bytes and dwords on multiple occasions. You need to rework the code. p.e.
dec ebx ; ebx is now number of times we should go through inner loop
should become
shr ebx,2
This is not correct because you need the address and not the value. Change MOV into LEA.
jle done_2
mov edx, [esi+ebx]
Perhaps you can post your reworked code as an EDIT within your Original question.
Your edited code does not address ALL the problems signaled by user3144770!
The parameters to printf are correct but here are some additional problems with your printArray routine.
Since ESI is an index in an array of dwords you need to scale it up!
push dword [array+esi*4]
Are you sure pusha will save 32 bits ? Perhaps you'd better use pushad
ps Should you decide to rework your code and post the edit then please add the reworked code after the last line of the existing post. This way the original question will continue making sense to people viewing it the first time!

Tower of Hanoi in assembly x86 using arrays

Hi every one I am trying to do a tower of Hanoi in assembly x86 but I am trying to use arrays. So this code gets a number from user as a parameter in Linux, then error checks a bunch of stuff. So now I just want to make the algorithm which use the three arrays i made (start, end, temp) and output them step by step. If someone can help it would be greatly appreciated. `
%include "asm_io.inc"
segment .data ; where all the predefined variables are stored
aofr: db "Argument out of Range", 0 ; define aofr as a String "Argument out of Range"
ia: db "Incorrect Argument", 0 ; define ia as a String "Incorrect Argument"
tma: db "Too many Arguments", 0 ; define tma as a String "Too many Arguments"
hantowNumber dd 0 ; define hantowNumber this is what the size of the tower will be stored in
start: dd 0,0,0,0,0,0,0,0,9 ; this array is where all the rings start at
end: dd 0,0,0,0,0,0,0,0,9 ; this array is where all the rings end up at
temp: dd 0,0,0,0,0,0,0,0,9 ; this array is used to help move the rings
test: dd 0,0,0,0,0,0,0,0,9
; The next couple lines define the strings to show the pegs and what they look like
towerName: db " Tower 1 Tower 2 Tower 3 ", 10, 0
lastLineRow: db "XXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXX ", 10, 0
buffer: db " ", 0
fmt:db "%d",10,0
segment .bss ; where all the input variables are stored
segment .text
global asm_main ; run the main function
extern printf
asm_main:
enter 0,0 ; setup routine
pusha
mov edx, dword 0 ; set edx to zero this is where the hantowNumber is saved for now
mov ecx, dword[ebp+8] ; ecx has how many arguments are given
mov eax, dword[ebp+12] ; save the first argument in eax
add eax, 4 ; move the pointer to the main argument
mov ebx, dword[eax] ; save the number into ebx
push ebx ; reserve ebx
push ecx ; reserve ecx
cmp ecx, dword 2 ; compare if there are more the one argument given
jg TmA ; if more then one argument is given then jump Too many Argument (TmA)
mov ecx, 0 ; ecx = 0
movzx eax, byte[ebx+ecx] ; eax is the first character number from the inputed number
sub eax, 48 ; subtract 48 to get the actual number/letter/symbol
cmp eax, 10 ; check if eax is less then 10
jg IA ; if eax is greater then 10 then it is a letter or symbol
string_To_int: ; change String to int procedure
add edx, eax ; put the number in edx
inc ecx ; increase counter (ecx)
movzx eax, byte[ebx+ecx] ; move the next number in eax
cmp eax, 0 ; if eax = 0 then there are no more numbers
mov [hantowNumber], edx ; change hantowNumber to what ever is in edx
je rangeCheck ; go to rangeCheck to check if between 2-8
sub eax, 48 ; subtract 48 to get the actual number/letter/symbol
cmp eax, 10 ; check if eax is less then 10
jg IA ; if eax is greater then 10 then it is a letter or symbol
imul edx, 10 ; multiply edx by 10 so the next number can be added to the end
jmp string_To_int ; jump back to string_To_int if not done
rangeCheck: ; check the range of the number
cmp edx, dword 2 ; compare edx with 2
jl AofR ; if hantowNumber (edx) < 2 then go to Argument out of Range (AofR)
cmp edx, dword 8 ; compare edx with 8
jg AofR ; if hantowNumber (edx) > 8 then go to Argument out of Range (AofR)
mov ecx, [hantowNumber] ; move the number enterd by user in ecx
mov esi, 28 ; esi == 28 for stack pointer counter
setStart: ; set the first array the starting peg
mov [start+esi], ecx ; put ecx into the array
sub esi, 4 ; take one away from stack pointer conter
dec ecx ; take one away from ecx so the next number can go in to the array
cmp ecx, 0 ; compare ecx with 0
jne setStart ; if ecx != 0 then go to setStart loop
; This is the section where the algoritham should go for tower of hanoi
mov ecx, [hantowNumber]
towerAlgorithm:
cmp ecx, 0
jg Exit ; jump to Exit at the end of the program
dec ecx
IA:
mov eax, ia ; put the string in eax
push eax ; reserve eax
call print_string ; output the string that is in eax
call print_nl ; print a new line after the output
pop eax ; put eax back to normal
add esp, 4 ; takes 4 from stack
jmp Exit ; jump to Exit at the end of the program
AofR:
mov eax, aofr ; put the string in eax
push eax ; reserve eax
call print_string ; output the string that is in eax
call print_nl ; print a new line after the output
pop eax ; put eax back to normal
add esp, 4 ; takes 4 from stack
jmp Exit ; jump to Exit at the end of the program
TmA:
mov eax, tma ; put the string in eax
push eax ; reserve eax
call print_string ; output the string that is in eax
call print_nl ; print a new line after the output
pop eax ; put eax back to normal
add esp, 4 ; takes 4 from stack
jmp Exit ; jump to Exit at the end of the program
Exit: ; ends the program when it jumps to here
add esp, 9 ; takes 8 from stack
popa
mov eax, 0 ; return back to C
leave
ret
haha I'm doing the exact same assignment and stuck on the algorithm however though when running your code it seems to identify "too many arguments" even though only one argument is provided, consider this algorithm when dealing with arguments(don't forget ./ is considered the "first argument" since it is the zeroth argument provided):
enter 0,0
pusha
; address of 1st argument is on stack at address ebp+12
; address of 2nd arg = address of 1st arg + 4
mov eax, dword [ebp+12] ;eax = address of 1st arg
add eax, 4 ;eax = address of 2nd arg
mov ebx, dword [eax] ;ebx = 2nd arg, it is pointer to string
mov eax, 0 ;clear the register
mov al, [ebx] ;it moves only 1 byte
sub eax, '0' ;now eax contains the numeric value of the firstcharacter of string

How to print a number in assembly NASM?

Suppose that I have an integer number in a register, how can I print it? Can you show a simple example code?
I already know how to print a string such as "hello, world".
I'm developing on Linux.
If you're already on Linux, there's no need to do the conversion yourself. Just use printf instead:
;
; assemble and link with:
; nasm -f elf printf-test.asm && gcc -m32 -o printf-test printf-test.o
;
section .text
global main
extern printf
main:
mov eax, 0xDEADBEEF
push eax
push message
call printf
add esp, 8
ret
message db "Register = %08X", 10, 0
Note that printf uses the cdecl calling convention so we need to restore the stack pointer afterwards, i.e. add 4 bytes per parameter passed to the function.
You have to convert it in a string; if you're talking about hex numbers it's pretty easy. Any number can be represented this way:
0xa31f = 0xf * 16^0 + 0x1 * 16^1 + 3 * 16^2 + 0xa * 16^3
So when you have this number you have to split it like I've shown then convert every "section" to its ASCII equivalent.
Getting the four parts is easily done with some bit magic, in particular with a right shift to move the part we're interested in in the first four bits then AND the result with 0xf to isolate it from the rest. Here's what I mean (soppose we want to take the 3):
0xa31f -> shift right by 8 = 0x00a3 -> AND with 0xf = 0x0003
Now that we have a single number we have to convert it into its ASCII value. If the number is smaller or equal than 9 we can just add 0's ASCII value (0x30), if it's greater than 9 we have to use a's ASCII value (0x61).
Here it is, now we just have to code it:
mov si, ??? ; si points to the target buffer
mov ax, 0a31fh ; ax contains the number we want to convert
mov bx, ax ; store a copy in bx
xor dx, dx ; dx will contain the result
mov cx, 3 ; cx's our counter
convert_loop:
mov ax, bx ; load the number into ax
and ax, 0fh ; we want the first 4 bits
cmp ax, 9h ; check what we should add
ja greater_than_9
add ax, 30h ; 0x30 ('0')
jmp converted
greater_than_9:
add ax, 61h ; or 0x61 ('a')
converted:
xchg al, ah ; put a null terminator after it
mov [si], ax ; (will be overwritten unless this
inc si ; is the last one)
shr bx, 4 ; get the next part
dec cx ; one less to do
jnz convert_loop
sub di, 4 ; di still points to the target buffer
PS: I know this is 16 bit code but I still use the old TASM :P
PPS: this is Intel syntax, converting to AT&T syntax isn't difficult though, look here.
Linux x86-64 with printf
main.asm
default rel ; make [rel format] the default, you always want this.
extern printf, exit ; NASM requires declarations of external symbols, unlike GAS
section .rodata
format db "%#x", 10, 0 ; C 0-terminated string: "%#x\n"
section .text
global main
main:
sub rsp, 8 ; re-align the stack to 16 before calling another function
; Call printf.
mov esi, 0x12345678 ; "%x" takes a 32-bit unsigned int
lea rdi, [rel format]
xor eax, eax ; AL=0 no FP args in XMM regs
call printf
; Return from main.
xor eax, eax
add rsp, 8
ret
GitHub upstream.
Then:
nasm -f elf64 -o main.o main.asm
gcc -no-pie -o main.out main.o
./main.out
Output:
0x12345678
Notes:
sub rsp, 8: How to write assembly language hello world program for 64 bit Mac OS X using printf?
xor eax, eax: Why is %eax zeroed before a call to printf?
-no-pie: plain call printf doesn't work in a PIE executable (-pie), the linker only automatically generates a PLT stub for old-style executables. Your options are:
call printf wrt ..plt to call through the PLT like traditional call printf
call [rel printf wrt ..got] to not use a PLT at all, like gcc -fno-plt.
Like GAS syntax call *printf#GOTPCREL(%rip).
Either of these are fine in a non-PIE executable as well, and don't cause any inefficiency unless you're statically linking libc. In which case call printf can resolve to a call rel32 directly to libc, because the offset from your code to the libc function would be known at static linking time.
See also: Can't call C standard library function on 64-bit Linux from assembly (yasm) code
If you want hex without the C library: Printing Hexadecimal Digits with Assembly
Tested on Ubuntu 18.10, NASM 2.13.03.
It depends on the architecture/environment you are using.
For instance, if I want to display a number on linux, the ASM code will be different from the one I would use on windows.
Edit:
You can refer to THIS for an example of conversion.
I'm relatively new to assembly, and this obviously is not the best solution,
but it's working. The main function is _iprint, it first checks whether the
number in eax is negative, and prints a minus sign if so, than it proceeds
by printing the individual numbers by calling the function _dprint for
every digit. The idea is the following, if we have 512 than it is equal to: 512 = (5 * 10 + 1) * 10 + 2 = Q * 10 + R, so we can found the last digit of a number by dividing it by 10, and
getting the reminder R, but if we do it in a loop than digits will be in a
reverse order, so we use the stack for pushing them, and after that when
writing them to stdout they are popped out in right order.
; Build : nasm -f elf -o baz.o baz.asm
; ld -m elf_i386 -o baz baz.o
section .bss
c: resb 1 ; character buffer
section .data
section .text
; writes an ascii character from eax to stdout
_cprint:
pushad ; push registers
mov [c], eax ; store ascii value at c
mov eax, 0x04 ; sys_write
mov ebx, 1 ; stdout
mov ecx, c ; copy c to ecx
mov edx, 1 ; one character
int 0x80 ; syscall
popad ; pop registers
ret ; bye
; writes a digit stored in eax to stdout
_dprint:
pushad ; push registers
add eax, '0' ; get digit's ascii code
mov [c], eax ; store it at c
mov eax, 0x04 ; sys_write
mov ebx, 1 ; stdout
mov ecx, c ; pass the address of c to ecx
mov edx, 1 ; one character
int 0x80 ; syscall
popad ; pop registers
ret ; bye
; now lets try to write a function which will write an integer
; number stored in eax in decimal at stdout
_iprint:
pushad ; push registers
cmp eax, 0 ; check if eax is negative
jge Pos ; if not proceed in the usual manner
push eax ; store eax
mov eax, '-' ; print minus sign
call _cprint ; call character printing function
pop eax ; restore eax
neg eax ; make eax positive
Pos:
mov ebx, 10 ; base
mov ecx, 1 ; number of digits counter
Cycle1:
mov edx, 0 ; set edx to zero before dividing otherwise the
; program gives an error: SIGFPE arithmetic exception
div ebx ; divide eax with ebx now eax holds the
; quotent and edx the reminder
push edx ; digits we have to write are in reverse order
cmp eax, 0 ; exit loop condition
jz EndLoop1 ; we are done
inc ecx ; increment number of digits counter
jmp Cycle1 ; loop back
EndLoop1:
; write the integer digits by poping them out from the stack
Cycle2:
pop eax ; pop up the digits we have stored
call _dprint ; and print them to stdout
dec ecx ; decrement number of digits counter
jz EndLoop2 ; if it's zero we are done
jmp Cycle2 ; loop back
EndLoop2:
popad ; pop registers
ret ; bye
global _start
_start:
nop ; gdb break point
mov eax, -345 ;
call _iprint ;
mov eax, 0x01 ; sys_exit
mov ebx, 0 ; error code
int 0x80 ; край
Because you didn't say about number representation I wrote the following code for unsigned number with any base(of course not too big), so you could use it:
BITS 32
global _start
section .text
_start:
mov eax, 762002099 ; unsigned number to print
mov ebx, 36 ; base to represent the number, do not set it too big
call print
;exit
mov eax, 1
xor ebx, ebx
int 0x80
print:
mov ecx, esp
sub esp, 36 ; reserve space for the number string, for base-2 it takes 33 bytes with new line, aligned by 4 bytes it takes 36 bytes.
mov edi, 1
dec ecx
mov [ecx], byte 10
print_loop:
xor edx, edx
div ebx
cmp dl, 9 ; if reminder>9 go to use_letter
jg use_letter
add dl, '0'
jmp after_use_letter
use_letter:
add dl, 'W' ; letters from 'a' to ... in ascii code
after_use_letter:
dec ecx
inc edi
mov [ecx],dl
test eax, eax
jnz print_loop
; system call to print, ecx is a pointer on the string
mov eax, 4 ; system call number (sys_write)
mov ebx, 1 ; file descriptor (stdout)
mov edx, edi ; length of the string
int 0x80
add esp, 36 ; release space for the number string
ret
It's not optimised for numbers with base of power of two and doesn't use printf from libc.
The function print outputs the number with a new line. The number string is formed on stack. Compile by nasm.
Output:
clockz
https://github.com/tigertv/stackoverflow-answers/tree/master/8194141-how-to-print-a-number-in-assembly-nasm

How would I find the length of a string using NASM?

I'm trying to make a program using NASM that takes input from command line arguments. Since string length is not provided, I'm trying to make a function to compute my own. Here is my attempt, which takes a pointer to a string in the ebx register, and returns the length of the string in ecx:
len:
push ebx
mov ecx,0
dec ebx
count:
inc ecx
inc ebx
cmp ebx,0
jnz count
dec ecx
pop ebx
ret
My method is to go through the string, character by character, and check if it's null. If it's not, I increment ecx and go to the next character. I believe the problem is that cmp ebx,0 is incorrect for what I'm trying to do. How would I properly go about checking whether the character is null? Also, are there other things that I could be doing better?
You are comparing the value in ebx with 0 which is not what you want. The value in ebx is the address of a character in memory so it should be dereferenced like this:
cmp byte[ebx], 0
Also, the last push ebx should be pop ebx.
Here is how I do it in a 64-bit Linux executable that checks argv[1]. The kernel starts a new process with argc and argv[] on the stack, as documented in the x86-64 System V ABI.
_start:
pop rsi ; number of arguments (argc)
pop rsi ; argv[0] the command itself (or program name)
pop rsi ; rsi = argv[1], a pointer to a string
mov ecx, 0 ; counter
.repeat:
lodsb ; byte in AL
test al,al ; check if zero
jz .done ; if zero then we're done
inc ecx ; increment counter
jmp .repeat ; repeat until zero
.done:
; string is unchanged, ecx contains the length of the string
; unused, we look at command line args instead
section .rodata
asciiz: db "This is a string with 36 characters.", 0
This is slow and inefficient, but easy to understand.
For efficiency, you'd want
only 1 branch in the loop (Why are loops always compiled into "do...while" style (tail jump)?)
avoid a false dependency by loading with movzx instead of merging into the previous RAX value (Why doesn't GCC use partial registers?).
subtract pointers after the loop instead of incrementing a counter inside.
And of course SSE2 is always available in x86-64, so we should use that to check in chunks of 16 bytes (after reaching an alignment boundary). See optimized hand-written strlen implementations like in glibc. (https://code.woboq.org/userspace/glibc/sysdeps/x86_64/strlen.S.html).
Here how I would have coded it
len:
push ebx
mov eax, ebx
lp:
cmp byte [eax], 0
jz lpend
inc eax
jmp lp
lpend:
sub eax, ebx
pop ebx
ret
(The result is in eax). Likely there are better ways.

Resources