How to split / truncate a string variable value in Assembly?

How to split / truncate a string variable value in Assembly? - string

I am currently working on a project and for storage's sake I would like to cut off a variable in assembly, and (optionally) make that the value of a register, such as eax.
I will need code that works with NASM using Intel syntax.
For example, if the variable "msg" is set to "29ak49", I want to take a part of that, like "9ak4", and put it in a register, or something similar.

As Peter Cordes mentioned in the comments, you can always add a null terminator (0) into the existing string's buffer to right truncate; if you don't mind modifying the original string data.
The example below will retrieve a substring without modifying the original string.
If you have the address of a variable, and you know where you want to truncate it, you can take the address of the starting position of the data, and add an offset to left truncate. To right truncate you can just read as many characters as you need from the new offset.
For example in x86:
msg db '29ak49' ; create a string (1 byte per char)
;; left truncate
mov esi, msg ; get the address of the start of the string
add esi, OFFSET_INTO_DATA ; offset into the string (1 byte per char)
;; right truncate
mov edi, NUM_CHARS ; number of characters to take
.loop:
movzx eax, byte [esi] ; get the value of the next character
;; do something with the character in eax
inc esi
dec edi
jnz .loop
;; end loop
EDIT:
The following is a runable test implementation as a 32-bit Linux application that prints out the substring selected based on OFFSET_INTO_DATA and NUM_CHARS (note: the algorithm is the same, but the registers have changed):
section .text
global _start
_start:
;; left truncate
mov esi, msg ; get the address of the start of the string
add esi, OFFSET_INTO_DATA ; offset into the string (1 byte per char)
;; right truncate
mov edi, NUM_CHARS ; number of characters to take
.loop:
mov ecx, esi ; get the address of the next character
call print_char_32
inc esi
dec edi
jnz .loop
jmp halt
;;; input: ecx -> character to display
print_char_32:
mov edx, 1 ; PRINT
mov ebx, 1 ;
mov eax, 4 ;
int 0x80 ;
ret
halt:
mov eax, 1 ; EXIT
int 0x80 ;
jmp halt
section .data
msg db '29ak49' ; create a string (1 byte per char)
OFFSET_INTO_DATA EQU 1
NUM_CHARS EQU 3
Compiled with:
nasm -f elf substring.asm ; ld -m elf_i386 -s -o substring substring.o

Related

segmentation fault with assembler code [duplicate]

I am doing a proj. in 64-bit NASM. I have to convert decimal to binary and binary to decimal.
I keep getting segmentation fault after debugging when i call printf.
extern printf
section .bss
decsave: resd 2 ; stores dec->bin conversion
binsave: resd 1
section .data ; preset constants, writeable
dec1: db '1','2','4','.','3','7','5',0
bin1: dq 01010110110101B ; 10101101.10101 note where binary point should be
ten: dq 10
debug: db "debug 124 is %ld", 10, 0
section .text ; instructions, code segment
global main ; for gcc standard linking
main: ; label
push rbp ; save rbp
;parse and convert integer portion of dec->bin
mov rax,0 ; accumulate value here
mov al,[dec1] ; get first ASCII digit
sub al,48 ; convert ASCII digit to binary
mov rbx,0 ; clear register (upper part)
mov bl,[dec1+1] ; get next ASCII digit
sub rbx,48 ; convert ASCII digit to binary
imul rax,10 ; ignore rdx
add rax,rbx ; increment accumulator
mov rbx,0
mov bl,[dec1+2]
sub rbx,48
imul rax,10
add rax,rbx
mov [decsave],rax ; save decimal portion
mov rdi, debug
mov rsi, [decsave]
mov rax,0
call printf
; return using c-style pops to return stack to correct position
; and registers to correct content
pop rbp
mov rax,0
ret ; return
; print the bits in decsave:
section .bss
abits: resb 17 ; 16 characters & zero terminator
section .data
fmts: db "%s",0
section .text
; shift decimal portion into abits as ascii
mov rax,[decsave] ; restore rax to dec. portion
mov rcx,8 ; for printing 1st 8 bits
loop3: mov rdx,0 ; clear rdx ready for a bit
shld rdx,rax,1 ; top bit of rax into rdx
add rdx,48 ; make it ASCII
mov [abits+rcx-1],dl ; store character
ror rax,1 ; next bit into top of rax
loop loop3 ; decrement rcx, jump non zero
mov byte [abits+7],'.' ; end of dec. portion string
mov byte [abits+8],0 ; end of "C" string
push qword abits ; string to print
push qword fmts ; "%s"
call printf
add rsp,8
mov rax,[decsave+16] ; increment to fractional portion
mov rcx,16 ; for printing 3 bits as required in the directions
loop4: mov rdx,0 ; clear rdx ready for a bit
shld rdx,rax,1 ; top bit of rax into rdx
add rdx,48 ; make it ASCII
mov [abits+rcx-1],dl ; store character
ror rax,1 ; next bit into top of rax
loop loop4 ; decrement rcx, jump non zero
mov byte [abits+3],10 ; end of "C" string at 3 places
mov byte [abits+4],0 ; end of "C" string
push qword abits ; string to print
push qword fmts ; "%s"
call printf
add rsp,8
Is there a any other way to get around it?
Thank you.

As Jester pointed out, if the vararg function is not using sse, then al must be zero. There is a bigger issue here:
With the x86-64 calling convention, parameters are not passed on the stack as they are for 32bit, but instead passed through registers. Which registers all depend on what OS your program is written for.
x86 calling conventions

Move to end of string within buffer - Assembly Languge

I am trying to take in a string and then see if the last value in the string is an EOL character. I figured I would use the length of the string read in and then add it to the address of the buffer to find the last element. This does not seem to work.
Edit: I apologize that I did not include more information. Variables are defined as such:
%define BUFLEN 256
SECTION .bss ; uninitialized data section
buf: resb BUFLEN ; buffer for read
newstr: resb BUFLEN ; converted string
rlen: resb 4
Then a dos interrupt is called to accept a string from the user like so:
; read user input
;
mov eax, SYSCALL_READ ; read function
mov ebx, STDIN ; Arg 1: file descriptor
mov ecx, buf ; Arg 2: address of buffer
mov edx, BUFLEN ; Arg 3: buffer length
int 080h
Then we go into our loop:
test_endl:
mov ecx, [rlen]
mov esi, buf
add esi, ecx ; i want to move 'rlen' bytes into buf
mov al, [esi] ; to see what the last element is
cmp al, 10 ; compare it to EOL
jnz L1_init
dec ecx ; and then decrease 'rlen' if it is an EOL
mov [rlen], ecx\
I am user NASM to compile and writing for an i386 machine.

Adding the length of the string to the address of the buffer gives access to the byte behind the string.
Based on you saying that
you want to see if the last value in the string is an EOL character
you aim to decrease 'rlen' if it is an EOL (*)
I conclude that you consider the possible EOL character part of the string as defined by its length rlen. If you don't then (*) doesn't make sense.
Use mov al,[esi-1] to see what the last element is!
test_endl:
mov ecx, [rlen]
mov esi, buf
add esi, ecx ; i want to move 'rlen' bytes into buf
mov al, [esi-1] ; to see what the last element is
cmp al, 10 ; compare it to EOL
jnz L1_init
dec ecx ; and then decrease 'rlen' if it is an EOL
mov [rlen], ecx

This is a much more roundabout way (literally) of getting to the end of the string. I loop through all the characters in the string based on what the size of the counter, rlen, is. Then, once the loop is complete, I make the comparison and decrement rlen as necessary.
test_loop:
mov al, [esi] ; get a character
inc esi ; update source pointer
dec ecx ; update char count
jnz test_loop ; loop to top if more chars
cmp al, 10 ; comparison
jne L1_init ; if not EOL jump to L1_init
mov ecx, [rlen] ; decrease the size of rlen if necessary
dec ecx
mov [rlen], ecx

Copy string from BSS variable to BSS variable in Assembly

Let's suppose I have to string stored in variables created in the .BSS section.
var1 resw 5 ; this is "abcde" (UNICODE)
var2 resw 5 ; here I will copy the first one
How would I do this with NASM?
I tried something like this:
mov ebx, var2 ; Here we will copy the string
mov dx, 5 ; Length of the string
mov esi, dword var1 ; The variable to be copied
.Copy:
lodsw
mov [ebx], word ax ; Copy the character into the address from EBX
inc ebx ; Increment the EBX register for the next character to copy
dec dx ; Decrement DX
cmp dx, 0 ; If DX is 0 we reached the end
jg .Copy ; Otherwise copy the next one
So, first problem is that the string is not copied as UNICODE but as ASCII and I don't know why. Secondly, I know there might be some not recommended use of some registers. And lastly, I wonder if there is some quicker way of doing this (maybe there are instructions specially created for this kind of operations with strings). I'm talking about 8086 processors.

inc ebx ; Increment the EBX register for the next character to copy
A word is 2 bytes, but you're only stepping ebx 1 byte ahead. Replace inc ebx with add ebx,2.

Michael already answered about the obvious problem of the demonstrated code.
But there is also another layer of understanding. It is not important how you will copy the string from one buffer to another - by bytes, words or double words. It will always create exact copy of the string.
So, how to copy the string is a matter of optimization. Using rep movsd is the fastest known way.
Here is one example:
; ecx contains the length of the string in bytes
; esi - the address of the source, aligned on dword
; edi - the address of the destination aligned on dword
push ecx
shr ecx, 2
rep movsd
pop ecx
and ecx, 3
rep movsb

How to print a number in assembly NASM?

Suppose that I have an integer number in a register, how can I print it? Can you show a simple example code?
I already know how to print a string such as "hello, world".
I'm developing on Linux.

If you're already on Linux, there's no need to do the conversion yourself. Just use printf instead:
;
; assemble and link with:
; nasm -f elf printf-test.asm && gcc -m32 -o printf-test printf-test.o
;
section .text
global main
extern printf
main:
mov eax, 0xDEADBEEF
push eax
push message
call printf
add esp, 8
ret
message db "Register = %08X", 10, 0
Note that printf uses the cdecl calling convention so we need to restore the stack pointer afterwards, i.e. add 4 bytes per parameter passed to the function.

You have to convert it in a string; if you're talking about hex numbers it's pretty easy. Any number can be represented this way:
0xa31f = 0xf * 16^0 + 0x1 * 16^1 + 3 * 16^2 + 0xa * 16^3
So when you have this number you have to split it like I've shown then convert every "section" to its ASCII equivalent.
Getting the four parts is easily done with some bit magic, in particular with a right shift to move the part we're interested in in the first four bits then AND the result with 0xf to isolate it from the rest. Here's what I mean (soppose we want to take the 3):
0xa31f -> shift right by 8 = 0x00a3 -> AND with 0xf = 0x0003
Now that we have a single number we have to convert it into its ASCII value. If the number is smaller or equal than 9 we can just add 0's ASCII value (0x30), if it's greater than 9 we have to use a's ASCII value (0x61).
Here it is, now we just have to code it:
mov si, ??? ; si points to the target buffer
mov ax, 0a31fh ; ax contains the number we want to convert
mov bx, ax ; store a copy in bx
xor dx, dx ; dx will contain the result
mov cx, 3 ; cx's our counter
convert_loop:
mov ax, bx ; load the number into ax
and ax, 0fh ; we want the first 4 bits
cmp ax, 9h ; check what we should add
ja greater_than_9
add ax, 30h ; 0x30 ('0')
jmp converted
greater_than_9:
add ax, 61h ; or 0x61 ('a')
converted:
xchg al, ah ; put a null terminator after it
mov [si], ax ; (will be overwritten unless this
inc si ; is the last one)
shr bx, 4 ; get the next part
dec cx ; one less to do
jnz convert_loop
sub di, 4 ; di still points to the target buffer
PS: I know this is 16 bit code but I still use the old TASM :P
PPS: this is Intel syntax, converting to AT&T syntax isn't difficult though, look here.

Linux x86-64 with printf
main.asm
default rel ; make [rel format] the default, you always want this.
extern printf, exit ; NASM requires declarations of external symbols, unlike GAS
section .rodata
format db "%#x", 10, 0 ; C 0-terminated string: "%#x\n"
section .text
global main
main:
sub rsp, 8 ; re-align the stack to 16 before calling another function
; Call printf.
mov esi, 0x12345678 ; "%x" takes a 32-bit unsigned int
lea rdi, [rel format]
xor eax, eax ; AL=0 no FP args in XMM regs
call printf
; Return from main.
xor eax, eax
add rsp, 8
ret
GitHub upstream.
Then:
nasm -f elf64 -o main.o main.asm
gcc -no-pie -o main.out main.o
./main.out
Output:
0x12345678
Notes:
sub rsp, 8: How to write assembly language hello world program for 64 bit Mac OS X using printf?
xor eax, eax: Why is %eax zeroed before a call to printf?
-no-pie: plain call printf doesn't work in a PIE executable (-pie), the linker only automatically generates a PLT stub for old-style executables. Your options are:
call printf wrt ..plt to call through the PLT like traditional call printf
call [rel printf wrt ..got] to not use a PLT at all, like gcc -fno-plt.
Like GAS syntax call *printf#GOTPCREL(%rip).
Either of these are fine in a non-PIE executable as well, and don't cause any inefficiency unless you're statically linking libc. In which case call printf can resolve to a call rel32 directly to libc, because the offset from your code to the libc function would be known at static linking time.
See also: Can't call C standard library function on 64-bit Linux from assembly (yasm) code
If you want hex without the C library: Printing Hexadecimal Digits with Assembly
Tested on Ubuntu 18.10, NASM 2.13.03.

It depends on the architecture/environment you are using.
For instance, if I want to display a number on linux, the ASM code will be different from the one I would use on windows.
Edit:
You can refer to THIS for an example of conversion.

I'm relatively new to assembly, and this obviously is not the best solution,
but it's working. The main function is _iprint, it first checks whether the
number in eax is negative, and prints a minus sign if so, than it proceeds
by printing the individual numbers by calling the function _dprint for
every digit. The idea is the following, if we have 512 than it is equal to: 512 = (5 * 10 + 1) * 10 + 2 = Q * 10 + R, so we can found the last digit of a number by dividing it by 10, and
getting the reminder R, but if we do it in a loop than digits will be in a
reverse order, so we use the stack for pushing them, and after that when
writing them to stdout they are popped out in right order.
; Build : nasm -f elf -o baz.o baz.asm
; ld -m elf_i386 -o baz baz.o
section .bss
c: resb 1 ; character buffer
section .data
section .text
; writes an ascii character from eax to stdout
_cprint:
pushad ; push registers
mov [c], eax ; store ascii value at c
mov eax, 0x04 ; sys_write
mov ebx, 1 ; stdout
mov ecx, c ; copy c to ecx
mov edx, 1 ; one character
int 0x80 ; syscall
popad ; pop registers
ret ; bye
; writes a digit stored in eax to stdout
_dprint:
pushad ; push registers
add eax, '0' ; get digit's ascii code
mov [c], eax ; store it at c
mov eax, 0x04 ; sys_write
mov ebx, 1 ; stdout
mov ecx, c ; pass the address of c to ecx
mov edx, 1 ; one character
int 0x80 ; syscall
popad ; pop registers
ret ; bye
; now lets try to write a function which will write an integer
; number stored in eax in decimal at stdout
_iprint:
pushad ; push registers
cmp eax, 0 ; check if eax is negative
jge Pos ; if not proceed in the usual manner
push eax ; store eax
mov eax, '-' ; print minus sign
call _cprint ; call character printing function
pop eax ; restore eax
neg eax ; make eax positive
Pos:
mov ebx, 10 ; base
mov ecx, 1 ; number of digits counter
Cycle1:
mov edx, 0 ; set edx to zero before dividing otherwise the
; program gives an error: SIGFPE arithmetic exception
div ebx ; divide eax with ebx now eax holds the
; quotent and edx the reminder
push edx ; digits we have to write are in reverse order
cmp eax, 0 ; exit loop condition
jz EndLoop1 ; we are done
inc ecx ; increment number of digits counter
jmp Cycle1 ; loop back
EndLoop1:
; write the integer digits by poping them out from the stack
Cycle2:
pop eax ; pop up the digits we have stored
call _dprint ; and print them to stdout
dec ecx ; decrement number of digits counter
jz EndLoop2 ; if it's zero we are done
jmp Cycle2 ; loop back
EndLoop2:
popad ; pop registers
ret ; bye
global _start
_start:
nop ; gdb break point
mov eax, -345 ;
call _iprint ;
mov eax, 0x01 ; sys_exit
mov ebx, 0 ; error code
int 0x80 ; край

Because you didn't say about number representation I wrote the following code for unsigned number with any base(of course not too big), so you could use it:
BITS 32
global _start
section .text
_start:
mov eax, 762002099 ; unsigned number to print
mov ebx, 36 ; base to represent the number, do not set it too big
call print
;exit
mov eax, 1
xor ebx, ebx
int 0x80
print:
mov ecx, esp
sub esp, 36 ; reserve space for the number string, for base-2 it takes 33 bytes with new line, aligned by 4 bytes it takes 36 bytes.
mov edi, 1
dec ecx
mov [ecx], byte 10
print_loop:
xor edx, edx
div ebx
cmp dl, 9 ; if reminder>9 go to use_letter
jg use_letter
add dl, '0'
jmp after_use_letter
use_letter:
add dl, 'W' ; letters from 'a' to ... in ascii code
after_use_letter:
dec ecx
inc edi
mov [ecx],dl
test eax, eax
jnz print_loop
; system call to print, ecx is a pointer on the string
mov eax, 4 ; system call number (sys_write)
mov ebx, 1 ; file descriptor (stdout)
mov edx, edi ; length of the string
int 0x80
add esp, 36 ; release space for the number string
ret
It's not optimised for numbers with base of power of two and doesn't use printf from libc.
The function print outputs the number with a new line. The number string is formed on stack. Compile by nasm.
Output:
clockz
https://github.com/tigertv/stackoverflow-answers/tree/master/8194141-how-to-print-a-number-in-assembly-nasm

How do I print out the content of a register in Hex

I'm currently getting started with NASM and wanted to know, how to output the contents of a register with NASM in Hexadecimal.
I can output the content of eax with
section .bss
reg_buf: resb 4
.
.
.
print_register:
mov [reg_buf], eax
mov eax, SYS_WRITE
mov ebx, SYS_OUT
mov ecx, reg_buf
mov edx, 4
int 80h
ret
Let's say eax contains 0x44444444 then the output would be "DDDD". Apparently each pair of "44" is interpreted as 'D'. My ASCII table approves this.
But how do I get my program to output the actual register content (0x44444444)?

This is how i was taught to do it..
.
.
SECTION .data
numbers: db "0123456789ABCDEF" ;; have initialized string of all the digits in base 16
.
.
.
;;binary to hex
mov al , byte [someBuffer+someOffset] ;; some buffer( or whatever ) with your data in
mov ebx, eax ;; dealing with nybbles so make copy for obtaining second nybble
and al,0Fh ;; mask out bits for first nybble
mov al, byte [numbers+eax] ;; offset into numbers is the hex equiv in numbers string
mov byte [someAddress+someOffset+2], al
;;store al into a buffer or string or whatever it's the first hex number
shr bl, 4 ;; get next nybble
mov bl, byte [numbers+ebx] ;; get the hex equiv from numbers string
mov byte [someAddress+someOffset+1], bl
;;place into position next to where al was stored, this completes the process,
;;you now have your hexadecimal equivalent output it or whatever you need with it
.
.
.

You need to format your register as a text string first. The simplest to use API would likely be itoa, followed by your write call. You will need a string buffer allocated for this to work.
If you don't want to do it in assembly, you can make a quick C/Python/Perl/etc program to read from your program and make all output text.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to split / truncate a string variable value in Assembly? - string

Related

segmentation fault with assembler code [duplicate]

Move to end of string within buffer - Assembly Languge

Copy string from BSS variable to BSS variable in Assembly

How to print a number in assembly NASM?

How do I print out the content of a register in Hex

Categories

Resources