Linux assembly: reverse a string

I am working on an inline assembly function in C that reverses the contents of a string and puts the reversed string in a new character array, but I am getting extra characters added to the end of my reversed string.
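#include <stdio.h>
#include <string.h>

char *mystrrev(char *dest, const char *src, int size);

int main(int argc, char **argv) {
char new_str[20]; /* destination needs real storage, not an uninitialized pointer */
char old_str[20] = "Hello, World!";
mystrrev(new_str, old_str, strlen(old_str));
printf("New string: %s\n", new_str); /* pass the array itself, not its address */
return 0;
}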
Assembly function
char *mystrrev(char *dest, const char *src, int size) {
int d0, d1, d2, d3;
__asm__ (
"add %%ebx, %%esi\n\t" /*Move to end of string*/
"std\n\t" /*Decrement esi after load*/
"lodsb\n\t" /*Load byte at esi into al*/
"sub $1, %%ebx\n\t"
"mov %%al, %%cl\n\t" /*mov contents of al to cl*/
"1:\tstd\n\t" /*Begin loop, decrement esi after load*/
"lodsb\n\t" /*Load byte at esi into al*/
"cld\n\t" /*Clear direction flag so stosb increments edi*/
"stosb\n\t" /*Store al at edi*/
"sub $1, %%ebx\n\t" /*Subtract 1 from string length counter*/
"cmp $0, %%ebx\n\t" /*Compare ebx to 0*/
"jne 1b\n\t"
"mov %%ecx, %%edi\n\t" /*Add null terminating char to new str*/
: "=&S"(d0), "=&D"(d1), "=&a"(d2), "=&b" (d3) /*output*/
/*** &S --> ESI, &D --> EDI, &a --> eax ***/
: "0" (src), "1" (dest), "3" (size) /*input*/
: "memory"); /*clobber registers*/
return dest;
}
In this case "Hello, World!" gets printed reversed (!dlroW ,olleH), but there are extra characters added at the end and I cannot figure out why.
Any thoughts?

The
mov %ecx, %edi /*Add null terminating char to new str*/
instruction does not write anything to memory; it just copies the value of %ecx into the register %edi itself, overwriting the destination pointer. In AT&T syntax, a memory write through the pointer in %edi looks like
movb %cl, (%edi)
But it is really pointless to meticulously store the zero terminator in %cl during the copying, because you know it's going to be a zero anyway. So just
movb $0, (%edi)
ought to work just as well.
(Also, but unrelated: the string instructions are actually slower than equivalent combinations of separate load/store and increment/decrement operations. At least this used to be the case around the 486 or early Pentium era, and it would surprise me if it's not still true, especially if you need to manipulate the direction flag each time anyway.)
(Also, it seems rather pointless to declare output variables for values that you never use. Even if you need to hardcode the use of eax for the string instructions, simply listing "eax" in the clobber list is much clearer than forcing it into a dummy output variable.)
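For comparison, here is a minimal sketch of the same reversal in plain C, which makes the explicit terminator write easy to see. The name mystrrev_c is hypothetical (it is not from the question), and the sketch assumes dest points at a buffer of at least size + 1 bytes.

```c
#include <stddef.h>

/* Hypothetical plain-C counterpart of mystrrev: copies the first `size`
 * bytes of src into dest in reverse order and terminates the result.
 * Assumes dest has room for size + 1 bytes. */
char *mystrrev_c(char *dest, const char *src, size_t size) {
    for (size_t i = 0; i < size; i++) {
        dest[i] = src[size - 1 - i];   /* walk src backwards */
    }
    dest[size] = '\0';                 /* explicit write of the terminator to memory */
    return dest;
}
```

Called as mystrrev_c(new_str, old_str, strlen(old_str)) with new_str declared as a char array, the result is a properly terminated string, so printf stops at the right place.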

Related

Segmentation fault if I try to print a string char by char [duplicate]

This question already has answers here:
What registers to save in the ARM C calling convention?
Printf Change values in registers, ARM Assembly
I am trying to print a string char by char, with this ARM32 code:
.global main
.type main, %function
# r0 = asciz c
# r1 = singlechar
# r2 = string
# r3 = offset
main:
mov r3,#0 // initialize offset
ldr r0,=single_c
ldr r2,=string
push {ip,lr} // save the lr
loop:
ldrb r1,[r2,r3] // load 1 byte of the address string+offset
cmp r1,#0 // if the char is the null char
beq end // then go to the end
bl printf // else printf("%c",r1)
add r3,#1 // increase offset
b loop // repeat the loop
end:
pop {ip,lr} // restore lr
bx lr // return
string:
.asciz "Test\n"
single_c:
.asciz "%c\n"
I really don't get what I am doing wrong, since when I execute it, I get this:
$ ./a.out
T
Segmentation Fault
So it only prints the first letter.
EDIT:
I added a push {r0,r1,r2,r3} before the printf call and a pop {r0,r1,r2,r3} after the printf call, and now it works... but my question is: does printf ALWAYS modify the first four registers?
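Under the ARM procedure call standard (AAPCS), r0-r3 and ip (r12) are caller-saved scratch registers, so any called function, printf included, is free to overwrite them (r0 also carries the return value), while r4-r11 are callee-saved. A C compiler deals with this by keeping loop state in callee-saved registers or on the stack across the call. As a minimal sketch of what the loop is meant to do, written in C (the name print_chars and the FILE * parameter are assumptions for illustration):

```c
#include <stdio.h>

/* Prints each character of s on its own line, like the "%c\n" loop above.
 * Returns the number of characters printed. */
int print_chars(FILE *out, const char *s) {
    int n = 0;
    for (const char *p = s; *p != '\0'; p++) {
        fprintf(out, "%c\n", *p);  /* p and n survive the call; the compiler preserves them */
        n++;
    }
    return n;
}
```

In hand-written assembly the equivalent choice would be to hold the string pointer and offset in registers from the r4-r11 range (saving them on entry) and to reload r0 with the format string before every call.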

How do I count the number of characters in a string in NASM?

I want to take an input and display the input with how many characters are in the string.
So far I'm able to produce the string, but I'm confused as to how I get the input?
You can use this function:
count db 0 ; (or resb 1) the result is stored here; a byte, so it counts up to 255
count_string:
lodsb ; load the next character from [si] into al and advance si
cmp al, 0x00 ; end of string? The terminator is the byte 0x00, not the digit '0' (30h in ASCII)
jz done ; terminator reached: counting is finished
inc byte [count] ; add 1 to the count variable in memory
jmp count_string ; check the next character
done:
ret ; return to the caller
And you can define a string this way:
data:
; or in the section .data
string DB "This is my string", 0; 0 means the end of the string
You can call the function this way:
mov si, string; move the string pointer to register
call count_string;
If you are using NASM under an OS such as Windows or Ubuntu, you can watch this: https://www.youtube.com/watch?v=VAy4FGHDx1I
On bare x86 I wrote a small command line with a 4-byte buffer for letters. It is easy to extend to much more, but it was just a quick experiment; I read the keyboard with int 16h.
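The counting loop above can also be sketched in C, purely to illustrate the logic: read a byte, advance the pointer, stop at the 0x00 terminator. The function name mirrors the assembly label; the size_t counter is an assumption that avoids the 255 limit of a single byte.

```c
#include <stddef.h>

/* Counts bytes up to, but not including, the 0x00 terminator. */
size_t count_string(const char *s) {
    size_t count = 0;
    while (*s++ != '\0') {   /* load the byte and advance, like lodsb */
        count++;             /* not the terminator: count it */
    }
    return count;
}
```

For the string "This is my string" this yields 17.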

Why is an extra `CMP` instruction needed to implement test-and-set (TSL)?

The following code is from Wikipedia (http://en.wikipedia.org/wiki/Test-and-set)
enter_region: ; A "jump to" tag; function entry point.
tsl reg, flag ; Test and Set Lock; flag is the
; shared variable; it is copied
; into the register reg and flag
; then atomically set to 1.
cmp reg, #0 ; Was flag zero on entry_region?
jnz enter_region ; Jump to enter_region if
; reg is non-zero; i.e.,
; flag was non-zero on entry.
ret ; Exit; i.e., flag was zero on
; entry. If we get here, tsl
; will have set it non-zero; thus,
; we have claimed the resource
; associated with flag.
leave_region:
move flag, #0 ;store 0 in flag
ret ;return to caller
As I understand it, the two instructions tsl reg, flag and cmp reg, #0 could be merged into a single tsl reg, flag instruction that does three things together: (1) copies flag into reg, (2) sets flag to 1, and (3) tests whether reg is zero. What is the benefit of, or the need for, separating (3) from the tsl instruction? Does anyone have ideas about this?
Despite its name, tsl does not actually test the contents of flag; it just copies flag into reg and sets flag to 1 as one atomic operation. On x86 it is usually implemented with xchg, or with a compare-and-swap instruction where available.
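The same split shows up in C11 atomics, which may make the division of labour easier to see. This is only a sketch, not how any particular CPU implements tsl: atomic_flag_test_and_set atomically sets the flag and hands back the old value, and deciding what to do with that value is an ordinary, separate comparison. The function names follow the Wikipedia listing.

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_flag flag = ATOMIC_FLAG_INIT;  /* clear means "free" */

/* Spin until the old value of flag was clear, i.e. we are the one who set it. */
void enter_region(void) {
    for (;;) {
        bool old = atomic_flag_test_and_set(&flag); /* the "tsl": fetch old value, set flag */
        if (!old) {                                  /* the "cmp"/"jnz": a separate decision */
            return;
        }
    }
}

void leave_region(void) {
    atomic_flag_clear(&flag);  /* store 0 in flag */
}
```

Only the fetch-and-set has to be atomic. The comparison works on a private copy of the old value, so nothing is gained by folding it into the atomic operation.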

Register value followed by a capture group in a replace gives error

I'm trying to use vim to port an assembly language file to a new assembler. The new assembler does not support local labels. So I'm trying to insert the label name before each local label. But trying to use a register followed by a capture group causes an "Invalid expression error".
Example input:
LABEL_NAME:
MOV SP,#STACK_POINTER
MOV R7,#12 ;WAIT LOOP (3)
12$: MOV R6,#255 ;TIME WASTER
11$: MOV R5,#255 ;MUST WAIT !!!!
10$: RESET_WATCH_DOG
DJNZ R5,10$
DJNZ R6,11$
DJNZ R7,12$
Desired output:
LABEL_NAME:
MOV SP,#STACK_POINTER
MOV R7,#12 ;WAIT LOOP (3)
LABEL_NAME12$: MOV R6,#255 ;TIME WASTER
LABEL_NAME11$: MOV R5,#255 ;MUST WAIT !!!!
LABEL_NAME10$: RESET_WATCH_DOG
DJNZ R5,LABEL_NAME10$
DJNZ R6,LABEL_NAME11$
DJNZ R7,LABEL_NAME12$
This, of course, works:
:.,10s/\(\d\+\$\)/LABEL_NAME\1/
After yanking the label name into register a, this:
:.,10s/\(\d\+\$\)/\=@a/
Gives me this:
LABEL_NAME:
MOV SP,#STACK_POINTER
MOV R7,#12 ;WAIT LOOP (3)
LABEL_NAME: MOV R6,#255 ;TIME WASTER
LABEL_NAME: MOV R5,#255 ;MUST WAIT !!!!
LABEL_NAME: RESET_WATCH_DOG
DJNZ R5,LABEL_NAME
DJNZ R6,LABEL_NAME
DJNZ R7,LABEL_NAME
But this:
:.,10s/\(\d\+\$\)/\=@a\1/
Produces only this:
E15: Invalid expression: @a\1
I'm using gVim 7.3. Ideally I'd like to know why I can't use register expansion followed by capture group expansion in a replace statement. But alternative solutions would also be appreciated.
If you use an expression in the replacement part, the whole replacement has to be an expression; you cannot mix an expression with ordinary replacement syntax such as \1. This should work:
:.,10s/\(\d\+\$\)/\=@a . submatch(1)/g
Short explanation:
\= " the replacement will be an expression
@a " read the value from register 'a'
. " string concatenation
submatch(x) " (function) get the matched group from your :s command, x=0,1,2...
help item:
:h :s
:h submatch()

converting a decimal/hex value in a register to ascii

So say I have this value in a register ebx: 30303420
I want to convert that and print out the corresponding ascii values. So it SHOULD print out
004
30 == 0
30 == 0
34 == 4
20 == space character.
How would I get that to print on the screen?
This is 80x86 architecture, using assembly code.
Well, your question has a couple of unclear details.
1- If you have the hex value 30303420 in ebx, then you have 4 ASCII characters, precisely "004 ", that is:
mov ebx,30303420H ;is exactly the same as:
mov ebx,"004 "
(Note that this depends on the assembler: NASM, for one, stores the characters of such a constant in the opposite byte order.)
You do NOT have any decimal value (which one?), so there is no conversion to do here.
2- If you want to show that ebx value on the screen, so that it shows "004 ", then you must specify which operating system your program will run under in order to use the appropriate services. For example, if you want to use old-style MS-DOS INT 21H functions, which also run in a DOS window under Windows, then this segment does that:
mov cx,4 ;counter = 4 characters
;
next:
rol  ebx,8 ;rotate left EBX 1 byte: place next char in BL
mov dl,bl ;DL = char to show
mov ah,2 ;AH = 02h: DOS function "display character in DL"
int 21H  ;DOS kernel service Int: show the char
loop next ;repeat 4 times
However, if your program runs under Linux, the method to show the ebx value is entirely different. Your program might also call a C library function, be a native Windows program, use the BIOS INT 10H service (in charge of the screen), or even access the video hardware directly, and so on.
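Independent of the operating system, the byte-extraction part of the loop above can be sketched in C: rotate the 32-bit value left by 8 bits and take the low byte, four times. The function name reg_to_ascii and the output buffer are assumptions for illustration; how the characters then reach the screen still depends on the platform.

```c
#include <stdint.h>

/* Writes the four bytes of value into out, most significant byte first,
 * and terminates the string. out must have room for 5 bytes. */
void reg_to_ascii(uint32_t value, char out[5]) {
    for (int i = 0; i < 4; i++) {
        value = (value << 8) | (value >> 24);  /* rol value, 8 */
        out[i] = (char)(value & 0xFF);         /* low byte, like BL */
    }
    out[4] = '\0';
}
```

With value 0x30303420 the buffer ends up holding "004 ".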
