I want to delete a character from an ASCII string and add another one in its place in 8086 assembly language. For example, in the following code I want to remove the carriage return from the string and put a 0 there instead. DOS function 39h (int 21h) expects an ASCIIZ pathname, but function 0Ah ends the input with a carriage return, not a 0. How can I do it?
.model tiny
.data
folderpath DB "",0
.code
org 0100h
inizio:
mov ah,0ah
lea folderpath ,dx
int 21h
; HERE I WOULD LIKE TO MODIFY THE STRING
lea dx, folderpath
mov ah,39h
int 21h
fine:
mov AH,4Ch
int 21h
end inizio
The input buffer must start with a byte holding the maximum number of characters the buffer can hold. It is followed by a placeholder byte that receives the number of characters actually read, and then by the bytes for the string itself.
Format of DOS input buffer:
Offset  Size     Description
00h     BYTE     maximum characters buffer can hold
01h     BYTE     (call) number of chars from last input which may be recalled
                 (ret) number of characters actually read, excluding CR
02h     N BYTEs  actual characters read, including the final carriage return
Example:
len = 10
Input_Buffer DB len, ?
folderpath DB len+1 dup (" ")
After the input, the number of characters read tells us where the carriage return is, so we can overwrite it with a zero byte.
The characters occupy folderpath+0 up to folderpath+(count-1), so the offset of the carriage return is the offset of "folderpath" plus the number of characters read:
lea si, Input_Buffer + 1 ; offset of the count byte
xor ax,ax ; set ax to zero
mov al,[si] ; get the number of characters read (CR not counted)
mov di,ax ; put it into an address register
mov BYTE PTR[di+folderpath], 0 ; overwrite the carriage return (0Dh) with a zero byte
This instruction does not exist (in Intel syntax the destination comes first, and the destination of LEA must be a register):
lea folderpath ,dx
Use this one instead:
lea dx, Input_Buffer
mov ah, 0ah
int 21h
Hint: in a DOS *.com program CS, DS, ES and SS all point to the same segment at startup, but in a DOS *.exe program we have to set the data segment register ourselves, and we have to allocate a stack segment too.
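Putting the pieces together, a complete .COM program might look like the sketch below. It assumes emu8086 or MASM syntax, and the names in_buf, path and BUFSIZE are made up for this example. It reads a line, replaces the carriage return with 0 and passes the result to function 39h.

```asm
; Minimal sketch: read a folder name, make it ASCIIZ, create the folder.
; in_buf, path and BUFSIZE are hypothetical names used only here.
.model tiny
.code
org 0100h
start:
    mov ah, 0Ah                  ; DOS buffered input, DS:DX -> buffer
    mov dx, offset in_buf
    int 21h

    xor bx, bx
    mov bl, byte ptr [in_buf+1]  ; BX = characters typed (CR not counted)
    mov byte ptr [path+bx], 0    ; the CR sits right after the last character

    mov ah, 39h                  ; DOS MKDIR, DS:DX -> ASCIIZ pathname
    mov dx, offset path
    int 21h                      ; on error CF is set and AX holds the code

    mov ax, 4C00h                ; terminate with return code 0
    int 21h

BUFSIZE equ 64
in_buf  db BUFSIZE               ; offset 0: maximum bytes, CR included
        db 0                     ; offset 1: filled in by DOS
path    db BUFSIZE dup (0)       ; offset 2: the text, ended by 0Dh
end start
```

Because the count byte excludes the carriage return, indexing the text area with it lands exactly on the 0Dh byte, so no extra +1 is needed.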
How would you remove the newline character at the end of a string read from the console with the read system call?
I would like to put a 0 at the end of it so I can use that string to open a file.
I'm getting input like this:
mov rdx,name_len ; size_t count
mov rsi,name ; char *buf
mov rdi,0 ; int fd, 0 for stdin
mov rax,0 ; system read
syscall
read returns the number of bytes read, so you can index into the buffer and check whether the last one is a newline. Or just unconditionally overwrite it with 0:
...
syscall ; rax = sys_read(0, buf, max_len)
mov byte [rsi + rax - 1], 0
That assumes no error (read returned at least 1) and that the input line was ended with a newline rather than EOF.
(Linux syscalls preserve all regs except RAX (return value), and RCX/R11, so RSI still holds name.)
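If you would rather check than assume, a conditional version could look like the sketch below (NASM, x86-64 Linux). It is only an illustration: the buffer size is made up, and the buffer is deliberately one byte larger than the count passed to read so that a terminator always fits.

```asm
; Build (assumed toolchain): nasm -f elf64 strip.asm && ld strip.o -o strip
section .bss
name:     resb 256
name_len  equ 255                ; one spare byte is kept for the terminator

section .text
global _start
_start:
    mov  edx, name_len           ; size_t count
    mov  rsi, name               ; char *buf
    xor  edi, edi                ; int fd = 0 (stdin)
    xor  eax, eax                ; sys_read
    syscall                      ; rax = bytes read, 0 on EOF, negative on error
    test rax, rax
    jle  .fail                   ; nothing usable was read

    cmp  byte [rsi + rax - 1], 10 ; is the last byte a newline?
    jne  .terminate
    dec  rax                     ; yes: the 0 goes on top of it
.terminate:
    mov  byte [rsi + rax], 0     ; otherwise the 0 goes after the last byte

    mov  eax, 2                  ; sys_open
    mov  rdi, rsi                ; const char *pathname
    xor  esi, esi                ; flags = O_RDONLY
    xor  edx, edx                ; mode (unused without O_CREAT)
    syscall                      ; rax = file descriptor or negative errno
    test rax, rax
    js   .fail

    xor  edi, edi                ; exit status 0
    jmp  .exit
.fail:
    mov  edi, 1                  ; exit status 1
.exit:
    mov  eax, 60                 ; sys_exit
    syscall
```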
I'm trying to get the second character of a string (e.g. e in Test). I'm using emu8086 to assemble and run it.
When I do:
str db 'Test$'
...
mov si, 1 ; get second character of str
mov bl, str[si] ; store the second character
mov ah, 2 ; display the stored character
mov dl, bl
int 21h
The output is e.
But when I do:
str db 25
db ?
db 25 dup (?)
...
mov ah, 0ah ; accept a string
lea dx, str ; store input in variable str
int 21h
mov si, 1 ; get second character of str (??)
mov bl, str[si] ; store the second character
mov ah, 2 ; display the stored character
mov dl, bl
int 21h
I get ♦.
When I change the second snippet's "get second character of str" portion to this:
mov si, 3 ; get second character of str (why is it '3' instead of '1'?)
mov bl, str[si] ; store the second character
I get e.
I don't understand. It works in the first snippet, so why do I have to set SI to 3 instead of 1 in the second snippet if I'm trying to reference the second character of the string? Or is the method I'm using misguided?
str[si] is not some kind of typed array access. It assembles into an instruction memory operand like [si+1234], where 1234 is the offset in memory that the label str points to.
In your second example the str label points at the byte with value 25 (the maximum buffer length), str+1 points at the input length byte returned by DOS (that's the ♦ you get when you print it as a character: 4, the length of "Test"), and str+2 points at the first character of the user input. So to get the second character you must use the address str+3.
Memory is byte-addressable, so you either have to keep track of the byte size of every element yourself or use more labels, like:
str_int_0a: ; label to the beginning of structure for "0a" DOS service
db 25
db ?
str: ; label to the beginning of raw input buffer (first char)
db 25 dup (?)
Then in the code you use the correct label depending on what you want to do:
...
mov ah, 0ah ; accept a string
lea dx, str_int_0a ; DS:DX = address of the whole 0Ah buffer structure
int 21h
mov si, 1 ; index of second character of str
mov bl, str[si] ; load the second character
mov ah, 2 ; display the stored character
mov dl, bl
int 21h
...
You should use a debugger and observe the values in memory and registers, and the assembled instructions, to get a better feel for how these work inside the CPU, how segment:offset addressing is used to access memory in 16-bit real mode on x86, etc.
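As a complete program, the labelled layout might be used as in the sketch below (emu8086/MASM syntax assumed; buf_0a, buf_len and text are hypothetical names). It also looks at the length byte first, so it never prints leftover buffer contents when fewer than two characters were typed.

```asm
; Sketch: print the second character of a line read with function 0Ah.
.model tiny
.code
org 100h
start:
    mov ah, 0Ah                 ; buffered input, DS:DX -> whole structure
    mov dx, offset buf_0a
    int 21h

    cmp byte ptr [buf_len], 2   ; fewer than two characters typed?
    jb  done                    ; then there is no second character

    mov ah, 02h                 ; line feed, so the output lands on a new line
    mov dl, 10
    int 21h

    mov si, 1                   ; index 1 = second character of the text
    mov dl, text[si]
    mov ah, 02h                 ; display the character in DL
    int 21h
done:
    mov ax, 4C00h
    int 21h

buf_0a  db 25                   ; offset 0: maximum bytes, CR included
buf_len db 0                    ; offset 1: characters read, set by DOS
text    db 25 dup (0)           ; offset 2: first character of the input
end start
```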
I'm using emu8086.
For example, I have a macro called 'store' which takes a string and stores it in an array. How do I do that?
Sample code:
arrayStr db 30 dup(' ')
store "qwerty"
store MACRO str
*some code here which stores str into arrayStr*
endm
Most examples I found on the internet revolve around already having the string stored in a variable (e.g. string db (some string here)), but I want something where the variables are initialized empty first.
Do you want to change a variable at runtime? In that case take a look at the PRINT macro in emu8086.inc. With a few changes you've got a STORE macro:
store MACRO str
LOCAL skip_data, endloop, repeat, localdata
jmp skip_data ; Jump over data
localdata db str, '$', 0 ; Store the macro-argument with terminators
skip_data:
mov si, OFFSET localdata
mov di, OFFSET msg
repeat: ; Loop to store the string
cmp byte ptr [si], 0 ; End of string?
je endloop ; Yes: end of loop
movsb ; No: Copy one byte from DS:SI to ES:DI, inc SI & DI
jmp repeat ; Once more
endloop:
ENDM
crlf MACRO
LOCAL skip_data, localdata
jmp skip_data
localdata db 13, 10, '$'
skip_data:
mov dx, offset localdata
mov ah, 09h
int 21h
ENDM
ORG 100h
mov dx, OFFSET msg
mov ah, 09h
int 21h
crlf
store "Hello!"
mov dx, OFFSET msg
mov ah, 09h
int 21h
crlf
store "Good Bye."
mov dx, OFFSET msg
mov ah, 09h
int 21h
mov ax, 4C00h
int 21h
msg db "Hello, World!", '$'
It depends on what you want to do with the string.
Here are some examples:
ASCIIZ string
The string ends with a zero byte.
The advantage is that the end of the string is cheap to detect: after loading a byte, a test such as or al, al sets the zero flag when the terminator has been reached. (On x86 the load itself, e.g. MOV or LODSB, does not change the flags.)
The disadvantage is that the string must not contain another zero byte. Otherwise the program would interpret the earlier zero byte as the end of the string.
String read by the DOS buffered-input function (int 21h / ah=0Ah)
The first byte defines the maximum length of the string the user can enter. The actual length is returned in the second byte. The rest contains the string.
String ready to be printed by the DOS display-string function (int 21h / ah=09h)
The string ends with a dollar sign (ASCII 36).
The advantage is that your program can output the string with a single function call (int 21h / ah=09h).
The disadvantage is that the string must not contain another dollar sign. Otherwise the program would interpret the earlier dollar sign as the end of the string.
String whose length is stored in a word/byte at the beginning of the string
Unformatted string
You need neither a length variable nor an end marker if the length is a constant known at assembly time that you can load into a register (e.g. CX).
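To make the differences concrete, here is a small sketch that prints one string in each of three of these formats (emu8086/MASM syntax assumed; all label names are made up). Note that the ASCIIZ loop needs an explicit test of AL.

```asm
; Sketch: printing '$'-terminated, zero-terminated and length-prefixed text.
.model tiny
.code
org 100h
start:
    ; '$'-terminated: a single DOS call prints the whole string
    mov ah, 09h
    mov dx, offset msg_dollar
    int 21h

    ; ASCIIZ: print byte by byte until the zero byte
    cld                         ; make LODSB move forward
    mov si, offset msg_z
next_z:
    lodsb                       ; AL = [DS:SI], SI = SI + 1, flags untouched
    or  al, al                  ; sets ZF when AL is the terminator
    jz  z_done
    mov dl, al
    mov ah, 02h                 ; display the character in DL
    int 21h
    jmp next_z
z_done:

    ; length-prefixed: the first byte says how many characters follow
    mov si, offset msg_counted
    lodsb                       ; AL = length
    xor ch, ch
    mov cl, al                  ; CX = loop counter
    jcxz c_done                 ; nothing to print for an empty string
next_c:
    lodsb
    mov dl, al
    mov ah, 02h
    int 21h
    loop next_c
c_done:
    mov ax, 4C00h
    int 21h

msg_dollar  db "dollar-terminated", 13, 10, '$'
msg_z       db "zero-terminated", 13, 10, 0
msg_counted db 17, "length-prefixed", 13, 10   ; 15 letters + CR + LF = 17
end start
```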
I'm new to assembly. I want to compare two strings using "cmps". I read some examples and wrote this:
GETSTR MACRO STR
MOV AH,0AH
LEA DX,STR
INT 21H
ENDM
PRINTSTR MACRO STR
MOV AH,09H
LEA DX,STR
INT 21H
ENDM
EXTRA SEGMENT
DEST DB ?
EXTRA ENDS
DATA SEGMENT
SOURCE DB ?
STR1 DB 0AH,0DH,'ENTER STR : ' ,'$'
ENTER DB 10,13,'$'
SAME DB 0AH,0DH,'TWO STR ARE THE SAME ' ,'$'
NSAME DB 0AH,0DH,'TWO STR ARE NOT THE SAME ' ,'$'
USER DB 6,10 DUP('$')
USER1 DB 6,10 DUP('$')
DATA ENDS
CODE SEGMENT
ASSUME DS:DATA,CS:CODE,ES:EXTRA
START:
MOV AX,DATA
MOV DS,AX
MOV AX,EXTRA
MOV ES,AX
PRINTSTR STR1
GETSTR USER1
PRINTSTR STR1
GETSTR USER
LEA BX,USER
MOV SI,BX
LEA BX,USER1
MOV DI,BX
CLD
MOV CX,5
REPE CMPSB
JCXZ MTCH
PRINTSTR NSAME
JMP ENDPR
MTCH:
PRINTSTR SAME
ENDPR:
MOV AH,4CH
INT 21H
CODE ENDS
END START
I have some questions:
What exactly do the numbers 6 and 10 mean in the line below?
USER DB 6,10 DUP('$')
Is there any mistake in the macros?
Is it necessary to declare the EXTRA segment?
For identical input strings the program still reports that they are not the same. What is the reason?
The 6 is the maximum number of bytes DOS will store, including the terminating carriage return, so the user can type at most 5 characters. The 10 is the DUP count: it defines the length of the buffer area that follows. Actually 7 would have been enough (1 byte for the returned length plus 6 bytes for the text)!
The macros seem fine.
You don't need the EXTRA segment. Moreover, loading it into ES is wrong: CMPSB compares the byte at DS:SI with the byte at ES:DI, and both strings you are comparing are in the DATA segment, so ES must equal DS.
Also, both LEA instructions must fetch an address that is 2 higher. The first byte of each buffer is still the maximum number of bytes to read (6) and the second byte is the number of characters actually read, in the range [0,5].
The comparison always uses 5 characters, whatever the user typed. If you don't take into account the real number of characters reported by DOS in the second byte, it's no wonder the results are unsatisfying.
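A version with these points applied might look like the sketch below. It is one possible arrangement, not the only one, and it assumes emu8086/MASM syntax; the lowercase label names are made up. It also branches on the zero flag after REPE CMPSB instead of using JCXZ, because CX reaches 0 as well when the mismatch happens on the very last byte.

```asm
; Sketch: compare two lines read with function 0Ah.
.model small
.stack 100h
.data
prompt  db 13, 10, 'ENTER STR : $'
same    db 13, 10, 'TWO STR ARE THE SAME$'
nsame   db 13, 10, 'TWO STR ARE NOT THE SAME$'
user    db 6, 0, 6 dup (0)      ; max 6 bytes (CR included), length, text
user1   db 6, 0, 6 dup (0)
.code
start:
    mov ax, @data
    mov ds, ax
    mov es, ax                  ; CMPSB compares DS:SI with ES:DI

    mov ah, 09h                 ; first prompt and input
    mov dx, offset prompt
    int 21h
    mov ah, 0Ah
    mov dx, offset user1
    int 21h

    mov ah, 09h                 ; second prompt and input
    mov dx, offset prompt
    int 21h
    mov ah, 0Ah
    mov dx, offset user
    int 21h

    mov al, [user+1]            ; length of the second input
    cmp al, [user1+1]           ; different lengths can never match
    jne differ
    mov cl, al
    xor ch, ch                  ; CX = length; also leaves ZF = 1,
                                ; which matters when CX = 0
    mov si, offset user+2       ; skip the two header bytes
    mov di, offset user1+2
    cld
    repe cmpsb                  ; stops at the first mismatch or at CX = 0
    jne differ                  ; ZF = 0 means some byte differed

    mov dx, offset same
    jmp show
differ:
    mov dx, offset nsame
show:
    mov ah, 09h
    int 21h
    mov ax, 4C00h
    int 21h
end start
```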
I want to do two things:
1) Take a string from user
2) Find the length of that string
I tried the following code:
.model small
.stack 100h
.data
MAXLEN DB 100
ACT_LEN DB 0 ;Actual length of the string
ACT_DATA DB 100 DUP('$') ;String will be stored in ACT_DATA
MSG1 DB 10,13,'ENTER STRING : $'
.CODE
START:
MOV AX,@data
MOV DS,AX
;Normal printing
LEA DX,MSG1
MOV AH,09H
INT 21H
;Can't understand the code from here!
LEA DX,ACT_DATA
MOV AH,0AH
MOV DX,OFFSET MAXLEN
INT 21H
LEA SI,ACT_DATA
MOV CL,ACT_LEN
;AND THEN SOME OPERATIONS
END START
But I am confused about how the length ends up in the CL register, i.e. how is the ACT_LEN value incremented? And what does mov AH,0A have to do with the length?
Int 21/AH=0Ah
Format of DOS input buffer:
Offset  Size     Description (Table 01344)
00h     BYTE     maximum characters buffer can hold (MAXLEN)
01h     BYTE     (call) number of chars from last input which may be recalled (ACT_LEN)
                 (ret) number of characters actually read, excluding CR
02h     N BYTEs  actual characters read, including the final carriage return (ACT_DATA)
On return, the buffered input function has filled in ACT_LEN and ACT_DATA; MAXLEN is the value you supply.
LEA DX,ACT_DATA
MOV AH,0AH
MOV DX,OFFSET MAXLEN
INT 21H
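You do not need LEA DX,ACT_DATA, because the following MOV DX,OFFSET MAXLEN overwrites DX anyway; the function wants the address of the start of the buffer (MAXLEN).
mov AH,0Ah selects the function of DOS interrupt 21h to call (0Ah is buffered input). So CL simply receives the count that DOS stored in ACT_LEN. Ralf Brown's Interrupt List describes every interrupt function along with what goes in and what comes out.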
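To see the value DOS stored, the program can be extended to print ACT_LEN as a decimal number. The sketch below assumes emu8086/MASM syntax; MSG2 is a name added for this example. Since MAXLEN is 100 and that count includes the carriage return, the length is at most 99 and always fits in two digits (a leading zero is printed for short strings).

```asm
; Sketch: read a string and print the length DOS reported.
.model small
.stack 100h
.data
MAXLEN   db 100                 ; supplied by the program
ACT_LEN  db 0                   ; filled in by DOS
ACT_DATA db 100 dup ('$')       ; filled in by DOS
MSG1     db 10, 13, 'ENTER STRING : $'
MSG2     db 10, 13, 'LENGTH : $'
.code
start:
    mov ax, @data
    mov ds, ax

    lea dx, MSG1
    mov ah, 09h
    int 21h

    mov dx, offset MAXLEN       ; DS:DX -> start of the buffer structure
    mov ah, 0Ah                 ; buffered input
    int 21h

    lea dx, MSG2
    mov ah, 09h
    int 21h

    mov al, ACT_LEN             ; the count DOS stored, CR not included
    xor ah, ah
    mov bl, 10
    div bl                      ; AL = tens digit, AH = ones digit
    mov bx, ax                  ; keep both, INT 21h changes AL

    mov dl, bl                  ; tens
    add dl, '0'
    mov ah, 02h
    int 21h
    mov dl, bh                  ; ones
    add dl, '0'
    mov ah, 02h
    int 21h

    mov ax, 4C00h
    int 21h
end start
```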