I am currently working on a small program on a ci20 machine that prompts the user for an integer value and then prints the value to the screen.
My current code:
.data
prompt:
.asciiz "Please enter an integer: "
message:
.asciiz "\nValue entered: "
.text
.global main
main:
addiu $sp, $sp, -4 # push stack
sw $ra, ($sp) # save return address
addi $v0, $0, 4
la $a0, prompt
syscall # printing prompt
addi $v0, $0, 5
syscall # get user input
move $t0, $v0 # save input in $t0
move $a0, $v0
addi $v0, $0, 1 # Not sure if this is right to print message
la $a0, message # Not sure if this is right to print message
syscall
lw $ra, ($sp) # restoring $sp
addiu $sp, $sp, +4 # release the stack space used for $sp
When I try to run the program I get a segfault and I'm not sure why. Any help or suggestions would be greatly appreciated.
Edit: for some reason I completely overlooked that this code was tested on a ci20 machine.
So is this Linux? Then you can't use MARS syscalls; you have to find the Linux syscalls instead. It is then probably segfaulting on the very first syscall instruction, as the arguments are invalid for Linux.
To display "prompt" you use syscall with arguments set as v0 = 4, a0 = prompt ... to display "message" you set arguments for syscall as v0 = 1, a0 = message.
If this is in MARS, then v0=1 is "print integer", so a0 should be an integer, not the address of the "message" string. You probably want to call syscall twice, with v0=4 and then v0=1 (with a0 being "message" and the user's integer, respectively).
Anyway, none of this should segfault. The segfault probably happens at the end, where your code ends with addiu $sp, $sp, +4 without returning to ra or calling the syscall "exit" function (since you save ra at the start of your code, it looks like you want to return rather than exit, but it's up to you). So execution continues into whatever follows (uninitialized memory content interpreted as instructions).
Anyway 2, you should figure out how to load this code in a debugger and step through it instruction by instruction; then you will be able to say exactly where it segfaults, and what the registers contained before the segfaulting instruction. If your code segfaults and you don't even know where, it shows a lack of effort on your side.
(disclaimer: I never did MIPS assembly, so I'm mostly guessing how it works and may have overlooked something)
edit about syscall, maybe this hint will help too?
syscall isn't some magic instruction doing all that nifty stuff on the CPU. It just jumps to some handler routine.
That handler code is set up by the OS. Most of the MIPS assembly listings on SO are targeted at MARS or SPIM, which have a completely different handler than Linux.
So you should study the Linux ABI for MIPS and how syscall is used there. Then find a Linux system call table; you will probably find a ton of x86 docs, so you have to translate those into the v0/a0/... ABI.
You can still follow MARS examples, but any OS interaction has to be adjusted, and don't expect to find an alternative for everything. For example, outputting a number is not available as a Linux system call. You have to convert the numeric value into an ASCII string yourself (for single-digit numbers adding '0' is enough; for numbers above 9 you have to calculate the digit for each power of 10, convert it into an ASCII character, and store it into some buffer), and then output the string with sys_write etc. (or link with some libc and call a sprintf-like function from the C library).
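To make the digit-by-digit conversion concrete, here is a minimal sketch of the idea in Python rather than MIPS. The function name int_to_ascii is hypothetical, and the sketch assumes a non-negative input; it is meant only to show the logic you would have to translate into assembly, not a definitive implementation.

```python
def int_to_ascii(value: int) -> bytes:
    """Convert a non-negative integer to its decimal ASCII bytes."""
    if value == 0:
        return b"0"
    digits = bytearray()
    while value > 0:
        # Peel off the lowest decimal digit; adding '0' turns it into ASCII.
        value, remainder = divmod(value, 10)
        digits.append(ord("0") + remainder)
    # Digits were produced lowest first, so reverse them for printing.
    digits.reverse()
    return bytes(digits)


print(int_to_ascii(1234))
```

In assembly the same loop would divide by 10, store each remainder plus '0' into a buffer, and then pass that buffer to the write system call.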
I am trying to learn RISC-V assembly. I am trying to learn the ISA and implement some R-type and S-type instructions. However, whenever I try to run the sw and lw instructions, I always get the error "address out of range", and I don't understand why. This is an example:
lw a0,40(zero)
addi a1,zero,1
addi a2,zero,1
beq a0,a1,SAVE
LOOP:
addi a1,a1,1
addi t1,a1,0
addi t2,a2,0
jal ra,MUL
add a2,zero,t0
bne a1,a0,LOOP
SAVE:
sw a2,44(zero)
jal x0,DONE
MUL:
andi t0,t0,0
LOOP_MUL:
add t0,t0,t2
addi t1,t1,-1
bne t1,zero,LOOP_MUL
jalr zero,ra,0
DONE:
add zero,zero,zero
I get an error like this: Error in D:\ctmt\BTL_RISCV\riscv1.asm line 2: Runtime exception at 0x00400004: address out of range 0x00000004.
I would be really grateful if someone could explain to me why this bug happens.
Address out of range applies to load and store instructions. The message tells you the address of the instruction (because that's what the hardware sees/knows) and the address you are attempting to access. Usually this is caused by a bad pointer value in the base register, but sometimes the pointer value is close to the end of something and the immediate offset pushes it past the end.
You should look at the memory map for the simulator or environment that you're using; it may tell you what areas of memory are legal, and it may also support reconfiguring the memory map. Low memory (addresses from 0 to 2048 or so) is not guaranteed to be legal on many systems, but can sometimes be configured as legal for small memory models, e.g. on embedded systems. One reason low memory is configured as illegal is to catch null pointer dereferences, which are a common software error.
Question number 1:
Given this NASM code:
section .data
dat db "write out this:%x", 0xa, 0x0
section .text
global main
extern printf
main:
push rbp
mov rbp, rsp
mov rdi, dat
mov esi, 0xdeedbeef
call printf
leave
ret
This gives errno 24 (too many open file descriptors).
But if I change the ending to use int 80h instead of
leave
ret
it terminates without error. How's that?
Also, question number 2:
If I do not set up the stack frame with:
push rbp
mov rbp, rsp
and only do mov rbp, rsp without pushing rbp first, the program is terminated, even though no function was called before, so there should be no need to push the base pointer. So why is it needed (in the eyes of the compiler), and why does the program terminate without it?
Question 1
You're mistaken about this having anything to do with file descriptors. That isn't what's being reported.
As you explained in comments, 24 is the number shown when you echo $? after running the program. This is the exit code of the program; normally, the value returned from the main function or passed to exit(). It can be whatever you want and normally does not correspond to an errno value.
So why does your program give an exit code of 24? If main returns, then the exit code is its return value. A function's return value is expected to be left in the rax register when it returns (assuming it's of integer or pointer type). But you never touch the rax register, so it still contains the value that was left there when printf returns. Now printf returns the number of characters it successfully printed, which for the string you chose is... 24 (count 'em).
It is just a coincidence that 24 also happens to be the errno code for "too many open file descriptors"; that is completely unrelated.
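If you'd rather not count by hand, the character count can be checked with a short sketch. This uses Python's % formatting as a stand-in for printf (the %x conversion behaves the same for this value), and the masking with 0xFF reflects that only the low 8 bits of main's return value end up in the exit status.

```python
# The same format string and argument as in the NASM program.
formatted = "write out this:%x\n" % 0xdeedbeef

# printf returns the number of characters written; main then returns
# with that value still in rax, so it becomes the exit status.
exit_code = len(formatted) & 0xFF

print(repr(formatted), exit_code)
```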
If you want to exit with an exit code of 0 to signal success, then you should xor rax, rax just before your ret.
You didn't show the exact code you used when changing it to int 0x80 to invoke the _exit syscall yourself. But in that case, the exit code would be whatever is in the ebx register when you make the system call. Maybe your code puts zero in ebx, or maybe you are lucky and it happens to already contain zero.
(Side note: int 0x80 is the interface for 32-bit system calls, and is not appropriate in a 64-bit program which is what you seem to be writing, though it may work in a few cases. The 64-bit system call interface uses the syscall instruction and is explained in A.2.1 of the ABI.)
Question 2
You have to align the stack.
When calling a C function from assembly (as printf in this case), you're required by the x86-64 ABI Section 3.2.2 to align the stack to a 16-byte boundary. The stack is aligned appropriately before the call to main, and the return address pushed by call subtracts 8 bytes.
So if you don't touch the stack at all in your code, it will not be correctly aligned when you call printf, and this can cause it to crash. (The assembler will not help you do this; that's not its job.) But when you push rbp, that subtracts a further 8 bytes and gets the stack aligned properly. So either leave that code in, or align the stack yourself.
And in either case, keep in mind that if you change your code to push more stuff on the stack, you'll have to adjust it accordingly before you make any function calls.
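The alignment arithmetic can be sketched as follows. The starting value of rsp is a made-up, 16-byte aligned address chosen only for illustration, and the helper name is_aligned_16 is hypothetical; the point is just how each 8-byte push changes rsp modulo 16.

```python
def is_aligned_16(rsp: int) -> bool:
    """True if the stack pointer is on a 16-byte boundary."""
    return rsp % 16 == 0


# Hypothetical aligned stack pointer just before the 'call main'.
rsp_before_call = 0x7FFFFFFFE000

# 'call' pushes the 8-byte return address: now misaligned by 8.
rsp_on_entry = rsp_before_call - 8

# 'push rbp' subtracts another 8 bytes: aligned again for 'call printf'.
rsp_after_push = rsp_on_entry - 8

print(is_aligned_16(rsp_on_entry), is_aligned_16(rsp_after_push))
```

Any odd number of further 8-byte pushes before a call would break the alignment again, which is why the adjustment has to be revisited whenever the prologue changes.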
I have an assignment in my Computer Architecture class where we are supposed to complete a set of tasks. The last task is to create a subroutine that takes a string and reverses it. We're not allowed to use the original string in order to create a "new memory string". Instead, we are supposed to gradually replace the content of the original string.
My first thought was to load the leftmost and rightmost character in temporary registers and then using these to "swap" their positions. After this is done, the addresses should increment and decrease (there's two addresses, one pointing to the start of the string and one pointing to the end). Then this is going to loop until both of the addresses point to the same point, where the loop then will end.
This is the code I have for the reverse_string subroutine. $a0 is the address for the NULL-ended string. I'm not sure if the algorithm I was thinking of in my head translated into MIPS properly, I'm very new to this type of language.
reverse_string:
#### Write your solution here ####
beqz $a0, rs_exit # Check if string is NULL, exit if it is
li $t0, 0
li $t1, 0
add $t0, $t0, $a0 # Save leftmost adress of string
add $t1, $a0, $v0 # Save rightmost adress of string
rs_loop:
beq $t0, $t1, rs_exit
lbu $t2, 0($t0) # Save leftmost character in temporary register
lbu $t3, 0($t1) # Save rightmost character in temporary register
sb $a0, 0($t2) # Replace rightmost character with leftmost character
sb $a0, 0($t3) # Replace leftmost character with rightmost character
addi $t0, $t0, 1 # Increment leftmost adress by 1
subi $t1, $t1, 1 # Decrement rightmost by 1
j rs_loop
rs_exit:
jr $ra
The code in main for executing reverse_string is the following:
##
### reverse_string
##
li $v0, 4
la $a0, STR_reverse_string
syscall
la $a0, STR_str
la $a1, reverse_string
jal reverse_string
la $a0, STR_str
jal print_test_string
So, as mentioned previously, the expected result is that the program should print out the reversed string. Currently, I'm getting an error at the following line:
sb $a0, 0($t2) # Replace rightmost character with leftmost character
The error:
Runtime exception at 0x004000c4: address out of range 0x0000004a
I've tried for several hours. Several people have successfully received help with similar problems, but theirs were a bit different (input from users, and they also created new strings instead of replacing the contents of the original one).
I appreciate any help! Thank you.
My first thought was to ... this is going to loop until both of the addresses point to the same point...
Until the "end" pointer is equal to or less than the "start" pointer. For an even length like "abcd", the pointers are still valid when they point at "b" and "c", but after the swap and the increment + decrement they have crossed without ever being equal, and yet you should end the loop. Anyway, apart from this detail, your idea is good.
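The loop condition can be illustrated with a short sketch in Python, using a bytearray in place of memory and plain indices in place of the two address registers. The function name reverse_in_place is hypothetical. Note the left < right test, which stops for both odd and even lengths, where an equality test would run past the middle of an even-length string.

```python
def reverse_in_place(buf: bytearray, left: int, right: int) -> None:
    """Swap bytes from both ends until the two indices meet or cross."""
    while left < right:
        buf[left], buf[right] = buf[right], buf[left]
        left += 1
        right -= 1


even = bytearray(b"abcd")
reverse_in_place(even, 0, len(even) - 1)
odd = bytearray(b"abc")
reverse_in_place(odd, 0, len(odd) - 1)
print(even, odd)
```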
Runtime exception at 0x004000c4: address out of range 0x0000004a
This means the sb (store byte) instruction tried to write to memory address 0x0000004a, which is not accessible (no memory there, or your process lacks the rights to write there). That means 0($t2) evaluated to that address, so the value in t2 is 0x4a. You should be able to see that in the debugger when single-stepping over the instructions.
From there you have to backtrack through the whole operation to find out how it got this value and why.
la $a0, STR_str
la $a1, reverse_string
jal reverse_string
Why is a1 set? The contract in the task says that the string address is passed in a0, nothing else. But anyway, that's not your code, I'm just curious... so let's get on to your code.
beqz $a0, rs_exit # Check if string is NULL, exit if it is
li $t0, 0
li $t1, 0
add $t0, $t0, $a0 # Save leftmost adress of string
add $t1, $a0, $v0 # Save rightmost adress of string
The null test is OK. To copy a register you can also use add (or addi) with $zero (= $0) as the zero value, i.e.:
beqz $a0, rs_exit # Check if string is NULL, exit if it is
add $t0, $a0, $zero # t0 = left pointer (start of string)
add $t1, $a0, $v0 # t1 = start of string plus unknown value in v0
And as you can read in the second comment, you have one bug right there at the beginning.
Which implies you didn't debug your code at all, or only with very limited inputs.
If the only input to the routine is the address of the string, you have to search char by char for where the terminating zero is stored, and use that to figure out the "end" pointer.
(For example, you can copy a0 to a1, load a byte from a1, check it for zero, increment a1 and fetch again, ... until the terminating zero is found; the address just before (-1) that first zero is your "end" pointer.)
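As a rough sketch of that scan, here it is in Python with a bytearray standing in for memory and an index standing in for the address in a0. The name find_last_char and the sample memory contents are made up for illustration; the loop mirrors the load byte / compare with zero / increment sequence described above.

```python
def find_last_char(memory: bytearray, start: int) -> int:
    """Return the index of the last character before the zero terminator."""
    cursor = start
    while memory[cursor] != 0:  # load byte and compare against zero
        cursor += 1             # advance to the next character
    # cursor now sits on the terminator; the last character is one before.
    # For an empty string this yields start - 1, so the caller's
    # "end <= start" check ends the swap loop immediately.
    return cursor - 1


memory = bytearray(b"xxhello\x00")
print(find_last_char(memory, 2))
```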
Let's pretend you have correct pointers... then another part:
lbu $t2, 0($t0) # Save leftmost character in temporary register
lbu $t3, 0($t1) # Save rightmost character in temporary register
sb $a0, 0($t2) # Replace rightmost character with leftmost character
sb $a0, 0($t3) # Replace leftmost character with rightmost character
The first two are correct (with correct pointers). But the other two are completely wrong. The "sb" (store byte) instruction takes the arguments "value, memory address", so you are trying to store the bottom byte of the string address to the memory address given by the character value... 0x4a is the ASCII code for the character 'J', which, as the error message points out, is not a valid memory address.
You may consider yourself kind of lucky, because when programming in assembly, a similar bug sometimes leaves a wrong value in the register that happens to be an accessible address, and some memory is overwritten which should not have been, without a crash or any sign of a problem. Then much later some completely different part of the code may reach for that memory, expecting something else to be stored there, and it will misbehave. These "memory overwrite" bugs are extremely difficult to decipher and fix, as any old-school assembly/C/C++ programmer can tell you.
So your code is a very weak attempt at implementing the idea you described.
Try to run through it with a debugger (if you are using the SPIM/MARS simulators, both have a built-in debugger, not a state-of-the-art one, but usable for these tiny tutorial tasks), and try to fully understand what is happening, why those instructions do not represent your idea, and what they are actually doing.
You have to learn this skill if you want to code in assembly. Assembly leaves no room for mistakes or vague interpretations, or for getting working code just by "trying" things and changing the source randomly. Always figure out what the code actually does, and how precisely it differs from what you want. Then fix it.
Generally, assembly questions which show "no debugging" get downvoted really quickly, because debugging is a time-consuming process, and just outsourcing it to the SO crowd is rude ... but you get a bonus point from me for clearly stating your idea, and overall for providing an almost complete reproducible example (you forgot to show the string definition ... and in assembly, the way you define data is quite often even more important than the code, so you may have some kind of bug there as well, like not adding the zero terminator to the string, etc.).
Also never try to guess how an instruction works from its name. Always study the reference manual properly, and make sure you understand everything it says about the instruction.
You can get some weak results by guessing and trying random things in higher-level languages, but it's just a waste of time in assembly; even this short routine of ~20 lines already allows for millions of variations (somewhat meaningful at first sight), and only a few hundred of them are correct solutions.
Now try again, and focus on staying in control all the time. If you are not sure how to understand something, reread it a few more times, or build a short piece of code exercising the part you are not sure about, and check in the debugger what the CPU does... and eventually ask on SO, explaining what you expected and what surprised you in the debugger.
I can't understand how this works.
Here's part of a main() function disassembled by objdump and written in Intel notation:
0000000000000530 <main>:
530: lea rdx,[rip+0x37d] # 8b4 <_IO_stdin_used+0x4>
537: mov DWORD PTR [rsp-0xc],0x0
53f: movabs r10,0xedd5a792ef95fa9e
549: mov r9d,0xffffffcc
54f: nop
550: mov eax,DWORD PTR [rsp-0xc]
554: cmp eax,0xd
557: ja 57c <main+0x4c>
559: movsxd rax,DWORD PTR [rdx+rax*4]
55d: add rax,rdx
560: jmp rax
The rodata section dump:
.rodata
08b0 01000200 ecfdffff d4fdffff bcfdffff ................
08c0 9cfdffff 7cfdffff 6cfdffff 4cfdffff ....|...l...L...
08d0 3cfdffff 2cfdffff 0cfdffff ecfcffff <...,...........
08e0 d4fcffff b4fcffff 0cfeffff ............
At 530, rip is 537, so rdx = 537 + 37d = 8b4.
My first question: how large is the value at rdx? Is the value ec, or ecfdffff, or something else? If it were declared as a DWORD, I could understand that it is 'ecfdffff' (or is even this wrong? :( ), but this program doesn't declare it. How can I determine the value?
Then the program continues.
At 559, rax appears for the first time.
My second question: can this rax be interpreted as containing eax, and is rax = 0 at this point? If rax is 0, then 559 means rax = DWORD[rdx], so the value of rax becomes ecfdffff, and next 55d does rax += rdx, and I don't think that value can be jumped to. Something must be wrong, so please tell me where or how I went wrong.
I think I'll diverge from what Peter discussed (he provides good information) and get to the heart of some issues I think are causing you problems. When I first glanced at this question I assumed that the code was likely compiler generated and the jmp rax was likely the result of some control flow statement. The most likely way to generate such a code sequence is via a C switch. It isn't uncommon for a switch statement to be made of a jump table to say what code should execute depending on the control variable. As an example: the control variable for switch(a) is a.
This all made sense to me, and I wrote up a number of comments (now deleted) that ultimately resulted in bizarre memory addresses that jmp rax would go to. I had errands to run but when I returned I had the aha moment that you may have had the same confusion I did. This output from objdump using the -s option appeared as:
.rodata
08b0 01000200 ecfdffff d4fdffff bcfdffff ................
08c0 9cfdffff 7cfdffff 6cfdffff 4cfdffff ....|...l...L...
08d0 3cfdffff 2cfdffff 0cfdffff ecfcffff <...,...........
08e0 d4fcffff b4fcffff 0cfeffff ............
One of your questions seems to be about what values get loaded here. I never used the -s option to look at data in the sections and was unaware that although the dump splits the data out in groups of 4 bytes (32-bit values) they are shown in byte order as it appears in memory. I had at first assumed the output was displaying these values from Most Significant Byte to Least significant byte and objdump -s had done the conversion. That is not the case.
You have to manually reverse the bytes of each group of 4 bytes to get the real value that would be read from memory into a register.
ecfdffff in the output actually means ec fd ff ff. As a DWORD value (32-bit) you need to reverse the bytes to get the HEX value as you would expect when loaded from memory. ec fd ff ff reversed would be ff ff fd ec or the 32-bit value 0xfffffdec. Once you realize that then this makes a lot more sense. If you make this same adjustment for all the data in that table you'd get:
.rodata
08b0: 0x00020001 0xfffffdec 0xfffffdd4 0xfffffdbc
08c0: 0xfffffd9c 0xfffffd7c 0xfffffd6c 0xfffffd4c
08d0: 0xfffffd3c 0xfffffd2c 0xfffffd0c 0xfffffcec
08e0: 0xfffffcd4 0xfffffcb4 0xfffffe0c
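The byte reversal doesn't have to be done by hand. As a small sketch, Python's int.from_bytes and struct.unpack can interpret the dumped bytes as little-endian 32-bit values; the hex strings below are copied from the first row of the objdump -s output above.

```python
import struct

# One group from the dump, exactly as objdump -s shows it (memory order).
raw = bytes.fromhex("ecfdffff")
value = int.from_bytes(raw, byteorder="little")

# The whole first row (offset 0x8b0) decoded as four little-endian DWORDs.
row = bytes.fromhex("01000200" "ecfdffff" "d4fdffff" "bcfdffff")
dwords = struct.unpack("<4I", row)

print(hex(value), [hex(d) for d in dwords])
```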
Now if we look at the code you have it starts with:
530: lea rdx,[rip+0x37d] # 8b4 <_IO_stdin_used+0x4>
This doesn't load data from memory, it is computing the effective address of some data and places the address in RDX. The disassembly from OBJDUMP is displaying the code and data with the view that it is loaded in memory starting at 0x000000000000. When it is loaded into memory it may be placed at some other address. GCC in this case is producing position independent code (PIC). It is generated in such a way that the first byte of the program can start at an arbitrary address in memory.
The # 8b4 comment is the part we are concerned about (you can ignore the information after that). The disassembly is saying if the program was loaded at 0x0000000000000000 then the value loaded into RDX would be 0x8b4. How was that arrived at? This instruction starts at 0x530 but with RIP relative addressing the RIP (instruction pointer) is relative to the address just after the current instruction. The address the disassembler used was 0x537 (the byte after the current instruction is the address of the first byte of the next instruction). The instruction adds 0x37d to RIP and gets 0x537+0x37d=0x8b4. The address 0x8b4 happens to be in the .rodata section which you are given a dump of (as discussed above).
We now know that RDX contains the base of some data. The jmp rax suggests this is likely going to be a table of 32-bit values that are used to determine what memory location to jump to depending on the value in the control variable of a switch statement.
This statement appears to be storing the value 0 as a 32-bit value on the stack.
537: mov DWORD PTR [rsp-0xc],0x0
These appear to be variables that the compiler chose to store in registers (rather than memory).
53f: movabs r10,0xedd5a792ef95fa9e
549: mov r9d,0xffffffcc
R10 is being loaded with the 64-bit value 0xedd5a792ef95fa9e. R9D is the lower 32 bits of the 64-bit R9 register. The value 0xffffffcc is being loaded into the lower 32 bits of R9, but there is something else occurring. In 64-bit mode, if the destination of an instruction is a 32-bit register, the CPU automatically zero-extends the value into the upper 32 bits of the register. The CPU is guaranteeing us that the upper 32 bits are zeroed.
This is a NOP and doesn't do anything except align the next instruction to memory address 0x550. 0x550 is a value that is 16-byte aligned. This has some value and may hint that the instruction at 0x550 may be the first instruction at the top of a loop. An optimizer may place NOPs into the code to align the first instruction at the top of a loop to a 16-byte aligned address in memory for performance reasons:
54f: nop
Earlier the 32-bit stack based variable at rsp-0xc was set to zero. This reads the value 0 from memory as a 32-bit value and stores it in EAX. Since EAX is a 32-bit register being used as the destination for the instruction the CPU automatically filled the upper 32-bits of RAX to 0. So all of RAX is zero.
550: mov eax,DWORD PTR [rsp-0xc]
EAX is now being compared to 0xd. If it is above (ja) it goes to the instruction at 0x57c.
554: cmp eax,0xd
557: ja 57c <main+0x4c>
We then have this instruction:
559: movsxd rax,DWORD PTR [rdx+rax*4]
The movsxd instruction takes a 32-bit source operand (in this case the 32-bit value at memory address RDX+RAX*4), loads it into the bottom 32 bits of RAX, and then sign-extends the value into the upper 32 bits of RAX. Effectively, if the 32-bit value is negative (the most significant bit is 1), every one of the upper 32 bits of RAX will be set to 1. If the 32-bit value is not negative, the upper 32 bits of RAX will all be set to 0.
When this code is first encountered, RDX contains the base of some table at 0x8b4 from the beginning of the program loaded in memory. RAX is set to 0. Effectively the first 32 bits in the table are copied to RAX and sign-extended. As seen earlier, the value at offset 0x8b4 is 0xfffffdec. That 32-bit value is negative, so RAX contains 0xfffffffffffffdec.
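The sign extension can be modelled in a few lines. This is only a sketch of the arithmetic; the Python function named movsxd here is a hypothetical helper that mimics what the instruction does to a 32-bit value.

```python
def movsxd(value32: int) -> int:
    """Sign-extend a 32-bit value to 64 bits, as movsxd does."""
    value32 &= 0xFFFFFFFF
    if value32 & 0x80000000:
        # Bit 31 is set: fill the upper 32 bits with ones.
        return value32 | 0xFFFFFFFF00000000
    # Bit 31 is clear: the upper 32 bits stay zero.
    return value32


print(hex(movsxd(0xFFFFFDEC)))
```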
Now to the meat of the situation:
55d: add rax,rdx
560: jmp rax
RDX still holds the address to the beginning of a table in memory. RAX is being added to that value and stored back in RAX (RAX = RAX+RDX). We then JMP to the address stored in RAX. So this code all seems to suggest we have a JUMP table with 32-bit values that we are using to determine where we should go. So then the obvious question. What are the 32-bit values in the table? The 32-bit values are the difference between the beginning of the table and the address of the instruction we want to jump to.
We know the table is 0x8b4 from the location our program is loaded in memory. The C compiler told the linker to compute the difference between 0x8b4 and the address where the instruction we want to execute resides. If the program had been loaded into memory at 0x0000000000000000 (hypothetically), RAX = RAX+RDX would have resulted in RAX being 0xfffffffffffffdec + 0x8b4 = 0x00000000000006a0. We then use jmp rax to jump to 0x6a0. You didn't show the entire dump of memory but there is going to be code at 0x6a0 that will execute when the value passed to the switch statement is 0. Each 32-bit value in the JUMP table will be a similar offset to the code that will execute depending on the control variable in the switch statement. If we add 0x8b4 to all the entries in the table we get:
08b0: 0x000006a0 0x00000688 0x00000670
08c0: 0x00000650 0x00000630 0x00000620 0x00000600
08d0: 0x000005F0 0x000005e0 0x000005c0 0x000005a0
08e0: 0x00000588 0x00000568 0x000006c0
You should find that in the code you haven't provided us that these addresses coincide with code that appears after the jmp rax.
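The whole computation can be reproduced with a short sketch, assuming (as the disassembly does) that the program is loaded at address 0. The table bytes are copied from the .rodata dump starting at 0x8b4; the variable names are made up for illustration.

```python
import struct

# RIP-relative LEA: address of the next instruction plus the displacement.
TABLE_BASE = 0x537 + 0x37D  # 0x8b4

# The 14 table entries in memory order, as shown by objdump -s.
table_bytes = bytes.fromhex(
    "ecfdffff" "d4fdffff" "bcfdffff"
    "9cfdffff" "7cfdffff" "6cfdffff" "4cfdffff"
    "3cfdffff" "2cfdffff" "0cfdffff" "ecfcffff"
    "d4fcffff" "b4fcffff" "0cfeffff"
)

# "<14i" reads 14 little-endian signed 32-bit values, which matches what
# movsxd produces once the value is sign-extended.
offsets = struct.unpack("<14i", table_bytes)

# add rax, rdx: each jump target is the table base plus the signed offset.
targets = [TABLE_BASE + offset for offset in offsets]

print([hex(t) for t in targets])
```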
Given that the memory address 0x550 was aligned, I have a hunch that this switch statement is inside a loop that keeps executing as some kind of state machine until the proper conditions are met for it to exit. Likely the value of the control variable used for the switch statement is changed by the code in the switch statement itself. Each time the switch statement is run the control variable has a different value and will do something different.
The control variable for the switch statement was originally checked for the value being above 0x0d (13). The table starting at 0x8b4 in the .rodata section has 14 entries. One can assume the switch statement probably has 14 different states (cases).
but this program doesn't declare it
You're looking at disassembly of machine code + data. It's all just bytes in memory. Any labels the disassembler does manage to show are ones that got left in the executable's symbol table. They're irrelevant to how the CPU runs the machine code.
(The ELF program headers tell the OS's program loader how to map it into memory, and where to jump to as an entry point. This has nothing to do with symbols, unless a shared library references some globals or functions defined in the executable.)
You can single-step the code in GDB and watch register values change.
At 559, rax appears for the first time.
EAX is the low 32 bits of RAX. Writing to EAX zero-extends into RAX implicitly. From mov DWORD PTR [rsp-0xc],0x0 and the later reload, we know that RAX=0.
This must have been un-optimized compiler output (or volatile int idx = 0; to defeat constant propagation), otherwise it would know at compile time that RAX=0 and could optimize away everything else.
lea rdx,[rip+0x37d] # 8b4
A RIP-relative LEA puts the address of static data into a register. It's not a load from memory. (That happens later when movsxd with an indexed addressing mode uses RDX as the base address.)
The disassembler worked out the address for you; it's RDX = 0x8b4. (Relative to the start of the file; when actually running, the program would be mapped at a virtual address like 0x55555...000.)
554: cmp eax,0xd
557: ja 57c <main+0x4c>
559: movsxd rax,DWORD PTR [rdx+rax*4]
55d: add rax,rdx
560: jmp rax
This is a jump table. First it checks for an out-of-bounds index with cmp eax,0xd, then it indexes a table of 32-bit signed offsets using EAX (movsxd with an addressing mode that scales RAX by 4), and adds that to the base address of the table to get a jump target.
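A rough model of that sequence, including the bounds check, might look like the sketch below. The function name jump_target and the sample offsets are hypothetical. One detail worth showing is that ja is an unsigned comparison, so a negative index is treated as a huge value and also takes the out-of-bounds path.

```python
def jump_target(index: int, table_base: int, offsets: list, default: int) -> int:
    """Model of cmp eax,0xd / ja / movsxd / add / jmp for a 14-entry table."""
    index &= 0xFFFFFFFF            # EAX is compared as an unsigned 32-bit value
    if index > 0xD:                # ja: index out of bounds, go to the default
        return default
    # movsxd + add: signed 32-bit offset plus the table base, wrapped to 64 bits.
    return (table_base + offsets[index]) & 0xFFFFFFFFFFFFFFFF


# Illustrative table: only the first entry matches the real dump (-0x214).
sample_offsets = [-0x214] + [0] * 13
print(hex(jump_target(0, 0x8B4, sample_offsets, 0x57C)))
```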
GCC could just make a jump table of 64-bit absolute pointers, but chooses not to so that .rodata is position-independent as well and doesn't need load-time fixups in a PIE executable. (Even though Linux does support doing that.) See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 where this is discussed (although the main focus of that bug is that gcc -fPIE can't turn a switch into a table lookup of string addresses, and actually still uses a jump table)
The jump-offset table address is in RDX, this is what was set up with the earlier LEA.
This is a code snippet from the header.S file in the kernel code. I could not understand what the lretw instruction does, even though I've checked many online sources for it.
# We will have entered with %cs = %ds+0x20, normalize %cs so
# it is on par with the other segments.
pushw %ds
pushw $6f
lretw
Can anyone help me understand this instruction?
ret is the instruction to return from a procedure. So basically it pops the return address from the stack into the EIP register.
The l prefix is there to indicate a far return from a procedure. In this case, the instruction first pops a value from the stack into the EIP register and then pops a second value into the CS register.
The w suffix is there because at this step we are running in real mode, and operands are 16 bits wide.
The exact code is:
pushw %ds
pushw $6f
lretw
6:
The 6: is very important here. So what this does is: push the value of ds onto the stack, push the address of the 6 label onto the stack, and then trigger the lretw instruction. So basically, it will load the address of label 6 into the instruction pointer register, and load the cs register with the value of the ds register. So this is just a trick to continue execution at label 6 with a changed cs register value.
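The effect of the three instructions can be sketched with a tiny model, using a Python list as the stack and a dict for the registers. The register values and the offset of label 6 are invented for illustration (the only thing taken from the source is that cs starts out as ds + 0x20); the pop order is the point.

```python
def far_return_16(stack: list, regs: dict) -> None:
    """Model of lretw: pop the 16-bit IP first, then the 16-bit CS."""
    regs["ip"] = stack.pop()
    regs["cs"] = stack.pop()


# Hypothetical entry state: %cs = %ds + 0x20, as the kernel comment says.
regs = {"cs": 0x1020, "ds": 0x1000, "ip": 0x0000}
LABEL_6 = 0x0042  # made-up offset of the "6:" label

stack = []
stack.append(regs["ds"])  # pushw %ds  (will become the new CS)
stack.append(LABEL_6)     # pushw $6f  (will become the new IP)
far_return_16(stack, regs)

print(hex(regs["cs"]), hex(regs["ip"]))
```

After the far return, cs equals ds and execution continues at the label, which is the normalization the comment describes.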
You should download http://www.intel.com/design/intarch/manuals/243191.htm which gives precise details for all instructions, including a pseudo-code that details what each instruction is doing.