Assembler messages: Error: junk when running as on Linux

I am currently studying the material here, where the author is creating an OS on Windows using MinGW. I am trying to follow along on Ubuntu, but when I get to a particular stage, namely assembling the object file, I receive an error.
The command I am using is:
as -o boot.o boot.s
and here is my error:
as -o boot.o boot.s
boot.s: Assembler messages:
boot.s:22: Error: junk `iResSect' after expression
boot.s:24: Error: invalid character ',' in mnemonic
Makefile:10: recipe for target 'boot.o' failed
make: *** [boot.o] Error 1
Here are some of the files:
boot.s:
.code16
.intel_syntax noprefix
.text
.org 0x0
LOAD_SEGMENT = 0x1000 # Load the 2nd Stage to Here
FAT_SEGMENT = 0x0ee0 # Load FAT to here
.global main
main:
jmp short start
nop
.include "bootsector.s"
.include "macros.s"
start:
mInitSegments
mResetDiskSystem
mWriteString loadmsg
mFindFile filename, LOAD_SEGMENT
mReadFAT FAT_SEGMENT
mReadFile LOAD_SEGMENT, FAT_SEGMENT
mStartSecondStage
#
# Booting has failed because of a disk error
# Inform the user and reboot.
#
bootFailure:
mWriteString diskerror
mReboot
.include "functions.s"
# DATA
filename: .asciz "2NDSTAGEBIN"
rebootmsg: .asciz "Press any key to reboot.\r\n"
diskerror: .asciz "Disk error. "
loadmsg: .asciz "Loading SamOS...\r\n"
root_strt: .byte 0,0 # Holds offset of Root Dir on disk
root_scts: .byte 0,0 # Hold No. Sectors in Root Dir
file_strt: .byte 0,0 # Hold offset of bootloaded on disk
.fill (510-(.-main)), 1, 0
BootMagic: .int 0xAA55
bootsector.s:
bootsector:
iOEM: .ascii "DevOS   " # OEM String
iSectSize: .word 0x200 # Bytes per Sector
iClustSize: .byte 1 # Sectors per Cluster
iResSect: .word 1 # No. Reserved Sectors
iFatCnt: .byte 2 # No. FAT Copies
iRootSize: .word 224 # Size of Root Dir.
iTotalSect: .word 2880 # Total no. of sectors (<32mb)
iMedia: .byte 0xF0 # Media Descriptor
iFatSize: .word 9 # Size of each FAT
iTrackSect: .word 9 # Sectors per Track
iHeadCnt: .word 2 # No. Read/Write Heads
iHiddenSect: .int 0 # No. Hidden Sectors
iSect32: .int 0 # No. Sectors if > 32mb
iBootDrive: .byte 0 # Boot Sectors comes from here
iReserved: .byte 0 # No. Reserved Sectors
iBootSign: .byte 0x29 # Extended boot sect. signature
iVolID: .ascii "seri" # Disk Serial
acVolLabel: .ascii "MYVOLUME   " # Placeholder
acFSType: .ascii "FAT16   "
Any suggestions as to why this is happening?
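The BIOS parameter block declared in bootsector.s has fixed field widths, so the string fields have to be space-padded to exactly 8, 11 and 8 bytes for every following field to land at the right offset. As a cross-check, here is a minimal C sketch of the same layout. It assumes a GCC/Clang-style packed attribute, the struct name is hypothetical, and the member names simply mirror the labels in bootsector.s. Assuming macros.s contains only macro definitions (so it emits no bytes), the compiler can confirm that the code at start: begins at offset 0x3E:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of main: (jmp short + nop) followed by the fields
 * in bootsector.s. Packed so the compiler inserts no alignment padding. */
struct boot_sector_header {
    uint8_t  jmp[3];         /* jmp short start ; nop               */
    char     iOEM[8];        /* OEM string, space-padded to 8 bytes */
    uint16_t iSectSize;      /* bytes per sector                    */
    uint8_t  iClustSize;     /* sectors per cluster                 */
    uint16_t iResSect;       /* reserved sectors                    */
    uint8_t  iFatCnt;        /* number of FAT copies                */
    uint16_t iRootSize;      /* root directory entries              */
    uint16_t iTotalSect;     /* total sectors (< 32 MB)             */
    uint8_t  iMedia;         /* media descriptor                    */
    uint16_t iFatSize;       /* sectors per FAT                     */
    uint16_t iTrackSect;     /* sectors per track                   */
    uint16_t iHeadCnt;       /* read/write heads                    */
    uint32_t iHiddenSect;    /* hidden sectors                      */
    uint32_t iSect32;        /* sectors if > 32 MB                  */
    uint8_t  iBootDrive;     /* BIOS drive number                   */
    uint8_t  iReserved;
    uint8_t  iBootSign;      /* 0x29 = extended boot signature      */
    uint8_t  iVolID[4];      /* volume serial                       */
    char     acVolLabel[11]; /* volume label, space-padded to 11    */
    char     acFSType[8];    /* file system type, space-padded to 8 */
} __attribute__((packed));

/* Offset at which the boot code (start:) would begin. */
enum { BOOT_CODE_OFFSET = sizeof(struct boot_sector_header) };

_Static_assert(offsetof(struct boot_sector_header, iSectSize) == 0x0B,
               "BPB fields start at 0x0B");
_Static_assert(BOOT_CODE_OFFSET == 0x3E, "boot code starts at 0x3E");
```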


creating Linux i386 a.out executable shorter than 4097 bytes

I'm trying to create a Linux i386 a.out executable shorter than 4097 bytes, but all my efforts have failed so far.
I'm compiling it with:
$ nasm -O0 -f bin -o prog prog.nasm && chmod +x prog
I'm testing it in a Ubuntu 10.04 i386 VM running Linux 2.6.32 with:
$ sudo modprobe binfmt_aout
$ sudo sysctl vm.mmap_min_addr=4096
$ ./prog; echo $?
Hello, World!
0
This is the source code of the 4097-byte executable which works:
; prog.nasm
bits 32
cpu 386
org 0x1000 ; Linux i386 a.out QMAGIC file format has this.
SECTION_text:
a_out_header:
dw 0xcc ; magic=QMAGIC; Demand-paged executable with the header in the text. The first page (0x1000 bytes) is unmapped to help trap NULL pointer references.
db 0x64 ; type=M_386
db 0 ; flags=0
dd SECTION_data - SECTION_text ; a_text=0x1000 (byte size of .text; mapped as r-x)
dd SECTION_end - SECTION_data ; a_data=0x1000 (byte size of .data; mapped as rwx, not just rw-)
dd 0 ; a_bss=0 (byte size of .bss)
dd 0 ; a_syms=0 (byte size of symbol table data)
dd _start ; a_entry=0x1020 (in-memory address of _start == file offset of _start + 0x1000)
dd 0 ; a_trsize=0 (byte size of relocation info for .text)
dd 0 ; a_drsize=0 (byte size of relocation info for .data)
_start: mov eax, 4 ; __NR_write
mov ebx, 1 ; argument: STDOUT_FILENO
mov ecx, msg ; argument: address of string to output
mov edx, msg_end - msg ; argument: number of bytes
int 0x80 ; syscall
mov eax, 1 ; __NR_exit
xor ebx, ebx ; argument: EXIT_SUCCESS == 0.
int 0x80 ; syscall
msg: db 'Hello, World!', 10
msg_end:
times ($$ - $) & 0xfff db 0 ; padding to multiple of 0x1000 ; !! is this needed?
SECTION_data: db 0
; times ($$ - $) & 0xfff db 0 ; padding to multiple of 0x1000 ; !! is this needed?
SECTION_end:
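The 32-byte header at the top of this listing can be mirrored as a C struct, which makes the a_entry arithmetic in the comments (0x1000 + 0x20 = 0x1020) easy to check. This sketch follows the field order of the NASM source; the struct and function names are hypothetical, and the kernel's own struct exec instead packs the magic, machine type and flags into a single a_info word:

```c
#include <stddef.h>
#include <stdint.h>

/* Field order follows the NASM listing; the 16-bit magic, 8-bit machine
 * type and 8-bit flags together occupy the first 32-bit word. */
struct aout_header {
    uint16_t magic;    /* 0xcc = QMAGIC */
    uint8_t  mtype;    /* 0x64 = M_386  */
    uint8_t  flags;
    uint32_t a_text;   /* for QMAGIC, the header counts as part of the text */
    uint32_t a_data;
    uint32_t a_bss;
    uint32_t a_syms;
    uint32_t a_entry;
    uint32_t a_trsize;
    uint32_t a_drsize;
};

enum {
    QMAGIC_LOAD_ADDR = 0x1000,  /* org 0x1000 in the listing */
    AOUT_HEADER_SIZE = sizeof(struct aout_header)
};

/* With QMAGIC, file offset 0 is mapped at 0x1000, so code placed right
 * after the header (like _start) ends up at this address. */
uint32_t qmagic_entry_after_header(void)
{
    return QMAGIC_LOAD_ADDR + AOUT_HEADER_SIZE;
}
```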
How can I make the executable file smaller? (Clarification: I still want a Linux i386 a.out executable. I know that it's possible to create a smaller Linux i386 ELF executable.) There are several thousand bytes of padding at the end of the file, which seem to be required.
So far I've discovered the following rules:
If a_text or a_data is 0, Linux doesn't run the program. (See the relevant Linux source blocks 1 and 2.)
If a_text is not a multiple of 0x1000 (4096), Linux doesn't run the program. (See the relevant Linux source blocks 1 and 2.)
If the file is shorter than a_text + a_data bytes, Linux doesn't run the program. (See the relevant Linux source code location.)
Thus file_size >= a_text + a_data >= 0x1000 + 1 == 4097 bytes.
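The three rules above can be written as a small predicate, which makes the 4097-byte lower bound concrete: the smallest values that pass are a_text = 0x1000 and a_data = 1. This is a simplified model of the conditions described in this post (the function name is hypothetical), not the kernel's binfmt_aout code:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000u

/* Simplified model of the observed rules for a QMAGIC a.out on Linux 2.6.32. */
bool qmagic_would_load(uint32_t a_text, uint32_t a_data, uint64_t file_size)
{
    if (a_text == 0 || a_data == 0)
        return false;                    /* a zero-length mmap(2) fails       */
    if (a_text % PAGE_SIZE_4K != 0)
        return false;                    /* data's file offset must be aligned */
    /* the file must contain both the text and the data */
    return file_size >= (uint64_t)a_text + a_data;
}
```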
The combinations nasm -f aout + ld -s -m i386linux, nasm -f elf + ld -s -m i386linux, and as -32 + ld -s -m i386linux produce an executable of 4100 bytes, which doesn't even work (because its a_data is 0); adding a single byte to section .data makes the executable file 8196 bytes long, and then it works. Thus this path doesn't lead to fewer than 4097 bytes.
Did I miss something?
TL;DR It doesn't work.
It is impossible to make a Linux i386 a.out QMAGIC executable shorter than 4097 bytes work on Linux 2.6.32, based on evidence in the Linux kernel source code of the binfmt_aout module.
Details:
If a_text is 0, Linux doesn't run the program. (Evidence for this check: a_text is passed as the length argument to mmap(2) here.)
If a_data is 0, Linux doesn't run the program. (Evidence for this check: a_data is passed as the length argument to mmap(2) here.)
If a_text is not a multiple of 0x1000 (4096), Linux doesn't run the program. (Evidence for this check: fd_offset + ex.a_text is passed as the offset argument to mmap(2) here. For QMAGIC, fd_offset is 0.)
If the file is shorter than a_text + a_data bytes, Linux doesn't run the program. (Evidence for this check: the file size is compared to a_text + a_data + a_syms + ... here.)
Thus file_size >= a_text + a_data >= 0x1000 + 1 == 4097 bytes.
I've also tried OMAGIC, ZMAGIC and NMAGIC, but none of them worked. Details:
For OMAGIC, read(2) is used instead of mmap(2) within here, so it can work. However, Linux tries to load the code to virtual memory address 0 (N_TXTADDR is 0), and this causes SIGKILL (if non-root and vm.mmap_min_addr is larger than 0) or SIGILL (otherwise), so it doesn't work. Maybe the reason for SIGILL is that the page allocated by set_brk is not executable (but that should be indicated by SIGSEGV); this could be investigated further.
For ZMAGIC and NMAGIC, read(2) is used instead of mmap(2) within here if fd_offset is not a multiple of the page size (0x1000). fd_offset is 32 for NMAGIC and 1024 for ZMAGIC, so the read(2) path applies. However, it doesn't work for the same reason (the code is loaded to virtual memory address 0).
I wonder if it's possible to run OMAGIC, ZMAGIC or NMAGIC executables at all on Linux 2.6.32 or later.

How to print value of a register using spike?

I have assembly code for a RISC-V machine.
I have added an instruction that reads the floating-point control and status register and stores the floating-point flags in register a3. I want to print its value to demonstrate that a flag gets set when a floating-point exception occurs.
I tried using spike. In debug mode, spike has a command to print the value of a register:
: reg 0 a3
prints the value of a3.
But first I have to reach the point I'm interested in, and I do not know how to get there.
.file "learn_Assembly.c"
.option nopic
.text
.comm a,4,4
.comm b,4,4
.align 1
.globl main
.type main, @function
main:
addi sp,sp,-32
sd s0,24(sp)
addi s0,sp,32
lui a5,%hi(a)
lui a4,%hi(.LC0)
flw fa5,%lo(.LC0)(a4)
fsw fa5,%lo(a)(a5)
lui a5,%hi(b)
lui a4,%hi(.LC1)
flw fa5,%lo(.LC1)(a4)
fsw fa5,%lo(b)(a5)
lui a5,%hi(a)
flw fa4,%lo(a)(a5)
lui a5,%hi(b)
flw fa5,%lo(b)(a5)
fmul.s fa5,fa4,fa5
frflags a3
fsw fa5,-20(s0)
li a5,0
mv a0,a5
ld s0,24(sp)
addi sp,sp,32
jr ra
.size main, .-main
.section .rodata
.align 2
.LC0:
.word 1082130432
.align 2
.LC1:
.word 1077936128
.ident "GCC: (GNU) 8.2.0"
The other option is to print it somehow using assembly instructions, which I am not sure how to do.
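Before hunting for the right pc, it may help to decode the two .rodata constants. They are IEEE-754 single-precision bit patterns, and the short C sketch below (the helper name is hypothetical; it assumes float is binary32 with the same byte order as uint32_t, which holds on common platforms) shows that a = 4.0 and b = 3.0. Since 4.0 * 3.0 = 12.0 is exactly representable, this fmul.s should not raise any exception flag, so a3 would be expected to read 0 after frflags; an operation such as 1.0/3.0, which is inexact, would be needed to see a flag set.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit .word value as an IEEE-754 binary32 float. */
float word_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);  /* well-defined type punning in C */
    return f;
}
```

For example, word_to_float(1082130432u) gives the value stored at .LC0, and word_to_float(1077936128u) the one at .LC1.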
To understand the flow of your program, you could create an object dump of the compiled ELF.
To create the ELF:
riscv64-unknown-elf-gcc assembly_code.s -o executable.elf
Then create the object dump with:
riscv64-unknown-elf-objdump -d executable.elf > executable.dump
executable.dump will contain the program flow, like this:
executable.elf: file format elf64-littleriscv
Disassembly of section .text:
00000000000100b0 <_start>:
100b0: 00002197 auipc gp,0x2
100b4: 35018193 addi gp,gp,848 # 12400 <__global_pointer$>
100b8: 81818513 addi a0,gp,-2024 # 11c18 <_edata>
100bc: 85818613 addi a2,gp,-1960 # 11c58 <_end>
100c0: 8e09 sub a2,a2,a0
100c2: 4581 li a1,0
100c4: 1e6000ef jal ra,102aa <memset>
100c8: 00000517 auipc a0,0x0
100cc: 13850513 addi a0,a0,312 # 10200 <__libc_fini_array>
100d0: 104000ef jal ra,101d4 <atexit>
100d4: 174000ef jal ra,10248 <__libc_init_array>
100d8: 4502 lw a0,0(sp)
100da: 002c addi a1,sp,8
100dc: 4601 li a2,0
100de: 0be000ef jal ra,1019c <main>
100e2: 0fe0006f j 101e0 <exit>
....... ........ .................
....... ........ .................
....... ........ .................
Find the pc of the instruction at which a3 holds the value you need.
Then, in spike, use the until command to run up to that pc:
: until pc 0 <*required pc*>
Note : Your compiler and assembler names may vary.
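Finding the pc in a long dump can be scripted. The sketch below is a hypothetical helper that scans objdump -d text and returns the address of the instruction after a given mnemonic (for example frflags). It assumes RISC-V style lines of the form "<hex addr>: <hex encoding> <mnemonic> <operands>", with the encoding as a single token. The address after the mnemonic is the useful one because pc names the next instruction to execute: presumably until stops before the instruction at that pc runs, so a3 holds the frflags result only once the following instruction is reached.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parse one objdump -d instruction line such as
 *   "   100c4:\t1e6000ef          \tjal\tra,102aa <memset>"
 * Stores the address and copies the mnemonic; returns false otherwise. */
static bool parse_insn_line(const char *line, unsigned long *addr,
                            char *mnem, size_t mnem_len)
{
    char *end;
    while (isspace((unsigned char)*line)) line++;
    if (!isxdigit((unsigned char)*line)) return false;
    *addr = strtoul(line, &end, 16);
    if (*end != ':') return false;                 /* not an instruction line */
    const char *p = end + 1;
    while (isspace((unsigned char)*p)) p++;        /* skip to the encoding    */
    while (isxdigit((unsigned char)*p)) p++;       /* skip the encoding       */
    while (isspace((unsigned char)*p)) p++;        /* skip to the mnemonic    */
    size_t n = 0;
    while (p[n] && !isspace((unsigned char)p[n])) n++;
    if (n == 0 || n >= mnem_len) return false;
    memcpy(mnem, p, n);
    mnem[n] = '\0';
    return true;
}

/* Return the address of the instruction after the first `mnemonic` in
 * `dump` (the text of an objdump -d listing), or 0 if none is found. */
unsigned long addr_after_mnemonic(const char *dump, const char *mnemonic)
{
    bool found = false;
    const char *line = dump;
    while (line && *line) {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        char buf[256], mnem[32];
        unsigned long addr;
        if (len < sizeof buf) {
            memcpy(buf, line, len);
            buf[len] = '\0';
            if (parse_insn_line(buf, &addr, mnem, sizeof mnem)) {
                if (found) return addr;            /* the next instruction */
                if (strcmp(mnem, mnemonic) == 0) found = true;
            }
        }
        line = nl ? nl + 1 : NULL;
    }
    return 0;
}
```

The result can be printed with printf("%lx\n", addr) and passed to until pc 0, assuming spike reads that value as hexadecimal.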
You can use spike's until command to execute until a desired equality is reached:
: until pc 0 2020 (stop when pc=2020)
As explained here (interactive debug).
Once that value is reached, you can use reg to read the value you want.

Why does strace believe this memory is uninitialized when attaching to a process?

I have an extremely simple program that does nothing more than call recvfrom() in a loop. According to its man page, one of the arguments is a pointer to the length of the address. That length is initialized in the .data section to the integer value 16. I noticed some strange behavior when I attach to the already-running process to trace it, which is not present when I trace the process directly (when I start it under strace). Look at the end of each line:
# strace -x -s 10 -e trace=recvfrom ./test
recvfrom(3, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 32, 0, {sa_family=AF_INET, sin_port=htons(42134), sin_addr=inet_addr("127.0.0.1")}, [16]) = 32
recvfrom(3, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 32, 0, {sa_family=AF_INET, sin_port=htons(49442), sin_addr=inet_addr("127.0.0.1")}, [16]) = 32
recvfrom(3, ^Cstrace: Process 18909 detached
<detached ...>
# ./test &
# strace -x -s 10 -e trace=recvfrom -p $!
strace: Process 18916 attached
recvfrom(3, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 32, 0, {sa_family=AF_INET, sin_port=htons(50906), sin_addr=inet_addr("127.0.0.1")}, [1999040176->16]) = 32
recvfrom(3, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 32, 0, {sa_family=AF_INET, sin_port=htons(52956), sin_addr=inet_addr("127.0.0.1")}, [16]) = 32
recvfrom(3, ^Cstrace: Process 18916 detached
<detached ...>
When I trace it directly, the address length argument shows as [16], which makes sense: the argument is a pointer to an int with the value 16. However, when I attach to the process and trace it, the very first call shows it as uninitialized, e.g. [1999040176->16]. This happens for the first syscall every time I attach, but for all subsequent calls it shows correctly as [16]. If I detach from the process and re-attach, the first call again shows it as having uninitialized memory.
To be brief:
When I run it under strace, the last argument shows [16] for every recvfrom().
When I attach to it when it is already running, the last argument shows things like [1999040176->16] in the first call to recvfrom(), and [16] in all subsequent ones.
If I detach from it and attach again, the first call to recvfrom() again displays this odd behavior, and all subsequent calls display the expected [16].
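For readers unfamiliar with the notation: for a value-result argument like recvfrom()'s address length, strace prints [in->out] when the value it has for syscall entry differs from the value at exit, and just [out] when the two match. A tiny C model of that formatting rule (a hypothetical helper, not strace's source) makes explicit that in the attach case strace believes the entry value was 1999040176:

```c
#include <stdio.h>

/* Model of how strace renders a value-result int argument:
 * "[out]" if unchanged across the call, "[in->out]" otherwise.
 * Returns the number of characters written, as snprintf does. */
int format_value_result(char *buf, size_t size, unsigned in, unsigned out)
{
    if (in == out)
        return snprintf(buf, size, "[%u]", out);
    return snprintf(buf, size, "[%u->%u]", in, out);
}
```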
The program itself is correct. Here is the program (written in MIPS assembly):
.section .text
.global __start
__start:
# socket
li $v0,4183
li $a0,2
li $a1,1
li $a2,0
syscall
sw $v0,sockfd
# bind
li $v0,4169
lw $a0,sockfd
la $a1,sockaddr_b
li $a2,16
syscall
loop:
# recvfrom
li $v0,4176
lw $a0,sockfd
la $a1,buffer
li $a2,32
li $a3,0
la $t0,sockaddr_a
sw $t0,16($sp)
la $t0,addrlen
sw $t0,20($sp)
syscall
j loop
.section .bss
sockaddr_a: .space 16
buffer: .space 32
sockfd: .space 4
.section .data
addrlen: .int 16
.section .rodata
sockaddr_b: .hword 2,1234,0,0

ARM Assembly Branch Segmentation Fault

I'm new to assembly and I'm currently getting a segmentation fault when executing the following:
.global _start # Provide program starting address to linker
_start: mov R0,#0 # A value of 1 indicates "True"
bl v_bool # Call subroutine to display "True" or "False"
mov R0,#0 # Exit Status code of 0 for "normal completion"
mov R7,#1 # Service command 1 terminates this program
svc 0 # Issue Linux command to terminate program
# Subroutine v_bool will display "True" or "False" on the monitor
# R0: contains 0 implies false; non-zero implies true
# LR: Contains the return address
# Registers R0 through R7 will be used by v_bool and not saved
v_bool: cmp R0,#0 # Set condition flags for True or False
beq setf
bne sett
mov R2,#6 # Number of characters to be displayed at a time.
mov R0,#1 # Code for stdout (standard output, monitor)
mov R7,#4 # Linux service command code to write.
svc 0 # Call Linux command
bx LR # Return to the calling program
sett: ldr R1,=T_msg
setf: ldr R1,=F_msg
.data
T_msg: .ascii "True  " # ASCII string to display if true
F_msg: .ascii "False " # ASCII string to display if false
.end
I've used the debugger to find that the causes of the segmentation fault are the two branches sett and setf, and I understand that this is caused by the program trying to write to an illegal memory location.
However, I do not understand why these branches are not able to write to R1, or what I should do to fix this. Any help is greatly appreciated.
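The issue is not the instructions themselves. The problem is that after executing the instruction at, for instance, setf, execution simply continues with whatever follows it in memory, which is not valid code. (sett has the same problem, and it also falls straight through into setf, so R1 would end up pointing at F_msg either way.) You need to make sure that execution after setf and sett goes back to the code of v_bool that performs the write and returns.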
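The control flow the answer describes, choosing the string in one of two branches and then rejoining a single write-and-return path, can be sketched in C. This is only an analogy for the structure, and the function names are hypothetical; in the assembly, the same effect needs, for example, a branch after the ldr in sett that jumps past setf to the shared write code.

```c
#include <unistd.h>

/* Both strings padded to 6 bytes to match mov R2,#6. */
static const char T_msg[] = "True  ";
static const char F_msg[] = "False ";

/* C analogue of v_bool's branch: pick the message, then fall into
 * one shared path instead of running off the end of the code. */
const char *v_bool_msg(int r0)
{
    const char *r1;
    if (r0 == 0)        /* cmp R0,#0 ; beq setf                     */
        r1 = F_msg;     /* setf: ldr R1,=F_msg                      */
    else
        r1 = T_msg;     /* sett: ldr R1,=T_msg, then branch onward  */
    return r1;
}

ssize_t v_bool(int r0)
{
    /* mov R2,#6 ; mov R0,#1 ; mov R7,#4 ; svc 0 ; bx LR */
    return write(STDOUT_FILENO, v_bool_msg(r0), 6);
}
```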

Understanding how $ works in assembly

len: equ 2
len: db 2
Are they the same, producing a label that can be used instead of 2? If not, then what is the advantage or disadvantage of each declaration form? Can they be used interchangeably?
The first is equate, similar to C's:
#define len 2
in that it doesn't actually allocate any space in the final code; it simply sets the len symbol to be equal to 2. Then, when you use len later on in your source code, it's the same as if you're using the constant 2.
The second is define byte, similar to C's:
int len = 2;
It does actually allocate space, one byte in memory, stores a 2 there, and sets len to be the address of that byte.
Here's some pseudo-assembler code that shows the distinction:
line addr code label instruction
---- ---- -------- ----- -----------
1 0000 org 1234h
2 1234 elen equ 2
3 1234 02 dlen db 2
4 1235 44 02 00 mov ax, elen
5 1238 44 34 12 mov ax, dlen
Line 1 simply sets the assembly address to be 1234h, to make it easier to explain what's happening.
In line 2, no code is generated; the assembler simply loads elen into the symbol table with the value 2. Since no code has been generated, the address does not change.
Then, when you use it on line 4, it loads that value into the register.
Line 3 shows that db is different, it actually allocates some space (one byte) and stores the value in that space. It then loads dlen into the symbol table but gives it the value of that address 1234h rather than the constant value 2.
When you later use dlen on line 5, you get the address, which you would have to dereference to get the actual value 2.
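The same distinction can be sketched in C, as an analogy only (the names mirror the pseudo-assembler listing; the functions are hypothetical): an enumeration constant behaves like elen, a value with no storage of its own, while a one-byte object behaves like dlen, whose name stands for an address that has to be dereferenced to get 2.

```c
#include <stdint.h>

enum { elen = 2 };                 /* like "elen equ 2": no storage        */
static const uint8_t dlen = 2;     /* like "dlen db 2": one byte in memory */

/* "mov ax, elen" loads the constant itself. */
int load_elen(void) { return elen; }

/* "mov ax, dlen" in the listing yields the address; reading the
 * value 2 requires a dereference. */
const uint8_t *address_of_dlen(void) { return &dlen; }
```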
Summary
NASM 2.10.09 ELF output:
db does not have any magic effects: it simply outputs bytes directly to the output object file.
If those bytes happen to be in front of a symbol, the symbol will point to that value when the program starts.
If you are in the text section, your bytes will get executed.
Whether you use db or dw, etc., that does not specify the size of the symbol: the st_size field of the symbol table entry is not affected.
equ makes the symbol in the current line have st_shndx == SHN_ABS magic value in its symbol table entry.
Instead of outputting a byte to the current object file location, it outputs it to the st_value field of the symbol table entry.
All else follows from this.
To understand what that really means, you should first understand the basics of the ELF standard and relocation.
SHN_ABS theory
SHN_ABS tells the linker that:
relocation is not to be done on this symbol
the st_value field of the symbol entry is to be used as a value directly
Contrast this with "regular" symbols, in which the value of the symbol is a memory address instead, and must therefore go through relocation.
Since it does not point to memory, SHN_ABS symbols can be effectively removed from the executable by the linker by inlining them.
But they are still regular symbols on object files and do take up memory there, and could be shared amongst multiple files if global.
Sample usage
section .data
x: equ 1
y: db 2
section .text
global _start
_start:
mov al, x
; al == 1
mov al, [y]
; al == 2
Note that since the symbol x contains a literal value, no dereference [] is needed for it, unlike for y.
If we wanted to use x from a C program, we'd need something like:
#include <stdint.h>
#include <stdio.h>
extern char x;
printf("%d", (int)(uintptr_t)&x);
and, in the asm:
global x
Empirical observation of generated output
We can observe what we've said before with:
nasm -felf32 -o equ.o equ.asm
ld -melf_i386 -o equ equ.o
Now:
readelf -s equ.o
contains:
Num: Value Size Type Bind Vis Ndx Name
4: 00000001 0 NOTYPE LOCAL DEFAULT ABS x
5: 00000000 0 NOTYPE LOCAL DEFAULT 1 y
Ndx is st_shndx, so we see that x is SHN_ABS while y is not.
Also see that Size is 0 for y: db in no way told y that it was a single byte wide. We could simply add two db directives to allocate 2 bytes there.
And then:
objdump -dr equ
gives:
08048080 <_start>:
8048080: b0 01 mov $0x1,%al
8048082: a0 88 90 04 08 mov 0x8049088,%al
So we see that 0x1 was inlined into the instruction, while y got the value of a relocated address, 0x8049088.
Tested on Ubuntu 14.04 AMD64.
Docs
http://www.nasm.us/doc/nasmdoc3.html#section-3.2.4:
EQU defines a symbol to a given constant value: when EQU is used, the source line must contain a label. The action of EQU is to define the given label name to the value of its (only) operand. This definition is absolute, and cannot change later. So, for example,
message db 'hello, world'
msglen equ $-message
defines msglen to be the constant 12. msglen may not then be redefined later. This is not a preprocessor definition either: the value of msglen is evaluated once, using the value of $ (see section 3.5 for an explanation of $) at the point of definition, rather than being evaluated wherever it is referenced and using the value of $ at the point of reference.
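The msglen example has a close C counterpart: the length is a constant fixed when the source is translated, not something computed while the program runs. A sketch (the function name is hypothetical; the - 1 drops the terminating NUL that a C string literal adds and db does not):

```c
#include <stddef.h>

static const char message[] = "hello, world";  /* like: message db 'hello, world' */

/* like: msglen equ $-message (evaluated once, where it is defined) */
enum { msglen = sizeof message - 1 };

size_t message_length(void) { return msglen; }
```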
See also
Analogous question for GAS: Difference between .equ and .word in ARM Assembly? .equiv seems to be the closest GAS equivalent.
equ: preprocessor time. Analogous to #define, but most assemblers lack an #undef, and can't have anything but an atomic constant of a fixed number of bytes on the right-hand side, so floats, doubles, and lists are not supported by most assemblers' equ directive.
db: compile time. The value given to db is stored in the binary output by the assembler at a specific offset. equ allows you to define constants that would otherwise need to be hardcoded or loaded with a mov. db allows you to have data available in memory before the program even starts.
Here's a NASM example demonstrating db:
; I am a 16 byte object at offset 0.
db '----------------'
; I am a 14 byte object at offset 16
; the label foo makes the assembler remember the current 'tell' of the
; binary being written.
foo:
db 'Hello, World!', 0
; I am a 2 byte filler at offset 30 to help readability in hex editor.
db ' .'
; I am a 4 byte object at offset 32 that holds the offset of foo, which is 16 (0x10).
dd foo
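The offsets in these comments can be checked with a C struct that lays out the same bytes. This is a sketch with hypothetical struct and member names; every member is either a byte array or a naturally aligned 32-bit word here, so the compiler inserts no padding:

```c
#include <stddef.h>
#include <stdint.h>

struct db_demo {
    char     dashes[16];   /* db '----------------'      */
    char     foo[14];      /* foo: db 'Hello, World!', 0 */
    char     filler[2];    /* db ' .'                    */
    uint32_t foo_offset;   /* dd foo                     */
};

_Static_assert(offsetof(struct db_demo, foo) == 16, "foo at offset 16");
_Static_assert(offsetof(struct db_demo, filler) == 30, "filler at offset 30");
_Static_assert(offsetof(struct db_demo, foo_offset) == 32, "dd at offset 32");
```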
An equ can only define a constant up to the largest integer size the assembler supports.
Here is an example of equ, along with a few of its common limitations:
; OK
ZERO equ 0
; OK(some assemblers won't recognize \r and will need to look up the ascii table to get the value of it).
CR equ 0xD
; OK(some assemblers won't recognize \n and will need to look up the ascii table to get the value of it).
LF equ 0xA
; error: bar.asm:2: warning: numeric constant 102919291299129192919293122 -
; does not fit in 64 bits
; LARGE_INTEGER equ 102919291299129192919293122
; bar.asm:5: error: expression syntax error
; assemblers often don't support float constants, despite fitting in
; reasonable number of bytes. This is one of the many things
; we take for granted in C, ability to precompile floats at compile time
; without the need to create your own assembly preprocessor/assembler.
; PI equ 3.1415926
; bar.asm:14: error: bad syntax for EQU
; assemblers often don't support list constants, this is something C
; does support using define, allowing you to define a macro that
; can be passed as a single argument to a function that takes multiple.
; eg
; #define RED 0xff, 0x00, 0x00, 0x00
; glVertex4f(RED);
; #undef RED
;RED equ 0xff, 0x00, 0x00, 0x00
The resulting binary has no bytes at all, because equ does not pollute the image; all references to an equ get replaced by the right-hand side of that equ.
