How do you make a union in NASM?

I need to replicate a C-style union in NASM, but cannot find out how.
I need
r_eax dd 0
r_ax dw 0
where r_ax should reside in the same memory location as the low 16 bits of r_eax.
In C, this would be:
union RegType {
long eax;
short ax;
} reg_a;
I understand that NASM does not care about variable sizes, but I would still like to be able to use different reference / variable names.

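Really easily. The EQU directive defines a symbol with a given constant value, so r_ax EQU r_eax in your example makes r_ax refer to the same address as r_eax. Because x86 is little-endian, the low 16 bits of the dword live at that address.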
This even works if you use NASM's "local notation" (which I highly recommend...)
STRUC Reg_A
.eax RESD 1
.ax EQU .eax
.al EQU .ax
.ah EQU ????? ; Ummm!
ENDSTRUC
I'd rearrange it like this, though:
STRUC Reg_A
.al RESB 1
.ah RESB 1
.eah RESW 1 ; Not a real symbol - but NASM won't allow unnamed members
.ax EQU .al
.eax EQU .ax
ENDSTRUC

Related

Hello World printing both messages with one system call? Why does length equ $-msg1 include both?

section.text:
global _start
_start:
mov ebx, 1
mov eax, 4
mov ecx, msg1
mov edx, len1
int 0x80
mov eax, 1 ; exit
mov ebx, 0
int 0x80
section.data:
msg1: db "Hello world", 10
msg2: db "Hello world!", 10
len1: equ $-msg1
len2: equ $-msg2
It prints out:
Hello world
Hello world!
But why is msg2 printed as well?
len1 is computed in the wrong place. $ is the current assembly position, so len1 must be defined immediately after msg1:
section .rodata ; space needed between section directive and its operand
; On Linux we normally put read-only data in .rodata
msg1: db "Hello world", 10
len1: equ $-msg1
msg2: db "Hello world!", 10
len2: equ $-msg2
len1 is the difference between the current address ($) and the address of msg1. Defined right after msg1, that difference is the length of the first message only.
See How does $ work in NASM, exactly? for more details and examples.
Note that section.data: is just a label defining a symbol name with a dot in the middle. It doesn't switch sections, so your code and data are in the .text section (with read-only + exec permission), which is the default section at the top of the file for nasm -f elf32 outputs.
Use section .data if you want read+write without exec, or section .rodata on Linux if you want read-only without exec, where compilers put string literals and other constants to group them together separate from code.

Accessing struc members NASM Assembly

Coming from Object Oriented languages such as Python and Java, why is this code not working as I would like it to?
I want to access the cat struc variable cat_name and send it to STDOUT to print in my terminal.
catstruct.asm:
SECTION .bss
struc cat
cat_name: resb 8
endstruc
SECTION .data
catStruc:
istruc cat
at cat_name, db "Garfield"
iend
SECTION .text
GLOBAL _start
_start:
mov edx, 8
mov ecx, cat_name
mov ebx, 1
mov eax, 4
int 0x80
mov ebx, 0
mov eax, 1
int 0x80
The code assembles without errors, but it prints nothing when I run it. How come?
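cat_name is just the constant 0, the offset of the name field from the start of the struct, so ecx ends up holding 0 instead of the address of the string. You need the address of the instance plus the offset:
mov ecx, catStruc+cat_name
Quote from the NASM manual: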
For example, to define a structure called mytype containing a
longword, a word, a byte and a string of bytes, you might code
struc mytype
mt_long: resd 1
mt_word: resw 1
mt_byte: resb 1
mt_str: resb 32
endstruc
The above code defines six symbols: mt_long as 0 (the offset from the
beginning of a mytype structure to the longword field), mt_word as 4,
mt_byte as 6, mt_str as 7, mytype_size as 39, and mytype itself as
zero.

GAS to NASM assembly: translate ".rept .set" to NASM (loop and assign incrementing value to label)

GAS assembly knows about the .set-directive which can be combined with .rept to increment a label (variable) in a loop as in the example below:
pd:
.set SPAGE, 0
.rept 512
.quad SPAGE + 0x87 // PRESENT, R/W, USER, 2MB
.set SPAGE, SPAGE + 0x200000
.endr
How can I achieve something similarly convenient in NASM? I know about the TIMES directive, but that alone doesn't get me what I want. Any ideas? NASM's EQU directive only allows assigning a value once, so it will not solve my problem.
Indeed, this can't be done with the TIMES directive alone: TIMES repeats a single instruction or data line, and its operand is a critical expression. To repeat more than one line of code, or a complex macro, use the preprocessor's %rep directive, with %assign to update a value on each iteration. Take a look at this silly example:
global _start
section .text
_start:
mov rbx, 0
%assign i 0
%rep 5
mov rbx, [variable]
add rbx, i
mov [variable], rbx
%assign i i+1
%endrep
mov rax, 60 ; system call for exit
mov rdi, [variable]; value of 'variable' = 10
syscall
section .bss
variable: resq 1 ; 8 bytes, because the loop loads and stores it through 64-bit rbx
Assemble, link, and run it, then check the exit status (it should be 10, that is 0+1+2+3+4):
nasm -felf64 ass.asm && ld ass.o && ./a.out
echo $?

Why is data stored in memory reversed?

This is the source code I have:
section .data
msg: db "pppaaa"
len: equ $
section .text
global main
main:
mov edx,len
mov ecx,msg
mov ebx,1
mov eax,4
int 0x80
And when I debug this code I will see:
(gdb) info register ecx
ecx 0x804a010 134520848
(gdb) x 0x804a010
0x804a010 <msg>: 0x61707070
(gdb) x 0x804a014
0x804a014: 0x00006161
"70" here represents the character 'p' and "61" the character 'a' obviously.
What confuses me is why the data at location 0x804a010 is 0x61707070 (appp), and why, 4 bytes further on at 0x804a014, the data is --aa.
I would expect to see (pppa) for the first location and (aa--) for the second. Why is this the case?
GDB doesn't know that you have a bunch of chars. You are just asking it to look at a memory location, and it displays what is there, defaulting to a 4-byte integer. It assumes the integer is stored least significant byte first (little-endian), because that is how it is done on x86, so the bytes appear reversed.
To fix this, use a format specifier with your x command, like this:
x/10c 0x804a010
(will print 10 chars beginning at 0x804a010).
help x in GDB will give more information.

Has anyone been able to create a hybrid of PE COFF and ELF?

I mean, could a single binary file run on both Win32 and Linux i386?
This is not possible, because the two types have conflicting formats:
The initial two characters of a PE file must be 'M' 'Z';
The initial four characters of an ELF file must be '\x7f' 'E' 'L' 'F'.
Clearly, you can't create one file that satisfies both formats.
In response to the comment about a polyglot binary valid as both a 16 bit COM file and a Linux ELF file, that's possible (although really a COM file is a DOS program, not Windows - and certainly not Win32).
Here's one I knocked together; assemble it with NASM. It works because the first two bytes of an ELF file ('\x7f' 'E') also happen to be valid 8086 machine code: a jump-if-greater-than instruction with a relative displacement of 0x45 (69) bytes. Minimal ELF headers cribbed from Brian Raiter.
BITS 32
ORG 0x08048000
ehdr: ; Elf32_Ehdr
db 0x7F, "ELF", 1, 1, 1, 0 ; e_ident
times 8 db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
times 0x47-($-$$) db 0
; DOS COM File code
BITS 16
mov dx, msg1 - $$ + 0x100
mov ah, 0x09
int 0x21
mov ah, 0x00
int 0x21
msg1: db `Hello World (DOS).\r\n$`
BITS 32
phdr: ; Elf32_Phdr
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dd $$ ; p_paddr
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align
phdrsize equ $ - phdr
; Linux ELF code
_start:
mov eax, 4 ; SYS_write
mov ebx, 1 ; stdout
mov ecx, msg2
mov edx, msg2_len
int 0x80
mov eax, 1 ; SYS_exit
mov ebx, 0
int 0x80
msg2: db `Hello World (Linux).\n`
msg2_len equ $ - msg2
filesize equ $ - $$
The two formats are sufficiently different that a hybrid is unlikely.
However, Linux can load other executable formats by handing them to a registered interpreter (the binfmt_misc mechanism). This way, compiled .exe files containing CIL (compiled C# or other .NET languages) can be executed directly under Linux, for example.
Sure. Use Java.
