single-line macro existence check (ifdef) in a NASM 0.98.39 multi-line macro

single-line macro existence check (ifdef) in a NASM 0.98.39 multi-line macro - nasm

I'm using NASM 0.98.39 (an old version) to assemble the following test source file:
%macro M 1
%ifdef A_%1_B
%error OK
%else
%error undef
%endif
%endm
%define A_foo_B
M foo
I want OK because A_foo_B is indeed a defined single-line macro. In NASM 2.13.02, it displays OK, however in NASM 0.98.39 and NASM 0.99.06, it displays undef. How can I make it work with NASM 0.98.39?
Please note that upgrading NASM is not an option for this project, and inlining the macros M wouldn't scale, because there are thousands of macro callers, and some abstraction is needed.
FYI If I change it like this, it stops working in NASM 2.13.02 as well, with the error message `%ifdef' expects macro identifiers.`:
%macro MS 1
%ifdef %1
%error OK
%else
%error undef
%endif
%endm
%define A_foo_B
MS A_foo_B
Even more stranglely, these both display OK (as expected) in all the above versions of NASM:
%macro MH 1
%ifdef %1_B
%error OK
%else
%error undef
%endif
%endm
%macro MT 1
%ifdef A_%1
%error OK
%else
%error undef
%endif
%endm
%define A_foo_B
MH A_foo
MT foo_B

This workaround of defining 2 macros works in all versions of NASM mentioned in the question:
%macro MT 1
%ifdef A_%1
%error OK
%else
%error undef
%endif
%endm
%macro MIND 1
MT %1_B
%endm
%define A_foo_B 1
MIND foo
This needs a major refactoring of existing NASM macros using %ifdef though.

Related

Understanding how $ works in assembly [duplicate]

len: equ 2
len: db 2
Are they the same, producing a label that can be used instead of 2? If not, then what is the advantage or disadvantage of each declaration form? Can they be used interchangeably?

The first is equate, similar to C's:
#define len 2
in that it doesn't actually allocate any space in the final code, it simply sets the len symbol to be equal to 2. Then, when you use len later on in your source code, it's the same as if you're using the constant 2.
The second is define byte, similar to C's:
int len = 2;
It does actually allocate space, one byte in memory, stores a 2 there, and sets len to be the address of that byte.
Here's some pseudo-assembler code that shows the distinction:
line addr code label instruction
---- ---- -------- ----- -----------
1 0000 org 1234h
2 1234 elen equ 2
3 1234 02 dlen db 2
4 1235 44 02 00 mov ax, elen
5 1238 44 34 12 mov ax, dlen
Line 1 simply sets the assembly address to be 1234h, to make it easier to explain what's happening.
In line 2, no code is generated, the assembler simply loads elen into the symbol table with the value 2. Since no code has been generated, the address does not change.
Then, when you use it on line 4, it loads that value into the register.
Line 3 shows that db is different, it actually allocates some space (one byte) and stores the value in that space. It then loads dlen into the symbol table but gives it the value of that address 1234h rather than the constant value 2.
When you later use dlen on line 5, you get the address, which you would have to dereference to get the actual value 2.

Summary
NASM 2.10.09 ELF output:
db does not have any magic effects: it simply outputs bytes directly to the output object file.
If those bytes happen to be in front of a symbol, the symbol will point to that value when the program starts.
If you are on the text section, your bytes will get executed.
Weather you use db or dw, etc. that does not specify the size of the symbol: the st_size field of the symbol table entry is not affected.
equ makes the symbol in the current line have st_shndx == SHN_ABS magic value in its symbol table entry.
Instead of outputting a byte to the current object file location, it outputs it to the st_value field of the symbol table entry.
All else follows from this.
To understand what that really means, you should first understand the basics of the ELF standard and relocation.
SHN_ABS theory
SHN_ABS tells the linker that:
relocation is not to be done on this symbol
the st_value field of the symbol entry is to be used as a value directly
Contrast this with "regular" symbols, in which the value of the symbol is a memory address instead, and must therefore go through relocation.
Since it does not point to memory, SHN_ABS symbols can be effectively removed from the executable by the linker by inlining them.
But they are still regular symbols on object files and do take up memory there, and could be shared amongst multiple files if global.
Sample usage
section .data
x: equ 1
y: db 2
section .text
global _start
_start:
mov al, x
; al == 1
mov al, [y]
; al == 2
Note that since the symbol x contains a literal value, no dereference [] must be done to it like for y.
If we wanted to use x from a C program, we'd need something like:
extern char x;
printf("%d", &x);
and set on the asm:
global x
Empirical observation of generated output
We can observe what we've said before with:
nasm -felf32 -o equ.o equ.asm
ld -melf_i386 -o equ equ.o
Now:
readelf -s equ.o
contains:
Num: Value Size Type Bind Vis Ndx Name
4: 00000001 0 NOTYPE LOCAL DEFAULT ABS x
5: 00000000 0 NOTYPE LOCAL DEFAULT 1 y
Ndx is st_shndx, so we see that x is SHN_ABS while y is not.
Also see that Size is 0 for y: db in no way told y that it was a single byte wide. We could simply add two db directives to allocate 2 bytes there.
And then:
objdump -dr equ
gives:
08048080 <_start>:
8048080: b0 01 mov $0x1,%al
8048082: a0 88 90 04 08 mov 0x8049088,%al
So we see that 0x1 was inlined into instruction, while y got the value of a relocation address 0x8049088.
Tested on Ubuntu 14.04 AMD64.
Docs
http://www.nasm.us/doc/nasmdoc3.html#section-3.2.4:
EQU defines a symbol to a given constant value: when EQU is used, the source line must contain a label. The action of EQU is to define the given label name to the value of its (only) operand. This definition is absolute, and cannot change later. So, for example,
message db 'hello, world'
msglen equ $-message
defines msglen to be the constant 12. msglen may not then be redefined later. This is not a preprocessor definition either: the value of msglen is evaluated once, using the value of $ (see section 3.5 for an explanation of $) at the point of definition, rather than being evaluated wherever it is referenced and using the value of $ at the point of reference.
See also
Analogous question for GAS: Difference between .equ and .word in ARM Assembly? .equiv seems to be the closes GAS equivalent.

equ: preprocessor time. analogous to #define but most assemblers are lacking an #undef, and can't have anything but an atomic constant of fixed number of bytes on the right hand side, so floats, doubles, lists are not supported with most assemblers' equ directive.
db: compile time. the value stored in db is stored in the binary output by the assembler at a specific offset. equ allows you define constants that normally would need to be either hardcoded, or require a mov operation to get. db allows you to have data available in memory before the program even starts.
Here's a nasm demonstrating db:
; I am a 16 byte object at offset 0.
db '----------------'
; I am a 14 byte object at offset 16
; the label foo makes the assembler remember the current 'tell' of the
; binary being written.
foo:
db 'Hello, World!', 0
; I am a 2 byte filler at offset 30 to help readability in hex editor.
db ' .'
; I am a 4 byte object at offset 16 that the offset of foo, which is 16(0x10).
dd foo
An equ can only define a constant up to the largest the assembler supports
example of equ, along with a few common limitations of it.
; OK
ZERO equ 0
; OK(some assemblers won't recognize \r and will need to look up the ascii table to get the value of it).
CR equ 0xD
; OK(some assemblers won't recognize \n and will need to look up the ascii table to get the value of it).
LF equ 0xA
; error: bar.asm:2: warning: numeric constant 102919291299129192919293122 -
; does not fit in 64 bits
; LARGE_INTEGER equ 102919291299129192919293122
; bar.asm:5: error: expression syntax error
; assemblers often don't support float constants, despite fitting in
; reasonable number of bytes. This is one of the many things
; we take for granted in C, ability to precompile floats at compile time
; without the need to create your own assembly preprocessor/assembler.
; PI equ 3.1415926
; bar.asm:14: error: bad syntax for EQU
; assemblers often don't support list constants, this is something C
; does support using define, allowing you to define a macro that
; can be passed as a single argument to a function that takes multiple.
; eg
; #define RED 0xff, 0x00, 0x00, 0x00
; glVertex4f(RED);
; #undef RED
;RED equ 0xff, 0x00, 0x00, 0x00
the resulting binary has no bytes at all because equ does not pollute the image; all references to an equ get replaced by the right hand side of that equ.

Where const strings are saved in assembly?

When i declare a string in assembly like that:
string DB "My string", 0
where is the string saved?
Can i determine where it will be saved when declaring it?

db assembles output bytes to the current position in the output file. You control exactly where they go.
There is no indirection or reference to any other location, it's like char string[] = "blah blah", not char *string = "blah blah" (but without the implicit zero byte at the end, that's why you have to use ,0 to add one explicitly.)
When targeting a modern OS (i.e. not making a boot-sector or something), your code + data will end up in an object file and then be linked into an executable or library.
On Linux (or other ELF platforms), put read-only constant data including strings in section .rodata. This section (along with section .text where you put code) becomes part of the text segment after linking.
Windows apparently uses section .rdata.
Different assemblers have different syntax for changing sections, but I think section .whatever works in most of the one that use DB for data bytes.
;; NASM source for the x86-64 System V ABI.
section .rodata ; use section .rdata on Windows
string DB "My string", 0
section .data
static_storage_for_something: dd 123 ; one dword with value = 123
;; usually you don't need .data and can just use registers or the stack
section .bss ; zero-initialized memory, bytes not stored in the executable, just size
static_array: resd 12300000 ;; 12300000 dwords with value = 0
section .text
extern puts ; defined in libc
global main
main:
mov edi, string ; RDI = address of string = first function arg
;mov [rdi], 1234 ; would segfault because .rodata is mapped read-only
jmp puts ; tail-call puts(string)
peter#volta:/tmp$ cat > string.asm
(and paste the above, then press control-D)
peter#volta:/tmp$ nasm -f elf64 string.asm && gcc -no-pie string.o && ./a.out
My string
peter#volta:/tmp$ echo $?
10
10 characters is the return value from puts, which is the return value from main because we tail-called it, which becomes the exit status of our program. (Linux glibc puts apparently returns the character count in this case. But the manual just says it returns non-negative number on success, so don't count on this)
I used -no-pie because I used an absolute address for string with mov instead of a RIP-relative LEA.
You can use readelf -a a.out or nm to look at what went where in your executable.

ARM inline asm: exit system call with value read from memory

Problem
I want to execute the exit system call in ARM using inline assembly on a Linux Android device, and I want the exit value to be read from a location in memory.
Example
Without giving this extra argument, a macro for the call looks like:
#define ASM_EXIT() __asm__("mov %r0, #1\n\t" \
"mov %r7, #1\n\t" \
"swi #0")
This works well.
To accept an argument, I adjust it to:
#define ASM_EXIT(var) __asm__("mov %r0, %0\n\t" \
"mov %r7, #1\n\t" \
"swi #0" \
: \
: "r"(var))
and I call it using:
#define GET_STATUS() (*(int*)(some_address)) //gets an integer from an address
ASM_EXIT(GET_STATUS());
Error
invalid 'asm': operand number out of range
I can't explain why I get this error, as I use one input variable in the above snippet (%0/var). Also, I have tried with a regular variable, and still got the same error.

Extended-asm syntax requires writing %% to get a single % in the asm output. e.g. for x86:
asm("inc %eax") // bad: undeclared clobber
asm("inc %%eax" ::: "eax"); // safe but still useless :P
%r7 is treating r7 as an operand number. As commenters have pointed out, just omit the %s, because you don't need them for ARM, even with GNU as.
Unfortunately, there doesn't seem to be a way to request input operands in specific registers on ARM, the way you can for x86. (e.g. "a" constraint means eax specifically).
You can use register int var asm ("r7") to force a var to use a specific register, and then use an "r" constraint and assume it will be in that register. I'm not sure this is always safe, or a good idea, but it appears to work even after inlining. #Jeremy comments that this technique was recommended by the GCC team.
I did get some efficient code generated, which avoids wasting an instruction on a reg-reg move:
See it on the Godbolt Compiler Explorer:
__attribute__((noreturn)) static inline void ASM_EXIT(int status)
{
register int status_r0 asm ("r0") = status;
register int callno_r7 asm ("r7") = 1;
asm volatile("swi #0\n"
:
: "r" (status_r0), "r" (callno_r7)
: "memory" // any side-effects on shared memory need to be done before this, not delayed until after
);
// __builtin_unreachable(); // optionally let GCC know the inline asm doesn't "return"
}
#define GET_STATUS() (*(int*)(some_address)) //gets an integer from an address
void foo(void) { ASM_EXIT(12); }
push {r7} # # gcc is still saving r7 before use, even though it sees the "noreturn" and doesn't generate a return
movs r0, #12 # stat_r0,
movs r7, #1 # callno,
swi #0
# yes, it literally ends here, after the inlined noreturn
void bar(int status) { ASM_EXIT(status); }
push {r7} #
movs r7, #1 # callno,
swi #0 # doesn't touch r0: already there as bar()'s first arg.
Since you always want the value read from memory, you could use an "m" constraint and include a ldr in your inline asm. Then you wouldn't need the register int var asm("r0") trick to avoid a wasted mov for that operand.
The mov r7, #1 might not always be needed either, which is why I used the register asm() syntax for it, too. If gcc wants a 1 constant in a register somewhere else in a function, it can do it in r7 so it's already there for the ASM_EXIT.
Any time the first or last instructions of a GNU C inline asm statement are mov instructions, there's probably a way to remove them with better constraints.

Creating plain binary data (no ELF, symbol table etc) using an assembler

I want to turn a data-only input file, i.e. something like this:
.data
.org 0
.equ foo, 42
.asciz "foo"
label:
.long 0xffffffff
.long 0x12345678
.byte foo
.long label
.long bar
.equ bar, 'x'
into a file with the corresponding byte sequence 'f','o','o', 0, 0xff, 0xff, 0xff, 0xff, 0x78, 0x56, 0x34, 0x12, 42, 4, 0, 0, 0, 'x', 0 , 0, 0.
When I assemble this with GNU as (as -o foo.o -s foo.S), I get an 400+ bytes ELF file. How can I make GNU as (or NASM or any other assembler) give me the plain binary representation? I've studied the GNU as options but to no avail. I can modify the input format, if that makes the answer easier (i.e. use more and different pseudo ops).
Any hints deeply appreciated!
Regards, Jens

in MASM you would assemble MASM into an .obj file, LINK into an .exe file and then postprocess the result exefile with the EXE2BIN utility.
in TASM you would assemble into an .obj file and then link TLINK with the /t/x parameters.

I dug around a bit and found a solution using nasm, grabbed from http://www.nasm.us/.
The equivalent directives for the original data would be something like this:
org 0
foo equ 42
db "foo", 0
label:
dd 0xffffffff
dd 0x12345678
db foo
dd label
dd bar
bar equ 'x'
Assemble this with nasm -f bin -o file.bin file.S. Voila! Plain binary in file.bin. Guess that makes me a self-learner :-)

gcc, static library, external assembly function becomes undefined symbol

I have a problem with g++ building an application which links to a static library, where the latter shall contain some global functions written in external asm-files, compiled with yasm. So in the library, I have
#ifdef __cplusplus
extern "C" {
#endif
extern void __attribute__((cdecl)) interp1( char *pSrc );
extern void __attribute__((cdecl)) interp2( char *pSrc );
#ifdef __cplusplus
}
#endif
which I reference elsewhere inside the library. Then, there is the implementation in an asm-file, like this:
section .data
; (some data)
section .text
; (some text)
global _interp1
_interp1:
; (code ...)
ret
global _interp2
_interp2:
; (code ...)
ret
Compiling and Linking work fine for the library, I do
yasm -f elf32 -O2 -o interp.o interp.asm
and then
ar -rc libInterp.a objs1.o [...] objsN.o interp.o
ranlib libInterp.a
Now finally, to link the library to the main application, I do
g++ -O4 -ffast-math -DNDEBUG -fomit-frame-pointer -DARCH_X86 -fPIC -o ../bin/interp this.o that.o -lboost_thread -lpthread ./libInterp.a
and I get the errors
undefined reference to `interp1'
undefined reference to `interp2'
What am I doing wrong here? any help is appreciated.

Depending on the target type, gcc will not prepend a leading underscore to external symbols. It appears that this is the case in your scenario.
The simple fix is probably to remove the underscores from the names in your assembly file.
A couple alternatives you might consder might be to use something like one of the following macros for your symbols in the assembly file:
from http://svn.xiph.org/trunk/oggdsf/src/lib/codecs/webm/libvpx/src/vpx_ports/x86_abi_support.asm
; sym()
; Return the proper symbol name for the target ABI.
;
; Certain ABIs, notably MS COFF and Darwin MACH-O, require that symbols
; with C linkage be prefixed with an underscore.
;
%ifidn __OUTPUT_FORMAT__,elf32
%define sym(x) x
%elifidn __OUTPUT_FORMAT__,elf64
%define sym(x) x
%elifidn __OUTPUT_FORMAT__,x64
%define sym(x) x
%else
%define sym(x) _ %+ x
%endif
from http://www.dcs.warwick.ac.uk/~peter/otherstuff.html
%macro public_c_symbol 1
GLOBAL %1,_%1
%1:
_%1:
%endmacro
public_c_symbol my_external_proc:
; ...
RET

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string