Is there a way to use NASM syntax for inline assembly? - rust

I really dislike the GNU Assembler syntax, and I have some existing code written in NASM syntax that would be quite painful and time-consuming to port.
Is it possible to make the global_asm!() macro use NASM as the assembler or possibly make GAS use NASM syntax?

You might be able to change it, but it seems as if GAS is the only viable option. The Directives Support section of the documentation says:
"Inline assembly supports a subset of the directives supported by both GNU AS and LLVM's internal assembler, given as follows. The result of using other directives is assembler-specific (and may cause an error, or may be accepted as-is)."
Additionally, the documentation states: "Currently, all supported targets follow the assembly code syntax used by LLVM's internal assembler which usually corresponds to that of the GNU assembler (GAS). On x86, the .intel_syntax noprefix mode of GAS is used by default."
This might be helpful as well: https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md

Related

Different assembly syntaxes for same cpu?

I've decided to learn assembler through online tutorials.
I've come across this one that uses the NASM compiler, which most other tutorials seem to use as well:
http://www.tutorialspoint.com/assembly_programming/index.htm
I've also come across this YouTube series, "Assembly primer for hackers":
https://www.youtube.com/watch?v=K0g-twyhmQ4&list=PLue5IPmkmZ-P1pDbF3vSQtuNquX0SZHpB
This one uses what the guy describes as the 'generic linux compiler' (or words to that effect).
The commands for compiling go something like this:
as -o file.o file.s
Where file.s is the assembly source code. Followed by:
ld -o file file.o
Where file is then the executable.
Each of the tutorials uses a different syntax (e.g. a register in the latter tutorial is always preceded by %. NB. There do appear to be less superficial differences in the syntax than this as well). Are these syntaxes decided by the individual compiler?
I was also initially confused when I tried to compile code from the NASM tutorial with the latter method. I was always under the impression that the instruction set had to depend on the CPU and it therefore shouldn't matter which compiler I use. I've just concluded that it's merely differences in syntax but is that correct?
I'm running a Linux computer, by the way, on kernel 4.1.6.
My main question is really which syntax do I use? Is it just a matter of choice? Is one more widely used than the other? Thanks for any help.
"Each of the tutorials uses a different syntax (e.g. a register in the latter tutorial is always preceded by %. NB. There do appear to be less superficial differences in the syntax than this as well). Are these syntaxes decided by the individual compiler?"
Yes. Different assemblers (that is, compilers for assembly language) may use different assembly language syntax even though they produce code for the same processor and platform.
"My main question is really which syntax do I use? Is it just a matter of choice? Is one more widely used than the other?"
Some assemblers cover a wide range of platforms. NASM, for example, runs on many operating systems and emits many object file formats, although it targets only x86 and x86-64. Learning such a syntax pays off when you need to work across several platforms.
In other cases it might be better to stick with the assembler of some prominent vendor, because it is widely used and you can find more example code for it on the net, which might help you with your development.
Last but not least, you might simply prefer a particular assembler because you like its features or syntax.
If you're on a Windows system, Microsoft's MASM (ML.EXE, or ML64.EXE for 64-bit) syntax is virtually the same as Intel's syntax. MASM is included with the free Visual Studio Express editions, although you usually have to create a custom build step to invoke the assembler in a VS project. VS Express includes a good source-level debugger.
If you're on a Linux-type system, then you'll probably use AT&T syntax, which I assume ended up that way because it was a conversion of some generic assembler. I don't know which assembler(s) to recommend for Linux.

ARM assembly "retne" instruction

I am currently in the process of understanding what it takes for the Linux kernel to boot. I was browsing through the Linux kernel source tree, in particular the ARM architecture code, when I stumbled upon the assembly instruction retne lr in arch/arm/kernel/hyp-stub.S.
Conceptually, it's easy to see that the instruction is supposed to return to the address stored in the link register if the Z flag is 0. What I am looking for is where this ARM assembly instruction is actually documented.
I searched section A8.8 of the ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition, and could not find a description of the instruction.
Grepping the sources to see whether it was an ARM-specific GNU AS extension did not turn up anything either.
A Google search for "arm assembly ret instruction", "arm return instruction", and similar queries did not turn up anything useful. Surely I must be looking in the wrong places or missing something.
Any clarification will be much appreciated.
The architectural assembly language is one thing, real world code is another. Once assembler pseudo-ops and macros come into play, a familiarity with both the toolchain and the codebase in question helps a lot. Linux is particularly nasty as much of the assembly source contains multiple layers of both assembler macros and CPP macros. If you know what to look for, and follow the header trail to arch/arm/include/asm/assembler.h, you eventually find this complicated beast:
.irp c,,eq,ne,cs,cc,mi,pl,vs,vc,hi,ls,ge,lt,gt,le,hs,lo
.macro ret\c, reg
#if __LINUX_ARM_ARCH__ < 6
    mov\c pc, \reg
#else
    .ifeqs "\reg", "lr"
    bx\c \reg
    .else
    mov\c pc, \reg
    .endif
#endif
.endm
.endr
The purpose of this is to emit the architecturally-preferred return instruction for the benefit of microarchitectures with a return stack, whilst allowing the same code to still compile for older architectures.

Documentation for gcc compiler flags?

I am building a makefile for an SDL/OpenGL program. Looking at the Makefile for the SDL 2.0 examples, I see compiler flags such as -DHAVE_OPENGL and -D_REENTRANT. Nowhere in the man pages for gcc can I find information on either of these flags. Where on the internet or on my system can I find documentation about all the flags supported by gcc?
The -D option is not used to set compiler-specific flags but to pass macro definitions to the preprocessor.
Indeed, -DHAVE_OPENGL is like having #define HAVE_OPENGL 1 in your source code. So these options are not related to the compiler per se but to the code you are compiling.
In any case, the GCC manual documents the compiler's options comprehensively.
Those are not compiler flags per se. -D is a compiler flag, but what follows it is a preprocessor definition. You will not find any information on what those definitions mean in the compiler docs because they affect the behavior of the actual code that you are building (e.g. which portions of the code are included during compilation).
So unfortunately, the only way you will know what defining those pre-processor tokens will do is if you investigate the source code you are compiling or if the library you are using documents them.
Generally speaking however, HAVE_OPENGL lets SDL know to compile GL-related code.
Re-entrancy is used for thread safety, and although _REENTRANT is not a standard pre-processor definition (though commonly used with some C stdlib implementations), it is safe to assume that it will cause your software to select re-entrant versions of functions whenever possible.
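To make the -D mechanism concrete, here is a minimal sketch in C. The function names have_opengl and report_build are invented for illustration and are not part of SDL; only the macro name HAVE_OPENGL comes from the question. Building the file with gcc -DHAVE_OPENGL should have the same effect as putting #define HAVE_OPENGL 1 at the top of it.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 when the file was compiled with
   -DHAVE_OPENGL (or an equivalent #define), and 0 otherwise. */
int have_opengl(void)
{
#ifdef HAVE_OPENGL
    return 1;   /* this branch survives preprocessing only if the macro is defined */
#else
    return 0;   /* otherwise the preprocessor keeps this branch instead */
#endif
}

/* Hypothetical helper: prints which branch the preprocessor kept. */
void report_build(void)
{
    printf("OpenGL code path compiled in: %s\n", have_opengl() ? "yes" : "no");
}
```

Calling report_build() from main and building once with and once without -DHAVE_OPENGL shows that the flag changes the code being compiled, not the compiler's behavior.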

GNU g++ inline assembly block like Apple g++/Visual C++?

I am currently following a course at my University in which, at this stage, we learn about the assembler code behind certain C/C++ constructs.
The workflow usually goes like this: the lab assistant briefly speaks about a topic, we figure out the quirks and then solve some totally random problem using inline assembly.
(For example: He briefly talks about how struct (members) are stored in memory, we figure out the pattern and then we write the solution using inline assembly to a simple problem in which we use a struct.)
The lab assistant (as well as the rest of the group) uses the Visual C++ compiler and debugger (for disassembly) for his demonstrations; however, I cannot use it for ethical reasons, so I opted for g++ and gdb.
What I find awkward about g++'s inline assembly compared to Visual C++ is the fact that:
If I want to write a 'block' of inline assembly I have two options: Have a single asm("..") construct in which each instruction is preceded by a \n\t (leads to a lot of clutter). Or have each instruction in its own asm("..") block (leads to a lot of typing).
If I want to reference a local variable in the inline assembly I have to either use the extended syntax or reference it by using offsets to esp/ebp.
In respect to the two issues above I prefer the Visual C++'s inline assembly style in which in order to write an asm block all I have to do is __asm { .. } and write each instruction on a new line and in order to reference a variable I just have to write its name.
During my searches I discovered that Apple's g++ supports the same syntax as Visual C++ via a switch (-fasm-blocks); however, this does not seem to be the case for GNU g++.
In the hope that I might have missed something, I am asking here whether it is possible to compile Visual C++-like inline assembly blocks under GNU g++.
The syntax you are referring to is not Microsoft-specific. As you have found, Apple had it too (although Apple gave up on GCC and switched to Clang). AFAIK, Metrowerks supports the same syntax. GCC does not support it (probably because the GCC guys believe that GCC is so good that nobody needs to write assembly anymore :-)). However, there is no need to type \n\t all the time; you can replace it with ;. For example:
void foo(void)
{
    /* Basic asm: the compiler is not told that eax is overwritten. */
    asm("xor %eax,%eax;"
        "rep; nop;"
        "nop;"
        "sfence;"
        "nop;");
}
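On the other point raised in the question, referring to local variables, GCC's extended asm can name its operands, which gets reasonably close to writing the variable name directly. The following is only a sketch, assuming GCC or Clang on x86 with the default AT&T dialect; the function name add_ints and the operand names sum and rhs are made up for illustration. On other targets it falls back to plain C so that it still compiles.

```c
/* Adds b to a. On x86 builds with GCC or Clang the addition is performed by
   inline assembly; elsewhere the function falls back to plain C. */
int add_ints(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result = a;
    __asm__("addl %[rhs], %[sum]"   /* AT&T order: sum = sum + rhs */
            : [sum] "+r"(result)    /* read-write operand, any general register */
            : [rhs] "r"(b)          /* input operand, any general register */
            : "cc");                /* the add instruction modifies the flags */
    return result;
#else
    return a + b;
#endif
}
```

The compiler chooses the registers and is told what the statement reads, writes, and clobbers, so no manual offsets from esp/ebp are needed.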
Hope it helps. Good Luck!

Is assembly language `assembler` specific too? Which assembler is best?

I'm learning assembly language. I started with Paul A. Carter's PC Assembly Language which uses NASM (The Netwide Assembler). Then in the middle I switched and started reading Introduction to 80×86 Assembly Language and Computer Architecture which uses MASM.
In NASM, I used to initialize a byte like this:
db 110101b
In MASM I'm using:
BYTE 110101b
I'm in the middle of reading. Since these are assembler directives, they will be different for each assembler, right?
Don't these assembler developers follow a standard for these directives? They know that mnemonics are CPU-specific, so it's already a pain in the ass to learn and code in assembly language.
Now if they use different directives, it's even more pain when you change assemblers or switch operating systems (a MASM developer is in deep trouble if he goes to Linux).
My confusion is: should I acquaint myself with NASM or MASM? I'm a fan of Windows, but I may have to work on Linux in the future.
Every book should be titled "_________ Assembly Language using __________ Assembler"
Unfortunately there has never been a standard for assembly language. You'll just have to learn the directives that your assembler supports. Fortunately most of the directives, while having different names, are semantically similar like db and BYTE.
But wait! It gets worse, especially for the x86. You have (at least) two forms of code that assemblers can accept: Intel and AT&T format. AT&T format reverses the order of most operands to instructions (or is it vice versa ;-).
NASM is probably a better choice for portability, but you could also look at the GNU assembler.
Intel Syntax / AT&T Syntax
With x86 in particular, the first assemblers were from Intel and then largely-compatible assemblers from Microsoft formed one branch.
These assemblers organize source and destination operands right to left and have an unusual (and to my eyes, kind of wacky) abstraction layer that uses a single mnemonic for 8, 16, and 32-bit ops and then derives the actual machine opcode to use based on properties of the operand. Modifiers exist (on operands) to force a particular size.
But Unix was also important and it had a completely different assembler line with different traditions and conventions.
The original Unix vendor was AT&T, which owned the intellectual property developed at Bell Labs. A series of BSD projects and then Linux continued with this tradition. These assemblers historically process operands left to right, have a spare design optimized for speed, and when used by humans they generally use cpp for macros and conditionals, even if the assembler also has parallel features.
These days you are probably using Visual Studio on Windows or GNU tools on Linux or Mac, but this is why we still say AT&T vs Intel. The GNU assembler has an option to assemble both ways, although it's still really in the AT&T camp.
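The operand-order difference, and the GNU assembler's ability to switch modes, can be sketched from C with GCC-style inline assembly. This assumes GCC or Clang on x86 with the default AT&T dialect; the function names five_att and five_intel are invented for illustration. Both functions load the constant 5 into eax, first in AT&T form and then in Intel form using the .intel_syntax noprefix directive. On other targets they simply return 5.

```c
/* AT&T form: source first, destination last, registers prefixed with %. */
int five_att(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int out;
    __asm__("movl $5, %%eax"            /* eax = 5 */
            : "=a"(out));               /* output is taken from eax */
    return out;
#else
    return 5;
#endif
}

/* Intel form: destination first, source last, no register prefix. */
int five_intel(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int out;
    __asm__(".intel_syntax noprefix\n\t"
            "mov eax, 5\n\t"            /* eax = 5 */
            ".att_syntax prefix"        /* switch back for the compiler's own output */
            : "=a"(out));               /* output is taken from eax */
    return out;
#else
    return 5;
#endif
}
```

Both functions should assemble to the same machine instruction; only the source notation differs.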
Generally yes. They are mostly feature-compatible though, so converting from one assembler syntax to another is usually not terribly difficult if you know both.
Processors are all documented in a manufacturer-supplied reference manual. Its notation usually developed into the normative syntax (along with the assembler provided by the vendor) for assembly programs on a particular platform. Consequently, many processors from a single vendor have similar syntax.
The situation became more complex with second sourcing of processors and the eventual development of multi-targeting assemblers that, for historical reasons, use mostly consistent syntax across all platforms. This also provides some arguable advantages when porting code across platforms.
Your best choices are to pick a notation you are comfortable with and accept books with different syntax, to see if you can locate cross-system macro libraries or translation tools, or to bite the bullet and learn multiple dialects. The third is usually tolerable, although it makes building private libraries labour-intensive.
