Limitations of Intel Assembly Syntax Compared to AT&T [closed] - linux

To me, Intel syntax is much easier to read. If I go traipsing through the assembly forest concentrating only on Intel syntax, will I miss anything? Is there any reason I would want to switch to AT&T (outside of being able to read others' AT&T assembly)? My first clue is that gdb uses AT&T by default.
If this matters, my focus is only on assembly and syntax as they relate to Linux/BSD and the C language.

There is really no advantage to one over the other. I agree though that Intel syntax is much easier to read. Keep in mind that, AFAIK, all GNU tools have the option to use Intel syntax also.
It looks like you can make GDB use Intel syntax with this:
set disassembly-flavor intel
GCC can do Intel syntax with -masm=intel.
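For instance, the same two instructions come out like this in each flavor (a quick sketch of typical gcc/objdump output; the register choices are arbitrary):
# AT&T (the default)          # Intel (-masm=intel / disassembly-flavor intel)
movl $1, %eax                 mov eax, 1
movl 8(%ebp), %ecx            mov ecx, DWORD PTR [ebp+8]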

The primary syntax for the GNU assembler (GAS) is AT&T. Intel syntax is a relatively new addition to it. x86 assembly in the Linux kernel is in AT&T syntax. In the Linux world, it's the common syntax. In the MS world, Intel syntax is more common.
Personally, I hate AT&T syntax. There are plenty of free assemblers (NASM, YASM) along with GAS that support Intel syntax too, so there won't be any problems doing Intel syntax in Linux.
Beyond that, it's just a syntactic difference. The result of both will be the same x86 machine code.
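For completeness, GAS itself can be switched per-file with a directive, so Intel-syntax source assembles with plain as. A minimal sketch:
.intel_syntax noprefix   # from here on, GAS accepts Intel syntax, no % prefixes
.text
.globl main
main:
    mov eax, 1           # instead of: movl $1, %eax
    ret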

There is really no advantage to one over the other. I disagree though that Intel syntax is much easier to read, because personally I hate Intel syntax. Keep in mind that, AFAIK, all GNU tools have the option to use Intel syntax also.
at&t noprefix                intel
mov eax, -4(ebp,edx,4)       mov DWORD PTR [-4 +ebp +edx*4], eax
mov eax, -4(ebp)             mov DWORD PTR [-4 +ebp], eax
mov edx, (ecx)               mov DWORD PTR [ecx], edx
lea 8( ,eax,4), eax          lea eax, DWORD PTR [8 + eax*4]
lea (eax,eax,2), eax         lea eax, DWORD PTR [eax*2 + eax]
...and it gets more complicated with more complex instructions
'nuff said.
PS: This answer exists mainly for the reason of highlighting (IMHO) weaknesses in some other answers, which are actually not answers, but opinions. And of course this answer in reality is only my humble opinion.
PPS: I do not hate Intel syntax, I just don't care.

It's the "same language", in that it compiles down to the same machine code, has the same opcodes, etc. On the other hand, if you are using GCC at all, you will probably want to learn AT&T syntax, just because it's the default--no changing compiler options, etc. to get it.
I too cut my teeth on Intel-syntax x86 ASM (on DOS, too) and found it more intuitive initially when switching to C/UNIX. But once you learn AT&T it'll look just as easy.
I wouldn't give it that much thought---it's easy to learn AT&T once you know Intel, and vice-versa. The actual language is much harder to get in your head than the syntax. So by all means just focus on one and then learn the other when it comes up.

It's a sign of professionalism that you are willing to adjust to whatever is in use. There is no real advantage to one or the other. Intel syntax is common in the Microsoft world; AT&T is the standard in Linux/Unix. Since there's no advantage to either one, people tend to imprint on whatever they saw first. That said, a professional programmer rises above things like that. Use whatever they use at work, or in the domain that you're working in.

Intel syntax covers everything (assuming the assembler/disassembler is up to date with the latest junk Intel added to their instruction set). I'm sure at&t is the same.
at&t                          intel
movl -4(%ebp, %edx, 4), %eax  mov eax, [ebp-4+edx*4]
movl -4(%ebp), %eax           mov eax, [ebp-4]
movl (%ecx), %edx             mov edx, [ecx]
leal 8(,%eax,4), %eax         lea eax, [eax*4+8]
leal (%eax,%eax,2), %eax      lea eax, [eax*2+eax]
...and it gets more complicated with more complex instructions
'nuff said.

My first assembly language was MIPS, which I've noticed is very similar to AT&T syntax. So I prefer AT&T syntax, but it doesn't really matter as long as you can read it.

Related

Difference between NASM, TASM, & MASM

Can somebody explain the differences between MASM, TASM, & NASM?
Why can't I run TASM code on Linux? Are they different languages?
I thought that assembly language was the same across all systems.
TASM, MASM, and NASM are x86 assemblers.
Borland Turbo Assembler (TASM) and Microsoft Macro Assembler (MASM) are DOS/Windows-based, Netwide Assembler (NASM) is available for other platforms as well. TASM produces 16-bit/32-bit output, MASM and NASM also produce 64-bit output.
All of those assemblers take the x86 instruction set as input. That does not mean that assembler source files are identical and compatible, however.
Instruction syntax
Assemblers expect either the original syntax used in the Intel instruction set documentation (the Intel syntax) or the so-called AT&T syntax developed at AT&T Bell Labs. AT&T uses mov src, dest; Intel uses mov dest, src; amongst other differences.
Windows assemblers prefer Intel syntax (TASM, MASM), most Linux/UNIX assemblers use AT&T. NASM uses a variant of the Intel syntax.
Assembler-specific syntax
Assemblers have their own syntax for directives affecting the assembly process, macros and comments. These usually differ from assembler to assembler.
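For a feel of the differences, here is the same skeleton (a section, a comment, a one-parameter macro) in each assembler. A sketch from memory; check each assembler's manual for the details:
; NASM: semicolon comments, %macro ... %endmacro
section .text
%macro zero 1
    xor %1, %1
%endmacro

# GAS: hash comments, .macro ... .endm
.section .text
.macro zero reg
    xorl \reg, \reg
.endm

; MASM: semicolon comments, name MACRO ... ENDM
.CODE
zero MACRO reg
    xor reg, reg
ENDM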
Compatibility
TASM can assemble MASM sources in "MASM mode".
NASM can assemble TASM code in "TASM mode". So, in theory, you can take TASM code and assemble it using NASM on Linux in that mode. Of course, the code might still need adjustments. If the code has OS dependencies, these will require your attention as well when you move from Windows to Linux.

Why does x86 place arguments on the stack?

In MIPS, arguments are placed in the $a0 to $a3 registers for faster access. Why do some x86 calling conventions make the design choice to place arguments on the stack instead of in registers? What are the advantages of doing this?
The real answer is that it depends more on the compiler than the processor, although I suspect the reason it is so common for x86 compilers to push arguments onto the stack is that the x86 CPU has always suffered from a scarcity of registers. By the time you eliminate the reserved registers, you are left with three (EAX, ECX, and EDX), which correspond to AX, CX, and DX in the original 16-bit x86 instruction set. So far as I know, the first processor in the Intel line to raise that limit is the 64-bit AMD64 architecture, which adds eight more, numbered R8 through R15. The five reserved registers get new names, but they remain reserved.
To reinforce my assertion that it depends on the compiler, you need look no further than the native code generator that ships with the Microsoft .NET Framework, which exhibits a hybrid approach, where the first two arguments go into ECX (or RCX) and EDX (or RDX), while additional arguments go onto the stack.
Even within C/C++ code generated by the Microsoft Visual C++ compiler, although the most common calling conventions, __cdecl and __stdcall, use the stack, a third, __fastcall, uses registers. Moreover, I have seen assembly code that used both: if it needed to talk to C, the routine expected arguments on the stack, but private routines that received calls only from other routines in the library used registers.
Registers are naturally faster, considerably faster, but you have to have enough of them. x86 traditionally had very few registers, so a stack-based approach was the way to go; at that point in history it was, in general, the way to go. RISC and others came along with a lot more registers, and the idea of using registers for the first few parameters, and perhaps the return value, became something to consider. x86 has more registers now but has generally remained stack based, although I think Zortech or Watcom, or perhaps even GCC, had a command-line option to use a few registers; I would have to confirm or deny that with research. Historically, though, it has used the stack for parameters and a register for return values.
ARM, MIPS, etc. all have a finite number of registers too, so eventually they spill into the stack; if you keep/control the number, size, and at times ordering of your parameters, you can try to limit this and improve performance.
At the end of the day, the bottom line is that someone or some team defines the calling convention; it is ultimately the choice of the compiler authors. It doesn't matter if the chip/processor designer has a recommendation: the compiler defines what its calling convention is, be it following that recommendation or doing its own thing. There is no reason you couldn't create a MIPS or ARM compiler/toolchain that is mostly stack based (the instruction set itself might dictate stack- or register-based returns, or it could be optional); likewise, you are more than welcome to make an x86 compiler with a convention that starts with registers and then moves to the stack after some number of them are used.
So: a little bit history, and a little bit because they chose to...
The simple answer is that you must always follow the ABI for the platform that you are running on. The longer answer is that you incorrectly assume that every 32 bit x86 platform will exclusively use the stack for argument passing. In fact, while each platform will adopt a standard, there are numerous approaches, any of which can be used. (fastcall, cdecl, etc.)
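To make the contrast concrete, here is the same two-argument call under a stack-based and a register-based convention (a 32-bit AT&T sketch; add_cdecl and add_fast are hypothetical functions):
# cdecl: arguments pushed right to left, caller cleans up
pushl $2               # second argument
pushl $1               # first argument
call  add_cdecl        # result comes back in %eax
addl  $8, %esp         # caller removes the two arguments

# __fastcall style: first two arguments in registers
movl  $1, %ecx         # first argument
movl  $2, %edx         # second argument
call  add_fast         # no stack traffic for these arguments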

Reading integers from keyboard in Assembly (Linux IA-32 x86 gcc gas)

I'd like to know how to read integers from keyboard in assembly. I'm using Linux/x86 IA-32 architecture and GCC/GAS (GNU Assembler). The examples I found so far are for NASM or some other Windows/DOS related compiler.
I heard that it has something to do with the "int 16h" interrupt, but I don't know how it works (does it need parameters? Does the result go to %eax or any of its sub-registers [AX, AH, AL]?).
Thanks in advance,
Flayshon.
:D
Simple answer is that you don't read integers from the keyboard, you read characters from the keyboard. You don't print integers to the screen, either - you print characters. You will need routines to convert "ascii-to-integer" and "integer-to-ascii". You can "just call scanf" for the one, and "just call printf" for the other. "scanf" works okay if the user is well-behaved and confines input to characters representing decimal digits, but it's difficult to get rid of any "junk" entered! "printf" isn't too bad.
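For instance, a minimal "ascii-to-integer" loop in (G)as syntax might look like this (a sketch: no sign handling, no overflow check, stops at the first non-digit):
# in:  %esi points at a string of decimal digits
# out: %eax holds the integer value
atoi:
    xorl   %eax, %eax           # result = 0
1:  movzbl (%esi), %ecx         # fetch next character, zero-extended
    cmpl   $48, %ecx            # below '0'?
    jb     2f                   # not a digit - done
    cmpl   $57, %ecx            # above '9'?
    ja     2f
    imull  $10, %eax, %eax      # result = result * 10
    leal   -48(%eax,%ecx), %eax # ... + (c - '0')
    incl   %esi
    jmp    1b
2:  ret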
Although I'm a Nasm user (it works fine for Linux - not really "Windows/dos related"), I might have routines in (G)as syntax lying around. I'll see if I can find 'em if you can't figure it out.
As Brian points out, int 16h is a BIOS interrupt - 16-bit code - and is not useful in Linux.
Best,
Frank
In 2012, I don't recommend coding an entire program in assembly. Code only the most critical parts (if you absolutely want some assembly code). Compilers optimize better than humans do. So use C or C++ for low-level software, and higher-level languages, e.g. OCaml, otherwise.
On Linux, you need to understand the role of the Linux kernel and of system calls, which are documented in section 2 of the man pages. You probably want at least read(2) and write(2) (if only handling stdin and stdout, which should already have been opened by the parent process, e.g. a shell), and you will probably need many other syscalls (e.g. open(2) and close(2)). Don't forget to do your own buffering (for efficiency).
I strongly recommend learning the Linux system interfaces by reading a good book such as Advanced Unix Programming.
How system calls are done at the machine level in assembly is documented in the Linux Assembly Howto (at least for x86 Linux in 32 bits).
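The mechanism itself is tiny. On 32-bit x86 Linux, a call like read(0, buf, 64) is just a register setup plus int $0x80 (a sketch; the syscall numbers come from asm/unistd_32.h):
.bss
buf:    .skip 64            # 64-byte input buffer

.text
read_line:
    movl  $3, %eax          # __NR_read is 3 on i386
    movl  $0, %ebx          # fd 0 = stdin
    movl  $buf, %ecx        # buffer address
    movl  $64, %edx         # maximum byte count
    int   $0x80             # on return, %eax = bytes read (or -errno)
    ret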
If your goal is to "obtain" a program, I would agree entirely with Basile. If your goal is to "learn assembly language", these other languages aren't really going to help. If your goal is to learn the nitty-gritty details of the hardware, you probably want assembly language, but Linux (or any other "protected mode" OS) isolates us from the hardware, so you might want to use clunky old DOS or even "write your own OS". Flayshon doesn't actually say what his goal is, but since he's asking here, he's probably interested in assembly language...
Some of us have a mental illness that makes us think it's "fun" to write in assembly language. Humor us!
Best,
Frank

Tutorial for GAS with 64bit

Does anyone know a tutorial for GAS where I can find information about compiling and linking code in AT&T syntax on 64-bit systems? I need this for university, so I cannot use NASM instead.
All the tutorials I can find are either for NASM or something similar, or they only work on 32-bit.
Even the minimalistic examples shown by our prof work on my 32-bit system but not on 64-bit.
You just need to change the suffixes of the instructions and the register names. Change this:
movl %ebx, %ecx
# to:
movq %rbx, %rcx
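Note that renaming registers isn't quite the whole story: 64-bit Linux system calls use the syscall instruction (not int $0x80), with the number in %rax and arguments in %rdi, %rsi, %rdx, taken from asm/unistd_64.h. A minimal 64-bit program, as a sketch (assemble with as, link with ld):
.text
.globl _start
_start:
    movq $1, %rax          # __NR_write is 1 in the 64-bit numbering
    movq $1, %rdi          # fd 1 = stdout
    leaq msg(%rip), %rsi   # buffer (RIP-relative addressing)
    movq $6, %rdx          # byte count
    syscall
    movq $60, %rax         # __NR_exit
    xorq %rdi, %rdi        # exit status 0
    syscall

.data
msg: .ascii "hello\n"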

A tool for understanding assembler programs?

I have an assembler program and I'm trying to understand it. (I'm actually profiling it.)
Is there a Linux tool/editor that would structure it a little for me?
Displaying where loops/jumps are would be enough.
A handy description of what an instruction does would be great.
If you look for something that resembles OllyDbg but for linux, you might try edb.
Since you are really reversing a high-level language for profiling, you can do a couple of things to help the process. First, enable and preserve debugging information in the C++ compiler and linker (don't let it strip the executable); in g++ the -g command-line flag does this. Second, many C++ compilers have a flag to output the intermediate assembly source code rather than emitting the object code (which is used by the linker); in g++ the -S flag enables this.
The assembly from the compiler can be compared to the assembly from the oprofile disassembly.
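As a rough illustration, for a trivial function like int add(int a, int b) { return a + b; }, unoptimized 32-bit g++ -S output looks something like this (hypothetical; the exact output varies with compiler version, flags, and target):
_Z3addii:                  # C++-mangled name of add(int, int)
    pushl %ebp
    movl  %esp, %ebp
    movl  8(%ebp), %eax    # a
    addl  12(%ebp), %eax   # + b
    popl  %ebp
    ret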
I'm not very familiar with decompilers, but two (including one from another SO post) are Boomerang and REC, for C rather than C++.
I hope that helps.
There's an Asm plugin for Eclipse you could use.
However, usually IDEs aren't employed for assembly programming. I'm not sure how much more understanding you will gain by easily spotting where the jumps and loops are - you have to read through everything anyway.
Have a look at this...
http://sourceforge.net/projects/asmplugin/
It's a plugin for Eclipse...
Not really, but you can always look at instruction references and use syntax highlighting (vim's asm syntax); you could also step through it in a debugger if there's no limitation on running it. For already-assembled code this might be interesting: LIDA
Only for PowerPC CPUs
The Demono project aims to reconstruct the algorithms in binary code (currently only for PPC); it is not a full decompiler. The project is under construction, but some examples work.
The site has an online service for generating C-like descriptions of functions from assembler:
online service for decompiling PPC asm
To use it, perform the following steps:
Disassemble the binary code (for PowerPC) to assembler text (with IDA)
Paste the assembler text into the "Ассемблер" (Assembler) field
Press the "Восстановить" (Restore) button
Look at the "Восстановленный алгоритм" (Restored algorithm) pane
For example, if your assembler text is:
funct(){
0x00000002: cmpw r3, r4
0x00000003: ble label_1
0x00000004: mr r6, r3
0x00000005: b label_2
0x00000006: label_1:
0x00000007: mr r6, r4
0x00000008: label_2:
0x00000009: cmpw r6, r5
0x0000000A: ble label_3
0x0000000B: mr r7, r6
0x0000000C: b label_4
0x0000000D: label_3:
0x0000000E: mr r7, r5
0x0000000F: label_4:
0x00000010: mr r3, r7
0x00000011: blr
}
Online service restore following function's algorithm description:
funct(arg_w_0, arg_w_1, arg_w_2){
if(arg_w_0>arg_w_1 ? true : false){
}else{
arg_w_0=arg_w_1;
}
if(arg_w_0>arg_w_2 ? true : false){
}else{
arg_w_0=arg_w_2;
}
return (arg_w_0);
}
And therefore funct() computes the max of three numbers: max3().
Possibly this service will help you understand how the assembler instructions work.
