Does anyone know a tutorial for GAS with information about compiling and linking code in AT&T syntax on 64-bit systems? I need this for university, so I cannot use NASM instead.
All the tutorials I can find are either for NASM or something similar, or they only work on 32-bit.
Even the minimal examples our prof showed work on my 32-bit system but not on 64-bit.
You just need to change the suffixes of the instructions and the register names. Change this:
movl %ebx, %ecx
# to:
movq %rbx, %rcx
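For a complete minimal example, here is a sketch of a 64-bit Linux program in GAS/AT&T syntax that just exits with status 42 (the file name exit.s is my own choice):

# exit.s - minimal 64-bit GAS example (AT&T syntax)
# Assemble and link with the GNU toolchain:
#   as --64 exit.s -o exit.o
#   ld exit.o -o exit
#   ./exit; echo $?        # should print 42
.section .text
.globl _start
_start:
    movq $60, %rax         # syscall number for exit on 64-bit Linux
    movq $42, %rdi         # exit status is passed in %rdi
    syscall                # 64-bit code uses syscall, not int $0x80

Note that the 64-bit system call interface differs from the 32-bit one (different syscall numbers, arguments in registers, syscall instead of int $0x80), which is the usual reason 32-bit examples fail when rebuilt as 64-bit.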
Related
Can somebody explain the differences between MASM, TASM, and NASM?
Why can't I run TASM code on Linux? Are they different languages?
I thought assembly language was the same on all systems.
TASM, MASM, and NASM are x86 assemblers.
Borland Turbo Assembler (TASM) and Microsoft Macro Assembler (MASM) are DOS/Windows-based; Netwide Assembler (NASM) is available for other platforms as well. TASM produces 16-bit/32-bit output; MASM and NASM also produce 64-bit output.
All of those assemblers take the x86 instruction set as input. That does not mean that assembler source files are identical and compatible, however.
Instruction syntax
Assemblers expect either the original syntax used in Intel's instruction set documentation, the Intel syntax, or the so-called AT&T syntax developed at AT&T Bell Labs. AT&T uses mov src, dest; Intel uses mov dest, src, among other differences.
The Windows assemblers (TASM, MASM) prefer Intel syntax; most Linux/UNIX assemblers use AT&T. NASM uses a variant of the Intel syntax.
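For instance, the same register-to-register move looks like this in each (a tiny illustration, with each flavor's usual comment character):

movl %ebx, %ecx    # AT&T: source first, then destination; % register prefixes, size suffix on the mnemonic
mov  ecx, ebx      ; Intel: destination first, then source; no prefixes or suffixes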
Assembler-specific syntax
Each assembler also has its own syntax for directives affecting the assembly process, macros, and comments. These usually differ from assembler to assembler.
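For example, here is roughly how the same data declaration and load look under MASM and NASM; the directives and even the meaning of a bare symbol differ (fragments for illustration, not complete programs):

; MASM: dotted section directives; a bare symbol means its CONTENTS
.data
msg dd 1234
.code
mov eax, msg          ; loads the value stored at msg
mov eax, offset msg   ; loads the address of msg

; NASM: section directives; a bare symbol means its ADDRESS
section .data
msg dd 1234
section .text
mov eax, [msg]        ; loads the value stored at msg
mov eax, msg          ; loads the address of msg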
Compatibility
TASM can assemble MASM sources in "MASM mode".
NASM can assemble TASM code in "TASM mode". So, in theory, you can take TASM code and assemble it with NASM on Linux using that mode. Of course, the code might still need adjustments. If the code has OS dependencies, those will also require your attention as you move from Windows to Linux.
In MIPS, arguments are placed in registers $a0 to $a3 for faster access. Why do some x86 calling conventions place arguments on the stack instead of in registers? What are the advantages of doing this?
The real answer is that it depends more on the compiler than on the processor, although I suspect the reason it is so common for x86 compilers to push arguments onto the stack is that the x86 has always suffered from a scarcity of registers. By the time you eliminate the reserved registers, you are left with three: EAX, ECX, and EDX, which correspond to AX, CX, and DX in the original 16-bit x86 instruction set. As far as I know, the first processor in the Intel line to raise that limit is the 64-bit AMD64 architecture, which adds eight more, numbered R8 through R15. The five reserved registers get new names, but they remain reserved.
To reinforce my assertion that it depends on the compiler, you need look no further than the native code generator that ships with the Microsoft .NET Framework, which exhibits a hybrid approach, where the first two arguments go into ECX (or RCX) and EDX (or RDX), while additional arguments go onto the stack.
Even within C/C++ code generated by the Microsoft Visual C++ compiler, although the most common calling conventions, __cdecl and __stdcall, use the stack, a third, __fastcall, uses registers. Moreover, I have seen assembly code that used both; if it needed to talk to C, the routine expected its arguments on the stack, but private routines that received calls only from other routines in the library used registers.
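To make the difference concrete, here is a sketch (32-bit AT&T syntax, hand-written rather than actual compiler output) of a two-argument call to a hypothetical function f under each convention:

# __cdecl: arguments pushed right to left on the stack; the caller cleans up
pushl $2               # second argument
pushl $1               # first argument
call  f
addl  $8, %esp         # caller removes the two 4-byte arguments

# __fastcall: the first two arguments travel in registers
movl  $1, %ecx         # first argument in ECX
movl  $2, %edx         # second argument in EDX
call  f                # any further arguments would go on the stack, callee-cleaned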
Registers are naturally faster, considerably faster, but you have to have enough of them. x86 traditionally had very few registers, so a stack-based approach was the way to go; at that time in history it was in general the way to go. RISC and others came along with a lot more registers, and the idea of using registers for the first few parameters, and perhaps the return value, became something to consider. x86 has more registers now but has generally remained stack-based, although I think Zortech or Watcom, or perhaps even GCC, had a command-line option to pass a few parameters in registers; I would have to confirm or deny that with research. But historically it has used the stack for parameters and a register for the return value.
ARM, MIPS, etc. all have a finite number of registers, so eventually they spill into the stack; if you keep the number, size, and at times the ordering of your parameters under control, you can try to limit this and improve performance.
At the end of the day, the bottom line is that someone or some team defines the calling convention. It is ultimately the choice of the compiler authors; it doesn't matter whether the chip/processor designer has a recommendation, the compiler defines its own calling convention, be it following a recommendation or doing its own thing. There is no reason you couldn't create a MIPS or ARM compiler/toolchain that is mostly stack-based (the instruction set itself might dictate stack- or register-based returns, or it could be optional); likewise, you are more than welcome to make an x86 compiler with a convention that starts with registers and then moves to the stack after some number of them are used.
So: a little bit history, and a little bit because they chose to...
The simple answer is that you must always follow the ABI for the platform that you are running on. The longer answer is that you incorrectly assume that every 32-bit x86 platform will exclusively use the stack for argument passing. In fact, while each platform will adopt a standard, there are numerous approaches (fastcall, cdecl, etc.), any of which can be used.
I am using the cmp command on an x86 processor and it works properly (the binaries are generated using gcc),
but when I use it on an ARM Cortex-A9 it does not give the proper output (the binaries are generated using a cross gcc).
Comparing the board-specific binaries on the x86 machine with cmp produces the proper output.
x86 machine:
say I have 2 files, a.bin and b.bin (they should be the same when compared using cmp)
cmp a.bin b.bin
and it is fine.
ARM Cortex-A9:
a.bin, b.bin
cmp a.bin b.bin
Here too they should be the same,
but it reports a mismatch.
Any clue, please?
Your question isn't very clear and is a little vague, so I'll take a stab in the dark and assume that you're asking why the same source code compiles to different files.
Although a compiled program (assuming no UB or portability issues) will be functionally the same no matter what compiler is used, it won't necessarily be the same at the binary level.
Different optimization levels will generate different files, for example. The compiler may embed build dates into the file. Different compilers will arrange the code differently.
These are all reasons why you may be getting different outputs for the 'same' program.
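For instance, the same trivial function comes out very differently at different optimization levels, so the resulting binaries cannot match byte for byte. Roughly (a sketch of typical x86-64 gcc output, not byte-exact):

# int add(int a, int b) { return a + b; }

# gcc -O0: frame setup, arguments spilled to the stack
add:
    pushq %rbp
    movq  %rsp, %rbp
    movl  %edi, -4(%rbp)
    movl  %esi, -8(%rbp)
    movl  -4(%rbp), %edx
    movl  -8(%rbp), %eax
    addl  %edx, %eax
    popq  %rbp
    ret

# gcc -O2: no frame at all
add:
    leal  (%rdi,%rsi), %eax
    ret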
I have an assembly program and I am trying to understand it (I'm actually profiling it).
Is there a Linux tool/editor that would structure it a little for me?
Showing where the loops/jumps are would be enough.
A handy description of what each instruction does would be great.
If you are looking for something that resembles OllyDbg but for Linux, you might try edb.
Since you are really reversing a high-level language for profiling, you can do a couple of things to help the process. First, enable and preserve debugging information in the C++ compiler and linker (don't let it strip the executable); in g++ the -g command-line flag does this. Second, many C++ compilers have a flag to output the intermediate assembly source code rather than emitting the object code (which is used by the linker); in g++ the -S flag enables this.
The assembly from the compiler can be compared to the assembly from the oprofile disassembly.
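For example, a simple counted loop in the -S output looks something like this (a hand-written sketch in x86-64 AT&T syntax, not actual g++ output); the backward conditional jump is what marks the loop when you read the oprofile disassembly:

# long sum(long n) { long s = 0; for (long i = 0; i < n; ++i) s += i; return s; }
_Z3suml:                   # C++-mangled name of sum(long); n arrives in %rdi
    movl  $0, %eax         # s = 0 (accumulated in %rax, the return register)
    movl  $0, %edx         # i = 0
    jmp   .L2
.L3:
    addq  %rdx, %rax       # s += i
    addq  $1, %rdx         # ++i
.L2:
    cmpq  %rdi, %rdx       # i < n ?
    jl    .L3              # backward jump: this is the loop
    ret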
I'm not very familiar with decompilers, but two (including one from another SO post) are Boomerang and REC, which target C rather than C++.
I hope that helps.
There's an Asm plugin for Eclipse you could use.
However, usually IDEs aren't employed for assembly programming. I'm not sure how much more understanding you will gain by easily spotting where the jumps and loops are - you have to read through everything anyway.
Have a look at this...
http://sourceforge.net/projects/asmplugin/
It's a plugin for Eclipse...
Not really, but you can always look at instruction references and use syntax highlighting (Vim's asm syntax), and you could step through it in a debugger if there's nothing preventing you from running it. For already-assembled code, LIDA might be interesting.
For PowerPC CPUs only.
The Demono project aims to recover algorithms from binary code (currently only for PPC); it is not a full decompiler. The project is under construction, but some of the examples work.
The site has an online service for generating C-like descriptions of functions from assembler:
online service for decompiling PPC asm
To use it, perform the following steps:
Disassemble the binary code (for PowerPC) into assembler text (with IDA)
Paste the assembler text into the "Ассемблер" (Assembler) field
Press the "Восстановить" (Recover) button
Look at the "Восстановленный алгоритм" (Recovered algorithm) pane
For example, if your assembler text is:
funct(){
0x00000002: cmpw r3, r4
0x00000003: ble label_1
0x00000004: mr r6, r3
0x00000005: b label_2
0x00000006: label_1:
0x00000007: mr r6, r4
0x00000008: label_2:
0x00000009: cmpw r6, r5
0x0000000A: ble label_3
0x0000000B: mr r7, r6
0x0000000C: b label_4
0x0000000D: label_3:
0x0000000E: mr r7, r5
0x0000000F: label_4:
0x00000010: mr r3, r7
0x00000011: blr
}
The online service recovers the following description of the function's algorithm:
funct(arg_w_0, arg_w_1, arg_w_2){
    if(arg_w_0>arg_w_1?true:false){
    }else{
        arg_w_0=arg_w_1;
    }
    if(arg_w_0>arg_w_2?true:false){
    }else{
        arg_w_0=arg_w_2;
    }
    return (arg_w_0);
}
And therefore funct() derives the maximum of three numbers: it is max3().
Perhaps this service will help you understand how assembler instructions work.
To me, Intel syntax is much easier to read. If I go traipsing through assembly forest concentrating only on Intel syntax, will I miss anything? Is there any reason I would want to switch to AT&T (outside of being able to read others' AT&T assembly)? My first clue is that gdb uses AT&T by default.
If this matters, my focus is only on any relation assembly and syntax may have to Linux/BSD and the C language.
There is really no advantage to one over the other. I agree though that Intel syntax is much easier to read. Keep in mind that, AFAIK, all GNU tools have the option to use Intel syntax also.
It looks like you can make GDB use Intel syntax with this:
set disassembly-flavor intel
GCC can do Intel syntax with -masm=intel.
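GAS itself can also be switched per file (or per region) with a directive; a small sketch:

.intel_syntax noprefix     # Intel syntax from here on; noprefix drops the % on registers
mov eax, 1
mov ebx, eax
.att_syntax prefix         # back to AT&T
movl $1, %eax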
The primary syntax for the GNU assembler (GAS) is AT&T. Intel syntax is a relatively new addition to it. x86 assembly in the Linux kernel is in AT&T syntax. In the Linux world, it's the common syntax. In the MS world, Intel syntax is more common.
Personally, I hate AT&T syntax. There are plenty of free assemblers (NASM, YASM) along with GAS that support Intel syntax too, so there won't be any problems doing Intel syntax in Linux.
Beyond that, it's just a syntactic difference. The result of both will be the same x86 machine code.
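You can see that by disassembling the same bytes both ways. For example, the two-byte encoding 89 d8 is a single instruction that each flavor merely prints differently:

machine code    AT&T               Intel
89 d8           movl %ebx, %eax    mov eax, ebx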
There is really no advantage to one over the other. I disagree though that Intel syntax is much easier to read, because personally I hate Intel syntax. Keep in mind that, AFAIK, all GNU tools have the option to use Intel syntax also.
at&t noprefix                   intel
mov eax, -4(ebp,edx,4)          mov DWORD PTR [-4+ebp+edx*4], eax
mov eax, -4(ebp)                mov DWORD PTR [-4+ebp], eax
mov edx, (ecx)                  mov DWORD PTR [ecx], edx
lea 8(,eax,4), eax              lea eax, DWORD PTR [8+eax*4]
lea (eax,eax,2), eax            lea eax, DWORD PTR [eax*2+eax]
...and it gets more complicated with more complex instructions
'nuff said.
PS: This answer exists mainly to highlight (IMHO) weaknesses in some other answers, which are actually not answers but opinions. And of course this answer is in reality only my humble opinion.
PPS: I do not hate Intel syntax, I just don't care.
It's the "same language", in that it compiles down to the same machine code, has the same opcodes, etc. On the other hand, if you are using GCC at all, you will probably want to learn AT&T syntax, just because it's the default--no changing compiler options, etc. to get it.
I too cut my teeth on Intel-syntax x86 ASM (on DOS, too) and found it more intuitive initially when switching to C/UNIX. But once you learn AT&T it'll look just as easy.
I wouldn't give it that much thought---it's easy to learn AT&T once you know Intel, and vice-versa. The actual language is much harder to get in your head than the syntax. So by all means just focus on one and then learn the other when it comes up.
It's a sign of professionalism that you are willing to adjust to whatever is in use. There is no real advantage to one or the other. The Intel syntax is common in the Microsoft world; AT&T is the standard in Linux/Unix. Since there's no advantage to either one, people tend to imprint on whatever they saw first. That said, a professional programmer rises above things like that. Use whatever they use at work, or in the domain that you're working in.
Intel syntax covers everything (assuming the assembler/disassembler is up to date with the latest junk Intel added to their instruction set). I'm sure AT&T is the same.
at&t                            intel
movl -4(%ebp,%edx,4), %eax      mov eax, [ebp-4+edx*4]
movl -4(%ebp), %eax             mov eax, [ebp-4]
movl (%ecx), %edx               mov edx, [ecx]
leal 8(,%eax,4), %eax           lea eax, [eax*4+8]
leal (%eax,%eax,2), %eax        lea eax, [eax*2+eax]
...and it gets more complicated with more complex instructions
'nuff said.
My first assembly language was MIPS, which I've noticed is very similar to the AT&T syntax. So I prefer AT&T syntax, but it doesn't really matter as long as you can read it.