I've decided to learn assembler through online tutorials.
I've come across this one that uses the NASM compiler, which most other tutorials seem to as well:
http://www.tutorialspoint.com/assembly_programming/index.htm
I've also come across this youtube series "Assembly primer for hackers"
https://www.youtube.com/watch?v=K0g-twyhmQ4&list=PLue5IPmkmZ-P1pDbF3vSQtuNquX0SZHpB
This one uses what the guy describes as the 'generic linux compiler' (owtte).
The commands for compiling go something like this:
as -o file.o file.s
Where file.s is the assembly source code. Followed by:
ld -o file file.o
Where file is then the executable.
Each of the tutorials uses a different syntax (e.g. a register in the latter tutorial is always preceded by %. NB. There do appear to be less superficial differences in the syntax than this as well). Are these syntaxes decided by the individual compiler?
I was also initially confused when I tried to compile code from the NASM tutorial with the latter method. I was always under the impression that the instruction set had to depend on the CPU and it therefore shouldn't matter which compiler I use. I've just concluded that it's merely differences in syntax but is that correct?
I'm running a Linux computer, by the way, on kernel 4.1.6.
My main question is really which syntax do I use? Is it just a matter of choice? Is one more widely used than the other? Thanks for any help.
Each of the tutorials uses a different syntax (e.g. a register in the
latter tutorial is always preceded by %. NB. There do appear to be
less superficial differences in the syntax than this as well). Are
these syntaxes decided by the individual compiler?
Yes, different assemblers (= assembly language compilers) might use different assembler language syntax although they provide code for the same processor and platform.
My main question is really which syntax do I use? Is it just a matter
of choice? Is one more widely used than the other?
One assembler, like NASM, might support a wide range of platforms (NASM itself targets only x86 processors, but it runs on many operating systems and emits many object formats); in that case you benefit from learning its syntax when you need to work across several platforms.
In other cases it might be better to stick with the assembler of some prominent vendor, because it is widely used and you can find more example code on the net for it which might help you with your development.
Last but not least, you might simply prefer a particular assembler because you like its features or syntax.
If you're on a Windows system, Microsoft's MASM (ML.EXE, or ML64.EXE for 64-bit) syntax is virtually the same as Intel's syntax. MASM is included with the free Visual Studio Express editions, although you usually have to create a custom build step to invoke the assembler in a VS project. VS Express includes a good source-level debugger.
If you're on a Linux type system, then you'll probably use AT&T syntax, which I assume ended up that way since it was a conversion of some generic assembler. I don't know which assembler(s) to recommend for Linux.
I'd like to know how to read integers from keyboard in assembly. I'm using Linux/x86 IA-32 architecture and GCC/GAS (GNU Assembler). The examples I found so far are for NASM or some other Windows/DOS related compiler.
I heard that it has something to do with the "int 16h" interrupt, but I don't know how it works (does it need parameters? Does the result go to %eax or any of its virtual registers [AX, AH, AL]?).
Thanks in advance,
Flayshon.
:D
Simple answer is that you don't read integers from the keyboard, you read characters from the keyboard. You don't print integers to the screen, either - you print characters. You will need routines to convert "ascii-to-integer" and "integer-to-ascii". You can "just call scanf" for the one, and "just call printf" for the other. "scanf" works okay if the user is well-behaved and confines input to characters representing decimal digits, but it's difficult to get rid of any "junk" entered! "printf" isn't too bad.
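A minimal sketch of the two conversion routines mentioned above, written in C rather than assembly so the logic is easy to follow before translating it by hand. The names ascii_to_int and int_to_ascii are hypothetical, the sketch assumes a 32-bit int, and overflow on input is not checked.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a run of decimal digits (optionally preceded by '-') to an int.
   Stops at the first non-digit and, if used is not NULL, stores the number
   of characters consumed there, so the caller can detect trailing junk.
   Hypothetical helper; overflow is NOT checked in this sketch. */
int ascii_to_int(const char *s, size_t *used)
{
    assert(s != NULL);
    size_t i = 0;
    int sign = 1;
    int value = 0;

    if (s[i] == '-') {
        sign = -1;
        i++;
    }
    while (s[i] >= '0' && s[i] <= '9') {
        value = value * 10 + (s[i] - '0');   /* shift left one decimal place */
        i++;
    }
    if (used != NULL)
        *used = i;
    return sign * value;
}

/* Write the decimal text of n into out and return its length (without the
   terminating NUL). out must hold at least 12 bytes: sign, 10 digits, NUL. */
size_t int_to_ascii(int n, char *out)
{
    assert(out != NULL);
    char tmp[12];
    size_t len = 0, pos = 0;
    /* Work on the magnitude as unsigned so the most negative int is safe. */
    unsigned int u = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;

    do {                                     /* digits come out in reverse */
        tmp[len++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);

    if (n < 0)
        out[pos++] = '-';
    while (len > 0)                          /* copy them back in order */
        out[pos++] = tmp[--len];
    out[pos] = '\0';
    return pos;
}
```

The same two loops (multiply by 10 and add a digit, divide by 10 and take the remainder) are what an assembly version would implement; comparing the consumed count against the input length is one way to spot the "junk" that scanf makes hard to get rid of.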
Although I'm a Nasm user (it works fine for Linux - not really "Windows/dos related"), I might have routines in (G)as syntax lying around. I'll see if I can find 'em if you can't figure it out.
As Brian points out, int 16h is a BIOS interrupt - 16-bit code - and is not useful in Linux.
Best,
Frank
In 2012, I don't recommend coding an entire program in assembly. Code only the most critical parts (if you absolutely want some assembly code). Compilers optimize better than humans. So use C or C++ for low-level software, and higher-level languages (e.g. OCaml) otherwise.
On Linux, you need to understand the role of the Linux kernel and of system calls, which are documented in section 2 of the man pages. You probably want at least read(2) and write(2) (if only handling stdin and stdout, which should already have been opened by the parent process, e.g. a shell), and you probably need many other syscalls (e.g. open(2) and close(2)). Don't forget to do your own buffering (for efficiency).
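As a rough illustration of the read(2)/write(2) layer described above, here is a sketch in C that calls the two system calls through their libc wrappers and loops over short reads and writes. The helper names write_all and read_up_to are hypothetical, and retrying on EINTR is an assumption about what the caller wants.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes of data to fd, retrying after short writes.
   Returns 0 on success, -1 on error (errno is left set by write). */
int write_all(int fd, const char *data, size_t len)
{
    assert(data != NULL);
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: try again */
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}

/* Fill buf with up to cap bytes from fd, stopping at end of file.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_up_to(int fd, char *buf, size_t cap)
{
    assert(buf != NULL);
    size_t got = 0;

    while (got < cap) {
        ssize_t n = read(fd, buf + got, cap - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                   /* end of file */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

For keyboard input on a terminal, a single read(0, buf, cap) normally returns one line at a time, so the looping reader is better suited to pipes and files. An assembly program does the same thing by loading the syscall number and arguments into registers and trapping into the kernel.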
I strongly recommend learning the Linux system interfaces by reading a good book such as Advanced Unix Programming.
How system calls are done at the machine level in assembly is documented in the Linux Assembly Howto (at least for x86 Linux in 32 bits).
If your goal is to "obtain" a program, I would agree entirely with Basile. If your goal is to "learn assembly language", these other languages aren't really going to help. If your goal is to learn the nitty-gritty details of the hardware, you probably want assembly language, but Linux (or any other "protected mode" OS) isolates us from the hardware, so you might want to use clunky old DOS or even "write your own OS". Flayshon doesn't actually say what his goal is, but since he's asking here, he's probably interested in assembly language...
Some of us have a mental illness that makes us think it's "fun" to write in assembly language. Humor us!
Best,
Frank
Is there a programming language that has a usable interactive interpreter and can also be compiled to machine code?
Compilation vs. "interpretation" is essentially a matter of implementation, not the language itself. For example, MRI Ruby 1.8 is interpreted, while MacRuby is compiled to native machine code. Both include an interactive REPL. All the languages I know that have at least one machine-code compiler and at least one REPL:
Ruby
Python
Almost all Lisps (Lisp was the language that pioneered this technique, AFAIK)
OCaml
Haskell
Forth
If we're counting compilation to bytecode as well as machine code, it's true of the vast majority of popular bytecode-compiled languages:
Java
Scala
Groovy
Erlang
C#
F#
Smalltalk
Haskell, using the Glasgow Haskell Compiler which has an interactive "shell" called GHCi.
Many flavors of Lisp offer both options, including Clojure.
Two come to mind: OCaml and Scala (~= Java), but I'm sure there must be a lot more out there.
And here's another one to burn your house down:
x86 Assembly
Yup, there are interpreters for this as well.
Javascript x86 Assembly Interpreter
Jasmin
At this point you're really in emulator land, but it does meet the requirements you state.
I'm wondering if it's easier to name compiled languages that someone hasn't cobbled up a working interpreter for. :-)
Lua has an interactive mode for one-liners and experimentation. It normally compiles to bytecode for its VM for execution. LuaJIT is an independent implementation of a Lua VM that also does just-in-time compilation to 32-bit x86. Support for 64-bit is underway, and support for ARM is frequently requested.
Compilation to a bytecode is often a reasonable compromise between a pure interpreter and a pure compiler. The VM can be tuned to the needs of the language, and JIT techniques can analyze the VM code as it executes and concentrate on frequently executed code paths and inner loops.
As others have mentioned, OCaml.
If managed code (.NET CLI) is close enough to machine code, F# would be a candidate as well. There are probably other .NET/Mono languages which meet the requirement as well.
You may regret you asked:
C and C++.
Why?
Ch
CINT
EIC
picocc
and there are probably others out there as well.
Plenty of languages offer an implementation that both interacts and compiles to machine code, but it's rare to do both at once. Standard ML of New Jersey is one that has an interactive loop but no bytecode: it simply compiles to machine code in memory and then branches to it.
Not exactly machine code, but Java can be compiled and also used via BeanShell.
I've used Ruby with an interpreter, and there seems to be a compiler here.
Icon used to have a compiler, but it falls in and out of maintenance. It may still work.
Python can be compiled to Windows executables.
C# can be compiled by using SnippetCompiler, maybe this would act as an interactive interpreter for you?
Your question is a bit vague. Even Java would fit it:
by interactive interpreter, i mean
shell-like environment, where you can
work in the runtime interactively.
Java has this, e.g. in the Eclipse "scrapbook pages", where you can enter Java expressions and have them evaluated right away. Java is of course also a compiled language (and while it's usually compiled to bytecode, there are various compilers that output machine code).
So what are you looking for? Maybe you could explain your problem or interest.
I tried using Mono/.NET for a bit and found random GC pauses to be disagreeable (at least on my crusty old laptop). I looked at using Gambit-C, an implementation of Scheme that can compile to C, but it seemed difficult to work with because the docs were somewhat limited and the packages were not very easy to install and use.
I usually just stick to having an interpreted language such as python bound to C/C++ which is more painful but at least I know what I am in for.
I'm learning assembly language. I started with Paul A. Carter's PC Assembly Language which uses NASM (The Netwide Assembler). Then in the middle I switched and started reading Introduction to 80×86 Assembly Language and Computer Architecture which uses MASM.
In NASM I used to write, for initializing a byte
db 110101b
In MASM I'm using
BYTE 110101b
I'm in the middle of reading. Since these are assembler directives, they will be different for each assembler, right?
Don't these assembler developers follow a standard for these directives? They know that mnemonics are CPU-specific, so it's already a pain in the ass to learn and code in assembly language.
Now if they use different directives, it's even more pain when you change assemblers or switch operating systems (a MASM developer is in deep trouble if he goes to Linux).
My confusion is: should I acquaint myself with NASM or MASM? I'm a fan of Windows but I may have to work (in future) on Linux as well.
Every book should be titled "_________ Assembly Language using __________ Assembler"
Unfortunately there has never been a standard for assembly language. You'll just have to learn the directives that your assembler supports. Fortunately most of the directives, while having different names, are semantically similar like db and BYTE.
But wait! It gets worse, especially for the x86. You have (at least) two forms of code that assemblers can accept: Intel and AT&T format. AT&T format reverses the order of most operands to instructions (or is it vice versa ;-).
NASM is probably a better choice for portability, but you could also look at the GNU assembler.
Intel Syntax / AT&T Syntax
With x86 in particular, the first assemblers were from Intel and then largely-compatible assemblers from Microsoft formed one branch.
These assemblers organize source and destination operands right to left and have an unusual (and to my eyes, kind of wacky) abstraction layer that uses a single mnemonic for 8, 16, and 32-bit ops and then derives the actual machine opcode to use based on properties of the operand. Modifiers exist (on operands) to force a particular size.
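To make the "one mnemonic, several opcodes" point concrete, here is a small sketch in C of what an Intel-style assembler has to do for mov with a register destination and an immediate source. It assumes 32-bit mode and uses the documented encodings B0+r (8-bit), B8+r (16/32-bit) and the 66h operand-size prefix; the function name encode_mov_imm and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "mov reg, imm" as machine code for 32-bit mode.
   reg_num: hardware register number 0..7 (0 = al/ax/eax, 1 = cl/cx/ecx, ...)
   width:   operand size in bits (8, 16 or 32), which the assembler would
            derive from the register name written in the source
   out:     receives the bytes; must have room for 5
   Returns the number of bytes written, or 0 for an unsupported width. */
size_t encode_mov_imm(unsigned reg_num, unsigned width, uint32_t imm, uint8_t *out)
{
    assert(out != NULL && reg_num < 8);
    size_t n = 0;

    switch (width) {
    case 8:                                   /* B0+r ib */
        out[n++] = (uint8_t)(0xB0 + reg_num);
        out[n++] = (uint8_t)imm;
        break;
    case 16:                                  /* 66 B8+r iw */
        out[n++] = 0x66;                      /* operand-size prefix */
        out[n++] = (uint8_t)(0xB8 + reg_num);
        out[n++] = (uint8_t)imm;
        out[n++] = (uint8_t)(imm >> 8);
        break;
    case 32:                                  /* B8+r id */
        out[n++] = (uint8_t)(0xB8 + reg_num);
        for (int i = 0; i < 4; i++)           /* little-endian immediate */
            out[n++] = (uint8_t)(imm >> (8 * i));
        break;
    default:
        return 0;
    }
    return n;
}
```

So "mov al, 5", "mov ax, 5" and "mov eax, 5" share a mnemonic but come out as three different byte sequences. AT&T syntax makes the size explicit in the mnemonic instead (movb, movw, movl), although GNU as can usually infer it from a register operand as well.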
But Unix was also important and it had a completely different assembler line with different traditions and conventions.
The original Unix vendor was AT&T, which owned the intellectual property developed at Bell Labs. A series of BSD projects and then Linux continued with this tradition. These assemblers historically process operands left to right, have a spare design optimized for speed, and when used by humans they generally use cpp for macros and conditionals, even if the assembler also has parallel features.
These days you are probably using VS on MS or GNU on Linux or Mac, but this is why we still say AT&T vs Intel. The GNU assembler has an option to assemble both ways, although it's still really in the AT&T camp.
Generally yes. They are mostly feature-compatible though, so converting from one assembler syntax to another is usually not terribly difficult if you know both.
Processors are all documented in a manufacturer supplied Reference manual. This usually developed into the normative syntax (along with the assembler provided by the vendor) for assembly programs on a particular platform. Consequently, many processors from a single vendor have similar syntax.
The situation became more complex with second sourcing of processors and the eventual development of multi-targeting assemblers that, for historical reasons, use mostly consistent syntax across all platforms. This also provides some arguable advantages when porting code across platforms.
Your best choices are to: pick a notation you are comfortable with and accept books with different syntax, see if you can locate cross-system macro libraries or translation tools, or bite the bullet and learn multiple dialects. The third is usually tolerable, although it makes building private libraries labour-intensive.
Is there a way to use MASM under Linux? Even though NASM is quite popular under Linux, it still differs in some of its instruction and code style.
Wiki says
The MASM32 EULA does not allow its usage in the development of open source software, and only allows it to be run in Windows operating systems.
so it is a no.
I use DosBox and it does work fine for me.
Details here
You should be able to run MASM under Wine.
MASM doesn't run with Wine; I'm running MASM under VirtualBox.
Personally I prefer the NASM style, but you can probably run MASM under Wine (or failing that, in a VM). After all it shouldn't need any exotic API calls.
I've been able to run the Win32 NASM binary under Wine on Linux without any problems [long story, no net connection].
If you want to convert Microsoft's OMF binary format to ELF then you should be able to do so using objcopy, but you may need to compile in support for the right object formats.
Run MASM under Wine, or see the wiki, which says that MASM can only run on Windows.
Regards.
An alternative to MASM is UASM.
UASM is a free MASM-compatible assembler based on JWasm.
It works for creating general Linux binaries.
However, building shared objects that require the -fPIC option is not possible with UASM.