I'd like to know how to read integers from keyboard in assembly. I'm using Linux/x86 IA-32 architecture and GCC/GAS (GNU Assembler). The examples I found so far are for NASM or some other Windows/DOS related compiler.
I heard that it has something to do with the "int 16h" interrupt, but I don't know how it works (does it needs parameters? The result goes to %eax or any of its virtual registers [AX, AH, AL]?).
Thanks in advance,
Flayshon.
:D
Simple answer is that you don't read integers from the keyboard, you read characters from the keyboard. You don't print integers to the screen, either - you print characters. You will need routines to convert "ascii-to-integer" and "integer-to-ascii". You can "just call scanf" for the one, and "just call printf" for the other. "scanf" works okay if the user is well-behaved and confines input to characters representing decimal digits, but it's difficult to get rid of any "junk" entered! "printf" isn't too bad.
Although I'm a Nasm user (it works fine for Linux - not really "Windows/dos related"), I might have routines in (G)as syntax lying around. I'll see if I can find 'em if you can't figure it out.
As Brian points out, int 16h is a BIOS interrupt - 16-bit code - and is not useful in Linux.
Best,
Frank
In 2012, I don't recommend coding an entire program in assembly. Code only the most critical parts (if you absolutely want some assembly code). Compilers are optimizing better than humans. So use C or C++ for low level software, and higher-level languages e.g. Ocaml instead.
On Linux, you need to understand the role of the linux kernel and of system calls, which are documented in the section 2 of man pages. You probably want at least read(2) and write(2) (if only handling stdin and stdout which should have already be opened by the parent process, e.g. a shell), and you probably need many other syscalls (e.g. open(2) and close(2)). Don't forget to do your buffering (for efficiency purpose).
I strongly recommend learning the Linux system interfaces by reading a good book such as Advanced Unix Programming.
How system calls are done at the machine level in assembly is documented in the Linux Assembly Howto (at least for x86 Linux in 32 bits).
If your goal is to "obtain" a program, I would agree entirely with Basile. If your goal is to "learn assembly language", these other languages aren't really going to help. If your goal is to learn the nitty-gritty details of the hardware, you probably want assembly language, but Linux (or any other "protected mode" OS) isolates us from the hardware, so you might want to use clunky old DOS or even "write your own OS". Flayshon doesn't actually say what his goal is, but since he's asking here, he's probably interested in assembly language...
Some of us have a mental illness that makes us think it's "fun" to write in assembly language. Humor us!
Best,
Frank
Related
For my university, final-year dissertation, I am going to implement a compiler for a skeletal form of the C programming language, then go about extending it until it resembles something a little more like Java with array bounds checking, type-checking and so forth.
I am relatively competent at much of the theory that relates to compiler construction, and have experience programming in MIPS assembly language, so I do understand a little of what it is to write extremely low-level code.
My main concern is that I am likely to be able to get all the way to the point where I need to produce the actual machine-code output, but then not understand enough about how machine code is executed from the perspective of the operating system running it.
So, my actual question is basically, "does anyone know the best place to read up about writing assembly to run on an intel x86-64 processor under linux?"
The main gap in my knowledge is how the machine code is actually run in practise. Is it run directly on the processor, making "syscall"s (or the x86 equivalent) when it needs services provided by the kernel, or is the assembly language somehow an encapsulated description that tells the kernel how to execute the instructions (in a manner similar to an interpreted language such as Java)?
Any help you can provide would be greatly appreciated.
This document explains how you can implement a foreign function interface to interact with other code: http://www.x86-64.org/documentation/abi.pdf
Firstly, for the machine code start here: http://www.intel.com/products/processor/manuals/
Next, I assume your question about how the machine code is run is really about how the OS loads the exe into memory and calls main()? These links may help
Linkers and loaders:
http://www.linuxjournal.com/article/6463
ELF file format:
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format and
http://www.linuxjournal.com/article/1060
Your machine code will go into the .text section of the executable
Finally, best of luck. Your project is similar to my final year project, except I targeted the JVM and compiled a subset of Visual Basic!
Well, they bring (should bring at least) great increase in performance, isn’t it?
So, I haven’t seen any Linux kernel sources, but ‘d love to ask: are they used somehow? (In this case – there must be some special “code-cap” for system that has no such instructions?)
The SSE and MMX instruction sets have limited value outside of audio/video and gaming work. You might find a few explicit uses in dark corners of the kernel, but I wouldn't count on it. The answer in the general case is "no, they are not used", nor are they used in most non-kernel/userspace applications.
The kernel does sometimes optionally use certain x86 instructions that are specific to certain CPUs (e.g. present on some AMD or Intel models but not all, nor vice-versa), such as syscall, but these are different from the SIMD instruction sets you're referring to, and are not part of some wider set of similarly-themed extensions.
After Mark's answer, I went looking. The only place I could easily identify them being used is in the RAID 6 library (which also has support for AltiVec, which is the PowerPC SIMD instruction set).
(Be wary just grepping the tree, there are a lot of spots where the kernel "knows" about SSE/MMX to support user-space applications, but isn't actually using it. Also a couple cases of unfortunate variable names that have absolutely nothing to do with SSE, e.g. in the SCTP implementation.)
There are severe restrictions on using vector registers and floating point registers in kernel code. See e.g. chapter 6.3 of "Calling conventions for different C++ compilers and operating systems". http://www.agner.org/optimize/#manuals
They are used in the kernel for a few things, such as
Software RAID
Encryption (possibly)
However, I believe it always checks their presence first.
"cpu simd instructions use FPU"
erm, no, not as I understand it. They're in part a modern and (much) more efficient replacement for FPU instructions, but a large part of the SIMD instruction set deals with integer operations.
I've never looked very hard at it, but I suppose (ok, hope) that SIMD code generated by a recent gcc version will not clobber any registers or state.
My first question here, hopefully I'm not doing it wrong.
My problem is that I have a certain old DOS program which has quite much hacked the file format to the extreme to save space. (Yes, it's a demoscene prod for those who know.)
Objdump doesn't want to help me with it; quick Googling yielded no real results for the problem and the manpage doesn't seem too generous in this regard either.
There are others yes, like lida. However, for some reason I couldn't get lida to work; I believe there are alternatives.
Anyone have any experience of disassembling DOS executables on Linux? Or should I just try some DOS based disassembler and run it on Dosemu?
IDA is the best disassembler, and there is also linux version. It's better than a simple dissasembler because it's interactive.
Also, if you want to see nice "hand made" assembly, the best place to look are old viruses. And not the binaries, but sources, because they are commented. You can try Netlux for that.
ndisasm comes with NASM, the netwide assembler. It is pretty versatile, including the ability to disassemble raw streams of bytes (since you mentioned COM files) and also a few object file formats. Strictly speaking I think it's also possible to disassemble raw streams of bytes with some objdump option, but I don't remember how that goes.
However self-modifying code can make this rather tricky. Looking at a stream of bytes, it's hard to predict what the final instructions executed might be if the program will modify itself, a common space-saving trick in the DOS era. You mentioned booting into DOS, which gives me some interesting ideas: Perhaps you could step through it using a DOS debugger, or run DOS under qemu and use its debugging options (some of which include dumping assembly output and register state during execution).
I'm learning assembly language. I started with Paul A. Carter's PC Assembly Language which uses NASM (The Netwide Assembler). Then in the middle I switched and started reading Introduction to 80×86 Assembly Language and Computer Architecture which uses MASM.
In NASM I used to write, for initializing a byte
db 110101b
In MASM I'm using
BYTE 110101b
I'm in the middle of reading. Since these are Assembler directives they will be different for each assembler. right?
Doesn't these assembler developers follow a standard for these directives? Because, They know that mnemonics are CPU specific. So, its pain in the ass to learn and code in assembly language.
Now if they follow different directives, its more pain if you change assembler or if you switch the operating system (MASM developer is in deep trouble if he goes to linux).
My confusion is should I acquaint myself with NASM or MASM? I'm fan of windows but I may have to work (in future) on Linux also.
Every book should be titled "_________ Assembly Language using __________ Assembler"
Unfortunately there has never been a standard for assembly language. You'll just have to learn the directives that your assembler supports. Fortunately most of the directives, while having different names, are semantically similar like db and BYTE.
But wait! It gets worse, especially for the x86. You have (at least) two forms of code that assemblers can accept: Intel and AT&T format. AT&T format reverses the order of most operands to instructions (or is it visa versa ;-).
NASM is probably a better choice for portability, but you could also look at the GNU
assembler..
Intel Syntax / AT&T Syntax
With x86 in particular, the first assemblers were from Intel and then largely-compatible assemblers from Microsoft formed one branch.
These assemblers organize source and destination operands right to left and have an unusual (and to my eyes, kind of wacky) abstraction layer that uses a single mnemonic for 8, 16, and 32-bit ops and then derives the actual machine opcode to use based on properties of the operand. Modifiers exist (on operands) to force a particular size.
But Unix was also important and it had a completely different assembler line with different traditions and conventions.
The original Unix vendor was AT&T, which owned the intellectual property developed at Bell Labs. A series of BSD projects and then Linux continued with this tradition. These assemblers historically process operands left to right, have a spare design optimized for speed, and when used by humans they generally use cpp for macros and conditionals, even if the assembler also has parallel features.
These days you are probably using VS on MS or Gnu on Linux or Mac, but this is why we still say AT&T vs Intel. The GNU assembler has an option to assemble both ways, although it's still really in the AT&T camp.
Generally yes. They are mostly feature-compatible though, so converting from one assembler syntax to another is usually not terribly difficult if you know both.
Processors are all documented in a manufacturer supplied Reference manual. This usually developed into the normative syntax (along with the assembler provided by the vendor) for assembly programs on a particular platform. Consequently, many processors from a single vendor have similar syntax.
The situation became more complex with second sourcing of processors and the eventual development of multi-targeting assemblers that, for historical reasons, use mostly consistent syntax across all platforms. This also provides some arguable advantages when porting code across platforms.
Your best choices are to: pick a notation you are comfortable with and accept books with different syntax, see if you can locate cross-system macro libraries or translation tools or bite the bullet and learn multiple dialects. The third is usually tolerable although it makes building private libraries labour intensive.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
Programming isn't my main job, though I enjoy it and sometimes get paid for it. For many years now I've been hearing about Linux and my friends have shown to me many *nixes (or *nici?), though I stick with Mac OS.
Do you think there are any parts of the Linux kernel that I could enjoy looking at, that would help me understand what's the whole stuff about? For example, how Linux is different from Darwin?
I've grown up with assembler and DOS, so things like interrupts or low-level C shouldn't be barriers to understanding. But in the end I'm more interested in high-level concepts, like threading or networking stack - I know different operating systems do them differently. And I'm looking for something fun, easy and enjoyable, like late-night reading.
(Note: made a CW, just in case)
Update: I looked for some docs and started reading:
Unreliable Guide To Locking
I would recommend looking at LXR. It makes it easier to follow the flow of the code (you do not have to search for each function that is called — well, you have, but the site does it for you).
Some starting points, for the current version (2.6.30):
start_kernel() — think of it as the kernel equivalent of main(). This function initializes almost all the kernel subsystems; follow it to see in code what you see scrolling on the screen during the boot.
entry_32.S — system calls and interrupts (x86-32 version, which should be nearer what you know; note the use of the AT&T assembly dialect instead of the Intel dialect you might be more used to).
head_32.S — the kernel entry point. This is where the kernel starts after switching to protected mode; in the end, it will call start_kernel().
arch/x86/boot — the real-mode bootstrap code. It starts in assembly (boot/header.S), but quickly jumps into C code (starting at boot/main.c). Does the real-mode initialization (mostly BIOS calls which have to be done before switching to protected mode); it is compiled using a weird GCC trick (.code16gcc), which allows the generation of 32-bit real-mode code.
arch/x86/boot/compressed — if you ever wondered where does the "Decompressing Linux..." message comes from, it is from here.
Myself, I've always found the task scheduling code a bit of a hoot :-/
Methinks you need to get yourself a hobby outside the industry. Or a life :-)
The comments in the kernel can be pretty funny. There's some tips on where to find the best ones on kerneltrap.
arch/sparc/lib/checksum.S- /* Sun, you just can't beat me, you just can't. Stop trying,
arch/sparc/lib/checksum.S: * give up. I'm serious, I am going to kick the living shit
arch/sparc/lib/checksum.S- * out of you, game over, lights out.*/
linux-0.01.tar.gz is Historic Kernel and good for start
it is simple and tiny and better for start reading
(also it have void main(void) Instead of start_kernel() lol :D )
You might want to read or skim a book that describes the Linux Kernel before looking deep into the Linux kernel.
The books that come to mind are:
Understanding the Linux Kernel, Second Edition
Design of the UNIX Operating System
You need to re-define the word 'fun' in your context. :)
That said, the Linux kernel may be too much of a monster to take on. You may want to start with some academic or more primitive kernels to get the hang of what's going on, first. You may also want to consider the Jolix book.
You'd probably get more out of reading a book on OS theory. As far as source code goes: I've got no idea, but you could easily download the Linux kernel source and see if you can find anything that appeals.
This should turn up some interesting code when run in the src directory:
grep -ir "fixme" *
also try with other comical terms, crap, shit, f***, penguin, etc.
It's been recommended by quite a few people that v0.0.1 of linux is the easiest to understand.
Though, if your looking for good kernel source to read, I wouldn't go with linux, it's a beast of a hack(about like saying the GCC sources are "fun") Instead, you may wish to try Minix or one of the BSDs(Darwin is basically a branch of NetBSD iirc) or even one of the many free DOS clones if everything else is a little too scary..
Try reading the code that implements these character devices:
/dev/zero
/dev/null
/dev/full
And maybe the random number generators if you are inclined. The code is straightforward and simpler than all other device drivers since it does not touch any hardware.
Start at drivers/char/mem.*
kernel.h
Some simple tricks we can learn, such as
#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
...
#define min(x, y) ...
...
#define container_of
For fun I guess you could also see Minix, it isn't exactly linux but Modern Operating systems by tenenbaum is a good read.