How to run 16-bit code on 32-bit Linux?

I have written a small 16-bit assembly program that writes some values to some memory locations. Is there a way I can test it in 32-bit protected mode on Linux?

Use an emulator such as QEMU, DOSBox or Bochs.

Yes, 16-bit code is supported in user processes in Linux. The system call for it is vm86() (there's a man page, but there's not much in it). Naturally, it only works on x86 platforms, and only with 32-bit kernels.
If you want an example, the ELKS project has a complete tool for running ELKS 8086 binaries on Linux, which uses it:
https://github.com/lkundrak/dev86/tree/master/elksemu
Look for the run_elks() function. It's pretty straightforward.

Related

Can we convert an ELF from one CPU architecture to another in Linux?

How can I run x86 binaries (for example an .exe file) on ARM? As I see on Wikipedia, I need to convert binary data for the emulated platform into binary data suitable for execution on the targeted platform. But the question is: how can I do it? Do I need to open the file in a hex editor and change it, or something else?
To successfully do this, you'd have to do two things: one relatively easy, one very hard. Neither of them is something you want to do by hand in a hex editor.
Convert the machine code from x86 to ARM. This is the easy one, because you should be able to map each x86 opcode to one or more ARM opcodes. There are different ways to do this, some more efficient than others, but it can be done with a pretty straightforward mapping.
Remap function calls (and other jumps). This one is hard, because monkeying with the opcodes is going to change all the offsets for the jump and return points. If you have dynamically linked libraries (.so), and we assume that all the libraries are available at exactly the same version in both places (a sketchy assumption at best), you'd have to remap the loads.
It's essentially a machine->machine compiler and linker.
So, can you do it? Sure.
Is it easy? No.
There may be a commercial tool out there, but I'm not aware of it.
You cannot do this with a binary (see Note 1); here "binary" means an object with no symbol information, as opposed to an ELF file with symbols. Even with an ELF file, this is difficult to impossible. The issue is telling code from data. If you resolve this issue, then you can make decompilers and other tools.
Even if you have an ELF file, a compiler will insert constants used by the code into the text segment. You have to look at many opcodes and reconstruct the basic blocks backwards to figure out where a function starts and ends.
A better mechanism is to emulate the x86 on the ARM. Here, you can use JIT technology to do the translation as code is encountered, but you roughly double the code space. Also, the code will execute very slowly. The ARM has 16 registers and the x86 is register-starved (usually it has hidden registers). A compiler's big job is to allocate these registers. QEMU is one technology that does this, and it does support running x86 code on ARM hosts, but it has a tough job, as noted.
Note 1: x86 instructions have variable lengths. In order to recognize a function prologue and epilogue, you would have to scan an image multiple times. I think the problem would be something like O(n!), where n is the number of bytes in the image, and then you might have trouble with inline assembler and library routines coded in assembler. It may be possible, but it is extremely hard.
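To make the variable-length problem concrete: the same bytes decode differently depending on where decoding starts. The toy decoder below is only an illustration (the function name is hypothetical, and it understands just two real x86 opcodes: 0x90, a one-byte nop, and 0xB8, mov eax, imm32, five bytes in total). It is a sketch of linear-sweep decoding, not how any real disassembler works in full.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy linear-sweep decoder that knows only two x86 opcodes:
 *   0x90        nop              (1 byte)
 *   0xB8 imm32  mov eax, imm32   (5 bytes)
 * Returns how many instructions it decodes starting at `start`,
 * or -1 on an unknown opcode or a truncated instruction. */
int count_insns(const uint8_t *code, size_t len, size_t start) {
    int count = 0;
    size_t pc = start;
    while (pc < len) {
        if (code[pc] == 0x90) {
            pc += 1;
        } else if (code[pc] == 0xB8) {
            if (pc + 5 > len) return -1;  /* immediate runs past the buffer */
            pc += 5;
        } else {
            return -1;
        }
        count++;
    }
    return count;
}

/* From offset 0 these bytes are one instruction (mov eax, 0x90909090);
 * from offset 1 they are four nops. */
const uint8_t sample[] = {0xB8, 0x90, 0x90, 0x90, 0x90};
```

A translator that guesses the wrong starting offset sees a completely different instruction stream, which is why separating code from data in a stripped image is so hard.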
To run an ARM executable on an X86 machine all you need is qemu-user.
Example:
Say you have busybox compiled for the AARCH64 (ARM64) architecture and you want to run it on an x86_64 Linux system.
Assuming a static build, this runs arm64 code on an x86 system:
$ qemu-aarch64-static ./busybox
And this runs x86 code on an ARM system:
$ qemu-x86_64-static ./busybox
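Which qemu-user binary a program needs (qemu-aarch64-static or qemu-x86_64-static above) depends on the architecture recorded in its ELF header. A minimal sketch, relying only on the layout defined in <elf.h>: the 16-bit e_machine field sits at byte offset 18 in both 32-bit and 64-bit ELF headers and is stored in the byte order given by e_ident[EI_DATA]. The function name and the short list of architectures are illustrative; from the shell, `file ./busybox` reports the same information.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns a short name for the machine type in the ELF header of `path`,
 * "unknown" for machine types not listed, or NULL if the file cannot be
 * read or is not ELF. */
const char *elf_machine_name(const char *path) {
    unsigned char hdr[EI_NIDENT + 4];  /* e_ident, e_type (2), e_machine (2) */
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    size_t n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    if (n != sizeof hdr || memcmp(hdr, ELFMAG, SELFMAG) != 0) return NULL;

    /* e_machine follows e_ident and e_type; its byte order comes from EI_DATA. */
    unsigned b0 = hdr[EI_NIDENT + 2], b1 = hdr[EI_NIDENT + 3];
    unsigned machine;
    if (hdr[EI_DATA] == ELFDATA2LSB)
        machine = b0 | (b1 << 8);
    else if (hdr[EI_DATA] == ELFDATA2MSB)
        machine = (b0 << 8) | b1;
    else
        return NULL;

    switch (machine) {
    case EM_386:     return "x86 (i386)";
    case EM_X86_64:  return "x86-64";
    case EM_ARM:     return "ARM (32-bit)";
    case EM_AARCH64: return "AArch64";
    default:         return "unknown";
    }
}
```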
What I am curious about is whether there is a way to embed both in a single program.
Read the x86 binary file as UTF-8, then copy from "ELF" to the last character. Then go to the ARM binary and delete the same range you copied from the x86 file. Then paste the copied x86 part from the clipboard at the head. I tried it and it's working.

Determine the Linux OS architecture dynamically

Is there a way to determine the Linux architecture at run time, i.e. whether it is x86-64 or x86?
The POSIX uname function (implemented by the uname(2) syscall) gives you this information at run time. You probably want the machine field.
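A minimal C sketch of reading that field; the wrapper name machine_name is only for illustration.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Copies the machine field reported by uname(2) (e.g. "x86_64" or "i686")
 * into buf. Returns 0 on success, -1 on failure. */
int machine_name(char *buf, size_t size) {
    struct utsname u;
    if (uname(&u) == -1) return -1;
    snprintf(buf, size, "%s", u.machine);
    return 0;
}
```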
Be careful with x86-64 kernels running a 32-bit program (e.g. a 32-bit Debian distribution chroot-ed in a 64-bit Debian, or a 32-bit ELF binary running on a 64-bit system): the machine field still reports x86_64 in that case, since it describes the kernel, and the kernel does not really know about the binaries and libc of the system. The exception is a process running with the PER_LINUX32 personality (for example one started via linux32 or setarch), which gets i686 instead.
See also the Linux-specific personality(2) syscall.
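A small sketch of querying the current persona from C, assuming glibc's <sys/personality.h>; the function name is hypothetical. PER_LINUX32 is the persona that linux32/setarch set, and passing 0xffffffff to personality() only queries the value without changing it.

```c
#include <sys/personality.h>

/* Returns 1 if the calling process runs with the PER_LINUX32 persona,
 * 0 if it does not, -1 on error. */
int has_linux32_persona(void) {
    /* 0xffffffff queries the current persona without modifying it. */
    int p = personality(0xffffffff);
    if (p == -1) return -1;
    /* The low byte is the execution domain; higher bits are flags. */
    return (p & 0xff) == PER_LINUX32;
}
```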
Google is your friend: http://sourceforge.net/p/predef/wiki/Architectures/
You want to test for the macros __amd64__ and __i386__. Ideally, you don't test the macros at all and write correct, portable code.
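A sketch of the macro test, assuming GCC or Clang (which define __amd64__/__x86_64__ for 64-bit x86 and __i386__ for 32-bit x86). Unlike uname(), this answers a compile-time question: a 32-bit build running on a 64-bit kernel returns "x86" here. The function name is illustrative.

```c
/* Reports the architecture this translation unit was compiled for,
 * not the architecture of the running kernel. */
const char *compiled_arch(void) {
#if defined(__amd64__) || defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86";
#else
    return "other";
#endif
}
```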
You can use the lscpu command to list the CPU's characteristics.

How come an x64 OS can run code compiled for an x86 machine?

Basically, what I wonder is how an x86-64 OS can run code compiled for an x86 machine. I know that when the first x64 systems were introduced, this wasn't a feature of any of them. After that, they somehow managed to do it.
Note that I know that x86 assembly language is a subset of x86-64 assembly language and that the ISAs are designed to support backward compatibility. But what confuses me here is the stack calling conventions, which differ a lot depending on the architecture. For example, in x86 a function saves the frame pointer by pushing it onto the stack (RAM) and pops it when it is done. In x86-64, on the other hand, functions don't need to update the frame pointer at all, since all references are made via the stack pointer. Secondly, in x86 function arguments are passed on the stack, whereas in x86-64 registers are used for that purpose.
Maybe these differences between the calling conventions of x86 and x86-64 do not affect the way the program stack grows, as long as different conventions are not used at the same time, and this is mostly the case because 32-bit functions are called by other 32-bit functions, and likewise for 64-bit ones. But at some point a function (probably a system function) will call a function whose code is compiled for an x86-64 machine with some arguments. At this point, I am curious how the OS (or some other control unit) manages to make this work.
Thanks in advance.
Part of the way that the i386/x86-64 architecture is designed is that the CS and other segment registers refer to entries in the GDT. The GDT entries have a few special bits besides the base and limit that describe the operating mode and privilege level of the current running task.
If the CS register refers to a 32-bit code segment, the processor will run in what is essentially i386 compatibility mode. Likewise 64-bit code requires a 64-bit code segment.
So, putting this all together.
When the OS wants to run a 32-bit task, during the task switch into it, it loads a value into CS which refers to a 32-bit code segment. Interrupt handlers also have segment registers associated with them, so when a system call occurs or an interrupt occurs, the handler will switch back to the OS's 64-bit code segment, (allowing the 64-bit OS code to run correctly) and the OS then can do its work and continue scheduling new tasks.
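A minimal sketch, assuming GCC or Clang inline assembly on x86, that reads the CS selector the running process was given; the helper name read_cs is illustrative. The specific values are a Linux implementation detail rather than part of the answer above: current x86-64 kernels typically give 64-bit processes selector 0x33 and 32-bit (compatibility mode) processes 0x23, so printing the result from a normal build and from a -m32 build should show two different GDT entries. The low two bits of a user-mode selector are the privilege level, 3.

```c
/* Reads the current CS segment selector on x86.
 * Returns -1 on non-x86 targets, which have no CS register. */
int read_cs(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned short cs;
    __asm__ volatile ("mov %%cs, %0" : "=r"(cs));
    return cs;
#else
    return -1;
#endif
}
```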
As a follow-up regarding calling conventions: neither i386 nor x86-64 requires the use of frame pointers. The code is free to do as it pleases; in fact, many compilers (gcc, clang, VS) offer the ability to compile 32-bit code without frame pointers. What is important is that the calling convention is implemented consistently. If all the code expects arguments to be passed on the stack, that's fine, but the called code had better agree with that. Likewise, passing via registers is fine too; everyone just has to agree (at least at the library interface level; internal functions can generally do as they please).
Beyond that, just keep in mind that the difference between the two isn't really an issue because every process gets its own private view of memory. A side consequence though is that 32-bit apps can't load 64-bit dlls, and 64-bit apps can't load 32-bit dlls, because a process either has a 32-bit code segment or a 64-bit code segment. It can't be both.
The processor is put into compatibility mode (a sub-mode of the 64-bit long mode), which requires the code executing at that time to be 32-bit code. This switching is handled by the OS.
Windows: it uses WoW64, which is responsible for switching the processor mode and also provides the compatible DLLs and registry functions.
Linux: like Windows, Linux switches the processor into compatibility mode whenever it runs 32-bit code. You need the 32-bit glibc libraries installed, and 32-bit code cannot be mixed with 64-bit code in the same process. Linux has also added the x32 ABI, which lets programs recompiled for it use x86-64 features such as the larger number of registers while keeping 32-bit pointers. See this article on the x32 ABI.
PS: I am not very certain of the details, but this should give you a start.
Also, this answer combined with Evan Teran's answer probably gives a rough picture of everything that is happening.

Linking 32-bit library to 64-bit program

I have a 32-bit .so binary-only library and I have to generate 64-bit program that uses it.
Is there a way to wrap or convert it, so it can be used with 64-bit program?
No. You can't directly link to 32bit code inside of a 64bit program.
The best option is to compile a 32-bit (standalone) program that can run on your 64-bit platform (using ia32), and then use a form of inter-process communication to communicate with it from your 64-bit program.
For an example of using IPC to run 32-bit plugins from 64-bit code, look at the open source NSPluginWrapper.
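A minimal sketch of the caller side of that design, assuming a simple request/response protocol over pipes. To keep the example self-contained, the 32-bit helper is replaced by a forked child and the library call by a stand-in function (library_add, call_helper and the message structs are all hypothetical); in practice the child would exec a separately built 32-bit program that links the library. The wire format uses only fixed-width int32_t fields, because 32-bit and 64-bit processes disagree on the size of long and of pointers.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wire format: fixed-width fields only, same layout in both ABIs. */
struct request  { int32_t a, b; };
struct response { int32_t result; };

/* Stand-in for the function exported by the 32-bit-only library. */
static int32_t library_add(int32_t a, int32_t b) { return a + b; }

/* Helper side: answer one request. A real helper would loop until EOF. */
static void serve_one(int in_fd, int out_fd) {
    struct request req;
    if (read(in_fd, &req, sizeof req) == (ssize_t)sizeof req) {
        struct response resp = { library_add(req.a, req.b) };
        ssize_t w = write(out_fd, &resp, sizeof resp);
        (void)w;  /* on failure the caller simply sees a short read */
    }
}

/* Caller side: start the helper, send one request, read one response.
 * Returns 0 and stores the result in *out on success, -1 on failure. */
int call_helper(int32_t a, int32_t b, int32_t *out) {
    int to_helper[2], from_helper[2];
    if (pipe(to_helper) == -1) return -1;
    if (pipe(from_helper) == -1) {
        close(to_helper[0]);
        close(to_helper[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(to_helper[0]); close(to_helper[1]);
        close(from_helper[0]); close(from_helper[1]);
        return -1;
    }
    if (pid == 0) {
        /* Real design: execv() the 32-bit helper with these pipes as stdin/stdout. */
        close(to_helper[1]);
        close(from_helper[0]);
        serve_one(to_helper[0], from_helper[1]);
        _exit(0);
    }

    close(to_helper[0]);
    close(from_helper[1]);
    struct request req = { a, b };
    struct response resp;
    int ok = write(to_helper[1], &req, sizeof req) == (ssize_t)sizeof req &&
             read(from_helper[0], &resp, sizeof resp) == (ssize_t)sizeof resp;
    close(to_helper[1]);
    close(from_helper[0]);
    waitpid(pid, NULL, 0);
    if (!ok) return -1;
    *out = resp.result;
    return 0;
}
```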
It is possible, but not without some serious magic behind the scenes and you will not like the answer. Either emulate a 32 bit CPU (no I am not kidding) or switch the main process back to 32 bit. Emulating may be slow though.
This is a proof of concept of the technique.
Then keep a table of every memory access to and from the 32-bit library and keep them in sync. It is very hard to achieve theoretical completeness, but something workable should be fairly easy, if very tedious.
In most cases, I believe two processes with IPC between them may actually be easiest, as suggested in the other answers.

64-bit linux, Assembly Language, Issues?

I'm currently in the process of learning assembly language.
I'm using Gas on Linux Mint (32-bit). Using this book:
Programming from the Ground Up.
The machine I'm using has an AMD Turion 64 bit processor, but I'm limited to 2 GB of RAM.
I'm thinking of upgrading my Linux installation to the 64-bit version of Linux Mint, but I'm worried that because the book is targeted at 32-bit x86 architecture that the code examples won't work.
So two questions:
Is there likely to be any problems with the code samples?
Has anyone here noticed any benefits in general in using 64-bit Linux over 32-bit (I've seen some threads on Stack Overflow about this but they are mostly related to Windows Vista vs. Windows XP.)
Your code examples should all still work. 64-bit processors and operating systems can still run 32-bit code in a sort of "compatibility mode", and your assembly examples are no different. You may have to pass an extra option or two to the assembler and linker, but that's all.
In general, a 64-bit OS is often faster than a 32-bit OS for code built as 64-bit, because x86_64 has more registers than i386 (16 general-purpose registers instead of 8). Since you're working in assembly, you already know what registers are used for. Having more of them means less data has to be moved on and off the stack (and other temporary memory), so your program spends less time managing data and more time working on it.
Edit: To assemble 32-bit code on 64-bit Linux using gas, you just pass the command-line argument --32, as noted in the GAS manual, and link with ld -m elf_i386.
Even if you run 64-bit Linux, it is possible to compile and run 32-bit binaries on it. I don't know how good Mint's support for that is, so you should check.
64-bit assembly, however, is not fully compatible with 32-bit: for example, you have different (and more) registers, and some instructions are available on only one of the two platforms.
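One concrete example of that incompatibility, relevant to Programming from the Ground Up: its 32-bit examples make system calls with int $0x80 and the i386 call numbers (write is 4, arguments in ebx/ecx/edx), while native x86-64 code uses the syscall instruction and a different table (write is 1, arguments in rdi/rsi/rdx). The sketch below, assuming GCC-style inline assembly, issues the same write both ways depending on the target; the function name raw_write is illustrative, and other architectures fall back to the libc write() wrapper.

```c
#include <stddef.h>
#include <unistd.h>

/* Issues write(fd, buf, len) through the system call interface of the
 * architecture this file is compiled for. On x86 it returns the byte count
 * or a negative errno value (raw syscalls do not set errno); the libc
 * fallback returns -1 and sets errno instead. */
long raw_write(int fd, const void *buf, size_t len) {
    long ret;
#if defined(__x86_64__)
    /* x86-64 ABI: number in rax (write = 1), args in rdi, rsi, rdx;
     * the syscall instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
#elif defined(__i386__)
    /* i386 ABI, as used in the book: number in eax (write = 4),
     * args in ebx, ecx, edx, entered via int $0x80. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "a"(4L), "b"(fd), "c"(buf), "d"(len)
                      : "memory");
#else
    ret = write(fd, buf, len);
#endif
    return ret;
}
```

Copying the book's int $0x80 sequences into 64-bit code usually still reaches the kernel's 32-bit system call table (when IA-32 emulation is enabled), which only sees the low 32 bits of each register, so calls that pass stack addresses tend to fail rather than work.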
I would say the move to 64bit is not a big deal. You can still write 32bit assembly and then perhaps try to get it also running as 64bit (shouldn't be too hard), as a source of even more programming/learning fun.
Usually 32 bits is plenty, so only use 64 bits or more if you really NEED IT. Best to decide before you start programming whether you want a 32-bit app or a 64-bit app and then stick to it, as mixed-mode debugging can get tricky fast.
