Is QEMU good for learning programming in assembler for ARM and PowerPC?

I want to learn assembler programming for PowerPC and ARM, but I'm unable to buy real hardware for this purpose. I'm thinking about using QEMU for that. However, I'm not sure whether it emulates both architectures well enough that I can compile and run my programs in native assembler on it.

QEMU works well for testing program correctness (i.e. whether the code would run properly on an actual ARM or PowerPC) but it is not good for testing program efficiency: the emulation is not cycle-accurate, and speed measured with QEMU cannot be reliably (or even unreliably) correlated with speed on real hardware.
Also, QEMU will not trap unaligned memory accesses. This is not a problem for PowerPC emulation (the PowerPC tolerates unaligned accesses) but may be one for ARM: an unaligned access, e.g. reading a 32-bit word from an address which is not a multiple of 4, will work fine with QEMU but would trigger an exception on a real ARM processor.
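As an aside, on real ARM hardware running Linux you can inspect and change how the kernel handles unaligned accesses through /proc/cpu/alignment; a minimal sketch, assuming the kernel was built to expose this file:
$ cat /proc/cpu/alignment        # show the current policy and fault counters
$ echo 5 > /proc/cpu/alignment   # as root: warn (bit 0) plus deliver SIGBUS (bit 2) on unaligned access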
Apart from these points, QEMU is fine for assembly development on ARM or MIPS (I haven't tried PowerPC, because I found an old iBook on eBay for that; but I have done ARM and MIPS assembly with QEMU and then run the resulting code on real hardware, and this worked). You can either emulate a whole system and run Debian in it (in which case the compiler, linker, text editor... also run in emulation), or use "user-mode emulation", where the ARM/MIPS executable runs directly, with a wrapper which converts its system calls into system calls for the host PC (this assumes the host is a PC running Linux). The latter is more convenient (you have access to your normal home directory, the programming tools run natively...) but requires installing cross-development tools. See buildroot for that (and link with -static; this will avoid many headaches).
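For instance, a typical user-mode-emulation workflow looks like the following sketch (the arm-linux-gnueabihf- toolchain prefix is an assumption; it depends on how your cross tools were built):
$ arm-linux-gnueabihf-as -o hello.o hello.s          # assemble the ARM source on the x86 host
$ arm-linux-gnueabihf-ld -static -o hello hello.o    # static link, so no ARM shared libraries are needed
$ qemu-arm ./hello                                   # run the ARM binary directly on the host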

Since I have found signs that Debian for PowerPC and for ARM can run on QEMU, I suppose this won't be a problem.

Related

Can we convert elf from a cpu architecture to another, in linux? [duplicate]

How can I run x86 binaries (for example, an .exe file) on ARM? As I read on Wikipedia, I need to convert the binary data for the emulated platform into binary data suitable for execution on the targeted platform. But the question is: how can I do that? Do I need to open the file in a hex editor and make changes, or something else?
To do this successfully, you'd have to do two things: one relatively easy, one very hard. Neither of which you want to do by hand in a hex editor.
Convert the machine code from x86 to ARM. This is the easy one, because you should be able to map each x86 opcode to one or more ARM opcodes. There are different ways to do this, some more efficient than others, but it can be done with a pretty straightforward mapping.
Remap function calls (and other jumps). This one is hard, because monkeying with the opcodes is going to change all the offsets for the jump and return points. If you have dynamically linked libraries (.so), and we assume that all the libraries are available at exactly the same version in both places (a sketchy assumption at best), you'd have to remap the loads.
It's essentially a machine->machine compiler and linker.
So, can you do it? Sure.
Is it easy? No.
There may be a commercial tool out there, but I'm not aware of it.
You cannot do this with a binary (see note 1); binary here means an object with no symbol information, unlike an ELF file. Even with an ELF file, this ranges from difficult to impossible. The issue is distinguishing code from data. If you could resolve this issue, you could also build decompilers and other tools.
Even if you have an ELF file, the compiler will have inserted constants used by the code into the text segment. You have to look at many opcodes and do reverse basic-block analysis to figure out where a function starts and ends.
A better mechanism is to emulate the x86 on the ARM. Here you can use JIT technology to do the translation as it is encountered, though you approximately double the code space. Also, the code will execute poorly: the ARM has 16 registers while the x86 is register-starved (though it usually has hidden registers), and a big part of a compiler's job is to allocate those registers. QEMU is one technology that does this; I am unsure whether it goes in the x86-to-ARM direction, and it will have a tough job, as noted.
Note 1: The x86 has asymmetric opcode sizes. In order to recognize a function prologue and epilogue, you would have to scan an image multiple times. To do this, I think the problem would be something like O(n!), where n is the number of bytes in the image, and then you might have trouble with inline assembler and library routines coded in assembler. It may be possible, but it is extremely hard.
To run an ARM executable on an x86 machine, all you need is qemu-user.
Example:
you have busybox compiled for the AArch64 architecture (ARM64) and you want to run it on an x86_64 Linux system:
Assuming a static compile, this runs ARM64 code on an x86 system:
$ qemu-aarch64-static ./busybox
And this runs x86 code on an ARM system:
$ qemu-x86_64-static ./busybox
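Many distributions also register the qemu-user interpreters with the kernel's binfmt_misc mechanism, so foreign binaries run transparently; a sketch of how to check (the handler name is typical but distro-specific):
$ ls /proc/sys/fs/binfmt_misc/                  # list the registered binary formats
$ cat /proc/sys/fs/binfmt_misc/qemu-aarch64     # show the interpreter used for ARM64 ELF files
$ ./busybox ls                                  # with binfmt_misc set up, the foreign binary "just runs"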
What I am curious about is whether there is a way to embed both in a single program.
Read the x86 binary file as UTF-8, then copy from the ELF header to the last character. Then go to the ARM binary and delete its content as you paste in the x86 content. Then copy the x86 content from the clipboard to the head. I tried it and it's working.

ARMv8 - Running legacy 32 bit Applications on 64 bit OS

Going through the ARMv8 manual, I have the following questions to help understand the big picture.
Can a legacy 32-bit application (ARMv7 or earlier) run as-is on an ARMv8 OS?
If legacy applications need to be rebuilt for ARMv8, and assuming I rebuild the application as 32-bit (AArch32), does this require underlying 32-bit OS support? (It would be interesting to know how the addressing mechanism works here.)
Please provide references wherever possible.
PS: I am targeting Linux with AArch64 support (kernel 3.7 and later).
An AArch64 platform may run 32-bit ARM code, but this compatibility is optional.
To run AArch32 binaries you need all the libraries the application would use in their 32-bit versions, the same as with i686 binaries on x86-64 systems.
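On a Debian-family system, for example, the multiarch mechanism can supply those 32-bit libraries; a sketch (package names are an assumption and vary by distribution):
$ sudo dpkg --add-architecture armhf    # enable 32-bit ARM packages alongside arm64
$ sudo apt-get update
$ sudo apt-get install libc6:armhf      # 32-bit C library for AArch32 binaries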
There is also the Linux arm64 CONFIG_COMPAT option at https://github.com/torvalds/linux/blob/v4.17/arch/arm64/Kconfig#L1274, which says:
This option enables support for a 32-bit EL0 running under a 64-bit
kernel at EL1. AArch32-specific components such as system calls,
the user helper functions, VFP support and the ptrace interface are
handled appropriately by the kernel.
which will likely be required. An ARM employee mentioned in this thread: https://community.arm.com/processors/f/discussions/5535/running-armv7-binaries-on-armv8 that userland instructions are basically the same, with some exceptions:
For something like a Linux application, then yes. ARMv8-A includes AArch32, which provides backwards compatibility with ARMv7-A. There are some limitations, such as the SWP instruction no longer being supported. But these are types of things that applications are unlikely to be using (and were deprecated in ARMv7).
For baremetal, you have all the usual problems of using a binary from one platform on another. So you are going to need to do some degree of porting in most cases.
I then tried it for myself with this QEMU full-system setup, but my attempt failed: I compiled a C hello world with the ARMv7 compiler as:
arm-linux-gcc -static hello_world.c
and put the built file onto the AArch64 target, but when I tried to run it, it failed with:
a.out: line 1: syntax error: unexpected word (expecting ")")
even though /proc/config.gz says that CONFIG_COMPAT is set.
It seems that the Linux kernel is not identifying it as an ELF file but rather falling back to /bin/sh; I get the same error if I do:
sh /mnt/9p/a.out
so the kernel is trying to use the shell binfmt instead of ELF.
In particular, I know that the Linux kernel can choose between architectures from the binfmt signature, because qemu-user relies on that: https://unix.stackexchange.com/questions/41889/how-can-i-chroot-into-a-filesystem-with-a-different-architechture
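When debugging a failure like this, a few standard checks narrow things down; a sketch, reusing the a.out from the attempt above:
$ file a.out                                  # should report "ELF 32-bit LSB executable, ARM"
$ readelf -h a.out | grep Machine             # the ELF machine field should be ARM, not AArch64
$ zcat /proc/config.gz | grep CONFIG_COMPAT   # confirm the running kernel supports 32-bit EL0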

Linux kernel assembly and logic

My question is somewhat weird, but I will do my best to explain.
Looking at the languages the Linux kernel is written in, I see C and assembly, even though I read a text that said [quote] the second iteration of Unix was written completely in C [/quote].
I thought that was misleading, but when I said that the kernel has assembly code, I got two questions right from the start:
What assembly files are in the kernel, and what is their use?
Assembly is architecture-dependent, so how can Linux be installed on more than one CPU architecture?
And if the Linux kernel is truly written completely in C, then how can it get the GCC it needs for compiling?
I did a complete find / -name '*.s'
and got just one assembly file (asm-offsets.s) somewhere under /usr/src/linux-headers-`uname -r`/.
Somehow I don't think that one file is what keeps GCC working, so how can Linux work without assembly? Or, if it does use assembly, where is it, and how can the kernel be stable when that code depends on the architecture?
Thanks in advance
1. Why is assembly used?
Because there are certain things that can be done only in assembly, and because assembly results in faster code. For example, "you can get access to unusual programming modes of your processor (e.g. 16 bit mode to interface startup, firmware, or legacy code on Intel PCs)".
Read here for more reasons.
2. Which assembly files are used?
From: https://www.kernel.org/doc/Documentation/arm/README
"The initial entry into the kernel is via head.S, which uses machine
independent code. The machine is selected by the value of 'r1' on
entry, which must be kept unique."
From https://www.ibm.com/developerworks/library/l-linuxboot/
"When the bzImage (for an i386 image) is invoked, you begin at ./arch/i386/boot/head.S in the start assembly routine (see Figure 3 for the major flow). This routine does some basic hardware setup and invokes the startup_32 routine in ./arch/i386/boot/compressed/head.S. This routine sets up a basic environment (stack, etc.) and clears the Block Started by Symbol (BSS). The kernel is then decompressed through a call to a C function called decompress_kernel (located in ./arch/i386/boot/compressed/misc.c). When the kernel is decompressed into memory, it is called. This is yet another startup_32 function, but this function is in ./arch/i386/kernel/head.S."
Apart from these assembly files, a lot of Linux kernel code uses inline assembly.
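A quick way to get a feel for how pervasive inline assembly is (a sketch, run from the top of a kernel source tree):
$ grep -rn "asm volatile" arch/x86/include/ | head    # inline assembly in the x86 headers
$ grep -rl "asm volatile" kernel/ mm/ | wc -l         # C source files in core code that use it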
3. Architecture dependence?
And you are right about assembly being architecture-dependent; that is why the Linux kernel code is ported to each architecture it supports.
Linux porting guide
List of supported arch
Things written mainly in assembly in Linux:
Boot code: boots up the machine and sets it up in a state in which it can start executing C code (e.g: on some processors you may need to manually initialize caches and TLBs, on x86 you have to switch to protected mode, ...)
Interrupts/Exceptions/Traps entry points/returns: there you need to do very processor-specific things, e.g: saving registers and reenabling interrupts, and eventually restoring registers and properly returning to user mode. Some exceptions may be handled entirely in assembly.
Instruction emulation: some CPU models may not support certain instructions, may not support unaligned data access, or may not have an FPU. An option is using emulation when getting the corresponding exception.
VDSO: the VDSO is a virtual library that the kernel maps into userspace. It allows e.g: selecting the optimal syscall sequence for the current CPU (on x86 use sysenter/syscall instead of int 0x80 if available), and implementing certain system calls without requiring a context switch (e.g: gettimeofday()).
Atomic operations and locks: Maybe in the future some of these could be written using C11's support for atomic operations.
Copying memory from/to user mode: Besides using an optimized copy, these check for out-of-bounds access.
Optimized routines: the kernel has optimized versions of some routines, e.g: crypto routines, memset, clear_page, csum_copy (checksum and copy IP data to another place in one pass), ...
Support for suspend/resume and other ACPI/EFI/firmware thingies
BPF JIT: newer kernels include a JIT compiler for BPF expressions (used for example by tcpdump, seccomp mode 2, ...)
...
To support different architectures, Linux has assembly code (re-)written for each architecture it supports (and sometimes, there are several implementations of some code for different platforms using the same CPU architecture). Just look at all the subdirectories under arch/
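For example, from the top of a kernel source tree (a sketch; the exact listing varies with the kernel version):
$ ls arch/                             # one subdirectory per supported architecture
$ find arch/arm -name '*.S' | wc -l    # count the assembly sources for ARM alone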
Assembly is needed for a couple of reasons.
There are many instructions that are needed for the operation of an operating system but that have no C equivalent, at least on most processors. A good example on Intel x86/64 processors is the iret instruction, which returns from hardware/software interrupts. These interrupts are key to handling hardware events (like a key press) and, on older processors, system calls from programs.
A computer does not start up in a state that is immediately ready for execution of C code. For an Intel example, when execution gets to the startup routine the processor may not be in 32-bit mode (or 64-bit mode), and the stack required by C also may not be ready. There are some other features present in some processors (like paging) which need to be turned on from assembly as well.
However, most of the Linux kernel is written in C, which interfaces with some platform specific C/assembly code through standardized interfaces. By separating the parts in this way, most of the logic of the Linux kernel can be shared between platforms. The build system simply compiles the platform independent and dependent parts together for specific platforms, which results in different executable kernel files for different platforms (and kernel configurations for that matter).
Assembly code in the kernel is generally used for low-level hardware interaction that can't be done directly from C. It is like a platform-specific foundation used by the higher-level parts of the kernel that are written in C.
The kernel source tree contains assembly code for a variety of systems. When you compile a kernel for a particular type of system (such as an x86 PC), only the appropriate assembly code for that platform is included in the build process.
Linux is not the second version of Unix (or Unix in general). It is Unix-compatible, but Unix and Linux have separate histories and, in terms of code base (of their kernels), are completely separate. Linus Torvalds' idea was to write an open-source Unix.
Some of the lower-level things, like some of the architecture-dependent parts of memory management, are done in assembly. The old (but still available) Linux kernel API for x86, int 0x80, is implemented in assembly. There are probably other places in the kernel implemented in assembly, but I don't know of any others.
When you compile the kernel, you select an architecture to target. Depending on the target, the right assembly files for that architecture are included in the build.
The reason you don't find anything is that you're searching the headers, not the sources. Download a tarball from kernel.org and search that.
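Note also that the kernel's assembly sources use the uppercase .S suffix (files that are run through the C preprocessor before assembling), so a lowercase search misses them; a sketch, run from an unpacked source tree:
$ find . -name '*.S' | head    # the real assembly sources (head.S, entry.S, ...)
$ find . -name '*.s' | head    # lowercase .s files are mostly generated at build time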

How to run 16 bit code on 32 bit Linux?

I have written a small 16-bit assembly program that writes some values to some memory locations. Is there a way I can test it under 32-bit protected-mode Linux?
qemu, dosbox, bochs
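For instance, if the program is laid out as a valid boot sector (512 bytes ending in the 0xAA55 signature), an emulator can boot it directly; a sketch, assuming NASM and a flat-binary build:
$ nasm -f bin prog.asm -o prog.img    # assemble to a raw 16-bit flat binary
$ qemu-system-i386 -fda prog.img      # boot it as a floppy image in QEMU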
Yes, 16-bit code is supported in user processes on Linux. The system call to do it is called vm86() (there's a man page, but there's not much in it). Naturally, it only works on x86 platforms, and only on 32-bit kernels.
If you want an example, the ELKS project has a complete tool for running ELKS 8086 binaries on Linux, which uses it:
https://github.com/lkundrak/dev86/tree/master/elksemu
Look for the run_elks() function. It's pretty straightforward.

64-bit linux, Assembly Language, Issues?

I'm currently in the process of learning assembly language.
I'm using Gas on Linux Mint (32-bit). Using this book:
Programming from the Ground Up.
The machine I'm using has an AMD Turion 64 bit processor, but I'm limited to 2 GB of RAM.
I'm thinking of upgrading my Linux installation to the 64-bit version of Linux Mint, but I'm worried that because the book is targeted at 32-bit x86 architecture that the code examples won't work.
So two questions:
Is there likely to be any problems with the code samples?
Has anyone here noticed any benefits in general in using 64-bit Linux over 32-bit (I've seen some threads on Stack Overflow about this but they are mostly related to Windows Vista vs. Windows XP.)
Your code examples should all still work. 64-bit processors and operating systems can still run 32-bit code in a sort of "compatibility mode". Your assembly examples are no different. You may have to provide an extra assembler directive or flag (such as .code32 in GAS), but that's all.
In general, using a 64-bit OS will be faster than using a 32-bit OS. x86_64 has more registers than i386. Since you're working in assembly, you already know what registers are used for... Having more of them means less data has to be moved on and off the stack (and other temporary memory), so your program spends less time managing data and more time working on it.
Edit: To assemble 32-bit code on 64-bit Linux using gas, you just pass the command-line argument --32, as noted in the GAS manual.
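Putting the whole 32-bit-on-64-bit build together might look like this sketch (the -m elf_i386 linker emulation is the usual companion to --32):
$ as --32 -o prog.o prog.s          # assemble as 32-bit x86
$ ld -m elf_i386 -o prog prog.o     # link as a 32-bit ELF executable
$ ./prog                            # runs in compatibility mode on the 64-bit kernel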
Even if you run 64-bit Linux, it is possible to compile and run 32-bit binaries on it. I don't know how good Mint's support for that is; I assume you should check.
64-bit assembler, however, is not fully compatible with 32-bit; for example, you have different (more) registers. There are also some instructions that exist on only one of the two platforms.
I would say the move to 64-bit is not a big deal. You can still write 32-bit assembly and then perhaps try to get it running as 64-bit too (shouldn't be too hard), as a source of even more programming/learning fun.
Usually 32 bits is plenty, so only use 64 bits if you really need it. It is best to decide before programming whether you want to write a 32-bit app or a 64-bit app, and then stick to that, as mixed-mode debugging can get tricky fast.

Resources