ARM assembly "retne" instruction


I am currently in the process of understanding what it takes for the Linux kernel to boot. I was browsing through the Linux kernel source tree, in particular for the ARM architecture, until I stumbled upon this assembly instruction, retne lr, in arch/arm/kernel/hyp-stub.S.
Conceptually, it's easily understood that the instruction is supposed to return to the address stored in the link register if the Z flag is 0. What I am looking for is where this ARM assembly instruction is actually documented.
I searched in the ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition section A8.8 and could not find the description of the instruction.
Grepping the sources and seeing if it was an ARM specific GNU AS extension did not turn up anything in particular.
A Google search with the queries "arm assembly ret instruction", "arm return instruction" and anything along similar lines did not turn up anything useful either. Surely I must be looking in the wrong places or I must be missing something.
Any clarification will be much appreciated.

The architectural assembly language is one thing, real world code is another. Once assembler pseudo-ops and macros come into play, a familiarity with both the toolchain and the codebase in question helps a lot. Linux is particularly nasty as much of the assembly source contains multiple layers of both assembler macros and CPP macros. If you know what to look for, and follow the header trail to arch/arm/include/asm/assembler.h, you eventually find this complicated beast:
    .irp    c,,eq,ne,cs,cc,mi,pl,vs,vc,hi,ls,ge,lt,gt,le,hs,lo
    .macro  ret\c, reg
#if __LINUX_ARM_ARCH__ < 6
    mov\c   pc, \reg
#else
    .ifeqs  "\reg", "lr"
    bx\c    \reg
    .else
    mov\c   pc, \reg
    .endif
#endif
    .endm
    .endr
The purpose of this is to emit the architecturally-preferred return instruction for the benefit of microarchitectures with a return stack, whilst allowing the same code to still compile for older architectures.
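To make the expansion concrete, here is a small C model of the decision the macro makes. It is only an illustration of the logic, not part of the kernel or the assembler; the function name expand_ret and its parameters are made up for this sketch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Illustrative model of the ret\c macro (expand_ret is a hypothetical name).
 * cond is the condition suffix ("" for unconditional), reg is the register
 * name, arch stands for the value of __LINUX_ARM_ARCH__. The instruction the
 * macro would emit is written to out; the snprintf() result is returned.
 */
int expand_ret(const char *cond, const char *reg, int arch,
               char *out, size_t outlen)
{
    /* ARMv6 and later, returning through lr: emit bx. */
    if (arch >= 6 && strcmp(reg, "lr") == 0)
        return snprintf(out, outlen, "bx%s %s", cond, reg);
    /* Older architectures, or any other register: plain move to pc. */
    return snprintf(out, outlen, "mov%s pc, %s", cond, reg);
}
```

Under this model, retne lr in a build for ARMv7 corresponds to bxne lr, while the same source line in a build for ARMv5 corresponds to movne pc, lr.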

Related

Why does the Linux kernel have .S and .c files, and how do they work together?

As seen in this link
https://elixir.bootlin.com/linux/v4.0/source/arch/x86/kernel/entry_64.S#
And also in this image of the kernel source code for x86.

Some parts of the Linux kernel, especially the bootloader, use assembly language. The best way to know how they work together is to debug the code and follow the startup step by step. Here is a helpful blog. There is also another blog that can give you some knowledge about how they work.

Convert object file to another architecture

I am trying to use a Wi-Fi dongle with a Raspberry Pi. The vendor of the dongle provides a Linux driver that I can compile successfully on the ARM architecture. However, one object file that comes with the driver was precompiled for an x86 architecture, which causes the linker to fail.
I know it would be much easier to compile that (quite big) file again, but I don't have access to the source code.
Is it possible to convert that object file from a x86-architecture to an ARM-architecture?
Thank you!

Um, no, it looks to me like a waste of time. A Wi-Fi driver is complex, and you say this one troublesome object file is 'large'. It would be a lot of pain to translate, and the chance of successfully debugging the result is slim to none. Also, any parameter passing between this one object file and the rest of the system would not translate directly between x86 and ARM.

In theory, yes. Doing it on a real kernel driver without access to source code will be difficult.
If you had a high quality disassembly of the object file, and the code in the object file is "well behaved" (using standard calling conventions, no self-modifying code), then you could automatically translate the x86 instructions into ARM instructions. However, you probably don't have a high quality disassembly. In particular, there can be portions of the object file that you will not be able to properly classify as code or data using normal recursive descent disassembly. If you misinterpret data as code, it will be translated to ARM code rather than copied as is, and so will have the wrong values. That will likely cause the code to not work correctly.
Even if you get lucky, and can properly classify all of the addresses in the object file, there are several issues that will trip you up:
1. The calling conventions on x86 are different from the calling conventions on ARM. This means you will have to identify patterns related to x86 calling conventions and change them to use ARM calling conventions. This is a non-trivial rewrite.
2. The hardware interface on ARM is different from that on x86. You will have to understand how the driver works in order to translate the code. That would require either a substantial x86 hardware compatibility layer, or reverse engineering of how the driver works. If you can reverse engineer the driver, then you don't need to translate it. You could just write an ARM version.
3. The internal kernel APIs are different between ARM and x86. You will have to understand those differences and how to translate between them. That's likely non-trivial.
4. The Linux kernel uses an "alternatives" mechanism, which rewrites machine code dynamically when code is first loaded into the kernel. For example, on uniprocessor machines, locks are often replaced with no-ops to improve performance. Instructions like "popcnt" are replaced with function calls on machines that don't support them, etc. Its use in the kernel is extremely common. This means there's a good chance the code in the object file is not "well behaved", according to the definition given above. You would have to either verify that the object file doesn't use that mechanism, or find a way to translate uses of it.
5. x86 uses a different memory model than ARM does. To "safely" translate x86 code to ARM (without introducing race conditions) you would have to introduce memory fences after every memory access. That would result in REALLY BAD performance on an ARM chip. Figuring out when you need to introduce memory fences (without doing it everywhere) is an EXTREMELY hard problem. The most successful attempts at that sort of analysis require custom type systems, which you won't have in the object file.
Your best bet (quickest route to success) would be to try and reverse engineer what the object file in question does, and then just replace it.
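To illustrate the memory-model point: x86 keeps one CPU's stores visible to other CPUs in program order, whereas ARM is allowed to reorder them unless a barrier or an ordered instruction says otherwise. In source code that ordering requirement is written down explicitly, as in the minimal C11 sketch below (publish and try_consume are names invented for this example). In x86 machine code the release store is typically just an ordinary store, so a translator looking only at the object file cannot tell which stores carried an ordering requirement.

```c
#include <stdatomic.h>

static int payload;        /* ordinary data handed from producer to consumer */
static atomic_int ready;   /* flag that publishes the payload */

/* Producer: write the data, then set the flag with release ordering. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: returns 1 and fills *out once the flag has been observed. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return 0;
    *out = payload;   /* safe: the acquire load pairs with the release store */
    return 1;
}
```

When this is compiled for ARM, the compiler emits the barriers or ordered load/store instructions that the release/acquire pair requires. A binary translator has to guess where they are needed, or put them everywhere.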

There is no reasonable way of doing this. Contact the manufacturer and ask if they can provide the relevant code in ARM code, as x86 is useless to you. If they are not able to do that, you'll have to find a different supplier of either the hardware [that has an ARM version, or fully open source, of all the components], or supplier of the software [assuming there is another source of that].

You could translate the x86 assembly manually by installing x86 GNU binutils and disassembling the object file with objdump. Some addresses will probably differ, but it should be straightforward.

Yes, you could most definitely do a static binary translation. x86 disassembly is painful though; if this was compiled from a high-level language then it isn't as bad as it could be.
Is it really worth the effort? Might try an instruction set simulator instead. Have you done an analysis of the number of instructions used? System calls required, etc?
How far have you gotten so far on the disassembly?

Maybe the file only contains a binary dump of the wifi firmware? If so you need no instruction translation and a conversion can be done using objcopy.
You can use objdump -x file.o and check whether there is any real executable code inside the object file or whether it's only data.
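Alongside objdump -x, the target architecture can be read straight from the file header. This is a minimal C sketch that assumes the object file is in ELF format; elf_machine and elf_machine_name are hypothetical helper names, and the numbers are e_machine codes from the ELF specification.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns the e_machine field of an ELF header held in buf, or -1 if buf
 * does not look like an ELF header. e_machine is a 16-bit value at byte
 * offset 18 in both 32-bit and 64-bit ELF files.
 */
int elf_machine(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < 20 || memcmp(buf, magic, 4) != 0)
        return -1;
    /* e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian. */
    if (buf[5] == 1)
        return buf[18] | (buf[19] << 8);
    if (buf[5] == 2)
        return (buf[18] << 8) | buf[19];
    return -1;
}

/* Maps a few common e_machine codes to readable names. */
const char *elf_machine_name(int machine)
{
    switch (machine) {
    case 3:   return "x86 (i386)";
    case 8:   return "MIPS";
    case 40:  return "ARM";
    case 62:  return "x86-64";
    case 183: return "AArch64";
    default:  return "other";
    }
}
```

To use it, read the first 20 bytes of file.o with fread() and pass them to elf_machine(). A result of 3 or 62 confirms an x86 object, which says nothing yet about whether its sections hold code or only data.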

If you have access to IDA with Hex-Rays decompiler, you can (with some work) decompile the object file into C code and then try to recompile it for ARM.

What does asm stand for in linux/include/asm

I read this in howto_add_systemcall:
"In general, header files for machine architecture independent system
calls and functions are kept under linux/include/linux/ and machine
architecture dependent ones are kept in linux/include/asm/"
So what does asm stand for here?
I've searched the wiki, but have not found the answer.

I guess it initially stood for Architecture Specific Macros (asm).
After that, any architecture-specific stuff was placed there.

"asm" stands for "assembler" or "assembly language".

Assembly code for handling system calls on the x86 architecture is located at
arch/x86/kernel/entry_32.S (or _64.S)

Producing executables within Linux (in relation to implementing a compiler)

For my university, final-year dissertation, I am going to implement a compiler for a skeletal form of the C programming language, then go about extending it until it resembles something a little more like Java with array bounds checking, type-checking and so forth.
I am relatively competent at much of the theory that relates to compiler construction, and have experience programming in MIPS assembly language, so I do understand a little of what it is to write extremely low-level code.
My main concern is that I am likely to be able to get all the way to the point where I need to produce the actual machine-code output, but then not understand enough about how machine code is executed from the perspective of the operating system running it.
So, my actual question is basically, "does anyone know the best place to read up about writing assembly to run on an Intel x86-64 processor under Linux?"
The main gap in my knowledge is how the machine code is actually run in practice. Is it run directly on the processor, making "syscall"s (or the x86 equivalent) when it needs services provided by the kernel, or is the assembly language somehow an encapsulated description that tells the kernel how to execute the instructions (in a manner similar to an interpreted language such as Java)?
Any help you can provide would be greatly appreciated.

This document explains how you can implement a foreign function interface to interact with other code: http://www.x86-64.org/documentation/abi.pdf

Firstly, for the machine code start here: http://www.intel.com/products/processor/manuals/
Next, I assume your question about how the machine code is run is really about how the OS loads the executable into memory and calls main()? These links may help:
Linkers and loaders:
http://www.linuxjournal.com/article/6463
ELF file format:
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format and
http://www.linuxjournal.com/article/1060
Your machine code will go into the .text section of the executable.
Finally, best of luck. Your project is similar to my final year project, except I targeted the JVM and compiled a subset of Visual Basic!
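On the "how is it actually run" part of the question: compiled machine code executes directly on the processor, and it only enters the kernel when it asks for a service through a system call. Here is a minimal sketch for Linux using the C library's syscall() wrapper; raw_write is a name invented for this example.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issues the write system call directly, without going through stdio. */
long raw_write(int fd, const char *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "hello\n", 6) writes the string to standard output. On x86-64 Linux the wrapper ends up placing the call number in rax and the arguments in rdi, rsi and rdx before executing the syscall instruction, which is the same sequence a compiler back end could emit itself.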

Bare metal cross compilers input

What are the input limitations of a bare metal cross compiler? For example, does it refuse to compile programs that use pointers or malloc, or anything else that would require more than the underlying hardware provides? Also, how can one find these limitations?
I also wanted to ask: I built a cross compiler for a MIPS target, and I need to create a MIPS executable using this cross compiler, but I am not able to find where the executable is. I found one executable, mipsel-linux-cpp, which is supposed to compile, assemble and link and then produce a.out, but it is not doing so.
However, ./cc1 does produce MIPS assembly.
There is an install folder which has a gcc executable that produces i386 assembly and then an executable. I don't understand how the gcc executable can produce i386 and not MIPS assembly when I have specified the target as MIPS.
Please help, I'm really not able to understand what is happ...
I followed these steps:
1. Installed binutils 2.19
2. configured gcc for mips..(g++,core)

I would suggest that you should have started two separate questions.
The GNU toolchain does not have any OS dependencies, but the GNU library does. Most bare-metal cross builds of GCC use the Newlib C library, which provides a set of syscall stubs that you must map to your target yourself. These stubs include the low-level calls necessary to implement stream I/O and heap management. They can be very simple or very complex depending on your needs. If the only I/O support is a UART used for stdin/stdout/stderr, then it is simple. You don't have to implement everything, but if you do not implement the I/O stubs, you won't be able to use printf(), for example. You must implement the sbrk()/sbrk_r() syscall if you want malloc() to work.
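As an illustration of how small such a stub can be, here is a minimal sketch of an sbrk-style stub. It assumes a fixed static array as the heap; on a real target the heap bounds normally come from symbols defined in the linker script, whose names vary by board. The exact prototype also differs between Newlib ports, so treat the signature below as an assumption and check your port's headers.

```c
#include <errno.h>
#include <stddef.h>

/* Assumption: a 4 KiB static arena stands in for the linker-defined heap. */
#define HEAP_SIZE 4096
static char heap[HEAP_SIZE];
static char *heap_end = heap;

/*
 * Grows (or shrinks) the heap by incr bytes and returns the previous end
 * of the heap. Returns (void *)-1 and sets errno when the request does
 * not fit inside the arena.
 */
void *_sbrk(ptrdiff_t incr)
{
    char *prev = heap_end;

    if (incr > (heap + HEAP_SIZE) - heap_end || incr < heap - heap_end) {
        errno = ENOMEM;
        return (void *)-1;
    }
    heap_end += incr;
    return prev;
}
```

With a stub like this linked in, Newlib's malloc() can obtain memory by calling it. The I/O stubs (for example the write stub that sends characters to a UART) follow the same pattern of a small function mapping a library request onto your hardware.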
The GNU C++ library will work correctly with Newlib as its underlying library. If you use C++, the C runtime start-up (usually crt0.s) must include the static initialiser loop to invoke the constructors of any static objects that your code may include. The run-time start-up must also of course initialise the processor, clocks, SDRAM controller, timers, MMU etc; that is your responsibility, not the compiler's.
I have no experience of MIPS targets, but the principles are the same for all processors. There is a very useful article called "Building Bare Metal ARM with GNU" which you may find helpful; much of it will be relevant, especially the parts about porting and implementing the Newlib stubs.
Regarding your other question, if your compiler is called mipsel-linux-cpp, then it is not a 'bare-metal' build but rather a Linux build. Also this executable does not really "compile, assemble and link"; it is rather a driver that separately calls the pre-processor, compiler, assembler and linker. It has to be configured correctly to invoke the cross-tools rather than the host tools. I generally invoke the linker separately in order to enforce decisions about which standard library to link (-nostdlib), and also because it makes more sense when an application is comprised of multiple execution units. I cannot offer much help other than that here, since I have always used GNU-ARM tools built by people with obviously more patience than me, and moreover hosted on Windows, where there is less possibility of the host tool-chain being invoked instead (one reason why I have also avoided those tool-chains that rely on Cygwin).

EDIT
With more time available, I have rewritten my original answer in an attempt to provide something more useful.
I cannot provide a specific answer for your question. I have never tried to get code running on a MIPS machine. What I do have is plenty of experience getting a variety of "bare metal" boards up and running. All kinds of CPUs and all kinds of compilers and cross compilers. So I have an understanding of the principles that apply in all such situations. I will point out the kind of knowledge you will need to absorb before you can hope to succeed with a job like this, and hopefully I can list some links to resources to get you started on learning that knowledge.
I am worried you don't know that pointers are exactly the kind of thing a bare metal compiler can handle, they are a basic machine primitive. This tells me you are probably not an expert embedded developer who is just stuck in this particular scenario. Never mind. There isn't anything magic about programming an embedded system, and you can learn what you need to know.
The first step is getting to understand the relationship between C and the machine you wish to run code on. Basically C is a portable assembly language. This means that C is good for manipulating the basic operations of the machine. In this sense the basic operations of the machine are reading and writing memory locations, performing arithmetic and boolean operations on the data read from memory, and making branching and looping decisions based on that data. In particular the C concept of pointers allows you to manipulate data at locations in memory that you specify.
So far so good, but just doing raw computations in memory is not usually enough - you need a way to input and output data from memory. To do that you need to manipulate the hardware peripherals on your board. If the hardware peripherals are memory mapped then the machine registers used to control the peripherals look exactly like memory locations and C can manipulate them directly. Even in that case though, it is much more likely that doing useful I/O is best handled by extending the C core language with a library of routines provided just for that purpose. These library routines handle all the nasty details (timers, interrupts, non-memory mapped I/O) involved in manipulating the peripheral hardware on the board, and wrap them up with a convenient C function call interface. The idea is that you can simply write printf("hello world"); and the library call takes care of the details of displaying the string.
An appropriately skilled developer knows how to adapt an existing I/O library to a new board, or how to develop new library routines to provide access to non-standard custom hardware. The classic way to develop these skills is to start with something simple, usually an LED for an output device, and a switch for an input device. Write a program that pulses an LED in a predictable way, or reads a switch and reflects it on an LED. The first time you get this working will be hugely satisfying.
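A minimal sketch of the switch-and-LED exercise with memory-mapped registers might look like the following. The register addresses and the function names are hypothetical; the real addresses and pin numbers come from your board's reference manual.

```c
#include <stdint.h>

/* Hypothetical register addresses, for illustration only. */
#define GPIO_IN  ((volatile uint32_t *)0x40000000u)
#define GPIO_OUT ((volatile uint32_t *)0x40000004u)

/* Returns 1 if the given input pin reads high, otherwise 0. */
int pin_read(const volatile uint32_t *in_reg, unsigned pin)
{
    return (int)((*in_reg >> pin) & 1u);
}

/* Drives one output pin high or low without disturbing the other pins. */
void pin_write(volatile uint32_t *out_reg, unsigned pin, int level)
{
    if (level)
        *out_reg |= (UINT32_C(1) << pin);
    else
        *out_reg &= ~(UINT32_C(1) << pin);
}

/* Reflects the state of a switch on an LED. */
void mirror_switch(const volatile uint32_t *in_reg, unsigned sw_pin,
                   volatile uint32_t *out_reg, unsigned led_pin)
{
    pin_write(out_reg, led_pin, pin_read(in_reg, sw_pin));
}
```

On the board, the main loop would be something like for (;;) mirror_switch(GPIO_IN, 0, GPIO_OUT, 5); with the pin numbers chosen to match the wiring. The volatile qualifier is what stops the compiler from optimising away or caching the register accesses. Passing the register pointers as parameters also lets you exercise the logic on a PC with ordinary variables before trying real hardware.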
Okay, I have rambled enough. It is time to provide some more resources for you to study. The good news is that there's never been a better time to learn how things work at the interface between hardware and software. There is a wealth of freely available code and docs. Stack Overflow is a great resource, as you know. Good luck! Links follow:
Embedded systems overview
Knowing the C language well is fundamental
Why not get your code working on a simulator before you try real hardware
Another emulated environment
Linux device drivers - an overlapping subject
Another book about bare metal programming
