What does swi SYS_ERROR0 do in arm linux kernel? - linux

Below is the code for reset vector as defined arm linux (arch/arm/kernel/entry-armv.S)
vector_rst:
ARM( swi SYS_ERROR0 )
THUMB( svc #0 )
THUMB( nop )
b vector_und
What does the instruction swi SYS_ERROR0 do ? When I checked, I found SYS_ERROR0 in arch/arm/kernel/asm-offsets.c
DEFINE(SYS_ERROR0, 0x9f0000);
I was unable to find anything related to it on internet. Can someone explain what does this instruction do ? What is SYS_ERROR0 ?

I was unable to find anything related to it on internet. Can someone explain what does this instruction do ? What is SYS_ERROR0 ?
DEFINE(SYS_ERROR0, 0x9f0000);
The swi instruction is typically a call from user mode to system mode. Ie, User space to the Linux kernel. Lower numbers are standard Linux system call such as open(), sbrk(), etc.
If you look at uapi/asm/unistd.h in arch/arm you can see some defines like __ARM_NR_BASE which is __NR_SYSCALL_BASE+0x0f0000. This can be 0x9f0000 for OABI systems. Basically, these are secret system calls that are ARM specific kernel calls. For instance, __ARM_NR_get_tls is only used for libc thread management on the ARM. Other CPUs may have different non-syscall mechanisms to do the same thing and/or the syscall interface may be different than on the ARM CPU.
So SYS_ERROR0 is a special ARM system call. By the way, asm-offset.c is never used directly. It is compiled and the object is scanned by a script to get assembler offsets to structures, etc. So if a compiler packs structures differently, then in theory, the assembler will be in-sync with the compiler version. We start here,
.L__vectors_start:
W(b) vector_rst
W(b) vector_und
W(ldr) pc, .L__vectors_start + 0x1000
W(b) vector_pabt
W(b) vector_dabt
W(b) vector_addrexcptn
W(b) vector_irq
W(b) vector_fiq
The swi is a vector handled by W(ldr) pc, .L__vectors_start + 0x1000, so the code is 4k after the vector table. This is vector_swi and you can see the code in entry-common.S. There are two methods of making a syscall. The older one (OABI) encodes the call in the SWI instruction. This is bad as the ICACHE must be examined as data (DCACHE). The newer systems pass the syscall in r7. There are two jump tables; sys_call_table and sys_oabi_call_table to handle OABI and the newer mechanism. In both case, higher __NR_SYSCALL_BASE are special cased and use arm_syscall in traps.c. So SYS_ERROR0 is the case 0: /* branch through 0 */ code in traps.c. The message branch through zero is printed (because user space jump to the reset vector which can be at address 0) and the user space gets a signal.

Related

Arm EL0 Address Translation

I want to translate a virtual user level address (EL0) to a physical address.
Arm provides an instruction AT to translate a virtual address to a physical address. This instruction should be called by privileged level (EL1), so I write a linux kernel module to execute this instruction. I pass a user level address (EL0) virt_addr through ioctl() to kernel module, and execute instruction AT.
According to the instruction manual, I execute the instruction by this way:
// S1E0R: S1 means stage 1, E0 means EL0 and R means read.
asm volatile("AT S1E0R, %[_addr_]" :: [_addr_]"r"(virt_addr));
// Register PAR_EL1 stores the translation result.
asm volatile("MRS %0, PAR_EL1": "=r"(phys_addr));
However, the value of register PAR_EL1, i.e. stored in phys_addr, is 0x80b (or 0b100000001011), which is not a correct result.
Then I found a manual about register PAR_EL1.
According to the manual, bit[0] is 1 means a fault occurs on the execution of an Address translation instruction.
bits[6:1] is Fault Status Code (FST). My FST is 0b000101. The manual tell me that the error information is: Translation fault, level 1.
I don't know the fault reason.
My cpu is Cortex-A53, and operating system is Linux-4.14.59.

save the number of bytes read from file [duplicate]

When I try to research about return values of system calls of the kernel, I find tables that describe them and what do I need to put in the different registers to let them work. However, I don't find any documentation where it states what is that return value I get from the system call. I'm just finding in different places that what I receive will be in the EAX register.
TutorialsPoint:
The result is usually returned in the EAX register.
Assembly Language Step-By-Step: Programming with Linux book by Jeff Duntemann states many times in his programs:
Look at sys_read's return value in EAX
Copy sys_read return value for safe keeping
Any of the websites I have don't explain about this return value. Is there any Internet source? Or can someone explain me about this values?
See also this excellent LWN article about system calls which assumes C knowledge.
Also: The Definitive Guide to Linux System Calls (on x86), and related: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
C is the language of Unix systems programming, so all the documentation is in terms of C. And then there's documentation for the minor differences between the C interface and the asm on any given platform, usually in the Notes section of man pages.
sys_read means the raw system call (as opposed to the libc wrapper function). The kernel implementation of the read system call is a kernel function called sys_read(). You can't call it with a call instruction, because it's in the kernel, not a library. But people still talk about "calling sys_read" to distinguish it from the libc function call. However, it's ok to say read even when you mean the raw system call (especially when the libc wrapper doesn't do anything special), like I do in this answer.
Also note that syscall.h defines constants like SYS_read with the actual system call number, or asm/unistd.h for the Linux __NR_read names for the same constants. (The value you put in EAX before an int 0x80 or syscall instruction).
Linux system call return values (in EAX/RAX on x86) are either "normal" success, or a -errno code for error. e.g. -EFAULT if you pass an invalid pointer. This behaviour is documented in the syscalls(2) man page.
-1 to -4095 means error, anything else means success. See AOSP non-obvious syscall() implementation for more details on this -4095UL .. -1UL range, which is portable across architectures on Linux, and applies to every system call. (In the future, a different architecture could use a different value for MAX_ERRNO, but the value for existing arches like x86-64 is guaranteed to stay the same as part of Linus's don't-break-userspace policy of keeping kernel ABIs stable.)
For example, glibc's generic syscall(2) wrapper function uses this sequence: cmp rax, -4095 / jae SYSCALL_ERROR_LABEL, which is guaranteed to be future-proof for all Linux system calls.
You can use that wrapper function to make any system call, like syscall( __NR_mmap, ... ). (Or use an inline-asm wrapper header like https://github.com/linux-on-ibm-z/linux-syscall-support/blob/master/linux_syscall_support.h that has safe inline-asm for multiple ISAs, avoiding problems like missing "memory" clobbers that some other inline-asm wrappers have.)
Interesting cases include getpriority where the kernel ABI maps the -20..19 return-value range to 1..40, and libc decodes it. More details in a related answer about decoding syscall error return values.
For mmap, if you wanted you could also detect error just by checking that the return value isn't page-aligned (e.g. any non-zero bits in the low 11, for a 4k page size), if that would be more efficient than checking p > -4096ULL.
To find the actual numeric values of constants for a specific platform, you need to find the C header file where they're #defined. See my answer on a question about that for details. e.g. in asm-generic/errno-base.h / asm-generic/errno.h.
The meanings of return values for each sys call are documented in the section 2 man pages, like read(2). (sys_read is the raw system call that the glibc read() function is a very thin wrapper for.) Most man pages have a whole section for the return value. e.g.
RETURN VALUE
On success, the number of bytes read is returned (zero indicates
end of file), and the file position is advanced by this number. It
is not an error if this number is smaller than the number of bytes
requested; this may happen for example because fewer bytes are
actually available right now (maybe because we were close to end-of-
file, or because we are reading from a pipe, or from a terminal), or
because read() was interrupted by a signal. See also NOTES.
On error, -1 is returned, and errno is set appropriately. In this
case, it is left unspecified whether the file position (if any)
changes.
Note that the last paragraph describes how the glibc wrapper decodes the value and sets errno to -EAX if the raw system call's return value is negative, so errno=EFAULT and return -1 if the raw system call returned -EFAULT.
And there's a whole section listing all the possible error codes that read() is allowed to return, and what they mean specifically for read(). (POSIX standardizes most of this behaviour.)

RISC-V user level reference or reference implementation

Summary: What is the definitive reference or reference implementation for the RISC-V user-level ISA?
Context: The RISC-V website has "The RISC-V Instruction Set Manual" which explains the user-level instructions very well, but does not give an exact specification for them. I am trying to build a user-level ISA simulator now and intend to write an FPGA implementation later, so the exact behavior is important to me.
A reference implementation would be sufficient, but should preferably be as simple as possible -- i.e. I would try to understand a pipelined implementation only as a last resort. What is important is to have an understanding of the specified ISA and not of a single CPU implementation or compiler implementation.
One example to show my problem is the AUIPC instruction: The prose explanation says that "AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc, then places the result in register rd." I wanted to know whether this refers to the old or new PC, i.e. the position of the AUIPC instruction or the next instruction. I looked at the "RISCV Angel" implementation, but that seems to mask out the lower bits of the (old) PC -- not just of the immediate -- which I could not find any reason for in the spec, not even in the change history of the spec (since Angel is a bit older). Instead of an answer, I now have two questions about AUIPC. Many other instructions pose similar problems to me.
AFAICT the RISC-V Instruction Set Manual you cite is the closest thing there is to a definitive reference. If there are things that are unclear or incorrect in there then you could open issues at the Github site where that document is maintained: https://github.com/riscv/riscv-isa-manual
As far as AIUPC is concerned, the answer is implied, but not stated explicitly, by this sentence at the bottom of page 9 in the current manual:
There is one additional user-visible register: the program counter pc holds the address of the current instruction.
Based on that statement I would expect that the pc value that is seen and manipulated by the AIUPC instruction is the address of the AIUPC instruction itself.
This interpretation is supported by the discussion of the JALR instruction:
The indirect jump instruction JALR (jump and link register) uses the I-type encoding. The target address is obtained by adding the 12-bit signed I-immediate to the register rs1, then setting the least-signicant bit of the result to zero. The address of the instruction following the jump (pc+4) is written to register rd.
Given that the address of the following instruction is expressed as pc+4, it seems clear that the pc value visible during the execution of JALR is the address of the JALR instruction itself.
The latest draft of the manual (at https://github.com/riscv/riscv-isa-manual/releases/download/draft-20190321-ba17106/riscv-spec.pdf) makes the situation slightly clearer. In place of this in the current manual:
AUIPC appends 12 low-order zero bits to the 20-bit U-immediate, sign-extends the result to 64 bits, then adds it to the pc and places the result in register rd.
the latest draft says:
AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc of the AUIPC instruction, then places the result in register rd.

What happens when you execute an instruction that your CPU does not support?

What happens if a CPU attempts to execute a binary that has been compiled with some instructions that your CPU doesn't support. I'm specifically wondering about some of the new AVX instructions running on older processors.
I'm assuming this can be tested for, and a friendly message could in theory be displayed to a user. Presumably most low level libraries will check this on your behalf. Assuming you didn't make this check, what would you expect to happen? What signal would your process receive?
A new instruction can be designed to be "legacy compatible" or it can not.
To the former class belong instructions like tzcnt or xacquire that have an encoding that produces valid instructions in older architecture: tzcnt is encoded as
rep bsf and xacquire is just repne.
The semantic is different of course.
To the second class belong the majority of new instructions, AVX being one popular example.
When the CPU encounters an invalid or reserved encoding it generates the #UD (for UnDefined) exception - that's interrupt number 6.
The Linux kernel set the IDT entry for #UD early in entry_64.S:
idtentry invalid_op do_invalid_op has_error_code=0
the entry points to do_invalid_op that is generated with a macro in traps.c:
DO_ERROR(X86_TRAP_UD, SIGILL, "invalid opcode", invalid_op)
the macro DO_ERROR generates a function that calls do_error_trap in the same file (here).
do_error_trap uses fill_trap_info (in the same file, here) to create a siginfo_t structure containing the Linux signal information:
case X86_TRAP_UD:
sicode = ILL_ILLOPN;
siaddr = uprobe_get_trap_addr(regs);
break;
from there the following calls happen:
do_trap in traps.c
force_sig_info in signal.c
specific_send_sig_info in signal.c
that ultimately culminates in calling the signal handler for SIGILL of the offending process.
The following program is a very simple example that generates an #UD
BITS 64
GLOBAL _start
SECTION .text
_start:
ud2
we can use strace to check the signal received by running that program
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x400080} ---
+++ killed by SIGILL +++
as expected.
As Cody Gray commented, libraries don't usually rely on SIGILL, instead they use a CPU dispatcher or check the presence of an instruction explicitly.

$gp, .cpload and position independence on MIPS

I'm reading about PIC implementation on MIPS on Linux here. It says:
The global pointer which is stored in the $gp register (aka $28) is a callee saved register.
The Wikipedia article about MIPS says the same.
However, according to them, when a .cpload directive is being used in function prologue, it clobbers the previous value of $gp without saving it first. When a .cprestore is used, it saves the current $gp to the stack frame, as opposed to the value of $gp that was there on function entrance. Same goes for the effect .cprestore has on jal/jalr: it restores $gp once the callee returns - assuming the callee might've clobbered it.
And finally, there's nothing in the function epilogue about $gp.
All in all, doesn't sound like a callee-saved register to me. Sounds like a caller-saved register. What am I misunderstanding here?
Linux programs on MIPS can be compiled as pic or not. If compiled as pic, then they must use "abicalls", and its behaviour is a little different from that of the no-abicalls convention.
From the "section Position-Independent Function Prologue" of the "SYSTEM V APPLICATION BINARY INTERFACE - MIPS Processor Supplement 3rd Edition" we can cite:
After calculating the gp, a function allocates the local stack space and saves the gp on the stack, so it can be restored after subsequent function calls. In other words, the gp is a caller saved register.
The code in the following figure illustrates a position-independent function prologue. _gp_disp represents the offset between the beginning of the function and the global offset table.
name:
la gp, _gp_disp
addu gp, gp, t9
addiu sp, sp, –64
sw gp, 32(sp)
So in summary, if you're using -mabicalls then gp is calculated at the beginning of all the functions needing global symbols (with some exceptions), and additionally any code (abi or not) that calls abi code will ensure that the called function address is stored in t9.

Resources