RISC-V PMP instruction access fault when jumping to U mode - riscv

I am trying to use PMP on a 16-byte region to protect a specific memory region. However, I am getting an instruction access fault when jumping to U mode when the PMP configuration is enabled.
Details:
My program starts in M mode and at some point jumps to U mode using
mret. I am not using virtual memory for this test.
Memory region that I want to protect starts at 0x80020180.
I set pmpaddr0 to 0x20008061 (0x80020180 shifted right by 2, with
the low two bits set to 0b01 to mark a 16-byte NAPOT region).
pmp0cfg is set to 0b0001_1000 (NAPOT mode, with read, write, and
execute all denied).
I have a store operation that tries to store to 0x80020184 in U mode.
But the code raises an instruction access fault when jumping to U
mode.
The first instruction in U mode is located at PC 0x800004c0,
which should not match with the pmpaddr0.
I am trying to figure out why it raises an instruction access fault when jumping to U mode.
Could anyone please help me understand what's happening?
I am running my code on Spike, and I see the same behavior in a rocket-core simulation as well.

I found the answer to my own question; posting it as this might help someone else.
It turns out I missed an important sentence in the spec about the PMP priority.
If no PMP entry matches an M-mode access, the access succeeds. If no PMP entry matches an
S-mode or U-mode access, but at least one PMP entry is implemented, the access fails.
So I had to add a second PMP entry to match all addresses in U mode and that solved the issue.
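A minimal sketch of the resulting configuration, written as host-side C so the encodings can be checked without hardware. The helper names (pmp_napot_addr, pmpcfg0_value, pmp_napot_all) are made up for illustration, and the choice of entry 1 as the catch-all is an assumption; the bit positions follow the pmpcfg layout in the privileged spec. Because the lowest-numbered matching entry takes priority, entry 0 still denies access to the 16-byte region while entry 1 lets every other U-mode access match.

```c
#include <assert.h>
#include <stdint.h>

/* pmpcfg permission and address-mode bits (RISC-V privileged spec). */
#define PMP_R       0x01u
#define PMP_W       0x02u
#define PMP_X       0x04u
#define PMP_A_NAPOT 0x18u   /* A field (bits 4:3) = 0b11 */

/* Encode a naturally aligned power-of-two region for a pmpaddr CSR.
 * size must be a power of two >= 8, and base must be size-aligned. */
static uint64_t pmp_napot_addr(uint64_t base, uint64_t size)
{
    assert(size >= 8 && (size & (size - 1)) == 0);
    assert((base & (size - 1)) == 0);
    return (base >> 2) | ((size >> 3) - 1);
}

/* Value for pmpcfg0 with two entries (illustrative layout):
 *   entry 0: NAPOT, no R/W/X -> the protected 16-byte region
 *   entry 1: NAPOT, R/W/X    -> catch-all so other U-mode accesses match */
static uint64_t pmpcfg0_value(void)
{
    uint64_t cfg0 = PMP_A_NAPOT;
    uint64_t cfg1 = PMP_A_NAPOT | PMP_R | PMP_W | PMP_X;
    return cfg0 | (cfg1 << 8);
}

/* pmpaddr value commonly used to make a NAPOT entry cover the whole
 * address space: all ones (the CSR is WARL, so unimplemented bits drop). */
static uint64_t pmp_napot_all(void)
{
    return ~(uint64_t)0;
}
```

On the target, these values would be written to pmpaddr0, pmpaddr1, and pmpcfg0 with csrw before executing mret. pmp_napot_addr(0x80020180, 16) reproduces the 0x20008061 value from the question.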

Related

3.10 kernel crash BUG() in mark_bootmem()

I get a kernel crash at BUG() here - http://lxr.free-electrons.com/source/mm/bootmem.c?v=3.10#L385 with the following message
<2>kernel BUG at /kernel/mm/bootmem.c:385!
What could be a possible reason for this?
Following is the function call trace
[<c0e165f8>] (mark_bootmem+0xd0/0xe0) from [<c0e05d64>] (bootmem_init+0x16c/0x26
[<c0e05d64>] (bootmem_init+0x16c/0x264) from [<c0e07980>] (paging_init+0x734/0x7
[<c0e07980>] (paging_init+0x734/0x7d4) from [<c0e03f20>] (setup_arch+0x3e8/0x69c
[<c0e03f20>] (setup_arch+0x3e8/0x69c) from [<c0e007d8>] (start_kernel+0x78/0x370
[<c0e007d8>] (start_kernel+0x78/0x370) from [<10008074>] (0x10008074)
Thanks
The mm/bootmem.c file implements the boot memory allocator. The function mark_bootmem marks the memory pages between the start and end addresses (start is rounded down and end is rounded up to page boundaries) as reserved (or as not reserved, when used for freeing) for this allocator.
It iterates over bdata_list, trying to find a region that contains the first page of the requested address range. If it doesn't find one, the BUG() you mentioned is triggered. The same BUG() is triggered if it does find one but the region is not large enough (end lies outside the region). So this BUG() means that it could not find the requested memory region to mark.
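The lookup described above can be modelled in user space roughly as follows. This is a simplified sketch, not the kernel source: struct region and mark_range are illustrative names, the list is replaced by an array, and a return value of -1 stands in for the places where the kernel would hit BUG().

```c
#include <stddef.h>

/* Simplified stand-in for one bdata_list entry: a range of page frame
 * numbers [min_pfn, low_pfn). Names are illustrative, not the kernel's. */
struct region {
    unsigned long min_pfn;
    unsigned long low_pfn;
};

/* Returns 0 if [start, end) is fully covered by the regions, walking them
 * in order; returns -1 in the cases where the kernel would hit BUG(). */
static int mark_range(const struct region *r, size_t n,
                      unsigned long start, unsigned long end)
{
    unsigned long pos = start;

    for (size_t i = 0; i < n; i++) {
        if (pos < r[i].min_pfn || pos >= r[i].low_pfn) {
            if (pos != start)
                return -1;      /* gap after a partial match */
            continue;           /* first page is not in this region */
        }
        unsigned long max = r[i].low_pfn < end ? r[i].low_pfn : end;
        /* the kernel would mark pages [pos, max) here */
        if (max == end)
            return 0;           /* whole range handled */
        pos = r[i].low_pfn;     /* continue in the next region */
    }
    return -1;                  /* range not found, or it runs past the end */
}
```

With a single region, both failure cases from the answer map to the final return: the start page is never found, or the range extends beyond the only region available.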
Now, if I understand the kernel code correctly, on normal UMA systems there will be only one entry in bdata_list, and it should describe the range of low-memory pages available in the system. Since you didn't provide much information about your system, it's hard to guess the exact reason for the problem, but in general it seems that your memory setup is broken. This is very architecture-specific, so it's hard to tell what exactly is going on.

MOVDQU instruction + page boundary

I have a simple test program that loads an xmm register with the
movdqu instruction accessing data across a page boundary (OS = Linux).
If the following page is mapped, this works just fine. If it's not
mapped then I get a SIGSEGV, which is probably expected.
However this diminishes the usefulness of the unaligned loads quite
a bit. Additionally SSE4.2 instructions (like pcmpistri) which
allow for unaligned memory references appear to exhibit this behavior
as well.
That's all fine -- except there's many an implementation of strcmp
using pcmpistri that I've found that don't seem to address this issue
at all -- and I've been able to contrive trivial testcases that will
cause these implementations to fail, while the byte-at-a-time trivial
strcmp implementation will work just fine with the same data layout.
One more note -- it appears that the GNU C library implementation for
64-bit Linux has a __strcmp_sse42 variant that appears to use the
pcmpistri instruction in a more safe manner. The implementation of
this strcmp is fairly complex, but it appears to be carefully trying
to avoid the page boundary issue. I'm not sure if that's due to the
issue I describe above, or whether it's just a side-effect of trying to
get better performance by aligning the data.
Anyway the question I have is primarily -- where can I find out more
about this issue? I've typed in "movdqu crossing page boundary" and
every variant of that I can think of to Google, but haven't come across
anything particularly useful. If anyone can point me to further info
on this it would be greatly appreciated.
First, any algorithm which tries to access an unmapped address will cause a SegFault. If a non-AVX code flow used a 4 byte load to access the last byte of a page and the first 3 bytes of "the next page" which happened to not be mapped then it would also cause a SegFault. No? I believe that the "issue" is that the AVX(1/2/3) registers are so much bigger than "typical" that algorithms which were unsafe (but got away with it) get caught if they are trivially extended to the larger registers.
Aligned loads (MOVDQA) can never have this problem since they don't cross any boundaries of their own size or greater. Unaligned loads CAN have this problem (as you've noted) and "often" do. The reason for this is that the instruction is defined to load the full size of the target register. You need to look at the operand types in the instruction definitions quite carefully. It doesn't matter how much of the data you are interested in. It matters what the instruction is defined to do.
However...
AVX1 (Sandy Bridge) added a "masked move" capability which is slower than a movdqa or movdqu but will not (architecturally) access the unmapped page so long as the mask is not enabled for the portion of the access which would have fallen in that page. This is meant to address the issue. In general, moving forward, it appears that masked portions (see AVX512) of loads/stores will not cause access violations on IA either.
(It is a bummer about PCMPxSTRx behavior. Perhaps you could add 15 bytes of padding to your "string" objects?)
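To make the page-crossing condition concrete, here is a small sketch of the check a routine could run before issuing an unaligned 16-byte load. The function name and the fixed 4096-byte page size are assumptions for illustration; real code would query the page size at run time.

```c
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u   /* assumption; use sysconf(_SC_PAGESIZE) in real code */

/* Nonzero if a load of `width` bytes starting at addr touches two pages. */
static int crosses_page(uintptr_t addr, unsigned width)
{
    return (addr & (ASSUMED_PAGE_SIZE - 1)) + width > ASSUMED_PAGE_SIZE;
}
```

A load that ends exactly on the last byte of a page does not cross, so a 16-byte load at page offset 0xff0 is safe while one at 0xff1 is not. An aligned 16-byte load can never satisfy the condition.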
Facing a similar problem with a library I was writing, I got some information from a very helpful contributor.
The core of the idea is to align the 16-byte reads to the end of the string, then handle the leftover bytes at the beginning. This works because the end of the string must live in an accessible page, and you are guaranteed that the 16-byte truncated starting address must also live in an accessible page.
Since we never read past the string we cannot potentially stray into a protected page.
To handle the initial set of bytes, I chose to use the PCMPxSTRM functions, which return the bitmask of matching bytes. Then it's simply a matter of shifting the result to ignore any mask bits that occur before the true beginning of the string.
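A sketch of just the address arithmetic for the first block, under the assumption that the mask has one bit per byte with bit 0 corresponding to the lowest address. block_start and drop_leading are hypothetical helper names, and the actual PCMPxSTRM call is left out.

```c
#include <stdint.h>

/* Start of the 16-byte block containing addr (the "truncated" address).
 * It lies in the same page as addr, so reading 16 bytes from it is safe. */
static uintptr_t block_start(uintptr_t addr)
{
    return addr & ~(uintptr_t)15;
}

/* Drop mask bits for bytes that precede the true start of the string.
 * `mask` has one bit per byte of the aligned block, bit 0 = lowest address. */
static uint16_t drop_leading(uint16_t mask, uintptr_t addr)
{
    return (uint16_t)(mask >> (addr & 15));
}
```

For a string starting at 0x1003, the aligned read starts at 0x1000 and the three low mask bits are shifted out, so bit 0 of the result refers to the first real byte of the string.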

What happens after segmentation fault in linux kernel?

While I was thinking about implementing networked paging (requesting the faulting page from a remote node), I came up with this question:
First, let's consider the following steps:
1) A user-space program tries to access memory at address X.
2) The MMU walks the page table to find the physical address of X.
3) While walking the page table, it notices that the page table entry is invalid.
4) The CPU traps, and the trap is caught by the Linux trap vector. (This is the ARM case, but I think x86 is the same, right?)
5) At this point, I can retrieve the proper data from the remote node, copy it to some physical address, and map it in the page table.
6) Here is the question: after this point, can the program that faulted at X safely read the data? And does that mean the MMU or CPU somehow remembers the faulting page table entry, returns to it, and resumes the page table walk?
If any of the steps are not right, please enlighten me.
The data abort handler simply restores the pc to the value it had before the data abort handling started, so the instruction is executed again. With the right data now in place, the data abort does not happen again.
The solution is tricky and non-portable.
From a signal handler, you can get the values of the CPU registers at the time the segmentation fault occurred (link: http://man7.org/linux/man-pages/man2/sigaction.2.html). You need to analyse these to decide whether you can fix the situation. First, check that the instruction pointer is valid. Then check that the faulting address lies in a valid range. Then map memory for the nonexistent pages with the mmap() system call, and copy the required data to these pages. After the signal handler returns, the process resumes from where the segmentation fault occurred.
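A minimal user-space sketch of this approach on Linux. The names hole, on_segv, and demo_fault_in are made up, the fill value 42 stands in for data fetched from a remote node, and the instruction-pointer check is omitted. Note that mmap() is not on the POSIX list of async-signal-safe functions; calling it from the handler is an assumption that tends to hold for synchronous faults like this one, which is part of why the approach is non-portable.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *hole;     /* start of the reserved, inaccessible range */
static size_t hole_len;         /* its length: one page in this sketch */

/* SIGSEGV handler: if the fault lies in our range, back that page with
 * memory and fill it (a real pager would fetch the data remotely). */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    unsigned char *addr = info->si_addr;
    if (addr < hole || addr >= hole + hole_len)
        _exit(1);               /* a genuine crash, not ours to fix */

    void *p = mmap(hole, hole_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        _exit(1);
    memset(p, 42, hole_len);
}

/* Reserve one inaccessible page, touch it, and return the byte read.
 * The load faults once; after the handler returns it is restarted. */
static int demo_fault_in(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;   /* deliver siginfo_t with the fault address */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    hole_len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, hole_len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    hole = p;

    return *(volatile unsigned char *)(hole + 5);
}
```

The faulting load in demo_fault_in is never retried explicitly: returning from the handler resumes at the same instruction, which now finds the page mapped.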

protection of instructions from using it in user mode in linux

I read in a tutorial that some 15 instructions in the x86 architecture are not allowed to be used in user mode.
I know there is something called the code segment register, which keeps track of the current privilege level.
My question is:
a) Does the CPU have to check the current privilege level before executing every instruction?
b) What actually happens if a user program contains an instruction that cannot be used in user mode? How does the CPU find out about this before executing it?
The CPU does check CPL, RPL, and the like before executing certain instructions or certain parts of them (there are a number of instructions with very complex logic, and the set of checks to perform depends on a number of conditions).
If an instruction is not allowed to execute, the CPU generates an exception event, which is then dispatched to its handler. Exception handlers are similar to interrupt handlers in nature and are defined by the OS. So, when the OS gets to handle an exception that it cannot correct in any way, it terminates the program that caused the exception.
An example of a "correctable" exception is page faults for virtual memory that's been offloaded to the disk. The OS loads the code/data that the application is trying to use back from the disk into the memory.
The Current Privilege Level (CPL) is stored in the low two bits of the CS register.
Certain operations are restricted for user code. For example, users can be prevented from invoking certain interrupt vectors. In x86 assembly, int X raises a software interrupt, which is one way of making a system call, with X being an index into the IDT (interrupt descriptor table). The indexed entry points to the handler. Each IDT entry also stores a field called DPL.
These are the steps followed by an int instruction:
• Fetch the n’th descriptor from the IDT, where n is the argument of int.
• Check that CPL in %cs is <= DPL, where DPL is the privilege level in the descriptor.
• If yes, the user code has enough privilege to make this system call. The current execution context (registers etc.) is saved, because we now switch to kernel mode.
• If not, the user code does not have enough privilege to execute this, and a general protection fault (vector 13) is raised instead.
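The check in the steps above can be modelled as a small function. This is only a sketch of the privilege comparison, not of the full interrupt delivery; cpl_from_cs and int_dispatch are illustrative names, and the gate's DPL is passed in directly instead of being read from a real IDT.

```c
#include <stdint.h>

#define VEC_GP 13u   /* general protection fault vector */

/* CPL is held in the low two bits of the CS selector. */
static unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 3u;
}

/* Model of the check for a software `int n`: returns the vector that ends
 * up being dispatched, given the DPL stored in IDT entry n. */
static unsigned int_dispatch(uint16_t cs, unsigned n, unsigned gate_dpl)
{
    if (cpl_from_cs(cs) <= gate_dpl)
        return n;        /* allowed: enter the handler for vector n */
    return VEC_GP;       /* not allowed: general protection fault */
}
```

For example, with a CS value whose low bits are 3 (user mode), a gate with DPL 3 is reachable through int, while a gate with DPL 0 yields vector 13 instead.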
This is how the check is done for one category of restricted instruction; I am not aware of how it is done for other instructions.
The approach is the same for accessing different segments through the GDT (global descriptor table).
PS: This is valid only on x86-based systems.
Please post a link to where you saw the list of reserved instructions.

8086 segment selector

Is there some "supervisor" bit that stops "user space" from doing something like this?
mov CS, 200h
What kind of protection does it have?
Thanks
On the actual 8086 CPU? I don't think so. The advanced protection features only really started appearing with the 80286. There were no restrictions on what programs could set the code segment to on the 8086.
In protected mode (introduced with the 80286 and extended on the 80386), the values in CS (and DS, ES, and so on) changed from segment addresses to selectors, and they had to have entries in a descriptor table (e.g., GDT, LDT).
At that point, protection became possible but I don't think it was the loading into a selector register that caused the violations. Rather it was the use of a selector above your privilege level.
Although, for CS, that would happen pretty quickly after you changed it (as you tried to execute the next instruction).
See here for more information.
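To illustrate the difference between the two interpretations of a value such as 200h, here is a short sketch. The struct and function names are made up for illustration; the field layout is the standard x86 selector format (index in bits 15:3, table indicator in bit 2, RPL in bits 1:0).

```c
#include <stdint.h>

struct selector {
    unsigned index;  /* descriptor table index (bits 15:3) */
    unsigned ti;     /* table indicator: 0 = GDT, 1 = LDT (bit 2) */
    unsigned rpl;    /* requested privilege level (bits 1:0) */
};

/* Protected mode: split a 16-bit selector into its three fields. */
static struct selector decode_selector(uint16_t sel)
{
    struct selector s = { sel >> 3, (sel >> 2) & 1u, sel & 3u };
    return s;
}

/* On the 8086 (and in real mode) the same value is just a segment:
 * linear address = segment * 16 + offset. */
static uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

On the 8086, 200h simply selects the memory starting at linear address 2000h. In protected mode the same value names GDT entry 64 with RPL 0, and it is that descriptor's privilege fields that make protection possible.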
