Understanding IRQ usage - Linux

Over the last couple of days, I've been trying to implement a simple interrupt handler in C. So far so good; I think I've achieved my initial task. My ultimate goal is to inject faults into the kernel by bit-flipping some CPU registers. After reading this, this and this, I chose IRQ number 10, since it seems to be a free, "open" interrupt. My doubt is: how can I know whether this is the "best" IRQ on which to call my interrupt handler? How can I decide on this?
Thanks and all the best,
João
P.S.: As far as I know, there are 16 IRQ lines, but Linux has 256 IDT entries, as stated in /include/asm/irq_vectors.h (please see below). This also puzzles me.
/*
* Linux IRQ vector layout.
*
* There are 256 IDT entries (per CPU - each entry is 8 bytes) which can
* be defined by Linux. They are used as a jump table by the CPU when a
* given vector is triggered - by a CPU-external, CPU-internal or
* software-triggered event.
*
* Linux sets the kernel code address each entry jumps to early during
* bootup, and never changes them. This is the general layout of the
* IDT entries:
*
* Vectors 0 ... 31 : system traps and exceptions - hardcoded events
* Vectors 32 ... 127 : device interrupts
* Vector 128 : legacy int80 syscall interface
* Vectors 129 ... 237 : device interrupts
* Vectors 238 ... 255 : special interrupts
*
* 64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table.
*
* This file enumerates the exact layout of them:
*/
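For reference, a handler is usually attached to a line with request_irq() and released with free_irq(); the minimal sketch below (module, handler and message names are made up) registers a shared handler on IRQ 10:

#include <linux/module.h>
#include <linux/interrupt.h>

#define TEST_IRQ 10                    /* the line picked above */

static int irq_cookie;                 /* dev_id cookie, required for a shared line */

static irqreturn_t test_irq_handler(int irq, void *dev_id)
{
        pr_info("test_irq: got interrupt %d\n", irq);
        return IRQ_HANDLED;            /* IRQ_NONE if the interrupt was not ours */
}

static int __init test_irq_init(void)
{
        /* IRQF_SHARED lets the handler coexist with another owner of line 10 */
        return request_irq(TEST_IRQ, test_irq_handler, IRQF_SHARED,
                           "test_irq", &irq_cookie);
}

static void __exit test_irq_exit(void)
{
        free_irq(TEST_IRQ, &irq_cookie);
}

module_init(test_irq_init);
module_exit(test_irq_exit);
MODULE_LICENSE("GPL");

As for picking a line: request_irq() returns -EBUSY if the line is already claimed by a non-shared handler, and /proc/interrupts shows what is currently registered on each line, so between the two you can check whether IRQ 10 is really free on your machine.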

Related

Reading /dev/urandom as early as possible

I am performing research in the field of random number generation and I need to demonstrate the "boot-time entropy hole" from the well-known "P's and Q's" paper (here). We will be spooling up two copies of the same minimal Linux virtual machine at the same time and we are expecting their /dev/urandom values to be the same at some early point in the boot process.
However, I have been unable to read /dev/urandom early enough in the boot process to spot the issue. We need to read it earlier in the boot process.
How can I get the earliest possible values of /dev/urandom? We likely will need to modify the kernel, but we have very little experience there, and need some pointers. Or, if there's a kernel-instrumenting tool available that could do it without re-compiling a kernel, that would be great, too.
Thanks in advance!
urandom is provided via a device driver, and the first thing the kernel does with a driver is to call its init function.
If you take a look here: http://lxr.free-electrons.com/source/drivers/char/random.c#L1401
* Note that setup_arch() may call add_device_randomness()
* long before we get here. This allows seeding of the pools
* with some platform dependent data very early in the boot
* process. But it limits our options here. We must use
* statically allocated structures that already have all
* initializations complete at compile time. We should also
* take care not to overwrite the precious per platform data
* we were given.
*/
static int rand_initialize(void)
{
init_std_data(&input_pool);
init_std_data(&blocking_pool);
init_std_data(&nonblocking_pool);
return 0;
}
early_initcall(rand_initialize);
So the init function for this driver is rand_initialize(). However, note that the comment says setup_arch() may call add_device_randomness() before this device is even initialized. Calling that function does not add any actual entropy, though: it feeds the pool with things like MAC addresses, so if you have two identical VMs, you're good there. From the comment:
* add_device_randomness() is for adding data to the random pool that
* is likely to differ between two devices (or possibly even per boot).
* This would be things like MAC addresses or serial numbers, or the
* read-out of the RTC. This does *not* add any actual entropy to the
* pool, but it initializes the pool to different values for devices
* that might otherwise be identical and have very little entropy
* available to them (particularly common in the embedded world).
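If you do end up patching the kernel, one simple angle (a sketch, untested; the function name and file placement are made up) is to register your own early_initcall() in built-in code and pull bytes with get_random_bytes(), which puts you at roughly the same point in boot as rand_initialize() above:

/* sketch: build this into the kernel, e.g. somewhere under drivers/char/ */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/random.h>

static int __init early_rand_probe(void)
{
        unsigned char buf[16];

        /* Draws from the same nonblocking pool that backs /dev/urandom,
         * long before any init script or userspace can open the device. */
        get_random_bytes(buf, sizeof(buf));
        print_hex_dump(KERN_INFO, "early urandom: ", DUMP_PREFIX_NONE,
                       16, 1, buf, sizeof(buf), false);
        return 0;
}
early_initcall(early_rand_probe);

Ordering among initcalls at the same level follows link order, so if you need to be certain you run after (or before) rand_initialize(), adding the printout directly inside rand_initialize() itself is the more deterministic option.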
Also, note that the entropy pool is saved on shutdown and restored at boot time via an init script (on my Ubuntu 14.04, it's /etc/init.d/urandom), so you might want to call your script from that script before the following block
(
        date +%s.%N

        # Load and then save $POOLBYTES bytes,
        # which is the size of the entropy pool
        if [ -f "$SAVEDFILE" ]
        then
                cat "$SAVEDFILE"
        fi
        # Redirect output of subshell (not individual commands)
        # to cope with a misfeature in the FreeBSD (not Linux)
        # /dev/random, where every superuser write/close causes
        # an explicit reseed of the yarrow.
) >/dev/urandom
or a similar call is made.

What is PML4 short for?

In the Xen code (./xen/include/asm-x86/config.h), I saw this memory layout comment:
/*
 * Meng: Xen-definitive guide: P81
 * Memory layout:
 *  0x0000000000000000 - 0x00007fffffffffff [128TB, 2^47 bytes, PML4:0-255]
 *    Guest-defined use (see below for compatibility mode guests).
 *  0x0000800000000000 - 0xffff7fffffffffff [16EB]
 *    Inaccessible: current arch only supports 48-bit sign-extended VAs.
 *  0xffff800000000000 - 0xffff803fffffffff [256GB, 2^38 bytes, PML4:256]
I'm very confused about what PML4 is short for. I do know that x86_64 only uses 48 of its 64 bits. But what does PML4 stand for? Knowing that may help me understand the numbers behind it.
Thanks!
It's short for Page Map Level 4. A bit of explanation can be found here. Basically, it's the name AMD chose for the top level of the four-level x86-64 page-table hierarchy (PML4 -> PDPT -> page directory -> page table). Each of the 512 PML4 entries maps a 512 GB (2^39-byte) slice of the 48-bit virtual address space, which is why the Xen comment tags each address range with the PML4 slots it occupies (256 slots x 512 GB = 128 TB for the guest range).
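To make the slot numbers concrete, here is a small user-space sketch (not from the Xen source) that extracts the PML4 index from a 64-bit virtual address:

#include <stdio.h>
#include <stdint.h>

/* Bits 47..39 of the virtual address select one of the 512 PML4 entries. */
static unsigned int pml4_slot(uint64_t vaddr)
{
        return (vaddr >> 39) & 0x1ff;
}

int main(void)
{
        printf("%u\n", pml4_slot(0x0000000000000000ULL)); /* 0   */
        printf("%u\n", pml4_slot(0x00007fffffffffffULL)); /* 255 */
        printf("%u\n", pml4_slot(0xffff800000000000ULL)); /* 256 */
        return 0;
}

This reproduces the ranges in the Xen comment: the guest area spans slots 0-255 and the first hypervisor range sits at the start of slot 256.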

Possible to insert an existing page into another VMA structure?

I've been learning about the Linux kernel through some experiments. Recently I've been wondering whether it is possible to share pages between two user-space processes by inserting the pages of one process into the VMA structure of the other, after the latter calls mmap and sends the address back to the kernel through netlink. The insertion would be done in a driver module. The reason for this test is that the two processes might not be communicating with each other directly, and duplicating pages of read-only memory could be a bad choice in terms of efficiency and redundancy.
After some research I found the vm_insert_page function and the traditional remap_pfn_range. However, the LXR says:
/**
 * vm_insert_page - insert single page into user vma
 * @vma: user vma to map to
 * @addr: target user address of this page
 * @page: source kernel page
 *
 * This allows drivers to insert individual pages they've allocated
 * into a user vma.
 *
 * The page has to be a nice clean individual kernel allocation.
from lxr
Does this mean it's impossible to insert an existing page into another VMA? Can the function only be called with newly created pages? I always thought pages could be shared, with a reference count tracking the sharing.
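For context, vm_insert_page() is normally used from a driver's mmap handler, roughly like the sketch below (struct and function names are made up; the page is the "nice clean individual kernel allocation" the comment refers to, obtained elsewhere with alloc_page()):

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/gfp.h>

/* Hypothetical per-device state holding one kernel-allocated page. */
struct demo_dev {
        struct page *page;      /* set up elsewhere with alloc_page(GFP_KERNEL) */
};

static int demo_mmap(struct file *filp, struct vm_area_struct *vma)
{
        struct demo_dev *dev = filp->private_data;

        if (vma->vm_end - vma->vm_start < PAGE_SIZE)
                return -EINVAL;

        /* vm_insert_page() takes a reference on the page and installs a
         * PTE for it in the calling process's address space. */
        return vm_insert_page(vma, vma->vm_start, dev->page);
}

Nothing stops the same struct page from being passed to vm_insert_page() for two different processes' VMAs; as far as I can tell, the comment's restriction is about where the page came from (an ordinary kernel allocation backed by a struct page), not about it being freshly allocated, and the sharing is indeed tracked by the page's reference count.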

Do Kernel pages get swapped out?

Pertaining to the Linux kernel, do "kernel" pages ever get swapped out? Also, do user-space pages ever get to reside in ZONE_NORMAL?
No, kernel memory is unswappable.
Kernel pages are not swappable, but they can be freed.
User-space pages can reside in ZONE_NORMAL.
A Linux system can be configured to use HIGHMEM or not.
If ZONE_HIGHMEM is configured, user-space processes will get their memory from HIGHMEM; otherwise, they will get memory from ZONE_NORMAL.
Under normal circumstances kernel pages (i.e., memory residing in the kernel for kernel use) are not swappable; in fact, if a page fault on kernel memory cannot be resolved (see the page fault handler source code below), the kernel will explicitly crash itself.
See this:
http://lxr.free-electrons.com/source/arch/x86/mm/fault.c
and the function:
/*
 * This routine handles page faults. It determines the address,
 * and the problem, and then passes it off to one of the appropriate
 * routines.
 *
 * This function must have noinline because both callers
 * {,trace_}do_page_fault() have notrace on. Having this an actual function
 * guarantees there's a function trace entry.
 */
static noinline void
__do_page_fault(struct pt_regs *regs, unsigned long error_code,
                unsigned long address)
{
And the detection here:
 *
 * This verifies that the fault happens in kernel space
 * (error_code & 4) == 0, and that the fault was not a
 * protection error (error_code & 9) == 0.
 */
if (unlikely(fault_in_kernel_space(address))) {
        if (!(error_code & (PF_RSVD | PF_USER | PF_PROT))) {
                if (vmalloc_fault(address) >= 0)
                        return;

                if (kmemcheck_fault(regs, address, error_code))
                        return;
        }
But the same page fault handler, which also detects faults on not-present usermode memory (all page fault handling goes through the kernel), will explicitly retrieve the data from swap space if it exists, or start a memory allocation routine to give the process more memory.
That said, the kernel does write kernel structures/memory/task lists etc. out to swap during software suspend and hibernation:
https://www.kernel.org/doc/Documentation/power/swsusp.txt
During the resume phase it restores the kernel memory from the swap file.

Where to start learning about linux DMA / device drivers / memory allocation

I'm porting/debugging a device driver (that is used by another kernel module) and facing a dead end because dma_sync_single_for_device() fails with a kernel oops.
I have no clue what that function is supposed to do and googling does not really help, so I probably need to learn more about this stuff in total.
The question is, where to start?
Oh yeah, in case it is relevant, the code is supposed to run on a PowerPC (and the Linux is OpenWrt).
EDIT:
On-line resources preferable (books take a few days to be delivered :)
On-line:
Anatomy of the Linux slab allocator
Understanding the Linux Virtual Memory Manager
Linux Device Drivers, Third Edition
The Linux Kernel Module Programming Guide
Writing device drivers in Linux: A brief tutorial
Books:
Linux Kernel Development (2nd Edition)
Essential Linux Device Drivers (only the first 4-5 chapters)
Useful Resources:
the Linux Cross Reference (searchable kernel source for all kernels)
API changes in the 2.6 kernel series
dma_sync_single_for_device calls dma_sync_single_range_for_cpu a little further up in the file, and this is the source documentation (I assume that even though this is for ARM, the interface and behavior are the same):
/**
 * dma_sync_single_range_for_cpu
 * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
 * @handle: DMA address of buffer
 * @offset: offset of region to start sync
 * @size: size of region to sync
 * @dir: DMA transfer direction (same as passed to dma_map_single)
 *
 * Make physical memory consistent for a single streaming mode DMA
 * translation after a transfer.
 *
 * If you perform a dma_map_single() but wish to interrogate the
 * buffer using the cpu, yet do not wish to teardown the PCI dma
 * mapping, you must call this function before doing so. At the
 * next point you give the PCI dma address back to the card, you
 * must first the perform a dma_sync_for_device, and then the
 * device again owns the buffer.
 */
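To put that comment in context, a typical streaming-DMA round trip looks roughly like the sketch below (device, buffer and function names are made up). dma_sync_single_for_device() hands a still-mapped buffer back to the hardware after the CPU has touched it, and a common cause of oopses is passing it a handle, size or direction that does not match the original dma_map_single() call, or a buffer that has already been unmapped or freed:

#include <linux/dma-mapping.h>

/* dev: the struct device doing DMA; buf/len: a kmalloc'd buffer */
static int demo_dma_roundtrip(struct device *dev, void *buf, size_t len)
{
        dma_addr_t handle;

        /* Map once; the device owns the buffer from here on. */
        handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, handle))
                return -ENOMEM;

        /* ... program the hardware to transfer from 'handle' ... */

        /* Hand the buffer back to the CPU to inspect or modify it ... */
        dma_sync_single_for_cpu(dev, handle, len, DMA_TO_DEVICE);
        /* ... touch buf here ... */

        /* ... and give it back to the device before reusing 'handle'. */
        dma_sync_single_for_device(dev, handle, len, DMA_TO_DEVICE);

        /* ... start another transfer, wait for completion ... */

        dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
        return 0;
}

On a non-cache-coherent platform such as many PowerPC parts, these sync calls do cache maintenance on the mapped region, which is why a bad handle tends to show up as an oops inside the sync rather than as silently corrupted data.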
Understanding The Linux Kernel?
The chapters of the Linux Device Drivers book (in the same series as Understanding the Linux Kernel, recommended by @Matthew Flaschen) might be useful.
You can download the individual chapters from the LWN website. Chapter 16 deals with DMA.

Resources