I'm porting / debugging a device driver (that is used by another kernel module) and have hit a dead end because dma_sync_single_for_device() fails with a kernel oops.
I have no clue what that function is supposed to do, and googling does not really help, so I probably need to learn more about this area in general.
The question is, where to start?
Oh yeah, in case it is relevant: the code is supposed to run on a PowerPC (and the Linux distribution is OpenWrt).
EDIT:
On-line resources preferable (books take a few days to be delivered :)
On-line:
Anatomy of the Linux slab allocator
Understanding the Linux Virtual Memory Manager
Linux Device Drivers, Third Edition
The Linux Kernel Module Programming Guide
Writing device drivers in Linux: A brief tutorial
Books:
Linux Kernel Development (2nd Edition)
Essential Linux Device Drivers (only the first 4-5 chapters)
Useful Resources:
The Linux Cross Reference (searchable kernel source for all kernels)
API changes in the 2.6 kernel series
dma_sync_single_for_device() calls dma_sync_single_range_for_cpu() a little further up in the file, and this is the source documentation (I assume that even though this is the ARM implementation, the interface and behavior are the same):
/**
 * dma_sync_single_range_for_cpu
 * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
 * @handle: DMA address of buffer
 * @offset: offset of region to start sync
 * @size: size of region to sync
 * @dir: DMA transfer direction (same as passed to dma_map_single)
 *
 * Make physical memory consistent for a single streaming mode DMA
 * translation after a transfer.
 *
 * If you perform a dma_map_single() but wish to interrogate the
 * buffer using the cpu, yet do not wish to teardown the PCI dma
 * mapping, you must call this function before doing so. At the
 * next point you give the PCI dma address back to the card, you
 * must first perform a dma_sync_for_device, and then the
 * device again owns the buffer.
 */
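For context, here is a minimal sketch (illustrative only, not from the question; dev, buf, and len are placeholders) of the ownership hand-off the comment describes, using the streaming DMA API:

#include <linux/dma-mapping.h>

static int foo_do_transfer(struct device *dev, void *buf, size_t len)
{
	dma_addr_t handle;

	/* Map the buffer; the device now owns it. */
	handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, handle))
		return -EIO;

	/* ... let the device DMA into the buffer ... */

	/* Hand the buffer back to the CPU so the driver can inspect it. */
	dma_sync_single_for_cpu(dev, handle, len, DMA_FROM_DEVICE);
	/* ... read buf here ... */

	/* Return ownership to the device before the next transfer. */
	dma_sync_single_for_device(dev, handle, len, DMA_FROM_DEVICE);

	/* When finished with the mapping altogether: */
	dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);
	return 0;
}

An oops in dma_sync_single_for_device() is often a sign that the handle, size, or direction does not match what was originally passed to dma_map_single(), so checking that pairing is a reasonable first step.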
Understanding The Linux Kernel?
The chapters of the Linux Device Drivers book (in the same series as Understanding the Linux Kernel, recommended by @Matthew Flaschen) might be useful.
You can download the individual chapters from the LWN website. Chapter 16 deals with DMA.
I need to reserve a large buffer of physically contiguous RAM from the kernel and be able to guarantee that the buffer will always use a specific, hard-coded physical address. This buffer should remain reserved for the kernel's entire lifetime. I have written a chardev driver as an interface for accessing this buffer from userspace. My platform is an embedded system with ARMv7 architecture running a 2.6 Linux kernel.
Chapter 15 of Linux Device Drivers, Third Edition has the following to say on the topic (page 443):
Reserving the top of RAM is accomplished by passing a mem= argument to the kernel at boot time. For example, if you have 256 MB, the argument mem=255M keeps the kernel from using the top megabyte. Your module could later use the following code to gain access to such memory:
dmabuf = ioremap (0xFF00000 /* 255M */, 0x100000 /* 1M */);
I've done that plus a couple of other things:
I'm using the memmap bootarg in addition to the mem one. The kernel boot parameters documentation suggests always using memmap whenever you use mem to avoid address collisions.
I used request_mem_region before calling ioremap and, of course, I check that it succeeds before moving ahead.
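In outline, the init path does something like this (a trimmed-down sketch, not my full driver; the names are placeholders, and the addresses assume the 256 MB system shown below, where RAM starts at 0x80000000, so the reserved top megabyte sits at physical 0x8FF00000):

#include <linux/ioport.h>
#include <linux/io.h>

#define FOO_PHYS_BASE	0x8FF00000UL	/* top megabyte kept away from the kernel */
#define FOO_SIZE	0x00100000UL	/* 1 MB */

static void __iomem *dmabuf;

static int __init foo_reserve(void)
{
	/* Claim the region so it shows up in /proc/iomem and
	 * no other driver can grab it. */
	if (!request_mem_region(FOO_PHYS_BASE, FOO_SIZE, "foo"))
		return -EBUSY;

	dmabuf = ioremap(FOO_PHYS_BASE, FOO_SIZE);
	if (!dmabuf) {
		release_mem_region(FOO_PHYS_BASE, FOO_SIZE);
		return -ENOMEM;
	}
	return 0;
}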
This is what the system looks like after I've done all that:
# cat /proc/cmdline
root=/dev/mtdblock2 console=ttyS0,115200 init=/sbin/preinit earlyprintk debug mem=255M memmap=1M$255M
# cat /proc/iomem
08000000-0fffffff : PCIe Outbound Window, Port 0
08000000-082fffff : PCI Bus 0001:01
08000000-081fffff : 0001:01:00.0
08200000-08207fff : 0001:01:00.0
18000300-18000307 : serial
18000400-18000407 : serial
1800c000-1800cfff : dmu_regs
18012000-18012fff : pcie0
18013000-18013fff : pcie1
18014000-18014fff : pcie2
19000000-19000fff : cru_regs
1e000000-1fffffff : norflash
40000000-47ffffff : PCIe Outbound Window, Port 1
40000000-403fffff : PCI Bus 0002:01
40000000-403fffff : 0002:01:00.0
40400000-409fffff : PCI Bus 0002:01
40400000-407fffff : 0002:01:00.0
40800000-40807fff : 0002:01:00.0
80000000-8fefffff : System RAM
80052000-8045dfff : Kernel text
80478000-80500143 : Kernel data
8ff00000-8fffffff : foo
Everything so far looks good, and my driver is working perfectly. I'm able to read and write directly to the specific physical address I've chosen.
However, during bootup, a big scary warning (™) was triggered:
BUG: Your driver calls ioremap() on system memory. This leads
to architecturally unpredictable behaviour on ARMv6+, and ioremap()
will fail in the next kernel release. Please fix your driver.
------------[ cut here ]------------
WARNING: at arch/arm/mm/ioremap.c:211 __arm_ioremap_pfn_caller+0x8c/0x144()
Modules linked in:
[] (unwind_backtrace+0x0/0xf8) from [] (warn_slowpath_common+0x4c/0x64)
[] (warn_slowpath_common+0x4c/0x64) from [] (warn_slowpath_null+0x1c/0x24)
[] (warn_slowpath_null+0x1c/0x24) from [] (__arm_ioremap_pfn_caller+0x8c/0x144)
[] (__arm_ioremap_pfn_caller+0x8c/0x144) from [] (__arm_ioremap_caller+0x50/0x58)
[] (__arm_ioremap_caller+0x50/0x58) from [] (foo_init+0x204/0x2b0)
[] (foo_init+0x204/0x2b0) from [] (do_one_initcall+0x30/0x19c)
[] (do_one_initcall+0x30/0x19c) from [] (kernel_init+0x154/0x218)
[] (kernel_init+0x154/0x218) from [] (kernel_thread_exit+0x0/0x8)
---[ end trace 1a4cab5dbc05c3e7 ]---
Triggered from arch/arm/mm/ioremap.c:
/*
 * Don't allow RAM to be mapped - this causes problems with ARMv6+
 */
if (pfn_valid(pfn)) {
	printk(KERN_WARNING "BUG: Your driver calls ioremap() on system memory. This leads\n"
	       KERN_WARNING "to architecturally unpredictable behaviour on ARMv6+, and ioremap()\n"
	       KERN_WARNING "will fail in the next kernel release. Please fix your driver.\n");
	WARN_ON(1);
}
What problems, exactly, could this cause? Can they be mitigated? What are my alternatives?
So I've done exactly that, and it's working.
Provide the kernel command line (e.g. /proc/cmdline) and the resulting memory map (i.e. /proc/iomem) to verify this.
What problems, exactly, could this cause?
The problem with using ioremap() on system memory is that you end up assigning conflicting attributes to the memory which causes "unpredictable" behavior.
See the article "ARM's multiply-mapped memory mess", which provides a history to the warning that you are triggering.
The ARM kernel maps RAM as normal memory with writeback caching; it's also marked non-shared on uniprocessor systems. The ioremap() system call, used to map I/O memory for CPU use, is different: that memory is mapped as device memory, uncached, and, maybe, shared. These different mappings give the expected behavior for both types of memory. Where things get tricky is when somebody calls ioremap() to create a new mapping for system RAM.
The problem with these multiple mappings is that they will have differing attributes. As of version 6 of the ARM architecture, the specified behavior in that situation is "unpredictable."
Note that "system memory" is the RAM that is managed by the kernel.
The fact that you trigger the warning indicates that your code is generating multiple mappings for a region of memory.
Can they be mitigated?
You have to ensure that the RAM you want to ioremap() is not "system memory", i.e. managed by the kernel.
See also this answer.
ADDENDUM
This warning that concerns you is the result of pfn_valid(pfn) returning TRUE rather than FALSE.
Based on the Linux cross-reference link that you provided for version 2.6.37,
pfn_valid() is simply returning the result of
memblock_is_memory(pfn << PAGE_SHIFT);
which in turn is simply returning the result of
memblock_search(&memblock.memory, addr) != -1;
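In code form, that chain looks roughly like this (paraphrased from the 2.6.37 sources; annotations trimmed):

/* arch/arm/mm/init.c */
int pfn_valid(unsigned long pfn)
{
	return memblock_is_memory(pfn << PAGE_SHIFT);
}

/* mm/memblock.c */
int __init_memblock memblock_is_memory(phys_addr_t addr)
{
	return memblock_search(&memblock.memory, addr) != -1;
}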
I suggest that the kernel code be hacked so that the conflict is revealed.
Before the call to ioremap(), assign TRUE to the global variable memblock_debug.
The following patch should display the salient information about the memory conflict.
(The memblock list is ordered by base-address, so memblock_search() performs a binary search on this list, hence the use of mid as the index.)
static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
{
	unsigned int left = 0, right = type->cnt;

	do {
		unsigned int mid = (right + left) / 2;

		if (addr < type->regions[mid].base)
			right = mid;
		else if (addr >= (type->regions[mid].base +
				  type->regions[mid].size))
			left = mid + 1;
-		else
+		else {
+			if (memblock_debug)
+				pr_info("MATCH for 0x%x: m=0x%x b=0x%x s=0x%x\n",
+					addr, mid,
+					type->regions[mid].base,
+					type->regions[mid].size);
			return mid;
+		}
	} while (left < right);
	return -1;
}
If you want to see all the memory blocks, then call memblock_dump_all() with the variable memblock_debug set to TRUE.
[Interesting that this is essentially a programming question, yet we haven't seen your actual driver code.]
ADDENDUM 2
Since you're probably using ATAGs (instead of Device Tree), and you want to dedicate a memory region, fix up the ATAG_MEM to reflect this smaller size of physical memory.
Assuming you have made zero changes to your boot code, the ATAG_MEM is still specifying the full RAM, so perhaps this could be the source of the system memory conflict that causes the warning.
See this answer about ATAGs and this related answer.
I am performing research in the field of random number generation and I need to demonstrate the "boot-time entropy hole" from the well-known "P's and Q's" paper (here). We will be spooling up two copies of the same minimal Linux virtual machine at the same time and we are expecting their /dev/urandom values to be the same at some early point in the boot process.
However, I have been unable to read /dev/urandom early enough in the boot process to spot the issue. We need to read it earlier in the boot process.
How can I get the earliest possible values of /dev/urandom? We likely will need to modify the kernel, but we have very little experience there, and need some pointers. Or, if there's a kernel-instrumenting tool available that could do it without re-compiling a kernel, that would be great, too.
Thanks in advance!
urandom is provided via a device driver, and the first thing the kernel does with that driver is to call its init function.
If you take a look here: http://lxr.free-electrons.com/source/drivers/char/random.c#L1401
/*
 * Note that setup_arch() may call add_device_randomness()
 * long before we get here. This allows seeding of the pools
 * with some platform dependent data very early in the boot
 * process. But it limits our options here. We must use
 * statically allocated structures that already have all
 * initializations complete at compile time. We should also
 * take care not to overwrite the precious per platform data
 * we were given.
 */
static int rand_initialize(void)
{
	init_std_data(&input_pool);
	init_std_data(&blocking_pool);
	init_std_data(&nonblocking_pool);
	return 0;
}
early_initcall(rand_initialize);
So, the init function for this driver is rand_initialize(). However, note that the comment says setup_arch() may call add_device_randomness() before this device is even initialized. Calling that function does not add any actual entropy, though (it feeds the pool with things like MAC addresses, so if you have two exactly identical VMs, you're good there). From the comment:
* add_device_randomness() is for adding data to the random pool that
* is likely to differ between two devices (or possibly even per boot).
* This would be things like MAC addresses or serial numbers, or the
* read-out of the RTC. This does *not* add any actual entropy to the
* pool, but it initializes the pool to different values for devices
* that might otherwise be identical and have very little entropy
* available to them (particularly common in the embedded world).
Also, note that the entropy pool is saved on shutdown and restored at boot time via an init script (on my Ubuntu 14.04, it's /etc/init.d/urandom), so you might want to call your script from that init script before
(
	date +%s.%N

	# Load and then save $POOLBYTES bytes,
	# which is the size of the entropy pool
	if [ -f "$SAVEDFILE" ]
	then
		cat "$SAVEDFILE"
	fi
	# Redirect output of subshell (not individual commands)
	# to cope with a misfeature in the FreeBSD (not Linux)
	# /dev/random, where every superuser write/close causes
	# an explicit reseed of the yarrow.
) >/dev/urandom
or a similar call is made.
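If you do end up modifying the kernel, one way to capture the very earliest output is a built-in initcall that dumps a few bytes right after the pools are set up. A hedged sketch (the name is a placeholder; it must be compiled into the kernel rather than built as a module, and since initcalls at the same level run in link order it should be linked after drivers/char/random.o):

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/random.h>

static int __init dump_early_urandom(void)
{
	unsigned char buf[16];

	/* Draws from the same nonblocking pool that backs /dev/urandom. */
	get_random_bytes(buf, sizeof(buf));
	print_hex_dump(KERN_INFO, "early urandom: ", DUMP_PREFIX_NONE,
		       16, 1, buf, sizeof(buf), false);
	return 0;
}
early_initcall(dump_early_urandom);

Comparing this dmesg line across your two identical VMs should show whether the pools are still in the same state at that point.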
I'm trying to parametrize the rocket core by changing the configuration in PublicConfig.scala.
However, when I change XprLen and L1D_SETS to 32, I have a compilation problem.
What is the proper way to generate a 32-bit datapath with the Rocket Chip Generator, if that is possible?
The Rocket-chip does not currently support generating a 32b processor.
While the required changes to the datapath would be minimal, the host-target interface for communicating to the front-end server (as Rocket currently only runs in a tethered mode) has only been spec'ed out for 64b cores.
Also, L1D_SETS is the number of "sets" in the L1 data cache (such that L1D_WAYS * L1D_SETS * 64 bytes per line is the total cache capacity in bytes).
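For example (hypothetical numbers, not from the answer): with L1D_WAYS = 4 and L1D_SETS = 64, the capacity is 4 * 64 * 64 B = 16 KiB.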
Over the last couple of days, I've been trying to implement a simple interrupt handler in C. So far so good; I think I've achieved my initial task. My ultimate goal is to inject some faults into the kernel by bit-flipping some CPU registers. After reading this, this and this, I chose IRQ number 10, since it seems to be a free, "open" interrupt. My question is: how can I know whether this is the "best" IRQ on which to hook my interrupt handler? How can I decide on this?
Thanks and all the best,
João
P.S.: As far as I know, there are 16 IRQ lines, but Linux has 256 IDT entries, as clearly stated in /include/asm/irq_vectors.h (please see below). This also puzzles me.
/*
* Linux IRQ vector layout.
*
* There are 256 IDT entries (per CPU - each entry is 8 bytes) which can
* be defined by Linux. They are used as a jump table by the CPU when a
* given vector is triggered - by a CPU-external, CPU-internal or
* software-triggered event.
*
* Linux sets the kernel code address each entry jumps to early during
* bootup, and never changes them. This is the general layout of the
* IDT entries:
*
* Vectors 0 ... 31 : system traps and exceptions - hardcoded events
* Vectors 32 ... 127 : device interrupts
* Vector 128 : legacy int80 syscall interface
* Vectors 129 ... 237 : device interrupts
* Vectors 238 ... 255 : special interrupts
*
* 64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table.
*
* This file enumerates the exact layout of them:
*/
On the mainboard we have an interrupt controller (IRC) which acts as a multiplexer between the devices that can raise an interrupt and the CPU:
          |-----------|       |--------|
-(0)------|           |       |        |
-(...)----|    IRC    |-------|  CPU   |
-(15)-----|           |       |        |
          |-----------|       |--------|
Every device is associated with an IRQ (the number on the left). After every instruction the CPU senses the interrupt-request line. If a signal is detected, a state save is performed and the CPU loads an interrupt handler routine, which can be found in the interrupt vector located at a fixed address in memory. As far as I can see, the number of the IRQ and the vector number in the interrupt vector are not the same, because I have, for example, my network card registered to IRQ 8. On an Intel Pentium processor that vector would point to a routine used to signal an error condition, so there must be a mapping somewhere which points to the correct handler.
Questions:
1) If I write a device driver and register an IRQ X for it, how does the system know which device should be handled? I can, for example, use request_irq() with IRQ number 10, but how does the system know whether the handler should be used for the mouse, the keyboard, or whatever else I write the driver for?
2) What does the interrupt vector look like then? I mean, if I use IRQ 10 for my device, wouldn't this overwrite a standard handler used for error handling in the table (the first usable entry is 32 according to Silberschatz, Operating System Concepts)?
3) Who initially sets up the IRQs? The BIOS? The OS?
4) Who is responsible for matching an IRQ to its offset in the interrupt vector?
5) It is possible to share IRQs. How is that possible? There are hardware lines on the mainboard which connect devices to the interrupt controller. How can two lines be configured to raise the same interrupt? There must be a table which says, e.g., that lines 2 and 3 map to IRQ 15. Where does this table reside, and what is it called?
Answers are with respect to the Linux kernel; they should apply to most other OSes as well.
1) If I write a device driver and register an IRQ X for it, how does the system know which device should be handled? I can, for example, use request_irq() with IRQ number 10, but how does the system know whether the handler should be used for the mouse, the keyboard, or whatever else I write the driver for?
There is no single answer to this. For example, if this is a custom embedded system, the hardware designer will tell the driver writer, "I am going to route device X to IRQ Y." For more flexibility, for example with a network card that generally uses the PCI protocol, there is hardware/firmware-level arbitration that assigns an IRQ number to a new device when it is detected. That number is then written to one of the PCI configuration registers. The driver first reads this register from the device and then registers its interrupt handler for that particular IRQ. There are similar mechanisms for other protocols.
What you can do is look up calls to request_irq() in the kernel code and see how each driver obtained its IRQ value. It will be different for each kind of driver.
The answer to this question is thus: the system doesn't know by itself. The hardware designer or the hardware protocol provides this information to the driver writer, and the driver writer then registers a handler for that particular IRQ, telling the system what to do when that IRQ fires.
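For the PCI case mentioned above, the flow looks roughly like this (a hedged sketch with placeholder names, not from any particular driver): the PCI core has already placed the assigned interrupt number in pdev->irq, based on the device's configuration space, and the driver simply registers its handler against that number in its probe routine.

#include <linux/pci.h>
#include <linux/interrupt.h>

static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
	/* check the device's status register and service it here */
	return IRQ_HANDLED;
}

static int my_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int err = pci_enable_device(pdev);

	if (err)
		return err;

	/* pdev->irq was filled in by the PCI core from configuration space. */
	err = request_irq(pdev->irq, my_irq_handler, IRQF_SHARED,
			  "my_pci_dev", pdev);
	if (err)
		pci_disable_device(pdev);
	return err;
}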
2) What does the interrupt vector look like then? I mean, if I use IRQ 10 for my device, wouldn't this overwrite a standard handler used for error handling in the table (the first usable entry is 32 according to Silberschatz, Operating System Concepts)?
Good question. There are two parts to it.
a) When you call request_irq(irq, handler), the system doesn't really program entry irq in the IVT or IDT, but entry N + irq, where N is the number of trap/exception vectors reserved by that CPU. The details vary from system to system.
b) What happens if you erroneously request an IRQ which is already used by another driver? You get an error, and the IDT is not programmed with your handler.
Note: IDT is the Interrupt Descriptor Table.
3) Who initially sets up the IRQs? The BIOS? The OS?
The BIOS first, and then the OS. There are certain OSes, for example MS-DOS, which do not reprogram the IVT set up by the BIOS. More sophisticated modern OSes like Windows or Linux do not want to rely on particular BIOS functions, so they re-program the IDT. But the BIOS has to do it initially; only then does the OS come into the picture.
4) Who is responsible for matching an IRQ to its offset in the interrupt vector?
I am not really clear on what you mean. The flow is like this: first your device is assigned an IRQ number, and then you register a handler for it using that IRQ number. If you use the wrong IRQ number and then enable interrupts on your device, the system will crash, because the handler is registered for the wrong IRQ number.
5) It is possible to share IRQs. How is that possible? There are hardware lines on the mainboard which connect devices to the interrupt controller. How can two lines be configured to raise the same interrupt? There must be a table which says, e.g., that lines 2 and 3 map to IRQ 15. Where does this table reside, and what is it called?
This is a very good question. An extra table is not how it is solved in the kernel. Rather, for each shared IRQ the handlers are kept in a linked list of function pointers; the kernel loops through all the handlers and invokes them one after another until one of them claims the interrupt as its own.
The code looks like this:
driver1:

d1_int_handler:
	if (device_interrupted())	<------------- This reads the hardware
	{
		do_interrupt_handling();
		return MY_INTERRUPT;
	} else {
		return NOT_MY_INTERRUPT;
	}

driver2:
	Similar to driver 1

kernel:

do_irq(irq n)
{
	if (shared_irq(n))
	{
		irq_chain = get_chain(n);
		while (irq_chain)
		{
			if ((ret = irq_chain->handler()) == MY_INTERRUPT)
				break;
			irq_chain = irq_chain->next;
		}
		if (ret != MY_INTERRUPT)
			error "None of the drivers accepted the interrupt";
	}
}
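With the real Linux API, the same idea looks roughly like this (a hedged sketch; the device structure and status-register check are placeholders): every driver on the shared line registers with IRQF_SHARED, and its handler returns IRQ_NONE when its own device did not raise the interrupt, so the kernel moves on to the next handler in the chain.

#include <linux/interrupt.h>
#include <linux/io.h>

struct foo_dev {
	void __iomem *regs;
};

#define FOO_IRQ_STATUS	0x04	/* hypothetical status register offset */

static irqreturn_t foo_handler(int irq, void *dev_id)
{
	struct foo_dev *foo = dev_id;

	if (!(readl(foo->regs + FOO_IRQ_STATUS) & 0x1))
		return IRQ_NONE;	/* not ours: let the next handler in the chain try */

	/* ... acknowledge and service the interrupt ... */
	return IRQ_HANDLED;		/* claimed: the walk over the chain stops */
}

/* Registration, typically from probe():
 *	request_irq(irq, foo_handler, IRQF_SHARED, "foo", foo);
 */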