Linux device driver - memory mapped I/O example discussion - linux

I have gone through the following topic and I still have some questions.
ioread32 followed by iowrite32 not giving same value
In the link, where can I get my base which is defined as 0xfed00000
in the post ?
what should I put for the second parameter in
void request_mem_region(unsigned long start, unsigned long len,char *name);
what should I put for the second parameter in
void *ioremap_nocache(unsigned long phys_addr, unsigned long size);
By having the Makefile and generating the kernel module, I should use the insmod and then dmesg to check if the code works as I expect, is this correct ?
In the case, should I add iounmap(virtual_base); before return 0; in the source ?
Thanks

In the link, where can I get my base which is defined as 0xfed00000 in the post ?
It's the base (physical) address of the peripheral's registers.
If the peripheral is a discrete chip on the board, then consult the board documentation.
If the peripheral is embedded in a SoC, then consult the memory map in the SoC datasheet.
what should I put for the second parameter in
void request_mem_region(unsigned long start, unsigned long len,char *name);
what should I put for the second parameter in
void *ioremap_nocache(unsigned long phys_addr, unsigned long size);
These two routines should be called with the same first and second parameters.
The length/size is the number of bytes the peripheral's register set occupies.
Sometimes the entire memory region to the next peripheral is specified.
By having the Makefile and generating the kernel module, I should use the insmod and then dmesg to check if the code works as I expect, is this correct ?
A judicious sprinkling of printk() statements is the tried & true method of testing a Linux kernel driver/module.
Unix has kdb.
In the case, should I add iounmap(virtual_base); before return 0; in the source ?
Do not copy that poorly written example of init code.
If ioremap() is performed in a driver's probe() (or other initialization) routine, then the iounmap() should be in the probe's error exit sequence and in the driver's remove() (or the complementary to init) routine.
There are numerous examples to study in the Linux kernel source. Use an online Linux cross reference such as http://lxr.free-electrons.com/source/
Note that almost all Linux drivers use iounmap() two or more times.

Related

Getting ENOTTY on ioctl for a Linux Kernel Module

I have the following chardev defined:
.h
#define MAJOR_NUM 245
#define MINOR_NUM 0
#define IOCTL_MY_DEV1 _IOW(MAJOR_NUM, 0, unsigned long)
#define IOCTL_MY_DEV2 _IOW(MAJOR_NUM, 1, unsigned long)
#define IOCTL_MY_DEV3 _IOW(MAJOR_NUM, 2, unsigned long)
module .c
static long device_ioctl(
struct file* file,
unsigned int ioctl_num,
unsigned long ioctl_param)
{
...
}
static int device_open(struct inode* inode, struct file* file)
{
...
}
static int device_release(struct inode* inode, struct file* file)
{
...
}
struct file_operations Fops = {
.open=device_open,
.unlocked_ioctl= device_ioctl,
.release=device_release
};
static int __init my_dev_init(void)
{
register_chrdev(MAJOR_NUM, "MY_DEV", &Fops);
...
}
module_init(my_dev_init);
My user code
ioctl(fd, IOCTL_MY_DEV1, 1);
Always fails with same error: ENOTTY
Inappropriate ioctl for device
I've seen similar questions:
i.e
Linux kernel module - IOCTL usage returns ENOTTY
Linux Kernel Module/IOCTL: inappropriate ioctl for device
But their solutions didn't work for me
ENOTTY is issued by the kernel when your device driver has not registered a ioctl function to be called. I'm afraid your function is not well registered, probably because you have registered it in the .unlocked_ioctl field of the struct file_operations structure.
Probably you'll get a different result if you register it in the locked version of the function. The most probable cause is that the inode is locked for the ioctl call (as it should be, to avoid race conditions with simultaneous read or write operations to the same device)
Sorry, I have no access to the linux source tree for the proper name of the field to use, but for sure you'll be able to find it yourself.
NOTE
I observe that you have used macro _IOW, using the major number as the unique identifier. This is probably not what you want. First parameter for _IOW tries to ensure that ioctl calls get unique identifiers. There's no general way to acquire such identifiers, as this is an interface contract you create between application code and kernel code. So using the major number is bad practice, for two reasons:
Several devices (in linux, at least) can share the same major number (minor allocation in linux kernel allows this) making it possible for a clash between devices' ioctls.
In case you change the major number (you configure a kernel where that number is already allocated) you have to recompile all your user level software to cope with the new device ioctl ids (all of them change if you do this)
_IOW is a macro built a long time ago (long ago from the birth of linux kernel) that tried to solve this problem, by allowing you to select a different character for each driver (but not dependant of other kernel parameters, for the reasons pointed above) for a device having ioctl calls not clashing with another device driver's. The probability of such a clash is low, but when it happens you can lead to an incorrect machine state (you have issued a valid, working ioctl call to the wrong device)
Ancient unix (and early linux) kernels used different chars to build these calls, so, for example, tty driver used 'T' as parameter for the _IO* macros, scsi disks used 'S', etc.
I suggest you to select a random number (not appearing elsewhere in the linux kernel listings) and then use it in all your devices (probably there will be less drivers you write than drivers in the kernel) and select a different ioctl id for each ioctl call. Maintaining a local ioctl file with the registered ioctls this way is far better than trying to guess a value that works always.
Also, a look at the definition of the _IO* macros should be very illustrative :)

Accessing memory pointers in hardware registers

I'm working on enhancing the stock ahci driver provided in Linux in order to perform some needed tasks. I'm at the point of attempting to issue commands to the AHCI HBA for the hard drive to process. However, whenever I do so, my system locks up and reboots. Trying to explain the process of issuing a command to an AHCI drive is far to much for this question. If needed, reference this link for the full discussion (the process is rather defined all over because there are several pieces, however, ch 4 has the data structures necessary).
Essentially, one writes the appropriate structures into memory regions defined by either the BIOS or the OS. The first memory region I should write to is the Command List Base Address contained in the register PxCLB (and PxCLBU if 64-bit addressing applies). My system is 64 bits and so I'm trying to getting both 32-bit registers. My code is essentially this:
void __iomem * pbase = ahci_port_base(ap);
u32 __iomem *temp = (u32*)(pbase + PORT_LST_ADDR);
struct ahci_cmd_hdr *cmd_hdr = NULL;
cmd_hdr = (struct ahci_cmd_hdr*)(u64)
((u64)(*(temp + PORT_LST_ADDR_HI)) << 32 | *temp);
pr_info("%s:%d cmd_list is %p\n", __func__, __LINE__, cmd_hdr);
// problems with this next line, makes the system reboot
//pr_info("%s:%d cl[0]:0x%08x\n", __func__, __LINE__, cmd_hdr->opts);
The function ahci_port_base() is found in the ahci driver (at least it is for CentOS 6.x). Basically, it returns the proper address for that port in the AHCI memory region. PORT_LST_ADDR and PORT_LST_ADDR_HI are both macros defined in that driver. The address that I get after getting both the high and low addresses is usually something like 0x0000000037900000. Is this memory address in a space that I cannot simply dereference it?
I'm hitting my head against the wall at this point because this link shows that accessing it in this manner is essentially how it's done.
The address that I get after getting both the high and low addresses
is usually something like 0x0000000037900000. Is this memory address
in a space that I cannot simply dereference it?
Yes, you are correct - that's a bus address, and you can't just dereference it because paging is enabled. (You shouldn't be just dereferencing the iomapped addresses either - you should be using readl() / writel() for those, but the breakage here is more subtle).
It looks like the right way to access the ahci_cmd_hdr in that driver is:
struct ahci_port_priv *pp = ap->private_data;
cmd_hdr = pp->cmd_slot;

How do I use performance counters inside of the kernel?

I want to access performance counters inside the kernel. I found many ways to use performance counters in user space, but can you tell me some way to use those in kernel space.
Please don't specify tool name, I want to write my own code, preferably a kernel module. I am using Ubuntu with kernel 3.18.1.
http://www.cise.ufl.edu/~sb3/files/pmc.pdf
http://www.cs.inf.ethz.ch/stricker/lab/doc/intel-part4.pdf
The first pdf contains description on how to use pmc.
The second contains the address of perfeventsel0 and perfeventsel1.
Ive shown an example below.U'll need to set the event number and umask as per ur requirement.
void SetUpEvent(void){
int reg_addr=0x186;
int event_no=0x0024;
int umask=0x3F00;
int enable_bits=0x430000;
int event=event_no | umask | enable_bits;
__asm__ ("wrmsr" : : "c"(reg_addr), "a"(event), "d"(0x00));
}
/* Read the performance monitor counter */
long int ReadCounter(void){
long int count;
long int eax_low, edx_high;
int reg_addr=0xC1;
__asm__("rdmsr" : "=a"(eax_low), "=d"(edx_high) : "c"(reg_addr));
count = ((long int)eax_low | (long int)edx_high<<32);
return count;
}
You should check if you CPU and other HW support you needs. Try look into oprofile source code. It have kernel module and userspace api. You can for example cut part of interesting code from oprofile kernel module part and use it into you module. I gues you module should have several reader or listeners with circle buffers for events keeping. You can also look inside linux/drivers/oprofile and to correspond linux/arch/.../oprofile. Inside make menuconfig you can config it like module or build-in and add additional timers. Available events and counters you can find under oprofile/events/ of oprofile tool (TLB_MISS, CPU_CYCLES, CYCLES_DATA_STALL, ...).
ARM Performance monitoring register
Under linux/arch/arm64/kernel/perf_regs.c you can find arm specific details.

alloc_pages() is paired by __free_pages()

I read the book "Linux Kernel Development", and find some functions that make me confused, listed as bellow:
struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
void __free_pages(struct page *page, unsigned int order)
unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
void free_pages(unsigned long addr, unsigned int order)
The problem is the use of the two underline in the function name, and how the function pairs.
1. when will the linux kernel uses two underline in its function name?
2. why alloc_pages is paired with __free_pages, but not free_pages?
As you can notice:
alloc_pages() / __free_pages() takes "page *" (page descriptor) as argument.
They are ususally used internally by some infrastrcture kernel code, like page fault handler, which wish to manipulate page descriptor instead of memory block content.
__get_free_pages() / free_pages() takes "unsigned long" (virtual address of memory block) as argument
They could be used by code which wish to use the memory block itself, after allocation, you can read / write to this memory block.
As for their name and double underscore "__", you don't need to bother too much. Sometimes kernel functions were named casually without too much consideration when they were first written. And when people think of that the names are not proper, but later those functions are already used wildly in kernel, and kernel guys are simply lazy to change them.

mprotect() like functionality within Linux kernel

I am in a Linux kernel module, and I allocate some memory with, say, vmalloc(). I want to make the memory have read, write, and execute permission. What is the clean and appropriate way of doing that? Basically, this is generally the equivalent of calling mprotect(), but in kernel space.
If I do the page walk, pgd_offset(), pud_offset(), pmd_offset(), pte_offset_map(), and then pte_mkwrite(), I run into linking errors when I tried it on 2.6.39. Also, it seems that if I am doing the page walk, it is a hack, and there ought to be a cleaner and more appropriate method.
My kernel module will be a loadable module, so internal symbols are not available to me.
Thanks, in advance, for your guidance.
There is a good answer to this question here: https://unix.stackexchange.com/questions/450557/is-there-any-function-analogous-to-mprotect-in-the-linux-kernel.
asm-generic/set_memory.h:int set_memory_ro(unsigned long addr, int numpages);
asm-generic/set_memory.h:int set_memory_rw(unsigned long addr, int numpages);
asm-generic/set_memory.h:int set_memory_x(unsigned long addr, int numpages);
asm-generic/set_memory.h:int set_memory_nx(unsigned long addr, int numpages);
they are defined here: https://elixir.bootlin.com/linux/v4.3/source/arch/x86/include/asm/cacheflush.h#L47
Have you tried by invoking do_mprotect() [kernel function corresponding to mprotect()] directly ?

Resources