mmap in linux-kernel with vm_insert_page() - linux

I am trying to understand mmap() functionality which can map page-by-page to the user space address.I am mapping these pages using the vm_insert_page(). This is returning with out any error. But when from the user space tried to write into the mapped memory, i'm getting the null pointer dereference. I am posting the code below. The system is x86,32 bit system.
The mmap() anf fault() function is posted.
static int mmap(struct file *filp , struct vm_area_struct *vma)
{
vma->vm_ops = &vm_ops;
vma->vm_flags |= VM_MIXEDMAP;
vm_open(vma);
return 0;
}
static int vm_fault(struct vm_area_struct *vma,struct vm_fault *vmf)
{
struct page *page=NULL;
page = alloc_page(GFP_HIGHUSER);//Both the alloc_pages() are working fine.
if(!page)
{
printk("The allocation of pagee memory is failed\n");
return -VM_FAULT_OOM;
}
// get_page(page);
if(vm_insert_page(vma,vmf->virtual_address,(page)))
printk("Thepage table creation is failed\n");
printk("The vm_insert of page is done\n");
return 0;
}
Thanks in advance.

Related

Could I/O memory access be used inside ISR under Linux (ARM)?

I'm creating driver for communication with FPGA under Linux. FPGA is connected via GPMC interface. When I tested read/write from driver context - everithing works perfectly. But the problem is that I need to read some address on interrupt. So I created interrupt handler, registred it and put iomemory reading in it (readw function). But when interrupt is fired - only zero's are readed. I tested every part of driver from the top to the bottom and it seems like the problem is in iomemory access inside ISR. When I replaced io access with constant value - it successfully passed to user-level application.
ARM version: armv7a (Cortex ARM-A8 (DM3730))
Compiler: CodeSourcery 2014.05
Here is some code from driver which represents performed actions:
// Request physical memory region for FPGA address IO
void* uni_PhysMem_request(const unsigned long addr, const unsigned long size) {
// Handle to be returned
void* handle = NULL;
// Check if memory region successfully requested (mapped to module)
if (!request_mem_region(addr, size, moduleName)) {
printk(KERN_ERR "\t\t\t\t%s() failed to request_mem_region(0x%p, %lu)\n", __func__, (void*)addr, size);
}
// Remap physical memory
if (!(handle = ioremap(addr, size))) {
printk(KERN_ERR "\t\t\t\t%s() failed to ioremap(0x%p, %lu)\n", __func__, (void*)addr, size);
}
// Return virtual address;
return handle;
}
// ...
// ISR
static irqreturn_t uni_IRQ_handler(int irq, void *dev_id) {
size_t readed = 0;
if (irq == irqNumber) {
printk(KERN_DEBUG "\t\t\t\tIRQ handling...\n");
printk(KERN_DEBUG "\t\t\t\tGPIO %d pin is %s\n", irqGPIOPin, ((gpio_get_value(irqGPIOPin) == 0) ? "LOW" : "HIGH"));
// gUniAddr is a struct which holds GPMC remapped virtual address (from uni_PhysMem_request), offset and read size
if ((readed = uni_ReadBuffer_IRQ(gUniAddr.gpmc.addr, gUniAddr.gpmc.offset, gUniAddr.size)) < 0) {
printk(KERN_ERR "\t\t\t\tunable to read data\n");
}
else {
printk(KERN_INFO "\t\t\t\tdata readed success (%zu bytes)\n", readed);
}
}
return IRQ_HANDLED;
}
// ...
// Read buffer by IRQ
ssize_t uni_ReadBuffer_IRQ(void* physAddr, unsigned long physOffset, size_t buffSize) {
size_t size = 0;
size_t i;
for (i = 0; i < buffSize; i += 2) {
size += uni_RB_write(readw(physAddr + physOffset)); // Here readed value sent to ring buffer. When "readw" replaced with any constant - everything OK
}
return size;
}
Looks like the problem was in code optimizations. I changed uni_RB_write function to pass physical address and data size, also read now performed via ioread16_rep function. So now everything works just fine.

How to dereference device_private in struct device

I'm working on a driver in Linux. I'm working on getting some /sys file attributes in place that will make things nicer. In delivering what these attributes are to tell, the attribute functions must have access to some data that's stored by the driver. Because of how things appear to be made and stored, I thought I could use the device_private *p member of the struct device that comes from device_create(). Basically, it's like this:
for (i = 0; i < total; i++) {
pDevice = device_create(ahcip_class, NULL, /*no parent*/
MKDEV(AHCIP_MAJOR, AHCIP_MINOR + i), NULL, /*no additional info*/
DRIVER_NAME "%d", AHCIP_MINOR + i);
if (IS_ERR(pDevice)) {
ret = PTR_ERR(pDevice);
printk(KERN_ERR "%s:%d device_create failed AHCIP_MINOR %d\n",
__func__, __LINE__, (AHCIP_MINOR + i));
break;
}
mydevs[i].psysfs_dev = pDevice;
ret = sysfs_create_group(&pDevice->kobj, &attr_group);
if (!ret) {
pr_err("%s:%d failed in making the device attributes\n",
__func__, __LINE__);
goto build_udev_quick_out;
}
}
This doesn't yet show the assignment into the device_private pointer, but that's where I'm headed. Each new device made under this class will need the same attributes thus the group. Here's my single attribute that I'm starting with for "proof of concept"
static ahcip_dev *get_ahcip_dev(struct kobject *ko)
{
ahcip_dev *adev = NULL;
struct device *pdev = container_of(ko, struct device, kobj);
if (!pdev) {
pr_err("%s:%d unable to find device struct in kobject\n",
__func__, __LINE__);
return NULL;
}
/* **** problem dereferencing p **** */
adev = (ahcip_dev*)pdev->p->driver_data;
/* return the pointer anyway, but if it's null, print to klog */
if (!adev)
pr_err("%s:%d no ahcip_dev, private driver data is NULL\n",
__func__, __LINE__);
/* **** again problem dereferencing p **** */
return pdev->p->(ahcip_dev*)driver_data; // <--- problem here
}
static ssize_t pxis_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buff)
{
u32 pi = 0;
ahcip_dev *adev = get_ahcip_dev(kobj);
/* get_ahcip_dev() will print what happened, this needs to return
* error code
*/
if (!adev)
return -EIO;
pi = adev->port_index;
return sprintf(buff, "%08x\n", get_port_reg(adev->hba->ports[pi], 0x10));
}
I figured that, since device_create() returns a struct device* and I'm using that to make the device group, the struct kobject* that is coming into pxis_show is the member of the device structure made by device_create. If this is true, then I should be able to stuff some private data into that object and use it when the /sys files are accessed. However, when the lines of code marked above dereference the p member I get dereferencing pointer of incomplete type from gcc. I've determined that it's the struct device_private member of struct device that is incomplete but why? Is there a different structure I should be using? This seems to be something truly internally by the kernel.
For assign private data for device, you need to use void *drvdata parameter to device_create(). After creation, data can be accessed via dev_get_drvdata(pdev).
struct device_private is internal for device implementation. From description of this structure (drivers/base/base.h):
Nothing outside of the driver core should ever touch these fields.

flush_cache_range() and flush_tlb_range() do not seem to work

Here is what I did:
A user space process uses malloc() to allocate memory on the heap and fills it with a specific pattern of characters and then spells out the address returned by the malloc().
The process id and the address of the memory chunk are passed to a kernel module that looks like this:
int init_module(void) {
int res = 0;
struct page *data_page;
struct task_struct *task = NULL;
struct vm_area_struct *next_vma;
struct mm_struct *mm;
task = pid_task(find_vpid(pid), PIDTYPE_PID);
if (pid != -1)
target_process_id = pid;
if (!task) {
printk("Could not find the task struct for process id %d\n", pid);
return 0;
} else {
printk("Found the task <%s>\n", task->comm);
}
mm = task->mm;
if (!mm) {
printk("Could not find the mmap struct for process id %d\n", pid);
return 0;
}
next_vma = find_vma(mm, addr);
down_read(&task->mm->mmap_sem);
res = get_user_pages(task, task->mm, addr, 1, 1, 1, &data_page, NULL);
if (res != 1) {
printk(KERN_INFO "get_user_pages error\n");
up_read(&task->mm->mmap_sem);
return 0;
} else {
printk("Found vma struct and it starts at: %lu\n", next_vma->vm_start);
}
flush_cache_range(next_vma,next_vma->vm_start,next_vma->vm_end);
flush_tlb_range(next_vma,next_vma->vm_start,next_vma->vm_end);
up_read(&task->mm->mmap_sem);
return 0;
}
I added printk() statement to the handle_mm_fault() function in the Linux kernel to track page faults caused by target_process_id (3rd line of code after variable definitions above). Something like this:
if (unlikely(current->pid == target_process_id))
printk("Target process <%d> generated a page fault at address %lu\n", current->pid, address);
Now, what I noticed is that the last printk() statement does not catch anything.
The function init_module is the initialization function for a kernel module. It is inserted into the running kernel using insmod...using the command insmod module.ko pid=<processId> addr=<address>
Any idea what might going wrong?

LInux Kernel API to find the vma corresponds to a virtual address

Is there any Kernel API to find the VMA corresponds to virtual address?
Example : if a have an address 0x13000 i need some function like below
struct vm_area_struct *vma = vma_corresponds_to (0x13000,task);
You're looking for find_vma in linux/mm.h.
/* Look up the first VMA which satisfies addr < vm_end, NULL if none. */
extern struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long addr);
This should do the trick:
struct vm_area_struct *vma = find_vma(task->mm, 0x13000);
if (vma == NULL)
return -EFAULT;
if (0x13000 >= vma->vm_end)
return -EFAULT;
Since v5.14-rc1, there is a new API in linux/mm.h called vma_lookup()
The code can now be reduced to the following:
struct vm_area_struct *vma = vma_lookup(task->mm, 0x13000);
if (!vma)
return -EFAULT;

Shared memory across processes on Linux/x86_64

I have a few questions on using shared memory with processes. I looked at several previous posts and couldn't glean the answers precisely enough. Thanks in advance for your help.
I'm using shm_open + mmap like below. This code works as intended with parent and child alternating to increment g_shared->count (the synchronization is not portable; it works only for certain memory models, but good enough for my case for now). However, when I change MAP_SHARED to MAP_ANONYMOUS | MAP_SHARED, the memory isn't shared and the program hangs since the 'flag' doesn't get flipped. Removing the flag confirms what's happening with each process counting from 0 to 10 (implying that each has its own copy of the structure and hence the 'count' field). Is this the expected behavior? I don't want the memory to be backed by a file; I really want to emulate what might happen if these were threads instead of processes (they need to be processes for other reasons).
Do I really need shm_open? Since the processes belong to the same hierarchy, can I just use mmap alone instead? I understand this would be fairly straightforward if there wasn't an 'exec,' but how do I get it to work when there is an 'exec' following the 'fork?'
I'm using kernel version 3.2.0-23 on x86_64 (Intel i7-2600). For this implementation, does mmap give the same behavior (correctness as well as performance) as shared memory with pthreads sharing the same global object? For example, does the MMU map the segment with 'cacheable' MTRR/TLB attributes?
Is the cleanup_shared() code correct? Is it leaking any memory? How could I check? For example, is there an equivalent of System V's 'ipcs?'
thanks,
/Doobs
shmem.h:
#ifndef __SHMEM_H__
#define __SHMEM_H__
//includes
#define LEN 1000
#define ITERS 10
#define SHM_FNAME "/myshm"
typedef struct shmem_obj {
int count;
char buff[LEN];
volatile int flag;
} shmem_t;
extern shmem_t* g_shared;
extern char proc_name[100];
extern int fd;
void cleanup_shared() {
munmap(g_shared, sizeof(shmem_t));
close(fd);
shm_unlink(SHM_FNAME);
}
static inline
void init_shared() {
int oflag;
if (!strcmp(proc_name, "parent")) {
oflag = O_CREAT | O_RDWR;
} else {
oflag = O_RDWR;
}
fd = shm_open(SHM_FNAME, oflag, (S_IREAD | S_IWRITE));
if (fd == -1) {
perror("shm_open");
exit(EXIT_FAILURE);
}
if (ftruncate(fd, sizeof(shmem_t)) == -1) {
perror("ftruncate");
shm_unlink(SHM_FNAME);
exit(EXIT_FAILURE);
}
g_shared = mmap(NULL, sizeof(shmem_t),
(PROT_WRITE | PROT_READ),
MAP_SHARED, fd, 0);
if (g_shared == MAP_FAILED) {
perror("mmap");
cleanup_shared();
exit(EXIT_FAILURE);
}
}
static inline
void proc_write(const char* s) {
fprintf(stderr, "[%s] %s\n", proc_name, s);
}
#endif // __SHMEM_H__
shmem1.c (parent process):
#include "shmem.h"
int fd;
shmem_t* g_shared;
char proc_name[100];
void work() {
int i;
for (i = 0; i &lt ITERS; ++i) {
while (g_shared->flag);
++g_shared->count;
sprintf(g_shared->buff, "%s: %d", proc_name, g_shared->count);
proc_write(g_shared->buff);
g_shared->flag = !g_shared->flag;
}
}
int main(int argc, char* argv[], char* envp[]) {
int status, child;
strcpy(proc_name, "parent");
init_shared(argv);
fprintf(stderr, "Map address is: %p\n", g_shared);
if (child = fork()) {
work();
waitpid(child, &status, 0);
cleanup_shared();
fprintf(stderr, "Parent finished!\n");
} else { /* child executes shmem2 */
execvpe("./shmem2", argv + 2, envp);
}
}
shmem2.c (child process):
#include "shmem.h"
int fd;
shmem_t* g_shared;
char proc_name[100];
void work() {
int i;
for (i = 0; i &lt ITERS; ++i) {
while (!g_shared->flag);
++g_shared->count;
sprintf(g_shared->buff, "%s: %d", proc_name, g_shared->count);
proc_write(g_shared->buff);
g_shared->flag = !g_shared->flag;
}
}
int main(int argc, char* argv[], char* envp[]) {
int status;
strcpy(proc_name, "child");
init_shared(argv);
fprintf(stderr, "Map address is: %p\n", g_shared);
work();
cleanup_shared();
return 0;
}
Passing MAP_ANONYMOUS causes the kernel to ignore your file descriptor argument and give you a private mapping instead. That's not what you want.
Yes, you can create an anonymous shared mapping in a parent process, fork, and have the child process inherit the mapping, sharing the memory with the parent and any other children. That obvoiusly doesn't survive an exec() though.
I don't understand this question; pthreads doesn't allocate memory. The cacheability will depend on the file descriptor you mapped. If it's a disk file or anonymous mapping, then it's cacheable memory. If it's a video framebuffer device, it's probably not.
That's the right way to call munmap(), but I didn't verify the logic beyond that. All processes need to unmap, only one should call unlink.
2b) as a middle-ground of a sort, it is possible to call:
int const shm_fd = shm_open(fn,...);
shm_unlink(fn);
in a parent process, and then pass fd to a child process created by fork()/execve() via argp or envp. since open file descriptors of this type will survive the fork()/execve(), you can mmap the fd in both the parent process and any dervied processes. here's a more complete code example copied and simplified/sanitized from code i ran successfully under Ubuntu 12.04 / linux kernel 3.13 / glibc 2.15:
int create_shm_fd( void ) {
int oflags = O_RDWR | O_CREAT | O_TRUNC;
string const fn = "/some_shm_fn_maybe_with_pid";
int fd;
neg_one_fail( fd = shm_open( fn.c_str(), oflags, S_IRUSR | S_IWUSR ), "shm_open" );
if( fd == -1 ) { rt_err( strprintf( "shm_open() failed with errno=%s", str(errno).c_str() ) ); }
// for now, we'll just pass the open fd to our child process, so
// we don't need the file/name/link anymore, and by unlinking it
// here we can try to minimize the chance / amount of OS-level shm
// leakage.
neg_one_fail( shm_unlink( fn.c_str() ), "shm_unlink" );
// by default, the fd returned from shm_open() has FD_CLOEXEC
// set. it seems okay to remove it so that it will stay open
// across execve.
int fd_flags = 0;
neg_one_fail( fd_flags = fcntl( fd, F_GETFD ), "fcntl" );
fd_flags &= ~FD_CLOEXEC;
neg_one_fail( fcntl( fd, F_SETFD, fd_flags ), "fcntl" );
// resize the shm segment for later mapping via mmap()
neg_one_fail( ftruncate( fd, 1024*1024*4 ), "ftruncate" );
return fd;
}
it's not 100% clear to me if it's okay spec-wise to remove the FD_CLOEXEC and/or assume that after doing so the fd really will survive the exec. the man page for exec is unclear; it says: "POSIX shared memory regions are unmapped", but to me that's redundant with the general comments earlier that mapping are not preserved, and doesn't say that shm_open()'d fd will be closed. any of course there's the fact that, as i mentioned, the code does seem to work in at least one case.
the reason i might use this approach is that it would seem to reduce the chance of leaking the shared memory segment / filename, and it makes it clear that i don't need persistence of the memory segment.

Resources