Is there a way to find the file names of files mapped to the virtual memory area of a process in the linux kernel? - linux

Been working on a project for a few weeks now and I've hit a pretty significant roadblock and I was hoping somebody here might be able to offer some guidance.
All I need to do is write a system call that reports statistics of a process’s virtual address space when called. Those statistics, according to the assignment criteria, need to include the size of the process’s virtual address space, each virtual memory area’s access permissions, and the names of files mapped to these virtual memory areas.
The first two I have working, the last appears to not be possible, at least from what my research and attempts so far have turned up. I've isolated it down to accessing the vm_file struct within the vm_area_struct of the process and using that to get to the f_path, but past that I'm still stuck on how to get from there to a format that can actually be put into a printk statement, and everything I've tried hasn't output anything when I finally get the kernel recompiled.
Here's where the code sits at the moment. Am I even on the right track?
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/mm_types.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/path.h>
#include <linux/dcache.h>
asmlinkage int sys_project3a1(unsigned int processID)
{
struct task_struct *task;
for_each_process(task)
{
if (task->pid == processID)
{
unsigned long virtualAddressSpace = 0;
struct vm_area_struct *vmlist;
printk("Process ID: %d", task->pid);
for (vmlist = task->mm->mmap; vmlist!=NULL; vmlist=vmlist->vm_next)
{
unsigned long space = vmlist->vm_end - vmlist->vm_start;
char *tmp;
char *pathname;
struct file *file;
struct path *path;
printk("Process Access Permissions: %lu", (unsigned long)(vmlist->vm_page_prot.pgprot));
file = vmlist->vm_file;
path = &file->f_path;
path_get(path);
tmp = (char *)__get_free_page(GFP_TEMPORARY);
pathname = d_path(path, tmp, PAGE_SIZE);
printk("Path Name: %s", pathname);
free_page((unsigned long)tmp);
virtualAddressSpace += space;
}
printk("Process Virtual Address Space: %lu", virtualAddressSpace);
}
}
return 1;
}

I was in a similar situation and had to find this on my own. Hope others who come here benefit from my answer.
So I am assuming you want the values corresponding to "Mapping" column of a pmap command ouput. The following works for me(tried on v4.6.4):
char filename[50];
if(vmlist->vm_file){
strcpy(filename, vmlist->vm_file->f_path.dentry->d_iname);
}
else{ // implies an anonymous mapping i.e. not file backup'ed
strcpy(filename, "[ anon ]");
}
After getting to the path, follow it's dentry field and then d_iname which is the mapped file's name. Doesn't look quite pretty, but does the job.

struct vm_area_struct *vas;
vas->vm_file->f_path.dentry->d_name.name

Related

Why does this code works well? It changes the Constant storage area string;

this is a simple linux kernel module code to reverse a string which should Oops after insmod,but it works well,why?
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
static char *words = "words";
static int __init words_init(void)
{
printk(KERN_INFO "debug info\n");
int len = strlen(words);
int k;
for ( k = 0; k < len/2; k++ )
{
printk("the value of k %d\n",k);
char a = words[k];
words[k]= words[len-1-k];
words[len-1-k]=a;
}
printk(KERN_INFO "words is %s\n", words);
return 0;
}
static void __exit words_exit(void)
{
printk(KERN_INFO "words exit.\n");
}
module_init(words_init);
module_exit(words_exit);
module_param(words, charp, S_IRUGO);
MODULE_LICENSE("GPL");
static != const.
static in your example means "only visible in the current file". See What does "static" mean?
const just means "the code which can see this declaration isn't supposed to change the variable". But in certain cases, you can still create a non-const pointer to the same address and modify the contents.
Space removal from a String - In Place C style with Pointers suggests that writing to the string value shouldn't be possible.
Since static is the only difference between your code and that of the question, I looked further and found this: Global initialized variables declared as "const" go to text segment, while those declared "Static" go to data segment. Why?
Try static const char *word, that should do the trick.
Haha!,I get the answer from the linux kernel source code by myself;When you use the insmod,it will call the init_moudle,load_module,strndup_usr then memdup_usr function;the memdup_usr function will use kmalloc_track_caller to alloc memery from slab and then use the copy_from_usr to copy the module paragram into kernel;this mean the linux kernel module paragram store in heap,not in the constant storage area!! So we can change it's content!

Why mm_struct->start_stack and vm_area_struct->start don't point to the same address?

As far as I understand memory management in Linux kernel, there is a mm_struct structure responsible for address space in each process. One important memory region is stack. This should be identified by vm_area_struct memory region and mm_struct itself has a pointer mm_struct->stack_start which is stack's address.
I came accross the code below and what I cannot understand is why any of the memory region start/end addresses are not equal to mm_struct->stack_start value. Any help in understanding this would be very much appreciated. Thanks
Some of the results of loading the compiled kernel module:
Vma number 14: Starts at 0x7fff4bb68000, Ends at 0x7fff4bb8a000
Vma number 15: Starts at 0x7fff4bbfc000, Ends at 0x7fff4bbfe000
Vma number 16: Starts at 0x7fff4bbfe000, Ends at 0x7fff4bc00000
Code Segment start = 0x400000, end = 0x400854
Data Segment start = 0x600858, end = 0x600a94
Stack Segment start = 0x7fff4bb88420
One can find that stack segment start (0x7fff4bb88420) belongs to the vma number 14 but I don't know the addresses are different.
Kernel module source code:
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/mm.h>
static int pid_mem = 1;
static void print_mem(struct task_struct *task)
{
struct mm_struct *mm;
struct vm_area_struct *vma;
int count = 0;
mm = task->mm;
printk("\nThis mm_struct has %d vmas.\n", mm->map_count);
for (vma = mm->mmap ; vma ; vma = vma->vm_next) {
printk ("\nVma number %d: \n", ++count);
printk(" Starts at 0x%lx, Ends at 0x%lx\n",
vma->vm_start, vma->vm_end);
}
printk("\nCode Segment start = 0x%lx, end = 0x%lx \n"
"Data Segment start = 0x%lx, end = 0x%lx\n"
"Stack Segment start = 0x%lx\n",
mm->start_code, mm->end_code,
mm->start_data, mm->end_data,
mm->start_stack);
}
static int mm_exp_load(void){
struct task_struct *task;
printk("\nGot the process id to look up as %d.\n", pid_mem);
for_each_process(task) {
if ( task->pid == pid_mem) {
printk("%s[%d]\n", task->comm, task->pid);
print_mem(task);
}
}
return 0;
}
static void mm_exp_unload(void)
{
printk("\nPrint segment information module exiting.\n");
}
module_init(mm_exp_load);
module_exit(mm_exp_unload);
module_param(pid_mem, int, 0);
MODULE_AUTHOR ("Krishnakumar. R, rkrishnakumar#gmail.com");
MODULE_DESCRIPTION ("Print segment information");
MODULE_LICENSE("GPL");
Looks like start_stack is the initial stack pointer address. It's calculated by the kernel when the program is executed and is based on the stack section address given in the executable file. I don't think it gets updated at all thereafter. The system uses start_stack in at least one instance: to identify which vma represents "the stack" (when providing /proc/<pid>/maps), as the vma containing that address is guaranteed to contain the (main) stack.
But note that this is only the stack for the "main" (initial) thread; a multi-threaded program will have other stacks too -- one per thread. Since they all share the same address space, all threads will show the same set of vmas, and I think you'll find they all have the same start_stack value as well. But only the main thread's stack pointer will be within the main stack vma. The other threads will each have their own stack vmas -- this is so that each thread's stack can grow independently.
In general, there is one mm_struct for a process, but many vm_area_struct and each responds for a mmaped area.
For example, in a 32-bit system, a process have a virtual address space of 4GB, all of which is pointed by the mm_struct. However, there can be many regions within the 4GB space. Each of the region is pointed by a vm_area_struct, and this region is limited by the vm_area_struct->start and vm_area_struct->end. So, obviously the mm_struct struct contains a list of vm_area_struct.
Here is the detail introduction.

Are memory addresses reported in valgrind absolute, physical addresses?

While running an MPI program with valgrind using mpirun -np 3 valgrind test, I noticed that the addresses of malloc/calloc'ed arrays are sometimes identical for different processes. This would lead me to believe that the addresses reported by Valgrind are either not absolute or do not correspond to the physical memory addresses -- which would make sense, as it uses its own allocator. Could anyone confirm this, or tell me which trivial insight I'm missing here? Thank you.
The code:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <assert.h>
int main(int argc, char* argv[])
{
int rank, nproc;
/* first let MPI strip off its MPI stuff: */
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
double* a;
int n = 40;
assert(a = calloc(n, sizeof(double)));
printf("Rank %d: Address of a = %p\n",rank,a);
free(a);
MPI_Finalize();
return 0;
}
Example output:
Rank 0: Address of a = 0x6dad300
Rank 1: Address of a = 0x67a8800
Rank 2: Address of a = 0x67a8800
In modern processors / operating systems, memory is mapped into the process space of a program. The program has no notion (or interest in) the actual physical memory address of the space it's using--only device drivers really have the need.

Is there any API for determining the physical address from virtual address in Linux?

Is there any API for determining the physical address from virtual address in Linux operating system?
Kernel and user space work with virtual addresses (also called linear addresses) that are mapped to physical addresses by the memory management hardware. This mapping is defined by page tables, set up by the operating system.
DMA devices use bus addresses. On an i386 PC, bus addresses are the same as physical addresses, but other architectures may have special address mapping hardware to convert bus addresses to physical addresses.
In Linux, you can use these functions from asm/io.h:
virt_to_phys(virt_addr);
phys_to_virt(phys_addr);
virt_to_bus(virt_addr);
bus_to_virt(bus_addr);
All this is about accessing ordinary memory. There is also "shared memory" on the PCI or ISA bus. It can be mapped inside a 32-bit address space using ioremap(), and then used via the readb(), writeb() (etc.) functions.
Life is complicated by the fact that there are various caches around, so that different ways to access the same physical address need not give the same result.
Also, the real physical address behind virtual address can change. Even more than that - there could be no address associated with a virtual address until you access that memory.
As for the user-land API, there are none that I am aware of.
/proc/<pid>/pagemap userland minimal runnable example
virt_to_phys_user.c
#define _XOPEN_SOURCE 700
#include <fcntl.h> /* open */
#include <stdint.h> /* uint64_t */
#include <stdio.h> /* printf */
#include <stdlib.h> /* size_t */
#include <unistd.h> /* pread, sysconf */
typedef struct {
uint64_t pfn : 55;
unsigned int soft_dirty : 1;
unsigned int file_page : 1;
unsigned int swapped : 1;
unsigned int present : 1;
} PagemapEntry;
/* Parse the pagemap entry for the given virtual address.
*
* #param[out] entry the parsed entry
* #param[in] pagemap_fd file descriptor to an open /proc/pid/pagemap file
* #param[in] vaddr virtual address to get entry for
* #return 0 for success, 1 for failure
*/
int pagemap_get_entry(PagemapEntry *entry, int pagemap_fd, uintptr_t vaddr)
{
size_t nread;
ssize_t ret;
uint64_t data;
uintptr_t vpn;
vpn = vaddr / sysconf(_SC_PAGE_SIZE);
nread = 0;
while (nread < sizeof(data)) {
ret = pread(pagemap_fd, ((uint8_t*)&data) + nread, sizeof(data) - nread,
vpn * sizeof(data) + nread);
nread += ret;
if (ret <= 0) {
return 1;
}
}
entry->pfn = data & (((uint64_t)1 << 55) - 1);
entry->soft_dirty = (data >> 55) & 1;
entry->file_page = (data >> 61) & 1;
entry->swapped = (data >> 62) & 1;
entry->present = (data >> 63) & 1;
return 0;
}
/* Convert the given virtual address to physical using /proc/PID/pagemap.
*
* #param[out] paddr physical address
* #param[in] pid process to convert for
* #param[in] vaddr virtual address to get entry for
* #return 0 for success, 1 for failure
*/
int virt_to_phys_user(uintptr_t *paddr, pid_t pid, uintptr_t vaddr)
{
char pagemap_file[BUFSIZ];
int pagemap_fd;
snprintf(pagemap_file, sizeof(pagemap_file), "/proc/%ju/pagemap", (uintmax_t)pid);
pagemap_fd = open(pagemap_file, O_RDONLY);
if (pagemap_fd < 0) {
return 1;
}
PagemapEntry entry;
if (pagemap_get_entry(&entry, pagemap_fd, vaddr)) {
return 1;
}
close(pagemap_fd);
*paddr = (entry.pfn * sysconf(_SC_PAGE_SIZE)) + (vaddr % sysconf(_SC_PAGE_SIZE));
return 0;
}
int main(int argc, char **argv)
{
pid_t pid;
uintptr_t vaddr, paddr = 0;
if (argc < 3) {
printf("Usage: %s pid vaddr\n", argv[0]);
return EXIT_FAILURE;
}
pid = strtoull(argv[1], NULL, 0);
vaddr = strtoull(argv[2], NULL, 0);
if (virt_to_phys_user(&paddr, pid, vaddr)) {
fprintf(stderr, "error: virt_to_phys_user\n");
return EXIT_FAILURE;
};
printf("0x%jx\n", (uintmax_t)paddr);
return EXIT_SUCCESS;
}
GitHub upstream.
Usage:
sudo ./virt_to_phys_user.out <pid> <virtual-address>
sudo is required to read /proc/<pid>/pagemap even if you have file permissions as explained at: https://unix.stackexchange.com/questions/345915/how-to-change-permission-of-proc-self-pagemap-file/383838#383838
As mentioned at: https://stackoverflow.com/a/46247716/895245 Linux allocates page tables lazily, so make sure that you read and write a byte to that address from the test program before using virt_to_phys_user.
How to test it out
Test program:
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
enum { I0 = 0x12345678 };
static volatile uint32_t i = I0;
int main(void) {
printf("vaddr %p\n", (void *)&i);
printf("pid %ju\n", (uintmax_t)getpid());
while (i == I0) {
sleep(1);
}
printf("i %jx\n", (uintmax_t)i);
return EXIT_SUCCESS;
}
The test program outputs the address of a variable it owns, and its PID, e.g.:
vaddr 0x600800
pid 110
and then you can pass convert the virtual address with:
sudo ./virt_to_phys_user.out 110 0x600800
Finally, the conversion can be tested by using /dev/mem to observe / modify the memory, but you can't do this on Ubuntu 17.04 without recompiling the kernel as it requires: CONFIG_STRICT_DEVMEM=n, see also: How to access physical addresses from user space in Linux? Buildroot is an easy way to overcome that however.
Alternatively, you can use a Virtual machine like QEMU monitor's xp command: How to decode /proc/pid/pagemap entries in Linux?
See this to dump all pages: How to decode /proc/pid/pagemap entries in Linux?
Userland subset of this question: How to find the physical address of a variable from user-space in Linux?
Dump all process pages with /proc/<pid>/maps
/proc/<pid>/maps lists all the addresses ranges of the process, so we can walk that to translate all pages: /proc/[pid]/pagemaps and /proc/[pid]/maps | linux
Kerneland virt_to_phys() only works for kmalloc() addresses
From a kernel module, virt_to_phys(), has been mentioned.
However, it is import to highlight that it has this limitation.
E.g. it fails for module variables. arc/x86/include/asm/io.h documentation:
The returned physical address is the physical (CPU) mapping for
the memory address given. It is only valid to use this function on
addresses directly mapped or allocated via kmalloc().
Here is a kernel module that illustrates that together with an userland test.
So this is not a very general possibility. See: How to get the physical address from the logical one in a Linux kernel module? for kernel module methods exclusively.
As answered before, normal programs should not need to worry about physical addresses as they run in a virtual address space with all its conveniences. Furthermore, not every virtual address has a physical address, the may belong to mapped files or swapped pages. However, sometimes it may be interesting to see this mapping, even in userland.
For this purpose, the Linux kernel exposes its mapping to userland through a set of files in the /proc. The documentation can be found here. Short summary:
/proc/$pid/maps provides a list of mappings of virtual addresses together with additional information, such as the corresponding file for mapped files.
/proc/$pid/pagemap provides more information about each mapped page, including the physical address if it exists.
This website provides a C program that dumps the mappings of all running processes using this interface and an explanation of what it does.
The suggested C program above usually works, but it can return misleading results in (at least) two ways:
The page is not present (but the virtual addressed is mapped to a page!). This happens due to lazy mapping by the OS: it maps addresses only when they are actually accessed.
The returned PFN points to some possibly temporary physical page which could be changed soon after due to copy-on-write. For example: for memory mapped files, the PFN can point to the read-only copy. For anonymous mappings, the PFN of all pages in the mapping could be one specific read-only page full of 0s (from which all anonymous pages spawn when written to).
Bottom line is, to ensure a more reliable result: for read-only mappings, read from every page at least once before querying its PFN. For write-enabled pages, write into every page at least once before querying its PFN.
Of course, theoretically, even after obtaining a "stable" PFN, the mappings could always change arbitrarily at runtime (for example when moving pages into and out of swap) and should not be relied upon.
I wonder why there is no user-land API.
Because user land memory's physical address is unknown.
Linux uses demand paging for user land memory. Your user land object will not have physical memory until it is accessed. When the system is short of memory, your user land object may be swapped out and lose physical memory unless the page is locked for the process. When you access the object again, it is swapped in and given physical memory, but it is likely different physical memory from the previous one. You may take a snapshot of page mapping, but it is not guaranteed to be the same in the next moment.
So, looking for the physical address of a user land object is usually meaningless.

Direct Memory Access in Linux

I'm trying to access physical memory directly for an embedded Linux project, but I'm not sure how I can best designate memory for my use.
If I boot my device regularly, and access /dev/mem, I can easily read and write to just about anywhere I want. However, in this, I'm accessing memory that can easily be allocated to any process; which I don't want to do
My code for /dev/mem is (all error checking, etc. removed):
mem_fd = open("/dev/mem", O_RDWR));
mem_p = malloc(SIZE + (PAGE_SIZE - 1));
if ((unsigned long) mem_p % PAGE_SIZE) {
mem_p += PAGE_SIZE - ((unsigned long) mem_p % PAGE_SIZE);
}
mem_p = (unsigned char *) mmap(mem_p, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, mem_fd, BASE_ADDRESS);
And this works. However, I'd like to be using memory that no one else will touch. I've tried limiting the amount of memory that the kernel sees by booting with mem=XXXm, and then setting BASE_ADDRESS to something above that (but below the physical memory), but it doesn't seem to be accessing the same memory consistently.
Based on what I've seen online, I suspect I may need a kernel module (which is OK) which uses either ioremap() or remap_pfn_range() (or both???), but I have absolutely no idea how; can anyone help?
EDIT:
What I want is a way to always access the same physical memory (say, 1.5MB worth), and set that memory aside so that the kernel will not allocate it to any other process.
I'm trying to reproduce a system we had in other OSes (with no memory management) whereby I could allocate a space in memory via the linker, and access it using something like
*(unsigned char *)0x12345678
EDIT2:
I guess I should provide some more detail. This memory space will be used for a RAM buffer for a high performance logging solution for an embedded application. In the systems we have, there's nothing that clears or scrambles physical memory during a soft reboot. Thus, if I write a bit to a physical address X, and reboot the system, the same bit will still be set after the reboot. This has been tested on the exact same hardware running VxWorks (this logic also works nicely in Nucleus RTOS and OS20 on different platforms, FWIW). My idea was to try the same thing in Linux by addressing physical memory directly; therefore, it's essential that I get the same addresses each boot.
I should probably clarify that this is for kernel 2.6.12 and newer.
EDIT3:
Here's my code, first for the kernel module, then for the userspace application.
To use it, I boot with mem=95m, then insmod foo-module.ko, then mknod mknod /dev/foo c 32 0, then run foo-user , where it dies. Running under gdb shows that it dies at the assignment, although within gdb, I cannot dereference the address I get from mmap (although printf can)
foo-module.c
#include <linux/module.h>
#include <linux/config.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <asm/io.h>
#define VERSION_STR "1.0.0"
#define FOO_BUFFER_SIZE (1u*1024u*1024u)
#define FOO_BUFFER_OFFSET (95u*1024u*1024u)
#define FOO_MAJOR 32
#define FOO_NAME "foo"
static const char *foo_version = "#(#) foo Support version " VERSION_STR " " __DATE__ " " __TIME__;
static void *pt = NULL;
static int foo_release(struct inode *inode, struct file *file);
static int foo_open(struct inode *inode, struct file *file);
static int foo_mmap(struct file *filp, struct vm_area_struct *vma);
struct file_operations foo_fops = {
.owner = THIS_MODULE,
.llseek = NULL,
.read = NULL,
.write = NULL,
.readdir = NULL,
.poll = NULL,
.ioctl = NULL,
.mmap = foo_mmap,
.open = foo_open,
.flush = NULL,
.release = foo_release,
.fsync = NULL,
.fasync = NULL,
.lock = NULL,
.readv = NULL,
.writev = NULL,
};
static int __init foo_init(void)
{
int i;
printk(KERN_NOTICE "Loading foo support module\n");
printk(KERN_INFO "Version %s\n", foo_version);
printk(KERN_INFO "Preparing device /dev/foo\n");
i = register_chrdev(FOO_MAJOR, FOO_NAME, &foo_fops);
if (i != 0) {
return -EIO;
printk(KERN_ERR "Device couldn't be registered!");
}
printk(KERN_NOTICE "Device ready.\n");
printk(KERN_NOTICE "Make sure to run mknod /dev/foo c %d 0\n", FOO_MAJOR);
printk(KERN_INFO "Allocating memory\n");
pt = ioremap(FOO_BUFFER_OFFSET, FOO_BUFFER_SIZE);
if (pt == NULL) {
printk(KERN_ERR "Unable to remap memory\n");
return 1;
}
printk(KERN_INFO "ioremap returned %p\n", pt);
return 0;
}
static void __exit foo_exit(void)
{
printk(KERN_NOTICE "Unloading foo support module\n");
unregister_chrdev(FOO_MAJOR, FOO_NAME);
if (pt != NULL) {
printk(KERN_INFO "Unmapping memory at %p\n", pt);
iounmap(pt);
} else {
printk(KERN_WARNING "No memory to unmap!\n");
}
return;
}
static int foo_open(struct inode *inode, struct file *file)
{
printk("foo_open\n");
return 0;
}
static int foo_release(struct inode *inode, struct file *file)
{
printk("foo_release\n");
return 0;
}
static int foo_mmap(struct file *filp, struct vm_area_struct *vma)
{
int ret;
if (pt == NULL) {
printk(KERN_ERR "Memory not mapped!\n");
return -EAGAIN;
}
if ((vma->vm_end - vma->vm_start) != FOO_BUFFER_SIZE) {
printk(KERN_ERR "Error: sizes don't match (buffer size = %d, requested size = %lu)\n", FOO_BUFFER_SIZE, vma->vm_end - vma->vm_start);
return -EAGAIN;
}
ret = remap_pfn_range(vma, vma->vm_start, (unsigned long) pt, vma->vm_end - vma->vm_start, PAGE_SHARED);
if (ret != 0) {
printk(KERN_ERR "Error in calling remap_pfn_range: returned %d\n", ret);
return -EAGAIN;
}
return 0;
}
module_init(foo_init);
module_exit(foo_exit);
MODULE_AUTHOR("Mike Miller");
MODULE_LICENSE("NONE");
MODULE_VERSION(VERSION_STR);
MODULE_DESCRIPTION("Provides support for foo to access direct memory");
foo-user.c
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>
int main(void)
{
int fd;
char *mptr;
fd = open("/dev/foo", O_RDWR | O_SYNC);
if (fd == -1) {
printf("open error...\n");
return 1;
}
mptr = mmap(0, 1 * 1024 * 1024, PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED, fd, 4096);
printf("On start, mptr points to 0x%lX.\n",(unsigned long) mptr);
printf("mptr points to 0x%lX. *mptr = 0x%X\n", (unsigned long) mptr, *mptr);
mptr[0] = 'a';
mptr[1] = 'b';
printf("mptr points to 0x%lX. *mptr = 0x%X\n", (unsigned long) mptr, *mptr);
close(fd);
return 0;
}
I think you can find a lot of documentation about the kmalloc + mmap part.
However, I am not sure that you can kmalloc so much memory in a contiguous way, and have it always at the same place. Sure, if everything is always the same, then you might get a constant address. However, each time you change the kernel code, you will get a different address, so I would not go with the kmalloc solution.
I think you should reserve some memory at boot time, ie reserve some physical memory so that is is not touched by the kernel. Then you can ioremap this memory which will give you
a kernel virtual address, and then you can mmap it and write a nice device driver.
This take us back to linux device drivers in PDF format. Have a look at chapter 15, it is describing this technique on page 443
Edit : ioremap and mmap.
I think this might be easier to debug doing things in two step : first get the ioremap
right, and test it using a character device operation, ie read/write. Once you know you can safely have access to the whole ioremapped memory using read / write, then you try to mmap the whole ioremapped range.
And if you get in trouble may be post another question about mmaping
Edit : remap_pfn_range
ioremap returns a virtual_adress, which you must convert to a pfn for remap_pfn_ranges.
Now, I don't understand exactly what a pfn (Page Frame Number) is, but I think you can get one calling
virt_to_phys(pt) >> PAGE_SHIFT
This probably is not the Right Way (tm) to do it, but you should try it
You should also check that FOO_MEM_OFFSET is the physical address of your RAM block. Ie before anything happens with the mmu, your memory is available at 0 in the memory map of your processor.
Sorry to answer but not quite answer, I noticed that you have already edited the question. Please note that SO does not notify us when you edit the question. I'm giving a generic answer here, when you update the question please leave a comment, then I'll edit my answer.
Yes, you're going to need to write a module. What it comes down to is the use of kmalloc() (allocating a region in kernel space) or vmalloc() (allocating a region in userspace).
Exposing the prior is easy, exposing the latter can be a pain in the rear with the kind of interface that you are describing as needed. You noted 1.5 MB is a rough estimate of how much you actually need to reserve, is that iron clad? I.e are you comfortable taking that from kernel space? Can you adequately deal with ENOMEM or EIO from userspace (or even disk sleep)? IOW, what's going into this region?
Also, is concurrency going to be an issue with this? If so, are you going to be using a futex? If the answer to either is 'yes' (especially the latter), its likely that you'll have to bite the bullet and go with vmalloc() (or risk kernel rot from within). Also, if you are even THINKING about an ioctl() interface to the char device (especially for some ad-hoc locking idea), you really want to go with vmalloc().
Also, have you read this? Plus we aren't even touching on what grsec / selinux is going to think of this (if in use).
/dev/mem is okay for simple register peeks and pokes, but once you cross into interrupts and DMA territory, you really should write a kernel-mode driver. What you did for your previous memory-management-less OSes simply doesn't graft well to an General Purpose OS like Linux.
You've already thought about the DMA buffer allocation issue. Now, think about the "DMA done" interrupt from your device. How are you going to install an Interrupt Service Routine?
Besides, /dev/mem is typically locked out for non-root users, so it's not very practical for general use. Sure, you could chmod it, but then you've opened a big security hole in the system.
If you are trying to keep the driver code base similar between the OSes, you should consider refactoring it into separate user & kernel mode layers with an IOCTL-like interface in-between. If you write the user-mode portion as a generic library of C code, it should be easy to port between Linux and other OSes. The OS-specific part is the kernel-mode code. (We use this kind of approach for our drivers.)
It seems like you have already concluded that it's time to write a kernel-driver, so you're on the right track. The only advice I can add is to read these books cover-to-cover.
Linux Device Drivers
Understanding the Linux Kernel
(Keep in mind that these books are circa-2005, so the information is a bit dated.)
I am by far no expert on these matters, so this will be a question to you rather than an answer. Is there any reason you can't just make a small ram disk partition and use it only for your application? Would that not give you guaranteed access to the same chunk of memory? I'm not sure of there would be any I/O performance issues, or additional overhead associated with doing that. This also assumes that you can tell the kernel to partition a specific address range in memory, not sure if that is possible.
I apologize for the newb question, but I found your question interesting, and am curious if ram disk could be used in such a way.
Have you looked at the 'memmap' kernel parameter? On i386 and X64_64, you can use the memmap parameter to define how the kernel will hand very specific blocks of memory (see the Linux kernel parameter documentation). In your case, you'd want to mark memory as 'reserved' so that Linux doesn't touch it at all. Then you can write your code to use that absolute address and size (woe be unto you if you step outside that space).

Resources