Custom SPI driver to implement lseek

Custom SPI driver to implement lseek - linux

I am trying to implement a SPI driver for custom hardware. I have started with a copy of the spidev driver, which has support for almost everything I need.
We're using a protocol that has three parts: a command bit (read / write) an address, and an arbitrary amount of data.
I had assumed that simply adding lseek capabilities would be the best way to do this. "Seek" to the desired address, then read or write any number of bytes. I created a custom .llseek in the new driver's file_operations, but I have never seen that function even be called. I have tried using fseek(), lseek(), and pread() and none of those functions seem to call the new my_lseek() function. Every call reports "errno 29 ESPIPE Illegal Seek"
The device is defined in the board.c file:
static struct spi_board_info my_spi_board_info[] __initdata = {
[0] = {
.modalias = "myspi",
.bus_num = 1,
.chip_select = 0,
.max_speed_hz = 3000000,
.mode = SPI_MODE_0,
.controller_data = &spidev_mcspi_config,
}, ...
I suspect there might be something with the way that the dev files get created, mainly because the example that I found references filp->f_pos
static int myspi_llseek(struct file *filp, loff_t off, int whence)
{
...
newpos = filp->f_pos + off;
...
}
So my questions are: Is there a way to have this driver (lightly modified spidev) support the "seek" call? At what point does this get defined to return errno 29? Will I have to start from a new driver and not be able to rely on the spi_board_info() and spi_register_board_info() setup?
Only one driver in the /drivers/spi directory (spi-dw) references lseek, and they use the default_llseek implementation. There are a couple of "hacks" that we've come up with to get everything up and running, but I tend to be a person who wants to learn to get it done the right way.
Any suggestions are greatly appreciated! (PS, the kernel version is 3.4.48 for an OMAP Android system)

Spi driver dose not support any llseek or fseek functionality. It has these many call back functions.
struct spi_driver {
const struct spi_device_id *id_table;
int (*probe)(struct spi_device *spi);
int (*remove)(struct spi_device *spi);
void (*shutdown)(struct spi_device *spi);
int (*suspend)(struct spi_device *spi, pm_message_t mesg);
int (*resume)(struct spi_device *spi);
struct device_driver driver;
};
Now drivers/spi/spi-dw.c is register as a charter-driver(debugfs_create_file("registers", S_IFREG | S_IRUGO,
dws->debugfs, (void *)dws, &dw_spi_regs_ops);). So they implement to create a file in the debugfs filesystem. they implement lseek callback function.
static const struct file_operations dw_spi_regs_ops = {
.owner = THIS_MODULE,
.open = simple_open,
.read = dw_spi_show_regs,
.llseek = default_llseek,
};
The file_operations structure is defined in linux/fs.h, and holds pointers to functions defined by the driver that perform various operations on the device. Each field of the structure corresponds to the address of some function defined by the driver to handle a requested operation
lseek -: lseek is a system call that is used to change the location of the read/write pointer of a file descriptor.
SPI -: The "Serial Peripheral Interface" (SPI) is a synchronous four wire serial link used to connect microcontrollers to sensors, memory, and peripherals. SPI can not provide any lseek and fseek functionlity.
There are two type of SPI driver (https://www.kernel.org/doc/Documentation/spi/spi-summary)
Controller drivers ... controllers may be built into System-On-Chip
processors, and often support both Master and Slave roles.
These drivers touch hardware registers and may use DMA.
Or they can be PIO bitbangers, needing just GPIO pins.
Protocol drivers ... these pass messages through the controller
driver to communicate with a Slave or Master device on the
other side of an SPI link.
If you want to user read, write and llseek then you will have to register a charter-driver on top of SPI. Then you will able to achieve your acquirement.

Related

How to extend MIPS bus error handlers?

I'm writing a Linux driver in MIPS architecture.
there I implement read operation - read some registers content.
Here is the code:
static int proc_seq_show(struct seq_file *seq, void *v)
{
volatile unsigned reg;
//here read registers and pass to user using seq_printf
//1. read reg number 1
//2. read reg number 2
}
int proc_open(struct inode *inode, struct file *filp)
{
return single_open(filp,&proc_seq_show, NULL);
}
static struct file_operation proc_ops = {.read = &seq_read, .open=&seq_open};
My problem is that reading register content sometimes causing kernel oops - bus error, and read operation is prevented. I can't avoid it in advance.
Since this behavior is acceptable I would like to ignore this error and continue to read the other registers.
I saw bus error handler in the kernel (do_be in traps.c), there is an option to add my own entry to the __dbe_table. An entry looks like that:
struct exception_table_entry {unsigned long insn, nextinsn; };
insn is the instruction that cause the error. nextinsn is the next instruction to be performed after exception.
In the driver I declare an entry:
struct exception_table_entry e __attribute__ ((section("__dbe_table"))) = {0,0};
but I don't know how to initialize it. How can I get the instruction address of the risky line in C? how can I get the address of the fix line? I have tried something with labels and addresses of label - but didn't manage to set correctly the exception_table_entry .
The same infrastructure is available in x86, does someone know how they use it?
Any help will be appreciated.

ping pong DMA in kernel

I am using a SoC to DMA data from the programmable logic side (PL) to the processor side (PS). My overall scheme is to allocate two separate dma buffers and ping-pong data to these buffers to the user-space application can access data without collisions. I have successfully done this using single dma transactions in a loop (in a kthread) with each transaction either on the first or second buffer. I was able to notify a poll() method of either the first or second buffer.
Now I would like to investigate scatter-gather and/or cyclic DMA.
struct scatterlist sg[2]
struct dma_async_tx_descriptor *chan_desc;
struct dma_slave_config config;
sg_init_table(sg, ARRAY_SIZE(sg));
addr_0 = (unsigned int *)dma_alloc_coherent(NULL, dma_size, &handle_0, GFP_KERNEL);
addr_1 = (unsigned int *)dma_alloc_coherent(NULL, dma_size, &handle_1, GFP_KERNEL);
sg_dma_address(&sg[0]) = handle_0;
sg_dma_len(&sg[0]) = dma_size;
sg_dma_address(&sg[1]) = handle_1;
sg_dma_len(&sg[1]) = dma_size;
memset(&config, 0, sizeof(config));
config.direction = DMA_DEV_TO_MEM;
config.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
config.src_maxburst = DMA_MAX_BURST_SIZE / DMA_SLAVE_BUSWIDTH_4_BYTES;
dmaengine_slave_config(dma_dev->chan, &config);
chan_desc = dmaengine_prep_slave_sg(dma_dev->chan, &sg[0], 2, DMA_DEV_TO_MEM, DMA_CTRL_ACK | DMA_PREP_INTERRUPT);
chan_desc->callback = sync_callback;
chan_desc->callback_param = dma_dev->cmp;
init_completion(dma_dev->cmp);
dmaengine_submit(chan_desc);
dma_async_issue_pending(dma_dev->chan);
dma_free_coherent(NULL, dma_size, addr_0, handle_0);
dma_free_coherent(NULL, dma_size, addr_1, handle_1);
This works fine for a single run through the scatterlist and then calls the call_back at sync_callback. My thought was to chain the scatterlist in a loop but I won't get the callback.
IS THERE A WAY to have a callback for each descriptor in the scatterlist? I wondered if I could use dmaengine_prep_slave_cyclic (calls callback after every transaction) but it looks to me like this is for a single buffer when reviewing dmaengine.h. Looking at DMA Engine API Guide it looks like there is another option dmaengine_prep_interleaved_dma using a dma_interleaved_template that sounds interesting but hard to find info about.
In the end I just want to signal the user space in some manner as to what buffer is ready.

Remove input driver bound to the HID interface

I'm playing with some driver code for a special kind of keyboard. And this keyboard does have special modes. According to the specification those modes could only be enabled by sending and getting feature reports.
I'm using 'hid.c' file and user mode to send HID reports. But both 'hid_read' and 'hid_get_feature_report' failed with error number -1.
I already tried detaching keyboard from kernel drivers using libusb, but when I do that, 'hid_open' fails. I guess this is due to that HID interface already using by 'input' or some driver by the kernel. So I may not need to unbind kernel hidraw driver, instead I should try unbinding the keyboard ('input') driver top of 'hidraw' driver. Am I correct?
And any idea how I could do that? And how to find what are drivers using which drivers and which low level driver bind to which driver?

I found the answer to this myself.
The answer is to dig this project and find it's hid implementation on libusb.
Or you could directly receive the report.
int HID_API_EXPORT hid_get_feature_report(hid_device *dev, unsigned char *data, size_t length)
{
int res = -1;
int skipped_report_id = 0;
int report_number = data[0];
if (report_number == 0x0) {
/* Offset the return buffer by 1, so that the report ID
will remain in byte 0. */
data++;
length--;
skipped_report_id = 1;
}
res = libusb_control_transfer(dev->device_handle,
LIBUSB_REQUEST_TYPE_CLASS|LIBUSB_RECIPIENT_INTERFACE|LIBUSB_ENDPOINT_IN,
0x01/*HID get_report*/,
(3/*HID feature*/ << 8) | report_number,
dev->interface,
(unsigned char *)data, length,
1000/*timeout millis*/);
if (res < 0)
return -1;
if (skipped_report_id)
res++;
return res;
}
I'm sorry I can't post my actual code due to some legal reasons. However the above code is from hidapi implementation.
So even you work with an old kernel , you still have the chance to make your driver working.
This answers to this question too: https://stackoverflow.com/questions/30565999/kernel-version-2-6-32-does-not-support-hidiocgfeature

Unclear logic behind pl011_tx_chars() in amba-pl011 Linux kernel module

I'm trying to understand how Linux driver for AMBA serial port (amba-pl011.c) sends characters in non-DMA mode. For port operations, this driver registers only following callbacks:
static struct uart_ops amba_pl011_pops = {
.tx_empty = pl011_tx_empty,
.set_mctrl = pl011_set_mctrl,
.get_mctrl = pl011_get_mctrl,
.stop_tx = pl011_stop_tx,
.start_tx = pl011_start_tx,
.stop_rx = pl011_stop_rx,
.enable_ms = pl011_enable_ms,
.break_ctl = pl011_break_ctl,
.startup = pl011_startup,
.shutdown = pl011_shutdown,
.flush_buffer = pl011_dma_flush_buffer,
.set_termios = pl011_set_termios,
.type = pl011_type,
.release_port = pl011_release_port,
.request_port = pl011_request_port,
.config_port = pl011_config_port,
.verify_port = pl011_verify_port,
.poll_init = pl011_hwinit,
.poll_get_char = pl011_get_poll_char,
.poll_put_char = pl011_put_poll_char };
As you can see, there's no character sending operation among them, namely, pl011_tx_chars() function is not listed there. Since pl011_tx_chars() is declared static, it is not exposed outside the module. I found that within the module it is called only from pl011_int() function which is an interrupt handler. It is called whenever UART011_TXIS occurs:
if (status & UART011_TXIS) pl011_tx_chars(uap);
The function pl011_tx_chars() itself writes characters from circular buffer to UART01x_DR port until the fifo queue size is reached (function returns then so more data will be written at the next interrupt) or until circular buffer is empty (pl011_stop_tx() is called then). As we can see, pl011_start_tx() and pl011_stop_tx() are listed in AMBA port operations (so they can be called as callbacks despite their local static declaration). Seems reasonable, thing is, these two function do something very simple:
static void pl011_stop_tx(struct uart_port *port)
{
struct uart_amba_port *uap = (struct uart_amba_port *)port;
uap->im &= ~UART011_TXIM;
writew(uap->im, uap->port.membase + UART011_IMSC);
pl011_dma_tx_stop(uap);
}
static void pl011_start_tx(struct uart_port *port)
{
struct uart_amba_port *uap = (struct uart_amba_port *)port;
if (!pl011_dma_tx_start(uap)) {
uap->im |= UART011_TXIM;
writew(uap->im, uap->port.membase + UART011_IMSC);
}
}
Since I don't have CONFIG_DMA_ENGINE set, pl011_dma_tx_start() and pl011_dma_tx_stop() are just stubs:
static inline void pl011_dma_tx_stop(struct uart_amba_port *uap)
{
}
static inline bool pl011_dma_tx_start(struct uart_amba_port *uap)
{
return false;
}
Seems like the only thing that pl011_start_tx() does is to arm UART011_TXIM interrupt while the only thing that pl011_stop_tx() does is to disarm it. Nothing initiates the transmission!
I looked at serial_core.c - it's the only file where start_tx operation is invoked, in four places (by the registered callback). The most promissing place is uart_write() function. It fills circular buffer with data and calls local static uart_start() function which is very simple:
static void __uart_start(struct tty_struct *tty)
{
struct uart_state *state = tty->driver_data;
struct uart_port *port = state->uart_port;
if (!uart_circ_empty(&state->xmit) && state->xmit.buf &&
!tty->stopped && !tty->hw_stopped)
port->ops->start_tx(port);
}
static void uart_start(struct tty_struct *tty)
{
struct uart_state *state = tty->driver_data;
struct uart_port *port = state->uart_port;
unsigned long flags;
spin_lock_irqsave(&port->lock, flags);
__uart_start(tty);
spin_unlock_irqrestore(&port->lock, flags);
}
As you can see, no one sends initial characters to the UART port, circular buffer is filled and everything is waiting for UART011_TXIS interrupt.
Is it possible that arming UART011_TXIM interrupt instantly emits UART011_TXIS? I looked into DDI0183.pdf (PrimeCell® UART (PL011) Technical Referecne Manual), Chapter 3: Programmers Model, section 3.4: Interrupts, subsection 3.4.3 UARTTXINTR. What it says is:
....
The transmit interrupt changes state when one of the following events occurs:
• If the FIFOs are enabled and the transmit FIFO reaches the programmed trigger
level. When this happens, the transmit interrupt is asserted HIGH. The transmit
interrupt is cleared by writing data to the transmit FIFO until it becomes greater
than the trigger level, or by clearing the interrupt.
• If the FIFOs are disabled (have a depth of one location) and there is no data
present in the transmitters single location, the transmit interrupt is asserted HIGH.
It is cleared by performing a single write to the transmit FIFO, or by clearing the
interrupt.
....
The note below is even more interesting:
....
The transmit interrupt is based on a transition through a level, rather than on the level
itself. When the interrupt and the UART is enabled before any data is written to the
transmit FIFO the interrupt is not set. The interrupt is only set once written data leaves
the single location of the transmit FIFO and it becomes empty.
....
The emphasis above is mine. I don't know if my English is not sufficient, but from the words above I can't find where it states that unlocking transmit interrupt can be used for triggering transmit routine. What am I missing?

The ARM docs say that the PL011 is a "16550-ish" UART. This sort of gets them off the hook for fully specifying its behavior and instead sends you to the 16550 docs, which state in the "FIFO interrupt mode operation" section...
When the XMIT FIFO and transmitter interrupts are enabled (FCR0e1,
IER1e1), XMIT interrupts will occur as follows: A. The transmitter
holding register interrupt (02) occurs when the XMIT FIFO is empty; it
is cleared as soon as the transmitter holding register is written to
(1 to 16 characters may be written to the XMIT FIFO while servicing
this interrupt) or the IIR is read.
So, it appears that if the FIFO and TX holding register are empty and you enable TX interrupts, you should immediately see a TX interrupt that kickstarts the sending process and fills the holding register and then the FIFO. Once those drain back down below the FIFO trigger, then another interrupt will be generated to keep the process going for as long as there is more buffered data to be sent.

Direct Memory Access in Linux

I'm trying to access physical memory directly for an embedded Linux project, but I'm not sure how I can best designate memory for my use.
If I boot my device regularly, and access /dev/mem, I can easily read and write to just about anywhere I want. However, in this, I'm accessing memory that can easily be allocated to any process; which I don't want to do
My code for /dev/mem is (all error checking, etc. removed):
mem_fd = open("/dev/mem", O_RDWR));
mem_p = malloc(SIZE + (PAGE_SIZE - 1));
if ((unsigned long) mem_p % PAGE_SIZE) {
mem_p += PAGE_SIZE - ((unsigned long) mem_p % PAGE_SIZE);
}
mem_p = (unsigned char *) mmap(mem_p, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, mem_fd, BASE_ADDRESS);
And this works. However, I'd like to be using memory that no one else will touch. I've tried limiting the amount of memory that the kernel sees by booting with mem=XXXm, and then setting BASE_ADDRESS to something above that (but below the physical memory), but it doesn't seem to be accessing the same memory consistently.
Based on what I've seen online, I suspect I may need a kernel module (which is OK) which uses either ioremap() or remap_pfn_range() (or both???), but I have absolutely no idea how; can anyone help?
EDIT:
What I want is a way to always access the same physical memory (say, 1.5MB worth), and set that memory aside so that the kernel will not allocate it to any other process.
I'm trying to reproduce a system we had in other OSes (with no memory management) whereby I could allocate a space in memory via the linker, and access it using something like
*(unsigned char *)0x12345678
EDIT2:
I guess I should provide some more detail. This memory space will be used for a RAM buffer for a high performance logging solution for an embedded application. In the systems we have, there's nothing that clears or scrambles physical memory during a soft reboot. Thus, if I write a bit to a physical address X, and reboot the system, the same bit will still be set after the reboot. This has been tested on the exact same hardware running VxWorks (this logic also works nicely in Nucleus RTOS and OS20 on different platforms, FWIW). My idea was to try the same thing in Linux by addressing physical memory directly; therefore, it's essential that I get the same addresses each boot.
I should probably clarify that this is for kernel 2.6.12 and newer.
EDIT3:
Here's my code, first for the kernel module, then for the userspace application.
To use it, I boot with mem=95m, then insmod foo-module.ko, then mknod mknod /dev/foo c 32 0, then run foo-user , where it dies. Running under gdb shows that it dies at the assignment, although within gdb, I cannot dereference the address I get from mmap (although printf can)
foo-module.c
#include <linux/module.h>
#include <linux/config.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <asm/io.h>
#define VERSION_STR "1.0.0"
#define FOO_BUFFER_SIZE (1u*1024u*1024u)
#define FOO_BUFFER_OFFSET (95u*1024u*1024u)
#define FOO_MAJOR 32
#define FOO_NAME "foo"
static const char *foo_version = "#(#) foo Support version " VERSION_STR " " __DATE__ " " __TIME__;
static void *pt = NULL;
static int foo_release(struct inode *inode, struct file *file);
static int foo_open(struct inode *inode, struct file *file);
static int foo_mmap(struct file *filp, struct vm_area_struct *vma);
struct file_operations foo_fops = {
.owner = THIS_MODULE,
.llseek = NULL,
.read = NULL,
.write = NULL,
.readdir = NULL,
.poll = NULL,
.ioctl = NULL,
.mmap = foo_mmap,
.open = foo_open,
.flush = NULL,
.release = foo_release,
.fsync = NULL,
.fasync = NULL,
.lock = NULL,
.readv = NULL,
.writev = NULL,
};
static int __init foo_init(void)
{
int i;
printk(KERN_NOTICE "Loading foo support module\n");
printk(KERN_INFO "Version %s\n", foo_version);
printk(KERN_INFO "Preparing device /dev/foo\n");
i = register_chrdev(FOO_MAJOR, FOO_NAME, &foo_fops);
if (i != 0) {
return -EIO;
printk(KERN_ERR "Device couldn't be registered!");
}
printk(KERN_NOTICE "Device ready.\n");
printk(KERN_NOTICE "Make sure to run mknod /dev/foo c %d 0\n", FOO_MAJOR);
printk(KERN_INFO "Allocating memory\n");
pt = ioremap(FOO_BUFFER_OFFSET, FOO_BUFFER_SIZE);
if (pt == NULL) {
printk(KERN_ERR "Unable to remap memory\n");
return 1;
}
printk(KERN_INFO "ioremap returned %p\n", pt);
return 0;
}
static void __exit foo_exit(void)
{
printk(KERN_NOTICE "Unloading foo support module\n");
unregister_chrdev(FOO_MAJOR, FOO_NAME);
if (pt != NULL) {
printk(KERN_INFO "Unmapping memory at %p\n", pt);
iounmap(pt);
} else {
printk(KERN_WARNING "No memory to unmap!\n");
}
return;
}
static int foo_open(struct inode *inode, struct file *file)
{
printk("foo_open\n");
return 0;
}
static int foo_release(struct inode *inode, struct file *file)
{
printk("foo_release\n");
return 0;
}
static int foo_mmap(struct file *filp, struct vm_area_struct *vma)
{
int ret;
if (pt == NULL) {
printk(KERN_ERR "Memory not mapped!\n");
return -EAGAIN;
}
if ((vma->vm_end - vma->vm_start) != FOO_BUFFER_SIZE) {
printk(KERN_ERR "Error: sizes don't match (buffer size = %d, requested size = %lu)\n", FOO_BUFFER_SIZE, vma->vm_end - vma->vm_start);
return -EAGAIN;
}
ret = remap_pfn_range(vma, vma->vm_start, (unsigned long) pt, vma->vm_end - vma->vm_start, PAGE_SHARED);
if (ret != 0) {
printk(KERN_ERR "Error in calling remap_pfn_range: returned %d\n", ret);
return -EAGAIN;
}
return 0;
}
module_init(foo_init);
module_exit(foo_exit);
MODULE_AUTHOR("Mike Miller");
MODULE_LICENSE("NONE");
MODULE_VERSION(VERSION_STR);
MODULE_DESCRIPTION("Provides support for foo to access direct memory");
foo-user.c
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>
int main(void)
{
int fd;
char *mptr;
fd = open("/dev/foo", O_RDWR | O_SYNC);
if (fd == -1) {
printf("open error...\n");
return 1;
}
mptr = mmap(0, 1 * 1024 * 1024, PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED, fd, 4096);
printf("On start, mptr points to 0x%lX.\n",(unsigned long) mptr);
printf("mptr points to 0x%lX. *mptr = 0x%X\n", (unsigned long) mptr, *mptr);
mptr[0] = 'a';
mptr[1] = 'b';
printf("mptr points to 0x%lX. *mptr = 0x%X\n", (unsigned long) mptr, *mptr);
close(fd);
return 0;
}

I think you can find a lot of documentation about the kmalloc + mmap part.
However, I am not sure that you can kmalloc so much memory in a contiguous way, and have it always at the same place. Sure, if everything is always the same, then you might get a constant address. However, each time you change the kernel code, you will get a different address, so I would not go with the kmalloc solution.
I think you should reserve some memory at boot time, ie reserve some physical memory so that is is not touched by the kernel. Then you can ioremap this memory which will give you
a kernel virtual address, and then you can mmap it and write a nice device driver.
This take us back to linux device drivers in PDF format. Have a look at chapter 15, it is describing this technique on page 443
Edit : ioremap and mmap.
I think this might be easier to debug doing things in two step : first get the ioremap
right, and test it using a character device operation, ie read/write. Once you know you can safely have access to the whole ioremapped memory using read / write, then you try to mmap the whole ioremapped range.
And if you get in trouble may be post another question about mmaping
Edit : remap_pfn_range
ioremap returns a virtual_adress, which you must convert to a pfn for remap_pfn_ranges.
Now, I don't understand exactly what a pfn (Page Frame Number) is, but I think you can get one calling
virt_to_phys(pt) >> PAGE_SHIFT
This probably is not the Right Way (tm) to do it, but you should try it
You should also check that FOO_MEM_OFFSET is the physical address of your RAM block. Ie before anything happens with the mmu, your memory is available at 0 in the memory map of your processor.

Sorry to answer but not quite answer, I noticed that you have already edited the question. Please note that SO does not notify us when you edit the question. I'm giving a generic answer here, when you update the question please leave a comment, then I'll edit my answer.
Yes, you're going to need to write a module. What it comes down to is the use of kmalloc() (allocating a region in kernel space) or vmalloc() (allocating a region in userspace).
Exposing the prior is easy, exposing the latter can be a pain in the rear with the kind of interface that you are describing as needed. You noted 1.5 MB is a rough estimate of how much you actually need to reserve, is that iron clad? I.e are you comfortable taking that from kernel space? Can you adequately deal with ENOMEM or EIO from userspace (or even disk sleep)? IOW, what's going into this region?
Also, is concurrency going to be an issue with this? If so, are you going to be using a futex? If the answer to either is 'yes' (especially the latter), its likely that you'll have to bite the bullet and go with vmalloc() (or risk kernel rot from within). Also, if you are even THINKING about an ioctl() interface to the char device (especially for some ad-hoc locking idea), you really want to go with vmalloc().
Also, have you read this? Plus we aren't even touching on what grsec / selinux is going to think of this (if in use).

/dev/mem is okay for simple register peeks and pokes, but once you cross into interrupts and DMA territory, you really should write a kernel-mode driver. What you did for your previous memory-management-less OSes simply doesn't graft well to an General Purpose OS like Linux.
You've already thought about the DMA buffer allocation issue. Now, think about the "DMA done" interrupt from your device. How are you going to install an Interrupt Service Routine?
Besides, /dev/mem is typically locked out for non-root users, so it's not very practical for general use. Sure, you could chmod it, but then you've opened a big security hole in the system.
If you are trying to keep the driver code base similar between the OSes, you should consider refactoring it into separate user & kernel mode layers with an IOCTL-like interface in-between. If you write the user-mode portion as a generic library of C code, it should be easy to port between Linux and other OSes. The OS-specific part is the kernel-mode code. (We use this kind of approach for our drivers.)
It seems like you have already concluded that it's time to write a kernel-driver, so you're on the right track. The only advice I can add is to read these books cover-to-cover.
Linux Device Drivers
Understanding the Linux Kernel
(Keep in mind that these books are circa-2005, so the information is a bit dated.)

I am by far no expert on these matters, so this will be a question to you rather than an answer. Is there any reason you can't just make a small ram disk partition and use it only for your application? Would that not give you guaranteed access to the same chunk of memory? I'm not sure of there would be any I/O performance issues, or additional overhead associated with doing that. This also assumes that you can tell the kernel to partition a specific address range in memory, not sure if that is possible.
I apologize for the newb question, but I found your question interesting, and am curious if ram disk could be used in such a way.

Have you looked at the 'memmap' kernel parameter? On i386 and X64_64, you can use the memmap parameter to define how the kernel will hand very specific blocks of memory (see the Linux kernel parameter documentation). In your case, you'd want to mark memory as 'reserved' so that Linux doesn't touch it at all. Then you can write your code to use that absolute address and size (woe be unto you if you step outside that space).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string