Embedded Linux: SC16IS752 buffer overflow

In my system I'm using an SC16IS752 I2C-to-dual-UART converter. The Linux kernel sources already contain a driver for this chip, but only for SPI mode, so I am trying to modify it to work over I2C. The I2C bus runs at 400 kHz. A device is connected to the converter's UART at 38400 baud and sends a packet of about 100 bytes once per second. The SC16IS752 has a 64-byte RX FIFO, so the FIFO must be serviced twice per packet.
I have run into a problem with long latencies. When the FIFO reaches its threshold level, a hardware interrupt fires and the IRQ handler runs:
static irqreturn_t sc16is7x2_irq(int irq, void *data)
{
    struct sc16is7x2_channel *chan = data;
#ifdef DEBUG
    /* Indicate irq */
    chan->handle_irq = true;
#endif
    /* Trigger work thread */
    sc16is7x2_dowork(chan);
    return IRQ_HANDLED;
}
static void sc16is7x2_dowork(struct sc16is7x2_channel *chan)
{
    printk("sc16is7x2_dowork \n");
    if (!freezing(current)) {
        queue_work(chan->workqueue, &chan->work);
    }
}
So, as you can see, the interrupt handler queues the work that reads the data out of the SC16IS752 FIFO.
And here is the problem. The sc16is7x2_irq function runs immediately after the interrupt occurs, but the queued work is executed as much as 25 ms after the interrupt. By then the FIFO has overflowed and data is lost (100 bytes take 26 ms to transmit).
What is the correct solution in this situation, and how can I reduce the 25 ms latency in the Linux kernel?

I identified the source of the delays: they were caused by the printk calls. Each call takes about 2-3 ms.
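To put the numbers from the question in perspective, here is a small userspace sketch of the time budget. It assumes 8N1 framing (10 bits per byte on the wire), which the question does not state, and the trigger level of 32 used in the note below is a hypothetical value, not the driver's actual setting.

```c
#include <stdint.h>

/* Time to transfer one UART frame, in nanoseconds (rounded down).
 * bits_per_frame is 10 for 8N1: start bit + 8 data bits + stop bit.
 * The framing is an assumption; the question only gives the baud rate. */
uint64_t frame_time_ns(uint32_t baud, uint32_t bits_per_frame)
{
    return (uint64_t)bits_per_frame * 1000000000ULL / baud;
}

/* Time between the RX trigger-level interrupt and the FIFO becoming
 * full, assuming bytes keep arriving back to back. */
uint64_t time_until_full_ns(uint32_t baud, uint32_t bits_per_frame,
                            uint32_t fifo_size, uint32_t trigger_level)
{
    return (uint64_t)(fifo_size - trigger_level)
           * frame_time_ns(baud, bits_per_frame);
}
```

At 38400 baud one byte takes about 0.26 ms, so 100 bytes take about 26 ms, which matches the figure in the question. With a 64-byte FIFO and a trigger level of 32, roughly 8.3 ms remain before the FIFO is full, so a handful of printk calls at 2-3 ms each can use up most of that margin.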

Related

UART Interrupts disabling I/O on Sam3X8E/ Arduino Due

I am starting to use an Arduino Due for some project work which requires a UART and am confused by what looks like an interaction between UART interrupts and I/O.
My first piece of code was a small routine to set up the UART, send out data continuously by loading the transmit buffer upon receipt of a TXBE interrupt. I had the UART output hooked up to an oscilloscope and had set another I/O pin as a general purpose output which would flip state and therefore be used to trigger the scope when the transmit buffer was reloaded. Problem was that I was seeing UART data and it looked good, but the I/O wasn't flipping. At this point my loop() routine was empty so I set up another output port and in loop() just toggled its state as a sanity check. Still no output except for the UART.
Here's the code that I ended up with:
uint32_t tempo; // 32-bit temporary variable
boolean flag = true;
void UART_Handler(void) {
    REG_UART_THR = 0x6DL; // load data into the transmit buffer
    if (flag) {
        REG_PIOD_SODR = 0x02L; // drive PD1 high
        flag = false;
    } else {
        REG_PIOD_CODR = 0x02L; // drive PD1 low
        flag = true;
    }
}
void setup() {
    // set up the UART I/O
    REG_PIOA_IDR = 0x0300L; // disable interrupts on PA8 and PA9
    tempo = REG_PIOA_ABSR; // get the current settings of the AB select register
    REG_PIOA_ABSR = tempo & 0xFFFFFCFF; // set PA8 and PA9 to peripheral A control
    REG_PIOA_PDR = 0x0300L; // disable parallel I/O for PA8 and PA9
    NVIC_EnableIRQ(UART_IRQn); // enable UART interrupts in NVIC
    // now set up the UART
    tempo = REG_PMC_PCSR0; // get the current settings of the peripheral clock register 0
    REG_PMC_PCER0 = tempo | 0x0100L; // enable the UART clocks
    REG_UART_CR = 0x0CL; // reset UART receiver and transmitter
    REG_UART_MR = 0x0800L; // set to normal outputs with no parity
    REG_UART_BRGR = 0x89L; // baud rate set to 38400
    REG_UART_IDR = 0x1FBL; // disable all UART interrupts
    REG_UART_IER = 0x0800L; // enable TXBUFE interrupt
    REG_UART_CR = 0x50L; // enable UART receiver and transmitter
    // set up the debug outputs
    REG_PIOD_IDR = 0x03L; // disable interrupts on PD0 and PD1
    REG_PIOD_PER = 0x03L; // enable parallel I/O for PD0 & PD1
    REG_PIOD_OER = 0x03L; // set PD0 & PD1 output enabled
    REG_PIOD_CODR = 0x03L; // drive PD0 & PD1 low
}
void loop() // run over and over
{
    REG_PIOD_SODR = 0x01L; // drive PD0 high
    delay(1);
    REG_PIOD_CODR = 0x01L; // drive PD0 low
    delay(1);
}
the scope output can be viewed at http://www.iwanczuk.com/temp/scope1.png (don't have enough reputation here to post images!).
After staring at things for while and getting no insight I disabled the TXBUFE interrupts by commenting out the line REG_UART_IER = 0x0800L; // enable TXBUFE interrupt and the toggling of PortD1 was then visible but obviously no UART output (see http://www.iwanczuk.com/temp/scope2.png). It seems that the two are mutually exclusive which would be just silly if it were true. I am sure I'm missing something but I can't see or find what it is.
I have read the SAM3X8E data sheet to see if there's anything obvious I'm missing and if there is I can't see it. I've also done what I think are relevant web searches with no luck in finding a solution. I have also tried using general purpose outputs for the two outputs on port A and port D and have tried this on two Arduino Due boards with similar results on both.
Anyone have any ideas what I might be doing wrong? Thanks in advance.
Well, I have got to the bottom of this problem. Not sure it's the best answer but it's a solution. The long and short of it is to avoid TXBE interrupts. If I use TXEMPTY interrupts instead it works fine.
A line on page 168 of the Atmel data sheet says (sic) "A interrupt can enter pending state even it is disabled" so I wondered if the problem with TXBE was because I was not clearing the pending interrupt before or even inside the ISR so I added NVIC_ClearPendingIRQ(UART_IRQn); at the start of the ISR and also just before I enabled the TXBE interrupt but the (mis)behaviour didn't change.
The operation of TXEMPTY is still a little odd (to me) because it appears that the interrupt is generated by the transmit shift register just being empty, not when it goes empty. If you enable interrupts without having loaded the transmit buffer you will immediately get an interrupt. Some may like this "self-priming" behaviour, but it doesn't do it for me. I am writing my sending routine such that the TXEMPTY interrupt is not enabled until the transmitter has been loaded with the first byte to be sent.
Based on this post on the Arduino Forum: http://forum.arduino.cc/index.php?topic=186388.0 I presume that the USARTs have a similar issue.
Hopefully this will help others.
I just realised what could be the real error at the source of my problem. The UART interrupt register descriptions talk about the TXBUFE bit in the context of transmit buffer empty and so my assumption was that this is the bit that tells me when I can put another byte into the transmit holding register. However the UART Status Register description says that the TXBUFE bit is "the buffer full signal from the transmitter PDC channel". The latter puts a whole different slant on what this bit does. According to the UART Status Register description the bit I need to be looking at is the TXRDY bit!
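As a rough illustration of that conclusion, the check could look like the sketch below. The bit positions are how I read the UART status register layout in the SAM3X8E data sheet (the 0x0800 value for TXBUFE matches the code in the question), and uart_can_load_thr is a made-up helper name, so verify both against your own copy of the data sheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status/interrupt bit positions, assumed from the SAM3X8E data sheet. */
#define UART_SR_TXRDY   (1u << 1)   /* transmit holding register can take a byte */
#define UART_SR_TXEMPTY (1u << 9)   /* holding and shift registers both empty */
#define UART_SR_TXBUFE  (1u << 11)  /* PDC transmit channel signal, not THR state */

/* True when a new byte may be written to the transmit holding register. */
bool uart_can_load_thr(uint32_t status)
{
    return (status & UART_SR_TXRDY) != 0;
}
```

In the handler this would mean reading the status register once, calling uart_can_load_thr() on the value, and only then writing REG_UART_THR. The interrupt to enable would be the TXRDY one rather than TXBUFE.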

How to test tx_timeout operation of a network kernel module?

I'm having some doubts about how I can test the operation tx_timeout of a network kernel module.
For example, let's take the snull example from chapter 14 of the Linux Device Drivers book.
void snull_tx_timeout (struct net_device *dev)
{
    struct snull_priv *priv = (struct snull_priv *) dev->priv;
    PDEBUG("Transmit timeout at %ld, latency %ld\n", jiffies,
           jiffies - dev->trans_start);
    priv->status = SNULL_TX_INTR;
    snull_interrupt(0, dev, NULL);
    priv->stats.tx_errors++;
    netif_wake_queue(dev);
    return;
}
And its initialization:
#ifdef HAVE_TX_TIMEOUT
dev->tx_timeout = snull_tx_timeout;
dev->watchdog_timeo = timeout;
#endif
How can I force a timeout to test the implementation of snull_tx_timeout() ?
I would be glad for any suggestion.
Thanks!
This email from David Miller answers this question. I tested it using another network device driver and it worked very well.
Testing tx_timeout is simple: do not pass the packets stored in the buffer (or queue) on to the hardware. The packets then accumulate until the buffer or queue fills up. The next packet can no longer be stored (or sent), and the timeout fires once the watchdog_timeo time has elapsed.
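The reason this works can be shown with a small userspace model of the decision the network watchdog makes. This is a simplification written for illustration, not the kernel's code: tick_after and tx_timeout_due are hypothetical names, and the real check works on per-queue state inside the kernel.

```c
#include <stdbool.h>

/* Wrap-safe "a is later than b" for tick counters, in the spirit of the
 * kernel's time_after() macro. */
bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Simplified model: a timeout is reported only if the TX queue is stopped
 * AND the last transmission started more than watchdog_timeo ticks ago. */
bool tx_timeout_due(bool queue_stopped, unsigned long now,
                    unsigned long trans_start, unsigned long watchdog_timeo)
{
    return queue_stopped && tick_after(now, trans_start + watchdog_timeo);
}
```

The model suggests why holding packets back is enough: once the driver has stopped its queue and never wakes it again, only the elapsed time is missing, and that condition becomes true by itself after watchdog_timeo.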

Periodic task in a Linux kernel module

I am currently developing a GPIO kernel module for FriendlyARM Linux 2.6.32.2 (mini2440). I come from an electronics background and am new to Linux.
The kernel module is loaded at start-up and the related device file is located in /dev as gpiofreq.
The first write to the device file makes the GPIO pin toggle continuously at 50 kHz. The second write stops the toggling, the third starts it again, and so on.
I have written a separate kernel module to generate the frequency, but the CPU freezes after the first write to the device file. The terminal prompt is shown, but I cannot run any command afterwards.
Here is the code snippet:
//calling function which generates continuous freq at gpio
static int send_freq(void *arg)
{
    set_current_state(TASK_INTERRUPTIBLE);
    for (;;) {
        gpio_set_value(192, 1);
        udelay(10);
        gpio_set_value(192, 0);
        udelay(10);
    }
    return 0;
}
Here is the device write code, which starts or stops the toggling whenever any data is written to the device file:
if (toggle == 0) {
    printk("Starting Freq.\n");
    task = kthread_run(&send_freq, (void *)freq, "START");
    toggle = 1;
} else {
    printk("Operation Terminated.\n");
    i = kthread_stop(task);
    toggle = 0;
}
You are running an infinite loop in a kernel thread, so there is no room for anything else to happen, except IRQs and maybe other kernel threads.
What you could do is either:
program a timer on your hardware and do your pin toggling in an interrupt
replace udelay with usleep_range
I suggest doing things progressively: start in the kHz range with usleep_range, and eventually move to a custom timer + ISR.
In either case you will probably have a lot of jitter. Toggling a GPIO like this may be a good idea on a DSP or a PIC, but it is a waste of resources on ARM + Linux unless you have hardware assistance from a PWM-capable GPIO engine.
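To make the "start in the kHz range" advice concrete, here is a small userspace sketch that works out the half period and a sleep window of the kind usleep_range(min, max) expects. The slack_percent knob and both function names are my own assumptions for illustration.

```c
#include <stdint.h>

/* Half period of a square wave in microseconds (rounded down). */
uint32_t half_period_us(uint32_t freq_hz)
{
    return 1000000u / (2u * freq_hz);
}

/* Bounds for a usleep_range(min, max)-style sleep. */
struct sleep_window {
    uint32_t min_us;
    uint32_t max_us;
};

/* slack_percent is a hypothetical tuning knob: how much later than the
 * ideal half period the wake-up is allowed to happen. */
struct sleep_window sleep_window_for(uint32_t freq_hz, uint32_t slack_percent)
{
    struct sleep_window w;

    w.min_us = half_period_us(freq_hz);
    w.max_us = w.min_us + w.min_us * slack_percent / 100u;
    return w;
}
```

At 50 kHz the half period is only 10 us, which leaves almost no room for a sleeping delay, while at 1 kHz it is 500 us. Whatever delay is used, the thread loop also has to test kthread_should_stop() on every pass, because kthread_stop() waits for the thread function to return and the for(;;) loop in the question never does.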

Linux kernel device driver to DMA from a device into user-space memory

I want to get data from a DMA enabled, PCIe hardware device into user-space as quickly as possible.
Q: How do I combine "direct I/O to user-space with/and/via a DMA transfer"
Reading through LDD3, it seems that I need to perform a few different types of IO operations!?
dma_alloc_coherent gives me the physical address that I can pass to the hardware device.
But would need to have setup get_user_pages and perform a copy_to_user type call when the transfer completes. This seems a waste, asking the Device to DMA into kernel memory (acting as buffer) then transferring it again to user-space.
LDD3 p453: /* Only now is it safe to access the buffer, copy to user, etc. */
What I ideally want is some memory that:
I can use in user-space (Maybe request driver via a ioctl call to create DMA'able memory/buffer?)
I can get a physical address from to pass to the device so that all user-space has to do is perform a read on the driver
the read method would activate the DMA transfer, block waiting for the DMA complete interrupt and release the user-space read afterwards (user-space is now safe to use/read memory).
Do I need single-page streaming mappings, setup mapping and user-space buffers mapped with get_user_pages dma_map_page?
My code so far sets up get_user_pages at the given address from user-space (I call this the Direct I/O part). Then, dma_map_page with a page from get_user_pages. I give the device the return value from dma_map_page as the DMA physical transfer address.
I am using some kernel modules as reference: drivers/scsi/st.c and drivers/net/sh_eth.c. I would look at the InfiniBand code, but can't find which part is the most basic!
Many thanks in advance.
I'm actually working on exactly the same thing right now and I'm going the ioctl() route. The general idea is for user space to allocate the buffer which will be used for the DMA transfer and an ioctl() will be used to pass the size and address of this buffer to the device driver. The driver will then use scatter-gather lists along with the streaming DMA API to transfer data directly to and from the device and user-space buffer.
The implementation strategy I'm using is that the ioctl() in the driver enters a loop that DMAs the userspace buffer in chunks of 256k (which is the hardware-imposed limit for how many scatter/gather entries it can handle). This is isolated inside a function that blocks until each transfer is complete (see below). When all bytes are transferred, or the incremental transfer function returns an error, the ioctl() exits and returns to userspace.
Pseudo code for the ioctl()
/* serialize all DMA transfers to/from the device */
if (mutex_lock_interruptible(&device_ptr->mtx))
    return -EINTR;
chunk_data = (unsigned long) user_space_addr;
while (*transferred < total_bytes && !ret) {
    chunk_bytes = total_bytes - *transferred;
    if (chunk_bytes > HW_DMA_MAX)
        chunk_bytes = HW_DMA_MAX; /* 256kb limit imposed by my device */
    ret = transfer_chunk(device_ptr, chunk_data, chunk_bytes, transferred);
    chunk_data += chunk_bytes;
    chunk_offset += chunk_bytes;
}
mutex_unlock(&device_ptr->mtx);
Pseudo code for incremental transfer function:
/* Assuming the userspace pointer is passed as an unsigned long, */
/* calculate the first, last, and number of pages being transferred via */
first_page = (udata & PAGE_MASK) >> PAGE_SHIFT;
last_page = ((udata+nbytes-1) & PAGE_MASK) >> PAGE_SHIFT;
first_page_offset = udata & PAGE_MASK;
npages = last_page - first_page + 1;
/* Ensure that all userspace pages are locked in memory for the */
/* duration of the DMA transfer */
down_read(&current->mm->mmap_sem);
ret = get_user_pages(current,
                     current->mm,
                     udata,
                     npages,
                     is_writing_to_userspace,
                     0,
                     pages_array, /* struct page ** that receives the pinned pages */
                     NULL);
up_read(&current->mm->mmap_sem);
/* Map a scatter-gather list to point at the userspace pages */
/* first */
sg_set_page(&sglist[0], pages_array[0], PAGE_SIZE - fp_offset, fp_offset);
/* middle */
for (i = 1; i < npages-1; i++)
    sg_set_page(&sglist[i], pages_array[i], PAGE_SIZE, 0);
/* last */
if (npages > 1) {
    sg_set_page(&sglist[npages-1], pages_array[npages-1],
                nbytes - (PAGE_SIZE - fp_offset) - ((npages-2)*PAGE_SIZE), 0);
}
/* Do the hardware specific thing to give it the scatter-gather list
   and tell it to start the DMA transfer */
/* Wait for the DMA transfer to complete. The macro takes the wait queue
   head itself and a condition expression, not pointers to them. */
ret = wait_event_interruptible_timeout(device_ptr->dma_wait,
                                       device_ptr->flag_dma_done, HZ*2);
if (ret == 0) {
    /* DMA operation timed out */
    return -ETIMEDOUT;
} else if (ret == -ERESTARTSYS) {
    /* DMA operation interrupted by signal */
    return ret;
} else {
    /* DMA success */
    *transferred += nbytes;
    return 0;
}
The interrupt handler is exceptionally brief:
/* Do hardware specific thing to make the device happy */
/* Wake the thread waiting for this DMA operation to complete */
device_ptr->flag_dma_done = 1;
wake_up_interruptible(&device_ptr->dma_wait);
Please note that this is just a general approach, I've been working on this driver for the last few weeks and have yet to actually test it... So please, don't treat this pseudo code as gospel and be sure to double check all logic and parameters ;-).
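Since the first/middle/last length arithmetic in the pseudo code is easy to get wrong, here is a userspace sketch of the same split that can be checked in isolation. It assumes 4 KiB pages, and struct seg and split_into_pages are names invented for this example. It also covers the single-page case, where the first entry must be no longer than nbytes.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* One scatter-gather entry: offset into its page and length in bytes. */
struct seg {
    uint32_t offset;
    uint32_t length;
};

/* Split the byte range [addr, addr + nbytes) into per-page segments.
 * Returns the number of segments written, or 0 if nbytes is 0 or
 * max_segs is too small to hold the whole range. */
size_t split_into_pages(uintptr_t addr, size_t nbytes,
                        struct seg *segs, size_t max_segs)
{
    size_t n = 0;
    uint32_t offset = (uint32_t)(addr & (DEMO_PAGE_SIZE - 1));

    while (nbytes > 0) {
        size_t room = DEMO_PAGE_SIZE - offset;       /* bytes left in this page */
        size_t len = nbytes < room ? nbytes : room;

        if (n == max_segs)
            return 0;
        segs[n].offset = offset;
        segs[n].length = (uint32_t)len;
        n++;
        nbytes -= len;
        offset = 0;  /* every page after the first starts at offset 0 */
    }
    return n;
}
```

In a driver, each resulting (offset, length) pair would feed one sg_set_page() call for the matching entry of the pinned page array.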
You basically have the right idea: in 2.1, you can just have userspace allocate any old memory. You do want it page-aligned, so posix_memalign() is a handy API to use.
Then have userspace pass in the userspace virtual address and size of this buffer somehow; ioctl() is a good quick and dirty way to do this. In the kernel, allocate an appropriately sized buffer array of struct page* -- user_buf_size/PAGE_SIZE entries -- and use get_user_pages() to get a list of struct page* for the userspace buffer.
Once you have that, you can allocate an array of struct scatterlist that is the same size as your page array and loop through the list of pages doing sg_set_page(). After the sg list is set up, you do dma_map_sg() on the array of scatterlist and then you can get the sg_dma_address and sg_dma_len for each entry in the scatterlist (note you have to use the return value of dma_map_sg() because you may end up with fewer mapped entries because things might get merged by the DMA mapping code).
That gives you all the bus addresses to pass to your device, and then you can trigger the DMA and wait for it however you want. The read()-based scheme you have is probably fine.
You can refer to drivers/infiniband/core/umem.c, specifically ib_umem_get(), for some code that builds up this mapping, although the generality that that code needs to deal with may make it a bit confusing.
Alternatively, if your device doesn't handle scatter/gather lists too well and you want contiguous memory, you could use __get_free_pages() to allocate a physically contiguous buffer and use dma_map_page() on that. To give userspace access to that memory, your driver just needs to implement an mmap method instead of the ioctl as described above.
At some point I wanted to allow user-space application to allocate DMA buffers and get it mapped to user-space and get the physical address to be able to control my device and do DMA transactions (bus mastering) entirely from user-space, totally bypassing the Linux kernel. I have used a little bit different approach though. First I started with a minimal kernel module that was initializing/probing PCIe device and creating a character device. That driver then allowed a user-space application to do two things:
Map PCIe device's I/O bar into user-space using remap_pfn_range() function.
Allocate and free DMA buffers, map them to user space and pass a physical bus address to user-space application.
Basically, it boils down to a custom implementation of the mmap() call (through file_operations). The one for the I/O BAR is easy:
struct vm_operations_struct a2gx_bar_vma_ops = {
};
static int a2gx_cdev_mmap_bar2(struct file *filp, struct vm_area_struct *vma)
{
    struct a2gx_dev *dev;
    size_t size;
    size = vma->vm_end - vma->vm_start;
    if (size != 134217728)
        return -EIO;
    dev = filp->private_data;
    vma->vm_ops = &a2gx_bar_vma_ops;
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
    vma->vm_private_data = dev;
    if (remap_pfn_range(vma, vma->vm_start,
                        vmalloc_to_pfn(dev->bar2),
                        size, vma->vm_page_prot))
    {
        return -EAGAIN;
    }
    return 0;
}
And another one that allocates DMA buffers using pci_alloc_consistent() is a little bit more complicated:
static void a2gx_dma_vma_close(struct vm_area_struct *vma)
{
    struct a2gx_dma_buf *buf;
    struct a2gx_dev *dev;
    buf = vma->vm_private_data;
    dev = buf->priv_data;
    pci_free_consistent(dev->pci_dev, buf->size, buf->cpu_addr, buf->dma_addr);
    buf->cpu_addr = NULL; /* Mark this buffer data structure as unused/free */
}
struct vm_operations_struct a2gx_dma_vma_ops = {
    .close = a2gx_dma_vma_close
};
static int a2gx_cdev_mmap_dma(struct file *filp, struct vm_area_struct *vma)
{
    struct a2gx_dev *dev;
    struct a2gx_dma_buf *buf;
    size_t size;
    unsigned int i;
    /* Obtain a pointer to our device structure and calculate the size
       of the requested DMA buffer */
    dev = filp->private_data;
    size = vma->vm_end - vma->vm_start;
    if (size < sizeof(unsigned long))
        return -EINVAL; /* Something fishy is happening */
    /* Find a structure where we can store extra information about this
       buffer to be able to release it later. */
    for (i = 0; i < A2GX_DMA_BUF_MAX; ++i) {
        buf = &dev->dma_buf[i];
        if (buf->cpu_addr == NULL)
            break;
    }
    if (buf->cpu_addr != NULL)
        return -ENOBUFS; /* Oops, hit the limit of allowed number of
                            allocated buffers. Change A2GX_DMA_BUF_MAX and
                            recompile? */
    /* Allocate consistent memory that can be used for DMA transactions */
    buf->cpu_addr = pci_alloc_consistent(dev->pci_dev, size, &buf->dma_addr);
    if (buf->cpu_addr == NULL)
        return -ENOMEM; /* Out of juice */
    /* There is no way to pass extra information to the user. And I am too lazy
       to implement this mmap() call using ioctl(). So we simply tell the user
       the bus address of this buffer by copying it to the allocated buffer
       itself. Hacks, hacks everywhere. */
    memcpy(buf->cpu_addr, &buf->dma_addr, sizeof(buf->dma_addr));
    buf->size = size;
    buf->priv_data = dev;
    vma->vm_ops = &a2gx_dma_vma_ops;
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
    vma->vm_private_data = buf;
    /*
     * Map this DMA buffer into user space.
     */
    if (remap_pfn_range(vma, vma->vm_start,
                        vmalloc_to_pfn(buf->cpu_addr),
                        size, vma->vm_page_prot))
    {
        /* Out of luck, rollback... */
        pci_free_consistent(dev->pci_dev, buf->size, buf->cpu_addr,
                            buf->dma_addr);
        buf->cpu_addr = NULL;
        return -EAGAIN;
    }
    return 0; /* All good! */
}
Once those are in place, user space application can pretty much do everything — control the device by reading/writing from/to I/O registers, allocate and free DMA buffers of arbitrary size, and have the device perform DMA transactions. The only missing part is interrupt-handling. I was doing polling in user space, burning my CPU, and had interrupts disabled.
Hope it helps. Good Luck!
I'm getting confused with the direction to implement. I want to...
Consider the application when designing a driver.
What is the nature of data movement, frequency, size and what else might be going on in the system?
Is the traditional read/write API sufficient?
Is direct mapping the device into user space OK?
Is a reflective (semi-coherent) shared memory desirable?
Manually manipulating data (read/write) is a pretty good option if the data lends itself to being well understood. Using general purpose VM and read/write may be sufficient with an inline copy. Direct mapping of non-cacheable accesses to the peripheral is convenient, but can be clumsy. If the access is the relatively infrequent movement of large blocks, it may make sense to use regular memory and have the driver pin, translate addresses, DMA and release the pages. As an optimization, the pages (maybe huge) can be pre-pinned and translated; the driver can then recognize the prepared memory and avoid the complexities of dynamic translation. If there are lots of little I/O operations, having the driver run asynchronously makes sense. If elegance is important, the VM dirty page flag can be used to automatically identify what needs to be moved and a (meta_sync()) call can be used to flush pages. Perhaps a mixture of the above works...
Too often people don't look at the larger problem, before digging into the details. Often the simplest solutions are sufficient. A little effort constructing a behavioral model can help guide what API is preferable.
first_page_offset = udata & PAGE_MASK;
It seems wrong. It should be either:
first_page_offset = udata & ~PAGE_MASK;
or
first_page_offset = udata & (PAGE_SIZE - 1)
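The difference is easy to check in userspace. The sketch below defines stand-ins with the same shape as the kernel's macros, assuming 4 KiB pages; the DEMO_ names and the two helper functions are made up for this example.

```c
#define DEMO_PAGE_SHIFT 12                        /* assumption: 4 KiB pages */
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))   /* high bits set, like PAGE_MASK */

/* addr & PAGE_MASK keeps the page-aligned base address. */
unsigned long page_base(unsigned long addr)
{
    return addr & DEMO_PAGE_MASK;
}

/* addr & ~PAGE_MASK keeps the offset within the page. */
unsigned long page_offset(unsigned long addr)
{
    return addr & ~DEMO_PAGE_MASK;
}
```

For an address such as 0x12345, masking with PAGE_MASK yields the base 0x12000, while the inverted mask yields the offset 0x345, which is what a scatter-gather entry needs.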
It is worth mentioning that a driver with scatter-gather DMA support and user-space memory allocation is the most efficient and gives the highest performance. However, if we don't need high performance, or we want to develop a driver under some simplified conditions, we can use some tricks.
Give up the zero-copy design. This is worth considering when the data throughput is not too high. In such a design, data can be copied to the user with
copy_to_user(user_buffer, kernel_dma_buffer, count);
user_buffer might be, for example, the buffer argument in a character device read() system call implementation. We still need to take care of allocating kernel_dma_buffer. It might be memory obtained from a dma_alloc_coherent() call, for example.
Another trick is to limit system memory at boot time and then use the remainder as a huge contiguous DMA buffer. This is especially useful during driver and FPGA DMA controller development, and is rather not recommended in production environments. Let's say a PC has 32 GB of RAM. If we add mem=20G to the kernel boot parameter list, we can use the remaining 12 GB as a huge contiguous DMA buffer. To map this memory to user space, simply implement mmap() as
remap_pfn_range(vma,
                vma->vm_start,
                (0x500000000 >> PAGE_SHIFT) + vma->vm_pgoff,
                vma->vm_end - vma->vm_start,
                vma->vm_page_prot)
Of course, this 12 GB is completely ignored by the OS and can be used only by a process that has mapped it into its address space. We can try to avoid this by using the Contiguous Memory Allocator (CMA).
Again, the above tricks will not replace a full scatter-gather, zero-copy DMA driver, but they are useful during development or on less performance-critical platforms.
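For reference, the 0x500000000 constant in the mmap() call is simply 20 GiB expressed as a physical address. The sketch below redoes that arithmetic in userspace, assuming 4 KiB pages and assuming the hidden RAM really starts right at the mem= limit, which depends on the machine's memory map. The function names are hypothetical.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Physical address at which the memory hidden by "mem=<n>G" starts. */
uint64_t reserved_base(uint64_t mem_limit_gib)
{
    return mem_limit_gib << 30;
}

/* Page frame number to pass to remap_pfn_range() for a given mmap()
 * page offset (vma->vm_pgoff in the driver). */
uint64_t reserved_pfn(uint64_t mem_limit_gib, uint64_t vm_pgoff)
{
    return (reserved_base(mem_limit_gib) >> DEMO_PAGE_SHIFT) + vm_pgoff;
}
```

A real mmap() implementation would also want to reject requests whose offset plus length runs past the end of the reserved region.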

Linux device driver handling multiple interrupt sources/vectors

I am writing a device driver to handle interrupts for a PCIe card, which currently works for any interrupt vector raised on the IRQ line.
But it has a few types that can be raised, flagged by the Vector register. So now I need to read the vector information and be a bit cleverer...
So, do I :-
1/ Have separate dev nodes /dev/int1, /dev/int2, etc for each interrupt type, and just doc that int1 is for vector type A etc?
1.1/ As each file/char-devices will have its own minor number, when opened I'll know which is which. i think.
1.2/ ldd3 seems to demo this method.
2/ Have one node /dev/int (as I do now) and have multiple processes hanging off the same read method? sounds better?!
2.1/ Then only wake the correct process up...?
2.2/ Do I use separate wait_queue_head_t wait_queues? Or different flag/test conditions?
In the read method:-
wait_event_interruptible(wait_queue, flag);
In the handler not real code! :-
int vector = read_vector();
if vector = A then
wake_up_interruptible(wait_queue, flag)
return IRQ_HANDLED;
else
return IRQ_NONE/IRQ_RETVAL?
EDIT: notes from peoples comments :-
1) my user-space code mmap's all of the PCIe firmware registers
2) User-space code has a few threads, each perform a blocking read on the device driver device nodes, which then returns data from the firmware when an interrupt occurs. I need the correct thread woken up depending on the interrupt type.
I am not sure I understand correctly what you mean by the Vector register (a pointer to some documentation would help me be more precise for your case).
Anyway, any PCI device gets a unique interrupt number (given by the BIOS or some firmware on other architectures than x86). You just need to register this interrupt in your driver.
priv->name = DRV_NAME;
err = request_irq(pdev->irq, your_irqhandler, IRQF_SHARED, priv->name,
pdev);
if (err) {
dev_err(&pdev->dev, "cannot request IRQ\n");
goto err_out_unmap;
}
One other thing that I do not really understand is why you would export your interrupts as a dev node: interrupts are certainly something that needs to remain in your driver/kernel code. But I guess here you want to export a device that is then accessed in userspace. I just don't find /dev/int to be a good name.
For your question about multiple dev nodes: if your different interrupt sources then provide access to different hardware resources (even if on the same PCI board) I would go for option 1), with a wait_queue for each device. Otherwise, I would go for option 2)
Since your interrupts are coming from the same physical device, if you chose option 1) or option 2), the interrupt line will have to be shared and you will have to read the vector in your interrupt handler to define which hardware resource raised the interrupt.
For option 1), it would be something like this:
static irqreturn_t pex_irqhandler(int irq, void *dev) {
    struct pci_dev *pdev = dev;
    u8 myirq;
    int result;
    result = pci_read_config_byte(pdev, PCI_INTERRUPT_LINE, &myirq);
    if (!result) { /* 0 means the config read succeeded */
        int vector = read_vector();
        if (vector == A) {
            set_flagA(flag);
        } else if (vector == B) {
            set_flagB(flag);
        }
        wake_up_interruptible(&wait_queue);
        return IRQ_HANDLED;
    } else {
        return IRQ_NONE;
    }
}
For option 2, it would be similar, but you would have only one if clause (for the respective vector value) in every different interrupt handler that you would request for every node.
If you have different channels you can read() from, then you should definitely use different minor numbers. Imagine you have a card with four serial ports: you would definitely want four /dev/ttySx.
But does your device fit this model?
First, I assume you're not trying to get your code into the mainline kernel. If you are, expect a vigorous discussion about the best way to do this. If you're writing a simple interrupt handling driver for a card which is mostly driven by mmap from user-space, there are a lot of ways to solve this problem.
If you use multiple device nodes (option 1), you can also implement poll so that a single application can open multiple device nodes and wait for a selection of interrupts. The minor number will be sufficient to tell them apart. If you have a wake queue for each vector, you can wake only the relevant listeners. You'll need to latch the vector after a successful poll to be sure that the read succeeds.
If you use a single device node (option 2), you'll need to add some extra magic so that the threads can register their interest in particular interrupt vectors. You could do this with an ioctl, or have the threads write the interrupt vectors to the device. Each thread should open the device node to get its own file descriptor. You can then associate the list of requested vectors with each open file descriptor. As a bonus, you can let the application read the interrupt vector from the device, so it knows which one happened.
You'll need to think about how the interrupt gets cleared. The interrupt handler will need to remove the interrupt, then store the result so it can be passed to user-space. You might find a kfifo useful for this rather than a wait queue. If you have a fifo for each open file descriptor, you can distribute the interrupt notifications to each listening application.
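As a rough illustration of the per-descriptor FIFO idea, here is a userspace model of a small ring buffer of interrupt vectors. It is not the kernel's kfifo API: struct vec_fifo and its two functions are invented for this sketch, and a real driver would add locking and a wait queue around it.

```c
#include <stdbool.h>
#include <stdint.h>

#define VEC_FIFO_SIZE 8u  /* must be a power of two */

/* One FIFO per open file descriptor in the scheme described above. */
struct vec_fifo {
    uint8_t  slots[VEC_FIFO_SIZE];
    unsigned in;   /* total number of vectors ever queued */
    unsigned out;  /* total number of vectors ever dequeued */
};

/* Models the interrupt handler side. Returns false if the FIFO is full. */
bool vec_fifo_put(struct vec_fifo *f, uint8_t vector)
{
    if (f->in - f->out == VEC_FIFO_SIZE)
        return false;
    f->slots[f->in++ & (VEC_FIFO_SIZE - 1)] = vector;
    return true;
}

/* Models the read() side. Returns false if the FIFO is empty. */
bool vec_fifo_get(struct vec_fifo *f, uint8_t *vector)
{
    if (f->in == f->out)
        return false;
    *vector = f->slots[f->out++ & (VEC_FIFO_SIZE - 1)];
    return true;
}
```

The handler would push the vector into the FIFO of every descriptor that registered interest in it, and each blocked read() would then pop from its own FIFO, so no notification is lost when two interrupts arrive before a reader wakes up.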
