UPDATE: This seems to have been a hardware fault. The same code works fine with a new card.
I recently bought a very cheap parallel PCI card (link) to learn a bit about device drivers in Linux (via LDD3) on my Ubuntu machine.
I've connected LEDs to pins 2-9 and have been able to set and clear the pins using I/O ports. However, I have not been able to raise an interrupt and handle it. Any help or pointers would be appreciated.
(Please note that I have pin 9 wired directly to pin 10.)
lspci
07:04.0 Parallel controller: Device 1c00:2170 (rev 0f) (prog-if 01 [BiDir])
Subsystem: Device 1c00:2170
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at ccf0 [size=8]
Region 1: I/O ports at ccf8 [size=8]
after system boot, the io registers are:
DATA: 0xff, STATUS: 0x07, CONTROL: 0xc0
I've tried:
outb_p(0x10, BASE+2); // enable irq
outb_p(0x00, BASE); outb_p(0xFF, BASE); // trigger interrupt
// => DATA: 0xff, STATUS: 0x7b, CONTROL: 0xd0
but the interrupt count in the intr line of /proc/stat for IRQ 11 (the IRQ reported by lspci) remains zero.
I have also tried wrapping the above sequence between probe_irq_on() and probe_irq_off() (with an additional outb_p(0x00, BASE+2); udelay(5) in between), which also fails to detect and report any interrupt.
This kernel probing was done after a call to pci_enable_device(dev) in the module code.
Please let me know if any other info is required. Thanks in advance.
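To sanity-check the register dumps above, here is a minimal sketch that decodes the two bits that matter for interrupts on a standard PC parallel port: CONTROL bit 4 (IRQ enable) and STATUS bit 6 (the level on pin 10, nACK). The helper and macro names are made up for illustration, and the sketch assumes the card follows the standard register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PC parallel port register bits (names are illustrative). */
#define PP_CTRL_IRQ_ENABLE 0x10 /* CONTROL bit 4: forward nACK transitions to the IRQ line */
#define PP_STAT_NACK       0x40 /* STATUS bit 6: level currently seen on pin 10 (nACK) */

/* True if the CONTROL register value has the interrupt enable bit set. */
static bool pp_irq_enabled(uint8_t control)
{
    return (control & PP_CTRL_IRQ_ENABLE) != 0;
}

/* True if STATUS reports pin 10 (nACK) as high. */
static bool pp_ack_high(uint8_t status)
{
    return (status & PP_STAT_NACK) != 0;
}

/* True if pin 10 changed level between two STATUS reads. */
static bool pp_ack_toggled(uint8_t before, uint8_t after)
{
    return ((before ^ after) & PP_STAT_NACK) != 0;
}
```

With the values quoted above, pp_irq_enabled(0xd0) and pp_ack_high(0x7b) are both true, and pp_ack_toggled(0x07, 0x7b) is true as well. So the registers look as expected for an armed interrupt with pin 9 looped back to pin 10, which points away from the register programming as the culprit.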
Related
I am using the Mellanox ConnectX-6 NIC and the configuration is as shown below:
(lspci -vv)
a1:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
Subsystem: Mellanox Technologies Device 0028
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 182
NUMA node: 1
Region 0: Memory at a0000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at 9d000000 [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00 ...
I am measuring RDMA throughput between two systems by varying the chunk size for a total transfer of 10 GB of data. (Both machines have the same Mellanox NIC.)
The results show that just past a chunk size of 32 MB (i.e. 33, 34, 35 MB …), the throughput drops drastically, by 50+ Gbps. (Normal speed for this NIC is 175-185 Gbps. I get these speeds up to 32 MB, but at a 33 MB chunk size I get somewhere between 85 and 120 Gbps.)
So I would like to know whether the 32 MB of prefetchable memory listed in the configuration above has any impact on RDMA throughput.
The Linux kernel fails to assign memory to the device when the BAR size is set to 1 GB. Device enumeration works fine as long as the BAR size is set to 512 MB. When it is set to 1 GB, the device is still enumerated, but the memory mappings are not assigned.
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
(64-bit, non-prefetchable) [disabled]
Region 2: Memory at (64-bit, non-prefetchable) [disabled]
Region 4: Memory at (64-bit, non-prefetchable) [disabled]
What could be the reason for this? What can be done to debug this?
I enabled kernel debug output at boot, and this is what is logged for that device:
[ 7.087688] pci 0000:8b:00.0: BAR 4: can't assign mem (size 0x40000000)
[ 7.109427] pci 0000:8b:00.0: BAR 0: can't assign mem (size 0x100000)
[ 7.130599] pci 0000:8b:00.0: BAR 2: can't assign mem (size 0x2000)
You can try setpci -s <your PCIe device's bus address> COMMAND=0x02, for example setpci -s 01:00.0 COMMAND=0x02. This enables memory-mapped transfers for your PCIe device.
You can refer to this link:
https://forums.xilinx.com/t5/PCI-Express/lspci-reports-BAR-0-disabled/td-p/747139
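For context on what COMMAND=0x02 does, here is a small sketch of the low bits of the PCI COMMAND register (config space offset 0x04), which lspci prints as I/O, Mem and BusMaster. The macro and function names are made up for this example. Note that writing the plain value 0x02 replaces the whole register, so a read-modify-write that only ORs in the memory bit is the gentler option.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the 16-bit PCI COMMAND register (names are illustrative). */
#define PCI_CMD_IO_SPACE   0x0001 /* lspci: I/O+ */
#define PCI_CMD_MEM_SPACE  0x0002 /* lspci: Mem+ */
#define PCI_CMD_BUS_MASTER 0x0004 /* lspci: BusMaster+ */

/* True if the device currently decodes memory-space accesses. */
static bool pci_cmd_mem_enabled(uint16_t command)
{
    return (command & PCI_CMD_MEM_SPACE) != 0;
}

/* Return the COMMAND value with memory decoding enabled and all other bits kept. */
static uint16_t pci_cmd_enable_mem(uint16_t command)
{
    return command | PCI_CMD_MEM_SPACE;
}
```

setpci can do the same masked update with its value:mask syntax, e.g. COMMAND=0x02:0x02. Keep in mind that enabling memory decoding only helps if the BARs already hold valid addresses. In the log above the kernel reports "can't assign mem", so the address assignment itself probably needs fixing first.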
The problem can be described as follows:
We have 11 identical PCI devices connected through two CompactPCI buses, 6 on one and 5 on the other.
We are trying to access the devices' resources through the sysfs filesystem, for example:
/sys/class/pci_bus/0000:04/device/0000:04:0d.0/resource1
The first 4 devices allow read/write access to their resources without problems, but:
The 5th and 6th devices on both buses don't work: all the files exist, but every read returns a run of FFs regardless of the values written, so I can't tell whether the writes succeeded. When one of the first 4 devices is physically removed, the 5th device starts working as usual; the same goes for the 6th device on the bus with 6 devices. It looks as if only 4 devices per bus can work, no more. Note that, according to the specification, CompactPCI allows 7 PCI devices on a bus at once.
It can't really be a hardware problem, because the Windows driver (developed long ago by someone we no longer have access to) handles this just fine.
lspci:
03:0b.0 Multimedia controller: Device 6472:8001 (rev 01)
03:0c.0 Multimedia controller: Device 6472:8001 (rev 01)
03:0d.0 Multimedia controller: Device 6472:8001 (rev 01)
03:0e.0 Multimedia controller: Device 6472:8001 (rev 01)
03:0f.0 Multimedia controller: Device 6472:8001 (rev 01)
04:09.0 Multimedia controller: Device 6472:8001 (rev 01)
04:0a.0 Multimedia controller: Device 6472:8001 (rev 01)
04:0b.0 Multimedia controller: Device 6472:8001 (rev 01)
04:0c.0 Multimedia controller: Device 6472:8001 (rev 01)
04:0d.0 Multimedia controller: Device 6472:8001 (rev 01)
04:0f.0 Multimedia controller: Device 6472:8001 (rev 01)
lspci -vv (identical for all 11 devices apart from the bus addresses):
04:0f.0 Multimedia controller: Device 6472:8001 (rev 01)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at d800 [size=128]
Region 1: Memory at febfe800 (32-bit, non-prefetchable) [size=128]
I don't know whether I really need to show the code, because it is as simple as possible: the file is opened, then mmapped, and the resulting pointer is used to write to and read from that file.
fd = open ( (device_ + "resource" + std::to_string (i)).c_str(), O_RDWR);
ptr = (u_int32_t*) mmap (NULL, 0x7f, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
All paths are resolved correctly; that's the first thing I checked.
dmesg has no errors regarding PCI.
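One detail that may be worth checking: mmap works in whole pages, while the BAR in the lspci output above is only 128 bytes at febfe800, which is not page aligned. The sketch below only does the address arithmetic. It assumes 4 KiB pages, and the names are invented for the example.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on typical x86 systems. */
#define SKETCH_PAGE_SIZE 4096u

/* Start address of the page that contains addr. */
static uint64_t page_base(uint64_t addr)
{
    return addr & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
}

/* Offset of addr within its page. */
static uint32_t page_offset(uint64_t addr)
{
    return (uint32_t)(addr & (SKETCH_PAGE_SIZE - 1));
}
```

For 0xfebfe800 this gives a page base of 0xfebfe000 and an in-page offset of 0x800. If the mapping returned by mmap() starts at the beginning of that page rather than at the BAR itself (an assumption to verify against your kernel's sysfs PCI code, not something confirmed in this thread), the device registers would sit 0x800 bytes into the mapping, and several 128-byte BARs could share one page.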
After quite a long time, I've decided to answer this question myself. I didn't solve the problem on my own; I wrote an email to the maintainer of the PCI-related code in the Linux kernel. After tens of attempts to find out what went wrong, we just stopped: I had to switch to another project and my spare time ran out. The only thing we discovered is that in such a configuration you CANNOT use mmap (and this is the primary way of accessing BARs through the sysfs filesystem). So, instead, I developed a simple PCI driver that does exactly the same thing but exposes read/write operations, and it worked.
Basically,
kernel -> userspace - result
ioremap -> read/write - works
ioremap -> mmap - doesn't work
sysfs -> mmap - doesn't work
My machine (running Linux kernel 3.2.38) on boot has wrong subsystem IDs (sub-device and sub-vendor IDs) of a PCI device. If I then physically unplug and re-plug the PCI device while the system is still up (i.e., hot-plug), it gets the correct IDs.
Note that the wrong sub-device and sub-vendor IDs it gets are same as the device's device and vendor IDs (see the first two lines in the lspci output below).
Following is the output of lspci -vvnn before and after hot-plugging the device:
Before hot-plugging:
0b:0f.0 Bridge [0680]: Device [1a88:4d45] (rev 05)
Subsystem: Device [1a88:4d45]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32 (250ns min, 63750ns max)
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at 2100 [size=256]
Region 1: I/O ports at 2000 [size=256]
Region 2: Memory at 92920000 (32-bit, non-prefetchable) [size=64]
After hot-plugging:
0b:0f.0 Bridge [0680]: Device [1a88:4d45] (rev 05)
Subsystem: Device [007d:5a14]
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at 2100 [disabled] [size=256]
Region 1: I/O ports at 2000 [disabled] [size=256]
Region 2: [virtual] Memory at 92920000 (32-bit, non-prefetchable) [size=64]
My question: Is there a way to get the IDs fixed without hot-plugging the device, e.g. by forcing the kernel to re-read the PCI device IDs through a PCI bus rescan/re-enumeration/re-configuration?
Any help would be highly appreciated. Thanks.
PS. Note that the problem isn't really related to the kernel or software, as it exists even if I boot into the UEFI internal shell.
PPS. The PCI device in this case is MEN F206N and "My machine" is MEN F22P
You can force a rescan of the PCI bus with:
# echo 1 > /sys/bus/pci/rescan
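The same write can be done from a program. The following is a minimal sketch of a helper that writes "1" to a sysfs control file; the function name is made up for illustration, and the call needs root privileges when pointed at real sysfs files.

```c
#include <stdio.h>

/* Write "1" to a sysfs control file. Returns 0 on success, -1 on failure. */
static int sysfs_write_one(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = (fputs("1", f) != EOF);

    /* The write may only reach the kernel on flush, so check fclose() too. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A rescan would then be sysfs_write_one("/sys/bus/pci/rescan"). To drop a single device first, the per-device remove file can be written the same way, for example /sys/bus/pci/devices/0000:0b:0f.0/remove (the 0000 domain is an assumption here). Whether a remove followed by a rescan actually refreshes the subsystem IDs on this hardware is untested.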
A closer look at your lspci output before and after hot-plugging the device shows more differences than just the sub-device/sub-vendor IDs. I'd be surprised if the device functioned as expected after hot-plugging.
Besides, forcing PCI reenumeration is not possible primarily because there may be other devices that have been enumerated correctly and functioning already. How do you expect reenumeration to deal with that? (and there are other reasons too.)
Prafulla
I'm working on 2.6.35.9 version of the Linux kernel and am trying to disable Command Completion Coalescing.
The output of lspci is as shown below:
00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82P965/G965 PCI Express Root Port (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HH (ICH8DH) LPC Interface Controller (rev 02)
00:1f.2 RAID bus controller: Intel Corporation 82801 SATA RAID Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation G72 [GeForce 7300 LE] (rev a1)
04:03.0 Mass storage controller: Promise Technology, Inc. PDC20268 (Ultra100 TX2) (rev 02)
I have Native Command Queuing enabled on my drives.
I was looking at the Serial ATA AHCI 1.3 Specification and found on page 115 that -
The CCC feature is only in use when CCC_CTL.EN is set to ‘1’. If CCC_CTL.EN is set to ‘0’, no CCC
interrupts shall be generated.
Next, I had a look at the relevant code (namely, the files concerning AHCI) for this version of the kernel but wasn't able to make any progress. I found the enum constant HOST_CAP_CCC = (1 << 7) in drivers/ata/ahci.h, but I'm not sure how this should be modified to disable command coalescing.
Can someone please assist me in identifying how CCC can be disabled? Thank you!
In response to gby's comment:
I conducted an experiment where I issued requests of size 64KB from my driver code. 64KB corresponds to 128 sectors (each sector = 512 bytes).
When I look at the response timestamp differences, here is what I find:
Timestamp at - Timestamp at = Difference in microseconds
Sector 255 - Sector 127 = 510
Sector 383 - Sector 255 = 3068
Sector 511 - Sector 383 = 22
Sector 639 - Sector 511 = 22
Sector 767 - Sector 639 = 12
Sector 895 - Sector 767 = 19
Sector 1023 - Sector 895 = 13
Sector 1151 - Sector 1023 = 402
As you can see, the response timestamp differences suggest that the write completion interrupts are being batched and a single interrupt is raised for the whole batch, which might explain the very low differences of only tens of microseconds.
Also, when conducting this experiment, the on-disk write cache was disabled using hdparm.
Clearly, there is some interrupt batching involved here which I need to disable so that an interrupt is raised for each and every write request.
UPDATE:
Here is another experiment that I tried.
I create a bio structure in my driver and call the __make_request() function of the lower-level driver. Only one 2560-byte write request is sent from my driver.
Once this write is serviced, an interrupt is generated, which is intercepted by do_IRQ(). Finally, the function blk_complete_request() is called. Keep in mind that we are still in the top half of the interrupt handler (i.e., interrupt context, not process context). Now, we compose another struct bio in blk_complete_request() and call the __make_request() function of the lower-level driver. We record a timestamp at this point (say T_0). When the request completion callback is invoked, we record another timestamp (call it T_1). The difference T_1 - T_0 is always above 1 millisecond. This experiment was repeated numerous times, and each time the destination sector affected the difference T_1 - T_0. If the destination sectors are separated by approximately 350 sectors, the time difference is about 1.2 milliseconds for requests of 2560 bytes.
Every time, the next write request is sent only when the previous request has been serviced. So, all these requests are chained and the disk has to service only one request at a time.
My understanding is that since the destination sectors of consecutive requests have been separated by a fairly large amount, by the time the next request is issued, the requested sector would be almost below the disk head and thus the write should happen immediately and T_1 - T_0 should be small (at least < 1 millisec).
The Serial ATA AHCI 1.3 Specification (page 114) states that:
When a software specified number of commands have completed or a software specified
timeout has expired, an interrupt is generated by hardware to allow software to process completed commands.
My guess is that this timer may be the reason why the latency of each request is above 1 millisecond. That's why I need to disable CCC.
I did mail the author - Jeff Garzik - but I haven't heard from him yet. Is he a registered user on stackoverflow? If yes, I could PM him...
The HDD we are using is: WD Caviar Black (Model number - WD1001FALS).
Anyone? :-(
AFAIK, bit 7 of the HBA capabilities register (CCC supported) is read-only, so you can check it first to see whether CCC is supported. Then, per the spec, you can disable CCC by setting CCC_CTL.EN to 0, because that bit is read/write.
Did you try clearing it and then repeating your experiment?
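As a rough sketch of that suggestion, the helpers below operate on register values only: CAP at offset 0x00 of the AHCI memory BAR with the CCCS bit at position 7, and CCC_CTL at offset 0x14 with EN at bit 0 and the timeout value in bits 31:16. The offsets and fields follow my reading of the AHCI 1.3 spec, and the names are made up for the example, so please double-check them against the spec before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Generic host control register offsets, relative to the AHCI memory BAR. */
#define AHCI_REG_CAP     0x00
#define AHCI_REG_CCC_CTL 0x14

#define AHCI_CAP_CCCS    (1u << 7) /* CAP.CCCS: CCC supported (read-only) */
#define AHCI_CCC_CTL_EN  (1u << 0) /* CCC_CTL.EN: coalescing enabled (read/write) */

/* True if the HBA advertises command completion coalescing. */
static bool ahci_ccc_supported(uint32_t cap)
{
    return (cap & AHCI_CAP_CCCS) != 0;
}

/* True if coalescing is currently switched on. */
static bool ahci_ccc_enabled(uint32_t ccc_ctl)
{
    return (ccc_ctl & AHCI_CCC_CTL_EN) != 0;
}

/* Return the CCC_CTL value with EN cleared and all other fields kept. */
static uint32_t ahci_ccc_ctl_disabled(uint32_t ccc_ctl)
{
    return ccc_ctl & ~AHCI_CCC_CTL_EN;
}

/* Timeout value field (bits 31:16); assumed to be in milliseconds. */
static uint16_t ahci_ccc_timeout(uint32_t ccc_ctl)
{
    return (uint16_t)(ccc_ctl >> 16);
}
```

Inside a driver, the values would come from readl() on the mapped BAR at AHCI_REG_CAP and AHCI_REG_CCC_CTL, and the result of ahci_ccc_ctl_disabled() would be written back with writel(). If ahci_ccc_enabled() already returns false on your controller, CCC is not what is batching the interrupts and the delay has to come from somewhere else.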