I wrote a simple PCIe driver and I want to test whether it works, for example, whether it is possible to write to and read from the memory used by the device.
How can I do that?
And what else should be tested?
You need to find the sysfs entry for your device, for example
/sys/devices/pci0000:00/0000:00:07.0/0000:28:00.0
(It can be easier to get there via the symlinks in other subdirectories of /sys, e.g. /sys/class/...)
In this directory there should be (pseudo-)files named resource... which correspond to the various address ranges (Base Address Registers) of your device. I think these can be mmap()ed (but I've never done that).
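For what it's worth, here is a minimal user-space sketch of that mmap() approach (untested; the device path reuses the example above, and the register offset 0 is made up for illustration):

/* Minimal sketch: mmap() a PCI BAR through its sysfs resource file.
 * The path and the register offset are illustrative assumptions. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const char *path =
        "/sys/devices/pci0000:00/0000:00:07.0/0000:28:00.0/resource0";
    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Map one page of BAR0; the mmap() offset must be page-aligned. */
    volatile uint32_t *bar = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    printf("first register: 0x%08x\n", bar[0]);   /* 32-bit read */
    bar[0] = 0xdeadbeef;                          /* 32-bit write */

    munmap((void *)bar, 4096);
    close(fd);
    return 0;
}

BAR accesses should normally be done with the width the device expects (here 32-bit).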
There's a lot of other stuff you can do with the entries in /sys. See the kernel documentation for more details.
To test the memory you can follow this approach:
1) Do lspci -v
The output of this command will be something like this:
0002:03:00.1 Ethernet controller: QUALCOMM Corporation Device ABCD (rev 11)
Subsystem: QUALCOMM Corporation Device 8470
Flags: fast devsel, IRQ 110
Memory at 11d00f1008000 (64-bit, prefetchable) [disabled] [size=32K]
Memory at 11d00f0800000 (64-bit, prefetchable) [disabled] [size=8M]
Capabilities: [48] Power Management version 3
Capabilities: [50] Vital Product Data
Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
Capabilities: [a0] MSI-X: Enable- Count=1 Masked-
Capabilities: [ac] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [13c] Device Serial Number 00-00-00-00-00-00-00-00
Capabilities: [150] Power Budgeting <?>
Capabilities: [180] Vendor Specific Information: ID=0000 Rev=0 Len=028 <?>
Capabilities: [250] #12
2) We can see in the above output that the memory is disabled. To enable it we can execute the following:
setpci -s 0002:03:00.1 COMMAND=0x02
This writes 0x0002 to the device's PCI Command register, setting the Memory Space Enable bit (note that it also clears the other command bits), which enables the memory at address 11d00f1008000.
Now try to read this memory from the CPU; it should be accessible.
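If you prefer doing this from a program instead of setpci, the same bit can be flipped through the device's sysfs config file. A hedged sketch (the sysfs path is an example; unlike writing a bare 0x02, a read-modify-write preserves the other command bits; this also assumes a little-endian host, since the config file is a little-endian byte stream):

/* Sketch: set the Memory Space Enable bit (bit 1) of the PCI Command
 * register (offset 0x04) via the sysfs config file. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *cfg = "/sys/bus/pci/devices/0002:03:00.1/config";
    int fd = open(cfg, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    uint16_t cmd;
    if (pread(fd, &cmd, sizeof cmd, 0x04) != sizeof cmd) {
        perror("pread"); close(fd); return 1;
    }
    cmd |= 0x0002;                     /* Memory Space Enable */
    if (pwrite(fd, &cmd, sizeof cmd, 0x04) != sizeof cmd) {
        perror("pwrite"); close(fd); return 1;
    }
    close(fd);
    return 0;
}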
The following lspci output contains the line Expansion ROM at 42000000 [virtual] [disabled] [size=2M]. What are the meaning and implications of [virtual], and how can I enable the expansion ROM?
# lspci -s d:0.0 -v
0d:00.0 3D controller: Moore Threads Technology Co.,Ltd MTT S2000
Subsystem: Moore Threads Technology Co.,Ltd MTT S2000
Flags: bus master, fast devsel, latency 0, IRQ 35
Memory at 40000000 (32-bit, non-prefetchable) [size=32M]
Memory at 2000000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at 42000000 [virtual] [disabled] [size=2M]
Capabilities: [80] Power Management version 3
Capabilities: [90] MSI: Enable+ Count=1/8 Maskable+ 64bit+
Capabilities: [c0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [150] Device Serial Number 00-00-00-00-00-00-00-00
Capabilities: [160] Power Budgeting <?>
Capabilities: [180] Resizable BAR <?>
Capabilities: [1b8] Latency Tolerance Reporting
Capabilities: [1c0] Dynamic Power Allocation <?>
Capabilities: [300] Secondary PCI Express
Capabilities: [4c0] Virtual Channel
Capabilities: [900] L1 PM Substates
Capabilities: [910] Data Link Feature <?>
Capabilities: [920] Lane Margining at the Receiver <?>
Capabilities: [9c0] Physical Layer 16.0 GT/s <?>
Kernel driver in use: mtgpu
I tried to retrieve the ROM address via setpci, but the result doesn't seem very meaningful (I was expecting something like 42000000):
# setpci -s d:0.0 ROM_ADDRESS
00000002
Some non-[virtual] expansion ROMs can be enabled with setpci -s <slot> ROM_ADDRESS=1:1, but that failed for this one.
My goal is to read the expansion ROM of the device (using either dd or memtool), after enabling the expansion ROM somehow.
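One avenue worth trying (a sketch only, not verified on this particular [virtual] ROM): the kernel also exposes the ROM through the sysfs rom file, where writing '1' enables reads. Something like the following, with an illustrative device path and output file name:

/* Sketch: enable and dump a PCI expansion ROM through the sysfs 'rom'
 * file.  The device path and output filename are illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *rom = "/sys/bus/pci/devices/0000:0d:00.0/rom";
    int fd = open(rom, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    pwrite(fd, "1", 1, 0);             /* enable ROM reads */

    FILE *out = fopen("rom.bin", "wb");
    if (!out) { perror("fopen"); close(fd); return 1; }

    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, n, out);
    fclose(out);

    pwrite(fd, "0", 1, 0);             /* disable again */
    close(fd);
    return 0;
}

The shell equivalent is echo 1 > rom followed by dd if=rom of=rom.bin; whether this works for a [virtual] ROM is exactly the open question here.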
I want to load and unload Linux drivers on the device's terminal. I have two options, but I do not want to use the first one:
Build driver as a module
CONFIG_DRIVER = m
so that I can use rmmod and modprobe to unload and load the device driver.
Build device driver into kernel itself
CONFIG_DRIVER = y
I want to follow the second option, but I do not know how to unload and load the device driver. Can the open source community please help me out here!
It's as easy as this. You find the device and driver which you want to unbind. For example, on my Intel MinnowBoard (v1) I have a PCH UDC controller (a PCI device):
% lspci -nk
...
02:02.4 0c03: 8086:8808 (rev 02)
Subsystem: 1cc8:0001
Kernel driver in use: pch_udc
Now I know the necessary bits:
bus on which the device is located: PCI
device name: 0000:02:02.4 (note that lspci prints a reduced PCI address, i.e. without the domain, in other words just BDF, while the driver expects domain:BDF)
driver name: pch_udc
Taking it all together, we can unbind the device:
% echo 0000:02:02.4 > /sys/bus/pci/drivers/pch_udc/unbind
[ 3042.531872] configfs-gadget 0000:02:02.4: unregistering UDC driver [g1]
[ 3042.540979] udc 0000:02:02.4: releasing '0000:02:02.4'
You may bind it again; simply use the bind node in the same directory.
This feature appeared more than 15 years ago, and there is an article on LWN that explains it.
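If you need the same thing from a program rather than a shell, it is just two writes to those sysfs nodes. A minimal sketch using the device and driver names from above:

/* Sketch: unbind and rebind a PCI device by writing its full
 * domain:BDF address to the driver's unbind/bind nodes. */
#include <stdio.h>

static int write_str(const char *path, const char *s)
{
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    fprintf(f, "%s", s);
    return fclose(f);
}

int main(void)
{
    const char *dev = "0000:02:02.4";
    write_str("/sys/bus/pci/drivers/pch_udc/unbind", dev);
    /* ... the device is now driverless ... */
    write_str("/sys/bus/pci/drivers/pch_udc/bind", dev);
    return 0;
}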
So, the problem can be described as follows:
We have 11 completely identical PCI devices, connected through two CompactPCI buses: 6 on one and 5 on the other.
We are trying to access the resources of the devices through the sysfs filesystem, example:
/sys/class/pci_bus/0000:04/device/0000:04:0d.0/resource1. First 4 devices allow read/write access to their resources without problems, but:
The 5th and 6th devices on both buses don't work: all the files exist, but every read returns a bunch of 0xFFs regardless of the values written, so I can't really tell whether the writes succeed. When one of the first 4 devices is physically removed, the 5th device starts working as usual; the same goes for the 6th on the bus with 6 devices. It looks like only 4 devices per bus can work, not more. It should be noted that CompactPCI allows 7 PCI devices on a bus at once, according to the specification.
It can't really be a hardware problem, because the Windows driver (developed long ago by someone we don't have access to) handles it just fine.
lspci:
03:0b.0 Multimedia controller: Device 6472:8001 (rev 01)
03:0c.0 Multimedia controller: Device 6472:8001 (rev 01)
03:0d.0 Multimedia controller: Device 6472:8001 (rev 01)
03:0e.0 Multimedia controller: Device 6472:8001 (rev 01)
03:0f.0 Multimedia controller: Device 6472:8001 (rev 01)
04:09.0 Multimedia controller: Device 6472:8001 (rev 01)
04:0a.0 Multimedia controller: Device 6472:8001 (rev 01)
04:0b.0 Multimedia controller: Device 6472:8001 (rev 01)
04:0c.0 Multimedia controller: Device 6472:8001 (rev 01)
04:0d.0 Multimedia controller: Device 6472:8001 (rev 01)
04:0f.0 Multimedia controller: Device 6472:8001 (rev 01)
lspci -vv (identical for all 11 devices aside from bus/device numbers):
04:0f.0 Multimedia controller: Device 6472:8001 (rev 01)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at d800 [size=128]
Region 1: Memory at febfe800 (32-bit, non-prefetchable) [size=128]
I don't know if I really need to show you the code, because it is as simple as possible: the file is opened, then mmap()ed, and the resulting pointer is used to read and write the mapping.
fd = open ((device_ + "resource" + std::to_string (i)).c_str(), O_RDWR);
if (fd < 0) { /* report and bail out */ }
ptr = (u_int32_t*) mmap (NULL, 0x7f, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (ptr == MAP_FAILED) { /* report and bail out */ }
All paths resolve correctly; that's the first thing I checked.
dmesg has no errors regarding PCI.
After quite a long time, I've decided to answer this question. I didn't solve the problem myself; I wrote an email to the maintainer of the PCI-related code in the Linux kernel. After dozens of attempts to find out what went wrong, we just stopped: I had to switch to another project and spare time ran out. The only thing that was discovered is that in such a configuration you CANNOT use mmap (which is the primary way of accessing BARs through the sysfs filesystem). So instead I developed a simple PCI driver that does exactly the same thing, but using read/write operations, and it worked.
Basically,
kernel -> userspace - result
ioremap -> read/write - works
ioremap -> mmap - doesn't work
sysfs -> mmap - doesn't work
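To illustrate the working kernel path, here is a heavily trimmed sketch (not the actual driver; the pci_driver/file_operations wiring and most error handling are omitted, and all names are illustrative):

/* Sketch: expose a BAR via read() instead of mmap().  The BAR is
 * ioremap()ed at probe time; the read handler copies a register out
 * with ioread32().  *ppos is treated as a byte offset into the BAR. */
#include <linux/fs.h>
#include <linux/io.h>
#include <linux/pci.h>
#include <linux/uaccess.h>

static void __iomem *bar;              /* set in probe */

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    if (pci_enable_device(pdev))
        return -ENODEV;
    bar = pci_iomap(pdev, 1, 0);       /* BAR1, whole length */
    return bar ? 0 : -ENOMEM;
}

static ssize_t my_read(struct file *f, char __user *ubuf,
                       size_t len, loff_t *ppos)
{
    u32 val = ioread32(bar + *ppos);   /* one 32-bit register */

    if (copy_to_user(ubuf, &val, sizeof val))
        return -EFAULT;
    *ppos += sizeof val;
    return sizeof val;
}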
I'm working with version 2.6.35.9 of the Linux kernel and am trying to disable Command Completion Coalescing (CCC).
The output of lspci is as shown below:
00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82P965/G965 PCI Express Root Port (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HH (ICH8DH) LPC Interface Controller (rev 02)
00:1f.2 RAID bus controller: Intel Corporation 82801 SATA RAID Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation G72 [GeForce 7300 LE] (rev a1)
04:03.0 Mass storage controller: Promise Technology, Inc. PDC20268 (Ultra100 TX2) (rev 02)
I have Native Command Queuing enabled on my drives.
I was looking at the Serial ATA AHCI 1.3 Specification and found on page 115 that -
The CCC feature is only in use when CCC_CTL.EN is set to ‘1’. If CCC_CTL.EN is set to ‘0’, no CCC interrupts shall be generated.
Next, I had a look at the relevant code (namely, the files concerning AHCI) for this kernel version but wasn't able to make any progress. I found the enum constant HOST_CAP_CCC = (1 << 7) in drivers/ata/ahci.h, but I'm not sure how this should be used to disable command coalescing.
Can someone please assist me in identifying how CCC can be disabled? Thank you!
In response to gby's comment:
I conducted an experiment where I issued requests of size 64KB from my driver code. 64KB corresponds to 128 sectors (each sector = 512 bytes).
When I look at the response timestamp differences, here is what I find:
Timestamp at | Timestamp at | Difference in microsecs
Sector 255 - Sector 127 = 510
Sector 383 - Sector 255 = 3068
Sector 511 - Sector 383 = 22
Sector 639 - Sector 511 = 22
Sector 767 - Sector 639 = 12
Sector 895 - Sector 767 = 19
Sector 1023 - Sector 895 = 13
Sector 1151 - Sector 1023 = 402
As you can see, the response timestamp differences seem to suggest that the write completion interrupts are being batched and raised as a single interrupt, which would explain the very low differences in the tens of microseconds.
Also, when conducting this experiment, the on-disk write cache was disabled using hdparm.
Clearly, there is some interrupt batching involved here which I need to disable so that an interrupt is raised for each and every write request.
UPDATE:
Here is another experiment that I tried.
Create a bio structure in my driver and call the __make_request() function of the lower-level driver. Only one 2560-byte write request is sent from my driver.
Once this write is serviced, an interrupt is generated, which is intercepted by do_IRQ(). Finally, the function blk_complete_request() is called. Keep in mind that we are still in the top half of the interrupt handler (i.e., interrupt context, not process context). Now we compose another struct bio in blk_complete_request() and call the __make_request() function of the lower-level driver. We record a timestamp at this point (say T_0). When the request completion callback is invoked, we record another timestamp (call it T_1). The difference T_1 - T_0 is always above 1 millisecond. This experiment was repeated numerous times, and each time the destination sector affected the difference T_1 - T_0. It was observed that if the destination sectors are separated by approximately 350 sectors, the time difference is about 1.2 milliseconds for requests of size 2560 bytes.
Every time, the next write request is sent only when the previous request has been serviced. So, all these requests are chained and the disk has to service only one request at a time.
My understanding is that since the destination sectors of consecutive requests are separated by a fairly large amount, the requested sector would be almost under the disk head by the time the next request is issued, so the write should happen immediately and T_1 - T_0 should be small (certainly below 1 millisecond).
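For reference, the experiment boils down to something like the following sketch (assuming 2.6.35-era block-layer APIs; names like my_end_io are illustrative, and submit_bio is used here in place of calling __make_request() directly):

/* Hypothetical sketch of the timing experiment: submit one write bio
 * and timestamp submission (T_0) and completion (T_1). */
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/ktime.h>

static ktime_t t0, t1;

static void my_end_io(struct bio *bio, int error)
{
    t1 = ktime_get();                  /* completion timestamp (T_1) */
    pr_info("latency: %lld us\n",
            (long long)ktime_to_us(ktime_sub(t1, t0)));
    bio_put(bio);
}

static void submit_timed_write(struct block_device *bdev,
                               sector_t sector, struct page *page)
{
    struct bio *bio = bio_alloc(GFP_KERNEL, 1);

    bio->bi_bdev   = bdev;
    bio->bi_sector = sector;
    bio->bi_end_io = my_end_io;
    bio_add_page(bio, page, PAGE_SIZE, 0);

    t0 = ktime_get();                  /* submission timestamp (T_0) */
    submit_bio(WRITE, bio);
}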
The Serial ATA AHCI 1.3 Specification (page 114) states that:
When a software specified number of commands have completed or a software specified
timeout has expired, an interrupt is generated by hardware to allow software to process completed commands.
My guess is that this timer may be the reason why the latency of each request is above 1 millisecond. That's why I need to disable CCC.
I did mail the author - Jeff Garzik - but I haven't heard from him yet. Is he a registered user on stackoverflow? If yes, I could PM him...
The HDD we are using is: WD Caviar Black (Model number - WD1001FALS).
Anyone? :-(
AFAIK, the HBA capabilities bit 7 (CCC supported) is read-only, so you can check it first to see whether CCC is supported at all. Then, per the spec, you can disable CCC by clearing CCC_CTL.EN, since that bit is read/write.
Did you try clearing it and then conducting your experiment?
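To make that concrete, here is a minimal sketch (untested) of checking CAP.CCC and clearing CCC_CTL.EN against an already-ioremapped AHCI ABAR; per AHCI 1.3, CAP is at offset 0x00 and CCC_CTL at 0x14 of the generic host control block, with EN as bit 0:

/* Sketch: disable Command Completion Coalescing on an AHCI HBA.
 * 'mmio' is assumed to be the ioremap()ed ABAR. */
#include <linux/io.h>

#define AHCI_CAP       0x00
#define AHCI_CCC_CTL   0x14
#define HOST_CAP_CCC   (1 << 7)    /* from drivers/ata/ahci.h */
#define CCC_CTL_EN     (1 << 0)

static void disable_ccc(void __iomem *mmio)
{
    u32 cap = readl(mmio + AHCI_CAP);

    if (!(cap & HOST_CAP_CCC))
        return;                    /* HBA doesn't implement CCC */

    writel(readl(mmio + AHCI_CCC_CTL) & ~CCC_CTL_EN,
           mmio + AHCI_CCC_CTL);
}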
How can I discover whether a remote machine is configured with hardware or software RAID? All I know is that I have 256 GB at present; I need to order more space, but before I can, I need to know how the drives are configured.
df lists the drive as:
/dev/sdb1 287826944 273086548 119644 100% /mnt/db
and hdparm:
/dev/sdb:
HDIO_GET_MULTCOUNT failed: Invalid argument
readonly = 0 (off)
readahead = 256 (on)
geometry = 36404/255/63, sectors = 299439751168, start = 0
What else should I run and what should I look for?
Software RAID would appear as /dev/md0, not /dev/sdb. Nor is it LVM.
So it's either real hardware RAID, or a raw disk.
lspci might show you any RAID controllers plugged in.
dmesg | grep sdb might tell you some more about the disk.
sdparm /dev/sdb might tell you something, particularly if it really is a SCSI disk.
To check for software RAID:
cat /proc/mdstat
On my box, this shows:
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
96256 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
488287552 blocks [2/2] [UU]
unused devices: <none>
You get the names of all software RAID arrays, the RAID level for each, the partitions that are part of each RAID array, and the status of the arrays.
dmesg might help.
On a system where we do have software raid we see things like:
SCSI device sda: 143374744 512-byte hdwr sectors (73408 MB)
sda: Write Protect is off
sda: Mode Sense: ab 00 10 08
SCSI device sda: write cache: enabled, read cache: enabled, supports DPO and FUA
SCSI device sda: 143374744 512-byte hdwr sectors (73408 MB)
sda: Write Protect is off
sda: Mode Sense: ab 00 10 08
SCSI device sda: write cache: enabled, read cache: enabled, supports DPO and FUA
sda: sda1 sda2
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 143374744 512-byte hdwr sectors (73408 MB)
sdb: Write Protect is off
sdb: Mode Sense: ab 00 10 08
SCSI device sdb: write cache: enabled, read cache: enabled, supports DPO and FUA
SCSI device sdb: 143374744 512-byte hdwr sectors (73408 MB)
sdb: Write Protect is off
sdb: Mode Sense: ab 00 10 08
SCSI device sdb: write cache: enabled, read cache: enabled, supports DPO and FUA
sdb: sdb1 sdb2
sd 0:0:1:0: Attached scsi disk sdb
A bit later we see:
md: md0 stopped.
md: bind<sda2>
md: bind<sdb2>
md: raid0 personality registered for level 0
md0: setting max_sectors to 512, segment boundary to 131071
raid0: looking at sda2
raid0: comparing sda2(63296000) with sda2(63296000)
raid0: END
raid0: ==> UNIQUE
raid0: 1 zones
raid0: looking at sdb2
raid0: comparing sdb2(63296000) with sda2(63296000)
raid0: EQUAL
raid0: FINAL 1 zones
raid0: done.
raid0 : md_size is 126592000 blocks.
raid0 : conf->hash_spacing is 126592000 blocks.
raid0 : nb_zone is 1.
raid0 : Allocating 4 bytes for hash.
and a df shows:
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 7.8G 3.3G 4.2G 45% /
tmpfs 2.0G 0 2.0G 0% /dev/shm
/dev/md0 117G 77G 35G 69% /scratch
So part of sda and part of sdb have been bound together as one RAID volume.
What you have could be one disk, or it could be hardware raid. dmesg should give you some clues.
It is always possible that it is a hardware RAID controller that just looks like a single SATA (or SCSI) drive. For example, on our systems with Fibre Channel RAID arrays, Linux only sees a single device, and you control the RAID configuration and disk assignment by connecting to the Fibre Channel array directly.
You can try mount -v or you can look in /sys/ or /dev/ for hints. dmesg might reveal information about the drivers used, and lspci could list any add-in hw raid cards, but in general there is no generic method you can rely on to find out the exact hardware & driver setup.
You might try using mdadm; there is more explanation here. If the mount command does not show /dev/md*, chances are you are not using (or seeing) software RAID.
This is really a system administration question, not a programming question; I'll tag it as such.