What is the meaning of these terms: "subbus", "secbus" and "pribus"?
Here is an example of the output:
dev.pcib.3.subbus: 2
dev.pcib.3.secbus: 2
dev.pcib.3.pribus: 0
dev.pcib.3.domain: 0
Do they map to a PCI address (pci:U:X:Y:Z)?
Why are they not documented in the sysctl man page? Where can I find more information about them?
You can use the "-d" option of sysctl. It prints a short description of each system control, for example:
dev.pcib.3.subbus=Subordinate bus number
dev.pcib.3.secbus=Secondary bus number
dev.pcib.3.pribus=Primary bus number
dev.pcib.3.domain=Domain number
In this particular case, these are the bus numbers of a PCI bridge (aka pcib): the primary bus is the one the bridge itself sits on, the secondary bus is the one directly behind it, and the subordinate bus is the highest-numbered bus reachable behind it. So they describe the bridge rather than a single device, and don't map to a PCI ID.
Sysctl is just a routine that gathers tunable and read-only variables from kernel modules. As a result, it doesn't know the meaning of each variable. The developer of a particular feature may describe the meaning of its sysctl variables, but I think that's rare.
If you're looking for PCI information, it's worth using "pciconf -l -v" and "devinfo".
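As a rough illustration, the sysctl output from the question can be parsed into numbers. This is only a sketch: the function names are made up for the example, and it assumes the usual PCI-to-PCI bridge convention that the buses behind a bridge are numbered from secbus up to subbus, inclusive.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines (as printed by sysctl) into a dict of ints.

    Only the last component of each name is kept, e.g. 'subbus'.
    """
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip().rsplit(".", 1)[-1]] = int(value)
    return values


def buses_behind_bridge(values):
    # Assumption: secbus..subbus (inclusive) are the bus numbers behind the bridge.
    return list(range(values["secbus"], values["subbus"] + 1))


# Output copied from the question
sample = """dev.pcib.3.subbus: 2
dev.pcib.3.secbus: 2
dev.pcib.3.pribus: 0
dev.pcib.3.domain: 0"""

info = parse_sysctl(sample)
print("bridge sits on bus", info["pribus"])
print("buses behind the bridge:", buses_behind_bridge(info))
```

For the values in the question, the bridge sits on bus 0 and only bus 2 is behind it.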
In the 2017 book "UNIX and Linux System Administration" I read the following passage:
Modern systems manage their device files automatically. However, a few rare corner
cases may still require you to create devices manually with the mknod command.
So here’s how to do it:
mknod filename type major minor
Here, filename is the device file to be created, type is c for a character device or b
for a block device, and major and minor are the major and minor device numbers.
If you are creating a device file that refers to a driver that’s already present in your
kernel, check the documentation for the driver to find the appropriate major and
minor device numbers.
Where can I find this documentation, and how do I find the major and minor numbers for a device driver?
The command cat /proc/devices shows the character and block major device numbers in use by drivers in the currently running Linux kernel, but provides no information about minor device numbers.
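If you want to look up majors from a script, the file is easy to parse. The sketch below is not an official tool; the function name is made up, and the sample text is a hypothetical excerpt in the format of /proc/devices.

```python
def parse_proc_devices(text):
    """Return {'char': {major: [names]}, 'block': {major: [names]}}.

    A major number may be listed more than once, so names are kept in lists.
    """
    result = {"char": {}, "block": {}}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if line == "Character devices:":
            section = "char"
        elif line == "Block devices:":
            section = "block"
        elif line and section:
            major, name = line.split(None, 1)
            result[section].setdefault(int(major), []).append(name)
    return result


# Hypothetical excerpt; on a real system use open("/proc/devices").read()
sample = """Character devices:
  1 mem
  4 tty
 10 misc

Block devices:
  8 sd
253 device-mapper
"""

table = parse_proc_devices(sample)
print(table["char"][1], table["block"][8])
```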
There is a list of pre-assigned (reserved) device numbers in the Linux kernel user's and administrator's guide: Linux allocated devices (4.x+ version). (The same list also appears in "Documentation/admin-guide/devices.txt" in the Linux kernel sources.) The list shows how minor device numbers are interpreted for each pre-assigned character and block major device number.
Some major device numbers are reserved for local or experimental use, or for dynamic assignment:
60-63 char LOCAL/EXPERIMENTAL USE
60-63 block LOCAL/EXPERIMENTAL USE
Allocated for local/experimental use. For devices not
assigned official numbers, these ranges should be
used in order to avoid conflicting with future assignments.
120-127 char LOCAL/EXPERIMENTAL USE
120-127 block LOCAL/EXPERIMENTAL USE
Allocated for local/experimental use. For devices not
assigned official numbers, these ranges should be
used in order to avoid conflicting with future assignments.
234-254 char RESERVED FOR DYNAMIC ASSIGNMENT
Character devices that request a dynamic allocation of major number will
take numbers starting from 254 and downward.
240-254 block LOCAL/EXPERIMENTAL USE
Allocated for local/experimental use. For devices not
assigned official numbers, these ranges should be
used in order to avoid conflicting with future assignments.
384-511 char RESERVED FOR DYNAMIC ASSIGNMENT
Character devices that request a dynamic allocation of major
number will take numbers starting from 511 and downward,
once the 234-254 range is full.
Character device drivers that call alloc_chrdev_region() to register a range of character device numbers will be assigned an unused major device number from the dynamic range. The same is true for character device drivers that call __register_chrdev() with the first argument (major) set to 0.
Some external ("out-of-tree") Linux kernel modules have a module parameter to allow their default major device number to be specified at module load time. That is useful for drivers that do not create their "/dev" entries dynamically, but want some flexibility for the system administrator to choose a major device number when creating device files manually with mknod.
docs:
https://www.oreilly.com/library/view/linux-device-drivers/0596000081/ch03s02.html
https://tldp.org/LDP/tlk/dd/drivers.html
How to find the major and minor numbers of an existing device node:
ls -l /dev/
cat /proc/devices lists the major numbers registered by drivers; lsblk shows the MAJ:MIN pair of each block device.
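The same numbers that ls -l prints can also be read programmatically. A minimal sketch using Python's standard os and stat modules (the helper names are made up for this example):

```python
import os
import stat


def split_dev(rdev):
    """Split a raw device number into (major, minor)."""
    return os.major(rdev), os.minor(rdev)


def device_numbers(path):
    """Return (type, major, minor) for a device file, with type 'c' or 'b'."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "c"
    elif stat.S_ISBLK(st.st_mode):
        kind = "b"
    else:
        raise ValueError(f"{path} is not a device file")
    major, minor = split_dev(st.st_rdev)
    return kind, major, minor


# Only runs where /dev/null exists
if os.path.exists("/dev/null"):
    print(device_numbers("/dev/null"))
```

The three values it returns are exactly the type, major and minor arguments that mknod expects.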
I want to write a kernel module that reads the DRAM counters to get the amount of data read from DRAM (https://software.intel.com/en-us/articles/monitoring-integrated-memory-controller-requests-in-the-2nd-3rd-and-4th-generation-intel).
In that page, they say
"The BAR is available (in PCI configuration space) at Bus 0; Device 0; Function 0; Offset 048H", and UNC_IMC_DRAM_DATA_READS, which I want to read, is on "BAR + 0x5050".
Does this mean that I can get the physical address of the DRAM counter by running
sudo setpci -s 00:00.0 48.L
and then adding 0x5050 to get the address of UNC_IMC_DRAM_DATA_READS?
Actually,
sudo setpci -s 00:00.0 48.L
outputs
fed10001
and I accessed 0xfed15051 with busybox:
sudo busybox devmem 0xfed15051
However, the two leftmost hex digits (the "00" in 0x00123456) are always zero.
What went wrong, and how can I get the physical address correctly from the bus, device, function, and offset?
Thank you :)
The low bit is an enable bit and should be excluded from the address you use. See for example https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e3-1200v6-vol-2-datasheet.pdf (section 3.12 page # 57) -- where it's documented as the MCHBAREN flag.
This document also provides detailed register descriptions of the same registers mentioned in that tech note -- starting at section 7.43 page # 202.
In general, accesses to PCI registers are pretty much always done on 32-bit (DWORD) boundaries. You'll almost never find a counter that overlaps 32-bit words.
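The arithmetic can be sketched like this. It only masks the enable bit mentioned above and uses the register value and offset from the question; the names are made up for the example, and you should check the datasheet for your CPU for the exact field layout.

```python
ENABLE_BIT = 0x1         # bit 0 of the BAR register (MCHBAREN)
COUNTER_OFFSET = 0x5050  # UNC_IMC_DRAM_DATA_READS, offset taken from the question


def counter_address(bar_register, offset=COUNTER_OFFSET):
    """Strip the enable bit from the BAR value, then add the counter offset."""
    base = bar_register & ~ENABLE_BIT
    return base + offset


addr = counter_address(0xFED10001)  # value printed by setpci in the question
print(hex(addr))  # 0xfed15050
```

So the address to read is 0xfed15050, which is 32-bit aligned, not 0xfed15051.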
I'm trying to write an Ansible playbook to automatically scan for new disks, add them to an existing VG, and then extend it.
Unfortunately I can't figure out how Linux knows the next device mapper (for example /dev/sdc), which I need in order to write an Ansible playbook that performs this task for me.
Scanning new disk online:
echo 0 0 0 | tee /sys/class/scsi_host/host*/scan
Does anyone have any idea about this?
Thanks.
You are using confusing terminology. Device mapper is a framework used by LVM; occasionally people use "device mapper" as a name for the devices created by applications that use it. Those can usually be found in /dev/mapper.
/dev/sdc (and all other /dev/sd[a-z][a-z]?) are just block devices. They CAN be used by LVM to create a PV (physical volume), but they aren't "device mapper".
Now to the answer:
Linux uses the next available letter of the alphabet for a new device. Unfortunately, "next available" may mean different things to the kernel and to the user. If a device has been unplugged (or died, or was rescanned with a reset) while the underlying device is still marked as in use, Linux will use the next letter, so a replugged /dev/sdc may appear as /dev/sdd or, if /dev/sdd is busy, /dev/sde, down to /dev/sdja (I'm not sure where it ends, but there is no such thing as /dev/sdzz AFAIK).
If you want to identify your devices, you can use the symlinks provided by udev. They live in /dev/disk and reflect different ways of identifying devices:
- by-id - by device ID (usually built from the model and serial number, or the WWN)
- by-partuuid - by the UUID of an existing partition on the disk
- by-uuid - by the UUID of the filesystem on the device
- by-path - by its hardware path (controller and port).
I think the last one is the best: if you plug your device into the same slot, it will have the same name in /dev/disk/by-path regardless of vendor, ID, existing filesystems, and the state of other block devices.
Here are a few examples of names you may find there:
pci-0000:00:1f.2-ata-3 - ATA disk #3 attached to a specific controller at PCI.
pci-0000:08:00.0-sas-0x50030480013afa6c-lun-0 - SAS drive with WWN 0x50030480013afa6c attached to a specific PCI controller.
pci-0000:01:00.0-scsi-0:2:1:0 - 'strange' scsi device #2 attached to a specific PCI controller. In my case it is a hardware RAID by LSI.
If you really want to handle new devices regardless of their names, look at udev rules, which let you react to new devices. Dealing with udev may be tricky; here is an example of such scripts in the Ceph project, which automatically processes all disks with specific partition IDs via udev rules: https://github.com/ceph/ceph/tree/master/udev
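To see which /dev/sdX name currently sits behind each stable name, you can resolve the symlinks. A small sketch (the function name is made up; it only assumes that /dev/disk/by-path contains symlinks, as described above):

```python
import os


def resolve_links(directory):
    """Map each symlink name in `directory` to the real path it points to."""
    mapping = {}
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if os.path.islink(full):
            mapping[name] = os.path.realpath(full)
    return mapping


by_path = "/dev/disk/by-path"
if os.path.isdir(by_path):  # skipped on systems without udev symlinks
    for name, target in resolve_links(by_path).items():
        print(f"{name} -> {target}")
```

The same function works for by-id, by-uuid and by-partuuid if you pass a different directory.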
What about this?
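- name: Find /sys/class/scsi_host hostX softlinks
  find:
    paths: '/sys/class/scsi_host'
    file_type: link
    patterns: 'host*'
  register: _scsi_hosts

- name: Rescanning for new disks
  # shell (not command) is needed, because command does not process the ">" redirection
  shell: 'echo "- - -" > {{ item }}/scan'
  changed_when: false
  # loop over the path of every link found by the previous task
  with_items: "{{ _scsi_hosts.files | map(attribute='path') | list }}"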
In Linux, is there a way to find out which PCI card is plugged into which PCI slot?
/sys/bus/pci/devices/ contains many devices (bridges, CPU channels, etc.) that are not cards and I was not able to find any information about slot-card mappings in the device directories.
You can use
dmidecode -t slot
to find all available PCI slots.
Then you can run
lspci -s <slot number>
to list the device connected to the specified slot. Take the bus address from the first command and use it as the parameter of the second command.
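If you want to script the two steps, the slot-to-address mapping can be pulled out of the dmidecode output. This is only a sketch: the function name is made up, the sample text is a hypothetical excerpt, and it assumes your dmidecode prints "Designation:" and "Bus Address:" lines for each slot (some firmware does not report a bus address).

```python
def slot_addresses(dmidecode_text):
    """Map each slot designation to its bus address, where one is reported."""
    slots = {}
    designation = None
    for line in dmidecode_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # header lines without "key: value"
        if key == "Designation":
            designation = value.strip()
        elif key == "Bus Address" and designation is not None:
            slots[designation] = value.strip()
    return slots


# Hypothetical excerpt in the style of `dmidecode -t slot`
sample = """Handle 0x0901, DMI type 9, 17 bytes
System Slot Information
\tDesignation: PCIe Slot 1
\tCurrent Usage: In Use
\tBus Address: 0000:07:00.0

Handle 0x0902, DMI type 9, 17 bytes
System Slot Information
\tDesignation: PCIe Slot 2
\tCurrent Usage: Available
\tBus Address: 0000:08:00.0
"""

for slot, address in slot_addresses(sample).items():
    print(f"{slot}: lspci -s {address}")
```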
Nebojsa's answer is good, but here's a little more information and an answer to magmabyte's comment.
dmidecode gives you the number of slots; however, those slots are not the only things using the PCI bridge, which is why you see many more devices than slots.
Secondly, you may see multiple "devices" per slot, but they are likely just multiple ports on the same card. To give you an example using network interface cards (NICs):
megaman#someserver $ lspci | grep 10Gb
07:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (rev 02)
07:00.1 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (rev 02)
dmidecode indicates that this server has three slots (and it does). Slot 1 has the 10Gb NIC above (you can see that it has 2 ports), slot 2 has a fibre channel card (which also happens to have 2 ports), and finally slot 3 is empty.
There are three physical slots in the server, one is empty, two are filled with multi port cards (an HBA and a NIC).
To answer your question in the comment, the 3 slots you have are the ones indicated by dmidecode and they are likely populated with multi port interface cards.
In my kickstart, I use the following to determine the NIC to use for the OS. For example, some of our servers use an HPE 562SFP+ 2port 10Gb NIC. It would be:
NICPROD=562
USENIC=''
for NIC in /sys/class/net/e*; do
    NIC=$(basename ${NIC})
    FOUNDNIC=$(lspci -s $(ethtool -i ${NIC} | awk '/bus-info/ { print $2 }' | cut -d: -f2-) -vv | grep -E 'Product Name:')
    if [[ "${FOUNDNIC}" == *${NICPROD}* && "${FOUNDNIC}" != *"FLR"* ]]; then
        USENIC=${NIC}
        break
    fi
done
Hopefully this helps?
I am trying to measure certain hardware events on a (Intel Xeon) machine with multiple (physical) processors. Specifically, I wish to know how many requests are issued for reading 'offcore' data.
I found the OFFCORE_REQUESTS hardware event in Intel's documentation, which gives the event descriptor 0xB0 and, for data demands, the additional mask 0x01.
Would it then be correct to tell perf to record the event 0xB1 (i.e. 0xB0 | 0x01) and to call it as:
perf record -e r0B1 ./mytestapp someargs
Or is this incorrect?
Because perf report shows no output for events entered like this.
The perf documentation is rather sparse in this area, apart from a tutorial entry which does not say which event it was (though this one works for me), or how it was encoded...
Any help is greatly appreciated.
Ok, so I guess I figured it out.
For the Intel machine I use, the format is as follows:
<umask><eventselector> where both are hexadecimal values. The leading zeros of the umask can be dropped, but not for the event selector.
So for the event 0xB0 with the mask 0x01 I can call:
perf record -e r1B0 ./mytestapp someargs
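To avoid getting the order wrong, the raw string can be built from the two values. A small sketch of the <umask><eventselector> layout described above (the helper name is made up, and it only covers these two 8-bit fields, not edge/inv/cmask):

```python
def raw_event(event_select, umask):
    """Build a perf raw event string: umask in the high byte, event in the low byte."""
    if not (0 <= event_select <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event_select and umask must each fit in 8 bits")
    code = (umask << 8) | event_select
    return f"r{code:X}"


print(raw_event(0xB0, 0x01))  # r1B0
```

Note that OR-ing the two values without the shift (0xB0 | 0x01 = 0xB1) gives a different event selector, which is why r0B1 did not work.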
I could not manage to find the exact parsing of it in the perf kernel code (any kernel hacker here?), but I found these sources:
- A description of the use of perf with raw events in c't magazine 13/03 (subscription required), which describes some raw events with their descriptions from the Intel Architecture Software Developer's Manual (Vol. 3B)
- A patch on the kernel mailing list discussing the proper way to document it. It stated that the pattern above "... was x86 specific and imcomplete at that"
- (Updated) The man page of newer versions shows an example on Intel machines: man perf-list
Update:
As pointed out in the comments (thank you!), the libpfm translator can be used to obtain the proper event descriptor. The website linked in the comments (Bojan Nikolic: How to monitor the full range of CPU performance events), discovered by user 'osgx' explains it in further detail.
It seems you can also use:
perf record -e cpu/event=0xB0,umask=0x1/u ./mytestapp someargs
I don't know where this syntax is documented.
You can probably use the other arguments (edge, inv, cmask) as well.
There are several libraries which can be helpful to work with raw PMU events.
perf's own wiki https://perf.wiki.kernel.org/index.php/Tutorial#Events recommends the perf list man page (perf list --help) for information about raw event encoding. Modern perf versions also list raw events as part of the perf list output ("... if linked against the libpfm4 library, provides some short description of the events."). perf list --details will also print the raw ids and masks of events.
Bojan Nikolic has a blog article, "How to monitor the full range of CPU performance events", about using the libpfm4 (perfmon2) library to encode raw events for perf with the help of the showevtinfo and check_events tools, which are provided with the same library.
There is also ocperf, a Python wrapper around perf that accepts Intel's event names. It is written by Andi Kleen (Intel Open Source Technology Center) as part of the pmu-tools set of utilities (LWN post from 2013, event lists by Intel at https://download.01.org/perfmon/). There is a demo of ocperf (2011) at http://halobates.de/modern-pmus-yokohama.pdf:
ocperf
•Perf wrapper to support Intel specific events
•Allows symbolic events and some additional events
ocperf record -a -e offcore_response.any_data.remote_dram_0 sleep 10
The PAPI library also has a tool to explore raw events with some descriptions: papi_native_avail.