NVM Express Submission Queue Entry Command Format - linux

In NVMe Command format of Submission queue it says Metadata Pointer (MPTR) contains an address of a single contiguous physical buffer that is byte aligned. I am not understanding about this Metadata is whose? Is it the metadata of any file for which i have issued read/write/flush command?

According to the NVMe Spec 1.2, section 5.16 concerning the NVMe Format command, only DWORD 10 is utilized and all other fields are reserved. This means that the Metadata pointer (i.e CDW4 and CDW5) is reserved and should be initialized to 0. It is important to set CDW1 (ie. the NameSpace ID) to the NameSpace which you want to format (or can be set to 0xFFFFFFFF if the NVMe controller supports formatting of ALL Namespaces). And, as is always the case with NVMe Admin commands, you must set CDW0 to indicate which Admin Command you are issuing (i.e. you set the OPC field to 0x80 to indicate you are issuing a Format NVMe command)

Related

NVME sensor reading error with more than 1 NVME configured in entity manager

Hi, I'm trying to read NVMe sensors using NVMeSensor from dbus-sensors. I have configured for 4 Nvmes in my *.json file of entity-manager (EM) config and it logged "Sensor x error reading" for all. I put the config in the common EM config for the board together with Fan sensors, ADCsensors and others, refering this (https://github.com/ibm-openbmc/entity-manager/blob/14a7bc9303d747dbc20cb702083e7af0a3cf0496/configurations/NVME%20P4000.json#L10-L41). In this case, I see that boost::asio::async_read at https://github.com/openbmc/dbus-sensors/blob/ce6bcdfc28f60173093087050a43adbc586fd6fa/src/NVMeBasicContext.cpp#L290 returns the response of size 0. But the resp from https://github.com/openbmc/dbus-sensors/blob/ce6bcdfc28f60173093087050a43adbc586fd6fa/src/NVMeBasicContext.cpp#L83 has size of 6 and valid value.
Howerver, when I config only 1 nvme in EM, it returns value normally on dbus.
I wonder if NVMeSensor only support nvme with a fru and we have to have a single json file for each just like NVMEP4000.json.
What should I do when I want to config all the nvme inside the EM config of the board?. Since I can't find any reference.
I have not found the meaning of "Address" in NVME1000 config since it will use 0x6a anyway, at least to what I have seen. Can you tell me what is it for?
I'm really new to OpenBMC and don't get much of the mechanism of the code, please help to remedy my understanding if it's not correct. Any advice from you will be appreciated a lot.
Thank you.
Edited
I realize that when 1 of the NVME is not present, all of them will fails. I think the failed one affects the stream for reading or the response stream (respStream) although each nvme has a separate request stream (reqStream). I don't know why they interfere each other, but I see that when the resp size from smbus is < 0, they still write them to the stream without resizing the resp vector like when the size is normal, I add the resp.resize(len) here (https://github.com/openbmc/dbus-sensors/blob/ce6bcdfc28f60173093087050a43adbc586fd6fa/src/NVMeBasicContext.cpp#L153), it works, and we can do hot plug. Is that because I did not use FRU probe for the NVMEs....?
I wonder if NVMeSensor only support nvme with a fru and we have to have a single json file for each just like NVMEP4000.json.
The "Probe" field in entity-manager configuration json is used for probe rules for the device. FRU is just one way. For example, if you know the exact i2c bus and address, you can use something like
xyz.openbmc_project.Inventory.Decorator.I2CDevice({'Bus': 4, 'Address': 60})
^ ^ ^
DBus Interface | Value
Property
And "Probe" can be an array with AND OR operators. Like this example.
What should I do when I want to config all the nvme inside the EM config of the board?.
I think adding all 4 NVME1000 blocks to your board json will do this, as long as they have different names and bus-address configuration.
I have not found the meaning of "Address" in NVME1000 config since it will use 0x6a anyway, at least to what I have seen. Can you tell me what is it for?
On Intel P4000 series SSDs, 0x53 (What in nvme_p4000.json) is the 7bit address of the FRU eeprom, while 0x6a is the 7bit address for NVM Express Basic Management Command (Appendix A of NVMe-MI 1.2b Specification). These addresses are only documented in the product spec that not generally available :(
Putting all nvme configs inside the baseboard EM config is OK. There are hotplug issues with dbus-sensors nvmesensor, so when one of my configured nvme is not present, all the others will fail. I only plugged 1 nvme to one of the 4 slots so it causes the problem. I was told they are checking on this, but I'm doing the trick I put in the Edit section of my question.
They hardcode 0x6A for i2c address in nvmesensor code, the reason is as #KagurazakaKotori said.

device driver documentation for linux

in the book 2017 " UNIX and Linux System Administration " i've read the article below :
Modern systems manage their device files automatically. However, a few rare corner
cases may still require you to create devices manually with the mknod command.
So here’s how to do it:
mknod filename type major minor
Here, filename is the device file to be created, type is c for a character device or b
for a block device, and major and minor are the major and minor device numbers.
If you are creating a device file that refers to a driver that’s already present in your
kernel, check the documentation for the driver to find the appropriate major and
minor device numbers.
where can i find this doc and how to find Major & Minor for a device driver ???
The command cat /proc/devices shows the character and block major device numbers in use by drivers in the currently running Linux kernel, but provides no information about minor device numbers.
There is a list of pre-assigned (reserved) device numbers in the Linux kernel user's and administrator's guide: Linux allocated devices (4.x+ version). (The same list also appears in "Documentation/admin-guide/devices.txt" in the Linux kernel sources.) The list shows how minor device numbers are interpreted for each pre-assigned character and block major device number.
Some major device numbers are reserved for local or experimental use, or for dynamic assignment:
60-63 char LOCAL/EXPERIMENTAL USE
60-63 block LOCAL/EXPERIMENTAL USE
Allocated for local/experimental use. For devices not
assigned official numbers, these ranges should be
used in order to avoid conflicting with future assignments.
120-127 char LOCAL/EXPERIMENTAL USE
120-127 block LOCAL/EXPERIMENTAL USE
Allocated for local/experimental use. For devices not
assigned official numbers, these ranges should be
used in order to avoid conflicting with future assignments.
234-254 char RESERVED FOR DYNAMIC ASSIGNMENT
Character devices that request a dynamic allocation of major number will
take numbers starting from 254 and downward.
240-254 block LOCAL/EXPERIMENTAL USE
Allocated for local/experimental use. For devices not
assigned official numbers, these ranges should be
used in order to avoid conflicting with future assignments.
384-511 char RESERVED FOR DYNAMIC ASSIGNMENT
Character devices that request a dynamic allocation of major
number will take numbers starting from 511 and downward,
once the 234-254 range is full.
Character device drivers that call alloc_chrdev_region() to register a range of character device numbers will be assigned an unused major device number from the dynamic range. The same is true for character device drivers that call __register_chrdev() with the first argument (major) set to 0.
Some external ("out-of-tree") Linux kernel modules have a module parameter to allow their default major device number to be specified at module load time. That is useful for drivers that do not create their "/dev" entries dynamically, but want some flexibility for the system administrator to choose a major device number when creating device files manually with mknod.
docs:
https://www.oreilly.com/library/view/linux-device-drivers/0596000081/ch03s02.html
https://tldp.org/LDP/tlk/dd/drivers.html
how to find the appropriate minor & major number for a device number:
ls -l /dev/
cat /proc/devices shows the same as lsblk

DMIDecode product_uuid and product_serial.what is the difference?

There are product_uuid and product_serial files in dir /sys/class/dmi/id/.
How it are generated? What is the difference?
Can I change this files?
Is it save a value after reinstall operation system?
How it are generated?
Those values are generated in kernel code. You can find them pretty easily using git grep command (with keywords you are interested in) in your kernel source directory:
$ git grep --all-match -n -e '\bdmi\b' -e product_uuid -e product_serial
So, product_uuid and product_serial sysfs nodes are created in drivers/firmware/dmi-id.c:
DEFINE_DMI_ATTR_WITH_SHOW(product_serial, 0400, DMI_PRODUCT_SERIAL);
DEFINE_DMI_ATTR_WITH_SHOW(product_uuid, 0400, DMI_PRODUCT_UUID);
From DEFINE_DMI_ATTR_WITH_SHOW definition you can see that both attributes are accessed via sys_dmi_field_show() function, which in turn calls dmi_get_system_info(), which just returns corresponding element from dmi_ident array. This table is populated in dmi_decode() routine:
dmi_save_ident(dm, DMI_PRODUCT_SERIAL, 7);
dmi_save_uuid(dm, DMI_PRODUCT_UUID, 8);
So product_uuid is generated in dmi_save_uuid() function. Just read its code to understand how it's done.
product_serial is generated in dmi_save_ident() function. It boils down to code like this:
(struct dmi_header *)(dmi_base)[7];
where dmi_base is address (remapped to virtual memory obviously) of DMI table, and 7 corresponds to DMI_PRODUCT_SERIAL constant.
To better understand this please see SMBIOS specification, specifically Table 9 – System Information (Type 1) Structure, which corresponds to this command:
# dmidecode --type 1
What is the difference?
As for product_uuid -- look at SMBIOS specification, section 7.2.1 System - UUID. It has description and also table with explanation for each part of this number. Using that table you can decode your UUID and extract some information from it, like timestamp, etc.
As for product_serial -- I believe it's self-explanatory, it's just a serial number for your device. You can usually find it printed on some sticker on your computer. For example, for my laptop it's on the bottom. It's the same string that I see in /sys/class/dmi/id/product_serial.
Can I change this files?
Those files are actually not real files but just an interface to kernel functions. Read about sysfs for details. So in order to "change" those files you need to edit mentioned kernel files accordingly, then rebuild the whole kernel and boot it (instead of one provided by your distribution).
Also, as #ChristopheVu-Brugier mentioned in comment, you can change those values in DMI table (in some tricky way though). But I wouldn't recommend it. Those values definitely have some meaning and may be useful in some cases (if not for you, then for some software in your PC).
Is it save a value after reinstall operation system?
Those values are actually obtained from DMI table, which is hardcoded along with BIOS to permanent memory (flash chip with BIOS on your motherboard) and you just read those values from this DMI table using kernel functions by reading those files.

AHCI specification

I have a question regarding the AHCI spec:
Is the variable pDmaXferCnt in the port used when the transfer is a DMA write or read?
The description in the spec seems to indicate that it isn't, but the PRDs are used instead.
But how does the HBA know how much data is to be sent or received to/from a SATA device?
This information will be available in the sector count of a H2D FIS, but unless I have overlooked it there doesn't seem to be a register of variable that holds this value.
The DX:transmit state also seems to indicate that pDmaXferCnt will have a set value, yet I can’t see where it would be set for a DMA read/write operation.
Thanks
From the spec:
"Implementation Note:
HBA state variables are used to describe the required externally visible behavior. Implementations are not required to have internal state values that directly correspond to these variables." - meaning you (maybe) won't find the pDmaXferCnt externally in a register.
There is another way to track the count though.
Under the HBA Memory Space Usage part of the AHCI spec, there are the data structures of the command list (list of command headers) and the command table (pointed to by the command header, each command table is a command to be sent). These are both accessible to the HBA.
In the command header in DW0 is the PRDTL - which is the count of how many PRD's to be used in the transfer.
Now in the actual command table that the command header points to, contains the actual PRD's, in each PRD is their own DBC or data byte count (amount of data in bytes to be DMA'd at the location specified in DBA). So if you take the each of the PRD's * there own DBC's and add them up you'll get the amount of data to be transfered.
Alternately in the in the command header DW1 is the PRDBC which is the count of bytes transfered, so you could check that after the command.
HBA - Host Bus Adapter
PRDTL - Physical Region Descriptor Table Length
PRD - Physical Region Descriptor (tracks where in physical memory and the count of bytes is to be transfered)
DBC - data byte count (inside a PRD)
DBA - data base address (physical address inside a PRD)
PRDBC - Physical Region Descriptor Byte Count
DMA - direct memory access
For more reading: http://www.intel.com/content/www/us/en/io/serial-ata/serial-ata-ahci-spec-rev1-3-1.html

Acquring Processor ID in Linux

On Microsoft Windows you can get Processor ID (not Process ID) via WMI which is based in this case (only when acquiring Processor ID) on CPUID instruction
Is there a similar method to acquire this ID on Linux ?
I do not know what WMI is and MS-Windows "CPUID instruction", since I do not know or use MS-Windows (few users here do). So I cannot say for sure if this is offering the same information, but have a try with cat /proc/cpuinfo. If you require a specific value you can grep that out easily.
If you need do to this from within a program then you can use the file utils to read such information. Always keep in mind one of the most basic principles of 'unix' style operating systems: everything is a file.
For context of the OP's question, ProcessorID value returned by WMI is documented thus:
Processor information that describes the processor features. For an
x86 class CPU, the field format depends on the processor support of
the CPUID instruction. If the instruction is supported, the property
contains 2 (two) DWORD formatted values. The first is an offset of
08h-0Bh, which is the EAX value that a CPUID instruction returns with
input EAX set to 1. The second is an offset of 0Ch-0Fh, which is the
EDX value that the instruction returns. Only the first two bytes of
the property are significant and contain the contents of the DX
register at CPU reset—all others are set to 0 (zero), and the contents
are in DWORD format.
As an example, on my system:
C:\>wmic path Win32_Processor get ProcessorId
ProcessorId
BFEBFBFF000206A7
Note that the ProcessorID is simply a binary-encoded format of information usually available in other formats, specifically the signature (Family/Model/Stepping/Processor type) and feature flags. If you only need the information, you may not actually need this ID -- just get the already-decoded information from /proc/cpuinfo.
If you really want these 8 bytes, there are a few ways to get the ProcessorID in Linux.
With root/sudo, the ID is contained in the output of dmidecode:
<snip>
Handle 0x0004, DMI type 4, 35 bytes
Processor Information
Socket Designation: CPU Socket #0
Type: Central Processor
Family: Other
Manufacturer: GenuineIntel
ID: A7 06 02 00 FF FB EB BF
<snip>
Note the order of bytes is reversed: Windows returns the results in Big-Endian order, while Linux returns them in Little-Endian order.
If you don't have root permissions, it is almost possible to reconstruct the ProcessorID from /proc/cpuinfo by binary-encoding the values it returns. For the "signature" (the first four bytes in Windows/last four bytes in Linux) you can binary encode the Identification extracted from /proc/cpuinfo to conform to the Intel Documentation Figure 5-2 (other manufacturers use it for compatibility).
Stepping is in bits 3-0
Model is in bits 19-16 and 7-4
Family is in bits 27-20 and 11-8
Processor type is in bits 13-12 (/proc/cpuinfo doesn't tell you this, assume 0)
Similarly, you can populate the remaining four bytes by iterating over the feature flags (flags key in /proc/cpuinfo) and setting bits as appropriate per Table 5-5 of the Intel doc linked above.
Finally, you can install the cpuid package (e.g., on Ubuntu, sudo apt-get install cpuid). Then by running the cpuid -r (raw) command you can parse its output. You would combine the values from the EAX and EDX registers for an initial EAX value of 1:
$ cpuid -r
CPU 0:
0x00000000 0x00: eax=0x0000000d ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
0x00000001 0x00: eax=0x000206a7 ebx=0x00020800 ecx=0x9fba2203 edx=0x1f8bfbff
<snip>

Resources