NVME sensor reading error with more than 1 NVME configured in entity manager - linux

Hi, I'm trying to read NVMe sensors using NVMeSensor from dbus-sensors. I configured 4 NVMe drives in my entity-manager (EM) *.json config, and it logged "Sensor x error reading" for all of them. I put the config in the board's common EM config together with the fan sensors, ADC sensors and others, referring to this (https://github.com/ibm-openbmc/entity-manager/blob/14a7bc9303d747dbc20cb702083e7af0a3cf0496/configurations/NVME%20P4000.json#L10-L41). In this case, I see that boost::asio::async_read at https://github.com/openbmc/dbus-sensors/blob/ce6bcdfc28f60173093087050a43adbc586fd6fa/src/NVMeBasicContext.cpp#L290 returns a response of size 0, but the resp from https://github.com/openbmc/dbus-sensors/blob/ce6bcdfc28f60173093087050a43adbc586fd6fa/src/NVMeBasicContext.cpp#L83 has a size of 6 and a valid value.
However, when I configure only 1 NVMe in EM, it returns values normally on D-Bus.
I wonder if NVMeSensor only supports NVMe drives with a FRU, so that we need a separate json file for each, just like NVMEP4000.json.
What should I do if I want to configure all the NVMe drives inside the board's EM config? I can't find any reference.
I have not found the meaning of "Address" in the NVME1000 config, since the code will use 0x6a anyway, at least from what I have seen. Can you tell me what it is for?
I'm really new to OpenBMC and don't understand much of how the code works, so please correct my understanding if it's wrong. Any advice will be appreciated a lot.
Thank you.
Edited
I realized that when 1 of the NVMe drives is not present, all of them fail. I think the failed one affects the stream used for reading, or the response stream (respStream), although each NVMe has a separate request stream (reqStream). I don't know why they interfere with each other, but I see that when the resp size from smbus is < 0, the code still writes the resp to the stream without resizing the resp vector as it does when the size is normal. When I add resp.resize(len) here (https://github.com/openbmc/dbus-sensors/blob/ce6bcdfc28f60173093087050a43adbc586fd6fa/src/NVMeBasicContext.cpp#L153), it works, and hot plug works too. Is that because I did not use a FRU probe for the NVMe drives?

I wonder if NVMeSensor only support nvme with a fru and we have to have a single json file for each just like NVMEP4000.json.
The "Probe" field in entity-manager configuration json is used for probe rules for the device. FRU is just one way. For example, if you know the exact i2c bus and address, you can use something like
xyz.openbmc_project.Inventory.Decorator.I2CDevice({'Bus': 4, 'Address': 60})
^                                                   ^     ^
DBus Interface                                      |     Value
                                                    Property
"Probe" can also be an array combined with AND and OR operators, as in this example.
What should I do when I want to config all the nvme inside the EM config of the board?.
I think adding all 4 NVME1000 blocks to your board json will do this, as long as they have different names and bus-address configuration.
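As a rough illustration of "four blocks with different names and bus-address configuration", the sketch below generates such entries. The field names (Type, Name, Bus, Address) are assumed from the NVME P4000.json config linked in the question, and the bus numbers are hypothetical placeholders.

```python
import json


def nvme_exposes(buses, address="0x6a"):
    """Build one NVME1000 entry per I2C bus, each with a unique name.

    Field names are an assumption based on the linked NVME P4000.json.
    """
    return [
        {
            "Type": "NVME1000",
            "Name": f"NVMe {index} Temp",
            "Bus": bus,
            "Address": address,
        }
        for index, bus in enumerate(buses, start=1)
    ]


# Hypothetical bus numbers; substitute the buses your slots are wired to.
entries = nvme_exposes([10, 11, 12, 13])
print(json.dumps(entries, indent=4))
```

The printed objects would go into the "Exposes" array of the board json, next to the fan and ADC sensor entries.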
I have not found the meaning of "Address" in NVME1000 config since it will use 0x6a anyway, at least to what I have seen. Can you tell me what is it for?
On Intel P4000 series SSDs, 0x53 (the value in nvme_p4000.json) is the 7-bit address of the FRU EEPROM, while 0x6a is the 7-bit address for the NVM Express Basic Management Command (Appendix A of the NVMe-MI 1.2b Specification). These addresses are only documented in the product spec, which is not generally available :(
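Since the addresses above are given in 7-bit form, it may help to see how they map to the 8-bit form that some datasheets and specifications quote instead (the 7-bit address shifted left by one, with the read/write flag in the lowest bit). A small sketch; the helper names are made up for illustration:

```python
def to_8bit_write_address(addr_7bit):
    """Return the 8-bit write address (R/W bit = 0) for a 7-bit I2C address."""
    if not 0 <= addr_7bit <= 0x7F:
        raise ValueError("not a 7-bit address")
    return addr_7bit << 1


def to_8bit_read_address(addr_7bit):
    """Return the 8-bit read address (R/W bit = 1) for a 7-bit I2C address."""
    return to_8bit_write_address(addr_7bit) | 1


for name, addr in (("FRU EEPROM", 0x53), ("Basic Management Command", 0x6A)):
    print(f"{name}: 7-bit 0x{addr:02x}, 8-bit write 0x{to_8bit_write_address(addr):02x}")
```

So 0x6a and 0xd4 refer to the same device, as do 0x53 and 0xa6; Linux i2c tools and the EM config use the 7-bit form.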

Putting all the NVMe configs inside the baseboard EM config is OK. There are hotplug issues in the dbus-sensors nvmesensor: when one of the configured NVMe drives is not present, all the others fail. I had only plugged 1 NVMe into one of the 4 slots, which caused the problem. I was told they are looking into this, but for now I'm using the workaround I described in the Edited section of my question.
They hardcode 0x6A as the i2c address in the nvmesensor code; the reason is as #KagurazakaKotori said.


More memory for ESP32 when using bluetooth, SPIFFS and WiFi

I'm working on a project that uses the SPIFFS, Bluetooth and WiFi libraries. The program is set up so that the libraries don't interfere with each other's communication, since Bluetooth can't work while WiFi is active. But I get the following error when I add calls from the library https://github.com/mobizt/Firebase-ESP32, which is responsible for connecting to the Firestore database:
text section exceeds available space in board
Sketch uses 1517102 bytes (115%) of program storage space. Maximum is 1310720 bytes.
Global variables use 63300 bytes (19%) of dynamic memory, leaving 264380 bytes for local variables. Maximum is 327680 bytes.
Sketch too big; see http://www.arduino.cc/en/Guide/Troubleshooting#size for tips on reducing it.
Error compiling for board DOIT ESP32 DEVKIT V1.
I only get this error when adding this piece of code:
Firebase.begin(&config, &auth);
Firebase.reconnectWiFi(true);
I'm using the Arduino IDE to work with the ESP32, but I also have ESP-IDF in case it helps solve the issue.
It looks like you've reached the limit of the partition size for your application. I don't know how Arduino configures the ESP-IDF partitions, but you should be able to change them however you want. See the documentation on partitions.
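To get a feel for how much larger the application partition would need to be, here is a back-of-the-envelope sketch using the numbers from the error message. Rounding up to a 64 KiB (0x10000) boundary is an assumption made here to stay on the safe side of the app partition alignment rules; check the partition table documentation for your ESP-IDF version.

```python
APP_ALIGN = 0x10000  # assumed 64 KiB rounding for app partitions


def usage_percent(sketch_bytes, partition_bytes):
    """Integer percentage of the partition occupied by the sketch."""
    return sketch_bytes * 100 // partition_bytes


def min_app_partition_size(sketch_bytes, align=APP_ALIGN):
    """Smallest multiple of `align` that can hold the sketch."""
    return -(-sketch_bytes // align) * align


sketch = 1517102   # "Sketch uses 1517102 bytes"
current = 1310720  # "Maximum is 1310720 bytes" (0x140000)

print(usage_percent(sketch, current), "% of the current partition")
print(hex(min_app_partition_size(sketch)), "bytes needed at minimum")
```

This only sizes the app partition; the space has to come from somewhere else in flash, for example a smaller SPIFFS partition or dropping a second OTA slot.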

Linux: writing to the i2c/SMBus

I have a problem with the i2c/SMBus on a Linux system with an Intel Apollo Lake processor. I am trying to read from and write to an EEPROM but I'm facing some issues. My EEPROM is located at address 0x56, and I can watch the bus with my logic analyzer.
When I try to read from the device via i2c-tools (i2cget), the system behaves as expected. My issue occurs when I try to perform a write command, for example via i2cset. i2cset ends with an error (Write failed). Because I can watch the bus electrically, I can also say that all lines stay HIGH and the bus is not touched. I was able to activate the dev_dbg() calls in the i2c driver i2c_i801, and when I run i2cset I find this debug message in dmesg:
[ 765.095591] [2753] i2c_i801:i801_check_post:433: i801_smbus 0000:00:1f.1: No response
When running my minimal I²C Python code using the smbus2 lib, I get the following error message and the above mentioned debug message:
from smbus2 import SMBus
bus = SMBus(0)
b = bus.read_byte_data(0x56, 10)   # read offset 10 from address 0x56 (86): this succeeds
bus.write_byte_data(0x56, 10, 12)  # write 12 to offset 10: this fails with the error below
bus.close()
Error:
File "/usr/local/lib/python3.8/dist-packages/smbus2-0.4.0-py3.8.egg/smbus/smbus2.py", line 455, in write_byte_data
ioctl(self.fd, I2C_SMBUS, msg)
OSError: [Errno 6] No such device or address
A big hint for me is that I am not able to perform a write command anywhere in the address range from 0x50 to 0x57. My guess is that some driver locks this address range to prevent write commands to that "dangerous" area.
My question is: "Has anyone seen this kind of behavior, and is there a solution so that I can write to my EEPROM at address 0x56? Or is there a lock on the i2c address range from 0x50 to 0x57, and who is my opponent?"
I am kind of a newbie to the whole driver and kernel world, so please be kind; it is quite possible that I made a beginner mistake.
I would appreciate any hints and tips I can follow up on regarding my problem.
It seems that I found the cause of my problem. This forum post describes how Intel changed a configuration bit in the SMBus controller:
OK, I know what's going on.
Starting with the 8-Series/C220 chipsets, Intel introduced a new
configuration bit for the SMBus controller in register HOSTC (PCI
D31:F3 Address Offset 40h):
Bit 4 SPD Write Disable - R/WO.
0 = SPD write enabled.
1 = SPD write disabled. Writes to SMBus addresses 50h - 57h are
disabled.
This badly documented change in the configuration explains the issues.
One challenge is that changes to the SPD write bit only take effect after a reboot. Unfortunately, the BIOS resets the bit to its default while rebooting. The only solution seems to be a change in the BIOS.
For me, this issue is resolved. I just wanted to share this information in case someone faces the same issues.
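For anyone who wants to check whether their own system is affected, the following sketch decodes the HOSTC register as described in the quoted post (offset 40h, bit 4). The PCI device path is an assumption taken from the dmesg line in the question (0000:00:1f.1); the quoted post names D31:F3, so the function number may differ on other chipsets.

```python
from pathlib import Path

HOSTC_OFFSET = 0x40          # register offset from the quoted post
SPD_WRITE_DISABLE = 1 << 4   # bit 4: SPD Write Disable


def spd_write_disabled(hostc):
    """True if writes to SMBus addresses 0x50-0x57 are blocked."""
    return bool(hostc & SPD_WRITE_DISABLE)


def read_hostc(pci_device="0000:00:1f.1"):
    """Read the HOSTC byte from PCI config space via sysfs.

    The device address is an assumption; adjust it to your SMBus controller.
    """
    config = Path("/sys/bus/pci/devices") / pci_device / "config"
    with config.open("rb") as f:
        f.seek(HOSTC_OFFSET)
        return f.read(1)[0]


# Decoding two example register values without touching hardware:
print(spd_write_disabled(0x11))  # bit 4 set
print(spd_write_disabled(0x01))  # bit 4 clear
```

Calling spd_write_disabled(read_hostc()) on the target machine reports the current state; reading config space beyond the standard header typically requires root.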

BlueZ remote device presence

Using BlueZ, which is the official Linux Bluetooth stack, I'd like to know which of the two methods below is better suited for detecting a device's presence nearby.
To be more exact, I want to periodically scan for a Bluetooth device (not BLE => no advertisement packets are sent).
I found two ways of detecting it:
1.) Using l2ping
# l2ping BTMAC
2.) Using hcitool
# hcitool name BTMAC
Both approaches work.
I'd like to know, which approach would drain more battery of the scanned device?
Looking at solution #1 (l2ping's source):
It uses a standard socket connect call to connect to the remote device, then uses the send command to send data to it:
send(sk, send_buf, L2CAP_CMD_HDR_SIZE + size, 0)
Now, L2CAP_CMD_HDR_SIZE is 4 and the default size is 44, so altogether 48 bytes are sent as an L2CAP_ECHO_REQ and echoed back in the response.
For hcitool I have only found the entry point:
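The 4 + 44 = 48 byte arithmetic can be sketched by building such a packet by hand. This is an illustration only, not l2ping's code: the header layout (code, identifier, little-endian length) follows the L2CAP signalling command format, while the filler pattern and the identifier value are assumptions.

```python
import struct

L2CAP_CMD_HDR_SIZE = 4
L2CAP_ECHO_REQ = 0x08


def build_echo_request(ident, size=44):
    """Return header + payload for an L2CAP echo request of `size` data bytes."""
    # Header: 1 byte code, 1 byte identifier, 2 byte little-endian data length.
    header = struct.pack("<BBH", L2CAP_ECHO_REQ, ident, size)
    # Filler payload; the exact pattern is an assumption for illustration.
    payload = bytes((i % 40) + ord("A") for i in range(size))
    return header + payload


packet = build_echo_request(ident=200)
print(len(packet))  # header (4) + default payload (44)
```

Passing a smaller size shrinks only the payload; the 4-byte header is fixed.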
int hci_read_remote_name(int dd, const bdaddr_t *bdaddr, int len, char *name, int to);
My questions:
which of these approaches is better (less power-consuming) for the remote device? If there is any difference at all.
shall I reduce the l2ping's size? What shall be the minimum?
is my assumption correct that hci_read_remote_name also connects to the remote device and sends some kind of request to it for getting back its name?
To answer your questions:
which of these approaches is better (less power-consuming) for the remote device? If there is any difference at all.
l2ping BTMAC is the more suitable command purely because this is what it is meant to do. While "hcitool name BTMAC" is used to get the remote device's name, "l2ping" is used to detect its presence which is what you want to achieve. The difference in power consumption is really minimal, but if there is any then l2ping should be less power consuming.
shall I reduce the l2ping's size? What shall be the minimum?
If changing the l2ping size requires modifying the source code then I recommend leaving it the same. By leaving it the same you are using the same command that has been used countless times and the same command that was used to qualify the BlueZ stack. This way there's less chance for error and any change would not result in noticeable performance or power improvements.
is my assumption correct that hci_read_remote_name also connects to the remote device and sends some kind of request to it for getting back its name?
Yes, your assumption is correct. According to the Bluetooth Specification v5.2, Vol 4, Part E, Section 7.1.19 Remote Name Request Command:
If no connection exists between the local device and the device
corresponding to the BD_ADDR, a temporary Link Layer connection will
be established to obtain the LMP features and name of the remote
device.
I hope this helps.

netlink and big endian format

I have not found any document or note in the kernel that mandates passing 16/32-bit values in netlink messages to the kernel in network byte order. So my question is whether I have to use the htonl/htons functions when filling in a netlink message. Is there such a requirement at all?
According to this article, this can be controlled on a per-attribute basis:
There are two special flags which may be present in netlink
attributes, though I have yet to encounter them in my work.
NLA_F_NESTED: specifies a nested attribute; used as a hint for
parsing. Doesn’t always appear to be used, even if nested attributes
are present. NLA_F_NET_BYTEORDER: attribute data is stored in network
byte order (big endian) instead of host endianness
Update: it looks like host (little) endian byte order does not work in some cases: I get errno 4097 when passing the IPSET CREATE timeout that way. Network byte order works fine.
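To make the per-attribute idea concrete, here is a minimal sketch of packing a single 32-bit netlink attribute either way. The nlattr header (16-bit length, 16-bit type) stays in host byte order; only the payload changes, and NLA_F_NET_BYTEORDER is OR-ed into the type to mark it. The attribute type number used below is a hypothetical placeholder, not taken from any particular protocol header.

```python
import struct

NLA_F_NESTED = 1 << 15
NLA_F_NET_BYTEORDER = 1 << 14
NLA_HDRLEN = 4


def pack_u32_attr(attr_type, value, network_order=True):
    """Pack one u32 netlink attribute (header + 4-byte payload)."""
    if network_order:
        payload = struct.pack("!I", value)   # big endian, like htonl()
        attr_type |= NLA_F_NET_BYTEORDER     # flag the payload byte order
    else:
        payload = struct.pack("=I", value)   # host byte order
    # nla_len and nla_type are always in host byte order.
    header = struct.pack("=HH", NLA_HDRLEN + len(payload), attr_type)
    return header + payload


EXAMPLE_ATTR_TYPE = 6  # hypothetical attribute number for illustration
attr = pack_u32_attr(EXAMPLE_ATTR_TYPE, 300)
print(attr.hex())
```

Whether a given kernel subsystem accepts host order, network order, or both is decided by that subsystem's attribute policy, which matches the observation in the update above.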

How can I determine what MTD flash device is installed (e.g. get the ID or serial number)?

Using uClinux we have one of two flash devices installed, a 1GB flash or a 2GB flash.
The only way I can think of solving this is to somehow get the device ID, which is down in the device driver code; for me that is in:
drivers/mtd/devices/m25p80.c
I have been using the command mtdinfo (which comes from mtdutils binaries, derived from mtdinfo.c/h). There is various information stored in here about the flash partitions including flash type 'nor' eraseblock size '65536', etc. But nothing that I can identify the chip with.
It's not very clear to me how I can get information from "driver-land" into "user-land". I am looking at extending the mtdinfo command to print more information, but there are many layers...
What is the best way to achieve this?
At the moment, I have found no easy way to do this without code changes. However I have found an easy code change (probably a bit of a hack) that allows me to get the information I need:
In the relevant file (in my case drivers/mtd/devices/m25p80.c) you can call one of the following, where dev is the driver's struct device pointer:
dev_err(dev, "...");
dev_alert(dev, "...");
dev_warn(dev, "...");
dev_notice(dev, "...");
_dev_info(dev, "...");
These are defined in include/linux/device.h, so they are part of the Linux driver interface and you can use them from any driver.
I found that dev_err() and dev_alert() both get printed "on screen" at run time. However, all of these device messages can be found in /var/log/messages. Since I added a message in the format dev_notice(dev, "JEDEC id %06x\n", jedecid);, I could find the device ID with the following command:
cat /var/log/messages | grep -i jedec
Obviously using dev_err() or dev_alert() is not quite right! dev_notice() or even _dev_info() seem more appropriate.
Not yet marking this as the answer since it requires code changes - still hoping for a better solution if anyone knows of one...
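If the ID is needed by a program rather than by eye, the same search can be done in a few lines of Python. This is a sketch that assumes the message format shown above ("JEDEC id" followed by six hex digits); the sample log line, including its ID value, is made up for illustration.

```python
import re

# Matches the message format used in the dev_notice() call above.
JEDEC_RE = re.compile(r"JEDEC id ([0-9a-f]{6})", re.IGNORECASE)


def find_jedec_ids(lines):
    """Return every JEDEC ID found in the given log lines as integers."""
    return [int(m.group(1), 16) for line in lines if (m := JEDEC_RE.search(line))]


# Hypothetical log excerpt; on the target, read /var/log/messages instead.
sample_log = [
    "Jan  1 00:00:04 kernel: some unrelated message",
    "Jan  1 00:00:05 kernel: m25p80 spi0.0: JEDEC id 20ba18",
]
print([f"{jedec_id:06x}" for jedec_id in find_jedec_ids(sample_log)])
```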
Update
Although the above "solution" works, it's a bit crappy: it will certainly do the job and is good enough for mucking around. But I decided that if I am making code changes I may as well do it properly. So I have now implemented changes that add an interface in sysfs, so that you can get the flash ID with the following command:
cat /sys/class/m25p80/m25p80_dev0/device_id
The main function calls required for this are (in this order):
alloc_chrdev_region(...)
class_create(...)
device_create(...)
sysfs_create_group(...)
This should give enough of a hint for anyone wanting to do the same, though I can expand on that answer if anyone wants it.
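On the user-space side, reading that sysfs attribute might look like the sketch below. It assumes the attribute prints the 24-bit JEDEC ID as hexadecimal text, which depends on how the show function in the driver was written; the split into a manufacturer byte and a 16-bit device part follows the usual JEDEC ID layout.

```python
from pathlib import Path

# Path from the answer above; the attribute's text format is an assumption.
DEVICE_ID_PATH = "/sys/class/m25p80/m25p80_dev0/device_id"


def read_device_id(path=DEVICE_ID_PATH):
    """Read the flash ID from sysfs and parse it as a hexadecimal number."""
    return int(Path(path).read_text().strip(), 16)


def split_jedec_id(jedec_id):
    """Split a 24-bit JEDEC ID into (manufacturer byte, 16-bit device ID)."""
    return (jedec_id >> 16) & 0xFF, jedec_id & 0xFFFF


# Example with a made-up ID value, without touching sysfs:
manufacturer, device = split_jedec_id(0x20BA18)
print(f"manufacturer 0x{manufacturer:02x}, device 0x{device:04x}")
```

Comparing the device part against the two known chips is then enough to tell which flash is fitted.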
