The dmaengine in Linux - how to check the number of actually transferred bytes in a single SG transfer? - linux

The dmaengine in Linux significantly simplifies writing of drivers for devices using DMA, especially if they support and use scatter-gather (SG) transfers.
However, there is a problem in case if the length of such transfer is not known a priori. This situation is quite common. It may happen in case of USB transfers, or in case of AXI Stream transfers. In the case of an AXI Stream connected device, the transfer should be terminated by setting the tlast signal to '1'. The driver should prepare the buffer for the longest possible transfer, but after the transfer is complete, it should be able to find the number of actually transferred bytes. Unfortunately it seems, that there is no documented way to read the length of the completed transfer.
The single transfer is represented with an SG-table, that is later converted into the dma_async_tx_descriptor, using the dmaengine_prep_slave_sg function , submitted to the DMA layer (e.g. like here), and finally scheduled for execution via dma_async_issue_pending. After that, the transfer descriptor may be accessed only via a dma_cookie returned by the dmaengine_prep_slave_sg function.
The problem with determining the actual length of the transfer was reported for the USB transfers, and a patch adding the transferred field to the dma_async_tx_descriptor was proposed. However that proposal was rejected, and after a discussion a solution based on using the dmaengine_tx_status, and checking the residue field in the returned dma_tx_state structure.
Unfortunately, it seems that the proposed solution does not work neither in the 4.4 kernel (used for Xilinx SoC devices) nor in the newest 4.7 kernel.
The residue field is set to 0 in case of completed transfers regardless of the actual number of transferred bytes.
So my question is: How can I reliably determine the actual number of transferred bytes in the completed SG DMA transfer in a dmaengine compatible driver?
PS. That question with more Xilinx-specific details related to the AXI DMA IP core has been also asked on the Xilinx forum.

Related

BlueZ remote device presence

Using BlueZ, which
is the official Linux Bluetooth stack
I'd like to know which of the below two methods are better suited for detecting a device's presence in the nearby.
To be more exact, I want to periodically scan for a Bluetooth device (not BLE => no advertisement packets are sent).
I found two ways of detecting it:
1.) Using l2ping
# l2ping BTMAC
2.) Using hcitool
# hcitool name BTMAC
Both approaches working.
I'd like to know, which approach would drain more battery of the scanned device?
Looking at solution #1 (l2ping's source):
It uses a standard socket connect call to connect to the remote device, then uses the send command to send data to it:
send(sk, send_buf, L2CAP_CMD_HDR_SIZE + size, 0)
Now, L2CAP_CMD_HDR_SIZE is 4, and default size is 44, so altogether 48 bytes are sent, and received back with L2CAP_ECHO_REQ.
For hcitool I just have found the entrypoint:
int hci_read_remote_name(int dd, const bdaddr_t *bdaddr, int len, char *name, int to);
My questions:
which of these approaches are better (less power-consuming) for the remote device? If there is any difference at all.
shall I reduce the l2ping's size? What shall be the minimum?
is my assumption correct that hci_read_remote_name also connects to the remote device and sends some kind of request to it for getting back its name?
To answer your questions:-
which of these approaches are better (less power-consuming) for the remote device? If there is any difference at all.
l2ping BTMAC is the more suitable command purely because this is what it is meant to do. While "hcitool name BTMAC" is used to get the remote device's name, "l2ping" is used to detect its presence which is what you want to achieve. The difference in power consumption is really minimal, but if there is any then l2ping should be less power consuming.
shall I reduce the l2ping's size? What shall be the minimum?
If changing the l2ping size requires modifying the source code then I recommend leaving it the same. By leaving it the same you are using the same command that has been used countless times and the same command that was used to qualify the BlueZ stack. This way there's less chance for error and any change would not result in noticeable performance or power improvements.
is my assumption correct that hci_read_remote_name also connects to the remote device and sends some kind of request to it for getting back its name?
Yes your assumption is correct. According the Bluetooth Specification v5.2, Vol 4, Part E, Section 7.1.19 Remote Name Request Command:
If no connection exists between the local device and the device
corresponding to the BD_ADDR, a temporary Link Layer connection will
be established to obtain the LMP features and name of the remote
device.
I hope this helps.

RN4870 Prepare write request and Execute write request

I have an Android App that write several bytes to a Bluetooth device.
Looking on btsnoop_hci.log I see that, when a large amount of bytes are sent to the BLE device, the app use Prepare Write Request more times and then Execute Write Request: Immediately Write All.
Now my problem is how to perform this with a my application using a RN4870 module.
At this moment I can connect, read service and characteristics, and write using
CHW command as described in the manual when the there are few bytes.
But I cannot write as the remote BLE device expect when there are lot of bytes.
Thank You for support
Marco
This is the Microchip answer:
Hello,
The core specifications are handled by the firmware.
The user doesn't have access at this level, so is nothing you can set.
Regarding the long data question:
"Does RN4870 module support the Data Length Extension feature? "
RN4870 rev 1.28 support DLE, but partially. The normal packet size in BLE without DLE is 20 bytes.
With a standard DLE feature, the normal packet size should be 251 bytes.
However, in RN4870 Rev 1.28, the packet size is 151 bytes. So it is not a full implementation of the DLE.
The DLE feature (Data Length Extension) is embedded into the lower levels of the Bluetooth stack and there are no specific commands to enable or disable DLE. Essentially, if the peer device also supports DLE, then the DLE will be enabled.
So there are no specific (commands) that you need to do to increase throughput through DLE.
Regards,
In other words there is nothing to do!
In Android Application you can't directly set the DLE length, instead you should set the MTU size. Android Bluetooth stack will calculate DLE length based on the MTU. Maximum Data Length supported by BT protocol in 251, but can be between 27 and 251 depends on BT controller H/W capability. During connection BT Device will negotiate with peer device(If peer device support DLE) to set Maximum DLE size supported by both device.
To increase your throughput you can use the maximum supported MTU size of 512. Also you can write without response and do error check on data using your own logic like checking parity or CRC and re-transmit data from Application for better throughput.

spi_write_then_read with variant register size

As I understand the term "word length" (spi_bits_per_word) in spi, defines the CS (chip select) active time.
It therefore seems that linux driver will function correctly when dealing with simple spi protocols which keeps word size constant.
But, How can we deal with spi protocols which use different spi size as part of protocol.
for example cs need to be active for sending spi word - 9 bits, and then reading spi - 8 bits or 24 bits (the length of the register read is different each time, depends on register)
How can we implement that using spi_write_then_read ?
Do we need to set bits_per_word size for the sending and then another bits_per_word for the receiving ?
Regards,
Ran
"word length" means number of bits you can send in one transaction. It doesn't defines the CS (chip select) active time. You can keep it active for whatever time you want(least is for word-length).
SPI has got some format. You cannot randomly read-write whatever number of bits you want.Most of SPI supports 4-bit, 8-bit, 16-bit and 32-bit mode. If the given mode doesn't satisfy your requirement then you need to break your requirement. For eg:- To read 24-bit data, we need to use 8-bit word-length transfer for 3 times.
Generally SPI is fullduplex means it will read at same time it will write.

What type of FEI should I use?

I have a device that has one 40MHz wideband input and can output one wideband output and 8 narrowband DDC channels from a device. I am not sure how to setup the FEI interface for this device. Should it be a CHANNELIZER or an RX_DIGITIZER_CHANNELIZER? I am leaning towards the latter due to the wideband output.
Regarding the allocation of such a device. For the RX_DIGITIZER_CHANNELIZER types, the manual is vague in the use of the CHANNELIZER portion in this instance. Should I still allow CHANNELIZERS to be allocated or just allocate the RX_DIGITIZER_CHANNELIZERs and DDCs? Should changes to the RX_DIGITIZER_CHANNELIZER determine when to drop the DDCs in this instance? If CHANNELIZERS can still be allocated with a RX_DIGITIZER_CHANNELIZER, how does this work?
A channelizer device takes already digitized data and provides DDCs from this digital stream. A RX_DIGITIZER_CHANNELIZER has a analog input and performs the receiver, digitizer and channelizer function in a non-separable device. It sounds like you choose correctly to use the RX_DIGITIZER_CHANNELIZER for a device with an analog input. Typically allocating the RX_DIGITIZER_CHANNELIZER takes care of allocating the Channelizer because all three parts are linked together. Therefore the only additional allocations are usually DDCs.

Linux DMA API: specifying address increment behavior?

I am writing a driver for Altera Soc Developement Kit and need to support two modes of data transfer to/from a FPGA:
FIFO transfers: When writing to (or reading from) an FPGA FIFO, the destination (or source) address must not be incremented by the DMA controller.
non-FIFO transfers: These are normal (RAM-like) transfers where both the source and destination addresses require an increment for each word transferred.
The particular DMA controller I am using is the CoreLink DMA-330 DMA Controller and its Linux driver is pl330.c (drivers/dma/pl330.c). This DMA controller does provide a mechanism to switch between "Fixed-address burst" and "Incrementing-address burst" (these are synonymous with my "FIFO transfers" and "non-FIFO transfers"). The pl330 driver specifies which behavior it wants by setting the appropriate bits in the CCRn register
#define CC_SRCINC (1 << 0)
#define CC_DSTINC (1 << 14)
My question: it is not at all clear to me how clients of the pl330 (my driver, for example) should specify the address-incrementing behavior.
The DMA engine client API says nothing about how to specify this while the DMA engine provider API simply states:
Addresses pointing to RAM are typically incremented (or decremented)
after each transfer. In case of a ring buffer, they may loop
(DMA_CYCLIC). Addresses pointing to a device's register (e.g. a FIFO)
are typically fixed.
without giving any detail as to how the address types are communicated to providers (in my case the pl300 driver).
The in the pl330_prep_slave_sg method it does:
if (direction == DMA_MEM_TO_DEV) {
desc->rqcfg.src_inc = 1;
desc->rqcfg.dst_inc = 0;
desc->req.rqtype = MEMTODEV;
fill_px(&desc->px,
addr, sg_dma_address(sg), sg_dma_len(sg));
} else {
desc->rqcfg.src_inc = 0;
desc->rqcfg.dst_inc = 1;
desc->req.rqtype = DEVTOMEM;
fill_px(&desc->px,
sg_dma_address(sg), addr, sg_dma_len(sg));
}
where later, the desc->rqcfg.src_inc, and desc->rqcfg.dst_inc are used by the driver to specify the address-increment behavior.
This implies the following:
Specifying a direction = DMA_MEM_TO_DEV means the client wishes to pull data from a FIFO into RAM. And presumably DMA_DEV_TO_MEM means the client wishes to push data from RAM into a FIFO.
Scatter-gather DMA operations (for the pl300 at least) is restricted to cases where either the source or destination end point is a FIFO. What if I wanted to do a scatter-gather operation from system RAM into FPGA (non-FIFO) memory?
Am I misunderstanding and/or overlooking something? Does the DMA engine already provide a (undocumented) mechanism to specify address-increment behavior?
Look at this
pd->device_prep_dma_memcpy = pl330_prep_dma_memcpy;
pd->device_prep_dma_cyclic = pl330_prep_dma_cyclic;
pd->device_prep_slave_sg = pl330_prep_slave_sg;
It means you have different approaches like you have read in documentation. RAM-like transfers could be done, I suspect, via device_prep_dma_memcpy().
It appears to me (after looking to various drivers in a kernel) the only DMA transfer styles which allows you (indirectly) control auto-increment behavior is the ones which have enum dma_transfer_direction in its corresponding device_prep_... function.
And this parameter declared only for device_prep_slave_sg and device_prep_dma_cyclic, according to include/linux/dmaengine.h
Another option should be to use and struct dma_interleaved_template which allows you to specify increment behaviour directly. But support for this method is limited (only i.MX DMA driver does support it in 3.8 kernel, for example. And even this support seems to be limited)
So I think, we are stuck with device_prep_slave_sg case with all sg-related complexities for a some while.
That is that I am doing at the moment (although it is for accessing of some EBI-connected device on Atmel SAM9 SOC)
Another thing to consider is a device's bus width. memcopy-variant can perform different bus-width transfers, depending on a source and target addresses and sizes. And this may not match size of FIFO element.

Resources