Is it possible to stream infinite data over SPI using DMA on STM32F3?

I'm developing an RF modem based on a new protocol which streams 96-byte frames - they are sent back to back until the communication ends. I plan to use two 96-byte buffers in the STM32 - in the next lines I will explain why.
I want to send the first 96-byte frame over USB-CDC to the STM32 - then the external modem chip will generate a "9600 bps" clock and the STM32 will have to write the payload bit by bit on a specified output pin (at the trailing edge of each clock pulse).
When the STM32 notices that it has sent half of the 96-byte frame, it sends a notification to the PC to provide more data, and the PC immediately refills the second 96-byte buffer over USB-CDC. When the STM32 finishes sending the first buffer, it immediately starts sending the contents of the second buffer. When it has sent half of the second buffer, it again asks the PC for another 96-byte frame, as before.
And so on, until the PC sends a command to stop the transmission.
The transfer mode is serial, driven by this external "trigger clock".
Is this possible using DMA, and how could I set it up?
I want to use DMA so that I can keep using USB while data is already streaming to the radio modem chip. Is this the right approach?
I'm working on an open-source radio communication system with both packet and stream capabilities and digital voice. I'm designing the electronics for the PC radio modem. The project is called M17 and is maintained by Wojtek SP5WWP.

Re. general architecture: the serial communication over USB ACM does not have to use a buffer of the same size, nor be synchronized with the downstream communication over SPI. You could use buffers as big as practically possible, so the PC can send data in advance. This reduces the chance of buffer underflow if the PC does not provide data fast enough. Use a circular buffer and fill it when a packet arrives from USB, as in the sketch below.
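A minimal single-producer/single-consumer ring buffer for staging the USB data could look like the following sketch; the size and names are illustrative only and not tied to any particular USB stack.

#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 1024u   /* power of two, much larger than one 96-byte frame */

static uint8_t  rb_data[RB_SIZE];
static volatile uint32_t rb_head;   /* written by the USB receive path */
static volatile uint32_t rb_tail;   /* read when refilling the SPI DMA buffer */

static bool rb_put(uint8_t byte)
{
    uint32_t next = (rb_head + 1u) & (RB_SIZE - 1u);
    if (next == rb_tail)
        return false;                        /* full: PC is sending too fast */
    rb_data[rb_head] = byte;
    rb_head = next;
    return true;
}

static bool rb_get(uint8_t *byte)
{
    if (rb_tail == rb_head)
        return false;                        /* empty: risk of underflow */
    *byte = rb_data[rb_tail];
    rb_tail = (rb_tail + 1u) & (RB_SIZE - 1u);
    return true;
}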
DMA is the right approach. Although people often say that DMA is only necessary for high-bandwidth operations, it may actually be easier to work with DMA than to handle an interrupt for every byte, even when you only handle 9600 bits per second.
The DMA controller in the STM32F3 has a Half-Transfer Complete bit (HTIF in DMA_ISR) that you can poll, or have it generate an interrupt. In conjunction with the Transfer Complete status (TCIF) and the Circular bit (CIRC in DMA_CCR), you can organize a double-buffered data pipe so that transfers overlap with whatever else the MCU is doing. The application reloads the first half of the DMA buffer on the HTIF event. When the TCIF event happens, it reloads the second half. This has to be done quickly, before the other half is also completed. However, you only need a double-buffered pipeline when you have to stream data constantly, i.e. when the overall amount of data is larger than the DMA buffer.
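For example, using the transmit direction from the question, the interrupt handler could look roughly like this. It is only a sketch: it assumes a 192-byte buffer split into two 96-byte halves, DMA1 Channel 3 (commonly mapped to SPI1_TX on the F3 - check the reference manual), and a hypothetical refill_from_usb() that copies the next 96 bytes received over USB-CDC into the given half.

#include "stm32f3xx.h"            /* device header: DMA1, DMA_ISR_HTIF3, ... */

extern uint8_t tx_buf[192];                          /* two 96-byte halves */
void refill_from_usb(uint8_t *half, uint32_t len);   /* hypothetical helper */

void DMA1_Channel3_IRQHandler(void)
{
    if (DMA1->ISR & DMA_ISR_HTIF3) {      /* first half has been sent */
        DMA1->IFCR = DMA_IFCR_CHTIF3;     /* clear the half-transfer flag */
        refill_from_usb(&tx_buf[0], 96);  /* reload it before TCIF fires */
    }
    if (DMA1->ISR & DMA_ISR_TCIF3) {      /* second half has been sent */
        DMA1->IFCR = DMA_IFCR_CTIF3;      /* clear the transfer-complete flag */
        refill_from_usb(&tx_buf[96], 96); /* reload before the next HTIF */
    }
}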
Stopping a circular DMA may be tricky. I suppose both the STM32 and the external chip know how many bytes to transfer. In that case, after this amount has been transferred, disable the DMA.
It seems you need a slave SPI in the STM32, as the external chip generates the SPI clock.
DMA is not difficult to set up; however, it needs multiple things to work properly. I assume register-level programming - if you use some kind of framework, you'll need to find out how it implements these features. Enable the clocks for the SPI peripheral, the GPIO port with the SPI pins, and the DMA controller, and configure the pins as alternate function. Find the right DMA channel for the SPI peripheral. For SPI DMA you usually need two channels, TX and RX, but with a slave SPI you may get away with one. Configure the SPI - pay attention to clock polarity and phase - and set it to generate a DMA request for each TX and/or RX. Point the DMA CPAR register of the channel(s) at the SPI DR register and program all other DMA channel registers appropriately. Enable the DMA channel(s). Enable the SPI in slave mode. When the SPI master clocks data on the MOSI/SCK pins, the DMA controller will put it in memory. When the buffer is half full and completely full, the channel will set the HTIF and TCIF bits and generate an interrupt, if you told it to. Use these events to implement flow control. A minimal register-level sketch of this sequence follows below.
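Continuing the sketch above, a minimal initialization assuming SPI1 as a slave on PA5/PA6/PA7 (AF5) with DMA1 Channel 3 feeding the transmit side in circular mode might look like this. The pin and DMA channel mapping, NSS handling and CPOL/CPHA are assumptions and must be checked against the STM32F3 reference manual and the external chip's datasheet.

#include "stm32f3xx.h"

extern uint8_t tx_buf[192];   /* the double buffer used by the handler above */

void spi1_slave_tx_dma_init(void)
{
    RCC->AHBENR  |= RCC_AHBENR_GPIOAEN | RCC_AHBENR_DMA1EN;
    RCC->APB2ENR |= RCC_APB2ENR_SPI1EN;

    /* PA5 = SCK, PA6 = MISO, PA7 = MOSI as alternate function 5 */
    GPIOA->MODER &= ~(GPIO_MODER_MODER5 | GPIO_MODER_MODER6 | GPIO_MODER_MODER7);
    GPIOA->MODER |=  GPIO_MODER_MODER5_1 | GPIO_MODER_MODER6_1 | GPIO_MODER_MODER7_1;
    GPIOA->AFR[0] |= (5u << 20) | (5u << 24) | (5u << 28);

    /* 8-bit frames, DMA request on TX; MSTR stays 0 = slave mode.
       NSS management and CPOL/CPHA are left at reset defaults here and
       must be made to match the external modem chip. */
    SPI1->CR2 = SPI_CR2_DS_2 | SPI_CR2_DS_1 | SPI_CR2_DS_0 | SPI_CR2_TXDMAEN;

    /* Circular memory-to-peripheral transfer feeding SPI1->DR */
    DMA1_Channel3->CPAR  = (uint32_t)&SPI1->DR;
    DMA1_Channel3->CMAR  = (uint32_t)tx_buf;
    DMA1_Channel3->CNDTR = sizeof tx_buf;
    DMA1_Channel3->CCR   = DMA_CCR_DIR | DMA_CCR_MINC | DMA_CCR_CIRC
                         | DMA_CCR_HTIE | DMA_CCR_TCIE;
    NVIC_EnableIRQ(DMA1_Channel3_IRQn);
    DMA1_Channel3->CCR  |= DMA_CCR_EN;

    SPI1->CR1 |= SPI_CR1_SPE;   /* enable the SPI as a slave */
}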

Related

How to increase polling speed of data from external ADC (controller and ADC has SPI interface) without any external interrupt?

I have an external ADC which can sample data at 40 ksps, and the controller needs to poll the data from the ADC. I am unable to reach that sampling rate by calling it repeatedly from the while (1) loop - I get only about 8 ksps. The ADC has no external pin to notify the controller that data is ready.
1) How can I make it fast enough to reach that sampling rate?
2) Since I am sampling at that rate, I need to transfer the data over USB simultaneously, so how should I implement the buffering scheme to have minimum delay between consecutive data packets?
FYI:
My SYSCLK: 168 MHz
SPI clock: 10.5 MHz
Controller: STM32F4

How to set Inter-Byte Delay Timeout to milliseconds?

I'm currently working with termios for serial communication in Linux.
I need to set an intercharacter timeout to 5ms.
I found a way to set intercharacter timeout using VMIN and VTIME where VMIN has to be VMIN > 0 and VTIME > 0.
The problem is that I need to set VTIME to 5 ms, but VTIME is expressed in tenths of a second.
VTIME's data type is unsigned char, so I can't just set it to 0.05.
Does anyone know if there is some way around this?
I need to set an intercharacter timeout to 5ms.
...
Does anyone know if there is some way around this?
No, there is no way to set a shorter termios timeout than 100 ms.
Depending on your hardware and kernel configuration, this timeout may not be reliable at all, especially if you are trying to detect time-separated messages.
The termios handling is at least a full layer above the UART device driver (see Linux serial drivers).
Unless your kernel is configured to ensure that the bottom-half of the UART driver and the kworker threads for termios are high priority and low latency, then short intercharacter intervals cannot be accurately or reliably determined.
If the UART utilizes a FIFO to buffer incoming data, then that hardware obscures the intercharacter spacing that the software can detect.
Similarly when the UART driver is using DMA to store the received data, intercharacter timing will be obscured.
With DMA the CPU is not involved with handling the received data until the DMA operation is complete, and all temporal information about any intercharacter separation is gone.
(Crucial information such as framing error and/or parity error is difficult/impossible to pinpoint to a specific byte when using DMA.)
Even without DMA, termios will only be able to use timing based on the transfer of data through the tty flip buffers (which is a layer removed from the timing on the wire).
Some UARTs do have hardware that assist in detecting the end-of-message by idle line.
For example Atmel/Microchip ATSAMA5 and AT91SAM9 SoCs have USARTs with a Receiver Timeout feature that measures the idle time after each received frame.
When this idle line time exceeds a specified value, an interrupt can be generated.
The Linux driver for the Atmel USART typically uses the receiver-timeout interrupt to (prematurely) terminate the current DMA receive operation, and copy the contents of the DMA buffer to the tty flip buffer.
In summary, you cannot (or at least should not) rely solely on VMIN and VTIME settings to detect time-separated messages. See Parsing time-delimited UART data.
The message packets need to have delimiter/sentinel characters/bytes so that messages can be reliably parsed and validated.
See parsing complete messages from serial port for an example of efficient use of syscalls with a local buffer.
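For illustration, here is a sketch of that approach: raw (noncanonical) mode, VMIN/VTIME used only to block until data arrives, and a delimiter byte (arbitrarily 0x7E here) marking the end of each message. The function names and the delimiter are assumptions, not taken from the linked answers.

#include <fcntl.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

int open_raw(const char *dev)
{
    int fd = open(dev, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    struct termios tio;
    tcgetattr(fd, &tio);
    cfmakeraw(&tio);              /* noncanonical mode, no echo, no signals */
    tio.c_cc[VMIN]  = 1;          /* block until at least one byte arrives */
    tio.c_cc[VTIME] = 0;          /* no inter-character timer at all */
    tcsetattr(fd, TCSANOW, &tio);
    return fd;
}

/* Read into a local buffer until the delimiter byte is seen; in a real
 * implementation any bytes arriving after the delimiter would have to be
 * kept for the next message instead of being discarded. */
ssize_t read_message(int fd, unsigned char *buf, size_t cap)
{
    size_t len = 0;
    while (len < cap) {
        ssize_t n = read(fd, buf + len, cap - len);
        if (n <= 0)
            return -1;
        unsigned char *end = memchr(buf + len, 0x7E, (size_t)n);
        len += (size_t)n;
        if (end)
            return (ssize_t)(end - buf + 1);   /* up to and including 0x7E */
    }
    return (ssize_t)len;                       /* buffer full, no delimiter */
}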

How to modify timing in readyRead of QextSerialPort [duplicate]

I'm implementing a protocol over serial ports on Linux. The protocol is based on a request/answer scheme, so the throughput is limited by the time it takes to send a packet to a device and get an answer. The devices are mostly ARM based and run Linux >= 3.0. I'm having trouble reducing the round-trip time below 10 ms (115200 baud, 8 data bits, no parity, 7 bytes per message).
Which IO interfaces will give me the lowest latency: select, poll, epoll, or polling by hand with ioctl? Does blocking or non-blocking IO impact latency?
I tried setting the low_latency flag with setserial, but it seemed to have no effect.
Are there any other things I can try to reduce latency? Since I control all devices, it would even be possible to patch the kernel, but it's preferred not to.
---- Edit ----
The serial controller used is a 16550A.
Request/answer schemes tend to be inefficient, and it shows up quickly on a serial port. If you are interested in throughput, look at a windowed protocol, like the Kermit file-sending protocol.
Now if you want to stick with your protocol and reduce latency, select, poll, read will all give you roughly the same latency, because as Andy Ross indicated, the real latency is in the hardware FIFO handling.
If you are lucky, you can tweak the driver behaviour without patching, but you still need to look at the driver code. However, having the ARM handle a 10 kHz interrupt rate will certainly not be good for the overall system performance...
Another option is to pad your packets so that you hit the FIFO threshold every time. It will also confirm whether or not it is a FIFO-threshold problem.
10 ms at 115200 baud is enough to transmit about 100 bytes (assuming 8N1), so what you are seeing is probably because the low_latency flag is not set. Try
setserial /dev/<tty_name> low_latency
It will set the low_latency flag, which is used by the kernel when moving data up in the tty layer:
void tty_flip_buffer_push(struct tty_struct *tty)
{
    unsigned long flags;

    spin_lock_irqsave(&tty->buf.lock, flags);
    if (tty->buf.tail != NULL)
        tty->buf.tail->commit = tty->buf.tail->used;
    spin_unlock_irqrestore(&tty->buf.lock, flags);

    if (tty->low_latency)
        flush_to_ldisc(&tty->buf.work);
    else
        schedule_work(&tty->buf.work);
}
The schedule_work call might be responsible for the 10 msec latency you observe.
Having talked to some more engineers about the topic, I came to the conclusion that this problem is not solvable in user space. Since we need to cross the bridge into kernel land, we plan to implement a kernel module which talks our protocol and gives us latencies < 1 ms.
--- edit ---
Turns out I was completely wrong. All that was necessary was to increase the kernel tick rate. The default 100 Hz tick added the 10 ms delay. 1000 Hz and a negative nice value for the serial process give me the timing behavior I wanted to reach.
Serial ports on Linux are "wrapped" into Unix-style terminal constructs, which hits you with one tick of lag, i.e. 10 ms. Try whether stty -F /dev/ttySx raw low_latency helps; no guarantees though.
On a PC, you can go hardcore and talk to standard serial ports directly: issue setserial /dev/ttySx uart none to unbind the Linux driver from the serial port hardware and control the port via inb/outb to the port registers. I've tried that, and it works great.
The downside is that you don't get interrupts when data arrives and you have to poll the register - often.
You should be able to do the same on the ARM device side, though it may be much harder with exotic serial port hardware. A rough sketch of the PC-side polling approach is shown below.
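The sketch assumes the standard COM1 base address 0x3f8 and a 16550-compatible UART; it needs root for ioperm() and only makes sense after the kernel driver has been unbound as described above.

#include <sys/io.h>               /* ioperm(), inb(), outb() - x86 Linux only */

#define COM1_BASE 0x3f8
#define UART_RBR  (COM1_BASE + 0) /* receive buffer register */
#define UART_LSR  (COM1_BASE + 5) /* line status register */

int com1_init(void)
{
    return ioperm(COM1_BASE, 8, 1);   /* grant port access, needs root */
}

int com1_poll_byte(void)
{
    while (!(inb(UART_LSR) & 0x01))   /* LSR bit 0 = data ready */
        ;                             /* busy-poll, as warned above */
    return inb(UART_RBR);
}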
Here's what setserial does to set low latency on a file descriptor of a port:
#include <sys/ioctl.h>
#include <linux/serial.h>     /* struct serial_struct, ASYNC_LOW_LATENCY */
struct serial_struct serial;
ioctl(fd, TIOCGSERIAL, &serial);
serial.flags |= ASYNC_LOW_LATENCY;
ioctl(fd, TIOCSSERIAL, &serial);
In short: Use a USB adapter and ASYNC_LOW_LATENCY.
I've used an FT232RL based USB adapter on Modbus at 115.2 kbps.
I get about 5 transactions (to 4 devices) in about 20 ms total with ASYNC_LOW_LATENCY. This includes two transactions to a slow-poke device (4 ms response time).
Without ASYNC_LOW_LATENCY the total time is about 60 ms.
With FTDI USB adapters, ASYNC_LOW_LATENCY sets the inter-character timer on the chip itself to 1 ms (instead of the default 16 ms).
I'm currently using a home-brewed USB adapter and I can set the latency for the adapter itself to whatever value I want. Setting it to 200 µs shaves another ms off that 20 ms.
None of those system calls have an effect on latency. If you want to read and write one byte as fast as possible from userspace, you really aren't going to do better than a simple read()/write() pair. Try replacing the serial stream with a socket from another userspace process and see if the latencies improve. If they don't, then your problems are CPU speed and hardware limitations.
Are you sure your hardware can do this at all? It's not uncommon to find UARTs with a buffer design that introduces many bytes worth of latency.
At those line speeds you should not be seeing latencies that large, regardless of how you check for readiness.
You need to make sure the serial port is in raw mode (so you do "noncanonical reads") and that VMIN and VTIME are set correctly. You want to make sure that VTIME is zero so that an inter-character timer never kicks in. I would probably start with setting VMIN to 1 and tune from there.
The syscall overhead is nothing compared to the time on the wire, so select() vs. poll(), etc. is unlikely to make a difference.
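A minimal termios setup along those lines, assuming fd is an already opened serial port descriptor, might be:

#include <termios.h>

/* Put an already opened serial port descriptor into raw (noncanonical)
 * mode with VMIN = 1 and VTIME = 0, so read() returns as soon as a byte
 * arrives and no inter-character timer is involved. */
int set_raw_vmin1(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) < 0)
        return -1;
    cfmakeraw(&tio);
    tio.c_cc[VMIN]  = 1;
    tio.c_cc[VTIME] = 0;
    return tcsetattr(fd, TCSANOW, &tio);
}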

Inserting ethernet frames of a particular ethertype in ahead of TCP/IP frames in netdev_queue

We have developed an Application Specific Integrated Circuit for power line communications. The chip has an ethernet interface. If the ASIC receives an ethernet frame containing TCP/IP or ARP payload (ethertypes 0x0800 IPv4, 0x0806 ARP and 0x86DD IPv6), it simply forwards the frame onto the power line and does the same in the other direction. We call such frames data frames.
If the ASIC receives an ethernet frame of a specific ethertype (we use 0x88b5, which is allocated for experimental/public use on local networks), it consumes the frame itself. These frames contain configuration settings for the ASIC. We call these configuration frames.
The chip is connected to an Ethernet LAN on one side and to the power line on the other end, so it basically bridges the two networks. The ASIC requires throttling of the data frames passing through it. This is because the speed over the power line is about 100 times lower than 100 Mbps Ethernet, and also because the number of data frames that the ASIC can handle per second is limited.
We use raw sockets to form the configuration frames and send them via Ethernet to the ASIC. Is there a way in which, whenever a configuration frame (0x88b5) is sent, it is queued ahead of all the pending data frames (ethertypes 0x0800, 0x0806, 0x86dd) in the netdev_queue?
Can this be done via some supporting functionality implemented using hacks & hooks in a kernel module?
We came across a similar question (although improperly tagged) here: Setting up priority of packets that are being transmitted over the network

SPI data transfer - why MOSI goes to zero half cycle before the data transfer?

I have an SPI signal output from an SPI device. I wonder why the data output (MOSI) goes to 0 half a cycle before the actual data is written on the bus. Is this a mandatory condition for an SPI device? If it does not go to zero, would there be any problem with the data transfer?
I use spidev32766.1 on Linux (Ubuntu 12.04, kernel 3.7.1); the processor is an imx233.
Thank you in advance!!
The slave device doesn't care what happens on the data line except for a very short period (usually <1ns) either side of its active clock edge (this window is defined by the setup and hold time specifications for the interface).
I have no idea why your system would put out that "wiggle" though!
