How to set Inter-Byte Delay Timeout to milliseconds? - linux

I'm currently working with termios for serial communication in Linux.
I need to set an intercharacter timeout to 5ms.
I found a way to set intercharacter timeout using VMIN and VTIME where VMIN has to be VMIN > 0 and VTIME > 0.
The problem is that i need to set the VTIME to 5ms, but the VTIME is expressed in tenths of a second.
VTIME data type is unsigned char, so i can't just set it to 0.05.
Does anyone know if there is some way around this?

I need to set an intercharacter timeout to 5ms.
...
Does anyone know if there is some way around this?
No, there is no way to set a shorter termios timeout than 100 ms.
Depending on your hardware and kernel configuration, this timeout may not be reliable at all, especially if you are trying to detect time-separated messages.
The termios handling is at least a full layer above the UART device driver (see
Linux serial drivers).
Unless your kernel is configured to ensure that the bottom-half of the UART driver and the kworker threads for termios are high priority and low latency, then short intercharacter intervals cannot be accurately or reliably determined.
If the UART utilizes a FIFO to buffer incoming data, then that hardware obscures the intercharacter spacing that the software can detect.
Similarly when the UART driver is using DMA to store the received data, intercharacter timing will be obscured.
With DMA the CPU is not involved with handling the received data until the DMA operation is complete, and all temporal information about any intercharacter separation is gone.
(Crucial information such as framing error and/or parity error is difficult/impossible to pinpoint to a specific byte when using DMA.)
Even without DMA, termios will only be able to use timing based on the transfer of data through the tty flip buffers (which is a layer removed from the timing on the wire).
Some UARTs do have hardware that assist in detecting the end-of-message by idle line.
For example Atmel/Microchip ATSAMA5 and AT91SAM9 SoCs have USARTs with a Receiver Timeout feature that measures the idle time after each received frame.
When this idle line time exceeds a specified value, an interrupt can be generated.
The Linux driver for the Atmel USART typically uses the receiver-timeout interrupt to (prematurely) terminate the current DMA receive operation, and copy the contents of the DMA buffer to the tty flip buffer.
In summary you cannot or should not rely solely on VMIN and VTIME settings to detect time-separated messages. See Parsing time-delimited UART data.
The message packets need to have delimiter/sentinel characters/bytes so that messages can be reliably parsed and validated.
See parsing complete messages from serial port for an example of efficient use of syscalls with a local buffer.

Related

How many I/O interrupts can happen during a time period?

I don't need exact figures but I want to know a realistic sense of the typical average pc's ability to read input interrupts in 1 millisecond period. Say a mouse keeps moving, how many reads happen for an average or a gaming mouse for that matter, by the os?
In other words if we make a program that tries to record mouse inputs, how frequent should we read in order to read a single input value more than once?
This depends on hardware and what kind of device you are talking about. Intel actually provides the maximum rate of interrupt for its xHCI USB controller. I would say this maximum rate is probably too high for any gaming mouse. The Intel document about xHCI (https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/extensible-host-controler-interface-usb-xhci.pdf) specifies at page 289 that
Interrupt Moderation allows multiple events to be processed in the context of a single Interrupt Service Request (ISR), rather than generating an ISR for each event.The interrupt generation that results from the assertion of the Interrupt Pending (IP) flag may be throttled by the settings of the Interrupter Moderation (IMOD) register of the associated Interrupter. The IMOD register consists of two 16-bit fields: the Interrupt Moderation Counter (IMODC) and the Interrupt Moderation Interval (IMODI).Software may use the IMOD register to limit the rate of delivery of interrupts to the host CPU. This register provides a guaranteed inter-interrupt delay between the interrupts of an Interrupter asserted by the host controller, regardless of USB traffic conditions.The following algorithm converts the inter-interrupt interval value to the common 'interrupts/sec' performance metric:
Interrupts/sec = (250×10-9sec × IMODI) -1
For example, if the IMODI is programmed to 512, the host controller guarantees the host will not be interrupted by the xHC for at least 128 microseconds from the last interrupt. The maximum observable interrupt rate from the xHC should not exceed 8000 interrupts/sec.Inversely, inter-interrupt interval value can be calculated as:
Inter-interrupt interval = (250×10-9sec × interrupts/sec) -1
The optimal performance setting for this register is very system and configuration specific. An initial suggested range for the moderation Interval is 651-5580 (28Bh -15CCh). The IMODI field shall default to 4000 (1 ms.) upon initialization and reset. It may be loaded with an alternative value by software when the Interrupter is initialized
USB works alongside the xHCI to provide interrupts to the system. I'm not a hardware engineer but I would say that the interrupt speed depends on the mouse frequency. For example this mouse: https://www.amazon.ca/Programmable-PICTEK-Computer-Customized-Breathing/dp/B01G8W30BY/ref=sr_1_4?dchild=1&keywords=usb+gaming+mouse&qid=1610137924&s=electronics&sr=1-4, has a frequency of 125HZ to 1000HZ. It probably means that you will get a interrupt frequency of 125/s to 1000/s since the mouse has this frequency. Its optical sensor will check the surface that the mouse is on at this frequency providing an interrupt for a movement.
As to interrupts themselves, I think it depends on the speed of the CPU. Interrupts are masked for a short amount of time while handling. The fastest the CPU, the fastest the interrupt will be unmasked, the fastest a new interrupt can occur. I would say the bottleneck here is the mouse with 1000 interrupts/s, that is 1 interrupt/ms.

Is there possible to stream infinte data over SPI using DMA on STM32F3?

I'm developing a RF modem based on a new protocol, which has a feature of streaming 96 Bytes in one frame - but they are sent on and on, before communication ends. I plan using two 96 Bytes buffers in STM32 - in next lines I will explain why.
I want to send first 96 Byte frames by the USB-CDC to STM32 - then external modem chip will generate a "9600bps" clock and STM will have to write Payload bits by bits on specified output pin(at the trailing edge of the each clock pulse).
When STM32 will notice that it had sent a half of 96 Byte frame - that it sent to PC notification to send more data - PC will refill second 96 Byte buffer by USB-CDC immediately. When STM32 will end sending first buffer - immediately starts sending second buffer content. When it will send half of second buffer - as previous will ask PC for another 96Byte frame.
And that way all the time, before PC will sent command to stop tx.
This transfer mode - a serial, with using a "trigger clock".
Is this possible using DMA, and how could I set it?
I want to use DMA to have ability to use USB while already streaming data to the radio modem chip. Is this the right approach?
I'm working in project building an opensource radiocommunication system project with both packet and stream capatibilities & digital voice. I'm designing and electronics for PC radiomodem. Project is called M17 and is maintained by Wojtek SP5WWP.
Re. general architecture. Serial communication over USB ACM does not have to use buffer of the same size and be synchronized with the downstream communication over SPI. You could use buffers as big as practically possible so PC can send data in advance. This will reduce the chance of buffer underflow if PC does not provide data fast enough. Use a circular buffer and fill it when a packet arrives from USB.
DMA is the right approach. Although people often say that DMA is only necessary for high bandwidth operations, it may be actually easier to work with DMA than handling interrupts per every byte, even when you only handle 9600 bits per second.
DMA controller in STM32F3 has a Half-Transfer Complete (HTIF in DMA_ISR) bit that you can poll or make it generate and interrupt. In conjunction with the Transfer Complete status (TCIF) and the Circular bit (CIRC in DMA_CCR) you can organize a double-buffered data pipe so that transfers can overlap with whatever else the MCU is doing. The application will reload the first half of the DMA buffer on the HTIF event. When the TCIF event happens, it reloads the second half. It has to be done quickly, before the other half is also completed. However, you need a double buffered pipeline only when you need to constantly stream data, i.e. overall amount is larger than can the size of the DMA buffer.
Stopping a circular DMA may be tricky. I suppose both the STM32 and external chip know how many bytes to send. In that case, after this amount is received, disable the DMA.
It seems you need a slave SPI in STM32 as the external chip generates SPI clock.
DMA is not difficult to set up, however, it needs multiple things to work properly. I assume register-level programming, if you use some kind of framework, you'll need to find out how it implements these features. Enable clocks for SPI, GPIO port for SPI pins, and DMA, configure the pins as AF. Find the right DMA channel for the SPI peripheral. In case of SPI DMA you usually need two channels: TX and RX, but with the slave SPI, you may get away with one. Configure SPI, pay attention to clock polarity and phase, and set it to generate a DMA request for each TX and/or RX. Set the DMA CPAR channel register pointing to the SPI DR register in channel(s) and program all other DMA channel registers appropriately. Enable the DMA channel(s). Enable SPI in slave mode. When the SPI master clocks data on the MOSI/SCK pins, the DMA controller will put them in memory. When the buffer is half-full and full-full, the channel will set the HTIF and TCIF bits and generate and interrupt, if you told it to. Use these events to implement flow control.

Linux UART slower than specified Baudrate

I'm trying to communicate between two Linux systems via UART.
I want to send large chunks of data. With the specified Baudrate it should take around 5 seconds, but it takes nearly 10 times the expected time.
As I'm sending more than the buffer can handle at once it is send in small parts and I'm draining the buffer in between. If I measure the time needed for the drain and the number of bytes written to the buffer I calculate a Baudrate nearly 10 times lower than the specified Baudrate.
I would expect a slower transmission as the optimal, but not this much.
Did I miss something while setting the UART or while writing? Or is this normal?
The code used for setup:
int bus = open(interface.c_str(), O_RDWR | O_NOCTTY | O_NDELAY); // <- also tryed blocking
if (bus < 0) {
return;
}
struct termios options;
memset (&options, 0, sizeof options);
if(tcgetattr(bus, &options) != 0){
close(bus);
bus = -1;
return;
}
cfsetspeed (&options, B230400);
cfmakeraw(&options); // <- also tried this manually. did not make a difference
if(tcsetattr(bus, TCSANOW, &options) != 0)
{
close(bus);
bus = -1;
return;
}
tcflush(bus, TCIFLUSH);
The code used to send:
int32_t res = write(bus, data, dataLength);
while (res < dataLength){
tcdrain(bus); // <- taking way longer than expected
int32_t r = write(bus, &data[res], dataLength - res);
if(r == 0)
break;
if(r == -1){
break;
}
res += r;
}
B230400
The docs are contradictory. cfsetspeed is documented as requiring a speed_t type, while the note says you need to use one of the "B" constants like "B230400." Have you tried using an actual speed_t type?
In any case, the speed you're supplying is the baud rate, which in this case should get you approximately 23,000 bytes/second, assuming there is no throttling.
The speed is dependent on hardware and link limitations. Also the serial protocol allows pausing the transmission.
FWIW, according to the time and speed you listed, if everything works perfectly, you'll get about 1 MB in 50 seconds. What speed are you actually getting?
Another "also" is the options structure. It's been years since I've had to do any serial I/O, but IIRC, you need to actually set the options that you want and are supported by your hardware, like CTS/RTS, XON/XOFF, etc.
This might be helpful.
As I'm sending more than the buffer can handle at once it is send in small parts and I'm draining the buffer in between.
You have only provided code snippets (rather than a minimal, complete, and verifiable example), so your data size is unknown.
But the Linux kernel buffer size is known. What do you think it is?
(FYI it's 4KB.)
If I measure the time needed for the drain and the number of bytese written to the buffer I calculate a Baudrate nearly 10 times lower than the specified Baudrate.
You're confusing throughput with baudrate.
The maximum throughput (of just payload) of an asynchronous serial link will always be less than the baudrate due to framing overhead per character, which could be two of the ten bits of the frame (assuming 8N1). Since your termios configuration is incomplete, the overhead could actually be three of the eleven bits of the frame (assuming 8N2).
In order to achieve the maximum throughput, the tranmitting UART must saturate the line with frames and never let the line go idle.
The userspace program must be able to supply data fast enough, preferably by one large write() to reduce syscall overhead.
Did I miss something while setting the UART or while writing?
With Linux, you have limited access to the UART hardware.
From userspace your program accesses a serial terminal.
Your program accesses the serial terminal in a sub-optinal manner.
Your termios configuration appears to be incomplete.
It leaves both hardware and software flow-control untouched.
The number of stop bits is untouched.
The Ignore modem control lines and Enable receiver flags are not enabled.
For raw reading, the VMIN and VTIME values are not assigned.
Or is this normal?
There are ways to easily speed up the transfer.
First, your program combines non-blocking mode with non-canonical mode. That's a degenerate combination for receiving, and suboptimal for transmitting.
You have provided no reason for using non-blocking mode, and your program is not written to properly utilize it.
Therefore your program should be revised to use blocking mode instead of non-blocking mode.
Second, the tcdrain() between write() syscalls can introduce idle time on the serial link. Use of blocking mode eliminates the need for this delay tactic between write() syscalls.
In fact with blocking mode only one write() syscall should be needed to transmit the entire dataLength. This would also minimize any idle time introduced on the serial link.
Note that the first write() does not properly check the return value for a possible error condition, which is always possible.
Bottom line: your program would be simpler and throughput would be improved by using blocking I/O.

How to modify timing in readyRead of QextSeriaport [duplicate]

I'm implementing a protocol over serial ports on Linux. The protocol is based on a request answer scheme so the throughput is limited by the time it takes to send a packet to a device and get an answer. The devices are mostly arm based and run Linux >= 3.0. I'm having troubles reducing the round trip time below 10ms (115200 baud, 8 data bit, no parity, 7 byte per message).
What IO interfaces will give me the lowest latency: select, poll, epoll or polling by hand with ioctl? Does blocking or non blocking IO impact latency?
I tried setting the low_latency flag with setserial. But it seemed like it had no effect.
Are there any other things I can try to reduce latency? Since I control all devices it would even be possible to patch the kernel, but its preferred not to.
---- Edit ----
The serial controller uses is an 16550A.
Request / answer schemes tends to be inefficient, and it shows up quickly on serial port. If you are interested in throughtput, look at windowed protocol, like kermit file sending protocol.
Now if you want to stick with your protocol and reduce latency, select, poll, read will all give you roughly the same latency, because as Andy Ross indicated, the real latency is in the hardware FIFO handling.
If you are lucky, you can tweak the driver behaviour without patching, but you still need to look at the driver code. However, having the ARM handle a 10 kHz interrupt rate will certainly not be good for the overall system performance...
Another options is to pad your packet so that you hit the FIFO threshold every time. It will also confirm that if it is or not a FIFO threshold problem.
10 msec # 115200 is enough to transmit 100 bytes (assuming 8N1), so what you are seeing is probably because the low_latency flag is not set. Try
setserial /dev/<tty_name> low_latency
It will set the low_latency flag, which is used by the kernel when moving data up in the tty layer:
void tty_flip_buffer_push(struct tty_struct *tty)
{
unsigned long flags;
spin_lock_irqsave(&tty->buf.lock, flags);
if (tty->buf.tail != NULL)
tty->buf.tail->commit = tty->buf.tail->used;
spin_unlock_irqrestore(&tty->buf.lock, flags);
if (tty->low_latency)
flush_to_ldisc(&tty->buf.work);
else
schedule_work(&tty->buf.work);
}
The schedule_work call might be responsible for the 10 msec latency you observe.
Having talked to to some more engineers about the topic I came to the conclusion that this problem is not solvable in user space. Since we need to cross the bridge into kernel land, we plan to implement an kernel module which talks our protocol and gives us latencies < 1ms.
--- edit ---
Turns out I was completely wrong. All that was necessary was to increase the kernel tick rate. The default 100 ticks added the 10ms delay. 1000Hz and a negative nice value for the serial process gives me the time behavior I wanted to reach.
Serial ports on linux are "wrapped" into unix-style terminal constructs, which hits you with 1 tick lag, i.e. 10ms. Try if stty -F /dev/ttySx raw low_latency helps, no guarantees though.
On a PC, you can go hardcore and talk to standard serial ports directly, issue setserial /dev/ttySx uart none to unbind linux driver from serial port hw and control the port via inb/outb to port registers. I've tried that, it works great.
The downside is you don't get interrupts when data arrives and you have to poll the register. often.
You should be able to do same on the arm device side, may be much harder on exotic serial port hw.
Here's what setserial does to set low latency on a file descriptor of a port:
ioctl(fd, TIOCGSERIAL, &serial);
serial.flags |= ASYNC_LOW_LATENCY;
ioctl(fd, TIOCSSERIAL, &serial);
In short: Use a USB adapter and ASYNC_LOW_LATENCY.
I've used a FT232RL based USB adapter on Modbus at 115.2 kbs.
I get about 5 transactions (to 4 devices) in about 20 mS total with ASYNC_LOW_LATENCY. This includes two transactions to a slow-poke device (4 mS response time).
Without ASYNC_LOW_LATENCY the total time is about 60 mS.
With FTDI USB adapters ASYNC_LOW_LATENCY sets the inter-character timer on the chip itself to 1 mS (instead of the default 16 mS).
I'm currently using a home-brewed USB adapter and I can set the latency for the adapter itself to whatever value I want. Setting it at 200 µS shaves another mS off that 20 mS.
None of those system calls have an effect on latency. If you want to read and write one byte as fast as possible from userspace, you really aren't going to do better than a simple read()/write() pair. Try replacing the serial stream with a socket from another userspace process and see if the latencies improve. If they don't, then your problems are CPU speed and hardware limitations.
Are you sure your hardware can do this at all? It's not uncommon to find UARTs with a buffer design that introduces many bytes worth of latency.
At those line speeds you should not be seeing latencies that large, regardless of how you check for readiness.
You need to make sure the serial port is in raw mode (so you do "noncanonical reads") and that VMIN and VTIME are set correctly. You want to make sure that VTIME is zero so that an inter-character timer never kicks in. I would probably start with setting VMIN to 1 and tune from there.
The syscall overhead is nothing compared to the time on the wire, so select() vs. poll(), etc. is unlikely to make a difference.

RealTek 8168 (r8169 linux driver) - Rx descriptor ring confusion

I'm working on control system stuff, where the NIC interrupt is used as a trigger. This works very well for cycle times greater than 3 microseconds. Now i want to make some performance tests and measure the transmission time, respectively the shortest time between two interrupts.
The sender is sending 60 byte packages as fast as possible. The receiver should generate one interrupt per package. I'm testing with 256 packets, the size of the Rx descriptor ring. The packet data won't handled during the test. Only the interrupt is interesting.
The trouble is, that the reception is very fast up to less then 1 microsecond between two interrupts, but only for around 70 interrupts / descriptors. Then the NIC sets the RDU (Rx Descriptor Unavailable) bit and stops the receiving before reaching the end of the ring. The confusing thing is, when i increase the size of the Rx descriptor ring up to 2048 (e.g.), then the number of interrupts is increasing too (around 800). I don't understand this behavior. I thought he should stop again after 70 interrupts.
It seems to be a time problem, but why? I'm overlooking something, but what? Can somebody help my?
Thanks in advance!
What I think is that due to large RX packet rate, your receive interrupts are missing . Don't count interrupts to see how many packets are received.Rely on "own" bit of Receive descriptors.
Receive Descriptor unavailable will be set only when you reach end of the ring unless you have made some error in programming RX descriptors (e.g. forgot to set ownership bit)
So if your RX ring has 256 descriptors, I think you should receive 256 packets without recycling RX descriptors.
If you are doubtful whether you are reaching the end of ring or not, try setting interrupt on completion bit of only last RX descriptor.In this way you receive only one interrupt at the end of ring.

Resources