Why does my Raspberry Pi SPI write loop pause for about 15ms? Context switch? - linux

I have a Raspberry Pi managing and writing to an FPGA. I use the SPI bus and some GPIO's to get the data over the link. The SPI write happens in bursts of variable length - from a few tens of bytes up to about 8kB.
The FPGA SPI receive code has a timeout which I initially set to around 12ms (which was a guesstimated value). This is almost always OK, but today I found out that very occasional timeouts on the FPGA seem to be caused by the fact that RPi will occasionally pause for about 15ms in the middle of sending a byte.
The RPi python driver code looks pretty much like this:
for b in byte_array:
send_byte()
where send_byte() is a simple function that calls the SPI byte write function in the GPIO library, with some checking of a BUSY line and retrying.
Generally, bytes go out a few hundred microseconds apart, so this sudden pause is odd. I am thinking it is probably caused by some sort of context switch - but the Pi is not doing much else (and running the stock Linux distro).
None of this is a big problem, I can easily increase the timeout with no problems. But I am curious. What causes a chip running Linux at several GHz to suddenly stop doing the only thing it is being asked to do for about 15ms - which is an eternity in this context?
If it WAS a problem - what could I do about it?

Related

What is the best way to sample periodicaly gpio-pins in Linux?

I like to sample a signal which is generated by the pins of my Raspberry Pi. I made the experience that high sample rates are difficult to realize.
First I done a fast approach with Python(super slow). Then I changed to ANSI C + the bcm2835.h lib. I gained significant performance gains.
Now I am asking my self the question: How to do the best sampling under Linux?
My tries were undertaken in user space. But what is about, switching to kernel space? I can write a simple character device kernel module. In this module the pins are periodically checked. If the state changed some information is put into a buffer. This i/o buffer is polled by a synchronous file read for a application in user space. The best solution for me would be, if the pin checking can be done with a fixed frequency(sample period should be constant for signal processing).
The set up for this could be:
#kernel: character module + kernel thread + gpio device tree interface + DSP at constant sampling time
#user space: i/o app reading synchronous from character module
Ideas/tips?
I have a solution for you.
I have written such a module:
https://github.com/Appyx/gpio-reflect
You can synchronously read any signal from the GPIO pins.
You can use the output and calculate the signal with your sampling rate.
Just divide the periods.

How to modify timing in readyRead of QextSeriaport [duplicate]

I'm implementing a protocol over serial ports on Linux. The protocol is based on a request answer scheme so the throughput is limited by the time it takes to send a packet to a device and get an answer. The devices are mostly arm based and run Linux >= 3.0. I'm having troubles reducing the round trip time below 10ms (115200 baud, 8 data bit, no parity, 7 byte per message).
What IO interfaces will give me the lowest latency: select, poll, epoll or polling by hand with ioctl? Does blocking or non blocking IO impact latency?
I tried setting the low_latency flag with setserial. But it seemed like it had no effect.
Are there any other things I can try to reduce latency? Since I control all devices it would even be possible to patch the kernel, but its preferred not to.
---- Edit ----
The serial controller uses is an 16550A.
Request / answer schemes tends to be inefficient, and it shows up quickly on serial port. If you are interested in throughtput, look at windowed protocol, like kermit file sending protocol.
Now if you want to stick with your protocol and reduce latency, select, poll, read will all give you roughly the same latency, because as Andy Ross indicated, the real latency is in the hardware FIFO handling.
If you are lucky, you can tweak the driver behaviour without patching, but you still need to look at the driver code. However, having the ARM handle a 10 kHz interrupt rate will certainly not be good for the overall system performance...
Another options is to pad your packet so that you hit the FIFO threshold every time. It will also confirm that if it is or not a FIFO threshold problem.
10 msec # 115200 is enough to transmit 100 bytes (assuming 8N1), so what you are seeing is probably because the low_latency flag is not set. Try
setserial /dev/<tty_name> low_latency
It will set the low_latency flag, which is used by the kernel when moving data up in the tty layer:
void tty_flip_buffer_push(struct tty_struct *tty)
{
unsigned long flags;
spin_lock_irqsave(&tty->buf.lock, flags);
if (tty->buf.tail != NULL)
tty->buf.tail->commit = tty->buf.tail->used;
spin_unlock_irqrestore(&tty->buf.lock, flags);
if (tty->low_latency)
flush_to_ldisc(&tty->buf.work);
else
schedule_work(&tty->buf.work);
}
The schedule_work call might be responsible for the 10 msec latency you observe.
Having talked to to some more engineers about the topic I came to the conclusion that this problem is not solvable in user space. Since we need to cross the bridge into kernel land, we plan to implement an kernel module which talks our protocol and gives us latencies < 1ms.
--- edit ---
Turns out I was completely wrong. All that was necessary was to increase the kernel tick rate. The default 100 ticks added the 10ms delay. 1000Hz and a negative nice value for the serial process gives me the time behavior I wanted to reach.
Serial ports on linux are "wrapped" into unix-style terminal constructs, which hits you with 1 tick lag, i.e. 10ms. Try if stty -F /dev/ttySx raw low_latency helps, no guarantees though.
On a PC, you can go hardcore and talk to standard serial ports directly, issue setserial /dev/ttySx uart none to unbind linux driver from serial port hw and control the port via inb/outb to port registers. I've tried that, it works great.
The downside is you don't get interrupts when data arrives and you have to poll the register. often.
You should be able to do same on the arm device side, may be much harder on exotic serial port hw.
Here's what setserial does to set low latency on a file descriptor of a port:
ioctl(fd, TIOCGSERIAL, &serial);
serial.flags |= ASYNC_LOW_LATENCY;
ioctl(fd, TIOCSSERIAL, &serial);
In short: Use a USB adapter and ASYNC_LOW_LATENCY.
I've used a FT232RL based USB adapter on Modbus at 115.2 kbs.
I get about 5 transactions (to 4 devices) in about 20 mS total with ASYNC_LOW_LATENCY. This includes two transactions to a slow-poke device (4 mS response time).
Without ASYNC_LOW_LATENCY the total time is about 60 mS.
With FTDI USB adapters ASYNC_LOW_LATENCY sets the inter-character timer on the chip itself to 1 mS (instead of the default 16 mS).
I'm currently using a home-brewed USB adapter and I can set the latency for the adapter itself to whatever value I want. Setting it at 200 µS shaves another mS off that 20 mS.
None of those system calls have an effect on latency. If you want to read and write one byte as fast as possible from userspace, you really aren't going to do better than a simple read()/write() pair. Try replacing the serial stream with a socket from another userspace process and see if the latencies improve. If they don't, then your problems are CPU speed and hardware limitations.
Are you sure your hardware can do this at all? It's not uncommon to find UARTs with a buffer design that introduces many bytes worth of latency.
At those line speeds you should not be seeing latencies that large, regardless of how you check for readiness.
You need to make sure the serial port is in raw mode (so you do "noncanonical reads") and that VMIN and VTIME are set correctly. You want to make sure that VTIME is zero so that an inter-character timer never kicks in. I would probably start with setting VMIN to 1 and tune from there.
The syscall overhead is nothing compared to the time on the wire, so select() vs. poll(), etc. is unlikely to make a difference.

High Speed Serial

I have a system which uses a UART clocked at 26 Mhz. This is a 16850 UART on a i86 architecture. I have no problems accessing the port. The largest incoming message is about 56 bytes, the largest outgoing about 100. The baud rate divisor needs to be 1 so seterial /dev/ttyS4 baud_base 115200 is OK and opening at 115200. There is no flow control. Specifying part 16850 does NOT set the FIFO to deep. I was losing bytes. All the data is byte, unsigned char.
I wrote a routine that uses ioperm to set the deep FIFO's to 64 and now a read/write works meaning that the deep FIFO's are NOT being enabled by serial_core.c or 8250.c, at least in a deep manner.
With the deep FIFO set using s brute force, post open(fd, "/dev/ttyS4", NO_BLOCKING, etc I get reliably the correct number of bytes but I tend to get the same word missing a bit. Not a byte, a bit.
All this same stuff runs fine under DOS so it is not a hardware issue.
I have opened the port for raw, no delays, no party, 8 bits, 2 stop.
Has anyone seen issues reading serial ports are relatively high speeds with short bursts of data?
Yes, I have tried custom baud, etc. The FIFO levels made the biggest improvement. This is a ISA bus card using IRQ7.
It appears the serial driver for Linux sucks and has way to much latency and far to many features for really basic raw operation.
Has anyone else tried very high speed data without flow control or had similar issues. As I stated, I get the correct number of bytes and all the data is correct except 1 bit in byte 4.
I am pretty stumped.

Cubieboard2 - how do I set GPIO high during boot and keep it high?

I am trying to adapt a circuit for soft-power from Raspberry to Cubieboard (http://www.mosaic-industries.com/embedded-systems/microcontroller-projects/electronic-circuits/push-button-switch-turn-on/latching-toggle-power-switch). It appears to be working just fine but it needs a pin from the board to be set high while it is running (actually the description is for input with pull-up, but that's not the problem).
I have managed to set the PH15 pin as output high in uBoot script via gpio set 263. However about 3 seconds into the boot process something sets it low back again. The fex file is set correctly to adjust it to output high (tried input with pull-up too) but that appears to kick in several seconds later (about 7-8 seconds in the boot). As a result during the boot process the soft-power circuit just dies reacting to the low state of the pin.
So that is the puzzling question - how to set a pin on the Cubieboard immediately during boot and keep it that way? What brings it down?
Using kernel 3.4.79 with some config options (rtc, drivers, etc - nothing exotic).
Any help appreciated. Modifications to the circuit acceptable too!

Bit Bang with SPI (fwirte, write performance)

I have a bit bang code that allows me to send like 4 megs of data through SPI lines. Its embedded code for custom Hardware using a Linux Kernel.
The problem is that takes a VERY long time to do it (4 hours) this is most likely becase the kernel is doing more stuff. Basically my code is something like this(aprox):
unsigned char data=0xFF;
BB_SPI_Init();
SPI_start();//activates chipselect(enable)
for(i=0;i<8;i++){
if(data & 0x80){
gpio_set_value(SPI_MOSI,1);
}else{
gpio_set_value(SPI_MOSI,0);
}
//send pulse clock
gpio_set_value(SPI_CLK,0);
gpio_set_value(SPI_CLK,1);
data<<=1;
}
SPI_stop();//deactivates chipselect(disable)
So is a very simple bit bang, but i notice that if i use write to send data to the linux gpio handler /sys/class/gpio/gpioXX/value (where XX is any gpio number) it takes 4 hours.
But if i use fwrite() for sending to the same device it takes 3 hours.
BUT, if you use write() only for the enables ( SPI_stop(), and SPI_start()) and fwrite() for sending to MISO, CLK it only takes 1 hour and 30 minutes.
So, with that as a base, could someone explain to me how is that happening? my imagination says that is the way the threads are handled and in every software cycle it resolves 2 threads (fwrite() and write()) instead if was only one of the functions used, but now i'm still investigating, can someone let me know any kind of information? is there a better way to handle this?
FYI
Can't use kernel driver spi because the hardware was connected to gpios and it is a mandatory requirement to use bit bang but i accept any suggestion
Thanks in advance
EDIT
Hey Guys thanks for your comments, it seems that i had a problem (very dumb one) that i created the file descriptor each time that i was going to send data to sys/class/gpio/gpioxx/value so that's why was slow. Also turn off some other programs and the transfer skyrocket to 3 minutes instead of 1 hour 30 minutes (with write()). Thanks and Sorry about it
I think that the spi-bitbang driver is the best solution if you are looking for performance. Doing the bit-bang from user space is a pain because you have at least 3 system calls for each bit of data. A system call is an expensive operation.
FYI Can't use kernel driver spi because the hardware was connected to gpios and it is a mandatory requirement to use bit bang but i accept any suggestion
That's why the spi-bitbang driver exists. You can easily configure the spi-bitbang driver to work with your GPIOs.
Then, once you have a spi-bitbang driver, you can write a char device that accept as input your entire block of data and transfer it in kernel space. With this solution you will get the maximum performance for a bit-bang interface.

Resources