I am trying to port (or rather customize) a pure Linux application to OS X Snow Leopard (10.6.4). It is an application that sends a binary to a target hardware over the serial port. The app is almost running but i am hit with an interesting problem with the serial port writes.
With all the same settings as Linux (115.2k is the baudrate) OS X serial data send seems to be some 10 times or more slower compared to Linux. What takes 3 secs in Linux , takes 30-40 secs and by that time the target firmware at the receiving end times out :).
Digging in to the serial port write function, i see that it is using select() system call to find if the device (or rather the file descriptor) is ready to write data to. Each write system call writes 1024 bytes of data in OS X and 1087 bytes of data in Linux (Thats what the return value of write is). My data size is some 50KB for a first level binary (it is a small bootloader which can load a bigger binary in the next level).
Pseudo code
select() configuration with 1s time out and observing the serial port file descriptor for write ready.
while(true)
{
rc=select(...)
if(rc>0){write(...) and other logic to get out of while if done}
if(rc==0){//try again}
if(rc<0){//error}
}
I observed that in linux, writes happen all the time one after another . A sequence of writes and it comes out of the function in a jiffy. But, in OS X, it is like 3 writes and then select returns zero twice (2 seconds gone) again a few writes and select time out etc etc making the function a lot slower.
Any clues?
Notes:
The app is using termios lib API to control the serial port.
I could fix this by changing the prolific chip device driver. By default it was using a non-standard open source driver, I downloaded the OS X driver from the prolific website and it works ok.
Thanks to Nils and others for the support!
Related
I have a Raspberry Pi managing and writing to an FPGA. I use the SPI bus and some GPIO's to get the data over the link. The SPI write happens in bursts of variable length - from a few tens of bytes up to about 8kB.
The FPGA SPI receive code has a timeout which I initially set to around 12ms (which was a guesstimated value). This is almost always OK, but today I found out that very occasional timeouts on the FPGA seem to be caused by the fact that RPi will occasionally pause for about 15ms in the middle of sending a byte.
The RPi python driver code looks pretty much like this:
for b in byte_array:
send_byte()
where send_byte() is a simple function that calls the SPI byte write function in the GPIO library, with some checking of a BUSY line and retrying.
Generally, bytes go out a few hundred microseconds apart, so this sudden pause is odd. I am thinking it is probably caused by some sort of context switch - but the Pi is not doing much else (and running the stock Linux distro).
None of this is a big problem, I can easily increase the timeout with no problems. But I am curious. What causes a chip running Linux at several GHz to suddenly stop doing the only thing it is being asked to do for about 15ms - which is an eternity in this context?
If it WAS a problem - what could I do about it?
[Edit: I found the reason, see below]
The problem:
I created a "driver" for a device in Windows using Python (PyUSB and libusb-win32). While this software works seamlessly on multiple PCs under Windows, using my Linux (Kubuntu 18.10) test laptop, a sequence of bulk writes of length 512 bytes each times out after the second 512 byte bulk transfer.
Interesting: I also tried the same using VirtualBox. And it turns out using a Windows guest via VirtualBox on the same Linux host, the same error still occurs. So it is not because of
The question:
What can happen under Linux does not happen under Windows that causes a timeout [Errno 110]?
More information, in case it helps:
Under Windows, Wireshark shows timing differences between two of the bulk writes of 6 ms for the first one and 5 ms for every following, while under Linux the delta is only round about 3 ms, which are mostly resulting from a sleep operation (relevant Python source code is attached). Doubling that time does nothing.
dmesg shows messages like 'bulk endpoint ## has invalid maxpacket 64', where ## is 0x01, 0x08 and 0x81.
The device only has one configuration.
The test laptop has only USB 3.0 connectors, where the Windows PCs have both USB 3.0 and 2.0. I tested all.
Wireshark shows the device answering with another (empty) bulk on every bulk write under Linux, while it does not show that under Windows. As far as I understand, that is because USBPcap cannot capture handshakes under Windows. But I am not shure with that, because I do not know if this type of response would really be classified as "URB_BULK out".
I tried libusb0, libusb1 and OpenUSB as backends under Linux, without success.
The bulk transfer in question is the transfer of FPGA firmware to the device.
I am able to communicate with the device before the multiple-512 byte-chunk bulk operation on the same endpoints using only a few bytes. The code that then causes the timeout is the following one in the second iteration of this for loop:
for chunk in chunks: # chunks: array of bytearrays with 512 bytes each
self.write(0x01,chunk)
time.sleep(0.003)
[Edit] The reason I found out that this only occurs on my test laptop using xhci, not on a second Linux test machine using ehci. So this might be caused by xhci. I do not yet have a workaround, but this at least gives an explanation.
It turns out that the device requested less bytes per packet, the desired amount of bytes (64) could be found in dmesg as already written in the question. Since xhci doesn't support that officially, Linux decided to ignore that request. Windows seemingly went with it and split larger packets up in the requested packet size. So the solution was to manually split the data to packets of 64 byte size before transferring it.
I'm implementing a protocol over serial ports on Linux. The protocol is based on a request answer scheme so the throughput is limited by the time it takes to send a packet to a device and get an answer. The devices are mostly arm based and run Linux >= 3.0. I'm having troubles reducing the round trip time below 10ms (115200 baud, 8 data bit, no parity, 7 byte per message).
What IO interfaces will give me the lowest latency: select, poll, epoll or polling by hand with ioctl? Does blocking or non blocking IO impact latency?
I tried setting the low_latency flag with setserial. But it seemed like it had no effect.
Are there any other things I can try to reduce latency? Since I control all devices it would even be possible to patch the kernel, but its preferred not to.
---- Edit ----
The serial controller uses is an 16550A.
Request / answer schemes tends to be inefficient, and it shows up quickly on serial port. If you are interested in throughtput, look at windowed protocol, like kermit file sending protocol.
Now if you want to stick with your protocol and reduce latency, select, poll, read will all give you roughly the same latency, because as Andy Ross indicated, the real latency is in the hardware FIFO handling.
If you are lucky, you can tweak the driver behaviour without patching, but you still need to look at the driver code. However, having the ARM handle a 10 kHz interrupt rate will certainly not be good for the overall system performance...
Another options is to pad your packet so that you hit the FIFO threshold every time. It will also confirm that if it is or not a FIFO threshold problem.
10 msec # 115200 is enough to transmit 100 bytes (assuming 8N1), so what you are seeing is probably because the low_latency flag is not set. Try
setserial /dev/<tty_name> low_latency
It will set the low_latency flag, which is used by the kernel when moving data up in the tty layer:
void tty_flip_buffer_push(struct tty_struct *tty)
{
unsigned long flags;
spin_lock_irqsave(&tty->buf.lock, flags);
if (tty->buf.tail != NULL)
tty->buf.tail->commit = tty->buf.tail->used;
spin_unlock_irqrestore(&tty->buf.lock, flags);
if (tty->low_latency)
flush_to_ldisc(&tty->buf.work);
else
schedule_work(&tty->buf.work);
}
The schedule_work call might be responsible for the 10 msec latency you observe.
Having talked to to some more engineers about the topic I came to the conclusion that this problem is not solvable in user space. Since we need to cross the bridge into kernel land, we plan to implement an kernel module which talks our protocol and gives us latencies < 1ms.
--- edit ---
Turns out I was completely wrong. All that was necessary was to increase the kernel tick rate. The default 100 ticks added the 10ms delay. 1000Hz and a negative nice value for the serial process gives me the time behavior I wanted to reach.
Serial ports on linux are "wrapped" into unix-style terminal constructs, which hits you with 1 tick lag, i.e. 10ms. Try if stty -F /dev/ttySx raw low_latency helps, no guarantees though.
On a PC, you can go hardcore and talk to standard serial ports directly, issue setserial /dev/ttySx uart none to unbind linux driver from serial port hw and control the port via inb/outb to port registers. I've tried that, it works great.
The downside is you don't get interrupts when data arrives and you have to poll the register. often.
You should be able to do same on the arm device side, may be much harder on exotic serial port hw.
Here's what setserial does to set low latency on a file descriptor of a port:
ioctl(fd, TIOCGSERIAL, &serial);
serial.flags |= ASYNC_LOW_LATENCY;
ioctl(fd, TIOCSSERIAL, &serial);
In short: Use a USB adapter and ASYNC_LOW_LATENCY.
I've used a FT232RL based USB adapter on Modbus at 115.2 kbs.
I get about 5 transactions (to 4 devices) in about 20 mS total with ASYNC_LOW_LATENCY. This includes two transactions to a slow-poke device (4 mS response time).
Without ASYNC_LOW_LATENCY the total time is about 60 mS.
With FTDI USB adapters ASYNC_LOW_LATENCY sets the inter-character timer on the chip itself to 1 mS (instead of the default 16 mS).
I'm currently using a home-brewed USB adapter and I can set the latency for the adapter itself to whatever value I want. Setting it at 200 µS shaves another mS off that 20 mS.
None of those system calls have an effect on latency. If you want to read and write one byte as fast as possible from userspace, you really aren't going to do better than a simple read()/write() pair. Try replacing the serial stream with a socket from another userspace process and see if the latencies improve. If they don't, then your problems are CPU speed and hardware limitations.
Are you sure your hardware can do this at all? It's not uncommon to find UARTs with a buffer design that introduces many bytes worth of latency.
At those line speeds you should not be seeing latencies that large, regardless of how you check for readiness.
You need to make sure the serial port is in raw mode (so you do "noncanonical reads") and that VMIN and VTIME are set correctly. You want to make sure that VTIME is zero so that an inter-character timer never kicks in. I would probably start with setting VMIN to 1 and tune from there.
The syscall overhead is nothing compared to the time on the wire, so select() vs. poll(), etc. is unlikely to make a difference.
I have a bit bang code that allows me to send like 4 megs of data through SPI lines. Its embedded code for custom Hardware using a Linux Kernel.
The problem is that takes a VERY long time to do it (4 hours) this is most likely becase the kernel is doing more stuff. Basically my code is something like this(aprox):
unsigned char data=0xFF;
BB_SPI_Init();
SPI_start();//activates chipselect(enable)
for(i=0;i<8;i++){
if(data & 0x80){
gpio_set_value(SPI_MOSI,1);
}else{
gpio_set_value(SPI_MOSI,0);
}
//send pulse clock
gpio_set_value(SPI_CLK,0);
gpio_set_value(SPI_CLK,1);
data<<=1;
}
SPI_stop();//deactivates chipselect(disable)
So is a very simple bit bang, but i notice that if i use write to send data to the linux gpio handler /sys/class/gpio/gpioXX/value (where XX is any gpio number) it takes 4 hours.
But if i use fwrite() for sending to the same device it takes 3 hours.
BUT, if you use write() only for the enables ( SPI_stop(), and SPI_start()) and fwrite() for sending to MISO, CLK it only takes 1 hour and 30 minutes.
So, with that as a base, could someone explain to me how is that happening? my imagination says that is the way the threads are handled and in every software cycle it resolves 2 threads (fwrite() and write()) instead if was only one of the functions used, but now i'm still investigating, can someone let me know any kind of information? is there a better way to handle this?
FYI
Can't use kernel driver spi because the hardware was connected to gpios and it is a mandatory requirement to use bit bang but i accept any suggestion
Thanks in advance
EDIT
Hey Guys thanks for your comments, it seems that i had a problem (very dumb one) that i created the file descriptor each time that i was going to send data to sys/class/gpio/gpioxx/value so that's why was slow. Also turn off some other programs and the transfer skyrocket to 3 minutes instead of 1 hour 30 minutes (with write()). Thanks and Sorry about it
I think that the spi-bitbang driver is the best solution if you are looking for performance. Doing the bit-bang from user space is a pain because you have at least 3 system calls for each bit of data. A system call is an expensive operation.
FYI Can't use kernel driver spi because the hardware was connected to gpios and it is a mandatory requirement to use bit bang but i accept any suggestion
That's why the spi-bitbang driver exists. You can easily configure the spi-bitbang driver to work with your GPIOs.
Then, once you have a spi-bitbang driver, you can write a char device that accept as input your entire block of data and transfer it in kernel space. With this solution you will get the maximum performance for a bit-bang interface.
I am trying to send text data from one PC to other using Serial cable. One of the PC is running linux and I am sending data from it using write(2) system call. The log size is approx 65K bytes but the write(2) system call returns some 4K bytes (i.e. this much amount of data is getting transferred). I tried breaking the data in chunks of 4K but write(2) returns -1.
My question is that "Is there any buffer limit for writing data on serial port? or can I send data of any size?. Also do I need to continously read data from other PC as I write 4K chunk of data"
Do I need to do any special configuration in termios structure for sending (huge) data?
The transmit buffer is one page (took a look at Linux 2.6.18 sources) - which is 4K in most (if not all) cases.
The other end must read (don't know the size of the receive buffer), but more importantly you should not write faster than the serial port can transmit, if you are using 115200 bps 8-N-1 you can write the 4K chunk approximately 3 times a second. (115200 / 9 / 4096 = 3.125)
Yes, there is a buffer limit - but when you reach that limit, the write() should block.
When write() returns -1, what is errno set to?
Make sure that the receiver is reading.
You should update the current position it your buffer from the write(), and continue the next write from there. (Applies to all writes(), regardless if the fd is a serial port, tcp socket or a file.)
If you get an error back for subsequent writes. Judging by the manpage, its safe to retry the writes for the following errnos: EAGAIN, EINTR, and probably ENOSPC. Use perror() to see what you get. (..and post it, I am curious.)
EFBIG would seem to indicate that you are trying to write using a buffer (or rather count) that is too large, but that is probably much larger than 64k.
If the internal buffer is filled up, because you are writing to fast, try to (nano)sleep a little between the writes. There are several clever ways of doing this (like tcp does), but if the rate is known, just write at a fixed rate.
If you think the receiver is actually reading, but not much happens, have a look at the serial ports flow-control options and if the cable is wired for DTS/RTS.