High Speed Serial - Linux

I have a system that uses a UART clocked at 26 MHz. This is a 16850 UART on an x86 architecture. I have no problems accessing the port. The largest incoming message is about 56 bytes, the largest outgoing about 100. The baud rate divisor needs to be 1, so setserial /dev/ttyS4 baud_base 115200 is OK, and I open at 115200. There is no flow control. Specifying the part as a 16850 does NOT set the FIFO to deep, and I was losing bytes. All the data is bytes, unsigned char.
I wrote a routine that uses ioperm to set the deep FIFOs to 64, and now a read/write works, meaning that the deep FIFOs are NOT being enabled by serial_core.c or 8250.c, at least not to their full depth.
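For reference, such a brute-force FIFO setup looks something like the sketch below. The base address here is hypothetical, and the FCR bit layout follows the 16750 convention (bit 5 = 64-byte FIFO, writable only while DLAB is set); check the 16850 datasheet for the actual card.

#include <stdio.h>
#include <sys/io.h>   /* ioperm, inb, outb: x86 only, needs root */

#define PORT_BASE 0x2e8            /* hypothetical I/O base of ttyS4 */
#define UART_FCR (PORT_BASE + 2)   /* FIFO control register (write-only) */
#define UART_LCR (PORT_BASE + 3)   /* line control register */

int main(void)
{
    if (ioperm(PORT_BASE, 8, 1) < 0) { perror("ioperm"); return 1; }
    unsigned char lcr = inb(UART_LCR);
    outb(lcr | 0x80, UART_LCR);         /* set DLAB to unlock the deep-FIFO bit */
    outb(0x01 | 0x20 | 0xc0, UART_FCR); /* FIFO enable, 64-byte mode, high RX trigger */
    outb(lcr, UART_LCR);                /* restore LCR */
    return 0;
}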
With the deep FIFO set using brute force after open("/dev/ttyS4", O_RDWR | O_NONBLOCK, ...), I reliably get the correct number of bytes, but I tend to get the same word missing a bit. Not a byte, a bit.
All this same stuff runs fine under DOS so it is not a hardware issue.
I have opened the port for raw I/O: no delays, no parity, 8 bits, 2 stop bits.
Has anyone seen issues reading serial ports at relatively high speeds with short bursts of data?
Yes, I have tried a custom baud rate, etc. The FIFO levels made the biggest improvement. This is an ISA bus card using IRQ7.
It appears the serial driver for Linux sucks: it has way too much latency and far too many features for really basic raw operation.
Has anyone else tried very high speed data without flow control, or had similar issues? As I stated, I get the correct number of bytes and all the data is correct except 1 bit in byte 4.
I am pretty stumped.

Related

Is interrupt jitter causing the annoying wobble in audio using the MCU's DAC?

I had an assignment for college where we needed to play a precompiled WAV, stored as an integer array, through the PWM and the DAC. I wanted more of a challenge, so I went out of my way and created an audio DAC over USB using the microcontroller in question: the STM32F051. It basically listens to my sound card output using a WASAPI loopback recorder, changes the resolution from 16 to 12 bits (since the DAC on the STM32 only has 12-bit resolution) and sends it over USART, using 10x the sample rate as the baud rate (in my case 960000). All done in C#.
On the microcontroller I simply use an interrupt for the USART and push the received data to the DAC.
It works pretty well, much better than PWM, and at a decent sample frequency of 48 kHz.
But... here it comes: when there is some (mostly) high-pitched symphonic melody, it starts to sound "wobbly".
Here is a video where you can hear it: https://youtu.be/xD3uTP9etuA?t=88
I read up on the internet a bit about DIY DACs, and someone somewhere (I don't remember where) mentioned that MCUs in general have interrupt jitter. So my basic question is: is interrupt jitter actually causing this? If so, are there ways to limit the jitter?
Or is this something entirely different?
I am thinking of trying to compact the PCM data sent over serial. As said before, the resolution is 12 bits, but samples are sent as packets of two 8-bit bytes forming 16 bits. My plan is to shift the 12 bits to the MSB and add four bits of the next 12-bit value to the current 16-bit variable, so only 12 byte transfers are needed instead of 16 per 8 samples (see the sketch below). I might read up on more efficient ways of compacting data for transport. I would then put the samples in a buffer and use another timer that triggers at 48 kHz for sending the samples to the DAC. Would this concept work? Or would I just waste time?
For code, here is the project: https://github.com/EldinZenderink/SoundOverSerial
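A minimal sketch of the packing scheme described above, assuming samples are already reduced to 12 bits (pack12 is a hypothetical helper, not part of the linked project):

#include <stdint.h>
#include <stddef.h>

/* Pack pairs of 12-bit samples into 3 bytes: a[11:4], a[3:0]|b[11:8], b[7:0]. */
size_t pack12(const uint16_t *in, size_t nsamples, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i + 1 < nsamples; i += 2) {
        uint16_t a = in[i] & 0x0fff;
        uint16_t b = in[i + 1] & 0x0fff;
        out[o++] = (uint8_t)(a >> 4);
        out[o++] = (uint8_t)(((a & 0x0f) << 4) | (b >> 8));
        out[o++] = (uint8_t)(b & 0xff);
    }
    return o; /* 3 bytes per sample pair: 12 bytes per 8 samples */
}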

How to modify timing in readyRead of QextSerialPort [duplicate]

I'm implementing a protocol over serial ports on Linux. The protocol is based on a request/answer scheme, so the throughput is limited by the time it takes to send a packet to a device and get an answer. The devices are mostly ARM based and run Linux >= 3.0. I'm having trouble reducing the round trip time below 10 ms (115200 baud, 8 data bits, no parity, 7 bytes per message).
What IO interfaces will give me the lowest latency: select, poll, epoll, or polling by hand with ioctl? Does blocking or non-blocking IO impact latency?
I tried setting the low_latency flag with setserial. But it seemed like it had no effect.
Are there any other things I can try to reduce latency? Since I control all the devices, it would even be possible to patch the kernel, but I'd prefer not to.
---- Edit ----
The serial controller used is a 16550A.
Request/answer schemes tend to be inefficient, and it shows up quickly on a serial port. If you are interested in throughput, look at a windowed protocol, like the Kermit file-sending protocol.
Now if you want to stick with your protocol and reduce latency, select, poll, read will all give you roughly the same latency, because as Andy Ross indicated, the real latency is in the hardware FIFO handling.
If you are lucky, you can tweak the driver behaviour without patching, but you still need to look at the driver code. However, having the ARM handle a 10 kHz interrupt rate will certainly not be good for the overall system performance...
Another option is to pad your packets so that you hit the FIFO threshold every time. It will also confirm whether or not it is a FIFO threshold problem.
10 msec @ 115200 is enough to transmit 100 bytes (assuming 8N1), so what you are seeing is probably because the low_latency flag is not set. Try
setserial /dev/<tty_name> low_latency
It will set the low_latency flag, which is used by the kernel when moving data up in the tty layer:
void tty_flip_buffer_push(struct tty_struct *tty)
{
    unsigned long flags;

    spin_lock_irqsave(&tty->buf.lock, flags);
    if (tty->buf.tail != NULL)
        tty->buf.tail->commit = tty->buf.tail->used;
    spin_unlock_irqrestore(&tty->buf.lock, flags);

    if (tty->low_latency)
        flush_to_ldisc(&tty->buf.work);  /* push to the line discipline immediately */
    else
        schedule_work(&tty->buf.work);   /* defer to a workqueue */
}
The schedule_work call might be responsible for the 10 msec latency you observe.
Having talked to some more engineers about the topic, I came to the conclusion that this problem is not solvable in user space. Since we need to cross the bridge into kernel land, we plan to implement a kernel module which talks our protocol and gives us latencies < 1 ms.
--- edit ---
Turns out I was completely wrong. All that was necessary was to increase the kernel tick rate. The default 100 Hz tick added the 10 ms delay. 1000 Hz and a negative nice value for the serial process give me the timing behavior I wanted to reach.
Serial ports on Linux are "wrapped" into Unix-style terminal constructs, which hits you with 1 tick of lag, i.e. 10 ms. Try whether stty -F /dev/ttySx raw low_latency helps; no guarantees though.
On a PC, you can go hardcore and talk to standard serial ports directly: issue setserial /dev/ttySx uart none to unbind the Linux driver from the serial port hardware and control the port via inb/outb to the port registers. I've tried that, and it works great.
The downside is you don't get interrupts when data arrives, and you have to poll the registers. Often.
You should be able to do the same on the ARM device side; it may be much harder on exotic serial port hardware.
Here's what setserial does to set low latency on a file descriptor of a port:
struct serial_struct serial;     /* from <linux/serial.h> */

ioctl(fd, TIOCGSERIAL, &serial); /* read the current settings */
serial.flags |= ASYNC_LOW_LATENCY;
ioctl(fd, TIOCSSERIAL, &serial); /* write them back */
In short: Use a USB adapter and ASYNC_LOW_LATENCY.
I've used an FT232RL-based USB adapter on Modbus at 115.2 kbps.
I get about 5 transactions (to 4 devices) in about 20 ms total with ASYNC_LOW_LATENCY. This includes two transactions to a slow-poke device (4 ms response time).
Without ASYNC_LOW_LATENCY the total time is about 60 ms.
With FTDI USB adapters, ASYNC_LOW_LATENCY sets the inter-character timer on the chip itself to 1 ms (instead of the default 16 ms).
I'm currently using a home-brewed USB adapter, and I can set the latency for the adapter itself to whatever value I want. Setting it at 200 µs shaves another ms off that 20 ms.
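On kernels with the ftdi_sio driver, the same timer is also exposed through sysfs (the exact path depends on how the adapter enumerates), so it can be set without an ioctl:

echo 1 > /sys/bus/usb-serial/devices/ttyUSB0/latency_timer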
None of those system calls have an effect on latency. If you want to read and write one byte as fast as possible from userspace, you really aren't going to do better than a simple read()/write() pair. Try replacing the serial stream with a socket from another userspace process and see if the latencies improve. If they don't, then your problems are CPU speed and hardware limitations.
Are you sure your hardware can do this at all? It's not uncommon to find UARTs with a buffer design that introduces many bytes worth of latency.
At those line speeds you should not be seeing latencies that large, regardless of how you check for readiness.
You need to make sure the serial port is in raw mode (so you do "noncanonical reads") and that VMIN and VTIME are set correctly. You want to make sure that VTIME is zero so that an inter-character timer never kicks in. I would probably start with setting VMIN to 1 and tune from there.
The syscall overhead is nothing compared to the time on the wire, so select() vs. poll(), etc. is unlikely to make a difference.
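A minimal sketch of that termios setup, assuming fd is already open on the port:

#include <termios.h>

struct termios tio;

tcgetattr(fd, &tio);
cfmakeraw(&tio);      /* noncanonical ("raw") mode, no input/output processing */
tio.c_cc[VMIN] = 1;   /* read() returns as soon as at least 1 byte arrives */
tio.c_cc[VTIME] = 0;  /* no inter-character timer */
tcsetattr(fd, TCSANOW, &tio);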

Why does TCP/IP speed depend on the size of the sent data?

When I sent small data (16 bytes and 128 bytes) continuously (using a 100-iteration loop without any inserted delay), the throughput with the TCP_NODELAY setting seemed no better than with the normal setting. Additionally, TCP slow start appeared to affect the transmission in the beginning.
The reason I ask is that I want to control a device from a PC via Ethernet. The processing time of this device is around several microseconds, but the huge latency of sending a command affects the entire system. Could you suggest some ways to solve this problem? Thanks in advance.
Last time, I measured the transfer performance between a Windows PC and a Linux embedded board. To verify TCP_NODELAY, I set up a system with two Linux PCs connected directly to each other, i.e. Linux PC <--> Router <--> Linux PC. The router was used only for the two PCs.
The performance without TCP_NODELAY is shown as follows. It is easy to see that the throughput increased significantly when the data size >= 64 KB. Additionally, when the data size = 16 B, the receive time sometimes dropped to 4.2 µs. Do you have any idea about this observation?
The performance with TCP_NODELAY seems unchanged, as shown below.
The full code can be found in https://www.dropbox.com/s/bupcd9yws5m5hfs/tcpip_code.zip?dl=0
Please share with me your thinking. Thanks in advance.
I am doing socket programming to transfer a binary file between a Windows 10 PC and a Linux embedded board. The socket libraries are winsock2.h and sys/socket.h for Windows and Linux, respectively. The binary file is copied to an array on Windows before sending, and the received data are stored in an array on Linux.
Windows: socket_send(sockfd, &SOPF->array[0], n);
Linux: socket_recv(&SOPF->array[0], connfd);
I can receive all the data properly. However, it seems that the transfer time depends on the size of the sent data. When the data size is small, the received throughput is quite low, as shown below.
Could you please show me some documents explaining this problem? Thank you in advance.
To establish a TCP connection, you need a 3-way handshake: SYN, SYN-ACK, ACK. Then the sender will start to send some data. How much depends on the initial congestion window (configurable on Linux; I don't know about Windows). As long as the sender receives timely ACKs, it will continue to send, as long as the receiver's advertised window has the space (use the socket option SO_RCVBUF to set it). Finally, closing the connection also requires a FIN, FIN-ACK, ACK.
So my best guess without more information is that the overhead of setting up and tearing down the TCP connection has a huge effect on the overhead of sending a small number of bytes. Nagle's algorithm (disabled with TCP_NODELAY) shouldn't have much effect as long as the writer is effectively writing quickly. It only prevents sending less-than-full-MSS segments, which should increase transfer efficiency in this case, where the sender is simply sending data as fast as possible. The only effect I can see is that the final less-than-full-MSS segment might need to wait for an ACK, which again would have more impact on the short transfers than on the longer transfers.
To illustrate this, I sent one byte using netcat (nc) on my loopback interface (which isn't a physical interface, and hence the bandwidth is "infinite"):
$ nc -l 127.0.0.1 8888 >/dev/null &
[1] 13286
$ head -c 1 /dev/zero | nc 127.0.0.1 8888 >/dev/null
A network capture in Wireshark (image not reproduced here) showed the following:
It took a total of 237 microseconds to send one byte, which is a measly 4.2 KB/second. I think you can guess that if I sent 2 bytes, it would take essentially the same amount of time, for an effective rate of 8.4 KB/second, a 100% improvement!
The best way to diagnose performance problems in networks is to get a network capture and analyze it.
When you run your test with a significant amount of data, for example your biggest test (512 MiB, 536 million bytes), the following happens.
The data is sent by the TCP layer, which breaks it into segments of a certain length. Let's assume segments of 1460 bytes, so there will be about 367,000 segments.
For every segment transmitted there is overhead (control and management data added to ensure good transmission): in your setup, there are 20 bytes for TCP, 20 for IP, and 16 for Ethernet, for a total of 56 bytes per segment. Please note that this number is the minimum, not accounting for the Ethernet preamble, for example; moreover, sometimes the IP and TCP overhead can be bigger because of optional fields.
Well, 56 bytes for every segment (367,000 segments!) means that when you transmit 512 MiB, you also transmit 56 * 367,000 = 20 million bytes on the line. The total number of bytes becomes 536 + 20 = 556 million bytes, or 4,448 million bits. If you divide this number of bits by the time elapsed, 4.6 seconds, you get a bitrate of 966 megabits per second, which is higher than what you calculated without taking the overhead into account.
From the above calculation, it seems that your Ethernet is gigabit. Its maximum transfer rate should be 1,000 megabits per second, and you are getting really near to it. The rest of the time is due to more overhead we didn't account for, and some latencies that are always present and tend to be amortized as more data is transferred (but they will never be defeated completely).
I would say that your setup is OK. But this is for big data transfers. As the size of the transfer decreases, the overhead in the data, the latencies of the protocol, and other such things become more and more important. For example, if you transmit 16 bytes in 165 microseconds (the first of your tests), the result is 0.78 Mbps; if it took 4.2 µs, about 40 times less, the bitrate would be about 31 Mbps (40 times bigger). These numbers are lower than expected.
In reality, you don't transmit 16 bytes, you transmit at least 16 + 56 = 72 bytes, which is 4.5 times more, so the real transfer rate of the link is also bigger. But, you see, transmitting 16 bytes over a TCP/IP link is like measuring the flow rate of an empty aqueduct by dropping a few drops of water into it: the drops get lost before they reach the other end. This is because TCP/IP and Ethernet are designed to carry much more data, with reliability.
Comments and answers in this page point out many of those mechanisms that trade bitrate and reactivity for reliability: the 3-way TCP handshake, the Nagle algorithm, checksums and other overhead, and so on.
Given the design of TCP+IP and Ethernet, it is very normal that performance is not optimal for small amounts of data. From your tests you can see that the transfer rate climbs steeply when the data size reaches 64 KB. This is not a coincidence.
From a comment you left above, it seems that you are looking for low-latency communication, rather than high bandwidth. It is a common mistake to confuse the two kinds of performance. Moreover, in this respect, I must say that TCP/IP and Ethernet are completely non-deterministic. They are quick, of course, but nobody can say how quick because there are too many layers in between. Even in your simple setup, if a single packet gets lost or corrupted, you can expect delays of seconds, not microseconds.
If you really want something with low latency, you should use something else, for example a CAN bus. Its design is exactly what you want: it transmits little data at high speed, with low latency and deterministic timing (just microseconds after you transmit a packet, you know if it has been received or not; to be more precise, exactly at the end of the transmission of a packet you know if it reached the destination or not).
TCP sockets typically have an internal buffer. In many implementations, the stack will wait a little while before sending a packet, to see if it can fill up the remaining space in the buffer first. This is called Nagle's algorithm. I assume that the times you report above are not due to overhead in the TCP packet, but to the fact that TCP waits for you to queue up more data before actually sending.
Most socket implementations therefore have a parameter or option called something like TCP_NODELAY, which can be false (the default) or true. I would try playing with that and seeing whether it affects your throughput. Essentially this flag enables/disables Nagle's algorithm.
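On Linux, for example, the flag is set per socket like this (a minimal sketch; sockfd is assumed to be an already-connected TCP socket):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm so small writes are sent immediately. */
int flag = 1;
setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));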

Is it possible to use 9-bit serial communication in Linux?

RS-232 communication sometimes uses 9-bit bytes. This can be used to communicate with multiple microcontrollers on a bus where 8 bits are data and the extra bit indicates an address byte (rather than data). Inactive controllers only generate an interrupt for address bytes.
Can a Linux program send and receive 9-bit bytes over a serial device? How?
The termios system does not directly support 9-bit operation, but it can be emulated on some systems by playing tricks with the CMSPAR flag. It is undocumented and may not appear in all implementations.
Here is a link to a detailed write-up on how 9-bit emulation is done:
http://www.lothosoft.ch/thomas/libmip/markspaceparity.php
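In outline, the trick uses CMSPAR to select mark or space parity per word, with the parity bit standing in for the 9th bit (a sketch; CMSPAR is Linux-specific, may require a feature-test macro such as _DEFAULT_SOURCE, and is not honored by every driver; fd is assumed to be the open port):

#include <termios.h>

struct termios tio;

tcgetattr(fd, &tio);
tio.c_cflag |= PARENB | CMSPAR | PARODD; /* mark parity: 9th bit = 1 */
/* tio.c_cflag &= ~PARODD; */            /* space parity: 9th bit = 0 */
tcsetattr(fd, TCSADRAIN, &tio);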
9-bit data is a standard part of RS-485 and used in multidrop applications. Hardware based on 16C950 devices may support 9-bits, but only if the UART is used in its 950 mode (rather than the more common 450/550 modes used for RS-232).
A description of the 16C950 may be found here.
This page summarizes Linux RS-485 support, which is baked into more recent kernels (>=3.2 rc3).
9-bit data framing is possible even if the real-world UART doesn't support it directly.
I found one library that also does it, under Windows and Linux.
See http://adontec.com/9-bit-serial-communication.htm
basically what he wants is to output data from a Linux box, then send it on, let's say, a 2-wire bus with a bunch of MAX232 ICs -> some microcontroller with a UART or a software RS-232 implementation
one can leave the individual MAX232 level converters away as long as there are no voltage potential issues between the individual microcontrollers (on the same PCB, for example, rather than in different buildings ;) up until the maximum output (TTL) load of the MAX232 (or clones, or a resistor and inverter/transistor) IC.
i can't find Linux termios settings for MARK or SPACE parity (which i'm sure the hardware UARTs actually do support, just not the Linux tty implementation), so we shall just hack the actual parity generation a bit.
8 data bits + 2 stop bits is the same length as 8 data bits + 1 parity bit + 1 stop bit (where the first stop bit is a logic 1, negative line voltage).
one would then use the 9th bit as an indicator that the other 8 bits are the address of an individual microcontroller or a group of them, which then take the next bytes as some sort of command or data; that is, they are 'addressed'.
this provides an 8-bit-transparent, although one-way, means to address 'a lot of things' (256 different (groups of) things, actually ;) on the same bus. it's one way; if one wanted to do 2-way, you'd need 2 wire pairs, or modulate at multiple frequencies, or implement collision detection and the whole lot of that.
PIC microcontrollers can do 9-bit serial communication with, ehm, 'some trickery' (the 9th bit is actually in another register ;)
now... considering the fact that on linux and the likes it is not -that- simple...
have you considered simply turning parity on for the 'address word' (the one in which you need 9 bits ;), setting it to odd or even, calculated so that the right one is chosen to make the 9th (parity) bit a '1' with parity on and 8-bit 'data'? then turn parity back off and turn 2 stop bits on (which still keeps a 9-bit word length as far as your microcontroller is concerned ;)... it's a long time ago, but as far as i recall, stop bits are just as long as data bits in the timing of things.
this should work on anything that can do 8-bit output with parity and with 2 stop bits, which includes PC hardware and Linux (and DOS etc).
PC hardware also has options to just turn the parity bit on or off for all words (without actually calculating it), if i recall correctly from 'back in the days'.
furthermore, the 9th bit the PIC datasheet speaks about actually IS the parity bit as in the RS-232 specification; it's just that you're free to turn it on or off (on PICs anyway - on Linux it's a bit more complicated than that).
(nothing a few termios settings on linux won't solve i think... just turn it on and off then... we've made that stuff do weirder things ;)
a PIC microcontroller actually does exactly the same; it's just not presented as 'what it actually is' in the datasheet. they actually call it 'the 9th bit' and things like that. on PCs, and therefore on Linux, it works pretty much the same way though.
anyway, if this thing should work 'both ways' then good luck wiring it with 2 pairs or figuring out some way to do collision detection, which is a hell of a lot more problematic than getting 9 bits out.
either way, it's not much more than an overrated shift register. if the UART on the PC doesn't want to do it (which i doubt), just abuse the DTR pin to shift out the data by hand, or abuse the printer port to do the same, or hook up a shift register to the printer port... but with the parity trick it should work fine anyway. here's a rough proof of concept:
#include <termios.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>

struct termios com1pr;
int com1fd;

/* 8 data bits + even parity: the parity bit acts as the 9th bit */
void bit9oneven(int fd)
{
    cfmakeraw(&com1pr);
    com1pr.c_iflag = IGNPAR;
    com1pr.c_cflag = CS8 | CREAD | CLOCAL | PARENB;
    cfsetispeed(&com1pr, B300);
    cfsetospeed(&com1pr, B300);
    tcsetattr(fd, TCSANOW, &com1pr);
}

/* 8 data bits + odd parity */
void bit9onodd(int fd)
{
    cfmakeraw(&com1pr);
    com1pr.c_iflag = IGNPAR;
    com1pr.c_cflag = CS8 | CREAD | CLOCAL | PARENB | PARODD;
    cfsetispeed(&com1pr, B300);
    cfsetospeed(&com1pr, B300);
    tcsetattr(fd, TCSANOW, &com1pr);
}

/* 8 data bits + 2 stop bits: the first stop bit acts as a 9th bit stuck at 1 */
void bit9off(int fd)
{
    cfmakeraw(&com1pr);
    com1pr.c_iflag = IGNPAR;
    com1pr.c_cflag = CS8 | CREAD | CLOCAL | CSTOPB;
    cfsetispeed(&com1pr, B300);
    cfsetospeed(&com1pr, B300);
    tcsetattr(fd, TCSANOW, &com1pr);
}

void initrs232(void)
{
    com1fd = open("/dev/ttyUSB0", O_RDWR | O_SYNC | O_NOCTTY);
    if (com1fd >= 0) {
        tcflush(com1fd, TCIOFLUSH);
    } else {
        printf("FAILED TO INITIALIZE\n");
        exit(1);
    }
}

/* pick the parity mode that forces the parity (9th) bit for an address byte */
void sendaddress(unsigned char x)
{
    unsigned char n, t = 0;
    for (n = 0; n < 8; n++)
        if (x & (1u << n))  /* was x & 2^n: ^ is XOR in C, not a power */
            t++;
    if (t & 1)
        bit9oneven(com1fd);
    else
        bit9onodd(com1fd);
    write(com1fd, &x, 1);
}

int main(void)
{
    unsigned char datatosend = 0x00; /* bogus data byte to send */
    initrs232();
    while (1) {
        bit9oneven(com1fd);
        while (1)
            write(com1fd, &datatosend, 1);
        //sendaddress(223);              // address microcontroller at address 223
        //write(com1fd, &datatosend, 1); // send a data byte
        //sendaddress(128);              // address microcontroller at address 128
        //write(com1fd, &datatosend, 1); // send a data byte
    }
    //close(com1fd);
}
it somewhat works... maybe some things are the wrong way around, but it does send 9 bits. (CSTOPB sets 2 stop bits, meaning that on 8-bit transparent data the 9th bit = 1, while in addressing mode the 9th bit = 0 ;)
also take note that the actual RS-232 line voltage levels are the other way around from what your software 'reads' (which is the same as the 'inverted' 5 V TTL levels your PIC microcontroller gets from the transistor or inverter or MAX232-clone IC): -19 V or -10 V (PC) for logic 1, +19/+10 V for logic 0. stop bits are negative voltage, like a 1, and the same length.
bits go out LSB first, 0-7 (and in this case bit 8 ;)... so: start bit -> 0, 1, 2, 3, 4, 5, 6, 7, then the 9th (parity) bit, then stop.
it's a bit hacky but it seems to work on the scope.
Can a Linux program send and receive 9-bit bytes over a serial device?
The standard UART hardware (8251 etc.) doesn't support 9-bit data modes.
I also made a complete demo of 9-bit UART emulation (based on even/odd parity). You can find it here.
All sources available on git.
You can easily adapt it for your device. Hope you like it.

Serial port: not able to write a big chunk of data

I am trying to send text data from one PC to another using a serial cable. One of the PCs is running Linux, and I am sending the data from it using the write(2) system call. The log size is approximately 65 KB, but write(2) returns about 4 KB (i.e., that much data is getting transferred). I tried breaking the data into chunks of 4 KB, but then write(2) returns -1.
My question is: "Is there any buffer limit for writing data to a serial port, or can I send data of any size? Also, do I need to continuously read data on the other PC as I write each 4 KB chunk?"
Do I need to do any special configuration in the termios structure for sending (huge) amounts of data?
The transmit buffer is one page (I took a look at the Linux 2.6.18 sources), which is 4 KB in most (if not all) cases.
The other end must read (I don't know the size of the receive buffer), but more importantly, you should not write faster than the serial port can transmit: at 115200 bps 8-N-1, each byte takes 10 bit times on the wire, so you can write the 4 KB chunk just under 3 times a second (115200 / 10 / 4096 ≈ 2.8).
Yes, there is a buffer limit - but when you reach that limit, the write() should block.
When write() returns -1, what is errno set to?
Make sure that the receiver is reading.
You should update the current position in your buffer after each write(), and continue the next write from there. (This applies to all writes(), regardless of whether the fd is a serial port, a TCP socket, or a file.)
If you get an error back for subsequent writes: judging by the manpage, it's safe to retry the writes for the following errnos: EAGAIN, EINTR, and probably ENOSPC. Use perror() to see what you get. (..and post it, I am curious.)
EFBIG would seem to indicate that you are trying to write using a buffer (or rather a count) that is too large, but that limit is probably much larger than 64 KB.
If the internal buffer is filling up because you are writing too fast, try to (nano)sleep a little between the writes. There are several clever ways of doing this (like TCP does), but if the rate is known, just write at a fixed rate.
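Putting the earlier advice together, a minimal sketch of a resumable write loop (write_all is a hypothetical helper; fd is assumed to be the open port):

#include <errno.h>
#include <unistd.h>

/* Write all len bytes, resuming after partial writes. */
ssize_t write_all(int fd, const unsigned char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;   /* transient: retry (poll()/sleep first if O_NONBLOCK) */
            return -1;      /* real error: check errno / perror() */
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}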
If you think the receiver is actually reading but not much is happening, have a look at the serial port's flow-control options and whether the cable is wired for RTS/CTS.
