I have a bit bang code that allows me to send like 4 megs of data through SPI lines. Its embedded code for custom Hardware using a Linux Kernel.
The problem is that takes a VERY long time to do it (4 hours) this is most likely becase the kernel is doing more stuff. Basically my code is something like this(aprox):
unsigned char data=0xFF;
BB_SPI_Init();
SPI_start();//activates chipselect(enable)
for(i=0;i<8;i++){
if(data & 0x80){
gpio_set_value(SPI_MOSI,1);
}else{
gpio_set_value(SPI_MOSI,0);
}
//send pulse clock
gpio_set_value(SPI_CLK,0);
gpio_set_value(SPI_CLK,1);
data<<=1;
}
SPI_stop();//deactivates chipselect(disable)
So is a very simple bit bang, but i notice that if i use write to send data to the linux gpio handler /sys/class/gpio/gpioXX/value (where XX is any gpio number) it takes 4 hours.
But if i use fwrite() for sending to the same device it takes 3 hours.
BUT, if you use write() only for the enables ( SPI_stop(), and SPI_start()) and fwrite() for sending to MISO, CLK it only takes 1 hour and 30 minutes.
So, with that as a base, could someone explain to me how is that happening? my imagination says that is the way the threads are handled and in every software cycle it resolves 2 threads (fwrite() and write()) instead if was only one of the functions used, but now i'm still investigating, can someone let me know any kind of information? is there a better way to handle this?
FYI
Can't use kernel driver spi because the hardware was connected to gpios and it is a mandatory requirement to use bit bang but i accept any suggestion
Thanks in advance
EDIT
Hey Guys thanks for your comments, it seems that i had a problem (very dumb one) that i created the file descriptor each time that i was going to send data to sys/class/gpio/gpioxx/value so that's why was slow. Also turn off some other programs and the transfer skyrocket to 3 minutes instead of 1 hour 30 minutes (with write()). Thanks and Sorry about it
I think that the spi-bitbang driver is the best solution if you are looking for performance. Doing the bit-bang from user space is a pain because you have at least 3 system calls for each bit of data. A system call is an expensive operation.
FYI Can't use kernel driver spi because the hardware was connected to gpios and it is a mandatory requirement to use bit bang but i accept any suggestion
That's why the spi-bitbang driver exists. You can easily configure the spi-bitbang driver to work with your GPIOs.
Then, once you have a spi-bitbang driver, you can write a char device that accept as input your entire block of data and transfer it in kernel space. With this solution you will get the maximum performance for a bit-bang interface.
Related
I had a assignment for college where we needed to play a precompiled wav as integer array through the PWM and DAC. Now, I wanted more of a challenge, so I went out of my way and created a audio dac over usb using the micro controller in question: The STM32F051. It basically listens to my soundcard output using a wasapi loopback recorder, changes the resolution from 16 to 12 bit (since the dac on the stm32 only has a 12 bit resolution) and sends it over using usart using 10x sample rate as baud rate (in my case 960000). All done in C#.
On the microcontroller I simply use a interrupt for usart and push the received data to the dac.
It works pretty well, much better than PWM, and at a decent sample frequency of 48kHz.
But... here it comes.. When there is some (mostly) high pitch symphonic melody it starts to sound "wobbly".
Here a video where you can hear it: https://youtu.be/xD3uTP9etuA?t=88
I read up on the internet a bit about DIY dac's and someone somewhere (don't remember where) mentioned that MCU's in general have interrupt jitter. So may basic question is: Is interrupt jitter actually causing this? If so, are there ways to limit the jitter happening?
Or is this something entirely different?
I am thinking of trying to compact the pcm data send over serial (as said before, resolution of 12 bits, but are sent in packet of 2 8bits forming 16bits, hence twice the samplerate as the baud rate, so my plan is trying to shift 12 bits to the MSB and adding four bits of the next 12 bit value to the current 16 bit variable, hence only needing 12 transfers instead of 16 per 8 samples. Might read upon more efficient ways of compacting data for transport.), put the samples in a buffer and then use another timer that triggers at 48kHz for sending the samples to the dac. Would this concept work? Or would I just waste time?
For code, here is the project: https://github.com/EldinZenderink/SoundOverSerial
I like to sample a signal which is generated by the pins of my Raspberry Pi. I made the experience that high sample rates are difficult to realize.
First I done a fast approach with Python(super slow). Then I changed to ANSI C + the bcm2835.h lib. I gained significant performance gains.
Now I am asking my self the question: How to do the best sampling under Linux?
My tries were undertaken in user space. But what is about, switching to kernel space? I can write a simple character device kernel module. In this module the pins are periodically checked. If the state changed some information is put into a buffer. This i/o buffer is polled by a synchronous file read for a application in user space. The best solution for me would be, if the pin checking can be done with a fixed frequency(sample period should be constant for signal processing).
The set up for this could be:
#kernel: character module + kernel thread + gpio device tree interface + DSP at constant sampling time
#user space: i/o app reading synchronous from character module
Ideas/tips?
I have a solution for you.
I have written such a module:
https://github.com/Appyx/gpio-reflect
You can synchronously read any signal from the GPIO pins.
You can use the output and calculate the signal with your sampling rate.
Just divide the periods.
I'm implementing a protocol over serial ports on Linux. The protocol is based on a request answer scheme so the throughput is limited by the time it takes to send a packet to a device and get an answer. The devices are mostly arm based and run Linux >= 3.0. I'm having troubles reducing the round trip time below 10ms (115200 baud, 8 data bit, no parity, 7 byte per message).
What IO interfaces will give me the lowest latency: select, poll, epoll or polling by hand with ioctl? Does blocking or non blocking IO impact latency?
I tried setting the low_latency flag with setserial. But it seemed like it had no effect.
Are there any other things I can try to reduce latency? Since I control all devices it would even be possible to patch the kernel, but its preferred not to.
---- Edit ----
The serial controller uses is an 16550A.
Request / answer schemes tends to be inefficient, and it shows up quickly on serial port. If you are interested in throughtput, look at windowed protocol, like kermit file sending protocol.
Now if you want to stick with your protocol and reduce latency, select, poll, read will all give you roughly the same latency, because as Andy Ross indicated, the real latency is in the hardware FIFO handling.
If you are lucky, you can tweak the driver behaviour without patching, but you still need to look at the driver code. However, having the ARM handle a 10 kHz interrupt rate will certainly not be good for the overall system performance...
Another options is to pad your packet so that you hit the FIFO threshold every time. It will also confirm that if it is or not a FIFO threshold problem.
10 msec # 115200 is enough to transmit 100 bytes (assuming 8N1), so what you are seeing is probably because the low_latency flag is not set. Try
setserial /dev/<tty_name> low_latency
It will set the low_latency flag, which is used by the kernel when moving data up in the tty layer:
void tty_flip_buffer_push(struct tty_struct *tty)
{
unsigned long flags;
spin_lock_irqsave(&tty->buf.lock, flags);
if (tty->buf.tail != NULL)
tty->buf.tail->commit = tty->buf.tail->used;
spin_unlock_irqrestore(&tty->buf.lock, flags);
if (tty->low_latency)
flush_to_ldisc(&tty->buf.work);
else
schedule_work(&tty->buf.work);
}
The schedule_work call might be responsible for the 10 msec latency you observe.
Having talked to to some more engineers about the topic I came to the conclusion that this problem is not solvable in user space. Since we need to cross the bridge into kernel land, we plan to implement an kernel module which talks our protocol and gives us latencies < 1ms.
--- edit ---
Turns out I was completely wrong. All that was necessary was to increase the kernel tick rate. The default 100 ticks added the 10ms delay. 1000Hz and a negative nice value for the serial process gives me the time behavior I wanted to reach.
Serial ports on linux are "wrapped" into unix-style terminal constructs, which hits you with 1 tick lag, i.e. 10ms. Try if stty -F /dev/ttySx raw low_latency helps, no guarantees though.
On a PC, you can go hardcore and talk to standard serial ports directly, issue setserial /dev/ttySx uart none to unbind linux driver from serial port hw and control the port via inb/outb to port registers. I've tried that, it works great.
The downside is you don't get interrupts when data arrives and you have to poll the register. often.
You should be able to do same on the arm device side, may be much harder on exotic serial port hw.
Here's what setserial does to set low latency on a file descriptor of a port:
ioctl(fd, TIOCGSERIAL, &serial);
serial.flags |= ASYNC_LOW_LATENCY;
ioctl(fd, TIOCSSERIAL, &serial);
In short: Use a USB adapter and ASYNC_LOW_LATENCY.
I've used a FT232RL based USB adapter on Modbus at 115.2 kbs.
I get about 5 transactions (to 4 devices) in about 20 mS total with ASYNC_LOW_LATENCY. This includes two transactions to a slow-poke device (4 mS response time).
Without ASYNC_LOW_LATENCY the total time is about 60 mS.
With FTDI USB adapters ASYNC_LOW_LATENCY sets the inter-character timer on the chip itself to 1 mS (instead of the default 16 mS).
I'm currently using a home-brewed USB adapter and I can set the latency for the adapter itself to whatever value I want. Setting it at 200 µS shaves another mS off that 20 mS.
None of those system calls have an effect on latency. If you want to read and write one byte as fast as possible from userspace, you really aren't going to do better than a simple read()/write() pair. Try replacing the serial stream with a socket from another userspace process and see if the latencies improve. If they don't, then your problems are CPU speed and hardware limitations.
Are you sure your hardware can do this at all? It's not uncommon to find UARTs with a buffer design that introduces many bytes worth of latency.
At those line speeds you should not be seeing latencies that large, regardless of how you check for readiness.
You need to make sure the serial port is in raw mode (so you do "noncanonical reads") and that VMIN and VTIME are set correctly. You want to make sure that VTIME is zero so that an inter-character timer never kicks in. I would probably start with setting VMIN to 1 and tune from there.
The syscall overhead is nothing compared to the time on the wire, so select() vs. poll(), etc. is unlikely to make a difference.
I have a system which uses a UART clocked at 26 Mhz. This is a 16850 UART on a i86 architecture. I have no problems accessing the port. The largest incoming message is about 56 bytes, the largest outgoing about 100. The baud rate divisor needs to be 1 so seterial /dev/ttyS4 baud_base 115200 is OK and opening at 115200. There is no flow control. Specifying part 16850 does NOT set the FIFO to deep. I was losing bytes. All the data is byte, unsigned char.
I wrote a routine that uses ioperm to set the deep FIFO's to 64 and now a read/write works meaning that the deep FIFO's are NOT being enabled by serial_core.c or 8250.c, at least in a deep manner.
With the deep FIFO set using s brute force, post open(fd, "/dev/ttyS4", NO_BLOCKING, etc I get reliably the correct number of bytes but I tend to get the same word missing a bit. Not a byte, a bit.
All this same stuff runs fine under DOS so it is not a hardware issue.
I have opened the port for raw, no delays, no party, 8 bits, 2 stop.
Has anyone seen issues reading serial ports are relatively high speeds with short bursts of data?
Yes, I have tried custom baud, etc. The FIFO levels made the biggest improvement. This is a ISA bus card using IRQ7.
It appears the serial driver for Linux sucks and has way to much latency and far to many features for really basic raw operation.
Has anyone else tried very high speed data without flow control or had similar issues. As I stated, I get the correct number of bytes and all the data is correct except 1 bit in byte 4.
I am pretty stumped.
I would like to know if it would be somehow possible to handle a Serial.println() on an Arduino uno without holding up the main program.
Basically, I'm using the arduino to charge a 400V capacitor and the main program is opening and closing the gate on a MOSFET transistor for 15 and 20 microseconds respectively. I also have a voltage divider connected to the capacitor which I use to measure the voltage on the capacitor when its being charged with the Arduino. I use analogRead() to get the raw value on the pin, multiply the value by the required ratio and try to print that value to the serial console at the end of each cycle. The problem is, though, that in order for the capacitor to charge quickly, the delays need to be very small (in the range of microseconds) and the serial print command takes much longer than that to execute and therefore holds up the entire program.
My question therefore is wherther it would somehow be possible to get the command to execute on something like a different "thread" without holding up the main cycle. Is the 8bit AVR capable of doing something like this?
Thanks
Arduino does not support multithreading, but there are a few libraries that allow you to do the equivalent:
Arduino Multi-Threading Library
ArduinoThread
For more information, you can also visit this question: Does Arduino support threading?
The basic Arduino serial print functions are blocking, They watch the TX Ready flag then load the next byte to transmit. This means if you send "Hello World" the print function will block for as long as it take to load the UART to send 10 characters at your selected baud rate.
One way to deal with this is using high baud rate so you don't wait long.
The better way is to use interrupt driven transmit. This way your string of data to send is buffered with each character sent when the UART can accept the character. The call to send data just loads the TX buffer, loads the first character into the UART to start the process then returns.
This bufferd approach will still block if you fill the TX buffer. The send method would have to wait to load more into the buffer. A good serial lib might give you an easy way to check the buffer status so you could implement your own extra buffering.
Here is a library that claims to be interrupt driven. I've not used it myself. Let us know if it works for you.
https://code.google.com/p/arduino-buffered-serial/
I found this post: Serial output is now interrupt driven. Serial.print() just stuffs the data in a buffer. Interrupts have to happen for that data to actually be shifted out. posted: Nov 11, 2012. I have found that to be true BUT if you try to print more then the internal serial buffer will hold it will lock up until the buffer is at least partially emptied.