Which microcontroller for fast, high-quality audio switching and playback?

I'm building a device which will play high-quality sound samples and will switch between samples within 5 ms of a signal being applied.
I'm after a microcontroller which allows this - I need 4 I/O pins for triggering the transitions between sounds, as well as the output pin(s) for the audio. The duration of the audio files will be around 50 ms, but ideally there would be enough storage for the files to be 1 second or longer. It will loop the current file until told to change. I don't want audible pops or suchlike when switching files or running other commands - but there shouldn't be a need for anything complex to run beside it; it's purely audio playing and switching.
I've looked at various microcontrollers in the Arduino family but they don't seem optimal for this purpose (I tried, for example, the Mozzi library for Arduino, but the quality isn't fantastic). Ideally I could do it all on the chip (whatever it is - it doesn't need to be Arduino) without needing external storage or RAM modules, but if that's necessary I'll do it. The solution has to fit in a 2 cm wide cylinder (no length constraints), so ideally it should stay within that - no SD card modules or the like. Language-wise I'm fairly new to them all, but I can learn whatever would be best.
Audio: 44.1 kHz CD-quality WAV, although I could obviously switch to a different format if necessary. If playing sound of such high quality is totally impossible, then the quality could be lower.
Thank you for your help

For a simple application like this you would be best to just use a small ARM Cortex M device hooked up to an external SPI FLASH chip. Most microcontrollers scale processing power and RAM with FLASH storage so keeping it all on one chip will result in a grotesquely over-powered solution. Serial FLASH memory is very cheap, easy to use, and you can change the size in the future if you need to add more samples.
For the audio side, if you really want CD quality you'll have to look at getting an external audio DAC, as I don't know of any microcontrollers that integrate a CD-quality codec. External DACs aren't expensive or complex to use, but they do add to the physical size and BOM cost. Many Cortex chips have built-in 12-bit DACs though, so if the audio has a reasonably small dynamic range you might find this is suitable for your needs.
In terms of minimising pops and clicks the Cortex devices will have enough power for some basic filtering to deal with this. I would recommend against Arduino though as you will quickly come up against processing power limitations and I doubt you will want to dive into assembler optimisations.
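To make the "basic filtering" concrete, here is a minimal sketch (not from the answer above, purely illustrative) of one common approach: a short linear crossfade applied per output sample, so a switch between files never puts a step discontinuity on the DAC. The buffer names and the interrupt-driven structure are assumptions.

```c
/* Minimal sketch of click-free sample switching: when a trigger requests a
 * new sample, crossfade over a few milliseconds instead of jumping, so the
 * DAC never sees a step.  Assumes 16-bit signed mono samples at 44.1 kHz
 * and that cur_buf already points at the currently looping file. */
#include <stdint.h>

#define FADE_SAMPLES 128            /* ~3 ms at 44.1 kHz */

static const int16_t *cur_buf, *next_buf;
static uint32_t cur_len, next_len, cur_pos, fade_pos;
static int fading;

void request_switch(const int16_t *buf, uint32_t len)
{
    next_buf = buf;
    next_len = len;
    fade_pos = 0;
    fading   = 1;
}

/* Called from the DAC/DMA interrupt for every output sample. */
int16_t next_output_sample(void)
{
    int32_t a = cur_buf[cur_pos];
    cur_pos = (cur_pos + 1) % cur_len;          /* loop the current file */

    if (!fading)
        return (int16_t)a;

    int32_t b   = next_buf[fade_pos % next_len];
    int32_t out = (a * (FADE_SAMPLES - fade_pos) + b * fade_pos) / FADE_SAMPLES;

    if (++fade_pos >= FADE_SAMPLES) {           /* fade finished: adopt new file */
        cur_buf = next_buf;
        cur_len = next_len;
        cur_pos = fade_pos % next_len;
        fading  = 0;
    }
    return (int16_t)out;
}
```

The same per-sample hook is also the natural place for a gentle fade-in from silence at power-up, which removes the start-up thump many designs suffer from.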

Related

ALSA vs PulseAudio - Latency Concerns

Good day,
I have been debating some details with a colleague about ALSA vs PulseAudio, and need some help coming to a conclusion. It's my understanding that ALSA is relatively low-level and talks directly to the hardware, while PulseAudio sits on top of ALSA as a service.
Additionally, I understand that ALSA is tied to Linux, while PulseAudio just acts as an abstraction layer on top of ALSA and can work on other platforms. My conclusion is that ALSA would provide lower audio latency on most Linux systems, while my colleague contends that PulseAudio provides better (shorter) latency regardless.
Which of us is correct? My reasoning is that since PulseAudio sits on top of ALSA or even wraps it, there's no way it could provide better latency unless it's providing its own low-level calls.
Thank you.
ALSA (like many other sound APIs) provides a ring buffer for samples to be played.
The most common way to use this ring buffer is to keep it filled at all times.
This implies that a sample that is written now is played only after all the other samples in the buffer have been played, i.e., the latency is proportional to the buffer's size.
(The buffer size can be chosen by the application, but depends on the capabilities of the hardware, and is fixed once chosen.)
PulseAudio has the ability to keep only a part of the buffer filled.
(This is not a feature offered directly by ALSA, but requires a separate timer to monitor the playback progress.)
Thus it can offer lower latency than other applications using the same buffer size, but more importantly, this allows it to adjust the latency dynamically without having to stop and reconfigure the device.
Other applications could do the same, but it's easier to use PulseAudio than to implement that buffer handling again.
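As a concrete illustration of the buffer/latency relationship described above, here is a minimal ALSA playback sketch (my own example, not part of the original answer; error handling omitted). The latency argument to snd_pcm_set_params() sizes the ring buffer, and writes block once that buffer is full:

```c
/* Sketch of how an ALSA application's buffer size sets its latency.
 * snd_pcm_set_params() takes the desired latency in microseconds; the
 * driver derives the ring-buffer size from it, and a sample written now
 * plays only after everything already queued ahead of it. */
#include <alsa/asoundlib.h>
#include <math.h>

int main(void)
{
    snd_pcm_t *pcm;
    short buf[4800];                    /* 100 ms of mono samples at 48 kHz */

    snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0);
    snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                       SND_PCM_ACCESS_RW_INTERLEAVED,
                       1, 48000, 1,
                       100000);         /* ~100 ms buffer => ~100 ms latency */

    for (int n = 0; n < 50; n++) {      /* 5 seconds of a 440 Hz tone */
        for (int i = 0; i < 4800; i++)
            buf[i] = (short)(3000 * sin(2 * M_PI * 440 * i / 48000.0));
        /* Blocks once the ring buffer is full, i.e. ~100 ms ahead of playback. */
        snd_pcm_writei(pcm, buf, 4800);
    }
    snd_pcm_drain(pcm);
    snd_pcm_close(pcm);
    return 0;
}
```

Shrinking the latency argument shrinks the ring buffer; PulseAudio's trick, as described above, is to keep a large buffer but only partially filled, using a timer to decide how far ahead to write.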

What bluetooth to use (2.1 or 4.0) and how?

The title seems to be too general (I can't think of a good title). I'll try to be specific in the question description.
I've been asked to build an industrial control box that collects data periodically (maybe 10-20 bytes of data every 5 seconds). The operator will use a laptop or mobile phone to collect the data (without opening the box) via Bluetooth, weekly, monthly, or possibly at even longer intervals.
I will be in charge of selecting the proper modules/chips, doing the PCB, and doing the embedded software too. Because the box is not produced in high volume, I have freedom (modules/chips to use, prices, capabilities, etc.) in designing the different components.
The whole application requires a USART port to read in data when available (maybe every 5-10 seconds), an SPI port for data storage (SD card reader/writer), and several GPIO pins for LED indicators or maybe buttons (whether we need buttons and how many is up to my design).
I am new to Bluetooth. I have read through the wiki and some Google results, so I know about pairing, the class 1 and class 2 differences, and the 2.1 and 4.0 differences.
But quite a few things are still unclear to me when deciding what Bluetooth module/chip to use.
A friend of mine mentioned the TI CC2540 to me. I checked and it only supports 4.0 BLE mode. And from Google, BT 4.0 has a payload of at most 20 bytes. Is BT 4.0 suitable for my application, when bulk data will need to be collected every month or every several months? Or is it better to use BT 2.1 with EDR for this application? BT 4.0 BLE mode seems to have faster pairing but lower throughput?
I read through the CC2540 documentation and found that it is not a BT-only chip; it has several GPIO pins and UART pins (I am not sure about SPI). Can I say that the CC2540 itself is powerful enough to hold the whole application, including Bluetooth, data receiving via UART, and SD card reading/writing?
My original design was to use an ARM Cortex-M/AVR32 MCU. The program is just a loop to serve each task/event in rounds (or I could even install Linux). There would be a Bluetooth module. The module would automatically take care of pairing; I would only need to send the module the data to pass to the other end. Of course there might be some other controlling, such as putting the module into a low-power mode, because the Bluetooth will only be used once a month or so. But after some study of Bluetooth, I am not sure whether such a BT module exists. Is programming chips like the CC2540 a must?
In my understanding, my device will be a BT slave and the laptop/phone will be the master. My device will periodically probe (maybe at a long interval, to save power) for the existence of the master and pair with it. Once it's paired, it will start sending data. Is my understanding of the procedure correct? Is there any difference in pairing/data sending between 2.1 and 4.0?
How should authentication be designed? I of course want unlimited phones/laptops to pair with the device, but only if they can prove they are the operator.
It's a bit messy, and I appreciate your reading through the questions above. Here is the summary:
2.1 or 4.0 to use?
Which one of the following is the better choice, meaning suitable for the application and easy to develop:
ARM/AVR32 + CC2540 (or the like)
CC2540 only, or the like (if possible)
ARM/AVR32 + some BT module (such as Bluegiga https://www.bluegiga.com/en-US/products/ )
Should Linux be used?
How should pairing and data sending work to save power? Are buttons useful for switching between a sleep mode and an active pairing/data-sending mode?
How should authentication be done? Only operators are allowed, but an operator may use any laptop/phone.
Thanks for reading.
My two cents (I've done a fair amount of Bluetooth work, and I've designed consumer products currently in the field)... Please note that I'll be glossing over A LOT of design decisions to keep this 'concise'.
2.1 or 4.0 to use?
Looking over your estimates, it sounds like you're looking at about 2MB of data per week, maybe 8MB per month. The question of what technology to use here comes down to how long people are willing to wait to collect data.
For BLE (BT 4.0), assume your data transfer rate is in the 2-4 kB/s range. For 2.1, assume it's in the 15-30 kB/s range. This depends on a ton of factors, but that has generally been my experience between Android/iOS and BLE devices.
At 2MB, either one of those is going to take a long time to transfer. Is it possible to get data more frequently? Or maybe have a wifi-connected base station somewhere that collects data frequently? (I'm not sure of the application)
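To put rough numbers on "a long time", here is a back-of-the-envelope calculation (my own figures, taking the middle of the ranges above):

```c
/* Back-of-the-envelope transfer times for the figures above (illustrative only). */
#include <stdio.h>

int main(void)
{
    double data_mb = 2.0;                       /* weekly collection */
    double ble_kbps = 3.0, classic_kbps = 20.0; /* mid-range of the estimates above */

    printf("BLE:    %.0f min\n", data_mb * 1024.0 / ble_kbps / 60.0);      /* ~11 min */
    printf("BT 2.1: %.1f min\n", data_mb * 1024.0 / classic_kbps / 60.0);  /* ~1.7 min */
    return 0;
}
```

So a weekly BLE pull is on the order of ten minutes standing next to the box, while BT Classic is a minute or two.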
Which one of the following is the better choice, meaning suitable for the application and easy to develop: ARM/AVR32 + CC2540 (or the like); CC2540 only or the like (if possible); ARM/AVR32 + some BT module (such as Bluegiga https://www.bluegiga.com/en-US/products/ ); should Linux be used?
This is a tricky question to answer without knowing more about the complexity of the system, the sensors, the data storage capacity, BOM costs, etc...
In my experience, a Linux OS is HUGE overkill for a simple GPIO/UART/I2C based system (unless you're super comfortable with Linux). Chips that can run Linux, plus the additional RAM they need, are usually costly (whereas a cheap ARM Cortex-M0 is like 50 cents in decent volume and sounds like all you need).
The question usually comes down to 'external MCU or not' - as in, do you try to get an all-in-one BT module which has application space you can program? That saves size and cost, but it adds risks and unknowns versus a braindead BT module + external MCU.
One thing I would like to clarify: you mention the TI CC2540 a few times (actually, the CC2541 is the newer version). Anyway, that chip is an IC-level component - unless you want to do the antenna design yourself and put it through FCC intentional radiator certification (the certs usually run 1k-10k, and I'm assuming you're in the US when I say FCC), I'd avoid it.
What I think you're looking for is a ready-made, certified module. For example, the BLE113 from Bluegiga is an FCC-certified module which contains the CC2541 internally (plus some bells and whistles). That one specifically also has an interpreted language called BGScript to speed up development. It's great for very simple applications and has nice, baked-in low-power modes.
So, the BLE113 is an example of a module that can be used with/without an external MCU depending on application complexity.
If you're willing to go through FCC intentional radiator certification, then the TI CC2541 is common, as is the Nordic nRF51822 (that chip has a built-in ARM core that you can program on as well, so you don't need an external MCU).
An example of a BLE module which requires an external MCU would be the Bobcats (AMS001) from AckMe. They have a BLE stack running on the chip which communicates to an external MCU using UART.
As mentioned in a comment above, if you need iOS compatibility, using Bluetooth 2.1 (BT Classic) is a huge pain due to the MFi program (which I have gone through - pure misery). So, unless iOS support is REALLY necessary, I would stick with BT Classic and Android/PC.
Some sample BT classic chips might be the Roving Networks RN42, the AmpedRF BT 33 (or 43 or 53). If you're interested, I did a throughput test on iOS devices with a Bluetooth classic device (https://stackoverflow.com/a/22441949/992509)
How should pairing and data sending work to save power? Are buttons useful for switching between a sleep mode and an active pairing/data-sending mode?
If the radio is only on every week or month when the operator is pulling data down, there's very little to do other than to put the BT module into reset mode to ensure there is no power used. BT Classic tends to use more power during transfer, but if you're streaming data constantly, the differences can be minimal if you choose the right modules (e.g. lower throughput on BLE for a longer period of time, vs higher throughput on BT 2.1 for less time - it works itself out in the wash).
The one thing I would do here is have a button trigger the ability to pair with the BT module, because that way it is not always on and advertising, and is otherwise just asleep or in reset.
How should authentication be done? Only operators are allowed, but an operator may use any laptop/phone.
Again, not knowing the environment, this can be tricky. If it's a secure environment, that alone may be enough (e.g. behind locked doors, where you need to be inside to press the button that wakes up the BT module).
Otherwise, at a bare minimum, have the standard BT pairing code enabled, rather than allowing anyone to pair. Additionally, you could bake extra authentication and security in (e.g. you can't download data without a passcode or something).
If you wanted to get really crazy, you could encrypt the data (this might involve using Linux though) and make sure only trusted people had decryption keys.
We use both protocols on different products.
Bluetooth 4.0 BLE aka Smart
low battery consumption
low data rates (I got up to 20 bytes every 40 ms; as I remember, Apple's minimum connection interval is 18 ms, and other handset makers adopted that interval)
you have to use Bluetooth's characteristics mechanism
you have to implement chaining if your data packets are longer
good range: 20-100 m
new technology, with a lot of awful premature implementations; it is slowly getting better
we used a chip from Bluegiga that allows a script language for programming, but it still has many limitations and bugs built in
the learning curve to implement BLE was steeper than for 2.1
Bluetooth 2.1
good for high data rates, depending on the baud rate used; the bottleneck here was the buffer in the controller
shorter range: 2-10 m
it's much easier to stream data
We did not notice a big time difference in pairing and connecting between the two technologies.
Here are two examples of devices which clearly demand either 2.1 or BLE. Maybe your use case is closer to one of those examples:
Humidity sensors attached to trees in a forest. Each week the ranger walks through the forest and collects the data.
Wireless stereo headsets

How do media timeline apps work under the hood?

There are numerous examples of apps that are time-based and send out events or carry out complex processing to a very fine resolution and accuracy. Think MIDI sequencing apps, audio and video editing apps.
So I'm curious, at their core level, how do these apps do what they do so accurately from a programming point of view?
MIDI and media playback are entirely different in nature, and are handled in different ways.
For MIDI, there is very little data to process. A thread with a high priority is created to handle MIDI I/O. That's all that is needed.
For audio, accuracy isn't a problem but latency is. There is a buffer on the sound interface that is regularly written to by the software playing back audio. For a typical media player, this buffer has storage for about 300ms of audio. The software just writes the PCM-encoded audio waveform to the buffer. The sound interface is constantly reading from this buffer and playing back at a constant rate.
For low-latency audio applications, this buffer size can be very small, handling as little as 5 or 10ms of audio. The software generating the audio data must again be handled by a thread with high priority, and often has many optimizations that keep it running in the event the rest of the software (effects and what not) cannot keep up. Buffer underruns are common. Special drivers are often used to skip unneeded software in the signal chain. ASIO and DirectX are common on Windows. Windows Vista/7 and OSX both call their audio APIs "core audio", and provide low-latency features without special drivers.
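To put those buffer sizes in concrete terms, here is a small worked example (illustrative only, assuming 44.1 kHz stereo 16-bit PCM) relating buffer duration to frames, bytes, and worst-case latency:

```c
/* Rough sketch relating the buffer sizes mentioned above to bytes and latency.
 * 44.1 kHz, stereo, 16-bit assumed for illustration. */
#include <stdio.h>

int main(void)
{
    const int rate = 44100, channels = 2, bytes_per_sample = 2;
    const int buffer_ms[] = { 300, 10, 5 };     /* media player vs low-latency */

    for (int i = 0; i < 3; i++) {
        int frames = rate * buffer_ms[i] / 1000;
        int bytes  = frames * channels * bytes_per_sample;
        /* Worst case, a sample written now waits for the whole buffer to drain. */
        printf("%3d ms buffer = %5d frames = %6d bytes\n",
               buffer_ms[i], frames, bytes);
    }
    return 0;
}
```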
Video is an entirely different beast. Decoding the video is handled by hardware where possible. This is how a slow device such as a cell phone is able to play back 720p video. If the hardware can handle the codec, the software just needs to send it the data. In cases where the codec is unsupported, the video must be decoded in software, which is much slower. Even on modern PCs, software decoding often leads to choppy or laggy video.
Synchronization of audio to video is also a problem. I don't know much about it, but it is my understanding that the audio is the master clock, and video is synchronized to it. You cannot simply start playback and expect the timing to work out, as different sound interfaces will have different ideas as to what 44.1kHz (or any other sample rate) is. You can prove this yourself by playing back the same audio on two different devices simultaneously, and listening to them drift apart over time.
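You can put rough numbers on that drift. Here is a sketch, with made-up but plausible clock error, of how far apart two nominally 44.1 kHz devices end up over a 10-minute track:

```c
/* Illustration of clock drift between two "44.1 kHz" devices whose real
 * sample rates differ slightly (numbers invented for the example). */
#include <stdio.h>

int main(void)
{
    double rate_a = 44100.0, rate_b = 44103.0;   /* ~70 ppm apart */
    double seconds = 600.0;                      /* a 10-minute track */

    double drift_ms = (rate_b - rate_a) / rate_a * seconds * 1000.0;
    printf("After %.0f s the streams are ~%.0f ms apart\n", seconds, drift_ms);
    return 0;
}
```

A few tens of milliseconds is already clearly audible as an echo, which is why one stream has to be slaved to the other's clock rather than both free-running.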

Graphics development on ARM

I am planning to make a small OS and run a Tetris clone on it using an ARM Cortex-M3. Unfortunately, I am not able to buy any development boards as of now, so I will have to use simulators.
I have actually looked into QEMU which has LM3S6965EVB support, which contains an ARM Cortex-M3 processor. But apparently newer revisions of the board are not compatible with the model in QEMU as none of the examples I have downloaded from TI seem to work. Even the OLED display is different.
Another problem is doing graphics development, as the OLED display on the LM3S6965EVB has a really low resolution. I was able to get it up to 640x480 by editing the QEMU source, but since I could not get any examples to work, I don't know if that works either. Using the debug parameters for the SSD0323, all I can see is that it accepts some of the data sent to initialize the device, then hangs...
I have considered choosing another board in QEMU but that would mean redoing many things from scratch when I get my hands on a real device, as the other ones are too powerful for something as simple as this.
What should I do? Are there any other simulators out there that can help me accomplish what I am trying to do? I want to develop a small OS and some small games.
Thanks in advance. I have been searching for a solution for days and I am really stuck.
How much you have to redo has to do, in part, with your software/system engineering: you can abstract where needed and then only have to re-write the abstraction layer, not the entire package. In fact, you can do much of your software design/testing on your host system and never cross-compile, and only later cross-compile for a simulator or real hardware.
For example, I assume you would construct the next video frame somewhere in RAM, then, depending on the hardware, either change some bits in a register and page-flip, or copy that frame buffer to the video/LCD in whatever form it wants. Using thumbulator you could build your screen in RAM somewhere (in the simulated memory space), then add to the simulator: when the simulation writes to such-and-such register, take those bytes from RAM and display them on the host computer (the one running the simulation) - basically simulate some hardware. Use SDL or basic X calls or whatever you prefer; I normally take snapshots to .bmp files (very easy to write) and look at them later.
Later, on hardware your abstracted update_screen() function would have hardware specific code to display that screen.
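A minimal sketch of that abstraction (my own illustration; the register address and the SIMULATOR switch are hypothetical): the game only ever touches a framebuffer in RAM, and update_screen() is the single place that differs between the simulator build and real hardware.

```c
/* The game draws into a plain framebuffer in RAM; only update_screen()
 * knows where the pixels actually go.  On the host/simulator build it
 * dumps a snapshot file to look at later; on hardware it would copy the
 * buffer to the LCD controller or flip pages. */
#include <stdint.h>
#include <stdio.h>

#define WIDTH  640
#define HEIGHT 480

static uint16_t framebuffer[WIDTH * HEIGHT];     /* RGB565, for example */

void draw_pixel(int x, int y, uint16_t color)
{
    if (x >= 0 && x < WIDTH && y >= 0 && y < HEIGHT)
        framebuffer[y * WIDTH + x] = color;
}

#ifdef SIMULATOR
void update_screen(void)
{
    /* Host-side implementation: write a raw snapshot per frame. */
    static int frame;
    char name[32];
    snprintf(name, sizeof name, "frame%04d.raw", frame++);
    FILE *f = fopen(name, "wb");
    if (f) { fwrite(framebuffer, sizeof framebuffer, 1, f); fclose(f); }
}
#else
void update_screen(void)
{
    /* Hardware implementation: copy the buffer to the display controller
     * (the memory-mapped VRAM address here is hypothetical). */
    volatile uint16_t *vram = (uint16_t *)0x60000000;
    for (int i = 0; i < WIDTH * HEIGHT; i++)
        vram[i] = framebuffer[i];
}
#endif
```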
thumbulator only runs Thumb instructions, not ARM and not Thumb2; Thumb is the common denominator between the ARM processors (ARMv4T and newer, except for Cortex-M) and those that support the Thumb2 extensions (Cortex-M). Other than startup code, the compiling and programming experience is the same across the ARM family, and the code (other than startup code and, of course, hardware-specific accesses) will run across the ARM family as well as on the simulator. If you go to a Cortex-M, then adding an architecture specification to the command line will change the build from Thumb-only to Thumb+Thumb2 instructions, giving you some performance boost. If you surf around my other projects on github you will find this idea repeated over and over again; I have many simple Cortex-M examples where I use gcc and llvm and build the same .c code with Thumb instructions and Thumb+Thumb2.
Another, completely different answer: get a GBA (Nintendo Game Boy Advance). You can get a GBA SP (it has a backlit display, which makes the whole experience better) for about $30 or so on ebay. You can buy flash cartridges that take SD cards for about the same amount. It has an ARM7TDMI and runs Thumb code much faster than ARM code, giving you that Thumb experience in preparation for other/newer cores like the Cortex-M. For another $30 you can get a game link cable, chop it up, attach an RS-232 level shifter (I can talk you through all of this), and make a GBA serial cable. My preferred setup is a flash cartridge that I have pre-programmed with a serial bootloader; I download the program over serial into RAM and then run from RAM. This avoids having to yank the flash cartridge and/or SD card every time you re-compile the program - that is doable, and a cheaper solution, but it gets tiring fast.
If you have a Nintendo DS, for $12 to $15 you can get an SD-based flash cartridge that you can likewise use for development. I recommend learning the GBA first, which you can do on the DS if you buy a GBA-side memory cartridge (you need a DS Lite, not a DSi nor 3DS) supported by the software on the cartridge (the EZ Flash 3-in-1, GBA size, is a good one for example; besides being memory, you can flash that one from the NDS and carry it over to the GBA - this is how I put my serial bootloader on it). These loaders will let you put your .gba file on the NDS cartridge's SD card, then load it into the GBA cartridge; the loader switches the NDS into GBA mode and runs it as a GBA.
There are lots of other solutions. sparkfun.com likely has a number of ARM-based boards that can drive LCDs and/or come with LCDs. You can go to earthlcd and get one of the serial-based LCD panels that make for rapid development; later, of course, a cheaper solution is desired. Along the same lines, you could instead simulate an earthlcd-like thing using your host computer: have the embedded microcontroller send screen updates over serial to the host, and the host displays the graphics. Later, replace that screen update with something else.
For this latter solution, for about $20 you can get an stm32f4 discovery board, which has a Cortex-M4 running at up to 168 MHz and a number of serial ports, at least two of which have pins not used by something else; you could easily have one port for debug messages and the other for this virtual serial screen. In the stm32f4 directory of my stm32vld repo on github I have a number of getting-started examples for that board (as well as for the stm32vld, which is a few bucks cheaper but not as powerful as the stm32f4). Likewise, your host application can take keystrokes and turn them into user-control/game-control commands sent back to the game software on the microcontroller.
There is of course the beagleboard or hawkboard, or the Raspberry Pi when it comes out, or the open-rd (I don't like the plug computer but do like the open-rd), which have video processing and video output direct to a monitor and/or TV using composite or whatever. About $150 to $200 and it just works; run with it. You definitely don't need to run Linux on these platforms; you can make your own OS or whatever you like and run that - it's very simple.
There are more solutions than you probably have time and/or money to pursue; you need to find one that fits within your comfort or happiness zone for how you like to do development, and try that path.

Fast Audio Input/Output

Here's what I want to do:
I want to allow the user to give my program some sound data (through a mic input), then hold it for 250ms, then output it back out through the speakers.
I have done this already using Java Sound API. The problem is that it's sorta slow. It takes a minimum of about 1-2 seconds from the time the sound is made to the time the sound is heard again from the speakers, and I haven't even tried to implement delay logic yet. Theoretically there should be no delay, but there is. I understand that you have to wait for the sound card to fill up its buffer or whatever, and the sample size and sampling rate have something to do with this.
My question is this: Should I continue down the Java path trying to do this? I want to get the delay down to like 100ms if possible. Does anyone have experience using the ASIO driver with Java? Supposedly it's faster..
Also, I'm a .NET guy. Does this make sense to do with .NET instead? What about C++? I'm looking for the right technology to use here, and maybe a good example of how to read/write to audio input/output streams using your suggested technology platform. Thanks for your help!
I've used JavaSound in the past and found it wonderfully flaky (and it keeps changing between VM releases). If you like C#, use it - just use the DirectX APIs. Here's an example of doing roughly what you want using DirectSound and C#. You could use the Effects plugins to perform your 250 ms echo.
http://blogs.microsoft.co.il/blogs/tamir/archive/2008/12/25/capturing-and-streaming-sound-by-using-directsound-with-c.aspx
You may want to look into JACK, an audio API designed for low-latency sound processing. Additionally, Google turns up this nifty presentation [PDF] about using JACK with Java.
Theoretically there should be no delay, but there is.
Well, it's impossible to have zero delay. The best you can hope for is an unnoticeable delay (in terms of human perception). It might help if you describe your basic algorithm for reading & writing the sound data, so people can identify possible problems.
A potential issue with using a garbage-collected language like Java is that the GC will periodically run, interrupting your processing for some arbitrary amount of time. However, I'd be surprised if it's >100ms in normal usage. If GC is a problem, most JVMs provide alternate collection algorithms you can try.
If you choose to go down the C/C++ path, I highly recommend using PortAudio ( http://portaudio.com/ ). It works with almost everything on multiple platforms and it gives you low-level control of the sound drivers without actually having to deal with the various sound driver technology that is around.
I've used PortAudio on multiple projects, and it is a real joy to use. And the license is permissive.
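For what it's worth, here is a bare-bones sketch of the asker's mic-to-speaker case using the PortAudio callback API (illustrative only: no error handling, and it assumes the default devices accept 44.1 kHz mono float):

```c
/* Mono mic input, delayed 250 ms through a ring buffer, sent back out
 * the speakers via a full-duplex PortAudio stream. */
#include <portaudio.h>

#define RATE  44100
#define DELAY (RATE / 4)                  /* 250 ms worth of frames */

static float delay_line[DELAY];
static unsigned pos;

static int echo_cb(const void *in, void *out, unsigned long frames,
                   const PaStreamCallbackTimeInfo *t,
                   PaStreamCallbackFlags flags, void *user)
{
    const float *input = in;
    float *output = out;
    (void)t; (void)flags; (void)user;

    for (unsigned long i = 0; i < frames; i++) {
        output[i] = delay_line[pos];             /* play what arrived 250 ms ago */
        delay_line[pos] = input ? input[i] : 0.0f;
        pos = (pos + 1) % DELAY;
    }
    return paContinue;
}

int main(void)
{
    PaStream *stream;
    Pa_Initialize();
    Pa_OpenDefaultStream(&stream, 1, 1, paFloat32, RATE,
                         256,             /* small block => little added latency */
                         echo_cb, NULL);
    Pa_StartStream(stream);
    Pa_Sleep(10000);                      /* run for 10 seconds */
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
    return 0;
}
```

The total delay is then roughly the 250 ms line plus a couple of 256-frame blocks (about 6 ms each at 44.1 kHz), well under the 1-2 seconds seen with the JavaSound attempt.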
If low latency is your goal, you can't beat C.
libsoundio is a low-level C library for real-time audio input and output. It even comes with an example program that does exactly what you want - piping the microphone input to the speakers output.
It's possible with JavaSound to get end-to-end latency in the ballpark of 100-150ms.
The primary cause of latency is the buffer sizes of the capture and playback lines. The bufferSize is set when opening the lines:
capture: TargetDataLine#open(AudioFormat format, int bufferSize)
playback: SourceDataLine#open(AudioFormat format, int bufferSize)
If the buffer is too big it will cause excess latency, but if it's too small it will cause stuttery playback. So you need to find a balance between your application's needs and your computing power.
The default buffer size can be checked with DataLine#getBufferSize when calling #open(AudioFormat format). The default size will vary based on the AudioFormat and seems to be geared towards high-latency, stutter-free playback applications (e.g. internet streaming). If you're developing a low-latency application, the default buffer size is much too large and should be changed.
In my testing with a 16-bit PCM AudioFormat, a buffer size of 1024 bytes has been pretty close to ideal for low latency.
The second, and often overlooked, cause of audio latency is any other activity being done in the capture or playback threads. For example, logging messages to the console can introduce tens of milliseconds of latency. Turn it off.
