Audio reading library for Mono

I'm trying to work with raw audio data for manipulation and playback with OpenAL. So far everything works nicely, since I've written my own .wav file reader and have been working with that. However, my goal is to let people import their own music. This means my program should support various audio formats and codecs, including at least MP3, Ogg and FLAC.
Unlike .wav files, the other formats aren't as straightforward to read. I could write my own readers and/or use wrappers around existing libraries such as libsndfile and the Ogg libraries, but I'd rather not reinvent the wheel. So my question is: is there already a library for Mono that can fetch the raw audio data from various formats?
I've taken a look at NAudio, but it's highly dependent on various Win32 API calls, which is a no-go for me, as I intend my program to be cross-platform. At the moment I only care about getting the data for reading and playback; I do not intend to manipulate, mix, or do any other kind of computational work.
EDIT:
One important factor I forgot was licensing. I'd prefer an MIT or other open license that allows me to use the library for free in commercial software. The BASS.Net library, for example, is out of the question, as licensing it is beyond my budget.
EDIT2:
irrKlang does not support Mono.

After giving NAudio another try, I noticed that some of its dependencies on the Win32 API have been removed. I can now successfully load WAV files through Mono, and there are extensions available that support FLAC and Ogg. MP3 support seems to work only on Windows due to licensing issues, but that's okay.
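Part of why a hand-written WAV reader is practical: a plain PCM .wav file is a RIFF container in which a 'fmt ' chunk describes the samples and a 'data' chunk holds them uncompressed. Below is a minimal sketch in Python (used only for illustration; the thread itself is about C#/Mono), and `read_pcm_wav` is a hypothetical name. It assumes an uncompressed PCM file (format tag 1) and skips every other chunk type.

```python
import struct
from typing import BinaryIO, Tuple


def read_pcm_wav(f: BinaryIO) -> Tuple[dict, bytes]:
    """Return (format info, raw sample bytes) from a RIFF/WAVE stream.

    Sketch only: accepts uncompressed PCM (format tag 1) and ignores
    every chunk other than 'fmt ' and 'data'.
    """
    riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    fmt = None
    while True:
        header = f.read(8)
        if len(header) < 8:
            raise ValueError("no 'data' chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        body = f.read(chunk_size)
        if chunk_size % 2:  # RIFF chunks are padded to an even length
            f.read(1)
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", body[:16])
            if tag != 1:
                raise ValueError(f"unsupported format tag {tag}")
            fmt = {"channels": channels, "sample_rate": rate, "bits": bits}
        elif chunk_id == b"data":
            if fmt is None:
                raise ValueError("'data' chunk before 'fmt ' chunk")
            return fmt, body  # interleaved little-endian PCM samples
```

MP3, Ogg Vorbis and FLAC have no comparable shortcut, because their payload has to be run through a codec; that decoding step is what a library such as NAudio plus its FLAC/Ogg extensions provides.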

Related

Linux Audio record and quality comparison

I am starting a project to test the audio performance on linux.
What I need to do is play audio on our web system and check the audio quality (or at least check that there is audio output) on Linux.
I am going to record the audio on Linux with ffmpeg. Is there a better choice?
I don't know how to check automatically that what I recorded is what I played, or how to assess the quality of the recorded audio.
I think what you need is PESQ (Perceptual Evaluation of Speech Quality). However, I have not found an implementation that is open source/free and works out of the box.
You can download the recommendation from here:
http://www.itu.int/rec/T-REC-P.862-200511-I!Amd2/en
Basically this is the reference implementation of PESQ.
Sevana has an audio quality analyser called AQuA, which is not an ITU standard:
http://www.sevana.fi/aqua_wiki.php
It is available for Linux, but I think you have to pay for it.
You can also check the similarity of two audio files with cross-correlation; see:
https://dsp.stackexchange.com/questions/736/how-do-i-implement-cross-correlation-to-prove-two-audio-files-are-similar
I just learned that a lot of people use MATLAB or Octave to compute the necessary data, for example:
http://bagustris.blogspot.ie/2011/11/calculate-time-lag-from-cross.html
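As a rough illustration of the cross-correlation approach, here is a minimal brute-force sketch in Python (the name `best_lag` is hypothetical). It slides the recording against the reference signal and computes a normalized correlation score at each lag, so a recording that is just a delayed, quieter copy of the original scores close to 1.0 at the delay. It assumes both signals are already decoded to mono samples at the same sample rate and that the recording starts no earlier than the reference. Unlike PESQ, it says nothing about perceived quality.

```python
import math
from typing import Sequence, Tuple


def best_lag(reference: Sequence[float], recorded: Sequence[float],
             max_lag: int) -> Tuple[int, float]:
    """Find the delay (in samples) at which `recorded` best matches `reference`.

    Returns (lag, score): score is the normalized correlation at that lag,
    1.0 meaning the same waveform up to a positive gain, ~0 meaning unrelated.
    Only non-negative lags are searched (recording assumed to start first).
    """
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        # Overlap: reference[i] is compared with recorded[i + lag]
        n = min(len(reference), len(recorded) - lag)
        if n <= 0:
            break
        a = reference[:n]
        b = recorded[lag:lag + n]
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = dot / norm if norm else 0.0
        if score > best[1]:
            best = (lag, score)
    return best
```

For minute-long recordings you would want an FFT-based correlation, such as `scipy.signal.correlate`, instead of this O(n × max_lag) loop.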

Converting Audio From Unknown Format

I would like to create a utility in either PHP or Perl to convert an audio file created by Nortel's CallPilot voicemail system into a WAV file. The problem is that the format, which has the .vbk file extension, is unknown to virtually every audio player; to date, I have not found one that will play a .vbk file. I've looked at the audio conversion libraries on CPAN and tried many of them, but none recognize the file. I was not successful with PHP's audio manipulation functions either. Nortel does provide a converter; however, it does not suit my needs. I would like to have this run via cron on a CentOS system. I don't know how to reverse engineer this format, and there seem to be only scraps of information about it on the web. This page indicates that it is "based on the H.232 format":
https://www.odesk.com/o/jobs/job/Reverse-Engineer-Nortel-VBK-Audio-Format_~~f501f11679f3f6bb/
I know this is a very old thread, but I've recently been looking into converting Nortel's .vbk format as well. Importing the .vbk files into Audacity with the raw data option and these settings worked for me: Encoding: U-Law, Byte order: little-endian, Channels: 1 Channel (Mono), Sample rate: 8000 Hz. I'm not sure whether there are multiple formats for .vbk files, but mine were from a BCM50 phone system.
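If the .vbk payload really is headerless 8 kHz mono μ-law, as the Audacity settings above suggest, the conversion can be scripted for cron without Audacity. Below is a minimal sketch in Python (the asker wanted PHP or Perl; Python stands in here) with hypothetical function names: it decodes G.711 μ-law bytes to 16-bit linear PCM and writes a WAV file with the standard `wave` module. It does not detect or strip any file header; if a .vbk file has one, it would decode as a brief burst of noise at the start.

```python
import struct
import wave
from typing import List


def ulaw_to_pcm16(data: bytes) -> List[int]:
    """Decode G.711 mu-law bytes to signed 16-bit linear sample values."""
    samples = []
    for byte in data:
        u = ~byte & 0xFF                 # mu-law codes are stored bit-inverted
        sign = u & 0x80
        exponent = (u >> 4) & 0x07       # segment number
        mantissa = u & 0x0F              # step within the segment
        magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 = bias 132
        samples.append(-magnitude if sign else magnitude)
    return samples


def vbk_to_wav(raw: bytes, out, sample_rate: int = 8000) -> None:
    """Write headerless mono mu-law data as a 16-bit PCM WAV.

    Assumes 8 kHz mono mu-law (per the Audacity settings);
    `out` may be a file path or a binary file object.
    """
    samples = ulaw_to_pcm16(raw)
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)                # 16-bit output samples
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

A cron job could then call something like `vbk_to_wav(open("msg.vbk", "rb").read(), "msg.wav")` for each new file.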
Well, this is the joy of closed proprietary systems. But there is a chance they could play nice. Try to contact CallPilot and see if they'll give you the format specs. It's worth a shot.
As for reverse engineering, you need to be able to generate known content, like a constant 60 Hz tone lasting exactly 1 second. Then the same at 50 Hz. Then a tone lasting 10 seconds. Compare the resulting files and isolate the audio data from the metadata. There is likely to be compression involved, so try a handful of common compression schemes; researching Nortel's practices may tell you more. If you can feed the result into a player and get a tone back out, you're on your way.
There are probably more informed and structured ways to go about reverse engineering, but in my experience it's a lot of trial and error.
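A small sketch of the "generate known content, then look for it again" loop, in Python with hypothetical helper names: `sine_tone` produces the 60 Hz / 50 Hz test signals as 16-bit sample values (to be written to a WAV file and played into the voicemail system), and `estimate_freq` gives a rough pitch estimate from rising zero crossings, a quick way to check whether a candidate decoding of the .vbk data yields the tone you recorded. Zero-crossing counting only works on clean, single-tone signals like these.

```python
import math
from typing import List, Sequence


def sine_tone(freq_hz: float, seconds: float, sample_rate: int = 8000,
              amplitude: int = 16000) -> List[int]:
    """Generate a constant sine tone as signed 16-bit sample values."""
    n = int(seconds * sample_rate)
    return [int(round(amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
            for i in range(n)]


def estimate_freq(samples: Sequence[float], sample_rate: int) -> float:
    """Rough frequency estimate from rising zero crossings (pure tones only)."""
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] < 0 <= samples[i]]
    if len(crossings) < 2:
        return 0.0
    # Average spacing between consecutive rising crossings = one period
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period
```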

How do I create an mp4 file from a collection of H.264 frames and audio frames?

I have a program that captures and stores H.264 encoded video as well as audio into a proprietary format file. I need to be able to export that video and audio to an mp4 file. I prefer C# but will use C++ if necessary. Any suggestions?
To produce an MPEG-4 Part 14 (.mp4) file you need a multiplexer. There are several to choose from:
FFmpeg (libavformat)
DirectShow filters (free and open source ones from GDCL, as well as commercial ones)
Windows 7+ Media Foundation file sink
The API and complexity vary, because some multiplexers are designed to be part of a pipeline rather than completely standalone classes. You might want to check the respective samples (and perhaps the license agreements, too) to see what is best for you.
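As one concrete example of the FFmpeg route: assuming the frames have first been dumped from the proprietary file into an H.264 Annex B elementary stream (start-code delimited) and an ADTS AAC stream, the ffmpeg command-line tool can multiplex them into MP4 without re-encoding. The sketch is in Python for illustration (from C# the same argument list could be passed to a child process); the file names, the constant frame rate and the helper names are assumptions. If the audio is in another codec, the `aac_adtstoasc` filter does not apply and the audio may need transcoding.

```python
import subprocess
from typing import List


def build_mux_command(h264_path: str, aac_path: str, out_path: str,
                      fps: int = 30) -> List[str]:
    """ffmpeg arguments that remux raw H.264 + ADTS AAC into MP4 (no re-encode)."""
    return [
        "ffmpeg", "-y",
        # A raw H.264 elementary stream carries no timestamps, so state the rate
        "-f", "h264", "-framerate", str(fps), "-i", h264_path,
        "-i", aac_path,
        "-map", "0:v", "-map", "1:a",  # video from input 0, audio from input 1
        "-c", "copy",                  # copy the compressed frames as-is
        "-bsf:a", "aac_adtstoasc",     # ADTS headers -> MP4-style AAC config
        out_path,
    ]


def mux(h264_path: str, aac_path: str, out_path: str, fps: int = 30) -> None:
    """Run ffmpeg; raises CalledProcessError if it fails."""
    subprocess.run(build_mux_command(h264_path, aac_path, out_path, fps), check=True)
```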
Take a look at libmp4v2. It's fairly straightforward to use.
http://code.google.com/p/mp4v2/

Looking for an expressive audio programming language or library

I'm looking for an audio processing language or library that will let me experiment with different synthesis techniques. I've looked at Processing, which I think is great at what it does, but I haven't found any inspiring (and simple) audio libraries.
As a baseline, I want to simply create my own sample buffers and play them back (ideally in real time). As a plus, the ability to handle MIDI events would be great. I'm an experienced C++ programmer, so I could do it natively, but had hoped there was a more DSL (domain-specific language) approach.
I have access to Windows, Mac or Linux so not too bothered yet about platform. Other languages I can deal with are C#, Java & Python.
Thanks
James
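For the "create my own sample buffers" baseline, this is roughly what it looks like with no DSL at all, as a minimal Python sketch with hypothetical helper names: a MIDI note number is converted to an equal-tempered frequency (note 69 = A4 = 440 Hz), a sine oscillator with a linear attack/release envelope fills a float buffer, and the buffer is saved as a 16-bit WAV file. Real-time playback would need an audio I/O library, which the standard library does not provide, so this sketch writes a file instead.

```python
import math
import struct
import wave
from typing import List

SAMPLE_RATE = 44100


def midi_to_hz(note: int) -> float:
    """Equal-tempered pitch: MIDI note 69 is A4 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def render_note(note: int, seconds: float, attack: float = 0.01,
                release: float = 0.2) -> List[float]:
    """Fill a buffer with a sine wave shaped by a linear attack/release envelope."""
    freq = midi_to_hz(note)
    n = int(seconds * SAMPLE_RATE)
    a = max(1, int(attack * SAMPLE_RATE))
    r = max(1, int(release * SAMPLE_RATE))
    buf = []
    for i in range(n):
        env = min(1.0, i / a, (n - i) / r)  # ramp up, hold at 1.0, ramp down
        buf.append(env * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
    return buf


def write_wav(path, buf: List[float]) -> None:
    """Save a float buffer (-1.0..1.0) as 16-bit mono WAV; path or file object."""
    ints = [int(max(-1.0, min(1.0, x)) * 32767) for x in buf]
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack("<%dh" % len(ints), *ints))
```

For example, `write_wav("a4.wav", render_note(69, 1.0))` renders one second of A4.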
Depending on how much you want to stay out of the low-level housekeeping details, you may want to look at Csound, or, if you don't want to actually write code, the patching-based system Pure Data is great to work with. As @Lou points out, ChucK is interesting (but was too buggy to use the last time I checked it out).
If you really do want to write code, look at the Synthesis Toolkit, a set of C++ classes for audio processing and synthesis.
For an app framework, I recommend JUCE, which has incredibly nice cross-platform handling of audio/MIDI I/O and GUI elements.
Max/MSP is a highly expressive audio production tool.
I guess you could say it's a high-level tool rather than a low-level programming language. My impression is that it's geared towards the technical musician or the artistic engineer, but either way it kicks ass, and you could go low-level with it if you want.
I've always been a big fan of SuperCollider. It's designed for Mac OS X but also works on Linux.
The language is mostly based on Smalltalk, and it's pretty easy to pick up if you understand the basics of functional programming. The sound output by the SC Server is of very good quality, and there is plenty of documentation, both built into the app environment and available online.
One interesting point about SuperCollider is that it can be used on Android devices, and it can communicate with Python through other modules.
Here goes an example
I know you didn't say Ruby, but check out Archaeopteryx
https://github.com/gilesbowkett/archaeopteryx/wiki
or ChucK
http://chuck.cs.princeton.edu/
Have a look at NAudio, an open source .NET audio SDK for working with audio files and devices on Windows:
http://naudio.codeplex.com/
NAudio features:
Play back audio using a variety of APIs
Decompress audio from different Wave Formats
Record audio using WaveIn, WASAPI or ASIO
Read and Write standard .WAV files
Mix and manipulate audio streams using a 32-bit floating-point mixing engine
Extensive support for reading and writing MIDI files
Full MIDI event model
Basic support for Windows Mixer APIs
A collection of useful Windows Forms Controls
Some basic audio effects, including a compressor

Best Voice Compression Algorithms/Formats

We have some raw voice audio that we need to distribute over the internet. We need decent quality, but it doesn't need to be of musical quality. Our main concerns are usability by the consumer (i.e., what and where they can play it) and the size of the download. My experience has shown that MP3s do not produce the best compression numbers for voice audio, but I am at a loss as to what the best alternatives are. Ultimately we would like to automate the conversion process so that consumers can choose the quality vs. size level they want.
You should give Opus a try. Example compression command line:
ffmpeg -i x.wav -b:a 32k x.opus
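To automate the conversion so consumers can pick a quality/size trade-off, the same ffmpeg invocation can be run once per preset. Below is a minimal Python sketch: the preset bitrates are illustrative guesses to tune by listening, the helper names are hypothetical, and it assumes an ffmpeg build with libopus. `-application voip` asks the libopus encoder to favour speech intelligibility.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# Hypothetical presets for spoken word; adjust after listening tests.
PRESETS: Dict[str, str] = {"low": "16k", "medium": "24k", "high": "32k"}


def opus_command(src: Path, bitrate: str, dst: Path) -> List[str]:
    """ffmpeg arguments encoding `src` to Opus at `bitrate`, tuned for speech."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-c:a", "libopus", "-b:a", bitrate,
            "-application", "voip",  # libopus speech-oriented mode
            str(dst)]


def encode_all(src: Path) -> List[Path]:
    """Produce one file per preset, e.g. talk.low.opus, talk.medium.opus, ..."""
    outputs = []
    for name, bitrate in PRESETS.items():
        dst = src.with_name(f"{src.stem}.{name}.opus")
        subprocess.run(opus_command(src, bitrate, dst), check=True)
        outputs.append(dst)
    return outputs
```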
Start here.
As you rightly point out, voice compression is different from general audio compression. You'll find many codecs dedicated to telephony applications, ranging from PCM and ADPCM through later packet-based encodings such as CELP, used on GSM cellular networks.
Still, VoIP voice encoding is slightly different from that due to the medium used. You can find a good, free (patent-unencumbered and open source, BSD-licensed) library for speech encoding/decoding in the Speex software library.
Again, which you choose depends on the speech you're encoding and the medium it's being transmitted over. Also note that many libraries have several algorithms they can use depending on the circumstances, and some will even switch on the fly based on conditions of the sound and network.
To get more help, narrow your question down.
-Adam
The most frequently used compression formats for live voice audio (like VoIP telephony) are μ-law (mu-law/u-law, used in the US) and A-law (used in Europe, etc.). Unlike uncompressed 16-bit PCM, they store each sample in only 8 bits using logarithmic companding, and they are normally used at an 8 kHz sampling rate, which limits the frequency range to the band needed for speech and so requires less space to store.
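The continuous μ-law compression characteristic shows where the saving comes from: an input amplitude x, normalized to [-1, 1], is mapped logarithmically, so quiet sounds get proportionally more of the 256 available 8-bit codes than loud ones. G.711 uses μ = 255 and implements a piecewise-linear (segmented) approximation of this curve rather than the exact formula.

```latex
F(x) = \operatorname{sgn}(x)\,\frac{\ln\left(1 + \mu\,\lvert x \rvert\right)}{\ln(1 + \mu)},
\qquad -1 \le x \le 1, \quad \mu = 255
```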
For usability's sake, it is easiest to use MPEG compression (MP2/MP3/MP4) for streaming to standard media players: the algorithms are readily available and typically quite fast, and almost all media players support them. For voice, though, try specifying a lower bitrate, or do your conversion from a lower-quality file in the first place (WAV files can use various sampling rates, and voice requires a much lower sampling rate than music or effects; the sampling rate is loosely analogous to frames per second in video). Alternatively, you can use RealMedia, WMA or other proprietary formats, but this would limit usability, since users would need specific third-party software for playback. That said, WMA has an excellent compression ratio as well as compression options specific to voice audio.
Assuming your users will be running Windows, there is a WMA speech compression codec that you can use with the Windows Media Encoder SDK. Failing that, you can use ACM to apply something like G.723/G.728, ADPCM, mu-law or A-law, some of which are installed as standard on Windows XP and above. These can be packaged inside WAV files. You'll need to experiment a little to find the right bitrate/quality (probably don't bother with mu-law or A-law). With voice data you can get away with quite low sample rates, e.g. 16000 or 8000 Hz, as there isn't much above 4 kHz in the human speaking voice.
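The numbers behind the sample-rate advice can be checked with simple arithmetic: by the Nyquist criterion a sample rate can represent frequencies up to half its value, so 8000 Hz covers the roughly 4 kHz voice band, and the bit rate of audio with fixed-size samples is sample rate × bits per sample × channels. A small Python sketch (helper names hypothetical):

```python
def pcm_bitrate(sample_rate: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bits per second for audio stored as fixed-size samples (PCM, mu-law, A-law)."""
    return sample_rate * bits_per_sample * channels


def size_per_minute_bytes(bitrate_bps: int) -> int:
    """Storage needed for one minute at the given bit rate."""
    return bitrate_bps * 60 // 8


def max_frequency(sample_rate: int) -> float:
    """Nyquist limit: the highest frequency a sample rate can represent."""
    return sample_rate / 2
```

For example, CD-quality stereo is 1,411,200 bit/s (about 10.6 MB per minute), 16-bit mono at 8000 Hz is 128,000 bit/s, and 8-bit mu-law or A-law at 8000 Hz is 64,000 bit/s (480,000 bytes per minute).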
I think AMR is one of the best speech codecs. I was using it about a year ago, and I remember that the quality was very good and the files were rather small.
One drawback, especially in your case, is that, as far as I know, it isn't supported by a wide range of media players. QuickTime and RealPlayer are two that I know can play .amr files.
Try Speex: it's unencumbered by patents and performs well both size-wise and CPU-wise. I've been having good luck using it on iPhone.

Resources