Is there a specification that specifically describes the AAC-LC standard, and is it a realistic goal to write a codec, not a general one, but one for a specific AAC-LC format with a predefined number of channels and sample rate?
Are there some existing projects for AAC-LC specifically?
[EDIT]
I found a first project, written in MATLAB, that looks promising:
http://www.mathworks.com/matlabcentral/fileexchange/28028-mpeg-4-aac-lc-decoder
It demuxes MP4 directly and decodes AAC-LC without any fluff: just thousands of hardcoded values and box definitions. Maybe it was built specifically to test common iPhone/Windows Phone MP4s.
AAC (Advanced Audio Coding), including the LC (Low Complexity) profile, was originally specified in ISO/IEC 13818-7 (MPEG-2 Part 7). It was later updated by ISO/IEC 14496-3 (MPEG-4 Part 3); subpart 4 covers AAC specifically, and subpart 1 (Main) is likely to also be helpful.
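On the "predefined channels and sample rate" part of the question: a decoder limited to one configuration mostly needs to reject everything else up front. If the AAC-LC frames arrive as an ADTS stream (raw .aac files; inside an .mp4 the same parameters are carried in the AudioSpecificConfig instead), the 7-byte ADTS header defined in those specs already says whether a frame is LC and at which sample rate and channel configuration. Below is a minimal Python sketch of that check; the function names are my own, and `build_adts_header` exists only to produce test input.

```python
from dataclasses import dataclass

# Sampling-frequency-index table from the AAC spec (indices 13-15 are reserved).
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]

# The 2-bit ADTS profile field stores (audio object type - 1); AAC-LC is object type 2.
PROFILE_AAC_LC = 1


@dataclass
class AdtsHeader:
    profile: int
    sample_rate: int
    channel_config: int  # 0 means "defined in a program config element"
    frame_length: int    # bytes, including this header
    has_crc: bool


def parse_adts_header(data: bytes) -> AdtsHeader:
    """Parse the 7-byte fixed + variable ADTS header at the start of `data`."""
    if len(data) < 7:
        raise ValueError("need at least 7 bytes")
    v = int.from_bytes(data[:7], "big")  # 56 bits, most significant bit first
    if v >> 44 != 0xFFF:
        raise ValueError("no ADTS syncword")
    sf_index = (v >> 34) & 0xF
    if sf_index >= len(SAMPLE_RATES):
        raise ValueError(f"reserved sampling frequency index {sf_index}")
    return AdtsHeader(
        profile=(v >> 38) & 0x3,
        sample_rate=SAMPLE_RATES[sf_index],
        channel_config=(v >> 30) & 0x7,
        frame_length=(v >> 13) & 0x1FFF,
        has_crc=((v >> 40) & 0x1) == 0,  # protection_absent == 0 means a CRC follows
    )


def matches_fixed_format(h: AdtsHeader, sample_rate: int, channel_config: int) -> bool:
    """True if the frame is AAC-LC with exactly the configuration we support."""
    return (h.profile == PROFILE_AAC_LC and h.sample_rate == sample_rate
            and h.channel_config == channel_config)


def build_adts_header(frame_length: int, sample_rate: int, channel_config: int) -> bytes:
    """Build a CRC-less AAC-LC ADTS header (ID=0, layer=0), e.g. for tests."""
    v = 0xFFF << 44                              # syncword
    v |= 1 << 40                                 # protection_absent = 1
    v |= PROFILE_AAC_LC << 38
    v |= SAMPLE_RATES.index(sample_rate) << 34
    v |= channel_config << 30
    v |= frame_length << 13
    v |= 0x7FF << 2                              # buffer fullness 0x7FF = VBR
    return v.to_bytes(7, "big")
```

For example, `matches_fixed_format(parse_adts_header(frame), 44100, 2)` would accept only stereo 44.1 kHz AAC-LC frames. The decoding itself (noiseless decoding of the spectral data, inverse quantization, stereo processing, TNS and the IMDCT filter bank) is where most of the work lies; this only covers the configuration check.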
General information can be found on the Wikipedia page.
There are a few existing open source implementations. Currently one of the better quality ones is the Fraunhofer FDK AAC Codec Library for Android; it was released as open source as part of Android but is not Android-specific. Its primary disadvantage is that its license is considered to be incompatible with the GPL. Some other open source implementations are listed in FFmpeg's AAC Encoding Guide, including FFmpeg's native AAC encoder, which is in development.
IBM advises using the Opus format for audio submitted to its Watson Speech to Text service, the idea being that Opus is designed specifically for speech.
It also says you will get better transcription quality when submitting audio in FLAC format than in MP3 format. The latter has the obvious advantage of its small size; there is, after all, a 100 MB limit for file submissions. So you weigh the balance of your needs. That all makes sense so far.
But looking at conversions done on a source WAV file, the Opus file size is comparable to the MP3's.
Downsampling a 366 MB WAV file to an 8 kHz sample rate (one of the two sample rates advised for the service) produced a WAV file of 66.4 MB. Converting that to FLAC, MP3 and Opus produced FLAC: 43.6 MB; MP3: 6.2 MB; Opus: 9.8 MB.
So is Opus really the best choice for getting the most accurate transcription? And how can that be when it is so small compared to FLAC?
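To put those sizes in perspective, they can be converted into average bitrates. A small Python sketch, assuming the 8 kHz WAV is 16-bit mono PCM and that "Mb" means 10^6 bytes (neither is stated in the question): the recording then lasts about 4,150 s (roughly 69 minutes), and the files work out to about 128 kbit/s (WAV), 84 kbit/s (FLAC), 12 kbit/s (MP3) and 19 kbit/s (Opus). FLAC is lossless, so it can only shrink the PCM by about a third here, while the lossy codecs discard information and end up roughly 7 to 10 times smaller than the PCM.

```python
# Assumptions (not stated in the question): the 8 kHz WAV is 16-bit mono PCM,
# and "Mb" means 10**6 bytes.
PCM_BYTES_PER_SECOND = 8000 * 2 * 1  # sample rate * bytes per sample * channels


def duration_seconds(wav_bytes: float) -> float:
    """Playing time of a headerless-size estimate of the PCM WAV."""
    return wav_bytes / PCM_BYTES_PER_SECOND


def average_kbps(file_bytes: float, seconds: float) -> float:
    """Average bitrate in kbit/s for a file of the given size and duration."""
    return file_bytes * 8 / seconds / 1000


seconds = duration_seconds(66.4e6)
for name, size_mb in [("wav", 66.4), ("flac", 43.6), ("mp3", 6.2), ("opus", 9.8)]:
    print(f"{name:5s} {average_kbps(size_mb * 1e6, seconds):6.1f} kbit/s")
```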
Opus is designed to encode speech efficiently. The details are explained in the linked wiki article, but to give you the gist: the fundamental frequency of the human voice is rather limited, roughly 80 to 260 Hz, and most of what makes speech intelligible lies below about 4 kHz. Our hearing range, on the other hand, is far greater, up to about 20,000 Hz. Whereas music encoders (like MP3) have to work across roughly our whole hearing range, voice-specialised encoders (like Opus in its speech mode) can focus on what matters for the human voice, with little interest in what lies significantly above the vocal range. I hope that provides some intuition for why Opus is so efficient.
Is it the best? That's somewhat a matter of opinion, but yes, I think it's among the best choices out there. To quote Wikipedia, Opus replaces both Vorbis and Speex for new applications, and several blind listening tests have ranked it higher-quality than any other standard audio format at any given bitrate.
I have a program that captures and stores H.264 encoded video as well as audio into a proprietary format file. I need to be able to export that video and audio to an mp4 file. I prefer C# but will use C++ if necessary. Any suggestions?
To produce an MPEG-4 Part 14 (.mp4) file you need a multiplexer. There are several to choose from:
FFmpeg (libavformat)
DirectShow filters (free and open source from GDCL, commercial)
Windows 7+ Media Foundation file sink
The API and complexity vary, because some of these multiplexers are designed to be part of a pipeline rather than completely standalone classes. You might want to check the respective samples (and perhaps the license agreements too) to see what suits you best.
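Whichever multiplexer you pick, its output is a sequence of ISO base media file format boxes (ftyp, moov, mdat, ...), each framed by a 32-bit big-endian size and a four-character type. The Python sketch below only shows that framing, writing and walking boxes; the helper names and the default brand list are my own example choices. A real muxer additionally has to build the moov sample tables (stts, stsz, stco, ...) and the avcC/esds decoder configuration for the H.264 and audio tracks, which is exactly why using one of the libraries above is worthwhile.

```python
import struct
from typing import Iterator, Optional, Tuple


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one box: 32-bit big-endian size (header included) + 4-char type."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly 4 bytes")
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def make_ftyp(major: bytes = b"isom", minor_version: int = 512,
              compatible: Tuple[bytes, ...] = (b"isom", b"iso2", b"avc1", b"mp41")) -> bytes:
    """File-type box: major brand, minor version, then compatible brands."""
    payload = struct.pack(">4sI", major, minor_version) + b"".join(compatible)
    return make_box(b"ftyp", payload)


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (type, payload_start, payload_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:                  # 64-bit "largesize" follows the type
            size, = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:                # box extends to the end of the data
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"bad box size {size} at offset {offset}")
        yield box_type, offset + header, offset + size
        offset += size
```

Container boxes such as moov can be walked the same way by calling `iter_boxes(data, payload_start, payload_end)` on their payload range.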
Take a look at libmp4v2. It's fairly straightforward to use.
http://code.google.com/p/mp4v2/
I am looking for a comprehensive API in Java that can convert audio across various formats and bitrates.
For example
WAV (6 kHz to 48 kHz) L16/audio ---TO--- WAV (RIFF header) 8 kHz 8-bit mono A-law/U-law
AIFF (6 kHz to 48 kHz) L16/audio ---TO--- WAV (RIFF header) 8 kHz 8-bit mono A-law/U-law
and other voice audio formats.
Suggestions for other, similar Java audio conversion libraries are also welcome.
I was able to solve this problem by using Tritonus, an open-source implementation of the Java Sound API, and its wide range of sound converter plugins.
Specifically, the Tritonus miscellaneous plugins were very useful in my context.
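For reference, the target format in the question is a non-PCM WAV: the fmt chunk carries format tag 7 (μ-law) or 6 (A-law), 8 bits per sample, and, because it is not plain PCM, an 18-byte fmt body with cbSize plus a fact chunk. Tritonus writes this for you in Java; the Python sketch below (function name my own) only illustrates the container layout, assuming the samples are already resampled to 8 kHz and G.711-encoded.

```python
import struct

WAVE_FORMAT_ALAW = 0x0006
WAVE_FORMAT_MULAW = 0x0007


def g711_wav_bytes(encoded: bytes, sample_rate: int = 8000,
                   fmt_tag: int = WAVE_FORMAT_MULAW) -> bytes:
    """Wrap already G.711-encoded mono 8-bit samples in a RIFF/WAVE container."""
    channels, bits = 1, 8
    block_align = channels * bits // 8
    fmt = struct.pack("<HHIIHHH", fmt_tag, channels, sample_rate,
                      sample_rate * block_align, block_align, bits, 0)  # cbSize = 0
    fact = struct.pack("<I", len(encoded))            # sample frames per channel
    pad = b"\x00" if len(encoded) % 2 else b""         # chunks are word-aligned
    chunks = (b"fmt " + struct.pack("<I", len(fmt)) + fmt
              + b"fact" + struct.pack("<I", len(fact)) + fact
              + b"data" + struct.pack("<I", len(encoded)) + encoded + pad)
    # RIFF size counts everything after the 8-byte RIFF header, starting with "WAVE".
    return b"RIFF" + struct.pack("<I", 4 + len(chunks)) + b"WAVE" + chunks
```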
I'm looking for an AAC encoder/decoder library that works on Linux and Windows (for a C/C++ app). This is for a commercial product, so libFAAC is not an option. I've looked at the ones from Nero and MainConcept, but I'd prefer something with an LGPL license or similar that doesn't require license fees.
You may want to consider Android's Stagefright, though it will probably take some work to adapt it into a general-purpose library. It is available under the Apache 2.0 license: https://android.googlesource.com/platform/frameworks/base/+/froyo-release/media/libstagefright/codecs
The 3GPP 26.410 AAC reference code is of very high quality for reference code, though the package doesn't mention any specific licensing terms: http://www.3gpp.org/ftp/Specs/html-info/26410.htm
FFmpeg has a very fast LGPL AAC decoder and an experimental LGPL AAC encoder. The decoder is great but the encoder really sucks. http://git.ffmpeg.org/?p=ffmpeg;a=tree;f=libavcodec
FAAD2 is a great AAC decoder: http://www.audiocoding.com/faad2.html
On the encoder side, no good LGPL implementation is available; you'd have to go with a commercial encoder.
We have some raw voice audio that we need to distribute over the internet. We need decent quality, but it doesn't need to be of musical quality. Our main concern is usability by the consumer (i.e. what and where they can play it) and size of the download. My experience has shown that mp3s do not produce the best compression numbers for voice audio, but I am at a loss for what the best alternatives are. Ultimately we would like to automate the conversion process to allow the consumer to choose the quality vs. size level that they would like.
You should give Opus a try. Example compression command line:
ffmpeg -i x.wav -b:a 32k x.opus
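Since the question also asks about automating the conversion so consumers can pick a quality/size level, the same command can be generated for several bitrates. A minimal Python sketch; the helper names and the 16/24/32 kbit/s ladder are only example choices, `-ac 1` downmixes to mono, and `-c:a libopus` assumes an ffmpeg build with libopus enabled:

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def opus_command(src: str, kbps: int) -> List[str]:
    """ffmpeg argument list for one mono Opus rendition at the given bitrate."""
    out = str(Path(src).with_suffix("")) + f"_{kbps}k.opus"
    return ["ffmpeg", "-y", "-i", src, "-c:a", "libopus",
            "-b:a", f"{kbps}k", "-ac", "1", out]


def encode_renditions(src: str, bitrates: Sequence[int] = (16, 24, 32),
                      run: bool = False) -> List[List[str]]:
    """Build (and optionally execute) one ffmpeg command per bitrate."""
    cmds = [opus_command(src, b) for b in bitrates]
    if run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # raises if ffmpeg fails
    return cmds
```

Calling `encode_renditions("talk.wav", run=True)` would then write talk_16k.opus, talk_24k.opus and talk_32k.opus next to the source file.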
Start here.
As you rightly point out, voice compression is different from general audio compression. You'll find many codecs dedicated to telephony applications, ranging from PCM and ADPCM through later packet based encodings such as CELP used on GSM cellular networks.
Still, VoIP voice encoding is slightly different from that because of the medium used. You can find a good, free (patent-unencumbered and BSD-licensed open source) library for speech encoding/decoding in the Speex software library.
Again, which you choose depends on the speech you're encoding and the medium it's being transmitted over. Also note that many libraries have several algorithms they can use depending on the circumstances, and some will even switch on the fly based on conditions of the sound and network.
To get more help, narrow your question down.
-Adam
The most frequently used compression formats in live voice audio (like VoIP telephony) are μ-law (also written mu-law or u-law, used in the US) and A-law (used in Europe, etc.). Unlike uncompressed linear PCM, they store each sample in 8 bits using logarithmic companding, which spends finer quantization steps on quiet sounds and coarser ones on loud sounds, so they need less space to store. In telephony they are paired with an 8 kHz sample rate, which limits the frequency range to below 4 kHz.
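To make the companding concrete, here is a Python sketch of the classic G.711 μ-law mapping, in the bias-and-segment form used by many public-domain implementations (the function names are my own). A 16-bit sample becomes a sign bit, a 3-bit segment (exponent) and a 4-bit step within that segment, so the step size doubles from one segment to the next: near zero the error is at most 4, at full scale at most 512. A-law follows a similar segmented scheme with different constants and is not shown.

```python
BIAS = 0x84   # 132, added to the magnitude before finding the segment
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as an 8-bit G.711 mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(-sample if sample < 0 else sample, CLIP) + BIAS
    exponent = (magnitude >> 7).bit_length() - 1       # segment 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F    # 4 bits within the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # code bits are inverted


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit mu-law code back to a signed 16-bit sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Reconstruct at the middle of the quantization interval, then remove BIAS.
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude
```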
For usability's sake, it is easiest to use MPEG formats (MP2/MP3/MP4) for streaming to standard media players: the algorithms are readily available and typically quite fast, and almost all media players support them. For voice, though, you might specify a lower bitrate, or do your conversion from a lower-quality file in the first place (WAV can use several sampling rates, and voice requires a much lower sampling rate than music or effects; the sampling rate is basically like frames per second in video). Alternatively, you can use RealMedia, WMA or other proprietary formats, but this would limit usability, since users would need specific third-party software for playback; that said, WMA has an excellent compression ratio as well as compression options specific to voice audio.
Assuming your users will be running Windows, there is a WMA speech codec that you can use with the Windows Media Encoder SDK. Failing that, you can use ACM to use something like G.723/G.728, ADPCM, mu-law or A-law, some of which are installed as standard on Windows XP and above. These can be packaged inside WAV files. You'll need to experiment a little to find the right bitrate/quality trade-off (probably don't bother with mu-law or A-law). With voice data you can get away with quite low sample rates, e.g. 16000 or 8000 Hz, as there isn't much above 4 kHz in the human speaking voice.
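One caveat when lowering the sample rate yourself: before dropping from, say, 16 kHz to 8 kHz, anything above the new 4 kHz Nyquist limit must be filtered out, or it aliases back into the voice band. A minimal Python sketch of that decimation step using a Hamming-windowed sinc low-pass FIR; the 63-tap length and the cutoff at 90% of the new Nyquist frequency are arbitrary example values, and real tools such as SoX or FFmpeg use more carefully designed and far more efficient resamplers.

```python
import math
from typing import List, Sequence


def lowpass_taps(cutoff_hz: float, sample_rate: float, num_taps: int = 63) -> List[float]:
    """Hamming-windowed sinc low-pass FIR, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate            # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def decimate(samples: Sequence[float], factor: int, sample_rate: float,
             num_taps: int = 63) -> List[float]:
    """Low-pass below the new Nyquist frequency, then keep every `factor`-th sample."""
    new_nyquist = sample_rate / factor / 2
    taps = lowpass_taps(0.9 * new_nyquist, sample_rate, num_taps)
    out = []
    for i in range(0, len(samples), factor):   # only compute the samples we keep
        acc = 0.0
        for k, t in enumerate(taps):
            if 0 <= i - k < len(samples):
                acc += t * samples[i - k]
        out.append(acc)
    return out
```

With a 16 kHz input, `decimate(samples, 2, 16000)` keeps a 1 kHz tone essentially unchanged, while a 6 kHz tone (which would otherwise alias to 2 kHz) is almost completely removed.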
I think AMR is one of the best speech codecs. I was using it about a year ago, and I remember that the quality was very good and the file sizes were rather small.
One drawback, especially in your case, is that as far as I know it isn't supported by a wide range of media players. QuickTime and RealPlayer are two that I know can play .amr files.
Try Speex: it's unencumbered by patents and performs well both size-wise and CPU-wise. I've been having good luck using it on the iPhone.