Best Voice Compression Algorithms/Formats - audio

We have some raw voice audio that we need to distribute over the internet. We need decent quality, but it doesn't need to be of musical quality. Our main concern is usability by the consumer (i.e. what and where they can play it) and size of the download. My experience has shown that mp3s do not produce the best compression numbers for voice audio, but I am at a loss for what the best alternatives are. Ultimately we would like to automate the conversion process to allow the consumer to choose the quality vs. size level that they would like.

You should give Opus a try. Example compression command line:
ffmpeg -i x.wav -b:a 32k x.opus
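The question asks to automate the conversion so consumers can pick a quality/size level. A minimal Python sketch could wrap that same ffmpeg command and produce several variants. The preset names and bitrates below are hypothetical placeholders. The sketch assumes ffmpeg is on the PATH and was built with an Opus encoder.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Hypothetical quality presets (Opus bitrates in kbit/s); tune them by listening.
PRESETS = {"low": 16, "medium": 24, "high": 32}


def opus_command(src: str, preset: str) -> list[str]:
    """Build the ffmpeg argument list that encodes `src` to Opus at a preset bitrate."""
    kbps = PRESETS[preset]
    dst = Path(src).with_name(f"{Path(src).stem}-{preset}.opus")
    # Same flags as the one-liner above, plus -y to overwrite an existing output file.
    return ["ffmpeg", "-y", "-i", src, "-b:a", f"{kbps}k", str(dst)]


def encode_all(src: str) -> None:
    """Encode every preset directly from the original source (never from another lossy copy)."""
    for preset in PRESETS:
        subprocess.run(opus_command(src, preset), check=True)
```

Calling `encode_all("interview.wav")` would write `interview-low.opus`, `interview-medium.opus` and `interview-high.opus` next to the source. Each variant is encoded from the original file rather than from another lossy copy, which avoids compounding losses.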

Start here.
As you rightly point out, voice compression is different from general audio compression. You'll find many codecs dedicated to telephony applications. They range from PCM and ADPCM through later frame-based encodings such as the CELP family, variants of which are used on GSM cellular networks.
Still, VoIP voice encoding is slightly different from that because of the medium used. You can find a good, free speech codec in the Speex library: it is unencumbered by patents and open source under a BSD-style license.
Again, which you choose depends on the speech you're encoding and the medium it's being transmitted over. Also note that many libraries have several algorithms they can use depending on the circumstances, and some will even switch on the fly based on conditions of the sound and network.
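Here is what "CELP" means in practice. Codecs in that family model the vocal tract with a short-term linear predictor and then code the prediction error (the excitation) using a codebook. The Python sketch below shows only the linear-prediction step. It computes predictor coefficients with the standard autocorrelation method and the Levinson-Durbin recursion. It is an illustration, not code from Speex or any GSM codec, and it leaves out the codebook search entirely.

```python
from __future__ import annotations


def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """r[lag] = sum of x[n] * x[n + lag] for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """Find a[1..order] so that x[n] is predicted as sum(a[k] * x[n - k]).

    Returns the coefficients and the remaining prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # signal is perfectly predicted (or silent); higher orders add nothing
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def residual(x: list[float], coeffs: list[float]) -> list[float]:
    """Prediction error e[n] = x[n] - sum(a[k] * x[n - k]): the part CELP approximates from a codebook."""
    p = len(coeffs)
    return [
        x[n] - sum(coeffs[k] * x[n - k - 1] for k in range(p) if n - k - 1 >= 0)
        for n in range(len(x))
    ]
```

For real speech, coefficients are computed per short frame (typically 20 ms). An order of about 10 at 8 kHz is a common choice. The coefficients plus a compact description of the residual take far less space than the raw samples.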
To get more help, narrow your question down.
-Adam

The most frequently used compression formats in live voice audio (like VoIP telephony) are μ-law and A-law. μ-law (also written mu-law or u-law) is used in North America and Japan; A-law is used in Europe and most other countries. Both are defined in ITU-T G.711. Unlike uncompressed 16-bit PCM, they store each sample in only 8 bits using logarithmic companding. They are normally used at an 8 kHz sample rate, so they cover only the telephone voice band and need far less space (64 kbit/s).
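The sketch below gives a rough illustration of the companding idea. It applies the continuous μ-law curve (μ = 255) and quantizes the result to 8 bits. This is a simplification: real G.711 codecs use a piecewise-linear segment approximation of this curve, so these byte values will not match a real G.711 encoder.

```python
import math

MU = 255.0


def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto the logarithmic mu-law curve (result also in [-1, 1])."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign((math.pow(1.0 + MU, abs(y)) - 1.0) / MU, y)


def encode_8bit(x: float) -> int:
    """Clamp, compand, then quantize to one of 256 levels (0..255)."""
    y = mulaw_compress(max(-1.0, min(1.0, x)))
    return int(round((y + 1.0) / 2.0 * 255))


def decode_8bit(code: int) -> float:
    """Map a level back to [-1, 1] and undo the companding."""
    return mulaw_expand(code / 255 * 2.0 - 1.0)
```

The quantization step grows with amplitude, so quiet and loud samples come back with a similar relative error (a few percent here). This is why 8 companded bits can stand in for roughly 13 to 14 bits of linear PCM in telephony.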
For usability's sake, it is easiest to use MPEG formats (MP2, MP3 or MPEG-4 AAC) for streaming to standard media players. The encoders are readily available and typically quite fast, and almost all media players support them. For voice, you might try specifying a lower bitrate, or converting from a lower-sample-rate file in the first place. WAV files can use several sampling rates, and voice needs a much lower sampling rate than music or effects; the sampling rate is to audio roughly what frames per second are to video. Alternatively, you can use RealMedia, WMA or other proprietary formats. This would limit usability, since users would need specific third-party software for playback. WMA does, however, have an excellent compression ratio as well as compression options specific to voice audio.

Assuming your users will be running Windows, there is a WMA speech compression codec that you can use with the Windows Media Encoder SDK. Failing that, you can use ACM to encode with something like G.723/G.728, ADPCM, μ-law or A-law, some of which are installed as standard on Windows XP and above. These can be packaged inside WAV files. You'll need to experiment a little to find the right bitrate/quality trade-off (probably don't bother with μ-law or A-law). With voice data you can get away with quite low sample rates, e.g. 16000 or 8000 Hz. A sample rate can represent frequencies up to half its value, and there isn't much above 4 kHz in the human spoken voice.
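The sample-rate advice comes down to simple arithmetic. Uncompressed PCM costs sample rate × bits per sample × channels, and a given sample rate can only represent frequencies up to half its value (the Nyquist limit). Here is a small Python sketch of both; the example values are chosen only for illustration.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def highest_frequency_hz(sample_rate_hz: int) -> float:
    """Nyquist limit: the highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def size_bytes(bitrate_bps: int, seconds: float) -> float:
    """Storage needed for `seconds` of audio at a constant bit rate."""
    return bitrate_bps * seconds / 8
```

With these helpers, 8 kHz 16-bit mono works out to 128 kbit/s. That is about 11 times less than CD-quality stereo at 1,411.2 kbit/s, while still covering everything up to 4 kHz.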

I think AMR is one of the best speech codecs. I was using it about a year ago, and I remember that the quality was very good and the files were rather small.
One drawback, especially in your case, is that as far as I know it isn't supported by a wide range of media players. QuickTime and RealPlayer are two players I know of that can play .amr files.

Try Speex: it is unencumbered by patents and performs well both in file size and in CPU use. I've been having good luck using it on the iPhone.

Related

Which is the best sound format for IBM Speech to Text?

IBM advises using the Opus format for audio submitted to its Watson Speech to Text service, the idea being that Opus is designed specifically for speech.
Otherwise, it says you will get better transcription quality when submitting audio in FLAC format than in MP3 format. MP3 has the obvious advantage of smaller files, and there is, after all, a 100 MB limit for file submissions. So you weigh up your needs. That all makes sense so far.
But looking at conversions of a source WAV file, the Opus file size is comparable to that of the MP3.
Downsampling a 366 MB WAV file to an 8 kHz sample rate (one of the two sample rates advised for the service) created a WAV file of 66.4 MB. Converting that to FLAC, MP3 and Opus produced: FLAC 43.6 MB; MP3 6.2 MB; Opus 9.8 MB.
So is opus really the best choice for getting the most accurate transcription? And how can that be when it is so small compared to flac?
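One way to frame these numbers is as compression ratios against the 8 kHz WAV. FLAC is lossless, so it can only squeeze out redundancy (about 1.5:1 here), whereas MP3 and Opus are lossy and discard information. The sketch below uses only the sizes quoted in the question. The 128 kbit/s figure used to estimate average bitrates assumes the WAV is 16-bit mono, which the question does not state.

```python
from __future__ import annotations

# File sizes in MB as quoted in the question, all made from the same 8 kHz WAV.
SIZES_MB = {"wav": 66.4, "flac": 43.6, "mp3": 6.2, "opus": 9.8}

# Assumption (not stated in the question): the WAV is 16-bit mono, i.e. 8000 * 16 = 128 kbit/s.
ASSUMED_WAV_KBPS = 128.0


def compression_ratios(sizes: dict[str, float], reference: str = "wav") -> dict[str, float]:
    """How many times smaller each file is than the reference, to one decimal place."""
    ref = sizes[reference]
    return {name: round(ref / size, 1) for name, size in sizes.items()}


def implied_kbps(sizes: dict[str, float], reference: str = "wav",
                 reference_kbps: float = ASSUMED_WAV_KBPS) -> dict[str, float]:
    """Average bitrate each file must have, given the reference file's bitrate."""
    ref = sizes[reference]
    return {name: round(reference_kbps * size / ref, 1) for name, size in sizes.items()}
```

Under that assumption, the MP3 averages about 12 kbit/s and the Opus file about 19 kbit/s. Incidentally, 366 / 66.4 ≈ 5.51, which is consistent with resampling from a 44.1 kHz source (44,100 / 8,000 ≈ 5.51).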
Opus is designed to encode speech efficiently. The details are explained in the linked wiki article, but here is the gist. The fundamental frequency of the human voice is rather limited, roughly 80 to 260 Hz, and most of what makes speech intelligible lies below a few kHz. Our hearing range, on the other hand, is far greater, up to about 20,000 Hz. Music encoders (like MP3) have to work across roughly our whole hearing range. Opus, by contrast, has a speech-oriented mode (its SILK layer, based on linear prediction) that can focus on what matters for encoding the human voice efficiently, with little interest in what lies well above the speech band. I hope that provides some intuition for why Opus is so efficient.
Is it the best? That's somewhat a matter of opinion, but yes, I think it's among the best choices out there. To quote Wikipedia, Opus replaces both Vorbis and Speex for new applications, and several blind listening tests have ranked it higher-quality than any other standard audio format at any given bitrate.

Open field usage of Google Resonance Audio SDK

Is there a scenario where we can use the Google Resonance Audio SDK not with headphones but with real speakers (e.g. mounted in a 360° circle)?
Or do none of the algorithms work with real speaker output?
Thank you!
Currently, Resonance Audio is optimized for headphone playback. For example, HRTF processing is done in the Ambisonics domain, without generating (virtual) speaker signals - this is because it is a much more efficient way of generating binaural output.
However, in the Resonance Audio open source release, the Ambisonic Codec class can readily be used to decode Ambisonics to any arbitrary loudspeaker array. To use that with the rest of the Resonance Audio system, however, it would be necessary to modify/extend the audio processing graph by adding a new decoder node.
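For the 360° speaker ring in the question, here is an illustration of what decoding Ambisonics to loudspeakers involves. The Python sketch below encodes a source into horizontal first-order B-format and decodes it to N evenly spaced speakers with a basic mode-matching decoder. It is not Resonance Audio's Ambisonic Codec class. The W = 1 normalization is an assumption, since the FuMa, SN3D and N3D conventions scale the channels differently. Real installations often use max-rE or in-phase decoders instead.

```python
from __future__ import annotations

import math


def encode_2d(azimuth_rad: float) -> tuple[float, float, float]:
    """Horizontal first-order B-format (W, X, Y) for a unit-gain source, assuming W = 1."""
    return 1.0, math.cos(azimuth_rad), math.sin(azimuth_rad)


def decode_regular_ring(bformat: tuple[float, float, float],
                        speaker_azimuths_rad: list[float]) -> list[float]:
    """Basic (mode-matching) decoder for N >= 3 loudspeakers evenly spaced on a circle.

    For a regular ring, the pseudo-inverse of re-encoding each speaker with
    encode_2d reduces to this closed form.
    """
    w, x, y = bformat
    n = len(speaker_azimuths_rad)
    return [(w + 2.0 * (x * math.cos(t) + y * math.sin(t))) / n for t in speaker_azimuths_rad]
```

Consider a source straight ahead and four speakers at 0°, 90°, 180° and 270°. The gains are 0.75, 0.25, -0.25 and 0.25. They sum to 1, and the rear speaker is driven out of phase, which is typical of this basic decoder.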
Please feel free to add a feature request, and depending on its popularity, we might consider adding that in the future!

Determining the quality of mp3 audio streams

I have built a source client using PortAudio and LAME which streams the microphone input to an Icecast server, to be listened to online via the HTML5 <audio> tag. I have managed to (supposedly) get the quality of the stream to MP3 320 kbps at 44.1 kHz, and I am looking for a way to confirm this using tests and/or benchmarks.
I have an indication that these stats are somewhat correct from looking at stream inspectors in software such as iTunes and VLC, but I am looking to get a more in-depth data set.
What I basically want is to be able to test how much of the original file is being lost over the stream and if or how much the quality changes depending on environmental conditions of the broadcaster or streamer.
Does anyone know of any tools or frameworks to get some hard numbers or representations of this data?
If VLC tells you the stream is 320kbit CBR, then it is.
It sounds like what you're looking for is a comparison of the actual audio content. This is highly subjective. MP3 is built to use features of how our hearing works to save bandwidth. For example, quiet sounds are masked by loud sounds. High frequencies are harder to hear and are simply rolled off.
You can compare the spectral analysis between the original PCM-sampled waveform and the MP3 decoded waveform, but this doesn't tell you how humans interpret that sound. For that, you would have to survey humans.
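For the objective half of that comparison, here is a minimal Python sketch. It computes a signal-to-noise ratio between the original and decoded samples, plus the energy in a frequency band from a naive DFT, which can show high-frequency roll-off. It assumes both signals are already decoded to float samples and time-aligned. MP3 encoders add delay and padding, so alignment is a separate step not shown here. As the answer says, neither number tells you how listeners perceive the result.

```python
from __future__ import annotations

import math


def snr_db(reference: list[float], decoded: list[float]) -> float:
    """Signal-to-noise ratio in dB, treating (reference - decoded) as the noise."""
    if len(reference) != len(decoded):
        raise ValueError("signals must be time-aligned and the same length")
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)


def band_energy(x: list[float], sample_rate_hz: float, lo_hz: float, hi_hz: float) -> float:
    """Energy of x between lo_hz and hi_hz from a naive O(N^2) DFT (fine for short excerpts)."""
    n = len(x)
    total = 0.0
    for k in range(n // 2 + 1):
        if lo_hz <= k * sample_rate_hz / n < hi_hz:
            re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            total += re * re + im * im
    return total
```

Comparing `band_energy` above, say, 16 kHz for the original and the decoded excerpt would show how much of the top end the encoder rolled off.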

What is the best way to stream an audio file to website users/listeners?

I'm developing a music site that will stream audio files stored on a server to users; the audio files will be played through a Flash player embedded in a web page.
I heard that I need a streaming media server for streaming audio files (around 2 to 3 MB in size). Do I need to use one?
I found some streaming media server software, such as http://www.icecast.org. According to their documentation, though, it is meant for radio stations and live streaming, whereas I just need to stream audio files quickly and at a small size (low bandwidth) with good quality.
I heard that I need to encode the audio files first and then send them to listeners, and on their end the audio files need to be decoded again. Is that true? How can I do that? If I need to use a special web server, where should I host my files? Any good hosting providers?
If I host audio files on a normal web server, they will use HTTP or TCP to deliver my audio files to users/listeners. But I found that HTTP and TCP are not good for multimedia purposes like streaming audio and video files, and that they are used for delivering HTML and the like. I found that I should use RTSP or UDP for streaming audio files. What should I use?
I know that .MP3 files have much better quality than the other formats, but they also make the audio files huge. Which format should I use for audio files?
Most of the best quality audio files are more than 7 MB, so I'm planning to convert them myself using software so I can get small files with a good level of quality. If I convert my audio files, what bitrate should I use?
Is there any known good software for converting audio files while keeping the quality at a good level?
Note: I know that I won't need complex features when the site launches, but I want to know the best practices, like the ones used by soundcloud.com.
Here's a reply from someone who actually runs a SHOUTcast radio station and is an audio technician and web designer. Below is knowledge gathered from over 5000 hours of up-to-date research!
6)
Audio software?
You need software that can:
Convert to other bitrates and formats
Normalize the audio volume to the same level for all MP3s (e.g. -1 dB)
Cut off silence at the beginning and/or end
Equalize the audio so it sounds good
Add effects, mix, etc.
The best, most used, very solid and FREE option is Audacity.
5)
Good bitrate?
If the bitrate is too high, listeners on slower connections will suffer from buffer underruns,
i.e. hiccups or short breaks in the audio, because their connection can't keep up with the (too high) bitrate.
If it's too low, the quality is poor.
The best choice is 128 kb/s: it sounds good and won't cause underruns for most listeners.
The best format is MP3, since it's the format that most players and SHOUTcast providers can handle.
With the settings above, the average file size for a 4-minute track will be around 4 MB.
Since MP3 at 128 kb/s is the most popular, you will get the best price/quality deal
from a SHOUTcast server provider.
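The 4 MB figure follows directly from bitrate × duration. For a 4-minute (240 s) track at 128 kb/s:

```latex
\text{size} = \frac{128\,000\ \text{bit/s} \times 240\ \text{s}}{8\ \text{bit/byte}} = 3\,840\,000\ \text{bytes} \approx 3.8\ \text{MB}
```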
5b)
Audio tagging?
You forgot that one.
Make sure your audio files are tagged: the "Artist - Title" information that
players display is not taken from the file name but from the ID3 tag (ID3v1/ID3v2).
The best, most used, very solid and FREE software for this is Mp3tag;
it can also work in bulk (1,000 MP3s at once).
http://www.mp3tag.de/en/
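To see where players get that "Artist - Title" text, a minimal Python sketch can read an ID3v1 tag. ID3v1 is a fixed 128-byte block at the end of the file starting with `TAG`. The sketch ignores ID3v2, the richer variable-length tag stored at the start of the file, which tools like Mp3tag also write. Treat it as an illustration of the format rather than a complete tag reader.

```python
from __future__ import annotations


def read_id3v1(data: bytes) -> dict[str, str] | None:
    """Parse the fixed 128-byte ID3v1 tag at the end of an MP3, if there is one."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def field(start: int, length: int) -> str:
        # Fields are fixed-width, padded with NUL bytes or spaces; Latin-1 is the usual encoding.
        raw = tag[start:start + length].split(b"\x00", 1)[0]
        return raw.decode("latin-1").strip()

    # Layout: "TAG", title (30), artist (30), album (30), year (4), comment (30), genre (1).
    return {
        "title": field(3, 30),
        "artist": field(33, 30),
        "album": field(63, 30),
        "year": field(93, 4),
    }
```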
4)
Codec?
Upload your files to the server in the format described above (MP3 at 128 kb/s);
since it's the most used format, all players can play it.
Make sure you upload in the same format as the output of the server:
this keeps the processor load on your server low, which is important (it won't need to convert).
A SHOUTcast server (or other streaming server) will take your separate MP3s and turn them
into one single real-time stream, and it will serve that stream to multiple listeners (hundreds).
It will also provide you with statistics (number of listeners, where they are, now playing, played before).
A listener can play it in 2 ways:
a) From a player embedded on your website.
b) By clicking a link on your website, which will open your stream in any standalone player
your visitor has installed (Winamp, Windows Media Player, RealPlayer, QuickTime, iTunes, etc.)
A standalone player will give the best quality because it will have more and better audio controls (equalizer, etc.)
Best practice is to offer BOTH an embedded player and a simple clickable link.
Check out at least 20 radio station websites (both professional and amateur)
to see how they do it.
The best free embedded player right now is jPlayer,
because it's dual-mode (HTML5/Flash), so ALL BROWSERS and ALL MOBILES will play it,
and it's very well supported, with a forum, tutorials, etc.
http://www.jplayer.org
2)
Hosting providers?
Google for "Shoutcast streaming" or "Shoutcast server"
and compare 20 of them for the best price/quality, then research them again using Google.
They will have special web-based SHOUTcast software such as Centova:
you control it from any browser, and you can stream live to it or create playlists that play unattended from the server while you sleep ("AutoDJ").
You can create multiple playlists so that they play at certain times, on certain days, at random, etc.
You could create your whole station based on AutoDJ playlists only;
that way you will not have to worry about your own upload connection being interrupted,
and you can shut off your own PC.
For AutoDJ you want a SHOUTcast service with at least 5 GB of storage (MP3s);
that will give you around 3 to 4 days of music without repeats. By using the playlists cleverly,
and taking into account that listeners will on average listen for between 30 minutes and 2 hours at certain times, you can make sure that they will not hear the same tracks all the time.
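The "5 GB for 3 to 4 days" rule of thumb can be checked with the same bitrate arithmetic. Here is a small Python sketch; it assumes every file is MP3 at a constant 128 kb/s and ignores tag and filesystem overhead.

```python
def playback_days(storage_bytes: float, bitrate_bps: float) -> float:
    """Days of continuous playback that fit in `storage_bytes` at a constant bit rate."""
    seconds = storage_bytes * 8 / bitrate_bps
    return seconds / 86_400


def tracks_that_fit(storage_bytes: float, bitrate_bps: float, track_seconds: float) -> int:
    """Number of whole tracks of a given length that fit in the storage."""
    track_bytes = bitrate_bps * track_seconds / 8
    return int(storage_bytes // track_bytes)
```

5 GB (5 × 10^9 bytes) comes to roughly 3.6 days of audio, or about 1,300 four-minute tracks. Counted as 5 GiB, it is about 3.9 days.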
If you insist on doing a live (real-time) broadcast from your OWN computer (directly or via a stream server provider), the most used software is SAM Broadcaster.
That is it: start with a good SHOUTcast server provider, then build your website and create a clickable link to the stream; after that, add the embedded player.
To begin, let me clarify my understanding of your needs. Please add a comment and clarify in your question if these are wrong:
You intend to build a site that will play audio
Audio will not be one continuous stream, but will be made up of individual files
Your audio will generally be music
Now, on to your questions:
(1) I heard that I need a streaming media server for streaming audio files (around 2 to 3 MB in size). Do I need to use one?
(3A) If I host audio files on a normal web server, they will use HTTP or TCP to deliver my audio files to users/listeners. But I found that HTTP and TCP are not good for multimedia purposes like streaming audio and video files, and that they are used for delivering HTML and the like.
Nonsense. Streaming media servers, such as SHOUTcast/Icecast, are actually just HTTP servers that send content as it comes in from an encoder. The client doesn't know the difference between it and HTTP. Metadata is interleaved into the content stream at the client's request (made with a special request header), but it is still compatible with HTTP.
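Here is how that interleaving works. A SHOUTcast/Icecast client that sends the `Icy-MetaData: 1` request header gets an `icy-metaint` response header giving the number of audio bytes between metadata blocks. Each block starts with one length byte (in units of 16 bytes), followed by text such as `StreamTitle='...';`. The Python sketch below splits an already-received body into audio and metadata. It assumes the body starts right after the response headers and that the metadata is UTF-8 (encodings vary between servers).

```python
from __future__ import annotations


def split_icy_stream(data: bytes, metaint: int) -> tuple[bytes, list[str]]:
    """Separate audio bytes from ICY metadata blocks inserted every `metaint` audio bytes."""
    audio = bytearray()
    metadata: list[str] = []
    pos = 0
    while pos < len(data):
        audio += data[pos:pos + metaint]  # a run of plain audio bytes
        pos += metaint
        if pos >= len(data):
            break
        meta_len = data[pos] * 16  # the length byte counts 16-byte units
        pos += 1
        text = data[pos:pos + meta_len].rstrip(b"\x00").decode("utf-8", "replace")
        pos += meta_len
        if text:  # a zero length byte means "no metadata change"
            metadata.append(text)
    return bytes(audio), metadata
```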
HTTP is a protocol that is good for transferring any type of content. Ever download something from a website? That would have been with HTTP.
If it's good enough for YouTube, SoundCloud, Pandora, and just about everyone else, it's probably good enough for you as well, eh?
(3B) I found that I should use RTSP or UDP for streaming audio files. What should I use?
TCP is an underlying network protocol that ensures reliable transmission. Packets are received in the proper order, and are acknowledged so that any lost packets can be re-transmitted. There is some overhead with this. The reason UDP is sometimes used is that it provides lower latency at the cost of being unreliable. This is fine for telephony communications, but is pointless for media that is not time sensitive, such as a bunch of audio files coming from a server. In fact, if you get a few too many corrupt packets, your audio player will often simply stop decoding the file, and would need to be restarted.
RTSP is way overkill for your needs. It supports a bunch of features for media control, varying the bitrate on the fly, etc. This is not appropriate for your situation. It might be more appropriate if you were streaming live video or lengthy content.
(2) I heard that I need to encode the audio files first and then send them to listeners, and on their end the audio files need to be decoded again. Is that true? How can I do that? If I need to use a special web server, where should I host my files? Any good hosting providers?
You need to pick a codec for encoding audio that the client supports. I assume you will be using HTML5 with a Flash fallback. Unfortunately, there is no codec available that is universally supported. See the chart here: http://html5doctor.com/html5-audio-the-state-of-play/#support
(4) I know that .MP3 files have much better quality than the other formats, but they also make the audio files huge. Which format should I use for audio files?
Check your assumptions at the door; you are very wrong here. Keep in mind that raw PCM data is often 8 or more times larger than MP3, depending on the chosen bitrate. CD-quality PCM runs at 1,411 kbit/s, about 11 times a 128 kbit/s MP3. In any case, you will want to encode to AAC, MP3, and Vorbis for the widest client compatibility. aacPlus (HE-AAC) is an extension of AAC and is generally considered the standard for decent quality audio at relatively low bitrates. A 128 kbit stream in AAC will sound better than a 128 kbit stream in MP3.
(5) Most of the best quality audio files are more than 7 MB, so I'm planning to convert them myself using software so I can get small files with a good level of quality. If I convert my audio files, what bitrate should I use?
This question is very subjective. Personally, as a musician and audiophile, I prefer to hear stuff in its original quality. I use FLAC for compressing my music library, as the quality is lossless. For your needs, this will take up way too much bandwidth. Most folks don't know the difference between a 128kbit MP3 and the original. Many "premium" internet radio stations offer 128kbit aacPlus and 256kbit MP3. Pandora offers 96kbit MP3 for regular users, and 192kbit MP3 for premium users. Experiment, and pick a set of bitrates that work well for you and your users.
Always keep the original around. It doesn't have to be on your servers, but you need it. If you re-compress a file that was already lossy compressed, then you are losing additional quality. If you make 3 compressed versions of one source, make sure you're doing so from the original source.
(6) Is there any known good software for converting audio files while keeping the quality at a good level?
If it is legal for you to use, take a look at FFmpeg. It can handle just about any codec you can think of. As a word of caution, though, do look into it to make sure you are paying all the necessary license fees. Some of the codecs contained within are patented. I'm not a lawyer, and I have yet to figure out the legalities of using them on a commercial site. All I know is that it is heavily debated.
I've been using http://www.yagosta.com for years for a music company client. Free service and SSssooooo easy. Requires NO tech knowledge. I haven't updated this site in several years but you can see what it looks like at the following link. They probably have plenty of new designs which you can customize too. Perfectly adequate for most requirements.
http://www.bluedotmusic.net/selector01.html

Converting audio to code and vice-versa

Having just witnessed the Sound Load technology in the Nintendo DS game Bangai-O Spirits, I was curious how this technology works. Does anyone have any links, documentation or sample code for implementing such a feature, one that would allow the state of an application to be saved and loaded via audio?
It's the same old thing used in the ZX Spectrum era, when you loaded programs and games from tape. Only the sound quality and the filters are probably better.
In my opinion, something like Bluetooth or Wi-Fi is better. You can also send files, put them on some storage and then load them. I find these methods much easier than sound, because if there is a lot of noise around you, you cannot do much.
It is just a conversion of data to audio and then back from audio to data.
Search Google for Zotyocopy and Copy86M: these are the utilities used on the ZX Spectrum for saving a game to tape after loading it into memory.
If you want to pass data as audio through the air, there are a few things you need to be aware of, such as how the speaker and microphone interact. It is important that they don't distort or alter the sound too much, as what you are sending is in fact the raw bytes.
Some audio software will let you open any file as audio so that you can listen to it. If you store data as audio, do not use lossy compression such as MP3 on the audio file!
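Here is an illustration of "data to audio and back". The Python sketch below uses frequency-shift keying with Kansas City standard-style tones (1200 Hz for a 0, 2400 Hz for a 1, at 300 bits per second). It detects each bit with the Goertzel algorithm. This is not the ZX Spectrum's tape format or Bangai-O Spirits' scheme. It has no start/stop bits, synchronization or error detection, all of which a real over-the-air link would need.

```python
from __future__ import annotations

import math

SAMPLE_RATE = 9600                      # samples per second (assumed)
BAUD = 300                              # bits per second
SAMPLES_PER_BIT = SAMPLE_RATE // BAUD   # 32: a whole number of cycles of both tones
FREQ_ZERO, FREQ_ONE = 1200.0, 2400.0    # Kansas City standard-style tones


def encode(data: bytes) -> list[float]:
    """Turn bytes into audio samples, one tone burst per bit, least significant bit first."""
    samples: list[float] = []
    for byte in data:
        for bit in range(8):
            freq = FREQ_ONE if (byte >> bit) & 1 else FREQ_ZERO
            samples.extend(math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
                           for n in range(SAMPLES_PER_BIT))
    return samples


def goertzel_power(block: list[float], freq: float) -> float:
    """Power of `freq` in `block`, computed with the Goertzel algorithm."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode(samples: list[float]) -> bytes:
    """Recover bytes by checking which tone dominates each bit-length window."""
    out = bytearray()
    samples_per_byte = 8 * SAMPLES_PER_BIT
    for start in range(0, len(samples) - samples_per_byte + 1, samples_per_byte):
        byte = 0
        for bit in range(8):
            lo = start + bit * SAMPLES_PER_BIT
            block = samples[lo:lo + SAMPLES_PER_BIT]
            if goertzel_power(block, FREQ_ONE) > goertzel_power(block, FREQ_ZERO):
                byte |= 1 << bit
        out.append(byte)
    return bytes(out)
```

As the answer warns, this only works if the audio is kept lossless. A lossy codec or a noisy room can smear the tones enough to flip bits.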

Resources