Does ReplayGain work on Opus audio files, too, and how do I apply it? - audio

ReplayGain is a proposed technical standard published by David Robinson in 2001 to measure and normalize the perceived loudness of audio in computer audio formats such as MP3 and Ogg Vorbis.
Does ReplayGain work on audio files encoded with Opus, too? And what's a command-line solution to apply it?
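One detail that matters for any command-line approach: as far as I know, Opus files do not normally carry the classic REPLAYGAIN_* tags. RFC 7845 (Ogg encapsulation for Opus) instead defines an "output gain" header field plus R128_TRACK_GAIN and R128_ALBUM_GAIN tags, stored as Q7.8 fixed-point integers relative to a -23 LUFS reference. A minimal sketch of the conversion, assuming the input is a ReplayGain 2.0 gain computed against a -18 LUFS target (the helper name is made up for illustration):

```python
# Sketch only: convert a ReplayGain 2.0 gain (dB, -18 LUFS target) into the
# integer stored in an Opus R128_TRACK_GAIN tag (Q7.8, -23 LUFS reference).

RG2_TARGET_LUFS = -18.0   # assumed: ReplayGain 2.0 target loudness
R128_TARGET_LUFS = -23.0  # reference level used by the Opus R128_* tags


def replaygain_db_to_r128_q78(rg_gain_db: float) -> int:
    """Return the Q7.8 integer for an R128_*_GAIN tag (hypothetical helper)."""
    # Shift the gain from the -18 LUFS target to the -23 LUFS reference.
    r128_gain_db = rg_gain_db + (R128_TARGET_LUFS - RG2_TARGET_LUFS)
    # Q7.8 fixed point: the dB value times 256, kept in signed 16-bit range.
    q78 = round(r128_gain_db * 256)
    return max(-32768, min(32767, q78))


if __name__ == "__main__":
    # A ReplayGain value of -2.5 dB becomes -7.5 dB relative to -23 LUFS.
    print(f"R128_TRACK_GAIN={replaygain_db_to_r128_q78(-2.5)}")
```

The loudness measurement itself still has to come from a scanner; this only shows how the two conventions relate.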

Related

How can I apply audio compression to an MP4 file?

I am using moviepy to generate MP4 files from sets of shorter clips, each with its own audio. The problem is that the resulting MP4 often has a very high dynamic range from one clip to the next, and I would like to apply audio compression to make it easier on the ears. On Google I can only find results about audio data compression, not about dynamic range compression in the audio engineering sense.
I would like to know if there is some way of doing this with moviepy, or with some other library. I have no issue with invoking (non-interactive) command-line utilities either.
Thank you.
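One non-interactive route is FFmpeg's `acompressor` audio filter, run on the finished file. The sketch below only builds and runs the command line; the function names and the chosen threshold, ratio and makeup values are illustrative assumptions, not tuned settings, and it assumes `ffmpeg` is on the PATH:

```python
import subprocess


def build_compress_command(src, dst, threshold=0.125, ratio=4.0,
                           attack_ms=20.0, release_ms=250.0, makeup=2.0):
    """Build an ffmpeg command that applies dynamic range compression."""
    # acompressor takes a linear threshold (0.125 is about -18 dBFS),
    # a compression ratio, attack/release times in ms, and a makeup gain.
    audio_filter = (
        f"acompressor=threshold={threshold}:ratio={ratio}"
        f":attack={attack_ms}:release={release_ms}:makeup={makeup}"
    )
    # Copy the video stream untouched; only the audio is filtered and re-encoded.
    return ["ffmpeg", "-y", "-i", src, "-c:v", "copy", "-af", audio_filter, dst]


def compress_audio(src, dst, **settings):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_compress_command(src, dst, **settings), check=True)


if __name__ == "__main__":
    print(" ".join(build_compress_command("clips.mp4", "clips_compressed.mp4")))
```

Calling `compress_audio("clips.mp4", "clips_compressed.mp4")` after moviepy has written the file keeps the two steps independent.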

Which is the best sound format for IBM Speech to Text?

IBM advises using the Opus sound format for audio submitted to its Watson Speech to Text service, the idea being that Opus is designed specifically for speech.
Otherwise, it says you will get better-quality transcription when submitting audio in flac format than in mp3 format. The latter has the obvious advantage of its small size; there is, after all, a 100 MB limit for file submissions. So you weigh the balance of your needs. That all makes sense so far.
But looking at conversions done on a source WAV file, the Opus file size is comparable with mp3.
Downsampling a 366 MB wav file to an 8 kHz sample rate (one of two sample rates advised for using the service) created a wav file of 66.4 MB. Converting that to flac, mp3 and opus produced flac: 43.6 MB; mp3: 6.2 MB; opus: 9.8 MB.
So is opus really the best choice for getting the most accurate transcription? And how can that be when it is so small compared to flac?
Opus is designed to efficiently encode speech. The details are explained in the linked wiki article, but just to give you a gist, consider that the human vocalisation range is rather limited: the fundamental frequency of speech runs roughly from 80 to 260 Hz. On the other hand, our hearing range is far greater, up to 20000 Hz. Whereas music encoders (like mp3) have to work roughly within our hearing range, voice-specialised encoders (like Opus) can focus on what matters to efficiently encode the human voice, with no interest in what lies significantly above our vocalisation range. That, I hope, provides some intuition for why Opus is so efficient.
Is it the best? It's somewhat opinionated, but yes, I think it's among the best choices out there. To quote Wikipedia, Opus replaces both Vorbis and Speex for new applications, and several blind listening tests have ranked it higher-quality than any other standard audio format at any given bitrate.
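To put the sizes from the question side by side, here is a quick calculation of each file as a percentage of the 66.4 MB downsampled wav. It also helps to remember that flac is lossless, so it cannot discard anything, while mp3 and Opus are lossy. The function name is just for illustration:

```python
# File sizes in MB, taken from the question above.
WAV_MB = 66.4
ENCODED_MB = {"flac": 43.6, "mp3": 6.2, "opus": 9.8}


def relative_sizes(reference_mb, sizes_mb):
    """Return each size as a percentage of the reference, to one decimal."""
    return {
        name: round(100 * size / reference_mb, 1)
        for name, size in sizes_mb.items()
    }


if __name__ == "__main__":
    for name, percent in relative_sizes(WAV_MB, ENCODED_MB).items():
        print(f"{name}: {percent}% of the wav")
```

So flac keeps roughly two thirds of the wav size, while both lossy formats land well under a fifth of it.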

MIME type of mp3 file using Speech to Text

I am using the Watson Speech to Text API. Initially I used a .wav file, but I want to use an mp3 file. What is the MIME type of an mp3 file, so that I can specify the content type?
According to the official documentation, the supported audio formats are:
Audio formats: Transcribes Free Lossless Audio Codec (FLAC), Linear 16-bit Pulse-Code Modulation (PCM), Waveform Audio File Format (WAV), Ogg format with the Opus or Vorbis codec, Web Media (WebM) format with the Opus or Vorbis codec, mu-law (or u-law) audio data, or basic audio.
Check: MIME Types for Speech to Text.
One good way to use your mp3 audio is to convert it to a supported format before sending it to the API.
Depending on what you want, you can also use this article, in which Jason shows how to use mp3 with Asterisk to send voice audio to Speech to Text. I'm not sure whether it works, though.
EDIT: [10/2017]
A few days ago, Watson Speech to Text released a new version that supports mp3 input.
Check the audio formats supported now:
Audio formats: Transcribe Free Lossless Audio Codec (FLAC), MP3 (Motion Picture Experts Group, or MPEG) format, Linear 16-bit Pulse-Code Modulation (PCM), Waveform Audio File Format (WAV), Ogg format with the Opus or Vorbis codec, Web Media (WebM) format with the Opus or Vorbis codec, mu-law (or u-law) audio data, and basic audio.
See the official documentation on this here.
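For choosing the Content-Type header from a file name, something like the lookup below could work. Treat the table as an assumption: `audio/mpeg` is the IANA-registered type for mp3, but the exact strings the service accepts should be confirmed against the MIME Types page mentioned above. The function name is hypothetical:

```python
from pathlib import Path

# Assumed mapping for illustration; confirm each value against the service docs.
CONTENT_TYPES = {
    ".flac": "audio/flac",
    ".mp3": "audio/mpeg",   # IANA-registered type for MP3
    ".wav": "audio/wav",
    ".ogg": "audio/ogg",
    ".opus": "audio/ogg",   # Opus is normally carried in an Ogg container
    ".webm": "audio/webm",
}


def content_type_for(filename):
    """Look up the content type by file extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"no known content type for {suffix!r}") from None


if __name__ == "__main__":
    print(content_type_for("interview.mp3"))
```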

How do I create an mp4 file from a collection of H.264 frames and audio frames?

I have a program that captures and stores H.264 encoded video as well as audio into a proprietary format file. I need to be able to export that video and audio to an mp4 file. I prefer C# but will use C++ if necessary. Any suggestions?
To produce an MPEG-4 Part 14 (.mp4) file you need a multiplexer. There is a choice of multiplexers out there:
FFmpeg (libavformat)
DirectShow filters (free and open source from GDCL, commercial)
Windows 7+ Media Foundation file sink
The API and complexity vary, because some multiplexers are expected to be part of a pipeline rather than completely standalone classes. You might want to check the respective samples (and perhaps the license agreements, too) to see what is best for you.
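If calling the FFmpeg command-line tool is acceptable, muxing can be done without re-encoding. This is a rough sketch that assumes the frames have already been dumped from the proprietary file into a raw Annex B H.264 stream and an ADTS AAC stream; the file names, the frame rate and the function name are placeholders:

```python
import subprocess


def build_mux_command(video_es, audio_es, dst, fps=25):
    """Build an ffmpeg command that muxes two elementary streams into MP4."""
    return [
        "ffmpeg", "-y",
        # A raw H.264 stream carries no container timing, so state the rate.
        "-framerate", str(fps), "-i", video_es,
        "-i", audio_es,
        # Stream copy: the encoded frames are written as-is, not re-encoded.
        "-c", "copy",
        dst,
    ]


def mux(video_es, audio_es, dst, fps=25):
    """Run the mux; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mux_command(video_es, audio_es, dst, fps), check=True)


if __name__ == "__main__":
    print(" ".join(build_mux_command("video.h264", "audio.aac", "export.mp4")))
```

Using libavformat directly gives the same result from C or C++ with more control over timestamps, at the cost of more code.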
Take a look at libmp4v2. It is fairly straightforward to use.
http://code.google.com/p/mp4v2/

Java audio converter api

I am looking for a comprehensive API in Java that can convert audio across various formats and bitrates.
For example
WAV (6 kHz to 48 kHz) L16/audio ---TO--- WAV (RIFF header) 8 kHz 8-bit mono A-law/U-law
AIFF (6 kHz to 48 kHz) L16/audio ---TO--- WAV (RIFF header) 8 kHz 8-bit mono A-law/U-law
and other voice audio formats.
Any other suggestions about similar Java libraries for audio conversion are also welcome.
I was able to solve this problem by using Tritonus, an open source implementation of the Java Sound API, and its wide range of sound converter plugins.
Specifically, the Tritonus miscellaneous plugins were very useful in my context.

Resources