converting .wav file to .ogg in javascript - audio

I'm trying to capture user's audio input from the browser. I have done it with WAV but the files are really big. A friend of mine told me that OGG files are much smaller.
Does anyone know how to convert WAV to OGG?
I also have the raw data buffer, I don't really need to convert. But I just need the OGG encoder.
Here's the WAV encoder from Matt Diamond's RecorderJS:
function encodeWAV(samples){
  var buffer = new ArrayBuffer(44 + samples.length * 2);
  var view = new DataView(buffer);
  /* RIFF identifier */
  writeString(view, 0, 'RIFF');
  /* RIFF chunk length (total file length minus the 8 bytes already written) */
  view.setUint32(4, 36 + samples.length * 2, true);
  /* RIFF type */
  writeString(view, 8, 'WAVE');
  /* format chunk identifier */
  writeString(view, 12, 'fmt ');
  /* format chunk length */
  view.setUint32(16, 16, true);
  /* sample format (raw) */
  view.setUint16(20, 1, true);
  /* channel count */
  view.setUint16(22, 2, true);
  /* sample rate */
  view.setUint32(24, sampleRate, true);
  /* byte rate (sample rate * block align) */
  view.setUint32(28, sampleRate * 4, true);
  /* block align (channel count * bytes per sample) */
  view.setUint16(32, 4, true);
  /* bits per sample */
  view.setUint16(34, 16, true);
  /* data chunk identifier */
  writeString(view, 36, 'data');
  /* data chunk length */
  view.setUint32(40, samples.length * 2, true);
  floatTo16BitPCM(view, 44, samples);
  return view;
}
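For reference, encodeWAV relies on two helpers that are not shown above, writeString and floatTo16BitPCM. The following is a sketch of what they do (write ASCII bytes into the header, and clamp float samples to [-1, 1] before scaling them to little-endian 16-bit integers); it is not necessarily the exact library source.

```javascript
// Write an ASCII string into the DataView one byte at a time.
function writeString(view, offset, string) {
  for (var i = 0; i < string.length; i++) {
    view.setUint8(offset + i, string.charCodeAt(i));
  }
}

// Convert float samples in [-1, 1] to little-endian signed 16-bit PCM.
function floatTo16BitPCM(output, offset, input) {
  for (var i = 0; i < input.length; i++, offset += 2) {
    // Clamp first so out-of-range samples do not wrap around.
    var s = Math.max(-1, Math.min(1, input[i]));
    output.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
  }
}
```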
Is there one for OGG?

The Web Audio spec is actually intended to allow exactly this kind of functionality, but is just not close to fulfilling that purpose yet:
This specification describes a high-level JavaScript API for processing and synthesizing audio in web applications. The primary paradigm is of an audio routing graph, where a number of AudioNode objects are connected together to define the overall audio rendering. The actual processing will primarily take place in the underlying implementation (typically optimized Assembly / C / C++ code), but direct JavaScript processing and synthesis is also supported.
Here's a statement from the current W3C audio spec draft, which makes the following points:
While processing audio in JavaScript, it is extremely challenging to get reliable, glitch-free audio while achieving a reasonably low-latency, especially under heavy processor load.
JavaScript is very much slower than heavily optimized C++ code and is not able to take advantage of SSE optimizations and multi-threading which is critical for getting good performance on today's processors. Optimized native code can be on the order of twenty times faster for processing FFTs as compared with JavaScript. It is not efficient enough for heavy-duty processing of audio such as convolution and 3D spatialization of large numbers of audio sources.
setInterval() and XHR handling will steal time from the audio processing. In a reasonably complex game, some JavaScript resources will be needed for game physics and graphics. This creates challenges because audio rendering is deadline driven (to avoid glitches and get low enough latency).
JavaScript does not run in a real-time processing thread and thus can be pre-empted by many other threads running on the system.
Garbage Collection (and autorelease pools on Mac OS X) can cause unpredictable delay on a JavaScript thread.
Multiple JavaScript contexts can be running on the main thread, stealing time from the context doing the processing.
Other code (other than JavaScript) such as page rendering runs on the main thread.
Locks can be taken and memory is allocated on the JavaScript thread. This can cause additional thread preemption.
The problems are even more difficult with today's generation of mobile devices which have processors with relatively poor performance and power consumption / battery-life issues.
ECMAScript (js) is really fast for a lot of things, and is getting faster all the time depending on what engine is interpreting the code. For something as intensive as audio processing however, you would be much better off using a low-level tool that's compiled to optimize resources specific to the task. I'm currently using ffmpeg on the server side to accomplish something similar.
I know that it is really inefficient to have to send a wav file across an internet connection just to obtain a more compact .ogg file, but that's the current state of things with the web audio api. To do any client-side processing the user would have to explicitly give access to the local file system and execution privileges for the file to make the conversion.
Edit: You could also use Google's Native Client if you don't mind limiting your users to Chrome. It seems like very promising technology that loads in a sandbox and achieves speeds nearly as good as natively executed code. I'm assuming that there will be similar implementations in other browsers at some point.

This question has been driving me crazy because I haven't seen anyone come up with a really clean solution, so I came up with my own library:
https://github.com/sb2702/audioRecord.js
Basic usage
audioRecord.requestDevice(function(recorder){
  // Request user permission for microphone access
  recorder.record(); // Start recording
  recorder.stop(); // Stop recording
  recorder.exportOGG(function(oggBlob){
    // Here's your OGG file
  });
  recorder.exportMP3(function(mp3Blob){
    // Here's your MP3 file
  });
  recorder.exportWAV(function(wavBlob){
    // Here's your WAV file
  });
});
Using the continuous mp3 encoding option, it's entirely reasonable to capture and encode audio input entirely in the browser, cross-browser, without a server or native code.
DEMO: http://sb2702.github.io/audioRecord.js/
It's still rough around the edges, but I'll try to clean / fix it up.

NEW: Derivative work of Matt Diamond's recorderjs recording to Ogg-Opus
To encode a whole file to Ogg-Opus in a browser without special extensions, one may use an Emscripten port of opus-tools/opusenc (demo). It comes with decoding support for WAV, AIFF and a couple of other formats, and has a re-sampler built in.
An Ogg-Vorbis encoder is also available.
Since the questioner is primarily out for audio compression, they might be also interested in mp3 encoding using lame.

OK, this might not be a direct answer, as it does not say how to convert .wav into .ogg. Then again, why bother with the conversion when you can record the .ogg file directly? This depends on the MediaRecorder API, but browsers which support Web Audio usually have this too (Firefox 25+ and Chrome 47+).
github.io Demo
Github Code Source
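A minimal sketch of the MediaRecorder approach, assuming a browser that exposes navigator.mediaDevices.getUserMedia and MediaRecorder. The function names pickMimeType and recordCompressed are hypothetical, not from the linked demo. Note that not every browser can write an Ogg container (Chrome typically offers WebM/Opus instead), so the sketch falls back to whatever compressed type is supported.

```javascript
// Return the first container/codec the browser says it can record, or ''
// to let the browser choose its default. `isTypeSupported` is passed in so
// the selection logic can be exercised outside a browser.
function pickMimeType(isTypeSupported) {
  const candidates = [
    'audio/ogg;codecs=opus',
    'audio/ogg',
    'audio/webm;codecs=opus',
    'audio/webm',
  ];
  return candidates.find((type) => isTypeSupported(type)) || '';
}

// Browser-only: record the microphone for `durationMs` milliseconds and
// resolve with a compressed Blob.
async function recordCompressed(durationMs) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType((type) => MediaRecorder.isTypeSupported(type));
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
  const chunks = [];
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  };
  const stopped = new Promise((resolve) => { recorder.onstop = resolve; });
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;
  // Release the microphone once recording has finished.
  stream.getTracks().forEach((track) => track.stop());
  return new Blob(chunks, { type: recorder.mimeType });
}
```

In a page, something like recordCompressed(5000).then(function (blob) { ... }) would hand you the encoded audio without any WAV intermediate.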

Related

How can I detect corrupt/incomplete MP3 file, from a node.js app?

The most common reason for a corrupt MP3 file is that it was only partially uploaded to the server. In this case, the indicated audio duration doesn't correspond to what is really in the MP3 file: we can hear the beginning, but at some point playback stops and the duration shown by the audio player is wrong.
I tried with libraries like node-ffprobe, but it seems they just read metadata, without making comparison with real audio data in the file. Is there a way to detect efficiently a corrupted or incomplete MP3 file from node.js?
Note: the client uploading the MP3 files is a hardware device (an audio recorder) that uploads files to an FTP server, not a browser. So I'm not able to upload potentially more useful data from the client.
MP3 files don't normally have a duration. They're just a series of MPEG frames. Sometimes, there is an ID3 tag indicating duration, but not always.
Players can determine duration by choosing one of a few methods:
Decode the entire audio file. This is the slowest method, but if you're going to decode the file anyway, you might as well go this route as it gives you an exact duration.
Read the whole file, skimming through frame headers. You'll have to read the whole file from disk, but you won't have to decode it. Can be slow if I/O is slow, but gives you an exact duration.
Read the first frame's bitrate and estimate duration by file size. Definitely the fastest method, and the one most commonly used by players. Duration is an estimate only, and is reasonably accurate for CBR, but can be wildly inaccurate for VBR.
What I'm getting at is that these files might not actually be broken. They might just be VBR files that your player doesn't know the duration of.
If you're convinced they are broken (such as stopping in the middle of content), then you'll have to figure out how you want to handle it. There are probably only a couple ways to determine this:
Ideally, there's an ID3 tag indicating duration, and you can decode the whole file and determine its real duration to compare.
Usually, that ID3 tag won't exist, so you'll have to check to see if the last frame is complete or not.
Beyond that, you don't really have a good way of knowing if the stream is incomplete, since there is no outer container that actually specifies number of frames to expect.
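The "skim the frame headers and check the last frame" idea can be sketched in Node.js roughly as follows. This is a simplified illustration, not a full validator: it assumes the whole file is already in a Buffer, handles only Layer III frames, and the names parseFrameHeader and scanMp3 are hypothetical. It skips a leading ID3v2 tag and a trailing ID3v1 tag, then walks frame by frame and reports whether the final frame runs past the end of the data.

```javascript
// Layer III bitrates in kbit/s, indexed by the 4-bit bitrate field.
const BITRATES_V1_L3 = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320];
const BITRATES_V2_L3 = [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160];
// Sample rates keyed by the 2-bit version field (3 = MPEG1, 2 = MPEG2, 0 = MPEG2.5).
const SAMPLE_RATES = { 3: [44100, 48000, 32000], 2: [22050, 24000, 16000], 0: [11025, 12000, 8000] };

// Parse a 4-byte frame header at `pos`; return null if it is not a valid Layer III header.
function parseFrameHeader(buf, pos) {
  if (pos + 4 > buf.length) return null;
  if (buf[pos] !== 0xff || (buf[pos + 1] & 0xe0) !== 0xe0) return null; // 11-bit sync word
  const version = (buf[pos + 1] >> 3) & 0x03;
  const layer = (buf[pos + 1] >> 1) & 0x03; // 1 means Layer III
  const bitrateIndex = buf[pos + 2] >> 4;
  const rateIndex = (buf[pos + 2] >> 2) & 0x03;
  const padding = (buf[pos + 2] >> 1) & 0x01;
  if (version === 1 || layer !== 1) return null;
  if (bitrateIndex === 0 || bitrateIndex === 15 || rateIndex === 3) return null;
  const mpeg1 = version === 3;
  const bitrate = (mpeg1 ? BITRATES_V1_L3 : BITRATES_V2_L3)[bitrateIndex] * 1000;
  const sampleRate = SAMPLE_RATES[version][rateIndex];
  const samples = mpeg1 ? 1152 : 576; // samples per Layer III frame
  const size = Math.floor(((samples / 8) * bitrate) / sampleRate) + padding;
  return { size, samples, sampleRate };
}

// Walk all frames; report the frame count, exact duration, and whether the last frame is cut off.
function scanMp3(buf) {
  let pos = 0;
  if (buf.length >= 10 && buf.toString('latin1', 0, 3) === 'ID3') {
    // ID3v2 size is a 28-bit "syncsafe" integer (7 bits per byte).
    const tagSize = (buf[6] << 21) | (buf[7] << 14) | (buf[8] << 7) | buf[9];
    pos = 10 + tagSize + ((buf[5] & 0x10) ? 10 : 0); // optional footer
  }
  let end = buf.length;
  if (end >= 128 && buf.toString('latin1', end - 128, end - 125) === 'TAG') end -= 128; // ID3v1
  let frames = 0;
  let seconds = 0;
  let truncated = false;
  while (pos < end) {
    const header = parseFrameHeader(buf, pos);
    if (!header) { pos += 1; continue; } // not a frame header: resync byte by byte
    if (pos + header.size > end) { truncated = true; break; } // last frame is incomplete
    frames += 1;
    seconds += header.samples / header.sampleRate;
    pos += header.size;
  }
  return { frames, seconds, truncated };
}
```

With a real file you would call something like scanMp3(fs.readFileSync(path)). A truncated result of true suggests an interrupted upload; a file that happens to be cut exactly on a frame boundary will still look complete, which is the limitation described above.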
The expression for calculating the filesize of an mp3 based on duration and encoding (from this answer) is quite simple:
x = length of song in seconds
y = bitrate in kilobits per second
(x * y) / 8 = filesize (kB)
(x * y) / 8 / 1024 = filesize (MB, approximately)
There is also a javascript implementation for the Web Audio API in another answer on that same question. Perhaps that would be useful in your Node implementation.
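As a worked example of the size formula for constant-bitrate files, here is a small sketch. The function names are hypothetical, and it assumes the usual MP3 convention that one kilobit is 1000 bits. It ignores ID3 tags and is meaningless for VBR files.

```javascript
// Expected size in bytes of a CBR MP3 stream of the given length.
function estimateMp3Bytes(durationSeconds, bitrateKbps) {
  return (durationSeconds * bitrateKbps * 1000) / 8;
}

// Inverse: the duration in seconds that a CBR file of this size should hold.
function estimateMp3Seconds(sizeBytes, bitrateKbps) {
  return (sizeBytes * 8) / (bitrateKbps * 1000);
}

// A 3-minute song at 128 kbit/s should be about 2.88 million bytes.
console.log(estimateMp3Bytes(180, 128)); // 2880000
```

If the recorder always uses the same bitrate and you know how long a recording should be, comparing the uploaded size with this estimate is a cheap first check for a partial upload.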
mp3diags is older open-source software for fixing MP3s, and it was great for batch processing stuff like this. The source is C++ and still available if you're feeling nosy and want to see how some of these features are implemented.
Worth a look, since it has some features that might be useful in your context:
What is MP3 Diags and what does it do?
low quality audio
missing VBR header
missing normalization data
Correcting files that show incorrect song duration
Correcting files in which the player cannot seek correctly

Programmatic access to a sound played through OpenAL

I am working with an application that uses OpenAL API quite extensively. In particular, there are multiple sound sources, non-trivial listener filters, etc.
I want to be able to run this application significantly faster than real-time. At the same time, the sound must be saved for later postprocessing. Is there a way to access the OpenAL output programmatically (virtually) without ever playing the sound on the real playback device?
Ideally, I'd like to have access to the audio that would be played during every tick of the main loop of my application. Normally one tick corresponds to one rendered frame (e.g. 1/30th of a second). But in this case we would be running the app as fast as possible.
We ended up using OpenAL Soft to do this. Example:
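#include "alext.h"
LPALCLOOPBACKOPENDEVICESOFT alcLoopbackOpenDeviceSOFT;
alcLoopbackOpenDeviceSOFT = (LPALCLOOPBACKOPENDEVICESOFT)alcGetProcAddress(NULL, "alcLoopbackOpenDeviceSOFT");
Replace your default device with this loopback device:
ALCdevice *device = alcLoopbackOpenDeviceSOFT(NULL);
ALCcontext *context = alcCreateContext(device, attrs);
Set the attrs as you would for your default device. A loopback device also needs ALC_FORMAT_CHANNELS_SOFT, ALC_FORMAT_TYPE_SOFT and ALC_FREQUENCY in attrs so that it knows which output format to render.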
Then in the main loop use:
LPALCRENDERSAMPLESSOFT alcRenderSamplesSOFT;
alcRenderSamplesSOFT = (LPALCRENDERSAMPLESSOFT)alcGetProcAddress(NULL, "alcRenderSamplesSOFT");
alcRenderSamplesSOFT(device, buffer, 1024);
Here the buffer will receive 1024 sample frames. Rendering is not tied to a playback device, so this code runs faster than real time and you can render the frames you need on every tick.
Are you able to do your required functions with the audio data prior to its being shipped to OpenAL? I've done a lot with javax.sound.sampled when it is untethered by the blocking write() method in SourceDataLine, especially when saving to file rather than playing back.
From what little I know about OpenAL, there is also a blocking process that occurs when data is shipped, with a queue of arrays that are managed. I've been meaning to look into this further...
(Probably not being very helpful here. Apologies.)

What's FFmpeg doing with avcodec_send_packet()?

I'm trying to optimise a piece of software for playing video, which internally uses the FFmpeg libraries for decoding. We've found that on some large (4K, 60fps) video, it sometimes takes longer to decode a frame than that frame should be displayed for; sadly, because of the problem domain, simply buffering/skipping frames is not an option.
However, it appears that the FFmpeg executable is able to decode the video in question fine, at about 2x speed, so I've been trying to work out what we're doing wrong.
I've written a very stripped-back decoder program for testing; the source is here (it's about 200 lines). From profiling it, it appears that the one major bottleneck during decoding is the avcodec_send_packet() function, which can take up to 50ms per call. However, measuring the same call in FFmpeg shows strange behaviour:
(these are the times taken for each call to avcodec_send_packet() in milliseconds, when decoding a 4K 25fps VP9-encoded video.)
Basically, it seems that when FFmpeg uses this function, it only really takes any amount of time to complete every N calls, where N is the number of threads being used for decoding. However, both my test decoder and the actual product use 4 threads for decoding, and this doesn't happen; when using frame-based threading, the test decoder behaves like FFmpeg using only 1 thread. This would seem to indicate that we're not using multithreading at all, but we've still seen performance improvements by using more threads.
FFmpeg's results average out to being about twice as fast overall as our decoders, so clearly we're doing something wrong. I've been reading through FFmpeg's source to try to find any clues, but it's so far eluded me.
My question is: what's FFmpeg doing here that we're not? Alternatively, how can we increase the performance of our decoder?
Any help is greatly appreciated.
I was facing the same problem. It took me quite a while to figure out a solution, which I want to share here for future reference:
Enable multithreading for the decoder. By default the decoder uses only one thread; depending on the decoder, multithreading can speed up decoding drastically.
Assume you have AVFormatContext *format_ctx, a matching codec AVCodec* codec and AVCodecContext* codec_ctx (allocated using avcodec_alloc_context3).
Before opening the codec context (using avcodec_open2) you can configure multithreading. Check the capabilities of the codec in order to decide which kind of multithreading you can use:
// let the codec decide how many threads suit the decoding job best
codec_ctx->thread_count = 0;
// test the capability flags with bitwise AND (a bitwise OR would always be true)
if (codec->capabilities & AV_CODEC_CAP_FRAME_THREADS)
    codec_ctx->thread_type = FF_THREAD_FRAME;
else if (codec->capabilities & AV_CODEC_CAP_SLICE_THREADS)
    codec_ctx->thread_type = FF_THREAD_SLICE;
else
    codec_ctx->thread_count = 1; // don't use multithreading
Another speed-up I found is the following: keep sending packets to the decoder (that's what avcodec_send_packet() does) until you get AVERROR(EAGAIN) as the return value. This means the internal decoder buffers are full and you first need to collect the decoded frames (but remember to send this last packet again once the decoder has been drained). Now you can collect the decoded frames using avcodec_receive_frame() until you get AVERROR(EAGAIN) again.
Some decoders work much faster when they have multiple frames queued for decoding (that's what the decoder does when codec_ctx->thread_type = FF_THREAD_FRAME is set).
avcodec_send_packet() and avcodec_receive_frame() are wrapper functions. The most important thing they do is call the selected codec's decode function and return the decoded frame (or an error).
Try tweaking the codec options; for example, the low-latency settings may not give you what you want. Sometimes the old API, avcodec_decode_video2(), outperforms the newer one, so you may try that too if your FFmpeg version still provides it (it is deprecated).

Adding audio effects (reverb etc..) to a BackgroundAudioPlayer driven streaming audio app

I have a Windows Phone 8 app which plays audio streams from a remote location or local files using the BackgroundAudioPlayer. I now want to be able to add audio effects, for example reverb or echo.
Please could you advise me on how to do this? I haven't been able to find a way of hooking extra audio processing code into the audio processing pipeline, even though I've read a lot about WASAPI and XAudio2 and looked at many code examples.
Note that the app is written in C# but, from my previous experience with writing audio processing code, I know that I should be writing the audio code in native C++. Roughly speaking, I need to find a point at which there is an audio buffer containing raw PCM data which I can use as an input for my audio processing code which will then write either back to the same buffer or to another buffer which is read by the next stage of audio processing. There need to be ways of synchronizing what happens in my code with the rest of the phone's audio processing mechanisms and, of course, the process needs to be very fast so as not to cause audio glitches. Or something like that; I'm used to how VST works, not how such things might work in the Windows Phone world.
Looking forward to seeing what you suggest...
Kind regards,
Matt Daley
I need to find a point at which there is an audio buffer containing raw PCM data
AFAIK there's no such point. This MSDN page hints that audio/video decoding is performed not by the OS, but by the Qualcomm chip itself.
You can use something like Mp3Sharp for decoding. This way the mp3 will be decoded on the CPU by your managed code, you can interfere / process however you like, then feed the PCM into the media stream source. Main downside - battery life: the hardware-provided codecs should be much more power-efficient.

Play audio data using QIODevice (Qt4.6 with VC++)

I'm working on playing audio from an audio stream using VC++ with the QtMultimedia library. Since I'm not too experienced with Qt's libraries I started by reading in a .wav file and writing it to a buffer:
ifstream wavFile;
char* file = "error_ex.wav";
wavFile.open( file, ios::binary );
After that, I used ifstream's .read() function and wrote all the data into a buffer. After the buffer is written it's sent off to the audio writer that prepares it for Qt:
QByteArray fData;
for( int i = 0; i < (int)data.size(); ++i )
{
fData.push_back(data.at(i));
}
m_pBuffer->open(QIODevice::ReadWrite);
m_pBuffer->write( fData );
m_pBuffer->close();
(m_pBuffer is of type QBuffer)
Once the QBuffer is ready I attempt to play the buffer:
QIODevice* ioDevice = m_pAudioOut->start();
ioDevice->write( m_pBuffer->buffer() );
(m_pAudioOut is of type QAudioOutput)
This results in a small pop from the speakers and then it stops playing. Any ideas why?
Running Visual Studio 2008 on Windows XP SP2 using Qt library 4.6.3.
As Frank pointed out, if your requirement is simply to play audio data from a file, a higher-level API would do the job, and would simplify your application code. Phonon would be one option; alternatively, the QtMobility project provides the QMediaPlayer API for high-level use cases.
Given that the question is specifically about using QIODevice, however, and that you mentioned that reading from a WAV file was just your initial approach, I'll assume that you actually need a streaming API, i.e. one which allows the client to control the buffering, rather than handing over this control to a higher-level abstraction such as Phonon.
QAudioOutput can be used in two different modes, depending on which overload of start() is called:
"Pull mode": void QAudioOutput::start(QIODevice *)
In this mode, QAudioOutput will pull data from the supplied QIODevice without further intervention from the client. It is a good choice if the QIODevice being used is one which is provided by Qt (e.g. QFile, QAbstractSocket etc).
"Push mode": QIODevice* QAudioOutput::start()
In this mode, the QAudioOutput client must push data to the audio device by calling QIODevice::write() on the device returned by start(). This will need to be done in a loop, something like:
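QIODevice *ioDevice = audioOutput->start(); // push mode: write to the returned device
qint64 dataRemaining = ... // assign correct value here
while (dataRemaining > 0) {
    qint64 bytesWritten = ioDevice->write(buffer, dataRemaining);
    if (bytesWritten < 0)
        break; // write error
    dataRemaining -= bytesWritten;
    buffer += bytesWritten;
    // Then wait for a short time
}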
How the wait is implemented will depend on the context of your application - if audio is being written from a dedicated thread, it could simply sleep(). Alternatively, if audio is being written from the main thread, you will probably want the write to be triggered by a QTimer.
Since you don't mention anything about using a loop around the write() calls in your app, it looks like what is happening is that you write a short segment of data (which plays as a pop), then don't write any more.
You can see code using both modes in the examples/multimedia/audiooutput app which is delivered with Qt.
Are you sure you use the right (high-level) API? It would be weird if you had to handle data streams and buffering manually. Also, QIODevice::write() doesn't necessarily write the whole buffer but might stop after n bytes, just like POSIX write() (that's why one always should check the return value).
I didn't look into QtMultimedia yet, but using the more mature Phonon, video and audio output worked just fine for me in the past. It works like this:
Create a Phonon::AudioOutput object
Create a Phonon::MediaObject object
Phonon::createPath( mediaObject, audioObject )
mediaObject->setCurrentSource( Phonon::MediaSource( path ) );
mediaObject->play();
There are also examples in Qt.

Resources