Simplest way to capture raw audio from audio input for real time processing on a mac - audio

What is the simplest way to capture audio from the built-in audio input and read the raw sampled values (as in a .wav) in real time, on request, as they come in, like reading from a socket?
Ideally the code would use one of Apple's frameworks (Audio Queues). The documentation is not very clear, and what I need is very basic.

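Try Audio Queue Services (part of the AudioToolbox framework) for this. There are three main steps:
1. Set up an audio format (an AudioStreamBasicDescription) that describes how to sample the incoming audio.
2. Create a new recording audio queue with AudioQueueNewInput().
3. Register a callback routine that handles the incoming audio buffers (it is passed as an argument to AudioQueueNewInput()).
In step 3 the callback receives each filled buffer: the raw samples are in inBuffer->mAudioData, and inBuffer->mAudioDataByteSize gives their size in bytes. When you are done with a buffer, hand it back to the queue with AudioQueueEnqueueBuffer() so it can be refilled. AudioQueueGetProperty() reads properties of the queue itself (for example its level meters), not the samples.
It's roughly like this: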
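static void HandleAudioCallback(void *aqData,
                                AudioQueueRef inAQ,
                                AudioQueueBufferRef inBuffer,
                                const AudioTimeStamp *inStartTime,
                                UInt32 inNumPackets,
                                const AudioStreamPacketDescription *inPacketDesc) {
    // Examine your audio data here: inBuffer->mAudioData holds
    // inBuffer->mAudioDataByteSize bytes of samples.

    // Return the buffer to the queue so it can be filled again.
    AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);
}

static void StartRecording() {
    // aqData is your own state struct; its mDataFormat member
    // (an AudioStreamBasicDescription) must already be filled in.
    AudioQueueNewInput(&aqData.mDataFormat,   // The sampling format to record in
                       HandleAudioCallback,   // Your callback routine
                       &aqData,               // User data handed to the callback
                       NULL,                  // NULL: callback runs on an internal audio queue thread
                       kCFRunLoopCommonModes,
                       0,                     // Reserved, must be 0
                       &aqData.mQueue);       // Receives the newly created AudioQueue
    // Allocate buffers with AudioQueueAllocateBuffer() and enqueue them with
    // AudioQueueEnqueueBuffer() here; without enqueued buffers the callback never fires.
    AudioQueueStart(aqData.mQueue, NULL);     // NULL: start as soon as possible
}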
I suggest Apple's Audio Queue Services Programming Guide for detailed information about how to start and stop the AudioQueue and how to set up all the required objects correctly.
You may also want to take a closer look at Apple's demo program SpeakHere, but IMHO it is a bit confusing to start with.

It depends how "real-time" you need it.
If you need it very crisp, go right down to the bottom level and use Audio Units. That means setting up an INPUT callback. Remember, when this fires you need to allocate your own buffers and then request the audio from the microphone.
I.e. don't get fooled by the presence of a buffer pointer in the parameters. It is only there because Apple uses the same function declaration for the input and render callbacks.
Here is a paste from one of my projects:
OSStatus dataArrivedFromMic(
    void                       *inRefCon,
    AudioUnitRenderActionFlags *ioActionFlags,
    const AudioTimeStamp       *inTimeStamp,
    UInt32                      inBusNumber,
    UInt32                      inNumberFrames,
    AudioBufferList            *dummy_notused)
{
    OSStatus status;
    RemoteIOAudioUnit *unitClass = (RemoteIOAudioUnit *)inRefCon;
    AudioComponentInstance myUnit = unitClass.myAudioUnit;

    // Allocate our own buffer list; the one passed in is not usable for input.
    AudioBufferList ioData;
    {
        int kNumChannels = 1; // one channel...
        enum {
            kMono = 1,
            kStereo = 2
        };
        ioData.mNumberBuffers = kNumChannels;
        for (int i = 0; i < kNumChannels; i++)
        {
            int bytesNeeded = inNumberFrames * sizeof(Float32);
            ioData.mBuffers[i].mNumberChannels = kMono;
            ioData.mBuffers[i].mDataByteSize = bytesNeeded;
            ioData.mBuffers[i].mData = malloc(bytesNeeded);
        }
    }

    // actually GET the data that arrived
    status = AudioUnitRender(myUnit,
                             ioActionFlags,
                             inTimeStamp,
                             inBusNumber,
                             inNumberFrames,
                             &ioData);

    // take MONO from mic
    const int channel = 0;
    Float32 *outBuffer = (Float32 *)ioData.mBuffers[channel].mData;

    if (status == noErr)
    {
        // get a handle to our game object
        static KPRing *kpRing = nil;
        if (!kpRing)
        {
            //AppDelegate * appDelegate = [UIApplication sharedApplication].delegate;
            kpRing = [Game singleton].kpRing;
            assert(kpRing);
        }
        // ... and send it the data we just got from the mic
        [kpRing floatsArrivedFromMic:outBuffer
                               count:inNumberFrames];
    }

    // Assumption: floatsArrivedFromMic:count: copies the samples, so the buffer
    // can be freed here. Without this, every callback leaks one buffer.
    // (A preallocated buffer would avoid malloc/free on the audio thread entirely.)
    free(outBuffer);
    return status;
}
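The callback above hands its samples to a ring object, which is what gives the rest of the program the "read when requested, like a socket" behaviour the question asks for. The answer does not show KPRing, so the following is only a minimal sketch of that idea in portable C++: SampleRing, write() and read() are hypothetical names, and it assumes exactly one writer thread (the audio callback) and one reader thread.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical single-producer/single-consumer ring buffer for float samples.
// The audio callback calls write(); the processing thread calls read().
class SampleRing {
public:
    explicit SampleRing(std::size_t capacity)
        : buf_(capacity + 1), head_(0), tail_(0) {}  // one slot always stays empty

    // Producer side: copies up to n samples in and returns how many fit.
    std::size_t write(const float* src, std::size_t n) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        std::size_t written = 0;
        while (written < n) {
            const std::size_t next = (head + 1) % buf_.size();
            if (next == tail) break;                 // full: the rest is dropped
            buf_[head] = src[written++];
            head = next;
        }
        head_.store(head, std::memory_order_release);
        return written;
    }

    // Consumer side: copies up to n samples out and returns how many were available.
    std::size_t read(float* dst, std::size_t n) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        std::size_t got = 0;
        while (got < n && tail != head) {
            dst[got++] = buf_[tail];
            tail = (tail + 1) % buf_.size();
        }
        tail_.store(tail, std::memory_order_release);
        return got;
    }

private:
    std::vector<float> buf_;
    std::atomic<std::size_t> head_;  // next slot to write (advanced by the producer)
    std::atomic<std::size_t> tail_;  // next slot to read (advanced by the consumer)
};
```

Neither side takes a lock or allocates, which is the usual reason to prefer this shape on an audio thread. Dropping samples when the ring is full is a design choice of this sketch, not something the original answer specifies.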

Related

JUCE - play audio input back

I am learning JUCE and I am writing a program that just reads the input from the audio card and plays it back. Obviously this is just for learning purposes. I am using the audio application template. This is the code inside the getNextAudioBlock() function:
void getNextAudioBlock (const AudioSourceChannelInfo& bufferToFill) override
{
    if(true) // this is going to be replaced by checking the value of a button
    {
        const int channel = 0;
        if(true) // this is going to be replaced too
        {
            const float* inBuffer = bufferToFill.buffer->getReadPointer(channel, bufferToFill.startSample);
            float* outBuffer = bufferToFill.buffer->getWritePointer(channel, bufferToFill.startSample);
            for(int sample = 0; sample < bufferToFill.numSamples; ++sample)
                outBuffer[sample] = inBuffer[sample];
        }
        else
        {
            bufferToFill.buffer->clear(0, bufferToFill.startSample, bufferToFill.numSamples);
        }
    }
    else
    {
        bufferToFill.buffer->clear(0, bufferToFill.startSample, bufferToFill.numSamples);
    }
}
The code is really simple: the content from the input buffer is copied directly to the output buffer. However, I am not hearing anything. What am I doing wrong?

How to monitor playback on ALSA via asoundlib?

I'm building an application that allows for ALSA configuration, and in the GUI there is a peak meter that lets the client see playback levels in real time. I'm having a hard time determining which device to connect to because I don't know whether ALSA has a default "loopback" and, if so, what it's called. I am also having trouble converting the read data into a sample and then finding that sample's amplitude. Here is what I have built so far:
Grab device and set hardware params
if (0 == snd_pcm_open(&pPcm, "default", SND_PCM_STREAM_CAPTURE, SND_PCM_NONBLOCK))
{
    if (0 == snd_pcm_set_params(pPcm, SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED, 1, 96000, 1, 1)) // This last argument confuses me because I'm not given a unit of measurement (second, millisecond, microsecond, etc.)
    {
        return snd_pcm_start(pPcm);
    }
}
pPcm = nullptr;
return -1;
Read from device and return the peak of the audio signal
int iRtn = -1;
if (nullptr == pPcm)
{
    if (-1 == SetupListener())
    {
        return iRtn;
    }
}
// Check to make sure the state is sane for reading.
if (SND_PCM_STATE_PREPARED == snd_pcm_state(pPcm) ||
    SND_PCM_STATE_RUNNING == snd_pcm_state(pPcm))
{
    snd_pcm_resume(pPcm); // This might be superfluous.
    // The state is sane, read from the stream.
    signed short iBuffer = 0;
    int iNumRead = snd_pcm_readi(pPcm, &iBuffer, 1);
    if (0 < iNumRead)
    {
        // This calculates an approximation.
        // We have some audio data, acquire its peak in dB (decibels).
        float nSample = static_cast<float>(iBuffer);
        float nAmplitude = nSample / MAX_AMPLITUDE_S16; // MAX_AMPLITUDE_S16 is defined as "32767"
        float nDecibels = (0 < nAmplitude) ? 20 * log10(nAmplitude) : 0;
        iRtn = static_cast<int>(nDecibels); // Cast to integer for GUI element.
    }
}
return iRtn;
The ALSA documentation seems very barren and so I apologize if I'm misusing the API.
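On the amplitude part of the question, one common approach is to take the largest absolute sample value over a whole block rather than a single sample, and express it in dB relative to full scale (dBFS). The following is only a sketch of that calculation, independent of ALSA; peakDbfs is a hypothetical helper name, and it assumes signed 16-bit samples as in the snippet above.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Peak level of a block of signed 16-bit samples, in dB relative to full scale.
// Returns -infinity for pure silence; a meter would clamp that to its floor value.
double peakDbfs(const std::int16_t* samples, std::size_t count) {
    int peak = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Widen to int before abs(): |-32768| does not fit in an int16.
        const int magnitude = std::abs(static_cast<int>(samples[i]));
        if (magnitude > peak) peak = magnitude;
    }
    if (peak == 0) return -std::numeric_limits<double>::infinity();
    return 20.0 * std::log10(peak / 32768.0);
}
```

Taking the absolute value first means negative samples count toward the peak too, so a half-scale sample of either sign reads about -6 dBFS and silence reads as the bottom of the meter instead of 0 dB.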

Save an RGB24 sample to bitmap

I'm using Windows Media Foundation to do some messing around with my webcam. I've been able to successfully retrieve a data sample from the webcam and identify that the format is RGB24. Now I'd like to save a single frame as a bitmap. A small snippet of the code I'm using to read a sample from the webcam is below.
IMFSample *pSample = NULL;
hr = pReader->ReadSample(
MF_SOURCE_READER_ANY_STREAM, // Stream index.
0, // Flags.
&streamIndex, // Receives the actual stream index.
&flags, // Receives status flags.
&llTimeStamp, // Receives the time stamp.
&pSample // Receives the sample or NULL.
);
So once I've got pSample populated with an IMFSample how can I save it as a bitmap?
Below is the code snippet I used to save a bitmap from an IMFSample. I've taken a lot of shortcuts, and I'm pretty sure I only get away with doing things this way because my webcam defaults to returning an RGB24 stream and a 640 x 480 pixel buffer, which means there's no stride padding to worry about in pData.
hr = pReader->ReadSample(
    MF_SOURCE_READER_ANY_STREAM, // Stream index.
    0,                           // Flags.
    &streamIndex,                // Receives the actual stream index.
    &flags,                      // Receives status flags.
    &llTimeStamp,                // Receives the time stamp.
    &pSample                     // Receives the sample or NULL.
);
wprintf(L"Stream %d (%I64d)\n", streamIndex, llTimeStamp);

HANDLE file;
BITMAPFILEHEADER fileHeader;
BITMAPINFOHEADER fileInfo;
DWORD write = 0;

file = CreateFile(L"sample.bmp", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL); // Creates the new .bmp file to write to

fileInfo.biSize = sizeof(BITMAPINFOHEADER);
fileInfo.biWidth = 640;
fileInfo.biHeight = 480;
fileInfo.biPlanes = 1;
fileInfo.biBitCount = 24;
fileInfo.biCompression = BI_RGB;
fileInfo.biSizeImage = 640 * 480 * (24/8);
fileInfo.biXPelsPerMeter = 2400;
fileInfo.biYPelsPerMeter = 2400;
fileInfo.biClrImportant = 0;
fileInfo.biClrUsed = 0;

fileHeader.bfType = 19778; // 0x4D42, the "BM" signature
fileHeader.bfReserved1 = 0; // Reserved fields must be 0
fileHeader.bfReserved2 = 0;
fileHeader.bfOffBits = sizeof(BITMAPFILEHEADER) + sizeof(BITMAPINFOHEADER); // Pixel data starts right after both headers
fileHeader.bfSize = fileHeader.bfOffBits + fileInfo.biSizeImage; // Total file size: both headers plus the pixel data

WriteFile(file, &fileHeader, sizeof(fileHeader), &write, NULL);
WriteFile(file, &fileInfo, sizeof(fileInfo), &write, NULL);

IMFMediaBuffer *mediaBuffer = NULL;
BYTE *pData = NULL;
hr = pSample->ConvertToContiguousBuffer(&mediaBuffer);
if (SUCCEEDED(hr))
{
    hr = mediaBuffer->Lock(&pData, NULL, NULL);
    if (SUCCEEDED(hr))
    {
        WriteFile(file, pData, fileInfo.biSizeImage, &write, NULL);
        mediaBuffer->Unlock();
    }
    mediaBuffer->Release(); // Release the buffer obtained from ConvertToContiguousBuffer
}
CloseHandle(file);
I've included a bit of a discussion here.
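The reason 640 x 480 RGB24 works without extra care is that BMP rows are padded to a multiple of 4 bytes, and 640 * 3 = 1920 already is one. For other widths the padded row size has to be used for both the pixel data and the file size. A small sketch of that arithmetic, with hypothetical helper names bmpStride and bmpFileSize, assuming an uncompressed bitmap with the 14-byte file header, the 40-byte BITMAPINFOHEADER and no palette:

```cpp
#include <cstdint>

// Bytes per scanline in a BMP: each row is padded up to a multiple of 4 bytes.
std::uint32_t bmpStride(std::uint32_t width, std::uint32_t bitsPerPixel) {
    return ((width * bitsPerPixel + 31) / 32) * 4;
}

// Total file size: 14-byte file header + 40-byte info header + padded pixel rows.
std::uint32_t bmpFileSize(std::uint32_t width, std::uint32_t height,
                          std::uint32_t bitsPerPixel) {
    const std::uint32_t kFileHeaderBytes = 14;
    const std::uint32_t kInfoHeaderBytes = 40;
    return kFileHeaderBytes + kInfoHeaderBytes + bmpStride(width, bitsPerPixel) * height;
}
```

For example, a 641-pixel-wide RGB24 row holds 1923 bytes of pixel data but occupies 1924 bytes in the file, so a tightly packed source buffer would need one padding byte written after each row.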

Hooking IDirect3DDevice9::EndScene method to capture a gameplay video: can not get rid of a text overlay in the recorded video

This is in fact a wild mix of technologies, but the answer to my question is (I think) closest to Direct3D 9. I am hooking into arbitrary D3D9 applications, in most cases games, and injecting my own code to modify the behavior of the EndScene function. The back buffer is copied into a surface which is set to point to a bitmap in a push source DirectShow filter. The filter samples the bitmaps at 25 fps and streams the video into an .avi file. A text overlay shown across the game's screen tells the user about a hot key combination that stops gameplay capture, but this overlay is not supposed to show up in the recorded video. Everything works fast and beautifully except for one annoying fact: on random occasions, a frame with the text overlay makes its way into the recorded video. This artefact is unwanted; the end user only wants to see the gameplay in the video and nothing else. I'd love to hear if anyone can share ideas about why this is happening. Here is the source code for the EndScene hook:
using System;
using SlimDX;
using SlimDX.Direct3D9;
using System.Diagnostics;
using DirectShowLib;
using System.Runtime.InteropServices;
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
[System.Security.SuppressUnmanagedCodeSecurity]
[Guid("EA2829B9-F644-4341-B3CF-82FF92FD7C20")]
public interface IScene
{
unsafe int PassMemoryPtr(void* ptr, bool noheaders);
int SetBITMAPINFO([MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)]byte[] ptr, bool noheaders);
}
public class Class1
{
object _lockRenderTarget = new object();
public string StatusMess { get; set; }
Surface _renderTarget;
//points to image bytes
unsafe void* bytesptr;
//used to store headers AND image bytes
byte[] bytes;
IFilterGraph2 ifg2;
ICaptureGraphBuilder2 icgb2;
IBaseFilter push;
IBaseFilter compressor;
IScene scene;
IBaseFilter mux;
IFileSinkFilter sink;
IMediaControl media;
bool NeedRunGraphInit = true;
bool NeedRunGraphClean = true;
DataStream s;
DataRectangle dr;
unsafe int EndSceneHook(IntPtr devicePtr)
{
int hr;
using (Device device = Device.FromPointer(devicePtr))
{
try
{
lock (_lockRenderTarget)
{
bool TimeToGrabFrame = false;
//....
//logic based on elapsed milliseconds deciding if it is time to grab another frame
if (TimeToGrabFrame)
{
//First ensure we have a Surface to render target data into
//called only once
if (_renderTarget == null)
{
//Create offscreen surface to use as copy of render target data
using (SwapChain sc = device.GetSwapChain(0))
{
//Att: created in system memory, not in video memory
_renderTarget = Surface.CreateOffscreenPlain(device, sc.PresentParameters.BackBufferWidth, sc.PresentParameters.BackBufferHeight, sc.PresentParameters.BackBufferFormat, Pool.SystemMemory);
} //end using
} // end if
using (Surface backBuffer = device.GetBackBuffer(0, 0))
{
//The following line is where main action takes place:
//Direct3D 9 back buffer gets copied to Surface _renderTarget,
//which has been connected by references to DirectShow's
//bitmap capture filter
//Inside the filter ( code not shown in this listing) the bitmap is periodically
//scanned to create a streaming video.
device.GetRenderTargetData(backBuffer, _renderTarget);
if (NeedRunGraphInit) //ran only once
{
ifg2 = (IFilterGraph2)new FilterGraph();
icgb2 = (ICaptureGraphBuilder2)new CaptureGraphBuilder2();
icgb2.SetFiltergraph(ifg2);
push = (IBaseFilter) new PushSourceFilter();
scene = (IScene)push;
//this way we get bitmapfile and bitmapinfo headers
//ToStream is slow, but run it only once to get the headers
s = Surface.ToStream(_renderTarget, ImageFileFormat.Bmp);
bytes = new byte[s.Length];
s.Read(bytes, 0, (int)s.Length);
hr = scene.SetBITMAPINFO(bytes, false);
//we just supplied the header to the PushSource
//filter. Let's pass reference to
//just image bytes from LockRectangle
dr = _renderTarget.LockRectangle(LockFlags.None);
s = dr.Data;
Result r = _renderTarget.UnlockRectangle();
bytesptr = s.DataPointer.ToPointer();
hr = scene.PassMemoryPtr(bytesptr, true);
//continue building graph
ifg2.AddFilter(push, "MyPushSource");
icgb2.SetOutputFileName(MediaSubType.Avi, @"C:\foo.avi", out mux, out sink); // verbatim string: in a regular literal "\f" is a form feed
icgb2.RenderStream(null, null, push, null, mux);
media = (IMediaControl)ifg2;
media.Run();
NeedRunGraphInit = false;
NeedRunGraphClean = true;
StatusMess = "now capturing, press shift-F11 to stop";
} //end if
} // end using backbuffer
} // end if Time to grab frame
} //end lock
} // end try
//It is usually thrown when the user makes game window inactive
//or it is thrown deliberately when time is up, or the user pressed F11 and
//it resulted in stopping a capture.
//If it is thrown for another reason, it is still a good
//idea to stop recording and free the graph
catch (Exception ex)
{
//..
//stop the DirectShow graph and cleanup
} // end catch
//draw overlay
using (SlimDX.Direct3D9.Font font = new SlimDX.Direct3D9.Font(device, new System.Drawing.Font("Times New Roman", 26.0f, System.Drawing.FontStyle.Bold)))
{
font.DrawString(null, StatusMess, 20, 100, System.Drawing.Color.FromArgb(255, 255, 255, 255));
}
return device.EndScene().Code;
} // end using device
} //end EndSceneHook
} //end class Class1
As it happens sometimes, I finally found an answer to this question myself, if anyone is interested. It turned out that backbuffer in some Direct3D9 apps is not necessarily refreshed each time the hooked EndScene is called. Hence, occasionally the backbuffer with the text overlay from the previous EndScene hook call was passed to the DirectShow source filter responsible for collecting input frames. I started stamping each frame with a tiny 3 pixel overlay with known RGB values and checking if this dummy overlay was still present before passing the frame to the DirectShow filter. If the overlay was there, the previously cached frame was passed instead of the current one. This approach effectively removed the text overlay from the video recorded in the DirectShow graph.
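The marker check described above can be sketched independently of Direct3D on a plain byte buffer. This is only an illustration of the idea, not the code from the actual hook: the marker values, the Frame alias and the helper names stampMarker, hasMarker and selectFrame are all hypothetical, and the marker is assumed to occupy the first three pixels of the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Frame = std::vector<std::uint8_t>;

// Hypothetical 3-pixel marker (3 bytes per pixel) with unlikely colour values.
static const std::uint8_t kMarker[9] = {1, 254, 2, 253, 3, 252, 4, 251, 5};
static const std::size_t kMarkerBytes = sizeof(kMarker);

// Stamps the marker into the frame; done after capture, together with the overlay.
void stampMarker(Frame& frame) {
    for (std::size_t i = 0; i < kMarkerBytes && i < frame.size(); ++i)
        frame[i] = kMarker[i];
}

// True if the frame still carries the marker, i.e. it was not redrawn since the last hook call.
bool hasMarker(const Frame& frame) {
    if (frame.size() < kMarkerBytes) return false;
    for (std::size_t i = 0; i < kMarkerBytes; ++i)
        if (frame[i] != kMarker[i]) return false;
    return true;
}

// Picks the frame to hand to the video filter: the fresh capture,
// or the cached previous one when the capture is stale.
const Frame& selectFrame(const Frame& captured, Frame& lastGood) {
    if (!hasMarker(captured)) lastGood = captured;  // fresh: update the cache
    return lastGood;
}
```

A stale frame is detected because it still contains the marker that was stamped alongside the overlay, so the overlay itself never reaches the recording.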

Visual C++ AVI writer function to push bitmaps (640x480) to AVI file?

I have a video capture card with SDK for Visual C++. Color frames (640 x 480) become available to me at 30 fps in a callback from the SDK. Currently, I am writing the entire image sequence out one at a time as individual bmp files in a separate thread -- that's 108,000 files in an hour, or about 100 GB per hour, which is not manageable. I would like to push these incoming frames to one AVI file instead, with optional compression. Where do I even start? Wading through the MSDN DirectShow documentation has confused me so far. Are there better examples out there? Is OpenCV the answer? I've looked at some examples, but I'm not sure OpenCV would even recognize the card as a capture device, nor do I understand how it even recognizes capture devices in the first place. Also, I'm already getting the frames in, I just need to put them out to AVI in some consumer thread that does not back up my producer thread. Thanks for any help.
I've used CAviFile before. It works pretty well, though I had to tweak it a bit to let the user pick the codec; I took that code from CAviGenerator. The interface for CAviFile is very simple. Here's some sample code:
CAviFile *Avi = new CAviFile(fileName.c_str(), 0, 10);
HRESULT res = Avi->AppendNewFrame(Width, Height, ImageBuffer, BitsPerPixel);
if (FAILED(res))
{
    std::cout << "Error recording AVI: " << Avi->GetLastErrorMessage() << std::endl;
}
delete Avi;
Obviously you have to ensure your ImageBuffer contains data in the right format etc. But once I got that kind of stuff all sorted out it worked great.
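The question also asks for a consumer thread that does not back up the producer. One way to get that, sketched here under assumptions that are not in the original answer, is a bounded queue between the SDK callback and the thread that appends frames to the AVI. FrameQueue, tryPush and tryPop are hypothetical names, and dropping a frame when the queue is full is just one possible policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

using Frame = std::vector<std::uint8_t>;

// Hypothetical bounded hand-off between the capture callback and the AVI writer thread.
class FrameQueue {
public:
    explicit FrameQueue(std::size_t maxFrames) : maxFrames_(maxFrames) {}

    // Called from the capture callback. Never waits on disk I/O; returns false
    // (frame dropped) when the writer has fallen too far behind.
    bool tryPush(Frame frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.size() >= maxFrames_) return false;
        frames_.push_back(std::move(frame));
        return true;
    }

    // Called from the writer thread. Returns false when nothing is queued.
    bool tryPop(Frame& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty()) return false;
        out = std::move(frames_.front());
        frames_.pop_front();
        return true;
    }

private:
    std::size_t maxFrames_;
    std::mutex mutex_;
    std::deque<Frame> frames_;
};
```

The writer thread would loop on tryPop() and pass each frame to whatever appends it to the file, so the callback only ever pays for a short lock and a move.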
You can use either Video for Windows or DirectShow. Each comes with its own set of codecs (and can be extended).
Though Microsoft considers VfW deprecated, it is still perfectly usable and is easier to set up than DirectShow.
Well, you need to attach an AVI Mux (CLSID_AviDest) to your capture card. You then need to attach a File Writer (CLSID_FileWriter), and it will write out everything for you.
Admittedly, setting up the capture graph is not necessarily easy, as DirectShow makes you jump through a million and one hoops.
It's much easier using the ICaptureGraphBuilder2 interface. Thankfully, Microsoft has given a really nice rundown of how to do this:
http://msdn.microsoft.com/en-us/library/dd318627.aspx
Adding an encoder is not easy, though, and is conveniently glossed over in that link.
Here is an example, written for an MFC app of mine, of how to enumerate all the video compressors in a system.
BOOL LiveInputDlg::EnumerateVideoCompression()
{
CComboBox* pVideoCompression = (CComboBox*)GetDlgItem( IDC_COMBO_VIDEOCOMPRESSION );
pVideoCompression->SetExtendedUI( TRUE );
pVideoCompression->SetCurSel( pVideoCompression->AddString( _T( "<None>" ) ) );
ICreateDevEnum* pDevEnum = NULL;
IEnumMoniker* pEnum = NULL;
HRESULT hr = S_OK;
hr = CoCreateInstance( CLSID_SystemDeviceEnum, NULL, CLSCTX_INPROC_SERVER, IID_ICreateDevEnum, (void**)&pDevEnum );
if ( FAILED( hr ) )
{
return FALSE;
}
hr = pDevEnum->CreateClassEnumerator( CLSID_VideoCompressorCategory, &pEnum, 0 );
pDevEnum->Release();
if ( FAILED( hr ) )
{
return FALSE;
}
if ( pEnum )
{
IMoniker* pMoniker = NULL;
hr = pEnum->Next( 1, &pMoniker, NULL );
while( hr == S_OK )
{
IPropertyBag* pPropertyBag = NULL;
hr = pMoniker->BindToStorage( NULL, NULL, IID_IPropertyBag, (void**)&pPropertyBag );
if ( FAILED( hr ) )
{
pMoniker->Release();
pEnum->Release();
return FALSE;
}
VARIANT varName;
VariantInit( &varName );
hr = pPropertyBag->Read( L"Description", &varName, NULL );
if ( FAILED( hr ) )
{
hr = pPropertyBag->Read( L"FriendlyName", &varName, NULL );
if ( FAILED( hr ) )
{
pPropertyBag->Release();
pMoniker->Release();
pEnum->Release();
return FALSE;
}
}
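IBaseFilter* pBaseFilter = NULL;
pMoniker->BindToObject( NULL, NULL, IID_IBaseFilter, (void**)&pBaseFilter );
if ( pBaseFilter ) pBaseFilter->Release(); // the filter is not used here; release it so it does not leak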
{
USES_CONVERSION;
TCHAR* pName = OLE2T( varName.bstrVal );
int index = pVideoCompression->AddString( pName );
pVideoCompression->SetItemDataPtr( index, pMoniker );
VariantClear( &varName );
pPropertyBag->Release();
}
hr = pEnum->Next( 1, &pMoniker, NULL );
}
pEnum->Release();
}
return TRUE;
}
Good Luck! :)

Resources