About DirectShow source filter - audio

I have created (C++, Win10, VS2022) a simple DirectShow source filter. It gets an audio stream from an external source (a file for testing, the network in future) and produces an audio stream on its output pin, which I connect to the speaker (audio renderer).
To do this I have implemented the FillBuffer method for the filter's output pin (CSourceStream). The media type is MEDIATYPE_Stream/MEDIASUBTYPE_PCM.
Before the pin is connected it receives the media type via SetMediaType (a WAVEFORMATEX) and remembers the audio parameters: wBitsPerSample, nSamplesPerSec, nChannels. The audio stream arrives from the external source (file or network) in FillBuffer with those same parameters. It works fine.
But I need to handle the situation where the external source starts sending the filter an audio stream with different parameters (for example, the old stream was 11025 Hz and the new one is 22050 Hz).
Could you help me: which actions and calls should I make in the FillBuffer() method if I receive an audio stream with a changed wBitsPerSample, nSamplesPerSec or nChannels?
The problem is that these parameters have already been agreed between my output pin and the input pin of the audio renderer, and I need to change that agreement correctly.

You need to improve the implementation and handle
Dynamic Format Changes
...
QueryAccept (Downstream) is used when an output pin proposes a format change to its downstream peer, but only if the new format does not require a larger buffer.
This might not be trivial because baseline DirectShow filters are not required to support dynamic changes. That is, the ability to change format depends on your actual pipeline and the implementation of the other filters.
You should also be able to find the SDK helper classes CDynamicSourceStream and CDynamicSource.
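A minimal sketch of the QueryAccept route described above, assuming the change is detected inside FillBuffer(); CMyAudioPin, DeliverFormatChange and wfxNew are hypothetical names, and CreateAudioMediaType is the helper from the base classes:

// Called from FillBuffer() when the incoming stream parameters change.
// This only works if the downstream pin accepts the new format with the
// buffer size negotiated earlier.
HRESULT CMyAudioPin::DeliverFormatChange(IMediaSample *pSample, const WAVEFORMATEX &wfxNew)
{
    CMediaType mt;
    HRESULT hr = CreateAudioMediaType(&wfxNew, &mt, TRUE); // MEDIATYPE_Audio/MEDIASUBTYPE_PCM + WAVEFORMATEX
    if (FAILED(hr))
        return hr;

    // Ask the connected input pin (the audio renderer) whether it accepts the new format.
    IPin *pReceivePin = GetConnected();
    if (pReceivePin == NULL || pReceivePin->QueryAccept(&mt) != S_OK)
        return VFW_E_TYPE_NOT_ACCEPTED; // refused; keep delivering the old format

    // Attach the new media type to the outgoing sample; downstream filters pick it
    // up via IMediaSample::GetMediaType() before processing the data.
    pSample->SetMediaType(&mt);
    SetMediaType(&mt); // update the pin's own copy of the format as well
    return S_OK;
}

If QueryAccept is refused you are back to the non-dynamic case: stop the graph and reconnect the pins with the new media type.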

Related

IMFTransform SetInputType()/SetOutputType() fails

I'm trying to playback MP3 (and similar audio files) using WASAPI shared mode and a media foundation IMFSourceReader on Windows 7. From what I understand I have to use an IMFTransform between the IMFSourceReader decoding and the WASAPI playback. Everything seems fine apart from when I call SetInputType()/SetOutputType() on the IMFTransform?
The relevant snippets of code are:
MFCreateSourceReaderFromURL(...); // Various test mp3 files
...
sourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, &reader.audioType);
//sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, &reader.audioType);
...
audioClient->GetMixFormat(&player.mixFormat);
...
MFCreateMediaType(&player.audioType);
MFInitMediaTypeFromWaveFormatEx(player.audioType, player.mixFormat, sizeof(WAVEFORMATEX) + player.mixFormat->cbSize);
...
hr = CoCreateInstance(CLSID_CResamplerMediaObject, NULL, CLSCTX_INPROC_SERVER, IID_IUnknown, (void**)&unknown);
ASSERT(SUCCEEDED(hr));
hr = unknown->QueryInterface(IID_PPV_ARGS(&resampler.transform));
ASSERT(SUCCEEDED(hr));
unknown->Release();
hr = resampler.transform->SetInputType(0, inType, 0);
ASSERT(hr != DMO_E_INVALIDSTREAMINDEX);
ASSERT(hr != DMO_E_TYPE_NOT_ACCEPTED);
ASSERT(SUCCEEDED(hr)); // Fails here with hr = 0xc00d36b4
hr = resampler.transform->SetOutputType(0, outType, 0);
ASSERT(hr != DMO_E_INVALIDSTREAMINDEX);
ASSERT(hr != DMO_E_TYPE_NOT_ACCEPTED);
ASSERT(SUCCEEDED(hr)); // Fails here with hr = 0xc00d6d60
I suspect I am misunderstanding how to negotiate the input/output IMFMediaType's between things, and also how to take into consideration that IMFTransform needs to operate on uncompressed data?
It seems odd to me the output type fails but maybe that is a knock on effect of the input type failing first - and if I try to set the output type first it fails also.
In recent versions of Windows you would probably prefer to take advantage of stock functionality which is already there for you.
When you configure the Source Reader object, IMFSourceReader::SetCurrentMediaType lets you specify the media type you want your data in. If you set a media type compatible with the WASAPI requirements, the Source Reader will automatically add a transform to convert the data for you.
However...
Audio resampling support was added to the source reader with Windows 8. In versions of Windows prior to Windows 8, the source reader does not support audio resampling. If you need to resample the audio in versions of Windows earlier than Windows 8, you can use the Audio Resampler DSP.
... which means that indeed you might need to manage the MFT yourself. The input media type for the MFT is supposed to come from IMFSourceReader::GetCurrentMediaType. To instruct the source reader to deliver uncompressed audio you need to build a media type that a decoder for this type of stream would decode the audio to. For example, if your file is MP3 then you would read the number of channels and the sampling rate and build a compatible PCM media type (or take the system decoder and ask it separately for its output media type, which is an even cleaner way). You would set this uncompressed audio media type using IMFSourceReader::SetCurrentMediaType. This media type would also be your input media type for the audio resampler MFT. This would instruct the source reader to add the necessary decoders, and IMFSourceReader::ReadSample would give you converted data.
The output media type for the resampler MFT would be derived from the audio format you obtained from WASAPI and converted using the API calls you mentioned at the top of your code snippet.
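A minimal sketch of that wiring, reusing the sourceReader, resampler and player.mixFormat objects from the snippet above (error handling omitted):

IMFMediaType *readerPcmType = NULL;
IMFMediaType *outType = NULL;

// After SetCurrentMediaType with an uncompressed PCM type, this returns the
// complete media type the reader will actually deliver from ReadSample().
hr = sourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, &readerPcmType);

// Input side of the resampler: exactly what the reader hands us.
hr = resampler.transform->SetInputType(0, readerPcmType, 0);

// Output side of the resampler: the device mix format from GetMixFormat().
hr = MFCreateMediaType(&outType);
hr = MFInitMediaTypeFromWaveFormatEx(outType, player.mixFormat, sizeof(WAVEFORMATEX) + player.mixFormat->cbSize);
hr = resampler.transform->SetOutputType(0, outType, 0);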
To look the error codes up you can use this:
https://www.magnumdb.com/search?q=0xc00d36b4
https://www.magnumdb.com/search?q=0xc00d6d60
Also, generally, you should be able to play audio files using the Media Foundation Media Session API with less effort. The Media Session uses the same primitives to build a playback pipeline and takes care of format fitting.
Ah so are you saying I need to create an additional object that is the decoder to fit between the IMFSourceReader and IMFTransform/Resampler?
No. By calling SetCurrentMediaType with the proper media type you have the Source Reader add a decoder internally, so that it can give you already decompressed data. Starting with Windows 8 it is also capable of converting between PCM flavors, but on Windows 7 you need to take care of this yourself with the Audio Resampler DSP.
You can manage a decoder yourself, but you don't need to, since the Source Reader's decoder would do the same more reliably.
You might need a separate decoder instance just to help you guess what PCM media type the decoder would produce, so that you can request it from the Source Reader. MFTEnumEx is the proper API to look the decoder up.
I am not sure how to decide on or create a suitable decoder object? Do I need to enumerate a list of suitable ones somehow rather than assume specific ones?
The mentioned MFTEnum and MFTEnumEx API calls can enumerate decoders, either all available ones or those filtered by given criteria.
Another way is to use a partial media type (see the relevant explanation and code snippet here: Tutorial: Decoding Audio). A partial media type is a signal about the desired format, requesting that the Media Foundation API supply a primitive that matches this partial type. See the comments below for related discussion links.
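For illustration, a partial PCM media type request to the Source Reader might look like this (a sketch only; hr checks omitted):

IMFMediaType *partialType = NULL;

// Major type and subtype only; the reader fills in the remaining attributes
// and inserts the MP3 (or other) decoder on its own.
hr = MFCreateMediaType(&partialType);
hr = partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
hr = partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM);
hr = sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, NULL, partialType);
partialType->Release();

// GetCurrentMediaType() now returns the complete PCM type the decoder produces.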

External source for sample rate of Redhawk system

We are using Redhawk for an FM modulator. It reads an audio modulating signal from a file, performs the modulation, then sends the modulated data from Redhawk to an external program via TCP/IP for DAC and up-conversion to RF.
The data flows through the following components: rh.FileReader, rh.DataConverter, rh.fastfilter, an FM modulator, rh.DataConverter, and rh.sinksocket. The FM modulator is a custom component.
The rh.sinksocket sends data to an external server program that sends the samples from Redhawk to an FPGA and DAC.
At present the sample rate appears to be controlled via the rh.FileReader component. However, we would like the external DAC to set the sample rate of the system, not the rh.FileReader component of Redhawk, for example via TCP/IP flow control.
Is it possible to use an external DAC as the clock source for a Redhawk waveform?
The property on FileReader dictating the sample rate simply tells it what the sample rate of the provided file is. It is used for the Signal Related Information (SRI) passed to downstream components, and for the output rate if you do not block or throttle. E.g., FileReader does not do any resampling of the given file to meet the sample rate given.
If you want to resample to a given rate you can try the ArbitraryRateResampler component.
Regarding setting these properties via some external mechanism (TCP/IP), you would want to write a specific component or REDHAWK service that listens for this external event and then makes a configure call to set the property you'd like changed.
If these events are global and can apply to many applications on your domain, then a service is the right pattern; if these events are specific to a single application, then a component might make more sense.

J2ME - Fm recording app - can't buffer and write to a file

Hey everyone, I was developing a J2ME app that records FM radio. I have tried so many methods, but I have failed. The major problem I faced is that with the Media API for J2ME, once the code for tuning into a specific FM channel is written (and it works, but only outputs directly to the speaker), I couldn't find a way to buffer the output and write it to a file. Thanks in advance.
I think it is not possible with MMAPI directly. I assume the FM radio streams via RTSP, and you can specify that as the data source for MMAPI, but if you want to store the audio data, you need to fetch it into your own buffer instead and then pass it to the MMAPI Player via an InputStream.
That way you will need to code your own handling for RTSP (or whatever your FM radio uses), and convert the data into a format acceptable to the MMAPI Player via InputStream, for example audio/x-wav or audio/amr. If the format's header doesn't need to specify the length of the data, then you can probably 'stream' it via your buffer while receiving data from the RTSP source.
This is fairly low-level coding; I think it will be hard to implement in J2ME.

DirectShow, specifically Rate Matching, time stamps and the DirectSound Audio Renderer

Can anyone give me a concise explanation of how and why the DirectShow DirectSound Audio Renderer will adjust the rate when I have a custom capture filter that does not expose a clock?
I cannot make any sense of it at all. When audio starts, I assign a rtStart of zero plus the duration of the sample (numbytes / m_wfx.nAvgBytesPerSec). Then the next sample has a start time of the end of the previous sample, and so on....
Some time later, the capture filter senses DirectShow is consuming samples too rapidly, and tries to set a timestamp of some time in the future, which the audio renderer completely ignores. I can, as a test, suddenly tell a sample it must not be rendered until 20 secs in the future (StreamTime() + UNITS), and again the renderer just ignores it. However, the Null Audio Renderer does what it is told, and the whole graph freezes for 20 seconds, which is the expected behaviour.
In a nutshell, then, I want the audio renderer to use either my capture clock (or its own, or the graph's, I don't care), but I do need it to obey the time stamps I'm sending to it. What I need it to do is squish or stretch samples, ever so subtly, to make up for the difference in the rates between DSound and the oncoming stream (whose rate I cannot control).
MSDN explains the technology here: Live Sources; I suppose you are aware of this documentation topic.
Rate matching takes place when your source is live; otherwise the audio renderer does not need to bother, and it expects the source to keep the input queue pre-loaded with data, so that data is consumed at the rate it is needed.
It seems that your filter is capturing in real time (it is a capture filter, and you mention you don't control the rate of the data you obtain externally). So you need to make sure your capture filter is recognized as a live source, and then you choose the clock for playback and, overall, the mode of operation. I suppose you want the behavior described here under AM_PUSHSOURCECAPS_PRIVATE_CLOCK:
the source filter is using a private clock to generate time stamps. In this case, the audio renderer matches rates against the time stamps.
This is what you write about above:
you time stamp according to external source
playback is using audio device clock
audio renderer does rate matching to match the rates
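A minimal sketch of how the capture pin could advertise exactly that, assuming a CSourceStream-derived pin (CMyCapturePin is a hypothetical name; the constructor, GetMediaType and FillBuffer are the ones you already have):

class CMyCapturePin : public CSourceStream, public IAMPushSource
{
public:
    DECLARE_IUNKNOWN
    STDMETHODIMP NonDelegatingQueryInterface(REFIID riid, void **ppv)
    {
        if (riid == IID_IAMPushSource)
            return GetInterface(static_cast<IAMPushSource*>(this), ppv);
        return CSourceStream::NonDelegatingQueryInterface(riid, ppv);
    }

    // IAMLatency
    STDMETHODIMP GetLatency(REFERENCE_TIME *prtLatency) { *prtLatency = 0; return S_OK; }

    // IAMPushSource: time stamps are generated with our own (private) clock,
    // so the audio renderer rate-matches against them.
    STDMETHODIMP GetPushSourceFlags(ULONG *pFlags) { *pFlags = AM_PUSHSOURCECAPS_PRIVATE_CLOCK; return S_OK; }
    STDMETHODIMP SetPushSourceFlags(ULONG) { return E_NOTIMPL; }
    STDMETHODIMP SetStreamOffset(REFERENCE_TIME) { return E_NOTIMPL; }
    STDMETHODIMP GetStreamOffset(REFERENCE_TIME *prt) { *prt = 0; return S_OK; }
    STDMETHODIMP GetMaxStreamOffset(REFERENCE_TIME *prt) { *prt = 0; return S_OK; }
    STDMETHODIMP SetMaxStreamOffset(REFERENCE_TIME) { return E_NOTIMPL; }
    // ... FillBuffer etc. as in the existing filter
};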
To see exactly how rate matching takes place, open the audio renderer's property pages, on the Advanced page:
The data under Slaving Info will show the rate matching details (48000/48300 matching in my example). The data is also available programmatically via IAMAudioRendererStats::GetStatParam.
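For example, a sketch of the programmatic route (pAudioRenderer is assumed to be the audio renderer's IBaseFilter):

IAMAudioRendererStats *pStats = NULL;
if (SUCCEEDED(pAudioRenderer->QueryInterface(IID_PPV_ARGS(&pStats))))
{
    DWORD dwParam1 = 0, dwParam2 = 0;
    // Slaving rate information, the same numbers the Advanced property page shows.
    pStats->GetStatParam(AM_AUDREND_STAT_PARAM_SLAVE_RATE, &dwParam1, &dwParam2);
    pStats->Release();
}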

Adding audio effects (reverb etc..) to a BackgroundAudioPlayer driven streaming audio app

I have a windows phone 8 app which plays audio streams from a remote location or local files using the BackgroundAudioPlayer. I now want to be able to add audio effects, for example, reverb or echo, etc...
Please could you advise me on how to do this? I haven't been able to find a way of hooking extra audio processing code into the audio processing pipeline, even though I've read much about WASAPI and XAudio2 and looked at many code examples.
Note that the app is written in C# but, from my previous experience with writing audio processing code, I know that I should be writing the audio code in native C++. Roughly speaking, I need to find a point at which there is an audio buffer containing raw PCM data which I can use as an input for my audio processing code which will then write either back to the same buffer or to another buffer which is read by the next stage of audio processing. There need to be ways of synchronizing what happens in my code with the rest of the phone's audio processing mechanisms and, of course, the process needs to be very fast so as not to cause audio glitches. Or something like that; I'm used to how VST works, not how such things might work in the Windows Phone world.
Looking forward to seeing what you suggest...
Kind regards,
Matt Daley
I need to find a point at which there is an audio buffer containing raw PCM data
AFAIK there's no such point. This MSDN page hints that audio/video decoding is performed not by the OS, but by the Qualcomm chip itself.
You can use something like Mp3Sharp for decoding. This way the MP3 will be decoded on the CPU by your managed code; you can intercept or process it however you like, then feed the PCM into the media stream source. The main downside is battery life: the hardware-provided codecs should be much more power-efficient.

Resources