Does anybody know how DirectShow filters exchange media samples?
We know the source filter grabs a sample (from a mic or a live source) and passes it to the next filter in the graph.
Specifically, I want to know how filters pass samples on to another filter.
Is there a known pattern for this?
If I decide to implement audio processing filters without any third-party component,
is it possible to implement media sample exchange using a multi-producer/consumer queue?
Say source filter F1 is a media capture filter and filter F2 is a DSP filter.
F1 writes to the multi-producer/consumer queue and F2 consumes the captured samples enqueued by F1 whenever a sample is available. (I'm thinking about a multi-producer/consumer queue because in some cases one filter can have more than one output and more than one input, and each filter has its own thread.)
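Roughly, this is the kind of queue I have in mind between F1 and F2 (just an untested sketch; MediaBuffer and SampleQueue are names I made up for illustration):

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <vector>

    // A sample as it travels between filters: raw PCM bytes plus a timestamp.
    struct MediaBuffer {
        std::vector<unsigned char> data;
        long long timestamp;
    };

    // Blocking queue: F1's capture thread pushes, F2's DSP thread pops.
    class SampleQueue {
    public:
        void Push(MediaBuffer buf) {
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                m_queue.push(std::move(buf));
            }
            m_cond.notify_one();   // wake one waiting consumer
        }

        MediaBuffer Pop() {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_cond.wait(lock, [this] { return !m_queue.empty(); });
            MediaBuffer buf = std::move(m_queue.front());
            m_queue.pop();
            return buf;
        }

    private:
        std::mutex m_mutex;
        std::condition_variable m_cond;
        std::queue<MediaBuffer> m_queue;
    };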
Is there any better way?
*EDIT: Our solution would have to look like the Publisher/Subscriber pattern, but I think that is not suitable for media processing.
Thank you so much.
MSDN provides a pretty detailed description here, in Overview of Data Flow in DirectShow. You want the whole article; this is the excerpt specifically about the exchange of data between filters:
[...] Whenever a filter needs to fill a buffer with data, it requests a sample from the allocator by calling IMemAllocator::GetBuffer. If the allocator has any samples that are not currently in use by another filter, the GetBuffer method returns immediately with a pointer to the sample. If all of the allocator's samples are in use, the method blocks until a sample becomes available. When the method does return a sample, the filter puts data into the buffer, sets the appropriate flags on the sample (typically including a time stamp), and delivers the sample downstream. [...]
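To make that concrete, here is a rough, untested sketch of that "get a buffer, fill it, deliver it downstream" sequence using the DirectShow interfaces the article mentions (the function name and parameters are my own, not from the article):

    #include <dshow.h>    // IMemAllocator, IMediaSample, IMemInputPin
    #include <cstring>    // memcpy

    HRESULT DeliverOneSample(IMemAllocator *pAllocator,
                             IMemInputPin  *pDownstreamInput,
                             const BYTE    *pData,
                             long           cbData,
                             REFERENCE_TIME tStart,
                             REFERENCE_TIME tStop)
    {
        IMediaSample *pSample = NULL;

        // Blocks until a free buffer from the shared allocator is available.
        HRESULT hr = pAllocator->GetBuffer(&pSample, NULL, NULL, 0);
        if (FAILED(hr))
            return hr;

        BYTE *pBuffer = NULL;
        hr = pSample->GetPointer(&pBuffer);
        if (SUCCEEDED(hr))
        {
            memcpy(pBuffer, pData, cbData);
            pSample->SetActualDataLength(cbData);
            pSample->SetTime(&tStart, &tStop);

            // Hand the sample to the downstream filter's input pin.
            hr = pDownstreamInput->Receive(pSample);
        }

        pSample->Release();   // the downstream filter AddRefs it if it keeps it
        return hr;
    }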
This is a bit of a newb question but I struggle with how to decide what a stream should be called and I can't seem to find documentation on how to go about it.
For instance, a lot of the examples in the documentation just have "Customer$1", "Customer$2", etc. I'm presuming Customer is the resource here and the integer is the customer ID?
For my use case, I have UnitAllocated and UnitDeallocated events happening for multiple different Operators. Ideally, I want to have a stream of unit allocations and unit deallocations for a given operator.
Should I embed the operator ID in the stream name as an easy way to tap into multiple streams for the same operator? Or is this the wrong way to go about it?
Does anyone know how to do trick modes (rewind/forward at different speeds) with MPEG-DASH?
DASH-IF Interoperability Points V3.0 states that it is possible.
The general idea is laid out in the document, but the details are not specified.
A DASH segmenter should add tracks with a frame rate lower than normal to a specially marked AdaptationSet. Roughly you could say (even though in theory you should look at the exact profile/level thresholds) that half the frame rate allows double the playout rate, and a quarter of the frame rate allows quadruple the playout rate.
All of this is only an offer to the DASH client to facilitate fast-forward. The client can use it but doesn't have to. If the DASH client doesn't understand the AdaptationSet at all, it will disregard it because of the EssentialProperty that marks it as a trick-play AdaptationSet.
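For illustration only, such a trick-mode AdaptationSet in the MPD might look roughly like this (the ids, rates, and BaseURL are made-up values; the scheme URI is the trick-mode descriptor DASH-IF describes, with @value referring to the @id of the normal-rate AdaptationSet, but check the IOP document for the exact signaling you need):

    <AdaptationSet id="2" contentType="video" frameRate="12.5">
      <!-- Marks this set as trick play; clients that don't understand it skip it. -->
      <EssentialProperty schemeIdUri="http://dashif.org/guidelines/trickmode" value="1"/>
      <Representation id="video-trick" bandwidth="500000"
                      maxPlayoutRate="2" codingDependency="false">
        <BaseURL>video_trick/</BaseURL>
      </Representation>
    </AdaptationSet>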
I can't see how fast rewind can be supported in any spec-conforming way. You'd need to implement it according to your needs, but with no expectation of interoperability.
You can find an indication in ISO/IEC 23009-1:2014(E):
Annex A
The client may pause or stop a Media Presentation. In this case client simply stops requesting Media Segments or parts thereof. To resume, the client sends requests to Media Segments, starting with the next Subsegment after the last requested Subsegment.
If a specific Representation or SubRepresentation element includes the @maxPlayoutRate attribute, then the corresponding Representation or Sub-Representation may be used for the fast-forward trick mode. The client may play the Representation or Sub-Representation with any speed up to the regular speed times the specified @maxPlayoutRate attribute, with the same decoder profile and level requirements as the normal playout rate. If a specific Representation or SubRepresentation element includes the @codingDependency attribute with value set to 'false', then the corresponding Representation or Sub-Representation may be used for both fast-forward and fast-rewind trick modes.
Sub-Representations in combination with Index Segments and Subsegment Index boxes may be used for efficient trick mode implementation. Given a Sub-Representation with the desired @maxPlayoutRate, ranges corresponding to SubRepresentation@level and all level values from SubRepresentation@dependencyLevel may be extracted via byte ranges constructed from the information in the Subsegment Index box. These ranges can be used to construct more compact HTTP GET requests.
I have an app that extracts information from incoming messages. The messages all contain the same information, but they have different forms depending on the source that sent them.
Example:
Message from source A:
A: You spent $50.00 at Macy's on 2/20/12
Message from source B:
Purchase, $50.00, Macy's, 2Feb2012, Balance $5000.00
Every message from a single source has the same form, though. So at the moment I'm doing it by writing a set of regular expressions to first identify which message I'm trying to decode (i.e. what source it came from, so I know what the form of the message is), and then extracting the necessary information from the message (in the above example, I want to know the transaction amount, the store where the transaction happened, and the date).

If I discover a new source for a message, or a source changes the format of its message (which doesn't happen very often, but could happen), I need to manually write the regular expressions for that message. I'm sure, however, that I could automate this using some kind of machine learning technique. I just don't know much about machine learning, and I don't know where to even start looking for a technique that would apply to my problem. I would like someone to just point me in the right direction on where to start reading.
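For illustration, a stripped-down version of the kind of per-source extraction I'm doing now (C++ here, but the language doesn't matter; this is not my actual code, just the idea):

    #include <iostream>
    #include <regex>
    #include <string>

    int main() {
        const std::string msg = "A: You spent $50.00 at Macy's on 2/20/12";

        // One hand-written pattern per known source; this one is for source A.
        const std::regex sourceA(R"(A: You spent \$([\d.]+) at (.+) on ([\d/]+))");

        std::smatch m;
        if (std::regex_match(msg, m, sourceA)) {
            std::cout << "amount: " << m[1] << "\n"
                      << "store:  " << m[2] << "\n"
                      << "date:   " << m[3] << "\n";
        }
        return 0;
    }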
In order to detect and label amounts, dates, person names, and similar information, you can use a technique called Named Entity Recognition. The Stanford Named Entity Recognizer comes with pretrained, ready-to-use models.
You can also use whatever labeled data you have generated so far to learn a custom model for your application. The standard techniques used for this purpose are Conditional Random Fields or the Sequence Perceptron. There are many toolkits implementing these models, including:
Wapiti - A simple and fast discriminative sequence labelling toolkit.
Sequor - sequence labeler based on Collins's (2002) perceptron.
I tested this and it seems that the order of event handling is the same as the order of the list in the source event. I don't think I can rely on this, as the documentation only states:
Emit simultaneous event occurrences. Up to strictness, we have spill . collect = id
How can I create a function similar to spill with a specification like:
Emit sequential event occurrences with the guarantee that no other events will fire between the first and last
Or should I try a different approach? I am trying to implement macro functionality in Reactive-Banana.
(I'm the author of reactive-banana.)
It seems that the order of event handling is the same as the order of the list in the source event.
This is correct, you can rely on that. In fact, it more or less follows from the equation spill . collect = id. After all, to yield the identity mapping, spill must preserve the order of the events as collect has put them in the list.
Furthermore, you can inspect the source code of the modules Reactive.Banana.Model (Reactive.Banana.Internal.Model in version 0.5) and Reactive.Banana.Combinators. Taken together, they give an authoritative model implementation. You can directly check how spill behaves. (Though it may be a little confusing since the model is built in two parts.)
Nonetheless, I shall add a few words to the documentation.
I'm new to audio filters, so please excuse me if I'm saying something wrong.
I'd like to write code that can split audio stored as PCM samples into two or three frequency bands, do some manipulation (like modifying their levels) or analysis on them, and then reconstruct the audio samples from the output.
As far as I have read on the internet, for this task I could use FFT/IFFT and do the manipulation on the complex form, or use a time-domain filter bank like the one used by the MP2 audio encoding format. Maybe a filter bank is the better choice; at least I read somewhere that it can be more CPU-friendly in real-time streaming environments. However, I'm having a hard time understanding the mathematical side of a filter bank. I'm trying to find some source code (preferably in Java or C/C++) on this topic, so far with no success.
Can somebody provide tips or links that would get me closer to an example filter bank?
Using an FFT to split an audio signal into a few bands is overkill.
What you need is one or two Linkwitz-Riley filters. These filters split a signal into a high- and a low-frequency part.
A nice property of this filter is that if you add the low- and high-frequency parts together you get almost the original signal back. There will be a little bit of phase shift, but the ear will not be able to hear it.
If you need more than two bands you can chain the filters. For example, if you want to separate the signal at 100 Hz and 2000 Hz, it would look in pseudo-code somewhat like this:
low = linkwitz-riley-low (100, input-samples)
temp = linkwitz-riley-high (100, input-samples)
mids = linkwitz-riley-low (2000, temp)
highs = linkwitz-riley-high (2000, temp)
and so on..
After splitting the signal you can, for example, amplify the three output bands (low, mids, and highs) and later add them together to get your processed signal.
The filter sections themselves can be implemented using IIR filters. A Google search for "Linkwitz-Riley digital IIR" should give lots of good hits.
http://en.wikipedia.org/wiki/Linkwitz-Riley_filter
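As a rough, untested sketch of one such section (not a drop-in implementation): a 4th-order Linkwitz-Riley low-pass is just two cascaded 2nd-order Butterworth biquads. The coefficients below follow the widely used RBJ "Audio EQ Cookbook" formulas; the matching high-pass section differs only in its b coefficients.

    #include <cmath>
    #include <vector>

    struct Biquad {
        double b0, b1, b2, a1, a2;             // normalized so a0 == 1
        double x1 = 0, x2 = 0, y1 = 0, y2 = 0; // filter state

        // 2nd-order Butterworth low-pass (RBJ cookbook coefficients).
        static Biquad lowpass(double cutoffHz, double sampleRate) {
            const double pi    = 3.14159265358979323846;
            const double w0    = 2.0 * pi * cutoffHz / sampleRate;
            const double q     = 1.0 / std::sqrt(2.0);   // Butterworth Q
            const double alpha = std::sin(w0) / (2.0 * q);
            const double cosw0 = std::cos(w0);
            const double a0    = 1.0 + alpha;
            Biquad f;
            f.b0 = ((1.0 - cosw0) / 2.0) / a0;
            f.b1 =  (1.0 - cosw0)        / a0;
            f.b2 = ((1.0 - cosw0) / 2.0) / a0;
            f.a1 = (-2.0 * cosw0)        / a0;
            f.a2 = (1.0 - alpha)         / a0;
            return f;
        }

        double process(double x) {
            const double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
            x2 = x1; x1 = x;
            y2 = y1; y1 = y;
            return y;
        }
    };

    // Two cascaded Butterworth sections give the 4th-order Linkwitz-Riley response.
    std::vector<double> lr4Lowpass(const std::vector<double> &in,
                                   double cutoffHz, double sampleRate) {
        Biquad s1 = Biquad::lowpass(cutoffHz, sampleRate);
        Biquad s2 = Biquad::lowpass(cutoffHz, sampleRate);
        std::vector<double> out;
        out.reserve(in.size());
        for (double x : in)
            out.push_back(s2.process(s1.process(x)));
        return out;
    }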
You should look up wavelets, especially Daubechies wavelets. They will do the trick: they're FIR filters, and they're really short.
Update
Downvoting with no explanation isn't cool. Additionally, I'm right. Wavelets are filter banks and their job is to do precisely what is described in the question. IMHO, that is. I've done it many times myself.
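To illustrate what I mean (my own sketch, untested, not tuned for production): one analysis pass of the Daubechies-4 filter bank splits a block of samples into a low band (approximation) and a high band (detail), each at half the rate.

    #include <cstddef>
    #include <cmath>
    #include <vector>

    // One Daubechies-4 analysis step: x -> low (approximation) + high (detail).
    void daub4AnalysisStep(const std::vector<double> &x,
                           std::vector<double> &low,
                           std::vector<double> &high) {
        const double s3 = std::sqrt(3.0);
        const double n  = 4.0 * std::sqrt(2.0);
        // D4 scaling (low-pass) coefficients.
        const double h[4] = { (1 + s3) / n, (3 + s3) / n,
                              (3 - s3) / n, (1 - s3) / n };
        // Matching wavelet (high-pass) coefficients: g[k] = (-1)^k * h[3-k].
        const double g[4] = { h[3], -h[2], h[1], -h[0] };

        const std::size_t N = x.size();        // assumed even
        low.assign(N / 2, 0.0);
        high.assign(N / 2, 0.0);
        for (std::size_t i = 0; i < N / 2; ++i) {
            for (std::size_t k = 0; k < 4; ++k) {
                const double sample = x[(2 * i + k) % N];   // circular boundary
                low[i]  += h[k] * sample;
                high[i] += g[k] * sample;
            }
        }
    }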
There's a lot of filter source code to be found here