Extracting fixed number of IMFs using EMD - audio

I am doing a project on audio watermarking via Empirical Mode Decomposition (EMD). At the sender end, I have to decompose the signal into IMFs, embed the watermark into the last IMF, and reconstruct the signal. At the receiver end, I have to decompose the signal into IMFs using EMD and extract the watermark. For this to succeed, I have to get the same number of IMFs at the sender and receiver ends when decomposing each frame. I accomplished this by setting the maximum number of iterations to the number of IMFs obtained at the sender end, but I didn't find the extraction results satisfactory. Is there any other way to do this?

EMD is very sensitive to the "noise model". Moreover, it does not have an inverse transform (hence "empirical"). So at the very least you have to build a lot of redundancy into your signal.

By default, EMD keeps decomposing the signal until it reaches the trend (the trend is different from the residue), so the train and test data may have different properties (volatility, etc.) and therefore generate different numbers of IMFs.
As a suggestion, try max_imf. This is an optional argument in PyEMD, and it may restrict the decomposition to the same number of IMFs for the train and test data.

Related

Time series anomaly detection

I would like to get suggestions about a time series problem. The data comes from strain gauges on the wing of an aircraft, measured using different sensors. Basically, we are creating the anomalies by simulating the physics model. We have a baseline which is working fine, and we then created some anomalies by changing some of the factors and recording over time. Our aim is to create a model which can detect an anomaly during live testing (it could be a crack in the wing): basically, real-time anomaly detection using statistical methods or machine learning.
A few thoughts - sorted roughly from top-to-bottom based on time investment (assuming little/no prior ML knowledge):
start simple and validate: for what you've described this could be as simple as
create a training / validation dataset using your simulator - since you can simulate, do so for significant episodes of both "standard" and extreme forces applied to the wing
choose a real time smoother: e.g., exponential averaging or moving average, determine a proper parameter for each of your input sensor signals. smooth the input signals.
determine threshold values:
- create rough but sensible lower bound threshold values "by eye"
- use simple statistics to determine a decent threshold value (e.g., using a moving fixed length window of appropriate size, and setting the threshold at a multiple of the standard deviation in that window slid across the entire signal)
in either case, test on further simulated (and, ideally, also real) data
If an effort like this works "good enough" - stop and move on to next (facet of) problem. If not
follow the first two steps (simulate and smooth data)
take an "autoregressive" approach: create training / validation input/output pairs by running a sliding window of fixed length over the input signal(s). train a simple supervised learner on these pairs, for each input signal or all together, to produce a (set of) time series anomaly detectors trained on your simulated data. cross-validate with the validation portion of your data.
use this model (or one like it) on your validation data to test performance - and ideally collect real data (not simulated) to validate your model even further.
If this sort of approach produces "good enough" results - stop, and move onto the next facet of the problem.
If not - examine and try any number of anomaly detection approaches, coded in a variety of languages, listed on an aggregator like the awesome repo for time series anomaly detection
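The simplest version of the steps above (smooth, then threshold at a multiple of a sliding-window standard deviation) might look roughly like this. It is only a minimal sketch: the function names, the toy signal, and the values of alpha, window, and k are assumptions chosen for illustration, not tuned recommendations.

```python
import math
from collections import deque

def exponential_smooth(signal, alpha=0.3):
    """Exponential averaging: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = []
    state = None
    for x in signal:
        state = x if state is None else alpha * x + (1 - alpha) * state
        smoothed.append(state)
    return smoothed

def rolling_anomalies(signal, window=20, k=4.0):
    """Flag index t when the sample deviates from the mean of the previous
    `window` samples by more than k standard deviations of that window."""
    history = deque(maxlen=window)
    flags = []
    for t, x in enumerate(signal):
        if len(history) == window:
            mean = sum(history) / window
            std = math.sqrt(sum((h - mean) ** 2 for h in history) / window)
            if abs(x - mean) > k * max(std, 1e-9):
                flags.append(t)
        history.append(x)
    return flags

# Toy "strain" signal (hypothetical): a slow oscillation with a step at t = 150.
baseline = [math.sin(2 * math.pi * t / 50) for t in range(300)]
signal = [v + (3.0 if t >= 150 else 0.0) for t, v in enumerate(baseline)]

smoothed = exponential_smooth(signal, alpha=0.5)
flags = rolling_anomalies(smoothed, window=20, k=4.0)
print("first flagged index:", flags[0] if flags else None)
```

Because both functions only look at past samples, they can run on a live stream one sample at a time, which fits the real-time requirement in the question.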

Ideas on filtering out consistent time series data

So I have two subsets of data that represent two situations. The one that looks more consistent needs to be filtered out (it is noise), while the one that looks random is kept (it is motion). The method I was using was to define a moving window of size 10 and, whenever the standard deviation of the data within the window was smaller than some threshold, suppress that data. However, this method could not filter out all the "consistent" noise, and it also hurt the inconsistent data (real motion). I was hoping to use some kind of statistical model, not machine learning, to accomplish this. Any suggestions would be appreciated!
[Figure: noise]
[Figure: real motion]
The Kolmogorov–Smirnov test is used to compare two samples to determine whether they come from the same distribution. I realized that real-world data would never be uniform. So instead of comparing my noise data against the uniform distribution, I used the scipy.stats.ks_2samp function to compare every burst against one real motion burst. I then muted a burst if the returned p-value was very small, meaning I could reject the hypothesis that the two samples come from the same distribution.
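To make the idea concrete, here is a rough, dependency-free sketch of the two-sample comparison. Note that this is not scipy's implementation: ks_two_sample is a hand-rolled stand-in that computes the KS statistic exactly but only approximates the p-value with the asymptotic Kolmogorov series, and the Gaussian bursts and the 0.01 cut-off are made-up assumptions. In practice, scipy.stats.ks_2samp(burst, reference) would be the call to use.

```python
import math
import random
from bisect import bisect_right

def ks_two_sample(a, b):
    """Return (D, p): the two-sample KS statistic and an approximate p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # D is the largest gap between the two empirical CDFs.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    # Asymptotic Kolmogorov tail series with a small-sample correction.
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return d, 1.0  # the tail probability is essentially 1 here
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

rng = random.Random(0)
# Hypothetical data: one known "real motion" burst and one tightly clustered burst.
reference_motion = [rng.gauss(0.0, 1.0) for _ in range(200)]
noise_burst = [rng.gauss(0.0, 0.05) for _ in range(200)]

d_noise, p_noise = ks_two_sample(noise_burst, reference_motion)
mute = p_noise < 0.01  # distribution differs from real motion, so mute the burst
print("D =", round(d_noise, 3), "mute =", mute)
```

The choice of reference burst matters a lot with this approach: every other burst is judged only by how closely its distribution matches that single example.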

Audio signal source separation with neural network

What I am trying to do is separate the audio sources and extract their pitch from the raw signal.
I modeled this process myself, as represented below:
Each source oscillates in normal modes, which often makes the frequencies of its component peaks integer multiples of a fundamental. These are known as harmonics. The sources are then shaped by resonance and finally combined linearly.
As seen above, I've got many hints in the frequency response pattern of audio signals, but almost no idea how to 'separate' them. I've tried countless models of my own. This is one of them:
FFT the PCM
Get peak frequency bins and amplitudes.
Calculate pitch candidate frequency bins.
For each pitch candidates, using recurrent neural network analyze all the peaks and find appropriate combination of peaks.
Separate analyzed pitch candidates.
Unfortunately, none of them has successfully separated the signal so far.
I would welcome any advice on solving this kind of problem,
especially on modeling source separation like my approach above.
Because no one has really attempted to answer this, and because you've marked it with the neural-network tag, I'm going to address the suitability of a neural network to this kind of problem. As the question was somewhat non-technical, this answer will also be "high level".
Neural networks require some sort of sample set from which to learn. In order to "teach" a neural net to solve this problem you would essentially need to have a working set of known solutions to work from. Do you have this? If so, read on. If not, a neural network is probably not what you are seeking. You stated that you have "many hints" but no real solution. This leads me to believe you probably don't have sample sets. If you can get them, great, otherwise you might be out of luck.
Supposing now that you have a sample set of Raw Signal samples and corresponding Source 1 and Source 2 outputs... Well, now you're going to need a method for deciding on a topology. Assuming you don't know a lot about how neural nets work (and don't want to), and assuming you also don't know the exact degree of complexity of the problem, I would probably recommend the open source NEAT package to get you started. I am not affiliated in any way with this project, but I have used it, and it allows you to (relatively) intelligently evolve neural network topologies to fit the problem.
Now, in terms of how a neural net would solve this specific problem. The first thing that comes to mind is that all audio signals are essentially time-series. That is to say, the information they convey is somehow dependent and related to the data at previous timesteps (e.g. the detection of some waveform cannot be done from a single time-point; it requires information about previous timesteps as well). Again, there's a million ways of solving this problem, but since I'm already recommending NEAT I'd probably suggest you take a look at the C++ NEAT Time Series mod.
If you're going down this route, you'll probably be wanting to use some sort of sliding window to provide information about the recent past at each time step. For a quick and dirty intro to sliding windows, check out this SO question:
Time Series Prediction via Neural Networks
The size of the sliding window can be important, especially if you're not using recurrent neural nets. Recurrent networks allow neural nets to remember previous time steps (at the cost of performance - NEAT is already recurrent so that choice is made for you here). You will probably want the sliding window length (ie. the number of timesteps in the past provided at every time step) to be roughly equal to your conservative guess of the largest number of previous timesteps required to gain enough information to split your waveform.
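As an illustration of the sliding-window idea, the sketch below turns a mixture and its known sources into (window, target) training pairs. Everything here is assumed for the example: the function name sliding_windows, the two sine "sources", and the window length of 8. It is independent of NEAT or any other particular library.

```python
import math

def sliding_windows(mixture, targets, window):
    """Pair the last `window` mixture samples with the target at the same time step."""
    pairs = []
    for t in range(window - 1, len(mixture)):
        pairs.append((mixture[t - window + 1 : t + 1], targets[t]))
    return pairs

# Hypothetical training data: two known sources and their linear mixture.
source1 = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
source2 = [0.5 * math.sin(2 * math.pi * 12 * t / 100) for t in range(100)]
mixture = [a + b for a, b in zip(source1, source2)]
targets = list(zip(source1, source2))

pairs = sliding_windows(mixture, targets, window=8)
print(len(pairs), "training pairs, each with", len(pairs[0][0]), "inputs")
```

Each pair gives the network the recent past of the raw signal as input and the two source amplitudes at the current time step as the desired output.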
I'd say this is probably enough information to get you started.
When it comes to deciding how to provide the neural net with the data, you'll first want to normalise the input signals (consider a sigmoid function) and experiment with different transfer functions (sigmoid would probably be a good starting point).
I would imagine you'll want to have 2 output neurons, providing normalised amplitude (which you would denormalise via the inverse of the sigmoid function) as the output representing Source 1 and Source 2 respectively. The fitness value (the way you judge the ability of each tested network to solve the problem) would be something along the lines of the negative of the RMS error of the output signal against the actual known signal (i.e. tested against the samples I was referring to earlier that you will need to procure).
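The normalisation and fitness pieces described above are small enough to sketch directly. The function names are my own, and this assumes the plain logistic sigmoid; a real setup would also need to scale the signal so it does not saturate the sigmoid.

```python
import math

def sigmoid(x):
    """Squash a raw amplitude into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def inverse_sigmoid(y):
    """Recover the raw amplitude from a normalised output (the logit)."""
    return math.log(y / (1.0 - y))

def fitness(predicted, actual):
    """Negative RMS error: closer to zero means a better network."""
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    return -math.sqrt(mse)

raw = [-1.5, 0.0, 0.7]
normalised = [sigmoid(v) for v in raw]
recovered = [inverse_sigmoid(v) for v in normalised]
print(recovered)
print(fitness([0.0, 0.0], [3.0, 4.0]))
```

Using the negative of the error means that "higher is better", which is the convention most evolutionary fitness functions expect.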
Suffice to say, this will not be a trivial operation, but it could work if you have enough samples to train the network against. What is a good number of samples? Well as a rule of thumb it's roughly a number that is large enough such that a simple polynomial function of order N (where N is the number of neurons in the neural network you require to solve the problem) cannot fit all of the samples accurately. This is basically to ensure you are not simply overfitting the problem, which is a serious issue with neural networks.
I hope this has been helpful! Best of luck.
Additional note: your work to date wouldn't have been in vain if you go down this route. A neural network is likely to benefit from additional "help" in the form of FFTs and other signal modelling "inputs", so you might want to consider taking the signal processing you have already done, organising into an analog, continuous representation and feeding it as an input alongside the input signal.

Signal processing: FFT overlap processing resources

Are there any good (if possible scientific) resources available (web or books) about overlap processing. I am not that interested in the effects of using overlap processing and windows when analyzing a signal, since the requirements are different. It is more about the following Real Time situation: (I am currently dealing with audio signals)
Dividing a signal into smaller parts.
Creating overlap windows.
FFTing the windowed chunks.
Do processing in the frequency domain.
IFFT the results.
put the chunks together to a continuous stream.
I am especially interested in the influence of the window used on the resulting error as well as the effect of the overlap length. However I couldn't find any good resources that deal with the subject in detail. Any suggestions?
Edit:
After some discussion about whether using a window function is appropriate, I found a decent handout explaining the overlap and add/save method. http://www.ece.tamu.edu/~deepa/ecen448/handouts/08c/10_Overlap_Save_Add_handouts.pdf
However, after doing some tests, I noticed that the windowed version performed more accurately in most cases than the overlap & add/save method. Could anybody confirm this?
I don't want to jump to any conclusions regarding computation time though....
Edit2:
Here are some graphs from my tests:
I created a signal, which consists of three cosine waves
I used this filter function in the time domain for filtering. (It's symmetric, as it is applied to the whole output of the FFT, which also is symmetric for real input signals)
The output of the IFFT looks like this: It can be seen that low frequencies are attenuated more than frequency in the mid range.
For the overlap add/save and the windowed processing I divided the input signal into 8 chunks of 256 samples. After reassembling them they look like that. (sample 490 - 540)
It can be seen that the overlap add/save processes differ from the windowed version at the point where chunks are put together (sample 511). This is the error which leads to different results when comparing the windowed process and overlap add/save. The windowed process is closer to the one processed in one big chunk.
However, I have no idea why they are there or if they shouldn't be there at all.
This is a fairly well-known area of signal processing, and generally speaking if you are doing processing along the lines of FFT -> spectral processing -> IFFT you need to use the "overlap and add" approach. Cross-correlation of two inputs is a classic example, done much more easily in the spectral domain than the time domain.
Here's a short paper I found right away via Google (I just searched for "fft overlap and add"): http://www.coe.montana.edu/ee/rmaher/ee477/ee477_fftlab_sp07.pdf
I would recommend you invest in a good Signal Processing book, such as the classic Rabiner & Gold "Theory and application of digital signal processing" (Prentice-Hall ISBN 0-13-914101-4). That should cover the concept of overlap-and-add processing.
When using an FFT for overlap-add or overlap-save fast convolution filtering, normally you don't want to use a windowing function. The circular windowing artifacts cancel out when combining successive FFT frames in canonical overlap add/save filtering.
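For reference, here is a small sketch of canonical overlap-add filtering with no window function. It uses a naive O(N^2) DFT from the standard library so that it is self-contained (a real implementation would use an FFT library), and the block size, kernel, and test signal are arbitrary assumptions. Each block is zero-padded to block + len(kernel) - 1 samples so that the circular convolution equals the linear one, and the tails of adjacent blocks are simply added together.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT, adequate for a small demonstration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def overlap_add(signal, kernel, block=16):
    """Filter `signal` with `kernel` block by block using zero-padded DFTs."""
    nfft = block + len(kernel) - 1  # long enough to avoid circular wrap-around
    K = dft(list(kernel) + [0.0] * (nfft - len(kernel)))
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for start in range(0, len(signal), block):
        chunk = list(signal[start:start + block])
        X = dft(chunk + [0.0] * (nfft - len(chunk)))
        y = dft([a * b for a, b in zip(X, K)], inverse=True)
        for i, v in enumerate(y):
            if start + i < len(out):
                out[start + i] += v.real  # tails of adjacent blocks overlap and add
    return out

def direct_convolve(signal, kernel):
    """Plain time-domain convolution, used here only as a reference."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(kernel):
            out[i + j] += s * h
    return out

signal = [math.sin(0.3 * n) for n in range(50)]
kernel = [0.25, 0.5, 0.25]
blockwise = overlap_add(signal, kernel, block=16)
reference = direct_convolve(signal, kernel)
max_error = max(abs(a - b) for a, b in zip(blockwise, reference))
print("max difference from direct convolution:", max_error)
```

If the block-wise result differs from the direct convolution at the block boundaries, the usual suspects are an FFT size that is too short for the kernel or a frequency-domain modification whose impulse response is longer than the padding allows.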
ADDED:
If you do use a non-rectangular window, you might want to make sure that all the overlapped frames of windows sum to a constant (DC), otherwise your resulting filtered signal will have amplitude scalloping. Rectangular windows and raised-cosine (von Hann) windows will sum to a constant if the overlap amount is an exact submultiple of the window width (except, of course, at the very start and end of the overlap sequence).
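This constant-sum property is easy to check numerically. The sketch below assumes a periodic von Hann window of 16 samples (the length and the hop sizes are arbitrary choices) and measures the ripple of the summed windows away from the start and end of the sequence.

```python
import math

def hann(n_points):
    """Periodic raised-cosine (von Hann) window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points) for n in range(n_points)]

def overlap_sum(window, hop, n_frames):
    """Sum of `n_frames` copies of `window`, each shifted by `hop` samples."""
    total = [0.0] * (hop * (n_frames - 1) + len(window))
    for f in range(n_frames):
        for i, w in enumerate(window):
            total[f * hop + i] += w
    return total

N = 16
window = hann(N)
ripple = {}
for hop in (8, 4, 6):  # N/2 and N/4 are submultiples of N; 6 is not
    total = overlap_sum(window, hop, n_frames=10)
    interior = total[N:-N]  # ignore the ramp-up and ramp-down at the ends
    ripple[hop] = max(interior) - min(interior)
    print("hop", hop, "ripple", ripple[hop])
```

With hops of 8 and 4 the interior sum is flat to within rounding error, while a hop of 6 leaves a visible ripple, which is the amplitude scalloping mentioned above.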
I have been playing with this attempting to answer the question for myself as to why one would use a window. My only references to a synthesis window are this:
https://ccrma.stanford.edu/~jos/sasp/Inverse_FFT_Synthesis.html
http://recherche.ircam.fr/anasyn/roebel/amt_audiosignale/VL2.pdf
http://www.dspdimension.com/tutorials/
Stephan Bernsee has some good overview information. His smbpitchshift code uses a synthesis window -- He uses the raised cosine on the input block, then applies it again on the output block, but this I believe is necessary because the pitch shifting algorithm is not a linear filtering operation, so it is certain there may be discontinuous artifacts on the window boundaries, thus a synthesis window is used to create a smooth transition between frames.
I think the reason there is not much information specifically addressing windowing for frequency domain real-time convolution is that it doesn't have a practical application unless you also need to do some analysis (i.e., an adaptive filter of some sort), in which case the topics related to spectral spreading are again of interest.
I have plotted outputs from a filtered signal using both a raised cosine window as well as overlap-add method, and the end result is an identical IR, and identical signals. It comes as no surprise since the same operations performed in the time domain yield the same results.
On the other hand, if I implement a broken filter kernel, a smooth windowing function can help mask artifacts. This in a sense windows the broken IR so there is a more cohesive transition between frames. It would still be better to have an IR that is limited to length nfft/2 in the time domain. If you need to obtain a filter response with an IR longer than nfft/2, then you should consider either using a larger FFT size (if latency is not a problem) or use a partitioned convolution scheme:
http://pcfarina.eng.unipr.it/Public/Papers/164-Mohonk2001.PDF
or
http://www.music.miami.edu/programs/mue/Research/jvandekieft/jvchapter2.htm
I hope those links are helpful to somebody reading this, even though they don't directly address windowing as used in real-time frequency-domain filtering.

Downsampling and applying a lowpass filter to digital audio

I've got a 44Khz audio stream from a CD, represented as an array of 16 bit PCM samples. I'd like to cut it down to an 11KHz stream. How do I do that? From my days of engineering class many years ago, I know that the stream won't be able to describe anything over 5500Hz accurately anymore, so I assume I want to cut everything above that out too. Any ideas? Thanks.
Update: There is some code on this page that converts from 48KHz to 8KHz using a simple algorithm and a coefficient array that looks like { 1, 4, 12, 12, 4, 1 }. I think that is what I need, but I need it for a factor of 4x rather than 6x. Any idea how those constants are calculated? Also, I end up converting the 16 bit samples to floats anyway, so I can do the downsampling with floats rather than shorts, if that helps the quality at all.
Read up on FIR and IIR filters. These are the filters that use a coefficient array.
If you do a Google search for "FIR or IIR filter designer" you will find lots of software and online applets that do the hard job (getting the coefficients) for you.
EDIT:
This page here ( http://www-users.cs.york.ac.uk/~fisher/mkfilter/ ) lets you enter the parameters of your filter and will spit out ready to use C-Code...
You're right in that you need to apply lowpass filtering to your signal. Any signal content over 5500 Hz will be present in your downsampled signal but 'aliased' as another frequency, so you'll have to remove it before downsampling.
It's a good idea to do the filtering with floats. There are fixed-point filter algorithms too, but those generally involve quality tradeoffs. If you've got floats then use them!
Using DFTs for filtering is generally overkill, and it makes things more complicated because DFTs are not a continuous process but work on buffers.
Digital filters generally come in two flavours: FIR and IIR. They're generally the same idea, but IIR filters use feedback loops to achieve a steeper response with far fewer coefficients. This might be a good idea for downsampling because you need a very steep filter slope there.
Downsampling is sort of a special case. Because you're going to throw away 3 out of 4 samples there's no need to calculate them. There is a special class of filters for this called polyphase filters.
Try googling for polyphase IIR or polyphase FIR for more information.
Notice (in addition to the other comments) that the simple, easy, intuitive approach, "downsample by a factor of 4 by replacing each group of 4 consecutive samples by the average value", is not optimal but is nevertheless not wrong, neither practically nor conceptually. This is because the averaging amounts precisely to a low pass filter (a rectangular window, which corresponds to a sinc in frequency). What would be conceptually wrong is to just downsample by taking one of each 4 samples: that would definitely introduce aliasing.
By the way: practically any software that does some resampling (audio, image or whatever; example for the audio case: sox) takes this into account, and frequently lets you choose the underlying low-pass filter.
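The difference between averaging groups of 4 and simply keeping every 4th sample can be seen with a single test tone. In this sketch the 44 kHz rate is taken from the question, while the 9 kHz tone (above the new 5.5 kHz limit) and the helper names are assumptions for illustration.

```python
import math

FS = 44000   # input sample rate, as in the question
FACTOR = 4   # 44 kHz -> 11 kHz

def tone(freq_hz, n_samples):
    """Unit-amplitude cosine sampled at FS."""
    return [math.cos(2 * math.pi * freq_hz * i / FS) for i in range(n_samples)]

def drop_samples(x, factor):
    """Keep one sample out of every `factor` with no filtering at all."""
    return x[::factor]

def block_average(x, factor):
    """Replace each group of `factor` consecutive samples by its average."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]

high = tone(9000, 400)  # cannot be represented at the new rate
peak_dropped = max(abs(v) for v in drop_samples(high, FACTOR))
peak_averaged = max(abs(v) for v in block_average(high, FACTOR))
print("peak after dropping samples:", peak_dropped)
print("peak after block averaging: ", peak_averaged)
```

Dropping samples lets the tone through at full amplitude, where it shows up as an alias, whereas averaging attenuates it substantially. A properly designed low-pass filter would attenuate it much further.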
You need to apply a lowpass filter before you downsample the signal to avoid "aliasing". The cutoff frequency of the lowpass filter should be less than the Nyquist frequency of the downsampled signal, which is half the new sample frequency.
The "best" solution possible is indeed a DFT, discarding the top 3/4 of the frequencies, and performing an inverse DFT, with the domain restricted to the bottom 1/4th. Discarding the top 3/4ths is a low-pass filter in this case. Padding to a power of 2 number of samples will probably give you a speed benefit. Be aware of how your FFT package stores samples though. If it's a complex FFT (which is much easier to analyze, and generally has nicer properties), the frequencies will either go from -22 to 22, or 0 to 44. In the first case, you want the middle 1/4th. In the latter, the outermost 1/4th.
You can do an adequate job by averaging sample values together. The naïve way of grabbing samples four by four and doing an equal weighted average works, but isn't too great. Instead you'll want to use a "kernel" function that averages them together in a non-intuitive way.
Mathwise, discarding everything outside the low-frequency band is multiplication by a box function in frequency space. The (inverse) Fourier transform turns pointwise multiplication into a convolution of the (inverse) Fourier transforms of the functions, and vice-versa. So, if we want to work in the time domain, we need to perform a convolution with the (inverse) Fourier transform of box function. This turns out to be proportional to the "sinc" function (sin at)/at, where a is the width of the box in the frequency space. So at every 4th location (since you're downsampling by a factor of 4) you can add up the points near it, multiplied by sin (a dt) / a dt, where dt is the distance in time to that location. How nearby? Well, that depends on how good you want it to sound. It's common to ignore everything outside the first zero, for instance, or just take the number of points to be the ratio by which you're downsampling.
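A rough sketch of that time-domain approach follows. It assumes the box cuts off at the new Nyquist frequency, so the kernel is sin(pi k / 4) / (pi k / 4), truncated just inside its first zero and normalised to sum to 1. The function names and test tones are made up, and a production filter would normally be longer and tapered.

```python
import math

def sinc_kernel(factor, half_width):
    """Truncated sinc low-pass taps, normalised to unit gain at DC."""
    taps = []
    for k in range(-half_width, half_width + 1):
        x = k / factor
        taps.append(1.0 if k == 0 else math.sin(math.pi * x) / (math.pi * x))
    total = sum(taps)
    return [t / total for t in taps]

def decimate(signal, factor=4, half_width=None):
    """Evaluate the filtered signal only at every `factor`-th location."""
    if half_width is None:
        half_width = factor - 1  # stop just inside the first zero of the sinc
    kernel = sinc_kernel(factor, half_width)
    out = []
    for centre in range(0, len(signal), factor):
        acc = 0.0
        for j, h in enumerate(kernel):
            idx = centre + j - half_width
            if 0 <= idx < len(signal):  # treat samples beyond the ends as zero
                acc += h * signal[idx]
        out.append(acc)
    return out

FS = 44000
low = [math.cos(2 * math.pi * 1000 * i / FS) for i in range(440)]
high = [math.cos(2 * math.pi * 9000 * i / FS) for i in range(440)]

# Skip the first and last outputs, which are affected by the zero padding.
peak_low = max(abs(v) for v in decimate(low)[2:-2])
peak_high = max(abs(v) for v in decimate(high)[2:-2])
print("1 kHz tone peak:", peak_low)
print("9 kHz tone peak:", peak_high)
```

The 1 kHz tone passes almost unchanged while the 9 kHz tone is strongly attenuated. How well other frequencies near 5.5 kHz are handled depends on how many sinc lobes you keep.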
Finally there's the piss-poor (but fast) way of just discarding the majority of the samples, keeping just the zeroth, the fourth, and so on.
Honestly, if it fits in memory, I'd recommend just going the DFT route. If it doesn't, use one of the software filter packages that others have recommended to construct the filter for you.
The process you're after is called "Decimation".
There are 2 steps:
Applying a Low Pass Filter to the data (in your case, an LPF with cut-off at Pi / 4).
Downsampling (in your case, taking 1 out of 4 samples).
There are many methods to design and apply the Low Pass Filter.
You may start here:
http://en.wikipedia.org/wiki/Filter_design
You could make use of libsamplerate to do the heavy lifting. Libsamplerate is a C API, and takes care of calculating the filter coefficients. It lets you select from different quality filters so that you can trade off quality for speed.
If you would prefer not to write any code, you could just use Audacity to do the sample rate conversion. It offers a powerful GUI, and makes use of libsamplerate for its sample rate conversion.
I would try applying a DFT, chopping 3/4 of the result and applying an inverse DFT. I can't tell if it will sound good without actually trying, though.
I recently came across BruteFIR which may already do some of what you're interested in?
You have to apply a low-pass filter (removing frequencies above 5500 Hz) and then apply decimation (keep every Nth sample, every 4th in your case).
For decimation, FIR rather than IIR filters are usually employed, because they don't depend on previous outputs and therefore you don't have to calculate anything for discarded samples. IIRs generally depend on both inputs and outputs, so, unless a specific type of IIR is used, you'd have to calculate every output sample before discarding 3/4 of them.
Just googled an intro-level article on the subject: https://www.dspguru.com/dsp/faqs/multirate/decimation
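The point about FIR filters not needing the discarded outputs can be checked directly. In this sketch the 8-tap moving average is only a placeholder, not a properly designed anti-aliasing filter, and the function names are my own.

```python
import math

def fir_full(signal, taps):
    """Causal FIR: y[n] = sum_k taps[k] * x[n - k], computed for every n."""
    return [sum(h * signal[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(signal))]

def fir_decimate(signal, taps, factor):
    """Same FIR, but evaluated only at the output samples that are kept."""
    return [sum(h * signal[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(0, len(signal), factor)]

taps = [1.0 / 8] * 8  # placeholder taps, for illustration only
x = [math.sin(0.1 * n) for n in range(64)]

filter_then_discard = fir_full(x, taps)[::4]
compute_only_kept = fir_decimate(x, taps, 4)
print(filter_then_discard == compute_only_kept)
```

Both routes give the same samples, but the second does roughly a quarter of the multiplications. An IIR filter cannot skip outputs this way because each output feeds into the next one.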

Resources