Note Onset Detection Dynamic Thresholding - audio

So I am working on Note Onset Detection. I have implemented the method here: Note onset detection
However, I am finding some difficulty or problems regarding the 'static' nature of the method. What I am looking for is how to make the thresholding method 'dynamic'. But I am finding trouble finding suitable solutions.
Aside from that, I am also working on instead of having the amplitude value as the basis of passing the threshold, I make use of the 'difference' between 2 amplitude values, to know when a signal increases or not, and how much it increased or decreased. This is what I'm using currently.
Anyone willing to help or has worked with this kind of problem? Thank you!
Additionally, by any chance do any of you have a PDF file of this paper: http://www.mendeley.com/research/methods-detecting-impulsive-noise-speech-audio-signals-14/

Volume compression is a form of AGC (Automatic Gain Control), and AGC can be done dynamically. The are plenty of close to real-time AGC algorithms to be found in search results, although a bit of delay is required if you want an AGC attack that's smoother than a step function.

Related

Is SRM in Google Optimize (Bayesian Model) a thing

So checking for Sample Ratio Mismatch is good for data quality.
But in Google Optimize i can't influence the sample size or do something against it.
My problem is, out of 15 A/B Tests I only got 2 Experiment with no SRM.
(Used this tool https://www.lukasvermeer.nl/srm/microsite/)
In the other hand the bayesian model deals with things like different sample sizes and I dont need to worry about, but the opinions on this topic are different.
Is SRM really a problem in Google Optimize or can I ignore it?
SRM affects Bayesian experiments just as much as it affects Frequentist. SRM happens when you expect a certain traffic split, but end up with a different one. Google Optimize is a black box, so it's impossible to tell if the uneven sample sizes you are experiencing are intentional or not.
Lots of things can cause a SRM, for example if your variation's javascript code has a bug in some browsers those users may not be tracked properly. Another common cause is if your variation causes page load times to increase, more people will abandon the page and you'll see a smaller sample size than expected.
That lack of statistical rigor and transparency is one of the reasons I built Growth Book, which is an open source A/B testing platform with a Bayesian stats engine and automatic SRM checks for every experiment.

Methods for simulating moving audio source

I'm currently researching an problem regarding DOA (direction of arrival) regression for an audio source, and need to generate training data in the form of audio signals of moving sound sources. In particular, I have the stationary sound files, and I need to simulate a source and microphone(s) with the distances between them changing to reflect movement.
Is there any software online that could potentially do the trick? I've looked into pyroomacoustics and VA as well as other potential libraries, but none of them seem to deal with moving audio sources, due to the difficulties in simulating the doppler effect.
If I were to write up my own simulation code for dealing with this, how difficult would it be? My use case would be an audio source and a microphone in some 2D landscape, both moving with their own velocities, where I would want to collect the recording from the microphone as an audio file.
Some speculation here on my part, as I have only dabbled with writing some aspects of what you are asking about and am not experienced with any particular libraries. Likelihood is good that something exists and will turn up.
That said, I wonder if it would be possible to use either the Unreal or Unity game engine. Both, as far as I can remember, grant the ability to load your own cues and support 3D including Doppler.
As far as writing your own, a lot depends on what you already know. With a single-point mike (as opposed to stereo) the pitch shifting involved is not that hard. There is a technique that involves stepping through the audio file's DSP data using linear interpolation for steps that lie in between the data points, which is considered to have sufficient fidelity for most purposes. Lot's of trig, too, to track the changes in velocity.
If we are dealing with stereo, though, it does get more complicated, depending on how far you want to go with it. The head masks high frequencies, so real time filtering would be needed. Also it would be good to implement delay to match the different arrival times at each ear. And if you start talking about pinnas, I'm way out of my league.
As of now it seems like Pyroomacoustics does not support moving sound sources. However, do check a possible workaround suggested by the developers here in Issue #105 - where the idea of using a time-varying convolution on a dense microphone array is suggested.

Change pitch of audio buffer

I am trying to change the pitch of a buffer sample using a scriptprocessor, but what kind of formulas do I need to do this? I am not looking for the exact js code, but just for some general mathematical how to. I would love to have some code for this, as the first answer has a lot of formulas where I have no idea on how to implement that in JS.
I know that this is working with time, but according to this it can be done with the FFT, but I have no idea how one should do that.
For one method of doing time-pitch modification using an FFT, look up phase vocoder. Here's one explanation of how a phase vocoder works (but a search will turn up many others): http://www.guitarpitchshifter.com/algorithm.html
I believe https://github.com/mikolalysenko/pitch-shift would be appropriate (the quality is not on par with other code, but this library is rather easy to understand/use). You can hear a demo at http://mikolalysenko.github.io/pitch-shift/.

How to amplify certain audio samples, particularly amplifying a certain frequency?

Can anyone provide sample pseudocode or share some existing link that has sample code.
Like for example I have a mix audio of 1kHz or 2kHz or 8kHz or so, and I want to boost certain frequencies like 1kHz only in real-time.
Reading some DSP books and resources confuses me.
You just need to design and implement a suitable digital filter. This is a large and complex subject area though, so you won't get a simple answer here. Probably the best thing as a first step would be to read a good introductory book on DSP, e.g. Understanding DSP by Rick Lyons, which is a very good for beginners as it's not too heavy on the math and has a more practical bent than most such introductory DSP books.
For this particular application though what you are trying to do is similar to implementing a graphic equalizer, and there are many pointers to how to implement this kind of thing if you use e.g. "graphic equalizer" as a search term.
There's a lot of math behind digital filtering. Sorry, I think it is important to at least understand basic filters (like those used in electronics). If you don't want to go through the basics: best to get an audio graphics equaliser where you can play with the (virtual) sliders. If you want to implement a very specific filter, please read on.
Real time: depends on your computing platform. If this is a small micro (like AVR, Microchip PIC,..) you'll need an efficient algorithm. This is likely a IIR band pass filter. The equivalent of a graphics equaliser consists of multiple band pass filters, all summed together. See http://en.wikipedia.org/wiki/Infinite_impulse_response
A more computing intensive algorithm uses FIR filters. In that case you can also control the phase of the filtered signal. http://en.wikipedia.org/wiki/Finite_impulse_response
If you find an algorithm (i.e. IIR), you'll need to calculate the coefficients. The algorithm is simple, calculating the coefficients is not.
I found a book matching your question: Audio digital signal processing in real time
I browsed through it; it seems to have the right answers.

How to compare spoken audio against reference recording - language learning

I am looking for a way to compare a user submitted audio recording against a reference recording for comparison in order to give someone a grade or percentage for language learning.
I realize that this is a very un-scientific way of doing things and is more than a gimmick than anything.
My first thoughts are some sort of audio fingerprinting, or waveform comparison.
Any ideas where I should be looking?
This is by no means a trivial problem to solve, though there is an abundance of research on the topic. Presently the most successful forms of machine learning in the speech recognition domain apply Hidden Markov Model techniques.
You may also want to take a look at existing implementations of HMM algorithms. One such library in its early stages is ghmm.
Perhaps even better and more readily applicable to your problem is HTK.
In addition to chomp's great answer, one important keyword you probably need to look up is Dynamic Time Warping (DTW). This is the wikipedia article: http://en.wikipedia.org/wiki/Dynamic_time_warping

Resources