1. Context
I'm using GLSL to plot the amplitude of a waveform at a given frequency like this:
Displaying simple waveforms like the ones above is a trivial task; it's just a matter of using the right equations (GLSL excerpt code available here). What I'm trying to do now is display the result of frequency modulation between two waveforms.
2. Research
After some research I've found two possible ways to accomplish this:
Using a DSP-like approach. As far as I know, using phase accumulators (see first answer here) in combination with lookup tables is the suggested approach when working with signal processing. As a GLSL beginner, I understood that this is not possible with GLSL shaders because they can't store and increment variables across multiple GPU cycles.
Using a purely mathematical approach. This involves equations like Chowning's simple FM formula.
FM formula http://img577.imageshack.us/img577/4820/fmformula.png
That formula works great in some situations (cosine wave modulating another cosine or cosine modulating a sawtooth) but I wasn't able to find a general equation that would work in every case (when a sawtooth wave modulates a cosine I expect the carrier frequency to be modulated by sawtooth amplitude, but all I get is the carrier wave apparently unmodulated).
3. Questions
What would be the best approach to solve this problem?
Is a dsp-like approach possible with GLSL?
If not, is there any general FM equation flexible enough to do all the work?
Considering my lack of skills in all the disciplines involved here (audio dsp, computer sciences, gpu programming, maths) I won't be surprised if I'm missing something really simple here. Please be patient.
You're right that the usual dsp-like approach of phase accumulators isn't well suited to parallel computing on a GPU; the 'pure mathematical' method is therefore probably the best bet.
The generalisation of Chowning's simple FM formula to more general frequency modulation functions is given on the Wikipedia page for Frequency Modulation (the very first equation on that page). The key point is that the argument to the cos function is the phase, which, as the equation on Wikipedia indicates, is the time integral of the frequency. In FM, the frequency is typically a carrier plus a modulation: for example, with Chowning's formula for simple sinusoidal FM, the frequency as a function of time t is
f(t) = f_c - M * f_m * sin(f_m * t)
where f_c is the carrier frequency, M is the modulation amount, and f_m is the frequency of modulation. This integrates to the phase
p(t) = f_c * t + M * cos(f_m * t)
which corresponds to phase in the equation in your question.
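As a rough illustration of the two formulas above, here is a minimal Python sketch (the function names are made up for this example, and f_c and f_m are taken as angular frequencies, since the formulas carry no factor of 2*pi). Each sample depends only on t, so nothing has to be stored between samples, which is what a shader needs.

```python
import math

def fm_frequency(t, f_c, f_m, M):
    """Instantaneous frequency f(t) = f_c - M * f_m * sin(f_m * t)."""
    return f_c - M * f_m * math.sin(f_m * t)

def fm_phase(t, f_c, f_m, M):
    """Phase p(t) = f_c * t + M * cos(f_m * t), the time integral of f(t)."""
    return f_c * t + M * math.cos(f_m * t)

def fm_sample(t, f_c, f_m, M):
    """Amplitude of the frequency-modulated cosine at time t."""
    return math.cos(fm_phase(t, f_c, f_m, M))
```

With M = 0 this reduces to a plain cosine at the carrier frequency.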
To modulate a cosine with a sawtooth, f(t) would be a sawtooth wave (plus a carrier frequency), and so to find p(t) you would need to find the time-integral of a sawtooth wave. This is relatively straightforward (it should be a piecewise-quadratic function), but people at math.stackexchange should be able to help if you have difficulty.
(Note: I've phrased everything here in terms of time t, but this could equally well be space, x, instead.)
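For the sawtooth case, the integral can be written in closed form. The sketch below is only one way to set it up, under assumptions of my own: the sawtooth has period 1 and rises from -1 to 1, frequencies are in Hz, and `depth` (a name invented here) is the peak frequency deviation. Because the sawtooth has zero mean, its integral is periodic and equal to frac^2 - frac within each period.

```python
import math

def sawtooth(u):
    """Sawtooth with period 1 in u, rising from -1 towards 1."""
    return 2.0 * (u - math.floor(u)) - 1.0

def sawtooth_integral(u):
    """Integral of sawtooth from 0 to u: piecewise quadratic and periodic."""
    frac = u - math.floor(u)
    return frac * frac - frac

def saw_fm_phase(t, f_c, f_m, depth):
    """Phase in cycles when the frequency is f_c + depth * sawtooth(f_m * t)."""
    return f_c * t + depth * sawtooth_integral(f_m * t) / f_m

def saw_fm_sample(t, f_c, f_m, depth):
    """Cosine carrier whose frequency is modulated by a sawtooth."""
    return math.cos(2.0 * math.pi * saw_fm_phase(t, f_c, f_m, depth))
```

As with the sinusoidal case, the result depends only on t, so no accumulator is needed.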
Related
For a project of mine I am working with sampled sound generation and I need to create various waveforms at various frequencies. When the waveform is sinusoidal, everything is fine, but when the waveform is rectangular, there is trouble: it sounds as if it came from the eighties, and as the frequency increases, the notes sound wrong. On the 8th octave, each note sounds like a random note from some lower octave.
The undesirable effect is the same regardless of whether I use either one of the following two approaches:
The purely mathematical way of generating a rectangular waveform as sample = sign( secondsPerHalfWave - (timeSeconds % secondsPerWave) ) where secondsPerWave = 1.0 / wavesPerSecond and secondsPerHalfWave = secondsPerWave / 2.0
My preferred way, which is to describe one period of the wave using line segments and to interpolate along these lines. So, a rectangular waveform is described (regardless of sampling rate and regardless of frequency) by a horizontal line from x=0 to x=0.5 at y=1.0, followed by another horizontal line from x=0.5 to x=1.0 at y=-1.0.
From what I gather, the literature considers these waveform generation approaches "naive", resulting in "aliasing", which is the cause of all the undesirable effects.
What this all practically translates to when I look at the generated waveform is that the samples-per-second value is not an exact multiple of the waves-per-second value, so each wave does not have an even number of samples, which in turn means that the number of samples at level 1.0 is often not equal to the number of samples at level -1.0.
I found a certain solution here: https://www.nayuki.io/page/band-limited-square-waves which even includes source code in Java, and it does indeed sound awesome: all undesirable effects are gone, and each note sounds pure and at the right frequency. However, this solution is entirely unsuitable for me, because it is extremely computationally expensive. (Even after I have replaced sin() and cos() with approximations that are ten times faster than Java's built-in functions.) Besides, when I look at the resulting waveforms they look awfully complex, so I wonder whether they can legitimately be called rectangular.
So, my question is:
What is the most computationally efficient method for the generation of periodic waveforms such as the rectangular waveform that does not suffer from aliasing artifacts?
Examples of what the solution could entail:
The computer audio problem of generating correct sample values at discrete time intervals to describe a sound wave seems to me somewhat related to the computer graphics problem of generating correct integer y coordinates at discrete integer x coordinates for drawing lines. The Bresenham line generation algorithm is extremely efficient, (even if we disregard for a moment the fact that it is working with integer math,) and it works by accumulating a certain error term which, at the right time, results in a bump in the Y coordinate. Could some similar mechanism perhaps be used for calculating sample values?
The way sampling works is understood to be as reading the value of the analog signal at a specific, infinitely narrow point in time. Perhaps a better approach would be to consider reading the area of the entire slice of the analog signal between the last sample and the current sample. This way, sampling a 1.0 right before the edge of the rectangular waveform would contribute a little to the sample value, while sampling a -1.0 considerable time after the edge would contribute a lot, thus naturally yielding a point which is between the two extreme values. Would this solve the problem? Does such an algorithm exist? Has anyone ever tried it?
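To make the "area of the slice" idea concrete, here is a small sketch (function names are hypothetical). It averages an ideal +1/-1 square wave over the interval between the previous sample and the current one, using the fact that the integral of a square wave is a triangle wave. This amounts to a simple box filter, so as far as I can tell it softens the edges and reduces aliasing rather than removing it.

```python
import math

def square_integral(u):
    """Antiderivative of a +1/-1 square wave with period 1 (a triangle wave)."""
    frac = u - math.floor(u)
    return frac if frac < 0.5 else 1.0 - frac

def area_sampled_square(n, freq, sample_rate=44100.0):
    """Average of the square wave between sample n-1 and sample n."""
    step = freq / sample_rate      # cycles advanced per sample
    u1 = n * step                  # position of the current sample, in cycles
    u0 = u1 - step                 # position of the previous sample
    return (square_integral(u1) - square_integral(u0)) / step
```

A sample whose interval lies entirely on one level returns exactly that level; a sample whose interval straddles an edge returns a value in between.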
Please note that I have posted this question here as opposed to dsp.stackexchange.com because I do not want to receive answers with preposterous jargon like band-limiting, harmonics and low-pass filters, lagrange interpolations, DC compensations, etc. and I do not want answers that come from the purely analog world or the purely theoretical outer space and have no chance of ever receiving a practical and efficient implementation using a digital computer.
I am a programmer, not a sound engineer, and in my little programmer's world, things are simple: I have an array of samples which must all be between -1.0 and 1.0, and will be played at a certain rate (44100 samples per second.) I have arithmetic operations and trigonometric functions at my disposal, I can describe lines and use simple linear interpolation, and I need to generate the samples extremely efficiently because the generation of a dozen waveforms simultaneously and also the mixing of them together may not consume more than 1% of the total CPU time.
I'm not sure, but you may have a few misconceptions about the nature of aliasing. I base this on your putting the term in quotes, and on the following quote:
What this all practically translates to when I look at the generated
waveform is that the samples-per-second value is not an exact multiple
of the waves-per-second value, so each wave does not have an even
number of samples, which in turn means that the number of samples at
level 1.0 is often not equal to the number of samples at level -1.0.
The samples/sec and waves/sec don't have to be exact multiples at all! One can play back all pitches below the Nyquist. So I'm not clear what your thinking on this is.
The characteristic sound of a square wave arises from the presence of odd harmonics, e.g., with a note of 440 Hz (A4), the square wave sound could be generated by combining sines of 440, 1320, 2200, 3080, 3960, etc., progressing in increments of 880. This raises the question: how many odd harmonics? We could go to infinity, theoretically, for the sharpest possible corner on our square wave. If you simply "draw" this in the audio stream, the progression will continue well beyond the Nyquist frequency.
But there is a problem in that harmonics that are higher than the Nyquist value cannot be accurately reproduced digitally. Attempts to do so result in aliasing. So, to get as good a sounding square wave as the system is able to produce, one has to avoid the higher harmonics that are present in the theoretically perfect square wave.
I think the most common solution is to use a low-pass filtering algorithm. The computations are definitely more CPU-intensive than just calculating sine waves (or doing FM synthesis, which was my main interest). I am also weak on the math for DSP and concerned about CPU expense, and so avoided this approach for a long time. But it is quite viable and worth an additional look, imho.
Another approach is to use additive synthesis, and include as many sine harmonics as you need to get the tonal quality you want. The problem then is that the more harmonics you add, the more computation you are doing. Also, you must keep track of the top harmonic, as it limits the highest note you can play. For example, with a fundamental plus 10 odd harmonics, a 500 Hz note would include content up to 10500 Hz (21 × 500 Hz). That's below the Nyquist value for 44100 fps (which is 22050 Hz). But you'll only be able to go up about another octave (which doubles everything) and a little more with such a wave before your harmonic content goes over the limit and starts aliasing.
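A minimal sketch of the additive approach, assuming the standard Fourier series of a square wave (odd harmonics k with amplitude 1/k, scaled by 4/pi). The function names are invented for this example; the sum simply stops before the Nyquist frequency.

```python
import math

def odd_harmonics_below_nyquist(freq, sample_rate=44100.0):
    """Odd harmonic numbers k such that k * freq stays below the Nyquist frequency."""
    nyquist = sample_rate / 2.0
    return [k for k in range(1, int(nyquist // freq) + 1, 2) if k * freq < nyquist]

def additive_square(t, freq, sample_rate=44100.0):
    """Band-limited square wave at time t (seconds), built from sines."""
    total = sum(math.sin(2.0 * math.pi * k * freq * t) / k
                for k in odd_harmonics_below_nyquist(freq, sample_rate))
    return 4.0 / math.pi * total
```

The cost grows with the number of harmonics, which is largest for low notes; that is the trade-off described above.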
Instead of computing multiple sines on the fly, another solution you might consider is to instead create a set of lookup tables (LUTs) for your square wave. To create the values in the table, iterate through and add the values from the sine harmonics that will safely remain under the Nyquist for the range in which you use the given table. I think a table of something like 1024 values to encode a single period could be a good first guess as to what would work.
For example, I am guesstimating, but the table for the octave C4-C5 might use 10 harmonics, the table for C5-C6 only 5, the table for C3-C4 might have 20. I can't recall what this strategy/technique is called, but I do recall it has a name, and it is an accepted way of dealing with the situation. Depending on how the transitions sound and the amount of high-end content you want, you can use fewer or more LUTs.
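Here is a rough sketch of the lookup-table idea, with hypothetical helper names. One table is built per note range from as many odd harmonics as fit under the Nyquist frequency at the top note of that range; the harmonic counts quoted above are guesses, whereas this helper derives a count from the Nyquist limit alone, so its numbers will differ. Reading the table uses simple linear interpolation.

```python
import math

def harmonics_for_top_note(top_freq, sample_rate=44100.0):
    """How many odd harmonics (1, 3, 5, ...) of top_freq stay below Nyquist."""
    nyquist = sample_rate / 2.0
    count = 0
    while (2 * count + 1) * top_freq < nyquist:
        count += 1
    return count

def build_square_table(n_harmonics, size=1024):
    """One period of a square wave built from n_harmonics odd sine harmonics."""
    table = []
    for i in range(size):
        x = 2.0 * math.pi * i / size
        partial_sum = sum(math.sin((2 * j + 1) * x) / (2 * j + 1)
                          for j in range(n_harmonics))
        table.append(4.0 / math.pi * partial_sum)
    return table

def read_table(table, phase):
    """Linearly interpolated table read; phase is measured in cycles."""
    pos = (phase - math.floor(phase)) * len(table)
    i = int(pos) % len(table)          # guard against pos rounding up to len(table)
    frac = pos - int(pos)
    nxt = table[(i + 1) % len(table)]
    return table[i] + frac * (nxt - table[i])
```

At playback time you would pick the table whose range contains the note and step `phase` by freq / sample_rate per sample.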
There may be other methods to consider. The wikipedia entry on Aliasing describes a technique it refers to as "bandpass" that seems to be intentionally using aliasing. I don't know what that is about or how it relates to the article you cite.
The Soundpipe library has the concept of a frequency table, which is a data structure that holds a precomputed waveform such as a sine. You can initialize the frequency table with the desired waveform and play it through an oscillator. There is even a module named oscmorph which allows you to morph between two or more wavetables.
This is an example of how to generate a sine wave, taken from Soundpipe's documentation.
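/* UserData (holding the osc and ft pointers) and the write_osc callback
   are assumed to be defined above, as in the Soundpipe example. */
int main() {
    UserData ud;
    sp_data *sp;
    sp_create(&sp);
    /* Allocate a 2048-sample table and an oscillator. */
    sp_ftbl_create(sp, &ud.ft, 2048);
    sp_osc_create(&ud.osc);
    /* Fill the table with one sine period and attach it to the oscillator. */
    sp_gen_sine(sp, ud.ft);
    sp_osc_init(sp, ud.osc, ud.ft);
    ud.osc->freq = 500;
    /* Render 5 seconds at 44100 samples per second. */
    sp->len = 44100 * 5;
    sp_process(sp, &ud, write_osc);
    /* Clean up. */
    sp_ftbl_destroy(&ud.ft);
    sp_osc_destroy(&ud.osc);
    sp_destroy(&sp);
    return 0;
}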
I'm starting a project in Python where I need to develop a pitch-detection system. Basically, what I have to do is record a sound coming from a guitar string and then identify the tone of that sound.
I have read and searched through websites (including Stack Overflow) to understand the main ideas behind important things like the FFT, the time domain, the frequency domain, harmonics, pitch detection algorithms, octave errors and so on.
After my research I found that I could use the HPS (Harmonic Product Spectrum) algorithm, which is a frequency-domain approach. That means that I have to (in general steps):
Record the sound from the guitar (avoiding external noise).
Use an FFT function to transform the audio from the time domain to the frequency domain (that's what the FFT does).
After I get that data (an array), use HPS to find the highest peak, which will be the tone of the string.
My problem starts in the last step. I have read the equation of the HPS and some lectures about it, but I still can't understand it well enough to develop my own function.
Am I missing something, or is there something I think I understand but don't?
I just can't find a way to program my own HPS algorithm.
In the HPS question here:
How to get the fundamental frequency using Harmonic Product Spectrum? ,
the number of harmonics considered is 5 (R = 5); and the 5 harmonic spectrums are in hps2 thru hps5 (plus the original FFT spectrum) after downsampling by sequential harmonic ratios.
Then the 5 downsampled spectrums are summed.
Then the entire HPS summing array length is searched to find where the peak or maxima in the summed 5 harmonics is located.
The downsampling and search for the optimal HPS estimate might not be done optimally in that example. But that's a different Q&A (some of which is already in the answers to the above SO question).
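For reference, here is a minimal sketch of HPS that operates on an already computed magnitude spectrum (one value per FFT bin). It is not the code from the linked question: the names are made up, and it uses the classic formulation that multiplies the downsampled spectra, which is equivalent to summing their logarithms.

```python
def harmonic_product_spectrum(magnitudes, n_harmonics=5):
    """hps[k] = product of magnitudes[k * r] for r = 1 .. n_harmonics."""
    n_bins = len(magnitudes) // n_harmonics   # keeps k * r inside the spectrum
    hps = []
    for k in range(n_bins):
        product = 1.0
        for r in range(1, n_harmonics + 1):
            product *= magnitudes[k * r]      # spectrum downsampled by factor r
        hps.append(product)
    return hps

def hps_peak_bin(magnitudes, n_harmonics=5, min_bin=1):
    """Bin index where the harmonic product is largest (bin 0, the DC bin, is skipped)."""
    hps = harmonic_product_spectrum(magnitudes, n_harmonics)
    return max(range(min_bin, len(hps)), key=lambda k: hps[k])

def bin_to_hz(bin_index, sample_rate, fft_size):
    """Convert an FFT bin index to a frequency in Hz."""
    return bin_index * sample_rate / fft_size
```

A bin that has energy at itself and at its integer multiples gets a large product, even when one of the harmonics is stronger than the fundamental.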
I've done this before in a few ways (either an FFT, which works in the frequency domain, or autocorrelation and AMDF, which work in the time domain).
Personally, autocorrelation is my favourite, since it's simple and clear to implement, and in your use case (analyzing guitar strings) it worked with 100% accuracy for me. So I can recommend it to you.
I've shared my code before and you can find it fully explained on the following link:
Android: Finding fundamental frequency of audio input
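The linked code is for Android; as a language-neutral illustration, a bare-bones autocorrelation pitch estimator might look like the sketch below. The function name, the default frequency range and the per-lag normalisation are my own assumptions, not taken from the linked answer.

```python
import math

def autocorrelation_pitch(samples, sample_rate, f_min=60.0, f_max=1000.0):
    """Estimate the fundamental (Hz) as sample_rate / lag of the strongest autocorrelation."""
    n = len(samples)
    min_lag = int(sample_rate / f_max)                 # shortest period considered
    max_lag = min(int(sample_rate / f_min), n - 1)     # longest period considered
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Mean product of the signal with a copy of itself shifted by `lag`.
        value = sum(samples[i] * samples[i + lag] for i in range(n - lag)) / (n - lag)
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag
```

The resolution is limited to whole-sample lags, so a real tuner would probably interpolate around the best lag.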
While most of the other questions here are regarding determining how to know which notes comprise a chord, I am asking on a slightly different point.
How would you be able to determine whether a sound played is a single note or a chord? I've tried searching for some papers, but so far I have only seen papers tackling how to detect the notes of a chord rather than differentiating whether the sound produced was a single note or a chord.
Thanks!
You would need to do some kind of pattern matching on the power spectrum. For a single note you will see the fundamental + multiple harmonics, all of which are at integer multiples of the fundamental frequency of course. For a chord, e.g. a simple major chord such as C major, which has notes C, E and G, you'll get 3 fundamentals + harmonics of each. Some of the harmonics from the different fundamentals will coincide (due to the almost rational integer ratio between the notes, which is what makes the chord sound "good"), however there will still be intervals between frequency components which are not just straight multiples, and it's the pattern of these that really determines the nature of the chord. It might be a good candidate for some kind of classifier or neural net.
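As a very crude starting point for that kind of pattern matching, the sketch below takes a list of peak frequencies already extracted from the power spectrum and checks whether they are all close to integer multiples of the lowest one. The function name and the tolerance are assumptions for illustration, and it presumes the lowest peak really is the fundamental, which a proper classifier would not take for granted.

```python
def looks_like_single_note(peak_freqs, tolerance=0.03):
    """True if every peak is near an integer multiple of the lowest peak.

    peak_freqs must be a non-empty list of peak frequencies in Hz.
    """
    fundamental = min(peak_freqs)
    for f in peak_freqs:
        ratio = f / fundamental
        if abs(ratio - round(ratio)) > tolerance:
            return False   # a component that is not a harmonic of the lowest peak
    return True
```

For example, the peaks of a C major chord include E and G, whose ratios to C (about 1.26 and 1.5) are far from integers.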
If it sounds like a chord to you, that is because your brain is capable of distinguishing the harmonics of the chord.
So when you listen to a chord from a distance, the strings are mixed together so that the general note of the chord is heard; it's as if you were compressing the sound from many channels into one.
If you record in good enough quality, you should be able to split your sound into different bands determined by the notes you are trying to pick up, e.g. drop-D or normal tuning.
Try to do the process in a sound editor before trying to tackle it as a program.
You can find any single frequency of any instrument or even noise by using what's called a Fourier transform. It is a mathematical process that decomposes a signal into its frequency components, separating out each and every tone in the sample you provide. This is similar to how scientists study the sun and other stars, looking at all the frequency information to see what elements are present in what quantities. In my master's thesis, I used what's called an FFT, or fast Fourier transform.
You can separate harmonics from pure tones, and much more with the FFT. You will need to use many FFT iterations because you really can't wait for a FFT to decode 'Stairway to Heaven'. Look locally at smaller snippets.
You can find software to do this for you at many places, and you can check out Wolfram Alpha and similar websites for apps and code to do this.
Without any user interaction, how would a program identify what type of waveform is present in a recording from an ADC?
For the sake of this question: triangle, square, sine, half-sine, or sawtooth waves of constant frequency. Level and frequency are arbitrary, and they will have noise, small amounts of distortion, and other imperfections.
I'll propose a few (naive) ideas, too, and you can vote them up or down.
You definitely want to start by taking an autocorrelation to find the fundamental.
With that, take one period (approximately) of the waveform.
Now take a DFT of that signal, and immediately compensate for the phase shift of the first bin (the first bin being the fundamental, your task will be simpler if all phases are relative).
Now normalise all the bins so that the fundamental has unity gain.
Now compare and contrast the rest of the bins (representing the harmonics) against a set of pre-stored waveshapes that you're interested in testing for. Accept the closest, and reject overall if it fails to meet some threshold for accuracy determined by measurements of the noisefloor.
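The steps above could be sketched roughly as follows. This is a simplified version under my own assumptions: the input is already exactly one period, only harmonic magnitudes are compared (so the phase compensation step is skipped), and the templates are the textbook amplitude ratios for each shape. All names and the error threshold are hypothetical.

```python
import cmath
import math

N_HARMONICS = 7
HARMONIC_NUMBERS = range(1, N_HARMONICS + 1)

# Ideal harmonic magnitudes relative to the fundamental.
TEMPLATES = {
    "sine":     [1.0 if k == 1 else 0.0 for k in HARMONIC_NUMBERS],
    "square":   [1.0 / k if k % 2 else 0.0 for k in HARMONIC_NUMBERS],
    "sawtooth": [1.0 / k for k in HARMONIC_NUMBERS],
    "triangle": [1.0 / (k * k) if k % 2 else 0.0 for k in HARMONIC_NUMBERS],
}

def harmonic_magnitudes(period):
    """DFT magnitudes of bins 1..N_HARMONICS, normalised so the fundamental is 1.

    Assumes the fundamental bin is non-zero.
    """
    n = len(period)
    mags = []
    for k in HARMONIC_NUMBERS:
        bin_k = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(period))
        mags.append(abs(bin_k))
    return [m / mags[0] for m in mags]

def classify_by_harmonics(period, max_error=0.05):
    """Name of the closest template, or None if nothing is close enough."""
    mags = harmonic_magnitudes(period)
    errors = {name: sum((m - t) ** 2 for m, t in zip(mags, tmpl))
              for name, tmpl in TEMPLATES.items()}
    best = min(errors, key=errors.get)
    return best if errors[best] <= max_error else None
```

In practice `max_error` would be set from measurements of the noise floor, as suggested above.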
Do an FFT, find the odd and even harmonic peaks, and compare the rate at which they decrease to a library of peak ratios for common waveforms.
Perform an autocorrelation to find the fundamental frequency, measure the RMS level, find the first zero-crossing, and then try subtracting common waveforms at that frequency, phase, and level. Whichever cancels out the best (and more than some threshold) wins.
This answer presumes no noise and that this is a simple academic exercise.
In the time domain, take the sample-by-sample difference of the waveform. Histogram the results. If the distribution has a sharply defined peak (mode) at zero, it is a square wave. If the distribution has a sharply defined peak at a positive value, it is a sawtooth. If the distribution has two sharply defined peaks, one negative and one positive, it is a triangle. If the distribution is broad and is peaked at either side, it is a sine wave.
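A noise-free sketch of this histogram test is below. The thresholds (`dominant_share`, `decimals`) and the function name are arbitrary choices for illustration. It treats a single dominant non-zero difference of either sign as a sawtooth, so a falling ramp also counts, and anything without one or two dominant values falls through to "sine".

```python
from collections import Counter

def classify_by_differences(samples, dominant_share=0.8, decimals=6):
    """Classify a clean waveform from the histogram of its first differences."""
    # Rounding merges differences that are equal up to floating-point error.
    diffs = [round(b - a, decimals) for a, b in zip(samples, samples[1:])]
    counts = Counter(diffs).most_common(2)
    total = len(diffs)
    top_value, top_count = counts[0]
    if top_count / total >= dominant_share:
        # One sharp peak: at zero for a square wave, elsewhere for a sawtooth.
        return "square" if top_value == 0 else "sawtooth"
    if len(counts) == 2:
        second_value, second_count = counts[1]
        two_peaks = (top_count + second_count) / total >= dominant_share
        if two_peaks and top_value * second_value < 0:
            return "triangle"   # one positive slope and one negative slope
    return "sine"               # broad distribution of differences
```

With noise, the exact-value histogram would need proper binning instead of rounding.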
arm yourself with more information...
I am assuming that you already know that a theoretically perfect sine wave has no harmonic partials (ie only a fundamental)... but since you are going through an ADC you can throw the idea of a theoretically perfect sine wave out the window... you have to fight against aliasing and determining what are "real" partials and what are artifacts... good luck.
the following information comes from this link about csound.
(*) A sawtooth wave contains (theoretically) an infinite number of harmonic partials, each in the ratio of the reciprocal of the partial number. Thus, the fundamental (1) has an amplitude of 1, the second partial 1/2, the third 1/3, and the nth 1/n.
(**) A square wave contains (theoretically) an infinite number of harmonic partials, but only odd-numbered harmonics (1,3,5,7,...) The amplitudes are in the ratio of the reciprocal of the partial number, just as sawtooth waves. Thus, the fundamental (1) has an amplitude of 1, the third partial 1/3, the fifth 1/5, and the nth 1/n.
I think that all of these answers so far are quite bad (including my own previous...)
after having thought the problem through a bit more I would suggest the following:
1) take a 1 second sample of the input signal (doesn't need to be so big, but it simplifies a few things)
2) over the entire second, count the zero-crossings. at this point you have the cps (cycles per second) and know the frequency of the oscillator. (in case that's something you wanted to know)
3) now take a smaller segment of the sample to work with: take precisely 7 zero-crossings worth. (so your work buffer should now, if visualized, look like one of the graphical representations you posted with the original question.) use this small work buffer to perform the following tests. (normalizing the work buffer at this point could make life easier)
4) test for square-wave: zero crossings for a square wave are always very large differences, look for a large signal delta followed by little to no movement until the next zero crossing.
5) test for saw-wave: similar to square-wave, but a large signal delta will be followed by a linear constant signal delta.
6) test for triangle-wave: linear constant (small) signal deltas. find the peaks, divide by the distance between them and calculate what the triangle wave should look like (ideally) now test the actual signal for deviance. set a deviance tolerance threshold and you can determine whether you are looking at a triangle or a sine (or something parabolic).
First find the base frequency and the phase. You can do that with an FFT. Normalize the samples. Then, for each waveform you want to test, subtract from each sample the corresponding sample of that waveform (generated at the same frequency and phase). Square the differences, add them all up and divide by the number of samples. The waveform that gives the smallest number is the one you seek.
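A minimal sketch of that comparison, assuming the frequency and phase have already been estimated elsewhere. The waveform definitions and their phase conventions (phase measured in cycles, each shape defined on a period of 1) are my own choices for the example.

```python
import math

def _frac(u):
    """Fractional part of u, in [0, 1)."""
    return u - math.floor(u)

# Candidate shapes as functions of position u, measured in cycles.
WAVEFORMS = {
    "sine":     lambda u: math.sin(2.0 * math.pi * u),
    "square":   lambda u: 1.0 if _frac(u) < 0.5 else -1.0,
    "sawtooth": lambda u: 2.0 * _frac(u) - 1.0,
    "triangle": lambda u: 1.0 - 4.0 * abs(_frac(u) - 0.5),
}

def mean_squared_error(samples, shape, freq, phase, sample_rate):
    """Average squared difference between the samples and one candidate shape."""
    total = 0.0
    for i, x in enumerate(samples):
        total += (x - shape(freq * i / sample_rate + phase)) ** 2
    return total / len(samples)

def best_matching_waveform(samples, freq, phase, sample_rate):
    """Normalise to unit peak, then return the shape with the smallest error."""
    peak = max(abs(x) for x in samples) or 1.0   # avoid dividing by zero on silence
    normalized = [x / peak for x in samples]
    errors = {name: mean_squared_error(normalized, shape, freq, phase, sample_rate)
              for name, shape in WAVEFORMS.items()}
    return min(errors, key=errors.get)
```

The result is only as good as the frequency and phase estimates; a small phase error penalises the square and sawtooth candidates most, because of their jumps.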
Wikipedia's Wavelet article contains this text:
The discrete wavelet transform is also less computationally complex, taking O(N) time as compared to O(N log N) for the fast Fourier transform. This computational advantage is not inherent to the transform, but reflects the choice of a logarithmic division of frequency, in contrast to the equally spaced frequency divisions of the FFT.
Does this imply that there's also an FFT-like algorithm that uses a logarithmic division of frequency instead of linear? Is it also O(N)? This would obviously be preferable for a lot of applications.
Yes. Yes. No.
It is called the Logarithmic Fourier Transform. It has O(n) time. However it is useful for functions which decay slowly with increasing domain/abscissa.
Referring back to the Wikipedia article:
The main difference is that wavelets
are localized in both time and
frequency whereas the standard Fourier
transform is only localized in
frequency.
So if you can be localized only in time (or space, pick your interpretation of the abscissa) then Wavelets (or discrete cosine transform) are a reasonable approach. But if you need to go on and on and on, then you need the fourier transform.
Read more about LFT at http://homepages.dias.ie/~ajones/publications/28.pdf
Here is the abstract:
We present an exact and analytical expression for the Fourier transform of a function that has been sampled logarithmically. The procedure is significantly more efficient computationally than the fast Fourier transformation (FFT) for transforming functions or measured responses which decay slowly with increasing abscissa value. We illustrate the proposed method with an example from electromagnetic geophysics, where the scaling is often such that our logarithmic Fourier transform (LFT) should be applied. For the example chosen, we are able to obtain results that agree with those from an FFT to within 0.5 per cent in a time that is a factor of 1.0e2 shorter. Potential applications of our LFT in geophysics include conversion of wide-band electromagnetic frequency responses to transient responses, glacial loading and unloading,
aquifer recharge problems, normal mode and earth tide studies in seismology, and impulsive shock wave modelling.
EDIT: After reading up on this I think this algorithm is not really useful for this question, I will give a description anyway for other readers.
There is also Filon's algorithm, a method based on Filon's quadrature, which can be found in Numerical Recipes and in this [PhD thesis][1].
The timescale is log spaced, as is the resulting frequency scale.
This algorithm is used for data/functions which decayed to 0 in the observed time interval (which is probably not your case), a typical simple example would be an exponential decay.
If your data is given by the points (x_0, y_0), (x_1, y_1), ..., (x_N, y_N) and you want to calculate the spectrum A(f), where the frequency f runs from, say, f_min = 1/x_max to f_max = 1/x_min, log spaced:
The real part for each frequency f is then calculated by:
$$\operatorname{Re} A(f) = \sum_{i=0}^{N-1} \frac{y_{i+1} - y_i}{x_{i+1} - x_i} \cdot \frac{\cos(2\pi f x_{i+1}) - \cos(2\pi f x_i)}{(2\pi f)^2}$$
The imaginary part is:
$$\operatorname{Im} A(f) = \frac{y_0}{2\pi f} + \sum_{i=0}^{N-1} \frac{y_{i+1} - y_i}{x_{i+1} - x_i} \cdot \frac{\sin(2\pi f x_{i+1}) - \sin(2\pi f x_i)}{(2\pi f)^2}$$
[1] Blochowicz, Thomas: Broadband Dielectric Spectroscopy in Neat and Binary Molecular Glass Formers. University of Bayreuth, 2003, Chapter 3.2.3
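A direct transcription of the two sums into Python might look like this. It is only a sketch: the function names are invented, the data points are taken as two lists xs and ys, and, as noted above, it presumes the data has decayed to zero by the last point and that the first point is close to x = 0.

```python
import math

def filon_transform(xs, ys, f):
    """Real and imaginary parts of A(f) for piecewise-linear data (xs, ys)."""
    w = 2.0 * math.pi * f
    real = 0.0
    imag = ys[0] / w                      # boundary term of the imaginary part
    for i in range(len(xs) - 1):
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        real += slope * (math.cos(w * xs[i + 1]) - math.cos(w * xs[i])) / (w * w)
        imag += slope * (math.sin(w * xs[i + 1]) - math.sin(w * xs[i])) / (w * w)
    return real, imag

def log_spaced(start, stop, count):
    """count points from start to stop with a constant ratio between neighbours."""
    ratio = (stop / start) ** (1.0 / (count - 1))
    return [start * ratio ** i for i in range(count)]
```

For an exponential decay y = exp(-x), the results can be checked against the known cosine and sine transforms, 1 / (1 + w^2) and w / (1 + w^2) with w = 2*pi*f.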
To do what you want, you need to measure with different time windows, which means the lower frequencies are updated least often (inversely proportional to powers of 2).
Check FPPO here:
https://www.rationalacoustics.com/files/FFT_Fundamentals.pdf
This means that higher frequencies will update more often. You would always average (a moving average is good), but you can also let it move faster. Of course, if you plan on using the inverse FFT, you don't want any of this. Also, better accuracy (smaller bandwidth) at lower frequencies means these need to update much more slowly, e.g. with 16k windows (about 1/3 s).
Yeah, a low-frequency signal naturally varies slowly, so of course you need a lot of time to detect it. This is not a problem that math can fix. It's a natural trade-off: you can't have both high accuracy at low frequencies and a fast response.
I think the link I provide will clarify some of your options...7 years after you asked the question, unfortunately.