I'm trying to write an audio application.
I can play a cin wave from a frequency of 20 to 20K to hear sounds. my question is how can i convert frequencies to keyboard notes in order to create a virtual keyboard (or piano) ? is there some kind of formula to achieve this ?
The programming language that I use is not important because I don't want to use other tools that calculate it for me. i want to write it myself so i need to understand the math behind it. thanks
update
i found the following url: http://www.reverse-engineering.info/Audio/bwl_eq_info.pdf
that contains the octave prequency chart. do i need to store that list or is there a formula that can be used instead ?
There are a few different ways to tune instruments. The most commonly used for pianos is the 12 tone equal temperament, a formula for which can be found here. The idea is that each pair of adjacent notes has the same frequency ratio.
See also equal temperament on Wikipedia.
You can calculate frequency of a tone as
f = 440 * exp(x*ln(2)/12)
where x is number of semitones above A in the middle of the piano keyboard.
First, you need to know about A440. This is the "standard" pitch to tune everything else against.
Double the frequency to raise an octave; halve the frequency to drop an octave. It's clear from this that the tones are logarithmic relative to the frequencies.
There are multiple systems for deciding where on the logarithmic line the rest of the notes fall. A straightforward approach is to divide the semitones geometrically along the logarithmic scale (which is the approach xofon's answer uses), but there may be better ways.
full reference of P2F F2P conversion functions. i use 69 instead of 57 though.
http://musicdsp.org/showone.php?id=125
Related
For a project of mine I am working with sampled sound generation and I need to create various waveforms at various frequencies. When the waveform is sinusoidal, everything is fine, but when the waveform is rectangular, there is trouble: it sounds as if it came from the eighties, and as the frequency increases, the notes sound wrong. On the 8th octave, each note sounds like a random note from some lower octave.
The undesirable effect is the same regardless of whether I use either one of the following two approaches:
The purely mathematical way of generating a rectangular waveform as sample = sign( secondsPerHalfWave - (timeSeconds % secondsPerWave) ) where secondsPerWave = 1.0 / wavesPerSecond and secondsPerHalfWave = secondsPerWave / 2.0
My preferred way, which is to describe one period of the wave using line segments and to interpolate along these lines. So, a rectangular waveform is described (regardless of sampling rate and regardless of frequency) by a horizontal line from x=0 to x=0.5 at y=1.0, followed by another horizontal line from x=0.5 to x=1.0 at y=-1.0.
From what I gather, the literature considers these waveform generation approaches "naive", resulting in "aliasing", which is the cause of all the undesirable effects.
What this all practically translates to when I look at the generated waveform is that the samples-per-second value is not an exact multiple of the waves-per-second value, so each wave does not have an even number of samples, which in turn means that the number of samples at level 1.0 is often not equal to the number of samples at level -1.0.
I found a certain solution here: https://www.nayuki.io/page/band-limited-square-waves which even includes source code in Java, and it does indeed sound awesome: all undesirable effects are gone, and each note sounds pure and at the right frequency. However, this solution is entirely unsuitable for me, because it is extremely computationally expensive. (Even after I have replaced sin() and cos() with approximations that are ten times faster than Java's built-in functions.) Besides, when I look at the resulting waveforms they look awfully complex, so I wonder whether they can legitimately be called rectangular.
So, my question is:
What is the most computationally efficient method for the generation of periodic waveforms such as the rectangular waveform that does not suffer from aliasing artifacts?
Examples of what the solution could entail:
The computer audio problem of generating correct sample values at discrete time intervals to describe a sound wave seems to me somewhat related to the computer graphics problem of generating correct integer y coordinates at discrete integer x coordinates for drawing lines. The Bresenham line generation algorithm is extremely efficient, (even if we disregard for a moment the fact that it is working with integer math,) and it works by accumulating a certain error term which, at the right time, results in a bump in the Y coordinate. Could some similar mechanism perhaps be used for calculating sample values?
The way sampling works is understood to be as reading the value of the analog signal at a specific, infinitely narrow point in time. Perhaps a better approach would be to consider reading the area of the entire slice of the analog signal between the last sample and the current sample. This way, sampling a 1.0 right before the edge of the rectangular waveform would contribute a little to the sample value, while sampling a -1.0 considerable time after the edge would contribute a lot, thus naturally yielding a point which is between the two extreme values. Would this solve the problem? Does such an algorithm exist? Has anyone ever tried it?
Please note that I have posted this question here as opposed to dsp.stackexchange.com because I do not want to receive answers with preposterous jargon like band-limiting, harmonics and low-pass filters, lagrange interpolations, DC compensations, etc. and I do not want answers that come from the purely analog world or the purely theoretical outer space and have no chance of ever receiving a practical and efficient implementation using a digital computer.
I am a programmer, not a sound engineer, and in my little programmer's world, things are simple: I have an array of samples which must all be between -1.0 and 1.0, and will be played at a certain rate (44100 samples per second.) I have arithmetic operations and trigonometric functions at my disposal, I can describe lines and use simple linear interpolation, and I need to generate the samples extremely efficiently because the generation of a dozen waveforms simultaneously and also the mixing of them together may not consume more than 1% of the total CPU time.
I'm not sure but you may have a few of misconceptions about the nature of aliasing. I base this on your putting the term in quotes, and from the following quote:
What this all practically translates to when I look at the generated
waveform is that the samples-per-second value is not an exact multiple
of the waves-per-second value, so each wave does not have an even
number of samples, which in turn means that the number of samples at
level 1.0 is often not equal to the number of samples at level -1.0.
The samples/sec and waves/sec don't have to be exact multiples at all! One can play back all pitches below the Nyquist. So I'm not clear what your thinking on this is.
The characteristic sound of a square wave arises from the presence of odd harmonics, e.g., with a note of 440 (A5), the square wave sound could be generated by combining sines of 440, 1320, 2200, 3080, 3960, etc. progressing in increments of 880. This begs the question, how many odd harmonics? We could go to infinity, theoretically, for the sharpest possible corner on our square wave. If you simply "draw" this in the audio stream, the progression will continue well beyond the Nyquist number.
But there is a problem in that harmonics that are higher than the Nyquist value cannot be accurately reproduced digitally. Attempts to do so result in aliasing. So, to get as good a sounding square wave as the system is able to produce, one has to avoid the higher harmonics that are present in the theoretically perfect square wave.
I think the most common solution is to use a low-pass filtering algorithm. The computations are definitely more cpu-intensive than just calculating sine waves (or doing FM synthesis, which was my main interest). I am also weak on the math for DSP and concerned about cpu expense, and so, avoided this approach for long time. But it is quite viable and worth an additional look, imho.
Another approach is to use additive synthesis, and include as many sine harmonics as you need to get the tonal quality you want. The problem then is that the more harmonics you add, the more computation you are doing. Also, the top harmonics must be kept track of as they limit the highest note you can play. For example if using 10 harmonics, the note 500Hz would include content at 10500 Hz. That's below the Nyquist value for 44100 fps (which is 22050 Hz). But you'll only be able to go up about another octave (doubles everything) with a 10-harmonic wave and little more before your harmonic content goes over the limit and starts aliasing.
Instead of computing multiple sines on the fly, another solution you might consider is to instead create a set of lookup tables (LUTs) for your square wave. To create the values in the table, iterate through and add the values from the sine harmonics that will safely remain under the Nyquist for the range in which you use the given table. I think a table of something like 1024 values to encode a single period could be a good first guess as to what would work.
For example, I am guestimating, but the table for the octave C4-C5 might use 10 harmonics, the table for C5-C6 only 5, the table for C3-C4 might have 20. I can't recall what this strategy/technique is called, but I do recall it has a name, it is an accepted way of dealing with the situation. Depending on how the transitions sound and the amount of high-end content you want, you can use fewer or more LUTs.
There may be other methods to consider. The wikipedia entry on Aliasing describes a technique it refers to as "bandpass" that seems to be intentionally using aliasing. I don't know what that is about or how it relates to the article you cite.
The Soundpipe library has the concept of a frequency table, which is a data structure that holds a precomputed waveform such as a sine. You can initialize the frequency table with the desired waveform and play it through an oscilator. There is even a module named oscmorph which allows you to morph between two or more wavetables.
This is an example of how to generate a sine wave, taken from Soundpipe's documentation.
int main() {
UserData ud;
sp_data *sp;
sp_create(&sp);
sp_ftbl_create(sp, &ud.ft, 2048);
sp_osc_create(&ud.osc);
sp_gen_sine(sp, ud.ft);
sp_osc_init(sp, ud.osc, ud.ft);
ud.osc->freq = 500;
sp->len = 44100 * 5;
sp_process(sp, &ud, write_osc);
sp_ftbl_destroy(&ud.ft);
sp_osc_destroy(&ud.osc);
sp_destroy(&sp);
return 0;
}
Hello kind people of the audio computing world,
I have an array of samples that respresent a recording. Let us say that it is 5 seconds at 44100Hz. How would I play this back at an increased pitch? And is it possible to increase and decrease the pitch dynamically? Like have the pitch slowly increase to double the speed and then back down.
In other words I want to take a recording and play it back as if it is being 'scratched' by a d.j.
Pseudocode is always welcomed. I will be writing this up in C.
Thanks,
EDIT 1
Allow me to clarify my intentions. I want to keep the playback at 44100Hz and so therefore I need to manipulate the samples before playback. This is also because I would want to mix the audio that has an increased pitch with audio that is running at a normal rate.
Expressed in another way, maybe I need to shrink the audio over the same number of samples somehow? That way when it is played back it will sound faster?
EDIT 2
Also, I would like to do this myself. No libraries please (unless you feel I could pick through the code and find something interesting).
EDIT 3
A sample piece of code written in C that takes 2 arguments (array of samples and pitch factor) and then returns an array of the new audio would be fantastic!
PS I've started a bounty on this not because I don't think the answers already given aren't valid. I just thought it would be good to get more feedback on the subject.
AWARD OF BOUNTY
Honestly I wish I could distribute the bounty over several different answers as they were quite a few that I thought were super helpful. Special shoutout to Daniel for passing me some code and AShelly and Hotpaw2 for putting in such detailed responses.
Ultimately though I used an answer from another SO question referenced by datageist and so the award goes to him.
Thanks again everyone!
Take a look at the "Elephant" paper in Nosredna's answer to this (very similar) SO question:
How do you do bicubic (or other non-linear) interpolation of re-sampled audio data?
Sample implementations are provided starting on page 37, and for reference, AShelly's answer corresponds to linear interpolation (on that same page). With a little tweaking, any of the other formulas in the paper could be plugged into that framework.
For evaluating the quality of a given interpolation method (and understanding the potential problems with using "cheaper" schemes), take a look at this page:
http://www.discodsp.com/highlife/aliasing/
For more theory than you probably want to deal with (with source code), this is a good reference as well:
https://ccrma.stanford.edu/~jos/resample/
One way is to keep a floating point index into the original wave, and mix interpolated samples into the output wave.
//Simulate scratching of `inwave`:
// `rate` is the speedup/slowdown factor.
// result mixed into `outwave`
// "Sample" is a typedef for the raw audio type.
void ScratchMix(Sample* outwave, Sample* inwave, float rate)
{
float index = 0;
while (index < inputLen)
{
int i = (int)index;
float frac = index-i; //will be between 0 and 1
Sample s1 = inwave[i];
Sample s2 = inwave[i+1];
*outwave++ += s1 + (s2-s1)*frac; //do clipping here if needed
index+=rate;
}
}
If you want to change rate on the fly, you can do that too.
If this creates noisy artifacts when rate > 1, try replacing *outwave++ += s1 + (s2-s1)*frac; with this technique (from this question)
*outwave++ = InterpolateHermite4pt3oX(inwave+i-1,frac);
where
public static float InterpolateHermite4pt3oX(Sample* x, float t)
{
float c0 = x[1];
float c1 = .5F * (x[2] - x[0]);
float c2 = x[0] - (2.5F * x[1]) + (2 * x[2]) - (.5F * x[3]);
float c3 = (.5F * (x[3] - x[0])) + (1.5F * (x[1] - x[2]));
return (((((c3 * t) + c2) * t) + c1) * t) + c0;
}
Example of using the linear interpolation technique on "Windows Startup.wav" with a factor of 1.1. The original is on top, the sped-up version is on the bottom:
It may not be mathematically perfect, but it sounds like it should, and ought to work fine for the OP's needs..
Yes, it is possible.
But this is not a small amount of pseudo code. You are asking for a time pitch modification algorithm, which is a fairly large and complicated amount of DSP code for decent results.
Here's a Time Pitch stretching overview from DSP Dimensions. You can also Google for phase vocoder algorithms.
ADDED:
If you want to "scratch", as a DJ might do with an LP on a physical turntable, you don't need time-pitch modification. Scratching changes the pitch and the speed of play by the same amount (not independently as would require time-pitch modification).
And the resulting array won't be of the same length, but will be shorter or longer by the amont of pitch/speed change.
You can change the pitch, as well as make the sound play faster or slower by the same ratio, by just resampling the signal using properly filtered interpolation. Just move each sample point, instead of by 1.0, by floating point addition by your desired rate change, then filter and interpolate the data at that point. Interpolation using a windowed Sinc interpolation kernel, with a low-pass filter transition frequency below the lower of the original and interpolated local sample rate, will work fairly well. Searching for "windowed Sinc interpolation" on the web returns lots of suitable result.
You need an interpolation method that includes a low-pass filter, or else you will hear horrible aliasing noise. (The exception to this might be if your original sound file is already severely low-pass filtered a decade or more below the sample rate.)
If you want this done easily, see AShelly's suggestion [edit: as a matter of fact, try it first anyway]. If you need good quality, you basically need a phase vocoder.
The very basic idea of a phase vocoder is to find the frequencies that the sound consists of, change those frequencies as needed and resynthesize the sound. So a brutal simplification would be:
run FFT
change all frequencies by a factor
run inverse FFT
If you're going to implement this yourself, you definitely should read a thorough explanation of how a phase vocoder works. The algorithm really needs many more considerations than the three-step simplification above.
Of course, ready-made implementations exist, but from the question I gather you want to do this yourself.
To decrease and increase the pitch is as simple as playing the sample back at a lower or higher rate than 44.1kHz. This will produce the slower/faster record sound but you'll need to add the 'scratchiness' of real records.
This helped me with resampling, which is same thing you need just looked from the opposite side.
If you can't find code, ping me, I have a nice C routine for this.
I'm implementing a 'filter sweep' effect (I don't know if it's called like that). What I do is basically create a low-pass filter and make it 'move' along a certain frequency range.
To calculate the filter cut-off frequency at a given moment I use a user-provided linear function, which yields values between 0 and 1.
My first attempt was to directly map the values returned by the linear function to the range of frequencies, as in cf = freqRange * lf(x). Although it worked ok it looked as if the sweep ran much faster when moving through low frequencies and then slowed down during its way to the high frequency zone. I'm not sure why is this but I guess it's something to do with human hearing perceiving changes in frequency in a non-linear manner.
My next attempt was to move the filter's cut-off frequency in a logarithmic way. It works much better now but I still feel that the filter doesn't move at a constant perceived speed through the range of frequencies.
How should I divide the frequency space to obtain a constant perceived sweep speed?
Thanks in advance.
The frequency sweep effect you're referring to is likely a wah-wah filter, named for the ubiquitous wah-wah pedal.
We hear frequency in terms of octaves, and sweeping through octaves with a logarithmic scale is the way to linearize it. Not to sound dismissive, but it sounds like what you're doing is physically and mathematically correct. (You should spent as much time between 200 and 400 Hz as you do between 2000 and 4000 Hz, etc.) You just don't like how it sounds. And that's quite okay on both counts -- audio is highly subjective.
To mix things up a bit, one option would be to try the Bark scale, which is based on psychoacoustics and the structure of the ear. As I understand it, this is designed to spend equal amounts of time in each of your ear's internal "bandpass filters".
You could always try a quadratic or cubic function between 0 and 1. Audio potentiometers often use a few piecewise quadratic or cubic sections to get their mapping.
Winging it, but try this:
http://en.wikipedia.org/wiki/Physics_of_music#Scales "The following table shows the ratios between the frequencies of all the notes of the just major scale and the fixed frequency of the first note of the scale."
There is then a chart showing fractional values between 1 and 2, and if you tweak your timing to match, you may get what you wish. While the overall progression is still logarithmic, the stepping between each one should divide up into equal stepped 8ths (a bit jumpy).
Put another way, every half second adjust one note up. Each octave (I think) will cover twice the frequency range of the prior octave.
EDIT: Also, you'll find the frequencies here: http://en.wikipedia.org/wiki/Middle_C#Designation_by_octave (doesn't the programmer in you wish that C0 was exactly 16hz?)
As a follow-up to my previous question, if I want my smartphone application to detect a certain musical note, and I only need to know whether the incoming sound is that musical note or not, with a certain amount of fuzziness, to allow the note to be off-key by x cents.
Given that, is there a superior method over others for speed and accuracy? That is, by knowing that the note you are looking for is, say, a #C3, how best to tell if that note is present or not? I'm assuming that looking for a single note would be easier than separating out all waveforms, and then looking at the results for the fundamental frequency.
In the responses to my original question, one respondent suggested that autocorrelation might work well if you know that the notes are within a certain range. I wonder if autocorrelation would then work even better, if you only have to check for the presence or absence of a certain note (+/- x cents).
Those methods being:
Kiss FFT
FFTW
Discrete Wavelet Transform
autocorrelation
zero crossing analysis
octave-spaced filters
DWT
Any thoughts would be appreciated.
As you describe it, you just need to determine if a particular pitch is present. A very simple (fast) detector would just record the equivalent of one period of the waveform, then record another period and correlate them, like an oversimplified (single-lag) autocorrelation. If there's a high match, you know the waveform being recorded is repeating at around the same period, or a harmonic of it.
For instance, to detect 1 kHz, record 1 ms of audio (48 samples at 48 kHz), then record another 1 ms, and compare them (correlate = multiply all samples and sum). If they line up (correlation above some threshold), then you're listening to 1 kHz, 2 kHz, 3 kHz, or some other multiple. Doing several periods would give you more confidence on the match.
A true autocorrelation would tell you which harmonic, specifically, if that's important to you.
I've got a 44Khz audio stream from a CD, represented as an array of 16 bit PCM samples. I'd like to cut it down to an 11KHz stream. How do I do that? From my days of engineering class many years ago, I know that the stream won't be able to describe anything over 5500Hz accurately anymore, so I assume I want to cut everything above that out too. Any ideas? Thanks.
Update: There is some code on this page that converts from 48KHz to 8KHz using a simple algorithm and a coefficient array that looks like { 1, 4, 12, 12, 4, 1 }. I think that is what I need, but I need it for a factor of 4x rather than 6x. Any idea how those constants are calculated? Also, I end up converting the 16 byte samples to floats anyway, so I can do the downsampling with floats rather than shorts, if that helps the quality at all.
Read on FIR and IIR filters. These are the filters that use a coefficent array.
If you do a google search on "FIR or IIR filter designer" you will find lots of software and online-applets that does the hard job (getting the coefficients) for you.
EDIT:
This page here ( http://www-users.cs.york.ac.uk/~fisher/mkfilter/ ) lets you enter the parameters of your filter and will spit out ready to use C-Code...
You're right in that you need apply lowpass filtering on your signal. Any signal over 5500 Hz will be present in your downsampled signal but 'aliased' as another frequency so you'll have to remove those before downsampling.
It's a good idea to do the filtering with floats. There are fixed point filter algorithms too but those generally have quality tradeoffs to work. If you've got floats then use them!
Using DFT's for filtering is generally overkill and it makes things more complicated because dft's are not a contiuous process but work on buffers.
Digital filters generally come in two tastes. FIR and IIR. The're generally the same idea but IIF filters use feedback loops to achieve a steeper response with far less coefficients. This might be a good idea for downsampling because you need a very steep filter slope there.
Downsampling is sort of a special case. Because you're going to throw away 3 out of 4 samples there's no need to calculate them. There is a special class of filters for this called polyphase filters.
Try googling for polyphase IIR or polyphase FIR for more information.
Notice (in additions to the other comments) that the simple-easy-intuitive approach "downsample by a factor of 4 by replacing each group of 4 consecutive samples by the average value", is not optimal but is nevertheless not wrong, nor practically nor conceptually. Because the averaging amounts precisely to a low pass filter (a rectangular window, which corresponds to a sinc in frequency). What would be conceptually wrong is to just downsample by taking one of each 4 samples: that would definitely introduce aliasing.
By the way: practically any software that does some resampling (audio, image or whatever; example for the audio case: sox) takes this into account, and frequently lets you choose the underlying low-pass filter.
You need to apply a lowpass filter before you downsample the signal to avoid "aliasing". The cutoff frequency of the lowpass filter should be less than the nyquist frequency, which is half the sample frequency.
The "best" solution possible is indeed a DFT, discarding the top 3/4 of the frequencies, and performing an inverse DFT, with the domain restricted to the bottom 1/4th. Discarding the top 3/4ths is a low-pass filter in this case. Padding to a power of 2 number of samples will probably give you a speed benefit. Be aware of how your FFT package stores samples though. If it's a complex FFT (which is much easier to analyze, and generally has nicer properties), the frequencies will either go from -22 to 22, or 0 to 44. In the first case, you want the middle 1/4th. In the latter, the outermost 1/4th.
You can do an adequate job by averaging sample values together. The naïve way of grabbing samples four by four and doing an equal weighted average works, but isn't too great. Instead you'll want to use a "kernel" function that averages them together in a non-intuitive way.
Mathwise, discarding everything outside the low-frequency band is multiplication by a box function in frequency space. The (inverse) Fourier transform turns pointwise multiplication into a convolution of the (inverse) Fourier transforms of the functions, and vice-versa. So, if we want to work in the time domain, we need to perform a convolution with the (inverse) Fourier transform of box function. This turns out to be proportional to the "sinc" function (sin at)/at, where a is the width of the box in the frequency space. So at every 4th location (since you're downsampling by a factor of 4) you can add up the points near it, multiplied by sin (a dt) / a dt, where dt is the distance in time to that location. How nearby? Well, that depends on how good you want it to sound. It's common to ignore everything outside the first zero, for instance, or just take the number of points to be the ratio by which you're downsampling.
Finally there's the piss-poor (but fast) way of just discarding the majority of the samples, keeping just the zeroth, the fourth, and so on.
Honestly, if it fits in memory, I'd recommend just going the DFT route. If it doesn't use one of the software filter packages that others have recommended to construct the filter for you.
The process you're after called "Decimation".
There are 2 steps:
Applying Low Pass Filter on the data (In your case LPF with Cut Off at Pi / 4).
Downsampling (In you case taking 1 out of 4 samples).
There are many methods to design and apply the Low Pass Filter.
You may start here:
http://en.wikipedia.org/wiki/Filter_design
You could make use of libsamplerate to do the heavy lifting. Libsamplerate is a C API, and takes care of calculating the filter coefficients. You to select from different quality filters so that you can trade off quality for speed.
If you would prefer not to write any code, you could just use Audacity to do the sample rate conversion. It offers a powerful GUI, and makes use of libsamplerate for it's sample rate conversion.
I would try applying DFT, chopping 3/4 of the result and applying inverse DFT. I can't tell if it will sound good without actually trying tough.
I recently came across BruteFIR which may already do some of what you're interested in?
You have to apply low-pass filter (removing frequencies above 5500 Hz) and then apply decimation (leave every Nth sample, every 4th in your case).
For decimation, FIR, not IIR filters are usually employed, because they don't depend on previous outputs and therefore you don't have to calculate anything for discarded samples. IIRs, generally, depends on both inputs and outputs, so, unless a specific type of IIR is used, you'd have to calculate every output sample before discarding 3/4 of them.
Just googled an intro-level article on the subject: https://www.dspguru.com/dsp/faqs/multirate/decimation