Creating motor sound via FFT? - audio

I have a recording of the idle tone of a car. I want to make that sound accelerate and decelerate by changing its FFT.
How can I achieve this? I only know C and a little bit of C++.

start with your idle sound as an array of raw audio [array1] ( this is the payload of a WAV file in PCM format of the car idling ), which will be in the time domain
feed this array1 into an FFT call, which will return a new array [array2], the frequency-domain representation of the same underlying data as array1 ... in this new array2, element zero represents zero Hertz, and the frequency increment (incr_freq) separating each element is defined by the source sound array1 parameters as per
incr_freq := sample_rate / number_of_samples_in_array1
... the value of each element of array2 will be a complex number from which you can calc the magnitude and phase of the given freq ... to be clear, frequency with regard to array2 is derived from element position, starting with element zero which is the DC bias and can be ignored ... knowing the frequency increment above (incr_freq), let's show the first few elements of array2
complex_number_0 := array2[0] // element 0, the DC bias - ignore this element
// its frequency_0 = 0
complex_number_1 := array2[1] // element 1
// its frequency_1 = frequency_0 + incr_freq
complex_number_2 := array2[2] // element 2
// its frequency_2 = frequency_1 + incr_freq
now identify the top X magnitudes in array2 (nugget1) ... these are the dominant frequencies, the ones most responsible for capturing the essence of the car sound ... we save the array2 element values of those X elements for later ... we calc magnitude using the loop below across all elements of array2
for index_fft, curr_complex := range complex_fft { // index_fft is the frequency bin number
    curr_real := real(curr_complex) // pluck out the real portion of the complex number
    curr_imag := imag(curr_complex) // ditto for the imaginary part
    curr_mag := 2.0 * math.Sqrt(curr_real*curr_real+curr_imag*curr_imag) / number_of_samples_in_array1
    // ... more goodness here
}
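for concreteness, here is a minimal sketch of that step in Python/numpy ... the question mentions C/C++, so treat this as illustrative pseudocode rather than the final implementation, and note that idle.wav is a hypothetical mono recording of the idling car
import numpy as np
from scipy.io import wavfile

sample_rate, array1 = wavfile.read("idle.wav")   # hypothetical mono PCM idle recording
array1 = array1.astype(np.float64)

array2 = np.fft.rfft(array1)                     # frequency-domain representation
incr_freq = sample_rate / len(array1)            # Hz separating neighbouring bins

magnitudes = 2.0 * np.abs(array2) / len(array1)  # same formula as the loop above
magnitudes[0] = 0.0                              # ignore element 0, the DC bias
X = 5
top_bins = np.argsort(magnitudes)[-X:]           # indices of the X loudest bins (nugget1)
top_freqs = top_bins * incr_freq                 # their frequencies in Hz
print(top_freqs, magnitudes[top_bins])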
now feed the array2 FFT array into an inverse FFT call which will return an array [array3] once again in the time domain ( raw audio )
if you do not alter the data of array2, your array3 will be the same (to a first approximation) as array1 ... now jack up array2 to impart the acc or dec before sending it into that inverse FFT call
the secret sauce of how to alter array2 is left as an exercise ... my guess is to put the synthesis of array3 from array2 into a loop, where array3 gets immediately rendered to your speakers inside this loop (loop_secret_sauce) ... in that loop you increment (acc) or decrement (dec) the top X frequencies identified above as nugget1 ... meaning you shift the entire set of top X frequencies, as defined by their magnitude, as a whole ... give a non-linear aspect to this shift of the set of X frequencies ... possibly not only increment or dec the freq of this set but also muck about with their relative magnitudes, as well as introduce a wider swath of frequencies, in this loop
to give yourself traction in making this secret sauce, use as array1 several different recordings of the car when it's at idle or acc or dec, compare their array2, and use the diff between idle, acc, dec inside this sauce loop
here we drill down on mechanics ... when the source audio is idle, we iterate across its array2 and identify the top X elements of array2 with the greatest magnitude ... these X elements of array2 get saved into array top_mag_idle ... do the same for the source audio of acc and save it into array top_mag_acc ... critical step ... examine the difference between the elements stored in top_mag_idle versus top_mag_acc ... this transition between the elements of top_mag_idle and top_mag_acc is your secret sauce, which you will put into loop_secret_sauce ... to get concrete: when you loop across loop_secret_sauce and update array2 elements to reflect top_mag_idle, the audio will sound idle; when you continue looping across array2 to synthesize array3 and transition to updating array2 elements to reflect top_mag_acc, the sound will be of an accelerating car
perhaps to gain intuition on the secret sauce consider this ... imagine listening to a car at idle ... as with any complex system which generates audio, it will have a set of dominant frequencies, say 5 different frequencies with the greatest magnitude ( the loudest freqs ) ... similar to a pianist playing a chord on a piano where the shape of her hand and fingers remains static yet she is repeatedly tapping the keyboard ... now the car starts to accelerate ... the analogy here is she continues to repeatedly tap the keyboard with that same static hand and finger layout, yet now she slides her hand up to the right along the keyboard as she continues to tap ... in your code inside loop_secret_sauce the original set of freqs (top_mag_idle) will generate the idle car sound when you synthesize array3 from array2 ... then to implement acc you increment in unison all freqs in top_mag_idle and repeat the synthesis of array3 from array2, and this will give you the acc sound
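as a rough illustration of that piano-slide idea, here is a hedged numpy sketch that shifts every bin of array2 up or down before the inverse FFT ... the shift factor, the resynthesis loop, and the rendering to the speakers are all assumptions you would tune yourself, not a finished recipe
import numpy as np

def shift_spectrum(array2, shift_factor):
    # move each bin's energy to a higher (acc) or lower (dec) bin,
    # crudely emulating the pianist sliding her hand along the keyboard
    shifted = np.zeros_like(array2)
    for k in range(1, len(array2)):            # skip element 0, the DC bias
        new_k = int(round(k * shift_factor))
        if 0 < new_k < len(shifted):
            shifted[new_k] += array2[k]
    return shifted

# loop_secret_sauce, roughly: ramp the shift factor up to accelerate
# (array1/array2 as in the earlier sketch; audio output is up to you)
# for shift in np.linspace(1.0, 1.5, 50):      # 1.0 = idle, > 1.0 = acc
#     array3 = np.fft.irfft(shift_spectrum(array2, shift), n=len(array1))
#     ... hand array3 to your playback library here ...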
until you get this working I would only use mono ( one channel not stereo )
sounds like an interesting project ... have fun !!!

Related

How to fix the issue of plotting a 2D sine wave in python

I want to generate a 2D travelling sine wave. To do this, I've set the parameters for the plane wave and generated the wave at any time instant as follows:
import numpy as np
import random
import matplotlib.pyplot as plt
f = 10 # frequency
fs = 100 # sample frequency
Ts = 1/fs # sample period
t = np.arange(0,0.5, Ts) # time index
c = 50 # speed of wave
w = 2*np.pi *f # angular frequency
k = w/c # wave number
resolution = 0.02
x = np.arange(-5, 5, resolution)
y = np.arange(-5, 5, resolution)
dx = np.array(x); M = len(dx)
dy = np.array(y); N = len(dy)
[xx, yy] = np.meshgrid(x, y);
theta = np.pi / 4 # direction of propagation
kx = k* np.cos(theta)
ky = k * np.sin(theta)
So, the plane wave would be
plane_wave = np.sin(kx * xx + ky * yy - w * t[1])
plt.figure();
plt.imshow(plane_wave,cmap='seismic',origin='lower', aspect='auto')
that gives a smooth plane wave, as shown in the first attached figure. The sine wave's variation with time, plotted with plt.figure(); plt.plot(plane_wave[2,:]), is shown in the second.
However, when I append plane waves at different time instants, a discontinuity arises (figures 03 & 04), and I want to get rid of this problem.
I'm new to Python and any help will be highly appreciated. Thanks in advance.
arr = []
for count in range(len(t)):
    p = np.sin(kx * xx + ky * yy - w * t[count])  # plane wave
    arr.append(p)
arr = np.array(arr)
print(arr.shape)
pp,q,r = arr.shape
sig = np.reshape(arr, (-1, r))
print('The signal shape is :', sig.shape)
plt.figure(); plt.imshow(sig.transpose(),cmap='seismic',origin='lower', aspect='auto')
plt.xlabel('X'); plt.ylabel('Y')
plt.figure(); plt.plot(sig[2,:])
This is not so much a programming problem. It has more to do with the fact that you are using the physical quantities in a somewhat unusual way. Your plots are absolutely fine and correct.
What you seem to have misunderstood is the fact that you are talking about a 2D problem with a third dimension added for time. This is by no means wrong but if you try to append the snapshot of the 2D wave side-by-side you are using (again) the x spatial dimension to represent temporal variations. This leads to an inconsistency of the use of that coordinate axis. Now, to make this more intuitive, consider the two time instances separately. Does it not coincide with your intuition that all points on the 2D plane must have different amplitudes (unless of course the time has progressed by a multiple of the period of the wave)? This is the case indeed. Thus, when you try to append the two snapshots, a discontinuity is exhibited. In order to avoid that you have to either use a time step equal to one period, which I believe is of no practical use, or a constant time step that will make the phase of the wave on the left border of the image in the current time equal to the phase of the wave on the right border of the image in the previous time step. Yet, this will always be a constant time step, alternating the phase (on the edges of the image) between the two said values.
The same applies to the 1D case because you use the two coordinate axes to represent the wave (x is the x spatial dimension and y is used to represent the amplitude). This is what can be seen in your last plot.
Now, what would be the solution you may ask. The solution is provided by simple inspection of the mathematical formula of the wave function. In 2D, it is a scalar function of three variables (that is, takes as input three values and outputs one) and so you need at least four dimensions to represent it. Alas, we can't perceive a fourth spatial dimension, but this is not a problem in your case as the output of the function is represented with colors. Then there are three dimensions that could be used to represent the temporal evolution of your function. All you have to do is to create a 3D array where the third dimension represents time and all 2D snapshots will be stored in the first two dimensions.
When it comes to visual representation of the results you could either use some kind of waterfall plots where the z-axis will represent time or utilize the fourth dimension we can perceive, time that is, to create an animation of the evolution of the wave.
I am not very familiar with Python, so I will only provide a generic naive implementation. I am sure a lot of people here could provide some simplification and/or optimisation of the following snippet. I assume that everything in your first two blocks of code is available so changes have to be done only in the last block you present
arr = np.zeros((xx.shape[0], xx.shape[1], len(t)))  # Initialise the array to hold the temporal evolution of the snapshots
for i in range(len(t)):
    arr[:, :, i] = np.sin(kx * xx + ky * yy - w * t[i])
# Below you can plot the figures with any function you prefer or make an animation out of it
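For the animation route, a minimal sketch with matplotlib's FuncAnimation could look like the following (assuming arr and the variables from the snippets above are in scope):
import matplotlib.pyplot as plt
import matplotlib.animation as animation

fig, ax = plt.subplots()
im = ax.imshow(arr[:, :, 0], cmap='seismic', origin='lower', aspect='auto')

def update(i):
    # show the i-th temporal snapshot of the plane wave
    im.set_data(arr[:, :, i])
    return [im]

ani = animation.FuncAnimation(fig, update, frames=arr.shape[2], interval=50, blit=True)
plt.show()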

display signal from stereo mics

Are these 2 audio signals stereo?
(plots of the 1st signal and the 2nd signal)
hard to tell ... it would be easier to identify whether those are stereo if both curves were plotted together, with a different color per curve, on the same plot and zoomed in, so you can see whether the curves have similar yet slightly different shapes ... if you create one loop to iterate across each point of the curves and inside this loop print out the difference curve1 - curve2 on a per-point basis, then if each of those difference values is close to zero the curves are very similar and are likely stereo curves of the same source sound
// array1 holds all points of your signal 1
// array2 holds all points of your signal 2
size_array = length(array1)
for curr_index = 0; curr_index < size_array; curr_index++ {
    curr_diff = array1[curr_index] - array2[curr_index] // per-point difference between the channels
    print curr_diff
}
if both signals were identical, the above list of curr_diff values would show all zeros ( which means your signal is mono, just copied into two channels ) ... if the signals are stereo then curr_diff will be somewhat close to zero, depending on the degree of stereo separation between the two microphones
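a quick way to do that check numerically, sketched in Python/numpy ... stereo.wav is a hypothetical 2-channel recording of your signals
import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("stereo.wav")        # hypothetical 2-channel file
left = data[:, 0].astype(np.float64)
right = data[:, 1].astype(np.float64)

diff = left - right                            # per-sample difference of the channels
print("max |diff|:", np.max(np.abs(diff)))
print("rms  diff :", np.sqrt(np.mean(diff**2)))
# all zeros          -> duplicated mono
# small but non-zero -> likely true stereo of the same source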

Use of fft on sound using Julia

I have some Julia code I've just written:
using FFTW
using Plots
using WAV, PlotlyJS
snd, sampFreq = wavread("input.wav")
N, _ = size(snd)
t = 0:1/(N-1):1;
s = snd[:,1]
y = fft(s)
y1 = copy(y)
for i = 1:N
    if abs(y1[i]) > 800
        y1[i] = 0
    end
end
s_new = real(ifft(y1))
wavwrite(s_new, "output1.wav", Fs = sampFreq)
y2 = copy(y)
for i = 1:N
    if abs(y2[i]) < 800
        y2[i] = 0
    end
end
s_new = real(ifft(y2))
wavwrite(s_new, "output2.wav", Fs = sampFreq)
sticks((abs.(y1)))
sticks!((abs.(y2)))
s1,k1 = wavread("output1.wav")
s2,k2 = wavread("output2.wav")
for i = 1:N
    s1[i] += s2[i]
end
wavwrite(s1, "output3.wav", Fs = sampFreq)
It's the code that reads the file input.wav, then does an FFT on the sound, dividing it into two files: output1 with only frequencies > 800 and output2 with frequencies < 800.
In the next part I merge the two files into output3. I expected something similar to the input, but what I get sounds terrible (I mean it sounds like the input, but quieter and with more hum than expected).
My question is: in which part of the code do I lose the most information about the input, and is there a way to improve it, to get output3 to be almost like the input?
You appear to not understand what the fft (fast Fourier transform) returns. It returns a vector of amplitudes, not frequencies. The vector's components correspond to the amplitude of a sine wave at a frequency that you can find using the fftfreq() function, but be sure to provide the fftfreq() function with its second argument, your sampFreq variable.
To decompose the sound, then, you need to zero the vector components you do not want, based on the frequencies that fftfreq() tells you correspond to the bins (the positions in the vector returned by fft()).
You will still see a big drop in sound quality when reversing the process with ifft, because the fft will basically average parts of the signal by splitting it into the frequency dimension's bins.
I suggest a tutorial on fft() before you fix your code further -- you can google several of these.
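For what it's worth, here is a minimal sketch of that bin-masking idea in Python/numpy (your code is Julia, but fftfreq is available there too via FFTW, and the same masking applies; input.wav is your file):
import numpy as np
from scipy.io import wavfile

sampFreq, snd = wavfile.read("input.wav")
s = snd[:, 0].astype(np.float64) if snd.ndim > 1 else snd.astype(np.float64)

y = np.fft.fft(s)
freqs = np.fft.fftfreq(len(s), d=1.0/sampFreq)   # frequency in Hz of every bin

low  = np.where(np.abs(freqs) <  800, y, 0)      # keep only bins below 800 Hz
high = np.where(np.abs(freqs) >= 800, y, 0)      # keep only bins at/above 800 Hz

s_low  = np.fft.ifft(low).real
s_high = np.fft.ifft(high).real
# because every bin went to exactly one of the two outputs,
# s_low + s_high reconstructs s up to rounding error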

Finding pairs closer than a given distance (proximity) in a set of points

I'm developing a multiplayer game with node.js. Every second I get the coordinates (X, Y, Z) of every player. How can I get, for each player, a list of all players located closer than a given distance to him?
Any idea to avoid an O(n²) calculation?
You are not looking for clustering algorithms.
Instead, you are looking for a database index that supports radius queries.
Examples:
R*-tree
kd-tree
M-tree
Gridfile
Octree (for 3d, quadtree for 2d)
Any of these should do the trick, and theoretically yield O(n log n) performance. In practice, it's not as easy as this. If all your objects are really close together, "closer than a given distance" may mean every object, i.e. O(n²).
What you are looking for is a quadtree in 3 dimensions, i.e. an octree. An octree is basically the same as a binary tree, but instead of two children per node, it has 2^D = 2^3 = 8 children per node, where D is the dimension.
For example, imagine a cube. To create the next level below the root, each of its children represents one of the 8 sub-cubes inside the cube, and so on.
This tree will yield fast lookups, but be careful not to use it for many more dimensions. I had built a polymorphic quadtree and wouldn't go beyond 8-10 dimensions, because it was becoming too flat.
The other approach would be the kd-tree, where actually you halve the dataset (the players) at every step.
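If a Python sketch helps make the kd-tree idea concrete, scipy's cKDTree can return all pairs closer than a given distance directly (a hedged illustration only; positions is a hypothetical (n, 3) array of player coordinates):
import numpy as np
from scipy.spatial import cKDTree

positions = np.random.rand(100_000, 3) * 10_000           # hypothetical player coordinates
tree = cKDTree(positions)

pairs = tree.query_pairs(r=1000.0)                         # all index pairs closer than 1000 units
neighbours = tree.query_ball_point(positions, r=1000.0)    # per-player neighbour lists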
You could use a library that provides nearest neighbour searching.
I'm answering my own question because I have the answer now. Thanks to G. Samaras and Anony-Mousse:
I use a kd-tree algorithm:
First I build the tree with all the players
Then for each player I compute the list of all the players within a given range around that player
This is very fast and easy with the npm module kdtree: https://www.npmjs.org/package/kdtree
var kd = require('kdtree');
var tree = new kd.KDTree(3); // a new tree for 3-dimensional points
var players = loadPlayersPosition(); // players is an array containing all the positions
for (var p in players) { // let's build the tree
    tree.insert(players[p].x, players[p].y, players[p].z, players[p].username);
}
var RANGE = 1000; // 1 km range
var nearest = [];
for (var p in players) { // let's look for neighbours
    var close = tree.nearestRange(players[p].x, players[p].y, players[p].z, RANGE);
    nearest.push(close);
}
It returns nearest, an array containing, for each player, all his neighbours within a range of 1000 m. I made some tests on my PC with 100,000 simulated players. It takes only 500 ms to build the tree and another 500 ms to find the nearest-neighbour pairs. I find it very fast for such a big number of players.
Bonus: if you need to do this with latitude and longitude instead of x, y, z, just convert lat, lon to Cartesian x, y, z, because for short distances the chord distance on a sphere ~ the great-circle distance.
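That conversion is just the standard spherical-to-Cartesian formula; a small Python sketch (R is the mean Earth radius in metres):
import numpy as np

R = 6371000.0  # mean Earth radius in metres

def latlon_to_xyz(lat_deg, lon_deg):
    # convert latitude/longitude (degrees) to Cartesian coordinates on the sphere
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    x = R * np.cos(lat) * np.cos(lon)
    y = R * np.cos(lat) * np.sin(lon)
    z = R * np.sin(lat)
    return x, y, z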

How to draw a frequency spectrum from a Fourier transform

I want to plot the frequency spectrum of a music file (like they do, for example, in Audacity). Hence I want the frequency in Hertz on the x-axis and the amplitude (or decibels) on the y-axis.
I divide the song (about 20 million samples) into blocks of 4096 samples at a time. Each block results in 2049 (N/2 + 1) complex numbers (sine and cosine -> real and imaginary parts). So now I have thousands of individual 2049-arrays; how do I combine them?
Let's say I do the FFT 5000 times, resulting in 5000 2049-arrays of complex numbers. Do I add all the values of the 5000 arrays together and then take the magnitude of the combined 2049-array? Do I then scale the x-axis with the song's sample rate / 2 (e.g. 22050 for a 44100 Hz file)?
Any information will be appreciated
What application are you using for this? I assume you are not doing this by hand, so here is a Matlab example:
>> fbins = fs/N * (0:(N/2 - 1)); % Where N is the number of fft samples
now you can perform
>> plot(fbins, abs(fftOfSignal(1:N/2)))
Stolen
edit: check this out http://www.codeproject.com/Articles/9388/How-to-implement-the-FFT-algorithm
Wow I've written a load about this just recently.
I even turned it into a blog post available here.
My explanation leans towards spectrograms, but it's just as easy to render a chart like you describe!
I might not be correct on this one, but as far as I'm aware, you have 2 ways to get the spectrum of the whole song.
1) Do a single FFT on the whole song, which will give you an extremely good frequency resolution, but is in practice not efficient, and you don't need this kind of resolution anyway.
2) Divide it into small chunks (like the 4096-sample blocks you mention), get the FFT of each of those and average the spectra. You will compromise on frequency resolution, but it makes the calculation more manageable (and also decreases the variance of the spectrum). Wilhelmsen's link describes how to compute an FFT in C++, and I think some libraries already exist to do that, like FFTW (but I never managed to compile it, to be fair =) ).
To obtain the magnitude spectrum, average the energy (square of the magnitude) across all your chunks for every single bin. To get the result in dB, just 10 * log10 the results. That is of course assuming that you are not interested in the phase spectrum. I think this is known as Bartlett's method.
I would do something like this:
// At this point you have the FFT chunks
float sum[N/2 + 1] = {0}; // one accumulator per bin, zero-initialised (assumes N is a compile-time constant; otherwise use std::vector)
// For each bin
for (int binIndex = 0; binIndex < N/2 + 1; binIndex++)
{
    for (int chunkIndex = 0; chunkIndex < chunkNb; chunkIndex++)
    {
        // Energy (squared magnitude) of the complex number
        float energy = FFTChunk[chunkIndex].bins[binIndex].real * FFTChunk[chunkIndex].bins[binIndex].real
                     + FFTChunk[chunkIndex].bins[binIndex].im * FFTChunk[chunkIndex].bins[binIndex].im;
        // Accumulate the energy
        sum[binIndex] += energy;
    }
    // Average the energy
    sum[binIndex] /= chunkNb;
}
// Then get the values in decibels
for (int binIndex = 0; binIndex < N/2 + 1; binIndex++)
{
    sum[binIndex] = 10 * log10f(sum[binIndex]);
}
Hope this answers your question.
Edit: Goz's post will give you plenty of information on the matter =)
Commonly, you would take just one of the arrays, corresponding to the point in time of the music in which you are interested. Then you would calculate the log of the magnitude of each complex array element. Plot the N/2 results as Y values, and scale the X axis from 0 to Fs/2 (where Fs is the sampling rate).
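A minimal sketch of that single-block approach in Python/numpy (block is a hypothetical array of 4096 samples from the song, fs its sample rate):
import numpy as np
import matplotlib.pyplot as plt

fs = 44100
block = np.random.randn(4096)                  # hypothetical 4096-sample block of the song

spectrum = np.fft.rfft(block)                  # 2049 complex bins (N/2 + 1)
magnitude_db = 20 * np.log10(np.abs(spectrum) + 1e-12)
freqs = np.fft.rfftfreq(len(block), d=1.0/fs)  # 0 .. fs/2 in Hz

plt.plot(freqs, magnitude_db)
plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude (dB)")
plt.show()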
