display signal from stereo mics - audio

Are these 2 Audio Signals of Stereo?
[Figures: 1st Signal, 2nd Signal]

hard to tell ... it would be easier to identify whether those are stereo if both curves were plotted together on the same plot in different colors and zoomed in, so you can see whether the curves have similar shapes though slightly different ... alternatively create one loop to iterate across each point of the curve and inside this loop print out the difference curve1 - curve2 on a per point basis ... if each of these differences is close to zero then both curves are very similar and are likely stereo channels of the same source sound
// array1 holds all points of your signal 1
// array2 holds all points of your signal 2
size_array = length(array1)
for curr_index = 0; curr_index < size_array; curr_index++ {
    curr_diff = array1[curr_index] - array2[curr_index] // same as adding an inverted array2
    print curr_diff
}
if both signals were identical the above list of curr_diff would show the value zero ( which means your signal is mono just copied into two channels ) ... if the signals are stereo then curr_diff will be somewhat close to zero depending on the degree of stereo separation between both microphones
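The per-point comparison above can also be summarized in a few numbers. This is only a rough sketch in Python with NumPy: the function name `compare_channels` and the synthetic test tones are made up for illustration, and the correlation coefficient is an extra measure not mentioned in the answer.

```python
import numpy as np

def compare_channels(left, right):
    """Return (max_abs_diff, rms_diff, correlation) for two equal-length channels."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.shape != right.shape:
        raise ValueError("channels must have the same length")
    diff = left - right                              # per-point difference
    max_abs_diff = float(np.max(np.abs(diff)))
    rms_diff = float(np.sqrt(np.mean(diff ** 2)))
    correlation = float(np.corrcoef(left, right)[0, 1])
    return max_abs_diff, rms_diff, correlation

# Hypothetical test data: a 440 Hz tone and a slightly phase-shifted copy.
t = np.arange(1000) / 8000.0
mono = np.sin(2 * np.pi * 440 * t)
shifted = np.sin(2 * np.pi * 440 * t + 0.3)

print(compare_channels(mono, mono))      # identical channels (mono copied twice)
print(compare_channels(mono, shifted))   # similar but not identical channels
```

Differences of exactly zero point to a mono signal copied into both channels; small nonzero differences with a high correlation are what one would expect from two microphones hearing the same source.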

Related

scipy.signal.find_peaks and a distance-values to provide in a function - (how to use not the number of discretes)

I am using the scipy.signal.find_peaks function to determine the peaks of a given signal.
The signal is loaded in a dataframe like this:
x = df["sig_coord"] # the x coordinate of a signal , time msec
Y = df["sig_value"] # value of a signal f(x) - Measured Voltage at the moment x, V
The dataframe has about 10k points for one signal.
peaks = signal.find_peaks(Y, prominence=2, distance = 40)
As I understand it, the distance parameter sets the minimum separation allowed between neighbouring peaks. But it must be given as a number of samples (points of the signal).
My x-scale is not uniform: the spacing between points changes in a special manner (non-equidistant measurements), so it's inconvenient to use a number of samples as the unit of distance in this case... It would be much better to provide the distance in time units (msec in my case). Is it possible to do that?
Maybe there is a way to use the find_peaks functionality with the distance given in coordinates that have physical meaning, not strictly a number of samples?
Or maybe it's possible to recalculate those values in a simple manner?
Because the ratio of msec to signal points differs across different parts of the signal.
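One possible way to recalculate, sketched here under assumptions: interpolate the signal onto a uniform time grid first, so that a distance in msec maps to a fixed number of samples. The helper name `find_peaks_time_distance` and the test data are hypothetical; `np.interp` and `find_peaks` are used as documented, and linear interpolation is assumed to be acceptable for the signal.

```python
import numpy as np
from scipy.signal import find_peaks

def find_peaks_time_distance(x_ms, y, min_distance_ms, **kwargs):
    """Resample (x_ms, y) to a uniform grid, then call find_peaks with a distance in msec."""
    x_ms = np.asarray(x_ms, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x_ms)                   # np.interp needs increasing x
    x_ms, y = x_ms[order], y[order]
    diffs = np.diff(x_ms)
    step = diffs[diffs > 0].min()              # finest spacing present in the data
    grid = np.arange(x_ms[0], x_ms[-1] + step / 2, step)
    y_uniform = np.interp(grid, x_ms, y)       # linear interpolation (an assumption)
    distance_samples = max(1, int(round(min_distance_ms / step)))
    peaks, _ = find_peaks(y_uniform, distance=distance_samples, **kwargs)
    return grid[peaks], y_uniform[peaks]

# Hypothetical data: dense sampling (0.1 ms) up to 50 ms, sparse (0.5 ms) afterwards.
x = np.concatenate([np.arange(500) * 0.1, 50.0 + np.arange(100) * 0.5])
y = np.sin(2 * np.pi * x / 20.0)               # one peak every 20 ms

peak_times, peak_values = find_peaks_time_distance(x, y, min_distance_ms=10.0, prominence=0.5)
print(peak_times)
```

The returned peak positions are in msec on the uniform grid, not indices into the original dataframe; mapping back to the nearest original point would be an extra step.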

Creating motor sound via FFT?

I have a recording of a car's idle tone. I want to make that sound accelerate and decelerate by changing its FFT.
How can I achieve this? I only know C and a little bit of C++.
start with your idle sound array of raw audio [array1] ( this is the payload of a WAV file in PCM format of the car idling ) which will be in the time domain
feed this array1 into an FFT call which will return a new array [array2] which will be the frequency domain representation of the same underlying data as array1 ... where in this new array array2, element zero represents zero Hertz and the freq increment (incr_freq), separating each element, is defined by the parameters of source sound array1 as per
incr_freq := sample_rate / number_of_samples_in_array1
... value of each element of array2 will be a complex number from which you can calc the magnitude and phase of the given freq ... to be clear, frequency with regard to array2 is derived from element position, starting with element zero which is the DC bias and can be ignored ... knowing the frequency increment above (incr_freq) let's show the first few elements of array2
complex_number_0 := array2[0] // element 0 is the DC bias, ignore this element
// its frequency_0 = 0
complex_number_1 := array2[1] // element 1
// it's at frequency_1 = frequency_0 + incr_freq
complex_number_2 := array2[2] // element 2
// it's at frequency_2 = frequency_1 + incr_freq
now identify the top X magnitudes in array2 (nugget1) ... these are the dominant frequencies which are most responsible to capture the essence of the car sound ... we save for later the element value of array2 for the X elements ... we calc magnitude using below which is inside loop across all elements of array2
for _, curr_complex := range complex_fft { // complex_fft is array2
    curr_real := real(curr_complex) // pluck out real part of the complex number
    curr_imag := imag(curr_complex) // ditto for imaginary part of complex number
    curr_mag := 2.0 * math.Sqrt(curr_real*curr_real+curr_imag*curr_imag) / number_of_samples_array2
    // ... more goodness here ( e.g. keep track of the top X values of curr_mag )
}
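For comparison, the same magnitude calculation and the search for the top X elements might look like this in Python with NumPy. This is a sketch only: the function name `top_magnitudes` and the synthetic "idle" tones are invented for illustration, and `np.fft.rfft` is used so that only the non-negative frequencies are returned.

```python
import numpy as np

def top_magnitudes(samples, sample_rate, top_x):
    """Return (frequencies_hz, magnitudes) of the top_x strongest FFT bins, DC excluded."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    spectrum = np.fft.rfft(samples)           # bins 0 .. n/2
    incr_freq = sample_rate / n               # frequency step between bins
    mags = 2.0 * np.abs(spectrum) / n         # same scaling as the loop above
    mags[0] = 0.0                             # ignore the DC bias
    strongest = np.argsort(mags)[::-1][:top_x]
    return strongest * incr_freq, mags[strongest]

# Hypothetical stand-in for an idle recording: three tones, 1 second at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
idle = (1.0 * np.sin(2 * np.pi * 60 * t)
        + 0.5 * np.sin(2 * np.pi * 120 * t)
        + 0.25 * np.sin(2 * np.pi * 180 * t))

freqs, mags = top_magnitudes(idle, sample_rate, top_x=3)
print(freqs, mags)
```

With a real recording the dominant frequencies will rarely sit exactly on a bin, so neighbouring bins will share the energy and the reported magnitudes will be approximate.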
now feed the array2 FFT array into an inverse FFT call which will return an array [array3] once again in the time domain ( raw audio )
if you do not alter the data of array2 your array3 will be the same (to a first approximation) as array1 ... now jack up your array2 to impart the acc or dec before sending it into that inverse FFT call
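The round trip described here is easy to check numerically. A small sketch with NumPy, using random numbers as a stand-in for the raw audio in array1:

```python
import numpy as np

rng = np.random.default_rng(0)
array1 = rng.standard_normal(1024)     # stand-in for raw audio (time domain)

array2 = np.fft.fft(array1)            # frequency domain
array3 = np.fft.ifft(array2).real      # back to the time domain, array2 unaltered

max_error = np.max(np.abs(array3 - array1))
print(max_error)                       # only floating point rounding error remains
```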
the secret sauce of how to alter array2 is left as an exercise ... my guess is put into a loop your synthesis of array3 from array2 which gets immediately rendered to your speakers inside this loop (loop_secret_sauce) ... where you increment (acc) or decrement (dec) the top X frequencies as identified above as nugget1 ... meaning as a whole shift the entire set of freq of all of the top X frequencies as defined by their magnitude ... give a non linear aspect to this shift of the set of X frequencies ... possibly not only increment or dec the freq of this set but also muck about with their relative magnitudes as well as introduce a wider swath of frequencies in this loop
to give yourself traction in making this secret sauce use as array1 several different recordings of the car when its at idle or acc or dec and compare its array2 and use the diff between idle, acc, dec inside this sauce loop
here we drill down on mechanics ... when source audio is idle we iterate across its array2 and identify the top X elements of array2 with the greatest magnitude ... these X elements of array2 get saved into array top_mag_idle ... do same for source audio of acc and save in array top_mag_acc ... critical step ... examine difference between the elements stored in top_mag_idle versus top_mag_acc ... this transition between elements of top_mag_idle into top_mag_acc is your secret sauce which you will put into loop_secret_sauce ... to get concrete here, when you loop across loop_secret_sauce and update array2 elements to reflect top_mag_idle the audio will sound idle ... over time, when you continue looping across array2 to synthesize array3 and transition to updating array2 elements to reflect top_mag_acc, the sound will be of an accelerating car
perhaps to gain intuition on the secret sauce consider this ... imagine listening to a car on idle ... as with any complex system which generates audio it will have a set of dominant frequencies meaning there are a set of say 5 different frequencies with the greatest magnitude ( loudest freqs ) ... similar to a pianist playing a chord on a piano where the shape of her hand and fingers remain static yet she is repeatedly tapping the keyboard ... now the car starts to accelerate ... the analogy here is she continues to repeatedly tap the keyboard with that same static hand and finger layout yet now she slides her hand up to the right along the keyboard as she continues to tap the keyboard ... in your code inside loop_secret_sauce the original set of freqs (top_mag_idle) will generate the idle car sound when you synthesize array3 from array2 ... then to implement acc you increment in unison all freqs in top_mag_idle and repeat synthesis of array3 from array2 this will give you the acc sound
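To make the piano analogy concrete, here is a rough sketch of moving a set of dominant frequencies in unison. Note what it swaps in: instead of editing array2 and running an inverse FFT, it rebuilds the sound by summing sine waves (additive synthesis), and it multiplies all frequencies by a common ratio (like sliding the hand up the keyboard) rather than adding a fixed offset. The function name, the linear ramp and the three example frequencies are all assumptions for illustration.

```python
import numpy as np

def synthesize_sweep(freqs_hz, mags, sample_rate, duration_s, end_ratio):
    """Sum of sines whose frequencies are all scaled by a ratio ramping from 1.0 to end_ratio."""
    n = int(sample_rate * duration_s)
    ratio = np.linspace(1.0, end_ratio, n)     # linear ramp; a non linear ramp may sound better
    out = np.zeros(n)
    for f, m in zip(freqs_hz, mags):
        inst_freq = f * ratio                  # instantaneous frequency in Hz
        # accumulate phase so the pitch glides without clicks
        phase = 2.0 * np.pi * np.cumsum(inst_freq) / sample_rate
        out += m * np.sin(phase)
    return out

# Hypothetical top_mag_idle: three dominant frequencies and their magnitudes.
top_freqs = [60.0, 120.0, 180.0]
top_mags = [1.0, 0.5, 0.25]

# Two seconds during which every frequency doubles (acc); end_ratio < 1 would give dec.
sweep = synthesize_sweep(top_freqs, top_mags, sample_rate=8000, duration_s=2.0, end_ratio=2.0)
```

The result is a mono float array in the time domain; writing it to a WAV file or a sound card is left out here.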
until you get this working I would only use mono ( one channel not stereo )
sounds like an interesting project ... have fun !!!

Normalization - Signal with different sampling rates

I am trying to solve a signal processing problem. I have a signal like this
My job is to use FFT to plot the frequency vs. signal. This is what I have coded so far:
import re
import numpy as np

def Extract_Data(filepath, pattern):
    data = []
    with open(filepath) as file:
        for line in file:
            m = re.match(pattern, line)
            if m:
                data.append(list(map(float, m.groups())))
    #print(data)
    data = np.asarray(data)
    #Convert lists to arrays
    variable_array = data[:,1]
    time_array = data[:,0]
    return variable_array, time_array

def analysis_FFT(filepath, pattern):
    signal, time = Extract_Data(filepath, pattern)
    signal_FFT = np.fft.fft(signal)
    N = len(signal_FFT)
    T = time[-1]
    #Frequencies
    signal_freq = np.fft.fftfreq(N, d = T/N)
    #Shift the frequencies
    signal_freq_shift = np.fft.fftshift(signal_freq)
    #Real and imaginary part of the signal
    signal_real = signal_FFT.real
    signal_imag = signal_FFT.imag
    signal_abs = pow(signal_real, 2) + pow(signal_imag, 2)
    #Shift the signal
    signal_shift = np.fft.fftshift(signal_FFT)
    #Spectrum
    signal_spectrum = np.abs(signal_shift)
What I am really concerned about is the sampling rate. Looking at the plot, it seems the sampling rate of the first ~0.002s is not the same as for the rest of the signal. So I'm thinking maybe I need to normalize the signal.
However, when I use np.fft.fftfreq(N, d=T/N), it seems that np.fft.fftfreq assumes the signal has the same sampling rate throughout the domain. So I'm not sure how I could normalize the signal with np.fft. Any suggestions?
Cheers.
This is what I got when I plotted shifted frequency [Hz] with shifted signal
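Whether the sampling really is non-uniform can be checked from the time column itself. The sketch below is only one possible approach and rests on assumptions: the helper name `resample_uniform` is made up, linear interpolation with `np.interp` is assumed to be good enough, and the test signal merely imitates a recording with a faster sampling rate during the first 0.002 s.

```python
import numpy as np

def resample_uniform(time, values):
    """Return (is_uniform, uniform_time, resampled_values, dt) for possibly uneven samples."""
    time = np.asarray(time, dtype=float)
    values = np.asarray(values, dtype=float)
    steps = np.diff(time)
    is_uniform = bool(np.allclose(steps, steps[0], rtol=1e-3, atol=0.0))
    dt = steps.mean() if is_uniform else steps.min()   # finest step if uneven
    n = int(round((time[-1] - time[0]) / dt)) + 1
    uniform_time = time[0] + dt * np.arange(n)
    resampled = np.interp(uniform_time, time, values)  # linear interpolation (an assumption)
    return is_uniform, uniform_time, resampled, dt

# Hypothetical data: 10 us steps for the first 2 ms, 100 us steps afterwards.
time = np.concatenate([np.arange(200) * 1e-5, 0.002 + np.arange(480) * 1e-4])
values = np.sin(2 * np.pi * 260 * time)

is_uniform, uniform_time, resampled, dt = resample_uniform(time, values)
freq = np.fft.rfftfreq(len(resampled), d=dt)   # a single, well-defined frequency axis
spectrum = np.abs(np.fft.rfft(resampled))
print(is_uniform, dt, freq[np.argmax(spectrum[1:]) + 1])
```

After resampling, `d` in `fftfreq` or `rfftfreq` is simply the uniform step `dt`, so no further normalization of the time axis is needed.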
I generated a synthetic signal similar to yours and plotted, like you did, the spectrum over the whole time. Your plot was good as it pertains to the whole spectrum; it just appears not to give the absolute value.
import numpy as np
import matplotlib.pyplot as p
%matplotlib inline
T=0.05 # 1/20 sec
n=5000 # 5000 Sa, so 100kSa/sec sampling frequency
sf=n/T
d=T/n
t=np.linspace(0,T,n,endpoint=False) # endpoint=False so the sample spacing is exactly d=T/n
fr=260 # Hz
y1= - np.cos(2*np.pi*fr*t) * np.exp(- 20* t)
y2= 3*np.sin(2*np.pi*10*fr*t+0.5) *np.exp(-2e6*(t-0.001)**2)
y=(y1+y2)/30
f=np.fft.fftshift(np.fft.fft(y))
freq=np.fft.fftshift(np.fft.fftfreq(n,d))
p.figure(figsize=(12,8))
p.subplot(311)
p.plot(t,y ,color='green', lw=1 )
p.xlabel('time (sec)')
p.ylabel('Velocity (m/s)')
p.subplot(312)
p.plot(freq,np.abs(f)/n)
p.xlabel('freq (Hz)')
p.ylabel('Velocity (m/s)');
p.subplot(313)
s=slice(n//2-500,n//2+500,1)
p.plot(freq[s],np.abs(f)[s]/n)
p.xlabel('freq (Hz)')
p.ylabel('Velocity (m/s)');
At the bottom, I zoomed in a bit to show the two main frequency components. Note that we are showing both the positive and negative frequencies (only the positive ones, with the amplitude multiplied by 2, are physical). The Gaussians at 2600 Hz are the frequency spectrum of the burst (the FT of a Gaussian is a Gaussian). The narrow lines at 260 Hz are the slow base frequency (the FT of a sine is a delta).
That, however, hides the timing of the two separate frequency components: the short (in my case Gaussian) burst at the start at about 2.6 kHz and the decaying low tone at about 260 Hz. The spectrogram plots spectra of short pieces (nperseg) of your signal as vertical stripes, where color indicates intensity. You can set some overlap between the time frames, which should be some fraction of the segment length. By stacking these stripes over time, you get a plot of the spectral change over time.
from scipy.signal import spectrogram
f, t, Sxx = spectrogram(y,sf,nperseg=256,noverlap=64)
p.pcolormesh(t, f[:20], Sxx[:20,:])
#p.pcolormesh(t, f, Sxx)
p.ylabel('Frequency [Hz]')
p.xlabel('Time [sec]')
p.show()
It is instructive to try and generate the spectrogram yourself with the help of just the FFT. Otherwise the settings of the spectrogram function might not be very intuitive at first.
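A minimal version of that exercise might look as follows. It is a sketch, not a replacement for `scipy.signal.spectrogram`: the function name `stft_magnitude` is made up, a Hann window is assumed, and it returns plain FFT magnitudes without the detrending and power-density scaling that SciPy applies by default.

```python
import numpy as np

def stft_magnitude(y, sample_rate, nperseg=256, noverlap=64):
    """Return (freqs, times, mags): one windowed FFT magnitude column per segment."""
    y = np.asarray(y, dtype=float)
    hop = nperseg - noverlap                       # step between segment starts
    window = np.hanning(nperseg)                   # reduces spectral leakage
    starts = np.arange(0, len(y) - nperseg + 1, hop)
    columns = [np.abs(np.fft.rfft(window * y[s:s + nperseg])) for s in starts]
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / sample_rate)
    times = (starts + nperseg / 2) / sample_rate   # centre of each segment
    return freqs, times, np.array(columns).T       # shape (n_freqs, n_segments)

# Same kind of synthetic signal as above: decaying 260 Hz tone plus a 2.6 kHz burst.
T, n = 0.05, 5000
t = np.linspace(0, T, n, endpoint=False)
y = (-np.cos(2 * np.pi * 260 * t) * np.exp(-20 * t)
     + 3 * np.sin(2 * np.pi * 2600 * t + 0.5) * np.exp(-2e6 * (t - 0.001) ** 2)) / 30

freqs, times, mags = stft_magnitude(y, sample_rate=n / T)
print(mags.shape)   # rows are frequencies, columns are time segments
```

Each column of `mags` is one of the vertical stripes; passing `times`, `freqs` and `mags` to `pcolormesh` gives a plot comparable to the one above.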

Amplitude of audio signal harmonics in Unity3D

I have managed to calculate the pitch of audio input from microphone using the GetSpectrumData function. But now I need to get the amplitudes of the first 7 harmonics of audio (Project requirement)
I have very little knowledge of audio DSP. The only thing I understand is that harmonics are multiples of the fundamental frequency. But how do I get the amplitudes of the harmonics?
Thanks
First you need to figure out which FFT bin your fundamental frequency is in. Say it resides in bin 10. The harmonics will reside in integer multiples of that bin, so the 2nd harmonic will be in bin 20, the 3rd in bin 30 and so on. For each of these harmonic bins you need to compute the amplitude. Depending on the window function you used in the FFT, you will need to include a small number of neighbouring bins in the calculation (google spectral leakage if you're interested).
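// harmonic: 1 = fundamental, 2 = 2nd harmonic, and so on
double computeAmpl(double[] spectrum, int windowHalfLen, int peakBin, int harmonic)
{
    int centerBin = peakBin * harmonic; // harmonics sit at integer multiples of the fundamental's bin
    double sumOfSquares = 0.0;
    for (int bin = centerBin - windowHalfLen; bin <= centerBin + windowHalfLen; bin++)
    {
        if (bin < 0 || bin >= spectrum.Length) continue; // skip bins outside the spectrum
        sumOfSquares += spectrum[bin] * spectrum[bin];
    }
    return Math.Sqrt(sumOfSquares);
}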
As I mentioned the window half length depends on the window. Some common ones are:
blackman-harris (3-term) - 3
blackman-harris (4-term) - 4
flat top - 5
hann - 3
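Put together, collecting the first 7 harmonics could look like the sketch below. It is written in Python with NumPy purely as an illustration (in Unity the same loop would run over the array filled by GetSpectrumData); the function name `harmonic_amplitudes` and the toy spectrum are assumptions.

```python
import numpy as np

def harmonic_amplitudes(spectrum, peak_bin, window_half_len, num_harmonics=7):
    """Amplitude of each harmonic: root of the summed squares around bin peak_bin * h."""
    spectrum = np.asarray(spectrum, dtype=float)
    amplitudes = []
    for h in range(1, num_harmonics + 1):          # h = 1 is the fundamental
        center = peak_bin * h
        lo = max(center - window_half_len, 0)
        hi = min(center + window_half_len + 1, len(spectrum))
        if lo >= hi:                               # harmonic lies beyond the spectrum
            amplitudes.append(0.0)
        else:
            amplitudes.append(float(np.sqrt(np.sum(spectrum[lo:hi] ** 2))))
    return amplitudes

# Toy magnitude spectrum: fundamental in bin 10, energy of the 3rd harmonic split over two bins.
spectrum = np.zeros(1024)
spectrum[10] = 1.0
spectrum[20] = 0.5
spectrum[30] = 0.3
spectrum[31] = 0.4

amps = harmonic_amplitudes(spectrum, peak_bin=10, window_half_len=3)
print(amps)
```

The window half length should be small compared with the fundamental's bin number, otherwise the windows of neighbouring harmonics overlap.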

What is the unit of the return values (coefficients) of an FFT?

My application performs an FFT on the raw audio signal (all microphone readings are 16bit integer values in values, which is 1024 cells). It first normalizes the readings according to the 16bit. Then it extracts the magnitude of the frequency 400Hz.
int sample_rate = 22050;
int values[1024];
// omitted: code to read 16bit audio samples into values array
double doublevalues[1024];
for (int i = 0; i < 1024; i++) {
    doublevalues[i] = (double)values[i] / 32768.0; // 16bit
}
fft(doublevalues); // inplace FFT, returns only real coefficients
double magnitude = 400.0 / sample_rate * 2048;
printf("magnitude of 400Hz: %f", magnitude);
When I try this out and generate a 400Hz signal to see the value of magnitude, it is around 0 when there is no 400Hz signal and goes up to 30 or 40 when there is.
What is the unit or meaning of the magnitude field? It surprises me that it is larger than 1 even though I normalize the raw signal to be between -1..+1.
It depends on which FFT you are using, as there are different conventions on scaling. The most common convention is that the output values are scaled by N, where N is the size of the FFT. So a 1024 point FFT will have output values which are 1024 times greater than the corresponding input values. A further complication is that for real-to-complex FFTs people typically ignore the symmetric upper half of the FFT, which is fine (because it's conjugate symmetric) but you need to account for a factor of 2 if you do this.
Other common conventions for FFT scaling are (a) no scaling (i.e. the factor of N has been removed) and (b) sqrt(N), which is sometimes used for symmetric scaling behaviour of FFT versus IFFT (sqrt(N) in each direction).
Since sqrt(1024) == 32 it's possible that you're using an FFT routine with sqrt(N) scaling, since you seem to be seeing values of around 30 for a unit magnitude sine wave input.
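The effect of the different conventions can be seen with a short NumPy experiment. This is an illustration only, not the FFT routine from the question: it uses `np.fft.fft`, whose default applies no scaling in the forward direction and whose `norm="ortho"` option divides by sqrt(N), and it places the test tone exactly on a bin so that leakage plays no role.

```python
import numpy as np

n = 1024
k = 19                                           # test tone sits exactly on bin 19
x = np.sin(2 * np.pi * k * np.arange(n) / n)     # unit amplitude sine, values in -1..+1

unscaled = np.abs(np.fft.fft(x))[k]              # no 1/N factor applied: N/2 = 512
ortho = np.abs(np.fft.fft(x, norm="ortho"))[k]   # 1/sqrt(N) factor: sqrt(N)/2 = 16
amplitude = 2.0 * unscaled / n                   # undo N and the factor of 2: 1.0

print(unscaled, ortho, amplitude)
# With sqrt(N) scaling and the factor of 2 for the discarded upper half,
# a unit sine gives 2 * 16 = 32, which is in the range reported in the question.
```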
