How to program a sound card to output a specific signal?

I'm trying to program my sound card to output specific values.
Say I have the sequence below:
[1,2,3,2,10,4,1,50,20,1]
I want the sound card to output an analog signal that follows this sequence.
I could use the Windows Multimedia API, of course, but my task is lightweight and I don't want to pull in such a heavy framework.
Any suggestions?

I propose you generate a .wav file and play it with a media player.
It's easy with Python and its wave module. The example below was written with Python 3.3.
import wave
import struct

# define your sequence
frames = [1, 2, 3, 2, 10, 4, 1, 50, 20, 1]

output = wave.open('out.wav', mode='wb')  # create the file that will contain your sequence
output.setnchannels(1)          # 1 for mono
output.setsampwidth(1)          # resolution in bytes; 8-bit WAV samples are unsigned
output.setframerate(10)         # sampling rate in Hz (44100 is typical; 10 makes this sequence last 1 s)
output.setnframes(len(frames))  # sequence length
for frame in frames:
    output.writeframes(struct.pack('B', frame))  # pack each value as an unsigned byte
# close the file, you're done
output.close()
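If you're on Windows and want to avoid a separate media player, the standard library can trigger playback too; a minimal sketch (winsound is Windows-only):
import winsound
winsound.PlaySound('out.wav', winsound.SND_FILENAME)  # blocks until playback ends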

You can do this in one line with MATLAB or its free equivalent, Octave. The relevant documentation is here.
soundsc(x, fs, [ lo, hi ])
Scale the signal so that [lo, hi] -> [-1, 1], then play it
at sampling rate fs. If fs is empty, then the default 8000 Hz
sampling rate is used.
Your function call in the console would look like this, with fs set to your chosen sampling rate (e.g. 8000) ...
soundsc([1,2,3,2,10,4,1,50,20,1], fs, [1 50]);
... or like this, with manual normalisation of the positive integer vector to give values between -1 and +1 ...
x = [1,2,3,2,10,4,1,50,20,1];
x = x - min(x); % shift so the values start at zero
x = x / max(x); % scale to floating-point values between 0.0 and 1.0
x = x*2 - 1;    % map to the range -1.0 to +1.0
soundsc(x);
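If you'd rather do the same thing from Python, the normalisation is a few NumPy lines; a sketch assuming the third-party sounddevice package for output:
import numpy as np
import sounddevice as sd  # assumption: pip install sounddevice
x = np.array([1, 2, 3, 2, 10, 4, 1, 50, 20, 1], dtype=float)
x = (x - x.min()) / (x.max() - x.min())  # map [min, max] -> [0.0, 1.0]
x = x * 2 - 1                            # map [0.0, 1.0] -> [-1.0, +1.0]
sd.play(x, samplerate=8000)              # 8000 Hz, like soundsc's default
sd.wait()                                # block until playback finishes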

Related

Normalization - Signal with different sampling rates

I am trying to solve a signal processing problem. I have a signal like this
My job is to use the FFT to plot the signal's spectrum (magnitude versus frequency). This is what I have coded so far:
import re
import numpy as np

def Extract_Data(filepath, pattern):
    data = []
    with open(filepath) as file:
        for line in file:
            m = re.match(pattern, line)
            if m:
                data.append(list(map(float, m.groups())))
    data = np.asarray(data)
    # split the parsed columns into arrays
    variable_array = data[:, 1]
    time_array = data[:, 0]
    return variable_array, time_array
def analysis_FFT(filepath, pattern):
    signal, time = Extract_Data(filepath, pattern)
    signal_FFT = np.fft.fft(signal)
    N = len(signal_FFT)
    T = time[-1]
    # frequencies, assuming a uniform sample spacing of T/N
    signal_freq = np.fft.fftfreq(N, d=T/N)
    # shift the frequencies so zero ends up in the middle
    signal_freq_shift = np.fft.fftshift(signal_freq)
    # real and imaginary parts of the transform
    signal_real = signal_FFT.real
    signal_imag = signal_FFT.imag
    signal_abs = pow(signal_real, 2) + pow(signal_imag, 2)
    # shift the transform the same way
    signal_shift = np.fft.fftshift(signal_FFT)
    # spectrum
    signal_spectrum = np.abs(signal_shift)
What I'm really concerned about is the sampling rate. As you can see in the plot, the sampling rate of the first ~0.002 s does not seem to match that of the rest of the signal, so I'm thinking maybe I need to normalize the signal.
However, np.fft.fftfreq(N, d=T/N) seems to assume that the signal has the same sampling rate throughout the domain, so I'm not sure how I could normalize the signal with np.fft. Any suggestions?
Cheers.
This is what I got when I plotted the shifted signal against the shifted frequency [Hz]:
I generated a synthetic signal similar to yours and, like you, plotted the spectrum over the whole time range. Your plot was fine as far as the full spectrum goes; it just appears not to take the absolute value.
import numpy as np
import matplotlib.pyplot as p
%matplotlib inline
T=0.05 # 1/20 sec
n=5000 # 5000 Sa, so 100kSa/sec sampling frequency
sf=n/T
d=T/n
t=np.linspace(0,T,n)
fr=260 # Hz
y1= - np.cos(2*np.pi*fr*t) * np.exp(- 20* t)
y2= 3*np.sin(2*np.pi*10*fr*t+0.5) *np.exp(-2e6*(t-0.001)**2)
y=(y1+y2)/30
f=np.fft.fftshift(np.fft.fft(y))
freq=np.fft.fftshift(np.fft.fftfreq(n,d))
p.figure(figsize=(12,8))
p.subplot(311)
p.plot(t,y ,color='green', lw=1 )
p.xlabel('time (sec)')
p.ylabel('Velocity (m/s)')
p.subplot(312)
p.plot(freq,np.abs(f)/n)
p.xlabel('freq (Hz)')
p.ylabel('Velocity (m/s)');
p.subplot(313)
s=slice(n//2-500,n//2+500,1)
p.plot(freq[s],np.abs(f)[s]/n)
p.xlabel('freq (Hz)')
p.ylabel('Velocity (m/s)');
On the bottom, I zoomed in a bit to show the two main frequency components. Note that we are showing both the positive and negative frequencies (only the positive ones, times two, are physical). The Gaussians at 2600 Hz show the frequency spectrum of the burst (the FT of a Gaussian is a Gaussian). The narrow spikes at 260 Hz show the slow base frequency (the FT of a sine is a delta).
That, however, hides the timing of the two separate frequency components: the short (in my case Gaussian) burst at the start at about 2.6 kHz, and the decaying low tone at about 260 Hz. A spectrogram plots the spectra of short pieces of your signal (nperseg samples each) as vertical stripes, where color indicates intensity. You can set some overlap between the time frames, which should be some fraction of the segment length. By stacking these stripes over time, you get a plot of the spectral change over time.
from scipy.signal import spectrogram
f, t, Sxx = spectrogram(y, sf, nperseg=256, noverlap=64)
p.pcolormesh(t, f[:20], Sxx[:20, :])  # only the lowest frequency bins
#p.pcolormesh(t, f, Sxx)  # full frequency range
p.ylabel('Frequency [Hz]')
p.xlabel('Time [sec]')
p.show()
It is instructive to try to generate the spectrogram yourself with just the FFT; otherwise, the settings of the spectrogram function might not be very intuitive at first.
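A minimal sketch of such a hand-rolled spectrogram, assuming the y, sf and n variables from the snippet above (nperseg and noverlap mirror scipy's parameter names):
# slice y into overlapping segments, window each, FFT it,
# and stack the power spectra as columns
nperseg, noverlap = 256, 64
step = nperseg - noverlap
window = np.hanning(nperseg)
starts = np.arange(0, n - nperseg + 1, step)
S = np.array([np.abs(np.fft.rfft(window * y[i:i + nperseg]))**2 for i in starts]).T
freqs = np.fft.rfftfreq(nperseg, d=1/sf)   # one row per frequency bin
times = (starts + nperseg // 2) / sf       # segment centers, in seconds
p.pcolormesh(times, freqs[:20], S[:20, :])
p.ylabel('Frequency [Hz]')
p.xlabel('Time [sec]')
p.show()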

Getting multiple sinusoidal waves using the Fourier transform in Python

According to http://www.thefouriertransform.com/:
"The Fourier Transform shows that any waveform can be re-written as the sum of sinusoidal functions."
I have some signals (each of shape (256, 64)) that I want to break down into sub-signals, and I then want to use those sub-signals to regenerate the real signal. I am doing it right now like this:
import pickle
import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import fft

# getting data
with open('../f', 'rb') as fp:
    f = pickle.load(fp)
f = f[0]
tf = fft(f)
x = np.reshape(np.abs(tf), (256, 64))
plt.plot(x)
plt.show()
print(x.shape)  # same shape as f
But I am getting output of the same shape as the real signal, with some imaginary values that are ultimately discarded. I have looked at other Fourier questions here, but none of them gave a satisfying result; they just transformed the input signal. What am I doing wrong? Any help would be much appreciated.
To see the sinusoidal components, you need to plot sine waves,
x = a * sin(t)
not a reshaped FFT result.
If you don't care about phase, the number of sine-wave plots will be half the length of your FFT plus 1, with each sine wave's frequency calculated from the bin center of the corresponding FFT result element (index times sample rate divided by FFT length), and its amplitude given by the abs() of that FFT bin.
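As a concrete illustration, a minimal sketch (sig and samplerate here are hypothetical stand-ins for your data) that rebuilds a signal as an explicit sum of such sine waves, taking frequency, amplitude and phase from the rFFT bins:
import numpy as np

sig = np.random.randn(64)      # hypothetical 1-D input signal
samplerate = 64.0
N = len(sig)
bins = np.fft.rfft(sig)        # N//2 + 1 complex bins
t = np.arange(N) / samplerate
recon = np.zeros(N)
for k, c in enumerate(bins):
    amp = np.abs(c) / N
    if 0 < k < N / 2:
        amp *= 2               # middle bins also stand in for their negative twins
    recon += amp * np.cos(2 * np.pi * (k * samplerate / N) * t + np.angle(c))
print(np.allclose(recon, sig))  # True: the sine waves sum back to the signal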

Confusing artifacts in PyWavelet complex-Morlet analysis of 1-kHz signal

I have some artifacts in a PyWavelets transform that are really confusing me. I'm using version 0.5.2. Can someone explain what is happening here?
I start by creating a 1-kHz signal, and then I attempt to analyze this signal with a complex Morlet continuous wavelet transform. I'm doing so with 3 octaves (0.5 kHz to 1 kHz, 1 kHz to 2 kHz, and 2 kHz to 4 kHz), each with 40 log-scaled scales. My intuition says that there should be a single peak at y=40 (equivalent to 1 kHz), and that any variation in time should be minimal. Instead, I'm getting a peak at around y=35 to 37 (0.92 to 0.95 kHz), and there's some kind of periodic effect. (Strangely, this effect seems to occur only in the real component of the transform; the imaginary component looks closer to how I imagined it should look, though it's still not centered correctly. I believe that the real and imaginary components should look like time-shifted versions of each other when looking at a pure sine wave.)
Am I misusing PyWavelets? Is there possibly a bug here? Any help would be welcome.
import numpy
import pywt
import matplotlib.pyplot as plt

# makes a 1-kHz signal
def make_data(length, quality):
    tau = 2 * numpy.pi
    x = numpy.arange(length)
    y = numpy.sin(tau * x / (quality / 1000))  # the 1000 is for 1 kHz
    return y

# does the continuous wavelet transform, outputting a picture
def do_transform(data, base_freq, num_octaves, voices_per_octave, quality):
    # calculate the scales, based on the desired frequencies
    base_scale = quality / (2 * base_freq)
    far_scale = base_scale / 2**num_octaves
    scales = numpy.geomspace(base_scale, far_scale,
                             num=num_octaves*voices_per_octave + 1, endpoint=True)
    # actual calculation
    coeffs, freqs = pywt.cwt(data, scales, "cmor", 1/quality)
    print("freqs: " + str(freqs))
    # output
    truncated = coeffs[:, 100:200]
    plt.imshow(abs(truncated), origin='lower', interpolation='none')
    #plt.imshow(truncated.real, origin='lower', interpolation='none')
    #plt.imshow(truncated.imag, origin='lower', interpolation='none')
    plt.subplots_adjust(left=0.01, right=0.99, top=0.99, bottom=0.05)
    plt.show()

data = make_data(1000, 44100)
do_transform(data, 500, 3, 40, 44100)
(Figures: magnitudes, real components, and imaginary components of the transform.)
It turns out that this is a known issue. (It's still not clear whether this is a bug or just unexpected behavior, but the discussion is here: https://github.com/PyWavelets/pywt/issues/307 .)
Thanks to everyone who looked at it and considered it.
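For reference, the crux of that issue is that the Morlet's bandwidth and center-frequency parameters enter the scale-to-frequency mapping. A small cross-check sketch (assuming a pywt version that accepts the parameters spelled out explicitly, e.g. "cmor1.5-0.5") to see where pywt itself places each scale:
import numpy
import pywt

quality = 44100
base_scale = quality / (2 * 500)  # the scales from the question
scales = numpy.geomspace(base_scale, base_scale / 2**3, num=3*40 + 1)
freqs_hz = pywt.scale2frequency("cmor1.5-0.5", scales) * quality
print(freqs_hz[40])  # with center frequency 0.5, index 40 lands at the expected 1 kHz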

What is the unit of the return values (coefficients) of an FFT?

My application performs an FFT on the raw audio signal (all microphone readings are 16-bit integer values in the array values, which has 1024 cells). It first normalizes the readings by the 16-bit range, then extracts the magnitude at the frequency 400 Hz.
int sample_rate = 22050;
int values[1024];
// omitted: code to read 16-bit audio samples into the values array
double doublevalues[1024];
for (int i = 0; i < 1024; i++) {
    doublevalues[i] = (double)values[i] / 32768.0; // normalize 16-bit to -1..+1
}
fft(doublevalues); // in-place FFT, returns only real coefficients
int bin = (int)(400.0 / sample_rate * 1024); // index of the FFT bin nearest 400 Hz
double magnitude = doublevalues[bin];
printf("magnitude of 400Hz: %f", magnitude);
When I try this out and generate a 400 Hz signal to check the value of magnitude, it is around 0 when there is no 400 Hz signal and goes up to 30 or 40 when there is one.
What is the unit or meaning of this magnitude? It surprises me that it is larger than 1, even though I normalize the raw signal to be between -1 and +1.
It depends on which FFT you are using, as there are different conventions on scaling. The most common convention is that the output values are scaled by N, where N is the size of the FFT. So a 1024 point FFT will have output values which are 1024 times greater than the corresponding input values. A further complication is that for real-to-complex FFTs people typically ignore the symmetric upper half of the FFT, which is fine (because it's conjugate symmetric) but you need to account for a factor of 2 if you do this.
Other common conventions for FFT scaling are (a) no scaling (i.e. the factor of N has been removed) and (b) sqrt(N), which is sometimes used for symmetric scaling behaviour of FFT versus IFFT (sqrt(N) in each direction).
Since sqrt(1024) == 32, it's possible that you're using an FFT routine with sqrt(N) scaling, since you seem to be seeing values of around 30 for a unit-magnitude sine wave input.
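For comparison, a quick NumPy sanity check of the unscaled-forward convention (the test sine is placed exactly on bin 16 so leakage doesn't blur the peak):
import numpy as np

N = 1024
n = np.arange(N)
x = np.sin(2 * np.pi * 16 * n / N)  # unit amplitude, exactly 16 cycles: bin 16
mag = np.abs(np.fft.fft(x))
print(mag[16])          # ~512 == N/2: numpy's forward FFT applies no scaling
print(mag[16] * 2 / N)  # ~1.0: divide by N and double to recover the amplitude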

Libsox encoding

Why do I get distorted output if I convert a WAV file using libsox with the following settings?
in->encoding.encoding = SOX_ENCODING_UNSIGNED;
in->encoding.bits_per_sample = 8;
The input file has bits_per_sample = 16.
So you're saying that you tell SoX to read a 16-bit-sample WAV file as an 8-bit-sample file? Knowing nothing about SoX, I would expect it to read each 16-bit sample as two 8-bit samples: the high-order byte and the low-order byte, like this: ...HLHLHLHLHL... (WAV data is little-endian, so on disk it is actually low byte first, but the effect is the same.)
For simplicity, we'll call the high-order-byte samples 'A' samples. 'A' samples carry the original sound with less dynamic range, because the low-order byte with the extra precision has been chopped off.
We'll call the low-order-byte samples 'B' samples. These will be roughly random and encode noise.
So, as a result we'll have the original sound, the 'A' samples, shifted down in frequency by half. This is because there's a 'B' sample between every pair of 'A' samples, which halves the rate of the 'A' samples. The 'B' samples add noise to the original sound. So we'll have the original sound, shifted down by half, with noise.
Is that what you're hearing?
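A toy NumPy illustration of that interleaving (the tone and rates are arbitrary stand-ins):
import numpy as np

# 16 little-endian 16-bit samples of a 440 Hz tone at 8 kHz
samples16 = (np.sin(2 * np.pi * 440 * np.arange(16) / 8000) * 32767).astype('<i2')
as_bytes = samples16.view(np.uint8)     # reinterpret: low byte, then high byte
print(samples16.shape, as_bytes.shape)  # (16,) (32,) -- twice as many "samples"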
Edit: Guest commented that the goal is to downconvert a WAV file to 8-bit audio. Reading the man page for SoX, it looks like SoX always uses 32-bit audio in memory as the result of sox_read(). Passing it a format will only make it attempt to read from that format.
To downconvert in memory, use SOX_SAMPLE_TO_SIGNED_8BIT or SOX_SAMPLE_TO_UNSIGNED_8BIT from sox.h, i.e.:
sox_format_t *ft = sox_open_read("/file/blah.wav", NULL, NULL);
if (ft) {
    sox_ssample_t buffer[100];
    sox_size_t amt = sox_read(ft, buffer, 100); /* the length argument counts samples, not bytes */
    char sample8 = SOX_SAMPLE_TO_SIGNED_8BIT(buffer[0], ft->clips);
}
To output a downconverted file, use the 8-bit format when writing instead of when reading.
