Ringing artifacts on an audio signal shown on an oscilloscope - audio

I generated a square wave signal and wrote it into a WAV file using this code:
import wave, random, struct

noise_output = wave.open('noise.wav', 'w')
# 1 channel, 2 bytes per sample, 1000 Hz frame rate
noise_output.setparams((1, 2, 1000, 0, 'NONE', 'not compressed'))

SAMPLE_LEN = 1000
for i in range(0, SAMPLE_LEN):
    value = random.choice([-32000, 32000])
    for j in range(100):
        packed_value = struct.pack('h', value)
        noise_output.writeframes(packed_value)

noise_output.close()
I expected to hear some short rattles when listening to it, because this is not, let's say, a "valid" audio signal. Instead I heard rattles with something of a tone to them; I cannot quite describe it.
Then I used an oscilloscope to look at the output signal from the sound card, and it looks like this:
The output looks to me something like the Gibbs phenomenon.
My question is, why does it look like this? I expected to see no ringing artifacts on the oscilloscope. How does the DAC in the sound card work, and which digital/analog filters produce this output signal?

This ringing could be due to the interpolation filter(s) used by any automatic sample rate conversion done in the audio driver or hardware. You can get rid of most of it by using a "softer" edge, i.e. a larger rise/fall time.
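For illustration, here is a minimal sketch of how the edges could be softened before writing the file: instead of jumping directly between -32000 and 32000, each transition is ramped over a few samples. The ramp length and the exact way the wave is generated are assumptions for this example, not part of the original question or answer.

import wave, random, struct

RAMP = 8      # number of samples spent ramping between levels (assumed value)
HOLD = 100    # samples spent at each level, as in the original code

output = wave.open('soft_noise.wav', 'w')
output.setparams((1, 2, 1000, 0, 'NONE', 'not compressed'))

current = 0
for i in range(1000):
    target = random.choice([-32000, 32000])
    # linear ramp from the previous level to the new one
    for j in range(RAMP):
        v = int(current + (target - current) * (j + 1) / RAMP)
        output.writeframes(struct.pack('h', v))
    # hold the new level for the rest of the block
    for j in range(HOLD - RAMP):
        output.writeframes(struct.pack('h', target))
    current = target

output.close()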

Related

How to find all the peaks in a signal correctly?

I have four arrays of signals, and I am trying to find the peaks in those signals. I am following this blog, but I am not able to detect the peaks accurately. The plots of the four signals look like this:
My code to find the peaks is:
import numpy as np
import scipy.signal
import matplotlib.pyplot as plt

def plot_peaks(time, signal):
    index_data = scipy.signal.argrelextrema(
        np.array(signal),
        comparator=np.greater, order=2
    )
    plt.plot(time, signal)
    plt.plot(time[index_data[0]], signal[index_data[0]], alpha=0.5, marker='o',
             mec='r', ms=9, ls=":", label='%d %s' % (index_data[0].size, 'Peaks'))
    plt.legend(loc='best', framealpha=.5, numpoints=1)
    plt.xlabel('Time(s)', fontsize=14)
    plt.ylabel('Amplitude', fontsize=14)
which results in this:
I want to show only the maximum peaks, but this code is detecting a lot of minor peaks too.
How can I accurately detect only the maximum peaks?
I tried the SciPy function below, but I am confused by the parameters of that function:
peaks, _ = find_peaks(x, height=0)
Without access to the input data itself it is hard to present a working solution. I can, however, give some general tips that might enable you to solve the problem yourself.
The difference between what you call "minor" peaks and "maximum" peaks is their prominence. The prominence is a measure of how much a peak's height stands out compared to the surrounding signal. For more context, have a look at the example section, this Wikipedia article on topographic prominence, or this answer.
scipy.signal.find_peaks provides a way to select only peaks with a certain prominence. Try to replace scipy.signal.argrelextrema with find_peaks like so:
def plot_peaks(time, signal, prominence=None):
    index_data, _ = scipy.signal.find_peaks(
        np.array(signal),
        prominence=prominence
    )
    plt.plot(time, signal)
    # find_peaks already returns the peak indices as a flat array,
    # so we can index with index_data directly
    plt.plot(time[index_data], signal[index_data], alpha=0.5, marker='o',
             mec='r', ms=9, ls=":", label='%d %s' % (index_data.size, 'Peaks'))
    plt.legend(loc='best', framealpha=.5, numpoints=1)
    plt.xlabel('Time(s)', fontsize=14)
    plt.ylabel('Amplitude', fontsize=14)
Because your signals 1 to 4 have very different amplitude levels, I did not provide a default value that would work for all of them; you will have to try which value works best for each plot. For example, plot_peaks(time, signal_1, prominence=0.1) might be a good starting value for signal 1, plot_peaks(time, signal_2, prominence=1500) for signal 2, and so on...
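As a quick way to build intuition for the prominence parameter, here is a small self-contained sketch on synthetic data; the signal, the prominence value and the plotting details are assumptions for illustration, not the asker's data:

import numpy as np
import scipy.signal
import matplotlib.pyplot as plt

# synthetic signal: a slow wave with small ripples and three tall spikes
t = np.linspace(0, 10, 1000)
x = np.sin(2 * np.pi * 0.3 * t) + 0.1 * np.sin(2 * np.pi * 5 * t)
x[[200, 500, 800]] += 3.0

# without a prominence threshold, every small ripple counts as a peak
all_peaks, _ = scipy.signal.find_peaks(x)
# with a prominence threshold, only the tall spikes survive
big_peaks, _ = scipy.signal.find_peaks(x, prominence=1.0)

plt.plot(t, x)
plt.plot(t[big_peaks], x[big_peaks], 'ro', label='%d major peaks' % big_peaks.size)
plt.legend(loc='best')
plt.show()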

How to detect input audio existence and do action whenever it exists?

I checked PyAudio, but it offers the ability to record the input and manipulate it; I just want to perform an action whenever audio input exists.
You can implement simple input audio detection using PyAudio. You just need to decide what you mean by audio existence.
In the following example code I have used a simple root mean square (RMS) calculation with a threshold. Another option is a peak test, simply comparing the amplitude of each audio sample against a peak amplitude threshold. Which is most useful for you depends on the application.
You can play around with the threshold value (i.e. the minimum amplitude or loudness of the audio) and the chunk size (i.e. the latency of the audio detection) to get the behaviour you want.
import pyaudio
import math
from array import array

RATE = 44100
CHUNK = 1024
AUDIO_EXISTENCE_THRESHOLD = 1000

def detect_input_audio(data, threshold):
    if not data:
        return False
    rms = math.sqrt(sum([x**2 for x in data]) / len(data))
    return rms > threshold

audio = pyaudio.PyAudio()
stream = audio.open(format=pyaudio.paInt16, channels=1, input=True,
                    rate=RATE, frames_per_buffer=CHUNK)

data = []
# keep reading chunks until audio is detected
while not detect_input_audio(data, AUDIO_EXISTENCE_THRESHOLD):
    # interpret the raw bytes as signed 16-bit samples
    data = array('h', stream.read(CHUNK))

# Do something when input audio exists
# ...

stream.stop_stream()
stream.close()
audio.terminate()
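For comparison, the peak-test variant mentioned above could look roughly like this; the function name and the meaning of the threshold here are just illustrative:

def detect_input_audio_peak(data, threshold):
    # report audio as present if any single sample exceeds the threshold
    if not data:
        return False
    return max(abs(x) for x in data) > threshold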

Python FFT for feature extraction

I am looking to perform feature extraction on human accelerometer data to use for activity recognition. The sampling rate of my data is 100 Hz.
From the various sources I have researched, an FFT is a favourable method to use. I have the data in sliding-window format; the length of each window is 256 samples. I am using Python with the NumPy library. The code I have used to apply the FFT is:
import numpy as np

def fft_transform(window_data):
    fft_data = []
    fft_freq = []
    power_spec = []
    for window in window_data:
        fft_window = np.fft.fft(window)
        fft_data.append(fft_window)
        freq = np.fft.fftfreq(np.array(window).shape[-1], d=0.01)
        fft_freq.append(freq)
        fft_ps = np.abs(fft_window)**2
        power_spec.append(fft_ps)
    return fft_data, fft_freq, power_spec
This gives output which looks like this:
fft_data
array([ 2.92394828e+01 +0.00000000e+00j,
-6.00104665e-01 -7.57915977e+00j,
-1.02677676e+01 -1.55806119e+00j,
-7.17273995e-01 -6.64043705e+00j,
3.45758079e+01 +3.60869421e+01j,
etc..
freq_data
array([ 0. , 0.390625, 0.78125 , 1.171875, 1.5625 , etc...
power_spectrum
array([ 8.54947354e+02, 5.78037884e+01, 1.07854606e+02,
4.46098863e+01, 2.49775388e+03, etc...
I have also plotted the results using this code, where fst_ps is the first array/window of power_spectrum and fst_freq is the first window/array of the fft_freq data:
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(width, height))  # width and height are set elsewhere
fig1 = fig.add_subplot(221)
fig2 = fig.add_subplot(222)
fig1.plot(fst_freq, fst_ps)
fig2.plot(fst_freq, np.log10(fst_ps))
plt.show()
I am looking for some advice on what my next step should be for extracting features. Thanks.
Now that you have decomposed the signal into its spectrum, the next step is to work out which frequencies are relevant for your application. That is quite difficult to judge from a single spectrum picture. Remember that one frequency bin of the spectrum is just the same basic signal restricted to a narrow frequency range, and some frequencies may simply not be important for your task.
A better way is to try the STFT (short-time Fourier transform) to understand your signal's features in the time-frequency domain. For example, you may read this article about the STFT approach in Python. This method is usually applied to search for time-frequency patterns that can be recognized as features. For example, in a human voice pattern (as in the article) you may see sustained, slowly varying frequencies whose duration and frequency bounds serve as features. You need to compute the STFT of your signal and look for patterns in the resulting sonogram to extract features for your task.
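A minimal sketch of what that could look like with SciPy, assuming a 100 Hz accelerometer stream stored in a 1-D array named acc; the window length, overlap and placeholder data are illustrative choices, not prescribed by the answer:

import numpy as np
import scipy.signal
import matplotlib.pyplot as plt

fs = 100  # sampling rate of the accelerometer data in Hz

# acc: 1-D numpy array with the accelerometer samples (placeholder data here)
acc = np.random.randn(3000)

# STFT with 256-sample windows and 50% overlap
f, t, Zxx = scipy.signal.stft(acc, fs=fs, nperseg=256, noverlap=128)

# plot the magnitude as a spectrogram / sonogram
plt.pcolormesh(t, f, np.abs(Zxx))
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.show()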

Proper usage of TensorFlow's STFT function

I am trying to construct a plot spectrum of an audio sample similar to the one that is created using Audacity. According to Audacity's wiki page, the plot spectrum (attached example) performs:
Plot Spectrum takes the audio in blocks of 'Size' samples, does the FFT, and averages all the blocks together.
I was thinking I would use the STFT functionality recently provided by TensorFlow.
I am using audio blocks of size 512, and my code is as follows:
audio_binary = tf.read_file(audio_file)
waveform = tf.contrib.ffmpeg.decode_audio(
    audio_binary,
    file_format="wav",
    samples_per_second=4000,
    channel_count=1
)
stft = tf.contrib.signal.stft(
    waveform,
    512,  # frame_length
    512,  # frame_step
    fft_length=512,
    window_fn=functools.partial(tf.contrib.signal.hann_window, periodic=True),  # matches audacity
    pad_end=True,
    name="STFT"
)
But the result of stft is just an empty array, when I expect FFT results for each frame (of 512 samples).
What is wrong with the way that I am making this call?
I have verified that the waveform audio data is being read correctly by running just the regular tf.fft function on it.
audio_file = tf.placeholder(tf.string)
audio_binary = tf.read_file(audio_file)
waveform = tf.contrib.ffmpeg.decode_audio(
    audio_binary,
    file_format="wav",
    samples_per_second=sample_rate,  # get this from the .wav file's info (sample rate)
    channel_count=1                  # get this from the .wav file's info (audio channels)
)
stft = tf.contrib.signal.stft(
    tf.transpose(waveform),
    frame_length,
    frame_step,
    fft_length=fft_length,
    window_fn=functools.partial(tf.contrib.signal.hann_window,
                                periodic=False),  # matches audacity
    pad_end=False,
    name="STFT"
)
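If I read the two snippets correctly, the important difference is the tf.transpose(waveform): decode_audio returns a [samples, channels] tensor, while tf.contrib.signal.stft computes the transform over the last axis, so the samples have to be moved there. To then mimic Audacity's averaged Plot Spectrum, one could average the magnitudes of the STFT frames, roughly like this; the axis choice assumes the transposed [channels, frames, fft_bins] layout:

# magnitude spectrum of each frame; stft has shape [channels, frames, fft_bins]
magnitude = tf.abs(stft)

# average the frames together, analogous to Audacity's Plot Spectrum
mean_spectrum = tf.reduce_mean(magnitude, axis=1)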

Kivy/Audiostream microphone input data format

I am playing around with some basics of the Audiostream package for Kivy.
I would like to make a simple online input-filter-output system, for example: take in microphone data, impose a band-pass filter, and send it to the speakers.
However, I can't seem to figure out what data format the microphone input is in or how to manipulate it. In the code below, buf is of type string, but how can I get the data out of it and manipulate it [i.e. function(buf)] to do something like a band-pass filter?
The code currently just sends the microphone input directly to the speakers.
Thanks.
from time import sleep
from audiostream import get_input
from audiostream import get_output, AudioSample

# get speakers, create sample and bind to speakers
stream = get_output(channels=2, rate=22050, buffersize=1024)
sample = AudioSample()
stream.add_sample(sample)

# define what happens on mic input with arg as buffer
def mic_callback(buf):
    print('got', len(buf))
    # HERE: How do I manipulate buf?
    # modified_buf = function(buf)
    # sample.write(modified_buf)
    sample.write(buf)

# get the default audio input (mic in most cases)
mic = get_input(callback=mic_callback)
mic.start()
sample.play()

sleep(3)  # record for 3 seconds

mic.stop()
sample.stop()
The buffer is composed of bytes that need to be interpreted as signed shorts. You can use the struct or array module to get the values. In your example you have 2 channels (L/R). Let's say you want to turn the right channel volume down by 20% (i.e. keep only 80% of the original level on the right channel):
from array import array

def mic_callback(buf):
    # convert our byte buffer into a signed short array
    values = array("h", buf)
    # get the right-channel values only (samples are interleaved L/R)
    r_values = values[1::2]
    # reduce by 20%, keeping integer samples
    r_values = [int(x * 0.8) for x in r_values]
    # you can assign only an array to a slice, not a list,
    # so we need to convert the list back into an array
    values[1::2] = array("h", r_values)
    # convert the array back to a byte buffer for the speaker
    sample.write(values.tobytes())  # use values.tostring() on older Python versions
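Since the original goal was a band-pass filter, here is a rough sketch of how the same buffer could be filtered with SciPy instead of just scaled. The cutoff frequencies and filter order are illustrative assumptions, and a proper real-time version would also need to carry the filter state between callbacks (e.g. via scipy.signal.lfilter's zi argument); this sketch filters each chunk independently:

import numpy as np
from array import array
from scipy.signal import butter, lfilter

RATE = 22050  # matches the output stream above

# design a 4th-order Butterworth band-pass, 300 Hz - 3000 Hz (example values)
b, a = butter(4, [300.0 / (RATE / 2.0), 3000.0 / (RATE / 2.0)], btype='band')

def mic_callback(buf):
    # bytes -> signed 16-bit samples
    samples = np.array(array("h", buf), dtype=np.float64)
    # filter left and right channels separately (samples are interleaved L/R)
    for ch in (0, 1):
        samples[ch::2] = lfilter(b, a, samples[ch::2])
    # clip and convert back to 16-bit bytes for the speaker
    out = np.clip(samples, -32768, 32767).astype(np.int16)
    sample.write(out.tobytes())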

Resources