Faster plotting of real time audio signal - python-3.x

I have a piece of code that takes a real-time audio signal from the audio jack of my laptop and plots its graph after some basic filtering. The problem I am facing is that the real-time plotting gets slower and slower as the program runs.
Any suggestions to make the plotting faster and proceed at a constant rate? I think an animation function would make it faster, but I was not able to adapt one to my requirements.
import pyaudio
import numpy as np
import time
import matplotlib.pyplot as plt
import scipy.io.wavfile
from scipy.signal import butter, lfilter
import wave

plt.rcParams["figure.figsize"] = 8, 4

RATE = 44100
CHUNK = int(RATE / 2)  # RATE / number of updates per second

# Filter coefficients
nyq = 0.5 * RATE
low = 3000 / nyq
high = 6000 / nyq
b, a = butter(7, [low, high], btype='band')

# Figure structure
fig, (ax, ax2) = plt.subplots(nrows=2, sharex=True)
x = np.linspace(1, CHUNK, CHUNK)
extent = [x[0] - (x[1] - x[0]) / 2., x[-1] + (x[1] - x[0]) / 2., 0, 1]

def soundplot(stream):
    t1 = time.time()
    data = np.array(np.fromstring(stream.read(CHUNK), dtype=np.int32))
    y1 = lfilter(b, a, data)
    ax.imshow(y1[np.newaxis, :], cmap="jet", aspect="auto")
    plt.xlim(extent[0], extent[1])
    plt.ylim(-50000000, 50000000)
    ax2.plot(x, y1)
    plt.pause(0.00001)
    plt.cla()  # which clears data but not axes
    y1 = []
    print(time.time() - t1)

if __name__ == "__main__":
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt32, channels=1, rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
    for i in range(RATE):
        soundplot(stream)
    stream.stop_stream()
    stream.close()
    p.terminate()

This is a little long for a comment, and since you're asking for suggestions I think it's a semi-complete answer. There's more info and examples online about getting realtime plotting with matplotlib, if you need ideas beyond what's here. The library wasn't designed for this, but it's possible.
First step, profile the code. You can do this with
import cProfile
cProfile.run('soundplot(stream)')
That will show where most of the time is being spent.
Without doing that, I'll give a few tips, but be aware that profiling may show other causes.
First, you want to eliminate redundant function calls in the function soundplot. Both of the following are unnecessary:
plt.xlim(extent[0], extent[1])
plt.ylim(-50000000, 50000000)
They can be called once in initialization code. imshow updates these automatically, but for speed you shouldn't call that every time. Instead, in some initialization code outside the function use im=imshow(data, ...), where data is the same size as what you'll be plotting (although it may not need to be). Then, in soundplot use im.set_data(y1[np.newaxis, :]). Not having to recreate the image object each iteration will speed things up immensely.
Since the image object remains through each iteration, you'll also need to remove the call to cla(), and replace it with either show() or draw() to have the figure draw the updated image. You can do the same with the line on the second axis, using line.set_ydata(y).
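A minimal sketch of that pattern, with random placeholder data instead of the audio stream (the names im, line, and update are mine, not from the original code):

import numpy as np
import matplotlib.pyplot as plt

CHUNK = 1024
x = np.arange(CHUNK)

# Initialization: create the artists once, outside the update loop.
fig, (ax, ax2) = plt.subplots(nrows=2, sharex=True)
im = ax.imshow(np.zeros((1, CHUNK)), cmap="jet", aspect="auto",
               vmin=-50000000, vmax=50000000)
line, = ax2.plot(x, np.zeros(CHUNK))
ax2.set_ylim(-50000000, 50000000)

def update(y1):
    # Per iteration: only push new data into the existing artists.
    im.set_data(y1[np.newaxis, :])
    line.set_ydata(y1)
    fig.canvas.draw_idle()
    plt.pause(0.00001)

for _ in range(100):
    update(np.random.randint(-50000000, 50000000, CHUNK).astype(float))

In your code, update(y1) would be called from soundplot with the filtered chunk, and nothing else in the figure needs to be touched per iteration.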
Please post the before and after rate it runs at, and let me know if that helps.
Edit: some quick profiling of similar code suggests a 100-500x speedup, mostly from removing cla().
Also, looking at your code: the reason it slows down is that cla() is never called on the first axis. Eventually there will be hundreds of images drawn on that axis, slowing matplotlib to a crawl.

Related

Find the time the music sound starts

I have this sound file where I am looking for the time the music starts. I am limited to using only the scipy module. How do I detect the time on the x axis when the sound starts?
An example figure is shown below. The signal with higher magnitude shows where the music starts.
Note that sometimes there is noise in the signal, which can also have high peaks.
import scipy
import numpy as np
import matplotlib.pyplot as plt

# create a single test signal
dt = 0.001
t = np.arange(0, 6, dt)
lowFreq = np.sin(2 * np.pi * 10 * t)
musicFreq = 3.5 * np.sin(2 * np.pi * 25 * t)
combinedSignal = np.concatenate([lowFreq, musicFreq])
plt.plot(combinedSignal)
plt.show()
I think you can get the time when the sound starts by just using:
idx_start = np.where(combinedSignal > 1)[0][0]
This will return the first index where the magnitude of your signal is greater than 1.
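If you also need the time on the x axis rather than the sample index, divide the index by the sampling rate (equivalently, multiply by the dt from the question). A quick sketch using the question's synthetic signal:

import numpy as np

dt = 0.001
t = np.arange(0, 6, dt)
lowFreq = np.sin(2 * np.pi * 10 * t)
musicFreq = 3.5 * np.sin(2 * np.pi * 25 * t)
combinedSignal = np.concatenate([lowFreq, musicFreq])

idx_start = np.where(combinedSignal > 1)[0][0]   # first sample above the threshold
t_start = idx_start * dt                         # convert the sample index to seconds
print(idx_start, t_start)                        # roughly 6.0 s, where the louder segment begins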

Real time audio acquisition using pyAudio (problem with CHUNK size)

I am having an issue when I run the code below. The goal is to develop an app that achieves real-time sound acquisition. I have set the CHUNK (frame) size to 320 at a 16 kHz sampling rate, hence a frame duration of 0.02 s. The issue is that when I record, the result (the content of the variable "many") contains some glitchy sounds or noise. When I double the CHUNK, the problem disappears. The value 0.02 depends on the nature of the problem I am trying to solve; it is required to be 0.02. Do you have any suggestions?
import pyaudio
import struct
import numpy as np
import matplotlib.pyplot as plt
import time
import IPython.display as ipd

CHUNK = int(1 * 320)
FORMAT = pyaudio.paFloat32
CHANNELS = 1
RATE = 16000

p = pyaudio.PyAudio()

chosen_device_index = 1
for x in range(0, p.get_device_count()):
    info = p.get_device_info_by_index(x)
    #print p.get_device_info_by_index(x)
    if info["name"] == "pulse":
        chosen_device_index = info["index"]
print("Chosen index: ", chosen_device_index)

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input_device_index=chosen_device_index,
                input=True,
                output=False,
                frames_per_buffer=CHUNK)

plt.ion()
%matplotlib qt
fig, ax = plt.subplots()
x = np.arange(0, CHUNK)
data = stream.read(CHUNK)
print(len(data))
data_ = struct.unpack(str(CHUNK) + 'f', data)
line, = ax.plot(x, data_)
ax.set_ylim([-1, 1])

many = []
while True:
    data = struct.unpack(str(CHUNK) + 'f', stream.read(CHUNK))
    line.set_ydata(data)
    fig.canvas.draw()
    fig.canvas.flush_events()
    many = np.concatenate((many, data), axis=None)

ipd.Audio(many, rate=16000)
From the conversation between you and fdcpp, it seems true that this piece of code
line.set_ydata(data)
fig.canvas.draw()
fig.canvas.flush_events()
many= np.concatenate((many, data),axis=None)
takes more than 0.02 s to run. That's why, when the next CHUNK of data arrives, your code isn't ready to receive it yet, which causes an input overflow.
There are different ways to work around it, but I agree with fdcpp that the best way to solve this problem is to think about your end goal.
For example, you can separate receiving the audio data from processing it (your line/fig code): one process just receives and stores the audio data, while the other takes the stored data and draws it, as in the sketch below.
But please keep in mind that as long as the drawing part takes more than 0.02 s, you cannot achieve the "real time" behaviour you wanted.
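A minimal sketch of that split, using a background thread and a queue (the 5-second duration, the daemon thread, and the np.frombuffer decoding are my choices, not part of the original code):

import queue
import threading

import numpy as np
import pyaudio

RATE = 16000
CHUNK = 320          # 0.02 s per buffer at 16 kHz
q = queue.Queue()

def record(stream, seconds=5):
    # Producer: only reads from the stream and stores the raw chunks.
    for _ in range(int(seconds * RATE / CHUNK)):
        q.put(stream.read(CHUNK))

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNK)

t = threading.Thread(target=record, args=(stream,), daemon=True)
t.start()

# Consumer: decode and collect (or plot) the chunks at its own pace.
many = []
while t.is_alive() or not q.empty():
    try:
        raw = q.get(timeout=0.1)
    except queue.Empty:
        continue
    many.append(np.frombuffer(raw, dtype=np.float32))

t.join()
stream.stop_stream(); stream.close(); p.terminate()
signal = np.concatenate(many)

The acquisition thread never waits on matplotlib, so the stream is much less likely to overflow; only the drawing lags if it is too slow.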

Vectorizing a for loop with a pandas dataframe

I am trying to do a project for my physics class where we are supposed to simulate motion of charged particles. We are supposed to randomly generate their positions and charges but we have to have positively charged particles in one region and negatively charged ones anywhere else. Right now, as a proof of concept, I am trying to do only 10 particles but the final project will have at least 1000.
My thought process is to create a dataframe with the first column containing the randomly generated charges and run a loop to see what value I get and place in the same dataframe as the next three columns their generated positions.
I have tried to do a simple for loop going over the rows and inputting the data as I go, but I run into an IndexingError: too many indexers. I also want this to run as efficiently as possible so that if I scale up the number of particles, it doesn't slow as much.
I also want to vectorize the calculation of each particle's motion, since it depends on the position of every other particle, which would take a lot of computational time with ordinary loops.
Any vectorization optimization or offloading to GPU would be very helpful, thanks.
# In[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d

# In[2]:
num_points = 10
df_position = pd.DataFrame(pd,np.empty((num_points,4)),columns=['Charge','X','Y','Z'])

# In[3]:
charge = np.array([np.random.choice(2,num_points)])
df_position.iloc[:,0] = np.where(df_position["Charge"]==0,-1,1)

# In[4]:
def positive():
    return np.random.uniform(low=0, high=5)

def negative():
    return np.random.uniform(low=5, high=10)

# In[5]:
for row in df_position.itertuples(index=True,name='Charge'):
    if(getattr(row,"Charge")==-1):
        df_position.iloc[row,1]=positive()
        df_position.iloc[row,2]=positive()
        df_position.iloc[row,3]=positive()
    else:
        df_position.iloc[row,1]=negative()
        #this is where I would get the IndexingError and would like to optimize this portion
        df_position.iloc[row,2]=negative()
        df_position.iloc[row,3]=negative()
df_position.iloc[:,0]=np.where(df_position["Charge"]==0,-1,1)

# In[6]:
ax = plt.axes(projection='3d')
ax.set_xlim(0, 10); ax.set_ylim(0, 10); ax.set_zlim(0, 10)
xdata = df_position.iloc[:,1]
ydata = df_position.iloc[:,2]
zdata = df_position.iloc[:,3]
chargedata = df_position.iloc[:11,0]
colors = np.where(df_position["Charge"]==1,'r','b')
ax.scatter3D(xdata,ydata,zdata,c=colors,alpha=1)
EDIT:
The dataframe that I want the results in would be something like this
Charge   X   Y   Z
  -1
   1
  -1
  -1
   1
With the initial coordinates of each charge listed in their respective columns. It will be a 3D dataframe, as I will need to keep track of all the new positions after each time step so that I can animate the motion. Each layer will have exactly the same format.
Some code for creating your dataframe:
import numpy as np
import pandas as pd

num_points = 1_000

# uniform distribution of int, not sure it is the best one for your problem
# positive_point = np.random.randint(0, num_points)
positive_point = int(num_points / 100 * np.random.randn() + num_points / 2)
negative_point = num_points - positive_point

positive_df = pd.DataFrame(
    np.random.uniform(0.0, 5.0, size=[positive_point, 3]),
    index=[1] * positive_point, columns=['X', 'Y', 'Z']
)
negative_df = pd.DataFrame(
    np.random.uniform(5.0, 10.0, size=[negative_point, 3]),
    index=[-1] * negative_point, columns=['X', 'Y', 'Z']
)
df = pd.concat([positive_df, negative_df])
It is quite fast for 1,000 or 1,000,000.
Edit: my first answer totally missed a big part of the question. This new one should fit better.
Second edit: I now use a better distribution for the number of positive points than a uniform integer distribution.
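If you would rather keep the layout from the question (a Charge column of ±1 followed by X/Y/Z), the same idea works fully vectorized. This is a sketch assuming, as above, that positive charges live in [0, 5) and negative ones in [5, 10) on every axis:

import numpy as np
import pandas as pd

num_points = 1_000

# Random charges of +1 or -1 for every particle.
charge = np.random.choice([-1, 1], size=num_points)

# Positive charges get coordinates in [0, 5), negative ones in [5, 10).
low = np.where(charge == 1, 0.0, 5.0)
coords = np.random.uniform(size=(num_points, 3)) * 5.0 + low[:, None]

df_position = pd.DataFrame(coords, columns=['X', 'Y', 'Z'])
df_position.insert(0, 'Charge', charge)

Since every column is filled by a single NumPy call, this avoids the itertuples loop entirely and scales to large particle counts.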

SymPy Plot Resolution

I just recently started learning Python (platform: Python 3.7) for my signal processing and communications class, and so far it has been great. However, I'm having trouble reproducing the same resolution that's achievable with matplotlib and linspace (shown in the code below) when using the SymPy library. I was wondering if there is any way of achieving the same resolution?
I understand that SymPy uses matplotlib on the back end but is limited in how much of it it actually uses. I tried adding sampling rates to the time-domain limits in the sym.plot call, like you can do with linspace, but that doesn't work. Is there any way to call the linspace function prior to plotting to improve the plot's resolution, or even to do it without linspace?
Step 1 shows the necessary plots, using MatPlotLib. Step 2 shows the code I'm trying to develop to produce the same results, but the quality of the waveform is nowhere near the same as Step 1.
import sympy as sym
import matplotlib.pyplot as plt
import numpy as np
# Step 1 & Section Identifiers
fc, fm = 10**9,10**6
wc, wm, Ac, Am = 2*np.pi*fc, 2*np.pi*fm, 8, 2
# Carrier Signal Plot
t = np.linspace(0, 5/fc, 500)
ct = Ac*np.cos(wc*t)
plt.xlabel('Time')
plt.ylabel('c(t)')
plt.plot(t,ct)
plt.show()
# Step 2 & Section Identifiers
t = sym.Symbol('t')
fc, fm = 10**7, 10**4
wc, wm = 2*sym.pi*fc, 2*sym.pi*fm
# Carrier Plot
ct = Ac*sym.cos(wc*t)
sym.plot(ct, (t, 0, 5/fc), ylabel = "C(t)")
Edit
I found that I could turn off adaptive sampling and manually specify the number of data points to smooth out the signal. I have added the edited line(s) necessary to show this:
sym.plot(ct, (t, 0, 5/fc), ylabel = "C(t)", adaptive = False, nb_of_points = 500)
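Another option, if you want the same explicit linspace control as in Step 1, is to lambdify the symbolic expression and plot it with matplotlib directly. A sketch reusing the names from the question:

import numpy as np
import matplotlib.pyplot as plt
import sympy as sym

t = sym.Symbol('t')
Ac, fc = 8, 10**7
wc = 2 * sym.pi * fc
ct = Ac * sym.cos(wc * t)

# Turn the symbolic expression into a fast NumPy function,
# then evaluate it on as many points as you like.
ct_func = sym.lambdify(t, ct, modules='numpy')
tt = np.linspace(0, 5 / fc, 500)
plt.plot(tt, ct_func(tt))
plt.xlabel('Time')
plt.ylabel('C(t)')
plt.show()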

FFT on MPU6050 output signal

I want to perform an FFT on a data array that I have extracted from an MPU6050 sensor connected to an Arduino UNO, using Python.
Please find a data sample below:
0.13,0.04,1.03
0.14,0.01,1.02
0.15,-0.04,1.05
0.16,0.02,1.05
0.14,0.01,1.02
0.16,-0.03,1.04
0.15,-0.00,1.04
0.14,0.03,1.02
0.14,0.01,1.03
0.17,0.02,1.05
0.15,0.03,1.03
0.14,0.00,1.02
0.17,-0.02,1.05
0.16,0.01,1.04
0.14,0.02,1.01
0.15,0.00,1.03
0.16,0.03,1.05
0.11,0.03,1.01
0.15,-0.01,1.03
0.16,0.01,1.05
0.14,0.02,1.03
0.13,0.01,1.02
0.15,0.02,1.05
0.13,0.00,1.03
0.08,0.01,1.03
0.09,-0.01,1.03
0.09,-0.02,1.03
0.07,0.01,1.03
0.06,0.00,1.05
0.04,0.00,1.04
0.01,0.01,1.02
0.03,-0.05,1.02
-0.03,-0.05,1.03
-0.05,-0.02,1.02
I have taken the 1st column (X axis) and saved it in an array.
Reference: https://hackaday.io/project/12109-open-source-fft-spectrum-analyzer/details
From this I took the FFT part, and the code is below:
from scipy.signal import filtfilt, iirfilter, butter, lfilter
from scipy import fftpack, arange
import numpy as np
import string
import matplotlib.pyplot as plt

sample_rate = 0.2
accx_list_MPU = []
outputfile1 = 'C:/Users/Meena/Desktop/SensorData.txt'

def fftfunction(array):
    n = len(array)
    print('The length is....', n)
    k = arange(n)
    fs = sample_rate / 1.0
    T = n / fs
    freq = k / T
    freq = freq[range(n // 2)]
    Y = fftpack.fft(array) / n
    Y = Y[range(n // 2)]
    plt.plot(freq, abs(Y))
    plt.grid()
    plt.show()

with open(outputfile1) as f:
    string1 = f.readlines()
N1 = len(string1)
for i in range(10, N1):
    if (i % 2 == 0):
        new_list = string1[i].split(',')
        l = len(new_list)
        if (l == 3):
            accx_list_MPU.append(float(new_list[0]))
fftfunction(accx_list_MPU)
I have got the output of the FFT as shown in FFToutput.
I do not understand whether the graph is correct. This is the first time I'm working with FFT, and I'm not sure how to relate it to the data.
This is what I got after the changes suggested: FFTnew
Here's a little rework of your fftfunction:
def fftfunction(array):
    N = len(array)
    amp_spec = abs(fftpack.fft(array)) / N
    freq = np.linspace(0, 1, num=N, endpoint=False)
    plt.plot(freq, amp_spec, "o-", markerfacecolor="none")
    plt.xlim(0, 0.6)  # easy way to hide datapoints
    plt.margins(0.05, 0.05)
    plt.xlabel("Frequency $f/f_{sample}$")
    plt.ylabel("Amplitude spectrum")
    plt.minorticks_on()
    plt.grid(True, which="both")

fftfunction(X)
Specifically, it removes the fs = sample_rate / 1.0 part; shouldn't that be the inverse?
The plot then basically tells you how strong each frequency (relative to the sample frequency) is. Looking at your image, at f = 0 you have your signal's offset, or mean value, which is around 0.12. For the rest of it there's not much going on: no peaks that would indicate a particular frequency being overly present in the measurement data.
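If you know the real sampling frequency of the MPU6050 readings, you can label the x axis in hertz instead of relative frequency. A sketch, assuming the 0.2 in the question is the sample interval in seconds (so fs = 1 / 0.2 = 5 Hz); that assumption may well be wrong for your setup:

import numpy as np
from scipy import fftpack
import matplotlib.pyplot as plt

def fft_in_hz(array, fs):
    # fs is the sampling frequency in Hz (samples per second).
    N = len(array)
    amp_spec = np.abs(fftpack.fft(array)) / N
    freq = np.fft.fftfreq(N, d=1.0 / fs)   # frequency axis in Hz
    half = slice(0, N // 2)                # keep the positive half only
    plt.plot(freq[half], amp_spec[half], "o-", markerfacecolor="none")
    plt.xlabel("Frequency [Hz]")
    plt.ylabel("Amplitude spectrum")
    plt.grid(True)
    plt.show()

# If the sensor was read every 0.2 s, the sampling frequency is 1 / 0.2 = 5 Hz:
# fft_in_hz(accx_list_MPU, fs=5.0)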
