How to process audio stream in realtime - python-3.x

I have a setup with a Raspberry Pi 3 running the latest Jessie with all updates installed, on which I provide an A2DP Bluetooth sink that I connect a phone to in order to play some music.
Via PulseAudio, the source (phone) is routed to the ALSA output (sink). This works reasonably well.
I now want to analyze the audio stream using Python 3.4 with librosa, and I found a promising example using PyAudio, which I adjusted to use the PulseAudio input (which magically works because it's the default) instead of a wav file:
"""PyAudio Example: Play a wave file (callback version)."""
import pyaudio
import wave
import time
import sys
import numpy
# instantiate PyAudio (1)
p = pyaudio.PyAudio()
# define callback (2)
def callback(in_data, frame_count, time_info, status):
# convert data to array
data = numpy.fromstring(data, dtype=numpy.float32)
# process data array using librosa
# ...
return (None, pyaudio.paContinue)
# open stream using callback (3)
stream = p.open(format=p.paFloat32,
channels=1,
rate=44100,
input=True,
output=False,
frames_per_buffer=int(44100*10),
stream_callback=callback)
# start the stream (4)
stream.start_stream()
# wait for stream to finish (5)
while stream.is_active():
time.sleep(0.1)
# stop stream (6)
stream.stop_stream()
stream.close()
wf.close()
# close PyAudio (7)
p.terminate()
Now while the data flow works in principle, there is a delay (the length of the buffer) before the stream_callback gets called. Since the docs state
Note that PyAudio calls the callback function in a separate thread.
I would have assumed that while the callback is being worked on, the buffer keeps filling in the main thread. Of course, there would be an initial delay while the buffer fills; after that, I expected to get a synchronous flow.
I need a longer portion in the buffer (see frames_per_buffer) for librosa to be able to perform its analysis correctly.
How is something like this possible? Is it a limitation of the software ports for the Raspberry Pi's ARM?
I found other answers, but they use blocking I/O. How would I wrap this into a thread so that the librosa analysis (which might take some time) does not block the buffer filling?
This blog seems to fight performance issues with Cython, but I don't think the delay is a performance issue. Or might it be? Others seem to need some ALSA tweaks, but would that help while using PulseAudio?
Thanks, any input appreciated!
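For what it's worth, the usual pattern here is to keep the callback trivially cheap and move the analysis elsewhere. Below is a minimal, untested sketch in which the callback only enqueues raw buffers and a worker thread assembles the long window librosa needs, so a slow analysis never blocks the capture:
import queue
import threading

import numpy
import pyaudio

RATE = 44100
CHUNK = 4096                  # a small buffer keeps callback latency low
WINDOW = RATE * 10            # 10 s of samples for the librosa analysis
buf_queue = queue.Queue()

def callback(in_data, frame_count, time_info, status):
    buf_queue.put(in_data)    # never do heavy work in the callback
    return (None, pyaudio.paContinue)

def analyzer():
    window = numpy.array([], dtype=numpy.float32)
    while True:
        chunk = numpy.frombuffer(buf_queue.get(), dtype=numpy.float32)
        window = numpy.concatenate((window, chunk))
        if len(window) >= WINDOW:
            # run the librosa analysis on window[:WINDOW] here ...
            window = window[WINDOW:]   # keep any leftover samples

threading.Thread(target=analyzer, daemon=True).start()

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNK,
                stream_callback=callback)
stream.start_stream()
This way the callback fires every CHUNK/RATE seconds instead of every ten seconds, and the ten-second window is built up incrementally in the worker thread.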

Related

How to get amplitude of an audio file in realtime in Python

I have the code below, which reads an audio file and calculates its amplitude:
from scipy.io.wavfile import read
fs, amplitude = read('1.wav')
print(amplitude)
Now I am trying to read the file in such a way that I can process the audio every second. As of now, it reads the whole audio file and then shows it, but I want to read, let's say, the first 10 seconds (or 1, 2, 3 seconds) and then print the amplitude, just like reading frames from a camera using OpenCV.
Is there any library available to achieve this?
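One possible approach (a sketch using the same scipy call, assuming a mono file named '1.wav'): read the file once, then walk through the sample array in one-second slices, since fs tells you how many samples make up a second:
from scipy.io.wavfile import read

fs, amplitude = read('1.wav')     # fs = samples per second
for start in range(0, len(amplitude), fs):
    second = amplitude[start:start + fs]
    print(second)                 # process one second at a time
For files too large to hold in memory, the soundfile library's blocks() function can yield fixed-size chunks lazily instead.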

How to receive a live stream from an IP camera using OpenCV?

I am building a project wherein I capture video from a webcam, USB camera, or URL and perform object detection on the video using the machine-learning TensorFlow API. Everything works fine if I take the input video from the webcam or an external USB camera, but when I take input from an IP camera using a URL, the code fails after running for 30-40 seconds.
My code looks like this:
import cv2

vid = cv2.VideoCapture("rtsp://x.xx.xx.xx:554")
while True:
    _, img = vid.read()
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    final_img = show_inference(detection_model, img)
    final_img = cv2.cvtColor(final_img, cv2.COLOR_RGB2BGR)
    cv2.imshow('frame', final_img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
vid.release()
cv2.destroyAllWindows()
This works fine when I execute it with the webcam or USB camera using the lines below:
cv2.VideoCapture(0) or cv2.VideoCapture(1)
But when I run it using the URL, it shows frames for 30-40 seconds and then fails with the error below:
OpenCV(4.4.0)\source\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
It appears to me that the OpenCV library fails to capture the live feed from the URL, and then the code fails.
Anyone any idea how to resolve this issue? Below are the versions and specifications I am using:
TensorFlow 2.0 on an i5 machine without GPU
Hikvision PTZ IP camera
Python version 3.7
OpenCV version 4.4
Check vid.isOpened(); if it isn't open, do not read.
Say rv, img = vid.read() and check that rv is True, otherwise break the loop (see the sketch below).
Are you throttling the reception of frames in any way? Does your inference step take much time?
Set the camera to a lower FPS value. The camera will produce frames at its own rate; it will not stop or slow down for you.
When you don't read frames, they queue up. They do not disappear. That will eventually cause a crash or other types of failure. You absolutely must consume frames as fast as they are made.
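Putting those checks together, a corrected loop might look like this sketch (show_inference and detection_model are the names from the question):
import cv2

vid = cv2.VideoCapture("rtsp://x.xx.xx.xx:554")
if not vid.isOpened():
    raise RuntimeError("could not open stream")

while True:
    rv, img = vid.read()
    if not rv:      # the stream dropped: stop instead of passing an empty frame on
        break
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    final_img = show_inference(detection_model, img)
    final_img = cv2.cvtColor(final_img, cv2.COLOR_RGB2BGR)
    cv2.imshow("frame", final_img)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

vid.release()
cv2.destroyAllWindows()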

librosa.load() takes too long to load (sample) mp3 files

I am trying to sample (convert analog to digital) mp3 files via the following Python code using the librosa library, but it takes too much time (around 4 seconds per file). I suspect this is because librosa doesn't support mp3 and hence uses the slower audioread to sample mp3s.
Code:
import time
import librosa

s = time.time()
for i in mp3_list[:10]:  # list of mp3 file paths, timing 10 files
    y, sr = librosa.load(i)
print('time taken =', time.time() - s)
time taken = 36.55561399459839
I also get this warning:
UserWarning: "PySoundFile failed. Trying audioread instead."
Obviously, this is too much time for any practical application. I want to know whether there are better alternatives to this.
For comparison, it took only around 1.2 seconds in total to sample 10 same-sized wav conversions.
So the warning kind of hints at it. The librosa developers addressed a similar question in this GitHub issue:
This warning will always occur when loading mp3 because libsndfile does not (yet/currently) support the mp3 format. Librosa tries to use libsndfile first, and if that fails, it will fall back on the audioread package, which is a bit slower and more brittle, but supports more formats.
This is confirmed in the librosa source code: try ... except RuntimeError ...
So what you can do in this case is either implement your own load() that uses audioread directly, avoiding the time wasted in the first block of librosa.load(), or use a different library such as pydub. Alternatively, you can use ffmpeg to convert your mp3s to wav before loading them.
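As an illustration, here is a minimal pydub sketch (assuming ffmpeg is installed and mp3_list is the same list of paths; the load_mp3 helper is hypothetical):
import numpy as np
from pydub import AudioSegment

def load_mp3(path, target_sr=22050):
    # decode via ffmpeg, downmix to mono, resample to librosa's default rate
    seg = AudioSegment.from_mp3(path).set_channels(1).set_frame_rate(target_sr)
    samples = np.array(seg.get_array_of_samples()).astype(np.float32)
    samples /= 1 << (8 * seg.sample_width - 1)   # scale integers to [-1.0, 1.0]
    return samples, seg.frame_rate

y, sr = load_mp3(mp3_list[0])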

How can I concatenate ATSC streams from DVB card?

I'm trying to make a simple "TV viewer" using a Linux DVB video capture card. Currently I watch TV using the following process (I'm on a Raspberry Pi):
Tune to a channel using azap -r TV_CHANNEL_HERE. This will supply bytes to
device /dev/dvb/adapter0/dvr0.
Open OMXPlayer omxplayer /dev/dvb/adapter0/dvr0
Watch TV!
The problem comes when I try to change channels. Even if I set the player to cache incoming bytes (tried with MPlayer also), the player can't withstand a channel change (by restarting azap with a new channel.
I'm thinking this is because of changes in the MPEG TS stream metadata.
Looking for a C library that would let me do the following:
Pull cache_size * mpeg_ts_packet_size from DVR device.
Evaluate each packet and rewrite metadata (PID, etc) as needed.
Populate FIFO with resulting packet.
Set {OMXPlayer,MPlayer} to read from FIFO.
The other thing I was thinking would be to use a program that converts MPEG TS into MPEG PS and concatenate the bytes that way.
Thoughts?
Indeed, when you tune to another channel, some metadata can change and invalidate previously cached data.
Unfortunately I'm not familiar with the tools you are using, but your point 2 makes me raise an eyebrow: you will waste your time trying to rewrite Transport Stream data.
I would rather suggest stopping and restarting the player process on zapping, since it seems to work fine at startup.
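A rough, untested Python sketch of that restart-on-zap idea (the process handling is a guess; adapt it to how azap behaves on your box):
import subprocess

azap = player = None

def zap(channel):
    global azap, player
    for proc in (player, azap):    # stop the old player and tuner first
        if proc is not None:
            proc.terminate()
            proc.wait()
    azap = subprocess.Popen(["azap", "-r", channel])
    player = subprocess.Popen(["omxplayer", "/dev/dvb/adapter0/dvr0"])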
P.S.:
Here are some tools that can help. Also, I'm not sure at which level your problem is, but VLC can be installed on a Raspberry Pi and it handles TS gracefully.

Prevent ALSA underruns with PyAudio

I wrote a little program which records voice from the microphone and sends it over the network, where it is played back. I'm using PyAudio for this task. It works almost fine, but on both computers I get errors from ALSA that an underrun occurred. I googled a lot about it, and now I know what an underrun is, but I still don't know how to fix the problem. Most of the time the sound is just fine, but it sounds a little strange when underruns occur. Is there anything I should take care of in my code? It feels like I'm making a simple mistake and missing it.
My system: Python 3.3, OS: Linux Mint Debian Edition UP7, PyAudio 0.2.7
Have you considered syncing the sound?
You didn't provide the code, so my guess is that you need a timer in a separate thread that executes, every CHUNK_SIZE/RATE seconds, code that looks like this:
silence = b"\x00" * chunk * channels * 2  # one chunk of 16-bit silence
out_stream = ...  # the output stream opened in PyAudio

def play(data):
    # if data has not arrived, play silence instead;
    # yes, we sacrifice a sound frame for output buffer consistency
    if not data:
        data = silence
    out_stream.write(data)
Assuming this code executes regularly, we will always supply some audio data to the output stream.
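The surrounding timer could be as simple as this sketch (CHUNK_SIZE and RATE are assumed constants, and get_packet is a hypothetical non-blocking receive that returns b'' when nothing has arrived):
import threading

CHUNK_SIZE = 1024             # frames per chunk (assumed)
RATE = 44100                  # sample rate (assumed)
interval = CHUNK_SIZE / RATE  # seconds of audio each chunk covers

def pump():
    play(get_packet())        # play() is the function defined above
    threading.Timer(interval, pump).start()   # schedule the next tick

pump()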
It's possible to prevent the underruns by filling in silence when needed. That looks like this:
# ...
data = s.recv(CHUNK * WIDTH)          # receive data from the peer
stream.write(data)                    # play the sound
free = stream.get_write_available()   # how much space is left in the buffer?
if free > CHUNK:                      # is there a lot of space in the buffer?
    tofill = free - CHUNK
    stream.write(SILENCE * tofill)    # fill it with silence
# ...
The solution for me was to buffer the first 10 packets/frames of the recorded sound. Look at the snippet below:
BUFFER = 10
while len(queue) < BUFFER:    # wait until enough frames have been buffered
    continue
while running:
    recorded_frame = queue.pop(0)
    audio.write(recorded_frame)
