Getting volume levels from PyAudio for use in Arduino

I want to send volume data from my laptop's audio input (just the built-in microphone in my MacBook) to an Arduino with as little lag as possible.
I see that it isn't hard to capture the audio input using PyAudio, but most of the examples for that module save the audio readings into a wav or other file format. Can I measure the volume directly as I read it into PyAudio, or do I need to save it to a file and analyze that file? I don't care about anything in the audio beyond the volume.
Much appreciated.

You can read the volume in real time. To do this, set up the recording but don't save the data; just process it. Here, I'll get the RMS value of each chunk using Python's included audioop module. (This example is just a modification of the record demo on the PyAudio webpage to include audioop.rms.)
import pyaudio
import audioop

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    rms = audioop.rms(data, 2)  # here's where you calculate the volume

stream.stop_stream()
stream.close()
p.terminate()
Of course, if you don't like RMS, audioop has other volume measures.
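If audioop isn't available (it was deprecated and then removed from the standard library in Python 3.13), the same measures are easy to compute with NumPy. A minimal sketch, assuming 16-bit little-endian samples like the stream above produces (the function names are mine):

```python
import numpy as np

def rms(chunk: bytes) -> float:
    """Root-mean-square volume of a buffer of 16-bit samples
    (equivalent to audioop.rms(chunk, 2))."""
    samples = np.frombuffer(chunk, dtype='<i2').astype(np.float64)
    return float(np.sqrt(np.mean(samples ** 2))) if samples.size else 0.0

def peak(chunk: bytes) -> int:
    """Peak absolute amplitude (equivalent to audioop.max(chunk, 2))."""
    samples = np.frombuffer(chunk, dtype='<i2').astype(np.int64)
    return int(np.abs(samples).max()) if samples.size else 0
```

Either function can be dropped into the loop above in place of the audioop.rms call.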

Related

How to get amplitude of an audio file in realtime in Python

I have the code below, which reads an audio file and calculates its amplitude:
from scipy.io.wavfile import read
fs, amplitude = read('1.wav')
print(amplitude)
Now I am trying to read the file in such a way that I can process the audio every second. At the moment it reads the whole audio file and then shows it, but I want to read, say, the first 10 seconds (or 1, 2, 3 seconds) and then print the amplitude, just like reading frames from a camera with OpenCV.
Is there any library available to achieve this?
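One way to do this needs only the standard-library wave module: open the file and pull a fixed number of frames per iteration. A sketch assuming 16-bit mono audio (the function name and one-second block size are my own choices):

```python
import wave

import numpy as np

def iter_blocks(path, seconds=1):
    """Yield the amplitude array for each `seconds`-long block of a wav
    file, so the audio can be processed block by block instead of all
    at once."""
    with wave.open(path, 'rb') as wf:
        frames_per_block = wf.getframerate() * seconds
        while True:
            raw = wf.readframes(frames_per_block)
            if not raw:
                break
            # assumes 16-bit mono; adjust dtype/reshape for other formats
            yield np.frombuffer(raw, dtype=np.int16)
```

Each yielded array is the per-second amplitude data, analogous to grabbing one frame at a time from a camera.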

How to split a long audio file (e.g. 1 hour) into multiple short (5 s) audio files using Python

I have some long audio files. I want to split each of them into multiple short audio files using Python. For example, an audio file longer than 1 hour should be split into multiple short 5 s files, and I want to extract features from each 5 s piece of the whole file.
There are two issues in your question:
Splitting the audio
Extracting features
and both of them hinge on the same underlying key piece of information: the sampling frequency.
The duration of an audio signal, in seconds, and the sampling frequency used for the audio file determine the number of samples the file has. An audio sample is (in simplified terms) one value of the audio signal on your hard disk or in your computer's memory.
The number of audio samples in a typical wav file is given by the formula sr * dur, where sr is the sampling frequency in Hz (e.g. 44100 for a CD-quality signal) and dur is the duration of the audio file in seconds. For example, a 2-second CD audio file has 44100 * 2 = 88200 samples.
So:
To split an audio file in Python, you first have to read it into a variable. There are plenty of libraries and functions out there, for example (in no particular order):
scipy.io.wavfile.read
wave module
and others. You can check this SO post for more info on reading a wav file.
Then you just have to take N samples at a time, e.g. my_audio_1 = whole_audio_file[0:5*sr].
BUT!!!
If you just want to extract features for every X seconds, then there is no need to split the audio manually. Most audio feature extraction libraries do that for you.
For example, in librosa you can control the number of FFT points, which roughly corresponds to the length of audio you want to extract features from. You can check, for example, here: https://librosa.org/doc/latest/feature.html
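The slicing step described above can be wrapped in a couple of lines. A sketch assuming the file has already been read into a 1-D NumPy array (the function name is mine):

```python
import numpy as np

def split_segments(audio, sr, seg_seconds=5):
    """Split an audio array into consecutive seg_seconds-long pieces;
    the last piece may be shorter than the rest."""
    seg = sr * seg_seconds  # samples per segment: sr * dur
    return [audio[i:i + seg] for i in range(0, len(audio), seg)]
```

Each returned piece can then be written back out as its own file or fed straight to a feature extractor.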

Video size optimization

I'm working on a task that optimizes a video's size before uploading it to the server in a web application.
So I want my model to automatically optimize the size of each input video.
I have tried several approaches, such as FFmpeg:
I used libx265, h264 and lib265 as codecs; with some videos it increases the video size, with others it shrinks it by only a small ratio, and it takes very long to generate the output file.
For example, with a video of size 8 MB:
input = {input_name: None}
output = {output_name: '-n preset faster -vcodec libx265 -crf 28'}
The output file is 10 MB.
I also tried OpenCV:
But the output videos aren't written; the file appears with 0 KB size.
For example, with an input video of resolution 1280×544,
I want to downscale it:
cap = cv2.VideoCapture(file_name)
cap.set(3, 640)
cap.set(4, 480)
codec = cv2.VideoWriter_fourcc(*'XDIV')
out = cv2.VideoWriter(output_file, codec, 28.0, (640, 480))
while cap.isOpened():
    bol, frame = cap.read()
    out.write(frame)
    cv2.imshow('video', frame)
I've become a little confused: what parameters of the input and output videos should I consider in order to optimize them and make a significant size change for each specific video? Is it only the codec, width and height?
What is the most effective approach to do this?
Should I build a predictive model for estimating the suitable output video parameters, or is there a method that auto-adjusts?
If there's an illustrative example, please share it.

Basic pitch shift using Stream methods in sounddevice module for Python?

I really do not understand the correct format or code structure for implementing the sounddevice Stream methods. I want to create a basic buffer that writes my array data to be read in a callback in near real time. I want to be able to change the frequency of the sound wave via a threaded queue that is integrated with the stream. I am trying to understand the basic API and how input-to-output works with streaming via sounddevice.
https://python-sounddevice.readthedocs.io/en/0.3.12/api.html
My lack of understanding of this API has me at a brick wall, not knowing where to start. This is just for learning sound manipulation and applying effects to continuous sound without any audible cutoffs, kind of like a theremin.
So after heavy API reading and some EuroSciPy videos, I figured out the correct format for the sounddevice (PortAudio binding) stream method. I also used some basic knowledge of threads and queues to create a rudimentary pitch shifter that is almost real time. The pitch shifter will need to be changed and controlled with a knob, and buffer speeds will need to improve before it can be considered real time. Hope this helps out anyone wanting to jump into manipulating sound without all the hassle!
import numpy as np
import sounddevice as sd
from scipy import signal
from queue import Queue
from threading import Thread

RATE = 44100
CHUNK = 1024

def waveform(q):
    with sd.Stream(samplerate=RATE, blocksize=CHUNK, dtype='int32',
                   latency='low', callback=None) as s:
        sps = 44100
        wave = signal.square
        t = .3
        atten = .015
        while True:
            freq = q.get()
            waveform = wave(2*np.pi*(np.arange(t*sps))*freq/sps)
            waveform_quiet = waveform * atten
            wave_int = waveform_quiet * 2147483647
            s.write(np.ascontiguousarray(wave_int, np.int32))

q = Queue()
i = 440  # starting frequency in Hz; placeholder value, pick any base pitch
q.put(i)
p = Thread(target=waveform, args=(q,))
p.daemon = True
p.start()

# pitch shifter, increments of 10 Hz
while True:
    i += 10
    q.put(i)
    print('Queues being stored')
    print(i)
    if i > 880:
        print('Queues Stored')
        break
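One detail worth isolating from the snippet above is the scaling: an int32 stream expects samples in the full ±2**31 range, so the unit-amplitude square wave is first attenuated and then multiplied up to full scale. A sketch of just that step, with parameter defaults copied from the snippet (the function name is mine):

```python
import numpy as np
from scipy import signal

def square_chunk(freq, sps=44100, t=0.3, atten=0.015):
    """One t-second chunk of an attenuated square wave at `freq` Hz,
    scaled to full scale for an int32 sounddevice stream."""
    wave = signal.square(2 * np.pi * np.arange(t * sps) * freq / sps)
    quiet = wave * atten                           # keep the volume sane
    return np.ascontiguousarray(quiet * 2147483647, dtype=np.int32)
```

The returned array can be passed straight to Stream.write on a dtype='int32' stream.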

How to process audio stream in realtime

I have a setup with a Raspberry Pi 3 running the latest Jessie with all updates installed, on which I provide an A2DP Bluetooth sink that I connect to with a phone to play some music.
Via PulseAudio, the source (phone) is routed to the ALSA output (sink). This works reasonably well.
I now want to analyze the audio stream using Python 3.4 with librosa, and I found a promising example using PyAudio, which I adjusted to use the PulseAudio input (which magically works because it's the default) instead of a wav file:
"""PyAudio Example: Play a wave file (callback version)."""
import pyaudio
import wave
import time
import sys
import numpy
# instantiate PyAudio (1)
p = pyaudio.PyAudio()
# define callback (2)
def callback(in_data, frame_count, time_info, status):
# convert data to array
data = numpy.fromstring(data, dtype=numpy.float32)
# process data array using librosa
# ...
return (None, pyaudio.paContinue)
# open stream using callback (3)
stream = p.open(format=p.paFloat32,
channels=1,
rate=44100,
input=True,
output=False,
frames_per_buffer=int(44100*10),
stream_callback=callback)
# start the stream (4)
stream.start_stream()
# wait for stream to finish (5)
while stream.is_active():
time.sleep(0.1)
# stop stream (6)
stream.stop_stream()
stream.close()
wf.close()
# close PyAudio (7)
p.terminate()
Now, while the data flow works in principle, there is a delay (the length of the buffer) before the stream_callback gets called. Since the docs state
Note that PyAudio calls the callback function in a separate thread.
I would have assumed that while the callback is being worked on, the buffer keeps filling in the main thread. Of course, there would be an initial delay to fill the buffer; afterwards, I expected to get a synchronous flow.
I need a longer portion in the buffer (see frames_per_buffer) for librosa to be able to perform its analysis correctly.
How is something like this possible? Is it a limitation of the software ports for the Raspberry Pi's ARM?
I found other answers, but they use blocking I/O. How would I wrap this in a thread so that the librosa analysis (which might take some time) does not block the buffer filling?
This blog seems to fight performance issues with Cython, but I don't think the delay is a performance issue. Or might it be? Others seem to need some ALSA tweaks, but would that help while using PulseAudio?
Thanks, any input appreciated!
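Regarding that last point, a common pattern is to keep the callback (or blocking read loop) trivial and hand each buffer to a worker thread over a queue, so slow analysis never blocks the buffer filling. A minimal sketch of that decoupling, with the audio source left out and a `handle` function standing in for the librosa analysis (both names are mine):

```python
import queue
import threading

def start_analyzer(q, handle):
    """Consume chunks from `q` on a worker thread and run `handle` on
    each; putting None on the queue shuts the worker down."""
    def run():
        while True:
            chunk = q.get()
            if chunk is None:   # sentinel: stop
                break
            handle(chunk)       # e.g. librosa feature extraction
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t

# The PyAudio callback then only needs to enqueue and return immediately:
# def callback(in_data, frame_count, time_info, status):
#     q.put(in_data)
#     return (None, pyaudio.paContinue)
```

Because q.put in the callback is effectively instantaneous, the audio thread is never held up by the analysis, however long it takes.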
