I have successfully written code that records a few seconds of audio and saves it in the selected directory, in Python 2.7 using PyAudio, like so:
import pyaudio
import wave
import sys
chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "%d_%d.wav" % (self.get('subject_nr'), self.get('count_inline_script'))
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=chunk)
# read RECORD_SECONDS worth of chunks, then write them out as a WAV file
frames = []
for _ in range(int(RATE / chunk * RECORD_SECONDS)):
    frames.append(stream.read(chunk))
stream.stop_stream()
stream.close()
p.terminate()
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(''.join(frames))
wf.close()
Now, I only recently started using Python 3.2, and I am wondering if there is a way to record sound like in the older version?
If you are on Windows and your script only uses wave and PyAudio, it is perfectly possible to run it with Python 3 (py3k).
wave is a module in the standard library, and Windows binary installers for PyAudio can be obtained from here
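For what it's worth, the main change the snippet above typically needs under Python 3 is in how the recorded frames are joined before writing the WAV file, since stream.read() returns bytes rather than str there. A minimal sketch of the record-and-save tail under Python 3, reusing the names from the snippet above:
frames = []
for _ in range(int(RATE / chunk * RECORD_SECONDS)):
    frames.append(stream.read(chunk))  # bytes objects in Python 3
stream.stop_stream()
stream.close()
p.terminate()
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))       # join as bytes, not str
wf.close()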
I'm working with the libraries librosa, opensmile and essentia to extract features from audio. It works, but the process is extremely time consuming, which is making it impossible for me to continue with my project.
Basically I have 4303 wav files of 30 seconds each. As an environment, I have been using the free version of Colab. I know that it is possible to use a GPU in this environment or to develop some kind of multithreading, but I still don't have much experience in these matters.
Therefore, I would like to know if there is any way to optimize my solution, considering that the current run goes beyond 12 hours of processing and never finishes, because the session crashes.
The code used is below:
!pip install opensmile
!pip install sox
!pip install essentia
!pip install librosa
import pandas as pd
import numpy as np
import os
import re
import opensmile
import essentia
import essentia.standard as es
import librosa
path = '/content/drive/MyDrive/vocal'
files = os.listdir(path)
files.sort(key=lambda f: int(re.sub(r'\D', '', f)))
voiceFeatures = []
for f in files:
    # openSMILE eGeMAPS functionals for the current file
    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.eGeMAPSv02
    )
    y = smile.process_file(path + '/' + f)
    # Essentia GFCCs
    audio = es.MonoLoader(filename=path + '/' + f)()
    run_gfcc = es.GFCC(numberCoefficients=12)
    gfccs = run_gfcc(audio)
    # librosa load for the MFCCs
    y_, sr = librosa.load(path + '/' + f)
    f1Mean = y.F1frequency_sma3nz_amean
    f1STD = y.F1frequency_sma3nz_stddevNorm
    f1BandMean = y.F1bandwidth_sma3nz_amean
    f1BandSTD = y.F1bandwidth_sma3nz_stddevNorm
    f2Mean = y.F2frequency_sma3nz_amean
    f2STD = y.F2frequency_sma3nz_stddevNorm
    f2BandMean = y.F2bandwidth_sma3nz_amean
    f2BandSTD = y.F2bandwidth_sma3nz_stddevNorm
    f3Mean = y.F3frequency_sma3nz_amean
    f3STD = y.F3frequency_sma3nz_stddevNorm
    f3BandMean = y.F3bandwidth_sma3nz_amean
    f3BandSTD = y.F3bandwidth_sma3nz_stddevNorm
    voicedMean = y.MeanVoicedSegmentLengthSec
    voicedSTD = y.StddevVoicedSegmentLengthSec
    unvoicedMean = y.MeanUnvoicedSegmentLength
    unvoicedSTD = y.StddevUnvoicedSegmentLength
    f0Mean = y['F0semitoneFrom27.5Hz_sma3nz_amean']
    f0STD = y['F0semitoneFrom27.5Hz_sma3nz_stddevNorm']
    hnrMean = y.HNRdBACF_sma3nz_amean
    hnrSTD = y.HNRdBACF_sma3nz_stddevNorm
    jitterMean = y.jitterLocal_sma3nz_amean
    jitterSTD = y.jitterLocal_sma3nz_stddevNorm
    shimmerMean = y.shimmerLocaldB_sma3nz_amean
    shimmerSTD = y.shimmerLocaldB_sma3nz_stddevNorm
    gfccsMean = np.mean(gfccs[1])
    gfccsSTD = np.std(gfccs[1])
    mfcc = librosa.feature.mfcc(y=y_, sr=sr, n_mfcc=16)  # mfcc - 16
    features = {
        "title": f,
        "f1Mean": f1Mean[0],
        "f1STD": f1STD[0],
        "f1BandMean": f1BandMean[0],
        "f1BandSTD": f1BandSTD[0],
        "f2Mean": f2Mean[0],
        "f2STD": f2STD[0],
        "f2BandMean": f2BandMean[0],
        "f2BandSTD": f2BandSTD[0],
        "f3Mean": f3Mean[0],
        "f3STD": f3STD[0],
        "f3BandMean": f3BandMean[0],
        "f3BandSTD": f3BandSTD[0],
        "voicedMean": voicedMean[0],
        "voicedSTD": voicedSTD[0],
        "unvoicedMean": unvoicedMean[0],
        "unvoicedSTD": unvoicedSTD[0],
        "f0Mean": f0Mean[0],
        "f0STD": f0STD[0],
        "hnrMean": hnrMean[0],
        "hnrSTD": hnrSTD[0],
        "jitterMean": jitterMean[0],
        "jitterSTD": jitterSTD[0],
        "shimmerMean": shimmerMean[0],
        "shimmerSTD": shimmerSTD[0],
        "gfccsMean": gfccsMean,
        "gfccsSTD": gfccsSTD,
        "mfcc1Mean": np.mean(mfcc[0]),
        "mfcc1STD": np.std(mfcc[0]),
        "mfcc2Mean": np.mean(mfcc[1]),
        "mfcc2STD": np.std(mfcc[1]),
        "mfcc3Mean": np.mean(mfcc[2]),
        "mfcc3STD": np.std(mfcc[2]),
        "mfcc4Mean": np.mean(mfcc[3]),
        "mfcc4STD": np.std(mfcc[3]),
        "mfcc5Mean": np.mean(mfcc[4]),
        "mfcc5STD": np.std(mfcc[4]),
        "mfcc6Mean": np.mean(mfcc[5]),
        "mfcc6STD": np.std(mfcc[5]),
        "mfcc7Mean": np.mean(mfcc[6]),
        "mfcc7STD": np.std(mfcc[6]),
        "mfcc8Mean": np.mean(mfcc[7]),
        "mfcc8STD": np.std(mfcc[7]),
        "mfcc9Mean": np.mean(mfcc[8]),
        "mfcc9STD": np.std(mfcc[8]),
        "mfcc10Mean": np.mean(mfcc[9]),
        "mfcc10STD": np.std(mfcc[9]),
        "mfcc11Mean": np.mean(mfcc[10]),
        "mfcc11STD": np.std(mfcc[10]),
        "mfcc12Mean": np.mean(mfcc[11]),
        "mfcc12STD": np.std(mfcc[11]),
        "mfcc13Mean": np.mean(mfcc[12]),
        "mfcc13STD": np.std(mfcc[12]),
        "mfcc14Mean": np.mean(mfcc[13]),
        "mfcc14STD": np.std(mfcc[13]),
        "mfcc15Mean": np.mean(mfcc[14]),
        "mfcc15STD": np.std(mfcc[14]),
        "mfcc16Mean": np.mean(mfcc[15]),
        "mfcc16STD": np.std(mfcc[15]),
    }
    voiceFeatures.append(features)
df = pd.json_normalize(voiceFeatures)
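Since the question mentions multithreading as a possible direction, a minimal sketch of how the per-file work could be spread across CPU cores with the standard library's concurrent.futures (extract_features is a hypothetical helper that would contain the per-file body of the loop above, and only its openSMILE part is shown here; path and files are reused from above; building the Smile object once instead of once per file also avoids repeated setup):
import concurrent.futures
import opensmile
import pandas as pd

smile = opensmile.Smile(feature_set=opensmile.FeatureSet.eGeMAPSv02)  # build once, reuse

def extract_features(f):
    # hypothetical wrapper: move the per-file body of the loop above in here
    y = smile.process_file(path + '/' + f)
    return {
        "title": f,
        "f1Mean": y.F1frequency_sma3nz_amean[0],
        # ... remaining features exactly as in the loop above
    }

with concurrent.futures.ProcessPoolExecutor() as executor:
    voiceFeatures = list(executor.map(extract_features, files))
df = pd.json_normalize(voiceFeatures)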
I have designed a method for the case where a product is not found during barcode scanning; I put this code in that product-not-found handler.
import math
import pyaudio

@api.multi
def _product_sound(self):
    # generate a short sine tone and play it with PyAudio
    PyAudio = pyaudio.PyAudio
    bitrate = 8000
    frq = 500
    LENGTH = 2
    if frq > bitrate:
        bitrate = frq + 100
    numberofframe = int(bitrate * LENGTH)
    restframe = numberofframe % bitrate
    wave = ''
    for x in range(numberofframe):
        wave = wave + chr(int(math.sin(x / ((bitrate / frq) / math.pi)) * 124 + 128))
    for x in range(restframe):
        wave = wave + chr(128)
    p = PyAudio()
    stream = p.open(format=p.get_format_from_width(1), channels=1, rate=bitrate, output=True)
    stream.write(wave)
    stream.stop_stream()
    stream.close()
    p.terminate()
When I try this code on a single system, it works perfectly, but when I use it from a different device, no sound is generated.
So how can I play a sound in Odoo on a different system as well as on the current system?
I'm trying to make the TensorFlow MFCC give me the same results as librosa's MFCC.
I have tried to match all the default parameters that librosa uses
in my TensorFlow code and still got a different result.
This is the TensorFlow code that I used:
sample_rate = 16000
waveform = contrib_audio.decode_wav(
    audio_binary,
    desired_channels=1,
    desired_samples=sample_rate,
    name='decoded_sample_data')
transwav = tf.transpose(waveform[0])
stfts = tf.contrib.signal.stft(transwav,
                               frame_length=2048,
                               frame_step=512,
                               fft_length=2048,
                               window_fn=functools.partial(tf.contrib.signal.hann_window,
                                                           periodic=False),
                               pad_end=True)
spectrograms = tf.abs(stfts)
num_spectrogram_bins = stfts.shape[-1].value
lower_edge_hertz, upper_edge_hertz, num_mel_bins = 0.0, 8000.0, 128
linear_to_mel_weight_matrix = tf.contrib.signal.linear_to_mel_weight_matrix(
    num_mel_bins, num_spectrogram_bins, sample_rate, lower_edge_hertz,
    upper_edge_hertz)
mel_spectrograms = tf.tensordot(
    spectrograms,
    linear_to_mel_weight_matrix, 1)
mel_spectrograms.set_shape(spectrograms.shape[:-1].concatenate(
    linear_to_mel_weight_matrix.shape[-1:]))
log_mel_spectrograms = tf.log(mel_spectrograms + 1e-6)
mfccs = tf.contrib.signal.mfccs_from_log_mel_spectrograms(
    log_mel_spectrograms)[..., :20]
The equivalent in librosa:
libr_mfcc = librosa.feature.mfcc(wav, 16000)
The following are the graphs of the results:
I'm the author of tf.signal. Sorry for not seeing this post sooner, but you can get librosa and tf.signal.stft to match if you center-pad the signal before passing it to tf.signal.stft. See this GitHub issue for more details.
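As a rough illustration of the center-padding idea (not code from that GitHub issue, just a sketch assuming the frame_length of 2048 from the question; librosa's default pad_mode in the versions contemporary with this code is 'reflect'):
frame_length = 2048
pad = frame_length // 2
# librosa with center=True pads the signal by n_fft // 2 on both sides before framing
padded = tf.pad(transwav, [[0, 0], [pad, pad]], mode='REFLECT')
stfts = tf.contrib.signal.stft(padded,
                               frame_length=frame_length,
                               frame_step=512,
                               fft_length=frame_length)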
I spent a whole day trying to make them match. Even rryan's solution didn't work for me (center=False in librosa), but I finally found out that TF and librosa STFTs match only for the case win_length == n_fft in librosa and frame_length == fft_length in TF. That's why rryan's colab example works, but if you set frame_length != fft_length, the amplitudes are very different (although visually, after plotting, the patterns look similar). A typical example: if you choose some win_length/frame_length and then want to set n_fft/fft_length to the smallest power of 2 greater than win_length/frame_length, the results will be different. So you need to stick with the inefficient FFT given by your window size. I don't know why it is so, but that's how it is; hopefully it will be helpful for someone.
The output of contrib_audio.decode_wav should be a DecodeWav with { audio, sample_rate }, and the audio shape is (sample_rate, 1), so what is the purpose of taking the first item of waveform and transposing it?
transwav = tf.transpose(waveform[0])
There is no straightforward way, since librosa's stft uses center=True, which does not comply with TF's stft.
Had it been center=False, the TF and librosa STFTs would give near enough results; see the colab snippet.
But even so, trying to port the librosa code into TF is a big headache. Here is what I started and gave up on. Near, but not near enough.
def pow2db_tf(X):
    # librosa-style power_to_db implemented with TF ops
    amin = 1e-10
    top_db = 80.0
    ref_value = 1.0
    log10 = 2.302585092994046
    log_spec = (10.0 / log10) * tf.log(tf.maximum(amin, X))
    log_spec -= (10.0 / log10) * tf.log(tf.maximum(amin, ref_value))
    pow2db = tf.maximum(log_spec, tf.reduce_max(log_spec) - top_db)
    return pow2db

def librosa_feature_like_tf(x, sr=16000, n_fft=2048, hop_length=512, n_mfcc=20):
    # hop_length added as a parameter; it was used but never defined in the original snippet
    mel_basis = librosa.filters.mel(sr, n_fft).astype(np.float32)
    mel_basis = mel_basis.reshape(1, int(n_fft / 2 + 1), -1)
    tf_stft = tf.contrib.signal.stft(x, frame_length=n_fft, frame_step=hop_length, fft_length=n_fft)
    print("tf_stft", tf_stft.shape)
    tf_S = tf.matmul(tf.abs(tf_stft), mel_basis)
    print("tf_S", tf_S.shape)
    tfdct = tf.spectral.dct(pow2db_tf(tf_S), norm='ortho')
    print("tfdct before cut", tfdct.shape)
    tfdct = tfdct[:, :, :n_mfcc]
    print("tfdct after cut", tfdct.shape)
    #tfdct = tf.transpose(tfdct, [0, 2, 1]); print("tfdct after transpose", tfdct.shape)
    return tfdct

x = tf.placeholder(tf.float32, shape=[None, 16000], name='x')
tf_feature = librosa_feature_like_tf(x)
print("tf_feature", tf_feature.shape)
mfcc_rosa = librosa.feature.mfcc(wav, sr).T
print("mfcc_rosa", mfcc_rosa.shape)
For anyone still looking for this: I had a similar problem some time ago, matching librosa's mel filterbanks/mel spectrogram to a TensorFlow implementation. The solution was to use a different windowing approach for the spectrogram and to use librosa's mel matrix as a constant tensor. See here and here.
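A minimal sketch of the "librosa mel matrix as a constant tensor" part (not the code behind those links, just an assumed illustration; stfts stands for the output of tf.contrib.signal.stft as in the snippets above, and sr/n_fft match the values used there):
mel_np = librosa.filters.mel(16000, 2048).astype(np.float32)  # shape (n_mels, n_fft // 2 + 1)
mel_const = tf.constant(mel_np.T)                             # shape (n_fft // 2 + 1, n_mels)
mel_spectrogram = tf.matmul(tf.abs(stfts), mel_const)         # project the magnitude STFT onto the mel bands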
I'm completely new to QtMultimedia. At the moment, I am trying to get the audio stream from the microphone in my webcam for further processing. Right now I just try to continuously show the volume level of the sound "heard" by the mic with a slider. So I googled some code together (I found nearly ten tons of examples of how to play audio, but only a few blocks of C++ code about audio input) and got stuck.
This is my actual code:
import sys, time
from PyQt4 import Qt, QtGui, QtCore, QtMultimedia
class VolumeSlider(QtGui.QSlider):
    def __init__(self, parent=None):
        super(VolumeSlider, self).__init__(parent)
        self.audio = None
        self.volumeSlider = QtGui.QSlider(QtCore.Qt.Horizontal)
        self.volumeSlider.setTickInterval(1)
        self.volumeSlider.setMaximum(100)
        self.volumeSlider.setValue(49)
        self.volumeSlider.show()
        self.openMicStream()
        # THIS IS WHAT I WANT - DOESN'T WORK
        while True:
            self.volumeSlider.setValue(self.audio.volume())
            time.sleep(0.02)

    def openMicStream(self):
        #audioInputDevices = QtMultimedia.QAudioDeviceInfo.availableDevices(QtMultimedia.QAudio.AudioInput)
        #for d in audioInputDevices: d.deviceName()
        info = QtMultimedia.QAudioDeviceInfo(QtMultimedia.QAudioDeviceInfo.defaultInputDevice())
        print "Default audio input device:", info.deviceName()
        audioFormat = QtMultimedia.QAudioFormat()
        audioFormat.setFrequency(8000)
        audioFormat.setChannels(1)
        audioFormat.setSampleSize(8)
        audioFormat.setCodec("audio/pcm")
        audioFormat.setByteOrder(QtMultimedia.QAudioFormat.LittleEndian)
        audioFormat.setSampleType(QtMultimedia.QAudioFormat.UnSignedInt)
        audioDeviceInfo = QtMultimedia.QAudioDeviceInfo.defaultInputDevice()
        if not audioDeviceInfo.isFormatSupported(audioFormat):
            sys.stderr.write("default audioFormat not supported, trying to use nearest\n")
            audioFormat = audioDeviceInfo.nearestFormat(audioFormat)
        self.audioInput = QtMultimedia.QAudioInput(audioFormat)
        fmtSupported = info.isFormatSupported(audioFormat)
        print "Is the selected format supported?", fmtSupported
        if not fmtSupported:
            audioFormat = info.nearestFormat(audioFormat)
            print "Is the nearest format supported?", info.isFormatSupported(audioFormat)
        self.audio = QtMultimedia.QAudioInput(audioFormat, None)
        self.audio.start()

if __name__ == "__main__":
    app = Qt.QApplication(sys.argv)
    x = VolumeSlider()
    sys.exit(app.exec_())
Could anybody poke me in the head about what I have to do at the "# THIS IS WHAT I WANT" place to calculate and show the current volume level?
There is no built-in function for computing the current volume level of the input signal when recording with QAudioInput, either in Qt 4 (QAudioInput documentation) or in Qt 5.
But you can calculate it yourself. The root mean square over a moving window of the signal is often used as a measure of current loudness. See How can I determine how loud a WAV file will sound? for more suggestions.
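As a rough sketch of that idea (assuming 16-bit samples and that you keep the QIODevice returned by QAudioInput.start(); audioop is from the Python standard library, and current_volume is just a hypothetical helper, not a Qt API):
import audioop

def current_volume(device, sample_width=2):
    # device is the QIODevice returned by QAudioInput.start() in pull mode
    data = device.readAll().data()          # raw PCM bytes buffered since the last read
    if not data:
        return 0
    return audioop.rms(data, sample_width)  # RMS amplitude, 0..32767 for 16-bit audio

# e.g. polled from a QTimer instead of the blocking while-loop:
# self.volumeSlider.setValue(current_volume(self.micDevice) * 100 // 32768)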
Solved it after a while of working on other parts. Now I can at least hear the sound out of the speakers, after I changed openMicStream(self) to this:
def openMicStream(self):
    info = QAudioDeviceInfo(QAudioDeviceInfo.defaultInputDevice())
    print "Default audio input device: ", info.deviceName()
    audioFormat = QAudioFormat()
    audioFormat.setFrequency(44100)
    audioFormat.setChannels(1)
    audioFormat.setSampleSize(16)
    audioFormat.setCodec("audio/pcm")
    audioFormat.setByteOrder(QAudioFormat.LittleEndian)
    audioFormat.setSampleType(QAudioFormat.UnSignedInt)
    audioDeviceInfo = QAudioDeviceInfo.defaultInputDevice()
    if not audioDeviceInfo.isFormatSupported(audioFormat):
        messages.error(__name__, "default audioFormat not supported, trying to use nearest")
        audioFormat = audioDeviceInfo.nearestFormat(audioFormat)
    print audioFormat.frequency()
    print audioFormat.channels()
    print audioFormat.sampleSize()
    print audioFormat.codec()
    print audioFormat.byteOrder()
    print audioFormat.sampleType()
    self.audioInput = QAudioInput(audioFormat)
    audioFmtSupported = info.isFormatSupported(audioFormat)
    messages.info(__name__, "Is the selected format supported? " + str(audioFmtSupported))
    if not audioFmtSupported:
        audioFormat = info.nearestFormat(audioFormat)
        messages.info(__name__, "Is the nearest format supported? " + str(info.isFormatSupported(audioFormat)))
    self.audioInput = QAudioInput(audioFormat, None)
    self.audioOutput = QAudioOutput(audioFormat, None)
    device = self.audioOutput.start()
    self.audioInput.start(device)
I'd like to query my audio device and get all its available sample rates. I'm using PyAudio 0.2, which runs on top of PortAudio v19, on an Ubuntu machine with Python 2.6.
In the pyaudio distribution, test/system_info.py shows how to determine supported sample rates for devices. See the section that starts at line 49.
In short, you use the PyAudio.is_format_supported method, e.g.
devinfo = p.get_device_info_by_index(1)  # Or whatever device you care about.
if p.is_format_supported(44100.0,  # Sample rate
                         input_device=devinfo['index'],
                         input_channels=devinfo['maxInputChannels'],
                         input_format=pyaudio.paInt16):
    print 'Yay!'
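To get the full list of rates a device accepts, the same call can simply be probed in a loop; a small sketch (the candidate rates below are just examples, and is_format_supported raises ValueError for unsupported combinations):
supported = []
for rate in (8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000):
    try:
        if p.is_format_supported(rate,
                                 input_device=devinfo['index'],
                                 input_channels=devinfo['maxInputChannels'],
                                 input_format=pyaudio.paInt16):
            supported.append(rate)
    except ValueError:
        pass
print supported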
With the sounddevice module, you can do it like that:
import sounddevice as sd
samplerates = 32000, 44100, 48000, 96000, 128000
device = 0
supported_samplerates = []
for fs in samplerates:
try:
sd.check_output_settings(device=device, samplerate=fs)
except Exception as e:
print(fs, e)
else:
supported_samplerates.append(fs)
print(supported_samplerates)
When I tried this, I got:
32000 Invalid sample rate
128000 Invalid sample rate
[44100, 48000, 96000]
You can also check if a certain number of channels or a certain data type is supported.
For more details, check the documentation: check_output_settings().
You can of course also check if a device is a supported input device with check_input_settings().
If you don't know the device ID, have a look at query_devices().
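For example (a small sketch; the device index, channel count, and dtype below are arbitrary):
import sounddevice as sd

print(sd.query_devices())          # overview of all devices with their indices
try:
    sd.check_input_settings(device=0, channels=2, dtype='int16', samplerate=44100)
    print('2-channel int16 input at 44.1 kHz is supported')
except Exception as e:
    print('Not supported:', e)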
I don't think that's still relevant, but this also works with Python 2.6, you just have to remove the parentheses from the print statements and replace except Exception as e: with except Exception, e:.
Using PortAudio directly, you can run the code below:
for (int i = 0, end = Pa_GetDeviceCount(); i != end; ++i) {
PaDeviceInfo const* info = Pa_GetDeviceInfo(i);
if (!info) continue;
printf("%d: %s\n", i, info->name);
}
Thanks to another thread