Basic pitch shift using Stream methods in sounddevice module for Python?

I really do not understand the correct format or code structure for implementing the sounddevice Stream methods. I want to create a basic buffer that writes my array data to be read in a callback, almost in real time. I want to be able to change the frequency of the sound wave via a threaded queue that is integrated with the stream. I am trying to understand the basic API and how input to output works when streaming with sounddevice.
https://python-sounddevice.readthedocs.io/en/0.3.12/api.html
My lack of understanding of this API has me at a brick wall, not knowing where to start. This is just for learning sound manipulation and applying effects to continuous sound without any audible cutoffs, kind of like a theremin.

So after heavy API reading and some EuroSciPy videos I figured out the correct format for the sounddevice (a PortAudio binding) Stream method. I also used some basic knowledge of threads and queues to create a rudimentary pitch shifter that is almost real-time. The pitch shifter would need to be driven by a knob, and buffer throughput would need to improve before this could be considered truly real-time. Hope this helps anyone wanting to jump into manipulating sound without all the hassle!
import numpy as np
import sounddevice as sd
from scipy import signal
from queue import Queue
from threading import Thread

RATE = 44100   # sample rate in Hz (assumed; not given in the original post)
CHUNK = 1024   # block size in frames (assumed; not given in the original post)

def waveform(q):
    # duplex stream, used here for its output side only
    with sd.Stream(samplerate=RATE, blocksize=CHUNK, dtype='int32',
                   latency='low', callback=None) as s:
        sps = RATE     # samples per second
        wave = signal.square
        t = .3         # seconds of audio generated per queue item
        atten = .015   # attenuate so the square wave isn't deafening
        while True:
            freq = q.get()   # blocks until the main thread sends a new frequency
            waveform = wave(2 * np.pi * np.arange(t * sps) * freq / sps)
            waveform_quiet = waveform * atten
            wave_int = waveform_quiet * 2147483647   # scale to the int32 range
            s.write(np.ascontiguousarray(wave_int, np.int32))

i = 440   # starting frequency in Hz (assumed; the original left i undefined)
q = Queue()
q.put(i)
p = Thread(target=waveform, args=(q,))
p.daemon = True
p.start()

# pitch shifter, increments of 10 Hz
while True:
    i += 10
    q.put(i)
    print('Queues being stored')
    print(i)
    if i > 880:
        print('Queues Stored')
        break
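For smoother, gap-free output, the callback form of the API is the usual next step. A minimal sketch of that approach (my own addition, not from the original answer), assuming a float32 output stream and a module-level freq that a knob or control thread could update:

import numpy as np
import sounddevice as sd

RATE = 44100
freq = 440.0   # update from another thread (e.g. a knob) to shift pitch
phase = 0.0

def callback(outdata, frames, time, status):
    # generate the next block on demand, tracking phase in radians so the
    # waveform stays continuous even when freq changes between blocks
    global phase
    inc = 2 * np.pi * freq / RATE
    outdata[:, 0] = 0.1 * np.sin(phase + inc * np.arange(frames))
    phase = (phase + inc * frames) % (2 * np.pi)

with sd.OutputStream(samplerate=RATE, channels=1, dtype='float32',
                     callback=callback):
    sd.sleep(3000)   # keep the stream alive for three seconds

Because the callback generates samples only when PortAudio asks for them, there is no queue backlog, and frequency changes take effect at the next block boundary.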

Related

ESP8266 analogRead() microphone Input into playable audio

My goal is to record audio using an electret microphone hooked into the analog pin of an ESP8266 (12E) and then be able to play this audio on another device. My circuit is:
In order to check the output of the microphone I connected the circuit to the oscilloscope and got this:
In the "gif" above you can see the waves made by my voice when talking into the microphone.
here is my code on esp8266:
void loop() {
  sensorValue = analogRead(sensorPin);
  Serial.print(sensorValue);
  Serial.print(" ");
}
I would like to play the audio in the Audacity software in order to get an understanding of the result. Therefore, I copied the numbers from the serial monitor and pasted them into the Python code below, which maps the data onto the (-1, 1) interval:
def mapPoint(value, currentMin, currentMax, targetMin, targetMax):
    currentInterval = currentMax - currentMin
    targetInterval = targetMax - targetMin
    valueScaled = float(value - currentMin) / float(currentInterval)
    return round(targetMin + (valueScaled * targetInterval), 5)

class mapper():
    def __init__(self, raws):
        self.raws = raws.split(" ")
        self.raws = [float(i) for i in self.raws]

    def mapAll(self):
        self.mappeds = [mapPoint(i, min(self.raws), max(self.raws), -1, 1) for i in self.raws]
        self.strmappeds = str(self.mappeds).replace(",", "").replace("]", "").replace("[", "")
        return self.strmappeds
This takes the string of numbers, maps each one onto the target interval (-1, +1), and returns a space-separated string of data ready to import into the Audacity software (Tools > Sample Data Import, then select the text file containing the data). The result of importing almost 5 seconds of voice:
which is about half a second long, and when I play it I hear unintelligible noise. I also tried lower frequencies, but there was only noise there, too.
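For reference, a minimal sketch of how the class above can be driven end to end (the file names are hypothetical):

# hypothetical driver: read the raw ADC numbers captured from the serial
# monitor and write the mapped values for Audacity's Sample Data Import
with open("serial_dump.txt") as f:
    raw = f.read().strip()   # space-separated readings, e.g. "512 498 530"

m = mapper(raw)
with open("audacity_import.txt", "w") as f:
    f.write(m.mapAll())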
The suspected causes for the problem are:
1- The ESP8266 cannot read the analog pin fast enough to return meaningful data (probably not the case, since its clock speed is around 100MHz).
2- The way the software gathers and outputs the data is not optimal (the loop, Serial.print, etc.).
3- The microphone circuit output is too noisy. (It might be, but as observed in the oscilloscope test, my voice clearly changes the output, and that change was not audible in Audacity.)
4- The way I mapped and prepared the data for Audacity.
Is there something else I could try?
Are there similar projects out there? (To my surprise, I couldn't find anything that was done transparently!)
What would be the right way to do this? (It could be a very useful and economical method for recording, transmitting and analyzing audio.)
There are many issues with your project:
You do not set a bias voltage on A0. The ADC can only measure voltages between ground and VCC. With the microphone removed from the circuit, the voltage at A0 should be close to VCC/2. This is usually achieved with a voltage divider made of two resistors between VCC and GND, connected directly to A0, between the cap and A0.
Also, your circuit looks weird... Is the 47uF cap connected directly to the 3.3V rail? If so, you should connect it to pin 2 of the microphone instead. This would also suggest that right now your ADC is only recording noise (a missing bias voltage will do that).
You do not pace your input, meaning that you do not have a constant sampling rate. That is a very important issue. I suggest you set yourself a realistic target that is well within the limits of the ADC and of your serial port. The transfer rate in bytes/sec of a serial port is usually about baud rate / 8. At 9600 baud, that's only about 1200 bytes/sec, which means that once samples are converted to text, your maximum transfer rate drops to about 400 samples per second. This needs to be addressed, and the maximum calculated, before you begin: the attainable overall sample rate is the minimum of the ADC's sample rate and the serial port's transfer rate.
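A quick sanity check of that budget (a sketch; three characters per sample assumes small readings like "99 ", and 4-digit ADC values make it worse):

# back-of-the-envelope serial budget for text-encoded samples
baud = 9600
bytes_per_sec = baud / 8               # ~1200 bytes/sec
chars_per_sample = 3                   # e.g. "99 ": two digits plus a space
max_samples_per_sec = bytes_per_sec / chars_per_sample
print(max_samples_per_sec)             # ~400 samples/sec, far below audio rates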
How to grab samples depends a lot on your needs and what you are trying to do with this project: the audio bandwidth, resolution and audio quality requirements of the application, and the amount of work you can put into it. Reading from a loop as you are doing now may work with a fast enough serial port, but the quality will always be poor.
The way this is usually done is with a timer interrupt starting the ADC measurement and an ADC interrupt grabbing the result and storing it in a small FIFO, while the main loop transfers data from this FIFO to the serial port alongside the other tasks assigned to the chip. This cannot be done directly with the Arduino libraries, as you need to control the ADC directly.
Here a short checklist of things to do:
Get the full ESP8266 datasheet from Espressif. Look up the actual specs of the ADC, mainly the sample rates and resolutions available with your oscillator, and also its electrical constraints, at least its input voltage range and input impedance.
Once you know these numbers, set yourself a target; the math needed for a successful project needs input numbers. What is your application? Do you want to record audio or just detect a nondescript noise? What are the minimum requirements for things to work?
Look up in the Arduino documentation how to set up a timer interrupt and an ADC interrupt.
Look up in the datasheet which registers you'll need to access to configure and run the ADC.
Fix the voltage bias issue on the ADC input. Nothing can work before that's done, and you do not want to destroy your processor.
Make sure the input AC voltage (the 'swing' voltage) is large enough to give you the results you want. It is not unusual to have to amplify a mic signal (with an opamp or a transistor), just for impedance matching.
Then you can start writing code.
This may sound awfully complex for such a small task, but that's what the average day of an embedded programmer looks like.
[EDIT] Your circuit would work a lot better if you simply replaced the 47uF DC-blocking capacitor with a series resistor. Its value should be in the 2.2k to 7.6k range, to keep the circuit impedance within the 10k Ohms or so needed for the ADC. This would ensure that the input voltage to A0 is within the operating limits of the ADC (GND-3.3V on the NodeMCU board, 0-1V with the bare chip).
The signal may still be too weak for your application, though. What is the amplitude of the signal on your scope? How many bits of resolution does that range cover once converted by the ADC? For example, for a 0.1 V peak-to-peak signal (SIG = 0.1), an ADC range of 0-3.3 V (RNG = 3.3) and 10 bits of resolution (RES = 1024), you'll have
    binary_range = RES * (SIG / RNG)
                 = 1024 * (0.1 / 3.3)
                 ≈ 31
A range of 31 means around log2(31) ≈ 5 useful bits of resolution. Is that enough for your application?
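The same arithmetic as a runnable sketch:

import math

RES = 1024   # 10-bit ADC codes
RNG = 3.3    # ADC full-scale range in volts
SIG = 0.1    # peak-to-peak signal in volts

binary_range = RES * (SIG / RNG)       # ~31 distinct codes
useful_bits = math.log2(binary_range)  # ~5 effective bits
print(round(binary_range, 2), round(useful_bits, 2))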
As a side note: the ADC will give you positive values with a DC offset. You will probably need to run the digital output through a DC-blocking filter before playback. https://manual.audacityteam.org/man/dc_offset.html

Listening for 2 or more microphones using Microsoft speech services

Good day
I have a Python project you can talk to and get responses from, like a chat. The app is working great; now I want to install two microphones and talk to my assistant from both of them.
The problem is that I'm using Microsoft speech services, and their examples don't show how to use two audio streams or anything related to this. I saw their topic on multi-device audio recognition for Java, C# and C++, but Python is not supported.
My question is: is there any way I can connect two or more microphones to my laptop and use two audio streams at the same time to get responses from my app?
I have Python 3.9 installed, and my code just uses the recognize_once() function from Microsoft's examples.
I was thinking there might be a way to run multiple threads and listen for audio from those threads, but I have no idea. I did search for related topics, but people explain doing this with PyAudio; I use Microsoft speech services because my language isn't supported elsewhere.
Any help would be appreciated; sorry for my English.
For this kind of problem, we can use a multi-channel microphone array. Microsoft publishes "Microphone array recommendations" describing the supported geometries; based on the channel count, we can include microphone arrays of 2, 4 or 7 channels.
2 microphones form a linear array.
Check that documentation for details about spacing and the microphone array geometry.
You need to make sure the default Microsoft Azure Kinect DK is enabled. The following Python code is in a running state:
import pyaudio
import wave
import numpy as np

p = pyaudio.PyAudio()

# Find the index of the Azure Kinect Microphone Array
azure_kinect_device_name = "Azure Kinect Microphone Array"
index = -1
for i in range(p.get_device_count()):
    print(p.get_device_info_by_index(i))
    if azure_kinect_device_name in p.get_device_info_by_index(i)["name"]:
        index = i
        break
if index == -1:
    print("Could not find Azure Kinect Microphone Array. Make sure it is properly connected.")
    exit()

# Open the stream for reading audio
input_format = pyaudio.paInt32
input_sample_width = 4
input_channels = 7  # choose your channel count among 2, 4, 7
input_sample_rate = 48000
stream = p.open(format=input_format, channels=input_channels, rate=input_sample_rate,
                input=True, input_device_index=index)

# Read frames from the microphone and write them to a wav file
with wave.open("output.wav", "wb") as outfile:
    outfile.setnchannels(1)  # we want to write only the first channel from each frame
    outfile.setsampwidth(input_sample_width)
    outfile.setframerate(input_sample_rate)

    time_to_read_in_seconds = 5
    frames_to_read = time_to_read_in_seconds * input_sample_rate
    total_frames_read = 0
    while total_frames_read < frames_to_read:
        available_frames = stream.get_read_available()
        read_frames = stream.read(available_frames)
        # np.frombuffer replaces the deprecated np.fromstring; the stride
        # must match the channel count
        first_channel_data = np.frombuffer(read_frames, dtype=np.int32)[0::input_channels].tobytes()
        outfile.writeframesraw(first_channel_data)
        total_frames_read += available_frames

stream.stop_stream()
stream.close()
p.terminate()
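The snippet above only records from the array. To actually run recognition on two separate microphones at once, one option is the threaded approach the asker mentions: one SpeechRecognizer per device, each in its own thread. A sketch under the assumption that each microphone is addressable by a device ID via AudioConfig(device_name=...); the IDs below are placeholders:

import threading
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")

def listen(device_name):
    # one recognizer per microphone; device_name is a platform-specific ID
    audio_config = speechsdk.audio.AudioConfig(device_name=device_name)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                            audio_config=audio_config)
    result = recognizer.recognize_once()
    print(device_name, "->", result.text)

# hypothetical device identifiers; enumerate the real ones with your OS tools
for name in ("{mic-1-device-id}", "{mic-2-device-id}"):
    threading.Thread(target=listen, args=(name,), daemon=True).start()

input("Listening on both microphones; press Enter to quit\n")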

Realtime STFT and ISTFT in Julia for Audio Processing

I'm new to audio processing and dealing with data that's being streamed in real-time. What I want to do is:
listen to a built-in microphone
chunk together samples into 0.1second chunks
convert the chunk into a periodogram via the short-time Fourier transform (STFT)
apply some simple functions
convert back to time series data via the inverse STFT (ISTFT)
play back the new audio on headphones
I've been looking around for "real time spectrograms" to give me a guide on how to work with the data, but no dice. I have, however, discovered some interesting packages, including PortAudio.jl, DSP.jl and MusicProcessing.jl.
It feels like I'd need to make use of multiprocessing techniques to just store the incoming data into suitable chunks, whilst simultaneously applying some function to a previous chunk, whilst also playing another previously processed chunk. All of this feels overcomplicated, and has been putting me off from approaching this project for a while now.
Any help will be greatly appreciated, thanks.
As always, start with a simple version of what you really need. Ignore pulling in audio from a microphone for now; instead, write some code to synthesize a sine curve of a known frequency and use that as your input audio, or read audio in from a wav file. The benefit is that the input is known and reproducible, unlike microphone audio.
This post shows how to use some of the libraries you mention: http://www.seaandsailor.com/audiosp_julia.html
You speak of a "real time spectrogram". This is simply repeatedly processing a window of audio, so let's initially simplify that as well. Once you are able to read in the wav audio file, send it into an FFT call, which will return that audio curve in its frequency-domain representation. As you correctly state, this frequency-domain data can then be sent into an inverse FFT call to give you back the original time-domain audio curve.
After you get the above working, wrap it in a call that supplies a sliding window of audio samples; that gives you the "real time" ability to parse incoming audio from your microphone. Keep in mind that you should always use a power-of-two number of audio samples in the window you feed into your FFT and IFFT calls; say your window is 16384 samples. Your Julia server will need to juggle multiple demands: (1) pluck the next buffer of samples from your microphone feed, and (2) send a window of samples into your FFT and IFFT calls. Be aware that the number of audio samples in your sliding window will typically be larger than your incoming microphone buffer, hence the notion of a sliding window: over time, add each mic buffer to the front of the window and remove the same number of samples from its tail.
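To make the shape of that pipeline concrete, here is the STFT, modify, ISTFT round trip sketched in Python with scipy (the question targets Julia's DSP.jl, but the structure carries over; all parameters are illustrative):

import numpy as np
from scipy import signal

fs = 44100                           # sample rate
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)      # synthesized test input: 1 s of 440 Hz

# STFT with a power-of-two window (~0.09 s per segment at 44.1 kHz)
f, seg_t, Zxx = signal.stft(x, fs=fs, nperseg=4096)

# "apply some simple functions": e.g. zero everything above 1 kHz
Zxx[f > 1000, :] = 0

# back to the time domain via the inverse STFT
_, x_out = signal.istft(Zxx, fs=fs, nperseg=4096)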

How can I play a wav file in Go with portaudio and sndfile

First off, I'm new to the world of Go and lower level programming, so bear with me... :)
So what I'm trying to do is this: read a .wav file with the libsndfile binding for Go and play it with portaudio.
I cannot find any examples for this, and clearly I lack basic knowledge about pointers, streams and buffers to make this happen. Here is my take on it so far, I've tried to read the docs and the few examples I've been able to find and put the pieces together. I think I'm able to open the file and the stream but I don't get how to connect the two.
package main

import (
    "code.google.com/p/portaudio-go/portaudio"
    "fmt"
    "github.com/mkb218/gosndfile/sndfile"
    "math/rand"
)

func main() {
    portaudio.Initialize()
    defer portaudio.Terminate()

    // Open file with sndfile
    var i sndfile.Info
    file, fileErr := sndfile.Open("hello.wav", sndfile.Read, &i)
    fmt.Println("File: ", file, fileErr)

    // Open portaudio stream
    h, err := portaudio.DefaultHostApi()
    stream, err := portaudio.OpenStream(portaudio.HighLatencyParameters(nil, h.DefaultOutputDevice), func(out []int32) {
        for i := range out {
            out[i] = int32(rand.Uint32())
        }
    })
    defer stream.Close()
    fmt.Println("Stream: ", stream, err)

    // Play portaudio stream
    // ....
    framesOut := make([]int32, 32000)
    data, err := file.ReadFrames(framesOut)
    fmt.Println("Data: ", data, err)
}
I would be ever so grateful for a working example and some tips/links for beginners. If you have a solution that involves other libraries than the two mentioned above, that's ok too.
Aha, audio programming! Welcome to the world of soft-realtime computing :)
Think about the flow of data: a bunch of bits in a .wav file on disk are read by your program and sent to the operating system which hands them off to a sound card where they are converted to an analog signal that drives speakers generating the sound waves that finally reach your ears.
This flow is very sensitive to time fluctuations. If it is held up at any point you will perceive noticeable and sometimes jarring artifacts in the final sound.
Generally the OS/sound card are solid and well tested - most audio artifacts are caused by us developers writing shoddy application code ;)
Libraries such as PortAudio help us out by taking care of some of the thread-priority black magic and making the scheduling approachable. Essentially it says: "OK, I'm going to start this audio stream, and every X milliseconds when I need the next bit of sample data, I'll call back whatever function you provide."
In this case you've provided a function that fills the output buffer with random data. To playback the wave file instead, you need to change this callback function.
But! You don't want to be doing I/O in the callback. Reading some bytes off disk could take tens of milliseconds, and portaudio needs that sample data now so that it gets to the sound card in time. Similarly, you want to avoid acquiring locks or any other operation that could potentially block in the audio callback.
For this example it's probably simplest to load the samples before starting the stream, and use something like this for the callback:
isample := 0
callback := func(out []int32) {
    for i := 0; i < len(out); i++ {
        out[i] = framesOut[(isample+i)%len(framesOut)]
    }
    isample += len(out)
}
Note that % len(framesOut) will cause the loaded 32000 samples to loop over and over - PortAudio will keep the stream running until you tell it to stop.
Actually, you need to tell it to start too! After opening the stream, call stream.Start(), and add a sleep after that, or your program is likely to exit before it gets a chance to play anything.
Finally, this also assumes that the sample format in the wave file is the same as the sample format you requested from PortAudio. If the formats don't match you will still hear something, but it probably won't sound pretty! Anyway sample formats are a whole 'nother question.
Of course loading all your sample data up front so you can refer to it within the audio callback isn't a fantastic approach except once you get past hello world stuff. Generally you use a ring-buffer or something similar to pass sample data to the audio callback.
PortAudio provides another API (the "blocking" API) that does this for you. For portaudio-go, this is invoked by passing a slice into OpenStream instead of a function. When using the blocking API you pump sample data into the stream by (a) filling the slice you passed into OpenStream and (b) calling stream.Write().
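For comparison, the blocking pattern is easy to see in Python with the sounddevice binding (a sketch in a different language than the question, shown only to illustrate the pattern): write() blocks until PortAudio has buffer space, which paces the loop for you.

import numpy as np
import sounddevice as sd

fs = 44100
t = np.arange(fs) / fs
chunk = (0.1 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

# blocking API: no callback; we push samples and write() paces us
with sd.OutputStream(samplerate=fs, channels=1, dtype='float32') as stream:
    for _ in range(3):   # ~3 seconds of a 440 Hz tone
        stream.write(chunk)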
This is much longer than I intended so I better leave it there. HTH.

Prevent ALSA underruns with PyAudio

I wrote a little program which records voice from the microphone, sends it over the network, and plays it there. I'm using PyAudio for this task. It works almost fine, but on both computers I get errors from ALSA that an underrun occurred. I googled a lot about it, and now I know what an underrun is. But I still don't know how to fix the problem. Most of the time the sound is fine, but it sounds a little strange when underruns occur. Is there anything I should take care of in my code? It feels like I'm making a simple error and missing it.
My system: python: python3.3, OS: Linux Mint Debian Edition UP7, PyAudio v0.2.7
Have you considered syncing sound?
You didn't provide the code, so my guess is that you need a timer in a separate thread that executes, every CHUNK_SIZE/RATE seconds, code that looks like this:
# bytes, not str: PyAudio streams carry bytes on Python 3; 2 bytes per 16-bit sample
silence = b'\x00' * self.chunk * self.channels * 2
out_stream = ...  # the output stream opened in pyaudio

def play(data):
    # if data has not arrived, play silence instead
    # (yes, we sacrifice a sound frame for output-buffer consistency)
    if not data:
        data = silence
    out_stream.write(data)
Assuming this code executes regularly, we will always supply some audio data to the output stream.
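One way to get that regular execution with the standard library (a sketch; pending_data() is a hypothetical stand-in for however your network thread hands over the next chunk):

import threading

CHUNK = 1024
RATE = 44100
INTERVAL = CHUNK / RATE   # seconds of audio per chunk, ~23 ms here

def tick():
    threading.Timer(INTERVAL, tick).start()  # reschedule before playing
    play(pending_data())  # pending_data(): hypothetical non-blocking fetch

tick()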
It's possible to prevent the underruns by filling in silence when needed. That looks like this:
#...
data = s.recv(CHUNK * WIDTH)         # receive data from peer
stream.write(data)                   # play sound
free = stream.get_write_available()  # how much space is left in the buffer?
if free > CHUNK:                     # is there a lot of space in the buffer?
    tofill = free - CHUNK
    stream.write(SILENCE * tofill)   # fill it with silence (SILENCE: one frame of silence)
#...
The solution for me was to buffer the first 10 packets/frames of the recorded sound. Look at the snippet below:
BUFFER = 10

# busy-wait until enough frames have accumulated
while len(queue) < BUFFER:
    continue

while running:
    recorded_frame = queue.pop(0)
    audio.write(recorded_frame)
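A variation that avoids the busy-wait, assuming the recording thread pushes frames into a standard queue.Queue (names mirror the snippet above and are otherwise hypothetical):

from queue import Queue

BUFFER = 10
frame_queue = Queue()   # hypothetical: filled by the recording thread

# block until the initial cushion of frames has accumulated
prebuffer = [frame_queue.get() for _ in range(BUFFER)]
for frame in prebuffer:
    audio.write(frame)

while running:
    audio.write(frame_queue.get())  # get() blocks, so no CPU is burned spinning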
