Prevent ALSA underruns with PyAudio - python-3.x

I wrote a little program which records voice from the microphone, sends it over the network, and plays it there. I'm using PyAudio for this task. It works almost fine, but on both computers I get errors from ALSA that an underrun occurred. I googled a lot about it and now I know what an underrun is, but I still don't know how to fix the problem. Most of the time the sound is fine, but it sounds a little strange when underruns occur. Is there anything I should take care of in my code? It feels like I'm making a simple error and missing it.
My system: Python 3.3, OS: Linux Mint Debian Edition UP7, PyAudio v0.2.7

Have you considered syncing sound?
You didn't provide the code, so my guess is that you need a timer in a separate thread that executes code like the following every CHUNK_SIZE/RATE seconds (the duration of one chunk):
silence = b'\x00' * (CHUNK_SIZE * CHANNELS * 2)  # 2 bytes per 16-bit sample
out_stream = ...  # the output stream opened with PyAudio

def play(data):
    # If the data has not arrived, play silence instead.
    # Yes, we sacrifice a sound frame for output buffer consistency.
    if not data:
        data = silence
    out_stream.write(data)
Assuming this code executes regularly, we will always supply some audio data to the output stream.
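A minimal sketch of such a timer thread, assuming CHUNK_SIZE and RATE as above and a queue that the network code fills (the queue and the thread wiring are illustrative, not from the question):

import threading
import queue

incoming = queue.Queue()  # the network thread put()s each received packet here

def playback_loop():
    period = CHUNK_SIZE / RATE  # duration of one chunk in seconds
    while True:
        try:
            data = incoming.get(timeout=period)
        except queue.Empty:
            data = b''  # nothing arrived in time
        play(data)  # play() above substitutes silence for empty data

threading.Thread(target=playback_loop, daemon=True).start()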

It's possible to prevent the underruns by filling in silence when needed.
That looks like this:
#...
SILENCE = b'\x00' * WIDTH  # one frame of silence; get_write_available() counts frames
data = s.recv(CHUNK * WIDTH)  # Receive data from peer
stream.write(data)  # Play sound
free = stream.get_write_available()  # How much space is left in the buffer?
if free > CHUNK:  # Is there a lot of space in the buffer?
    tofill = free - CHUNK
    stream.write(SILENCE * tofill)  # Fill it with silence
#...

The solution for me was to buffer the first 10 packets/frames of the recorded sound before starting playback. Look at the snippet below:
BUFFER = 10

# Busy-wait until the first BUFFER frames have been received...
while len(queue) < BUFFER:
    continue

# ...then start playback.
while running:
    recorded_frame = queue.pop(0)
    audio.write(recorded_frame)
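The busy-wait above burns a CPU core; a variant built on queue.Queue blocks instead. A sketch, assuming the receiving thread calls q.put(frame) for each incoming frame (q is a queue.Queue here, not the plain list named queue above):

import queue

q = queue.Queue()
BUFFER = 10

# Block until the first BUFFER frames have arrived, then play them.
for frame in [q.get() for _ in range(BUFFER)]:
    audio.write(frame)

while running:
    audio.write(q.get())  # blocks until the next frame arrives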

Related

Icecast: I have strange behaviour with repeats of end of tracks, as well as pitch changes from my Icecast Server

I only began using Icecast a few days ago, so if I stuffed something up somewhere, please let me know.
I have a weird problem with Icecast. Every time a track "finishes" on Icecast, a section of the end of the currently playing track (I think 64 KB of it) is repeated about 2 to 3 times before the next song plays, and the next song doesn't begin at the start either, but a few seconds in. I can also notice that the playback speed (and hence the pitch) sometimes differs from the original.
I consulted this post and this post quoted below, which taught me what the <burst-on-connect> and <burst-size> tags are used for. It also taught me this:
What's happening here is that nothing is being added to the buffer, so clients connect, get the contents of that buffer, and then the stream ends. The client must be re-connecting repeatedly, and it keeps getting that same buffer.
Cheers to Brad for that post. A solution to this problem was provided in the comments section of that post: decrease the <source-timeout> of the Icecast server, so that it closes the connection quicker and stops any repeating. But this assumes I want to close the mountpoint, and I don't, because what I am using Icecast for is actually a 24/7 radio player. If I did close my mountpoint, then VLC would just turn off and not repeatedly attempt to connect anymore. Unless this is wrong; I don't know.
I use VLC to hear the playback of the Icecast streams, and I use nodeshout, a set of bindings to libshout built for node.js, to send data to a number of mounts on my Icecast server. In the future I plan to make a site that will listen to the Icecast streams, replacing VLC.
icecast.xml
<limits>
    <clients>100</clients>
    <sources>4</sources>
    <queue-size>1008576</queue-size>
    <client-timeout>30</client-timeout>
    <header-timeout>15</header-timeout>
    <source-timeout>30</source-timeout>
    <burst-on-connect>1</burst-on-connect>
    <burst-size>252144</burst-size>
</limits>
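For scale: at the stream's 128 kbps (16,000 bytes per second), that <burst-size> of 252144 bytes is roughly 15 to 16 seconds of audio pushed to each client when it connects.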
This is a summary of the audio sending code on my node.js server.
nodejs
// These lines are a smaller part of a function that sets all the stream
// information. The variables (name, description, etc.) come from the
// function's arguments.
var nodeshout = require("nodeshout");
let shout = nodeshout.create();
shout.setHost('localhost');
shout.setPort(8000);
shout.setUser('source');
shout.setPassword(process.env.icecastPassword); // password in the .env file
shout.setName(name);
shout.setDescription(description);
shout.setMount(mount);
shout.setGenre(genre);
shout.setFormat(1); // 0 = ogg, 1 = mp3
shout.setAudioInfo('bitrate', '128');
shout.setAudioInfo('samplerate', '44100');
shout.setAudioInfo('channels', '2');
return shout;
// Meanwhile, somewhere lower in the file, this is a summary of how the
// audio is sent to the Icecast server.
var nodeshout = require("nodeshout");
var {FileReadStream, ShoutStream} = require("nodeshout"); // this is where FileReadStream and ShoutStream come from
const filecontent = new FileReadStream(pathToSong, 65536); // if I increase 65536, more bytes get repeated at the end of the track; if I decrease it, the audio starts sounding buggy and off
var streamcontent = filecontent.pipe(new ShoutStream(shoutstream));
streamcontent.on('finish', () => {
    next();
    console.log("Track has finished on " + stream.name + ": " + chosenTrack);
});
I also notice weirder behaviour: only after the previous song has had its last chunk repeated a few times does the streamcontent.on('finish') event in the Node.js script fire, and only then am I warned that the track is finished.
What I have tried
I tried messing around with the <source-timeout> tag, the number of bytes (or bits, I'm not sure) being sent from Node.js, and the burst size. I also tried turning bursting off completely, but that results in very strange behaviour.
I also thought creating a new stream for every song was a bad idea (see new ShoutStream(shoutstream) where the file data is piped), but reusing the same stream meant the program would throw an error, because it would write the next track to the shoutstream after the stream had already reported itself closed.
If any more information is necessary to figure out what is going on, I can provide it. Thanks for your time.
Edit: I would like to add: do you think I should manually control how many bytes are sent to Icecast and reuse the same stream object, instead of creating a new one every time?
I found out why the stream didn't play some tracks properly, as opposed to others.
How I got there
I could not switch to Ogg/Vorbis or Ogg/Opus for my stream, so I had to do something about my source client. I double-checked that everything was correct and that my audio files were at the correct bitrate. When I ran ffprobe audio.mp3, the reported bitrates sometimes did not adhere to the typical rates of 128, 192, or 320 kbps; it would be some strange value such as 129852, just to provide an example.
I then downloaded the Checkmate MP3 checker here and checked my audio files, and they were all encoded with a variable bitrate!!! VBR, damn it!
TLDR
I fixed my problem by re-encoding all my tracks to a constant bitrate of 128 kbps using FFmpeg.
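For reference, a constant-bitrate re-encode of that kind can be done with a command along these lines (treat the exact flags as a sketch and check them against your FFmpeg build):
ffmpeg -i input.mp3 -codec:a libmp3lame -b:a 128k output.mp3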
Quick edit: I am pretty sure that programs such as Darkice might already support variable-bitrate transfers to Icecast servers, but it would be impractical for me to use Darkice, hence why I stuck with nodeshout.

What's FFmpeg doing with avcodec_send_packet()?

I'm trying to optimise a piece of software for playing video, which internally uses the FFmpeg libraries for decoding. We've found that on some large (4K, 60fps) video, it sometimes takes longer to decode a frame than that frame should be displayed for; sadly, because of the problem domain, simply buffering/skipping frames is not an option.
However, it appears that the FFmpeg executable is able to decode the video in question fine, at about 2x speed, so I've been trying to work out what we're doing wrong.
I've written a very stripped-back decoder program for testing; the source is here (it's about 200 lines). From profiling it, it appears that the one major bottleneck during decoding is the avcodec_send_packet() function, which can take up to 50ms per call. However, measuring the same call in FFmpeg shows strange behaviour:
(these are the times taken for each call to avcodec_send_packet() in milliseconds, when decoding a 4K 25fps VP9-encoded video.)
Basically, it seems that when FFmpeg uses this function, a call only really takes any significant time to complete every N calls, where N is the number of threads being used for decoding. However, both my test decoder and the actual product use 4 threads for decoding, and this doesn't happen; when using frame-based threading, the test decoder behaves like FFmpeg does with only 1 thread. This would seem to indicate that we're not using multithreading at all, yet we've still seen performance improvements from using more threads.
FFmpeg's results average out to being about twice as fast overall as our decoders, so clearly we're doing something wrong. I've been reading through FFmpeg's source to try to find any clues, but it's so far eluded me.
My question is: what's FFmpeg doing here that we're not? Alternatively, how can we increase the performance of our decoder?
Any help is greatly appreciated.
I was facing the same problem. It took me quite a while to figure out a solution, which I want to share here for future reference:
Enable multithreading for the decoder. By default the decoder only uses one thread; depending on the codec, multithreading can speed up decoding drastically.
Assume you have an AVFormatContext *format_ctx, a matching AVCodec *codec, and an AVCodecContext *codec_ctx (allocated using avcodec_alloc_context3).
Before opening the codec context (using avcodec_open2) you can configure multithreading. Check the capabilities of the codec in order to decide which kind of multithreading you can use:
// Set thread_count to 0 to let the codec determine how many threads suit
// the decoding job best.
codec_ctx->thread_count = 0;

if (codec->capabilities & AV_CODEC_CAP_FRAME_THREADS)
    codec_ctx->thread_type = FF_THREAD_FRAME;
else if (codec->capabilities & AV_CODEC_CAP_SLICE_THREADS)
    codec_ctx->thread_type = FF_THREAD_SLICE;
else
    codec_ctx->thread_count = 1; // don't use multithreading
Another speed-up I found is the following: keep sending packets to the decoder (that's what avcodec_send_packet() does) until you get AVERROR(EAGAIN) as the return value. That means the internal decoder buffers are full and you first need to collect the decoded frames (but remember to send this last packet again once the decoder has been drained). Now you can collect the decoded frames with avcodec_receive_frame() until you get AVERROR(EAGAIN) again.
Some decoders work much faster when they have multiple frames queued for decoding (that's what happens when codec_ctx->thread_type = FF_THREAD_FRAME is set).
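For anyone driving these decoders from Python rather than C: PyAV exposes the same threading knobs, and its decode loop wraps the avcodec_send_packet()/avcodec_receive_frame() dance internally. A sketch, assuming PyAV is installed, with input.mp4 standing in for your file:

import av

container = av.open("input.mp4")
cc = container.streams.video[0].codec_context
cc.thread_type = "AUTO"  # frame and/or slice threading, whatever the codec allows
cc.thread_count = 0      # let FFmpeg pick the thread count

for frame in container.decode(video=0):  # wraps the send_packet/receive_frame loop
    pass  # process each decoded frame here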
avcodec_send_packet() and avcodec_receive_frame() are wrapper functions; the most important thing they do is call the selected codec's decode function and return the decoded frame (or an error).
Try tweaking the codec options; for example, low latency may not give you what you want. And sometimes the old API, avcodec_decode_video2() (I believe it is still around), outperforms the newer one, so you may try that too.

How to process audio stream in realtime

I have a setup with a Raspberry Pi 3 running the latest Jessie with all updates installed, on which I provide an A2DP Bluetooth sink; I connect a phone to it to play some music.
Via PulseAudio, the source (the phone) is routed to the ALSA output (the sink). This works reasonably well.
I now want to analyze the audio stream using Python 3.4 with librosa, and I found a promising example using PyAudio, which I adjusted to use the PulseAudio input (which magically works, because it's the default) instead of a WAV file:
"""PyAudio Example: Play a wave file (callback version)."""
import pyaudio
import wave
import time
import sys
import numpy
# instantiate PyAudio (1)
p = pyaudio.PyAudio()
# define callback (2)
def callback(in_data, frame_count, time_info, status):
# convert data to array
data = numpy.fromstring(data, dtype=numpy.float32)
# process data array using librosa
# ...
return (None, pyaudio.paContinue)
# open stream using callback (3)
stream = p.open(format=p.paFloat32,
channels=1,
rate=44100,
input=True,
output=False,
frames_per_buffer=int(44100*10),
stream_callback=callback)
# start the stream (4)
stream.start_stream()
# wait for stream to finish (5)
while stream.is_active():
time.sleep(0.1)
# stop stream (6)
stream.stop_stream()
stream.close()
wf.close()
# close PyAudio (7)
p.terminate()
Now, while the data flow works in principle, the stream_callback only gets called with a delay of one buffer length. Since the docs state:
Note that PyAudio calls the callback function in a separate thread.
I would have assumed that while the callback is being worked on, the buffer keeps filling in the main thread. Of course, there would be an initial delay while the buffer fills; afterwards I expected to get a synchronous flow.
I need a longer portion in the buffer (hence the large frames_per_buffer) for librosa to be able to perform its analysis correctly.
How is something like this possible? Is it a limitation of the software ports for the Raspberry Pi's ARM?
I found other answers, but they use blocking I/O. How would I wrap that in a thread so that the librosa analysis (which might take some time) does not block the filling of the buffer?
This blog seems to fight performance issues with Cython, but I don't think the delay is a performance issue. Or might it be? Others seem to need some ALSA tweaks, but would that help while using PulseAudio?
Thanks, any input appreciated!
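One pattern that fits this situation is to keep frames_per_buffer small, have the callback do nothing but enqueue the raw buffer, and assemble the long analysis window in a separate thread. A sketch, not from the question; the window length and the wiring are illustrative assumptions:

import queue
import threading
import numpy
import pyaudio

buf = queue.Queue()

def callback(in_data, frame_count, time_info, status):
    buf.put(in_data)  # never do heavy work in the audio callback
    return (None, pyaudio.paContinue)

def analyzer():
    window = []
    while True:
        window.append(numpy.frombuffer(buf.get(), dtype=numpy.float32))
        if len(window) >= 430:  # roughly 10 s of 1024-sample buffers at 44.1 kHz
            samples = numpy.concatenate(window)
            window.clear()
            # run the librosa analysis on samples here

threading.Thread(target=analyzer, daemon=True).start()
# then pass callback to p.open(..., frames_per_buffer=1024, stream_callback=callback)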

Opencv stereo cameras capture and framerate limits

I am trying to get pairs of images out of a Minoru stereo webcam, currently through OpenCV on Linux.
It works fine when I force a low resolution:
left = cv2.VideoCapture(0)
left.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 320)
left.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 240)
right = cv2.VideoCapture(1)
right.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 320)
right.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 240)

while True:
    _, left_img = left.read()
    _, right_img = right.read()
    ...
However, I'm using the images for creating depth maps, and a bigger resolution would be good. But if I leave the default, or force the resolution to 640x480, I hit errors:
libv4l2: error turning on stream: No space left on device
I have read about USB bandwidth limitations, but:
this happens on the first iteration (the first read() from right)
I don't need anywhere near 60 or even 30 FPS, but I couldn't manage to reduce the requested FPS via VideoCapture parameters (if that makes sense)
adding sleeps doesn't seem to help, even between the left/right reads
strangely, if I do a lot of processing inside the while loop, I start noticing lag: things happening in the real world get shown much later in the images read. This suggests there is a buffer somewhere that can, and does, accumulate several images (a lot of them)
I tried the workaround of creating and releasing a separate VideoCapture for each image read, but this is a bit too slow overall (< 1 FPS), and more importantly, the images are too far out of sync for stereo matching to work.
I'm trying to understand why this fails, in order to find solutions. It looks like V4L allocates a single, global, too-small buffer that is somehow shared by the two capture objects.
Any help would be appreciated.
I had the same problem and found this answer: https://superuser.com/questions/431759/using-multiple-usb-webcams-in-linux
Since both Minoru cameras report their format as 'YUYV', this is likely a USB bandwidth issue. I lowered the frame rate to 20 FPS (it didn't work at 24) and now I can see both 640x480 images.
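With the Python API, lowering the requested frame rate looks something like this (a sketch; CAP_PROP_FPS is the constant in current OpenCV builds, and whether the driver honours the request depends on the camera):

import cv2

left = cv2.VideoCapture(0)
right = cv2.VideoCapture(1)

for cam in (left, right):
    cam.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    cam.set(cv2.CAP_PROP_FPS, 20)  # request 20 FPS to stay within the USB bandwidth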

Sound only plays once in tcl/tk (snack, pulseaudio, Linux)

I'm using the snack package on Linux Mint 13 (Maya), Tk 8.5 (wish).
My audio output is analog stereo via PulseAudio.
According to this: http://www.speech.kth.se/snack/tutorial.html
all I have to do to play a sound again is to use the play command again.
I have a sound object, and it only plays once no matter how many times I call the play command.
I tried putting a stop command before play, like this:
mysound stop
mysound play
What happens: it plays on the first but not on the second call, plays on the third but not on the fourth call, and so on. That was asynchronous, meaning I pushed buttons to repeat stop-play. Now, this script:
package require snack
snack::sound s
s read knock.wav
after 1000 {s play}; #play sound effect
after 5000 {s play}; #this one doesn't work
after 10000 {s play}; #this one doesn't work
after 15000 {s stop; s play}; #played
after 20000 {s stop; s play}; #not played
after 25000 {s stop; s play}; #played
Same behavior as I had using the button release events. On Android, the behavior is exactly as in theory, except that there are huge delays depending on the device (e.g. the sound comes after 2 seconds on one phone and after 200 ms on another with better hardware).
I know the theory is right, and my final question is: how can I get a more robust sound-playing implementation on Linux? Maybe using MIDI sounds. A solution that could work on any UNIX machine. Does snack provide that?
Thank you so much; this is very important to me, and I believe for others as well!
Unfortunately you don't tell us what your system is (which Linux are you using, and what are your audio system and device) or what you are really doing, so please provide a minimal working example.
Here is mine, which works interactively (I routinely use tkcon for this):
package require sound
snack::sound s1 -load longrunning-audio.wav
s1 play
# wait
s1 stop
s1 play
s1 pause
s1 pause; # resumes playing
I use the sound package instead of snack, because we don't need the graphical elements here.
And as a script:
#!/usr/bin/env tclsh
package require sound
snack::sound s1 -load longrunning-audio.wav
puts "play"; s1 play
after 4000 puts "stop" \; s1 stop
after 6000 puts "play" \; s1 play
after 10000 puts "pause" \; s1 pause
after 12000 puts "resume" \; s1 pause;
after 16000 {apply {{} {
puts "stop and exit"
s1 stop
exit
}}}
# start the event loop
vwait forever
after schedules a command to run after the given time in milliseconds. In a real program you would use some procedures for this; here it is only to simulate some interaction.
Maybe you are suffering from a badly packaged snack, or your audio system is playing tricks on you. I remember a similar problem with some version of PulseAudio in combination with one of the output devices: only the start of the sound was played, but the audio system stayed active (use snack::audio active to show the current state).
Wow, you've got a lot of callback scripts scheduled at the same time! The after calls themselves take very little time, which means you've got several calls to s play happening at virtually the same time. You are aware that only the last such call will have any effect? It's effectively as if you did this (in the "1000 ms from now" timeslot):
s play
s play
s stop; s play
s stop; s play
s stop; s play
Yes, they're separate callbacks, but they'll all actually get evaluated one after another; you won't see any time pass between them.
I believe that the sound object, s, can only ever be playing or not playing. Also, sounds don't play instantly (obviously), so you've got to allow time between starting the playback and stopping it if you want the sound to be heard. Doing what you're doing will result in lots of activity at the command-processing level, but not much of it will be observable.
