How to loop audio in Alexa - node.js

I am building an ambient audio skill for sleep for Alexa! I am trying to loop the audio so I don't have to download 10-hour versions of it. How do I get the looping to work? I have it built to the point where it plays the audio, but it does not loop.

I've solved this problem in my skill Rainmaker: https://www.amazon.com/Arif-Gebhardt-Rainmaker/dp/B079V11ZDM
The trick is to handle the PlaybackNearlyFinished event.
https://developer.amazon.com/de/docs/alexa-voice-service/audioplayer.html#playbacknearlyfinished
This event is fired shortly before the currently playing audio stream ends.
Respond to the event with another audioPlayerPlay directive with behavior ENQUEUE. This loops your audio indefinitely until it is interrupted, e.g. by the AMAZON.StopIntent.
Advanced: if you want a finite loop, say ten plays of your audio, use the token of the audioPlayerPlay directive to count down from ten. Once the counter hits zero, just don't enqueue another audio item. But be sure to respond with something in this case, even if it's just an empty response. Otherwise you will get a timeout error or the like.
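The countdown idea above can be sketched as follows. This is a minimal sketch in Python that builds the raw response JSON by hand rather than using any particular SDK; the URL and the token scheme (a numeric token counts the remaining plays, any other token loops forever) are assumptions made for illustration, not part of the original answer.

```python
AUDIO_URL = "https://example.com/rain.mp3"  # hypothetical stream URL (assumption)


def play_directive(token, previous_token=None):
    """Build an AudioPlayer.Play directive as a plain dict."""
    stream = {"url": AUDIO_URL, "token": token, "offsetInMilliseconds": 0}
    if previous_token is None:
        behavior = "REPLACE_ALL"  # first play: start immediately
    else:
        behavior = "ENQUEUE"  # queue behind the currently playing stream
        stream["expectedPreviousToken"] = previous_token
    return {
        "type": "AudioPlayer.Play",
        "playBehavior": behavior,
        "audioItem": {"stream": stream},
    }


def on_playback_nearly_finished(request):
    """Answer an AudioPlayer.PlaybackNearlyFinished request."""
    current = request["token"]
    if current.isdigit():
        # Assumed token scheme: the number of plays left, including this one.
        remaining = int(current) - 1
        if remaining <= 0:
            # Nothing left to enqueue, but still send an (empty) response.
            return {"version": "1.0", "response": {}}
        next_token = str(remaining)
    else:
        # Non-numeric token: loop forever with the same token.
        next_token = current
    directive = play_directive(next_token, previous_token=current)
    return {"version": "1.0", "response": {"directives": [directive]}}
```

Starting playback with `play_directive("10")` would then play the audio ten times, while `play_directive("loop")` would repeat until the user stops it.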

Related

How to playback realtime audio in python while also constantly recording?

I want to create a speech jammer. It is essentially something that repeats back to you what you just said, but continuously. I first tried to use the sounddevice library to record what I am saying while also playing it back. Then I changed it to first record what I was saying, then play that back while recording something new. However, it is not functioning as I would like. Any suggestions for other libraries? Or does anyone have a suggestion for the code I already have?
Instead of constantly playing back to me, it starts and stops, at intervals of the specified duration. So it records for 500 ms, then plays that back for 500 ms, and then starts recording again. The wanted behavior is to record for 500 ms while playing back the audio being recorded at a delay of some milliseconds.
import sounddevice as sd
import numpy as np
fs = 44100
sd.default.samplerate = fs
sd.default.channels = 2
#the above is to avoid having to specify arguments in every function call
duration = .5
myarray = sd.rec(int(duration*fs))
while(True):
    sd.wait()
    myarray = sd.playrec(myarray)
    sd.wait()
Paraphrasing my own answer from https://stackoverflow.com/a/54569667:
The functions sd.play(), sd.rec() and sd.playrec() are not meant to be used repeatedly in rapid succession. Internally, each call creates an sd.OutputStream, sd.InputStream or sd.Stream (respectively), plays/records the audio data and closes the stream again. Because the stream is opened and closed each time, gaps will occur. This is expected.
For continuous playback you can use the so-called "blocking mode" by creating a single stream and calling the read() and/or write() methods on it.
Or, what I normally prefer, you can use the so-called "non-blocking mode" by creating a custom "callback" function and passing it to the stream on creation.
In this callback function, you can e.g. write the input data to a queue.Queue and read the output data from the same queue. By filling the queue with a certain number of zeros beforehand, you can specify how long the delay between input and output shall be.
You can have a look at the examples to see how callback functions and queues are used.
Let me know if you need more help, then I can try to come up with a concrete code example.
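The callback-plus-queue approach described above could look roughly like this. It is a minimal sketch, not a tuned implementation: the block size, the number of delay blocks and the helper names make_delay_callback and run are assumptions chosen for illustration, and a fixed block size is assumed so that every queued block has the same shape.

```python
import queue

import numpy as np

FS = 44100         # sample rate in Hz
CHANNELS = 2
BLOCKSIZE = 1024   # frames per callback (assumed value)
DELAY_BLOCKS = 10  # 10 * 1024 / 44100 s, roughly 232 ms of delay


def make_delay_callback(delay_blocks, blocksize, channels):
    """Return a stream callback that plays the input back after a delay."""
    q = queue.Queue()
    # Pre-fill the queue with silence; its length sets the delay.
    for _ in range(delay_blocks):
        q.put(np.zeros((blocksize, channels), dtype="float32"))

    def callback(indata, outdata, frames, time, status):
        if status:
            print(status)
        q.put(indata.copy())         # store the freshly recorded block
        outdata[:] = q.get_nowait()  # play the oldest stored block

    return callback


def run(seconds=10):
    # Imported here so the delay logic can be tried without audio hardware.
    import sounddevice as sd

    cb = make_delay_callback(DELAY_BLOCKS, BLOCKSIZE, CHANNELS)
    with sd.Stream(samplerate=FS, blocksize=BLOCKSIZE, channels=CHANNELS,
                   dtype="float32", callback=cb):
        sd.sleep(int(seconds * 1000))
```

Calling `run()` keeps a single stream open for the whole session, so there are no gaps from opening and closing streams. Because each callback puts one block before taking one, the queue never runs empty.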
I'm seeing a potential problem here: you are using myarray as both the input and the output of the sd.playrec() function. I would recommend having two arrays, one for recording the live audio and one for playing back the recorded audio.
Instead of using sd.playrec(), you could rapidly alternate between sd.rec() and sd.play(), with a small delay between them, within your while loop.
For example, the following code should record for one millisecond, wait for that recording to finish, and then play back the one millisecond of audio:
duration = 0.001
while(True):
    myarray = sd.rec(int(duration*fs))
    sd.wait()  # block until the recording has finished
    sd.play(myarray, fs)  # the second argument is the sample rate, not the frame count
There is no delay after the playback because you want to go right back to recording the next millisecond straight away. Note, however, that this does not keep a recording of your audio for more than one millisecond! You would have to add your own code that appends each block to an array of a specified size and fills it up over time.
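One way to keep the recorded blocks, as suggested above, is a preallocated array that is filled block by block. This is only a sketch; the Recording class and its method names are hypothetical and not part of sounddevice.

```python
import numpy as np


class Recording:
    """Fixed-size buffer that collects recorded blocks until it is full."""

    def __init__(self, total_frames, channels):
        self.data = np.zeros((total_frames, channels), dtype="float32")
        self.filled = 0  # number of frames written so far

    def append(self, block):
        """Copy as much of the block as still fits; return frames copied."""
        n = min(len(block), len(self.data) - self.filled)
        self.data[self.filled:self.filled + n] = block[:n]
        self.filled += n
        return n
```

Inside the loop, each array returned by sd.rec() would be passed to `append()` after sd.wait(); once `append()` returns 0 the buffer is full.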

When does WASAPI GetNextPacketSize return 0

The WASAPI capture sample code on MSDN loops until GetNextPacketSize returns 0.
I just want to understand when this will happen:
Will it happen if silence is registered on the microphone? (In that case, will it loop infinitely if I keep making noise into the microphone?)
Or does it depend on some fundamental audio capture concept that I am missing? (I am quite new to audio APIs :))
The API helps determine the size of the data buffer to be captured, so that the API client does not need to guess or allocate an oversized buffer. The API returns zero when there is no data to capture yet (not a single frame). This can happen in an ongoing audio capture session if you call the API too early; the caller is then expected to try again later, since new data can still be generated.
In some conditions a zero return might indicate the end of the stream. Specifically, if you capture from a loopback device and there are no active playback sessions that can generate data for loopback delivery, the capture API might keep delivering no data until a new playback session emerges.
The sample code's loop checks for zero packet size in conjunction with a Sleep call. This way the loop expects that at least some data is generated during the sleep time; under normal conditions of continuous audio data generation, the first call within each outer-loop iteration does not return zero. The inner loop attempts to read as many non-empty buffers as possible, until zero indicates that all data that was ready for delivery has already been returned to the client.
The outer loop keeps running until the sink signals end of capture through the bDone variable. There is a catch here: according to the sample code, the inner loop might keep rolling without breaking out into the outer loop, so that capture is not stopped correctly. The sample assumes that the sink processes data fast enough for the inner loop to process all currently available data and break out to reach the Sleep call. That is, the WASAPI calls are all non-blocking, and on the assumption that these loops run quickly, the idea is that audio data is processed faster than it is captured and the loop spends most of the thread time in the Sleep call. Perhaps not the best sample code for beginners. You can improve this by checking bDone in the inner loop as well, to make it more reliable.
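The loop structure described above, including the suggested bDone check in the inner loop, can be illustrated with a small Python simulation. FakeCaptureClient is a hypothetical stand-in for the capture client and does not call WASAPI; only the control flow is the point here.

```python
from collections import deque


class FakeCaptureClient:
    """Hypothetical stand-in for a capture client; not a WASAPI binding."""

    def __init__(self):
        self._packets = deque()

    def produce(self, n_packets, frames_per_packet=480):
        """Simulate the device generating packets while the thread sleeps."""
        for _ in range(n_packets):
            self._packets.append(frames_per_packet)

    def get_next_packet_size(self):
        # Zero means "nothing ready yet", not "silence".
        return self._packets[0] if self._packets else 0

    def get_buffer(self):
        return self._packets.popleft()

    def pending(self):
        return len(self._packets)


def capture_loop(client, copy_data, wait_for_data):
    """Drain all ready packets after each wait; stop as soon as the sink is done."""
    done = False
    while not done:
        wait_for_data()  # the sample calls Sleep() at this point
        # Checking `done` here as well lets the sink stop capture mid-drain.
        while not done and client.get_next_packet_size() != 0:
            frames = client.get_buffer()
            done = copy_data(frames)
```

Here `copy_data` plays the role of the sink and returns True when capture should end, and `wait_for_data` stands in for the sleep between drains.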

Different amount of frames for same track

In the jukebox.c example of libspotify I count all frames of the current track in the music_delivery callback. When end_of_track is called, the frame count is different each time I play the same track. Also, end_of_track is called several seconds after the song is over, and this time span differs for each playback.
How can I determine whether the song is really over? Do I have to take the duration of the song in seconds and multiply it by the sample rate to work out when the song is over?
Why are more frames delivered than necessary for the track? And why is end_of_track not called at the real end of it? Or am I missing something?
end_of_track is called when libspotify has finished delivering audio frames for that track. This is not information about playback - every playback implementation I've seen keeps an internal buffer between libspotify and the sound driver.
Depending on where you're counting, this will account for the difference you're seeing. Since the audio code is outside of libspotify, you need to keep track of what's actually going to the sound driver yourself and stop playback, skip to the next track or whatever you need to do accordingly. end_of_track is basically there to let you know that you can close any output streams you may have from the delivery callback to your audio code or something along those lines.
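The bookkeeping described above, tracking what has actually gone to the sound driver separately from what has been delivered, could be sketched like this. The PlaybackTracker class and on_frames_played are hypothetical names for illustration; only music_delivery and end_of_track correspond to the callbacks mentioned in the question.

```python
class PlaybackTracker:
    """Tracks frames delivered by the library versus frames actually played."""

    def __init__(self):
        self.delivered = 0          # frames received in the delivery callback
        self.played = 0             # frames handed to the sound driver
        self.delivery_done = False  # set when end_of_track arrives

    def on_music_delivery(self, num_frames):
        self.delivered += num_frames

    def on_end_of_track(self):
        # Delivery is complete, but buffered audio may still be playing.
        self.delivery_done = True

    def on_frames_played(self, num_frames):
        # Called by your own audio code (hypothetical hook).
        self.played = min(self.played + num_frames, self.delivered)

    @property
    def buffered(self):
        return self.delivered - self.played

    @property
    def track_finished(self):
        return self.delivery_done and self.buffered == 0
```

With this split, the song is treated as really over only once end_of_track has arrived and the internal buffer has drained.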

Cocoalibspotify, how to trigger an action when playlist plays the next track

I have a playlist, and I want to sequentially play through the tracks, but every time a new track is loaded, I want to call a function. How would I go about listening for this event?
SPPlaybackManager, the playback class in CocoaLibSpotify, doesn't automatically play tracks sequentially, so you have to manually tell it to play each time. Since you're managing that, you already know when a new track is starting playback.
Additionally, SPPlaybackManagerDelegate has a method -playbackManagerWillStartPlayingAudio:, which will let you know when audio starts hitting the speakers.

realtime midi input and synchronisation with audio

I have built a standalone app version of a project that until now was just a VST/audiounit. I am providing audio support via rtaudio.
I would like to add MIDI support using rtmidi but it's not clear to me how to synchronise the audio and MIDI parts.
In VST/audiounit land, I am used to MIDI events that have a timestamp indicating their offset in samples from the start of the audio block.
rtmidi provides a delta time in seconds since the previous event, but I am not sure how I should grab those events and how I can work out their time in relation to the current sample in the audio thread.
How do plugin hosts do this?
I can understand how events can be sample accurate on playback, but it's not clear how they could be sample accurate when using realtime input.
rtaudio gives me a callback function. I will run at a low block size (32 samples). I guess I will pass a pointer to an rtmidi instance as the userdata part of the callback and then call midiin->getMessage( &message ); inside the audio callback, but I am not sure if this is thread-safe.
Many thanks for any tips you can give me
In your case, you don't need to worry about it. Your program should send the MIDI events to the plugin with a timestamp of zero as soon as they arrive. I think you have perhaps misunderstood the idea behind what it means to be "sample accurate".
As @Brad noted in his comment to your question, MIDI is indeed very slow. But that's only part of the problem... when you are working in a block-based environment, incoming MIDI events cannot be processed by the plugin until the start of a block. When computers were slower and block sizes of 512 (or god forbid, >1024) were common, this introduced a non-trivial amount of latency, which resulted in the arrangement not sounding as "tight". Therefore sequencers came up with a clever way to get around this problem. Since the MIDI events are already known ahead of time, these events can be sent to the instrument one block early with an offset in sample frames. The plugin then receives these events at the start of the block, and knows not to start actually processing them until N samples have passed. This is what "sample accurate" means in sequencers.
However, if you are dealing with live input from a keyboard or some sort of other MIDI device, there is no way to "schedule" these events. In fact, by the time you receive them, the clock is already ticking! Therefore these events should just be sent to the plugin at the start of the very next block with an offset of 0. Sequencers such as Ableton Live, which allow a plugin to simultaneously receive both pre-sequenced and live events, simply send any live events with an offset of 0 frames.
Since you are using a very small block size, the worst-case scenario is a latency of about 0.7 ms, which isn't too bad at all. In the case of rtmidi, the timestamp does not represent an offset which you need to schedule around, but rather the time at which the event was captured. But since you only intend to receive live events (you aren't writing a sequencer, are you?), you can simply pass any incoming MIDI to the plugin right away.
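The latency figure above follows from the block size divided by the sample rate. A quick check, assuming a sample rate of 44100 Hz (the original does not state it):

```python
def block_latency_ms(block_size, sample_rate):
    """Worst-case wait until the next block starts, in milliseconds."""
    return 1000.0 * block_size / sample_rate


SAMPLE_RATE = 44100  # assumed sample rate in Hz

latencies = {size: block_latency_ms(size, SAMPLE_RATE) for size in (32, 512, 1024)}
```

For 32 samples this gives roughly 0.73 ms, compared with about 11.6 ms for a block of 512 samples, which is why live events can simply be delivered with an offset of 0 at such a small block size.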
