How to get the amplitude of an audio stream in an AudioGraph to build a SoundWave using Universal Windows?

I want to build a SoundWave by sampling an audio stream.
I read that a good method is to get the amplitude of the audio stream and represent it with a Polygon. But suppose we have an AudioGraph with just a DeviceInputNode and a FileOutputNode (a simple recorder).
How can I get the amplitude from a node of the AudioGraph?
What is the best way to make this sampling periodic? Is a DispatcherTimer good enough?
Any help will be appreciated.

First, everything you care about is kind of here:
uwp AudioGraph audio processing
But since you have a different starting point, I'll explain some more core things.
An AudioGraph node is already periodized for you; that is generally how audio works. I think Win10 defaults to quanta of 10 ms and/or 20 ms, but this can (theoretically) be set via the AudioGraphSettings.DesiredSamplesPerQuantum setting together with AudioGraphSettings.QuantumSizeSelectionMode = QuantumSizeSelectionMode.ClosestToDesired. I believe whether the request succeeds actually depends on your audio hardware and not the OS specifically; my PC can only do 480 and 960. This number is how many samples of the audio signal are accumulated per channel (mono is one channel, stereo is two channels, and so on), and as a by-product it also sets the callback timing.
Win10 and most devices default to a 48000 Hz sample rate, which means they measure/output data that many times per second. So with my quantum size of 480 samples per frame of audio, I am getting 48000/480 = 100 frames every second, which means I get a frame every 10 milliseconds by default. If you set your quantum to 960 samples per frame, you would get 50 frames every second, or a frame every 20 ms.
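As a quick back-of-the-envelope check, here is that arithmetic as a small Python sketch (480 and 48000 are just the values from my setup):
sample_rate = 48000               # samples per second, per channel
samples_per_quantum = 480         # e.g. 480 or 960 on my hardware
frames_per_second = sample_rate / samples_per_quantum             # 100.0
callback_period_ms = 1000.0 * samples_per_quantum / sample_rate   # 10.0 ms
print(frames_per_second, callback_period_ms)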
To get a callback into that frame of audio every quantum, you need to register an event into the AudioGraph.QuantumProcessed handler. You can directly reference the link above for how to do that.
So by default, a frame of data is stored in an array of 480 floats in the range [-1, +1]. And to get the amplitude, you just average the absolute values of this data.
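For illustration, here is a minimal Python sketch of that averaging step (the real handler would be C#; frame stands in for the 480 floats delivered in one quantum):
def frame_amplitude(frame):
    # frame: floats in [-1, +1] for one channel of one quantum
    return sum(abs(s) for s in frame) / len(frame)
quiet_frame = [0.01, -0.02] * 240        # a made-up 480-sample frame
print(frame_amplitude(quiet_frame))      # ~0.015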
This part, including handling multiple channels of audio, is explained more thoroughly in my other post.
Have fun!

Related

Sync music to frame-based time

I'm making a game in which there is a series of events (happening, say, every 30 frames at 60 fps) that I want to sync with the music (at 120 bpm). In the usual cases, e.g. rhythm games, syncing the events to the music is easier, because humans seem to perceive much smaller gaps in music than in video. However, in my case, the game heavily depends on frame-based time, and a lot of things will break if I change the schedule of my series of events.
After a lot of experiments, it seems to me almost impossible to tweak the music without disturbing the human ear: a jump of ~1 ms is noticeable, a ~10 ms discrepancy between video and audio is noticeable, and a 0.5% change in pitch is noticeable. And I don't have handy tools to speed up audio without changing the pitch.
What is the easiest way out in this circumstance? Is there any reference on this subject that I can refer to? Any advice is appreciated!
The method that I successfully use (in Java) is to route the playback signal through a path that allows counting PCM frames (audio frames run at rates like 44100 fps, as opposed to screen updates, which run at rates like 60 fps). I don't know about other languages, but with Java this can be done by outputting via a SourceDataLine. As the audio frame count is incremented, it can be compared to the next (pending) item in a collection of events that require triggers to other systems or threads. Java has an excellent class for handling that collection of events: ConcurrentSkipListSet. It is asynchronous and automatically sorts elements via a Comparator set to the desired PCM frame count.
Some example code showing the counting of frames can be seen in the tutorial Using Files and Format Converters; search the page for the phrase "Here, do something useful with the audio data". They are counting bytes, not PCM frames, but the example gives the basic idea.
Why is counting PCM frames effective? I think this has to do with the fact that this code (in Java) is the closest we get to the point where audio data is fed to the native code controlling the sound system, and that this code employs a blocking queue. Thus, the write operations only happen when the audio system is ready to receive and play back more sound data, and audio systems have to be very accurate in how they maintain their rate of processing. The amount of timing variance that occurs here (especially if the thread is given a high priority) is smaller than the timing variance incurred by the choices the JVM makes as it juggles multiple threads and processes.
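The answer's actual code is Java (SourceDataLine plus a ConcurrentSkipListSet of pending events), but the scheduling pattern itself is language-neutral. Here is a rough Python sketch of the same idea; get_next_audio_buffer, write_to_audio_device and trigger_game_event are made-up stand-ins for the real playback path:
import heapq
def get_next_audio_buffer(n):
    # stand-in for reading n PCM frames from your decoder or file
    return [0.0] * n
def write_to_audio_device(buffer):
    # stand-in for the blocking write (SourceDataLine.write in the Java version)
    pass
def trigger_game_event(label):
    print("event:", label)
BUFFER_FRAMES = 1024
# (trigger_frame, label) pairs kept ordered by trigger frame;
# this plays the role of the ConcurrentSkipListSet in the Java answer
pending = [(0, "start"), (22050, "half second"), (44100, "one second")]
heapq.heapify(pending)
frames_written = 0
while pending:
    write_to_audio_device(get_next_audio_buffer(BUFFER_FRAMES))
    frames_written += BUFFER_FRAMES
    # fire every event whose trigger frame has now been played
    while pending and pending[0][0] <= frames_written:
        _, label = heapq.heappop(pending)
        trigger_game_event(label)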

How does chrome://webrtc-internal compute keyFramesDecoded?

I have been analyzing the JSON file generated using chrome://webrtc-internal while running WebRTC on 2 PCs.
I looked at the Stats API to verify how webrtc-internal computes the keyframe rate.
Looking at the Stats API / RTC Remote Inbound RTP Video Stream, it contains keyFramesDecoded, which represents the total number of key frames (such as VP8 key frames, given that I set the codec to VP8).
The keyFramesDecoded values are very small, e.g. 2 for a couple of minutes, then 3, and so on.
My question is: How does the graph here make sense for keyFramesDecoded?
That looks right to me.
Chrome is configured to send a keyframe every 3000 frames. That means at 30 fps you will see a keyframe every 100 seconds. framesDecoded is made up mostly of delta frames.
If you are on an unconstrained network and not dealing with lots of change in your video, I would expect to see graphs like yours.
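A quick sanity check of that arithmetic in Python (the 3000-frame keyframe interval and 30 fps are taken from the answer above; the five-minute duration is just an example):
fps = 30
keyframe_interval_frames = 3000              # Chrome's default, per the answer
duration_s = 300                             # five minutes of decoding
frames_decoded = fps * duration_s                                # 9000
key_frames_decoded = frames_decoded // keyframe_interval_frames  # 3
print(frames_decoded, key_frames_decoded)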

Realtime STFT and ISTFT in Julia for Audio Processing

I'm new to audio processing and dealing with data that's being streamed in real-time. What I want to do is:
listen to a built-in microphone
chunk together samples into 0.1-second chunks
convert the chunk into a periodogram via the short-time Fourier transform (STFT)
apply some simple functions
convert back to time series data via the inverse STFT (ISTFT)
play back the new audio on headphones
I've been looking around for "real time spectrograms" to give me a guide on how to work with the data, but no dice. I have, however, discovered some interesting packages, including PortAudio.jl, DSP.jl and MusicProcessing.jl.
It feels like I'd need to make use of multiprocessing techniques to just store the incoming data into suitable chunks, whilst simultaneously applying some function to a previous chunk, whilst also playing another previously processed chunk. All of this feels overcomplicated, and has been putting me off from approaching this project for a while now.
Any help will be greatly appreciated, thanks.
As always, start with a simple version of what you really need. Ignore pulling in audio from a microphone for now; instead, write some code to synthesize a sine wave of a known frequency and use that as your input audio, or read in audio from a WAV file. The benefit here is that the input is known and reproducible, unlike microphone audio.
This post shows how to use some of the libraries you mention: http://www.seaandsailor.com/audiosp_julia.html
You speak of a "real time spectrogram". This is simply repeatedly processing a window of audio, so let's initially simplify that as well. Once you are able to read in the WAV audio file, send it into an FFT call, which returns that audio curve in its frequency-domain representation. As you correctly state, this frequency-domain data can then be sent into an inverse FFT call to give you back the original time-domain audio curve.
After you get the above working, wrap it in a call that supplies a sliding window of audio samples; that gives you the "real time" benefit of being able to parse incoming audio from your microphone. Keep in mind that you always use a power-of-two number of audio samples in the window you feed into your FFT and IFFT calls; let's say your window is 16384 samples. Your Julia process will need to juggle multiple demands: (1) pluck the next buffer of samples from your microphone feed, and (2) send a window of samples into your FFT and IFFT calls. Be aware that the number of audio samples in your sliding window will typically be wider than your incoming microphone buffer, hence the notion of a sliding window: over time, add each mic buffer to the front of the window and remove the same number of samples from its tail end.
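The question is about Julia, but the offline FFT/IFFT round trip described above is easy to sketch; here is a hedged Python/numpy version (the same structure carries over to Julia's FFT packages), using a synthesized 440 Hz sine instead of microphone input:
import numpy as np
SAMPLE_RATE = 44100
WINDOW = 16384                                # power-of-two window, as suggested above
t = np.arange(WINDOW) / SAMPLE_RATE
audio = np.sin(2 * np.pi * 440.0 * t)         # known, reproducible test signal
spectrum = np.fft.rfft(audio)                 # time domain -> frequency domain
# ... apply your "simple functions" to spectrum here ...
restored = np.fft.irfft(spectrum, n=WINDOW)   # frequency domain -> time domain
print(np.max(np.abs(restored - audio)))       # round-trip error, ~1e-15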

Can a DSP combine two 48 kHz audio streams to create a 96 kHz output?

Wireless connections like Bluetooth are limited by transmission bandwidth, resulting in a limited bitrate and audio sampling frequency.
Can a high-definition audio output like 24-bit/96 kHz be created by combining two separate audio streams of 24-bit/48 kHz each, transmitted from a source to receiver speakers/earphones?
I tried to understand how a DSP (digital signal processor) works, but I am unable to find the exact technical terms that describe this kind of audio splitting and re-combining technique for increasing audio resolution.
No, you would have to upsample the two original audio streams to 96 kHz. Combining two audio streams will not increase audio resolution; all you're really doing is summing two streams together.
You'll probably want to read this free DSP resource for more information.
Here is a simple construction which could be used to create two audio streams at 24bit/48kHz from a higher resolution 24bit/96kHz stream, which could later be recombined to recreate a single audio stream at 24bit/96kHz.
Starting with an initial high resolution source at 24bit/96kHz {x[0],x[1],x[2],...}:
Take every even sample of the source (i.e. {x[0],x[2],x[4],...} ), and send it over your first 24bit/48kHz channel (i.e. producing the stream y1 such that y1[0]=x[0], y1[1]=x[2], ...).
At the same time, take every odd sample {x[1],x[3],x[5],...} of the source, and send it over your second 24bit/48kHz channel (i.e. producing the stream y2 such that y2[0]=x[1], y2[1]=x[3], ...).
At the receiving end, you should then be able to reconstruct the original 24bit/96kHz audio signal by interleaving the samples from your first and second channel. In other words you would be recreating an output stream out with:
out[0] = y1[0]; // ==x[0]
out[1] = y2[0]; // ==x[1]
out[2] = y1[1]; // ==x[2]
out[3] = y2[1]; // ==x[3]
out[4] = y1[2]; // ==x[4]
out[5] = y2[2]; // ==x[5]
...
That said, transmitting those two streams of 24bit/48kHz would require an effective bandwidth of 2 × 24 bits × 48000 samples/s = 2304 kbps, which is exactly the same as transmitting one stream of 24bit/96kHz. So, while this allows you to fit the audio stream into channels of fixed bandwidth, you are not reducing the total bandwidth requirement this way.
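A small Python sketch of the split-and-recombine construction above, together with the bandwidth arithmetic (the sample values are made up for illustration):
x = list(range(12))      # stand-in for 24-bit/96 kHz samples x[0..11]
y1 = x[0::2]             # even samples -> first 24-bit/48 kHz channel
y2 = x[1::2]             # odd samples  -> second 24-bit/48 kHz channel
out = [s for pair in zip(y1, y2) for s in pair]   # interleave at the receiver
assert out == x          # the original 96 kHz stream is recovered exactly
print(2 * 24 * 48000)    # 2304000 bps for the two 48 kHz streams
print(1 * 24 * 96000)    # 2304000 bps for the single 96 kHz stream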
Could you please provide your definition of "combining"? Based on the data rates, it seems like you want to multiplex (combine two mono channels into one stereo channel). If the desire is to "add" two channels together (two monos into a single mono, or two stereo channels into one stereo), then you should not have to increase your sampling rate: you are adding two band-limited signals, so increasing the sampling rate is not necessary.

gnuradio phase drift of AM demodulation

I am beginning a project using GNUradio and an inexpensive SDR.
http://www.amazon.com/gp/product/B00SXZDUAQ?psc=1&redirect=true&ref_=oh_aui_search_detailpage
One portion of the project requires me to generate a reference audio tone and compare the phase of that tone to demodulated audio.
To simulate this portion of the system, I have generated a simple GNU Radio flowgraph:
I had some issues with the source and the demodulated audio in that they would drift relative to each other. This showed up on the scope sink in the original flowgraph. To aid in troubleshooting, I sent the demodulated audio out through the sound card's second channel and monitored both audio streams, in addition to the modulated RF, on an external oscilloscope:
Initially all seems well, but the demodulated audio drifts in relation to the original source and RF:
My question is: am I doing something wrong in the flowgraph or am I expecting too much performance out of an inexpensive SDR?
Thanks in advance for any insights
You cannot expect to see zero phase drift in anything short of a fully digital simulation, or a fully analog circuit with exactly one oscillator, because no two (physical) oscillators have identical frequencies.
In your case, there are two relevant oscillators involved:
The sample clock in the RTL-SDR unit.
The sample clock in your sound card output.
Within a GNU Radio flowgraph, there is no time reference per se; everything depends on the sources and sinks that are connected to hardware.
The relevant source in your flowgraph is the RTL-SDR hardware; insofar as its oscillator is different from its nominal value (28.8 MHz, as it happens), everything it produces will be off-frequency in an absolute sense (both RF carrier frequencies and audio frequencies of demodulated output).
But you don't actually have an absolute frequency reference; you have the tone produced by your sound card. The sound card has its own oscillator, which determines the rate at which samples are converted to analog signals, and therefore the rate at which samples are consumed from the flowgraph.
Therefore, your reference signal will drift relative to your received and demodulated signal, at a rate determined by the difference in frequency error between the two oscillators.
Additionally, since your sound card will be accepting samples from the flowgraph at a slightly different real-time rate than the RTL-SDR is producing them, you will notice periodic glitches in the audio as the error accumulates and must be dealt with. They will start occurring either immediately (if the source is slower than the sink, requiring the sound card to play silence instead) or after a delay while buffers fill to their maximum size (if the source is faster than the sink, requiring the RTL-SDR to drop some samples).
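To get a feel for the magnitude, here is a rough Python calculation with assumed (made-up, but typical-order) oscillator tolerances:
ppm_difference = 50                       # assumed net error between the two clocks, in ppm
drift_per_hour_s = 3600 * ppm_difference * 1e-6
print(drift_per_hour_s)                   # 0.18 s of relative drift per hour
print(int(48000 * drift_per_hour_s))      # 8640 samples per hour to drop or pad at 48 kHz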
