I want to mix about 20 audio streams with ffmpeg's amix filter. However, as described here, amix has a weird way of making the input streams quieter the more of them you mix together:
"amix scales each input's volume by 1/n where n = no. of active inputs. This is evaluated for each audio frame. So when an input drops out, the volume of the remaining inputs is scaled by a smaller amount, hence their volumes increase"
How can I get rid of this annoying behaviour?
I just want the audio streams to keep the same loudness, since only one of them has actual audio in it at any given time anyway.
At the moment I end up with a file that is about 1/20 the loudness of the original, making it effectively unusable.
Adjust the volume of each input stream by multiplying it by n (20 in your case) using the volume filter:
https://ffmpeg.org/ffmpeg-filters.html#volume
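For example, here is a minimal sketch of that approach, driving ffmpeg from Python with hypothetical filenames and only three inputs for brevity (the same pattern extends to your 20 streams):

import subprocess

# Hypothetical input files; in your case there would be 20 of them.
inputs = ["in1.wav", "in2.wav", "in3.wav"]
n = len(inputs)

# Boost each input by a factor of n so that amix's 1/n scaling cancels out.
boosted = [f"[{i}:a]volume={n}[a{i}]" for i in range(n)]
mix = "".join(f"[a{i}]" for i in range(n)) + f"amix=inputs={n}:duration=longest[out]"
filtergraph = ";".join(boosted + [mix])

cmd = ["ffmpeg"]
for path in inputs:
    cmd += ["-i", path]
cmd += ["-filter_complex", filtergraph, "-map", "[out]", "mixed.wav"]

subprocess.run(cmd, check=True)

A single volume filter placed after amix gives the same overall gain with a shorter filtergraph, and newer ffmpeg builds also expose a normalize option on amix that disables the 1/n scaling outright, if your version has it.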
I'm new to audio processing and dealing with data that's being streamed in real-time. What I want to do is:
listen to a built-in microphone
chunk together samples into 0.1-second chunks
convert the chunk into a periodogram via the short-time Fourier transform (STFT)
apply some simple functions
convert back to time series data via the inverse STFT (ISTFT)
play back the new audio on headphones
I've been looking around for "real time spectrograms" to give me a guide on how to work with the data, but no dice. I have, however, discovered some interesting packages, including PortAudio.jl, DSP.jl and MusicProcessing.jl.
It feels like I'd need to use multiprocessing techniques just to store the incoming data into suitable chunks, while simultaneously applying some function to a previous chunk and playing back another, already-processed chunk. All of this feels overcomplicated and has been putting me off this project for a while now.
Any help will be greatly appreciated, thanks.
As always, start with a simple version of what you really need ... ignore pulling in audio from a microphone for now; instead, write some code to synthesize a sine wave of a known frequency and use that as your input audio, or read in audio from a WAV file - the benefit here is that it's known and reproducible, unlike microphone audio
This post shows how to use some of the libraries you mention: http://www.seaandsailor.com/audiosp_julia.html
You speak of a "real time spectrogram" ... this is simply repeatedly processing a window of audio, so let's initially simplify that as well ... once you are able to read in the WAV audio file, send it into an FFT call, which will give you back that audio curve in its frequency-domain representation ... as you correctly state, this frequency-domain data can then be sent into an inverse FFT call to give you back the original time-domain audio curve
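Here is a minimal sketch of that round trip, written in Python/NumPy purely for illustration - the same steps map directly onto the FFTW.jl / DSP.jl calls in Julia:

import numpy as np

fs = 48_000                          # sample rate in Hz (assumed)
f0 = 440.0                           # known test-tone frequency
t = np.arange(fs) / fs               # one second of sample times
x = np.sin(2 * np.pi * f0 * t)       # synthesized sine: a known, reproducible input

X = np.fft.rfft(x)                   # time domain -> frequency domain
# ... inspect or modify X here (this is where your "simple functions" go) ...
y = np.fft.irfft(X, n=len(x))        # frequency domain -> back to time domain

print(np.max(np.abs(x - y)))         # round-trip error, on the order of 1e-15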
After you get the above working, wrap it in a call which supplies a sliding window of audio samples, to give you the "real time" benefit of being able to parse incoming audio from your microphone ... keep in mind you typically want a power-of-two number of audio samples in the window you feed into your FFT and IFFT calls ... let's say your window is 16384 samples ... your Julia server will need to juggle multiple demands: (1) pluck the next buffer of samples from your microphone feed, (2) send a window of samples into your FFT and IFFT calls ... be aware the number of audio samples in your sliding window will typically be wider than the size of your incoming microphone buffer - hence the notion of a sliding window ... over time, add each mic buffer to the front of this window and remove the same number of samples from the tail end of the window
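A sketch of that juggling, again in Python-style code with an assumed 480-sample microphone buffer - the real version would pull buffers from PortAudio.jl and add proper overlap-add windowing:

import numpy as np

WINDOW = 16_384                  # power-of-two analysis window, as above
HOP = 480                        # samples per incoming microphone buffer (assumed)

window = np.zeros(WINDOW)        # sliding window holding the most recent samples

def on_mic_buffer(buf):
    # Called for every new microphone buffer of HOP samples.
    global window
    assert len(buf) == HOP
    # Slide: append the newest buffer and drop the same number of oldest samples.
    window = np.concatenate((window[len(buf):], buf))

    spectrum = np.fft.rfft(window)                 # analysis (FFT)
    # ... apply your per-frequency processing to spectrum here ...
    processed = np.fft.irfft(spectrum, n=WINDOW)   # synthesis (inverse FFT)

    # Hand back only the newest HOP samples for playback (overlap handling omitted).
    return processed[-len(buf):]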
Wireless connections like Bluetooth are limited by transmission bandwidth, which results in a limited bitrate and audio sampling frequency.
Can a high-definition audio output like 24bit/96kHz be created by combining two separate audio streams of 24bit/48kHz each, transmitted from a source to receiver speakers/earphones?
I tried to understand how a DSP (digital signal processor) works, but I am unable to find the exact technical terms that describe this kind of audio splitting and recombining technique for increasing the audio resolution.
No, you would have to upsample the two original audio streams to 96 kHz. Combining two audio streams will not increase audio resolution; all you're really doing is summing two streams together.
You'll probably want to read this free DSP resource for more information.
Here is a simple construction which could be used to create two audio streams at 24bit/48kHz from a higher resolution 24bit/96kHz stream, which could later be recombined to recreate a single audio stream at 24bit/96kHz.
Starting with an initial high resolution source at 24bit/96kHz {x[0],x[1],x[2],...}:
Take every even sample of the source (i.e. {x[0],x[2],x[4],...} ), and send it over your first 24bit/48kHz channel (i.e. producing the stream y1 such that y1[0]=x[0], y1[1]=x[2], ...).
At the same time, take every odd sample {x[1],x[3],x[5],...} of the source, and send it over your second 24bit/48kHz channel (i.e. producing the stream y2 such that y2[0]=x[1], y2[1]=x[3], ...).
At the receiving end, you should then be able to reconstruct the original 24bit/96kHz audio signal by interleaving the samples from your first and second channel. In other words you would be recreating an output stream out with:
out[0] = y1[0]; // ==x[0]
out[1] = y2[0]; // ==x[1]
out[2] = y1[1]; // ==x[2]
out[3] = y2[1]; // ==x[3]
out[4] = y1[2]; // ==x[4]
out[5] = y2[2]; // ==x[5]
...
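In code, the whole split-and-recombine is just index interleaving; here is a toy sketch in Python, with integer sample values standing in for real audio:

import numpy as np

x = np.arange(16)        # stand-in for the 24bit/96kHz source samples

# Sender: even-indexed samples go to channel 1, odd-indexed to channel 2 (each at 48kHz).
y1 = x[0::2]
y2 = x[1::2]

# Receiver: interleave the two 48kHz streams back into one 96kHz stream.
out = np.empty_like(x)
out[0::2] = y1
out[1::2] = y2

assert np.array_equal(out, x)    # exact reconstruction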
That said, transmitting those two streams of 24bit/48kHz would require an effective bandwidth of 2 × 24 bits × 48000 Hz = 2304 kbps, which is exactly the same as transmitting one stream of 24bit/96kHz. So, while this allows you to fit the audio stream into channels of fixed bandwidth, you are not reducing the total bandwidth requirement this way.
Could you please provide your definition of "combining"? Based on the data rates, it seems like you want to do a multiplex (combining two mono channels into a stereo channel). If the desire is to "add" two channels together (two monos into a single mono, or two stereo channels into one stereo), then you should not have to increase your sampling rate (you are adding two band-limited signals, so increasing the sampling rate is not necessary).
I want to build a SoundWave by sampling an audio stream.
I read that a good method is to get the amplitude of the audio stream and represent it with a Polygon. But suppose we have an AudioGraph with just a DeviceInputNode and a FileOutputNode (a simple recorder).
How can I get the amplitude from a node of the AudioGraph?
What is the best way to periodize this sampling? Is a DispatcherTimer good enough?
Any help will be appreciated.
First, everything you care about is kind of here:
uwp AudioGraph audio processing
But since you have a different starting point, I'll explain some more core things.
An AudioGraph node is already periodized for you -- it's generally how audio works. I think Win10 defaults to periods of 10 ms and/or 20 ms, but this can (theoretically) be set via the AudioGraphSettings.DesiredSamplesPerQuantum setting, together with AudioGraphSettings.QuantumSizeSelectionMode = QuantumSizeSelectionMode.ClosestToDesired; I believe the success of this actually depends on your audio hardware rather than the OS. My PC can only do 480 and 960. This number is how many samples of the audio signal to accumulate per channel (mono is one channel, stereo is two, etc.), and it also sets the callback timing as a by-product.
Win10 and most devices default to a 48000 Hz sample rate, which means they are measuring/outputting data that many times per second. So with my QuantumSize of 480 samples per frame of audio, I am getting 48000/480 = 100 frames every second, which means I get a frame every 10 milliseconds by default. If you set your quantum to 960 samples per frame, you would get 50 frames every second, or a frame every 20 ms.
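Spelling out that arithmetic (assuming the 48000 Hz default):

SAMPLE_RATE = 48_000                              # Hz

for quantum in (480, 960):
    frames_per_second = SAMPLE_RATE / quantum     # 100 or 50
    ms_per_frame = 1000 * quantum / SAMPLE_RATE   # 10 ms or 20 ms
    print(quantum, frames_per_second, ms_per_frame)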
To get a callback into that frame of audio every quantum, you need to register an event into the AudioGraph.QuantumProcessed handler. You can directly reference the link above for how to do that.
So by default, a frame of data is stored in an array of 480 floats in [-1, +1]. To get the amplitude, you just average the absolute values of this data.
This part, including handling multiple channels of audio, is explained more thoroughly in my other post.
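The averaging itself is only a couple of lines; here is a sketch in Python, assuming one mono frame of 480 floats (the UWP plumbing is in the linked post):

def frame_amplitude(frame):
    # Mean absolute value of one quantum of mono samples in [-1, +1].
    return sum(abs(s) for s in frame) / len(frame)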
Have fun!
Is there a way to add lossless data to an AAC audio stream?
Essentially, I am looking to be able to inject a "this frame of audio should be played at time XXX" marker every n frames.
If I used a lossless codec, I suppose I could just inject my own header mid-stream and that data would remain intact, since it has to come out the same on the other end, just like gzip does not lose data.
Any ideas? I suppose I could encode the data into chunks of AAC on the server and, at the network layer, add a timestamp saying "play the following chunk of AAC at time x", but I'd prefer to find a way to add it to the audio itself.
This is not really possible (short of writing your own specialized encoder), as AAC (and MP3) frames are not truly standalone.
There is a concept of the bit reservoir, where unused bandwidth from one frame can be utilized for a later frame that may need more bandwidth to store a more complicated sound. That is, data from frame 1 might be needed in frame 2 and/or 3. If you cut the stream between frames 1 and 2 and insert your alternative frames, the reference to the bit reservoir data is broken and you have damaged frame 2's ability to be decoded.
There are encoders that can work in a mode where the bit reservoir isn't used (at the cost of quality). If operating in this mode, you should be able to cut the stream more freely along frame boundaries.
Unfortunately, the best way to handle this is to do it in the time domain when dealing with your raw PCM samples. This gives you more control over the timing placement anyway, and ensures that your stream can also be used with other codecs.
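For instance, a "play this chunk at time X" instruction becomes nothing more than a sample offset once you are working with raw PCM; a sketch, assuming a 48 kHz stream:

SAMPLE_RATE = 48_000              # Hz, assumed

def sample_offset(play_at_seconds):
    # Map "play this frame at time X" onto an absolute PCM sample index.
    return round(play_at_seconds * SAMPLE_RATE)

offset = sample_offset(2.5)       # schedule a chunk 2.5 s into the stream -> sample 120000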
While capturing from some audio and video sources and encoding to an AVI container, I set audio as the master stream to synchronize audio and video, and this gave the best synchronization result.
http://msdn.microsoft.com/en-us/library/windows/desktop/dd312034(v=vs.85).aspx
But this method produces a higher FPS value as a result: about 40 or 50 instead of 30 FPS.
If this media file is simply played back, everything is OK, but if I try to recode it with different software to another video format, it ends up out of sync.
How can I programmatically set dwScale and dwRate values in the AVISTREAMHEADER structure at AVI muxing?
How can I programmatically set dwScale and dwRate values in the AVISTREAMHEADER structure at AVI muxing?
MSDN:
This method works by adjusting the dwScale and dwRate values in the AVISTREAMHEADER structure.
You requested that the multiplexer manage the scale/rate values, so you cannot adjust them yourself. You should be seeing more odd things in your file, not just a higher FPS. The file itself is probably out of sync, and as soon as you process it with other applications that don't do playback fine-tuning, you start seeing issues. You might have the video media type reporting one frame rate while the effective rate is different.
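For reference, the two fields just form a rational frame rate for a video stream (frames per second = dwRate / dwScale); the values below are only illustrative:

dwRate, dwScale = 30000, 1001     # NTSC-style stream
fps = dwRate / dwScale            # ~29.97 frames per second

dwRate, dwScale = 30, 1           # an exact 30 fps stream
fps = dwRate / dwScale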