Noise reduction for an audio buffer as it gets filled

I have a scenario where some audio is being received over the internet. The audio itself has a lot of noise, which needs to be filtered out. The received audio is raw 16-bit PCM.
Tools like Audacity can remove noise, but they first build a noise profile and then remove the noise from part of the file or from the whole file. I want instead to remove noise from the audio as it comes in and gets written to a buffer, so that once all the audio has been received and written to the buffer, noise reduction is already complete and the audio can be played. Each packet from the network carries around 1 KB of audio, and the total audio is around 1 MB.
The audio contains a conversation between two people, so I want to keep the audio within the voice range (80-255 Hz, as suggested in the comments).
I want to ask if anyone knows of any algorithm that can achieve this.
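One possible starting point, offered as a sketch rather than a full noise-reduction algorithm: a band-limiting filter that keeps its state between packets, so each 1 KB packet can be filtered as it arrives and appended to the buffer. The 80-255 Hz band is taken from the question; the 8000 Hz sample rate, the little-endian sample layout, the simple one-pole filters and the class name `StreamingBandPass` are all assumptions made for illustration. Profile-based noise removal like Audacity's would need more than this.

```python
import math
import struct


class StreamingBandPass:
    """One-pole high-pass followed by a one-pole low-pass.

    Filter state persists between calls, so filtering packet by packet
    gives the same result as filtering the whole recording at once.
    """

    def __init__(self, sample_rate=8000, low_hz=80.0, high_hz=255.0):
        dt = 1.0 / sample_rate                     # assumed sample rate
        rc_hp = 1.0 / (2 * math.pi * low_hz)
        rc_lp = 1.0 / (2 * math.pi * high_hz)
        self.a_hp = rc_hp / (rc_hp + dt)           # high-pass coefficient
        self.a_lp = dt / (rc_lp + dt)              # low-pass coefficient
        self.prev_in = 0.0
        self.prev_hp = 0.0
        self.prev_lp = 0.0
        self.pending = b""                         # odd byte left over from a packet

    def process(self, packet: bytes) -> bytes:
        """Filter one packet of little-endian 16-bit PCM and return filtered PCM."""
        data = self.pending + packet
        n = len(data) // 2
        self.pending = data[2 * n:]                # keep a split sample for next time
        samples = struct.unpack("<%dh" % n, data[:2 * n])
        out = []
        for x in samples:
            hp = self.a_hp * (self.prev_hp + x - self.prev_in)
            lp = self.prev_lp + self.a_lp * (hp - self.prev_lp)
            self.prev_in, self.prev_hp, self.prev_lp = x, hp, lp
            out.append(max(-32768, min(32767, int(round(lp)))))
        return struct.pack("<%dh" % n, *out)


# Usage: filter each packet as it arrives and append it to the playback buffer.
band_pass = StreamingBandPass()
playback_buffer = bytearray()
for packet in (struct.pack("<4h", 0, 1200, -800, 300), struct.pack("<4h", 50, -50, 0, 0)):
    playback_buffer += band_pass.process(packet)
```

Because the state is carried over, the packet size does not affect the result; only the filter design would need to change to get stronger noise suppression.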

Related

ffmpeg to auto lower/fade audio volume of one audio stream when microphone voice detected?

I want to do live audio translation through a microphone: take live streamed video/audio from Facebook, plug the mic into a laptop, and mix the existing audio stream with the translation coming from the mic. I have this part working, by using the audio filter "amix" to mix the two audio streams into one. Now I want to refine it. Is it possible, when voice is detected on the mic, to automatically fade the volume of the original audio stream down by 20% so that the translation (mic audio) is heard more clearly, and then, once the mic has been silent for, say, 3-5 seconds, to fade the original stream back up to normal volume? Is this too much? I can also experiment with sox or similar tools.

Realtime STFT and ISTFT in Julia for Audio Processing

I'm new to audio processing and dealing with data that's being streamed in real-time. What I want to do is:
listen to a built-in microphone
group samples into 0.1-second chunks
convert the chunk into a periodogram via the short-time Fourier transform (STFT)
apply some simple functions
convert back to time series data via the inverse STFT (ISTFT)
play back the new audio on headphones
I've been looking around for "real time spectrograms" to give me a guide on how to work with the data, but no dice. I have, however, discovered some interesting packages, including PortAudio.jl, DSP.jl and MusicProcessing.jl.
It feels like I'd need to make use of multiprocessing techniques to just store the incoming data into suitable chunks, whilst simultaneously applying some function to a previous chunk, whilst also playing another previously processed chunk. All of this feels overcomplicated, and has been putting me off from approaching this project for a while now.
Any help will be greatly appreciated, thanks.
As always, start with a simple version of what you really need ... for now, ignore pulling in audio from a microphone; instead write some code to synthesize a sine curve of a known frequency and use that as your input audio, or read in audio from a wav file - the benefit here is that it's known and reproducible, unlike microphone audio
this post shows how to use some of the libs you mention http://www.seaandsailor.com/audiosp_julia.html
You speak of "real time spectrogram" ... this is simply repeatedly processing a window of audio, so let's initially simplify that as well ... once you are able to read in the wav audio file, send it into an FFT call, which will return that audio curve in its frequency domain representation ... as you correctly state, this freq domain data can then be sent into an inverse FFT call to give you back the original time domain audio curve
After you get the above working, wrap it in a call which supplies a sliding window of audio samples, to give you the "real time" benefit of being able to parse incoming audio from your microphone ... keep in mind the window of samples you feed into your FFT and IFFT calls should normally contain a power of 2 number of audio samples, since many FFT implementations require that size or are fastest with it ... let's say your window is 16384 samples ... your Julia program will need to juggle multiple demands (1) pluck the next buffer of samples from your microphone feed (2) send a window of samples into your FFT and IFFT call ... be aware the number of audio samples in your sliding window will typically be wider than the size of your incoming microphone buffer - hence the notion of a sliding window ... over time add your mic buffer to the front of this window and remove the same number of samples from the tail end of this window
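The sliding-window idea can be sketched as follows. This is Python rather than Julia, purely to show the flow: each incoming buffer is pushed into a fixed-size window, the window goes through an FFT, and an inverse FFT recovers the time-domain samples. The window size (1024), the buffer size (256), the synthesized 440 Hz tone and the names `fft`, `ifft` and `on_mic_buffer` are assumptions for illustration. The small radix-2 FFT here is one of the implementations that does require a power-of-2 length; a real program would use a library FFT.

```python
import cmath
import math
from collections import deque


def fft(x, inverse=False):
    """Recursive radix-2 FFT. len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT: same butterfly with opposite sign, scaled by 1/n."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


WINDOW = 1024   # assumed window size (power of 2)
HOP = 256       # assumed size of each incoming microphone buffer

# A deque with maxlen drops the oldest samples as new ones are appended.
window = deque([0.0] * WINDOW, maxlen=WINDOW)


def on_mic_buffer(samples):
    """Slide the window forward by one buffer and round-trip it through the FFT."""
    window.extend(samples)
    spectrum = fft(list(window))
    # Any frequency-domain processing would be applied to `spectrum` here.
    return [v.real for v in ifft(spectrum)]


# Usage with a synthesized sine wave standing in for microphone input.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * i / rate) for i in range(HOP * 5)]
for start in range(0, len(tone), HOP):
    restored = on_mic_buffer(tone[start:start + HOP])
```

With no processing between the two transforms, `restored` matches the current window contents up to floating-point error, which is a useful check before adding any real processing.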

Recording composite video to an audio file

I'm trying to record a raw composite video signal to an audio file by connecting the yellow RCA cable from a player to the mic input on my PC. The plan is to then connect my audio output to the video input of an old CRT TV and play the recording back, so that I can view the original video.
But that didn't work and I can only see random white lines.
Is that due to frequency limits in the audio format or in the onboard audio chip, or is the analog-to-digital conversion (and the reverse on playback) damaging the signal?
Video signals operate in ranges above 1 MHz, whereas high-quality audio signals max out at ~96 kHz. Video signals would likely need to be encoded in a format that an audio recorder could pick up, then decoded back into a video signal before a television could render it properly. This answer on the Sound Design exchange may be of interest to you.
A very high bitrate uncompressed audio file may be able to store a low-fidelity video signal. A black and white signal could be stored at sub-VHS quality, but might still give a resolvable image. Recording component video may be possible, even though syncing the separate tracks would be hard.
I tried it.
The sampling rate was 192 kHz, so it can record frequencies up to 192/2 = 96 kHz.
I succeeded in capturing part of the luminance signal.
The color signal sits at a much higher frequency,
so it can't be recorded with a sound card.
The video is very distorted.
However, a sound card with a higher sampling rate might capture it more clearly.
https://m.youtube.com/watch?v=-Q_YraNAGhw&feature=youtu.be
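The 192/2 = 96 kHz arithmetic above can be checked against some nominal video figures. The thread does not say which video standard was used, so the NTSC line rate and color subcarrier below are assumptions brought in only for comparison.

```python
SAMPLE_RATE_HZ = 192_000             # sampling rate quoted in the answer above
nyquist_hz = SAMPLE_RATE_HZ / 2      # highest frequency that can be recorded

# Assumed nominal NTSC figures (not from the thread).
NTSC_LINE_RATE_HZ = 15_734           # horizontal scan lines per second
NTSC_COLOR_SUBCARRIER_HZ = 3_579_545


def fits_in_recording(freq_hz):
    """True if a component at freq_hz lies at or below the Nyquist limit."""
    return freq_hz <= nyquist_hz


# How many audio samples are available to describe one scan line.
samples_per_line = SAMPLE_RATE_HZ / NTSC_LINE_RATE_HZ

print("Nyquist limit (Hz):", nyquist_hz)
print("Line rate fits:", fits_in_recording(NTSC_LINE_RATE_HZ))
print("Color subcarrier fits:", fits_in_recording(NTSC_COLOR_SUBCARRIER_HZ))
print("Samples per scan line:", round(samples_per_line, 1))
```

Under these assumptions the line structure falls inside the recordable band while the color subcarrier is far outside it, and only about a dozen samples describe each scan line, which is consistent with a heavily distorted, colorless picture.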

A way to add data "mid stream" to encoded audio (possibly with AAC)

Is there a way to add lossless data to an AAC audio stream?
Essentially I am looking to be able to inject "this frame of audio should be played at XXX time" every n frames in.
If I use a lossless codec, I suppose I could just inject my own header mid-stream and that data would stay intact, since it has to come out the same as it went in, just as gzip does not lose data.
Any ideas? I suppose I could encode the data into chunks of AAC on the server and, on the network layer, add a timestamp saying "play the following chunk of AAC at time x", but I'd prefer to figure out a way to add it to the audio itself.
This is not really possible (short of writing your own specialized encoder), as AAC (and MP3) frames are not truly standalone.
There is a concept of the bit reservoir, where unused bandwidth from one frame can be utilized for a later frame that may need more bandwidth to store a more complicated sound. That is, data from frame 1 might be needed in frame 2 and/or 3. If you cut the stream between frames 1 and 2 and insert your alternative frames, the reference to the bit reservoir data is broken and you have damaged frame 2's ability to be decoded.
There are encoders that can work in a mode where the bit reservoir isn't used (at the cost of quality). If operating in this mode, you should be able to cut the stream more freely along frame boundaries.
Unfortunately, the best way to handle this is to do it in the time domain when dealing with your raw PCM samples. This gives you more control over the timing placement anyway, and ensures that your stream can also be used with other codecs.
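For comparison, the network-layer alternative mentioned in the question leaves the encoded audio untouched and wraps each chunk in a small header. The sketch below treats the AAC data as opaque bytes; the header layout (a 64-bit play-at time in microseconds plus a 32-bit payload length, in network byte order) and the function names are assumptions made for illustration, not part of any AAC or transport standard.

```python
import struct

# Assumed header: play-at time in microseconds (uint64) + payload length (uint32).
HEADER = struct.Struct("!QI")


def wrap_chunk(play_at_us: int, encoded_chunk: bytes) -> bytes:
    """Prefix one already-encoded audio chunk with its playback timestamp."""
    return HEADER.pack(play_at_us, len(encoded_chunk)) + encoded_chunk


def unwrap_chunks(stream: bytes):
    """Yield (play_at_us, chunk) pairs from a byte stream of wrapped chunks."""
    offset = 0
    while offset + HEADER.size <= len(stream):
        play_at_us, length = HEADER.unpack_from(stream, offset)
        payload = stream[offset + HEADER.size:offset + HEADER.size + length]
        if len(payload) < length:
            break                      # incomplete chunk, wait for more data
        yield play_at_us, payload
        offset += HEADER.size + length


# Usage: the payloads are placeholders standing in for encoded AAC data.
stream = wrap_chunk(1_000_000, b"chunk-one") + wrap_chunk(1_500_000, b"chunk-two")
received = list(unwrap_chunks(stream))
```

Because the codec never sees the header, this works the same way with any codec and sidesteps the bit reservoir issue entirely.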

How to play a song with Audio Queue API in order to find the Accurate timings of the beat of the song

I am working on a musical project in which I need accurate timing. I have already used NSTimer and NSDate, but I am getting a delay while playing the beats (i.e. tick-tock). So I have decided to use the Audio Queue API to play my sound file, a .wav file in the main bundle. I have been struggling with this for two weeks. Can anybody please help me with this problem?
Use uncompressed sounds and a single Audio Queue, continuously running. Don't stop it between beats. Instead mix in raw PCM samples of each Tock sound (etc.) starting some exact number of samples apart. Play silence (zeros) in between. That will produce sub-millisecond accurate timing.
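The sample-counting idea can be illustrated outside of the Audio Queue API. The sketch below is in Python and only shows how the PCM data would be laid out: each click starts at an exact sample offset and everything in between is zeros. The 44100 Hz sample rate, the tiny stand-in click and the function name `render_click_track` are assumptions; in a real app the resulting samples would be fed to the continuously running queue.

```python
import array

SAMPLE_RATE = 44_100   # assumed sample rate of the click sound and the output


def render_click_track(click, bpm, beats, sample_rate=SAMPLE_RATE):
    """Return 16-bit PCM with `click` mixed in at each exact beat position."""
    samples_per_beat = sample_rate * 60 / bpm
    total = round(samples_per_beat * beats)
    track = array.array("h", [0]) * total          # silence (zeros) everywhere
    for beat in range(beats):
        # Compute each start from the beat index so rounding never accumulates.
        start = round(beat * samples_per_beat)
        for i, sample in enumerate(click):
            if start + i >= total:
                break
            mixed = track[start + i] + sample
            track[start + i] = max(-32768, min(32767, mixed))   # clip to 16 bits
    return track


# Usage: a three-sample stand-in for a real click recording, 4 beats at 120 BPM.
click = array.array("h", [1000, -1000, 500])
track = render_click_track(click, bpm=120, beats=4)
```

At 120 BPM and 44100 Hz each beat starts exactly 22050 samples after the previous one, so the timing error is bounded by one sample period rather than by timer jitter.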

Resources