I am trying to create an audio occlusion system; it kind of works, but when I try to set the LPF frequency, it doesn't change (UE4)

When I print out the Low Pass Filter value, it's fine; however, when I get the LPF frequency from the audio component reference, it doesn't change. Any ideas or suggestions?
Here is a link to my code: https://blueprintue.com/blueprint/x0hnwumi/

Related

Scaling an image according to audio (threshold, frequencies)

I am looking to scale a PNG image according to a provided audio file, a frequency range (20 Hz-1000 Hz, for example) and a threshold, for a smooth effect.
For example, when there is a kick, the scale goes to 120% smoothly. I would like to make the kind of audio visualizers used for dubstep and similar genres, where the image "pumps" when a kick comes in.
First, is it doable with ffmpeg?
Where to start?
I found showcqt, which takes frequencies as input, but its output is a video, so I don't think I can use it in my case. Any help appreciated.
If you are able to read the PCM values as they are being output, then you might consider using a rolling RMS average in order to get a continuous stream of amplitudes. IDK the best length of the array. Perhaps it should correspond to the number of audio frames that would give you an update for each visual frame? The folks at the DSP site would have the best insights.
If you do a rolling average, the computations are not terribly expensive. You square the incoming sample and add it to a ring buffer (circular queue), dropping the outgoing value. Only those two data points need to be applied when computing the new rolling average, since the denominator is fixed and known. I found a video that describes the basic RMS math using Matlab.
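Here is a minimal Python sketch of that ring-buffer RMS idea, assuming the incoming PCM samples are floats in [-1, 1]; the window length is an arbitrary choice for illustration.

import numpy as np
from collections import deque

class RollingRMS:
    # Running RMS over the last `window` samples, kept as a ring buffer of squares.
    def __init__(self, window=2048):
        self.window = window
        self.squares = deque(maxlen=window)
        self.sum_of_squares = 0.0

    def push(self, sample):
        # Subtract the outgoing square before the deque evicts it, then add the new one.
        if len(self.squares) == self.window:
            self.sum_of_squares -= self.squares[0]
        sq = float(sample) ** 2
        self.squares.append(sq)
        self.sum_of_squares += sq
        return (self.sum_of_squares / len(self.squares)) ** 0.5

# Example: one amplitude value per incoming sample (decimate to your visual frame rate).
rms = RollingRMS(window=2048)
amplitudes = [rms.push(s) for s in np.random.uniform(-1.0, 1.0, 48000)]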
It might be necessary to add some smoothing to the visualizer that is receiving the volume updates. Also, handing off data from the audio thread should likely employ some form of loose coupling; it would not be good if the thread that is processing the audio was also handling graphics.
I'm a little over my head, but I think this is what is generally done for visualizers.

Meaning of sample values in a wav file

For a school project, I am supposed to analyze a short sound recording in WAV format. I am done with the project: I DFT'd it, filtered out unwanted frequencies, and got the correct result. What eludes me, though, is the meaning of the values of the individual samples of my WAV file. I have tens of thousands of samples that look like this:
[ 0.06234258 0.16020246 0.14122963 ... -0.01704375 -0.08993937 -0.09293508]
However, no matter how much I multiply these values by a number, the resulting sound sounds the same. If I multiply every sample by 1000, it sounds just as it sounded before. The same goes for dividing. So what do these samples mean, if not volume?
EDIT:
Here is the code I'm using:
import soundfile as sf
from IPython.display import display, Audio

samples, sampling_freq = sf.read('recording.wav')
display(Audio(samples, rate=sampling_freq))  # This displays a playable audio bar.
The samples (basically a long array of floating-point numbers) in the file are the Pulse Code Modulated (PCM) data representing the audio.
Given that audio players use this data to recreate the original audio wave, multiplying every sample by some factor should increase the volume. However, some audio players scale the samples back down (re-normalize them) to prevent clipping, which can be why it sounds the same.
The ideal way to visualize the audio is with Audacity, which can show the audio waveform in real time.
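If you want to convince yourself in code that the samples do encode amplitude, here is a small sketch using soundfile with the same recording.wav as above. IPython.display.Audio re-normalizes the data by default, which is why the scaling appeared to change nothing, while a regular media player will play the written files at genuinely different volumes (values outside [-1.0, 1.0] will clip).

import soundfile as sf

samples, sampling_freq = sf.read('recording.wav')
quiet = samples * 0.1   # noticeably quieter
loud = samples * 3.0    # louder, but anything beyond +/-1.0 clips on write
sf.write('quiet.wav', quiet, sampling_freq)
sf.write('loud.wav', loud, sampling_freq)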

I need to analyse many audio WAV files for characteristic noise, ideas?

I need to be able to analyze (search through) hundreds of WAV files and detect, but not remove, static noise. As it is done now, I must listen to each conversation and find the characteristic noise/static manually, which takes too much time. Ideally, I would need a program that can read each new WAV file and detect characteristic signatures of the static noise, such as periods of bursts of white noise, full-audio-band high-amplitude noise (like AM radio noise over a phone conversation, a wall of white noise), or bursts of peak high-frequency, high-amplitude noise (as in crackling on the phone line) against a background of normal voice. I do not need to remove the noise but simply detect it and flag the recording for further troubleshooting. Ideas?
I can listen to the recordings and find the static or crackling, but this takes time. I need an automated or batch process that can run on its own and flag the troubled call recordings (WAV files from a phone PBX). These are SIP and analog conversations, depending on the leg of the call, so RTSP/SIP packet analysis might be an option, but the raw WAV file is the simplest. I can use Audacity, but that still requires opening each file and looking at the visual representation of the spectrogram; it is only a little faster than listening to each call and still cumbersome.
I currently have no code or methods for this task. I simply listen to each call's WAV file to find the noise.
I need a batch WAV file search that can flag the recordings that contain the characteristic noise, static or crackling over the recorded phone conversation.
Unless you can tell the program what the noise looks like, it's going to be challenging to run any sort of batch processing. I was facing a similar challenge, and that prompted me to develop free and open-source software to help users with audio exploration, analysis and signal separation:
App: https://audioexplorer.online/
Docs: https://tracek.github.io/audio-explorer/
Source code: https://github.com/tracek/audio-explorer
Essentially, it visualises audio as a 2D scatter plot rather than only "linearly", as in a waveform or spectrogram. When you upload audio, the following happens:
Onsets are detected (based on the high-frequency content algorithm from aubio) according to the threshold you set. Set it to None if you want all of them.
For each audio fragment, audio features are calculated based on your selection. There's no universal best set of features; it all depends on the application. For a start you might try e.g. pitch statistics. Consider setting proper values for the bandpass filter and the sample length (the length of the audio fragment we're going to use). Sample length could in future be established dynamically. Check the docs for more info.
The result is that each fragment is described by a number of features, e.g. 6 or 60. That gives us a k-dimensional structure (where k is the number of features), which we then project into 2D space with a dimensionality reduction algorithm of your choice. Uniform Manifold Approximation and Projection (UMAP) is a sound choice; a sketch of this feature-and-projection step follows below.
In theory, the resulting embedding should be such that similar sounds (according to the features we selected) end up close together, while different ones end up further apart. Your noise should now be separated from your "not noise" and form a cluster.
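As a rough illustration of that features-plus-projection pipeline (not the actual Audio Explorer code), here is a Python sketch assuming librosa and umap-learn are installed and that the audio has already been cut into short fragments around the detected onsets:

import numpy as np
import librosa
import umap

def fragment_features(y, sr):
    # A few simple spectral features per fragment; Audio Explorer offers richer
    # sets (e.g. pitch statistics), these are just placeholders for illustration.
    return [
        librosa.feature.spectral_centroid(y=y, sr=sr).mean(),
        librosa.feature.spectral_bandwidth(y=y, sr=sr).mean(),
        librosa.feature.spectral_flatness(y=y).mean(),
        librosa.feature.rms(y=y).mean(),
    ]

def embed(fragments):
    # fragments: list of (samples, sample_rate) tuples, one per onset
    X = np.array([fragment_features(y, sr) for y, sr in fragments])
    return umap.UMAP(n_components=2).fit_transform(X)  # 2D points to scatter-plot

Noisy fragments should then show up as their own cluster in the 2D plot, which is what the lasso selection in the app lets you grab.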
When you hover over the graph, a set of icons appears in the upper-right corner. One is lasso selection: use it to mark points, inspect the spectrogram and, for example, download a table with the features that describe that signal. At that point you can also reduce the noise (an extra button appears) in a similar way to Audacity: it analyses the spectrum and attenuates those frequencies with some smoothing.
It does not completely solve your problem right now, but it could significantly cut the effort. Going through hundreds of WAVs could take the better part of a day, but you will be done. Want it automated? There's a CLI (command-line interface) that I am developing at the same time. In the not-too-distant future it should take what you have labelled as noise and signal and then use supervised machine learning to go through everything in batch mode.
Suggestions / feedback? Drop an issue on GitHub.

Feeding real-time audio data to tensorflow on a mobile device

I am building a prototype of a sound detection app that will ultimately run on a phone (iPhone/Android). It needs to be near real-time to respond fast enough to the user when a particular sound is recognized. I am hoping to use TensorFlow to build and train the model and then deploy it on a mobile device.
What I am unsure about is the best way to feed data to TensorFlow for inference in this case.
Option 1: Feed only newly acquired samples to the model.
Here the model itself keeps a buffer of previous signal samples, to which new samples are appended, and the whole thing gets processed.
Something like:
import tensorflow as tf  # TF 1.x graph-mode API

samples = tf.placeholder(tf.int16, shape=(None,))
buffer = tf.Variable([], trainable=False, validate_shape=False, dtype=tf.int16)
update_buffer = tf.assign(buffer, tf.concat([buffer, samples], 0), validate_shape=False)
detection_op = ....process buffer...
session.run([update_buffer, detection_op], feed_dict={samples: [.....]})
This seems to work, but if the samples are pushed to the model 100 times a second, what's happening inside tf.assign? The buffer can grow quite big, and if tf.assign constantly reallocates memory this may not work well.
Option 2: Feed the whole recording to the model
Here the iPhone app keeps the state/recording samples, and feeds the whole recording to the model. The input can get quite large, and re-running the detection op on the whole recording will have to keep recomputing the same values each cycle.
Option 3: Feed a sliding window of data
Here the app keeps the data for the whole recording but feeds only the latest slice of data to the model, e.g. the last 2 seconds at a 2000 Hz sampling rate == 4000 samples, fed every 1/100 sec (20 new samples each time). The model may also need to keep some running totals for the whole recording.
Advice?
I'd need to know a bit more about your application requirements, but for simplicity's sake I recommend starting with option #3. The usual way to approach this problem for arbitrary sounds is:
Have some trigger to detect the start of a sound or speech utterance. This can just be sustained audio levels, or something more advanced.
Run a spectrogram over a fixed size window, aligned with the start of the noise.
The rest of the network can just be a standard image detection one (usually cut down in size) to classify the sound.
There are a lot of variations and other possible approaches. For example for speech it's typical to use MFCC as your feature generator, and then run an LSTM to separate out phonemes, but since you mention sound detection I'm guessing you don't need anything this advanced.
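As a minimal sketch of the sliding-window plus spectrogram approach (option 3) in Python, here is roughly what the per-update work looks like; the sample rate, window length, STFT parameters and the model call are all placeholder assumptions:

import numpy as np
from scipy.signal import spectrogram

SAMPLE_RATE = 16000
WINDOW_SAMPLES = 2 * SAMPLE_RATE               # keep only the last 2 seconds
ring = np.zeros(WINDOW_SAMPLES, dtype=np.float32)

def push_block(block):
    # Append the newest block of samples and drop the oldest ones.
    global ring
    ring = np.concatenate([ring[len(block):], block])

def latest_spectrogram():
    # Fixed-size spectrogram over the most recent window; this 2D array is what
    # you would hand to the (hypothetical) classifier, e.g. model.predict(...).
    _, _, spec = spectrogram(ring, fs=SAMPLE_RATE, nperseg=400, noverlap=240)
    return spec

Because the window length is fixed, the model input shape stays constant and nothing earlier in the recording needs to be recomputed.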

Finding the frequency per second from an audio file

I am currently making a game similar to Audiosurf. I am trying to find the frequency content of an audio file (like .mp3 or .wav) at every second, and based on that value I will build the level. I have been doing a lot of research on this topic. I have a way to get the samples within the audio file; I am using the Unity engine to make this game. I am thinking about breaking the samples into samples per second (using the transfer rate), then doing an FFT on each of those and finding the highest frequency within each. Am I on the right path? Can anyone offer any suggestions, or, if I am not on the right path, correct me? Any help would be appreciated.
You are on the right path with the FFT part and splitting your samples into bins. Here is a library for that: http://www.fftw.org/
Where it gets hairy is with picking your frequency. Let me tell you off the bat: just throw away the highest frequency in the spectrum, it's part of the static. Maybe you could use the lowest frequency to catch the bassline, but the bass drums and even atmospheric sound effects will likely interfere there.
Now, even if you do find some heuristic that allows you to pick "the frequency" at a given moment in the song, it most likely won't correlate with the music itself. You are really better off reworking your idea to use the frequency spectrum at each moment, not just a single frequency.
EDIT: The Fourier transform will provide you with an array of complex numbers, one per frequency bin; a bin's amplitude is the magnitude of its complex number and its phase is the angle.
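To make the "spectrum per second" suggestion concrete, here is a hedged NumPy sketch (the game is in Unity/C#, but the maths is the same); track.wav is a placeholder filename, and the 20-250 Hz band is just an example choice for catching kicks and bass:

import numpy as np
import soundfile as sf

samples, sr = sf.read('track.wav')   # placeholder file name
if samples.ndim > 1:
    samples = samples.mean(axis=1)   # mix down to mono

for i in range(len(samples) // sr):              # one chunk per second
    chunk = samples[i * sr:(i + 1) * sr]
    magnitudes = np.abs(np.fft.rfft(chunk))      # amplitude = |complex bin|
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sr)
    bass_energy = magnitudes[(freqs >= 20) & (freqs <= 250)].sum()
    # Drive the level from band energies like bass_energy (or the whole
    # magnitude array), rather than from a single "highest frequency".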
