I'm trying to create a zoom-in/shake effect based on how loud the audio stream's bass is.
I've found that showcqt can render a graph of the sound, but I can't figure out how to feed that into a zoom function so that the zoom follows the audio.
Any ideas?
Related
I am looking to scale a PNG file according to a provided audio track, a frequency range (20 Hz-1000 Hz, for example) and a threshold, with a smooth effect.
For example, when there is a kick, the scale should go smoothly to 120%. I would like to make audio visualizers like the ones used for dubstep etc., where the image "pumps" whenever a kick comes in.
First, is it doable with ffmpeg?
Where to start?
I found showcqt that takes frequencies in input etc., but its output is a video so I don't think I can use it in my case. Any help appreciated.
If you are able to read the PCM values as they are being output, then you might consider using a rolling RMS average in order to get a continuous stream of amplitudes. IDK the best length of the array. Perhaps it should correspond to the number of audio frames that would give you an update for each visual frame? The folks at the DSP site would have the best insights.
A rolling average keeps the computation cheap. Square each incoming sample, push it onto a ring buffer (circular queue) and add it to a running sum, then subtract the squared value that drops out of the buffer. Only those two data points are needed to update the average, since the denominator (the buffer length) is fixed and known; the RMS is the square root of the running sum divided by that length. I found a video that describes the basic RMS math here using Matlab.
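A minimal Python sketch of that rolling RMS, assuming the PCM samples arrive one at a time as numbers. The class name `RollingRMS` and its `push` method are made up for illustration, and the window length is left to the caller.

```python
import math
from collections import deque


class RollingRMS:
    """Rolling RMS over the last `window` samples, O(1) work per sample."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.squares = deque()   # ring buffer holding the squared samples
        self.sum_squares = 0.0   # running sum of everything in the buffer

    def push(self, sample):
        """Add one sample and return the RMS of the current window."""
        sq = float(sample) * float(sample)
        self.squares.append(sq)
        self.sum_squares += sq
        if len(self.squares) > self.window:
            # Drop the outgoing value instead of re-summing the whole buffer.
            self.sum_squares -= self.squares.popleft()
        # max() guards against a tiny negative sum from floating-point drift.
        return math.sqrt(max(self.sum_squares, 0.0) / len(self.squares))
```

If you want one update per visual frame, one option is a window of roughly `sample_rate // fps` samples (for example 44100 // 30 = 1470), reading the returned value once per frame.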
It might be necessary to add some smoothing to the visualizer that receives the volume updates. Also, handing off data from the audio thread should likely employ some form of loose coupling. It would not be good if the thread that processes the audio were also handling graphics.
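One way to do the smoothing, sketched in Python, is a simple attack/release follower: move quickly toward a louder level and fall back slowly. The class name `ZoomSmoother`, the coefficients and the 1.2 maximum zoom are all assumptions chosen for illustration, and the input level is assumed to be normalised to the range 0..1 already.

```python
class ZoomSmoother:
    """Turn a jumpy loudness level (0..1) into a smooth zoom factor."""

    def __init__(self, attack=0.6, release=0.1, max_zoom=1.2):
        self.attack = attack      # fraction of the gap closed per frame when rising
        self.release = release    # fraction of the gap closed per frame when falling
        self.max_zoom = max_zoom  # zoom factor reached at full level
        self.level = 0.0          # smoothed level carried between frames

    def update(self, target):
        """Feed the latest level, get the zoom factor for this frame."""
        target = min(max(target, 0.0), 1.0)
        coeff = self.attack if target > self.level else self.release
        self.level += coeff * (target - self.level)
        return 1.0 + (self.max_zoom - 1.0) * self.level
```

For the loose coupling, the audio thread could put levels on a thread-safe queue (Python's `queue.Queue`, for instance) and the rendering thread could call `update` with the most recent value once per frame.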
I'm a little over my head, but I think this is what is generally done for visualizers.
I need to be able to analyze (search through) hundreds of WAV files and detect, but not remove, static noise. Currently I must listen to each conversation and find the characteristic noise/static manually, which takes too much time. Ideally, I would have a program that can read each new WAV file and detect characteristic signatures of the static noise, such as bursts of white noise or full-band, high-amplitude noise (like AM radio noise over a phone conversation, a wall of white noise) or bursts of peak high-frequency, high-amplitude noise (as in crackling on the phone line) against a background of normal voice. I do not need to remove the noise, only detect it and flag the recording for further troubleshooting. Ideas?
I can listen to the recordings and find the static or crackling but this takes time. I need an automated or batch process that can run on its own and flag the troubled call recordings (WAV files for a phone PBX). These are SIP and analog conversations depending on the leg of the conversation so RTSP/SIP packet analysis might be an option, but the raw WAV file is the simplest. I can use Audacity, but this still requires opening each file and looking at the visual representation of the audio spectrometry and is only a little faster than listening to each call but still cumbersome.
I currently have no code or methods for this task. I simply listen to each call wav file to find the noise.
I need a batch WAV file search that can return the recordings that contain the characteristic noise, static or crackling over the recorded phone conversation.
Unless you can tell the program what the noise looks like, it's going to be challenging to run any sort of batch processing. I was facing a similar challenge, and that prompted me to develop (free and open source) software to help users with audio exploration, analysis and signal separation:
App: https://audioexplorer.online/
Docs: https://tracek.github.io/audio-explorer/
Source code: https://github.com/tracek/audio-explorer
Essentially, it visualises audio as a 2d scatter plot rather than only "linear", as in waveform or spectrogram. When you upload audio the following happens:
Onsets are detected (based on high-frequency content algorithm from aubio) according to the threshold you set. Set it to None if you want all.
For each audio fragment, audio features are calculated based on your selection. There's no universal best set of features; it all depends on the application. For starters you might try e.g. Pitch statistics. Consider setting proper values for the bandpass filter and sample length (that's the length of the audio fragment we're going to use). In future the sample length could be established dynamically. Check the docs for more info.
The result is that for each fragment you have many features, e.g. 6 or 60. That gives us a k-dimensional structure (where k is the number of features), which we then project to 2d space with a dimensionality reduction algorithm of your choice. Uniform Manifold Approximation and Projection is a sound choice.
In theory, the resulting embedding should place similar sounds (according to the features we have selected) close together and different sounds further apart. Your noise should now be separated from your "not noise" and form a cluster.
When you hover over the graph, a set of icons appears in the upper-right corner. One is lasso selection. Use it to mark points, inspect the spectrogram and e.g. download a table with the features that describe that signal. At that point you can also reduce the noise (an extra button appears) in a similar way to Audacity: it analyses the spectrum and reduces these frequencies with some smoothing.
It does not completely solve your problem right now, but it could significantly cut the effort. Going through hundreds of WAVs could take the better part of a day, but you will be done. Want it automated? There's a CLI (command-line interface) that I am developing at the same time. In the not-too-distant future it should take what you have labelled as noise and signal and then use supervised machine learning to go through everything in batch mode.
Suggestions / feedback? Drop an issue on GitHub.
I am trying to convert a .wav music file into something playable with the beep command.
I need to export the frequencies to a text format to use as input parameters for beep.
P.S.: It is not about speech transcription.
The beep command in Linux only controls the PC speaker. It can play only one frequency at a time, so it doesn't apply here. A WAV file is a file of samples that normally carries music, and music is made of many simultaneous frequencies.
You cannot convert a WAV file to play it on the PC speaker. You need a sound card to do that.
As you say, it's not voice recognition, but even so, a single note on a violin sounds different from the same note on a guitar, because it carries more than a single frequency. It contains what are called harmonics: components at different frequencies (normally multiples of the original frequency) that make the sound different. Not only the frequencies matter but also their relative intensities. That is impossible to reproduce with a tool that only lets you play a single frequency, with a fixed wave shape (the wave is not sinusoidal; it already includes several harmonics, which is what makes it sound like a PC speaker) and no volume control.
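To make the limitation concrete, here is a rough Python sketch of the most you could extract: one frequency per short window, estimated by counting zero crossings, formatted as arguments for beep (`-f` frequency in Hz, `-l` length in ms, `-n` to start the next note). The function names and the 100 ms window are assumptions. It only gives a usable result for single-voice material, and all timbre and volume information is lost, for the reasons above.

```python
import math


def dominant_frequency(samples, sample_rate):
    """Estimate a single frequency from the rate of upward zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    duration = len(samples) / sample_rate
    return crossings / duration if duration > 0 else 0.0


def beep_arguments(samples, sample_rate, window_ms=100):
    """Build a beep argument string with one note per window."""
    window = int(sample_rate * window_ms / 1000)
    notes = []
    for start in range(0, len(samples) - window + 1, window):
        freq = dominant_frequency(samples[start:start + window], sample_rate)
        if freq >= 20:  # skip windows with no audible pitch estimate
            notes.append(f"-f {freq:.0f} -l {window_ms}")
    return " -n ".join(notes)


# Synthetic one-second 440 Hz tone standing in for decoded WAV samples.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
print("beep " + beep_arguments(tone, rate))
```

Samples from a real file could be read with the standard `wave` module and unpacked to integers before being passed in.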
I want to identify the parts of an .mp4 (H264 + AAC) video that are silent and have unchanged frames, and cut them out.
Of course there would be some fine-tuning regarding thresholds and algorithms to measure unchanged frames.
My problem is more general: how would I go about automating this?
Is it possible to solve this with ffmpeg? (preferably with C or python)
How can I programmatically analyse the audio?
How can I programmatically analyse video frames?
For audio silence see this.
For still video scenes ffmpeg might not be the ideal tool.
You could use scene change detection with a low threshold to find the specific frames, then extract those frames and compare them with something like imagemagick's compare function:
ffprobe -show_frames -print_format compact -f lavfi "movie=test.mp4,select=gt(scene\,.1)"
compare -metric RMSE frame1.png frame0.png null:
I don't expect this to work very well.
Your best bet is to use something like OpenCV to find differences between frames.
OpenCV Simple Motion Detection
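For the audio half, one possible route (an assumption here, not necessarily what the link above describes) is ffmpeg's silencedetect filter, which logs `silence_start` and `silence_end` lines to stderr. The Python sketch below runs it and parses those lines into intervals; the function names and the -30dB / 2 second thresholds are illustrative and would need tuning.

```python
import re
import subprocess

_NUM = r"(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
SILENCE_START = re.compile(r"silence_start:\s*" + _NUM)
SILENCE_END = re.compile(r"silence_end:\s*" + _NUM)


def parse_silence_intervals(log_text):
    """Turn silencedetect log output into a list of (start, end) seconds."""
    intervals = []
    start = None
    for line in log_text.splitlines():
        match = SILENCE_START.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = SILENCE_END.search(line)
        if match and start is not None:
            intervals.append((start, float(match.group(1))))
            start = None
    return intervals


def detect_silence(path, noise="-30dB", min_duration=2):
    """Run ffmpeg on `path` and return the silent intervals it reports."""
    cmd = [
        "ffmpeg", "-hide_banner", "-i", path,
        "-af", f"silencedetect=noise={noise}:d={min_duration}",
        "-f", "null", "-",
    ]
    result = subprocess.run(cmd, stderr=subprocess.PIPE, text=True)
    return parse_silence_intervals(result.stderr)
```

Calling `detect_silence("test.mp4")` would return a list like `[(start, end), ...]`, which can then be intersected with whatever still-frame ranges the video analysis produces before cutting.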
I want to put a watermark on my video. Is it possible to do this with a DirectShow filter?
I want to overlay an image on the video, like a channel logo, so that the image stays fixed while the video is playing.
Please provide some valuable help or samples (VC++).
I've done this before. You have two options.
Use VMR-7 or VMR-9's mixer capabilities. I guarantee you this will look real ugly, because VMR filters can't do alpha blending at all. Your watermark will have rough edges.
Implement a filter class that derives from CTransInPlaceFilter.
You implement the following methods:
CheckMediaType (accept all RGB formats)
SetMediaType (accept all RGB formats)
Transform (this is where you do the overlay)
In your filter's constructor (or on some other method that gets called before the graph runs), load your watermark from file or resources. Save the bitmap bits of the image file into a buffer.
When Transform gets called, crack open the IMediaSample that's passed in, access its buffer, and have a double-nested-for loop to copy each pixel of the watermark onto the buffer of the image.
One problem with all of this is that your input source may not be native RGB. Most webcams, for example, are YUV sources (or worse, MJPG). Constraining your filter to only accept RGB types will force the DShow color converter filters to load. As such, extra latency may get added to your graph. As for alpha blending (if you want it), you are on your own here: the source buffer you are blitting on top of will likely be RGB24 with no alpha channel.
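The double-nested loop in Transform, with per-pixel alpha taken from the watermark rather than from the frame, might look like the sketch below. It is written in Python purely to show the arithmetic; a real filter would do the same in C++ on the IMediaSample buffer. The function name, the 4-bytes-per-pixel watermark layout and the stride handling are assumptions, and the sketch ignores bottom-up row order and does no bounds checking.

```python
def blend_watermark(frame, stride, mark, mark_w, mark_h, x0, y0):
    """Alpha-blend a watermark onto a packed 24-bit frame, in place.

    frame  : bytearray, 3 bytes per pixel, `stride` bytes per row
    mark   : bytes, 4 bytes per pixel (3 colour channels + alpha),
             colour channels in the same order as the frame
    x0, y0 : top-left position of the watermark inside the frame
    """
    for y in range(mark_h):
        for x in range(mark_w):
            src = (y * mark_w + x) * 4
            dst = (y0 + y) * stride + (x0 + x) * 3
            alpha = mark[src + 3]
            for c in range(3):
                # Weighted average of watermark and frame, rounded to nearest.
                frame[dst + c] = (
                    mark[src + c] * alpha
                    + frame[dst + c] * (255 - alpha)
                    + 127
                ) // 255
```

Because the alpha comes from the watermark image, the destination can stay RGB24; the caller has to make sure the watermark fits inside the frame.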