Can FFT be used to find drum solos/breaks in audio files?

Is it possible with FFT to find a drum solo, or a drum break, in an audio file? Is this something an FFT is able to do, and are there any resources online that could aid me in learning?

In general, an FFT is not a good choice for detecting the onset of percussion sounds:
An FFT is always calculated over a window of samples (in effect a period of time) and yields the magnitude and phase offset of the signal in each frequency bin. You can therefore determine that there is signal at a particular frequency, but not its precise onset time within the window. The best time resolution available is the window period. Of course, you can make the window shorter at the expense of frequency resolution.
Percussion sounds tend to look like noise and spread across the spectrum. This would be OK if you only had percussion sounds, but it is not great in real-life polyphonic content.
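To make that trade-off concrete, here is a small illustrative calculation (the sample rate and window sizes are arbitrary example values):

    # Time/frequency resolution trade-off of an FFT analysis window.
    # Sample rate and window sizes are arbitrary example values.
    sample_rate = 44100  # Hz

    for window_size in (512, 2048, 8192):
        time_res_ms = 1000.0 * window_size / sample_rate   # time covered by one window
        freq_res_hz = sample_rate / window_size             # spacing between FFT bins
        print(f"N={window_size}: {time_res_ms:.1f} ms per window, {freq_res_hz:.1f} Hz per bin")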
However, you might be able to infer something from the different spectral characteristics of a drum solo versus the instrumental sections of a track.
The problem of finding the time at which percussion sounds start in music is described in academic journals as onset detection, and it is one of the many techniques used for feature extraction; the wider field is known as Music Information Retrieval (MIR). Your problem sounds like one of identifying sections in audio files, which might be described as segmentation or partitioning.
A good place to start is Sonic Visualiser, which is a tool written specifically for MIR applications. Plug-ins exist for various types of feature extraction, and from these you will easily find the large body of academic work in this area. As an added bonus, the existing plug-ins are all open source too.
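For a concrete starting point, here is a minimal spectral-flux onset-strength sketch in NumPy (one common onset detection approach from the MIR literature; the frame and hop sizes are just example values, and `y` is assumed to be a mono float signal):

    import numpy as np

    def onset_strength(y, frame=1024, hop=512):
        window = np.hanning(frame)
        # Short-time magnitude spectra: rows are time frames, columns are bins.
        mags = np.array([np.abs(np.fft.rfft(window * y[i:i + frame]))
                         for i in range(0, len(y) - frame, hop)])
        # Half-wave rectified difference between successive magnitude spectra:
        # large values mark sudden broadband energy increases, e.g. drum hits.
        flux = np.maximum(0.0, np.diff(mags, axis=0)).sum(axis=1)
        return flux / (flux.max() + 1e-12)   # normalised novelty curve

Peaks in the returned curve are candidate onset times (frame index times hop divided by the sample rate).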

I'd look here; there was a bit of discussion with great pointers on the Gamedev SE: https://gamedev.stackexchange.com/questions/9761/beat-detection-and-fft :-)

Related

How can I distinguish an instrument from a sound?

I just saw a paper from Cornell on reconstructing faces from sound. But I am more interested in the timbre. It might be attacked with AI, but is there an easier way? For example, is instrument A going to be in a different range than instrument B?
For the most part, instruments are going to have overlapping frequency content. I don't know the specific algorithms for isolating instruments, though I've heard they exist. I would think that a big element is not just tracking all the harmonics and frequency content, but looking for correspondences in the volume changes or frequency changes of the different frequencies, in order to determine which frequencies should be grouped together as a single instrument. Since instruments often play the same notes at the same time, this would be no mean feat. If you are a beginner with digital signal processing, can I recommend "The Scientist and Engineer's Guide to Digital Signal Processing" by Steven W. Smith? (Free download, good book on the fundamental knowledge needed to tackle such a project.)
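As a very rough illustration of that grouping idea (everything here is my own simplification, not an actual source-separation algorithm): if the magnitude envelopes of two frequency bins rise and fall together over time, they are more likely to belong to the same instrument.

    import numpy as np

    def band_envelopes(y, frame=2048, hop=512):
        # Magnitude spectrogram: rows are time frames, columns are frequency bins.
        window = np.hanning(frame)
        return np.array([np.abs(np.fft.rfft(window * y[i:i + frame]))
                         for i in range(0, len(y) - frame, hop)])

    def envelope_correlation(mags, bin_a, bin_b):
        # Pearson correlation between two bins' magnitude-over-time envelopes;
        # a high value hints that the two partials come from the same source.
        return np.corrcoef(mags[:, bin_a], mags[:, bin_b])[0, 1]

For example, a fundamental at bin k and a harmonic near bin 2k whose envelopes correlate strongly over time are a hint that they come from the same instrument.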

Methods for simulating moving audio source

I'm currently researching a problem regarding DOA (direction of arrival) regression for an audio source, and need to generate training data in the form of audio signals of moving sound sources. In particular, I have the stationary sound files, and I need to simulate a source and microphone(s) with the distances between them changing to reflect movement.
Is there any software online that could potentially do the trick? I've looked into pyroomacoustics and VA, as well as other potential libraries, but none of them seem to deal with moving audio sources, due to the difficulties of simulating the Doppler effect.
If I were to write up my own simulation code for dealing with this, how difficult would it be? My use case would be an audio source and a microphone in some 2D landscape, both moving with their own velocities, where I would want to collect the recording from the microphone as an audio file.
Some speculation here on my part, as I have only dabbled with writing some aspects of what you are asking about and am not experienced with any particular libraries. The likelihood is good that something exists and will turn up.
That said, I wonder if it would be possible to use either the Unreal or Unity game engine. Both, as far as I can remember, let you load your own audio cues and support 3D sound, including Doppler.
As far as writing your own, a lot depends on what you already know. With a single-point mic (as opposed to stereo), the pitch shifting involved is not that hard. There is a technique that involves stepping through the audio file's sample data using linear interpolation for read positions that lie in between the data points, which is considered to have sufficient fidelity for most purposes. Lots of trig, too, to track the changes in distance and velocity.
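A bare-bones sketch of that interpolation technique (my own simplification with a constant step; a real simulation would recompute the step every sample from the changing source-microphone distance):

    import numpy as np

    def resample_linear(samples, step):
        # Read through the source with a fractional step: step > 1 raises pitch
        # (source approaching), step < 1 lowers it (source receding).
        positions = np.arange(0, len(samples) - 1, step)
        idx = positions.astype(int)
        frac = positions - idx
        # Linear interpolation between neighbouring data points
        return (1.0 - frac) * samples[idx] + frac * samples[idx + 1]

    # Doppler factor for a source moving towards the mic at v m/s (speed of sound c):
    # step = c / (c - v), e.g. 343 / (343 - 10) ≈ 1.03, i.e. pitch raised by about 3%.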
If we are dealing with stereo, though, it does get more complicated, depending on how far you want to go with it. The head masks high frequencies, so real-time filtering would be needed. It would also be good to implement a delay to match the different arrival times at each ear. And if you start talking about pinnae, I'm way out of my league.
As of now, it seems that pyroomacoustics does not support moving sound sources. However, do check a possible workaround suggested by the developers in Issue #105, where the idea is to use a time-varying convolution on a dense microphone array.
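One loose way to realize a time-varying convolution (this is my own reading of the idea, not necessarily the exact workaround from the issue; `rir_for_position` is a hypothetical helper that would return a static room impulse response, e.g. generated with pyroomacoustics, for a given source position):

    import numpy as np

    def moving_source_render(signal, positions, rir_for_position, frame=2048):
        # Assumes RIRs are at most one second long at 48 kHz (room for reverb tails).
        out = np.zeros(len(signal) + 48000)
        for n, start in enumerate(range(0, len(signal), frame)):
            chunk = signal[start:start + frame]
            rir = rir_for_position(positions[min(n, len(positions) - 1)])
            wet = np.convolve(chunk, rir)
            out[start:start + len(wet)] += wet   # overlap-add the convolved frame
        return out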

Finding the "noise level" of an audio recording programmatically

I am tasked with something seemingly trivial, which is to find out how "noisy" a given recording is. The recordings came about via a voice recorder, an OLYMPUS VN-733 PC, which was fairly cheap (I am not doing advertisement; I merely mention this because I am in no way aiming to do anything "professional" here, I simply need to solve a seemingly simple problem).
To preface this, I have already obtained several datasets from different outside locations, in particular parks and near-road recordings. That is, the noise that exists at these specific locations, which I then want to compare, on average, with that of the other locations. In other words: I must find out how noisy location A is compared to locations B and C.
I made one-minute recordings at each location so that at least the time span of each recording can be compared to the other locations (and I was using the very same voice recorder at all positions, at the same height, etc.).
A sample file can be found at: http://shevegen.square7.ch/test.mp3
(This may eventually be moved later on; it just serves as an example of how these recordings sound right now. I am unhappy about the initial noisy clipping sound; ideally I'd only capture the background noise of the cars etc., but for now this must suffice.)
Now my specific question is: how can I find out how "noisy" or "loud" this is? The primary goal is to compare them to the other .mp3 files, which would suffice for my purpose just fine. But ideally it would be nice to calculate on average how "loud" every individual .mp3 is and then compare it to the other ones (there are several recordings per given geolocation, so I could even merge them together).
There are some similar questions, but I was not able to find one in particular that could answer this in an objective manner, or perhaps I did not understand the problem at hand. I have all the audio datasets already, but I have no idea how to find out how "loud" any one of them is individually; there are some apps on smartphones that claim they can do this automatically, but since I do not have a smartphone, this is a dead end for me.
Any general advice will be much appreciated.
Noise is a notion that is difficult to define, so I will focus on loudness instead.
You could compute the energy of each file. For that, you need access to the samples of the audio signal (generally via a built-in function or library of your programming language). Then you can compute the RMS energy of the signal.
That would be the most basic processing.
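For example, a minimal sketch in Python (assuming the MP3s have first been converted to WAV, e.g. with ffmpeg, and that NumPy and the soundfile package are installed; the file names are placeholders):

    import numpy as np
    import soundfile as sf

    def rms_dbfs(path):
        samples, sr = sf.read(path)           # float samples in [-1, 1]
        if samples.ndim > 1:                  # mix stereo down to mono
            samples = samples.mean(axis=1)
        rms = np.sqrt(np.mean(samples ** 2))  # root-mean-square energy
        return 20 * np.log10(rms + 1e-12)     # level in dB relative to full scale

    for name in ["park_a.wav", "road_b.wav", "road_c.wav"]:   # placeholder files
        print(name, round(rms_dbfs(name), 1), "dBFS")

The resulting dBFS figures can then be compared directly between locations.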

How can I look for certain sounds in a live sound input?

I've combed StackOverflow and the web for many questions on whistle detection, etc., and many people explained as much as they could about how they go about detecting such sounds.
capturing sound for analysis and visualizing frequencies in android
analyzing whistle sound for pitch note
But what I don't get is: how does an FFT help you detect certain sounds in given sampled audio data?
Here's what I understand so far from some stuff I found here and there.
- The sine wave is more or less the building block of ALL signals, musical or not.
- Three parameters (FREQUENCY, AMPLITUDE, and INITIAL PHASE) characterize every steady sine wave completely.
- They make each and every kind of wave unique.
- A Fourier transform can be used to inspect what kinds of sine waves there are in a signal.
Source: "Audio signal processing basics"
Audio data, as the computer receives it from the mic or another input source for live processing, is an array of amplitude samples taken (or stored) at a particular sample rate.
So how does one go from that to detecting whistles and claps?
And complex things such as, say, a short period of whistling to a particular song?
My theory of detection is that we test our whistles in a spectrogram and record their particular frequency and amplitude characteristics. Then, if those particular characteristics appear again in the input, we've detected a whistle.
Am I right or wrong?
This sound processing stuff is a little complicated.
Forgot to mention this: I'm using Python. Java is also okay, since most of the example code I found was for Android, which is in Java, and I can work in Java too. Any mention of any libraries or APIs would be helpful too.
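A minimal sketch of the spectrogram idea described above (not a robust detector; the frequency range, threshold and parameter names are all guesses on my part): flag frames whose spectrum is dominated by a narrow band where whistles typically live, then treat a run of such frames as a whistle.

    import numpy as np

    def whistle_frames(y, sr, frame=2048, hop=512, lo=1000.0, hi=4000.0, ratio=0.6):
        window = np.hanning(frame)
        freqs = np.fft.rfftfreq(frame, 1.0 / sr)
        band = (freqs >= lo) & (freqs <= hi)
        hits = []
        for i in range(0, len(y) - frame, hop):
            spectrum = np.abs(np.fft.rfft(window * y[i:i + frame]))
            # A whistle is close to a pure tone: most of the energy sits in one band.
            if spectrum[band].sum() / (spectrum.sum() + 1e-12) > ratio:
                hits.append(i / sr)   # start time (seconds) of a whistle-like frame
        return hits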

How to reproduce C64-like sounds?

I did some of my own research and found out that SID chips had only a few hardware-supported synthesis features: three audio oscillators with four possible waveforms (sawtooth, triangle, pulse, noise), ADSR envelopes, ring modulation and oscillator sync. I also read that there was a way to play back a single PCM sound as well.
That is very little to work with, yet I heard lots of different sounds coming from my TV set. How were these components combined to produce all that variety of audio?
To give some specifics, I'd like to know how to combine those components to produce guitar, piano or drum-like audio. Other interesting things would be the different buzzes and sounds specific to the C64.
I used to write music on the C64 for games, demos and even services (I wrote the official QuantumLink theme). As for your question, the four different waveforms were typically overlaid with the sync and ring mods (less often ring, because it was unpredictable on different versions of the SID chip), and sometimes used cleanly.
For example, a typical 'snare' sound would be composed of a noise waveform with a very fast attack and sustain, and depending on whether you wanted a drumstick or brush sound, either a very fast decay and moderately short release, or a short decay and slower release.
Getting the right sound was typically trial and error, and the limitations were pretty heavy. You really never got to the point of a piano or guitar sound, due to the simple waveforms without overlayable harmonic waveforms; about the best you could get was things that sounded beepy, things that sounded marimba-y, and things that sounded like a snare drum.
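As an illustration only (a quick NumPy sketch of the recipe above, not actual SID register programming), a noise waveform shaped by a fast attack and an exponential decay/release comes out recognisably snare-like:

    import numpy as np

    def snare(sr=44100, length=0.25, attack=0.002, decay_rate=25.0):
        n = int(sr * length)
        t = np.arange(n) / sr
        noise = np.random.uniform(-1.0, 1.0, n)                      # noise "waveform"
        env = np.minimum(t / attack, 1.0) * np.exp(-decay_rate * t)  # fast attack, exponential decay
        return noise * env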
One of the tricks used most often to extend sound was done with fast machine code playback routines that could change the played notes on voices so quickly as to give the impression of a fuller, harmonic tone. We just called it arpeggiation, although at 10 to 12 note changes a second it sounded more like a buzzy chord.
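A rough sketch of that arpeggiation trick (example note frequencies and rate, one crude pulse-wave voice): cycling the voice through the chord notes about a dozen times per second blurs them into a buzzy chord.

    import numpy as np

    def arpeggio(freqs=(261.6, 329.6, 392.0), rate=12, duration=2.0, sr=44100):
        n = int(sr * duration)
        t = np.arange(n) / sr
        active = (t * rate).astype(int) % len(freqs)    # which chord note is playing now
        freq = np.array(freqs)[active]
        phase = np.cumsum(2 * np.pi * freq / sr)        # integrate frequency to get phase
        return np.sign(np.sin(phase)) * 0.3             # crude square/pulse waveform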
As for the sampled waveforms, they were only available as single-bit and, later, 4-bit samples. These sounded terrible despite our best attempts, because basically the method of playing back a sample on the 64 was to play a white-noise waveform and rapidly alter the volume register on the SID chip to produce a rising and falling wave. Do it fast enough and it sort of sounds like the original sound, poorly tuned in on a staticky radio.
I suggest you grab hold of a C64 emulator for the PC (CCS64 is a good one) and a C64 BASIC programming guide and just play around... the SID chip is entirely manipulable from BASIC.
To sum up, how did we get all of those piano and guitar sounds on a C64? We didn't, really.
Take a look at some of these docs related to producing music on the C64:
http://sid.kubarth.com/articles.html
The type of music you are describing falls into the category of "chiptunes". I'd recommend checking out some modern trackers like MilkyTracker, which are used to create music in this style. There are libraries like libmodplug that allow you to play tracker files in your software.
Check out some of the C64 emulators out there. I've read that some of them are 100% accurate in their sound reproduction, true to the original.
