So to put it simply, I want to find the frequencies in an audio file.
There are only two frequencies in it (one in the left ear, one in the right ear), as it's a binaural recording. What I want to find is these two frequencies. I have Audacity and Audition.
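For reference, this can also be done programmatically rather than with Audacity's Plot Spectrum. A minimal sketch, assuming NumPy; the 440/444 Hz test tones below are stand-ins for the real file's channels, which you could load with e.g. scipy.io.wavfile:

```python
import numpy as np

def dominant_frequency(samples, rate):
    """Return the frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    return freqs[int(np.argmax(spectrum))]

# Synthetic stand-in for the two channels of a binaural file.
rate = 44100
t = np.arange(rate) / rate               # one second of audio
left = np.sin(2 * np.pi * 440.0 * t)     # tone in the left ear
right = np.sin(2 * np.pi * 444.0 * t)    # tone in the right ear

print(dominant_frequency(left, rate))    # close to 440 Hz
print(dominant_frequency(right, rate))   # close to 444 Hz
```

With a one-second window the FFT bins are 1 Hz apart, so a steady tone lands cleanly in a single bin.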
Related
Why does combining two digital sound files into one often involve adjusting the dB level?
Mathematically, a 5.1 sound file is much larger than a stereo sound file. Why?
When two signals on a normalized scale of 0 to 1 (as they usually are in memory buffers) are summed directly, the result is often an increase in overall level that exceeds the available range, and level changes do not map intuitively onto the logarithmic decibel scale. In addition, the levels need to be corrected to set the relative balance of the two signals. Both corrections must be applied before summing, so the mix is neither too loud nor at the wrong relative volume.
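To illustrate the summing problem numerically, a toy sketch assuming NumPy (the 0.5 gains, roughly -6 dB each, are just one possible correction):

```python
import numpy as np

rate = 44100
t = np.arange(rate) / rate
a = np.sin(2 * np.pi * 440 * t)   # both signals already use the full -1..1 range
b = np.sin(2 * np.pi * 440 * t)   # worst case: identical and in phase

naive = a + b                     # peaks near 2.0 -> clips the normalized scale
mixed = 0.5 * a + 0.5 * b         # gains (about -6 dB each) applied before the sum

print(np.max(np.abs(naive)))      # exceeds 1.0
print(np.max(np.abs(mixed)))      # stays within 1.0
```

A real mixer picks the two gains separately to set the relative balance between the signals, not a fixed 0.5 for each.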
To answer your other question: a 5.1 file typically carries six channels of audio at any given time, while a stereo file has at most two. So the 5.1 equivalent of a sound file (metadata excluded) at an equivalent per-channel bitrate would usually be right around three times larger than the average stereo file.
I am trying to convert a .wav music file into something playable with the beep command.
I need to export the frequencies to a text format to use as input parameters to beep.
P.S.: It is not about speech transcription.
The beep command in Linux only controls the PC speaker. It can play just one frequency at a time, so it doesn't apply here. A WAV file is a file of samples that normally carries music, and music is made of many simultaneous frequencies.
You cannot convert a WAV file to play it on the PC speaker. You need a sound card for that.
As you say, it's not voice recognition, but even then: a simple violin note sounds different from a guitar note because it carries more than a single frequency. There are harmonics, components at different frequencies (normally multiples of the fundamental), that make the sounds differ; it's not only the frequencies that matter, but also their relative intensities. That is impossible to reproduce with a tool that can only play a single frequency with a fixed waveform (the PC speaker's wave is not sinusoidal, but already includes several harmonics of its own, which is why it sounds like a PC speaker) and no control over intensity.
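The harmonics point is easy to verify numerically. A sketch assuming NumPy; the 200 Hz note and its amplitude ratios are invented for illustration:

```python
import numpy as np

rate = 8000
t = np.arange(rate) / rate   # one second of audio -> 1 Hz frequency bins

# A crude "instrument" note: a 200 Hz fundamental plus two harmonics
# at multiples of it, each with a different relative intensity.
note = (1.00 * np.sin(2 * np.pi * 200 * t)
        + 0.50 * np.sin(2 * np.pi * 400 * t)
        + 0.25 * np.sin(2 * np.pi * 600 * t))

spectrum = np.abs(np.fft.rfft(note))
peaks = np.argsort(spectrum)[-3:]        # the 3 strongest frequency bins
print(sorted(int(p) for p in peaks))     # [200, 400, 600]
```

beep could reproduce only one of those three components at a time, which is why the result cannot sound like the original instrument.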
As a collector, I have thousands of audio files downloaded from podcast services. All feeds start with the same 15-second introduction. That's very annoying, so I tried to crop all of them.
But they are not all regular. The voiced introductions are exactly the same, but some of the files...
... start at 00:00, at 00:05, or at some offset we don't know
... don't have the introduction at all
I couldn't determine at which second to crop.
The question: how can we crop all the audio files according to a specific audio clip?
In other words: "detect the same part and remove it"?
As I understand it you already have a way to crop the files at a specific point. So the problem boils down to working out where the intro ends in each clip. Here's how I would do it:
First, manually isolate the intro audio in a separate file/buffer.
For each clip, you need to work out where in the clip the intro audio occurs. Do this by computing a cross-correlation between the intro audio and the main clip. The correct offset will be the one with the highest correlation coefficient. (You could also look for the minimum in a mean-difference, which is equivalent.)
Once you know where the intro audio is, you can calculate your crop position.
There are a few obvious optimisations:
Only search for the intro audio in the first (say) 30 seconds of each clip.
Don't search for the whole intro audio, just the last 1/2 second.
If you're not 100% sure that the audio is there, you might want to set a threshold for acceptance.
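The steps above can be sketched with NumPy's cross-correlation (random noise stands in for real decoded audio; with real files you would first decode both to raw sample arrays):

```python
import numpy as np

def find_offset(clip, intro):
    """Sample offset in `clip` where `intro` correlates best."""
    corr = np.correlate(clip, intro, mode="valid")  # slide intro over the clip
    return int(np.argmax(corr))

# Synthetic demo: an "intro" buried 300 samples into a longer clip.
rng = np.random.default_rng(0)
intro = rng.standard_normal(1000)
clip = np.concatenate([rng.standard_normal(300), intro,
                       rng.standard_normal(5000)])

offset = find_offset(clip, intro)
crop_at = offset + len(intro)   # crop position = where the intro ends
print(offset, crop_at)
```

For real audio, normalizing each window (normalized cross-correlation) makes the peak more robust to level differences, and searching only the first 30 seconds keeps it fast.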
I am currently making a game similar to Audiosurf. I am trying to find the frequency of an audio file (like .mp3 or .wav) at every second; based on the values I will build the level. I have been doing a lot of research on this topic. I have a way to get the samples within the audio file, and I am using the Unity engine to make this game. I am thinking about breaking the samples into one-second chunks (using the sample rate), then doing an FFT on each of those and finding the highest frequency within each. Am I on the right path? Can anyone offer any suggestions, or, if I am not on the right path, correct me? Any help would be appreciated.
You are on the right path with the FFT part and splitting your samples into bins. Here is a library for that: http://www.fftw.org/
Where it gets hairy is picking your frequency. Right off the bat: throw away the highest frequency in the spectrum; it's part of the noise. You could maybe use the lowest frequency to catch the bassline, but bass drums and even atmospheric sound effects will likely interfere there.
Now, even provided you do find some heuristic that lets you pick "the" frequency at a given moment in the song, it most likely won't correlate with the music itself. You are really better off reworking your idea to use the frequency spectrum at each moment, not just a single frequency.
EDIT: The Fourier transform gives you an array of complex numbers, one per frequency bin; a bin's amplitude is the magnitude of its complex number and its phase is the angle (not the real and imaginary parts directly).
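A Python sketch of the split-into-seconds-and-FFT structure (in Unity the code would be C#, but the shape is the same; the 440 Hz tone below is a stand-in for decoded game audio):

```python
import numpy as np

def per_second_spectra(samples, rate):
    """Magnitude spectrum for each full one-second window of `samples`."""
    spectra = []
    for i in range(len(samples) // rate):
        window = samples[i * rate:(i + 1) * rate]
        bins = np.fft.rfft(window)      # complex: magnitude = amplitude, angle = phase
        spectra.append(np.abs(bins))    # keep the amplitude spectrum per window
    return spectra

rate = 44100
t = np.arange(2 * rate) / rate                    # two seconds of samples
spectra = per_second_spectra(np.sin(2 * np.pi * 440 * t), rate)
print(len(spectra), int(np.argmax(spectra[0])))   # 2 windows, peak at bin 440
```

Each window's spectrum (not just its single strongest bin) is what you would feed into the level generator.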
Consider multiple (at least two) different audio files, like several different mixes or remixes. Naively I would say it must be possible to detect samples, especially vocals, that are almost equal in two or more of the files; of course, only if the vocal samples aren't modified, stretched, pitched, reverbed, etc. too much.
So with what kind of algorithm or technique could this be done? Let's say the user sets time markers in all files as best as possible, describing the data windows to compare that contain the presumably equal sounds, vocals, etc.
I know that no direct approach comparing the raw WAV data in any way is useful. But even with frequency-domain data (e.g. from an FFT), I would have to use a comparison algorithm that shifts the comparison windows along the time axis, since I cannot assume the samples I want to find are time-synced across all files.
Thanks in advance for any suggestions.
Hi, this is possible!
You can use a technique called LSH (locality-sensitive hashing); it is very robust.
Another way to do this is to run a spectrogram analysis on your audio files...
Build the database song:
1. Record your full song.
2. Transform the sound into a spectrogram.
3. Slice your spectrogram into chunks and take the three or four highest frequencies in each.
4. Store all the points.
Match the song:
1. Record one short sample.
2. Transform the sound into a spectrogram.
3. Slice your spectrogram into chunks and take the three or four highest frequencies in each.
4. Compare the collected frequencies against your database song.
5. Your match is the song with the highest hit count!
You can see how to do it here:
http://translate.google.com/translate?hl=EN&sl=pt&u=http://ederwander.wordpress.com/2011/05/09/audio-fingerprint-em-python/
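A toy version of the build-and-match steps above, assuming NumPy (the chunk size, peak count, and test signal are arbitrary choices; real fingerprinters such as the one in the linked post are far more robust, e.g. by hashing pairs of peaks):

```python
import numpy as np

def fingerprint(samples, chunk=4096, n_peaks=3):
    """For each spectrogram chunk, keep a tuple of the strongest frequency bins."""
    prints = []
    for start in range(0, len(samples) - chunk + 1, chunk):
        spectrum = np.abs(np.fft.rfft(samples[start:start + chunk]))
        top = np.argsort(spectrum)[-n_peaks:]        # highest-magnitude bins
        prints.append(tuple(sorted(int(b) for b in top)))
    return prints

def match_score(db_prints, sample_prints):
    """Count how many sample chunks hit a chunk of the database song."""
    db = set(db_prints)
    return sum(1 for p in sample_prints if p in db)

# Demo "song": three steady tones, sampled so each chunk holds whole cycles.
rate = 4096
t = np.arange(4 * rate) / rate
song = (np.sin(2 * np.pi * 300 * t)
        + 0.6 * np.sin(2 * np.pi * 700 * t)
        + 0.3 * np.sin(2 * np.pi * 1100 * t))

db = fingerprint(song)
sample = song[4096:3 * 4096]                  # a short, chunk-aligned excerpt
print(match_score(db, fingerprint(sample)))   # both sample chunks hit
```

Step 5's "highest hit" just means running match_score against every song in the database and taking the maximum.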
ederwander