I just want to detect a particular sound in J2ME. Is it possible? Or is there another way...?
Use a Fast Fourier Transform (FFT) to do frequency analysis.
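As an illustration of the idea, here is a minimal NumPy sketch that finds the dominant frequency in one frame of PCM samples. Note this is not J2ME code: J2ME's standard libraries don't include an FFT, so on a device you would need your own implementation or a third-party port, but the principle is the same.

```python
import numpy as np

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest spectral peak in one frame."""
    windowed = samples * np.hanning(len(samples))  # window to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))       # magnitude spectrum
    peak_bin = int(np.argmax(spectrum))            # strongest bin
    return peak_bin * sample_rate / len(samples)   # bin index -> Hz
```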
If I know the SoundFont that a MIDI-to-audio track used, can I theoretically reverse the audio back into its (most likely) MIDI components? If so, what would be one of the best approaches to doing this?
The end goal is to try encoding audio (even voice samples) into MIDI such that I can reproduce the original audio in MIDI format better than, say, BearFileConverter, and hopefully with better results than just bandpass filters or FFT.
And no, this is not for lossy audio compression or sheet transcription; this is mostly for my own curiosity.
For monophonic music only, with no background sound, and if your SoundFont synthesis engine and your recording's sample rate are exactly matched (synchronized to 1 ppm or better, no additional effects, both using a known A440 reference frequency, known intonation, etc.), then you can try cross-correlating your recorded audio against a set of synthesized waveform samples, one at each MIDI pitch from your a-priori known font, to create a timeline of statistical likelihoods for each MIDI note. Find the local maxima across your pitch range, threshold, and peak-pick to find the most likely MIDI note onset times.
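As a rough illustration of that cross-correlation idea, here is a minimal NumPy/SciPy sketch. The inputs are hypothetical: `recording` is the captured audio, and `templates` maps each MIDI note number to a waveform rendered from the known SoundFont at the same sample rate.

```python
import numpy as np
from scipy.signal import correlate, find_peaks

def likely_note_onsets(recording, templates, threshold=0.6):
    """Return {midi_note: sample indices where the note's template matches well}."""
    onsets = {}
    for note, tpl in templates.items():
        # Sliding dot product of the recording against this note's template.
        corr = correlate(recording, tpl, mode="valid")
        # Normalize by local recording energy and template energy so the result
        # is roughly a correlation coefficient in [-1, 1].
        local_energy = np.convolve(recording ** 2, np.ones(len(tpl)), mode="valid")
        ncc = corr / (np.sqrt(local_energy) * np.linalg.norm(tpl) + 1e-12)
        # Local maxima above the threshold are candidate note-onset times.
        peaks, _ = find_peaks(ncc, height=threshold)
        onsets[note] = peaks
    return onsets
```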
Another possibility is sliding sound fingerprinting, but at an even higher computational cost.
This fails in real life due to imperfectly matched sample rates plus added noise, speaker and room acoustic effects, multi-path reverb, etc. You might also get false positives for note waveforms that are very similar to their own overtones. Voice samples vary even more from any template.
Forget bandpass filters or looking for FFT magnitude peaks, as that works reliably only for nearly pure sine waves, which very few musical instruments or interesting fonts sound like (or are as boring as).
I am developing an app to detect when elderly people in a daycare center fail to unlock their rooms using IC cards.
The room doors have an electronic circuit that emits a beep sound to signal that the user failed to unlock the room. My goal is to detect this beep signal.
I have searched a lot and found some possibilities:
1. To clip the beep sound, use it as a template signal, and compare it with the test signal (the complete human-door interaction audio clip) using convolution, matched filters, DTW, or whatever else measures their similarity. What do you recommend, and how would I implement it?
2. To analyze the FFT of the beep sound to see whether it has a frequency band different from that of the background noise. I do not understand how to do this exactly. To check whether the beep forms a peak at a certain frequency that is absent in the background noise, I clipped the beep sound and got the spectrogram shown in the figure (spectrogram of the beep sound), but I cannot interpret it. Could you give me a detailed explanation of the spectrogram?
3. What is your recommendation? If you have another efficient method for beep detection, please explain.
There is no need to calculate the full spectrum. If you know the frequency of the beep, you can just compute a single-point DFT and continuously check the level at that frequency. If you detect a rising and then a falling edge within a given interval, it must be the beep sound.
You might want to have a look at the Goertzel Algorithm. It is an algorithm for continuous single point DFT calculation.
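A minimal sketch of the Goertzel recurrence in Python is below; the block size, beep frequency, and power threshold are assumptions you would tune to your own recordings.

```python
import numpy as np

def goertzel_power(block, sample_rate, target_freq):
    """Power of a single frequency bin in `block`, via the Goertzel recurrence."""
    n = len(block)
    k = int(round(n * target_freq / sample_rate))  # nearest DFT bin to the target
    coeff = 2.0 * np.cos(2.0 * np.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # |X[k]|^2 without computing the full DFT
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def detect_beep(samples, sample_rate, beep_freq, block_size=512, threshold=1e6):
    """True if the beep tone's level rises above and then falls below the threshold."""
    above = [goertzel_power(samples[i:i + block_size], sample_rate, beep_freq) > threshold
             for i in range(0, len(samples) - block_size, block_size)]
    rose = False
    for a in above:
        if a:
            rose = True        # rising edge: beep tone present in this block
        elif rose:
            return True        # falling edge after a rising edge => beep detected
    return False
```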
What are a few of the simplest features of a sound wave that are correlated with differences between voices of different speakers? Where can I find how to calculate these from data?
The project we're trying to solve takes 3 hours of audio as input; at each moment, either one of two people is talking or there is silence. We need to determine which of these alternatives each moment corresponds to using an HMM.
The results don't have to be 100% accurate. An approximation will suffice.
Are low-pass filters and high-pass filters essentially the same when referring to accelerometer algorithms as when referring to sound engineering (audio processing)?
In sound engineering, a high-pass filter cuts out the low frequencies associated with bass, whereas a low-pass filter cuts out the high frequencies associated with treble.
I want to understand what these filters are when applied to accelerometer data and how they are used, and am wondering if there's a parallel with the physics of sound. It's all physics, right?
If they are linked in some way, it might help me understand how to work with accelerometer measurements more quickly than learning it from scratch.
Thanks
Yes, the concepts are exactly the same. If you think of frequency as how rapidly something changes, you will immediately see the parallels between filtering audio, images, sensor input - anything.
A low-pass filter only lets relatively slow changes through to its output. So any "jerkiness" in the accelerometer signal would be removed, and only the more gradual changes (think overall trend) would pass.
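For example, a common first-order (exponential smoothing) pair, sketched here in Python; the smoothing factor `alpha` is an assumed tuning parameter, not something prescribed by any particular accelerometer API.

```python
def low_pass(samples, alpha=0.1):
    """First-order IIR low-pass: keeps the slow trend (e.g. gravity / orientation),
    smooths out rapid jitter. Smaller alpha = stronger smoothing."""
    out = []
    y = samples[0]
    for x in samples:
        y = alpha * x + (1.0 - alpha) * y
        out.append(y)
    return out

def high_pass(samples, alpha=0.1):
    """Complementary high-pass: whatever the low-pass removed,
    i.e. the rapid changes (shakes, taps)."""
    return [x - y for x, y in zip(samples, low_pass(samples, alpha))]
```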
I'm working on an OpenGL project that involves a speaking cartoon face. My hope is to play the speech (encoded as MP3s) and animate its mouth using the audio data. I've never really worked with audio before, so I'm not sure where to start, but some googling led me to believe my first step would be converting the MP3 to PCM.
I don't really anticipate the need for any Fourier transforms, though that could be nice. The mouth really just needs to move around when there's audio (I was thinking of basing it on volume).
Any tips on how to implement something like this, or pointers to resources, would be much appreciated. Thanks!
-S
Whatever you do, you're going to need to decode the MP3s into PCM data first. There are a number of third-party libraries that can do this for you. Then, you'll need to analyze the PCM data and do some signal processing on it.
Automatically generating realistic lipsync data from audio is a very hard problem, and you're wise to not try to tackle it. I like your idea of simply basing it on the volume. One way you could compute the current volume is to use a rolling window of some size (e.g. 1/16 second), and compute the average power in the sound wave over that window. That is, at frame T, you compute the average power over frames [T-N, T], where N is the number of frames in your window.
Thanks to Parseval's theorem, we can easily compute the power in a wave without having to take the Fourier transform or anything complicated -- the average power is just the sum of the squares of the PCM values in the window, divided by the number of frames in the window. Then, you can convert the power into a decibel rating by dividing it by some base power (which can be 1 for simplicity), taking the logarithm, and multiplying by 10.
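Putting the two paragraphs above together, here is a minimal Python/NumPy sketch; the window length and reference power are assumptions to tune.

```python
import numpy as np

def window_power_db(pcm, frame, window_len, ref_power=1.0):
    """Average power (in dB) of the PCM samples in the window ending at `frame`."""
    start = max(0, frame - window_len)
    window = np.asarray(pcm[start:frame], dtype=float)
    if len(window) == 0:
        return float("-inf")
    power = np.mean(window ** 2)                    # sum of squares / window length
    return 10.0 * np.log10(power / ref_power + 1e-12)
```

You could then map that dB value (clamped to some range, say -40 dB to 0 dB) onto how wide the mouth opens at each animation frame.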