I am using the MediaRecorder class to record sound. At regular intervals I read the amplitude and convert it to decibels, but I also want the frequency of the audio over the same interval, to go with that amplitude or decibel value. I searched, but could not find a clear explanation of how to do it.
If someone can guide me, please help.
The process most often used to determine pitch is called the "fast Fourier transform". Try searching for those keywords, or the common abbreviation "FFT", together with the language or platform you are working on; that should turn up libraries you can use. Coding an FFT yourself is fairly complex, so you'll probably want a library. But if you are curious about the math and how it works, check out The Scientist and Engineer's Guide to Digital Signal Processing.
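As a rough illustration of the idea (not tied to any Android API), here is a minimal Python sketch that takes one interval of samples and returns both its dominant frequency and its level in decibels. It assumes you already have raw PCM samples as floats in [-1.0, 1.0]; on Android that usually means recording with AudioRecord, since MediaRecorder only reports a peak amplitude. The function names `fft` and `analyze_frame` are made up for this example, and the frame length must be a power of two.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def analyze_frame(samples, sample_rate):
    """Return (dominant frequency in Hz, RMS level in dB relative to 1.0)."""
    n = len(samples)
    # Level of the interval: RMS converted to decibels (assumes full scale = 1.0).
    rms = math.sqrt(sum(s * s for s in samples) / n)
    level_db = 20 * math.log10(max(rms, 1e-12))
    # A Hann window reduces spectral leakage before the transform.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    spectrum = fft(windowed)
    # Skip bin 0 (DC) and look only at the first half (positive frequencies).
    peak_bin = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate / n, level_db


# Synthetic test tone standing in for one recorded interval.
rate = 8000
frame = [0.5 * math.sin(2 * math.pi * 1000 * i / rate) for i in range(1024)]
freq, db = analyze_frame(frame, rate)
print(freq, db)
```

The frequency resolution is `sample_rate / n`, so longer intervals give finer frequency estimates at the cost of coarser timing.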
Let's say I have the audio file for Happy Birthday. I want to convert that audio file into an audio file that sounds like this: happy birthday.
First, I'd like to know whether I have the ability to program this. Can a high schooler who's almost finished with APCS program this?
If I can:
How would I change the bpm of the song? I've searched through a bunch of websites, but they weren't very helpful.
I know that audio files can be represented in waveforms. How would I scan for each individual wave in an audio file (I need this to isolate the notes)?
This is a very ambitious project, actually. One reason is that it involves using digital signal processing tools like the FFT (fast Fourier transform) to analyze the sound and pick out the pitches. You might be able to find a library that can do this, but coding such a tool yourself would involve a steep learning curve.
If you would like to look further into this, there is a good online resource called "The Scientist and Engineer's Guide to Digital Signal Processing". I was able to work through and understand the discrete Fourier transform with only high school math (lots of trig) and a bit of calculus. It was a lift, though.
Trying to analyze rhythm is also no easy task. Even with the advanced tools provided in professional notation systems such as Finale, people have trouble playing rhythms in time well enough for the best transcription tools. Algorithms that "quantize" the beats help, but they also limit the amount of detail that can be included in the playback.
My guess is that as interesting and worthwhile as this project would be, to bring it to completion before the semester ends would require putting together prebuilt pieces. A lot of programming is done that way, these days.
If you scale the project back to something like just getting your code to analyze a short sample of a single note and report its pitch, that would be both impressive and doable with a lot of work. It could be done with a plain DFT algorithm instead of an FFT, reducing the amount you'd have to learn first. That way, you'd only have to work your way up to understanding and implementing the material at this link, which is about calculating the DFT. Notice that there is example code in BASIC. The code examples throughout this book are a big help.
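To give a feel for the scaled-back version, here is a minimal Python sketch of the "DFT by correlation" idea applied to a short sample of a single note. It is not the book's BASIC code; the function names are invented for this example, and the note naming assumes equal temperament with A4 = 440 Hz.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def dft_magnitudes(samples):
    """Magnitude of each DFT bin, computed by correlating the signal
    with cosine and sine waves. O(n^2), so keep the sample short."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(samples[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = -sum(samples[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        mags.append(math.hypot(re, im))
    return mags


def detect_pitch(samples, sample_rate):
    """Frequency in Hz of the strongest bin, ignoring bin 0 (DC)."""
    mags = dft_magnitudes(samples)
    peak = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak * sample_rate / len(samples)


def note_name(freq_hz):
    """Nearest equal-tempered note name, assuming A4 = 440 Hz."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)


# A synthetic 440 Hz sine wave stands in for the recorded note.
rate = 4000
tone = [math.sin(2 * math.pi * 440 * i / rate) for i in range(400)]
pitch = detect_pitch(tone, rate)
print(pitch, note_name(pitch))
```

Real instruments have overtones that can be stronger than the fundamental, so picking the single largest bin is only a starting point.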
I'm a beginner with TensorFlow and Python, and I'm trying to build an app that automatically detects key moments in a football (soccer) match (yellow/red cards, goals, etc.).
I'm starting to understand how to do video analysis by training the program on a dataset I built myself, downloading images from the web and tagging them. To get better results, I was wondering if someone could suggest tutorials on how to train my app on audio files as well, so that the program can tell when there is a pitch variation in the audio of the video, and how to combine the video and audio analysis.
Thank you in advance
Since you are new to Python and to TensorFlow, I recommend you focus on just audio for now, especially since it's a strong indicator of important events in a football match (red/yellow cards, nasty fouls, goals, strong chances, good plays, etc.).
Very simply, without using much ML at all, you can use the average volume of a time period to infer significance. If you want to get a little more sophisticated, you can consider speech-to-text libraries to look for keywords in commentator speech.
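A minimal sketch of the volume idea in plain Python, assuming the audio has already been decoded into a list of float samples. The "louder than a multiple of the median" rule, the default `factor`, and the function names are assumptions made for this example, not an established method.

```python
import math


def window_rms(samples, window_size):
    """RMS level of each consecutive, non-overlapping window."""
    levels = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        levels.append(math.sqrt(sum(s * s for s in window) / window_size))
    return levels


def loud_windows(samples, window_size, factor=2.0):
    """Indices of windows whose RMS exceeds `factor` times the median RMS."""
    levels = window_rms(samples, window_size)
    if not levels:
        return []
    median = sorted(levels)[len(levels) // 2]
    return [i for i, level in enumerate(levels) if level > factor * median]


# Synthetic example: quiet crowd noise with one loud stretch (window 6).
audio = []
for w in range(10):
    amp = 0.8 if w == 6 else 0.1
    audio += [amp * math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
print(loud_windows(audio, 100))
```

Multiplying a window index by `window_size / sample_rate` gives the time in seconds at which that window starts, which you can map back to the video.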
Using video to try to determine when something important is happening is much, much more challenging.
This page can help you get started with audio signal processing in Python.
https://bastibe.de/2012-11-02-real-time-signal-processing-in-python.html
I've combed StackOverflow and the web for many questions on whistle detection, etc, and many people did explain as much as they could as to how they can go about detecting their stuff.
capturing sound for analysis and visualizing frequences in android
analyzing whistle sound for pitch note
But what I don't get is how does FFT help you to detect certain sounds in a given sample audio data?
Here's what I understand so far from some stuff I found here and there.
- The sine wave is more or less the building block of ALL signals, musical or not
- Three parameters - FREQUENCY, AMPLITUDE, and INITIAL PHASE - characterize every steady sine wave completely.
- They make each and any kind of wave unique.
- The Fourier transform can be used to inspect what kinds of sine waves there are in a signal
SOURCE -- [Audio signal processing basics][3]
Audio data that the computer generates as received from the mic or other input source, for live processing, is an array of amplitudes processed (or stored or taken) at a particular sample rate.
So how does one go from that to detecting whistles and claps?
And complex things such as say, a short period of whistling to a particular song?
My theory of detection is that we look at our whistles in a spectrogram and record their particular frequency and amplitude characteristics. Then, if those characteristics show up again in the input, we've detected a whistle.
Am I right or wrong?
This sound processing stuff is a little complicated.
Forgot to mention this - I'm using Python. Java is also okay, since most of the exemplar code I found was for Android, which is in Java, and I can work in Java too. Any mention of libraries or APIs would be helpful too.
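One way to try that theory in Python is sketched below. It rests on two assumptions that are mine, not from any library: a whistle is close to a pure tone, so most of the frame's spectral energy sits in a few bins around one peak, and that peak falls somewhere between roughly 500 Hz and 5000 Hz. The function names, the band, and the 0.8 threshold are all illustrative and would need tuning against real recordings. Claps are short broadband bursts, so this check would not catch them.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])


def looks_like_whistle(frame, sample_rate, band=(500.0, 5000.0), min_ratio=0.8):
    """True if the frame is dominated by a single tone inside `band`."""
    n = len(frame)
    # Hann window, then power in each positive-frequency bin.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    power = [abs(c) ** 2 for c in fft(windowed)[: n // 2]]
    total = sum(power[1:])  # ignore bin 0 (DC)
    if total <= 0.0:
        return False  # silence
    peak = max(range(1, n // 2), key=lambda k: power[k])
    peak_freq = peak * sample_rate / n
    # Share of the energy within two bins either side of the peak.
    near_peak = sum(power[max(1, peak - 2): peak + 3])
    return band[0] <= peak_freq <= band[1] and near_peak / total >= min_ratio


rate = 16000
tone = [math.sin(2 * math.pi * 2000 * i / rate) for i in range(1024)]
print(looks_like_whistle(tone, rate))
```

To recognize a whistled melody rather than a single whistle, you would run this frame by frame and track how the peak frequency changes over time.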
I am new to J2ME development.
I want to know how to get the audio frequency from an audio recording application that stores its data in an .amr file.
Please help me; I have tried a lot, but I am stuck.
Any ideas would be appreciated.
Thanks in advance.
I'm going to add here what I have found on other sites that may be useful to you and me (as a newbie).
http://www.developer.nokia.com/Community/Discussion/showthread.php?154169-Getting-Recorded-Audio-Frequency-in-J2ME
If you want frequency of sound in Hz then it is actually not a single value but a series of values as a function of time.
You will have to calculate the Fourier transform of the sound samples, which will give you the frequency.
Read on Wikipedia about how to calculate the Fourier transform and the frequency graph...
http://www.developer.nokia.com/Community/Discussion/showthread.php?95262-Frequency-Analysis-in-J2ME-MMAPI
This forum thread says something about FFT (fast Fourier transform) and about analysing recorded AMR sound rather than processing a live stream, and it provides 3 links about FFT, which are right underneath this line. Have a look at them:
Look at the site mobile-tuner.com. (I'm new too. In fact, I know nothing about Java.)
But the site says that the phones with the tuner function enabled are S60 phones. I was trying to write a guitar tuner program. Since my phone is a Nokia 5310 XpressMusic, which is S40, I gave up.
So good luck to you.
Note: javax.microedition.media.control.RecordControl
-- I don't know much, but I have a hunch that the "RecordControl" class is related to getting the audio frequency in J2ME, and that the frequency analysis itself belongs to the "sound processing" part.
I'm looking for a library to detect when the ball is hit in the audio of a tennis match.
I read this topic, but I think there should be a suitable library for this job.
Please guide me.
Thanks
I doubt that there is a library for this specific task. You can probably implement something from scratch though, using a sliding FFT to generate a power spectrum and some kind of simple template matching in the frequency domain.
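A minimal Python sketch of that approach, under some stated assumptions: the template is the power spectrum of one example frame of the sound you want to find, "template matching" is done with cosine similarity, and the frame size, hop, and 0.9 threshold are arbitrary values that would need tuning on real match audio. The function names are invented for this example.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])


def power_spectrum(frame):
    """Hann-windowed power spectrum (positive-frequency bins only)."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(c) ** 2 for c in fft(windowed)[: n // 2]]


def cosine_similarity(a, b):
    """1.0 for spectra of identical shape, near 0.0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def find_matches(samples, template, frame_size=256, hop=128, threshold=0.9):
    """Start indices of sliding frames whose spectrum resembles `template`."""
    hits = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        spectrum = power_spectrum(samples[start:start + frame_size])
        if cosine_similarity(spectrum, template) >= threshold:
            hits.append(start)
    return hits


# Synthetic stand-in: a 1000 Hz burst hidden in a 250 Hz background.
rate = 8000
def tone(freq, start, count):
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(start, start + count)]

signal = tone(250, 0, 1024) + tone(1000, 1024, 512) + tone(250, 1536, 1024)
template = power_spectrum(tone(1000, 0, 256))
print(find_matches(signal, template))
```

A real ball hit is a short, broadband impact rather than a steady tone, so in practice you would probably average the spectra of several recorded hits into the template and use a frame length close to the duration of the impact.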