Is there a library for ball hitting sound detection - audio

I'm looking for a library that detects when the ball is hit in the audio of a tennis match.
I read this topic, but I think there must be a suitable library for this job.
Please guide me.
Thanks.

I doubt that there is a library for this specific task. You can probably implement something from scratch though, using a sliding FFT to generate a power spectrum and some kind of simple template matching in the frequency domain.
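A minimal sketch of that idea in Python with NumPy, not a tested detector. The frame length, hop size, threshold, and the use of cosine similarity as the "simple template matching" are all assumptions for illustration; the template is assumed to be the power spectrum of one hand-picked ball hit.

```python
import numpy as np

def power_spectrogram(signal, frame_len=1024, hop=256):
    """Sliding FFT: one power spectrum per Hann-windowed frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def match_template(spec, template, threshold=0.8):
    """Score each frame by cosine similarity to the template spectrum.

    Returns (indices of frames above the threshold, all scores).
    The threshold is an assumed starting point and needs tuning on real audio.
    """
    eps = 1e-12  # avoids division by zero on silent frames
    spec_n = spec / (np.linalg.norm(spec, axis=1, keepdims=True) + eps)
    tmpl_n = template / (np.linalg.norm(template) + eps)
    score = spec_n @ tmpl_n
    return np.flatnonzero(score > threshold), score
```

Frame `i` starts at sample `i * hop`, so a hit at frame `i` is at roughly `i * hop / sample_rate` seconds. On real match audio you would probably also gate on frame energy, since crowd noise and grunts can share spectral content with a hit.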

Related

Getting the frequency of audio at every time interval in Android

I am using the MediaRecorder class to record sound. I read the amplitude at a fixed time interval and convert it to decibels, but I also want the frequency of the audio over that same interval, alongside the amplitude or decibel value. I searched, but I could not find a clear explanation of how to do it.
If someone can guide me, please help.
The process most often used to determine pitch is called the "fast Fourier transform". Try searching for those keywords, or the common abbreviation "FFT", along with the language or platform you are working on; that should bring up libraries you can incorporate. Coding an FFT is pretty complex, so you'll probably want to use a library. But if you are curious about the math and how it works, check out The Scientist and Engineer's Guide to Digital Signal Processing.
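To make the idea concrete, here is a rough sketch in Python with NumPy (not Android code). It assumes you have raw samples scaled to the range -1..1, which MediaRecorder's amplitude readout alone does not give you. The function name and the 0.1 s interval are made up for illustration, and taking the strongest FFT bin is only a crude estimate of pitch.

```python
import numpy as np

def level_and_frequency(samples, sample_rate, interval_s=0.1):
    """Return a list of (start_time_s, level_db, dominant_hz), one per interval."""
    n = int(round(sample_rate * interval_s))
    window = np.hanning(n)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    results = []
    for start in range(0, len(samples) - n + 1, n):
        chunk = samples[start:start + n]
        # Level in dB relative to full scale (amplitude 1.0).
        rms = np.sqrt(np.mean(chunk ** 2))
        level_db = 20 * np.log10(max(rms, 1e-12))
        # Strongest frequency bin of the windowed chunk, ignoring DC.
        spectrum = np.abs(np.fft.rfft(chunk * window))
        spectrum[0] = 0.0
        dominant_hz = float(freqs[np.argmax(spectrum)])
        results.append((start / sample_rate, float(level_db), dominant_hz))
    return results
```

The frequency resolution is `sample_rate / n`, so shorter intervals give coarser frequency estimates.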

APCS final project: Converting an audio file to a simpler MIDI file

Let's say I have the audio file for Happy Birthday. I want to convert that audio file into an audio file that sounds like this: happy birthday.
First, I'd like to know whether this is within my ability. Can a high schooler who's almost finished with APCS program this?
If I can:
How would I change the BPM of the song? I've searched through a bunch of websites, but they weren't very helpful.
I know that audio files can be represented as waveforms. How would I scan for each individual wave in an audio file (I need this to isolate the notes)?
This is a very ambitious project, actually. One reason is that it involves digital signal processing tools like the FFT (fast Fourier transform) to analyze the sound and pick out the pitches. You might be able to find a library that can do this, but coding such a tool yourself would involve a steep learning curve.
If you would like to look further into this, there is a good online resource called "The Scientist and Engineer's Guide to Digital Signal Processing". I was able to work through and understand the discrete Fourier transform with only high school math (lots of trig) and a bit of calculus. It was a lift, though.
Analyzing rhythm is also no easy task. Even with the advanced tools provided in professional notation systems such as Finale, people have trouble playing rhythms in time well enough for the best transcription tools. Algorithms that "quantize" the beats help, but they also limit the amount of detail that can be included in the playback.
My guess is that, as interesting and worthwhile as this project would be, bringing it to completion before the semester ends would require putting together prebuilt pieces. A lot of programming is done that way these days.
If you scale the project back to something like getting your code to analyze a short sample of a single note and report its pitch, that would be both impressive and doable with a lot of work. It could be done with a DFT algorithm instead of requiring an FFT, reducing the amount of material you'd have to learn first. That way, you'd only have to work your way up to understanding and implementing the material at this link, which is about calculating the DFT. Notice that there is example code in BASIC. The code examples throughout this book are a big help.
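As a rough illustration of that scaled-back version, here is a sketch in Python with NumPy (not the book's BASIC code). It computes the DFT directly by correlating the signal with cosines and sines, then reports the frequency of the strongest bin. The function names are made up for this example, and picking the strongest bin only works for a clean single note.

```python
import numpy as np

def dft_magnitudes(x):
    """Direct O(N^2) DFT by correlation; returns magnitudes for bins 0..N/2."""
    n = len(x)
    t = np.arange(n)
    mags = np.empty(n // 2 + 1)
    for k in range(n // 2 + 1):
        angle = 2 * np.pi * k * t / n
        re = np.sum(x * np.cos(angle))   # correlation with a cosine wave
        im = -np.sum(x * np.sin(angle))  # correlation with a sine wave
        mags[k] = np.hypot(re, im)
    return mags

def estimate_pitch(x, sample_rate):
    """Frequency (Hz) of the strongest non-DC bin."""
    mags = dft_magnitudes(x)
    mags[0] = 0.0  # ignore any DC offset
    k = int(np.argmax(mags))
    return k * sample_rate / len(x)
```

The estimate is quantized to multiples of `sample_rate / len(x)`, so a longer sample gives a finer pitch reading at the cost of more computation.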

Tutorial tensorflow audio pitch analysis

I'm a beginner with TensorFlow and Python, and I'm trying to build an app that automatically detects key moments in a football (soccer) match (yellow/red cards, goals, etc.).
I'm starting to understand how to do video analysis by training the program on a dataset I built myself, downloading images from the web and tagging them. To get better results, I was wondering if someone could suggest tutorials on training my app on audio files as well, so that the program can recognize a pitch variation in the audio of the video and combine video and audio analysis.
Thank you in advance.
Since you are new to Python and to TensorFlow, I recommend you focus on just audio for now, especially since it's a strong indicator of important events in a football match (red/yellow cards, nasty fouls, goals, strong chances, good plays, etc.).
Very simply, without using much ML at all, you can use the average volume of a time period to infer significance. If you want to get a little more sophisticated, you can consider speech-to-text libraries to look for keywords in commentator speech.
Using video to try to determine when something important is happening is much, much more challenging.
This page can help you get started with audio signal processing in Python.
https://bastibe.de/2012-11-02-real-time-signal-processing-in-python.html
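The average-volume idea can be sketched in a few lines of NumPy. This is only an illustration: the one-second window, the "twice the median" rule, and the function name are assumptions, and the audio is assumed to be already decoded into a float array.

```python
import numpy as np

def loud_windows(samples, sample_rate, window_s=1.0, factor=2.0):
    """Return (start times in seconds of unusually loud windows, RMS per window).

    A window counts as loud when its RMS exceeds `factor` times the median RMS.
    """
    n = int(round(sample_rate * window_s))
    n_windows = len(samples) // n
    windows = samples[:n_windows * n].reshape(n_windows, n)
    rms = np.sqrt(np.mean(windows ** 2, axis=1))
    threshold = factor * np.median(rms)
    times = [float(i * window_s) for i in np.flatnonzero(rms > threshold)]
    return times, rms
```

The returned start times could then be used to pick which video segments are worth analyzing further.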

Realtime audio manipulation

Here is what I would like to achieve:
I like to play around with creating "new" software/hardware instruments.
Sound processing and creation is always handled by software, but one could play the instrument via an ultrasonic distance sensor, for example. Another idea is to start playback when someone interrupts the light of a photoelectric barrier, and so on.
So the instrument would play common sounds, but it would have to be used in an unusual way. For example, the ultrasonic instrument would play a sound when it detects something within a certain distance. The pitch of the sound could be changed as the distance gets smaller, for example.
Basically, I would like to play back a sound sample and manipulate it in real time.
I guess I have to use WAV samples for this, right? And which programming language do you think fits best for this task?
Edited after Kevin's hint: please point me in the right direction and give me a hint about where to start.
Thanks in advance.
Since you're using the Processing tag, you can try Processing.
It comes with a sound library like Minim, or you can install Beads, which is great. There's actually a nice book on it: Sonifying Processing.
You might find SuperCollider fun as well.
The main question is: what are you comfortable with at the moment?
If Processing syntax looks intimidating, you can try a different programming paradigm such as dataflow. In that case you can use PureData (free, open source) or MaxMSP (very similar, but commercial). The idea is that rather than typing instructions, you connect boxes with wires, which is fun, and the examples are great too.
If you're into C++, there are plenty of libraries. On the creative side, there's a nice set of libraries called openFrameworks that's easy and fun to use. If this is your cup of tea, have a peek at Maximilian.
The bottom line is that there are multiple options to achieve the same task. Choose the best tool for you (based on your background), or try each and see what you like best.
You asked, "And which programming language do you think fits best for this task?" I would also suggest Processing. I have used Processing to work with sound before, and in every case I used Minim. It has many UGens for generating sounds programmatically.
You also want to integrate some sensors. I'm not sure which sensors you will use, but Processing works well with different Arduino modules and sensors. Check this link for more direction.
Furthermore, you can export your project as an .exe or an executable .jar file. The JavaScript version (p5.js) works almost the same as the Java version.
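Whatever tool you pick, the core of the "pitch rises as the distance shrinks" idea from the question is a mapping from sensor reading to playback rate. Here is a language-neutral sketch of that arithmetic, written in Python with NumPy rather than Processing. The distance range, the rate range, and both function names are made-up assumptions. This naive resampling also changes the sample's duration along with its pitch.

```python
import numpy as np

def distance_to_rate(distance_cm, near_cm=10.0, far_cm=100.0,
                     min_rate=1.0, max_rate=2.0):
    """Map a distance reading to a playback rate: closer means higher pitch."""
    d = min(max(distance_cm, near_cm), far_cm)   # clamp to the sensor range
    frac = (far_cm - d) / (far_cm - near_cm)     # 0.0 at far, 1.0 at near
    return min_rate + frac * (max_rate - min_rate)

def resample(sample, rate):
    """Read through `sample` at `rate` times normal speed (linear interpolation)."""
    positions = np.arange(0, len(sample) - 1, rate)
    return np.interp(positions, np.arange(len(sample)), sample)
```

In a real-time setup you would apply the current rate to each small output buffer instead of the whole sample at once.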

Tracking the top of heads with Kinect

I was wondering if there is an existing API for tracking the tops of people's heads with the Kinect, e.g., when the Kinect is facing downwards from a ceiling.
If not, how might I implement such a thing with its depth data?
No. The Kinect expects to be facing a standing (or seated, given the appropriate flag) human. All APIs (official or 3rd party) that have a notion of skeleton tracking expect this.
If you wish to track someone from above, you will need to use a library such as OpenCV (or EmguCV, for C# development). Well, you don't have to, but they offer utilities to help with computer vision and image processing. These libraries don't care if you are using a Kinect or just a regular RGB camera.
Using the Kinect from above, you could use the depth data to help locate and track blobs. With the Kinect at a known distance from the floor, have a few people walk under it and see what z-coordinates you get out of it -- you can then assume that anything within a certain z-coordinate range is a person walking across the screen (vs. a cat, or something else).
You will need to use standard image processing techniques (see OpenCV reference above) to initially find the blobs within the image. Once found, the depth data from the Kinect might be useful but I think you'll find it isn't ultimately necessary if you're just watching people walk across the floor.
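The depth-range idea can be sketched without any Kinect SDK. The example below is plain Python with NumPy and a hand-written flood fill, used here instead of OpenCV only to keep it self-contained. It assumes the depth frame is a 2-D array of distances from the sensor in millimetres; the function name, the range limits, and the minimum blob area are hypothetical values you would calibrate by having people walk under the sensor.

```python
from collections import deque
import numpy as np

def find_head_blobs(depth_mm, min_mm, max_mm, min_area=50):
    """Threshold a top-down depth frame; return (row, col) centroids of blobs."""
    mask = (depth_mm >= min_mm) & (depth_mm <= max_mm)
    seen = np.zeros(mask.shape, dtype=bool)
    rows, cols = mask.shape
    centroids = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        # Flood fill (4-connected) to collect one blob.
        queue = deque([(r0, c0)])
        seen[r0, c0] = True
        pixels = []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and mask[rr, cc] and not seen[rr, cc]):
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        if len(pixels) >= min_area:  # drop specks of sensor noise
            centroids.append(tuple(float(v) for v in np.mean(pixels, axis=0)))
    return centroids
```

Tracking then amounts to matching each centroid to the nearest centroid in the previous frame.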
We built a Kinect-driven experience where the sensors had to point downward to detect users walking along a wall. We used openTSPS to do all the work of taking the camera input and doing blob detection and handing off tracked "persons" to (in our case) a Processing app. It works really well for us.
http://opentsps.com/

Resources