Finding audio peaks in video files - linux

I have a bunch of video files that I want to process. I want to write a program that can find the audio peaks in each file and return the times where those peaks occurred.
I've looked at a lot of different APIs in different languages but couldn't get any of them to work. I am partial to PHP and Java, so if anyone knows any good audio processing libraries in those languages, that would be great! But really I don't care too much about the language. I will need to run this program as a cron job.
Also, is it possible to use system calls to ffmpeg from within a script to accomplish this? Thanks in advance.

While I've only used this to work directly with audio files, the Python wrapper around The Echo Nest's audio analysis service can slurp in the audio from various video files. It uses ffmpeg's shared libs to do this, though I find this wrapper much easier to work with via Python than the command line.
Of particular interest within the API is echonest.video, which is, to quote the docs:
Framework that turns video into silly putty.
I'd add a couple other helpful urls but apparently I can only add one since I don't have a reputation...
anyway, hopefully that's a helpful lead.
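On the ffmpeg part of the question: yes, a script can shell out to ffmpeg, have it decode the audio track to raw PCM, and scan the samples itself. Here is a minimal sketch in Python, assuming ffmpeg is on the PATH. The function names, the 8 kHz decode rate, the 0.1 s window and the 0.9 threshold are illustrative choices of mine, not part of any library:

```python
import subprocess
import sys
from array import array

RATE = 8000  # assumed decode rate; low, but enough to locate loud moments


def decode_audio(path, rate=RATE):
    """Have ffmpeg drop the video and write signed 16-bit mono PCM to stdout."""
    cmd = ["ffmpeg", "-v", "error", "-i", path, "-vn",
           "-ac", "1", "-ar", str(rate), "-f", "s16le", "-"]
    raw = subprocess.run(cmd, check=True, stdout=subprocess.PIPE).stdout
    samples = array("h")
    samples.frombytes(raw[: len(raw) // 2 * 2])  # ignore a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # ffmpeg wrote little-endian samples
    return samples


def find_peaks(samples, rate=RATE, window=0.1, threshold=0.9):
    """Return start times (seconds) of windows whose largest absolute
    sample reaches `threshold`, given as a fraction of 16-bit full scale."""
    size = max(1, int(rate * window))
    limit = threshold * 32767
    times = []
    for start in range(0, len(samples), size):
        chunk = samples[start:start + size]
        if max(abs(s) for s in chunk) >= limit:
            times.append(start / rate)
    return times
```

A cron-driven script could then call find_peaks(decode_audio(path)) for each video file and log the returned times. What counts as a "peak" is a judgment call, so the window and threshold will likely need tuning per source.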

Related

Realtime Sound Routing...Trigger a Sound with Another Sound

I'm looking for a program that is able to recognize individual audio samples from my computer and reroute them to trigger WAV files from a library. For my project it needs to work in real time, since noticeable latency would be unacceptable. I tried using dictation software that would recognize words to trigger opening a file, and that's the direction I want to go in, but instead of words I want it to react to sounds, and in real time. I'm not sure where to go and am just looking for some guidance. Does anyone have any suggestions of what I should do?
That's a fairly broad question, but I can tell you how I would do it. (Hardly the only way, but where I would start.)
If you're looking for real time input, the Java Sound library (excellent tutorial here) allows for that. (Just note that microphone input from a web page is difficult on anything, due to major security concerns, so this would be a desktop application.)
If it needs to be real time, the first thing I would suggest is stream and multithread the hell out of it. I would suggest the Java 8 Stream API, but since you're looking for subsamples that match a specific pattern, then each data point will have to be aware of the state of its neighbors, and that isn't easy with streams.
You will probably want to know if a sound roughly resembles an audio profile, so for that, I would pick a tolerance on just how close you want it to be for a match (remembering that samples may not line up 100% anyway, so "exact" is not an option), and then look up Hidden Markov Models. I suggest these because they're what voice recognition software typically uses, and while your sounds may not be voices, it will give you an idea of what has already been done.
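To make the tolerance idea concrete, here is a minimal sketch in Python. Note that this is not a Hidden Markov Model; it is plain cosine similarity between a window of incoming samples and a stored template, which is a much cruder technique that I am substituting for illustration. The function names and the 0.9 default tolerance are assumptions:

```python
import math


def similarity(a, b):
    """Cosine similarity of two equal-length sample windows, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0  # silence matches nothing


def matches(window, template, tolerance=0.9):
    """True if the window resembles the template closely enough."""
    return (len(window) == len(template)
            and similarity(window, template) >= tolerance)
```

Because the measure is normalized, a louder or quieter copy of the same waveform still scores close to 1.0, while a time-shifted copy may not, which is one reason real recognizers work on features rather than raw samples.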
You'll also want to maintain a limited list of audio samples in memory. Specifically, you will likely need the most recent data, because an audio signal is a time-variant signal, and you can't get a match from just one point. I wouldn't make it much longer than the longest sample you're looking to recognize, as audio takes up a boatload of memory.
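A bounded buffer of recent samples could look roughly like this. It is a sketch in Python using the standard library's collections.deque; the class name and its parameters are hypothetical:

```python
from collections import deque


class RecentAudio:
    """Keeps only the newest samples, sized to the longest sound to recognize."""

    def __init__(self, rate, longest_sample_seconds):
        # A deque with maxlen silently discards the oldest items when full.
        self.buffer = deque(maxlen=int(rate * longest_sample_seconds))

    def feed(self, samples):
        """Append a freshly captured block of samples."""
        self.buffer.extend(samples)

    def snapshot(self):
        """Return the buffered samples, oldest first, for comparison."""
        return list(self.buffer)
```

Memory use is capped at rate times the longest sample length, regardless of how long the capture runs.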
Lastly (for audio), I would recommend picking a standard format for comparison. Make it as good as gets you decent results, and start high. You will want to convert everything to that format before you compare it.
Once you recognize a specific sound, it's basically a Command Pattern. Specific sounds can be mapped, even with a java.util.HashMap, to specific files, which (if there are few enough) you might even have pre-loaded.
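The mapping step is small enough to sketch. This version is in Python rather than Java, with a dict standing in for java.util.HashMap; the function name, the sound names and the play callback are all hypothetical:

```python
def make_dispatcher(mapping, play):
    """mapping: recognized sound name -> WAV path.
    play: any callable that plays the file at a given path."""
    def dispatch(sound_name):
        path = mapping.get(sound_name)
        if path is not None:
            play(path)  # unknown sounds are simply ignored
        return path
    return dispatch
```

For example, make_dispatcher({"clap": "snare.wav"}, player) returns a function the recognizer can call with whatever name it detected, where player is whatever playback routine the application uses.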
Lastly, it's worth looking at the Java Speech API. It's not part of the JDK and it's quite dated, but you might get some good advice from its implementation.
This is of course the advice of a Java-preferring programmer, but I imagine that there might be some decent libraries in Python and Ruby to help you as well; and of course there's something in C somewhere. This may sound like a lot, but most of the material is already implemented and ready-to-go.
Hopefully this helps; let's look forward to other answers.

Audio support for programming languages

I want to start on a hobby project that focuses on displaying the audio files in a folder in a certain fashion, can play such an audio file, and shows basic playback controls. However, I'm struggling to find a suitable programming language for this.
The displaying part shouldn't be too hard and can probably be done in most programming languages. The audio part is what concerns me the most: it's not the main focus of the project and should only do limited things (so it shouldn't be too hard), but I do not know anything about sound support in the programming languages I currently know (Java, C and C++).
Specifically i would like to be able to do these things:
Play a sound file
Stop/pause a playing song
Adjust volume
Show a bar that displays the current position in the song
Most files will be .mp3 files, but being able to process other formats is certainly a plus. Since this is just a small project, it's OK if it runs only on Windows. Scalability would be nice but is not required.
It would be nice to have a small overview of the audio support and audio libraries of programming languages (I'm always up for something new) that can accomplish these simple things in a not too complicated way, as well as personal experiences.
In this way I hope to get a better understanding of which programming language fits my project best. (I would very much like to not have to change language midway through the project.)
--
Edit:
This is only for a later stage of the project, if the first part is successful: I will want to change the file names of the audio files that are displayed (to make them follow a specific format).
I haven't written many audio processing programs, but I know a lot of libraries exist for C and C++. Perhaps for Java too, but I don't know Java. I have used audio with SDL in a game, but that doesn't have many features and I don't recommend it.
There's this question asking for a library in C, and there are a couple of similar questions that SO brings up on the side. You may want to take a look at those.
You would also need to look for a library that loads different file types. SDL, at least, only opens .wav files, which I believe most playback libraries support. For MP3, you will most likely need an additional library. I know Audacity uses LAME for MP3, so I'm guessing that should be good.
Some of the functionality you want is also doable by hand. For example, knowing the length of the music and the amount you have already read, you know how far into the audio you are. Adjusting the volume is also just a multiplication (in the simplest case) that you can do on the audio data if the library doesn't provide it.
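Both of those do-it-yourself pieces are only a few lines. A rough sketch in Python, assuming signed 16-bit PCM samples already decoded into a list (the function names are mine, not from any library):

```python
def scale_volume(samples, gain):
    """Multiply each 16-bit sample by `gain`, clamping to the valid
    range so that loud passages clip instead of wrapping around."""
    return [max(-32768, min(32767, int(s * gain))) for s in samples]


def progress(frames_read, total_frames):
    """Fraction of the track already played, for a position bar."""
    return frames_read / total_frames if total_frames else 0.0
```

A linear gain like this is the simplest case mentioned above; perceived loudness is not linear, so a real volume slider would usually map its position to the gain through some curve first.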
A very good choice seems to be PortAudio which is used by Audacity, and also recommended in the accepted answer of the question I mentioned above.
I've done audio apps in both Java and C++. Java development goes way faster because it's a more powerful language and has garbage collection, but JavaSound is a pretty awful solution for audio. Of course, there are wrappers for FFmpeg and other stuff, so you can get a lot of things working. Here's an example of a Java audio app: http://www.indabamusic.com/help/mantis
OTOH, C++ gives you lots of control, low latency and a wealth of libraries. (Another answer mentioned PortAudio, which is, indeed, great.) But you will definitely find it also has a much longer development cycle.
You can certainly do everything you want to do with either language.

automatically trim audio files

I'm trying to extract snippets (3-5 sec.) from a large collection of audio files.
I would like to do this with a shell script. I found basically nothing on the internet, so I'm asking here.
I'm also familiar with Perl, PHP and Java - I don't care which language does it, I just want the job done :)
Scenario: I have a large archive of audio files in very high resolution. I need to extract a very short snippet in low resolution for a preview (3 to 5 sec. is extremely short, but that's what we need). Being a huge fan of shell scripting, I was hoping to automate a process that extracts the snippet at a RANDOM onset time... is it really too much to ask? :)
Thank you for your ideas!
You can use Flash to automatically set lengths on the audio, according to this reference:
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/media/Sound.html
Hope it helps.
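Since the question asks for something scriptable, another option is to drive a command-line tool. The sketch below is in Python and assumes ffmpeg and ffprobe are installed, which nothing in this thread confirms. The function names, the 22050 Hz mono output and the 64k bitrate are illustrative guesses at what "low resolution" means here:

```python
import random
import subprocess


def probe_duration(path):
    """Ask ffprobe for the container duration in seconds."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        check=True, stdout=subprocess.PIPE, text=True).stdout
    return float(out)


def snippet_command(src, dst, duration, rng=random):
    """Build an ffmpeg command that cuts a 3-5 s preview at a random onset."""
    length = rng.uniform(3.0, 5.0)
    start = rng.uniform(0.0, max(0.0, duration - length))
    return ["ffmpeg", "-v", "error", "-y",
            "-ss", f"{start:.3f}", "-t", f"{length:.3f}", "-i", src,
            "-ac", "1", "-ar", "22050", "-b:a", "64k", dst]
```

For each file, subprocess.run(snippet_command(src, dst, probe_duration(src)), check=True) would write the preview. The same two commands could also be called directly from a shell loop.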

How to Serve/Stream Multiple Audio Files

I'm working on a project where we have many small audio files of around 500-600k. Then there are audio files of around 15M.
The 15M files are full narrated articles. The smaller ones are individual sentences within the article.
There are going to be many users and many articles in the future.
I want to be able to load the audio files relatively fast -- either through pre-loading or streaming or something of that nature. Basically if a user clicks on a button -- I want the audio to start more or less immediately.
What are my options here? Red5? Icecast?
EDIT:
I'd like to avoid Flash if at all possible, but I'm not opposed to it -- I definitely can't use HTML5 audio, as much as I'd like to.
I've already tried doing document onload to issue get requests for the files -- there are usually 15-20 per page. (19 small files, one big one). That doesn't seem to work as well as I thought it might.
In terms of latency -- I'm looking for push-button instant play -- right now I can count to 2 or 3 for the small files and 6-7 for the big one. Would Flash be able to do this?
Streaming solutions such as Icecast are not appropriate here. All you need is simple HTTP.
You don't mention what you are playing these things with on the client side. If you are doing this in Flash, it is relatively simple to preload, or to play while the download is still running.
For audio compression, you should be using MP3. For speech, you can easily get away with a lower bitrate: 48 kbit/s, 44.1 kHz mono is generally acceptable. This will load fine, even on decent mobile connections.
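To see what that bitrate buys you, here is some back-of-the-envelope arithmetic as a Python sketch. It assumes constant-bitrate MP3 and ignores headers and tags; the function names and the example link speed are hypothetical:

```python
def mp3_size_bytes(seconds, kbit_per_s=48):
    """Approximate size of a constant-bitrate MP3 of the given length."""
    return seconds * kbit_per_s * 1000 / 8


def download_seconds(size_bytes, link_kbit_per_s):
    """Time to fetch a file over a link of the given speed."""
    return size_bytes * 8 / (link_kbit_per_s * 1000)
```

At 48 kbit/s a ten-minute narration comes to about 3.6 MB (mp3_size_bytes(600)), and a five-second sentence to about 30 kB, which a 1000 kbit/s connection would fetch in roughly a quarter of a second before latency.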
In any case, HTTP is the way to go. That way you can request the separate files easily. Icecast is for a single stream that runs for a while, such as internet radio.
OK -- so I did some investigation and figured out what the competition was using. It was this:
http://www.schillmania.com/projects/soundmanager2/
Basically, it tries to use HTML5 audio tags with the ever-so-helpful preload attribute set, and if it can't do that, it falls back on Flash to preload the MP3.

Available options for playing a stream or a remote mp3 file on iOS 4

I am trying to make an application for listening to podcasts. Each podcast is an mp3 file, around 50MB in size. After reviewing the Using Audio chapter of the Multimedia Programming Guide, I decided to use AVPlayer, as the other options did not seem appropriate. However, the more I work with AVFoundation, the more complicated it seems and I have a feeling that simply streaming an mp3 file should be easier. Plus on the top of this document, there is a note stating:
Important: This document contains information that used to be in iOS Application Programming Guide. The information in this document has not been updated specifically for iOS 4.0
Does that mean that I have some other options, or that AVFoundation is maybe overkill for what I need to do? I would really appreciate it if someone could clear things up a bit and let me know if I'm doing something wrong here.
Thanks in advance!
You should explore Cocos Denshion.
http://www.cocos2d-iphone.org/wiki/doku.php/cocosdenshion:cookbook
The audio engine comes with cocos2d, and it is just 5 classes you can include with your project.
It's very simple to use, as you can see from the above link. It's basically just a wrapper for some AVFoundation classes.
The only trick will be to stream your mp3, but it looks like you can simply update the Cocos Denshion CDAudioManager to hand a URL to the AVAudioPlayer, as a start. Whether or not that satisfies your streaming requirement, I don't know.
At the very least, it will give you some AVFoundation code to study.
I just found a pdf with a nice overview of some possible options from this course blog. Together with Julian's suggestion this is all I could find so far.

Resources