automatically partition audio files into small parts - audio

I am looking for a way to automatically extract parts from audio files. Something like Imagemagick for audio files.
I only need to extract random parts of a fixed length from a large set of complete ogg-vorbis files. I easily know how to automatically interpret the output from a programm, so I would be able to write a small script if I had programs to do the following:
Get the length of the file
Extract parts of the given an offset in seconds and a length
Is there any program, which allows me to do this under linux? The files I am using are ogg vorbis files.
If there is a python library, which is able to do this, it would work as well.

You can use SoX (Sound eXchange) to do both.

Related

Is there a way to set the details of a file in Windows using python?

I want to be able to set the "Title" and "Comments" (listed in properties->details) of some mp3 files in Windows using python. Is this possible, perhaps with a library like PyWin32? Also, would these details be visible in other operating systems or are they Windows-specific? Thanks.
Simple Answer:
Yes, you can set 'Title' and 'Comments' (and many other fields) of an mp3 file in Windows using Python.
Also, the details are visible on all operating systems and are not windows specific.
First you have to understand what is mp3 file and how data is organized within an mp3 file.
Detailed Answer:
Raw audio consumes a lot of size. For example, an audio signal of 10 sec sampled 48 kHz and having a bit depth of 16 bits per sample will be of size 10*48000*16 bits, which is close to 1 MB. So, for a 5 minute song, it will almost take 30 MB. But, if you observe, most 5 min mp3 songs are of size around 5 MB (of course it depends on sampling frequency, bit depth and amount of compression used). How is it possible? It is possible because we compress the data using signal processing techniques which in itself is a big topic altogether which we will not discuss here. So, to create an mp3 file we need something called encoder which converts the raw audio data to compressed data and every time you play an mp3 song, decoder is used which converts the data from compressed format to raw audio, which is what you can only listen. So, compression is done for saving storage and also transmission bandwidth (basically saving amount of data to be transmitted over internet).
Now, coming to how data is organized inside an mp3 file. mp3 file will obviously contain the compressed data. In addition many mp3 files contain some meta data (like Title and Comments you mentioned in your question). There are several formats for storing this meta data. So, a decoder which is decoding mp3 file should also support decoding of meta-data, then only you can see the information, other wise you can't see. The meta data is operating system independent, and can be seen on any operating system provided you have a proper decoder.
Finally, yes you can edit the meta data on windows (for that matter on any OS) using python. If you want to do this, using only python without any library, you need to understand how data is organized inside an mp3 file, find the meta-data inside it, edit it and store it back. But, there are libraries and packages in python which support editing meta-data of mp3 file. You can use them directly. Also, the meta data is independent of OS, and once you edit your properties, you should be able to see the properties in any OS provided the decoder you use has the support.
Some links which will help you:
mp3 tag tool
Another stack overflow question which gives details about libraries that support viewing and editing of meta data using Python

A simple way to turn raw ascii data into sound

I'm looking for method to process simple data into an audio output such as an mp3 file. The data is in the form of a two-column text file with a time signature in milliseconds and a level in millivolts.
Ideally the method would be script-able (with linux or unix tools). I have tried using Audacity to read raw data, however it seems to expect binary files, and doesn't seem to be flexible with sample rates etc.

Converting Audio From Unknown Format

I would like to create a utility in either PHP or Perl to convert an audio file created by the Nortel's Callpilot voice mail system into a wave file. The problem is that the format, which has the .vbk file extension, is unknown to virtually any audio player. To date, I have not found one that will play a .vbk file. I've looked at audio file conversion libraries in CPAN and tried many of them, they don't recognize the file. I was not successful with PHP's audio formats manipulation either. Nortel does provide a converter, however, it does not suite my needs. I would like to have this run via cron on a CentOS system. I don't know how to reverse engineer this format. There seems to be just scraps of info on this format on the web. This page indicates that it is "based on the H.232 format":
https://www.odesk.com/o/jobs/job/Reverse-Engineer-Nortel-VBK-Audio-Format_~~f501f11679f3f6bb/
I know this is a very old thread, but I've recently been looking into converting Nortel's vbk format as well. Importing the vbk files into Audacity with raw data option, Encoding: U-Law, Byte order: little-endian, Channels: 1 Channel (Mono), Sample rate: 8000 Hz. Not sure if they have multiple formats for their vbk files, but mine were from a BCM50 phone system.
Well, this is the joy of closed proprietary systems. But there is a chance they could play nice. Try to contact Callpilot and see if they'll give you the format specs. It's worth a shot.
As for reverse engineering, you need to be able to generate known content. Like a constant tone at 60Hz for exactly 1 second. Then at 50Hz. Then at 10 seconds. Compare them. Isolate the data from the metadata. There is going to be compression involved, so try a handful of common compression schemes, maybe research into Nortel's practices will probably tell you more. If you can feed that into a player and get a tone back out, you're on your way.
There's probably more informed and structured ways to go about reverse engineering, but from my experience it's a lot of trial and error.

Efficiently generating time index of pre-transcribed speech using it's audio source and open source tools

On TED.com they have transcriptions and they go to the appropriate section of the video when clicking a part of the transcription.
I want to do this for 80 hours of audios and transcriptions I have, on Linux with OSS.
This is the approach I'm thinking:
Start small with a 30 minuite sample
Split the audio up into 2 minute WAV file formatted chunks, even if it breaks words up
Run the phrase spotter from CMU Sphinx's long-audio-aligner on each chunk, with the transcript
Take the time index for identified words/phrases found in each bit and calculate the actual estimated time of the ngrams in the original audio file.
Does this seem like an efficient approach? Has anyone actually done this?
Are there alternate approaches that are worth trying like dumb word counting that may be accurate enough?
You can just feed all your audio and text in a long audio aligner and it will give you the timestamps of the words. Using this timestamps you can jump to the specific word in a file.
I'm not sure why do you want to split your audio or do something else.

Hashing raw audio data

I'm looking for a solution to this task: I want to open any audio file (MP3,FLAC,WAV), then proceed it to the extracted form and hash this data. The thing is: I don't know how to get this extracted audio data. DirectX could do the job, right? And also, I suppose if I have fo example two MP3 files, both 320kbps and only ID3 tags differ and there's a garbage inside on of the files mixed with audio data (MP3 format allows garbage to be inside) and I extract both files, I should get the exactly same audio data, right? I'd only differ if one file is 128 and the other 320, for example. Okay so, the question is, is there a way to use DirectX to get this extracted audio data? I imagine it'd be some function returning byte array or something. Also, it would be handy to just extract whole file without playback. I want to process hundreds of files so 3-10min/s each (if files have to be played at natural speed for decoding) is way worse that one second for each file (only extracting)
I hope my question is understandable.
Thanks a lot for answers,
Aaron
Use http://sox.sourceforge.net/ (multiplatform). It's faster than realtime as you'd like, and it's designed for batch mode much more than DirectX. For example, sox -r 48k -b 16 -L -c 1 in.mp3 out.raw. Loop that over your hundreds of files using whatever scripting language you like (bash, python, .bat, ...).

Resources