I have a flow where iOS app users will record a large video file and upload it to our server. After the fact, the user might want to extract certain portions of that larger video based on specific time stamps and generate a highlight reel that can be viewed and shared locally back on the iOS device.
As a FE developer I don't really have much experience with where to even start here. Our BE will be built in NodeJS. It seems to me that this should be a relatively straightforward problem to solve, but I don't know.
Are there APIs that make movie manipulation easy? Can I easily extract a clip based on a start and stop time and save that as a separate file? Are those costly tasks? Or not too bad?
I'm guessing that the response to this call would be a list of a series of file names that have been generated as a result of these clips being generated, that the iOS app could then pull down and load.
It's not quite as straightforward as it might seem as video files are quite structured with header information and indexing into the individual video and audio tracks and frames. Any splitting up or cropping needs to allow for this and also create new files with the correct headers and indexing etc.
Fortunately, there are indeed libraries that you can use to do this type of thing, one of the most powerful being ffmpeg.
There are projects which allow the ffmpeg command line tool be used programatically - the advantage of this approach is that you get to leverage the vast community knowledge base for ffmpeg command line.
One of the popular ones for nodejs is:
https://github.com/damianociarla/node-ffmpeg
You can then look at the ffmpeg documentation or community answers to find the particularly functionality you need - for example to crop video at a start and end time as you asked:
https://stackoverflow.com/a/42827058/334402
https://superuser.com/a/704118
The general idea is quite simple and will be of the format:
ffmpeg -i yourInputVideo.mp4 -ss 01:30:00 -to 02:30:00 -c copy copy yourNewOutputVideo.mp4
It's worth taking a look at the seeking info in the ffmpeg online documentation (https://ffmpeg.org/ffmpeg.html) to help understand the examples, especially the second one above:
-ss position (input/output)
When used as an input option (before -i), seeks in this input file to position. Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded. When doing stream copy or when -noaccurate_seek is used, it will be preserved.
When used as an output option (before an output url), decodes but discards input until the timestamps reach position.
position must be a time duration specification, see (ffmpeg-utils)the Time duration section in the ffmpeg-utils(1) manual.
I want to be able to set the "Title" and "Comments" (listed in properties->details) of some mp3 files in Windows using python. Is this possible, perhaps with a library like PyWin32? Also, would these details be visible in other operating systems or are they Windows-specific? Thanks.
Simple Answer:
Yes, you can set 'Title' and 'Comments' (and many other fields) of an mp3 file in Windows using Python.
Also, the details are visible on all operating systems and are not windows specific.
First you have to understand what is mp3 file and how data is organized within an mp3 file.
Detailed Answer:
Raw audio consumes a lot of size. For example, an audio signal of 10 sec sampled 48 kHz and having a bit depth of 16 bits per sample will be of size 10*48000*16 bits, which is close to 1 MB. So, for a 5 minute song, it will almost take 30 MB. But, if you observe, most 5 min mp3 songs are of size around 5 MB (of course it depends on sampling frequency, bit depth and amount of compression used). How is it possible? It is possible because we compress the data using signal processing techniques which in itself is a big topic altogether which we will not discuss here. So, to create an mp3 file we need something called encoder which converts the raw audio data to compressed data and every time you play an mp3 song, decoder is used which converts the data from compressed format to raw audio, which is what you can only listen. So, compression is done for saving storage and also transmission bandwidth (basically saving amount of data to be transmitted over internet).
Now, coming to how data is organized inside an mp3 file. mp3 file will obviously contain the compressed data. In addition many mp3 files contain some meta data (like Title and Comments you mentioned in your question). There are several formats for storing this meta data. So, a decoder which is decoding mp3 file should also support decoding of meta-data, then only you can see the information, other wise you can't see. The meta data is operating system independent, and can be seen on any operating system provided you have a proper decoder.
Finally, yes you can edit the meta data on windows (for that matter on any OS) using python. If you want to do this, using only python without any library, you need to understand how data is organized inside an mp3 file, find the meta-data inside it, edit it and store it back. But, there are libraries and packages in python which support editing meta-data of mp3 file. You can use them directly. Also, the meta data is independent of OS, and once you edit your properties, you should be able to see the properties in any OS provided the decoder you use has the support.
Some links which will help you:
mp3 tag tool
Another stack overflow question which gives details about libraries that support viewing and editing of meta data using Python
On TED.com they have transcriptions and they go to the appropriate section of the video when clicking a part of the transcription.
I want to do this for 80 hours of audios and transcriptions I have, on Linux with OSS.
This is the approach I'm thinking:
Start small with a 30 minuite sample
Split the audio up into 2 minute WAV file formatted chunks, even if it breaks words up
Run the phrase spotter from CMU Sphinx's long-audio-aligner on each chunk, with the transcript
Take the time index for identified words/phrases found in each bit and calculate the actual estimated time of the ngrams in the original audio file.
Does this seem like an efficient approach? Has anyone actually done this?
Are there alternate approaches that are worth trying like dumb word counting that may be accurate enough?
You can just feed all your audio and text in a long audio aligner and it will give you the timestamps of the words. Using this timestamps you can jump to the specific word in a file.
I'm not sure why do you want to split your audio or do something else.
I am looking for a way to automatically extract parts from audio files. Something like Imagemagick for audio files.
I only need to extract random parts of a fixed length from a large set of complete ogg-vorbis files. I easily know how to automatically interpret the output from a programm, so I would be able to write a small script if I had programs to do the following:
Get the length of the file
Extract parts of the given an offset in seconds and a length
Is there any program, which allows me to do this under linux? The files I am using are ogg vorbis files.
If there is a python library, which is able to do this, it would work as well.
You can use SoX (Sound eXchange) to do both.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
I've got many, many mp3 files that I would like to merge into a single file. I've used the command line method
copy /b 1.mp3+2.mp3 3.mp3
but it's a pain when there's a lot of them and their namings are inconsistent. The time never seems to come out right either.
David's answer is correct that just concatenating the files will leave ID3 tags scattered inside (although this doesn't normally affect playback, so you can do "copy /b" or on UNIX "cat a.mp3 b.mp3 > combined.mp3" in a pinch).
However, mp3wrap isn't exactly the right tool to just combine multiple MP3s into one "clean" file. Rather than using ID3, it actually inserts its own custom data format in amongst the MP3 frames (the "wrap" part), which causes issues with playback, particularly on iTunes and iPods. Although the file will play back fine if you just let them run from start to finish (because players will skip these is arbitrary non-MPEG bytes) the file duration and bitrate will be reported incorrectly, which breaks seeking. Also, mp3wrap will wipe out all your ID3 metadata, including cover art, and fail to update the VBR header with the correct file length.
mp3cat on its own will produce a good concatenated data file (so, better than mp3wrap), but it also strips ID3 tags and fails to update the VBR header with the correct length of the joined file.
Here's a good explanation of these issues and method (two actually) to combine MP3 files and produce a "clean" final result with original metadata intact -- it's command-line so works on Mac/Linux/BSD etc. It uses:
mp3cat to combine the MPEG data frames only into a continuous file, then
id3cp to copy all metadata over to the combined file, and finally
VBRFix to update the VBR header.
For a Windows GUI tool, take a look at Merge MP3 -- it takes care of everything. (VBRFix also comes in GUI form, but it doesn't do the joining.)
As Thomas Owens pointed out, simply concatenating the files will leave multiple ID3 headers scattered throughout the resulting concatenated file - so the time/bitrate info will be wildly wrong.
You're going to need to use a tool which can combine the audio data for you.
mp3wrap would be ideal for this - it's designed to join together MP3 files, without needing to decode + re-encode the data (which would result in a loss of audio quality) and will also deal with the ID3 tags intelligently.
The resulting file can also be split back into its component parts using the mp3splt tool - mp3wrap adds information to the IDv3 comment to allow this.
Use ffmpeg or a similar tool to convert all of your MP3s into a consistent format, e.g.
ffmpeg -i originalA.mp3 -f mp3 -ab 128kb -ar 44100 -ac 2 intermediateA.mp3
ffmpeg -i originalB.mp3 -f mp3 -ab 128kb -ar 44100 -ac 2 intermediateB.mp3
Then, at runtime, concat your files together:
cat intermediateA.mp3 intermediateB.mp3 > output.mp3
Finally, run them through the tool MP3Val to fix any stream errors without forcing a full re-encode:
mp3val output.mp3 -f -nb
The time problem has to do with the ID3 headers of the MP3 files, which is something your method isn't taking into account as the entire file is copied.
Do you have a language of choice that you want to use or doesn't it matter? That will affect what libraries are available that support the operations you want.
MP3 files have headers you need to respect.
You could ether use a library like Open Source Audio Library Project and write a tool around it.
Or you can use a tool that understands mp3 files like Audacity.
What I really wanted was a GUI to reorder them and output them as one file
Playlist Producer does exactly that, decoding and reencoding them into a combined MP3. It's designed for creating mix tapes or simple podcasts, but you might find it useful.
(Disclosure: I wrote the software, and I profit if you buy the Pro Edition. The Lite edition is a free version with a few limitations).
As David says, mp3wrap is the way to go. However, I found that it didn't fix the audio length header, so iTunes refused to play the whole file even though all the data was there. (I merged three 7-minute files, but it only saw up to the first 7 minutes.)
I dug up this blog post, which explains how to fix this and also how to copy the ID3 tags over from the original files (on its own, mp3wrap deletes your ID3 tags). Or to just copy the tags (using id3cp from id3lib), do:
id3cp original.mp3 new.mp3
I would use Winamp to do this. Create a playlist of files you want to merge into one, select Disk Writer output plugin, choose filename and you're done. The file you will get will be correct MP3 file and you can set bitrate etc.
I'd not heard of mp3wrap before. Looks great. I'm guessing someone's made it into a gui as well somewhere. But, just to respond to the original post, I've written a gui that does the COPY /b method. So, under the covers, nothing new under the sun, but the program is all about making the process less painful if you have a lot of files to merge...AND you don't want to re-encode AND each set of files to merge are the same bitrate. If you have that (and you're on Windows), check out Mp3Merge at: http://www.leighweb.com/david/mp3merge and see if that's what you're looking for.
If you want something free with a simple user interface that makes a completely clean mp3 I recommend MP3 Joiner.
Features:
Strips ID3 data (both ID3v1 and ID3v2.x) and doesn't add it's own (unlike mp3wrap)
Lossless joining (doesn't decode and re-encode the .mp3s). No codecs required.
Simple UI (see below)
Low memory usage (uses streams)
Very fast (compared to mp3wrap)
I wrote it :) - so you can request features and I'll add them.
Links:
MP3 Joiner website: Here
Latest installer: Here
Personally I would use something like mplayer with the audio pass though option eg -oac copy
Instead of using the command line to do
copy /b 1.mp3+2.mp3 3.mp3
you could instead use "The Rename" to rename all the MP3 fragments into a series of names that are in order based on some kind of counter. Then you could just use the same command line format but change it a little to:
copy /b *.mp3 output_name.mp3
That is assuming you ripped all of these fragment MP3's at the same time and they have the same audio settings. Worked great for me when I was converting an Audio book I had in .aa to a single .mp3. I had to burn all the .aa files to 9 CD's then rip all 9 CD's and then I was left with about 90 mp3's. Really a pain in the a55.