Difference between audio encoding/decoding and format conversion

Recently I have been trying to convert an audio file from one format to another with ffmpeg. I searched Google, but the results left me a little confused about the difference between encoding/decoding an audio file and converting it from one format to another.

Let me describe it this way: there are several different file formats for video files (sometimes also called "wrappers" or "containers"). There are also several different codecs that can be used to encode (compress) the audio and video. Audio and video use different codecs, and the encoded streams can be stored in different file types/formats.
So when you talk about "encoding" vs. "converting", a couple of things come into play.
"Encoding" is the act of taking audio/video and encoding it with a given codec (or codecs). "Converting" implies having material in one format but wanting it in another. There are two ways of looking at this:
1. Repackaging (often called "remuxing"): the video (for example) has been encoded correctly (let's say H.264, with a bunch of parameters), but you want it in a different file type; maybe it's an .AVI and you want it in an .MP4. This doesn't involve changing the actual video, just re-wrapping the H.264 stream in a new "wrapper", so it is a fast operation.
2. Re-encoding: let's say your audio is in MP3 format and you want it in AAC format. This requires decoding the entire MP3 stream and re-encoding it into AAC.
Obviously you can also do "1" and "2" together.
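To make the two cases concrete, here is a small Python sketch that only builds an ffmpeg command line (it does not run anything). The helper name ffmpeg_args and the file names are hypothetical; the ffmpeg options used are the standard -i, -c:v and -c:a, where the codec name copy passes a stream through untouched (repackaging) and any other codec name makes ffmpeg decode and re-encode that stream.

```python
from typing import List, Optional

def ffmpeg_args(src: str, dst: str,
                video_codec: Optional[str] = None,
                audio_codec: Optional[str] = None) -> List[str]:
    """Build (but do not run) an ffmpeg command line.

    A codec of None means "copy this stream as-is" (repackaging only);
    a codec name means "decode and re-encode this stream with that codec".
    """
    args = ["ffmpeg", "-i", src]
    # "copy" tells ffmpeg to pass the already-encoded stream through untouched.
    args += ["-c:v", video_codec or "copy"]
    args += ["-c:a", audio_codec or "copy"]
    args.append(dst)
    return args

# 1. Repackage only: AVI -> MP4, both streams copied.
print(ffmpeg_args("movie.avi", "movie.mp4"))
# 2. Re-encode only: MP3 -> AAC.
print(ffmpeg_args("song.mp3", "song.m4a", audio_codec="aac"))
# 1 and 2 together: copy the video, re-encode the audio, new container.
print(ffmpeg_args("movie.avi", "movie.mp4", audio_codec="aac"))
```

The resulting list could be passed to subprocess.run. Note that a plain stream copy only works if the target container supports the codec being copied.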
Refer to "Formats and Codecs" for detailed information.
Hope it helps!

Related

Safely parsing the output of file (or libmagic)

I'm writing some code where I rely on the file utility to determine the file type of arbitrary files, typically audio files. For the most part it works great; an Ogg file, for example, might give the following output:
Ogg data, Vorbis audio, mono, 44100 Hz, ~80000 bps, created by: Xiph.Org libVorbis I (1.0.1)
A simple regexp can classify this as Ogg Vorbis.
But for some other file types, file tries to get clever. An NSF (NES Sound Format) file, for example, can yield this output:
NES Sound File ("The Legend of Zelda" by Konchano, copyright 1987 Nintendo), version 1, 8 tracks, NTSC
"NES Sound File" is clear enough, but it is followed by a string of unstructured data that is clearly just copied from the file itself. A malicious user could create an nsf file where this string is replaced by something like "Ogg data, Vorbis audio", making classification a lot harder.
Now let's say I fix this by discarding anything within parentheses (ignoring the fact that the title of the track could itself contain parentheses). Then along comes a Protracker module:
4-channel Protracker module sound data Title: "space_debris"
Again, untrusted data straight from the file, in a different position, now with the prefix "Title:". I can attempt to filter it out but really this is becoming a hassle.
I'm not finding any help in the man page. Is there really no way to tell file not to mix these unsafe strings into its output? Or is file simply not the right tool for this job?
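A partial mitigation, sketched here under the assumption that file always prints its own fixed description first and only appends text copied from the file after it (as in all three examples above), is to match a whitelist of descriptions anchored at the start of the output instead of searching anywhere in it. The pattern table and the classify helper are hypothetical, and this does not settle the general question: nothing guarantees that assumption holds for every magic entry.

```python
import re
from typing import Optional

# Hypothetical whitelist: each label maps to a pattern that must match at the
# very beginning of file's output, where file's own description appears.
KNOWN_TYPES = {
    "ogg_vorbis": re.compile(r"Ogg data, Vorbis audio\b"),
    "nsf": re.compile(r"NES Sound File\b"),
    "protracker": re.compile(r"\d+-channel Protracker module sound data\b"),
}

def classify(description: str) -> Optional[str]:
    """Return a label for file's output, or None if it is not whitelisted."""
    for label, pattern in KNOWN_TYPES.items():
        # re.match only matches at position 0, so untrusted text copied from
        # the file later in the line cannot produce a match by itself.
        if pattern.match(description):
            return label
    return None

# A spoofed NSF title is still classified as NSF, not as Ogg Vorbis.
print(classify('NES Sound File ("Ogg data, Vorbis audio"), version 1, 8 tracks, NTSC'))
```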

Merging of two AAC files into a single file

I am trying to merge two different AAC audio files and an H.264 video file into a single TS file using C++ code, and I have managed to do it. My TS file now contains data in the following order: first video from the video file, then audio from the first audio file, then audio from the second audio file, then video again, and so on. When I play the resulting file, I can hear both audio files along with the video. The problem is that the resulting audio is not very clear: there are audible distortions that make it hard to hear. The audio also seems slower than the original. Can anyone help me get rid of these distortions and produce an exact replica of my original files?
Thanks,
Ashish.
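One thing that may be worth checking, offered as a guess rather than a diagnosis, is how the audio PES timestamps are computed, since audio that plays too slowly is often a timing problem. MPEG-TS timestamps tick at 90 kHz, and assuming the files are AAC-LC, each frame carries 1024 samples per channel, so frame n of a stream should get a PTS of n * 1024 * 90000 / sample_rate (plus that stream's starting offset). The helper below is hypothetical and covers only this arithmetic, not TS packetization; each of the two audio streams would need its own frame counter.

```python
PTS_CLOCK_HZ = 90_000         # MPEG-TS timestamps tick at 90 kHz
SAMPLES_PER_AAC_FRAME = 1024  # assumption: AAC-LC, 1024 samples per channel per frame
PTS_WRAP = 1 << 33            # PTS is a 33-bit field

def aac_frame_pts(frame_index: int, sample_rate: int, first_pts: int = 0) -> int:
    """PTS (90 kHz ticks) of a given AAC frame within one audio stream."""
    # Compute from the frame index so rounding errors do not accumulate.
    ticks = frame_index * SAMPLES_PER_AAC_FRAME * PTS_CLOCK_HZ // sample_rate
    return (first_pts + ticks) % PTS_WRAP

for n in range(3):
    print(n, aac_frame_pts(n, 44100))
```

Computing each PTS from the frame index, rather than adding a rounded per-frame increment, avoids drift: at 44.1 kHz the increment is about 2089.8 ticks, and adding a truncated 2089 for 441 frames would end up 351 ticks (about 3.9 ms) early.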

RTMP audio message (0x08) format (MP3)

I'm trying to write a small RTMP client (audio only). So far I have the communication with the server (Red5) working, but now I'm stuck on the audio data.
The server is sending MP3, 44 kHz, 16-bit stereo.
I receive an audio message that consists of a byte identifying the codec (0x2f) followed by the audio data, which looks like this, for example:
ff:fb:92:64:eb:80:03:98:58:d2:e9:26:1b:7e:5d:e7:4a:1a:19:26:5c:8b:89:07:47:44:98:6b:91:2d:9c:28:b4:33:15:70:82:c9:29:87:8d:e4:8f:31:83:84:7b:e5:82:b5:57:62:00:02:e5:bb:f1:86:15:7a:8f:da:9e:ca:4f:83:9d:0a:c4:56:7b:b3:3d:56:43:ba:2b:28:b8:9d:0c:e1:82:0c:08:36:24:f3:39:67:54:b7:41:d9:8e:ef:36:96:56:22:d2:b9:9f:ae:40:43:8e:ea:39:52:0c:a4:48:25:02:54:91:c7:35:37:2d:be:f2:37:23:61:65:35:d9:0f:aa:18:b4:37:d9:d4:c8:68:21:3c:bd:ea:c1:d0:98:df:eb:96:59:99:88:09:37:36:c3:8b:47:80:64:84:41:ba:35:ea:a6:0a:d6:74:9e:09:f6:a5:d7:3f:1f:53:d8:fb:8d:d9:d3:f8:ee:c7:c1:68:25:25:8e:ae:6a:1c:08:52:9d:58:cf:cf:87:c1:ba:a4:f0:63:76:b0:b4:65:79:1b:3b:21:5f:2f:b5:7a:18:43:af:f7:fd:15:0c:87:c9:73:54:95:22:94:cc:cb:e3:da:4d:e0:f3:8a:95:69:69:eb:32:71:57:08:49:76:e0:f3:84:8c:4b:4c:84:6b:5d:7a:c8:c9:d7:df:d5:e2:68:bb:5f:6c:9f:ba:f4:0a:6c:6e:51:8a:b3:59:9a:07:0c:e4:2a:9d:ec:d1:99:53:48:f2:8b:22:b2:d3:bf:e1:5b:9f:ee:49:9f:2c:ee:63:1f:6f:da:90:e7:65:00:55:99:97:77:b9:e8:97:43:81:fd:32:e4:81:20:d0:78:f5:4f:59:47:39:f2:57:5d:f4:d5:91:48:c9:45:10:52:49:4d:04:87:6b:0e:a5:72:ed:34:74:08:93:5b:8a:54:3a:d9:7e:53:8f:c7:5e:b1:99:f3:55:63:72:49:99:55:3a:b8:0d:73:3b:2a:ea:9a:b5:32:d2:3b:61:c2:4e:e9:56:78:99:14:4a:a7:46:f4:ee:ae:6f:ff:c8:85:2d:07:68:ad:e2:84:dd:0a:bd:2e:93:12:43
I can't find anything about the data format. Since the first byte is always 0xff, I assume every chunk of audio data has a small header describing its contents.
The RTMP spec from Adobe doesn't say a single word about the format of the audio message payload (just two lines saying it's an audio message... wow).
Does anyone know the format of the audio messages, or at least a source where I can find it?
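The codec byte 0x2f is consistent with the audio tag header from Adobe's FLV file format specification, which RTMP audio message payloads reuse: the upper four bits are SoundFormat, followed by two bits of SoundRate, one bit of SoundSize and one bit of SoundType. A small Python sketch to decode it (the helper name and the partial format table are illustrative):

```python
from typing import Dict

# SoundFormat values from the FLV spec (partial list).
SOUND_FORMATS = {
    0: "Linear PCM, platform endian",
    1: "ADPCM",
    2: "MP3",
    3: "Linear PCM, little endian",
    10: "AAC",
    11: "Speex",
    14: "MP3 8 kHz",
}
SOUND_RATES = ["5.5 kHz", "11 kHz", "22 kHz", "44 kHz"]

def decode_audio_header_byte(b: int) -> Dict[str, object]:
    """Split the first byte of an RTMP/FLV audio payload into its fields."""
    sound_format = (b >> 4) & 0xF
    return {
        "format": SOUND_FORMATS.get(sound_format, f"other ({sound_format})"),
        "rate": SOUND_RATES[(b >> 2) & 0x3],     # bits 3-2
        "bits": 16 if (b >> 1) & 0x1 else 8,     # bit 1
        "channels": 2 if b & 0x1 else 1,         # bit 0
    }

print(decode_audio_header_byte(0x2F))
```

For 0x2f this gives MP3, 44 kHz, 16-bit, stereo, matching what the server is sending. The bytes after this header byte are then ordinary MP3 data, which would explain why they start with 0xff: that is the first byte of an MP3 frame sync.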
The Adobe spec doesn't document the elementary stream formats because they are covered in their own documents, and usually quite large. MP3 is covered by ISO/IEC 11172-3.
There is a good rundown available here:
http://www.mpgedit.org/mpgedit/mpeg_format/mpeghdr.htm
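As a worked example of the header described at that link, here is a Python sketch that parses the first four bytes of the dump in the question (ff:fb:92:64). It handles only MPEG-1 Layer III, the case in the question; other versions and layers use different bitrate and sample-rate tables, and the helper name is hypothetical.

```python
from typing import Dict, List, Optional

# MPEG-1 Layer III bitrates in kbps, indexed by the 4-bit bitrate field.
# Index 0 means "free format" and index 15 is invalid; both are rejected below.
BITRATES_KBPS: List[Optional[int]] = [None, 32, 40, 48, 56, 64, 80, 96,
                                      112, 128, 160, 192, 224, 256, 320, None]
# MPEG-1 sample rates, indexed by the 2-bit field (3 is reserved).
SAMPLE_RATES: List[Optional[int]] = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]

def parse_mp3_header(data: bytes) -> Dict[str, object]:
    """Parse a 4-byte MPEG-1 Layer III frame header."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    word = int.from_bytes(data[:4], "big")
    if word >> 21 != 0x7FF:               # 11-bit frame sync, all ones
        raise ValueError("no frame sync")
    version = (word >> 19) & 0x3          # 3 = MPEG-1
    layer = (word >> 17) & 0x3            # 1 = Layer III
    if version != 3 or layer != 1:
        raise ValueError("only MPEG-1 Layer III is handled here")
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES[(word >> 10) & 0x3]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or invalid header")
    padding = (word >> 9) & 0x1
    return {
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0x3],
        "padding": padding,
        # Protection bit 0 means a 16-bit CRC follows the header.
        "crc_protected": ((word >> 16) & 0x1) == 0,
        # Layer III frame length in bytes, header included.
        "frame_length": 144 * bitrate * 1000 // sample_rate + padding,
    }

print(parse_mp3_header(bytes.fromhex("fffb9264")))
```

For ff fb 92 64 this yields 128 kbps, 44100 Hz, joint stereo, padding set, no CRC, and a frame length of 418 bytes including the header, so the next frame header would be expected 418 bytes after the first 0xff.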

Multiple audio streams in an MPEG-4 file

The MPEG-4 file format allows multiple streams to be present in a file.
This is useful for videos containing audio in multiple languages. In the case of such a video, the audio streams are synchronized to the video.
Is it possible to create an MPEG-4 file that contains unsynchronized audio streams, i.e. audio tracks that are played one after another?
I want to design an MPEG-4 file that contains a music album, so it is crucial that media players such as VLC play the tracks one after another.
When I use MP4Box (from the GPAC framework) the resulting file is recognised by VLC as having synchronized audio streams. Which box of the MPEG-4 file format is responsible for this? Or how can I tell VLC that these audio streams are not synchronized?
Thanks in advance!
I can think of two ways you could do that, and both would be somewhat problematic.
You could concatenate all the audio streams into one audio track in the MP4 file. This won't be ideal, for some obvious reasons. For one thing, it's not exactly what you were asking for.
You could also just store the tracks as synchronized audio streams, but set the timing information in such a way that the first sample of the second track won't start playing until the first track finished playing, etc.
I'm not aware of any tools that can do this, but the file format supports such a scheme. Since it's an unusual way to store audio in an MP4 file, I would expect players to have problems with it, too.
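To illustrate the timing approach described above: in the ISO base media file format (ISO/IEC 14496-12), a track's start can be delayed with an edit list ('elst' inside 'edts') whose first entry is an "empty edit" (media_time = -1) lasting as long as all the previous tracks. The sketch below only computes those entries as plain tuples; it does not write any boxes, the function name is hypothetical, it assumes each track's media starts at media time 0, and as noted above, players may not honor such a file the way one would hope.

```python
from typing import List, Tuple

# One edit list entry: (segment_duration in movie timescale units, media_time).
EditEntry = Tuple[int, int]
EMPTY_EDIT = -1  # media_time value that marks an empty edit (nothing plays)

def sequential_edit_lists(durations_s: List[float],
                          movie_timescale: int) -> List[List[EditEntry]]:
    """Edit lists that make each track start when the previous one ends."""
    edit_lists: List[List[EditEntry]] = []
    offset = 0  # where the current track starts on the movie timeline
    for duration in durations_s:
        length = round(duration * movie_timescale)
        entries: List[EditEntry] = []
        if offset > 0:
            # Silence for the combined length of all earlier tracks.
            entries.append((offset, EMPTY_EDIT))
        # Then play this track's media from its beginning.
        entries.append((length, 0))
        edit_lists.append(entries)
        offset += length
    return edit_lists

print(sequential_edit_lists([220.0, 180.0], 1000))
```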
Concatenating all streams would work and the individual tracks can be addressed by adding chapters. It works at least with VLC.
MP4Box -new -cat track1.m4a -cat track2.m4a -chap chapters.txt album.m4a
The chapters.txt would look something like this:
CHAPTER1=00:00:00.00
CHAPTER1NAME=Track 1
CHAPTER2=00:03:40.00
CHAPTER2NAME=Track 2
But this is only a hack.
The solution I'm looking for should preserve the tracks as individual streams.
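If one does go the chapter route, chapters.txt can be generated from the track durations instead of being typed by hand. A minimal Python sketch, assuming the durations are already known in seconds and reusing the HH:MM:SS.hh layout from the example above (the helper names are hypothetical):

```python
from typing import List, Tuple

def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.hh, like the chapters.txt example."""
    hundredths = round(seconds * 100)
    hours, rem = divmod(hundredths, 360_000)
    minutes, rem = divmod(rem, 6_000)
    secs, cs = divmod(rem, 100)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{cs:02d}"

def chapter_file(tracks: List[Tuple[str, float]]) -> str:
    """Build chapters.txt text from (name, duration in seconds) pairs."""
    lines = []
    start = 0.0
    for i, (name, duration) in enumerate(tracks, start=1):
        # Each chapter starts where the previous tracks end.
        lines.append(f"CHAPTER{i}={format_timestamp(start)}")
        lines.append(f"CHAPTER{i}NAME={name}")
        start += duration
    return "\n".join(lines) + "\n"

print(chapter_file([("Track 1", 220.0), ("Track 2", 180.0)]), end="")
```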

compressed and uncompressed .wav files

What is the difference between compressed and uncompressed .wav files?
The WAV format is a container format for audio files in Windows.
The WAV file consists of a header and the contents. The header contains information about the size, duration, sampling frequency, resolution, and other information about the audio contained in the WAV file. Generally, after the header is the actual audio data.
Since WAV is a container format, the data it contains can be stored in various formats. One of these is uncompressed PCM, but it can also store ADPCM, MP3, and other formats, which can be read and written if a codec for the format is available.
The difference between compressed and uncompressed WAV files is that the data contained within the WAV file is either uncompressed raw audio samples, or it is compressed using an audio codec, in which case, it must be decompressed before it can be played back.
Further reading:
Wikipedia: Audio compression (data)
Wikipedia: WAV
Wikipedia: Codec
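To see which kind a given WAV file is, one can read the format tag in its 'fmt ' chunk. The sketch below walks the RIFF chunks with Python's struct module and reports that tag; the small name table lists a few well-known tag values (1 = PCM, 2 = Microsoft ADPCM, 3 = IEEE float, 0x11 = IMA ADPCM, 0x55 = MPEG Layer 3, 0xFFFE = extensible) and is not exhaustive. The function name is hypothetical.

```python
import struct
from typing import Tuple

# A few well-known WAVE format tags (not an exhaustive list).
FORMAT_NAMES = {
    0x0001: "PCM (uncompressed)",
    0x0002: "Microsoft ADPCM",
    0x0003: "IEEE float (uncompressed)",
    0x0011: "IMA ADPCM",
    0x0055: "MPEG Layer 3",
    0xFFFE: "WAVE_FORMAT_EXTENSIBLE (see SubFormat GUID)",
}

def wav_format(data: bytes) -> Tuple[int, str]:
    """Return (format tag, description) from the 'fmt ' chunk of WAV bytes."""
    riff, _riff_size, wave = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first chunk follows the 12-byte RIFF header
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            # wFormatTag is the first 16-bit field of the fmt chunk body.
            (tag,) = struct.unpack_from("<H", data, pos + 8)
            return tag, FORMAT_NAMES.get(tag, "other/unknown")
        # Skip this chunk; chunk bodies are padded to an even length.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no 'fmt ' chunk found")
```

A tag of 1 (or 3) means the data is uncompressed samples; most other tags mean a codec is needed to decode it. For 0xFFFE the actual format is given by the SubFormat GUID, which this sketch does not read.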
There's a great explanation here. The basic difference is that an uncompressed wave file has just the raw bits in it as they "appear". There is nothing done to compress or shrink them. A compressed wave file uses some sort of codec to shrink down the data before putting it in the file.
The difference between the two is basically the file size: the compressed one is usually much smaller than the uncompressed one, while the content is basically the same.
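To put a number on the size difference, here is a small arithmetic sketch (hypothetical helper names; 128 kbps is just a common MP3 bitrate chosen for illustration). Uncompressed PCM needs sample_rate * channels * bits_per_sample / 8 bytes per second, while a constant-bitrate compressed stream needs bitrate / 8.

```python
def pcm_bytes(seconds: int, sample_rate: int, channels: int, bits_per_sample: int) -> int:
    """Size of raw PCM sample data in bytes (WAV header not included)."""
    return seconds * sample_rate * channels * bits_per_sample // 8

def cbr_bytes(seconds: int, bitrate_bps: int) -> int:
    """Size of a constant-bitrate compressed stream in bytes."""
    return seconds * bitrate_bps // 8

# One minute of CD-quality audio: 44.1 kHz, stereo, 16-bit.
raw = pcm_bytes(60, 44100, 2, 16)
mp3 = cbr_bytes(60, 128_000)
print(raw, mp3, round(raw / mp3, 1))
```

For one minute of CD-quality audio that is 10,584,000 bytes of PCM versus 960,000 bytes at 128 kbps, roughly an 11:1 ratio.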
You have to be very careful when using the word "uncompressed" when talking about media.
Basically ALL digital media is compressed in some way, whether it is audio or video. No matter what it is, it is compressed in some way. It's intrinsic to converting from analog to digital.
The problem isn't really technical, it's linguistic.
People think that uncompressed means "nothing done to it", when in reality there isn't really any way to do that. There is always some kind of compression done when you convert the analog signal coming out of the mic and going into a file... It's essential.
What uncompressed means is very high quality. And different "uncompressed" codecs do things differently.
I know more about video codecs, so I will base my example on those.
Blackmagic (a company that makes video output cards) has an Uncompressed codec. It's very good and makes beautiful images, but it's not really "uncompressed". Sure, it's big, but compare it to a DPX or TIFF image sequence and it isn't that big, and it is quite compressed. It's only 10-bit, whereas something like an OpenEXR image sequence can be 32-bit... and, coming from film, even that is still technically compressed. It has to be.
It's just the nature of the beast.
