RTMP audio message (0x08) format (MP3)

I'm trying to write a little client for RTMP (audio only). So far I've got the communication working (against a Red5 server), but now I'm stuck on the audio data.
The server is sending MP3 at 44 kHz, 16-bit, stereo.
I get my audio message, which consists of one byte identifying the codec (0x2F) followed by the audio data, which looks, for example, like this:
ff:fb:92:64:eb:80:03:98:58:d2:e9:26:1b:7e:5d:e7:4a:1a:19:26:5c:8b:89:07:47:44:98:6b:91:2d:9c:28:b4:33:15:70:82:c9:29:87:8d:e4:8f:31:83:84:7b:e5:82:b5:57:62:00:02:e5:bb:f1:86:15:7a:8f:da:9e:ca:4f:83:9d:0a:c4:56:7b:b3:3d:56:43:ba:2b:28:b8:9d:0c:e1:82:0c:08:36:24:f3:39:67:54:b7:41:d9:8e:ef:36:96:56:22:d2:b9:9f:ae:40:43:8e:ea:39:52:0c:a4:48:25:02:54:91:c7:35:37:2d:be:f2:37:23:61:65:35:d9:0f:aa:18:b4:37:d9:d4:c8:68:21:3c:bd:ea:c1:d0:98:df:eb:96:59:99:88:09:37:36:c3:8b:47:80:64:84:41:ba:35:ea:a6:0a:d6:74:9e:09:f6:a5:d7:3f:1f:53:d8:fb:8d:d9:d3:f8:ee:c7:c1:68:25:25:8e:ae:6a:1c:08:52:9d:58:cf:cf:87:c1:ba:a4:f0:63:76:b0:b4:65:79:1b:3b:21:5f:2f:b5:7a:18:43:af:f7:fd:15:0c:87:c9:73:54:95:22:94:cc:cb:e3:da:4d:e0:f3:8a:95:69:69:eb:32:71:57:08:49:76:e0:f3:84:8c:4b:4c:84:6b:5d:7a:c8:c9:d7:df:d5:e2:68:bb:5f:6c:9f:ba:f4:0a:6c:6e:51:8a:b3:59:9a:07:0c:e4:2a:9d:ec:d1:99:53:48:f2:8b:22:b2:d3:bf:e1:5b:9f:ee:49:9f:2c:ee:63:1f:6f:da:90:e7:65:00:55:99:97:77:b9:e8:97:43:81:fd:32:e4:81:20:d0:78:f5:4f:59:47:39:f2:57:5d:f4:d5:91:48:c9:45:10:52:49:4d:04:87:6b:0e:a5:72:ed:34:74:08:93:5b:8a:54:3a:d9:7e:53:8f:c7:5e:b1:99:f3:55:63:72:49:99:55:3a:b8:0d:73:3b:2a:ea:9a:b5:32:d2:3b:61:c2:4e:e9:56:78:99:14:4a:a7:46:f4:ee:ae:6f:ff:c8:85:2d:07:68:ad:e2:84:dd:0a:bd:2e:93:12:43
I can't find anything about the data format. As the first byte is always 0xFF, I assume every chunk of audio data has a small header describing its contents.
The RTMP spec from Adobe doesn't spend a single word on the format of the audio message payload (just two lines saying it's an audio message... wow).
Does anyone know the format of the audio messages, or at least a source where I can find something?

The Adobe spec doesn't document the elementary stream formats because they are covered by their own specifications, which are usually quite large. MP3 is covered by ISO/IEC 11172-3.
There is a good rundown available here:
http://www.mpgedit.org/mpgedit/mpeg_format/mpeghdr.htm
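
Your dump already fits that layout: every MP3 frame begins with a 4-byte header whose first 11 bits are all ones, which is why you keep seeing 0xFF. As a minimal sketch (C, handling only the MPEG-1 Layer III case; the bit layout and tables are taken from the mpgedit page above), decoding the first four bytes of your dump looks like this:

    #include <stdio.h>
    #include <stdint.h>

    /* Bitrate (kbps) and sample-rate tables for MPEG-1 Layer III only;
       the full tables are on the mpgedit page linked above. */
    static const int bitrate_kbps[16] =
        {0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320};
    static const int sample_rate[4] = {44100, 48000, 32000, 0};
    static const char *mode_name[4] =
        {"stereo", "joint stereo", "dual channel", "mono"};

    int main(void)
    {
        /* first four bytes of the dump in the question */
        const uint8_t h[4] = {0xff, 0xfb, 0x92, 0x64};
        uint32_t hdr = (uint32_t)h[0] << 24 | h[1] << 16 | h[2] << 8 | h[3];

        if (hdr >> 21 != 0x7FF) {            /* 11-bit sync word */
            fprintf(stderr, "no MPEG audio frame header\n");
            return 1;
        }
        int version = hdr >> 19 & 3;         /* 3 = MPEG-1 */
        int layer   = hdr >> 17 & 3;         /* 1 = Layer III */
        if (version != 3 || layer != 1) {
            fprintf(stderr, "this sketch only handles MPEG-1 Layer III\n");
            return 1;
        }
        int bitrate = bitrate_kbps[hdr >> 12 & 0xF] * 1000;
        int srate   = sample_rate[hdr >> 10 & 3];
        int padding = hdr >> 9 & 1;
        int mode    = hdr >> 6 & 3;

        /* Layer III frames hold 1152 samples; standard frame-length formula */
        int frame_len = 144 * bitrate / srate + padding;

        printf("%d bit/s, %d Hz, %s, frame length %d bytes\n",
               bitrate, srate, mode_name[mode], frame_len);
        return 0;
    }

For your header bytes ff fb 92 64 this prints 128000 bit/s, 44100 Hz, joint stereo, frame length 418 bytes, which matches the 44 kHz stereo MP3 your server claims to send. Everything after the codec byte is simply a piece of the elementary MP3 stream. Incidentally, that leading 0x2F byte is structured too: per the FLV/RTMP audio tag layout, the high nibble (2) means MP3 and the low bits encode 44 kHz, 16-bit, stereo.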

Related

How to combine JPEG frames with uncompressed mono audio into an H.264 stream or any other format processed by web browsers out of the box?

So I have an ESP32 which captures images and sound. The esp32-camera library already returns a JPEG-encoded buffer. The audio, however, is uncompressed: just a digital representation of signal strength at a high sample rate.
I use the ESP32 to host a webpage which contains an <img> element and a JavaScript snippet that constantly sends GET requests to a separate URL for image data and updates the element. This approach is not very good, especially now that I've added audio capabilities to the circuit.
I'm curious whether it would be possible to combine the JPEG-encoded frames and some audio data into a chunk of H.264 and then send it directly as the response to a GET request, making it a stream.
This would not only simplify the whole business of serving multiple webpages, but also remove the problem of syncing audio and video if they are sent separately.
In particular, I'm curious how easy it would be to do on the ESP32, since it doesn't have much RAM or computational power. It would also be challenging to find or port large libraries that could help, so I guess I would have to code it myself.
I'm also not sure that H.264 is the best option. I know it's supported by most browsers out of the box and uses JPEG compression behind the scenes for the frames, but perhaps a simpler format exists which is also widely supported.
So to sum it up: is H.264 the best bet in this context? Is combining JPEG and uncompressed mono audio into H.264 possible in this context? If the answer to either question is no, what alternatives do I have, if any?
I'm curious whether it would be possible to combine the JPEG-encoded frames and some audio data into a chunk of H.264 and then send it directly as the response to a GET request, making it a stream.
H.264 is a video codec. It doesn't have anything to do with audio.
I know it's supported by most browsers out of the box and uses JPEG compression behind the scenes for the frames
No, this isn't true. H.264 is its own thing. It's far more powerful than JPEG and is specifically designed for motion, whereas JPEG was not.
You need a few things:
A video codec, to efficiently handle your frames. Most of these embedded camera libraries can give you an MJPEG stream; I'd use that if possible (see the sketch after this list for how MJPEG is typically served). I don't think your ESP32 has other video encoding capability, does it? H.264 is a good choice, but only if you can actually encode it.
A container format, to aid in streaming your audio and video streams together. ISOBMFF/MP4 is common, as is WebM/Matroska.
If you're only streaming to a single client (which seems likely given the limited horsepower of the board), and if you have enough capability to do the audio/video encoding, you can generate a WebM stream on the fly that is directly playable in a <video> element. This seems to be exactly what you are asking for.
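
If you do go the MJPEG route for the video side, the usual trick is a single HTTP response with Content-Type multipart/x-mixed-replace, where each part is a complete JPEG; browsers render that in an <img> element as a live stream. Note this carries video only, so audio would still need separate handling. A rough sketch of the framing over a plain POSIX socket (send_stream_header(), send_jpeg_frame(), and client_fd are hypothetical names standing in for your own accept/capture loop; partial writes are ignored for brevity):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Write the one-time response header; client_fd is an
       already-accepted TCP connection. */
    static void send_stream_header(int client_fd)
    {
        const char *hdr =
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: multipart/x-mixed-replace; boundary=frame\r\n"
            "\r\n";
        write(client_fd, hdr, strlen(hdr));
    }

    /* Then call this once per captured JPEG buffer. */
    static void send_jpeg_frame(int client_fd,
                                const unsigned char *jpeg, size_t len)
    {
        char part[128];
        int n = snprintf(part, sizeof part,
                         "--frame\r\n"
                         "Content-Type: image/jpeg\r\n"
                         "Content-Length: %zu\r\n\r\n", len);
        write(client_fd, part, n);
        write(client_fd, jpeg, len);
        write(client_fd, "\r\n", 2);
    }

This keeps the per-frame work on the ESP32 down to copying the buffer it already has, which is about as cheap as streaming can get on that hardware.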

File information of .raw audio files using the terminal in Linux

How can I get file information like the sampling rate, bit rate, etc. of .raw audio files using the terminal in Linux? soxi works for .wav files, but it isn't working for .raw.
If your life depended on discovering an answer, you could make some assumptions to tease apart the unknowns... however, there is no automated way, since the missing header is exactly what would give you the easy answers.
The audio editor Audacity allows you to open a raw file, make some guesses about its format, and play the track:
http://www.audacityteam.org
In Audacity go to File -> Import -> Raw Data...
The settings in that import dialog (signed 16-bit PCM, little-endian, 44100 Hz, two channels) are typical for audio ripped from a CD... toy with trying stereo vs. mono for starters.
Those picklist widgets give you wiggle room to discover the format of your PCM audio, given that the source audio is something recognizable when properly rendered... it would be harder if the actual audio were noise.
However, if you need a programmatic method, rolling your own solution that asks the same questions as that dialog is possible... is that what you need, or will Audacity work for you? We can go down the road of writing code to play off the unknowns mentioned in Frank Lauterwald's comment.
To kick-start discovering this information programmatically: if the binary raw audio is 16-bit, then each audio sample (a point on the audio curve) consumes two bytes of your PCM file. For mono audio the following two bytes would be your next sample; if it's stereo, the two following bytes would be the sample for the other channel (for more than two channels, just repeat). Typical audio is little-endian. The sampling rate matters when rendering the audio, not when programmatically parsing raw bytes. One approach would be to create an output file with a WAV header followed by your source PCM data, populating the header with the answers from your guesswork. This way you could listen to the output file to help confirm your guesses; a sketch follows below.
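
As a minimal sketch of that WAV-header trick in C (the guessed parameters are the function arguments; the 44-byte header layout is the standard RIFF/WAVE PCM format):

    #include <stdio.h>
    #include <stdint.h>

    /* Prepend a 44-byte PCM WAV header to raw sample data so a normal
       player will render it with the guessed parameters. */
    static void write_wav_header(FILE *out, uint32_t data_bytes,
                                 uint16_t channels, uint32_t sample_rate,
                                 uint16_t bits_per_sample)
    {
        uint16_t block_align = channels * bits_per_sample / 8;
        uint32_t byte_rate   = sample_rate * block_align;
        uint32_t riff_size   = 36 + data_bytes;
        uint32_t fmt_size    = 16;
        uint16_t pcm         = 1;           /* 1 = linear PCM */

        fwrite("RIFF", 1, 4, out);
        fwrite(&riff_size, 4, 1, out);      /* assumes a little-endian host */
        fwrite("WAVE", 1, 4, out);
        fwrite("fmt ", 1, 4, out);
        fwrite(&fmt_size, 4, 1, out);
        fwrite(&pcm, 2, 1, out);
        fwrite(&channels, 2, 1, out);
        fwrite(&sample_rate, 4, 1, out);
        fwrite(&byte_rate, 4, 1, out);
        fwrite(&block_align, 2, 1, out);
        fwrite(&bits_per_sample, 2, 1, out);
        fwrite("data", 1, 4, out);
        fwrite(&data_bytes, 4, 1, out);
    }

Write the header with a guess such as 1 channel, 44100 Hz, 16 bits, append the raw bytes, and listen: a wrong channel count or sample rate gives you chipmunk-fast, half-speed, or static-laced audio, which tells you which guess to change next.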
Here is a sample 500 kB mono PCM audio file (signed 16-bit) which can be imported into Audacity or used as input to your own identification code:
The_Constructus_Corporation_Long_Street-ycexQvMy03k_excerpt_mono.pcm

ffmpeg - Can I draw an audio channel as an image?

I'm wondering if it's possible to draw an audio channel of a video or audio file as an image using ffmpeg, or if there's another tool that would do it on Win2k8 x64. I'm doing this as part of an encoding process after a user uploads a video or audio file.
I'm using ColdFusion 10 to handle the upload and calling cfexecute to run ffmpeg.
I need the image to look something like this (without the horizontal lines):
You can do this programmatically very easily.
Study the basics of FFmpeg. I suggest you compile this sample. It explains how to open a video/audio file, identify the streams, and loop over the packets.
Once you have a data packet (in this case you are interested only in the audio packets), you decode it (line 87 of that sample) and obtain the raw audio data. That is the waveform itself (the analogue "bitmap" of the audio).
You could also study this sample, which shows how to write a video/audio file. You don't want to write any video, but with this sample you can easily understand how the raw audio data packets work; see the functions get_audio_frame() and write_audio_frame().
You need to have some knowledge about creating a bitmap. Any platform has an easy way to do that.
So, the answer for you: YES, IT IS POSSIBLE TO DO THIS WITH FFMPEG! But you have to code a little bit in order to get what you want...
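
If you take that route with a current FFmpeg, a rough sketch of the decode loop looks like the following (libavformat/libavcodec API, most error handling omitted); one common approach is then to track a min/max sample value per pixel column and plot those into your bitmap:

    #include <libavformat/avformat.h>
    #include <libavcodec/avcodec.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;

        AVFormatContext *fmt = NULL;
        if (avformat_open_input(&fmt, argv[1], NULL, NULL) < 0) return 1;
        if (avformat_find_stream_info(fmt, NULL) < 0) return 1;

        /* find the first audio stream and open its decoder */
        int idx = av_find_best_stream(fmt, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);
        if (idx < 0) return 1;
        const AVCodec *dec =
            avcodec_find_decoder(fmt->streams[idx]->codecpar->codec_id);
        AVCodecContext *ctx = avcodec_alloc_context3(dec);
        avcodec_parameters_to_context(ctx, fmt->streams[idx]->codecpar);
        if (avcodec_open2(ctx, dec, NULL) < 0) return 1;

        AVPacket *pkt = av_packet_alloc();
        AVFrame *frame = av_frame_alloc();
        while (av_read_frame(fmt, pkt) >= 0) {
            if (pkt->stream_index == idx && avcodec_send_packet(ctx, pkt) == 0) {
                while (avcodec_receive_frame(ctx, frame) == 0) {
                    /* frame->data[] now holds decoded samples (layout given
                       by frame->format); scan them for min/max per pixel
                       column and draw into your bitmap here */
                }
            }
            av_packet_unref(pkt);
        }
        av_frame_free(&frame);
        av_packet_free(&pkt);
        avcodec_free_context(&ctx);
        avformat_close_input(&fmt);
        return 0;
    }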
UPDATE:
Sorry, there are ALSO built-in features for this:
You could use one of these filters: showspectrum, showwaves, avectorscope.
Here are some examples of how to use them: FFmpeg Filters - 12.22 showwaves.
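For example, a single command like this (using showwavespic, a later addition that writes one image instead of a video; input.mp3 and the size are placeholders):

    ffmpeg -i input.mp3 -filter_complex showwavespic=s=640x240 -frames:v 1 waveform.png

This renders the waveform of the first audio stream into waveform.png, which fits the "run ffmpeg from cfexecute" workflow described in the question.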

Difference between audio encoding/decoding and format conversion

Recently I have been trying to convert an audio file from one format to another with ffmpeg. I did some googling, but the results left me a little confused about the difference between encoding/decoding an audio file and converting from one format to another.
Let me describe it this way: there are several different file formats for video files (sometimes also called "wrappers" or containers). There are also several different codecs which can be used to encode (or compress) the audio and video. Audio and video use different codecs, and the encoded streams can be stored in the various file types/formats.
So when you talk about "encoding" vs. "converting" a couple of things come into play.
"Encoding" would be the act of taking audio/video and encoding them into a given codec(s). "Converting" implies having stuff in one format, but wanting it in another. There are two ways of looking at this:
1. Often called "repackaging": this is when the video (for example) has been encoded correctly (let's say H.264, with a bunch of parameters), but you want it in a different file type. Maybe it's an .AVI and you wanted it in an .MP4. This doesn't involve changing the actual video, just re-wrapping the H.264 stream in a new "wrapper", and is thus a fast operation.
2. Re-encoding: let's say your audio was in MP3 format and you wanted it in AAC format. This requires decoding the entire MP3 stream and re-encoding it into AAC.
Obviously you can also do 1 and 2 together; example commands for both follow below.
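In ffmpeg terms (the file names here are placeholders), the two cases look like this:

    ffmpeg -i input.avi -c copy output.mp4
    ffmpeg -i input.mp3 -c:a aac output.m4a

The first command copies the existing streams unchanged into a new container (case 1); the second actually decodes the MP3 and re-encodes it with the AAC encoder (case 2).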
Refer to Formats and Codecs for detailed information.
Hope it helps!

Decode G.711 (PCM u-law)

Please bear with me, as my understanding of audio codecs is limited.
I have an audio source from an IP camera (through an http://... CGI interface).
I am trying to write several client programs to play this audio source on Windows, Mac, and Android phones. The audio is encoded in G.711 (PCM µ-law).
Do I need to decode the PCM audio data to raw audio data before I can pass it to the audio engine to play? If so, is there some sample code showing how to decode it?
I am confused because, as far as I understand, PCM is already raw. Could I just feed it directly to the audio engine on Android, for example?
Thanks much in advance.
It depends on what API you are using to play sound, but most require linear PCM and you have µ-law PCM, so unless your API supports µ-law playback you will need to convert the µ-law sample values to linear.
With G.711, the compressed µ-law samples are 8 bits each, and these are expanded to 14-bit linear values, which you will store in a buffer as 2 bytes per sample. There is a brief description of the µ-law encoding on the G.711 Wikipedia page.
You may find this useful:
u-Law companding algorithm in C
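
For reference, a minimal sketch of the decode step (this follows the classic public-domain Sun g711.c routine, which is also what the link above discusses):

    #include <stdint.h>

    #define BIAS 0x84   /* bias that was added during mu-law encoding */

    /* Expand one 8-bit mu-law byte to a linear PCM sample. The value has
       14 bits of precision but comes back at 16-bit scale, ready to be
       stored as 2 bytes per sample. */
    int16_t ulaw_to_linear(uint8_t u)
    {
        int t;
        u = ~u;                          /* mu-law bytes are stored inverted */
        t = ((u & 0x0F) << 3) + BIAS;    /* mantissa, plus the bias */
        t <<= (u & 0x70) >> 4;           /* scale by the 3-bit exponent */
        return (u & 0x80) ? (BIAS - t) : (t - BIAS);
    }

Run each incoming byte through this and hand the resulting 16-bit buffer to the platform's PCM playback API (on Android, for instance, AudioTrack configured for 16-bit mono at 8000 Hz, the usual G.711 rate).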
