How to find resolution and framerate values in H.264 MPEG-2 TS?

I'm working on an MPEG-2 TS video containing an H.264 stream, and I'm looking for the video properties stored in the stream by scanning the PAT, PMT, PES, etc.
I'm able to read the PAT, the PMT, and the elementary stream types and PIDs. Now I would like to find the resolution and the frame rate (fps). Are they located in the PES header, or elsewhere? They are not in the PAT or PMT.
Below, Transport Stream Packet Editor is able to find two different sets of information, one by itself and the other from the Haali Media Decoder helper codec. How can I get the first one?
Pseudo-code is welcome.

I am not sure whether the height and width are available in the MPEG-2 TS headers, because a TS file can carry multiple programs. But if you are targeting only TS files that carry an H.264 elementary stream, you can get this information from the SPS of the H.264 elementary stream.
Every H.264 NAL unit in the byte stream starts with a three- or four-byte start code, 0x00 0x00 0x01 or 0x00 0x00 0x00 0x01. The NAL unit is an SPS if the byte after the start code, ANDed with 0x1F, equals 0x07.
E.g. an SPS: 0x00 0x00 0x00 0x01 0x67 ... The AND operation gives (0x67 & 0x1F) = 0x07.
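The start-code scan described above can be sketched in Python as follows. This is a minimal illustration, not a full demuxer: it assumes `data` is the H.264 byte stream already reassembled from the PES payloads, and the function name `iter_nal_units` is made up for this example.

```python
def iter_nal_units(data):
    """Yield (nal_unit_type, header_offset) for every start code in data.

    Assumption: data is a contiguous H.264 byte stream, i.e. the PES
    payloads have already been extracted and concatenated.
    """
    # A 4-byte start code (00 00 00 01) ends with the 3-byte one (00 00 01),
    # so searching for the 3-byte pattern finds both.
    pos = data.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(data):
        header = data[pos + 3]
        yield header & 0x1F, pos + 3   # low 5 bits hold the NAL unit type
        pos = data.find(b"\x00\x00\x01", pos + 3)


stream = bytes([0, 0, 0, 1, 0x67, 0x42, 0x00, 0x1E,   # SPS (type 7)
                0, 0, 1, 0x68, 0xCE,                  # PPS (type 8)
                0, 0, 1, 0x65, 0x88])                 # IDR slice (type 5)
for nal_type, offset in iter_nal_units(stream):
    print(nal_type, offset)
```

Type 7 marks the SPS, so a caller would keep the bytes from that offset up to the next start code and hand them to an SPS parser.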
Parsing the SPS is not an easy task either, but you can find the details in the ffmpeg source code.
Hope this helps.

No, they are not present in the PES header. To find the resolution and frame rate of H.264 video in MPEG-2 TS, you need to parse the SPS (sequence parameter set) from the H.264 stream.
These are the steps for parsing H.264 NAL (Network Abstraction Layer) units:
Parse the NAL unit prefix, a 3-byte (0x00, 0x00, 0x01) or 4-byte (0x00, 0x00, 0x00, 0x01) start code, then the header (the next byte after the prefix).
Check the NAL unit type (the last 5 bits) in the header byte.
If the NAL unit type is 7, the NAL unit is an SPS; parse its contents.
The ITU-T H.264 recommendation documents the standard.
See section 7.3.2.1.1, "Sequence parameter set data syntax", for the syntax needed to find the parameters in the SPS.
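Most SPS fields are Exp-Golomb coded (`ue(v)` in the syntax tables), so a parser needs a bit reader first. The sketch below shows only that reader and the final size calculation; it does not walk the full SPS syntax. Assumptions: emulation prevention bytes (the 0x03 in 00 00 03) have already been removed, the chroma format is 4:2:0, and `BitReader` and `frame_size` are names invented for this example.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bit(self):
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self):
        """Unsigned Exp-Golomb: N leading zeros, a 1, then N more bits."""
        zeros = 0
        while self.read_bit() == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.read_bits(zeros)


def frame_size(pic_width_in_mbs_minus1, pic_height_in_map_units_minus1,
               frame_mbs_only_flag, crop=(0, 0, 0, 0)):
    """Luma width and height from SPS fields, assuming 4:2:0 chroma.

    crop is (left, right, top, bottom) as stored in the SPS.
    """
    left, right, top, bottom = crop
    field_factor = 2 - frame_mbs_only_flag          # 2 for field/MBAFF coding
    width = (pic_width_in_mbs_minus1 + 1) * 16 - 2 * (left + right)
    height = (field_factor * (pic_height_in_map_units_minus1 + 1) * 16
              - 2 * field_factor * (top + bottom))
    return width, height


print(frame_size(119, 67, 1, crop=(0, 0, 0, 4)))   # a 1080p stream
```

The cropping step matters in practice: 1080 is not a multiple of 16, so such streams are coded as 1088 rows with a bottom crop.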

I presume working code for this resides inside the ffprobe binary of the FFmpeg project, as it produces the desired output:
$ ffprobe -v quiet -show_streams output1.mp4
[STREAM]
index=0
codec_name=h264
... // A bunch of stream data
width=1280
height=1024
sample_aspect_ratio=1:1
display_aspect_ratio=5:4
....
r_frame_rate=30000/1001
avg_frame_rate=30000/1001
time_base=1/30000
...
[/STREAM]

The information you are looking for is inside the H.264 SPS NAL units.
You need to parse the PES data, extract the NALUs and then parse the SPS data. There you'll find the resolution. If the SPS carries VUI information, you also have information about the intended frame rate.
MPEG-2 TS is a transport stream: it transports something but does not carry detailed information about what it carries. It just wraps stuff.
What you could use from MPEG-2 TS is the PTS/DTS in the PES header, estimating the average frame rate from the presentation time stamps provided.
To do it properly, parse the PES header, parse the NALU headers, parse the actual SPS NAL unit and, if present, the VUI it contains.
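Both routes to the frame rate can be sketched briefly. The code below decodes the 33-bit PTS from a PES header and averages the frame rate over a list of time stamps; it also shows the usual VUI calculation. It is a rough sketch under several assumptions: one picture per PES packet, no 33-bit PTS wrap-around within the sample, and, for the VUI case, the common situation where one frame lasts two ticks. All function names are hypothetical.

```python
PTS_CLOCK_HZ = 90_000  # PES time stamps count ticks of a 90 kHz clock


def decode_pts(b):
    """Decode a 33-bit PTS from its 5-byte PES encoding, skipping marker bits."""
    return ((((b[0] >> 1) & 0x07) << 30) | (b[1] << 22)
            | ((b[2] >> 1) << 15) | (b[3] << 7) | (b[4] >> 1))


def pes_pts(pes):
    """Return the PTS of a PES packet, or None if it has no PTS field."""
    if len(pes) < 14 or pes[0:3] != b"\x00\x00\x01" or not pes[7] & 0x80:
        return None
    return decode_pts(pes[9:14])


def estimate_fps(pts_values):
    """Average frame rate; PTS values are sorted because packets arrive
    in decode order, which differs from display order with B-frames."""
    ordered = sorted(set(pts_values))
    if len(ordered) < 2:
        return None
    return PTS_CLOCK_HZ * (len(ordered) - 1) / (ordered[-1] - ordered[0])


def vui_fps(time_scale, num_units_in_tick):
    """Frame rate from VUI timing info, assuming two ticks per frame."""
    return time_scale / (2 * num_units_in_tick)


pes = bytes.fromhex("000001e000008080052100011777")
print(pes_pts(pes))
print(estimate_fps([0, 3003, 6006, 9009]))
print(vui_fps(60000, 1001))
```

The PTS average is only an estimate (it breaks down on variable frame rate or dropped frames), which is why the SPS/VUI route is the more reliable one when the VUI timing fields are present.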

Related

Decode aac raw data and get the length of CPE/SCE

I have some raw AAC data from an MP4 file which does not have a moov header. I want to know the number of samples in this data and the size of each sample. I have the decoder-specific info as well.
After searching, I found out that AAC samples in MP4 files do not have ADTS headers; they are just CPE/SCE elements. I want to know the size of each CPE/SCE.
As far as I know, AAC frames do not encode their own size, so the framing is usually provided by a container. That said, with some luck and some heuristics you may be able to guess the start of each frame, e.g. look for 0x01 (SCE with instance tag 0 and part of the global gain field) or 0x2? (CPE), and the samples usually don't vary that much in size.

MPEG Transport Stream Audio data information

I am writing code to extract AAC audio data from an MPEG TS stream. I want to get stream properties like sampling frequency, number of channels, audio type, audio profile type, etc. from the transport stream, without decoding the actual data. How much of this information is available from the stream?
I also want to know whether there is any way to find the total duration of the stream without actually finding the last PTS value in the file.
Thanks
AAC frames packed in TS use ADTS headers. The header is 7 bytes (or 9 with a CRC) and very easy to parse. The ADTS header format is well documented online.
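As an illustration of how little work the ADTS header takes, here is a minimal Python sketch that extracts the fields asked about. The function name and the dictionary keys are made up for this example; the bit layout follows the commonly documented ADTS format.

```python
SAMPLE_RATES = (96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350)


def parse_adts_header(h):
    """Parse the fixed part of an ADTS header (first 7 bytes of a frame)."""
    if len(h) < 7 or h[0] != 0xFF or (h[1] & 0xF0) != 0xF0:
        raise ValueError("no ADTS syncword")
    protection_absent = h[1] & 0x01
    sf_index = (h[2] >> 2) & 0x0F
    return {
        # 7 bytes without CRC, 9 bytes when a CRC follows the header
        "header_size": 7 if protection_absent else 9,
        # the 2-bit profile field is the MPEG-4 audio object type minus 1
        "audio_object_type": (h[2] >> 6) + 1,
        "sample_rate": (SAMPLE_RATES[sf_index]
                        if sf_index < len(SAMPLE_RATES) else None),
        # channel configuration index (2 = stereo), not always a channel count
        "channel_config": ((h[2] & 0x01) << 2) | (h[3] >> 6),
        # total frame length in bytes, header included
        "frame_length": ((h[3] & 0x03) << 11) | (h[4] << 3) | (h[5] >> 5),
    }


print(parse_adts_header(bytes.fromhex("fff150802e7ffc")))
```

Since `frame_length` includes the header, adding it to the current offset gives the position of the next ADTS header, which is also a cheap way to confirm that a syncword match was not a false positive.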

MJPEG data doesn't have SOI marker. What kind of mjpeg I'm dealing with?

I'm writing a simple MJPEG streamer in Java.
Since it has a very specific application, I had to reverse engineer the RTP packets sent by GStreamer using the Wireshark sniffer. GStreamer, in turn, just restreams an MJPEG AVI file taken from a specific camera.
The first and most straightforward way is to prepare the packets as an array of data and send them to the consumer, in particular the VLC player, which uses the live555 library. I populate the RTP headers properly, and the internal payload is hardcoded; in my experiment VLC successfully displays the encoded screen, i.e. it decodes the hardcoded RTP payload well.
So, here is the payload of the first two packets (of 25 in total) in hex view:
String[] hexArray =
{
"0000000000ffa05a00000080100b0c0e0c0a100e0d0e1211101318281a181616183123251d283a333d3c3933383740485c4e404457453738506d51575f626768673e4d71797064785c6567631112121815182f1a1a2f634238426363636363636363636363636363636363636363636363636363636363636363636363636363636363636363636363636363ad8a5a00296801692800a5ed40052d001450014b40052f5a004dbce694e68000c7d29435002e452e68017a8a2800c51400518c500141a004c5348a0031498a005a2800a2800a2800a28014514001a2800a51400b450014500251400941a004a2800a5a004c668a004a5a00292800a5a0029734006697340052d0019a5cd004d15c4917dd6e3d0d5d8751523f7831ee2802d32c3728090187622aacba79eb1367d8d00547478db0ea54fbd20a0008a4e94005140052e6801411de9719a00694c8e29bb4d001d29680145385002629306800e476a379e38a00a9711bc97d0b80762839fd2926b677ba128e9b81340178e49a4a0028a004a2800a2801ebcd0450018a422801299707103ffba6802b45129d1d1d87ce4819a6c56ed2c45d0f4ed4010c91329c329151b479a008da32298548a004a334012a4acbdea64ba1d186280275955ba1a75002e714e0fc608cd002ec46e876d4b1d94afca1523d734012fd825c7381504700695e3923f987dd6cf7a009134fee5003f5a71b50a083b4714019de63472ba30fba719f5a956e79a0099270dc77a833feb1cf6a0087c3d8315c49d8c86b54d001d39a963b9963e8d91e86802c2df82312c60fd283159ce3e5f9188fa50042fa5b6331386fad5592d668cfcc87ea28021a280108a4c500262931400845215a00694cd31d38a00a8edf3800f5ab7167cb258f4a004a4ed4007bd283400b450014b40051400b4500140a005cd1400b45002d2e2801368349b481807a50028c8a5dd8eb400bb8529a000506800a2800a280034dc50018a4a0028a0028a0028a00296800a2800a05002d1400b4500145002628c5001494009de96800a280128a00292800a2800a5a0028068017345002e696800a5cd00392578db2848abb0ea2540120dc3d6802ec734370080437b1a8a5b147c98ced3e9da8029c904917df5e3d474a8e801297140094b400519a007034bd6801a541a69522800cfad2e680141a703400b46280136d2e280168c0228010ad26d34006293140062931400e14fc50021146df949c8a006e3bd57bdff8f57f718a0046f9747b65f5c1a92c86db66f76a0091803503db46dd06d3ed4015ded9d738f9876aaee9ce08c1a0089a3a8ca500260d1400a1b078a952775ef9a009d2e54f0dc54c1c1e86801c0d3d6464395623e940121bb988ff00586985d8924b1c9a008e57900cef6
fceab34ae5c61cf1d6802c8903f2c050608dcf1c6680226b778db767a75fa5131db67330e323028022d29bc8d3e3ec5f923deafa5c83de80255954f7a76e1400b45003d26923395722ac25fb8e1d430a007f99693ff00ac4009f6a6369b138cc52fe7401564b09d3f8770f6aaecaca70ca41f7a0069146280022931400dc5262802bc96b193bb6f23de95b88cfd2800a3b5001411400500e68014519a00052d002f4a2800a2800a05002d2d0019a5cd002e68a005a050018e68c500041ed4738a003340606801720d2d001450036908c50025140051400514001a5a0028a002945002d250014b4005140052d002518a004c5140084514005140098a08a0028a00292800a280168a005a33400b9a5a00296801c188390715662be963c03f30f7a00bd15e4528c13b4fa1a596d62979c6d6f51401525b2953ee8de3d4557f62306800a4c50014a680014b400669d9a000804629bb3d2801304500f3400b9a706a00506945001462800a28016930280108a4c50018a70340013450004554d40edb5ff810fe74005e7cb6b6d1f60b9a9edc62", //1
"000004ec00ffa05ad23f53c9a0075260500211cd35a35718619a00aef66adf74e2abc96cebd5723da802131d4663a00615c52722800cd2ab9071934013a5cb0ebcd4cb72a7af140130604706968003cd44f0ab1c8e0d0048221b786e7de8daebd680012718a64f189ad9a20719e45003e2b74fb3c48c7e64183ef4c7b47ce55b1400d2658f191d0734b1dd6680275ba1532cc08eb400f0c294e2800a504af20e0d004e97b3271907eb538bc8a5189a3feb40086d6d67e51b6fb0a824d324519460d40151e0963fbc845474005262801920e2abce76c4dec2801d4940051d4d0021eb4a38a00297140074a5068003d68cd0014a38140051400b466800a5a0029680168cd0028a075a005ef4668014e29314009b79cd1839ce6800e7346ef6a000114b40084669b834008296800068a005a4a00296800a5a0028a0031450014a0d002d25002d140098a280131477a004a2800a280131462800a2800a2800a2800a2800a5cd002d2d0014b9a00506a68aea48beeb71e86802f43a8231c48369f5ed561922b850480c3b11401525b060731b6e1e86aaba321c3020fbd00251400519a005a41400a0d381a005e0d26d1400d2b8a43c50000d3b3400a1a9c0d0014500145001d697140098a5071400871da9a3ad002e6a9ea4730aafabaff3a005d48fef2251d92acc38fb3463da801d8a154b1c0ea68002b8241ea28c500376d211400c7851faad5692cffb87f034015e48193ef2d4263a00618e98460d001466801eb2329e0d4eb72470c280265995bbd48083400b9a5dc68011f0c3a60fad33045003be6039a72c8477a0078914f0c3229ad04321ce3140113da1504a1a882cabfc3400e5b82ad86e2a64ba05b19e940132ce0f7a915c3500381a5a004a952e254fbae7f1a00b09a81e049183f4a5c594e390149fc280227d3323314991ef55a4b39e3ce53207714015241818354af8e2dcfb9a009e92800029318a004ef4b400519cd002d038a005cd140052d00140a0028a00052e79a005a280169450014b40051400b4b400521a000d2e280108a4d9ce6800c11df349c8eb40099e69280168a005a4a005a3b500252d0014b4005140051400b40a005a28016931400946280128c500149400514001a4a0028a0028a004a280168a005a2801734b9a00296800cd491c8f19ca311401761d408e2519f7156d648a71c10d9ed4010cb60a7988edf6354e58258befaf1ea3a50047450019a5cd0019a280141a50d400b9a0806801a57d29bcd00283466801c1a9dba801783498a0028068017349400628228012a95f73244bfed668013516cdd7d302ae8e2341ed40052e6800dc73934a2800146280108cd3769a0031eb51496d1b83c60fb500577b361f77e6aaaf110704608a008da3a618e801b8228
2680141c53d6565e8680274b9fef0a996556e86801f9a5a0050d8a3e461c8e7d6800f2f8e0e693e65eb400a1cf4cd49bc746140018e293d326a07b21bb2a714010b452a1c0e68499875c8a009d6e7d4d4cb7008e680241203de9fb85002f1498e6801caee872ac454e97d2a8c361beb4014f559567f28c636b03f37b8ac7d41c2f94a7a33806802d5140094b400d22968010d274a005cd2d0014a0d0014500145002d1400b45001d296800ef477a005a51400a0f34b40051400b40e7ad0018e694f4a004a05002d21140098a42b91400d2b4878a00514b4001a4a005a280168a0028a0028a00296800cd19a005a33400b450025262801283400521a0028a0028a004a2800a280128a002945002d19a005a01a005a5a0001a72b10410706802e417ee9c3fcc3f5abb0dd4530c6707d0d0012da452f206d3ea2a9cb672c63206f1ea2802be292800a2800a5a0039a5cd0019a5a00695f4a6f22800cd2eea0050d4e0d400bba8cd0019a5cd002e68a004c550ba39d42dd3d89fa7228021ba7dd76deed5a8dc1c7b500369680128cd002834b", //2
According to the spec, the first bytes of the first packet are:
0000000000ffa05a
These are the Type-specific, Fragment Offset, Type, Q, Width and Height fields.
The next data, starting with 0000008010 and ending with 636363, is the MBZ, Precision and Length fields (00000080) followed by the quantization table, 128 bytes long.
So, the question is: what is the data starting right after it?
ad8a5a0029
According to the specs, none of it can be recognized as a possible JPEG marker, so the question is: what encoding is used, and how can I encode an image of my choice with this particular encoding?
An attempt to encode in the manner of http://www.media.mit.edu/pia/Research/deepview/src/JpegEncoder.java works well for JPEG files, but the result is not decoded by VLC's live555.
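To make the header layout concrete, the following Python sketch splits a payload like the ones above into its parts. It assumes "the spec" is the RTP payload format for JPEG (RFC 2435); the function name is hypothetical. Under that format the JPEG marker segments are not transmitted at all, so the bytes after the quantization tables would be entropy-coded scan data, which would explain why no marker is recognizable there.

```python
def parse_rtp_jpeg_payload(payload):
    """Split an RTP/JPEG payload (RTP header already removed) into fields."""
    fragment_offset = int.from_bytes(payload[1:4], "big")
    jpeg_type = payload[4]
    q = payload[5]
    pos = 8                                   # end of the 8-byte main header
    if 64 <= jpeg_type <= 127:
        pos += 4                              # restart marker header present
    qtables = b""
    if q >= 128 and fragment_offset == 0:
        # Quantization table header: MBZ, Precision, 16-bit Length.
        length = int.from_bytes(payload[pos + 2:pos + 4], "big")
        qtables = payload[pos + 4:pos + 4 + length]
        pos += 4 + length
    return {
        "type_specific": payload[0],
        "fragment_offset": fragment_offset,
        "type": jpeg_type,
        "q": q,
        "width": payload[6] * 8,              # stored in units of 8 pixels
        "height": payload[7] * 8,
        "qtables": qtables,
        "scan_offset": pos,                   # where the scan data begins
    }


first = parse_rtp_jpeg_payload(bytes.fromhex("0000000000ffa05a00000080")
                               + bytes(128) + bytes.fromhex("ad8a5a0029"))
second = parse_rtp_jpeg_payload(bytes.fromhex("000004ec00ffa05ad23f53c9"))
print(first["width"], first["height"], first["scan_offset"])
print(second["fragment_offset"], second["scan_offset"])
```

The 128 zero bytes stand in for the real tables from the capture. Only the first fragment of a frame carries the tables; later fragments go straight from the main header to scan data at their fragment offset.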

a-law/raw audio data

I have spent the evening messing around with raw A-law audio input/output from the built-in ALSA tools aplay and arecord, and passing them through an offline moving average filter I have written.
My question is: the audio seems to be encoded using values between 0x2A and 0xAA, a range of 128. I have been reading through this guide, which is informative but doesn't really explain why an offset of 42 (0x2A) has been chosen. The file I used to examine this was a square wave exported from Audacity as unsigned 8-bit 8kHz audio and examined in a hex editor.
Can anyone shed some light on how A-law is encoded in a file?
This may help;
/dev/dsp
8000 frames per second, 8 bits per frame (1 byte);
# Max volume = \xff (or \x00).
# No volume = \x80 (the middle).
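One possible explanation for the two values, offered here as an assumption rather than something established in the thread: G.711 A-law inverts every other bit of the code word (an XOR with 0x55), so a full-scale square wave would encode as exactly 0xAA and 0x2A. The Python sketch below follows the widely circulated reference-style encoder for 16-bit signed input; the function name is made up for this example.

```python
# Upper bounds of the 8 A-law segments, for a 13-bit magnitude.
SEGMENT_END = (0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF)


def linear_to_alaw(sample):
    """Encode one 16-bit signed PCM sample as an 8-bit A-law code."""
    value = sample >> 3                  # A-law works on 13-bit magnitudes
    if value >= 0:
        mask = 0xD5                      # sign bit set, plus 0x55 inversion
    else:
        mask = 0x55                      # sign bit clear, plus 0x55 inversion
        value = -value - 1
    segment = next((i for i, end in enumerate(SEGMENT_END) if value <= end), 8)
    if segment >= 8:                     # out of range: clip to the maximum
        return 0x7F ^ mask
    code = segment << 4
    # The two lowest segments share one step size; higher ones double it.
    code |= (value >> (1 if segment < 2 else segment)) & 0x0F
    return code ^ mask


for s in (32767, -32768, 0):
    print(s, hex(linear_to_alaw(s)))
```

In this scheme silence encodes as 0xD5 rather than 0x80, and the apparent "offset of 42" is simply the most negative code (0x7F before inversion) after the XOR.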

Live audio streaming container formats

When I start receiving a live audio (radio) stream (e.g. MP3 or AAC), I think the received data is not a raw bitstream (i.e. raw encoder output) but is always wrapped in some container format. If this assumption is correct, then I guess I cannot start streaming from an arbitrary place in the stream, but have to wait for some sync byte. Is that right? Is it usual to have sync bytes? Is there a header following the sync byte from which I can determine the codec used, number of channels, sample rate, etc.?
When I connect to a live stream, will I receive data starting at the nearest sync byte, or will I get it from the current position and have to look for the sync byte first?
Some streams, like Icecast, include stream-related information in the HTTP response headers, but I think I can skip them and deal directly with the stream format.
Is that correct?
Regards,
STeN
When you look at SHOUTcast/Icecast, the data that comes across is pure MPEG Layer III audio data, and nothing more. (Provided you haven't requested metadata.)
It can be cut at an arbitrary place, so you need to sync to the stream. This is usually done by finding a potential header, and using the data in that header to find sequential headers. Once you have found a few frame headers, you can safely assume you have synced up to the stream and start decoding for playback.
Again, there is no "container format" for these. It's just raw data.
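The sync procedure described above (find a candidate header, then check that further headers follow at the computed distances) might look like this in Python. It is a simplified sketch that handles Layer III only and rejects free-format streams; the names `frame_length` and `find_sync` are invented for this example.

```python
BITRATES_V1_L3 = (0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)
BITRATES_V2_L3 = (0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160)
SAMPLE_RATES = {3: (44100, 48000, 32000),   # MPEG-1
                2: (22050, 24000, 16000),   # MPEG-2
                0: (11025, 12000, 8000)}    # MPEG-2.5


def frame_length(buf, pos):
    """Length in bytes of the Layer III frame at pos, or None if implausible."""
    if pos + 4 > len(buf):
        return None
    b1, b2 = buf[pos + 1], buf[pos + 2]
    if buf[pos] != 0xFF or (b1 & 0xE0) != 0xE0:
        return None                          # 11-bit sync pattern missing
    version = (b1 >> 3) & 3
    layer = (b1 >> 1) & 3
    bitrate_index = b2 >> 4
    rate_index = (b2 >> 2) & 3
    if version == 1 or layer != 1 or bitrate_index in (0, 15) or rate_index == 3:
        return None                          # reserved, free-format or not Layer III
    kbps = (BITRATES_V1_L3 if version == 3 else BITRATES_V2_L3)[bitrate_index]
    padding = (b2 >> 1) & 1
    coefficient = 144 if version == 3 else 72
    return coefficient * kbps * 1000 // SAMPLE_RATES[version][rate_index] + padding


def find_sync(buf, needed=3):
    """Offset of the first header followed by needed-1 chained headers."""
    for start in range(len(buf) - 3):
        pos = start
        for _ in range(needed):
            length = frame_length(buf, pos)
            if length is None:
                break
            pos += length
        else:
            return start
    return None


frame = bytes.fromhex("fff38270") + bytes(205)   # one 209-byte frame
print(find_sync(b"\x12\x34\x56" + frame * 3))
```

Requiring several chained headers is what makes this robust: the sync pattern alone can appear by chance inside audio data, but it is unlikely to be followed by valid headers at exactly the right distances.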
Now, if you want metadata, you have to request it from the server. The data is then just injected into the stream every x number of bytes. See http://www.smackfu.com/stuff/programming/shoutcast.html.
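The injection scheme can be undone with a few lines of code. The sketch below assumes the usual SHOUTcast convention: the client sends `Icy-MetaData: 1`, the server answers with an `icy-metaint` header giving the audio byte count between metadata blocks, and each block starts with one length byte counted in units of 16 bytes. It also assumes `data` starts at the first byte of the response body. The function name is hypothetical.

```python
def split_icy_stream(data, metaint):
    """Separate audio bytes from the metadata blocks interleaved with them."""
    audio = bytearray()
    titles = []
    pos = 0
    while pos < len(data):
        audio += data[pos:pos + metaint]     # metaint bytes of pure audio
        pos += metaint
        if pos >= len(data):
            break
        meta_len = data[pos] * 16            # length byte counts 16-byte units
        pos += 1
        if meta_len:                         # zero means "no change"
            block = data[pos:pos + meta_len]
            titles.append(block.rstrip(b"\x00").decode("utf-8", errors="replace"))
            pos += meta_len
    return bytes(audio), titles


body = b"AAAA" + b"\x01" + b"StreamTitle='x';" + b"BBBB" + b"\x00" + b"CC"
print(split_icy_stream(body, 4))
```

A real client would do this incrementally on the socket rather than on a complete buffer, but the counting logic stays the same.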
Doom9 has great starting info about both the MPEG and AAC frame formats. SHOUTcast will add some 'metadata' now and then, and it's really trivial to handle. The thing I want to share with you is this: I have an application that can capture all kinds of streams, and SHOUTcast, both AAC and MP3, is among them. The first versions cut their files at an arbitrary point according to time, for example every 5 minutes, regardless of the MP3/AAC frames. It was more or less OK for MP3 (the files were playable) but was very bad for aacplus.
The thing is, the aacplus decoder ISN'T that forgiving about wrong data, and I had everything from access violations to mysterious software shutdowns with no errors of any kind.
Anyway, if you want to capture a stream, open the socket to the server and read the response; you'll have some headers there. Then use that info to strip the metadata that will be injected now and then. Use the frame header information for both aacplus and MP3 to determine frame boundaries, and try to honor them and split the file at the right place.
mp3 frame header:
http://www.mp3-tech.org/programmer/frame_header.html
aacplus frame header:
http://wiki.multimedia.cx/index.php?title=Understanding_AAC
also this:
aacplus frame alignment problems
Unfortunately it's not always that easy, check the format and notes here:
MPEG frame header format
I will continue the discussion by answering myself (even though we are discouraged from doing that):
I was also looking into the streamed data and found that the sequence ff f3 82 70 is repeated frequently. I suspect this is the MPEG frame header, so I tried to work out what it means:
ff f3 82 70 (hex) = 11111111 11110011 10000010 01110000 (bin)
Analysis
11111111111 | SYNC
10 | MPEG version 2
01 | Layer III
1 | No CRC
1000 | 64 kbps
00 | 22050Hz
1 | Padding
0 | Private
01 | Joint stereo
11 | ...
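The same breakdown can be reproduced programmatically. This Python sketch only slices the 32-bit header into its fields according to the commonly documented MPEG audio frame header layout; the function name and dictionary keys are made up for this example, and index values are left undecoded where a lookup table would be needed.

```python
VERSIONS = {0: "MPEG-2.5", 2: "MPEG-2", 3: "MPEG-1"}
LAYERS = {1: "Layer III", 2: "Layer II", 3: "Layer I"}
CHANNEL_MODES = ("stereo", "joint stereo", "dual channel", "mono")


def decode_header_fields(header):
    """Slice a 4-byte MPEG audio frame header into its bit fields."""
    word = int.from_bytes(header[:4], "big")
    return {
        "sync": (word >> 21) & 0x7FF,                     # 11 bits, all ones
        "version": VERSIONS.get((word >> 19) & 3, "reserved"),
        "layer": LAYERS.get((word >> 17) & 3, "reserved"),
        "crc_protected": ((word >> 16) & 1) == 0,         # bit set means no CRC
        "bitrate_index": (word >> 12) & 0xF,
        "sample_rate_index": (word >> 10) & 3,
        "padding": (word >> 9) & 1,
        "private": (word >> 8) & 1,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 3],
        "mode_extension": (word >> 4) & 3,                # used by joint stereo
        "copyright": (word >> 3) & 1,
        "original": (word >> 2) & 1,
        "emphasis": word & 3,
    }


for name, value in decode_header_fields(bytes.fromhex("fff38270")).items():
    print(f"{name:18} {value}")
```

For ff f3 82 70 this agrees with the manual analysis: MPEG-2, Layer III, no CRC, bitrate index 8 (64 kbps for MPEG-2 Layer III), sample rate index 0 (22050 Hz), padding set, joint stereo. The remaining six bits are the mode extension, copyright, original and emphasis fields.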
Any comments on that?
When I start receiving the streaming data, should I discard all data prior to this header before giving the buffer to the class that deals with the DSP? I know this can be implementation specific, but I would like to know what the general procedure is here...
BR
STeN
