Is it possible to encode and decode jpeg by blocks? - jpeg

I want to develop a very specific UDP protocol with the following workflow:
It is very simple in design and very "realtime-ish", in contrast to stubs like the HTTP MJPEG streaming technique: it could fully utilize even a narrow channel to deliver as many updates as quickly as possible. The drawbacks are that delivery is not guaranteed, and that frame start and end are not determined: with each packet you know only that "some blocks at these coordinates have just been updated".
But I am very new to the libjpeg API. How can I encode a frame and then split the bitstream so that each packet holds only a whole number of DCT-transformed, Huffman-encoded blocks (as described in an article on how JPEG works)?
So I can reword the task: given an encoded JPEG file, the goal is to split the data stream into chunks of up to some maximum size in such a way that each chunk contains just sufficient data to decode it. (No chunk concatenation is allowed.)
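One possible approach, which the question itself does not mention, is to use JPEG restart markers (RST0 to RST7). The DC prediction is reset at each marker, so each restart interval can be decoded on its own, provided the receiver already has the frame header and the quantization and Huffman tables. libjpeg exposes this through the `restart_interval` and `restart_in_rows` fields of `jpeg_compress_struct`. The Python sketch below only illustrates the splitting step on already-extracted scan data; the function names and the packet layout (index of the first interval plus a list of intervals) are assumptions made for illustration.

```python
def split_at_restart_markers(scan_data: bytes) -> list:
    """Split entropy-coded JPEG scan data into restart intervals.

    Inside scan data a literal 0xFF byte is always followed by a stuffed
    0x00, so the pair 0xFF 0xD0..0xD7 can only be a restart marker.
    """
    intervals = []
    start = 0
    i = 0
    while i < len(scan_data) - 1:
        if scan_data[i] == 0xFF and 0xD0 <= scan_data[i + 1] <= 0xD7:
            intervals.append(scan_data[start:i])
            start = i + 2  # skip the two-byte marker itself
            i += 2
        else:
            i += 1
    intervals.append(scan_data[start:])
    return intervals


def pack_intervals(intervals, max_payload):
    """Greedily group whole restart intervals into packets.

    Each packet is (index_of_first_interval, [intervals]); the index lets
    the receiver work out which blocks the packet updates. An interval
    larger than max_payload still gets a packet of its own (oversized).
    """
    packets = []
    current, first_index, size = [], 0, 0
    for index, interval in enumerate(intervals):
        if current and size + len(interval) > max_payload:
            packets.append((first_index, current))
            current, size = [], 0
        if not current:
            first_index = index
        current.append(interval)
        size += len(interval)
    if current:
        packets.append((first_index, current))
    return packets
```

With a fixed restart interval, the index of an interval maps directly to block coordinates, which fits the "some blocks at these coordinates have just been updated" model. The headers and tables would have to be agreed on out of band or sent separately.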

Related

Is it possible to splice advertisements or messages dynamically into an MP3 file via a standard GET request?

Say you have an MP3 file and it's 60,000,000 bytes, and you also have an MP3 advertisement that's 500,000 bytes, both encoded at the same bit rate.
Would it be possible using an nginx or apache module to change the MP3 "Content-Length" header value to 60,500,000 and then control the incoming "Content-Range" requests so the first 500,000 bytes return the advertisement audio, and any range request greater than 500,000 begins returning the regular audio file with a 500,000 byte offset?
Or is it only possible to splice advertisements (or messages) into an MP3 file using an application such as FFmpeg to re-render the entire file?
Apologies if this is a stupid question, I'm just trying to think outside of the box.
You cannot arbitrarily splice MP3 without artifacts and decoder errors.
You also generally cannot cut/splice MP3 on frame boundaries due to the bit reservoir. Basically, a particular MP3 frame may contain data from another frame to use the available bandwidth more efficiently when it's needed. Ignoring the bit reservoir can also cause artifacts and/or decoder errors.
What you can do is re-encode your advertisement and eventually re-join the stream. That is, at the point of ad insertion, decode the stream to PCM, mix in (or replace the audio with) your ad, and have this parallel stream re-encoded to MP3. If the encoding parameters are the same, eventually (after a couple of extra MP3 frames) you'll have identical bitstreams, and you can go back to reading the stream from the same buffer.
If you're doing this for ad insertion on internet radio (live) streams, keep in mind that you'll have to do this on the server for every client (or at least for each ad variant and timing variant). If this is for podcasts or other pre-recorded content, I'd recommend the FFmpeg route. You won't have to build anything, you can stream and cache the output as it's being encoded, and you'll have compatibility with other codecs without building one-off code for each codec/container.
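For reference, the byte-range arithmetic described in the question can be sketched as below. This is only the offset mapping for an HTTP Range request against a virtual "ad followed by main file" resource; it does nothing about the bit reservoir problem described in the answer, so the result would still be subject to the artifacts mentioned there. The function name and the "ad"/"main" labels are hypothetical.

```python
def map_range(start, end, ad_size, main_size):
    """Map an inclusive byte range of the virtual (ad + main) file onto
    (source, offset, length) pieces of the two real files."""
    total = ad_size + main_size
    if not (0 <= start <= end < total):
        raise ValueError("range not satisfiable")
    pieces = []
    if start < ad_size:
        # Part (or all) of the range falls inside the advertisement.
        ad_end = min(end, ad_size - 1)
        pieces.append(("ad", start, ad_end - start + 1))
    if end >= ad_size:
        # The remainder is read from the main file, shifted by ad_size.
        main_start = max(start, ad_size) - ad_size
        pieces.append(("main", main_start, end - ad_size - main_start + 1))
    return pieces
```

With the sizes from the question, `map_range(499_990, 500_009, 500_000, 60_000_000)` returns the last 10 bytes of the ad followed by the first 10 bytes of the main file.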

How do Shoutcast servers and clients deal with mp3 frame headers and frame dependencies?

Short story:
If I myself intend to receive and then send a Shoutcast compatible audio stream processed by my application, then how to do it properly using an mp3 (de/en)coder library? Pseudo code, or better - lame mp3 specific code would be highly appreciated.
Long story:
More specific questions which bother me were caused by an article about mp3, which says:
Generally, frames are independent items. Each frame has its own header
and audio informations. There is no file header. Therefore, you can
cut any part of MPEG file and play it correctly (this should be done
on frame boundaries but most applications will handle incorrect
headers). For Layer III, this is not 100% correct. Due to internal
data organization in MPEG version 1 Layer III files, frames are often
dependent of each other and they cannot be cut off just like that.
This made me wonder, how Shoutcast servers and clients deal with frame headers and frame dependencies.
Do I have to encode to constant bitrate (CBR) only, if I want to achieve maximum compatibility with most Shoutcast players out there?
Is the mp3 frame header used at all, or is the stream format deduced from a Shoutcast protocol specific HTTP header?
Does Shoutcast protocol guarantee (or is it common good practice) to start serving mp3 stream on frame boundaries and continue to respond with chunks that are cut at frame boundaries? But what is the minimum or recommended size of a mp3 frame for streaming live audio?
How does Shoutcast deal with frame dependencies - does it do something special with mp3 encoding to ensure that the served stream does not have frames which depend on previous frames (if this is even possible)? Or maybe it ignores these dependencies on server side/client side, thus getting audio quality reduction or even artifacts?
SHOUTcast servers do not know or care about the data being passed through them. They send it as-is. You can actually send arbitrary data through a SHOUTcast server, and receive it. SHOUTcast will segment the media data wherever the buffer size falls.
It's up to the client to re-sync to the data. It does this by locating the frame header, then beginning decoding. Once the codec has enough frames to reliably play back audio, it will begin outputting raw PCM. It's up to the codec to decide when it's safe to start playback. Since the codec knows what it's doing in terms of decoding the media, it knows when it has sufficient data (including bit reservoirs) to begin without artifacts. It's also worth noting that the bit reservoir cannot be carried on too far, so it takes only a few frames at worst to handle it.
This is one of the reasons it's important to have a sizable buffer server-side, to flush to the clients as fast as possible on connect. If playback is to start quickly, the codec needs more data than the current frame to begin.
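The "locate the frame header" step can be sketched roughly as follows. This is a simplified illustration, not how any particular player does it: it checks the 11-bit sync word and rejects the reserved or invalid field values of the MPEG audio frame header. Real decoders typically also confirm that another header follows at the computed frame length. The function names are hypothetical.

```python
def looks_like_frame_header(b0, b1, b2):
    """Check the first three bytes of a candidate MPEG audio frame header."""
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        return False  # 11-bit sync word (all ones) is missing
    if (b1 >> 3) & 0x03 == 0x01:
        return False  # reserved MPEG version ID
    if (b1 >> 1) & 0x03 == 0x00:
        return False  # reserved layer
    if (b2 >> 4) == 0x0F:
        return False  # invalid bitrate index
    if (b2 >> 2) & 0x03 == 0x03:
        return False  # reserved sampling rate index
    return True


def find_sync(buf, start=0):
    """Return the offset of the first plausible frame header, or -1."""
    for i in range(start, len(buf) - 2):
        if looks_like_frame_header(buf[i], buf[i + 1], buf[i + 2]):
            return i
    return -1
```

A client that joins mid-stream would call `find_sync` on its receive buffer, discard everything before the returned offset, and hand the rest to the decoder.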

A way to add data "mid stream" to encoded audio (possibly with AAC)

Is there a way to add lossless data to an AAC audio stream?
Essentially I am looking to be able to inject "this frame of audio should be played at XXX time" every n frames in.
If I use a lossless codec, I suppose I could just inject my own header mid-stream and that data would stay intact, since it needs to be the same on the way out, just as gzip does not lose data.
Any ideas? I suppose I could encode the data into chunks of AAC on the server and on the network layer add a timestamp saying play the following chunk of AAC at time x but I'd prefer to figure a way to add it to the audio itself.
This is not really possible (short of writing your own specialized encoder), as AAC (and MP3) frames are not truly standalone.
There is a concept of the bit reservoir, where unused bandwidth from one frame can be utilized for a later frame that may need more bandwidth to store a more complicated sound. That is, data from frame 1 might be needed in frame 2 and/or 3. If you cut the stream between frames 1 and 2 and insert your alternative frames, the reference to the bit reservoir data is broken and you have damaged frame 2's ability to be decoded.
There are encoders that can work in a mode where the bit reservoir isn't used (at the cost of quality). If operating in this mode, you should be able to cut the stream more freely along frame boundaries.
Unfortunately, the best way to handle this is to do it in the time domain when dealing with your raw PCM samples. This gives you more control over the timing placement anyway, and ensures that your stream can also be used with other codecs.

MJPEG data doesn't have SOI marker. What kind of mjpeg I'm dealing with?

I'm writing a simple MJPEG streamer in Java.
Since it has very specific application, I had to reverse engineer the rtp packets sent by the GStreamer using the Wireshark sniffer. The GStreamer, in turn, just restreams the mjpeg avi file taken from specific camera.
The first and the most straightforward way is to prepare packets as an array of data and send them to the consumer, particularly VLC player, that uses live555 library. I'm populating the rtp headers well, so the internal payload is hardcoded; in my experiment VLC successfully displays the encoded screen, i.e. it decodes the hardcoded rtp payload well.
So, here is the payload of the first two packets (out of 25 in total) in hex view:
String[] hexArray =
{
"0000000000ffa05a00000080100b0c0e0c0a100e0d0e1211101318281a181616183123251d283a333d3c3933383740485c4e404457453738506d51575f626768673e4d71797064785c6567631112121815182f1a1a2f634238426363636363636363636363636363636363636363636363636363636363636363636363636363636363636363636363636363ad8a5a00296801692800a5ed40052d001450014b40052f5a004dbce694e68000c7d29435002e452e68017a8a2800c51400518c500141a004c5348a0031498a005a2800a2800a2800a28014514001a2800a51400b450014500251400941a004a2800a5a004c668a004a5a00292800a5a0029734006697340052d0019a5cd004d15c4917dd6e3d0d5d8751523f7831ee2802d32c3728090187622aacba79eb1367d8d00547478db0ea54fbd20a0008a4e94005140052e6801411de9719a00694c8e29bb4d001d29680145385002629306800e476a379e38a00a9711bc97d0b80762839fd2926b677ba128e9b81340178e49a4a0028a004a2800a2801ebcd0450018a422801299707103ffba6802b45129d1d1d87ce4819a6c56ed2c45d0f4ed4010c91329c329151b479a008da32298548a004a334012a4acbdea64ba1d186280275955ba1a75002e714e0fc608cd002ec46e876d4b1d94afca1523d734012fd825c7381504700695e3923f987dd6cf7a009134fee5003f5a71b50a083b4714019de63472ba30fba719f5a956e79a0099270dc77a833feb1cf6a0087c3d8315c49d8c86b54d001d39a963b9963e8d91e86802c2df82312c60fd283159ce3e5f9188fa50042fa5b6331386fad5592d668cfcc87ea28021a280108a4c500262931400845215a00694cd31d38a00a8edf3800f5ab7167cb258f4a004a4ed4007bd283400b450014b40051400b4500140a005cd1400b45002d2e2801368349b481807a50028c8a5dd8eb400bb8529a000506800a2800a280034dc50018a4a0028a0028a0028a00296800a2800a05002d1400b4500145002628c5001494009de96800a280128a00292800a2800a5a0028068017345002e696800a5cd00392578db2848abb0ea2540120dc3d6802ec734370080437b1a8a5b147c98ced3e9da8029c904917df5e3d474a8e801297140094b400519a007034bd6801a541a69522800cfad2e680141a703400b46280136d2e280168c0228010ad26d34006293140062931400e14fc50021146df949c8a006e3bd57bdff8f57f718a0046f9747b65f5c1a92c86db66f76a0091803503db46dd06d3ed4015ded9d738f9876aaee9ce08c1a0089a3a8ca500260d1400a1b078a952775ef9a009d2e54f0dc54c1c1e86801c0d3d6464395623e940121bb988ff00586985d8924b1c9a008e57900cef6
fceab34ae5c61cf1d6802c8903f2c050608dcf1c6680226b778db767a75fa5131db67330e323028022d29bc8d3e3ec5f923deafa5c83de80255954f7a76e1400b45003d26923395722ac25fb8e1d430a007f99693ff00ac4009f6a6369b138cc52fe7401564b09d3f8770f6aaecaca70ca41f7a0069146280022931400dc5262802bc96b193bb6f23de95b88cfd2800a3b5001411400500e68014519a00052d002f4a2800a2800a05002d2d0019a5cd002e68a005a050018e68c500041ed4738a003340606801720d2d001450036908c50025140051400514001a5a0028a002945002d250014b4005140052d002518a004c5140084514005140098a08a0028a00292800a280168a005a33400b9a5a00296801c188390715662be963c03f30f7a00bd15e4528c13b4fa1a596d62979c6d6f51401525b2953ee8de3d4557f62306800a4c50014a680014b400669d9a000804629bb3d2801304500f3400b9a706a00506945001462800a28016930280108a4c50018a70340013450004554d40edb5ff810fe74005e7cb6b6d1f60b9a9edc62", //1
"000004ec00ffa05ad23f53c9a0075260500211cd35a35718619a00aef66adf74e2abc96cebd5723da802131d4663a00615c52722800cd2ab9071934013a5cb0ebcd4cb72a7af140130604706968003cd44f0ab1c8e0d0048221b786e7de8daebd680012718a64f189ad9a20719e45003e2b74fb3c48c7e64183ef4c7b47ce55b1400d2658f191d0734b1dd6680275ba1532cc08eb400f0c294e2800a504af20e0d004e97b3271907eb538bc8a5189a3feb40086d6d67e51b6fb0a824d324519460d40151e0963fbc845474005262801920e2abce76c4dec2801d4940051d4d0021eb4a38a00297140074a5068003d68cd0014a38140051400b466800a5a0029680168cd0028a075a005ef4668014e29314009b79cd1839ce6800e7346ef6a000114b40084669b834008296800068a005a4a00296800a5a0028a0031450014a0d002d25002d140098a280131477a004a2800a280131462800a2800a2800a2800a2800a5cd002d2d0014b9a00506a68aea48beeb71e86802f43a8231c48369f5ed561922b850480c3b11401525b060731b6e1e86aaba321c3020fbd00251400519a005a41400a0d381a005e0d26d1400d2b8a43c50000d3b3400a1a9c0d0014500145001d697140098a5071400871da9a3ad002e6a9ea4730aafabaff3a005d48fef2251d92acc38fb3463da801d8a154b1c0ea68002b8241ea28c500376d211400c7851faad5692cffb87f034015e48193ef2d4263a00618e98460d001466801eb2329e0d4eb72470c280265995bbd48083400b9a5dc68011f0c3a60fad33045003be6039a72c8477a0078914f0c3229ad04321ce3140113da1504a1a882cabfc3400e5b82ad86e2a64ba05b19e940132ce0f7a915c3500381a5a004a952e254fbae7f1a00b09a81e049183f4a5c594e390149fc280227d3323314991ef55a4b39e3ce53207714015241818354af8e2dcfb9a009e92800029318a004ef4b400519cd002d038a005cd140052d00140a0028a00052e79a005a280169450014b40051400b4b400521a000d2e280108a4d9ce6800c11df349c8eb40099e69280168a005a4a005a3b500252d0014b4005140051400b40a005a28016931400946280128c500149400514001a4a0028a0028a004a280168a005a2801734b9a00296800cd491c8f19ca311401761d408e2519f7156d648a71c10d9ed4010cb60a7988edf6354e58258befaf1ea3a50047450019a5cd0019a280141a50d400b9a0806801a57d29bcd00283466801c1a9dba801783498a0028068017349400628228012a95f73244bfed668013516cdd7d302ae8e2341ed40052e6800dc73934a2800146280108cd3769a0031eb51496d1b83c60fb500577b361f77e6aaaf110704608a008da3a618e801b8228
2680141c53d6565e8680274b9fef0a996556e86801f9a5a0050d8a3e461c8e7d6800f2f8e0e693e65eb400a1cf4cd49bc746140018e293d326a07b21bb2a714010b452a1c0e68499875c8a009d6e7d4d4cb7008e680241203de9fb85002f1498e6801caee872ac454e97d2a8c361beb4014f559567f28c636b03f37b8ac7d41c2f94a7a33806802d5140094b400d22968010d274a005cd2d0014a0d0014500145002d1400b45001d296800ef477a005a51400a0f34b40051400b40e7ad0018e694f4a004a05002d21140098a42b91400d2b4878a00514b4001a4a005a280168a0028a0028a00296800cd19a005a33400b450025262801283400521a0028a0028a004a2800a280128a002945002d19a005a01a005a5a0001a72b10410706802e417ee9c3fcc3f5abb0dd4530c6707d0d0012da452f206d3ea2a9cb672c63206f1ea2802be292800a2800a5a0039a5cd0019a5a00695f4a6f22800cd2eea0050d4e0d400bba8cd0019a5cd002e68a004c550ba39d42dd3d89fa7228021ba7dd76deed5a8dc1c7b500369680128cd002834b", //2
According to the spec, the first bytes of the first packet are:
0000000000ffa05a
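These eight bytes are Type-specific, Fragment Offset, Type, Q, Width, Height. The next four bytes, 00000080, are the quantization table header: MBZ, Precision, Length (0x0080 = 128).
The following data, starting with 100b0c and ending with 636363, is the quantization table data, 128 bytes long.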
So, the question is what is the data starting right after it?
ad8a5a0029
According to the specs, none of it can be recognized as a possible JPEG marker, so the question is: what encoding is used, and how can I encode an image of my choice with this particular encoding?
An attempt to encode in the manner of http://www.media.mit.edu/pia/Research/deepview/src/JpegEncoder.java works well for JPEG files, but the result is not decoded by VLC's live555.
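The field names listed above match RFC 2435 (RTP Payload Format for JPEG-compressed Video), which is assumed here to be the spec in question. Under that format the JPEG headers and markers are not transmitted at all: what follows the quantization tables is the entropy-coded scan data, which would explain why no SOI or other marker shows up. A minimal parsing sketch, with a hypothetical function name and result keys:

```python
import struct


def parse_rtp_jpeg_header(payload: bytes) -> dict:
    """Parse the RTP/JPEG main header and, if present, the quantization
    table header at the start of an RTP payload (RFC 2435 layout)."""
    type_specific = payload[0]
    fragment_offset = int.from_bytes(payload[1:4], "big")
    jpeg_type, q, width8, height8 = payload[4:8]
    info = {
        "type_specific": type_specific,
        "fragment_offset": fragment_offset,
        "type": jpeg_type,
        "q": q,
        "width": width8 * 8,    # width is sent in units of 8 pixels
        "height": height8 * 8,  # height is sent in units of 8 pixels
    }
    pos = 8
    if 64 <= jpeg_type <= 127:
        pos += 4  # types 64..127 carry a 4-byte restart marker header
    if q >= 128 and fragment_offset == 0:
        # First packet of a frame with in-band quantization tables.
        mbz, precision, length = struct.unpack(">BBH", payload[pos:pos + 4])
        info["precision"] = precision
        info["qtable_length"] = length
        info["qtables"] = payload[pos + 4:pos + 4 + length]
        pos += 4 + length
    info["scan_offset"] = pos  # entropy-coded scan data starts here
    return info
```

Applied to the first packet above, this yields Q = 255, a 1280x720 frame, and a 128-byte table block, with the scan data starting at `ad8a5a0029`. The second packet has a fragment offset of 0x0004ec and no table header, so its scan data starts right after the eight-byte main header.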

Strategy for time-indexed audio archive with lossy compression

For part of one of my projects, I am considering developing an audio archive for internet radio stations. This archive would be indexed and addressable by date/time.
For example, the server would connect to a stream (generally encoded in MP3), and save the stream data. A client could connect to this server and request audio from 2011-07-05 15:58:30 to 2011-07-05 15:59:37. The server would return the audio data to the client for playback.
My initial thought was to save the data to 1-minute chunks of raw MP3 data to disk, and reference these files from a database. The server would be dumb to the stream/file format, and wouldn't understand mpeg frames. It would simply pass on data to the client, dividing the chunks up linearly to send. It would be up to the client to sync to the stream. This is not unlike how internet radio servers run in general. SHOUTcast servers simply output the data, byte for byte, that is sent to them from the encoder. When a client connects, data is sent, regardless of whether or not it even ends on an MP3 frame. It is up to the client to sync.
I am wondering if there might be a better approach, maximizing compatibility with clients and audio formats. Any thoughts on how to go about this?
The only other thing I can think of is decoding the MP3 to raw PCM audio and re-encoding as necessary when requested. I would prefer not to go this route due to the disk space required, and the loss of quality when re-encoding.
This question is language-agnostic, but if it is helpful, I will likely implement a solution in PHP with MySQL as the database.
You don't have to worry about this, since all of the MP3 streams I have accessed over Shoutcast were constant bitrate, so you don't have to index them. I have a proof-of-concept project that kept the archive in 5-minute chunks, then used PHP to combine those files and pseudo-stream them to Winamp via Shoutcast. It worked!
And since you are working with MP3, you can assume (and you'll assume correctly) that the density of the captured file is linear, so to reach the 30-second mark of a 60-second file you should seek to the middle. Since MP3 decoders are robust enough, you don't have to track the frames at all here.
AACplus is a whole different story. It's inherently VBR.
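The lookup described in this thread can be sketched as below: map a requested time range onto fixed-length chunks, then convert the position inside a chunk to a byte offset under the constant-bitrate assumption made in the answer. Chunk alignment to the Unix epoch, the function names, and the 128 kbps figure in the usage note are assumptions for illustration only.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def chunks_for_range(start, end, chunk_seconds=60):
    """Return (chunk_start, skip_seconds, take_seconds) for every chunk
    that overlaps the half-open time range [start, end)."""
    length = timedelta(seconds=chunk_seconds)
    into_chunk = (start - EPOCH).total_seconds() % chunk_seconds
    chunk = start - timedelta(seconds=into_chunk)  # floor to chunk boundary
    result = []
    while chunk < end:
        overlap_start = max(chunk, start)
        overlap_end = min(chunk + length, end)
        result.append((
            chunk,
            (overlap_start - chunk).total_seconds(),
            (overlap_end - overlap_start).total_seconds(),
        ))
        chunk += length
    return result


def cbr_byte_offset(seconds, bitrate_kbps):
    """Approximate byte offset of a time position in a CBR stream."""
    return int(seconds * bitrate_kbps * 1000 / 8)
```

For the example request (2011-07-05 15:58:30 to 15:59:37) with 1-minute chunks, this gives the 15:58 chunk from second 30 onward and the first 37 seconds of the 15:59 chunk. At an assumed 128 kbps, `cbr_byte_offset(30, 128)` is 480000, and the client's decoder resyncs from wherever that byte happens to land.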
