Dump subtitle from AVSubtitle in the file

In FFmpeg, AVPicture is used to store image data using data pointers and linesizes. This means all subtitles are stored in the form of pictures inside FFmpeg. Now I have a DVB subtitle and I want to dump the subtitle picture stored in the AVPicture into a buffer. I know these subtitle images can be dumped using a for loop, fopen and sprintf, but I do not know how to dump the subtitle itself. I have to dump the subtitles in .ppm file format.
Can anyone help me dump the subtitle pictures from an AVSubtitle into a buffer?

This process looks complex but is actually very simple.
AVSubtitle is a generic format that supports both text and bitmap modes. The dvbsub format is, AFAIK, bitmap only, and the bitmap format can differ: 16-color or 256-color mode, referred to as the CLUT_DEPTH.
I believe (in current ffmpeg) the bitmaps are stored in the AVSubtitleRect structure, which is a member of AVSubtitle.
I assume you have valid AVSubtitle packet(s); if I understand correctly, you can do the following and it should work:
1) Check pkt->rects[0]->type. The pkt here is a valid AVSubtitle packet. It must be of type SUBTITLE_BITMAP.
2) If so, the bitmap width and height can be read from pkt->rects[0]->w and pkt->rects[0]->h.
3) The bitmap data itself will be in pkt->rects[0]->data[0].
4) The CLUT_DEPTH can be read from pkt->rects[0]->nb_colors.
5) And the CLUT itself (the color table) will be in pkt->rects[0]->data[1].
With these data you can construct a valid .bmp file that is viewable on a Windows or Linux desktop, but I leave that part to you.
PPM Info
First check this info about PPM format:
https://www.cs.swarthmore.edu/~soni/cs35/f13/Labs/extras/01/ppm_info.html
What I understand is that the PPM format uses RGB values (24-bit/3 bytes). It looks to me like all you have to do is construct a header according to the data obtained from the AVSubtitle packet above, and write a conversion function from dvbsub's indexed-color buffer to RGB. I'm pretty sure there is some ready-to-use code out there, but I'll explain anyway.
The picture frame data dvbsub uses is linear, and every pixel is 1 byte (even in 16-color mode). This byte value is actually an index into the Color Look-Up Table (CLUT), which stores the RGB (?) values; in 16-color mode there are 16 entries of 4 bytes each, the first 3 being the R, G, B values and the 4th being alpha (transparency; if PPM doesn't support this, ignore it).
I'm not sure whether the decoded subtitle still holds encoded YUV values; I remember it should be plain RGBA.
The encode_dvb_subtitles function in ffmpeg shows how this encoding is done, if you need it:
https://github.com/FFmpeg/FFmpeg/blob/a0ac49e38ee1d1011c394d7be67d0f08b2281526/libavcodec/dvbsub.c
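For illustration, here is a minimal sketch of such a dump (my own code, not from ffmpeg; it assumes rect->type == SUBTITLE_BITMAP and that the CLUT entries in data[1] are 32-bit ARGB-packed values, which you should verify against your ffmpeg version):
#include <stdio.h>
#include <stdint.h>
#include <libavcodec/avcodec.h>

int dump_rect_to_ppm(const AVSubtitleRect *rect, const char *path)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    // P6 = binary RGB, then width, height and the max channel value
    fprintf(f, "P6\n%d %d\n255\n", rect->w, rect->h);
    const uint8_t  *idx  = rect->data[0];                    // per-pixel CLUT indices
    const uint32_t *clut = (const uint32_t *)rect->data[1];  // color table
    for (int y = 0; y < rect->h; y++)
        for (int x = 0; x < rect->w; x++) {
            uint32_t c = clut[idx[y * rect->linesize[0] + x]];
            uint8_t rgb[3] = { (uint8_t)(c >> 16), (uint8_t)(c >> 8), (uint8_t)c };
            fwrite(rgb, 1, 3, f);  // alpha is dropped; PPM has no alpha channel
        }
    fclose(f);
    return 0;
}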
Hope that helps.

As this is where I ended up when searching for how to create a thumbnail of an AVSubtitle, here is what I ended up using in my test application. The code is optimized for readability only. I got some help from this question, which had some sample code.
Using avcodec_decode_subtitle2() I get an AVSubtitle structure. This contains a number of rectangles. First I iterate over the rectangles to find the max of x + w and y + h, to determine the width and height of the target frame.
The color table in data[1] is RGBA, so I allocate an AVFrame called frame in AV_PIX_FMT_RGBA format and shuffle the pixels over to it:
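The allocation step itself might look roughly like this (a sketch of my own, not from the original test application; frame_w and frame_h stand for the max extents computed above):
AVFrame *frame = av_frame_alloc();
frame->format = AV_PIX_FMT_RGBA;
frame->width  = frame_w;
frame->height = frame_h;
av_frame_get_buffer(frame, 0); // allocates frame->data[] and frame->linesize[]
// Start from a fully transparent canvas so untouched areas stay clear
memset(frame->data[0], 0, frame->linesize[0] * frame->height);
And the pixel shuffle: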
struct [[gnu::packed]] rgbaPixel {
    uint8_t r;
    uint8_t g;
    uint8_t b;
    uint8_t a;
};

// Copy the pixel buffers
for (unsigned int i = 0; i < sub.num_rects; ++i) {
    AVSubtitleRect* rect = sub.rects[i];
    for (int y = 0; y < rect->h; ++y) {
        int dest_y = y + rect->y;
        // data[0] holds index data
        uint8_t *in_linedata = rect->data[0] + y * rect->linesize[0];
        // In AVFrame, data[0] holds the pixel buffer directly
        uint8_t *out_linedata = frame->data[0] + dest_y * frame->linesize[0];
        rgbaPixel *out_pixels = reinterpret_cast<rgbaPixel*>(out_linedata);
        for (int x = 0; x < rect->w; ++x) {
            // data[1] contains the color map
            // compare libavcodec/dvbsubenc.c
            uint8_t colidx = in_linedata[x];
            uint32_t color = reinterpret_cast<uint32_t*>(rect->data[1])[colidx];
            // Now store the pixel in the target buffer
            out_pixels[x + rect->x] = rgbaPixel{
                .r = static_cast<uint8_t>((color >> 16) & 0xff),
                .g = static_cast<uint8_t>((color >> 8) & 0xff),
                .b = static_cast<uint8_t>((color >> 0) & 0xff),
                .a = static_cast<uint8_t>((color >> 24) & 0xff),
            };
        }
    }
}
I did manage to push that AVFrame through an image encoder to output it as a bitmap image, and it looked OK. I did get green areas where the alpha channel is, but that might be an artifact of the settings in the JPEG encoder I used.


Is a cursor greater than 512x512 pixels in size possible?

Goal:
I'm trying to create a cursor file which can cover the whole screen with a flashlight effect on a full HD (1920x1080) screen. For that, the cursor image resolution would need to be 4K (3840x2160) along with having an alpha channel (32bpp). Axialis Cursor Workshop is the only cursor creation program I've tried which goes above the usual 256² pixel limit, but it still caps at 512² pixels...
File format analysis:
Looking at the file format specifications, the usual upper bound of 256² pixels might be caused by the CUR/ICO format using 8 bits each for the width and height fields. The ANI format looks more promising, since it has 32 bits reserved for those. On the flip side, it seems to have no hotspot fields, and it itself uses the CUR/ICO format for the animation frames unless the IconFlag bit is set to FALSE. Looking at a cursor file produced by Axialis CW, I see the flag set to TRUE, weirdly enough.
Hex edit approach:
I've tried inserting raster data from a (converted) bmp of the same size (512²) by means of hex editing. Then I tried to insert raster data from a 1024² bmp, updating the image dimensions and the file size in the headers. Which only kind of works, I guess.
I'd appreciate any help or pointers in the right direction.
Related things, in no particular order:
install cursor scheme.inf (Creates a certain cursor scheme from cur/ani files)
Set Cursor.ps1 (Applies a certain cursor scheme & size)
File format specification index (For the technical details)
PNG to BMP Converter (Properly converts png to 32bpp bmp files)
Axialis CursorWorkshop (Can create ani files up to 512² pixels at 32bpp)
Got it working with Hex Editor Neo and a binary template I put together for the ico/cur file format:
// ico.h
#pragma once
#pragma byte_order(LittleEndian)

#include "stddefs.h"
#include "bitmap.h"

struct ICONDIRENTRY;
struct ICONFILE;

public struct ICONDIR {
    [description("")]
    uint16 Reserved;
    $assert(Reserved==0);
    [description("Specifies image type: 1 for icon (.ICO) image, 2 for cursor (.CUR) image. Other values are invalid.")]
    uint16 Type;
    [description("Specifies number of images in the file.")]
    uint16 Count;
    [description("")]
    ICONDIRENTRY Entries[Count];
};

struct ICONDIRENTRY {
    var entryIndex = array_index;
    [description("Cursor Width")]
    uint8 Width;
    [description("Cursor Height (added height of XORbitmap and ANDbitmap). A negative value would indicate pixel order being top to bottom")]
    int8 Height;
    [description("Specifies number of colors in the color palette. Should be 0 if the image does not use a color palette.")]
    uint8 ColorCount;
    [description("")]
    uint8 Reserved;
    $assert(Reserved==0);
    [description("In ICO format: Specifies color planes. Should be 0 or 1. In CUR format: Specifies the horizontal coordinates of the hotspot in number of pixels from the left.")]
    uint16 XHotspot;
    [description("In ICO format: Specifies bits per pixel. In CUR format: Specifies the vertical coordinates of the hotspot in number of pixels from the top.")]
    uint16 YHotspot;
    [description("Size of (InfoHeader + ANDBitmap + XORBitmap)")]
    uint32 SizeInBytes;
    [description("FilePos, where InfoHeader starts")]
    uint32 FileOffset as ICONFILE*;
};

struct ICONFILE {
    BITMAPINFO Info;
    // no idea why this isn't working
    /*var bmiv1header = BITMAPINFOHEADER(Info.bmiHeader);
    var size = bmiv1header.biSizeImage;
    if(size == 0) {
        size = Entries[entryIndex].SizeInBytes - bmiv1header.biSize;
    }
    uint8 RawData[size];*/
    uint8 __firstPixel;
};
The cursor file I created successfully parses with the template applied.
The trick was to set the value of the image height field in the BITMAPINFOHEADER structure to twice the pixel height. The reason for this is that two separate pixel arrays are expected, which are applied using bitwise XOR and AND. I was surprised that it already worked in the preview without even adding an AND pixel array; it seems you can omit it.
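In C++ terms, the header tweak amounts to something like this (a sketch for a hypothetical 1024² frame; the other fields are filled in as usual):
#include <windows.h>

BITMAPINFOHEADER hdr = {};
hdr.biSize     = sizeof(BITMAPINFOHEADER);
hdr.biWidth    = 1024;
hdr.biHeight   = 1024 * 2;  // XOR (color) bitmap + AND (mask) bitmap stacked
hdr.biPlanes   = 1;
hdr.biBitCount = 32;        // 32bpp, with alpha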

android AudioTrack playback short array (16bit)

I have an application that plays back audio. It takes encoded audio data over RTP and decodes it into a 16-bit array. The decoded 16-bit array is converted to an 8-bit array (byte array), as this is required for some other functionality.
Even though audio playback works, it breaks up continuously and the audio output is very hard to recognize. If I listen carefully I can tell it is playing the correct audio.
I suspect this is due to the fact that I convert the 16-bit data stream into a byte array and use write(byte[], int, int, AudioTrack.WRITE_NON_BLOCKING) of the AudioTrack class for audio playback.
Therefore I converted the byte array back to a short array and used write(short[], int, int, AudioTrack.WRITE_NON_BLOCKING) to see if that could resolve the problem.
However, now there is no audio at all. In the debug output I can see the short array has data.
What could be the reason?
Here is the AudioTrack initialization:
sampleRate = AudioTrack.getNativeOutputSampleRate(AudioManager.STREAM_MUSIC);
minimumBufferSize = AudioTrack.getMinBufferSize(sampleRate,
        AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
        AudioFormat.CHANNEL_OUT_STEREO,
        AudioFormat.ENCODING_PCM_16BIT,
        minimumBufferSize,
        AudioTrack.MODE_STREAM);
Here is the code that converts the short array to a byte array:
for (int i = 0; i < internalBuffer.length; i++) {
    bufferIndex = i * 2;
    buffer[bufferIndex] = shortToByte(internalBuffer[i])[0];
    buffer[bufferIndex + 1] = shortToByte(internalBuffer[i])[1];
}
Here is the method that converts the byte array back to a short array:
public short[] getShortAudioBuffer(byte[] b) {
    short audioBuffer[] = null;
    int index = 0;
    int audioSize = 0;
    ByteBuffer byteBuffer = ByteBuffer.allocate(2);
    if ((b == null) || (b.length < 2)) {
        return null;
    } else {
        audioSize = (b.length - (b.length % 2));
        audioBuffer = new short[audioSize / 2];
    }
    if ((audioSize / 2) < 2)
        return null;
    byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < audioSize / 2; i++) {
        index = i * 2;
        byteBuffer.put(b[index]);
        byteBuffer.put(b[index + 1]);
        audioBuffer[i] = byteBuffer.getShort(0);
        byteBuffer.clear();
        System.out.print(Integer.toHexString(audioBuffer[i]) + " ");
    }
    System.out.println();
    return audioBuffer;
}
Audio is decoded using the Opus library, and the configuration is as follows:
opus_decoder_ctl(dec,OPUS_SET_APPLICATION(OPUS_APPLICATION_AUDIO));
opus_decoder_ctl(dec,OPUS_SET_SIGNAL(OPUS_SIGNAL_MUSIC));
opus_decoder_ctl(dec,OPUS_SET_FORCE_CHANNELS(OPUS_AUTO));
opus_decoder_ctl(dec,OPUS_SET_MAX_BANDWIDTH(OPUS_BANDWIDTH_FULLBAND));
opus_decoder_ctl(dec,OPUS_SET_PACKET_LOSS_PERC(0));
opus_decoder_ctl(dec,OPUS_SET_COMPLEXITY(10)); // highest complexity
opus_decoder_ctl(dec,OPUS_SET_LSB_DEPTH(16)); // 16bit = two byte samples
opus_decoder_ctl(dec,OPUS_SET_DTX(0)); // default - not using discontinuous transmission
opus_decoder_ctl(dec,OPUS_SET_VBR(1)); // use variable bit rate
opus_decoder_ctl(dec,OPUS_SET_VBR_CONSTRAINT(0)); // unconstrained
opus_decoder_ctl(dec,OPUS_SET_INBAND_FEC(0)); // no forward error correction
Let's assume you have a short[] array which contains the 16-bit, one-channel data to be played.
Then each sample is a value between -32768 and 32767 which represents the signal amplitude at that exact moment, and the value 0 represents the middle point (no signal). This array can be passed to the audio track with the ENCODING_PCM_16BIT format encoding.
But things go weird when ENCODING_PCM_8BIT is used (see AudioFormat).
In this case each sample is encoded by one byte. But each byte is unsigned. That means its value is between 0 and 255, while 128 represents the middle point.
Java has no unsigned byte format; the byte format is signed, i.e. the values -128...-1 represent the actual values 128...255. So you have to be careful when converting to the byte array, otherwise the result will be noise with a barely recognizable source sound.
short[] input16 = ... // the source 16-bit audio data;
byte[] output8 = new byte[input16.length];
for (int i = 0; i < input16.length; i++) {
    // To convert a 16-bit signed sample to 8-bit unsigned:
    // add 128 (for rounding), then shift right 8 positions,
    // then add 128 to get into the range 0..255
    int sample = ((input16[i] + 128) >> 8) + 128;
    if (sample > 255) sample = 255; // strip out overload
    output8[i] = (byte) sample;     // cast to signed byte type
}
To perform the backward conversion the same applies: each single sample is converted to exactly one sample of the output signal:
byte[] input8 = ... // source 8-bit unsigned audio data;
short[] output16 = new short[input8.length];
for (int i = 0; i < input8.length; i++) {
    // To convert the signed byte back to an unsigned value, bitwise AND with 0xFF,
    // then subtract the 128 offset,
    // then scale up by 256 to fit the 16-bit range
    output16[i] = (short) (((input8[i] & 0xFF) - 128) * 256);
}
The issue of not being able to convert data from the byte array to the short array was resolved by using bitwise operators instead of ByteBuffer. It could be due to not setting the correct parameters on the ByteBuffer, or it may simply not be suitable for such a conversion.
Nevertheless, implementing the conversion using bitwise operators resolved the problem. Since the original question has been resolved by this approach, please consider this the final answer.
I will raise a separate topic for playback issue.
Thank you for all your support.

How to fix .gif with corrupted alpha channel (stuck pixels) collected with Graphicsmagick?

I want to convert an .avi with alpha channel into a .gif.
Firstly, I use
ffmpeg -i source.avi -vf scale=720:-1:flags=lanczos,fps=10 frames/ffout%03d.png
to convert the .avi to a sequence of .png's with alpha channel.
Then, I use
gm convert -loop 0 frames/ffout*.png output.gif
to collect a .gif.
But it seems that the pixels of output.gif just get stuck when something opaque is rendered on top of the transparent areas.
Here's an example:
As you can see, the hearts and explosions do not get de-rendered.
P.S.
The FFMPEG output (the collection of .png's) is fine.
I do not use GraphicsMagick, but your GIF has image disposal mode 0 (no disposal specified). You should use disposal mode 2 (clear with background) or 3 (restore previous image); both work for your GIF. The disposal is present in the graphics control extension of each frame, in the Packed value.
So if you can, try to configure the encoder to use disposal = 2 or 3, or write a script that direct-stream-copies your GIF and changes the Packed value of the graphics control extension chunk frame by frame. Similar to this:
GIF Image getting distorted on interlacing
If you need help with the script then take a look at:
How to find where does Image Block start in GIF images?
Decode data bytes of GIF87a raster data stream
When I tried this (a C++ script) on your GIF using disposal 2, the result rendered correctly.
The disposal is changed in C++ like this:
struct __gfxext
{
    BYTE Introducer;   /* Extension Introducer (always 21h) */
    BYTE Label;        /* Graphic Control Label (always F9h) */
    BYTE BlockSize;    /* Size of remaining fields (always 04h) */
    BYTE Packed;       /* Method of graphics disposal to use */
    WORD DelayTime;    /* Hundredths of seconds to wait */
    BYTE ColorIndex;   /* Transparent Color Index */
    BYTE Terminator;   /* Block Terminator (always 0) */
    __gfxext(){}; __gfxext(__gfxext& a){ *this=a; }; ~__gfxext(){}; __gfxext* operator = (const __gfxext *a) { *this=*a; return this; }; /*__gfxext* operator = (const __gfxext &a) { ...copy... return this; };*/
};

__gfxext p;
p.Packed &= 255 - (7 << 2); // clear the old disposal bits and leave the rest as is
p.Packed |= 2 << 2;         // set new disposal = 2 (the << 2 shifts it to its position in Packed)
It is a good idea to leave the other bits of Packed as they are, because no one knows what could be encoded there in time...
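For reference, a naive standalone patcher could look like this (a sketch, not the script used above; it blindly scans for the 21h F9h 04h signature, which can in principle false-match inside compressed raster data, so a robust tool should walk the block structure as described in the linked answers):
#include <cstdio>
#include <cstdint>
#include <vector>

int main(int argc, char **argv)
{
    if (argc < 3) { std::fprintf(stderr, "usage: %s in.gif out.gif\n", argv[0]); return 1; }
    FILE *in = std::fopen(argv[1], "rb");
    if (!in) return 1;
    std::vector<uint8_t> gif;
    for (int c; (c = std::fgetc(in)) != EOF; ) gif.push_back((uint8_t)c);
    std::fclose(in);
    // Patch every Graphic Control Extension: 21h F9h 04h <Packed> ...
    for (size_t i = 0; i + 3 < gif.size(); i++)
        if (gif[i] == 0x21 && gif[i + 1] == 0xF9 && gif[i + 2] == 0x04) {
            gif[i + 3] = (uint8_t)((gif[i + 3] & ~(7 << 2)) | (2 << 2)); // disposal = 2
            i += 3;
        }
    FILE *out = std::fopen(argv[2], "wb");
    if (!out) return 1;
    std::fwrite(gif.data(), 1, gif.size(), out);
    std::fclose(out);
    return 0;
}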

How to interpret the field 'data' of an XImage

I am trying to understand how the data obtained from XGetImage is laid out in memory:
XImage *img = XGetImage(display, root, 0, 0, width, height, AllPlanes, ZPixmap);
Now suppose I want to decompose each pixel value into red, blue, and green channels. How can I do this in a portable way? The following is an example, but it depends on a particular configuration of the X server and does not work in every case:
for (int x = 0; x < width; x++)
    for (int y = 0; y < height; y++) {
        unsigned long pixel = XGetPixel(img, x, y);
        unsigned char blue = pixel & blue_mask;
        unsigned char green = (pixel & green_mask) >> 8;
        unsigned char red = (pixel & red_mask) >> 16;
        //...
    }
In the above example I am assuming a particular order of the RGB channels in pixel and also that pixels are 24-bit depth: in fact, I have img->depth == 24 and img->bits_per_pixel == 32 (the screen is also 24-bit depth). But this is not the generic case.
As a second step I want to get rid of XGetPixel and use or describe img->data directly. The first thing I need to know is whether there is anything in Xlib which gives me exactly all the information I need to interpret how the image is built starting from the img->data field, namely:
the order of the R, G, B channels in each pixel;
the number of bits for each pixel;
the number of bits for each channel;
if possible, a corresponding FOURCC
The shift is a simple function of the mask:
int get_shift(int mask) {
    int shift = 0;
    while (mask) {
        if (mask & 1) break;
        shift++;
        mask >>= 1;
    }
    return shift;
}
The number of bits in each channel is just the number of 1 bits in its mask (count them). The channel order is determined by the shifts (if the red shift is 0, then the first channel is R, etc.).
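For example, a sketch of counting those bits:
int get_depth(unsigned long mask) {
    int depth = 0;
    for (; mask; mask >>= 1)   // examine each bit of the mask
        depth += (int)(mask & 1);
    return depth;
}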
I think the valid values for bits_per_pixel are 1, 2, 4, 8, 15, 16, 24 and 32 (15 and 16 bits are the same 2-bytes-per-pixel format, but the former has 1 bit unused). I don't think it's worth anyone's time to support anything but 24 and 32 bpp.
X11 is not concerned with media files, so there is no FOURCC code.
This can be read from the XImage structure itself.
the order of R,G,B channels in each pixel;
This is contained in this field of the XImage structure:
int byte_order; /* data byte order, LSBFirst, MSBFirst */
which tells you whether it's RGB or BGR (because it only depends on the endianness of the machine).
the number of bits for each pixel;
can be obtained from this field:
int bits_per_pixel; /* bits per pixel (ZPixmap) */
while the number of bits actually used per channel is the number of bits set in each of the channel masks:
unsigned long red_mask; /* bits in z arrangement */
unsigned long green_mask;
unsigned long blue_mask;
the number of bits for each channel;
See above, or you can use the code from n.m.'s answer to count the bits yourself.
Yeah, it would be great if they had put the bit-shift constants in that structure too, but apparently they decided not to, since the pixels are aligned to bytes anyway, in "standard order" (RGB). Xlib makes sure to convert the data to that order for you when it retrieves it from the X server, even if it is stored in a different format server-side. So it's always in RGB format, byte-aligned, but depending on the endianness of the machine the bytes inside an unsigned long can appear in reverse order, hence the byte_order field to tell you about that.
So in order to extract these channels, just use the 0, 8 and 16 shifts after masking with red_mask, green_mask and blue_mask; just make sure you read the right bytes depending on byte_order, and it should work fine.
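Putting it together, a portable extraction loop might look like this (a sketch, reusing get_shift from the other answer and assuming a ZPixmap with byte-aligned channels):
int rs = get_shift(img->red_mask);
int gs = get_shift(img->green_mask);
int bs = get_shift(img->blue_mask);
for (int y = 0; y < img->height; y++)
    for (int x = 0; x < img->width; x++) {
        unsigned long pixel = XGetPixel(img, x, y);
        unsigned char red   = (pixel & img->red_mask)   >> rs;
        unsigned char green = (pixel & img->green_mask) >> gs;
        unsigned char blue  = (pixel & img->blue_mask)  >> bs;
        //...
    }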

Mapped colors to the VGA palette turn out wrong

I am learning old DOS programming, specifically controlling VGA directly. I am also doing this to relearn C and get better at it.
Anyway, I have written a small program that loads a PCX file and displays it. The one I am using is of a cacodemon from DooM, with the original DooM palette. The pixel data seems to be correct, as do the RGB values for the palette (I did a printf of all 256 RGB triplets and they matched the editor I am using). However, when I display the palette there are obvious differences and the image's colors are distorted.
Original image and palette:
http://i.imgur.com/7lM5R.png
My output (the numbers are palette values, and are correct):
http://i.imgur.com/MJTUE.png
Here is the palette load code:
void setPalette(unsigned char * newPalette)
{
    int x, y = 0;
    // SET PALETTE MEMORY
    for (x = 0; x <= 255; x++)
    {
        outp(PALETTE_MASK, 0xFF);            // Can access whole palette
        outp(PALETTE_REGISTER_WR, x);        // Set index
        outp(PALETTE_DATA, newPalette[y]);   // Write R value
        outp(PALETTE_DATA, newPalette[y+1]); // Write G value
        outp(PALETTE_DATA, newPalette[y+2]); // Write B value
        printf("%d, %d, %d\n", newPalette[y], newPalette[y+1], newPalette[y+2]);
        y += 3;
        //getch();
    }
}
I figured it out. Because VGA stores only 64 levels each of R, G and B (6 bits per channel), you need to shift each 8-bit value right by two.
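So the fix is to scale each 8-bit palette component down to the DAC's 6-bit range before writing it, e.g. in the loop above:
outp(PALETTE_DATA, newPalette[y]   >> 2); // Write R value, scaled 8-bit -> 6-bit
outp(PALETTE_DATA, newPalette[y+1] >> 2); // Write G value
outp(PALETTE_DATA, newPalette[y+2] >> 2); // Write B value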
