Questions about EXIF in hexadecimal form

I am trying to understand the EXIF header portion of a jpeg file (in hex) and how to understand it so I can extract data, specifically GPS information. For better or worse, I am using VB.Net 2008 (sorry, it is what I can grasp right now). I have extracted the first 64K of a jpg to a byte array and have a vague idea of how the data is arranged. Using the EXIF specification documents, version 2.2 and 2.3, I see that there are tags, that are supposed to correspond to actual byte sequences in the file. I see that there is a “GPS IFD” that has a value of 8825 (in hex). I search for the hex string 8825 in the file (which I understand to be two bytes 88 and 25) and then I believe that there is a sequence of bytes following the 8825. I suspect that those subsequent bytes denote where in the file, by way of an offset, the GPS data would be located. For example, I have the following hex bytes, starting with 88 25: 88 25 00 04 00 00 00 01 00 00 05 9A 00 00 07 14. Is the string that I am looking for longer than 16 bytes? I get the impression that in this string of data, it should be telling me where to find the actual GPS data in the file.
Looking at http://search.cpan.org/~bettelli/Image-MetaData-JPEG-0.153/lib/Image/MetaData/JPEG/Structures.pod#Exif_and_DCT, halfway down the page, it talks about “Each IFD block is a structured sequence of records, called, in the Exif jargon, Interoperability arrays. The beginning of the 0th IFD is given by the 'IFD0_Pointer' value. The structure of an IFD is the following:”
So, what is an IFD0_Pointer? Does it have to do with an offset? I presume an offset is so many bytes from a beginning point. If that is true, where is that beginning point?
Thanks for any responses.
Dale

I suggest you read the Exif Specification (PDF); it is clear and quite easy to follow. For a short primer, here is the summary of an article I wrote:
A JPEG/Exif file starts with the start of the image marker (SOI). The SOI consists of two magic bytes 0xFF 0xD8, identifying the file as a JPEG file. Following the SOI, there are a number of Application Marker sections (APP0, APP1, APP2, APP3, ...) typically including metadata.
Application Marker Sections
Each APPn section starts with a marker. For the APP0 section, the marker is 0xFF 0xE0, for the APP1 section 0xFF 0xE1, and so on. Marker bytes are followed by two bytes for the size of the section (excluding the marker, including the size bytes). The length field is followed by variable size application data. APPn sections are sequential, so that you can skip entire sections (by using the section size) until you reach the one you are interested in. Contents of APPn sections vary, the following is for the Exif APP1 section only.
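The section-skipping described above can be sketched as follows. This is a minimal illustration, not a full JPEG parser: the function name `find_exif_app1` is made up for this example, and it assumes every section before the image data carries a size field.

```python
import struct

def find_exif_app1(data: bytes):
    """Return the offset of the TIFF header inside the Exif APP1 section, or None."""
    if data[:2] != b"\xff\xd8":              # SOI marker
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None                      # lost sync; this sketch gives up
        marker = data[pos + 1]
        if marker == 0xDA:                   # SOS: compressed image data follows
            return None
        # The size is big-endian and counts the two size bytes themselves,
        # but not the two marker bytes.
        (size,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = pos + 4
        if marker == 0xE1 and data[payload:payload + 6] == b"Exif\x00\x00":
            return payload + 6               # TIFF header follows "Exif\0\0"
        pos += 2 + size                      # skip marker plus the whole section
    return None
```

All offsets stored inside the Exif data are relative to the position this function returns, so it is worth keeping that value around.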
The Exif APP1 Section
Exif metadata is stored in an APP1 section (there may be more than one APP1 section). The application data in an Exif APP1 section consists of the Exif marker 0x45 0x78 0x69 0x66 0x00 0x00 ("Exif\0\0"), the TIFF header and a number of Image File Directory (IFD) sections.
The TIFF Header
The TIFF header contains information about the byte-order of IFD sections and a pointer to the 0th IFD. The first two bytes are 0x49 0x49 (II for Intel) if the byte-order is little-endian or 0x4D 0x4D (MM for Motorola) for big-endian. The following two bytes hold the magic number 42 ;) in that byte-order: 0x00 0x2A for big-endian, 0x2A 0x00 for little-endian. The next four bytes are the important ones: they give the offset to the 0th IFD from the start of the TIFF header.
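As a rough sketch, the eight header bytes can be decoded like this. The function name `parse_tiff_header` is hypothetical; the returned offset appears to be what the CPAN page quoted in the question calls 'IFD0_Pointer', counted from the first byte of the TIFF header.

```python
import struct

def parse_tiff_header(tiff: bytes):
    """Return (struct byte-order prefix, offset of the 0th IFD)."""
    order = tiff[:2]
    if order == b"II":
        endian = "<"                         # little-endian (Intel)
    elif order == b"MM":
        endian = ">"                         # big-endian (Motorola)
    else:
        raise ValueError("bad byte-order mark")
    # Magic number (2 bytes) and 0th IFD offset (4 bytes), both stored
    # in the byte order announced by the first two bytes.
    magic, ifd0_offset = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return endian, ifd0_offset
```

The returned prefix can be passed straight to later `struct.unpack` calls, so the byte-order conversion happens in one place.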
Important: The JPEG file itself (what you have been reading until now) will always be in big-endian format. However, the byte-order of IFD subsections may be different, and need to be converted (you know the byte-order from the TIFF header above).
Image File Directories
Once you get this far, you have your pointer to the 0th IFD section and you are ready to read actual metadata. The remaining IFDs are referenced in different places. The offset to the Exif IFD and the GPS IFD are given in the 0th IFD fields. The offset to the first IFD is given after the 0th IFD fields. The offset to the Interoperability IFD is given in the Exif IFD.
IFDs are simply sequential records of metadata fields. The field count is given in the first two bytes of the IFD. Following the field count are 12-byte fields. Following the fields, there is a 4 byte offset from the start of the TIFF header to the start of the first IFD. This value is meaningful for only the 0th IFD. Following this, there is the IFD data section.
IFD Fields
Fields are 12-byte subsections of IFD sections. The first two bytes of each field give the tag ID as defined in the Exif standard. The next two bytes give the type of the field data. You will have 1 for byte, 2 for ascii, 3 for short (uint16), 4 for long (uint32), etc. Check the Exif Specification for the complete list.
The following four bytes, the count, may be a little confusing. For byte arrays (ascii and undefined types), the byte length of the array is given. For example, for the Ascii string "Exif", the count will be 5 including the null terminator. For other types, this is the number of field components (e.g. 4 shorts, 3 rationals).
Following the count, we have the 4-byte field value. However, if the length of the field data exceeds 4 bytes, it will be stored in the IFD Data section instead. In this case, this value will be the offset from the start of the TIFF header to the start of the field data. For example, for a long (uint32, 4 bytes), this will be the field value. For a rational (2 x uint32, 8 bytes), this will be an offset to the 8-byte field data.
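To make the field layout concrete, here is a small sketch that decodes one 12-byte field and applies it to the first 12 bytes quoted in the question (88 25 00 04 00 00 00 01 00 00 05 9A). The helper name `parse_ifd_field` is made up, and the data is assumed to be big-endian because the tag reads 88 25 in file order.

```python
import struct

# Bytes per component for the Exif field types (type code -> size).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 7: 1, 9: 4, 10: 8}

def parse_ifd_field(entry: bytes, endian: str) -> dict:
    """Decode one 12-byte IFD field; endian is '>' (MM) or '<' (II)."""
    tag, ftype, count = struct.unpack(endian + "HHI", entry[:8])
    raw = entry[8:12]
    data_size = TYPE_SIZES[ftype] * count
    (as_uint32,) = struct.unpack(endian + "I", raw)
    return {
        "tag": tag,
        "type": ftype,
        "count": count,
        # True: the last 4 bytes hold the value itself (left-justified).
        # False: they hold an offset from the start of the TIFF header.
        "inline": data_size <= 4,
        "raw": raw,
        "as_uint32": as_uint32,
    }

# First 12 bytes quoted in the question, assumed big-endian.
field = parse_ifd_field(bytes.fromhex("88250004000000010000059A"), ">")
```

Here `field` comes out as tag 0x8825, type 4 (long), count 1, value 0x59A. Because 0x8825 is the GPS IFD pointer, that value is itself an offset: the GPS IFD should start 0x59A (1434) bytes after the start of the TIFF header. The remaining four bytes in the question's 16-byte sample (00 00 07 14) are not part of this field. For inline values shorter than four bytes (a single short, say), decode the leading bytes of `raw` rather than using `as_uint32`.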
This is basically how metadata is arranged in a JPEG/Exif file. There are a few caveats to keep in mind (remember to convert the byte-order as needed, offsets are from the start of TIFF header, jump to data sections to read long fields, ...) but the format is quite easy to read. Following is the color-coded HEX view of a JPEG/Exif file. The blue block represents the SOI, orange is the TIFF header, green is the IFD size and offset bytes, light purple blocks are IFD fields and dark purple blocks are field data.

Here is a php script I wrote to modify exif headers.
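<?php
// Default image; can be overridden with ?filename=... or ?file=...
// Note: the request value is used as a path without any validation.
$filename = "torby.jpg";
if (isset($_REQUEST['filename'])) {
    $filename = $_REQUEST['filename'];
}
if (array_key_exists('file', $_REQUEST)) {
    $filename = $_REQUEST['file'];
}

// Read the same file the thumbnail is extracted from, so that the
// search-and-replace below operates on matching data.
$full_image_string = file_get_contents($filename);
$thumb_image = exif_thumbnail($filename, $width, $height, $type);

if ($thumb_image !== false) {
    echo $thumb_image;
    $thumblen = strlen($thumb_image);
    // Number of times the thumbnail bytes occur in the file.
    echo substr_count($full_image_string, $thumb_image);
    // Overwrite the thumbnail with a filler of the same length, so the
    // offsets stored in the Exif header remain valid.
    $filler = str_pad("%%%THUMB%%%", $thumblen);
    $full_image_string = str_replace($thumb_image, $filler, $full_image_string);
    file_put_contents($filename, $full_image_string);
    exit;
} else {
    // no thumbnail available, handle the error here
    echo 'No thumbnail available';
}
?>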

Related

Cannot interpret ELF 64-bit obj file section header tables

I have a relocatable 64-bit .o file at hand and would like to read each section header. From the wiki I see that the 2-byte value stored in 0x28 should be the start of section header table. The value is 0x0328.
I then go to address 0x0328 and judging from other information each section header is of 0x40 size and I have a total of 14 headers (this information is confirmed with readelf). However then I got lost, and what I interpreted is completely different from what readelf told me:
Section Header 0: I see all 64 bytes are 00, which matches what readelf told me;
Section Header 1: I see the following values from 0x368:
However readelf told me that:
What did I miss?
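One thing that may be worth checking is the width of the fields being read. In the ELF64 header the section header table offset (e_shoff) is an 8-byte value at 0x28, followed later by three 2-byte values: e_shentsize at 0x3A, e_shnum at 0x3C and e_shstrndx at 0x3E. Each 64-byte section header then holds sh_name (4 bytes), sh_type (4), sh_flags (8), sh_addr (8), sh_offset (8), sh_size (8), sh_link (4), sh_info (4), sh_addralign (8) and sh_entsize (8). The sketch below reads those fields; the function names are made up, and whether this is the actual cause of the mismatch cannot be told from the post.

```python
import struct

def elf64_section_table(header: bytes):
    """Return (e_shoff, e_shentsize, e_shnum, e_shstrndx) from an ELF64 header."""
    if header[:4] != b"\x7fELF" or header[4] != 2:     # EI_CLASS 2 = 64-bit
        raise ValueError("not an ELF64 file")
    endian = "<" if header[5] == 1 else ">"            # EI_DATA 1 = little-endian
    (e_shoff,) = struct.unpack_from(endian + "Q", header, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from(endian + "HHH", header, 0x3A)
    return e_shoff, e_shentsize, e_shnum, e_shstrndx

def parse_section_header(raw: bytes, endian: str = "<") -> dict:
    """Split one 64-byte ELF64 section header into its ten fields."""
    names = ("sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
             "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize")
    return dict(zip(names, struct.unpack(endian + "IIQQQQIIQQ", raw[:64])))
```

With the values from the question (table at 0x328, entries of 0x40 bytes), header number i starts at 0x328 + i * 0x40, which puts header 1 at 0x368 as observed. Note that sh_name is an index into the section name string table, not the name itself.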

How to read zip file header manually

I have a file containing very important Java project source code that was otherwise lost.
It is an ELF file. When I open it with an editor most of it is unreadable, but the complete Java project seems to be embedded as an uncompressed zip folder inside the file, with folder structure and everything (don't ask me why; I am only trying to get the information back, I am not responsible).
The relevant information pieces in the elf-file look like the following:
PK
Üi‰L§½kQ Q 9 file/path/i/cant/show/contenttext
content
content
Because I don't know where the zip folder starts and ends, and because everything is uncompressed, my idea was to write a small script to scrape the files from the ELF file and recreate the complete Java project from that.
For that I want the file name length from the header, so it is easy to know where the file name ends and the file content starts.
This PK Üi‰L§½kQ Q 9 seems to be the file header of the zip file. I converted it to hex and it looks like this: 504B03040A2020082020DC69894CA71E BD6B512020205120202039202020
I tried to format that with the info from wikipedia:
504B0304 //sig (this showed me i did something right)
0A20 // version
2008 // generalpurpose flag
2020 // compression method
DC69 // File last modification time
894C // File last modification date
A71EBD6B //CRC-32 of uncompressed data
51202020 //Compressed size (or 0xffffffff for ZIP64)
51202020 //Uncompressed size (or 0xffffffff for ZIP64)
3920 //File name length (n)
2020 //Extra field length (m)
And Endian switch:
04034B50 //sig
200A // version
0820 // generalpurpose flag
2020 // compression method
69DC // File last modification time
4C89 // File last modification date
6BBD1EA7 //CRC-32 of uncompressed data
20202051 //Compressed size (or 0xffffffff for ZIP64)
20202051 //Uncompressed size (or 0xffffffff for ZIP64)
2039 //File name length (n)
2020 //Extra field length (m)
But something seems wrong. The length of the file header is right (30 bytes plus file name) and the numbers seem to carry information in the right places, but the compression method should be 0000, not 2020. To me it seems the conversion to hex is only half right.
What do I have to change to get the right numbers?

I found my error.
The weird 2020 instead of 0000 was my mistake. I opened the file in Notepad++, copied the interesting sections into a new file and converted them to hex there. I think the copying changed the data. When I open the file directly in a hex editor the zip file header is all right.
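For reference, the 30-byte local file header can be unpacked in one step, since all of its fields are little-endian. The sketch below uses the dump from the question and assumes that every 0x20 byte in it was originally 0x00 (consistent with the copy-and-paste explanation above, but still an assumption). The function name `parse_local_header` is made up for this example.

```python
import struct

# Local file header: signature, version, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")   # 30 bytes, little-endian

def parse_local_header(buf: bytes, pos: int = 0) -> dict:
    (sig, version, flags, method, mtime, mdate,
     crc32, csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, pos)
    if sig != 0x04034B50:                      # the bytes "PK\x03\x04" on disk
        raise ValueError("no local file header at this position")
    name_start = pos + LOCAL_HEADER.size
    data_start = name_start + name_len + extra_len
    return {
        "version": version, "flags": flags, "method": method,
        "crc32": crc32, "compressed_size": csize, "uncompressed_size": usize,
        "name_len": name_len, "extra_len": extra_len,
        "name_start": name_start,              # file name begins here
        "data_start": data_start,              # file content begins here
    }

# ASSUMPTION: each 0x20 in the question's dump replaced an original 0x00.
dump = bytes.fromhex("504B03040A2020082020DC69894CA71EBD6B512020205120202039202020")
header = parse_local_header(dump.replace(b"\x20", b"\x00"))
```

Under that assumption the header reads as method 0 (stored), compressed and uncompressed size both 0x51 (81 bytes), and a file name length of 0x39 (57 bytes), so the file content would start 87 bytes after the "PK" signature.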

Extending ext4 File system's filename size limit to 1012 characters

I am pulling data from a server and one of the folder names is longer than 256 bytes, so my CentOS is throwing an error that the name is too long. I have searched all over the internet but couldn't find any way to create a folder/file name longer than 256 bytes under the ext2/ext3/ext4 file systems.
However, one solution suggested creating a reiserfs file system alongside ext4 to handle files/folders with longer names. This solution might work, but I just read in a book that it is possible to extend the file name size limit from 255 characters to 1012 characters if needed.
The maximal file name size is 255 characters. This limit could be extended to `1012` if needed.
Source: Google
But I couldn't find any website that explains how the filesystem could be modified to extend the size to 1012?
Can someone please help me with that?

I don't know where 1012 came from (it is mentioned in http://e2fsprogs.sourceforge.net/ext2intro.html - Design and Implementation of the Second Extended Filesystem, ISBN 90-367-0385-9. 1995), but in the modern Linux kernel the file name length is fixed in struct ext2_dir_entry_2 at a maximum of 255 chars (bytes):
https://elixir.bootlin.com/linux/v4.10/source/fs/ext2/ext2.h#L600
/*
 * The new version of the directory entry. Since EXT2 structures are
 * stored in intel byte order, and the name_len field could never be
 * bigger than 255 chars, it's safe to reclaim the extra byte for the
 * file_type field.
 */
struct ext2_dir_entry_2 {
    __le32 inode;     /* Inode number */
    __le16 rec_len;   /* Directory entry length */
    __u8 name_len;    /* Name length */
    __u8 file_type;
    char name[];      /* File name, up to EXT2_NAME_LEN */
};
There was a struct ext2_dir_entry with a longer file name length field, but the extra byte of name_len was redefined as file_type.
__le16 name_len; /* Name length */
So, current maximum file name length for ext2 is 255
https://elixir.bootlin.com/linux/v4.10/source/include/linux/ext2_fs.h#L22
#define EXT2_NAME_LEN 255
https://elixir.bootlin.com/linux/v4.10/source/fs/ext2/namei.c#L62
if (dentry->d_name.len > EXT2_NAME_LEN)
return ERR_PTR(-ENAMETOOLONG);
Same for ext3/ext4:
https://elixir.bootlin.com/linux/v4.10/source/fs/ext4/ext4.h#L1892
/*
* Structure of a directory entry
*/
#define EXT4_NAME_LEN 255
https://elixir.bootlin.com/linux/v4.10/source/fs/ext4/namei.c
* `len <= EXT4_NAME_LEN' is guaranteed by caller.
if (namelen > EXT4_NAME_LEN)
return NULL;
if (dentry->d_name.len > EXT4_NAME_LEN)
return ERR_PTR(-ENAMETOOLONG);
The on-disk format is likewise described with an 8-bit name_len (file_type uses only 3 bits in older docs, see EXT2_FT_MAX, but the modern driver will not handle file names longer than 255 bytes; ext4 has an extra FT of 0xde):
http://www.nongnu.org/ext2-doc/ext2.html#IFDIR-NAME-LEN "4.1.3. name_len - 8bit unsigned value indicating how many bytes of character data are contained in the name."
http://cs.smith.edu/~nhowe/262/oldlabs/ext2.html#direntry "The file_type field indicates what kind of file the entry is referring to... The maximum length of a file name is EXT2_NAME_LEN, which is usually 255."
https://oss.oracle.com/projects/ocfs2/dist/documentation/disklayout.pdf#page=16 "__u8 name_len"

See https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits, not a lot of filesystems handle filenames of more than 255 bytes.
While it may not be a direct reply to your question, I do not think you should try to go this route (changing the maximum length).
From what server do you retrieve your files? Why not change their names during the retrieval?
Are you sure that the whole path name is relevant? If you extract the first or last 250 characters, isn't that enough to reference each file without ambiguity?
You have many options, depending on your constraints:
either generate "random" names (or sequential ones), and just store in a text file the mapping between old true name and the new fake name
or split the initial name in 250 characters path elements (or something like that) and create intermediate directories with them
You can find similar discussion at https://serverfault.com/questions/264339/how-to-increase-filename-size-limit-on-ext3-on-ubuntu
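The first option (generated names plus a mapping file) could look roughly like the sketch below. It is only one way to do it: the helper names `short_name` and `build_mapping` are invented here, and using a SHA-256 prefix as the generated part is an arbitrary choice.

```python
import hashlib

MAX_NAME_BYTES = 255  # same value as EXT4_NAME_LEN

def short_name(name: str, limit: int = MAX_NAME_BYTES) -> str:
    """Return name unchanged if it fits, else a readable prefix plus a hash."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    digest = hashlib.sha256(raw).hexdigest()[:16]
    keep = limit - len(digest) - 1           # room for "-" and the digest
    # Cut on a byte boundary; a character split in half is simply dropped.
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    return prefix + "-" + digest

def build_mapping(names):
    """Map each shortened name back to the original long name."""
    return {short_name(n): n for n in names}
```

The mapping returned by `build_mapping` can then be written to a text file next to the data, so the original names remain recoverable.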

J2ME - PNG created in PhotoShop not displaying in emulator

My midlet is showing some images fine, but not others.
They are all 8-bit PNGs, but the ones that aren't displaying are the ones I have created myself in PhotoShop.
So I am thinking maybe my PhotoShop (CS6) settings are wrong...
PNG-8, Selective, Diffusion, Colors: 256, Dither: 100%, Matte: None, Web
Snap: 0%, Convert to sRGB: ticked, Width: 48, Height: 48, Percent: 100%,
Quality: Bicubic.
I've experimented with a few of these settings, but to no avail.
Any ideas?
There is a similar problem here but this is opposite to mine in that PhotoShop mends things in that case, rather than breaks things...
My code is...
image = Image.createImage("/img/loading1.png");
...and here is my stack trace:
java.io.EOFException
at javax.imageio.stream.ImageInputStreamImpl.readFully(
ImageInputStreamImpl.java:353)
at java.io.DataInputStream.readUTF(DataInputStream.java:609)
at javax.imageio.stream.ImageInputStreamImpl.readUTF(ImageInputStreamImpl.java:332)
at com.sun.kvem.png.PNGImageReader.parse_iTXt_chunk(PNGImageReader.java:447)
at com.sun.kvem.png.PNGImageReader.readMetadata(PNGImageReader.java:650)
at com.sun.kvem.png.PNGImageReader.readImage(PNGImageReader.java:1312)
at com.sun.kvem.png.PNGImageReader.read(PNGImageReader.java:1582)
at com.sun.kvem.midp.GraphicsBridge.loadImage(GraphicsBridge.java:2602)
at com.sun.kvem.midp.GraphicsBridge.createImageFromData(GraphicsBridge.java:2511)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.sun.kvem.sublime.MethodExecution.process(MethodExecution.java:42)
at com.sun.kvem.sublime.SublimeExecutor.processRequest(SublimeExecutor.java:63)
at javax.microedition.lcdui.Image.createImage(Image.java:315)
The image in question does exist - both in the project and in the jar that is built.
Here is the image in question:

According to the crash log, the PNG decoder in J2ME fails inside the non-critical chunk iTXt [1]:
> com.sun.kvem.png.PNGImageReader.readMetadata
> com.sun.kvem.png.PNGImageReader.parse_iTXt_chunk
> javax.imageio.stream.ImageInputStreamImpl.readUTF
> java.io.DataInputStream.readUTF
According to libpng documentation, the text part of an iTXt chunk must be valid UTF8:
... The remaining chunk data is the main UTF-8 text, either zlib-compressed or not, according to the compression flag. Since its length can be determined from the chunk length, it is not null-terminated. As with the other two text chunks, newlines should be represented by single line-feed characters (decimal 10), and all other control characters (1-9, 11-31, and 127-159) are discouraged.
and so normally this would indicate that the stream read is not valid UTF8 text - it contains 'raw' bytes higher than the plain ASCII range 0..127 that do not conform to UTF8 rules.
I found that not to be the case in the sample image. There is only one set of consecutive bytes that form a UTF8 code sequence, and it is a valid one:
<?xpacket begin="EFBBBF" id=" ..
(the bolded section represents 3 data bytes in hexadecimal notation). I first suspected this was the error:
If the BOM character appears in the middle of a data stream, Unicode says it should be interpreted as a "zero-width non-breaking space" (inhibits line-breaking between word-glyphs). In Unicode 3.2, this usage is deprecated in favour of the "Word Joiner" character, U+2060.[1] This allows U+FEFF to be only used as a BOM.
(http://en.wikipedia.org/wiki/Byte_order_mark)
.. and so a fully conforming UTF8 reader should inspect its bytes and throw a UTFDataFormatException when it encounters a BOM anywhere other than as the very first value. Surprisingly, this does not seem to be the problem! First of all, there is no indication that any of the readUTF sources do anything more than verify that each UTF8 code is valid on its own, irrespective of its value. There are lots of 'invalid' Unicode code points (values that do not represent a valid Unicode character or instruction), but it appears to me they are all silently ignored. But I noticed the common readUTF functions only implement a small subset of UTF8/Unicode (see, e.g., Modified UTF-8 in Oracle's documentation).
So the problem lies elsewhere. Another clue to this is that the error thrown is not UTFDataFormatException but rather EOFException, indicating the read buffer ran out of the number of bytes it was promised to contain.
(warning: pure conjecture follows)
Looking at a source of DataInputStream, I find this snippet of code:
public final static String readUTF(DataInput in) throws IOException {
    int utflen = in.readUnsignedShort();
followed by a loop to read utflen bytes (not "Unicode characters"). This is wrong for an iTXt chunk, as it does not have a 'first word' to indicate its length. The number of bytes in the plain text can be derived from the chunk length (which is, per PNG convention, the total data length excluding the length long word, the iTXt signature itself, and the final CRC32 code) minus the length of the zero-terminated keyword name, language, and "translated keyword" strings, and the two bytes which indicate compression of the full plain text.
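For comparison, here is a sketch of how the iTXt chunk data can be split according to the layout just described, with the text length implied by whatever is left of the chunk. The function name `split_itxt` is hypothetical, and its argument is the chunk data only (without the length word, chunk type and CRC).

```python
import zlib

def split_itxt(data: bytes):
    """Split iTXt chunk data into (keyword, language, translated keyword, text)."""
    keyword, rest = data.split(b"\x00", 1)          # zero-terminated keyword
    comp_flag, comp_method = rest[0], rest[1]       # the two compression bytes
    language, rest = rest[2:].split(b"\x00", 1)     # zero-terminated language tag
    translated, text = rest.split(b"\x00", 1)       # zero-terminated translated keyword
    # Everything that remains is the text; there is no length prefix.
    if comp_flag == 1 and comp_method == 0:
        text = zlib.decompress(text)
    return (keyword.decode("latin-1"), language.decode("ascii"),
            translated.decode("utf-8"), text.decode("utf-8"))
```

A reader that instead takes the first two bytes of the text as a length, as readUTF does, will usually ask for more bytes than the chunk holds, which would fit the EOFException in the stack trace.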
As a work-around, remove the iTXt chunks from your PNG images. The data itself -- XMP Metadata -- is most likely not interesting at all for your purposes (but feel free to read what benefits Adobe thinks it has). And if your workflow does not use it, it's just a useless hunk of uncompressed text, taking up 814 bytes of the total of 981 bytes in your sample image -- a whopping 83%!
You can use an external utility to remove extraneous data chunks; the command line for the popular pngcrush, for example, is
pngcrush -rem alla -rem text InputFile.png OutputFile.png
(from en.wikipedia.org/wiki/Pngcrush).
Or directly from Photoshop: if you save a PNG 'the usual way' with the "Save As" menu option, the metadata goes in and there is no checkbox to get rid of it. If you use "Save for Web & Devices" instead, you get a large dialog with a lot of handy options, such as a drop down list labelled "Metadata".
Choosing "All" I got an even larger file; my version of Photoshop creates a massive 3K chunk of XMP Metadata, including a 2K totally empty 'filler' block...
Selecting "Copyright" or "None" finally got rid of all the crud (presumably because I did not fill in any copyright information), and then you get a nice 169 bytes long PNG, in which the only metadata is that software used is called "Adobe ImageReady".
[1] Which is kind of ironic. Per PNG specifications,
.. A decoder encountering an unknown chunk in which the ancillary bit is 1 can safely ignore the chunk and proceed to display the image.
(source)
This "ancillary bit" is the 5th bit of the first byte of the chunk ID: 0 (uppercase) = critical, 1 (lowercase) = ancillary, i.e., if the first character of the chunk ID is a capital, a PNG reader must read and interpret its data correctly, and if it's not, it can be skipped silently.
So technically, the writers of J2ME could safely have ignored this entire chunk. But they messed it up and attempt to read it, and now the code crashes in all programs that merely try to read the image data in PNGs which happen to contain iTXt chunks.
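If no external tool is at hand, the ancillary-bit test and the chunk removal are simple enough to sketch directly. The function names below are made up; the code copies chunks verbatim, does not verify CRCs, and only drops the chunk types it is asked to drop.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def is_ancillary(chunk_type: bytes) -> bool:
    """Bit 5 (value 0x20) of the first type byte: set means lowercase, ancillary."""
    return bool(chunk_type[0] & 0x20)

def strip_chunks(png: bytes, drop=(b"iTXt",)) -> bytes:
    """Return the PNG with every chunk whose type is in drop removed."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    out = [PNG_SIGNATURE]
    pos = 8
    while pos + 12 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        end = pos + 12 + length              # length word + type + data + CRC
        if chunk_type not in drop:
            out.append(png[pos:end])         # kept chunks are copied unchanged
        pos = end
    return b"".join(out)
```

Because kept chunks are copied byte for byte, their CRCs stay valid and no recompression is involved.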

Identify encoding to convert between known strings

I have some ISO 11785 RFID animal tags which are programmed with a number and also have a number written on them. The two numbers are matched through a text file which was sent with the RFID tags. The problem is that the numbers returned by the RFID reader don't match the numbers in the file. I would like someone to help me determine how the tag numbers have been converted to be put in the file. The RFID tag returns a number with manufacturer code followed by a unique animal code.
Here is a sample of the data:
RFID Tag | RFID Response From Tag
8000F580076C2BA9 | 982 000124529577
8000F580076C2C34 | 982 000124529716
8000F580076C2C32 | 982 000124529714
8000F580076C2DD4 | 982 000124530132
8000F580076C2BDC | 982 000124529628
Can anyone suggest what I should do to the number on the left to get to the number on the right? I have tried converting from hex to decimal but that doesn't seem to work.
Edit
It turns out if I take the last 7 characters of hex then they become the animal ID, and the F58 translates to 3928, which can be worked around to get me the 982 part. I think I should be able to work from this but thought I would leave the post anyway in case someone else has the same problem.

The field encoding is defined by the ISO standard.
Bits | Field
1 | Animal flag
2-4 | Retag counter
5-9 | User information
10-15 | Reserved
16 | Additional data flag
17-26 | Country/manufacturer code (982 is Allflex)
27-64 | Unique ID
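Applied to the sample data, the bit layout above can be decoded as in the following sketch. It assumes bit 1 is the most significant bit of the 16-digit hex string, which is consistent with the numbers in the question; the function and key names are made up.

```python
def decode_tag(hex_code: str) -> dict:
    """Split a 64-bit tag code (hex string) into its fields."""
    value = int(hex_code, 16)

    def bits(first: int, last: int) -> int:
        # Bit numbers are 1-based, with bit 1 as the most significant of 64.
        width = last - first + 1
        return (value >> (64 - last)) & ((1 << width) - 1)

    return {
        "animal_flag": bits(1, 1),
        "retag_counter": bits(2, 4),
        "user_info": bits(5, 9),
        "reserved": bits(10, 15),
        "additional_data": bits(16, 16),
        "country_code": bits(17, 26),
        "unique_id": bits(27, 64),
    }

def format_tag(hex_code: str) -> str:
    """Render the tag the way it appears in the right-hand column."""
    fields = decode_tag(hex_code)
    return f"{fields['country_code']} {fields['unique_id']:012d}"
```

For the first row, `format_tag("8000F580076C2BA9")` gives "982 000124529577". This also explains the observation in the edit: F58 covers the 10 country bits plus the top two bits of the ID, so shifting it right by two bits yields 982.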
