Custom Exif Tags - exif

I am using exiv2 to manipulate metadata in a JPEG file. I need to write additional information related to image processing into the metadata. Is it possible to create custom Exif tags other than the standard ones?

From http://www.exif.org/Exif2-2.PDF:
D. Tags Relating to User Information
MakerNote: A tag for manufacturers of Exif writers to record any desired information. The contents are up to the manufacturer, but this tag should not be used for any other than its intended purpose. Tag = 37500 (927C.H), Type = UNDEFINED, Count = Any, Default = none
UserComment: A tag for Exif users to write keywords or comments on the image besides those in ImageDescription, and without the character code limitations of the ImageDescription tag. Tag = 37510 (9286.H), Type = UNDEFINED, Count = Any, Default = none
exiv2 supports MakerNote tags: http://dev.exiv2.org/projects/exiv2/wiki/How_to_add_support_for_a_new_makernote
If you don't want to do this, you can use UserComment: http://www.exiv2.org/doc/exifcomment_8cpp-example.html
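As a quick, hedged illustration of the UserComment route: the sketch below uses the py3exiv2 Python bindings on top of exiv2 (module pyexiv2); the file name and the comment text are placeholders, and the exact value-conversion rules differ a bit between binding versions. If you are working against the C++ API directly, the exifcomment example linked above is the reference.

import pyexiv2

# Open the image and load its existing metadata.
metadata = pyexiv2.ImageMetadata('image.jpg')
metadata.read()

# UserComment is a standard tag meant for free-form text, so arbitrary
# processing information can go here without defining any new tags.
metadata['Exif.Photo.UserComment'] = 'denoise=3; sharpen=0.8'
metadata.write()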

A better way to store custom metadata is to use Adobe XMP with a custom namespace.
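For instance, with the same py3exiv2 bindings the registration step looks roughly like the sketch below; the namespace URI, prefix and property name are invented for illustration, and the exact registration and write calls may differ between binding versions, so treat this as a sketch rather than a verified recipe.

import pyexiv2

# Register a custom namespace once per process (URI and prefix are
# illustrative placeholders, not an existing standard namespace).
pyexiv2.xmp.register_namespace('http://example.com/imgproc/', 'imgproc')

metadata = pyexiv2.ImageMetadata('image.jpg')
metadata.read()

# Write a custom property in that namespace (hypothetical property name).
metadata['Xmp.imgproc.DenoiseLevel'] = '3'
metadata.write()

Unlike unregistered Exif tag numbers, an XMP property in your own namespace stays self-describing, so other XMP-aware tools can at least read it back.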

Based on my research, EXIF tags conform to a standard set of tags as described here.
Within the standards document it states (in reference to EXIF tags):
A registration system is used for character codes to avoid duplication. When a character code is registered, a standard document is indicated in the reference column to indicate the character format specification. If a character code is used for which there is no clear specification like Shift-JIS in Japan, Undefined is designated.
Therefore it's my understanding that there is a standard set of tags (the registered ones); anything outside that set will show up as 'Undefined' when you query it with a tool such as exiftool.
So in order to have a tag that is recognizable by other tools or entities, you would need to register your tag so that it becomes part of the standardized set.

I did find a way to add custom EXIF tags to files with ExifTool.
I had to create a config file for the fields I wanted:
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Exif::Main' => {
        0xd001 => {
            Name => 'Category',
            Writable => 'string',
            WriteGroup => 'IFD0',
        },
        0xd002 => {
            Name => 'PhotoID',
            Writable => 'string',
            WriteGroup => 'IFD0',
        },
        # add more user-defined EXIF tags here...
    },
);
Then you can write to the fields using exiftool:
exiftool -config exif.config -EXIF:PhotoID=15 -EXIF:Category=Cars c:\temp\pic.png
To read the EXIF info:
exiftool -config exif.config c:\temp\pic.png
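Since the config defines the tag names, the user-defined tags can also be read back individually like any other tag (same hypothetical file path as above):
exiftool -config exif.config -Category -PhotoID c:\temp\pic.png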

Related

Node.js XML builder Error: Invalid character in name: during creation of XML document

I am trying to create an XML file using Node.js and the npm package xmlbuilder. When I try to create the tags, some of the names contain special characters such as : and /, which gives me the following error:
Error: Invalid character in name: http://google.com.http://google.com
How can I resolve this issue? I could replace the special characters with blanks, but I don't want to do that; I want my XML to retain them.
var builder = require('xmlbuilder');

var root = builder.create('test:document');
var ObjectEvent = root.ele('ObjectEvent');
for (var ex = 0; ex < Extension.length; ex++) {
    Extension[ex].NameSpace = Extension[ex].NameSpace;
    Extension[ex].LocalName = Extension[ex].LocalName;
    Extension[ex].FreeText = Extension[ex].FreeText;
    // Concatenating the namespace URI into the element name (here and in
    // the line after the loop) is what triggers the error:
    ObjectEvent.ele(Extension[ex].NameSpace + Extension[ex].LocalName, Extension[ex].FreeText).up();
}
ObjectEvent.ele(Extension[ex].NameSpace + '.' + Extension[ex].LocalName, Extension[ex].FreeText).up();
My Extension elements would look something like this:
[
    {
        NameSpace: 'http://google.com',
        LocalName: 'http://google.com',
        ExtensionVlaues: 0,
        FreeText: 'Google Website',
        '$$hashKey': 'object:290'
    }
]
I want to know how I can retain all the special characters in my XML document.
XML namespaces can be URIs, but XML element names cannot: / is not allowed in XML element names.
I want to know how I can retain all the special characters in my XML document.
Realize that your error is not about special characters in your XML document; it's about special characters in the names of your XML elements.
Regarding XML element names, you simply must abide by the rules in the standard for which characters are allowed. Otherwise, your data is not XML, and you and your callers will not be able to use XML tools and libraries with it.
See also How to include ? and / in XML tag
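The rule is language-agnostic, so here is a small illustration using Python's standard library rather than xmlbuilder (the local name 'website' is just an example): the URI goes into a namespace declaration, and the element itself gets a legal name.

import xml.etree.ElementTree as ET

# Map the URI to a prefix for serialization; the URI is data, not a name.
ET.register_namespace('ex', 'http://google.com')

root = ET.Element('{http://google.com}document')
child = ET.SubElement(root, '{http://google.com}website')
child.text = 'Google Website'

print(ET.tostring(root, encoding='unicode'))
# <ex:document xmlns:ex="http://google.com"><ex:website>Google Website</ex:website></ex:document>

In xmlbuilder you would similarly declare the namespace with an xmlns attribute and keep the URI out of the element name.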

How do I detect .connect_pad_added() template = video_%u?

Using the GStreamer Rust bindings, how can I test whether a sometimes pad that has been added comes from the template video_%u or audio_%u?
For example, using qtdemux, the following pad-added callback is called once for video and once for audio:
.connect_pad_added(move |demux, src_pad| {
According to the binding docs it seems that
get_property_name_template(&self)
is what I need, but this fails:
.connect_pad_added(move |demux, src_pad| {
    let templateName = get_property_name_template(&src_pad);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in this scope
A more manual way is to get the name and then if/else on it, but is there a more direct method?
println!(
    "Received new pad {}",
    src_pad.get_name()
);
I have also tried matching the pad from a template
.connect_pad_created('video_%u', src_pad{ ....
but I could not find a way to match the string of the template.
You have at least two options here:
1. Check if the pad name starts with audio_ or video_. You can get the name via get_name().
2. Get the pad template from the pad via get_pad_template() and then check the name template via get_property_name_template().
Ideally, however, you would not depend on template names (unless you explicitly work with a specific element factory, like qtdemux here) but instead look at the caps on the pad via get_current_caps(), and if they are not available yet, get notified once they change via connect_notify(Some("caps"), ...).
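The question uses the Rust bindings, but the same calls exist across the GStreamer bindings; the sketch below shows all three approaches with the Python GObject-Introspection bindings (pipeline setup omitted, so treat it as the shape of the solution rather than a drop-in).

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

def on_pad_added(demux, src_pad):
    # Option 1: look at the pad name directly.
    name = src_pad.get_name()
    if name.startswith('video_'):
        print('video pad:', name)
    elif name.startswith('audio_'):
        print('audio pad:', name)

    # Option 2: ask the pad which template it was created from.
    template = src_pad.get_pad_template()
    if template is not None:
        print('template:', template.get_property('name-template'))

    # Preferred: inspect the caps rather than the template name.
    caps = src_pad.get_current_caps()
    if caps is not None:
        print('media type:', caps.get_structure(0).get_name())
    else:
        # Caps not known yet; get notified once they are set.
        src_pad.connect('notify::caps', lambda pad, pspec: print(pad.get_current_caps()))

demux = Gst.ElementFactory.make('qtdemux', None)
demux.connect('pad-added', on_pad_added)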

Google Cloud Vision API - DOCUMENT_TEXT_DETECTION: no "property" field in "pages"

I'm trying to extract the language from the detection response:
response.full_text_annotation.pages[0].property.detected_languages[0].language_code
but it seems that the detections are sometimes missing the TextProperty (property) field specified here: Page
Is it not guaranteed to always be present in the detection?
Also, is there a way to receive only the fullTextAnnotation without the singular textAnnotations fields?
I think it is not possible to receive only the fullTextAnnotation without the individual textAnnotations, because the response structure is TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol, and if you look at the TextAnnotation response there is no way to restrict it.
Regarding the missing TextProperty (property) field, you can try using "DOCUMENT_TEXT_DETECTION" instead of "TEXT_DETECTION" as the request type. According to the documentation, the TEXT_DETECTION endpoint will auto-detect only a subset of supported languages, while the DOCUMENT_TEXT_DETECTION endpoint will auto-detect the full set of supported languages.
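For reference, a defensive way to read the page-level language with the Python client, instead of indexing property.detected_languages[0] blindly; the names follow the google-cloud-vision 2.x client and the file path is a placeholder.

from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open('scan.jpg', 'rb') as f:
    image = vision.Image(content=f.read())

response = client.document_text_detection(image=image)

for page in response.full_text_annotation.pages:
    # detected_languages can be empty, so guard before indexing.
    languages = page.property.detected_languages
    if languages:
        print('page language:', languages[0].language_code)
    else:
        print('no language detected for this page')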

Logstash kv filter

I have a file with the following format:
10302\t<document>.....</document>
12303\t<document>.....</document>
10054\t<document>.....</document>
10034\t<document>.....</document>
As you can see, there are two values separated by a tab character. I need to:
1. index the first token (e.g. 10302, 12303, ...) as ID
2. extract (and then index) some information from the second token (the XML document); in other words, the second token would be fed to the xml filter to extract some information
Is it possible to do that by separating the two values with the kv filter? Ideally I should end up, for each line, with a document like this:
id:10302
msg:<document>....</document>
I could use a grok filter, but I'd like to avoid any regex since the field detection is very easy and can be accomplished with simple key-value logic. However, using plain kv detection I end up with the following:
"10302": <document>.....</document>
"12303": <document>.....</document>
"10054": <document>.....</document>
"10034": <document>.....</document>
and this is not what I need.
As far as I know, it is not possible to use kv for the job you want to do, since there is no possible key for the id (10302, 12303, ...): kv needs something before each value to use as the key, and there is nothing before the id.
This grok configuration would work, assuming each id + document is on the same line :
grok {
    match => { "message" => "^%{INT:ID}\t%{GREEDYDATA:msg}" }
}
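To then pull values out of the XML half, the msg field produced by that grok can be fed to the xml filter; a rough sketch (the target name and the commented-out XPath are placeholders for whatever you actually need):

xml {
    source => "msg"
    target => "doc"
    # optionally extract specific values with XPath, e.g.
    # xpath => [ "/document/title/text()", "doc_title" ]
}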

Can I read the font from one document, and embed that font in a brand new document, using iTextSharp?

In the below code (based on code previously provided by Chris Haas), I am reading the fonts from an existing document. Using this method, I am able to re-use those font objects elsewhere in the existing document. However, now I want to use this method to read the fonts in document "A", and embed them when I'm creating brand-new document "B". Can this be done?
The BaseFont.CreateFont method here takes a PRIndirectReference as an argument, which keeps me from being able to specify BaseFont.EMBEDDED as an argument, as can be done in the overloaded versions of the method where the specific path to a font file is known.
internal static HybridDictionary findAllFonts(PdfReader reader)
{
    HybridDictionary fd = new HybridDictionary();
    //Get the document's acroform dictionary
    PdfDictionary acroForm = (PdfDictionary)PdfReader.GetPdfObject(reader.Catalog.Get(PdfName.ACROFORM));
    //Bail if there isn't one
    if (acroForm == null)
    {
        return null;
    }
    //Get the resource dictionary
    var DR = acroForm.GetAsDict(PdfName.DR);
    //Get the font dictionary (required per spec)
    var fontDict = DR.GetAsDict(PdfName.FONT);
    foreach (var internalFontName in fontDict.Keys)
    {
        var internalFontDict = (PdfDictionary)PdfReader.GetPdfObject(fontDict.Get(internalFontName));
        var baseFontName = (PdfName)PdfReader.GetPdfObject(internalFontDict.Get(PdfName.BASEFONT));
        //Console.WriteLine(baseFontName.ToString().Substring(1, baseFontName.ToString().Length - 1));
        var iRef = (PRIndirectReference)fontDict.GetAsIndirectObject(internalFontName);
        if (iRef != null)
        {
            fd.Add(baseFontName.ToString().Substring(1, baseFontName.ToString().Length - 1).ToLower(),
                BaseFont.CreateFont(iRef));
        }
    }
    return fd;
}
This won't always be possible because usually fonts aren't embedded entirely. Instead you'll have subsets of the font. A glyph that is present in one subset may not be present in another subset.
Moreover, you'll face encoding problems: suppose that you have a document where Arial is used as a simple font for Greek glyphs. In that case, you'll have a maximum of 256 characters that can't be reused if you want to use Arial in another document to render a Russian text, or a text in Latin-1.
Even if you use Unicode, then you'll still have a problem, because there is not a single font that contains all Unicode characters. There are 1,114,112 code points in Unicode whereas a character identifier in a composite font can only be a number from 0 to 65,535...
You should really abandon the idea of reusing fonts that are present in existing documents to create new documents. On one hand, it smells like you're trying to do something that is illegal (do you have a license to use the actual fonts?). On the other hand, your question sounds like: I have carrot soup, please tell me how to extract the original carrots from the soup so that I can reuse them for another purpose. You may get some results if you can still find large chunks of carrot in the soup, but in most cases you'll fail.
For instance: if you have an elementary Type 1 font that is fully embedded, you should be able to copy all the essential elements of the font descriptor, but as soon as you're faced with the modern way of storing font subsets inside a PDF, you'll get stuck discovering that you're trying to do something that is simply impossible.
