How to add a digital signature to an xlsx file

I am trying to add a digital signature to an xlsx file... Can't seem to find any resources for this (other than adding signatures to literal/regular xml files). Is this possible with docx4j? I see it includes jaxb-xmldsig but there are no samples that I could find. Perhaps someone could point me in the right direction?
EDIT: Per Jason, I looked at the differences via the demo webapp....
There are two new entries in [Content_Types].xml:
<Default Extension="sigs"
ContentType="application/vnd.openxmlformats-package.digital-signature-origin"/>
<Override ContentType="application/vnd.openxmlformats-package.digital-signature-xmlsignature+xml" PartName="/_xmlsignatures/sig1.xml"/>
Two new parts within a new top level directory (_xmlsignatures):
/_xmlsignatures/origin.sigs
/_xmlsignatures/sig1.xml
There is also a _rels directory within _xmlsignatures which contains a single file origin.sigs.rels. I can post more info if that will be helpful.
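The differences listed above can be checked programmatically, since an .xlsx file is a ZIP package. The following is a minimal Python sketch, not docx4j code: it only detects whether the signature parts and content types quoted above are present, and it does not create or validate a signature. The function names are made up for illustration.

```python
import zipfile

# Content types quoted in the question above.
SIGNATURE_CONTENT_TYPES = (
    "application/vnd.openxmlformats-package.digital-signature-origin",
    "application/vnd.openxmlformats-package.digital-signature-xmlsignature+xml",
)


def list_signature_parts(package):
    """Return the names of all parts stored under _xmlsignatures/.

    `package` is a path or a binary file-like object for an .xlsx file.
    """
    with zipfile.ZipFile(package) as zf:
        return sorted(n for n in zf.namelist() if n.startswith("_xmlsignatures/"))


def declares_signature_types(package):
    """Return True if [Content_Types].xml mentions both signature content types."""
    with zipfile.ZipFile(package) as zf:
        content_types = zf.read("[Content_Types].xml").decode("utf-8")
    return all(ct in content_types for ct in SIGNATURE_CONTENT_TYPES)
```

Running both functions on a signed and an unsigned copy of the same workbook should reproduce the differences described above.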

Isn't this the DigSig element from the extended properties?
See:
http://www.schemacentral.com/sc/ooxml/e-extended-properties_Properties.html
http://msdn.microsoft.com/en-us/library/documentformat.openxml.extendedproperties.properties_members(v=office.14).aspx (DigitalSignature)
docx4j\xsd\docProps\shared-documentPropertiesExtended.xsd
The DigSig element contains a binary blob.
If that is what you need, you can add the DigSig by editing the extended properties part:
DocPropsExtendedPart docPropsExtPart = wordMLPackage.getDocPropsExtendedPart();
Properties extProp = docPropsExtPart.getContents();
ExtendedProperties.modifyProp(props.getExtendedProperties(), extProp);
wordMLPackage.setPartShortcut(docPropsExtPart, Namespaces.PROPERTIES_EXTENDED);


XML parsing and update with Excel VBA

I'm trying to make an XML parser/updater through Excel VBA.
First of all, I have been going back and forth between Excel VBA and Python but it seemed like Excel VBA was a better option to me.
However, I am open to any method really so please let me know if anyone has a different suggestion that would work better.
Here is what I want the application to do:
Parse the XML and record the information in an Excel sheet.
I need the name and value of each attribute, along with the text value of each node.
After getting the information into Excel, I want to be able to revise values and write them back out as XML.
So, in a nutshell, I am really aiming for an XML editor, I guess.
But I am stuck on a few issues right from the start.
Here's a brief implementation of the XML parsing portion:
'load xml document
Set xmlDoc = CreateObject("MSXML2.DOMDocument.6.0")
xmlDoc.async = False
xmlDoc.validateOnParse = False
xmlDoc.Load(xmlFilepath)
'get document elements
Set xmlDocElement = xmlDoc.DocumentElement
Debug.Print xmlDocElement.xml
For i = 0 To xmlDocElement.ChildNodes.Length - 1
    Debug.Print xmlDocElement.ChildNodes(i).xml
    For j = 0 To xmlDocElement.ChildNodes(i).Attributes.Length - 1
        Debug.Print xmlDocElement.ChildNodes(i).Attributes.Item(j).Name
        Debug.Print xmlDocElement.ChildNodes(i).Attributes.Item(j).Value
    Next j
    Debug.Print xmlDocElement.ChildNodes(i).Text
Next i
The above method works well more or less with an exception for two conditions, so far at least.
XML file cannot be loaded if the text includes &/>/<
XML file cannot be loaded if it includes more than 1 highest parent node.
Text including &/>/< sample:
<parenttag>
<childtag>I love mac&cheese</childtag>
</parenttag>
The answer I found online was quite conclusive:
Revise the text so that it does not use &/>/<.
But I cannot modify the text and need to keep the current format.
Any way to bypass this?
More than 1 highest parent node sample:
<parenttag>
<childtag>Text</childtag>
</parenttag>
<differenttag>
<childtag>Some other text</childtag>
</differenttag>
XML Load does not work with multiple parent tags in 1 XML file.
And again, I cannot modify the XML file content, so I need a way around the load error.
I also want to note that I have initially started this project
by reading XML file as a text and process line by line.
But, this did not work well with multi-line content
and thus trying to figure out a way to process XML file properly.
This question really includes multiple portions but I would really appreciate if I can get any help.
The issue is that any XML parser will only accept valid XML. And
<childtag>I love mac&cheese</childtag>
is simply not valid XML. It should be encoded as
<childtag>I love mac&amp;cheese</childtag>
So that is what you need to fix. You can only work with a standard (like the XML standard) if everyone follows its rules and produces valid XML. Otherwise your data might look like XML, but it is not XML until it is valid.
Multiple root elements are also not allowed in XML. If a file has multiple roots, it is not XML. So the only way out is to fix those issues before loading the file into a parser. For example, you can add a root tag so that your multiple parents become children of that root:
<myroot>
<parenttag>
<childtag>Text</childtag>
</parenttag>
<differenttag>
<childtag>Some other text</childtag>
</differenttag>
</myroot>
And any & that is not yet encoded needs to be changed to &amp; to make the document valid.
The only other option is to write your own parser for these custom files, which are not XML. But that will not be possible in 2 lines of code, as you will need to develop a parser for your non-XML files.
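Since the question mentions Python as an alternative, here is a minimal Python sketch of the "fix it before parsing" approach: escape every & that does not already start an entity, wrap everything in an artificial root, then hand the result to a normal parser. It assumes the input has no XML declaration and no stray < in text content (a stray < cannot be repaired reliably this way). The name parse_fragment and the default root tag are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that does not already start an entity or character reference
# such as &amp; or &#38; or &#x26;.
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def parse_fragment(text, root_tag="myroot"):
    """Escape bare ampersands, wrap the text in one root element, and parse it."""
    escaped = BARE_AMPERSAND.sub("&amp;", text)
    return ET.fromstring(f"<{root_tag}>{escaped}</{root_tag}>")
```

The original file stays untouched; only the in-memory copy is repaired. When the tree is written back out, the serializer will emit &amp; rather than a bare &, so reproducing the original invalid form would need a separate post-processing step.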

Download Blob_file with SAS

I am trying to download a BLOB file from an Oracle DB. I used dbms_lob.substr to cut the binary data into parts (the maximum length of a HEX field is 2,000). After cutting it, I write the data into a .docx file. When I open it I see the message:
Word found a problem with content in file test777.docx
and Word asks me to repair the file. After the repair, the document opens fine.
I think the core problem is the one shown in my screenshots (not reproduced here): after cutting, the unused remainder of the last field is padded with '02'. So when I write it to a file and open it in a binary view, I see lots of spaces in there. As I understand it, that is the core problem.
Does anyone know how to avoid this? I think the problem is in the download method.
Alternatively, how can I repair a bunch of files the way Office does? (I have nearly 100 files every month.)
You didn't specify a variable name for the length of the blob so I will use BLOB_LENGTH. You need to make sure not to write out more than the full length. Also you do not want the MOD option on the FILE statement since you are creating the file not appending to an existing file.
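data _null_;
  length fv $ 120;
  set blobs;
  /* Build the output path for this row; FILEVAR= opens that file */
  fv="k:\Folder\"||File_nm;
  file writeout FILEVAR=fv recfm=n;
  array blob[8] blob_1-blob_8;
  do i=1 to 8;
    /* Bytes to write from chunk i: 2000 for a full chunk, */
    /* the remainder for the last chunk, 0 for unused chunks */
    len = max(0,min(2000,blob_length - 2000*(i-1)));
    put blob[i] $varying2000. len;
  end;
run;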
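To see what the len expression does, here is the same arithmetic as a small Python sketch. The function name and the example length of 4500 bytes are made up for illustration; the chunk size of 2000 and the 8 chunks come from the SAS step above.

```python
def chunk_lengths(blob_length, chunk_size=2000, n_chunks=8):
    """Number of bytes to write from each chunk so the total equals blob_length."""
    return [
        max(0, min(chunk_size, blob_length - chunk_size * (i - 1)))
        for i in range(1, n_chunks + 1)
    ]


print(chunk_lengths(4500))  # [2000, 2000, 500, 0, 0, 0, 0, 0]
```

Only the first 500 bytes of the third chunk are written, so the padding that fills the rest of that field never reaches the file.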

Output other than .txt

I'm looking to build a simple program that will modify existing output files from another program, so I don't have to open that program and enter a bunch of data the long way. This program is very specific to my domain and has an extension named .wcc. However, when I change the extension of one of these output files to .txt, I get half gibberish:
ÿÿ WPointÿÿ WPolygonÿÿ  WQuadrilateralÿÿ  WMemberDataÿÿ
WLoadÿÿ WLStandardMembersÿÿ WLSavedDesignSettingsÿÿ WLSavedFormatSettingsÿÿ  WLSavedViewSettingsÿÿ WLSavedProjectSettingsÿÿ  WLSavedSettingsÿÿ  WLSavedLoadSettingsÿÿ WLSavedDefaultSettingsÿÿ WLineÿÿ WProductÿÿ WBeamDataÿÿ  WColumnDataÿÿ
WJoistDataÿÿ
WWallStudDataÿÿ WSupportingMemberDataÿÿ WSavedAnalysisSettingsÿÿ WSavedGravityDesignSettingsÿÿ WSavedPreferencesSettingsÿÿ WNotchÿÿ WIJoistÿÿ WFloorCWC37 ÀAE LumberS-P-F No.1/No.2 # À# lumwall.cww ÿÿÿÿ1.2.3.1.Mur_1_EX-D ÿÿÿÿÿÿ B Cÿÿ B C €? 4C 4C   Neige #F #F ÈC ÿÿÿ
WLStandardMembersÿÿ "
There are also musical notes and perpendicular signs which I can't copy paste here. I can sorta read the text, but still not enough to make modifications via txt file. What type of file could this be? Is it even possible to do what I'm trying to do? Thanks!
I am surprised that you are trying to open a .wcc file as a text file (its contents, as you have seen, don't lend themselves to being read as text); however, the attempt to open the file as a .txt file seems to be specific to your domain.
I noticed part of your question is as follows: "What type of file could this be?"
You are right in thinking that the .wcc file is a rather obscure file type - we don't think about that file type a lot (or are not conscious of it existing). A .wcc file is a WinCam 2000 Cache file that allows WinCam 2000 movies to be previewed in the slide browser - these were often generated by older WinCam 2000 screen recording and editing programs.
Again, the file extension is very rare these days (a Google search only returns ~700 results). But, it appears you have a program that is producing the file, which - as you are saying - "is quite specific to your domain". You may be out of luck with regard to opening them for modification purposes.
Supposedly, you can convert .wac files to .wav files, which are much more relevant to today's technology (and definitely alterable from code); however, without knowing the purpose of the file, e.g. what you are trying to do with the file domain-side, I can't say that this will suit your needs.
Also, the above comments are "correct": changing a file extension does not convert the file to that file type. Typically a converter (a piece of software) is needed to convert files.
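Because renaming the file does not change its bytes, a gentler way to inspect such a file is to read it in binary mode and pull out only the readable runs. This is a minimal Python sketch of that idea, similar in spirit to the Unix strings tool. It assumes nothing about the actual .wcc layout, and the function name is made up for illustration.

```python
import re


def printable_strings(data, min_length=4):
    """Return every run of at least min_length printable ASCII bytes in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

For a real file, something like printable_strings(open(path, "rb").read()) lists names such as WPoint or WPolygon without the surrounding binary bytes. That helps with understanding the file, but modifying it safely would still require knowing the binary format.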

How can I search in PDF documents/PDX catalog in powershell

I have a vendor that supplies their documentation library as a series of PDF files (and some CHM files) and include a .PDX catalog also.
I want to write a powershell script to front end it (using either powershell forms, or hosting powershell in asp.net).
I'm in the early stages, I've worked out how to get document information from the PDF stream (the xmpmeta XML metadata block near the end of the PDF file - one of the few streams in the file that's in plaintext) which looks like this:
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.372728, 2009/01/18-15:08:04
"><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><rdf:Description rdf:about="
" xmlns:pdf="http://ns.adobe.com/pdf/1.3/"><pdf:Producer>GPL Ghostscript 8.64</pdf:Producer><pdf:Keywo
rds>86000056-413</pdf:Keywords></rdf:Description><rdf:Description rdf:about="" xmlns:xmp="http://ns.ad
obe.com/xap/1.0/"><xmp:ModifyDate>2011-03-03T17:38:34-05:00</xmp:ModifyDate><xmp:CreateDate>2011-01-28
T23:12:07+05:30</xmp:CreateDate><xmp:CreatorTool>PScript5.dll Version 5.2</xmp:CreatorTool><xmp:Metada
taDate>2011-03-03T17:38:34-05:00</xmp:MetadataDate></rdf:Description><rdf:Description rdf:about="" xml
ns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"><xmpMM:DocumentID>6cb2263d-2d61-11e0-0000-1390d57dcfcb</xmp
MM:DocumentID><xmpMM:InstanceID>uuid:1a0e68ba-14ad-4a03-b7a1-0a0e127b8753</xmpMM:InstanceID></rdf:Desc
ription><rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:format>applicati
on/pdf</dc:format><dc:title><rdf:Alt><rdf:li xml:lang="x-default">I/O Subsystem Programming Guide</rdf
:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li>Unisys Information Development</rdf:li></rdf:Seq
></dc:creator><dc:description><rdf:Alt><rdf:li xml:lang="x-default">ClearPath MCP 13.1,Application Dev
elopment,Administration,ClearPath MCP</rdf:li></rdf:Alt></dc:description></rdf:Description></rdf:RDF><
/x:xmpmeta>
using the following code (powershell v3, in v2 you need to select and expand the properties thus [string]$title = ($rdf.GetElementsByTagName('dc:title')| Select -expand Alt|Select -expand li)."#text"):
$file = ".\Downloads\68698703-007\PDF\86000056-413.pdf"
#determine what line in file the xmpmeta string starts
[int]$startln = (select-string -pattern '^<x:' $file).ToString().Split(":")[2]
#determine what line in file the xmpmeta string ends
[int]$endln = (select-string -pattern '^</x:' $file).ToString().Split(":")[2]
$startln--
#grab the xmpmeta and cast as type xml
[xml]$xmp = (gc $file)["$startln".."$endln"]
[xml]$rdf = $xmp.xmpmeta.InnerXml
#get title/creator/description element text
[string]$title = $rdf.GetElementsByTagName('dc:title').Alt.li."#text"
[string]$creator = $rdf.GetElementsByTagName('dc:creator').Alt.li."#text"
[string]$description = $rdf.GetElementsByTagName('dc:description').Alt.li."#text"
That's crucial because the filenames are in the format 12345678-123.pdf, the actual title is in the metadata itself, as well as document category etc.
So, I can produce a list of documents (displaying their proper titles, not the real filename) and allow them to be launched, but I also want to be able to search in all the documents using PDX file, but it's by no means plaintext!
I guess I could use one of a number of tools out there to convert each PDF into text, search it, repeat for each document and then return results for each document.
But, it strikes me that Adobe Reader already does that, so can I either start AcroRd32.exe with switches that will start the search, with search terms I've passed in to the AcroRd32 program, or can I use Adobe Search.API from within Powershell?
Any ideas specifically on automating load of the .PDX in Adobe Reader and firing off the search, or using adobe's API in powershell?
EDIT:
I can now launch acrobat from command line and search (so could mimic this in powershell) but the search only works when searching a PDF, not a PDX catalog. Both bring up the search pane, but only in a PDF document does the search field get populated and the search executed.
C:\Program Files (x86)\Adobe\Reader 10.0\Reader>AcroRd32.exe /A "search=trim" "P:\Doc Library\PDF\00_home.pdx"
Or
C:\Program Files (x86)\Adobe\Reader 10.0\Reader>AcroRd32.exe /A "search=trim" "P:\Doc Library\PDF\86000056-413.pdf"
Regards,
Graham
This is an old post, but be aware that the searching you do is potentially dangerous and that there is a better way to find the XMP metadata in a PDF file. XMP was designed specifically to be "findable" by text search. For that purpose it has well-defined begin and end markers, which exist so that you can extract the XMP data without having to parse the PDF format (or any other format the XMP metadata blob might be embedded in).
You can download the XMP specification here: http://www.adobe.com/devnet/xmp.html. Part 1 covers XMP packets and explains how a text scanner can find the XMP packet more accurately.
Finally, PDF has an additional quirk: it can be incrementally updated. This might cause multiple XMP packets to appear in the file (where the last packet is normally the correct one). Annoyingly, when the PDF is exported from applications like InDesign, images (and other objects) in the PDF might also have their own "object" XMP attached.
So consider where your files come from, how many strange things you might encounter, and which of them you want to provide for. Reading the XMP specification is certainly not a bad idea.
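As a rough illustration of the packet-scanning approach, here is a minimal Python sketch (Python rather than PowerShell, purely for illustration). It relies on the packet wrapper from the XMP specification: a header processing instruction starting with <?xpacket begin= and a trailer starting with <?xpacket end=. It assumes the metadata is stored uncompressed and in UTF-8. The function names are made up, and taking the last packet is only a heuristic, since it could be object-level XMP as noted above.

```python
import re
import xml.etree.ElementTree as ET

# Packet wrapper: <?xpacket begin=...?> ... <?xpacket end=...?>
XPACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)

DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"
RDF_LI = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}li"


def find_xmp_packets(data):
    """Return the body of every XMP packet found in the raw bytes, in file order."""
    return [match.group(1) for match in XPACKET.finditer(data)]


def title_from_packet(packet):
    """Return the first dc:title entry of one packet body, or None if absent."""
    root = ET.fromstring(packet.strip())
    title = root.find(f".//{DC_TITLE}")
    li = title.find(f".//{RDF_LI}") if title is not None else None
    return li.text if li is not None else None
```

Typical use would be to read the PDF in binary mode, call find_xmp_packets on the bytes, and pass the last element of the result to title_from_packet.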

How to specify the name of the generated file for the e:worksheet function

We use the JBoss Seam Excel module integration to generate Excel sheets using e:worksheet. But the downloaded file name comes out as ExportUsers.jxl.xls; I would rather see it as ExportUsers.xls. How do I customize this?
Use the filename attribute of the e:workbook tag:
<e:workbook filename="ExportUsers.xls" />
Take a look at the Seam excel documentation.
filename — The filename to use for the download. The value is a string. Please note that if you map the DocumentServlet to some pattern, this file extension must also match.
