How to have Autohotkey write metadata, then extract that metadata for use in renaming the file

I'm automating a process with AutoHotkey: I rename old.xlsx to document_archived_on_%Timestring%, then rename current.xlsx to old.xlsx, then rename newest_document.xlsx to current.xlsx.
That part was straightforward and works fine.
I want to add a metadata comment to newest_document.xlsx that says "data_as_of_%Timestring%". Later, I want to rename old.xlsx to "document_%metadata%.xlsx".
Here is the simple, working script:
; Take newest report, rename it to current. Take Current Report and move it to Old
; Take oldest report and archive it.
; Archive old report
FormatTime, Timestring, , yyyyMMdd
FileMove, G:\TPO_Project_DB\Old Data\old.xlsx, G:\TPO_Project_DB\Old Data\Eng_const_rpt_data_as_of_%Timestring%.xlsx
; Rename and move "current.xlsx" to "old.xlsx"
FileMove, G:\TPO_Project_DB\Current Data\current.xlsx, G:\TPO_Project_DB\Old Data\old.xlsx
; Rename and move newest report to current
FileMove, C:\TPOReports\Combined_eng_const_*, G:\TPO_Project_DB\Current Data\current.xlsx
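For the "extract that metadata and rename" half of the goal, here is a minimal sketch in Python rather than AutoHotkey (an illustration, not something from the original thread). It assumes that an .xlsx file is a ZIP package whose core properties live in `docProps/core.xml`, and that Excel's "Comments" property is stored there as `dc:description`; the function names `read_comments` and `archive_name` are hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

CORE_PART = "docProps/core.xml"  # OOXML core properties part
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_comments(xlsx_path):
    """Return the dc:description text (assumed to be Excel's "Comments"), or None."""
    with zipfile.ZipFile(xlsx_path) as zf:
        if CORE_PART not in zf.namelist():
            return None
        root = ET.fromstring(zf.read(CORE_PART))
    node = root.find(f"{{{DC_NS}}}description")
    return node.text if node is not None else None


def archive_name(xlsx_path, fallback="unknown"):
    """Build 'document_<comments>.xlsx' in the same folder as xlsx_path."""
    comment = (read_comments(xlsx_path) or "").strip() or fallback
    # Replace characters that Windows does not allow in file names.
    safe = re.sub(r'[<>:"/\\|?*]', "_", comment)
    return Path(xlsx_path).with_name(f"document_{safe}.xlsx")
```

For example, `old = Path(r"G:\TPO_Project_DB\Old Data\old.xlsx")` followed by `old.rename(archive_name(old))`. On Windows, `Path.rename` raises an error if a file with the target name already exists.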

I found plenty of information about reading the metadata properties, but very little about writing to them. I didn't find a complete solution (i.e. working code), but there are at least two possible approaches.
First, here are some reference links for reading the metadata properties. If the links go dead, searching for "FGP - FileGetProperties" should turn up similar results.
How to Access a file's metadata
[Function] FGP - FileGetProperties
One method of writing to Excel's metadata properties is to use COM. This probably isn't ideal, since it involves opening the file, writing the property, and then saving, which can be slow. For example, the code below (which writes to the "Comments" property) took 4.2 s to execute, but only 0.4 s once the Excel instance had been created, so you may want to create the application object once and reuse it until you've changed all the Excel files you need.
f1::
sFilePath := A_Desktop . "\test.xls"
oExcel := ComObjCreate( "Excel.Application" )
oExcel.Workbooks.Open( sFilePath )
oExcel.ActiveWorkbook.BuiltinDocumentProperties( "Comments" ).Value := "Test text"
oExcel.ActiveWorkbook.Save()
oExcel.Quit()
Return
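If starting Excel is too slow, the property can in principle be written without Excel at all. The following Python sketch (an alternative under stated assumptions, not the COM approach above) rewrites `docProps/core.xml` inside the .xlsx ZIP package. It assumes that part exists (Excel normally writes it) and that "Comments" maps to `dc:description`; the name `write_comments` is hypothetical, and the workbook must not be open in Excel while it runs.

```python
import os
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

CORE_PART = "docProps/core.xml"
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
# Keep the familiar prefixes (cp:, dc:, ...) when ElementTree re-serializes core.xml.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def write_comments(xlsx_path, text):
    """Set dc:description (assumed to be Excel's "Comments") in an .xlsx file."""
    xlsx_path = Path(xlsx_path)
    tmp_path = xlsx_path.with_name(xlsx_path.name + ".tmp")
    with zipfile.ZipFile(xlsx_path) as src:
        root = ET.fromstring(src.read(CORE_PART))
        desc = root.find("dc:description", NS)
        if desc is None:  # property never set: create the element
            desc = ET.SubElement(root, f"{{{NS['dc']}}}description")
        desc.text = text
        new_core = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

        # ZIP entries cannot be edited in place: copy every entry to a new
        # archive, substituting the edited core.xml.
        with zipfile.ZipFile(tmp_path, "w") as dst:
            for item in src.infolist():
                data = new_core if item.filename == CORE_PART else src.read(item.filename)
                info = zipfile.ZipInfo(item.filename, date_time=item.date_time)
                info.compress_type = item.compress_type
                dst.writestr(info, data)
    # Swap the rebuilt archive in (os.replace also overwrites on Windows).
    os.replace(tmp_path, xlsx_path)
```

For example, `write_comments("newest_document.xlsx", "data_as_of_20240101")` before the rename step. Unlike the COM route, this does change the file's "Date modified" value, because the whole file is rewritten.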
The other method uses DSOFile.dll, which can be found here: https://support.microsoft.com/en-us/help/224351/the-dsofile-dll-files-lets-you-edit-office-document-properties-when-yo
I know nothing about it or how to use it, but it was designed for manipulating Office document properties and is likely much quicker than opening each file as in my code snippet above. It might also leave the file's "Date modified" value unchanged.

Related

XML parsing and update with Excel VBA

I'm trying to make an XML parser/updater through Excel VBA.
First of all, I have been going back and forth between Excel VBA and Python, but Excel VBA seemed like the better option to me.
However, I am open to any method, so please let me know if anyone has a suggestion that would work better.
Here is what I want this application to do:
Parse the XML and record the information in Excel format
I need the name and value of each attribute, along with the text value of each node
After getting the information into Excel, I want to be able to revise values and write them back out in XML format
So, in a nutshell, I'm aiming for an XML editor, I guess?
But I am stuck on a few issues right from the start.
Here's a brief implementation of the XML parsing portion:
'load xml document
Set xmlDoc = CreateObject("MSXML2.DOMDocument.6.0")
xmlDoc.async = False
xmlDoc.validateOnParse = False
If Not xmlDoc.Load(xmlFilepath) Then
    'Load returns False for input that is not well-formed; parseError says why
    Debug.Print "Load failed: " & xmlDoc.parseError.reason
Else
    'get document elements
    Set xmlDocElement = xmlDoc.DocumentElement
    Debug.Print xmlDocElement.xml
    For i = 0 To xmlDocElement.ChildNodes.Length - 1
        Set node = xmlDocElement.ChildNodes(i)
        Debug.Print node.xml
        'only element nodes (nodeType 1) have an Attributes collection
        If node.nodeType = 1 Then
            For j = 0 To node.Attributes.Length - 1
                Debug.Print node.Attributes.Item(j).Name
                Debug.Print node.Attributes.Item(j).Value
            Next j
        End If
        Debug.Print node.Text
    Next i
End If
The above method works more or less well, except in two cases (so far, at least):
The XML file cannot be loaded if the text includes &, > or <.
The XML file cannot be loaded if it includes more than one top-level parent node.
Sample with &, > or < in the text:
<parenttag>
<childtag>I love mac&cheese</childtag>
</parenttag>
The answer I found online was quite conclusive:
Revise the text so that it does not use &/>/<.
But I cannot modify the text and need to keep the current format.
Any way to bypass this?
Sample with more than one top-level parent node:
<parenttag>
<childtag>Text</childtag>
</parenttag>
<differenttag>
<childtag>Some other text</childtag>
</differenttag>
Load does not work when one XML file has multiple top-level parent tags.
And again, I cannot modify the XML file content, so I need a way around the load error.
I should also note that I initially started this project by reading the XML file as text and processing it line by line. But this did not work well with multi-line content, so I'm now trying to figure out a way to process the XML file properly.
This question really has multiple parts, but I would really appreciate any help.

The issue is that any XML parser will only accept valid XML, and
<childtag>I love mac&cheese</childtag>
is just not valid XML. It should be encoded as
<childtag>I love mac&amp;cheese</childtag>
So that is what you need to fix. You can only work with a standard (like the XML standard) if everyone follows its rules and produces valid XML. Otherwise your file might look like XML, but it is not XML (until it is valid).
Also, multiple root elements are not allowed in XML: a file with multiple roots is not XML. So the only way out of your issue is to fix those problems before loading the file into a parser. For example, you can add a root tag so that your multiple parents become children of that root:
<myroot>
<parenttag>
<childtag>Text</childtag>
</parenttag>
<differenttag>
<childtag>Some other text</childtag>
</differenttag>
</myroot>
And any & characters that are not encoded yet need to be changed to &amp; to make them valid.
The only other option is to write your own parser for these custom files, which are not XML. But that will not be possible in two lines of code, as you would need to develop a parser for your non-XML files.
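Both fixes from this answer (escaping bare & and adding a wrapper root) can be applied as a text preprocessing step before parsing. Below is a minimal Python sketch of that idea, not the asker's VBA; the names `load_lenient`, `rows` and `to_text` are hypothetical. Note that a literal `>` in text content is already legal XML (apart from the sequence `]]>`), so only `&` and `<` actually cause load errors. A bare `<` cannot be repaired this way because it is indistinguishable from the start of a tag, and a bare `&` that happens to look like an entity reference (e.g. `&foo;`) is left alone and will still fail.

```python
import re
import xml.etree.ElementTree as ET

# '&' that does NOT start an entity or character reference (&amp; &#38; &#x26; ...)
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")
# An XML declaration is only allowed at the very start, so it must go before wrapping.
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def load_lenient(text, wrapper="myroot"):
    """Escape bare '&' and wrap all top-level elements in one root element."""
    body = XML_DECL.sub("", text, count=1)
    body = BARE_AMP.sub("&amp;", body)
    return ET.fromstring(f"<{wrapper}>{body}</{wrapper}>")


def rows(root):
    """Yield (tag, attributes, text) for every element below the wrapper root."""
    for top in root:
        for el in top.iter():
            yield el.tag, dict(el.attrib), (el.text or "").strip()


def to_text(root):
    """Serialize the original top-level elements again, without the wrapper."""
    parts = [root.text or ""]
    # tostring() also writes each element's tail, i.e. the whitespace between elements
    parts += [ET.tostring(child, encoding="unicode") for child in root]
    return "".join(parts)
```

When the data is written back, ElementTree escapes `&` as `&amp;` again, so the output is valid XML rather than a byte-for-byte copy of the input.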

Why is perl saving two copies of Excel spreadsheet?

This is similar to "A copy of Excel Addin is created in My Documents after saving", except that I'm working with Perl instead of VBA, with .xls files instead of .xlsm, and the negative impact of the behavior is different.
I've inherited a Perl script (Perl 5.8.8) that is running on Windows 2003 Server as SYSTEM. After copying an Excel 2003 template file to a unique, fully defined path location, it opens the unique file in Excel using OLE, edits the file, saves the file, and closes the file. What results is the edited file being saved both in the correct, fully-defined path location, and also in the Default User profile's Documents folder.
This causes thousands of these files to accumulate on the C: drive, since every newly hired admin gets a copy in their Documents folder.
Adding the code that sets the value of $OUT:
if (!$db->Sql("EXEC GetDetails 'name'"))
{
    while ($db->FetchRow() > 0)
    {
        @DataIn = $db->Data();
        $name = $DataIn[0];
        $IN   = $DataIn[1];
        $OUT  = $DataIn[2];
        opendir(DIR, "$OUT") || die "$OUT directory does not exist $!\n";
        #... loop of proprietary code
        #...
        @Completed = $db1->Data();
        #...
        &formatExcelReport #The code that I previously posted
        #...
        # more proprietary code
        # end of loop
    } #end of while
} #end of if
The code I originally posted:
# Initialize Excel object
eval {Win32::OLE->new('Excel.Application', 'Quit')};
eval {$Excel = Win32::OLE->GetActiveObject('Excel.Application')};
unless (defined $Excel)
{
$Excel = Win32::OLE->GetActiveObject('Excel.Application')
|| Win32::OLE->new('Excel.Application', 'Quit');
}
$infiles = "Report_Template.xls";
$infiles = $OUT."/".$infiles;
$db6->Sql("EXEC FormatResults '".$Completed[0]."','".$Completed[1]."'");
$row = 2;
$fileName = $Completed[0]."_".$Completed[1];
$uniquefile = $fileName.$printdate.".xls";
# $OUT is a fully defined path on the E: drive
$reportfile = "$OUT"."\\".$uniquefile;
copy($infiles,$reportfile);
$Book = $Excel->Workbooks->Open("$reportfile");
$sheetnum = 1;
my $Sheet = $Book->Worksheets($sheetnum);
# Set Headers
$Header = $Sheet->PageSetup->{'CenterHeader'};
$Header = $Header." Results Test Code: ".$Completed[0]." Worksheet: ".$Completed[1]." Date: ".$headerdate;
$Sheet->PageSetup->{'CenterHeader'}= $Header;
# More file editing
# ...
$Book->Save();
$Book->Close(0);
Win32::OLE->new('Excel.Application', 'Quit');
Is the root of this problem the Save() command? Should I be using SaveAs() instead?
Any other feedback about how Excel is being used is welcome as well.
Thanks!

I don't see what causes this behavior, but here are a few things to try.
The template and the file it is copied to have their names built with different path separators:
$infiles = $OUT."/".$infiles;
$reportfile = "$OUT"."\\".$uniquefile;
Use the same separator for both.
Try to suppress any setting that might dictate that another copy be made. Perhaps
$Excel->Application->{CreateBackup} = 0;
However, this may not be the correct property; search the VB or Excel documentation for properties that may result in Excel saving an extra copy. (It needn't be "backup".)
As a test, try creating a new file and using SaveAs, to see whether you get two files again. The template copying may be what triggers the extra copy (even though I don't see how). I'd say it's either that or some general setting that needs to be turned off.
The rest is my original answer about using SaveAs, written when I thought that a new file was being created.
You would use SaveAs to write a new file. See SaveAs in the MSDN library:
Saves changes to the workbook in a different file.
Using the Save method may result in two files being saved for some reason, as noted in the answer by Borodin. This page also advises using SaveAs for a new file:
The first time you save a workbook, use the SaveAs method to specify a name for the file.
Once you switch to SaveAs there should be a confirmation dialog to deal with. If you want to suppress it, you can set a property with one (or either?) of
$Excel->Application->{DisplayAlerts} = 0;
# or
$Excel->{DisplayAlerts} = 0;
For a number of options, including backups, see the chapter on OLE automation in Perl in a Nutshell.
A note on some other resources: there is a cookbook of sorts in this post on perlmonks, and a listing of various operations is given in this SO post.
Finally, I don't know how deep your reasons for using OLE go, but if it is only about writing some Excel files there are other modules, for example the very well regarded Spreadsheet::WriteExcel and Excel::Writer::XLSX.

That's very strange Perl code. Using eval without checking $@ afterwards is just wrong; you need to know whether a step of your code has failed for the following steps to make sense.
It looks like the problem is in your call to copy($infiles, $reportfile). That will save one copy of the file, while $Book->Save and $Book->Close will save another.

Output other than .txt

I'm looking to build a simple program that will modify existing output files from another program, so I don't have to open that program and enter a bunch of data the long way. The program is very specific to my domain, and its files have the extension .wcc. However, when I change the extension of one of these output files to .txt, I get half gibberish:
ÿÿ WPointÿÿ WPolygonÿÿ  WQuadrilateralÿÿ  WMemberDataÿÿ
WLoadÿÿ WLStandardMembersÿÿ WLSavedDesignSettingsÿÿ WLSavedFormatSettingsÿÿ  WLSavedViewSettingsÿÿ WLSavedProjectSettingsÿÿ  WLSavedSettingsÿÿ  WLSavedLoadSettingsÿÿ WLSavedDefaultSettingsÿÿ WLineÿÿ WProductÿÿ WBeamDataÿÿ  WColumnDataÿÿ
WJoistDataÿÿ
WWallStudDataÿÿ WSupportingMemberDataÿÿ WSavedAnalysisSettingsÿÿ WSavedGravityDesignSettingsÿÿ WSavedPreferencesSettingsÿÿ WNotchÿÿ WIJoistÿÿ WFloorCWC37 ÀAE LumberS-P-F No.1/No.2 # À# lumwall.cww ÿÿÿÿ1.2.3.1.Mur_1_EX-D ÿÿÿÿÿÿ B Cÿÿ B C €? 4C 4C   Neige #F #F ÈC ÿÿÿ
WLStandardMembersÿÿ "
There are also musical notes and perpendicular signs which I can't copy and paste here. I can sort of read the text, but not well enough to make modifications via a .txt file. What type of file could this be? Is it even possible to do what I'm trying to do? Thanks!

I am surprised that you are trying to open a .wcc file as a text file (its contents, as you have seen, don't lend themselves to being converted to that file type); however, the attempt to open the file as a .txt file seems to be specific to your domain.
I noticed part of your question is as follows: "What type of file could this be?"
You are right in thinking that the .wcc file is a rather obscure file type - we don't think about that file type a lot (or are not conscious of it existing). A .wcc file is a WinCam 2000 Cache file that allows WinCam 2000 movies to be previewed in the slide browser - these were often generated by older WinCam 2000 screen recording and editing programs.
Again, the file extension is very rare these days (a Google search only returns ~700 results). But, it appears you have a program that is producing the file, which - as you are saying - "is quite specific to your domain". You may be out of luck with regard to opening them for modification purposes.
Supposedly, you can convert .wac files to .wav files, which are much more relevant to today's technology (and definitely alterable from code); however, without knowing the purpose of the file, e.g. what you are trying to do with it domain-side, I can't say whether this will suit your needs.
Also, the comments above are correct: changing a file extension does not convert the file to that type. Typically, a converter program is needed to convert files.
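Renaming the file only changes which program opens it; the bytes stay the same. To see what readable text a binary file like this contains, and what its first bytes (often a format signature) look like, a `strings`-style dump is usually more useful than a text editor. Below is a minimal Python sketch of that (not from the original answers; `extract_strings` and `header_hex` are hypothetical names). For reference, the repeated `ÿÿ` in the question is how the byte pair 0xFF 0xFF is displayed when decoded as Windows-1252.

```python
import re


def extract_strings(data, min_len=4):
    """Return (offset, text) pairs for runs of printable ASCII, like Unix `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


def header_hex(data, n=16):
    """Hex dump of the first n bytes; file formats often start with a fixed signature."""
    return " ".join(f"{b:02x}" for b in data[:n])
```

Usage: read the file with `open(path, "rb")`, print `header_hex(data)`, then print each offset and string from `extract_strings(data)`. The offsets show where each readable name sits among the binary fields, which is a first step toward understanding (not yet safely editing) the format.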

How can I search in PDF documents/PDX catalog in powershell

I have a vendor that supplies their documentation library as a series of PDF files (and some CHM files) and also includes a .PDX catalog.
I want to write a PowerShell script to front-end it (using either PowerShell forms or hosting PowerShell in ASP.NET).
I'm in the early stages, I've worked out how to get document information from the PDF stream (the xmpmeta XML metadata block near the end of the PDF file - one of the few streams in the file that's in plaintext) which looks like this:
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.372728, 2009/01/18-15:08:04">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
      <pdf:Producer>GPL Ghostscript 8.64</pdf:Producer>
      <pdf:Keywords>86000056-413</pdf:Keywords>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <xmp:ModifyDate>2011-03-03T17:38:34-05:00</xmp:ModifyDate>
      <xmp:CreateDate>2011-01-28T23:12:07+05:30</xmp:CreateDate>
      <xmp:CreatorTool>PScript5.dll Version 5.2</xmp:CreatorTool>
      <xmp:MetadataDate>2011-03-03T17:38:34-05:00</xmp:MetadataDate>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
      <xmpMM:DocumentID>6cb2263d-2d61-11e0-0000-1390d57dcfcb</xmpMM:DocumentID>
      <xmpMM:InstanceID>uuid:1a0e68ba-14ad-4a03-b7a1-0a0e127b8753</xmpMM:InstanceID>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:format>application/pdf</dc:format>
      <dc:title><rdf:Alt><rdf:li xml:lang="x-default">I/O Subsystem Programming Guide</rdf:li></rdf:Alt></dc:title>
      <dc:creator><rdf:Seq><rdf:li>Unisys Information Development</rdf:li></rdf:Seq></dc:creator>
      <dc:description><rdf:Alt><rdf:li xml:lang="x-default">ClearPath MCP 13.1,Application Development,Administration,ClearPath MCP</rdf:li></rdf:Alt></dc:description>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
using the following code (PowerShell v3; in v2 you need to select and expand the properties, thus: [string]$title = ($rdf.GetElementsByTagName('dc:title')| Select -expand Alt|Select -expand li)."#text"):
$file = ".\Downloads\68698703-007\PDF\86000056-413.pdf"
#determine what line in file the xmpmeta string starts (LineNumber is 1-based)
[int]$startln = (Select-String -Pattern '^<x:' -Path $file | Select-Object -First 1).LineNumber
#determine what line in file the xmpmeta string ends
[int]$endln = (Select-String -Pattern '^</x:' -Path $file | Select-Object -First 1).LineNumber
#grab the xmpmeta lines (array indexes are 0-based) and cast as type xml
[xml]$xmp = (Get-Content $file)[($startln - 1)..($endln - 1)]
[xml]$rdf = $xmp.xmpmeta.InnerXml
#get title/creator/description element text
[string]$title = $rdf.GetElementsByTagName('dc:title').Alt.li."#text"
#dc:creator holds an rdf:Seq, and its li has no attributes, so it comes back as a plain string
[string]$creator = $rdf.GetElementsByTagName('dc:creator').Seq.li
[string]$description = $rdf.GetElementsByTagName('dc:description').Alt.li."#text"
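For comparison, here is a namespace-aware sketch of the same extraction in Python (an illustration only; `dc_values` is a hypothetical name). It also shows why `dc:creator` needs separate handling: in the sample above, `dc:title` and `dc:description` contain an `rdf:Alt`, while `dc:creator` contains an `rdf:Seq`, so the sketch collects every `rdf:li` below each property regardless of the container type.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def dc_values(xmp_xml):
    """Return {'title': [...], 'creator': [...], 'description': [...]} from XMP text."""
    root = ET.fromstring(xmp_xml)
    result = {}
    for name in ("title", "creator", "description"):
        # .//dc:<name>//rdf:li matches items in rdf:Alt, rdf:Seq and rdf:Bag alike
        result[name] = [li.text or "" for li in root.iterfind(f".//dc:{name}//rdf:li", NS)]
    return result
```

A property that is missing from the packet simply yields an empty list, which makes it easy to fall back to the real file name when a PDF has no title.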
That's crucial because the filenames are in the format 12345678-123.pdf; the actual title, as well as the document category etc., is only in the metadata.
So I can produce a list of documents (displaying their proper titles, not the real filenames) and allow them to be launched, but I also want to be able to search all the documents using the PDX file, and that file is by no means plaintext!
I guess I could use one of a number of tools out there to convert each PDF into text, search it, repeat for each document and then return results for each document.
But it strikes me that Adobe Reader already does that, so can I either start AcroRd32.exe with switches that start the search using search terms I pass in, or use the Adobe Search.API from within PowerShell?
Any ideas specifically on automating load of the .PDX in Adobe Reader and firing off the search, or using adobe's API in powershell?
EDIT:
I can now launch acrobat from command line and search (so could mimic this in powershell) but the search only works when searching a PDF, not a PDX catalog. Both bring up the search pane, but only in a PDF document does the search field get populated and the search executed.
C:\Program Files (x86)\Adobe\Reader 10.0\Reader>AcroRd32.exe /A "search=trim" "P:\Doc Library\PDF\00_home.pdx"
Or
C:\Program Files (x86)\Adobe\Reader 10.0\Reader>AcroRd32.exe /A "search=trim" "P:\Doc Library\PDF\86000056-413.pdf"
Regards,
Graham

This is an old post, but be aware that the searching you do is potentially unreliable, and that there is a better way to find the XMP metadata in a PDF file. XMP was designed specifically to be "findable" by text search. To that end, it has well-defined begin and end markers, so that you can extract the XMP data without having to parse the PDF format (or any other format the XMP metadata blob might be embedded in).
You can download the XMP specification here: http://www.adobe.com/devnet/xmp.html. Part 1, in the section on XMP Packets, explains how a text scanner can find the XMP packet more accurately.
Finally, PDF has an additional quirk: it can be incrementally updated. This might cause multiple XMP packets to appear in the file (where the last packet is normally the correct one). More annoyingly, when a PDF is exported from applications like InDesign, images (and other objects) in the PDF might also have their own "object" XMP attached to them.
So consider where your files come from, how many strange things you might encounter, and which of them you want to provision for. Reading the XMP specification is certainly not a bad idea, though.
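A minimal Python sketch of the packet-wrapper approach described in this answer (not from the original post; `xmp_packets` and `last_xmp_packet` are hypothetical names). It scans the raw bytes for the `<?xpacket begin=...?>` header and the `<?xpacket end="w"?>` (or `end="r"`) trailer, and treats the last packet as the document-level one, which, as noted above, is only a heuristic when images carry their own XMP. Only packets stored as uncompressed bytes can be found this way.

```python
import re

# XMP packet wrapper: header PI, serialized XMP, trailer PI with end="w" or end="r".
PACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=['\"][rw]['\"]\s*\?>", re.S
)


def xmp_packets(data):
    """Return the payload (bytes) of every XMP packet in the data, in file order."""
    return [m.group(1) for m in PACKET.finditer(data)]


def last_xmp_packet(path):
    """Return the last packet in the file (often the current one), or None."""
    with open(path, "rb") as f:
        packets = xmp_packets(f.read())
    return packets[-1] if packets else None
```

The returned payload starts at `<x:xmpmeta` (possibly after whitespace) and can be decoded and handed to any XML parser, without relying on the packet starting at the beginning of a line.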

Not using colnames when reading .xls files with RODBC

I have another puzzling problem.
I need to read .xls files with RODBC. Basically I need a matrix of all the cells in one sheet, and then I use grep, strsplit, etc. to get the data out. As each sheet contains multiple tables in different orders, with some text fields containing other options in between, I need something that functions like readLines(), but for Excel sheets. I believe RODBC is the best way to do that.
The core of my code is the following function:
.read.info.default <- function(file, sheet) {
  fc <- odbcConnectExcel(file) # file connection
  tryCatch({
    x <- sqlFetch(fc,
                  sqtable = sheet,
                  as.is = TRUE,
                  colnames = FALSE,
                  rownames = FALSE
    )
  },
  error = function(e) {stop(e)},
  finally = close(fc)
  )
  return(x)
}
Yet, whatever I tried, it always takes the first row of the mentioned sheet as the variable names of the returned data frame. No clue how to get that solved. According to the documentation, colnames=FALSE should prevent that.
I'd like to avoid the xlsReadWrite package. Edit: and the gdata package. The client doesn't have Perl on the system and won't install it.
Edit:
I gave up and went with read.xls() from the xlsReadWrite package. Apart from the name problem, it turned out RODBC can't really read cells with special characters like slashes: a date in the format "dd/mm/yyyy" just gave NA.
Looking at the source code of sqlFetch, sqlQuery and sqlGetResults, I realized the problem is more than likely in the drivers. Somehow the first line of the sheet is treated as a column feature instead of ordinary cells. So instead of column names, they're equivalent to DB field names, and that's an option you can't set...

Can you use the Perl-based solution in gdata instead? That happens to be portable too...
