Reading password protected XLSX on linux (and windows) with Perl


I'm trying to write a simple Perl script that reads some fields from a password-protected XLSX file.
I've looked at Spreadsheet::XLSX and SimpleXlsx, but neither seems to support password-protected files.
Any idea how this can be done?
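On Windows with Excel installed, this can be done with Win32::OLE like so:
use Win32::OLE;
my $Excel = Win32::OLE->new( 'Excel.Application', 'Quit' );
my $Book =
$Excel->Workbooks->Open( { FileName => $file, Password => $password } );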

None of the current Perl XLSX-reading modules supports reading encrypted files.
It isn't straightforward to decrypt these files since the encrypted XML files are stored in an OLE container document as opposed to the usual ZIP container.
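The container difference described above can be checked from the first bytes of the file. Below is a minimal shell sketch, assuming the usual signatures (an OLE compound file starts with D0 CF 11 E0 A1 B1 1A E1, a ZIP archive with "PK" 03 04). The function name container_type is made up for illustration, and an OLE result only suggests encryption, since legacy .xls files use the same container.

```shell
#!/bin/sh
# container_type FILE: print "ole", "zip" or "unknown" based on the leading bytes.
# (Hypothetical helper; the signatures are the standard OLE and ZIP magic numbers.)
container_type() {
    # Hex-dump the first 8 bytes as one lowercase string without spaces.
    sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        d0cf11e0a1b11ae1) echo ole ;;     # OLE compound file: encrypted xlsx or legacy xls
        504b0304*)        echo zip ;;     # ZIP container: ordinary, unencrypted xlsx
        *)                echo unknown ;;
    esac
}
```

Calling container_type on a workbook before handing it to a parser gives an early hint that a ZIP-based reader will not be able to open it.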

This "should" be doable with OpenOffice/LibreOffice. There seem to be quite a few bugs around xlsx and encrypted file support, not to mention the combination, so I'd try opening the files in LibreOffice GUI first and if that works for your specific files, call it via library or command line.
OpenOffice::OODoc is the Perl connector. If that doesn't work, you can use the command line to convert to a non-password-protected file and then open it in your tool of choice.

Related

Check if Excel file saved in compatibility mode without using Excel

I have recently been experimenting with Perl and some modules to read Excel files, and in particular the format of their cells.
For example, I wrote a piece of Perl code that used the module ParseExcel to read a cell's background colour. However, while testing I noticed that for certain files the colour returned by my Perl program did not match the colour reported by Excel. Eventually I found the reason: the file I was reading was a .xls file saved in compatibility mode. Basically, the creator of the file had used the functionality of Excel .xlsx files (2007+) to colour some of the cells and then saved the file in the old .xls format, which did not support the colours chosen.
So my question: is there any way to tell whether a given .xls file (or any other old Excel file format) has been saved in compatibility mode without using Excel to find out? The reason I ask is that I am working in a Linux environment and can't use any Windows tools to analyse the files.
Furthermore, if one could identify that a given Excel file has indeed been saved in compatibility mode, is there any way of knowing how the original colours were mapped to the ones that my program is reporting?
Many thanks for any help on this.

I do not think that you can do this using Spreadsheet::ParseExcel. I tried taking an .xlsx file with a coloured cell and saving it as .xls with 2003 compatibility, then comparing it with an empty 2003 .xls file, and I do not see any difference between the parsed files.
You can try the following code on your own files to look for a difference that you could use:
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use Data::Dumper;
use JSON;
use Test::More tests => 1;
my $file_1 = 'test_xls.xls';
my $file_2 = 'compat_xls.xls';
my @files = (
$file_1,
$file_2,
);
my @workbooks;
foreach my $file (@files){
print("\n\nReading $file\n");
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse($file);
# print Dumper($workbook->{PkgStr});
# Drop the fields that always differ between files so only real differences remain
delete $workbook->{PkgStr};
delete $workbook->{File};
delete $workbook->{Worksheet}->[0]->{MinRow};
delete $workbook->{Worksheet}->[0]->{RowHeight};
delete $workbook->{Worksheet}->[0]->{_Pos};
delete $workbook->{Worksheet}->[0]->{MinCol};
delete $workbook->{Worksheet}->[0]->{MaxCol};
delete $workbook->{Worksheet}->[0]->{MaxRow};
delete $workbook->{Worksheet}->[0]->{Cells};
delete $workbook->{Format}->[62];
push @workbooks, $workbook;
}
my ($ok, $stack) = is_deeply($workbooks[0], $workbooks[1]);
my $diag = explain($stack);
print(Dumper($diag));

Finding and saving present directory of a SAS-Studio-program in a Linux server

I am trying to create a macro variable in SAS Studio that holds the "present working directory".
The SAS-program is run in a "CPF" process flow file in SAS Studio, and the whole SAS-file and processes are saved and run in a Linux server.
In SAS Studio, the CPF process flow file appears to be located in the directory /sasdata/model_v1, so when I run a Linux command like X "pwd" I expect the result to be /sasdata/model_v1. Instead I get another directory, /sasinstall/sasconfig/Lev1/SASApp. I guess the process flow file with the CPF suffix is run from this directory.
So the question is how I can find the working directory of my CPF file and save it as a macro variable. I may need the solution for my other SAS files too, not only for CPF files.
If I find the directory, then I guess it should be enough to save them as macro-variable by using %let macrovariable = "/directory"

I don't think SAS will show you the path of the process file. It doesn't in SAS/Studio 3.5.
It will set the path for a normal program file (as long as you have saved it) in the _SASPROGRAMFILE macro variable.

Writing PDF binary file from stream yields malformed PDF

Dear Stack Overflow users,
I would appreciate your kind help with the following problem:
We have an Apache server functioning as a forward proxy, with ext_filter configured: whenever the response is of MIME type PDF, the filter (a Perl script) is called, and the PDF's content can be read from STDIN. We read the PDF from STDIN and write it to a file, nothing more. This almost always works well, but for one specific website the PDF is malformed when written in the following way:
my $input_file = shift;
binmode STDIN;
open(OUT, ">" . $input_file);
binmode OUT;
foreach my $line (<STDIN>){
print OUT $line;
}
close OUT;
If we instead call 'tee' (set the filter to use 'tee'), the file is written correctly. Analyzing the malformed PDF shows that the xref table is malformed in the PDF we write, and Adobe Reader fails to open it. We have already tried sysopen, sysread etc., ":raw", and several other ways to write a binary file properly (cut-and-paste code from the documentation for writing binary files), and nothing worked. Only when the 'tee' utility in Linux was used as the filter was the file written correctly. This doesn't help us: we need to be able to write it to a file from STDIN as part of the Perl script. Any suggestions? If there were a way to call 'tee' with a system call and give it the STDIN of the Perl program, it might work. Many thanks in advance.

Well, although the code was basically correct, putting it inside "eval" somehow ruined the PDF.
I still don't understand why, but deleting the eval solved the problem.
The Perl script is called from the context of Apache's ext_filter module.
I'll investigate this further and update when I find an explanation.
Thanks, everyone.

using FileChooser to save a file with default filename

I want to save a file. I use this:
FileChooser fileChooser = new FileChooser();
File file = fileChooser.showSaveDialog(null);
But in the dialog I want to suggest a name for the file, so that the user only selects a directory for it. The name of the file is already known, so I want to suggest that filename.
Thank you.

This is now fixed in JavaFX 2.2.45 (bundled with Java 7u45), and you can do what the OP is asking for with the FileChooser method setInitialFileName, used as such:
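FileChooser myFile = new FileChooser();
myFile.setInitialFileName("Whatever_file_I_want.coolFile");
// Show the save dialog with the suggested name pre-filled
File file = myFile.showSaveDialog(null);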
Now, I don't think there is any way to STOP the user from choosing a different file name, but at least this will give them the default you want them to pick.

Providing an initial file name requires passing your string (the initial name) through a native call to the native file chooser. It is a complex thing, and you can look at these issues about its implementation:
http://javafx-jira.kenai.com/browse/RT-16111 (main one)
http://javafx-jira.kenai.com/browse/RT-24588
http://javafx-jira.kenai.com/browse/RT-24612
They all have fix version Lombard, so they are fixed in JDK 8.
So you can specify an initial file name starting from JDK 8 (you can get it by downloading a JDK early access build).
I tested this feature recently, and it works.
The method is setInitialFileName().
As I've mentioned, it is a complex thing, and you are not likely to be able to implement it yourself (unless you are able to build JavaFX).
So the options are to wait for the JDK 8 release, to use early access builds, or to use your own implementation of a file chooser.

Here's a workaround that worked for me:
You can use javafx.stage.DirectoryChooser to select a directory for the file you want to save, and then create the file in that directory with the default name and extension.
DirectoryChooser dc = new DirectoryChooser();
File file = dc.showDialog(null);
if (file != null) {
    // Build the target path from the chosen directory and the known file name
    file = new File(file, "defaultFilename.extension");
}

Proper way to differentiate pst and dbx files in bash shell

I want to identify the file format of the input file given to my shell script: whether it is a .pst or a .dbx file. I checked "How to check the extension of a filename in a bash script?". That one deals with txt files, and two methods are given there:
check if the extension is txt
check if the mime type is application/text etc.
I tried file -ib <filename> on a .pst and a .dbx file and it showed application/octet-stream for both. However, if I just do file <filename>, then I get
this for the dbx file -
file1.dbx: Microsoft Outlook Express DBX File Message database
and this for the pst file -
file2.pst: Microsoft Outlook binary email folder (Outlook >=2003)
So, my questions are -
Is it better to use MIME type detection every time when the input can be anything and we need a proper check?
How to apply mime type check in this case - both returning "application/octet-stream"?
Update
I didn't want to do extension-based detection because on a Unix system we can't be sure that a .dbx file truly is a dbx file. Since file <filename> returns a line that contains the correct information about the file (e.g. "Microsoft Outlook Express DBX File Message database"), the file command is evidently able to identify the file type properly. Then why does it not give the correct information with the file -ib <filename> command?
Will parsing the string output of file <filename> be fine? Is it advisable, assuming I only need to identify a narrow set of data storage files of the Outlook family (MS Outlook Express, MS Office Outlook 2003, 2007, 2010 etc.)? A small text identifier like application/dbx which could be compared would be all I need.

The file command relies on having a file type detection database which includes rules for the file types that you expect to encounter. It may not be possible to recognize these file types if the file content doesn't have a unique code near the beginning of the file.
Note that the -i option to emit mime types actually uses a separate "magic" numbers file to recognize file types rather than translating long descriptions to file types. It is quite possible for these two databases to be out of sync. If your application really needs to recognize these two file types I suggest that you look at the Linux source code for "file" to see how they recognize them and then code this recognition algorithm right into your app.
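Coding the recognition directly could look like the following shell sketch. It assumes the signatures that the "file" magic database is commonly documented to use for these formats: "!BDN" (21 42 44 4E) at the start of a PST file and CF AD 12 FE at the start of a DBX file. Verify these against the magic database on your own system. The function name mail_store_type is made up for illustration.

```shell
#!/bin/sh
# mail_store_type FILE: print "pst", "dbx" or "unknown" from the first 4 bytes.
# (Hypothetical helper; the signatures are assumptions, check them against your magic file.)
mail_store_type() {
    # Hex-dump the first 4 bytes as one lowercase string without spaces.
    sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        2142444e) echo pst ;;      # "!BDN": Outlook personal folder file
        cfad12fe) echo dbx ;;      # Outlook Express message database
        *)        echo unknown ;;
    esac
}
```

A script can then branch on a short identifier, for example `[ "$(mail_store_type "$1")" = dbx ]`, instead of comparing the long description strings.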
If you want to do the equivalent of DOS file type detection, then strip the extension off the filename (everything after the last period) and look up that string in your own table where you define the types that you need.
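The extension lookup can be sketched in shell as follows. The function name ext_type and the two table entries are only illustrative; extend the case table with the types you need.

```shell
#!/bin/sh
# ext_type FILENAME: map the text after the last period to a type label.
# (Hypothetical helper; the table entries are examples only.)
ext_type() {
    name=${1##*/}                       # drop any leading directory part
    case "$name" in
        *.*) ext=${name##*.} ;;         # everything after the last period
        *)   ext= ;;                    # no period: no extension
    esac
    # Compare case-insensitively, as DOS would.
    ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        pst) echo "Outlook PST" ;;
        dbx) echo "Outlook Express DBX" ;;
        *)   echo unknown ;;
    esac
}
```

This only trusts the name, so a renamed file will be misreported, which is the limitation the question already points out.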
