Perl reading XLSX embeded documents - linux

First post here so if I am doing something stupid, please let me know.
CentOS 7(3.10.0-123.6.3.el7.x86_64), perl5 (revision 5 version 16 subversion 3), Spreadsheet::ParseXLSX v0.16
I have an .xlsx that has some embedded .pdf's and another xlsx in embedded within it. I am trying to read these and insert them into a csv along with the original xlsx. After some googl'ing to no avail my thought was to unzip the original xlsx and read the files from the xl/embeddings directory but, alas there is something with the file that causes adobe reader to not be able to read the oleObject(X).bin files in that dir.
I have been able to successfully read all the worksheets that dio not contain embedded docs using Spreadsheet::ParseXLSX, works great. I have googled for a solution but am either not using correct search prams or ...
If you know how to do this can you point me to some instructions?
TIA,
JohnM

Related

Error while loading excel file into pandas : xlrd.biffh.XLRDError: Workbook is encrypted

I am downloading a file using python and read it. But while reading the xls file it throws xlrd.biffh.XLRDError: Workbook is encrypted I am able to open the file manually but not in python
My code
df = pd.read_excel(filepath)
Can somebody help me on this! . I have researched a lot to crack this issue but None worked.
XLRD is not capable of handling file with encryption on it's own, but there is another Python library that actually can unencrypt a lot of MS Office files including the one you are trying to read .xls. It's called msoffcrypto-tool

Talend 7.1 tFileOutputExcel corrupt file

I'm trying to output an excel file from Talend 7.1. I've tried a few different setups and both xls and xlsx formats but they all result in the output file being corrupt and not being able to open it.
What am I doing wrong? I am loading an xlsx file into a database and this part works fine but outputting to excel I just can't figure it out! I was writing from the tMap directly to the tFileOutputExcel and it wasn't working (corrupt) so I changed it to write to a csv file first and then write that csv to the tFileOutputExcel but it is still corrupt.
This is my job detail:
And this is the settings in the tFileOutputExcel
I got this working by changing the transfer mode in the FTP component from 'ascii' to 'binary'. Such a simple thing but if this helps anyone else with this issue who is a newb like me :)

Can't get to exif data .JPG image

I'm trying to read the exif data from a .JPG image. I've tried differents solutions found here and there (PIL, piexif, exifread...) and none of them worked for this set of images. It worked for other images taken from another camera but not for this one, all these different methods returning empty dictionaries. It seems that there is no exif data but (I apologies for my newbyness) when I RIGHT-click + properties (I use windows), I do see what is exif data to me : date of creation, etc...
Here is one image :
image.JPG
If another of the thousands of anonymous heroes could help me on this one, I would be very grateful...
Alright so I found a solution which I share now.
The problem is that the libraries that open metadata are not taking all possible configurations for the image file and therefore, they can handle some and some others they cannot. I finally made it using exiftool, an executable that I dowloaded on my windows on this link :
https://sno.phy.queensu.ca/~phil/exiftool/
Then I paste the executable in a folder and I add exiftool.py in that folder, that I got from :
https://github.com/smarnach/pyexiftool/find/master
Then, using this small piece of code (for example):
import exiftool
with exiftool.ExifTool("exiftool.exe") as et:
metadata = et.get_metadata_batch(files)
for d in metadata:
print("{:20.20} {:20.20}".format(d["SourceFile"],
d["File:FileCreateDate"]))
Of course, this is just to show that you indeed can access the metadata, then you can do whatever you want with that. Here is the documentation of the library exiftool : http://smarnach.github.io/pyexiftool/
Cheers, JM

What .xlsx file format is this?

Using an existing SSIS package, I was trying to import .xlsx files we received from a client. I received the error message:
External table is not in the expected format
These files will open in XL
When I use XL (currently XL2010) to Save As... the file without making any changes:
The new file imports just fine
The new file is 330% the size of the original file
When changing .xlsx to .zip and investigating the contents with WinZip:
The original file only has 4 .xml files and a _rels folder (with 2 .rels files):
The new file has the expected .xlsx contents:
Does anyone know what kind of file this could be?
It would be nice to develop my SSIS package to work with these original files, without having to open and re-save each file. There are only 12 files, so if there are no other options, opening/saving each file is not that big of deal...and I could automate it with VBA going forward.
Thanks for any help anyone can provide,
CTB
There are many Excel file formats.
The file you are trying to import may have another excel format but the extension is changed to .xlsx (it could be edited by someone else) , or it could be created with a different Excel version.
There is a Third-Part application called TridNet File Identifier which is an utility designed to identify file types from their binary signatures. you can use it to specify the real extension of the specified file.
Also after a simple search on External table is not in the expected format this error is thrown when the definition (or version) of the excel files supported in the connection string is different from the file selected. Check the connection string used in the excel connection manager. It might help to identify the version of the file.

How to convert Paradox to Excel

I am trying to work with Paradox files and convert these to an Excel file.
Does anyone know how to achieve such conversion?
I wrote a small Python script to read Paradox .DB files.
But please be careful, it's not complete: some field types may not be converted (only memos AFAIK, but I'm not a Paradox expert).
https://gist.github.com/BertrandBordage/9892556
You can either read a .DB file as Python objects using paradox.read('your_file.DB') or convert it to a CSV file using paradox.to_csv('your_file.DB').

Resources