Identifying Different Excel File Formats - excel

Is anyone familiar with a library or tool that can determine which format an excel file is in? Or, failing that, documentation on the different formats that would allow me to write my own?

The Excel file format is called the Binary Interchange File Format (BIFF). Several versions of Excel use the same version of BIFF.
See the OpenOffice documentation of the Excel file format.
Also take a look at the OpenOffice API; this should help you.

Excel 97-2003 workbooks are known as BIFF8. They are actually OLE Compound Documents, which are essentially a file system within a file. They store the main workbook in a stream named "Workbook" and they have other streams for VBA modules, OLE objects, document properties, etc.
Win32 includes APIs for reading OLE Compound Documents. They are far from trivial. Once you get the "Workbook" stream, the first BIFF record identifies the file as being an Excel file.
You can find excellent documentation from Microsoft on the BIFF8 file format on the Microsoft Office Binary File Formats page.
The new Excel 2007 Open XML (xlsx) format is actually a zip file with workbook parts and is documented at OpenXmlDeveloper.org.
I am not aware of a tool which will simply tell you the format of a workbook. You could take the easy, but not very reliable approach of just looking at the extension which will be right 99%+ of the time - if accuracy is not an issue.
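Short of a dedicated tool, the two containers described above can be told apart from their first bytes. The following is a minimal Python sketch, not a complete detector: the function name `sniff_excel_format` and its return labels are made up for illustration, it assumes the workbook part sits at the path Excel itself writes (`xl/workbook.xml` or `xl/workbook.bin`), and it reports only the container, not the exact BIFF version.

```python
import io
import zipfile

# Signature at the start of every OLE compound document (xls, but also doc, ppt, msi, ...).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local-file-header signature at the start of an ordinary zip archive.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_excel_format(data: bytes) -> str:
    """Guess the container format of a workbook from its raw bytes.

    The returned labels are hypothetical names chosen for this sketch.
    """
    if data.startswith(OLE_MAGIC):
        # Could still be a non-Excel OLE file; confirming that would mean
        # locating the "Workbook" stream and reading its first BIFF record.
        return "ole-compound"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = set(archive.namelist())
        except zipfile.BadZipFile:
            return "unknown"
        # Assumption: the workbook part is at the location Excel writes it to;
        # other producers are allowed to place it elsewhere.
        if "xl/workbook.xml" in names:
            return "ooxml"          # xlsx / xlsm / xlam family
        if "xl/workbook.bin" in names:
            return "ooxml-binary"   # xlsb
        return "zip-other"
    return "unknown"
```

To use it, read the file in binary mode and pass the bytes, for example `sniff_excel_format(open(path, "rb").read())`. Unlike the extension check, this still works when a file has been renamed.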
There are many tools to read xls and xlsx workbooks, including SpreadsheetGear for .NET which reads both.
Disclaimer: I own SpreadsheetGear LLC

Related

Can I read other than the first sheet with the Read Delimited Spreadsheet.vi function in LabVIEW?

I am using Read Delimited Spreadsheet.vi in LabVIEW and need to read data from a sheet other than the first one. How do I tell LabVIEW that I want to use a sheet other than the first?
CSV files are plain text files and there are no multiple sheets inside.
Sheets are present within Excel files, but the "Read Delimited Spreadsheet" function does not work with these.
Unfortunately LabVIEW still doesn't have built-in support for reading Excel files as far as I know, although it can write them with the Save to Measurement File express VI.
There are third-party toolkits available for reading Excel files in LabVIEW, or you might be able to use some Python code with openpyxl or pandas.
I suppose you need to read an Excel file (.xls or .xlsx) and not a CSV file (as Mateusz pointed out, a CSV file doesn't have sheets).
Anyway, in LabVIEW you can read, write, manipulate and do any other operations on Excel files by using ActiveX. It is verbose but you can use it as any other LabVIEW library.
Look at this post or the built-in examples in your LabVIEW environment.
As mentioned above, you can use Excel to read a spreadsheet. Alternately, you can use LibreOffice.
LibreOffice to LV library

using MS XLSB instead of MS XLS

I need an opinion on switching from MS XLS to XLSB. I have several models in MS xls files (Microsoft Excel 97-2003 Worksheet .xls) and have been using those models for many years. The xls files hold a lot of data, along with formulas, macros, add-ins, and formulas that pull data from databases such as Bloomberg, FactSet or Haver. I am planning to move the models from MS xls to MS xlsb, i.e. the binary format, but I want to be sure that everything will work fine in the binary format.
Can you please let me know if MS xls files are completely compatible with MS xlsb? Is there any disadvantage of using XLSB? I would be really thankful for your help.
XLS and XLSB are different file formats. The big disadvantage of XLSB is that it only works in Excel 2007+, so you can't depend on add-ins that only work in Excel 2000.
If your file works correctly in Excel 2007, 2010 or 2013, then XLSB will preserve every feature from the XLS. This is different from XLSX, where some features, most notably VBA macros, are intentionally omitted.

xla vs xlam addin, what is the difference?

Could someone please explain the difference between the xla Excel add-in format and the xlam Excel add-in format? Googling didn't provide anything useful.
The m stands for macro-enabled; xlam is the new format (as of Excel 2007).
These are add-ins that may call macros.
Workbooks have a matching split: xlsx is meant for macro-free workbooks, while xlsm is macro-enabled.
So, unlike the old xls format, the extension now tells you whether a workbook can contain macros.
Why? My guess is that the main reason would be security.
Some people don't like to receive files without knowing whether there are potentially harmful macros in them. In the old format, you could not make the distinction based on the file extension.
Both are macro-enabled files:
XLA files are Excel 97-2003 format files that are loaded as add-ins.
XLAM files are Excel 2007+ files, which are actually zip files that have XML documents inside them, per the Office Open XML format.

Excel file and program structure

For a school project I need to know how Excel works. Precisely, I need to know what kind of structure is behind an Excel file and how the Excel program works with this file.
I know Excel is proprietary to Microsoft and is not open source, so I know I can't find too much on this subject... But anything that can help me understand how Excel works is useful.
If I cannot find anything about Excel, I will try to take a look at Open Office or the OpenDocument format. So even some information about those would be really useful.
Thanks to all
You can find details of the MS Office BIFF file formats here in the microsoft.com library, while the Office Open XML format is published here on the ECMA site and here in the microsoft.com library.
You can find specifications for the OpenDocument format used by Open Office on the OASIS site
It is simpler than you may think.
An Excel file in the xlsx format is just a zip file of multiple XML documents. Each worksheet in the workbook is stored as its own XML document.
You will find the worksheet XML files under xl/worksheets inside the zip archive.
You can script reading from and writing to it.
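As a rough illustration of scripting against that zip layout, the Python sketch below lists the worksheet parts of an xlsx file using only the standard library. The helper name `list_worksheet_parts` is hypothetical, and the sketch assumes the conventional xl/worksheets/ location described above; it does not map parts to the sheet names shown in Excel, which live in the workbook part.

```python
import zipfile


def list_worksheet_parts(source):
    """Return the names of the worksheet XML parts inside an xlsx archive.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Skip relationship files such as xl/worksheets/_rels/sheet1.xml.rels.
            if name.startswith("xl/worksheets/") and name.endswith(".xml")
        )
```

Each returned name can then be read with `archive.read(name)` and parsed with any XML library.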

Programmatically Determine If An Excel File (.xls) Contains Macros

Is there any way to programmatically determine if an .xls contains macros, without actually opening it in Excel?
Also are there any methods to examine which certificate (including timestamp cert) these macros are signed with? Again without using Excel.
I'm wondering in particular if there are any strings that always show up in the raw data of an Excel file when macros are present.
Yes, you can open the .xls file as a compound document file and check whether it contains a VBA folder and streams containing VBA code.
Sample code is available in this CodeProject article:
Another OLE Doc Viewer but with editing facility
The certificate information is stored in the DocumentSummaryInformation stream. If you want to read out the information from there you should dig into the file format specifications available from Microsoft:
[MS-OSHARED]: Office Common Data Types and Objects Structure Specification
[MS-OFFCRYPTO]: Office Document Cryptography Structure Specification
An xls file containing a macro should contain a string looking something like
Keyboard Shortcut:
Don't know if this is a surefire solution though
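The raw-data idea can be sketched in Python without parsing the compound document at all. This is only a heuristic under stated assumptions: it assumes the VBA project is stored under directory entries whose names start with `_VBA_PROJECT`, it relies on compound-document entry names being stored as UTF-16LE, and the function name `xls_probably_has_macros` is made up for illustration. A cell that happens to contain the same text could produce a false positive, and the sketch says nothing about signing certificates.

```python
# Signature at the start of every OLE compound document.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Compound-document directory entries store their names as UTF-16LE.
# Assumption: a VBA project shows up under entries named "_VBA_PROJECT...".
VBA_MARKER = "_VBA_PROJECT".encode("utf-16-le")


def xls_probably_has_macros(data: bytes) -> bool:
    """Heuristic: True if an OLE file's raw bytes mention a VBA project entry."""
    if not data.startswith(OLE_MAGIC):
        return False  # not a compound document, so not a binary .xls
    return VBA_MARKER in data
```

For a dependable answer, walking the compound document's directory as described in the first answer is the safer route; the byte search is just a quick first filter.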

Resources