How to parse strict *.xlsx file in Java - apache-poi

I need to parse data from an xlsx file. Currently I'm using Apache POI (v. 3.11) to do that. It handles some xlsx files fine, but not all. I noticed that the files that are not parsed properly are "strict xlsx" files saved with Office 2013. To be more exact, these files are compliant with ISO 29500 rather than ECMA-376. The difference is that an ISO 29500 file contains relationships with the type:
http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument
while Apache POI is looking for:
String CORE_DOCUMENT =
"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
Is there a way to make Apache POI read these files?

OOXML Strict Converter for Office 2010 may help if you need to resave the docs using an older format.
Some of the purl namespaces are listed on http://pyxb.sourceforge.net/PyXB-1.2.2/bundles.html (Jethro's link above appears to no longer work).
The up-to-date XML schema files can be found at:
http://www.ecma-international.org/publications/standards/Ecma-376.htm
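Before deciding whether a file needs converting, it can help to detect the Strict variant up front. The following is a minimal sketch, assuming the package is a ZIP container whose root relationships live in _rels/.rels and that a plain substring match on the relationship type quoted in the question is good enough. The class and method names are made up for illustration and are not part of POI.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class StrictOoxmlCheck {
    // Relationship type quoted in the question for ISO 29500 Strict files.
    static final String STRICT_CORE_DOCUMENT =
        "http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument";

    // True if the text of _rels/.rels mentions the Strict relationship type.
    static boolean isStrictRels(String relsXml) {
        return relsXml.contains(STRICT_CORE_DOCUMENT);
    }

    // Scans the ZIP stream for _rels/.rels and applies the check above.
    static boolean isStrictPackage(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if ("_rels/.rels".equals(e.getName())) {
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    return isStrictRels(xml);
                }
            }
        }
        return false; // no root relationships part found
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny in-memory package so the sketch runs without a real file.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            zip.putNextEntry(new ZipEntry("_rels/.rels"));
            String rels = "<Relationships><Relationship Id=\"rId1\" Type=\""
                + STRICT_CORE_DOCUMENT + "\" Target=\"xl/workbook.xml\"/></Relationships>";
            zip.write(rels.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        System.out.println(isStrictPackage(new ByteArrayInputStream(buf.toByteArray())));
    }
}
```

Files flagged this way could then be routed to a conversion step instead of being handed straight to the parser.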

Related

Reading XLSB file - Apache POI

I have gone through all the Stack Overflow posts about reading XLSB files with Apache POI.
I tried many ways to read an XLSB file using the links and examples mentioned in those posts, but I keep running into issues.
I am using the latest Apache POI (3.17) with the code from the answer posted by "Gagravarr" in:
Exception reading XLSB File Apache POI java.io.CharConversionException
I am getting the following errors:
The method getLocale() is undefined for the type XSSFBEventBasedExcelExtractor
The method getFormulasNotResults() is undefined for the type XSSFBEventBasedExcelExtractor
The constructor XSSFEventBasedExcelExtractor.SheetTextExtractor() is not visible
The method getIncludeSheetNames() is undefined for the type XSSFBEventBasedExcelExtractor
.......................... etc
I checked the base class "XSSFEventBasedExcelExtractor" in poi-ooxml-3.17.jar (source files) and I was able to find implementations for all of these methods.
I would like to know whether this is a known issue. Does it mean that there is no working example for reading XLSB files in Java?
I hope this question is not a duplicate.
I recently looked into how to read XLSB files with POI.
If you only need to read an XLSB file, you can use the Apache test code below as an example:
https://svn.apache.org/repos/asf/poi/trunk/src/ooxml/testcases/org/apache/poi/xssf/eventusermodel/TestXSSFBReader.java
In fact, XLSB stores its parts as .bin files instead of .xml files.
If you want to do more with XLSB files, see the following document:
https://msdn.microsoft.com/en-us/library/office/cc313133(v=office.12).aspx
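To see the .bin parts mentioned above, one option is to list the entries of the file's ZIP container. This is a rough sketch using only the JDK. The sample entry names in main are illustrative assumptions rather than a guaranteed layout, and a .bin entry alone does not prove that a file is XLSB.

```java
import java.io.IOException;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class XlsbParts {
    // Keeps only the entry names that end in ".bin" (case-insensitive).
    static List<String> binaryParts(List<String> entryNames) {
        return entryNames.stream()
            .filter(n -> n.toLowerCase(Locale.ROOT).endsWith(".bin"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        List<String> names;
        if (args.length > 0) {
            // Real file: list the entries of the ZIP container.
            try (ZipFile zip = new ZipFile(args[0])) {
                names = zip.stream().map(ZipEntry::getName).collect(Collectors.toList());
            }
        } else {
            // Illustrative names only; actual part names depend on the file.
            names = List.of("[Content_Types].xml", "xl/workbook.bin", "xl/worksheets/sheet1.bin");
        }
        binaryParts(names).forEach(System.out::println);
    }
}
```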

CoSign API: Signing a BYTE stream using SAPI?

I would like to extend the question asked here to other types of byte streams. How can I map a byte stream's file extension to SAPI_ENUM_FILE_TYPE? I know that PDF files should be mapped to SAPI_ENUM_FILE_TYPE.SAPI_ENUM_FILE_ADOBE, but I am not sure how to do this for other file types (e.g. Office documents).
Taken from the online documentation:
SAPI_ENUM_FILE_NONE - Not in use.
SAPI_ENUM_FILE_WORD - Word file (.doc file).
SAPI_ENUM_FILE_ADOBE - PDF file (Adobe).
SAPI_ENUM_FILE_TIFF - TIFF file.
SAPI_ENUM_FILE_DETACHED - Not supported in the current version.
SAPI_ENUM_FILE_P7M - Not supported in the current version.
SAPI_ENUM_FILE_XML - XML File. Supported from version 5.
SAPI_ENUM_FILE_OFFICE_XML_PACKAGE - Office 2007 file type (.docx file or .xlsx file).
SAPI_ENUM_FILE_INFOPATH_XML_FORM - InfoPath 2007/2010/2013 form (.xml file).
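Going only by the documentation excerpt above, an extension lookup might be sketched as follows. The enum here is a local stand-in that mirrors the documented names, not the vendor's type, and it carries no numeric values. Treating .tif as TIFF is an assumption, and since both plain XML and InfoPath forms use .xml, this sketch arbitrarily maps .xml to the plain XML type. Extensions the excerpt does not mention (such as .xls) are left unmapped.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class SapiFileTypes {
    // Local stand-in mirroring names from the documentation excerpt.
    enum FileType {
        SAPI_ENUM_FILE_WORD,
        SAPI_ENUM_FILE_ADOBE,
        SAPI_ENUM_FILE_TIFF,
        SAPI_ENUM_FILE_XML,
        SAPI_ENUM_FILE_OFFICE_XML_PACKAGE
    }

    private static final Map<String, FileType> BY_EXTENSION = Map.of(
        "doc", FileType.SAPI_ENUM_FILE_WORD,
        "pdf", FileType.SAPI_ENUM_FILE_ADOBE,
        "tiff", FileType.SAPI_ENUM_FILE_TIFF,
        "tif", FileType.SAPI_ENUM_FILE_TIFF,   // assumption: .tif treated as TIFF
        "xml", FileType.SAPI_ENUM_FILE_XML,    // assumption: InfoPath forms not detected
        "docx", FileType.SAPI_ENUM_FILE_OFFICE_XML_PACKAGE,
        "xlsx", FileType.SAPI_ENUM_FILE_OFFICE_XML_PACKAGE);

    // Returns the mapped type, or empty if the extension is missing or unknown.
    static Optional<FileType> forFileName(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return Optional.empty();
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(ext));
    }

    public static void main(String[] args) {
        System.out.println(forFileName("contract.docx"));
        System.out.println(forFileName("legacy.xls"));
    }
}
```

Returning an empty Optional for unknown extensions keeps the caller from silently passing a wrong type to the signing call.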

Excel biff5 to biff8 conversion

My system uses Apache POI to manage some xls files. Now I've got almost 300 xls files, but it appears that they are in an old format, so I get this exception:
The supplied spreadsheet seems to be Excel 5.0/7.0 (BIFF5) format. POI only supports BIFF8 format (from Excel versions 97/2000/XP/2003)
Is there a way to handle that or to automatically convert all those files to a biff8 format?
Convert the files to the OOXML format; POI supports both BIFF8 and the newer OOXML. Download the official Microsoft converter pack:
http://www.microsoft.com/en-us/download/details.aspx?id=3
Convert the files by running excelcnv.exe -oice <input file> <output file>. You can try running it directly from your code as an external program, or create a batch file. There is a good explanation from mrdivo at social msdn here.
EDIT
The download mentioned above from microsoft.com is no longer available as of 6/21/2018. However, excelcnv.exe is a standard part of some Microsoft Office installations. It has been confirmed to be deployed with Office 2010 and Office 2016, and possibly other versions. It can be found at:
C:\Program Files (x86)\Microsoft Office\root\Office16 (or Office14).
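Running the converter from your own code as an external program could look roughly like this. It is a minimal sketch with ProcessBuilder that assumes the excelcnv.exe -oice <input file> <output file> invocation quoted above. The class name, the sample file names, and the .xlsx output extension are placeholders chosen for illustration.

```java
import java.io.IOException;
import java.util.List;

class BiffConverter {
    // Builds: <excelcnv> -oice <input file> <output file>
    static List<String> command(String excelcnv, String input, String output) {
        return List.of(excelcnv, "-oice", input, output);
    }

    // Runs the converter, waits for it to finish and returns its exit code.
    static int convert(String excelcnv, String input, String output)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(excelcnv, input, output))
            .inheritIO() // show the converter's own output, if any
            .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            // Without arguments, only show the command that would be run.
            System.out.println(command("excelcnv.exe", "old.xls", "new.xlsx"));
            return;
        }
        System.out.println("exit code: " + convert(args[0], args[1], args[2]));
    }
}
```

For a few hundred files, calling convert in a loop over a directory listing is one way to avoid maintaining a separate batch file.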
It seems Apache POI can't handle the BIFF5 format.
You could try the Java Excel API instead: http://jexcelapi.sourceforge.net/

Working with excel files using apache poi

Is there any way to read or write both the Excel 2003 and 2007 formats using Apache POI? I know that we can use an HSSF workbook for the 2003 format and XSSF for 2007 (correct me if I am wrong). But is there a way to read both formats through a single workbook type rather than handling them separately?
Yes, you can do it. In fact, it's fairly widely documented on the Apache POI website!
If you already have code that uses HSSF, then you should follow the HSSF to SS converting guide for help on updating your code to be general across the two formats.
If you don't have any code yet, then follow the User API guide to get started - all the code in that is general for both formats. You can also look at the Quick Guide for some specific problems and how to solve them in the general way.
Use
WorkbookFactory.create(in);
Based on the javadoc, it
Creates the appropriate HSSFWorkbook / XSSFWorkbook from the given
InputStream.
Try Workbook wb = WorkbookFactory.create(pkg); where pkg is an OPCPackage.
It should work. However, if the XSSF file is too big you will get an OutOfMemoryError, so you should use the event user model to read the file. In that case, check the extension of the input path, as follows:
private boolean isXLS(String inputPath) {
    // Match the ".xls" extension case-insensitively. Unlike substring(),
    // regionMatches() returns false instead of throwing for short paths.
    return inputPath.regionMatches(true, inputPath.length() - 4, ".xls", 0, 4);
}
Read the How-to for more information about the event user model.
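An extension check can be fooled by a misnamed file. An alternative is to look at the first bytes of the file, which is a sketch of the general idea rather than POI's own implementation. It assumes that .xls files are OLE2 containers and .xlsx files are ZIP archives. Since any ZIP archive matches the second signature, a match only means "possibly OOXML". The class and enum names are invented for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class ExcelFormatSniffer {
    enum Format { OLE2, OOXML, UNKNOWN }

    // Signature of an OLE2 compound document (used by .xls).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };
    // Local file header signature of a ZIP archive (used by .xlsx).
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    // Classifies a file from its leading bytes.
    static Format sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Format.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Format.OOXML;
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println(sniff(OLE2_MAGIC)); // demo without a file
            return;
        }
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            System.out.println(sniff(in.readNBytes(8)));
        }
    }
}
```

The result can then decide between the HSSF and XSSF event APIs without trusting the file name.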

Setting mime type for excel document

MS Excel has the following observed MIME types:
application/vnd.ms-excel (official)
application/msexcel
application/x-msexcel
application/x-ms-excel
application/x-excel
application/x-dos_ms_excel
application/xls
application/x-xls
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (xlsx)
Is there any one type that would work for all versions? If not, do we need to set response.setContentType() with each one of these mime types individually?
Also, we use file streaming in our application to display documents (not just Excel, but any type of document). In doing so, how can we retain the filename if the user opts to save the file? Currently, the name of the servlet that renders the file appears as the default name.
I believe the standard MIME type for Excel files is application/vnd.ms-excel.
Regarding the name of the document, you should set the following header in the response:
header('Content-Disposition: attachment; filename="name_of_excel_file.xls"');
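The line above is PHP, while the question mentions a servlet. A Java counterpart might look like the sketch below, which only builds the header value so that it runs without the servlet API. In a servlet the result would typically be passed to response.setHeader("Content-Disposition", value). The class name is made up, and the escaping of quotes and backslashes is a precaution added here, not something the answer specifies.

```java
class DownloadHeaders {
    // Builds a Content-Disposition value that suggests a file name to the browser.
    // Backslashes and double quotes are escaped so the quoted string stays valid.
    static String contentDisposition(String fileName) {
        String escaped = fileName.replace("\\", "\\\\").replace("\"", "\\\"");
        return "attachment; filename=\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(contentDisposition("name_of_excel_file.xls"));
    }
}
```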
Waking up an old thread here I see, but I felt the urge to add the "new" .xlsx format.
According to http://filext.com/file-extension/XLSX the MIME type for .xlsx is application/vnd.openxmlformats-officedocument.spreadsheetml.sheet. It might be a good idea to include it when checking for MIME types!
For .xls use the following content-type
application/vnd.ms-excel
For Excel 2007 and later (.xlsx files), use
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
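Since no single type covers both formats, one approach is to pick the content type from the file extension. This is a small sketch using the two types given above. The application/octet-stream fallback for anything else is an assumption of this example, as is the class name.

```java
import java.util.Locale;

class ExcelMimeTypes {
    static final String XLS = "application/vnd.ms-excel";
    static final String XLSX =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    // Assumed fallback: generic binary content.
    static final String FALLBACK = "application/octet-stream";

    // Chooses a content type from the file name's extension.
    static String forFileName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".xlsx")) return XLSX;
        if (lower.endsWith(".xls")) return XLS;
        return FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(forFileName("report.xlsx"));
        System.out.println(forFileName("report.xls"));
    }
}
```

The returned string is what would be handed to response.setContentType() before streaming the file.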
I was setting MIME type from .NET code as below -
File(generatedFileName, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
My application generates excel using OpenXML SDK. This MIME type worked -
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
I am using EPPlus to generate .xlsx (OpenXML format based) excel file. For sending this excel file as attachment in email I use the following MIME type and it works fine with EPPlus generated file and opens properly in ms-outlook mail client preview.
string mimeType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
System.Net.Mime.ContentType contentType = null;
if (mimeType?.Length > 0)
{
contentType = new System.Net.Mime.ContentType(mimeType);
}
For anyone who is still stumbling with this after using all of the possible MIME types listed in the question:
I have found that iMacs tend to also throw a MIME type of "text/xls" for XLS Excel files, hope this helps.
