I'm processing XLSX files in PHP. When I create an XLSX file in OpenOffice with just one filled cell, the spreadsheet already has 65536 rows. It looks like OpenOffice stores all cells even if they are empty. Is there a way to save only as many rows and columns as are actually filled? Every library I have tried for parsing XLSX then reports the wrong number of rows, so I can't process the file correctly. A good solution would be to do this directly inside OpenOffice, or to use an external tool or script (*nix based) to clean up such an XLSX file.
Here is one solution:
Use OpenTBS to open the XLSX and save it with or without modifications.
OpenTBS is a pure PHP tool whose purpose is to merge data with LibreOffice and MS Office documents using templates.
Since version 1.9.0, OpenTBS can handle XLSX workbooks made with LibreOffice. Such workbooks may contain an extra row definition with a repetition attribute that extends it up to the maximum row number. OpenTBS simply reduces this extended row to a single row.
OpenTBS download page
OpenTBS demo
OpenTBS doc
TL;DR: Excel Workbook generated by Docx4J always says corrupted but I can't determine what Excel doesn't like about the underlying XML, let alone how to fix it.
My use case is as follows:
I am trying to produce an excel workbook with charts and graphs automatically on a regular basis. Only the raw data will change but everything else will dynamically update as the raw data is changed.
So I built an Excel workbook with a number of charts and graphs generated from a sheet of raw data, and I am using it as a template. All values of the raw data are numeric. The intent was to use Docx4J to read this 'template', populate the raw data sheet, and save it as a new file; opening that file would then trigger recalculation and update the charts and graphs. Since I am new to Docx4J, I decided to take baby steps and first see whether I could open the workbook and read the contents of the cells, which I could. So far so good. I could also change the values of the cells, but I could only verify this programmatically by writing the location and value to the console before and after the change (e.g. A1=45 followed by A1=55).
My problem starts when I try to open the resulting file. It generates, looks to be about the right size but Excel claims it is corrupted. It does try to recover what it can, but ultimately fails and the workbook won't even open. For troubleshooting, I opened up the generated xlsx and confirmed all the various XML files that make up an xlsx file were present and readable so I am concluding either something is missing or some part of the XML coming out the other side is not what Excel wants. Further troubleshooting involved creating an empty workbook (no data, 1 sheet) as my 'template', opening it and then saving it back to the file system with a different name and simply trying to see if I could open it in Excel but no dice. This has me ruling out anything to do with my attempts to write or add data to the sheet.
Relevant Environment Information:
'template' workbook is being generated on a Windows 10 64bit machine
My docx4j code is executing on a Debian 10 Linux machine running OpenJDK 11.0.4
My version of Excel both to create the 'template' and open the copy is Excel for Office365
I am running Docx4J v11.1.3 but I also tried with v8.1.5 (in both cases I had to use the Reference Implementation of JAXB to get around a marshalling error when trying to save)
I did see another post on Stackoverflow here about an issue related to fonts in Linux environments so I made sure to install the MS TT Corefonts but it didn't help my problem.
I ran the entire unzipped directory through BeyondCompare and there are some differences, but I don't know which are just artifacts of the two different operating systems or which differences actually matter. Mostly they are:
small differences in file size
boolean values showing as "1", "yes", or "true" but not the same way for both files
namespaces and attributes in one file but not the other
Sheet1 from my blank workbook, before and after
All ideas are welcome.
Please try the just-released docx4j 8.1.6, which fixes handling of xlsx files created by recent releases of Excel. This was https://github.com/plutext/docx4j/issues/389
We are generating large XLSX documents (data only) and we have a template XLSX file (styles, images, etc.).
I know an XLSX file is just a ZIP archive, so you can extract it and look at what is inside.
Is it possible to copy the styles and formatting from the template XLSX file to the generated XLSX document? Copying the xl/styles.xml file and zipping everything up again is not enough: Excel complains that the file is not OK, so I assume there are some consistency checks.
Thanks
We built a Python-based tool to copy XLSX styles from one sheet to another, allowing you to essentially template the visual design of a spreadsheet and apply it to new data.
https://github.com/Sydney-Informatics-Hub/copy_xlsx_styles may solve your needs.
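One likely reason that copying xl/styles.xml alone fails: each cell in a worksheet part refers to its format by a zero-based index (the `s` attribute on `<c>`) into the `<cellXfs>` list of styles.xml. If the generated sheets use indexes that the template's styles.xml does not define, the package is inconsistent. The following is a minimal diagnostic sketch in Python using only the standard library. The function name `out_of_range_style_refs` is made up for illustration, and the sketch checks only this one kind of inconsistency, so a clean result does not guarantee that Excel will accept the file.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by styles.xml and worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def out_of_range_style_refs(xlsx_path):
    """Return {worksheet part: sorted style indexes missing from styles.xml}.

    Hypothetical helper: xlsx_path may be a file path or a file-like object.
    """
    problems = {}
    with zipfile.ZipFile(xlsx_path) as zf:
        styles = ET.fromstring(zf.read("xl/styles.xml"))
        cell_xfs = styles.find("m:cellXfs", NS)
        # Count the <xf> entries actually present rather than trusting @count.
        defined = len(cell_xfs.findall("m:xf", NS)) if cell_xfs is not None else 0

        for name in zf.namelist():
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            sheet = ET.fromstring(zf.read(name))
            bad = set()
            for cell in sheet.iter("{%s}c" % NS["m"]):
                s = cell.get("s")
                if s is not None and int(s) >= defined:
                    bad.add(int(s))
            if bad:
                problems[name] = sorted(bad)
    return problems
```

An empty dictionary means every style index used by the sheets exists in styles.xml; any entries point at the sheets and indexes that would need remapping after the styles part is swapped.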
Folks,
We have a scalability requirement for updating Excel files.
For example, we have an Excel workbook with 10 worksheets, each containing a huge amount of data.
What we want to do is replace one cell in one of the sheets with 'NEW VALUE'.
As we understand it, Apache POI has to load the entire workbook even if we only modify data in one of the sheets. This consumes a huge amount of memory and is not acceptable.
Is there any cell-level read/write (immediate flush) facility or API for Excel?
An XLSX file is actually a ZIP file containing multiple XML files.
You could extract the contents (but keep the folder structure intact), change only the XML file you want, compress it back, and replace the original.
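The extract, edit, and re-compress steps above can be sketched in Python with the standard `zipfile` module. This is only an illustration under stated assumptions: the helper name `replace_part` is made up, and the caller is expected to supply the already-edited XML for the one part being replaced. The other parts are streamed from the source archive to the new one, so no workbook model is built in memory.

```python
import shutil
import zipfile


def replace_part(src_path, dst_path, part_name, new_bytes):
    """Write a copy of the archive at src_path to dst_path with one part replaced.

    Hypothetical helper: part_name is the path inside the archive,
    for example "xl/worksheets/sheet1.xml".
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        if part_name not in src.namelist():
            raise KeyError(part_name)
        for info in src.infolist():
            if info.filename == part_name:
                # Substitute the edited XML for this one part.
                dst.writestr(info.filename, new_bytes)
            else:
                # Stream every other part across without parsing it.
                with src.open(info) as fin, dst.open(info.filename, "w") as fout:
                    shutil.copyfileobj(fin, fout)
```

Two caveats apply. The edited sheet XML still has to be produced somehow (for a large sheet, ideally with a streaming XML parser), and text cells usually hold an index into xl/sharedStrings.xml rather than the text itself, so changing a string value may mean touching that part as well.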
How can I use the same macro in both the XLS and XLSX formats?
Does anything need to be added for the XLSX format? I have written a small macro in XLSX (where the last column is XFD) and I want to use it in XLS (where the last column is IV). Does anything need to be changed?
This question is not about converting the file formats.
Excel 2007 and later provide a broader set of formulas and VBA functionality (note that macros are saved in the macro-enabled .xlsm variant rather than in plain .xlsx). The newer format also lifts the column limit that existed before Excel 2007, from 256 columns to 16,384.
If your macros are written using up to column 16,384 (XFD), you're going to have to perform the calculations in chunks to make them fit within the 256 column (IV) restriction of Excel 2003. If you're using variables and not writing everything to a sheet, then there should be no problem (barring running out of memory).
Without more detail, it's difficult to tell whether you're using a set of functions from Excel 2007 that may not be supported in Excel 2003.
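To make the column arithmetic concrete, here is a small sketch of how column letters map to numbers and how a wide range could be split into chunks of at most 256 columns. It is written in Python purely for illustration (the macro itself would be VBA), and the names `column_number` and `column_chunks` are made up for this example.

```python
def column_number(letters):
    """Convert Excel column letters to a 1-based column number (A=1, Z=26, AA=27)."""
    n = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError("not a column reference: %r" % letters)
        # Bijective base 26: each letter contributes a digit from 1 to 26.
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def column_chunks(last_column, limit=256):
    """Yield (first, last) 1-based column ranges, each at most `limit` columns wide."""
    total = column_number(last_column)
    for start in range(1, total + 1, limit):
        yield start, min(start + limit - 1, total)


print(column_number("IV"))    # 256
print(column_number("XFD"))   # 16384
print(len(list(column_chunks("XFD"))))  # 64
```

So a macro that really uses every column up to XFD would need 64 passes of 256 columns each to fit within the Excel 2003 limit.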
Which format was it written in first? Normally it should work in both Excel formats. Since you are referring to the last cell of each format, this looks like a code-level issue rather than a version-level one.
I have to create a custom tag that uses POI to read in an excel file, modify some cells, and then write the modifications back to the same excel file. There are multiple sheets in the excel file. In some of the sheets, certain cells are locked, some cells have color coding, and some cells have drop down options. When I write the modifications back to the excel file, all these special styles must remain the same.
I noticed that I can read in an excel file as a Workbook by doing WorkbookFactory.create(). I can also read in as a HSSFWorkbook by doing new HSSFWorkbook(). My question is, which one should I use for what I want to accomplish?
Workbook is the common interface for HSSF (.xls files) and XSSF (.xlsx files). By using Workbook instead of HSSFWorkbook, your code can work the same way for both file formats.
From the Why Change? section on the POI website:
If you have existing HSSF usermodel code that works just fine, and you don't want to use the new OOXML XSSF support, then you probably don't need to. Your existing HSSF only code will continue to work just fine.
However, if you want to be able to work with both HSSF for your .xls files, and also XSSF for .xlsx files, then you will need to make some slight tweaks to your code.
So, if you only need to support .xls, and only ever will, you can stick with HSSFWorkbook. However, if you want to work with both .xls and .xlsx, either now or in the future, use the common interfaces.