Docx4J-generated XLSX file is always corrupt

TL;DR: Excel always reports that a workbook generated by Docx4J is corrupt, but I can't determine what Excel doesn't like about the underlying XML, let alone how to fix it.
My use case is as follows:
I am trying to produce an Excel workbook with charts and graphs automatically on a regular basis. Only the raw data changes; everything else updates dynamically as the raw data changes.
So I built an Excel workbook in which a number of charts and graphs are driven by a sheet of raw data, and I am using it as a template. All of the raw data values are numeric. The plan was to use Docx4J to read this 'template', populate the raw data sheet, and save the result as a new file; opening that file would trigger a recalculation, and the charts and graphs would update. Since I am new to Docx4J, I started with baby steps: first I checked whether I could open the workbook and read the contents of the cells, which I could. So far so good. I could also change cell values, but I could only verify this programmatically, by printing each cell's location and value to the console before and after the change (e.g. A1=45 followed by A1=55).
My problem starts when I try to open the resulting file. It is generated and looks to be about the right size, but Excel claims it is corrupt. Excel tries to recover what it can, but ultimately fails, and the workbook won't open at all. For troubleshooting, I unzipped the generated xlsx and confirmed that all the XML files that make up an xlsx package were present and readable, so I concluded that either something is missing or some part of the XML coming out the other side is not what Excel expects. As a further test, I used an empty workbook (one sheet, no data) as my 'template', opened it with Docx4J, saved it back to the file system under a different name, and tried to open that copy in Excel. No dice. That rules out anything to do with my attempts to write or add data to the sheet.
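One thing "present and readable" does not establish is well-formedness: a part can look fine in a text editor and still fail to parse. The following is a minimal sketch, assuming only the JDK (no Docx4J involved), that walks every .xml and .rels part of a package and reports the ones a standard XML parser rejects. The class and method names are hypothetical. It checks well-formedness only, not schema validity, so an empty result does not mean Excel will accept the file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: finds package parts that are not well-formed XML. */
class PackageXmlCheck {

    static List<String> malformedParts(InputStream xlsx) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors but, unlike the built-in handler, prints nothing.
        builder.setErrorHandler(new DefaultHandler());

        List<String> problems = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                boolean isXmlPart = name.endsWith(".xml") || name.endsWith(".rels");
                if (entry.isDirectory() || !isXmlPart) {
                    continue; // skip folders and binary parts such as images
                }
                // Copy the entry first: parse() may close the stream it is given.
                byte[] bytes = zip.readAllBytes();
                try {
                    builder.parse(new ByteArrayInputStream(bytes));
                } catch (SAXException e) {
                    problems.add(name + ": " + e.getMessage());
                }
            }
        }
        return problems;
    }
}
```

For example, pass a `FileInputStream` for the generated workbook and print the returned list; each entry names the failing part and the parser's message.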
Relevant Environment Information:
The 'template' workbook is created on a 64-bit Windows 10 machine
My Docx4J code runs on a Debian 10 Linux machine with OpenJDK 11.0.4
I use Excel for Office 365 both to create the 'template' and to open the copy
I am running Docx4J v11.1.3, but I also tried v8.1.5 (in both cases I had to use the JAXB Reference Implementation to get around a marshalling error when saving)
I saw another post on Stack Overflow about a font-related issue in Linux environments, so I made sure to install the MS TT core fonts, but that didn't help.
I compared the two unzipped directories in Beyond Compare. There are some differences, but I don't know which are just artifacts of the two operating systems, or which ones actually matter. Mostly they are:
small differences in file size
boolean values written as "1", "yes", or "true", but not consistently between the two files
namespaces and attributes present in one file but not the other
Sheet1 from my blank workbook, before and after
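Some of the boolean differences may be pure noise: xsd:boolean accepts exactly "true", "false", "1" and "0", so "1" and "true" mean the same thing. Below is a hedged sketch of a pre-diff step that rewrites those forms to "true"/"false", but only for a caller-supplied set of attribute names (for example applyFont on xf elements in styles.xml); normalizing every "1" blindly would hide genuine numeric changes such as counts. The class name is hypothetical, and it only handles double-quoted values with no spaces around "=". A value like "yes" is not an xsd:boolean form at all; one likely source is the XML declaration's standalone="yes".

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical diff helper: canonicalizes xsd:boolean values of chosen attributes. */
class BooleanAttrNormalizer {

    // name="value" where value is one of the four xsd:boolean lexical forms
    private static final Pattern BOOL_ATTR =
            Pattern.compile("([\\w:]+)=\"(true|false|1|0)\"");

    static String normalize(String xml, Set<String> booleanAttrs) {
        Matcher m = BOOL_ATTR.matcher(xml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String replacement = m.group(0); // default: leave the attribute untouched
            if (booleanAttrs.contains(name)) {
                boolean value = m.group(2).equals("true") || m.group(2).equals("1");
                replacement = name + "=\"" + value + "\"";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Running both unzipped copies through the same normalization before comparing should leave only the differences that change meaning.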
All ideas are welcome.

Please try the just-released docx4j 8.1.6, which fixes the handling of xlsx files created by recent releases of Excel. The issue was tracked at https://github.com/plutext/docx4j/issues/389

Related

Microsoft Excel keeps repairing my .xlsm file for no apparent reason and eliminates data validations on a sheet

I recently created an automated Excel utility (using Microsoft Office 2019), in which I've extensively used data validations, VBA code, named ranges and formatting. It was working well until one day I received an Excel prompt message that read:
When I click on Yes, it gives me another pop-up where it says it recovered the file, and also gives me a link to the error log XML file. I click on it and open the .xml file using my default browser, and it shows the following details:
It looks like Excel is removing the data validations from one particular sheet, and when I navigate to that sheet in the UI, I can see that it has. To work around this unwanted, repeated removal, I wrote a macro that re-instates all these data validations as required. The real problem arises when this Excel file is opened on a different computer with Microsoft Office 365: there, Excel seems to remove not only data validations but also other components such as named ranges and buttons. It may be removing other things I am not yet aware of. So the macro I wrote to re-instate the data validations is no longer useful.
Why does this problem arise? Why do different versions of Excel behave differently? And how do I solve it? I appreciate your kind help. Thank you!
As Ron Rosenfeld and e_conomics rightly suggested, the issue was with the data validation lists, whose sources were strings of comma-separated values longer than 255 characters. Apparently, that is an Excel limitation.
When I replaced the sources of the data validation lists (strings of comma-separated values) with ranges containing the corresponding values, the problem went away. The repair dialog never appeared again.
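If the workbook is produced by code rather than edited by hand, the same rule can be enforced at write time. Here is a minimal sketch, with a hypothetical class name, that decides whether a list of choices can be written inline as a comma-separated source or should go into a worksheet range instead. The 255-character figure comes from the answer above; whether Excel counts exactly the joined string (and not, say, surrounding quotes) is an assumption worth checking against your Excel version.

```java
import java.util.List;

/** Hypothetical generator-side check for inline data-validation lists. */
class ValidationListCheck {

    // Limit reported in the answer above; verify against your Excel version.
    static final int MAX_INLINE_LIST_CHARS = 255;

    /** True if the items, joined with commas, fit within the assumed limit. */
    static boolean fitsInline(List<String> items) {
        return String.join(",", items).length() <= MAX_INLINE_LIST_CHARS;
    }
}
```

When `fitsInline` returns false, the fix described above applies: write the values to cells and point the validation at that range.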

How to know which formulae were recovered / corrupted in an Excel file

I have a program that generates an Excel file. Specifically, it's a Node app that generates a JSON file, which is loaded into GrapeCity's SpreadJS and exported again via their ExcelIO libraries. The file contains a lot of formulae: at least a thousand, in various forms, built according to various rules from an input data set that is itself non-trivial. These files load fine in SpreadJS and export in a form that opens in Excel and appears to work, but Excel reports a number of errors when I load them:
Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded.
Removed Records: Formula from /xl/worksheets/sheet1.xml part
Removed Records: Formula from /xl/worksheets/sheet2.xml part
Removed Records: Formula from /xl/worksheets/sheet3.xml part
After I initially posted this question, I eventually figured out that this was because the formulae in question used single quotes for text strings rather than double quotes. The question remains: without playing guessing games, how can I identify which formulae Excel has removed or fixed? Excel's so-called log file just repeats the same unhelpful references.
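Since the culprit turned out to be single quotes, the input formulae can be screened before export. In Excel formulas, text literals are delimited by double quotes, while single quotes delimit sheet (or workbook) names and are followed by "!", as in 'My Sheet'!A1. Below is a heuristic sketch under that assumption, with hypothetical names: it skips double-quoted literals (where "" is an escaped quote) and reports any single-quoted span that is not followed by "!". It does not understand structured table references, where a single quote acts as an escape character, so expect false positives there.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-export check for text literals written with single quotes. */
class QuoteHeuristic {

    /** Index just past the quoted run starting at start; a doubled quote is an escape. */
    private static int skipQuoted(String s, int start, char quote) {
        int i = start + 1;
        while (i < s.length()) {
            if (s.charAt(i) == quote) {
                if (i + 1 < s.length() && s.charAt(i + 1) == quote) {
                    i += 2; // escaped quote inside the run
                    continue;
                }
                return i + 1; // past the closing quote
            }
            i++;
        }
        return s.length(); // unterminated run
    }

    /** Single-quoted spans not followed by '!', i.e. probably not sheet names. */
    static List<String> suspiciousSingleQuoted(String formula) {
        List<String> hits = new ArrayList<>();
        int i = 0;
        while (i < formula.length()) {
            char c = formula.charAt(i);
            if (c == '"') {
                i = skipQuoted(formula, i, '"'); // text literal: ignore its contents
            } else if (c == '\'') {
                int end = skipQuoted(formula, i, '\'');
                boolean sheetName = end < formula.length() && formula.charAt(end) == '!';
                if (!sheetName) {
                    hits.add(formula.substring(i, end));
                }
                i = end;
            } else {
                i++;
            }
        }
        return hits;
    }
}
```

For a formula such as `=IF(A1='yes','y',"it's ok")`, it flags `'yes'` and `'y'` and ignores the apostrophe inside the double-quoted text.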
Any of the following would count as good answers:
A way to get Excel to tell me the string of the formula which it has a problem with
A way to get Excel to tell me the cell reference (e.g. F5) of the formula which it has a problem with
An external tool which would do the same
A library or tool for validating Excel formulae, which I could run on either the Excel file or the original input, and which would give me similar output. An npm library would be even better.
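For the cell-reference part, one option is to read the worksheet XML of the file as exported, before Excel repairs it. Here is a minimal sketch with hypothetical names, assuming the standard SpreadsheetML main namespace and that every c (cell) element carries its r attribute, as Excel-written files normally do. It maps each cell reference to the text of its f (formula) element. Cells that use a shared formula (f with t="shared") carry the formula text only on the first cell of the group, so the others come back empty.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: lists formula cells from one xl/worksheets/sheetN.xml part. */
class SheetFormulaDump {

    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    /** Cell reference (the r attribute) to formula text, in document order. */
    static Map<String, String> formulasByCell(String sheetXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> result = new LinkedHashMap<>();
        NodeList cells = doc.getElementsByTagNameNS(MAIN_NS, "c");
        for (int i = 0; i < cells.getLength(); i++) {
            Element cell = (Element) cells.item(i);
            // Only direct children of c can be its formula element.
            for (Node child = cell.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE && "f".equals(child.getLocalName())) {
                    result.put(cell.getAttribute("r"), child.getTextContent());
                }
            }
        }
        return result;
    }
}
```

The resulting (cell, formula) pairs can then be fed to whatever formula check you prefer. Note that formulas in the XML are stored without the leading "=".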
Thank you for using SpreadJS.
Can you please share the original SpreadJS file (ssjson) with our team? They can perform the export and figure out what might be causing this. In general, exported files should not produce file-open errors, even if a formula is incorrect.
GrapeCity Team
http://www.grapecity.com/spreadjs
https://github.com/LesterLyu/fast-formula-parser/ works for verifying formulae.

Excel telling me it can't access a file I don't want it to access

I'm working on a long project and there are many betas with overlapping functionality. I keep raising the number of the beta after making substantial changes.
This file imports a bunch of data from spreadsheets that are exported feeds from a SQL database. It does this without problems, and that part has been working for months: it imports some raw data sheets and uses both cell references and VBA to look at the data in different ways. I also set up a checklist page where the end user can go through certain records one by one and classify them.
All of this works fine, and I only made a few changes today, just finishing off small bits of code that take some data from one sheet and rearrange it on another for export. Everything was working without errors. I copied the file into a new folder (as I have been doing with these templates for months), imported my data (test data in this case), and tried to run the function that copies some information from one sheet onto another.
Then I get an error from Excel that says "we can't connect to 'https://...my.sharepoint../BETAV9_8_ItemAccountingTEMPLATEetcetc. Please make sure you're using correct web address."
After that I click OK and then I get a second message that says:
Microsoft Excel cannot access the file 'same file' There are several possible reasons:
The file name or path does not exist
The file is being used by another program
The workbook you are trying to save has the same name as a currently open workbook
The file Excel says it can't find is from about three betas ago, and I've been using later versions without getting this error. That file isn't in the current working folder, nor in the folder I originally copied from.
So I suspected that something in my file still references an external file, and I went through this process: https://support.microsoft.com/en-us/office/find-links-external-references-in-a-workbook-fcbf4576-3aab-4029-ba25-54313a532ff1?ui=en-us&rs=en-us&ad=us
I don't know where to begin. There is only one entry in the Name Manager (a table that I created with VBA on one of these pages so I can sort it a couple of different ways), and its Delete button is grayed out. There are several references in cells to other files, but not to the BETAv9_8 file it says it can't find. They are simply the paths/names of the files the user imported: not hidden anywhere in code, just listed on a sheet so the user can check which data she imported.
After going through all these recommended steps, I still can't find anything in my program that mentions BETAv9_8. Meanwhile, I'm getting an error for something that worked just fine a couple of hours ago.
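When the UI doesn't reveal a link, searching the package itself can. An .xlsx/.xlsm file is a ZIP archive of XML parts, and a stale reference could sit in a part the dialogs don't surface, for example an external-link relationship under xl/externalLinks/ or a hidden defined name in xl/workbook.xml. Here is a minimal sketch (JDK only, hypothetical names) that lists every part whose raw bytes contain a given string, case-insensitively. It decodes bytes as ISO-8859-1 so binary parts never cause decoding errors. Because VBA source inside vbaProject.bin is stored compressed, a miss there does not prove the code is clean.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: finds which parts of a workbook package mention a string. */
class PackageGrep {

    /** Names of parts whose raw bytes contain needle (case-insensitive, ASCII-oriented). */
    static List<String> partsContaining(InputStream workbook, String needle) throws IOException {
        String target = needle.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(workbook)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // ISO-8859-1 maps every byte to one char, so binary parts never throw.
                String text = new String(zip.readAllBytes(), StandardCharsets.ISO_8859_1);
                if (text.toLowerCase(Locale.ROOT).contains(target)) {
                    hits.add(entry.getName());
                }
            }
        }
        return hits;
    }
}
```

Run it on a copy of the workbook with the needle "BETAv9_8"; the part names it returns point to where the reference lives.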
Is this a bug? Is there something I can do to fix it?
Thanks in advance

Reset Default Workbook Open options

I use VBA in Excel to pull data from different sources (mostly .csv and .xls/.xlsx files) and paste them into my data tables (in the same Excel File I have a data table for each specific data source).
Each of those files comes with different settings. I have created a specific VBA macro for each of my data sources that processes, removes, and copies the relevant information from each individual file, and then I call all of these macros from another macro. The problem is that for one of the data sources, I had to set the Format parameter of the Workbooks.Open method to "Nothing" (Format:=5). This setting then affects the subsequent macros, so the following files are not processed correctly.
I know I have two options: either call this macro last, after I've processed all the other files, or set the Format parameter in each of my macros to match each file's configuration. However, there must be a way to simply reset the delimiter to the default from my Regional Settings. Does anyone know a solution?
Sorry if there's already a thread with this issue but I've tried looking for it and didn't find any.
Thank you in advance.

Excel will not update links, entire day of research and tests with random results

Short version: my Excel is set to update links automatically, and all my (locally stored) files have worked fine for years. Suddenly one will not update its linked data. When I click Connections > Edit connections > Check Status, every linked file shows "Warning! Values referring to other workbooks were not updated".
Refresh All / Calculate All does nothing. I've tried switching to manual calculation, doing this, and switching back to automatic; opening and closing the files; restarting; and using the same files on another PC. Nothing fixes it.
Clicking into an individual cell (F2) and back out does update that one cell, though.
I am 98% sure all the security settings are correct; in any case, whatever settings I had haven't changed, and it used to work.
I read a post that described exactly the same problem, but the solution there was to enable protected content. That's not the case here; I disabled it fully. Possibly some error is causing this.
Long version: this is the largest file I maintain, and I keep building on it.
I have a main Excel workbook that is linked to 35 workbooks. The source workbooks contain lists of 3-4 columns, ranging from 1,000 to 10,000 rows. The main workbook uses INDEX/MATCH against each source to pull two small fields. A full Calculate All takes about 5-6 minutes on a desktop with a 3.64 GHz Ivy Bridge i7 and 16 GB of RAM. I have never had issues with it. I'm on Windows 10 64-bit with Office 2016 64-bit.
Some source files are .xls, and I am in the process of converting them to .xlsx. When I open an .xls file, the values update; I then save it as .xlsx, also with a shorter name (to lighten the formulas). But when I go back to Connections > Edit connections > Check Status, the same warning is there. However, I can click Update Values and it reports OK.
This is a very important file, and it is not physically monitored; I only realize it wasn't updated when something sells wrong. I am hoping for a concrete fix rather than just changing random things and hoping. All help is greatly appreciated!
I actually resolved this by opening all the source workbooks while the main workbook was open and re-saving them one by one (the status of some turned OK, but most stayed the same). Then I used Edit Links > Update Values for each connection, which changed the status to OK. I saved the main workbook and had no more issues.
Yes, I need to take those suggestions and take the workbook to the next level, as it has clearly outgrown my skill set, but I'm getting there.
