I have a Node project where I need to create a new XLSM file using an existing XLSM file as a template. The template contains a great deal of styling, images, and VBA. The application simply inserts values into a few cells and saves the new file under a distinct name.
I have tried the XLSX and ExcelJS npm packages to accomplish this. Both fail in different ways:
ExcelJS: Doesn't seem to support writing XLSM files at this time. Attempting to do so results in a corrupt file.
XLSX: I've not been able to create a true copy of the template. It's missing formatting and VBA, among other things. This is the very basic code I'm starting with:
const XLSX = require("xlsx");

const templatePath = "C:/Users/rapsputinforever/Desktop/template.xlsm";
const directory = "C:/Users/rapsputinforever/Desktop";
const workbook = XLSX.readFile(templatePath);
// will insert data to some cells here
XLSX.writeFile(workbook, directory + '/copy.xlsm');
I know this package has a variety of options, but none of them seem related to the issue I'm having or helpful toward accomplishing what seems, on the surface, a very simple task:
Read Template
Add Values to Cells
Write as New File, keeping all VBA, styling, etc.
I'm willing to look into other packages, libraries, even other technologies. This tool is part of the back-end of a React app, but I'm not sure React can accomplish this. I'm open to any advice. I appreciate the help!
I solved the problem by breaking it down into its constituent parts. Excel files are zipped packages of XML files (source here), so it's a matter of working through the following steps:
Create a duplicate of the template XLSM file, giving the copy a ".zip" extension ('fs')
Un-package the zip file ('extract-zip')
Read sharedStrings.xml ('fs')
Count the number of occurrences of the XML tag "<si>" in sharedStrings
Example:
let stringCount = (sharedStrings.match(/<si>/g) || []).length;
Read sheet1.xml (or whichever sheet the data is to be inserted into)
Locate the cell by finding the tag for it. If empty the tag will resemble this:
Example:
<c r="D10" s="29"/>
Note, r = cell address, s = style tag, you want to maintain both in the next step...
Replace the empty cell tag with one carrying the shared string flag (t="s") and the index of the new string to be inserted.
Example:
<c r="D10" s="29" t="s"><v>${stringCount}</v></c>
To summarize so far: the new string is appended to sharedStrings.xml as a new <si> entry, and the empty cell tag is replaced with one that references that string by its index, which equals the count of <si> entries before the insertion. This can be repeated in a loop to insert an array of values.
Save both sharedStrings.xml and sheet1.xml ('fs')
Package the un-zipped folder into a new archive ('archiver')
Rename the archive so it has the '.xlsm' extension ('fs')
Cleanup, kill any duplicate zip files/folders
If the shared string index is accurate, the style ID is maintained, and the asynchronous steps run in the correct order, the resulting file should be what you want: the VBA, styles, queries, images, etc. are all preserved because you made a true copy of the template and only manipulated that copy's constituent parts.
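The two string edits at the heart of these steps can be sketched as below. This is a minimal sketch, not my production code: the helper names (escapeXml, addSharedString, setSharedStringCell) are made up for illustration, the XML is treated as plain strings, and it assumes sharedStrings.xml ends with a closing </sst> tag and that the target cell is still an empty self-closing tag. It does not update the count/uniqueCount attributes on <sst>.

```javascript
// Sketch only: hypothetical helper names, XML handled as plain strings.

// Escape characters that may not appear verbatim in XML text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Append a new <si> entry to sharedStrings.xml.
// Returns the updated XML and the zero-based index of the new string.
function addSharedString(sharedStringsXml, value) {
  const index = (sharedStringsXml.match(/<si>/g) || []).length;
  const entry = `<si><t xml:space="preserve">${escapeXml(value)}</t></si>`;
  // A replacer function avoids "$" in the value being treated as a pattern.
  const xml = sharedStringsXml.replace("</sst>", () => `${entry}</sst>`);
  return { xml, index };
}

// Replace an empty cell tag such as <c r="D10" s="29"/> with a
// shared-string cell pointing at `index`, keeping the style attribute.
function setSharedStringCell(sheetXml, address, index) {
  const emptyCell = new RegExp(`<c r="${address}"( s="\\d+")?\\s*/>`);
  if (!emptyCell.test(sheetXml)) {
    throw new Error(`Empty cell tag for ${address} not found`);
  }
  return sheetXml.replace(emptyCell, (_, style = "") =>
    `<c r="${address}"${style} t="s"><v>${index}</v></c>`);
}
```

Call addSharedString first, then pass the returned index to setSharedStringCell, and write both strings back with 'fs' before re-zipping.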
I am a novice developer, so no doubt my approach is overly lengthy and inefficient, and my understanding of why this works when the other libraries don't rests solely on intuition. I believe the issue is that ExcelJS and other npm packages build the workbook "in the buffer", and that in-memory copy only contains the elements of the original file that the library accounts for, based on XLSX files. If ExcelJS isn't looking at VBA, the new workbook won't have VBA, because only what was within the library's scope got duplicated.
I am still very much open to more suggestions/alternatives/approaches/wisdom. I hope to fine-tune this further to be scalable: I only was able to design this for my very specific application. If I do manage to generalize this and clean up my code I will share the snippet here.
Thanks,
EDIT: Hello! As it turns out, things are not so simple! Although the file opens without errors, the duplicate still has some background issues, which become evident if you use Power Query to parse through these XLSM files. Additionally, any formula that references the cells filled by the Node solution will not be updated when the file is opened. To solve this, run this VBA:
Application.CalculateFullRebuild
This will update all formulae on every sheet. Once the file is saved, the workbook should be "normal" again. The underlying issue lies in the XML file calcChain.xml.
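An alternative I have not fully tested is to flag the workbook for a full recalculation while the package is still unzipped, by setting the fullCalcOnLoad attribute on the <calcPr> element in xl/workbook.xml. The sketch below is only that: forceFullCalcOnLoad is a made-up helper name, it treats the XML as a plain string, and it assumes workbook.xml already contains a self-closing <calcPr .../> element. I can't say whether it also clears up the Power Query side of things.

```javascript
// Sketch only: hypothetical helper, workbook.xml handled as a plain string.
// Sets fullCalcOnLoad="1" on the existing <calcPr/> element so that Excel
// is asked to recalculate every formula when the file is opened.
function forceFullCalcOnLoad(workbookXml) {
  const calcPr = /<calcPr\b([^>]*?)\s*\/>/;
  const match = workbookXml.match(calcPr);
  if (!match) {
    // Assumption: we do not try to create the element, because its position
    // inside <workbook> is fixed by the schema.
    throw new Error("No self-closing <calcPr/> element found in workbook.xml");
  }
  // Keep the existing attributes, dropping any previous fullCalcOnLoad value.
  const attrs = match[1].replace(/\s*fullCalcOnLoad="[^"]*"/, "");
  return workbookXml.replace(calcPr, () => `<calcPr${attrs} fullCalcOnLoad="1"/>`);
}
```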
Related
TL;DR: Excel Workbook generated by Docx4J always says corrupted but I can't determine what Excel doesn't like about the underlying XML, let alone how to fix it.
My use case is as follows:
I am trying to produce an excel workbook with charts and graphs automatically on a regular basis. Only the raw data will change but everything else will dynamically update as the raw data is changed.
So I built an Excel workbook which has a number of charts and graphs generated from a sheet of raw data. I am using it as a template. All values of the raw data are numeric. The intent was to use Docx4J to read this 'template', populate the raw data sheet, then save it as a new file, whereupon opening it will trigger the recalculation and the charts and graphs will update. Since I am new to Docx4j, I decided to take baby steps by first seeing if I could open and read the contents of the cells, which I could. So far so good. I could also change the values of the cells, but I could only verify this programmatically by writing out to the console the location and value before a change, then the location and value after the change (e.g. A1=45 followed by A1=55).
My problem starts when I try to open the resulting file. It generates, looks to be about the right size but Excel claims it is corrupted. It does try to recover what it can, but ultimately fails and the workbook won't even open. For troubleshooting, I opened up the generated xlsx and confirmed all the various XML files that make up an xlsx file were present and readable so I am concluding either something is missing or some part of the XML coming out the other side is not what Excel wants. Further troubleshooting involved creating an empty workbook (no data, 1 sheet) as my 'template', opening it and then saving it back to the file system with a different name and simply trying to see if I could open it in Excel but no dice. This has me ruling out anything to do with my attempts to write or add data to the sheet.
Relevant Environment Information:
'template' workbook is being generated on a Windows 10 64bit machine
My docx4j code is executing on a Debian 10 Linux machine running OpenJDK 11.0.4
My version of Excel both to create the 'template' and open the copy is Excel for Office365
I am running Docx4J v11.1.3 but I also tried with v8.1.5 (in both cases I had to use the Reference Implementation of JAXB to get around a marshalling error when trying to save)
I did see another post on Stackoverflow here about an issue related to fonts in Linux environments so I made sure to install the MS TT Corefonts but it didn't help my problem.
I ran the entire unzipped directory through BeyondCompare and there are some differences, but I don't know which are just artifacts of the two different OSes or which differences actually matter. Mostly they are:
small differences in file size
boolean values showing as "1", "yes", or "true" but not the same way for both files
namespaces and attributes in one file but not the other
Sheet1 from my blank workbook, before and after
All ideas are welcome.
Please try the just-released docx4j 8.1.6, which fixes handling of xlsx files created by recent releases of Excel. This was https://github.com/plutext/docx4j/issues/389
I have a folder of 10 CSV files created from Excel. Is there any method to copy the data from all these files into one Excel workbook?
I'm not good with VBA, so I thought I'd ask you guys.
At first sight, I would go for the following approach (coming up with the code is up to you, Google is your friend):
Get a list of all file names within that directory
Iterate over every item from the above list and open the file with Workbooks.Open(...)
Copy the whole content and paste it to the additional Excel you want to manage to hold the data of all files
Repeat the steps for each file
Remember to record the last used row after every paste, so that you can continue appending the data to the additional Excel workbook instead of replacing its content.
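If you are not tied to VBA, the same steps (list the files, read each one, append below what is already there) can be sketched in Node, staying with plain CSV. This is only a rough sketch under a few assumptions: mergeCsvFiles is a made-up name, every file is assumed to start with the same header row (kept once, from the first file), and no field contains an embedded line break.

```javascript
const fs = require("fs");
const path = require("path");

// Sketch only: merge every .csv file in `dir` into a single CSV string.
// Assumes all files share one header row and have no multi-line fields.
function mergeCsvFiles(dir) {
  const files = fs.readdirSync(dir)
    .filter((name) => name.toLowerCase().endsWith(".csv"))
    .sort();
  const merged = [];
  files.forEach((name, i) => {
    const lines = fs.readFileSync(path.join(dir, name), "utf8")
      .split(/\r?\n/)
      .filter((line) => line.length > 0);
    // Keep the header from the first file only, then append the data rows.
    merged.push(...(i === 0 ? lines : lines.slice(1)));
  });
  return merged.join("\n") + "\n";
}
```

Write the returned string to a file outside the source folder, so the merged file is not picked up on the next run; Excel can then open it directly.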
I have about 10000 Excel files, each of which has a picture in a specific cell. I need a script that reads all the files and saves each picture to a folder under the same name as its Excel file.
Could you please help with that?
Thanks.
This method is based on a number of assumptions:
All the files (10000) are located in a known folder,
All files are named according to a paradigm that can be reproduced programmatically (if not, you can get the list of files within the folder, store the list within an array, and loop through the array),
Pictures are always within the same worksheet or, if in more than one, the names of the worksheets can be reproduced programmatically,
The filenames used to save the pictures can match (at least as a seed) the name of the Excel file each picture is extracted from,
You will manage to write some basic VBA.
Note that for the VBA you have at least two options:
Write it within an EXCEL that will only serve as the extraction engine, or
Write it as a stand-alone file and run it via DOS commands.
The VBA logic:
Create the outer loop that processes a single file,
Within the outer loop, generate the name of a file to be open,
Open the file using Workbooks.Open VBA function,
Select the worksheet and the cell containing the picture,
Use the Workbook.SaveAs to save the picture (you will need to specify the type of file to be used, e.g. .bmp).
As a simple and very efficient tool to get the code (at least) partially generated by Excel, you can RECORD a MACRO for each action and then stop recording. You will see the code generated (you will need to access the VBA mode). You can copy-paste the generated code into your development (you might need to do some simple adaptations though).
That's it.
Hope you manage. Good luck!!
My spreadsheet has external links that I cannot find. It pops up the "workbook contains links to other data sources" warning upon opening. I don't want to just suppress the link warning, I need to remove the links.
I've tried all the basic ways to find external links that I'm aware of, and it's still happening. I've tried:
Searching for "[" in formulas in the entire workbook
Charts
Checking the named ranges from the Formulas/Name Manager menu
Checking objects
Conditional formatting menus
Is there another way to find external links? Thanks.
It can come from several sources. In my case, it came from the formula of a conditional formatting rule, and no Excel search tool could find it.
In the case of an xlsx file, you can find it with a systematic approach:
In the Data tab, click on Edit links. All your links should be displayed. Mark down the values of the Location fields.
Unzip the xlsx file. Technically, an xlsx file is a zip container. See this post for more information.
Now search the whole directory for the Location strings.
Figure out to what your links are related. In my case, it was inside a x14:conditionalFormatting xml node. No wonder the Search tool did not work, it was not in a cell.
Modify that formula
Game over.
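The "search the whole directory" step can be done with any grep-like tool. As a rough Node sketch (findFilesContaining is just a name I picked, and it reads every file as UTF-8 text, which is fine for the XML parts but meaningless for images):

```javascript
const fs = require("fs");
const path = require("path");

// Sketch only: recursively list the files under `dir` whose text
// contains `needle` (for example the linked workbook's file name).
function findFilesContaining(dir, needle) {
  const hits = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      hits.push(...findFilesContaining(fullPath, needle));
    } else if (fs.readFileSync(fullPath, "utf8").includes(needle)) {
      hits.push(fullPath);
    }
  }
  return hits;
}
```

The path may be stored escaped or URL-encoded inside the XML, so searching for just the linked workbook's file name may work better than searching for the full Location string.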
I would check the names collection in your workbook.
If you have a named range, for example, that links to another workbook, it will trigger this warning as well. You can examine the names in Excel's Name Manager, or with some VBA code executed in the debug (Immediate) window, like:
For i = 1 To Names.Count: Debug.Print Names(i): Next
I had an Excel 2013 file that whenever opened displayed a message regarding a missing external link. I could not find such a link (and location in file) using many suggestions and tools (Kutools, FormulaDEsk etc.).
Finally, I changed the file extension from xlsx to zip, opened the archive, searched for and deleted the folder relating to external links, and changed the extension back. Problem solved!
I'm currently writing a conversion function that takes data and creates an .xls file where part of the data becomes the sheet names.
My problem is, xlswrite automatically creates 3 default sheets with default names when it creates a new Excel file. Of course, these usually don't match the names in my data, so after my conversion is done, my Excel file looks almost fine, it simply has 3 leading sheets which are not supposed to be there.
Is there a way, without using ActiveX, to either stop xlswrite from creating those sheets in the first place, or delete them afterwards?
I just found out xlswrite actually uses ActiveX internally, so the answer is
No, there is no way.
Just use ActiveX.
I copy a template Excel file containing a single named sheet from the program directory to the current directory, and then write to that copy.
Use
fileparts(mfilename('fullpath'))
to get the path to the program file.