I am able to write a data frame to an Excel file using the RODBC package.
Now I would like to include some formulas, e.g. =A1, that are interpreted as formulas when the Excel file is opened. Including "=A1" as text in the data frame results in a string entry "=A1" in the Excel file (the value shown in the formula bar is '=A1), which is not interpreted as a formula.
You need to write your file without quotes. When I constructed a tiny file named testcsv.csv with this as contents:
=B2, 2
... And then used the File/Open ... menu and used the dialog to open it after selecting All Files as the file type, the expected calculation occurs:
(This is on a Mac with Excel 2011. It works the same on Excel 2007 running in WinXP.)
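The same file layout can be sketched in Python rather than R. This is only an illustration of the unquoted layout described above; the helper name `rows_to_csv` is made up for the example.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialise rows as CSV text, leaving simple formulas such as '=B2' unquoted."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL only quotes fields that contain the delimiter, a quote
    # character, or a line break, so a plain '=B2' is written as-is.
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buffer.getvalue()

if __name__ == "__main__":
    # Writes the one-line file from the answer above.
    with open("testcsv.csv", "w", newline="") as handle:
        handle.write(rows_to_csv([["=B2", 2]]))
```

Note that a formula containing a comma would be quoted by this writer, so the sketch only covers simple references like the one in the example.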
You are better off using one of the CRAN packages that interface with xls files natively. I had good luck with xlsx; others have reported success with xlsReadWrite.
Another possible approach in windows would be to use the rcom package http://rcom.univie.ac.at/.
I have a program which generates an Excel file. Specifically, it's a Node app which generates a JSON file which is loaded into GrapeCity's SpreadJS and exported again via their ExcelIO libs. This file has a lot of formulae in it: at least a thousand of various forms, built according to various rules from an input data set which is itself non-trivial. Whilst these files load fine in SpreadJS and export in such a way that they load in Excel and appear to work, I get a number of errors from Excel when I try to load it:
Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded.
Removed Records: Formula from /xl/worksheets/sheet1.xml part
Removed Records: Formula from /xl/worksheets/sheet2.xml part
Removed Records: Formula from /xl/worksheets/sheet3.xml part
After I initially posted this question, I eventually figured out that this was because the formulae in question were using single quotes for text strings rather than double. The question is - without playing guessing games, how could I identify which formulae Excel has removed / fixed? Excel's so-called log file is just a repetition of the equally unhelpful references.
Any of the following would count as good answers:
A way to get Excel to tell me the string of the formula which it has a problem with
A way to get Excel to tell me the cell reference (e.g. F5) of the formula which it has a problem with
An external tool which would do the same
A library or tool for validating Excel formulae which I could run on either the Excel file or the original input which would give me similar output. If it was an npm lib that would be even better
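One way to narrow this down without guessing, sketched here in Python with only the standard library: let Excel repair the file, save the repaired copy, and compare the formulas in the two packages cell by cell. The function names are invented for this example, and it assumes the worksheet parts live under `xl/worksheets/` as in the error messages above. Excel may also rewrite formulas it did not remove, so some reported differences can be noise.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

def list_formulas(xlsx_path):
    """Return {(part name, cell reference): formula text} for every worksheet part."""
    formulas = {}
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            for cell in root.iterfind(".//m:c", NS):
                formula = cell.find("m:f", NS)
                if formula is not None and formula.text:
                    formulas[(name, cell.get("r"))] = formula.text
    return formulas

def removed_formulas(original_path, repaired_path):
    """Formulas in the original file that are missing or changed in the repaired one."""
    before = list_formulas(original_path)
    after = list_formulas(repaired_path)
    return {key: text for key, text in before.items() if after.get(key) != text}
```

Calling `removed_formulas("generated.xlsx", "repaired.xlsx")` (both file names are placeholders) would give the sheet part, the cell reference, and the formula string for each candidate.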
Thank you for using SpreadJS.
Can you please share the original SpreadJS file (ssjson) with our team? They can perform the export and figure out what might be causing this. In general, exported files should not produce file open errors, even if the formula is incorrect.
Grapecity Team
http://www.grapecity.com/spreadjs
https://github.com/LesterLyu/fast-formula-parser/ works for verifying formulae.
TL;DR: Excel Workbook generated by Docx4J always says corrupted but I can't determine what Excel doesn't like about the underlying XML, let alone how to fix it.
My use case is as follows:
I am trying to produce an excel workbook with charts and graphs automatically on a regular basis. Only the raw data will change but everything else will dynamically update as the raw data is changed.
So I built an Excel workbook which has a number of charts and graphs being generated by a sheet of raw data. I am using it as a template. All values of the raw data are numeric. The intent was to use Docx4J to read this 'template' and to populate the raw data sheet, then save it as a new file, whereupon opening it will initiate the recalculation and the charts and graphs will update. Since I am new to Docx4j, I decided to take baby steps by first seeing if I could open and read the contents of the cells, which I could. So far so good. I could also change the values of the cells, but I could only verify this programmatically by writing out to the console the location and value before a change, then the location and value after the change (ex. A1=45 followed by A1=55).
My problem starts when I try to open the resulting file. It generates, looks to be about the right size but Excel claims it is corrupted. It does try to recover what it can, but ultimately fails and the workbook won't even open. For troubleshooting, I opened up the generated xlsx and confirmed all the various XML files that make up an xlsx file were present and readable so I am concluding either something is missing or some part of the XML coming out the other side is not what Excel wants. Further troubleshooting involved creating an empty workbook (no data, 1 sheet) as my 'template', opening it and then saving it back to the file system with a different name and simply trying to see if I could open it in Excel but no dice. This has me ruling out anything to do with my attempts to write or add data to the sheet.
Relevant Environment Information:
'template' workbook is being generated on a Windows 10 64bit machine
My docx4j code is executing on a Debian 10 Linux machine running OpenJDK 11.0.4
My version of Excel both to create the 'template' and open the copy is Excel for Office365
I am running Docx4J v11.1.3 but I also tried with v8.1.5 (in both cases I had to use the Reference Implementation of JAXB to get around a marshalling error when trying to save)
I did see another post on Stackoverflow here about an issue related to fonts in Linux environments so I made sure to install the MS TT Corefonts but it didn't help my problem.
I ran the entire unzipped directory through BeyondCompare and there are some differences but I don't know which are just artifacts of the two different OS' or even which differences matter. Mostly they are:
small differences in file size
boolean values showing as "1", "yes", or "true" but not the same way for both files
namespaces and attributes in one file but not the other
Sheet1 from my blank workbook, before and after
All ideas are welcome.
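To reduce the noise in a part-by-part comparison like the one described above, the XML parts can be canonicalised before comparing. The following is a rough Python sketch (standard library only, Python 3.8 or later for `canonicalize`); the function names are made up for illustration. Canonicalisation evens out attribute order, namespace prefixes and insignificant whitespace, but it does not treat "1" and "true" as equal, so those differences would still be reported.

```python
import zipfile
import xml.etree.ElementTree as ET

def canonical_parts(xlsx_path):
    """Map each XML part in the package to its canonical (C14N 2.0) text."""
    parts = {}
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            if name.endswith((".xml", ".rels")):
                parts[name] = ET.canonicalize(
                    archive.read(name),
                    rewrite_prefixes=True,  # ignore differing namespace prefixes
                    strip_text=True,        # ignore leading/trailing whitespace in text
                )
    return parts

def differing_parts(path_a, path_b):
    """Names of parts that exist in only one package or differ after canonicalisation."""
    a = canonical_parts(path_a)
    b = canonical_parts(path_b)
    return sorted(name for name in set(a) | set(b) if a.get(name) != b.get(name))
```

Running `differing_parts` on the template and the copy written by docx4j would list only the parts that still differ once the cosmetic differences are removed.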
Please try the just-released docx4j 8.1.6, which fixes handling of xlsx files created by recent releases of Excel. This was https://github.com/plutext/docx4j/issues/389
I am writing some VBA macros in Excel 2010.
Source code is managed with TortoiseHG.
However, I cannot find a smooth way of comparing two different commits, because TortoiseHG only shows the complete Excel file and not the different VBA files I'm writing my code in.
If anyone has done this in a nice and smooth way, I'd be very happy to get some ideas.
Also using an external tool like BeyondCompare is possible.
Thank you!
For the VBA Code:
For code comparison purposes I'd suggest exporting the VBA module(s) and checking in the resulting .bas (regular module) and .cls (class module) files next to the .xlsm.
Those exported VBA modules actually give you plain text versions of the code. Code in Worksheet / Workbook modules can also be exported as .cls files. This will allow TortoiseHG to compare the actual code instead of the Excel file itself.
For comparing the Excel file itself:
For comparing the actual XLSM, you can consider saving the file, changing the extension to .zip and extracting the zip's contents.
An Excel file for version 2007 and up is actually an archive containing XMLs that define the whole workbook and a .bin file for the VBA Project.
You might want to pull said xmls through XMLLint or a similar tool, since they're not pretty printed by default.
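As a rough sketch of the unzip-and-pretty-print step, here is a small Python script using only the standard library (instead of XMLLint). The function name and output layout are assumptions for the example, and it expects a trusted workbook since archive member names are used as paths unchanged.

```python
import zipfile
from pathlib import Path
from xml.dom import minidom

def extract_pretty(xlsm_path, out_dir):
    """Extract a workbook package, pretty-printing its XML parts for diffing."""
    out_dir = Path(out_dir)
    written = []
    with zipfile.ZipFile(xlsm_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to write
            data = archive.read(name)
            if name.endswith((".xml", ".rels")):
                # Re-indent the XML so a line-based diff becomes readable.
                data = minidom.parseString(data).toprettyxml(indent="  ", encoding="utf-8")
            # Binary parts such as the VBA project .bin file are copied as-is.
            target = out_dir / name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            written.append(name)
    return written
```

The extracted directory can then be committed or compared with any text diff tool; the VBA project itself stays binary, which is why exporting the modules is still needed for the code.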
Side note:
You're asking for a nice and smooth way - I'd say there is none within Excel itself. You can use VBA to do these Module Exports on Workbook_BeforeClose() event, but you'll run into some security issues - By default you're not allowed to access the VBProject from within.
Of course there are some 3rd party tools for comparing VBA code without having to extract the modules, however "what to use" will be severely opinion based - BeyondCompare is indeed one option.
Hope this helps.
Beyond Compare has an add-on file format to compare VBA code in Excel files. Download the Microsoft Excel Workbooks VBA file format from the Additional File Formats for BC4 page.
You can export each module, and compare using Windows native fc command.
fc /N Module1 Module2 > Result.txt
Help:
fc /?
Compares two files or sets of files and displays the differences between
them
FC [/A] [/C] [/L] [/LBn] [/N] [/OFF[LINE]] [/T] [/U] [/W] [/nnnn]
[drive1:][path1]filename1 [drive2:][path2]filename2
FC /B [drive1:][path1]filename1 [drive2:][path2]filename2
/A Displays only first and last lines for each set of differences.
/B Performs a binary comparison.
/C Disregards the case of letters.
/L Compares files as ASCII text.
/LBn Sets the maximum consecutive mismatches to the specified
number of lines.
/N Displays the line numbers on an ASCII comparison.
/OFF[LINE] Do not skip files with offline attribute set.
/T Does not expand tabs to spaces.
/U Compare files as UNICODE text files.
/W Compresses white space (tabs and spaces) for comparison.
/nnnn Specifies the number of consecutive lines that must match
after a mismatch.
[drive1:][path1]filename1
Specifies the first file or set of files to compare.
[drive2:][path2]filename2
Specifies the second file or set of files to compare.
https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/fc
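Where fc is not available (for example outside Windows), a comparable line-by-line comparison of two exported modules could be sketched with Python's difflib. The function name and the default file names below are placeholders, not anything produced by Excel.

```python
import difflib

def diff_modules(old_text, new_text, old_name="Module1.bas", new_name="Module2.bas"):
    """Return a unified diff of two exported VBA modules as a single string."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # input lines carry no newline, so add none to headers
    )
    return "\n".join(diff)

if __name__ == "__main__":
    old = "Sub A()\n  x = 1\nEnd Sub"
    new = "Sub A()\n  x = 2\nEnd Sub"
    print(diff_modules(old, new))
```

An empty result means the two modules are identical line for line.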
When creating an Excel file using the example CreateSimpleSpreadsheet.java,
the created spreadsheet shows the sample data correctly in Excel on MacOS (Ver 15.30 - 170107),
but when opened on MacOS using Numbers (ver 3.6.2 - 2577) the sheet is empty.
However, when I open the file in Excel under MacOS, save the sheet as a new file and open that new file in Numbers, the sheet is displayed correctly.
What is the cause of this?
I need the created xlsx file to open correctly in Numbers under MacOS as well.
Saving the XLSX in Excel 2010 adds a reference attribute to each cell, e.g. r="A1". Try adding that to the first cell, and B1 to the second.
Excel also makes other changes which don't matter, including r="1" on the row, and using shared strings (as opposed to inline strings). An easy way to see them is to use the Open XML SDK 2.0 Productivity Tool's compare files feature. But that's for Windows...
UPDATE NEXT MORNING
Adding the reference attribute to each cell, e.g. r="A1", is all that's required for iCloud Numbers (as of time of writing) to display cell contents.
https://github.com/plutext/docx4j/commit/05c368caca7dd404f8d39f83346a7980b6cfdcf1 adds those.
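For files that were already generated without these attributes, the idea can be illustrated with a small post-processing sketch in Python. This is not what the docx4j commit does; it is a standalone illustration with invented function names. It assumes every row lists its cells from column A with no gaps, and that the sheet is simple enough to survive re-serialisation by ElementTree, which may rename or drop namespace prefixes.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

def column_letters(index):
    """Convert a 1-based column index to letters: 1 -> A, 26 -> Z, 27 -> AA."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def add_cell_references(sheet_xml):
    """Add missing r attributes to rows and cells of a worksheet part."""
    ET.register_namespace("", NS)  # keep the main namespace as the default one
    root = ET.fromstring(sheet_xml)
    row_number = 0
    for row in root.iter(f"{{{NS}}}row"):
        # Keep an existing row number, otherwise continue counting.
        row_number = int(row.get("r", row_number + 1))
        row.set("r", str(row_number))
        # Assumption: cells are dense and ordered, so position equals column.
        for column, cell in enumerate(row.findall(f"{{{NS}}}c"), start=1):
            if cell.get("r") is None:
                cell.set("r", f"{column_letters(column)}{row_number}")
    return ET.tostring(root, encoding="unicode")
```

The returned XML would replace the original `sheetN.xml` part when the package is zipped up again.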
In a perl script I am writing, I am trying to find a way to open an existing excel spreadsheet, change the name of the first worksheet, and save it. It would seem like a simple task but I haven't found a simple way to do it.
Spreadsheet::WriteExcel can easily change worksheet name, but it seems like it can't read in an existing excel file.
Another constraint is that the perl module I use shouldn't need installation. I can work around this if there's no good option, but it would make things more complicated.
Edit: I am using ActivePerl 5.18, so modules included in this are ideal.
The only way of doing this while preserving everything else in the Excel file is to use Win32::OLE.
That requires having Excel installed on the computer on which the program will be run, and, of course, only works on Windows.
If you can't do that, you will have to read the Excel file, and write out the contents to another file, changing the name of the worksheet in the process. Depending on exactly what you have in the source Excel file, this can get rather involved rather fast.
See also "How can I merge two Excel (xls) files in Perl or batch?" and "In Perl, how can I copy a subset of columns from an XLSX work sheet to another?"