I am using ColdFusion to export a fairly small number of rows (around 1,000) but a large number of columns (around 300) to Excel. It is a multi-sheet Excel file, with at least two of the sheets having the large number of columns. Using cfspreadsheet throws a Java heap error, and updating the JVM settings shows no improvement. What is the best way to export to Excel without causing the Java heap error?
Edit: I have tried a few ways to fix the issue within the program. I am using the XML Workbook format within cfsavecontent to build the multiple sheets and render the result using cfcontent. In this case, cfcontent might be using a large amount of memory, resulting in the heap space error.
<cfsavecontent variable="REQUEST.xmlData">
<cfoutput>
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook>
...other contents
</Workbook>
</cfoutput>
</cfsavecontent>
As a second workaround, I am using queryNew to build the contents and dump the final result into an Excel file using <cfspreadsheet action="write">. For subsequent sheets, I am using <cfspreadsheet action="update">. The ultimate goal is to serve the Excel file using <cflocation url="excelPath">, but in this case the cfspreadsheet update takes forever and throws an out-of-memory error.
If updating the JVM is not an option, what other approaches would you suggest to overcome the memory issues?
I'm a little late to the party...
From what I can tell, cfspreadsheet tries to materialize the entire file in memory before flushing to disk. With cfsavecontent you're doing this explicitly.
You're familiar with building Workbook XML, so all that your cfsavecontent approach needs is streaming to disk.
This can be accomplished by digging into the underlying Java libraries. java.io.FileWriter can append to a file without keeping the entire file in memory:
var append = true;
var writer = createObject("java", "java.io.FileWriter").init(filename, append);
try {
    writer.append('<?xml version="1.0"?>' & chr(10));
    writer.append('<?mso-application progid="Excel.Sheet"?>' & chr(10));
    writer.append('<Workbook>' & chr(10));
    // Build your workbook in chunks, e.g.:
    // for (var row in query)
    //     writer.append(markup);
    writer.append("</Workbook>");
} finally {
    writer.close();
}
From testing, I believe FileWriter flushes to disk regularly, so I've omitted explicit flush() calls, but I can't find any documentation stating that's the case. I never saw memory usage get very high, but YMMV.
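If you'd rather not rely on FileWriter's flushing behaviour, you can wrap it in a java.io.BufferedWriter and flush explicitly every few hundred rows. Here's a minimal plain-Java sketch of the same streaming idea (the file name, row loop, and flush interval are just placeholders; in ColdFusion you'd drive the loop from your query):
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class StreamedWorkbookWriter {
    public static void main(String[] args) throws IOException {
        String filename = "report.xml";   // placeholder output path
        boolean append = true;
        // BufferedWriter holds only a small in-memory buffer; flush() pushes it to disk
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(filename, append))) {
            writer.write("<?xml version=\"1.0\"?>\n");
            writer.write("<?mso-application progid=\"Excel.Sheet\"?>\n");
            writer.write("<Workbook>\n");
            for (int row = 0; row < 1000; row++) {     // stand-in for looping over the query
                writer.write("<Row>...</Row>\n");      // append each row's markup as it is built
                if (row % 100 == 0) {
                    writer.flush();                    // explicit flush, no reliance on FileWriter internals
                }
            }
            writer.write("</Workbook>\n");
        }
    }
}
Either way, only one chunk of markup is ever held in memory at a time, so the heap footprint stays flat no matter how many columns the sheets have.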
PROBLEM: I've hit a troubleshooting wall and am hoping for suggestions on what to check to get past an issue I'm having with an intranet site I'm working on. When reading data from a spreadsheet using NPOI (C#), sometimes (not all the time) the row reading stops after just ten rows.
Sorry for the very long post, but I'm not sure what is or isn't useful. The primary reason for posting here is that I don't know the right question to ask the Great Google Machine.
I have an intranet site where I'm reading an XLSX file and pushing its contents into an Oracle table. As you can tell by the subject line, I'm using NPOI. For the most part, it's just working, but only sometimes...
In Oracle, I have a staging table, which is truncated and is supposed to be filled with data from the spreadsheet.
In my app (ASPX), users upload their spreadsheet to the server (this just works), then the app calls a WebMethod that truncates data from the Oracle staging table (this just works), then another WebMethod is called that is supposed to read data from the spreadsheet and load the staging table (this, kinda works).
It's this "kinda works" piece is what I need help with.
The spreadsheet has 170 data rows. When I run the app in VS, it reads/writes all 170 records most of the time, but sometimes it reads just 10 records. When I run the app from the web server, it fails the first time (I haven't been able to catch a specific error); on the second and subsequent runs, it reads just ten records from the spreadsheet and successfully loads all ten. I've checked the file uploaded to the server and it does have 170 data records.
Whether the process reads 10 records or 170 records, there are no error messages and no indication why it stopped reading after just ten. (I'll mention here that the file today has 170 but tomorrow could have 180 or 162, so it's not fixed).
So, I've described what it's supposed to do and what it's actually doing. I think it's time for a code snippet.
/* snowSource below is the path/filename assembled separately */
/* SnowExcelFormat below is a class that basically maps row data with a specific data class */
IWorkbook workbook;
try
{
    using (FileStream file = new FileStream(snowSource, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        workbook = WorkbookFactory.Create(file);
    }
    var importer = new Mapper(workbook);
    var items = importer.Take<SnowExcelFormat>(0);
    /* at this point, items should have 170 rows but sometimes it contains only 10 with no indication why */
    /* I don't see anything in the workbook or importer objects that sheds any light on what's happening. */
Again, this works perfectly fine most of the time when running from VS. That tells me this is workable code. When running this on the web server, it fails the first time I try the process, but on subsequent runs it picks up only the first 10 records, ignoring the rest. Also, all the data that's read (10 or 170) is successfully inserted into the staging table, which tells me that Oracle is perfectly okay with the data, its format, and this process. All I need is to figure out why my code doesn't read all the data from Excel.
I have verified numerous times that the local DLL and webserver DLL are the same. And I'm reading the same Excel file.
I'm hitting a serious wall here and have run out of ideas on how to troubleshoot where the code is failing when it fails. I don't know if there's something limiting the memory available to the FileStream object, causing it to stop reading the file prematurely (I didn't run across anything that looked like a resource limiter). I don't know if there's something limiting the number of rows pulled by the importer.Take method. Any suggestions would be appreciated.
I faced the same issue on some files and after analyzing the problem, this is what worked for me.
importer.Take<SnowExcelFormat>(0) has three parameters, and one of them is maxErrorRows. Its default value is 10.
Parsing stops once more than 10 rows have errors, so the function stops reading.
What you have to do is set maxErrorRows explicitly instead of taking the default value of 10, for example importer.Take<SnowExcelFormat>(0, maxErrorRows: int.MaxValue).
I need to parse a big Excel spreadsheet (approximately 20 sheets) sheet by sheet with ColdFusion. The cfspreadsheet tag fails when processing a large amount of data with [java.lang.OutOfMemoryError: GC overhead limit exceeded]. Using the Apache POI user API directly behaves the same way:
<cfscript>
pkg = CreateObject("java", "org.apache.poi.openxml4j.opc.OPCPackage").open(JavaCast("string", fileName));
// error on next line
wb = CreateObject("java", "org.apache.poi.xssf.usermodel.XSSFWorkbook").Init(pkg);
</cfscript>
I tried to use the Apache POI event API instead of the user API, but ran into problems with Java inheritance. Has anyone used XSSF and SAX (the event API) to process big spreadsheets in ColdFusion?
In the end I succeeded with CF + the Apache POI event API + Mark Mandel's JavaLoader.cfc. Thank you @Leigh and @barnyr for all your help. I implemented the Excel parser in Java using XSSF and the SAX event API, and now it works, and works very fast. This wasn't easy because the template to parse wasn't simple, and as noted in the comments, increasing the heap size may be cheaper.
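For anyone who lands here later, a parser along these lines usually takes roughly the shape below in plain Java. This is only a rough sketch of what such a parser can look like, not the parser I actually used: the handler just prints each cell, and the class and constructor names follow the POI 4.x event API. The compiled class can then be loaded from ColdFusion via JavaLoader.cfc.
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class StreamingXlsxParser {

    // The handler receives one cell at a time; the workbook is never loaded whole.
    static class PrintingHandler implements SheetContentsHandler {
        public void startRow(int rowNum) { }
        public void endRow(int rowNum) { }
        public void cell(String cellReference, String formattedValue, XSSFComment comment) {
            System.out.println(cellReference + " = " + formattedValue);
        }
        public void headerFooter(String text, boolean isHeader, String tagName) { }
    }

    public static void main(String[] args) throws Exception {
        try (OPCPackage pkg = OPCPackage.open(args[0])) {
            XSSFReader xssfReader = new XSSFReader(pkg);
            ReadOnlySharedStringsTable sharedStrings = new ReadOnlySharedStringsTable(pkg);

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);

            // Iterate the sheets one by one, parsing each sheet's XML as a SAX stream
            XSSFReader.SheetIterator sheets = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
            while (sheets.hasNext()) {
                try (InputStream sheetStream = sheets.next()) {
                    XMLReader parser = factory.newSAXParser().getXMLReader();
                    parser.setContentHandler(new XSSFSheetXMLHandler(
                            xssfReader.getStylesTable(), sharedStrings,
                            new PrintingHandler(), new DataFormatter(), false));
                    parser.parse(new InputSource(sheetStream));
                }
            }
        }
    }
}
Because each sheet is streamed through SAX and cells are handed to the handler one at a time, memory use stays roughly constant no matter how big the workbook is.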
I'm processing a data set and running into a problem: although I use xlswrite to export all the relevant output variables to a big, timestamped Excel file, I don't save the code that actually generated that result. So if I try to recreate a certain set of results, I can't do it without relying on memory (which is obviously not a good plan). I'd like to know if there's a command (or commands) that will help me save the m-files used to generate the output Excel file, as well as the Excel file itself, in a folder I can name and timestamp so I don't have to do this manually.
In my perfect world I would run the master code file that calls 4 or 5 other function m-files, and then all those m-files would be saved along with the Excel output to a folder named results_YYYYMMDDTIME. Does this functionality exist? I can't seem to find it.
There's no such functionality built in.
You could build a dependency tree of your main function by using depfun with mfilename.
depfun(mfilename()) will return a list of all functions/m-files that are called by the currently executing m-file.
This will include all files that ship as MATLAB builtins; you might want to remove those (and only record the MATLAB version in your Excel sheet).
As a rough sketch:
% get all files the currently executing m-file depends on:
dependencies = depfun(mfilename());

for k = 1:numel(dependencies)
    % skip anything that ships with MATLAB (it lives under matlabroot)
    if isempty(strfind(dependencies{k}, matlabroot))
        copyfile(dependencies{k}, your_folder);
    end
end
As a "long term" solution you might want to check if using a version control system like subversion, mercurial (or one of many others) would be applicable in your case.
In larger projects this is preferred way to record the version of source code used to produce a certain result.
I have generated an Excel file from XML, but I cannot open it with Excel. Excel gives the following error when opening it:
Problems came up in the following areas during load:
Table
Then it shows a message saying that the log file corresponding to the error can be found at: C:/Documents and Setting/myUserName/Local Settings/Temporary Internet Files/Content.MSO/xxxxx.log
But I cannot find the Content.MSO folder in Windows. I checked the folder settings and made all folders visible, but I still cannot access this folder, so I cannot analyse the log file.
How can I find the generated log file?
I found the problem without analysing the log file. I still cannot access the log file in Temporary Internet Files, but I realised that I had put string (non-numeric) characters in a number-styled cell in the Excel XML. So if you are having similar issues with an Excel file generated from XML, check whether your cell values are appropriate for your cell data types.
If you type or paste the path of the log file into Explorer or your text editor of choice, you may find that the folder does exist, despite being invisible.
In my case it was a <Row> with an incorrect ss:Index.
I was using a template and the last row had a fixed Index=100. If the number of rows I added exceeded 100, this last row had a wrong index, and Excel threw the error without any other message or log (Mac OS X, Excel 15.25.1). I wish they printed more informative error messages; what a waste of our time.
Excel 2016. My error message was "Worksheet Settings". The path was pointing to a non-existent file.
In my case the cause of the problem was an ExpandedRowCount not big enough for the number of rows in the Worksheet. If you add rows to the XML directly (i.e. on a machine where Excel is not installed), make sure to increase ExpandedRowCount to match.
Yes, I too faced the same problem, and the problem was with the data type of the cells in the Excel file generated using XSLT.
In addition to checking the data being used against the assigned "Type", make sure that the characters that need to be encoded for XML are indeed encoded.
I had a system that appeared to be working, but then some user data including & and < was throwing this error.
If you're not sure what's going on with your file, try http://www.xmlvalidation.com/ - that helped me spot the issue in a large file immediately.
I used this function to fix it, modified from this post:
function xmlsafe($s) {
    return str_replace(array('&','>','<','"'), array('&amp;','&gt;','&lt;','&quot;'), $s);
}
and then run echo xmlsafe($myvalue) where you were just echoing $myvalue in your script.
This seems to be more appropriate for XML than htmlentities() or other options built into PHP.
I had the same issue, and the answer was that the type of the cell was Number and some values couldn't be converted to this type on my backend.
I had the SAME problem, and it's because the file is TOO BIG.
I tried an extract from SAP that was smaller than the one that caused the error and saved it as an XML file, and it WORKED, no more error.
So maybe if you can save two XML Excel files instead of one, it will be good ;)
ALicia
I have coded a UI import tool that will scan a bunch of folders, locate all XML files in them, load them to do a first basic validity check, then try to import them into the DB (which causes another, even bigger batch of validity checks to run).
When the basic checks or the import fails, the app shows a detailed error message to the user, so the user can open up the respective XML file and edit it.
BUT... the user CANNOT save the file, because the file "is in use by another process".
At that stage, all my importer objects are long gone, but I figured they might not be garbage collected yet, so they keep open handles to the XML files. So I tried a GC.Collect() after the checking/import process and then magically the user can edit and save the XML files.
All my code ever does with the XML files is this:
XmlReader reader = XmlReader.Create(m_xmlInputFile);
m_XmlDocument = new XmlDocument();
m_XmlDocument.Load(reader);
'reader' is a local variable, so it goes out of scope immediately; m_XmlDocument is a member variable that will live as long as the importer object is alive. The importer object is a local variable in another function, so everything should end up on death row after all is said and done. Still, it looks like waiting on death row might take a while...
Not that it matters much in my case, but just out of curiosity I would like to know if there is something I could do (apart from forcing a GC) to "free" the XML files on disk, so that the user can do his/her editing without surprises.
Thanks
XmlReader implements IDisposable, and you're not holding up your end of the contract.
Either call Dispose on it at an appropriate time, or (better) surround the code that uses it in a using block:
using(XmlReader reader = XmlReader.Create(m_xmlInputFile))
{
m_XmlDocument = new XmlDocument();
m_XmlDocument.Load(reader);
}
If you ever find yourself forcing a garbage collection, you're doing something wrong (to within 99.99% certainty).
Nothing magical happens when a reference goes out of scope - yes, the object it refers to will become eligible for garbage collection (if that was the last remaining reference to the object), but no extra code will run.
Whereas if the object holds resources and ought to be cleaned up as soon as possible, it will implement the disposable pattern.