I am writing an Excel file using Apache POI.
I want to write into it all the data from myResultSet,
whose field names (columns) are stored in the String[] fieldNames.
I have 70000 rows and 27 columns.
My code:
String xlsFilename = "myXLSX.xlsx";
org.apache.poi.ss.usermodel.Workbook myWorkbook = new XSSFWorkbook();
org.apache.poi.ss.usermodel.Sheet mySheet = myWorkbook.createSheet("myXLSX");
Row currentRow = mySheet.createRow(0);
for (int k = 0; k < fieldNames.length; k++) {
    // Add Cells Of Title Of ResultsTable In Excel File
    currentRow.createCell(k).setCellValue(fieldNames[k]);
}
for (int j = 0; j < countOfResultSetRows; j++) {
    myResultSet.next();
    currentRow = mySheet.createRow(j + 1);
    for (int k = 0; k < fieldNames.length; k++) {
        currentRow.createCell(k).setCellValue(myResultSet.getString(fieldNames[k]));
        System.out.println("Processing Row " + j);
    }
}
FileOutputStream myFileOutputStream = new FileOutputStream(xlsFilename);
myWorkbook.write(myFileOutputStream);
myFileOutputStream.close();
My problem is that while writing the rows the program is getting slower and slower.
When it reaches row 3500 it stops with the Exception:
Exception in thread "Thread-3" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
at java.lang.StringBuffer.<init>(StringBuffer.java:79)
It seems I'm out of memory.
How can I solve this?
Is there a way to store my data to a temporary file every 1000 rows (for example)?
What would you suggest?
I had the same problem using jxl and never solved it either (JAVA - Out Of Memory Error while writing Excel Cells in jxl).
Now I need xlsx files anyway, so I have to use POI.
There seems to be an approach that first creates the data file in XML format and then substitutes that XML into an existing template xlsx file.
http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/usermodel/examples/BigGridDemo.java
This is not applicable to xls format files, though.
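To give a rough idea of the first half of that approach, here is a minimal sketch that streams rows straight to a Writer as SpreadsheetML sheet XML, so no cell objects accumulate on the heap. It uses only the Java standard library; SheetXmlWriter and its method names are hypothetical helpers made up for this illustration, not the code from BigGridDemo, and it assumes every value is a non-null string written as an inline string.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper (not from POI): writes worksheet XML one row at a time.
class SheetXmlWriter {
    private final Writer out;
    private int rowIndex = 0; // 1-based row number of the last row written

    SheetXmlWriter(Writer out) {
        this.out = out;
    }

    // Opens the worksheet and sheetData elements.
    void begin() throws IOException {
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.write("<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"><sheetData>");
    }

    // Writes one row immediately; nothing about the row is kept afterwards.
    void writeRow(String... values) throws IOException {
        rowIndex++;
        out.write("<row r=\"" + rowIndex + "\">");
        for (int col = 0; col < values.length; col++) {
            out.write("<c r=\"" + columnName(col) + rowIndex + "\" t=\"inlineStr\"><is><t>");
            out.write(escape(values[col]));
            out.write("</t></is></c>");
        }
        out.write("</row>");
    }

    // Closes the elements opened by begin() and flushes the writer.
    void end() throws IOException {
        out.write("</sheetData></worksheet>");
        out.flush();
    }

    // Converts a 0-based column index to a spreadsheet column name (0 is A, 26 is AA).
    static String columnName(int index) {
        StringBuilder name = new StringBuilder();
        for (int n = index; n >= 0; n = n / 26 - 1) {
            name.insert(0, (char) ('A' + n % 26));
        }
        return name.toString();
    }

    // Escapes the characters that are not allowed in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        StringWriter buffer = new StringWriter();
        SheetXmlWriter writer = new SheetXmlWriter(buffer);
        writer.begin();
        writer.writeRow("id", "name");
        writer.writeRow("1", "Smith & Sons");
        writer.end();
        System.out.println(buffer);
    }
}
```

In real use the Writer would wrap a temporary file rather than a StringWriter, and the resulting XML would then replace the sheet entry inside the template xlsx (which is a zip archive), as the linked demo does.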
How about allowing your app to use more memory (like -Xmx500m for 500 MB)?
Assign more memory to the heap when running your program:
$ java -Xms256m -Xmx1024m NameOfYourClass
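If you are not sure the flag actually reached the JVM (easy to get wrong behind an IDE or a launcher script), a quick check is to print the heap ceiling from inside the program. This is a small sketch using only Runtime from the standard library; HeapCheck is a made-up class name for illustration.

```java
// Hypothetical helper: reports the heap ceiling the running JVM actually received.
class HeapCheck {
    // Maximum amount of memory the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it with the same options as the real program, for example java -Xmx1024m HeapCheck, and the printed value should be close to the -Xmx setting (the JVM may report slightly less).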
I've been there more than once.
Are you running this on top of an application server?
What I've done in the past, as Pablo mentioned, is to increase the heap space, but make sure that it is being increased for the application server that you are running on.
I have also had to really optimize the code when doing this.
Since you are outputting to a .xlsx file, keep in mind that XML takes quite a bit of memory. I'm not sure whether it would work for you in this situation, but if you can create a normal .xls, do that and then convert it at the end into a .xlsx file (using Apache POI, of course). Note that a single .xls sheet holds at most 65,536 rows, so 70000 rows would have to be split across sheets.
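Use SXSSFWorkbook instead of XSSFWorkbook. It is the streaming version of the user model API: it keeps only a limited window of rows in memory and flushes older rows to a temporary file.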
Source: https://coderanch.com/t/612234/Write-Huge-Excel-file-Xlsx
Hopefully this will help you.
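To illustrate the idea behind the streaming workbook without pulling in POI, here is a standard-library-only sketch of a bounded row window: only the most recent rows stay in memory and anything older is written out immediately. RowWindowWriter is a hypothetical class invented for this example and is not how POI implements it internally; with POI itself, the window size is passed to the constructor, as in new SXSSFWorkbook(100).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of a row window; not POI code.
class RowWindowWriter {
    private final Writer out;
    private final int windowSize;
    private final Deque<String> window = new ArrayDeque<>();

    RowWindowWriter(Writer out, int windowSize) {
        this.out = out;
        this.windowSize = windowSize;
    }

    // Adds a row; once the window is full, the oldest row is written out and dropped.
    void addRow(String row) throws IOException {
        window.addLast(row);
        if (window.size() > windowSize) {
            out.write(window.removeFirst());
            out.write('\n');
        }
    }

    // Number of rows currently held in memory (never more than windowSize).
    int rowsInMemory() {
        return window.size();
    }

    // Writes out whatever is still in the window and flushes the writer.
    void finish() throws IOException {
        while (!window.isEmpty()) {
            out.write(window.removeFirst());
            out.write('\n');
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        RowWindowWriter writer = new RowWindowWriter(sink, 100);
        for (int i = 0; i < 70000; i++) {
            writer.addRow("row " + i);
        }
        System.out.println("Rows still in memory: " + writer.rowsInMemory());
        writer.finish();
    }
}
```

The trade-off is the same as with the streaming workbook: rows that have left the window can no longer be read back or modified, so each row has to be complete before moving on to the next.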
Related
I have several text files with 2 columns, containing only numbers, which I would like to import into a single Excel spreadsheet (Excel 2016) using MATLAB. The reason for using MATLAB (R2014a) is that I later have scripts that process the data, and it is the only programming language I am mildly familiar with.
I tried to use the following:
Using matlab to save a 100 text files to a one excel file but into different spread sheet?
but I just couldn't understand anything, as I am a newbie, and I think this example is for making several Excel files while I want only one. Thanks for the help! Greatly appreciated.
content = dir();
col = 1;
for i = 1:numel(content)
    % Skip directories (including '.' and '..')
    if ~content(i).isdir
        % Open the current file rather than a hard-coded name
        fileID = fopen(content(i).name);
        data = textscan(fileID, '%s %s');
        fclose(fileID);
        % Each file contributes two adjacent columns
        datum(:, col) = data{1};
        col = col + 1;
        datum(:, col) = data{2};
        col = col + 1;
        clear data;
    end
end
filename = 'Datum.xls';
sheet = 1;
xlswrite(filename, datum, sheet, 'A1');
close all;
This is a basic working algorithm; you need to work on it further to optimize it for speed.
Hints:
1. Pre-declare the size of datum, based on the number of files.
2. If all the files you have to read have the same extension, read only them through dir().
Good luck with the fine-tuning.
I'm trying to use a for loop to read rows from a data sheet, but it's only reading one row. I don't know where I went wrong. Any ideas?
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.ss.usermodel.DataFormatter
cellDataFormatter = new DataFormatter()
//Create formula evaluator
fEval = new XSSFFormulaEvaluator(context.srcWkSheet.getWorkbook())
//Increment the rowcounter then read in the next row of items
RC = context.rowCounter;
if(RC<=context.srcWkSheet.getLastRowNum()){//Check if we've reached the last row
    for(int i =0; i < RC; i++)
    {
        curTC = testRunner.testCase
        sourceRow = context.srcWkSheet.getRow(i)//Get a spreadsheet row
        //Step through cells in the row and populate property data
        data1Cell = sourceRow.getCell(0)
        curTC.setPropertyValue("data1",cellDataFormatter.formatCellValue(data1Cell ,fEval))
        data2Cell = sourceRow.getCell(1)
        curTC.setPropertyValue("data2",cellDataFormatter.formatCellValue(data2Cell ,fEval))
        data3Cell = sourceRow.getCell(2)
        curTC.setPropertyValue("data3",cellDataFormatter.formatCellValue(data3Cell ,fEval))
        //Rename test cases for readability in the TestSuite log
        curTC.getTestStepAt(0).setName("data1-" + curTC.getPropertyValue("BC"))
        //Go back to first test request with newly copied properties
        testRunner.gotoStep(0)
    }
}
From the API documentation for testRunner.gotoStep(0):
Transfers execution of this TestRunner to the TestStep with the specified index in the TestCase
Execution will continue after the indexed step. You are probably expecting it will return back to your loop, which is incorrect!
You probably meant something like: curTC.getTestStepAt(0).run(context.testRunner, context); API documentation.
You could also have an issue with the Excel file you are providing. XSSFFormulaEvaluator only works with the newer *.xlsx format; the old-style *.xls format needs HSSFFormulaEvaluator instead. That could be an issue if you're feeding it an *.xls file.
In SoapUI NG Pro there's a DataSource test step that simply allows you to point to a file (xls or xlsx) and feed in the data
http://www.soapui.org/data-driven-tests/functional-tests.html
I have a complex object (tree structure) which I am flattening into a DataTable to display on an Excel sheet. The DataTable is huge, with around 20000 rows and 10000 columns.
Writing the data to Excel one cell at a time took forever. So, I am converting the complex object into a DataTable and then writing it to the Excel sheet using the code below.
Is it possible to write 20K rows x 10K columns of data to an Excel sheet fairly quickly, in less than a minute or at most 5 minutes? What is the best technique to complete this task quickly?
Environment: Visual studio 2010, VSTO excel workbook project, .net framework 4.0, excel 2010/2007
EDIT:
Original source of data is a rest service response in json format. I am then deserializing json response into c# objects and finally flattening it into a datatable.
Using this Code to write datatable to an excel sheet:
Excel.Range oRange;
var oSheet = Globals.Sheet3;
int rowCount = 1;
foreach (DataRow dr in resultsDataTable.Rows)
{
    rowCount += 1;
    for (int i = 1; i < resultsDataTable.Columns.Count + 1; i++)
    {
        // Add the header the first time through
        if (rowCount == 2)
        {
            oSheet.Cells[1, i] = resultsDataTable.Columns[i - 1].ColumnName;
        }
        oSheet.Cells[rowCount, i] = dr[i - 1].ToString();
    }
}
// Resize the columns
oRange = oSheet.get_Range(oSheet.Cells[1, 1],
    oSheet.Cells[rowCount, resultsDataTable.Columns.Count]);
oRange.EntireColumn.AutoFit();
Final Solution:
Used a 2D Object array instead of datatable and wrote it to the range.
In addition to freezing Excel's animation, you can, given where this data is coming from, avoid looping through the Excel.Range object, which is bound to be a bottleneck. Instead of writing to a DataTable, write to a string[,], which Excel can write to a Range in one operation. Looping through a string[,] is much faster than looping through Excel cells.
string[,] importString = new string[yourJsonSource.Rows.Count, yourJsonSource.Columns.Count];
//populate the string[,] however you can
for (int r = 0; r < yourJsonSource.Rows.Count; r++)
{
    for (int c = 0; c < yourJsonSource.Columns.Count; c++)
    {
        importString[r, c] = yourJsonSource[r][c].ToString();
    }
}
var oSheet = Globals.Sheet3;
Excel.Range oRange = oSheet.get_Range(oSheet.Cells[1, 1],
    oSheet.Cells[yourJsonSource.Rows.Count, yourJsonSource.Columns.Count]);
oRange.Value = importString;
I can't speak about using a datatable for the job, but if you want to use Interop, you definitely want to avoid writing cell by cell. Instead, create a 2-d array, and write it at once to a range, which will give you a very significant performance improvement.
Another option you should consider is avoiding interop altogether, and using OpenXML. If you are working with Excel 2007 or above, this is typically a better approach to manipulate files.
VSTO is always going to take its time. The best tip I can share is to disable sheet refresh while you populate the data; one way to do this is to pop up a modal progress dialog box and refresh your sheet in the background, which will give you 50-70% better performance. Another thing you can do is update VS to SP1; it helps.
Currently I have an application that takes information from a SQLite database and puts it into Excel. However, I'm having to take each DataRow, iterate through each item, put each value into its own cell, and determine highlighting. As a result, it takes 20 minutes to export a 9000-record file to Excel. I'm sure it can be done quicker than that.
My thought is that I could use a data source to fill the Excel Range and then use the column headers and row numbers to format only those rows that need to be formatted. However, when I look online, no matter what I type, it always shows examples of using Excel as a database, and nothing about importing into Excel, unless I'm forgetting a keyword or two.
Now, this function has to be done in code as it's part of a bigger application. Otherwise I would just have Excel connect to the DB and pull the information itself. Unfortunately that's not the case. Any information that could help me quickly load an Excel sheet would be appreciated. Thanks.
Additional Information:
Another reason why pulling the information from the DB has to be done in code is that not every computer this is loaded on will have Excel on it. The person using the application may simply be told to export the data and email it to their supervisor. The setup app includes the DLLs the application needs to produce the proper format.
Example Code (Current):
For Each strTemp In strColumns
    excelRange = worksheet.Cells(1, nCounter)
    excelRange.Select()
    excelRange.Value2 = strTemp
    excelRange.Interior.Color = System.Drawing.Color.Gray.ToArgb()
    excelRange.BorderAround(Excel.XlLineStyle.xlContinuous, Excel.XlBorderWeight.xlThin, Excel.XlColorIndex.xlColorIndexAutomatic, Type.Missing)
    nCounter += 1
Next
Now, this is only example code in terms of the iteration I'm doing. Where I'm really processing the information from the database I'm iterating through a dataTable's Rows, then iterating through the items in the dataRow and doing essentially the same as above; value by value, selecting the range and putting the value in the cell, formatting the cell if it's part of a report (not always gray), and moving onto the next set of data. What I'd like to do is put all of the data in the excel sheet (A2:??, not a row, but multiple rows) then iterate through the reports and format each row then. That way, the only time I iterate through all of the records is when every record is part of a report.
Ideal Code
excelRange = worksheet.Cells("A2", "P9000")
excelRange.DataSource = ds 'ds would be a queried dataSet, and I know there is no excelRange.DataSource.
'Iteration code to format cells
Update:
I know my examples were in VB, but it's because I was also trying to write a VB version of the application since my boss prefers VB. However, here's my final code using a Recordset. The ConvertToRecordset function was obtained from here.
private void CreatePartSheet(Excel.Worksheet excelWorksheet)
{
    _dataFactory.RevertDatabase();
    excelWorksheet.Name = "Part Sheet";
    string[] strColumns = Constants.strExcelPartHeaders;
    CreateSheetHeader(excelWorksheet, strColumns);
    System.Drawing.Color clrPink = System.Drawing.Color.FromArgb(203, 192, 255);
    System.Drawing.Color clrGreen = System.Drawing.Color.FromArgb(100, 225, 137);
    string[] strValuesAndTitles = {/*...Column Names...*/};
    List<string> lstColumns = strValuesAndTitles.ToList<string>();
    System.Data.DataSet ds = _dataFactory.GetDataSet(Queries.strExport);
    ADODB.Recordset rs = ConvertToRecordset(ds.Tables[0]);
    excelRange = excelWorksheet.get_Range("A2", "ZZ" + rs.RecordCount.ToString());
    excelRange.Cells.CopyFromRecordset(rs, rs.RecordCount, rs.Fields.Count);
    int nFieldCount = rs.Fields.Count;
    for (int nCounter = 0; nCounter < rs.RecordCount; nCounter++)
    {
        int nRowCounter = nCounter + 2;
        List<ReportRecord> rrPartReports = _lstReports.FindAll(rr => rr.PartID == nCounter).ToList<ReportRecord>();
        excelRange = (Excel.Range)excelWorksheet.get_Range("A" + nRowCounter.ToString(), "K" + nRowCounter.ToString());
        excelRange.Select();
        excelRange.NumberFormat = "#";
        if (rrPartReports.Count > 0)
        {
            excelRange.Interior.Color = System.Drawing.Color.FromArgb(230, 216, 173).ToArgb(); //Light Blue
            foreach (ReportRecord rr in rrPartReports)
            {
                if (lstColumns.Contains(rr.Title))
                {
                    excelRange = (Excel.Range)excelWorksheet.Cells[nRowCounter, lstColumns.IndexOf(rr.Title) + 1];
                    excelRange.Interior.Color = rr.Description.ToUpper().Contains("TAG") ? clrGreen.ToArgb() : clrPink.ToArgb();
                    if (rr.Description.ToUpper().Contains("TAG"))
                    {
                        rs.Find("PART_ID=" + (nCounter + 1).ToString(), 0, ADODB.SearchDirectionEnum.adSearchForward, "");
                        excelRange.AddComment(Environment.UserName + ": " + _dataFactory.GetTaggedPartPrevValue(rs.Fields["POSITION"].Value.ToString(), rr.Title));
                    }
                }
            }
        }
        if (nRowCounter++ % 500 == 0)
        {
            progress.ProgressComplete = ((double)nRowCounter / (double)rs.RecordCount) * (double)100;
            Notify();
        }
    }
    rs.Close();
    excelWorksheet.Columns.AutoFit();
    progress.Message = "Done Exporting to Excel";
    Notify();
    _dataFactory.RestoreDatabase();
}
Can you use ODBC?
''http://www.ch-werner.de/sqliteodbc/
dbName = "c:\docs\test"
scn = "DRIVER=SQLite3 ODBC Driver;Database=" & dbName _
& ";LongNames=0;Timeout=1000;NoTXN=0;SyncPragma=NORMAL;StepAPI=0;"
Set cn = CreateObject("ADODB.Connection")
cn.Open scn
Set rs = CreateObject("ADODB.Recordset")
rs.Open "select * from test", cn
Worksheets("Sheet3").Cells(2, 1).CopyFromRecordset rs
BTW, Excel is quite happy with HTML and internal style sheets.
I have used the Excel XML file format in the past to write directly to an output file or stream. It may not be appropriate for your application, but writing XML is much faster and bypasses the overhead of interacting with the Excel Application. Check out this Introduction to Excel XML post.
Update:
There are also a number of libraries (free and commercial) that can make creating Excel documents easier, for example excellibrary, which doesn't support the new format yet. Others are mentioned in the answers to Create Excel (.XLS and .XLSX) file from C#
Excel can write all the data from an ADO or DAO recordset in a single operation using the CopyFromRecordset method.
Code snippet:
Sheets("Sheet1").Range("A1").CopyFromRecordset rst
I'd normally recommend using Excel to pull in the data from SQLite. Use Excel's "Other Data Sources". You could then choose your OLE DB provider, use a connection string, what-have-you.
It sounds, however, that the real value of your code is the formatting of the cells, rather than the transfer of data.
Perhaps refactor the process to:
1. Have Excel import the data.
2. Use your code to open the Excel spreadsheet and apply formatting.
I'm not sure if that is an appropriate set of processes for you, but perhaps something to consider?
Try this out:
http://office.microsoft.com/en-au/excel-help/use-microsoft-query-to-retrieve-external-data-HA010099664.aspx
Perhaps post some code, and we might be able to track down any issues.
I'd consider this chain of events:
1. Query the SQLite database for your dataset.
2. Move the data out of ADO.NET objects and into POCO objects. Stop using DataTables/Rows.
3. Use For Each to insert into Excel.
I am using JXL to write an excel file of 50000 rows and 30 columns.
My code looks like this:
for (int j = 0; j < countOfRows; j++) {
    myWritableSheet.addCell(new Label(0, j, myResultSet.getString(1), myWritableCellFormat));
    myWritableSheet.addCell(new Label(1, j, myResultSet.getString(2), myWritableCellFormat));
    .....
    .....
}
While writing the cells, the program goes slower and slower
and finally around the row 25000 I am getting the following error:
Exception in thread "Thread-3" java.lang.OutOfMemoryError: Java heap space
at jxl.write.biff.WritableSheetImpl.getRowRecord(WritableSheetImpl.java:984)
at jxl.write.biff.WritableSheetImpl.addCell(WritableSheetImpl.java:951)
at KLL.ConverterMainFrame$exportToXLSBillRightsThread.run(ConverterMainFrame.java:6895)
It's always difficult to handle memory in Java.
In this case it seems to be jxl's problem.
Is there a way to write the file, clear the memory, and continue writing cells every 1000 cells?
Would that be a good idea or what else would you propose as a solution?
The JExcel FAQ has a couple of suggestions including Curtis' idea above.
If you don't mind the performance hit, you could use a temporary file instead of doing it all in memory.
WorkbookSettings s = new WorkbookSettings();
s.setUseTemporaryFileDuringWrite(true);
WritableWorkbook ws = Workbook.createWorkbook(new File("someFile.xls"),s);
Is raising the memory available to the VM (with -Xms and -Xmx) not an option?