How to read a big Excel file in Java - excel

I am trying to read an Excel file through Apache POI in Java (in NetBeans). The file contains about 8000 columns and 1200 rows, and I get the following exception. I have also tried increasing the heap size in NetBeans with -Xmx2048m, but it doesn't help.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3039)
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3060)
at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3250)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1802)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseTagNS(PiccoloLexer.java:1362)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXMLNS(PiccoloLexer.java:1293)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXML(PiccoloLexer.java:1261)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4808)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3439)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1270)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1257)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:188)
at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:180)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:300)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:221)
at testdoc.Testdoc.main(Testdoc.java:26)
Java Result: 1
BUILD SUCCESSFUL (total time: 49 seconds)
Line 26 is the XSSFWorkbook constructor call:
File excel = new File("E:\\Project\\Rapid out\\" + filename + type + ".xlsx");
FileInputStream fis = new FileInputStream(excel);
XSSFWorkbook wb = new XSSFWorkbook(fis); // line 26
XSSFSheet ws = wb.getSheet("Sheet2");

Instead of using an InputStream, can you try opening the workbook from the File?
XSSFWorkbook wb = new XSSFWorkbook(excel);
From the POI guide:
When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file.
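A minimal sketch of the File-based approach follows. The class name OpenFromFile and the method countRows are hypothetical, and opening the package read-only is an assumption made here so that closing the workbook does not write anything back to the file. Note that opening from a File only avoids buffering the raw file; XSSFWorkbook still parses each sheet fully into memory, so this may or may not be enough for a sheet of this size.

```java
import java.io.File;
import java.io.IOException;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class OpenFromFile {
    /**
     * Opens the workbook from a File (not an InputStream) and returns the
     * number of physical rows in the named sheet, or -1 if the sheet is missing.
     */
    static int countRows(File excel, String sheetName)
            throws IOException, InvalidFormatException {
        // Read-only access is an assumption of this sketch: the file is only read.
        OPCPackage pkg = OPCPackage.open(excel, PackageAccess.READ);
        // Closing the workbook also releases the underlying package.
        try (XSSFWorkbook wb = new XSSFWorkbook(pkg)) {
            XSSFSheet ws = wb.getSheet(sheetName);
            return ws == null ? -1 : ws.getPhysicalNumberOfRows();
        }
    }
}
```

With the file from the question, the call would look like countRows(excel, "Sheet2").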

Related

Multipart to xssfworkbook

I am building an API that accepts an Excel file as a multipart upload, and in the implementation I need to read data from that file, so I am using the XSSFWorkbook class. Can I convert the uploaded .xlsx multipart file to an XSSFWorkbook in Java?
fileAutomate(@RequestBody MultipartFile file)
Workbook workbook = new XSSFWorkbook(file)
List myData=excelTesting.getDataFromSheet(workbook,'Properties')

Apache POI Excel write() takes too much time

I am building an Excel (xlsx) file using POI 4.0. Below is the final code that flushes and closes the workbook; it takes around 4 seconds on my MacBook Pro for a file of approx. 900 KB on disk (the actual .xlsx file). Is there any way to improve it?
Also, I am not writing a huge amount of data. I am opening an existing Excel file that has functions and some static content, adding a header row with a cell style (around 60 columns), and then closing it.
I tried using SXSSFWorkbook, but there was no improvement.
I tried replacing FileOutputStream with ByteArrayOutputStream (just to test performance), but it is pretty much the same.
public void generateFinalExcelFile(XSSFWorkbook workbook) throws IOException {
    String filePath = "somepath.xlsx";
    File outFile = new File(filePath);
    FileOutputStream outputStream = new FileOutputStream(outFile);
    BufferedOutputStream out = new BufferedOutputStream(outputStream);
    workbook.write(out); // this one takes a long time.
    out.close();
    workbook.close();
}

How to update excel metadata using java

I am trying to update Excel metadata in Java using Apache POI. The input file is large, containing 8K columns and 600 rows. I am using the code below:
OPCPackage pkg = OPCPackage.open(new File("path for input"));
POIXMLProperties props = new POIXMLProperties(pkg);
props.getCoreProperties().setTitle("Test Title");
XSSFWorkbook wb = new XSSFWorkbook(pkg);
FileOutputStream fos = new FileOutputStream("path for output");
BufferedOutputStream bos = new BufferedOutputStream(fos);
wb.write(bos);
bos.close(); // closing the buffered stream flushes it and also closes fos
The code above throws an OutOfMemoryError, as shown below.
java.lang.OutOfMemoryError: Java heap space
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3414)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1272)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1259)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:227)
at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:219)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.parseSheet(XSSFWorkbook.java:452)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:417)
at org.apache.poi.ooxml.POIXMLDocument.load(POIXMLDocument.java:184)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:286)
Could you please help me overcome this issue so that I can update the Excel properties?
Promoting a comment to an answer....
If you're just changing the OPC-level metadata, there's no need to load the file up in the XSSF layer at any point. You'd only need to do that if you also wanted to change the spreadsheet contents, e.g. cells.
Your code can be as simple as:
OPCPackage pkg = OPCPackage.open(new File("path for input"));
POIXMLProperties props = new POIXMLProperties(pkg);
props.getCoreProperties().setTitle("Test Title");
props.commit();
pkg.close();

Modifying an Excel File in Google Drive with Apache POI

I need to modify some cells of an Excel file stored in Google Drive. I'm using Apache POI to manipulate the Excel file, and I can read and modify it. When I commit it to Google Drive, the call seems to work and returns a success code, but the file is not changed in Drive. The function I'm using to save the file is:
ParcelFileDescriptor file=result.getDriveContents().getParcelFileDescriptor();
InputStream in=new FileInputStream(file.getFileDescriptor());
HSSFWorkbook wb = new HSSFWorkbook(in);
for (Componente c : listaComponentes) {
    HSSFSheet sheet = wb.getSheetAt(c.getHoja());
    HSSFRow fila = sheet.getRow(c.getFila());
    fila.getCell(celdaSerie).setCellValue(c.getSerie());
}
FileOutputStream fileOut = new FileOutputStream(file.getFileDescriptor());
wb.write(fileOut);
result.getDriveContents().commit(mGoogleApiClient, null);

JAVA - Apache POI OutOfMemoryError while writing Excel File

I am writing an Excel file using Apache POI.
I want to write into it all the data of myResultSet, whose field names (columns) are stored in the String[] fieldNames.
I have 70000 rows and 27 columns.
My code:
String xlsFilename = "myXLSX.xlsx";
org.apache.poi.ss.usermodel.Workbook myWorkbook = new XSSFWorkbook();
org.apache.poi.ss.usermodel.Sheet mySheet = myWorkbook.createSheet("myXLSX");
Row currentRow = mySheet.createRow(0);
for (int k = 0; k < fieldNames.length; k++) {
    // Add the title cells of the results table to the Excel file
    currentRow.createCell(k).setCellValue(fieldNames[k]);
}
for (int j = 0; j < countOfResultSetRows; j++) {
    myResultSet.next();
    currentRow = mySheet.createRow(j + 1);
    for (int k = 0; k < fieldNames.length; k++) {
        currentRow.createCell(k).setCellValue(myResultSet.getString(fieldNames[k]));
        System.out.println("Processing Row " + j);
    }
}
FileOutputStream myFileOutputStream = new FileOutputStream(xlsFilename);
myWorkbook.write(myFileOutputStream);
myFileOutputStream.close();
My problem is that the program gets slower and slower while writing the rows.
When it reaches row 3500, it stops with this exception:
Exception in thread "Thread-3" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
at java.lang.StringBuffer.<init>(StringBuffer.java:79)
It seems I'm out of memory.
How can I solve this?
Is there a way to store my data in a temporary file every 1000 rows (for example)?
What would you suggest?
I had the same problem using jxl and never solved it either (JAVA - Out Of Memory Error while writing Excel Cells in jxl).
Now I need xlsx files anyway, so I have to use POI.
There seems to be an approach that creates the data file in XML format first and then substitutes that XML into an existing template xlsx file.
http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/usermodel/examples/BigGridDemo.java
This is not applicable to xls format files, though.
How about allowing your app to use more memory (like -Xmx500m for 500 MB)?
Assign more memory to the heap when running your program:
$ java -Xms256m -Xmx1024m NameOfYourClass
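Heap flags only help if they reach the JVM that actually runs the code, which is easy to get wrong in an IDE or on an application server. As a rough check, the sketch below prints the maximum heap the running JVM reports; the class name HeapCheck is hypothetical and not part of any code in this thread.

```java
public class HeapCheck {
    /** Returns the maximum heap size the current JVM will attempt to use, in whole megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with the same flags as the real program, e.g. java -Xmx1024m HeapCheck
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

If the printed value does not roughly match the -Xmx setting, the flag is probably being applied to a different JVM (for example the IDE itself rather than the launched program).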
I've been there more than once.
Are you running this on top of an application server?
What I've done in the past, as Pablo mentioned, is to increase the heap space, but make sure that it is being increased for the application server that you are running on.
I have also had to really optimize the code when doing this.
Since you are outputting to a .xlsx file, the XML takes quite a bit of memory. I'm not sure whether it would work in your situation, but if you can create a normal .xls, do that and then convert it at the end into a .xlsx file (using Apache POI, of course).
Use SXSSFWorkbook instead of XSSFWorkbook; it is the streaming version of the user model API.
Source: https://coderanch.com/t/612234/Write-Huge-Excel-file-Xlsx
Hopefully this will help you.
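A minimal sketch of the SXSSFWorkbook suggestion, under some assumptions: the class name StreamingWrite, the method writeRows, the window size of 100 rows, and the placeholder cell values are all illustrative (the question reads its values from myResultSet instead). The window size controls how many rows stay in memory; older rows are flushed to a temporary file and can no longer be accessed.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class StreamingWrite {
    /** Writes a header row plus rowCount data rows, keeping only a small window of rows in memory. */
    static void writeRows(File target, String[] fieldNames, int rowCount) throws IOException {
        // Keep at most 100 rows in memory (assumed window size for this sketch).
        SXSSFWorkbook wb = new SXSSFWorkbook(100);
        try (OutputStream out = new FileOutputStream(target)) {
            Sheet sheet = wb.createSheet("myXLSX");
            Row header = sheet.createRow(0);
            for (int k = 0; k < fieldNames.length; k++) {
                header.createCell(k).setCellValue(fieldNames[k]);
            }
            for (int j = 0; j < rowCount; j++) {
                Row row = sheet.createRow(j + 1);
                for (int k = 0; k < fieldNames.length; k++) {
                    // Placeholder value; real code would read from the ResultSet here.
                    row.createCell(k).setCellValue("r" + j + "c" + k);
                }
            }
            wb.write(out);
        } finally {
            wb.dispose(); // delete the temporary files that back the flushed rows
            wb.close();
        }
    }
}
```

For the case in the question, this would be called with the 27 field names and a row count of 70000.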
