I have been using Apache POI to create .xlsx files. I tried POI 3.8, but it has a memory problem: it creates temp files and takes a very long time to generate the Excel file. I am now using POI 3.9, but it shows the same memory issue as 3.8. When I retrieved 10,000 records from the DB and generated an Excel file, it took around one hour to create it.
Is there any new function or package available in 3.9 that resolves this memory issue?
I'm new to JanusGraph. I need to upgrade JanusGraph from 0.2.2 (storage: Cassandra, index: ES) to the latest stable version (0.5.2). I've gone through the docs and forums on how to initiate the process (I've only found the changelog), but I wasn't able to figure out a clear, direct solution: whether to go for an incremental upgrade (0.2.2 > 0.x.x* > 0.5.2) or a direct upgrade (install 0.5.2 and try to move the Cassandra data over somehow, if that even works).
I tried the second option: I downloaded the latest JanusGraph (both the base and -full distributions), installed the latest Cassandra (3.11) and ES (6.x, 7.x), copied the old Cassandra data into the new Cassandra installation (/var/lib/cassandra), and started both servers, JanusGraph and Cassandra; everything is up and running. But when I tried to interact with JanusGraph (via the Gremlin server), it gave an error like "Gremlin groovy script engine - Illegal Argument exception".
I figured out that this is not how it should be done, and that I need to do an incremental upgrade with a proper export/import of the data.
Can someone help me with how to proceed with the incremental upgrade? How can I export and import all of the JanusGraph/Gremlin Server data?
You will need to stop the 0.2 instance, set graph.allow-upgrade=true in janusgraph.properties (see here), then start a new 0.5 instance on top of the same Cassandra (or, if needed, migrate the old Cassandra/ES data to newer Cassandra/ES instances).
Thereafter, a good practice is to stop this 0.5 instance, remove the graph.allow-upgrade setting, then restart it for normal use, and set it again only when the next upgrade is needed.
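For reference, a minimal sketch of what the 0.5 instance's janusgraph.properties might look like during the upgrade; only graph.allow-upgrade=true comes from this answer, and the storage/index lines are placeholders that should match your existing setup:
# janusgraph.properties for the 0.5 instance while upgrading (storage/index values are illustrative)
graph.allow-upgrade=true
storage.backend=cql
storage.hostname=127.0.0.1
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1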
I almost forgot to write this answer (late, but it might still be useful).
Firstly, no incremental upgrades are required; you can upgrade with simple export/import commands.
There are three different formats available as of now: JSON (GraphSON), XML (GraphML) and binary (Gryo).
Gremlin commands (gremlin-cli):
// Export from the old version (0.2.2)
graph = JanusGraphFactory.open('conf/gremlin-server/janusgraph-cql-es-server.properties')
graph.io(IoCore.gryo()).writeGraph('janusgraph_dump_2020_09_30_local.gryo')
graph.tx().commit()
// Import into the new version (0.5.2)
graph = JanusGraphFactory.open('conf/gremlin-server/janusgraph-cql-es-server.properties')
graph.io(IoCore.gryo()).readGraph('janusgraph_dump_2020_09_30_local.gryo')
graph.tx().commit()
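For reference, the same calls work with the other formats mentioned above; for example, a GraphSON (JSON) dump would look like this (using the same properties file and a file name of your choosing):
// Export as GraphSON (JSON) instead of Gryo
graph = JanusGraphFactory.open('conf/gremlin-server/janusgraph-cql-es-server.properties')
graph.io(IoCore.graphson()).writeGraph('janusgraph_dump_2020_09_30_local.json')
graph.tx().commit()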
This solved my problem.
I am using the BIRT runtime 4.8.0 in a Java project to generate Excel reports. The Excel report has 1k columns and can have 10k to 50k rows (the result of one query, mapped to one table). I am using the spudsoft ExcelEmitter to render the static Excel reports.
Data source: Impala JDBC connection, using one dataset with one query.
The issue is that it takes 6 to 7 GB of Java heap space just to render 10k rows of this report, because everything is loaded into memory and then written to the file.
Is there any way to reduce the memory footprint (predictable heap usage, preferably under 3 GB) while rendering the Excel sheets (options like paginating the query results, rendering the file in parts, etc.)?
I solved it with a newer version of the spudsoft emitter.
It switches Apache POI from XSSF to SXSSF:
ExcelEmitter.ExtractMode
Experimental feature! When set to true, the emitter should run faster for XLSX files, but with a limited feature set:
Images will be omitted.
Merged cells are not allowed.
Structure header and footer are not supported. See ExcelEmitter.StructuredHeader.
https://www.eclipse.org/forums/index.php/m/1804253/#msg_1804253
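In case it helps, a rough sketch of how that option might be switched on when running the report from Java; whether ExcelEmitter.ExtractMode is honoured via the app context depends on your emitter version, and the design path, output file name and engine setup below are placeholders (exception handling omitted):
import org.eclipse.birt.core.framework.Platform;
import org.eclipse.birt.report.engine.api.*;

// Standard BIRT engine boilerplate, shown only to give the option a home.
EngineConfig config = new EngineConfig();
Platform.startup(config);
IReportEngineFactory factory = (IReportEngineFactory) Platform
        .createFactoryObject(IReportEngineFactory.EXTENSION_REPORT_ENGINE_FACTORY);
IReportEngine engine = factory.createReportEngine(config);

IReportRunnable design = engine.openReportDesign("report.rptdesign"); // placeholder path
IRunAndRenderTask task = engine.createRunAndRenderTask(design);

// Assumption: the spudsoft emitter reads this option from the app context.
task.getAppContext().put("ExcelEmitter.ExtractMode", Boolean.TRUE);

RenderOption options = new RenderOption();
options.setOutputFormat("xlsx");
options.setOutputFileName("report.xlsx"); // placeholder output file
task.setRenderOption(options);

task.run();
task.close();
engine.destroy();
Platform.shutdown();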
Use the code below to set the limit to 15K. This resolved my problem.
reportContext.getAppContext().put("MAX_PAGE_BREAK_INTERVAL", 15000);
I was able to generate Excel files for a large dataset (on the order of 50k rows and 1k columns) by using the Apache POI streaming APIs directly. The Aspose APIs are another good tool for doing this.
Using the POI streaming APIs you can render an Excel file with around 50k rows and 1k columns in about a minute or two, with under 2 GB of peak RAM usage.
So if you extend the spudsoft Excel emitter to use the POI streaming APIs, this can be handled with BIRT as well.
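If anyone needs a starting point, here is a minimal sketch of that approach using POI's SXSSF streaming API; the sheet name, row/column counts and cell values are placeholders for a real result set:
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class StreamingExcelExport {
    public static void main(String[] args) throws Exception {
        // Keep only 100 rows in memory at a time; older rows are flushed to temp files on disk.
        try (SXSSFWorkbook workbook = new SXSSFWorkbook(100);
             FileOutputStream out = new FileOutputStream("large-report.xlsx")) {
            SXSSFSheet sheet = workbook.createSheet("data");

            for (int r = 0; r < 50_000; r++) {            // placeholder row count
                Row row = sheet.createRow(r);
                for (int c = 0; c < 1_000; c++) {         // placeholder column count
                    Cell cell = row.createCell(c);
                    cell.setCellValue("r" + r + "c" + c); // placeholder cell value
                }
            }
            workbook.write(out);
            workbook.dispose(); // remove the temporary files backing the stream
        }
    }
}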
It seems that with recent versions of PrimeFaces, new types were added for the DataExporter (see ExporterType in the current PrimeFaces 6.2 docs).
I can't seem to find anything on the web regarding the new Apache POI XLSX and XLSXSTREAM types. Can somebody explain the differences between the two of them? Is one more efficient than the other? Are there limitations regarding the size of the exported data?
I can explain. They both produce exactly the same XLSX files, the Office Open XML format of Excel files, using Apache POI. There is no limit on size that I am aware of.
The big difference is how they get created.
XLSX - keeps the whole document in memory.
XLSXSTREAM - writes and garbage-collects as it processes, so it's memory efficient.
From the POI Docs:
SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window.
Basically, if you don't care about your server resources, use XLSX; if you have many users downloading Excel files and JVM memory is important to you, use XLSXSTREAM.
I tried to use sstableloader to load data into Cassandra 3.5. The data was captured using nodetool snapshot under Cassandra 2.1.9. All the tables loaded fine except one. It's small, only 2 columns and 20 rows. So, I entered this bug: https://issues.apache.org/jira/browse/CASSANDRA-11806. The bug was quickly closed as a duplicate. It doesn't seem to be a duplicate, since the original case is upgrading a node in-place, not loading data with sstableloader.
Even so, I tried to apply the advice given to run upgradesstable [sic].
The directions given to upgrade from one version of Cassandra to another seem sketchy at best. Here's what I did based on my working backup/restore and info garnered from various Cassandra docs on how to upgrade:
Snapshot the data from prod (Cassandra 2.1.9), as usual
Restore data to Cassandra 2.1.14 running on my workstation
Verify the restore to 2.1.14 (it worked)
Copy the data/data/makeyourcase into a Cassandra 3.5 install
Fire up Cassandra 3.5
Run nodetool upgradesstables to upgrade the sstables to 3.5
nodetool upgradesstables fails:
>./bin/nodetool upgradesstables
error: Unknown column role in table makeyourcase.roles
-- StackTrace --
java.lang.AssertionError: Unknown column role in table makeyourcase.roles
So, the questions: Is it possible to upgrade directly from 2.1.x to 3.5? What's the actual upgrade process? The process at http://docs.datastax.com/en/latest-upgrade/upgrade/cassandra/upgradeCassandraDetails.html is seemingly missing important details.
This turned out to be a problem with the changing state of the table over time.
Since the table was small, I was able to migrate the data by using COPY to export the data to CSV and then importing it into the new version.
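For anyone following along, the COPY workaround in cqlsh looks roughly like this (the CSV path is a placeholder, and the schema for makeyourcase.roles has to exist on the 3.5 cluster before importing):
-- On the 2.1 cluster: export the table to CSV
COPY makeyourcase.roles TO '/tmp/roles.csv' WITH HEADER = true;
-- On the 3.5 cluster: recreate the table, then load the CSV
COPY makeyourcase.roles FROM '/tmp/roles.csv' WITH HEADER = true;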
Have a look at https://issues.apache.org/jira/browse/CASSANDRA-11806 for discussion of another workaround and a coming bug fix.
I am using Node.js to parse xlsx files cell by cell, and the parsed cell values are stored in MongoDB.
It works fine for small Excel files of less than 3 MB. But for files larger than 3 MB, the Node application crashes with the error "CALL_AND_RETRY_2 Allocation failed - process out of memory".
Used technologies:
Nodejs: v0.8.22,
MongoDB: 2.2.4
System Config:
OS: Ubuntu 12.04,
Memory: 4GB,
Processor: Intel I5
My steps to parse the xlsx data and store it in MongoDB:
Unzip the uploaded xlsx file.
Read the styles, shared strings, sheets, cells of each sheet, and defined names from the extracted XML files of the uploaded xlsx file, and save those values into a JS object.
Then save the values into MongoDB collections by iterating over the JS object.
As far as I can tell, step 2 is causing the out-of-memory error, because I am storing the entire xlsx contents in a single JS object.
Please suggest how I could change the above process, or some other good way to handle this situation.
Thanks.
You could try to start node with
node --max-old-space-size=3000 app
to increase the max memory to 3 GB. However, the default memory limit of Node is 512 MB on 32-bit systems and 1 GB on 64-bit (according to https://github.com/joyent/node/wiki/FAQ). If you hit these limits when parsing a 3 MB Excel file, that sounds seriously odd - it might be a memory leak. Maybe you want to post the code?
Btw, Node 0.8 is not exactly the latest and greatest... Maybe you should also try updating to a more recent version.