I am having problems reading XLS files that contain complex formulas in OptaPlanner, using Apache POI.
Are there any known limitations? Please suggest steps to resolve this.
Related
I am using the box/spout library to export simple Excel files. It is no longer maintained, and I wonder which solution I should choose for current and future projects.
Box/spout was much faster than the library I used before, and as long as you don't need fancy formatting, it did what was needed and did it fast enough.
I wonder which library to use instead now. An export to CSV isn't an option, since my users are used to the comfort of an Excel file and most of them are not able to open a CSV file in Excel and convert it to Excel format.
I am currently using Symfony 5 and PHP 8.1 in an Alpine Linux container.
I know it isn't a direct code question, but I would be glad to hear about your experience with, or approach to, Excel exports in the year 2022.
P.S.: Before that I used PHPExcel, which was very slow when there were many rows to export. It got a major refactoring and is now called PhpSpreadsheet, but I don't know if they fixed the performance issues with many rows.
I want to export data (~10,000 entries) to Excel and I am not sure which technology to choose. Apache POI looks quite nice to me. Can anybody tell me how the following options differ in performance?
(A) PHP 7 & ODBC / ADOdb
(B) Java & Apache POI
Which would you recommend?
If you prefer PHP or Java, I would choose between:
(A) PHP & PHPExcel
or
(B) Java & Apache POI
You should run performance tests and choose the option that fits you best. The results may vary and depend on your data. You will also need to do some optimization.
I'm looking into using Cassandra to store 50M+ documents that I currently have in XML format. I've been hunting around but I can't seem to find anything I can really follow on how to bulk load this data into Cassandra without needing to write some Java (not high on my list of language skills!).
I can happily write a script to convert this data into any format if it would make the loading easier, although CSV might be tricky, given that the body of the document could contain just about anything!
Any suggestions welcome.
Thanks
Si
If you're willing to convert the XML to a delimited format of some kind (e.g. CSV), then here are a couple of options:
The COPY command in cqlsh. This actually got a big performance boost in a recent version of Cassandra.
The cassandra-loader utility. This is a lot more flexible and has a bunch of different options you can tweak depending on the file format.
If you're willing to write code other than Java (for example, Python), there are Cassandra drivers available for a bunch of programming languages. No need to learn Java if you've got another language you're better with.
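On the worry that CSV is tricky because the body can contain just about anything: a CSV writer that quotes every field keeps commas, quotes, and line breaks inside a single field. Here is a minimal Python sketch, assuming a hypothetical layout of <doc><id>...</id><body>...</body></doc> per document (the element names and the function name are made up for illustration, so adjust them to your real schema). It is worth loading a small sample first to check how your loader of choice treats quoted line breaks.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_docs_to_csv(xml_strings, out):
    """Write one CSV row (id, body) per XML document to the file-like `out`.

    Assumes a hypothetical layout: <doc><id>...</id><body>...</body></doc>.
    """
    # QUOTE_ALL wraps every field in double quotes and doubles embedded quotes,
    # so commas, quotes, and newlines inside the body stay in a single field.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for xml_text in xml_strings:
        root = ET.fromstring(xml_text)
        doc_id = root.findtext("id", default="")
        body = root.findtext("body", default="")
        writer.writerow([doc_id, body])


if __name__ == "__main__":
    samples = [
        '<doc><id>1</id><body>plain text</body></doc>',
        '<doc><id>2</id><body>has "quotes", commas,\nand a newline</body></doc>',
    ]
    buf = io.StringIO()
    xml_docs_to_csv(samples, buf)
    print(buf.getvalue())
```

Reading the output back with csv.reader gives the original id and body values unchanged, which is a quick way to confirm the quoting before loading 50M+ documents.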
What is the best way to load Excel files to a Hive table?
Is there a command to change them to tab delimited format?
You could look at parsing the files with Apache Tika, or with Apache POI for XLS spreadsheets.
https://poi.apache.org/
https://tika.apache.org/
You'll need a JVM language to use these libraries, so consider Groovy, Jython, Clojure, Scala, or Java if you know it.
I'm doing something similar with a bunch of XLSX files already in HDFS, with this sort of pre-processing before the output ends up in Hive. Hopefully your XLSX sheets are fairly straightforward and just resemble 2D datasets. (Embedded pivot tables, charts, etc. don't come across into Hive with any context.)
Good luck, it's not pretty... XLS is tough to work with because it's just so flexible.
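Once the cells have been read (with POI, Tika, or anything else), writing the tab-delimited file is the easy part. Here is a minimal Python sketch, assuming the rows are already available as lists of cell values and that the Hive table is stored as plain text with tab as the field delimiter. The function name, and the choices to replace embedded tabs and line breaks with spaces and to write empty cells as empty strings, are assumptions for illustration.

```python
import io


def rows_to_tsv(rows, out):
    """Write rows (iterables of cell values) to `out` as tab-delimited lines."""
    for row in rows:
        cells = []
        for value in row:
            # Assumption: empty cells become empty strings.
            text = "" if value is None else str(value)
            # A plain tab-delimited text file has no quoting, so a tab or
            # line break inside a cell would shift columns or split the row.
            for ch in ("\t", "\r\n", "\n", "\r"):
                text = text.replace(ch, " ")
            cells.append(text)
        out.write("\t".join(cells) + "\n")


if __name__ == "__main__":
    sheet = [
        ["id", "note"],
        [1, "line one\nline two"],
        [2, "tab\there"],
        [3, None],
    ]
    buf = io.StringIO()
    rows_to_tsv(sheet, buf)
    print(buf.getvalue())
```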
You can try the newest version of the HadoopOffice library, which has a Hive SerDe for Excel files: https://github.com/ZuInnoTe/hadoopoffice/wiki
Is there a way to update an Excel file that is already open on my system using Apache POI, and also through VBA code running in another Excel workbook?
The Apache POI library cannot create or execute macros in an Excel document; it only preserves existing macros when it reads and rewrites a file.
Apache POI keeps its own in-memory representation of the workbook, separate from the copy that Excel has open. You will have data concurrency issues if you try to have two different applications writing to the same Excel file.