I want to export data (~10.000 entries) to excel and I am not sure which technology to choose. Apache POI looks quite nice to me. Can anybody give information about the differences in performance when processing the code.
(A) PHP7 & OBDC / ADOdb
(B) Java & Apache POI
What would you recommend to use?
If you prefer PHP or Java, I would choose between:
(A) PHP & PHPExcel
or
(B) Java & Apache POI
You should do performance tests and choose the best that fits you. The results may vary and depends by your data. You will also need to do optimizations.
Related
I'm looking into using Cassandra to store 50M+ documents that I currently have in XML format. I've been hunting around but I can't seem to find anything I can really follow on how to bulk load this data into Cassandra without needing to write some Java (not high on my list of language skills!).
I can happily write a script to convert this data into any format if it would make the loading easier although CSV might be tricky given the body of the document could contain just about anything!
Any suggestions welcome.
Thanks
Si
If you're willing to convert the XML to a delimited format of some kind (i.e. CSV), then here are a couple options:
The COPY command in cqlsh. This actually got a big performance boost in a recent version of Cassandra.
The cassandra-loader utility. This is a lot more flexible and has a bunch of different options you can tweak depending on the file format.
If you're willing to write code other than Java (for example, Python), there are Cassandra drivers available for a bunch of programming languages. No need to learn Java if you've got another language you're better with.
Is there a way to update an excel which is already open on my system using Apache POI and also through vba code running on some other excel?
The Apache POI library does not support reading, writing, or executing macros in an Excel document.
Apache stores an internal representation of the worksheet in its memory model. You will have data concurrency issues if you try to have two different applications writing to the same Excel file.
The issue here is ADO connection with Excel - is this still the standard way to read/write excel files within a Dephi XE environment? We're coming up with multiple issues when reading/writing using the ACEOLEDB driver (ACE 12) and this includes
Reading cells with hashtags don't return results
"Invalid Floating Point" when exporting grids.
We've also noticed that there's many versions of the ACE 12 driver out on Microsoft's website (via Access Database driver executables) and they each seem to have different issues with Delphi.
With these things in mind,
Is using ADO with Excel bad at this point?
Does anyone else have these issues and what did you do to resolve them (other than using XLS files instead of XLSX)?
ADO in Delphi is leaning to TDataSet model, which mean strictly tabular data... that excel is not. Each excel sheet has a random cells filled, some of those may constitute quazi-tabular ranges, or may not.
Depending on the installed software you can
1) use Excel application to open XLSX, read the cells and pass them to your program. This is most easy and compatible method, though is noticeably slow due to COM IPC marshalling and switching. There are tricks to fasten it, like hiding Excel window, copying arrays of data instead of cell-by-cell approach and such.
Start exploring TExcelApplication component - http://docwiki.embarcadero.com/RADStudio/XE3/en/Using_Component_Wrappers
2) If you do not want to rely on having commercial Excel installed, you may try reading XLSX files with OpenOffice. Vanilla OpenOffice can only read them though, but some other distro's can write them as well. OpenOffice also exposes external APIs both COM-based and HTTP-based. I know there are Delphi projects of Delphi - OOo interacting but personally did not used them and apart of noting that approach i can say no detailed assesment of it.
3) Microsoft also used to sell Office for Developers or such, that gave you Access and Excel kernels as redistributables, that you could pass with your application and install them and use them. Dunno if it is still feasible though.
4) there is a set of commercial components reading and writing those files directly w/o need to have external EXE's doing the job. While that would be the most fast way to work, it would only support some subset of features (which may or may not be ok for your particular goals) and may have troubles with "future compatibility" as Microsoft would roll out updated versions of XLS and XLSX formats (which again may be of some or none concern to you). Like there was TXLSFile for Biff8 format, there is for example OExport library. There is also a component from well known TMS Studio and maybe some more.
5) You can join some open-source project and try to enhance it for your needs, then again that depends upon how much the subset you need.
I know, many people succefulyl use OLE DB to access Excel data, but for me it always sounded as some perversino, because Excel files do not have any internal regular data arrangement at all, less so strictly-tabular RDBMS-like one.
I've really only ever found it possible to manipulate Excel via COM. I've tried alternatives, like ADO, but they always seem so full of archane bugs - or maybe it's just my ignorance.
COM is definitely slow in certain areas. I've used a combination of COM and (within the Excel file) VBA to achieve what I need to do.
Given that Excel is NOT going to go away BUT Microsoft cannot be relied upon not to betray its users by, for instance, doing away with VBA and COM support, it would be great if someone, somewhere (and I wish I had the skills) could create some proper support for Excel from Delphi.
I am given a huge excel file with a great deal of data and formulas and I need to access the functionality implemented therein to create a web service running on MacOSX. I tried at first to use POI to access its functionality from Java but unfortunately POI does not support some functions like MINVERSE, TRANSPOSE, MMULT and others and I found it particularly difficult to implement them (also see this question).
Is there a programmatic way to access the excel sheet? I found RCOMClient for R but it seems it works onl on Windows. Perl, Python, R, Octave, C++, Java or anything if fine provided that it provided full functionality over Excel formulas.
Python Excel: Why Python Excel in NOT an option for my problem. In the tutorial it is written that:
A relative reference is useful only if you have external knowledge of
what cells can be used as the origin. Many formulas found in Excel
files include function calls and multiple references and are not
useful, and can be too hard to evaluate. A full calculation engine is
not included in xlrd.
Unfortunately I need a very good support of the Excel engine.
Sure you can.
Have you checked the wonderful Python Excel project ?
Even better : you could take a look at that stupendous blog post.
Have fun and keep us posted if you need more help.
Don't bother accessing Excel from another language than Python, as Python is where you'll waste the lower amount of time.
Looking to develop server-side application that will process documents. The source documents are mostly MS-Word 2003, 2007, i.e. the MS version of Docx. Want the server application to be able to run on both linux or windows.
Wanting to know what is the best tool or library for reading and writing MS-Word files under linux. Compatibility is the most important consideration. Must preserve source document formatting including tables.
I have seen a kind of similar post here but it was specific to python. I don't care what language or libraries are used as long as they are available for windows and linux.
Must not require MS-Word to read the Word files.
I am aware of Open Office but am looking for a solution which has a high degree of compatibility with MS-Word files.
Also just came across this solution which looks promising. aspose.com
Anyone had any experience using Aspose.Words for Java or similar 3rd party packages? It looks promising but it's pricey at over $2K for an OEM subscription. That said if it delivers as advertised it may still be the best solution out there.
thanks
There have been a couple of suggestions but nothing so far which would fits the bill (or the budget).
Have you considered using b2xtranslator to convert binary .doc to .docx. (On Linux, you'd have to run it in Mono)
You could then use POI or docx4j to manipulate the docx. Not a solution if you need to save as .doc though (unless you use OO for that bit)
Ok, I'll have another go at an answer ;-)
What about using unaconv
It can convert any document OpenOffice can read to any document OpenOffice can write. You should be able to use that to convert both to/from MS-Word documents (providing they're not overly complicated which I've found open office can't handle very well).
The only caveat is that you need to have an instance of OpenOffice running on the linux server for unoconv to interact with.
Mono has recently acquired support for the system.io.packaging .net class, which allows some degree of manipulation of docx files. If the kind of thing you want to do is add/remove resources and recurse over the text, it's probably the right thing.