OpenTBS: faster filling of XLSX files with over 10000 rows

I have been using the OpenTBS plugin for the TinyButStrong template engine for more than 4 years, and almost from the beginning I have had a problem merging XLSX files when the source data array has more than 10000 rows.
Until now I used OpenTBS v1.9.0 together with a fix for this problem that I found here: http://www.tinybutstrong.com/forum.php?thr=3256
But I decided to update OpenTBS from v1.9.0 to the latest v1.9.11, and found that my problem with large data sources is still not fixed, even though Skrol29 says that v1.9.2 is "6 times faster when saving XLSX merged sheets with numerous rows."
I applied the same fix from http://www.tinybutstrong.com/forum.php?thr=3256
to v1.9.11, and it still works and merges numerous rows much faster.
So, Skrol29, could you review this solution again and include it in the official release?

The fix submitted at http://www.tinybutstrong.com/forum.php?thr=3256 has not been implemented as is in OpenTBS because it uses regexp functions. Regexps are smart and powerful, but they assume a particular string layout that may sometimes be different, so they may not be as rigorous as we would expect.
By contrast, the simple XML parsing in TBS and OpenTBS has been thoroughly tested and is more trusted.
So for this performance problem when merging Ms Excel worksheets, it is recommended to use the command OPENTBS_RELATIVE_CELLS, supported since OpenTBS 1.9.2.
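To make the regexp point concrete, here is a small Python sketch (it is not OpenTBS code and not the fix from the forum thread). It assumes a regexp written for one particular attribute order inside a SpreadsheetML <c> cell element. The same cell with its attributes in the other order is equally valid XML, yet it slips past the regexp, while an XML parser reads both.

```python
import re
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# A regexp written for one attribute order: r first, then t="s".
CELL_RE = re.compile(r'<c r="([A-Z]+[0-9]+)" t="s">')


def shared_string_refs_regex(xml_text):
    """Find shared-string cells by pattern matching the raw text."""
    return CELL_RE.findall(xml_text)


def shared_string_refs_parser(xml_text):
    """Find shared-string cells by parsing the XML; attribute order does not matter."""
    root = ET.fromstring(xml_text)
    return [c.get("r") for c in root.iter(f"{{{MAIN_NS}}}c") if c.get("t") == "s"]


def make_sheet(cells_xml):
    """Wrap cell markup in a minimal worksheet document."""
    return (f'<worksheet xmlns="{MAIN_NS}"><sheetData><row r="1">'
            f"{cells_xml}</row></sheetData></worksheet>")


usual = make_sheet('<c r="A1" t="s"><v>0</v></c>')
reordered = make_sheet('<c t="s" r="A1"><v>0</v></c>')  # same cell, still valid XML
```

The regexp silently returns nothing for the reordered cell, which is exactly the kind of failure that is hard to notice in a large merged sheet.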


Keyword search excel

I am terrible with Excel, and my current job requires me to comb through a ton of data. Here's what I am trying to do: I have about 40 pages of data in an Excel sheet, containing about 200 different programs and what each is capable of doing. I also have a separate list of people's problems that our programs may be able to help solve. Both lists were written in paragraph format in a Word doc, but I have moved them to Excel.
I am trying to find a way to match just the name of each program to just the title of each problem, by comparing the description of the problem with the description of our program. In the past, doing this manually has taken nearly 80 man-hours, and it just seems like a waste.
Is there even a way to do this?
How difficult would it be for an Excel novice to do, given that it's on a standalone system, so I can't copy/paste the data here?
Thanks in advance for any help/advice. I tried to include examples.
Customer problem
Potential solution
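If the two lists can be exported out of Excel as name/description pairs, one rough way to pre-sort the matches is keyword overlap. The Python sketch below is only an assumption about what "matching" means here: it scores each program by the Jaccard overlap between the keywords of its description and the keywords of the problem. The program names and descriptions are made-up examples, and there is no stemming, so "delays" and "delayed" do not match. Treat the scores as a shortlist to review by hand, not a final answer.

```python
import re

# A small, hand-picked stop-word list; extend it for your own data.
STOPWORDS = {"and", "are", "can", "for", "our", "that", "the", "this", "with"}


def keywords(text):
    """Lower-cased words of three or more letters, minus stop words."""
    return {w for w in re.findall(r"[a-z]{3,}", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Shared keywords divided by all distinct keywords (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(problem, programs, top=3):
    """Rank program names by keyword overlap with one problem description."""
    problem_words = keywords(problem)
    scored = [(jaccard(problem_words, keywords(desc)), name)
              for name, desc in programs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, round(score, 2)) for score, name in scored[:top] if score > 0]


# Hypothetical example data.
EXAMPLE_PROGRAMS = {
    "Program A": "Tracks shipping containers and reports delays",
    "Program B": "Translates documents between languages",
    "Program C": "Schedules staff shifts and reports overtime",
}
EXAMPLE_PROBLEM = "We cannot see which containers are delayed in shipping"
```

For the example data, only Program A shares keywords with the problem ("shipping" and "containers"), so it is the only one returned.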

What is the common knowledge about NPOI, EPPlus and Koogra as of 2015?

Yes, Koogra only reads. EPPlus only supports .xlsx and is buggy in edge cases.
What else should one know for choosing between them?
Is one of them much slower than others?
NPOI seems way too complicated and is a port of a Java library, so is it worth using?
Should one use EPPlus for .xlsx and NPOI for .xls?
What is the general knowledge about them today?
Jet/ACE OLE DB reads a worksheet either as strings or as typed columns, so you either lose number precision or must have headers in the first row. Thus, they are to be avoided.
None of these libraries supports XLSB.
Speed.
For a large XLS, the reading times for NPOI:Jet:Koogra:EDR are in the ratio 14:8:7:5.
For the same XLSX, the ratio for EPPlus:NPOI:Koogra:EDR is 52:36:20:16.
For relatively small files with many tabs EPPlus can be a bit faster than EDR.
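To read those ratios as slowdowns, divide each time by the fastest library's time. A trivial Python sketch using only the numbers quoted in this answer:

```python
def relative_to_fastest(times):
    """Express each time as a multiple of the fastest (smallest) time."""
    fastest = min(times.values())
    return {name: round(t / fastest, 2) for name, t in times.items()}


# Ratios quoted in the answer.
XLS_TIMES = {"NPOI": 14, "Jet": 8, "Koogra": 7, "EDR": 5}
XLSX_TIMES = {"EPPlus": 52, "NPOI": 36, "Koogra": 20, "EDR": 16}
```

So on the answerer's files, NPOI took about 2.8 times as long as EDR for XLS, and EPPlus about 3.25 times as long as EDR for XLSX.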
Errors (#DIV/0!, #VALUE!, etc.)
EDR and Koogra don't explicitly support errors: EDR reads them as ordinary strings, and Koogra reads them as blank cells.
NPOI and EPPlus do.
Koogra reads dates as OLE date numbers, which are indistinguishable from real numbers. It also sometimes reads numbers with many decimal digits incorrectly. EDR handles this fine. So, no to Koogra.
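For context on the date problem: an OLE Automation date is a floating-point count of days since 30 December 1899, with the fractional part holding the time of day, so nothing in the stored value marks it as a date. A minimal Python conversion, assuming non-negative serials (negative OLE dates are encoded differently and are not handled here) and that you already know from elsewhere, for example the cell's number format, which columns are dates:

```python
from datetime import datetime, timedelta

# Day zero of the OLE Automation date system.
OLE_EPOCH = datetime(1899, 12, 30)


def ole_to_datetime(serial):
    """Convert a non-negative OLE Automation date (days; fraction = time of day)."""
    if serial < 0:
        raise ValueError("negative OLE dates use a different encoding; not handled here")
    return OLE_EPOCH + timedelta(days=serial)
```

For serials from 61 upward (1 March 1900 onward), Excel's 1900 date system gives the same dates as OLE, so ole_to_datetime(42005.5) is noon on 1 January 2015.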
NPOI is complicated: 5 DLLs and 4 MB. Koogra and EDR are simple: about 200 KB and two DLLs each (the library itself and a zip library).
EDR works as an IDataReader, so it reads data sequentially. It also has a built-in function to get a DataSet. With sequential reading you can only go through the first sheet in the workbook. Koogra supports random access to cells and sheets.
EDR is based on SharpZip, Koogra on Ionic.Zip. The former lets you open a file inside a .zip as a Stream, which can be beneficial for other parts of the project.
I haven't looked at writing aspects of NPOI, so without the need to distinguish errors, I would go with EPPlus for .xlsx and with EDR for reading .xls.

A new idea on how to beat the 32,767 text limit in Excel

So, as many others have asked in the past: is there a way to beat the 32k-per-cell limit in Excel?
I have found ways to do it by splitting the workload into two different .txt files and then merging them, but it is a giant PITA, and more often than not I end up just using Excel up to its limits, because I no longer have time to validate the data after the .txt merges. It is a long and tedious process, IMO.
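The manual split can also be done programmatically before the data reaches Excel. A minimal Python sketch, assuming it is acceptable to spread one long value across several adjacent cells and rejoin it later; the function names are hypothetical, and the length check counts Python code points, which may not match Excel's count for characters outside the Basic Multilingual Plane:

```python
CELL_LIMIT = 32767  # maximum number of characters Excel stores in one cell


def split_for_cells(text, limit=CELL_LIMIT):
    """Cut text into consecutive pieces that each fit in one cell."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return [text[i:i + limit] for i in range(0, len(text), limit)] or [""]


def join_cells(pieces):
    """Undo split_for_cells by concatenating the pieces in order."""
    return "".join(pieces)
```

A 70000-character value becomes three cells of 32767, 32767 and 4466 characters.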
However, I think that if the limitation is there, it is there because of how Microsoft coded Excel, and they have yet to raise it (in the 2013 version the limit is still the same, so upgrading would do no good).
I also know that many will say that if you need that much information in a single cell, you should use Access. Well, I have no idea how to use Access or how to import a tab-delimited file into Access like you would into Excel, and even if I could figure that out, I would still have to learn all the new commands and their Excel equivalents, if there even is such a thing.
So I was browsing some blog posts the other day on how to beat limitations by software and I read something about reverse engineering.
Would it be possible to load excel into a hex editor, go in and change every instance of 32767 to something greater?
While 32767 may seem like an arbitrary number, it's actually the upper limit of a 16-bit signed integer (called a short in C). The range of a short goes from -32768 to 32767.
A 16-bit integer can also be unsigned, in which case its range is 0 to 65535.
Since it's impossible for a cell to have a negative number of characters, it seems odd that Microsoft would limit a cell's length based on a signed rather than unsigned 16-bit integer. When they wrote the original program, they probably couldn't imagine anyone storing so much information in a single cell. Using shorts may have simplified the code. (My first computer had only 4K of memory, so it's still amazing to me that Excel can store 8 times that much information in a single cell.)
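Python's struct module packs values using the same fixed-size C types, which makes these ranges easy to check. The sketch also shows the two bytes that 32767 occupies as a little-endian short (FF 7F). That is what a hex-editor search would actually be looking for, and such a short byte pattern is likely to occur in many places that have nothing to do with cell length (assuming the limit is even stored as a literal at all).

```python
import struct

SHORT_MIN, SHORT_MAX = -(2 ** 15), 2 ** 15 - 1   # signed 16-bit: -32768..32767
USHORT_MAX = 2 ** 16 - 1                          # unsigned 16-bit: 0..65535


def fits(fmt, n):
    """True if n can be packed with the given struct format, e.g. '<h' or '<H'."""
    try:
        struct.pack(fmt, n)
        return True
    except struct.error:
        return False


# The little-endian bytes of 32767 as a signed short: what a hex search would look for.
LIMIT_BYTES = struct.pack("<h", SHORT_MAX)
```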
Microsoft may have kept the 32767 limit to maintain backward compatibility with previous versions of Excel. However, that doesn't really make sense, because the row and column counts greatly increased in recent versions of Excel, making large spreadsheets incompatible with previous versions.
Now to your question of reverse-engineering Excel. It would be a gargantuan task, but not impossible. In the early '90s, I reverse-engineered and wrote vaccines for a few small computer viruses (several hundred bytes). In the '80s, I reverse-engineered an 8KB computer chess program.
When reverse-engineering an executable, you'll need a good disassembler or decompiler. Depending on what you use, you may get assembly-language or C code as the output. But note that this will not be commented code, and you will not see meaningful variable or function names. You'll have to read every line of code to determine what it does. And you'll quickly discover that the executable is the least of your worries. Excel's executable links in a number of DLL files, which would also need reverse-engineering.
To be successful, you will need an extensive knowledge of Windows programming in addition to C or Intel assembly code – not to mention a large amount of patience. Learning Access would be a much simpler task.
I'd be interested in why 32767 is insufficient for your needs. A database may make more sense, and it wouldn't necessarily need to duplicate the functionality of Excel. I store information in a database for output to Web pages, in which case I use HTML+JavaScript for anything that needs to be interactive.
In case anyone is still having this issue:
I had the same problem when generating a pipe-separated file of longitudinal research data. The header row exceeded the 32767 limit, which is not an issue unless the end user opens the file in Excel. The workaround is to have the end user open the file in Google Sheets, perform the text-to-columns transformation, then download the file and open it in Excel.
https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Length-limit-of-cell-contents-in-Excel-when-opening-exported-bibliographic-data?language=en_US
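The workaround succeeds because the limit applies per cell: the header line as a whole is longer than 32767 characters, but each field is short once the line is split on the pipe. A minimal Python check, assuming the file is plain pipe-delimited text without unusual quoting, that reports only the fields that would still be too long after splitting:

```python
import csv
import io

CELL_LIMIT = 32767  # Excel's per-cell character limit


def oversized_fields(text, delimiter="|", limit=CELL_LIMIT):
    """Return (row, column, length) for every field longer than one cell can hold.

    Note: Python's csv module has its own per-field limit (csv.field_size_limit(),
    131072 by default) and raises csv.Error for fields beyond it.
    """
    problems = []
    for r, row in enumerate(csv.reader(io.StringIO(text), delimiter=delimiter), start=1):
        for c, field in enumerate(row, start=1):
            if len(field) > limit:
                problems.append((r, c, len(field)))
    return problems
```

A synthetic 5000-column header of 9-character names is 49999 characters long as a single line, yet it reports no oversized fields.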
Jack Straw from Wichita (https://stackoverflow.com/users/10327211/jack-straw-from-wichita) surely you can import a pipe-separated file directly into Excel using Data > Get Data? For me it finds the pipe and treats the file the same way as a CSV. Even if it did not for you, the import has an option to specify the separator used in your text file.
Kind regards
Sefton Hall

How to read excel(2007+ xlsx) sheet using actionscript(AIR)?

as3xls
An Actionscript 3 library for reading and writing Excel files. Currently reading numbers, text, and formulas from Excel version 2.0-2003 and writing numbers, text, and dates to Excel 2.0 is supported. No server-side help is needed.
SUPPORT INFORMATION
Documentation and samples are at http://code.google.com/p/as3xls/
I wrote this: https://github.com/childoftv/as3-xlsx-reader I'd love to know if it helps
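For anyone writing their own reader: an .xlsx file is a zip package of XML parts, so the core of a reader is small. A minimal Python sketch (not childoftv's library), assuming the wanted sheet is stored as xl/worksheets/sheet1.xml and every cell carries an r reference. A complete reader would resolve sheet names through xl/workbook.xml and its relationships file, and would apply number formats to tell dates from numbers.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def _text(node):
    """Text of a string item: a plain <t>, or the concatenated <r><t> rich-text runs."""
    t = node.find("m:t", NS)
    if t is not None:
        return t.text or ""
    return "".join(run.text or "" for run in node.findall("m:r/m:t", NS))


def read_sheet(package, sheet="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: value} for one worksheet; package is a path or binary file object."""
    with zipfile.ZipFile(package) as z:
        shared = []
        if "xl/sharedStrings.xml" in z.namelist():
            sst = ET.fromstring(z.read("xl/sharedStrings.xml"))
            shared = [_text(si) for si in sst.findall("m:si", NS)]
        root = ET.fromstring(z.read(sheet))
    cells = {}
    for c in root.iterfind("m:sheetData/m:row/m:c", NS):
        ref, kind, v = c.get("r"), c.get("t"), c.find("m:v", NS)
        if kind == "s" and v is not None:
            cells[ref] = shared[int(v.text)]  # index into the shared-string table
        elif kind == "inlineStr":
            is_node = c.find("m:is", NS)
            cells[ref] = _text(is_node) if is_node is not None else ""
        elif v is not None:
            cells[ref] = v.text  # numbers, booleans, formula results: raw text
    return cells
```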
Do you have any idea how... inefficient this is?
Excel uses a complex file setup, and unless you want to write a full-scale parser for its spreadsheets (which, believe me, will be difficult, if only because you have to figure out what the format chars do), you'd be better off finding another solution.
Say, using a "save to XML" option would make your job a few thousand times easier, without exaggeration. AS3 has no native support for Excel, and there is no real point in it having such support. But it has great integrated methods for working with XML.
If possible, save the Excel files to XML and parse those.
Better still, use databases, and parse them as XML through PHP.
I did a search and came up with this: http://code.google.com/p/php-excel-reader/
Once you've got it in PHP, passing it on to Flash is no problem at all. I'd recommend turning it into straight arrays of objects and converting it to AMF3 via Zend_Amf, AMFPHP or WebOrb, whichever one you're most comfortable with. You can then create tables, manipulate the data or whatever you like. It'd also be a lot faster and lighter than using XML.
PK
I took a look at the xlsx breakdown, and it would take me 1 week to write an xlsx writer that could do basic formatting and formulas. I've only spent 1 hour perusing the directories in an xlsx file, and all you'd have to do is create the same directory structure, mostly cut and paste some strings, and then zip it and call it .xlsx.
I tried this theory by manually making an xlsx file using 7zip. I downloaded childoftv's reader and, though I don't need the reader, the package includes a few zip/unzip classes that would prove helpful for anyone who wants to make a xlsx writer.
Long story short, the setup isn't complex, somebody just has to take a week out of their busy schedule to do it. I need this functionality so if nobody's done it yet, then I'll have to. Hopefully my search will find something better than a forum where the general consensus is "it's too hard, give up."
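The directory-structure idea can be sketched in a few dozen lines: write the handful of XML parts a minimal workbook needs and zip them. This sketch is in Python rather than ActionScript, supports one sheet of numbers and inline strings only, and skips styles, shared strings and formulas. The part names and content types follow the Office Open XML conventions as I understand them, so treat it as a starting point and check the result in your target Excel version.

```python
import zipfile
from xml.sax.saxutils import escape

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
OFFICE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_TYPES = "http://schemas.openxmlformats.org/package/2006/content-types"
SML = "application/vnd.openxmlformats-officedocument.spreadsheetml"
DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'


def col_letter(n):
    """1 -> A, 26 -> Z, 27 -> AA (bijective base-26 column names)."""
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def cell_xml(ref, value):
    """Numbers go in <v>; everything else becomes an inline string (no sharedStrings part)."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return f'<c r="{ref}"><v>{value}</v></c>'  # assumes finite numbers
    return f'<c r="{ref}" t="inlineStr"><is><t>{escape(str(value))}</t></is></c>'


def write_xlsx(target, rows):
    """Write rows (lists of values) to a one-sheet workbook; target is a path or binary file."""
    row_xml = "".join(
        f'<row r="{r}">'
        + "".join(cell_xml(f"{col_letter(c)}{r}", v) for c, v in enumerate(row, start=1))
        + "</row>"
        for r, row in enumerate(rows, start=1)
    )
    parts = {
        "[Content_Types].xml":
            f'<Types xmlns="{CT_TYPES}">'
            '<Default Extension="rels" '
            'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
            '<Default Extension="xml" ContentType="application/xml"/>'
            f'<Override PartName="/xl/workbook.xml" ContentType="{SML}.sheet.main+xml"/>'
            f'<Override PartName="/xl/worksheets/sheet1.xml" ContentType="{SML}.worksheet+xml"/>'
            "</Types>",
        "_rels/.rels":
            f'<Relationships xmlns="{PKG_REL}">'
            f'<Relationship Id="rId1" Type="{OFFICE_REL}/officeDocument" Target="xl/workbook.xml"/>'
            "</Relationships>",
        "xl/workbook.xml":
            f'<workbook xmlns="{MAIN}" xmlns:r="{OFFICE_REL}">'
            '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets></workbook>',
        "xl/_rels/workbook.xml.rels":
            f'<Relationships xmlns="{PKG_REL}">'
            f'<Relationship Id="rId1" Type="{OFFICE_REL}/worksheet" Target="worksheets/sheet1.xml"/>'
            "</Relationships>",
        "xl/worksheets/sheet1.xml":
            f'<worksheet xmlns="{MAIN}"><sheetData>{row_xml}</sheetData></worksheet>',
    }
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as z:
        for name, xml in parts.items():
            z.writestr(name, DECL + xml)
```

For example, write_xlsx("out.xlsx", [["Name", 42], ["Bob & Co", 3.5]]) writes a two-row sheet, with the ampersand escaped in the XML.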

Free VB6/VBA profiler and best Excel practices

We have a lot of reports that are generated via VBA & Excel. Only a small percentage of the work in these reports is actual calculation; the majority is SQL calls and formatting/writing of cells. The longest report takes several hours, and most take around 20-30 minutes each.
The VBA/Excel code plugs into a dll that the VB6 desktop apps use - it's here that all the sql calls are made. While I am sure that there is room for improvement here, it's not this that concerns me - the desktop apps are fairly snappy.
Two VBA functions are used in abundance: GetRange and SetupCell, and they nearly always appear together. The GetRange function is a wrapper for the Excel.Range object. It takes a sheet and 4 values for the extents of the range. Its main use is to pick the cell for editing. There doesn't appear to be much chance of optimising it, but is it the best way?
Its partner is SetupCell. This takes an Excel.Range object, text and a dozen parameters about the cell (font, borders, etc.). Most of these parameters are optional booleans, but again, it seems very wasteful. Some of these can be set afterwards, but some depend on the values contained in the cell.
There's quite a lot of code in these functions, mainly If statements, and work won't appreciate me posting it.
I guess I've got two questions: is there a better way, and what is it? And is there a free profiler I can use to see whether the bulk of the time is spent here or in the dll?
Several hours is ridiculous for a report.
If the problem is VBA, buy "Professional Excel Development" (Stephen Bullen, Rob Bovey et al.): it includes a free VBA profiler called PerfMon.
If the problem is Excel calculation, see http://msdn.microsoft.com/en-us/library/aa730921.aspx?ppud=4
But I would guess that the problem is the high overhead of referencing things cell by cell: you should always work with large blocks of cells at a time.
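The cost of cell-by-cell access comes from the fixed overhead of each call into Excel, not from the data itself. Here is a Python model of that, where CountingSheet is a hypothetical stand-in that only counts calls (it is not an Excel API): writing 1000 rows by 10 columns one cell at a time makes 10000 calls, while writing the same array as one block makes a single call. In VBA the block write corresponds to assigning a two-dimensional Variant array to Range(...).Value in one statement.

```python
class CountingSheet:
    """Hypothetical sheet object: each method call stands for one round trip into Excel."""

    def __init__(self):
        self.calls = 0
        self.cells = {}

    def set_cell(self, row, col, value):
        self.calls += 1
        self.cells[(row, col)] = value

    def set_block(self, top, left, values):
        """One call that writes a whole 2-D array, like Range(...).Value = arr in VBA."""
        self.calls += 1
        for r, row_values in enumerate(values):
            for c, value in enumerate(row_values):
                self.cells[(top + r, left + c)] = value


def write_cell_by_cell(sheet, data):
    for r, row_values in enumerate(data):
        for c, value in enumerate(row_values):
            sheet.set_cell(1 + r, 1 + c, value)


def write_as_block(sheet, data):
    sheet.set_block(1, 1, data)
```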
Have you thought about using an actual reporting solution? What's your backend DB? If you are using MSSQL 2000 or higher, there is a fairly decent reporting solution you can use free of charge: SQL Server Reporting Services.
It sounds as if the reports are spending most of their time formatting cells. This could be why the reports seem so slow and the desktop app doesn't.
Alternatively, if you know the formatting before hand and it is fairly static, you could pre-format the sheets to cut down on some of the work.
I will throw this in there as well: most reporting solutions allow for conditional formatting and such, and since they are designed for that, performance will be much better than having Excel do it.
This isn't a profiler recommendation, but it is a suggestion for speeding up Excel macros that are spending their time updating the screen. I've had excellent results by turning off screen updating while the macro is running (set Application.ScreenUpdating = False) and also by using a number of other similar settings. Just be sure to turn them back on again when the macro finishes :P
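The "turn them back on" part is easiest to get right with a save-and-restore pattern that also runs when the macro fails (in VBA, an error handler that resets the settings before exiting). A Python sketch of the pattern, where the target is any object with settable attributes; FakeApplication is a hypothetical stand-in, not the Excel object model:

```python
from contextlib import contextmanager


@contextmanager
def temporarily(obj, **settings):
    """Apply attribute settings for the duration of a with-block, then restore
    the previous values, even if the block raises."""
    saved = {name: getattr(obj, name) for name in settings}
    try:
        for name, value in settings.items():
            setattr(obj, name, value)
        yield obj
    finally:
        for name, value in saved.items():
            setattr(obj, name, value)


class FakeApplication:
    """Hypothetical stand-in for Excel's Application object (not the real COM API)."""
    ScreenUpdating = True
```

Wrapping the report code in "with temporarily(app, ScreenUpdating=False):" guarantees the setting is restored whether the block finishes or raises.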
It's not free but you can profile with this. I suspect the demo will be adequate to your needs: http://www.aivosto.com/vbwatch.html
It sounds like the VBA code (or the VB code that's writing to the sheets) is doing so line by line; this can take ages and is poor design. Write to Excel as a Variant array in one go, and format the sheet after all the data is imported.
Thanks
Ross
