I'm trying to read an Excel sheet from an XLS or XLSX file in memory using Delphi 7. When possible I use automation to read the cells one by one, but when Excel is not installed, I revert to using the ADO/ODBC Jet driver.
I connect using either
Provider=Microsoft.Jet.OLEDB.4.0; Data Source=file.xls;Extended Properties="Excel 8.0;Persist Security Info=False;IMEX=1;HDR=No";
Provider=Microsoft.ACE.OLEDB.12.0; Data Source=file.xlsx;Extended Properties="Excel 12.0;Persist Security Info=False;IMEX=1;HDR=No";
My problem then is that when I use the following query:
SELECT * FROM [SheetName$]
the returned results do not contain the empty rows or empty columns, so if the sheet contains such rows or columns, the following cells are shifted and do not end up in their correct position. I need the sheet to be loaded "as is", i.e. I need to know exactly which cell position each value comes from.
I tried to read the cells one by one by issuing one query of the form
SELECT F1 FROM `SheetName$A1:A1`
but now the driver returns an error saying "There is data outside the selected region". btw I had to use backticks to enclose the name because using brackets like this [SheetName$A1:A1] gave out a syntax error message.
Is there a way to tell the driver to select the sheet as-is, without skipping blanks? Or maybe a way to know from which cell position each value is returned?
For internal policy reasons (I know they are bad but I do not decide these), it is not possible to use a third party library, I really need this to work from standard Delphi 7 components.
I assume that if your data is say in the range B2:D10 for example, you want to include the column A as an empty column? Maybe? Is that correct? If that's the case, then your data set, when you read the sheet (SELECT * FROM [SheetName$]) would also return 1 million rows by 16K columns!
Can you not execute a query like: SELECT * FROM [SheetName$B2:D10] and use the ADO GetRows function to get an array - which will give you the size of the data. Then you can index into the array to get what data you want?
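To illustrate the suggestion above, here is a minimal Python sketch of the bookkeeping involved: once a query is issued against a known range such as B2:D10, each 0-based offset into the returned array maps back to a worksheet cell. The helper names (column_letters, column_number, cell_address) are made up for this example, and the sketch assumes the driver returns the range without dropping rows or columns inside it. Note that ADO's GetRows returns the array indexed as (field, record), so the column offset comes first there.

```python
import re


def column_letters(index):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def column_number(letters):
    """Convert Excel column letters to a 1-based column number (A -> 1)."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def cell_address(range_ref, row_offset, col_offset):
    """Return the A1 address of the cell at 0-based offsets inside range_ref."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", range_ref)
    if match is None:
        raise ValueError("expected a range such as B2:D10")
    first_col = column_number(match.group(1))
    first_row = int(match.group(2))
    return f"{column_letters(first_col + col_offset)}{first_row + row_offset}"


# The top-left and bottom-right cells of the B2:D10 example range.
top_left = cell_address("B2:D10", 0, 0)
bottom_right = cell_address("B2:D10", 8, 2)
```

The same arithmetic ports directly to Delphi; only the range's first row and first column need to be remembered when the query is built.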
OK, the correct answer is
Use a third party library no matter what your boss says. Do not even try ODBC/ADO to load arbitrary Excel files, you will hit a wall sooner or later.
It may work for Excel files that contain a single data table, but not when you want to cherry-pick data in a sheet primarily made for human consumption (i.e. where a single column contains some cells with introductory text, some with numerical data, some with comments, etc.)
Using IMEX=1 ignores empty lines and empty columns
Using IMEX=0 sometimes no longer ignores empty lines, but now some of the first non-empty cells are considered field names instead of data, despite HDR=No. It would not work anyway, since values in a column are of mixed types.
Explicitly looping across cells and making a SELECT * FROM [SheetName$A1:A1] works until you reach an empty cell, then you get access violations (see below)
Access violation at address 1B30B3E3 in module 'msexcl40.dll'. Read of address 00000000
I'm too old to want to try and guess the appropriate value to use so it works until someone comes with yet another mix of data in a column. Sorry for having wasted everybody's time.
TL;DR: I'm basically trying to obtain a column range such as 'Sheet 1'!$A:$A where the A is obtained by matching the contents of a given cell to a 1:1 range within a sheet referenced by another given cell, for use in a dynamic range.
In the highly probable case where that made zero sense, here's an illustration:
PARAMETERS: A2 = "LIST" | C2 = "FirstName" | Desired result: 'LIST'!$A:$A
And I've obtained that, BUT, I can't use that output ('LIST'!$A:$A) within formulas (namely to create a dynamic range). For instance, here 'LIST'!$A:$A contains 101 cells with values in them:
V3 = NamedFormula = 'LIST'!$A:$A
COUNTA(INDIRECT(V3)) = 101
COUNTA(INDIRECT(NamedFormula)) = 1 because it evaluates to #VALUE and that is a singular result.
Before delving into the topic of using INDIRECT with a Named Range (which I've read about and am still getting over my confused grief), I'm realizing my Names are getting a bit out of hand. I tend to use Excel like a mad scientist. So, in case there's a much simpler solution to what I'm trying to do, here's my actual mission:
0. I'm building a tool to simplify a process where email addresses are built from different data, which needs to run without any scripts, only formulas.
1. A tab with no imposed name would contain a user database with minimally (firstname and lastname OR IDs) AND (potentially other data columns) in no specific order. Tool users would import that tab from wherever the data got to them depending on the client, and would only need to copy-paste relevant headers to the main tab without changing anything else here for data integrity.
2. The main tab would have specific input fields where tool users would paste in the name of the imported tab as well as the labels of the columns they need (for instance, the labels in the first row of the columns containing the first name and the last name), and an input field for the domain name to use to build those email addresses.
3. A Data tab is referenced for cleaning and preparing strings for email address formats.
4. The Export tab would spew out a list of clean email addresses that can be exported to CSV.
The Data tab is just 2 columns to use with SUBSTITUTE so that for instance apostrophes are removed but accented letters are normalized (é -> e). I've used LAMBDA within Names to get there. The problem is to tie everything in - to get those Named ranges into the final formula.
The Names I'm using so far (I'd like to use fewer but testing specific parts extended beyond simple usage I fear):
ALPH ={"A";"B";"C";"D";"E";"F";"G";"H";"I";"J";"K";"L";"M";"N";"O";"P";"Q";"R";"S";"T";"U";"V";"W";"X";"Y";"Z"}
LABELS =LAMBDA(labelname,ADDRESS(2,MATCH(labelname,INDIRECT("'"&PARAMETERS!$A$2&"'!$1:$1"),0),1,1,PARAMETERS!$A$2))
RANGECOL =LAMBDA(labelname,COLUMN(INDIRECT(LABELS(labelname))))
RNCOL =LAMBDA(label,"'"&PARAMETERS!$A$2&"'!$"&INDEX(ALPH,RANGECOL(label))&":$"&INDEX(ALPH,RANGECOL(label)))
I haven't tied everything in the Data tab yet - I'm still trying to automate my main tab before pushing further and using the Data tab substitutions on top of everything. That will be the next step, not my current focus. But, for the curious and interested, on the Data tab I'm using something I found on ablebits which works wonders =]
So, now if I use the offset range with a static LIST!A:A it works:
=IF($C$2<>"",LOWER(INDEX(OFFSET(INDIRECT(ADDRESS(2,MATCH($C$2,INDIRECT("'"&$A$2&"'!$1:$1"),0),1,1,$A$2)),0,0,COUNTA(LIST!A:A)-1,1),ROW())),"") &IF($C$3<>"","."&LOWER(INDEX(OFFSET(INDIRECT(ADDRESS(2,MATCH($C$3,INDIRECT("'"&$A$2&"'!$1:$1"),0),1,1,$A$2)),0,0,COUNTA(LIST!A:A)-1,1),ROW())),"") &"#"&$C$4
But when I try to use the dynamic RNCOL($C$3) it does not:
=IF($C$2<>"",LOWER(INDEX(OFFSET(INDIRECT(LABELS($C$2)),0,0,COUNTA(INDIRECT(RNCOL($C$2)))-1,1),ROW())),"") &IF($C$3<>"","."&LOWER(INDEX(OFFSET(INDIRECT(LABELS($C$3)),0,0,COUNTA(INDIRECT(RNCOL($C$3)))-1,1),ROW())),"") &"#"&$C$4
This just gives #REF, and evaluating shows the digression starting at INDIRECT(RNCOL($C$3)) equating to #VALUE.
I'm starting to see double here but my undying and completely normal love for Excel prevents me from going home from work as I'm way too far down the rabbit hole to let my obsession die here.
Any pointers as to how this can work?
Note - all of the names in the supplied sheet were generated by an online fake name generator, nothing in here is actual user data #GDPR
Thanks in advance! <3
Test sheet is available via Google Drive.
Your current set-up is not good for many reasons, and in my opinion would require a complete overhaul, the scope of which lies beyond a response on this website.
As to a 'quick fix' to your current issue, the reason your formula in E1 is currently returning an error is due to the fact that, as you can see via stepping through with the Evaluate Formula tool, the part
COUNTA(INDIRECT(RNCOL($C$2)))-1
is resolving to
COUNTA(INDIRECT({"'LIST'!$A:$A"}))-1
and this is not the same as
COUNTA(INDIRECT("'LIST'!$A:$A"))-1
in that the value being passed to INDIRECT is an array in the former though not in the latter. Although INDIRECT can accept arrays, it is only within certain constructions in conjunction with other suitable functions; here it will simply error.
And the reason that it is returning an array is due to the fact that RNCOL($C$2) is returning an array, and that is because that function is defined as
=LAMBDA(label,"'"&PARAMETERS!$A$2&"'!$"&INDEX(ALPH,RANGECOL(label))&":$"&INDEX(ALPH,RANGECOL(label)))
and, since RANGECOL($C$2) resolves to 1 here, the above is equivalent to
"'"&PARAMETERS!$A$2&"'!$"&INDEX(ALPH,1)&":$"&INDEX(ALPH,1)
Here, because you are omitting the column_num parameter from INDEX, the part
INDEX(ALPH,1)
is resolving to
{"A"}
which is an array (albeit one comprising a single value) and technically different from
"A"
In most circumstances, this is not an issue. As such, it is almost always unnecessary to pass both a row_num and column_num parameter to INDEX when indexing a one-dimensional array. Here, however, it matters.
You can resolve this by explicitly including a column_num parameter, i.e. redefine RNCOL as
=LAMBDA(label,"'"&PARAMETERS!$A$2&"'!$"&INDEX(ALPH,RANGECOL(label),1)&":$"&INDEX(ALPH,RANGECOL(label),1))
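For readers who find the chain of Names hard to follow, here is a rough Python sketch of what LABELS, RANGECOL and RNCOL compute together: find a header label in row 1 and build a whole-column reference string from it. The function name column_reference and the sample headers are assumptions made for illustration, and like the ALPH list in the question the sketch only covers columns A to Z. Indexing a Python list yields a plain string, which corresponds to the scalar that INDEX(ALPH, n, 1) returns rather than the one-element array.

```python
import string

# Same idea as the ALPH name in the question: the 26 single-letter columns.
ALPH = list(string.ascii_uppercase)


def column_reference(sheet_name, headers, label):
    """Build a reference such as 'LIST'!$A:$A for the column whose header is label."""
    position = headers.index(label)  # raises ValueError if the label is missing
    if position >= len(ALPH):
        raise ValueError("this sketch only handles columns A to Z")
    letter = ALPH[position]  # a plain string, not a one-element list
    return f"'{sheet_name}'!${letter}:${letter}"


# Hypothetical header row for the LIST sheet.
headers = ["FirstName", "LastName", "ID"]
first_name_column = column_reference("LIST", headers, "FirstName")
last_name_column = column_reference("LIST", headers, "LastName")
```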
Currently the stock process at my company is very manual, and it normally doesn't get carried out because it is rather boring. It is currently all Excel-based, and I am slowly moving it over to SQL, which will update the information automatically.
We have come up with a naming system/code for each item, which is made up from several fields in the Excel document. However, the same codes appear in different columns, and we wish to remove these duplicates when we push into SQL (basically we just want one line per item and a count of how many times it has been used).
It has to be dynamic (I can add an extra tab to the Excel document to do any magic required) and, if possible, not use any macros.
So the data starts like this:
#Counts and then the duplicates are removed to produce this list
I have tried a range of COUNTIFS/VLOOKUPs and I can get it roughly working, but it's not dynamic enough and I end up with multiple rows of 0 Qtys.
Hopefully this is enough information
Cheers all
It looks like a very similar question was answered here.
After plugging in that formula in a different column, you can use the CountIf function in the next column.
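Since the data is headed for SQL anyway, the deduplicate-and-count step can also be done outside Excel. The following is a minimal Python sketch under the assumption that the codes have already been read into rows of cell values; the sample codes and the function name count_codes are invented for illustration. Blank cells are skipped, so no zero-quantity rows appear in the result.

```python
from collections import Counter


def count_codes(rows):
    """Count how often each non-blank code occurs across all columns."""
    counts = Counter()
    for row in rows:
        for code in row:
            if code is None:
                continue
            code = str(code).strip()
            if code:  # ignore empty cells
                counts[code] += 1
    return counts


# Hypothetical sheet contents: the same code can appear in any column.
rows = [
    ["AB-100", "CD-200", ""],
    ["CD-200", "AB-100", "EF-300"],
    ["AB-100", None, ""],
]

# One line per distinct code, with its usage count.
summary = sorted(count_codes(rows).items())
```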
I have an Excel sheet with a very wide table on it. For developer friendliness I'd like to use a certain style of column header naming (much like proper Hungarian notation), where I suffix each header name with "column type" tags. This allows me to easily spot where e.g. apples and oranges are compared. There are also pivot table reports based on this table.
An example to illustrate this: say you have 2 monetary columns, column A being expressed in another currency than column B. The model should thus never combine them without first applying appropriate exchange rates. To spot this I name these columns e.g. Earned - Cur1 and Saved - Cur2. Any calculation like =[#[Earned - Cur1]] + [#[Saved - Cur2]] is illegal, but due to the tags this can be picked up easily in an audit. I have several such tag groups in use already, and they already prevented some errors creeping in.
However...
The file also needs to be distributed to lots of not-so-savvy end users, and they need to fill in this table and refer to some of the outcome columns. Most intermediate columns we already hide, but the column names are now far from being user-friendly (like: fill out Actual - NK/Q1/EC/%, please?).
And this needs to run in Excel 2010.
What are my options?
Option 1
Add an extra row above the table, putting human readable names in there, and just hide the table header row. This works, but now the users can't sort and filter the table anymore, so that's a no-go.
Option 2
Augment option 1 by prepending a newline to each column name, and make the table header row 1 character high. The header cells would still be there to drive sorting and filtering and the users have human readable names in the row above. The actual header cells would appear like 'empty' buttons. Could work, but then the complex formulas become unreadable due to all the newlines from the column names all over the place.
Option 3
Add a macro that swaps the headers in the table with alternative headers in another row above the table. The macro should be run just before sending out the file to the users, and run again when they return it filled in. I happily coded this option into the file, and it works wonderfully! But then I realized this (and thus option 2 as well) breaks all the derived pivot tables, since Excel links the data by the names used in the table - update the name, and that section of the pivot will be dropped...
I'd really like the option of having our development-oriented column names in there when we ourselves work with the file, but being able to switch out the headers when needed. And of course without rebuilding all the pivots after each such switch.
An opening here would be that pivots seem to only drop the columns once they're refreshed. I could use this to update the header names, then do some magic on the pivots to remap their fields, and only then refresh them, but it seems there's no way from within VBA to accomplish that (PivotField.SourceName is read only).
Hopefully someone can think of an alternative, or am I SOL? I'm totally open to other workarounds.
Workaround 1
Insert null-terminating characters in the header names such that they show normally in the formulas, but do not show in the table header row. If only it were that simple though... Turns out Excel chokes on =Char(0)&"abc", and things like =Char(8)&"abc" (tab anyone?) give Unicode replacement characters when pasted into a header cell... (?)
Workaround 2
A last resort seems to be to unzip the excel file, and plough through the xml data to update everything in one go there, then rezip the file. But this code also needs to be executed by less skilled users, and I see too many ifs and buts to make me feel safe using this setup.
Workaround 3
For now I just use a variation on option 2; I have some VBA that 'empties' the header cells instead of prepending a newline to them. By 'emptying' I mean setting the font size to 1, subscript, non-bold, and then making the font color identical to the background color, followed by setting the row height to the default 14.5. The cryptic names do leak out however; the column header drop-down arrows for sorting and filtering show the cryptic name, as do the pivot field settings and of course the formula bar when you just click such a cell. But I guess it's the best I can do?
And then again I'm probably just perfecting this thing faaar too much :) But from this point on it's about the challenge!
Make sure you tick the box "Add this data to the Data Model" when creating your pivot(s).
AFAIK when your Pivots are connected to the Datamodel instead of directly to the Range/Table you can change your column-names in the Table and your Pivot will stay fine. You could even use other names in your Pivot.
Basically I am trying to improve a spreadsheet that currently uses fixed IF functions nested within IF functions to determine where to find data, and originally used the VLOOKUP function to return the appropriate cable cleat size, where "Cleat Diameter">"Cable Diameter".
I've been using this for a while, however Excel quickly runs out of resources with all the remaining calculations being performed. As a result, I've opted to put all data in a single table, and try to use the MATCH function to retrieve the necessary row, then simply use the INDIRECT function to retrieve data from the appropriate column of the associated row.
Unfortunately I believe the issue relates to the fact that I first need to perform a MATCH of type 0 (exact match), followed by a type -1 match on the size to identify the next size up that can accommodate a specific cable size.
I've managed a simple lookup on another dataset using (for exact matches):
=MATCH($B3,'Current Raw Data'!A:A,0)+ROW('Current Raw Data'!A:A)-1
However, when I attempt the same thing with two types of matches I get errors. The closest I get is with the following array formula, but it does not work unless the data set is arranged so that the contents of cell C3 is the first occurring item in the dataset in column A:A:
{=MATCH(C3,($B3='Lookup - Cleats'!A:A)*('Lookup - Cleats'!B:B),-1)}
Main sheet:
Dataset Example:
With this array formula (press Ctrl + Shift + Enter together inside the formula bar), you should be able to get your results:
=IFERROR(INDEX('Lookup - Cleats'!C$3:C$26,MATCH($B3&$C3,'Lookup - Cleats'!$A$3:$A$26&'Lookup - Cleats'!$B$3:$B$26,0)),"")
I tried my best to use your data setup, but I may have missed one or two things that you will need to adjust accordingly. Let me know if this is not working.
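The formula above matches both criteria exactly. The question also describes a "next size up" requirement, and that logic can be written out as a short Python sketch: filter on the exact cleat type, then take the smallest diameter that is at least the cable diameter (the same rule MATCH applies with match type -1). The table contents, the column order and the name find_cleat are assumptions for illustration, not taken from the asker's workbook.

```python
def find_cleat(rows, cleat_type, cable_diameter):
    """Return the size label of the smallest cleat of cleat_type that fits the cable.

    rows holds (type, cleat diameter, size label) tuples in any order.
    Returns None when no cleat of that type is large enough.
    """
    candidates = [
        row for row in rows
        if row[0] == cleat_type and row[1] >= cable_diameter
    ]
    if not candidates:
        return None
    # The row order does not matter because the minimum is taken explicitly.
    return min(candidates, key=lambda row: row[1])[2]


# Hypothetical lookup table, deliberately unsorted.
lookup = [
    ("Type X", 20.0, "X-20"),
    ("Type Y", 16.0, "Y-16"),
    ("Type X", 12.0, "X-12"),
    ("Type X", 16.0, "X-16"),
]

next_size_up = find_cleat(lookup, "Type X", 13.5)
no_fit = find_cleat(lookup, "Type Y", 18.0)
```

Taking the minimum explicitly avoids the dependence on sort order that the array formula in the question runs into.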
The original value was 33266500,332665100,332665200,332665300, and the cell should show exactly that, but what I see as the cell value in Excel is 3.32665E+34.
So my question is how to convert it back into the original string. I found the Format function on Google and used it like this:
format(3.32665E+34,"standard")
which gives 332,6650,033,266,510,000,000,000
How can I parse it or get back the original string? I believe Format is a VBA function.
Excel has a 15 digit precision limit. If the numbers are already shown like this when you access the file, there is no way to get the number back - you have already lost some digits. VBA code and formulas will not help you.
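The loss can be reproduced outside Excel. This small Python sketch assumes that Excel read the commas as thousands separators and stored the 35-digit result as a double-precision number, which is the same IEEE 754 type as Python's float (Excel additionally limits numbers to 15 significant digits). Converting back to text does not return the original digits.

```python
original = "33266500,332665100,332665200,332665300"

# What the value becomes once the commas are treated as thousands separators.
digits = original.replace(",", "")

# Store it as a double, as a spreadsheet would for a numeric cell.
as_float = float(digits)

# Write the stored number back out in full, without scientific notation.
round_trip = f"{as_float:.0f}"

# False: the trailing digits were lost when the number was stored.
survived = round_trip == digits
```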
If this is not the case, you can add a single quote ' mark before the number to store it as text. This will ensure Excel does not try to treat it as a number and thus lose precision.
If you want the value kept exactly, store the data as a string, not as a number. The data type you are using simply doesn't have the ability to do what you are asking it to do.
If you're starting with an Excel file that has already been created then you've already lost the information: Excel has tried to understand what it was given and its best guess has turned out to be wrong. All you can do (if you can't get the source data) is go back to the creator of the Excel file and tell them what's wrong.
If you're starting with, say, a text file that you're importing, then the news is much better:
If you're importing manually using the Text Import Wizard, then at "Step 3 of 3" you need to set "Column Data Format" for the problem field to "Text".
If you're using a macro, you'll need to specify a value for the TextFileColumnDataTypes property that does the same thing. The easiest way to get it right is to use the Macro Recorder.
If you want the four values in the string to be separate cells, then again, look at the Text Import Wizard settings: in Step 1 of 3 you need to set "Delimited" data type (usually the default) and in Step 2 make sure that "Comma" is checked.
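The same idea, splitting on commas while keeping every field as text, looks like this in a short Python sketch using the standard csv module. The sample line is the value from the question and the function name read_text_fields is made up for this example; csv.reader never converts fields to numbers, so no digits are lost.

```python
import csv
import io


def read_text_fields(text):
    """Parse comma-delimited text and keep every field as a string."""
    reader = csv.reader(io.StringIO(text))
    return [row for row in reader]


sample = "33266500,332665100,332665200,332665300\n"
rows = read_text_fields(sample)

# Joining the fields again reproduces the original line exactly.
rebuilt = ",".join(rows[0])
```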
The value needs to be entered into the cell as a string. You need to make whatever inserts the value precede it with a '.