TL;DR: I'm basically trying to obtain a column range such as 'Sheet 1'!$A:$A where the A is obtained by matching the contents of a given cell to a 1:1 range within a sheet referenced by another given cell, for use in a dynamic range.
In the highly probable case where that made zero sense, here's an illustration:
PARAMETERS: A2 = "LIST" | C2 = "FirstName" | Desired result: 'LIST'!$A:$A
And I've obtained that, BUT, I can't use that output ('LIST'!$A:$A) within formulas (namely to create a dynamic range). For instance, here 'LIST'!$A:$A contains 101 cells with values in them:
V3 = NamedFormula = 'LIST'!$A:$A
COUNTA(INDIRECT(V3)) = 101
COUNTA(INDIRECT(NamedFormula)) = 1 because it evaluates to #VALUE and that is a singular result.
Before delving into the topic of using INDIRECT with a Named Range (which I've read about and am still getting over my confused grief), I'm realizing my Names are getting a bit out of hand. I tend to use Excel like a mad scientist. So, in case there's a much simpler solution to what I'm trying to do, here's my actual mission:
0. I'm building a tool to simplify a process where email addresses are built from different data, which needs to run without any scripts, only formulas.
1. A tab with no imposed name would contain a user database with minimally (firstname and lastname OR IDs) AND (potentially other data columns) in no specific order. Tool users would import that tab from wherever the data got to them depending on the client, and would only need to copy-paste relevant headers to the main tab without changing anything else here for data integrity.
2. The main tab would have specific input fields where tool users would paste in the name of the imported tab as well as the labels of the columns they need (for instance, the labels in the first row of the columns containing the first name and the last name), and an input field for the domain name to use to build those email addresses.
3. A Data tab is referenced for cleaning and preparing strings for email address formats.
4. The Export tab would spew out a list of clean email addresses that can be exported to CSV.
The Data tab is just 2 columns to use with SUBSTITUTE so that for instance apostrophes are removed but accented letters are normalized (é -> e). I've used LAMBDA within Names to get there. The problem is to tie everything in - to get those Named ranges into the final formula.
The Names I'm using so far (I'd like to use fewer but testing specific parts extended beyond simple usage I fear):
ALPH ={"A";"B";"C";"D";"E";"F";"G";"H";"I";"J";"K";"L";"M";"N";"O";"P";"Q";"R";"S";"T";"U";"V";"W";"X";"Y";"Z"}
LABELS =LAMBDA(labelname,ADDRESS(2,MATCH(labelname,INDIRECT("'"&PARAMETERS!$A$2&"'!$1:$1"),0),1,1,PARAMETERS!$A$2))
RANGECOL =LAMBDA(labelname,COLUMN(INDIRECT(LABELS(labelname))))
RNCOL =LAMBDA(label,"'"&PARAMETERS!$A$2&"'!$"&INDEX(ALPH,RANGECOL(label))&":$"&INDEX(ALPH,RANGECOL(label)))
I haven't tied everything in the Data tab yet - I'm still trying to automate my main tab before pushing further and using the Data tab substitutions on top of everything. That will be the next step, not my current focus. But, for the curious and interested, on the Data tab I'm using something I found on ablebits which works wonders =]
So, now if I use the offset range with a static LIST!A:A it works:
=IF($C$2<>"",LOWER(INDEX(OFFSET(INDIRECT(ADDRESS(2,MATCH($C$2,INDIRECT("'"&$A$2&"'!$1:$1"),0),1,1,$A$2)),0,0,COUNTA(LIST!A:A)-1,1),ROW())),"") &IF($C$3<>"","."&LOWER(INDEX(OFFSET(INDIRECT(ADDRESS(2,MATCH($C$3,INDIRECT("'"&$A$2&"'!$1:$1"),0),1,1,$A$2)),0,0,COUNTA(LIST!A:A)-1,1),ROW())),"") &"#"&$C$4
But when I try to use the dynamic RNCOL($C$3) it does not:
=IF($C$2<>"",LOWER(INDEX(OFFSET(INDIRECT(LABELS($C$2)),0,0,COUNTA(INDIRECT(RNCOL($C$2)))-1,1),ROW())),"") &IF($C$3<>"","."&LOWER(INDEX(OFFSET(INDIRECT(LABELS($C$3)),0,0,COUNTA(INDIRECT(RNCOL($C$3)))-1,1),ROW())),"") &"#"&$C$4
This just gives #REF, and evaluating shows the digression starting at INDIRECT(RNCOL($C$3)) equating to #VALUE.
I'm starting to see double here but my undying and completely normal love for Excel prevents me from going home from work as I'm way too far down the rabbit hole to let my obsession die here.
Any pointers as to how this can work?
Note - all of the names in the supplied sheet were generated by an online fake name generator, nothing in here is actual user data #GDPR
Thanks in advance! <3
Test sheet is available via Google Drive.
Your current set-up is not good for many reasons, and in my opinion would require a complete overhaul, the scope of which lies beyond a response on this website.
As to a 'quick fix' to your current issue, the reason your formula in E1 is currently returning an error is due to the fact that, as you can see via stepping through with the Evaluate Formula tool, the part
COUNTA(INDIRECT(RNCOL($C$2)))-1
is resolving to
COUNTA(INDIRECT({"'LIST'!$A:$A"}))-1
and this is not the same as
COUNTA(INDIRECT("'LIST'!$A:$A"))-1
in that the value being passed to INDIRECT is an array in the former though not in the latter. Although INDIRECT can accept arrays, it is only within certain constructions in conjunction with other suitable functions; here it will simply error.
And the reason that it is returning an array is due to the fact that RNCOL($C$2) is returning an array, and that is because that function is defined as
=LAMBDA(label,"'"&PARAMETERS!$A$2&"'!$"&INDEX(ALPH,RANGECOL(label))&":$"&INDEX(ALPH,RANGECOL(label)))
and, since RANGECOL($C$2) resolves to 1 here, the above is equivalent to
"'"&PARAMETERS!$A$2&"'!$"&INDEX(ALPH,1)&":$"&INDEX(ALPH,1)
Here, because you are omitting the column_num parameter from INDEX, the part
INDEX(ALPH,1)
is resolving to
{"A"}
which is an array (albeit one comprising a single value) and technically different from
"A"
In most circumstances, this is not an issue. As such, it is almost always unnecessary to pass both a row_num and column_num parameter to INDEX when indexing a one-dimensional array. Here, however, it matters.
You can resolve this by explicitly including a column_num parameter, i.e. redefine RNCOL as
=LAMBDA(label,"'"&PARAMETERS!$A$2&"'!$"&INDEX(ALPH,RANGECOL(label),1)&":$"&INDEX(ALPH,RANGECOL(label),1))
I have a data frame, say from A1:C100, where each cell is a value (not derived from any formula) that happens to be stored as a percentage. I copy and paste values of a column in the data frame (say Column A), sort it from largest to smallest, and then paste it in say Column E. I then use =MATCH(E1, $A$1:$A$100, 0), which works as expected and returns the correct row.
However, if I then add a constant to every value in E, say column F is =E+1, and use =MATCH(F1-1, $A$1:$A$100, 0) about 90% of the values will still be correct, but some return #N/A.
How can I work around this without changing the original data frame? I have already tried rounding data to various precision points (for example =MATCH(ROUND(F1-1,4), $A$1:$A$100, 0)), or using non-exact matching (for example =MATCH(F1-1, $A$1:$A$100, 1) or even something like =MATCH(F1-.999, $A$1:$A$100, -1)) but no luck.
Any other suggestions/anyone else ever encounter something like this? What is the underlying issue?
The problem with your precision correction is that it is only being applied to the 'lookup' component of the MATCH function. To get a more accurate 'match' rate, I'd advise you to either round the values being 'searched' at source, or simply deploy the following:
= Match(round(range_1,6), round(range_2,6), 0)
where you would substitute range_1, range_2 with the relevant/respective ranges per your example/q.
Note - the advantage of doing this 'at source' (i.e. inserting an additional column, say, range_3, where range_3 = round(range_2,6), then substitute range_2 with range_3 in the eqn. above) concerns computational speed (this would be far quicker, especially if range_2 is substantially long, e.g. >30k rows). The reason for this should be obvious: calculations within the match function are repeated for each and every match / cell that is performed, whereas the 'at source' version would only apply the calculation 'once' (i.e. across all rows/cells in question).
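To illustrate the underlying issue, here is a minimal Python sketch (Python is used purely for illustration; its floats are IEEE 754 doubles, the same storage format Excel uses, although Excel applies some adjustments of its own, so the exact cells that fail may differ). The helper exact_match and the sample values are hypothetical; the helper mimics MATCH(..., 0) and shows that adding and then subtracting 1 can change the stored value, and that rounding both sides restores the match.

```python
def exact_match(lookup, values):
    """Return the 1-based position of the first value equal to lookup, or None (like #N/A)."""
    for position, value in enumerate(values, start=1):
        if value == lookup:
            return position
    return None

# Hypothetical stand-ins for column A and for column F (= value + 1).
column_a = [0.5, 0.1, 0.25]
column_f = [v + 1 for v in column_a]

# Plain lookup of F - 1: fails wherever (v + 1) - 1 is not bit-for-bit equal to v.
plain = [exact_match(f - 1, column_a) for f in column_f]

# Rounding BOTH the lookup value and the searched values to 6 decimal places.
rounded_a = [round(v, 6) for v in column_a]
rounded = [exact_match(round(f - 1, 6), rounded_a) for f in column_f]

print(plain)
print(rounded)
```

Rounding only the lookup value cannot help when the stored values carry more digits than the rounding keeps, which is why both sides are rounded here.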
Hope this makes sense, and best of luck with your endeavours.
PS - change round(range,6) to round(range,4) if necessary. I usually like 6 dps when using something like this in conditional formatting for validation reasons...
Ta,
J
I am fairly new to Excel and only use it for a hobby. I have noticed that when I attempt to add an entire column of values together, one value is not included in the addition. It is also the only 'non-defined' value of the column (i.e. it is calculated using a formula rather than being entered directly). When I go to edit the formula, I see that the value in question (24.99 in the screenshot below) seems to be a string, given the speech marks either side of it, which would explain why it isn't in the addition.
I am confused as when I reference only that value in a sum, the value is included in the sum, as seen below:
Before it is suggested, I have experimented with using different data types for the value in the cell, including 'Currency', which is what the rest of the cells in the column are, as well as 'Number', 'Accounting', and 'General'. One thing that is strange is that the £ symbol never appears at the front of the number, no matter what data type it's cast as.
For those that are curious, the formula that is used to get the number in the cell B9 is below, where H13 is a number in the form 'Currency', and J13 is a number in the form 'General'.
=IMPRODUCT(H13,J13)
Per this link, a common way to convert text formatted numbers to, well, 'numbers' is to apply a mathematical operation (e.g. if you have ="100", multiplying by 1 will yield 100 - see screenshot)...
Cell constituents
Result
See here for Microsoft's own words "Numbers that are stored as text can cause unexpected results.".
In your example (which I've successfully replicated to aid this solution - see below), taking 0-B9 (per below screenshot, courtesy of your Q above) yields -24.99 in this case, as Excel interprets cell B9 as an operand, upon which -1x (as an operation) is applied:
Put another way, you can yield exactly the same (or rather, 'anticipated') result via the summation formula as follows:
Included in the above figure is a depiction of how I have become familiarised with this 'conversion via mathematical operation' trick: it shows how taking 0 + False = 0 (or rather, per the depiction above, 0 + True = 1).
Simple example
Try the following to convince yourself:
Enter ="0", ="1", and ="2" in cells A1:A3 respectively. In the adjacent column/cells (B1:B3), enter =1-A1, =1-A2 and =1-A3 respectively, i.e.:
Results
Advanced example
Now go ahead and enter (one per cell) 3 a's followed by 'b' and 'c' (lowercase) in cells A1:A5. In cell B1 enter the formula =A1:A5="a" (assuming you have Office 365, else use Ctrl+Shift+Enter). This will return an array of corresponding 'True' and 'False' values for any character (col A) that satisfies the condition '="a"', videlicet:
Of course, summing column B would be futile (well, it returns 0):
However, applying an innocuous mathematical operation '1 *' makes all the difference in the world: you now have an elegant (but time / calculation expensive) way to calculate the count of cells that satisfy the criteria '="a"' - in this case, 3!
Of course, the same could be achieved using any 'neutral' calculation - for instance, adapting the apparent anomaly you've stumbled upon, using SUM(0+(A1:A5="a")) yields the self-same count of 3 ☺
i.e.
Trust this delineates/clarifies (with sufficient detail) the reason why a straight summation (which lacks any 'special' operations in its own right) does not include text values, whereas taking 0-B9 (per your Q) does (i.e. via the implicit 0-1*B9 operation with respect to B9).
using different data types for the value in the cell, including 'Currency'
What you changed is the number format, not data type.
It's an important difference in Excel: you can change the format of cells containing numbers, but if a cell contains text - regardless of whether it actually represents a number - changing the number format will not affect it.
=IMPRODUCT(H13,J13)
This function is used to multiply complex numbers. You could use =H13*J13
when I reference only that value
Functions taking multiple values (SUM, AVERAGE, MIN...) are designed to ignore everything in a referenced range that is not stored as a number.
On the other side, if you explicitly include text in a calculation, Excel will try to convert it (e.g. 1 + "9" = 10).
That's what happens in your example too, you don't just include your string in SUM, but first it's added to 0.
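The difference can be sketched in Python as a rough model (not Excel's actual implementation). The helpers sum_range and to_number and the sample values are hypothetical: the first skips text cells the way SUM does over a range, the second converts numeric-looking text the way an arithmetic operator such as 0+B9 does.

```python
def to_number(value):
    """Coerce a cell value as an arithmetic operator would: numeric-looking text becomes a number."""
    if isinstance(value, str):
        return float(value)
    return value

def sum_range(cells):
    """Mimic SUM over a range: cells holding text are skipped, even if the text looks numeric."""
    return sum(c for c in cells if isinstance(c, (int, float)))

# Hypothetical column: two real numbers and one number stored as text.
column_b = [10.0, 5.0, "24.99"]

range_total = sum_range(column_b)                              # text cell ignored
coerced_total = sum_range(column_b[:2]) + (0 + to_number(column_b[2]))  # text cell converted first

print(range_total)
print(coerced_total)
```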
Just getting started in Excel and I was working with a database extract where I need to count values only if items in another column are unique.
So- below is my starting point:
=SUMPRODUCT(COUNTIF(C3:C94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"}))
what i'd like to figure out is the syntax to do something like this-
=sumproduct (Countif range1 criteria..., where range2 criteria="is unique value")
Am I getting this right? The syntax is a bit confusing, and I'm not sure I've chosen the right functions for the task.
I just had to solve this same problem a week ago.
This method works even when you can't always sort on the grouping column (J in your case). If you can keep the data sorted, @MikeD's solution will scale better.
Firstly, do you know the FREQUENCY trick for counting unique numbers? FREQUENCY is designed to create histograms. It takes two arrays, 'data' and 'bins'. It sorts 'bins', then creates an output array that's one longer than 'bins'. Then it takes each value in 'data' and determines which bin it belongs in, incrementing the output array accordingly. It returns the array. Here's the important part: If a value appears in 'bins' more than once, any 'data' value meant for that bin goes in the first occurrence. The trick is to use the same array for both 'data' and 'bins'. Think it through, and you'll see that there's one non-zero value in the output for each unique number in the input. Note that it only counts numbers.
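For anyone who wants to think it through step by step, here is a minimal Python sketch of the behaviour described above. The function frequency is my own emulation based on that description, not Excel's implementation, and the sample numbers are made up.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Emulate the described FREQUENCY behaviour: one count per bin plus an overflow slot.

    A value falls into the smallest bin that is >= the value. If a bin value
    appears more than once, only its first occurrence receives the count.
    """
    sorted_bins = sorted(set(bins))
    counts = {b: 0 for b in sorted_bins}
    overflow = 0
    for value in data:
        i = bisect_left(sorted_bins, value)
        if i == len(sorted_bins):
            overflow += 1          # larger than every bin
        else:
            counts[sorted_bins[i]] += 1
    result, seen = [], set()
    for b in bins:                 # report in the original bin order
        result.append(0 if b in seen else counts[b])
        seen.add(b)
    result.append(overflow)
    return result

def count_unique_numbers(values):
    """SUM(SIGN(FREQUENCY(array, array))): non-numeric entries are ignored."""
    numbers = [v for v in values if isinstance(v, (int, float))]
    return sum(1 for c in frequency(numbers, numbers) if c > 0)

print(frequency([3, 1, 3, 2, 1], [3, 1, 3, 2, 1]))
print(count_unique_numbers([3, 1, "", 3, 2, "", 1]))
```

Using the same list for both arguments leaves exactly one non-zero slot per distinct number, which is what the SIGN and SUM then count.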
In short, I use this:
=SUM(SIGN(FREQUENCY(<array>,<array>)))
to count unique numeric values in <array>
From this, we just need to construct arrays containing numbers where appropriate and text elsewhere.
In the example below, I'm counting unique days when the color is red and the fruit is citrus:
This is my conditional array, returning 1 or true for the rows I'm interested in:
($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0))
Note that this requires ctrl-shift-enter to be used as an array formula.
Since the value I'm grouping by for uniqueness is text (as is yours), I need to convert it to numeric. I use:
MATCH($C$2:$C$10,$C$2:$C$10,0)
Note that this also requires ctrl-shift-enter
So, this is the array of numeric values within which I'm looking for uniqueness:
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),"")
Now I plug that into my uniqueness counter:
=SUM(SIGN(FREQUENCY(<array>,<array>)))
to get:
=SUM(SIGN(FREQUENCY(
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),""),
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),"")
)))
Again, this must be entered as an array formula using ctrl-shift-enter. Replacing SUM with SUMPRODUCT will not cut it.
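As a cross-check of the logic, the same steps can be sketched in Python. The rows below are made-up sample data and the function name is hypothetical: each key is converted to the position of its first occurrence (the MATCH(C,C,0) step), rows failing the condition are blanked out (the IF step), and the distinct remaining numbers are counted.

```python
CITRUS = {"orange", "grapefruit", "lemon", "lime"}

# Hypothetical rows: (color, fruit, day), standing in for columns A, B and C.
rows = [
    ("red", "orange", "Mon"),
    ("red", "lemon", "Mon"),
    ("blue", "lime", "Tue"),
    ("red", "apple", "Wed"),
    ("red", "lime", "Thu"),
]

def count_unique_days(rows):
    days = [day for _, _, day in rows]
    # MATCH(C, C, 0): 1-based position of the first occurrence of each key.
    first_position = [days.index(day) + 1 for day in days]
    # IF(condition, position, ""): keep the number only for qualifying rows.
    kept = [
        pos if (color == "red" and fruit in CITRUS) else ""
        for (color, fruit, _), pos in zip(rows, first_position)
    ]
    # Counting distinct numbers is what SUM(SIGN(FREQUENCY(...))) achieves.
    return len({value for value in kept if value != ""})

print(count_unique_days(rows))
```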
In your example, you'd use something like:
=SUM(SIGN(FREQUENCY(
IF(ISNUMBER(MATCH($C$3:$C$94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"},0)),MATCH($J$3:$J$94735,$J$3:$J$94735,0),""),
IF(ISNUMBER(MATCH($C$3:$C$94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"},0)),MATCH($J$3:$J$94735,$J$3:$J$94735,0),"")
)))
I'll note, though, that scaling might be a problem on data sets as large as yours. I tested it on larger data sets, and it was fairly fast on the order of 10k rows, but really slow on the order of 100k rows, such as yours. The internal arrays are plenty fast, but the FREQUENCY function slows down. I'm not sure, but I'd guess it's between O(n log n) and O(n^2) depending on how the sort is implemented.
Maybe this doesn't matter - none of this is volatile, so it'll just need to calculate once upon refreshing the data. If the column data is changing, though, this could be painful.
Assuming the source data is sorted by the key value [A], start by determining the occurrence number within the key column
B2: =IF(A2=A1;B1+1;1)
Next determine a group sum
C2: =SUMIF($A$2:$A$9;A2;$B$2:$B$9)
A key is unique if its group sum is exactly 1
D2: =(C2=1)
To count records which match a certain criterion AND are unique, include column D in a =IF(AND(D2;[yourcondition]);1;0) and sum this column
Another option is to assume a key is unique within a sorted list if it is unequal to both its predecessor and successor, so you could find the unique records like
E2: =AND(A2<>A1;A2<>A3)
G2: =IF(AND(E2;F2="this");1;0)
E and G can of course be combined into one single formula (not sure though if that helps ...)
G2(2): =IF(AND(AND(A2<>A1;A2<>A3);F2="this");1;0)
resolving unnecessarily nested AND's:
G2(3): =IF(AND(A2<>A1;A2<>A3;F2="this");1;0)
all formulas in row 2 should be copied down to the end of the list
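The neighbour comparison can be sketched in Python as follows. This is only an illustration under the same assumption that the keys are sorted; the function name and sample lists are hypothetical, with keys standing in for column A and flags for column F.

```python
def count_unique_matching(keys, flags, wanted):
    """Count rows whose key differs from both neighbours and whose flag equals wanted.

    Assumes keys is sorted, so equal keys are always adjacent.
    """
    count = 0
    for i, key in enumerate(keys):
        previous = keys[i - 1] if i > 0 else None
        following = keys[i + 1] if i + 1 < len(keys) else None
        is_unique = key != previous and key != following   # like AND(A2<>A1;A2<>A3)
        if is_unique and flags[i] == wanted:               # like AND(E2;F2="this")
            count += 1
    return count

keys = ["a", "b", "b", "c", "d", "d", "d", "e"]
flags = ["this", "this", "that", "this", "this", "this", "this", "that"]
print(count_unique_matching(keys, flags, "this"))
```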
It seems like a bit of an omission that there's no easy way to create a user-defined declarative function in Excel without defining a macro. I can't use XLSM, given the uphill battle that would entail in the Enterprise, but I want to be able to define a function that expresses intent, like this.
I want to do this;
=BreakEven(C1:C20)
But I can't use a macro, although I can use a "named formula". The trouble is how to pass parameters to that? I've seen a couple of tricks (kludgy workarounds) but not for xlsx.
I'd like to be able to define a Breakeven() function in another tab and reference it here passing in MORE THAN one parameter, two ranges in fact. I'm sure there's some way using string parsing but I can't see it.
I don't mind if the function doesn't look exactly like that, as long as it evaluates within the cell and I can parse it for 'intent'. For instance, this example (http://www.jkp-ads.com/articles/ExcelNames09.asp) which I was unable to get to work in xlsx uses this syntax;
=IF(ROW(D3),CellColor)
Where "cellcolor" is the name of the function and D3 is the range parameter. The other solution I'm toying with is to define a function in column format with a variable argument list (this is two rows of an excel spreadsheet);
[Value][function][parameter1][parameter2][parameter3]
24050 BreakEven C1:C20 A1:A20
It's not pretty, but the benefit of the latter is that it describes the function to an external reader. We know it's a breakeven function, whereas if we put the actual formula "OFFSET,INDIRECT,SUM()()()()etc" it would not be readable/parseable. Of course, in that case, I'd have to construct the value field by parsing the cells to the right in Excel, which would make the Value formula messy but at least it would be a self-describing row.
Can anyone suggest a better method?
Poor-man's UDF
So I think what we're going to have to do is this;
A B C D E
1 [Value][function][parameter1][parameter2][parameter3]
2 24050 BreakEven C1:C20 A1:A20
3 111 mySum 1 10 100
Where "BreakEven" is a "named function". Here's the formula for "mySum";
=sum(C1:E1)
To evaluate functions listed in B, we just put this in column A (repeating the same formula for all rows in column A):
=value(B)
This works because A2 and A3 both evaluate column B as a value, which causes BreakEven and mySum to run (as poor-man's UDFs) in the context of A2 and A3. The range (C1:E1) is relative of course.
So in effect, we can write any function name in column B (as long as there's a corresponding named function defined in the workbook which can be as complex as you like). Columns C, D and E act as the parameters for the function on the same row.
I would have loved to just be able to write the following in column A instead;
=mySum(1,10,100)
But in the absence of that support, the mechanism above serves to provide a readable parameterised function that would be understandable by a user, that's also machine readable (works in CSV too) and allows us to offload our re-usable functions to a library sheet somewhere in the workbook for maintenance.
Not perfect, but an acceptable compromise, unless anyone has a clever way of doing this in a single cell?
Not really an answer, but easier to illustrate here than in a comment. You can't rename formulas in a simple way (I actually like your suggestion; I've never thought about it before, but then I've never worked in a non-macro environment, so this has never come up). You can, however, add notes into the actual formula explaining what it does. For example:
=N("This is a really complex BreakEven Formula")+SUM(3,4,5)
Is a perfectly valid formula. As I said, not really an answer, but could potentially add clarity to a complex formula
You can do this with a small trick
For example to create effectively a cuberoot UDF that emulates =cuberoot(x) then name a variable as cuberoot with a 'value' like this.
=(RC[-1])^(1/3)
Now you can either do this using a temporary switch to RC mode, or put the cursor in say cell E5 and type the name value as =(D5)^(1/3)
Now whenever you need a cuberoot you can put the argument in any cell and put =cuberoot in the cell to its right. It really works and follows true Excel rules.
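If it helps to picture why this works, here is a rough Python analogy (not how Excel is implemented). The dictionary sheet and the function cuberoot are hypothetical stand-ins: a name defined with a relative reference is re-evaluated from whichever cell uses it, so its argument is always "the cell to my left".

```python
def cuberoot(sheet, row, col):
    """Stand-in for the named formula: reads the cell one column left of the calling cell."""
    return sheet[(row, col - 1)] ** (1 / 3)

# Hypothetical sheet keyed by (row, column); D5 is (5, 4) and holds the argument.
sheet = {(5, 4): 27.0, (9, 1): 8.0}

in_e5 = cuberoot(sheet, 5, 5)   # "=cuberoot" typed in E5 picks up D5
in_b9 = cuberoot(sheet, 9, 2)   # "=cuberoot" typed in B9 picks up A9

print(in_e5, in_b9)
```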
I use it for multiparameter models that have the single 'argument' Time as a dependent variable. I then define the term Model as the model equation eg =a+b*Time+c*Time^2
where a,b,c are already named locations holding unique parameter values -
and then define Time as =RC[-1]
My sheets are filled with cells simply saying =Model and have the required time value to the left (ie their argument). It is simple to extend to multi arg functions using multiple cells. It usually fits in well with spreadsheet layouts. Change the definition of your model once in the define name box and all places change simultaneously.
I have a function called ToDMS which takes the decimal degree value in the preceding cell and converts it to a deg Min and Sec string - very tidy.
You need the degrees to be in a single cell but want it in the alt. form in another cell
elegant, simple and it works
Bob Jordan