Array evaluation of find_text argument in SEARCH() function - excel

Say I have the following:
Entering the following formulas in cell C1 and then clicking Evaluate Formula->Evaluate produces very different results:
Formula 1: B$1:B$5 evaluates as non-array
{=SEARCH(B$1:B$5,A1)}
Formula 2: B$1:B$5 evaluates as an array
{=IF(SEARCH(B$1:B$5,A1),"")}
Why, exactly, is this? What is the cause of this behavior? If possible, please provide other examples using other Excel functions to illustrate what is happening here.
Parenthetically:
My question came about while experimenting with the accepted answer to this question.

In general, an array of values will only be returned by a worksheet function given that the following two conditions are satisfied:
1) The function in question is either itself capable of returning an array of values, or is nested within a larger construction in which one or more of the enclosing functions (which therefore act upon it) has that property. Whether that capability requires coercion (i.e. via array-entry (CSE)) or is a built-in feature of the function is not important for the answer you are seeking.
2) The array generated must be passed to a further function for processing. Excel is more teleological than you think: it has no great belief in returning an array of values as an end in itself.
As for your example, it's not that SEARCH, when array-entered, isn't capable of processing arrays (it is). It's more that no further function is invoked to act upon that array. In the IF version there is precisely that, though if you step through the evaluation once more you'll find that the array is reduced to just its first element. Wrap a further function around the IF, e.g. SUM, and you'll be able to go one step further, and so on.
And here is a major difference between evaluating formulas via the Evaluate Formula tool, and repeated "evaluation" via selecting various parts of the function in the formula bar and pressing F9.
The latter will always return an array of values, whether the above two conditions are satisfied or not. However - and not many people realise this - the "evaluation" so obtained can, ultimately, lead to incorrect results, and so should only be used providing one is aware of its limitations.
Take the following example, for instance:
With A1:A10 empty, the formula:
=SUMPRODUCT(0+(A1:A10=""))
correctly returns 10.
Now select just the part A1:A10 in the formula bar and press F9. Excel, being forced to "evaluate" the range, returns:
=SUMPRODUCT(0+({0;0;0;0;0;0;0;0;0;0}=""))
which, on further processing, returns the quite different result of 0. That is consistent with the array F9 produced, since 0 is not equal to "", but it is not what the original formula returns.
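The F9 effect described above can be imitated outside Excel. The following is only a toy model in Python, not Excel itself: `EMPTY`, `equals_empty_string` and `f9_snapshot` are hypothetical names, and the model assumes just two things taken from the example, namely that a live reference to an empty cell compares equal to "" and that F9 freezes empty cells as 0.

```python
# Toy model of the example above (an assumption-laden sketch, not Excel).
EMPTY = None  # hypothetical marker for an empty cell


def equals_empty_string(value):
    # A live reference to an empty cell compares equal to "".
    if value is EMPTY:
        return True
    return value == ""


def f9_snapshot(cells):
    # F9 freezes the range into an array constant; empty cells become 0.
    return [0 if c is EMPTY else c for c in cells]


cells = [EMPTY] * 10  # A1:A10, all empty

# Live range: every cell counts as equal to "".
live_count = sum(equals_empty_string(c) for c in cells)

# After F9: the zeros no longer compare equal to "".
frozen_count = sum(equals_empty_string(c) for c in f9_snapshot(cells))

print(live_count, frozen_count)
```

The first count corresponds to the original SUMPRODUCT result, the second to the result after the range has been replaced by its F9 snapshot.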
Regards

Related

Performance problem with INDEX function and names introduced inside LET/LAMBDA

Excel's INDEX function shows strange behaviour when the object being indexed is a name introduced in a LET construct or a parameter of a LAMBDA. The behaviour is consistent on Windows and Mac.
Suppose cell B2 contains the formula
=SEQUENCE(50000)
Then the following formulas are all relatively performant:
=MAP(SEQUENCE(2000),LAMBDA(x,INDEX(B2#,x)))
=LAMBDA(MAP(SEQUENCE(2000),LAMBDA(x,INDEX(B2#,x))))()
=LET(mgen,LAMBDA(B2#),LAMBDA(MAP(SEQUENCE(2000),LAMBDA(x,INDEX(mgen(),x)))))()
The following formulas, however, are terribly slow
=LET(m,B2#,MAP(SEQUENCE(2000),LAMBDA(x,INDEX(m,x))))
=LET(m,B2#,LAMBDA(MAP(SEQUENCE(2000),LAMBDA(x,INDEX(m,x)))))()
=LAMBDA(m,MAP(SEQUENCE(2000),LAMBDA(x,INDEX(m,x))))(B2#)
The performance problem gets worse the longer the array in B2 is. You can crash Excel with the latter 3 formulas when you replace 50000 by 500000 in B2, while the first three formulas still work perfectly fine.
Note that the length of B2 (the array indexed) should in theory not have any impact on performance, since INDEX is called the exact same number of times in all my examples.
To me, INDEX seems to have performance problems whenever the first argument does not directly refer to a worksheet range.
Yet if that is so -- how can I efficiently (in constant time) get the n-th element of a LET/LAMBDA-named array?
I cannot work around this by writing the indexed array into a cell, since in my case, the indexed array is the result of another lambda.
Edit for clarification: The only purpose of the MAP/SEQUENCE(2000) construct in my examples is to make 2000 separate calls to INDEX, in order for the performance differences to become visible. The construct is completely unrelated to the problem. The performance problems occur whenever I make a lot of INDEX calls with the first argument being a LET/LAMBDA name.
Second edit: It seems, however, that some kind of loopy construct is needed to reproduce the problem. I have not been able to reproduce the problem with 2000 separate LET/INDEX formulas in 2000 cells.

Excel's LAMBDA with a "kind of" composite function

Ever since I learnt that Excel is now Turing-complete, I understood that I can now "program" Excel using exclusively formulas, therefore excluding any use of VBA whatsoever.
I do not know if my conclusion is right or wrong. In reality, I do not mind.
However, to my satisfaction, I have been able to "program" the two most basic structures of program flow inside formulas: 1- branching the control flow (using an IF function has no secrets in Excel) and 2- loops (FOR, WHILE, UNTIL loops).
Let me explain my findings in a little more detail. (Remark: because I am using a Spanish version of Excel 365, the field separator in formulas is the semicolon (";") instead of the comma (",").)
A- Accumulator in a FOR loop
B- Factorial (using product)
C- WHILE loop
D- UNTIL loop
E- The notion of INTERNAL/EXTERNAL SCOPE
And now, the time of my question has arrived:
I want to use a formula that is really an array of formulas
I want to use an accumulator for the first number in the "tuple" whereas I want a factorial for the second number in the tuple. And all this using a single excel formula. I think I am not very far away from succeeding.
The REDUCE function accepts a LET function that contains 2 LAMBDAS instead of a single LAMBDA function. Until here, everything is perfect. However, the LET function seems to return only a "single" function instead of a tuple of functions
I can return (in the picture) function "x" or function "y" but not the tuple (x,y).
I have tried to use HSTACK(x,y), but it does not seem to work.
I am aware that this is a complex question, but I've done my best to make myself understood.
Can anybody give me any clues as to how I could solve my problem?
Very nice question.
I noticed that in your attempts you have given REDUCE() a single constant value in the 1st parameter. Funnily enough, the documentation nowhere states that you can't pass an array there. Hence you could use the 1st parameter to supply all the constants as an array (horizontal, in your case), and while you loop through the array of the 2nd parameter you can apply the different types of logic using CHOOSE():
=REDUCE({0,1},SEQUENCE(5),LAMBDA(a,b,CHOOSE({1,2},a+b,a*b)))
This way you have a single REDUCE() function whose internal processing updates the constants given in the 1st parameter in array form. You can now stack multiple functions horizontally and input an array of constants, for example:
=REDUCE({0,1,100},SEQUENCE(5),LAMBDA(a,b,CHOOSE({1,2,3},a+b,a*b,a/b)))
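For readers who think in code, the same idea (one fold carrying several accumulators at once) can be sketched in Python. This is only an analogy to the REDUCE/CHOOSE formula above, not a description of how Excel evaluates it; `step` is a hypothetical helper name.

```python
from functools import reduce


def step(acc, b):
    # One update rule per slot, like CHOOSE({1,2,3}, a+b, a*b, a/b).
    total, product, quotient = acc
    return (total + b, product * b, quotient / b)


# (0, 1, 100) plays the role of {0,1,100}; range(1, 6) the role of SEQUENCE(5).
result = reduce(step, range(1, 6), (0, 1, 100))

print(result[0])  # running sum of 1..5
print(result[1])  # running product of 1..5
print(result[2])  # 100 divided by 1, 2, 3, 4 and 5 in turn
```

The tuple is rebuilt on every step, so each slot only ever sees its own previous value, which is what the array-valued accumulator achieves in the formula.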
I suppose you'd have to use {0\1} and {1\2} like I'd have to in my Dutch version of Excel.
Given your accumulator:
Formula in A1:
=REDUCE(F1:G1,SEQUENCE(F3),LAMBDA(a,b,CHOOSE({1,2},a+b,a*b)))

Excel - Row-wise calculation without helpers

Recently I answered a question about how to retrieve the MEDIAN() of each MEDIAN() in a 2-column matrix without helpers, e.g:
The row-wise calculation without helpers wasn't too hard because the median of only two values is always their average. Therefore a simple formula was all it took:
=MEDIAN((A1:A3+B1:B3)/2)
But, for curiosity's sake, what if I had a matrix with at least three columns?
The median will actually need to be calculated. Here the medians are {8,2,2}.
I can't seem to find a way to get a row-wise calculation for 3+ columns. In this case it's about MEDIAN() but I can imagine there could be other functions. Since this could be simplified data I don't want to resort to something like =MEDIAN(MEDIAN(A1:C1),MEDIAN(......
I tried to fiddle around with OFFSET(); though not a fan of volatile functions, I was hoping it would either work directly with an array, or would be triggered correctly through using MEDIAN(LET(X,SEQUENCE(ROWS(A1:A3)),MEDIAN(OFFSET(A1:C1,X-1,0)))). I then moved on to combinations of either MMULT() or LARGE(); however, none of my attempts were successful.
Question
So the question ultimately is: how do we return the result (array) of a row-wise calculation without helpers? And if it's not possible, that's also a perfectly fine answer so I can rest my head =)
New Answer
With the new BYROW() function one could use:
=MEDIAN(BYROW(A1:C3,LAMBDA(a,MEDIAN(a))))
The nested LAMBDA() in the 2nd parameter makes it a piece o' cake to loop all rows in a dynamic array (not a range per se).
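As a cross-check of what the BYROW() formula computes, here is a minimal Python sketch. The 3x3 data is hypothetical (the original sheet is not reproduced in the question); it was chosen only so that the row medians come out as 8, 2 and 2, as stated in the question.

```python
from statistics import median

# Hypothetical stand-in for A1:C3, chosen so the row medians are 8, 2, 2.
matrix = [
    [1, 8, 9],
    [2, 2, 5],
    [1, 2, 3],
]

# BYROW(A1:C3, LAMBDA(a, MEDIAN(a))): one median per row.
row_medians = [median(row) for row in matrix]

# Outer MEDIAN(...): the median of those row medians.
overall = median(row_medians)

print(row_medians, overall)
```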
Previous Answer (Pre-BYROW())
So. After a long thought, as far as my understanding goes this is not possible through current formulae. However, currently in BETA, Excel365 will feature the new LAMBDA() function which makes it possible to create your own function without VBA and even recursively call itself. It isn't the prettiest of solutions but I thought I would share what I did here:
Formula in E3:
=MED(A1:C3,"",ROWS(A1:A3))
Where MED() is our own LAMBDA() function created at the "name manager" menu. It reads:
=LAMBDA(rng,txt,rws,IF(rws=0,MEDIAN(FILTERXML("<t><s>"&txt&"</s></t>","//s")),MED(rng,TEXTJOIN("</s><s>",,txt,MEDIAN(INDEX(rng,rws,0))),rws-1)))
As can be seen there are 4 main parameters, of which 3 are variables:
rng - The range to be examined.
txt - A reserved variable to be used in FILTERXML().
rws - A counter.
The 4th parameter is a nested IF() which, if the counter is at 0, will return the median of all medians. This is done through FILTERXML(), which I will not go into in detail right now.
If the counter is not yet at 0, it will recursively call the LAMBDA() function until it is, using the same three parameters, which we can alter right there and then. Therefore we leave rng intact, and we concatenate the MEDIAN() of the row (current counter) through TEXTJOIN() to create a valid XML construct. And last but not least we need to lower the counter.
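The recursion pattern described above (a counter that runs down to 0 while a text accumulator collects one median per row) can be sketched in Python. This is a loose analogy, not a translation of the FILTERXML() mechanics: the "|" separator stands in for the XML tags, the sample data is hypothetical, and skipping the empty seed string is a simplification made by this sketch.

```python
from statistics import median


def med(rng, txt, rws):
    # Base case: counter exhausted, so parse the collected text
    # and return the median of all row medians.
    if rws == 0:
        return median(float(s) for s in txt.split("|") if s)
    # Median of the row the counter currently points at (1-based).
    row_median = median(rng[rws - 1])
    # Append it to the accumulator, skipping the empty seed string.
    joined = "|".join(part for part in (txt, str(row_median)) if part)
    # Recurse with the range unchanged and the counter lowered by one.
    return med(rng, joined, rws - 1)


# Hypothetical data whose row medians are 8, 2 and 2.
matrix = [[1, 8, 9], [2, 2, 5], [1, 2, 3]]
answer = med(matrix, "", len(matrix))
print(answer)
```

The rows are visited from last to first, which does not matter here because the final median does not depend on order.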
It's a struggle, but with LAMBDA() it will now be possible to do a rowwise calculation.
Note, if you are interested in the FILTERXML() construct, you might like this post where I now also included a LAMBDA() version of a SPLIT() function.

Frequency() with arrays: adds an element to return arrays

I'm using the following formula as a named formula (via Name Manager). It is then used in a larger SUMPRODUCT(). The goal is to ensure that, with an array calculation, the calculation is only made once for certain groups of rows (e.g. you have the same data repeated across many rows for category A. I only need to know how many people are in category A once).
=IF(FREQUENCY(IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],
tdata[reportUUID],0),0),IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],
tdata[reportUUID],0),0))>0,TRUE)
Let's step through the results one by one with Evaluate Formula in Excel. Sorry for the screenshots, but Excel doesn't allow copying the actual steps with real data....
In order of steps:
In the last image, there's now a 7th item in my array. I only have 6 rows of data, which is why for the previous steps I only had 6 items in the array, as expected.
This is messing up my calculations, because the array returned by this function gets multiplied by other arrays which all have 6 items (or however many data rows I have).
What is this 7th item, and how can I either get rid of it or prevent it from returning errors?
I did try to wrap some of the formula in IFERROR() or IFNA(); however, it doesn't feel clean. I feel this might backfire and isn't a robust way to handle this. I'd rather fix it at the source....
EDIT: For example of use with other arrays:
{=SUMPRODUCT(--IFERROR((tdata[_isVisible]=1)*(f_uniqueUUIDfactor),0))}
Where f_uniqueUUIDfactor is the formula from the initial post. tdata[_isVisible]=1 is used as a way to filter data on the dashboard (e.g. through dropdown, the users can set ranges for dates, and with VBA I hide the rows in the raw data NOT within the range).
The point is that SUMPRODUCT() ends up multiplying the raw data rows together as 0s and 1s, so that only those meeting all the criteria get returned. The IFERROR() above is the workaround for the extra array element introduced by FREQUENCY(). It works as is, but if a cleaner way exists I'd prefer that. I would also be keen on understanding why that element gets added.
This is a good example of why it is preferable to use multiple, recursive IF statements when evaluating arrays over multiple criteria, rather than form the product of those arrays.
Firstly, though, before coming to the reason for that statement, I should point out a few minor technical inaccuracies/flaws with your construction also.
1) By including a value_if_false clause in the constructions being passed as FREQUENCY's data_array and bins_array parameters, you are risking incorrect results, since zero is a valid numerical value to be considered by FREQUENCY, whereas a Boolean FALSE (which would be the equivalent entry in the resulting array had you omitted the value_if_false clause altogether) is disregarded by this function.
2) MATCH with an exact (i.e. 0, or FALSE) match_type parameter is a relatively resource-heavy construction, particularly if the range to be considered is quite large. As such, and since it is not necessary to use this construction for FREQUENCY's bins_array parameter, it is preferable to use the more efficient:
ROW(tdata[reportUUID])-MIN(ROW(tdata[reportUUID]))+1
Moreover, note that repetition of the IF(LEN construction is also not necessary within this second parameter.
In all, then:
IF(FREQUENCY(IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],tdata[reportUUID],0)),ROW(tdata[reportUUID])-MIN(ROW(tdata[reportUUID]))+1)>0,TRUE)
is considerably more rigorous and more efficient than the version you give.
To answer your main question, it is well-documented that FREQUENCY always returns an array having a number of entries one greater than that of the bins_array passed.
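To make the "one more entry than bins" behaviour concrete, here is a small Python model of FREQUENCY. It is a sketch under stated assumptions, not Excel's implementation: it assumes ascending bins, and the `frequency` helper and the sample MATCH positions are hypothetical.

```python
def frequency(data, bins):
    # One count per bin, plus a final slot for values above the last bin.
    counts = [0] * (len(bins) + 1)
    for x in data:
        for i, upper in enumerate(bins):
            if x <= upper:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # larger than every bin
    return counts


# Hypothetical MATCH() results for 6 rows (first position of each UUID).
match_positions = [1, 1, 3, 3, 3, 6]
# ROW(...)-MIN(ROW(...))+1 for 6 rows.
bins = [1, 2, 3, 4, 5, 6]

counts = frequency(match_positions, bins)
flags = [c > 0 for c in counts]

print(len(bins), len(counts))
print(flags)
```

With 6 bins the model returns 7 counts, which matches the 7th item seen in the question. The last slot counts values above the highest bin, so here it is 0.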
As mentioned in my comment to your post, the resolution to the problem you are facing largely depends on precisely what further manipulation you are intending for the resulting array.
However, let's assume for the sake of an explanation that you simply wish to multiply the array resulting from your FREQUENCY construction by some other column within your table, tdata[Column2] say, and then sum the result.
The difference between:
=SUM(IF(FREQUENCY(IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],tdata[reportUUID],0)),ROW(tdata[reportUUID])-MIN(ROW(tdata[reportUUID]))+1)>0,TRUE)*tdata[Column2])
i.e. using multiplication of the two arrays, and:
=SUM(IF(FREQUENCY(IF(LEN(tdata[reportUUID])>0,MATCH(tdata[reportUUID],tdata[reportUUID],0)),ROW(tdata[reportUUID])-MIN(ROW(tdata[reportUUID]))+1)>0,tdata[Column2]))
i.e. using a straightforward IF clause, is here crucial.
In fact, the former will always return an error, whereas the latter, in general, will not.
The reason is that the former will resolve to (assuming that your table has e.g. 10 rows' worth of data and assuming some random Boolean results to the FREQUENCY construction):
=SUM(IF({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE},TRUE)*tdata[Column2])
which is, since the value_if_true clause is superfluous here:
=SUM({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE}*tdata[Column2])
whereas the second construction I give will resolve to:
=SUM(IF({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE},tdata[Column2]))
The two may look identical, but the fact that the former is using multiplication to resolve the array, whereas the latter is not, is the key difference.
Although in both cases the array resulting from the FREQUENCY construction, i.e.:
{TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE}
comprises 11 entries (i.e. 1 more than the number of entries in the second array being considered), the difference is that, when you then attempt to multiply an 11-element array with a 10-element array (i.e. tdata[Column2]), Excel, rather than outright disallowing such an operation, artificially redimensions the smaller of the two arrays such that it matches the dimensions of the larger.
In doing so, however, any additional entries are automatically set as #N/A error values.
Effectively, then:
=SUM({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE}*tdata[Column2])
is resolved as:
=SUM({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE}*{38;67;49;3;10;11;97;20;3;57;#N/A})
i.e., as mentioned, the second, 10-element array is redimensioned to one of 11 elements in an attempt to form a legitimate operation. And, as also mentioned, that 11th element is #N/A, which means of course that the entire construction will also result in that value.
In the non-multiplication version, however, i.e.:
=SUM(IF({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE},tdata[Column2]))
although the same redimensioning also takes place, we are saved by our use of an IF clause in place of multiplication, since the above resolves to:
=SUM(IF({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE},{38;67;49;3;10;11;97;20;3;57;#N/A}))
and the Boolean FALSE in the 11th position here 'overrides' the error value in the equivalent position from the second array, since the above resolves to:
=SUM({38;67;49;FALSE;FALSE;FALSE;FALSE;FALSE;3;57;FALSE})
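The difference between the two constructions can be replayed in a few lines of Python. This is a toy model of the behaviour described above (padding the shorter array with #N/A), not Excel itself; `NA`, `pad`, `multiply`, `if_array` and `total` are hypothetical names, and the data is the 11-element Boolean array and the 10 values used in this answer.

```python
NA = "#N/A"  # sentinel standing in for Excel's #N/A error value


def pad(values, length):
    # Redimension a shorter array by filling the extra slots with #N/A.
    return values + [NA] * (length - len(values))


def multiply(flags, values):
    n = max(len(flags), len(values))
    # Any #N/A operand makes that element of the product #N/A.
    return [NA if NA in (f, v) else int(f) * v
            for f, v in zip(pad(flags, n), pad(values, n))]


def if_array(flags, values):
    n = max(len(flags), len(values))
    # The value is only looked at where the flag is TRUE.
    return [NA if f == NA else (v if f else False)
            for f, v in zip(pad(flags, n), pad(values, n))]


def total(items):
    # A single error poisons the sum; Booleans are skipped.
    if NA in items:
        return NA
    return sum(x for x in items if x is not False)


flags = [True, True, True, False, False, False,
         False, False, True, True, False]
values = [38, 67, 49, 3, 10, 11, 97, 20, 3, 57]

print(total(multiply(flags, values)))
print(total(if_array(flags, values)))
```

The product version carries the padded #N/A through to the sum, while the IF version never reads the 11th value because its flag is FALSE.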
Regards

Is Excel's 'Dependency-Chain' increased or decreased?

I have a formula which includes a UDF that calculates National Insurance liability (a tax in the UK). It is complex and used many times, so it is slow to calculate.
Some of the cells it gets the salary from are unused, and so contain zeros.
Is the formula:
=NI_Calc(D9,A1:A5,B1:B5,F1:F5)
better than:
=IF(D9=0,0,NI_Calc(D9,A1:A5,B1:B5,F1:F5))
i.e. does the IF function instruct Excel not to calculate the UDF 'NI_Calc', thereby reducing the load?
Or does Excel calculate the UDF anyway, so that the IF function just adds to its load?
Thanks
The IF function in Excel only evaluates the branch selected by the condition: if D9=0 is TRUE, the value_if_false argument (your NI_Calc call) is not calculated. From the docs...
"When Excel finishes evaluating the first condition, the results may match (in which case the Approve result appears) or they may not. If it's not a match, the parent IF function has already run through two of its three arguments. You still have two possible outcomes! You complete your formula by nesting your second IF function in the third argument (value_if_false) of the parent IF. The nested IF becomes the self-contained third argument of the parent IF. When the nested IF finishes evaluating, it decides between the two remaining possible outcomes, displays the result, and the function ends."
https://support.office.com/en-us/article/IF-function-69aed7c9-4e8a-4755-a9bc-aa8bbff73be2
EDIT: Actually, that's not really massively clear...
This answer, "Does Excel evaluate both result arguments supplied to the IF function?", shows it better.
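The effect of guarding an expensive call with a cheap test can be illustrated with a short Python sketch. It is only an analogy for the IF(D9=0,0,NI_Calc(...)) pattern: `ni_calc` is a hypothetical stand-in with a made-up flat rate, not the real National Insurance calculation, and the call counter exists purely to show how often the expensive function runs.

```python
calls = 0  # counts how often the expensive function actually runs


def ni_calc(salary):
    # Hypothetical stand-in for the slow UDF; the rate is a placeholder.
    global calls
    calls += 1
    return salary * 0.12


def guarded(salary):
    # Like IF(D9=0, 0, NI_Calc(D9, ...)): only the chosen branch runs.
    return 0 if salary == 0 else ni_calc(salary)


salaries = [0, 0, 30000, 0, 45000]  # hypothetical inputs; zeros are unused cells
results = [guarded(s) for s in salaries]

print(calls)
```

Here the expensive function is skipped for every zero salary, so the guard costs one comparison per cell and saves one call for each unused one.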
