Background: I'm using Excel functions to parse a lot of data out, essentially creating a flexible pivot table. It sorts a lot of race timing data by car, etc. In this portion of the sheet, I'm searching for the minimum segment times for each car. The rest of the sheet avoids macros and VBA so I'd like to avoid that here.
Issue: My formula worked when there are no zeros, but sometimes there are zeros that I need to exclude. My array formula is pretty complicated, but the change I made that broke it is this:
OLD (working):
{=min(if(car_number = indirect("number_vector"), indirect("data_vector")))}
NEW (non-working):
{=min(if(and(car_number = indirect("number_vector"),not(0=indirect("data_vector"))), indirect("data_vector")))}
I am using INDIRECT() with this exact argument several times in the formula. However, in this particular instance (inside the NOT()), it returns #VALUE! instead of {data1;...;datan}. Please see the screencaps below.
Before evaluation:
After evaluation:
I suspect that your AND function might be a problem - AND only returns a single result not an array of results as required, try using multiple IFs like this
=min(if(car_number = indirect("number_vector"),IF(indirect(data_vector)<>0, indirect(data_vector))))
Note that I also used <> rather than using NOT
Are data vector and number vector the same size and shape? (both vertical?)
why are there quotes around one but not the other?
Related
Excel's INDEX function shows strange behaviour when the object being indexed is a name introduced in a LET construct or a parameter of a LAMBDA. The behaviour is consistent on Windows and Mac.
Suppose cell B2 contains the formula
=SEQUENCE(50000)
Then the following formulas are all relatively performant:
=MAP(SEQUENCE(2000),LAMBDA(x,INDEX(B2#,x)))
=LAMBDA(MAP(SEQUENCE(2000),LAMBDA(x,INDEX(B2#,x))))()
=LET(mgen,LAMBDA(B2#),LAMBDA(MAP(SEQUENCE(2000),LAMBDA(x,INDEX(mgen(),x)))))()
The following formulas, however, are terribly slow
=LET(m,B2#,MAP(SEQUENCE(2000),LAMBDA(x,INDEX(m,x))))
=LET(m,B2#,LAMBDA(MAP(SEQUENCE(2000),LAMBDA(x,INDEX(m,x)))))()
=LAMBDA(m,MAP(SEQUENCE(2000),LAMBDA(x,INDEX(m,x))))(B2#)
The performance problem gets the worse the longer array B2 is. You can crash Excel with the latter 3 formulas when you replace 50000 by 500000 in B2, while the first three formulas still work perfectly fine.
Note that the length of B2 (the array indexed) should in theory not have any impact on performance, since INDEX is called the exact same number of times in all my examples.
To me, INDEX seems to have performance problems whenever the first argument does not directly refer to a worksheet range.
Yet if that is so -- how can I efficiently (in constant time) get the n-th element of a LET/LAMBDA-named array?
I cannot work around this by writing the indexed array into a cell, since in my case, the indexed array is the result of another lambda.
Edit for clarification: The only purpose of the MAP/SEQUENCE(2000) construct in my examples is to make 2000 separate calls to INDEX, in order for the performance differences to become visible. The construct is completely unrelated to the problem. The performance problems occur whenever I make a lot of INDEX calls with the first argument being a LET/LAMDA name.
Second edit: It seems, however, that some kind of loopy construct is needed to reproduce the problem. I have not been able to reproduce the problem with 2000 separate LET/INDEX formulas in 2000 cells.
Alright this should be a simple one.
I apologize in case it has been already solved, but I can only find posts related to solving this issue with programming languages and not specifically to EXCEL.
Furthermore, I could find posts that address a sub-problem of my question (e.g. regarding limitation of certain EXCEL functions) and should solve/invalidate my request but maybe, just maybe, there is a workaround.
Problem statement:
I want to calculate the minimum value for each column in an EXCEL matrix. Simply enough, I want to input a 2D array (mxn matrix) in a function and output an array with dimension 1xm where each item is the minimum value MIN(nj) of each nj column.
However, I want to solve this with specific constraints:
Avoid using VBA and other non-function scripting: that I could devise myself;
All in one function: what I want to achieve here is to have one and one function only, not split the problem into multiple passages (such as for example copypasting a MIN() function below each column, that wouldn't do it);
The result should be a transposable array (which is already ok, I assume);
Where I am stranded with my solution so far:
The main issue here is that any function I am trying to use takes the entire matrix as a single array input and would calculate the MIN() of the entire matrix, not each column. My current (not working) function for an exemplary 4x4 matrix in range A1:D4 would be as below (the part in bold is where it is clearly not working):
=MIN(INDEX(A1:D4,SEQUENCE(4,4,1,1)))
which ofc does not work, because INDEX() does probably not "understand" SEQUENCE() as an array of items to take into account. Another, not working, way of solving this is to input a series of ranges (A1:A4;B1:B4;C1:C4;D1:D4) so that INDEX() "understands" the ranges as single columns, but ofc does not know and I do not know sincerely how to formulate that. I could use INDIRECT() in some way to reference the array of ranges, but do not know how and could find a way by searching online.
Fundamental question is: can a function, which works with single arrays, also work with multiple arrays? Basically, I do not know how to communicate an EXCEL array formula, that each batch of data I am inputting is a single array and must be evaluated separately (this is very easily solved with for() cycles, I know).
Many thanks for any suggestion and any workaround, any function and solution works as longs as it fits in the constrains defined above (maybe a LAMBA() function? don't know).
This is ofc a simplification of a way more complex problem (I am trying to calculate the annual mean temperature evolution for a specific location by finding the value - for each year from 1950 to 2021 - that is associated to the lat/lon coordinates that are the nearest to the one of the location inputted, given a netCDF-imported grid of time-arrayed data; the MIN() function is used to selected the nearest location, which is then used, via INDEX() to find temp data). I need to do this in one hit (meaning just pasting the function, which evaluates a matrix of data that is referenced by a fixed range), so that I can just use it modularly for other data sets. I already have a working solution, which is "elegant"* enough, but not "elegant"* as the one I could develop solving this issue.
*where "elegant"= it saves me one click every time for 1000+ datasets when applying the function.
If I understand your problem correct then this should solve it:
=BYCOL(A1:D4,LAMBDA(d,MIN(d)))
Ever since I learnt that Excel is now Turing-complete, I understood that I can now "program" Excel using exclusively formulas, therefore excluding any use of VBA whatsoever.
I do not know if my conclusion is right or wrong. In reality, I do not mind.
However, to my satisfaction, I have been able to "program" the two most basic structures of program flow inside formulas: 1- branching the control flow (using an IF function has no secrets in excel) and 2- loops (FOR, WHILE, UNTIL loops).
Let me explain a little more in detail my findings. (Remark: because I am using a Spanish version of Excel 365, the field separator in formulas is the semicolon (";") instead of the comma (",").
A- Acumulator in a FOR loop
B- Factorial (using product)
C- WHILE loop
D-UNTIL loop
E- The notion of INTERNAL/EXTERNAL SCOPE
And now, the time of my question has arrived:
I want to use a formula that is really an array of formulas
I want to use an accumulator for the first number in the "tuple" whereas I want a factorial for the second number in the tuple. And all this using a single excel formula. I think I am not very far away from succeeding.
The REDUCE function accepts a LET function that contains 2 LAMBDAS instead of a single LAMBDA function. Until here, everything is perfect. However, the LET function seems to return only a "single" function instead of a tuple of functions
I can return (in the picture) function "x" or function "y" but not the tuple (x,y).
I have tried to use HSTACK(x,y), but it does not seem to work.
I am aware that this is a complex question, but I've done my best to make myself understood.
Can anybody give me any clues as to how I could solve my problem?
Very nice question.
I noticed that in your attempts you have given REDUCE() a single constant value in the 1st parameter. Funny enough, the documentation nowhere states you can't give values in array-format. Hence you could use the 1st parameter to give all the constants in (your case; horizontal) array-format, and while you loop through the array of the 2nd parameter you can apply the different types of logic using CHOOSE():
=REDUCE({0,1},SEQUENCE(5),LAMBDA(a,b,CHOOSE({1,2},a+b,a*b)))
This way you have a single REDUCE() function which internal processes will update the given constants from the 1st parameter in array-form. You can now start stacking multiple functions horizontally and input an array of constants, for example:
=REDUCE({0,1,100},SEQUENCE(5),LAMBDA(a,b,CHOOSE({1,2,3},a+b,a*b,a/b)))
I suppose you'd have to use {0\1} and {1\2} like I'd have to in my Dutch version of Excel.
Given your accumulator:
Formula in A1:
=REDUCE(F1:G1,SEQUENCE(F3),LAMBDA(a,b,CHOOSE({1,2},a+b,a*b)))
So I quite often find myself doing tasks on Excel which involve evaluating a text string as an array. Generally speaking I just use this:
Function EVAL(Ref As String)
EVAL = Evaluate(Ref)
End Function
So the formula will be, for example:
=EVAL("{"&CHAR(34)&SUBSTITUTE(TEXTJOIN(";",TRUE,MID(Index[Industries],2,LEN(Index[Industries])-2)),";",CHAR(34)&";"&CHAR(34))&CHAR(34)&"}")
The cells in this example will have contents like:
;Automotive;Rail;Energy;
;Automotive;Rail;
;Energy;
;Automotive;Aerospace;
(As it happens this is the precise problem I'm stuck on right now, though it has come up in different ways in the past.)
This has worked for me in the past, but I've been running into difficulties lately.
I have come to the conclusion it isn't working because application.evaluate, it turns out, has a character limit of 255. I've seen examples of VBA tricks to bypass this for text strings that are formulas rather than arrays, but copy-pasting those they don't seem to work for when I'm using it to interpret a text string as an array rather than as a formula.
Is there some trick to get this to work? (Or, indeed, is there some alternative method to achieve this altogether?)
Right, as per my comments, if you are using ms365, you could avoid your workbook to be xlsm just because you need to split values into an array. Make use of what is available with native functions, for example:
Formula in C2:
=TEXTSPLIT(CONCAT(A1:A4),,";",1)
Formula in D2:
=FILTERXML("<t><s>"&SUBSTITUTE(CONCAT(A1:A4),";","</s><s>")&"</s></t>","//s[node()]")
Note 1: As per time of writing you'd need to enable the BETA-channel to gain access to TEXTSPLIT(), and if I recall correctly your version (2203) is allowed to start using this function. Just google how to get access and update your Excel.
Both options can obviously be nested inside the UNIQUE() function.
Note 2: If at any point CONCAT()'s limits are reached (32767 characters, thanks #ScottCraner), maybe you can avoid using that with help of the lambda's helper function REDUCE():
=TEXTSPLIT(REDUCE("",A1:A4,LAMBDA(a,b,a&b)),,";",1)
Note 3: In case you can't update your Excel just yet, and you wonder how to use FILTERXML(), don't mind me refering you to another post I wrote a while back here.
I've been using a rather long embedded CUBEVALUE() function, which is a pain to work with. It looks something like:
=IFERROR(VALUE(CUBEVALUE(arg1;arg2;arg3));CUBEVALUE(arg1;arg2;arg3))
Due to the CUBEVALUE function and its arguments, it's becoming a REALLY long function and thus not easy to work with. Since there are only 3 arguments, which are written in different cells, I'd like to create something like
=MyFunction(A1,A2,A3)
and use A1, A2 and A3 as "arg1, arg2, arg3" in the function mentioned first. This way its possible to "pull" the function so it would calculate using the input in B1:B3 and C1:C3 etc. as well.
The function works fine and can be pulled through and such, but my question is how to rename this loooong function into something more user-friendly, as it requires only 3 cells as an input and the rest of the text in the function just makes it hard to use for end-users.
Using UDF is not an option because CUBEVALUE can't be called through VBA... and any attempt to stich strings together and using the final result with INDIRECT also seems to fail..
In a similar question on this site, someone refers to using "asynchronous UDF's", but no further information was given (and what I could find seemed irrelevant).
You shouldn't really have several long cube functions. Allocate some space in hidden rows/columns or in header rows/columns to add your cubemember functions. Then throughout most of your report, you should just have cubevalue functions that reference other cells with error handling around them. Proper use of absolute and relative references are your friend.
Peter Meyers has some great tips for this here, slides 20 - 24. I have an example Excel file with cube functions on my blog here.