I was trying to calculate the R-squared value of two arrays using RSQ function. One array is fixed, another is located in different columns. I want to generate a code so that i will be able to generate R-squared value for all variables by drag down the cell.
i tried
=RSQ($H$4:$H$102,OFFSET($A$4:$A$102,0,ROW(Z3)-2))
where ROW(Z3)-2 = 1 and the offset part should refer to B4:B102.
The result of RSQ was #N/A. But when I tried SUM(OFFSET($A$4:$A$102,0,ROW(Z3)-2)) it do gives me a correct sum for B4:B102. Can anyone help me out with this problem?
thanks!!!
=RSQ($H$4:$H$102,OFFSET($A$4:$A$102,0,MAX(ROW(Z3)-2)))
The problem seems to be that ROW(n) returns a 1x1 array. I'm guessing Excel is complaining that the 1x1 array is not sized the same as other arrays you are using. Wrapping it in MAX seems to work around this by returning the value in that array, and calculation goes on.
I must say I have not noticed this behavior before. Good question.
Related
Given a list, I would like to make a list of cumulative values:
I have created a formula which works =MAKEARRAY(1;5;LAMBDA(r;c;SUM(INDEX(F5:J5;1;1):INDEX(F5:J5;1;c)))).
However, I realize that by wrapping it with LET the formula =LET(ps; F5:J5; MAKEARRAY(1;COLUMNS(ps);LAMBDA(r;c;SUM(INDEX(ps;1;1):INDEX(ps;1;c))))) returns a list of #VALUE.
Does anyone know the reason?
Additionally, does anyone have a better idea to make such a cumulative list by Excel formulas (my solution is not optimal)? If we don't use the LAMBDA function and the helper functions, that will be even better.
When using a range in LET it is no longer stored as a range, but as an array in memory.
The position in the sheet is no longer relevant. But when you use INDEX(start,1,1):INDEX(end,1,1) references, the position is relevant, because where in the sheet would for instance start be, and where would end be? By using LET you tell Excel to no longer store it's position and you therefore can use the individual values in each "range", but you can't refer to anything in between the two.
Instead you could better use SCAN for this;
=SCAN(0,SEQUENCE(1,5),LAMBDA(a,b,a+b))
Or =SCAN(0,F5:J5,LAMBDA(a,b,a+b))
Alright this should be a simple one.
I apologize in case it has been already solved, but I can only find posts related to solving this issue with programming languages and not specifically to EXCEL.
Furthermore, I could find posts that address a sub-problem of my question (e.g. regarding limitation of certain EXCEL functions) and should solve/invalidate my request but maybe, just maybe, there is a workaround.
Problem statement:
I want to calculate the minimum value for each column in an EXCEL matrix. Simply enough, I want to input a 2D array (mxn matrix) in a function and output an array with dimension 1xm where each item is the minimum value MIN(nj) of each nj column.
However, I want to solve this with specific constraints:
Avoid using VBA and other non-function scripting: that I could devise myself;
All in one function: what I want to achieve here is to have one and one function only, not split the problem into multiple passages (such as for example copypasting a MIN() function below each column, that wouldn't do it);
The result should be a transposable array (which is already ok, I assume);
Where I am stranded with my solution so far:
The main issue here is that any function I am trying to use takes the entire matrix as a single array input and would calculate the MIN() of the entire matrix, not each column. My current (not working) function for an exemplary 4x4 matrix in range A1:D4 would be as below (the part in bold is where it is clearly not working):
=MIN(INDEX(A1:D4,SEQUENCE(4,4,1,1)))
which ofc does not work, because INDEX() does probably not "understand" SEQUENCE() as an array of items to take into account. Another, not working, way of solving this is to input a series of ranges (A1:A4;B1:B4;C1:C4;D1:D4) so that INDEX() "understands" the ranges as single columns, but ofc does not know and I do not know sincerely how to formulate that. I could use INDIRECT() in some way to reference the array of ranges, but do not know how and could find a way by searching online.
Fundamental question is: can a function, which works with single arrays, also work with multiple arrays? Basically, I do not know how to communicate an EXCEL array formula, that each batch of data I am inputting is a single array and must be evaluated separately (this is very easily solved with for() cycles, I know).
Many thanks for any suggestion and any workaround, any function and solution works as longs as it fits in the constrains defined above (maybe a LAMBA() function? don't know).
This is ofc a simplification of a way more complex problem (I am trying to calculate the annual mean temperature evolution for a specific location by finding the value - for each year from 1950 to 2021 - that is associated to the lat/lon coordinates that are the nearest to the one of the location inputted, given a netCDF-imported grid of time-arrayed data; the MIN() function is used to selected the nearest location, which is then used, via INDEX() to find temp data). I need to do this in one hit (meaning just pasting the function, which evaluates a matrix of data that is referenced by a fixed range), so that I can just use it modularly for other data sets. I already have a working solution, which is "elegant"* enough, but not "elegant"* as the one I could develop solving this issue.
*where "elegant"= it saves me one click every time for 1000+ datasets when applying the function.
If I understand your problem correct then this should solve it:
=BYCOL(A1:D4,LAMBDA(d,MIN(d)))
I'm trying to make my monthly transaction spreadsheet less work-intensive but I'm running up against problems outputting my category lookups as an array. Right now I have a table with all my monthly transactions and I want to create another table with monthly running totals. What I've been doing is manually summing each entry from each category, but I'd love to automate the process. Here's what I have:
=SUM(INDEX(Transactions[Out], N(IF(1,MATCH(I12,Transactions[Category],FALSE)))))
I've also tried using AGGREGATE in place of SUM but it still only returns the first value in the category. The N(IF()) was supposed to force INDEX to return all the matches as an array, but it's not working. I found that trick online, with no explanation of why it works, so I really don't know how to fix it. Any ideas?
Just in case anyone ever looks at this thread in the future, I was able to find a simpler solution to my problem once I implemented the Transactions[Category]=I12 method. SUM, itself will take an array as an argument, so all I had to do was form an array of the values I wanted to keep from Transactions[Out] range. I did this by adjusting the method Ron described above, but instead of using 1/(Transactions[Category]=I12 I used 1/IF(Transactions[Category]=I12, 1,1000) and surrounded that by a FLOOR(*resulting array*, .01) which rounded all the thousandth's down to zero and didn't yield any #DIV/0! errors.
Then! I realized that the simplest way to get the actual numbers I wanted, rather than messing with INDEX or AGGREGATE, was to multiply the range Transactions[Out] by the binary array from the IF test. Since the range is a table, I know they will always be the same size. And SUM automatically multiplies element by element and then adds for operations like this.
(The result is a "CSE" formula, which I guess isn't everyone's favorite. I'm still not 100% clear on what it means: just that it outputs data in a single cell, rather than over multiple cells. But in this context, SUM should only output a single number, so I'm not sure why I need CSE... A problem for another day!)
In your IF, the value_if_true clause needs to return an array of the desired row numbers from the array.
MATCH does not return an array of values; it only returns a single value which, with the FALSE parameter, will be the first value. That's why INDEX is only returning the first value.
One way to return an array of values:
Transactions[Category]=I12
will return an array of {TRUE,FALSE,FALSE,TRUE,...} depending on if it matches.
You can then multiply that by the Row number to get the relevant row on the worksheet.
Since you are using a table, to obtain the row number in the data body array, you have to subtract the row number of the Header row.
But now we are going to have an array which includes 0's for the non-matching entries, which is not good for us as a row number argument for the INDEX function.
So we get rid of that by using the AGGREGATE function with the ignore errors argument set after we do change the equality test to 1/(Transactions[Category]=I12) which will create DIV/0 errors for the non-matchers.
Putting it all together
=SUM(INDEX(Transactions[Out],AGGREGATE(15,6,1/(Transactions[Category]=I12)*ROW(Transactions)-ROW(Transactions[#Headers]),ROW(INDIRECT("1:"&COUNTIF(Transactions[Category],$I$12))))))
You may need to enter this with CSE depending on your version of Excel.
Also, if you have a lot of these formulas, you may want to change the k argument for AGGREGATE to use the INDEX function (non-volatile) instead of the volatile INDIRECT function.
=SUM(INDEX(Transactions[Out],AGGREGATE(15,6,1/(Transactions[Category]=I12)*ROW(Transactions)-ROW(Transactions[#Headers]),ROW(INDEX($A:$A,1,1):INDEX($A:$A,COUNTIF(Transactions[Category],$I$12),1)))))
Edit
If you have Excel/O365 with dynamic arrays and the FILTER function, you can greatly simplify the above to the normally entered:
=SUM(FILTER(Transactions[Out],Transactions[Category]=I12))
I have a question about Excel & VBA’s function SUMIFS(). I have two codes, but I changed the input-places from one to another (Århus & Odense, but 2 and 3 could also be used):
I need to find the correct sum when these criterias are used. I have tried to google and tried to understand SUMIFS. I tried to simulataing another dataset, with the same amount of variables and changes the different input locations. However, when comparing the 4 difference input places I get the same result
Code 1)
SUM(SUMIFS($D$2:$D$2000;$B$2:$B$2000;{"Odense";"Århus"};_
$C$2:$C$2000;{2;3};$E$2:$E$2000;ABS(I16)))
Code 2)
SUM(SUMIFS($D$2:$D$2000;$B$2:$B$2000;{"Århus";"Odense"};_
$C$2:$C$2000;{2;3};$E$2:$E$2000;ABS(I16)))
Code 1 gives 152832 and Code 2 gives 135751. So I hope that anyone can explain to me why this happen. Or maybe that there is something wrong with the data that is being used.
when using two arrays in SUMIFS if both arrays are vertical or both horizontal then it will matter as it will only do two and compare one to one stepping through each array the same.
If you want to do OR on both arrays then one must be Vertical and the other Horizontal:
SUM(SUMIFS($D$2:$D$2000;$B$2:$B$2000;{"Århus";"Odense"};$C$2:$C$2000;TRANSPOSE({2;3});$E$2:$E$2000;ABS(I16)))
Also note that when not in lockstep the max on the OR type is two arrays.
I'm trying to do some bootstrapping with a data set in Excel with the formula =INDEX($H$2:$H$5057,RANDBETWEEN(2,5057)), where my original data set in is column H. It seems to work most of the time, but there is always approximately one cell that outputs a reference error. Does anyone know why this happens, or how to avoid including that one cell? I'm trying to generate a histogram from this data, and FREQUENCY does not play nice with an array with an error in it.
Please try:
=INDEX($H$2:$H$5057,RANDBETWEEN(1,5056))
=RANDBETWEEN(2,5057) returns a reasonably arbitrary value of 2 or any integer up to and including 5057. Used as above this specifies the position in the chosen array (H2:H5057) - that only has 5056 elements, so one problem would be when RANDBETWEEN hits on 5057. Much easier to observe with just H2:H4 and RANDBETWEEN(2,4).