Excel: Formula to place a random number in an array - excel

This one has got me seriously stumped, so I thought I'd share it with you guys and see what I get :)
General Problem
I have the following data in a spreadsheet
As you can see I have two identical sets of headings 'Cat, Dog, Man...' (the precise names do not matter)
These two sets are classed under either the 'From' column or the 'To' row.
The bulk of the table is an array of numbers between 0 and 1, or empty cells.
Essentially what I want to do is find where a given Rand() number ranks along each row. I then return the item from the 'To' row corresponding to where the random number would be placed.
i.e. Reading along the 1st Cat row, a random number (also between 0 and 1) is ranked. If it's, say, between 0 and 0.3658... I return Cat; between 0.7193... and 1 gives me Van.
So ideally I return the value from the To set of headings which is vertically above the upper bound within which the Rand() number lies.
Attempted Solution
To achieve this ranking I've been using the MATCH(value,array,1) function (1 as my numbers are in ascending order).
A simplified version of my formula is therefore:
=INDEX(To_Headings,MATCH(RAND(),From_Row,1))
where To_Headings is the array E3:I3 and From_Row is an array I generate using a further formula, resulting in D#:I# (# being a row number which is related to the From headings, so must be an integer between 4 and 8)
However, if you are particularly observant, you can see this is where my solution falls short.
As I say, ideally I want to find the upper bound of where RAND() lies, as this is always in the same column as my desired output To heading. MATCH() with a parameter of 1 returns the lower bound of where RAND() lies. Typically this is the column 1 to the left of the desired column.
e.g. Reading along the Cat row again: for a random number of 0.5, the bounds within which it lies are 0.3658... and 0.7193... The upper bound is directly below Dog, my desired output. The lower bound is 1 column to the left of the desired one, so in my formula I simply shift back to the right when reading off using INDEX.
HOWEVER the blank cells render this useless. For a random number between 0.719... and 1, the lower bound is now two columns to the left of the upper bound. In other instances it can be 3 or 4, in fact any number. This is because the blank cells push the upper bound further right.
Right, bearing all that in mind, can anyone tell me how to rank the RAND() number so it gives me the upper bound? Of course I've tried Match with -1 as the parameter, however because that requires descending order the problem flips too!
I'm thinking I could try counting if there are any blanks to the right of the lower bound, and offsetting my INDEX by that many instead of just 1, but I can see that will add a lot of lines to my code, and I really need to keep it streamlined, as it will be running in about 10000 cells!!

You can use this array formula.
Since your data is arranged in ascending order, this will find the first cell in which the random number (assumed here to be in $J$2) is less than or equal to the number in the dataset:
=INDEX($E$3:$I$3,MATCH(TRUE,$J$2<=E4:I4,0))
Being an array formula it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode. If done correctly then Excel will put {} around the formula.
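For readers who want to check the logic outside Excel, here is a minimal Python sketch of the same "first bound that is greater than or equal to the random number" search. The function name `pick_heading`, the heading "Bus" and the sample row are hypothetical; only the truncated bounds 0.3658 and 0.7193 come from the question, and blank cells are modelled as `None`.

```python
def pick_heading(headings, bounds, r):
    """Return the heading above the first non-blank bound that is >= r."""
    for heading, bound in zip(headings, bounds):
        # A blank cell (None) can never be the upper bound, so skip it.
        if bound is not None and r <= bound:
            return heading
    return None  # r exceeds every bound (the Excel formula would return #N/A)


headings = ["Cat", "Dog", "Man", "Bus", "Van"]  # "Bus" is a made-up placeholder
cat_row = [0.3658, 0.7193, None, None, 1.0]     # illustrative values, blanks as None

print(pick_heading(headings, cat_row, 0.5))  # Dog
print(pick_heading(headings, cat_row, 0.8))  # Van
```

Because blanks are skipped rather than counted, the number of empty cells between the lower and upper bound does not matter.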

I've found another very simple answer through experiment with the Match function.
All you have to do is fill those blanks with the number on the left. MATCH does not stop at the first number greater than the random number; it returns the last number that is less than or equal to it. So once the blanks are filled in, all the adjustment is done for you.
So the example picture now becomes the following:
So for example, reading along the Cat row again, 0.719... is repeated. For a random input of 0.8, instead of stopping at the Dog column as before (since 0.8<1, 0.719... is the biggest number smaller than the random input), the formula now stops 1 column to the left of the upper bound every time.
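The fill-the-blanks idea can be sketched in Python as follows. This is only an illustration under assumptions: `fill_blanks` and `pick_heading_filled` are hypothetical names, the heading "Bus" and the row values are made up apart from the two truncated bounds quoted in the question, and `bisect_right` stands in for an approximate MATCH on ascending data.

```python
from bisect import bisect_right


def fill_blanks(bounds):
    """Replace each blank (None) with the nearest value to its left."""
    filled, last = [], 0.0
    for b in bounds:
        last = b if b is not None else last
        filled.append(last)
    return filled


def pick_heading_filled(headings, bounds, r):
    filled = fill_blanks(bounds)
    # bisect_right counts the values <= r, i.e. the position of the lower
    # bound; the heading one step to the right of it is the one we want.
    lower = bisect_right(filled, r)
    return headings[lower] if lower < len(headings) else None


headings = ["Cat", "Dog", "Man", "Bus", "Van"]  # "Bus" is a made-up placeholder
cat_row = [0.3658, 0.7193, None, None, 1.0]     # illustrative values, blanks as None

print(fill_blanks(cat_row))                         # [0.3658, 0.7193, 0.7193, 0.7193, 1.0]
print(pick_heading_filled(headings, cat_row, 0.8))  # Van
```

With the repeats in place, the lower bound always sits exactly one column to the left of the upper bound, so a fixed shift of one is enough.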
Now I admit @ScottCraner gave a perfect answer to my problem as stated; however, an array formula is fixed to one size (without VBA), whereas cell formulae can automatically fill a new range, so for my application I cannot use that answer. So I thought I'd add this solution as a cell formula approach.

Related

Multiply based on condition then sum results ( multiply if empty <> ) in google sheets

I need help in writing a formula in cell b7. The formula must look to the right and multiply the nonempty cells by the corresponding value in row 3, and I would like to sum up the results.
File link provided.
Please see my comment to your original post.
That said, I will try to explain how to approach this as I think you intend. (This solution will be a Google Sheets solution which will not work in Excel.)
The first thing you will need to do is to delete everything from Row 11 down: all of your examples and notes must be deleted for the following proposed formula to work correctly.
Once you have no superfluous data below your main chart, delete everything from B6:B (including the header "Total").
Then, place the following formula in cell B6:
={"TOTAL"; FILTER(MMULT(C7:G*1, TRANSPOSE(C$3:G$3*1)), A7:A<>"")}
This formula will return the header text "TOTAL" (which you can change within the formula itself if you like) followed by the calculation you want for each row where a name is listed in A7:A.
MMULT is a difficult function to explain, but it multiplies one matrix ("grid") of numbers by another matrix ("grid") and returns the sum of all products per row (or per column, depending on how you set it up), which is what you are trying to do.
MMULT must have every element of both matrices be a real number. To convert potential nulls to zeroes, you'll see *1 appended to each range (since null times 1 is zero).
This assumes that all data entered into C7:G and C3:G3 will always be either a number or null. If you enter text, you'll throw the formula into an error. If you think accidental text entries in those ranges are possible, use this version instead:
={"TOTAL"; FILTER(MMULT(IFERROR(C7:G*1, ROW(C7:G)*0), TRANSPOSE(IFERROR(C$3:G$3*1, COLUMN(C$3:G$3)*0))), A7:A<>"")}
The extra bits use IFERROR to exchange error-producing entries with zeroes, since MMULT must have every space in both matrices filled with a real number.
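To make the MMULT step concrete, here is a rough Python sketch of the same per-row "multiply by the row 3 values, then sum" calculation. The names `to_number` and `row_totals` and all sample data are hypothetical; blanks are modelled as `None`, and stray text is turned into 0 in the spirit of the IFERROR version.

```python
def to_number(cell):
    """Turn blanks and non-numeric entries into 0.0, otherwise return a float."""
    try:
        return float(cell)
    except (TypeError, ValueError):
        return 0.0


def row_totals(names, grid, weights):
    w = [to_number(x) for x in weights]
    totals = []
    for name, row in zip(names, grid):
        if name == "":  # mirrors FILTER(..., A7:A<>""): skip rows without a name
            continue
        totals.append(sum(to_number(c) * wi for c, wi in zip(row, w)))
    return totals


weights = [2, 3, 5]                # stands in for C3:G3 (made-up values)
names = ["Ann", "Bob", ""]         # stands in for A7:A
grid = [[1, None, 2], [None, "x", 1], [4, 4, 4]]

print(row_totals(names, grid, weights))  # [12.0, 5.0]
```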

Excel: Make a dynamic formula that counts a specified max sum of X consecutive days

I am trying to make a formula that could count the max sum of any number of consecutive days that I indicate in some cell. Here is the dataset and the formula:
Dataset
The formula that calculates the maximum sum of three consecutive days:
=MAX(IFERROR(INDEX(
INDEX(E2:AI2,0)+
INDEX(F2:AI2,0)+
INDEX(G2:AI2,0),
0),""))
As you can see the number of days here is determined by the number of rows in the formula that start with "Index". The only difference between these rows is the letters (E, F, G). Is there any way I could reference a cell in which I could put a number for those days, instead of adding more rows to this formula?
Another approach avoiding OFFSET is to use SCAN to generate an array of running totals, then subtract totals which are N elements apart (where N is the number of consecutive cells to be added):
=LET(range,E2:AI2,
length,A1,
runningTotal,SCAN(0,range,LAMBDA(a,b,a+b)),
sequence1,SEQUENCE(1,COLUMNS(range)-length+1,length),
sequence2,SEQUENCE(1,COLUMNS(range)-length+1,0),
difference,INDEX(runningTotal,sequence1)-IF(sequence2,INDEX(runningTotal,sequence2),0),
MAX(difference))
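The running-total trick may be easier to follow in a short Python sketch. The function name `max_window_sum` and the sample numbers are made up for illustration; `itertools.accumulate` plays the role of SCAN.

```python
from itertools import accumulate


def max_window_sum(values, n):
    """Largest sum of n consecutive values, using differences of running totals."""
    running = list(accumulate(values))
    best = None
    for end in range(n - 1, len(values)):
        # Total up to `end` minus the total just before the window starts.
        before = running[end - n] if end >= n else 0
        window = running[end] - before
        if best is None or window > best:
            best = window
    return best


print(max_window_sum([3, 1, 4, 1, 5, 9, 2, 6], 3))  # 17
```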
The answer here was posted by another user on another website, so I will repost it here:
One way to achieve this without relying on a VBA solution would be to use the BYCOL() function (available for Excel for Microsoft 365):
=BYCOL(array, [function])
The array specifies the range to which you want to apply your function, and the function itself is specified in a lambda statement. In the end, you want to get the minimum value of the sum of x consecutive days. Assuming that your data is stored in the range E2:AI2 and the number of consecutive days is stored in cell A1, the function looks like this:
=MIN(BYCOL(E2:AI2,LAMBDA(col,SUM(OFFSET(col,,,,A1)))))
The MIN() part ensures that you get only the smallest sum of the array (all sums of the x consecutive values) returned. The array is simply the range in which your data is stored; it is named in the lambda argument col and consequently used by its name. In your case, you want to apply the sum function for, e.g., x = 4 consecutive days (where 4 is stored in cell A1).
However, with this simple specification, you run into the problem of offsetting beyond cells with values toward the right end of the data. This means that the last sum you get would be 81.8 (value on 31 Jan) + 3 times 0 because the cells are empty. To avoid this, you can combine your function with an IF() statement that replaces the result with an empty cell if the number of empty cells is greater than 0. The adjusted formula looks like this:
=MIN(BYCOL(E2:AI2,
LAMBDA(col,IF(COUNTIF(OFFSET(col,,,,A1),"")>0,"",SUM(OFFSET(col,,,,A1))))))
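As a rough illustration of why the incomplete windows have to be dropped, here is a Python sketch of the same idea. The name `min_complete_window_sum` and the data are hypothetical, and blank cells are modelled as `None`.

```python
def min_complete_window_sum(values, n):
    """Smallest sum over windows of n consecutive, non-blank values."""
    sums = []
    for start in range(len(values)):
        window = values[start:start + n]
        # Skip windows that run past the end or contain a blank (None).
        if len(window) < n or any(v is None for v in window):
            continue
        sums.append(sum(window))
    return min(sums) if sums else None


print(min_complete_window_sum([5, 6, 7, 1], 2))  # 8
```

Without the guard, the final one-element window `[1]` would be reported as the minimum, which is the same problem the COUNTIF check avoids.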
If you do not have the Microsoft 365 version, there are two approaches that would also work. They are a bit more tedious, especially for cases with multiple days, because the number of days cannot really be set automatically (except potentially by constructing the ranges with a combination of ADDRESS() and INDIRECT()), but I would still argue they are a bit neater than your current specification:
=MIN(INDEX(E2:AF2+F2:AG2+G2:AH2+H2:AI2,0))
=SUMPRODUCT(MIN(E2:AF2+F2:AG2+G2:AH2+H2:AI2))
The idea regarding the ranges is the same in both scenarios, with a shift in the start and end of the range by 1 for each additional day.
Another approach getting to the same result:
=LET(range,E2:AI2,
cons,4,
repeat,COLUMNS(range)-cons+1,
MAX(
BYROW(SEQUENCE(repeat,cons,,1)-INT(SEQUENCE(repeat,cons,0,1/cons))*(cons-1),
LAMBDA(x,SUM(INDEX(range,1,x))))))
This avoids OFFSET (volatile, slowing your file down) and the repeat value, consecutive number and/or the range are easily changeable.
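The SEQUENCE arithmetic is compact, so here is a Python sketch of the index grid it appears to build: one row of column positions per window. The function name `window_indices` is hypothetical, and the loop variables stand in for the row and column of the grid.

```python
def window_indices(length, cons):
    """1-based positions for every window of `cons` consecutive columns."""
    repeat = length - cons + 1
    grid = []
    for r in range(repeat):
        row = []
        for c in range(cons):
            k = r * cons + c  # 0-based position when counting across the grid
            # (k + 1) is the plain 1, 2, 3, ... count; subtracting
            # r * (cons - 1) pulls each row back so it starts at r + 1.
            row.append((k + 1) - (k // cons) * (cons - 1))
        grid.append(row)
    return grid


print(window_indices(5, 3))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

Each row can then be passed to INDEX and summed, which is what the BYROW/LAMBDA part does.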
Hope it helps (my answer targets the max sum, as stated in the title). Change MAX to MIN to get the min sum result.
Edit:
I changed the repeat part in the formula to be dynamic (max number of consecutive columns in range), but you can replace it by a number or a cell reference.
The cons part can also be linked to a cell reference.
I also found a bug in my formula, which is now fixed.

Transpose multiple occurrences

EDIT: I have revised the source data to remove the ambiguity of my last screenshots
I am trying to transpose spreadsheet data where there are many rows where the customer name may be duplicated but each row contains a different product.
For instance
revised original data source
to
revised proposed data format
I would like to do it with formulae if possible as I struggle with VB
Thank you for any help
I realise this is a huge answer, apologies but I wanted to be clear. If you need anything from me, drop me a comment and I'll help out.
Here's the output from my formula:
EDITED ANSWER - Named ranges used for ease of understanding:
These are just an example of a few of the named ranges I have used, you can reference the ranges directly or name them yourself (simplest way is to highlight the data then put the name in the drop down next to the formula bar [top left])
Be wary that as we will be using Array formulas for AccNum and AccType, you will not want to select the entire column and instead opt for either the exact data length or overshoot it by 100 or so. Large array formulas tend to slow down calculation and will calculate every cell individually regardless of it being empty.
First formula
=IF(COUNTIF(D2:D11,">""")>0,CONCATENATE("Account Number ",LEFT((COLUMN(A:A)+1)/2,1)),"")
This formula is identical to the one in the original answer apart from the adjusted heading title.
=IF(Condition,True,False) - There are so many uses for the IF logic; it is the best formula in Excel in my opinion. I have used IF with COUNTIF to check whether more than 0 cells are greater than blank (""). This is just a trick to avoid ISBLANK() or other blank identifiers that get confused when a formula is present.
If the result is TRUE, I use CONCATENATE(Text1,Text2,etc.) to build a text string for the column header. ROW(1:1) or COLUMN(A:A) is commonly used to initiate an automatically increasing integer for formulas to use based on whether the count increase is required horizontally or vertically. I add 1 to this increasing integer and divide it by 2 so that the increase for each column is 0.5 (1 > 1.5 > 2 > 2.5). I then use the LEFT formula to take just the first digit of this decimal answer so the number increases only once every 2 columns.
If the result is FALSE then the cell is left blank (the ,"") part). Standard stuff here, no explanation needed.
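The "increase once every two columns" counter can be checked with a tiny Python sketch. The function name `header_number` is hypothetical, and integer division is used here in place of the LEFT/ROUNDDOWN step.

```python
def header_number(column_index):
    """column_index is 1 for the first output column, 2 for the second, and so on."""
    return (column_index + 1) // 2


print([header_number(c) for c in range(1, 7)])  # [1, 1, 2, 2, 3, 3]
```

Integer division behaves like the ROUNDDOWN variant; taking only the first character with LEFT gives the same digit only while the number stays below 10.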
Second Formula
=CONCATENATE(INDEX(Forename,MATCH(Sheet4!$A2,Reference,0)))
=CONCATENATE(INDEX(Surname,MATCH(Sheet4!$A2,Reference,0)))
CONCATENATE has only been used here to force blank cells to remain blank when pulled by INDEX. INDEX will read blank cells as values and therefore 0's whereas CONCATENATE will read them as text and therefore "".
INDEX(Range,Row,Column): This is a lookup formula that is much more advanced than VLOOKUP or HLOOKUP and not limited in the way that they are.
The range I have used is the expected output range - Forename or Surname
The row is then calculated using MATCH(Criteria,Range,Match Type). Match will look through a range and return the position as an integer where a match occurs. For this I have set the criteria to the unique reference number in column A for that row, the range to the named range Reference and the match type as 0 (1 Less than, 0 Exact Match, -1 Greater than).
I did not define a column number for INDEX as it defaults to the first column and I am only giving it one column of data to output from anyway.
Third Formula
Remember these need to be entered as an array (when in the formula bar hit Ctrl+Shift+Enter)
=IFERROR(INDEX(AccNum,SMALL(IF(Reference=Sheet4!$A2,ROW(Reference)-ROW(INDEX(Reference,1,1))+1),ROUNDDOWN((COLUMN(A:A)+1)/2,0))),"")
=IFERROR(INDEX(AccType,SMALL(IF(Reference=Sheet4!$A2,ROW(Reference)-ROW(INDEX(Reference,1,1))+1),ROUNDDOWN((COLUMN(B:B)+1)/2,0))),"")
As you can see, one of these is used for AccNum and the other for AccType.
IFERROR(Value,Value_if_error): This has been used because we are not expecting the formula to always return something. When the formula cannot return something, or SMALL has run out of matches to go through, an error will occur (usually #VALUE! or #NUM!), so I use ,"") to force a blank result instead (again standard stuff).
I have already explained the INDEX formula above so let's just dive in to how I have worked out the rows that match what we are looking for:
SMALL(IF(Reference=Sheet4!$A2,ROW(Reference)-ROW(INDEX(Reference,1,1))+1),ROUNDDOWN((COLUMN(B:B)+1)/2,0))
The IF statement here is fairly self-explanatory, but as we have used it as an array formula, it compares every cell in the named range Reference individually against Sheet4!$A2, the unique reference. In your mock data this returns a result of: {FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE} for the first entry (I included titles in the range, hence the initial FALSE). IF will do my row calculation* for every TRUE but leave the FALSEs as they are.
This leaves a result of {FALSE;2;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE} that SMALL(array,k) will use. SMALL only works on numeric values and returns the k-th smallest result. Again the column trick has been used but, to cover more ground, I used another method: ROUNDDOWN(Number,digits) as opposed to LEFT(). Digits here means decimal places, so I used 0 to round down to a whole integer for the same result. As this copies across the columns like so: 1, 1, 2, 2, 3, 3, SMALL will alternately (as the formulas alternate) grab the 1st matching AccNum then the 1st matching AccType before grabbing the 2nd AccNum and AccType and so forth.
*(Row number of the match minus the first row number of the range then plus 1, again fairly common as a foolproof way to always get the correct row regardless of where the data starts; actually as your data starts on row 1 we could just do ROW(Reference) but I left it as is in case you had data in a different format)
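As a sketch of what the INDEX/SMALL/IF combination does, the Python below returns the value on the k-th row whose reference matches. The function name `kth_match` and the sample columns are made up for illustration and are not the asker's data.

```python
def kth_match(reference, values, key, k):
    """Return the value on the k-th row (1-based) whose reference equals key, or ""."""
    # Like IF(Reference=key, row number): keep only the matching positions.
    positions = [i for i, ref in enumerate(reference) if ref == key]
    if k > len(positions):
        return ""  # the IFERROR(..., "") branch: SMALL has run out of matches
    return values[positions[k - 1]]


reference = ["Ref", 101, 102, 101, 103]       # header included, as in the answer
acc_num = ["AccNum", "A1", "B7", "A2", "C3"]  # hypothetical account numbers

print(kth_match(reference, acc_num, 101, 1))  # A1
print(kth_match(reference, acc_num, 101, 2))  # A2
```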
ORIGINAL ANSWER - Same logic as above
Here's your solution in 3 parts
Part 1 is a trick for the auto completion of the titles so that they will hide when not used (in case you just copy and paste values for the whole lot to speed up use again).
=IF(COUNTIF(C2:C11,">""")>0,CONCATENATE("Product ",LEFT((COLUMN(A:A)+1)/2,1)),"") in C
=IF(COUNTIF(D2:D11,">""")>0,CONCATENATE("Prod code ",LEFT((COLUMN(B:B)+1)/2,1)),"") in D
Highlight both of the cells and drag across to stagger the outputs "Product " and "Prod code "
Part 2 would be inputting the unique IDs to the new sheet, I would suggest copying your entire column A across to a new sheet and using DATA > REMOVE DUPLICATES > Continue with current selection to trim out the multiple occurrences of unique IDs.
In column B use =INDEX(Sheet2!$B$1:$B$7,MATCH(Sheet4!$A2,Sheet2!$A$1:$A$7,0)) to get the names pulled across.
Part 3, the INDEX
Once again, we are doing a staggered input here before copying the formula across the page to cover the entirety of the data.
=IFERROR(INDEX(Sheet2!$C$1:$D$11,SMALL(IF(Sheet2!$A$1:$A$11=Sheet4!$A2,ROW(Sheet2!$A$1:$A$11)-ROW(INDEX(Sheet2!$A$1:$A$11,1,1))+1),ROUNDDOWN((COLUMN(A:A)+1)/2,0)),1),"") in C
=IFERROR(INDEX(Sheet2!$C$1:$D$11,SMALL(IF(Sheet2!$A$1:$A$11=Sheet4!$A2,ROW(Sheet2!$A$1:$A$11)-ROW(INDEX(Sheet2!$A$1:$A$11,1,1))+1),ROUNDDOWN((COLUMN(B:B)+1)/2,0)),2),"") in D
The formulas of Part 3 will need to be entered as an array (when in the formula bar hit Ctrl+Shift+Enter) . This will need to be done before copying the formulas across.
These formulas can now be dragged / copied in all directions and will feed off of the unique ID in column A.
My Answer is already rather long so I haven't gone on to break the formula down. If you have any trouble understanding how this works, let me know and I will be happy to write up a quick guide, breaking it down chunk by chunk for you.

Sum multiple values from another sheet

I need to identify the cost of an engine depending which parts it will use.
I have one sheet that has the cost of each part for each engine model. There are 3 engine models and nearly 500 parts (!Parts).
In another sheet I try to sum the value of all the various combinations of parts. (!EngineCost). I use a "1" to indicate the inclusion of that part. I grab the prices from !Parts and total them per engine size.
At the moment I am doing this very manually (see below). Is there a better way to do this?
=IF(O3=1,'Parts'!$R$6,"0")+IF(P3=1,'Parts'!$R$7,"0")+IF(Q3=1,'Parts'!$R$8,"0")
Thanks!
Here is link to a sample sheet https://www.dropbox.com/s/wbh5muf7721mk0s/engine.xlsx
My first hunch was that you could use the following formula:
=SUMIF(O3:O6, 1, 'Parts'!$R$6:$R$8)
This sums the values in 'Parts'!$R$6:$R$8 when the corresponding cell in O3:O6 is equal to 1.
However, as @simoco pointed out, you have a transpose: one array is transposed relative to the other. That makes this very slightly more challenging. You need two steps:
find the cells that have a value of 1
sum those cells
The following can do it:
=SUMPRODUCT(--(O3:Q3=1), TRANSPOSE('Parts'!$R$6:$R$8))
entered as an array formula*); or, taking advantage of the fact that matrix multiplication is really the element by element multiplication of a row vector with a column vector, followed by taking the sum:
=MMULT(I3:K3,IF(G4:G6=1,1,0))
again entered as an array formula*).
*) Array formula is entered by pressing ctrl-shift-enter on PC, or cmd-shift-enter on Mac.
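Both array formulas boil down to "add up the price of every part flagged with a 1". A minimal Python sketch of that calculation, with a hypothetical function name and made-up prices:

```python
def engine_cost(flags, prices):
    """Sum the prices whose matching flag is 1."""
    return sum(price for flag, price in zip(flags, prices) if flag == 1)


flags = [1, 0, 1]              # stands in for the row of inclusion flags
prices = [120.0, 45.5, 30.0]   # stands in for the column of part prices (made up)

print(engine_cost(flags, prices))  # 150.0
```

Pairing the two sequences element by element is what the TRANSPOSE achieves in the spreadsheet, where one range is a row and the other a column.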

Why this array formula doesn't work?

In the illustration all formulas are array formulas. The range that each formula spans is bordered, and the first formula of each block is written at the top of that block.
Range A4:A103 is an input vector (which is numeric), range C4:G23 is a given (input) permutation of the rows of A4:A103 (necessarily positive non-zero integer numbers not greater than the length of the input vector).
Let us interpret the permutation matrix as a set of rows.
How can I compute, for each row and in a constant number of cells, the minimal number among the input-vector entries that the row points to? By a constant number of cells, I mean a solution that requires a fixed number of cells for each row, regardless of the number of columns in the permutation. (In the production case each dimension is much, much bigger; there are about 100 columns in the permutation matrix.)
I don't ask for VBA solutions. If it is necessary the solution can use a free and publicly available Excel add-on, like MoreFunc, but I'd prefer to keep it vanilla Excel 2007 or later.
I thought that the formula {=MIN(INDEX(INDIRECT($A$2);$C4:$G4))} would solve my problem. Surprisingly, Excel seems not to take into account the array nature of the formula, and evaluates it as if it was written as =MIN(INDEX(INDIRECT($A$2);$C4)), which is equivalent to the dysfunctional =INDEX(INDIRECT($A$2);$C4).
On the other hand, we can see the argument to the MIN is understood as array in the range I4:M4.
INDEX works in some strange ways!
Normally INDEX can't return an array - although you seem to have found the one exception to that - when it's an array formula entered in a range.
You should be able to use OFFSET to return the required array that will work within MIN, i.e. with this formula
=MIN(N(OFFSET(INDIRECT($A$2);$C4:$G4-1;0)))
confirmed with CTRL+SHIFT+ENTER
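For reference, the intended result per row can be sketched in Python as below. The function name `row_minimum` and the sample vector are hypothetical; positions are 1-based row numbers into the input vector, as in the permutation matrix.

```python
def row_minimum(vector, positions):
    """Smallest input-vector entry among the rows listed in `positions` (1-based)."""
    return min(vector[p - 1] for p in positions)


vector = [7, 3, 9, 1, 5]                # stands in for the input vector (made up)

print(row_minimum(vector, [1, 3, 5]))  # 5
print(row_minimum(vector, [2, 4]))     # 1
```

The `p - 1` shift plays the same role as the `-1` inside the OFFSET formula, which converts a 1-based position into an offset from the first cell.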

Resources