Related
I have a data set and some index like below
I'd like to sum column 2, 5 and 7 (A1, A4 and A6 in data set). Note that na is also included in this column. How should I prepare a formula for this type of calculation. Thanks
Assuming no excel version constraints as per tag listed in the question, then you can use the following in cell G1 to sum by columns, if you need by row, then replace BYCOL with BYROW:
=BYCOL(CHOOSECOLS(B2:D3,TOCOL(E2:E4,2)), LAMBDA(x, SUM(TOCOL(x,2))))
Here a sample output:
It deals in cases we have #N/A values in column selection as well as in the data to sum, via TOCOL function, using the second input argument.
In column D (Result), I would like to have the following formula.
For each Cell in column C, find in column B the first value higher than the value of the cell of column C (starting from the same row) and gives as output the difference between the values found in column A (Count).
Example:
the value in C2 is 40. the first cell of B that has a value higher than 40 is B6. So D2 takes A6.value - A2.value = 5 - 1 = 4.
Can it be done without the use of VBA?
It can easily be accomplished with an array formula (so you have to enter the formula with Ctrl+Shift+Enter ) :
{=MATCH(TRUE;IF(B2:$B$7>C2;TRUE;FALSE);0)-1}
Put this formula in cell D2, and just drag down. You only have to change the end of your data set (change $B$7 into the real last cell of the column with data)
The formula works as follows :
The IF statement results in an array with TRUE/FALSE values that meet your criteria : {FALSE;FALSE;FALSE;FALSE;TRUE;FALSE}
The MATCH (with the 0 switch) searches the array for the index of the first match, which is 5 in our case
And you have to subtract 1 to get the offset to the cell where the function is placed, so this gives you 4
So although you have to enter it as an array formula (you will get an N/A error without the ctrl+shift+enter), the result is just a single number.
Also, depending on your data set, you might want to add some ERROR handling in case no match has been found, e.g. just using the example data set in your question, the result in cell D5 will be N/A so you have to decide what value you want the result to be in such case.
And finally, I did not use the values in column A, as I assumed this is just a sequential ascending counter. If this is not the case, and you specifically want to find the difference between the corresponding values in that column, you can use the variant mentioned by Foxfire... in one of the other answers: =MIN(IF(B2:$B$6>C2;A2:$A$6))-A2
A slightly adjusted and shortened answer on Peter K.'s suggestion:
In D2:
=MATCH(TRUE,$B3:B$7>C2,0)
enter the formula with ctrl+shift+enter
Something like this should work for you. First transform your range to a table.
=IFERROR(AGGREGATE(15,6,--([#Second]<[First])*(ROW([#Second])<=ROW([First]))/--([#Second]<[First]*(ROW([#Second])<=ROW([First])))*[Count],1) - [#Count],"")
In this formula the comparison between [second] and [First] starts at the same row. That means that the value in D5 is 0 and not 1. (Like #Foxfire And Burns And Burnslike stated in the comments).
Ok, I did not post the answer waiting until OP answered why D5 is 1 instead of 0, but my formula is also an array formula. It would be:
=MIN(IF(B2:$B$6>C2;A2:$A$6))-A2
To type this formula in array mode, you need to type it as usual, but instead of pressing ENTER, you need to press CTRL+SHIFT+ENTER
The following code allows me to determine distinct values in a pivot table in Excel:
=SUMPRODUCT(($A$A:$A2=A2)*($B$2:$B2=B2))
See also: Simple Pivot Table to Count Unique Values
The code runs perfectly fine. However, can somebody help me understand how this code actually works?
You write: the following code allows me to determine distinct values in a pivot table in Excel
No. That formula alone does not do that. Read on for the explanation of what does.
There's a typo in the formula. It should be
=SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))
See the difference?
The formula starts in row 2 and is copied down. In each row, the $A$2 reference and the $B$2 reference will stay the same. The $ signs make them absolute references. The relative references $A2 and A2 will change their row numbers when copied down, so in row 3 the A2 will change to A3 and B2 will change to B3. In the next row it will be A4 and B4, and so on.
You may want to create a sample scenario with data similar to that in the thread you link to. Then use the "Evaluate Formula" tool on the Formulas ribbon to see step by step what is calculated. The formula evaluates from the inside out. Let's assume the formula has been copied down to row 5 and we are now looking at
=SUMPRODUCT(($A$2:$A5=A5)*($B$2:$B5=B5))
($A$2:$A5=A5) this bit compares all the cells from A2 to A5 with the value in A5. The result is an array of four values, either true or false. The next bit ($B$2:$B5=B5) also returns an array of true or false values.
These two arrays are multiplied and the result is an array of 1 or 0 values. Each array has the same number of values.
The first value of the first array will be multiplied with the first value of the second array. (see the red arrows)
The second value of the first array will be multiplied with the second value of the second array. (see the blue arrows)
and so on.
True * True will return 1, everything else will return 0. The result of the multiplication is:
The nature of the SumProduct function is to sum the result of the multiplications (the product), so that is what it does.
This function alone does not do anything at all to establish distinct values in Excel. In the thread you link to, the Sumproduct is wrapped in an IF statement and THAT is where the distinct values are identified.
=IF(SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))>1,0,1)
In plain words: If the combination of the value in column A of the current row and column B of the current row has already appeared above, return a zero, otherwise, return a 1.
This marks distinct values of the combined columns A and B.
Firts, i think you made a type here, as the formula should be :
=SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))
Let's decompose it in 2 parts:
First, we check the cells between A2 and A2, so only one cell, and we check the number of cells wich are equals to A2. In this case, the output should be 1, as you're comparing A2 with A2. However, you're not limited to compare A2 with A2. If you had chosen 2 cells equals to A2, the results would have been 2.You can compare as many cells as you want with A2 (replace the characters after the $ to modulate).
We do the same for the second bracket, except the pivot value is B2.
After that, you need to understand what the function SUMPRODUCT does. It sum the value of the product for a range of array. For example, say you have the value 1 on A1, 1 on A2, 2 on B1 and 3 on B2, if you make SUMPRODUCT((A1:A2)*(B1:B2)) , you will obtain (1*2) + (1*3) = 5. So, in the example you gave us, it will give the sum of (A2=A2)*(B2=B2) = 1.
So, it will output the number of pair (Ax,Bx) which is equals to (A2,B2). With the link, you can see that, if you select the first line only, the function will output 1 (and so the IF will output 1), but if you select the first 2 lines, the function will output 2, (and so the IF will output 0).
I hope this made sense to you, as i hoped i didn't make any mistakes along the explanation.
I have a simple table with 5 names and 5 grades if you will.
In another column I order the grades using the LARGE function.
Now is there a way to know the row of each of the "ordered" grades to obtain something like that?
White 23 31 5
Red 15 23 1
Green 23 23 3
Blue 18 18 4
Grey 31 15 2
The column I can't calculate is the last one!
You should use the rank() function if you want to rank these grades. Not large().
=RANK(D2,$D$2:$D$6,1)
You can try this
=MATCH(LARGE(B1:B5,1),B1:B5,0)
The result is a number of row...
In Cell D1 Put =INDEX($A$1:$A$5,MATCH(C1,$B$1:$B$5,0))
Then in Cell D2 put =IF(D1<>INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)),INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)),INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)+MATCH(C2,INDIRECT("$B$"&MATCH(C2,$B$1:$B$5,0)+1&":$B$5"),0)))
This will also work when duplicate Grades are present
But I Strongly Suggest using Sort as Follows:
*****Also: ***** Here is the explanation on the above Formulas.
To get the Row that contains the Number we are looking for (the number in Column C) you need yo use the Match() Function. We enter =MATCH(C1,B1:B5,0) in D1:
What this is doing: IS looking to the value in C1, this is 31
It is looking in Range("B1:B5"), And 0 is for an Exact match.
So when look for a match to C1 or 31 we get 5. This tells us that 31 is in Row 5
Now, to get the Value of Column A on Row 5 we use INDEX() Function as Follows:
We add to the =MATCH(C1,B1:B5,0) in D1 as =INDEX(A1:A5,MATCH(C1,B1:B5,0))
This will look in Range("A1:A5") for Row 5 (This is because =MATCH(C1,B1:B5,0) = 5)
And the result will be Grey
Now if we drag this formula down we will find the first problem:
Here are our 2 Issues:
1) We get an `N/A` error in the last row.
2) Although `Green` is only in `Range("A1:A5")` one time we see it twice
even though it would seem that `White` should be twice.
These are cause because:
1) We need to add `$` to the range that will remain the same so when we drag down
the formula is won't shift the range. As is the formula in `D5` is
`=INDEX(A5:A9,MATCH(C5,B5:B9,0))` and we receive the error *because*
`Range("A5:A9")` does not contain `15`, but the issue is we meant
to look in `Range("A1:A5")`
So we change the Formula as so: =INDEX($A$1:$A$5,MATCH(C1,$B$1:$B$5,0))
Take note that we do not use the $ on C1 in the formula cause we WANT this value to change as we move down.
But we still have the issue of double values when they shouldn't be there.
Because D1 is the first cell we won't change the formula in it. As anything that is equal to the greatest value is simply tied with it and I don't see any reason why the order of the tie would matter.
Instead we will start in D2 and enter =IF(D1<>INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)),INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)),INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)+MATCH(C2,INDIRECT("$B$"&MATCH(C2,$B$1:$B$5,0)+1&":$B$5"),0)))
What this is doing is checking if the value of =INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0))
is not equal to the value in the row above. (being a sorted list means all double values would be on top of each other) and If it is NOT the same then use the value, but if it is the same we need to do a little more work.
If the value is not the same we use the Formula INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)+MATCH(C2,INDIRECT("$B$"&MATCH(C2,$B$1:$B$5,0)+1&":$B$5"),0)))
Now to explain it I will use our example of double values. In D3 we find the formula: =IF(D2<>INDEX($A$1:$A$5,MATCH(C3,$B$1:$B$5,0)),INDEX($A$1:$A$5,MATCH(C3,$B$1:$B$5,0)),INDEX($A$1:$A$5,MATCH(C3,$B$1:$B$5,0)+MATCH(C3,INDIRECT("$B$"&MATCH(C3,$B$1:$B$5,0)+1&":$B$5"),0)))
And because we know that INDEX($A$1:$A$5,MATCH(C3,$B$1:$B$5,0)) will be equal to the above cell (White), and we have gone over how the if true works, I will focus on the if false value of: INDEX($A$1:$A$5,MATCH(C3,$B$1:$B$5,0)+MATCH(C3,INDIRECT("$B$"&MATCH(C3,$B$1:$B$5,0)+1&":$B$5"),0))
We know MATCH(C3,$B$1:$B$5,0) is the Row that contains the first instance of C3 in this case 23 and the row is Row 1 so we need to look for 23 in the row Under Row 1. So we use MATCH(C3,INDIRECT("$B$"&MATCH(C3,$B$1:$B$5,0)+1&":$B$5"),0) which is equal to MATCH("23", B2:B4,0) because we are adding a 1 to the row that has the 1st match for 23 or C3.
that will now return us the Value of 2 as, the value 23 is in the second row of Range("A2:A5"), Red is in Row 1 and Blue in Row 3 of that range as shown:
but we don't want Row 2 we know that 23 relates to Green and that Green is in Row 3 So we add the row the we last found the value 23 (1 or MATCH(C3,$B$1:$B$5,0))to the row we currently found it (2) and get Row 3.
Here is a formula approach based on the methodology outlined in this link. The final layout of this approach is shown below.
I have assumed that there is 1 header row and I use 2 helper columns (D & E). While additional rows can be added to the header, the table must begin in column A in order for the formulas in column E to work correctly.
Although the helper columns could be eliminated by consolidating their formulas into the formulas in column F, I do not recommend it: the resulting formulas would be a pain to maintain.
Formulas Needed
Cell C2: =LARGE(B:B,ROW(A2)-ROW($A$1)) [Copy down to bottom of data]
Cell D2: =MATCH(C2,B:B,0) [Copy down to bottom of data]
Cell E2: =D2
Cell E3: =IF(D3<>D2,D3,E2+MATCH(C3,INDIRECT("B"&(E2+1)&":B"&COUNTA(A:A)),0))
[Copy down to bottom of data]
Cell F2: =OFFSET($A$2,E2-ROW($A$2),0) [Copy down to bottom of data]
Explanation of Answer
There are four steps to getting the answer:
Sort the grades from highest to lowest (as you showed in your example data)
Create a partial ordering of the row numbers for the sorted grades
Get the row numbers for duplicate grades
Use that ordering to show the name for each sorted grade
Sort the grades from highest to lowest
As you have done, my sort uses the LARGE function, which returns the nth largest value in a range or array. As shown, the LARGE function in cell C2 takes the grades in column B. The "n" for LARGE is calculated as the current row number minus the number of rows in the header, in this case the 1 row for cell A1. When the formula is copied down, "n" progresses from 1 to 2 to 3, etc.
Partially order the grade row numbers
The next step is to determine the row numbers for the unsorted grades that correspond to the sorted grades.
To do that, I use the MATCH function to find where each of the sorted grades lies in the list of unsorted grades in column B. MATCH takes three arguments--the value to be matched, the range in which to make the match, and optionally, the type of match, with a value of 0 or FALSE for an exact match--and returns an index number which represents where in the lookup range the match is found (1 for the first row in the match range, 2 for the second row, etc.).
In the formula for cell D2 shown above, the MATCH function on the grade 31 returns 6 since 31 is in the sixth row of column B.
The result for cell D4 shows why it is only possible to get a partial ordering with this formula. While we are trying to lookup the row for the second instance of a grade of 23, the formula returns a value of 2, which corresponds to the row for the first instance of 23. That's because MATCH will always return the first match for 23 it finds, which is on row 2!
Get correct row numbers for duplicate grades
The next step is to get the correct row references for the duplicated row numbers in column D. The formulas that accomplish this are shown for the first three cells in column E of the table.
There are three cases that have to be dealt with in column E:
For the first (and possibly only) instance of the highest grade, it is possible to just use the row number calculated in cell D2.
The second case deals with the first instances of the row references of the remaining grades. For these the rows numbers calculated in column D can again be used (via the TRUE branch of the IF statement in the column E formulas). For example, in cell E2 -- which corresponds to the first instance of grade 23 -- the row number in cell D3 can be used.
The final case is the rows for duplicate grades. Here, the MATCH for each duplicate in column B is recalculated using a sliding range that excludes the previous matches for that grade. For example, for the duplicated grade of 23 in column C, the match is on the range B3:B6, rather than the range of B2:B6 used in the column D calculation.
Diplay the names in sorted order
This final step is straight forward: Get the name corresponding to the sorted grade. Here the OFFSET function is used; its arguments are a cell reference and the number of rows and columns from that reference that the desired value is to be found.
How can I sort a table in sheet 1 like
A B C D E
3 7 3 6 5
into another table in sheet 2
A C E D B
3 3 5 6 7
by using function only?
One really easy way to do it would be to just have a rank index and then use HLOOKUP to find the corresponding values:
=RANK(A4,$A$4:$E$4,1)
=IF(COUNTIF($A$1:A$1,A1)>1,RANK(A4,$A$4:$E$4,1)+COUNTIF($A$1:A$1,A1)-1,RANK(A4,$A$4:$E$4,1))
=HLOOKUP(COLUMN(),$A$2:$E$4,2,FALSE)
=HLOOKUP(COLUMN(),$A$2:$E$4,3,FALSE)
Okay, here's the "one formula does it all" solution without additional temporary columns:
Formula in A6:
=INDEX($A$2:$E$2,MATCH(SMALL($A$3:$E$3+COLUMN($A$3:$E$3)/100000000,COLUMN()),$A$3:$E$3+COLUMN($A$3:$E$3)/100000000,0))
Enter it as an array formula, i.e. press Ctrl-Shift-Enter. Then copy it to the adjacent columns.
To get also the number, use this formula in A7 (again as array formula):
=ROUND(SMALL($A$3:$E$3+COLUMN($A$3:$E$3)/100000000,COLUMN()),6)
Both formulas are a bit bloated as they have to also handle the potential duplicates. The solution is to simply add a very tiny fraction of the column before applying the sorting (SMALL function) - and then to remove it again...
I think an easier solution is to use the "Large" function:
The Column functions are just an easy way to count from 5 down to 1, but a helper column may be even easier.
You can have a similar answer using the "Small" function as well.
Although I have used the "add a small number technique", I think this is the most elegant for a helper row/column:
=RANK(B4,$B$4:$F$4,0) + COUNTIF($B$4:$F$4,B4)-1
(copy across the row column) RANK gets you close and the COUNTIF portion handles duplicates by counting the number of duplicates of the cell to that point. Since there is always match (itself) you subtract 1 for the final rank. This "sorts" ties in the order that they appear. Note that an empty cell will generate #N/A and character data as #Values.
It's doable! :-)
Here is the sample file.
Explanation
I assume that your data is in row 1 of Sheet1 and that you want the data sorted starting in column 1 of another sheet - if not, adjust accordingly.
Place this formula in column 1 of the target sheet, say in row 2:
=ROUND(SMALL(Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,COLUMN()),6)
You need to enter the formula as an array formula, i.e. press Ctrl-Shift-Enter.
In case you place it in another column than column one, you need to replace COLUMN() with COLUMN()-COLUMN($A$2)+1 where $A$2must be a reference to the cell itself.
This formula will return you the lowest number in your range - if you copy it the next 4 columns, you'll get your order list of numbers.
To translate this back to the column number, we need to perform 2 steps:
Figure out the column number:This can be done with with this formula:
=MATCH(SMALL(Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,COLUMN()),Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,0)
- again, to be entered as array formula. If the source data does not start in column A, you need to add +COLUMN(Sheet1!$A$1)-1 where $A$1 needs to be replaced with the leftmost cell of the source data.
Convert the column number to a letter:Assuming your column number from step 1 is in cell A6, this formula will do the job:
=LEFT(ADDRESS(1,A6,2),SEARCH("$",ADDRESS(1,A6,2))-1)
Of course, you can also combine step 1&2, this will result in this megaformula:
=LEFT(
ADDRESS(
1,
MATCH(
SMALL(Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,COLUMN()),
Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,
0),
2),
SEARCH(
"$",
ADDRESS(
1,
MATCH(
SMALL(Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,COLUMN()),
Sheet1!$A$1:$E$1+COLUMN(Sheet1!$A$1:$E$1)/100000000,0),
2)
)-1
)
Again, to be entered as array formula.