Extracting the upper quartile data from an array and placing it in new column - excel

Hi and thanks for any help with this in advance
Below is a hypothetical data set; abundance = count data; mud% = the mud content in which the animals were found; mud bin = bins i've made up depending on the mud%; and UQ = upper quartile of the abundance data from its corresponding mud bin (i.e. the upper quartile for the abundance data in mud bin 1 is 17.25 etc).
Problem:
In excel, for abundance data in each of the four mud bins, I'm wanting to extract any values in the abundance column that are >= the upper quartile value for that particular mud bin and place these in a new column on the same sheet (with no gaps between rows from values that didn't meet the criteria) along with their corresponding mud% value in the neighboring cell. I've added the new columns to the below sheet to give you an idea of what I'm after.
abundance | mud% | mud bin | UQ | | New column | Mud% |
18 10.9 1 18 10.9(mud bid 1)
15 6.5 1 44 38.9(mud bin 1)
6 13.4 1 45 38 (mud bin 2)
13 42.1 1 37 37.8(mud bin 2)
15 36.4 1 etc
44 38.9 1 17.25 etc
22 46 2
30 36.4 2
45 38 2
29 35.3 2
37 37.8 2
29 41.8 2 35.25
11 44.4 3
17 47.8 3
21 40.7 3
15 13.9 3
35 13.9 3
14 13.9 3
15 13.9 3 19
19 12 4
14 12 4
10 12 4
12 12 4
14 12 4
13 12 4
45 9.525 4
66 9.525 4
78 9.525 4 45
The reality is I have a rather large dataset containing abundance data for a number of species, all on the same excel sheet and would greatly appreciate any insight into how I might achieve this in the most efficient manor.

For starters, to make this explanation simpler, I will assume that the last row of data is in row 100.
Populate Upper Quartile values for all line items
First you'll need to use the Quartile formula; however, since you want to find the upper quartile within a bin, you'll have to use an array formula. Put this formula in your UQ column (place in cell D2 and drag down). When entering the formula Be sure to press Ctrl+Shift before pressing Enter
=QUARTILE(IF($C$2:$C$100=C2,$A$2:$A$100,""),3)
The first part of this formula, $C$2:$C$100=C2 is your condition. Everywhere this condition is met, you will get the corresponding value in $A$2:$A$100; otherwise, you'll get a blank value. This will give you an array of abundance values that matches the indicated mudbin, C2. now that you have your subset of data, the quartile function will give you the value in the 3rd quartile (17.25 for mudbin 1, which will be placed next to every row that has a mudbin of 1).
Now that we have all the quartiles, we can get all the abundance values that are greater than the UQ for that mudbin. This is done in two parts
Get abundance values greater than mudbin UQ
First, you need to select one column of cells that has the same number of rows as your data (for example, select cells F2:F100)
Enter the following formula into the formula bar (while F2:F100 are highlighted) and press Ctrl+Shift, then enter
=IF($A$2:$A$100>$D$2:$D$100,$A$2:$A$100,"")
Similar to the IF statement used before, this formula finds all the abundance values that are greater than their corresponding UQ value. Now column F will have an abundance number where it is greater than it's UQ value, and a blank where it is not. Now onto the final step.
Populate abundance values that are greater than the UQ value, without the blanks
Select G2:G100 (your "New Column" in your sample data)
Enter the following formula into the formula bar (while G2:G100 are highlighted) and press Ctrl+Shift, then enter
=INDEX(F2:F100,SMALL(IF(F2:F100<>"",ROW(F2:F100)-1),ROW()-ROW($F$1)))
Looking at the IF statement again, this will find every value in F2:F100 that is not blank, but instead of grabbing the values, we'll keep track of the row number of that non blank value (done by ROW(F2:F100)-1
). Now that we have the row numbers of all the non blank values, we can grab the non-blank values in order and populate them in G2:G100. ROW()-ROW($F$1) is a counter, and SMALL will use the counter to determine the nth smallest number to return. Once we have our row number of the non blank value, INDEX returns that value
Finally, to populate the Mud%, you'll need to use the row number of the non blank values to get the mud% and the mud bin (You have the formula already to get the row number of the non blank value).
It's not a simple answer, but at least you won't have to use VBA.

Related

Sum of the greatest value in one column, plus the sum of the other values in another column

Consider the following sheet/table:
A B
1 90 71
2 40 25
3 60 16
4 110 13
5 87 82
I want to have a general formula in cell C1 that sums the greatest value in column A (which is 110), plus the sum of the other values in column B (which are 71, 25, 16 and 82). I would appreciate if the formula wasn't an array formula (as in requiring Ctrl + Shift + Enter). I don’t have Office 365, I have Excel 2019.
My attempt
Getting the greatest value in column A is easy, we use MAX(A1:A5).
So the formula I want in cell C1 should be something like:
=MAX(A1:A5) + SUM(array_of_values_to_be_summed)
Obtaining the values of the other rows in column B (what I called array_of_values_to_be_summed in the previous formula) is the hard part. I've read about using INDEX, MATCH, their combination, and obtaining arrays by using parenthesis and equal signs, and I've tried that, without success so far.
For example, I noticed that NOT((A1:A5 = MAX(A1:A5))) yields an array/list containing ones (or TRUEs) for the relative position of the rows to be summed, and containing a zero (or FALSE) for the relative position of the row to be omitted. Maybe this is useful, I couldn't find how.
Any ideas? Thanks.
Edit 1 (solution)
I managed to obtain what I wanted. I simply multiplied the array obtained with the NOT formula, by the range B1:B5. The final formula is:
=MAX(A1:A5) + SUM(NOT((A1:A5 = MAX(A1:A5))) * B1:B5)
Edit 2 (duplicate values)
I forgot to explain what the formula should do if there are duplicates in column A. In that case, the first term of my final formula (the term that has the MAX function) would be the one whose corresponding value in column B is smallest, and the value in column B of the other duplicates would be used in the second term (the one containing the SUM function).
For example, consider the following sheet/table:
A B
1 90 71
2 110 25
3 60 16
4 110 13
5 110 82
Based on the above table, the formula should yield 110 + (71 + 25 + 16 + 82) = 304.
Just to give context, the reason I want such a formula is because I’m writing a spreadsheet that automatically calculates the electric current rating of the short-circuit protective device of the feeder of a group of electric motors in a house or building or mall, as required by the article 430.62(A) of the US National Electrical Code. Column A is the current rating of the short-circuit protective device of the branch-circuit of each motors, and column B is the full-load current of each motor.
You can use this formula
=MAX(A1:A5)
+SUM(B1:B5)
-AGGREGATE(15,6,(B1:B5)/(A1:A5=MAX(A1:A5)),1)
Based on #Anupam Chand's hint for max-value-duplicates there could also be min-value-duplicates in column B for corresponding max-value-duplicates in column A. :) This formula would account for that
=SUM(B1:B5)
+(MAX(A1:A5)-AGGREGATE(15,6,(B1:B5)/(A1:A5=MAX(A1:A5)),1))
*SUMPRODUCT((A1:A5=MAX(A1:A5))*(B1:B5=AGGREGATE(15,6,(B1:B5)/(A1:A5=MAX(A1:A5)),1)))
Or with #Anupam Chand's shorter and better readable and overall better style :)
=SUM(B1:B5)
+(MAX(A1:A5)-MINIFS(B1:B5,A1:A5,MAX(A1:A5)))
*COUNTIFS(A1:A5,MAX(A1:A5),B1:B5,MINIFS(B1:B5,A1:A5,MAX(A1:A5)))
The explanation works for bot solutions:
The SUM-part just sums the whole list.
The second line gets the max-value for column A and the corresponding min-value of column B for the max-values in column A and adds or subtracts it respectively.
The third line counts, how many times the corresponding min-value for the max-value occurs and multiplies it with the second line.
Can you try this ?
=MAX(A1:A5)+SUM(B1:B5)-MINIFS(B1:B5,A1:A5,MAX(A1:A5))
What we're doing is adding the max of A to all rows of B and then subtracting the min value of B where A is the max.
If you have Excel 365 you can use the following LET-Formula
=LET(A,A1:A5,
B,B1:B5,
MaxA,MAX(A),
MinBExclude, MINIFS(B,A,MaxA),
sumB1,SUMPRODUCT(B*(A=MaxA)*(B<>MinBExclude)),
sumB2,SUMPRODUCT(B*(A<>MaxA)),
MaxA +sumB1+sumB2
A and B are shortcuts for the two ranges
MaxA returns the max value for A (110)
MinBExclude filters the values of column B by the MaxA-value (25, 13, 82) and returns the min-value of the filtered result (13)
sumB1 returns the sum of the other MaxA values from column B (26 + 82)
sumB2 returns the sum of the values from B where value in A <> MaxA (71 + 60)
and finally the result is returned
If you don't have Excel 365 you can add helper columns for MaxA, MinBExclude, sumB1 and sumB2 and the final result

Sumif of multiple Index matches against one value

Need help regarding Excel dynamically search based sum of two columns matching from two different tables.
I have got this Table of Data Entered One Time
A B C
1 Qlty Warp Weft
2 Stpl.1 150 20
3 Cotn.1 80 60
4 Stpl.2 20 20
5 Cotn.2 20 20
6 Stpl.3 20 40
in Column A2:A6, Quality can not be duplicated, its a unique Name
The Data entry and report Table is here
A B C D E F
8 Yarn Name Sent Bags Remaining Qualty Used Warp Used Weft
9 20 800 600 Stpl.1 71 200
10 150 101 30 Stpl.2 70 30
11 40 300 290 Stpl.3 100 10
12 20 400
C9:C5000 is Returning Column, Values are calculated on the base of Column A9:A5000 (Yarn Name)
Need to Find Yarn Name (eg:) "20" in B2:B6 AND/OR C2:C6, wherever it matches, index that Quality from A2:A6
Then match the returned qualities(could be more than one) to D9:D5000 and sum the mathced results from E9:F5000
I have tried so far in C12
=SUMIF($A$9:$A12,A12,$B$9:$B12)-(SUMIF($D$9:$D12,INDEX($A$2:$A$6,MATCH(A12,$B$2:$B$6,0)),$D$9:$D12)+SUMIF($D$9:$D12,INDEX($A$2:$A$6,MATCH(A12,$C$2:$C$6,0)),$D$9:$D12))
PS:- I am using Excel 2007
If I understand correctly, then following array formula can help you:
=SUM(IFERROR(INDEX($A$2:$A$6,N(IF(1,(MMULT(--($B$2:$C$6=A9),TRANSPOSE(COLUMN($B$2:$C$6)^0))>0)*(ROW($B$2:$C$6))-1)))=TRANSPOSE($D$9:$D$12),0)*TRANSPOSE($E$9:$E$12+$F$9:$F$12))
Array formula after editing is confirmed by pressing ctrl + shift + enter
Edit:
To calculate Warp and Weft columns separately use following array formula:
=SUM(IFERROR(INDEX($A$2:$A$6,N(IF(1,((A9=$B$2:$B$6)*ROW($B$2:$B$6))-1)))=TRANSPOSE($D$9:$D$12),0)*TRANSPOSE($E$9:$E$12))+SUM(IFERROR(INDEX($A$2:$A$6,N(IF(1,((A9=$C$2:$C$6)*ROW($C$2:$C$6))-1)))=TRANSPOSE($D$9:$D$12),0)*TRANSPOSE($F$9:$F$12))

Conditional Formatting based on date and value in Excel

I am trying to return the color for a score based on the date for the score and the score itself. Scoring has used different cut-offs over time:
Table 1
Date1 Score Color
Sep-16 24 [should be red]
Jul-16 6 [should be green]
Apr-14 12 [should be yellow]
... ... ...
Table 2
Date2 Red Orange Yellow Green
Aug-16 20 15 9.5 0
Jul-16 20 15.5 9.5 0
Apr-16 20 15 9.5 0
Mar-15 19 14 7 0
Feb-15 20 13 8.5 0
Jan-15 19 14 7 0
Apr-14 19 14 7 0
I want to place a formula in the "Color" cell that will evaluate Table 2 and return the column name for instances where the date in date1 is the most recent instance where it is greater than date 2, and for which the score given on table 1 is equal to or larger than the score given on table 2 for the correct row.
Thanks,
You need nested approximate lookups. This would be easier if your data was sorted the other way around. At least table 2 should have the columns in ascending order, instead of descending, so the match function can return the correct position of the number with an approximate match.
If you can arrange the columns in Table2 in the order Date2, Green, Yellow, Orange, Red, then the following formula will be possible.
=INDEX(Table3[[#Headers],[Green]:[Red]],MATCH([#Score],INDEX(Table3[Green],IFERROR(MATCH([#Date1],Table3[Date2],-1),1)):INDEX(Table3[Red],IFERROR(MATCH([#Date1],Table3[Date2],-1),1)),1))
This uses structured references, which accommodates rows being inserted into the tables without breaking the formulas.
Now you can use conditional formatting based on the cell values in column C.
Just for comparison, I have chosen to keep the lookup table (in Sheet2 rather than in an actual table) the same as in the question i.e. both tables are sorted from largest to smallest or most recent to least recent and the MATCHes both have -1 as the third argument:-

excel incrementation by consecutive numbers

I have a form, thanks to user JamTay317, That lists data depending on folder number (bold number in form). I need to copy it for all 1500 folders (about 400 pages)
Form is divided on 4 labels on a page for easier printing
form overview
Form get it's folder number (nr teczki) from list with all folders from another sheet called "lista teczek":
list of folders
For first 4 folder numbers I use formula:
A2='lista teczek'!A1
J2='lista teczek'!A2
A21='lista teczek'!A3
J21='lista teczek'!A4
When I copy whole page underneath it increments by 36 (number of rows between)
A38='lista teczek'!A37
J38='lista teczek'!A38
A57='lista teczek'!A39
J38='lista teczek'!A40
Instead of A5, A6, etc.
Is there any way to override excel's incrementation to force it to use consecutive numbers? Or at least formula which will make it easier to follow folders list?
So I would use offset() to get the correct position
=A2=OFFSET('lista teczek'!$A$1;ROW(A1)-INT(ROW(A1)/36)*36+4*INT(ROW(A1)/36)-1;0)
So this will offset from A1 in the list sheet.
Below are row numbers a resultant lookup row numbers
Note the formula I used in the offset has an extra "-1" as this is an OFFSET so to get 1 from 1 we need to offset by 0
1 1
2 2
3 3
4 4
37 5
38 6
39 7
40 8
73 9
74 10
75 11
76 12
109 13
110 14
111 15
112 16
145 17
146 18
147 19
----LOGIC--- (edit)
So the idea is that you work out the occurrence you are on. Int(row()/36) gives us this. For example
int(1/36)=0
Int(363/36)=10
First part gives us the offset from the start of the occurrence
3-int(3/36)*36=3
378-Int(363/36)*36=3
Second part give the total of the previous occurrence
4*int(3/36)=0
4*Int(363/36)*36=40
So you need to change the 36 to the gap between the occurrences and the 4 to the length of occurrences Not sure if that helps to explain

Is there any range or between kind of function in Excel?

I have a table with three columns in Excel, a,b and c.
One cell has a numeric value X [say 25], I need to get the value in the c column such that X should be in between min and max column. For X=25 the value in c is "20" since X is between the min and max values for the 5th table row.
I'm looking for a range or between kind of function, but couldn't find a function with either name.
a b c
-----------------------------
min max improvements
0 1 70
2 5 60
6 10 50
11 20 30
21 30 20
31 40 10
41 80 5
You want to look up in that table of yours which improvement value is correct if some value is "25"; in this case that would mean to match line 5 of your table and return the value 20.
You are looking for VLOOKUP, e.g.
=VLOOKUP(B12;A2:C8;2)
giving
(edit: fixed stupid typo in formula and image, now giving correct result)

Resources