Taking average of certain values in one Excel column based on values in another - excel

I have a (large) array of data in Excel of which I need to compute the average value of certain values in one column, based on the values of another column. For example, here's a snippet of my data:
So specifically, I want to take the average of the F635 mean values corresponding with Row values of 1. To take it a step further, I want this to continue to Row values of 2, Row values of 3 etc.
I'm not familiar with how to run code in Excel but have attempted to solve this by using the following:
=IF($C = "1", AVERAGE($D:$D), "")
which (to my understanding) can be interpreted as "if the values (anywhere) in column C are equal to 1, then take the average of the corresponding values in column D."
Of course, as I try this I get a formula error from Excel.
Any guidance would be incredibly appreciated. Thanks in advance.

For more complicated cases, I would use an array formula. This one is simple enough for the AVERAGEIF function, for instance =AVERAGEIF(A1:A23;1;B1:B23)
An array formula allows for more elaborate ifs. To replicate the above, you could do =SUM(IF($A$1:$A$23=1;$B$1:$B$23;0))/COUNT(IF($A$1:$A$23=1;$B$1:$B$23)). Note that the IF inside COUNT has no "else" value, so non-matching rows become FALSE and are not counted.
It looks like more work, but you can create extremely elaborate if-statements. Instead of hitting ENTER, press CTRL+SHIFT+ENTER when entering the formula. Use * between criteria to replicate AND, or + for OR. Example: =SUM(IF(($A$1:$A$23="apple")*($B$1:$B$23="green");$C$1:$C$23;0)) tallies the values for green apples in C1:C23.
Your sample data includes three columns with potential ifs so my guess is that you're going to need array formulas at some point.
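If the * and + trick looks odd, here is a minimal Python sketch of the same idea (not Excel code). It assumes made-up sample data; the names fruit, colour, amount and both helper functions are hypothetical. Each test yields TRUE/FALSE, which behaves like 1/0, so a product is non-zero only when every test passes and a sum is non-zero when at least one does.

```python
# Hypothetical sample data standing in for columns A, B and C.
fruit = ["apple", "apple", "pear", "apple"]
colour = ["green", "red", "green", "green"]
amount = [10, 20, 30, 40]


def sum_where_and(fruit, colour, amount):
    """Mimics SUM(IF((A="apple")*(B="green"); C; 0))."""
    total = 0
    for f, c, v in zip(fruit, colour, amount):
        mask = (f == "apple") * (c == "green")  # 1 only if both tests pass
        if mask:
            total += v
    return total


def sum_where_or(fruit, colour, amount):
    """Mimics SUM(IF((A="apple")+(B="green"); C; 0))."""
    total = 0
    for f, c, v in zip(fruit, colour, amount):
        mask = (f == "apple") + (c == "green")  # 0, 1 or 2; non-zero means TRUE
        if mask:
            total += v
    return total


green_apples = sum_where_and(fruit, colour, amount)
apple_or_green = sum_where_or(fruit, colour, amount)
print(green_apples, apple_or_green)
```

Note that the OR mask can be 2 when both tests pass. That is harmless inside an IF, but it would double-count if the mask were multiplied directly into the values.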

Excel already has a built-in function for exactly this use: AVERAGEIF().
=AVERAGEIF(C:C,1,D:D)
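For anyone who wants to see what AVERAGEIF computes, here is a rough Python sketch, assuming a small invented data set. The names rows, f635, average_if and averages_by_key are hypothetical and only mirror the Row and F635 mean columns from the question.

```python
from collections import defaultdict


def average_if(keys, values, wanted):
    """Average the values whose key equals `wanted`, like AVERAGEIF(C:C, wanted, D:D)."""
    matched = [v for k, v in zip(keys, values) if k == wanted]
    if not matched:
        raise ValueError("no rows match the criterion")
    return sum(matched) / len(matched)


def averages_by_key(keys, values):
    """Compute the conditional average for every distinct key in one pass."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return {k: sum(vs) / len(vs) for k, vs in groups.items()}


# Invented sample: Row values and the matching F635 means.
rows = [1, 1, 2, 2, 3]
f635 = [10.0, 20.0, 30.0, 50.0, 7.0]

print(average_if(rows, f635, 1))
print(averages_by_key(rows, f635))
```

The second function corresponds to filling the AVERAGEIF formula down with 1, 2, 3 and so on as the criterion.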

Related

Compare multiple columns as pair-wise for Excel/Google Sheets

I am new to Excel/Google Sheets. I am having difficulty writing a formula that compares columns pairwise, because the formula keeps growing as more days are added.
For example, there are 2 main column groups, Foo and Bar. I want to find the total number of days on which Foo and Bar are equal, so the current formula is =IF(A3 = G3, 1, 0)+IF(B3 = H3, 1, 0)+IF(C3 = I3, 1, 0)+...
But this is tedious because there are ~40 days to compare. Is there a more efficient way to write this formula? Either a Google Apps Script or an Excel formula would be appreciated.
Cheers!
Try the Google Sheets formula below. Adjust the ranges as needed.
=ArrayFormula(SUM(IF(A3:E3=G3:K3,1,0)))
Assuming that you're needing to get such a total for each row and not merely a single row, try this:
=ArrayFormula(IF(A3:A="",,MMULT(IF(A3:F=G3:L,1,0),SEQUENCE(COLUMNS(A:F),1,1,0))))
Of course you will need to adjust the three ranges to match your own FOO and BAR ranges.
This one formula will produce all results for all rows.
The MMULT function is tricky to explain to those as yet unfamiliar with it. But it's a powerful tool. I'll add a picture I created that may best explain what it does:
By making the second matrix a simple SEQUENCE of 1s as long as the other matrix is wide, we wind up multiplying everything by 1 before adding together. And since anything multiplied by 1 is itself, this combination serves only to do a row-by-row add.
Things to keep in mind with MMULT:
1.) Every cell in every matrix must be a number or it will produce an error.
2.) As in the above formula, there are ways to use either/or conditions to turn every cell in a matrix into a number.
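The row-by-row add can be sketched outside the spreadsheet too. Below is a small pure-Python illustration, assuming two invented 2x3 blocks called foo and bar; mmult here is a hand-written helper, not a library call. The comparison produces a matrix of 1s and 0s, and multiplying it by a column of 1s sums each row.

```python
def mmult(a, b):
    """Plain matrix product of two lists of lists (rows of a times columns of b)."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# Invented sample blocks standing in for the FOO and BAR ranges.
foo = [[1, 2, 3],
       [4, 5, 6]]
bar = [[1, 0, 3],
       [4, 5, 6]]

# Like IF(A3:F=G3:L,1,0): 1 where the paired cells match, else 0.
equal = [[1 if x == y else 0 for x, y in zip(fr, br)] for fr, br in zip(foo, bar)]

# Like SEQUENCE(COLUMNS(...),1,1,0): one row of [1] per column of `equal`.
ones = [[1] for _ in range(len(equal[0]))]

matches_per_row = mmult(equal, ones)
print(matches_per_row)
```

Each output row holds a single number: the count of matching pairs in that row.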

Sumproduct or Countif on a 2D matrix

I'm working on data from a population of people with allergies. Each person has a unique ExceptionID, and each allergen has a unique AllergenID (451 in total).
I have a data table with 2 columns (ExceptionID and AllergenID), where each person's allergies are listed row by row. This means that the ExceptionID column has repeated values for people with multiple allergies, and the AllergenID column has repeated values for the different people who have that allergy.
I am trying to count how many times each pair of allergies is present in this population (e.g. Allergen#107 & Allergen#108, Allergen#107 & Allergen#109,etc). To keep it simple I've created a matrix of 451 rows X 451 columns, representing every pair (twice actually because A/B and B/A are equivalent).
I somehow need to use the row name (allergenID) to lookup the ExceptionID in my data table, and count the cases where that matches the ExceptionIDs from the column name (also AllergenID). I have no problem using Vlookup or Index/Match, but I'm struggling with the correct combination of a lookup and Sumproduct or Countif formula.
Any help is greatly appreciated!
Mike
PS I'm using Excel 2016 if that changes anything.
-=UPDATE=-
So the methods suggested by Dirk and MacroMarc both worked, though I couldn't apply the latter to my full data set (17,000+ rows) because it was taking a long time.
I've since decided to turn this into a VBA macro because we now want to see the counts of triplets instead of pairs.
With the 2 columns you start with, it is as good as impossible... You would need to check every ExceptionID to have 2 different specific AllergenID. Better use a helper-table with ExceptionID as rows and AllergenID as columns (or the opposite... whatever you like). The helper table needs a formula like:
=COUNTIFS($A:$A,$D2,$B:$B,E$1)
Which then can be auto-filled. (The ranges are from my example, you need to change them to your needs).
With this helper-matrix you can easily go for your bigger matrix like this:
=COUNTIFS(E:E,1,INDEX($E:$G,,MATCH($I2,$E$1:$G$1,0)),1)
Again, you can auto-fill with this formula, but you need to change it, so it fits your needs.
Because the columns have the same ID2 (which would be your AllergenID), there is no need to look them up: E:E changes automatically with the auto-fill.
The most important part of the formulas is the $ signs. If they are misplaced, the auto-fill will not work.
Picture of my self-made example (formulas are from the upper left cell in each table):
If you still have any questions, just ask :)
It can be done straight from your original set-up with array formulas:
Please note that array formulas MUST be entered with Ctrl-Shift-Enter, before copying across and down:
In the example pic, I have NAMED the data ranges $A$2:$A$21 as 'People' and $B$2:$B$21 as 'Allergens' to make it a nicer set-up. You can see in the formula bar how that looks as a formula. However you could use the standard references like this in your first matrix cell:
EDIT: silly me, N function is not needed to turn the booleans into 1's and 0's, since multiplying booleans will do the trick. Below formula works...
=SUM(IF(MATCH($A$2:$A$21,$A$2:$A$21,0)=ROW($A$2:$A$21)-1, NOT(ISERROR(MATCH($A$2:$A$21&$E2,$A$2:$A$21&$B$2:$B$21,0)))*NOT(ISERROR(MATCH($A$2:$A$21&F$1, $A$2:$A$21&$B$2:$B$21,0))), 0))
Then copy from F2 across and down. It can be perhaps improved in technique with sumproduct or whatever, but it's just a rough example of the technique....
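Since the update mentions moving to a macro and counting triplets, here is a hedged sketch of the counting logic in Python rather than VBA. It assumes the data is available as (ExceptionID, AllergenID) pairs; the sample records and the function name count_combinations are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_combinations(records, size=2):
    """Count how many people have each combination of `size` allergens."""
    by_person = defaultdict(set)
    for exception_id, allergen_id in records:
        by_person[exception_id].add(allergen_id)

    counts = Counter()
    for allergens in by_person.values():
        # sorted() makes (107, 108) and (108, 107) land on the same key.
        for combo in combinations(sorted(allergens), size):
            counts[combo] += 1
    return counts


# Invented sample: one (ExceptionID, AllergenID) pair per data row.
records = [
    ("P1", 107), ("P1", 108), ("P1", 109),
    ("P2", 107), ("P2", 108),
    ("P3", 108),
]

pairs = count_combinations(records, 2)
triplets = count_combinations(records, 3)
print(pairs[(107, 108)], triplets[(107, 108, 109)])
```

Grouping by person first means only combinations that actually occur are visited, instead of testing all 451 x 451 cells against every row.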

Finding the maximum value among the products in two rows

I am an excel beginner and I would like to do the following.
Let row1= (a_1 a_2 a_3) and row2= (b_1 b_2 b_3).
I want excel to calculate the largest number among the products (a_1b_1, a_2b_2, a_3b_3).
It is very difficult to look up these things for I am not sure what kind of calculation I am doing and it is hard to explain.
Take a third column, C, and enter the formula =$A1*$B1 in C1. Fill it down to all other rows so that the row number is incremented for each.
Then, in the fourth column, use the formula =MAX(C:C)
The following formula, array-entered, gives you the result of the largest number among the products:
{=MAX(A1:C1*A2:C2)}
(provided your data is in A1:C2 in the form you presented it).
For explanations on how to insert array formula in excel see e.g. this microsoft link; in short, you type the formula without the curly brackets and confirm with CTRL+SHIFT+ENTER instead of only ENTER.
If you want to find where this couple of numbers is (in your case: which column), I would try this:
{=MATCH(MAX(A1:C1*A2:C2);A1:C1*A2:C2;0)}
(also array-entered).
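To make clear what the two array formulas compute, here is a tiny Python sketch with invented numbers (row1 and row2 are hypothetical stand-ins for A1:C1 and A2:C2).

```python
# Invented sample values for the two rows.
row1 = [2, 5, 3]
row2 = [4, 1, 6]

# Element-wise products, like A1:C1*A2:C2 inside an array formula.
products = [a * b for a, b in zip(row1, row2)]

largest = max(products)                  # like MAX(A1:C1*A2:C2)
position = products.index(largest) + 1   # like MATCH(MAX(...); ...; 0), 1-based

print(products, largest, position)
```

As with MATCH using match type 0, the position returned is that of the first occurrence if the maximum appears more than once.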
you can do that, or make a pivot with the raw data and get the MAX/MIN/AVG, based on the pivot options. I tend to use that instead and then vlookup the ID to the pivot to get whatever aggregate you need.

Find Minimum Value Based on 2 Criteria (Excel 2013)

Looking to find the minimum value in a column based on two sets of criteria
So the logic would be: Find the minimum value in column M, where the value in column A matches column N, and the value in Column Y is less than 318.
I've tried using an array formula like this, but it doesn't seem to be working/is too memory heavy to run:
=MIN(IF(AND(N:N=A2,Y:Y<=318),M:M))
is there a simpler way? or perhaps a UDF that could work?
Thank you for your help!
You can't use AND in this type of formula because it only returns a single value rather than the required array.
Here are three possible working versions:
1.) Use * to simulate AND
=MIN(IF((N:N=A2)*(Y:Y<=318),M:M))
confirmed with CTRL+SHIFT+ENTER
2.) Use multiple nested IFs
=MIN(IF(N:N=A2,IF(Y:Y<=318,M:M)))
confirmed with CTRL+SHIFT+ENTER
3.) Use AGGREGATE function
=AGGREGATE(15,6,M:M/(N:N=A2)/(Y:Y<=318),1)
The advantages of this approach are that you don't need "array entry", and it can ignore any errors in the data
Either way, it's best to reduce the range sizes if you can, because the formula might be slow with whole columns.
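All three versions express the same logic: keep only the rows that pass both tests, then take the minimum. A minimal Python sketch of that logic, assuming invented sample columns (n_col, y_col, m_col and min_if are hypothetical names):

```python
def min_if(n_col, y_col, m_col, key, limit):
    """Minimum of m_col over rows where n_col equals key and y_col <= limit."""
    candidates = [
        m for n, y, m in zip(n_col, y_col, m_col)
        if n == key and y <= limit   # both tests are applied row by row
    ]
    # Returning None for "no matching rows" is a choice made in this sketch.
    return min(candidates) if candidates else None


# Invented sample data standing in for columns N, Y and M.
n_col = ["a", "a", "b", "a"]
y_col = [100, 400, 50, 318]
m_col = [9.5, 1.0, 0.5, 7.0]

print(min_if(n_col, y_col, m_col, "a", 318))
print(min_if(n_col, y_col, m_col, "z", 318))
```

The row-by-row test is the reason AND fails in the array formula: AND collapses the whole range into one TRUE/FALSE instead of giving one result per row.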
Give this a try and adjust ranges to suit. Try not to use whole column references:
=SMALL(INDEX(($N$2:$N$101=A2)*($Y$2:$Y$101<=318)*$M$2:$M$101,),1+ROWS($M$2:$M$101)-COUNTIFS($N$2:$N$101,A2,$Y$2:$Y$101,"<=318"))
If you are using the whole column to pick up new data as it is added, consider using Dynamic Named Ranges instead
When things get this complex, I'll usually break it down and set up smaller/simpler formulas in separate columns.
In other words, you have data in columns A through Y ?
So let's create a formula in column AA:
1) identify when value in Col A matches col N, and value in col Y < 318
=and(A1=N1,Y1<318)
2) copy AA1 to all the rows of your data.
3) now we have a condition to work off .. since there is a SUMIF and COUNTIF, but no MINIF .. we'll have to build that ourselves. first the IF:
in column AB1:
=if(AA1,M1,"")
copy that down to all your data.
finally, do your min:
=MIN(AB:AB)
Should give you your answer.
You could probably splice the first two together, but again, building a complex formula like this, build it simply, first, ;)

Using SUMIFS with multiple AND OR conditions

I would like to create a succinct Excel formula that SUMS a column based on a set of AND conditions, plus a set of OR conditions.
My Excel table contains the following data and I used defined names for the columns.
Quote_Value (Worksheet!$A:$A) holds an accounting value.
Days_To_Close (Worksheet!$B:$B) contains a formula that results in a number.
Salesman (Worksheet!$C:$C) contains text and is a name.
Quote_Month (Worksheet!$D:$D) contains a formula (=TEXT(Worksheet!$E:$E,"mmm-yy"))to convert a date/time number from another column into a text based month reference.
I want to SUM Quote_Value if Salesman equals JBloggs and Days_To_Close is equal to or less than 90 and Quote_Month is equal to one of the following (Oct-13, Nov-13, or Dec-13).
At the moment, I've got this to work but it includes a lot of repetition, which I don't think I need.
=SUM(SUMIFS(Quote_Value,Salesman,"=JBloggs",Days_To_Close,"<=90",Quote_Month,"=Oct-13")+SUMIFS(Quote_Value,Salesman,"=JBloggs",Days_To_Close,"<=90",Quote_Month,"=Nov-13")+SUMIFS(Quote_Value,Salesman,"=JBloggs",Days_To_Close,"<=90",Quote_Month,"=Dec-13"))
What I'd like to do is something more like the following but I can't work out the correct syntax:
=SUMIFS(Quote_Value,Salesman,"=JBloggs",Days_To_Close,"<=90",Quote_Month,OR(Quote_Month="Oct-13",Quote_Month="Nov-13",Quote_Month="Dec-13"))
That formula doesn't error, it just returns a 0 value. Yet if I manually examine the data, that's not correct. I even tried using TRIM(Quote_Month) to make sure that spaces hadn't crept into the data but the fact that my extended SUM formula works indicates that the data is OK and that it's a syntax issue. Can anybody steer me in the right direction?
You can use SUMIFS like this
=SUM(SUMIFS(Quote_Value,Salesman,"JBloggs",Days_To_Close,"<=90",Quote_Month,{"Oct-13","Nov-13","Dec-13"}))
The SUMIFS function will return an "array" of 3 values (one total each for "Oct-13", "Nov-13" and "Dec-13"), so you need SUM to sum that array and give you the final result.
Be careful with this syntax, you can only have at most two criteria within the formula with "OR" conditions...and if there are two then in one you must separate the criteria with commas, in the other with semi-colons.
If you need more you might use SUMPRODUCT with MATCH, e.g. in your case
=SUMPRODUCT(Quote_Value,(Salesman="JBloggs")*(Days_To_Close<=90)*ISNUMBER(MATCH(Quote_Month,{"Oct-13","Nov-13","Dec-13"},0)))
In that version you can add any number of "OR" criteria using ISNUMBER/MATCH
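The logic behind the ISNUMBER/MATCH version is "all AND tests pass, and the month is in a list". Here is a rough Python sketch of that logic with invented quote data; the field names and the function sum_quotes are hypothetical.

```python
def sum_quotes(rows, salesman, max_days, months):
    """Sum quote values where all AND tests pass and the month is in `months`."""
    allowed = set(months)   # plays the role of ISNUMBER(MATCH(..., {...}, 0))
    return sum(
        r["value"] for r in rows
        if r["salesman"] == salesman
        and r["days"] <= max_days
        and r["month"] in allowed
    )


# Invented sample rows standing in for the named ranges.
rows = [
    {"value": 100, "salesman": "JBloggs", "days": 30, "month": "Oct-13"},
    {"value": 200, "salesman": "JBloggs", "days": 120, "month": "Nov-13"},
    {"value": 300, "salesman": "JBloggs", "days": 90, "month": "Dec-13"},
    {"value": 400, "salesman": "ASmith", "days": 10, "month": "Oct-13"},
    {"value": 500, "salesman": "JBloggs", "days": 45, "month": "Sep-13"},
]

total = sum_quotes(rows, "JBloggs", 90, ["Oct-13", "Nov-13", "Dec-13"])
print(total)
```

Because the OR part is a membership test, adding a fourth or fifth month only lengthens the list, not the formula structure.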
You can use DSUM, which will be more flexible. Like if you want to change the name of Salesman or the Quote Month, you need not change the formula, but only some criteria cells. Please see the link below for details...Even the criteria can be formula to copied from other sheets
http://office.microsoft.com/en-us/excel-help/dsum-function-HP010342460.aspx?CTT=1
You might consider referencing the actual date/time in the source column for Quote_Month; then you could transform your OR into a couple of ANDs, something like this (assuming the date is in something I've chosen to call Quote_Date):
=SUMIFS(Quote_Value,Quote_Date,">="&DATE(2013,10,1),Quote_Date,"<="&DATE(2013,12,31),Salesman,"=JBloggs",Days_To_Close,"<=90")
(I moved the interesting conditions to the front).
This approach works here because that "OR" condition is actually specifying a date range - it might not work in other cases.
Quote_Month (Worksheet!$D:$D) contains a formula (=TEXT(Worksheet!$E:$E,"mmm-yy"))to convert a date/time number from another column into a text based month reference.
You can use OR by adding + in Sumproduct. See this
=SUMPRODUCT((Quote_Value)*(Salesman="JBloggs")*(Days_To_Close<=90)*((Quote_Month="Cond1")+(Quote_Month="Cond2")+(Quote_Month="Cond3")))
Speed
SUMPRODUCT is faster than SUM arrays, i.e. having {} arrays in the SUM function. SUMIFS is 30% faster than SUMPRODUCT.
{SUM(SUMIFS({}))} and SUMPRODUCT(SUMIFS({})) both work fine, but SUMPRODUCT feels a bit easier to write because it does not need CTRL-SHIFT-ENTER to create the {}.
Preference
I personally prefer writing SUMPRODUCT(--(ISNUMBER(MATCH(...)))) over SUMPRODUCT(SUMIFS({})) for multiple criteria.
However, if you have a drop-down menu that lets the user select either a specific characteristic or all of them, SUMPRODUCT(SUMIFS()) is the only way to go. (To select "all", the criterion should be "<>" followed by any word that is not one of the specific characteristics.)
To get the formula to work, place the cursor inside the formula and press Ctrl+Shift+Enter.
With the following, it is easy to link the Cell address...
=SUM(SUMIFS(FAGLL03!$I$4:$I$1048576,FAGLL03!$A$4:$A$1048576,">="&INDIRECT("A"&ROW()),FAGLL03!$A$4:$A$1048576,"<="&INDIRECT("B"&ROW()),FAGLL03!$Q$4:$Q$1048576,E$2))
The ADDRESS, SUBSTITUTE and COLUMN functions can be used as required to make the cell addresses fully dynamic.

Resources