Evaluating Lab Data in Excel

Morning, I am working to review/analyze a set of lab data. It has up to 95 samples, so I am looking to automate as much as I can, but I have hit a wall with this one.
Each sample has three columns: a concentration, a qualifier (which may be blank, u, or j), and an MDL. Currently the data starts in column AC, with 3 columns per sample, but for a variable number of samples. My goal right now is to count the number of samples that have a "u" in the qualifier column AND have an MDL that is not equal to their concentration. (I tried pasting in a table, but I couldn't get it to work, so apologies.)
I am currently using a simple sum of COUNTIFS statements, "=SUM(COUNTIFS(AD5,"U",AC5,"<>"&AE5)+COUNTIFS(AG5,"U",AF5,"<>"&AH5)...", but I have to update the referenced cells manually, which is time-intensive and can introduce errors. Does anyone know how I might use an OFFSET formula or something similar? I figure that is my best bet since the data is uniformly set up. I just can't get it right on my end.
Many thanks for any help!

A simple solution is to reformat the data into a single table (probably using VBA for automation?), then create a fourth column with a logical value for the concentration == MDL test. Then use COUNTIFS like you proposed.
This can obviously be extended by creating a fifth column with the AND() test and using a simple COUNTIF.
If you have samples in different columns (I get that impression from =SUM(COUNTIFS(AD5,"U",AC5,"<>"&AE5)+COUNTIFS(AG5,"U",AF5,"<>"&AH5)...) and you really want to use OFFSET, you can create a "sample count" cell for each table (located at row 1 + the maximum sample size you have in the whole book) and then use OFFSET and COUNTIFS like you wrote. References should update automatically when copying.
If I understand correctly, you could use COUNTIFS directly if the references are set up correctly, then copy the cell and paste it at the location I mentioned above (row 1 + the maximum sample size you have in the whole book).
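For anyone who wants to check the count outside Excel, here is a minimal Python sketch of the test the question describes, assuming one worksheet row has been flattened into a list of repeating (concentration, qualifier, MDL) triples. The function name and the sample values are hypothetical, not taken from the original data.

```python
def count_u_mismatches(row, block_width=3):
    """Count samples flagged "u" whose concentration differs from the MDL.

    `row` is one worksheet row as a flat list of repeating
    (concentration, qualifier, MDL) triples, e.g. AC5, AD5, AE5, AF5, ...
    """
    count = 0
    # Step through the row one sample (three cells) at a time.
    for start in range(0, len(row) - block_width + 1, block_width):
        concentration, qualifier, mdl = row[start:start + block_width]
        # COUNTIFS matches text case-insensitively, so "U" also matches "u".
        is_u = isinstance(qualifier, str) and qualifier.lower() == "u"
        if is_u and concentration != mdl:
            count += 1
    return count


# Hypothetical row with four samples (values invented for illustration).
row = [0.5, "u", 0.5,   # u, concentration == MDL -> not counted
       0.2, "U", 0.5,   # u, concentration != MDL -> counted
       1.3, "j", 0.5,   # j qualifier             -> not counted
       0.1, "u", 0.4]   # u, concentration != MDL -> counted
print(count_u_mismatches(row))  # 2
```

Because the loop steps by the block width, a variable number of samples needs no manual reference updates, which is the part the chained COUNTIFS formula makes tedious.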

Related

How do you average repeating blocks of cells in Excel?

I am trying to figure out how to average 4 x 4 groups of cells in my spreadsheet across a very large data set. I've tried using OFFSET with a cell range (e.g. B2:E5) but I haven't had success (I don't even know if you can use a range for the reference with OFFSET). This is my first time tackling a problem like this, so any advice would be welcome! A portion of the data set is attached to give an idea of the ranges I would like to average.
If the data was in B2:Q17 you could use this complicated looking formula to return the average for each 4*4 block.
=AVERAGE(INDEX($B$2:$Q$17,(ROWS($2:2)-1)*4+1,(COLUMNS($R:R)-1)*4+1):INDEX($B$2:$Q$17,(ROWS($2:2)-1)*4+4,(COLUMNS($R:R)-1)*4+4))
You would copy this across and down and it could go anywhere on a sheet.
You can use INDEX to refer to a range derived from the values for its TopLeft and BottomRight corners, in the form
INDEX(DataRange, TopRow, LeftCol):INDEX(DataRange, BottomRow, RightCol)
Then wrap that in AVERAGE(...)
To demonstrate
=AVERAGE(INDEX($B$7:$AH$903,B1,B2):INDEX($B$7:$AH$903,B3+B1-1,B4+B2-1))
Note: INDEX has the advantage over OFFSET in that it's non-volatile
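The corner arithmetic in these formulas, (n-1)*4+1 for the first row or column of block n, can be sketched in Python as follows. This is only an illustration of the indexing: it is zero-based where the Excel version is one-based, and the function name and 8 x 8 grid are hypothetical.

```python
def block_average(grid, block_row, block_col, size=4):
    """Average the size x size block at zero-based block position (block_row, block_col)."""
    top = block_row * size    # first row of the block
    left = block_col * size   # first column of the block
    cells = [grid[r][c]
             for r in range(top, top + size)
             for c in range(left, left + size)]
    return sum(cells) / len(cells)


# Hypothetical 8 x 8 grid where each cell holds row * 8 + col.
grid = [[r * 8 + c for c in range(8)] for r in range(8)]
print(block_average(grid, 0, 0))  # 13.5
print(block_average(grid, 1, 1))  # 49.5
```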

How do I sum several rows by category? (EXCEL)

I have created an excel sheet to have an overview over costs in my projects, however, I also need an overview of costs per category in my projects. I googled it and tried to find examples online, however, it only returns a value of 0, which shouldn't be the case. Can anyone help me? The sheet looks like below.
I am going by the SUMIF function to group by category but my excel sheet is a bit more complex than that so I tried to adjust it accordingly as seen in the code below. No matter what I do it either returns an error or 0.
=IF(B12=B8;"";SUMIF(B12:B39;B12;J12:BE39))
In the formula above I am trying to sum the costs of a category that could be written in B12, for example, Software development. For confidential reasons, I cannot show the actual filled out excel sheet.
SUMIF does not work for summing multiple columns. Use a SUMPRODUCT statement instead, like so:
=IF($B12=$B$8;"";SUMPRODUCT(($B$12:$B$39=$B12)*($J$12:$BE$39)))
A detailed explanation to how this works can be found here
Edit:
I sense a follow up question coming, how to skip certain columns. Because as you have set it up now, it will count the entire range from J12 to BE39, in which you have both forecast costs and actual costs. I guess this is to compare the costs to what was projected and what the actual costs are. Right now it will count both the projected and actual cost, doubling up. To prevent this you can enter every second column separated by a + like so:
=IF($B12=$B$8;"";SUMPRODUCT(($B$12:$B$39=$B12)*($J$12:$J$39+$L$12:$L$39+$N$12:$N$39)))
Also I have added $ signs to all non-changing values so it will work when dragging down the fill handle on the formula to populate the below cells.
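As a rough cross-check of what the SUMPRODUCT formulas compute, here is a minimal Python sketch: each row's match (1 or 0) is effectively multiplied by every cost cell on that row, and the optional column list plays the role of picking every second column. The function name, parameters and figures are hypothetical.

```python
def sum_by_category(categories, cost_rows, category, columns=None):
    """Sum cost cells on every row whose category matches.

    `columns` optionally restricts the sum to certain zero-based column
    positions (for example every second column), mirroring the edited formula.
    """
    total = 0
    for row_category, costs in zip(categories, cost_rows):
        # Excel's = comparison ignores case; this sketch compares exactly.
        if row_category != category:
            continue
        picked = costs if columns is None else [costs[i] for i in columns]
        total += sum(picked)
    return total


# Hypothetical data: columns alternate forecast, actual, forecast, actual.
categories = ["Software development", "Hardware", "Software development"]
cost_rows = [[100, 90, 200, 210],
             [50, 55, 60, 65],
             [10, 12, 20, 18]]
print(sum_by_category(categories, cost_rows, "Software development"))          # 660
print(sum_by_category(categories, cost_rows, "Software development", [0, 2]))  # 330
```

The second call shows the double-counting issue from the edit: restricting the sum to the forecast columns halves the total in this made-up data.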

Using AverageIf function on large amount of cells does not behave as expected. What am I doing wrong?

I have an excel spreadsheet that I'm fooling around with attempting to analyze data. I could go the pivot table route possibly, but I like being stubborn and building my own formulas/tables to analyze the data sometimes.
Anyways my problem is this:
I'm trying to find the average of a column of cells (just one column) that contain 'x' value somewhere in the same row. Using the function AverageIf, I can easily do this 'manually', but I'd like to be able to change the 'x' value by editing another cell.
Currently said code looks like this:
=AVERAGEIF($D$2:$D$527,L$2,$I$2:$I$527).
It works fine. But it limits what 'x' value I can sort the Average through to only ones within the D column. I attempted to highlight all the data I would potentially be analyzing like so:
=AVERAGEIF($A$2:$H$527, L$2, $I$2:$I$527)
That just gives me all sorts of issues. I'm obviously not using the AVERAGEIF function correctly here. Is there any way to fix this error, or am I stuck analyzing column by column for different 'x' values?
Side note, 'x' values are all string text, not an actual digit. Not sure if that makes a difference. And I am attempting to make a small table with this data (not an actual excel table), hence the $ in the formulas, as I'm using the fill option. There are just a ton of different comparisons I could potentially make and I don't like limiting myself.
Also, when I used fill to move this formula over one column, I have a completely different error from the above formula. In the first case, the output is a . .
In the second case the output is a #DIV/0! error. The only difference between the two formulas is the criteria portion.
The actual code for the second filled formula:
=AVERAGEIF($A$2:$H$527, M$2, $I$2:$I$527)
Though the output is changing based upon the 'x' value. Some work fine after testing a few, most give me issues.
EDIT: Playing around with this for a bit, weird stuff keeps happening. It seems that where I put the formula in the 'table' I set up affects how the average is calculated. In other words, an 'x' value that doesn't work in the first column of my analysis table will work perfectly fine elsewhere in the table, but only if certain other values are selected in the other spaces of the table. If that doesn't make sense the way I described it, let me know.
EDIT 2: Just going to throw up the data set, DropBox Excel File
The problem is that when using a multi-column range as the first parameter, the "value" range (Col I) in your example gets shifted to the right, according to the location on each row where the "criteria" value is matched.
Simple example:
Using "A" as the criteria gets the average from Col F as expected, but using "B" or "C" gets the average from Cols G and H respectively.
I think in some cases you're actually averaging numbers in your "analysis" block (which in your posted file is to the right of your data block).
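The shifting described above can be modelled with a small Python sketch. It assumes, as the answer describes, that each matching criteria cell is paired with the value cell at the same row and column offset from the top-left of the average range. The function name and the grids are hypothetical.

```python
def averageif_shifted(criteria_grid, value_grid, criterion):
    """Pair each criteria cell with the value cell at the same row/column offset.

    `value_grid` must have the same shape as `criteria_grid`; it stands for the
    block that starts at the top-left cell of the average range.
    """
    matched = [value_grid[r][c]
               for r, row in enumerate(criteria_grid)
               for c, cell in enumerate(row)
               if cell == criterion]
    # Only numeric cells take part in the average; empty cells are skipped.
    numbers = [v for v in matched if isinstance(v, (int, float))]
    if not numbers:
        raise ZeroDivisionError("no numeric cells matched (Excel shows #DIV/0!)")
    return sum(numbers) / len(numbers)


# Hypothetical grids: only the first value column is the intended data;
# the second column stands for unrelated numbers to the right of it.
criteria_grid = [["A", "x", "x"],
                 ["x", "B", "x"],
                 ["A", "x", "x"]]
value_grid = [[10, 7, None],
              [20, 8, None],
              [30, 9, None]]
print(averageif_shifted(criteria_grid, value_grid, "A"))  # 20.0
print(averageif_shifted(criteria_grid, value_grid, "B"))  # 8.0
```

In this made-up data "B" sits one column to the right, so the result comes from the unrelated second value column (8) rather than the intended first column (20), and a criterion that only lands on empty cells would raise the division error.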

Sumproduct or Countif on a 2D matrix

I'm working on data from a population of people with allergies. Each person has a unique ExceptionID, and each allergen has a unique AllergenID (451 in total).
I have a data table with 2 columns (ExceptionID and AllergenID), where each person's allergies are listed row by row. This means that the ExceptionID column has repeated values for people with multiple allergies, and the AllergenID column has repeated values for the different people who have that allergy.
I am trying to count how many times each pair of allergies is present in this population (e.g. Allergen#107 & Allergen#108, Allergen#107 & Allergen#109,etc). To keep it simple I've created a matrix of 451 rows X 451 columns, representing every pair (twice actually because A/B and B/A are equivalent).
I somehow need to use the row name (allergenID) to lookup the ExceptionID in my data table, and count the cases where that matches the ExceptionIDs from the column name (also AllergenID). I have no problem using Vlookup or Index/Match, but I'm struggling with the correct combination of a lookup and Sumproduct or Countif formula.
Any help is greatly appreciated!
Mike
PS I'm using Excel 2016 if that changes anything.
-=UPDATE=-
So the methods suggested by Dirk and MacroMarc both worked, though I couldn't apply the latter to my full data set (17,000+ rows) because it was taking a long time.
I've since decided to turn this into a VBA macro because we now want to see the counts of triplets instead of pairs.
With the 2 columns you start with, it is as good as impossible... You would need to check every ExceptionID to have 2 different specific AllergenID. Better use a helper-table with ExceptionID as rows and AllergenID as columns (or the opposite... whatever you like). The helper table needs a formula like:
=COUNTIFS($A:$A,$D2,$B:$B,E$1)
Which then can be auto-filled. (The ranges are from my example, you need to change them to your needs).
With this helper-matrix you can easily go for your bigger matrix like this:
=COUNTIFS(E:E,1,INDEX($E:$G,,MATCH($I2,$E$1:$G$1,0)),1)
Again, you can auto-fill with this formula, but you need to change it, so it fits your needs.
Because the columns have the same ID2 (which would be your AllergenID), there is no need to look them up: E:E changes automatically with the auto-fill.
The most important part of the formulas is the $ signs, which must not be misplaced, or you cannot auto-fill.
Picture of my self-made example (formulas are from the upper left cell in each table):
If you still have any questions, just ask :)
It can be done straight from your original set-up with array formulas:
Please note that array formulas MUST be entered with Ctrl-Shift-Enter, before copying across and down:
In the example pic, I have NAMED the data ranges $A$2:$A$21 as 'People' and $B$2:$B$21 as 'Allergens' to make it a nicer set-up. You can see in the formula bar how that looks as a formula. However you could use the standard references like this in your first matrix cell:
EDIT: silly me, N function is not needed to turn the booleans into 1's and 0's, since multiplying booleans will do the trick. Below formula works...
=SUM(IF(MATCH($A$2:$A$21,$A$2:$A$21,0)=ROW($A$2:$A$21)-1, NOT(ISERROR(MATCH($A$2:$A$21&$E2,$A$2:$A$21&$B$2:$B$21,0)))*NOT(ISERROR(MATCH($A$2:$A$21&F$1, $A$2:$A$21&$B$2:$B$21,0))), 0))
Then copy from F2 across and down. It can be perhaps improved in technique with sumproduct or whatever, but it's just a rough example of the technique....
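Since the asker mentions moving to a macro, here is a minimal sketch of the same pair count outside Excel, in Python rather than VBA. It assumes the data is available as (ExceptionID, AllergenID) rows; the function name and sample records are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_allergen_pairs(records):
    """Count, for each unordered allergen pair, how many people have both."""
    # Group the allergens of each person; a set ignores duplicate rows.
    allergens_by_person = defaultdict(set)
    for exception_id, allergen_id in records:
        allergens_by_person[exception_id].add(allergen_id)

    pair_counts = Counter()
    for allergens in allergens_by_person.values():
        # sorted() gives each pair one canonical order, so A/B and B/A merge.
        for pair in combinations(sorted(allergens), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical (ExceptionID, AllergenID) rows.
records = [(1, 107), (1, 108), (1, 109),
           (2, 107), (2, 108),
           (3, 108), (3, 109)]
counts = count_allergen_pairs(records)
print(counts[(107, 108)])  # 2
print(counts[(108, 109)])  # 2
print(counts[(107, 109)])  # 1
```

Changing the 2 in combinations(...) to 3 would count triplets instead of pairs, which is the follow-up mentioned in the update.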

Trying to improve efficiency of array formula

I have a SUM array formula that has multiple nested IF statements, making it very inefficient. My formula spans over 500 rows, but here is a simple version of it:
{=SUM(IF(IF(A1:A5>A7:A11,A1:A5,A7:A11)-A13:A17>0,
IF(A1:A5>A7:A11,A1:A5,A7:A11)-A13:A17,0))}
As you can see, the first half of the formula checks where the array is greater than zero, and if they are, it sums those in the second part of the formula.
You will notice that the same IF statement is repeated in there twice, which to me is inefficient, but is the only way I could get the correct answer.
The example data I have is as follows:
Sample Data in spreadsheet http://clients.estatemaster.net/SecureClientSite/Download/TempFiles/example.jpg
The answer should be 350 in this instance using the formula I mentioned above.
I tried putting a MAX statement within the array to remove the test for where the result is greater than zero, like this:
{=SUM(MAX(IF(B2:B6>B8:B12,B2:B6,B8:B12)-B14:B18,0))}
However, it seems to calculate only the first row of data in each range, and it gave me the wrong answer of 70.
Does anyone know a way that I can reduce the size of the formula or make it more efficient by not needing to repeat an IF statement in there?
UPDATE
Jimmy
The MAX formula you suggested didn't actually work for all scenarios.
If I change my sample data in rows 1 to 5 as below (so that some of the numbers are greater than their respective cells in rows 7 to 11, while others are lower):
Sample Data in spreadsheet http://clients.estatemaster.net/SecureClientSite/Download/TempFiles/example2.jpg
The correct answer I'm trying to achieve is 310, but your suggested MAX formula gives an incorrect answer of 275.
I'm guessing the formula needs to be an array function to give the correct answer.
Any other suggestions?
=MAX( MAX( sum(A1:A5), sum(A7:A11) ) - sum(A13:A17), 0)
A more calculation-efficient (and especially re-calculation efficient) way is to use helper columns instead of array formulae:
C1: =MAX(A1,A7)-A13
D1: =IF(C1>0,C1,0)
copy both these down 5 rows
E1: =SUM(D1:D5)
Excel will then only recalculate the formulae dependent on any changed value, rather than having to calculate all the virtual columns implied by the array formula every time any single number changes. And it's doing fewer calculations even if you change all the numbers.
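The helper-column steps above amount to a row-by-row max(max(a, b) - c, 0) followed by a sum. A minimal Python sketch, using hypothetical values rather than the asker's sample data, also shows why aggregating before subtracting gives a different result:

```python
def sum_positive_excess(first, second, baseline):
    """Sum max(max(a, b) - c, 0) row by row."""
    return sum(max(max(a, b) - c, 0)
               for a, b, c in zip(first, second, baseline))


# Hypothetical values standing in for A1:A5, A7:A11 and A13:A17.
first = [10, 50, 30, 80, 20]
second = [40, 20, 60, 10, 90]
baseline = [25, 70, 45, 30, 100]
print(sum_positive_excess(first, second, baseline))  # 80

# Aggregating first, as in MAX(MAX(SUM, SUM) - SUM, 0), loses the per-row test:
print(max(max(sum(first), sum(second)) - sum(baseline), 0))  # 0
```

The per-row clamp to zero is what the helper column D performs, so negative rows cannot cancel out positive ones.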
You may want to look into the VB Macro editor. In the Tools Menu, go to Macros and select Visual basic Editor. This gives a whole programming environment where you can write your own function.
VB is a simple programming language and google has all the guidebooks you need.
There, you can write a function like MySum() and have it do whatever math you really need it to, in a clear way written by yourself.
I pulled this off google, and it looks like a good guide to setting this all up.
http://office.microsoft.com/en-us/excel/HA011117011033.aspx
This seems to work:
{=SUM(IF(A1:A5>A7:A11,A1:A5-A13:A17,A7:A11-A13:A17))}
EDIT
- doesn't handle cases where subtraction ends up negative
This works - but is it more efficient???
{=SUM(IF(IF(A1:A5>A7:A11,A1:A5,A7:A11)>A13:A17,IF(A1:A5>A7:A11,A1:A5,A7:A11)-A13:A17,0))}
What about this?
=MAX(SUM(IF(A1:A5>A7:A11, A1:A5, A7:A11))-SUM(A13:A17), 0)
Edit:
Whoops - missed the throwing out negatives part. What about this? Not sure if it's faster...
=SUM((IF(A1:A5>A7:A11,IF(A1:A5>A13:A17,A1:A5,A13:A17),IF(A7:A11>A13:A17,A7:A11,A13:A17))-A13:A17))
Edit 2:
How does this perform for you?
=SUM((((A1:A5>A13:A17)+(A7:A11>A13:A17))>0)*(IF(A1:A5>A7:A11,A1:A5,A7:A11)-A13:A17))
