I'm working on data from a population of people with allergies. Each person has a unique ExceptionID, and each allergen has a unique AllergenID (451 in total).
I have a data table with 2 columns (ExceptionID and AllergenID), where each person's allergies are listed row by row. This means that the ExceptionID column has repeated values for people with multiple allergies, and the AllergenID column has repeated values for the different people who have that allergy.
I am trying to count how many times each pair of allergies is present in this population (e.g. Allergen#107 & Allergen#108, Allergen#107 & Allergen#109,etc). To keep it simple I've created a matrix of 451 rows X 451 columns, representing every pair (twice actually because A/B and B/A are equivalent).
I somehow need to use the row name (allergenID) to lookup the ExceptionID in my data table, and count the cases where that matches the ExceptionIDs from the column name (also AllergenID). I have no problem using Vlookup or Index/Match, but I'm struggling with the correct combination of a lookup and Sumproduct or Countif formula.
Any help is greatly appreciated!
Mike
PS I'm using Excel 2016 if that changes anything.
-=UPDATE=-
So the methods suggested by Dirk and MacroMarc both worked, though I couldn't apply the latter to my full data set (17,000+ rows) because it was taking a long time.
I've since decided to turn this into a VBA macro because we now want to see the counts of triplets instead of pairs.
With the 2 columns you start with, it is as good as impossible... You would need to check every ExceptionID to have 2 different specific AllergenID. Better use a helper-table with ExceptionID as rows and AllergenID as columns (or the opposite... whatever you like). The helper table needs a formula like:
=COUNTIFS($A:$A,$D2,$B:$B,E$1)
Which then can be auto-filled. (The ranges are from my example, you need to change them to your needs).
With this helper-matrix you can easily go for your bigger matrix like this:
=COUNTIFS(E:E,1,INDEX($E:$G,,MATCH($I2,$E$1:$G$1,0)),1)
Again, you can auto-fill with this formula, but you need to change it, so it fits your needs.
Because the columns have the same ID2 (would be your AllergenID), there is no need to lookup them because E:E changes automatically with the auto-fill.
Most important part of the formulas are the $ which should not be messed up, or you can not auto-fill it.
Picture of my self-made example (formulas are from the upper left cell in each table):
If you still have any questions, just ask :)
It can be done straight from your original set-up with array formulas:
Please note that array formulas MUST be entered with Ctrl-Shift-Enter, before copying across and down:
In the example pic, I have NAMED the data ranges $A$2:$A$21 as 'People' and $B$2:$B$21 as 'Allergens' to make it a nicer set-up. You can see in the formula bar how that looks as a formula. However you could use the standard references like this in your first matrix cell:
EDIT: silly me, N function is not needed to turn the booleans into 1's and 0's, since multiplying booleans will do the trick. Below formula works...
SUM(IF(MATCH($A$2:$A$21,$A$2:$A$21,0)=ROW($A$2:$A$21)-1, NOT(ISERROR(MATCH($A$2:$A$21&$E2,$A$2:$A$21&$B$2:$B$21,0)))*NOT(ISERROR(MATCH($A$2:$A$21&F$1, $A$2:$A$21&$B$2:$B$21,0))), 0))
Then copy from F2 across and down. It can be perhaps improved in technique with sumproduct or whatever, but it's just a rough example of the technique....
Related
I have array of numbers in a single column like this:
I want only that numbers for which corresponding negative numbers exist. If number exist 2 times, but negative number exist only one time, then I wanted to retain one positive and one negative number. Similarly, if number exists 3 times, and negative number appears only two times, then I want 2 set of numbers including positive and negative. In this case, I wanted to get output:
5 2 -2 -5
Orders of numbers are not relevant for me. Please do not use VBA. You can create multiple column and apply filter at the end.
Thank you for the response, but I wanted to get the data in column next to the values. Like:
5
2
-2
-5
Please help.
Here's another Office 365 solution:
Name the data range DATA
Put this formula anywhere: =CONCAT(REPT(-ROW(A1:A100)&" "&ROW(A1:A100)&" ",COUNTIF(DATA,"="&ROW(A1:A100)*IF(COUNTIF(DATA,"="&-ROW(A1:A100))<COUNTIF(DATA,"="&ROW(A1:A100)),-1,1))))
That will output the pairs into one cell.
Here's a slightly modified Step 2, which excludes duplicates: =CONCAT(IF((COUNTIF(DATA,"="&-ROW(A1:A100))>0)*(COUNTIF(DATA,"="&ROW(A1:A100))>0),-ROW(A1:A100)&" "&ROW(A1:A100)&" ",""))
Looks like this:
The data doesn't need to be sorted. Both methods work up to 100, but you can easily expand that by changing A100 to A1000 or whatever you need.
Use the vlookup formula to identify the rows, and you can use the Filter & Unique formula to get the list, or a pivot table.
First, immediately next to your data use the formula:
=vlookup(A1*-1,$A$1:$A$1,1,0)
For non-365:
This will produce an error for each instance that doesn't have a match. You can filter at this point to get your list from the existing table. You can also create a pivot table under the Data tab of your ribbon and inserting a pivot table. Filter the #N/A from there to get an exclusive list without hidden rows.
For 365:
You can use the following combination of formulas to get the exclusive list as well.
=UNIQUE(FILTER(B1:B8,ISNUMBER(B1:B8)),0,0) or =UNIQUE(FILTER($B$1:$B$8,ISNUMBER($B$1:$B$8)),0,0) should yield the same results
As ScottCraner mentioned, you can circumvent the helper column in 365 by modifying the formula a bit more:
=UNIQUE(FILTER(A1:A8,ISNUMBER(MATCH(-A1:A8,A1:A8,0)),"")
The Match here is doing something similar to the Vlookup, but housing that logic within the formula, so it's a cleaner solution in my opinion.
Using your data the result was { -5,-2,2,5 }
These are spill formulas so you only need to put it in one spot and it will expand the formula over the adjacent cells below where it's entered for however many cells needed to list all the unique numbers that occur. It takes into account the negatives and so on. This may be a 365 formula, so if you're on another version of excel it may not work.
Edit: Adjusted the instructions to fully address the question.
I am new to Excel/Google Sheets. I have a difficulty of writing a formula to compare columns as a pair-wise since the formula would be
so big as the day goes.
For example, there're 2 main columns Foo and Bar. I want to find the total number of days that Foo
and Bar are equal so the current formula is =IF(A3 = G3, 1, 0)+IF(B3 = H3, 1, 0)+IF(C3 = I3, 1, 0)+...
But this is kind of tedious because there're ~40 days to compare with. Are there any other alternatives
to write a formula in efficient way? Either Google-App-Scripts or Excel Formula is appreciated.
Cheers!
Give a try on below google-sheet formula. Adjust ranges as you need.
=ArrayFormula(SUM(IF(A3:E3=G3:K3,1,0)))
Assuming that you're needing to get such a total for each row and not merely a single row, try this:
=ArrayFormula(IF(A3:A="",,MMULT(IF(A3:F=G3:L,1,0),SEQUENCE(COLUMNS(A:F),1,1,0))))
Of course you will need to adjust the three ranges to match your own FOO and BAR ranges.
This one formula will produce all results for all rows.
The MMULT function is tricky to explain to those as yet unfamiliar with it. But it's a powerful tool. I'll add a picture I created that may best explain what it does:
By making the second matrix a simple SEQUENCE of 1s as long as the other matrix is wide, we wind up multiplying everything by 1 before adding together. And since anything multiplied by 1 is itself, this combination serves only to do a row-by-row add.
Things to keep in mind with MMULT:
1.) Every cell in every matrix must be a number or it will produce an error.
2.) As in the above formula, there are ways to use either/or conditions to turn every cell in a matrix into a number.
I have a pricelist, with currently 5 different categories of products. Each product will have to have two different prices. Depedning of the product and the type of price, the calculation will be different. Therefor I've used INDEX/MATCH to find the formula needed, from a table I created.
Below a screendump, and I wanted to attach the Excel fil, but canøt seem to work out how.
Question: HOW do I then "run" the formula I fetched? -I've tried different suggestions on using EVALUATION, but it doesn't seem to cut it? Also I've tried "Indirect' on the whole formula, without success.
I would like to avoid any VBA for this case.
Can anybody provide some insight?
You could but if I understand properly, the only thing changing in the formulas is the "muliplier" number, then it's better to lookup that number instead of the whole formula. The other method (which would use Evaluate etc) is not be considered "good practice" for a number of reasons.
EDIT:
I didn't see the 2nd varying value (since I was on the SO mobile app) but it's still not an issue since it would a target column. You could be thinking of the opposite: sometimes lookups based on multiple criteria can get complicated, but this a matter of more data, as opposed to adding criteria for the lookup.
VLookup would have been the simplest method, like G2 could have been:
=VLOOKUP(E2, $J$4:$L$8, 2, False)
...to return the second column of range J4:L8 where the first column equals E2. (Then for the next required column, same formula except with 3 instead of 2.)
Since I wasn't sure more columns could be added one day, I allowed for that by, instead of specifying "Column 2 or 3" etc, it finds the column dynamically by name. (So the multiplier/factor used in G2 will change if you change the title in G1 to the name of a different column existing in the target data chart.
For the sake of neatness as well as potential of additional columns like G & H, I moved the lookup table to a separate sheet. It can stay out of the way since you won't need to see or change it very often. (If the same chart was going to be referenced by many workbooks, you could even move it to a separate workbook and point all formulas at that, since it's always best to have one copy of identical data instead of many in different workbooks.
Also to assist with potential future changes (and just to be tidier), instead of referring to the target table range addresses (like "J4:L8" etc) I named two ranges:
the table of multiplier/factor data can be referred to by it's address, or by myMultipliers
the titles of the same table is also called myMultiplierTitles (used to match to the titles of column G & H on the original sheet.
Formula
After those changes, the lookup formula in G2 is:
=INDIRECT(VLOOKUP($E2,myMultipliers,MATCH(G$1,myMultiplierTitles,0),FALSE)&ROW())*VLOOKUP($E2,myMultipliers,MATCH(G$1,myMultiplierTitles,0)+1,FALSE)
INDIRECT returns the value of a cell that you refer to by name (text/string) as opposed to directly (as a range). For example:
=INDIRECT("A1")
returns the same as
=A1
...but with INDIRECT we can get the name from elsewhere (a cell, function or formula). So if x="A1" then =INDIRECT(x) returns the same as the 2 above examples.
Your original plan of storing the entire formula in a table as text would have worked with the help of INDIRECT and/or EVALUATE but I think this way is considered better practice partly because it facilitates easier future expansion.
The formula is longer than it would have been, but that's mostly because it's dynamically reading the field names. And size doesn't matter. :-)
I have a (large) array of data in Excel of which I need to compute the average value of certain values in one column, based on the values of another column. For example, here's a snippet of my data:
So specifically, I want to take the average of the F635 mean values corresponding with Row values of 1. To take it a step further, I want this to continue to Row values of 2, Row values of 3 etc.
I'm not familiar with how to run code in Excel but have attempted to solve this by using the following:
=IF($C = "1", AVERAGE($D:$D), "")
which (to my understanding) can be interpreted as "if the values (anywhere) in column C are equal to 1, then take the average of the corresponding values in column D."
Of course, as I try this I get a formula error from Excel.
Any guidance would be incredibly appreciated. Thanks in advance.
For more complicated cases, I would use an array-formula. This one is simple enough for the AVERAGEIF formula. For instance =AVERAGEIF(A1:A23;1;B1:B23)
Array-formula allows for more elaborate ifs. To replicate the above, you could do =SUM(IF($A$1:$A$23=1;$B$1:$B$23;0))/COUNT(IF($A$1:$A$23=1;$B$1:$B$23;0)).
Looks like more work but you can create extremely elaborate if-statements. Instead of hitting ENTER, do CTRL-ENTER when entering the formula. Use * between criteria to replicate AND or + for OR. Example: SUM(IF(($A$1:$A$23="apple")*($B$1:$B$23="green");$C$1:$C$23;0)) tallies values for green apples in c1:c23.
Your sample data includes three columns with potential ifs so my guess is that you're going to need array formulas at some point.
Excel already has a builtin function for exactly this use; AVERAGEIF().
=AVERAGEIF(C:C,1,D:D)
i have a little problem with final formulas in one of my column. How to start. maybe i will explain what i have a then what i want.
i have an excel worksheet with 3 sheets. i want to record goods and what are these goods made of. first is sheet called Goods where is possible to put number of goods i want to make. In this case i want to make 1x sandwich1 and at the same time 3x sandwich2. i dont want make sandwich3 this time.
Second sheet is Matrix sheet where I record every good and what it is made of. This sheet is basic sheet and all other sheets take list of goods (resp. ingredients) from this sheet. Simply when i want to make sandwich1 i look at matrix and know that i need 1x1pc of egg + 1x5g of cheese. And for 3x sandwiche2 i need 3x10g of sausages.
Final sheet is called Ingredients. It is a list of used ingredients from Matrix sheet (exactly same order) to make these sandwiches. I want to fill formulas into column B which would go through one ingredient ofter ingredient and count needed amount of it. So it would look into matrix in the same row and where there is some number it would multiply with number of items from Goods sheet. The list of goods is also in the same order as in the matrix sheet.
I hope you understand now what i want and will try to help me. I think there will be SUMPRODUCT, SUMIF and maybe INDERECT functions but i am not that skilled in excel
thanks for any suggestions
You can use MMULT function here - it's an "array formula" which you need to enter in a range. You can do that like this:
In Ingredients worksheet enter this formula in B2
=MMULT(Matrix!C2:E4+0;Goods!B2:B4+0)
[I'm assuming you have a European version of Excel where ; is used to separate arguments]
Now select the whole range B2:B4, press F2 key to select formula and hold down CTRL and SHIFT keys and press ENTER. This "array enters" the formula in the range and you should now see curly braces like { and } around the formula and also the correct results.
You cannot change part of that array now, only the whole thing
Note that I'm assuming that the contents of Goods!A2:A4 will be the same as Matrix!C1:E1 and in the same order. You can extend the ranges to be as large as you like as long as that principle still holds
I suspect that this is an issue of "when all you have is a hammer, every problem is a nail". For reasons known only to you you are using a spreadsheet to solve a problem that databases were made to do. Any solution to this problem in a spreadsheet will be entirely dependent on the integrity of your data - add another column or get things out of order and it will fail.
That said, what you have in your link is effectively a pivot table and what you need is the unpivoted version of this - the instructions for getting this are here.
When you have that, you can use the various database functions in excel to get your answer.