Excel Pivot - Count of distinct values in a given range - excel

I've the following data set from which I need the count of distinct values in a pivot. I've tried few function like FREQUENCY, COUNTIFS etc. but I could not make it.
Input
Input Data
Output
Expected Output

=SUM(IF((B2:D4=C10),1,0))
To get result after using formula hit ctrl+shift+enter

I think it's an awkward case because the data values are in more than one column and because they are text not numbers.
The only way I could come up with would be to repeat a standard method of getting the distinct values and then use COUNTIF to get the counts.
So starting in F2 I have:-
=IFERROR(INDEX($B$2:$B$4,MATCH(0,COUNTIFS($F$1:$F1,$B$2:$B$4),0)),
IFERROR(INDEX($C$2:$C$4,MATCH(0,COUNTIFS($F$1:$F1,$C$2:$C$4),0)),
IFERROR(INDEX($D$2:$D$4,MATCH(0,COUNTIFS($F$1:$F1,$D$2:$D$4),0)),"")))
(It's an array formula and must be entered with CtrlShiftEnter)
And starting in G2:-
=COUNTIF($B$2:$D$4,F2)
To avoid having to specify an exact range (e.g. $B2:$B4), you could use the following in F2 and adjust it to the maximum number of rows you are likely to use:-
=IFERROR(INDEX($B$2:$B$10,MATCH(0,IF(ISTEXT($B$2:$B$10),COUNTIFS($F$1:$F1,$B$2:$B$10),1),0)),
IFERROR(INDEX($C$2:$C$10,MATCH(0,IF(ISTEXT($C$2:$C$10),COUNTIFS($F$1:$F1,$C$2:$C$10),1),0)),
IFERROR(INDEX($D$2:$D$10,MATCH(0,IF(ISTEXT($D$2:$D$10),COUNTIFS($F$1:$F1,$D$2:$D$10),1),0)),"")))
and this in G2:-
=IF(F2="","",COUNTIF($B$2:$D$10,F2))
but of course it's restricted to three columns and anything beyond this I think may point to a VBA solution.
There is a also a general formula for distinct values from a 2d array here but the output includes a zero when blank rows and columns are included so would need some modification.
So here is the modified formula from the reference above with error handling starting in I2:-
=IFERROR(INDEX(tbl_text, MIN(IF( IF(ISTEXT(tbl_text),COUNTIF($I$1:$I1, tbl_text),1)=0, ROW(tbl_text)-MIN(ROW(tbl_text))+1)),
MATCH(0, COUNTIF($I$1:$I1, INDEX(tbl_text, MIN(IF(IF(ISTEXT(tbl_text),COUNTIF($I$1:$I1, tbl_text),1)=0, ROW(tbl_text)-MIN(ROW(tbl_text))+1)), , 1)), 0), 1),"")
With the counts starting in J2:-
=IF(J2="","",COUNTIF(tbl_text,J2))
where tbl_text is a named range defined (when I tested it) as $B$2:$E$10
This I think should meet your additional criterion of it being more generalized because you can set tbl_text to include the maximum number of rows and columns that you are likely to use.
Will need a slight further modification to ignore blanks within the table.

Related

Getting number and off setting numbers from array in Excel

I have array of numbers in a single column like this:
I want only that numbers for which corresponding negative numbers exist. If number exist 2 times, but negative number exist only one time, then I wanted to retain one positive and one negative number. Similarly, if number exists 3 times, and negative number appears only two times, then I want 2 set of numbers including positive and negative. In this case, I wanted to get output:
5 2 -2 -5
Orders of numbers are not relevant for me. Please do not use VBA. You can create multiple column and apply filter at the end.
Thank you for the response, but I wanted to get the data in column next to the values. Like:
5
2
-2
-5
Please help.
Here's another Office 365 solution:
Name the data range DATA
Put this formula anywhere: =CONCAT(REPT(-ROW(A1:A100)&" "&ROW(A1:A100)&" ",COUNTIF(DATA,"="&ROW(A1:A100)*IF(COUNTIF(DATA,"="&-ROW(A1:A100))<COUNTIF(DATA,"="&ROW(A1:A100)),-1,1))))
That will output the pairs into one cell.
Here's a slightly modified Step 2, which excludes duplicates: =CONCAT(IF((COUNTIF(DATA,"="&-ROW(A1:A100))>0)*(COUNTIF(DATA,"="&ROW(A1:A100))>0),-ROW(A1:A100)&" "&ROW(A1:A100)&" ",""))
Looks like this:
The data doesn't need to be sorted. Both methods work up to 100, but you can easily expand that by changing A100 to A1000 or whatever you need.
Use the vlookup formula to identify the rows, and you can use the Filter & Unique formula to get the list, or a pivot table.
First, immediately next to your data use the formula:
=vlookup(A1*-1,$A$1:$A$1,1,0)
For non-365:
This will produce an error for each instance that doesn't have a match. You can filter at this point to get your list from the existing table. You can also create a pivot table under the Data tab of your ribbon and inserting a pivot table. Filter the #N/A from there to get an exclusive list without hidden rows.
For 365:
You can use the following combination of formulas to get the exclusive list as well.
=UNIQUE(FILTER(B1:B8,ISNUMBER(B1:B8)),0,0) or =UNIQUE(FILTER($B$1:$B$8,ISNUMBER($B$1:$B$8)),0,0) should yield the same results
As ScottCraner mentioned, you can circumvent the helper column in 365 by modifying the formula a bit more:
=UNIQUE(FILTER(A1:A8,ISNUMBER(MATCH(-A1:A8,A1:A8,0)),"")
The Match here is doing something similar to the Vlookup, but housing that logic within the formula, so it's a cleaner solution in my opinion.
Using your data the result was { -5,-2,2,5 }
These are spill formulas so you only need to put it in one spot and it will expand the formula over the adjacent cells below where it's entered for however many cells needed to list all the unique numbers that occur. It takes into account the negatives and so on. This may be a 365 formula, so if you're on another version of excel it may not work.
Edit: Adjusted the instructions to fully address the question.

Taking average of certain values in one Excel column based on values in another

I have a (large) array of data in Excel of which I need to compute the average value of certain values in one column, based on the values of another column. For example, here's a snippet of my data:
So specifically, I want to take the average of the F635 mean values corresponding with Row values of 1. To take it a step further, I want this to continue to Row values of 2, Row values of 3 etc.
I'm not familiar with how to run code in Excel but have attempted to solve this by using the following:
=IF($C = "1", AVERAGE($D:$D), "")
which (to my understanding) can be interpreted as "if the values (anywhere) in column C are equal to 1, then take the average of the corresponding values in column D."
Of course, as I try this I get a formula error from Excel.
Any guidance would be incredibly appreciated. Thanks in advance.
For more complicated cases, I would use an array-formula. This one is simple enough for the AVERAGEIF formula. For instance =AVERAGEIF(A1:A23;1;B1:B23)
Array-formula allows for more elaborate ifs. To replicate the above, you could do =SUM(IF($A$1:$A$23=1;$B$1:$B$23;0))/COUNT(IF($A$1:$A$23=1;$B$1:$B$23;0)).
Looks like more work but you can create extremely elaborate if-statements. Instead of hitting ENTER, do CTRL-ENTER when entering the formula. Use * between criteria to replicate AND or + for OR. Example: SUM(IF(($A$1:$A$23="apple")*($B$1:$B$23="green");$C$1:$C$23;0)) tallies values for green apples in c1:c23.
Your sample data includes three columns with potential ifs so my guess is that you're going to need array formulas at some point.
Excel already has a builtin function for exactly this use; AVERAGEIF().
=AVERAGEIF(C:C,1,D:D)

I want to use sumproduct with two different tables based on selection

I am working on a statistical model where we use sumproduct to generate forecast values by multiplying coefficients in one table with variables in another. Right now it is being done manually and that is taking time. I would like to automate it but I'm not able to figure this out.
We are using concatenate to identify different rows to use for vlookup. The variable columns are the same in number for both tables. I need to multiply each variable cell respectively in both tables and sum them, hence sumproduct.
this is what I am trying to do
Forecast model 1 sales for product A in phones in USA = sumproduct([variables by year from table 1 for USA for phones], [Variables for USA phone product A model 1 from table 2] )
I hope someone can help me.
Proof of Concept
You will need to update the references to suit your spreadsheet table locations.
In cell E21 use the following and copy right and down as required:
=SUMPRODUCT(INDEX($G$3:$I$12,MATCH($B21&$A21&$C21,$A$3:$A$12,0),0),INDEX($F$15:$H$18,MATCH($A21&$C21&$D21&MID(E$20,16,1),$A$15:$A$18,0),0))
This process was simplified because you had a unique ID tag on each of the previous two tables that could be built from the information in the third table. If you ever get into double digit forecast models the MID() function part of the formula will need to be modified. The 16 in the mid function refers to the character location of the number in the forecast model sales header name in Table 3. As such you either need to keep that header format exactly the same or modify the position of the number in the MID() function.
UPDATE 1
Explanation of Formulas
The following formulas were used in this solution:
SUMPRODUCT
INDEX
MATCH
MID
Concatenate
I will start with the assumption that you already understand sumproduct() as you were already using it before you ran into your problem. One thing to note about sumproduct is that it causes array like calculation to occur on the portion within it brackets. In this case we fed it two ranges of equal size. The difficult part was more an issue of determining those ranges.
Using your ID columns as a lookup row we used the match() function to determine which row to use. For the first set of variables we used the following to determine which row to look in:
=MATCH($B21&$A21&$C21,$A$3:$A$12,0)
Match is made up of three arguments inside the brackets:
MATCH(what to look for, where to look, type of match)
What we need to look for in table is a concatenation of various cells in Table 3 to build the ID in Table 1. It could have been written using the full formula:
=CONCATENATE($B21,$A21,$C21)
but the short form using & was used instead:
=$B21&$A21&$C21
Once we had what to look for we needed the range of where to look and supplied the ID column from table 1:
$A$3:$A$12
This now leaves the third and final argument of what type of search to perform. An exact match seemed to be the most appropriate match to perform so the value of 0 was supplied. What match returns is the row within the supplied range. It is relative to the range supplied and not the actual row in the spreadsheet. If it cannot make a match it will return an error instead of a row number.
Now that we know what row we want, we can use this information with the INDEX() function. The INDEX() function is made up of 3 arguments as well with the third argument being optional depending on if a 1D or 2D range is being indexed:
INDEX(Range to work with, 2D Row or 1D Position reference, 2D Column reference)
IN the case we are dealing with for the first table, the range to work with was your list of variables:
$G$3:$I$12
This is a 2D range. As such we need to tell INDEX() both what Row to look in as well as which Columns to look in. For the row to look in, we used the previously discussed MATCH() function. Since we want all columns and not just a specific column we use the value of 0. If Match returns an error, or if a number greater than the number of rows or columns selected is supplied, INDEX() will return an error. Based on the information discussed, the index function would look like:
=INDEX($G$3:$I$12,MATCH($B21&$A21&$C21,$A$3:$A$12,0),0)
You can try entering the above in a cell but it will give you an error. if you select three adjacent cells in the same row and use CONTROL+SHIFT+ENTER when entering the formula, Excel will add {} around the formula and it will be an array formula and should show you the three variables being used.
The same process as described above can be used for determining the second range of variable from Table 2. The only difference here is that the forecast model number was not in a column of its own but instead in the header row surrounded by text. As such the MID() function needed to be used to go into the header row, bypass the surrounding text and pull the model number out so it could be used as part of the CONCATENATION() used for the "what to look for" in MATCH():
=MID(E$20,16,1)
The MID() function work again with three arguments:
MID(Text to look in, which character to start at, how many characters to pull)
So in this case we are looking at the header in E20. Note the lock $ on the row number so the formula is always looking in row 20 no matter how far down it gets copied. It is then going to the 16th character. In this case the character "1" and pulling 1 character. If the header had just been 1 and 2, there would be no need for the MID function and the cell (with proper lock) could have been used.

Sumproduct or Countif on a 2D matrix

I'm working on data from a population of people with allergies. Each person has a unique ExceptionID, and each allergen has a unique AllergenID (451 in total).
I have a data table with 2 columns (ExceptionID and AllergenID), where each person's allergies are listed row by row. This means that the ExceptionID column has repeated values for people with multiple allergies, and the AllergenID column has repeated values for the different people who have that allergy.
I am trying to count how many times each pair of allergies is present in this population (e.g. Allergen#107 & Allergen#108, Allergen#107 & Allergen#109,etc). To keep it simple I've created a matrix of 451 rows X 451 columns, representing every pair (twice actually because A/B and B/A are equivalent).
I somehow need to use the row name (allergenID) to lookup the ExceptionID in my data table, and count the cases where that matches the ExceptionIDs from the column name (also AllergenID). I have no problem using Vlookup or Index/Match, but I'm struggling with the correct combination of a lookup and Sumproduct or Countif formula.
Any help is greatly appreciated!
Mike
PS I'm using Excel 2016 if that changes anything.
-=UPDATE=-
So the methods suggested by Dirk and MacroMarc both worked, though I couldn't apply the latter to my full data set (17,000+ rows) because it was taking a long time.
I've since decided to turn this into a VBA macro because we now want to see the counts of triplets instead of pairs.
With the 2 columns you start with, it is as good as impossible... You would need to check every ExceptionID to have 2 different specific AllergenID. Better use a helper-table with ExceptionID as rows and AllergenID as columns (or the opposite... whatever you like). The helper table needs a formula like:
=COUNTIFS($A:$A,$D2,$B:$B,E$1)
Which then can be auto-filled. (The ranges are from my example, you need to change them to your needs).
With this helper-matrix you can easily go for your bigger matrix like this:
=COUNTIFS(E:E,1,INDEX($E:$G,,MATCH($I2,$E$1:$G$1,0)),1)
Again, you can auto-fill with this formula, but you need to change it, so it fits your needs.
Because the columns have the same ID2 (would be your AllergenID), there is no need to lookup them because E:E changes automatically with the auto-fill.
Most important part of the formulas are the $ which should not be messed up, or you can not auto-fill it.
Picture of my self-made example (formulas are from the upper left cell in each table):
If you still have any questions, just ask :)
It can be done straight from your original set-up with array formulas:
Please note that array formulas MUST be entered with Ctrl-Shift-Enter, before copying across and down:
In the example pic, I have NAMED the data ranges $A$2:$A$21 as 'People' and $B$2:$B$21 as 'Allergens' to make it a nicer set-up. You can see in the formula bar how that looks as a formula. However you could use the standard references like this in your first matrix cell:
EDIT: silly me, N function is not needed to turn the booleans into 1's and 0's, since multiplying booleans will do the trick. Below formula works...
SUM(IF(MATCH($A$2:$A$21,$A$2:$A$21,0)=ROW($A$2:$A$21)-1, NOT(ISERROR(MATCH($A$2:$A$21&$E2,$A$2:$A$21&$B$2:$B$21,0)))*NOT(ISERROR(MATCH($A$2:$A$21&F$1, $A$2:$A$21&$B$2:$B$21,0))), 0))
Then copy from F2 across and down. It can be perhaps improved in technique with sumproduct or whatever, but it's just a rough example of the technique....

Find Minimum Value Based on 2 Criteria (Excel 2013)

Looking to find the max value in a column based on two sets of criteria
So the logic would be: Find the minimum value in column M, where the value in column A matches column N, and the value in Column Y is less than 318.
I've tried using an array formula like this but it doesn't seem to be working/is to memory heavy to run:
=MIN(IF(AND(N:N=A2,Y:Y<=318),M:M))
is there a simpler way? or perhaps a UDF that could work?
Thank you for your help!
You can't use AND in these type of formulas because it only returns a single value rather than the required array.
Here are three possible working versions:
1.) Use * to simulate AND
=MIN(IF((N:N=A2)*(Y:Y<=318),M:M))
confirmed with CTRL+SHIFT+ENTER
2.) Use multiple nested IFs
=MIN(IF(N:N=A2,IF(Y:Y<=318,M:M)))
confirmed with CTRL+SHIFT+ENTER
3.) Use AGGREGATE function
=AGGREGATE(15,6,M:M/(N:N=A2)/(Y:Y<=318),1)
The advantages of this approach are that you don't need "array entry", and it can ignore any errors in the data
Either way it's best to reduce the ranges sizes if you can because it might be slow with whole columns
Give this a try and adjust ranges to suit. Try not to use whole column references:
=SMALL(INDEX(($N$2:$N$101=A2)*($Y$2:$Y$101<=318)*$M$2:$M$101,),1+ROWS($M$2:$M$101)-COUNTIFS($N$2:$N$101,A2,$Y$2:$Y$101,"<=318"))
If you are using the whole column to pick up new data as it is added, consider using Dynamic Named Ranges instead
When things get this complex, I'll usually break it down and setup smaller/simpler formulas in seperate columns.
In other words, you have data in columns A through Y ?
So let's create a formula in column AA:
1) identify when value in Col A matches col N, and value in col Y < 318
=and(A1=N1,Y1<318)
2) copy AA1 to all the rows of your data.
3) now we have a condition to work off .. since there is a SUMIF and COUNTIF, but no MINIF .. we'll have to build that ourselves. first the IF:
in column AB1:
=if(AA1,M1,"")
copy that down to all your data.
finally, do your min:
=MIN(AB:AB)
Should give you your answer.
You could probably splice the first two together, but again, building a complex formula like this, build it simply, first, ;)

Resources