Count the quantity of unique combinations in multiple columns in Excel - excel

I need Excel to count the number of times a pair of numbers occurs in the same row, regardless of order. The following is what I'm looking for: column C should display the number of rows in which A and B contain the same two numbers, but not necessarily in the same columns (or order). In the example below, 6 2 and 2 6 should be considered the same pair, so the count in column C should be 2 for both 6 2 and 2 6.
My Objective:
I tried the pivot table suggested at the following link and it successfully counted matching pairs, but for example 6 2 and 2 6 were not considered the same and the count was only 1 for each.
This simple pivot table solution almost works
Thank you! They all seem to work, but the easiest solution I found was here Quick to copy for large data

Use this array formula:
=SUM(COUNTIFS(A:A,A1:B1,B:B,TRANSPOSE(A1:B1)))
Being an array formula, it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode. If done correctly, Excel will put {} around the formula.

Use the solution from SuperUser you posted, but use:
=CONCATENATE(MAX(A2:B2),MIN(A2:B2))

I'm trying to think of a single-formula solution using COUNTIFS but can't find one at the moment, so here's a two-formula version.
In column D =IF(A2<B2,A2&", "&B2,B2&", "&A2)
In column C =COUNTIF(D:D, D2)
This creates a list in column D with the following logic:
If A < B then A goes first
If A > B then B goes first
If A = B then B goes first but it doesn't really matter
The list will be a set of strings which we then count.
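For readers who want to check the logic outside Excel, here is a minimal Python sketch of the same two-step idea: normalise each pair so the smaller value comes first, then count identical keys. The function name and the sample rows are assumptions made for illustration, not part of the original answer.

```python
from collections import Counter

def unordered_pair_counts(rows):
    """For each (a, b) row, return how many rows hold the same pair in either order."""
    # Put the smaller value first, mirroring the column D helper formula.
    keys = [(min(a, b), max(a, b)) for a, b in rows]
    # Counting identical keys plays the role of =COUNTIF(D:D, D2).
    totals = Counter(keys)
    return [totals[key] for key in keys]

# Hypothetical sample data: 6 2 and 2 6 should both report 2.
rows = [(6, 2), (2, 6), (3, 3), (4, 1)]
print(unordered_pair_counts(rows))  # [2, 2, 1, 1]
```

Using a tuple as the key avoids the ambiguity that plain string concatenation without a delimiter can introduce.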

Related

Finding uninterrupted sub-arrays in Excel - Kadane's algorithm variation?

Suppose you have an ordered, indexed list of positive values. These positive values are interrupted by 0 values. I want to determine if a consecutive sub-array exists which is not interrupted by 0 values and whose sum exceeds a certain threshold.
Simple example:
Index, Value
0 0
1 0
2 3
3 4
4 2
5 6
6 0
7 0
8 0
9 2
10 3
11 0
In the above example, the largest consecutive sub-array not interrupted by 0 is from index 2 to index 5 inclusive, and the sum of this sub-array is 15.
Thus, for the following thresholds 20, 10 and 4, the results should be FALSE, TRUE and TRUE respectively.
Note I don't necessarily have to find the largest sub-array, I only have to know if any uninterrupted sub-array sum exceeds the defined threshold.
I suspect this problem is a variation of Kadane's algorithm, but I can't quite figure out how to adjust it.
The added complication is that I have to perform this analysis in Excel or Google Sheets, and I cannot use scripts to do it - only inbuilt formulas.
I'm not sure if this can even be done, but I would be grateful for any input.
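Start with
=B2
in C2, then put
=IF(B3=0,0,B3+C2)
in C3 and copy down. Column C then holds the running sum of the current uninterrupted run, so comparing MAX(C:C) with the threshold answers the question.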
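The helper-column idea can be sketched in Python as a running sum that resets whenever a 0 appears. This is only an illustration of the logic, not a replacement for the worksheet formula; the function name is an assumption, and the values are the ones from the question.

```python
def any_run_exceeds(values, threshold):
    """Return True if some uninterrupted run of non-zero values sums to more than threshold."""
    running = 0
    for value in values:
        # Same rule as =IF(B3=0,0,B3+C2): reset on a zero, otherwise accumulate.
        running = 0 if value == 0 else running + value
        if running > threshold:
            return True
    return False

values = [0, 0, 3, 4, 2, 6, 0, 0, 0, 2, 3, 0]
print([any_run_exceeds(values, t) for t in (20, 10, 4)])  # [False, True, True]
```

Because all values are positive, the running sum only grows within a run, so stopping at the first value above the threshold is enough.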
EDIT 1
If you were looking for a Google sheets solution, try something like this:
=ArrayFormula(max(sumif(A2:A,"<="&A2:A,B2:B)-vlookup(A2:A,{if(B2:B=0,A2:A),sumif(A2:A,"<="&A2:A,B2:B)},2)))
This assumes that the numbers in column B start with a zero; you would need to add IFERROR if not. It's basically an array-formula implementation of @Gary's Student's method.
EDIT 2
Here is the Google Sheets formula translated back into Excel. It gives you an alternative if you don't want to use Offset:
=MAX(SUMIF(A2:A13,"<="&A2:A13,B2:B13)-INDEX(SUMIF(A2:A13,"<="&A2:A13,B2:B13),N(IF({1},MATCH(A2:A13,IF(B2:B13=0,A2:A13))))))
(entered as an array formula).
Comment
Maybe the real challenge is to find a formula that works in both Excel and Google Sheets, because:
VLOOKUP doesn't work the same way in Excel
The OFFSET/SUBTOTAL combination doesn't work in Google Sheets
The INDEX/MATCH combination with N(IF({1},... doesn't work in Google Sheets.
With data in columns A and B, ensure column B ends with a 0. Then in C2 enter:
=IF(AND(B3=0,B2<>0),SUM(B$1:$B2)-MAX($C$1:C1),"")
and copy downwards.
Column C lists the sums of consecutive non-zeros. In another cell enter something like:
=MAX(C:C)>19
where 19 is the threshold value.
You can avoid the "helper" column by using a VBA UDF.
EDIT#1:
Use this instead:
=IF(AND(B3=0,B2<>0),SUM(B$1:$B2)-SUM($C$1:C1),"")
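A rough Python sketch of what the corrected formula computes: at the last row of each run, it takes the cumulative total so far and subtracts the sums already reported for earlier runs. As far as I can tell, the earlier MAX version only subtracts the largest previous run, which goes wrong once there are three or more runs. The function name is hypothetical.

```python
def run_sums(values):
    """Sum of each uninterrupted non-zero run, reported where the run ends."""
    sums = []
    cumulative = 0  # plays the role of SUM(B$1:$B2)
    for i, value in enumerate(values):
        cumulative += value
        # Treat the end of the data as a trailing 0, as the answer requires.
        following = values[i + 1] if i + 1 < len(values) else 0
        if value != 0 and following == 0:
            # Subtract every earlier run sum, i.e. SUM($C$1:C1), not just the largest.
            sums.append(cumulative - sum(sums))
    return sums

values = [0, 0, 3, 4, 2, 6, 0, 0, 0, 2, 3, 0]
print(run_sums(values))            # [15, 5]
print(max(run_sums(values)) > 19)  # False
```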
Thanks to @Tom Sharpe and @Gary's Student for answering the question.
While I admittedly did not specify this in the question, I would prefer to achieve the solution without a helper column because I have to do this operation on 30+ successive columns. I just didn't think it was possible in Excel.
Full credit goes to user XOR LX on the Excelforum for coming up with this solution. It has blown my mind and took me the better part of an hour to wrap my head around, but it is certainly very creative. There is no way I could have come up with it myself. Re-posting it here for the benefit of everyone who is looking into this.
Copy and paste the table from my initial question into an empty Excel sheet such that the headers appear in (A1:B1) and the values appear in (A2:B13).
Then enter this formula as an array formula (ctrl+shift+enter), which gives the max of the sums of all the uninterrupted sub-arrays:
=MAX(SUBTOTAL(9,OFFSET(B2,A2:A14,,-FREQUENCY(IF(B2:B13,A2:A13),IF(B3:B14=0,A2:A13,0))-1)))
Note the deliberate offset to include one additional row below the end of the dataset.

Excel - Formula that Counts the First of each Duplicate

I have a spreadsheet that looks as follows:
On the left-hand side, in column A, I have bug numbers that correspond to specific bugs in our Bugzilla.
I am trying to find a way to place, in a single cell, the number of bugs that have duplicates, with each of them counted only once.
For example, say I have this:
Col A
1
1
2
3
3
3
4
4
5
I would like to have the formula return me 3 because I have 3 numbers that have duplicates, but each of them only counted once.
Use COUNTIF and SUMPRODUCT:
=SUMPRODUCT((COUNTIF($A$1:$A$9,$A$1:$A$9)>1)/COUNTIF($A$1:$A$9,$A$1:$A$9))
To deal with blanks we need to use SUM(IF()) in an array:
=SUM(IF(((COUNTIF($A$1:$A$9,$A$1:$A$9)>1)*($A$1:$A$9<>"")),1/COUNTIFS($A$1:$A$9,$A$1:$A$9,$A$1:$A$9,"<>")))
Being an array formula, it needs to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
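To see why dividing by COUNTIF works, here is a small Python sketch. Every occurrence of a value that appears n times contributes 1/n, so each duplicated value adds up to exactly 1. The function names are assumptions for illustration; the first version is the direct way to get the same answer, and it skips blanks.

```python
from collections import Counter
from fractions import Fraction

def count_duplicated_values(values):
    """Number of distinct non-blank values that occur more than once."""
    counts = Counter(v for v in values if v not in ("", None))
    return sum(1 for n in counts.values() if n > 1)

def count_duplicated_values_weighted(values):
    """Same result via the 1/count trick used by the SUMPRODUCT formula (no blanks assumed)."""
    counts = Counter(values)
    # Each of the n copies of a duplicated value contributes 1/n, totalling 1 per value.
    return sum(Fraction(1, counts[v]) for v in values if counts[v] > 1)

bugs = [1, 1, 2, 3, 3, 3, 4, 4, 5]
print(count_duplicated_values(bugs))           # 3
print(count_duplicated_values_weighted(bugs))  # 3
```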

Concatenate values based on criteria

I have a two column list of data in Excel. The first column being a question number from a test and the second column being a number referencing what is being tested on that question. Some elements are tested on more than one question. What I want to be able to do is to list the question numbers that each element is tested on. For example:
     A    B        Should return:    C        D
1    Q    Ref                        Q        Ref
2    1    N1                         1,3,5    N1
3    2    N4                         2        N4
4    3    N1                         4        N3
5    4    N3
6    5    N1
I want this to be returned using a formula.
The problem I have is returning and then concatenating an unspecified number of values from one column that match a particular criterion in another column further to the right.
EDIT: Looking for a formula answer, not VBA if possible
EDIT: Thanks all for your comments so far. I will have a look at each of the possible solutions given so far and let you know what I go with. The 1,2,3 etc will need to be in the same cell.
Just to put my comment in an answer, so it makes more sense.
First sort columns A and B on Column B.
In C2 put the formula:
=IF(B2=B3,A2&","&C3,A2)
Then copy down.
Then in Column E place your unique reference list. And in D2 put:
=VLOOKUP(E2,$B$2:$C$6,2,FALSE)
And copy down.
You can then hide column C.
It does require that the data be sorted correctly, and it needs a helper column, but it sticks to the formulas-only rule.
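This is not a worksheet solution, but as a reference for the expected result, the grouping that the helper column builds can be sketched in Python. The function name is an assumption; the rows are taken from the example in the question.

```python
def questions_by_ref(rows):
    """Map each reference to a comma-separated list of its question numbers."""
    grouped = {}
    for question, ref in rows:
        # Collect question numbers per reference, in order of appearance.
        grouped.setdefault(ref, []).append(str(question))
    return {ref: ",".join(questions) for ref, questions in grouped.items()}

rows = [(1, "N1"), (2, "N4"), (3, "N1"), (4, "N3"), (5, "N1")]
print(questions_by_ref(rows))  # {'N1': '1,3,5', 'N4': '2', 'N3': '4'}
```

Unlike the sorted helper-column approach, this does not need the rows to be sorted by reference first.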
By nature, Excel discourages this in worksheet formulas. I guess they figure that if you do this in a User Defined Function (aka UDF) and it hoops a workbook, it is your own fault and so be it. To that end, I've never seen a standard or array formula using only native worksheet functions that accomplishes this on a 'ragged-edge' array of cells and it's been tried a few times. Consider it #REF! by design.
You can run successive IF functions (up to 64 by xl2007+ standards) to accomplish the string stitching (see this) but you will also be limited to the total length of a formula (see this). We also used 'helper' cells to run off the first 7 IFs in <=xl2003 then reference that cell in the first IF of another 7 nested IFs (rinse and repeat).
TL;DR: VBA is your most viable solution (see this). Conditional string concatenation is fraught with problems by itself, let alone in an array loop.
CONCATENATE function

Sorting text and numbers in Excel

I have a list of ranges I created. I want to sort them in ascending order by the row numbers they refer to. I tried the Sort option, but all I came up with was creating my own custom sort list...
List of ranges:
Column 1 Column 2
pp2dni2007 =szkolenia!$B$2:$E$33
pp2dni2010 =szkolenia!$B$273:$E$500
pp3dni2008 =szkolenia!$B$34:$E$83
pp3dni2009 =szkolenia!$B$84:$E$272
Desired output:
Column 1 Column 2
pp2dni2007 =szkolenia!$B$2:$E$33
pp3dni2008 =szkolenia!$B$34:$E$83
pp3dni2009 =szkolenia!$B$84:$E$272
pp2dni2010 =szkolenia!$B$273:$E$500
Here is a way (although a bit ugly).
Suppose a set up like this:
Step 1:
Place the cursor in C1 and go to Formulas --> Define Name. Define the following name:
We need to use this function to get the formula of each cell in column B because we will sort based on this formula.
Step 2:
At cell C1 enter and fill down:
=LEFT(SUBSTITUTE(GET_FORMULA,"=szkolenia!R",""),FIND("C",SUBSTITUTE(GET_FORMULA,"=szkolenia!R",""))-1)
broken down for convenience:
=LEFT(SUBSTITUTE(GET_FORMULA,"=szkolenia!R",""),
FIND("C",SUBSTITUTE(GET_FORMULA,"=szkolenia!R",""))-1)
This basically returns the row number of the reference that is stored in GET_FORMULA.
Step 3:
Select columns A, B and C and sort based on column C:
Result:
Or with formulas:
Notes:
The file has to be saved as macro-enabled in order to make the GET_FORMULA name work.
I do not really like helper columns (like column C above), but in this case things would get overcomplicated without one.
I hope this helps, although it is a really ugly solution.
When I read loannis' solution, I came up with another solution to my problem ;)
I forgot to mention: the contents of column 2 don't matter as data; they are only a hint for how to sort.
Okay, so it looks like this:
Column 1 Column 2
pp2dni2007 =szkolenia!$B$2:$E$33
pp2dni2010 =szkolenia!$B$273:$E$500
pp3dni2008 =szkolenia!$B$34:$E$83
pp3dni2009 =szkolenia!$B$84:$E$272
We have this data, so the pain here is the hard-coded "=szkoleni..." text.
To sort it easily, all that is needed is to get rid of that text. Using Find & Replace, I delete the "=szkolenia!$B$" part, and then use it once more to delete the rest, ":*".
Now the columns look like this:
Column 1 Column 2
pp2dni2007 2
pp2dni2010 273
pp3dni2008 34
pp3dni2009 84
Now it's just a case of simple sorting and voila! It can be easily used via macro too ;)
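The same idea (reduce each reference to its starting row number, then sort numerically) can be sketched in Python. This is only an illustration under the assumption that every reference looks like the ones in the question; the function name is hypothetical.

```python
import re

def start_row(reference):
    """Extract the first row number from a reference such as =szkolenia!$B$273:$E$500."""
    # Drop the sheet name, then read the digits after the first column letter.
    cell_part = reference.split("!")[-1]
    match = re.search(r"\$?[A-Za-z]+\$?(\d+)", cell_part)
    if match is None:
        raise ValueError(f"no row number found in {reference!r}")
    return int(match.group(1))

names = [
    ("pp2dni2007", "=szkolenia!$B$2:$E$33"),
    ("pp2dni2010", "=szkolenia!$B$273:$E$500"),
    ("pp3dni2008", "=szkolenia!$B$34:$E$83"),
    ("pp3dni2009", "=szkolenia!$B$84:$E$272"),
]
# Sort numerically by starting row; sorting the text itself would put 273 before 34.
ordered = sorted(names, key=lambda item: start_row(item[1]))
print([name for name, _ in ordered])
# ['pp2dni2007', 'pp3dni2008', 'pp3dni2009', 'pp2dni2010']
```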
Thanks loannis, you were my inspiration ;)
Sort&Filter
Reference Link
http://office.microsoft.com/en-us/excel-help/sort-data-in-a-range-or-table-HP010073947.aspx

Exceptions in Excel calculated columns

(Alternate title: Why on earth doesn't Excel support user-defined formulas with parameters without resorting to VB and the problems that entails?).
[ Updated to clarify my question ]
In Excel, when you define a table, it will tend to automatically replicate a formula down a column. This is very much like "fill down".
But ... what if you need exceptions to the rule?
In the tables I'm building to do some calculations the first row tends to be "special" in some way. So, I want the auto-fill down, but just not on the first row, or not on cells marked as custom. The Excel docs mention exceptions in computed columns but only in reference to finding them and eliminating them.
For example, the first row computes the initial value.
Then all the remaining rows compute some incremental change.
A trivial example - a table of 1 column and 4 rows:
A
1 Number
2 =42
3 =A2+1
4 =A3+1
The first formula must be different than the rest.
This creates a simple numbered list with A2=42, A3=43, A4=44.
But now, say I'd like to change it to be incremented by 2 instead of 1.
If I edit A3 to be "A2+2", Excel changes the table to be:
A
1 Number
2 =A1+2
3 =A2+2
4 =A3+2
Which of course is busted -- it should allow A2 to continue to be a special case.
Isn't this (exceptions - particularly in the first row of a table) an incredibly common requirement?
If you have the data formatted as a table, you can use structured references (e.g. [@ABC]) instead of A1-style references (e.g. A1, $C2, etc.). But there are two tricks to account for.
Firstly, there is no structured-reference syntax for the previous row; Excel falls back to A1-style references instead. However, you can use the OFFSET function to move from the current cell to the previous row, as shown below. In this case it will return a #VALUE! error in the first row, since you can't add 1 to the header text "ABC".
ABC
1 =OFFSET([@ABC],-1,0)+1
2 =OFFSET([@ABC],-1,0)+1
3 =OFFSET([@ABC],-1,0)+1
4 ....
So the second trick is to use an IF statement to initialise the value, by checking whether the previous row's value equals the heading value. If they are the same, use the initial value; otherwise add the increment. Note: this assumes the table is named Table1.
ABC
1 =IF(OFFSET([@ABC],-1,0)=Table1[[#Headers],[ABC]],42,OFFSET([@ABC],-1,0)+1)
2 =IF(OFFSET([@ABC],-1,0)=Table1[[#Headers],[ABC]],42,OFFSET([@ABC],-1,0)+1)
3 =IF(OFFSET([@ABC],-1,0)=Table1[[#Headers],[ABC]],42,OFFSET([@ABC],-1,0)+1)
4 ....
Note that you can define the initial value (in, say, $A$1) and the increment (in, say, $A$2) in cells outside the table, as below:
ABC
1 =IF(OFFSET([@ABC],-1,0)=Table1[[#Headers],[ABC]],$A$1,OFFSET([@ABC],-1,0)+$A$2)
2 =IF(OFFSET([@ABC],-1,0)=Table1[[#Headers],[ABC]],$A$1,OFFSET([@ABC],-1,0)+$A$2)
3 =IF(OFFSET([@ABC],-1,0)=Table1[[#Headers],[ABC]],$A$1,OFFSET([@ABC],-1,0)+$A$2)
4 ....
I use this IF/OFFSET combination all the time for iterating and looping in tables.
If you have a lot of columns that need to determine whether they are in the first row, you can have one column test for the first row, and the rest can work with a simpler IF. E.g. ABC gives TRUE for the first row and FALSE for the others; DEF then starts from the initial value and adds the increment.
ABC DEF
1 =OFFSET([@ABC],-1,0)=Table1[[#Headers],[ABC]] =IF([@ABC],$A$1,OFFSET([@DEF],-1,0)+$A$2)
2 =OFFSET([@ABC],-1,0)=Table1[[#Headers],[ABC]] =IF([@ABC],$A$1,OFFSET([@DEF],-1,0)+$A$2)
3 =OFFSET([@ABC],-1,0)=Table1[[#Headers],[ABC]] =IF([@ABC],$A$1,OFFSET([@DEF],-1,0)+$A$2)
4 ....
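The pattern behind the IF/OFFSET formulas above is "use the initial value in the first row, otherwise add the increment to the previous row". A minimal Python sketch of that rule, with hypothetical names, shows that one uniform rule can fill the whole column, so changing the increment no longer disturbs the first row.

```python
def fill_column(n_rows, initial, step):
    """Build a column where row 0 is the initial value and each later row adds step."""
    column = []
    for i in range(n_rows):
        if i == 0:
            # First data row: nothing above it but the header, so use the initial value.
            column.append(initial)
        else:
            # Every other row: previous row plus the increment.
            column.append(column[-1] + step)
    return column

print(fill_column(3, 42, 1))  # [42, 43, 44]
print(fill_column(3, 42, 2))  # [42, 44, 46]
```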
Hope that helps
I don't know if you are looking for something as simple as locking down a formula. You can do that by highlighting the part of the formula you do not want to change and then pressing F4. This makes that part of the formula an absolute reference, indicated by a $, so it will not change as you copy/paste it down the table.
Alternatively, you may be able to use Defined Names. You can set these up on the Formulas tab; they assign a name to something, which you can then use in your formulas. These can be as simple as an easy reference to a cell on another sheet or as complex as multi-sheet formulas.
Normally, to handle an "exceptional" formula in the first row of a table consisting of several columns, you simply enter it there manually and fill only the lines below. But if you have more "exceptional" cases scattered around, you will need another column with 0/1 values indicating where the exceptions are. You then use IF(condition, formula_if_true, formula_if_false) everywhere.
     A                  B
1    Number             Exceptional?
2    =IF(B2,42,A1+1)    1
3    =IF(B3,42,A2+1)    0
4    =IF(B4,42,A3+1)    0
As much as I love Excel, and as much as it is the best product in all of MS, it is still a weak tool. FYI, you can quickly learn a modern and powerful scripting language, such as Ruby, here, and never be bothered by spreadsheet idiosyncrasies again.
