How to identify unique combinations of pairs in Excel? - excel-formula

My goal is to get a two-column array of all the possible unique pairs of N items (so N*(N-1)/2 pairs).
I have done this in the past using extra calculation columns, but with the advent of LET() I was wondering if I can get a single function call.
This is my example data:

In the end I came up with this, which is a little unwieldy but does the job:
=LET(x,B2:B5,INDEX(x,LET(n,ROWS(x),s,(1+SEQUENCE((n*n*2),2))/2,r,INT((s-1)/n)+1,LET(a,IF(INT(s)=s,r,INT(s)-((r-1)*n)),FILTER(a,INDEX(a,,1)<INDEX(a,,2))))))

Related

How to sort rows in Excel without having repeated data together

I have a table of data with many data repeating.
I have to sort the rows by random, however, without having identical names next to each other, like shown here:
How can I do that in Excel?
Perfect case for a recursive LAMBDA.
In Name Manager, define RandomSort as
=LAMBDA(ζ,
LET(
ξ, SORTBY(ζ, RANDARRAY(ROWS(ζ))),
λ, TAKE(ξ, , 1),
κ, SUMPRODUCT(N(DROP(λ, -1) = DROP(λ, 1))),
IF(κ = 0, ξ, RandomSort(ζ))
)
)
then enter
=RandomSort(A2:B8)
within the worksheet somewhere. Replace A2:B8 - which should be your data excluding the headers - as required.
If no solution is possible then you will receive a #NUM! error. I didn't get round to adding a clause to determine whether a certain combination of names has a solution or not.
This is just an attempt because the question might need clarification or more sample data to understand the actual scenario. The main idea is to generate a random list from the input, then distribute it evenly by names. This ensures no repetition of consecutive names, but this is not the only possible way of sorting (this problem may have multiple valid combinations), but this is a valid one. The solution is volatile (every time Excel recalculates, a new output is generated) because RANDARRAY is volatile function.
In cell D2, you can use the following formula:
=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m),
idx, SORTBY(seq, RANDARRAY(m,,1,m, TRUE)), rRng, INDEX(rng, idx,{1,2}),
names, INDEX(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
(names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))
Here is the output:
Update
Looking at #JosWoolley approach. The generation of the random sorting can be simplified so that the resulting formula could be:
=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m), rRng,SORTBY(rng, RANDARRAY(m)),
names, TAKE(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
(names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))
Explanation
LET function is used for easy reading and composition. The name idx represents a random sequence of the input index positions. The name rRng, represents the input rng, but sorted by random. This sorting doesn't ensure consecutive names are distinct.
In order to ensure consecutive names are not repeated, we enumerate (nCnts) repeated names. We use a MAP for that. This is a similar idea provided by #cybernetic.nomad in the comment section, but adapted for an array version (we cannot use COUNTIF because it requires a range). Finally, we use SORTBY with input argument by_array, the map result (nCnts), to ensure names are evenly distributed so no consecutive names will be the same. Every time Excel recalculate you will get an output with the names distributed evenly in a different way.
Not sure if it's worth posting this, but I might as well share the results of my research such as it is. The problem is similar to that of re-arranging the characters in a string so that no same characters are adjacent The method is just to insert whichever one of the remaining characters (names) has the highest frequency at this point and is not the same as the previous character, then reduce its frequency once it has been used. It's fairly easy to implement this in Excel, even in Excel 2019. So if the initial frequencies are in D2:D8 for convenience using Countif:
=COUNTIF(A$2:A$8,A2)
You can use this formula in (say) F2 and pull it down:
=INDEX(A$2:A$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))
and similarly in G2 to get the ages:
=INDEX(B$2:B$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))
I'm fairly sure this will always produce a correct result if one is possible.
HOWEVER there is no randomness built in to this method. You can see if I extend it to more data that in the first several rows the most common name simply alternates with the other two names:
Having said that, this is a bit of a worst case scenario (a lot of duplication) and it may not look too bad with real data, so it may be worth considering this approach along with the other two methods.

Can I use MINIFS or INDEX/MATCH on two non-contiguous ranges...?

Problem is straightforward, but solution is escaping. Hopefully some master here can provide insight.
I have a big data grid with prices. Those prices are ordered by location (rows) and business name (cols). I need to match the location/row by looking at two criteria (location name and a second column). Once the matching row is found (there will always be a match), I need to get the minimum/lowest price from two ranges within the grid.
The last point is the real challenge. Unlike a normal INDEX or MINIFS scenario, the columns I need to MIN aren't contiguous... for example, I need to know what the MIN value is between I4:J1331 and Q4:U1331. It's not an intersection, it's a contiguous set of values across two different arrays.
You're probably saying "hey, why don't you just reorder your table to make them contiguous"... not an option. I have a lot of data, and this spreadsheet is used for a bunch of other stuff. So, I have to work with the format I have, and that means figuring out how to do a lookup/min across multiple non-contiguous ranges. My latest attempt:
=MINIFS(AND($I$4:$J$1331,$K$4:$P$1331),$B$4:$B$1331,$A2,$E$4:$E$1331,$B2)
Didn't work, but it should make it more clear what I'm trying to do. There has GOT to be an easy way to just tell excel "use these two ranges instead of one".
Thanks,
Rick
Figured it out. For anyone else who's interested, there doesn't seem to be any easy way to just "AND" arrays together for a search (Hello MS, backlog please). So, what I did instead was to just create multiple INDEX/MATCH arrays inside of a MIN function and take the result. Like this:
MIN((INDEX/MATCH ARRAY 1),(INDEX/MATCH ARRAY 2))
They both have identical criteria, the only difference is the set of arrays being indexed in each function. That basically gives me this:
MIN((match array),(match array))
And Min can then pull the lowest value from either.
Not as elegant as I'd like... lots of redundant code, but at least it works.
-rt

SumProduct Using Multiple Criteria Returning Too Much Data

Although this question has been asked and answered, (Stack Overflow is where I learned how to implement SP), an issue has come up which I can't figure out.
I'm using SP to sum shipments within a pivot table using a product number (with wild-cards), and a specific date. For instance, part numbers can be "AX10235-HP", "AX11135-HP", "AX10235-HP2", "AX10235-HPSPARE" or TP10101-IBM. (There are a large variety of numbers.)
So in this case I want to sum the qty shipments of "AX???35-HP". I wish to sum just the first 2 parts in my short list. However, the command used causes all the parts to sum except the *-IBM number; as if there was a wild-card at the end of the number. In other words "AX???35-HP" is the same as "AX???35-HP*". I've tried wrapping the value in quotes but it takes uses the quotes literally so fails.
This is the function
SUMPRODUCT((S_PART_DATA)*(ISNUMBER(SEARCH($A6,S_PART_RANGE))*(S_PART_DATES=T$4)))
S_PART_DATA array of Shipments,
S_PART_RANGE array of list of part numbers,
S_PART_DATES array of Dates shipments were made
It works the way you describe because SEARCH function finds $A6 within other text, hence it may not be an exact match - better to use SUMIFS function like this:
=SUMIFS(S_PART_DATA,S_PART_RANGE,$A6,S_PART_DATES,T$4)
Assuming all named ranges are the same size and A6 contains the value AX???35-HP
If that doesn't work try this version
=SUMPRODUCT(S_PART_DATA*ISNUMBER(SEARCH("^"&$A6&"^","^"&S_PART_RANGE&"^"))*(S_PART_DATES=T$4))
concatenating the ^ values means you will [probably] only get exact matches

How to compare 2 arrays with unequal numbers of elements in Excel

In an Excel array formula, I would like to test each element of one array against each element of a second array, when the 2 arrays do NOT have the same number of elements. Simplified right down, this scenario could be represented by:
=SUMPRODUCT({1,2,3,4,5}={1,2})
NB - in my real world scenario these arrays are calculated from various prior steps.
Using the above example, I would want a result of {TRUE,TRUE,FALSE,FALSE,FALSE}. What I get is {TRUE,TRUE,#N/A,#N/A,#N/A}.
It's clear that, when there's more than 1 value being tested for, Excel wants equal numbers of elements in the 2 arrays; when there isn't, the #N/A error fills in the blanks.
I've considered writing a UDF to achieve what I want, and I'm pretty sure my coding skills are up to creating something like:
=ArrayCompare({1,2,3,4,5},"=",{1,2})
But I'd much rather do this using native functionality if it's not too cumbersome...
So, simple question; can an array formula be constructed to do what I'm after?
Thanks peeps!
Using MATCH function is probably the best way.....but if you actually want to compare every element in one array with another array in a direct comparison then one should be a "column" and one a "row", e.g.
=SUMPRODUCT(({1,2,3,4,5}={1;4})+0)
Note the semi-colon separator in the second array
If you can't actually change the column/row designation then TRANSPOSE can be used, i.e.
=SUMPRODUCT(({1,2,3,4,5}=TRANSPOSE({1,4}))+0)
You may not get the required results if the arrays contain duplicates because then you will get some double-counting, e.g. with this formula
=SUMPRODUCT(({1,1,1,1,1}={1;1})+0)
the result is 10 because there are 5x2 comparisons and they are all TRUE
Maybe:
{=IF(ISERROR(MATCH({1,2,3,4,5},{1,2},0)),FALSE,TRUE)}
If the second array is a subset of the first array, same order, and starting at position 1 then you can use this array formula for equivalence testing:
=IFERROR(IF({1,2,3,4,5}={1,2},TRUE),FALSE)
For non equivalence just swap the FALSE and TRUE
=IFERROR(IF({1,2,3,4,5}={1,2},FALSE),TRUE)
You can then use this in other formulas just as an array:
However if the arrays are not in order, as in this example:
{1,2,3,4,5},{1,4,5}
Then you have to use MATCH. However all you need is to surround the match with an ISNUMBER like so:
Equivalence test:
=ISNUMBER(MATCH({1,2,3,4,5},{1,4,5},0))
Non Equivalence test:
=NOT(ISNUMBER(MATCH({1,2,3,4,5},{1,4,5},0)))
Remember all array formulas are entered with ctrl + shift + enter

nested excel functions with conditional logic

Just getting started in Excel and I was working with a database extract where I need to count values only if items in another column are unique.
So- below is my starting point:
=SUMPRODUCT(COUNTIF(C3:C94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"}))
what i'd like to figure out is the syntax to do something like this-
=sumproduct (Countif range1 criteria..., where range2 criteria="is unique value")
Am I getting this right? The syntax is a bit confusing, and I'm not sure I've chosen the right functions for the task.
I just had to solve this same problem a week ago.
This method works even when you can't always sort on the grouping column (J in your case). If you can keep the data sorted, #MikeD 's solution will scale better.
Firstly, do you know the FREQUENCY trick for counting unique numbers? FREQUENCY is designed to create histograms. It takes two arrays, 'data' and 'bins'. It sorts 'bins', then creates an output array that's one longer than 'bins'. Then it takes each value in 'data' and determines which bin it belongs in, incrementing the output array accordingly. It returns the array. Here's the important part: If a value appears in 'bins' more than once, any 'data' value meant for that bin goes in the first occurrence. The trick is to use the same array for both 'data' and 'bins'. Think it through, and you'll see that there's one non-zero value in the output for each unique number in the input. Note that it only counts numbers.
In short, I use this:
=SUM(SIGN(FREQUENCY(<array>,<array>)))
to count unique numeric values in <array>
From this, we just need to construct arrays containing numbers where appropriate and text elsewhere.
In the example below, I'm counting unique days when the color is red and the fruit is citrus:
This is my conditional array, returning 1 or true for the rows I'm interested in:
($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0))
Note that this requires ctrl-shift-enter to be used as an array formula.
Since the value I'm grouping by for uniqueness is text (as is yours), I need to convert it to numeric. I use:
MATCH($C$2:$C$10,$C$2:$C$10,0)
Note that this also requires ctrl-shift-enter
So, this is the array of numeric values within which I'm looking for uniqueness:
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),"")
Now I plug that into my uniqueness counter:
=SUM(SIGN(FREQUENCY(<array>,<array>)))
to get:
=SUM(SIGN(FREQUENCY(
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),""),
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),"")
)))
Again, this must be entered as an array formula using ctrl-shift-enter. Replacing SUM with SUMPRODUCT will not cut it.
In your example, you'd use something like:
=SUM(SIGN(FREQUENCY(
IF(ISNUMBER(MATCH($C$3:$C$94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"},0)),MATCH($J$3:$J$94735,$J$3:$J$94735,0),""),
IF(ISNUMBER(MATCH($C$3:$C$94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"},0)),MATCH($J$3:$J$94735,$J$3:$J$94735,0),"")
)))
I'll note, though, that scaling might be a problem on data sets as large as yours. I tested it on larger data sets, and it was fairly fast on the order of 10k rows, but really slow on the order of 100k rows, such as yours. The internal arrays are plenty fast, but the FREQUENCY function slows down. I'm not sure, but I'd guess it's between O(n log n) and O(n^2) depending on how the sort is implemented.
Maybe this doesn't matter - none of this is volatile, so it'll just need to calculate once upon refreshing the data. If the column data is changing, though, this could be painful.
Asuming the source data is sorted by the key value [A], start with determining the occurence of the key column
B2: =IF(A2=A1;B1+1;1)
Next determine a group sum
C2: =SUMIF($A$2:$A$9;A2;$B$2:$B$9)
A key is unique if its group sum is exactly 1
D2: =(C2=1)
To count records which match a certain criterium AND are unique, include column D in a =IF(AND(D2, [yourcondition];1;0) and sum this column
Another option is to asume a key unique within a sorted list if it is unequal to both its predecessor and successor, so you could find the unique records like
E2: =AND(A2<>A1;A2<>A3)
G2: =IF(AND(E2;F2="this");1;0)
E and G can of course be combined into one single formula (not sure though if that helps ...)
G2(2): =IF(AND(AND(A2<>A1;A2<>A3);F2="this");1;0)
resolving unnecessarily nested AND's:
G2(3): =IF(AND(A2<>A1;A2<>A3;F2="this");1;0)
all formulas in row 2 should be copied down to the end of the list

Resources