Excel mutual pairing

How can I pair members of a list such that if the pair of member 'a' is member 'b' then the pair of 'b' is 'a', in MS Excel?
Ideally, the pairing randomizes at each calculation.
Attempt:
1) I have a list of names in a column, (column B).
2) I put the natural numbers up to the list length in column A:
=ROW(B2)-1
3) Random numbers in column C:
=RAND()
4) I rank-ordered the random numbers in column D:
=RANK(C2, $C$2:$C$178)
Thus I have two orderings: the order of appearance in column A and a random order in column D.
However, for example, row by row the "pair of 11" does not have 11 as its pair, in turn. How can I achieve a mutual pairing? If 11 is paired with 27 I need 27 to be paired with 11 also.
(I can then VLOOKUP to pull the names.)
=VLOOKUP(D2,$A$2:$B$178,2,0)
EDIT:
Below is my output. You can see that 'c' is the pair of 'a' but 'a' is not the pair of 'c'. (The F column checks whether someone is paired with themselves.)
So basically, in the output column E I am taking the name corresponding to the random rank ordering. I would like to achieve mutual pairs in some way.

Here is another approach. In your column E you have a permutation of the names. Keep this as an intermediate column. In this case it is c,b,d,a. Simply pair adjacent entries of the permutation: c with b and d with a. INDEX and MATCH can be used to extract the pairings like this:
The crucial formula in F2 is
=IF(MOD(MATCH(B2,$E$2:$E$5,0),2) = 0,INDEX($E$2:$E$5,MATCH(B2,$E$2:$E$5,0)-1),INDEX($E$2:$E$5,MATCH(B2,$E$2:$E$5,0)+1))
The formulas in columns D, E are exactly as you gave them.
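To see why pairing neighbours in a permutation is always mutual, here is a minimal Python sketch of the same idea outside Excel. The function name pair_adjacent and its rng parameter are made up for illustration, and it assumes the names are unique. It also shows the odd-length case, where the last name has no neighbour (in the sheet, the "+1" lookup for that name would point past the end of the range).

```python
import random

def pair_adjacent(names, rng=random):
    """Shuffle the names, then pair positions 1-2, 3-4, ... of the shuffle."""
    shuffled = list(names)
    rng.shuffle(shuffled)          # plays the role of the RAND/RANK permutation in column E
    partner = {}
    # Walk the permutation two entries at a time; each pair is mutual by construction.
    for i in range(0, len(shuffled) - 1, 2):
        a, b = shuffled[i], shuffled[i + 1]
        partner[a] = b
        partner[b] = a
    # With an odd number of names, the last one is left without a partner.
    if len(shuffled) % 2 == 1:
        partner[shuffled[-1]] = None
    return partner

print(pair_adjacent(["a", "b", "c", "d"]))
```

Each call reshuffles, which corresponds to the sheet recalculating.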

In order to randomly pair up the numbers like this, you need to take into account what numbers have already been picked. Then, you need to randomly select from the remaining numbers.
So, to do this, I am going to scrap your "Random Number" column, and calculate "Rank" directly. To start with, we check if this item already has a pair. All we will do is use MATCH to find if this entry already exists earlier in Column D, and (if so) use INDEX to find what number it has been paired up with:
=INDEX(A:A, MATCH(A2, D$1:D1, 0))
All code, except the last example, will be written for Cell D2
If this entry has already been paired up, this returns the number it was paired with. Otherwise, MATCH fails and the formula returns an error. In that case, we need to pick a partner at random. This condition is just an IFERROR:
=IFERROR(INDEX(A:A, MATCH(A2, D$1:D1, 0)), <FIND_RANDOM_RANK>)
There are several ways to get a Rank at random, but I am going to use AGGREGATE and RANDBETWEEN. We will create a list of the un-picked numbers in AGGREGATE (e.g. {2, 3, 7, 8, 12}), then use RANDBETWEEN to select a position at random (e.g. RANDBETWEEN(1, 5), because there are 5 items to choose from). Our list will be made of Row Numbers, so we will need to convert them to Sequence Numbers with INDEX.
=IFERROR(INDEX(A:A, MATCH(A2, D$1:D1, 0)), INDEX(A:A, <FIND_RANDOM_ROW>))
Since our Row Numbers are numbers, we can put them in order and pick the kth SMALLest:
AGGREGATE(15, 6, <NUMBER_LIST>, RANDBETWEEN(1, <NUMBER_OF_ITEMS>))
The Number of Items will be the total Number, minus how many items have already been paired. The number of items already paired will be the number of items above us in the list, PLUS however many of those are paired with items further down.
AGGREGATE(15, 6, <NUMBER_LIST>, RANDBETWEEN(1, COUNTA(A:A)-(Row()+COUNTIF(D$1:D1,">"&A2))))
In the Number List, we can use #DIV/0! errors to mark items as Excluded. We exclude them if they have already been paired up: either above us in the list, or paired to an item above us in the list.
ROW(A:A) / ((Row(A:A)>Row()) * (COUNTIF(D$1:D1,A:A)=0))
Now, before we stick everything together, we need to limit our column sizes in this part (which will work as an Array Formula). If we try to calculate this for all 1048576 rows in the Worksheet, then not only would it take ages, but you would be matching against a lot of empty rows.
To do this, we can use INDEX and COUNTA to work out how many rows of data we have. This means, for example, changing A:A to A$1:INDEX(A:A,COUNTA(A:A)). If you have 10 rows of data, then this evaluates as A$1:INDEX(A:A,COUNTA(A:A)) → A$1:INDEX(A:A,10) → A$1:A10, which is roughly 100,000 times less data to process!
ROW(A$1:INDEX(A:A,COUNTA(A:A))) / ((Row(A$1:INDEX(A:A,COUNTA(A:A)))>Row()) * (COUNTIF(D$1:D1,A:A)=0))
Now, we can put that all together, to get our final equation:
=IFERROR(INDEX(A:A, MATCH(A2, D$1:D1, 0)), INDEX(A:A,AGGREGATE(15, 6, ROW(A$1:INDEX(A:A,COUNTA(A:A))) / ((ROW(A$1:INDEX(A:A,COUNTA(A:A)))>ROW()) * (COUNTIF(D$1:D1,A:A)=0)), RANDBETWEEN(1, COUNTA(A:A)-(ROW()+COUNTIF(D$1:D1,">"&A2))))))
This will give us our Matched Rank. All we need to do then is a quick VLOOKUP on Columns A and B to work out what the name is:
=VLOOKUP(D2,A:B,2,FALSE)
This code is for Cell E2
(All-in-all, this code will be a lot more efficient if your list is of a fixed size, and you can then replace the COUNTA(A:A) and A$1:INDEX(A:A,COUNTA(A:A)) bits with the correct numbers and cell references directly, such as 5 and A$1:A$5)
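For anyone who wants to check the logic away from the worksheet, the top-to-bottom procedure described above can be sketched in Python. This is only a model of the idea, not a translation of the formula: sequential_pairing, n and rng are hypothetical names, and items are numbered 1 to n like the sequence numbers in column A.

```python
import random

def sequential_pairing(n, rng=random):
    """Return partner[i] for items 1..n, choosing partners from top to bottom."""
    partner = {}
    for i in range(1, n + 1):
        if i in partner:
            continue  # an earlier item already picked this one (the MATCH succeeds)
        # Candidates: items further down the list that nobody has picked yet.
        candidates = [j for j in range(i + 1, n + 1) if j not in partner]
        if not candidates:
            partner[i] = None  # odd one out: nothing is left to pair with
            continue
        j = rng.choice(candidates)  # the RANDBETWEEN / AGGREGATE step
        partner[i] = j
        partner[j] = i
    return partner

print(sequential_pairing(6))
```

With an even n every item ends up with a partner; with an odd n exactly one item is left over.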

Related

Appending two lists in excel

I have been trying and searching how to append two lists in excel to use in a formula. The lists do not exist in columns, they are created using a formula. I want to combine the two lists in a single one, not to show the values but to use the new list in a formula. I am using excel 365 (UNIQUE function). Let me replace my initial text by a real small case.
I have an excel file with 3 work sheets. Sheet1 is:
Sheet2 is:
Now I want to run some analysis in Sheet3. In my example I want to count how many unique values from column A have column B containing one of the letters 'a', 'b', 'c', or 'd'. For instance, in Sheet1, the letter 'a' appears in all rows. Column A has 3 unique values. So my result for 'a' is 3. The letter 'b' does not appear for the case where column A is '3'. Therefore the result for 'b' is '2'.
So I create a Sheet3 to show my results. The first column contains a list of letters {a, b, c, d}. I then use the formula:
=COUNT(UNIQUE(FILTER(Sheet1!$A$1:$A$100, ISNUMBER(SEARCH(A1, Sheet1!$B$1:$B$100)))))
From the inside out: the SEARCH function looks in cells B1 to B100 of Sheet1 (I can live with specifying a larger range) for the value specified in column A of the current sheet. If the value is found, SEARCH returns its position as a number. I check whether the return value is a number (ISNUMBER) and use this to filter the values in column A of Sheet1. I then apply the UNIQUE function to these values and finally count them.
Then I do the same with values in Sheet2. And it works. This is the output:
Column B is the number of unique values (as specified above) from Sheet1 and Column C the same from Sheet2.
So far so good. But now I want to have the counting of unique values globally. Not for each Sheet. One cannot just add the values from column B and C, as there might be an overlap. For example, the result for 'a' should be 3, not 5.
The solution here would be to grab the two unique lists (one from Sheet1 and the other from Sheet2), join them, UNIQUE this new list, and count. How do I join them? That is my question.
Note that this 'counting of unique values' is just an example. I might want to find the maximum, or sort them, or find only prime numbers, or the average, or the median, or something else. So I need a general approach to join the results.
I got options close to a workable thing when all the data is in the same worksheet.
Finally, note that the data size I have is not huge, but it is large (thousands of lines at the most).
Here is something you could try:
=LET(x,{"A","B","C"},y,{"D","E"},z,CHOOSE({1,2},x,y),cnt,MAX(COUNTA(x),COUNTA(y)),seq,SEQUENCE(cnt*2),final,INDEX(z,MOD(seq-1,cnt)+1,CEILING(seq/cnt,1)),FILTER(final,NOT(ISERROR(final))))
Here both 'x' and 'y' variables are placeholders for your two (vertical) arrays. In this case I used: {"A","B","C"} and {"D","E"}. Assuming you just want to place the 2nd array directly under the 1st one, the above suggestion does just that:
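If the index arithmetic in the LET formula is hard to follow, this rough Python model may help. It assumes that CHOOSE({1,2},x,y) yields a two-column grid in which the shorter column is padded with errors; None stands in for those error cells here. The names stack and pad are invented for the sketch.

```python
import math

def stack(x, y):
    """Place list y under list x using the same index arithmetic as the LET formula."""
    cnt = max(len(x), len(y))              # cnt: length of the longer list

    def pad(col):
        # None plays the role of the error cells in the shorter column.
        return list(col) + [None] * (cnt - len(col))

    z = [pad(x), pad(y)]                   # z: two columns side by side
    final = []
    for seq in range(1, 2 * cnt + 1):      # SEQUENCE(cnt*2)
        row = (seq - 1) % cnt + 1          # MOD(seq-1,cnt)+1
        col = math.ceil(seq / cnt)         # CEILING(seq/cnt,1)
        final.append(z[col - 1][row - 1])  # INDEX(z,row,col)
    # FILTER(final,NOT(ISERROR(final))): drop the padding
    return [v for v in final if v is not None]

print(stack(["A", "B", "C"], ["D", "E"]))  # ['A', 'B', 'C', 'D', 'E']
```

The sequence walks down the first column, then down the second, and the filter removes the padded cells.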

Most frequent observation per row with two ranked tie-breakers

I have a dataset in Excel where I would like a formula to find the most frequent observation (from column B to column F) for each row. However, if there are any ties, there are two tie-breakers, ranked in the following order: The first tie-breaker is that if the number 4 is tied as the most frequent observation in any row, the result in that row should be 4. The second tie-breaker is that if there is a tie (where 4 is not tied for the most frequent observation), it should show the value in Column G.
In the picture below I have made a rough sketch of (to the left) the data I have now and (to the right) the outcome I want.
Picture of dataset:
What formula would I need to write in order to get the result I would like?
Thanks in advance,
Anders
See if this works for you:
=IF(ISNA(MODE.MULT(MyData)),IF(ISNA(MATCH(4,MyData,0)),Fruit,4),IF(ISERR(INDEX(MODE.MULT(MyData),2)),MODE.MULT(MyData),IF(ISNA(MATCH(4,MODE.MULT(MyData),0)),Fruit,4)))
entered as an array formula CTRL-SHIFT-ENTER.
Here MyData is a placeholder for a row of data. In your example, MyData will be a single row from columns B-F; for case A, MyData={1,1,1,1,2}. Fruit is a placeholder for the corresponding value from column G. You can replace MyData with B2:F2 and Fruit with G2, then copy and paste to other locations.
Here's how it works. The formula uses Excel's MODE.MULT function, which returns as many mode values as there are in the data.
MODE.MULT returns #N/A when there are no repeated elements in MyData. This is the situation for your cases D and E. This means there is an N-way tie, so we need to apply the tie-breaking rules. This is done by using the MATCH function to see if 4 is found in MyData; if it is, return 4, otherwise return Fruit.
If MyData has repeated elements, MODE.MULT returns an array containing the mode or modes found. If there is no tie, MODE.MULT returns a single element; otherwise the array will have at least two elements. To test for ties, we attempt to access the 2nd element of the array with INDEX(MODE.MULT(MyData),2). This will throw an error if there is no tie.
If there is no tie, we detect the resulting error with ISERR and return the result of MODE.MULT.
If there is a tie, no error occurs. In that case, we use MATCH to look for 4 in the results of MODE.MULT. If 4 is found, we return 4; if not, we return Fruit.
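As a cross-check of the rules described above, here is a small Python sketch. It is a model of the tie-breaking logic, not of MODE.MULT itself: most_frequent, fallback and preferred are names made up for the example, with fallback standing in for the column G value. Counting every value directly means the "no repeats" case needs no special handling, because all values are then tied.

```python
from collections import Counter

def most_frequent(row, fallback, preferred=4):
    """Most frequent value in row; ties go to `preferred` if it is tied, else `fallback`."""
    counts = Counter(row)
    top = max(counts.values())
    # Every value that reaches the highest count is a mode.
    modes = [value for value, count in counts.items() if count == top]
    if len(modes) == 1:
        return modes[0]  # no tie
    # Tie-breaker 1: the preferred value wins if it is among the tied modes.
    # Tie-breaker 2: otherwise use the fallback value.
    return preferred if preferred in modes else fallback

print(most_frequent([1, 1, 1, 1, 2], fallback="G"))  # 1
print(most_frequent([1, 1, 4, 4, 2], fallback="G"))  # 4
print(most_frequent([1, 1, 2, 2, 3], fallback="G"))  # G
```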
Hope that helps.
@xidgel: This is a great answer, but you'll also have to account for the case where all observations are the same.
=IF(ISNA(MODE.MULT(MyData)),
IF(ISNA(MATCH(4,MyData,0)),Fruit,4),
IF(ROWS(MODE.MULT(MyData))<2,
IF(AND(COUNTIF(MyData,"<>"&MODE.MULT(MyData))=0,MODE.MULT(MyData)<>4),Fruit,MODE.MULT(MyData)),
IF(ISNA(MATCH(4,MODE.MULT(MyData),0)),Fruit,4)))
entered as an array formula CTRL-SHIFT-ENTER.

Returning all possible values instead of a VLOOKUP

So I've looked up tutorials on how to do this, and I'm still struggling, so I could use some expert help. I know it involves a very complex nested formula with things like SMALL, ROW, INDEX, etc...
So here are two screenshots that provide a sample of what I'm looking for. In reality there are over 1000 rows, but this makes it easier for you guys.
So here is my first example, let's call this Sheet1:
Code, ID_1 and ID_2. So as you can see (and just focus on the input in A2) there will be two separate IDs in the linked workbook. That sheet, or at least a tiny sample of it, looks like this:
In the first column we see the code we're looking for (which is what we have in A2 of the first one), each of them with different IDs. So as I'm sure you can tell by now, I'm looking for a formula that will allow me to return those values in ID_1 and ID_2 in the first sheet.
I have been going at this for an hour and I'm stumped, so I would greatly appreciate any help provided!
This is a more general formula for the case where the IDs are NOT listed consecutively. It handles IDs that occur anywhere in the second dataset, and any number of them per code.
=IFERROR(INDEX($V$2:$V$15, SMALL(IF($U$2:$U$15=$M2, ROW($U$2:$U$15), FALSE), COLUMNS($N2:N2))-ROW($V$1), 1), "")
This formula must be entered with Ctrl-Shift-Enter before copying across and down! Note all absolute and relative referencing/locking ($ signs)
The logical steps in constructing such a formula:
1) We use IF function to test if the values in the column U match the value in column M.
2) In the 'value-if-true' parameter, we will get the corresponding row number of values in column U. These numbers will be fed later in the SMALL function.
3) In the value-if-false part, we just return false, as that will later be used as a non-number in the SMALL function
Above 3 steps in the part: IF($U$2:$U$15=$M2, ROW($U$2:$U$15), FALSE)
4) We now have an array of mixed row numbers and FALSE values, which we want to feed to the INDEX function to simply get the corresponding value in column V (our second dataset). BUT as we wish to retrieve the different row matches for each code, we have to fish them out of the mixed array with the SMALL function.
5) using our columns as an incrementer, we apply the SMALL function to the array with a varying k parameter. We USE the COLUMNS function (note carefully the different $ sign usage), so that as we drag the formula across, the column count increments: COLUMNS($N2:N2) - giving K values of 1, 2, 3, 4 as we drag the formula across from column N to column Q. Note that it is useful that the SMALL function disregards FALSE values when looking through the array for the values by size.
6) There is an adjustment to account for the fact that the rows are relative to the 'Ids' range which we will feed into the INDEX function to retrieve the different ids. SMALL(IF($U$2:$U$15=$M2, ROW($U$2:$U$15), FALSE), COLUMNS($N2:N2))-ROW($V$1).
This can be avoided if we use the entire column V as the look-up array parameter in the INDEX function, but that's another way...
7) This resulting value can now be passed to the INDEX function to obtain the various ids. The column_num parameter of 1 which I put in the function isn't necessary in a single-column look-up array, but is there for completeness.
8) The entire construction is then wrapped in an IFERROR function to give an empty string if there is no match, but some people may wish to have error outputs there...
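The eight steps above boil down to "find the positions of all matching rows, then take the k-th one". A minimal Python sketch of that idea follows; kth_match and the sample data are invented for illustration, with codes standing in for column U and ids for column V.

```python
def kth_match(codes, ids, wanted, k):
    """Return the id on the k-th row (1-based) whose code equals wanted, or ""."""
    # IF(codes=wanted, ROW(codes), FALSE): keep only the positions of matching rows.
    positions = [i for i, code in enumerate(codes) if code == wanted]
    # SMALL(..., k) takes the k-th smallest position; IFERROR(..., "") covers running out.
    if k > len(positions):
        return ""
    return ids[positions[k - 1]]

codes = ["X1", "Y2", "X1", "Z3", "X1"]   # hypothetical lookup column
ids = [101, 102, 103, 104, 105]          # hypothetical result column

# Dragging the formula across four columns corresponds to k = 1, 2, 3, 4.
print([kth_match(codes, ids, "X1", k) for k in range(1, 5)])  # [101, 103, 105, '']
```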
If the two IDs will always be consecutive in the second list, try this:
=INDEX('[workbookname]SheetName'!columnrangeofIDs,MATCH(A2,'[workbookname]SheetName'!columnrangeofcodes,0))
Assuming your other workbook is called Serials, all the info is on Sheet1, and (as in your sample) the codes are in column A with the IDs in column B, you would enter the following in B2:
=INDEX('[Serials]Sheet1'!$B$2:$B$1000,MATCH(A2,'[Serials]Sheet1'!$A$2:$A$1000,0))
In C2 enter the following (assuming the IDs show up consecutively):
=INDEX('[Serials]Sheet1'!$B$2:$B$1000,MATCH(A2,'[Serials]Sheet1'!$A$2:$A$1000,0)+1)
As far as I know, this only works while the other workbook is open, and only if the two IDs are listed consecutively in the list.

Sort Order formula to alphabetise in Excel

I am currently drawing up a spreadsheet that will automatically remove duplicates and alphabetize a list:
I am using the COUNTIF() function in column G to create a sort order and then VLOOKUP() to find the sort in column J.
The problem I am having is that I can't seem to get my SortOrder column to function properly. At the moment it creates an index for two number 1's meaning the cell highlighted in yellow is missed out and the last entry in the sorted list is null:
If anyone can find and rectify this mistake for me I'll be very grateful as it has been driving me insane all day! Many thanks.
I'll provide my usual method for doing an automatic pulling-in of raw data into a sorted, duplicate-removed list:
Assume raw data is in column A. In column B, use this formula to increase the counter each time the row shows a non-duplicate item in column A. Hard-code B2 to be "1", and use this formula in B3 and drag down.
=if(iserror(match(A3,$A$2:A2,0)),B2+1,B2)
This takes advantage of the fact that when we refer to this row counter in our revised list, we will use the MATCH function, which only finds the first matching number. Then say you want your new list of data in column D (usually I do this for display purposes, so either 'group-out' [hide] the columns that hold the helper formulas, or do this on another tab). You can avoid this step, but if you are already using helper columns I usually do each step in a different column, as it is easier to document. In column C, starting in C3 [C2 hard-coded to 1] and dragged down, just have a simple counter, which stops at the end of your list:
=if(C2<max(B:B),C2+1," ")
Then in column D, starting at D2 and dragged down:
=iferror(index(A:A,match(C2,B:B,0)),"")
The index function is like half of the vlookup function - it pulls the result out of a given array, when you provide it with a row number. The match function is like the other half of the vlookup function - it provides you with the row number where an item appears in a given array.
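Here is a short Python sketch of how the helper columns fit together, in case it helps to see the whole chain at once. It models only the duplicate-removal steps given above; unique_in_order is a made-up name, and the comments map each part to the column it imitates.

```python
def unique_in_order(raw):
    """Drop duplicates from raw, keeping the first occurrence of each item."""
    counter = []  # column B: increases by 1 on each first occurrence
    for i, value in enumerate(raw):
        is_new = value not in raw[:i]          # ISERROR(MATCH(A3,$A$2:A2,0))
        previous = counter[-1] if counter else 0
        counter.append(previous + 1 if is_new else previous)
    last = counter[-1] if counter else 0       # MAX(B:B)
    # Columns C and D: for c = 1..last, take the first row where the counter equals c.
    return [raw[counter.index(c)] for c in range(1, last + 1)]

print(unique_in_order(["b", "a", "b", "c", "a"]))  # ['b', 'a', 'c']
```

Because list.index, like MATCH with match type 0, returns the first hit, repeated counter values on duplicate rows are skipped automatically.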
Hope this helps you in the future as well.
The actual reason that this is going wrong as implied by Jeeped's comment is that you can't meaningfully compare a string to a number unless you do a conversion because they are stored differently. So COUNTIF counts numbers and text separately.
20212 will give a count of 1 because it is the only (or lowest) number.
CS10Z002 will give a count of 1 because it is the first text string in alphabetical order.
Another approach is to add the count of numbers to the count if the current cell contains text:-
=COUNTIF(INDIRECT("$D$2:$D$"&$F$3),"<="&D2)+ISTEXT(D2)*COUNT(INDIRECT("$D$2:$D$"&$F$3))
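The "numbers before text" idea in the formula above can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: sort_order is a hypothetical name, the sample values are invented, and Python's string comparison is case-sensitive, which may not match how Excel compares text.

```python
def sort_order(values):
    """Rank each value so that all numbers come before all text."""
    numbers = [v for v in values if isinstance(v, (int, float))]
    texts = [v for v in values if isinstance(v, str)]
    ranks = []
    for v in values:
        if isinstance(v, str):
            # Text is only compared with text, so add the count of numbers on top,
            # like ISTEXT(D2)*COUNT(range) in the formula.
            ranks.append(sum(t <= v for t in texts) + len(numbers))
        else:
            # Numbers are only compared with numbers.
            ranks.append(sum(n <= v for n in numbers))
    return ranks

print(sort_order([20212, "CS10Z002", "AB1", 100]))  # [2, 4, 3, 1]
```

Without the added count of numbers, both 100 and "AB1" would get rank 1, which is the duplicate index described in the question.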
It's easier to show the result of three different conversions with some test data:-
(0) No conversion - just use COUNTIF
=COUNTIF(D$2:D$7,"<="&D2)
"999"<"abc"<"def", 999<1000
(1) Count everything as text
=SUMPRODUCT(--(D$2:D$7&""<=D2&""))
"1000"<"999"
(2) Count numbers before text
=COUNTIF(D$2:D$7,"<="&D2)+ISTEXT(D2)*COUNT(D$2:D$7)
999<1000<"999"
(3) Count everything as text but convert numbers with leading zeroes
=SUMPRODUCT(--(TEXT(D$2:D$7,"000000")<=TEXT(D2,"000000")))
"000999" = "000999", "000999"<"001000"

Merge two lists in Excel in ascending or descending order

I have two independent lists in Excel. I need to combine them into one list and have that list sorted either ascending or descending order.
For example how to merge these two lists (Named ranges List1 and List2 in Excel)
List1 List2
AA BB
DD EE
FF GG
KK
into one list
AA
BB
DD
EE
FF
GG
KK
I managed to merge the lists but without ascending or descending sorting with the following formula:
=IFERROR(INDEX(List1, ROWS(AH4:$AH$4)), IFERROR(INDEX(List2, ROWS(AH4:$AH$4)-ROWS(List1)), ""))
and the result looking like this
Can the sorted merged list be achieved using formulas only (no VBA)?
Please try:
=IF(ROW()<=COUNTA(A:A), INDEX(A:A,ROW()), IF(ROW()>COUNTA(A:B), "", INDEX(B:B,ROW()-COUNTA(A:A))))
Straight steal from Jerry Beaucaire.
If you make a new range of all cells from both lists, and call it BothLists, then you could use this-
=INDEX(BothLists,IF(ISODD(ROW()-ROW(List1))=TRUE,((ROW()-ROW(List1)+1)/2)+1,(ROW()-ROW(List1)+2)/2),IF(ISODD(ROW()-ROW(List1))=TRUE,1,2))
This assumes that, like your sample here, all the values in your two columns "tile" ALWAYS from lowest to highest when moving through the lists. (e.g., lowest value in A2, second lowest in B2, third lowest in A3, fourth lowest in B3, etc.)
In other words, it means that these two things are true-
(1) Within the same row, every value in the first Column is less than the value in the second Column.
(2) Every value in the first Column is greater than the value in the second Column of the Row directly above it.
If that's not true, please let me know.
Reference - personal experience from a similar job I had to figure out where I work.
You can do a merge sort with formulas only, but any implementation I can think of is going to require extra scratch columns. The simplest implementation I've got uses 5 extra columns - you can easily cut that down to 3, or even 2, but IMO you're better off keeping the formulas simple and just hiding the extra columns.
In this example we're merging the sorted ranges A1:A14 and C4:C14. In actuality it's the entirety of columns A and C - the formula I'm using determines the end of the range by looking for blanks, so that it's easy to add data later, instead of using an explicit range size. But you could change that if you wanted.
The output is in column E. The formula there is simple: =IF(ISBLANK(G1),,INDIRECT(G1)) . The "if" is just to avoid errors when we run out of data, it's not strictly needed.
All the real work is done in column G:
=IF(ISBLANK(H1),IF(ISBLANK(I1),,"C"&K1),IF(ISBLANK(I1),"A"&J1,IF(H1<=I1,"A"&J1,"C"&K1)))
Most of that is bounds checking for if we run out of data in one or both of the lists. The comparison comes at the end of the expression: Construct the cell index of the item in the first list if it's <=, otherwise use the 2nd list.
Columns H and I reflect the list values based on the list indices in columns J and K. So they have simple formulas:
=INDIRECT("A"&J1)
=INDIRECT("C"&K1)
Columns J and K are also pretty simple. They get incremented based on the result of what happened in column G. The first row is static, and gets set based on where your data is. (In the simplest case, it's just 1 1.) After that, the formulas are:
=J1+IF(LEFT($G1)="A",1,0)
=K1+IF(LEFT($G1)="C",1,0)
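The scratch columns above implement the merge step of a merge sort: two indices (columns J and K) walk down the two sorted lists, and at each row the smaller current value is emitted. A minimal Python sketch of that step, with merge_sorted as an illustrative name and both inputs assumed to be sorted already:

```python
def merge_sorted(a, c):
    """Merge two already-sorted lists into one sorted list."""
    j = k = 0   # columns J and K: current position in each list
    out = []    # column E: the merged output
    while j < len(a) or k < len(c):
        # Take from the first list if the second has run out,
        # or if the first list's current value is <= the second's (column G).
        if k >= len(c) or (j < len(a) and a[j] <= c[k]):
            out.append(a[j])
            j += 1
        else:
            out.append(c[k])
            k += 1
    return out

print(merge_sorted(["AA", "DD", "FF", "KK"], ["BB", "EE", "GG"]))
# ['AA', 'BB', 'DD', 'EE', 'FF', 'GG', 'KK']
```

For a descending result, both inputs would need to be sorted descending and the comparison flipped to >=.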
