I have two large columns.
Column A contains 100,000 numbers/rows. Column B contains 100,210 numbers/rows. They have the same numbers except that column B has 210 extra rows. I need to be able to get the values of those extra 210 rows.
The issue I'm having is that the numbers in these rows are not unique.
For example,
Column A contains the following numbers: 2,1,3,4,5,5,6,7
Column B contains the following numbers: 1,2,3,4,5,5,5,5,6,6,7,8
I want the outcome result to be: 5,5,6,8
I can't seem to wrap my head around a way to do this.
I have the two columns in a text file that I'm importing into Excel. If there are better ways to do it outside of Excel, I am open to that too.
With the Dynamic Array formula Filter:
=FILTER(B1:B12,COUNTIF(OFFSET(B1,0,,SEQUENCE(ROWS(B1:B12))),B1:B12)>COUNTIF(A:A,B1:B12))
Without FILTER:
Put this in the first cell and copy down:
=IFERROR(INDEX(B:B,AGGREGATE(15,7,ROW(B1:B12)/(COUNTIF(OFFSET(B1,0,,ROW(INDEX($ZZ:$ZZ,1):INDEX($ZZ:$ZZ,ROWS(B1:B12)))),B1:B12)>COUNTIF(A:A,B1:B12)),ROW($ZZ1))),"")
Try these steps, assuming that Column A has fewer values than Column B and that the rows start at 1:
A. Create Column C.
In cell C1 place the function: =COUNTIF(A:A,B1) (use ; instead of , as the argument separator if your locale requires it)
Copy this function to the rest of the cells, for all items of Column B. So, cell C2 will have the function =COUNTIF(A:A,B2) and so on.
B. Create column D.
In cell D1 place the function: =COUNTIF($B$1:$B1,B1)
Copy this function to the rest of the cells, for all items of Column B. So, cell D2 will have the function =COUNTIF($B$1:$B2,B2) and so on.
C. Create column E.
In the cell E1 place the function: =IF(D1<=C1,"Exists","Missing")
Copy this function to the rest of cells, for all items of Column B. So, cell E2 will have the function =IF(D2<=C2,"Exists","Missing") and so on.
D. Filter to show only the rows that Column E values are "Missing".
Of course you can combine all above 3 columns to one (e.g. in Column F), so these cells will have the functions:
F1: =IF(COUNTIF($B$1:$B1,B1)<=COUNTIF(A:A,B1),"Exists","Missing")
F2: =IF(COUNTIF($B$1:$B2,B2)<=COUNTIF(A:A,B2),"Exists","Missing")
and so on
Explanation:
In column C we count how many times the value of the respective cell
of Column B exist in the whole Column A.
In Column D we count how many times we have "met" this value in Column B so far.
In Column E we check whether we have "met" the value more times than it exists in Column A. If we have, we mark the cell as "Missing".
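Since the question is open to solutions outside Excel, the same running-count idea can be sketched in Python. This is only an illustration of the column C / D / E logic described above; the function name `extra_values` is made up for this example, and reading the two columns from the text file is left out.

```python
from collections import Counter


def extra_values(col_a, col_b):
    """Return the values of col_b that occur more often than in col_a, in col_b order."""
    available = Counter(col_a)  # column C: how often each value occurs in column A
    seen = Counter()            # column D: running count of each value in column B
    missing = []
    for value in col_b:
        seen[value] += 1
        # column E: "Missing" once we have met the value more often than A contains it
        if seen[value] > available[value]:
            missing.append(value)
    return missing


col_a = [2, 1, 3, 4, 5, 5, 6, 7]
col_b = [1, 2, 3, 4, 5, 5, 5, 5, 6, 6, 7, 8]
print(extra_values(col_a, col_b))  # [5, 5, 6, 8]
```

Because each value is counted once with a dictionary lookup, this should stay fast for 100,000 rows, whereas the COUNTIF helper columns recount ranges for every row.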
Tested with the example you provided and works okay.
I hope it helps!
Good luck!
EDIT - Addition of Screenshot
I want to match specific rows and columns.
If A2 is matched with B1, then return list "ROW 1" in B2:B151
If A5 is matched with B5 then return list "ROW 4" in B2:B151
As explained in the image, B2 is fed by a dropdown from another sheet (this is not part of the question).
I am trying to overwrite this list in B2:B151 every time a new match is made.
I haven't got a clue how to proceed. I have tried multiple INDEX and MATCH combinations, and even VLOOKUP, which is not viable for this sort of problem.
I was thinking of concatenating all these formulas in B2. The reason I want to use rows and columns to determine the specific match is that the data in B1 and A2:A7 is ever-changing, so the formula needs to be dynamic.
I think this is what you want, but I'm not 100% sure of the desired final outcome.
The formula in B2 is
=OFFSET($D2,0,MATCH($B$1,PhaseArrayList,0)-1,1,1)
and you can just copy that down as many rows as you need. When you change the value in B1 it will update.
The OFFSET says: start at D2, go 0 rows down, and go to the right by the number of columns equal to the match of B1 less 1. So if C is in B1, the match returns 3 and the offset goes across 2 columns, i.e. to column F (ROW 3).
(Btw would be less confusing if you labelled them columns!)
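For readers who find the OFFSET/MATCH combination hard to picture, here is a rough Python equivalent. The headers and table values are invented for illustration (the real sheet's contents are not shown in the post), and `pick_column` is a hypothetical helper name.

```python
def pick_column(headers, rows, selected):
    """Return the column of `rows` whose header equals `selected`."""
    # MATCH(selected, headers, 0) is 1-based; list.index is 0-based,
    # which is why the Excel formula subtracts 1 and this code does not.
    col = headers.index(selected)
    return [row[col] for row in rows]


# Hypothetical stand-ins for PhaseArrayList and the data block starting at D2
headers = ["A", "B", "C"]
rows = [
    [1, 2, 3],
    [4, 5, 6],
]
print(pick_column(headers, rows, "C"))  # [3, 6]
```

Changing `selected` plays the role of changing the value in B1: the whole returned list switches to the matching column.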
Let's say I have a list of duplicate values in a column, such as: A A A B B A A C A B C ... A A C. I want Excel to rank the first A as A1, the first B as B1, the second A as A2, the second B as B2, and so forth up to the nth A and the nth B. Any additional A should automatically rank as A(n+1). Same thing for B, C, etc.
Note that I should be able to enter any new value, such as X, and it should automatically rank as X(n+1), or X1 if it did not exist in the table before.
I hope I have made myself understood. VBA or array formulas, I have no preference as long as I can achieve what I want.
Note that I found a way by sorting the values in ascending order, but it doesn't handle additional values added at the end of the data.
Thanks.
I am assuming your data is in Column "A". Enter the formula in Column "B" and copy the formula till the row you want.
=A1 & IF(A1="","",COUNTIF($A$1:A1,A1))
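The expanding range `$A$1:A1` is what makes this work: each row counts only the occurrences seen so far. A small Python sketch of the same idea follows; `label_occurrences` is a name chosen for this example.

```python
from collections import Counter


def label_occurrences(values):
    """Append a running per-value count to each entry, e.g. A -> A1, A2, ..."""
    seen = Counter()
    labels = []
    for value in values:
        if value == "":
            labels.append("")  # blank cells stay blank, as in the IF(A1="","",...) part
            continue
        seen[value] += 1       # COUNTIF($A$1:A1, A1): occurrences up to this row
        labels.append(f"{value}{seen[value]}")
    return labels


print(label_occurrences(list("AAABBAACABC")))
# ['A1', 'A2', 'A3', 'B1', 'B2', 'A4', 'A5', 'C1', 'A6', 'B3', 'C2']
```

A value that has not appeared before, such as X, gets X1, and each later X continues the count.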
I have 2 sets of data. I put them in Excel, e.g. in column A and column B. Now I want to know which data from B is part of column A. I run this formula: =IF(COUNTIF($A$1:$A$327238,B1)>0,"Exist", "Nope")
Then I filter it and look only at 'Exist'. Based on that I know that all data in B labelled 'Exist' is part of column A.
Now I want to know the opposite, i.e. which data from A is part of B. For that I use the same formula but swap the columns, i.e. the data from B is now in A and vice versa.
Then I randomly verify the results.
For case 1 it looks like it works fine, but for the second case it looks like it's not accurate.
My assumption: it should work in case 2 as well (maybe I was just not very careful in some way). Should I expect it to work?
Thanks
In cell C1 (assuming your data starts from the 1st row) type the following: =IF(A1=B1,"equal","no"), and then fill the same formula down to the last row that still has data, so that for row N your formula in column C is =IF(AN=BN,"equal","no"). After that you just need to count the cells with the value "no" to know the differences. Sorry if I didn't get the question correctly.
Ok, assuming that the two sets of data are in columns A and B (they might be of different sizes), and the last rows of data are L and M respectively, click on D1 and type the following: =IFNA(INDEX(B$1:B$M,MATCH(A1,B$1:B$M,0),1),"Unique"), replacing M with the actual last row number of column B. Drag down to apply this formula to D1 through DL. That's it, you have the duplicate elements. Since the duplicate elements are the same in both columns A and B, you don't need to repeat this for column B. Note that for all the unique elements the corresponding rows of column D contain the word "Unique", so if you want the unique elements, you can get them from A using those rows:
Just select any column's first row cell and type the following formula: =IF(D1="Unique",INDEX(A$1:A$L,ROW(D1)),"Duplicate").
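To cross-check the Excel results in both directions, the membership test can be sketched in Python with sets. The sample values below are made up, and `mark_membership` is a hypothetical helper name; the comparison is exact, so values that differ in type or stray whitespace will not match.

```python
def mark_membership(values, reference):
    """Label each entry of `values` by whether it occurs anywhere in `reference`."""
    lookup = set(reference)  # set lookup plays the role of COUNTIF(...) > 0
    return ["Exist" if value in lookup else "Nope" for value in values]


col_a = [10, 20, 30, 40]  # made-up sample data
col_b = [20, 40, 50]

print(mark_membership(col_b, col_a))  # case 1: which B values are in A
print(mark_membership(col_a, col_b))  # case 2: which A values are in B
```

The two calls are symmetric, so if the first direction behaves as expected, the second should too, provided the ranges cover all rows of the other column.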
I have a simple table with 5 names and 5 grades if you will.
In another column I order the grades using the LARGE function.
Now is there a way to know the row of each of the "ordered" grades to obtain something like that?
Name   Grade  Ordered  Row
White  23     31       5
Red    15     23       1
Green  23     23       3
Blue   18     18       4
Grey   31     15       2
The column I can't calculate is the last one!
You should use the rank() function if you want to rank these grades. Not large().
=RANK(D2,$D$2:$D$6,1)
You can try this
=MATCH(LARGE(B1:B5,1),B1:B5,0)
The result is the row number of the largest grade within B1:B5.
In Cell D1 Put =INDEX($A$1:$A$5,MATCH(C1,$B$1:$B$5,0))
Then in Cell D2 put =IF(D1<>INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)),INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)),INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)+MATCH(C2,INDIRECT("$B$"&MATCH(C2,$B$1:$B$5,0)+1&":$B$5"),0)))
This will also work when duplicate Grades are present
But I Strongly Suggest using Sort as Follows:
Also: here is the explanation of the above formulas.
To get the row that contains the number we are looking for (the number in Column C) you need to use the MATCH() function. We enter =MATCH(C1,B1:B5,0) in D1:
What this is doing: it is looking up the value in C1, which is 31.
It is looking in Range("B1:B5"), and 0 is for an exact match.
So when we look for a match to C1 (31) we get 5. This tells us that 31 is in Row 5.
Now, to get the Value of Column A on Row 5 we use INDEX() Function as Follows:
We add to the =MATCH(C1,B1:B5,0) in D1 as =INDEX(A1:A5,MATCH(C1,B1:B5,0))
This will look in Range("A1:A5") for Row 5 (This is because =MATCH(C1,B1:B5,0) = 5)
And the result will be Grey
Now if we drag this formula down we will find the first problem:
Here are our 2 Issues:
1) We get an `N/A` error in the last row.
2) Although `Green` is only in `Range("A1:A5")` one time we see it twice
even though it would seem that `White` should be twice.
These are caused because:
1) We need to add `$` to the range that will remain the same so when we drag down
the formula it won't shift the range. As is, the formula in `D5` is
`=INDEX(A5:A9,MATCH(C5,B5:B9,0))` and we receive the error *because*
`Range("B5:B9")` does not contain `15`, but the issue is we meant
to look in `Range("B1:B5")`
So we change the Formula as so: =INDEX($A$1:$A$5,MATCH(C1,$B$1:$B$5,0))
Take note that we do not use the $ on C1 in the formula cause we WANT this value to change as we move down.
But we still have the issue of double values when they shouldn't be there.
Because D1 is the first cell we won't change the formula in it. As anything that is equal to the greatest value is simply tied with it and I don't see any reason why the order of the tie would matter.
Instead we will start in D2 and enter =IF(D1<>INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)),INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)),INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)+MATCH(C2,INDIRECT("$B$"&MATCH(C2,$B$1:$B$5,0)+1&":$B$5"),0)))
What this is doing is checking if the value of =INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0))
is not equal to the value in the row above (being a sorted list means all duplicate values are directly on top of each other). If it is NOT the same, we use that value, but if it is the same we need to do a little more work.
If the value is the same, we use the formula INDEX($A$1:$A$5,MATCH(C2,$B$1:$B$5,0)+MATCH(C2,INDIRECT("$B$"&MATCH(C2,$B$1:$B$5,0)+1&":$B$5"),0))
Now to explain it I will use our example of double values. In D3 we find the formula: =IF(D2<>INDEX($A$1:$A$5,MATCH(C3,$B$1:$B$5,0)),INDEX($A$1:$A$5,MATCH(C3,$B$1:$B$5,0)),INDEX($A$1:$A$5,MATCH(C3,$B$1:$B$5,0)+MATCH(C3,INDIRECT("$B$"&MATCH(C3,$B$1:$B$5,0)+1&":$B$5"),0)))
And because we know that INDEX($A$1:$A$5,MATCH(C3,$B$1:$B$5,0)) will be equal to the above cell (White), and we have gone over how the if true works, I will focus on the if false value of: INDEX($A$1:$A$5,MATCH(C3,$B$1:$B$5,0)+MATCH(C3,INDIRECT("$B$"&MATCH(C3,$B$1:$B$5,0)+1&":$B$5"),0))
We know MATCH(C3,$B$1:$B$5,0) is the row that contains the first instance of C3 (in this case 23), which is Row 1, so we need to look for 23 in the rows below Row 1. So we use MATCH(C3,INDIRECT("$B$"&MATCH(C3,$B$1:$B$5,0)+1&":$B$5"),0), which is equal to MATCH(23,B2:B5,0) because we are adding 1 to the row that has the first match for 23 (C3).
That returns 2, as the value 23 is in the second row of Range("B2:B5") (Red is in Row 1 and Blue in Row 3 of that range).
But we don't want Row 2: we know that this 23 belongs to Green, and Green is in Row 3. So we add the row where we previously found 23 (1, i.e. MATCH(C3,$B$1:$B$5,0)) to the row where we just found it (2) and get Row 3.
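As a cross-check on the expected result, here is a Python sketch that uses a different technique from the formulas above: a stable sort of the row positions, which keeps tied grades in their original order. The data is the example table from the question.

```python
names = ["White", "Red", "Green", "Blue", "Grey"]
grades = [23, 15, 23, 18, 31]

# Sort the 0-based positions by descending grade. Python's sort is stable,
# so White (row 1) stays ahead of Green (row 3) for the tied grade 23.
order = sorted(range(len(grades)), key=lambda i: -grades[i])

rows = [i + 1 for i in order]            # 1-based row of each ordered grade
ordered_names = [names[i] for i in order]

print(rows)           # [5, 1, 3, 4, 2]
print(ordered_names)  # ['Grey', 'White', 'Green', 'Blue', 'Red']
```

The `rows` list matches the last column the question asks for.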
Here is a formula approach based on the methodology outlined in this link. The final layout of this approach is shown below.
I have assumed that there is 1 header row and I use 2 helper columns (D & E). While additional rows can be added to the header, the table must begin in column A in order for the formulas in column E to work correctly.
Although the helper columns could be eliminated by consolidating their formulas into the formulas in column F, I do not recommend it: the resulting formulas would be a pain to maintain.
Formulas Needed
Cell C2: =LARGE(B:B,ROW(A2)-ROW($A$1)) [Copy down to bottom of data]
Cell D2: =MATCH(C2,B:B,0) [Copy down to bottom of data]
Cell E2: =D2
Cell E3: =IF(D3<>D2,D3,E2+MATCH(C3,INDIRECT("B"&(E2+1)&":B"&COUNTA(A:A)),0))
[Copy down to bottom of data]
Cell F2: =OFFSET($A$2,E2-ROW($A$2),0) [Copy down to bottom of data]
Explanation of Answer
There are four steps to getting the answer:
Sort the grades from highest to lowest (as you showed in your example data)
Create a partial ordering of the row numbers for the sorted grades
Get the row numbers for duplicate grades
Use that ordering to show the name for each sorted grade
Sort the grades from highest to lowest
As you have done, my sort uses the LARGE function, which returns the nth largest value in a range or array. As shown, the LARGE function in cell C2 takes the grades in column B. The "n" for LARGE is calculated as the current row number minus the number of rows in the header, in this case the 1 row for cell A1. When the formula is copied down, "n" progresses from 1 to 2 to 3, etc.
Partially order the grade row numbers
The next step is to determine the row numbers for the unsorted grades that correspond to the sorted grades.
To do that, I use the MATCH function to find where each of the sorted grades lies in the list of unsorted grades in column B. MATCH takes three arguments--the value to be matched, the range in which to make the match, and optionally, the type of match, with a value of 0 or FALSE for an exact match--and returns an index number which represents where in the lookup range the match is found (1 for the first row in the match range, 2 for the second row, etc.).
In the formula for cell D2 shown above, the MATCH function on the grade 31 returns 6 since 31 is in the sixth row of column B.
The result for cell D4 shows why it is only possible to get a partial ordering with this formula. While we are trying to lookup the row for the second instance of a grade of 23, the formula returns a value of 2, which corresponds to the row for the first instance of 23. That's because MATCH will always return the first match for 23 it finds, which is on row 2!
Get correct row numbers for duplicate grades
The next step is to get the correct row references for the duplicated row numbers in column D. The formulas that accomplish this are shown for the first three cells in column E of the table.
There are three cases that have to be dealt with in column E:
For the first (and possibly only) instance of the highest grade, it is possible to just use the row number calculated in cell D2.
The second case deals with the first instances of the row references of the remaining grades. For these the row numbers calculated in column D can again be used (via the TRUE branch of the IF statement in the column E formulas). For example, in cell E3 -- which corresponds to the first instance of grade 23 -- the row number in cell D3 can be used.
The final case is the rows for duplicate grades. Here, the MATCH for each duplicate in column B is recalculated using a sliding range that excludes the previous matches for that grade. For example, for the duplicated grade of 23 in column C, the match is on the range B3:B6, rather than on the whole of column B as in the column D calculation.
Display the names in sorted order
This final step is straightforward: get the name corresponding to the sorted grade. Here the OFFSET function is used; its arguments are a cell reference and the number of rows and columns away from that reference at which the desired value is to be found.
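The helper-column logic can also be traced in Python. This is a sketch under simplifying assumptions: positions are 1-based within the data (no header row, so they are one less than the sheet rows above), and `rows_for_sorted` is a name invented for this example.

```python
def rows_for_sorted(grades):
    """Mimic columns C, D and E: position of each sorted grade, resolving duplicates."""
    ordered = sorted(grades, reverse=True)  # column C: LARGE(...) copied down
    positions = []                          # column E, kept 0-based internally
    prev_first = None
    for grade in ordered:
        first = grades.index(grade)         # column D: MATCH returns the first hit
        if first != prev_first:
            pos = first                     # first instance: reuse the column D result
        else:
            # duplicate: search a sliding range starting just below the previous match
            pos = grades.index(grade, positions[-1] + 1)
        positions.append(pos)
        prev_first = first
    return [p + 1 for p in positions]       # convert to 1-based positions


grades = [23, 15, 23, 18, 31]
names = ["White", "Red", "Green", "Blue", "Grey"]
rows = rows_for_sorted(grades)
print(rows)                          # [5, 1, 3, 4, 2]
print([names[r - 1] for r in rows])  # column F: the names in sorted order
```

The second argument of `list.index` plays the role of the INDIRECT range that begins one row below the previous match.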