Find the difference between data in 2 columns in Excel - excel

I have 2 sets of data. I put it in Excel e.g. column A and column B. Now I want to know which data from B is part of column A. I run this formula =IF(COUNTIF($A$1:$A$327238,B1)>0,"Exist", "Nope")
Then I 'filter it and look only 'Exist'. Based on that I know that all data in B that has label 'Exist' is part of column A
Now I want to know opposite i.e. which data from A are part of B. For that reason I use the same formula but I replace the data in columns i.e. data from B now in A and vice versa.
Then I randomly verify results.
For case 1 it looks it works fine but for second case it looks it's not accurate.
My assumption: should it work in case 2 as well ( maybe I just was not very accurate in some way ) and I should expect it to work?
Thanks

In cell C1 (assuming your data starts from 1st row) type the following =IF(A2=B2,"equal","no"), and then populate the same formula to the last row where there is still data, so that for row N, your formula in column C is =IF(AN=BN,"equal","no"). After that you will just need to count the cells with value "no" to know the differences. Sorry if I didn't get the question correctly.
Ok, assuming that the two sets of data are in columns A and B (they might be of different sizes), and the last rows of data are L and M respectively, click on D1 and type the following: =IFNA(INDEX(B$1:B$5,MATCH(A1,B$1:B$5,0),1),"Unique"). Drag down to apply this formula on D1 - DL. That's it, you have the duplicate elements. Since the duplicate elements are the same in both columns - A and B, you don't need to repeat this for column B. Note, that for all the unique elements the corresponding rows of column D have the word "Unique", so if you want the unique elements, you can just get the elements from A with the mentioned row numbers:
Just select any column's first row cell and type the following formula: =IF(D1="Unique",INDEX(A$1:A$L,ROW(D1)),"Duplicate").

Related

How to find non matching records from two columns while accounting for duplicate values in Excel

I have two large columns.
Column A contains 100,000 different numbers/rows. Column B contains 100,210 numbers/rows. They have the same numbers except column B has 210 extra rows. I need to be able get the values of that extra 210 rows.
The issue im having is that the numbers in these rows are not unique.
For example,
Column A contains the following numbers: 2,1,3,4,5,5,6,7
Column B contains the following numbers: 1,2,3,4,5,5,5,5,6,6,7,8
I want the outcome result to be: 5,5,6,8
I can't seem to wrap my head around a way to do this.
I have the two columns in a text file that im importing into excel. If there are better ways to do it outside of excel, I am open to it too.
With the Dynamic Array formula Filter:
=FILTER(B1:B12,COUNTIF(OFFSET(B1,0,,SEQUENCE(ROWS(B1:B12))),B1:B12)>COUNTIF(A:A,B1:B12))
Without FILTER:
Put this in the first cell and copy down:
=IFERROR(INDEX(B:B,AGGREGATE(15,7,ROW(B1:B12)/(COUNTIF(OFFSET(B1,0,,ROW(INDEX($ZZ:$ZZ,1):INDEX($ZZ:$ZZ,ROWS(B1:B12)))),B1:B12)>COUNTIF(A:A,B1:B12)),ROW($ZZ1))),"")
Try to follow these steps, supposing that Column A has less values than the Column B and the rows start at 1:
A. Create Column C.
In the cell C1 place the function: =COUNTIF(A:A;B1)
Copy this function to the rest of cells, for all items of Column B. So, cell C2 will have the function =COUNTIF(A:A;B2) and so on.
B. Create column D.
In the cell D1 place the function: =COUNTIF($B1:$B1;B1)
Copy this function to the rest of cells, for all items of Column B. So, cell D2 will have the function =COUNTIF($B$1:$B2;B2) and so on.
C. Create column E.
In the cell E1 place the function: =IF(D1<=C1,"Exists","Missing")
Copy this function to the rest of cells, for all items of Column B. So, cell E2 will have the function =IF(D2<=C2,"Exists","Missing") and so on.
D. Filter to show only the rows that Column E values are "Missing".
Of course you can combine all above 3 columns to one (e.g. in Column F), so these cells will have the functions:
F1: =IF(COUNTIF($B$1:$B1,B1)<=COUNTIF(A:A,B1),"Exists","Missing")
F2: =IF(COUNTIF($B$1:$B2,B2)<=COUNTIF(A:A,B2),"Exists","Missing")
and so on
Explanation:
In column C we count how many times the value of the respective cell
of Column B exist in the whole Column A.
In Column D we count how many times we have "met" this value in Column B so far.
In Column E we check if we have "met" the value more times that it exists in Column A. If indeed we have "met" it more times, then we mark the cell as "missing"
Tested with the example you provided and works okay.
I hope it helps!
Good luck!
EDIT - Addition of Screenshot

Excel Index & Match two columns while pulling data for a third

I am trying to create a formula which compares numerical values of two columns and when a match is found return data from a different column.
Here is the problem I am having - the formula I have come up with seems to go Row by Row - for example, if column A & B are the two columns I want to compare, with column C having the data I want if a match is made. If I have a value of 1 in row A1 and a value of 1 in B1 - it will successfuly return the data in column C.
The problem is that my numbers are jumping, the row's do not match, for example column A is 1,2,3 however column B is 1,3,2. The end result here is that I get data for the value 1, but on the mismatch for the second row I get no value.
Basically the formula I made seems to do a hard comparison based just off the two rows of each column - meaning it will only compare A1 to B1. What I really need is it to compare the ENTIRE column and disregard the rows completely
Here is the formula I have been fooling around with - this formula works if A1 and B1 match
=INDEX(M:M,MATCH($L:L,$V:V,0))
In this formula M has the data I need, while columns L and V have numerical values, I need it to not 'hard check' row by row and instead evaluate the entire column and when a match is found return the result (so if both columns have a '2' return that value REGARDLESS of the fact that the '2' may be in rows A2 and B9)
Hopefully I explained my issue well, and I appreciate all the help I can get
EDIT
Sorry for failing to explain it properly on my end -- I will base my explanation off the picture link below.
I need data from column B to show up in column D. What is happening is row 2 matches so the data is successfully retrieved for number 1 - however on row 3 where column A switches the numbering it compared the number '3' to the number '2' and recognizes it is not a match and returns NA -- even though there IS a match in columns A4 and C2 -- in this situation for C3 I would need the data from B4 to showup in D3
http://i.stack.imgur.com/nrKJp.png
Two formulas that will each give you the return you want, Put one of the following in D2 and copy down:
=VLOOKUP(C2,A:B,2,FALSE)
OR
=INDEX(B:B,MATCH(C2,A:A,0))

Excel Sum instances of an item in a column until a different column variable is met

What I'm trying to accomplish is a little complicated to explain:
There is a formula that inserts an integer into Columns B or C depending on the values of other columns (not shown here).
What I want to do it place a sum of the items in column B where the criteria are:
Items in column B have a value of 1
and
The range that is counted are all items between Type A and Type A (in the image below B2:B13)
The sum should be placed in the cell in line with the instance of Type A being used for the range.
So Below I want the sum of items in column B that are occur in rows B3:B13 and have that number placed in B2.
Then the same for B15:B24 and the result placed in B14, and so on...
The test in column A that will be the trigger will remain consistent as it is auto-generated.
Then the same would happen in column C (which I'm assuming I can just use the same formula just change the column references)
Any help would be appreciated.
Place this formula in B2:
=IFERROR(SUM(OFFSET(B3,0,0,MATCH($A2,$A3:$A1000,0)-1,1)),SUM(B3:B1000))
Then filter on all rows with Type A and copy from B2 for all the visible cells.
The IFERROR is only for the last Type A (with no Type A underneath it. Then it will just sum the remaining rows underneath. Also, adjust the 1000 if needed.

Excel - Comparing multiple columns to see if results are identical

I would like to compare three separate columns in an excel spreadsheet, across thousands of rows.
If any value appears in column A multiple times (say the word hello in column A rows 1 and 4, and the word bye in column A and rows 3 & 5, I would like to check the corresponding values in column B for those rows (ie rows 1&4 and 3&5).
If the values in column B for rows 1&4 are say 15 & 15, and the values for rows 3&5 are 20 & 20 , then I want to check column C.
Now we know rows 1&4 and 3&5 have the same corresponding values in column A & B, I would like to check the corresponding values in column C. If these are different then I would like to perform a specific calculation. If they are the same values in Column C, then I want to ignore these rows.
I am sorry this is very unclear, as I cannot paste an image to show what I mean. I can email you an example if it helps.
This is way beyond me and my excel skills and I do not know where to start. Any help would be appreciated. I am hoping I don't need to write a Macro.
Thanks in advance!
So, to resummarize your question as I understand it:
Column A holds string values (text). There are some duplicates here.
Column B holds number values. When a duplicate occurs in column A, the data in column B may or may not be identical as for the other duplicate entries.
Column C holds values (you did not define what type of values, but I assume these are number values). Sometimes, duplicates in column A hold the same values in column B, and also the same values in column C. In this case, we can ignore the row as all the duplicates agree. Sometimes, duplicates in column A hold different values in column B. In this case, we can also ignore the values. Finally, sometimes duplicates in column A hold the same values in column B, but different values in column C. For these specific values, we want to perform some other type of calculation (which you did not specify).
Put the following in column D, starting at row 2 (assuming a header on row 1), which is the starting point of the formula we will build.
=IFERROR(VLOOKUP(A$1:B1,A2,2,0)=B2,"")
This says: Look at column A, starting always at row 1, and going until 1 row above the current row. Check for a match of the text in the current row. If it finds a match there, pull the result from column B. Does that result match column B in the current row? If it matches it will say TRUE; if it doesnt match it will say FALSE. If there are no duplicates yet in column A it will say "".
Now add a new check - if the above formula is TRUE [ie: there is a duplicate in column A, and the result in column B matches], then we want to check the results from column C:
=IFERROR(IF(VLOOKUP(A$1:B1,A2,2,0)=B2,VLOOKUP(A$1:C1,A2,3,0)=C2,""),"")
This will now return TRUE if the values in column C match for that duplicate in column A (which is only checked if the values in column B match too). Finally, add in your "special calculation", like so:
=IFERROR(IF(VLOOKUP(A$1:B1,A2,2,0)=B2,IF(VLOOKUP(A$1:C1,A2,3,0)=C2,"",C2+1),""),"")
Where I have C2+1, this is where you will perform your special calculation. This will only be recorded by Excel if: there is a duplicate in column A, that duplicate has a matching value in column B, and that duplicate has an unmatched value in column C.

Is there a way to delete cells if their value is contained in another column?

I have a spreadsheet with multiple columns with a few thousand rows and I would like to find the cells that are common across all columns. Is there a function that I can use to check if a cell value exists in a set of cells/column?
To find out if a value exist in all columns but in any row you can put this equation in the next open column and drag down:
=AND(MATCH(A1,B:B,0),MATCH(A1,C:C,0))
This assumes you have data in column A, B & C and the equation is in column D. now you can sort on column D for unique values.
Depending on your data type you might get an error. If that is the case try this:
=AND(IFERROR(MATCH(A1,B:B,0),FALSE),IFERROR(MATCH(A1,C:C,0),FALSE))

Resources