Finding and obtaining the difference between two columns - excel

I have two columns. Each column has tens of thousands of values. I need to find the difference between them and print the difference in some cells. I read similar questions, but they do not cover my case: highlighting the differing cells is not enough for me, because it would be very tiring to scan tens of thousands of cells looking for the highlighted ones. Thus, I need to obtain the values.
Example:
Column1 Column2
John Jennifer
Mary Washington
Joe John
Michael Texas
Houston Newyork
Texas Mary
Values existing in col1 but not col2 : Joe, Michael, Houston
Values existing in col2 but not col1 : Jennifer, Washington, Newyork
Algorithmically, I need to check whether each row of Column1 exists in Column2; if the row does not exist in Column2, the value is taken.
Similarly, I need to check whether each row of Column2 exists in Column1; if the row does not exist in Column1, the value is taken.
Thanks

Old: If this is a one-off, I'd make a backup, then select the area of the two columns and use "Remove Duplicates", which is a button on the DATA command bar.
You would be left with unique records only, which is what I understand you want.
Edit: If you want to completely remove duplicate values: assuming that col1 is "A" and col2 is "B", this formula shows the record only if it is in A but not in B. You can make a similar one for B not in A. In my example I made two columns, C and D, for the unique values. Then filter the list on these being non-blank, and you have a set of unique records.
The formula below is for cell A2; in my example it is placed in C2. It uses semicolons as argument separators; use commas if your regional settings require them.
=IF(ISERROR(MATCH(A2;B:B;0));A2;"")
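For readers who want to see the logic outside Excel, here is a minimal Python sketch of the check described in the question (take a value from one column if it does not exist in the other). The function name `column_difference` is hypothetical, and the comparison here is case-sensitive, whereas Excel's MATCH is not.

```python
def column_difference(col1, col2):
    """Return (values only in col1, values only in col2), keeping the original order."""
    set1, set2 = set(col1), set(col2)  # sets make each membership test cheap
    only_in_col1 = [v for v in col1 if v not in set2]
    only_in_col2 = [v for v in col2 if v not in set1]
    return only_in_col1, only_in_col2


# Sample data from the question
col1 = ["John", "Mary", "Joe", "Michael", "Houston", "Texas"]
col2 = ["Jennifer", "Washington", "John", "Texas", "Newyork", "Mary"]

only1, only2 = column_difference(col1, col2)
print(only1)  # ['Joe', 'Michael', 'Houston']
print(only2)  # ['Jennifer', 'Washington', 'Newyork']
```

Using sets for the lookups keeps this fast even with tens of thousands of values per column.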

Related

Search for duplicate text string in two columns and highlight, excel

I'm looking for a way to search and highlight duplicate text strings in two different columns in Excel. The cell contents don't have to be identical; what I need is that if the content of a cell in column A is contained anywhere in any cell of column B, both cells get highlighted.
For example, let's say that I have two columns, one named "Patient" and another one called "Couples". So, what I would need is to make a comparison between both columns, and if one of the patient's names is within a couple, both cells get highlighted:
Column A. Patient name | Column B. Couple name
John Smith | Adriana Lewis - Mark Rutte
Peter Brown | Giaccomo Down - Rosy Lawn
Jerry Goldsmith | Bob Loewe - Gigi Pink
Ewan Thompson | Sonia Farrel - John Smith
In this example, the content of A2 ("John Smith") is also contained in B5 ("Sonia Farrel - John Smith"), so I would need both A2 and B5 to be highlighted. Also, the two columns don't have the same length; one is shorter than the other, since there are more names than couples. It can also happen that two names in different cells are contained in a single couple, in which case all three cells should be highlighted.
I have tried everything, with no success... please help!
Multiple ways to do this but here's one option with conditional formatting.
Rule applied to data in column A, using COUNTIF and wildcards.
=COUNTIF($B$2:$B$5,"*"&A2&"*")>0
Rule applied to data in column B, using ISNUMBER, SEARCH and SUMPRODUCT.
=SUMPRODUCT(--ISNUMBER(SEARCH($A$2:$A$5,B2)))>0
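As a rough illustration of what the two rules test, here is a small Python sketch. The function name `highlight_flags` is made up for this example. It lowercases both sides because COUNTIF with wildcards and SEARCH ignore case, and it skips empty patient cells, which would otherwise match everything.

```python
def highlight_flags(patients, couples):
    """Return two lists of booleans: which patient cells and which couple cells to highlight."""
    # A patient is flagged if the name appears inside any couple cell (like the COUNTIF rule).
    patient_flags = [
        bool(p) and any(p.lower() in c.lower() for c in couples) for p in patients
    ]
    # A couple is flagged if it contains any patient name (like the SUMPRODUCT/SEARCH rule).
    couple_flags = [
        any(p and p.lower() in c.lower() for p in patients) for c in couples
    ]
    return patient_flags, couple_flags


# Sample data from the question
patients = ["John Smith", "Peter Brown", "Jerry Goldsmith", "Ewan Thompson"]
couples = [
    "Adriana Lewis - Mark Rutte",
    "Giaccomo Down - Rosy Lawn",
    "Bob Loewe - Gigi Pink",
    "Sonia Farrel - John Smith",
]

patient_flags, couple_flags = highlight_flags(patients, couples)
print(patient_flags)  # [True, False, False, False]
print(couple_flags)   # [False, False, False, True]
```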

Find distinct values based upon multiple columns

I have a spreadsheet of sales with (to keep the example simple) 3 columns
NAME -- STATE -- COUNTRY
It's easy to find how many sales. (sum all the lines)
I can find out how many customers I have, but how about finding out how many customers come from a particular state (and country)?
NAME -- STATE -- COUNTRY
p1----- CA------ USA
p2----- CA------ USA
p1----- CA------ USA
p1----- CA------ USA
p3----- NY------ USA
p3----- NY------ USA
The above example would give 2 unique customers from CA and 1 unique customer from NY and 3 from the USA
EDIT:
The desired result from the above table would be
STATE - UNIQUE CUSTOMERS
CA ---- 2
NY ---- 1
COUNTRY - UNIQUE CUSTOMERS
USA ---- 3
Assuming your data have headers in row 1 of columns A, B, and C, follow these directions.
In cell F1 enter STATE.
In cell G1 enter COUNT.
In cell F2 enter this array-formula (must be confirmed with Ctrl+Shift+Enter↵):
=IFERROR(INDEX(B$2:INDEX(B:B,COUNTA(B:B)),MATCH(0,COUNTIF(F$1:F1,B$2:INDEX(B:B,COUNTA(B:B))),)),"")
In cell G2 enter this regular formula (confirmed with Enter):
=IF(LEN(F2),COUNTIF(B2:B13,F2),"")
Select F2:G2 and copy.
Now select F3:F51 and paste.
UPDATE
The nature of the question changed. The first formula is exactly the same as before. It gets the distinct states in the source data and culls them so they display with no blanks.
The second formula is now different. It needs to count the number of distinct customers in each state, and it is now an array formula (confirmed with Ctrl+Shift+Enter↵).
=IF(LEN(F2),SUM(IF(F2=$B$2:$B$50,1/(COUNTIFS($B$2:$B$50,F2,$A$2:$A$50,$A$2:$A$50)),)),"")
This formula (entered as an array formula CTRL-SHIFT-ENTER) will count the number of occurrences of a Name in MyState
=COUNTIFS(Names,Names,States,MyState)
So if MyState="CA" this would return {3;1;3;3;0;0}
To get the number of names in CA you can sum the reciprocals of this array, EXCEPT taking the reciprocal of zero is invalid/infinite. So wrap the formula above in a test for zero: if it's zero, output zero, otherwise take the reciprocal (one of the rare situations where you get to set infinity equal to zero!):
=IF(COUNTIFS(Names,Names,States,MyState)=0,0,1/COUNTIFS(Names,Names,States,MyState))
(Still an array formula.)
For CA this will return {0.333333;1;0.333333;0.333333;0;0}
The final step is to sum with the array formula:
=SUM(IF(COUNTIFS(Names,Names,States,MyState)=0,0,1/COUNTIFS(Names,Names,States,MyState)))
It's possible that this could return, say, 2.99999... instead of 3 due to rounding errors. If that's a problem, you can fix it by wrapping it with the ROUND function or by setting the display format to zero decimal places.
It should be straightforward to modify this to count by country. Hope that helps.
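The sum-of-reciprocals idea can be sketched in Python as follows. This is only an illustration with a made-up function name, `distinct_customers`. It sums over the rows that belong to the state, as the formula with the IF(F2=...) test does; without that restriction, a name that also appears in another state would add extra fractions. Exact fractions are used so the rounding issue mentioned above does not arise here.

```python
from fractions import Fraction


def distinct_customers(names, states, my_state):
    """Count distinct names within my_state by summing 1/occurrences per row."""
    total = Fraction(0)
    for name, state in zip(names, states):
        if state != my_state:
            continue  # rows outside the state contribute nothing
        # How often this name occurs in the state (the COUNTIFS part)
        occurrences = sum(
            1 for n, s in zip(names, states) if n == name and s == my_state
        )
        total += Fraction(1, occurrences)  # each name's rows add up to exactly 1
    return int(total)


# Sample data from the question
names = ["p1", "p2", "p1", "p1", "p3", "p3"]
states = ["CA", "CA", "CA", "CA", "NY", "NY"]

print(distinct_customers(names, states, "CA"))  # 2
print(distinct_customers(names, states, "NY"))  # 1
```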
Since the question also has a 'google-spreadsheets' tag, this would be my suggested formula to use in a Google spreadsheet:
For the state counts:
=query(unique(ArrayFormula({A2:A&B2:B, A2:C})), "select Col3, count(Col1) where Col3 <> '' group by Col3 label count(Col1)''",0)
And for the country counts:
=query(unique(ArrayFormula({A2:A&B2:B&C2:C, C2:C})), "select Col2, count(Col1) where Col2 <> '' group by Col2 label count(Col1)''",0)
Also see this example spreadsheet.
Easy with a PivotTable in Excel 2013:
Also easy with Google Spreadsheets:

Remove swapped duplicated row using Excel

I have id pairs in 2 columns. Some pairs are redundant, but in swapped form. How can I remove the redundant id pairs using Excel?
Here is the explanation,
Initial,
col1 col2
id1 id2
id2 id1
id3 id8
id1 id5
id1 id6
id2 id9
Need to be like,
col1 col2
id1 id2
id3 id8
id1 id5
id1 id6
id2 id9
(Note that the 2nd row id2 id1 is deleted, because it is a swapped duplicate).
Thanks..
1. In two adjacent helper columns, write a max and a min function over each pair of cells, so that every pair ends up in the same order (MAXA did not work for some reason).
2. Copy the results as values only to some other place and remove duplicates.
PS: I tried to transpose, sort, and transpose back, but the sort treats the data as one big record rather than as separate tuples. Hence the functions.
I would use the following steps:-
Join them together in the right order into column C
=IF(A2<B2,A2&"|"&B2,B2&"|"&A2)
Find the unique values in column D (array formula, must be entered with Ctrl-Shift-Enter)
=IFERROR(INDEX($C$2:$C$7, MATCH(0, COUNTIF($D$1:D1, $C$2:$C$7), 0)),"")
Separate the first one in Column E
=IFERROR(LEFT(D2,FIND("|",D2)-1),"")
Separate the second one in column F
=IFERROR(RIGHT(D2,LEN(D2)-FIND("|",D2)),"")
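The same idea (put each pair into a fixed order, then keep only the first occurrence) can be sketched in Python. The function name `drop_swapped_duplicates` is hypothetical. Unlike the formula steps, this sketch keeps each surviving row in its original orientation rather than the sorted one.

```python
def drop_swapped_duplicates(pairs):
    """Keep the first occurrence of each pair, treating (a, b) and (b, a) as the same."""
    seen = set()
    kept = []
    for a, b in pairs:
        key = (a, b) if a < b else (b, a)  # same ordering rule as the IF(A2<B2, ...) step
        if key not in seen:
            seen.add(key)
            kept.append((a, b))
    return kept


# Sample data from the question
pairs = [
    ("id1", "id2"),
    ("id2", "id1"),
    ("id3", "id8"),
    ("id1", "id5"),
    ("id1", "id6"),
    ("id2", "id9"),
]

for row in drop_swapped_duplicates(pairs):
    print(*row)
```

Note that exact repeats of a pair are dropped as well, just as they are by the unique-values step in column D.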
Assuming you're dealing with columns A and B with a header in each column, I'd put this in cell C2 and drag it down. Then delete any row where the result is greater than 0:
=COUNTIFS($B$2:B2,A2,$A$2:A2,B2)
To clarify what's happening here, the $ in the ranges act as anchors when you drag down. So making the range from $B$2:B2 means the next cell down will be $B$2:B3 followed by $B$2:B4 and so on.
The COUNTIFS() formula returns a count where ALL the criteria are met, so only those rows with both values switched are counted.
Using the COUNTIFS() with the anchored range like I've done here only counts whatever comes above that row, which is why you can delete any non-zeros without losing the unique values.
I'd recommend verifying this in case there's something I didn't think of, of course.
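To help with that verification, here is a small Python sketch of what the dragged-down COUNTIFS computes for each row. The function name `swapped_counts` is made up for this example.

```python
def swapped_counts(pairs):
    """For each row, count rows from the top down to that row that hold the swapped pair."""
    counts = []
    for i, (a, b) in enumerate(pairs):
        window = pairs[: i + 1]  # mirrors the growing ranges $A$2:A2, $A$2:A3, ...
        # COUNTIFS(B-range, A-value, A-range, B-value)
        counts.append(sum(1 for x, y in window if y == a and x == b))
    return counts


# Sample data from the question
pairs = [
    ("id1", "id2"),
    ("id2", "id1"),
    ("id3", "id8"),
    ("id1", "id5"),
    ("id1", "id6"),
    ("id2", "id9"),
]

print(swapped_counts(pairs))  # [0, 1, 0, 0, 0, 0]
```

One edge case shows up in the sketch: because the window includes the current row, a row whose two ids are identical (for example `id4 id4`) counts itself and would be flagged for deletion. Exact repeats of a pair in the same order are not flagged at all.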

VLOOKUP for multiple entries

I have a few columns in a sheet. The first column holds first names and the fifth holds their respective ages. If I want to search the age column for a particular age, say '12', and return the corresponding first names in a separate sheet, what should I do? I tried VLOOKUP but I could not figure out the logic. Can someone help me out? Thank you.
Unfortunately VLOOKUP will not work in this situation, as the VLOOKUP function cannot return values from columns to the left of the lookup column. However, you can use a combination of the INDEX and MATCH functions. Let's say you have the following table:
A B
mark 11
john 23
Selly 30
Youbaraj 45
and you want to get the value of A based on the value of B (say 23), you can use something like
=INDEX(A1:A20,MATCH(23,B1:B20,0))
You can use INDEX and MATCH to do what HLOOKUP and VLOOKUP do, looking into any column and returning values from either side.
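Keep in mind that MATCH returns only the first matching row, while the question asks for every name with a given age. As a hedged sketch of that "return all matches" logic, here it is in Python. The function name `names_with_age` and the sample ages are made up for illustration.

```python
def names_with_age(rows, age):
    """Return every first name whose age equals the requested age, in sheet order."""
    return [name for name, row_age in rows if row_age == age]


# Hypothetical data: (first name, age)
rows = [("mark", 11), ("john", 12), ("Selly", 30), ("Youbaraj", 12)]

print(names_with_age(rows, 12))  # ['john', 'Youbaraj']
print(names_with_age(rows, 99))  # []
```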
You can use INDEX with a double MATCH to get answers from a column by entering its name.
Example:
A B C D
1 col1 col2 col3 col4
2 val1 val2 val3 val4
3 val5 val6 val7 val8
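Suppose cell C10 holds the label "ColumnName:"
and in cell C11 you enter the name of a column.
Now see what happens with this function (INDEX takes the row number first and the column number second, so the row match comes first; lookup-column stands for the column that holds the value to look for, for example A1:A3):
=INDEX(A1:D3,MATCH(val-to-look-for,lookup-column,0),MATCH(C11,A1:D1,0))
You can dynamically type the name of a column to get the lookup value from that column.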
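A minimal Python sketch of this two-way lookup, assuming the key is searched in the first column. The function name `two_way_lookup` and its `key_column` parameter are hypothetical.

```python
def two_way_lookup(table, column_name, key, key_column=0):
    """Find the row whose key_column equals key and return its value under column_name."""
    header, *body = table
    col = header.index(column_name)  # like MATCH(C11, A1:D1, 0); raises ValueError if missing
    for row in body:
        if row[key_column] == key:   # like MATCH(val-to-look-for, lookup-column, 0)
            return row[col]
    return None  # no row matched


# Sample table from the answer
table = [
    ["col1", "col2", "col3", "col4"],
    ["val1", "val2", "val3", "val4"],
    ["val5", "val6", "val7", "val8"],
]

print(two_way_lookup(table, "col3", "val5"))  # val7
```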
VLOOKUP is very easy to use; however, when it is used with approximate match, the first column must be sorted in ascending order for it to work properly. (I usually use only the exact-match argument, which does not need sorted data.)

Add cell string to another cell if 2 cells are the same for 2 rows

I'm trying to make a macro that will go through a spreadsheet, and based on the first and last name being the same for 2 rows, add the contents of an ethnicity column to the first row.
eg.
FirstN|LastN |Ethnicity |ID |
Sally |Smith |Caucasian |55555 |
Sally |Smith |Native American | |
Sally |Smith |Black/African American | |
(after the macro runs)
Sally |Smith |Caucasian/Native American/Black/African American|55555 |
Any suggestions on how to do this? I read several different methods for VBA but have gotten confused as to what way would work to create this macro.
EDIT
There may be more than 2 rows that need to be combined, and the lower row(s) need to be deleted or removed somehow.
If you can use a formula, then you can do the following:
Couple of assumptions I'm making:
Sally is in cell A2 (there are headers in row 1).
No person has more than 2 ethnicities.
Now, for the steps:
Put a filter and sort by name and surname. This handles the case where a person's rows are not adjacent (i.e. if there is a 'Sally Smith' at the top, there will be no further 'Sally Smith' somewhere down in the sheet after different people).
In column D, put the formula =if(and(A2=A3,B2=B3),C2&"/"&C3,"")
Extend the filter to column D and filter out all the blanks.
What it does is check whether cells A2 and A3 are equal (names are the same), and whether cells B2 and B3 are equal (surnames are the same).
If both are true, it's the same person, so we concatenate the two ethnicities (using & is another way to concatenate besides using concatenate()).
Otherwise, if the name, the surname, or both are different, leave the cell blank.
To delete the redundant rows altogether, copy/paste values on column D, filter on the blank cells in column D and delete. Sort afterwards.
EDIT: As per edit of question:
The new steps:
Put a filter and sort by name and surname. (already explained above)
In column E, put the formula =IF(AND(A1=A2,B1=B2),E1&"/"&C2,C2) (I changed the formula to adapt to the new method)
In column F, put the formula =if(and(A1=A2,B1=B2),F1+1,1)
In column G, put the formula =if(F3<=F2,1,0) (<= rather than <, so that a person with only a single row is also flagged)
In column H, put the formula =if(and(D2="",A1=A2,B1=B2),H1,D2) (this takes the ID wherever it goes).
Enter the formulae starting from row 2. What step 3 does is put an incremental number against the rows of people with the same name.
What step 4 does is check for when column F goes back to 1. This identifies your 'final rows to be kept'.
Here's my output from those formulae:
The green rows are what you keep (notice that there is 1 in column G that allows you to quickly spot them), and the columns A, B, C, E and H are the columns you keep in the final sheet. Don't forget to copy/paste values once you are done with the formulae and before deleting rows!
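For comparison, the same merge can be sketched in Python. This is not the VBA macro the question asks for, only an illustration of the grouping logic, and the function name `merge_rows` is made up. Grouping by a dictionary key means the rows do not have to be sorted first.

```python
def merge_rows(rows):
    """Merge rows that share first and last name, joining ethnicities with '/'."""
    merged = {}  # (first, last) -> collected ethnicities and the first non-empty ID
    for first, last, ethnicity, person_id in rows:
        entry = merged.setdefault((first, last), {"ethnicities": [], "id": ""})
        entry["ethnicities"].append(ethnicity)
        if person_id and not entry["id"]:
            entry["id"] = person_id  # keep the ID wherever it appears in the group
    return [
        (first, last, "/".join(entry["ethnicities"]), entry["id"])
        for (first, last), entry in merged.items()
    ]


# Sample data from the question
rows = [
    ("Sally", "Smith", "Caucasian", "55555"),
    ("Sally", "Smith", "Native American", ""),
    ("Sally", "Smith", "Black/African American", ""),
]

for row in merge_rows(rows):
    print("|".join(row))
```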
If the first Sally is in A1, then =IF(AND(A1=A2,B1=B2),C1&"/"&C2,"") copied down as appropriate might suit. This assumes that, where the names are not the same, a blank ("") is preferred to repetition of the C value.

Resources