Excel: Find duplicates in column with differences in another column

I want to highlight cells in column A that have duplicates in column A but different values in column B.
A B
1 2 -
2 3 +
3 2 -
2 4 +
1 2 -
3 2 -
4 5 -
The rows (or a cell within the row) with the - shall not be highlighted, but the rows (or a cell within the row) with the + shall be highlighted.
How can I accomplish this in an Excel formula?
Please note that not every unique combination should be highlighted (see the last row).
In SQL the corresponding query would be something like this:
SELECT *
FROM table
GROUP BY A
HAVING COUNT(B) > 1

A simpler solution might be to use CONCATENATE to join A and B together in a helper column (column C here) and use conditional formatting to highlight the unique values. This highlights the rows marked +; note that a combination whose column A value occurs only once (such as the last row) is also unique, so it would be highlighted as well.
For the Conditional Formatting highlight column C then navigate:
Home-> Conditional Formatting -> New Rule-> Format only unique or duplicate values
Then change the selection from "duplicate" to "unique" and select the desired format. Apply the setting and the appropriate rows are identified.

Assuming your data is in A1:B7, (with "A" and "B" as headers on row 1):
I used the following formulas to find the matches.
Each row is searched for a matching column A value below it and above it. If a match is found above or below and its column B value differs, the row is flagged as TRUE in column F.
I am not sure it works for 3 or more duplicates, though you didn't indicate how you wanted a 3 of a kind to work ;)
D2=MATCH(A2,A3:$A$1000,0)
E2=IF(ISERROR(D2),IF(ISERROR(G2),"",OFFSET($B$1,G2-1,0,1,1)),OFFSET(B2,D2,0,1,1))
F2=AND(NOT(AND(ISERROR(D2),ISERROR(G2))),B2<>E2)
G2=MATCH(A2,$A$1:A1,0)
D col locates the first matching A column after the current row.
G col locates the first matching A Column prior to current row.
E col pulls that remote B column value to current row to more easily check.
F col puts the logic together: If we found something, and B cols are not equal.

Here is another way to do it assuming your above data is in cells A2:B7:
1) Copy and paste your column A values to a blank section of your workbook (let's say A11), then, with that section selected, use Data -> Remove Duplicates.
2) Highlight cells B10:B13 (all cells next to a value in column A) and type in the following formula:
=FREQUENCY(A2:A8,A10:A13)
Hit Ctrl + Shift + Enter to enter it as an array formula.
3) As in step 2, highlight all cells in column C next to data in columns A and B (in this case C2:C7) and use the following formula:
=IF(VLOOKUP(A2,$A$10:$B$13,2,FALSE)>1,IF(FREQUENCY(VALUE(CONCATENATE($A$2:$A$7,$B$2:$B$7)),VALUE(CONCATENATE($A$2:$A$7,$B$2:$B$7)))<>1,"","Highlight"),"")
Hit Ctrl + Shift + Enter to enter it as an array formula.
The cells that need to be highlighted will now say "Highlight".
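
The condition asked for in the question (a column A value that appears with more than one distinct column B value) can also be sketched outside Excel. The Python below is only an illustration of that logic, not a translation of any of the formulas above; the function name rows_to_highlight is made up for this example, and the data is the sample table from the question.

```python
from collections import defaultdict

def rows_to_highlight(rows):
    """Return one bool per (a, b) row: True if a occurs with more than one distinct b."""
    b_values = defaultdict(set)
    for a, b in rows:
        b_values[a].add(b)  # collect the distinct B values seen for each A value
    return [len(b_values[a]) > 1 for a, _ in rows]

# Sample table from the question, as (A, B) pairs.
sample = [(1, 2), (2, 3), (3, 2), (2, 4), (1, 2), (3, 2), (4, 5)]
flags = rows_to_highlight(sample)
print(flags)  # [False, True, False, True, False, False, False]
```

Only the two rows with A = 2 are flagged, which matches the + markers in the question; the last row (4, 5) is unique but not flagged because its A value has no duplicate.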

Related

Complicated Find row in duplicate show which column A or B or C

Could you please help me with the formula below? It is a little complicated.
The problem:
In a sheet I have three columns, A, B and C. If the amount in any one of them is the same as the amount in column D, the row needs to be highlighted and I need to show which column (A, B or C) matches.
Example
A B C D (amounts in each row)
4 5 7 4 (please highlight, because A and D match)
Next example
5 3 6 2 (should not be highlighted; show an error message)
In the first example, D matches a number in A, B or C.
Please help me with the logic for this formula.
Formula
=If(countif(A1:d4)=1, "duplicate","unique")
works fine, but is there any way to show which column (A, B or C) holds the duplicate?
Any help with this formula is much appreciated. I am not sure whether it requires VBA.

Your question mentions "Highlight", so here's that part of the solution. Select your first 3 columns of data (A1:C6 in my case). Then go to Conditional Formatting in the Home Tab. Create New Rule, using a Formula to determine which cells to format.
Here's the formula:
=A1=$D1
Change the format fill to your color of choice. Click OK.
EDIT - Adding the last piece here...
Lastly, to display which column(s) match column D value, you could use a formula such as this.
Cell E1 Formula:
=CONCAT(IF(A1=D1,"A",""),IF(B1=D1,"B",""),IF(C1=D1,"C",""))
Drag it down.

XLOOKUP, unlike VLOOKUP, returns a reference to the cell and not just the value of the cell.
With this in mind, =XLOOKUP(D2,A2:C2,A2:C2,NA()) will return the value, if it exists, as well as the reference.
If we wrap the Return Array with the Column function it will return the column number.
=XLOOKUP(D2,A2:C2,COLUMN(A2:C2),NA())
Add the ADDRESS function to return the cell address (this will return the address on row 1)
=XLOOKUP(D2,A2:C2,ADDRESS(1,COLUMN(A2:C2),4),NA())
Now substitute the 1 in the cell address with a blank:
=SUBSTITUTE(XLOOKUP(D2,A2:C2,ADDRESS(1,COLUMN(A2:C2),4),NA()),"1","")
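
The comparison discussed in this thread (which of A, B or C equals D) can be sketched in Python as a cross-check of the logic. This is an illustration only; the function name matching_columns and the tuple layout (a, b, c, d) are assumptions made for this example. Like the CONCAT formula, it lists every matching column rather than only the first one.

```python
def matching_columns(row, labels=("A", "B", "C")):
    """row is (a, b, c, d); return the labels of the columns whose value equals d."""
    *values, d = row
    return "".join(label for label, value in zip(labels, values) if value == d)

print(matching_columns((4, 5, 7, 4)))        # A
print(repr(matching_columns((5, 3, 6, 2))))  # ''
```

An empty result means no column matches D, which is the case where the question asks for an error message instead of a highlight.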

Excel, returning what's not equal when comparing columns

I have a few columns for example:
a b c
1 1 0
1 1 0
0 1 0
So I can easily find out if they are equal or not (row 1 = row 2):
=and(a2=a3,b2=b3,c2=c3)
When doing this for comparing row 2 to row 3 we get FALSE, however I'd like to know a way to find out which column(s) caused the fail. In this case it would return column a.
EDIT
I guess I could check each column individually and then search for FALSE's on that row of results, but seeking something more elegant.

Here is one for you. This formula compares row 1 and row 2 of your sample data.
=IF(AND(A1=A2,B1=B2,C1=C2),"Matched","Unmatched Column: " & IF((A1=A2),"","A") & IF((B1=B2),"","B") & IF((C1=C2),"","C"))
If you want to compare more than two rows, put the formula in the first row and drag it down to the last row, so that every row is filled with the corresponding formula.
I think the only other way is to use Excel VBA. If you find another way, post it and we will vote for it.
Good luck.

You can also use conditional formatting.
Highlight the range A3:C4, Choose Conditional Formatting | New Rule | Use a formula... and enter
=A2<>A3
then choose a format (e.g. fill colour) to highlight the cells that don't match.
The formula automatically changes to =A3<>A4 for the second and third rows of data, so will highlight cell A4.
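
For comparison, the same idea (reporting which columns differ between one row and the next) can be sketched in Python. This is only an illustration of the logic; the function name differing_columns is invented here, and the data is the sample from the question without its header row.

```python
def differing_columns(row1, row2, labels=("a", "b", "c")):
    """Return the labels of the columns where the two rows hold different values."""
    return [label for label, x, y in zip(labels, row1, row2) if x != y]

data = [(1, 1, 0), (1, 1, 0), (0, 1, 0)]

# Compare each row with the one after it.
results = [differing_columns(prev, cur) for prev, cur in zip(data, data[1:])]
print(results)  # [[], ['a']]
```

An empty list means the two rows match; otherwise the list names the columns that caused the mismatch.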

Delete reciprocal duplicates in Excel

I'm trying to find a quick way to delete reciprocal duplicates between two columns.
For example, COG00035 is in column A and COG00065 is in B.
I have to look to make sure that further down there isn't A:COG00065 & B: COG00035.
I would do this manually, but there are literally thousands of rows I would have to look through. And the entire row has to be deleted, as A and B have to stay together. Thanks!
If you need a better example let me know.

Have you tried Data > Remove Duplicates?
You should be able to select which columns you want to include in the comparison. I assume in this case you'd check both of the columns you have, and if there are duplicate rows matching both entries then they will be removed.
Short overview of this on the Office website.

Two-step process:
1) Remove duplicates in the same order
Add a third column in C that combines the first two, or:
=concatenate(A1, B1)
And drag that all the way down to the end of your data. Then, select your entire data table (from A1 all the way to C# where # is the last row number), go to Data->Remove duplicates, and unselect Columns A and B in the next pop-window (only column C should be selected).
Press "OK" - this should remove any row that has a duplicate in column C, which essentially means columns A and B are the same.
2) Remove duplicates in reciprocal order
Add a fourth column that combines the first two in reverse order, or:
=concatenate(B1, A1)
In the fifth column, add a formula that counts whether the value in Column C is found in Column D, restricted to the part of Column D that lies below the row of that cell.
For example:
Formula in C1: =CONCATENATE(A1,B1)
Formula in D1: =CONCATENATE(B1,A1)
Formula in E1: =COUNTIF(D2:$D$100,C1)
...assuming your table has 100 rows (i.e., $D$100 is the last row). This shows 0 in Column E for rows that have no reciprocal duplicate further down, and 1 (or more) for rows that do. You can then filter column E to hide the 0s and delete all the visible rows (Home -> Find & Select -> Go To Special... -> Visible cells only; Ctrl + - to delete rows).
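
The two steps above can be sketched in Python as a single pass that treats (A, B) and (B, A) as the same pair. This is an illustration under assumptions, not the Excel procedure itself: the function name drop_reciprocal_duplicates is made up, the ID COG00100 is invented sample data, and the sketch keeps the first occurrence of each pair, whereas the COUNTIF column above flags the earlier row for deletion.

```python
def drop_reciprocal_duplicates(pairs):
    """Keep the first occurrence of each unordered (a, b) pair, preserving order."""
    seen = set()
    kept = []
    for a, b in pairs:
        key = frozenset((a, b))  # same key for (a, b) and (b, a)
        if key not in seen:
            seen.add(key)
            kept.append((a, b))
    return kept

pairs = [
    ("COG00035", "COG00065"),
    ("COG00035", "COG00100"),  # hypothetical extra row
    ("COG00065", "COG00035"),  # reciprocal of the first row
]
kept = drop_reciprocal_duplicates(pairs)
print(kept)  # [('COG00035', 'COG00065'), ('COG00035', 'COG00100')]
```

Using the pair itself as the key also avoids a pitfall of plain concatenation, where different pairs such as "AB" + "C" and "A" + "BC" produce the same string.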

Highlighting a cell in one column when matching value found anywhere in a second column

Column B is where data is going to be entered.
Column I has a list of data that is predefined.
I want column I's data to be highlighted when the data is found anywhere in column B. Example:
A  B  C  D  E  F  G  H  I  J
----------------------------
   3                    1
   2                    2
   7                    3
So the cells in column I containing 2 and 3 would be highlighted because those values are found somewhere in column B.

Please try selecting ColumnI, HOME > Styles - Conditional Formatting - New Rule..., Use a formula to determine which cells to format, Format values where this formula is true:
=COUNTIF(B:B,I1)>0
choose your formatting, OK, OK.

Use conditional formatting with
=isna(match($i1,$b$1:$b$3,0))
as the condition for cells that don't match (the zero is important as it means exact matches only)
... but
=countif($b$1:$b$3,$i1)=0
will do the same thing, while
=countif($b$1:$b$3,$i1)>0
is a condition for matches
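
The COUNTIF(...)>0 condition is a membership test: a cell in column I is highlighted when its value occurs anywhere in column B. A minimal Python sketch of that test follows; the function name highlight_flags is invented for this illustration, and the values are the ones from the question.

```python
def highlight_flags(column_i, column_b):
    """Return one bool per column I value: True if it occurs anywhere in column B."""
    entered = set(column_b)  # a set makes each lookup cheap
    return [value in entered for value in column_i]

flags = highlight_flags([1, 2, 3], [3, 2, 7])
print(flags)  # [False, True, True]
```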

Delete duplicate items in Excel (including original value)

How can I delete duplicate items in an Excel sheet column, so that every item with more than one occurrence is deleted entirely?
1
2
3
3
4
4
If we use the Remove Duplicates option, it gives the distinct values, but what should be done to get only the values
1
2
since 1 and 2 are not duplicated and each occurs only once in the column?

Follow the steps below.
Assume your data is in column A.
Write the formula =IF(COUNTIF(A:A,A1)=1,0,1) in column B.
Apply that formula to every row that has data.
Wherever you have duplicate data, you will see 1 in column B; otherwise you will see 0. :)
Go to the Data menu and apply a filter for 1. Those are the duplicate rows. Want to delete them? Delete them :)

How about Conditional Formatting --> Highlight cell rules --> Duplicate values
Duplicate values now highlighted. Apply filter, sort by colour, delete all highlighted cells - that only leaves unique values.

The COUNTIF function is pretty slow for long columns. If you need something faster, you can do the following:
Sort the column of values in column A.
In Column B next to it, add the formula =IF(A1=A2,0,1)
Double-click the + icon at the bottom right of the formula's selection box to apply the formula to the whole column.
Add a filter while both the columns are selected.
Click the column B filter arrow button to show only values where Column B = 1
This will flag every transition from one value to the next with a 1. The filter then shows only the rows with a transition, so the filtered column A contains one row for each distinct value.
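
What the question asks for (keep only the values that occur exactly once, dropping every copy of a duplicated value) is the same test as COUNTIF(A:A,A1)=1. A short Python sketch of that test follows, as an illustration only; the function name values_occurring_once is made up, and the data is the list from the question.

```python
from collections import Counter

def values_occurring_once(values):
    """Return the values that appear exactly once, in their original order."""
    counts = Counter(values)  # one pass to count every value
    return [v for v in values if counts[v] == 1]

once = values_occurring_once([1, 2, 3, 3, 4, 4])
print(once)  # [1, 2]
```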
