Finding unique values between 2 Excel columns

I have two simple columns in Excel, shown below:
ID    ID1
123   123
124   125
125   126
126
I was able to use conditional formatting in Excel as follows:
1. Highlight the 2 columns
2. Click on Conditional Formatting
3. New Rule
4. Select "Format only unique or duplicate values"
5. Select "unique" under "Format all:"
6. Select Format and click OK.
With the steps above, the unique value, 124, is highlighted.
My question is: how can I filter out that 124 value, even from such a small sample?
My real data has a column A with 50k records and a column B with 48k records. I want to see or filter out the 2k records from column A that are not in column B.

Here's one approach using MATCH and a filter.
Enter =MATCH(A1,B:B,0) in column C and fill it down. Add a filter to row 1 and filter column C for #N/A. The values in column A that show #N/A in column C are not in column B.
An alternative approach is to move column B underneath column A, use column B to record the source of each value (ID, ID1, ...), and then use a pivot table to show both sets and what is missing from each.
Notice how 5 is not in ID, while 13 is in ID but not in ID1.
Also notice that the duplicates on row 5 show up as well.
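Outside Excel, the MATCH-and-filter idea above amounts to a membership test of each column A value against column B. The following is only a minimal Python sketch of that logic; the sample lists and the `missing_from` helper are hypothetical names made up for illustration, and the data simply mirrors the small ID/ID1 sample from the question.

```python
# Hypothetical sample data mirroring the question's ID and ID1 columns.
column_a = [123, 124, 125, 126]
column_b = [123, 125, 126]


def missing_from(source, reference):
    """Return the values of `source` that never appear in `reference`, in order."""
    lookup = set(reference)  # set membership keeps this fast even for 50k rows
    return [value for value in source if value not in lookup]


only_in_a = missing_from(column_a, column_b)
print(only_in_a)  # [124]
```

Building the set once plays the role of the lookup range B:B, and a value that is absent from it corresponds to a #N/A result from MATCH.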

One way of doing this (somewhat manual) is to find the values that are duplicates (the ones that are highlighted). Then select both columns and all the data.
Then go to: Home > Editing group > Sort & Filter > Filter
Small down arrows should appear on the column headers. You can then un-check the duplicates.
EDIT
Thanks to xQbert for pointing out my mistake. Here is a way to solve this:
If possible, move the second column to another worksheet. Then use the following formula in a column next to the first column and fill it down:
=COUNTIF(Sheet2!$A$2:$A$5,Sheet1!A2)
Just change the cell references to match your table. The first parameter is the range holding the second column (which you moved to the new sheet), made absolute so that it does not shift when the formula is filled down. The second parameter is the cell from the first column that is being checked. This puts a '1' next to each value that also appears in the second column (or a higher count if it appears there more than once).
Then you can use conditional formatting to highlight the cells that have a '1' next to them, using this formula in the "New Rule" dialog:
=B2=1
Here B2 is the first cell holding a count; adjust it to your layout. The cell should be highlighted in the colour you chose. To copy the formatting to the rest of the cells, click on the first formatted cell, then go to Home > Format Painter and drag over the entire column.
You can then use the filter to show only the cells with "No Fill".
You will have to do the same for the data in the new sheet.
This is a very 'hacky' solution, but it is what is possible.

Related

Excel - How to highlight entire row if a specific cell is different from the cell above?

I am trying to apply conditional formatting in Excel so that the first occurrence of each value in a column gets a highlight across the entire row. The desired result is as follows:
A    B
2
2
8    Highlight this entire row
5    Highlight this entire row
5
7    Highlight this entire row
I currently have the formula "A3<>A2", but that highlights the last occurrence instead of the first. I also don't know how to apply the highlight to all cells in the same row.
UPDATE: Apparently Excel behaves differently for text and numeric values. My data looks like this:
A         B
Apple
Apple
Banana    Highlight this entire row
Kiwi      Highlight this entire row
Kiwi
Apple     Highlight this entire row

Conditional Formatting - Entire Row For First Occurrence in Column
The issue you are facing is that you have selected, for example, the range 2:10 (so the active row is 2) and are applying the corrected ($) formula =$A3<>$A2, which does what is expected: it highlights the last occurrence in a group. That is, if the value in the next row (3) differs from the value in the current row (2), row 2 is highlighted.
To highlight the first occurrence in a group, you need =$A2<>$A1, as correctly posted by user11222393, since the first row you selected is row 2. That is, if the value in the previous row (1) differs from the value in the current row (2), row 2 is highlighted.
My solution works the same way if the data is sorted. It will not, however, highlight the first row of a group whose value has already appeared higher up in the column.
You will notice the difference between the solutions best by sorting the data by another column. Mine should have fewer highlighted rows.
Usage
Select the entire rows of the range, go to Home -> Conditional Formatting -> New Rule -> Use a formula... (you know the drill), and use, for example,
=COUNTIF($A$2:$A2,$A2)=1
with row 2 as the first row.
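To make the difference between the two rules concrete, here is a rough Python sketch that mimics each one on the fruit data from the question. The function names are invented for this illustration, and the first data row is assumed to differ from the header row above it, which is what =$A2<>$A1 would see.

```python
def first_in_run(values):
    """Mimic =$A2<>$A1: True where a value differs from the row above.

    Assumption: the first data row counts as different from the header.
    """
    return [i == 0 or values[i] != values[i - 1] for i in range(len(values))]


def first_overall(values):
    """Mimic =COUNTIF($A$2:$A2,$A2)=1: True only the first time a value is seen."""
    seen = set()
    flags = []
    for value in values:
        flags.append(value not in seen)
        seen.add(value)
    return flags


fruit = ["Apple", "Apple", "Banana", "Kiwi", "Kiwi", "Apple"]
print(first_in_run(fruit))   # [True, False, True, True, False, True]
print(first_overall(fruit))  # [True, False, True, True, False, False]
```

The two results differ only on the last row: the second "Apple" group starts a new run, but its value has already been counted once, so the COUNTIF-style rule leaves it unhighlighted.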

=$A2<>$A1
Pay attention to the $ symbols. If you don't use them, only one cell will be highlighted, because the rule will compare A2 to A1, B2 to B1, and so on.

How to filter rows by duplicate column values?

In column C, there are duplicate values that I don't want. How do I go about filtering the rows by the condition - Show the rows where there are duplicate values in column C?

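I would add a helper column at the end with =COUNTIF(C:C;C1) and copy it down next to every row with data. Then filter on >1. (Depending on your regional settings, the argument separator may be a comma instead of a semicolon.)
Or you can create a PivotTable and put column C in the values as a count.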
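The helper-column idea can be sketched in Python as a per-value count followed by a filter on counts greater than 1. This is only an illustration: the sample values in `column_c` are made up, and row numbers start at 1 to resemble worksheet rows.

```python
from collections import Counter

# Hypothetical contents of column C.
column_c = ["a", "b", "a", "c", "b", "a"]

# Equivalent of the helper column: how often each row's value occurs overall.
counts = Counter(column_c)
helper = [counts[value] for value in column_c]

# Equivalent of filtering the helper column on ">1".
duplicate_rows = [
    (row, value)
    for row, value in enumerate(column_c, start=1)
    if counts[value] > 1
]

print(helper)          # [3, 2, 3, 1, 2, 3]
print(duplicate_rows)  # [(1, 'a'), (2, 'b'), (3, 'a'), (5, 'b'), (6, 'a')]
```

Counting once with `Counter` and then looking each value up avoids rescanning the whole column for every row.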

The easiest way is to add conditional formatting for duplicates and then filter on colour. It doesn't involve changing your data set at all and is an easy visual cue as to where duplicates exist. However, similar to the answer above, this will filter on both the original and the duplicate record.
If you want only one of the rows to show (i.e. one of each value in column C), then a new column with the COUNTIF formula applied to a range ending one cell above, or starting one cell below, the current row would work. So if the new column is "D", in cell "D2" you would use either:
Cell above:
=COUNTIF($C$1:$C1,$C2)
OR
Cell below:
=COUNTIF($C3:$C$9000,$C2)
Note: if you count from the cell below (i.e. to show the last row where that value is found), you need to assign a "finish row". I used 9000, but you could go to 1048576 if you want to include the whole sheet.
You would then filter column D to show only the 0s.
These solutions assume you still want to keep the records and just hide them. If that is not the case, there are easy ways to delete duplicates.
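The "cell above" and "cell below" variants can be read as running counts taken from either end of the column. The Python sketch below is a loose analogue, not the answer's own method; `count_above`, `count_below`, and the sample data are hypothetical.

```python
from collections import Counter


def count_above(values):
    """For each row, count how often its value occurs in the rows above it."""
    seen = Counter()
    result = []
    for value in values:
        result.append(seen[value])
        seen[value] += 1
    return result


def count_below(values):
    """For each row, count how often its value occurs in the rows below it."""
    return count_above(values[::-1])[::-1]


column_c = ["a", "b", "a", "c", "b", "a"]  # hypothetical data
above = count_above(column_c)
below = count_below(column_c)

# Filtering on 0 keeps the first (or last) row of each value.
first_rows = [v for v, n in zip(column_c, above) if n == 0]
last_rows = [v for v, n in zip(column_c, below) if n == 0]

print(above, first_rows)  # [0, 0, 1, 0, 1, 2] ['a', 'b', 'c']
print(below, last_rows)   # [2, 1, 1, 0, 0, 0] ['c', 'b', 'a']
```

Either filter leaves exactly one row per value; the choice only decides whether the first or the last occurrence stays visible.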

Selecting every 3rd row in excel

I am looking to select every third blank row in excel. Once I do that, I need to enter a formula into this third blank row that extracts the contents of a cell below. I would like to have it so this will be done for every third blank. A macro would be fine, I am just not familiar with VBA code so I am not sure where to start.
You'll notice that every third blank row contains the ID from column a in the row below it, and the name from column g below the third blank. Any ideas of how this can be done efficiently?

Just add a column which repeats every 3 rows and filter on that!
You may also be looking for Pivot Tables

Add two columns before column "A", so that your ID column becomes column "C".
Now fill every cell in column "A" with the value 1, down to the last row of your data range.
In cell "B1", enter the formula below and fill it down to the end of your data:
=ISNUMBER(D1)
Now add a filter (Ctrl + Shift + L) and filter column "B" for "FALSE".
If you follow these steps exactly, you will get all the rows you want.
Then use this formula in ... Then apply the filter.

Excel not deleting all selected rows

I have a spreadsheet with a couple hundred rows, and some of the cells contain the text "N/A". I'd like to delete the full row of any cell that has "N/A" in it.
My first thought was to use Find All and then once all the relevant cells are selected, I can do Ctrl - and select "entire rows". However this usually leaves a bunch of cells with "N/A". Why is this?

1. Insert a column (or use the last available blank column).
2. Use the formula =IFERROR(SEARCH("N/A", A2),"No Match"), assuming column A holds the strings that may contain N/A.
3. Fill the formula down to the end of the used range.
4. Filter the new helper column and uncheck "No Match".
5. Delete the rows of the visible cells.
Any numeric value in Column B means a match for N/A was found.
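For comparison, the same clean-up can be sketched in Python as a filter over rows. Note the assumptions: the sample `rows` are invented, and unlike the helper formula above, which inspects only column A, this sketch checks every cell in the row, as the question asks. The test is a case-insensitive substring match, similar in spirit to SEARCH.

```python
# Hypothetical worksheet contents, one inner list per row.
rows = [
    ["widget", "12", "ok"],
    ["gadget", "N/A", "ok"],
    ["gizmo", "7", "status N/A"],
    ["doohickey", "3", "ok"],
]


def has_marker(row, marker="N/A"):
    """True if any cell in the row contains the marker (case-insensitive)."""
    needle = marker.lower()
    return any(needle in str(cell).lower() for cell in row)


# Keep only the rows without the marker; the others are the ones to delete.
kept = [row for row in rows if not has_marker(row)]
print(kept)  # [['widget', '12', 'ok'], ['doohickey', '3', 'ok']]
```

Building a new list of rows to keep, rather than deleting while iterating, avoids skipping rows as positions shift.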

Delete duplicate items in Excel (including original value)

How can I delete duplicate items in an Excel sheet (column), so that every item with more than one occurrence is deleted entirely?
1
2
3
3
4
4
If we use the Remove Duplicates option, it gives the distinct values, but what should be done to get only the values
1
2
since 1 and 2 are not duplicated and each occurs only once in the Excel column?

Follow the steps below.
1. Assume your data is in column A.
2. Enter the formula =IF(COUNTIF(A:A,A1)=1,0,1) in column B.
3. Fill the formula from step 2 down to all rows that contain data.
4. Wherever the data is duplicated, column B shows 1; otherwise it shows 0. :)
5. Go to the Data menu and apply a filter for 1. Those are the duplicate rows. Want to delete them? Delete them. :)
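As a cross-check of the steps above, here is a small Python sketch that computes the same 0/1 flag per row and keeps only the values flagged 0. The variable names are illustrative, and the data is the sample from the question.

```python
from collections import Counter

values = [1, 2, 3, 3, 4, 4]  # the sample column from the question

counts = Counter(values)

# Same flag as =IF(COUNTIF(A:A,A1)=1,0,1): 0 for values seen once, 1 otherwise.
flags = [0 if counts[value] == 1 else 1 for value in values]

# Deleting the rows flagged 1 leaves only the values that occur exactly once.
singles = [value for value, flag in zip(values, flags) if flag == 0]

print(flags)    # [0, 0, 1, 1, 1, 1]
print(singles)  # [1, 2]
```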

How about Conditional Formatting --> Highlight Cells Rules --> Duplicate Values?
The duplicate values are now highlighted. Apply a filter, sort by colour, and delete all the highlighted cells; that leaves only the unique values.

The COUNTIF function is pretty slow for long columns. If you need something faster, you can do the following:
1. Sort the values in column A.
2. In column B next to it, add the formula =IF(A1=A2,0,1)
3. Double-click the fill handle (the small square at the bottom right of the selected cell) to apply the formula to the whole column.
4. Add a filter while both columns are selected.
5. Click the column B filter arrow and show only the rows where column B = 1.
This flags each transition from one value to the next with a 1, so the filter shows only the rows with a transition. The filtered column A will then contain each distinct value exactly once.
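The sorted-column trick can be sketched in Python as a comparison of each value with its neighbours. `transition_flags` mirrors the formula above; `only_once` is an extra variation added here purely as an assumption about what the question needs (values with no equal neighbour on either side), and is not part of the original answer.

```python
def transition_flags(sorted_values):
    """Mimic =IF(A1=A2,0,1): 1 where the next row holds a different value."""
    last = len(sorted_values) - 1
    return [
        1 if i == last or value != sorted_values[i + 1] else 0
        for i, value in enumerate(sorted_values)
    ]


def only_once(sorted_values):
    """Variation: keep values that differ from both the row above and below."""
    last = len(sorted_values) - 1
    return [
        value
        for i, value in enumerate(sorted_values)
        if (i == 0 or value != sorted_values[i - 1])
        and (i == last or value != sorted_values[i + 1])
    ]


values = sorted([3, 1, 4, 3, 2, 4])  # [1, 2, 3, 3, 4, 4]
flags = transition_flags(values)
distinct = [value for value, flag in zip(values, flags) if flag == 1]

print(flags)              # [1, 1, 0, 1, 0, 1]
print(distinct)           # [1, 2, 3, 4]
print(only_once(values))  # [1, 2]
```

After sorting, each row is compared only with its immediate neighbours, which is why this scales better than counting across the whole column for every row.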
