Delete duplicate items in Excel (including original value) - excel

How can I delete duplicated items from an Excel column, so that every value with more than one occurrence is removed entirely? For example:
1
2
3
3
4
4
If we use the Remove Duplicates option, it gives the distinct values, but what should be done to get only these values?
1
2
Only 1 and 2 should remain, since they are not duplicated and each has just one occurrence in the Excel column.

Follow the steps below.
1. Assume your data is in column A.
2. Enter the formula =IF(COUNTIF(A:A,A1)=1,0,1) in column B.
3. Copy the formula from step 2 down to every row that has data.
4. Wherever the data is duplicated you will see 1 in column B; otherwise you will see 0. :)
5. Go to the Data menu and filter column B for 1. Those are the duplicated rows. Want to delete them? Delete them :)
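For anyone who wants to check the logic outside Excel, here is a minimal Python sketch of the same idea. The function name and sample list are made up for illustration; `Counter` plays the role of COUNTIF over the whole column.

```python
from collections import Counter


def values_occurring_once(values):
    """Keep only the values that appear exactly once, in original order."""
    counts = Counter(values)  # same role as COUNTIF(A:A, A1) for every row
    return [v for v in values if counts[v] == 1]


column_a = [1, 2, 3, 3, 4, 4]  # sample data from the question
print(values_occurring_once(column_a))  # [1, 2]
```

Rows whose count is greater than 1 are the ones the helper column marks with 1.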

How about Conditional Formatting --> Highlight Cells Rules --> Duplicate Values?
The duplicate values are now highlighted. Apply a filter, sort by colour, and delete all highlighted cells. That leaves only the values that occurred once.

The COUNTIF formula is pretty slow for long columns. If you need something faster, you can do the following:
Sort the column of values in column A.
In Column B next to it, add the formula =IF(A1=A2,0,1)
Double-click the + icon at the bottom right of the formula's selection box to apply the formula to the whole column.
Add a filter while both the columns are selected.
Click the column B filter arrow button to show only values where Column B = 1
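This flags every transition from one value to the next with a 1, so the filter shows only the last row of each run of equal values. The resulting filtered column A contains one copy of each distinct value. Note that this is the same result as Remove Duplicates; to drop repeated values entirely, the formula would also have to compare each cell with the one above it.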
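Here is a rough Python sketch of the sorted-neighbour comparison, in case it helps to see both variants side by side. The function name is hypothetical. The `transition` flag mirrors =IF(A1=A2,0,1), and the `only_once` flag is an extra check against both neighbours that is not in the answer above.

```python
def flag_sorted(values):
    """Sort the values, then flag each row by comparing it with its neighbours."""
    ordered = sorted(values)
    n = len(ordered)
    rows = []
    for i, v in enumerate(ordered):
        same_as_next = i + 1 < n and ordered[i + 1] == v
        same_as_prev = i > 0 and ordered[i - 1] == v
        transition = 0 if same_as_next else 1  # mirrors =IF(A1=A2,0,1)
        only_once = not (same_as_next or same_as_prev)  # extra check, both neighbours
        rows.append((v, transition, only_once))
    return rows


rows = flag_sorted([3, 1, 4, 2, 4, 3])
distinct = [v for v, t, _ in rows if t == 1]
singles = [v for v, _, once in rows if once]
print(distinct)  # [1, 2, 3, 4]
print(singles)   # [1, 2]
```

Each row only looks at its neighbours, so after the sort this is a single pass, which is why it scales better than a COUNTIF over the whole column on every row.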

Related

How to filter rows by duplicate column values?

In column C, there are duplicate values that I don't want. How do I go about filtering the rows by the condition - Show the rows where there are duplicate values in column C?
I would add a helper column at the end with =COUNTIF(C:C;C1) and copy it down next to every row with data (use a comma instead of the semicolon if your regional settings expect it). Then filter on >1.
Or you can create a PivotTable and put column C in the values as a count.
The easiest way is to add conditional formatting for duplicates and then filter on colour. It doesn't involve changing your data set at all and is an easy visual cue as to where duplicates exist. However, similar to the answer above, this will filter out both the original and the duplicate record.
If you want one of the rows to show (i.e. 1 of each value in column C), then a new column with the COUNTIF formula applied to a range that ends 1 cell above, or starts 1 cell below, the current row would work. So if the new column was "D", in cell "D2" you would use either:
Cell above:
= COUNTIF($C$1:$C1,$C2)
OR
Cell below:
= COUNTIF($C3:$C$9000,$C2)
Note: if you use the count from the cell below (i.e. to show the last row where that value is found), you will need to assign a "finish row". I used 9000, but you could go to 1048576 if you want to include the whole sheet.
You would then filter column D to only show 0s.
These solutions assume you still want to keep the records, just hide them. If this is not the case, there are easy ways to delete duplicates.
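As a sanity check on the "cell above" variant, here is a small Python sketch of the running count. The function name and sample values are hypothetical; the list it returns corresponds to what COUNTIF($C$1:$C1,$C2) would show when copied down.

```python
from collections import Counter


def occurrences_before(values):
    """For each row, count how many earlier rows hold the same value."""
    seen = Counter()
    result = []
    for v in values:
        result.append(seen[v])  # 0 means this is the first occurrence
        seen[v] += 1
    return result


column_c = ["a", "b", "a", "c", "b", "a"]
flags = occurrences_before(column_c)
print(flags)  # [0, 0, 1, 0, 1, 2]
print([v for v, f in zip(column_c, flags) if f == 0])  # ['a', 'b', 'c']
```

Filtering on 0 keeps the first row of each value. The "cell below" variant is the same idea run from the bottom up, so it keeps the last row instead.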

Conditional Formatting alternate rows and matching rows in Excel

I'm trying to apply conditional formatting to my data where I need to color the rows based on certain columns. If the current and previous rows have same data in 4 particular columns, I will color those rows. But I also need to apply this color to alternate rows.
So the result I need is like the format in the image below:
As in the sample image above, first two rows have same values in column Name1, Name2, Type_Name and Type_Code, they are colored. Then, the next row is skipped from coloring. And then the next row even if it does not have a matching row above or below, it will be colored. Then rows with Rita in Name1 are skipped.
So far I'm able to find the rows with the same values in the 4 columns and to color alternate rows, each separately, but I'm unable to combine the two properly. Below is the logic applied so far.
This one, where the rows have same values in the 4 required columns, using the formula
=OR($H2&$I2&$J2&$K2 = $H1&$I1&$J1&$K1, $H2&$I2&$J2&$K2 = $H3&$I3&$J3&$K3)
And alternate rows colored with the formula
=MOD(ROW( ),2)=0
I would first add a helper column which separates the groups.
This is done by checking whether the relevant columns of the row are the same as in the row above. If they are, we simply take the max value of the helper column so far; if they differ, we increment that max value by 1. We can then apply the conditional formatting wherever this helper column holds an odd value.
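The helper-column logic can be sketched in Python as follows. This is only an illustration of the counting rule, not the worksheet formula itself. Apart from "Rita", the sample names and codes are invented, and each tuple stands for the four compared columns (Name1, Name2, Type_Name, Type_Code).

```python
def group_ids(rows):
    """Give consecutive identical rows the same group number, starting at 1."""
    ids = []
    previous = None
    current = 0
    for row in rows:
        if row != previous:  # the key columns changed, so start a new group
            current += 1
        ids.append(current)
        previous = row
    return ids


rows = [
    ("Ann", "Lee", "X", 1),
    ("Ann", "Lee", "X", 1),
    ("Bob", "Ray", "Y", 2),
    ("Cid", "Fox", "Z", 3),
    ("Rita", "Moe", "X", 1),
    ("Rita", "Moe", "X", 1),
]
ids = group_ids(rows)
shaded = [i % 2 == 1 for i in ids]  # color the rows whose group number is odd
print(ids)     # [1, 1, 2, 3, 4, 4]
print(shaded)  # [True, True, False, True, False, False]
```

Because the shading follows the group number rather than the row number, a group of any size keeps one color and the next group always gets the other one.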

Finding unique values between 2 Excel columns

I have two simple columns in Excel below
ID ID1
123 123
124 125
125 126
126
I was able to use conditional formatting in Excel as follows:
1. Highlight the 2 columns
2. Click on the conditional formatting
3. New rule
4. Select format only unique or duplicate values
5. Select unique under format all:
6. Select Format and click ok.
I can see that the steps above highlight the value 124.
My question is, how can I filter out that 124 value from a small sample like the one above?
In the real data, column A has 50k records and column B has 48k records. I want to see or filter out the 2k records that are only in column A.
Here's one approach using match and a filter.
Enter =MATCH(A1,B:B,0) in column C, add a filter to row 1, and filter column C for #N/A. The values in column A that show #N/A in column C are not in column B.
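The MATCH-and-filter idea amounts to a membership test. A minimal Python sketch, with a hypothetical function name and the sample IDs from the question:

```python
def missing_from(column_a, column_b):
    """Return the values of column_a that have no exact match in column_b."""
    lookup = set(column_b)  # a failed lookup here is the #N/A case in MATCH
    return [v for v in column_a if v not in lookup]


ids = [123, 124, 125, 126]
id1 = [123, 125, 126]
print(missing_from(ids, id1))  # [124]
```

Swapping the two arguments gives the values that are only in the second column.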
An alternative approach would be to move column B under column A, use column B to record the source (ID, ID1, ...), and then use a pivot table to show both sets and what is missing from each.
Notice how 5 is not in ID, while 13 is in ID but not in ID1.
Also notice that the duplicates on row 5 show up as well.
One way of doing this (kind of manual) is to find the values which are duplicates (the ones that are highlighted). Then select both the columns and all the data.
Then go to: Home > Editing group> Sort & Filter > Filter
There should be small, down arrows on the column headers. Then you can un-check the duplicates.
EDIT
Thanks to xQbert for pointing out my mistake. Here is a way to solve this:
If possible, you can move the second column to another worksheet. Now use the following formula in a column next to the first column:
=COUNTIF(Sheet2!$A$2:$A$5,Sheet1!A2)
Just change the cells to the ones for your table. The first parameter is the range holding the second column (which you should have put in the new sheet), locked with $ so it does not shift when the formula is copied down. The second parameter is the current cell of the first column, whose values will be highlighted. This will put a '1' next to each value which also appears in the other column.
Then you can use conditional formatting to highlight the cells with a '1' next to them using this formula in the "New Rule":
=B2=1
That is the cell number of the first cell in the first column. It should be highlighted to the colour you set it to. To copy the formatting to the rest of the cells, click on the first cell B1. Then go Home > Format Painter. Drag the formatting to the entire column.
You can then use the filter to show only the cells with "No Fill"
You will have to do the same for the data in the new sheet.
This was a very 'hacky' solution but it's what is possible.

Excel: Find duplicates in column with differences in another column

I want to highlight cells in column A, which have duplicates in column A but a difference in column B.
A B
1 2 -
2 3 +
3 2 -
2 4 +
1 2 -
3 2 -
4 5 -
The rows (or a cell within the row) with the - shall not be highlighted, but the rows (or a cell within the row) with the + shall be highlighted.
How can I accomplish this in an Excel formula?
Please pay attention to the fact, that not all unique combinations shall be highlighted (last row!).
In SQL the corresponding query would be something like this:
SELECT A
FROM table
GROUP BY A
HAVING COUNT(DISTINCT B) > 1
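A simpler solution might be to use CONCATENATE to join A and B together in a helper column C and use conditional formatting to highlight the unique values. Note that a row whose column A value occurs only once (such as the last row) is also unique and would be highlighted too. This would leave your desired list highlighted: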
For the Conditional Formatting highlight column C then navigate:
Home-> Conditional Formatting -> New Rule-> Format only unique or duplicate values
Then change the selection from "duplicate" to "unique" and select the desired format. Apply the setting and you will have identified the appropriate rows.
Assuming your data is in A1:B7, (with "A" and "B" as headers on row 1):
I used the following formulas to get the matches.
I just did a simple search after and before the current row. If it finds a record above or below, it "flags" it in column F as TRUE.
I'm not sure it works for 3 or more duplicates, though you didn't seem to indicate how you wanted a 3 of a kind to work ;)
D2=MATCH(A2,A3:$A$1000,0)
E2=IF(ISERROR(D2),IF(ISERROR(G2),"",OFFSET($A$1,G2,0,1,1)),OFFSET(B2,D2,0,1,1))
F2=AND(NOT(AND(ISERROR(D2),ISERROR(G2))),B2<>E2)
G2=MATCH(A2,$A$1:A1,0)
D col locates the first matching A column after the current row.
G col locates the first matching A Column prior to current row.
E col pulls that remote B column value to current row to more easily check.
F col puts the logic together: If we found something, and B cols are not equal.
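To double-check the expected result independently of the worksheet formulas, here is a short Python sketch of the underlying rule: a row is highlighted when its column A value occurs with more than one distinct column B value. This is a different technique from the MATCH/OFFSET columns above, and the function name is made up.

```python
from collections import defaultdict


def rows_to_highlight(rows):
    """Flag each (a, b) row whose a value appears with more than one distinct b."""
    b_values = defaultdict(set)
    for a, b in rows:
        b_values[a].add(b)
    return [len(b_values[a]) > 1 for a, _ in rows]


rows = [(1, 2), (2, 3), (3, 2), (2, 4), (1, 2), (3, 2), (4, 5)]  # data from the question
print(rows_to_highlight(rows))
# [False, True, False, True, False, False, False]
```

The True entries line up with the rows marked + in the question, and the last row stays unflagged. The rule also handles three or more rows with the same column A value.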
Here is another way to do it assuming your above data is in cells A2:B7:
1) Copy and paste your column A values to a blank section of your workbook (let's say A11), then run Data->Remove Duplicates with that section selected.
2) Highlight cells B10:B13(all cells where a value is in column A) and type in the following formula:
=FREQUENCY(A2:A8,A10:A13)
Hit Ctrl + Shift + Enter to make this an array.
3) Similar to step two highlight all cells in column C where there is data in columns A and B. In this case C2:C7 and use the following formula:
=IF(VLOOKUP(A2,$A$10:$B$13,2,FALSE)>1,IF(FREQUENCY(VALUE(CONCATENATE($A$2:$A$7,$B$2:$B$7)),VALUE(CONCATENATE($A$2:$A$7,$B$2:$B$7)))<>1,"","Highlight"),"")
Hit Ctrl + Shift + Enter to make this an array.
Your cells that need to be highlighted will now say "Highlight"

How to get the highest values from 2 columns in Excel?

I have design software which exports data to an Excel sheet.
The output is divided into 2 columns, each of which has more than 1000 rows.
To make use of this data I need to summarize it down to at most the 5 highest values from each of the 2 columns. This does not mean the maximum of one column and its corresponding value; it may mean, for example, the 2nd largest value of column 1 and the 4th largest value of column 2.
For example (quoting some of the output data):
The values I should pick here are:
If there is any possible way to achieve that, it would be great.
Thanks ..
example file: http://goo.gl/UIEFEv
example file 2: http://goo.gl/VSvuVf
Here's a formula solution. I used 20 rows and extracted the rows which contain the top 5 for each column - you can extend to as many rows as required.
With data in A1:B20 use this formula in D1 confirmed with CTRL+SHIFT+ENTER and copied across to E1 and down both columns:
=IFERROR(INDEX(A$1:A$20,SMALL(IF(($A$1:$A$20>=LARGE($A$1:$A$20,5))+($B$1:$B$20>=LARGE($B$1:$B$20,5)),ROW(A$1:A$20)-ROW(A$1)+1),ROWS(D$1:D1))),"")
Note: only eight rows are extracted because some of the rows contain values in the top 5 for both columns. I added the highlighting in columns A and B to illustrate this more clearly.
see screenshot below
Edit:
From the comments below it seems that you want a combination of rows which contain the highest value for that column....and rows which contain the highest total for both columns.
In the original formula there are two conditions joined with "+", i.e.
($A$1:$A$20>=LARGE($A$1:$A$20,5))+($B$1:$B$20>=LARGE($B$1:$B$20,5))
The "+" gives you an "OR" type functionality, e.g. in this case rows are included if individual values are in the top 5 in that particular column. You can add other conditions, so if you want to also add any rows which are in the top 5 considering the total of both columns then you can add another "clause", i.e.
($A$1:$A$20>=LARGE($A$1:$A$20,5))+($B$1:$B$20>=LARGE($B$1:$B$20,5))+($A$1:$A$20+$B$1:$B$20>=LARGE($A$1:$A$20+$B$1:$B$20,5))
....and including that in the complete formula you get this version:
=IFERROR(INDEX(A$1:A$20,SMALL(IF(($A$1:$A$20>=LARGE($A$1:$A$20,5))+($B$1:$B$20>=LARGE($B$1:$B$20,5))+($A$1:$A$20+$B$1:$B$20>=LARGE($A$1:$A$20+$B$1:$B$20,5)),ROW(A$1:A$20)-ROW(A$1)+1),ROWS(D$1:D1))),"")
You could refine that further by using combinations of + and * (for AND), e.g. for the new condition you might only want to include rows with a total in the top 5 if one of the single values is in the top 10 for that column...
Explanation:
The above part shows how you can use + for the OR conditions. In the formula if those conditions are TRUE then the IF function returns the "relative row number" of the range (using ROW(A$1:A$20)-ROW(A$1)+1).
SMALL function then extracts the kth smallest value, k being defined by ROWS(D$1:D1) which starts at 1 in D1 (or E1) and increments by 1 each row.
INDEX function then takes the actual value from that row.
When you run out of qualifying rows SMALL function will return a #NUM! error which IFERROR here converts to a blank
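If it helps to see the "OR" selection outside the array formula, here is a minimal Python sketch of the first version (top n in either column). The function name, the sample numbers and the use of n=2 are just for illustration, and it assumes both columns are non-empty and of equal length.

```python
import heapq


def rows_in_top_n(col_a, col_b, n=5):
    """Keep rows, in original order, whose value is in the top n of either column."""
    threshold_a = heapq.nlargest(n, col_a)[-1]  # same role as LARGE(A, n)
    threshold_b = heapq.nlargest(n, col_b)[-1]  # same role as LARGE(B, n)
    return [
        (a, b)
        for a, b in zip(col_a, col_b)
        if a >= threshold_a or b >= threshold_b  # the "+" in the formula
    ]


col_a = [10, 50, 30, 20, 40]
col_b = [5, 1, 9, 8, 2]
print(rows_in_top_n(col_a, col_b, n=2))
# [(50, 1), (30, 9), (20, 8), (40, 2)]
```

A row that qualifies in both columns is still listed once, which is why the formula can return fewer than 2 x n rows.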
The question is a little unclear, but if you mean the 5 highest values of column A with their corresponding values in column B, followed by the 5 highest values in column B with their corresponding values in column A, then the (non-automated) solution is pretty simple.
Click on a cell with a header title in it.
Click on 'Data' in the top menu.
Click on 'Filter' in the 'Sort & Filter' section.
Click on the button on Column A - select 'Sort Largest to Smallest'
Grab the top five values from both columns then click on the button in column B and repeat.

Resources