Delete reciprocal duplicates in Excel - excel

I'm trying to find a quick way to delete reciprocal duplicates between two columns.
For example, COG00035 is in column A and COG00065 is in B.
I have to look to make sure that further down there isn't A:COG00065 & B: COG00035.
I would do this manually but there is literally thousands of rows I would to look for. And the entire row has to be deleted as A and B have to stay together. Thanks!
If you need a better example let me know.

Have you tried Data > Remove Duplicates?
You should be able to select what columns you want to include in the comparison. I assume in this case you'd check both the columns you have and if there is duplicate rows matching both entries then they will be removed.
Short overview of this on the Office website.

Two-step process:
1) Remove duplicates in the same order
Add a third column in C that combines the first two, or:
=concatenate(A1, B1)
And drag that all the way down to the end of your data. Then, select your entire data table (from A1 all the way to C# where # is the last row number), go to Data->Remove duplicates, and unselect Columns A and B in the next pop-window (only column C should be selected).
Press "OK" - this should remove any row that has a duplicate in column C, which essentially means columns A and B are the same.
2) Remove duplicates in reciprocal order
Add a fourth column that combines the first two in reverse order, or:
=concatenate(B1, A1)
In the fifth column, add a formula that counts whether any individual cell in Column C is found in Column D, restricted the area in Column D that is below the row of that individual cell.
For example:
Formula in C1: =CONCATENATE(A1,B1)
Formula in D1: =CONCATENATE(B1,A1)
Formula in E1: =COUNTIF(D2:$D$100,C1)
...assuming your table has 100 cells (e.g., $D$100 is last row). What this will do is show 0s in Column E for rows that are unique, and 1s for rows that are duplicates. You can then filter column E to show the 1s and then delete all the visible cells (Home -> Find & Select -> Go To Special... -> Visible cells only; Ctrl + - to delete rows)

Related

How to find non matching records from two columns while accounting for duplicate values in Excel

I have two large columns.
Column A contains 100,000 different numbers/rows. Column B contains 100,210 numbers/rows. They have the same numbers except column B has 210 extra rows. I need to be able get the values of that extra 210 rows.
The issue im having is that the numbers in these rows are not unique.
For example,
Column A contains the following numbers: 2,1,3,4,5,5,6,7
Column B contains the following numbers: 1,2,3,4,5,5,5,5,6,6,7,8
I want the outcome result to be: 5,5,6,8
I can't seem to wrap my head around a way to do this.
I have the two columns in a text file that im importing into excel. If there are better ways to do it outside of excel, I am open to it too.
With the Dynamic Array formula Filter:
=FILTER(B1:B12,COUNTIF(OFFSET(B1,0,,SEQUENCE(ROWS(B1:B12))),B1:B12)>COUNTIF(A:A,B1:B12))
Without FILTER:
Put this in the first cell and copy down:
=IFERROR(INDEX(B:B,AGGREGATE(15,7,ROW(B1:B12)/(COUNTIF(OFFSET(B1,0,,ROW(INDEX($ZZ:$ZZ,1):INDEX($ZZ:$ZZ,ROWS(B1:B12)))),B1:B12)>COUNTIF(A:A,B1:B12)),ROW($ZZ1))),"")
Try to follow these steps, supposing that Column A has less values than the Column B and the rows start at 1:
A. Create Column C.
In the cell C1 place the function: =COUNTIF(A:A;B1)
Copy this function to the rest of cells, for all items of Column B. So, cell C2 will have the function =COUNTIF(A:A;B2) and so on.
B. Create column D.
In the cell D1 place the function: =COUNTIF($B1:$B1;B1)
Copy this function to the rest of cells, for all items of Column B. So, cell D2 will have the function =COUNTIF($B$1:$B2;B2) and so on.
C. Create column E.
In the cell E1 place the function: =IF(D1<=C1,"Exists","Missing")
Copy this function to the rest of cells, for all items of Column B. So, cell E2 will have the function =IF(D2<=C2,"Exists","Missing") and so on.
D. Filter to show only the rows that Column E values are "Missing".
Of course you can combine all above 3 columns to one (e.g. in Column F), so these cells will have the functions:
F1: =IF(COUNTIF($B$1:$B1,B1)<=COUNTIF(A:A,B1),"Exists","Missing")
F2: =IF(COUNTIF($B$1:$B2,B2)<=COUNTIF(A:A,B2),"Exists","Missing")
and so on
Explanation:
In column C we count how many times the value of the respective cell
of Column B exist in the whole Column A.
In Column D we count how many times we have "met" this value in Column B so far.
In Column E we check if we have "met" the value more times that it exists in Column A. If indeed we have "met" it more times, then we mark the cell as "missing"
Tested with the example you provided and works okay.
I hope it helps!
Good luck!
EDIT - Addition of Screenshot

Excel: Find duplicates in column with differences in another column

I want to highlight cells in column A, which have duplicates in column A but a difference in column B.
A B
1 2 -
2 3 +
3 2 -
2 4 +
1 2 -
3 2 -
4 5 -
The rows (or a cell within the row) with the - shall not be highlighted, but the rows (or a cell within the row) with the + shall be highlighted.
How can I accomplish this in an Excel formula?
Please pay attention to the fact, that not all unique combinations shall be highlighted (last row!).
In SQL the corresponding query would be something like this:
SELECT *
FROM table
GROUP BY A
HAVING COUNT(B) > 1
A simpler solution might be to use Concatenate to join A and B together and use a conditional formating to highlight the unique values. This would leave your desired list highlighted:
For the Conditional Formatting highlight column C then navigate:
Home-> Conditional Formatting -> New Rule-> Format only unique or duplicate values
Then change selection from "duplicate" to "unique" and select the desired format. Apply the setting and have identified the appropriate rows.
Assuming your data is in A1:B7, (with "A" and "B" as headers on row 1):
I used the following formulas to get the matches ..
I just did a simple search after, and before .. if it finds a record above or below, it "flags" it in column F as TRUE.
Not sure it works for 3 or more duplicates, though you didn't seem to indicate how you wanted a 3 of a kind to work ;)
D2=MATCH(A2,A3:$A$1000,0)
E2=IF(ISERROR(D2),IF(ISERROR(G2),"",OFFSET($A$1,G2,0,1,1)),OFFSET(B2,D2,0,1,1))
F2=AND(NOT(AND(ISERROR(D2),ISERROR(G2))),B2<>E2)
G2=MATCH(A2,$A$1:A1,0)`
D col locates the first matching A column after the current row.
G col locates the first matching A Column prior to current row.
E col pulls that remote B column value to current row to more easily check.
F col puts the logic together: If we found something, and B cols are not equal.
Here is another way to do it assuming your above data is in cells A2:B7:
1) Copy and paste your column A values to a blank section of your workbook(Lets say A11) and perform the following function Data->Remove Duplicates with the section selected.
2) Highlight cells B10:B13(all cells where a value is in column A) and type in the following formula:
=FREQUENCY(A2:A8,A10:A13)
Hit Ctrl + Shift + Enter to make this an array.
3) Similar to step two highlight all cells in column C where there is data in columns A and B. In this case C2:C7 and use the following formula:
=IF(VLOOKUP(A2,$A$10:$B$13,2,FALSE)>1,IF(FREQUENCY(VALUE(CONCATENATE($A$2:$A$7,$B$2:$B$7)),VALUE(CONCATENATE($A$2:$A$7,$B$2:$B$7)))<>1,"","Highlight"),"")
Hit Ctrl + Shift + Enter to make this an array.
Your cells that need to be highlighted will now say "Highlight"

Is there a way to delete cells if their value is contained in another column?

I have a spreadsheet with multiple columns with a few thousand rows and I would like to find the cells that are common across all columns. Is there a function that I can use to check if a cell value exists in a set of cells/column?
To find out if a value exist in all columns but in any row you can put this equation in the next open column and drag down:
=AND(MATCH(A1,B:B,0),MATCH(A1,C:C,0))
This assumes you have data in column A, B & C and the equation is in column D. now you can sort on column D for unique values.
Depending on your data type you might get an error. If that is the case try this:
=AND(IFERROR(MATCH(A1,B:B,0),FALSE),IFERROR(MATCH(A1,C:C,0),FALSE))

Sort row data into columns with same heading in excel 2010

Put simply, I need to sort row data for a specific range into the correct columns based on that columns heading. For example, if there are five columns labelled A through E, and data in the rows below ranging from A through E; I need all of the A's to be in the A column, all of the B's in the B column etc.
Example start data:
How it should look after the sort:
It also must be able to work with the possibility of having empty cells. For example; if the first example data had no B in row 3, the data must not shift over to the left so that C is in the B column etc.
Other info: not feasible to do by hand - over 450 rows.
It also must be able to work with the possibility of having empty cells.
Taking the above into consideration.
NON VBA WAY
Insert enough columns so that the data moves to the right
Next in the row one, duplicate the values from your data
Next in Cell A2 Put this formula
=IF(COUNTIF($H$2:$L$2,A1)>0,A1,"")
Copy the formula to the right
Next remove "$" from the table range and add it to the header in formula in Cell A2 so that we can copy the formula down. This is how it would look
=IF(COUNTIF(H2:L2,$A$1)>0,$A$1,"")
Similarly your B2 formula will look like this
=IF(COUNTIF(H2:L2,$B$1)>0,$B$1,"")
Change it for the rest
How highlight cells A2:E2 and copy the formula down.
Your final Sorted Data looks like this.
Copy columns A:E and do a paste special values on Col A:E itself so that the formulas change into values and then delete Cols H:L

Count values in groups

I have a table with student IDs separated in groups. I need a handy way to count the total number of students in each group and populate it after the last row of each group (marked with ??)
Currently I just enter =COUNT() and then manually figure out the top and bottom borders of the range for each group. Not convenient at all.
I was thinking that a possible solution could be one of the following:
A some kind of pivot table permutation. I failed on this one.
Excel Data->Outline->Subtotals functions. Again, fail. It keeps creating new rows in my table.
A universal formula that can be pasted into each ?? cell. Not the most graceful solution, but still would do.
A macro. As a last remedy if nothing else works.
The following steps will calculate the subtotals while preserving the structuring and formatting of your worksheet.
Put this formula in cell C1 and copy the formula down the column:
=IF(NOT(ISERROR(SEARCH("Total",A1))),COUNTA(INDIRECT("B"&MATCH(LEFT(A1,LEN(A1)-7),A:A,0)+1&".B"&(MATCH(A1,A:A,0)+1))),IF(B1="","",B1))
Apply a conditional format to cell C1 with the formula rule =(MOD(ROW(C1),2)=0) and blue fill to match the shading on the other rows. Copy the format down the column using Paste Special Format.
Either hide column B, or copy the values in column C to column B using Paste Special Values and hide Column C. If you decide to copy the values to column B, you won't need to set the conditional formats.
Here is what the formula does:
First, check whether the formula's row is a Total row, by searching the cell in column A of the row for the word "Total," using the SEARCH function.
If the word "Total" is found:
Determine the range in the worksheet of the student IDs for the group for that total row:
a) Identify the rows in which the words "GroupX" and "GroupX Total" are found by using the MATCH function. With that, you know that the IDs for the group are in a range that starts at, say, row x and ends at row y.
b) With the starting and ending row numbers, construct the address range in which the IDs lie, which has to be the string "B" + (row x) + "." + "B" + (row y).
c) Turn the string into a range reference that can actually used in a formula using the INDIRECT function.
Count the number of students in the group using the COUNTA function and the range, and show that as the formula's result.
If the word "Total" is not found
Check whether the cell in column B is empty
a) If it is empty, show a blank as the formula's result
b) if it is not empty, it must be a student ID, so show the ID as the formula's result.
Add a column (I usually add it to the LEFT of the existing matrix) where you enter a formula from row 2 onwards that fills the blanks in the old column A. Then the old matrix including your new column can be used in a pivot.
So Insert a column left of your matrix, this is column A now. Put a header in Cell A1, for example "Group Name1"
Enter the following formula in cell B2 and extend it to the end:
=IF(B2="",A1,B2) This way your blanks will be filled.
Now apply a pivot on this matrix and there you are.
Maybe not the nicest looking solution, but its quick and works well.
If u have table like this
Students id Name of students group ........
then u can use countif/countifs formula

Resources