Excel array formula to find duplicate row across multiple columns - excel-formula

Is there a way to indicate duplicate rows across multiple columns using an array formula?
Data:
AA1 BB1 CC2 duplicate
AA1 BB2 CC1
AA1 BB1 CC2 duplicate
AA1 BB1 CC1
In the above table, rows 1 and 3 are the ones I need to indicate, by putting "duplicate" in column 4.
I know of the remove duplicates functionality in Excel, but I have to see the duplicate lines before actually deleting them. Also, adding a hidden helper column is not an option because of what happens with the file further down in the process...
If the data were in just one column, a COUNTIF formula would work. So I was hoping some sort of countif(col1 & col2 & col3, range(A:A & B:B & C:C)) could do the trick...
Thanks!

First, be clear about what "duplicate" means: a row is a duplicate if the same values have already occurred in an earlier row. In your example, the first row is NOT a duplicate because nothing precedes it. The third row is a duplicate because it is the second occurrence of the same values. Here is a method to pick out the duplicates and mark them as needed.
Formula in cell D1:
=CONCATENATE(A1,B1,C1)
Formula in cell E1:
=COUNTIF( D$1:D1, D1 )
Formula in cell F1:
=IF(E1>1,"Duplicate","")
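To see the logic of the running count outside Excel, here is a minimal Python sketch. The function name is hypothetical, the sample rows are the four rows from the question, and rows are compared as tuples rather than concatenated strings, which is a deliberate difference from the CONCATENATE helper.

```python
def mark_later_duplicates(rows):
    """Label every repeat of an earlier row; the first occurrence stays blank."""
    seen = set()
    labels = []
    for row in rows:
        key = tuple(row)  # a tuple keeps the column boundaries intact
        labels.append("Duplicate" if key in seen else "")
        seen.add(key)
    return labels


# Sample data taken from the question (columns A, B, C).
rows = [
    ("AA1", "BB1", "CC2"),
    ("AA1", "BB2", "CC1"),
    ("AA1", "BB1", "CC2"),
    ("AA1", "BB1", "CC1"),
]
print(mark_later_duplicates(rows))  # ['', '', 'Duplicate', '']
```

Only row 3 is labelled, which matches what the D/E/F formulas above produce when filled down.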
--Edit:
If you want to show all duplicates (including the original value):
Formula in cell D1:
=CONCATENATE(A1,B1,C1)
Formula in cell E1:
=IF(COUNTIF($D$1:$D$4,D1)=1,0,1)
Formula in cell F1:
=IF(E1>0,"Duplicate","")
Cheers!

An array formula is not necessary here; COUNTIFS will do the job. Any row where the result is greater than 1 has a duplicate.
=COUNTIFS($A$1:$A$4,A1,$B$1:$B$4,B1,$C$1:$C$4,C1)
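As a rough illustration of what this COUNTIFS returns per row, the Python sketch below counts how many rows in the whole table match each row on all columns. The function name is made up for the example, and the comparison is case-sensitive, whereas COUNTIFS ignores case.

```python
from collections import Counter


def count_matching_rows(rows):
    """For each row, return how many rows in the table match it on every column."""
    counts = Counter(tuple(row) for row in rows)
    return [counts[tuple(row)] for row in rows]


# Sample data taken from the question (columns A, B, C).
rows = [
    ("AA1", "BB1", "CC2"),
    ("AA1", "BB2", "CC1"),
    ("AA1", "BB1", "CC2"),
    ("AA1", "BB1", "CC1"),
]
counts = count_matching_rows(rows)
labels = ["duplicate" if n > 1 else "" for n in counts]
print(counts)  # [2, 1, 2, 1]
print(labels)  # ['duplicate', '', 'duplicate', '']
```

Here rows 1 and 3 are both flagged, which is what the question asks for. Comparing column by column also avoids a pitfall of joining cells without a separator, where "AB" + "C" and "A" + "BC" would look identical.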

If the objective is to remove only the repeated lines while keeping the first occurrence, and a helper column is not an option, here is how to achieve it.
Using a slightly different formula from Adirmola's answer:
In column D, note how the addresses are locked, e.g. A$1:A1 for the formula in row 1. As you fill the formula down, the row number on the left stays the same while the row number on the right increases, so the formula counts how many times the row has occurred so far.
Since adding a helper column is not an option, use conditional formatting instead to highlight the 2nd, 3rd, 4th... occurrences, then filter by color and delete them.
First select the region where the duplicates occur. The active cell (the white cell in the otherwise grayed selection) must be in the first row of the selection.
Add a conditional formatting rule using the same formula as in column D for row 1, but this time lock all the columns and append the condition >1.
Apply the rule, and you can go ahead and filter by color and delete the duplicates!
Additional info: COUNTIF and COUNTIFS are slow on very large data (from roughly 10,000 rows, depending on how many columns are involved). Excel may become sluggish, so it is a good idea to delete the formula after removing the duplicate rows. Alternatively, wrap the formula in double quotes to disable it so that it can be reused next time: ="COUNTIFS($A$1:$A1,$A1,$B$1:$B1,$B1,$C$1:$C1,$C1) > 1"
Hope this helps

Related

How do I rank data in descending order while ignoring blank cells?

I have a formula which will rank all of the data in a column correctly and ignores blank cells EXCEPT when the value in one of the ranked cells is 0. Then all of the cells in the column with a 0 value AND all of the blank cells in the same column are ranked equally.
Ranking Sheet 1
To get my values in column "G" I'm using: =if(isblank(E2),"",sum(F2:F7))
To rank them I'm using: =IF(ISNA(RANK(G2:G7,G$2:G$61)),"",RANK(G2:G7,G$2:G$61))
I need the blank cells to be ignored even when one of the ranked cells has a value of 0.
I thought that maybe something in the formula used to calculate the values in column "G" was the issue so I deleted the formula in the blank cells (G20:G25 & G26:G31). Nothing changed.
I modified the ranking formula in column "H" by adding "<>" like so,
=IF(ISNA(RANK(G2:G7,"<>",G$2:G$61)),"",RANK(G2:G7,"<>",G$2:G$61))
but that just left H2:H7 blank.
I really don't know where to go from here because I don't actually know what I'm doing.
Try merging G2 to G7, G8 to G13, ... and then use the formula:
=IF(ISNA(RANK(G2,G$2:G$61)),"",RANK(G2,G$2:G$61))
Since your condition is for values in column E to be blank, why do you not simply reuse that?
=IF(ISBLANK(E2),"",IF(ISNA(RANK(G2:G7,G$2:G$61)),"",RANK(G2:G7,G$2:G$61)))

Unique values on multiple columns

I need to get a list of unique values from few columns. The data looks like this:
I tried using Unique but it only gives me copy of the list. These 4 lists are already unique values found in another sheet.
If getting unique values from 4 columns is impossible, how would I go about combining these 4 columns? Instead of merging the data from a row into 1 cell, I'd like to append them into one column with one city per row (so adding more rows).
Another idea I had - pulling data from multiple sheets into 1 row, but as it's an automated report, number of towns in each sheet changes every time, so can't use specific cell locations.
You can use the following formula (entered into cell F2 and assuming your data is in range A1:D5):
=IFERROR(LOOKUP("zzzzz",INDEX(IF(COUNTIF(F$1:F1,A$2:D$5),0,A$2:D$5),MIN(IF(COUNTIF(F$1:F1,A$2:D$5),"",ROW(A$2:D$5)-ROW(A$2)+1)),0)),"")
As it is an array formula, it needs to be entered using Ctrl+Shift+Enter and copied down until it returns blank cells.
It does not work in my case: I have exactly the same problem in Calc (LibreOffice), extracting unique values from multiple columns of text that contain blanks. I have tried a lot of formulas with no success...
What worked for me (I don't know why!):
Columns A, B, and C contain text from the 2nd row onwards. The formula goes in cell D2 as an array formula.
=IFERROR(IFERROR(IFERROR(
INDEX($A$2:$A$20; MATCH(0; COUNTIF($D$1:D1; $A$2:$A$20)+($A$2:$A$20=""); 0));
INDEX($B$2:$B$7; MATCH(0; COUNTIF($D$1:D1; $B$2:$B$7)+($B$2:$B$7=""); 0))
);
INDEX($C$2:$C$12; MATCH(0; COUNTIF($D$1:D1; $C$2:$C$12)+($C$2:$C$12=""); 0))
);
"")
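As far as I can tell, the nested IFERROR works through column A first, then B, then C, each time picking the first non-blank value that is not yet in the output list. A minimal Python sketch of that reading, with a hypothetical function name and made-up city data:

```python
def unique_across_columns(*columns):
    """Collect unique non-blank values, scanning the columns one after another."""
    seen = set()
    result = []
    for column in columns:
        for value in column:
            if value == "" or value in seen:
                continue  # skip blanks and values already listed
            seen.add(value)
            result.append(value)
    return result


# Hypothetical sample data; columns may have different lengths.
col_a = ["Oslo", "", "Bergen"]
col_b = ["Bergen", "Tromso"]
col_c = ["Oslo", "Narvik", ""]
print(unique_across_columns(col_a, col_b, col_c))
# ['Oslo', 'Bergen', 'Tromso', 'Narvik']
```

The comparison here is case-sensitive, while COUNTIF is not, so mixed-case data may give a longer list than the spreadsheet formula would.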
Found here

excel SUMIFS only on same date

I'm trying to create a formula in column K which sums all applicable cells in column J, only when the following conditions are true:
dates are the same in column A
AND client name is the same in column B
For example, in cell K2, I want the sum of J2+J3+J4 because A2=A3=A4 and B2=B3=B4.
K5=J5 only, because there are no other dates with the same client name.
K6=J6+J7 because A6=A7 and B6=B7.
What kind of formula would I use for this? I can't figure out how to do it with a SUMIFS.
I would try using a pivot table with:
The names as row values
The dates as the column values
And funds received using SUM in the values column
Edit
Based on @pnuts' comments, here is how to get the values in column K. Put this in K2 and drag down.
=IF(OR(COUNTIFS($B$1:B3, B3) = 1, B3 = ""), SUMIFS($J$2:J2, $A$2:A2, A2, $B$2:B2, B2), "")
This formula will give blank values until the formula finds a new client on a new date. However, I still think using pivot table is a better solution.
In cell K2 put following formula:
=IF(COUNTIFS($A$2:A2,A2,$B$2:B2,B2)=1,SUMIFS($J$2:$J$10,$A$2:$A$10,A2,$B$2:$B$10,B2),"")
Adjust the row 10 references to the last row of your actual data.
Copy down as much you need.
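The idea behind this formula (show the group total on the first row of each date/client pair, and a blank on the rest) can be sketched in Python as follows. The function name and the sample records are invented for illustration; only the expected pattern (K2 = J2+J3+J4, K5 = J5, K6 = J6+J7) comes from the question.

```python
from collections import defaultdict


def group_totals_on_first_row(records):
    """records: list of (date, client, amount) tuples in sheet order."""
    totals = defaultdict(int)
    for date, client, amount in records:
        totals[(date, client)] += amount  # plays the role of SUMIFS

    seen = set()
    column_k = []
    for date, client, _ in records:
        key = (date, client)
        # Like COUNTIFS(...)=1: only the first row of a group shows the total.
        column_k.append("" if key in seen else totals[key])
        seen.add(key)
    return column_k


# Hypothetical rows 2..7 of the sheet: (column A, column B, column J).
records = [
    ("2015-01-01", "Doe Jane", 10),
    ("2015-01-01", "Doe Jane", 20),
    ("2015-01-01", "Doe Jane", 30),
    ("2015-01-02", "Doe Jane", 5),
    ("2015-01-03", "Moe John", 7),
    ("2015-01-03", "Moe John", 8),
]
print(group_totals_on_first_row(records))  # [60, '', '', 5, 15, '']
```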
EDIT
Uploaded file shows the cause behind formula not working correctly for you. It turned out to be whitespace characters in column B (names) data e.g.
Cell B3: "Moe John" has a trailing space.
Cell B10: Same case with "Doe Jane"
If you want to use the formula posted above, all the names must be corrected first. Alternatively, to cope with the spaces you can use the approach below.
=IF(COUNTIFS($A$2:A2,A2,$B$2:B2,"*"&TRIM(B2)&"*")=1,SUMIFS($J$2:$J$28,$A$2:$A$28,A2,$B$2:$B$28,B2),"")
Notice the change in COUNTIFS formula where B2 is now replaced with "*"&TRIM(B2)&"*".
Even this formula will struggle if you have uneven whitespace characters inside your data. I'd suggest normalizing the data as much as possible.
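If the data passes through a script before it reaches Excel, normalizing could look like the small Python sketch below. The helper name is hypothetical. It strips leading and trailing whitespace and collapses internal runs to a single space, which is similar in spirit to Excel's TRIM but covers every whitespace character Python recognizes.

```python
def normalize_name(name):
    """Strip the ends and collapse internal whitespace runs to one space."""
    return " ".join(name.split())


print(repr(normalize_name("Moe John ")))     # 'Moe John'
print(repr(normalize_name("  Doe   Jane")))  # 'Doe Jane'
```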

Check the number of unique cells in a range

I have an excel sheet.
Under column E, I have 425 cells with data. I want to check if the same data (i.e. text inside the cell) is repeated anywhere else in any of the remaining 424 cells under column E. How do I do this?
For example, in E54 I have
Hello Jack
How would I check this value to see if it was in any other of these cells?
You could use
=SUMPRODUCT(1/COUNTIF(E1:E425,E1:E425))
to count the number of unique cells in E1:E425
An answer of 425 means all the values are unique.
An answer of 421 means 4 values are duplicates of other value(s)
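Why summing 1/COUNTIF gives the number of distinct values: a value that appears n times contributes n terms of 1/n, which add up to exactly 1. A minimal Python sketch of the same arithmetic, assuming no blank cells and using a hypothetical function name:

```python
from collections import Counter
from fractions import Fraction


def count_distinct(values):
    """Sum 1/count over every cell; each distinct value adds up to exactly 1."""
    counts = Counter(values)
    return sum(Fraction(1, counts[v]) for v in values)


cells = ["Hello Jack", "Hi Jill", "Hello Jack", "Bye", "Hello Jack"]
print(count_distinct(cells))  # 3
```

Fractions are used so the total is exact; Excel does the same sum in floating point.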
Use Conditional Formatting on all the cells that will highlight based on this formula:
=COUNTIF(E:E,E1)<>1
This is based on the column being E, and starting on E1, modify otherwise.
In Excel 2010 it's even easier, just go into Conditional Formatting and choose
Format only unique or duplicate values
If you have to compensate for blank cells, take the formula supplied above by @brettdj and:
Adjust the numerator of your count unique to check for non-blanks.
Add a zero-length string to the COUNTIF's criteria argument.
=SUMPRODUCT((E1:E425<>"")/COUNTIF(E1:E425,E1:E425&""))
Checking for non-blank cells in the numerator means that any blank cell will return a zero. Any fraction with a zero in its numerator will be zero no matter what the denominator is. The empty string appended to the criteria portion of the COUNTIF is sufficient to avoid #DIV/0! errors.
More information at Count Unique with SUMPRODUCT() Breakdown.
This formula outputs "unique" or "duplicates" depending if the column values are all unique or not:
{=IF(
SUM(IF(ISBLANK(E1:E425),0,ROW(E1:E425)))
=
SUM(IF(ISBLANK(E1:E425),0,MATCH(E1:E425,E1:E425,0)))
,"unique","duplicates")}
This is an array formula. You don't type the enclosing {} yourself. Instead, enter the formula without {} and confirm it with cmd-enter on a Mac (Ctrl+Shift+Enter on Windows). If you want to split your formula over multiple lines for readability, use cmd-ctrl-return on a Mac.
The formula works by comparing two SUM() results. If they are equal, all the nonblank entries (numeric or text) are unique. If they are not equal there are some duplicates. The formula does not tell you where the duplicates are.
The first sum is what you get by adding up the row numbers of every non-blank entry.
The second sum does a lookup of each nonblank entry using MATCH(). If all entries are unique, MATCH() finds each entry at its own position, and the result is the same as the first sum. But if there are duplicate entries then a later duplicate will match an earlier duplicate and the later duplicate will contribute a different value to the sum, and the sums won't match.
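The two sums described above can be mimicked in Python to check the reasoning. This is only a sketch with a hypothetical function name; positions are 1-based to mirror ROW() for a range starting in row 1, and empty strings stand in for blank cells.

```python
def all_unique(values):
    """Compare the sum of each entry's own position with the sum of first-match positions."""
    non_blank = [(pos, v) for pos, v in enumerate(values, start=1) if v != ""]
    own_positions = sum(pos for pos, _ in non_blank)
    # list.index returns the first match, like MATCH(..., ..., 0)
    first_match_positions = sum(values.index(v) + 1 for _, v in non_blank)
    return own_positions == first_match_positions


print(all_unique(["x", "", "y", "z"]))  # True
print(all_unique(["x", "y", "x"]))      # False
```

Because a first-match position can never exceed the entry's own position, the two sums are equal only when every entry matches itself, so the test cannot pass by accident.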
You might have to adjust this formula:
if you want cells containing "" to count as blank, then use LEN(...)=0 for ISBLANK(...). I suppose you could put other tests in there if you wanted, but I have not tried that.
if you want to test an array not starting at row 1, then you should subtract a constant from ROW(...).
if you have a huge column of cells, you might get integer overflow when computing this sum. I don't have a solution to that.
It's a shame that Excel does not have an ISUNIQUE() function!
This may be a simpler solution. Assume column A contains the data in question. Sort on that column. Then, starting in B2 (or the first non-blank cell), use the following formula:
=IF(A2=A1,1,0)
Then sum that column. When the sum is 0, all values are unique.
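The sort-and-compare approach translates directly into a few lines of Python. In this sketch (hypothetical function name, made-up data) each value is compared with its predecessor in sorted order, and the matches are counted.

```python
def count_adjacent_repeats(values):
    """Sort, then count entries equal to the one just before them."""
    ordered = sorted(values)
    return sum(1 for prev, cur in zip(ordered, ordered[1:]) if prev == cur)


print(count_adjacent_repeats(["b", "a", "b", "c"]))  # 1
print(count_adjacent_repeats(["a", "b", "c"]))       # 0
```

A result of 0 means every value is unique, the same reading as the summed column B.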
Highlight column E and, on the Home tab, select Conditional Formatting > Highlight Cells Rules > Duplicate Values...
It will then highlight everything that is repeated.

How to use IF and SUM in excel to count unique entries in a row?

Basically I have a large set of data in Excel, and I was wondering how to count across a row how many cells are not #N/A. I think it should be possible with IF and SUM but I'm not entirely certain.
To count all values except blanks and #N/A errors, try COUNTIFS like this for data in row 2:
=COUNTIFS(2:2,"<>#N/A",2:2,"<>")
If you don't want to count duplicates then this version will give you a count of all different values (except blanks and errors)
=SUM(IF(1-ISERROR(2:2),(2:2<>"")/COUNTIF(2:2,2:2&"")))
That's an "array formula" that needs to be confirmed with CTRL+SHIFT+ENTER.
Note that the first formula uses the COUNTIFS function and therefore will not work in versions of Excel before 2007. This is an alternative that will work in those versions:
=COUNTA(2:2)-COUNTIF(2:2,"#N/A")
Try using =COUNTIF(RANGE, VALUE). Here's an example that will count the number of cells in column A containing "Yes":
=COUNTIF(A:A, "Yes")
or
=COUNTIF(A1:D16, "Yes")
To count the cells that contain a value (i.e., are not empty), use =COUNTA(A:A)
When you want to "mark" the duplicates, use this in an empty column:
=COUNTIF($A$2:$A2,A2)>1
Put the formula in row 2 and copy it all the way down to the last used row.
(What I usually do: Somewhere in column A, press [Ctrl]+[Down] to jump to the last item, then move sideways to the column where you want to put your formula and put something there, e.g. an "X". Then jump all the way up with [Ctrl]+[Up], put the formula in row 2, copy it, press [Shift]+[Ctrl]+[Down] to mark the whole range in this column from row 2 to the last used row, and press [Enter] to paste your formula.)
In this formula, the search area grows the further down you copy it.
So the first time an item is found, the count is 1 and the formula returns FALSE; the second, third or later times the same item is found, the count is greater than 1 and the formula returns TRUE.

Resources