excel 2007 duplicate row - excel

How can I compare records in a table to make sure they are not duplicates? I am using Excel 2007, and I don't want the duplicates deleted after the comparison.
Duplicate rows should be colored. My table has columns A to P and 500 rows. I want the comparison to use columns A, B, E, F, G and I.

If you don't want to sort your column, you can try an array formula (http://www.stanford.edu/~wfsharpe/mia/mat/mia_mat4.htm).
In practice, you can compare the current row to every row above it. Something like:
=MIN(ROW(B1)*(IF(A2=A1;1;0))*(IF(B2=B1;1;0)))*(...)
When entered with CTRL-SHIFT-ENTER, this checks whether all the conditions are true; otherwise it returns 0.
Please send a file (with anonymized data) if you want a practical example.
Hope that helps
Edit: here is a working solution, assuming you want to compare the data in column Q (LIGNE and EQUIV are the French-locale names of ROW and MATCH):
=MIN(LIGNE($Q$5:Q6)*EQUIV(Q6;$Q$5:Q6;0))
Use this if you want the first row in which the value appears.
=MIN(LIGNE($Q$5:Q5)*EQUIV(Q6;$Q$5:Q5;0))
Use this if you would rather get #N/A when there is no duplicate before that row.
Both still need to be entered with CTRL-SHIFT-ENTER.

Sort by the columns you are interested in then use a formula to compare each row with the one above. You can then use conditional formatting to colour the results.

This may sound stupid, but the simple answers are usually the best.
I did this recently by using the CONCATENATE() function with the TEXT() function to combine all the columns I wanted to compare into a single cell. In effect, I am creating a cell with a unique "key" that holds all the data I want to be unique.
I then sort by that column and create another empty column next to it.
Then I use this formula to compare each row with the row above it: =IF(A2=A1,0,1)
This puts a 0 where the key is the same as in the row above and a 1 where it is different.
I then filter on the 0s and there are my duplicates! (The 1s mark the first occurrence of each key.)
It's also useful as an alternative way of doing a COUNT(DISTINCT ...), where I want to count how many unique references exist in my data: summing the 1s gives that count. SUBTOTAL(3...) is not enough.
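The key-sort-compare idea above can be sketched outside Excel to make the steps concrete. This is a minimal Python illustration, not the answerer's workbook: the sample rows, the `|` separator and the name `flag_after_sort` are all assumptions made for the example.

```python
# Hypothetical sample data standing in for the worksheet rows.
rows = [
    ("alice", "paris", 10),
    ("bob", "rome", 20),
    ("alice", "paris", 10),
    ("carol", "oslo", 30),
]


def flag_after_sort(rows, key_columns):
    """Build a concatenated key per row, sort, and compare with the row above.

    Returns (key, flag) pairs: flag is 1 for the first row of each key and
    0 for a row whose key equals the key directly above it, which mirrors
    =IF(A2=A1,0,1) on a sorted key column.
    """
    # Join the chosen columns with a separator, like CONCATENATE() with TEXT().
    keys = sorted("|".join(str(row[i]) for i in key_columns) for row in rows)
    flagged = []
    previous = None
    for key in keys:
        flagged.append((key, 0 if key == previous else 1))
        previous = key
    return flagged


flags = flag_after_sort(rows, key_columns=(0, 1, 2))
duplicates = [key for key, flag in flags if flag == 0]
distinct_count = sum(flag for _, flag in flags)
```

Here `duplicates` holds the keys that repeat a row above them, and `distinct_count` is the sum of the 1 flags, which is the distinct-count use mentioned in the answer.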

Related

Find a value in a range (any of multiple column or rows) and return the value in far left most column of that row

I am looking for something like a typical INDEX/MATCH formula that can look to the right and return a value from the left, but that searches all columns in a range. Take the valid formula below as an example.
(Excel 2021.)
Finds A1's value in column D, and it returns value from column C.
=INDEX($C$1:$C$10,MATCH(A1,$D$1:$D$10,0))
In my ideal world I could keep $D$1 and change $D$10 to $F$10 so that it searches all of columns D, E and F and still returns the value from C, as below. However, that does not work in the real world. Any other ideas, please? Thanks!
=INDEX($C$1:$C$10,MATCH(A1,$D$1:$F$10,0))
Update:
To clarify, the values are a mix of letters and numbers. Also, this table will have about 50k rows, so I am hoping for something as simple as possible.
Column C will definitely contain only unique values. D to F should contain unique values too, but there is a chance that a few duplicates have slipped in by mistake.
You need MMULT() with INDEX(). Try the formula below if you have Excel 365.
=FILTER(C1:C10,MMULT(--(D1:F10=A1),SEQUENCE(COLUMNS(D1:F1))))
For older version try
=INDEX($C$1:$C$10,LARGE(MMULT(--($D$1:$F$10=A1),TRANSPOSE({1,1,1}))*ROW($C$1:$C$10),1))
Since your INDEX/MATCH take from the same rows, you can first simplify your original search with
=XLOOKUP(A1,$D$1:$D$10,$C$1:$C$10)
XLOOKUP combines HLOOKUP and VLOOKUP with exact match being the default.
This will work for searching three columns:
=IFERROR(IFERROR(XLOOKUP(A1,$D$1:$D$10,$C$1:$C$10), XLOOKUP(A1,$E$1:$E$10,$C$1:$C$10)), XLOOKUP(A1,$F$1:$F$10,$C$1:$C$10))
We can name the columns colC, colD, colE and colF, and it becomes:
=IFERROR(IFERROR(XLOOKUP(A1,colD,colC), XLOOKUP(A1,colE,colC)), XLOOKUP(A1,colF,colC))
As with other lookups, this returns the first matching value or an #N/A error.
This could be made more scalable for a larger number of rows if we are allowed to add a column somewhere.
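To show the intent of these lookups outside Excel, here is a minimal Python sketch that searches several columns and returns the value from the leftmost one. The table contents, the column labels used as dictionary keys and the name `build_index` are assumptions for illustration. Unlike the nested XLOOKUP, which exhausts column D before trying E and F, this sketch keeps the first match in row order, and it compares text case-sensitively.

```python
def build_index(table, search_columns, return_column):
    """Map every value found in the search columns to the return column.

    The index is built in a single pass so that repeated lookups stay cheap
    on large tables. If a value occurs more than once, the first row wins.
    """
    index = {}
    for row in table:
        for column in search_columns:
            # setdefault keeps the earliest row when a value is duplicated.
            index.setdefault(row[column], row[return_column])
    return index


# Hypothetical rows: C is the value to return, D to F are searched.
table = [
    {"C": "id-1", "D": "a1", "E": "b7", "F": 42},
    {"C": "id-2", "D": "a2", "E": "b8", "F": 43},
    {"C": "id-3", "D": "a3", "E": "b7", "F": 44},  # "b7" duplicated by mistake
]

index = build_index(table, search_columns=("D", "E", "F"), return_column="C")
found = index.get("b8")      # value from column C of the matching row
missing = index.get("zz")    # None plays the role of #N/A here
```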

excel matching two arrays

I have a problem that seems pretty easy, but I still cannot find a proper solution. I want to avoid using VBA.
I have two tables in one spreadsheet. Both have the same columns: Name, City, Province.
My goal is to compare the two, and if all three values in a row match, return 1; if not, return 0.
I have tried the formulas below, but they do not work for my case.
=IF(AND(A2=P:P,G2=M:M,H2=L:L),1,0)
=INDEX(A:P,MATCH(A2,P:P,FALSE),MATCH(G2,M:M,FALSE),2)
=INDEX(L:P,MATCH(A5,P:P,0),MATCH(G5,M:M,0),MATCH(H5,L:L,0))
=SUMPRODUCT(--(L2:L60=H2),--(M2:M60=G2),--(P2:P60=A2),B2:B60)
It seems that the solution is quite simple, but I cannot find it.
Thanks in advance!
The key here is to merge the columns together, then MATCH on that.
Like this:
=IFERROR( IF( MATCH(H3&"_"&I3&"_"&J3, $C$2:$C$60&"_"&$B$2:$B$60&"_"&$A$2:$A$60,0), "Yes"), "No")
Choose a separator character that doesn't otherwise appear in your data (I've chosen _).
Assumption: the values just need to exist; they do not need to be in the same row.
=If(IfError(Match(A2,P:P,0),0)*IfError(Match(G2,M:M,0),0)*IfError(Match(H2,L:L,0),0)>0,1,0)
Each IfError outputs a row number (>0) if there is a match, or zero if there is no match. Anything multiplied by zero is zero, which gives a 1 or 0 output for true/false in the overarching If statement.
If they need to be of the same row, you can compare 2 matches, which rely on the transitive property (A=B, B=C, so A=C):
=If(And(Match(A2,P:P,0)=Match(G2,M:M,0),Match(G2,M:M,0)=Match(H2,L:L,0)),1,0)
Edit1:
Per my comment (to this answer) about false negatives, a UDF or subroutine in VBA would be more appropriate, considering Match() returns the first row that has a match.
As this is not a VBA tagged post, this is a bit above the expected answer... My recommendation would be to:
A) Ensure you are comfortable using VBA.
B) Make a post about creating a user-defined function (note that any post on here about VBA has an expectation that the poster can interact with an expert on the topic and will be putting forth effort to write the code themselves, as StackOverflow is not a code-for-you service).
To help give a lead on what may be in your UDF:
A loop to go through the values from first row to last row in the search column (i.e., L, M, & P)
A variable to dynamically identify the last row of your search column
An if-statement to compare values from your lookup values (i.e., A2, G2, H2) to the search values at the current iteration of the loop
An output of 1 (has match) or 0 (no match).
There are many ways to go about this with VBA; hopefully that's a good start for you, Irina!
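The loop outlined for the UDF can be sketched as follows. Python is used here instead of VBA purely to show the logic; the function name `same_row_match` and the sample data are assumptions. Iterating over the list directly takes the place of finding the last row dynamically.

```python
def same_row_match(lookup, search_rows):
    """Return 1 if some row matches all lookup values at once, else 0."""
    for row in search_rows:
        # All values must match within the same row, not just exist somewhere.
        if tuple(row) == tuple(lookup):
            return 1
    return 0


# Hypothetical search table: (Name, City, Province).
search_rows = [
    ("Ann", "Lyon", "ARA"),
    ("Bob", "Calgary", "AB"),
    ("Ann", "Toronto", "ON"),
]

# "Ann" first appears in a row that does not match, which is the case where
# comparing first-match positions gives a false negative; the loop finds row 3.
result_full = same_row_match(("Ann", "Toronto", "ON"), search_rows)

# Every value exists somewhere, but never together in one row.
result_partial = same_row_match(("Bob", "Toronto", "ON"), search_rows)
```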

Update a column in one spreadsheet by matching a key column in a second spreadsheet

I have two excel files which have the same unique key and would like to update data from one file to another. To be more specific: I have FileA that has the unique key on Column B and FileB that has the unique key on Column B as well. I would like to update FileA:ColumnK from FileB:ColumnD BUT the records are not in the same order!
That means that row 14 on FileA is row 525 on FileB. So my solution would be on cell K14=FileB:D525...
I found a formula to check for duplicates. It works, but I want to pull data from one file to another from different rows!
How can I accomplish that?
As Jeeped noted, you should use an INDEX(MATCH()) pair or the VLOOKUP() function.
This is what your VLOOKUP() function should look like (assuming it's entered in cell K2 of your FileA, since column K is the one you want to update):
=VLOOKUP(B2,[FileB.xlsx]Sheet1!$B$2:$D$10,3,0)
The B2:D10 range in the second argument should be expanded to include all of the data in FileB. The dollar signs make the reference absolute, so it doesn't change when you copy the formula down to all the cells in FileA's column K.
As for INDEX(MATCH()) pair, this is an example:
=INDEX([FileB.xlsx]Sheet1!$B$2:$D$10,MATCH(B2,[FileB.xlsx]Sheet1!$B$2:$B$10,0),3)
The syntax is a bit more complex, but this combination is generally faster than VLOOKUP(). The ranges B2:D10 and B2:B10 in this example should also be expanded to include all the real data.
Either way, it is worth reading Excel's built-in help on these functions, at least to learn what their arguments are.
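What both formulas do is a key-based lookup that ignores row order. A minimal Python sketch of the same idea follows; the field names `key`, `d` and `k` and the sample rows are hypothetical stand-ins for column B, FileB column D and FileA column K.

```python
def update_by_key(target_rows, source_rows, key, source_field, target_field):
    """Fill target_field in each target row from the source row with the same key."""
    lookup = {}
    for row in source_rows:
        # Keep the first occurrence, as VLOOKUP and MATCH would.
        lookup.setdefault(row[key], row[source_field])
    for row in target_rows:
        # Rows with no matching key get None, where Excel would show #N/A.
        row[target_field] = lookup.get(row[key])
    return target_rows


# Hypothetical data: the same keys appear in a different order in each file.
file_b = [
    {"key": "P-002", "d": "blue"},
    {"key": "P-001", "d": "red"},
]
file_a = [
    {"key": "P-001", "k": None},
    {"key": "P-002", "k": None},
    {"key": "P-003", "k": None},
]

update_by_key(file_a, file_b, key="key", source_field="d", target_field="k")
```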

Vlookup and get the min value (date)

The top table is the input, and the bottom table is a preview of the required output.
For each ID I need to find the earliest datetime. I also need the information from the other columns (please see image below).
My current solution is:
In Cell E2 =A2
Cell E3 drag down =IF(E2<>A3,IF(E1=A3,"",A3),"")
In Cell F2 drag down =IF(E2<>"",MIN(IF($A$2:$A$14=E2,$C$2:$C$14)),"") Ctrl+Shift+Enter
One more option without any intermediate calculations:
Select the whole range from E2 down to the last row where IDs are located. For the sample given that is row 14, so select the range E2:E14 and enter:
=IFERROR(INDEX($A$2:$A$14,SMALL(IF(MATCH($A$2:$A$14,$A$2:$A$14,0)=ROW(INDIRECT("1:"&ROWS($A$2:$A$14))),MATCH($A$2:$A$14,$A$2:$A$14,0),""),ROW(INDIRECT("1:"&ROWS($A$2:$A$14))))),"")
Press CTRL+SHIFT+ENTER instead of the usual ENTER. This defines a multicell ARRAY formula and results in curly {} brackets around it (but do NOT type them manually!).
F2 (ID2): =IF(E2="","",SUMPRODUCT(--(E2=$A$2:$A$14),--(G2=$C$2:$C$14),$B$2:$B$14)) - normal formula.
G2 (Min Date): =IF(E2="","",MIN(IF(E2=$A$2:$A$14,$C$2:$C$14,2^100))) and press CTRL+SHIFT+ENTER instead of usual ENTER - this will define an ARRAY formula and will result in curly {} brackets around it (but do NOT type them manually!).
H2 (InCh): =IF(E2="","",INDEX($D$2:$D$14,SUMPRODUCT(--(E2=$A$2:$A$14),--(F2=$B$2:$B$14),--(G2=$C$2:$C$14),ROW(INDIRECT("1:"&ROWS($D$2:$D$14)))))) - normal formula.
Remarks:
To make the solution more compact and easier to read, define a named range for the ID column, and then reference the other data columns using OFFSET.
ID2 values may not be unique, as in the sample for IDs 1...3.
The resulting Min Date column should be formatted the same way as the source Date column.
The key formula of the solution is the multicell monster, which returns unique IDs without empty rows, as the OP requested.
Sample file: https://www.dropbox.com/s/d2098updfh8djnf/MinDateIDs.xlsx
This is quite a challenge... I think I have found an approach that works. For the sake of clarity, I used a few helper columns. Also, I did not use any named ranges but stuck with the column-row indications. You might want to change that.
It looks like this:
and zooming in to the relevant columns:
Column F contains an array formula to filter out duplicates. An approach is explained here. The formula I used in F2 is
=INDEX($A$2:$A$14, MATCH(MIN(IF(COUNTIF($F$1:F1,$A$2:$A$14)=0, 1, MAX((COUNTIF($A$2:$A$14, "<"&$A$2:$A$14)+1)*2))*(COUNTIF($A$2:$A$14, "<"&$A$2:$A$14)+1)), COUNTIF($A$2:$A$14, "<"&$A$2:$A$14)+1, 0))
Use Ctrl-Shift-Enter to confirm as array formula. Drag this down or copy into column F. Then columns G and H contain the starting and ending indices of the duplicate ID values. This answer helped, please upvote it :-). The two formulas used are:
=MATCH(2,1/FREQUENCY($F2,$A$2:$A$14))
in G2, and
=FREQUENCY($A$2:$A$14,$F2)
in H2. Again, drag them down to get the full column filled. Next, column I is for clarification only -- and for sanity checking. It contains the desired minimum date from each sub-array. Column J substitutes that formula into a MATCH to find the actual index of the desired date.
=MIN(OFFSET($C$2:$C$14,$G2-1,0,1+$H2-$G2,1))
in I2 and
=$G2-1+MATCH(2,1/FREQUENCY(MIN(OFFSET($C$2:$C$14,$G2-1,0,1+$H2-$G2,1)), OFFSET($C$2:$C$14,$G2-1,0,1+$H2-$G2,1)))
in J2. Finally, columns L, M and N index into the original set of data via
=INDEX(B$2:B$14,$J2)
in L2, which you can drag horizontally and then vertically.
When you are done, you can hide the helper columns, or fold everything into big formulas. Good luck with that... There might be an easier way to achieve this, but I did not find it.
If you want the value from column D in G then assuming that column C values are unique you could just use a VLOOKUP, i.e. in G2 copied down
=VLOOKUP(F2,C$2:D$14,2,0)
Per your picture, they're all in the same sheet. Just sort by ID, then Date (ascending). As you work your way down the ID column, each time the ID changes, you know you've found the row with the minimum Date for that specific ID. Create an extra column to signify where ID changes occur, and filter for those rows (hide the column if you so desire).
And... voila.
I know this thread is old, but there is a much shorter and easier way!
How about using a pivot table with Minimum as the field setting, and then a =GETPIVOTDATA() to get the information back?
That seems a lot simpler than these formulas!
Actually, I just realized I've been overthinking this...Excel keeps the top item and removes all that follow when removing duplicates.
So if you are going to create an extra working table anyway, why not just copy the range/columns you want to keep, then use the basic sort.
Sort first by ID, then by the column you want as the second filter. Be sure the sorts are in the order you want (e.g. newest to oldest, oldest to newest, A to Z, Largest to smallest, etc).
Once the data is sorted, remove duplicates based on ID. You are left with all of your columns of data, filtered by newest/oldest/largest/smallest per individual.
This worked for my table with 30,000+ records, filtered down to 1500 unique individuals with most recent (plus associated amount), and with a second filter, the largest (plus associated date) for each person.
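The sort-then-remove-duplicates approach can be sketched in Python to show why it yields the earliest date per ID. The record layout (ID, ID2, date, InCh) follows the columns named in this thread, but the sample values and the name `earliest_per_id` are assumptions.

```python
from datetime import datetime


def earliest_per_id(records):
    """Sort by ID and then date, and keep only the first row seen for each ID."""
    ordered = sorted(records, key=lambda record: (record[0], record[2]))
    seen = set()
    kept = []
    for record in ordered:
        # After sorting, the first row of each ID carries its minimum date,
        # so later rows with the same ID are dropped, like Remove Duplicates.
        if record[0] not in seen:
            seen.add(record[0])
            kept.append(record)
    return kept


# Hypothetical input rows: (ID, ID2, date, InCh).
records = [
    (2, "b", datetime(2015, 1, 5, 8, 30), "q"),
    (1, "a", datetime(2015, 3, 2, 9, 0), "x"),
    (1, "c", datetime(2015, 1, 9, 17, 45), "y"),
    (2, "d", datetime(2015, 1, 4, 12, 0), "z"),
]

result = earliest_per_id(records)
```

Each kept row still carries its other columns, which is what the question asks for alongside the minimum date.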

Question regarding optimal excel function implementation

I have a question about Excel! I hope that isn't too unconventional for this site...
So I have an Excel table with several thousand rows. It is set up somewhat like a database, in that the first three of my four columns hold numerical values identifying the sequence or order of the content that the fourth column contains.
I am running into some possible duplication issues, and I am remembering back to my college days something about there being a function for the type of test I need to do. I need to verify that there are no two rows that have the same values for column 1-3. There should never be a time where all three columns' values match exactly that of another row.
Is VLookUp the function I need? Any excel experts out there that know of a function I could look into? Thanks so much!
The quick one-off solution I use for this kind of task is the following:
Create a single key in a temporary column, say F: "=A2 & B2 & C2 ..." for a combined key. I copy this formula all the way down.
Create a group counter for that single key, say G: "=IF(F2=F1,G1+1,1)". I can safely include the header row here because it sends the formula into the false part. Note that the counter compares each key with the row above, so the table must be sorted by F for identical keys to be adjacent.
This formula in G numbers all identical keys from 1 to N and starts again at 1 for a new key. I copy this formula all the way down.
Important: convert the G formulas into values (copy / paste special onto itself).
Sort descending by G and delete or manipulate all rows where the counter <> 1, or use AutoFilter.
Later on I delete columns F and G.
This may sound a bit complicated, but especially in large tables VLOOKUP, COUNTIF and similar functions can be very time consuming.
Hope that helps
You could create another column that concatenates the first 3, then do a countif on that. Let's say the concatenation column is D and your data begins in the second row:
=countif(D:D,D2)
Copy the formula down, then filter on >1.
I think what you need is the COUNTIFS function.
Assume you enter this formula in a cell in row 4:
=COUNTIFS(A:A,A4,B:B,B4,C:C,C4)
and copy the formula down the whole column.
Then the rows with a value of 1 are unique, while those with a value larger than 1 have duplicates.
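As a rough illustration of what the COUNTIFS column computes, here is a Python sketch that counts how often each combination of the first three columns occurs. The sample rows and the name `count_key_occurrences` are assumptions made for the example.

```python
from collections import Counter


def count_key_occurrences(rows, key_columns=(0, 1, 2)):
    """For each row, return how many rows share its values in key_columns."""
    keys = [tuple(row[i] for i in key_columns) for row in rows]
    counts = Counter(keys)
    # One count per row, like copying the COUNTIFS formula down the column.
    return [counts[key] for key in keys]


# Hypothetical rows: three sequence columns followed by the content column.
rows = [
    (1, 1, 1, "intro"),
    (1, 1, 2, "body"),
    (1, 1, 1, "intro again"),
]

occurrences = count_key_occurrences(rows)
duplicated_rows = [row for row, n in zip(rows, occurrences) if n > 1]
```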
If you only need to check the data once, try the "Remove duplicates" functionality. This can be found in the Data tab -> Data Tools -> Remove Duplicates. Just unselect all but the first three columns in the dialog and Excel will do the rest.

Resources