Find duplicates with same number sequence - excel

I am currently trying to filter through loads of user data to find duplicate accounts. The best way to identify the users is by their telephone numbers.
Unfortunately, the numbers are not saved in the same format, nor do all the cells have the same number of digits. See below:
+1 912 555 1234
001 912 5551234
(912) 5551234
912 5551234
912-555-1234
Is there any way to run a duplicate search for just a certain sequence? In this case, 5551234.
I could just remove all the special characters (brackets, dashes, spaces etc.) manually with a simple "search and replace", right? But the cells would still have different numbers of digits, which is why a normal duplicate search does not work.
I really appreciate your help. Thank you a lot!

Assuming you can't use VBA, I've put together a quick series of functions to deal with all the examples you have above. It may not be comprehensive, but you'll get the general idea. Put all of the formulas below into row 2 of a spreadsheet (so you can use headings in row 1 if you wish).
Column A: Tel numbers
Column B (remove whitespace): =SUBSTITUTE(A2, CHAR(32),"")
Column C (remove brackets and dashes): =SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2, CHAR(40),""), CHAR(41),""),CHAR(45),"")
Column D (replace +1 with 0): =IF(LEFT(C2,1)="+","0"&RIGHT(C2,LEN(C2)-2),C2)
Column E (replace 001 with 0): =IF(LEFT(D2,3)="001","0"&RIGHT(D2,LEN(D2)-3),D2)
Column F (ensure leading 0): =IF(LEFT(E2,1)="0",E2,"0"&E2)
Just copy/paste the cells down, and all the numbers used in your example will have the same format (in column F).
Note that columns B/C could be combined easily into a single column, but I've left them separated to make it easier to understand how it works. The combined column would be
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2, CHAR(32),""), CHAR(40),""), CHAR(41),""),CHAR(45),"")
If you need to remove any more special characters (in addition to the brackets and dashes), you can find all the ASCII codes accepted by the CHAR function in this table.
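For anyone who wants to check the logic outside Excel, here is a minimal Python sketch of the same column B to F steps. The function names `normalize` and `find_duplicates` are made up for illustration, and like the formulas it only handles the formats shown in the question.

```python
import re
from collections import defaultdict


def normalize(number):
    """Mirror columns B-F: reduce a phone number to one common format."""
    # Columns B/C: drop spaces, brackets and dashes.
    cleaned = re.sub(r"[ ()\-]", "", number)
    # Column D: replace a leading "+1" with "0" (assumes a one-digit country code).
    if cleaned.startswith("+"):
        cleaned = "0" + cleaned[2:]
    # Column E: replace a leading "001" with "0".
    if cleaned.startswith("001"):
        cleaned = "0" + cleaned[3:]
    # Column F: make sure the result starts with "0".
    if not cleaned.startswith("0"):
        cleaned = "0" + cleaned
    return cleaned


def find_duplicates(numbers):
    """Group raw numbers by normalized form and keep groups with more than one entry."""
    groups = defaultdict(list)
    for raw in numbers:
        groups[normalize(raw)].append(raw)
    return {key: raws for key, raws in groups.items() if len(raws) > 1}
```

Calling `find_duplicates` on the five sample numbers from the question should put them all in a single group, since each one normalizes to the same string.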

Related

Excel automatically converting 7 digit CAS number to another number (date?)

Problem: I am working with two lists, one called HYPHEN and one called CAS Number, in columns A and B respectively.
Column C uses a formula that combines column A and B and sorts them such that if a hyphen is present in column A, this is inserted before the adjacent CAS number which is then inserted below and the sequence continues so that all hyphens and CAS numbers are included. I've attached an image to better explain this and the Formula to replicate this is given below.
A CAS number is a unique identifier for a material/chemical and is usually written as 000-00-0; however, occasionally you get materials with CAS numbers of the form 0000-00-0 (or other variations).
For the most part column C is correct, because all but one of the CAS numbers are in the usual format. However, as highlighted in red, 6132-04-3 is being converted to 1545801.
What I have tried:
I have realised that 6132-04-3 is being converted to 03/04/6132, so I'm pretty sure it is being recognised as a date, which is causing the problem. I have tried formatting all the cells as text and adding a comma before the CAS number, but nothing returns the desired value of 6132-04-3; the result is always 1545801.
To replicate the issue: Column A and B can have any data entered. To replicate the output of column C the formula is given below:
Formula for Column C:
=FILTERXML("<a><b>"&SUBSTITUTE(TEXTJOIN(",",TRUE,A2:B26),",","</b><b>")&"</b></a>","//b")
(Formula provided by #Gary's Student on Stack Overflow)
Any thoughts on how to prevent the red CAS number being converted when it is sorted in Column C would be really appreciated.
This is a crude way to fix it by adding then removing an arbitrary character:
=MID(FILTERXML("<a><b>"&SUBSTITUTE(TEXTJOIN(",",TRUE,IF(A2:B26="","","x"&A2:B26)),",","</b><b>")&"</b></a>","//b"),2,99)
If you have the issue of some of your strings containing a comma, just use a different separator:
=MID(FILTERXML("<a><b>"&SUBSTITUTE(TEXTJOIN("|",TRUE,IF(A2:B26="","","x"&A2:B26)),"|","</b><b>")&"</b></a>","//b"),2,99)
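To make the mechanics of the formula easier to follow, here is a rough Python sketch of the same idea: join the non-empty cells row by row into XML, guard each value with a throwaway character, then strip that character again after parsing. Python does not coerce text to dates, so this only illustrates the steps, not the Excel bug itself; `interleave_as_text` and its parameters are hypothetical names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def interleave_as_text(col_a, col_b, guard="x"):
    """Read two columns row by row, skip blanks, and return the values as text."""
    # TEXTJOIN(..., TRUE, A2:B26) walks the range row by row and ignores empty cells.
    values = [v for pair in zip(col_a, col_b) for v in pair if v != ""]
    # Prefix every value with the guard character, like "x"&A2:B26 in the formula.
    xml = "<a>" + "".join(f"<b>{escape(guard + v)}</b>" for v in values) + "</a>"
    # FILTERXML(..., "//b") returns every <b> node.
    nodes = ET.fromstring(xml).findall("b")
    # MID(..., 2, 99) drops the guard character again.
    return [node.text[len(guard):] for node in nodes]
```

The guard character is what stops a value such as 6132-04-3 from looking like a date while it passes through FILTERXML.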
Looks like you could use:
Formula in D2:
=SUBSTITUTE(FILTERXML("<t><s>'"&TEXTJOIN("</s><s>'",,A2:B10)&"</s></t>","//s"),"'","")
Or:
=MID(FILTERXML("<t><s>'"&TEXTJOIN("</s><s>'",,A2:B10)&"</s></t>","//s"),2,99)
I can suggest you this:
Go to Format Cells ---> Number ---> Custom ---> Type
In this "Type" field write this #000-00-0
Press "OK"

SUMIFS Excel Multi Criteria

I'm trying to use SUMIFS to calculate the total QTY changes for drawings with a Material ID that matches one of the Pipe Ident Numbers, and to ignore the ones that are NOT listed in the Pipe Ident Numbers column. Below is what I have so far, but I know the "=D:D" looks wrong.
I don't know how to reference all the numbers in the Pipe Ident Numbers list (D:D) as each their own criterion, so that if the Material ID matches ANY of the Pipe Ident Numbers it will be summed.
I need help with this because my actual Pipe Ident Numbers column has about 100 other numbers, not just the 5 shown.
Also, I'm not sure whether I need to have the drawing number in the formula somewhere too. Maybe that's what I'm missing...
=SUMIFS(C:C,B:B,"=D:D")
Any help would be amazing! Thanks :)
You can use the following.
If you want to compare row by row:
=SUMPRODUCT(c2:c6*(b2:b6=d2:d6))
If you want to compare the value in column B with all values in column D (if you do not use Excel 365, press CTRL+SHIFT+ENTER to enter the formula):
=SUMPRODUCT(c2:c6*(b2:b6=TRANSPOSE(d2:d6)))
Best Regards
Chris
Use:
=SUMPRODUCT(SUMIFS(C:C,B:B,D2:D6))
One can use D:D but that would do many unneeded calculations. One should limit the range to the dataset.
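As a cross-check of what these formulas compute, here is a small Python sketch that sums the quantities whose Material ID appears anywhere in the Pipe Ident list. The function name and the sample values in the note below are invented for illustration.

```python
def sum_matching(material_ids, quantities, pipe_idents):
    """Sum each quantity whose material ID is found in the pipe ident list."""
    wanted = set(pipe_idents)  # fast membership test, also ignores repeated idents
    return sum(qty for mat, qty in zip(material_ids, quantities) if mat in wanted)
```

For example, `sum_matching([101, 102, 103, 101], [5, -2, 7, 1], [101, 103])` adds 5, 7 and 1. One difference worth knowing: because the sketch uses a set, an ident that appears twice in column D is counted once, whereas SUMPRODUCT(SUMIFS(...)) would add its matches once per occurrence.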

Sort text alpha numerically in excel

I have part number data in a spreadsheet that has been converted to text (not numeric, as there are letters) and that I need to sort alphanumerically. From what I have read, this appears to be almost impossible due to nulls (I have none of these) and dashes (I have tons of these). As you will see below, there are multiple letters and numbers in different positions in the field.
MS16624-2066
RWR80S
02-6009-23
23032-1910
31708-1370
11SM1-T
111SM1-5
The final result required is:
MS16624-2066
RWR80S
02-6009-23
11SM1-T
111SM1-5
23032-1910
31708-1370
I have tried as much as I could by looking at the sorts in this forum, but have had no luck. Can anyone suggest a working approach?
Assuming part numbers are in ColumnA starting in A1, in B1:
=SUBSTITUTE(A1,"-","")
Copy down to suit then Copy ColumnB and Paste Special, Values over the top.
Apply Text to Columns, Fixed width to ColumnB and choose character by character (positions 1 to 11 for your example).
Then sort A:M on ColumnC descending and move the rows with C numbers below the C letters.
You may then choose to delete ColumnsB:M.
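The same ordering can be sketched in Python with a sort key, assuming the rule implied by the required result is: ignore dashes, compare character by character, and rank letters ahead of digits. That rule is inferred from the example, not stated in the question, and `part_key` is a made-up name.

```python
def part_key(part):
    """Sort key: dashes ignored, compared per character, letters before digits."""
    stripped = part.replace("-", "")
    # Each character becomes (0, ch) for a letter or (1, ch) for anything else,
    # so any letter sorts ahead of any digit in the same position.
    return [(0 if ch.isalpha() else 1, ch) for ch in stripped]


parts = [
    "MS16624-2066", "RWR80S", "02-6009-23", "23032-1910",
    "31708-1370", "11SM1-T", "111SM1-5",
]
ordered = sorted(parts, key=part_key)
```

With the seven sample part numbers, `ordered` comes out in the sequence listed as the required result in the question.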

Comparing two columns in excel for similarities

I have two columns in excel A and B from 1 - 1400
The value in column A is 11 characters, "K0123456789", and the value in column B is 10 characters, "0123456789".
I need to check that the value in column B is the same as the value in column A without the "K", and highlight it if they do not match. I am not too familiar with Excel, so any information here would help, so that I do not have to go through all these lines myself on a daily basis.
Thanks for any help!
GenZade
You can put a formula in column C such as (For cell C1):
=IF(A1="K"&B1,"Match","No Match")
Of course, you could also add conditional formatting with a similar formula if you want to literally highlight it.
I would have just commented on neelsg's post, but I don't have enough reputation for that, apologies. His solution compares string values rather than numerical values, so it might not work well if some values have leading zeros. To compare actual numerical values you can use the following:
=IF(RIGHT(A1,LEN(A1)-1)*1=B1,"match","no match")
So it depends on whether you want to match them as strings or as actual numbers.
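The difference between the two comparisons can be sketched in Python as follows. The function names are hypothetical, and the second function assumes everything after the first character is digits.

```python
def matches_as_text(a, b):
    """Like =IF(A1="K"&B1, ...): exact text comparison, leading zeros matter."""
    return a == "K" + b


def matches_as_number(a, b):
    """Like =IF(RIGHT(A1,LEN(A1)-1)*1=B1, ...): compare the numeric values."""
    return int(a[1:]) == int(b)
```

If column B has lost its leading zero (for example "123456789" against "K0123456789"), the text comparison reports a mismatch while the numeric comparison still reports a match.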

Excel VBA - How to sort by first two letters of cell?

I have two columns I want to sort. I want it to look exactly like below with Column A first sorted by its first two letters, then by Column B.
Column A - Column B
AB - Info 3339876
AB - Data 3339877
AB - Data 3339878
AC - Info 3339123
AC - Data 3339124
AC - Info 3339125
AD - Info 3339456
AD - Info 3339457
AD - Data 3339458
The first two letters of Column A are the MOST important and the data must be sorted by them first. The information after the first two letters of Column A is irrelevant and does not matter. It is much more important that the Column B number data be sorted in ascending order second (after the first two letters of Column A).
Sorry for the confusion. Hopefully that clears things up.
Sort normally by highlighting the column and pressing Alt + A + SA.
Update:
'Excel sorts alphanumeric text left to right, character by character', so it may not properly order your numbers if they are combined into a single cell. You should create an extra column with just the two-letter code that you want to use for sorting by using =LEFT(A2,2) and copying it all the way down. Then do a multi-level sort by clicking the Sort button, sorting first on the two-letter code alphabetically and second on the number column.
The answer above is not exactly correct: it works only as long as all of the numbers have the same length. But if we have
AA 111
AA 99
then excel sorts it like this:
AA111
AA99
(we want the other way around). You should use the LEFT function as mentioned, but only to create a column containing the first two letters of the first column. Then you have to use the custom sort
http://office.microsoft.com/en-us/excel-help/sort-data-using-a-custom-list-HA102809333.aspx to sort first by the new column and then by the column with numbers.
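The two-level sort described above can be sketched in Python as a sort on a pair of keys: the first two letters of column A, then column B as a number. This assumes each row is a (text, number) pair with the number stored as a real number and not as text; `sort_rows` is a hypothetical name.

```python
def sort_rows(rows):
    """Sort (column_a, column_b) rows by the first two letters of A, then by B."""
    # row[0][:2] plays the role of the =LEFT(A2,2) helper column;
    # row[1] is compared numerically, so 99 sorts ahead of 111.
    return sorted(rows, key=lambda row: (row[0][:2], row[1]))
```

Because the second key is numeric, "AA" with 99 comes before "AA" with 111, which is the order a plain text sort gets wrong.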
Select the cells and sort normally.

Resources