PROBLEM:
hey y'all, i have a large dataset of both domestic and international phone numbers formatted in various ways that i need to convert to a particular format based on specific criteria.
example of current phone number formats in the dataset:
###-##-##-####-####
+##-##-####-####
(###) ###-####
+## (#) ## ### ## ##
##-##-######-#
as you can see, the phone number formats vary greatly and there are many more examples that i did not list. i work with datasets averaging 1000+ rows.
what i try varies depending on how much data cleanup i need to perform, but below are some of my current methods.
Approach 1: Manually editing
i have attempted manually updating the phone numbers to my desired formatting. however this is time consuming and leads to user error.
Approach 2: CTRL+1 "Format Cells"
i start by sorting my list of numbers. then follow ctrl+1 > Number > Custom to format the following:
domestic as 000-000-0000, UK as +##-##-####-####, etc.
the issue with this method is that the "Custom" format only changes how the numbers are displayed. the underlying values are unchanged, so the special characters (e.g. "-", "+") do not exist within the string, which means i cannot import the numbers into my crm.
i have attempted to manually add "'" at the beginning of each formatted phone number, but it removes the special formatting. e.g. ###-###-#### just becomes '##########.
Approach 3: Functions
i have tried using the following functions on domestic phone numbers, but they only work if the formatting follows ###-###-####, which is not always the case for the data i work with.
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"(",""),")","")," ",""),"-","")
or
=MID(A1,2,3)&MID(A1,7,3)&RIGHT(A1,4)
Approach 4: Macro
i've attempted recording macros, but this does not work properly since the length/formatting of a cell value and size of a sheet always varies.
Approach 5: VBA script
i am currently exploring various scripts. there are a ton of examples on stackoverflow, but most presume clean data formatted as (###) ###-####. so the scripts do not work for me.
this post was helpful as a first step to removing all special characters from cells: Phone number format
but again, only applies to certain types of formatting.
DESIRED OUTCOME
i undergo this process various times a month and am hoping somebody can help me optimize my approach.
i need domestic numbers to become ###-###-#### and international phone numbers vary, but the UK would look like +##-##-####-####. i need these characters to exist within the actual string of each cell, otherwise my crm will not accept the phone numbers.
I'm not entirely sure, but maybe this gets you going:
Formula in B1:
=MAP(A1:A5,LAMBDA(x,LET(y,CONCAT(TEXTSPLIT(x,TEXTSPLIT(x,ROW(1:10)-1,,1),,1)),TEXT(--y,SWITCH(LEN(y),10,"###-###-####",12,"+##-##-####-####","0")))))
MAP(A1:A5,LAMBDA(x - Loop over a given dataset;
LET(y,CONCAT(TEXTSPLIT(x,TEXTSPLIT(x,ROW(1:10)-1,,1),,1)) - Part where each input gets cleared into just pure numeric characters;
TEXT(--y,SWITCH(LEN(y),10,"###-###-####",12,"+##-##-####-####","0"))))) - Now use SWITCH() to test the length of the cleaned, digits-only input. For a length of 10 or 12 we know which format to apply; the last parameter is the fallback format for any other length. You could obviously add more cases: for the samples given, you'd want to include options for lengths 11 and 15.
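For comparison, here is the same idea (strip everything but digits, then pick a pattern by digit count) as a minimal Python sketch. The PATTERNS table and the format_phone name are made up for illustration, and the 10 and 12 digit patterns are just the two from the question; the result is a real string, so the "-" and "+" characters exist in the value itself.

```python
import re

# Hypothetical lookup table: digit count -> output pattern ('#' is a digit slot).
PATTERNS = {
    10: "###-###-####",      # domestic, as requested in the question
    12: "+##-##-####-####",  # UK-style, as requested in the question
}


def format_phone(raw: str) -> str:
    """Strip all non-digits, then lay the digits into the pattern for that length."""
    digits = re.sub(r"\D", "", raw)
    pattern = PATTERNS.get(len(digits))
    if pattern is None:
        # No pattern known for this length: return the bare digits as a fallback.
        return digits
    remaining = iter(digits)
    return "".join(next(remaining) if ch == "#" else ch for ch in pattern)


print(format_phone("(555) 123-4567"))
print(format_phone("+44 20 1234 5678"))
```

Supporting another length (11 or 15 digits, say) would only mean adding an entry to the table.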
I want to create a random sequence of numbers in 11-digit format, running from 10000000000 to 999999999999, where each value is unique. I would like to populate roughly 20-50 million records in Excel without having to keep dragging the + fill handle all the way down to the bottom of the column.
I tried using RANDBETWEEN, but it produces duplicates and I have to keep dragging, which is time-consuming. Is there a better way to accomplish this?
=RANDBETWEEN(10000000000,999999999999)
For that many unique numbers I suggest using encryption, where the output is guaranteed to be unique for unique inputs.
Simply encrypt the numbers 0, 1, 2, ... Because encryption is reversible, distinct inputs always give distinct outputs. You will need to use the same encryption key and other inputs (IV, nonce etc.) for every number to guarantee unique outputs.
You will need to do some processing on the outputs to get them into the required range. Have a look at Format Preserving Encryption for some help with this.
As @BigBen pointed out, Excel is probably the wrong tool for this.
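As a rough illustration of the idea outside Excel, the sketch below builds a keyed permutation of the counter values 0, 1, 2, ... with a small Feistel network plus "cycle walking" to stay inside the target range. It is a toy stand-in for a vetted format-preserving encryption scheme, not a secure cipher; the 11-digit range, the SHA-256 round function, the round count, and all names are assumptions made for the example.

```python
import hashlib

# Assumed target range: every 11-digit number.
LOW, HIGH = 10_000_000_000, 99_999_999_999
SIZE = HIGH - LOW + 1
HALF_BITS = 19                  # two 19-bit halves: 2**38 >= SIZE
MASK = (1 << HALF_BITS) - 1


def _round(value: int, key: bytes, rnd: int) -> int:
    """Keyed round function: hash (key, round number, half) down to 19 bits."""
    data = key + bytes([rnd]) + value.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") & MASK


def _permute(x: int, key: bytes) -> int:
    """Four Feistel rounds: a bijection on 38-bit integers for any round function."""
    left, right = x >> HALF_BITS, x & MASK
    for rnd in range(4):
        left, right = right, left ^ _round(right, key, rnd)
    return (left << HALF_BITS) | right


def unique_number(index: int, key: bytes) -> int:
    """Map counter value `index` to a distinct 11-digit number."""
    if not 0 <= index < SIZE:
        raise ValueError("index out of range")
    x = _permute(index, key)
    while x >= SIZE:            # cycle walking: re-encrypt until back in range
        x = _permute(x, key)
    return LOW + x


key = b"demo-key"               # hypothetical key; keep it fixed for one data set
for i in range(5):
    print(unique_number(i, key))
```

Because the mapping is a permutation, no duplicate check or stored list of used values is needed; the rows can be streamed straight to a CSV file in as many chunks as required.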
Trying to write a custom data validation formula that would only allow values in the following format: 2-digit year (this can be just 2 numbers), dash ("-"), then a 1 or 2 letter character(s) (would prefer upper case, but would settle for lower case), another dash ("-"), and then a 5-digit number. So the final value looks like: 17-FL-12345 ...or 16-G-00008...
I actually have a bit more, but if I could get the above working, that would be terrific. I don't know if there's a way, but it would be great if additionally I could use custom formatting to get the dashes to appear when they are not entered, i.e., user enters "17FL12345" and it gets automatically formatted to "17-FL-12345". Finally, again, this isn't a deal breaker either, but it would also be great if the last 5 digits were padded with leading zeros, i.e., the user enters 17-G-8 (or just 17G8) and it gets formatted to 17-G-00008.
Can't use VBA unfortunately. Some potential solutions to similar questions I've viewed include:
https://www.mrexcel.com/forum/excel-questions/615799-data-validation-mixed-numeric-text-formula-only.html
Data VAlidation - Text Length & Character Type
Excel : Data Validation, how to force the user to enter a string that is 2 char long?
Try this:
=AND(ISNUMBER(VALUE(LEFT(A1,2))),MID(A1,3,1)="-",OR(ISNUMBER(FIND(MID(A1,4,1),$C$1)),AND(ISNUMBER(FIND(MID(A1,4,1),$C$1)),ISNUMBER(FIND(MID(A1,5,1),$C$1)))),MID(A1,LEN(A1)-5,1)="-",ISNUMBER(VALUE(RIGHT(A1,5))),OR(LEN(A1)=11,LEN(A1)=10),LEN(A1)-LEN(SUBSTITUTE(A1,"-",""))=2,LEN(A1)-LEN(SUBSTITUTE(A1,"+",""))=0,LEN(A1)-LEN(SUBSTITUTE(A1," ",""))=0)
This assumes you want to validate A1. I put the allowed letters in C1.
Edit:
I edited the original function to make it stricter: I left out the ISNUMBER(VALUE(...)) part and instead check digit by digit against the allowed digits in C2.
If you want to exceed the 255-character limit, you have to slice the function up.
I created 5 functions:
=AND(ISNUMBER(FIND(LEFT(A1),$C$2)),ISNUMBER(FIND(MID(A1,2,1),$C$2)))
=MID(A1,3,1)="-"
=IF(LEN(A1)=10,AND(ISNUMBER(FIND(MID(A1,4,1),$C$1)),MID(A1,5,1)="-"),IF(LEN(A1)=11,AND(ISNUMBER(FIND(MID(A1,4,1),$C$1)),ISNUMBER(FIND(MID(A1,5,1),$C$1)))))
=IF(LEN(A1)=10,MID(A1,5,1)="-",IF(LEN(A1)=11,MID(A1,6,1)="-"))
=IF(LEN(A1)=10,AND(ISNUMBER(FIND(MID(A1,6,1),$C$2)),ISNUMBER(FIND(MID(A1,7,1),$C$2)),ISNUMBER(FIND(MID(A1,8,1),$C$2)),ISNUMBER(FIND(MID(A1,9,1),$C$2)),ISNUMBER(FIND(MID(A1,10,1),$C$2))),IF(LEN(A1)=11,AND(ISNUMBER(FIND(MID(A1,7,1),$C$2)),ISNUMBER(FIND(MID(A1,8,1),$C$2)),ISNUMBER(FIND(MID(A1,9,1),$C$2)),ISNUMBER(FIND(MID(A1,10,1),$C$2)),ISNUMBER(FIND(MID(A1,11,1),$C$2)))))
Set up data validation as on the picture:
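If the values can be pre-processed outside Excel, the same rule (and the optional clean-up from the question: missing dashes, lower-case letters, short numbers) is compact as a regular expression. This is only a sketch; the is_valid and normalize_code names and the two patterns are assumptions for illustration, not part of the worksheet solution above.

```python
import re
from typing import Optional

# Strict form: 2 digits, dash, 1-2 upper-case letters, dash, exactly 5 digits.
STRICT = re.compile(r"[0-9]{2}-[A-Z]{1,2}-[0-9]{5}")

# Lenient form for clean-up: dashes optional, any letter case, 1-5 trailing digits.
LENIENT = re.compile(r"([0-9]{2})-?([A-Za-z]{1,2})-?([0-9]{1,5})")


def is_valid(text: str) -> bool:
    """True only for values already in the final 17-FL-12345 shape."""
    return STRICT.fullmatch(text) is not None


def normalize_code(text: str) -> Optional[str]:
    """Return the canonical form, or None if the input cannot be interpreted."""
    match = LENIENT.fullmatch(text.strip())
    if match is None:
        return None
    year, letters, number = match.groups()
    # Upper-case the letters and left-pad the number with zeros to 5 digits.
    return f"{year}-{letters.upper()}-{int(number):05d}"


print(normalize_code("17FL12345"))
print(normalize_code("17g8"))
print(is_valid("16-G-00008"))
```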
I'm trying to find out how I would go about randomizing account numbers in a file, and, where the same account number appears more than once, making sure it always gets the same random number.
I'm exporting a file to some consultants and obviously don't want them to have secure information, but I want them to be able to count the number of times an account number has appeared for reporting purposes.
For the sake of an answer, as mentioned in a comment:
Create a table that maps actual uniques to dummy random, then lookup the substitution in that table.
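A minimal sketch of that mapping table, assuming the account numbers have been read into a plain list and that six-digit dummy values are acceptable (both assumptions; build_mapping is a made-up name). Each distinct account gets one distinct dummy, so repeat counts survive the substitution.

```python
import random
from collections import Counter


def build_mapping(accounts):
    """Map each distinct account number to a distinct random six-digit dummy."""
    uniques = list(dict.fromkeys(accounts))   # de-duplicate, keep first-seen order
    rng = random.SystemRandom()               # OS randomness, not a seeded PRNG
    # sample() draws without replacement, so no two accounts share a dummy.
    # It raises ValueError if there are more than 900,000 distinct accounts.
    dummies = rng.sample(range(100_000, 1_000_000), len(uniques))
    return dict(zip(uniques, dummies))


accounts = ["A-1001", "A-2002", "A-1001", "A-3003", "A-1001"]  # made-up sample data
mapping = build_mapping(accounts)
masked = [mapping[a] for a in accounts]

# The consultants can still count occurrences on the masked column.
print(Counter(masked).most_common(1)[0][1])
```

The mapping itself stays with you; only the masked column goes into the exported file.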
I'm working with a data set that deals with personal data (i.e. data that deals with people, not [necessarily] private data)... This data changes over time, and the format is imposed by the client. I need something to use as a primary key, and unfortunately the only field that uniquely identifies a person and doesn't change unpredictably is SSN. The ID number (primary key) is going to be public facing, so I can't publish that, but I'm hoping to obscure it.
The result must be numeric.
The result may be up to 25 digits long.
The result must be unique.
The result should be as difficult as possible to reverse without a key, given the constraints above.
Is there a numeric cipher that would fit this?
Am I crazy for trying this?
Format-preserving encryption sounds like a solution to your problem. Use this on the SSN and then you just have some random-looking 9-digit number that you can pad out to the 25-digit ID you need. If you do the padding right, you can even invert it (if you have the key). The point is that after running it through the format-preserving encryption, your data is no longer sensitive.
A social security number is nine digits long, which means there are only 10^9 = 1,000,000,000 unique SSNs. Most operations you perform on an SSN can be brute-forced, so I suggest you just assign a unique random 25-digit number to each SSN. The random 25-digit number is your public ID, and the relationship between each pair is totally private.
The random key is not dependent upon the data it is assigned to, so there is no way to retrieve the input from the output (if you think of it as a function).
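A small sketch of that assignment, using Python's secrets module for the random draw. The PublicIdRegistry class and its in-memory dictionaries are assumptions for illustration; in practice the pairs would live in a private table with a uniqueness constraint on the ID column.

```python
import secrets


class PublicIdRegistry:
    """Hands out one random 25-digit public ID per SSN and remembers the pairing."""

    def __init__(self):
        self._by_ssn = {}    # private: SSN -> public ID
        self._used = set()   # every public ID issued so far

    def public_id(self, ssn: str) -> str:
        if ssn in self._by_ssn:
            return self._by_ssn[ssn]
        while True:
            # Uniform draw from 10**24 .. 10**25 - 1, i.e. exactly 25 digits.
            candidate = str(10**24 + secrets.randbelow(9 * 10**24))
            if candidate not in self._used:   # collisions are rare, but check anyway
                break
        self._used.add(candidate)
        self._by_ssn[ssn] = candidate
        return candidate


registry = PublicIdRegistry()
first = registry.public_id("078-05-1120")          # made-up sample SSN
print(len(first), first == registry.public_id("078-05-1120"))
```

Since the ID is drawn independently of the SSN, trying all 10^9 SSNs reveals nothing without access to the private table.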