Reformatting domestic/international phone numbers with various formats in Excel (VBA or Functions) - excel

PROBLEM:
hey y'all, i have a large dataset of both domestic and international phone numbers formatted in various ways that i need to convert to a particular format based on specific criteria.
example of current phone number formats in the dataset:
###-##-##-####-####
+##-##-####-####
(###) ###-####
+## (#) ## ### ## ##
##-##-######-#
as you can see, the phone number formats vary greatly and there are many more examples that i did not list. i work with datasets averaging 1000+ rows.
what i try varies depending on how much data cleanup i need to perform, but below are some of my current methods.
Approach 1: Manually editing
i have attempted manually updating the phone numbers to my desired formatting. however this is time consuming and leads to user error.
Approach 2: CTRL+1 "Format Cells"
i start by sorting my list of numbers. then follow ctrl+1 > Number > Custom to format the following:
domestic as 000-000-0000, UK as +##-##-####-####, etc.
the issue with this method is that the numbers are stored as formatted "Custom" values. so any special spaces or characters (i.e. "-", "+") do not exist within the string. meaning that i cannot import into my crm.
i have attempted to manually add "'" at the beginning of each formatted phone number, but it removes the special formatting. e.g. ###-###-#### just becomes '##########.
Approach 3: Functions
i have tried using the following functions on domestic phone numbers, but they only work if formatting follows ###-###-####. which is not always the case for the data i work with.
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"(",""),")","")," ",""),"-","")
or
=MID(A1,2,3)&MID(A1,7,3)&RIGHT(A1,4)
Approach 4: Macro
i've attempted recording macros, but this does not work properly since the length/formatting of a cell value and size of a sheet always varies.
Approach 5: VBA script
i am currently exploring various scripts. there are a ton of examples on stackoverflow, but most presume clean data formatted as (###) ###-####. so the scripts do not work for me.
this post was helpful as a first step to removing all special characters from cells: Phone number format
but again, only applies to certain types of formatting.
DESIRED OUTCOME
i undergo this process various times a month and am hoping somebody can help me optimize my approach.
i need domestic numbers to become ###-###-#### and international phone numbers vary, but the UK would look like +##-##-####-####. i need these characters to exist within the actual string of each cell, otherwise my crm will not accept the phone numbers.

I'm not entirely sure, but maybe this gets you going:
Formula in B1:
=MAP(A1:A5,LAMBDA(x,LET(y,CONCAT(TEXTSPLIT(x,TEXTSPLIT(x,ROW(1:10)-1,,1),,1)),TEXT(--y,SWITCH(LEN(y),10,"###-###-####",12,"+##-##-####-####","0")))))
MAP(A1:A5,LAMBDA(x - Loop over a given dataset;
LET(y,CONCAT(TEXTSPLIT(x,TEXTSPLIT(x,ROW(1:10)-1,,1),,1)) - Part where each input gets cleared into just pure numeric characters;
TEXT(--y,SWITCH(LEN(y),10,"###-###-####",12,"+##-##-####-####","0"))))) - Now use SWITCH() to test against the length of the numeric input. If 10 or 12 we kind of know what format we like, the last parameter is the 'standard' format. But obviously you could start adding checks. In the samples given, you'd want to include options for length 11 and 15.
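For anyone who wants to prototype the same idea outside Excel, here is a minimal Python sketch of the two steps above: strip everything but digits, then pick a pattern by digit count. The `PATTERNS` table and the function name `reformat_phone` are made up for illustration, and only the two lengths from the formula are covered.

```python
import re

# Hypothetical length-to-pattern table; "#" marks a digit slot.
# Only the two cases from the formula above are listed.
PATTERNS = {
    10: "###-###-####",
    12: "+##-##-####-####",
}

def reformat_phone(raw: str) -> str:
    """Strip non-digits, then fill the pattern chosen by digit count."""
    digits = re.sub(r"[^0-9]", "", raw)
    pattern = PATTERNS.get(len(digits))
    if pattern is None:
        # No rule for this length: return the bare digits (the "0" fallback).
        return digits
    remaining = iter(digits)
    return "".join(next(remaining) if ch == "#" else ch for ch in pattern)

print(reformat_phone("(555) 123-4567"))
print(reformat_phone("+44 20 1234 5678"))
```

One difference worth knowing: the formula converts the digits to a number with `--y` before formatting, so leading zeros are lost there, while this sketch keeps the digits as text.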

Related

Locale-independent Text function in Excel

I need to format dates in excel, and I'm trying to use the TEXT formula. The problem is that Excel's interpretation of the arguments changes when the locale changes.
For example: if I have a date in cell A1, that i'd like to convert to text, in the year-month-day-format, I have to use =TEXT(A1, "yyyy-mm-dd") if my PC has an English-language locale, but =TEXT(A1, "jjjj-MM-tt") (I kid you not, the M has to be upper case) if it has a German-language locale. This makes the document unportable. (The second argument is plain text and therefore not converted when changing locale.)
Remarks:
This is just an example, I know I could do the long =YEAR(A1) & "-" & TEXT(MONTH(A1), "00") & "-" & TEXT(DAY(A1), "00") in this case. I'm wondering about the more general case.
The date should not just be displayed in a certain format, it should actually be a string. For someone viewing the file this doesn't make a difference, but when using it in other formulas, it does.
I could write a UDF in VBA to solve the issue, but I cannot use VBA in this document.
I do not care about changing the names of the months etc. It's fine, if the name of the month is June or Juni depending on the locale.
I want to stress that the issue occurs due to the PC's locale - not due to the GUI language of the MS Office version. In the example above, Excel's GUI and formulas were in English in both examples; I just changed the locale on the machine.
Many thanks
Here is a slightly cheaty method: Use a VLOOKUP on a value that will change based on your System Language - for example TEXT(1,"MMMM")
=VLOOKUP(TEXT(1,"MMMM"),{"January","yyyy-MM-dd";"Januar","jjjj-MM-tt"},2,FALSE)
In English: Text(1,"MMMM") = "January", so we do a VLOOKUP on the Array below to get "yyyy-MM-dd"
"January" , "yyyy-MM-dd" ;
"Januar" , "jjjj-MM-tt"
In German: Text(1,"MMMM") = "Januar", so we do a VLOOKUP (SVERWEIS) on the Array above to get "jjjj-MM-tt"! :)
Then, just use that in your TEXT function:
=TEXT(A1, VLOOKUP(TEXT(1,"MMMM"),{"January","yyyy-MM-dd";"Januar","jjjj-MM-tt"},2,FALSE))
Obviously, the main reason this works is that TEXT(1,"MMMM") is valid for both German and English. If you are using something like Filipino (where "Month" is "Buwan") then you might find some issues finding a mutually intelligible formatting input.
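The lookup itself is just a small table keyed on whatever the probe returns. A rough Python sketch of that logic, with hypothetical names (`FORMAT_BY_PROBE`, `format_code_for`) and only the two rows from the array above:

```python
# Probe value -> format code, mirroring the two rows of the array constant.
FORMAT_BY_PROBE = {
    "January": "yyyy-MM-dd",
    "Januar": "jjjj-MM-tt",
}

def format_code_for(probe: str) -> str:
    """Return the format code registered for the month name the system produced."""
    try:
        return FORMAT_BY_PROBE[probe]
    except KeyError:
        # Same situation as an exact-match VLOOKUP finding nothing.
        raise LookupError(f"no format code registered for {probe!r}") from None

print(format_code_for("Januar"))
```

Every additional locale needs its own row, which is the same maintenance cost the array constant has.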
I found another possibility, as I have the same issue with mixed language versions. It is not perfect in all cases (see below), but it also makes number formats locale independent.
For this you write your own function in VBA. Open the VBA editor with Alt+F11 and create a new module. Inside the module paste something like this:
Function FormatString(inputData, formatingString As String) As String
    ' VBA's Format takes English format codes regardless of the system locale
    FormatString = Format(inputData, formatingString)
End Function
Then you can use it in cell formulas with English formatting strings. Like:
= FormatString(A1; "yyyy-mm-dd")
Advantage: It also works with number formats:
= FormatString(A1; "00.00")
In case (like Germany) your decimal separator is not a .
Drawbacks:
1 Not identical to TEXT function
This doesn't always behave as you might expect with date formatting, and it is not exactly the same as the TEXT function:
FormatString(1; "MMMM")
does not return "January" but "December", because the 1 is taken as a date, which in VBA is 31.12.1899.
2 Has to be saved with macros
You have to save the file as *.xlsm for this to work
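The "December" result in drawback 1 comes from the two date systems counting from different starting days. A small Python sketch of the arithmetic (the function names are made up; the worksheet conversion here is only meant for serials below 60, before the 1900 leap-year quirk):

```python
from datetime import date, timedelta

# VBA's Date type counts days from 30 Dec 1899 (serial 0).
VBA_DAY_ZERO = date(1899, 12, 30)
# In the worksheet's 1900 date system, serial 1 is 1 Jan 1900.
WORKSHEET_DAY_ZERO = date(1899, 12, 31)

def vba_date(serial: int) -> date:
    """Date that VBA's Format sees for a given serial number."""
    return VBA_DAY_ZERO + timedelta(days=serial)

def worksheet_date(serial: int) -> date:
    """Date the worksheet TEXT function sees (serials 1..59 only)."""
    return WORKSHEET_DAY_ZERO + timedelta(days=serial)

print(vba_date(1))        # a day in December 1899
print(worksheet_date(1))  # a day in January 1900
```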
Note (1): this answers only the case for locale-independent TEXT to format numbers with decimal symbols and digit grouping symbols. For date formatting, see Chronocidal's answer.
Note (2): this answer does not use VBA functions, which would require enabling macros. Enabling macros may not be possible depending on the company's security policy. If enabling macros is an option, Uwe Hafner's answer would be easier.
You can detect the decimal symbol and digit grouping symbol as follows. Enter the number 1 in a specific cell (e.g. A1) and the number 1000 in another cell (e.g. A2).
Decimal symbol: =IF(TEXT(INDIRECT("A1"),"0,00")="001",".",",")
Digit grouping symbol¹: =IF(TEXT(INDIRECT("A2"),"#,###")="1000,",".",",")
This is assuming that the decimal symbol is either . or , and the digit grouping symbol is either , or . respectively. This will not detect unusual digit grouping symbols like (space) or ' (apostrophe).
With this information, you can set up a cell (or cells) with a formula that results in the format code you need to apply.
Suppose you need to format a number to two decimal digits and using the digit grouping symbol. You can assume that if the decimal symbol is . then the digit grouping symbol will be , and vice versa. You can do the following:
A1: 1
A2 (the formatting string): =IF(TEXT(INDIRECT("A1"),"0,00")="001","#,##0.00","#.##0,00")
A3 (contains an arbitrary number you wish to format)
A4 (the formatted number): =TEXT(A3,A2)
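As a sanity check of what the two format strings should produce, here is a Python sketch that formats a number both ways. It makes the same assumption as above (decimal symbol "." goes with grouping symbol "," and vice versa); `format_grouped` is a hypothetical helper, not an Excel function.

```python
def format_grouped(value: float, decimal_symbol: str) -> str:
    """Two decimals with digit grouping, for either separator convention."""
    us_style = f"{value:,.2f}"  # e.g. 1,234.50
    if decimal_symbol == ".":
        return us_style
    # Swap the two symbols for locales where "," is the decimal symbol.
    return us_style.translate(str.maketrans(",.", ".,"))

print(format_grouped(1234.5, "."))
print(format_grouped(1234.5, ","))
```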
Technical note: the INDIRECT function is used intentionally because it is a volatile function. This guarantees that the formatting string and anything dependent on it is recalculated even if no data changed in the Excel document. If INDIRECT is not used, Excel caches results and will not recalculate the formatting string when the Excel document is opened on a PC with different locale settings.
1 - Also known as Thousands separator
The easy fix, whether directly custom formatting a cell or using TEXT(), is to use a country code for a language you know the proper formatting codes for.
For instance, I am in the US, have a US version of Excel, and am familiar with its date code formats. So I'd want to use them and to ensure they "come out" regardless of anyone's Windows or Excel version, or the country they are in, I'd do it like the following (for TEXT(), let's say, but it'd be the same idea in custom formatting):
=TEXT(A1,"[$-en-US]yyyy-mm-dd")
The function would collect the value in A1, ask Excel to treat it as a date, Excel would and would say fine, it's cool (i.e.: the value is, say, 43857 and not "horse") because it is a positive number which is a requirement for anything to be treated as a date, and let the function move on to rendering it as a date in the manner prescribed. Rather than giving an #ERROR! as it would for "horse" or -6.
The function would then read the formatting string and see the language code. It would then drop the usual set of formatting codes it loaded upon starting up and load in the formatting codes for English ("en") and in particular, US English ("US"). The rest of the string uses codes from that set so it would interpret them properly and send an appropriate string back to TEXT() for it to display in the cell (and pass on to other formulas if such exist).
I have no way to test the following, but I assume that if one were to use a format that displayed day of the week names or month names, they would be from the same language set. In other words, Excel would not think that even though you specified a country and language that you still wanted, say, Dutch or Congolese month names. So that kind of thing would still need to be addressed, but would be an easy fix too, just involving, say, a simple lookup one could add, though it'd be "fun" setting up the lookup table for each language one wanted to accommodate...
However, the basic issue that arises with this problem in general, is very, very easily solved with the country codes. They aren't even hard or arcane anymore now that the [$-409] syntax has been replaced with things like [$-en-us] and [$-he-IL] and so on.

data validation with numbers + text

Trying to write a custom data validation formula that would only allow values in the following format: 2-digit year (this can be just 2 numbers), dash ("-"), then a 1 or 2 letter character(s) (would prefer upper case, but would settle for lower case), another dash ("-"), and then a 5-digit number. So the final value looks like: 17-FL-12345 ...or 16-G-00008...
I actually have a bit more, but if I could get the above working, that would be terrific. I don't know if there's a way, but it would be great if additionally I could use custom formatting to get the dashes to appear when they are not entered, i.e., user enters "17FL12345" and it gets automatically formatted to "17-FL-12345". Finally, again, this isn't a deal breaker either, but it would also be great if the last 5 digits would add any leading zeros, i.e., the user enters 17-G-8 (or just 17G8) and it gets formatted to 17-G-00008.
Can't use VBA unfortunately. Some potential solutions to similar questions I've viewed include:
https://www.mrexcel.com/forum/excel-questions/615799-data-validation-mixed-numeric-text-formula-only.html
Data VAlidation - Text Length & Character Type
Excel : Data Validation, how to force the user to enter a string that is 2 char long?
Try this:
=AND(ISNUMBER(VALUE(LEFT(A1,2))),MID(A1,3,1)="-",OR(ISNUMBER(FIND(MID(A1,4,1),$C$1)),AND(ISNUMBER(FIND(MID(A1,4,1),$C$1)),ISNUMBER(FIND(MID(A1,5,1),$C$1)))),MID(A1,LEN(A1)-5,1)="-",ISNUMBER(VALUE(RIGHT(A1,5))),OR(LEN(A1)=11,LEN(A1)=10),LEN(A1)-LEN(SUBSTITUTE(A1,"-",""))=2,LEN(A1)-LEN(SUBSTITUTE(A1,"+",""))=0,LEN(A1)-LEN(SUBSTITUTE(A1," ",""))=0)
This assumes you want to validate A1. I put the allowed letters in C1.
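To make the target rule concrete (and to have something to test the validation formula against), here is a Python sketch of the format from the question: two digits, dash, one or two upper-case letters, dash, five digits. It also shows the "nice to have" normalisation of entries like 17G8. The names `is_valid` and `normalize` are invented for this example, and none of this runs inside Excel's data validation.

```python
import re
from typing import Optional

# Strict form from the question, e.g. 17-FL-12345 or 16-G-00008.
STRICT = re.compile(r"[0-9]{2}-[A-Z]{1,2}-[0-9]{5}")
# Lenient form: dashes optional, letters in any case, 1 to 5 trailing digits.
LOOSE = re.compile(r"([0-9]{2})-?([A-Za-z]{1,2})-?([0-9]{1,5})")

def is_valid(code: str) -> bool:
    """True only for the exact final format."""
    return STRICT.fullmatch(code) is not None

def normalize(entry: str) -> Optional[str]:
    """Turn a lenient entry into the strict format, or None if it cannot be."""
    match = LOOSE.fullmatch(entry.strip())
    if match is None:
        return None
    year, letters, number = match.groups()
    return f"{year}-{letters.upper()}-{number.zfill(5)}"

print(is_valid("17-FL-12345"))
print(normalize("17G8"))
```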
Edit:
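I edited the original function to be more secure: I left out the ISNUMBER(VALUE(...)) part and instead check digit by digit.
If you want to exceed the 255-character limit, you have to slice the function up.
I created 5 functions (judging by the formulas, C2 holds the allowed digits in the same way C1 holds the letters).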
=AND(ISNUMBER(FIND(LEFT(A1),$C$2)),ISNUMBER(FIND(MID(A1,2,1),$C$2)))
=MID(A1,3,1)="-"
=IF(LEN(A1)=10,AND(ISNUMBER(FIND(MID(A1,4,1),$C$1)),MID(A1,5,1)="-"),IF(LEN(A1)=11,AND(ISNUMBER(FIND(MID(A1,4,1),$C$1)),ISNUMBER(FIND(MID(A1,5,1),$C$1)))))
=IF(LEN(A1)=10,MID(A1,5,1)="-",IF(LEN(A1)=11,MID(A1,6,1)="-"))
=IF(LEN(A1)=10,AND(ISNUMBER(FIND(MID(A1,6,1),$C$2)),ISNUMBER(FIND(MID(A1,7,1),$C$2)),ISNUMBER(FIND(MID(A1,8,1),$C$2)),ISNUMBER(FIND(MID(A1,9,1),$C$2)),ISNUMBER(FIND(MID(A1,10,1),$C$2))),IF(LEN(A1)=11,AND(ISNUMBER(FIND(MID(A1,7,1),$C$2)),ISNUMBER(FIND(MID(A1,8,1),$C$2)),ISNUMBER(FIND(MID(A1,9,1),$C$2)),ISNUMBER(FIND(MID(A1,10,1),$C$2)),ISNUMBER(FIND(MID(A1,11,1),$C$2)))))
Set up data validation as on the picture:

Why do Excel values in parentheses become negative values?

A colleague and I encountered a behavior in Excel which isn't clear to us.
Background:
We have a tool which converts an Excel sheet into a table format. The tool calculates the formulas which are in excel and replaces variables inside it with specific values.
The excel tool is used by one of our customers who use values like (8) or (247).
These Value are automatically translated by excel to -8 or -247.
Question:
I saw that many people want to display negative numbers in parentheses. But why would Excel change values in parentheses to a negative number?
I know that I could simply change the cell config to text and this would solve the problem but I wonder if there is a reason for the behavior, since there seems to be no mathematical reason for this.
It's simply a difference between the formats of the cells you are bringing the values from and pasting them into. Numbers with parentheses are in cells with the "Accounting" format, while negatives are stored in General or standard number-formatted cells. To resolve this, you can change the format of the destination cells to Accounting using cell formatting: Number > Accounting.
To answer the why, it's because accountants put negative numbers in brackets for readability
Unfortunately, this is one of the excel feature/bugs that helps some folks and frustrates others. When opening a file or pasting content, excel will immediately and always try to parse any values into formats it deems appropriate, which can mess up data like:
Zip Codes / Tel. # → Numeric: 05401 → 5401
Fractions → Dates: 11/20 → Nov, 20th YYYY
Std. Errors → Negative Numbers: (0.1) → -0.1
For some workarounds, see Stop Excel from automatically converting certain text values to dates
Once the file is open/pasted, the damage is already done. At that point, your best bet is:
Updating the field and displaying as text (appending with ') to prevent re-casting
Formatting the field if the operation wasn't lossy and is just presenting the info differently
Running a clean if/else to pad or otherwise convert your data based on the identified errors
Specific to displaying values back in parens, if excel is converting them and treating them like negative numbers (which may or may not be the appropriate way to actually store the data), you can apply a different format to positive and negative numbers to wrap back in parens.
It is standard practice to write negative values as numbers in parentheses, especially in accounting. This makes negative values stand out much more than a simple negative hyphen; compare -1 and (1).
Excel is a tool very commonly used by accountants and supports accountant-style spreadsheets. Therefore, entering (100) means having a value of -100, even if there is no minus hyphen!
Here is a fun fact, if you enter (-10), Excel will treat it as normal text.
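The behaviour described in these answers can be mimicked in a few lines of Python, which may help if the conversion tool has to reproduce or undo it. This is only a sketch of the convention as described here, not Excel's actual parser; `parse_entry` and `show_accounting` are hypothetical names.

```python
import re

# An unsigned number wrapped in parentheses, e.g. (8) or (0.1).
_PAREN_NUMBER = re.compile(r"\(([0-9]+(?:\.[0-9]+)?)\)")

def parse_entry(text: str):
    """Read a cell entry the accounting way: (8) -> -8.0, other numbers as-is."""
    stripped = text.strip()
    match = _PAREN_NUMBER.fullmatch(stripped)
    if match:
        return -float(match.group(1))
    try:
        return float(stripped)
    except ValueError:
        # Anything else, including (-10), stays text.
        return stripped

def show_accounting(value: float) -> str:
    """Display a number with negatives wrapped in parentheses."""
    return f"({abs(value):g})" if value < 0 else f"{value:g}"

print(parse_entry("(247)"))
print(show_accounting(-247.0))
```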

Excel, Numberplate Clarification

I am working on an excel document for fuel cards at the minute and my current issue is to write in a formula for validating number plates based on UK standard plates (two letters followed by two numbers then three letters i.e. BK08JWZ). At this point in time we are not considering personal plates in this just to keep things simple.
Ideally I need excel to look at the text in the box and confirm it to an agreed layout but I am struggling to find the right formula. The plates are in column 'I' and I have already added in another column after titled 'approved plates' in column 'J' but this can be deleted if it's not needed.
Results wise, I can do this one of two ways, to either get the excel document to highlight any number plates that do not match the DVLA standard, or have a column next to the number plate column that registers a boolean response to the recognition i.e. If it is valid (true) or if not (false).
Either way the plate needs to be able to be seen as it was currently, so if there is something wrong with it, it needs to be visible, not throw up an error message.
Any help would be very welcome.
All the information on UK standard number plates are on this site:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/359317/INF104_160914.pdf
I would do it like this:
1) create a lookup sheet with data from the booklet. One column for allowed "memory tag" identifiers (first two letters), one column for the allowed "age identifiers" (first two numbers), and one column for allowed random letters (last three letters, full alphabet except I and Q)
2) strip spaces from the number plate for comparison
3) Use MID(numberplate,1,2), MID(numberplate,3,2) and MID(numberplate,5,3) to compare to each lookup list respectively (using INDEX()>0).
4) when all 3 parts are found in lookup lists the number plate is valid.
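The four steps above can be sketched in Python to check the logic before building the lookup sheet. The memory-tag and age-identifier sets below are tiny placeholder samples, not the real DVLA lists, and `is_valid_plate` is a made-up name; the real lists would come from the booklet.

```python
import string

# Placeholder samples only; fill these from the DVLA booklet.
MEMORY_TAGS = {"BK"}        # allowed first two letters
AGE_IDENTIFIERS = {"08"}    # allowed two-digit age identifiers
# Last three letters: full alphabet except I and Q.
RANDOM_LETTERS = set(string.ascii_uppercase) - {"I", "Q"}

def is_valid_plate(plate: str) -> bool:
    """Split the plate into its three parts and check each against its list."""
    compact = plate.replace(" ", "")  # step 2: strip spaces
    if len(compact) != 7:
        return False
    tag, age, letters = compact[0:2], compact[2:4], compact[4:7]
    return (
        tag in MEMORY_TAGS
        and age in AGE_IDENTIFIERS
        and all(ch in RANDOM_LETTERS for ch in letters)
    )

print(is_valid_plate("BK08 JWZ"))
```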
Try researching Regular Expressions or RegEx. This is a powerful programming tool to determine whether strings match specific patterns. You can use RegEx expressions to extract the pattern, replace the pattern or test for the pattern. Very efficient but not for the faint-hearted although there is plenty of help on-line. Try this article for starters.
The following RegEx may be what you need..
(^[A-Z]{2}[0-9]{2}[A-Z]{3}$)|(^[A-Z][0-9]{1,3}[A-Z]{3}$)|(^[A-Z]{3}[0-9]{1,3}[A-Z]$)|(^[0-9]{1,4}[A-Z]{1,2}$)|(^[0-9]{1,3}[A-Z]{1,3}$)|(^[A-Z]{1,2}[0-9]{1,4}$)|(^[A-Z]{1,3}[0-9]{1,3}$)
This was copied from this article which gives a very full explanation using DVLA rules.
EDIT:
To use RegEx within Excel: in the VBA editor, open the Tools menu, select References and add the Microsoft VBScript Regular Expressions 5.5 reference.
With acknowledgement to user3616725's helpful observation.
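If you want to try the pattern quickly before wiring it into VBA, it can be tested in Python. This sketch writes the alternatives with plain capturing groups and assumes the plate has already had its spaces removed and is upper case; `PLATE_PATTERN` and `matches_plate` are names invented here.

```python
import re

# The alternatives from the pattern above, as plain capturing groups.
PLATE_PATTERN = re.compile(
    r"(^[A-Z]{2}[0-9]{2}[A-Z]{3}$)"      # current style, e.g. BK08JWZ
    r"|(^[A-Z][0-9]{1,3}[A-Z]{3}$)"
    r"|(^[A-Z]{3}[0-9]{1,3}[A-Z]$)"
    r"|(^[0-9]{1,4}[A-Z]{1,2}$)"
    r"|(^[0-9]{1,3}[A-Z]{1,3}$)"
    r"|(^[A-Z]{1,2}[0-9]{1,4}$)"
    r"|(^[A-Z]{1,3}[0-9]{1,3}$)"
)

def matches_plate(plate: str) -> bool:
    """True if the plate matches any of the layouts in the pattern."""
    return PLATE_PATTERN.search(plate) is not None

print(matches_plate("BK08JWZ"))
```

Since the question only cares about current-style plates, the first alternative on its own would be enough; the others let older and dateless layouts through as well.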

Some but not all Excel numbers show as a date

I have a big .xls file. Some numbers show as a date.
31.08 shows as 31.aug
31.13 shows as 31.13 (that is what i want all columns to be)
When I reformat 31.aug to number it shows as 40768,00
I have found no ways to convert 31.aug to 31.08 as a number. All I am able to do is to reformat 31.aug as d.mm and then it shows as 31.08 and when I try to reformat it from 31.08 to number it shows as 40768,00. No way to cheat Excel using different types of cell formats.
How are your regional settings configured? There are some regions where the short date is identified by dd.mm.yyyy (Estonian, for instance). Maybe if you change the regional settings to US / UK and paste the data again it won't be changed.
Worked in a small test I did here. Hope it helps.
Internally, Excel stores dates as integers: 1 is January 1, 1900. If you enter something that Excel interprets as a date, it will be converted into an integer. I think from this point on there is no way back.
There is a setting in Options, on the "International" tab, where you can define your decimal separator. If you set this to ".", then Excel should accept 30.12 as a decimal number and not as a date.
As pointed out by others, Excel interprets some of your data as a date instead of a number, which depends on your regional settings. To avoid this happening try Tiago's and stema's responses, they will work depending on your regional settings.
To repair your problem in a large file after it has happened without re-entering/re-importing your data, you can use something like
=DAY(B5)+MONTH(B5)/100
to convert a "date" back to a number. Excel will still display it as a date when you first enter this, but when you reformat it as "Number" now it will display the value you originally entered.
Since your column seems to contain a mix between correct numbers and dates, you need to add an if() construct to separate the two cases. If you haven't changed the display format yet (i.e. it still displays 31.Aug) you can use
=IF(LEFT(CELL("format";B7);1)="D";DAY(B7)+MONTH(B7)/100;B7)
which checks if the format is a "D"ate format. If you have already changed the format to Number, but know all your correct data is below 40000, you can use
=IF(B5>40000;DAY(B5)+MONTH(B5)/100;B5)
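The same repair can be sketched in Python, which is handy if the file gets post-processed outside Excel anyway. It uses the threshold idea from the last formula: anything above 40000 is treated as a date serial and rebuilt as day + month/100. The function name `repair` is made up, and the serial-to-date conversion assumes the 1900 date system and serials well past February 1900.

```python
from datetime import date, timedelta

# Day 0 of the serial count for modern dates in the 1900 date system.
SERIAL_ORIGIN = date(1899, 12, 30)

def repair(value: float, threshold: float = 40000) -> float:
    """Turn a date serial back into the d.mm number that was typed."""
    if value > threshold:
        as_date = SERIAL_ORIGIN + timedelta(days=int(value))
        # Same arithmetic as DAY(B5)+MONTH(B5)/100.
        return round(as_date.day + as_date.month / 100, 2)
    return value  # already a plain number, leave it alone

print(repair(40543))  # the serial for 31 Dec 2010
print(repair(31.13))
```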
As suggested above, go to Control Panel - Region and Language - Advanced Settings - Numbers - and change the Decimal Symbol from "," to "."
Good luck!
The data you are pasting, is it by any chance a pivot table?
For example, like you, I am copying a lot of data into a large spreadsheet. The data I am copying is from another sheet and it is a pivot table.
If I paste normally, half will show up as numbers, which they are in the source file and half will show up as dates, for no reason, which drives me insane.
If I Paste->Values however, they will all show up as numbers, and as I don't need the pivot functionality in the destination file this solution is fine.
All you have to do is format the cell.
1- Right click on the cell where you want to insert the number.
2- Then click on Number and select 'General' from the number menu.
Hope this will help future people with the same issue.

Resources