Extract a numeral with a specific number of digits from a string - excel

Question relates to Excel (Office365):
I am seeking a solution that will extract a number with a length of 4 digits from a string.
A couple of examples of the type of strings I am referring to are:
"16016KT 9999 SCT030"
"PROB30 0500 FG BKN001"
"MOD TURB BLW 5000FT TILL302300"
"INTER 6000 SHRA SCT015"
In each of the above strings there are a combination of letters and numbers of varying lengths and no set pattern.
The sequence of characters that I am interested in is the standalone 4-digit number (9999, 0500 and 6000 in the examples above), not the 5000 in 5000FT.
There is only one such 4-digit sequence in each string I will be evaluating.
Thanks!

You may use:
=IFERROR(TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[.*0=0][string-length()=4]"),"0000"),"Non found")

On more recent versions of Excel, once regex support has been activated (RegexpFind is not a built-in worksheet function), you may try:
=RegexpFind(A1, "\b[0-9]{4}\b", 0)
See here for how to activate regex support in Excel.
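To see what the pattern \b[0-9]{4}\b does and does not match, here is a minimal Python sketch run against the four sample strings from the question. The function name find_four_digit_group is made up for illustration; it is not an Excel function.

```python
import re

# Same pattern as in the formula above: exactly four digits with a word
# boundary on each side, so "5000FT" and "TILL302300" do not match.
FOUR_DIGITS = re.compile(r"\b[0-9]{4}\b")


def find_four_digit_group(text):
    """Return the first standalone 4-digit group as text, or None."""
    match = FOUR_DIGITS.search(text)
    return match.group(0) if match else None


samples = [
    "16016KT 9999 SCT030",
    "PROB30 0500 FG BKN001",
    "MOD TURB BLW 5000FT TILL302300",
    "INTER 6000 SHRA SCT015",
]
for s in samples:
    print(s, "->", find_four_digit_group(s))
```

The result is kept as text, so the leading zero in 0500 survives without the TEXT(...,"0000") step the formulas need.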

Another solution:
=IFERROR(TEXT(UNIQUE(SEQUENCE(9999)/(FIND(" " & TEXT(SEQUENCE(9999),"0000") &" ",A2)>0),,1),"0000"),"")

Another option
In B1, formula copied down :
=IFERROR(TEXT(0+MID(A1,SEARCH(" ???? ",A1)+1,4),"0000"),"not found")

Related

Find specific characters and return the next value in the cell using Excel Formula

I am not sure where to begin with the formula as I have gotten myself so confused with everything. I have a cell that contains "PON " or "PON: " or "PON = " followed by the actual PON (example: PON 123467). I want the formula to return 123467 in the cell.
Examples                                  | What I want returned
I have PON 123467 for shoes               | 123467
I have PON: 234567-AB for food            | 234567-AB
I have PON - 569874-Weird for accessories | 569874-Weird
I have PON = DOG-564-987 for dog food     | DOG-564-987
I am currently using Excel 365
FILTERXML() is a good fit in this case. Try:
=FILTERXML("<t><s>"&SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(A1," for","</s><s>")&"</s></t>","//s[1]")," ","</s><s>")&"</s></t>","//s[last()]")
Using FILTERXML, and testing for a substring following PON, you can try:
=FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>") & "</s></t>","//s[contains(.,'PON')]/following-sibling::*[string-length(.)>2][1]")
Note that the FILTERXML solution will cause a PON that is solely numeric, but with a leading zero, to drop the leading zero. Unfortunately, the XPath implementation in that function does not include the string() function.
If dropping the leading zero might be a problem, you can add a character to the node that will force the number to be seen as a string. In the modified formula below, I use the Unicode zero-width space, but there are others you can use. Note that this will count as a character for the string-length function, so be sure to maintain the >2 parameter:
=FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>"&UNICHAR(8203)) & "</s></t>","//s[contains(.,'PON')]/following-sibling::*[string-length(.)>2][1]")
Because of the variability in your data (sometimes there are extraneous space-separated substrings between PON and your desired extract), the XPath:
locates the substring PON
returns all subsequent siblings that have a string-length of more than two (adjust if necessary)
returns the first sibling that meets that criterion.
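The three XPath steps above can be sketched in Python as follows. This is only an illustration of the logic, not what FILTERXML does internally; extract_pon and min_length are hypothetical names.

```python
def extract_pon(text, min_length=3):
    """Return the first token after a token containing 'PON' that has at
    least min_length characters, or None if there is no such token."""
    tokens = text.split()  # split on spaces, like the SUBSTITUTE on " "
    for i, token in enumerate(tokens):
        if "PON" in token:  # //s[contains(.,'PON')]
            for candidate in tokens[i + 1:]:  # following-sibling::*
                if len(candidate) >= min_length:  # string-length(.) > 2
                    return candidate  # [1]: first sibling that qualifies
    return None


print(extract_pon("I have PON - 569874-Weird for accessories"))
```

Because the result stays a string here, a purely numeric PON keeps any leading zero, which is the behaviour the zero-width-space workaround is trying to get from FILTERXML.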
You might try this formula.
=TRIM(LEFT(MID(A2,MIN(FIND({0,1,2,3,4,5,6,7,8,9},A2&"0123456789")),100),FIND(" ",MID(A2,MIN(FIND({0,1,2,3,4,5,6,7,8,9},A2&"0123456789")),100))))
It extracts the text between the first number and the first space following that number. The size of that extract is limited to 100 characters.

IFERROR with 3 values

I have a VLOOKUP for postcodes and currently it works when searching for both 3 and 4 character postcodes
e.g.
TW13 - Feltham
UB3 - Uxbridge
=IFERROR(VLOOKUP(LEFT(F2,4)&"*",Postcodes!A:C,3,FALSE),VLOOKUP(LEFT(F2,3)&"*",Postcodes!A:C,3,FALSE))
But I forgot that there are 2 character postcodes and both VLOOKUP and IFERROR only allow two checks to be made.
So where should I be looking to first check for 4 characters, then 3 characters or worst case 2 characters? If it helps all my postcodes are in the correct format with the space e.g. TW13 9XX, UB3 4XJ, W3 4EE.
Just nest in another IFERROR() in the value_if_error clause of the first:
=IFERROR(VLOOKUP(LEFT(F2,4)&"*",Postcodes!A:C,3,FALSE),
IFERROR(VLOOKUP(LEFT(F2,3)&"*",Postcodes!A:C,3,FALSE),
VLOOKUP(LEFT(F2,2)&"*",Postcodes!A:C,3,FALSE)))
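The nesting is just a chain of fallbacks: try the 4-character prefix, then 3, then 2. A rough Python sketch of the same idea, assuming a hypothetical two-column lookup table (the real sheet returns column 3 of Postcodes!A:C; the W3 row and all function names are made up for illustration):

```python
def wildcard_lookup(prefix, table):
    """Mimic VLOOKUP(prefix & "*", ..., FALSE): return the value of the
    first row whose key starts with prefix (case-insensitive)."""
    for key, area in table:
        if key.upper().startswith(prefix.upper()):
            return area
    raise LookupError(prefix)  # plays the role of #N/A


def lookup_area(postcode, table):
    """Try prefixes of length 4, then 3, then 2, like the nested IFERROR."""
    for length in (4, 3, 2):
        try:
            return wildcard_lookup(postcode[:length], table)
        except LookupError:
            continue  # fall through to the next, shorter prefix
    return None  # the formula itself would show #N/A here


# Hypothetical sample table
postcodes = [("TW13", "Feltham"), ("UB3", "Uxbridge"), ("W3", "Acton")]
for pc in ("TW13 9XX", "UB3 4XJ", "W3 4EE"):
    print(pc, "->", lookup_area(pc, postcodes))
```

Going from longest to shortest prefix matters: the shorter prefixes are only reached when the longer ones, which still contain the space, fail to match.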
How about just extracting the part before the space in the postcode:
=IFERROR(VLOOKUP(LEFT(F2,FIND(" ",F2)-1)&"*",Postcodes!A:C,3,FALSE),"")

Using MID with varying lengths - excel

In Excel I am using the LEFT, MID and RIGHT functions to pull the 'suffix' of a string.
Example:
1234-1234567-1234
The prefix is 4 digits long
The Base is 7 or 8 digits long
and the Suffix is either 3 or 4 digits long.
I have the right formula as: =RIGHT(A6,LEN(A6)-FIND("-",A6)-8) to handle the varying lengths of the suffix
I need the MID formula that pulls the base section that can handle the varying lengths of the base and suffix.
Given
The prefix is 4 digits long The Base is 7 or 8 digits long....
then you can use this formula
=MID(A1,6,8-ISERR(MID(A1,13,1)+0))
Please try:
=MID(A1,1+FIND("-",A1),FIND("-",MID(A1,1+FIND("-",A1),9))-1)
(just for the part between the hyphens).
But Text to Columns with - as delimiter might be more convenient.
You could try:
=MID(A1,6,FIND("-",A1,6)-FIND("-",A1,1)-1)
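The last two formulas both locate the two hyphens and take whatever sits between them. A small Python sketch of that idea, assuming the prefix-base-suffix layout from the question (base_section is a hypothetical name):

```python
def base_section(code):
    """Return the text between the first and second hyphen, or None if
    the string does not contain two hyphens."""
    first = code.find("-")               # FIND("-",A1)
    if first == -1:
        return None
    second = code.find("-", first + 1)   # FIND("-",A1,6) for a 4-digit prefix
    if second == -1:
        return None
    return code[first + 1:second]        # MID(start, second - first - 1)


print(base_section("1234-1234567-1234"))
print(base_section("1234-12345678-123"))
```

Since the length is derived from the hyphen positions, it does not matter whether the base has 7 or 8 digits or the suffix has 3 or 4.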

Extracting decimal numbers from a string in Excel

I've tried lots of searches for this but I'm still not coming up with anything that works.
I have a range of strings in Column A
Amend.Clause_1.1.AddMCQ
Amend.Clause_1.1.AddNo
Amend.Clause_1.1.AddRepeat
Amend.Clause_1.13.AddRepeat
Amend.Clause_1.13.AddTitle
Amend.Clause_1.13.AddUTQ
Amend.Clause_2.8.Heading_Edit
Amend.Clause_2.8.MCQ
Amend.Clause_2.8.Remove
Amend.Clause_4.26.AddUTQ
Amend.Clause_4.26.Heading_Edit
Amend.Clause_4.26.MCQ
Amend.Clause_5.15.AddMCQ
Amend.Clause_5.15.AddNo
Amend.Clause_5.15.AddRepeat
As you can see, the numbers always start in the same place, after the underscore "_" at position 13.
I need to extract the decimal numbers from these strings into a new column so I'm left with 1.1, 1.13, 2.8, 4.26 etc.
I've tried all sorts of combos of MID, LEFT, LEN, RIGHT but to no avail, trying to find the position of the last period.
Could anyone explain how to accomplish this? Ideally I'd like to do this without VBA.
Thanks
Here you are:
=VALUE(MID(A1,SEARCH("_",A1)+1,SEARCH(".",A1,SEARCH(".",A1,SEARCH("_",A1)+1)+1)-(SEARCH("_",A1)+1)))
Here's what's inside =VALUE(MID(...)):
A1 - the whole string itself
SEARCH("_",A1)+1 - find the number starting position - right after "_".
SEARCH(".",A1,SEARCH(".",A1,SEARCH("_",A1)+1)+1)-(SEARCH("_",A1)+1) - find number length - position of second "." after first "." minus number starting position.
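The same start/length arithmetic can be followed step by step in a short Python sketch. It is only an illustration: clause_number is a hypothetical name, and positions are 0-based here where SEARCH is 1-based.

```python
def clause_number(label):
    """Return the number between the first "_" and the second "." that
    follows it, as text (e.g. "1.13"), or None if the layout differs."""
    underscore = label.find("_")                  # SEARCH("_",A1)
    if underscore == -1:
        return None
    start = underscore + 1                        # number starts after "_"
    first_dot = label.find(".", start)            # "." inside the number
    if first_dot == -1:
        return None
    second_dot = label.find(".", first_dot + 1)   # "." that ends the number
    if second_dot == -1:
        return None
    return label[start:second_dot]


print(clause_number("Amend.Clause_1.13.AddRepeat"))
```

The sketch returns text; wrapping the result in float() would correspond to the VALUE call in the formula.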
Try with three functions:
=MID(A1,14,FIND("#",SUBSTITUTE(A1,".","#",3))-14)
Try this if the position of _ is not necessarily 13:
=MID(A1,FIND("_",A1,1)+1,FIND("¬¬",SUBSTITUTE(A1,".","¬¬",LEN(A1)-LEN(SUBSTITUTE(A1,".",""))))-FIND("_",A1,1)-1)
Or this if the _ is always at position 13:
=MID(A1,14,FIND("¬¬",SUBSTITUTE(A1,".","¬¬",LEN(A1)-LEN(SUBSTITUTE(A1,".",""))))-14)
Use This:
=VALUE(TRIM(LEFT(SUBSTITUTE(RIGHT(A1;LEN(A1)-FIND("_";A1));".";REPT(" ";LEN(A1));2);LEN(A1))))
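assuming the value is in A1 (replace the semicolons with commas if your regional settings use the comma as the argument separator).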
Far from ideal, but with a shorter formula than the solutions offered so far:
=SUBSTITUTE(A1,".","_",3)
Catch is that formulae would then need to be converted to values, parsed with delimiter _ (being careful to ensure Column data format is Text) and surplus columns deleted.
When the string Amend.Clause_1.1.AddMCQ is in A1
=Find(".",A1,Find(".",A1)+1)
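will give the position of the second period, which here is the decimal point inside 1.1 (position 15). Nest one more FIND to get the third period, which marks the end of the number; then you should be able to extract the decimal number.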
The syntax is
FIND(find_text, within_text, [start_num])

=IF(ISNUMBER(SEARCH.... the difference between 1 and 11

I am trying to convert a single column of numbers in Excel to multiple columns depending on the content.
e.g. Table 1 contains 1 column that contains 1 or more numbers between 1 and 11 separated with a comma. Table 2 should contain 11 columns with a 1 or a 0 depending on the numbers found in Table 1.
I am using the following formula at present:
=IF(ISNUMBER(SEARCH("1",A2)),1,0)
The next column contains the following:
=IF(ISNUMBER(SEARCH("2",A2)),1,0)
All the way to 11
=IF(ISNUMBER(SEARCH("11",A2)),1,0)
The problem with this however is that the code for finding references to 1 also find the references to 11. Is it possible to write a formula that can tell the difference so that if I have the following in Table 1:
2, 5, 11
It doesn't put a 1 in column 1 of Table 2?
Thanks.
Use, for list with just comma:
=IF(ISNUMBER(SEARCH(",1,", ","&A2&",")),1,0)
If list is separated with , (comma+space):
=IF(ISNUMBER(SEARCH(", 1,", ", "&A2&",")),1,0)
A version of LS_dev's answer that will cope with 0...n spaces before or after each comma is:
=IF(ISNUMBER(SEARCH(", 1 ,",", "&TRIM(SUBSTITUTE(A2,","," , "))&" ,")),1,0)
The SUBSTITUTE makes sure there's always at least one space before and after each comma and the TRIM replaces multiple spaces with one space, so the result of the TRIM function will have exactly one space before and after each comma.
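The trick is to normalise the list so every number is wrapped in the same delimiters, and then search for the number together with its delimiters. A minimal Python sketch of that normalisation (contains_number is a hypothetical name; the regex stands in for Excel's TRIM and only collapses plain spaces):

```python
import re


def contains_number(cell, number):
    """Return 1 if number is an item of the comma-separated list in cell,
    else 0, tolerating any number of spaces around the commas."""
    spaced = cell.replace(",", " , ")                # SUBSTITUTE(A2,","," , ")
    trimmed = re.sub(r" +", " ", spaced).strip(" ")  # TRIM(...)
    wrapped = ", " + trimmed + " ,"                  # delimiters at both ends
    return 1 if f", {number} ," in wrapped else 0    # SEARCH(", 1 ,", ...)


print(contains_number("2, 5, 11", 1))
print(contains_number("2, 5, 11", 11))
```

Because the search text includes the delimiter on both sides, "1" can no longer match inside "11" or "10".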
How about using the SUBSTITUTE function to change all "11" to the Roman numeral "XI" (and "10" to "X", since 10 also contains a 1) prior to doing your search:
=IF(ISNUMBER(SEARCH("1",SUBSTITUTE(SUBSTITUTE(A2,"11","XI"),"10","X"))),1,0)
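If you only want to eliminate the "11" case, you can test for "1" and exclude "11". This is all based on hardcoded values (and it returns 0 when both 1 and 11 are in the list), so there should be a smarter solution.
=IF(AND(ISNUMBER(SEARCH("1",A2)),NOT(ISNUMBER(SEARCH("11",A2)))),1,0)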
