I'm parsing strings in Excel, and I need to return everything through the last number. For example:
Input: A00XX
Output: A00
In my case, I know the last number will be between index 3 and 5, so I'm brute-forcing it with:
=LEFT([#Point],
IF(SUM((MID([#Point],5,1)={"0","1","2","3","4","5","6","7","8","9"})+0),5,
IF(SUM((MID([#Point],4,1)={"0","1","2","3","4","5","6","7","8","9"})+0),4,
IF(SUM((MID([#Point],3,1)={"0","1","2","3","4","5","6","7","8","9"})+0),3,
))))
Unfortunately, I've run into some edge cases where the numbers extend beyond index 5. Is there a generic way to find the last number in a string using Excel formulas?
Note:
I've tried =MAX(SEARCH(... but it returns the index of the first number, not the last.
As a starting point: if we know the position of the last number, we can use LEFT to get the string to that point. Suppose that the position is 5:
=LEFT(A1, 5)
But, we don't know the position of the last number. Now, what if the only valid number was 0, and it only appeared once: then we could use FIND to locate the position of the number:
=LEFT(A1, FIND(0, A1))
But, we have more than one valid number. Suppose that we had all the numbers from 0 through 9, but each number could only appear once — then we could use MAX on a FIND array, to tell us which of the numbers is the last one:
=LEFT(A1, MAX(FIND({0,1,2,3,4,5,6,7,8,9}, A1)))
Unfortunately, FIND will throw a #VALUE! error if any number doesn't appear, which will then make MAX return the same error. So, we need to fix that with IFERROR:
=LEFT(A1, MAX(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9}, A1), 0)))
However, numbers can appear more than once. As such, we need a method to find the last occurrence of a value in a string (since FIND and SEARCH will, by default, return the first occurrence).
The SUBSTITUTE function has 3 mandatory arguments — Initial String, Value to be Replaced, Value to Replace with — and one Optional argument — the occurrence to replace. Normally, this is omitted, so that all occurrences are replaced. But, if we know how many times a character appears in a string, then we can replace just the last instance with a special/uncommon sub-string to search for.
To count how many times a character appears in a String, just start with the length of the String, then subtract the length when you SUBSTITUTE all copies of that character for Nothing:
=LEN(A1) - LEN(SUBSTITUTE(A1, 0, ""))
This means we can now replace the last occurrence of the character with, for example, ">¦<", and then FIND that:
=FIND(">¦<", SUBSTITUTE(A1, 0, ">¦<", LEN(A1) - LEN(SUBSTITUTE(A1, 0, ""))))
Of course, we want to do this for all the numbers from 0 to 9, and take the MAX value (remembering our IFERROR), so we need to put the Array of values back in:
=MAX(IFERROR(FIND(">¦<", SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, ">¦<", LEN(A1) - LEN(SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, "")))), 0))
Then, we plug that all back into our initial LEFT function:
=LEFT(A1, MAX(IFERROR(FIND(">¦<", SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, ">¦<", LEN(A1) - LEN(SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, "")))), 0)))
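For readers who find the nested formula hard to follow, the same steps can be sketched in Python. This is only an illustration of the logic, not part of the Excel solution: the function names `replace_nth` and `left_through_last_digit` are made up for this example, and the marker is assumed never to occur in the input, just as in the formula.

```python
MARKER = ">¦<"  # assumed not to occur in the input, as in the formula


def replace_nth(text: str, old: str, new: str, n: int) -> str:
    """Replace only the n-th (1-based) occurrence of old, like SUBSTITUTE's 4th argument."""
    if n < 1:
        return text
    pos = -1
    for _ in range(n):
        pos = text.find(old, pos + 1)
        if pos == -1:
            return text  # fewer than n occurrences: nothing to replace
    return text[:pos] + new + text[pos + len(old):]


def left_through_last_digit(text: str) -> str:
    last = 0  # plays the role of IFERROR(..., 0)
    for digit in "0123456789":
        # LEN(A1) - LEN(SUBSTITUTE(A1, digit, "")) counts the occurrences
        count = len(text) - len(text.replace(digit, ""))
        if count == 0:
            continue  # this is where FIND would return #VALUE!
        marked = replace_nth(text, digit, MARKER, count)
        # +1 because Excel positions are 1-based
        last = max(last, marked.find(MARKER) + 1)
    return text[:last]  # LEFT(A1, last)
```

Calling `left_through_last_digit("A00XX")` walks through the same intermediate values as the formula: the second `0` is replaced by the marker, the marker is found at position 3, and the first three characters are returned.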
An alternative, assuming that the length of the string in question will never be more than 9 characters (which seems a safe assumption based on your description):
=LEFT(A1,MATCH(0,0+ISERR(0+MID(A1,{1;2;3;4;5;6;7;8;9},1))))
This, depending on your version of Excel, may or may not require committing with CTRL+SHIFT+ENTER.
Note also that the separator within the array constant {1;2;3;4;5;6;7;8;9} is the semicolon, which, for English-language versions of Excel, represents the row-separator. This may require amending if you are using a non-English-language version.
Of course, we can replace this static constant with a dynamic construction. However, since we are already making the assumption that 9 is an upper limit on the number of characters for the string in question, this would not seem to be necessary.
If you have the newest version of Excel, you can try something like:
=LEFT(D1,
LET(x, SEQUENCE(LEN(D1)),
MAX(IF(ISNUMBER(NUMBERVALUE(MID(D1, x, 1))), x))))
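The idea behind this formula (test every character position, keep the positions that hold a digit, take the largest) can be sketched in Python as follows. The function name is hypothetical, and the sketch only treats the ASCII characters 0-9 as digits, which may differ slightly from what NUMBERVALUE accepts.

```python
def left_through_last_digit_scan(text: str) -> str:
    # 1-based positions of every digit character, like IF(ISNUMBER(...), x)
    positions = [i for i, ch in enumerate(text, start=1) if ch in "0123456789"]
    # MAX(...) of those positions; 0 when the string has no digit at all
    return text[: max(positions, default=0)]
```

Unlike the array-constant version, this makes no assumption about the maximum length of the string.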
This is an example of the string, and it can be longer
1160752 Meranji Oil Sats -Mt(MA) (000600007056 0001), PE:Toolachee Gas Sats -Mt(MA) (000600007070 0003)GL: Contract Services (510000), COT: Network (N), CO: OM-A00009.0723,Oil Sats -Mt(MA) (000600007053 0003)
The result needs to be column1 600007056 column2 600007070 column3 600007053
I am working in Spotfire and creating calculated columns through transformations, as I need the columns to join to other data sets.
I have tried the below, but it only picks up the first 600.. number, not the others, and there can be an undefined number of those.
Account is the column with the string
Mid([Account],
Find("(000",[Account]) + Len("(000"),
Find("0001)",[Account]) - Find("(000",[Account]) - Len("(000"))
Thank you!
Assuming my guess is correct, and the pattern to look for is:
9 digits, starting with 6, preceded by 1 opening parenthesis and 3 zeros, followed by a space, 4 digits and a closing parenthesis
you can grab individual occurrences by:
column1: RXExtract([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))',1)
column2: RXExtract([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))',2)
etc.
The tricky bit is knowing how many columns to define, since you say there can be many. One way is to first calculate the maximum number of occurrences like this:
maxn: Max((Len([Account]) - Len(RXReplace([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))','','g'))) / 9)
still assuming that each number to extract has 9 digits. This takes the difference in length between the original [Account] and a copy with every matched pattern replaced by an empty string, and divides it by 9.
You can then define up to maxn columns; the extra ones will be empty for rows with fewer instances.
Note that Spotfire always wants two backslashes for escaping (I had to add more in the editor to make it render correctly; I hope I have not missed any).
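If you want to check the pattern outside Spotfire first, here is a minimal sketch using Python's `re` module on the sample string from the question. Python raw strings need only single backslashes, and the variable names are chosen just for this example. It also reproduces the length-difference trick used for maxn.

```python
import re

# Same pattern as in the Spotfire expressions, with single backslashes
PATTERN = re.compile(r"(?<=\(000)6\d{8}(?=\s\d{4}\))")

sample = (
    "1160752 Meranji Oil Sats -Mt(MA) (000600007056 0001), "
    "PE:Toolachee Gas Sats -Mt(MA) (000600007070 0003)"
    "GL: Contract Services (510000), COT: Network (N), "
    "CO: OM-A00009.0723,Oil Sats -Mt(MA) (000600007053 0003)"
)

# Every occurrence, in order (what column1, column2, ... would hold)
matches = PATTERN.findall(sample)

# Occurrence count via the length difference, as in the maxn expression
removed = len(sample) - len(PATTERN.sub("", sample))
occurrences = removed // 9

print(matches)      # ['600007056', '600007070', '600007053']
print(occurrences)  # 3
```

Note that `(510000)` is skipped because it lacks the `(000` prefix and the trailing four-digit group.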
I have a column of numbers in Excel 2016. The numbers span many orders of magnitude, but are all positive. Some are less than one. How can I return the first significant figure of each cell in a new column?
For example, for the number 1.9 the result should be 1. For the number 0.9 the result should be 9.
Things I've tried:
Using LEFT() to get the first character. This works for values greater than 1, but for numbers between 0 - 1 it returns 0 (that is, LEFT(0.3, 1) returns 0). I've tried using this with scientific notation formatting and it returns the same result.
I've searched Google and SO for solutions to this problem. There are many posts about rounding to significant figures, but I'm looking to truncate, not round.
Reading through Office's online docs regarding scientific notation.
You could use scientific notation:
=LEFT(TEXT(A1,"0.000000000000000E+00"))
Note: You can only have 15 digits of precision in Excel so this should be OK.
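To see why this works, here is a rough Python equivalent of the same idea, assuming positive inputs; the function name is made up for the illustration. In scientific notation the first character of the mantissa is always the first significant figure, regardless of the number's magnitude.

```python
def first_significant_digit(value: float) -> int:
    # f"{0.9:.15e}" is "9.000000000000000e-01", so the first character is the answer.
    # Assumes value is positive, as in the question.
    return int(f"{value:.15e}"[0])
```

One caveat of this approach, in both Python and Excel: the value is rounded to the displayed precision before the first character is read, so a number extremely close to the next power of ten could round up to it.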
You can multiply the number by a power of 10 large enough to push past any unwanted leading zeros:
=--LEFT(A1*10^LEN(A1),1)
Read the cell value as text, replace dots and zeros (. / 0) with nothing, return the leftmost "character"; multiply it by 1 to coerce it back into a number:
=LEFT(SUBSTITUTE(SUBSTITUTE(TEXT(A1,"#"),".",""),"0",""))*1
You can also create a custom UDF (User Defined Function) that uses Regular Expressions to accomplish this task. This would require copy/paste VBA knowledge, as well as you setting a reference to:
Microsoft VBScript Regular Expressions 5.5
(which can be done by going to the VBE (Alt+F11), Tools > References. Then check the box of the reference listed above)
Paste the following UDF into a standard code module within the VBE:
Public Function SigNum(ByVal InputNumber As Double) As Long
    Dim s As SubMatches
    With New RegExp
        .Pattern = "\.0*([^0])|^([^0])"
        If .Test(InputNumber) Then
            Set s = .Execute(InputNumber)(0).SubMatches
            If s(0) > 0 Then ' First group matched: digit after the period (value < 1)
                SigNum = s(0)
            Else ' Second group matched: leading digit (value >= 1)
                SigNum = s(1)
            End If
        End If
    End With
End Function
On your worksheet, you would be able to use your newly created formula as such:
=SigNum(A1)
You can see what it matches in the example on regex101. When viewing this site, the green-highlighted numbers are what would be returned if the value is < 1, and the red ones are what would be returned if the value is >= 1. If the value = 0, this will return 0.
Breaking Down the Pattern
Here's how the pattern \.0*([^0])|^([^0]) works. First, you can see that there is a |, which essentially acts like an Or statement, so we will split these into two sections.
First Section \.0*([^0])
\. will match a literal period. This ensures that we are looking at a value that is less than 1.
0* matches all zeros, 0 to unlimited * times. We use * (zero to unlimited) instead of + (1 to unlimited) because a zero is not required to be in front of the significant number - but the zero itself isn't significant.
[^0] This is a negated character class [^...]. This means it will match anything that is not in this class. Since our significant number should be a value other than zero, we do not want to match a zero. And because it's surrounded by a capturing group (...), this is what is returned back to the function.
Second Section ^([^0])
Since the first section didn't match, the value must be 1 or greater.
^ this is an anchor point that matches the beginning of the string. On the first section we didn't require it because we essentially used the period \. as our anchor. Since our value is 1 or greater, we need to ensure we are starting from the absolute left of the input number.
(...) Capturing Group. Anything within this group will be returned as a submatch and ultimately back to the function as its return value.
[^0] Negated Character class. It will match anything except a 0.
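The pattern itself is not specific to VBA. As a quick way to experiment with it, here is a sketch using Python's `re` module; the function name `sig_num` is hypothetical, and the input is assumed to be a plain decimal string such as "0.0012".

```python
import re

PATTERN = re.compile(r"\.0*([^0])|^([^0])")


def sig_num(text: str) -> int:
    match = PATTERN.search(text)
    if match is None:
        return 0  # e.g. "0": neither section matches
    # group(1): first non-zero character after the period (value below 1)
    # group(2): first character of the string when it is not a zero
    return int(match.group(1) or match.group(2))
```

For "0.9" only the first section can match, so group 1 holds "9"; for "1.9" the second section matches at the very start of the string, so group 2 holds "1".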
I have unique identifiers for each row. For example 19Jan187938 or 19Jan206414 but there are some which are like 19Jan17333. I need to add a 0 before the number if it's 5 digits, so it becomes 19Jan017333.
I tried,
=TEXT(CONCATENATE(19,AB2,C2),"000000")
even with 11 0's, since the total length is 11. Nothing changes.
Try the following:
=CONCATENATE(LEFT(AB2,5),TEXT(RIGHT(AB2,LEN(AB2)-5),"000000"))
It basically takes the first 5 characters and concatenates them with the remaining characters formatted as a six-digit number with leading zeros.
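The same split-and-pad logic, sketched in Python for illustration. The function name is made up, and it assumes, as the formula does, that the prefix is always exactly 5 characters long.

```python
def pad_identifier(identifier: str) -> str:
    # LEFT(AB2, 5) and RIGHT(AB2, LEN(AB2) - 5)
    prefix, number = identifier[:5], identifier[5:]
    # TEXT(..., "000000"): pad the numeric part to six digits with leading zeros
    return prefix + number.zfill(6)
```

Identifiers that already have six digits pass through unchanged, so the formula can safely be applied to the whole column.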
If your identifier is on A1, you can try this:
=IF(LEN(A1)<11;CONCATENATE(LEFT(A1;5);RIGHT("000000"&MID(A1;6;5);6));A1)
See what happens.
I've tried lots of searches for this but I'm still not coming up with anything that works.
I have a range of strings in Column A
Amend.Clause_1.1.AddMCQ
Amend.Clause_1.1.AddNo
Amend.Clause_1.1.AddRepeat
Amend.Clause_1.13.AddRepeat
Amend.Clause_1.13.AddTitle
Amend.Clause_1.13.AddUTQ
Amend.Clause_2.8.Heading_Edit
Amend.Clause_2.8.MCQ
Amend.Clause_2.8.Remove
Amend.Clause_4.26.AddUTQ
Amend.Clause_4.26.Heading_Edit
Amend.Clause_4.26.MCQ
Amend.Clause_5.15.AddMCQ
Amend.Clause_5.15.AddNo
Amend.Clause_5.15.AddRepeat
As you can see, the numbers always start in the same place, after the underscore "_" at position 13.
I need to extract the decimal numbers from these strings into a new column so I'm left with 1.1, 1.13, 1.14, 4.26 etc.
I've tried all sorts of combos of MID, LEFT, LEN, RIGHT but to no avail, trying to find the position of the last period.
Could anyone explain how to accomplish this? Ideally I'd like to do this without VBA.
Thanks
Here you are:
=VALUE(MID(A1,SEARCH("_",A1)+1,SEARCH(".",A1,SEARCH(".",A1,SEARCH("_",A1)+1)+1)-(SEARCH("_",A1)+1)))
Here's what's inside =VALUE(MID(...)):
A1 - the whole string itself
SEARCH("_",A1)+1 - find the number starting position - right after "_".
SEARCH(".",A1,SEARCH(".",A1,SEARCH("_",A1)+1)+1)-(SEARCH("_",A1)+1) - find number length - position of second "." after first "." minus number starting position.
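The three pieces above map onto the following Python sketch, given only to make the position arithmetic easier to follow. The function name is hypothetical, and the result is kept as text rather than converted with VALUE, so a clause like 1.10 would keep its trailing zero.

```python
def clause_number(text: str) -> str:
    start = text.index("_") + 1                  # right after the first "_"
    first_dot = text.index(".", start)           # the period inside the number
    second_dot = text.index(".", first_dot + 1)  # the period that ends the number
    return text[start:second_dot]
```

Python indexes from 0 and slices by end position, whereas MID takes a 1-based start and a length, which is why the formula subtracts the starting position at the end.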
Try with three functions:
=MID(A1,14,FIND("#",SUBSTITUTE(A1,".","#",3))-14)
Try this - If the position of _ is not necessarily 13.
=MID(A1,FIND("_",A1,1)+1,FIND("¬¬",SUBSTITUTE(A1,".","¬¬",LEN(A1)-LEN(SUBSTITUTE(A1,".",""))))-FIND("_",A1,1)-1)
Or this if the _ is always 13
=MID(A1,14,FIND("¬¬",SUBSTITUTE(A1,".","¬¬",LEN(A1)-LEN(SUBSTITUTE(A1,".",""))))-14)
Use This:
=VALUE(TRIM(LEFT(SUBSTITUTE(RIGHT(A1;LEN(A1)-FIND("_";A1));".";REPT(" ";LEN(A1));2);LEN(A1))))
assuming value is in A1
Far from ideal, but with a shorter formula than the solutions offered so far:
=SUBSTITUTE(A1,".","_",3)
Catch is that formulae would then need to be converted to values, parsed with delimiter _ (being careful to ensure Column data format is Text) and surplus columns deleted.
When the string Amend.Clause_1.1.AddMCQ is in A1
=Find(".",A1,Find(".",A1)+1)
will give the position of the second decimal point, then you should be able to extract the decimal number.
The syntax is
FIND(find_text, within_text, [start_num])