How could I extract only the numbers from a text string in Excel or Google Sheets? For example:
A1 - a1b23eg67
A2 - 15dgrgr156
Result desired is
B1 - 12367
B2 - 15156
You can do it with capture groups in Google Sheets
=REGEXREPLACE(A1,"(\d)|.","$1")
Anything which matches the contents of the brackets (a digit) will be copied to the output, anything else replaced by an empty string.
Please see @Max Makhrov's answer to this question
or
=regexreplace(A1,"[^\d]","")
to remove anything which isn't a digit.
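For anyone who wants to check the logic outside a spreadsheet, here is a minimal Python sketch of the same "drop everything that isn't a digit" idea. The function name extract_digits is made up for illustration, and Python's re module is only a stand-in for the Sheets regex engine.

```python
import re

def extract_digits(text):
    # Replace every character that is not 0-9 with an empty string,
    # the same idea as replacing [^\d] with "" in the formula above.
    return re.sub(r"[^0-9]", "", text)

print(extract_digits("a1b23eg67"))   # 12367
print(extract_digits("15dgrgr156"))  # 15156
```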
Because you asked for Excel also,
If you have an Office 365 subscription, you can use this array formula in Excel:
=--TEXTJOIN("",TRUE,IF(ISNUMBER(--MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)),MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1),""))
Being an array formula it needs to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode. If done correctly then Excel will put {} around the formula.
I would imagine there is a way to pull this off with =REGEXEXTRACT, but I can't figure out how to get it to repeat the search after the first hit. Regex function implementations often take a third parameter to repeat the match, but it doesn't look like Google implemented it.
At any rate, the following formula will do the trick. It's just a little roundabout:
=concatenate(SPLIT( LOWER(A1) , "abcdefghijklmnopqrstuvwxyz" ))
This is converting the string to lower case, then splitting the string using any letter of the alphabet. This will return an array of the numbers left over, which we concatenate back together.
Update, switched over to =REGEXREPLACE() instead of extract...:
=regexreplace(A1, "[a-z]", "")
That's a much cleaner and more obvious way of doing it than that concat(split()) nonsense.
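One caveat worth checking before relying on it: [a-z] only removes lowercase ASCII letters, so uppercase letters, spaces and punctuation survive. A small Python sketch shows the difference from removing every non-digit. The helper names are hypothetical and Python's re is only a stand-in for the Sheets engine.

```python
import re

def strip_lowercase(text):
    # Mirrors the [a-z] replacement: only lowercase ASCII letters are removed.
    return re.sub(r"[a-z]", "", text)

def keep_digits(text):
    # Removes everything that is not a digit 0-9.
    return re.sub(r"[^0-9]", "", text)

sample = "Ab-12 cd34"
print(repr(strip_lowercase(sample)))  # 'A-12 34'
print(repr(keep_digits(sample)))      # '1234'
```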
I have various strings (formatted as text) that have a similar syntax in column F. Eg:
1t ttn TEST FAILED (9-5 passOne, 21-7 & 877-12 passTwo)
I want to extract the numbers within the brackets into separate cells. There may be anywhere between 1 and 5 different numbers.
The numbers will always contain a dash (-) somewhere in the middle and may have up to 3 digits either side of the dash.
Is this possible? I've been through previous questions and can't find anything that answers this.
Edit to show entered JvdV formula:
With Excel 2016:
Formula in B1:
=SUBSTITUTE(FILTERXML("<t><s>`"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(REPLACE($A1,1,FIND("(",$A1),""),")",""),"&","&amp;")," ","</s><s>`")&"</s></t>","//s[contains(., '-')][translate(., '`-','')*0=0]["&COLUMN(A1)&"]"),"`","")
Confirm with Ctrl+Shift+Enter.
Drag right, and down for all lines of data.
This assumes:
The substring of interest (between parentheses) is always at the end.
Numbers with the hyphen as a delimiter may occur outside this substring and are not wanted in our results.
The trick here is to first sanitize the string: find the opening parenthesis and remove everything before it using REPLACE(). Then just SUBSTITUTE() the closing parenthesis. The remainder can be split on spaces using FILTERXML(). To prevent errors and unwanted results, I substituted the ampersand and added grave accents to prevent Excel from recognizing these substrings as dates. For more insight into the XPath used and the workings of FILTERXML(), I'd like to refer you to this.
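The steps above (cut at the opening parenthesis, drop the closing one, split on spaces, keep hyphenated numeric tokens) can be sketched in Python to sanity-check the expected output. This is only an illustration of the logic, not of FILTERXML itself, and numbers_in_parentheses is a made-up name.

```python
def numbers_in_parentheses(text):
    # Keep only what follows the first "(" and drop any closing ")".
    start = text.find("(")
    if start == -1:
        return []
    inner = text[start + 1:].replace(")", "")
    # Split on spaces, then keep tokens that contain a hyphen and are
    # purely digits once the hyphen is removed.
    return [
        token
        for token in inner.split(" ")
        if "-" in token and token.replace("-", "").isdigit()
    ]

sample = "1t ttn TEST FAILED (9-5 passOne, 21-7 & 877-12 passTwo)"
print(numbers_in_parentheses(sample))  # ['9-5', '21-7', '877-12']
```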
I have a document in google sheets and the column consists of the name and version, like NLog.Config.4.3.0, NLog.Config.4.4.9 and so on.
See the image below for other examples.
I need to divide this into two columns, name and version, but I'm not familiar enough with regular expressions to extract this info.
I can use excel and then import it to the Google doc, it doesn't matter for me how to do that.
You can try something like this:
Suppose you have your string in A1, then in B1 you can enter this:
=LEFT(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))-1)
and in C1 this:
=RIGHT(A1,LEN(A1)-MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))+1)
B1 keeps everything before the first digit, including the trailing dot. You may need some adjustments for cases without numbers: there B1 returns the whole string and C1 returns an empty string. If other inputs can produce an error, you can wrap the formula in IFERROR like this:
=IFERROR(LEFT(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))-1),A1)
Note: A1&"0123456789" is a 'trick' to keep SEARCH from returning an error, since it looks for every digit in the array; we only need the position of the first one, hence the MIN().
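As a rough cross-check of the "position of the first digit" idea, here is a short Python sketch. The function name split_at_first_digit is hypothetical; it splits the text where the first digit appears and falls back to the end of the text when there is no digit.

```python
def split_at_first_digit(text):
    # Index of the first digit, or len(text) when the text has no digit
    # (the formula gets the same fallback by appending "0123456789").
    first = next(
        (i for i, ch in enumerate(text) if ch in "0123456789"),
        len(text),
    )
    return text[:first], text[first:]

print(split_at_first_digit("NLog.Config.4.3.0"))  # ('NLog.Config.', '4.3.0')
print(split_at_first_digit("NoVersion"))          # ('NoVersion', '')
```

Note that the name part keeps its trailing dot here; strip it separately if it is not wanted.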
Supposing that your raw data were in A2:A, place this in B2:
=ArrayFormula(IFERROR(REGEXEXTRACT(A2:A,"(\D+)\.(.+)"),A2:A))
The regular expression reads "Extract any number of non-digits up to but not including a period as group one, and everything remaining into group two." (In other words, "As soon as you run into a digit after a period, start group two.")
The IFERROR clause means, "If this pattern can't be found, just return the original cell data."
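To see how the two capture groups behave, the same pattern can be tried in Python. This is a sketch only: Python's re stands in for the Sheets engine, name_and_version is a made-up name, and the fallback of returning the original text with an empty version is my approximation of the IFERROR branch.

```python
import re

def name_and_version(text):
    # Group 1: a run of non-digits; then a literal dot; group 2: the rest.
    match = re.search(r"(\D+)\.(.+)", text)
    if match is None:
        # Pattern not found: hand back the original text unchanged.
        return text, ""
    return match.group(1), match.group(2)

print(name_and_version("NLog.Config.4.3.0"))  # ('NLog.Config', '4.3.0')
print(name_and_version("NoVersion"))          # ('NoVersion', '')
```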
Assuming your content is in column A (Google Sheets), try this arrayformula in any cell other than column A:
=arrayformula(iferror(split(REGEXREPLACE($A:$A,"(\.)(\d+.+$)",char(6655)&"$2"),char(6655)),))
There are two regex groups denoted in ():
(\.) and (\d+.+$).
The first group looks for a dot . (escaped as \.). The second group looks for a digit (0-9) \d, one or more occurrences +, followed by one or more + of any character ., up to the end of the string $.
The replacement is char(6655) (wouldn't usually be found in your dataset), and the contents of group two $2.
Then the split function divides the text into two columns by the char(6655) character.
iferror returns nothing if nothing is split.
The arrayformula works down the sheet.
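The "mark the split point with a rare character, then split on it" technique can be sketched in Python as follows. Assumptions: Python's re stands in for the Sheets engine, SENTINEL and split_on_version are names invented for the example, and chr(6655) is the same code point the formula uses.

```python
import re

SENTINEL = chr(6655)  # a character unlikely to appear in the data

def split_on_version(text):
    # Replace "dot + digits + rest of string" with sentinel + group two,
    # which drops the dot and marks where the version starts.
    marked = re.sub(
        r"(\.)(\d+.+$)",
        lambda m: SENTINEL + m.group(2),
        text,
    )
    # Split on the sentinel; text without a match comes back as one piece.
    return marked.split(SENTINEL)

print(split_on_version("NLog.Config.4.3.0"))  # ['NLog.Config', '4.3.0']
print(split_on_version("NoVersion"))          # ['NoVersion']
```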
I've been given an Excel file to import into a database; it was exported from an Access DB. The file has a column type_class. In one workbook it's fine (sheet1), but in another workbook, which I moved to sheet2 to run a VLOOKUP, I can't tell at first sight whether the column is text or numbers. The green marker in the upper-left corner does not show on all cells, yet the ISTEXT function reports them as text. Below is the original column without any changes or formatting, as well as the ISTEXT result.
When I use the column in a VLOOKUP function to transfer the Name to the first sheet, only some keys are found (1010, 1101, 1102, ...), namely the cells with the green mark in the upper-left corner.
I can easily format the key in sheet1 using Text to Columns, cell formatting, or any other way.
But I cannot change the column in sheet2. I tried:
Text-to-Columns
Cell Formatting
VALUE(text), CLEAN(text), TRIM(text), TRIM(CLEAN(text)), CLEAN(SUBSTITUTE())
Multiply by 1
But only the cells with the green mark change to a number; the rest stay the same. I browsed the internet but didn't find a solution either.
Edit:
I uploaded what is needed to test the case to the drive. You can find it here
Help Appreciated
For your digit strings that you can't convert to numbers, from the comments it seems there are extra characters in the string that TRIM and CLEAN do not remove.
Determine what those characters are
Assume a "non-convertible" digit string is in A1
Enter the following formula
B1: =MID($A$1,ROWS($1:1),1) and fill down
C1: = UNICODE(B1) and fill down
From this you can determine the character to use in a SUBSTITUTE function.
For example:
From the above we see that the character code that we need to get rid of is 160.
So we use:
=SUBSTITUTE(A1,CHAR(160),"")
or, to convert it in one step to a number:
=--SUBSTITUTE(A1,CHAR(160),"")
Note: If the character code is >255, use UNICHAR instead of CHAR in the SUBSTITUTE function.
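The same diagnosis (list each character's code, then remove the offender) can be sketched in Python. The helper names are hypothetical, and the sample value with a trailing character 160 is made up to match the case described above.

```python
def code_points(text):
    # One code per character, like MID + UNICODE filled down a column.
    return [ord(ch) for ch in text]

def to_number(text, unwanted=chr(160)):
    # Remove the unwanted character, then convert what is left to a number.
    return int(text.replace(unwanted, ""))

sample = "1010" + chr(160)  # digits followed by a non-breaking space
print(code_points(sample))  # [49, 48, 49, 48, 160]
print(to_number(sample))    # 1010
```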
Without an example, I use value() to convert what excel takes as text like so:
=value(left("10kg",2))
Or the following also works:
=left("10kg",2)*1
And if leading or trailing spaces are an issue, then trim() is one solution.
Column A has a sorted-descending list of some bum's Top-250 movies, in the following format: Apocalypse Now (1979)
Column B has a sorted list of My Top-100, in the same format.
Both lists have been copied and pasted into a Notepad text doc to confirm they are similar simple ASCII text (no extra spaces at the end, etc.) and then pasted back into LibreOffice Calc.
I need a function for Column C that shows any of MY movies (B) that he has NOT listed in (A).
Pseudocode:
C1 = The cell value in B1 – is it anywhere in A1:A8000? If not – put B1 value into C1, otherwise leave blank.
C2 = The cell value in B2 – is it anywhere in A1:A8000? If not – put B2 value into C2, otherwise leave blank.
Etc.
I have searched and found these functions – none of which work, for whatever reason. I've modified them to 8000 as the upper range which I don't think I'll ever approach.
=IF(ISERROR(MATCH(B1,$A$1:$A$8000,0))=1,B1,"")
=IFERROR(MATCH(B1;$A$1:$A$8000;0);"")
=IFNA(VLOOKUP($B1;$A$1:$A$8000;1;0);"")
=IF(ISNA(VLOOKUP($B1;$A$1:$A$8000;1;0));"";VLOOKUP($B1;$A$1:$A$8000;1;0))
=IF(ISNA(VLOOKUP($B1,$A$1:$A$8000,1,0)),"",VLOOKUP($B1,$A$1:$A$8000,1,0))
=VLOOKUP(B1,$A$1:$A$8000,1,)
=MATCH($B1;$A$1:$A$999;0)
I'd prefer it to be a single cell function, and not VBA.
I actually solved this back in like 2001 using Excel. The trick then was I had to edit the cell and use Ctrl-Shift-Enter to create a “dynamic array”, so the function was bracketed in {} curly brackets. But now I'm using the latest LibreOffice Calc and can't get the ##$# syntax correct.
Thank you!!
Edit NOTE: testing with "A" and "00001" numbers produces very different results. Values have to look like this in both columns:
Alice (1988)
Barfly (1987)
Clueless (1995)
etc.
OK I've tested these in Open Office with the following results:-
=IF(ISERROR(MATCH(B1,$A$1:$A$8000,0))=1,B1,"")
Gives Error 508 because the commas need changing to semicolons.
=IF(ISERROR(MATCH(B1;$A$1:$A$8000;0))=1;B1;"")
is fine.
=IFERROR(MATCH(B1;$A$1:$A$8000;0);"")
Gives #Name? because IFERROR isn't recognised.
=IFNA(VLOOKUP($B1;$A$1:$A$8000;1;0);"")
Gives #Name? because IFNA isn't recognised.
=IF(ISNA(VLOOKUP($B1;$A$1:$A$8000;1;0));"";VLOOKUP($B1;$A$1:$A$8000;1;0))
Works but gives the opposite result.
=IF(ISNA(VLOOKUP($B1;$A$1:$A$8000;1;0));B1;"")
would be fine.
=IF(ISNA(VLOOKUP($B1,$A$1:$A$8000,1,0)),"",VLOOKUP($B1,$A$1:$A$8000,1,0))
Commas
=VLOOKUP(B1,$A$1:$A$8000,1,)
Commas
=MATCH($B1;$A$1:$A$999;0)
Works but just gives the position of the match.
Probably the easiest way of doing it is:-
=IF(COUNTIF(A$1:A$8000;B1);"";B1)
Unfortunately it does seem that strings containing brackets give spurious matches in Libre/Open Office. You could get round it with a substitution, I guess
=IF(COUNTIF(SUBSTITUTE(SUBSTITUTE(A$1:A$10;"(";"<");")";">");SUBSTITUTE(SUBSTITUTE(B1;"(";"<");")";">"));"";B1)
entered as an array formula and copied (rather than pulled) down, or of course you could globally edit all the brackets :-(.
Now that I know the root cause of this thanks to @Lyrl, there is a further option of turning off the regular expressions as suggested, or you could escape the brackets:-
=IF(COUNTIF(A$2:A$11;SUBSTITUTE(SUBSTITUTE(B2;"(";"\(");")";"\)"));"";B2)
See documentation on Regex in Open Office here
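Why the brackets matter can be shown with a short Python sketch. Python's re is only a stand-in here (the Open/Libre Office engine may differ in details), and missing_titles is a made-up helper that compares plain strings, which is the behaviour you get once regular expressions are out of the picture.

```python
import re

title = "Alice (1988)"

# Read as a regex, the brackets form a group, so the pattern really
# means "Alice 1988" and no longer matches the title itself.
unescaped_hit = re.fullmatch(title, title) is not None
# Escaping the brackets makes them literal characters again.
escaped_hit = re.fullmatch(re.escape(title), title) is not None
print(unescaped_hit, escaped_hit)  # False True

def missing_titles(mine, theirs):
    # Titles from my list that do not appear in the other list.
    known = set(theirs)
    return [t for t in mine if t not in known]

print(missing_titles(["Alice (1988)", "Barfly (1987)"],
                     ["Barfly (1987)", "Clueless (1995)"]))  # ['Alice (1988)']
```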
This should do it,
=IF(ISNUMBER(MATCH(B1,$A$1:$A$8000,0)),"",B1)
Tested formula
=IF(ISNA(MATCH(B1,$A$1:$A$8000,0))=TRUE(),B1,"")
I have an Excel document I have to process regularly while waiting for my company to build an automated process for this. The issue we recently found is that the formula I'm using can't return anything other than #VALUE! when the FIND function fails to find the text I need.
The formula we currently have is:
=IF(FIND("-",M2,3),RIGHT(M2,2))
The cells this formula checks hold states and provinces, which look like "CA-ON" or "US-NV".
The problem is that regions for the UK aren't filled out as "UK-XX"; the cell holds the actual county instead, for example "Essex" or "Merryside".
What I need the formula to do is, if it can't find the hyphen (-) in the cell, just take whatever value is there and write it in the cell the formula is in.
I should also mention that some of the cells are blank, since this is an optional field. Is there any way to run this formula so that if it doesn't find the "-" it just writes what's there?
What about using mid() to see if the third character is "-"
=IF(MID(A1,3,1)="-",RIGHT(A1,2),A1)
If you really want to use the find() function then:
=IF(ISNUMBER(FIND("-",A1)),RIGHT(A1,2),A1)
The IFERROR function should help here, and there is no longer any reason to use the IF statement. The formula below looks for the hyphen in the first 3 characters and returns everything to the right of it (the length of the string minus the position of the hyphen). The IFERROR catches the cases where there is no hyphen and returns the original cell.
=IFERROR(RIGHT(M2,LEN(M2)-FIND("-",LEFT(M2,3),1)),M2)
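The fallback logic (use what follows the hyphen when one is found in the first three characters, otherwise keep the cell as it is) can be sketched in Python to check the expected results, including blank cells. The function name region_code is made up for the example.

```python
def region_code(value):
    # Look for the hyphen only in the first three characters.
    pos = value[:3].find("-")
    if pos == -1:
        # No hyphen (or a blank cell): return the value unchanged.
        return value
    # Otherwise return everything to the right of the hyphen.
    return value[pos + 1:]

print(region_code("CA-ON"))  # ON
print(region_code("Essex"))  # Essex
print(repr(region_code(""))) # ''
```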