Extract Multiple Sub-strings Between Brackets/Parentheses in Excel - excel

I have various strings (formatted as text) that have a similar syntax in column F. Eg:
1t ttn TEST FAILED (9-5 passOne, 21-7 & 877-12 passTwo)
I want to extract the numbers within the brackets into separate cells. There may be anywhere between 1 and 5 different numbers.
The numbers will always contain a dash (-) somewhere in the middle and may have up to 3 digits either side of the dash.
Is this possible? I've been through previous questions and can't find anything that answers this.
Edit to show entered JvdV formula:

With Excel 2016:
Formula in B1:
=SUBSTITUTE(FILTERXML("<t><s>`"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(REPLACE($A1,1,FIND("(",$A1),""),")",""),"&","&amp;")," ","</s><s>`")&"</s></t>","//s[contains(., '-')][translate(., '`-','')*0=0]["&COLUMN(A1)&"]"),"`","")
Confirm with Ctrl+Shift+Enter.
Drag right, and down for all lines of data.
This assumes:
The substring of interest (between parentheses) is always at the end.
Numbers with a hyphen as a delimiter may occur outside this substring and are not wanted in our results.
The trick here is to first sanitize the string: find the opening parenthesis and remove everything up to and including it using REPLACE(), then remove the closing parenthesis with SUBSTITUTE(). The remainder can be split on spaces using FILTERXML(). To prevent errors and unwanted results, I substituted the ampersand and prefixed each substring with a grave accent (`) to prevent Excel from recognizing these substrings as dates. For more insight on the XPath used and the workings of FILTERXML(), I'd like to refer you to this.
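For readers who want to check the expected results outside Excel, here is a minimal Python sketch of the same idea. It is not the formula above, just an illustration under the same assumptions: the parenthesized part comes last, and each wanted token has up to three digits on either side of a dash. The function name dash_numbers_in_parens is made up for this example.

```python
import re

def dash_numbers_in_parens(text):
    """Return the digit-dash-digit tokens that follow the first '('."""
    start = text.find("(")
    if start == -1:
        return []
    # Mirror the formula: drop everything up to "(" and remove ")".
    inner = text[start + 1:].replace(")", "")
    # Up to three digits on either side of the dash, as stated in the question.
    return re.findall(r"\b\d{1,3}-\d{1,3}\b", inner)

sample = "1t ttn TEST FAILED (9-5 passOne, 21-7 & 877-12 passTwo)"
print(dash_numbers_in_parens(sample))  # ['9-5', '21-7', '877-12']
```

Each list element corresponds to one cell when the formula is dragged to the right.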

Related

Divide text in the column into two columns (text and numbers) - Google sheet or Excel

I have a document in google sheets and the column consists of the name and version, like NLog.Config.4.3.0, NLog.Config.4.4.9 and so on.
See the image below for other examples.
I need to divide this into two columns - name and version, but I'm not familiar enough with regular expressions to extract this info.
I can use excel and then import it to the Google doc, it doesn't matter for me how to do that.
You can try something like this:
Suppose your string is in A1. In B1, enter:
=LEFT(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))-1)
and in C1:
=RIGHT(A1,LEN(A1)-MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))+1)
B1 returns everything before the first digit, so it keeps the dot that precedes the version. With the padding, a cell without any digits returns the whole string in B1 and an empty string in C1. If you still want a fallback for cells that produce an error, you can wrap the formula in IFERROR like this:
=IFERROR(LEFT(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))-1),A1)
Note: A1&"0123456789" is a 'trick' to keep SEARCH from returning an error, since it looks for every digit in the array and some may be missing from the string. We only need the position of the first digit, hence the MIN().
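The padding trick from the note can be tried outside the spreadsheet too. Below is a rough Python sketch (the function name split_at_first_digit is hypothetical) that appends all ten digits before searching, takes the smallest position, and cuts the string there. Python positions are 0-based, unlike SEARCH.

```python
def split_at_first_digit(text):
    """Split text into (before_first_digit, from_first_digit_onward)."""
    # Appending every digit guarantees each search succeeds,
    # just like A1&"0123456789" in the formula.
    padded = text + "0123456789"
    first = min(padded.find(d) for d in "0123456789")  # 0-based index
    return text[:first], text[first:]

print(split_at_first_digit("NLog.Config.4.3.0"))  # ('NLog.Config.', '4.3.0')
print(split_at_first_digit("NoVersion"))          # ('NoVersion', '')
```

As with the formulas, the name part keeps the dot in front of the version, and a name that itself contains a digit would be cut too early.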
Supposing that your raw data were in A2:A, place this in B2:
=ArrayFormula(IFERROR(REGEXEXTRACT(A2:A,"(\D+)\.(.+)"),A2:A))
The regular expression reads "Extract any number of non-digits up to but not including a period as group one, and everything remaining into group two." (In other words, "As soon as you run into a digit after a period, start group two.")
The IFERROR clause means, "If this pattern can't be found, just return the original cell data."
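To see how the two groups behave, the same pattern can be run in Python. This is only a sketch: Google Sheets uses the RE2 engine while Python uses its own re module, but for this simple pattern I'd expect the same matches. The helper name name_and_version is invented for the example.

```python
import re

PATTERN = re.compile(r"(\D+)\.(.+)")

def name_and_version(text):
    """Return (name, version), or (text,) when the pattern is not found."""
    match = PATTERN.search(text)
    if match is None:
        # Same fallback as the IFERROR clause: keep the original data.
        return (text,)
    return match.groups()

print(name_and_version("NLog.Config.4.4.9"))  # ('NLog.Config', '4.4.9')
print(name_and_version("NoVersion"))          # ('NoVersion',)
```

Group one is greedy, so it first swallows the last dot before the version and then gives it back so that \. can match.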
Assuming your content is in column A (Google Sheets), try this arrayformula in any cell other than column A:
=arrayformula(iferror(split(REGEXREPLACE($A:$A,"(\.)(\d+.+$)",char(6655)&"$2"),char(6655)),))
There are two regex groups denoted in ():
(\.) and (\d+.+$).
The first group matches a dot; it's escaped as \. because an unescaped . matches any character. The second group matches one or more digits (\d+), followed by one or more of any character (.+), up to the end of the string ($).
The replacement is char(6655) (wouldn't usually be found in your dataset), and the contents of group two $2.
Then the split function divides the text into two columns by the char(6655) character.
iferror returns nothing if nothing is split.
The arrayformula works down the sheet.
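The replace-then-split idea can be sketched in Python as well, assuming the same sentinel character. The name split_name_version is hypothetical, and Python's re module stands in for the Sheets regex engine here.

```python
import re

SENTINEL = chr(6655)  # unlikely to appear in real data, as in the formula

def split_name_version(text):
    """Swap the dot before the version for a sentinel, then split on it."""
    marked = re.sub(r"(\.)(\d+.+$)", SENTINEL + r"\2", text)
    return marked.split(SENTINEL)

print(split_name_version("NLog.Config.4.3.0"))  # ['NLog.Config', '4.3.0']
print(split_name_version("NoVersion"))          # ['NoVersion']
```

Because \d+.+ needs at least one character after the digits, a one-character version such as Foo.4 would not be split by this pattern.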

Excel formula to find text and extract a specific length of characters after it

I'm trying to extract a SN number from multiple types of Chromebooks with different barcode outputs. Below is a photo example.
Is it possible to make one formula to extract the serial number from all barcode types?
Here is my request. I'm wondering if the "=IF" command would work for my needs:
First search for "5CD" OR "5CG" and if its found, extract it with the trailing 7 characters. (10 total)
If "5CD" OR "5CG" are not found, search for P20 and extract it with the trailing 5 characters. (8 total)
If neither are found, keep the cell empty.
Here is a working formula but it only finds one character string.
=IFERROR(TRIM(MID(SUBSTITUTE(A2,"",REPT("",99)),MAX(1,SEARCH("5CG",SUBSTITUTE(A2,"",REPT("",99)))-0),10)),"")
I know this formula could end up very long. Thank you for any suggestions. (I'd like to not have to use multiple columns with multiple formulas if possible)
Use:
=MID(A1,MIN(FIND({"5CD","5CG","P20"},A1&"5CD5CGP20")),IF(ISNUMBER(FIND("P20",A1)),8,10))
Depending on one's version this may need to be confirmed with Ctrl-Shift-Enter instead of Enter.
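The rules from the question (5CD or 5CG first, then P20, otherwise empty) can also be written out step by step. Here is a small Python sketch of those rules, not a translation of the formula above. The function name and the sample barcodes in the usage lines are made up.

```python
def extract_serial(barcode):
    """Return the serial number per the question's rules, or '' if none."""
    # Each rule: (prefixes to look for, total length including the prefix).
    rules = ((("5CD", "5CG"), 10), (("P20",), 8))
    for prefixes, total_len in rules:
        hits = [barcode.find(p) for p in prefixes if p in barcode]
        if hits:
            start = min(hits)
            return barcode[start:start + total_len]
    return ""

print(extract_serial("1S5CD1234ABCxyz"))  # 5CD1234ABC
print(extract_serial("ZZP2012345QQ"))     # P2012345
```

Checking the 5CD/5CG rule first means a P20 elsewhere in the same barcode cannot change the result.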

Formula to extract numbers from a text string

How could I extract only the numbers from a text string in Excel or Google Sheets? For example:
A1 - a1b23eg67
A2 - 15dgrgr156
Result desired is
B1 - 12367
B2 - 15156
You can do it with capture groups in Google Sheets
=REGEXREPLACE(A1,"(\d)|.","$1")
Anything which matches the contents of the brackets (a digit) will be copied to the output, anything else replaced by an empty string.
Please see @Max Makhrov's answer to this question
or
=regexreplace(A1,"[^\d]","")
to remove anything which isn't a digit.
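Both patterns can be checked quickly in Python, which is handy for testing against sample data before pasting the formula into a sheet. This is a sketch with Python's re module standing in for the Sheets functions. The function names are made up.

```python
import re

def digits_via_group(text):
    # Counterpart of the capture-group formula: keep group 1, drop the rest.
    return re.sub(r"(\d)|.", r"\1", text)

def digits_only(text):
    # Counterpart of the second formula: delete every non-digit.
    return re.sub(r"\D", "", text)

print(digits_via_group("a1b23eg67"))  # 12367
print(digits_only("15dgrgr156"))      # 15156
```

Both return text, just like REGEXREPLACE, so a further conversion is needed if a real number is wanted.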
Because you asked for Excel also,
If you have an Office 365 subscription, you can use this array formula in Excel:
=--TEXTJOIN("",TRUE,IF(ISNUMBER(--MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)),MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1),""))
Being an array formula it needs to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode. If done correctly then Excel will put {} around the formula.
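The array formula walks through the string one character at a time, keeps the characters that are digits, joins them, and converts the result to a number. A rough Python sketch of that loop follows. The function name is hypothetical, and returning None for a string without digits is my own choice for the example.

```python
def digits_char_by_char(text):
    """Keep each character that is a digit, then convert to a number."""
    # One pass over the characters, like MID(A1, ROW(...), 1).
    kept = [ch for ch in text if ch in "0123456789"]
    if not kept:
        return None  # assumption: no digits means no result
    # int() plays the role of the leading -- in the formula.
    return int("".join(kept))

print(digits_char_by_char("a1b23eg67"))   # 12367
print(digits_char_by_char("15dgrgr156"))  # 15156
```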
I would imagine there is a way to pull this off with =RegexExtract but I can't figure out how to get it to repeat the search after the first hit. Often with these regex function implementations there is a third parameter to repeat, but it doesn't look like google implemented it.
At any rate, the following formula will do the trick. It's just a little roundabout:
=concatenate(SPLIT( LOWER(A1) , "abcdefghijklmnopqrstuvwxyz" ))
This is converting the string to lower case, then splitting the string using any letter of the alphabet. This will return an array of the numbers left over, which we concatenate back together.
Update, switched over to =REGEXREPLACE() instead of extract...:
=regexreplace(A1, "[a-z]", "")
That's a much cleaner and more obvious way of doing it than that concat(split()) nonsense.

Testing if first character of a string is non-numeric in LibreOffice Basic

I have a column of strings mostly composed of numbers. Most of them are 10-digit numbers stored as strings, like 1234567890, but a few exceptions start with a letter, like A1234567890. While looping over that column, I want to check the first character of each string and branch my code if it is a letter. I'm not familiar with LibreOffice Basic or VBA, so any help is appreciated.
Listing 6.14. Display all data in a column in Andrew Pitonyak's Macro Document shows how to loop through all cells in a column.
To find out if the cell's string is numeric, use the IsNumeric function:
If IsNumeric(aCell.String) Then
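The branching logic itself is small. Here it is sketched in Python rather than Basic, purely to illustrate checking only the first character. The function name is made up, and the list stands in for the column of cell strings.

```python
def starts_with_non_digit(value):
    """True when the string is non-empty and its first character is not 0-9."""
    return bool(value) and value[0] not in "0123456789"

for cell_text in ["1234567890", "A1234567890"]:
    if starts_with_non_digit(cell_text):
        print(cell_text, "-> starts with a letter, take the special branch")
    else:
        print(cell_text, "-> plain number string")
```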

How to add cells with mix of 6 to 8 decimal places together

Because of floating point values, I cannot add a string of cells that contain values such as:
0.08178502
0.09262585
0.13261762
0.13016377
0.12302067
0.1136332
0.12176183
0.11430552
0.09971409
0.125285
Even if I try adding just the first two with a SUM formula or AutoSum, Excel spits out an error. I have googled this like crazy and tried changing number formats. Is there a function that would let me add these values?
The spreadsheet is available on my Dropbox.
Those numbers are all preceded by a NBSP (Char Code 160). So, in order to sum them, you have to remove that. Many solutions. Here's one:
=SUMPRODUCT(--SUBSTITUTE(A1:A18,CHAR(160),""))
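The same clean-up can be sketched in Python to show what the formula does to each cell: remove character 160, convert the rest to a number, then add everything up. The function name is hypothetical, and the two sample strings reuse values from the question with a NBSP put in front.

```python
NBSP = chr(160)  # the non-breaking space, CHAR(160) in Excel

def sum_text_numbers(cells):
    """Strip the NBSP from each text value, convert to float, and sum."""
    return sum(float(cell.replace(NBSP, "")) for cell in cells)

cells = [NBSP + "0.08178502", NBSP + "0.09262585"]
print(round(sum_text_numbers(cells), 8))
```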
If a formula like:
=A1+A2+A3+A4+A5+A6+A7+A8+A9+A10
produces:
#VALUE!
then your "numbers" cells contain non-visible characters.
They must be removed before the formula will work.
If the cells contain text strings and not actual values you will need to convert the text to numeric values before performing any calculations. The function "=value(cell)" will bring the numeric value.
e.g.: A1 contains "000.12345678" (or some other non-numeric presentation of numerals)
In cell B1 type: =value(a1)
Cell B1 now operates as the real number 0.12345678
Oddly enough, the issue was that all the numbers were written as 0.xxxxx rather than .xxxxx. I'm sharing that for folks who search and have the same issue.
All I had to do was select that whole row and do a search and replace, changing "0." to just ".", and then my numbers were usable in equations. For some reason, adjusting the formatting, as many search results suggested, wasn't working.

Resources