Regular Expressions [duplicate] - excel

How can I extract the fully capitalized words from a string in Excel? As shown in the first image, I used the following formula to extract the CAPITAL / BLOCK LETTER words from a string in a cell, and it works perfectly:
• Formula used in cell B2
=TEXTJOIN(" ",,
FILTERXML("<a><b>"&SUBSTITUTE(A2," ","</b><b>")
&"</b></a>","//b[translate(.,'abcdefghijklmnopqrstuvwxyz',
'ABCDEFGHIJKLMNOPQRSTUVWXYZ')=.]"))
The above formula works perfectly as long as the string contains no numbers, but it doesn't give the proper output when numbers are present (see the image below). Maybe I am missing something. I am using O365.
In the cells with the green background, it should return only the CAPITAL WORDS, but it also returns the numbers. What is the right way to do this? Thank you!
Courtesy: I learnt and used the FILTERXML formula by following JvdV's post, which helped me a lot. Thank you very much for that wonderful piece!

As per the given sample data:
=TEXTJOIN(" ",,FILTERXML("<t><s>"&SUBSTITUTE(A2," ","</s><s>")&"</s></t>","//s[translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '')='']"))
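This translates every uppercase letter in a node to nothing and checks that the node is then empty, which means every character in that word was an uppercase letter. Digits and other characters survive the translation, so words containing them are excluded.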
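For readers who want to see the same test outside Excel, here is a rough Python sketch of the idea. It is not part of the formula; the function name `all_caps_words` and the sample sentence are made up for illustration, and skipping empty words (from double spaces) is an assumption of this sketch.

```python
import string

# Deletion table: maps every uppercase ASCII letter to nothing,
# mirroring translate(., 'ABC...Z', '') in the XPath predicate.
DROP_UPPER = str.maketrans("", "", string.ascii_uppercase)


def all_caps_words(text: str) -> str:
    """Keep only the space-separated words made entirely of A-Z."""
    words = text.split(" ")
    # A word qualifies when nothing is left after deleting A-Z.
    # Empty strings (from double spaces) are skipped: an assumption of this sketch.
    kept = [w for w in words if w and w.translate(DROP_UPPER) == ""]
    return " ".join(kept)


print(all_caps_words("Send the REPORT to HQ by 2023 via DHL123"))  # REPORT HQ
```

"2023" and "DHL123" are dropped because their digits are still there after the uppercase letters are deleted.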

Related

Extract information before several special characters in excel

I have a problem with an excel file.
I am trying to extract information from a column. This information appears randomly, before a ".", "-" or ":". So an example would be:
CELL | EXPECTED RESULT
hi.this is: | hi
maybe I- this works | maybe I
Who is: what. like- | Who is
I am using the formula:
=MID(A1,1,FIND("-",A1,1)-1)
With this formula I get the information I need, but I am not able to add the other characters (".", ":", ...) to it. Another problem is that a single cell can contain several of these characters, and I only want the text before the FIRST one (of any kind) that appears in the cell.
I don't know if somebody can help me here.
Thank you very much in advance!
You can try:
Formula in B1:
=TEXTBEFORE(A1:A3,{".","-",":"})
If you don't yet have access to TEXTBEFORE() then try:
=LEFT(A1,MIN(FIND({".","-",":"},A1&".-:"))-1)
I suppose this is an array-entered formula in versions prior to ms365.
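To spell out what the LEFT/MIN/FIND version does, here is a small Python sketch of the same steps (an illustration only; the function name `text_before_first` is hypothetical). Appending all delimiters to the text guarantees every search succeeds, which is what A1&".-:" achieves in the formula.

```python
def text_before_first(text: str, delimiters=(".", "-", ":")) -> str:
    """Return the part of text before the earliest of the given delimiters."""
    # Append every delimiter so each search finds something,
    # the same trick as A1&".-:" in the formula.
    padded = text + "".join(delimiters)
    # Earliest position of any delimiter (0-based here, 1-based in Excel).
    cut = min(padded.find(d) for d in delimiters)
    return text[:cut]


for cell in ["hi.this is:", "maybe I- this works", "Who is: what. like-"]:
    print(repr(text_before_first(cell)))
```

In this sketch, a cell without any delimiter comes back unchanged, because the first match is then in the appended part.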

Excel - Can't Extract Partial String via known formulas

I know this has been beaten to death but I cannot get mine to work for the below example:
%B1234567^SMITH/MIKE^ABC123DEF456?;1234567=0111000?
A1 contains the above text data and I am trying to copy the string between "%B" and the first "^".
I tried:
=mid(left(A1,find("%B",A1)-1),find("^",A1)+1,len(A1))
But no data appears in B1 (where the formula is placed).
Any suggestions?
Thanks, Brendan
You could use:
Formula in B1:
=MID(A1,FIND("%B",A1)+2,FIND("^",A1,FIND("%B",A1))-FIND("%B",A1)-2)
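The position arithmetic in that MID/FIND formula can be easier to follow in a short Python sketch. This is only an illustration of the same logic; the function name `between` and the error handling are assumptions, not something Excel does.

```python
def between(text: str, start: str = "%B", end: str = "^") -> str:
    """Return the text between start and the first end that follows it."""
    i = text.find(start)
    if i == -1:
        raise ValueError("start marker not found")
    begin = i + len(start)      # FIND("%B",A1)+2 in the formula
    j = text.find(end, begin)   # first "^" after the start marker
    if j == -1:
        raise ValueError("end marker not found")
    return text[begin:j]


swipe = "%B1234567^SMITH/MIKE^ABC123DEF456?;1234567=0111000?"
print(between(swipe))  # 1234567
```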
I just tested this; try:
=MID(A1,FIND("B",A1,1)+1,7)*1
You can then incorporate the extra code to find the caret if needed; I assumed 7 digits or characters. Multiplying by 1 makes Excel treat the result as a number.
Just leave the % out of the search string.
This worked for me:
=MID(A1,FIND("%B",A1)+2,FIND("^",A1)-FIND("%B",A1)-2)

Formula to extract numbers from a text string

How could I extract only the numbers from a text string in Excel or Google Sheets? For example:
A1 - a1b23eg67
A2 - 15dgrgr156
Result desired is
B1 - 12367
B2 - 15156
You can do it with capture groups in Google Sheets
=REGEXREPLACE(A1,"(\d)|.","$1")
Anything which matches the contents of the brackets (a digit) will be copied to the output, anything else replaced by an empty string.
Please see @Max Makhrov's answer to this question
or
=REGEXREPLACE(A1,"[^\d]","")
to remove anything which isn't a digit.
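If you want to try both patterns outside Sheets, here is a quick Python sketch using the standard `re` module. The function names are made up for illustration, and Python's regex engine is not the one Sheets uses, although these two simple patterns behave the same way on the sample data.

```python
import re


def digits_via_capture(text: str) -> str:
    """Capture-group approach: keep group 1 (a digit), drop everything else."""
    # The callback returns the captured digit, or "" when the "." branch matched.
    return re.sub(r"(\d)|.", lambda m: m.group(1) or "", text)


def digits_via_negation(text: str) -> str:
    """Negated-class approach: delete every character that is not a digit."""
    return re.sub(r"[^\d]", "", text)


for cell in ["a1b23eg67", "15dgrgr156"]:
    print(digits_via_capture(cell), digits_via_negation(cell))
```

Both functions return the digits as text, as the Sheets formulas do.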
Because you asked for Excel also,
If you have a subscription to office 365 Excel then you can use this array formula:
=--TEXTJOIN("",TRUE,IF(ISNUMBER(--MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)),MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1),""))
Being an array formula it needs to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode. If done correctly then Excel will put {} around the formula.
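As a rough picture of what that array formula does, here is the same character-by-character walk in Python. It is a sketch only; `digits_as_number` is a hypothetical name, and the final `int()` plays the role of the leading `--`.

```python
def digits_as_number(text: str) -> int:
    """Walk the string one character at a time and keep only the digits."""
    # MID(A1, n, 1) for n = 1..LEN(A1) corresponds to iterating over text.
    kept = [ch for ch in text if ch in "0123456789"]
    # TEXTJOIN("", TRUE, ...) joins the kept characters; the leading --
    # converts the joined text to a number, as int() does here.
    return int("".join(kept))


print(digits_as_number("a1b23eg67"))   # 12367
print(digits_as_number("15dgrgr156"))  # 15156
```

If the text contains no digits at all, `int("")` raises a ValueError, much as the formula would return an error for such a cell.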
I would imagine there is a way to pull this off with =RegexExtract but I can't figure out how to get it to repeat the search after the first hit. Often with these regex function implementations there is a third parameter to repeat, but it doesn't look like google implemented it.
At any rate, the following formula will do the trick. It's just a little roundabout:
=concatenate(SPLIT( LOWER(A1) , "abcdefghijklmnopqrstuvwxyz" ))
This is converting the string to lower case, then splitting the string using any letter of the alphabet. This will return an array of the numbers left over, which we concatenate back together.
Update, switched over to =REGEXREPLACE() instead of extract...:
=regexreplace(A1, "[a-zA-Z]", "")
That's a much cleaner and more obvious way of doing it than that concatenate(split()) nonsense.

Removing parts of a cell

In Excel, how do I write a formula that will partially delete a cell's content (from a certain point onwards)?
For example, if A1 is "23432 Vol 23432", I want B1 to just be "23432 " (everything from Vol onwards is removed). Thanks.
You cannot delete cells with formulas in Excel.
You can modify the content of a cell by using formulas. You may use LEFT(), RIGHT(), MID() and other similar string processing functions.
Is there any rule about the number? If, for example, the number is always 5 digits long, you can return "23432" out of "23432 Vol 23432" by typing =LEFT(A1;5) (the semicolon is the argument separator in my locale; English-locale Excel uses a comma).
You might also want to look for the space. I think the English equivalent of the German FINDEN function is FIND(keyword;text;[first charindex]). If splitting by space, you find the number with =LEFT(A1;FIND(" ";A1))
Please post detailed information about your problem if you need further assistance.
EDIT: You may also use VBA if your problem needs a custom formula or custom actions carried out on a cell.
EDIT2, SOLUTION:
=LEFT(A1;FIND(" Vol";A1))
is the solution to your problem if "Vol" and everything after it must be removed in every case, without any condition. If there is a condition attached, you can nest this expression (without the '=') in an IF() formula.
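For comparison, the same cut can be sketched in Python. This is only an illustration; `keep_before_marker` is a hypothetical name, and returning the text unchanged when the marker is missing is an assumption of the sketch (the Excel formula would return an error instead).

```python
def keep_before_marker(text: str, marker: str = " Vol") -> str:
    """Keep everything up to and including the first character of marker."""
    pos = text.find(marker)
    if pos == -1:
        # Assumption: leave the text as-is when the marker is absent.
        return text
    # FIND is 1-based and LEFT keeps that many characters, so the leading
    # space of " Vol" stays in the result; pos + 1 mirrors that here.
    return text[: pos + 1]


print(repr(keep_before_marker("23432 Vol 23432")))  # '23432 '
```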
Hope that helped you.
Best regards
