Trying to extract a string of text pattern from the beginning and the end of a cell in Excel - excel

I have the following data, and the Result column shows what I would like to see:
Data | Result
PN 65011:2020text text text PN 65011:2020 | PN 65011:2020, PN 65011:2020
PN 45014-1:2017text text text text PN 65014-1:2017 PN 8726-1:2017/P11:2020 | PN 45014-1:2017, PN 65014-1:2017, PN 8726-1:2017/P11:2020
PN 6534:2020text text text text | PN 6534:2020
PN 65014-1:2017text text text text PN 65014-1:2017/PC1:2013 | PN 65014-1:2017, PN 65014-1:2017/PC1:2013
PN ESO 67345:2019text text text PN 65018-1:2019/PC2:2020 | PN ESO 67345:2019, PN 65018-1:2019/PC2:2020
PN ESO/EOC 5320:2013text text text PN ESO 27380:2019 PN 65015-1:2020/PC:2021 | PN ESO/EOC 5320:2013, PN ESO 27380:2019, PN 65015-1:2020/PC:2021
I have used ="PN "&TEXTJOIN(", PN ",1,IF(ISNUMBER(SEARCH("/",TRIM(MID(SUBSTITUTE(A2,"PN ",REPT(" ",LEN(A2))),(ROW(INDIRECT("1:"&LEN(A2))))*LEN(A2)-(LEN(A2)-1),LEN(A2))))),TRIM(MID(SUBSTITUTE(A2,"PN ",REPT(" ",LEN(A2))),(ROW(INDIRECT("1:"&LEN(A2))))*LEN(A2)-(LEN(A2)-1),LEN(A2))),LEFT(TRIM(MID(SUBSTITUTE(A2,"PN ",REPT(" ",LEN(A2))),(ROW(INDIRECT("1:"&LEN(A2))))*LEN(A2)-(LEN(A2)-1),LEN(A2))),MIN(IFERROR(FIND({" "},LOWER(TRIM(MID(SUBSTITUTE(A2,"PN ",REPT(" ",LEN(A2))),(ROW(INDIRECT("1:"&LEN(A2))))*LEN(A2)-(LEN(A2)-1),LEN(A2))))),""))-1)))
With this I almost get what I would like to see, except for the last row (PN ESO/EOC 5320:2013), where the numbers are missing: the output stops at PN ESO/EOC. Like this:
Data | Result
PN ESO/EOC 5320:2013text text PN ESO 27380:2019 text PN 65015-1:2020/PC:2021 | PN ESO/EOC, PN ESO
Any ideas on how I can get the entire reference?
Thank you very much in advance.

Here is an example of how you could approach this using Excel O365.
Formula in B2:
=TEXTJOIN(", ",,LET(X,FILTERXML("<t><s>"&SUBSTITUTE(A2,"PN ","</s><s>PN ")&"</s></t>","//s[position() > 1]"),Y,LEFT(X,FIND("|",SUBSTITUTE(X,":","|",LEN(X)-LEN(SUBSTITUTE(X,":",""))))+4),Y))
The idea is to first SUBSTITUTE() every instance of "PN " with a closing and an opening tag, so that the string becomes valid XML. FILTERXML() then returns all the nodes as an array, each obviously still carrying its concatenated "text text text". I therefore used LET() to load the array into a variable and apply some string manipulation to all of its elements.
First I substitute the last occurrence of the colon in each string with a pipe symbol, which we then FIND() to get its position. With those positions we can extract the proper substrings using LEFT(), keeping everything up to four characters past the last colon. Finally, TEXTJOIN() joins the resulting array back together.
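To check the logic outside Excel, the same steps can be sketched in Python. This is only an illustration of the idea, not the formula itself: the function name extract_refs is made up here, and the sketch assumes every reference ends with a colon followed by a four-digit year and that the free text contains no colons of its own.

```python
def extract_refs(cell: str) -> str:
    # Split on "PN " and drop whatever precedes the first reference,
    # mirroring the //s[position() > 1] filter in the formula.
    pieces = ["PN " + part for part in cell.split("PN ")[1:]]
    trimmed = []
    for piece in pieces:
        last_colon = piece.rfind(":")  # 0-based index of the last colon
        # Keep everything up to and including the 4 characters after it
        # (assumption: a four-digit year always follows the last colon).
        trimmed.append(piece[: last_colon + 5])
    return ", ".join(trimmed)


print(extract_refs("PN ESO/EOC 5320:2013text text text PN ESO 27380:2019 PN 65015-1:2020/PC:2021"))
```

Because the cut is made a fixed four characters after the last colon, a reference whose final number is not four digits long would be truncated or padded with stray text.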

If you can accept a VBA solution, regular expressions are well suited to this kind of problem, provided your data all follow the pattern of your examples.
We use a regex that looks for substrings that:
- start with PN
- pick up the following characters until we reach a colon followed by one or more digits
- if a / follows, continue through the next colon-plus-digits group.
To enter this User Defined Function (UDF), press <alt-F11> to open the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and paste the code below into the window that opens.
To use this UDF, enter a formula like =extrPN(cell_Ref) in some cell.
Option Explicit
Function extrPN(S As String) As String
    Dim RE As Object, MC As Object, M As Object
    Const sPat As String = "PN[^:]+:\d+(?:/[^:]+:\d+)?"
    Dim sTemp As String

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .Pattern = sPat
        .IgnoreCase = False
        If .Test(S) = True Then
            Set MC = .Execute(S)
            For Each M In MC
                sTemp = sTemp & ", " & M
            Next M
            extrPN = Mid(sTemp, 3)
        Else
            extrPN = "no match"
        End If
    End With
End Function
Explanation of Regex
extract PN
PN[^:]+:\d+(?:/[^:]+:\d+)?
Options: Case sensitive
Match the character string “PN” literally PN
Match any character that is NOT the colon character [^:]+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the colon character :
Match a single character that is a “digit” \d+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the regular expression below (?:/[^:]+:\d+)?
Between zero and one times, as many times as possible, giving back as needed (greedy) ?
Match the character “/” literally /
Match any character that is NOT the colon character [^:]+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the colon character :
Match a single character that is a “digit” \d+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Created with RegexBuddy
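If you want to try the pattern before pasting the UDF into a workbook, a rough Python equivalent is sketched below. The pattern is the one from the VBA constant sPat; the name extr_pn and the use of Python's re module are just for illustration, and Python's regex flavour is assumed to behave like VBScript's for this particular pattern.

```python
import re

# Same pattern as sPat in the VBA function, matched case-sensitively.
PN_PATTERN = re.compile(r"PN[^:]+:\d+(?:/[^:]+:\d+)?")


def extr_pn(s: str) -> str:
    # findall returns whole matches here because the only group is non-capturing.
    matches = PN_PATTERN.findall(s)
    return ", ".join(matches) if matches else "no match"


print(extr_pn("PN ESO 67345:2019text text text PN 65018-1:2019/PC2:2020"))
```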

Related

Excel Formula To Replicate Text To Column Functionality

I would like a formula in excel that does what Text To Columns does.
For example the following string in A1
" text with a comma, stays in one column",," keep starting blank text",1,2,3,"123"
Would be split into multiple cells like this...
The following LET Function allows you to split the text into columns based on the splitter character (in this instance a comma).
It ignores commas that are between quotes (the Delim argument - which has double quotes in it).
It does this by ensuring there is an even number of quotes before the splitter character.
=LET(
NOTES,"Splits a string but also checks to see if the splitter is inside a delimiter. So will ignore a comma inside quotes.",
RawString,$A1,
Splitter,",",Note2,"This is the character to split the string by",
Delim,"""",Note4,"This is the text delimiter it looks odd but it's just a double quote - change to "" if you don't want text delimitation",
IgnoreBlanks,FALSE,
CleanTextDelims,TRUE,
TrimBlanks,FALSE,
SplitString,Splitter&RawString&Splitter,Note3,"Add the splitter to the start and the end to help create the array of split positions",
StringLength,LEN(SplitString),
Seq,SEQUENCE(1,StringLength),Note5,"Get a sequence from 1 to the length of the split string",
Note6,"The below does the bulk of the work. It works out if we are at an odd or even point in terms of count of text delimiters up to the point in the sequence we are processing.",
Note7,"if we are at an even point and we have a delimiter then make a note of the sequence otherwise put a blank.",
PosArray,IF(Seq=StringLength,Seq,IF(MOD(LEN(LEFT(SplitString,Seq))-LEN(SUBSTITUTE(LEFT(SplitString,Seq),Delim,"")),2)=0,IF(MID(SplitString,Seq,1)=Splitter,Seq,""),"")),
PosArrayClean,FILTER(PosArray,PosArray<>""),Note8,"Clean blanks",
StartArray,FILTER(PosArrayClean,PosArrayClean<>StringLength),
EndArray,FILTER(PosArrayClean,PosArrayClean<>1),
StringArray,MID(SplitString,StartArray+1,EndArray-StartArray-1),
StringArrayB,IF(IgnoreBlanks,FILTER(StringArray,StringArray<>""),StringArray),
StringArrayC,IF(CleanTextDelims,IF(LEFT(StringArrayB,1)=Delim,MID(StringArrayB,2,IF(RIGHT(StringArrayB,1)=Delim,LEN(StringArrayB)-2,LEN(StringArrayB))),StringArrayB),StringArrayB),
IFERROR(IF(TrimBlanks,TRIM(StringArrayC),StringArrayC),"")
)
Breaking down each step in the LET formula:
Supply the raw string (from cell A1 in this case)
Set the splitter character - in this case a comma
Set the text delimiter - in this case a double quote (it looks odd because inside a formula it has to be written as doubled double quotes: Delim,"""")
IgnoreBlanks is an option to exclude blank cells in the output
CleanTextDelims will clean the TextDelimiter (Double quotes) from the start and end of the resultant string
Create a SplitString variable with the split character at the front and back.
Get the length of the string for ease of use
Get a sequence from 1 to the length of the string.
Build PosArray (the splitter position array): the positions of the splitter characters that have an even number of text delimiters to their left in the string.
Clean the blanks to get PosArrayClean
Create a start and an end array (the start array ignores the last item and the end array ignores the first item in PosArrayClean)
Get the array of strings/cells to output.
If the IgnoreBlanks option is set, then ignore blank cells
If the CleanTextDelims option is set then strip off the Text Delim (double quotes) from the start and end of the resultant string.
If the TrimBlanks option is set then trim blank spaces off the start and end of the resulting strings.
Hopefully the notes explain clearly how this works and make it easy to modify.
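For readers who find the parity check easier to follow in procedural form, here is a small Python sketch of the same idea: a splitter only counts when an even number of text delimiters has been seen so far. The function name split_outside_quotes and its parameters are invented for this illustration, and only the CleanTextDelims option is reproduced.

```python
def split_outside_quotes(raw, splitter=",", delim='"', clean_delims=True):
    # Pad with a splitter at both ends, like SplitString in the LET formula.
    padded = splitter + raw + splitter
    last = len(padded) - 1
    positions = []
    delims_seen = 0
    for i, ch in enumerate(padded):
        if ch == delim:
            delims_seen += 1
        # The final position is always recorded, as in PosArray; otherwise a
        # splitter only counts when an even number of delimiters precedes it.
        if i == last or (ch == splitter and delims_seen % 2 == 0):
            positions.append(i)
    # Each field sits between two consecutive recorded positions.
    fields = [padded[a + 1:b] for a, b in zip(positions, positions[1:])]
    if clean_delims and delim:
        cleaned = []
        for field in fields:
            if field.startswith(delim):
                # Drop the leading delimiter, and the trailing one if present.
                if len(field) > 1 and field.endswith(delim):
                    field = field[1:-1]
                else:
                    field = field[1:]
            cleaned.append(field)
        fields = cleaned
    return fields


example = '" text with a comma, stays in one column",," keep starting blank text",1,2,3,"123"'
print(split_outside_quotes(example))
```

Counting delimiters while walking the string does the same job as the LEN/SUBSTITUTE difference in the formula, which recounts them for every position.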
If you want to create a named Lambda, you can paste the following code into the formula of a named range called SplitStringDelim (you can name it whatever you like, of course). NB: you can't have the line separators in this, and I stripped almost all of the notes out of it.
=LAMBDA(StringRaw,SplitChar,DelimChar,IgnoreBlank,CleanTextDelim,TrimBlank, LET( RawString,StringRaw, Splitter,SplitChar, Delim,DelimChar, IgnoreBlanks,IgnoreBlank, CleanTextDelims,CleanTextDelim, TrimBlanks,TrimBlank, SplitString,Splitter&RawString&Splitter, StringLength,LEN(SplitString), Seq,SEQUENCE(1,StringLength), PosArray,IF(Seq=StringLength,Seq,IF(MOD(LEN(LEFT(SplitString,Seq))-LEN(SUBSTITUTE(LEFT(SplitString,Seq),Delim,"")),2)=0,IF(MID(SplitString,Seq,1)=Splitter,Seq,""),"")), PosArrayClean,FILTER(PosArray,PosArray<>""),Note8,"Clean blanks", StartArray,FILTER(PosArrayClean,PosArrayClean<>StringLength), EndArray,FILTER(PosArrayClean,PosArrayClean<>1), StringArray,MID(SplitString,StartArray+1,EndArray-StartArray-1), StringArrayB,IF(IgnoreBlanks,FILTER(StringArray,StringArray<>""),StringArray), StringArrayC,IF(CleanTextDelims,IF(LEFT(StringArrayB,1)=Delim,MID(StringArrayB,2,IF(RIGHT(StringArrayB,1)=Delim,LEN(StringArrayB)-2,LEN(StringArrayB))),StringArrayB),StringArrayB), IFERROR(IF(TrimBlanks,TRIM(StringArrayC),StringArrayC),"")))

Splitting very large string separated with comma and i need to split 50 items only per row

I have a very long string in the 1st row; it contains a lot of comma-separated items, like below
12345,54322,44444,222222222,444444,121,333,44444,........
I need to split this into rows of 50 items each. Let's assume there are 700 comma-separated items: I want to keep only the first 50 in the 1st row, the next 50 in the 2nd row, and so on.
I tried the code below, which seems to split up to 50, but I'm not sure it will keep working going forward, so I need help with this.
OutData = Split(InpData, ",")(50)
MsgBox OutData
There are many ways to do this; one is to replace every nth comma, for example through regular expressions:
Sub Test()
    Dim s As String: s = "1,2,3,4,5,6,7,8,9,10,11"
    Dim n As Long: n = 2    'items per row
    Dim arr() As String

    With CreateObject("vbscript.regexp")
        .Global = True
        'Capture n items, then match the comma that follows them
        .Pattern = "([^,]*(?:,[^,]*){" & n - 1 & "}),"
        'Swap that comma for "|" and split on it
        arr = Split(.Replace(s, "$1|"), "|")
    End With
End Sub
The pattern used means:
( - Open 1st capture group;
[^,]* - Match 0+ (Greedy) characters other than comma;
(?: - Open a nested non-capture group;
,[^,]* - Match a comma and again 0+ characters other than comma;
){1} - Close the non-capture group and match n-1 times (1 time in the given example);
), - Close the capture group and match a literal comma.
Replace every match with the content of the 1st capture group plus a character you know is not in the full string, so that we can split on that character.
You can then do whatever you like with the resulting array; you will probably want to transpose it onto the worksheet.
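The same replace-every-nth-comma trick can be tried in Python. This is a sketch under the same assumption as above, namely that the "|" character never occurs in the data; the helper name chunk_items is made up for the example.

```python
import re


def chunk_items(s, n):
    # Capture n items, then match the comma that follows them.
    pattern = r"([^,]*(?:,[^,]*){%d})," % (n - 1)
    # Swap that comma for "|" and split on it (assumes "|" is not in s).
    return re.sub(pattern, r"\1|", s).split("|")


print(chunk_items("1,2,3,4,5,6,7,8,9,10,11", 2))
# ['1,2', '3,4', '5,6', '7,8', '9,10', '11']
```

With n set to 50, each element of the result holds up to 50 comma-separated items, and the last element holds whatever is left over.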

Find count of multiline in an Excel cell starting with delimiter -

I am looking for a formula that counts how many lines in a multiline cell begin with - (hyphen).
For example, if the cell contains
how are you keeping up
-I am well and need toy
-"You" are asking wrong question
<you are wrong>
-why should i reply you
the count of qualifying lines is 3.
Can anyone help me out here, please?
If your first line never starts with a hyphen, or at least should not count towards the total, then try:
Formula in B1:
=(LEN(A1)-LEN(SUBSTITUTE(A1,CHAR(10)&"-","")))/2
If your first line can also start with a hyphen and should therefore count towards the total, try:
=(LEN(CHAR(10)&A1)-LEN(SUBSTITUTE(CHAR(10)&A1,CHAR(10)&"-","")))/2
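The arithmetic behind both formulas is a length difference: every line feed followed by a hyphen is a two-character marker, so removing all of them shortens the text by twice the number of matching lines. A short Python sketch of that reasoning follows; the function name and the include_first switch are made up for the illustration.

```python
def count_hyphen_lines(text, include_first=False):
    lf = "\n"  # CHAR(10) in the formula
    if include_first:
        # Prepend a line feed so a hyphen on the first line also forms a marker.
        text = lf + text
    marker = lf + "-"
    # Each removed marker shortens the text by len(marker) == 2 characters.
    return (len(text) - len(text.replace(marker, ""))) // len(marker)


cell = "\n".join([
    "how are you keeping up",
    "-I am well and need toy",
    '-"You" are asking wrong question',
    "<you are wrong>",
    "-why should i reply you",
])
print(count_hyphen_lines(cell))  # 3
```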
Here is a VBA solution:
Function CountLines(text As String, Optional flag As String = "") As Long
    'counts all lines in text which start with flag
    Dim i As Long, count As Long
    Dim lines As Variant

    lines = Split(text, vbLf)
    For i = LBound(lines) To UBound(lines)
        If Mid(lines(i), 1, Len(flag)) = flag Then
            count = count + 1
        End If
    Next i
    CountLines = count
End Function
If this is in a standard code module, the example text is in A1, and you enter the formula =CountLines(A1,"-") in B1, it will evaluate to 3.
If you want to include the first line in the potential count, then, in Windows Excel 2013+, you can try:
=COUNTA(FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,">","&gt;"),"<","&lt;"),"""","&quot;"),CHAR(10),"</s><s>") & "</s></t>","//s[starts-with(text(),'-')]"))
Replace illegal xml characters ",<, and >
Create an XML by splitting into nodes based on the LF character
Use xpath //s[starts-with(text(),'-')] to return only those nodes that start with a hyphen.
COUNTA to return the count of those nodes

Excel replace characters in string before and after 'x'

Hello, I have a column with strings (names of products) in it.
These are formatted as Name LengthxWidth, for example Green box 20x30. I need to swap the 20 and the 30 in this example so that I get Green box 30x20. Any ideas how I can achieve this?
Thanks
Here are both a formula solution and a VBA solution using regular expressions:
Formula
=LEFT(A1,FIND(TRIM(RIGHT(SUBSTITUTE(A1," ",REPT(" ",99)),99)),A1)-1)&
MID(TRIM(RIGHT(SUBSTITUTE(A1," ",REPT(" ",99)),99)),SEARCH("x",TRIM(RIGHT(SUBSTITUTE(A1," ",REPT(" ",99)),99)))+1,99)&
"x"&
LEFT(TRIM(RIGHT(SUBSTITUTE(A1," ",REPT(" ",99)),99)),SEARCH("x",TRIM(RIGHT(SUBSTITUTE(A1," ",REPT(" ",99)),99)))-1)
UDF
Option Explicit
Function RevWL(S As String)
    Dim RE As Object
    'The decimal point is escaped (\.) so that it matches only a literal dot
    Const sPat As String = "(\d+\.?\d*)x(\d+\.?\d*)"
    'If L or W might start with a decimal point, and not a digit,
    'then change sPat to: (\d*\.?\d+)x(\d*\.?\d+)

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .IgnoreCase = True
        .Pattern = sPat
        RevWL = .Replace(S, "$2x$1")
    End With
End Function
Here is an example of the kinds of data I tested with:
The formula works by looking at the last space-separated substring, which would be LxW, then reversing the portions after and before the x, then concatenating everything back together.
The regex pattern captures the two numbers (integers or decimals, so long as they start with a digit, although that could be changed if needed) and swaps them.
Here is a more detailed explanation of the regex (and the replacement string):
(\d+\.?\d*)x(\d+\.?\d*)
Options: Case insensitive; ^$ don’t match at line breaks
Match the regex below and capture its match into backreference number 1 (\d+\.?\d*)
Match a single character that is a “digit” \d+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the character “.” literally \.?
Between zero and one times, as many times as possible, giving back as needed (greedy) ?
Match a single character that is a “digit” \d*
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Match the character “x” literally x
Match the regex below and capture its match into backreference number 2 (\d+\.?\d*)
Match a single character that is a “digit” \d+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the character “.” literally \.?
Between zero and one times, as many times as possible, giving back as needed (greedy) ?
Match a single character that is a “digit” \d*
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
$2x$1
Insert the text that was last matched by capturing group number 2 $2
Insert the character “x” literally x
Insert the text that was last matched by capturing group number 1 $1
Created with RegexBuddy
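A quick way to experiment with the swap is a Python sketch of the same replacement. Note one deliberate choice in this sketch: the dot is escaped (\.) so that it matches only a literal decimal point, which keeps a digit earlier in the product name from being pulled into the first number. The function name rev_wl is just an illustrative stand-in for the UDF.

```python
import re

# Two numbers (integer or decimal) separated by "x"; the dot is escaped so it
# only matches a literal decimal point.
LXW = re.compile(r"(\d+\.?\d*)x(\d+\.?\d*)", re.IGNORECASE)


def rev_wl(s: str) -> str:
    # \2 and \1 play the role of $2 and $1 in the VBScript replacement string.
    return LXW.sub(r"\2x\1", s)


print(rev_wl("Green box 20x30"))  # Green box 30x20
print(rev_wl("Box2 20x30"))       # Box2 30x20
```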
Here is a VBA solution that will work for you:
Option Explicit
Function Switch(r As Range) As String
    Dim measurement As String
    Dim firstPart As String
    Dim secondPart As String

    measurement = Right(r, Len(r) - InStrRev(r, " "))
    secondPart = Right(measurement, Len(measurement) - InStr(1, measurement, "x"))
    firstPart = Left(measurement, InStr(1, measurement, "x") - 1)
    Switch = Left(r, InStrRev(r, " ") - 1) & " " & secondPart & "x" & firstPart
End Function
You can paste this in a regular module in the VBE (Visual Basic Editor) and use it as a regular function/formula. If your value is in cell A1 then type =Switch(A1) in cell B1. Hope it helps!
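The string slicing in that function boils down to two splits: one at the last space and one at the first x of the final token. Here is a minimal Python sketch of that logic, assuming (as the VBA does) that the name contains at least one space and that the last token contains a lowercase x; the function name is invented for the example.

```python
def swap_last_token(name: str) -> str:
    # Split at the last space: everything before it is the product name.
    head, _, measurement = name.rpartition(" ")
    # Split the measurement at the first "x".
    first, _, second = measurement.partition("x")
    return f"{head} {second}x{first}"


print(swap_last_token("Green box 20x30"))  # Green box 30x20
```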
OK, so it really is easier to use VBA, but if you want to use only formulas, you can use some helper columns to split your text and then concatenate the cells.
Here is a little example:
Of course, B1-4 are optional; they are there only to make things more readable, and you can use a single formula instead:
=CONCATENATE(LEFT(A1, SEARCH(" ",A1,1)-1)," ",RIGHT(RIGHT(A1,LEN(A1)-SEARCH(" ",A1,1)),LEN(RIGHT(A1,LEN(A1)-SEARCH(" ",A1,1)))-SEARCH("x",RIGHT(A1,LEN(A1)-SEARCH(" ",A1,1)),1)),"x",LEFT(RIGHT(A1,LEN(A1)-SEARCH(" ",A1,1)), SEARCH("x",RIGHT(A1,LEN(A1)-SEARCH(" ",A1,1)),1)-1))
If you have several spaces in your names, you can use this formula, which searches for the last space in the text:
=CONCATENATE(LEFT(A1, SEARCH("^^",SUBSTITUTE(A1," ","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))-1)," ",RIGHT(RIGHT(A1,LEN(A1)-SEARCH("^^",SUBSTITUTE(A1," ","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))),LEN(RIGHT(A1,LEN(A1)-SEARCH("^^",SUBSTITUTE(A1," ","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))))-SEARCH("x",RIGHT(A1,LEN(A1)-SEARCH("^^",SUBSTITUTE(A1," ","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))),1)),"x",LEFT(RIGHT(A1,LEN(A1)-SEARCH("^^",SUBSTITUTE(A1," ","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))), SEARCH("x",RIGHT(A1,LEN(A1)-SEARCH("^^",SUBSTITUTE(A1," ","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))),1)-1))

Remove right half of string after a certain number of spaces

I'm writing a macro in Excel that is reading some text from a single cell.
ProductID = rw.Cells(1, 1).Text
However the cell may contain some buffer characters, specifically 5 consecutive space characters. I am trying to remove all the characters (length and actual text may vary) after the 5 spaces (including the spaces).
So if the string was:
MyProduct123     removethis
The desired string would be
MyProduct123
It seems I can remove the 5 spaces with
Replace(MyProductStr, "     ", "")
but how can I get the position of the right side string or the text to remove that?
You can do this using InStr to find the starting position of the five spaces, and then Left to take just the part of the string before that:
Dim pos As Long
pos = InStr(ProductID, Space(5))    'Space(5) returns a string of five spaces
If pos > 0 Then
    ProductID = Left(ProductID, pos - 1)
End If
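The find-then-truncate logic is easy to check outside Excel as well. Below is a small Python sketch of it; the function name strip_after_buffer and the buffer parameter are made up for the illustration.

```python
def strip_after_buffer(product_id: str, buffer: str = " " * 5) -> str:
    # find returns -1 when the buffer is absent (InStr returns 0 in VBA).
    pos = product_id.find(buffer)
    return product_id[:pos] if pos >= 0 else product_id


print(strip_after_buffer("MyProduct123" + " " * 5 + "removethis"))  # MyProduct123
```

A string without the five-space buffer is returned unchanged, which matches the If pos > 0 guard.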

Resources