Extract string between spaces using location based on instr - excel

I'm stuck and need some assistance, please. I have a string that comes from a free-text field filled in by the end user, and I need to extract a specific piece of text from within it.
The text is in an array. I have confirmed the location of the text I need with InStr, and I know it is typically surrounded by at least one space on either side.
I'm looking for a way to extract it based on that location using InStr and Split, but I'm not sure how to nest these. There could be any number of spaces in the field before or after the string I need, because some people like excess spaces. The string length is typically 12 BUT could be more or less, because it IS a free-text field.
I'm open to any solution that gets the string containing "PO" extracted.
Example String: "V000012345 SAPO22-12345 additional information blah blah"
If InStr(1, Arr2(j, 10), "PO", 1) > 0 Then
    Arr3(i, 18) = Split(Arr2(j, 10), " ")(??)
End If

You may try to Filter() the array after Split(). Alternatively, use a regular expression:
Sub Test()
    Dim str As String: str = "V000012345 SAPO22-12345 additional information blah blah"
    'Option 1: Via Filter() and Split():
    Debug.Print Filter(Split(str), "PO")(0)
    'Option 2: Via Regular Expressions:
    With CreateObject("vbscript.regexp")
        .Pattern = ".*?\s?(\S*PO\S*).*"
        Debug.Print .Replace(str, "$1")
    End With
End Sub
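Both options are case-sensitive and return the first match. Note that when nothing contains "PO", Filter(...)(0) raises a "Subscript out of range" error, while the regex Replace simply returns the input string unchanged.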
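Because indexing the Filter() result fails when there is no match, and the question's InStr call uses a compare argument of 1 (text compare), a guarded, case-insensitive wrapper may be useful. The following is only a minimal sketch; FirstTokenContaining and its parameter names are hypothetical and not part of the original answer:

```vba
' Hypothetical helper: returns the first space-delimited token of txt
' that contains needle, or "" when no token matches.
Function FirstTokenContaining(ByVal txt As String, ByVal needle As String) As String
    Dim hits As Variant
    If Len(txt) = 0 Then Exit Function
    ' Include:=True keeps matching tokens; vbTextCompare ignores case,
    ' mirroring the compare argument of 1 in the question's InStr call.
    hits = Filter(Split(txt, " "), needle, True, vbTextCompare)
    ' Filter returns an empty array (UBound = -1) when nothing matches.
    If UBound(hits) >= 0 Then FirstTokenContaining = hits(0)
End Function
```

In the question's loop it could be called as Arr3(i, 18) = FirstTokenContaining(Arr2(j, 10), "PO"). Keep in mind that a case-insensitive search also matches ordinary words containing "po", so the binary compare may still be the better choice for some data.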

This would give you the first element of a SPLIT, that contains "PO":
PONumber = Split(arr2(j, 10), " ")(Len(Left(arr2(j, 10), InStr(1, arr2(j, 10), "PO"))) - Len(Replace(Left(arr2(j, 10), InStr(1, arr2(j, 10), "PO")), " ", "")))
This works by counting the number of spaces before the PO and using that as the index of the SPLIT.
I concede however, the FILTER function offered by JvdV saves you all this hassle - I've not seen it used that way before and it's very efficient.
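For readability, the one-liner above can be unpacked into named steps. This is a sketch of the same space-counting idea, with an added guard for the case where the search text is absent; TokenAtMatch is a hypothetical name, and it assumes the search text does not itself start with a space:

```vba
' Hypothetical helper: returns the space-delimited token in which needle first occurs.
Function TokenAtMatch(ByVal txt As String, ByVal needle As String) As String
    Dim pos As Long, prefix As String, idx As Long
    pos = InStr(1, txt, needle, vbTextCompare)
    If pos = 0 Then Exit Function          ' needle not found: return ""
    prefix = Left$(txt, pos)               ' everything up to the first character of the match
    ' Every space before the match moves it one element further along the Split array,
    ' and this holds for runs of spaces too, because Split yields an empty element for each.
    idx = Len(prefix) - Len(Replace(prefix, " ", ""))
    TokenAtMatch = Split(txt, " ")(idx)
End Function
```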

Related

Extract only a specific text in excel in a machine generated chat transcript

I have machine-generated chat data exported to Excel as below:
User: I need help in cancelling my order
Agent: Sure I can assist you on that
User: cool
User: Also, I need your support in ordering some other item
User: I don't know how to do that
Agent: Sure, I'll help with all your queries
I'm trying to extract only the user text from the above conversation in each cell into a separate cell in Excel.
I have tried every method I know of, but have been unable to do it. Please help me achieve this.
Sample output:
I want to extract only the messages typed by the user into a different cell, like below:
User: I need help in cancelling my order
User: cool
User: Also, I need your support in ordering some other item
User: I don't know how to do that
If VBA UDF is an option then we can use regular expressions like ...
Option Explicit
Function ExtractUserChat(ByVal ChatString As String) As String
    ' ByVal so that the caller's string is not modified by the replacements below
    Dim regex As Object, mc As Object, result As String, i As Long
    Set regex = CreateObject("VBScript.RegExp")
    regex.IgnoreCase = False
    regex.Global = True
    ' Mark the start of every user message with "{"
    regex.Pattern = "User: "
    ChatString = regex.Replace(ChatString, "{<<USER>>: ")
    'Or if you want each submatch on next line then
    'ChatString = regex.Replace(ChatString, "{" & Chr(10) & "USER>>: ")
    ' Mark the start of every agent message with "}" and close a trailing user message
    regex.Pattern = "Agent: "
    ChatString = regex.Replace(ChatString, "}Agent: ") & "}"
    ' Each "{...}" run is now one block of consecutive user messages
    regex.Pattern = "\{[^}]+\}"
    Set mc = regex.Execute(ChatString)
    result = ""
    For i = 0 To mc.Count - 1
        result = result & mc(i)
    Next i
    ' Strip the temporary brace markers
    result = Replace(Replace(result, "{", ""), "}", "")
    ExtractUserChat = result
End Function
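If each message sits on its own line inside the cell (an assumption; the question does not say how the lines are separated), a plain Split on the line feed may be enough and avoids the brace markers. A minimal sketch, with UserLinesOnly as a hypothetical name:

```vba
' Hypothetical alternative: keeps only the lines that begin with "User:".
' Assumes messages are separated by line feeds (Alt+Enter breaks) within the cell.
Function UserLinesOnly(ByVal chat As String) As String
    Dim parts() As String, i As Long, result As String
    ' Normalise CRLF to LF so that no stray carriage returns are left behind
    parts = Split(Replace(chat, vbCr, ""), vbLf)
    For i = LBound(parts) To UBound(parts)
        If Left$(LTrim$(parts(i)), 5) = "User:" Then
            If Len(result) > 0 Then result = result & vbLf
            result = result & Trim$(parts(i))
        End If
    Next i
    UserLinesOnly = result
End Function
```

Used as a worksheet function, =UserLinesOnly(A1) would return the user lines joined by line feeds, so the target cell needs Wrap Text enabled to display them on separate lines.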
Assuming the text string appears in a single cell, FILTERXML might help.
I was able to get the desired result by using FILTERXML together with the LET function:
=IFERROR(LET(
X, FILTERXML("<t><s>" &SUBSTITUTE($A$1, "User:", "</s><s>")&"</s></t>", "//s"),
Y, IFERROR(SEARCH("Agent:", X)-2, LEN(X)),
"User: "&LEFT(X,Y)),
"")
Working step-by-step:
1. FILTERXML("<t><s>" &SUBSTITUTE($A$1, "User:", "</s><s>")&"</s></t>", "//s") creates a dynamic array from the string, starting a new row wherever "User:" appears.
2. Where X represents each value in the dynamic array generated in step 1, IFERROR(SEARCH("Agent:", X)-2, LEN(X)) returns either the position of "Agent:" within X minus 2 (the number of characters to keep before the agent's reply) or, when "Agent:" does not appear, the full length of X.
3. Using the LET function, we call step 1 X and step 2 Y. The output is then the left-most Y characters of X, prepended with the string "User: " (to account for its removal in step 1).
4. We wrap the entire formula in IFERROR as a safeguard, so that an empty string is returned if anything fails.
Note that if you want the output in a single cell rather than multiple, you can simply join the cells together with TEXTJOIN. Something like the below would do the trick.
=TEXTJOIN(" ", 1, IFERROR(LET(
X, FILTERXML("<t><s>" &SUBSTITUTE($A$1, "User:", "</s><s>")&"</s></t>", "//s"),
Y, IFERROR(SEARCH("Agent:", X)-2, LEN(X)),
"User: "&LEFT(X,Y)),
""))

VB.net Trim function

I have an issue with the string Trim method not working the way I expected. I have reviewed the MS Docs and looked at forums, but with no luck. It's probably something simple, or some parameter is missing. This is just a sample.
Please note that I need to pick up the text before and after #, so I was planning to use # as a separator, with TrimStart for one side and TrimEnd for the other. As I understand it, I can't use LastIndexOf or Replace because they have no direction. But perhaps I have misunderstood the MS Docs regarding TrimStart and TrimEnd as well...
Thanks!
Dim str As String = "this is a #string"
Dim ext As String = str.TrimEnd("#")
MsgBox(ext)
ANSWER:
I found a solution to my problem; if you run into something similar, please see below.
First, TrimEnd does NOT scan for the character from the right, as I originally thought; it only removes the character when it sits at the very end of the string. A weak function, I would say :). An IndexOf with a direction argument would be very simple and helpful. My question was answered by Andrew, thanks!
There is another way around it if you want to split a SINGLE string into several parts based on a separator character and populate fields accordingly.
The answer is ArrayList. An ArrayList indexes each string, so you can avoid populating the same field repeatedly. Afterwards you can use Select Case or If to populate the fields accordingly.
Dim arrList As New ArrayList("this is a # string".Split("#"c)) ' Builds the list of split strings
Dim index As Integer = 1 ' Tracks which part we are on: 1st, 2nd, etc.
For Each part In arrList ' Go through the list
    Select Case index ' Identify which field we are populating
        Case 1 ' 1st string: the text left of the separator, arrList(0)
            MsgBox("1 " & arrList(0) & index)
        Case 2 ' 2nd string: the text right of the separator, arrList(1)
            MsgBox("2 " & arrList(1) & index)
    End Select
    index += 1 ' Move on to the next part
Next
Rather than:
Dim str As String = "this is a #string"
Dim ext As String = str.TrimEnd("#")
Try:
Dim str As String = "this is a #string"
Dim ext As String = str.Replace("#", "")
Dim str As String = "this is a #string"
Dim parts = str.Split("#"c)
For Each part In parts
    Console.WriteLine($"|{part}|")
Next
Output:
|this is a |
|string|
Maybe there is a better way, since there are several ways to do the same thing.
The solution I used is the ArrayList loop shown in my answer above.

regex for Excel to remove all but specific symbols after a specific symbol?

I have strings like these, which are addresses, e.g.:
P.O. Box 422, E-commerce park<br>Vredenberg<br><br><br>Curaçao
Adelgatan 21<br>Malmö<br><br>211 22<br>Sweden
Läntinen Pitkäkatu 35 A 15<br>Turku<br><br>20100<br>Finland
I am interested in the country only. The country always comes last, after a <br> tag.
Note that there can be several such tags preceding this last value (e.g. the 1st example string).
Is there a good way to do this with a formula, maybe along these lines:
1. Identify the end of the string
2. Step back one character at a time until reaching a ">" character
3. Cut everything before it (including the ">" encountered)
You don't need RegEx to do this if it's always the last part of the string.
You can get it with the built-in string functions:
Sub Test()
    Dim str As String, str1 As String, str2 As String
    Dim Countries As String
    str = "P.O. Box 422, E-commerce park<br>Vredenberg<br><br><br>Curaçao"
    str1 = "Adelgatan 21<br>Malmö<br><br>211 22<br>Sweden"
    str2 = "Läntinen Pitkäkatu 35 A 15<br>Turku<br><br>20100<br>Finland"
    ' Keep everything after the last "<br>": InStrRev finds where it starts, and the tag is 4 characters long
    Countries = Right(str, Len(str) - InStrRev(str, "<br>") - 3)
    Countries = Countries & vbNewLine & Right(str1, Len(str1) - InStrRev(str1, "<br>") - 3)
    Countries = Countries & vbNewLine & Right(str2, Len(str2) - InStrRev(str2, "<br>") - 3)
    MsgBox Countries
End Sub
Obviously this will need to be adapted to how your data set is stored. You can loop through the data set and apply the same expression to each line.
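To apply the same idea to each row, the expression can be wrapped in a small function. The sketch below is an assumption-laden illustration rather than part of the original answer: TextAfterLast is a hypothetical name, and it additionally returns the input unchanged when the delimiter is absent, a case the Right/InStrRev expression above does not cover.

```vba
' Hypothetical helper: returns the text that follows the last occurrence of delim.
Function TextAfterLast(ByVal txt As String, ByVal delim As String) As String
    Dim pos As Long
    pos = InStrRev(txt, delim)
    If pos = 0 Then
        TextAfterLast = txt                          ' delimiter absent: return the input unchanged
    Else
        TextAfterLast = Mid$(txt, pos + Len(delim))  ' skip past the whole delimiter
    End If
End Function
```

For example, TextAfterLast("Adelgatan 21<br>Malmö<br><br>211 22<br>Sweden", "<br>") returns "Sweden", and as a worksheet function it could be entered as =TextAfterLast(A1,"<br>").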
A formula works too. With the string in A1, enter this in B1:
=TRIM(RIGHT(SUBSTITUTE(A1,"<br>",REPT(" ",100)),100))
Modified using an approach taken from here:
https://exceljet.net/formula/get-last-word

Split, escaping certain splits

I have a cell that contains multiple questions and answers and is organised like a CSV. So to get all these questions and answers separated a simple split using the comma as the delimiter should separate this easily.
Unfortunately, there are some values that use the comma as the decimal separator. Is there a way to escape the split for those occurrences?
Fortunately, my data can be split using ", " as the separator, but if that weren't the case, would there still be a solution besides manually changing the decimal separator from a comma to a dot?
Example:
"Price: 0,09,Quantity: 12,Sold: Yes"
Using Split("Price: 0,09,Quantity: 12,Sold: Yes",",") would yield:
Price: 0
09
Quantity: 12
Sold: Yes
One possibility, given this test data, is to loop through the array after splitting, and whenever there's no : in the string, add this entry to the previous one.
The function that does this might look like this:
Public Function CleanUpSeparator(celldata As String) As String()
    Dim ret() As String
    Dim tmp() As String
    Dim i As Long, j As Long
    tmp = Split(celldata, ",")
    If UBound(tmp) < 0 Then
        ' Empty input: return the empty array unchanged
        CleanUpSeparator = tmp
        Exit Function
    End If
    ReDim ret(0 To UBound(tmp))
    j = -1 ' Index of the last entry written to ret
    For i = 0 To UBound(tmp)
        If InStr(1, tmp(i), ":") > 0 Or j < 0 Then
            ' Contains a colon (or is the very first piece): start a new entry
            j = j + 1
            ret(j) = tmp(i)
        Else
            ' No colon: append to the previous entry and restore the comma
            ret(j) = ret(j) & "," & tmp(i)
        End If
    Next i
    ReDim Preserve ret(0 To j)
    CleanUpSeparator = ret
End Function
Note that there's room for improvement, for instance by making the separator characters : and , into parameters.
I spent the last 24 hours or so puzzling over what I THINK is a completely analogous problem, so I'll share my solution here. Forgive me if I'm wrong about the applicability of my solution to this question. :-)
My Problem: I have a SharePoint list in which teachers (I'm an elementary school technology specialist) enter end-of-year award certificates for me to print. Teachers can enter multiple students' names for a given award, separating each name using a comma. I have a VBA macro in Access that turns each name into a separate record for mail merging. Okay, I lied. That was more of a story. HERE'S the problem: How can teachers add a student name like Hank Williams, Jr. (note the comma) without having the comma cause "Jr." to be interpreted as a separate student in my macro?
The full contents of the (SharePoint exported to Excel) field "Students" are stored within the macro in a variable called strStudentsBeforeSplit, and this string is eventually split with this statement:
strStudents = Split(strStudentsBeforeSplit, ",", -1, vbTextCompare)
So there's the problem, really. The Split function is using a comma as a separator, but poor student Hank Williams, Jr. has a comma in his name. What to do?
I spent a long time trying to figure out how to escape the comma. If this is possible, I never figured it out.
Lots of forum posts suggested using a different character as the separator. That's okay, I guess, but here's the solution I came up with:
1. Replace only the special commas preceding "Jr" with a different, uncommon character BEFORE the Split function runs.
2. Swap back to the commas after Split runs.
That's really the end of my post, but here are the lines from my macro that accomplish step 1. This may or may not be of interest because it really just deals with the minutiae of making the swap. Note that the code handles several different (mostly wrong) ways my teachers might type the "Jr" part of the name.
'Dealing with the comma before Jr. This will handle ", Jr." and ", Jr" and " Jr." and " Jr".
'Replaces the comma with ~ because commas are used to separate fields in Split function below.
'Will swap ~ back to comma later in UpdateQ_Comma_for_Jr query.
strStudentsBeforeSplit = Replace(strStudentsBeforeSplit, "Jr", "~ Jr.") 'Every Jr gets this treatment regardless of what else is around it.
'Note that because of previous Replace functions a few lines prior, the space between the comma and Jr will have been removed. This adds it back.
strStudentsBeforeSplit = Replace(strStudentsBeforeSplit, ",~ Jr", "~ Jr") 'If teacher had added a comma, strip it.
strStudentsBeforeSplit = Replace(strStudentsBeforeSplit, " ~ Jr", "~ Jr") 'In cases when teacher added Jr but no comma, remove the (now extra)...
'...space that was before Jr.
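The two-step swap described above can also be sketched end to end in a single VBA function. This is only an illustration under stated assumptions: SplitNames is a hypothetical name, "~" is assumed never to occur in a real name, only the ", Jr" spelling is handled (the macro lines above cover more variants), and the swap back happens in VBA rather than in a separate query.

```vba
' Hypothetical sketch: split a comma-separated name list without breaking "..., Jr."
Function SplitNames(ByVal nameList As String) As String()
    Const PLACEHOLDER As String = "~"      ' assumed never to appear in a real name
    Dim parts() As String, i As Long
    ' Step 1: hide the comma that belongs to the suffix
    nameList = Replace(nameList, ", Jr", PLACEHOLDER & " Jr")
    ' The remaining commas are genuine separators
    parts = Split(nameList, ",")
    ' Step 2: put the protected comma back and trim the spaces left by ", "
    For i = LBound(parts) To UBound(parts)
        parts(i) = Trim$(Replace(parts(i), PLACEHOLDER, ","))
    Next i
    SplitNames = parts
End Function
```

With the input "Ann Lee, Hank Williams, Jr., Bo Diaz" (made-up names for illustration), this yields three elements: "Ann Lee", "Hank Williams, Jr." and "Bo Diaz".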

Finding multiple instance of a variable length string in a string

I'm trying to extract the parameters from my SQL query to build the XML for an SSRS report. I want to be able to copy/paste my SQL into Excel, look through the code, and find all instances of '#' and the parameter attached to each. These parameters will ultimately be copied and pasted to another sheet for further use. So for example:
where DateField between #FromDate and #ToDate
and (BalanceField between #BalanceFrom and #BalanceTo
OR BalanceField = #BalanceFrom)
I know I can use InStr to find the starting position of the first '#' in a line, but how do I then extract the rest of the parameter name (which varies in length) and, in the first two lines of the example, find the second parameter and extract it too? I've also tried the .Find method, with which I was able to copy the whole line over, but not just the parameters.
I might approach this problem like so:
1. Remove characters that are not surrounded by spaces but do not belong. In your example, the parentheses need to be removed.
2. Split the text using the space as a delimiter.
3. For each element in the split array, check the first character.
4. If it is "#", then the parameter is found, and it is the entire value in that part of the array.
My user-defined function looks something like this:
Public Function GetParameters(ByRef rsSQL As String) As String
    Dim sWords() As String
    Dim s As Variant
    Dim sResult As String
    'remove parentheses and split at space
    sWords = Split(Replace(Replace(rsSQL, ")", ""), "(", ""), " ")
    'find parameters
    For Each s In sWords
        If Left$(s, 1) = "#" Then
            sResult = sResult & s & ", "
        End If
    Next s
    'remove extra comma from list
    If sResult <> "" Then
        sResult = Left$(sResult, Len(sResult) - 2)
    End If
    GetParameters = sResult
End Function
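If the SQL is pasted as one multi-line string, or a parameter sits directly next to punctuation other than parentheses, a regular expression may be more forgiving than splitting on spaces. The following is a hedged sketch of that alternative, not the approach above: GetParametersRegex is a hypothetical name, it assumes parameter names consist of letters, digits and underscores, and it relies on the Windows-only VBScript.RegExp and Scripting.Dictionary objects.

```vba
' Hypothetical alternative: collect each distinct "#Name" token in order of first appearance.
Function GetParametersRegex(ByVal sqlText As String) As String
    Dim re As Object, m As Object, seen As Object
    Set re = CreateObject("VBScript.RegExp")
    Set seen = CreateObject("Scripting.Dictionary")
    re.Global = True
    re.Pattern = "#\w+"                    ' "#" followed by letters, digits or underscores
    For Each m In re.Execute(sqlText)
        ' The dictionary keeps one entry per parameter, so repeats are skipped
        If Not seen.Exists(m.Value) Then seen.Add m.Value, True
    Next m
    GetParametersRegex = Join(seen.Keys, ", ")
End Function
```

For the three example lines in the question, this should return "#FromDate, #ToDate, #BalanceFrom, #BalanceTo", listing #BalanceFrom only once. The dictionary compares keys case-sensitively by default, so #FromDate and #fromdate would be treated as different parameters.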
