How to extract text within a string of text - excel

I have a simple problem that I'm hoping to resolve without using VBA but if that's the only way it can be solved, so be it.
I have a file with multiple rows (all one column). Each row has data that looks something like this:
1 7.82E-13 >gi|297848936|ref|XP_00| 4-hydroxide gi|297338191|gb|23343|randomrandom
2 5.09E-09 >gi|168010496|ref|xp_00| 2-pyruvate
etc...
What I want is some way to extract the string of numbers that begin with "gi|" and end with a "|". For some rows this might mean as many as 5 gi numbers, for others it'll just be one.
What I would hope the output would look like would be something like:
297848936,297338191
168010496
etc...

Here is a very flexible VBA answer using the regex object. What the function does is extract every single sub-group match it finds (stuff inside the parenthesis), separated by whatever string you want (default is ", "). You can find info on regular expressions here: http://www.regular-expressions.info/
You would call it like this, assuming that first string is in A1:
=RegexExtract(A1,"gi[|](\d+)[|]")
Since this looks for all occurance of "gi|" followed by a series of numbers and then another "|", for the first line in your question, this would give you this result:
297848936, 297338191
Just run this down the column and you're all done!
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional separator As String = ", ") As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
Dim i As Long, j As Long
Dim result As String
RE.pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.count - 1
For j = 0 To allMatches.Item(i).submatches.count - 1
result = result & (separator & allMatches.Item(i).submatches.Item(j))
Next
Next
If Len(result) <> 0 Then
result = Right$(result, Len(result) - Len(separator))
End If
RegexExtract = result
End Function

Here it is (assuming data is in column A)
=VALUE(LEFT(RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2),
FIND("|",RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2)) -1 ))
Not the nicest formula, but it will work to extract the number.
I just noticed since you have two values per row with output separated by commas. You will need to check if there is a second match, third match etc. to make it work for multiple numbers per cell.
In reference to your exact sample (assuming 2 values maximum per cell) the following code will work:
=IF(ISNUMBER(FIND("gi|",$A1,FIND("gi|", $A1)+1)),CONCATENATE(LEFT(RIGHT($A1,LEN($A1)
- FIND("gi|",$A1) - 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ),
", ",LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1)
- 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1) - 2))
-1 )),LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2),
FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ))
How's that for ugly? A VBA solution may be better for you, but I'll leave this here for you.
To go up to 5 numbers, well, study the pattern and recurse manually in the formula. IT will get long!

I'd probably split the data first on the | delimiter using the convert text to columns wizard.
In Excel 2007 that is on the Data tab, Data Tools group and then choose Text to Columns. Specify Other: and | as the delimiter.
From the sample data you posted it looks like after you do this the numbers will all be in the same columns so you could then just delete the columns you don't want.

As the other guys presented the solution without VBA... I'll present the one that does use. Now, is your call to use it or no.
Just saw that #Issun presented the solution with regex, very nice! Either way, will present a 'modest' solution for the question, using only 'plain' VBA.
Option Explicit
Option Base 0
Sub findGi()
Dim oCell As Excel.Range
Set oCell = Sheets(1).Range("A1")
'Loops through every row until empty cell
While Not oCell.Value = ""
oCell.Offset(0, 1).Value2 = GetGi(oCell.Value)
Set oCell = oCell.Offset(1, 0)
Wend
End Sub
Private Function GetGi(ByVal sValue As String) As String
Dim sResult As String
Dim vArray As Variant
Dim vItem As Variant
Dim iCount As Integer
vArray = Split(sValue, "|")
iCount = 0
'Loops through the array...
For Each vItem In vArray
'Searches for the 'Gi' factor...
If vItem Like "*gi" And UBound(vArray) > iCount + 1 Then
'Concatenates the results...
sResult = sResult & vArray(iCount + 1) & ","
End If
iCount = iCount + 1
Next vItem
'And removes trail comma
If Len(sResult) > 0 Then
sResult = Left(sResult, Len(sResult) - 1)
End If
GetGi = sResult
End Function

open your excel in Google Sheets and use the regular expression with REGEXEXTRACT
Sample Usage
=REGEXEXTRACT("My favorite number is 241, but my friend's is 17", "\d+")
Tip: REGEXEXTRACT will return 241 in this example because it returns the first matching case.
In your case
=REGEXEXTRACT(A1,"gi[|](\d+)[|]")

Related

How to check for the same delimiter multiple times?

I have an Excel sheet that contains strings and numbers. All the strings I am searching for have an underscore ("_"), which is my delimiter. However, some strings have the delimiter more than once.
For example:
text_in_00
text_in_01
text_out_00
text_out_01
Other strings with just one delimiter work beautifully. But here, with two delimiters, "in" and "out" are not being differentiated, due to the delimiter only being found once. How do I find EACH delimiter in a given string?
My goal with this code is to differentiate between ranges and copy and paste these different ranges into their own individual worksheets. Also, I cannot hard-code any cells or strings, as the string names are subject to change, as well as the size of the ranges.
My code:
'Dim arr As Variant
Dim i As Long
Dim filterRange As Range
Dim delimiterItem As String 'was variant
Dim a As Range
delimiterItem = "_"
Set filterRange = FindAll(Worksheets(newSheetName).UsedRange)
For i = filterRange.Rows.Count To 2 Step -1
'arr = Split(Cells(i, 1), delimiterItem)
'For j = LBound(arr) To UBound(arr)
If Split(filterRange.Cells(i, 1).Text, delimiterItem)(0) <> Split(filterRange.Cells(i - 1, 1).Text, delimiterItem)(0) Then
Range(filterRange.Cells(i, 1).EntireRow, filterRange.Cells(i, 1).EntireRow).Insert
End If
'Next j
Next i
Note: FindAll is another function in my code that finds the values I need to be looking at. Some strings don't contain any underscores ("_"), which are values I don't need. This function just filters out what I don't need and works great. I am focusing on the portion of code below the line: Set filterRange = FindAll(Worksheets(newSheetName).UsedRange))
Note: The commented out code was something I was trying, but gave the same result.
TLDR; How do I check for each instance of the delimiter? Thank you in advance for the help.
Use the following function to get a count of how many times Char appears in your string and then use a select case construct do do whatever, based on the count.
Public Function CountChars(ByVal Source As String, ByVal Char As String) As Long
CountChars = Len(Source) - Len(Replace(Source, Char, vbNullString))
End Function
Make a function that returns the Nth index of a substring inside another:
Public Function NthIndexOf(ByVal needle As String, ByVal haystack As String, ByVal n As Long) As Long
Dim currentN As Long
Dim currentIndex As Long
Do
currentIndex = InStr(currentIndex + 1, haystack, needle, vbTextCompare)
currentN = currentN + 1
Loop Until currentIndex = Len(haystack) Or currentN = n Or currentIndex = 0
NthIndexOf = currentIndex
End Function
Now you can get the NthIndexOf("_", "text_in_00", 2) and get 8. If you tried to get the 3rd index of "_", the output would be 0.
If you want the substring between each "delimiter", then you need to Split and then iterate the array. It's unclear what you intend to do with each substring though, but you should have all the tools you need to do whatever it is that you're doing now.
delimiterItem = "_"
Set filterRange = FindAll(Worksheets(newSheetName).UsedRange)
For i = filterRange.Rows.Count To 2 Step -1
If Split(InStrRev(filterRange.Cells(i, 1).Text, delimiterItem))(0) <> Split(InStrRev(filterRange.Cells(i - 1, 1).Text, delimiterItem))(0) Then
Range(filterRange.Cells(i, 1).EntireRow, filterRange.Cells(i, 1).EntireRow).Insert
End If
Next i

Print Up to 300 Strings of Arrays to PDF Based on a Calculated Value

I need to print a string of arrays dependent on a difference of two values on my input page to separate sheets within the same PDF but I have been running into a few issues.
Based on the difference of two cells, the function will determine which arrays to print.
There are two possible solutions I have thought of but have been unsuccessful attempting both.
Indirectly reference a string of arrays in a cell to print such as "abc,bcd,cde,def,efg..."
(As Shown Below) Use conditional if-then functions to invoke the array based on the difference in these two cells
Primary Goals
Print into a single PDF
Determine specific arrays to print depending on the difference in two values contained in a cell on my input page
Allow for PageSetup values (have this figured out)
I am using MSFT 365. I tried initially using an indirect array reference to a cell with a variable value string including the arrays to be included without success.
Next, I tried to hardcode for all 100 possible values for this difference but in that case, I am running into line limits and errors associated with using _ to continue the array function on another line.
If the difference value equals 3, it is shown as below. If the difference value equals 4, you would add another array line including "schedule05","report05","p&l05"
Option Explicit
Sub PrintTest()
'if a certain difference value, use
If (Worksheets("Inputs").Range("D7") - Worksheets("Inputs").Range("D6")) = "3" Then
Dim pageArray As Variant
'set array for given difference
pageArray = Array("schedule01", "report01", "p&l01", _
"schedule02", "report02", "p&l02", _
"schedule03", "report03", "p&l03", _
"schedule04", "report04", "p&l04")
Worksheets("data").Activate
Worksheets("data").PageSetup.CenterHorizontally = True
'page setup values
With ActiveSheet.PageSetup
.FitToPagesWide = 1
.FitToPagesTall = 1
.Orientation = xlLandscape
End With
'call array for print
Worksheets("data").Range("pageArray").PrintOut
Elseif
'Here is where I could put another similar function for a difference of 4
'......
Else
'Here is where I could put another similar function for a difference of x
End If
End Sub
I expected this would get me a PDF where each of these arrays is printed on a separate sheet and will print a selection of arrays based on the difference value.
To expand on my comment, it would look like this:
Dim lDiff As Long
Dim pageArray As Variant
Dim sFormat As String
Dim i As Long, j As Long
'if a certain difference value, use
lDiff = Worksheets("Inputs").Range("D7").Value - Worksheets("Inputs").Range("D6").Value
ReDim pageArray(1 To (lDiff + 1) * 3)
For i = 1 To UBound(pageArray, 1) Step 3
j = j + 1
If j < 100 Then sFormat = "00" Else sFormat = "000"
pageArray(i) = "schedule" & Format(j, sFormat)
pageArray(i + 1) = "report" & Format(j, sFormat)
pageArray(i + 2) = "p&l" & Format(j, sFormat)
MsgBox pageArray(i)
Next i

How can I remove text duplicates in a cell string?

So imagine there is the following string in a cell in excel:
A1 = "Company 1 Company 2 Company 1 Company 2 Company 3"
and the desired result for now is removing the duplicates:
A1 = "Company 1 Company 2 Company 3" (I imagine that this one doesn't require a macro)
the ideal one would be to put the distinct values in different cells in a vertical way:
A1 = "Company 1"
A2 = "Company 2"
A3 = "Company 3"
(which would require definitely programming but since I never used vba i'm not experienced enough I think to elaborate such code)
Is it feasible?
EDIT: the delimiter can be changed from a space " " to other, for example, a semicolon ";" to prevent errors and to be easier to solve this one.
Asumption is you have a delimiter between the strings to tell apart you could use the following code
Option Explicit
Sub RemoveDuplicates()
Const SEPARATOR = ","
Dim vDat As Variant
vDat = Split(Range("A1"), SEPARATOR)
' remove trailing blanks if necessary
Dim i As Long
For i = LBound(vDat) To UBound(vDat)
vDat(i) = Trim(vDat(i))
Next i
Dim dic As Object
Set dic = CreateObject("Scripting.Dictionary")
Dim vItem As Variant
For Each vItem In vDat
If Not dic.Exists(vItem) Then
dic.Add vItem, vItem
End If
Next
vDat = dic.Keys
' Write data to column B
Range("B1").Resize(UBound(vDat) + 1) = WorksheetFunction.Transpose(vDat)
'Debug.Print Join(vDat, SEPARATOR)
End Sub
Tested with the following data
A1 = Company 1, Company 2, Company 1, Company 2 , Company 3
or
A1 = IBM, Apple, Microsoft, Apple , IBM
With an unambiguous string, and by that I mean:
delimiter not included in the substrings, OR
each entry surrounded by double quotes
you can use Power Query in Excel 2010, 2013 or Data Get & Transform in Excel 2016, to do all of that.
Split the cell on the delimiter
Define the quote mark as the text qualifier if necessary
Rows - remove duplicates
So with data like:
Company 1;Company 2;Company 1;Company 2;Company 3
or (space delimiter)
"Company 1" "Company 2" "Company 1" "Company 2" "Company 3"
you can easily accomplish what you require without using VBA.
And if, as in your examples, there are extraneous spaces at the beginning or end of the data, Power Query has a Text.Trim function that will be useful.
Alternate solution using UDF (commented for clarity):
Public Function UNIQUELIST(ByVal arg_vOriginalList As String, ByVal arg_sDelimiter As String, ByVal arg_lReturnIndex As Long) As Variant
Dim oDict As Object
Dim vElement As Variant
Dim i As Long
'Use a dictionary to extract unique elements
Set oDict = CreateObject("Scripting.Dictionary")
i = 0 'This is a counter to keep track until we reach the appropriate return index
'Loop through each element
For Each vElement In Split(arg_vOriginalList, arg_sDelimiter)
'Check the trimmed, lowercase element against the keys of the dictionary
If Not oDict.Exists(LCase(Trim(vElement))) Then
'Unique element found
i = i + 1
If i = arg_lReturnIndex Then
'Found appropriate unique element, output and exit function
UNIQUELIST = Trim(vElement)
Exit Function
End If
'Not correct return index, add element to dictionary
'Lowercase the key (so uniques aren't case sensitive) and trim both the key and the value
oDict.Add LCase(Trim(vElement)), Trim(vElement)
End If
Next vElement
'arg_lReturnIndex was less than 1 or greater than the number of unique values, return blank
UNIQUELIST = vbNullString
End Function
Then in a cell where you want the output to start (for example, B1), put this formula and copy down (adjust the "," to be the correct delimiter):
=UNIQUELIST($A$1,",",ROW(A1))
Approach using same delimiters as in OP
I assume the same space delimiters as in your original post: As you want to get your company strings in groups of two, I slightly modified the good solution of #Storax by concatenating the Split result in steps of 2 first and demonstrate a shorter way to write results back to sheet (cf. section [5]).
Example Code
Option Explicit ' declaration head of your code module
Sub SplitCompanies()
' [0] declare variables and set objects
Dim v, vItem
Dim i As Integer, n As Integer
Dim s, str As String
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Tabelle5") ' << change to your sheet name
Dim dict As Object ' late binding of dictionary
Set dict = CreateObject("Scripting.Dictionary")
' [1] get cell value and split it (space delimited as in Original Post)
str = ws.Range("A1") ' cell value, e.g. "Company 1 Company 2 Company 1 Company 2 Company 3"
s = Split(str, " ") ' split cell value (space delimiter)
' [2] count all companies and redimension helper array
n = Int((UBound(s) + 1) / 2) - 1 ' items counter equals 1/2 of split items
ReDim v(0 To n) ' redim zero-based 1-dim helper array
' [3] concatenate partial strings in helper array
For i = 0 To n
v(i) = s(i * 2) & " " & s(i * 2 + 1)
Next i
' [4] build dictionary with unique items
For Each vItem In v
If Not dict.Exists(vItem) Then
dict.Add vItem, vItem
End If
Next
' [5] Write data to column B
ws.Range("B1:B" & dict.Count) = Application.Transpose(dict.Keys)
' [6] clear memory
Set dict = Nothing: Set ws = Nothing
End Sub

Excel Parse out a list of numbers from text (several numbers from one cell)

I need to parse out a list of tracking numbers from text in excel. The position in terms of characters will not always be the same. An example:
Location ID 987
Your package is arriving 01/01/2015
Fruit Snacks 706970554628
<http://www.fedex. com/Tracking?tracknumbers=706970554628>
Olive Oil 709970554631
<http://www.fedex. com/Tracking?tracknumbers=709970554631>
Sign 706970594642
<http://www.fedex .com/Tracking?tracknumbers=706970594642>
Thank you for shopping with us!
The chunk of text is located in one cell. I would like the results to either be 3 separate columns or rows looking like this:
706970554628 , 709970554631 , 706970594642
There will not always be the same number of tracking numbers. One cell might have six while another has one.
Thank you for any help!!
I think you'll need some VBA to do this. And it's not going to be super simple stuff. #Gary'sStudent has a great example of grabbing numbers from a big string. If you need something that is more specific to your scenario you'll have to parse the string word by word and have it figure out when it encounters a tracking number in the URL.
Something like the following will do the trick:
Function getTrackingNumber(bigMessage As String, numberPosition As Integer) As String
Dim intStrPos As Integer
Dim arrTrackNumbers() As Variant
'create a variable to hold characters we'll use to identify words
Dim strWorkSeparators As String
strWordSeparators = "()=/<>?. " & vbCrLf
'iterate through each character in the big message
For intStrPos = 1 To Len(bigMessage)
'Identify distinct words
If InStr(1, strWordSeparators, Mid(bigMessage, intStrPos, 1)) > 1 Then 'we found the start of a new word
'if foundTrackNumber is true, then this must be a tracking number. Add it to the array of tracking numbers
If foundTrackNumber Then
'keep track of how many we've found
trackNumbersFound = trackNumbersFound + 1
'redim the array in which we are holding the track numbers
ReDim Preserve arrTrackNumbers(0 To trackNumbersFound - 1)
'add the track
arrTrackNumbers(trackNumbersFound - 1) = strword
End If
'Check to see if the word that we just grabbed is "tracknumber"
If strword = "tracknumbers" Then
foundTrackNumber = True
Else
foundTrackNumber = False
End If
'set this back to nothing
strword = ""
Else
strword = strword + Mid(bigMessage, intStrPos, 1)
End If
Next intStrPos
'return the requested tracking number if it exists.
If numberPosition > UBound(arrTrackNumbers) + 1 Then
getTrackingNumber = ""
Else
getTrackingNumber = arrTrackNumbers(numberPosition - 1)
End If
End Function
This is a UDF, so you can use it in your worksheet as a formula with:
=getTrackingNumber(A1, 1)
Which will return the first tracking number it encounters in cell A1. Consequently the formula
=getTrackingNumber(A1, 2)
will return the second tracking number, and so on.
This is not going to be a speedy function though since it's parsing the big string character by character and making decisions as it goes. If you can wrangle Gary's Student's answer into something workable it'll be much faster and less CPU intensive on larger data. However, if you are getting too many results and need to go at this like a surgeon, then this should get you in the ballpark.
If tracking is always a 12 digit number, then select the cell run run this short macro:
Sub parser117()
Dim s As String, ary, i As Long
With ActiveCell
ary = Split(Replace(Replace(.Text, Chr(10), " "), Chr(13), " "), " ")
i = 1
For Each a In ary
If Len(a) = 12 And IsNumeric(a) Then
.Offset(0, i).Value = a
i = i + 1
End If
Next a
End With
End Sub

trying to use a nested Instr function in Isnumeric

I'm trying to write a program in VBA for excel that will search through a column of "names", and if that name has the case-sensitive string "CAN" within it, then the column 6 columns over will be added to a total (canadaTotal). This is what I have so far... The problem is within the instr/isnumeric portion. I'm sure I'm using one of them incorrectly.. and if anybody could offer an alternative solution, or a quick fix, I would appreciate it.
(hint... I'm not sure if i can use my "search" variable as the second input of the instr function...)
Private Sub CommandButton5_Click()
Dim i As Integer
Dim col As Integer
Dim canadaTotal As Integer
Dim search As String
Dim canadaCheck As Long
i = 1
col = 4
canadaTotal = 0
Worksheets("sheet1").Activate
While Not Worksheets("Sheet1").Cells(i, col).Value = ""
search = Cells(i, col).Value
If IsNumeric(InStr(0, search, "CAN")) Then
canadaTotal = canadaTotal + Cells(i, col).Offset(0, 6).Value
End If
i = i + 1
Wend
MsgBox (canadaTotal)
End Sub
The problem you are having is that the Instr function starts with position 1, not position 0.
Also, Instr returns 0 if the string is not found, not a non-numeric value, so your test will always be true.
Additionally, the default for Instr is that it will not search case sensitive. In order to search case sensitive, you need to use the last "compare" parameter and set it to vbBinaryCompare.
Change this to:
If Instr(1, search, "CAN", vbBinaryCompare) <> 0 Then
and it should work.

Resources