How to check for the same delimiter multiple times? - excel

I have an Excel sheet that contains strings and numbers. All the strings I am searching for have an underscore ("_"), which is my delimiter. However, some strings have the delimiter more than once.
For example:
text_in_00
text_in_01
text_out_00
text_out_01
Other strings with just one delimiter work beautifully. But here, with two delimiters, "in" and "out" are not being differentiated, due to the delimiter only being found once. How do I find EACH delimiter in a given string?
My goal with this code is to differentiate between ranges and copy and paste these different ranges into their own individual worksheets. Also, I cannot hard-code any cells or strings, as the string names are subject to change, as well as the size of the ranges.
My code:
'Dim arr As Variant
Dim i As Long
Dim filterRange As Range
Dim delimiterItem As String 'was variant
Dim a As Range
delimiterItem = "_"
Set filterRange = FindAll(Worksheets(newSheetName).UsedRange)
For i = filterRange.Rows.Count To 2 Step -1
'arr = Split(Cells(i, 1), delimiterItem)
'For j = LBound(arr) To UBound(arr)
If Split(filterRange.Cells(i, 1).Text, delimiterItem)(0) <> Split(filterRange.Cells(i - 1, 1).Text, delimiterItem)(0) Then
Range(filterRange.Cells(i, 1).EntireRow, filterRange.Cells(i, 1).EntireRow).Insert
End If
'Next j
Next i
Note: FindAll is another function in my code that finds the values I need to be looking at. Some strings don't contain any underscores ("_"); those are values I don't need. This function just filters out what I don't need and works great. I am focusing on the portion of code below the line: Set filterRange = FindAll(Worksheets(newSheetName).UsedRange)
Note: The commented out code was something I was trying, but gave the same result.
TL;DR: How do I check for each instance of the delimiter? Thank you in advance for the help.

Use the following function to count how many times Char appears in your string, and then use a Select Case construct to do whatever you need based on the count.
Public Function CountChars(ByVal Source As String, ByVal Char As String) As Long
CountChars = Len(Source) - Len(Replace(Source, Char, vbNullString))
End Function
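As a hedged illustration of the Select Case step, the sketch below branches on the delimiter count. DescribeDelimiters is a hypothetical name chosen for this example, and CountChars is repeated only so the block is self-contained. Note that CountChars gives a true count only when Char is a single character.

```vba
Option Explicit

' Repeated from the answer above so the sketch is self-contained.
' Correct only for a single-character Char.
Public Function CountChars(ByVal Source As String, ByVal Char As String) As Long
    CountChars = Len(Source) - Len(Replace(Source, Char, vbNullString))
End Function

' Hypothetical helper: branches on how many "_" delimiters a string contains.
Public Function DescribeDelimiters(ByVal Source As String) As String
    Select Case CountChars(Source, "_")
        Case 0
            DescribeDelimiters = "no delimiter"
        Case 1
            DescribeDelimiters = "one delimiter"
        Case Else
            DescribeDelimiters = "several delimiters"
    End Select
End Function
```

For example, DescribeDelimiters("text_in_00") takes the Case Else branch because the string contains two underscores.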

Make a function that returns the Nth index of a substring inside another:
Public Function NthIndexOf(ByVal needle As String, ByVal haystack As String, ByVal n As Long) As Long
Dim currentN As Long
Dim currentIndex As Long
Do
currentIndex = InStr(currentIndex + 1, haystack, needle, vbTextCompare)
currentN = currentN + 1
Loop Until currentN = n Or currentIndex = 0
NthIndexOf = currentIndex
End Function
Now you can get the NthIndexOf("_", "text_in_00", 2) and get 8. If you tried to get the 3rd index of "_", the output would be 0.
If you want the substring between each delimiter, you need to Split the string and then iterate over the resulting array. It's unclear what you intend to do with each substring, but you should now have all the tools you need.

delimiterItem = "_"
Set filterRange = FindAll(Worksheets(newSheetName).UsedRange)
For i = filterRange.Rows.Count To 2 Step -1
If Split(InStrRev(filterRange.Cells(i, 1).Text, delimiterItem))(0) <> Split(InStrRev(filterRange.Cells(i - 1, 1).Text, delimiterItem))(0) Then
Range(filterRange.Cells(i, 1).EntireRow, filterRange.Cells(i, 1).EntireRow).Insert
End If
Next i
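Note that InStrRev returns the position of the last delimiter, so the snippet above compares positions rather than text. That separates text_in_00 from text_out_00, but it would not separate two different prefixes of equal length. A minimal sketch of an alternative, assuming the grouping key is everything before the last delimiter (GroupKey is a hypothetical name, not part of the original code):

```vba
Option Explicit

' Hypothetical helper: returns everything before the last delimiter,
' or the whole string when the delimiter is absent.
Public Function GroupKey(ByVal source As String, ByVal delimiter As String) As String
    Dim lastPos As Long
    lastPos = InStrRev(source, delimiter)
    If lastPos > 0 Then
        GroupKey = Left$(source, lastPos - 1)
    Else
        GroupKey = source
    End If
End Function
```

With this helper, the loop condition could compare GroupKey(filterRange.Cells(i, 1).Text, delimiterItem) against the same call for row i - 1. GroupKey("text_in_00", "_") returns "text_in", while GroupKey("text_out_00", "_") returns "text_out".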

Related

Counting the matching substrings in range

I am working on a workbook in which I need to count how many times the "St/" substring is present in a Range (Column Q). Note: I am interested in all the occurrences, not just the number of cells in which the substring is present.
Here is the code I am trying to work with (based on the comment of Santhosh Divakar - https://stackoverflow.com/a/23357807/12536295), but I receive a runtime error (13) when running it. What am I missing / doing wrong?
Dim lastrow, q as Integer
lastrow = Range("A1").End(xlToRight).End(xlDown).Row
With Application
q = .SumProduct((Len(Range("Q1:Q" & lastrow)) - Len(.Substitute(Range("Q1:Q" & lastrow), "St/", ""))) / Len("St/"))
End With
See if the code below helps you:
Public Sub TestCount()
lastrow = Range("Q" & Rows.Count).End(xlUp).Row
strformula = "=SUMPRODUCT(LEN(Q1:Q" & lastrow & ")-LEN(SUBSTITUTE(UPPER(Q1:Q" & lastrow & "),""ST/"","""")))/LEN(""St/"")"
MsgBox Evaluate(strformula)
End Sub
I think you can count the number of characters, replace your "St/" with nothing and then count the characters again and divide by len("St/"). Here's an example.
'''your existing code
Dim lCount As Long
Dim lCount_After As Long
'''set a Range to column Q
Set oRng = Range("Q1:Q" & lRow_last)
'''turn that range into a string
sValues = CStr(Join(Application.Transpose(oRng.Value2)))
lCount = Len(sValues)
lCount_After = lCount - Len(Replace(sValues, "St/", ""))
lCount_After = lCount_After / 3
Debug.Print lCount_After
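The same length-difference idea can be wrapped in a small function so the divisor is not hard-coded as 3. This is only a sketch: CountSubstring is a hypothetical name, the comparison is made explicitly case sensitive, and overlapping matches are not counted.

```vba
Option Explicit

' Hypothetical helper: counts non-overlapping, case-sensitive occurrences
' of Needle in Source using the length difference after removal.
Public Function CountSubstring(ByVal Source As String, ByVal Needle As String) As Long
    ' An empty Needle would cause a division by zero, so return 0 instead.
    If Len(Needle) = 0 Then Exit Function
    CountSubstring = (Len(Source) - _
        Len(Replace(Source, Needle, vbNullString, 1, -1, vbBinaryCompare))) \ Len(Needle)
End Function
```

Calling CountSubstring(sValues, "St/") on the joined string from the snippet above would give the same count as the manual division.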
Using ArrayToText() function
a) If you have Excel for Microsoft 365, you can skip building the string by hand: evaluating the worksheet function ARRAYTOTEXT() returns a joined string of all rows at once (complementing @Foxfire's valid solution).
Note that the range address has to be inserted as a string; to fully qualify the range reference I pass the additional External:=True argument.
b) VBA's Split() function can also give the number of delimiters found (e.g. "St/") via the UBound() function, which returns the upper boundary (i.e. the largest available subscript) of the zero-based one-dimensional array that Split() produces.
Example: if there are eight St/ delimiters, the split array has nine elements; because it is zero-based, the first element has index 0 and the last has index 8, which is already the wanted result.
Function CountOccurrencies(rng As Range, Optional delim as String = "St/")
'a) get a final string (avoiding to join cells per row)
Dim txt As String
txt = Evaluate("ArrayToText(" & rng.Address(False, False, External:=True) & ")")
'b) get number of delimiters
CountOccurrencies = UBound(Split(txt, delim))
End Function
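One edge case of the UBound(Split(...)) technique is worth spelling out: Split applied to a zero-length string returns an empty array whose upper bound is -1, so an empty input would report -1 rather than 0. A small sketch that works on a plain string and guards that case (CountBySplit is a hypothetical name used only here):

```vba
Option Explicit

' Hypothetical helper: counts delimiters via Split, guarding the
' empty-string case where UBound(Split("")) would be -1.
Public Function CountBySplit(ByVal Source As String, ByVal Delim As String) As Long
    If Len(Source) = 0 Or Len(Delim) = 0 Then Exit Function   ' returns 0
    CountBySplit = UBound(Split(Source, Delim))
End Function
```

For "aSt/bSt/c" the split array has three elements (indices 0 to 2), so the function returns 2.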
Not the cleanest approach, but you can load everything into an array, join it, and split by St/. The upper bound of the resulting array is the number of matches:
Sub test()
Dim LR As Long
Dim MyText() As String
Dim i As Long
Dim q As Long
LR = Range("Q" & Rows.Count).End(xlUp).Row
ReDim Preserve MyText(1 To LR) As String
For i = 1 To LR Step 1
MyText(i) = Range("Q" & i).Value
Next i
q = UBound(Split(Join(MyText, ""), "St/"))
Debug.Print q
Erase MyText
End Sub
The output I get is 8.
Please note that this code is case sensitive.
This version uses the TextJoin() function, available in Excel 2019 and later:
Sub CalcSt()
Const WHAT = "St/": Dim joined As String
joined = WorksheetFunction.TextJoin("|", True, Columns("Q"))
Debug.Print (Len(joined) - Len(Replace(joined, WHAT, ""))) / Len(WHAT)
End Sub

InStr will not find dots in some cases

I have strings that consist of leading dots followed by a number (for example "..2" or "....4"). I want to delete all leading dots and convert the string into a Long variable.
So I have written a function that finds leading dots in strings and deletes them. For some reason, the function works for a string like "..2" but will not work for "...3". The InStr function will not find "." in "...3".
The strings are read out from a column in a worksheet. They are not formatted in any weird way, I have tried just typing them in manually in a new worksheet without any changes to the default formatting settings, same results.
So I have tried several things. I believe there must be some error involving character encodings, but I cannot figure out how to solve this problem.
I have tried using a recursive function using InStr to delete the dots and then tried the split function with "." as the delimiter to test my assumption. Split has the same problem, works for "..2" but will not work for "...3".
When I debug print the strings that I read out, "...3" seems to be formatted differently than "..2" or ".1". I do not know why.
here you can see the difference in the formatting
Sub Gruppieren()
'read out strings first
'then try to delete the dots
Dim strArr() As String
Dim lngArr() As Long
Dim lLastRow As Long
Dim i As Long
lLastRow = getFirstEmptyRow("A", Tabelle1.Index)
ReDim strArr(1 To lLastRow)
ReDim lngArr(1 To lLastRow)
For i = 1 To UBound(strArr)
strArr(i) = Worksheets(1).Cells(i, 1).Value
Debug.Print strArr(i)
strArr(i) = clearLeadingDots(strArr(i))
'strArr(i) = splitMeIfYouCan(strArr(i))
If IsNumeric(strArr(i)) = True Then
lngArr(i) = CLng((strArr(i)))
Debug.Print lngArr(i)
End If
Next i
End Sub
'The functions:
Function clearLeadingDots(myText As String) As String
Dim i As Long
i = InStr(myText, ".")
If i <> 0 Then
myText = Right(CStr(myText), Len(myText) - i)
clearLeadingDots = clearLeadingDots(CStr(myText))
Else
clearLeadingDots = CStr(myText)
Exit Function
End If
End Function
Function splitMeIfYouCan(myText As String) As String
Dim myArr() As String
Dim i As Long
myArr = Split(myText, ".")
splitMeIfYouCan = myArr(UBound(myArr))
End Function
Edit: The answer was that the three dots had been converted automatically into a single ellipsis character. Searching for and eliminating Chr(133) did the job.
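A minimal sketch of a non-recursive cleanup that treats both the plain dot and the ellipsis as leading characters to skip. StripLeadingDots is a hypothetical name, and the sketch assumes the ellipsis is the Unicode character U+2026 (ChrW$(8230)), which is what Chr(133) maps to on Western Windows code pages; other locales may differ.

```vba
Option Explicit

' Hypothetical helper: removes leading "." characters and leading
' ellipsis characters (U+2026, assumed equivalent to Chr(133)).
Public Function StripLeadingDots(ByVal source As String) As String
    Dim pos As Long
    pos = 1
    Do While pos <= Len(source)
        Select Case Mid$(source, pos, 1)
            Case ".", ChrW$(8230)
                pos = pos + 1      ' still inside the leading run, keep skipping
            Case Else
                Exit Do            ' first character that is not a dot or ellipsis
        End Select
    Loop
    StripLeadingDots = Mid$(source, pos)
End Function
```

The result can then be passed through IsNumeric and CLng as in the original Gruppieren routine.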

extract multiple expressions

I have a cell that contains usernames assigned to projects like this
,FC757_random_name,AP372_another_one,FC782_again_different,FC082_samesamebutdifferent,
I need to only extract the alphanumeric values the expressions start with, so everything in between , and _.
I made it work for one expression with the following, but I need all of them.
= MID(A1;FIND(",";A1)+1;FIND("_";A1)-FIND(",";A1)-1)
I also tinkered with Text to Data, but couldn't make it work for multiple lines at once.
Ideally this would work only with formulas, but I guess (/fear) I'll need VBA or Macros, which I have never worked with before.
All help will be appreciated!
Here is a regex based User Defined Function.
Option Explicit
Function extractMultipleExpressions(str As String, _
Optional delim As String = ", ")
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
extractMultipleExpressions = vbNullString
With rgx
.Global = True
.MultiLine = False
.Pattern = "[A-Z]{2}[0-9]{3}"
If .Test(str) Then
Set cmat = .Execute(str)
'resize the nums array to accept the matches
ReDim nums(cmat.Count - 1)
'populate the nums array with the matches
For n = LBound(nums) To UBound(nums)
nums(n) = cmat.Item(n)
Next n
'convert the nums array to a delimited string
extractMultipleExpressions = Join(nums, delim)
End If
End With
End Function
I believe you are looking for something like this. Press Alt + F11, choose Insert > Module, and then paste the following code:
Public Function GetUsers(UserNameProject As String) As String
Dim userArray() As String
Dim users As String
Dim intPos As Long
Dim i As Long
'this will split the users into an array based on the commas
userArray = Split(UserNameProject, ",")
'loop through the array and process any non blank element and extract user
'based on the position of the first underscore
For i = LBound(userArray) To UBound(userArray)
If Len(Trim(userArray(i))) > 0 Then
intPos = InStr(1, userArray(i), "_")
'skip elements without an underscore to avoid an invalid Left() call
If intPos > 0 Then users = users & "," & Left(userArray(i), intPos - 1)
End If
Next
'drop the leading comma added before the first user
If Len(users) > 0 Then users = Mid(users, 2)
GetUsers = users
End Function
If your string is in A1, use it by putting =GetUsers(A1) in the appropriate cell. I think this should get you started!
To clean the data of extra commas, use this formula in cell B1:
=TRIM(SUBSTITUTE(SUBSTITUTE(TRIM(SUBSTITUTE(SUBSTITUTE(A1;" ";"||");";";" "));" ";";");"||";" "))
Then use this formula in cell C1 and copy over and down to extract just the part you want from each section:
=IFERROR(INDEX(TRIM(LEFT(SUBSTITUTE(TRIM(MID(SUBSTITUTE($B1;";";REPT(" ";LEN($B1)));LEN($B1)*(ROW($A$1:INDEX($A:$A;LEN($B1)-LEN(SUBSTITUTE($B1;";";""))+1))-1)+1;LEN($B1)));"_";REPT(" ";LEN($B1)));LEN($B1)));COLUMN(A1));"")

How to extract text within a string of text

I have a simple problem that I'm hoping to resolve without using VBA but if that's the only way it can be solved, so be it.
I have a file with multiple rows (all one column). Each row has data that looks something like this:
1 7.82E-13 >gi|297848936|ref|XP_00| 4-hydroxide gi|297338191|gb|23343|randomrandom
2 5.09E-09 >gi|168010496|ref|xp_00| 2-pyruvate
etc...
What I want is some way to extract the string of numbers that begin with "gi|" and end with a "|". For some rows this might mean as many as 5 gi numbers, for others it'll just be one.
What I would hope the output would look like would be something like:
297848936,297338191
168010496
etc...
Here is a very flexible VBA answer using the regex object. The function extracts every sub-group match it finds (the parts inside parentheses), separated by whatever string you want (default is ", "). You can find info on regular expressions here: http://www.regular-expressions.info/
You would call it like this, assuming that first string is in A1:
=RegexExtract(A1,"gi[|](\d+)[|]")
Since this looks for all occurrences of "gi|" followed by a series of digits and then another "|", for the first line in your question this would give you this result:
297848936, 297338191
Just run this down the column and you're all done!
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional separator As String = ", ") As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
Dim i As Long, j As Long
Dim result As String
RE.pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.count - 1
For j = 0 To allMatches.Item(i).submatches.count - 1
result = result & (separator & allMatches.Item(i).submatches.Item(j))
Next
Next
If Len(result) <> 0 Then
result = Right$(result, Len(result) - Len(separator))
End If
RegexExtract = result
End Function
Here it is (assuming data is in column A)
=VALUE(LEFT(RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2),
FIND("|",RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2)) -1 ))
Not the nicest formula, but it will work to extract the number.
I just noticed that you have two values per row, with the output separated by commas. You will need to check whether there is a second match, third match, etc. to make it work for multiple numbers per cell.
In reference to your exact sample (assuming 2 values maximum per cell) the following code will work:
=IF(ISNUMBER(FIND("gi|",$A1,FIND("gi|", $A1)+1)),CONCATENATE(LEFT(RIGHT($A1,LEN($A1)
- FIND("gi|",$A1) - 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ),
", ",LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1)
- 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1) - 2))
-1 )),LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2),
FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ))
How's that for ugly? A VBA solution may be better for you, but I'll leave this here for you.
To go up to 5 numbers, well, study the pattern and recurse manually in the formula. It will get long!
I'd probably split the data first on the | delimiter using the convert text to columns wizard.
In Excel 2007 that is on the Data tab, Data Tools group and then choose Text to Columns. Specify Other: and | as the delimiter.
From the sample data you posted it looks like after you do this the numbers will all be in the same columns so you could then just delete the columns you don't want.
The others presented solutions without VBA, so I'll present one that does use it. Whether to use it is your call.
I just saw that @Issun presented a solution with regex, very nice! Either way, here is a 'modest' solution to the question, using only 'plain' VBA.
Option Explicit
Option Base 0
Sub findGi()
Dim oCell As Excel.Range
Set oCell = Sheets(1).Range("A1")
'Loops through every row until empty cell
While Not oCell.Value = ""
oCell.Offset(0, 1).Value2 = GetGi(oCell.Value)
Set oCell = oCell.Offset(1, 0)
Wend
End Sub
Private Function GetGi(ByVal sValue As String) As String
Dim sResult As String
Dim vArray As Variant
Dim vItem As Variant
Dim iCount As Integer
vArray = Split(sValue, "|")
iCount = 0
'Loops through the array...
For Each vItem In vArray
'Searches for the 'Gi' factor...
If vItem Like "*gi" And UBound(vArray) > iCount + 1 Then
'Concatenates the results...
sResult = sResult & vArray(iCount + 1) & ","
End If
iCount = iCount + 1
Next vItem
'And removes trail comma
If Len(sResult) > 0 Then
sResult = Left(sResult, Len(sResult) - 1)
End If
GetGi = sResult
End Function
Open your Excel file in Google Sheets and use a regular expression with REGEXEXTRACT.
Sample Usage
=REGEXEXTRACT("My favorite number is 241, but my friend's is 17", "\d+")
Tip: REGEXEXTRACT will return 241 in this example because it returns the first matching case.
In your case
=REGEXEXTRACT(A1,"gi[|](\d+)[|]")

excel vlookup with multiple results

I am trying to use a vlookup or similar function to search a worksheet, match account numbers, then return a specified value. My problem is there are duplicate account numbers and I would like the result to concatenate the results into one string.
Acct No CropType
------- ---------
0001 Grain
0001 OilSeed
0001 Hay
0002 Grain
That is in the first worksheet. On the 2nd worksheet I have the Acct No with other information, and I need to get all the matching results into one column on the 2nd worksheet, i.e. "Grain Oilseed Hay".
Here is a function that will do it for you. It's a little different from Vlookup in that you will only give it the search column, not the whole range, then as the third parameter you will tell it how many columns to go left (negative numbers) or right (positive) in order to get your return value.
I also added the option to use a separator; in your case you will use " ". Here is the function call for you, assuming the Acct No values are in column A and the results are in column B:
=vlookupall("0001", A:A, 1, " ")
Here is the function:
Function VLookupAll(ByVal lookup_value As String, _
ByVal lookup_column As range, _
ByVal return_value_column As Long, _
Optional seperator As String = ", ") As String
Dim i As Long
Dim result As String
For i = 1 To lookup_column.Rows.count
If Len(lookup_column(i, 1).text) <> 0 Then
If lookup_column(i, 1).text = lookup_value Then
result = result & (lookup_column(i).offset(0, return_value_column).text & seperator)
End If
End If
Next
If Len(result) <> 0 Then
result = Left(result, Len(result) - Len(seperator))
End If
VLookupAll = result
End Function
Notes:
I made ", " the default separator for results if you don't enter one.
If there are one or more hits, I added some checking at the end to
make sure the string doesn't end with an extra separator.
I've used A:A as the range since I don't know your range, but
obviously it's faster if you enter the actual range.
One way to do this would be to use an array formula to populate all of the matches into a hidden column and then concatenate those values into your string for display:
=IFERROR(INDEX(cropTypeValues,SMALL(IF(accLookup=accNumValues,ROW(accNumValues)-MIN(ROW(accNumValues))+1,""),ROW(A1))),"")
cropTypeValues: Named range holding the list of your crop types.
accLookup: Named range holding the account number to lookup.
accNumValues: Named range holding the list of your account
numbers.
Enter as an array formula (Ctrl+Shift+Enter) and then copy down as far as necessary.
Let me know if you need any part of the formula explaining.
I've just had a similar problem and looked for solutions for a long time, but nothing really convinced me. Either you had to write a macro or some special function, whereas for my needs the easiest solution is to use a pivot table in e.g. Excel.
If you create a new pivot table from your data and first add "Acct No" as a row label and then add "CropType" as a row label as well, you will get a very nice grouping that lists all the crop types for each account. It won't do that in a single cell though.
Here is my code, which is even better than an Excel VLOOKUP because you can choose the criteria column and, of course, a separator (carriage return too)...
Function Lookup_concat(source As String, tableau As Range, separator As String, colSRC As Integer, colDST As Integer) As String
Dim i, y As Integer
Dim result As String
If separator = "CRLF" Then
separator = Chr(10)
End If
y = tableau.Rows.Count
result = ""
For i = 1 To y
If (tableau.Cells(i, colSRC) = source) Then
If result = "" Then
result = tableau.Cells(i, colDST)
Else
result = result & separator & tableau.Cells(i, colDST)
End If
End If
Next
Lookup_concat = result
End Function
And as a gift, you can also do a lookup on multiple elements in the same cell (split on the same separator). Really useful.
Function Concat_Lookup(source As String, tableau As Range, separator As String, colSRC As Integer, colDST As Integer) As String
Dim i, y As Integer
Dim result As String
Dim Splitted As Variant
If separator = "CRLF" Then
separator = Chr(10)
End If
Splitted = split(source, separator)
y = tableau.Rows.Count
result = ""
For i = 1 To y
For Each word In Splitted
If (tableau.Cells(i, colSRC) = word) Then
If result = "" Then
result = tableau.Cells(i, colDST)
Else
Dim Splitted1 As Variant
Splitted1 = split(result, separator)
If IsInArray(tableau.Cells(i, colDST), Splitted1) = False Then
result = result & separator & tableau.Cells(i, colDST)
End If
End If
End If
Next
Next
Concat_Lookup = result
End Function
The previous function needs this helper:
Function IsInArray(stringToBeFound As String, arr As Variant) As Boolean
IsInArray = (UBound(Filter(arr, stringToBeFound)) > -1)
End Function
Function VLookupAll(vValue, rngAll As Range, iCol As Integer, Optional sSep As String = ", ")
Dim rCell As Range
Dim rng As Range
On Error GoTo ErrHandler
Set rng = Intersect(rngAll, rngAll.Columns(1))
For Each rCell In rng
If rCell.Value = vValue Then
VLookupAll = VLookupAll & sSep & rCell.Offset(0, iCol - 1).Value
End If
Next rCell
If VLookupAll = "" Then
VLookupAll = CVErr(xlErrNA)
Else
VLookupAll = Right(VLookupAll, Len(VLookupAll) - Len(sSep))
End If
ErrHandler:
If Err.Number <> 0 Then VLookupAll = CVErr(xlErrValue)
End Function
Use like this:
=VLookupAll(K1, A1:C25, 3)
to look up all occurrences of the value of K1 in the range A1:A25 and to return the corresponding values from column C, separated by commas.
If you want to sum values, you can use SUMIF, for example
=SUMIF(A1:A25, K1, C1:C25)
to sum the values in C1:C25 where the corresponding values in column A equal the value of K1.
All the best.

Resources