I have strings that consist of leading dots followed by a number (for example "..2" or "....4"). I want to delete all leading dots and convert the string into a Long variable.
So I have written a function that finds leading dots in strings and deletes them. For some reason, the function works for a string like "..2" but will not work for "...3". The InStr function will not find "." in "...3".
The strings are read from a column in a worksheet. They are not formatted in any unusual way: I tried typing them manually into a new worksheet with the default formatting settings and got the same results.
I have tried several things. I believe there must be some error involving character encodings, but I cannot figure out how to solve it.
I have tried using a recursive function using InStr to delete the dots and then tried the split function with "." as the delimiter to test my assumption. Split has the same problem, works for "..2" but will not work for "...3".
When I debug print the strings that I read out, "...3" seems to be formatted differently than "..2" or ".1". I do not know why.
Sub Gruppieren()
'read out strings first
'then try to delete the dots
Dim strArr() As String
Dim lngArr() As Long
Dim lLastRow As Long
Dim i As Long
lLastRow = getFirstEmptyRow("A", Tabelle1.Index)
ReDim strArr(1 To lLastRow)
ReDim lngArr(1 To lLastRow)
For i = 1 To UBound(strArr)
strArr(i) = Worksheets(1).Cells(i, 1).Value
Debug.Print strArr(i)
strArr(i) = clearLeadingDots(strArr(i))
'strArr(i) = splitMeIfYouCan(strArr(i))
If IsNumeric(strArr(i)) = True Then
lngArr(i) = CLng((strArr(i)))
Debug.Print lngArr(i)
End If
Next i
End Sub
'The functions:
Function clearLeadingDots(myText As String) As String
Dim i As Long
i = InStr(myText, ".")
If i <> 0 Then
myText = Right(CStr(myText), Len(myText) - i)
clearLeadingDots = clearLeadingDots(CStr(myText))
Else
clearLeadingDots = CStr(myText)
Exit Function
End If
End Function
Function splitMeIfYouCan(myText As String) As String
Dim myArr() As String
Dim i As Long
myArr = Split(myText, ".")
splitMeIfYouCan = myArr(UBound(myArr))
End Function
Edit: The answer was that the three dots had been converted automatically into a single ellipsis character; searching for and removing Chr(133) did the job.
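A minimal sketch of that fix, for illustration only. The function name StripLeadingDots is hypothetical, and the use of ChrW(8230) (the Unicode ellipsis, U+2026) instead of Chr(133) is an assumption: Chr(133) depends on the system's ANSI code page, so the Unicode form may be the safer comparison. Unlike the InStr-based version above, this only removes characters at the start of the string.

```vba
' Hypothetical helper: removes leading "." and ellipsis characters only.
Function StripLeadingDots(ByVal s As String) As String
    Dim ellipsis As String
    Dim firstChar As String

    ' Assumption: the autocorrected character is U+2026 (horizontal ellipsis)
    ellipsis = ChrW(8230)

    Do While Len(s) > 0
        firstChar = Left$(s, 1)
        If firstChar = "." Or firstChar = ellipsis Then
            s = Mid$(s, 2)      ' drop the first character
        Else
            Exit Do             ' first character is not a dot: stop
        End If
    Loop

    StripLeadingDots = s
End Function
```

The result can then be checked with IsNumeric and converted with CLng as in the original loop.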
I am having problems running a VBA function on my Excel for Mac.
I want to process a series of strings to remove any duplicate characters in the strings. For example: column 1 shows the original strings while column 2 has removed any duplicate characters.
Original String | Duplicate Characters Removed
route | route
trout | trou
eater | eatr
brass | bras
seige | seig
smelt | smelt
I found some VBA code which purports to do this however it returns #VALUE! when I run it. Code is shown below:
Function RemoveDupes1(pWorkRng As Range) As String
'Updateby Extendoffice
Dim xValue As String
Dim xChar As String
Dim xOutValue As String
Set xDic = CreateObject("Scripting.Dictionary")
xValue = pWorkRng.Value
For i = 1 To VBA.Len(xValue)
xChar = VBA.Mid(xValue, i, 1)
If xDic.Exists(xChar) Then
Else
xDic(xChar) = ""
xOutValue = xOutValue & xChar
End If
Next
RemoveDupes1 = xOutValue
End Function
I call this function by entering =RemoveDupes1(A2) in cell B2 (where A2 holds the first string in my list), however I receive #VALUE! error.
I don't know if the problem is in the VBA code (others seem to have successfully used it, but perhaps not on a Mac) or the way I am applying it (I don't really know VBA but have successfully applied other snippets in the past). Any advice gratefully received. TIA.
As Ron wrote in the comments, there is no Scripting.Dictionary on a Mac: the Dictionary is part of the Microsoft Scripting Runtime (scrrun.dll), and DLLs (dynamic link libraries) don't exist on a Mac.
Using a Dictionary for duplicate checking is a good idea in general because it's very fast and easy to use. However, for simply checking whether a string contains a character, it's a little overkill.
The following function copies every character of the source string into the destination string if it is not already present; the check is done with InStr.
I have changed the parameter type to Variant and convert the content into a string using CStr, so you can call the function with nearly any data type (Range, String, even numbers or dates).
Function RemoveDuplicateChars(param As Variant) As String
Dim sourceString As String, i As Long
sourceString = CStr(param)
For i = 1 To Len(sourceString)
Dim xChar As String
xChar = Mid(sourceString, i, 1)
If InStr(RemoveDuplicateChars, xChar) = 0 Then
RemoveDuplicateChars = RemoveDuplicateChars & xChar
End If
Next
End Function
Update: Just for completeness, I added an optional second parameter so that the function can be used case-insensitively (if the string contains both the uppercase and the lowercase form of a letter, only the first one is copied).
Function RemoveDuplicateChars(param As Variant, Optional IgnoreCase As Boolean = False) As String
Dim sourceString As String, i As Long
sourceString = CStr(param)
For i = 1 To Len(sourceString)
Dim xChar As String
xChar = Mid(sourceString, i, 1)
Dim compareMethod As Long
compareMethod = IIf(IgnoreCase, vbTextCompare, vbBinaryCompare)
If InStr(1, RemoveDuplicateChars, xChar, compareMethod) = 0 Then
RemoveDuplicateChars = RemoveDuplicateChars & xChar
End If
Next
End Function
Essentially I have multiple strings within my Excel Spreadsheet that are structured the following way:
JOHN-MD-HOPKINS
REC-PW-RESIN
I would like to use the PROPER function but exclude the part of the string that is between the dashes (-).
The end result should look like the following:
John-MD-Hopkins
Rec-PW-Resin
Is there an excel formula that is capable of doing this?
You may need to create your own VBA function to do this, that checks if there are two hyphens in the data, and if so converts the first and last words to proper case without touching the middle word, otherwise just converts the string to proper case.
Paste the following into a module within Excel:
Function fProperCase(strData As String) As String
Dim aData() As String
aData() = Split(strData, "-")
If UBound(aData) - LBound(aData) = 2 Then ' has two hyphens in the original data
fProperCase = StrConv(aData(LBound(aData)), vbProperCase) & "-" & aData(LBound(aData) + 1) & "-" & StrConv(aData(UBound(aData)), vbProperCase)
Else ' just do a normal string conversion to proper case
fProperCase = StrConv(strData, vbProperCase)
End If
End Function
Then, in your worksheet, you can use this just as you would any built-in formula, so if "JOHN-MD-HOPKINS" is in cell A1, you would use this as a formula in another cell:
=fProperCase(A1)
Which would display John-MD-Hopkins as required.
EDITED CODE
As the requirement is to leave the second word untouched, this modified VBA function, which "walks" the array, should work instead:
Function fProperCase2(strData As String) As String
Dim aData() As String
Dim lngLoop1 As Long
aData() = Split(strData, "-")
For lngLoop1 = LBound(aData) To UBound(aData)
If (lngLoop1 = LBound(aData) + 1) And (lngLoop1 <> UBound(aData)) Then
aData(lngLoop1) = aData(lngLoop1)
Else
aData(lngLoop1) = StrConv(aData(lngLoop1), vbProperCase)
End If
Next lngLoop1
fProperCase2 = Join(aData, "-")
End Function
It basically looks to see if the array element being dealt with is the second (lngLoop1=LBound(aData)+1) and also not the last (lngLoop1<>UBound(aData)).
I have an Excel sheet that contains strings and numbers. All the strings I am searching for have an underscore ("_"), which is my delimiter. However, some strings have the delimiter more than once.
For example:
text_in_00
text_in_01
text_out_00
text_out_01
Other strings with just one delimiter work beautifully. But here, with two delimiters, "in" and "out" are not being differentiated, due to the delimiter only being found once. How do I find EACH delimiter in a given string?
My goal with this code is to differentiate between ranges and copy and paste these different ranges into their own individual worksheets. Also, I cannot hard-code any cells or strings, as the string names are subject to change, as well as the size of the ranges.
My code:
'Dim arr As Variant
Dim i As Long
Dim filterRange As Range
Dim delimiterItem As String 'was variant
Dim a As Range
delimiterItem = "_"
Set filterRange = FindAll(Worksheets(newSheetName).UsedRange)
For i = filterRange.Rows.Count To 2 Step -1
'arr = Split(Cells(i, 1), delimiterItem)
'For j = LBound(arr) To UBound(arr)
If Split(filterRange.Cells(i, 1).Text, delimiterItem)(0) <> Split(filterRange.Cells(i - 1, 1).Text, delimiterItem)(0) Then
Range(filterRange.Cells(i, 1).EntireRow, filterRange.Cells(i, 1).EntireRow).Insert
End If
'Next j
Next i
Note: FindAll is another function in my code that finds the values I need to be looking at. Some strings don't contain any underscores ("_"), and those are values I don't need. This function just filters out what I don't need and works great. I am focusing on the portion of code below the line: Set filterRange = FindAll(Worksheets(newSheetName).UsedRange)
Note: The commented out code was something I was trying, but gave the same result.
TL;DR: How do I check for each instance of the delimiter? Thank you in advance for the help.
Use the following function to get a count of how many times Char appears in your string, and then use a Select Case construct to do whatever is needed, based on the count.
Public Function CountChars(ByVal Source As String, ByVal Char As String) As Long
CountChars = Len(Source) - Len(Replace(Source, Char, vbNullString))
End Function
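As a rough illustration of how the count could drive a Select Case, the sketch below derives a grouping key from everything before the last delimiter, so that "text_in_00" and "text_out_00" end up in different groups. GroupKey is a hypothetical name, and grouping on the text before the last delimiter is an assumption about what the question needs; CountChars is repeated so the block stands alone.

```vba
Public Function CountChars(ByVal Source As String, ByVal Char As String) As Long
    CountChars = Len(Source) - Len(Replace(Source, Char, vbNullString))
End Function

' Hypothetical helper: returns the text before the LAST delimiter,
' or the whole string when no delimiter is present.
Public Function GroupKey(ByVal Source As String, _
                         Optional ByVal Delimiter As String = "_") As String
    Select Case CountChars(Source, Delimiter)
        Case 0
            ' no delimiter at all: the string is its own key
            GroupKey = Source
        Case Else
            ' one or more delimiters: cut at the last one
            GroupKey = Left$(Source, InStrRev(Source, Delimiter) - 1)
    End Select
End Function
```

Comparing GroupKey of row i with GroupKey of row i - 1 would then distinguish the "in" rows from the "out" rows.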
Make a function that returns the Nth index of a substring inside another:
Public Function NthIndexOf(ByVal needle As String, ByVal haystack As String, ByVal n As Long) As Long
Dim currentN As Long
Dim currentIndex As Long
Do
currentIndex = InStr(currentIndex + 1, haystack, needle, vbTextCompare)
currentN = currentN + 1
'stop at the nth match or when no further match exists; InStr returns 0 once the start position passes the end of the string
Loop Until currentN = n Or currentIndex = 0
NthIndexOf = currentIndex
End Function
Now you can get the NthIndexOf("_", "text_in_00", 2) and get 8. If you tried to get the 3rd index of "_", the output would be 0.
If you want the substring between each "delimiter", then you need to Split and then iterate the array. It's unclear what you intend to do with each substring though, but you should have all the tools you need to do whatever it is that you're doing now.
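A small sketch of the Split-and-iterate approach, under the assumption that the goal is to look at each piece between delimiters. TokenAt and DemoTokens are hypothetical names used only for illustration.

```vba
' Hypothetical helper: returns the zero-based token at Index,
' or an empty string when Index is out of range.
Public Function TokenAt(ByVal Source As String, ByVal Delimiter As String, _
                        ByVal Index As Long) As String
    Dim parts() As String
    parts = Split(Source, Delimiter)
    If Index >= LBound(parts) And Index <= UBound(parts) Then
        TokenAt = parts(Index)
    End If
End Function

' Walks every token of a sample string and prints it to the Immediate window.
Public Sub DemoTokens()
    Dim parts() As String
    Dim j As Long
    parts = Split("text_in_00", "_")
    For j = LBound(parts) To UBound(parts)
        Debug.Print j, parts(j)
    Next j
End Sub
```

With this, TokenAt(cellText, "_", 1) would give the "in" or "out" part of the sample strings.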
delimiterItem = "_"
Set filterRange = FindAll(Worksheets(newSheetName).UsedRange)
For i = filterRange.Rows.Count To 2 Step -1
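'compare everything before the last delimiter (e.g. "text_in" vs "text_out"); assumes every cell in filterRange contains the delimiter
If Left$(filterRange.Cells(i, 1).Text, InStrRev(filterRange.Cells(i, 1).Text, delimiterItem) - 1) <> Left$(filterRange.Cells(i - 1, 1).Text, InStrRev(filterRange.Cells(i - 1, 1).Text, delimiterItem) - 1) Then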
Range(filterRange.Cells(i, 1).EntireRow, filterRange.Cells(i, 1).EntireRow).Insert
End If
Next i
I have a column in a very large excel spreadsheet that is in some cases incorrectly formatted. It should contain first a street address, then a name, separated by a hyphen, as shown:
123 Main St-Smith
However, some are formatted in reverse, such as:
Jones-231 High St
All the addresses start with a digit and all the names start with a letter. I am looking for a macro or code that would swap the name and address only where they are incorrectly formatted. I have tried converting the column to comma-delimited text to separate them out, but since the errors only occur intermittently I am still left with fixing them one by one manually.
Any suggestions? I am by no means an Excel macro expert. Thanks!
Split the string on the hyphen then look for spaces in the second element.
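Dim i As Long, tmp As Variant
With Worksheets("sheet1")
For i = 2 To .Cells(.Rows.Count, "A").End(xlUp).Row
tmp = Split(.Cells(i, "A").Value2, "-")
'skip cells without a hyphen; swap when the second part contains a space (i.e. looks like an address)
If UBound(tmp) >= 1 Then
If InStr(1, tmp(1), " ") > 0 Then .Cells(i, "A").Value = Join(Array(tmp(1), tmp(0)), "-")
End If
Next i
End With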
As you wrote:
Street name is any string that begins with a digit and ends with either a hyphen or the end of the string.
Name is any string that starts with a non-digit and ends with either a hyphen or the end of the string.
This can be done using just native VBA (although at first I was going to use regular expressions):
split on the hyphen;
rearrange depending on whether the first part starts with a number or not;
do some error checking in case no hyphen is present or the parts don't start with a number and a non-number as specified.
Option Explicit
Function fmtAddressName2(S As String) As String
Dim sAddr As String, sName As String
Dim v As Variant
v = Split(S, "-")
On Error GoTo badFormat
If IsNumeric(Left(v(0), 1)) And Not IsNumeric(Left(v(1), 1)) Then
sAddr = v(0)
sName = v(1)
ElseIf Not IsNumeric(Left(v(0), 1)) And IsNumeric(Left(v(1), 1)) Then
sAddr = v(1)
sName = v(0)
Else
GoTo badFormat
End If
fmtAddressName2 = sAddr & "-" & sName
Exit Function
badFormat:
'return unchanged string
fmtAddressName2 = S
'or could return an error message
End Function
I have a simple problem that I'm hoping to resolve without using VBA but if that's the only way it can be solved, so be it.
I have a file with multiple rows (all one column). Each row has data that looks something like this:
1 7.82E-13 >gi|297848936|ref|XP_00| 4-hydroxide gi|297338191|gb|23343|randomrandom
2 5.09E-09 >gi|168010496|ref|xp_00| 2-pyruvate
etc...
What I want is some way to extract the string of numbers that begin with "gi|" and end with a "|". For some rows this might mean as many as 5 gi numbers, for others it'll just be one.
What I would hope the output would look like would be something like:
297848936,297338191
168010496
etc...
Here is a very flexible VBA answer using the regex object. The function extracts every sub-group match it finds (the parts of the pattern inside parentheses), separated by whatever string you want (the default is ", "). You can find info on regular expressions here: http://www.regular-expressions.info/
You would call it like this, assuming that first string is in A1:
=RegexExtract(A1,"gi[|](\d+)[|]")
Since this looks for all occurrences of "gi|" followed by a series of digits and then another "|", for the first line in your question it would give you this result:
297848936, 297338191
Just run this down the column and you're all done!
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional separator As String = ", ") As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
Dim i As Long, j As Long
Dim result As String
RE.pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.count - 1
For j = 0 To allMatches.Item(i).submatches.count - 1
result = result & (separator & allMatches.Item(i).submatches.Item(j))
Next
Next
If Len(result) <> 0 Then
result = Right$(result, Len(result) - Len(separator))
End If
RegexExtract = result
End Function
Here it is (assuming data is in column A)
=VALUE(LEFT(RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2),
FIND("|",RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2)) -1 ))
Not the nicest formula, but it will work to extract the number.
I just noticed that you have two values per row, with the output separated by commas. You will need to check whether there is a second match, third match, etc. to make it work for multiple numbers per cell.
In reference to your exact sample (assuming 2 values maximum per cell) the following code will work:
=IF(ISNUMBER(FIND("gi|",$A1,FIND("gi|", $A1)+1)),CONCATENATE(LEFT(RIGHT($A1,LEN($A1)
- FIND("gi|",$A1) - 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ),
", ",LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1)
- 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1) - 2))
-1 )),LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2),
FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ))
How's that for ugly? A VBA solution may be better for you, but I'll leave this here for you.
To go up to 5 numbers, study the pattern and repeat the nesting manually in the formula. It will get long!
I'd probably split the data first on the | delimiter using the convert text to columns wizard.
In Excel 2007 that is on the Data tab, Data Tools group and then choose Text to Columns. Specify Other: and | as the delimiter.
From the sample data you posted it looks like after you do this the numbers will all be in the same columns so you could then just delete the columns you don't want.
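For anyone who prefers to script that step, the same wizard is exposed in VBA as Range.TextToColumns. The sketch below is an illustration under assumptions not stated in the question: the data sits in column A of the first worksheet, and the columns to the right of A are free to be overwritten. SplitOnPipe is a hypothetical name.

```vba
' Hypothetical macro: splits column A on "|" into columns starting at B.
' Assumption: columns B onward are empty and may be overwritten.
Sub SplitOnPipe()
    With Worksheets(1)
        .Range("A1", .Cells(.Rows.Count, "A").End(xlUp)).TextToColumns _
            Destination:=.Range("B1"), _
            DataType:=xlDelimited, _
            Tab:=False, Semicolon:=False, Comma:=False, Space:=False, _
            Other:=True, OtherChar:="|"
    End With
End Sub
```

The other delimiter flags are set to False explicitly because Excel tends to remember the settings from the last Text to Columns run.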
As the others presented solutions without VBA, I'll present one that does use it. It's your call whether to use it or not.
Just saw that @Issun presented a solution with regex, very nice! Either way, I'll present a 'modest' solution to the question, using only 'plain' VBA.
Option Explicit
Option Base 0
Sub findGi()
Dim oCell As Excel.Range
Set oCell = Sheets(1).Range("A1")
'Loops through every row until empty cell
While Not oCell.Value = ""
oCell.Offset(0, 1).Value2 = GetGi(oCell.Value)
Set oCell = oCell.Offset(1, 0)
Wend
End Sub
Private Function GetGi(ByVal sValue As String) As String
Dim sResult As String
Dim vArray As Variant
Dim vItem As Variant
Dim iCount As Integer
vArray = Split(sValue, "|")
iCount = 0
'Loops through the array...
For Each vItem In vArray
'Searches for the 'Gi' factor...
If vItem Like "*gi" And UBound(vArray) > iCount + 1 Then
'Concatenates the results...
sResult = sResult & vArray(iCount + 1) & ","
End If
iCount = iCount + 1
Next vItem
'And removes trail comma
If Len(sResult) > 0 Then
sResult = Left(sResult, Len(sResult) - 1)
End If
GetGi = sResult
End Function
Open your Excel file in Google Sheets and use a regular expression with REGEXEXTRACT.
Sample Usage
=REGEXEXTRACT("My favorite number is 241, but my friend's is 17", "\d+")
Tip: REGEXEXTRACT will return 241 in this example because it returns the first matching case.
In your case
=REGEXEXTRACT(A1,"gi[|](\d+)[|]")