Swapping delimited strings in an excel column - excel

I have a column in a very large excel spreadsheet that is in some cases incorrectly formatted. It should contain first a street address, then a name, separated by a hyphen, as shown:
123 Main St-Smith
However, some are formatted in reverse, such as:
Jones-231 High St
All the addresses start with a numeric and all the names start with an alpha. I am looking for a macro or code that would swap only the name and address where it is incorrectly formatted. I have tried turning it into a comma delimited to separate them out, but since they only occur intermittently I am still left with fixing them one by one manually.
Any suggestions? I am by no means an Excel macro expert. Thanks!

Split the string on the hyphen then look for spaces in the second element.
dim i as long, tmp as variant
with worksheets("sheet1")
for i = 2 to .cells(.rows.count, "a").end(xlup).row
tmp = split(.cells(i, "a").value2, "-")
if cbool(instr(1, tmp(1), " ")) then _
.cells(i, "a") = join(array(tmp(1), tmp(0)), "-")
next i
end with

As you wrote
Street name is any string that begins with a digit and ends with either a hyphen or the end of the string
Name is any string that starts with a non-digit and ends with either a hyphen or the end of the string
This can be done using just native VBA, (although at first I was going to use Regular Expressions)
split on the hyphen
rearrange depending on if first starts with a number or not
do some error checking in case no hyphen present or don't have the number and non-number start as specified.
Option Explicit
Function fmtAddressName2(S As String) As String
Dim sAddr As String, sName As String
Dim v As Variant
v = Split(S, "-")
On Error GoTo badFormat
If IsNumeric(Left(v(0), 1)) And Not IsNumeric(Left(v(1), 1)) Then
sAddr = v(0)
sName = v(1)
ElseIf Not IsNumeric(Left(v(0), 1)) And IsNumeric(Left(v(1), 1)) Then
sAddr = v(1)
sName = v(0)
Else
GoTo badFormat
End If
fmtAddressName2 = sAddr & "-" & sName
Exit Function
badFormat:
'return unchanged string
fmtAddressName2 = S
'or could return an error message
End Function

Related

Removing text with certain pattern in cell

I want to remove part of text in a cell with pattern such as [cid:image003.gif#01D863CC.CAE51sd0] & [https://xxxx=0].
It may appear several times in each cell randomly in different position.
I read some material (Remove Text Within Cell Starting with String and Ending with Character); but I have no clue how to handle the code with looping and line by line within a cell
I prepared 2 samples.
Sample A:
Hi xxx,
This is Ken
[cid:image003.gif#01D863CC.CAE51sd0]
[https://xxxx=0]
[cid:imagedddd0]
Expected:
Hi xxx,
This is Ken
Sample B:
[cid:image003.gif#01D863CC.CAE51sd0]
[https://xxxx=0]
Hi xxx,
This is Ken
[cid:imagedddd0]
Expected:
Hi xxx,
This is Ken
If I understand you correctly, you want to do something to any cell which has certain character, "[" and "]". What you want to do to that kind of cell, you want to remove all those "[", "]" and the value in between those two certain characters.
Example data in the active sheet:
The cell with that kind of data in yellow is scattered around to whatever cells in the active sheet.
if your data is similar with image above and the image below is your expected result after running the sub:
then the sub is something like this :
Sub test()
Dim c As Range
Dim pos1 As Long: Dim pos2 As Long
Do
Set c = ActiveSheet.UsedRange.Find("[", LookAt:=xlPart)
If Not c Is Nothing Then
Do
pos1 = InStr(c.Value, "["): If pos1 = 0 Then Exit Do
pos2 = InStr(c.Value, "]")
c.Replace What:=Mid(c.Value, pos1, pos2 - pos1 + 1), Replacement:="", LookAt:=xlPart
Loop
End If
Loop Until c Is Nothing
End Sub
There are two loops in the sub.
Loop-A is to find any cell of the active sheet which has "[" character and have it as c variable
This loop-A will stop when it doesn't find a cell which has "[" char.
Loop-B is to do something whenever there is "[" in the found cell.
This loop-B will stop if in the found cell there is no more "[" char.
What the sub do in this loop-B is to find the position of "[" as pos1 variable and find the position of "]" as pos2 variable. Then it replace the "[", the "]"
and whatever text in between those two char in the found cell (the c variable) value with nothing ("").
After seeing the sample data, I think it's better to do it in MS Words app. So I search the internet on how to do VBA in MS Words app. Not exactly sure if it's a correct syntax, but it seems the code below (MS Word VBA module) work as expected.
Sub test()
Dim pos1 As Long: Dim pos2 As Long
Dim txt As String: Dim slice As String: Dim rpl As String
Do
pos1 = InStr(ActiveDocument.Content, "[")
If pos1 = 0 Then Exit Do
pos2 = InStr(ActiveDocument.Content, "]")
txt = Mid(ActiveDocument.Content, pos1, pos2 - pos1 + 1)
If Len(txt) > 250 Then
slice = Left(txt, 250): rpl = "["
Else
slice = txt: rpl = ""
End If
With ActiveDocument.Content.Find
.Execute FindText:=slice, ReplaceWith:=rpl, _
Format:=True, Replace:=wdReplaceAll
End With
Loop
End Sub
The process of the sub is similar with the one in Excel app. The difference is, this sub check if the char is more then 250 in the text to be removed (the txt variable) then it will slice it the first 250 char into slice variable, and have "[" as the replacement into rpl variable.
If the txt variable is not more than 250 char, then the slice variable value is the same with the txt variable value, while the rpl variable value then it's just directly nothing ---> "".
WARNING:
In my computer, the sub takes almost 2 minutes to finish the job in MS Words app with the data coming from column A of your Excel sheet sample data.

vba remove comma without removing strikethough

How am I able to remove the comma without removing the strikethrough format
Example: C418, C419, C420 , C421, C422, C423, C424
Expected Result: C418 C419 C420 C421 C422 C423 C424
Final Result: C418, C419 C420 C421 C422 C423 C424
I am checking to see if that cell contain a strikethrough. By using the Function I am able to detect it. But once I try to remove the comma by using the replace function and replace comma with a blank. The format for the strikethrough will be remove causing the function not to work which will result in a different outcome.
I will like to use the space delimiter to match with the other cell so that I can split the cell value afterwards
If HasStrikethrough(BOMCk.Sheets("Filtered RO BOM").Range("B" & LCB)) = True Then
BOMCk.Sheets("Filtered RO BOM").Range("B" & LCB).Value = Replace(BOMCk.Sheets("Filtered RO BOM").Range("B" & LCB).Value, ",", "")
BOMCk.Sheets("Filtered RO BOM").Range("G" & LCB).Value = "strike-off"
ElseIf HasStrikethrough(BOMCk.Sheets("Filtered RO BOM").Range("B" & LCB)) = False Then
BOMCk.Sheets("Filtered RO BOM").Range("B" & LCB).Value = Replace(BOMCk.Sheets("Filtered RO BOM").Range("B" & LCB).Value, ",", "")
End If
Function HasStrikethrough(rng As Range) As Boolean
Dim i As Long
With rng(1)
For i = 1 To .Characters.Count
If .Characters(i, 1).Font.StrikeThrough Then
HasStrikethrough = True
Exit For
End If
Next i
End With
End Function
Range.Characters only works if the cells value is 255 characters or less.
Range.Characters(i, 1).Delete will delete the commas. Make sure to iterate from the last position to the first position when deleting.
Sub RemoveCommas(ByVal Target As Range)
If Target.Characters.Count > 255 Then
MsgBox "Range.Characters only works with String with 255 or less Characters", vbCritical, "String too long"
Exit Sub
End If
Dim n As Long
For n = Target.Characters.Count To 1 Step -1
If Target.Characters(n, 1).Text = "," Then Target.Characters(n, 1).Delete
Next
End Sub
Alternative via xlRangeValueXMLSpreadsheet Value
The ►.Value(11) approach solves the question by a very simple string replacement (though the xml string handling can reveal to be very complicated in many other cases):
Sub RemoveCommata(rng As Range, Optional colOffset As Long = 1)
'a) Get range data as xml spreadsheet value
Dim xmls As String: xmls = rng.Value(xlRangeValueXMLSpreadsheet) ' //alternatively: xmls = rng.Value(11)
'b) find start position of body
Dim pos As Long: pos = InStr(xmls, "<Worksheet ")
'c) define xml spreadsheet parts and remove commata in body
Dim head As String: head = Left(xmls, pos - 1)
Dim body As String: body = Replace(Mid(xmls, pos), ",", "")
'd) write cleaned range back
rng.Offset(0, colOffset).Value(11) = head & body
End Sub
Help reference links
Excel XlRangeValueDataType enumeration
Excel Range Value

Using Excel Proper Function with exception | Excel

Essentially I have multiple strings within my Excel Spreadsheet that are structured the following way:
JOHN-MD-HOPKINS
REC-PW-RESIN
I would like to use the proper function but exclude the part of the string that is within the dashes (-).
The end result should look like the following:
John-MD-Hopkins
Rec-PW-Resin
Is there an excel formula that is capable of doing this?
You may need to create your own VBA function to do this, that checks if there are two hyphens in the data, and if so converts the first and last words to proper case without touching the middle word, otherwise just converts the string to proper case.
Paste the following into a module within Excel:
Function fProperCase(strData As String) As String
Dim aData() As String
aData() = Split(strData, "-")
If UBound(aData) - LBound(aData) = 2 Then ' has two hyphens in the original data
fProperCase = StrConv(aData(LBound(aData)), vbProperCase) & "-" & aData(LBound(aData) + 1) & "-" & StrConv(aData(UBound(aData)), vbProperCase)
Else ' just do a normal string conversion to proper case
fProperCase = StrConv(strData, vbProperCase)
End If
End Function
Then, in your worksheet, you can use this just as you would any built-in formula, so if "JOHN-MD-HOPKINS" is in cell A1, you would use this as a formula in another cell:
=fProperCase(A1)
Which would display John-MD-Hopkins as required.
EDITED CODE
As the requirement is to leave the second word, then this modified VBA function, which "walks" the array should work instead:
Function fProperCase2(strData As String) As String
Dim aData() As String
Dim lngLoop1 As Long
aData() = Split(strData, "-")
For lngLoop1 = LBound(aData) To UBound(aData)
If (lngLoop1 = LBound(aData) + 1) And (lngLoop1 <> UBound(aData)) Then
aData(lngLoop1) = aData(lngLoop1)
Else
aData(lngLoop1) = StrConv(aData(lngLoop1), vbProperCase)
End If
Next lngLoop1
fProperCase2 = Join(aData, "-")
End Function
It basically looks to see if the array element being dealt with is the second (lngLoop1=LBound(aData)+1) and also not the last (lngLoop1<>UBound(aData)).
Regards,

InStr will not find dots in some cases

I have strings that consist of leading dots followed by a number (for example "..2" or "....4". I want to delete all leading dots and convert the string into a long variable.
So I have written a function that finds leading dots in strings and deletes them. For some reason, the function works for a string like "..2" but will not work for "...3". The InStr function will not find "." in "...3".
The strings are read out from a column in a worksheet. They are not formatted in any weird way, I have tried just typing them in manually in a new worksheet without any changes to the default formatting settings, same results.
So I have tried several things. I beleive there must be some error involving character encodings, I cannot figure out how to solve this problem though.
I have tried using a recursive function using InStr to delete the dots and then tried the split function with "." as the delimiter to test my assumption. Split has the same problem, works for "..2" but will not work for "...3".
When I debug print the strings that I read out, "...3" seems to be formatted differently than "..2" or ".1". I do not know why.
here you can see the difference in the formatting
Sub Gruppieren()
'read out strings first
'then try to delete the dots
Dim strArr() As String
Dim lngArr() As Long
Dim lLastRow As Long
Dim i As Long
lLastRow = getFirstEmptyRow("A", Tabelle1.Index)
ReDim strArr(1 To lLastRow)
ReDim lngArr(1 To lLastRow)
For i = 1 To UBound(strArr)
strArr(i) = Worksheets(1).Cells(i, 1).Value
Debug.Print strArr(i)
strArr(i) = clearLeadingDots(strArr(i))
'strArr(i) = splitMeIfYouCan(strArr(i))
If IsNumeric(strArr(i)) = True Then
lngArr(i) = CLng((strArr(i)))
Debug.Print lngArr(i)
End If
Next i
End Sub
'The functions:
Function clearLeadingDots(myText As String) As String
Dim i As Long
i = InStr(myText, ".")
If i <> 0 Then
myText = Right(CStr(myText), Len(myText) - i)
clearLeadingDots = clearLeadingDots(CStr(myText))
Else
clearLeadingDots = CStr(myText)
Exit Function
End If
End Function
Function splitMeIfYouCan(myText As String) As String
Dim myArr() As String
Dim i As Long
myArr = Split(myText, ".")
splitMeIfYouCan = myArr(UBound(myArr))
End Function
Edit: The answer was, that three dots were converted into an ellipsis automatically, searching for and eliminating Chr(133) did the job.

How to extract text within a string of text

I have a simple problem that I'm hoping to resolve without using VBA but if that's the only way it can be solved, so be it.
I have a file with multiple rows (all one column). Each row has data that looks something like this:
1 7.82E-13 >gi|297848936|ref|XP_00| 4-hydroxide gi|297338191|gb|23343|randomrandom
2 5.09E-09 >gi|168010496|ref|xp_00| 2-pyruvate
etc...
What I want is some way to extract the string of numbers that begin with "gi|" and end with a "|". For some rows this might mean as many as 5 gi numbers, for others it'll just be one.
What I would hope the output would look like would be something like:
297848936,297338191
168010496
etc...
Here is a very flexible VBA answer using the regex object. What the function does is extract every single sub-group match it finds (stuff inside the parenthesis), separated by whatever string you want (default is ", "). You can find info on regular expressions here: http://www.regular-expressions.info/
You would call it like this, assuming that first string is in A1:
=RegexExtract(A1,"gi[|](\d+)[|]")
Since this looks for all occurance of "gi|" followed by a series of numbers and then another "|", for the first line in your question, this would give you this result:
297848936, 297338191
Just run this down the column and you're all done!
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional separator As String = ", ") As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
Dim i As Long, j As Long
Dim result As String
RE.pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.count - 1
For j = 0 To allMatches.Item(i).submatches.count - 1
result = result & (separator & allMatches.Item(i).submatches.Item(j))
Next
Next
If Len(result) <> 0 Then
result = Right$(result, Len(result) - Len(separator))
End If
RegexExtract = result
End Function
Here it is (assuming data is in column A)
=VALUE(LEFT(RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2),
FIND("|",RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2)) -1 ))
Not the nicest formula, but it will work to extract the number.
I just noticed since you have two values per row with output separated by commas. You will need to check if there is a second match, third match etc. to make it work for multiple numbers per cell.
In reference to your exact sample (assuming 2 values maximum per cell) the following code will work:
=IF(ISNUMBER(FIND("gi|",$A1,FIND("gi|", $A1)+1)),CONCATENATE(LEFT(RIGHT($A1,LEN($A1)
- FIND("gi|",$A1) - 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ),
", ",LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1)
- 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1) - 2))
-1 )),LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2),
FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ))
How's that for ugly? A VBA solution may be better for you, but I'll leave this here for you.
To go up to 5 numbers, well, study the pattern and recurse manually in the formula. IT will get long!
I'd probably split the data first on the | delimiter using the convert text to columns wizard.
In Excel 2007 that is on the Data tab, Data Tools group and then choose Text to Columns. Specify Other: and | as the delimiter.
From the sample data you posted it looks like after you do this the numbers will all be in the same columns so you could then just delete the columns you don't want.
As the other guys presented the solution without VBA... I'll present the one that does use. Now, is your call to use it or no.
Just saw that #Issun presented the solution with regex, very nice! Either way, will present a 'modest' solution for the question, using only 'plain' VBA.
Option Explicit
Option Base 0
Sub findGi()
Dim oCell As Excel.Range
Set oCell = Sheets(1).Range("A1")
'Loops through every row until empty cell
While Not oCell.Value = ""
oCell.Offset(0, 1).Value2 = GetGi(oCell.Value)
Set oCell = oCell.Offset(1, 0)
Wend
End Sub
Private Function GetGi(ByVal sValue As String) As String
Dim sResult As String
Dim vArray As Variant
Dim vItem As Variant
Dim iCount As Integer
vArray = Split(sValue, "|")
iCount = 0
'Loops through the array...
For Each vItem In vArray
'Searches for the 'Gi' factor...
If vItem Like "*gi" And UBound(vArray) > iCount + 1 Then
'Concatenates the results...
sResult = sResult & vArray(iCount + 1) & ","
End If
iCount = iCount + 1
Next vItem
'And removes trail comma
If Len(sResult) > 0 Then
sResult = Left(sResult, Len(sResult) - 1)
End If
GetGi = sResult
End Function
open your excel in Google Sheets and use the regular expression with REGEXEXTRACT
Sample Usage
=REGEXEXTRACT("My favorite number is 241, but my friend's is 17", "\d+")
Tip: REGEXEXTRACT will return 241 in this example because it returns the first matching case.
In your case
=REGEXEXTRACT(A1,"gi[|](\d+)[|]")

Resources