splits cells after capital combination

splits cells after capital combination - string

I have a file with 200 rows which have values like this:
ANTWERPEN 3 ABDIJ Abdijstraat 71-73 2020 9:00 18:00 9:00 18:00 9:00 18:00 9:00 18:00 9:00 19:00 9:00 19:00 which I want to have splitted into separate columns.
I want to have 1 column for the part which is in Capitals entirely. In this specific case, that would be:
ANTWERPEN 3 ABDIJ.
And another column for the part that comes after it, until the 4 numeric characters. In this case: Abdijstraat 71-73
I am happy the row values have this distinction to separate the addresses, but I do not know how to do this.
I have had a similar situation for splitting cells at the first numeric character:
text to columns: split at the first number in the value
But now I am looking for a two-fold solution to have in the first column the first part which is entirely in capitals, which represents the city and in the 2nd column I need to have the string which starts with a capital but is then followed by non-capital characters and ends before a 4 characters string of numeric characters.
I would be happy if I could create a vba or excel code/formula which could do this for me, but unfortunately, I can not :-(
So I hope someone can.
edit:
finding some other routines and modifying and testing it, helped me to create this:
Sub doitall()
Dim cell As Range, j As Integer, i As Integer, x As String
Dim str As String
Dim strlen As Integer
Dim k As Integer
Dim l As Integer
Dim y As Integer
' Dim v As Integer
'
'
' For j = 1 To Cells(Rows.Count, 1).End(xlUp).Row
' For Each cell In ActiveSheet.Range(Cells(1, 1), Cells(j, 1))
' For i = 1 To Len(cell)
' x = Mid(cell, i, 1)
' If x = ":" Then Exit For
' Next i
' cell.Offset(0, 1) = Left(cell, i - 8)
' Next cell
' Next j
'geparkeerd
' If l >= 65 And l <= 90 Then
' If v > 1 Then
' m = v - 1
' l = Asc(Mid(Cells(j, 2), m, 1))
' Else
' l = 0
' End If
For j = 1 To Cells(Rows.Count, 2).End(xlUp).Row
For Each cell In ActiveSheet.Range(Cells(1, 2), Cells(j, 2))
For v = 1 To Len(cell)
k = Asc(Mid(cell, v, 1))
If k >= 97 And k <= 122 Then
If v < 1 Then
Exit For
Else: m = v - 1
End If
l = Asc(Mid(cell, m, 1))
If l >= 65 And l <= 90 Then
y = Len(cell) - (v - 1)
cell.Offset(0, 1) = Mid(cell, m, y + 1)
Else
End If
End If
Next v
Next cell
Next j
End Sub
the first part finds the ":" in the cell value and uses all characters on the left from ":" minus 8 as the cell value for the cell in the column next to it.
The second part has to use this 'new' value to separate the city name from the street name. Fortunately, the street name always starts with a capital and is followed by a non-capital.
And fortunately, the city name is completely in capitals which makes it easier to split the value based on Capital followed by non capital.
I focus on the second part now.
what the second part does is check for each cell and each position in the cell if it is non-capital. If it is, it checks if the position before is capital. If it does, it have to use the all characters from the capital as a new value in the cell in the next column.
This works.
But not for this value:
BELLE- ILE "Belle-Ile" Shop 22 -Quai des Vennes 1
the result from that value is only Vennes 1.
but why? v loops from 1 to the length of the cell. But starts at 1 so position 1 is at the left of the cell value. From this routine, the result should actually be Belle-Ile" Shop 22 -Quai des Vennes 1.
Anyone have the explanation for this?
I will adjust it by hand now, but I am just curious to find out why it returns this values.
Solution: v has to check from len(cell) to 1 step -1. After I changed that, It works almost perfectly.
But I still do not understand why. How I read it, is that v starts testing at the last position works towards the first position of the cell value. Like this, in my opinion, the routine would not work I believe. But somehow it does. The key is understanding why v has to be len(cell) to 1 step -1 instead of 1 to len(cell).
I hope someone can explain this to me.
(I will also try the regex solution after I have got to learn something about it).

I am new to regex, but the following works with the input line given above. No doubt a more elegant solution exists, but this might get you going in the right direction. StackOverflow links I found useful in building the regex patterns:
How to match "anything up until this sequence of characters" in a regular expression?
Regex to match mixed case words
Regex to match only uppercase "words" with some exceptions
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
Function Part1(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
strPattern = ".+?(?=[A-Z][a-z]+)"
If strPattern <> "" Then
strInput = Myrange.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
Set matches = regEx.Execute(strInput)
For Each Match In matches
Part1 = Part1 & Match.Value
Next
Else
Part1 = "Not matched"
End If
End If
End Function
Function Part2(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
strPattern = ".+?(?=[A-Z][a-z]+)"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
Part2 = regEx.Replace(strInput, strReplace)
regEx.Pattern = ".+?(?=[0-9]{4})"
Set matches = regEx.Execute(Part2)
For Each Match In matches
Part2 = Match.Value
Next
Else
Part2 = "Not matched"
End If
End If
End Function

This is what I have and what satisfies my 'need':
Sub doitall()
Dim cell As Range, j As Integer, i As Integer, x As String
Dim str As String
Dim strlen As Integer
Dim k As Integer
Dim l As Integer
Dim y As Integer
Dim v As Integer
For j = 1 To Cells(Rows.Count, 1).End(xlUp).Row
For Each cell In ActiveSheet.Range(Cells(1, 1), Cells(j, 1))
For i = 1 To Len(cell)
x = Mid(cell, i, 1)
If x = ":" Then Exit For
Next i
cell.Offset(0, 1) = Left(cell, i - 8)
Next cell
Next j
For j = 1 To Cells(Rows.Count, 2).End(xlUp).Row
For Each cell In ActiveSheet.Range(Cells(1, 2), Cells(j, 2))
For v = Len(cell) To 1 Step -1
k = Asc(Mid(cell, v, 1))
If k >= 97 And k <= 122 Then
If v < 1 Then
Exit For
Else: m = v - 1
End If
l = Asc(Mid(cell, m, 1))
If l >= 65 And l <= 90 Then
y = Len(cell) - (v - 1)
cell.Offset(0, 1) = Mid(cell, m, y + 1)
cell.Offset(0, 2) = Left(cell, (m - 1))
Else
End If
End If
Next v
Next cell
Next j
End Sub
It works almost perfectly. except for some cells that have some other characters in the string which are not covered by this routine.
But I believe that could also be added (check op spaces, double quotes etc.)

Related

Highlight specific text in one cell based on another cell

I have few values in column I and column H, i have a code which highlights specific words in H column if those words are exactly present in I column.
Drawback is it highlights the works only if they are exactly ditto and are present together, Can any changes be made in the code and make highlight each word even if they are not together
attaching a image of what i want vs what i have, also attaching the existing code.
Dim c1 As Range, c2 As Range, md As Variant, i As Long, w1 As String, os As Long
Set c1 = Range("I2")
Set c2 = Range("H2")
md = Range(c1, Cells(Rows.Count, c1.Column).End(xlUp)).Value
For i = 1 To UBound(md)
If md(i, 1) <> "" Then
w1 = c2.Cells(i, 1).Value
os = InStr(1, w1, md(i, 1), vbTextCompare)
While os > 0
c2.Cells(i, 1).Characters(Start:=os, Length:=Len(md(i, 1))).Font.Color = vbBlue
os = InStr(os + 1, w1, md(i, 1), vbTextCompare)
Wend
End If
Next i
It would be a great help if someone solves my problem.

For pattern matching use a Regular Expression.
Option Explicit
Sub markup()
Dim regex As Object, m As Object, ar
Dim pattern As String, s As String
Dim Lastrow As Long, i As Long, k As Long, n As Long, p As Long
' Create regular expression.
Set regex = CreateObject("VBScript.RegExp")
With regex
.IgnoreCase = True
.Global = True
End With
'update sheet
With ActiveSheet
Lastrow = .Cells(.Rows.Count, "I").End(xlUp).Row
For i = 2 To Lastrow
pattern = Replace(.Cells(i, "I"), ",", "|")
If Len(pattern) > 0 Then
regex.pattern = pattern
s = .Cells(i, "H")
If regex.test(s) Then
' markup matches
Set m = regex.Execute(s)
For k = 0 To m.Count - 1
p = m(k).firstindex + 1
n = Len(m(k))
With .Cells(i, "H").Characters(Start:=p, Length:=n)
.Font.Color = vbBlue
.Font.Bold = True
End With
Next
End If
End If
Next
End With
End Sub

Making VBA udf compatible with Array Formula

I found the following great udf for fuzzy match a string but it doesnt work with Array formula, I am very basic in VBA and cant make it work (from reading different post it may have something to do with adding Lbound somewhere but cant figure it out).
Could I get some help ?
what I would like to do is something like
{=searchChars("yellow",if(list_of_product="productA",list_of_colors))}
.
'Name function and arguments
Function SearchChars(lookup_value As String, tbl_array As Variant) As String
'Declare variables and types
Dim i As Integer, str As String, Value As String
Dim a As Integer, b As Integer, cell As Variant
'Iterste through each cell
For Each cell In tbl_array
'Save cell value to variable
str = cell
'Iterate through characters
For i = 1 To Len(lookup_value)
'Same character?
If InStr(cell, Mid(lookup_value, i, 1)) > 0 Then
'Add 1 to number in array
a = a + 1
'Remove evaluated character from cell and contine with remaning characters
cell = Mid(cell, 1, InStr(cell, Mid(lookup_value, i, 1)) - 1) & Mid(cell, InStr(cell, Mid(lookup_value, i, 1)) + 1, 9999)
End If
'Next character
Next i
a = a - Len(cell)
'Save value if there are more matching characters than before
If a > b Then
b = a
Value = str
End If
a = 0
Next cell
'Return value with the most matching characters
SearchChars = Value
End Function
Option Explicit

Working OK for me - does not need to be entered as an array formula:
A few "improvements":
Function SearchChars(lookup_value As String, tbl_array As Variant) As String
Dim i As Long, str As String, Value As String, c As String
Dim a As Long, b As Long, cell As Variant
For Each cell In tbl_array
If Len(cell) > 0 Then 'skip empty values
str = cell
a = 0
For i = 1 To Len(lookup_value)
c = Mid(lookup_value, i, 1) '<< do this once
If InStr(cell, c) > 0 Then
a = a + 1
cell = Replace(cell, c, "", Count:=1) '<< simpler
If Len(cell) = 0 Then Exit For '<< nothing left...
End If
Next i
a = a - Len(cell)
'Debug.Print str, a
If a > b Then
b = a
Value = str
End If
End If
Next cell
SearchChars = Value
End Function

How to extract date time from an excel cell?

I have an excel file with cells containing some comments like:
txxxxx:10/15/2019 11:38:48 AM - Customer ID: xxxxx
) 1st contact - Text only sent to Mob TN xxxxxxw/Ref &TN
Txxxxxx:10/18/2019 1:34:12 PM -
Called BEST CBR xxxxxx, Spoke to Mr, he said they have been busy & unable to review/discuss yet. Offered to text him our direct info, they will check online &/or call us soon.
An SMS message has been successfully sent to Gull Family at xxxxxx
OK WITH FINAL CB next week.
The text could be anything potentially containing multiple date time fields.
I am trying to extract all such date occurrences from each cell and put them in different columns
I tried using =regExFind and =regExExtract. For instance:
=RegExFind($Cell,"\d{2}/\d{2}/\d{4}")
I also tried =Text($cell, dd/mm/yyyy)
However, neither approach is working.
Is there a way in excel to do a RegEx Extract and if so how to achieve that?
If not, what is the best way to extract all datetime fields?
UPDATE:
This is the code I have used:
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional separator As String = ", ") As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
Dim i As Long, j As Long
Dim result As String
RE.pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.count - 1
For j = 0 To allMatches.Item(i).submatches.count - 1
result = result & (separator & allMatches.Item(i).submatches.Item(j))
Next
Next
If Len(result) <> 0 Then
result = Right$(result, Len(result) - Len(separator))
End If
RegexExtract = result
End Function
But no Output.

As your post is not very explicit, my answer can not be more too.
Try this one:
Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")
With regex
.Pattern = "([0-9]+)/([0-9]+)/([0-9]+)"
.Global = True
End With
str = "Whatever string you have"
Set matches = regex.Execute(str)
For Each match In matches
Debug.Print match.Value
Next match
So you will need to loop it in your range. str should be your cell on the loop and instead of Debug.Print you should bring this value to whatever cell like Worksheet("?").Cells(i,j).Value = match.Value.
Hope it helps

A bit of a workaround...instead of using RegEx the idea is to find the "AM"s and "PM"s in the cell and copy paste the string of 19/20 characters before them in the column "Date Extraction". One of the limitations of this method is clearly that could work only if AM and PM are always present in the date format of your messages.
Sub ExtractDates()
Dim myRange, cell As Range
Dim StringInCell, MyDate As String
Dim DateExtrColNum, i, j As Integer
Set myRange = Worksheets("YourSheetName").UsedRange
DateExtrColNum = myRange.Columns(myRange.Columns.Count).Column
Cells(1, DateExtrColNum + 1).Value = "Date Extraction"
j = 2
For Each cell In myRange
If Not cell.Find("AM") Is Nothing Or Not cell.Find("PM") Is Nothing Then
StringInCell = cell.Value
i = 1
Do While InStr(i, StringInCell, "AM") <> 0 Or InStr(i, StringInCell, "PM") <> 0
If InStr(i, StringInCell, "AM") <> 0 Then
MyDate = Mid(StringInCell, InStr(i, StringInCell, "AM") - 20, 20)
If InStr(1, MyDate, ":") = 1 Then
MyDate = Right(MyDate, 19)
End If
i = InStr(i, StringInCell, "AM") + 1
Else: MyDate = Mid(StringInCell, InStr(i, StringInCell, "PM") - 20, 20)
If InStr(1, MyDate, ":") = 1 Then
MyDate = Right(MyDate, 19)
End If
i = InStr(i, StringInCell, "PM") + 1
End If
Cells(j, DateExtrColNum + 1).Value = MyDate
j = j + 1
Loop
End If
Next
End Sub

Extracting numbers from a string in each cell

I'm trying to write code that extracts X consecutive numbers from text.
For example, if I want to extract 5 consecutive numbers in my text:
Cell A1: dsuad28d2hr 22222222 11111 d33d11103
Cell B2: 11111 (wanted)
I could make it work for texts with only 5 numbers but the problem is if my text contains other consecutive numbers higher than 5.
Sub ExtractNum2()
Dim Caract() As String
Dim i As Integer
Dim j As Integer
Dim z As Integer
Dim cont As Integer
Dim goal As Integer
Dim Protocolo() As String
Dim cel As String
Dim lin As Long
lin = Range("A1", Range("A1").End(xlDown)).Rows.Count 'Repeat for each line
For z = 1 To lin
cel = Cells(z, 1)
ReDim Caract(Len(cel))
ReDim Protocolo(Len(cel))
cont = 0
For i = 1 To Len(cel)
Caract(i) = Left(Mid(cel, i), 1)
If IsNumeric(Caract(i)) Then 'Character check
cont = cont + 1
Protocolo(cont) = Caract(i)
'If Not IsNumeric(Caract(6)) And cont = 5 Then**
If cont = 5 '
Dim msg As String
For j = 1 To 5
msg = msg & Protocolo(j)
Next j
Cells(z, 2) = msg 'fills column B
msg = ""
End If
Else
cont = 0
End If
Next i
Next z 'end repeat
End Sub
I'm trying to use:
If Not IsNumeric(Caract(6)) And cont = 5 Then
But it is not working, my output is: B2: 22222 but I want 11111.
What am I missing?
EDIT
Sorry i wasnt clear. I want to extract X numbers with 6>x>4 (x=5). I dont want 22222 since it has 8 consecutive numbers and 11111 has 5 in my example.

UDF:
Function GetNum(cell)
With CreateObject("VBScript.RegExp")
.Pattern = "\b(\d{5})\b"
With .Execute(cell)
If .Count > 0 Then GetNum = .Item(0).SubMatches(0)
End With
End With
End Function
UPDATE:
If you want to return error (say, #N/A) instead of callee's default data type, you could write the following:
Function GetNum(cell)
With CreateObject("VBScript.RegExp")
.Pattern = "\b(\d{5})\b"
With .Execute(cell)
If .Count > 0 Then
GetNum = .Item(0).SubMatches(0)
Else
GetNum = CVErr(xlErrNA)
End If
End With
End With
End Function

I tried this with a Cell containing "Yjuj 525211111x5333332s5" to test whether 2 consecutive 5 characters get catch, and it worked great.
Sub Macro_Find_Five()
Dim str As String
Dim tmp As String
Dim cntr As Integer
Dim result As String
str = Sheet1.Cells(1, 1).Value
tmp = ""
cntr = 1
col = 2
result = ""
'For Loop for tracing each charater
For i = 1 To Len(str)
'Ignore first starting character
If i > 1 Then
'If the last character matches current character then
'enter the if condition
If tmp = Mid(str, i, 1) Then
'concatenate current character to a result variable
result = result + Mid(str, i, 1)
'increment the counter
cntr = cntr + 1
Else
'if the previous character does not match
'reset the cntr to 1
cntr = 1
'as well initialize the result string to "" (blank)
result = ""
End If
End If
'if cntr matches 5 i.e. 5 characters traced enter if condition
If cntr = 5 Then
'adding to next column the result found 5 characters same
Sheet1.Cells(1, col).Value = result
'increment the col (so next time it saves in next column)
col = col + 1
'initializing the variables for new search
cntr = 1
tmp = ""
result = ""
End If
'stores the last character
tmp = Mid(str, i, 1)
'if first character match concatenate.
If cntr = 1 Then
result = result + Mid(str, i, 1)
End If
Next i
End Sub

Matching two titles by words and calculate %

I was trying to automate an Excel file which has title in both A and B columns and I have to search each word from A within B and calculate the % by using the "no of words matched/total no of words (in column A)" formula.
I'm using the below code, however its not giving me the accurate % for which the title has repeated words (Duplicate words).
Sub percentage()
Dim a() As String, b() As String
Dim aRng As Range, cel As Range
Dim i As Integer, t As Integer
Set aRng = Range(Range("A1"), Range("A5").End(xlDown))
For Each cel In aRng
a = Split(Trim(cel), " ")
b = Split(Trim(cel.Offset(, 1)), " ")
d = 0
c = UBound(a) + 1
If cel.Value <> "" Then
If InStr(cel, cel.Offset(, 1)) Then
d = UBound(b) + 1
Else
For i = LBound(a) To UBound(a)
For t = LBound(b) To UBound(b)
If UCase(a(i)) = UCase(b(t)) Then
d = d + 1
End If
Next
Next
End If
End If
cel.Offset(0, 2).Value = (d / c)
Next
End Sub
If Title 1 : Really Nice pack with Nice print and Title 2 : Nice Print Nice pack then result should be 3/6 i.e. 67%.
But I'm getting a result as 100%.
Can anyone help me out please.
Titles are
Great job dud
Really Nice pack with Nice print
To give success and success process
Don’t eat too much. If you eat too much you will get sick
I have tried =noDuplicate(celladdress)

First, you should delete duplicate word in column B.
My function delete word and return array of word that not duplicate.
Function noDuplicate(ByVal str As String) As String()
Dim splitStr() As String
Dim result() As String
Dim i As Integer
Dim j As Integer
Dim k As Integer
Dim addFlag As Boolean
splitStr = Split(UCase(str), " ")
ReDim result(UBound(splitStr))
'
result(0) = splitStr(0)
k = 0
For i = 1 To UBound(splitStr)
addFlag = True
For j = 0 To k
If splitStr(i) = result(j) Then
addFlag = False
Exit For
End If
Next j
If addFlag Then
result(k + 1) = splitStr(i)
k = k + 1
End If
Next i
ReDim Preserve result(k)
noDuplicate = result
End Function
Then calculate the percentage of number of match word and number of word in column A.
Function percentMatch(ByVal colA As String, ByVal colB As String) As Double
Dim splitColA() As String
Dim splitColB() As String
Dim i As Integer
Dim j As Integer
Dim matchCount As Integer
splitColA = Split(UCase(colA), " ")
splitColB = noDuplicate(colB)
matchCount = 0
For i = 0 To UBound(splitColA)
For j = 0 To UBound(splitColB)
If splitColA(i) = splitColB(j) Then
matchCount = matchCount + 1
Exit For
End If
Next j
Next i
percentMatch = matchCount / (UBound(splitColA) + 1)
End Function
After add these two function, you can write your new code to below
Sub percentage()
Dim aRng As Range, cel As Range
Set aRng = Range(Range("A1"), Range("A5").End(xlDown))
For Each cel In aRng
cel.Offset(0, 2).Value = percentMatch(cel.Value, cel.Offset(0, 1).Value)
Next
End Sub
Note, I not protect for empty string in the function.

If you F8 through the code, you can see the problem.
The first Nice in column A loops through column B and counts 2 occurences.
Pack in column A loops through column B and counts 1 occurence.
The second Nice in column A loops through column B and counts 2 occurences.
Print in column A loops through column B and counts 1 occurence.
So you get a count of 6 against the 6 words in column A; 100%
If you add a random word to column A, you'll get 6 out of 7.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

splits cells after capital combination - string

Related

Highlight specific text in one cell based on another cell

Making VBA udf compatible with Array Formula

How to extract date time from an excel cell?

Extracting numbers from a string in each cell

Matching two titles by words and calculate %

Categories

Resources