Extract value from URL and set it as variable - excel

I want to double-click a cell in Excel to open a URL.
I've been using VBA for this, but I am facing an issue.
I want to extract a value from the URL and use it as a variable in VBA.
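Here is part of the script (shown inside the sheet's BeforeDoubleClick handler, where Target is the clicked cell):
Private Sub Worksheet_BeforeDoubleClick(ByVal Target As Range, Cancel As Boolean)
    Dim ID As String, rptUrl As String
    ID = ActiveSheet.Range("S" & Target.Row).Value
    rptUrl = "http://...=" & ID
    If ID <> "" Then
        ThisWorkbook.FollowHyperlink rptUrl
    End If
End Sub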
This works when the ID is at the end of the URL.
What if the ID needs to go somewhere in the middle of the URL instead of at the end?
For example:
rptUrl = "http://..ID..="
I tried the following, without success:
rptUrl = "http://.. + ID + ..="
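The attempt above fails because "+ ID +" sits inside the quotes, so VBA treats it as literal text. A minimal sketch of the usual fix, assuming a hypothetical URL (http://example.com/report?...) because the real one is elided: close the string literal, join the variable with &, then reopen the literal. A template with a placeholder passed to Replace works as well. Both function names are hypothetical.

```vba
' Hypothetical helpers; host, path and parameter names are placeholders.
Public Function BuildReportUrl(ByVal ID As String) As String
    ' Close the literal, concatenate the variable with &, reopen the literal
    BuildReportUrl = "http://example.com/report?id=" & ID & "&format=html"
End Function

' Alternative: keep the whole URL as a template and substitute the placeholder
Public Function BuildReportUrlFromTemplate(ByVal ID As String) As String
    Const TEMPLATE As String = "http://example.com/report?id={ID}&format=html"
    BuildReportUrlFromTemplate = Replace(TEMPLATE, "{ID}", ID)
End Function
```

In the double-click handler this would be used as ThisWorkbook.FollowHyperlink BuildReportUrl(ID). If an ID can contain spaces or &, it would need URL-encoding before being inserted.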

If you want to use a regular expression, here's an option that packages the regular expression into a function you can call. If the URL contains an id= query parameter (matched case-insensitively), the function returns its value; otherwise it returns an empty string.
Function GetId(ByVal sInput As String) As String
    Dim oReg As Object
    Dim sOutput As String
    sOutput = ""
    Set oReg = CreateObject("VBScript.Regexp")
    With oReg
        .Global = False
        .IgnoreCase = True
        .MultiLine = False
        ' "id=" must follow "?" or "&"; the value runs until the next "&", "#" or the end
        ' of the string (inside [...] a "$" is a literal character, not an end anchor)
        .Pattern = "[?&]id=([^&#]+)"
    End With
    If oReg.Test(sInput) Then
        sOutput = oReg.Execute(sInput)(0).SubMatches(0)
    End If
    GetId = sOutput
End Function
Sub Test()
Debug.Print GetId("mysrv.com/form.jsp?id=12345&cn=0")
End Sub
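GetId is hardwired to the id parameter. A hypothetical generalization, GetQueryParam (not part of the answer above), returns any named query parameter without a regex by splitting the query string on & and =. It compares names case-insensitively like GetId, ignores any #fragment, and does not percent-decode the value.

```vba
' Hypothetical helper: value of query parameter `name` in `url`, or "" if absent.
Public Function GetQueryParam(ByVal url As String, ByVal name As String) As String
    Dim q As Long, query As String, pairs() As String, i As Long, eq As Long
    q = InStr(url, "?")
    If q = 0 Then Exit Function               ' no query string at all
    query = Mid$(url, q + 1)
    ' Drop a trailing #fragment, if any
    If InStr(query, "#") > 0 Then query = Left$(query, InStr(query, "#") - 1)
    pairs = Split(query, "&")
    For i = LBound(pairs) To UBound(pairs)
        eq = InStr(pairs(i), "=")
        If eq > 0 Then
            ' Case-insensitive name comparison
            If StrComp(Left$(pairs(i), eq - 1), name, vbTextCompare) = 0 Then
                GetQueryParam = Mid$(pairs(i), eq + 1)
                Exit Function
            End If
        End If
    Next i
End Function
```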

Related

Using Excel VBA to load a website that is incompatible with IE11

To load a website into a sheet with Excel VBA, I have been using the following:
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://www.wsj.com/market-data/bonds/treasuries"
And then I can copy and paste it into my Excel sheet.
But this website no longer works with IE11, and Excel VBA insists on using IE11 even though it is about to be deprecated.
Is there another way? I have also looked at:
Selenium: but it seems to be pretty much obsolete for VBA (not updated since 2016) and I couldn’t get it to work with Edge or Firefox in VBA anyway.
AutoIt: I got it to write the website’s HTML to a TXT file ($oHTTP = ObjCreate("winhttp.winhttprequest.5.1") ; $oHTTP.Open("GET", $URL1, False) ; $oHTTP.Send(); $oReceived = $oHTTP.ResponseText; FileWrite($file, $oReceived)), but the file contents are far from convenient because they are full of HTML markup. Sorting through that mess would take a fair amount of VBA code, which probably means it won’t be reliable going forward. Also, my workbook is very slow, so copying the website data into a sheet element by element would literally take several minutes.
Surely there must be an easy way to load the site, or just the table within it, into an Excel sheet? This must be a well-trodden path, but after much googling I can’t find an easy solution that actually works.
I have 5-10 web pages being loaded into this workbook, and keeping the whole thing working feels like a full-time job! Any thoughts or help would be very much appreciated!
Similar idea to Christopher's answer in using regex. I grab the instruments data (a JS array), split out the component dictionaries (minus the closing }), and then use regex, keyed on the headers, to grab the appropriate values.
I use a dictionary to map input headers to output headers, and set a couple of request headers to signal a browser-based request and to avoid being served cached results.
Ideally, one would use an HTML parser to grab the script tag, then a JSON parser on the JavaScript object within it.
If you want the data from the other tabbed results, I can add that by explicitly setting re.Global = True and then looping over the returned matches. It depends on whether you want those and how you want them to appear in the sheet(s).
I currently write results out to a sheet called Treasury Notes & Bonds.
Option Explicit
Public Sub GetTradeData()
Dim s As String, http As MSXML2.XMLHTTP60 'required reference Microsoft XML v6,
Set http = New MSXML2.XMLHTTP60
With http
.Open "GET", "https://www.wsj.com/market-data/bonds/treasuries", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
s = .responseText
End With
Dim re As VBScript_RegExp_55.RegExp 'required reference Microsoft VBScript Regular Expressions
Set re = New VBScript_RegExp_55.RegExp
re.Pattern = "instruments"":\[(.*?)\]"
s = re.Execute(s)(0).SubMatches(0)
Dim headers() As Variant, r As Long, c As Long, mappingDict As Scripting.Dictionary 'required reference Microsoft Scripting Runtime
Set mappingDict = New Scripting.Dictionary
mappingDict.Add "maturityDate", "MATURITY"
mappingDict.Add "coupon", "COUPON"
mappingDict.Add "bid", "BID"
mappingDict.Add "ask", "ASKED"
mappingDict.Add "change", "CHG"
mappingDict.Add "askYield", "ASKED YIELD"
headers = mappingDict.keys
Dim results() As String, output() As Variant, key As Variant
results = Split(s, "}")
ReDim output(1 To UBound(results), 1 To UBound(headers) + 1)
For r = LBound(results) To UBound(results) - 1
c = 1
For Each key In mappingDict.keys
re.Pattern = "" & key & """:""(.*?)"""
output(r + 1, c) = re.Execute(results(r))(0).SubMatches(0)
c = c + 1
Next
Next
re.Pattern = "timestamp"":""(.*?)"""
re.Global = True
With ThisWorkbook.Worksheets("Treasury Notes & Bonds")
.UsedRange.ClearContents
Dim matches As VBScript_RegExp_55.MatchCollection
Set matches = re.Execute(http.responseText)
.Cells(1, 1) = matches(matches.Count - 1).SubMatches(0)
.Cells(2, 1).Resize(1, mappingDict.Count) = mappingDict.Items 'output headers (MATURITY, COUPON, ...) rather than the JSON keys
.Cells(3, 1).Resize(UBound(output, 1), UBound(output, 2)) = output
End With
End Sub
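The per-key pattern used inside the loop of GetTradeData can be isolated into a small function and tried offline on a literal fragment, without any request to the site. This is a sketch with a hypothetical name (GetJsonStringField); it adds a leading quote to the pattern so that a key such as "ask" cannot match the tail of a longer key, and, like the original pattern, it only handles quoted string values without escaped quotes, and assumes keys contain no regex metacharacters.

```vba
' Hypothetical helper: returns the string value of "key":"value" in a JSON fragment,
' or "" if the key is absent. Not a JSON parser: no escaped quotes, no numeric values.
Public Function GetJsonStringField(ByVal fragment As String, ByVal key As String) As String
    Dim re As Object, m As Object
    Set re = CreateObject("VBScript.RegExp")
    ' Builds the pattern "key":"(.*?)" ; the lazy group stops at the next quote
    re.Pattern = """" & key & """:""(.*?)"""
    Set m = re.Execute(fragment)
    If m.Count > 0 Then GetJsonStringField = m(0).SubMatches(0)
End Function
```

With a made-up fragment such as {"ask":"99.5","askYield":"4.8"}, asking for "ask" returns 99.5 and asking for "askYield" returns 4.8.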
The following code (not using web drivers) works, but it isn't an easy solution. The data is embedded in the page body as a JavaScript object (window.__STATE__); I isolated it with a regex and then parsed it as JSON.
Sub GetTreasuryData()
    ' Requires references to Microsoft XML, v6.0 and Microsoft VBScript Regular Expressions 5.5,
    ' plus the JsonConverter module from VBA-JSON and the REGEX function below
    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim strPattern As String: strPattern = "window.__STATE__ = ({.+}}}});"
    Dim JSON As Object
    Dim Key As Variant
    Dim key1 As String, key2 As String
    XMLPage.Open "GET", "https://www.wsj.com/market-data/bonds/treasuries", False
    XMLPage.send
    ' Pull the JSON object out of the page script, then parse it
    Set JSON = JsonConverter.ParseJson(REGEX(XMLPage.responseText, strPattern, "$1"))
    ' Notes and Bonds
    key1 = "mdc_treasury_{""treasury"":""NOTES_AND_BONDS""}"
    For Each Key In JSON("data")(key1)("data")("data")("instruments")
        Debug.Print Key("maturityDate")
        Debug.Print Key("ask")
        Debug.Print Key("askYield")
        Debug.Print Key("bid")
        Debug.Print Key("change")
    Next Key
    ' Bills
    key2 = "mdc_treasury_{""treasury"":""BILLS""}"
    For Each Key In JSON("data")(key2)("data")("data")("instruments")
        Debug.Print Key("maturityDate")
        Debug.Print Key("ask")
        Debug.Print Key("askYield")
        Debug.Print Key("bid")
        Debug.Print Key("change")
    Next Key
End Sub
The following function will need to be copied into a module:
Function REGEX(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
Dim replaceNumber As Integer
With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
With outputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\$(\d+)"
End With
With outReplaceRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Set inputMatches = inputRegexObj.Execute(strInput)
If inputMatches.Count = 0 Then
REGEX = False
Else
Set replaceMatches = outputRegexObj.Execute(outputPattern)
For Each replaceMatch In replaceMatches
replaceNumber = replaceMatch.SubMatches(0)
outReplaceRegexObj.Pattern = "\$" & replaceNumber
If replaceNumber = 0 Then
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).value)
Else
If replaceNumber > inputMatches(0).SubMatches.Count Then
'REGEX = "Too high a $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
REGEX = CVErr(xlErrValue)
Exit Function
Else
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
End If
End If
Next
REGEX = outputPattern
End If
End Function
The following resources will help:
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
https://github.com/VBA-tools/VBA-JSON
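You will need to import the JsonConverter module from VBA-JSON (which itself needs a reference to Microsoft Scripting Runtime) and add references to Microsoft VBScript Regular Expressions 5.5 and Microsoft XML, v6.0. The REGEX function was found elsewhere on Stack Overflow, so credit goes to its original author.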

Check if Column List Contains Header via Regex - Excel vba

I'm trying to determine via VBA whether a column has a header. The data in the column follows an unknown but consistent pattern, so my plan is to test whether A2 matches the same kind of regex pattern as A1. A2 would likely even be the same ID + 1. For example:
A1 = X001
A2 = X002
Func IsHeader("A") = False
A1 = ID's
A2 = X001
Func IsHeader("A") = True
I have an idea to reuse an existing script I made that generates a regex pattern from an alphanumeric input string, but I'm interested to see what other ideas people might have for solving this. I realize there isn't much code yet, but I know I can do this and I'm working on it now. If you're not interested in answering, that's OK!
Update: I posted an answer, but I'm looking for more than a code review, since there is a separate Stack Exchange site for that. I'd like to know better ways to achieve the goal with a different approach.
This is what I've got! I'm not sure how SO feels about code reviews, but I'm interested in what people think and how else they could "skin the cat", so please feel free to post an answer.
Sub Test()
    If IsHeader Then
        MsgBox "Has Header"
    Else
        MsgBox "No Header"
    End If
End Sub
Public Function IsHeader() As Boolean
    Dim A1Pattern As String, A2Pattern As String
    A1Pattern = RegExPattern(Range("A1").Value)
    A2Pattern = RegExPattern(Range("A2").Value)
    ' A1 has a header when it does not follow the same pattern as the data in A2
    IsHeader = (A1Pattern <> A2Pattern)
End Function
Public Function RegExPattern(my_string) As String
RegExPattern = ""
'''Special Character Section'''
Dim special_charArr() As String
Dim special_char As String
special_char = "!,#,$,%,^,&,*,+,/,\,;,:"
special_charArr() = Split(special_char, ",")
'''Special Character Section'''
'''Alpha Section'''
Dim regexp As Object
Set regexp = CreateObject("vbscript.regexp")
Dim strPattern As String
strPattern = "([a-z])"
With regexp
.ignoreCase = True
.Pattern = strPattern
End With
'''Alpha Section'''
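Dim buff() As String
'my_string = "test1*1#"
If Len(my_string) = 0 Then Exit Function 'empty cell: avoid ReDim buff(-1), which raises an error
ReDim buff(Len(my_string) - 1)
Dim i As Long, char As String, Key As Variant, special As Long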
For i = 1 To Len(my_string)
buff(i - 1) = Mid$(my_string, i, 1)
char = buff(i - 1)
If IsNumeric(char) = True Then
'MsgBox char & " = Number"
RegExPattern = RegExPattern & "([0-9])"
End If
For Each Key In special_charArr
special = InStr(char, Key)
If special = 1 Then
If Key <> "*" Then
'MsgBox char & " = Special NOT *"
RegExPattern = RegExPattern & "^[!##$%^&()].*$"
Else
'MsgBox char & " = *"
RegExPattern = RegExPattern & "."
End If
End If
Next
If regexp.Test(char) Then
'MsgBox char & " = Alpha"
RegExPattern = RegExPattern & "([a-z])"
End If
Next
'RegExPattern = Chr(34) & RegExPattern & Chr(34)
'MsgBox RegExPattern
End Function
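Since the question asks for other attack vectors: one alternative, sketched here with hypothetical names (ValueShape, HasHeader) and not taken from the answer above, skips generating a regex and reduces each value to its "shape". Every digit becomes 9, every ASCII letter becomes A, and any other character is kept. Like the question, it assumes that comparing rows 1 and 2 is enough to decide.

```vba
' Hypothetical helper: digit -> "9", ASCII letter -> "A", anything else kept as-is.
Public Function ValueShape(ByVal s As String) As String
    Dim i As Long, ch As String, out As String
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If ch Like "[0-9]" Then
            out = out & "9"
        ElseIf ch Like "[A-Za-z]" Then
            out = out & "A"
        Else
            out = out & ch
        End If
    Next i
    ValueShape = out
End Function

' Hypothetical: row 1 is treated as a header when its shape differs from row 2's.
Public Function HasHeader(ByVal row1Value As Variant, ByVal row2Value As Variant) As Boolean
    HasHeader = (ValueShape(CStr(row1Value)) <> ValueShape(CStr(row2Value)))
End Function
```

In a sheet this would be called as HasHeader(Range("A1").Value, Range("A2").Value). X001 and X002 both reduce to A999, so no header is reported, while ID's reduces to AA'A and is reported as a header.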

VBA Scrape Date Widget from Search Results

When searching for a particular event, e.g. "oscars 2018 date", Google shows a widget with the date of the event before any search results. I need to get this date into Excel, but it is proving difficult to code. I have been tinkering with the functions below but am not getting any results. The div I am interested in is:
<div class="Z0LcW">5 March 2018, 1:00 am GMT</div>
Here is the full code I am trying to use:
Option Explicit
Public Sub Example()
Call GoogleSearchDescription("oscars 2018 date")
End Sub
Public Function GoogleSearchDescription(ByVal SearchTerm As String) As String
Dim Query As String: Query = "https://www.google.com/search?q=" & URLEncode(SearchTerm)
Dim HTML As String: HTML = GetHTML(Query)
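Dim Description() As String
' Lazy ".*?" instead of [\w\s.<>/]+, which cannot match the "," and ":" in a date such as "5 March 2018, 1:00 am GMT"
Description = RegExer(HTML, "(<div class=""Z0LcW"">.*?<\/div>)")
Description(0) = FilterHTML(Description(0))
Debug.Print Description(0)
GoogleSearchDescription = Description(0)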
End Function
Public Function GetHTML(ByVal URL As String) As String
On Error Resume Next
Dim HTML As Object
With CreateObject("InternetExplorer.Application")
.navigate URL
Do Until .ReadyState = 4: DoEvents: Loop
Do While .Busy: DoEvents: Loop
Set HTML = .Document.Body
GetHTML = HTML.innerHTML
.Quit
End With
Set HTML = Nothing
End Function
Private Function URLEncode(ByVal UnformattedString As String) As String
'CAUTION: This function URLEncodes strings to match Google Maps API URL specifications, see note below for details
'Note: We convert spaces to + signs, and skip converting plus signs to anything because they replace spaces
'We also skip ampersands [&] as they should not be parsed out of a valid query
Dim Index As Long, ReservedChar As String, ReservedChars As String: ReservedChars = "!#$'()*/:;=?#[]""-.<>\^_`{|}~"
'Convert all % symbols first (hex 25), as the unformatted string should not already contain URL-encoded characters
UnformattedString = Replace(UnformattedString, "%", "%25")
'Convert spaces to plus signs to match Google URI query specifications
UnformattedString = Replace(UnformattedString, " ", "+")
'Iterate through all the reserved characters; percent-encoding uses two hex digits (e.g. "!" -> "%21")
For Index = 1 To Len(ReservedChars)
    ReservedChar = Mid$(ReservedChars, Index, 1)
    UnformattedString = Replace(UnformattedString, ReservedChar, "%" & Right$("0" & Hex$(Asc(ReservedChar)), 2))
Next Index
'Return URL encoded string
URLEncode = UnformattedString
End Function
Private Function FilterHTML(ByVal RawHTML As String) As String
If Len(RawHTML) = 0 Then Exit Function
Dim HTMLEntities As Variant, HTMLReplacements As Variant, Counter As Long
Const REG_HTMLTAGS = "(<[\w\s""':.=-]*>|<\/[\w\s""':.=-]*>)" 'Used to remove HTML formating from each step in the queried directions
HTMLEntities = Array("&nbsp;", "&lt;", "&gt;", "&quot;", "&apos;", "&amp;") '&amp; last so "&amp;lt;" is not decoded twice
HTMLReplacements = Array(" ", "<", ">", """", "'", "&")
'Parse HTML Entities into plaintext
For Counter = 0 To UBound(HTMLEntities)
RawHTML = Replace(RawHTML, HTMLEntities(Counter), HTMLReplacements(Counter))
Next Counter
'Remove any stray HTML tags
Dim TargetTags() As String: TargetTags = RegExer(RawHTML, REG_HTMLTAGS)
'Preemptively remove new line characters with actual new lines to separate any conjoined lines.
RawHTML = Replace(RawHTML, "<b>", " ")
For Counter = 0 To UBound(TargetTags)
RawHTML = Replace(RawHTML, TargetTags(Counter), "")
Next Counter
FilterHTML = RawHTML
End Function
Public Function RegExer(ByVal RawData As String, ByVal RegExPattern As String) As String()
'Outputs an array of strings for each matching expression
Dim RegEx As Object: Set RegEx = CreateObject("VBScript.RegExp")
Dim Matches As Object
Dim Match As Variant
Dim Output() As String
Dim OutputUBound As Integer
Dim Counter As Long
With RegEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = RegExPattern
End With
If RegEx.test(RawData) Then
Set Matches = RegEx.Execute(RawData)
For Each Match In Matches
OutputUBound = OutputUBound + 1
Next Match
ReDim Output(OutputUBound - 1) As String
For Each Match In Matches
Output(Counter) = Matches(Counter)
Counter = Counter + 1
Next Match
RegExer = Output
Else
ReDim Output(0) As String
RegExer = Output
End If
End Function
You can use Excel's import-data-from-web feature with this query:
https://www.google.com/search?q=oscars+2018+date&oq=oscars+2018
Then select the whole page and import it. For me, the date was in row 27.
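For the extraction step alone, once the page HTML is in a string (for example from GetHTML in the question), a regex is not strictly needed. A minimal sketch with a hypothetical helper name, GetDivText, using InStr and Mid$; it assumes the class attribute appears exactly as class="Z0LcW" and that the div contains no nested tags.

```vba
' Hypothetical helper: text inside the first <div class="className">...</div>,
' or "" when that exact opening tag (or its closing tag) is not found.
Public Function GetDivText(ByVal html As String, ByVal className As String) As String
    Dim openTag As String, startPos As Long, endPos As Long
    openTag = "<div class=""" & className & """>"
    startPos = InStr(1, html, openTag, vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(openTag)            ' first character after the opening tag
    endPos = InStr(startPos, html, "</div>", vbTextCompare)
    If endPos = 0 Then Exit Function
    GetDivText = Mid$(html, startPos, endPos - startPos)
End Function
```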

VBA StrComp never returns 0

I have a problem using the StrComp Function in VBA to compare two Strings.
Public Function upStrEQ(ByVal ps1 As String, ByVal ps2 As String) As Boolean
upStrEQ = False
If StrComp(ps1, ps2, vbTextCompare) = 0 Then
upStrEQ = True
End If
If Len(ps1) = Len(ps2) Then
Debug.Print ps1 & vbNewLine & ps2 & vbNewLine & upStrEQ
End If
End Function
Debug output:
Technischer Name
Technischer Name
False
As you can see, the two strings have the same length and look identical, but upStrEQ is False, meaning StrComp did not return 0.
Any help would be nice. Thanks.
Update:
Since one of the strings passed to the function is read from a cell, I made a sample workbook so you can reproduce my error: https://www.dropbox.com/s/6yh6d4h8zxz533a/strcompareTest.xlsm?dl=0
StrComp() works fine. The problem is with your input: you probably have a hidden space or a line break in it.
Test your code like this:
Public Function upStrEQ(ByVal ps1 As String, ByVal ps2 As String) As Boolean
If StrComp(ps1, ps2, vbTextCompare) = 0 Then
upStrEQ = True
End If
If Len(ps1) = Len(ps2) Then
Debug.Print ps1 & vbNewLine & ps2 & vbNewLine & upStrEQ
End If
End Function
Public Sub TestMe()
Debug.Print upStrEQ("a", "a")
End Sub
Furthermore, a Boolean function returns False by default, so you do not need to set it at the beginning.
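Because the two strings print identically and have the same length, a character has most likely been replaced by a look-alike (for example a non-breaking space instead of an ordinary one). A hypothetical diagnostic helper, CharCodes (not part of the original answer), lists the code of every character so the difference becomes visible.

```vba
' Hypothetical diagnostic: space-separated character codes (AscW) of s,
' so an invisible difference such as 160 vs. 32 shows up when printed.
Public Function CharCodes(ByVal s As String) As String
    Dim i As Long, out As String
    For i = 1 To Len(s)
        If i > 1 Then out = out & " "
        out = out & CStr(AscW(Mid$(s, i, 1)))
    Next i
    CharCodes = out
End Function
```

Inside upStrEQ, Debug.Print CharCodes(ps1) followed by Debug.Print CharCodes(ps2) shows which position differs.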
To reduce your input to letters and digits only, you can use a custom RegEx function. Something like this always returns only letters and numbers (note that it also removes ordinary spaces and accented letters such as ä):
Public Function removeInvisibleThings(ByVal s As String) As String
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    With regEx
        .Pattern = "[^a-zA-Z0-9]"   ' anything that is not an ASCII letter or digit
        .Global = True              ' remove every occurrence, not just the first
        ' Replace returns s unchanged when nothing matches
        removeInvisibleThings = .Replace(s, vbNullString)
    End With
End Function
Public Sub TestRemoveInvisibleThings()
Debug.Print removeInvisibleThings("aa1 Abc 67 ( *^ 45 ")
Debug.Print removeInvisibleThings("aa1 ???!")
Debug.Print removeInvisibleThings(" aa1 Abc 1267 ( *^ 45 ")
End Sub
In your code, apply it to ps1 and ps2 before passing them to upStrEQ.
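If the spaces and umlauts in a value such as "Technischer Name" should survive the cleanup, a gentler option is to normalise only the whitespace before comparing. A minimal sketch with a hypothetical name, NormalizeForCompare; it assumes the hidden character is a non-breaking space, tab or line break.

```vba
' Hypothetical helper: maps non-breaking spaces, tabs and line breaks to ordinary
' spaces, collapses runs of spaces and trims, keeping letters such as ä or ß intact.
Public Function NormalizeForCompare(ByVal s As String) As String
    s = Replace(s, ChrW(160), " ")   ' non-breaking space
    s = Replace(s, vbCr, " ")
    s = Replace(s, vbLf, " ")
    s = Replace(s, vbTab, " ")
    Do While InStr(s, "  ") > 0      ' collapse double spaces
        s = Replace(s, "  ", " ")
    Loop
    NormalizeForCompare = Trim$(s)
End Function
```

Usage: upStrEQ(NormalizeForCompare(ps1), NormalizeForCompare(ps2)).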

Retrieve alpha characters from alphanumeric string

How can I split up AB2468123 with Excel VBA?
I tried something along these lines:
myStr = "AB2468123"
split(myStr, "1" OR "2" OR "3"......."9")
I only want to get the letters.
Thanks.
How about this to retrieve only letters from an input string:
Function GetLettersOnly(str As String) As String
Dim i As Long, letters As String, letter As String
letters = vbNullString
For i = 1 To Len(str)
letter = VBA.Mid$(str, i, 1)
If Asc(LCase(letter)) >= 97 And Asc(LCase(letter)) <= 122 Then
letters = letters + letter
End If
Next
GetLettersOnly = letters
End Function
Sub Test()
    Debug.Print GetLettersOnly("abc123")     ' prints "abc"
    Debug.Print GetLettersOnly("ABC123")     ' prints "ABC"
    Debug.Print GetLettersOnly("123")        ' prints an empty line
    Debug.Print GetLettersOnly("abc123def")  ' prints "abcdef"
End Sub
Edit: for completeness (and Chris Neilsen) here is the Regex way:
Function GetLettersOnly(str As String) As String
    Dim result As String, objRegEx As Object, m As Object
    Set objRegEx = CreateObject("vbscript.regexp")
    objRegEx.Pattern = "[a-zA-Z]+"
    objRegEx.Global = True
    objRegEx.IgnoreCase = True
    ' Join every run of letters, so "abc123def" gives "abcdef" rather than just "abc"
    For Each m In objRegEx.Execute(str)
        result = result & m.Value
    Next m
    GetLettersOnly = result
End Function
Sub test()
    Debug.Print GetLettersOnly("abc123")     ' prints "abc"
    Debug.Print GetLettersOnly("abc123def")  ' prints "abcdef"
End Sub
A simpler single-shot RegExp:
Sub TestIt()
MsgBox CleanStr("AB2468123")
End Sub
Function CleanStr(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "[^a-zA-Z]+"
.Global = True
CleanStr = .Replace(strIn, vbNullString)
End With
End Function
This is what I have found works best. It may be somewhat basic, but it does the job :)
Function Split_String(Optional test As String = "ABC111111") As Variant
    Dim i As Long, letter As String, justletters As String, justnumbers As String
    justletters = test 'default when the string contains no digits
    For i = 1 To Len(test)
        letter = Mid(test, i, 1)
        If IsNumeric(letter) Then
            justletters = Left(test, i - 1)
            justnumbers = Right(test, Len(test) - (i - 1))
            Exit For
        End If
    Next
    'Return the part you want; swap which line is commented to return the numbers instead :)
    'Split_String = justnumbers
    Split_String = justletters
End Function
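Possibly the fastest way is to parse a byte string:
Function alpha(txt As String) As String
    Dim i As Long, bytes() As Byte
    bytes = txt 'VBA strings are UTF-16: two bytes per character, low byte first
    For i = 0 To UBound(bytes) - 1 Step 2
        'Only look at characters whose high byte is 0, so e.g. U+0141 is not read as "A"
        If bytes(i + 1) = 0 Then
            If Chr$(bytes(i)) Like "[A-Za-z]" Then alpha = alpha & Chr$(bytes(i))
        End If
    Next i
End Function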
More information here.
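Closer to the Split idea in the question: Split accepts only a single delimiter string, so combining delimiters with OR does not work. A minimal non-regex sketch with a hypothetical name, RemoveDigits, deletes each digit 0-9 with one Replace call per digit. Unlike the letters-only functions above, it keeps punctuation and spaces.

```vba
' Hypothetical helper: removes the digits 0-9, keeping every other character.
Public Function RemoveDigits(ByVal s As String) As String
    Dim d As Long
    For d = 0 To 9
        s = Replace(s, CStr(d), vbNullString)   ' one pass per digit
    Next d
    RemoveDigits = s
End Function
```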

Resources