Excel macro to search a website with excel data and extract specific results and then loop for next value for another website - excel

I have replicated the code from Excel macro to search a website with excel data and extract specific results and then loop for next value, but I get an error on the line URL_Get_SKU_Query1 = entityRange.Offset(0, 1).Value2 stating "Object variable or With block variable not set".
So I am just trying to replicate the code for another website.
This code pulls in a certain text and spits out a value from the website.
So I would like to enter an MFR SKU in Sheet1 as such:
Name // SKU // Price
WaterSaverFaucet // SS902BC
Then, after creating a macro button on Sheet2 and clicking it, have it spit out the price.
So that it ends up like this below:
Name // SKU // Price
WaterSaverFaucet // SS902BC // 979.08
I would need this in order to look up multiple items on a website.
Sub LoopThroughBusinesses1()
    Dim i As Integer
    Dim SKU As String
    For i = 2 To Sheet1.UsedRange.Rows.Count
        SKU = Sheet1.Cells(i, 2)
        Sheet1.Cells(i, 3) = URL_Get_SKU_Query1(SKU)
    Next i
End Sub
Function URL_Get_SKU_Query1(strSearch As String) As String ' Changed from a Sub to a Function that returns the desired string
    ' strSearch = Range("a1") ' This is now passed as a parameter into the Function
    Dim entityRange As Range
    With Sheet2.QueryTables.Add( _
        Connection:="URL;https://www.neobits.com/SearchBySKU.aspx?SearchText=" & strSearch & "&safe=active", _
        Destination:=Sheet2.Range("A1")) ' Change this destination to Sheet2
        .BackgroundQuery = True
        .TablesOnlyFromHTML = True
        .Refresh BackgroundQuery:=False
        .SaveData = True
    End With
    ' Find the Range that has "Price"
    Set entityRange = Sheet2.UsedRange.Find("Price")
    ' Then return the value of the cell to its right
    URL_Get_SKU_Query1 = entityRange.Offset(0, 1).Value2
    ' Clear Sheet2 for the next run
    Sheet2.UsedRange.Delete
End Function

Your logic is flawed, unfortunately. You cannot simply take the mechanism from one webpage and assume it will work for the next. In this case the approach you are trying will not work: when you enter a SKU into the search box, what actually happens is a page redirect (302), not the construction of a URL as you have tried. The error you see arises primarily because you hit a page-not-found; it then surfaces because your element is not found on the 404 page.
Instead, you can use the URL construct the page in question actually uses for the initial search, and then use XMLHTTP, which will follow the redirect, as follows:
VBA:
Option Explicit

Public Sub GetPrices()
    Dim xhr As XMLHTTP60, html As HTMLDocument, ws As Worksheet, i As Long
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set xhr = New XMLHTTP60
    Set html = New HTMLDocument
    Dim allData()
    allData = ws.UsedRange.Value
    With xhr
        For i = 2 To UBound(allData, 1)
            .Open "GET", "https://www.neobits.com/search?keywords=" & allData(i, 2), False
            .send
            Dim price As Object
            html.body.innerHTML = .responseText
            Set price = html.querySelector("#main_price")
            If Not price Is Nothing Then
                allData(i, 3) = price.innerText
            Else
                allData(i, 3) = "No price found"
            End If
            Set price = Nothing
        Next
    End With
    ws.Cells(1, 1).Resize(UBound(allData, 1), UBound(allData, 2)) = allData
End Sub
I assume your page set-up in Sheet1 matches the layout shown in the question (Name, SKU, Price in columns A to C, with data starting in row 2).
Required project references:
The two required references are Microsoft XML, v6.0 and Microsoft HTML Object Library. Press Alt+F11 to open the VBE, then go to Tools > References and add them. You may have a different version number for the XML library, in which case the reference will need changing, as will the code references of
Dim xhr As XMLHTTP60
and
New XMLHTTP60
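If the XML library version on your machine differs, another option is a late-bound HTTP object, which avoids the version-specific reference altogether (a sketch, not part of the original answer; the Microsoft HTML Object Library reference is still needed for HTMLDocument and querySelector):
Dim xhr As Object
Set xhr = CreateObject("MSXML2.XMLHTTP") ' late-bound; replaces Dim xhr As XMLHTTP60 / Set xhr = New XMLHTTP60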
To run this code:
Press Alt+F11 to open the VBE > Right click in project explorer > Add standard module. Paste code into that standard module > Select anywhere inside the code and press F5, or hit the green Run arrow in the ribbon.
You could develop this further, for example to handle non-200 status codes:
Option Explicit

Public Sub GetPrices()
    Dim xhr As XMLHTTP60, html As HTMLDocument, ws As Worksheet, i As Long
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set xhr = New XMLHTTP60
    Set html = New HTMLDocument
    Dim allData(), price As Object
    allData = ws.UsedRange.Value
    With xhr
        For i = 2 To UBound(allData, 1)
            .Open "GET", "https://www.neobits.com/search?keywords=" & allData(i, 2), False
            .send
            If .Status <> 200 Then
                allData(i, 3) = "Status not succeeded" '<== Little bit loose but you get the idea.
            Else
                html.body.innerHTML = .responseText
                Set price = html.querySelector("#main_price")
                If Not price Is Nothing Then
                    allData(i, 3) = price.innerText
                Else
                    allData(i, 3) = "No price found"
                End If
                Set price = Nothing
            End If
        Next
    End With
    ws.Cells(1, 1).Resize(UBound(allData, 1), UBound(allData, 2)) = allData
End Sub

' Find the Range that has "Entity Type:"
Set entityRange = Sheet2.UsedRange.Find("Lists At:")
' Then return the value of the cell to its right
URL_Get_SKU_Query1 = entityRange.Offset(0, 1).Value2
The problem is that Range.Find may not find what you're looking for, for various reasons. Always specify the optional parameters to that function, since it otherwise "conveniently remembers" the values from the last time it was invoked, whether from other VBA code or through the Excel UI (in other words, there's no way to be 100% sure what values it will run with if you don't specify them). But even then, if Range.Find doesn't find what it's looking for, it returns Nothing, and you can't just assume that will never happen!
But, reading closer...
' Find the Range that has "Entity Type:"
Set entityRange = Sheet2.UsedRange.Find("Lists At:")
Someone's lying. Read the comment. Now read the code. Who's telling the truth? Don't write comments that say "what" - have comments say "why", and let the code say "what". Otherwise you have situations like that, where it's impossible to tell whether the comment is outdated or the code isn't right, at least not without looking at the worksheet.
In any case, you need to make sure entityRange isn't Nothing before you try to make a member call against it:
If Not entityRange Is Nothing Then
    URL_Get_SKU_Query1 = entityRange.Offset(0, 1).Value2
End If
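Putting both points together, here is a sketch of a fully specified Find call followed by the Nothing check (the search text is the one from the code above; the other argument values are assumptions made explicit so Find cannot reuse stale settings):
Dim entityRange As Range
Set entityRange = Sheet2.UsedRange.Find(What:="Lists At:", _
    LookIn:=xlValues, LookAt:=xlPart, SearchOrder:=xlByRows, _
    SearchDirection:=xlNext, MatchCase:=False)
If Not entityRange Is Nothing Then
    URL_Get_SKU_Query1 = entityRange.Offset(0, 1).Value2
End If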

Related

I have questions about how to click and search on web using vba

I have questions about how to click and search on the web using VBA.
I have written the code, but cannot find how to click the button on this web page.
Sub LEISearch()
    'dimension (declare or set aside memory for) our variables
    Dim objIE As InternetExplorer 'special object variable representing the IE browser
    Dim LEI As HTMLLinkElement    'special object variable for an <a> (link) element
    Dim y As Integer              'integer variable we'll use as a counter
    Dim result As String          'string variable that will hold our result link
    Dim result2 As String
    'initiate a new instance of Internet Explorer and assign it to objIE
    Set objIE = New InternetExplorer
    'make IE browser visible (False would allow IE to run in the background)
    objIE.Visible = True
    'navigate IE to this web page (a pretty neat search engine really)
    objIE.navigate "https://www.gmeiutility.org/search.jsp?keyWord"
    'wait here a few seconds while the browser is busy
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    'in the search box put cell "A2" value, the word "in" and cell "C1" value
    objIE.document.getElementById("searchInput").Value = _
        Sheets("Macro1").Range("A1").Value
    'click the 'go' button
    Set LEIButton = objIE.document.getElementsByClassName("hiddenSubmitButton")
    LEIButton.Focus
    LEIButton.Click
    'wait again for the browser
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
End Sub
This case is a really nice and clean example of web scraping, so I will take this opportunity to present an educational post.
I highly recommend avoiding using IE to scrape websites whenever it's possible. It's highly inefficient. Especially in a case like this where there can be multiple pages of results. Instead, you can use HTTP requests.
An HTTP request is a structured way to request something from a server. In this case we want to send a keyword to the server and get the corresponding search results.
To find out what this request should look like, you have to inspect the network traffic when the button with the magnifying glass is clicked. You can do that through your browser's developer tools (Ctrl+Shift+E if you're using Firefox):
If you go through the Headers and the Params of the request you will see what the url, the body and the headers should look like. In this particular case, all the parameters are encoded into the url and the headers are not essential to the success of the request, so all you need is the url.
Some of the parameters of the url are the keyword, the number of results per page and the page number.
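For example, taking the parameter names from the url used in the code below, a request for the keyword test with 25 results per page, page 1, would be built roughly like this:
Dim url As String
url = "https://www.gmeiutility.org/actions/Search/?isPendingValidationChecked=true&isSearchAllLOUChecked=true" & _
      "&keyWord=test&page=1&resultsPerPage=25&searchType=baseSearch"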
The response's payload is in JSON format. You can inspect its structure using an online JSON viewer. Here's how it looks:
Basically the JSON response consists of as many results as you have specified should be displayed per page (or fewer). To get the next page you need to send a new request with the same keyword but a new page number, and so on.
In fact, as you can see, the website offers a lot more data than what's displayed in your browser, which could prove useful.
The code below searches for the keyword test while requesting 25 results per page. A first request is sent to find out how many pages of results there are, and then the code loops through all the pages and prints the results in a worksheet.
TL;DR
Option Explicit

Sub main()
    Dim sht As Worksheet
    Dim totalNumberOfPages As Long
    Dim searchResults As Object
    Dim pageNumber As Long
    Dim results() As String
    Dim entity As Object
    Dim i As Long, j As Long
    Dim rng As Range
    Set sht = ThisWorkbook.Worksheets("Name of your Worksheet")
    'First request to find out the number of pages
    Set searchResults = XHRrequest("test", 25, 1)
    totalNumberOfPages = searchResults("totalPages")
    'Loop through all the pages
    For pageNumber = 1 To totalNumberOfPages Step 1
        Set searchResults = XHRrequest("test", 25, pageNumber)
        ReDim results(1 To searchResults("entitySearchResult").Count, 1 To 7)
        i = 0
        'Write the results in an array
        For Each entity In searchResults("entitySearchResult")
            i = i + 1
            results(i, 1) = entity("LEINumber")
            results(i, 2) = entity("legalName")
            results(i, 3) = entity("city")
            results(i, 4) = entity("headquartersCountry")
            results(i, 5) = entity("recordStatus")
            results(i, 6) = entity("renewalStatus")
            results(i, 7) = entity("entityStatus")
        Next entity
        'Write all the results in the worksheet in one go
        With sht
            Set rng = .Range("A" & .Rows.Count).End(xlUp).Offset(1, 0)
        End With
        rng.Resize(UBound(results, 1), UBound(results, 2)) = results
    Next pageNumber
End Sub

Public Function XHRrequest(ByVal key As String, ByVal resultsPerPage As Long, ByVal pageNumber As Long) As Object
    Dim req As New WinHttpRequest
    Dim url As String
    'Build the URL according to the parameters
    url = "https://www.gmeiutility.org/actions/Search/?isPendingValidationChecked=true&isSearchAllLOUChecked=true&keyWord=" & key & "&page=" & pageNumber & "&resultsPerPage=" & resultsPerPage & "&searchType=baseSearch"
    'Send the HTTP request and parse the JSON response
    With req
        .Open "POST", url, False
        .send
        Set XHRrequest = JsonConverter.ParseJson(.responseText)
    End With
End Function
For demonstration purposes the code above prints all the data in worksheet named Name of your Worksheet.
If you need to perform multiple searches you can easily modify the code to best fit your needs. More specifically you can loop through multiple keywords and call the XHRrequest function using those keywords instead of "test".
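As a minimal sketch of that modification (assuming, purely for illustration, that the keywords sit in column A of a sheet named Keywords, starting at A2, and reusing the XHRrequest function above):
Sub SearchMultipleKeywords()
    Dim wsKeys As Worksheet, searchResults As Object
    Dim lastRow As Long, i As Long
    Set wsKeys = ThisWorkbook.Worksheets("Keywords") 'hypothetical sheet holding one keyword per row
    lastRow = wsKeys.Cells(wsKeys.Rows.Count, "A").End(xlUp).Row
    For i = 2 To lastRow
        'First page only, 25 results per page; loop the pages as in main() if you need them all
        Set searchResults = XHRrequest(CStr(wsKeys.Cells(i, 1).Value), 25, 1)
        wsKeys.Cells(i, 2).Value = searchResults("totalPages") 'e.g. note how many pages of results each keyword returns
    Next i
End Sub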
Here's a sample of the output:
You will need to add the following references to your project (VBE>Tools>References):
Microsoft WinHTTP Services version 5.1
Microsoft HTML Objects Library
Microsoft Scripting Runtime
You will also need to add this JSON parser to your project. Follow the installation instructions in the link and you should be set to go.
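Once the parser is imported you can sanity-check it with a tiny sketch like this (the key mirrors the one used in the code above):
Dim parsed As Object
Set parsed = JsonConverter.ParseJson("{""totalPages"": 3}")
Debug.Print parsed("totalPages") 'prints 3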

Search a website with Excel data to extract results and then loop

I have 8000 values in an Excel spreadsheet.
I need to search a website and then record a specific line of data from the website in the Excel spreadsheet.
I found code which searches for data: excel macro to search a website and extract results
Sub URL_Get_ABN_Query()
    strSearch = Range("a1")
    With ActiveSheet.QueryTables.Add( _
        Connection:="URL;http://www.abr.business.gov.au/SearchByABN.aspx?SearchText=" & _
            strSearch & "&safe=active", _
        Destination:=Range("a5"))
        .BackgroundQuery = True
        .TablesOnlyFromHTML = True
        .Refresh BackgroundQuery:=False
        .SaveData = True
    End With
    'enter code here
End Sub
It collects the data from the website like this.
I only want the 'entity type' data line.
I cannot find how to extend the code to only grab this line and input it into the corresponding cell, i.e. search the ABN (B2), find 'entity type', and paste it into Company Type (C2).
Alternatively, I tried to find out how to fill the information vertically instead of horizontally; I could then delete the columns that are not needed. I thought this might be simpler.
I tried to record the macro with the Developer tab.
I also need to loop to the next ABN and populate the corresponding field and so on (B3>C3, B4>C4, etc.).
This is absolutely possible. You've got what I often find the hardest part, sourcing the information from another platform. To make this work I would separate it out a little bit and for simplicity use 2 sheets (Sheet1 with your known data and Sheet2 for the web data).
Loop through your table of ~8000 businesses. We can identify this from the UsedRange number of Rows. We know that the ABN is in column 2 (also known as B) so we copy that into the variable to pass to the function. The function will return the "Entity type:" to column 3 (C) of the same row.
Sub LoopThroughBusinesses()
    Dim i As Integer
    Dim ABN As String
    For i = 2 To Sheet1.UsedRange.Rows.Count
        ABN = Sheet1.Cells(i, 2)
        Sheet1.Cells(i, 3) = URL_Get_ABN_Query(ABN)
    Next i
End Sub
Change the subroutine you created to a Function so it returns the entity type you are after. The function will save the data into Sheet2 and then return just the Entity data that we are after.
Function URL_Get_ABN_Query(strSearch As String) As String ' Changed from a Sub to a Function that returns the desired string
    ' strSearch = Range("a1") ' This is now passed as a parameter into the Function
    Dim entityRange As Range
    With Sheet2.QueryTables.Add( _
        Connection:="URL;http://www.abr.business.gov.au/SearchByABN.aspx?SearchText=" & strSearch & "&safe=active", _
        Destination:=Sheet2.Range("A1")) ' Change this destination to Sheet2
        .BackgroundQuery = True
        .TablesOnlyFromHTML = True
        .Refresh BackgroundQuery:=False
        .SaveData = True
    End With
    ' Find the Range that has "Entity Type:"
    Set entityRange = Sheet2.UsedRange.Find("Entity type:")
    ' Then return the value of the cell to its right
    URL_Get_ABN_Query = entityRange.Offset(0, 1).Value2
    ' Clear Sheet2 for the next run
    Sheet2.UsedRange.Delete
End Function
You do not want a load of connections (QueryTables) set up in this way. It will be very slow, if it works at all. At 8000 requests, provided XMLHTTP is not blocked or throttled, the method below will be significantly faster. If there does appear to be slowing or blocking, then add in a small wait every x requests.
If possible use XMLHTTP to gather the data. Use CSS selectors to specifically target the entity type. Store values in an array and write them out with a loop at the end. Use a class to hold the XMLHTTP object for greater efficiency. Provide your class with methods, including how to handle not found (an example is given). Add some further optimizations, e.g. switching off screen updating as shown. This assumes your search numbers are in column B from B2. The code below also does some basic checks that there is something present in column B and handles the case of there being one or more numbers.
Good code is modular and you want a function to return something and a sub to perform actions. A single sub/function shouldn't complete lots of tasks. You want to easily debug with code that follows the principle of single responsibility (or close to it).
Class module clsHTTP:
Option Explicit

Private http As Object

Private Sub Class_Initialize()
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

Public Function GetHTML(ByVal URL As String) As String
    Dim sResponse As String
    With http
        .Open "GET", URL, False
        .send
        GetHTML = StrConv(.responseBody, vbUnicode)
    End With
End Function

Public Function GetEntityType(ByVal html As HTMLDocument) As String
    On Error GoTo errhand:
    GetEntityType = html.querySelector("a[href*='EntityTypeDescription']").innerText
    Exit Function
errhand:
    GetEntityType = "Not Found"
End Function
Standard module:
Option Explicit

Public Sub GetInfo()
    Dim http As clsHTTP, sResponse As String, lastRow As Long, groupResults(), i As Long, html As HTMLDocument
    Dim arr() 'declaration added; required under Option Explicit
    Set html = New HTMLDocument
    Set http = New clsHTTP
    Const BASE_URL As String = "http://www.abr.business.gov.au/ABN/View/"
    Application.ScreenUpdating = False
    With ThisWorkbook.Worksheets("Sheet1")
        lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
        Select Case lastRow
            Case 1
                Application.ScreenUpdating = True 'restore before the early exit
                Exit Sub
            Case 2
                ReDim arr(1 To 1, 1 To 1): arr(1, 1) = .Range("B2").Value
            Case Else
                arr = .Range("B2:B" & lastRow).Value
        End Select
        ReDim groupResults(1 To lastRow - 1)
        With http
            For i = LBound(arr, 1) To UBound(arr, 1)
                If Len(BASE_URL & arr(i, 1)) > Len(BASE_URL) Then
                    sResponse = .GetHTML(BASE_URL & arr(i, 1))
                    html.body.innerHTML = sResponse
                    groupResults(i) = .GetEntityType(html)
                    sResponse = vbNullString: html.body.innerHTML = vbNullString
                End If
            Next
        End With
        For i = LBound(groupResults) To UBound(groupResults)
            .Cells(i + 1, "C") = groupResults(i)
        Next
    End With
    Application.ScreenUpdating = True
End Sub
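If the requests do appear to be slowed or blocked, the small wait mentioned earlier could be dropped into the request loop; a sketch (both numbers are arbitrary):
'Inside the For i = ... loop, after groupResults(i) = .GetEntityType(html):
If i Mod 100 = 0 Then Application.Wait Now + TimeSerial(0, 0, 1) 'pause one second every 100 requests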
References (VBE> Tools > References):
Microsoft HTML Object Library
CSS selectors:
I use the fact that the entity description is a hyperlink (an a tag) and that its href contains the string EntityTypeDescription, so a CSS attribute = value selector with the contains (*) operator can target it.

Excel Pulling multiple Tables From a Website

I am working on a project to run some analytical models on NFL player stats. I have some code below that another user passed along to me. This code takes a list of links that I have on Sheet1, which is named "PlayerList", and creates a new tab for each player and pulls in their passing stats. All of the links are to Pro Football Reference. I am able to change this code to pull all necessary data for all positions other than quarterback. For the QBs I want to pull the passing stats table as well as the rushing and receiving stats table. Any help would be greatly appreciated. For reference, here are a few sample links:
https://www.pro-football-reference.com/players/R/RodgAa00.htm
https://www.pro-football-reference.com/players/B/BreeDr00.htm
Below is the code:
Option Explicit

Public Sub GetInfo()
    Dim html As New HTMLDocument, links(), link As Long, wsSourceSheet As Worksheet
    Dim hTable As HTMLTable, ws As Worksheet, playerName As String
    Set wsSourceSheet = ThisWorkbook.Worksheets("PlayerList")
    Application.ScreenUpdating = False
    With wsSourceSheet
        links = .Range("C2:C" & .Cells(.Rows.Count, "C").End(xlUp).Row).Value
    End With
    For link = LBound(links, 1) To UBound(links, 1)
        If InStr(links(link, 1), "https://") > 0 Then
            Set html = GetHTMLDoc(links(link, 1))
            Set hTable = html.getElementById("passing")
            If Not hTable Is Nothing Then
                playerName = GetNameAbbr(links(link, 1))
                Set ws = AddPlayerSheet(playerName)
                WriteTableToSheet hTable, ws
                FixTable ws
            End If
        End If
    Next
    Application.ScreenUpdating = True
End Sub

Public Function GetHTMLDoc(ByVal url As String) As HTMLDocument
    Dim sResponse As String, html As New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
    html.body.innerHTML = sResponse
    Set GetHTMLDoc = html
End Function

Public Sub WriteTableToSheet(ByVal hTable As HTMLTable, ByVal ws As Worksheet)
    Dim x As Long, y As Long
    With hTable
        For x = 0 To .Rows.Length - 1
            For y = 0 To .Rows(x).Cells.Length - 1
                If y = 6 Or y = 7 Then
                    ws.Cells(x + 4, y + 1).Value = Chr$(39) & .Rows(x).Cells(y).innerText
                Else
                    ws.Cells(x + 4, y + 1).Value = .Rows(x).Cells(y).innerText
                End If
            Next y
        Next x
    End With
End Sub

Public Function GetNameAbbr(ByVal url As String)
    Dim tempArr() As String
    tempArr = Split(url, "/")
    GetNameAbbr = Left$(tempArr(UBound(tempArr)), 6)
End Function

Public Function AddPlayerSheet(ByVal playerName As String) As Worksheet
    Dim ws As Worksheet
    If SheetExists(playerName) Then
        Application.DisplayAlerts = False
        ThisWorkbook.Worksheets(playerName).Delete
        Application.DisplayAlerts = True
    End If
    Set ws = ThisWorkbook.Worksheets.Add
    ws.Name = playerName
    Set AddPlayerSheet = ws
End Function

Public Function SheetExists(ByVal playerName As String) As Boolean
    SheetExists = Evaluate("ISREF('" & playerName & "'!A1)")
End Function

Public Sub FixTable(ByVal ws As Worksheet)
    Dim found As Range, numSummaryRows As Long
    With ws
        Set found = .Columns("A").Find("Career")
        If found Is Nothing Then Exit Sub
        numSummaryRows = .Cells(.Rows.Count, "A").End(xlUp).Row - found.Row
        numSummaryRows = IIf(numSummaryRows = 0, 1, numSummaryRows + 1)
        Debug.Print found.Offset(, 1).Resize(numSummaryRows, 30).Address, ws.Name
        found.Offset(, 1).Resize(numSummaryRows, 30).Copy found.Offset(, 2)
        found.Offset(, 1).Resize(numSummaryRows, 1).ClearContents
    End With
End Sub
Is there a reason you need to do this with VBA? Excel is quite capable of importing well-organized data such as the [several] tables on that page.
Under the Data tab, click From Web and then enter the Website URL.
Next you will choose the table(s) that you want. Don't go nuts - only get what you need, but you can choose more than one table by enabling the checkbox.
It can take a few long minutes to parse and organize all the data on the page...
Once you're back at the worksheet you'll see the queries on the right side. Right-click a query and choose Load To..., then choose Table and a location for the table data. There are a ton of other properties that you can customize; there are tutorials describing what you can do.
More things to customize are hidden in two ribbon tabs that only appear when you click on a table, Design and Query.
I think there's also a way to just create a list of players and then to use the Advanced option when entering the URL to allow you to dynamically choose any player you want, while only adding the tables once... but I've never quite figured that part out yet.
I'm not a sports fan, but I assume the data will be changing throughout the season, and an advantage of using tables like this is that once you set up your worksheet how you want it, there are settings you can choose to auto-update every time you open the workbook, or on schedule, or manually, or never; whatever is appropriate.
Google "Excel web query" to find out more about the plethora of options available to you when using queries (aka: "Get & Transform") to extract and organize your data.
Perhaps this could be an alternative to consider instead of coding functionality that's already built-in to Excel.
Good luck, and "Go Sports!"
Yes, there is a reason for doing this with VBA. In fact, at least five:
You don't have to manually set it up for all the links, which, if you have a very long list, means you would end up having to turn to automation anyway;
On a related theme, powerquery has limitations on how many connections it can support and with NFL player lists you can easily go way beyond what is supported and end up, even when at the max number of connections allowed, with a workbook that crashes or grinds to a halt (I have been there!);
Both tables are not always present so the below has error handling to deal with that;
You get your player named sheets as before, and again error handling for if sheet already present;
Not all versions of powerquery have the nice interface which will allow you to select all the tables individually for these pages. My version of Excel 2016 basically offers only to select the entire page. In that case you have more data than you need and a slowed down process.
Whilst there may be ways to handle this with inbuilt tools (and I love me a bit of Power Query), it is no longer "out of the box": it requires knowing how to code in M to some extent and/or reverting to using some VBA anyway.
If you tie this to a button on a sheet you can press to refresh whenever you want, link it to a Workbook_Open event to refresh on opening, or even have Windows Scheduler open the workbook and refresh at certain times (just so you know, VBA has still got your back! Though maybe with a little help from my friends ♫ aka Windows).
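For instance, a sketch of the Workbook_Open hook (this goes in the ThisWorkbook module and assumes the routine below keeps the name GetInfo):
Private Sub Workbook_Open()
    GetInfo 'refresh the player tables each time the workbook opens
End Sub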
It seems XHR is just a little too fast for the lower tables on each page, but do not despair, you could use Internet Explorer, with a short delay to ensure the Rushing & Receiving table is populated, or, as I have, use Selenium to automate the browser (I have used Chrome but Internet Explorer is possible). Although this is slower than XHR, we can be a little more efficient by running a headless browser instance.
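For completeness, a rough sketch of the Internet Explorer alternative (assumptions: it would replace the headless-Chrome navigation inside the loop below, the element id rushing_and_receiving matches the selector used there, and a simple timed loop is an acceptable way to wait for the lower table):
Dim ie As Object, t As Single
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
ie.navigate links(link, 1)
Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop
t = Timer
Do While ie.document.getElementById("rushing_and_receiving") Is Nothing
    DoEvents
    If Timer - t > 10 Then Exit Do 'give up after ~10 seconds
Loop
html.body.innerHTML = ie.document.body.innerHTML
ie.Quit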
Here you go with VBA, which will give you the different player tabs as you go and selects only the tables required. It is based on links starting at C2 on Sheet1.
Option Explicit

Public Sub GetInfo()
    Dim d As New ChromeDriver
    Dim html As New HTMLDocument, links(), link As Long, wsSourceSheet As Worksheet, clipboard As Object
    Dim hTablePass As HTMLTable, hTableRushReceive As HTMLTable, ws As Worksheet, playerName As String
    Set wsSourceSheet = ThisWorkbook.Worksheets("Sheet1") '<change to sheet containing links
    Application.ScreenUpdating = False
    With wsSourceSheet
        If .Cells(.Rows.Count, "C").End(xlUp).Row = 2 Then
            ReDim links(1 To 1, 1 To 1): links(1, 1) = .Range("C2")
        Else
            links = .Range("C2:C" & .Cells(.Rows.Count, "C").End(xlUp).Row).Value
        End If
    End With
    For link = LBound(links, 1) To UBound(links, 1)
        If InStr(links(link, 1), "https://") > 0 Then
            With d
                .AddArgument "--headless"
                .get links(link, 1)
                html.body.innerHTML = .PageSource
                Set hTablePass = html.querySelector("#all_passing #passing")
                Set hTableRushReceive = html.querySelector("#all_rushing_and_receiving #rushing_and_receiving")
                playerName = GetNameAbbr(links(link, 1))
                Set ws = AddPlayerSheet(playerName)
                Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
                If Not hTablePass Is Nothing Then
                    clipboard.SetText Replace$(Replace$(hTablePass.outerHTML, "--></DIV>", vbNullString), "<!--", vbNullString)
                    clipboard.PutInClipboard
                    ws.Cells(GetLastRow(ws, 1), 1).PasteSpecial
                End If
                If Not hTableRushReceive Is Nothing Then
                    clipboard.SetText hTableRushReceive.outerHTML
                    clipboard.PutInClipboard
                    ws.Cells(GetLastRow(ws, 1) + 2, 1).PasteSpecial
                End If
            End With
        End If
    Next
    Application.ScreenUpdating = True
End Sub

Public Function GetNameAbbr(ByVal url As String) As String
    Dim tempArr() As String
    tempArr = Split(url, "/")
    GetNameAbbr = Left$(tempArr(UBound(tempArr)), 6)
End Function

Public Function AddPlayerSheet(ByVal playerName As String) As Worksheet
    Dim ws As Worksheet
    If SheetExists(playerName) Then
        Application.DisplayAlerts = False
        ThisWorkbook.Worksheets(playerName).Delete
        Application.DisplayAlerts = True
    End If
    Set ws = ThisWorkbook.Worksheets.Add
    ws.Name = playerName
    Set AddPlayerSheet = ws
End Function

Public Function SheetExists(ByVal playerName As String) As Boolean '<== *#Rory
    SheetExists = Evaluate("ISREF('" & playerName & "'!A1)")
End Function

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function
References:
Microsoft HTML Object Library
Selenium Type Library
Selenium basic download:
https://github.com/florentbr/SeleniumBasic
*Function adapted from #Rory

Scraping data from a website with a dynamic array function - Excel VBA

I want to eventually create a function where I can specify a web page element and URL and populate all instances of that element down a column. But I am currently only experiencing limited success with this function:
Sub GrabAnchorTags() '(URL As String) As Variant'
    Dim objIE As InternetExplorer
    Dim elem As Object
    Set objIE = New InternetExplorer
    objIE.Visible = False
    objIE.navigate "http://example.com/"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    Dim aRange As Range
    Debug.Print objIE.document.getElementsByTagName("a").Length
    For Each elem In objIE.document.getElementsByTagName("a")
        Debug.Print elem
        ActiveCell.Offset(x, y).Value = elem
        ActiveCell.Offset(x, y + 1).Value = elem.textContent
        x = x + 1
    Next
    objIE.Quit
    Set objIE = Nothing
End Sub
I would like to be able to turn this successfully from a macro to a function.
Currently, it uses a for loop to populate the cells and I wonder if it's possible to accomplish the same thing using evaluate or something similar because the for loop is inefficient.
This function would need to live in a cell, reference a URL in another cell, and populate the cells below it with all elements of a type found on the page. I am currently working on the anchor tag.
Many other solutions I referenced used macros:
Scraping data from website using excel vba
Getting links url from a webpage excel vba
VBA – Web scraping with getElementsByTagName()
Generally speaking, whenever you have many cells to write to, you should enter the data into an internal array, and then write the entire array to the worksheet in one hit. However, you seem not to want a macro/sub in your case.
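As a sketch of that array-write pattern (assuming arr is a 1-based 2-D array that has already been filled and ws is the target worksheet):
'Write the whole array to the sheet in a single operation
ws.Range("A1").Resize(UBound(arr, 1), UBound(arr, 2)).Value = arr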
If you wish it to take the worksheet formula approach for usability reasons, then the best way is to use a very powerful, but underused technique in Excel development.
A NAMED RANGE
Named ranges are Excel's closest thing to an in-memory block of data, and other, simpler formulas can then use the named range to get info from it.
A Named Range doesn't have to be a simple block of cells on a sheet. You can write your VBA routine as a Public function, and then reference it in the Named Range.
Function getElems(url As String, tagName As String) As String()
    Dim browser As New MSXML2.XMLHTTP60
    Dim doc As MSHTML.HTMLDocument
    With browser
        .Open "GET", url, False
        .send
        If .readyState = 4 And .Status = 200 Then
            Set doc = New MSHTML.HTMLDocument
            doc.body.innerHTML = .responseText
        Else
            MsgBox "Error" & vbNewLine & "Ready state: " & .readyState & _
                vbNewLine & "HTTP request status: " & .Status
        End If
    End With
    Dim tag As MSHTML.IHTMLElement
    Dim tags As MSHTML.IHTMLElementCollection
    Set tags = doc.getElementsByTagName(tagName)
    Dim arr() As String
    Dim arrCounter As Long: arrCounter = 1
    ReDim arr(1 To tags.Length, 1 To 2)
    For Each tag In tags
        arr(arrCounter, 1) = tag.innerText
        'Change the below If block to suit
        If tagName = "a" Then
            arr(arrCounter, 2) = tag.href
        Else
            arr(arrCounter, 2) = tag.innerText
        End If
        arrCounter = arrCounter + 1
    Next tag
    Set doc = Nothing
    Set browser = Nothing
    getElems = arr
End Function
Now set a Named Range in Excel such as:
elementData
=getElems(Sheet1!$A$1, Sheet1!$B$1)
In A1, put the URL, and in B1 put the tag Name such as "a"
Then in your cells you can say
=INDEX(elementData, ROW(1:1), 1) and in adjacent cell put =INDEX(elementData, ROW(1:1), 2) (or use ROWS formula technique)
and drag down.

Web Query from URL in Cell

I believe I have thoroughly researched this question (sorry if you have seen the answer, please be patient with me).
Truly a newcomer to VBA/macros, I do not even fully understand where to "put" the code that is provided on these message boards, which is why I prefer a formula.
My sheet has cells which feed into a hyperlink (i.e. A1=JFK, B1=CVG, C1=HYPERLINK("http://www.gcmap.com/dist?p="&A1&"-"&B1,"My Flight")).
If you visit the link (http://www.gcmap.com/dist?P=jfk-cvg) it shows the flying distance between these two points - 589 mi.
What I am trying to do is do a web query in Excel based off the link provided in cell C1, and then have the web query point to the total distance included in the link - and then populate another cell on my sheet (D1) with that distance.
Any and all help would be appreciated!
How about something like this:
Sub getMiles()
    'Thanks to http://stackoverflow.com/questions/16975506/how-to-download-source-code-from-a-website-with-vba for the idea
    Dim k As Long
    Dim URL2 As String
    Dim ws As Worksheet, newWS As Worksheet
    Set ws = ActiveSheet
    Application.ScreenUpdating = False
    URL2 = ws.Cells(1, 3) 'Cell C1 is the URL
    'To get data from the url we need to create a WinHttp object:
    'Tools > References > select Microsoft WinHTTP Services, version 5.1
    Dim Http2 As New WinHttpRequest
    'Open the url
    Http2.Open "GET", URL2, False
    'Send the request
    Http2.Send
    'MsgBox Http2.ResponseText
    Dim Resp As String: Resp = Http2.ResponseText
    Dim Lines2 As Variant: Lines2 = Split(Resp, ">")
    Worksheets.Add after:=Sheets(Sheets.Count)
    Set newWS = ActiveSheet
    newWS.Name = "Temp for source code"
    'Write each segment of the source to its own row on the temp sheet
    For k = LBound(Lines2) To UBound(Lines2)
        newWS.Cells(1 + k, 1).Value = Lines2(k)
    Next k
    Dim findString As String, stringCell As Range
    findString = " mi"
    Set stringCell = newWS.Columns(1).Find(what:=findString)
    Dim milesFlown As String
    milesFlown = Left(stringCell.Value, WorksheetFunction.Search("&", stringCell, 1) - 1)
    'MsgBox ("You would fly " & milesFlown)
    ws.Cells(1, 4).Value = milesFlown
    Application.DisplayAlerts = False
    newWS.Delete
    Application.DisplayAlerts = True
    Application.ScreenUpdating = True
End Sub
It's sort of roundabout, but what it does is get the source code of your URL and, in that source code, look for a string that only seems to occur before the miles are given (" mi"), then take the number to the left of the "&" and set that as your miles. You will need to tweak the macro to correctly point to the cell with your URL. Let me know if you need any help doing so!
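If you would rather skip the temporary sheet, the same idea can be applied directly to the response text in memory; a rough sketch that mirrors the split-on-">" / look-for-" mi" / take-what's-left-of-"&" logic above:
Dim parts As Variant, p As Variant, milesFlown As String
parts = Split(Http2.ResponseText, ">")
For Each p In parts
    If InStr(1, p, " mi") > 0 And InStr(1, p, "&") > 0 Then
        milesFlown = Left$(p, InStr(1, p, "&") - 1) 'the number sits just before the first "&"
        Exit For
    End If
Next p
ws.Cells(1, 4).Value = milesFlown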
Edit: To use this code, with Excel open, press Alt+F11; this will open the VB Editor. I think you can insert this code (just copy/paste) into the "Sheet1 (Sheet1)" part. If not, you'll need to right-click "VBAProject ([yourbook])", choose Insert > Module, and put the code there. It should then show up in your macro list (View tab --> Macros).
Edit2: Also, you'll most likely need to add a reference in VBA. Press Alt+F11 to open the VB Editor, then in Tools -> References look for "Microsoft WinHTTP Services, version 5.1", add a check mark, and click "OK" to add this reference. Otherwise, you'll get an error.
Edit3: Updated the code. It now puts the source code on a new sheet, so anything you have in Col. A won't be deleted.
