I am extracting some data from the web everything is coming of fine, except for one set of data, when it is extracted it is showing backwards in the cell.
I can not work out why it is showing backwards, as everything else is extracting fine .
Q) Could some please advise why it would do this?
This is is what i am using to pull the data, it works fine for everything else, just not this class it shows backwards in excel
Set doc = NewHTMLDocument(CStr(link))
'''IF Statement, change class to suite needs 'bscd
' On Error Resume Next
If doc.getElementsByClassName("bscd")(0) Is Nothing Then
wsSheet.Cells(StartRow + Counter, 5).Value = "-"
Else
' On Error Resume Next
wsSheet.Cells(StartRow + Counter, 5).Value = doc.getElementsByClassName("bscd")(0).Children(1).InnerText
End If
This is Class
Result Showing Backwards in excel
Could it be that the "Complete Information" is a JAVA dropdown?
Just did Ctrl+U as recommended and this is how the html is, it is showing backwards here, but displays correct on the site.
You need to click on that link to access the content. This is one of the ways how you can do. I used Explicit Wait within the script instead of hardcoded delay, so the script will wait up to 10 seconds until the content is visible.
Public driver As ChromeDriver
Sub ScrapeContent()
Const URL$ = "https://www.ebay.co.uk/itm/Metal-Floor-Fan-High-velocity-chrome-free-stand-fan-industrial-fan-3-8-Speed-UK/333664038024"
Dim oElem As Object, oItem As Object
Set driver = New ChromeDriver
driver.get URL
driver.FindElementByXPath("//span/a[contains(.,'Complete information')]", Timeout:=10000).Click
Set oElem = driver.FindElementByXPath("//span[contains(.,'Phone:')]/following::span", Timeout:=10000)
Set oItem = driver.FindElementByXPath("//span[contains(.,'Email:')]/following::span", Timeout:=10000)
Debug.Print oElem.Text, oItem.Text
End Sub
Output:
13025438495 eshijiali#outlook.com
If you use xmlhttp requests, the result you may get is reversed. However, I've used a function to make them regular:
Function reverseString(inputStr As String)
Dim myString$, I&
For I = Len(inputStr) To 1 Step -1
myString = myString & Mid(inputStr, I, 1)
Next I
reverseString = myString
End Function
Sub FetchData()
Const Url$ = "https://www.ebay.co.uk/itm/Metal-Floor-Fan-High-velocity-chrome-free-stand-fan-industrial-fan-3-8-Speed-UK/333664038024"
Dim HTML As New HTMLDocument, oPost As Object
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
.send
HTML.body.innerHTML = .responseText
End With
Set oPost = HTML.getElementsByClassName("bsi-lbl")
If Not oPost Is Nothing And oPost.Length >= 1 Then
Debug.Print reverseString(oPost(0).NextSibling.innerText)
End If
If Not oPost Is Nothing And oPost.Length >= 2 Then
Debug.Print reverseString(oPost(1).NextSibling.innerText)
End If
End Sub
Output:
13025438495 eshijiali#outlook.com
Related
I am trying to connect 2 modules in vba such that the output of the first module (geturl) feeds into the other (getdata).
Get Url to look up the dossier URL online for substances entered in column A e.g. Acetone or alternatively the CAS number in column B can be used (see image below). Note: currently only looks up for substance info in A1 or B1.
Public Function GetUrl() As String
Const Url = "https://echa.europa.eu/information-on-chemicals/registered-substances?p_p_id=dissregisteredsubstances_WAR_dissregsubsportlet&p_p_lifecycle=1&p_p_state=normal&p_p_mode=view&_dissregisteredsubstances_WAR_dissregsubsportlet_javax.portlet.action=dissRegisteredSubstancesAction"
Set oHtml = New HTMLDocument
Set oHttp = CreateObject("MSXML2.XMLHTTP")
Set MyDict = CreateObject("Scripting.Dictionary")
SubstanceName = Cells(1, 1)
CASNumber = Cells(1, 2)
MyDict("_dissregisteredsubstances_WAR_dissregsubsportlet_disreg_name") = SubstanceName
MyDict("_dissregisteredsubstances_WAR_dissregsubsportlet_disreg_cas-number") = CASNumber
MyDict("_disssimplesearchhomepage_WAR_disssearchportlet_disclaimer") = "true"
MyDict("_disssimplesearchhomepage_WAR_disssearchportlet_disclaimerCheckbox") = "on"
payload = vbNullString
For Each DictKey In MyDict
payload = IIf(Len(DictKey) = 0, WorksheetFunction.EncodeURL(DictKey) & "=" & WorksheetFunction.EncodeURL(MyDict(DictKey)), _
payload & "&" & WorksheetFunction.EncodeURL(DictKey) & "=" & WorksheetFunction.EncodeURL(MyDict(DictKey)))
Next DictKey
With oHttp
.Open "POST", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
.send (payload)
oHtml.body.innerHTML = .responseText
End With
GetUrl = oHtml.querySelector(".details").getAttribute("href")
Debug.Print oHtml.querySelector(".substanceNameLink ").innerText
Debug.Print oHtml.querySelector(".details").getAttribute("href")
End Function
If run this should return
Acetone https://echa.europa.eu/registration-dossier/-/registered-dossier/15460
Get Data uses the Url from geturl to return "DNEL" values:
Sub GetData()
'Start ECHA Search via XML HTTP Request
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim Route(1 To 3) As String
Route(1) = "sGeneralPopulationHazardViaInhalationRoute"
Route(2) = "sGeneralPopulationHazardViaDermalRoute"
Route(3) = "sGeneralPopulationHazardViaOralRoute"
XMLReq.Open "Get", GetUrl & "/7/1", False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
'Loops through each element
For c = 1 To UBound(Route, 1)
Set Info = HTMLDoc.getElementById(Route(c))
Debug.Print Info.innerText
Set Info = HTMLDoc.getElementById(Route(c)).NextSibling.NextSibling.NextSibling
Set Data = Info.getElementsByTagName("dd")(0)
Debug.Print Data.innerText
Set Data = Info.getElementsByTagName("dd")(1)
Debug.Print Data.innerText
'Cells(r, c + 2) = Data.innerText
Next c
End Sub
For Acetone in Cell(1,1) This should Return:
Acetone
https://echa.europa.eu/registration-dossier/-/registered-dossier/15460
General Population - Hazard via inhalation route
DNEL (Derived No Effect Level)
200 mg/m³
General Population - Hazard via dermal route
DNEL (Derived No Effect Level)
62 mg/kg bw/day
General Population - Hazard via oral route
DNEL (Derived No Effect Level)
62 mg/kg bw/day
Instead of just relying on Cell A1 however, I wish to have the entire code loop for each cell with a substance in columnA/ColumnB. So in this case the URL for Acetone is found and the corresponding data is then pulled then the same occurs for Oxydipropanol.
Note in this image Substances can be looked up online using either the substance name, CAS number in columnB, or a combination of both.
Trying to connect the two modules, zo far I have only been able to get the geturl module to cycle through for each substance. I have also tried to combine both into 1 module but cant figure out how to correctly nest the for loops.
A quick google search states that you cant nest functions in vba. This makes me wonder if what I'm doing is even the right way to approach this. But I've seen similar things achieved In the past so I'm sure it's possible.
Note: If testing please use the example substances for testing. Using a random chemical say Benzene may result in an error as the tox profile for this substance doesn't exist. I still need to implement handling errors but this can be ignored for now.
I Will update you here with any further progress made, Thanks.
This worked for me:
Sub PopulateExposures()
Dim url, rw As Range
Set rw = Sheets("data").Range("A1:E1") 'first row with inputs
Do While Application.CountA(rw) > 0
url = SubstanceUrl(rw.Cells(1).Value, rw.Cells(2).Value) 'get the URL
rw.Cells(3).Resize(1, 3).Value = ExposureData(url) 'get exposure data (as array) and add to row
Set rw = rw.Offset(1, 0) 'next substance
Loop
End Sub
Public Function SubstanceUrl(SubstanceName, CASNumber) As String
Const url = "https://echa.europa.eu/information-on-chemicals/registered-substances?" & _
"p_p_id=dissregisteredsubstances_WAR_dissregsubsportlet&p_p_lifecycle=1&" & _
"p_p_state=normal&p_p_mode=view&" & _
"__dissregisteredsubstances_WAR_dissregsubsportlet_javax.portlet.action=dissRegisteredSubstancesAction"
Dim oHTML, oHttp, MyDict, payload, DictKey, sep
Set oHTML = New HTMLDocument
Set oHttp = CreateObject("MSXML2.XMLHTTP")
Set MyDict = CreateObject("Scripting.Dictionary")
MyDict("_dissregisteredsubstances_WAR_dissregsubsportlet_disreg_name") = SubstanceName
MyDict("_dissregisteredsubstances_WAR_dissregsubsportlet_disreg_cas-number") = CASNumber
MyDict("_disssimplesearchhomepage_WAR_disssearchportlet_disclaimer") = "true"
MyDict("_disssimplesearchhomepage_WAR_disssearchportlet_disclaimerCheckbox") = "on"
payload = ""
For Each DictKey In MyDict
payload = payload & sep & DictKey & "=" & WorksheetFunction.EncodeURL(MyDict(DictKey))
sep = "&"
Next DictKey
With oHttp
.Open "POST", url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
.send payload
oHTML.body.innerHTML = .responseText
End With
SubstanceUrl = oHTML.querySelector(".details").getAttribute("href")
End Function
Function ExposureData(urlToGet)
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As HTMLDocument, dds
Dim Route(1 To 3) As String, Results(1 To 3) As String, c, Info, Data
Route(1) = "sGeneralPopulationHazardViaInhalationRoute"
Route(2) = "sGeneralPopulationHazardViaDermalRoute"
Route(3) = "sGeneralPopulationHazardViaOralRoute"
XMLReq.Open "Get", urlToGet & "/7/1", False
XMLReq.send
If XMLReq.Status <> 200 Then
Results(1) = "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Else
Set HTMLDoc = New HTMLDocument
HTMLDoc.body.innerHTML = XMLReq.responseText
For c = 1 To UBound(Route, 1)
Set Info = HTMLDoc.getElementById(Route(c))
If Not Info Is Nothing Then
Set Info = Info.NextSibling.NextSibling.NextSibling
Set dds = Info.getElementsByTagName("dd")
If dds.Length > 1 Then
Results(c) = dds(1).innerText
Else
Results(c) = "hazard unknown"
End If
Else
Results(c) = "no info"
End If
Next c
End If
ExposureData = Results
End Function
I am slowly exploring if I can use VBA to code a macro that will search a website from a list of keywords/codes in column A and extract the data. Currently The code below searches the desired website using the range in ("A1") only but does get to the right page with the data I wish to extract. In this case the Code in a1 is 100-52-7
Sub BrowseToSite()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
IE.Visible = True
IE.Navigate "https://apps.who.int/food-additives-contaminants-jecfa-database/Search.aspx"
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
IE.Document.forms("form1").Elements("ctl00$ContentPlaceHolder1$txtSearch").Value = Range("a1").Value
IE.Document.forms("form1").Elements("ctl00$ContentPlaceHolder1$btnSearch").Click
Set HTMLDoc = IE.Document
'Sheet1.Range("B1").Value = HTMLDoc.getElementsByClassName("sectionHead1")(0).innerText
End Sub
Now I wish to pull the "0-5 mg/kg bw (1996)" phrase on this page into Excel. I planned to do this by retriving the inner text within the class name however I run into an error Object Variable or With Block variable not set with the following line:
Sheet1.Range("B1").Value = HTMLDoc.getElementsByClassName("sectionHead1")(0).innerText
You can get rid of IE altogether and try using xmlhttp requests to make the script robust. What the following script does is send a get http requests first to scrape the value of certain parameters supposed to be used within post requests and then issue a post requests to parse the desired content.
This is one of the efficient ways how you can:
Option Explicit
Public Sub GetContent()
Const Url = "https://apps.who.int/food-additives-contaminants-jecfa-database/Search.aspx"
Dim oHttp As Object, oHtml As HTMLDocument, MyDict As Object
Dim DictKey As Variant, payload$, searchKeyword$
Set oHtml = New HTMLDocument
Set oHttp = CreateObject("MSXML2.XMLHTTP")
Set MyDict = CreateObject("Scripting.Dictionary")
'send get requests first to parse the value of "__VIEWSTATE", "__VIEWSTATEGENERATOR" e.t.c., as in oHtml.getElementById("__VIEWSTATE").Value
With oHttp
.Open "GET", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.send
oHtml.body.innerHTML = .responseText
End With
searchKeyword = "100-52-7" 'this is the search keyword you wanna use from your predefined search terms
'MyDict stores keys and values within dictionary, as in __VIEWSTATE = "some value" and so on
MyDict("__VIEWSTATE") = oHtml.getElementById("__VIEWSTATE").Value
MyDict("__VIEWSTATEGENERATOR") = oHtml.getElementById("__VIEWSTATEGENERATOR").Value
MyDict("__EVENTVALIDATION") = oHtml.getElementById("__EVENTVALIDATION").Value
MyDict("ctl00$ContentPlaceHolder1$txtSearch") = searchKeyword
MyDict("ctl00$ContentPlaceHolder1$btnSearch") = "Search"
MyDict("ctl00$ContentPlaceHolder1$txtSearchFEMA") = ""
'joining each set of key and value with ampersand to make it a string so that you can use it as a parameter while issuing post requests, which is what payload is doing
payload = ""
For Each DictKey In MyDict
payload = IIf(Len(DictKey) = 0, WorksheetFunction.encodeURL(DictKey) & "=" & WorksheetFunction.encodeURL(MyDict(DictKey)), _
payload & "&" & WorksheetFunction.encodeURL(DictKey) & "=" & WorksheetFunction.encodeURL(MyDict(DictKey)))
Next DictKey
With oHttp
.Open "POST", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
.send (payload)
oHtml.body.innerHTML = .responseText
End With
MsgBox oHtml.querySelector("#SearchResultItem > a").NextSibling.NodeValue
End Sub
Make sure to add the following libraries to execute the above script:
Microsoft XML, v6.0
Microsoft Scripting Runtime
Microsoft HTML Object Library
You click on an element with this line of code:
IE.Document.forms("form1").Elements("ctl00$ContentPlaceHolder1$btnSearch").Click
for which IE makes a POST request to retrieve your results, as can be seen here:
The above is a screen shot from Edge's dev tools, but concept is the same
During this request, the element in question is not immediately there, so you will need to wait for it to load.
Your prior method of
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
would probably work, but I find it to be inconsistent at times and would also include checking the .Busy property as well.
Try using this after your click:
IE.Document.forms("form1").Elements("ctl00$ContentPlaceHolder1$btnSearch").Click
'~~WAIT FOR SEARCH RESULTS TO LOAD~~
Do While IE.ReadyState < READYSTATE_COMPLETE Or IE.Busy
Loop
Set HTMLDoc = IE.Document
Sheet1.Range("B1").Value = HTMLDoc.getElementsByClassName("sectionHead1")(0).innerText
If you're still having issues, you can force IE to wait for the element in question to become available by doing this:
On Error Resume Next
Do while HTMLDoc.getElementsByClassName("sectionHead1")(0) is Nothing
Loop
On Error Goto 0
Set HTMLDoc = IE.Document
Sheet1.Range("B1").Value = HTMLDoc.getElementsByClassName("sectionHead1")(0).innerText
This is a simple loop that checks for the object, and will continue to loop until that object is no longer Nothing (which essentially means it has loaded).
And I would recommend that you add some sort of timeout that may trigger an error or something just in case the webpage is having issues so you're not in an infinite loop.
Pro Tip:
If you are clicking the search button a lot of times and waiting for a
lot of objects to load, instead of duplicating the above code you can
turn it into it's own sub and do something like:
Sub WaitForElement(IE as InternetExplorer, elem As Object)
Do While IE.ReadyState < 4 Or IE.Busy: Loop
On Error Resume Next
Do While elem is Nothing: Loop
On error Goto 0
End Sub
Then you would just need to use the following line after each click:
WaitForElement IE, HTMLDoc.getElementsByClassName("sectionHead1")(0)
Not only would this cut down on the number of lines in your code, it could greatly improve readability as well.
EDIT:
I have tested the response text and I found out the reason behind the error is that the request gets redirected to the "I'm not a robot" page due to high number of search requests sent by excel. Is there anything that can be done to prevent this from happening?
Original:
I have this code (found it online and modified it) that works fine when i execute it few times, it then creates
Run-time error " 91 - Object Variable or with block variable not set"
Then after I wait for about an hour it works fine without me changing anything, then it creates the error again. and so on...
The code was originally to search Google for a list of strings and return how many results. I modified it to search for the exact string and just return if there are results or not.
Here is the code:
Sub SearchHits()
Dim url, name As String
Dim i, lastRow As Long
Dim XMLHTTP As Object
Dim html As Object
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.EnableEvents = False
ActiveSheet.DisplayPageBreaks = False
lastRow = Range("a" & Rows.Count).End(xlUp).Row
For i = 1 To lastRow
Name = """" & Cells(i, 1).Value & """"
url = "https://www.google.co.in/search?q=" & Name & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.6.0")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
If html.getElementById("topstuff").innerText <> "" Then '<-----Highlighted on error
Cells(i, 9) = "–"
Else
Cells(i, 9) = "Results Found"
End If
Next
End Sub
I'm not sure what is the reason, I have few guesses but am hoping someone over here could explain to me the reason and the work around. I work in research and everyday I search a lot of names and it would be very helpful if I can know which has results and which doesn't.
It would be even more amazing if someone can help with the code to return the exact number of searches (the one found at the bottom of the page).
Best Regards!
I'm trying to scrape a website developed with ASP.NET Ajax with Excel VBA. I'm using the Microsoft HTML Object Library and the Microsoft XML, v6.0 library. What I would like to do is to get in a table all the items in the second text box when I select an item in the first text box.
When you select an item in the first text box automatically the items on the second text box are loaded. So first I make a GET request to the website, then I scrape all the elements with the class aspNetHidden. I add two elements to the POST string that doesnt' appear in the first scrape: ctl00$ctl18, __ASYNCPOST, with their respective values. I also added the value for the first text box ctl00$MainContent$cboDenominacionSocial.
Sub Macro1()
'
' Macro1 Macro
'
' Declare variables
Dim xmlhttp As New MSXML2.XMLHTTP60
Dim urlMF As String
'
urlMF = "https://www.smv.gob.pe/Frm_EVCP?data=5A959494701B26421F184C081CACF55BFA328E8EBC"
'
'
xmlhttp.Open "GET", urlMF, False
'xmlhttp.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3842.0 Safari/537.36"
'xmlhttp.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
xmlhttp.send
Dim bodySMV As New HTMLDocument
bodySMV.body.innerHTML = xmlhttp.responseText
Dim topicsSMV As Object
Dim topicElem As Object
Set topicsSMV = bodySMV.getElementsByClassName("aspNetHidden")
Dim postReq As String
postReq = ""
i = 1
For Each topic In topicsSMV
Set topicElem = topic.getElementsByTagName("input")
For Each dataTopic In topicElem
Cells(i, 1) = dataTopic.Name
Cells(i, 2) = dataTopic.Value
temp = dataTopic.Name & "=" & dataTopic.Value
If i = 1 Then postReq = "ctl00%24ctl18=ctl00%24MainContent%24UpdatePanel1%7Cctl00%24MainContent%24cboDenominacionSocial"
If i > 1 Then postReq = postReq & Chr(38) & temp
i = i + 1
Next dataTopic
Next topic
postReq = postReq & "ctl00%24MainContent%24cboDenominacionSocial=156429&__ASYNCPOST=true&"
Cells(i, 1).Value = postReq
xmlhttp.Open "POST", urlMF, False
xmlhttp.send postReq
bodySMV.body.innerHTML = xmlhttp.responseText
'
End Sub
I'd like to get all the list of possible elements from the second text box, depending on the selection of the first box. What am I missing in my POST request?
For a university research project, I plan to run a larger number of data requests with a total of c. 3,000 different spreadsheets, each including about 800-1,000 unique data requests.
Aim of the code is to get the number of Google News results for certain search terms within specific time frames, as for instance the results for "Elon Musk" between 01/01/2015 and 12/31/2015.
I have so far constructed a code that works relatively well with a single-threaded approach but would, given the large number of data requests, take weeks for the 3,000 spreadsheets to be completed (Google is occasionally blocking the requests given the sheer number of search queries but this is something generally manageable).
On the other hand, I have read that there is not "native" option in VBA to go for a time saving multi-threading approach, while several suggestions exist to work around this lack of a real multi-threading function. None of them has however really worked for my case so far.
Is there any practicable option to have the code below send out several Google requests at a time? This would allow the data collection in a much shorter time span. As mentioned, I have already been through a number of complex "out of the box" multi-threading solutions, of which none really worked.
Option Explicit
Sub TermCheck()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
Dim start_time As Date
Dim end_time As Date
Dim var As String
Dim var1 As Object
lastRow = Range("A" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 62 To lastRow
url = "https://www.google.com/search?q=" & Cells(i, 3) & "&source=lnt&tbs=cdr%3A1%2Ccd_min%3A" & Cells(i, 4) & "%2Ccd_max%3A" & Cells(i, 5) & "&tbm=nws"
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.responseText
Set objResultDiv = html.getElementById("rso")
Set var1 = html.getElementById("resultStats")
If Not var1 Is Nothing Then
Cells(i, 6).Value = var1.innerText
End If
DoEvents
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
I think you are on the right track by consolidating your data upfront, this is often a good idea for most projects. I'm not so sure you should give up on Async requests so quickly, have a go at this code below and see if that helps speed up your project.
I assumed that dates where in Columns D and E, so I mocked up my data to this idea. I also hard coded 'Elon Musk' in there to make things easier testing. You probably need to change this.
Option Explicit
Sub TermCheck(RunAsync As Boolean)
Const READYSTATE_COMPLETE As Long = 4
Dim url As String
Dim WebRequest As Object
Dim WebRequests As Object
Dim CellIndex As Variant
Dim Document As Object
Dim ResultStat As Object
Dim ws As Worksheet
Dim StartDate As Date
Dim EndDate As Date
Dim i As Long
StartDate = #1/1/2015#
EndDate = #1/2/2015#
Set ws = ThisWorkbook.Worksheets("Sheet3")
Set WebRequests = CreateObject("Scripting.Dictionary")
For i = 1 To 30
'Change URL here
url = "https://www.google.com/search?q=Elon%20Musk" & _
"&source=lnt&tbs=cdr%3A1%2Ccd_min%3A" & Replace(Format(ws.Cells(i, 4), "m/d/yyyy"), "/", "%2F") & _
"%2Ccd_max%3A" & Replace(Format(ws.Cells(i, 5), "m/d/yyyy"), "/", "%2F") & "&tbm=nws"
Set WebRequest = CreateObject("MSXML2.XMLHTTP")
With WebRequest
.Open "GET", url, RunAsync
.setRequestHeader "Content-Type", "text/xml"
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
.send
End With
StartDate = DateAdd("d", 1, StartDate)
EndDate = DateAdd("d", 1, EndDate)
WebRequests.Add i, WebRequest
Next
For Each CellIndex In WebRequests.Keys
Set WebRequest = WebRequests(CellIndex)
While WebRequest.Readystate <> READYSTATE_COMPLETE: DoEvents: Wend
If WebRequest.Status = 200 Then
Set Document = CreateObject("htmlfile")
Document.body.innerhtml = WebRequest.ResponseText
Set ResultStat = Document.getElementById("resultStats")
'Missing equals sign was here
If Not ResultStat Is Nothing Then ws.Cells(CellIndex, 6).Value2 = ResultStat.innertext
End If
Next
End Sub
Sub TestRunRequests()
'Run it Synchronous
Application.ScreenUpdating = False
Dim MyTimer As Double
MyTimer = Timer
TermCheck False
Debug.Print "Synchronous took: " & Timer - MyTimer
'Run it Asynchronous
MyTimer = Timer
TermCheck True
Debug.Print "Asynchronous took: " & Timer - MyTimer
Application.ScreenUpdating = True
End Sub
Here are my the timings (in seconds) I got when I did 100 Requests for each method (async and sync):
Synchronous took: 44.5625
Asynchronous took: 22.46875
This may rather be the "pedestrian way" regarding a time saving approach but I decided to paste all relevant spreadsheets into one major Excel file. When amending the code to cover all spreadsheets simultaneously (and not only the active one), the search processes can literally be conducted in a fraction of the time otherwise required.