Cannot get the text inside a <p> tag using VBA - excel

I have the following URL
https://www.wavemotion.gr/shop/smartphone-accessories/easy-one-touch-wireless-fast-charging-mount
I am trying to get the availability of the product using the following code:
For i = 2 To lastrow
mylink = wks.Cells(i, 2).Value
ie.Navigate mylink
While ie.Busy Or ie.ReadyState < 4: DoEvents: Wend
t = Timer
Do
DoEvents
On Error Resume Next
Set instock = ie.Document.querySelector(".stock.in-stock").innerText
If instock Is Nothing Then
Set availability = ie.Document.querySelector(".stock.out-of-stock").innerText
Else
Set availability = instock
End If
wks.Cells(i, "D") = availability
If Timer - t > MAX_WAIT_SEC Then Exit Do
On Error GoTo 0
Loop
Next i
But I always get nothing from this line:
Set instock = ie.Document.querySelector(".stock.in-stock").innerText
I checked the selector on
https://try.jsoup.org/
and it works.
What am I doing wrong here? There is no id to target, only a class name:
<p class="stock in-stock">Διαθέσιμο</p>

What's happening here is that you're using Set to assign innerText, which is a String, to the object variable instock. The reason it ends up as Nothing is that your On Error Resume Next statement suppresses the error. If you took that out and ran the code, you would get a run-time error, because a String cannot be assigned to an object variable with Set. What you need to do is split it into one line that assigns the element to the object variable, and then a line that reads the innerText of that element once you know it exists.
Set instock = ie.Document.querySelector(".stock.in-stock")
If instock Is Nothing Then
Set availability = ie.Document.querySelector(".stock.out-of-stock")
Else
Set availability = instock
End If
wks.Cells(i, "D") = availability.innerText
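For comparison, here is the same look-up-then-read logic as a small Python sketch using only the standard library's html.parser. ClassTextFinder, text_of_first and availability are hypothetical names, None plays the role of VBA's Nothing, and the parser is only meant for simple markup like the snippet in the question.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class ClassTextFinder(HTMLParser):
    """Captures the text of the first element whose class list contains all wanted classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.depth = 0        # > 0 while we are inside the matched element
        self.found = False    # becomes True once a matching element is seen
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag, so they must not change depth
        if self.depth:
            self.depth += 1   # nested element inside the match
        elif not self.found:
            classes = set((dict(attrs).get("class") or "").split())
            if self.wanted <= classes:
                self.found = True
                self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_of_first(html, classes):
    """Return the matched element's text, or None when nothing matches (VBA's Nothing)."""
    finder = ClassTextFinder(classes)
    finder.feed(html)
    finder.close()
    return "".join(finder.parts).strip() if finder.found else None


def availability(html):
    # Look up the element first, and only read its text once we know it exists.
    text = text_of_first(html, ["stock", "in-stock"])
    if text is None:
        text = text_of_first(html, ["stock", "out-of-stock"])
    return text
```

With the markup from the question, availability('<p class="stock in-stock">Διαθέσιμο</p>') returns 'Διαθέσιμο'.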

There is a better, faster way: use XMLHTTP and parse the information out of the JSON stored in one of the script tags. If you issue large numbers of requests, you may need to add a wait every x requests in case of throttling or blocking. Note: you can use the same approach with Internet Explorer and thus remove many of your lines of code, though you then have another library (.bas) dependency.
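You need to install jsonconverter.bas from here, then go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime. The XMLHTTP version also needs a reference to Microsoft HTML Object Library (for HTMLDocument), and the Internet Explorer version needs Microsoft Internet Controls (for InternetExplorer).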
Option Explicit
Public Sub GetStocking()
Dim json As Object, html As HTMLDocument
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.wavemotion.gr/shop/smartphone-accessories/easy-one-touch-wireless-fast-charging-mount", False
.send
html.body.innerHTML = StrConv(.responseBody, vbUnicode)
End With
Set json = JsonConverter.ParseJson(html.querySelector("script[type='application/ld+json']").innerHTML)
Debug.Print json("offers")("availability")
End Sub
Internet Explorer version:
Option Explicit
Public Sub GetInfo()
Dim ie As New InternetExplorer, i As Long, s As String, scripts As Object, json As Object
With ie
.Visible = False
.Navigate2 "https://www.wavemotion.gr/shop/smartphone-accessories/easy-one-touch-wireless-fast-charging-mount"
While .Busy Or .readyState < 4: DoEvents: Wend
Set scripts = .document.querySelectorAll("script[type='application/ld+json']")
For i = 0 To scripts.Length - 1
s = scripts.item(i).innerHTML
If InStr(s, "availability") > 0 Then
Set json = JsonConverter.ParseJson(s)
Exit For
End If
Next
.Quit
If Not json Is Nothing Then Debug.Print json("offers")("availability")
End With
End Sub
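The JSON-LD idea in both versions does not depend on VBA. Below is a rough Python sketch, assuming the product's script tag has a top-level "offers" object, as json("offers")("availability") implies. LdJsonCollector and availability_from_ld_json are hypothetical names; JSON-LD can also use arrays, which this sketch simply skips.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.blocks[-1] += data


def availability_from_ld_json(html):
    collector = LdJsonCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        if "availability" not in raw:
            continue  # same filter as the InStr check in the IE version
        data = json.loads(raw)
        offers = data.get("offers") if isinstance(data, dict) else None
        if isinstance(offers, dict):
            return offers.get("availability")
    return None
```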

Related

Scraping using VBA

I am trying to extract one figure from a government website. I have done a lot of googling and am running out of ideas. My code below returns a figure, but it isn't the one I want, and I am not entirely sure why.
I want to extract the Southend-on-Sea case number from the 'Upper tier LA' section of the 'Cases by Area (Whole Pandemic)' table.
https://coronavirus.data.gov.uk/details/cases
I took this code from somewhere online and tried to adapt it using the class name I found in the F12 developer tools on the site.
Sub ExtractLastValue()
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
objIE.Visible = True
objIE.Navigate ("https://coronavirus.data.gov.uk/details/cases")
Do
DoEvents
Loop Until objIE.readystate = 4
MsgBox objIE.document.getElementsByClassName("sc-bYEvPH khGBIg govuk-table__cell govuk-table__cell--numeric ")(0).innerText
Set objIE = Nothing
End Sub
The data comes from the official API, which returns a JSON response that is loaded dynamically on that page when you click the Upper tier LA panel.
Have a look at, and play with, the API guidance here:
https://coronavirus.data.gov.uk/details/developers-guide
You can make a direct XHR request by following the guidance in the API documentation and then use a JSON parser to handle the response. For your request it would be something like the following:
https://coronavirus.data.gov.uk/api/v1/data?filters=areaName=Southend-on-Sea&areaType=utla&latestBy=cumCasesByPublishDate&structure={"date":"date","areaName":"areaName","cumCasesByPublishDate":"cumCasesByPublishDate","cumCasesByPublishDateRate":"cumCasesByPublishDateRate"}
XHR:
A worked example using jsonconverter.bas as the JSON parser:
Option Explicit
Public Sub GetCovidNumbers()
Dim http As Object, json As Object
Set http = CreateObject("MSXML2.XMLHTTP")
With http
.Open "GET", "https://coronavirus.data.gov.uk/api/v1/data?filters=areaName=Southend-on-Sea&areaType=utla&latestBy=cumCasesByPublishDate&structure={""date"":""date"",""areaName"":""areaName"",""cumCasesByPublishDate"":""cumCasesByPublishDate"",""cumCasesByPublishDateRate"":""cumCasesByPublishDateRate""}", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
Set json = JsonConverter.ParseJson(.responseText)("data")(1)
End With
With ActiveSheet
Dim arr()
arr = json.Keys
.Cells(1, 1).Resize(1, UBound(arr) + 1) = arr
arr = json.Items
.Cells(2, 1).Resize(1, UBound(arr) + 1) = arr
End With
End Sub
JSON library (used in the above solution):
I use jsonconverter.bas. Download the raw code from here and add it to a standard module called JsonConverter. Then go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
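The same request, sketched in Python to show how the query string and the response fit together. build_query_url and rows_from_response are hypothetical helpers; the URL keeps the parameter layout from the answer as-is, nothing is sent over the network, and the sample response below uses placeholder values, not real figures.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://coronavirus.data.gov.uk/api/v1/data"


def build_query_url(area_name, area_type, latest_by, metrics):
    """Build the query URL used in the answer; this only builds a string, it sends nothing."""
    # "structure" maps each output field name to the API field name; here they are identical
    structure = {name: name for name in metrics}
    params = {
        "filters": f"areaName={area_name}",
        "areaType": area_type,
        "latestBy": latest_by,
        "structure": json.dumps(structure, separators=(",", ":")),
    }
    # urlencode percent-encodes the braces, quotes and the "=" inside the filters value
    return BASE_URL + "?" + urlencode(params)


def rows_from_response(response_text):
    """Return (header_row, value_row) from the first record in the response's "data" list."""
    # Python lists are 0-based; the VBA uses ("data")(1) because JsonConverter
    # returns VBA Collections, which are 1-based
    record = json.loads(response_text)["data"][0]
    return list(record.keys()), list(record.values())


METRICS = ["date", "areaName", "cumCasesByPublishDate", "cumCasesByPublishDateRate"]
url = build_query_url("Southend-on-Sea", "utla", "cumCasesByPublishDate", METRICS)

# Dummy response with placeholder values (not real figures), only to show the shape
sample = ('{"data": [{"date": "2020-01-01", "areaName": "Southend-on-Sea", '
          '"cumCasesByPublishDate": 1, "cumCasesByPublishDateRate": 0.5}]}')
header, values = rows_from_response(sample)
```

The header and values lists correspond to json.Keys and json.Items, which the VBA writes to rows 1 and 2.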
Internet Explorer:
You could do a slower, more complicated Internet Explorer solution, where you need to select the utla option when it is present and then select the desired value from the table:
Option Explicit
Public Sub GetCovidNumbers()
'Tools references Microsoft Internet Controls and Microsoft HTML Object Library
Dim ie As SHDocVw.InternetExplorer, t As Single, ele As Object 'Timer returns a Single
Const MAX_WAIT_SEC As Long = 10
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.Navigate2 "https://coronavirus.data.gov.uk/details/cases"
While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
t = Timer 'timed loop for element to be present to click on (to get utla)
Do
On Error Resume Next
Set ele = .Document.querySelector("#card-cases_by_area_whole_pandemic [aria-label='Upper tier LA']")
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop While ele Is Nothing
If ele Is Nothing Then Exit Sub
ele.Click
While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
Dim table As MSHTML.HTMLTable, datetime As String, result()
Set table = .Document.querySelector("table[download='cumCasesByPublishDate,cumCasesByPublishDateRate']")
datetime = .Document.querySelector("time").getAttribute("datetime")
result = GetDataForUtla("Southend-on-Sea", datetime, table)
With ActiveSheet
.Cells(1, 1).Resize(1, 4) = Array("Datetime", "Area", "Cases", "Rate per 100,000 population")
.Cells(2, 1).Resize(1, UBound(result) + 1) = result
End With
.Quit
End With
End Sub
Public Function GetDataForUtla(ByVal utla As String, ByVal datetime As String, ByVal table As MSHTML.HTMLTable) As Variant
Dim row As MSHTML.HTMLTableRow, i As Long
For Each row In table.Rows
If InStr(row.outerHTML, utla) > 0 Then
Dim arr(3) 'datetime + 3 cells, matching the 4 header columns
arr(0) = datetime
For i = 0 To 2
arr(i + 1) = row.Children(i).innerText
Next
GetDataForUtla = arr
Exit Function
End If
Next
GetDataForUtla = Array("Not found")
End Function
References:
https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors
https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector
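The row search in GetDataForUtla, sketched in Python over a table that is assumed to be already reduced to lists of cell text (getting those lists from the page is left out). get_data_for_utla is a hypothetical port; unlike the VBA, which searches each row's outerHTML, it matches on cell text.

```python
def get_data_for_utla(utla, datetime_str, rows):
    """Port of GetDataForUtla: rows is a list of table rows, each a list of cell texts."""
    for row in rows:
        # The VBA version searches the row's outerHTML; here we search the cell texts instead
        if any(utla in cell for cell in row):
            # datetime followed by the first three cells (area, cases, rate)
            return [datetime_str] + row[:3]
    return ["Not found"]
```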

Webscraping of product prices and specs

I'm new to the VBA world and I was trying to create an Excel file to get the cheapest price and the product name from a website. I've created one file that goes through the whole search list, gathers each product URL and places it in a spreadsheet. My challenge now is making this second macro work. It starts out getting the information without any issues, but after 10 or more URLs the macro hits a bug and starts repeating the same information in all the following cells. Is there a way I can make the code run slower so it doesn't run into this problem?
Below are my code and a sample of the URLs I'm scraping.
Here's the code I've been using:
Sub test()
Dim URL As String
Set ie = CreateObject("internetexplorer.application")
For i = 2 To 300
URL = Cells(i, 1).Value
ie.navigate URL
ie.Visible = False
Do While ie.busy And ie.readystate <> "readystate_complete"
DoEvents
Loop
Cells(i, 3) = ie.document.getElementsByTagName("h1")(0).innerText
Cells(i, 4) = ie.document.getElementsByTagName("strong")(0).innerText
Next i
ie.Quit
MsgBox "acabou"
End Sub
<https://www.zoom.com.br/notebook/notebook-vaio-vjf157f11x-b0211s-intel-core-i5-8250u-15-6-8gb-ssd-256-gb-windows-10-8-geracao>
<https://www.zoom.com.br/notebook/notebook-samsung-chromebook-plus-intel-celeron-3965y-12-2-4gb-emmc-32-gb-chrome-os-touchscreen>
<https://www.zoom.com.br/notebook/notebook-dell-xps-7390-intel-core-i7-10710u-13-3-16gb-ssd-512-gb-windows-10-touchscreen>
<https://www.zoom.com.br/notebook/notebook-dell-i15-3583-a5-intel-core-i7-8565u-15-6-8gb-hd-2-tb-windows-10-8-geracao>
<https://www.zoom.com.br/notebook/notebook-lenovo-b330-intel-core-i5-8250u-15-6-4gb-hd-1-tb-windows-10-8-geracao>
<https://www.zoom.com.br/notebook/notebook-dell-i15-7580-a20-intel-core-i7-8550u-15-6-8gb-hd-1-tb-geforce-mx150-windows-10>
<https://www.zoom.com.br/notebook/notebook-dell-i14-3480-u30-intel-core-i5-8265u-14-4gb-hd-1-tb-linux-8-geracao>
<https://www.zoom.com.br/notebook/macbook-pro-apple-muhn2bz-intel-core-i5-13-3-8gb-ssd-128-gb-tela-de-retina>
<https://www.zoom.com.br/notebook/notebook-multilaser-pc150-amd-a4-9120-14-2gb-emmc-32-gb-windows-10>
<https://www.zoom.com.br/notebook/notebook-samsung-np930qaa-kw1br-intel-core-i7-8550u-13-3-8gb-ssd-256-gb-windows-10-touchscreen>
<https://www.zoom.com.br/notebook/notebook-acer-a515-51g-58vh-intel-core-i5-7200u-15-6-8gb-hd-1-tb-geforce-940mx>
<https://www.zoom.com.br/notebook/notebook-multilaser-pc222-intel-celeron-dual-core-13-3-4gb-emmc-64-gb-windows-10>
<https://www.zoom.com.br/notebook/notebook-acer-pt515-51-788a-intel-core-i7-9750h-15-6-32gb-ssd-1-tb-geforce-rtx-2070-windows-10>
<https://www.zoom.com.br/notebook/notebook-acer-a315-53-53ak-intel-core-i5-7200u-15-6-4gb-hd-1-tb-windows-10-7-geracao>
<https://www.zoom.com.br/notebook/notebook-dell-i15-5584-m40-intel-core-i7-8565u-15-6-8gb-hd-2-tb-geforce-mx130-windows-10>
<https://www.zoom.com.br/notebook/notebook-acer-a315-41g-r21b-amd-ryzen-5-2500u-15-6-8gb-hd-1-tb-radeon-535-windows-10>
<https://www.zoom.com.br/notebook/notebook-positivo-master-n2140-intel-core-i3-7020u-14-4gb-hd-500-gb-windows-10-7-geracao>
<https://www.zoom.com.br/notebook/notebook-multilaser-pc101-intel-atom-14-1gb-ssd-32-gb-windows-10>
<https://www.zoom.com.br/notebook/notebook-lenovo-b330-intel-core-i5-8250u-15-6-8gb-hd-1-tb-windows-10-8-geracao>
<https://www.zoom.com.br/notebook/notebook-acer-an515-51-77fh-intel-core-i7-7700hq-15-6-8gb-hd-1-tb-geforce-gtx-1050-windows-10>
<https://www.zoom.com.br/notebook/notebook-dell-i15-3583-a2yp-intel-core-i5-8265u-15-6-4gb-optane-16-gb-hd-1-tb-windows-10>
<https://www.zoom.com.br/notebook/notebook-asus-g531gt-intel-core-i7-9750h-15-6-16gb-ssd-512-gb-geforce-gtx-1650-windows-10>
<https://www.zoom.com.br/notebook/notebook-vaio-fit-15s-intel-core-i3-7100u-15-6-4gb-hd-1-tb-windows-10-home>
<https://www.zoom.com.br/notebook/notebook-samsung-s50-intel-core-i7-7500u-13-3-8gb-ssd-256-gb-windows-10-style>
<https://www.zoom.com.br/notebook/notebook-lenovo-b330-intel-core-i3-7020u-15-6-4gb-ssd-120-gb-windows-10-7-geracao>
First of all:
Always declare all variables. To enforce this, always write Option Explicit as the first line of each module. That way, typos in variable names in particular are detected immediately by the compiler.
Here is what to do about your problem:
IE is sometimes a real diva. For example, it doesn't like processing URLs in quick succession in the same instance. It is therefore advisable to quit it and start a fresh instance for each new URL.
For it to restart quickly, IE's settings must not be configured to delete cookies, the cache, etc.; otherwise automation errors will occur.
Try this macro. It works with the given URLs:
Option Explicit
Sub test()
Dim URL As String
Dim ie As Object
Dim i As Long
For i = 2 To 300
If i > 14 Then
ActiveWindow.SmallScroll down:=1
End If
URL = ActiveSheet.Cells(i, 1).Value
Set ie = CreateObject("internetexplorer.application")
ie.navigate URL
ie.Visible = False
Do While ie.readystate <> 4: DoEvents: Loop
ActiveSheet.Cells(i, 3) = ie.document.getElementsByTagName("h1")(0).innerText
ActiveSheet.Cells(i, 4) = ie.document.getElementsByTagName("strong")(0).innerText
ie.Quit
Set ie = Nothing
Next i
MsgBox "acabou"
End Sub
Try this (it uses XMLHTTP instead of Internet Explorer):
Sub GetPrices()
'Requires references to Microsoft XML, v6.0 and Microsoft HTML Object Library
Dim html As MSHTML.HTMLDocument, r As Long
For r = 1 To 4
Set html = GetHTML(Cells(r, 1).Value)
Cells(r, 3).Value = html.querySelector("h1.product-name").innerText
Cells(r, 4).Value = Replace(Replace(html.querySelector(".product-price").innerText, "a partir de ", vbNullString), ":( ", "")
Set html = Nothing
Next r
End Sub
Function GetHTML(ByVal sURL As String) As HTMLDocument
Dim http As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument
Set http = New MSXML2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
With http
.Open "Get", sURL, False
.send
html.body.innerHTML = .responseText
End With
Set GetHTML = html
End Function
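A rough Python equivalent of the parsing step in GetPrices, assuming the page HTML has already been downloaded. FirstMatchText, first_text and scrape_product are hypothetical names; the selectors h1.product-name and .product-price and the Replace clean-up come from the answer, and the parser only copes with reasonably simple markup.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class FirstMatchText(HTMLParser):
    """Captures the text of the first element with the given class (and tag, unless tag is None)."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self.depth = 0
        self.matched = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # no end tag will follow, so don't touch depth
        if self.depth:
            self.depth += 1
        elif not self.matched and (self.tag is None or tag == self.tag):
            if self.css_class in (dict(attrs).get("class") or "").split():
                self.matched = True
                self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def first_text(html, tag, css_class):
    parser = FirstMatchText(tag, css_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.matched else None


def scrape_product(html):
    """Rough equivalent of the two querySelector calls plus the Replace clean-up."""
    name = first_text(html, "h1", "product-name")     # h1.product-name
    price = first_text(html, None, "product-price")   # .product-price
    if price is not None:
        price = price.replace("a partir de ", "").replace(":( ", "").strip()
    return name, price
```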
In Python I would do this.
First import the functions:
from time import sleep
from random import randint
Two options are available:
This pauses for 3 seconds before each request:
sleep(3)
This pauses for a random 1, 2 or 3 seconds between requests:
sleep(randint(1, 3))
Note: be aware of:
cookies, because sometimes you need to send a POST request to obtain some ID before the GET request;
the syntax of your headers, and set the Origin and Referer parameters correctly.
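The delay idea wrapped in a reusable loop, as a minimal sketch: fetch_all is a hypothetical helper, fetch stands for whatever function you use to download one page (for example one built on urllib.request), and sleep is a parameter only so the loop can be tested without actually waiting.

```python
import random
import time


def fetch_all(urls, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, sleeping a random min_delay..max_delay seconds between calls."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:  # no wait before the first request
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

Unlike randint, random.uniform gives fractional delays, which makes the request timing slightly less regular.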

Get a single value from a table with no ID with VBA

I am developing a web bot that scrapes import taxes from different countries' customs websites, and I have a problem retrieving the value I want from the following site: http://www.aduanet.gob.pe/itarancel/arancelS01Alias (using the test value 3303000000 next to CODIGO). The value I want to retrieve is the 6% next to "Ad / Valorem", but the table it is in has no ID, class, or anything else that would let me get to it directly, or at least near it. I have been trying to use the .parent and .child methods, but without success. My code so far is as follows:
Function Peru(partida As String) As String
'Open IE
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.navigate "http://www.aduanet.gob.pe/itarancel/arancelS01Alias"
'Load sub
Cargar
'Navigate further into the website (Im using partida = 3303000000)
For Each box In objIE.document.getElementsByTagName("input")
If box.Name = "cod_partida" Then
box.Value = partida
Exit For
End If
Next
For Each boton In objIE.document.getElementsByTagName("input")
If boton.Value = "Consultar" Then
boton.Click
Exit For
End If
Next
'Get the 6% value (This part is the one I cant figure out)
End Function
This is how you can get the data from that page. You need to switch into two nested iframes on the page to reach the required content.
Sub Aduanet_Info()
Dim IE As New InternetExplorer, html As HTMLDocument
Dim elem As Object, frm As Object, frm1 As Object
With IE
.Visible = False
.navigate "http://www.aduanet.gob.pe/itarancel/arancelS01Alias"
Do While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Loop
Set html = .document
End With
html.getElementsByTagName("input")(0).Value = "3303000000"
html.getElementsByTagName("input")(3).Click
Application.Wait Now + TimeValue("00:00:05")
Set frm = html.getElementsByClassName("autoHeight")(0).contentWindow.document
Set frm1 = frm.getElementsByClassName("autoHeight")(1).contentWindow.document
For Each elem In frm1.getElementsByTagName("td")
If InStr(elem.innerText, "Valorem") > 0 Then MsgBox elem.NextSibling.NextSibling.innerText: Exit For
Next elem
IE.Quit
End Sub
Output:
6%
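The same "find the label cell, then read the next cell" idea as a Python sketch over plain HTML. It assumes you already have the markup of the innermost iframe's document (the VBA reaches it through the two contentWindow.document hops); CellCollector and value_after_label are hypothetical names.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells.append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data


def value_after_label(html, label):
    """Return the text of the cell that follows the first cell containing label."""
    collector = CellCollector()
    collector.feed(html)
    collector.close()
    for i, text in enumerate(collector.cells[:-1]):
        if label in text:
            return collector.cells[i + 1].strip()
    return None
```

The VBA needs NextSibling.NextSibling because the first sibling is the whitespace text node between the two <td> elements; collecting only <td> text sidesteps that.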

VBA error when navigating with Internet Explorer

I am trying to download a table of proprietary investments/positions/pricing from Nationwide. The code seems to do what I want, EXCEPT for producing an "object required" error when I attempt to select (click) a particular account.
I thought I had the proper code to tell my macro to wait until IE was ready to go on, but clearly I am missing something.
In the code, the relevant line is marked with a comment. If I put a Stop statement above the error line, I can wait until I see the link appear, then continue the code, and it runs as expected.
Because this goes to my financial accounts, I cannot provide the user name and password to allow someone to replicate the exact problem, but here is the code, and the error message and highlight. Suggestions appreciated.
Option Explicit
'set Reference to Microsoft Internet Controls
Sub DownLoadFunds()
Dim IE As InternetExplorer
Dim sHTML
Const sURL As String = "https://www.nationwide.com/access/web/login.htm"
Const sURL2 As String = "https://isc.nwservicecenter.com/iApp/isc/app/ia/balanceDetail.do?basho.menuNodeId=12245"
Dim wsTemp As Worksheet
Set wsTemp = Worksheets("Scratch")
Set IE = New InternetExplorer
With IE
.Navigate sURL
.Visible = True 'for debugging
Do While .ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Do While .Busy = True
DoEvents
Loop
'Login: User Name and Password "remembered" by IE
.Document.all("submitButton").Click
Do While .ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Do While .Busy = True
DoEvents
Loop
'Select this account to show
.Document.all("RothIRA_#########").Click '<--Error at this line
Do While .ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Do While .Busy = True
DoEvents
Loop
.Navigate sURL2
Do While .ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Do While .Busy = True
DoEvents
Loop
Set sHTML = .Document.GetElementByID("fundByFundOnly")
With wsTemp
.Cells.Clear
.Range("a2") = sHTML.innertext
End With
.Quit
End With
Set IE = Nothing
End Sub
EDIT:
At Tim Williams' suggestion, I added a loop to test for the presence of the desired element. This seems to work:
...
On Error Resume Next
Do
Err.Clear
DoEvents
Application.Wait (Time + TimeSerial(0, 0, 1))
.Document.getelementbyid("RothIRA_#########").Click
Loop Until Err.Number = 0
On Error GoTo 0
....
IE.Document.all("RothIRA_....") is returning Nothing (null in other languages), so calling the Click method on it causes the error.
Your code is the same as doing this:
Dim rothElement As Object
Set rothElement = IE.Document.all("RothIRA_....")
rothElement.Click
...when you should do this:
Dim rothElement As Object
Set rothElement = IE.Document.all("RothIRA_....")
If Not rothElement Is Nothing Then
rothElement.Click
End If
I suggest using the modern document.getElementById method instead of the deprecated (if not obsolete) document.all API.
It's possible, even likely, that the page uses script to dynamically load some content or generate some layout after your "wait" loop has finished. That loop only waits until all linked content and resources have been loaded; it does not wait for scripts on the loaded page to finish, etc.
One approach is to loop your code waiting for the desired element to be rendered:
Const MAX_WAIT_SEC as Long = 5 'limit on how long to wait...
Dim t
t = Timer
Do While .Document.all("RothIRA_#########") Is Nothing
DoEvents
'or you can Sleep here
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop
'carry on...
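The wait-with-timeout pattern is not specific to IE. Here it is as a small, generic Python sketch: probe is any zero-argument function that returns None until the thing you are waiting for exists (the role played by the .Document.all(...) Is Nothing check), and wait_for is a hypothetical helper, not a library function. The clock and sleep parameters exist only so the loop can be tested without real waiting.

```python
import time


def wait_for(probe, timeout=5.0, interval=0.25, clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns something other than None, or until timeout seconds pass.

    Returns the probe's result, or None on timeout.
    """
    deadline = clock() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)
```

time.monotonic is used rather than the wall clock so the timeout is not affected by system clock changes; VBA's Timer, by contrast, resets at midnight.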

Web-scraping on intranet

I wrote a VBA code to scrape data from my company's intranet.
Issues:
The below error occurs:
Run-time error '91':
Object variable or With block variable not set
It happens on:
myPoints = Trim(Doc.getElementsByName("price")(0).getAttribute("value"))
When I debug it and run it line by line, it retrieves all the values.
Input and Output:
I enter multiple product IDs in column B and retrieve the data into column C:
Column B = product ID
Column C = price
HTML:
<td id="myPower_val_9" style="visibility: visible;">
<input type="text" disabled="disabled" value="300" name="price"></input>
</td>
VBA:
Sub Button1_Click()
Dim ie As Object
Dim r As Integer
Dim myPoints As String
Dim Doc As HTMLDocument
Set ie = New InternetExplorerMedium
For r = 2 To Range("B65535").End(xlUp).Row
With ie
.Visible = 0
.navigate "www.example.com/product/" & Cells(r, "B").Value
Do Until .readyState = 4
DoEvents
Loop
End With
Set Doc = ie.document
myPoints = Trim(Doc.getElementsByName("price")(0).getAttribute("value"))
Cells(r, "C").Value = myPoints
Next r
End Sub
Have I missed an error handler?
You need to wait for the document to be fully rendered and the DOM available before accessing any elements. ie.ReadyState changes to READYSTATE_COMPLETE once the page connects and starts loading. The reason that your code works when debugging is that in the couple of seconds it takes for you to start working with the debugger, the page finishes loading.
With ie
.Visible = True
.Navigate "www.example.com/product/" & Cells(r, "B").Value
Do Until .ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
Do Until .Document.ReadyState = "complete"
DoEvents
Loop
End With
I would also recommend that you make the IE window visible, at least while you're developing. Once your functionality is complete and debugged, you can make the window invisible. Keep in mind that if you forget to close your invisible IE windows when your code finishes, your users will end up with runaway iexplore.exe processes.
If you only want to ignore the error and continue with the next iteration, use this modified code:
Sub Button1_Click()
Dim ie As Object
Dim r As Integer
Dim myPoints As String
Dim Doc As HTMLDocument
Set ie = New InternetExplorerMedium
For r = 2 To Range("B65535").End(xlUp).Row
With ie
.Visible = 0
.navigate "www.example.com/product/" & Cells(r, "B").Value
Do Until .readyState = 4
DoEvents
Loop
End With
Set Doc = ie.document
'Edit:
myPoints = ""
On Error Resume Next
myPoints = Trim(Doc.getElementsByName("price")(0).getAttribute("value"))
On Error Goto 0
Cells(r, "C").Value = myPoints
Next r
End Sub
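You could also loop until the element is set (add a timeout clause as well):
Dim a As Object, t As Single
Const MAX_WAIT_SEC As Long = 10 'timeout in seconds
t = Timer
Set a = Nothing 'reset, in case a still holds the element from a previous row
Do
DoEvents
On Error Resume Next
'take the first match: getElementsByName returns a collection, which is never Nothing
Set a = Doc.getElementsByName("price")(0)
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop While a Is Nothing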
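Once the element is present, the read itself is simple. Here is a Python sketch of the Trim(getAttribute("value")) step against the HTML from the question. InputValueFinder and input_value are hypothetical names, and the function returns None rather than failing when no matching input exists.

```python
from html.parser import HTMLParser


class InputValueFinder(HTMLParser):
    """Finds the value attribute of the first <input> with the given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.found = False
        self.value = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if not self.found and tag == "input" and attributes.get("name") == self.name:
            self.found = True
            self.value = attributes.get("value")


def input_value(html, name):
    finder = InputValueFinder(name)
    finder.feed(html)
    finder.close()
    # Like Trim(...getAttribute("value")), but a missing element gives None
    # instead of the "object variable not set" failure (error 91) seen in VBA
    return finder.value.strip() if finder.value is not None else None
```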
