I have this piece of code that opens a web page with two input boxes. I am trying to display the currency table for a date other than the default, but it does not work. Everything works only when I click the "Report" button with the mouse; then I can display any date.
Does anyone have any idea?
I have already tried "Application.SendKeys ("{ENTER}"), True" and different date formats. I also looked for information about the hasDatepicker class...
Sub getDataFrombrowser()
Dim address As String
Dim browser As InternetExplorer
Set browser = New InternetExplorerMedium
With browser
.Visible = True
End With
address = "http://www.nbrm.mk/kursna_lista-en.nspx"
With browser
.navigate address
Do While .Busy Or .readyState <> 4: DoEvents: Loop
.navigate address
Do While .Busy Or .readyState <> 4: DoEvents: Loop
End With
browser.document.getElementsByClassName("form-control sdate hasDatepicker")(0).Value = Format(Date - 1, "DD.MM.YYYY")
browser.document.getElementsByClassName("form-control edate hasDatepicker")(0).Value = Format(Date - 1, "DD.MM.YYYY")
Set objCollection = browser.document.getElementsByTagName("input")
objCollection(7).Click
End Sub
You can mimic the POST request the page makes and use XMLHTTP rather than a slow browser. You get a JSON response, which you can handle with a JSON parser to extract whatever information you want. I extract everything. The headers are in Macedonian, but you could substitute your own hard-coded English values.
Download the JSON parser (JsonConverter.bas from the VBA-JSON project) from the link in the code comment below.
You specify the start and end dates in the body of the request.
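Rather than hard-coding the dates into the JSON string in the code below, you could build the request body from two Date values. This is a minimal sketch: BuildRatesBody is a hypothetical helper name, and it assumes the service expects dd.mm.yyyy dates and the same isStateAuth value of "0" as the example request.

```vba
Option Explicit

' Hypothetical helper: builds the JSON body for the GetEXRates POST request.
' Assumes the service expects dd.mm.yyyy dates, as in the hard-coded example.
Public Function BuildRatesBody(ByVal startDate As Date, ByVal endDate As Date) As String
    ' The backslashes make the dots literal, whatever the regional date separator is
    Const DATE_FORMAT As String = "dd\.mm\.yyyy"
    BuildRatesBody = "{""startDate"":""" & Format$(startDate, DATE_FORMAT) & _
                     """,""endDate"":""" & Format$(endDate, DATE_FORMAT) & _
                     """,""isStateAuth"":""0""}"
End Function
```

For example, body = BuildRatesBody(Date - 30, Date) would request roughly the last month of rates.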
Public Sub GetRates()
'install https://github.com/VBA-tools/VBA-JSON/blob/master/JsonConverter.bas and add to project
'VBE > Tools > References > Microsoft Scripting Runtime
Dim json As Object, body As String
Dim ws As Worksheet, results(), headers()
body = "{""startDate"":""23.03.2019"",""endDate"":""21.04.2019"",""isStateAuth"":""0""}"
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "POST", "http://www.nbrm.mk/services/ExchangeRates.asmx/GetEXRates", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "Content-Type", "application/json; charset=UTF-8"
.setRequestHeader "Referer", "http://www.nbrm.mk/kursna_lista-en.nspx"
.setRequestHeader "Content-Length", Len(body)
.send body
Set json = JsonConverter.ParseJson(.responseText)
Dim ratesParent As Object, rates As Object, rate As Object, header As Object
Set ratesParent = json("d")
Set header = ratesParent.item(1)("ExchangeRates").item(1)
ReDim results(1 To 10000, 1 To header.Count)
ReDim headers(1 To header.Count)
Dim key As Variant, c As Long, r As Long
headers = header.keys
For Each rates In ratesParent
For Each rate In rates("ExchangeRates") 'dictionaries
r = r + 1: c = 1
For Each key In rate.keys
results(r, c) = rate(key)
c = c + 1
Next
Next
Next
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End With
End Sub
Related
I am working on code to get data from: https://www.investing.com/economic-calendar/core-durable-goods-orders-59
I have code that gets historical data via an HTTP request, but I want to change it to work for the economic data at the link above. Is there any way I can do the same for the economic indicators?
Code below:
Option Explicit
Sub Export_Table()
'Html Objects---------------------------------------'
Dim htmlDoc As MSHTML.HTMLDocument
Dim htmlBody As MSHTML.htmlBody
Dim ieTable As MSHTML.HTMLTable
Dim Element As Object 'each table row (tr); a row is not an HTMLElementCollection
'Workbooks, Worksheets, Ranges, LastRow, Incrementers ----------------'
Dim wb As Workbook
Dim Table As Worksheet
Dim i As Long
Set wb = ThisWorkbook
Set Table = wb.Worksheets("Sheet1")
'-------------------------------------------'
Dim xmlHttpRequest As New MSXML2.XMLHTTP60 '
'-------------------------------------------'
i = 2
'Web Request --------------------------------------------------------------------------'
With xmlHttpRequest
.Open "POST", "https://www.investing.com/instruments/HistoricalDataAjax", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "X-Requested-With", "XMLHttpRequest"
.send "curr_id=951681&smlID=1695217&header=CLNX+Historical+Data&st_date=01%2F01%2F2017&end_date=03%2F01%2F2019&interval_sec=Monthly&sort_col=date&sort_ord=DESC&action=historical_data"
If .Status = 200 Then
Set htmlDoc = CreateHTMLDoc
Set htmlBody = htmlDoc.body
htmlBody.innerHTML = xmlHttpRequest.responseText
Set ieTable = htmlDoc.getElementById("curr_table")
For Each Element In ieTable.getElementsByTagName("tr")
Table.Cells(i, 1) = Element.Children(0).innerText
Table.Cells(i, 2) = Element.Children(1).innerText
Table.Cells(i, 3) = Element.Children(2).innerText
Table.Cells(i, 4) = Element.Children(3).innerText
Table.Cells(i, 5) = Element.Children(4).innerText
Table.Cells(i, 6) = Element.Children(5).innerText
Table.Cells(i, 7) = Element.Children(6).innerText
i = i + 1
DoEvents: Next Element
End If
End With
Set xmlHttpRequest = Nothing
Set htmlDoc = Nothing
Set htmlBody = Nothing
Set ieTable = Nothing
Set Element = Nothing
End Sub
Public Function CreateHTMLDoc() As MSHTML.HTMLDocument
Set CreateHTMLDoc = CreateObject("htmlfile")
End Function
I have used the Excel tool Power Query (also called Get & Transform Data) for this very thing. I don't like using VBA for this type of thing.
To make it work:
1. In Excel, go to Data > Get Data > From Other Sources > From Web.
2. Enter the URL.
3. Wait for the web page to load, then pick the table you want. This website took a while to load, but it did work for me.
4. Choose "Load" to send the data directly to the sheet, or "Transform Data" to manipulate it in Power Query. Power Query offers many options, such as splitting columns, filtering data and calculating columns.
I would avoid the overhead of setting up a permanent connection and simply continue using XHR. With Data > From Web, you cannot grab more rows than are present on the initial landing page. If, however, you go with XHR, you can issue POST requests to get more data. The code below uses a loop to retrieve results beyond those immediately visible on the page.
When you press the "Show more" link, a POST request is made for an additional 6 rows, using the latest date from the current set of results as part of the POST body. The response is JSON. Rather than bringing in a JSON parser, given the simple, standard shape of the JSON, and since I am already using a regex to clean the date format in column 1 for the POST body, I use two simple regexes: one to extract the HTML for the next results table from the response, and one to check whether there are more results.
The format of the JSON is:
{
"historyRows": "<tr>…..</tr>",
"hasMoreHistory": "1"
}
Or
{
"historyRows": "<tr>…..</tr>",
"hasMoreHistory": false
}
So I do some cleaning of the extracted HTML (replacing the escaped \/ with /) so as not to confuse the HTML parser in MSHTML. I also wrap the rows in a table with an id, so I can continue to use an id CSS selector (#) list within my UpdateDateResults function.
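The two regex steps and the clean-up can be pulled out into small functions that are easy to test in isolation. This is a sketch, not the code used below: MoreHistoryAvailable, HistoryRowsAsTable and FirstSubMatch are hypothetical names, the patterns are the two regexes from this answer, and it assumes the compact JSON shape shown above. Comparing the captured text with "false" as a string avoids relying on VBA converting a quoted "1" to a Boolean.

```vba
Option Explicit

' Returns the first capture group of pattern in text, or "" if there is no match.
' Uses the late-bound VBScript regex engine (Windows only).
Private Function FirstSubMatch(ByVal text As String, ByVal pattern As String) As String
    Dim re As Object, matches As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pattern
    Set matches = re.Execute(text)
    If matches.Count > 0 Then FirstSubMatch = matches(0).SubMatches(0)
End Function

' True unless hasMoreHistory is the JSON literal false (or missing).
' The captured value is either "1" (including its quotes) or false.
Public Function MoreHistoryAvailable(ByVal json As String) As Boolean
    Dim value As String
    value = FirstSubMatch(json, "hasMoreHistory"":(""?.*?""?)}")
    MoreHistoryAvailable = (Len(value) > 0 And LCase$(value) <> "false")
End Function

' Extracts the escaped row HTML, unescapes \/ and wraps it in a table with id "me".
Public Function HistoryRowsAsTable(ByVal json As String) As String
    Dim rows As String
    rows = FirstSubMatch(json, "historyRows"":""(.*)"",")
    HistoryRowsAsTable = "<table id=""me"">" & Replace$(rows, "\/", "/") & "</table>"
End Function
```

The returned table HTML could then be assigned to html.body.innerHTML before calling UpdateDateResults.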
I initially oversize an array to store the rows of each retrieved table, and update it ByRef. I keep requesting more results until there are no more results, there is an error parsing the maximum date from column 1 of the last retrieved table, or my specified earliest date falls within the date range of the latest returned table.
Finally, I write the results array out to the sheet in one go.
N.B. You can target the table by its id. The number at the end of the id (59 in eventHistoryTable59) appears to match the number at the end of the indicator's URL, which suggests the code below could be generalized to work for other indicators.
VBA:
Option Explicit
Public Sub GetInvestingInfo()
'tools > references > Microsoft HTML Object Library
Dim html As MSHTML.HTMLDocument, xhr As Object
Const STARTDATE As Date = #11/25/2019# 'Adjust as required. A Const needs a date literal; for e.g. 2 years back, use a Date variable set to DateAdd("yyyy", -2, Date).
'The last batch may include some months earlier than the start date, but no additional request is issued after it.
Set xhr = CreateObject("MSXML2.XMLHTTP")
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", "https://www.investing.com/economic-calendar/core-durable-goods-orders-59", False
.setRequestHeader "User-Agent", "Safari/537.36"
.send
html.body.innerHTML = .responseText
End With
Dim firstTable As Boolean, r As Long, results() As Variant
ReDim results(1 To 100000, 1 To 5)
'process initial table and update results, get cleaned date needed for request for more results
firstTable = True
Dim latestDate As String
UpdateDateResults latestDate, results, firstTable, r, html
Dim re As Object, maxDate As Date, hasMoreHistory As Boolean, s As String
Set re = CreateObject("VBScript.RegExp")
With re
.Global = True
.MultiLine = False
End With
maxDate = cleanedDate(latestDate, re)
hasMoreHistory = True
Dim errorDate As Date
errorDate = DateAdd("d", 1, Date)
Do While maxDate >= STARTDATE And maxDate < errorDate 'break loop using pre-defined earliest date, error with date conversion, or when no more rows found
Application.Wait (Now + TimeSerial(0, 0, 1)) 'Pause
s = GetMoreRows(xhr, Format$(maxDate, "YYYY-MM-DD")) 'make a POST request for more data
re.Pattern = "hasMoreHistory"":(""?.*?""?)}" 'Check if there are more rows still available: "1" for yes, false for no
hasMoreHistory = (LCase$(re.Execute(s)(0).submatches(0)) <> "false") 'compare as text; the capture is either "1" (with quotes) or false
If Not hasMoreHistory Then Exit Do
re.Pattern = "historyRows"":""(.*)"","
html.body.innerHTML = "<table id=""me"">" & Replace$(re.Execute(s)(0).submatches(0), "\/", "/") & "</table>" 'fix html and feed into html variable
UpdateDateResults latestDate, results, firstTable, r, html
maxDate = cleanedDate(latestDate, re) 'convert value retrieved from last row in date column of table to an actual date
Loop
With ActiveSheet
.Cells.ClearContents
.Cells(1, 1).Resize(r, 5) = results 'No need to resize results, as all cells are cleared before writing out
End With
End Sub
Public Sub UpdateDateResults(ByRef latestDate As String, ByRef results() As Variant, ByRef firstTable As Boolean, ByRef r As Long, ByVal html As MSHTML.HTMLDocument)
Dim table As MSHTML.HTMLTable
Set table = html.querySelector("#eventHistoryTable59, #me")
latestDate = table.Rows(table.Rows.Length - 1).Children(0).innerText 'latest date is passed back to the caller ByRef
Dim i As Long, n As Long, j As Long
n = IIf(firstTable, 0, 1)
For i = n To table.Rows.Length - 1
r = r + 1
For j = 0 To table.Rows(i).Children.Length - 2
results(r, j + 1) = table.Rows(i).Children(j).innerText
Next
Next
firstTable = False
End Sub
Public Function cleanedDate(ByVal dirtyString As String, ByVal re As Object) As Date
re.Pattern = "(^[A-Z][a-z]{2}).*(\d{2}),.(\d{4})(.*)"
On Error GoTo errhand:
If re.test(dirtyString) Then
cleanedDate = CDate(re.Replace(dirtyString, "$2" & Chr$(32) & "$1" & Chr$(32) & "$3"))
Exit Function
End If
errhand:
cleanedDate = DateAdd("d", 1, Date)
End Function
Public Function GetMoreRows(ByVal xhr As Object, ByVal dateStamp As String) As String
With xhr
.Open "POST", "https://www.investing.com/economic-calendar/more-history", False
.setRequestHeader "User-Agent", "Safari/537.36"
.setRequestHeader "x-requested-with", "XMLHttpRequest"
.setRequestHeader "content-type", "application/x-www-form-urlencoded"
.send "eventID=430865&event_attr_ID=59&event_timestamp=" & dateStamp & "+" & Application.WorksheetFunction.EncodeURL("12:30:00") & "&is_speech=0"
GetMoreRows = .responseText
End With
End Function
Regexes (without the double " escaping for VBA):
hasMoreHistory":("?.*?"?)}
historyRows":"(.*)",
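Following the N.B. above, the body sent by GetMoreRows could be built from parameters instead of being hard-coded, as a first step towards generalizing. This is a sketch under assumptions: BuildMoreHistoryBody is a hypothetical name, 430865 and 59 are just the values used in GetMoreRows, the eventID for any other indicator would have to be looked up (for example in the browser's network tab), and the release time is assumed to be 12:30:00 unless you pass another. The colons are percent-encoded by hand with Replace$, so the helper doesn't depend on WorksheetFunction.EncodeURL.

```vba
Option Explicit

' Hypothetical helper: builds the form-encoded body for the more-history POST request.
' eventId and attrId are assumed to be looked up separately for each indicator.
Public Function BuildMoreHistoryBody(ByVal eventId As Long, ByVal attrId As Long, _
                                     ByVal dateStamp As String, _
                                     Optional ByVal releaseTime As String = "12:30:00") As String
    ' Percent-encode the colons in the time, e.g. 12:30:00 -> 12%3A30%3A00
    BuildMoreHistoryBody = "eventID=" & eventId & _
                           "&event_attr_ID=" & attrId & _
                           "&event_timestamp=" & dateStamp & "+" & Replace$(releaseTime, ":", "%3A") & _
                           "&is_speech=0"
End Function
```

Inside GetMoreRows, the send line would then become .send BuildMoreHistoryBody(430865, 59, dateStamp).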
I am trying to extract one figure from a gov website. I have done a lot of googling and I am running out of ideas. My code below returns a figure, but it isn't the figure I want, and I am not entirely sure why.
I want to extract the figure from the 'Cases by Area (Whole Pandemic)' table, 'Upper tier LA' section: the case number for 'Southend on Sea'.
https://coronavirus.data.gov.uk/details/cases
I took this code from somewhere online and tried to adapt it using the class name I found with F12 (developer tools) on the site.
Sub ExtractLastValue()
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
objIE.Visible = True
objIE.Navigate ("https://coronavirus.data.gov.uk/details/cases")
Do
DoEvents
Loop Until objIE.readystate = 4
MsgBox objIE.document.getElementsByClassName("sc-bYEvPH khGBIg govuk-table__cell govuk-table__cell--numeric ")(0).innerText
Set objIE = Nothing
End Sub
The data comes from the official API and is returned dynamically, as a JSON response, when you click the Upper tier panel on that page.
Have a look at, and play with, the API guidance here:
https://coronavirus.data.gov.uk/details/developers-guide
You can make a direct xhr request by following the guidance in the API documentation and then using a json parser to handle the response. For your request it would be something like the following:
https://coronavirus.data.gov.uk/api/v1/data?filters=areaName=Southend-on-Sea&areaType=utla&latestBy=cumCasesByPublishDate&structure={"date":"date","areaName":"areaName","cumCasesByPublishDate":"cumCasesByPublishDate","cumCasesByPublishDateRate":"cumCasesByPublishDateRate"}
XHR:
A worked example using jsonconverter.bas as the json parser
Option Explicit
Public Sub GetCovidNumbers()
Dim http As Object, json As Object
Set http = CreateObject("MSXML2.XMLHTTP")
With http
.Open "GET", "https://coronavirus.data.gov.uk/api/v1/data?filters=areaName=Southend-on-Sea&areaType=utla&latestBy=cumCasesByPublishDate&structure={""date"":""date"",""areaName"":""areaName"",""cumCasesByPublishDate"":""cumCasesByPublishDate"",""cumCasesByPublishDateRate"":""cumCasesByPublishDateRate""}", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
Set json = JsonConverter.ParseJson(.responseText)("data")(1)
End With
With ActiveSheet
Dim arr()
arr = json.Keys
.Cells(1, 1).Resize(1, UBound(arr) + 1) = arr
arr = json.Items
.Cells(2, 1).Resize(1, UBound(arr) + 1) = arr
End With
End Sub
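To change the area or the requested metrics without editing one long string, the URL in the example above could be assembled from its parts. A minimal sketch: BuildCovidApiUrl is a hypothetical helper that reproduces the same unencoded URL, mapping each field to itself in the structure parameter as the example does. If the server rejects the braces or quotes, you may need to URL-encode the structure value, for example with WorksheetFunction.EncodeURL (Excel 2013 and later).

```vba
Option Explicit

' Hypothetical helper: builds the v1 data URL used in the worked example.
' Each field name is mapped to itself in the structure parameter; nothing is URL-encoded.
Public Function BuildCovidApiUrl(ByVal areaName As String, ByVal areaType As String, _
                                 ByVal latestBy As String, ByVal fields As Variant) As String
    Dim parts() As String, i As Long
    ReDim parts(LBound(fields) To UBound(fields))
    For i = LBound(fields) To UBound(fields)
        parts(i) = """" & fields(i) & """:""" & fields(i) & """" ' "name":"name"
    Next
    BuildCovidApiUrl = "https://coronavirus.data.gov.uk/api/v1/data?filters=areaName=" & areaName & _
                       "&areaType=" & areaType & _
                       "&latestBy=" & latestBy & _
                       "&structure={" & Join(parts, ",") & "}"
End Function
```

In GetCovidNumbers, the request could then be opened with .Open "GET", BuildCovidApiUrl("Southend-on-Sea", "utla", "cumCasesByPublishDate", Array("date", "areaName", "cumCasesByPublishDate", "cumCasesByPublishDateRate")), False.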
Json library (Used in above solution):
I use jsonconverter.bas. Download the raw code from the VBA-JSON repository on GitHub (VBA-tools/VBA-JSON) and add it to a standard module called JsonConverter. Then go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
Internet Explorer:
You could also write a slower, more complicated Internet Explorer solution, in which you select the utla option once it is present and then pick the desired value from the table:
Option Explicit
Public Sub GetCovidNumbers()
'Tools references Microsoft Internet Controls and Microsoft HTML Object Library
Dim ie As SHDocVw.InternetExplorer, t As Single, ele As Object 'Timer returns seconds as a Single
Const MAX_WAIT_SEC As Long = 10
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.Navigate2 "https://coronavirus.data.gov.uk/details/cases"
While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
t = Timer 'timed loop for element to be present to click on (to get utla)
Do
On Error Resume Next
Set ele = .Document.querySelector("#card-cases_by_area_whole_pandemic [aria-label='Upper tier LA']")
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop While ele Is Nothing
If ele Is Nothing Then Exit Sub
ele.Click
While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
Dim table As MSHTML.HTMLTable, datetime As String, result()
Set table = .Document.querySelector("table[download='cumCasesByPublishDate,cumCasesByPublishDateRate']")
datetime = .Document.querySelector("time").getAttribute("datetime")
result = GetDataForUtla("Southend-on-Sea", datetime, table)
With ActiveSheet
.Cells(1, 1).Resize(1, 4) = Array("Datetime", "Area", "Cases", "Rate per 100,000 population")
.Cells(2, 1).Resize(1, UBound(result) + 1) = result
End With
.Quit
End With
End Sub
Public Function GetDataForUtla(ByVal utla As String, ByVal datetime As String, ByVal table As MSHTML.HTMLTable) As Variant
Dim row As MSHTML.HTMLTableRow, i As Long
For Each row In table.Rows
If InStr(row.outerHTML, utla) > 0 Then
Dim arr(3) 'datetime, area, cases, rate (matches the 4 headers written out)
arr(0) = datetime
For i = 0 To 2
arr(i + 1) = row.Children(i).innerText
Next
GetDataForUtla = arr
Exit Function
End If
Next
GetDataForUtla = Array("Not found")
End Function
References:
https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors
https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector
The Yahoo URL is not working when I try to retrieve quotes for a particular stock. There are several discussions about it; however, none of them seem to cover a VBA macro.
Sub Get_Data()
Dim URL As String
Dim Ticker As String
Dim http As New WinHttpRequest
Dim sCotes As String
Dim Lignes
Dim Valeurs
Dim i As Long
Dim j As Long
Dim sLigne As String
Dim sValeur As String
Ticker = Range("Ticker")
URL = "https://query1.finance.yahoo.com/v7/finance/download/TECK?period1=1540456339&period2=1571992339&interval=1d&events=history&crumb=kjOZLFv6ch2"
http.Send
sCotes = http.ResponseText
MsgBox sCotes
Lignes = Split(sCotes, Chr(10))
For i = 1 To UBound(Lignes) 'until the end of the Lignes variable
sLigne = Lignes(i)
Valeurs = Split(sLigne, ",")
For j = 0 To UBound(Valeurs) - 1
Select Case j
Case 0
sValeur = DateSerial(CLng(Left(Valeurs(0), 4)), CLng(Mid(Valeurs(0), 6, 2)), CLng(Right(Valeurs(0), 2)))
Case 5
sValeur = CLng(Valeurs(5))
Case Else
sValeur = CDbl(Replace(Valeurs(j), ".", ","))
End Select
Range("A1").Offset(i, j) = sValeur
Application.StatusBar = Format(Cells(i, 1), "Short Date")
Next
Next
Application.StatusBar = False
End Sub
Run-time error at the http.Send step: "This method cannot be called until the Open method has been called"
You need to call the Open method before attempting to Send, and a GET request is perfectly fine. However, a few things...
There is an easier way. The headers worth adding are the User-Agent and one to avoid being served cached results. The following shows you how to get a JSON response from the server for a specified time period and write it to Excel. Note: you need to concatenate the ticker into the URL. You should probably also check the response status code from the server to make sure the request succeeded.
I use jsonconverter.bas as the JSON parser to handle the response. Download the raw code from the VBA-JSON repository on GitHub (VBA-tools/VBA-JSON) and add it to a standard module called JsonConverter. Then go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
The values for startDate and endDate need to be passed as Unix timestamps. @TimWilliams wrote a nice function, toUnix, for converting a Date to a Unix timestamp, which I use. I have added my own function, GetDate, to manage the conversion in the opposite direction.
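GetDate in the code below returns a formatted text string (yyyy-mm-dd), which Excel then has to re-parse according to the regional settings. If you would rather write real Date values to the sheet, a sketch of the reverse conversion follows; FromUnix is a hypothetical name, and it assumes the timestamps are seconds since 1970-01-01 UTC, so no time-zone adjustment is made.

```vba
Option Explicit

' Hypothetical helper: converts a Unix timestamp (seconds since 1970-01-01 UTC)
' to a VBA Date. No time-zone adjustment is applied.
Public Function FromUnix(ByVal t As Double) As Date
    FromUnix = DateSerial(1970, 1, 1) + t / 86400
End Function
```

You could then write results(r, 1) = FromUnix(dates(r)) and apply a date number format to column 1 after writing out.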
This method avoids using any session-based identifiers, so it avoids your issue with an invalid cookie crumb.
VBA:
Option Explicit
Public Sub GetYahooHistoricData()
Dim ticker As String, ws As Worksheet, url As String, s As String
Dim startDate As Long, endDate As Long
Set ws = ThisWorkbook.Worksheets("Sheet1")
ticker = ws.Range("ticker") 'named range "ticker" in A1, above the output range
endDate = toUnix("2019-10-27")
startDate = toUnix("2018-10-25")
url = "https://query1.finance.yahoo.com/v8/finance/chart/" & ticker & "?region=US&lang=en-US&includePrePost=false&interval=1d&period1=" & startDate & "&period2=" & endDate & "&corsDomain=finance.yahoo.com&.tsrc=finance"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
s = .responseText
End With
Dim json As Object
Set json = JsonConverter.ParseJson(s)("chart")("result")
Dim dates As Object, results(), rows As Object, adjClose As Object, r As Long, headers()
headers = Array("date", "close", "volume", "open", "high", "low", "adjclose")
Set dates = json(1)("timestamp")
ReDim results(1 To dates.Count, 1 To UBound(headers) + 1)
Set rows = json(1)("indicators")("quote")(1)
Set adjClose = json(1)("indicators")("adjclose")(1)("adjclose")
For r = 1 To dates.Count
results(r, 1) = GetDate(dates(r))
results(r, 2) = rows("close")(r)
results(r, 3) = rows("volume")(r)
results(r, 4) = rows("open")(r)
results(r, 5) = rows("high")(r)
results(r, 6) = rows("low")(r)
results(r, 7) = adjClose(r)
Next
With ws
.Cells(3, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(4, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Public Function GetDate(ByVal t As Variant) As String
GetDate = Format$(t / 86400 + DateValue("1970-01-01"), "yyyy-mm-dd")
End Function
Public Function toUnix(ByVal dt As Variant) As Long
toUnix = DateDiff("s", "1/1/1970", dt)
End Function
Try replacing this code
URL = "https://query1.finance.yahoo.com/v7/finance/download/TECK?period1=1540456339&period2=1571992339&interval=1d&events=history&crumb=kjOZLFv6ch2"
http.Send
with this code:
Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0") 'also declare http As Object rather than As New WinHttpRequest
URL = "https://query1.finance.yahoo.com/v7/finance/download/TECK?period1=1540456339&period2=1571992339&interval=1d&events=history&crumb=kjOZLFv6ch2"
http.open "POST", URL, False
http.Send
The error is pretty clear: you need to call the Open method before the Send method. Also, this would be a POST request. You may also need to put these two lines after the Open method:
http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
http.setRequestHeader "Content-Length", 0
This question is almost an exact duplicate of How can I send an HTTP POST request to a server from Excel using VBA?. Anyway, the mistake is clear: .Send() is called on the Dim http As New WinHttpRequest object without Open ever having been called, so there is no request to send.
To make the code work, copy the example from the duplicate question and print http.ResponseText:
Sub TestMe()
Dim http As Object
Dim url As String
Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
url = "https://query1.finance.yahoo.com/v7/finance/download/TECK?period1=1540456339&period2=1571992339&interval=1d&events=history&crumb=kjOZLFv6ch2"
http.Open "POST", url, False
http.Send
MsgBox http.responsetext
End Sub
I have the following code to scrape data from a website, and it works without any problem.
My "issue" now is that I need to run the code through multiple pages, because the website I'm scraping is paginated.
E.g. one page has 48 records, but in most cases there are 200+ records, split across 3 or 4 pages.
My code:
Public Sub Roupa()
Dim data As Object, i As Long, html As HTMLDocument, r As Long, c As Long, item As Object, div As Object
Set html = New HTMLDocument '<== VBE > Tools > References > Microsoft HTML Object Library
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=100", False
.send
html.body.innerHTML = .responseText
End With
Set data = html.getElementsByClassName("w-product__content")
For Each item In data
r = r + 1: c = 1
For Each div In item.getElementsByTagName("div")
With ThisWorkbook.Worksheets("Roupa")
.Cells(r, c) = div.innerText
End With
c = c + 1
Next
Next
Sheets("Roupa").Range("A:A,C:C,F:F,G:G,H:H,I:I").EntireColumn.Delete
End Sub
UPDATE
I've tried adding For n = 1 To 2 before the With. It works, but I need to know the exact number of pages, so that's not very helpful.
Work out how many pages there are by dividing the result count by the number of results per page, rounding up. Then loop, concatenating the appropriate page number onto the URL.
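The page count is a ceiling division, which can also be done with integer arithmetic instead of Application.RoundUp. A sketch: PageCount and PageUrl are hypothetical helpers, and the per_page and page query parameters are the ones in the URL used in the code below.

```vba
Option Explicit

' Number of pages needed to show numResults items, perPage at a time (at least 1)
Public Function PageCount(ByVal numResults As Long, ByVal perPage As Long) As Long
    PageCount = (numResults + perPage - 1) \ perPage ' integer ceiling division
    If PageCount < 1 Then PageCount = 1
End Function

' Appends the paging parameters to a base URL that has no query string yet
Public Function PageUrl(ByVal baseUrl As String, ByVal perPage As Long, ByVal page As Long) As String
    PageUrl = baseUrl & "?per_page=" & perPage & "&page=" & page
End Function
```

With a hypothetical BASE_URL constant holding the listing URL without its query string, the loop could use numPages = PageCount(numResults, RESULTS_PER_PAGE) and .Open "GET", PageUrl(BASE_URL, RESULTS_PER_PAGE, i), False.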
Option Explicit
Public Sub Roupa()
Dim data As Object, i As Long, html As HTMLDocument, r As Long, c As Long, item As Object, div As Object
Set html = New HTMLDocument '<== VBE > Tools > References > Microsoft HTML Object Library
Const RESULTS_PER_PAGE As Long = 48
Const START_URL As String = "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=" & RESULTS_PER_PAGE & "&page=1"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", START_URL, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
Dim numPages As Long, numResults As Long, arr() As String
arr = Split(html.querySelector(".w-filters__element").innerText, Chr$(32))
numResults = arr(UBound(arr))
numPages = 1
If numResults > RESULTS_PER_PAGE Then
numPages = Application.RoundUp(numResults / RESULTS_PER_PAGE, 0)
End If
For i = 1 To numPages
If i > 1 Then
.Open "GET", Replace$("https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=" & RESULTS_PER_PAGE & "&page=1", "page=1", "page=" & i), False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End If
Set data = html.getElementsByClassName("w-product__content")
For Each item In data
r = r + 1: c = 1
For Each div In item.getElementsByTagName("div")
With ThisWorkbook.Worksheets("Roupa")
.Cells(r, c) = div.innerText
End With
c = c + 1
Next
Next
Next
End With
Sheets("Roupa").Range("A:A,C:C,F:F,G:G,H:H,I:I").EntireColumn.Delete
End Sub
Thinking about what @AhmedAu said: if the page has loaded properly, a good way to get the page count also looks to be simply:
numPages = html.querySelectorAll("[data-page]").Length
I want to scrape every post heading from this blog. I am using the code below, but it gives me the error "Run-time error 438: Object doesn't support this property or method" on the line
Cells(i, 1).Value = ele.getElementsByClassName("entry-title")(0).getElementsByTagName("a")(0).innerText
The code is:
Private Sub CommandButton1_Click()
Dim bot As Object
Dim doc As New HTMLDocument
Dim ele As HTMLElementCollection
Dim i As Long
Set bot = CreateObject("MSXML2.XMLHTTP")
bot.Open "GET", "http://themakeupblogger.com/makeup/", False
bot.send
doc.body.innerHTML = bot.responseText
For Each ele In doc.getElementsByTagName("article")
i = Cells(Rows.Count, 1).End(xlUp).Row + 1
Cells(i, 1).Value = ele.getElementsByClassName("entry-title")(0).getElementsByTagName("a")(0).innerText
Next ele
End Sub
Give this a shot; it should get all the titles you are after.
Sub demo()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim r As Long, elem As Object
With http
.Open "GET", "http://themakeupblogger.com/makeup/", False
.send
html.body.innerHTML = .responseText
End With
For Each elem In html.getElementsByClassName("entry-title")
With elem.getElementsByTagName("a")
If .Length Then r = r + 1: Cells(r, 1) = .Item(0).innerText
End With
Next elem
End Sub
References to add (VBE > Tools > References):
1. Microsoft XML, v6.0
2. Microsoft HTML Object Library
Partial results:
4 High-Coverage Foundations That Might As Well Be Skincare
10 Memorial Day Beauty Essentials That Belong In Your Beach Bag
Don’t Get Married Without These Wedding Day Makeup Tips (Courtesy of a Makeup Artist)
To get the articles from that page you can do something like:
Sub demo()
Dim http As New InternetExplorer, html As New HTMLDocument
Dim r As Long, elem As Object
With http
.Visible = False
.navigate "http://themakeupblogger.com/makeup/"
Do While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Loop 'yield while waiting so Excel doesn't hang
Set html = .document
End With
For Each elem In html.getElementsByTagName("article")
With elem.getElementsByTagName("h1")
If .Length Then r = r + 1: Cells(r, 1) = .Item(0).getElementsByTagName("a")(0).innerText
End With
With elem.getElementsByTagName("div")(3).getElementsByTagName("p")
If .Length Then Cells(r, 2) = .Item(0).innerText
End With
Next elem
http.Quit 'close the hidden browser instance
End Sub
This time, the references you should add are:
1. Microsoft Internet Controls
2. Microsoft HTML Object Library