I'm trying to create a macro to fetch some content from a webpage and write it to an Excel file in a customized manner. I've used two similar links from the same website; here is one of them. I'm interested in three fields: Name, Recipe and Ingredients.
The script that I've created can parse the data accordingly. However, I want to arrange it in an Excel file like this.
This is what I've written so far (working flawlessly):
Sub GetAndArrangeData()
Dim HTML As New HTMLDocument, oPost As Object
Dim HTMLDoc As New HTMLDocument, ws As Worksheet
Dim oTitle As Object, oPosts As Object
Dim linklist As Variant, url As Variant
linklist = Array( _
"https://www.chelseasmessyapron.com/avocado-chicken-salad-2/", _
"https://www.chelseasmessyapron.com/caprese-quinoa-salad/" _
)
Set ws = ThisWorkbook.Worksheets("Sheet1")
For Each url In linklist
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
.send
HTML.body.innerHTML = .responseText
End With
Set oTitle = HTML.querySelector("h1.entry-title")
Debug.Print oTitle.innerText
Set oPost = HTML.querySelectorAll(".cma-recipe-nutrition > .wprm-nutrition-label-container > span[class*='nutrition-container']")
For I = 0 To oPost.Length - 1
HTMLDoc.body.innerHTML = oPost(I).outerHTML
Debug.Print HTMLDoc.querySelector("span.wprm-nutrition-label-text-nutrition-label").innerText
Debug.Print HTMLDoc.querySelector("span[class*='nutrition-value']").innerText
Next I
Set oPosts = HTML.querySelectorAll(".wprm-recipe-block-container")
For I = 0 To oPosts.Length - 1
HTMLDoc.body.innerHTML = oPosts(I).outerHTML
On Error Resume Next
Debug.Print HTMLDoc.querySelector("span.wprm-recipe-details-label").innerText
Debug.Print HTMLDoc.querySelector("span.wprm-recipe-details").innerText
On Error GoTo 0
Next I
Next url
End Sub
How can I write the data to an Excel file the way I've shown in the image above?
By the way, this is the result I got in the immediate window:
Avocado Chicken Salad
Calories:
542
Carbohydrates:
30
Protein:
11
Fat:
45
Saturated Fat:
7
Cholesterol:
16
Sodium:
285
Potassium:
687
Fiber:
8
Sugar:
9
Vitamin A:
945
Vitamin C:
19
Calcium:
36
Iron:
1
Course
Cuisine
Keyword
Prep Time
20
Cook Time
15
Total Time
35
Servings
2
Calories
542
Cost
$6.82
Caprese Quinoa Salad
Calories:
375
Carbohydrates:
30
Protein:
11
Fat:
26
Saturated Fat:
4
Cholesterol:
7
Sodium:
73
Potassium:
996
Fiber:
9
Sugar:
7
Vitamin A:
17616
Vitamin C:
32
Calcium:
168
Iron:
4
Course
Cuisine
Keyword
Prep Time
35
Cook Time
25
Chilling Time (Quinoa)
1
Total Time
2
Servings
6
Calories
375
Cost
$6.84
Basically, you just need to keep track of where to write the data. I define a variable `row` that is set to the first row where you want to put data. After every recipe, the number of rows written is added to `row`. To keep track of the number of rows, I use two separate variables, `oPostsNut` and `oPostsRecipe` (instead of only one `oPosts`), and add the number of entries of the larger list - that's basically all.
(...)
Dim row As Long
row = 1 ' Change to whatever row you want to start
For Each url In linklist
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
.send
HTML.body.innerHTML = .responseText
End With
Set oTitle = HTML.querySelector("h1.entry-title")
ws.Cells(row, 1) = oTitle.innerText
Dim i As Long
Dim oPostsNut As Object
Set oPostsNut = HTML.querySelectorAll(".cma-recipe-nutrition > .wprm-nutrition-label-container > span[class*='nutrition-container']")
For i = 0 To oPostsNut.Length - 1
HTMLDoc.body.innerHTML = oPostsNut(i).outerHTML
ws.Cells(row + i, 2) = HTMLDoc.querySelector("span.wprm-nutrition-label-text-nutrition-label").innerText
ws.Cells(row + i, 3) = HTMLDoc.querySelector("span[class*='nutrition-value']").innerText
Next i
Dim oPostsRecipe As Object
Set oPostsRecipe = HTML.querySelectorAll(".wprm-recipe-block-container")
For i = 0 To oPostsRecipe.Length - 1
HTMLDoc.body.innerHTML = oPostsRecipe(i).outerHTML
On Error Resume Next
ws.Cells(row + i, 4) = HTMLDoc.querySelector("span.wprm-recipe-details-label").innerText
ws.Cells(row + i, 5) = HTMLDoc.querySelector("span.wprm-recipe-details").innerText
On Error GoTo 0
Next i
row = row + IIf(oPostsNut.Length > oPostsRecipe.Length, oPostsNut.Length, oPostsRecipe.Length)
Next url
I think we can do better. If we use a more selective CSS selector, we can get rid of the additional info that I am seeing in the other answer (12/02/21) and in your original attempt. Using the selector below, I remove that additional info and only return the desired info. I work with an array, as that is faster than writing to the sheet all the time. I also remove the need for re-creating the xmlhttp object and the additional HTMLDocument.
Option Explicit
Public Sub GetAndArrangeData()
Dim html As New MSHTML.HTMLDocument, xhr As Object, ws As Worksheet
Dim linklist As Variant, url As Variant, totalRows
linklist = Array( _
"https://www.chelseasmessyapron.com/avocado-chicken-salad-2/", _
"https://www.chelseasmessyapron.com/caprese-quinoa-salad/" _
)
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set xhr = CreateObject("MSXML2.XMLHTTP")
totalRows = 1
For Each url In linklist
With xhr
.Open "GET", url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
.send
html.body.innerHTML = .responseText
End With
Dim title As String
title = html.querySelector("h1.entry-title").innerText
Dim nutritionRows As Object, timesOtherRows As Object, maxRows As Long
Set nutritionRows = html.querySelectorAll(".wprm-nutrition-label-container .wprm-nutrition-label-text-nutrition-container")
Set timesOtherRows = html.querySelectorAll(".cma-recipe-mobile .wprm-recipe-times-container .wprm-recipe-block-container-columns, .wprm-recipe-meta-container ~ .wprm-recipe-block-container-columns")
maxRows = IIf(nutritionRows.Length > timesOtherRows.Length, nutritionRows.Length, timesOtherRows.Length) - 1
Dim recipeInfo(), i As Long
ReDim recipeInfo(1 To maxRows + 1, 1 To 5)
On Error Resume Next
For i = 0 To maxRows
recipeInfo(i + 1, 1) = IIf(i = 0, title, vbNullString)
recipeInfo(i + 1, 2) = nutritionRows.Item(i).Children(0).innerText
recipeInfo(i + 1, 3) = nutritionRows.Item(i).Children(1).innerText
recipeInfo(i + 1, 4) = timesOtherRows.Item(i).Children(1).innerText
recipeInfo(i + 1, 5) = timesOtherRows.Item(i).Children(2).innerText
Next
On Error GoTo 0
ws.Cells(totalRows, 1).Resize(UBound(recipeInfo, 1), UBound(recipeInfo, 2)) = recipeInfo
totalRows = totalRows + maxRows + 1
Next url
End Sub
JSON:
Perhaps easier, though, is to grab all the info as JSON from a script tag in the HEAD part of the response. You will need to wrap the response in body tags to prevent the HTML parser from stripping this content out when you add it into the MSHTML.HTMLDocument object's body.innerHTML.
I am not going to show the JSON parsing, as there are plenty of examples, but I will show how to extract it.
Option Explicit
Public Sub GetAndArrangeData()
Dim html As New MSHTML.HTMLDocument, xhr As Object
Dim linklist As Variant, url As Variant
linklist = Array( _
"https://www.chelseasmessyapron.com/avocado-chicken-salad-2/", _
"https://www.chelseasmessyapron.com/caprese-quinoa-salad/" _
)
Set xhr = CreateObject("MSXML2.XMLHTTP")
For Each url In linklist
With xhr
.Open "GET", url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
.send
html.body.innerHTML = "<body>" & .responseText & "</body>"
End With
Debug.Print html.querySelector(".yoast-schema-graph").innerHTML
Next url
End Sub
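For completeness, here is a hedged sketch of the parsing step. It assumes the VBA-JSON library (JsonConverter.bas by Tim Hall) has been imported and a reference to Microsoft Scripting Runtime is set; the "@graph" / "recipeIngredient" key names come from the schema.org markup that the Yoast plugin emits, so verify them against the actual response:

```vba
Dim Json As Object, node As Variant, ingredient As Variant
' Parse the JSON-LD pulled from the .yoast-schema-graph script tag.
Set Json = JsonConverter.ParseJson(html.querySelector(".yoast-schema-graph").innerHTML)
' The "@graph" array holds several nodes; the Recipe node
' carries the name and the ingredient list.
For Each node In Json("@graph")
    If node.Exists("recipeIngredient") Then
        Debug.Print node("name")
        For Each ingredient In node("recipeIngredient")
            Debug.Print ingredient
        Next ingredient
    End If
Next node
```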
Related
Help gratefully received on this one. I have some VBA running in Excel that inspects a series of webpages displaying betting odds for football matches and puts the odds into my spreadsheet. It has been working perfectly for months and has stopped working in the last few weeks. Here is a simplified version of the code I'm using:
Sub TestImport()
Dim http As New MSXML2.XMLHTTP60
Dim html As New MSHTML.HTMLDocument
Dim htmlEle1 As MSHTML.IHTMLElement
Dim columncounter As Integer
Dim rowcounter As Integer
Dim targetwebpage As String
Dim ColumnHeader As Variant
On Error GoTo ErrorStop
trowNum = 1
targetwebpage = "https://www.oddschecker.com/football/english/premier-league"
With http
.Open "get", targetwebpage, False
.send
End With
Set table_data = html.getElementsByTagName("tr")
If table_data.Length = 0 Then GoTo SkipLeague
For Each trow In table_data
For Each tcell In trow.Children
If tcell.innerText <> "TIP" Then 'Ignore this
tcellNum = tcellNum + 1
Cells(trowNum, tcellNum) = tcell.innerText
End If
Next tcell
Cells(trowNum, 1) = Worksheets("Leagues").Cells(j, 1)
trowNum = trowNum + 1
tcellNum = 1
Next trow
SkipLeague:
ErrorStop:
End Sub
No data gets returned because [table_data] is always null. It's always null because there are no `tr` tags in my variable, [html]. Instead, [html] seems to be simply this:
"<HEAD></HEAD>
<BODY>
<P> </P></BODY>"
Why would [html] return this value when the actual webpage (https://www.oddschecker.com/football/english/premier-league) is much more complex when displayed in my browser? And why has this problem only started in the last few weeks?
I'd be grateful for any help on this.
I did a quick test and had no issue. Some pages, like Google, require the User-Agent to be sent, but the oddschecker page does not.
Sub TestURL()
Debug.Print GetResult("https://www.oddschecker.com/football/english/premier-league")
End Sub
Function GetResult(url As String) As String
Dim XMLHTTP As Object, ret As String
Set XMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "Cache-Control", "no-cache"
XMLHTTP.setRequestHeader "Pragma", "no-cache"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
ret = XMLHTTP.responseText
GetResult = ret
End Function
I am trying to connect 2 modules in VBA such that the output of the first module (GetUrl) feeds into the other (GetData).
GetUrl looks up the dossier URL online for substances entered in column A (e.g. Acetone); alternatively, the CAS number in column B can be used (see image below). Note: it currently only looks up substance info in A1 or B1.
Public Function GetUrl() As String
Const Url = "https://echa.europa.eu/information-on-chemicals/registered-substances?p_p_id=dissregisteredsubstances_WAR_dissregsubsportlet&p_p_lifecycle=1&p_p_state=normal&p_p_mode=view&_dissregisteredsubstances_WAR_dissregsubsportlet_javax.portlet.action=dissRegisteredSubstancesAction"
Set oHtml = New HTMLDocument
Set oHttp = CreateObject("MSXML2.XMLHTTP")
Set MyDict = CreateObject("Scripting.Dictionary")
SubstanceName = Cells(1, 1)
CASNumber = Cells(1, 2)
MyDict("_dissregisteredsubstances_WAR_dissregsubsportlet_disreg_name") = SubstanceName
MyDict("_dissregisteredsubstances_WAR_dissregsubsportlet_disreg_cas-number") = CASNumber
MyDict("_disssimplesearchhomepage_WAR_disssearchportlet_disclaimer") = "true"
MyDict("_disssimplesearchhomepage_WAR_disssearchportlet_disclaimerCheckbox") = "on"
payload = vbNullString
For Each DictKey In MyDict
payload = IIf(Len(payload) = 0, WorksheetFunction.EncodeURL(DictKey) & "=" & WorksheetFunction.EncodeURL(MyDict(DictKey)), _
payload & "&" & WorksheetFunction.EncodeURL(DictKey) & "=" & WorksheetFunction.EncodeURL(MyDict(DictKey)))
Next DictKey
With oHttp
.Open "POST", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
.send (payload)
oHtml.body.innerHTML = .responseText
End With
GetUrl = oHtml.querySelector(".details").getAttribute("href")
Debug.Print oHtml.querySelector(".substanceNameLink ").innerText
Debug.Print oHtml.querySelector(".details").getAttribute("href")
End Function
If run, this should return:
Acetone https://echa.europa.eu/registration-dossier/-/registered-dossier/15460
GetData uses the URL from GetUrl to return "DNEL" values:
Sub GetData()
'Start ECHA Search via XML HTTP Request
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim Route(1 To 3) As String
Route(1) = "sGeneralPopulationHazardViaInhalationRoute"
Route(2) = "sGeneralPopulationHazardViaDermalRoute"
Route(3) = "sGeneralPopulationHazardViaOralRoute"
XMLReq.Open "Get", GetUrl & "/7/1", False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
'Loops through each element
For c = 1 To UBound(Route, 1)
Set Info = HTMLDoc.getElementById(Route(c))
Debug.Print Info.innerText
Set Info = HTMLDoc.getElementById(Route(c)).NextSibling.NextSibling.NextSibling
Set Data = Info.getElementsByTagName("dd")(0)
Debug.Print Data.innerText
Set Data = Info.getElementsByTagName("dd")(1)
Debug.Print Data.innerText
'Cells(r, c + 2) = Data.innerText
Next c
End Sub
For Acetone in Cell(1,1) This should Return:
Acetone
https://echa.europa.eu/registration-dossier/-/registered-dossier/15460
General Population - Hazard via inhalation route
DNEL (Derived No Effect Level)
200 mg/m³
General Population - Hazard via dermal route
DNEL (Derived No Effect Level)
62 mg/kg bw/day
General Population - Hazard via oral route
DNEL (Derived No Effect Level)
62 mg/kg bw/day
Instead of relying only on cell A1, however, I want the entire code to loop for each cell with a substance in column A / column B. So in this case the URL for Acetone is found and the corresponding data is pulled, then the same occurs for Oxydipropanol.
Note: in this image, substances can be looked up online using either the substance name, the CAS number in column B, or a combination of both.
Trying to connect the two modules, so far I have only been able to get the GetUrl module to cycle through each substance. I have also tried to combine both into one module but can't figure out how to correctly nest the for loops.
A quick Google search states that you can't nest functions in VBA. This makes me wonder if what I'm doing is even the right way to approach this. But I've seen similar things achieved in the past, so I'm sure it's possible.
Note: if testing, please use the example substances. Using a random chemical, say Benzene, may result in an error, as the tox profile for this substance doesn't exist. I still need to implement error handling, but this can be ignored for now.
I will update here with any further progress made. Thanks.
This worked for me:
Sub PopulateExposures()
Dim url, rw As Range
Set rw = Sheets("data").Range("A1:E1") 'first row with inputs
Do While Application.CountA(rw) > 0
url = SubstanceUrl(rw.Cells(1).Value, rw.Cells(2).Value) 'get the URL
rw.Cells(3).Resize(1, 3).Value = ExposureData(url) 'get exposure data (as array) and add to row
Set rw = rw.Offset(1, 0) 'next substance
Loop
End Sub
Public Function SubstanceUrl(SubstanceName, CASNumber) As String
Const url = "https://echa.europa.eu/information-on-chemicals/registered-substances?" & _
"p_p_id=dissregisteredsubstances_WAR_dissregsubsportlet&p_p_lifecycle=1&" & _
"p_p_state=normal&p_p_mode=view&" & _
"__dissregisteredsubstances_WAR_dissregsubsportlet_javax.portlet.action=dissRegisteredSubstancesAction"
Dim oHTML, oHttp, MyDict, payload, DictKey, sep
Set oHTML = New HTMLDocument
Set oHttp = CreateObject("MSXML2.XMLHTTP")
Set MyDict = CreateObject("Scripting.Dictionary")
MyDict("_dissregisteredsubstances_WAR_dissregsubsportlet_disreg_name") = SubstanceName
MyDict("_dissregisteredsubstances_WAR_dissregsubsportlet_disreg_cas-number") = CASNumber
MyDict("_disssimplesearchhomepage_WAR_disssearchportlet_disclaimer") = "true"
MyDict("_disssimplesearchhomepage_WAR_disssearchportlet_disclaimerCheckbox") = "on"
payload = ""
For Each DictKey In MyDict
payload = payload & sep & DictKey & "=" & WorksheetFunction.EncodeURL(MyDict(DictKey))
sep = "&"
Next DictKey
With oHttp
.Open "POST", url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
.send payload
oHTML.body.innerHTML = .responseText
End With
SubstanceUrl = oHTML.querySelector(".details").getAttribute("href")
End Function
Function ExposureData(urlToGet)
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As HTMLDocument, dds
Dim Route(1 To 3) As String, Results(1 To 3) As String, c, Info, Data
Route(1) = "sGeneralPopulationHazardViaInhalationRoute"
Route(2) = "sGeneralPopulationHazardViaDermalRoute"
Route(3) = "sGeneralPopulationHazardViaOralRoute"
XMLReq.Open "Get", urlToGet & "/7/1", False
XMLReq.send
If XMLReq.Status <> 200 Then
Results(1) = "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Else
Set HTMLDoc = New HTMLDocument
HTMLDoc.body.innerHTML = XMLReq.responseText
For c = 1 To UBound(Route, 1)
Set Info = HTMLDoc.getElementById(Route(c))
If Not Info Is Nothing Then
Set Info = Info.NextSibling.NextSibling.NextSibling
Set dds = Info.getElementsByTagName("dd")
If dds.Length > 1 Then
Results(c) = dds(1).innerText
Else
Results(c) = "hazard unknown"
End If
Else
Results(c) = "no info"
End If
Next c
End If
ExposureData = Results
End Function
I am a beginner at VBA. I have written the code below by referring to a lot of articles found online.
I am trying to fetch API data from a website. The first fetch works, but I need the data to be fetched every 5 minutes and it is not refreshing at all. What can I do? Can anyone have a look at the code and advise?
I am using the code below to get the JSON data; later I extract it using a JSON parser.
Sub FetchOptionChain()
Dim Json As Object
Dim webURL, webURL2 As String, mainString, subString
Dim i As Integer
Dim j As Integer
Dim k As Integer
Dim l As Integer
Dim dtArr() As String
Dim request, request2 As Object
Dim HTML_Content As Object
Dim requestString As String
webURL2 = "https://www.nseindia.com/"
webURL = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
subString = "Resource not found"
Set HTML_Content = CreateObject("htmlfile")
'Get the WebPage Content to HTMLFile Object
With CreateObject("msxml2.xmlhttp")
.Open "GET", webURL2, False
.send
End With
FetchAgain:
With CreateObject("msxml2.xmlhttp")
.Open "GET", webURL, False
'Found online that I have to add the below to remove the cached results. Adding this is hanging the excel and it never comes out of it. Excel is hanging here
.setRequestHeader "Content-Type", "application/json"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36"
.send
mainString = .responseText
If InStr(mainString, subString) <> 0 Then
' Data has not been fetched properly. Will wait two seconds and try again.
Application.Wait (Now + TimeValue("0:00:2"))
GoTo FetchAgain
I added the missing End With, End If and End Sub, and fixed the indenting to make the code easier to read.
Sub FetchOptionChain()
Dim Json As Object
Dim webURL, webURL2 As String, mainString, subString
Dim i As Integer
Dim j As Integer
Dim k As Integer
Dim l As Integer
Dim dtArr() As String
Dim request, request2 As Object
Dim HTML_Content As Object
Dim requestString As String
webURL2 = "https://www.nseindia.com/"
webURL = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
subString = "Resource not found"
'''''''''''''''''''''''''''''''''''''''''''''
''' I don't understand this part though '''
'''''''''''''''''''''''''''''''''''''''''''''
Set HTML_Content = CreateObject("htmlfile")
'Get the WebPage Content to HTMLFile Object
With CreateObject("msxml2.xmlhttp")
.Open "GET", webURL2, False
.send
End With
'''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''' To here ''''''''''''''''''
'''''''''''''''''''''''''''''''''''''''''''''
FetchAgain:
With CreateObject("msxml2.xmlhttp")
.Open "GET", webURL, False
'Found online that I have to add the below to remove the cached results. Adding this is hanging the excel and it never comes out of it. Excel is hanging here
.setRequestHeader "Content-Type", "application/json"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36"
.send
mainString = .responseText
End With
If InStr(mainString, subString) <> 0 Then
' Data has not been fetched properly. Will wait two seconds and try again.
Application.Wait (Now + TimeValue("00:00:02"))
GoTo FetchAgain
End If
End Sub
But it runs and works as expected for me.
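Neither version above addresses the every-5-minutes part of the question. A common approach (a sketch; it assumes the sub is named FetchOptionChain and lives in a standard module) is to re-schedule the macro with Excel's built-in Application.OnTime:

```vba
Dim NextRun As Date

Sub ScheduleFetch()
    ' Run the fetch now, then re-schedule this sub to run again in 5 minutes.
    FetchOptionChain
    NextRun = Now + TimeValue("00:05:00")
    Application.OnTime EarliestTime:=NextRun, Procedure:="ScheduleFetch"
End Sub

Sub StopFetch()
    ' Cancel the pending run (call this e.g. from Workbook_BeforeClose,
    ' otherwise Excel may reopen the workbook to honour the schedule).
    On Error Resume Next
    Application.OnTime EarliestTime:=NextRun, Procedure:="ScheduleFetch", Schedule:=False
End Sub
```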
I am extracting some data from the web and everything is coming off fine, except for one set of data: when it is extracted, it is showing backwards in the cell.
I cannot work out why it is showing backwards, as everything else is extracting fine.
Q) Could someone please advise why it would do this?
This is what I am using to pull the data. It works fine for everything else; just this one class shows backwards in Excel:
Set doc = NewHTMLDocument(CStr(link))
'''IF Statement, change class to suite needs 'bscd
' On Error Resume Next
If doc.getElementsByClassName("bscd")(0) Is Nothing Then
wsSheet.Cells(StartRow + Counter, 5).Value = "-"
Else
' On Error Resume Next
wsSheet.Cells(StartRow + Counter, 5).Value = doc.getElementsByClassName("bscd")(0).Children(1).InnerText
End If
This is Class
Result Showing Backwards in excel
Could it be that the "Complete information" section is a JavaScript dropdown?
I just did Ctrl+U as recommended and this is how the HTML looks. It is showing backwards here, but displays correctly on the site.
You need to click on that link to access the content. This is one of the ways you can do it. I used an explicit wait within the script instead of a hardcoded delay, so the script will wait up to 10 seconds until the content is visible.
Public driver As ChromeDriver
Sub ScrapeContent()
Const URL$ = "https://www.ebay.co.uk/itm/Metal-Floor-Fan-High-velocity-chrome-free-stand-fan-industrial-fan-3-8-Speed-UK/333664038024"
Dim oElem As Object, oItem As Object
Set driver = New ChromeDriver
driver.get URL
driver.FindElementByXPath("//span/a[contains(.,'Complete information')]", Timeout:=10000).Click
Set oElem = driver.FindElementByXPath("//span[contains(.,'Phone:')]/following::span", Timeout:=10000)
Set oItem = driver.FindElementByXPath("//span[contains(.,'Email:')]/following::span", Timeout:=10000)
Debug.Print oElem.Text, oItem.Text
End Sub
Output:
13025438495 eshijiali#outlook.com
If you use xmlhttp requests, the result you get may be reversed. I've used a function to put it back in the regular order:
Function reverseString(inputStr As String)
Dim myString$, I&
For I = Len(inputStr) To 1 Step -1
myString = myString & Mid(inputStr, I, 1)
Next I
reverseString = myString
End Function
Sub FetchData()
Const Url$ = "https://www.ebay.co.uk/itm/Metal-Floor-Fan-High-velocity-chrome-free-stand-fan-industrial-fan-3-8-Speed-UK/333664038024"
Dim HTML As New HTMLDocument, oPost As Object
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
.send
HTML.body.innerHTML = .responseText
End With
Set oPost = HTML.getElementsByClassName("bsi-lbl")
If Not oPost Is Nothing And oPost.Length >= 1 Then
Debug.Print reverseString(oPost(0).NextSibling.innerText)
End If
If Not oPost Is Nothing And oPost.Length >= 2 Then
Debug.Print reverseString(oPost(1).NextSibling.innerText)
End If
End Sub
Output:
13025438495 eshijiali#outlook.com
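As an aside, VBA already ships with a built-in StrReverse function, so the reverseString helper above can be reduced to a one-liner with the same behaviour:

```vba
Function reverseString(inputStr As String) As String
    ' StrReverse (built into VBA) returns the characters in reverse order.
    reverseString = StrReverse(inputStr)
End Function
```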
I'm trying to scrape a website developed with ASP.NET Ajax with Excel VBA. I'm using the Microsoft HTML Object Library and the Microsoft XML, v6.0 library. What I would like to do is get into a table all the items in the second text box when I select an item in the first text box.
When you select an item in the first text box, the items in the second text box are loaded automatically. So first I make a GET request to the website, then I scrape all the elements with the class aspNetHidden. I add two elements to the POST string that don't appear in the first scrape: ctl00$ctl18 and __ASYNCPOST, with their respective values. I also added the value for the first text box, ctl00$MainContent$cboDenominacionSocial.
Sub Macro1()
'
' Macro1 Macro
'
' Declare variables
Dim xmlhttp As New MSXML2.XMLHTTP60
Dim urlMF As String
'
urlMF = "https://www.smv.gob.pe/Frm_EVCP?data=5A959494701B26421F184C081CACF55BFA328E8EBC"
'
'
xmlhttp.Open "GET", urlMF, False
'xmlhttp.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3842.0 Safari/537.36"
'xmlhttp.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
xmlhttp.send
Dim bodySMV As New HTMLDocument
bodySMV.body.innerHTML = xmlhttp.responseText
Dim topicsSMV As Object
Dim topicElem As Object
Set topicsSMV = bodySMV.getElementsByClassName("aspNetHidden")
Dim postReq As String
postReq = ""
i = 1
For Each topic In topicsSMV
Set topicElem = topic.getElementsByTagName("input")
For Each dataTopic In topicElem
Cells(i, 1) = dataTopic.Name
Cells(i, 2) = dataTopic.Value
temp = dataTopic.Name & "=" & dataTopic.Value
If i = 1 Then postReq = "ctl00%24ctl18=ctl00%24MainContent%24UpdatePanel1%7Cctl00%24MainContent%24cboDenominacionSocial"
If i > 1 Then postReq = postReq & Chr(38) & temp
i = i + 1
Next dataTopic
Next topic
postReq = postReq & "ctl00%24MainContent%24cboDenominacionSocial=156429&__ASYNCPOST=true&"
Cells(i, 1).Value = postReq
xmlhttp.Open "POST", urlMF, False
xmlhttp.send postReq
bodySMV.body.innerHTML = xmlhttp.responseText
'
End Sub
I'd like to get the full list of possible elements for the second text box, depending on the selection in the first box. What am I missing in my POST request?