I am trying to get the inner text and the href attribute of the Name column on this website:
https://stockcharts.com/freecharts/sectorsummary.html?&G=SECTOR_DJUSNS&O=1
but I get all hyperlinks except the ones inside the table.
Can somebody please take a look at this code and let me know what is wrong?
Sub Scraping_StockCharts()
    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim HTMLIm As MSHTML.IHTMLElement
    Dim HTMLIms As MSHTML.IHTMLElementCollection
    Dim URL As String
    Dim Row As Long

    URL = "https://stockcharts.com/freecharts/sectorsummary.html?&G=SECTOR_DJUSNS&O=1"

    XMLPage.Open "GET", URL, False
    XMLPage.setRequestHeader "Content-Type", "text/xml"
    XMLPage.send

    HTMLDoc.body.innerHTML = XMLPage.responseText

    Row = 1
    Set HTMLIms = HTMLDoc.getElementsByTagName("a")
    For Each HTMLIm In HTMLIms
        Sheets("Results").Cells(Row, 2).Value = HTMLIm.innerText
        Sheets("Results").Cells(Row, 3).Value = HTMLIm.getAttribute("href")
        Row = Row + 1
    Next HTMLIm
End Sub
Scraping via XMLHTTP is not allowed by the site. I'm not sure about automating a browser either, so you will need to read the terms of service carefully. With browser automation you could, I suspect, simply use the url you already have.
From a purely technical point of view, the data can be obtained from https://c.stockcharts.com/j-sum/sum?cmd=perf&group=SECTOR_DJUSNS, which returns JSON. From that JSON you can reconstruct each chart url by reading the sym value of every dictionary in the returned list of dictionaries and concatenating it onto the end of the base string https://stockcharts.com/h-sc/ui?s=
e.g. for the first dictionary in the list:
https://stockcharts.com/h-sc/ui?s= + sym
gives
https://stockcharts.com/h-sc/ui?s=TKAT
Basically, the server expects a query string and returns JSON; the page uses that response to update its content. You can see this request in the network tab of your browser's developer tools when refreshing the page.
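For illustration only, a minimal sketch of that approach might look like the following. It assumes a reference to Microsoft XML, v6.0 plus the VBA-JSON parser (the JsonConverter module, which in turn needs Microsoft Scripting Runtime); the "sym" key is the one described above, and the sketch only prints the reconstructed urls to the Immediate window:

Sub ListSectorSymbols()
    Dim req As New MSXML2.XMLHTTP60
    Dim json As Object, item As Object
    Dim chartUrl As String

    'Query the endpoint that the page itself calls
    req.Open "GET", "https://c.stockcharts.com/j-sum/sum?cmd=perf&group=SECTOR_DJUSNS", False
    req.send

    'The response is a JSON list of dictionaries, each holding a "sym" value
    Set json = JsonConverter.ParseJson(req.responseText)

    For Each item In json
        chartUrl = "https://stockcharts.com/h-sc/ui?s=" & item("sym")
        Debug.Print item("sym"), chartUrl
    Next item
End Sub

Again, read the terms of service before automating any of this.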
You might be better off looking for a free API that serves similar data.
I am using VBA code to extract information from a website into Excel cells. The numerical information comes through fine, but I have a problem with text strings. I am mostly extracting information from Georgian websites, and text in Georgian is not displayed properly in Excel, so I was wondering whether there is any way (code or something else) to convert these symbols into the proper characters.
Sub GetData()
    Dim request As Object
    Dim response As String
    Dim html As New HTMLDocument
    Dim website As String
    Dim price As Variant
    Dim address As Variant
    Dim x As Integer
    Dim y As Range

    x = 1
    Do Until x = 9
        Set y = Worksheets(1).Range("A21:A200"). _
            Find(x, LookIn:=xlValues, lookat:=xlWhole)
        website = "https://www.myhome.ge/ka/pr/11247371/iyideba-Zveli-ashenebuli-bina-veraze-T.-WoveliZis-qucha"
        ' Create the object that will make the webpage request.
        Set request = CreateObject("MSXML2.XMLHTTP")
        ' Where to go and how to go there.
        request.Open "GET", website, False
        ' Get fresh data.
        request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        ' Send the request for the webpage.
        request.send
        ' Get the webpage response data into a variable.
        response = StrConv(request.responseBody, vbUnicode)
        ' Put the webpage into an html object.
        html.body.innerHTML = response
        ' Get info from the specified element on the page.
        address = html.getElementsByClassName("address").Item(0).innerText
        price = html.getElementsByClassName("d-block convertable").Item(0).innerText
        y.Offset(0, 1).Value = address
        y.Offset(0, 5).Value = price
        x = x + 1
    Loop
End Sub
This is code I took from a YouTube video (https://www.youtube.com/watch?v=IOzHacoP-u4) and slightly modified. It works; I just have a problem with how Excel displays the characters in text strings.
For your issue in the question
Remove the line response = StrConv(request.responseBody, vbUnicode), as it's not required - it converts the raw UTF-8 bytes as if they were ANSI, which is what garbles the Georgian text.
Change html.body.innerHTML = response to html.body.innerHTML = request.responseText, which decodes the response correctly.
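Put together, a minimal sketch of the relevant part of your loop (keeping your existing variable names) would be:

' Send the request for the webpage.
request.send
' responseText decodes the response using the page's character set,
' so the Georgian text comes through intact.
html.body.innerHTML = request.responseText
' Get info from the specified elements on the page.
address = html.getElementsByClassName("address").Item(0).innerText
price = html.getElementsByClassName("d-block convertable").Item(0).innerText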
For your issue in the comment
The ID of the property can be retrieved from the element with the class id-container; you will need to do some string processing, though, to remove the extra colon:
propertyID = Trim$(Replace(html.getElementsByClassName("id-container")(0).innerText, ":", vbNullString))
Note: You should try to avoid declaring variables as Variant. The innerText property returns a String, so you should declare address and price as String.
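For example:

Dim address As String
Dim price As String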
I'm trying to extract the stats table from this page into Excel so that I can refresh it automatically. I tried the Get External Data approach, but where the DIV containing the table should be, it shows as null instead of as a Table element. I suspect this is down to how the site is coded: looking at the source, the table is generated by a script, even though in the rendered page it looks like a standard table element.
So I tried some macros for this purpose, such as the following:
Sub Extract_data()
    Dim url As String, links_count As Integer
    Dim i As Integer, j As Integer, row As Integer
    Dim XMLHTTP As Object, html As Object
    Dim tbl As Object
    Dim tr_coll As Object, tr As Object
    Dim td_coll As Object, td As Object

    links_count = 39

    For i = 0 To links_count
        url = "http://www.admision.unmsm.edu.pe/res20130914/A/011/" & i & ".html"

        Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.send

        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.responseText

        Set tbl = html.getElementsByTagName("table")
        Set tr_coll = tbl(0).getElementsByTagName("tr")

        For Each tr In tr_coll
            j = 1
            Set td_coll = tr.getElementsByTagName("td")
            For Each td In td_coll
                Cells(row + 1, j).Value = td.innerText
                j = j + 1
            Next
            row = row + 1
        Next
    Next
End Sub
However, that only pulls the same tables I already have access to with Get External Data. It doesn't seem able to 'see' the table on the page. Is there a way to write VBA code so that it pulls the table actually rendered on the page, rather than just what is in the page's source code?
On a side note, you can see from the screenshot above that there is a CSV export link on the page. However, because it is generated with JS, it shows up as a huge string of characters beginning with data:application rather than an actual link that can be refreshed (and I suspect it could not be refreshed anyway, since the characters likely change as the table parameters do). It does have a download attribute attached for the filename; is there a way to work backwards from that attribute to get Excel to find the file?
I'll take any method I can get. Thanks!
I am trying to scrape the pickup branch locations from a car rental website home page. The idea is to see exactly where pickup branches exist for a given company.
I have successfully done this before, but the company revamped their website recently and now my code doesn't work. The branch locations seem to be hidden within a form of some sort; the locations only become visible in the HTML once you click on the pickup location box.
My current code looks as below:
Option Explicit

Private Sub pickuplocations()
    Dim html As Object
    Dim ws As Worksheet
    Dim headers()
    Dim i As Long
    Dim r As Long
    Dim c As Long
    Dim numrows As Long

    Set ws = ThisWorkbook.Worksheets("Europcar Branches(2)")
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.europcar.co.za", False
        .send
        html.body.innerHTML = .responseText 'fetches all html from the website

        Dim pickupbranches As Object
        Dim pickupbranchresults()
        Set pickupbranches = html.getElementById("_location-search-widget_15").getElementsByTagName("span")

        headers = Array("Pickup Location", "Option value") 'for the ws
        numrows = pickupbranches.Length - 1 'sets the row length
        ReDim pickupbranchresults(1 To numrows, 1 To 2) 'sets array size for the results

        For i = 1 To numrows
            pickupbranchresults(i, 1) = pickupbranches.Item(i).innerText
            pickupbranchresults(i, 2) = pickupbranches.Item(i).Value
        Next

        With ws
            .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers 'sets the column headers
            .Cells(2, 1).Resize(UBound(pickupbranchresults, 1), UBound(pickupbranchresults, 2)) = pickupbranchresults
        End With
    End With
End Sub
Your current code requests the page's source HTML and attempts to scrape it.
However, as explained in the comments, the list of locations is loaded dynamically when you click on the search bar and is not part of the page's source HTML. For this reason your code yields no results.
It makes more sense to scrape the page dedicated to the locations:
https://www.europcar.co.za/rental-locations/
Now, if you navigate to this page and inspect the network traffic in your browser's developer tools (F12) while the page loads, you will see that an XHR request is being sent.
If you go through the Headers and the Params of the request you will see how the url, the body and the headers should look. In this particular case there are no parameters and the headers are not essential to the success of the request, so all you need is the url.
The response's payload is in JSON format. You can inspect its structure using an online JSON viewer.
Basically, the JSON is a list of countries, each country contains provinces, each province contains its branches, and each branch holds all the corresponding info.
To parse a response like that you need a JSON parser (look at the end of this post).
TL;DR
Here's how the code could look:
Option Explicit

Sub getLocations()
    Dim req As New WinHttpRequest
    Dim url As String, results() As String
    Dim sht As Worksheet
    Dim responseJSON As Object, country As Object, province As Object, branch As Object
    Dim i As Long
    Dim rng As Range

    Set sht = ThisWorkbook.Worksheets("Name of your Worksheet")
    url = "https://www.europcar.co.za/api/rentalLocations/impressLocations"

    With req
        .Open "GET", url, False
        .send
        Set responseJSON = JsonConverter.ParseJson(.responseText)
    End With

    For Each country In responseJSON
        For Each province In country("provinces")
            i = 0
            ReDim results(1 To province("branches").Count, 1 To 5)
            For Each branch In province("branches")
                i = i + 1
                results(i, 1) = country("name")
                results(i, 2) = province("name")
                results(i, 3) = branch("name")
                results(i, 4) = branch("emailAddress")
                results(i, 5) = branch("contactNumber")
            Next branch
            With sht
                Set rng = .Range("A" & .Rows.Count).End(xlUp).Offset(1, 0)
            End With
            rng.Resize(UBound(results, 1), UBound(results, 2)) = results
        Next province
    Next country
End Sub
For demonstration purposes, the code above writes the results to the worksheet in five columns: country, province, branch name, email address and contact number.
With the JSON structure and the sample code above in mind, you can easily modify it to fit your needs.
For the code to work you will need to add the following references to your project (VBE>Tools>References):
1. Microsoft WinHTTP Services version 5.1
2. Microsoft Scripting Runtime
You will also need to add a JSON parser to your project - the code above uses the JsonConverter module from the VBA-JSON library. Follow its installation instructions and you should be set to go.
I hope this question is not a duplicate; I have searched and not found an answer. I'm quite new to programming, and especially to scraping the web. If you know of any good, complete tutorial, I'd appreciate it if you could point me to it. I work with VBA and Python.
I began working on this after reading this: Scraping data from website using vba
Very helpful, by the way. I understood the older method better, so I chose that one.
The site I want to search in is: http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp
The code I've written so far:
Sub Test()
    Dim ie As Object
    Dim form As Variant, button As Variant
    Dim TR_col As Object, TR As Object
    Dim TD_col As Object, TD As Object
    Dim xx As Object, x As Object

    Set ie = CreateObject("InternetExplorer.Application")

    With ie
        .Visible = True '< Show browser window
        .navigate ("http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp") '> Travel to homepage
        Do While ie.Busy
            DoEvents
        Loop '< Wait for page to have loaded
    End With

    Set TR_col = ie.Document.getElementsByTagName("TR")
    For Each TR In TR_col
        Set xx = ie.Document.getElementsByTagName("a")
        If xx = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
            Cells(1, 1) = "Ok"
        End If
    Next TR
End Sub
Lastly, this is what the Inspector looks like: https://i.stack.imgur.com/YoG4H.png
I also highlighted the piece of information I'm using for testing purposes.
So, my approach is to search for all the "tr" tags and then check whether the first column of the table (I guess this would be the first "td" tag) equals a text I'll have in a cell (in this case I just hard-coded the text for testing purposes). The result should be copying the number next to the date into a cell in the worksheet. Here I wrote "Ok" just to see whether the If statement was working, but it isn't.
I guess I'm not sure how to tell VBA to search all the "tr" tags, look through the "td" tags within each "tr", find the one that matches some text, and return the third "td" of that "tr". Makes sense?
Hope I've been specific enough and that someone can guide me.
It's not necessary to load a whole browser to get the HTML - you can do it without one.
Sub Test()
    '// References required:
    '// 1) Microsoft HTML Object Library
    '// 2) Microsoft XML, v6.0
    Dim req As MSXML2.XMLHTTP60
    Dim doc As MSHTML.HTMLDocument
    Dim tbl As MSHTML.HTMLTable
    Dim tblRow As MSHTML.HTMLTableRow
    Dim tblCell As MSHTML.HTMLTableCell
    Dim html$, url$, sText$, fecha$, valor$, j%

    url = "http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp"

    Set req = New MSXML2.XMLHTTP60
    req.Open "GET", url, False
    req.send
    html = req.responseText

    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = html

    Set tbl = doc.getElementsByClassName("table-BCRA")(0)

    For j = 1 To tbl.Rows.Length - 1
        With tbl.Rows(j)
            '// Skip rows without data.
            '// Assume correct data has three cells.
            If .Cells.Length = 3 Then
                sText = .Cells(0).innerText
                If sText = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
                    fecha = .Cells(1).innerText
                    valor = .Cells(2).innerText
                    '// fecha and valor now hold the date and the value for the matched row.
                End If
            End If
        End With
    Next
End Sub
I want to know if I can type and click on a web page using XMLHTTP. For example, I would like to go to Yahoo Finance, type something in the search bar and hit submit.
I would prefer not to use Internet Explorer but XMLHTTP instead, and I am not looking to just concatenate URLs. Here is what I have:
Dim XHTTP As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTTPCol As MSHTML.IHTMLElementCollection, HTTPCol2 As MSHTML.IHTMLElementCollection
Dim HTTPEl As MSHTML.IHTMLElement

XHTTP.Open "GET", "https://finance.yahoo.com/", False
XHTTP.send

If XHTTP.Status <> 200 Then
    MsgBox "Problem"
    Exit Sub
End If

HTMLDoc.body.innerHTML = XHTTP.responseText

Set HTTPCol = HTMLDoc.getElementsByName("yfin-usr-qry")
For Each HTTPEl In HTTPCol
    HTTPEl.setAttribute "value", "AAPL"
Next

Set HTTPCol2 = HTMLDoc.getElementsByName("search-button")
HTTPCol2(0).Click
'But the "click" doesn't submit and go to the next page where I could grab data.
'I would like, for example, to get the stock price of Apple from that next page.