How to get all contents from a webpage as text in VBA? - excel

I am trying to bring into vba the data displayed under the -Dividend Summary- title of this webpage:
https://seekingalpha.com/symbol/ABBV/dividends/scorecard
By running this line of code in the Google Chrome console I managed to get the info, so I am trying to replicate this in VBA.
document.querySelectorAll("div [data-test-id='dynamic-tooltips-area']")[1].innerText
The VBA code I have written is this:
Public Sub Stackoverflow_Question()
Dim sResponse As String, i As Long, Html As New HTMLDocument
Dim oSelectors As MSHTML.IHTMLDOMChildrenCollection 'Object
'Get response from webpage
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://seekingalpha.com/symbol/ABBV/dividends/scorecard", False
.send
sResponse = .responseText
End With
'Write and read HTML
With Html
.body.innerHTML = sResponse
Set oSelectors = .querySelectorAll("div [data-test-id='dynamic-tooltips-area']")
End With
'Print obtained data -this produces no result-
'I loop only to be sure that I don't request the wrong index
For i = 0 To oSelectors.Length - 1
Debug.Print oSelectors(i).innerText
Next i
'Auxiliary: Create txt with response text
Dim FilePath As String
Dim TextFile As Integer
FilePath = Application.ActiveWorkbook.Path & "\HTML_ResponseText.txt"
TextFile = FreeFile
Open FilePath For Output As TextFile
Print #TextFile, sResponse
Close TextFile
End Sub
Not only do I get no result from the debug.print, but a simple search in the responseText string shows that contents from the webpage such as the header "Div Yield (FWD)" or the values (4.99% as of today) are not there.
Why is the .ResponseText not working as I expect?
Is there an alternative way to retrieve the webpage content as text?
Thank you in advance

As pointed out by Tim Williams, loading a webpage usually takes more than one request.
In Google Chrome, pressing F12 and going to the "Network" tab shows all of them, along with their respective Request URLs.
Finding the one you want takes some trial and error, but the data all seems to be there.
In my case, the URL I was interested in is:
https://seekingalpha.com/api/v3/symbol_data?fields[]=divYieldFwd&fields[]=divRate&fields[]=payoutRatio&fields[]=divGrowRate5&fields[]=dividendGrowth&fields[]=divDistribution&fields[]=dividends&slugs=ABBV
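A minimal sketch of requesting that URL directly from VBA is shown here. The procedure name FetchDividendJson is hypothetical, and it is only an assumption that the server answers a plain XMLHTTP request the same way it answers the browser; it may require extra headers or refuse non-browser clients.

```vba
Public Sub FetchDividendJson()
    Dim url As String

    ' URL copied from the browser's Network tab (see above)
    url = "https://seekingalpha.com/api/v3/symbol_data?fields[]=divYieldFwd&fields[]=divRate&fields[]=payoutRatio&fields[]=divGrowRate5&fields[]=dividendGrowth&fields[]=divDistribution&fields[]=dividends&slugs=ABBV"

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        If .Status = 200 Then
            ' The body is JSON text rather than HTML
            Debug.Print .responseText
        Else
            Debug.Print "Request failed with status " & .Status
        End If
    End With
End Sub
```

The response is JSON, so it still has to be parsed (or searched as a string) to pull out individual values.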

Related

Converting weird characters and symbols into normal language in excel

I am using VBA code to extract information from a website into Excel cells. The numerical information is fine, but text strings are a problem: I mostly extract information from Georgian websites, and text in the Georgian language is not displayed properly in Excel. Is there any way (code or something else) to convert these symbols into readable text?
Sub GetData()
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim price As Variant
Dim address As Variant
Dim x As Integer
Dim y As Range
x = 1
Do Until x = 9
Set y = Worksheets(1).Range("A21:A200"). _
Find(x, LookIn:=xlValues, lookat:=xlWhole)
website = "https://www.myhome.ge/ka/pr/11247371/iyideba-Zveli-ashenebuli-bina-veraze-T.-WoveliZis-qucha"
' Create the object that will make the webpage request.
Set request = CreateObject("MSXML2.XMLHTTP")
' Where to go and how to go there.
request.Open "GET", website, False
' Get fresh data.
request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
' Send the request for the webpage.
request.send
' Get the webpage response data into a variable.
response = StrConv(request.responseBody, vbUnicode)
' Put the webpage into an html object.
html.body.innerHTML = response
' Get info from the specified element on the page.
address = html.getElementsByClassName("address").Item(0).innerText
price = html.getElementsByClassName("d-block convertable").Item(0).innerText
y.Offset(0, 1).Value = address
y.Offset(0, 5).Value = price
x = x + 1
Loop
End Sub
I took this code from a YouTube video (https://www.youtube.com/watch?v=IOzHacoP-u4) and modified it slightly. It works; my only problem is how Excel displays the characters in text strings.
For your issue in the question
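Remove the line response = StrConv(request.responseBody, vbUnicode), as it is not required: StrConv with vbUnicode converts the raw bytes using the system ANSI code page, which garbles UTF-8 text such as Georgian.
Change html.body.innerHTML = response to html.body.innerHTML = request.responseText.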
For your issue in comment
The ID of the property can be retrieved from the class id-container, though you will need some string processing to strip the extra colon:
propertyID = Trim$(Replace(html.getElementsByClassName("id-container")(0).innerText, ":", vbNullString))
Note: You should try to avoid declaring variables as Variant. The innerText property returns a String, so address and price should be declared as String.
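Putting the changes above together, the request part of the routine could look roughly like the sketch below. The procedure name GetListingText and the target cells B1 and F1 are hypothetical; the URL and class names are taken from the question and are assumed to still match the page. It needs a reference to the Microsoft HTML Object Library.

```vba
Public Sub GetListingText()
    Dim request As Object
    Dim html As New HTMLDocument   ' Microsoft HTML Object Library
    Dim address As String          ' String rather than Variant
    Dim price As String

    Set request = CreateObject("MSXML2.XMLHTTP")
    request.Open "GET", "https://www.myhome.ge/ka/pr/11247371/iyideba-Zveli-ashenebuli-bina-veraze-T.-WoveliZis-qucha", False
    request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
    request.send

    ' responseText is already a decoded string, so no StrConv call is needed
    html.body.innerHTML = request.responseText

    address = html.getElementsByClassName("address").Item(0).innerText
    price = html.getElementsByClassName("d-block convertable").Item(0).innerText

    ' Write to cells: the Immediate window cannot display Georgian characters
    Worksheets(1).Range("B1").Value = address
    Worksheets(1).Range("F1").Value = price
End Sub
```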

Web Scraping - StockCharts - getElementsByTagName ("a")

I am trying to get the inner text and href attribute of the column Name at this website:
https://stockcharts.com/freecharts/sectorsummary.html?&G=SECTOR_DJUSNS&O=1
but I get all hyperlinks except the ones inside the table.
Can somebody please take a look at this code and let me know what is wrong?
Sub Scraping_StockCharts()
Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLIm As MSHTML.IHTMLElement
Dim HTMLIms As MSHTML.IHTMLElementCollection
Dim URL As String
URL = "https://stockcharts.com/freecharts/sectorsummary.html?&G=SECTOR_DJUSNS&O=1"
XMLPage.Open "Get", URL, False
XMLPage.setRequestHeader "Content-Type", "text/xml"
XMLPage.send
HTMLDoc.body.innerHTML = XMLPage.responseText
Row = 1
Set HTMLIms = HTMLDoc.getElementsByTagName("a")
For Each HTMLIm In HTMLIms
Sheets("Results").Cells(Row, 2).Value = HTMLIm.innerText
Sheets("Results").Cells(Row, 3).Value = HTMLIm.getAttribute("href")
Row = Row + 1
Next HTMLIm
End Sub
Scraping via XMLHTTP is not allowed. I am not sure about automating a browser, so you will need to read the terms of service carefully. With browser automation, I suspect you could just use the URL you already have.
Purely as an academic exercise, the data can be retrieved from https://c.stockcharts.com/j-sum/sum?cmd=perf&group=SECTOR_DJUSNS which returns JSON. From that JSON you can reconstruct each URL by reading the sym value of each dictionary in the returned list of dictionaries and concatenating it onto the end of the base string https://stockcharts.com/h-sc/ui?s=
e.g. for first dictionary in list
https://stockcharts.com/h-sc/ui?s= + sym
gives
https://stockcharts.com/h-sc/ui?s=TKAT
Basically, the server expects a query string and returns json. The page uses this to update content. This can be viewed in network tab of browser when refreshing page.
You might be better off looking for a free API that serves similar data.
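For illustration only, and subject to the terms-of-service point above, the URL reconstruction could be sketched as follows. It assumes the VBA-JSON module (JsonConverter) has been imported into the project and that the response is a JSON array of objects that each carry a sym key, as described; the procedure name ListSectorChartUrls is hypothetical.

```vba
Public Sub ListSectorChartUrls()
    Const BASE_URL As String = "https://stockcharts.com/h-sc/ui?s="
    Dim json As Object
    Dim item As Object

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://c.stockcharts.com/j-sum/sum?cmd=perf&group=SECTOR_DJUSNS", False
        .send
        ' Assumption: the top level is a JSON array, which VBA-JSON returns as a Collection
        Set json = JsonConverter.ParseJson(.responseText)
    End With

    ' Each element is a Dictionary; append its sym value to the base URL
    For Each item In json
        Debug.Print BASE_URL & item("sym")
    Next item
End Sub
```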

Print inner text under "id"

Any one of the three highlighted parts is the value I want to print. I am trying the code below:
Sub JJ()
Dim IE As New SHDocVw.InternetExplorer
Dim hdoc As MSHTML.HTMLDocument
Dim ha As String
IE.Visible = True
IE.navigate "https://www.nseindia.com/get-quotes/equity?symbol=DIVISLAB"
Do While IE.readyState <> READYSTATE_COMPLETE
Loop
Set hdoc = IE.document
ha = hdoc.getElementById("preOpenFp").innerText
Debug.Print ha
End Sub
But nothing is output. Please help.
The website you're trying to scrape offers a very convenient way to do it. All you need to do is send an HTTP request and get the corresponding JSON response which looks like so:
If you take a look at the network traffic in your browser's developer tools, you'll see the requests that are being sent to the server when the page is being loaded. Among these requests you'll find the following one:
To send this request and get the info you need, you have to do the following:
Option Explicit
Sub nse()
Dim req As New MSXML2.XMLHTTP60
Dim url As String
Dim json As Object
url = "https://www.nseindia.com/api/quote-equity?symbol=DIVISLAB"
With req
.Open "GET", url, False
.send
Set json = JsonConverter.ParseJson(.responseText)
End With
Debug.Print json("preOpenMarket")("IEP")
End Sub
This will print the value of IEP to your immediate window (in this case 2390). You can modify the code to best fit your needs.
To parse a JSON string you will need to add a JSON parser module to your project; the JsonConverter.ParseJson call above comes from the VBA-JSON library. Follow its installation instructions and you should be set to go.
You will also need to add the following references to your project (VBE>Tools>References):
Microsoft XML version 6.0
Microsoft Scripting Runtime

Export HTML to text file with different results

I have two procedures that are both supposed to export a page's HTML to a text file:
Sub Demo1()
Dim http As New XMLHTTP60
Dim html As New HTMLDocument
With http
.Open "GET", "https://www.google.com.eg/", False
.send
html.body.innerHTML = .responseText
WriteTxtFile html.body.innerHTML
End With
End Sub
Sub WriteTxtFile(ByVal aString As String, Optional ByVal filePath As String = "C:\Users\Future\Desktop\Output.txt")
Dim fso As Object
Dim fileout As Object
Set fso = CreateObject("Scripting.FileSystemObject")
Set fileout = fso.CreateTextFile(filePath, True, True)
fileout.write aString
fileout.Close
End Sub
Sub Demo2()
Dim ie As Object
Dim f As Integer
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.navigate ("https://www.google.com.eg/")
Do: DoEvents: Loop Until .readyState = 4
f = FreeFile()
Open ThisWorkbook.Path & "\Sample.txt" For Output As #f
Print #f, .document.body.innerHTML
Close #f
.Quit
End With
End Sub
Demo1 and Demo2 are the two procedures; they produce "Output.txt" and "Sample.txt" respectively.
But I found that the two HTML documents are different.
Can you help me clarify which one is right, and why they are different?
Thanks in advance for your help.
XMLHTTP does not provide all the rendered content of a webpage, particularly anything rendered via JavaScript, because no scripts are executed.
Internet Explorer, on the other hand, will render the page fully, provided the browser version supports the JavaScript syntax used. For example, you will run into problems with ES6 (a recent ECMAScript version), as this is not supported on legacy browsers. It is, I believe, on Edge for Windows 10. You can check compatibility tables to see what is and isn't supported.
If you familiarize yourself with dev tools for your browser you can explore how different parts of a webpage are rendered. You can learn to debug scripts and see what changes are made to the DOM and page styling. Often a page will issue XHR requests to update content on a page for example. If you want to have a play look here.
So, I suspect that the first html document may have less content and a different overall DOM structure from the second on this basis.
To test for differences caused by the method of writing to the text file, you need to compare like with like, i.e. use the same access method and syntax to retrieve the page content before writing it out.
Please provide the differences if you want a deeper explanation.
Exploring page updating:
Firefox Network Tab
Internet Explorer Network Inspector
Chrome Network Tab
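One quick way to tell whether a piece of content is in the raw XMLHTTP response, or only appears after scripts have run in a browser, is to search the response text for it. The sketch below does that; the function name ResponseContains and its parameters are hypothetical.

```vba
' Returns True if marker occurs anywhere in the raw (unrendered) response body.
Public Function ResponseContains(ByVal url As String, ByVal marker As String) As Boolean
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        ' Case-insensitive search of the HTML exactly as the server sent it
        ResponseContains = (InStr(1, .responseText, marker, vbTextCompare) > 0)
    End With
End Function
```

For example, Debug.Print ResponseContains("https://www.google.com.eg/", "text you can see in the browser") printing False suggests that the text is added by a script or a later request.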

MSXML2.XMLHTTP broken yahoo ticker request

I have a nice VBA script that allowed me to download stock ticker information from Yahoo into Excel.
Yahoo have recently changed their web interface and the download commands that used to work have now ceased to do so.
The previous commands had the following structure:-
"http://chart.finance.yahoo.com/table.csv?s=TSCO.L&a=5&b=09&c=2016&d=5&e=26&f=2016&g=d&ignore=.csv"
where the new commands look like:-
"https://query1.finance.yahoo.com/v7/finance/download/TSCO.L?period1=1465456200&period2=1466925000&interval=1d&events=history&crumb=Ns6veY6jrcA"
period1 & period2 are epoch representations of the date, and the 'crumb' I believe is unique to each machine that sends a download request to the yahoo server.
I think this remains the same for a reasonable period of time so it doesn't have to be changed.
If I paste the https request into a browser it works. However, the routine that I used to check for the data's existence and then download the data no longer works.
Below is a shortened version of the overall code. If anyone is so good as to try it out, you will probably have to replace my crumb value with your own, which can be found if you follow this link:-
https://uk.finance.yahoo.com/quote/TSCO.L/history?p=TSCO.L and hover your mouse over the 'Download Data' link.
Function IsResourceAvailable(strUrl As String) As Boolean
Dim objXhr As Object 'MSXML2.XMLHTTP60
Dim strStatus As String
Set objXhr = CreateObject("MSXML2.XMLHTTP") 'New XMLHTTP60
With objXhr
.Open "GET", strUrl, False
.send
strStatus = .Status
End With
'HTTP response of 200 = OK
IsResourceAvailable = (strStatus = "200")
End Function
Sub dloadDebug()
Dim strUrl As String
Dim blnAvailable As Boolean
strUrl = "https://query1.finance.yahoo.com/v7/finance/download/TSCO.L?period1=1465456200&period2=1466925000&interval=1d&events=history&crumb=Ns6veY6jrcA"
blnAvailable = IsResourceAvailable(strUrl)
Workbooks.Open Filename:=(strUrl)
End Sub
Neither of the above works any more. Can anybody point me in the right direction, please?
many thanks
GLW
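The period1 and period2 values described above are Unix epoch seconds, which VBA can compute with DateDiff. The sketch below shows only that conversion and does not address the crumb or the failing request; the names ToUnixSeconds and BuildYahooPeriods are hypothetical, and the dates are treated as UTC with no time zone adjustment.

```vba
' Seconds elapsed between 1 January 1970 00:00 and the given date (treated as UTC).
Public Function ToUnixSeconds(ByVal utcDate As Date) As Long
    ToUnixSeconds = DateDiff("s", #1/1/1970#, utcDate)
End Function

Public Sub BuildYahooPeriods()
    Dim period1 As Long
    Dim period2 As Long

    ' Same date range as the old-style URL in the question (9 to 26 June 2016)
    period1 = ToUnixSeconds(#6/9/2016#)
    period2 = ToUnixSeconds(#6/26/2016#)

    Debug.Print "period1=" & period1 & "&period2=" & period2
End Sub
```

Midnight on those two dates gives 1465430400 and 1466899200; the values in the question's URL are each 25800 seconds (7 hours 10 minutes) later.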
