Scraping data from a specific table - excel

I'm trying to scrape each of the symbol codes and names from here (about 1/4 of the way down the page): https://uk.finance.yahoo.com/quote/MSFT?p=MSFT&.tsrc=fin-srch
If I inspect the HTML of the first row, with the symbol AAPL, I get the following:
<tr class="Va(t) Bdc($seperatorColor) TapHc(h) Fw(500) Bgc($hoverBgColor):h H(44px) BdT"></tr>
In my VBA I navigate to the page by creating an InternetExplorer object, and the first piece of code that actually begins the scraping is the following:
Dim allRowOfData As Variant
Set allRowOfData = appIE.document.getElementsByClassName("Va(t)")
Dim myValue As String
myValue = allRowOfData.Cells(1).innerHTML
If I look in the Immediate window, I am presented with so many HTMLElements (20), plus all of their children, that I have no idea where to begin to get the data I want.
Is there an easier way to do this?
Also, how do we know what to put in the getElementsByClassName? Initially I had the entire string after the <tr class= and this returned nothing at all.

Related

Scrape Data From A HTML Table [duplicate]

I am really struggling to pull some data off a web table. I have scraped web data in the past, but never from a table, and I cannot work it out.
I have tried several variations, but nothing seems to work. I have changed the class several times, and the child node number to reflect each item, but I cannot extract anything from the table.
Q) Can someone advise on the table class and how to extract from a td?
I have read several posts on this forum and other forums on scraping from a table, but none helped, hence this post.
''''Data 1
On Error Resume Next
If doc.getElementsByClassName("content")(0).getElementsByTagName("td").Children(0) Is Nothing Then
wsSheet.Cells(StartRow + myCounter, 1).Value = "-"
Else
On Error Resume Next
wsSheet.Cells(StartRow + myCounter, 1).Value = doc.getElementsByClassName("content")(0).getElementsByTagName("td").Children(0).innerText
End If
I have tried the following variations:
doc.getElementsByClassName("content")(0)
doc.getElementsByClassName("content")(0)).Children(0)
doc.getElementsByClassName("content")(0).getElementsByTagName("th").getElementsByTagName("td").Children(0)
doc.getElementsByClassName("content")(0).getElementsByTagName("td").Children(0)
This is an image of the HTML. I tried to put in the HTML code but could not get it to look right.
As always, thanks in advance.
First, a piece of advice: split those statements into pieces and save the results in intermediate variables.
Then an observation: the <td> tags have no children, so Children(0) will return Nothing (the <th> on that page has a child, the <span> tag). You probably want to read the content of the cell, which you can do with the innerHTML property.
Remove the On Error Resume Next statement. While you are developing your routine, let the code run into errors so you can easily debug and see where it fails. Once you are ready, it's better to check for errors yourself.
Not sure if the following fits, but it should give you the idea:
' Fetch the "Content"-DIV
Dim content As Object
Set content = HtmlDoc.getElementsByClassName("content")(0)
' Fetch the first table within that div
Dim table As Object
Set table = content.getElementsByTagName("table")(0)
' Loop over all <td>-Tags and print the content
Dim td As Object
For Each td In table.getElementsByTagName("td")
    Debug.Print td.innerHTML
    If td.Children.Length > 0 Then
        ' If <td> has children, fetch the first child and show the content
        Dim child As Object
        Set child = td.Children(0)
        Debug.Print " We found a child: " & child.tagName, child.innerHTML
    End If
Next
When you debug the code, remember to use the Locals window of the VBA editor (View -> Locals Window). There you can inspect all the details of the objects.

GetAttribute in Selenium VBA for style

I am working with Selenium in VBA, and I have a variable "post" that stores all the occurrences of a specific element, like this:
Dim post As Object
Set post = .FindElementsByCss("#DetailSection1")
Dim i As Long
For i = 1 To post.Count
Debug.Print post.Item(i).getAttribute("style")
Next i
I need to extract the style value from the elements:
<div id="DetailSection1" style="z-index:3;clip:rect(0px,746px,32px,0px);top:228px;left:0px;width:746px;height:32px;">
</div>
I also need to print the innerHTML in the Immediate window, but getAttribute("innerHTML") doesn't work for me.
Any ideas?
getAttribute("style") should work, but you have to wait for the element to be present/visible within the HTML DOM before reading it:
Debug.Print post.Item(i).getAttribute("style")
To extract individual values of the style attribute from the elements, you can use the getCssValue() method as follows:
Debug.Print post.Item(i).getCssValue("z-index")
Debug.Print post.Item(i).getCssValue("top")
Debug.Print post.Item(i).getCssValue("left")
Debug.Print post.Item(i).getCssValue("width")
Debug.Print post.Item(i).getCssValue("height")
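If the full style string has already been read with getAttribute("style"), the individual values can also be pulled out with plain string handling. The following is only a minimal sketch: ParseStyle and DemoParseStyle are hypothetical names (not part of Selenium), and it assumes simple inline styles in which no value itself contains a semicolon.

```vba
' Hypothetical helper (not part of Selenium): parse an inline style string
' such as "top:228px;left:0px" into a dictionary of property -> value.
Function ParseStyle(ByVal styleText As String) As Object
    Dim result As Object
    Set result = CreateObject("Scripting.Dictionary")

    Dim parts() As String
    Dim decl As String
    Dim colonPos As Long
    Dim i As Long

    ' Each declaration is separated by a semicolon
    parts = Split(styleText, ";")
    For i = LBound(parts) To UBound(parts)
        decl = Trim$(parts(i))
        ' Split on the first colon only, so the value is kept intact
        colonPos = InStr(decl, ":")
        If colonPos > 1 Then
            result(LCase$(Trim$(Left$(decl, colonPos - 1)))) = Trim$(Mid$(decl, colonPos + 1))
        End If
    Next i

    Set ParseStyle = result
End Function

Sub DemoParseStyle()
    Dim styles As Object
    Set styles = ParseStyle("z-index:3;clip:rect(0px,746px,32px,0px);top:228px;left:0px;width:746px;height:32px;")
    Debug.Print styles("top")    ' 228px
    Debug.Print styles("clip")   ' rect(0px,746px,32px,0px)
End Sub
```

Inside the loop from the question this could be called as ParseStyle(post.Item(i).getAttribute("style")), assuming that call returns the style string.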
getAttribute("innerHTML")
getAttribute("innerHTML") (get_attribute("innerHTML") in the Python bindings) can be used to read the innerHTML, that is, the markup and text within any node / WebElement.
You can find a detailed discussion in Difference between text and innerHTML using Selenium
References
You can find a couple of relevant discussions in:
How to get child property value of a element property using selenium webdriver, NUnit and C#
How can I verify text is bold using selenium on an angular website with C#

Scrape a bloomberg website to get market cap into excel [duplicate]

I am trying to extract the market cap from the website "https://www.bloomberg.com/quote/206:HK",
which is 1.059B in this case.
I would like to extract the market cap value into an Excel column for a list of Bloomberg tickers. I would like to do this in VBA, but unfortunately I am not sure where to start.
Basically, I have a column with all the links to Bloomberg, and I would like to extract the market cap values into the column next to it.
You can do that with the code below. I use two steps to get the value. It would probably also work via the CSS class value__b93f12ea, but that class name includes a hex value, which is often a sign that the identifier is generated dynamically and may change.
Sub ScrapMarketCap()
    Dim browser As Object
    Dim url As String
    Dim nodeMarketCapAll As Object
    Dim nodeMarketCap As Object

    url = "https://www.bloomberg.com/quote/206:HK"

    'Initialize Internet Explorer, set visibility,
    'call URL and wait until page is fully loaded
    Set browser = CreateObject("internetexplorer.application")
    browser.Visible = True
    browser.navigate url
    Do Until browser.ReadyState = 4: DoEvents: Loop

    'Get all html elements with the css class "dataBox marketcap numeric"
    'in a node collection and get the first one by index (0)
    'There will be only one element with this class. But we still need to
    'specify the index, because we need the specific element from the node list
    '
    'We want this html in our dom object
    '<section class="dataBox marketcap numeric">
    '  <header class="title__49417cb9"><span>Market Cap</span></header>
    '  <div class="value__b93f12ea">1.074B</div>
    '</section>
    Set nodeMarketCapAll = browser.document.getElementsByClassName("dataBox marketcap numeric")(0)
    If Not nodeMarketCapAll Is Nothing Then
        'If we got the element
        'We take the value of the market cap from the first div tag
        Set nodeMarketCap = nodeMarketCapAll.getElementsByTagName("div")(0)
        If Not nodeMarketCap Is Nothing Then
            'If we got the div
            'We take the value from it
            MsgBox Trim(nodeMarketCap.innertext)
        End If
    End If
End Sub
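Since the question asks for a whole column of links, here is a rough sketch of how the same two-step lookup might be wrapped in a function and run over a sheet. Everything about the layout is an assumption: links in column A starting at row 2, results written to column B of the active sheet. GetMarketCap and FillMarketCaps are hypothetical names, and the class name is the one from the answer above, which may no longer match the live page.

```vba
' Hypothetical wrapper around the two-step lookup above.
' Returns the market cap text, or "" if the expected elements are not found.
Function GetMarketCap(ByVal browser As Object, ByVal url As String) As String
    Dim sections As Object
    Dim divs As Object

    browser.navigate url
    ' Wait until the browser is idle and the page is fully loaded
    Do While browser.Busy Or browser.ReadyState <> 4: DoEvents: Loop

    ' Class name taken from the answer above; assumed to still be valid
    Set sections = browser.document.getElementsByClassName("dataBox marketcap numeric")
    If sections.Length = 0 Then Exit Function

    Set divs = sections.Item(0).getElementsByTagName("div")
    If divs.Length = 0 Then Exit Function

    GetMarketCap = Trim$(divs.Item(0).innerText)
End Function

Sub FillMarketCaps()
    Dim browser As Object
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long

    ' Assumed layout: links in column A from row 2, results go to column B
    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    Set browser = CreateObject("InternetExplorer.Application")
    browser.Visible = False

    For r = 2 To lastRow
        If Len(ws.Cells(r, "A").Value) > 0 Then
            ws.Cells(r, "B").Value = GetMarketCap(browser, CStr(ws.Cells(r, "A").Value))
        End If
    Next r

    browser.Quit
End Sub
```

Reusing one browser instance for every row avoids starting Internet Explorer once per link; an empty result in column B means the expected section or div was not found on that page.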

VBA getelementsbytagname issue

Good morning,
I'm attempting to extract HTML table information and collate the results in an Excel spreadsheet.
I'm using getelementsbytagname("table")(0) to extract the HTML table info, which has worked well. Can someone please tell me what the significance of the (0) after the table is?
Also, I have an instance where an opened webpage does not have any table information to process (I don't know this until the page is opened). This leads to an error in my code when I try to size my data array to the table dimensions. Is there a way of testing the result of getelementsbytagname("table")(0)? I've tried:
If (iDom.getelementsbytagname("table")(0) = 0) Then
but this returns a run time error.
Many thanks in advance for your help.
First add references to Microsoft Internet Controls (SHDocVw) and to the Microsoft HTML Object Library.
Then the Object Browser is your friend.
So getElementsByTagName returns an IHTMLElementCollection, which has a Length property. When elements with the given tag name are found on the page, Length is greater than zero. HTH
Dim tables As IHTMLElementCollection
Set tables = doc.getElementsByTagName("table")
If tables.Length > 0 Then
    Dim table As HTMLTable
    Set table = tables.item(0)
    ' Because item is the default property of IHTMLElementCollection we can simplify
    Set table = tables(0) ' this is the same as tables.item(0)
End If
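Applied to the asker's case (sizing a data array from the table), the Length check could look roughly like the sketch below. FirstTableToArray is a hypothetical helper name, and the code is late-bound so it does not depend on the references mentioned above. It assumes the first table on the page is the one of interest.

```vba
' Hypothetical helper: returns a 2-D array (1-based) holding the text of every
' cell in the first table, or Empty if the page contains no usable table.
Function FirstTableToArray(ByVal doc As Object) As Variant
    Dim tables As Object
    Set tables = doc.getElementsByTagName("table")
    If tables.Length = 0 Then Exit Function   ' no table: function returns Empty

    Dim table As Object
    Set table = tables.Item(0)
    If table.Rows.Length = 0 Then Exit Function

    ' Size the array from the widest row, since rows may differ in cell count
    Dim r As Long, c As Long, maxCols As Long
    For r = 0 To table.Rows.Length - 1
        If table.Rows.Item(r).Cells.Length > maxCols Then
            maxCols = table.Rows.Item(r).Cells.Length
        End If
    Next r
    If maxCols = 0 Then Exit Function

    Dim data() As Variant
    ReDim data(1 To table.Rows.Length, 1 To maxCols)
    For r = 0 To table.Rows.Length - 1
        For c = 0 To table.Rows.Item(r).Cells.Length - 1
            data(r + 1, c + 1) = table.Rows.Item(r).Cells.Item(c).innerText
        Next c
    Next r

    FirstTableToArray = data
End Function
```

The caller can then test the result with IsEmpty before using it, for example: result = FirstTableToArray(iDom), followed by If IsEmpty(result) Then to skip pages without a table.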
In VBA the appended (0) refers to the first element of a zero-based array or collection. Here is a short example:
vArr = Array("element 1", "element 2", "element 3")
Debug.Print vArr(1)
The above code prints element 2, the second element of a zero-based array (assuming the default Option Base 0).
So getelementsbytagname("table")(0) refers to the first <table> element in the collection returned for the page. If the page contains no table, the collection is empty, so there is no first element to return, and using the result as if it were a table raises an error.
Instead, test whether any table was found before trying to access the first one, like so:
If iDom.getElementsByTagName("table").Length = 0 Then

Use excel vba to Input value on an .asp page

I'm writing VBA code in Excel to generate various reports.
Once it's done, I would really like (and my fellow co-workers would look at me like a hero) to input my results directly on our corporate intranet.
So I've started educating myself on how to use VBA to interact with Internet Explorer. I now get the basics, so I can do cool stuff (but useless in this case) like loading a web site. But when I try to input values in a text box on this page on the intranet, I can't get anywhere.
I suspect the problem is caused by the fact that the address I'm accessing ends with the .asp extension.
Here's the code I'm using:
Beware that I will most probably have other questions following this first one. You might just become my new web-geek best friend ;-)
Sub interaction()
'Variables declaration
Dim IE As New InternetExplorer
Dim IEDoc As HTMLDocument
Dim ZoneMotsClés As HTMLInputElement
'page to be loaded, it's on a corporate intranet
IE.navigate "http://intranet.cima.ca/fr/application/paq/projets/index.asp"
IE.Visible = True
Do ' Wait till the Browser is loaded
Loop Until IE.readyState = READYSTATE_COMPLETE
Set IEDoc = IE.document
'this is the text zone that I'm trying to input value into
Set ZoneMotsClés = IEDoc.getElementById("txtMotCle")
'this is where it crashes. At this point I'm only trying to enter a project number into
'the "txtMotCle" 'text zone
ZoneMotsClés.Value = "Q141763B"
'.....
Set IE = Nothing
Set IEDoc = Nothing
End Sub
So at this point (when I try to input the value in the text box) I get:
error 91 object variable or with block variable not set
And here's the HTML code of the section of the page I'm trying to write in:
<INPUT onfocus="javascript:document.frmMyForm.TypeRecherche.value='simple';"
style="FONT-SIZE: 9px; FONT-FAMILY: verdana" maxLength=250 size=60 name=txtMotCle>
This time I tried the suggestions of the two contributors (thanks, Jeeped and Tim Williams), but I am still getting the same error 91.
Then I tried this modification (thanks, SeardAndResQ):
'this is the text zone that I'm trying to input into
'Set ZoneMotsClés = IEDoc.all("txtMotCle")
ID = "txtMotCle"
Set ZoneMotsClés = IEDoc.getElementById(ID)
'this is where it crashes. At this point I'm only trying to enter a project number into the "txtMotCle"
'text zone
ZoneMotsClés.Value = "Q141763B"
Same result. I'm not sure I did it the way #searchAndResQ meant.
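One thing the posted HTML suggests: the INPUT carries name=txtMotCle but no id attribute, so getElementById may simply be returning Nothing, which would explain error 91 on the .Value line. The sketch below is an assumption-laden illustration, not a confirmed fix: FindInput is a hypothetical helper that falls back to getElementsByName, and it is late-bound so it works with or without the HTML Object Library reference.

```vba
' Hypothetical helper: look an element up by id first, then by name.
' Returns Nothing if neither lookup finds a match.
Function FindInput(ByVal doc As Object, ByVal key As String) As Object
    Dim el As Object
    Set el = doc.getElementById(key)

    If el Is Nothing Then
        ' The posted INPUT has name=txtMotCle and no id, so try the name
        Dim byName As Object
        Set byName = doc.getElementsByName(key)
        If byName.Length > 0 Then Set el = byName.Item(0)
    End If

    Set FindInput = el
End Function

' Possible usage inside the routine above, after IEDoc has been set:
'   Dim box As Object
'   Set box = FindInput(IEDoc, "txtMotCle")
'   If box Is Nothing Then
'       ' Assumption: the field may live in a frame or not be loaded yet
'       Debug.Print "txtMotCle not found in this document"
'   Else
'       box.Value = "Q141763B"
'   End If
```

Checking for Nothing before touching .Value turns the run-time error into a message that says what was not found, which makes it easier to tell a lookup problem from a page-loading problem.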
