I am really struggling to pull some data from a web table. I have scraped web data in the past, but never from a table, and I cannot work it out.
I have tried several variations, but nothing seems to work. I have changed the class several times, and the child node number to reflect each item, but I cannot extract anything from the table.
Q) Can someone advise on the table class and how to extract the content of a td?
I have read several posts on this forum and others about scraping from a table, but none helped, hence this post.
' Data 1
On Error Resume Next
If doc.getElementsByClassName("content")(0).getElementsByTagName("td").Children(0) Is Nothing Then
    wsSheet.Cells(StartRow + myCounter, 1).Value = "-"
Else
    On Error Resume Next
    wsSheet.Cells(StartRow + myCounter, 1).Value = doc.getElementsByClassName("content")(0).getElementsByTagName("td").Children(0).innerText
End If
I have tried the following variations:
doc.getElementsByClassName("content")(0)
doc.getElementsByClassName("content")(0)).Children(0)
doc.getElementsByClassName("content")(0).getElementsByTagName("th").getElementsByTagName("td").Children(0)
doc.getElementsByClassName("content")(0).getElementsByTagName("td").Children(0)
This is an image of the HTML. I tried to paste in the HTML code, but could not get it to look right.
As always thanks in advance
First, a piece of advice: split those statements into pieces and save each result in an intermediate variable.
Then an observation: the <td> tags have no children, so Children(0) will return Nothing (the <th> on that page does have a child, the <span> tag). Note also that getElementsByTagName("td") returns a collection, so you have to pick one cell by index before reading its properties. You probably want to read the content of the cell; you can do this with the innerHTML property (or innerText if you only want the text).
Remove the On Error Resume Next statement. While you are developing the routine, let the code run into errors so you can see where it fails and debug it easily. Once it works, it is better to check for error conditions explicitly.
Not sure if the following fits, but it should give you the idea:
' Fetch the "Content"-DIV
Dim content As Object
Set content = HtmlDoc.getElementsByClassName("content")(0)
' Fetch the first table within that div
Dim table As Object
Set table = content.getElementsByTagName("table")(0)
' Loop over all <td>-Tags and print the content
Dim td As Object
For Each td In table.getElementsByTagName("td")
    Debug.Print td.innerHTML
    If td.Children.Length > 0 Then
        ' If <td> has children, fetch the first child and show the content
        Dim child As Object
        Set child = td.Children(0)
        Debug.Print " We found a child: " & child.tagName, child.innerHTML
    End If
Next
When you debug the code, remember to use the Locals window of the VBA editor (View -> Locals Window). There you can inspect all the details of the objects.
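To connect this advice with the "-" fallback from the question, here is a minimal sketch that reads one cell's text through intermediate variables and a Length check instead of On Error Resume Next. The helper name CellTextOrDash, the demo HTML, and the use of a late-bound "htmlfile" object to parse a string are assumptions for illustration; the real page's markup may differ.

```vba
Option Explicit

' Hypothetical helper: returns the text of the <td> at the zero-based
' position cellIndex inside the given table element, or "-" if the
' table has no such cell.
Function CellTextOrDash(ByVal table As Object, ByVal cellIndex As Long) As String
    Dim tdCells As Object
    Set tdCells = table.getElementsByTagName("td")

    If cellIndex >= 0 And cellIndex < tdCells.Length Then
        CellTextOrDash = Trim$(tdCells.Item(cellIndex).innerText)
    Else
        CellTextOrDash = "-"
    End If
End Function

' Demo on an in-memory document (the HTML below is made up).
Sub DemoCellTextOrDash()
    Dim doc As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<table><tr><th><span>Name</span></th></tr>" & _
                         "<tr><td>Alpha</td><td>Beta</td></tr></table>"

    Dim tables As Object
    Set tables = doc.getElementsByTagName("table")
    If tables.Length = 0 Then Exit Sub

    Debug.Print CellTextOrDash(tables.Item(0), 0)  ' first <td>
    Debug.Print CellTextOrDash(tables.Item(0), 5)  ' no such cell, falls back to "-"
End Sub
```

In the question's loop, the result could be assigned straight to wsSheet.Cells(StartRow + myCounter, 1).Value.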
I am trying to extract the market cap from the website "https://www.bloomberg.com/quote/206:HK",
which is 1.059B in this case.
I would like to extract the market cap value into an Excel column for a list of Bloomberg tickers. I would like to do this in VBA, but unfortunately I am not sure where to start.
Basically, I have a column with all the links to Bloomberg, and I would like to extract the market cap values into the column next to it.
You can do that with the code below. I use two steps to get the value. One might guess that it also works via the CSS class value__b93f12ea, but that class name includes a hex value, which is often the case when such identifiers are generated dynamically, so it may change.
Sub ScrapMarketCap()
    Dim browser As Object
    Dim url As String
    Dim nodeMarketCapAll As Object
    Dim nodeMarketCap As Object

    url = "https://www.bloomberg.com/quote/206:HK"

    'Initialize Internet Explorer, set visibility,
    'call URL and wait until page is fully loaded
    Set browser = CreateObject("internetexplorer.application")
    browser.Visible = True
    browser.navigate url
    Do Until browser.ReadyState = 4: DoEvents: Loop

    'Get all html elements with the css class "dataBox marketcap numeric"
    'in a node collection and get the first one by index (0)
    'There will be only one element with this class. But we still need to
    'specify the index, because we need the specific element from the node list
    '
    'We want this html in our dom object
    '<section class="dataBox marketcap numeric">
    '  <header class="title__49417cb9"><span>Market Cap</span></header>
    '  <div class="value__b93f12ea">1.074B</div>
    '</section>
    Set nodeMarketCapAll = browser.document.getElementsByClassName("dataBox marketcap numeric")(0)
    If Not nodeMarketCapAll Is Nothing Then
        'If we got the element
        'We take the value of the market cap from the first div tag
        Set nodeMarketCap = nodeMarketCapAll.getElementsByTagName("div")(0)
        If Not nodeMarketCap Is Nothing Then
            'If we got the div
            'We take the value from it
            MsgBox Trim(nodeMarketCap.innertext)
        End If
    End If
End Sub
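The question asks for the value next to each link in a column, while the code above handles a single URL. Here is a rough sketch of how the same two-step lookup might be wrapped in a loop. The sheet layout (URLs in column A from row 2, results in column B), the use of ActiveSheet, and the Sub name are assumptions, and it relies on Internet Explorer automation still being available and on the page markup matching the snippet shown in the comments above.

```vba
Option Explicit

' Sketch: for every URL in column A (row 2 downwards, an assumed layout),
' write the market cap text into column B, or "-" if it was not found.
Sub ScrapeMarketCapList()
    Dim browser As Object
    Dim sections As Object
    Dim divs As Object
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long

    Set ws = ActiveSheet   ' assumption: the links are on the active sheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    Set browser = CreateObject("InternetExplorer.Application")
    browser.Visible = False

    For r = 2 To lastRow
        If Len(Trim$(CStr(ws.Cells(r, "A").Value))) > 0 Then
            browser.navigate CStr(ws.Cells(r, "A").Value)
            Do While browser.Busy Or browser.readyState <> 4
                DoEvents
            Loop

            ws.Cells(r, "B").Value = "-"   ' default when nothing is found

            ' Step 1: the section with the market cap classes
            Set sections = browser.document.getElementsByClassName("dataBox marketcap numeric")
            If sections.Length > 0 Then
                ' Step 2: the first div inside that section
                Set divs = sections.Item(0).getElementsByTagName("div")
                If divs.Length > 0 Then
                    ws.Cells(r, "B").Value = Trim$(divs.Item(0).innerText)
                End If
            End If
        End If
    Next r

    browser.Quit
    Set browser = Nothing
End Sub
```

Checking Length before taking item 0 keeps the "not found" case explicit instead of depending on a Nothing test.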
I want to enter data into fields on a web page.
There are 2 data entry fields on the page.
I was able to enter data in the first field.
However, I cannot enter data in the other field.
Information you need to review the site:
Site: http://splan.byethost7.com/mesaj_yaz.php?fno=1&kip=yeni
User: kurucu, password: a11111
I entered the data in the "BAŞLIK" field.
However, I am unable to write data to the field named "İÇERİK".
I want to enter data in this field using an Excel macro, but the following code does not do it:
Sub deneme()
    Dim URL As String
    On Error Resume Next
    URL = "http://splan.byethost7.com/mesaj_yaz.php?fno=1&kip=yeni"
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = 1
    For i = 1 To Range("A" & Rows.Count).End(3).row
        If Cells(i, 1) <> Empty Then
            ie.navigate URL
            Call bekle
            ie.Document.getElementById("mesaj_icerik").Value = "TEST"
            ie.document.getElementsByName("mesaj_baslik").Item(0).Value = Cells(i, 1)
            'IE.Document.getElementsByClassName("submitButton")(0).Click
            Call bekle
        End If
    Next i
    ' IE.Quit
    Set ie = Nothing
End Sub

Sub bekle()
    With ie
        Do Until .readyState = 4: DoEvents: Loop
        Do While .Busy: DoEvents: Loop
    End With
    Application.Wait (Now + TimeValue("00:00:02"))
End Sub
As I said in my comments, there are several issues with your code, although the overall effort is good.
Firstly, ie.document.getElementsid("mesaj_baslik") is not a valid method. If you want to access a single HTML element with a unique ID, the method you need is ie.Document.getElementById("the element's ID").
Assuming that is what you were trying to achieve, keep in mind that the .getElementById() method returns a single element, not a collection.
So ie.Document.getElementById("the element's ID").item(0) would give you an error saying:
Object doesn't support this property or method.
Even with those mistakes corrected, I still don't see any element with an ID equal to "mesaj_baslik" in the HTML snippet that you have provided. In fact, this particular string is nowhere to be found in the HTML.
So even with the correct method, ie.Document.getElementById("mesaj_baslik") would return Nothing.
Secondly, although your usage of ie.document.getElementsByName() is correct, there is no element with a Name attribute equal to "formlar_mesajyaz" in the HTML snippet you have provided.
In fact, this string seems to be a class name rather than anything else. In that case you would have to use ie.document.getElementsByClassName().
Now, from the info you have provided, the best I can do is assume that you want to enter some text in the textarea element. To do that, you can use the element's ID like so:
ie.Document.getElementById("mesaj_icerik").Value = "TEST"
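For context, here is a hedged sketch of how that line might sit in a complete routine. It assumes the element ID mesaj_icerik and the name mesaj_baslik taken from the question's code are what the live page actually uses, and that you are already logged in. The Sub names and the test strings are made up. The browser object is passed to the wait routine as a parameter, so the wait does not depend on an undeclared variable, and On Error Resume Next is left out so that failures stay visible.

```vba
Option Explicit

' Waits until the given Internet Explorer instance has finished loading.
Private Sub WaitUntilLoaded(ByVal ie As Object)
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop
End Sub

Sub FillMessageForm()
    Const PAGE_URL As String = "http://splan.byethost7.com/mesaj_yaz.php?fno=1&kip=yeni"

    Dim ie As Object
    Dim doc As Object
    Dim contentField As Object
    Dim titleFields As Object

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate PAGE_URL
    WaitUntilLoaded ie

    Set doc = ie.document

    ' getElementById returns a single element (or Nothing), so no Item(0)
    Set contentField = doc.getElementById("mesaj_icerik")
    If contentField Is Nothing Then
        Debug.Print "No element with ID mesaj_icerik was found"
    Else
        contentField.Value = "TEST"
    End If

    ' getElementsByName returns a collection, so check Length first
    Set titleFields = doc.getElementsByName("mesaj_baslik")
    If titleFields.Length > 0 Then
        titleFields.Item(0).Value = "TEST TITLE"
    Else
        Debug.Print "No element named mesaj_baslik was found"
    End If
End Sub
```

If the field is replaced by a rich-text editor on the live page, setting Value on the textarea may not be visible in the editor; that depends on the page and cannot be confirmed from the information given.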
Good morning,
I'm attempting to extract HTML table information and collate the results in an Excel spreadsheet.
I'm using getElementsByTagName("table")(0) to extract the HTML table info, which has worked well. Can someone please tell me the significance of the (0) after the table?
Also, I have a case where an opened web page does not have any table to process (I don't know this until the page is opened). This leads to an error in my code when I try to size my data array to the table dimensions. Is there a way of testing the result of getElementsByTagName("table")(0)? I've tried:
If (iDom.getelementsbytagname("table")(0) = 0) Then
but this returns a run-time error.
Many thanks in advance for your help.
First, add references to Microsoft Internet Controls (SHDocVw) and to the Microsoft HTML Object Library.
Then the Object Browser is your friend.
getElementsByTagName returns an IHTMLElementCollection, which has a Length property. When elements with the given tag name are found on the page, Length is greater than zero. HTH
Dim tables As IHTMLElementCollection
Set tables = doc.getElementsByTagName("table")
If tables.Length > 0 Then
    Dim table As HTMLTable
    Set table = tables.item(0)
    ' Because item is the default property of IHTMLElementCollection, we can simplify
    Set table = tables(0) ' this is the same as tables.item(0)
End If
In VBA, an appended (0) refers to the first element of a zero-based array or collection. Here is a short array example (assuming the default Option Base 0):
vArr = Array("element 1", "element 2", "element 3")
Debug.Print vArr(1)
The above code prints element 2, the second element of a zero-based array.
Likewise, getElementsByTagName("table")(0) refers to the first table in the collection returned for that tag name. If the page has no table, the collection is empty, there is no first item, and treating the result of (0) as a table yields an error.
Instead, test whether any table was found before trying to access the first one, like so:
If iDom.getElementsByTagName("table").Length = 0 Then
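Since the question mentions sizing a data array to the table dimensions, here is a minimal sketch of doing that only after confirming that a table exists. The function name TableToArray, the demo HTML, and parsing via a late-bound "htmlfile" object are assumptions for illustration, not part of the original code.

```vba
Option Explicit

' Hypothetical helper: returns a 1-based 2-D array with the text of the
' first table in doc, or Empty when the document contains no table.
Function TableToArray(ByVal doc As Object) As Variant
    Dim tables As Object
    Set tables = doc.getElementsByTagName("table")
    If tables.Length = 0 Then Exit Function   ' result stays Empty

    Dim tbl As Object, rowEl As Object
    Dim rowCount As Long, colCount As Long
    Dim r As Long, c As Long

    Set tbl = tables.Item(0)
    rowCount = tbl.Rows.Length

    ' The widest row decides the number of columns
    For r = 0 To rowCount - 1
        Set rowEl = tbl.Rows.Item(r)
        If rowEl.Cells.Length > colCount Then colCount = rowEl.Cells.Length
    Next r
    If rowCount = 0 Or colCount = 0 Then Exit Function

    Dim data() As Variant
    ReDim data(1 To rowCount, 1 To colCount)
    For r = 0 To rowCount - 1
        Set rowEl = tbl.Rows.Item(r)
        For c = 0 To rowEl.Cells.Length - 1
            data(r + 1, c + 1) = rowEl.Cells.Item(c).innerText
        Next c
    Next r

    TableToArray = data
End Function

' Demo on an in-memory document (the HTML below is made up).
Sub DemoTableToArray()
    Dim doc As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<table><tr><td>a</td><td>b</td></tr>" & _
                         "<tr><td>c</td><td>d</td></tr></table>"

    Dim result As Variant
    result = TableToArray(doc)
    If IsEmpty(result) Then
        Debug.Print "No table on this page"
    Else
        Debug.Print UBound(result, 1) & " rows, " & UBound(result, 2) & " columns"
    End If
End Sub
```

The caller only has to test IsEmpty on the result, so pages without a table never reach the array sizing code.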
I have scoured the web and this site looking for an answer on this, so I would really appreciate some help.
I'm creating a VBScript to do some modifications to a user-specified Excel spreadsheet. I have the first part of my script working fine, but the second part is driving me nuts. I need it to search the first column for a value and, if found, delete the row. Right now I'm not worrying about the deletion statement--I'm doing testing by seeing if I can get the For Each statement to run properly as well as the If Then statement. Here's the specific block of code:
For Each cell in objSheet.Columns("A:A").Cells
    Set cell = objSheet.Columns("A:A").Cells
    If cell.Value = "60802400040000" then
        cell.font.bold = True
    End If
Next
I have tried many variations of this and cannot find the right combination. Initially I was getting an "Object Required" message, and after reading a number of posts, found that I needed to put in a Set statement for cell, which I did. Now I am getting a Type Mismatch error message.
The funny thing is, before I put in the Set statement, the code would execute, but it would throw the Object Required error when I closed the spreadsheet. After adding it, the error for the Type Mismatch pops up immediately.
Most examples I keep finding on the web are for VBA, and I try to modify them for VBS, which I don't know very well. Any assistance anyone can give me will be greatly appreciated.
You are reassigning cell; the For Each statement already sets cell to each cell in turn. After your Set line, cell refers to the whole column, so cell.Value is an array, and comparing it with a string raises the Type Mismatch error.
Delete this line:
Set cell = objSheet.Columns("A:A").Cells
This is an example from Help. Unfortunately, Help doesn't have any examples that use For Each, only For x = n To n and other means. For Each is the right thing to do.
Set r = Range("myRange")
For n = 1 To r.Rows.Count
    If r.Cells(n, 1) = r.Cells(n + 1, 1) Then
        MsgBox "Duplicate data in " & r.Cells(n + 1, 1).Address
    End If
Next n
Going from VBA to VBScript, you have to create or get the objects yourself, because some objects that are automatically available in VBA (like the Application object) are not available in VBScript: Set exceldoc = GetObject("c:\blah\blah.xls"), and then Set r = exceldoc.Worksheets(1).Range("MyRange").
Also, you have to use the constants' values, not their names, because VBScript can't look them up (for example, -4162 instead of xlUp).
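For the deletion step the question is working towards, here is a minimal sketch written as VBA. It swaps the For Each over the whole column for a bottom-up loop over the used rows, because deleting rows while moving downwards can skip the row that shifts up. The Sub name and the use of ActiveSheet are assumptions. In VBScript you would drop the "As ..." type declarations, use your own objSheet, and replace xlUp with -4162.

```vba
Option Explicit

' Sketch: delete every row whose column A value equals the target text.
Sub DeleteMatchingRows()
    Const TARGET As String = "60802400040000"

    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long
    Dim v As Variant

    Set ws = ActiveSheet   ' assumption: the data is on the active sheet
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row

    ' Loop from the bottom up so that deletions do not shift unvisited rows
    For r = lastRow To 1 Step -1
        v = ws.Cells(r, 1).Value
        If Not IsError(v) Then
            ' CStr makes the comparison work whether the cell holds
            ' the number or the same digits stored as text
            If CStr(v) = TARGET Then ws.Rows(r).Delete
        End If
    Next r
End Sub
```

While testing, ws.Rows(r).Delete can be swapped for ws.Cells(r, 1).Font.Bold = True, as in the question.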
I have thousands of rows of URLs, and for each row I am using:
For i = 2 To lastRow
    Set links = html.getElementsByTagName("a")
    For Each lnk In links
        If lnk.innerText = "something" Then
            ' do something
        End If
    Next lnk
Next i
This is a commonly used method, I guess, as shown for reference by Sid's code at How to access innerText of HTML tag inside a <TD> tag.
Is the For loop (the For Each lnk one) pretty much the only method in this scenario, or are there faster, more efficient methods?
MATCH is probably meant only for sheet ranges, but I tried it anyway. It runs without error, does nothing, and takes the same time as the For loop method. I think it does nothing due to the lack of appropriate addressing in:
If Not IsError(Application.Match("something", Range("A1:A100"), 0)) Then 'normally used for ranges
If Not IsError(Application.Match("something", links.innertext, 0)) Then 'what I tried
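For what it is worth, Application.Match can search a plain VBA array, but a link collection has no innerText of its own, so the texts have to be copied into an array first. The sketch below shows that idea. The function name FindLinkByText is made up, and because filling the array still visits every link, it is unlikely to be faster than the For Each loop. Note also that Match compares text case-insensitively and treats * and ? as wildcards.

```vba
Option Explicit

' Hypothetical helper: returns the first link whose text matches wanted,
' or Nothing if there is none. links is the collection returned by
' getElementsByTagName("a").
Function FindLinkByText(ByVal links As Object, ByVal wanted As String) As Object
    Dim n As Long
    Dim i As Long
    Dim pos As Variant

    n = links.Length
    If n = 0 Then Exit Function   ' result stays Nothing

    ' Copy the link texts into a 1-based array that Match can search
    Dim texts() As Variant
    ReDim texts(1 To n)
    For i = 1 To n
        texts(i) = links.Item(i - 1).innerText
    Next i

    pos = Application.Match(wanted, texts, 0)
    If Not IsError(pos) Then
        ' Match positions are 1-based, the collection is 0-based
        Set FindLinkByText = links.Item(CLng(pos) - 1)
    End If
End Function
```

A caller would test the result with Is Nothing before using it.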