I am trying to scrape the cryptocurrency information in my portfolio (e.g. current worth, change %, etc.). I have spent the last 10 hours trying to write working code without success. First I tried the very nice code from this question: Web scraping in Investing.com with Excel vba
However, that code is for getting a predefined table, and I am not very familiar with web scraping, especially the XML method, so I couldn't make it work.
The page I am trying to scrape is only reachable after logging in, so I will show its HTML by pasting it here.
Below is the relevant HTML from the portfolio page (1061477 is the ID of Dogecoin):
<tbody id="tbody_overview_5563889" class="ui-sortable">
<tr id="sort_945629" rel="5563889_945629" data-pair-id="945629" data-pair-exchange-id="1014" data-is-open-by="exchange" data-is-pair-exchange-open="1">
<td class="left dragHandle"><span class="checkers"></span></td>
<td class="flag"><span title="" class="ceFlags bitcoin"> </span></td>
<td data-column-name="name" data-pair-id="945629" class="symbol plusIconTd left bold elp alert js-injected-user-alert-container">
<span class="aqPopupWrapper js-hover-me-wrapper"><a target="_blank" href="/crypto/bitcoin/btc-usd" title="BTC/USD - Bitcoin US Dollar" class="aqlink js-hover-me" hoverme="markets" data-pairid="945629">BTC/USD</a></span>
<span class="js-plus-icon alertBellGrayPlus genToolTip oneliner" data-tooltip="Create Alert" data-tooltip-alt="Alert is active"></span>
</td>
<td data-column-name="symbol" class="left bold "><a target="_blank" href="/crypto/bitcoin/btc-usd" title=""></a></td>
<td data-column-name="exchange" class="left displayNone" title="Bitfinex">Bitfinex</td>
<td data-column-name="last" class="pid-945629-last" id="5563889_last_945629">40,324.0</td>
<td data-column-name="bid" class="pid-945629-bid displayNone" id="5563889_bid_945629">40,322.0</td>
<td data-column-name="ask" class="pid-945629-ask displayNone" id="5563889_ask_945629">40,323.0</td>
<td data-column-name="extended_hours" class="js-extended-hours js-extended-last Font pidExt-945629-last displayNone">--</td>
<td data-column-name="extended_hours_percent" class="js-extended-hours js-extended-percent Font pidExt-945629-pcp displayNone">--</td>
<td data-column-name="open" class="">37,461.0</td>
<td data-column-name="prev" class="displayNone">37,461.0</td>
<td data-column-name="high" class="pid-945629-high " id="5563889_high_945629">40,380.0</td>
<td data-column-name="low" class="pid-945629-low " id="5563889_low_945629">37,233.0</td>
<td data-column-name="chg" class="bold pid-945629-pc greenFont" id="5563889_chg_945629">+2863.0</td>
<td data-column-name="chgpercent" class="bold pid-945629-pcp greenFont" id="5563889_p_chg_945629">+7.64%</td>
<td data-column-name="vol" class="pid-945629-turnover " data-value="8733">8.88K</td>
<td data-column-name="next_earning" class="left textNum displayNone" data-value="0">--</td>
<td data-column-name="time" class="pid-945629-time " id="5563889_time_945629" data-value="1612610025">06:13:45</td>
<td class="icon" id="5563889_isopen_945629"><span class="greenClockIcon middle isOpenExch-1014"></span></td>
<td class="icon"> </td>
</tr><tr id="sort_1061477" rel="5563889_1061477" data-pair-id="1061477" data-pair-exchange-id="1037" data-is-open-by="exchange" data-is-pair-exchange-open="1">
<td class="left dragHandle"><span class="checkers"></span></td>
<td class="flag"><span title="" class="ceFlags dogecoin"> </span></td>
<td data-column-name="name" data-pair-id="1061477" class="symbol plusIconTd left bold elp alert js-injected-user-alert-container">
<span class="aqPopupWrapper js-hover-me-wrapper"><a target="_blank" href="/indices/investing.com-doge-usd" title="Investing.com Dogecoin Index" class="aqlink js-hover-me" hoverme="markets" data-pairid="1061477">Dogecoin</a></span>
<span class="js-plus-icon alertBellGrayPlus genToolTip oneliner" data-tooltip="Create Alert" data-tooltip-alt="Alert is active"></span>
</td>
<td data-column-name="symbol" class="left bold "><a target="_blank" href="/indices/investing.com-doge-usd" title="DOGE/USD">DOGE/USD</a></td>
<td data-column-name="exchange" class="left displayNone" title="Investing.com">Investing.com</td>
<td data-column-name="last" class="pid-1061477-last" id="5563889_last_1061477">0.048506</td>
<td data-column-name="bid" class=" displayNone" id="5563889_bid_1061477">-</td>
<td data-column-name="ask" class=" displayNone" id="5563889_ask_1061477">-</td>
<td data-column-name="extended_hours" class="js-extended-hours js-extended-last Font pidExt-1061477-last displayNone">--</td>
<td data-column-name="extended_hours_percent" class="js-extended-hours js-extended-percent Font pidExt-1061477-pcp displayNone">--</td>
<td data-column-name="open" class="">0.043969</td>
<td data-column-name="prev" class="displayNone">0.043969</td>
<td data-column-name="high" class="pid-1061477-high " id="5563889_high_1061477">0.051038</td>
<td data-column-name="low" class="pid-1061477-low " id="5563889_low_1061477">0.044505</td>
<td data-column-name="chg" class="bold pid-1061477-pc greenFont" id="5563889_chg_1061477">+0.004537</td>
<td data-column-name="chgpercent" class="bold pid-1061477-pcp greenFont" id="5563889_p_chg_1061477">+10.32%</td>
<td data-column-name="vol" class="pid-1061477-turnover " data-value="21137638982">21.08B</td>
<td data-column-name="next_earning" class="left textNum displayNone" data-value="0">--</td>
<td data-column-name="time" class="pid-1061477-time " id="5563889_time_1061477" data-value="1612610031">06:13:51</td>
<td class="icon" id="5563889_isopen_1061477"><span class="greenClockIcon middle isOpenExch-1037"></span></td>
<td class="icon"> </td>
</tr><tr id="sort_1057392" rel="5563889_1057392" data-pair-id="1057392" data-pair-exchange-id="1037" data-is-open-by="exchange" data-is-pair-exchange-open="1">
<td class="left dragHandle"><span class="checkers"></span></td>
<td class="flag"><span title="" class="ceFlags ripple"> </span></td>
<td data-column-name="name" data-pair-id="1057392" class="symbol plusIconTd left bold elp alert js-injected-user-alert-container">
<span class="aqPopupWrapper js-hover-me-wrapper"><a target="_blank" href="/indices/investing.com-xrp-usd" title="Investing.com XRP Index" class="aqlink js-hover-me" hoverme="markets" data-pairid="1057392">XRP</a></span>
<span class="js-plus-icon alertBellGrayPlus genToolTip oneliner" data-tooltip="Create Alert" data-tooltip-alt="Alert is active"></span>
</td>
The values I am trying to get are last, high, low, change, change %, volume and time.
The code below (with x = 1061477) does scrape some of the data, although it is very slow. The change and change % lines fail because I select them with the class redFont, which switches to greenFont when the currency goes up. I tried using the IDs instead, but couldn't get the data. It also somehow changes my computer's time :)
Sub getprice()
Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")
Dim text As String
Dim lastrow As Long
Dim sht As Worksheet
Set sht = ActiveSheet
lastrow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
For i = 2 To lastrow
x = Cells(i, 1).Value
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://www.investing.com/portfolio/?portfolioID=NTUwZjJiZjkzbT46NW8%3D"
Do While IE.Busy And IE.readyState <> 4: DoEvents: Loop
Sleep 500
Dim last As String
'Name = .document.getElementsByClassName("aqPopupWrapper js-hover-me-wrapper")(0).outerText
last = IE.document.getElementsByClassName("pid-" & x & "-last")(0).outerText
high = IE.document.getElementsByClassName("pid-" & x & "-high")(0).outerText
low = IE.document.getElementsByClassName("pid-" & x & "-low")(0).outerText
'Change = IE.document.getElementById("5563889_chg_1057392")(0).innerHTML
Change = IE.document.getElementsByClassName("bold pid-" & x & "-pc redFont")(0).outerText
change2 = IE.document.getElementsByClassName("bold pid-" & x & "-pcp redFont")(0).outerText
volume = IE.document.getElementsByClassName("pid-" & x & "-turnover")(0).outerText
Time = IE.document.getElementsByClassName("pid-" & x & "-time")(0).outerText
IE.Quit
' ws.Cells(2, 1).Value = Name
ws.Cells(i, 3).Value = last
ws.Cells(i, 4).Value = high
ws.Cells(i, 5).Value = low
ws.Cells(i, 6).Value = Change
ws.Cells(i, 7).Value = change2
ws.Cells(i, 8).Value = volume
ws.Cells(i, 9).Value = Time
Next i
End Sub
Any ideas on how to scrape this data, preferably with the XML (XMLHTTP) method?
Thanks in advance for your help.
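A sketch of the same loop with the problems described above addressed, assuming the IE session is already logged in to investing.com and column A holds the pair IDs, as in the code above. It opens the page once instead of once per row. It matches only the pid-... class, and getElementsByClassName compares class names token by token, so the changing redFont/greenFont class no longer matters (and "pid-...-pc" does not match "pid-...-pcp"). It also never assigns to Time: in VBA, "Time = ..." is the statement that sets the system clock, which explains the changed computer time. PairClass and PairValue are hypothetical helper names.

```vba
Option Explicit

' Builds the class name used on the portfolio page, e.g. "pid-1061477-pcp".
Public Function PairClass(ByVal pairId As String, ByVal field As String) As String
    PairClass = "pid-" & pairId & "-" & field
End Function

' Returns the text of the first element with that class, or "" if it is missing.
Private Function PairValue(ByVal doc As Object, ByVal pairId As String, ByVal field As String) As String
    Dim els As Object
    Set els = doc.getElementsByClassName(PairClass(pairId, field))
    If els.Length > 0 Then PairValue = els(0).innerText
End Function

Sub getprice()
    Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")
    Dim IE As Object, doc As Object
    Dim lastrow As Long, i As Long, f As Long
    Dim pairId As String
    Dim fields As Variant

    ' Columns C..I: last, high, low, change, change %, volume, time
    fields = Array("last", "high", "low", "pc", "pcp", "turnover", "time")
    lastrow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    ' Open the portfolio once, not once per row
    Set IE = CreateObject("InternetExplorer.Application")
    IE.navigate "https://www.investing.com/portfolio/?portfolioID=NTUwZjJiZjkzbT46NW8%3D"
    Do While IE.Busy Or IE.readyState <> 4: DoEvents: Loop
    Application.Wait Now + TimeSerial(0, 0, 1)   ' replaces Sleep, which needs an API Declare
    Set doc = IE.document

    For i = 2 To lastrow
        pairId = CStr(ws.Cells(i, 1).Value)
        For f = 0 To UBound(fields)
            ws.Cells(i, 3 + f).Value = PairValue(doc, pairId, fields(f))
        Next f
    Next i

    IE.Quit
End Sub
```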
I'm not sure what you mean in the comments about the whole table as in the linked question, but getting the whole table shown in your image should be possible. You only show the HTML from the tbody level (it would have been better from the table tag level). Nevertheless, I reconstruct a table from that HTML: I match the tbody on the start of its id, take its outerHTML, wrap it in table tags, and pass that HTML to the clipboard so it can be pasted into the sheet.
Technically, I could just as easily have taken .rows(1).outerHTML of the tbody (assuming the "Japanese dog" coin is the second row; the rows collection is 0-indexed) and wrapped that in table tags instead, to get just the one row of interest.
NOTE: NOT TESTED
' ie is assumed to be an InternetExplorer instance already on the (logged-in) portfolio page
Dim s As String
s = "<table>" & ie.document.querySelector("[id^='tbody_overview']").outerHTML & "</table>"
' Only the Dogecoin row (0-indexed, so rows(1) is the second row):
' s = "<table>" & ie.document.querySelector("[id^='tbody_overview']").rows(1).outerHTML & "</table>"
Dim clipboard As Object
' Late-bound MSForms.DataObject, created from its CLSID
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
clipboard.SetText s
clipboard.PutInClipboard
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
My first comment is that logging in to investing.com may not even be needed.
investing.com provides a public page for each asset (stock or cryptocurrency) that you can analyze. For example, to get the Dogecoin info you could use the following URL:
https://www.investing.com/crypto/dogecoin/doge-usd
My second comment is that there are ways to load an HTML page into an Excel sheet without coding. I did something similar with coinmarketcap.com (see my blog post).
The same can be done for investing.com:
Create a local file called dogecoin-from-investing.com.iqy somewhere you will remember, with the following contents:
WEB
1
https://www.investing.com/crypto/dogecoin/doge-usd
Selection=AllTables
Formatting=None
PreFormattedTextToColumns=True
ConsecutiveDelimitersAsOne=True
SingleBlockTextImport=False
DisableDateRecognition=False
DisableRedirections=False
Open Excel, add a new helper sheet to your workbook and name it DogeCoin.
Navigate to Data → Get External Data → Run Web Query…, select the .iqy file you created, and accept all defaults.
Magic! Excel did all the work for you.
You should now have a populated helper sheet with the up-to-date data about dogecoin.
You can reference that data from any other sheet. The page layout does not change often, so the cell positions should stay stable until investing.com decides to change it.
To refresh the data, navigate to the Data menu and hit Refresh All.
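The same web query can also be created from VBA, which avoids keeping the .iqy file around. A sketch that mirrors the .iqy settings above, assuming a sheet named DogeCoin already exists and that the page still loads in Excel's legacy web query engine; AddDogecoinWebQuery is a hypothetical name.

```vba
Option Explicit

Public Sub AddDogecoinWebQuery()
    Dim ws As Worksheet, qt As QueryTable
    Set ws = ThisWorkbook.Worksheets("DogeCoin")

    ' Remove earlier queries on the sheet so re-running does not stack tables
    Do While ws.QueryTables.Count > 0
        ws.QueryTables(1).Delete
    Loop
    ws.Cells.Clear

    Set qt = ws.QueryTables.Add( _
        Connection:="URL;https://www.investing.com/crypto/dogecoin/doge-usd", _
        Destination:=ws.Range("A1"))

    ' Same options as the lines of the .iqy file
    With qt
        .WebSelectionType = xlAllTables            ' Selection=AllTables
        .WebFormatting = xlWebFormattingNone       ' Formatting=None
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False            ' wait until the data has arrived
    End With
End Sub
```

After this, Refresh All (or qt.Refresh) updates the sheet just like the .iqy-based query.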
I am trying to click a link on a website. The link's HTML looks like <a target="_blank" href="gamit_main.htm?gamitId=163734">'3311-10310</a>.
So I wrote this code to click the link:
Set HTMLDoc = ie.Document
HTMLDoc.getElementsByTagName("a").Click
But I get the error "Object doesn't support this property or method".
The full code:
Sub Something()
Dim ie As Object
Dim HTMLDoc As MSHTML.HTMLDocument
Dim ckt_No As String
ckt_No = Range("A2").Value
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ShowWindow ie.Hwnd, SW_MAXIMIZE
ie.Navigate "http://xyw.htm"
Do While ie.Busy = True Or ie.ReadyState <> 4: DoEvents: Loop
ie.Navigate "http:dgetcjdm_ckt_No&display_content=Y&noFormFields=Y&refresh=Y"
Do While ie.Busy = True Or ie.ReadyState <> 4: DoEvents: Loop
Set HTMLDoc = ie.Document
HTMLDoc.getElementById("resultRow").getElementByTagName("a")(0).Click
End Sub
The HTML is as follows:
<td align="center" height="15" title="1" class="tdborder">
<font class="rtabletext">1</font>
</td>
<td align="Left" title="GAMITARM Status Date" class="tdborder" nowrap="">
<font class="rtabletext">19-MAR-2020</font>
</td>
<td align="Left" title="Circuit Number" class="tdborder" nowrap="">
<font class="rtabletext"><a target="_blank" href="gamit_main.htm?gamitId=168592">'70F934C8</a></font>
</td>
<td align="Left" title="CUSTOMER" class="tdborder" nowrap="">
<font class="rtabletext">MICROSOFT CORPORATION</font>
</td>
<td align="Left" title="Customer Id" class="tdborder" nowrap="">
<font class="rtabletext">8684</font>
</td>
<td align="Left" title="AO AM" class="tdborder" nowrap="">
<font class="rtabletext"> </font>
</td>
<td align="Left" title="AO SD" class="tdborder" nowrap="">
<font class="rtabletext"> </font>
</td>
<td align="Left" title="AO ED/AVP" class="tdborder" nowrap="">
<font class="rtabletext"> </font>
</td>
<td align="Left" title="AO VP" class="tdborder" nowrap="">
<font class="rtabletext"> </font>
<td align="Left" title="LCON Phone Contact (SM Feed) " class="tdborder" nowrap="">
<font class="rtabletext"> </font>
</td>
<td align="Left" title="LCON Cell Contact (SM Feed) " class="tdborder" nowrap="">
<font class="rtabletext"> </font>
</td>
<td align="Left" title="Programme Office Status" class="tdborder" nowrap="">
<font class="rtabletext">New</font>
</td>
<td align="Left" title="Major Initiative" class="tdborder" nowrap="">
<font class="rtabletext">Ethernet</font>
</td>
<td align="Left" title="Project" class="tdborder" nowrap="">
<font class="rtabletext">APAC Singapore Ethernet Tail Rolls</font>
</td>
<td align="Left" title="Phase" class="tdborder" nowrap="">
<font class="rtabletext">x.2.2.a</font>
</td>
<td align="Left" title="LOA received" class="tdborder" nowrap="">
<font class="rtabletext">NO</font>
</td>
<td align="Left" title="Technical Connectivity details received" class="tdborder" nowrap="">
<font class="rtabletext">NO</font>
</td>
</tr>
I am trying to target this line:
<font class="rtabletext"><a target="_blank" href="gamit_main.htm?gamitId=168592">'70F934C8</a></font>
I would combine a class selector for the parent node with an attribute = value selector (using the contains operator *) to target the child a tag by its href value:
htmlDoc.querySelector(".rtabletext [href*='gamitId=']").click
There is no need to retrieve more than one node or to use a loop.
This does assume the first matched node is the desired one. You can always extend the href value between the '' to be more selective (or, more generally, extend the CSS selector passed to querySelector to target the exact node).
css selectors
The getElementsByTagName() method returns a collection of all elements in the document with the specified tag name, as an HTMLCollection object.
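Thus, you should indicate which element from the collection should be clicked. In this case, the first one is probably the safest choice:
Dim ie As Object
Dim HTMLDoc As Object
Set ie = CreateObject("InternetExplorer.Application")
' ... navigate to the page and wait until ie.ReadyState = 4 before reading the document ...
Set HTMLDoc = ie.Document
' getElementsByTagName returns a collection; (0) is its first element
HTMLDoc.getElementsByTagName("a")(0).Click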
You can press F12 in Chrome to inspect the website. If the element you want to click (or one of its parents) has an ID, it is a good idea to use it:
HTMLDoc.getElementById("SomeFancyID").getElementsByTagName("a")(0).Click
Edit:
Since the error occurs at Set HTMLDoc, try to reduce the code as much as possible and debug from there. Try this:
Sub Something()
Dim ie As Object
Dim HTMLDoc As MSHTML.HTMLDocument
Set ie = CreateObject("InternetExplorer.Application")
ie.Navigate "https://stackoverflow.com/questions"
Do While ie.Busy = True Or ie.ReadyState <> 4: DoEvents: Loop
Set HTMLDoc = ie.Document
Stop 'Press SHIFT + F9 and examine the window...
End Sub
Once the code stops on the Stop line, press SHIFT + F9 (Quick Watch) and examine the window. There you can see all the collections.
This code works for me:
Set ElementCol = IE.Document.frames("Content_iFrame").Document.all
For Each Link In ElementCol
    ' ... inspect or click Link here ...
Next Link
getElementsByTagName("a") returns a collection of a tags.
You have to loop through the collection and find the one you want (or, if you know the exact position of that tag in the DOM, access it directly from the collection).
For example,
For Each a In ie.document.getElementsByTagName("a")
    If a.ClassName = "whatever_it_is" Then
        a.Click
        Exit For
    End If
Next
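The same loop can match on the href instead of the class, which fits the link in the question (its href contains gamitId=). It also works in older document modes where querySelector is not available. A minimal sketch; FindLinkByHref is a hypothetical helper name.

```vba
Option Explicit

' Returns the first <a> element whose href contains hrefPart, or Nothing if none does.
Public Function FindLinkByHref(ByVal doc As Object, ByVal hrefPart As String) As Object
    Dim a As Object
    For Each a In doc.getElementsByTagName("a")
        If InStr(1, a.href, hrefPart, vbTextCompare) > 0 Then
            Set FindLinkByHref = a
            Exit Function
        End If
    Next a
End Function
```

Usage: Set lnk = FindLinkByHref(HTMLDoc, "gamitId=") followed by If Not lnk Is Nothing Then lnk.Click, so a missing link does not raise an error.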
I am trying to extract specific text using a CSS selector.
I tried
div[id="Section3"]:first-child
but this doesn't return anything. I can't depend on locating the element by the text because I need to extract that text as shown.
This is the relevant HTML
<div class="ad24123fa4-c17c-4dc5-9aa5-ea007a8db30e-5" style="top:8px;left:218px;width:124px;height:31px;text-align:center;">
<table width="113px" border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td>
<table width="100%" border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td align="center">
<span class="fcb900b29f-64d7-453d-babf-192e86f17d6f-7">نظامي</span>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
The full HTML is here.
This is what I tried:
On Error Resume Next
Set ele = .FindElementByXPath("//span[text()='منازل']")
If ele Is Nothing Then sStatus = "نظامي" Else sStatus = "منازل"
On Error GoTo 0
While inspecting the element I noticed a hint about using $0 in the console. Can this be useful?
As for the two possible texts, "نظامي" and "منازل": to use XPath with multiple possible search values, use the following syntax:
//*[text()='نظامي' or text()='منازل']
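The VBA editor stores string literals in the system's ANSI code page, so on a machine whose non-Unicode locale is not Arabic, Arabic literals typed into a module can come out garbled. A sketch that builds the two texts from their Unicode code points with ChrW and combines them with the or syntax above; ArabicText1, ArabicText2 and StatusXPath are hypothetical names.

```vba
Option Explicit

' "نظامي": noon, zah, alef, meem, yeh
Public Function ArabicText1() As String
    ArabicText1 = ChrW(&H646) & ChrW(&H638) & ChrW(&H627) & ChrW(&H645) & ChrW(&H64A)
End Function

' "منازل": meem, noon, alef, zain, lam
Public Function ArabicText2() As String
    ArabicText2 = ChrW(&H645) & ChrW(&H646) & ChrW(&H627) & ChrW(&H632) & ChrW(&H644)
End Function

' XPath that matches an element whose text is either value
Public Function StatusXPath() As String
    StatusXPath = "//*[text()='" & ArabicText1() & "' or text()='" & ArabicText2() & "']"
End Function
```

With SeleniumBasic, sStatus = driver.FindElementByXPath(StatusXPath()).Text then returns whichever of the two texts is on the page.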
CSS selectors (that work for me):
driver.findElementByCss("#ctl00_ContentPlaceHolder1_CrystalReportViewer1 div.ad071889d2-8e6f-4755-ad7d-c44ae0ea9fca-5 table span").text
which is an abbreviation of the full selector:
#ctl00_ContentPlaceHolder1_CrystalReportViewer1 > tbody > tr > td > div > div.crystalstyle > div.ad071889d2-8e6f-4755-ad7d-c44ae0ea9fca-5 > table > tbody > tr > td > table > tbody > tr > td > span
You can also index into the nodeList of tables:
Set matches = html.querySelectorAll("#ctl00_ContentPlaceHolder1_CrystalReportViewer1 div.crystalstyle table")
ActiveSheet.Cells(1, 1) = matches.item(80).innerText
Otherwise, reading the HTML in from a file, I can take the last match of a class selector. With Selenium you would switch to:
driver.FindElementsByCss(".fc180999a8-04b5-46bc-bf86-f601317d19c8-7").count
VBA:
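Option Explicit
' Early-bound: needs references to "Microsoft HTML Object Library" (HTMLDocument)
' and "Microsoft ActiveX Data Objects x.x Library" (ADODB.Stream)
Public Sub test()
    Dim html As HTMLDocument, matches As Object
    Dim fStream As ADODB.Stream
    Set html = New HTMLDocument
    Set fStream = New ADODB.Stream
    ' Read the saved page as UTF-8 so the Arabic text survives
    With fStream
        .Charset = "UTF-8"
        .Open
        .LoadFromFile "C:\Users\User\Desktop\Output6.html"
        html.body.innerHTML = .ReadText
        .Close
    End With
    ' All elements with the span's class; the last match is the one wanted
    Set matches = html.querySelectorAll(".fc180999a8-04b5-46bc-bf86-f601317d19c8-7")
    ActiveSheet.Cells(1, 1) = matches.item(matches.Length - 1).innerText
End Sub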
This is my original query. When I save my data records to an Excel file, the date format changes from dd/mm/yyyy to d/m/yyyy. What can I do to keep dd/mm/yyyy in Excel? I hope someone can help me, thanks!
strSQL2= "select DISTINCT to_char(PROD,'dd/mm/yyyy') as PROD_FORMATTED, to_char(PRAD,'dd/mm/yyyy') as PRAD_FORMATTED,PROD,PRAD, BRCH,DEPT,SANO,SUBM, to_char(SUBD,'dd/mm/yyyy') as SUBD, STAT, PSFG, TSAM, TLEV, CLEV, GROP, CTLV, CCLV, CRNM, EXFL FROM SANCTH " & _
" where " & _
sqlWhere2 & " ((cono,sano) in " & strFilterRole & " or crid='" & SQLEncode(StrCrid) & "')" & _
" order by SUBD"
Some of the Excel output code:
<td align="left" style="vertical-align:middle"><%=((objRS_Search("PROD_FORMATTED")))%></td>
<td align="left" style="vertical-align:middle"><%=((objRS_Search("PRAD_FORMATTED")))%></td>
<td align="left" style="vertical-align:middle"><%=((objRS_Search("BRCH")))%></td>
<td align="left" style="vertical-align:middle"><%=((objRS_Search("DEPT")))%></td>
<td align="left" style="vertical-align:middle"><%=((objRS_Search("SANO")))%></td>
I get "02/04/2014" in Excel. When I use the code below, it removes every double quote and the date becomes 2/4/2014. How can it stay 02/04/2014? I need anyone's help, thank you very much!
<td align="left" style="vertical-align:middle"><%=replace((("""" & objRS_Search("PROD_FORMATTED"))) & """",chr(34),"") %></td>
<td align="left" style="vertical-align:middle; mso-number-format:\#;"><%=(( objRS_Search("PROD_FORMATTED"))) %></td>
It works once mso-number-format:\#; is added to the cell's style, as above.
I have a worksheet that currently updates about 200 stocks using the Yahoo Finance API and MSXML. I would also like to get some other info from websites that don't have an API, for example "http://www.earningswhisper.com/stocks.asp?symbol=googl".
For example, in the excerpt from that page below there is a Release Date of 1/29/2015. There is also some text in between, currently "[not confirmed]", which at some point will change to "[confirmed]"; both pieces of text are of interest.
For lack of better web skills, I currently have a single sheet with a QueryTable that is refreshed synchronously from code. It works... eventually. I would rather work with the response in code, but I don't know how to do that. I don't need this particular info to be refreshed automatically.
Questions
Is there a preferred VBA way to work with the HTML response? Can you show a code snippet to illustrate?
Is it possible to convert the HTML to XML or JSON relatively easily? A code snippet?
If QueryTable is in fact the good enough solution, would it be faster to create a sheet for each stock and refresh async, using events?
I know there are oodles of info on the web, but most of it seems dated and confusing. I am using Excel 2013.
I am able to get at the data I want using html and grabbing Table(6) as shown in the code below. I guess I could parse the InnerText but I suspect there is an easier way to grab the elements I need from that table.
Sub TestHtml()
Dim Resp, sText, FirstCode As String
Dim oHttp, oFile, oTable As Object
Dim lines As Variant
Set oHttp = CreateObject("Microsoft.XMLHttp")
oHttp.Open "GET", "http://www.earningswhisper.com/stocks.asp?symbol=googl", False
oHttp.send ("")
Resp = oHttp.responseText
Set oFile = CreateObject("htmlfile")
oFile.Write Resp
Set oTable = oFile.getElementsByTagName("Table")(6)
sText = oTable.innertext
MsgBox sText
End Sub
Here is a line from that table, and the full table below.
<tr onMouseover="this.className='newsart_s2'" onMouseOut="this.className='newsart'">
<td width="67%" align=left valign=middle> Release Date: <font color='#505050'><small>[not confirmed]</small></font>
</td>
<td width="33%" align=right valign=middle>1/29/2015 </td>
</tr>
What is the best way to drill down and get at the elements in the table, using VBA code?
<TABLE cellpadding=1 cellspacing=0 border=0 id=QEsts width="100%" bgcolor="#505050"><tr><td><TABLE cellpadding=2 cellspacing=0 border=0 width="100%" bgcolor="#FFFFFF" height='148'><tr><td valign=top>
<table cellpadding=0 cellspacing=0 border=0 width="100%" class='newsart'>
<tr><td colspan="2" bgcolor="#FFFFFF"><table width="100%" bgcolor="#FFFFFF" cellpadding=1 cellspacing=0 border=0><tr><td style="background-image: url('images/headbar2.gif'); background-color: #000000; BORDER-RIGHT: #000000 thin solid; BORDER-TOP: #000000 thin solid; FONT-WEIGHT: bold; FONT-SIZE: 12px; MARGIN: 2px; BORDER-LEFT: #000000 thin solid; COLOR: #e1b64b; BORDER-BOTTOM: #000000 thin solid; FONT-FAMILY: Arial;"> 4th Quarter Ending December 2014</td></tr></table></td></tr>
<tr onMouseover="this.className='newsart_s2'" onMouseOut="this.className='newsart'"><td width="67%" align=left valign=middle> <b>Earnings Whisper</b> <small>®</small>: </td><td width="33%" align=right valign=middle><b>$7.24</b> </td></tr>
<tr onMouseover="this.className='newsart_s2'" onMouseOut="this.className='newsart'"><td width="67%" align=left valign=middle> Consensus Estimate:</td><td width="33%" align=right valign=middle>$7.16 </td></tr>
<tr onMouseover="this.className='newsart_s2'" onMouseOut="this.className='newsart'"><td width="67%" align=left valign=middle> Surprise Expectation <small><sup>1</sup></small>: </td><td width="33%" align=right valign=middle> </td></tr>
<tr onMouseover="this.className='newsart_s2'" onMouseOut="this.className='newsart'"><td width="67%" align=left valign=middle> Release Date: <font color='#505050'><small>[not confirmed]</small></font></td><td width="33%" align=right valign=middle>1/29/2015 </td></tr>
<tr><td width="100%" align=right colspan="2" valign=middle>After Close </td></tr>
<tr onMouseover="this.className='newsart_s2'" onMouseOut="this.className='newsart'"><td width="67%" align=left valign=middle> Expected Time <small><sup>2</sup></small>: </td><td width="33%" align=right valign=middle>N/A </td></tr>
<tr onMouseover="this.className='newsart_s2'" onMouseOut="this.className='newsart'"><td width="67%" align=left valign=top> Conference Call: </td><td width="33%" align=right valign=top>4:30 PM ET <small><br> </small></td></tr>
</table>
</td></tr></TABLE></td></tr></TABLE>
</td>
You can interact with the DOM, since you created that object successfully, but in your case the easiest way to get the necessary value from the table above is to parse the InnerText as a string. Try the code below:
aTmp0 = Split(sText, "Release Date:")
If Ubound(aTmp0) = 1 Then
aTmp1 = Split(aTmp0(1), "Expected Time")
MsgBox aTmp1(0)
Else
MsgBox "Release Date not found"
End If
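If you need the confirmation status and the date as separate values, the same Split idea can be pushed a little further. A sketch, assuming the innerText keeps the order "Release Date:", "[status]", date, as in the table shown in the question; ParseReleaseDate is a hypothetical name.

```vba
Option Explicit

' Fills status (e.g. "not confirmed") and releaseDate (e.g. "1/29/2015").
' Returns False if the label or the [...] part is not found.
Public Function ParseReleaseDate(ByVal sText As String, ByRef status As String, _
                                 ByRef releaseDate As String) As Boolean
    Dim parts As Variant, rest As String
    Dim openPos As Long, closePos As Long

    parts = Split(sText, "Release Date:")
    If UBound(parts) < 1 Then Exit Function

    ' Keep only the text between the two labels
    rest = Split(parts(1), "Expected Time")(0)
    openPos = InStr(rest, "[")
    closePos = InStr(rest, "]")
    If openPos = 0 Or closePos < openPos Then Exit Function
    status = Mid$(rest, openPos + 1, closePos - openPos - 1)

    ' The date is the first word after "]"; normalise line breaks, tabs and nbsp first
    rest = Mid$(rest, closePos + 1)
    rest = Replace(Replace(Replace(rest, vbCr, " "), vbLf, " "), vbTab, " ")
    rest = Trim$(Replace(rest, ChrW(160), " "))
    If Len(rest) = 0 Then Exit Function
    releaseDate = Split(rest, " ")(0)
    ParseReleaseDate = True
End Function
```

Usage: If ParseReleaseDate(sText, st, d) Then MsgBox st & " / " & d.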
I know this is really easy for some of you out there, but I have searched extensively on the internet and cannot find an answer. I need to get the company name inside tbody > tr > td > a (eBay-Tradera.com) and the value in the last td with class="bS aR" (970,80):
<tbody id="matrix1_group0">
<tr class="oR" onmouseover="onMouseOver(this, false)" onmouseout="onMouseOut(this, false)" onclick="onClick(this, false)">
<td class="bS"> </td>
<td>
<a href="aProgramInfoApplyRead.action?programId=175&affiliateId=2014848" title="http://www.tradera.com/" target="_blank">
eBay-Tradera.com
</a>
</td>
<td class="aR">
175</td>
<td class="bS aR">0</td><td class="bS aR">0</td><td class="bS aR">187</td>
<td class="aR">0,00%</td><td class="bS aR">124</td>
<td class="aR">0,00%</td>
<td class="bS aR">26</td>
<td class="aR">20,97%</td>
<td class="bS aR">32</td>
<td class="aR">60,80</td>
<td class="aR">25,81%</td>
<td class="bS aR">5 102,00</td>
<td class="bS aR">0,00</td>
<td class="aR">0,00</td>
<td class="bS aR">
970,80
</td>
</tr>
</tbody>
This is my code. To start with I only try to get the a tag, but I can't get that to work either:
Set TDelements = document.getElementById("matrix1_group0").document.getElementsbytagname("a").innerHTML
r = 0
C = 0
For Each TDelement In TDelements
Blad1.Range("A1").Offset(r, C).Value = TDelement.innerText
r = r + 1
Next
Thanks in advance. I know this might be too simple, but I hope other people have the same issue and will find this helpful as well. The "r = r + 1" is there because there are many more companies in this list; I just wanted to keep the example as simple as I could. Thanks again!
You will need to specify the element's location in the table. eBay seems to obfuscate the class names, so we cannot rely on them being consistent. I would not usually rely on table indexes staying consistent either, but I don't see a way around it here.
I am assuming that this is the HTML document you are searching
<tbody id="matrix1_group0">
<tr class="oR" onmouseover="onMouseOver(this, false)" onmouseout="onMouseOut(this, false)" onclick="onClick(this, false)">
<td class="bS"> </td>
<td>
<a href="aProgramInfoApplyRead.action?programId=175&affiliateId=2014848" title="http://www.tradera.com/" target="_blank">
eBay-Tradera.com <!-- <=== You want this? -->
</a>
</td>
<!-- ... -->
</tr>
<!-- ... -->
</tbody>
We can ignore the rest of the document because the tbody element has an ID. In short, we assume that
.getElementById("matrix1_group0").getElementsByTagName("TR")
will return a collection of html row objects sorted by their appearance.
Set matrix = document.getElementById("matrix1_group0")
' MSHTML collections are 0-indexed: TR(0) is the first row, TD(1) its second cell
Set firstRow = matrix.getElementsByTagName("TR")(0)
Set firstRowSecondCell = firstRow.getElementsByTagName("TD")(1)
traderaName = firstRowSecondCell.innerText
Of course you could inline all of this as
document.getElementById("matrix1_group0").getElementsByTagName("TR")(0).getElementsByTagName("TD")(1).innerText
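Because the question writes one row per company with r = r + 1, looping over every row of the tbody may be closer to the goal. The sketch below assumes, as above, that the company name is in the second cell and the wanted figure in the last cell of each row; CleanText, RowNameAndTotal and ListPrograms are hypothetical names.

```vba
Option Explicit

' Turns line breaks, tabs and non-breaking spaces from innerText into spaces, then trims.
Public Function CleanText(ByVal s As String) As String
    s = Replace(s, vbCr, " ")
    s = Replace(s, vbLf, " ")
    s = Replace(s, vbTab, " ")
    s = Replace(s, ChrW(160), " ")
    CleanText = Trim$(s)
End Function

' Returns Array(name, lastValue): text of the second cell and of the last cell of a row.
Public Function RowNameAndTotal(ByVal tr As Object) As Variant
    Dim tds As Object
    Set tds = tr.getElementsByTagName("td")
    RowNameAndTotal = Array(CleanText(tds(1).innerText), CleanText(tds(tds.Length - 1).innerText))
End Function

' Writes name and last value of every row to two columns, starting at target.
Public Sub ListPrograms(ByVal doc As Object, ByVal target As Range)
    Dim tr As Object, v As Variant, r As Long
    For Each tr In doc.getElementById("matrix1_group0").getElementsByTagName("tr")
        v = RowNameAndTotal(tr)
        target.Offset(r, 0).Value = v(0)
        target.Offset(r, 1).Value = v(1)
        r = r + 1
    Next tr
End Sub
```

Usage: ListPrograms ie.document, Blad1.Range("A1"). Depending on your locale, Excel may reinterpret a value like "970,80" when it is written to a cell, so convert it explicitly if you need a number.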
but that makes debugging harder. Also, if the page is ever served to you in a different layout, this won't work. eBay may be deliberately making its data hard to scrape.
With only the HTML you have shown you can use CSS selectors to obtain these:
a[href*='aProgramInfoApplyRead.action?programId']
This matches an a tag whose href attribute contains the string 'aProgramInfoApplyRead.action?programId'. It matches two elements, but the first is the one you want.
In VBA, you can use the .querySelector method of .document to retrieve the first match:
Debug.Print ie.document.querySelector("a[href*='aProgramInfoApplyRead.action?programId']").innerText