VBA Excel get text inside HTMLObject - excel

I know this is really easy for some of you out there. But I have been going deep on the internet and I can not find an answer. I need to get the company name that is inside the
tbody tr td a eBay-tradera.com
and
td class="bS aR" 970,80
/td /tr /tbody
<tbody id="matrix1_group0">
<tr class="oR" onmouseover="onMouseOver(this, false)" onmouseout="onMouseOut(this, false)" onclick="onClick(this, false)">
<td class="bS"> </td>
<td>
<a href="aProgramInfoApplyRead.action?programId=175&affiliateId=2014848" title="http://www.tradera.com/" target="_blank">
eBay-Tradera.com
</a>
</td>
<td class="aR">
175</td>
<td class="bS aR">0</td><td class="bS aR">0</td><td class="bS aR">187</td>
<td class="aR">0,00%</td><td class="bS aR">124</td>
<td class="aR">0,00%</td>
<td class="bS aR">26</td>
<td class="aR">20,97%</td>
<td class="bS aR">32</td>
<td class="aR">60,80</td>
<td class="aR">25,81%</td>
<td class="bS aR">5 102,00</td>
<td class="bS aR">0,00</td>
<td class="aR">0,00</td>
<td class="bS aR">
970,80
</td>
</tr>
</tbody>
This is my code, where I only try to get the a tag to start of with but I cant get that to work either
Set TDelements = document.getElementById("matrix1_group0").document.getElementsbytagname("a").innerHTML
r = 0
C = 0
For Each TDelement In TDelements
Blad1.Range("A1").Offset(r, C).Value = TDelement.innerText
r = r + 1
Next
Thanks on beforehand I know that this might be to simple. But I hope that other people might have the same issue and this will be helpful for them as well. The reason for the "r = r + 1" is because there are many more companies on this list. I just wanted to make it as easy as I could. Thanks again!

You will need to specify the element location in the table. Ebay seems to be obfuscating the class-names so we cannot rely on those being consistent. Nor would I usually rely on the elements by their table index being consistent but I don't see any way around this.
I am assuming that this is the HTML document you are searching
<tbody id="matrix1_group0">
<tr class="oR" onmouseover="onMouseOver(this, false)" onmouseout="onMouseOut(this, false)" onclick="onClick(this, false)">
<td class="bS"> </td>
<td>
<a href="aProgramInfoApplyRead.action?programId=175&affiliateId=2014848" title="http://www.tradera.com/" target="_blank">
eBay-Tradera.com <!-- <=== You want this? -->
</a>
</td>
<!-- ... -->
</tr>
<!-- ... -->
</tbody>
We can ignore the rest of the document as the table element has an ID. In short, we assume that
.getElementById("matrix1_group0").getElementsByTagName("TR")
will return a collection of html row objects sorted by their appearance.
Set matrix = document.getElementById("matrix1_group0")
Set firstRow = matrix.getElementsByTagName("TR")(1)
Set firstRowSecondCell = firstRow.getElementsByTagName("TD")(2)
traderaName = firstRowSecondCell.innerText
Of course you could inline this all as
document.getElementById("matrix1_group0").getElementsByTagName("TR")(1).getElementsByTagName("TD")(2).innerText
but that would make debugging harder. Also if the web-page is ever presented to you in a different format then this won't work. Ebay is deliberately making it hard for you to scrape data off of it for security.

With only the HTML you have shown you can use CSS selectors to obtain these:
a[href*='aProgramInfoApplyRead.action?programId']
Which says a tag with attribute href that contains the string 'aProgramInfoApplyRead.action?programId'. This matches two elements but the first is the one you want.
CSS Selector:
VBA:
You can use .querySelector method of .document to retrieve the first match
Debug.Print ie.document.querySelector("a[href*='aProgramInfoApplyRead.action?programId']").innerText

Related

Vim Replace On Each Line With Different String

I've got a document from a client which has a GIANT table in it which looks something like this:
<table id="someid">
<tr>
<td>Product</td>
<td class="product1">Product 1</td>
<td class="product2">Product 2</td>
<td class="product3">Product 3</td>
<td class="product4">Product 4</td>
<td class="product5">Product 5</td>
</tr>
<tr>
<td>Boiling Point</td>
<td>72</td>
<td>91</td>
<td>38</td>
<td>21</td>
<td>41</td>
</tr>
[ 45 more rows here]
</table>
Only there are actually 15 products, and instead of "product1" and "product2" they have the actual name of the products as their preexisting classes.
The client has asked me to add classes to each of the appropriate td elements so that they are matched up with their product like class="product1" added to each 2nd td for every row.
Everything is static... I'm wondering if there's a quick way to do this in vim? Is it possible to tell vim to add a string to a certain position on every 18th line? Or am I stuck manually adding all the classes?
Suppose that the relation between your class name and the line number can be described as an expression, you can use :[range]substitute and s/\= to replace lines with any expression. For example,
1,10s/<td/\=submatch(0) . ' class="product' . (line('.') % 5 - 1) . '"'
will change
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
to
<tr>
<td class="product1"></td>
<td class="product2"></td>
<td class="product3"></td>
</tr>
<tr>
<td class="product1"></td>
<td class="product2"></td>
<td class="product3"></td>
</tr>
You can adapt those arguments in the above command to suit your needs.
For complex replacement, you can define a helper function in a temporary vim file such as foo.vim
function! GetClassName()
let order = line('.') % 5
if order == 1
return 'a'
endif
if order == 2
return 'b'
endif
endfunction
then source it by :source %.
Next, you can switch to your file and use it as follows
1,10s/^\s\+<td/\=submatch(0) . ' class="' . GetClassName() . '"'

Web scraping in Investing.com portfolio with Excel vba

I am trying to scrape the cyrpto currency information in my portfolio (e.g. current worth, change% etc.). I tried to come up with a useful code in last 10 hours but couldn't do it. First I tried the very nice code in here: Web scraping in Investing.com with Excel vba
However, it is for getting defined table information and I am not familiar with webscraping too much, especially with XML method. So I couldn't make it work.
The page I am trying to scrape is only reachable via login; therefore, I will try to show the html via copying here and screenshots.
The page I am trying to scrape:
You may check the example html screenshot (1061477 is id of Dogecoin) and html code below:
<tbody id="tbody_overview_5563889" class="ui-sortable">
<tr id="sort_945629" rel="5563889_945629" data-pair-id="945629" data-pair-exchange-id="1014" data-is-open-by="exchange" data-is-pair-exchange-open="1">
<td class="left dragHandle"><span class="checkers"></span></td>
<td class="flag"><span title="" class="ceFlags bitcoin"> </span></td>
<td data-column-name="name" data-pair-id="945629" class="symbol plusIconTd left bold elp alert js-injected-user-alert-container">
<span class="aqPopupWrapper js-hover-me-wrapper"><a target="_blank" href="/crypto/bitcoin/btc-usd" title="BTC/USD - Bitcoin US Dollar" class="aqlink js-hover-me" hoverme="markets" data-pairid="945629">BTC/USD</a></span>
<span class="js-plus-icon alertBellGrayPlus genToolTip oneliner" data-tooltip="Create Alert" data-tooltip-alt="Alert is active"></span>
</td>
<td data-column-name="symbol" class="left bold "><a target="_blank" href="/crypto/bitcoin/btc-usd" title=""></a></td>
<td data-column-name="exchange" class="left displayNone" title="Bitfinex">Bitfinex</td>
<td data-column-name="last" class="pid-945629-last" id="5563889_last_945629">40,324.0</td>
<td data-column-name="bid" class="pid-945629-bid displayNone" id="5563889_bid_945629">40,322.0</td>
<td data-column-name="ask" class="pid-945629-ask displayNone" id="5563889_ask_945629">40,323.0</td>
<td data-column-name="extended_hours" class="js-extended-hours js-extended-last Font pidExt-945629-last displayNone">--</td>
<td data-column-name="extended_hours_percent" class="js-extended-hours js-extended-percent Font pidExt-945629-pcp displayNone">--</td>
<td data-column-name="open" class="">37,461.0</td>
<td data-column-name="prev" class="displayNone">37,461.0</td>
<td data-column-name="high" class="pid-945629-high " id="5563889_high_945629">40,380.0</td>
<td data-column-name="low" class="pid-945629-low " id="5563889_low_945629">37,233.0</td>
<td data-column-name="chg" class="bold pid-945629-pc greenFont" id="5563889_chg_945629">+2863.0</td>
<td data-column-name="chgpercent" class="bold pid-945629-pcp greenFont" id="5563889_p_chg_945629">+7.64%</td>
<td data-column-name="vol" class="pid-945629-turnover " data-value="8733">8.88K</td>
<td data-column-name="next_earning" class="left textNum displayNone" data-value="0">--</td>
<td data-column-name="time" class="pid-945629-time " id="5563889_time_945629" data-value="1612610025">06:13:45</td>
<td class="icon" id="5563889_isopen_945629"><span class="greenClockIcon middle isOpenExch-1014"></span></td>
<td class="icon"> </td>
</tr><tr id="sort_1061477" rel="5563889_1061477" data-pair-id="1061477" data-pair-exchange-id="1037" data-is-open-by="exchange" data-is-pair-exchange-open="1">
<td class="left dragHandle"><span class="checkers"></span></td>
<td class="flag"><span title="" class="ceFlags dogecoin"> </span></td>
<td data-column-name="name" data-pair-id="1061477" class="symbol plusIconTd left bold elp alert js-injected-user-alert-container">
<span class="aqPopupWrapper js-hover-me-wrapper"><a target="_blank" href="/indices/investing.com-doge-usd" title="Investing.com Dogecoin Index" class="aqlink js-hover-me" hoverme="markets" data-pairid="1061477">Dogecoin</a></span>
<span class="js-plus-icon alertBellGrayPlus genToolTip oneliner" data-tooltip="Create Alert" data-tooltip-alt="Alert is active"></span>
</td>
<td data-column-name="symbol" class="left bold "><a target="_blank" href="/indices/investing.com-doge-usd" title="DOGE/USD">DOGE/USD</a></td>
<td data-column-name="exchange" class="left displayNone" title="Investing.com">Investing.com</td>
<td data-column-name="last" class="pid-1061477-last" id="5563889_last_1061477">0.048506</td>
<td data-column-name="bid" class=" displayNone" id="5563889_bid_1061477">-</td>
<td data-column-name="ask" class=" displayNone" id="5563889_ask_1061477">-</td>
<td data-column-name="extended_hours" class="js-extended-hours js-extended-last Font pidExt-1061477-last displayNone">--</td>
<td data-column-name="extended_hours_percent" class="js-extended-hours js-extended-percent Font pidExt-1061477-pcp displayNone">--</td>
<td data-column-name="open" class="">0.043969</td>
<td data-column-name="prev" class="displayNone">0.043969</td>
<td data-column-name="high" class="pid-1061477-high " id="5563889_high_1061477">0.051038</td>
<td data-column-name="low" class="pid-1061477-low " id="5563889_low_1061477">0.044505</td>
<td data-column-name="chg" class="bold pid-1061477-pc greenFont" id="5563889_chg_1061477">+0.004537</td>
<td data-column-name="chgpercent" class="bold pid-1061477-pcp greenFont" id="5563889_p_chg_1061477">+10.32%</td>
<td data-column-name="vol" class="pid-1061477-turnover " data-value="21137638982">21.08B</td>
<td data-column-name="next_earning" class="left textNum displayNone" data-value="0">--</td>
<td data-column-name="time" class="pid-1061477-time " id="5563889_time_1061477" data-value="1612610031">06:13:51</td>
<td class="icon" id="5563889_isopen_1061477"><span class="greenClockIcon middle isOpenExch-1037"></span></td>
<td class="icon"> </td>
</tr><tr id="sort_1057392" rel="5563889_1057392" data-pair-id="1057392" data-pair-exchange-id="1037" data-is-open-by="exchange" data-is-pair-exchange-open="1">
<td class="left dragHandle"><span class="checkers"></span></td>
<td class="flag"><span title="" class="ceFlags ripple"> </span></td>
<td data-column-name="name" data-pair-id="1057392" class="symbol plusIconTd left bold elp alert js-injected-user-alert-container">
<span class="aqPopupWrapper js-hover-me-wrapper"><a target="_blank" href="/indices/investing.com-xrp-usd" title="Investing.com XRP Index" class="aqlink js-hover-me" hoverme="markets" data-pairid="1057392">XRP</a></span>
<span class="js-plus-icon alertBellGrayPlus genToolTip oneliner" data-tooltip="Create Alert" data-tooltip-alt="Alert is active"></span>
</td>
I highlighted the parts that I am trying to get.
Although it is too slow, I was able to scrape some of the data with below code (x=1061477). I am getting error on redfont ones since it becomes green when the currency is going up. I tried to use the ID, but couldn't get the data. Also it changes my computer's time somehow :)
Sub getprice()
Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")
Dim text As String
Dim lastrow As Long
Dim sht As Worksheet
Set sht = ActiveSheet
lastrow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
For i = 2 To lastrow
x = Cells(i, 1).Value
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://www.investing.com/portfolio/?portfolioID=NTUwZjJiZjkzbT46NW8%3D"
Do While IE.Busy And IE.readyState <> 4: DoEvents: Loop
Sleep 500
Dim last As String
'Name = .document.getElementsByClassName("aqPopupWrapper js-hover-me-wrapper")(0).outerText
last = IE.document.getElementsByClassName("pid-" & x & "-last")(0).outerText
high = IE.document.getElementsByClassName("pid-" & x & "-high")(0).outerText
low = IE.document.getElementsByClassName("pid-" & x & "-low")(0).outerText
'Change = IE.document.getElementById("5563889_chg_1057392")(0).innerHTML
Change = IE.document.getElementsByClassName("bold pid-" & x & "-pc redFont")(0).outerText
change2 = IE.document.getElementsByClassName("bold pid-" & x & "-pcp redFont")(0).outerText
volume = IE.document.getElementsByClassName("pid-" & x & "-turnover")(0).outerText
Time = IE.document.getElementsByClassName("pid-" & x & "-time")(0).outerText
IE.Quit
' ws.Cells(2, 1).Value = Name
ws.Cells(i, 3).Value = last
ws.Cells(i, 4).Value = high
ws.Cells(i, 5).Value = low
ws.Cells(i, 6).Value = Change
ws.Cells(i, 7).Value = change2
ws.Cells(i, 8).Value = volume
ws.Cells(i, 9).Value = Time
Next i
End Sub
Any idea on how to scrape this data? Especially with XML method.
Thanks in advance for your help
Not sure what you mean in comments about whole table like in shared link but the whole table as per your image should be possible. You only show HTML from the tbody level (better would have been from table tag level); however, I reconstruct a table from that HTML, by matching on start substring of the id of the tbody, pulling out the outerHTML, adding wrapping table tags, and passing that html to the clipboard to then paste to sheet.
Technically, I could easily have generated a table object and grabbed .Rows(2).outHTML (assuming the "japanese dog" coin is in row 2) and wrapped in table tags instead, just to get the one row of interest.
NOTE: NOT TESTED
Dim s As String
s = "<table>" & ie.document.querySelector("[id^='tbody_overview]").outerHTML & "</table>"
' s = "<table>" & ie.document.querySelector("[id^='tbody_overview]").rows(2).outerHTML & "</table>"
Dim clipboard As Object
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
clipboard.SetText s
clipboard.PutInClipboard
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
First comment is that I think that authentication to investing.com may not even be needed.
investing.com provides a public page for each asset (stock or crypto-coin) that you can analyze. For example, to get the dogecoin info you could use the following url:
https://www.investing.com/crypto/dogecoin/doge-usd
Second comment is that there are ways to transform an html page to an Excel sheet without coding. I did something similar using coinmarketcap.com. See blog post here.
The same can be done for investing.com:
Create a local file called dogecoin-from-investing.com.iqy in a location you will remember. Type into the file the following:
WEB
1
https://www.investing.com/crypto/dogecoin/doge-usd
Selection=AllTables
Formatting=None
PreFormattedTextToColumns=True
ConsecutiveDelimitersAsOne=True
SingleBlockTextImport=False
DisableDateRecognition=False
DisableRedirections=False
Open Excel and create a new helper sheet in your worksheet and call it DogeCoin.
Navigate to Data → Get External Data → Run Web Query… and accept all defaults.
Magic! Excel did all the work for you.
You should now have a populated helper sheet with the up-to-date data about dogecoin.
You can use that data in any other sheet as needed. The data model does not change too often, at least until investing.com decides to change it.
To refresh the data, navigate to the Data menu and hit Refresh All.

How to get all td[3] tags from the tr tags with selenium Xpath in python

I have a webpage HTML like this:
<table class="table_type1" id="sailing">
<tbody>
<tr>
<td class="multi_row"></td>
<td class="multi_row"></td>
<td class="multi_row">1</td>
<td class="multi_row"></td>
</tr>
<tr>
<td class="multi_row"></td>
<td class="multi_row"></td>
<td class="multi_row">1</td>
<td class="multi_row"></td>
</tr>
</tbody>
</table>
and tr tags are dynamic so i don't know how many of them exist, i need all td[3] of any tr tags in a list for some slicing stuff.it is much better iterate with built in tools if find_element(s)_by_xpath("") has iterating tools.
Try
cells = driver.find_elements_by_xpath("//table[#id='sailing']//tr/td[3]")
to get third cell of each row
Edit
For iterating just use a for loop:
print ([i.text for i in cells])
Try following code :
tdElements = driver.find_elements_by_xpath("//table[#id="sailing "]/tbody//td")
Edit : for 3rd element
tdElements = driver.find_elements_by_xpath("//table[#id="sailing "]/tbody/tr/td[3]")
To print the text e.g. 1 from each of the third <td> you can either use the get_attribute() method or text property and you can use either of the following solutions:
Using CssSelector and get_attribute():
print(driver.find_elements_by_css_selector("table.table_type1#sailing tr td:nth-child(3)").get_attribute("innerHTML"))
Using CssSelector and text property:
print(driver.find_elements_by_css_selector("table.table_type1#sailing tr td:nth-child(3)").text)
Using XPath and get_attribute():
print(driver.find_elements_by_xpath('//table[#class='table_type1' and #id="sailing"]//tr//following::td[3]').get_attribute("innerHTML"))
Using XPath and text property:
print(driver.find_elements_by_xpath('//table[#class='table_type1' and #id="sailing"]//tr//following::td[3]').text)
To get the 3 rd td of each row, you can try either with xpath
driver.find_elements_by_xpath('//table[#id="sailing"]/tbody//td[3]')
or you can try with css selector like
driver.find_elements_by_css_selector('table#sailing td:nth-child(3)')
As it is returning list you can iterate with for each,
elements=driver.find_elements_by_xpath('//table[#id="sailing"]/tbody//td[3]')
for element in elements:
print(element.text)

JavaFX hide text of column in tableview

I have a tableview and I want to show an image in the first column. My problem is I can't sort the column then. My idea is to set text in the column too and hide the text so it is only for the correct sorting set. Is there a way to do that? Or what other solutions are possible for my problem?
I think this is the perfect example what you wants to do.Still let me know if you have any issue.
Check here
I would have a look at TableColumn.setCellValueFactory() and TableColumn.setCellFactory(). The further is used to provide the actual cell value (used for sorting!), the latter is used to provide the rendering.
In other words: If you need the sort order, you must not change the content, but only the Cell rendering. The methods mentioned above let you do exactly this.
Hope that helps ...
You could do it with just CSS using text-indent. You would also need to set the image as a css background. You did not provide an code of your table, but below is some example:
HTML:
<table width="100%" border="1" cellspacing="1" cellpadding="1">
<tr>
<td class="hidetext image">Text 1</td>
<td>Some text to show</td>
</tr>
<tr>
<td class="hidetext image">Text 2</td>
<td>Some text to show</td>
</tr>
<tr>
<td class="hidetext image">Text 3</td>
<td>Some text to show</td>
</tr>
<tr>
<td class="hidetext image">Text 4</td>
<td>Some text to show</td>
</tr>
</table>
CSS:
.hidetext {text-indent:-9000px}
.image {background:url(http://www.madisoncopy.com/images/jpeg.jpg) no-repeat;}
See how in the left column the text does not show (but it is actually there just indented off the screen).
See this fiddle: http://jsfiddle.net/D297P/

Formatting HTML Tables for Excel

I have a page in asp that makes a xls table, however when open the table all the rows are stuffed into the default column width which I would like to set.
My table looks something like this:
<table>
<thead>
'A for loop makes a series of th
</thead>
'another loop pulls db values
<tr><td>value1</td><td>value2</td> 'etc </tr>
</table>
I have tried the following to set the space
width="3.29in"
&nsp; spam (barbaric but sometimes effective)
width="400px"
style="width:300px"
none of the above seem to work.
Additionally here is my header asp incase its relevant
Response.Clear()
Response.Buffer = False
Response.ContentType = "application/vnd.ms-excel"
Response.AddHeader "Content-Disposition", "attachment; filename=blah.xls"
Also on a side note for some reason when I have a dollar value printed as such
<td>$<%=dbvalue%></td>
for some reason this yields '$dollar value and I am not sure how to nuke the single quote.
Do you need the thead tag? Something like this should work:
<html>
<body>
<h1>Report Title</h1>
<table >
<tr>
<th style="width : 300px">header1</th>
<th style="width : 100px">header2</th>
<th style="width : 200px">header..</th>
<th style="width : 300px">....</th>
</tr>
<tr class="row1">
<td >value1</td>
<td >value2</td>
<td >value..</td>
<td >....</td>
</tr>
....
Optionally you can put the table row with th tags inside a <thead> tag

Resources