Today I came across an interesting question on Stack Overflow about the Java Selenium binding. I tried to find a solution with Watir, but I couldn't succeed.
I am trying to read the mobile number from a page. Here is the relevant HTML:
<span class="telnowpr">
<a class="tel mtel">
<span class="mobilesv icon-ba"/>
<span class="mobilesv icon-ts"/>
<span class="mobilesv icon-oqp"/>
<span class="mobilesv icon-wx"/>
<span class="mobilesv icon-nlm"/>
<span class="mobilesv icon-ts"/>
<span class="mobilesv icon-ji"/>
<span class="mobilesv icon-ji"/>
<span class="mobilesv icon-ji"/>
<span class="mobilesv icon-fde"/>
<span class="mobilesv icon-fde"/>
<span class="mobilesv icon-nlm"/>
<span class="mobilesv icon-lk"/>
</a>
,
<a class="tel mtel">
<span class="mobilesv icon-ba"/>
<span class="mobilesv icon-ts"/>
<span class="mobilesv icon-oqp"/>
<span class="mobilesv icon-wx"/>
<span class="mobilesv icon-nlm"/>
<span class="mobilesv icon-ts"/>
<span class="mobilesv icon-ji"/>
<span class="mobilesv icon-ji"/>
<span class="mobilesv icon-ji"/>
<span class="mobilesv icon-fde"/>
<span class="mobilesv icon-fde"/>
<span class="mobilesv icon-nlm"/>
<span class="mobilesv icon-ikj"/>
</a>
</span>
Each span holds a single digit, which I can see when I hover over it with the Firebug inspector arrow. There is no digit inside the span in the HTML, yet it appears on the page. I tried to extract it through value and also through text, but with no success so far. I haven't seen a page like this before.
Code
b = Watir::Browser.new #driver
b.goto 'https://www.justdial.com/Ahmedabad/Knife-Fork-Restaurant-Shah-E-Alam-Tollnaka-Opposite-Swaminarayan-College-Shah-Alam/079PXX79-XX79-170524174654-D3J2_BZDET?xid=QWhtZWRhYmFkIEFmZ2hhbmkgUmVzdGF1cmFudHM='
p b.span(class: 'telnowpr').text
This prints only a comma, because a comma appears between the two <a> elements, but I don't know how to extract all the numbers. Can anyone help me?
They appear to be using icons to display the numbers instead of just numbers. I presume it is to prevent bots and automated software from collecting all of the numbers for spam.
Each number appears to have a unique css class, so why don't you create a function that will check the text in the class and return the number it corresponds to?
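A minimal sketch of that lookup idea in Python, assuming you have already collected the class attribute of each span. The function name decode_icons and the table EXAMPLE_MAP are hypothetical, and the values in the table are made up: the thread does not say which character each icon class draws, so the real pairs would have to be read off the page by hand.

```python
def decode_icons(class_attrs, icon_to_char):
    """Translate class attributes such as 'mobilesv icon-ji' into a string."""
    chars = []
    for class_attr in class_attrs:
        # Pick the class that names the icon, e.g. 'icon-ji'.
        icon = next(c for c in class_attr.split() if c.startswith("icon-"))
        chars.append(icon_to_char[icon])
    return "".join(chars)


# Made-up values for illustration only (assumption, not from the page).
EXAMPLE_MAP = {"icon-ba": "0", "icon-ts": "1", "icon-ji": "2"}

spans = ["mobilesv icon-ba", "mobilesv icon-ts", "mobilesv icon-ji", "mobilesv icon-ji"]
print(decode_icons(spans, EXAMPLE_MAP))
```

A class that is missing from the table raises KeyError, which is usually what you want here: it tells you the table is incomplete instead of silently dropping a digit.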
Try something like below:
List<WebElement> allSpans = driver.findElements(By.cssSelector("ul#comp-contact span.telnowpr >a > span.icon-acb"));
System.out.println(allSpans.size());
//for(WebElement item : allSpans) {
String script = "return window.getComputedStyle(document.querySelector('ul#comp-contact span.telnowpr >a > span.icon-acb'),':before').getPropertyValue('content')";
JavascriptExecutor js = (JavascriptExecutor) driver;
String content = (String) js.executeScript(script);
System.out.println("Value : " + content);
//}
This is for a single span element. You'll need to do it for all <span> elements under <ul id="comp-contact">.
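The same idea applied to every span could look roughly like this with Selenium's Python binding. This is a sketch under assumptions: pseudo_contents is a hypothetical helper name, driver is any object with Selenium's execute_script(script, *args) method, and the quote stripping assumes the browser returns the content value wrapped in quotes, which is common but worth checking on the actual page.

```python
# JavaScript run once per element; arguments[0] is the element passed from Python.
SCRIPT = (
    "return window.getComputedStyle(arguments[0], '::before')"
    ".getPropertyValue('content');"
)


def pseudo_contents(driver, elements):
    """Return the ::before content of each element, without surrounding quotes."""
    values = []
    for element in elements:
        raw = driver.execute_script(SCRIPT, element)
        values.append(str(raw).strip("\"'"))
    return values
```

With a real browser session, the elements would come from something like driver.find_elements(By.CSS_SELECTOR, "span.telnowpr > a > span.mobilesv"), using the class names from the HTML in the question.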
I am new to scraping so please be patient with me. I have this HTML code and I want to extract the type of property e.g. ‘Apartment’, the no. of beds e.g. 2 and the location e.g. ‘Birmingham’ only. I want to save each of these in a list. The problem is that there’s no unique class identifier.
<div class="extra">
<span class="tablet-visible">
<span class="item"><label><i class="ouricon classified"></i><b></b></label>
<span>For Sale</span></span>
</span>
<span class="tablet-visible">
<span class="item"><label><i class="ouricon house"></i><b></b></label>
<span>Apartment</span></span>
</span>
<span class="">
<span class="item"><label><i class="ouricon bed"></i><b></b></label>
<span>2</span>
</span>
</span>
<span class="">
<span class="item"><label><i class="ouricon locationpin"></i><b></b></label>
<span>Birmingham</span>
</span>
</span>
</div>
I tried this code but of course this prints all the text in class=extra including the 'For Sale' which is not what I want.
import requests
from bs4 import BeautifulSoup

results = requests.get(url)
soup = BeautifulSoup(results.text, "html.parser")
desc_div = soup.find_all('div', attrs={"data-itemid": True})
for property in desc_div:
    extra = property.find('div', class_='extra')
    print(extra.text.strip())
Any help would be much appreciated.
Since For Sale is in the same tag and class, just filter it out.
from bs4 import BeautifulSoup
html = """
<div class="extra">
<span class="tablet-visible">
<span class="item"><label><i class="ouricon classified"></i><b></b></label>
<span>For Sale</span></span>
</span>
<span class="tablet-visible">
<span class="item"><label><i class="ouricon house"></i><b></b></label>
<span>Apartment</span></span>
</span>
<span class="">
<span class="item"><label><i class="ouricon bed"></i><b></b></label>
<span>2</span>
</span>
</span>
<span class="">
<span class="item"><label><i class="ouricon locationpin"></i><b></b></label>
<span>Birmingham</span>
</span>
</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser").find_all("span", {"class": "item"})
print([i.text.strip() for i in soup if i.text.strip() != "For Sale"])
Output:
['Apartment', '2', 'Birmingham']
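If other listings use a label other than "For Sale", filtering on the text may not be enough. One alternative is to key on the icon class that precedes each value. The sketch below uses only the standard library's html.parser so it runs without extra installs. The class ExtraParser, the table ICON_TO_FIELD and its field names are illustrative assumptions, and the HTML is a trimmed copy of the snippet from the question.

```python
from html.parser import HTMLParser

# Icon classes taken from the question's HTML; the field names are illustrative.
ICON_TO_FIELD = {"house": "type", "bed": "beds", "locationpin": "location"}


class ExtraParser(HTMLParser):
    """Record the first text that follows each recognised <i class="ouricon ..."> icon."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self.pending = None  # field announced by the most recent icon, if any

    def handle_starttag(self, tag, attrs):
        if tag == "i":
            classes = (dict(attrs).get("class") or "").split()
            # Unknown icons (e.g. "classified" for "For Sale") reset pending to None.
            self.pending = next(
                (ICON_TO_FIELD[c] for c in classes if c in ICON_TO_FIELD), None
            )

    def handle_data(self, data):
        text = data.strip()
        if text and self.pending:
            self.fields[self.pending] = text
            self.pending = None


html = """
<div class="extra">
<span class="item"><label><i class="ouricon classified"></i><b></b></label>
<span>For Sale</span></span>
<span class="item"><label><i class="ouricon house"></i><b></b></label>
<span>Apartment</span></span>
<span class="item"><label><i class="ouricon bed"></i><b></b></label>
<span>2</span></span>
<span class="item"><label><i class="ouricon locationpin"></i><b></b></label>
<span>Birmingham</span></span>
</div>
"""

parser = ExtraParser()
parser.feed(html)
parser.close()
print(parser.fields)
```

To build one list per field across many listings, you could run a fresh parser on each listing's HTML and append parser.fields.get("type") and the other fields to their own lists.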
I have a problem with Selenium webdriver.
I want to log in to a website and collect some articles from there. I can log in with my code, but it can't find all the elements.
This code finds all elements except the ninth element of 'major' and 'notice'. (I don't know why, but it does find the ninth element of 'upload'.)
from selenium import webdriver
from selenium.webdriver.common.by import By
import MyCode
Data = []
....
driver = webdriver.Chrome()
driver.get(MyCode.url[0])
driver.implicitly_wait(10)
major = driver.find_elements(By.CLASS_NAME, 'board-lecture-title')
notice = driver.find_elements(By.XPATH, "//*[@class='post-title']/*")
upload = driver.find_elements(By.CLASS_NAME, 'post-date')
....
driver.close()
This is part of the website's HTML:
<li class="isnotice" style="width:calc(100% - 20px) !important;">
<span class="post-title">
<a href="https://url">
<span class="board-lecture-title">[Course]</span>Title
</a>
</span>
<br>
<span class="post-date">2020년 7월 30일, 목요일, 오전 10:58</span>
<span class="post-viewinfo area-right">
0
<br>
<span>View</span>
</span>
</li>
<li class="isnotice" style="width:calc(100% - 20px) !important;">
<span class="post-title">
<a href="https://url">
<span class="board-lecture-title">[Course]</span>Title
</a>
</span>
<br>
<span class="post-date">2020년 7월 15일, 수요일, 오후 12:20</span>
<span class="post-viewinfo area-right">
0
<br>
<span>View</span>
</span>
</li>
<li class="isnotice" style="width:calc(100% - 20px) !important;">
<span class="post-title">
<a href="https://url">
<span class="board-lecture-title">[Course]</span>Title
</a>
</span>
<br>
<span class="post-date">2020년 6월 29일, 월요일, 오전 11:18</span>
<span class="post-viewinfo area-right">
47
<br>
<span>View</span>
</span>
....
</ul>
I'm sorry if my poor English causes any confusion.
I'm wondering how to target the "Switch" text in the HTML below:
<div class="product_title">
<a href="/game/pc/into-the-breach" class="hover_none">
<h1>Into the Breach</h1>
</a>
<span class="platform">
<a href="/game/pc">
PC
</a>
</span>
</div>
<div class="product_data">
<ul class="summary_details">
<li class="summary_detail publisher" >
<span class="label">Publisher:</span>
<span class="data">
<a href="/company/subset-games" >
Subset Games
</a>
</span>
</li>
<li class="summary_detail release_data">
<span class="label">Release Date:</span>
<span class="data" >Feb 27, 2018</span>
</li>
<li class="summary_detail product_platforms">
<span class="label">Also On:</span>
<span class="data">
Switch </span>
</li>
</ul>
</div>
So far I am also capturing the "Also On:" text (along with a lot of whitespace) with this code:
self.playable_on_systems_label.setText(self.html_soup.find("span", class_='platform').text.strip() + ', ' + self.html_soup.find("li", class_='summary_detail product_platforms').text.strip())
How do I capture (in this case) only the "Switch" text?
FYI: the first half of the statement (capturing the "PC" text) works fine; only the "Also On" part does not.
Thanks in advance,
Your query is getting the entire li element with class="summary_detail product_platforms", which includes all the text from "Also On:" through "Switch". Try something like .find('a', href=re.compile("^.+switch.+$")) or, alternatively (using CSS), .select("a[href*=switch]").
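To see what the suggested pattern would match, here is a small check using only Python's re module against href values that appear in this thread's HTML. The list hrefs is assembled by hand for illustration, and the check assumes the "Switch" text is wrapped in a link to /game/switch/into-the-breach.

```python
import re

# The pattern from the suggestion: at least one character before and after "switch".
pattern = re.compile(r"^.+switch.+$")

# href values copied by hand from the HTML in this thread.
hrefs = [
    "/game/pc/into-the-breach",
    "/game/pc",
    "/company/subset-games",
    "/game/switch/into-the-breach",
]

matches = [href for href in hrefs if pattern.search(href)]
print(matches)
```

Only the Switch link survives the filter, which is why passing this pattern as the href argument narrows the search to that one anchor.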
You can use BeautifulSoup's select() function to navigate to the "Switch" text. Check this code:
from bs4 import BeautifulSoup
html = '''<div class="product_title">
<a class="hover_none" href="/game/pc/into-the-breach">
<h1>Into the Breach</h1>
</a>
<span class="platform">
<a href="/game/pc">
PC
</a>
</span>
</div>
<div class="product_data">
<ul class="summary_details">
<li class="summary_detail publisher">
<span class="label">Publisher:</span>
<span class="data">
<a href="/company/subset-games">
Subset Games
</a>
</span>
</li>
<li class="summary_detail release_data">
<span class="label">Release Date:</span>
<span class="data">Feb 27, 2018</span>
</li>
<li class="summary_detail product_platforms">
<span class="label">Also On:</span>
<span class="data">
<a class="hover_none" href="/game/switch/into-the-breach">Switch</a> </span>
</li>
</ul>
</div>'''
soup = BeautifulSoup(html, 'html.parser')
text = soup.select('.summary_detail.product_platforms .hover_none')[0].text.strip()
print(text)
Output:
Switch
There is a standard Pager component which renders below html structure:
<div class="xspPagerContainer">
<div class="xspPagerRight" id="view:_id1:pager1">
<span class="xspPagerNav xspFirst" id="view:_id1:pager1__First">
First
</span>
<span class="xspPagerNav xspPrevious" id="view:_id1:pager1__Previous">
Previous
</span>
<span class="xspPagerNav xspGroup" id="view:_id1:pager1__Group">
<span>
<span class="xspFirstItem xspCurrentItem">
1
</span>
<span>
<a>2</a>
</span>
<span class="xspLastItem">
<a>3</a>
</span>
</span>
</span>
</div>
</div>
I would like to add some extra HTML tags to the above structure, like this:
<div class="xspPagerContainer">
<div class="xspPagerRight" id="">
<span id="NewSpan"></span>
<span class="xspPagerNav xspFirst" id="someId">
First
</span>
<span class="xspPagerNav xspPrevious" id="view:_id1:pager1__Previous">
Previous
</span>
I tried to use a renderer to do it, but I was not able to add anything inside the xspPagerContainer div. All I achieved was adding something before or after this div.
Is something like this possible? How can I add an HTML tag to an already existing component without JavaScript (jQuery etc.)? Should I create my own component and my own renderer?
This is the IBM variation of JSF used in XPages.
I'm using Watir to try to interface to the Quibids "Bid Now" (not Buy Now) button but can't seem to get the right combination of the following command to click the button:
browser.button(:value => 'Bid Now').click
I'm able to fill in a text field on the page so all my object set up is correct. It's just this command that I can't get to work. Every attempt gives me the error that the element cannot be found. I've also tried :id but nothing works and after working on it for 2 hours, thought I'd ask.
The following is the html out of IE around that button and any help would be appreciated. Thanks.
<p class="large-price">
<span style="background-image: none;" class="price">$5.68</span>
<span class="medium light-grey">USD</span>
</p>
<p class="time large-timer2 red">00:00:06</p>
<h2 class="margin-five username-height">
<span><img style="display: inline;" class="user-icon winning_avatar" src="https://s1.quibidscdn.com/n1/avatards/12.png" width="64" height="64"></span>
<br>
<span style="height: 30px;" class="winning">TOLLCOLLECTOR</span>
</h2>
<div id="298085604">
<p>
<a class="buttons bid large orange" href="#">Bid Now</a>
</p>
</div>
<script>
$(document).ready(function(){
AuctionDetail.updateSavings({"r":0,"v":0,"value":0});
});
</script>
<ul class="price-breakdown">
<li>Value Price:
<span id="product_valueprice" class="float-right">$649.99</span>
</li>
<li>Bids Credit:
<span class="float-right">- <span id="breakdown_bidsvalue">$0.00</span></span>
</li>
<li class="bid_breakdown last">
<span id="breakdown_realbids">0</span> Real / <span id="breakdown_voucherbids">0</span> Voucher </li>
<li>Buy Now Price<span class="float-right breakdown_buynowtotal">$649.99</span>
</li>
</ul>
<p>
<a id="buynowbtn" class="buynowbtn buttons large blue" href="#">
Buy Now
<span class="clear"></span>
<span class="buynow-price breakdown_buynowbtn">$649.99</span>
</a>
</p>
The HTML does not list it as a button. It lists it as a link (<a>). I've never used Watir, but given that the command is browser.button, it would seem that you need something like browser.link instead.
The process is quicker if you use the classes available to you rather than a text scan. I'd recommend using
browser.a(:class => "buttons bid large orange").click
instead. Both will work, but it is generally better practice to use the HTML attributes when they are available to you.
My Friend...
My two possibilities.
browser.div(:id, "298085604").link(:text, "Bid Now").click
or
browser.div(:text, "Bid Now").click
Remember that the HTML you see does not show everything the application does, only the markup that the browser receives. Server-side frameworks (for example, .NET) generate that markup for the end user, so you may not see the button's full definition.
Good Luck!