How to show stars in snippets? - structured-data

I have added itemprop to my code for the rating value, but the stars do not show in my snippet. What should I do? Does it take time?
<div itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating" title="<%= "Rate" + " : " + hidContentRate.Value %>">
<span itemprop="ratingValue"><%= hidContentRate.Value %></span>
</div>

Related

How to iterate through a list of Beautiful Soup tag elements and get a particular text if found, else an empty string?

Case1:
<li style="padding:5px;border-bottom:1px solid #ccc">
<div itemscope="" itemtype="http://schema.org/LocalBusiness">
<h5 itemprop="name">
Derattizzazione Disinfestazione Punteruolo Rosso - Quark Srl
</h5>
<div itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">
Via S. Pellico, 198/L
</span>
<br/>
<span itemprop="postalCode">
63039,
<span itemprop="addressLocality">
San Benedetto del Tronto
</span>
(AP)
</span>
<br/>
</div>
<span itemprop="telephone">
tel: 800 99 83 01
</span>
<br/>
<span>
sito:quarksrl.it
</span>
<br/>
<span>
parole chiave:
<strong>
derattizzazione,consulenza ambientale,disinfestazione ratti,allontanamento piccioni,punteruolo rosso
</strong>
</span>
</div>
</li>
Case2:
<li style="padding:5px;border-bottom:1px solid #ccc">
<div itemscope="" itemtype="http://schema.org/LocalBusiness">
<h5 itemprop="name">
V&b Home Comfort
</h5>
<div itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">
via delle Torri, 5
</span>
<br/>
<span itemprop="postalCode">
63100,
<span itemprop="addressLocality">
Ascoli Piceno
</span>
(AP)
</span>
<br/>
</div>
<span>
sito:vebhomecomfort.it
</span>
<br/>
</div>
</li>
In case 1 the text 'parole chiave:' is present, so I want to fetch the data that follows it. In case 2 the element is not present, so I want None or 'Empty Text' there.
Is there also a way to do the same in Scrapy?
I really appreciate you taking the time, thanks!
If txt is the string containing case 1 and case 2, then you can use this script to extract the elements:
from bs4 import BeautifulSoup
soup = BeautifulSoup(txt, 'html.parser')
for li in soup.select('li'):
    name = li.select_one('h5').get_text(strip=True, separator=' ')
    address = li.select_one('[itemprop="streetAddress"]').get_text(strip=True, separator=' ')
    postal_code = li.select_one('[itemprop="postalCode"]').get_text(strip=True, separator=' ')
    address_locality = li.select_one('[itemprop="addressLocality"]').get_text(strip=True, separator=' ')
    telephone = li.select_one('[itemprop="telephone"]')
    telephone = telephone.get_text(strip=True, separator=' ') if telephone else '-'
    web = li.find(lambda t: t.name=='span' and t.get_text(strip=True).startswith('sito:'))
    web = web.get_text(strip=True, separator=' ').replace('sito:', '') if web else '-'
    keywords = li.find(lambda t: t.name=='span' and t.get_text(strip=True).startswith('parole chiave:'))
    keywords = keywords.get_text(strip=True, separator=' ').replace('parole chiave:', '').split(',') if keywords else []
    print(name)
    print(address)
    print(postal_code)
    print(address_locality)
    print(telephone)
    print(web)
    print(keywords)
    print('-' * 80)
Prints:
Derattizzazione Disinfestazione Punteruolo Rosso - Quark Srl
Via S. Pellico, 198/L
63039, San Benedetto del Tronto (AP)
San Benedetto del Tronto
tel: 800 99 83 01
quarksrl.it
[' derattizzazione', 'consulenza ambientale', 'disinfestazione ratti', 'allontanamento piccioni', 'punteruolo rosso']
--------------------------------------------------------------------------------
V&b Home Comfort
via delle Torri, 5
63100, Ascoli Piceno (AP)
Ascoli Piceno
-
vebhomecomfort.it
[]
--------------------------------------------------------------------------------
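The "value if present, else a default" pattern used above for telephone, web and keywords can be factored into a small helper. The following is only a sketch: text_after_label is a hypothetical name (not part of Beautiful Soup), and the two-item HTML is a trimmed stand-in for case 1 and case 2.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for case 1 and case 2 (assumed sample, not the full page).
SAMPLE = """
<li><h5 itemprop="name">Quark Srl</h5>
<span>parole chiave: <strong>derattizzazione,punteruolo rosso</strong></span></li>
<li><h5 itemprop="name">V&amp;b Home Comfort</h5>
<span>sito:vebhomecomfort.it</span></li>
"""


def text_after_label(parent, label, default=None):
    """Return the text following `label` in the first matching <span>, else `default`."""
    span = parent.find(
        lambda t: t.name == "span" and t.get_text(strip=True).startswith(label)
    )
    if span is None:
        return default
    text = span.get_text(" ", strip=True)
    # Drop the label itself and any whitespace around the remaining text.
    return text[len(label):].strip()


soup = BeautifulSoup(SAMPLE, "html.parser")
results = [
    (li.h5.get_text(strip=True), text_after_label(li, "parole chiave:", "Empty Text"))
    for li in soup.select("li")
]
for name, keywords in results:
    print(name, "->", keywords)
```

Passing the default as an argument lets the caller choose between None, '-' or 'Empty Text' per field.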

Python web scraping style content

I just want to pull data from HTML using Python. (The value I need is 20%.)
Any help on this would be greatly appreciated.
<div class="ratings-container">
<div class="ratings">
<div class="ratings active" style="width: 20%"></div>
</div>
</div>
I don't know how to get the content of the style attribute. Code along the following lines returns NULL:
mratingNew = (tag.findAll('div',attrs={"class":"ratings active"}))
for i in range(len(muserName)):
    print(mratingNew[i].['style'])
You can get the width by using find and then splitting the style value on the colon:
from bs4 import BeautifulSoup
html = '''<div class="ratings-container">
<div class="ratings">
<div class="ratings active" style="width: 20%"></div>
</div>
</div>'''
soup = BeautifulSoup(html,"html.parser")
finddiv = soup.find('div',attrs={'class':'ratings active'})
style = finddiv['style']
# Keep the part after the first colon and drop the surrounding whitespace
style = style.split(':',1)[-1].strip()
print(style)
OUTPUT :
20%
If more than one element with the same class name carries a width, like:
html = '''<div class="ratings-container">
<div class="ratings">
<div class="ratings active" style="width: 20%"></div>
<div class="ratings active" style="width: 40%"></div>
<div class="ratings active" style="width: 30%"></div>
</div>
</div>'''
You need to use findAll and split the values one by one:
soup = BeautifulSoup(html,"html.parser")
find_last_div = soup.findAll('div',attrs={'class':'ratings active'})
for width_value in find_last_div:
    width_Get = width_value['style'].split(':',1)[-1].strip()
    print(width_Get)
OUTPUT :
20%
40%
30%
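Splitting on the first colon works as long as the style attribute holds a single declaration. If it may hold several (for example a width and a height), one option is to parse the string into a dictionary first. This is a naive sketch using only the standard library: parse_inline_style is a hypothetical helper, the input strings are assumed examples of what tag['style'] could return, and values that themselves contain ';' are not handled.

```python
def parse_inline_style(style):
    """Turn an inline style string into a {property: value} dict."""
    declarations = {}
    for chunk in style.split(";"):
        if ":" not in chunk:
            continue  # skip empty pieces such as the one after a trailing ';'
        prop, value = chunk.split(":", 1)
        declarations[prop.strip().lower()] = value.strip()
    return declarations


# Assumed examples of values read from tag['style'].
styles = ["width: 20%", "width: 40%; height: 5px;", "color: red"]
widths = [parse_inline_style(s).get("width") for s in styles]
print(widths)
```

Using .get("width") returns None for an element whose style has no width, instead of raising an error.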

Need to get a specific class exists in HTML body

I am trying to check whether class="special-price" exists in the code below.
Here is the HTML code:
<div class="product-shop">
<div class="f-fix">
<h2 class="product-name newname"> Xiaomi Mi Band 2 Strap (Black with White Border) </h2>
<!--product price-->
<div class="text-center ">
<div class="price-box">
<p class="old-price"> <span class="price-label">Regular Price:</span >
<span class = "price" id = "old-price-8846" > ৳200 </span>
</p >
<p class = "special-price" >
<span class = "price-label"> Special Price </span>
<span class="price" itemprop="price" content="149" id="product-price-8846"> ৳149 </span>
</p>
</div>
</div >
</div>
I am using Scrapy with Python. After checking that the class is present, I need to collect the text of class="price".
Did you try something like:
if response.css('.special-price'):
    # Scope the selector to .special-price; a bare '.price' would match the old price first
    price = response.css('.special-price .price::text').get() # or do whatever you need
or, for short:
price = response.css('.special-price .price::text').get()
This gives you None if there is no element with the special-price class.

How to select only divs with specific children span with xpath python

I am currently trying to scrape information from a particular e-commerce site, and I only want product information (product name, price, color and sizes) for products whose prices have been slashed.
I am currently using XPath.
This is my Python scraping code:
from lxml import html
import requests
class CategoryCrawler(object):
    def __init__(self, starting_url):
        self.starting_url = starting_url
        self.items = set()
    def __str__(self):
        # __str__ must return a string, not a tuple
        return 'All Items: {}'.format(self.items)
    def crawl(self):
        self.get_item_from_link(self.starting_url)
        return
    def get_item_from_link(self, link):
        start_page = requests.get(link)
        tree = html.fromstring(start_page.text)
        names = tree.xpath('//span[@class="name"][@dir="ltr"]/text()')
        print(names)
Note this is not the original URL
crawler = CategoryCrawler('https://www.myfavoriteecommercesite.com/')
crawler.crawl()
When the program runs, this is the HTML content fetched from the e-commerce site.
Div of products with a price slash:
<div class="products-info">
<h2 class="title"><span class="brand ">Apple </span> <span class="name" dir="ltr">IPhone X 5.8-Inch HD (3GB,64GB ROM) IOS 11, 12MP + 7MP 4G Smartphone - Silver</span></h2>
<div class="price-container clearfix">
<span class="sale-flag-percent">-22%</span>
<span class="price-box ri">
<span class="price ">
<span data-currency-iso="NGN">₦</span>
<span dir="ltr" data-price="388990">388,990</span>
</span>
<span class="price -old ">
<span data-currency-iso="NGN">₦</span>
<span dir="ltr" data-price="500000">500,000</span>
</span>
</span>
</div>
</div>
Div of products with no price slash:
<div class="products-info">
<h2 class="title"><span class="brand ">Apple </span> <span class="name" dir="ltr">IPhone X 5.8-Inch HD (3GB,64GB ROM) IOS 11, 12MP + 7MP 4G Smartphone - Silver</span></h2>
<div class="price-container clearfix">
<span class="price-box ri">
<span class="price ">
<span data-currency-iso="NGN">₦</span>
<span dir="ltr" data-price="388990">388,990</span>
</span>
</span>
</div>
</div>
Now this is my exact question:
I want to know how to select only the parent divs, i.e.
<div class="price-container clearfix">, that also contain either of these child spans:
<span class="price -old "> or
<span class="sale-flag-percent">
Thank you all.
One solution would be to get all <div class="price-container clearfix"> elements and iterate over them, checking whether the string of each whole element contains your keywords.
But a better solution would be to use conditionals with XPath:
from lxml import html
htmlst = 'your html'
tree=html.fromstring(htmlst)
divs = tree.xpath('//div[@class="price-container clearfix" and .//span[@class = "price -old " or @class = "sale-flag-percent"] ]')
print(divs)
This gets all divs where class="price-container clearfix" and then checks whether each one contains a span with one of the searched classes.
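The iterate-and-check approach mentioned first can be sketched as follows. This variant inspects the class of each descendant span rather than the element's string, and it uses the standard library's xml.etree.ElementTree instead of lxml, which only works here because the sample is well-formed. The sample markup is a trimmed, assumed stand-in for the two blocks in the question.

```python
import xml.etree.ElementTree as ET

# Assumed, well-formed stand-in for the two product blocks in the question.
SAMPLE = """
<root>
  <div class="price-container clearfix">
    <span class="sale-flag-percent">-22%</span>
    <span class="price-box ri">
      <span class="price "><span dir="ltr" data-price="388990">388,990</span></span>
      <span class="price -old "><span dir="ltr" data-price="500000">500,000</span></span>
    </span>
  </div>
  <div class="price-container clearfix">
    <span class="price-box ri">
      <span class="price "><span dir="ltr" data-price="388990">388,990</span></span>
    </span>
  </div>
</root>
"""

# Class values that mark a slashed price (compared after stripping whitespace).
WANTED = {"price -old", "sale-flag-percent"}

root = ET.fromstring(SAMPLE)
discounted = []
for div in root.iter("div"):
    if div.get("class") != "price-container clearfix":
        continue
    # Keep the div if any descendant span carries one of the wanted classes.
    span_classes = {span.get("class", "").strip() for span in div.iter("span")}
    if span_classes & WANTED:
        discounted.append(div)

print(len(discounted))
```

On real, possibly malformed pages the lxml XPath expression above is the more robust choice; the loop is mainly useful when the condition is too awkward to express as a predicate.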

How to create list of web elements?

I am trying to make a list of web elements, but Selenium cannot seem to find the elements on the web page, although it worked 3 days ago and I cannot find any changes in the web page.
This is the HTML code:
<li id="wlg_41410" class="leagueWindow " dataid="41410">
<h5 style="cursor: pointer; cursor: hand;" onclick="TodaysEventsLeagueWindow.minimizeRestoreClick(41410)">Europa League</h5>
<div class="bet_type select" id="_bet_types"></div>
<div class="bet_type lastscore ">
<h6>1X2 FT </h6>
<div class="types_bg">
<!--[if IE]> <div id="IEroot"> <![endif]-->
<div class="first_buttons_line">
</div>
<!--[if IE]> </div> <![endif]-->
<div class="time"> 23/11 | 18:00 </div>
<div class="bets ml">
</div>
<div class="time"> 23/11 | 20:00 </div>
<div class="bets ml">
</div>
<div class="time"> 23/11 | 20:00 </div>
<div class="bets ml">
</div>
<div class="time"> 23/11 | 20:00 </div>
<div class="bets ml">
</div>
<div class="time"> 23/11 | 20:00 </div>
<div class="bets ml">
</div>
<div class="clr"></div>
</div>
</div> <span class="x" onclick="TodaysEventsLeagueWindow.closeLeagueWindow(41410)"></span>
</li>
I am trying to make a list from the <div class="bets ml"></div> elements,
but I keep getting the exception selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document, as if Selenium can't find the web element.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException
import time
driver.get("https://www.luckia.es/apuestas")
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it("sbtechBC"))
eventos_de_hoy = driver.find_element_by_id("today_event_btn")
eventos_de_hoy.click()
ligi_len = len(WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "leagueWindow "))))
print(ligi_len)
for index in range(ligi_len):
    item = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "leagueWindow ")))[index]
    driver.execute_script("arguments[0].scrollIntoView(true);", item)
    nume_liga = item.find_element_by_tag_name("h5").text
    time.sleep(3)
    print('try', nume_liga)
    meci = item.find_elements_by_xpath("//*[@class='bets ml']")
    print("there are", len(meci), "in one liga")
The reason for the index is that the iframe refreshes every 25 seconds.
I also tried meci = item.find_elements_by_css_selector('.bets.ml') and meci = item.find_elements_by_class_name('ml').
Why am I able to extract the <h5></h5> element but not the other elements?
From your code block, it is pretty clear that time.sleep(3) has merely covered up the real issue:
nume_liga = item.find_element_by_tag_name("h5").text
time.sleep(3)
print('try', nume_liga)
It is not clear why time.sleep(3) was added before calling print() on a text value, and it hides the main issue. But as the list was already created, you are able to print('try', nume_liga).
But next, when you call meci = item.find_elements_by_xpath("//*[@class='bets ml']"), you face a StaleElementReferenceException because the HTML DOM has changed.
A closer look at the <h5> tag reveals that it has an onclick handler:
<h5 style="cursor: pointer; cursor: hand;" onclick="TodaysEventsLeagueWindow.minimizeRestoreClick(41410)">Europa League</h5>
A wild guess: the HTML DOM changes while .text is being read from the <h5> tag.
Solution:
A possible solution with your current code block may be to use get_attribute("innerHTML") instead of .text. Your line of code would then be:
nume_liga = item.find_element_by_tag_name("h5").get_attribute("innerHTML")
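Since the question mentions that the iframe refreshes periodically, another option that is sometimes used is to look the element up again and retry when a stale reference is raised. The sketch below is deliberately generic so that it runs without a browser: retry_on is a hypothetical helper, and StaleError is a stand-in for Selenium's StaleElementReferenceException, not the real class.

```python
import time


class StaleError(Exception):
    """Stand-in for selenium.common.exceptions.StaleElementReferenceException."""


def retry_on(exc_type, action, attempts=3, delay=0.0):
    """Call `action()` until it succeeds or `attempts` is exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exc_type:
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay)


# Fake lookup that fails twice, as if the frame had refreshed mid-read.
calls = {"count": 0}


def read_league_name():
    calls["count"] += 1
    if calls["count"] < 3:
        raise StaleError("element is not attached to the page document")
    return "Europa League"


league = retry_on(StaleError, read_league_name)
print(league, calls["count"])
```

With Selenium, the action passed in would need to locate the element again on every call (rather than reuse the item variable), because a stale reference never becomes valid again.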

Resources