Selenium can't find ninth element

I have a problem with Selenium WebDriver.
I want to log in to a website and collect some articles from it. My code logs in successfully, but it doesn't find all the elements.
The code below finds every element except the ninth element of 'major' and 'notice'. (I don't know why, but it does find the ninth element of 'upload'.)
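from selenium import webdriver
from selenium.webdriver.common.by import By
import MyCode
Data = []
# ....
driver = webdriver.Chrome()
driver.get(MyCode.url[0])
# For find_elements, the implicit wait only lasts until at least one match exists
driver.implicitly_wait(10)
major = driver.find_elements(By.CLASS_NAME, 'board-lecture-title')
# Children of every post-title span (the <a> links); note @class, not #class
notice = driver.find_elements(By.XPATH, "//*[@class='post-title']/*")
upload = driver.find_elements(By.CLASS_NAME, 'post-date')
# ....
driver.close()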
This is part of the website's HTML:
<li class="isnotice" style="width:calc(100% - 20px) !important;">
<span class="post-title">
<a href="https://url">
<span class="board-lecture-title">[Course]</span>Title
</a>
</span>
<br>
<span class="post-date">2020년 7월 30일, 목요일, 오전 10:58</span>
<span class="post-viewinfo area-right">
0
<br>
<span>View</span>
</span>
</li>
<li class="isnotice" style="width:calc(100% - 20px) !important;">
<span class="post-title">
<a href="https://url">
<span class="board-lecture-title">[Course]</span>Title
</a>
</span>
<br>
<span class="post-date">2020년 7월 15일, 수요일, 오후 12:20</span>
<span class="post-viewinfo area-right">
0
<br>
<span>View</span>
</span>
</li>
<li class="isnotice" style="width:calc(100% - 20px) !important;">
<span class="post-title">
<a href="https://url">
<span class="board-lecture-title">[Course]</span>Title
</a>
</span>
<br>
<span class="post-date">2020년 6월 29일, 월요일, 오전 11:18</span>
<span class="post-viewinfo area-right">
47
<br>
<span>View</span>
</span>
....
</ul>
Sorry if my English is confusing.
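One way to narrow this down, sketched under the assumption that you can grab the rendered HTML with driver.page_source once the page has loaded, is to count how often each class actually occurs in the source and compare that with what Selenium returns. The ClassCounter and count_classes names are hypothetical helpers built on Python's standard html.parser module; the sample is two list items abridged from the question's HTML.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts how many elements carry each CSS class in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # An element can carry several space-separated classes.
                self.counts.update(value.split())


def count_classes(html):
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Two list items abridged from the question's HTML.
sample = """
<li class="isnotice"><span class="post-title"><a href="https://url">
<span class="board-lecture-title">[Course]</span>Title</a></span><br>
<span class="post-date">2020년 7월 30일, 목요일, 오전 10:58</span></li>
<li class="isnotice"><span class="post-title"><a href="https://url">
<span class="board-lecture-title">[Course]</span>Title</a></span><br>
<span class="post-date">2020년 7월 15일, 수요일, 오후 12:20</span></li>
"""
counts = count_classes(sample)
for cls in ("post-title", "board-lecture-title", "post-date"):
    print(cls, counts[cls])
```

With Selenium you would call count_classes(driver.page_source) and compare each count with len(major), len(notice) and len(upload). If a class occurs fewer than nine times in the source, the ninth item simply lacks that element; if it occurs nine times but Selenium returns fewer, the element was possibly added after the lookup ran, since the implicit wait stops as soon as find_elements sees its first match.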

Related

How do I scrape some text from <span> with no unique class identifier?

I'm new to scraping, so please bear with me. From the HTML below I want to extract only the property type (e.g. 'Apartment'), the number of beds (e.g. 2) and the location (e.g. 'Birmingham'), and save each of these in a list. The problem is that there's no unique class identifier.
<div class="extra">
<span class="tablet-visible">
<span class="item"><label><i class="ouricon classified"></i><b></b></label>
<span>For Sale</span></span>
</span>
<span class="tablet-visible">
<span class="item"><label><i class="ouricon house"></i><b></b></label>
<span>Apartment</span></span>
</span>
<span class="">
<span class="item"><label><i class="ouricon bed"></i><b></b></label>
<span>2</span>
</span>
</span>
<span class="">
<span class="item"><label><i class="ouricon locationpin"></i><b></b></label>
<span>Birmingham</span>
</span>
</span>
</div>
I tried the code below, but of course it prints all the text in class="extra", including 'For Sale', which I don't want.
import requests
from bs4 import BeautifulSoup
results = requests.get(url)  # url is defined elsewhere in my script
soup = BeautifulSoup(results.text, "html.parser")
# Every listing is a div carrying a data-itemid attribute
desc_div = soup.find_all('div', attrs={"data-itemid": True})
for listing in desc_div:
    extra = listing.find('div', class_='extra')
    print(extra.text.strip())
Any help would be much appreciated.
Since 'For Sale' is in the same tag and class as the values you want, just filter it out:
from bs4 import BeautifulSoup
html = """
<div class="extra">
<span class="tablet-visible">
<span class="item"><label><i class="ouricon classified"></i><b></b></label>
<span>For Sale</span></span>
</span>
<span class="tablet-visible">
<span class="item"><label><i class="ouricon house"></i><b></b></label>
<span>Apartment</span></span>
</span>
<span class="">
<span class="item"><label><i class="ouricon bed"></i><b></b></label>
<span>2</span>
</span>
</span>
<span class="">
<span class="item"><label><i class="ouricon locationpin"></i><b></b></label>
<span>Birmingham</span>
</span>
</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser").find_all("span", {"class": "item"})
print([i.text.strip() for i in soup if i.text.strip() != "For Sale"])
Output:
['Apartment', '2', 'Birmingham']
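Filtering on the literal text 'For Sale' breaks as soon as that label changes. A sturdier alternative, sketched here, keys each value on its icon class (ouricon house, ouricon bed, ouricon locationpin). To stay dependency-free it uses the standard library's xml.etree.ElementTree, which works only because this particular fragment happens to be well-formed XML; on a real page you would apply the same idea with BeautifulSoup. The helper name fields_by_icon is hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment from the question. It happens to be well-formed XML, which is
# the only reason ElementTree can parse it; real pages need an HTML parser.
html = """
<div class="extra">
<span class="tablet-visible">
<span class="item"><label><i class="ouricon classified"></i><b></b></label>
<span>For Sale</span></span>
</span>
<span class="tablet-visible">
<span class="item"><label><i class="ouricon house"></i><b></b></label>
<span>Apartment</span></span>
</span>
<span class="">
<span class="item"><label><i class="ouricon bed"></i><b></b></label>
<span>2</span>
</span>
</span>
<span class="">
<span class="item"><label><i class="ouricon locationpin"></i><b></b></label>
<span>Birmingham</span>
</span>
</span>
</div>
"""


def fields_by_icon(fragment):
    """Map each item's icon name (e.g. 'house', 'bed') to the item's text."""
    root = ET.fromstring(fragment.strip())
    fields = {}
    for item in root.iter("span"):
        if item.get("class") != "item":
            continue
        icon = item.find("label/i")  # e.g. <i class="ouricon house">
        value = item.find("span")    # direct <span> child holding the value
        if icon is None or value is None:
            continue
        name = icon.get("class", "").replace("ouricon", "").strip()
        fields[name] = (value.text or "").strip()
    return fields


fields = fields_by_icon(html)
wanted = [fields[name] for name in ("house", "bed", "locationpin")]
print(wanted)  # ['Apartment', '2', 'Birmingham']
```

Because the lookup is by icon, 'For Sale' is still available as fields["classified"] but never leaks into the list.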

Unable to click in a dropdown menu with Selenium

I'm trying to click a link inside a dropdown menu using an XPath, but I keep getting a TimeoutException.
test = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div/div[1]/div[2]/div/ul/li[4]/ul/li[7]/a")))
driver.execute_script("arguments[0].click();", test)
But if I try to click the Log Out option, the script sometimes works.
logOut = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div/div[1]/div[2]/div/ul/li[7]/ul/li/a")))
driver.execute_script("arguments[0].click();", logOut)
There are three dropdown menus. The link I'm targeting as test is in a menu with about 12 options, and Log Out is alone in another one. I want to understand what I'm doing wrong.
Here's the HTML for the menus:
<body>
<div class="container-fluid">
<div class="row headerLogo">
<!--<header class="navbar-default navbar-static-top headerLogo">-->
<div class="col-md-2">
<div class="vericaltext">WIISCPRD23V</div>
<img src="/Content/images/Logo_menu2.png" class="logoSea" alt='SEA' />
</div>
<div class="col-md-10">
<div class="row flat-nav">
<li class="color20 effect3">
<a><i class="fa fa-comments-o fa-2x"></i><span>Peq</span></a>
<ul class="column-based">
<li class="color20" style="font-weight:bold">Question</li>
<li>Quest</li>
<li>Retr</li>
<li class="color20" style="font-weight:bold">Vitss</li>
<li>Vit</li>
<li class="color20" style="font-weight:bold">BC</li>
<li>Trat</li>
<li>BC</li>
<li class="color20" style="font-weight:bold">CAD</li>
<li>Ant</li>
<li class="color20" style="font-weight:bold">Fer</li>
<li>Add</li>
<li>Emp</li>
<li>Est</li>
<li>Seg</li>
<li>Rec</li>
<li>Cal</li>
</ul>
</li>
<li class="color49 effect3">
<a><i class="fa fa-envelope-o fa-2x"></i><span>E-mails</span></a>
<li class="color5 effect3 divLogout">
<a>
<div style="width:100%;padding-top:15px">
<div class="divAlinhadaEsquerda"><i class="fa fa-user-circle fa-3x"></i></div>
<div class="divAlinhadaEsquerda">
<div class="fontNomUsr">xxx</div>
<div class="fontUser">xxx
</div>
<div class="fontUlti">xxx</div>
</div>
</div>
</a>
<ul class="column-based">
<li><i class="fa fa-power-off"></i><span>Sair</span></li>
</ul>
</li>
</ul>
Your XPath is incorrect; please try the solution below.
Solution 1:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Hover over the top-level menu first; if the menu opens on hover,
# its items are not visible (and so not clickable) before this
PesquisasElement = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.XPATH, "//*[contains(text(), 'Pesquisas')]")))
ActionChains(driver).move_to_element(PesquisasElement).perform()
# Then wait for the submenu link and click it
TratamentoElement = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//a[@href='/ManterBC']")))
ActionChains(driver).move_to_element(TratamentoElement).click().perform()

BeautifulSoup .find() capturing too much text (how do I narrow it down?)

I'm wondering how to target the "Switch" text in the HTML below:
<div class="product_title">
<a href="/game/pc/into-the-breach" class="hover_none">
<h1>Into the Breach</h1>
</a>
<span class="platform">
<a href="/game/pc">
PC
</a>
</span>
</div>
<div class="product_data">
<ul class="summary_details">
<li class="summary_detail publisher" >
<span class="label">Publisher:</span>
<span class="data">
<a href="/company/subset-games" >
Subset Games
</a>
</span>
</li>
<li class="summary_detail release_data">
<span class="label">Release Date:</span>
<span class="data" >Feb 27, 2018</span>
</li>
<li class="summary_detail product_platforms">
<span class="label">Also On:</span>
<span class="data">
Switch </span>
</li>
</ul>
</div>
So far, this code also captures the "Also On:" text (with a lot of whitespace):
self.playable_on_systems_label.setText(self.html_soup.find("span", class_='platform').text.strip() + ', ' + self.html_soup.find("li", class_='summary_detail product_platforms').text.strip())
How do I capture only the "Switch" text (in this case)?
FYI: the first half of the statement (capturing the "PC" text) works fine; only the "Also On" part doesn't.
Thanks in advance,
Your query returns the entire li element with class="summary_detail product_platforms", so its text runs from "Also On:" through "Switch". Try something like .find('a', href=re.compile("^.+switch.+$")) (after import re) or, using CSS, .select("a[href*=switch]") (solution from here). Both assume that "Switch" is a link whose href contains "switch", as in the markup of the next answer; the snippet in the question shows it as plain text.
You can use BeautifulSoup's select() function to reach the "Switch" text; check this code:
from bs4 import BeautifulSoup
html = '''<div class="product_title">
<a class="hover_none" href="/game/pc/into-the-breach">
<h1>Into the Breach</h1>
</a>
<span class="platform">
<a href="/game/pc">
PC
</a>
</span>
</div>
<div class="product_data">
<ul class="summary_details">
<li class="summary_detail publisher">
<span class="label">Publisher:</span>
<span class="data">
<a href="/company/subset-games">
Subset Games
</a>
</span>
</li>
<li class="summary_detail release_data">
<span class="label">Release Date:</span>
<span class="data">Feb 27, 2018</span>
</li>
<li class="summary_detail product_platforms">
<span class="label">Also On:</span>
<span class="data">
<a class="hover_none" href="/game/switch/into-the-breach">Switch</a> </span>
</li>
</ul>
</div>'''
soup = BeautifulSoup(html, 'html.parser')
text = soup.select('.summary_detail.product_platforms .hover_none')[0].text.strip()
print(text)
Output:
Switch
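Against the markup in the question, where "Switch" is plain text rather than a link, a simpler option is to go from the product_platforms list item to its data span and take only that span's text; with BeautifulSoup that would be soup.find("li", class_="product_platforms").find("span", class_="data").get_text(strip=True). The sketch below shows the same navigation with the standard library's ElementTree, assuming an abridged copy of the question's markup that is well-formed XML and wrapped in a hypothetical <root> element.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the question's markup, wrapped in a hypothetical <root>
# element because ElementTree needs a single top-level element.
html = """<root>
<div class="product_title">
<span class="platform">
<a href="/game/pc">
PC
</a>
</span>
</div>
<ul class="summary_details">
<li class="summary_detail release_data">
<span class="label">Release Date:</span>
<span class="data" >Feb 27, 2018</span>
</li>
<li class="summary_detail product_platforms">
<span class="label">Also On:</span>
<span class="data">
Switch </span>
</li>
</ul>
</root>"""


def clean(text):
    # Collapse the newlines and spaces around the visible text.
    return " ".join(text.split())


root = ET.fromstring(html)
platform = root.find(".//span[@class='platform']/a")
# Unlike BeautifulSoup's class_, ElementTree compares the whole class string.
# Taking the "data" span skips the "Also On:" label span.
also_on = root.find(".//li[@class='summary_detail product_platforms']/span[@class='data']")
systems = clean("".join(platform.itertext())) + ", " + clean("".join(also_on.itertext()))
print(systems)
```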

Why does Google not accept my address for jobLocation in microdata?

What is wrong with my markup when I test it with https://search.google.com/structured-data/testing-tool?
I have set up structured data for JobPosting and added a location, but I don't have an exact postal address: no ZIP code and no street, just a city.
<div itemscope itemtype="http://schema.org/JobPosting">
<h2 itemprop="title">Data Analyst</h2>
<span itemprop="description">
<strong>Company:</strong>
<span itemprop="hiringOrganization">
<span itemscope itemtype="http://schema.org/Organization">
<span itemprop="name">
Euro London Banking and Finance Germany
</span>
</span>
</span>
</span>
<p><strong>Location:</strong>
<span itemprop="jobLocation">
<span itemscope itemtype="http://schema.org/Place">
<span itemprop="address">
<span itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="addressLocality">Mycityname</span>
</span>
</span>
</span>
</span>
</p>
<p><strong>Employment type:</strong>
<span itemprop="employmentType">
Full-time
</span>,
<span itemprop="workHours">
40 hours per week
</span>
</p>
<p><strong>Base salary:</strong>
<span itemprop="salaryCurrency">
EUR
</span>
<span itemprop="baseSalary">
35000
</span>
</p>
<p><strong>Responsibilities:</strong></p>
<ul itemprop="responsibilities">
<li>a</li>
<li>b</li>
<li>c</li>
</ul>
<p><strong>Educational requirements:</strong>
<span itemprop="educationRequirements">
Bachelor's degree
</span>
</p>
<p><strong>Experience requirements:</strong>
<span itemprop="experienceRequirements">
At least 2 years of working experience, however recent graduates with relevant technical knowledge and experience through
internships, etc. will also be considered
</span>
</p>
<p><strong>Qualifications:</strong></p>
<ul itemprop="qualifications">
<li>Profound knowledge of SQL Server and relational databases</li>
<li>Profound knowledge of Visual Basic for Applications</li>
<li>Profound knowledge of Microsoft Excel and Access</li>
<li>Knowledge in ASP.Net and HTML is preferred</li>
<li>Fluent in English; knowledge of the German language is preferred but not a must</li>
</ul>
<p><strong>Skills:</strong></p>
<ul itemprop="skills">
<li>Good analytical skills</li>
<li>Good communication and interpersonal skills</li>
<li>Ability to work in teams</li>
<li>Stress resilient, goal-oriented and efficient</li>
</ul>
<p><strong>Date posted:</strong>
<span itemprop="datePosted">
2011-11-29
</span>
</p>
</div>
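It seems to me that the address property can take a plain text value, and schema.org says so too. Still, it doesn't validate. :(
Put itemprop on the same span as itemscope and itemtype. In microdata, a property's value is a nested item only when itemprop and itemscope are on the same element; otherwise the value is just the element's text, and the inner itemscope becomes a separate, unrelated item (the same applies to hiringOrganization in the question):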
<span itemprop="jobLocation" itemscope itemtype="http://schema.org/Place">
<span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="addressLocality">Mycityname</span>
</span>
</span>
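To see why the placement matters, here is a deliberately simplified microdata reader built on Python's standard html.parser. It is a sketch, not the full spec algorithm: it ignores itemref, multi-valued properties and URL-valued properties such as href or src. It encodes the rule that a property's value is a nested item only when itemprop and itemscope share an element, and that an itemscope element without itemprop is a top-level item. All names (Node, TreeBuilder, to_item, top_level_items) are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)  # boolean attributes such as itemscope map to None
        self.children = []        # child Node objects and text strings

    def text(self):
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree; assumes reasonably well-formed markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag name.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def to_item(node):
    """Collects the properties of the item rooted at an itemscope element."""
    item = {"@type": node.attrs.get("itemtype")}

    def walk(parent):
        for child in parent.children:
            if isinstance(child, str):
                continue
            if "itemprop" in child.attrs:
                if "itemscope" in child.attrs:
                    value = to_item(child)  # itemprop + itemscope: nested item
                else:
                    value = " ".join(child.text().split())  # otherwise: plain text
                item[child.attrs["itemprop"]] = value
            if "itemscope" not in child.attrs:
                walk(child)  # never descend into a nested item's own properties

    walk(node)
    return item


def top_level_items(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    found = []

    def walk(parent):
        for child in parent.children:
            if isinstance(child, str):
                continue
            if "itemscope" in child.attrs and "itemprop" not in child.attrs:
                found.append(to_item(child))
            walk(child)

    walk(builder.root)
    return found


question_markup = """<div itemscope itemtype="http://schema.org/JobPosting">
<span itemprop="jobLocation"><span itemscope itemtype="http://schema.org/Place">
<span itemprop="address"><span itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="addressLocality">Mycityname</span></span></span></span></span></div>"""
answer_markup = """<div itemscope itemtype="http://schema.org/JobPosting">
<span itemprop="jobLocation" itemscope itemtype="http://schema.org/Place">
<span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="addressLocality">Mycityname</span></span></span></div>"""
print(top_level_items(question_markup))
print(top_level_items(answer_markup))
```

On the question's nesting the sketch reports three separate items, with jobLocation reduced to the plain text "Mycityname"; on the answer's markup it reports a single JobPosting whose jobLocation is a Place containing a PostalAddress.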
The address property of Place accepts Text as well as a PostalAddress (http://schema.org/address), so in JSON-LD the following is valid as a minimum:
"jobLocation": {
"@type": "Place",
"address": "Central City"
}
I just tested it: you will get warnings, but no errors.
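For completeness, here is a sketch that wraps the minimal jobLocation from this answer into a full JSON-LD block, as it would be embedded in a page. The function name job_posting_jsonld is hypothetical, and the output deliberately contains only the properties discussed here; Google's JobPosting documentation lists further required properties (such as datePosted, description and hiringOrganization), so the testing tool will likely keep flagging the posting until those are added.

```python
import json


def job_posting_jsonld(title, city):
    """Builds a minimal JobPosting whose location is a Place with a text address."""
    data = {
        "@context": "https://schema.org",
        "@type": "JobPosting",
        "title": title,
        "jobLocation": {
            "@type": "Place",
            "address": city,  # Text is accepted here; a PostalAddress also works
        },
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(job_posting_jsonld("Data Analyst", "Mycityname"))
```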

How do I select a JS tab in watir-webdriver?

I have a webpage with a tab list; the HTML for that part looks like this:
<div id="content">
<div class="col span-6">
<div class="section first no-border">
<h2>New Search</h2>
<ul class="tabs clear">
<li id="simple-li" class="current">
<a onclick="switch_search_type('SimpleSearch');; return false;" href="#">Simple</a>
</li>
<li id="structured-li">
<a onclick="redirect_to_search('/search/structured_searches/new'); return false;" href="#">Wizard</a>
</li>
<li id="advanced-li" class="">
</li>
<li id="custom-li" class="">
<a onclick="switch_search_type('ComplexQuerySearch');; return false;" href="#">Custom</a>
</li>
</ul>
<div class="tabbed-panel">
I want to select the "Custom" item in this tab list. I've tried several things without success, including:
browser.li(:id, "custom-li").click
browser.select_list(:id, "custom-li").set("Custom")
browser.link(:xpath, "id('custom-li')/x:a").click
browser.select_list(:id => 'custom-li').select "Custom"
I am new to watir-webdriver. Any feedback and help is greatly appreciated.
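Try this (select_list only works with <select> elements, and the tab's onclick handler is on the <a> inside the li):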
browser.a(:text => "Custom").click
