I can't find how to get text with selenium python [duplicate] - python-3.x

This question already has answers here:
How to get text with Selenium WebDriver in Python
(9 answers)
Closed 3 years ago.
I am trying to get the content of
<span class="noactive">0 Jours restants</span>
(which is the expiration date of the warranty), but I don't know how to get it (I need to print it to a file).
My code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time

def scrapper_lenovo(file, line):
    CHROME_PATH = r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
    CHROMEDRIVER_PATH = r'C:\webdriver\chromedriver'
    WINDOW_SIZE = "1920,1080"

    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--window-size=%s" % WINDOW_SIZE)
    chrome_options.binary_location = CHROME_PATH
    d = webdriver.Chrome(executable_path=CHROMEDRIVER_PATH,
                         chrome_options=chrome_options)

    d.get("https://pcsupport.lenovo.com/fr/fr/warrantylookup")
    search_bar = d.find_element_by_xpath('//*[@id="input_sn"]')
    search_bar.send_keys(line[19])
    search_bar.send_keys(Keys.RETURN)
    time.sleep(4)
    try:
        warrant = d.find_element_by_xpath('//*[@id="W-Warranties"]/section/div/div/div[1]/div[1]/div[1]/p[1]/span')
        file.write(warrant)
    except:
        print("test")
        pass
    if "In Warranty" not in d.page_source:
        file.write(line[3])
        file.write("\n")
    d.close()
As you can see, I tried to print the content of warrant, but I couldn't find any function allowing it (I saw some answers using .text or .getText(), but for whatever reason I couldn't get them to work).

You could try explicitly matching the required tag; the relevant XPath expression would be:
//span[@class='noactive']
I would also recommend getting rid of this time.sleep() function: it is a form of performance anti-pattern. If you need to wait for the presence/visibility/invisibility/absence of a certain element, you should rather go for an Explicit Wait.
So remove these lines:
time.sleep(4)
warrant = d.find_element_by_xpath('//*[@id="W-Warranties"]/section/div/div/div[1]/div[1]/div[1]/p[1]/span')
and use this one instead:
warrant = WebDriverWait(d, 10).until(expected_conditions.presence_of_element_located((By.XPATH, "//span[@class='noactive']")))
More information: How to use Selenium to test web applications using AJAX technology


web scraping data from glassdoor using selenium

Please, I need some help to run this code (https://github.com/PlayingNumbers/ds_salary_proj/blob/master/glassdoor_scraper.py) in order to scrape job offer data from Glassdoor.
Here's the code snippet:
from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException
from selenium import webdriver
import time
import pandas as pd

options = webdriver.ChromeOptions()

#Uncomment the line below if you'd like to scrape without a new Chrome window every time.
#options.add_argument('headless')

#Change the path to where chromedriver is in your home folder.
driver = webdriver.Chrome(executable_path=path, options=options)
driver.set_window_size(1120, 1000)

url = "https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn&typedKeyword="+'data scientist'+"&sc.keyword="+'data scientist'+"&locT=&locId=&jobType="
#url = 'https://www.glassdoor.com/Job/jobs.htm?sc.keyword="' + keyword + '"&locT=C&locId=1147401&locKeyword=San%20Francisco,%20CA&jobType=all&fromAge=-1&minSalary=0&includeNoSalaryJobs=true&radius=100&cityId=-1&minRating=0.0&industryId=-1&sgocId=-1&seniorityType=all&companyId=-1&employerSizes=0&applicationType=0&remoteWorkType=0'
driver.get(url)

#Let the page load. Change this number based on your internet speed.
#Or, wait until the webpage is loaded, instead of hardcoding it.
time.sleep(5)

#Test for the "Sign Up" prompt and get rid of it.
try:
    driver.find_element_by_class_name("selected").click()
except ElementClickInterceptedException:
    pass

time.sleep(.1)

try:
    driver.find_element_by_css_selector('[alt="Close"]').click()  #clicking to the X.
    print(' x out worked')
except NoSuchElementException:
    print(' x out failed')
    pass

#Going through each job in this page
job_buttons = driver.find_elements_by_class_name("jl")
I'm getting an empty list
job_buttons
[]
Your problem is a wrong except argument.
With driver.find_element_by_class_name("selected").click() you are trying to click a non-existing element: there is no element matching the "selected" class name on that page. This raises a NoSuchElementException, as you can see yourself, while you are trying to catch an ElementClickInterceptedException.
To fix this you should use the correct locator, or at least the correct argument in except.
Like this:
try:
    driver.find_element_by_class_name("selected").click()
except NoSuchElementException:
    pass
Or even
try:
    driver.find_element_by_class_name("selected").click()
except:
    pass
I'm not sure which elements you want to get into job_buttons.
The search results, containing all the details for each job, can be found with this:
job_buttons = driver.find_elements_by_css_selector("li.react-job-listing")

Button click works inconsistently (hangs) on Chrome Webdriver while executing from Python Selenium

I am using Selenium to automate a few browser actions on a particular website and I am using the below set of tools to achieve it.
Python 3.8
Selenium
Chrome Web Driver 79.0
Chrome 79.0
The tasks I perform are just filling up a form and then clicking the submit button in that form. This works most of the time, except that sometimes it just won't, no matter what! The form filling goes very smoothly, but then, when the click on the submit button happens, Chrome just hangs there forever. There's no error on the console whatsoever, and googling this issue I see it is a very common occurrence; almost all of the solutions out there are workarounds rather than actual fixes, and I have tried almost all of them to no avail. There's an issue on the Selenium GitHub page as well, which the maintainers weren't much interested in and closed. How do I even go about resolving this issue? At this moment I am out of ideas. Any help would be appreciated. Thanks.
Below is the source code that I am trying to execute.
import time
from selenium import webdriver
import os
import csv
from datetime import datetime

url = 'https://www.nseindia.com/products/content/equities/equities/eq_security.htm'
xpath_get_data_button = '//*[@id="get"]'
xpath_download_link = '/html/body/div[2]/div[3]/div[2]/div[1]/div[3]/div/div[3]/div[1]/span[2]/a'
nse_list_file = 'nse_list.csv'
wait_time = 5
wait_time_long = 10
start_year = 2000
stop_year = 2019
curr_year = start_year

browser = webdriver.Chrome("E:/software/chromedriver_win32/chromedriver.exe")
browser.get(url)
time.sleep(wait_time)

with open(nse_list_file, 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        scrip = row[0]
        year_registered = datetime.strptime(row[1], '%d-%m-%Y').year
        if year_registered > start_year:
            curr_year = year_registered
        else:
            curr_year = start_year
        try:
            browser.find_element_by_class_name('reporttitle1').clear()
            browser.find_element_by_class_name('reporttitle1').send_keys(scrip)
            browser.find_element_by_id('rdDateToDate').click()
            while curr_year <= stop_year:
                from_date = '01-01-' + str(curr_year)
                to_date = '31-12-' + str(curr_year)
                browser.find_element_by_id('fromDate').clear()
                browser.find_element_by_id('fromDate').send_keys(from_date)
                browser.find_element_by_id('toDate').clear()
                browser.find_element_by_id('toDate').send_keys(to_date)
                time.sleep(wait_time)
                browser.find_element_by_xpath(xpath_get_data_button).click()
                time.sleep(wait_time_long)
                download_link_element = browser.find_element_by_xpath(xpath_download_link).click()
                curr_year = curr_year + 1
        except Exception as ex:
            print('Could not find download link')
            print(str(ex))
        if os.path.isfile("stop_loading"):
            break

browser.quit()
print('DONE')

ElementNotVisibleException Error with xpath selenium python

Hi, I am scraping a website using Selenium with Python. The code is as below:
import time
from selenium import webdriver

url = 'https://www.hkexnews.hk/'
options = webdriver.ChromeOptions()
browser = webdriver.Chrome(chrome_options=options, executable_path=r'chromedriver.exe')
browser.get(url)
tier1 = browser.find_element_by_id('tier1-select')
tier1.click()
tier12 = browser.find_element_by_xpath('//*[@data-value="rbAfter2006"]')
tier12.click()
time.sleep(1)
tier2 = browser.find_element_by_id('rbAfter2006')
tier2.click()
tier22 = browser.find_element_by_xpath("//*[@id='rbAfter2006']//*[@class='droplist-item droplist-item-level-1']//*[text()='Circulars']")
tier22.click()
tier23 = browser.find_element_by_xpath("//*[@id='rbAfter2006']//*[@class='droplist-item droplist-item-level-2']//*[text()='Securities/Share Capital']")
tier23.click()
tier24 = browser.find_element_by_xpath("//*[@id='rbAfter2006']//*[@class='droplist-group droplist-submenu level3']//*[text()='Issue of Shares']")
tier24.click()
It stops at tier23, showing ElementNotVisibleException. I have tried with different classes, yet it seems not to work. Thank you for your help.
There are two elements that can be selected by your XPath, and the first one is hidden. Try the below to select the required element:
tier23 = browser.find_element_by_xpath("//*[@id='rbAfter2006']//li[@aria-expanded='true']//*[@class='droplist-item droplist-item-level-2']//*[text()='Securities/Share Capital']")
or, shorter:
tier23 = browser.find_element_by_xpath("//li[@aria-expanded='true']//a[.='Securities/Share Capital']")
tier23.location_once_scrolled_into_view
tier23.click()
P.S. Note that the option will still not be visible at first because you need to scroll the list down; I used tier23.location_once_scrolled_into_view for this purpose.
Also, it's better to use Selenium's built-in Waits instead of time.sleep().
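The scroll-then-click pattern from the answer, wrapped in a small helper (a sketch using the thread's Selenium 3 API; the helper name is mine, not from the thread):

```python
# Sketch: scroll the element into view before clicking, since hidden
# droplist entries are not clickable until they become visible.
def scroll_and_click(browser, xpath):
    el = browser.find_element_by_xpath(xpath)  # Selenium 3 API, as in the thread
    el.location_once_scrolled_into_view  # property access triggers the scroll
    el.click()
    return el
```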

How do I relocate/disable GeckoDriver's log file in selenium, python 3?

Ahoy, how do I disable GeckoDriver's log file in selenium, python 3?
If that's not possible, how do I relocate it to Temp files?
To relocate the GeckoDriver logs, you can create a directory within your project space, e.g. Log, and use the argument log_path to store the GeckoDriver logs in a file as follows:
from selenium import webdriver
driver = webdriver.Firefox(executable_path=r'C:\path\to\geckodriver.exe', log_path='./Log/geckodriver.log')
driver.get('https://www.google.co.in')
print("Page Title is : %s" %driver.title)
driver.quit()
ref: 7. WebDriver API > Firefox WebDriver
According to the documentation, you can relocate it to Temp as follows:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import os

options = Options()
driver = webdriver.Firefox(executable_path=geckodriver_path, service_log_path=os.path.devnull, options=options)
The following arguments are deprecated:
firefox_options – deprecated argument for options
log_path – deprecated argument for service_log_path
Using WebDriver(log_path=path.devnull) and WebDriver(service_log_path=path.devnull) are both deprecated at this point in time; both result in a warning.
Using a service object is now the preferred way of doing this:
from os import path
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.webdriver import WebDriver
service = Service(log_path=path.devnull)
driver = WebDriver(service=service)
driver.close()
You should be using service_log_path; as of today, log_path is deprecated. Example with pytest:
@pytest.mark.unit
@pytest.fixture
def browser(pytestconfig):
    """
    Args:
        pytestconfig (_pytest.config.Config)
    """
    driver_name = pytestconfig.getoption('browser_driver')
    driver = getattr(webdriver, driver_name)
    driver = driver(service_log_path='artifacts/web_driver-%s.log' % driver_name)
    driver.implicitly_wait(10)
    driver.set_window_size(1200, 800)
    yield driver
    driver.quit()
No @hidehara, but I found a way to do it. I looked up the __init__.py file in the SeleniumLibrary directory, in my case: C:\Users\Eigenaardig\AppData\Local\Programs\Python\Lib\site-packages\SeleniumLibrary
There I added these 2 lines:
from selenium import webdriver
driver = webdriver.Firefox(executable_path=r'C:\Users\Eigenaar\eclipse-workspace\test\test\geckodriver.exe', log_path='./Log/geckodriver.log')
and created the directory Log (in Windows Explorer).
Unfortunately, that started 2 instances.
I then added it in a separate library (.py file), which looks like this (for test purposes):
import time
import random
from selenium import webdriver

driver = webdriver.Firefox(executable_path=r'C:\Users\specimen\RobotFrameWorkExperienced\RobotLearn\Log\geckodriver.exe', service_log_path='./Log/geckodriver.log')

class CustomLib:
    ROBOT_LIBRARY_SCOPE = 'RobotLearn'

    num = random.randint(1, 18)
    if num % 2 == 0:
        def get_current_time_as_string(self):
            localtime = time.localtime()
            formatted_time = time.strftime("%Y%m%d%H%M%S", localtime)
            return formatted_time
    else:
        def get_current_time_as_string(self):
            localtime = time.localtime()
            formatted_time = time.strftime("%S%m%d%H%M%Y", localtime)
            return formatted_time

But now it opens up 2 instances: 1 runs correctly, and 1 stays open and does nothing further.
Help, help.
If it all for some reason does not work (which was the case in our case), then go to this (relative) directory:
C:\Users\yourname\AppData\Local\Programs\Python\Python38\Lib\site-packages\SeleniumLibrary\keywords\webdrivertools
There is a file called webdrivertools.py. On line 157 you can edit
service_log_path='./robots/robotsiot/Results/Results/Results/geckoresults', executable_path=executable_path,
Advantages:
1. If you're using something like GitHub and you synchronize a directory, the log files are kept separate.
2. The original file of the previous run gets overwritten (if that is what you want; in some cases that is exactly how you need it).
Note: the section written above is for the case where you're using Firefox; if you are using another browser you'll have to edit a different line.
Note 2: this path overrides at a high level, so arguments in Eclipse -> Robot Framework will not have any effect anymore.
Use this option with caution: it's a sort of last resort if the other options don't work!

How to automate the crawling without hardcoding any number to it?

I've written a script in Python with Selenium to scrape names of restaurants from a webpage. It works great if I hardcode the number of items I want to parse. The page has a lazy-loading process and displays 40 names with each scroll, and my script can handle that. The only thing I would like to improve is that I do not wish to hardcode the number; rather, I want the script to detect by itself how many there are and parse them successfully. Hope there is someone to help. Here is the code:
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get('https://www.yellowpages.ca/search/si/1/pizza/Toronto')

while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    links = [posts.text for posts in driver.find_elements_by_xpath("//div[@itemprop='itemListElement']//h3[@itemprop='name']/a")]
    if len(links) == 240:
        break

for link in links:
    print(link)

driver.quit()
You can check if the number of links has changed in the last iteration:
num_Of_links = -1
num = 0
while num != num_Of_links:
    num_Of_links = num
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    links = [posts.text for posts in driver.find_elements_by_xpath("//div[@itemprop='itemListElement']//h3[@itemprop='name']/a")]
    num = len(links)
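The answer's stop-condition can also be wrapped in a reusable function (a sketch; the XPath is the question's locator, and the function name is mine):

```python
# Sketch: keep scrolling until a scroll adds no new links, then return
# their text. Same count-comparison idea as the answer above.
import time

def collect_all_links(driver, xpath, pause=3):
    prev_count = -1
    links = []
    while len(links) != prev_count:
        prev_count = len(links)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the lazy-loader time to append more results
        links = [post.text for post in driver.find_elements_by_xpath(xpath)]
    return links
```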
