Python & Selenium: How to wait until text is present to continue? - python-3.x

I am trying to automate an extraction of stock prices from my broker's website, because Yahoo and Google Finance have delays. But I need the code to wait for the 'home-broker' to be online so it can continue with scraping...
Here is my code:
expected = 'online'
while True:
    try:
        driver.find_element_by_xpath('//*[@id="spnStatusConexao"]').text == expected
    except NoSuchElementException:
        print('offline')
    else:
        print('online')
But, while testing it, it prints 'online' even when the home-broker displays the 'offline' message.
I need to print 'offline' when the XPath text equals 'offline', and 'online' when it equals 'online'.
EDIT:
Outer HTML:
<span id="spnStatusConexao" hover="DV_bgConexao" class="StatusConexao online">online</span>
XPath:
//*[@id="spnStatusConexao"]
Full XPath:
/html/body/form/div[9]/div/div/p[2]/span

expected_conditions in Python has a built-in condition for this called text_to_be_present_in_element. The snippet below will wait for the span element to contain the text online:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 10).until(EC.text_to_be_present_in_element((By.ID, "spnStatusConexao"), 'online'))
If this does not work, you can try invoking WebDriverWait on the presence_of_element_located and include the text in your XPath query:
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//span[@id='spnStatusConexao' and contains(text(),'online')]")))
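If you also need the offline branch from the original loop, you can catch the TimeoutException that WebDriverWait raises when the text never appears. A minimal sketch building on the first snippet:
from selenium.common.exceptions import TimeoutException
try:
    # Wait up to 10 seconds for the status span to read "online"
    WebDriverWait(driver, 10).until(EC.text_to_be_present_in_element((By.ID, "spnStatusConexao"), 'online'))
    print('online')
except TimeoutException:
    # The text never became "online" within the timeout
    print('offline')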

Related

Scrape Google Maps result website URLs with Selenium

I am trying to search via Google Maps with Python, and I want to get the URLs from the results.
These are the steps I take:
open Google
accept cookies
search for a random term (in this example, "pediatrician in Aargau")
switch to Google Maps
This is where I get the error: I am trying to wait for the results to load, but I always get a timeout, even though I can see in the window that opens that the results are fully loaded.
Is there anything wrong with my code? I would like to extract the website URL of each result.
Here is the code that I have so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Start the browser
driver = webdriver.Chrome()
# Open google.de and accept cookies
driver.get("https://www.google.de/")
wait = WebDriverWait(driver, 25)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#L2AGLb > div"))).click()
# Search for "Kinderarzt Kanton Aargau"
search_box = driver.find_element(By.NAME, "q")
search_box.send_keys("Kinderarzt Kanton Aargau")
search_box.submit()
# Switch to Maps tab
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(text(), 'Maps')]"))).click()
# Wait for links and extract
results = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[aria-label^='Results for'] > div > div > a")))
for result in results:
    link = result.get_attribute("href")
    print(link)
# Close the browser
driver.quit()
PS: I have tried increasing the wait time for the WebDriverWait, but that doesn't help. I think it cannot find the element, and there must be another way to identify the objects.
First, you can skip several steps by just building the Google Maps URL with the desired search string. Second, your "wait for results to load" locator was not on my page; my guess is that the class you were using changes regularly. I used a different CSS selector and found the results just fine.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Start the browser
driver = webdriver.Chrome()
# Declare string to search for and encode it
search_string = "Kinderarzt Kanton Aargau"
partial_url = search_string.replace(" ", "+")
# Open google.de and accept cookies
driver.get(f"https://www.google.de/maps/search/{partial_url}/")
wait = WebDriverWait(driver, 25)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#L2AGLb > div"))).click()
# Wait for links and extract
results = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[aria-label^='Results for'] > div > div > a")))
for result in results:
    link = result.get_attribute("href")
    print(link)
# Close the browser
driver.quit()
The result is
https://www.google.de/maps/place/Dr.+med.+Helena+Gerritsma+Schirlo/data=!4m7!3m6!1s0x47903be8d0d4a09d:0xc97d85a6fa076207!8m2!3d47.3906733!4d8.0443884!16s%2Fg%2F1tghc1gd!19sChIJnaDU0Og7kEcRB2IH-qaFfck?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarztpraxis+Dr.+med.+Armin+B%C3%BChler+%26+Thomas+Justen/data=!4m7!3m6!1s0x479069d7b30c674b:0xd04693e64cbc42b0!8m2!3d47.5804824!4d8.2163541!16s%2Fg%2F1ptw0srs4!19sChIJS2cMs9dpkEcRsEK8TOaTRtA?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarztpraxis+Lenzburg/data=!4m7!3m6!1s0x4790160e650976b1:0x5352d33510a53d99!8m2!3d47.3855278!4d8.1753395!16s%2Fg%2F11hz17jwcy!19sChIJsXYJZQ4WkEcRmT2lEDXTUlM?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarzthaus+-+Kinderarztpraxis/data=!4m7!3m6!1s0x47903bf002633251:0xf029086640b016ee!8m2!3d47.391928!4d8.051698!16s%2Fg%2F11cfdn2j8!19sChIJUTJjAvA7kEcR7hawQGYIKfA?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Dr.+med.+Nils+Hammerich/data=!4m7!3m6!1s0x4790160e650976b1:0x7116ed2cc14996ea!8m2!3d47.3856086!4d8.1753854!16s%2Fg%2F1tl0w7qv!19sChIJsXYJZQ4WkEcR6pZJwSztFnE?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarzt+Berikon/data=!4m7!3m6!1s0x47900e152314a493:0x72ca7fe58b7b3a5f!8m2!3d47.3612625!4d8.3674472!16s%2Fg%2F11c311g_px!19sChIJk6QUIxUOkEcRXzp7i-V_ynI?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Dr.+med.+Hana+Balent+Ilitsch/data=!4m7!3m6!1s0x4790697f95fe3a73:0xaff715a22ab56e78!8m2!3d47.5883105!4d8.2882387!16s%2Fg%2F11hyjwg_32!19sChIJczr-lX9pkEcReG61KqIV968?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Dr.+med.+Belzer+Heierling+Tanja/data=!4m7!3m6!1s0x47906d2a4e9698fd:0x6865ac23234b8dc9!8m2!3d47.4637622!4d8.3284463!16s%2Fg%2F1tksm8d9!19sChIJ_ZiWTiptkEcRyY1LIyOsZWg?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Praxis+f%C3%BCr+Kinder-+und+Jugendmedizin+Dr.+Dirk+Bock/data=!4m7!3m6!1s0x47906b5c9071d861:0x516c763f7642c9ff!8m2!3d47.4731839!4d8.1959905!16s%2Fg%2F11mpc9wm91!19sChIJYdhxkFxrkEcR_8lCdj92bFE?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Alleviamed+Kinderarztpraxis+Meisterschwanden/data=!4m7!3m6!1s0x4790193bdf03b5f1:0xfef98e265772814a!8m2!3d47.2956342!4d8.2279202!16s%2Fg%2F11gr2z_z2f!19sChIJ8bUD3zsZkEcRSoFyVyaO-f4?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarztpraxis+Suhrepark+AG/data=!4m7!3m6!1s0x47903c69ae471281:0xcb34880030319dd7!8m2!3d47.3727496!4d8.0809937!16s%2Fg%2F1v3kl_4v!19sChIJgRJHrmk8kEcR150xMACINMs?authuser=0&hl=en&rclk=1
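A side note on the URL building: replace(" ", "+") is enough for this query, but if the search string can contain other special characters, urllib.parse.quote_plus from the standard library handles the encoding more robustly. A small sketch:
from urllib.parse import quote_plus
search_string = "Kinderarzt Kanton Aargau"
# quote_plus turns spaces into "+" and percent-encodes everything else
url = f"https://www.google.de/maps/search/{quote_plus(search_string)}/"
print(url)  # https://www.google.de/maps/search/Kinderarzt+Kanton+Aargau/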

ActionChains is not updating the data within the table

I am trying to scrape data from this website using Selenium.
There are three features in the data, "Value", "Net change", and "Percent change", including values for net and percentage changes over 1, 3, 6, and 12 months. I want to fetch the 1-month net change and percent change. For that, I need to click on the checkboxes and then on the update button.
I performed these actions using Selenium's find-element-by-XPath method, but for the percent change I needed to use ActionChains, as I was getting an "element not clickable" error.
When I execute the code, all three features should appear in the downloaded CSV, but that's not happening: I am only able to fetch "Value" and "1 Month Net Change". Does anyone know why the table is not getting updated, and how to fix it? Thanks.
My code:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver
from bs4 import BeautifulSoup
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver', chrome_options=chrome_options)
driver.get("https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SA0")
soup = BeautifulSoup(driver.page_source, "html.parser", from_encoding='utf-8')
driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div[4]/div/div[1]/form/div[2]/fieldset/div[1]/table/tbody/tr[1]/td[1]/label/input').click()  # 1 month net change
element = WebDriverWait(driver, 60).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="percent_monthly_changes_div"]/table/tbody/tr[1]/td[1]/label/input')))
ActionChains(driver).move_to_element(element).click().perform()  # 1 month percent change
driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div[4]/div/div[1]/form/div[4]/input').click()  # update button
driver.find_element(By.XPATH, '//*[@id="csvclickCU"]').click()  # download csv button
The website is showing N/A in the column of 1 Month Net change.
If you are still not getting the 1-month % change value, you can do
driver.execute_script('document.querySelector("#percent_monthly_changes_div > table > tbody > tr:nth-child(1) > td:nth-child(1) > label > input").click()')
instead of:
element = WebDriverWait(driver, 60).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="percent_monthly_changes_div"]/table/tbody/tr[1]/td[1]/label/input')))
ActionChains(driver).move_to_element(element).click().perform()
This might not be the optimal solution, but it works fine (a JavaScript click bypasses Selenium's clickability checks, which is why it succeeds where the normal click did not).
As for the 1-month net change, that value is simply not provided by the website itself.
Clicking the elements with text as 1-Month Net Change and 1-Month % Change using ActionChains is an overhead you can easily avoid.
Ideally, you need to induce WebDriverWait for element_to_be_clickable(), and you can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SA0")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[value='1N']"))).click()
driver.find_element(By.CSS_SELECTOR, "input[value='1P']").click()
Using XPATH:
driver.get("https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SA0")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[#value='1N']"))).click()
driver.find_element(By.XPATH, "//input[#value='1P']").click()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
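Put together with the update and download steps (a sketch; the update-button and CSV-button locators are reused from the question and assumed to still be valid):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SA0")
# Tick both checkboxes via their stable value attributes
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[value='1N']"))).click()
driver.find_element(By.CSS_SELECTOR, "input[value='1P']").click()
# Apply the selection, then download the CSV (locators from the question)
driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div[4]/div/div[1]/form/div[4]/input').click()
driver.find_element(By.XPATH, '//*[@id="csvclickCU"]').click()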

How do I wait until the text of an element changes to something other than a given string?

I'm using selenium with Python 3 to retrieve the contents from an element on a webpage. There is an element with dynamic text.
<div id="accResp">Please wait...</div>
The "Please wait..." section changes to different strings like "Password Incorrect", "Loading".
How do I wait until the string changes to anything except "Please wait..."?
Depending on which state your page ends up in, you can wait for each expected text using the XPaths below:
wait = WebDriverWait(driver, 10)
Password Incorrect:
passwordIncorrect = wait.until(EC.element_to_be_clickable((By.XPATH, "//div[contains(text(),'Password Incorrect')]")))
Loading:
loading = wait.until(EC.element_to_be_clickable((By.XPATH, "//div[contains(text(),'Loading')]")))
Please wait...:
waitText = wait.until(EC.element_to_be_clickable((By.XPATH, "//div[contains(text(),'Please wait...')]")))
Note: please add the following imports to your solution:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
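If the goal is literally "anything except Please wait...", note that WebDriverWait.until() accepts any callable, so you can write the condition yourself. A minimal sketch, assuming the accResp id from the question:
def text_changed(driver):
    # Returns the new text (truthy) once it differs from the placeholder
    text = driver.find_element(By.ID, "accResp").text
    return text if text != "Please wait..." else False
new_text = WebDriverWait(driver, 10).until(text_changed)
print(new_text)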

Attribute raises NoSuchElementException in Selenium Python

I can get the elements whose attribute contains X, but I can't get the attribute itself. Why?
This code does not raise any errors:
links = browser.find_elements_by_xpath('//div[@aria-label="Conversations"]//a[contains(@data-href, "https://www.messenger.com/t/")]')
This code raises NoSuchElementException error:
links = browser.find_elements_by_xpath('//div[@aria-label="Conversations"]//a[contains(@data-href, "https://www.messenger.com/t/")]/@data-href')
This code runs on the Messenger main page with the chats. I would like to get all the links in the ul list...
I don't get it... Any help please?
Selenium's find_elements can only return element nodes; an XPath ending in /@data-href selects an attribute node, which is why the second query fails. To get all the data-href values, you need to traverse the link elements and use get_attribute():
links = browser.find_elements_by_xpath('//div[@aria-label="Conversations"]//a[contains(@data-href, "https://www.messenger.com/t/")]')
ul_list=[link.get_attribute("data-href") for link in links]
print(ul_list)
Alternatively, wait for all the matching elements to be present and then read the attribute (the XPath must still select elements, not attributes):
wait = WebDriverWait(driver, 20)
links = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//div[@aria-label="Conversations"]//a[contains(@data-href, "https://www.messenger.com/t/")]')))
for link in links:
    print(link.get_attribute("data-href"))
Note: add the following imports to your solution:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

Contains text in Selenium Python

I am trying to capture an error so that I can restart my program and change the proxy, but I am unable to catch it, because the text is stored like this and the classes are dynamically named:
<p class="g4Vm4">By signing up, you agree to our <a target="_blank" href="https://help.instagram.com/581066165581870">Terms</a> . Learn how we collect, use and share your data in our <a target="_blank" href="https://help.instagram.com/519522125107875">Data Policy</a> and how we use cookies and similar technology in our <a target="_blank" href="/legal/cookies/">Cookies Policy</a> .</p>
So I am trying to match the element by XPath with this function, but I am unable to do so.
def has_error(browser):
    try:
        browser.find_element_by_xpath("/html/body//*[contains(text(),'technology')]")
        return False
    except:
        return True

if not has_error(browser):
    print('Error found! , aborted!')
    browser.quit()
    os.execv(sys.executable, ['python'] + sys.argv)
To handle the dynamic element, use WebDriverWait and the following XPath strategy:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
element=WebDriverWait(driver,30).until(expected_conditions.element_to_be_clickable((By.XPATH,'//p[contains(.,"technology")]')))
print(element.text)
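Wired into the has_error() helper from the question, it could look like this (a sketch; it keeps the original semantics of returning True when the text is not found):
from selenium.common.exceptions import TimeoutException
def has_error(browser):
    try:
        # Wait for the paragraph containing "technology" to appear
        WebDriverWait(browser, 30).until(expected_conditions.element_to_be_clickable((By.XPATH, '//p[contains(.,"technology")]')))
        return False
    except TimeoutException:
        return True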
Alternatively, you can check whether the page source contains the specific text:
if 'By signing up, you agree to our ' in browser.page_source:
    pass
    # TODO Exception
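A sketch of how that check could plug into the restart logic from the question, assuming (as in the snippet above) that the presence of the text signals the error state:
import os
import sys
if 'By signing up, you agree to our ' in browser.page_source:
    print('Error found! , aborted!')
    browser.quit()
    # Restart the script with the same arguments
    os.execv(sys.executable, ['python'] + sys.argv)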
