Scrape Google Maps results website URLs with Selenium (python-3.x)

I am trying to search with Python via Google Maps, and I want to get the website URLs from the results.
These are the steps I take:
open Google
accept cookies
search for something (in this example "pediatrician in Aargau")
switch to Google Maps
The last step is where I get the error: I wait for the results to load, but I always get a timeout, even though I can see in the window that opens that the results are fully loaded.
Is there anything wrong with my code? I would like to extract the website URL of each result.
Here is the code that I have so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Start the browser
driver = webdriver.Chrome()
# Open google.de and accept cookies
driver.get("https://www.google.de/")
wait = WebDriverWait(driver, 25)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#L2AGLb > div"))).click()
# Search for "Kinderarzt Kanton Aargau"
search_box = driver.find_element(By.NAME, "q")
search_box.send_keys("Kinderarzt Kanton Aargau")
search_box.submit()
# Switch to Maps tab
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(text(), 'Maps')]"))).click()
# Wait for links and extract
results = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[aria-label^='Results for'] > div > div > a")))
for result in results:
    link = result.get_attribute("href")
    print(link)
# Close the browser
driver.quit()
PS: I have tried increasing the WebDriverWait timeout, but that doesn't help. I think the driver cannot find the elements, and there must be another way to identify them.

First, you can skip several steps by building the Google Maps URL directly from the desired search string. Second, your "wait for results to load" locator did not match anything on my page; my guess is that the class you were using changes regularly. I used a different CSS selector and found the results just fine.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Start the browser
driver = webdriver.Chrome()
# Declare string to search for and encode it
search_string = "Kinderarzt Kanton Aargau"
partial_url = search_string.replace(" ", "+")
# Open google.de and accept cookies
driver.get(f"https://www.google.de/maps/search/{partial_url}/")
wait = WebDriverWait(driver, 25)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#L2AGLb > div"))).click()
# Wait for links and extract
results = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[aria-label^='Results for'] > div > div > a")))
for result in results:
    link = result.get_attribute("href")
    print(link)
# Close the browser
driver.quit()
The result is:
https://www.google.de/maps/place/Dr.+med.+Helena+Gerritsma+Schirlo/data=!4m7!3m6!1s0x47903be8d0d4a09d:0xc97d85a6fa076207!8m2!3d47.3906733!4d8.0443884!16s%2Fg%2F1tghc1gd!19sChIJnaDU0Og7kEcRB2IH-qaFfck?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarztpraxis+Dr.+med.+Armin+B%C3%BChler+%26+Thomas+Justen/data=!4m7!3m6!1s0x479069d7b30c674b:0xd04693e64cbc42b0!8m2!3d47.5804824!4d8.2163541!16s%2Fg%2F1ptw0srs4!19sChIJS2cMs9dpkEcRsEK8TOaTRtA?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarztpraxis+Lenzburg/data=!4m7!3m6!1s0x4790160e650976b1:0x5352d33510a53d99!8m2!3d47.3855278!4d8.1753395!16s%2Fg%2F11hz17jwcy!19sChIJsXYJZQ4WkEcRmT2lEDXTUlM?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarzthaus+-+Kinderarztpraxis/data=!4m7!3m6!1s0x47903bf002633251:0xf029086640b016ee!8m2!3d47.391928!4d8.051698!16s%2Fg%2F11cfdn2j8!19sChIJUTJjAvA7kEcR7hawQGYIKfA?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Dr.+med.+Nils+Hammerich/data=!4m7!3m6!1s0x4790160e650976b1:0x7116ed2cc14996ea!8m2!3d47.3856086!4d8.1753854!16s%2Fg%2F1tl0w7qv!19sChIJsXYJZQ4WkEcR6pZJwSztFnE?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarzt+Berikon/data=!4m7!3m6!1s0x47900e152314a493:0x72ca7fe58b7b3a5f!8m2!3d47.3612625!4d8.3674472!16s%2Fg%2F11c311g_px!19sChIJk6QUIxUOkEcRXzp7i-V_ynI?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Dr.+med.+Hana+Balent+Ilitsch/data=!4m7!3m6!1s0x4790697f95fe3a73:0xaff715a22ab56e78!8m2!3d47.5883105!4d8.2882387!16s%2Fg%2F11hyjwg_32!19sChIJczr-lX9pkEcReG61KqIV968?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Dr.+med.+Belzer+Heierling+Tanja/data=!4m7!3m6!1s0x47906d2a4e9698fd:0x6865ac23234b8dc9!8m2!3d47.4637622!4d8.3284463!16s%2Fg%2F1tksm8d9!19sChIJ_ZiWTiptkEcRyY1LIyOsZWg?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Praxis+f%C3%BCr+Kinder-+und+Jugendmedizin+Dr.+Dirk+Bock/data=!4m7!3m6!1s0x47906b5c9071d861:0x516c763f7642c9ff!8m2!3d47.4731839!4d8.1959905!16s%2Fg%2F11mpc9wm91!19sChIJYdhxkFxrkEcR_8lCdj92bFE?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Alleviamed+Kinderarztpraxis+Meisterschwanden/data=!4m7!3m6!1s0x4790193bdf03b5f1:0xfef98e265772814a!8m2!3d47.2956342!4d8.2279202!16s%2Fg%2F11gr2z_z2f!19sChIJ8bUD3zsZkEcRSoFyVyaO-f4?authuser=0&hl=en&rclk=1
https://www.google.de/maps/place/Kinderarztpraxis+Suhrepark+AG/data=!4m7!3m6!1s0x47903c69ae471281:0xcb34880030319dd7!8m2!3d47.3727496!4d8.0809937!16s%2Fg%2F1v3kl_4v!19sChIJgRJHrmk8kEcR150xMACINMs?authuser=0&hl=en&rclk=1
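One caveat: replacing spaces with + works for this particular search string, but it breaks on terms containing characters such as & or umlauts that need percent-encoding. A safer variant (a sketch using the standard library's urllib.parse.quote_plus):

from urllib.parse import quote_plus
from selenium import webdriver

driver = webdriver.Chrome()
search_string = "Kinderarzt Kanton Aargau"
# quote_plus percent-encodes special characters (umlauts, "&", ...) and turns spaces into "+"
driver.get(f"https://www.google.de/maps/search/{quote_plus(search_string)}/")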


ActionChains is not updating the data within the table

I am trying to scrape data from this website using Selenium.
There are three features in the data, "Value", "Net change" and "Percent change", including values for net and percent changes over 1, 3, 6, and 12 months. I want to fetch the 1-month net change and percent change. For that, I need to click on the corresponding checkboxes and then click the update button.
I performed these actions using Selenium's find-element-by-XPath method, but for the percent change I needed to use an ActionChains command, as I was getting an "Element not clickable" error.
When I execute the code, all three features should appear in the downloaded CSV, but that's not happening: I am only able to fetch "Value" and "1 Month Net Change". Does anyone know why the data is not getting updated and how to fix it? Thanks
My code:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver
from bs4 import BeautifulSoup
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver', options=chrome_options)
driver.get("https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SA0")
soup = BeautifulSoup(driver.page_source, "html.parser", from_encoding='utf-8')
# 1 month net change
driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div[4]/div/div[1]/form/div[2]/fieldset/div[1]/table/tbody/tr[1]/td[1]/label/input').click()
# 1 month percent change
element = WebDriverWait(driver, 60).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="percent_monthly_changes_div"]/table/tbody/tr[1]/td[1]/label/input')))
ActionChains(driver).move_to_element(element).click().perform()
# update button
driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div[4]/div/div[1]/form/div[4]/input').click()
# download csv button
driver.find_element(By.XPATH, '//*[@id="csvclickCU"]').click()
The website is showing N/A in the 1 Month Net Change column.
If you are still not getting the 1-month % change value, you can do
driver.execute_script('document.querySelector("#percent_monthly_changes_div > table > tbody > tr:nth-child(1) > td:nth-child(1) > label > input").click()')
instead of:
element = WebDriverWait(driver, 60).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="percent_monthly_changes_div"]/table/tbody/tr[1]/td[1]/label/input')))
ActionChains(driver).move_to_element(element).click().perform()
This might not be the optimal solution, but it works fine.
As for the 1-month net change: that value is not provided by the website itself.
Clicking the elements with text 1-Month Net Change and 1-Month % Change using ActionChains is an overhead you can easily avoid.
Ideally, you need to induce WebDriverWait for element_to_be_clickable(), and you can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SA0")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[value='1N']"))).click()
driver.find_element(By.CSS_SELECTOR, "input[value='1P']").click()
Using XPATH:
driver.get("https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SA0")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@value='1N']"))).click()
driver.find_element(By.XPATH, "//input[@value='1P']").click()
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
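For completeness, a sketch of the whole flow with these value-based locators; the update-button and CSV-button XPaths are copied from the question and may change:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SA0")
wait = WebDriverWait(driver, 20)
# tick both checkboxes via their value attributes
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[value='1N']"))).click()
driver.find_element(By.CSS_SELECTOR, "input[value='1P']").click()
# update button (absolute XPath from the question)
driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div[4]/div/div[1]/form/div[4]/input').click()
# download csv button (XPath from the question)
driver.find_element(By.XPATH, '//*[@id="csvclickCU"]').click()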

Selenium (Python) not finding dynamically loaded JavaScript table after automated login occurs

I'm using Selenium with Python 3 on a ServiceNow website.
The process is as follows: Selenium loads the ServiceNow URL, then I use send_keys to automate typing in the username and password, and then the page loads with a table of incidents that I need to extract. Unfortunately, I have to log in every single time because of the group policy I have.
This works up until I have to find the dynamically rendered JavaScript table with the data, and I can't for the life of me seem to find it. I even tried putting in a sleep for 15 seconds to allow it to load.
I also double-checked the XPaths and ID/class names, and they match up. When I print query.page_source I don't see anything rendered by JS.
I've used Beautiful Soup too, but that also doesn't work.
Any ideas?
from time import sleep
from collections import deque
from selenium import webdriver
from selenium.webdriver.support.ui import Select # for <SELECT> HTML form
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
query = webdriver.Firefox()
get_query = query.get("SERVICENOW URL")
query.implicitly_wait(10)
login_username = query.find_element_by_id('username')
login_password = query.find_element_by_id('password')
login_button = query.find_element_by_id('signOnButton')
username = "myUsername"
password = "myPassword"
login_username.send_keys(username)
login_password.send_keys(password)
login_button.click()
sleep(10)
incidentTableData = []
print(query.page_source)
# *** THESE ALL FAIL AND RETURN NONE ***
print(query.find_elements())
tableById = query.find_element_by_id('service-now-table-id')
tableByXPath = query.find_element_by_xpath('service-now-xpath')
tableByClass = query.find_element_by_id('service-now-table-class')
Since it's a dynamically rendered JavaScript table, I would suggest implementing an explicit wait in your code.
So instead of this:
tableById = query.find_element_by_id('service-now-table-id')
tableByXPath = query.find_element_by_xpath('service-now-xpath')
tableByClass = query.find_element_by_id('service-now-table-class')
rewrite those lines like this:
wait = WebDriverWait(query, 10)
service_now_with_id = wait.until(EC.element_to_be_clickable((By.ID, "service-now-table-id")))
service_now_with_xpath = wait.until(EC.element_to_be_clickable((By.XPATH, "service-now-xpath")))
service_now_with_class = wait.until(EC.element_to_be_clickable((By.ID, "service-now-table-class")))
You are going to need the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
PS: service_now_with_id, service_now_with_xpath, and service_now_with_class are the web elements returned by the explicit waits. You can then interact with them as per your requirement: clicking them, sending keys, or whatever you need.
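For example, once the wait returns the table element, a sketch (assuming the placeholder service-now-table-id from above and a standard HTML table layout) of pulling the incident rows into a list:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

query = webdriver.Firefox()
query.get("SERVICENOW URL")
# ... log in as in the question ...
wait = WebDriverWait(query, 10)
table = wait.until(EC.presence_of_element_located((By.ID, "service-now-table-id")))
incidentTableData = []
for row in table.find_elements(By.TAG_NAME, "tr"):
    cells = [cell.text for cell in row.find_elements(By.TAG_NAME, "td")]
    if cells:  # skips the header row, which uses <th> cells instead of <td>
        incidentTableData.append(cells)
print(incidentTableData)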

Scrape Product Image with BeautifulSoup (Error)

I need your help. I'm working on a Telegram bot which sends me all the sales from Amazon.
It works well, but this function doesn't work properly; I always get the same error, which blocks the script:
imgs_str = img_div.img.get('data-a-dynamic-image')  # a string in JSON format
AttributeError: 'NoneType' object has no attribute 'img'
import json

def take_image(soup):
    img_div = soup.find(id="imgTagWrapperId")
    imgs_str = img_div.img.get('data-a-dynamic-image')  # a string in JSON format
    # convert to a dictionary
    imgs_dict = json.loads(imgs_str)
    # each key in the dictionary is the link of an image, and the value shows its size (print the whole dictionary to inspect)
    num_element = 0
    first_link = list(imgs_dict.keys())[num_element]
    return first_link
I still don't understand how to solve this issue.
Thanks to all!
From the looks of the error, soup.find didn't find anything.
Have you tried using images = soup.findAll("img", {"id": "imgTagWrapperId"})?
This will return a list.
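Either way, the root cause is that the lookup returned None. A defensive variant of the function (a sketch) returns None instead of raising, so the bot can skip that product:

import json

def take_image(soup):
    img_div = soup.find(id="imgTagWrapperId")
    if img_div is None or img_div.img is None:
        # element missing: different page layout, captcha/robot page, etc.
        return None
    imgs_str = img_div.img.get('data-a-dynamic-image')
    if not imgs_str:
        return None
    imgs_dict = json.loads(imgs_str)  # maps image URL -> [width, height]
    return next(iter(imgs_dict), None)  # first image URL, or None if empty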
Images are not embedded in the HTML page, they are linked from it, so you need to wait until the image has loaded. I will give you two options here:
1) (not recommended, since there is a margin of error) simply wait a fixed time until the image is loaded; for this you can use time.sleep().
2) (recommended) I would rather use Selenium WebDriver. You also have to wait when you use Selenium, but the good thing is that Selenium has a dedicated mechanism for this job.
Here is how to do it with Selenium:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
browser = webdriver.Chrome()
browser.get("url")
delay = 3  # seconds
try:
    # wait for the element you want to find
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'imgTagWrapperId')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")
More documentation
Code example for way 1
Q/A for way 2

How to get the whole source code from a web page with selenium / webdriver?

I usually use this Python program successfully for web scraping.
It gives me not only the page's source code but also the code which is hidden behind JavaScript.
However, it does not work as desired on this particular website: information is missing.
It does not seem to be a timing problem.
from selenium import webdriver
url = "https://www.youbet.dk/sport/fodbold/"
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(executable_path='D:/Programme/chromedriver_win32/chromedriver.exe',options=options)
driver.get(url)
After execution, driver.page_source contains the code.
I am interested in the text on the buttons (team name and a number).
Right-clicking and inspecting a button in Chrome gives me something like the following code which contains the information I am looking for (here "Villarreal" and "1.51"):
<button class="rj-ev-list__bet-btn rj-ev-list__selection-0ML54283820_1" data-uat="button-ev-list-bet-btn"><div class="rj-ev-list__bet-btn__inner " data-uat="div-ev-list-bet-btn-inner"><div class="rj-ev-list__bet-btn__row" data-uat="div-ev-list-bet-btn-row"><span class="rj-ev-list__bet-btn__content rj-ev-list__bet-btn__text" data-uat="ev-list-ev-list-bet-btn-text">Villarreal</span></div><div class="rj-ev-list__bet-btn__row" data-uat="div-ev-list-bet-btn-row"><span class="rj-ev-list__bet-btn__content rj-ev-list__bet-btn__odd" data-uat="ev-list-ev-list-bet-btn-odd">1.51</span></div></div><span class="rj-ev-list__bet-btn__arrow-up"></span><span class="rj-ev-list__bet-btn__arrow-down"></span></button>
But this does not show up in driver.page_source.
How can I access this information using python and selenium?
These did not help:
* Adding time.sleep(10)
* Adding driver.implicitly_wait(10)
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
url = "https://www.youbet.dk/sport/fodbold/"
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=options)
driver.get(url)
WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'rj-ev-list__bet-btn__inner')))
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
mydivs = soup.find_all("button", {"class": "rj-ev-list__bet-btn"})
alldata = [[div.find("span", {"class": "rj-ev-list__bet-btn__content"}).text,
div.find("span", {"class": "rj-ev-list__bet-btn__odd"}).text] for div in mydivs]
print(alldata)
driver.quit()
# [['Fulham', '2.55'], ['Uafgjort', '3.40'], ['Leeds', '2.80'], ['Real Betis', '1.83'], ['Uafgjort', '3.65'], ['Levante', '4.55'], ['Parma', '2.40'], ['Uafgjort', '3.10'], ['Genoa', '3.35']]
The problem with your approach:
You were close. The issue with the delays you added to your code was that they didn't relate directly to the visibility of the element (and maybe ten seconds of wait time wasn't enough). To combat the issue, this code uses the more specific WebDriverWait (additional resource: https://www.geeksforgeeks.org/explicit-waits-in-selenium-python/)
WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located(
(By.CLASS_NAME, 'rj-ev-list__bet-btn__inner')))
to wait for the presence of all the elements in the class. This solution worked for me. Tell me in the comments if you have any issues.
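As a side note, once the wait has succeeded, the same data can be pulled without BeautifulSoup, staying entirely in Selenium; a sketch continuing from the driver set up above, using the class names from the button HTML in the question:

buttons = driver.find_elements(By.CSS_SELECTOR, "button.rj-ev-list__bet-btn")
alldata = []
for btn in buttons:
    # the first span carries the team name, the second the odds
    name = btn.find_element(By.CSS_SELECTOR, "span.rj-ev-list__bet-btn__text").text
    odd = btn.find_element(By.CSS_SELECTOR, "span.rj-ev-list__bet-btn__odd").text
    alldata.append([name, odd])
print(alldata)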

Can't locate elements from a website using selenium

Trying to scrape data from a business directory, but I keep getting "data was not found":
name = driver.find_elements_by_xpath('/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text
# Results in: IndexError: list index out of range
So I tried to use WebDriverWait to make the code wait for the data to load, but it doesn't find the elements, even though the data gets loaded on the website.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
from bs4 import BeautifulSoup
import requests
import time
url='https://www.dmcc.ae/business-search?directory=1&submissionGuid=2c8df029-a92e-4b5d-a014-7ef9948e664b'
driver = webdriver.Firefox()
driver.get(url)
wait=WebDriverWait(driver,50)
wait.until(EC.visibility_of_element_located((By.CLASS_NAME,'searched-list ng-scope')))
name = driver.find_elements_by_xpath('/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text
print(name)
driver.switch_to.frame(driver.find_element_by_css_selector("#pym-0 iframe"))
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.searched-list.ng-scope')))
name = driver.find_elements_by_xpath('/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text
It's inside an iframe; to interact with an element inside an iframe, switch to it first. Here the iframe doesn't have any unique identifier, so we used the parent div, which has a unique id, as a reference, and from that we found the child iframe.
Now, if you want to interact with elements outside the iframe, use:
driver.switch_to.default_content()
<iframe src="https://dmcc.secure.force.com/Business_directory_Page?initialWidth=987&childId=pym-0&parentTitle=List%20of%20Companies%20Registered%20in%20Dubai%2C%20DMCC%20Free%20Zone&parentUrl=https%3A%2F%2Fwww.dmcc.ae%2Fbusiness-search%3Fdirectory%3D1%26submissionGuid%3D2c8df029-a92e-4b5d-a014-7ef9948e664b" width="100%" scrolling="no" marginheight="0" frameborder="0" height="3657px"></iframe>
Switch to iframe and handle the accept button.
driver.get('https://www.dmcc.ae/business-search?directory=1&submissionGuid=2c8df029-a92e-4b5d-a014-7ef9948e664b')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#hs-eu-confirmation-button"))).click()
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,'#pym-0 > iframe')))
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,'.searched-list.ng-scope')))
name = driver.find_elements_by_xpath('//*[@id="directory_list"]/div/div/div/div[1]/h4')[0]
print(name.text)
Output:
1 BOXOFFICE DMCC
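If you want every company name on the page rather than just the first, the same locator extends naturally (a sketch):

names = driver.find_elements_by_xpath('//*[@id="directory_list"]/div/div/div/div[1]/h4')
for name in names:
    print(name.text)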
