import os
import selenium
from selenium import webdriver
import time
browser = webdriver.Chrome()
browser.get('https://www.skysports.com/champions-league-fixtures')
time.sleep(7) #So page loads completely
teamnames = browser.find_element_by_tag("span")
print(teamnames.text)
It seems the find_element attribute has changed in Selenium :/
I also want to find all <img> tags (their image URLs) on another local website; I'd appreciate help with that too.
Replace teamnames = browser.find_element_by_tag("span")
with teamnames = browser.find_element_by_tag_name("span")
Also, try find_elements instead of find_element, because tags almost always occur multiple times in the DOM.
Example:
browser.find_elements_by_tag_name('span')
Also, note that it returns a list of elements, which you need to traverse to access each element's properties.
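For example, a minimal sketch (using the same legacy API as above) that prints the text of every span on the page:
# Legacy (pre-Selenium-4) API: find_elements_by_tag_name returns a list
spans = browser.find_elements_by_tag_name('span')
for span in spans:
    if span.text.strip():  # skip spans with no visible text
        print(span.text)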
It seems Selenium made some changes in the new version (the find_element_by_* helpers are deprecated in Selenium 4 in favor of By locators):
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
browser.get('url')
browser.find_element(by=By.CSS_SELECTOR, value='')
You can also use: By.ID, By.NAME, By.XPATH, By.LINK_TEXT, By.PARTIAL_LINK_TEXT, By.TAG_NAME, By.CLASS_NAME, By.CSS_SELECTOR
I used these with Python 3.10 and now it's working just fine.
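As a sketch of how this applies to the original question (the span texts plus the <img> URLs that were asked about; the 7-second sleep is kept from the original code and is just a crude wait):
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get('https://www.skysports.com/champions-league-fixtures')
time.sleep(7)  # crude wait so the page loads completely

# Selenium 4 style: find_elements with an explicit By locator returns a list
for span in browser.find_elements(By.TAG_NAME, 'span'):
    print(span.text)

# Same idea for <img> tags: read each element's src attribute to get the URL
for img in browser.find_elements(By.TAG_NAME, 'img'):
    print(img.get_attribute('src'))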
I want to scrape the comments off this page using BeautifulSoup - https://www.x....s.com/video_id/the-suburl
The comments are loaded on click via JavaScript, and they are paginated, with each page loading more comments on click too. I wish to fetch all comments; for each comment I want the poster's profile URL, the comment text, the no. of likes, the no. of dislikes, and the time posted (as stated on the page).
The comments can be a list of dictionaries.
How do I go about this?
This script will print all comments found on the page:
import json
import requests
from bs4 import BeautifulSoup
url = 'https://www.x......com/video_id/gggjggjj/'
video_id = url.rsplit('/', maxsplit=2)[-2].replace('video', '')
u = 'https://www.x......com/threads/video/ggggjggl/{video_id}/0/0'.format(video_id=video_id)
comments = requests.post(u, data={'load_all':1}).json()
for id_ in comments['posts']['ids']:
    print(comments['posts']['posts'][id_]['date'])
    print(comments['posts']['posts'][id_]['name'])
    print(comments['posts']['posts'][id_]['url'])
    print(BeautifulSoup(comments['posts']['posts'][id_]['message'], 'html.parser').get_text())
    # ...etc.
    print('-' * 80)
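Since you wanted the comments as a list of dictionaries, the same loop can collect them instead of printing. This is a sketch that only uses the JSON keys already shown above; the like/dislike counts aren't among those keys, so inspect the JSON response to find their real names:
all_comments = []
for id_ in comments['posts']['ids']:
    post = comments['posts']['posts'][id_]
    all_comments.append({
        'time_posted': post['date'],
        'poster': post['name'],
        'profile_url': post['url'],
        'comment': BeautifulSoup(post['message'], 'html.parser').get_text(),
        # add likes/dislikes here once you know their key names in the JSON
    })
print(json.dumps(all_comments, indent=2))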
This can be done with Selenium, which drives a real browser. Depending on your preference you can use the Chrome driver (chromedriver) or the Firefox driver, which is the geckodriver.
Here is a link on how to install the chrome webdriver:
http://jonathansoma.com/lede/foundations-2018/classes/selenium/selenium-windows-install/
Then in your code here is how you would set it up:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# this part may change depending on where you installed the webdriver.
# You may have to define the path to the driver.
# For me my driver is in C:/bin so I do not need to define the path
chrome_options = Options()
# use '--start-maximized' instead if you want a visible browser window
chrome_options.add_argument('--headless')
driver = webdriver.Chrome(options=chrome_options)
driver.get(your_url)
html = driver.page_source # gets the rendered HTML from the browser
Selenium has several functions you can use to perform actions, such as clicking elements on the page. Once you find an element with Selenium, you can use the .click() method to interact with it.
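For instance, a sketch of the click-then-parse pattern (the CSS selector below is hypothetical; replace it with the real locator of the page's load-comments button):
from bs4 import BeautifulSoup

# Hypothetical locator: adjust 'button.load-comments' to the real page.
load_btn = driver.find_element_by_css_selector('button.load-comments')
load_btn.click()

# After the click, parse whatever the browser is now showing.
soup = BeautifulSoup(driver.page_source, 'html.parser')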
Let me know if this helps
I'm facing a situation where a modal sometimes shows the desired button for logging in with Facebook, and sometimes I have to click 'More Options' first to get to the Facebook button. Another hurdle is that the XPath of more_options_btn is the same as that of a button called 'Trouble logging in?', so I added [contains(text(),"More Options")], but I'm unsure of the syntax here and cannot find the right approach in the docs. In any case it is throwing an error, which might be because the button element has no direct text; instead it is nested like button > span > text.
Please keep in mind that I want to make my solution as dynamic and robust as possible, but I don't have a lot of experience with Selenium yet.
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('http://localhost:3000/')
try:
    fb_btn = driver.find_element_by_xpath(
        '//*[@id="modal-manager"]/div/div/div/div/div[3]/span/div[2]/button')
    fb_btn.click()
except:
    more_options_btn = driver.find_element_by_xpath(
        '//*[@id="modal-manager"]/div/div/div/div/div[3]/span/button[contains(text(),"More Options")]')
    more_options_btn.click()
    fb_btn = driver.find_element_by_xpath(
        '//*[@id="modal-manager"]/div/div/div/div/div[3]/span/div[2]/button')
    fb_btn.click()
Try changing the XPath to:
//*[@id="modal-manager"]/div/div/div/div/div[3]/span/button[contains(.,"More Options")]
contains(., "More Options") matches the button's entire text content, including text inside child nodes such as your nested span.
You can use the OR operator | in XPath to match multiple conditions. I would also recommend using a relative XPath instead of an absolute one.
Since your XPaths are lengthy, with the OR condition it would look like this:
//*[@id="modal-manager"]/div/div/div/div/div[3]/span/div[2]/button | //*[@id="modal-manager"]/div/div/div/div/div[3]/span/button[contains(text(),"More Options")]
With a combination of the input above I managed to get it to work. I got rid of most of the long XPath, which has the advantage that my code will still work if the order of the buttons changes. It will still break if the text of the buttons changes, so it is a compromise. Here is the result:
from time import sleep
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('http://localhost:3000/')
sleep(5)
try:
    fb_btn = driver.find_element_by_xpath(
        "//*/button[contains(.,'Log in with Facebook')]").click()
except NoSuchElementException:
    more_options_btn = driver.find_element_by_xpath(
        "//*[contains(text(), 'More Options')]")
    more_options_btn.click()
    fb_btn = driver.find_element_by_xpath(
        '//*/button[contains(.,"Log in with Facebook")]')
    fb_btn.click()
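If you want to harden it further, one option (a sketch, not tested against your modal) is to replace the fixed sleep with explicit waits, so the script proceeds as soon as each button is actually clickable:
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(driver, 10)
try:
    # wait up to 10 s for the Facebook button to become clickable
    wait.until(EC.element_to_be_clickable(
        (By.XPATH, "//*/button[contains(.,'Log in with Facebook')]"))).click()
except TimeoutException:
    wait.until(EC.element_to_be_clickable(
        (By.XPATH, "//*[contains(text(), 'More Options')]"))).click()
    wait.until(EC.element_to_be_clickable(
        (By.XPATH, "//*/button[contains(.,'Log in with Facebook')]"))).click()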
I am unable to click on the 'Search photos' button on flickr (image below including the html).
I have tried the following:
sp = browser.find_element_by_partial_link_text('/search/?text=tennis%20shoes')
sp.click()
sp = browser.find_element_by_name('Select photos')
sp.click()
searchPhotos = browser.find_element_by_class_name('Search photos')
searchPhotos.click()
browser.find_element_by_xpath("//class[#name='Search photos']").click()
But none of them seem to work. I am learning how to do this, including how to use xpath, so maybe I am not using it correctly. Any advice to point me in the right direction?
EDIT: full section of code to answer comment below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", '/Users/home/Box/Temp-to delete')
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", 'png/jpg')
browser = webdriver.Firefox(firefox_profile=profile, executable_path='/usr/local/bin/geckodriver')
browser.get('https://www.flickr.com/')
searchBar = browser.find_element_by_css_selector('#search-field')
searchBar.send_keys(searchTerm)
browser.find_element_by_xpath(".//*[#data-track='autosuggestNavigate_searchPhotos']").click()
Using firefox 72.0.2 (64-bit), python3, geckodriver v0.26.0
The path used in your XPath won't work. Try this one: .//*[@data-track='autosuggestNavigate_searchPhotos']
The .// tells Selenium to search anywhere in the DOM. The asterisk (*) makes Selenium look for any element (no matter whether it is a div, li, or any other HTML tag). It then checks which element has the data-track attribute with the value autosuggestNavigate_searchPhotos. Since there is only one element like this, we are fine.
I advise reading more about XPath and practicing a bit; you may start here.
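If you prefer CSS selectors over XPath, the same attribute match can be written like this (equivalent behavior, just different syntax):
browser.find_element_by_css_selector("[data-track='autosuggestNavigate_searchPhotos']").click()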
Solved it! I just had to hit ENTER for the photo results page to show. Here is the single line of code I changed:
searchBar.send_keys(searchTerm, Keys.ENTER)
I've recently been trying to learn Selenium and found a website that just ignores my attempts to find a particular element by ID, name, or XPath. The website is here:
https://www.creditview.pl/PL/Creditview.htm
I am trying to select the first text field, the one labeled Uzytkownik. I've tried to find it using several methods:
from selenium import webdriver
browser = webdriver.Chrome()
site = "https://www.creditview.pl/pl/creditview.htm"
browser.get(site)
login_txt = browser.find_element_by_xpath(r"/html//input[@id='ud_username']")
login_txt2 = browser.find_element_by_id("ud_username")
login_txt3 = browser.find_element_by_name("ud_username")
No matter what I try I keep getting:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element:
as if the element weren't there at all.
I suspected that the little frame containing the field might be an iframe and tried switching to various frames, with no luck. I also tried to check whether the element is somehow hidden from my code. Nothing seems to work, or I am making some newbie mistake and the answer is right in front of me. Finally I was able to select another element on the site and sent several TAB key presses to move the cursor to the desired position, but that feels like cheating.
Can someone please show me how to find the element? I literally can't sleep because of this issue :)
Given that your element is there, you still need to wait for your element to be loaded/visible/clickable etc. You can do that using selenium's expected conditions (EC).
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
my_XPATH = r"/html//input[@id='ud_username']"
wait_time = 10 # Define maximum time to wait in seconds
driver = webdriver.Chrome()
site = "https://www.creditview.pl/pl/creditview.htm"
driver.get(site)
try:
    my_element = WebDriverWait(driver, wait_time).until(
        EC.presence_of_element_located((By.XPATH, my_XPATH)))
except TimeoutException:
    print("element not found after %d seconds" % wait_time)
I am trying to locate a search box with the id search2 on a website. I have been able to log in to the website successfully using the code below.
import requests
from tqdm import tqdm
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
options = webdriver.ChromeOptions()
tgt = "C:\\mypath"
profile = {"plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],
           "download.default_directory": tgt}
# set the prefs before creating the driver so they actually take effect
options.add_experimental_option("prefs", profile)
driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe', chrome_options=options)
driver.implicitly_wait(30)
print(options)
driver.get("http://mylink.com/")
user=driver.find_element_by_id("username")
passw=driver.find_element_by_id("password")
user.send_keys("abc@xyz.com")
passw.send_keys("Pwd")
driver.find_element_by_xpath('/html/body/div[2]/div/div/div[2]/form/div[3]/button').click()
page=driver.find_element_by_id("search2")
print(page)
The code works perfectly up to this point, but the moment I add the line below I get an error:
page.send_keys("abc")
The error that I get is as below.
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
What I am trying to do here is log in to the website, search for some items, and download the results. I have already tried the implicit wait option, as shown in the code. Any help would be highly appreciated.
Adding the piece of code below did the trick: make the current thread sleep while the page finishes loading (remember to import time).
import time
time.sleep(5)
Thanks everyone.
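For reference, an explicit wait achieves the same thing without a hard-coded delay. A sketch that re-locates the search box after the login click (same search2 id as above), which also avoids the stale-element error because the element is looked up fresh:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# Re-locate the search box after the post-login page rebuild;
# waiting for clickability replaces the fixed 5-second sleep.
page = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, 'search2')))
page.send_keys("abc")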