I've been trying to scrape a website for about 2 days, but scrolling down to load more elements is the problem. I've tried almost every JavaScript snippet on Stack Overflow for this, but none of them worked.
For example:
window.scrollTo(1, 1000)
window.scrollTo(0, document.body.scrollHeight);
arguments[0].scrollIntoView(true);
I even used this article, but it didn't work either.
I checked the network tab to see if I could find an API I could hit with requests to get more elements, but I couldn't find one.
All I want is to load more elements, so is there a way to do that?
Try this approach to scroll the page down:
from selenium.webdriver.common.keys import Keys
driver.find_element('xpath', '//body').send_keys(Keys.END)
For me the following worked (move_to_element):
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
# locate the element you want to scroll to
your_link = driver.find_element(By.XPATH, "//*[contains(@class, 'ClassName')]")
action = ActionChains(driver)
action.move_to_element(your_link).perform()
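If the page keeps loading new items as you scroll (infinite scroll), another common pattern is to scroll to the bottom in a loop until the page height stops growing. This is a minimal sketch, assuming driver is the WebDriver instance from the snippets above and "div.item" is just a placeholder for whatever elements you actually want:
import time
from selenium.webdriver.common.by import By

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # scroll to the bottom and give the page time to load new items
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # nothing new was loaded, stop scrolling
    last_height = new_height

# "div.item" is a placeholder selector for the elements you want to collect
items = driver.find_elements(By.CSS_SELECTOR, "div.item")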
Currently I'm trying to scrape something from a website. For that I need the content of an email, so I use yopmail (https://yopmail.com).
In yopmail the mails are listed on the left side of the screen with the mail subject under each one. This text is the part I need.
[the mail view][1]
[the devtools code][2]
The problem is that this code is not available in the page source. From what I read online this can be caused by JavaScript-generated content, although I'm not sure that is exactly the problem.
I've tried multiple solutions:
attempt 1:
using BeautifulSoup to locate the element (failed because it is not in the page source)
attempt 2:
tried to locate the element by XPath with the Selenium driver (also unable to find it)
attempt 3:
got the inner HTML of the body (the element is still not in that HTML)
driver.find_element_by_tag_name('body').get_attribute('innerHTML')
It feels like nothing works, and the other related posts here don't give me an answer that helps. Is there anyone who can help me with this?
[1]: https://i.stack.imgur.com/vTi0s.png
[2]: https://i.stack.imgur.com/nmBZ8.png
It seems like the element you are trying to get is inside an iframe; that's why you are not able to locate it. So first you have to switch to the iframe by using:
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID, 'ifinbox')))
element = driver.find_element(By.XPATH, "//div[@class='lms']")
print(element.text)
When you are done you can switch back to default content by using
driver.switch_to.default_content()
NOTE: You need to import the following
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
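Putting it together, a minimal end-to-end sketch might look like this; the 'ifinbox' frame id and 'lms' class come from the answer above and may change if yopmail updates its markup:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://yopmail.com")
# ... open the inbox you want to read first ...

# wait for the inbox iframe and switch into it
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID, 'ifinbox')))

# print the subject text of every mail in the list
for mail in driver.find_elements(By.XPATH, "//div[@class='lms']"):
    print(mail.text)

# switch back to the main document when done
driver.switch_to.default_content()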
Newbie here. I've been reading the site for a while, as I'm still new to coding, but I'm hoping you can help.
I've worked my way through some tutorials/worked examples on web scraping and am looking at the website http://enfa.co.uk/
I am trying to open an instance of Chrome using chromedriver with selenium in Python and click on one of the sidebar options on the website called 'Clubs' - located on the left of the homepage.
I've navigated to the element that needs to be clicked and taken the xpath to use in my code (simple use of 'inspect' in the Chrome dev tools when hovering over the 'Clubs' link, then copying the xpath). My code opens chrome fine (so no issues with Chromedriver and that part of the project) but I receive an error telling me the object has no click attribute.
I've tried printing the object, and it shows that my list has no elements (which seems to be the problem), but I'm unsure why... Am I using the incorrect xpath, or do some websites react differently, i.e. won't respond to a click request like this?
I have run my code on other sites to check that I'm using the click function correctly and it seems to work OK, so I'm a little stumped by this one. Any help would be great!
Code:
chromedriver = "/Users/philthomas/Desktop/web/chromedriver"
driver = webdriver.Chrome(chromedriver)
driver.get("http:enfa.co.uk")
button = driver.find_elements_by_xpath("/html/body/table/tbody/tr[5]/td")
button.click()
Traceback (most recent call last):
File "sel.py", line 9, in
button.click()
AttributeError: 'list' object has no attribute 'click'
HTML of link I am trying to click
find_elements_by_xpath returns a list of all web elements that match the criteria. You have to use find_element_by_xpath instead of find_elements_by_xpath.
Also, an iframe is present on your page, so you need to switch to that frame before you perform any action on it. Kindly find below the solution, which is working fine for me.
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path=r"C:\New folder\chromedriver.exe")
driver.maximize_window()
driver.get("http:enfa.co.uk")
driver.switch_to.frame(2);
ClubsLink=WebDriverWait(driver, 20).until(
EC.presence_of_element_located((By.XPATH, "//span[contains(text(),'Clubs')]")))
ClubsLink.click()
Output after clicking the Clubs link:
Helpful link for locating elements: https://selenium-python.readthedocs.io/locating-elements.html
Change button = driver.find_elements_by_xpath("/html/body/table/tbody/tr[5]/td") to button = driver.find_element_by_xpath("/html/body/table/tbody/tr[5]/td")
driver.find_elements_by_xpath returns a collection of elements, not a single one, hence you cannot click it.
You can find some details and examples here.
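For newer Selenium versions (4+), where the find_element_by_* helpers have been removed, the same fix might look like the sketch below; the frame index and XPath come from the answers above:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://enfa.co.uk")

# the Clubs link lives inside a frame, so switch into it first
driver.switch_to.frame(2)

# find_element (singular) returns one clickable element instead of a list
clubs_link = WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.XPATH, "//span[contains(text(),'Clubs')]")))
clubs_link.click()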
I can't find the element I need in order to tell Selenium that I want it to click it. I believe it is because the page is generated by JavaScript.
Can someone please help? Maybe show me a way to do it and then explain how you found the element?
The website I'm working on is www.howlongtobeat.com.
I want selenium to do the following:
go to http://www.howlongtobeat.com => click on the search tab => enter "God of War (2018)" => click the link that pops up
this is the code I have so far:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from requests import get
from bs4 import BeautifulSoup
url = "http://www.howlongtobeat.com"
driver = webdriver.Chrome()
driver.get(url)
search_element = driver.find_element_by_name("global_search_box")
search_element.clear()
search_element.send_keys("God of War (2018)")
search_element.send_keys(Keys.RETURN)
# this is where my issue is, I don't know what element it is or how to find it
link = driver.find_element_by_link_text("input")
link.click()
It's just the last step I need help with.
Can someone advise?
@Ankur Singh's solution works fine. You can also use a CSS selector to do the same clicking (I generally prefer CSS selectors):
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "h3.shadow_text")))
element1 = driver.find_element_by_css_selector('h3.shadow_text > a')
element1.click()
time.sleep(3)
driver.quit()
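For completeness, this snippet assumes the usual imports are already in place:
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC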
You can use the code below to click on the link:
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "search_list_image")))
link = driver.find_element_by_link_text("God of War (2018)")
link.click()
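Putting the pieces together, an end-to-end sketch of the flow (search, wait for results, click the matching link) might look like this; the element names come from the question and answers above, and the site's current markup may differ:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://www.howlongtobeat.com")

# type the query into the search box and submit it
search_box = driver.find_element(By.NAME, "global_search_box")
search_box.clear()
search_box.send_keys("God of War (2018)")
search_box.send_keys(Keys.RETURN)

# wait for the results list, then click the matching link
link = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.LINK_TEXT, "God of War (2018)")))
link.click()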
I have entered parameters in a webpage and am applying them by clicking the apply button.
1. The apply button gets clicked for web elements outside the frame using the command below:
browser1.find_element_by_name("action").click()
2. The apply button does not get clicked when saving the parameters inside a frame of the web page using the same command:
browser1.find_element_by_name("action").click()
You need to switch to the iframe.
First you need to find the iframe, then switch to it, then click:
driver.switch_to_frame(driver.find_element_by_tag_name("iframe"))
Or you could use XPath to locate the iframe:
driver.find_element_by_xpath("//iframe[]")
I do not know the Python syntax, but in Selenium you have to switch to the frame much like you would switch to a new tab or window before performing your click.
The syntax in Java would be: driver.switchTo().frame();
As mentioned by the others:
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[]"))) #elements are in an iframe, have to select it first
If there is a chance that it doesn't show up immediately you might want to build in this wait function.
This tells Python to wait for a maximum of 10 seconds until the frame is available, and then switches to it right away. I use XPath to track down the iframe.
I don't know how new you are, so I've provided the imports below:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
Credit for this solution goes to @Andersson:
Finding xpaths on pages running script
Upvote him there.
I couldn't find the duplicate report button. The title of this post makes it easier to find than the above-mentioned question. (Full disclosure: the link goes to one of my own questions.)
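Putting the pieces from these answers together, a minimal sketch for the apply-button case might look like this; the iframe locator here is a generic placeholder you would adjust for your page, and browser1 is the driver from the question:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait for the iframe and switch into it (locator is a placeholder)
WebDriverWait(browser1, 10).until(
    EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))

# now the apply button inside the frame can be clicked
browser1.find_element_by_name("action").click()

# switch back to the main page for elements outside the frame
browser1.switch_to.default_content()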
I am trying to scrape content of a page.
Let's say this is the page:
http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL
I know I need to use Selenium to get the data I want.
I found this example on Stack Overflow that shows how to do it:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("http://finance.yahoo.com/quote/AAPL/profile?p=AAPL")
# wait for the Full Time Employees to be visible
wait = WebDriverWait(driver, 10)
employees = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[. = 'Full Time Employees']/following-sibling::strong")))
print(employees.text)
driver.close()
My question is this:
In the above example, to find Full Time Employees, the code that was used is:
employees = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[. = 'Full Time Employees']/following-sibling::strong")))
How did the author figure out that s/he needs to use
"//span[. = 'Full Time Employees']/following-sibling::strong"
to find the number of employees?
For my example page, http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL, how can I find, for example, Trailing P/E?
Can you please tell me the steps you took to find this? I right-click and choose Inspect, but then what should I do?
A picture is worth a thousand words.
In web dev. tools (F12) you do the following steps:
Choose Elements tab
Press Element Selector button
With that button pressed you click an element in the main browser window.
In the DOM-elements window you right-click that highlighted element.
The context menu appears and you choose Copy.
Choose Copy XPath in the submenu. Now you have that element's XPath in the clipboard.
NOTE!
The browser composes an element's XPath based on its own algorithm. It might not be the way you think or the way that fits your code. So you have to understand XPath itself and become acquainted with it.
See what XPath the Chrome browser generated for Trailing P/E:
//*[#id="main-0-Quote-Proxy"]/section/div[2]/section/div/section/div[2]/div[1]/div[1]/div/table/tbody/tr[3]/td[1]/span
Compare it with a shorter, hand-written XPath for the same row:
'//h3[contains(., "Valuation Measures")]/following-sibling::div[1]//tr[3]'
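As an illustration, once you have an XPath (copied or hand-written), using it follows the same pattern as the Full Time Employees example above; the Trailing P/E XPath here is the hand-written one just shown and may break if Yahoo changes its layout:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.get("http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL")

wait = WebDriverWait(driver, 10)
# wait for the Valuation Measures row that holds Trailing P/E
row = wait.until(EC.visibility_of_element_located(
    (By.XPATH, '//h3[contains(., "Valuation Measures")]/following-sibling::div[1]//tr[3]')))
print(row.text)
driver.close()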
Here is the answer to all your confusion.
It would be better to look at some XPath tutorials and practice yourself; then you will be able to decide what you have to use.
There are many sites. You can start here or here.
Now, coming to your query:
Suppose I am using the following XPath to locate the element:
//h3/span[text()='Financial Highlights']/../preceding-sibling::div//tr[3]/td/span
Your requirement is to find Trailing P/E on your page, so you definitely want a unique XPath which won't change. If you try to find this using FirePath, it shows a lengthy XPath.
So you look for an alternative and find another element (maybe a sibling, child or ancestor of your element) from which you can locate your element.
In my case, I first find the Financial Highlights text, which I can locate using //h3/span[text()='Financial Highlights']
Now I move to its parent tag, which is h3, using /..
The Trailing P/E element is just above the current node, so I move to that node using /preceding-sibling::div
And finally I find the element within that <div> using //tr[3]/td/span
See the screenshots for steps 1-4 as well.
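As a code sketch of this approach (the XPath is the one built step by step above, so it will break if Yahoo's markup changes):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.get("http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL")

wait = WebDriverWait(driver, 10)
# anchor on the stable 'Financial Highlights' heading, then walk to Trailing P/E
trailing_pe = wait.until(EC.visibility_of_element_located((By.XPATH,
    "//h3/span[text()='Financial Highlights']/../preceding-sibling::div//tr[3]/td/span")))
print(trailing_pe.text)
driver.close()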