I'm trying out a Python library: requests-html. It looks interesting and makes scraping easy and clear. However, I'm not sure how to render a page with infinite scrolling.
From the documentation I understand that I should render the page with a special argument (scrolldown). I'm trying, but I do not know exactly how. I know how to handle infinite scroll with Selenium, but I wonder whether it is possible with requests-html.
from requests_html import HTML, HTMLSession

session = HTMLSession()
page1 = session.get(url1)  # url1 is the target page URL (not shown in the question)
page1.html.render(scrolldown=5, sleep=3)
html = HTML(html=page1.text)
noticeName = html.find('h2.noticeName')
for element in noticeName:
    print(element.text)
It finds 10 of the 13 elements. Those 10 are visible without scrolling; the rest load only when the infinite scroll fetches new content.
scrolldown=5 does not scroll very far: it pages down only 5 times (the value is a count of page-down presses, not a pixel offset). Give it a bigger value, either together with sleep or a much larger one such as 2000 or 5000 without sleep.
Note that it will not return only the newly loaded elements; it returns all elements from the start of the page.
I will add some sample code soon.
I hope you've solved this already, but I'll post this for any other curious souls.
In most cases, if you want to infinite scroll, scrolldown needs to be a large value because it is based on the number of times requests_html will send a "page down" request in Chromium.
According to the docs:
scrolldown – Integer, if provided, of how many times to page down.
However, requests_html uses the pyppeteer library, which sends each page down as a key press. This means that if you are on a page that blocks the page-down key, or that simply doesn't trigger its infinite scroll from key presses alone, you will need a different solution.
Alternative solution (in Javascript)
Documentation: requests_html (archived)
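For pages that ignore the page-down key, one possible workaround is to let render() run a scroll script instead. The sketch below assumes the script argument of render() is evaluated in the page before the HTML is captured, and that an async arrow function is accepted there; the helper names build_scroll_script and fetch_texts and their parameters are made up for illustration. It also queries response.html directly, because re-parsing response.text with HTML(html=...) would use the original, unrendered markup.

```python
def build_scroll_script(rounds=10, pause_ms=1000):
    """Return a JavaScript async function (as a string) that scrolls to the
    bottom of the page `rounds` times, pausing `pause_ms` between scrolls."""
    return (
        "async () => {"
        f"  for (let i = 0; i < {int(rounds)}; i++) {{"
        "    window.scrollTo(0, document.body.scrollHeight);"
        f"    await new Promise(resolve => setTimeout(resolve, {int(pause_ms)}));"
        "  }"
        "  return document.body.scrollHeight;"
        "}"
    )


def fetch_texts(url, selector, rounds=10, pause_ms=1000):
    """Render `url`, scroll with JavaScript, and return the text of every
    element matching `selector` (hypothetical helper, not part of the library)."""
    # Third-party import kept local; the first render downloads Chromium.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # Assumption: the script runs inside the page before its content is read.
    response.html.render(script=build_scroll_script(rounds, pause_ms))
    # Query the rendered document, not a fresh HTML(html=response.text).
    return [element.text for element in response.html.find(selector)]
```

With the question's selector this would be called as fetch_texts(url1, 'h2.noticeName'); the number of rounds and the pause will need tuning per site.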
I am trying to scroll Telegram using Selenium in Python. In the attached screenshot I have selected 'Members' as the element to send Keys.PAGE_DOWN to: it sticks to the top and stays static while scrolling, so it should be visible all the time and seems like the perfect target for Keys.PAGE_DOWN.
But when I send PAGE_DOWN I get the error 'Element not Interactable'.
Any suggestions what I am doing wrong?
I have attached the script and screenshot.
I am using Python 3.10 and the latest version of Selenium.
`driver.find_element(By.XPATH, "//*[@id='RightColumn']/div[2]/div/div/div[2]/div[2]/div[1]").send_keys(Keys.PAGE_DOWN)`
I have tried all the answers currently available on the internet and they don't work here. This looks like some complex issue.
I think Selenium is throwing the right error message as this div is not an interactable element and you are trying to send keystrokes into the element.
Another approach for scrolling is using Javascript commands.
Find an element locator you need to scroll to.
(Ex: if you need to scroll to the bottom find the element at the bottom)
Use the below code to scroll
from selenium.webdriver.common.by import By

# Find the element in the page to scroll to
element = driver.find_element(By.XPATH, "//element/at/bottom/of/the/page")
# Fire a JavaScript command to scroll it into view
driver.execute_script("arguments[0].scrollIntoView();", element)
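When the page keeps loading content and there is no fixed bottom element to target, the same JavaScript approach can be repeated until the page stops growing. This is only a sketch: the helper name scroll_until_stable and its pause and max_rounds parameters are illustrative, and it assumes the scrollable area is the document itself rather than an inner container.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=50):
    """Scroll to the bottom repeatedly until the document height stops growing.

    `driver` is any object with a Selenium-style execute_script() method.
    Returns the number of scrolls that actually made the page taller.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded, so we reached the end
        last_height = new_height
    return max_rounds
```

If the list lives inside a scrollable div (as a chat member list usually does), the two scripts would have to read and set scrollTop and scrollHeight on that element instead of the document.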
I'm making a small project with Selenium, but before I can do anything I have to wait for the whole page to load. Is there any way for me to send keys as soon as the search bar loads?
You can use
driver.implicitly_wait(10)
at the top of your code, so each time you try to find an element Selenium will keep trying for up to 10 seconds, and if it still doesn't find it, it will raise an exception.
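An explicit wait is another option when only one element matters, since it returns as soon as that element is ready. The sketch below uses Selenium's WebDriverWait with expected_conditions.element_to_be_clickable; the helper name send_keys_when_ready and the CSS selector in the usage note are assumptions for illustration.

```python
def send_keys_when_ready(driver, css_selector, text, timeout=10):
    """Wait up to `timeout` seconds for the element to become clickable,
    then type `text` into it and return the element."""
    # Selenium is imported here so the helper can be defined without a browser set up.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    search_box = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, css_selector))
    )
    search_box.send_keys(text)
    return search_box
```

A call might look like send_keys_when_ready(driver, "input[name='q']", "selenium"), where the selector is a placeholder for the real search bar. If the element never becomes clickable, until() raises a TimeoutException.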
I'm trying to run the code:
for j in range(1, 13):
    driver.find_element_by_xpath('//*[@id="gateway-page"]/body/table/tbody/tr[3]/td[2]/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table/tbody/tr[2]/td/div/div[2]/ul/li['+str(j)+']').click()
    time.sleep(3)
to click every matching element on this website. But it skips some elements every time, although each click worked when I tried them separately rather than in the for loop. Any idea why this happens?
The problem seems to be with /ul/li['+str(j)+']: you are clicking the <li> tag, while the actual link sits inside it. That is why the link sometimes does not receive the click, without any error being raised, because it is wrapped in the <li> tag.
Try to locate the actual link tag instead. Use the code below; I have tested it on my system. Hope this helps.
driver.get('http://catalog.sps.cuny.edu/preview_program.php?catoid=2&poid=607')
driver.implicitly_wait(10)
links = driver.find_elements_by_xpath("//div//h2[contains(.,'Electives')]/..//ul/li//span/a")
for link in links:
    link.click()
    time.sleep(3)
Looking at the XPath, it seems you are trying to click each Elective option on that website. I think you are building the XPath for each elective from the loop index with str(j) and clicking the items one by one.
I suggest another approach: collect all the electives in a list, then iterate over the elements and click each one, e.g.
elements = driver.find_elements_by_xpath('//*[@id="gateway-page"]/body/table/tbody/tr[3]/td[2]/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table/tbody/tr[2]/td/div/div[2]/ul/li')
for element in elements:
    element.click()
    time.sleep(3)
Probable problems in your solution
The XPath is assembled by string concatenation inside the loop. If there is any typo, the XPath becomes invalid.
range(1, 13) yields 1 through 12, and XPath positions are 1-based, so li[1] is the first elective. If the list holds more than 12 items, the last ones are never clicked.
Also, each elective expands after it is clicked, so consider scrolling when an element is not found.
Suggestion:
Also, use relative xpaths instead of absolute. Relative xpaths are more stable.
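Putting these suggestions together, a loop might scroll each item into view before clicking it. This is a minimal sketch under stated assumptions: click_each is a hypothetical helper name, and the relative XPath in the usage comment is adapted from the one in the other answer rather than verified against the page.

```python
import time


def click_each(driver, elements, pause=3.0):
    """Scroll every element into view, click it, and return how many were clicked.

    `driver` needs a Selenium-style execute_script(); each element needs click().
    """
    clicked = 0
    for element in elements:
        # Bring the item into the viewport first, since earlier clicks expand the list.
        driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
        element.click()
        time.sleep(pause)  # let the expanded content finish loading
        clicked += 1
    return clicked


# Hypothetical usage, assuming `driver` is a Selenium WebDriver on the catalog page:
#   from selenium.webdriver.common.by import By
#   links = driver.find_elements(By.XPATH, "//h2[contains(., 'Electives')]/..//ul/li//a")
#   click_each(driver, links)
```

If a click re-renders the list, the remaining references may go stale; in that case re-running the find inside the loop is the safer variant.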
Happy Coding~
I am learning Python and attempting to build a program that will scrape specific data from a website, store it and then manipulate it.
Currently, when I run my application, it opens a new Chrome browser window and loads the page correctly. The problem is that it should then start scrolling down and loading the remaining elements on the page, but it does not.
I know the code works because if I manually click somewhere on the page that doesn't normally elicit a response (white space/empty areas), the browser somehow comes into "focus" and begins to iterate through the loop that scrolls down the page (by sending keys) and prints the data I am after. I also noticed that if I click another similar "dead space" area that contains the header, it doesn't have the same effect. I am unsure whether this is something specific to Chrome, iFrames or something of that nature, but I am completely stumped and would greatly appreciate any help.
Any thoughts on why I need to manually click on the new chrome window for it to work would be great.
Update: Still having the same issue, even tried with Safari and the same problem seems to exist.
Fixed this with:
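from selenium.webdriver.common.action_chains import ActionChains

# Click the app container once so the page receives focus before keys are sent
element = driver.find_element_by_css_selector("div[id^='app-container']")
action = ActionChains(driver)
action.click(on_element=element)
action.perform()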
I am working with Selenium WebDriver and wrote a small piece of code to find an element (i.e. a button) and click on it.
Here is the HTML code for the button:
<input type="submit" name="j_id0:j_id2:j_id3:j_id4:j_id7" value="New Master Health Program" onclick="AddLink()" class="btn">
Here is the C# code for the Test Case:
IWebElement newMasterHealthProgramsLink = driver.FindElement(By.Name("j_id0:j_id2:j_id3:j_id4:j_id7"));
newMasterHealthProgramsLink.Click();
I tried using XPath as well:
IWebElement newMasterHealthProgramsLink = driver.FindElement(By.XPath("//input[@id='j_id0:j_id2:j_id3:j_id4:j_id5']"));
newMasterHealthProgramsLink.Click();
I found a suggestion that the cause might be a missing wait: the page has not finished loading when the test tries to find the element. So I added a wait, but it did not help. I am still getting the same error:
TestAutomation.Driver.Login:
OpenQA.Selenium.NoSuchElementException : The element could not be found
Since your element is in an IFrame, you'll need to 'switch' to that IFrame using the WebDriver API.
By default, Selenium will use the 'top' frame, and therefore any 'find element' queries will be directed to the most top frame, and ignore any child IFrames.
To solve this, 'switching' to the current IFrame directs Selenium to shove any requests to that frame instead.
driver.SwitchTo().Frame()
Note that you'll need a way of accessing the frame, either by its ID, index (the top frame is 0, next frame down is 1, etc...) or name.
Another note is that any further requests will be directed to that IFrame, ignoring any others, which includes the top one...so if you need to access the top frame, you'll need to switch back:
driver.SwitchTo().DefaultContent()
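For readers following the Python threads above, the same switch-and-switch-back pattern can be sketched with the Python bindings, where the calls are driver.switch_to.frame(...) and driver.switch_to.default_content(). The context manager name inside_frame and the frame reference in the usage note are assumptions for illustration.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a `with` block.

    `frame_reference` may be the frame's name or ID, its index, or a frame element.
    The driver is always switched back to the top-level document afterwards.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Return to the top frame even if the body raised an exception.
        driver.switch_to.default_content()
```

Inside a block such as `with inside_frame(driver, "content-frame"):` (a placeholder frame name), find-element calls are directed at that frame; once the block exits they target the top frame again.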