Scrapy Splash: clicking the button doesn't open next page - python-3.x

I am having trouble clicking a button with Scrapy-Splash. The website I am trying to scrape is this one: https://search.siemens.com/en/?q=iot&lr=lang_en&as_oq=&as_sitesearch=&site=siemens_c_ww&client=siemens_f_ww&getfields=%2A&proxystylesheet=p_ia&queryString=lang%3Den%26site%3Dsiemens_c_ww%26q%3Diot%26lr%3Dlang_en%26collapse%3Dtrue%26class%3Dsearch%2Cbanner%2Ctext%26_charset%3DUTF-8&start=10&hl=en&access=p&filter=1&output=xml_no_dtd&sort=date%253AD%253AL%253Ad1&oe=UTF-8&ie=UTF-8&exclude_apps=1&ud=1&sheet=0
I am using the following script:
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(0.5))
    assert(splash:runjs('document.querySelector(".next a[href]").click()'))
    splash:set_viewport_full()
    return {
        html = splash:html(),
        png = splash:png(),
        har = splash:har(),
    }
end
When executed, I get back the first page and not the next one. Clicking the button manually works. I tried using mouse_click() with the same result. Thank you for any ideas to solve this problem :)

I think you need to wait for some time after the button is clicked. Splash needs time to re-render the dynamic page.
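A minimal sketch of that suggestion, written as the Python side of a Scrapy-Splash request. The Lua script is the one from the question with a single splash:wait() added after the click. The 2-second default, the click_wait argument name and the build_splash_args helper are assumptions for illustration, not values from the original post.

```python
# Lua script from the question, with a wait added after the click so Splash
# has time to load and render the next page before the snapshot is taken.
LUA_CLICK_AND_WAIT = """
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(0.5))
    assert(splash:runjs('document.querySelector(".next a[href]").click()'))
    -- assumed delay; increase it if the next page is slow to render
    assert(splash:wait(args.click_wait))
    splash:set_viewport_full()
    return {
        html = splash:html(),
        png = splash:png(),
        har = splash:har(),
    }
end
"""


def build_splash_args(click_wait=2.0):
    """Return the args dict for a Splash 'execute' request (hypothetical helper)."""
    return {"lua_source": LUA_CLICK_AND_WAIT, "click_wait": click_wait}


splash_args = build_splash_args()
```

In a spider, this dict would typically be passed as scrapy_splash.SplashRequest(url, callback, endpoint='execute', args=splash_args). A fixed delay is the simplest option; if the page is slow, a longer value may be needed.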

Related

Cannot set notebook page in function with gtk in nodejs

I have notebook pages setup like tabs in a browser. You click the "+" tab and it creates a new page and moves the "+" tab to the end. That all works except I'm trying to change to the page that was created. Attempting to use "setCurrentPage" doesn't do anything. No errors are thrown and it stays on the current tab completely ignoring the command.
It doesn't matter what index number I put in the command; still nothing happens. I thought maybe it has something to do with the switch-page signal, but all the other commands work.
edit: I should add that it doesn't work outside the function either.
notebook.on('switch-page', AddTab)
function AddTab(selff, page, index) {
    b = notebook.pageNum(newtab)
    a = (notebook.getCurrentPage() + 1)
    if ( b == a) {
        TabPage()
        notebook.reorderChild(newtab, -1)
        //notebook.setCurrentPage(page)
        win.showAll()
        notebook.setCurrentPage(page)
    }
}

can't click() an onclick element with selenium (tried text link, partial text link, xpath, css selector)

I need to scrape some data from this URL:
https://www.cnrtl.fr/definition/coupe
The data I need to scrape is located in 3 different tabs:
I'm unable to click on the onclick element that should let me switch from one tab to another.
Here is the HTML code for one of the 3 onclick elements:
The 3 onclick elements differ from each other by the number at the end:
#COUPE1:
return sendRequest(5,'/definition/coupe//0');
#COUPE2:
return sendRequest(5,'/definition/coupe//1');
#COUPER:
return sendRequest(5,'/definition/coupe//2');
I tried to find them by link text, partial link text, xpath and css selector.
I've followed this thread:
Python + Selenium: How can click on "onclick" elements?
I also tried the contains() and text() methods, without success.
There are a few ways you could do this. I chose this method because the page reloads, which causes the elements to become stale.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
#Driver setup was not shown in the original answer; Chrome is assumed here
driver = webdriver.Chrome()
#Get the URL
driver.get("https://www.cnrtl.fr/definition/coupe")
#Find the parent element of the tabs
tabs = driver.find_element(By.ID, 'vtoolbar')
#Get all the list items under the parent (tabs)
lis = tabs.find_elements(By.TAG_NAME, 'li')
#Loop over them (skipping the first tab, because that's already loaded)
for i in range(1, len(lis)):
    #Execute the same JS as the page would on click, using the index of the loop
    driver.execute_script(f"sendRequest(5,'/definition/coupe//{i}');")
    #Sleep to visualise the clicking
    time.sleep(3)

IPython.display not showing TextBox in ipywidgets

I'm trying to build functionality such that whenever the user clicks the Add button in my code, a new text box is generated just under the old one. For example, like this:
Now, if the user were to click on add once again, a 5th text box should appear.
I've tried to achieve the same using this piece of code:
add_button = widgets.Button(description='Add',
    disabled=False,
    button_style='',
    style={'description_width': 'initial', 'button_width': 'auto'},
    icon='plus'
)
display(add_button)
add_button.on_click(add_new)
And my add_new function is simply defined as follows:
def add_new(*args):
    display(widgets.Text(placeholder='Type something',description='String:'))
But this does not seem to work: nothing happens when I click the button. Any help would be appreciated. Also, if there is a better way to do this, please let me know; I'm new to ipywidgets.
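Try it like this:
import ipywidgets as widgets
from IPython.display import display
output = widgets.Output()
def add_new(*args):
    # Render each new text box inside the Output widget so it shows up in the notebook
    with output:
        display(widgets.Text(placeholder='Type something',description='String:'))
add_button = widgets.Button(description='Add',
    disabled=False,
    button_style='',
    style={'description_width': 'initial', 'button_width': 'auto'},
    icon='plus'
)
display(add_button)
add_button.on_click(add_new)
# Show the Output widget below the button
display(output)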

How to click "Next" button until it no longer exists - Python, Selenium, Requests

I am scraping data from a webpage that is paginated, and once I finish scraping one page, I need to click the next button and continue scraping the next page. I then need to stop once I have scraped all of the pages and a next button no longer exists. Below is the HTML around the "Next" button that I need to click.
<tr align="center">
    <td colspan="8" bgcolor="#FFFFFF">
        <br>
        <span class="paging">
            <b> -- Page 1 of 3 -- </b>
        </span>
        <p>
            <span class="paging">
                <a href="page=100155&by=state&state=AL&pagenum=2"> .
                    <b>Next -></b>
                </a>
            </span>
            <span class="paging">
                Last ->>
            </span>
        </p>
    </td>
</tr>
I have tried selecting on class and on link text, and neither has worked in my current attempts.
Two examples of my code:
while True:
    try:
        link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Next ->"))).click()
    except TimeoutException:
        break
while True:
    try:
        link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "paging"))).click()
    except TimeoutException:
        break
All of the solutions I have found online have not worked, and have primarily ended with the following error:
ElementClickInterceptedException: Message: element click
intercepted: Element <a href="?
page=100155&by=state&state=AL&pagenum=2">...</a> is not
clickable at point (119, 840). Other element would receive the
click: <body class="custom-background hfeed" style="position:
relative; min-height: 100%; top: 0px;">...</body>
(Session info: chrome=76.0.3809.132)
If the remainder of the error code would be helpful to review, please let me know and I will update the post with this error.
I have looked at the following resources, all to no avail:
Python Selenium clicking next button until the end
python - How to click "next" in Selenium until it's no longer available?
Python Selenium Click Next Button
Python Selenium clicking next button until the end
Selenium clicking next button programmatically until the last page
How can I make Selenium click on the "Next" button until it is no longer possible?
Could anyone provide suggestions on how I can select the "Next" button (if it exists) and go to the next page with this set of HTML? Please let me know if you need any further clarification on the request.
We can approach this problem using two major libraries: selenium and requests.
Approach - Scrape the page for the page number and the next page link every time
Using Selenium (if the site is dynamic)
We can check whether the current page is the last page and, if it is not, look for the next button (assuming the website follows the same HTML structure for paging on all pages).
import time
from selenium.webdriver.common.by import By
stop = False
driver.get(url)
while not stop:
    paging_elements = driver.find_elements(By.CLASS_NAME, "paging")
    page_numbers = paging_elements[0].text.strip(" -- ").split("of")
    ## Getting the current page number and the final page number
    final = int(page_numbers[1].strip())
    current = int(page_numbers[0].split("Page")[-1].strip())
    if current==final:
        stop=True
    else:
        # The link is an <a> tag, so look it up by tag name rather than by name attribute
        next_page_link = paging_elements[-2].find_element(By.TAG_NAME, "a").get_attribute('href')
        driver.get(next_page_link)
        time.sleep(5) # This gap can be changed as per the load time of the page
Using Requests and BS4 (if the site is static)
import requests
from bs4 import BeautifulSoup
r = requests.get(url)
stop = False
while not stop:
    soup = BeautifulSoup(r.text, 'html.parser')
    paging_elements = soup.find_all('span', attrs={'class': "paging"})
    page_numbers = paging_elements[0].text.strip(" -- ").split("of")
    ## Getting the current page number and the final page number
    final = int(page_numbers[1].strip())
    current = int(page_numbers[0].split("Page")[-1].strip())
    if current==final:
        stop=True
    else:
        next_page_link = paging_elements[-2].find("a").get('href')
        r = requests.get(next_page_link)
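The string splitting used in both loops depends on the exact whitespace around the page counter. A regular expression is one more tolerant way to read it. This is only a sketch: parse_paging is a hypothetical helper, and the label format is assumed to match the " -- Page 1 of 3 -- " text from the question.

```python
import re

# Matches labels such as " -- Page 1 of 3 -- " regardless of surrounding whitespace.
PAGING_RE = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)")


def parse_paging(label):
    """Return (current, final) page numbers parsed from a paging label."""
    match = PAGING_RE.search(label)
    if match is None:
        raise ValueError(f"no page counter found in {label!r}")
    return int(match.group(1)), int(match.group(2))


# Example with the label from the question, including stray newlines
current, final = parse_paging("\n -- Page 1 of 3 -- \n")
is_last_page = current == final
```

The loop condition then becomes a comparison of the two returned integers, and a page without a counter raises an explicit error instead of an IndexError.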
Alternative approaches
One method is to use the URL of the website itself instead of clicking the button, since the button click is intercepted in this case.
Many paginated websites add a page parameter to their URL (often visible only for pages >= 2). So, a paginated website might have URLs such as:
www.targetwebsite.com/category?page_num=1
www.targetwebsite.com/category?page_num=2
www.targetwebsite.com/category?page_num=3
and so on.
In such cases, one can simply iterate over the page numbers up to the final page number (as originally put in the proposed answer). This approach avoids breakage if the target website changes its CSS layout or style.
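As a rough sketch of iterating over page numbers, the generator below rewrites one query parameter per page using only the standard library. The page_urls helper, the https scheme and the page_num parameter name are assumptions based on the example URLs above; a real site may use a different parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(first_url, final_page, param="page_num"):
    """Yield one URL per page by rewriting the page-number query parameter."""
    parts = urlsplit(first_url)
    # Assumes each query key appears once; duplicates would be collapsed here
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    for page in range(1, final_page + 1):
        query[param] = str(page)
        yield urlunsplit(parts._replace(query=urlencode(query)))


urls = list(page_urls("https://www.targetwebsite.com/category?page_num=1", 3))
```

Each URL can then be fetched with requests.get or driver.get, with the final page number read once from the first page.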
Furthermore, if the href is relative, you may need to build next_page_link by prepending the base URL, as done for next_url in the other question (lines 40-41):
next_url = next_link.find("a").get("href")
r = session.get("https://reverb.com/marketplace" + next_url)
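Instead of concatenating strings, urllib.parse.urljoin can resolve a relative href against the current page URL. A small sketch follows; the example.com URL is a placeholder, and the href is the relative link shown in the question's error message.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page currently being scraped
current_url = "https://example.com/listing?page=100155&by=state&state=AL"
# Relative href as it appears in the question's error message
href = "?page=100155&by=state&state=AL&pagenum=2"

# urljoin keeps the scheme, host and path of current_url and swaps in the new query
next_page_link = urljoin(current_url, href)
```

urljoin leaves absolute hrefs unchanged, so the same call works whether the site emits relative or absolute links.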
I hope this helps!
It sounds like you're asking two different questions here:
How to click Next button until it no longer exists
How to click Next button with Javascript.
Here's a solution to #2 -- JavaScript clicking:
// Extension methods must be declared inside a static class
public static void ExecuteJavaScriptClickButton(this IWebDriver driver, IWebElement element)
{
    ((IJavaScriptExecutor) driver).ExecuteScript("arguments[0].click();", element);
}
In the above code, you have to cast your WebDriver instance to IJavaScriptExecutor, which allows you to run JS code through Selenium. The parameter element is the element you wish to click -- in this case, the Next button.
Based on your code sample, your JavaScript click may look something like this:
var nextButton = driver.FindElement(By.LinkText("Next ->"));
driver.ExecuteJavaScriptClickButton(nextButton);
Now, moving on to your other issue -- clicking until the button is no longer visible. I would implement this in a while loop that breaks whenever the Next button no longer exists. I also recommend implementing a function that can check for the presence of the Next button and ignore the ElementNotFound or NoSuchElement exception in case the button does not exist, to avoid breaking your test. Here's a sample that includes an ElementExists implementation:
public static bool ElementExists(this IWebDriver driver, By by)
{
    // attempt to find the element -- return true if we find it
    try
    {
        return driver.FindElements(by).Count > 0;
    }
    // catch exception where we did not find the element -- return false
    catch (Exception)
    {
        return false;
    }
}
public void ClickNextUntilInvisible()
{
    while (driver.ElementExists(By.LinkText("Next ->")))
    {
        // find next button inside while loop so it does not go stale
        var nextButton = driver.FindElement(By.LinkText("Next ->"));
        // click next button using javascript
        driver.ExecuteJavaScriptClickButton(nextButton);
    }
}
This while loop checks for the presence of the Next button with each iteration. If the button does not exist, the loop breaks. Inside the loop, we look the button up again with driver.FindElement before each click, so that we do not get a StaleElementReferenceException.
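Since the question is in Python, the same loop can be sketched in Python. This is an illustration under assumptions, not the answer's own code: click_next_until_gone is a hypothetical helper that only relies on the find_elements and execute_script methods a Selenium WebDriver provides, and StubDriver is a stand-in so the sketch runs without a browser.

```python
def click_next_until_gone(driver, locator, max_clicks=1000):
    """Click the element found by locator via JavaScript until it disappears.

    driver is any Selenium-style WebDriver; locator is a (by, value) pair
    such as (By.LINK_TEXT, "Next ->"). Returns the number of clicks made.
    """
    clicks = 0
    while clicks < max_clicks:
        # find_elements returns an empty list instead of raising when nothing matches
        matches = driver.find_elements(*locator)
        if not matches:
            break
        # look the button up on every iteration so the reference never goes stale
        driver.execute_script("arguments[0].click();", matches[0])
        clicks += 1
    return clicks


class StubDriver:
    """Stand-in for a WebDriver: the Next button exists on all but the last page."""

    def __init__(self, pages):
        self.clicks_left = pages - 1

    def find_elements(self, by, value):
        return ["next-button"] if self.clicks_left > 0 else []

    def execute_script(self, script, element):
        self.clicks_left -= 1


# "link text" is the string value behind Selenium's By.LINK_TEXT
clicks = click_next_until_gone(StubDriver(pages=3), ("link text", "Next ->"))
```

With a real browser, each click should be followed by a wait for the new page (for example WebDriverWait with expected_conditions.staleness_of on the old button), and the scraping step would go inside the loop. The max_clicks cap is an added safety limit against an endless loop.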
Hope this helps.

click on button which appears sometime on screen using python selenium

I am automating a website that generates quotations for mixer selection. For some parameters the list of suggested mixers has more than 5 entries, and then a Next button appears below the list. This button is shown on the screen only when the mixer list has more than 5 entries.
I am using the code below to save a screenshot of the page, and I also want a screenshot of the page that comes up after clicking the Next button whenever it is available.
name = str(worksheet.cell_value(1,1))+"_"+str(worksheet.cell_value(1,2))+"_"+str(worksheet.cell_value(1,3))+\
    "_"+str(worksheet.cell_value(1,4))+str(worksheet.cell_value(1,5))+".png"
# Raw string so the backslashes in the Windows path are not treated as escapes
driver.save_screenshot(r"D:\Automation\Pycharm_project\MRMix\Screenshots\%s"%name)
If the Next button is displayed, you need to click it and then take the screenshot. You can try something like the following (in Java).
You can have a small check that sets a flag to true or false depending on whether the Next button is displayed:
boolean isNextDisplay=false;
try {
    if(driver.findElement(By.id("nextButton")).isDisplayed()) {
        isNextDisplay=true;
    }
}catch (Exception e) {
    System.out.println("next button not displayed");
}
Depending on the result, click the Next button and take the screenshot:
if(isNextDisplay) {
    //click on next button
    //take screenshot
}
Instead of using a separate flag, you can also write it directly: try >> if >> click and take screenshot.
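Because the question uses Python, here is a hedged Python sketch of the same idea. screenshot_next_page is a hypothetical helper that relies only on the find_elements, is_displayed, click and save_screenshot methods Selenium provides. The "nextButton" id is taken from the Java answer and may not match the real page, and the two stub classes exist only so the example runs without a browser.

```python
def screenshot_next_page(driver, locator, path):
    """Click the Next button if it is displayed, then save a screenshot.

    Returns True when the button was clicked and a screenshot was taken.
    """
    # find_elements returns [] when the button is absent, so no try/except is needed
    buttons = [el for el in driver.find_elements(*locator) if el.is_displayed()]
    if not buttons:
        return False
    buttons[0].click()
    driver.save_screenshot(path)
    return True


class StubButton:
    """Stand-in for a WebElement."""

    def __init__(self, visible):
        self.visible = visible
        self.clicked = False

    def is_displayed(self):
        return self.visible

    def click(self):
        self.clicked = True


class StubDriver:
    """Stand-in for a WebDriver that records saved screenshot paths."""

    def __init__(self, buttons):
        self.buttons = buttons
        self.saved = []

    def find_elements(self, by, value):
        return self.buttons

    def save_screenshot(self, path):
        self.saved.append(path)
        return True


# "id" is the string value behind Selenium's By.ID
button = StubButton(visible=True)
driver = StubDriver([button])
taken = screenshot_next_page(driver, ("id", "nextButton"), "next_page.png")
```

With a real driver, the call would be screenshot_next_page(driver, (By.ID, "nextButton"), path), ideally after a short wait so the next page has rendered before the screenshot is saved.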
