I'm crawling some web pages for my research.
I want to inject the JavaScript code below when redirecting to another page:
window.alert = function() {};
I tried to inject the JavaScript code using WebDriverWait, so that Selenium would execute it as soon as the driver redirects to the new page, but it doesn't work:
while some_condition:  # placeholder for the crawl loop condition
    try:
        WebDriverWait(browser, 5).until(
            lambda driver: original_url != browser.current_url)
        browser.execute_script("window.alert = function() {};")
    except:
        pass  # do sth
    original_url = browser.current_url
It seems that the driver executes the JavaScript code only after the page has loaded, because the alert made in the redirected page is still showing.
Chrome 14+ blocks alerts inside onunload (https://stackoverflow.com/a/7080331/3368011)
But, I think the following questions may help you:
JavaScript before leaving the page
How to call a function before leaving page with Javascript
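A different route worth noting: Chrome's DevTools command Page.addScriptToEvaluateOnNewDocument installs a script before any page script runs, so the alert override is already in place when the redirected page loads. This is a sketch, assuming a chromedriver/Selenium combination that exposes execute_cdp_cmd; alert_override_cmd is a hypothetical helper name:

```python
def alert_override_cmd():
    # Build the CDP command payload that neutralizes window.alert on every
    # new document, before the page's own scripts execute.
    return ("Page.addScriptToEvaluateOnNewDocument",
            {"source": "window.alert = function() {};"})

# With a real Chrome session this would be applied once, up front:
# cmd, args = alert_override_cmd()
# browser.execute_cdp_cmd(cmd, args)
```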
I solved my problem another way.
I tried again and again with browser.switch_to_alert, but it didn't work. Then I found that it is deprecated (the Alert class / switch_to.alert is the replacement), which is why it doesn't behave correctly. Instead, I check for an alert and dismiss it every second with the following code:
import time
from selenium.webdriver.common.alert import Alert

while some_condition:  # placeholder for the loop condition
    try:
        Alert(browser).dismiss()
    except:
        print("no alert")
    time.sleep(1)
This works fine on Windows 10 with Python 3.7.4.
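The dismiss-every-second loop above is an instance of a generic polling pattern. Factored out (the helper name and parameters are illustrative, not part of Selenium), it might look like:

```python
import time

def poll_until(predicate, timeout=10.0, interval=1.0):
    # Call predicate() every `interval` seconds until it returns a truthy
    # value or `timeout` seconds elapse; return the truthy result, or None.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval)
    return None
```

The alert loop is then poll_until with a predicate that tries Alert(browser).dismiss() and returns False when no alert is present.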
I have the code below, which I know has worked before but for some reason seems to be broken now. The code is meant to open a search engine, search for a query, and return a list of results by their href attributes. The browser opens, navigates to http://www.startpage.com successfully, and enters the term I pass in into the search box, but then it just closes the browser. No error, no links. Nothing.
import selenium.webdriver as webdriver

def get_results(search_term):
    url = "https://www.startpage.com"
    browser = webdriver.Firefox()
    browser.get(url)
    search_box = browser.find_element_by_id("query")
    search_box.send_keys(search_term)
    search_box.submit()
    try:
        links = browser.find_elements_by_xpath("//ol[@class='web_regular_results']//h3//a")
    except:
        links = browser.find_elements_by_xpath("//h3//a")
    results = []
    for link in links:
        href = link.get_attribute("href")
        print(href)
        results.append(href)
    browser.close()
    return results

get_results("dog")
Does anyone know what is wrong with this? Basically it gets to search_box.submit() then skips everything until browser.close().
Unlike find_element_by_xpath (which returns a single WebElement and raises if nothing matches), find_elements_by_xpath won't throw an exception when it finds no results: it returns an empty list. Here links is empty, so the for loop is never executed. You can change the try/except to an if condition and check whether the list has values:
links = browser.find_elements_by_xpath("//ol[@class='web_regular_results']//h3//a")
if not links:
    links = browser.find_elements_by_xpath("//h3//a")
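Because every find_elements_* call follows the same empty-list convention, the fallback generalizes to a small helper (first_nonempty is a hypothetical name, not a Selenium API):

```python
def first_nonempty(*finders):
    # Return the result of the first finder callable yielding a non-empty
    # list; find_elements_* returns [] for zero matches instead of raising,
    # so a simple truthiness check is enough.
    for find in finders:
        results = find()
        if results:
            return results
    return []
```

Usage would be first_nonempty(lambda: browser.find_elements_by_xpath("//ol[@class='web_regular_results']//h3//a"), lambda: browser.find_elements_by_xpath("//h3//a")).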
It is not recommended to call the browser close function inside the function you are testing. Instead, call it after get_results("dog") and keep it out of the test logic:
get_results("dog")
browser.close()
This way Selenium completes the execution of the function first and then closes the browser window.
The problem with your version is that the browser window is closed before the method returns the result set, which is the logical problem in your script.
I have a simple program that logs into Facebook and gets 3 urls:
def setup_driver():
    prefs = {"profile.default_content_setting_values.notifications": 2}
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_experimental_option("prefs", prefs)
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    driver = webdriver.Chrome(executable_path="./chromedriver_linux",
                              chrome_options=chrome_options)
    return driver

def log_into_facebook(driver):
    driver.get("https://www.facebook.com/")
    email_field = driver.find_element_by_id("email")
    email_field.send_keys("<MY EMAIL ADDRESS>")
    password_field = driver.find_element_by_id("pass")
    password_field.send_keys("<MY FB PASSWORD>")
    driver.find_element_by_id("loginbutton").click()

if __name__ == "__main__":
    driver = setup_driver()
    log_into_facebook(driver)
    print("before getting url 1")
    driver.get('https://facebook.com/2172111592857876')
    print("before getting url 2")
    driver.get('https://www.facebook.com/beaverconfessions/posts/2265225733546461')
    print("before getting url 3")
    driver.get('https://www.facebook.com/beaverconfessions/posts/640487179353666')
    print("finished getting 3 urls")
On my local machine, this program runs fine. However, on my AWS EC2 instance it makes the instance unusable: the Python script hangs after "before getting url 2" is printed to the console, and while it hangs the instance becomes so slow that other programs on it stop working properly. I need to kill the program with Ctrl-C for the instance to become responsive again. If I comment out log_into_facebook(driver), the program runs fine.
I would try to get a stack trace, but the program doesn't actually crash; it just never reaches "before getting url 3".
It is worth noting that I was previously getting "invalid session id" errors with a program similar to this one (it also logged into Facebook and then called driver.get several times).
Update: Removing the --no-sandbox option from the webdriver seemed to fix the problem. I'm not sure why. I originally had this option in place because I was previously having a "unable to fix open pages" error, and I read that "--no-sandbox" would fix the error.
chrome_options.add_argument('--no-sandbox')
Roymunson reports that the appropriate way to fix the hanging problem is:
Avoid specifying the --no-sandbox option in the webdriver.
I want to get the current URL while running Selenium.
I looked at this Stack Overflow page: How do I get current URL in Selenium Webdriver 2 Python?
and tried the things posted, but it's not working. My code is below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# launch Firefox
driver = webdriver.Firefox()
url1 = 'https://poshmark.com/search?'
# search in a window
driver.get(url1)
xpath = '//input[@id="user-search-box"]'
searchBox = driver.find_element_by_xpath(xpath)
brand = "freepeople"
style = "top"
searchBox.send_keys(' '.join([brand, "sequin", style]))
# equivalent of hitting the Enter key
searchBox.send_keys(Keys.ENTER)
print(driver.current_url)
My code prints https://poshmark.com/search? but it should print https://poshmark.com/search?query=freepeople+sequin+top&type=listings&department=Women, because that is the page Selenium navigates to.
The issue is that there is no lag between your searchBox.send_keys(Keys.ENTER) and print(driver.current_url).
There should be some time lag, so that the statement can pick up the URL change. If your code fires before the URL has actually changed, it gives you the old URL.
The workaround would be to add time.sleep(1) to wait for one second. A hard-coded sleep is not a good option, though. You should do one of the following:
Keep polling the URL and wait for the change to happen
Wait for an object that you know will appear when the new page loads
Instead of using Keys.ENTER, simulate the operation with a .click() on the search button if one is available
Usually when you use the click method in Selenium, it takes care of the page change, so you don't see such issues. Here you press a key, which doesn't do any kind of waiting for the page load. That is why you see the issue in the first place.
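The "keep polling the URL" option can be sketched without a hard-coded sleep; wait_for_url_change and its parameters are illustrative names, not a Selenium API:

```python
import time

def wait_for_url_change(get_url, old_url, timeout=10.0, poll=0.2):
    # Poll get_url() until it differs from old_url; return the new URL,
    # or raise TimeoutError if nothing changes within `timeout` seconds.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = get_url()
        if current != old_url:
            return current
        time.sleep(poll)
    raise TimeoutError("URL did not change from %r" % old_url)
```

With a real driver this would be called as wait_for_url_change(lambda: driver.current_url, url1) right after sending Keys.ENTER.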
I had the same issue, and I came up with a solution that uses an explicit wait with a custom condition (see how explicit waits work in the documentation).
Here is my solution
from contextlib import contextmanager
from selenium.webdriver.support.ui import WebDriverWait

class UrlHasChanged:
    def __init__(self, old_url):
        self.old_url = old_url

    def __call__(self, driver):
        return driver.current_url != self.old_url

@contextmanager
def url_change(driver):
    current_url = driver.current_url
    yield
    WebDriverWait(driver, 10).until(UrlHasChanged(current_url))
Explanation:
First, I created my own wait condition (see here) that takes old_url as a parameter (the URL from before the action) and checks whether the old URL still equals current_url after the action. It returns False when the two URLs are the same and True otherwise.
Then I created a context manager to wrap the action I wanted to make: it saves the URL before the action, and afterwards uses WebDriverWait with the wait condition created above.
Thanks to that, I can now reuse this function with any action that changes the URL and wait for the change like this:
with url_change(driver):
    login_panel.login_user(normal_user['username'], new_password)

assert driver.current_url == dashboard.url
It is safe because WebDriverWait(driver, 10).until(UrlHasChanged(current_url)) waits until the current URL changes, and after 10 seconds it stops waiting by throwing an exception.
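One nice property of a condition object like this is that it can be exercised without a browser by stubbing current_url (FakeDriver below is a test stand-in, not part of Selenium):

```python
class UrlHasChanged:
    def __init__(self, old_url):
        self.old_url = old_url

    def __call__(self, driver):
        # Any object with a `current_url` attribute will do here.
        return driver.current_url != self.old_url

class FakeDriver:
    # Minimal stand-in exposing only the attribute the condition reads.
    def __init__(self, url):
        self.current_url = url

check = UrlHasChanged("https://example.com/login")
print(check(FakeDriver("https://example.com/login")))  # False: still on the old URL
print(check(FakeDriver("https://example.com/home")))   # True: the URL changed
```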
What do you think about this?
I fixed this problem by getting the button's href and then calling driver.get(hreflink) instead of clicking it. click() was not working for me!
This code is supposed to open a new browser at "www.website.com", submit a username and password, wait for the page to load after the submit, navigate to a new page, and inject JavaScript code via the address bar.
What it currently does: it opens a new browser at "www.website.com" and submits the username and password. The submit works, but from there the code breaks; instead of navigating to the next page (page2) it just hangs.
Even when I add an ignore flag to the _IEFormSubmit($oForm) call, I can't get the page to navigate to the next page.
#include <IE.au3>
#RequireAdmin
Local $oIE = _IECreate("http://www.website.com")
;_IELinkClickByText($oIE, "Sign In") ;Optional
Local $oForm = _IEFormGetObjByName($oIE, "regular-user-login-form")
Local $oText = _IEFormElementGetObjByName($oForm, "log")
_IEFormElementSetValue($oText, "username")
Local $oText = _IEFormElementGetObjByName($oForm, "pwd")
_IEFormElementSetValue($oText, "password")
_IEFormSubmit($oForm)
_IENavigate($oIE, "http://www.website.com/page2/")
Send("{F4}javascript:check_in();{ENTER}")
Please, for the love of god, what am I doing wrong?
Edit: I've also changed the _IEFormSubmit($oForm) to another JavaScript submit, and I can still log in without any problems, but once I reach that next page I can't use _IENavigate, so the problem has to lie there.
I have automated many different pages, and sometimes the script gets stuck on _IEFormSubmit for no reason. That's an AutoIt bug.
Here's a quick fix for that:
_IEFormSubmit($oForm, 0)
_IELoadWait($oIE, 1000)
Your code should be:
#include <IE.au3>
#RequireAdmin
Local $oIE = _IECreate("http://www.website.com")
;_IELinkClickByText($oIE, "Sign In") ;Optional
Local $oForm = _IEFormGetObjByName($oIE, "regular-user-login-form")
Local $oText = _IEFormElementGetObjByName($oForm, "log")
_IEFormElementSetValue($oText, "username")
Local $oText = _IEFormElementGetObjByName($oForm, "pwd")
_IEFormElementSetValue($oText, "password")
_IEFormSubmit($oForm, 0)
_IELoadWait($oIE, 1000)
_IENavigate($oIE, "http://www.website.com/page2/")
Send("{F4}javascript:check_in();{ENTER}")
One time I tried to report two bugs on the AutoIt forum, but the guys there are pretty frustrated fellas. So now when I find one, I just solve it my way.
Off topic:
When dealing with objects you should know of another bug:
If $oInput.outertext = "Continue" Then ...
The above code will sometimes fail even if the outertext matches. This is solved by using:
If $oInput.outertext == "Continue" Then ...
I want to download a page with its JavaScript executed, using Python. Qt is one solution, and here is the code:
import threading
# PyQt4-style imports; adjust for your Qt binding
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebView

class Downloader(QApplication):
    __event = threading.Event()

    def __init__(self):
        QApplication.__init__(self, [])
        self.webView = QWebView()
        self.webView.loadFinished.connect(self.loadFinished)

    def load(self, url):
        self.__event.clear()
        self.webView.load(QUrl(url))
        while not self.__event.wait(.05):
            self.processEvents()
        return self.webView.page().mainFrame().documentElement() if self.__ok else None

    def loadFinished(self, ok):
        self.__ok = ok
        self.__event.set()

downloader = Downloader()
page = downloader.load(url)  # url defined elsewhere
The problem is that sometimes downloader.load() returns a page without the JavaScript executed. Downloader.loadStarted() and Downloader.loadFinished() are each called only once.
What is the proper way to wait for a complete page load?
EDIT
If I add self.webView.page().networkAccessManager().finished.connect(request_ended) into __init__() and define
def request_ended(reply):
    print(reply.error(), reply.url().toString())
then it turns out that sometimes reply.error() == QNetworkReply.UnknownNetworkError. This behaviour appears when an unreliable proxy is used that fails to download some of the resources (part of which are JS files), hence some of the JS not being executed. When no proxy is used (i.e. the connection is stable), every reply.error() == QNetworkReply.NoError.
So, the updated question is:
Is it possible to retry getting reply.request() and apply it to the self.webView?
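Whatever the QtWebKit-specific mechanics turn out to be, re-fetching a failed resource boils down to a retry loop; here is a browser-agnostic sketch (all names illustrative):

```python
import time

def retry(fn, attempts=3, delay=0.5, exceptions=(Exception,)):
    # Call fn(); on one of `exceptions`, wait `delay` seconds and try again,
    # up to `attempts` total calls, re-raising after the last failure.
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

A reply URL that failed could then be re-fetched with retry(lambda: fetch(reply.url().toString())), where fetch is whatever download function applies.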
JavaScript requires a runtime to be executed in (Python alone won't do); a popular one these days is PhantomJS.
Unfortunately, PhantomJS no longer has Python support, so you could resort to e.g. Ghost.py to do this job for you, which allows you to selectively execute the JS you want.
You should use Selenium.
It provides different WebDrivers, for example PhantomJS, or common browsers like Firefox.