Selenium Connection Refused - python-3.x

I am scraping Google search pages using Python/Selenium, and since last night I have been encountering a MaxRetryError: [Errno 61] Connection refused error. I debugged my code and found that the error begins in this code block:
domain = pattern.search(website)
counter = 2
# keep running this until the url appears like normal
while domain is None:
    counter += 1
    # close chrome and try again
    print('link not found, closing chrome and restarting ...\nwaiting {} seconds...'.format(counter))
    chrome.quit()
    time.sleep(counter)
    # chrome = webdriver.Chrome()
    time.sleep(10) ### tried inserting a timer.sleep to delay request
    chrome.get('https://google.com') ### error is right here. This is the second instance of chrome.get in this script
    target = chrome.find_element_by_name('q')
    target.send_keys(college)
    target.send_keys(Keys.RETURN)
    # parse the webpage
    soup = BeautifulSoup(chrome.page_source, 'html.parser')
    website = soup.find('cite', attrs={'class': 'iUh30'}).text
    print('tried to get URL, is this it? : {}\n'.format(website))
    pattern = re.compile(r'\w+\.(edu|com)')
    domain = pattern.search(website)
I keep getting the following error:
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='ADDRESS', port=PORT): Max retries exceeded with url: /session/92ca3da95353ca5972fb5c520b704be4/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x11100e4e0>: Failed to establish a new connection: [Errno 61] Connection refused',))
As you can see in the code block above, I added a time.sleep() call, but it doesn't appear to help at all. For context, this script is part of a function which another script calls repeatedly in a loop. I make sure to add delays between each call of the webdriver.get() method. As of now, my script fails at the first iteration of this loop.
I tried googling the issue but the closest thing I found was this. It appears to speak to the same exact error, and the top answer identifies the same method that is causing the issue, but I don't really understand what the Solution and Conclusion sections are saying. I get that the MaxRetryError is confusing for debugging, but what precisely is the solution?
It mentions a max_retries argument and tracebacks, but I don't know what they mean in this context. Is there any way that I can catch this error (in the context of Selenium)? I have seen some threads on Stack Exchange that mention catching this error, but only in the context of urllib3. In my case, I would need to catch the same error for the Selenium package.
Thanks for any advice

My code still runs into issues every once in a while (which could be solved by using proxies), but I think I found the source of the issue. This loop expects the first pattern match to return a .edu or .com, but does not anticipate a .org. Therefore, my code runs indefinitely when the first search result is a .org. Here is the source of the issue:
website = soup.find('cite', attrs={'class': 'iUh30'}).text
print('tried to get URL, is this it? : {}\n'.format(website))
pattern = re.compile(r'\w+\.(edu|com)') # does not anticipate .org's
Now my code runs okay, though I do run into errors when the code runs for too long (in this case the source of the issue is much clearer).
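As a minimal sketch of that fix (not the asker's actual final code), the pattern below also accepts .org, and the retry loop is bounded so that an unmatched result cannot spin forever. The names find_domain, max_attempts and fetch_cite_text are hypothetical; fetch_cite_text stands in for the Selenium/BeautifulSoup steps from the question.

```python
import re

# Same pattern as in the question, extended to accept .org as well.
DOMAIN_PATTERN = re.compile(r'\w+\.(edu|com|org)')


def find_domain(fetch_cite_text, max_attempts=5):
    """Call fetch_cite_text() until its result contains a known domain.

    fetch_cite_text is a hypothetical zero-argument callable standing in
    for the search-and-parse steps; it returns the text of the first
    <cite> element, or None if there was none.
    Returns the matched domain string, or None after max_attempts tries.
    """
    for _ in range(max_attempts):
        text = fetch_cite_text()
        match = DOMAIN_PATTERN.search(text) if text else None
        if match:
            return match.group(0)
    return None
```

Returning None after a fixed number of attempts lets the caller decide whether to skip the entry or log it, instead of restarting Chrome indefinitely.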

You are quitting the Chrome driver too early. chrome.quit() ends the browser session and shuts down the driver process, so the later call to chrome.get('https://google.com') has nothing to connect to. The automatic retries then end in the MaxRetryError.
Try removing the call to chrome.quit().
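If the browser really does need to be restarted between attempts, the other option is to create a new driver after quitting the old one, and to catch the error explicitly. The following is only a sketch: load_with_restart, make_driver and retry_on are hypothetical names, and the block avoids importing Selenium so it can run on its own. In a real script make_driver would be webdriver.Chrome, and retry_on could include urllib3.exceptions.MaxRetryError and selenium.common.exceptions.WebDriverException, both of which are ordinary Python exceptions that try/except can catch.

```python
import time


def load_with_restart(make_driver, url, retry_on=(Exception,), attempts=3, delay=0.0):
    """Open url, replacing the driver whenever a call on it fails.

    make_driver: zero-argument factory returning a driver-like object
                 with get(url) and quit() (e.g. webdriver.Chrome).
    retry_on:    exception types that trigger a restart.
    Returns the live driver; the caller is responsible for quit().
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    driver = make_driver()
    for attempt in range(1, attempts + 1):
        try:
            driver.get(url)
            return driver
        except retry_on:
            # The old session is unusable: quit it, then build a NEW
            # driver before issuing any further commands.
            driver.quit()
            if attempt == attempts:
                raise
            time.sleep(delay)
            driver = make_driver()
```

The key point is the ordering: no command is ever sent to a driver object after its quit() has been called.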

Related

Selenium VBA Excel - Timeout and raise not working when using .click

friends.
I need some help, please.
I'm trying to disable the timeout alert in an instruction and pass a timeout parameter, however it's not working:
Driver.FindElementById("ctl01_lnkCharacter", 5000, False).Click
I've also tried the form below, and the same thing happens:
Driver.FindElementById("ctl01_lnkCharacter", timeout:=5000, raise:=False).Click
The timeout parameter (5000) and the raise parameter (False, meaning no alert on timeout) are both ignored. The default timeout is used instead, or, if a timeout was set explicitly earlier, that value is used.
To check this, I set a different timeout beforehand:
Driver.Timeouts.PageLoad = 1500
Driver.FindElementById("ctl01_lnkCharacter", 5000, False).Click
In this case the timeout used is 1500, not 5000, and the False is ignored: if the timeout is reached, there is an error alert.
In the instruction below it works perfectly:
Driver.Get url, 2000, False
It respects the timeout (2000) and if it is reached there is no error alert.
The problem occurs only with the .click statement.
When I use .FindElementById(...).click, the values passed as timeout and raise are ignored:
.FindElementById("ctl01_lnkCharacter", TIMEOUT, RAISE).click
I thought I'd try using a variable in case that works, for example:
var = Driver.FindElementById("ctl01_lnkCharacter", 5000, False)
var.click
But I couldn't do that; the variable is not accepted. Does anyone have any idea of a solution for this case?
EDIT:
I forgot to say that the problem is not the timeout; I can set the timeout explicitly beforehand.
The problem is the raise: I need to disable the error alert in case of a timeout. Is there any other way to disable the error alert in case of a timeout (without using On Error Resume Next, which I can't use)?
EDIT2:
Sorry, I also forgot to say this: the button is clicked correctly and the page opens correctly, but sometimes some images are loaded that I don't need to wait for. Their loading time is long and varies a lot, and while these images are loading the timeout is hit and the error is generated.
That's why I need to disable the timeout error alert (raise parameter False).

Issue with checking for time on check

Hi, I'm having an issue reading the time allocated to a user, which is saved in a Postgres database. What I'm trying to achieve is that a role is removed when the user's duration expires. I want to fetch the time from the database when the check runs, but this doesn't seem to be working.
My console is not outputting an error but the check doesn't seem to be running.
Here is what I'm working with:
@tasks.loop(seconds=5.0)
async def check_mute(self):
    guild = self.bot.get_guild(750744359632121661)
    restricted = discord.utils.get(member.guild.roles, name="Restricted")
    members = discord.utils.get(member.guild.roles, name="Members")
    for member in list(guild.members):
        conn = psycopg2.connect(DATABASE_URL, sslmode='require')
        cursor = conn.cursor()
        cursor.execute("SELECT time FROM blacklist WHERE username=%s", (member.id, ))
        restricted_role = get(ctx.guild.roles, name="Restricted")
        muted_time = cursor.fetchone()
        current_time = dt.datetime.now()
        mute_elapsed_time = current_time - muted_time
        if member.id:
            await member.add_roles(members)
            await member.remove_roles(restricted, reason=f'Restricted role removed by {author.name}')
You're not getting errors because tasks don't throw any errors by default. In order to get some info out of them, you need to write a custom error handler for them.
With that said, I do see a few issues that might cause your loop to break.
First of all, the variables ctx and author are never defined anywhere in your code fragment, but you're using them. This will throw errors & interrupt the loop.
Are you starting your loop using check_mute.start()? Tasks need to be started before they run, and your code fragment doesn't have this.
"the check doesn't seem to be running"
I don't see you checking the time difference anywhere. You said the check didn't happen, but it just isn't there in the first place. You're defining mute_elapsed_time, but never using it to check if enough time has elapsed or not.
After fixing those, could you put some debug prints in your task to see where it gets & when it stops? If something else is wrong, that might help identify it.
PS. This is unrelated, but you're getting restricted_role inside the loop for every member, when you could do that once above the loop (and you already do it above the loop, so it's really unnecessary). You're not even using it as far as I can see, so consider removing it ^^. That's also the line that uses ctx (which doesn't exist), so removing it altogether might be a good idea.
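The missing comparison could look roughly like this. It is only a sketch under assumptions the question does not state: that the time column holds the moment the mute started, and that every mute lasts a fixed MUTE_DURATION. The helper name mute_expired is hypothetical. Note also that cursor.fetchone() returns a row tuple (or None), so the timestamp is its first element rather than the row itself.

```python
import datetime as dt

# Assumed fixed mute length; the question does not say how long a mute lasts.
MUTE_DURATION = dt.timedelta(hours=1)


def mute_expired(row, now, duration=MUTE_DURATION):
    """Return True if the mute stored in row has run its full duration.

    row is what cursor.fetchone() returns: a one-element tuple holding
    the mute start time, or None when the member has no blacklist entry.
    """
    if row is None:
        return False  # nothing stored, so there is no mute to lift
    muted_at = row[0]
    return now - muted_at >= duration
```

Inside the task, the role changes would then run only when mute_expired(cursor.fetchone(), dt.datetime.now()) is true, instead of under the always-true if member.id check.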

How can I make .goto non-blocking?

I'm writing a rails app which fetches text from an HTML page using Watir and Chrome Headless. All good so far!
The problem starts when I request a page which has a long load time to completely load all elements despite the fact that I don't need them.
Current code I use:
browser = Watir::Browser.new :chrome, headless: true
browser.goto(url)
The .goto function call, however, blocks until ALL elements have loaded. That's not really what I need - what I need is for goto to just start fetching the page, then continue running code since I really just want to wait until the text I need is present, then fetch it.
Any ideas?
goto will not return control for up to 60 seconds; if the page load time exceeds 60 seconds, it raises an error. Also, Watir.default_timeout has nothing to do with goto's page loading. You need to set the page_load timeout, which you can do by calling the Selenium driver directly as I have done below, because Watir doesn't offer any syntax for that.
With the code below you can achieve what you want:
b = Watir::Browser.new
# the browser must exist before its driver's timeouts can be set
b.driver.manage.timeouts.page_load = 5
begin
  b.goto(url)
rescue
  # goto raises an error if the page has not loaded within the given time
end
And then you can write the rest of your code, for example:
puts b.span(text: 'something').text
What happens here is that goto blocks the code that follows it for 5 seconds and then falls into the rescue block, so the program continues with the next line as you expected.
With the new w3c webdriver specification, you can set the page load strategy to 'none.' https://w3c.github.io/webdriver/webdriver-spec.html#navigation
Only Firefox and IE might have this implemented already.
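Whichever way goto is made to return early, the remaining step is to wait for the needed text rather than for the whole page. Watir has its own wait helpers for this; purely as a language-neutral illustration of the polling idea, here is a small Python sketch. wait_until and its parameters are hypothetical names, not part of Watir or Selenium.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it returns a truthy value or timeout expires.

    Returns the truthy value, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        time.sleep(interval)
```

The predicate would be whatever check returns the wanted text once it is present, so the script moves on as soon as that text appears instead of waiting for every image and script on the page.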

Make Watir-webdriver to load a page for limited time and to be able to retrieve information later

I know that there are several questions related to implementing waits and timeouts in Watir, but I have not found an answer to my problem (which must be common). I use Watir-webdriver to test a page that, due to its AJAX implementation, loads portion by portion for a very long time (more than 5 min). I need to sample this page for a limited time (20-40 sec) and then analyze the information that was loaded during this short time. However, as far as I know, there is no straightforward mechanism to tell Watir::Browser to stop. I can use Timeout, but although my script regains control after the rescue, it is impossible to interrogate the browser and verify the information that it was able to receive during the timeout window. All I can do at this point is kill the process and restart the browser, as discussed here: Make headless browser stop loading page and elsewhere.
The code below illustrates my situation. In this example I have a global timeout (30 sec) and a local timeout (15 sec) used for reading the page. It never gets to b.text call; the script just outputs the first exception after 15 sec and then it keeps waiting for the browser to be released and after the global timeout of 30 sec prints the second exception message.
Time out. Got into exception branch
Dropped to bottom rescue.
The end.
I also tried to send an 'escape' key to the browser, but any communication with it while it is in the goto method is impossible. Any tips and suggestions will be appreciated!
require 'watir-webdriver'
require 'timeout'
client = Selenium::WebDriver::Remote::Http::Default.new
client.timeout = 30 # Set the global timeout
b = Watir::Browser.new :chrome, :http_client => client
my_url = '...here is my address...'
begin
  begin
    Timeout::timeout(15) { b.goto my_url } # Access the page with local timeout
    b.close # if all is unbelievably good and the page is loaded
  rescue Exception => e
    puts 'Time out. Got into exception branch'
    if b.text.include? 'my_text' # NEVER GETS HERE
      puts 'Yes, I see the text!'
    else
      puts 'I do not see the text.'
    end
  end
rescue Exception => e
  puts 'Dropped to bottom rescue.'
end
puts 'The end.'
Watir relies on Selenium WebDriver to handle calls to the browser. At this time all browsers require that the document.readyState of the current frame return "complete" before returning control to your code.
A recent update to the webdriver specification appears to allow for the possibility of a browser driver implementing a page loading strategy that is not blocking, but it is not a requirement and is not supported at this time.
https://w3c.github.io/webdriver/webdriver-spec.html#the-page-load-strategy

How to ignore specific errors from the New Relic Dashboard

Hello,
My application is a web server that fires many requests to other servers. We set a maximum timeout on those requests, and whenever the timeout is reached, the connection is closed and an ESOCKETTIMEDOUT error is raised.
Error: socket hang up
at createHangUpError (http.js:1472:15)
at Socket.socketCloseListener (http.js:1522:23)
at Socket.EventEmitter.emit (events.js:117:20)
at TCP.close (net.js:465:12)
I want to exclude these errors from the New Relic Dashboard, since they distort the error rate and other metrics. Hiding them doesn't work either, because they still count in the error rate.
How can I remove specific errors (ones that do not have an HTTP status code) from my Dashboard?
You can pass status codes to ignore to the error collector. If you are configuring the New Relic agent using environment variables you can use a comma separated list of codes as the value for NEW_RELIC_ERROR_COLLECTOR_IGNORE_ERROR_CODES.
See the README.
If you are using newrelic.js to do so you can set the error_collector.ignore_codes value to an Array of status codes to ignore:
See the example config.
Important caveat: when setting this value manually you are overriding the default value of 404 which means that if you do not specify 404 in your manual configuration the Error Collector will start logging all 404 errors in your application (which you probably do not want).
I noticed you are using JavaScript. I'm not sure my solution can help you, but I'll answer in the hope that it does.
I use Java agent, and we have the same kind of problem. So far the only way that I found that can do something near what I want is having the specific errors wrapped in a dedicated exception ("NewRelicIgnorableException") and wrap whatever error I don't want to see in it.
Then I'd have to go into the dashboard/application and select "error collection". Last, I'd fill in the "Ignore these errors" with the full package name AND exception class name, like com.mypackage.NewRelicIgnorableException. Save and enjoy. These particular errors should not impact your apdex, but they will still count towards RPM and other metrics.
Other solutions have drawbacks. For example if I call ignoreexception the RPM and time metrics will not count. If you click the "hide error" button you only hide them from the error panel, but everything else will be as usual. If you ignore by status code you can get more or less the same results as ignoring the specific exception, but without any hope for fine control.
It's a pity that there's so little documentation on their site, I had to run tests to find these out.
