How to avoid selenium driver closing automatically? - python-3.x

Before my question, it might be helpful to show you the general structure of my code:
import time
from selenium import webdriver

class ItsyBitsy(object):
    def __init__(self):
        self.targets_a = dict()  # data like url:document_summary
        # filled by another function

    def visit_targets_a(self):
        browser = webdriver.Safari()
        for url in self.targets_a.keys():
            try:
                browser.switch_to.new_window('tab')
                browser.get(url)
                time.sleep(2)
            except Exception as e:
                print(f'{url} FAILED: {e}')
                continue
            # do some automation stuff
            time.sleep(2)
        print('All done!')
I can then instantiate the class and call my method without any issues:
spider = ItsyBitsy()
spider.visit_targets_a()
>>> All done!
However, after every tab is opened and the automations are completed, the window closes without any prompt, even though I do not call browser.close() or browser.quit() anywhere in my code.
My band-aid fix is calling time.sleep(9999999999999999) on the last loop iteration, which keeps the window open indefinitely thanks to the resulting OverflowError, but it is obviously not a real solution.
So, how do I stop the browser from exiting?!
Bonus points if you can educate me on why this is happening.
Thanks guys/gals!

You need to override __exit__ and prevent browser.quit() from happening automatically. The version below keeps the browser open if you set teardown=False:
import time
from selenium import webdriver

class ItsyBitsy(object):
    def __init__(self, teardown=False):
        self.teardown = teardown
        self.targets_a = dict()  # data like url:document_summary
        # filled by another function
        self.browser = webdriver.Safari()

    def visit_targets_a(self):
        for url in self.targets_a.keys():
            try:
                self.browser.switch_to.new_window('tab')
                self.browser.get(url)
                time.sleep(2)
            except Exception as e:
                print(f'{url} FAILED: {e}')
                continue
            # do some automation stuff
            time.sleep(2)
        print('All done!')

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.teardown:
            self.browser.quit()

spider = ItsyBitsy(teardown=False)
spider.visit_targets_a()
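One caveat: Python only invokes __exit__ through the context-manager protocol, so as written the teardown flag never actually fires. A sketch of the with-statement form this implies (the __enter__ method and the with block are my addition, not part of the original answer):

class ItsyBitsy(object):
    # __init__, visit_targets_a and __exit__ as above, plus:

    def __enter__(self):
        return self

# teardown=False: the browser stays open after the block exits.
# teardown=True: __exit__ calls browser.quit() on the way out.
with ItsyBitsy(teardown=False) as spider:
    spider.visit_targets_a()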

Are you using VS Code? Half a year ago I had the same problem, and switching to Sublime Text fixed it. The problem appears because VS Code runs Python code in a slightly unusual way (via an extension): it kills all processes created by the script once the last line of code has executed.

Related

Django initialising AppConfig multiple times

I wanted to use the ready() hook in my AppConfig to start a django-rq scheduler job. However, it does so multiple times, every time I start the server. I imagine that's due to threading, but I can't seem to find a suitable workaround. This is my AppConfig:
import django_rq
from datetime import datetime
from django.apps import AppConfig

class AnalyticsConfig(AppConfig):
    name = 'analytics'

    def ready(self):
        print("Init scheduler")
        from analytics.services import save_hits
        scheduler = django_rq.get_scheduler('analytics')
        scheduler.schedule(datetime.utcnow(), save_hits, interval=5)
Now when I do runserver, Init scheduler is displayed 3 times. I've done some digging, and according to this question I started the server with --noreload, which didn't help (I still got Init scheduler x3). I also tried putting
import os

if os.environ.get('RUN_MAIN', None) != 'true':
    default_app_config = 'analytics.apps.AnalyticsConfig'
in my __init__.py; however, RUN_MAIN appears to be None every time.
Afterwards I created a FileLock class, to skip configuration after the first initialization, which looks like this:
class FileLock:
    def __get__(self, instance, owner):
        return os.access(f"{instance.__class__.__name__}.lock", os.F_OK)

    def __set__(self, instance, value):
        if not isinstance(value, bool):
            raise AttributeError
        if value:
            f = open(f"{instance.__class__.__name__}.lock", 'w+')
            f.close()
        else:
            os.remove(f"{instance.__class__.__name__}.lock")

    def __delete__(self, obj):
        raise AttributeError

class AnalyticsConfig(AppConfig):
    name = 'analytics'
    locked = FileLock()

    def ready(self):
        from analytics.services import save_hits
        if not self.locked:
            print("Init scheduler")
            scheduler = django_rq.get_scheduler('analytics')
            scheduler.schedule(datetime.utcnow(), save_hits, interval=5)
            self.locked = True
This does work; however, the lock is not destroyed after the app quits. I tried removing the .lock files in settings.py, but that also runs multiple times, making it pointless.
My question is: How can I prevent Django from calling ready() multiple times, or failing that, how can I tear down the .lock files after Django exits or right after it boots?
I'm using Python 3.8 and Django 3.1.5.
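For the cleanup half of the question, one possible direction (a sketch only; atexit does not run if the process is killed hard, so this is not guaranteed to survive the autoreloader): register removal of the lock file on normal interpreter shutdown, at the moment the lock is first taken:

import atexit
import os

class AnalyticsConfig(AppConfig):
    name = 'analytics'
    locked = FileLock()

    def ready(self):
        from analytics.services import save_hits
        if not self.locked:
            print("Init scheduler")
            scheduler = django_rq.get_scheduler('analytics')
            scheduler.schedule(datetime.utcnow(), save_hits, interval=5)
            self.locked = True
            # Remove the lock on clean exit so the next boot starts unlocked.
            lockfile = f"{self.__class__.__name__}.lock"
            atexit.register(lambda: os.path.exists(lockfile) and os.remove(lockfile))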

Cause python to exit if any thread has an exception

I have a python3 program that starts a second thread (besides the main thread) for handling some events asynchronously. Ideally, my program works without a flaw and never has an unhandled exception. But stuff happens. When/if there is an exception, I want the whole interpreter to exit with an error code as if it had been a single thread. Is that possible?
Right now, if an exception occurs on the spawned thread, it prints out the usual error information, but doesn't exit. The main thread just keeps going.
Example
import threading
import time

def countdown(initial):
    while True:
        print(initial[0])
        initial = initial[1:]
        time.sleep(1)

if __name__ == '__main__':
    helper = threading.Thread(target=countdown, args=['failsoon'])
    helper.start()
    time.sleep(0.5)
    #countdown('THISWILLTAKELONGERTOFAILBECAUSEITSMOREDATA')
    countdown('FAST')
The countdown will eventually fail to access [0] on the string, because the string has been emptied, causing an IndexError: string index out of range. The goal is that whether the main or the helper thread dies first, the whole program dies altogether, but the stack trace information is still output.
Solutions Tried
After some digging, my thought was to use sys.excepthook. I added the following:
import os
import signal
import sys
import traceback

def killAll(etype, value, tb):
    print('KILL ALL')
    traceback.print_exception(etype, value, tb)
    os.kill(os.getpid(), signal.SIGKILL)

sys.excepthook = killAll
This works if the main thread is the one that dies first. But in the other case it does not. This seems to be a known issue (https://bugs.python.org/issue1230540). I will try some of the workarounds there.
While the example shows a main thread and a helper thread which I created, I'm interested in the general case where I may be running someone else's library that launches a thread.
Well, you could simply raise an error in your thread and have the main thread handle and report that error. From there you could even terminate the program.
For example on your worker thread:
try:
    self.result = self.do_something_dangerous()
except Exception as e:
    import sys
    self.exc_info = sys.exc_info()
and on the main thread:
if self.exc_info:
    raise self.exc_info[1].with_traceback(self.exc_info[2])
return self.result
So to give you a more complete picture, your code might look like this:
import threading

class ExcThread(threading.Thread):
    def excRun(self):
        # Where your core program will run
        pass

    def run(self):
        self.exc = None
        try:
            # Possibly throws an exception
            self.excRun()
        except:
            import sys
            self.exc = sys.exc_info()
            # Save details of the exception thrown; DON'T rethrow.
            # Just complete the function, e.g. storing variables or
            # states as needed.

    def join(self):
        threading.Thread.join(self)
        if self.exc:
            msg = "Thread '%s' threw an exception: %s" % (self.getName(), self.exc[1])
            new_exc = Exception(msg)
            raise new_exc.with_traceback(self.exc[2])
(I added an extra line to keep track of which thread caused the error, in case you have multiple threads; it's also good practice to name them.)
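A minimal usage sketch (the Worker subclass and its failing body are illustrative, not part of the original answer):

class Worker(ExcThread):
    def excRun(self):
        raise ValueError("something went wrong in the worker")

t = Worker(name="worker-1")
t.start()
t.join()  # the worker's exception is re-raised here, in the calling thread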
My solution ended up being a happy marriage between the solution posted here and the SIGKILL piece from above. I added the following killall.py submodule to my package:
import threading
import sys
import traceback
import os
import signal

def sendKillSignal(etype, value, tb):
    print('KILL ALL')
    traceback.print_exception(etype, value, tb)
    os.kill(os.getpid(), signal.SIGKILL)

original_init = threading.Thread.__init__

def patched_init(self, *args, **kwargs):
    print("thread init'ed")
    original_init(self, *args, **kwargs)
    original_run = self.run

    def patched_run(*args, **kw):
        try:
            original_run(*args, **kw)
        except:
            sys.excepthook(*sys.exc_info())

    self.run = patched_run

def install():
    sys.excepthook = sendKillSignal
    threading.Thread.__init__ = patched_init
Then run install() right away, before any other threads are launched (whether of my own creation or started by imported libraries).
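Wiring it up might look like this (mypackage is a placeholder name):

# main.py - do this before anything else spawns a thread
from mypackage import killall

killall.install()

# From here on, an uncaught exception in any thread prints its
# traceback and then kills the whole process with SIGKILL.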
Just wanted to share my simple solution.
In my case I wanted the exception to display as normal but then immediately stop the program. I was able to accomplish this by starting a timer thread with a small delay to call os._exit before raising the exception.
import os
import threading

def raise_and_exit(args):
    threading.Timer(0.01, os._exit, args=(1,)).start()
    raise args[0]

threading.excepthook = raise_and_exit
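For example, with the hook above installed, an uncaught exception in any worker now takes the whole process down (a quick demonstration, not from the original answer):

import threading
import time

def boom():
    raise RuntimeError("worker failed")

threading.Thread(target=boom).start()

time.sleep(1)         # never completes: os._exit(1) fires first
print("unreachable")  # never reached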
Python 3.8 added threading.excepthook which makes it possible to handle this more cleanly.
I wrote the package "unhandled_exit" to do just that. It basically adds os._exit(1) after the default handler, which means you get the normal backtrace before the process exits.
Package is published to pypi here: https://pypi.org/project/unhandled_exit/
Code is here: https://github.com/rfjakob/unhandled_exit/blob/master/unhandled_exit/__init__.py
Usage is simply:
import unhandled_exit
unhandled_exit.activate()

Transfer of data between Python files

I need some help, please, understanding why the following does not work.
Environment: Python 3.4, Gtk3.0, limited experience of Python
File selectcontact.py contains code to select one of a number of records and pass its key back to its parent process for use in one of at least three other actions.
Code snippet from the parent class:
….
self.cindex = 0
….

def editcontact_clicked(self, menuitem):
    import selectcontact
    selectcontact.SelectContactGUI(self)
    print('From Manage ', self.cindex)
    if self.cindex > 0:
        import editcontact
        editcontact.EditContactGUI(self.db, self.cindex)
….
Code snippet from selectcontact:
class SelectContactGUI:
    def __init__(self, parent_class):
        self.builder = Gtk.Builder()
        self.builder.add_from_file(UI_FILE)
        self.builder.connect_signals(self)
        self.parent_class = parent_class
        self.db = parent_class.db
        self.cursor = self.db.cursor(cursor_factory=psycopg2.extras.NamedTupleCursor)
        self.contact_store = self.builder.get_object('contact_store')
        self.window = self.builder.get_object('window1')
        self.window.show_all()

    def select_contact_path(self, path):
        self.builder.get_object('treeview_selection1').select_path(path)

    def contact_treerow_changed(self, treeview):
        selection = self.builder.get_object('treeview_selection1')
        model, path = selection.get_selected()
        if path != None:
            self.parent_class.cindex = model[path][0]
            print('From select ', self.parent_class.cindex)
            self.window.destroy()
….
window1 is declared as “modal”, so I was expecting the call to selectcontact to act like a subroutine, with editcontact not being called until control passed back to the parent. The parent_class bit works, because contact_store is correctly populated. However, the transfer back to the parent appears not to work, and the two print statements occur in the wrong order:
From Manage 0
From select 2
Comments gratefully received.
Graeme
"Modal" refers to windows only. That is, a modal window prevents accessing the parent window.
It has little to do with what code is running. I am not familiar with this particular windowing framework, but any I have worked with has had a separate thread for GUI and at least one for processing, to keep the GUI responsive, and message loops running in all active windows, not just the one currently with the focus. The modal dialog has no control over what code in other threads are executed when.
You should be able to break into the debugger and see what threads are running and what is running in each thread at any given time.
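In other words, the constructor returns as soon as the window is shown; it does not block until the window is destroyed. One way to get the subroutine-like behaviour the asker expected (a sketch with hypothetical wiring; on_selected and open_editor are illustrative names) is to pass a callback into SelectContactGUI and invoke it once a row is chosen, rather than reading cindex right after the constructor returns:

class SelectContactGUI:
    def __init__(self, parent_class, on_selected):
        self.on_selected = on_selected  # callable that receives the chosen key
        # ... builder setup as in the question ...

    def contact_treerow_changed(self, treeview):
        selection = self.builder.get_object('treeview_selection1')
        model, path = selection.get_selected()
        if path is not None:
            self.window.destroy()
            self.on_selected(model[path][0])  # hand the key back explicitly

# In the parent class:
def editcontact_clicked(self, menuitem):
    import selectcontact
    selectcontact.SelectContactGUI(self, self.open_editor)

def open_editor(self, cindex):
    if cindex > 0:
        import editcontact
        editcontact.EditContactGUI(self.db, cindex)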

Keep GIF animation running while doing calculations

I am trying to improve the user experience by showing a load mask above the active QMainWindow/QDialog when performing tasks that take some time. I have managed to get it working as I want, except that the GIF does not animate while the task is running. If I leave the load mask up after the task is complete, the GIF starts moving as it should.
My class for the load mask:
from PyQt4 import QtGui, QtCore
from dlgLoading_view import Ui_dlgLoading

class dlgLoading(QtGui.QDialog, Ui_dlgLoading):
    def __init__(self, parent):
        QtGui.QDialog.__init__(self, parent)
        self.setupUi(self)
        self.setWindowFlags(QtCore.Qt.WindowFlags(QtCore.Qt.FramelessWindowHint))
        self.setGeometry(0, 0, parent.frameGeometry().width(), parent.frameGeometry().height())
        self.setStyleSheet("background-color: rgba(255, 255, 255, 100);")
        movie = QtGui.QMovie("loader.gif")
        self.lblLoader.setMovie(movie)
        movie.start()

    def showEvent(self, event):
        QtGui.qApp.processEvents()
        super(dlgLoading, self).showEvent(event)

    def setMessage(self, message):
        self.lblMessage.setText(message)
The Ui_dlgLoading contains two labels and some vertical spacers: lblLoader (will contain the gif) and lblMessage (will contain a message if needed)
I create the load mask with this code:
loadmask = dlgLoading(self)
loadmask.setMessage('Reading data... Please wait')
loadmask.show()
I figured I needed some multithreading/multiprocessing, but I can't for the life of me figure out how to do it. I read somewhere that you must not touch the GUI from other threads, so I would need to move the heavy task to a worker instead, but I'm still blank.
As a simple example, let's say I am trying to load a huge file into memory:
file = open(dataFilename, 'r')
self.dataRaw = file.read()
file.close()
Around that I would create and close my load mask dialog. How do I start the file read without halting the GIF animation?
The GUI is for running some heavy external exe files, so it should work with that too.
I ended up doing this:
import os
import subprocess
import threading
import time

from PyQt5 import QtWidgets

class runthread(threading.Thread):
    def __init__(self, commandline, cwd):
        self.stdout = None
        self.stderr = None
        self.commandline = commandline
        self.cwd = cwd
        self.finished = False
        threading.Thread.__init__(self)

    def run(self):
        subprocess.call(self.commandline, cwd=self.cwd)
        self.finished = True

class command:
    def __init__(self):
        ...

    def run(self):
        ...
        thread = runthread("\"%s\" \"%s\"" % (os.path.join(self.__caller.exefolder, "%s.exe" % self.__cmdtype), self.__name), self.__caller.exeWorkdir)
        thread.start()
        count = 0
        sleeptime = 0.5
        maxcount = 60.0 / sleeptime
        while True:
            time.sleep(sleeptime)
            QtWidgets.qApp.processEvents()
            count += 1
            if thread.finished:
                break
            if count >= maxcount:
                results = QtWidgets.QMessageBox.question(self.__caller, "Continue?", "The process is taking longer than expected. Do you want to continue?", QtWidgets.QMessageBox.Yes | QtWidgets.QMessageBox.No)
                if results == QtWidgets.QMessageBox.Yes:
                    count = 0
                else:
                    QtWidgets.QMessageBox.warning(self.__caller, "Process stopped", "The process was stopped")
                    return False
It doesn't directly answer my question, but it worked for me, so I'm posting it in case others want to do something similar.
I call a process (in this case Python's subprocess.call) through a thread and track when the process has actually finished. A continuous loop checks periodically whether the process is done and updates the GUI (processEvents is what triggers the GIF to update). To avoid an infinite loop, I offer the user an option to exit after some time.
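For the original file-reading example, the more idiomatic Qt route is a worker thread that signals the GUI when it finishes, so the event loop (and therefore the GIF) keeps running. A minimal sketch, assuming PyQt5; FileReadWorker and on_data_loaded are illustrative names, not from the original post:

from PyQt5 import QtCore

class FileReadWorker(QtCore.QThread):
    finished_ok = QtCore.pyqtSignal(str)  # emitted with the file contents

    def __init__(self, filename, parent=None):
        super().__init__(parent)
        self.filename = filename

    def run(self):
        # Runs in the worker thread; the GUI thread stays free.
        with open(self.filename, 'r') as f:
            data = f.read()
        self.finished_ok.emit(data)

# In the GUI class, instead of reading the file directly:
#     self.worker = FileReadWorker(dataFilename)
#     self.worker.finished_ok.connect(self.on_data_loaded)  # slot runs in the GUI thread
#     self.worker.start()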

Tornado thread blocking task

Can someone help me solve this problem? Here is my code:
import threading
import time

from tornado.ioloop import IOLoop
from tornado.web import Application, RequestHandler, asynchronous

class Handler(RequestHandler):
    @asynchronous
    def get(self):
        res = 'result '
        _t = threading.Thread(target=self._thread, args=(res,))
        print(_t, time.time())
        _t.start()

    def _thread(self, response):
        time.sleep(5)
        IOLoop.instance().add_callback(callback=lambda: self.print_response(response))

    def print_response(self, _response):
        self.write(_response)
        self.finish()

application = Application([
    (r'/', Handler),
])

if __name__ == '__main__':
    application.listen(8889)
    IOLoop.instance().start()
In a browser, visit localhost:8889 in one tab and localhost:8889 in another: “result” is not printed in the second tab until the first one has finished, after 5 seconds.
I thought I had created two threads processing in parallel, each handing its result back to the main loop via add_callback when finished. Shouldn't tab 2 get its result shortly after tab 1?
If I copy the Handler class to a Handler1 class and add the route (r'/1', Handler1), then try again with localhost:8889 and localhost:8889/1, it works fine.
Can anyone explain this problem to me and how to solve it?
Thank you!
It's not Tornado, it's the browser. Browsers don't like making multiple requests for the same URL, so they won't send the second request until the first has finished. If you use two different browsers (or two different URLs), you'll see that it works.
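One way to confirm this outside a browser (a quick sketch using only the standard library; it assumes the server above is running on port 8889):

import threading
import time
import urllib.request

def fetch(tag):
    start = time.time()
    body = urllib.request.urlopen('http://localhost:8889/').read()
    print(tag, body, 'took %.1fs' % (time.time() - start))

# Two concurrent requests: each should take about 5 seconds, not 10,
# because the handler's sleep runs in its own thread per request.
for tag in ('first', 'second'):
    threading.Thread(target=fetch, args=(tag,)).start()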
